Up to now, you've seen that UML classes translate into tables. Weāre going to look closer at these tables to analyze their component parts (i.e., rows and columns).
To begin with, youāll see how the relational model allows you to identify a row within a table.
Do you remember that you need to ensure that each data item exists only once within the database to avoid redundancy?
Only having each data item exist once implies that there must not be two rows describing the same production in a table (e.g., Ā ProductionĀ ).
ProductionĀ will contain exactly the same number of rows as the number of productions with movie scenes stored in your CSV file.
But how do you ensure that two rows in the Ā productionĀ table refer to two different productions? In other words, how do you distinguish between two rows?
This question is fundamental to identity. You are unique - and your identity guarantees it. Two people canāt have the same identity, and one person canāt have more than one (legally ā but thatās another story!).
In an RDBMS, every row must have an identifier.
A row identifier is called the primary key.
What does this mean?
If you're sure that you can retrieve any database row by knowing the value of some of its attributes, it means that these attributes make up the primary key.
This diagram will help to illustrate the idea:

Letās take a more concrete example. The concept of identification is essential for postal delivery workers who need to identify residences with certainty to deliver the mail. Imagine a table that holds residential details with the following columns:
Type of residence (house, apartment, etc.).
The number of bedrooms.
Residence number (e.g., apartment number 2).
Property number (e.g., 523).
Street name (e.g., Montrose Ave.).
Town/City (e.g., Chicago).
Zip code.
County.
Country.
This tableās primary key will be the minimum set of columns you can give to the postal delivery worker to ensure that they can deliver the mail to the correct residence.
If you just give them the Ā street nameĀ , Ā zip codeĀ , and Ā town/cityĀ (i.e., Montrose Ave, Chicago, IL 60618), they will have difficulty delivering the mail because Montrose Avenue has many residential properties.
If you give them all these attributes, they can deliver the mail correctly. However, the set of columns 1 to 9 isn't considered minimal: you donāt need to know the number of bedrooms to deliver mail. Therefore, the Ā number of bedroomsĀ doesn't need to form part of the primary key.
A good primary key for this table would include attributes 3 to 9. Even if you don't need to know the Ā residence numberĀ for a house, it's still important to have it in the tableās primary key to identify the different apartments within the same building.
In other words, you can't use ( Ā street nameĀ , Ā zip codeĀ ) as a primary key because rows ā Ā 4300 N Lincoln AveĀ , Ā 60618Ā ā and ā Ā 4340 N Lincoln Ave, 60618Ā ā both have the same values for ( Ā street nameĀ , Ā zip codeĀ ) i.e., Ā Lincoln AveĀ and Ā 60618Ā . This set of attributes would violate the uniqueness constraint.

Okay, now you're ready to determine the primary keys for your model. Hereās the model in its current form:

ProductionCompany and directorThere isnāt much choice for theĀ ProductionCompanyĀ and Ā directorĀ tables. So use the Ā nameĀ attribute for both tables. Of course, this assumes that there aren't two production companies or directors with the same name. Itās doubtful, so consider the Ā nameĀ a good primary key for these two tables.
locationThere are three possible primary keys for the Ā locationĀ table:
Attribute set ( Ā locationĀ ).
Attribute set ( Ā locationĀ , Ā zip codeĀ ).
Hypothetical attribute set ( Ā locationĀ , Ā zip codeĀ , Ā latitudeĀ , Ā longitudeĀ ).
You would have to assume that Ā locationĀ includes at least the street name and the town/city in the first instance. The uniqueness constraint would be met in this case because one town/city can't have two street names. Even if there are towns or cities with the same name, the street name allows you to distinguish them. Assuming this condition is met, you could identify on which road a particular production was filmed with this primary key.
However, this solution has certain disadvantages. First of all, you canāt guarantee the people entering the data will put the full address or enter it in a standard way. For example, nothing forces them to provide the town/city name. They could enter the zip code, but that's already present in the Ā zip codeĀ attribute. Hello, redundancy! Also, the user can type an address differently (e.g., 4772 N Lincoln St vs. 4772 N Lincoln Street) making it difficult to spot duplicates!Ā
The second solution somewhat circumvents this problem. The user has to enter the zip code to distinguish locations (i.e., an address in the suburb of San Francisco has the same address as one in the city).
The third (hypothetical) key introduces latitude and longitude. This data exists but is not stored in the example shown. You might think this would be a good primary key and consider including it. However, this one has a problem of its own: precision. Should you consider a latitude-longitude pair (48.87512, 2.348979) as different from the pair (48.87506, 2.348963), even though these two locations are only three feet apart on the same street and essentially describe the same filming location?
productionĀ and Ā shootingĀ Itās impossible (for the moment) to identify a primary key for the Ā productionĀ and Ā shootingĀ tables.
In the UML composition chapter , we said different films could have the same name. Therefore, if you want to identify a film uniquely, you need its title and the production company name, which proves a dependency between these two classes.Ā
The problem is that the Ā productionĀ table does not currently contain an attribute holding the production companyās name. You canāt have a primary key without this attribute.
There are only two attributes in the Ā shootingĀ table: Ā start dateĀ and Ā end dateĀ .
Two shooting events (for two different films in two different locations) might start and end on exactly the same dates, which means you canāt make it unique.
Attributes that make up the primary key in diagrams are most commonly shown in the central section of the tableās box, while all the other attributes are in the bottom section. But sometimes, you will come across formats that are not the same as those in this course. For example, the primary key might be in bold or underlined. It might also be indicated against the attributes using the abbreviation āPK.āĀ
![Diagram of the relational model with primary keys indicated using [PK]](https://user.oc-static.com/upload/2022/04/13/16498870964727_image10.png)
Sometimes, itās impossible to find a primary key within the attributes of a table. Other times, the primary key is too complicated (e.g., too many attributes). In these cases, you can create an artificial key (also known as a surrogate key).
For example, your Social Security and Passport numbers are artificial keys that enable a U.S. citizen to be identified.
Letās take this extract from your CSV file as an example:

If you transform the data to fit your relational model (with two tables:Ā Ā productionĀ and Ā production_companyĀ ,Ā with a foreign key Ā prod_companyĀ ) by adding artificial keys, you could be tempted to cut the file in half:Ā

Remember that the primary key Ā (name)Ā in the Ā ProductionCompanyĀ table is a uniqueness constraint, even when choosing an artificial key, so you must ensure that Ā (name)Ā doesn't contain any duplicates. If you do this, you will find the right solution:

However, once you get to the PDM (the third step in the modeling process), you will often use artificial keys systematically because:Ā
Thereās always a risk that a natural primary key could lose its uniqueness (e.g., if a single production company produces two films with the same name).
If the value of a natural key is modified (e.g., if a production company changes its name), then all the foreign keys that reference it would also need to change.
A tableās primary key is the minimum set of columns that allow a row to be uniquely identified.
A primary key must have a uniqueness constraint.
In some cases, itās impossible to find a primary key within the existing columns. In other cases, the primary key is too complicated. In these cases, create an artificial primary key.
You only use an artificial primary key in the LDM for specific cases and take certain precautions.
Youāve identified your primary keys. Well done! But your tables are still separate. In the next chapter, youāll see how to link them together.