We just saw that more than one set of attributes can be a key. But in a database, we want to agree on just one of them, once and for all. We, therefore, need to designate a primary key!
Note that every key is considered a candidate key. For an attribute to be considered a candidate key, it must fit the criteria established in the previous chapter.
Criteria for choosing a primary key
In the context of relational databases, the notion of the primary key is very important. In fact, some relational database management systems (RDBMS) will not allow you to create a table without assigning a primary key!
The selection of the primary key can actually be arbitrary. However, you will want to take into account the following criteria:
Brevity: You generally want to choose a primary key that contains the smallest possible number of attributes.
Simplicity: Try to choose "basic” attributes such as whole numbers or short character strings. Don't use special characters or differential capitalization, as that could get confusing!
Context: Context usually helps in selecting the most "logical" primary key, the one that is the most intelligible to a human being.
Yes, but what if none of the candidate keys seem simple or have a reasonable number of attributes?
Good question. In that case, you can create an artificial key.
Use an artificial key
An artificial key is an attribute that is added to the relation. This attribute has no real meaning in the domain being modeled; its sole purpose is to uniquely identify the tuples of the relation.
Remember our apple relation?
identifier | weight | diameter | color | name_variety |
1 | 151 g | 8.3 cm | red | McIntosh |
2 | 169 g | 9.1 cm | green | Granny Smith |
3 | 134 g | 8.0 cm | green | Granny Smith |
4 | 134 g | 8.0 cm | green | McIntosh |
In the domain being modeled (apples), weight, diameter, and color satisfactorily describe one aspect of our apple (here, it's its form). However, the identifier column does not inherently describe the apple; in fact, we could very well have switched identifiers, and assigned the identifier 3
(instead of 1
) to the red apple, without altering the accuracy of our representation in any way.
As you have guessed, this means that the identifier attribute is an artificial key, because it was created solely to differentiate and identify the apples!
Whether you choose to use an artificial key or not, there is an important point to bear in mind:
Key attributes can never be NULL
. This means that every time a new row is added, the key attribute value must be known. If there is any risk that these values will not be populated, then you absolutely must choose a different primary key.
For any other attribute value, however, it is possible to have a cell with an unknown or non-existent value. In this case, it is possible to specify its value as NULL
.
Represent a primary key
The attributes of a primary key are usually indicated by the notation [PK] (for Primary Key) placed after their names.
matriculation [PK] | country_matriculation [PK] | color | number | owner |
2983-AA | France | blue | 5 | ... |
1940-BD | France | green | 4 | ... |
Another way of specifying these attributes is to bold them:
matriculation | country_matriculation | color | number | owner |
2983-AA | France | blue | 5 | ... |
1940-BD | France | green | 4 | ... |
Go Further: Why Are Artificial Keys So Popular with Databases?
Artificial keys are generally quite common in databases. In some applications, they are even used systematically. Why?
Scalability: In the previous chapter, we asked whether the two attributes [ own_num_secu, own_num_vehicle ] could be a candidate key. The answer was yes, so long as our application included only owners of French nationality. However, if our application evolves and we want to expand it to include people of other nationalities, our key would no longer work. Therefore, in terms of database scalability, artificial keys are the best option.
Processing-time: In terms of performance, non-artificial keys are often not optimal when asking the RDBMS to find a row in the table.
Summary
When there are multiple possible keys in a table, these keys are referred to as candidate keys.
One of the candidate keys is designated to be the official primary key (PK).
If none of the candidate keys is simple and intelligible, an artificial key, often called the identifier, is created.
Key attributes can never be NULL.