We’ve seen the basic building blocks of entities including primary keys, foreign keys and entity types. It's time to build on those ideas to see how to keep your database intact and providing accurate data for years to come.
Entity Integrity and Why It Matters
Relational databases are built on the idea that tables are related to one another through primary and foreign keys.
The primary key is used as a unique identifier for a table. No two records or rows will have the same value for the primary key. If they do, there will be issues with the quality of the data, and it will be very difficult to maintain the database moving forward. If the primary keys are not unique, you won’t know which records really belong with which primary key.
The primary key and any component that makes it up can't be null. The DBMS enforces entity integrity by not allowing any operation such as insert or update to produce a primary key that violates those rules. Database operations that produce a null or non-unique value for the primary key will be rejected, and the DBMS will generate an error message. In most instances, it will indicate that the entity integrity has been violated and the action will not commit the bad data.
Have a look at the example below. Do you see any problems with entity integrity?
PublisherID | Name | Phone | Website | |
1 | American University in Cairo Press | NULL | NULL | |
NULL | Michigan State University Press | NULL | ||
2 | Westminster John Knox Press | 5551234 | NULL | NULL |
In this example, entity integrity is violated because there is no value for the primary key in the second row (the value for PublisherID is missing and null).
Referential Integrity and How it Relates to Entities
Referential integrity gives you the rules for how foreign keys are maintained. The foreign key must reference an existing primary key or contain a null value. Foreign keys are often not required, so it's okay to have a null value, but not in a primary key field.
The DBMS also has rules to prevent primary key values from being changed without also changing the foreign key value to match. No update or delete operation can be performed on a primary key field if the foreign key value exists.
This prevents orphan records (unusable junk data), which emerge from a foreign key having no primary key to reference.
Likewise, the DBMS would prevent an insert or update on a foreign key field that would result in an orphan record where there is no matching primary key. You could say there is a broken link between the primary and foreign key since the foreign key has no equivalent value in the table with the primary key. The only exception to this is setting a foreign key field to not contain nulls, which would force the value to be something other than null, and have a matching primary key field in another table.
Have a look at the example below. Do you see any problems with referential integrity?
Authors table
AuthorID | LastName | FirstName |
1 | Austen | Jane |
2 | Patterson | James |
4 | King | Stephen |
BookAuthor table
AuthorID | BookID |
1 | 12 |
2 | 15 |
3 | 222 |
4 | 259 |
In this example, referential integrity is violated because there is a broken link between the two tables. The foreign key AuthorID in the BookAuthor table links back to the primary key AuthorID in the Authors table, but the row containing information for the AuthorID value of 3 doesn't have anything to refer back to! This row of data is an orphan record.
Let’s Recap!
Entity integrity keeps the primary key field from being null or empty which is enforced by the DBMS.
Referential integrity ensures the foreign key values match up with the primary key values in another table.
Orphan records result from the broken links that form when there is no matching primary key for an existing foreign key.
Now that we understand entity and referential integrity, let's learn how to optimize a database design: with normalization!