• 20 hours
  • Medium

Free online content available in this course.

course.header.alt.is_video

course.header.alt.is_certifying

Got it!

Last updated on 11/25/19

Optimize with normal forms

Log in or subscribe for free to enjoy all this course has to offer!

Edgar Codd, the father of relational databases developed the Normal Forms in 1970 and 1971 as a way to optimize a database's schema. In this chapter we are going to cover the First Normal Form (1NF), the Second Normal Form (2NF) and the Third Normal Form (3NF). There are a whole mess more of them, but the first 3 are the basic ones.

The Normal Forms are a series of requirements. If your schema's organization doesn't conflict with those requirements then you're cool, at least as far as that specific normal form goes.

The requirements make sure that you don't store the same information twice in your database and that your data's integrity is high. And what does that mean? Well let's jump to the 1st Normal Form!

1st Normal Form (1NF)

The First Normal Form is all about Atomicity! (Try saying that 10 times fast).

And this means that we have to make sure that each database field contains only atomic parts. Atomic parts are pieces that cannot be split into smaller pieces.

Repeating groups are not allowed either!

Is this an atomic creature or is it composed of multiple parts?
Is this creature atomic?

This creature is composed of multiple parts - so it isn't atomic.
This creature is composed of many "eyeballish" parts  - so it isn't atomic.

In the context of a database, saving a list in a single field is a no-no because a list is composed of multiple parts.

The first 1NF check is to make sure that you are not saving lists in your fields.

For example, this table called  AptOwner is 1NF compliant:

aptOwnerID

firstName

lastName

email

1

Tony

Sillypuddy

bounce@floor.com

2

Phil

Playdough

green@blue.com

3

Mac

Potatohead

starch@fries.com

But if Tony Sillypuddy wants to add another email address and we add it like this:

aptOwnerID

firstName

lastName

email

1

Tony

Sillypuddy

bounce@floor.com, copy@comics.com

2

Phil

Playdough

green@blue.com

3

Mac

Potatohead

starch@fries.com

Then we are not 1NF compliant because we have a list with 2 email addresses in a single field.

We could try to solve the problem like this:

aptOwnerID

firstName

lastName

email

email2

1

Tony

Sillypuddy

bounce@floor.com

copy@comics.com

2

Phil

Playdough

green@blue.com

 

3

Mac

Potatohead

starch@fries.com

 

I've added another column to the database for the second email address, but this second column is what is known as a repeating group and repeating groups are not allowed!

You can tell if a column is in a repeating group if there are multiple columns called the same thing, just with a number tacked on the end. 

The most stable way of solving this problem is to split out the email into another table, making a one-to-many relationship between the aptOwner table and the email  table.

Splitting out the email addresses to their own table.
Splitting out the email addresses to their own table.

A very common (and effective) solution to fixing a normalization issue is to create a new table for the offending member. With this organization we don't care how many emails an owner has, it won't change the schema.

2nd Normal Form (2NF)

2NF concerns the relationships between your columns.

This 2NF only is an issue for tables with a composite primary key

The commandment of 2NF states that, "Any non-key field should be dependent on the entire primary key."

In the condominium's domain, suppose there is a club of dog owners and they have meetings where the dogs can show off their tricks. At each meeting, a dog is only allowed to perform once.

A table for this group could look like this:

I've named the table DogTrickClub .

dogID

date

dogName

trick

difficulty

attendance

2

2/4/17

Fluffy

roll over

2

12

4

2/4/17

Spot

plays dead

3

8

2

2/10/17

Fluffy

jumps through a hoop

5

3

5

2/10/17

Piggie

 roll over

2

16

Here's the  Dog  table, the source for the  dogID :

dogID

ownerID

name

breed

bites

1

33

Rummy

St. Bernard

0

2

18

Fluffy

Standard Poodle

1

3

4

Grandpa

Bull Terrier

0

4

71

Spot

French Mastiff

1

5

21

Piggie

Havanese

0

The primary key of the DogTrickClub table has been set to be a composite of dogID and  date.  So the combination of  dogID and  date makes for a unique value (which is what a primary key needs to be).

If we look at the row of the  DogTrickClub  table where the  dogID is 2 and the  date is 2/4/17, we see that the  dogName is Fluffy. 

Fluffy's name is not going to change on another date. Her name is only related to her  dogID.

This is in violation of 2NF!

This violation of 2NF reveals an important problem with our schema. We have the dog's names stored in 2 locations: in the  Dog  table and in the  DogTrickClub  table. 

If the value changes on one place, it won't necessarily be changed in the other.

So if Fluffy's name gets changed to Fluppy, to keep the database accurate, this change would need to happen one time in the  Dog  table and one every line of the  DogTrickClub  table where the dog formerly known as Fluffy appeared.

For effective information management (which is another way of saying, "to reduce screw-ups") there should be only one place that the dog's name gets changed and one place to ask for information about that one thing.

So let's change the table so that it is 2NF compliant by just cutting out the  dogName:

dogID

date

trick

difficulty

2

2/4/17

roll over

2

4

2/4/17

play dead

3

2

2/10/17

jump through a hoop

5

5

2/10/17

roll over

2

To get a dog's name, we'd query the  Dog table with the  dogID

The rest of the information in this updated  DogTrickClub table is dependent on what happened that specific night.

3rd Normal Form (3NF)

In officialese, 3NF test checks that, "No non-key field is dependent on another non-key field."

You violate 3NF if you can figure out a field's value by looking across the row.

Let's look at the  DogTrick  table again.

dogID

date

trick

difficulty

2

2/4/17

roll over

2

4

2/4/17

play dead

3

2

2/10/17

jump through a hoop

5

5

2/10/17

roll over

2

So if in the next meeting of the Dog Trick Club,  dogID is 4 (whose name is Spot) performs a "roll over," the difficulty of that trick will be 2. I can figure that out by looking at what the other dogs who performed that trick received .

In other words, I can figure out a trick's difficulty by looking at its name in the trick column. So the trick's difficulty has nothing to do with its primary key, and that means 3FN VIOLATION!

How do we fix this? You guessed it, split off the tricks to another table!

So here's the 3NF compliant  DogTrickClub table:

dogId

date

trickID

2

2/4/17

1

4

2/4/17

2

2

2/10/17

3

5

2/10/17

1

And here's the new  Trick table that I split off of the  DogTrickClub table:

trickID

name

difficulty

1

roll over

2

2

play dead

3

3

jump through a hoop

5

While the tables will be a bit harder for humans to read (although there are tools to assist this issue), the information is stored without duplication and there is one definitive place to ask "how hard is it to jump through a hoop?"

But to finish this section on 3NF, there is still another hoop to jump through.

Figuring out what the value of one non-key field is from the value of another non-key field extends to math operations.

Here is the Saletable for bubblegum company:

saleID

quantity

price

total

1

3

.05

.15

2

5

.04

.20

3

1

.10

.10

In this table, I can figure out what the value of the total column will be by multiplying the price column times the quantity column.

This is another type of 3NF violation. The information needs to be stored in one place and by doing simple arithmetic, we are storing it in two places, thus making the info redundant.

Problems can occur if the total gets changed independent of the price and quantity. For example, if  the total of saleID:2 gets changed to .25 from .20, where is the error? In quantity, price or total?

We need our information to be stored in one place and one place only to keep out inconsistencies.

The Normal Forms are great tools for reviewing and analyzing your schema. They really can help prevent inconsistent and inaccurate information before it is generated.

Next, we'll go back into UML to cover some advanced topics!

Example of certificate of achievement
Example of certificate of achievement