Identify a redundant table
I’m fine with having several tables linked together by foreign keys, but in my database, I would prefer to have all of my information in one big table! Can I do that?
This approach would yield a relation where the information about our apples and varieties are grouped together as follows:
identifier | weight | diameter | color | name_variety | price_per_kilo | maturity_date | taste |
1 | 151 g | 8.3 cm | red | Red Delicious | 3.19 | late Sept. | sweet |
2 | 169 g | 9.1 cm | green | McIntosh | 3.49 | mid Sept. | tart |
3 | 134 g | 8.0 cm | green | McIntosh | 3.49 | mid Sept. | tart |
This creates several problems:
If the price per kilo of a certain apple variety changes, we need to change the price in every single one of the rows relating to that variety in question. If an information is not modified everywhere at the same time, then errors may occur.
We can't add a new row (i.e. a new variety of apples) unless we actually have apples belonging to that variety. In other words, if we discover new characteristics about a variety of apples, we have to have more apples than before in order to add a new row.
Fine, but if I happen to come across this relation (containing both apples and varieties), how would I know that I needed to divide it into two relations?
There is a rule for that! ;) It's a little complicated, but stay tuned for an example that will hopefully clarify the concept :
In a relation, if an attribute A depends uniquely on a group of attributes G (and this group of attributes is not a candidate key), it is possible to create a new relation that will contain the attributes of both A and G.
However, you must make sure that G is the minimum set (that is, an attribute cannot be removed from the G group without destroying the dependency relationship between A and G).
Reconfigure a redundant table
So here is our example ... stay tuned!
In the example above, does the price_per_kilo attribute depend solely on name_variety?
YES
And is name_variety a candidate key?
NO
This example, therefore, satisfies both of the conditions we need to relocate both price_per_kilo and name_variety to a new table.
Are we done yet? Not quite. Remember that if another attribute B also depends uniquely on G, then it also has to be moved to the newly created relation.
Do maturity and taste also depend uniquely on name_variety?
YES
We can, therefore, safely move these new attributes to the new table as well.
identifier (name_variety) | price_per_kilo | maturity | taste |
Red Delicious | 3.19 | late Sept. | Sweet |
Macintosh | 3.49 | mid Sept. | Tart
|
(...) | (...) | (...) | (...) |
Voilà! A beautiful new relation with name_variety as the primary key.
Go Further: Functional Dependencies
Yes, but how can I know if "an attribute depends on a set of attributes"? It’s not always clear!
This is true.
Say we have an apple relation. Imagine a hypothetical apple whose characteristics you don’t know. Or, rather, whose characteristics you know only some of. Let’s call these attributes G. Now, let’s ask ourselves whether an attribute A depends (or not) on G. The question is as follows:
If we only know G, and we have an apple relation at our disposal, can we find A? Let's see...
Suppose I tell you that I have an apple hidden behind my back. I also give you a piece of paper on which the entire apple relation is printed.
Now, I tell you that this apple belongs to the Red Delicious variety ( G = [name_variety] ). Can you tell me whether it tastes sweet or tart ( A = taste )?
Can you provide details about the weight of this same apple?
Now, what if I tell you that the apple's identifier is 1 and its color is red (G=[identifier, color]). Can you now provide details about the weight of this apple?
Summary
Data redundancy should be avoided in databases.
If a table contains redundant data, then you should divide it into multiple tables in order to store it in a database.
There is a rule for identifying and reconfiguring a redundant table.