We’re almost at the end of the course. You've worked hard to create your CDM and LDM.
It would be mean (but true) to say at this point that all you've done is a couple of drawings; your database doesn't exist yet! But thankfully, by the end of this chapter, you’ll have an actual database that you can query using SQL.
Adapt Your Relational Model and Create the PDM
You’ve created a high-quality relational model, so well done!
But the data you have available isn’t particularly high quality, and you're about to find out why. So you first need to go through a process of data cleansing. Some data cleansing activities will be simple, while others are complex. We’ll leave those aside for now because:
The
Director
column in your file is multivalued (i.e., some rows contain pairs of directors). However, thedirector
table can only have one director per row in your relational model. You either need to check thousands of rows by hand and separate the pairs to resolve this or create a formula that will do it automatically. Either way, it's too tricky.For series productions, the season number is often missing. When this happens, you need to look at the filming date for each series, which would take too long.
Instead, we’re going to change the relational model. It’s a shame because it will impact the quality, but it will make it much simpler for you.
To resolve the first point, allow director
to hold a pair of directors in the same row. So, the name attribute in the director will no longer be atomic, which will violate the 1NF. It also means that the association between shooting_period
and director
will turn into a one-to-many relationship (instead of many-to-many). To resolve the second point, remove the season
attribute from the production
table, meaning that you’ll lose this piece of information.
Next, you need to develop the physical data model (PDM). In an RDBMS, this is written in SQL. The PDM is highly dependent on the chosen RDBMS because each has its own variant of SQL and different features. The PDM flows directly from the relational model (LDM).
So, here’s your adapted relational model after correcting the issues mentioned above and adding artificial keys:
Here’s the SQL code for the PDM, which flows from the relational schema and is compatible with the SQLite3 RDBMS.
Adapt Your CSV to Fit the Relational Model
You're now going to transform your Excel file to match the latest version of the relational model provided above.
Even though Excel and LibreOffice Calc are very similar, there are a few differences you need to know before watching the videos. If you use LibreOffice, follow the provided document showing you how to perform the steps described in the videos. If you use Excel, you’ll need to create a function to normalize the text in your file before you start.
In this first video, you’ll learn how to create the location
table. First, you’ll see how to sort your table by normalizing the content, deleting duplicates, and adding the primary and foreign keys.
We’ll use the same method to create the production
and production_company
tables in the second video.
Import Your Data Into SQLite3
All done! The worksheets in your spreadsheet will match your relational model. So, now you can export each worksheet into a CSV file. The method for exporting to CSV is described in the guidance notes. If you haven't completed the previous step, download the CSVs.
Computer programs use RDBMS. But here, you want to talk to the SQLite3 RDBMS. Therefore, you need some software to act as an interface between SQLite3 and you. The most well-known software of this kind is SQLiteStudio, which you can download here. No installation is necessary!
Open SQLiteStudio and then create a database file by clicking on Database
and then Add a database
(or use the keyboard shortcut CTRL + O
). Then click on the green button:
Next, name your future database file (e.g., filming_db
). Then click OK.
Your database has been created. Its name appears on the left-hand pane—double-click to open it. Create the table structures by running the PDM SQL code provided earlier in the chapter. Next, open the SQL code editor by clicking on Tools
, then Open SQL Editor
, or use the shortcut Alt + E
. A blank screen will appear, and you can copy your SQL code in here:
This code contains six separate statements separated by ;
. Select all six statements ( CTRL+A
), and then run them by clicking on the blue Execute query
button or the F9
key. Because this code generates the six tables in your database, it will only work the first time you run it because it’s impossible to create the same table a second time without deleting the first one.
Now that your table structures are in place, you need to populate them with your data. To do this, click Tools
, then Import
. A window will open and ask you which table will receive the imported data. Start with the location table. Click on Next
, and this gives you the option to select your location.csv
file:
Use the same method to import data to the director
, production_company
, production
, shooting_period
, and assc_period_loc tables
(in this order).
So there you have it! You can now return to the SQL editor you used before, input the SQL queries provided by your colleagues, and display the result. You could also modify the queries to search for different films or other locations. Note that if you watch the series Sense8, the query about the Eiffel Tower gives a spoiler for the final episode. Sorry about that!
Let’s Recap!
To implement a relational database using SQLite3, you need to:
Adapt your relational model to create the PDM.
Adapt your CSV to fit the relational model.
Import your data into SQLite3.
You have reached the end of this course. I hope you enjoyed it. You now know how to model a relational database using the three required steps: CDM, LDM, and PDM. In addition, you can now provide the data that the app developers will need.