What Does it Mean to Load Data?
Python can be used to read data from a variety of places, including databases and files. Two file types that are often used are .txt and .csv. You can import and export files using built-in Python functionality or Python's CSV library. We’ll go through both options!
Load Data With Built-In Python Functions
To both read from and write to a file, you can use the built-in function open(), which takes two parameters: the file name and the mode.
File name: the directory path to the file that you want to read or write to.
Mode: the mode you want to use for the file. The main options are:
Read: "r"
Write: "w"
Append: "a"
Read and write: "r+"
To create a new file called “hello.txt” and write “Hello, world!” to it, use the following code:
file = open("hello.txt", "w")
file.write("Hello, world!")
file.close()
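To see how the modes differ in practice, here is a quick sketch (the file name "demo.txt" and the use of a temporary directory are just for this demo): "w" starts a fresh file, "a" adds to the end, and "r" reads the result back.

```python
import os
import tempfile

# Use a temporary directory so the demo file doesn't collide with real files
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# "w" creates the file (or overwrites it if it already exists)
file = open(path, "w")
file.write("first line\n")
file.close()

# "a" appends to the end instead of overwriting
file = open(path, "a")
file.write("second line\n")
file.close()

# "r" reads the contents back
file = open(path, "r")
contents = file.read()
file.close()

print(contents)
```

Notice that if we had opened the file with "w" the second time instead of "a", the first line would have been overwritten.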
You can also use the with statement to read a file line by line:
with open("file.txt") as f:
    for line in f:
        # Do something with each line; here we just print it
        print(line)
This will print out the input file line by line.
The CSV Library
While the open() function can read from and write to both .txt and .csv files, you can also use Python's CSV library to read from and write to CSV files. This library gives you extra functionality.
When using the CSV library, you still need to use the open() function to open the file, but then you can pass the file object to the csv.reader() or csv.writer() functions to read from or write to the file.
Read External Files
Let's start with reading external files. Let’s say you have a CSV file named favorite_colors.csv that looks like the following:
name,occupation,favorite_color
Jacob Smith,Software Engineer,Purple
Nora Scheffer,Digital Strategist,Blue
Emily Adams,Marketing Manager,Orange
The csv.reader() function will take all the text in a CSV file, parse it line by line, and convert each row into a list of strings. You can use different delimiters to decide how to break up each row, but the most common one is a comma. The code snippet below reads the CSV file and prints each row.
import csv

with open('favorite_colors.csv') as file:
    reader = csv.reader(file, delimiter=',')
    for row in reader:
        print(row)
The output will be the following:
['name', 'occupation', 'favorite_color']
['Jacob Smith', 'Software Engineer', 'Purple']
['Nora Scheffer', 'Digital Strategist', 'Blue']
['Emily Adams', 'Marketing Manager', 'Orange']
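If you only need to set the header row aside before processing the data, one option is to call next() on the reader before looping. The sketch below uses an in-memory file via io.StringIO instead of the CSV on disk, just so it's self-contained:

```python
import csv
import io

# An in-memory stand-in for favorite_colors.csv
data = io.StringIO(
    "name,occupation,favorite_color\n"
    "Jacob Smith,Software Engineer,Purple\n"
)

reader = csv.reader(data, delimiter=',')
header = next(reader)   # the first row is the header
rows = list(reader)     # the remaining rows are the data

print(header)
print(rows)
```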
While this approach can be helpful sometimes, it treats the header row the same as any other. A more useful tool for reading CSVs while recognizing headers to identify the columns is the csv.DictReader() class. A DictReader treats the first line as a header and returns each remaining row as a dictionary, with each key being a column name and each value the corresponding column value.
The code below shows how to use DictReader().
import csv

with open('favorite_colors.csv') as file:
    reader = csv.DictReader(file, delimiter=',')
    for row in reader:
        print(row['name'] + " works as a " + row['occupation'] + " and their favorite color is " + row['favorite_color'])
The output for this will be:
Jacob Smith works as a Software Engineer and their favorite color is Purple
Nora Scheffer works as a Digital Strategist and their favorite color is Blue
Emily Adams works as a Marketing Manager and their favorite color is Orange
Much more useful, right?
Write to External Files
To understand writing to external files, let’s go back to our web scraping example. We’ve already written the code to extract and transform the data from the UK government services and information website. We have all the titles and descriptions saved as lists of strings. Now we can use the csv.writer() function and its .writerow() method to write the data into a CSV file.
import csv

# Create a list for the headers
headers = ["title", "description"]

# Open a new file to write to called 'data.csv'
with open('data.csv', 'w', newline='') as csvfile:
    # Create a writer object with that file
    writer = csv.writer(csvfile, delimiter=',')
    writer.writerow(headers)
    # Loop through each element in the titles and descriptions lists
    for i in range(len(titles)):
        # Create a new row with the title and description at that index
        row = [titles[i], descriptions[i]]
        writer.writerow(row)
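The CSV library also offers a DictWriter counterpart to DictReader, which writes rows from dictionaries instead of lists. The sketch below produces the same kind of file; the titles and descriptions lists here are short placeholders standing in for the scraped data, and the file name data_dict.csv is just an example:

```python
import csv

# Placeholder data standing in for the scraped titles and descriptions
titles = ["Benefits", "Driving and transport"]
descriptions = ["Support if you are on a low income", "Vehicle tax and MOT"]

with open('data_dict.csv', 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=["title", "description"])
    writer.writeheader()  # writes the header row from fieldnames
    for title, description in zip(titles, descriptions):
        writer.writerow({"title": title, "description": description})
```

Using dictionaries makes the code more readable when there are many columns, since each value is written under an explicit column name.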
And there you have it! Your very own file populated with data scraped from the web. Follow along with the screencast below to go through each line.
Now download the code by clicking here and run it on your own in your editor. Take the time to understand what each line does, and feel free to revisit the screencast if needed.
You may have noticed that some instructions in this code repeat. Try separating some of this functionality into functions on your own. Once you’ve given it a go, check out this file to compare with how I’ve done it, though there is no single right answer.
Level-Up: Create, Read, and Write to Files
Context:
Suppose you are an HR manager and you need to create a file containing the salaries of your employees. We will read from a CSV file the names of the employees and the hours worked, then create another CSV file with their calculated salaries.
Instructions:
Write a script to read the contents of our file input.csv in the following format:
name | hours_worked
Tavin Quickshadow | 35
Elara Sunleaf | 41
Mirelle Starwhisper | 40
Create a new CSV file named output.csv, which should have the following format:
Salaries are calculated using the formula hours_worked * 15. (Note: keys must be lowercase for the tests to pass.)
name | salary
Tavin Quickshadow | 525
Elara Sunleaf | 615
Mirelle Starwhisper | 600
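If you get stuck, here is one possible sketch of the exercise. It first creates a small input.csv so the snippet is self-contained; in the actual exercise, that file is provided for you.

```python
import csv

# Create a small input.csv so this sketch is self-contained;
# in the exercise, this file is provided for you
with open('input.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(["name", "hours_worked"])
    writer.writerow(["Elara Sunleaf", "41"])

# Read each employee, compute hours_worked * 15, and write output.csv
with open('input.csv') as infile, open('output.csv', 'w', newline='') as outfile:
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=["name", "salary"])
    writer.writeheader()
    for row in reader:
        salary = int(row["hours_worked"]) * 15
        writer.writerow({"name": row["name"], "salary": salary})
```

Note that the hours come out of the reader as strings, so they need to be converted with int() before multiplying.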
Once you have completed the exercise, run the command pytest tests.py in the VS Code terminal.
Let’s Recap!
You load data by reading from or writing to a file.
You can read from and write to files using Python’s built-in open() function.
The csv.writer() and csv.DictReader() tools from Python’s CSV library make it even easier to work with CSV files in your Python code.
The main file modes are "r" for read, "w" for write, and "a" for append.
Awesome! You’ve learned how to web scrape by extracting, transforming, and loading data from the web. Next, we’ll delve into the ethical concerns and challenges with web scraping.