What Does it Mean to Load Data?
Python can be used to read data from a variety of places, including databases and files. Two file types that are often used are .txt and .csv. You can import and export files using built-in Python functionality or Python's CSV library. We’ll go through both options!
Load Data With Built-In Python Functions
To both read from and write to a file, you can use the built-in function open(), which takes two main parameters: file name and mode.
File name: the path to the file that you want to read from or write to.
Mode: the mode you want to use for the file. The main options are:
Read: "r"
Write: "w"
Append: "a"
Read and write: "r+"
To create a new file called “hello.txt” and write “Hello, world!” to it, use the following code:
file = open("hello.txt", "w")
file.write("Hello, world!")
file.close()
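The other modes follow the same pattern. As a minimal sketch (the file name modes_demo.txt and the temporary directory are assumptions made for illustration), the snippet below writes with "w", adds to the file with "a", and reads everything back with "r":

```python
import os
import tempfile

# Hypothetical file in a temporary directory, so no real file is overwritten.
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

# "w" creates the file, or empties it if it already exists.
file = open(path, "w")
file.write("first line\n")
file.close()

# "a" keeps the existing content and writes at the end.
file = open(path, "a")
file.write("second line\n")
file.close()

# "r" opens the file for reading only.
file = open(path, "r")
contents = file.read()
file.close()

print(contents)
```

Opening the same file again with "w" would discard both lines, which is why "a" is the safer choice when you want to keep what is already there.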
You can also use the with statement to read a file line by line:
with open("file.txt") as f:
    for line in f:
        # do something with line
        print(line)
This will print out the input file line by line.
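One detail worth knowing: each line read from a file keeps its trailing newline character, so print(line) shows an extra blank line between lines. The sketch below first creates a small sample file (its name, location, and contents are assumptions for illustration), then strips the newline from each line. It also shows that the with statement closes the file for you.

```python
import os
import tempfile

# Hypothetical sample file, created in a temporary directory for this demo.
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

lines = []
with open(path) as f:
    for line in f:
        # Each line still ends with "\n"; rstrip("\n") removes it.
        lines.append(line.rstrip("\n"))

print(lines)
# The file is closed automatically once the with block ends.
print(f.closed)
```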
The CSV Library
While the open() function can read and write to both .txt and .csv files, you can also use Python’s CSV library to read from and write to CSV files. This library gives you extra functionality.
When using the CSV library, you still need to use the open() function to open the file, but then you can pass the file to the CSV library's reader() or writer() methods to read from or write to it.
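One example of that extra functionality is the handling of quoted fields that contain a comma. The sketch below compares naive string splitting with csv.reader on a made-up row. The row itself is an assumption for illustration, and io.StringIO stands in for an opened file so the example runs without any file on disk.

```python
import csv
import io

# Hypothetical row: the second field contains a comma, so it is quoted.
raw = 'Jacob Smith,"Engineer, Software",Purple\n'

# Naive splitting cuts the quoted field in two.
naive = raw.strip().split(",")

# csv.reader accepts any iterable of lines, such as an opened file or a StringIO.
parsed = next(csv.reader(io.StringIO(raw)))

print(naive)
print(parsed)
```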
Read External Files
Let's start with reading external files. Let’s say you have a CSV file named favorite_colors.csv that looks like the following:
name,occupation,favorite_color
Jacob Smith,Software Engineer,Purple
Nora Scheffer,Digital Strategist,Blue
Emily Adams,Marketing Manager,Orange
The .reader() method will take all the text in a CSV, parse it line by line, and convert each row into a list of strings. You can use different delimiters to decide how to break up each row, but the most common one is a comma. The code snippet below reads the CSV file and prints each row.
import csv

with open('favorite_colors.csv') as file:
    reader = csv.reader(file, delimiter=',')
    for row in reader:
        print(row)
The output will be the following:
['name', 'occupation', 'favorite_color']
['Jacob Smith', 'Software Engineer', 'Purple']
['Nora Scheffer', 'Digital Strategist', 'Blue']
['Emily Adams', 'Marketing Manager', 'Orange']
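To illustrate the delimiter parameter, here is a small sketch that parses semicolon-separated data. The data is made up for illustration and is held in memory with io.StringIO rather than read from a file. It also shows one way to set the header row aside by calling next() on the reader before looping over the remaining rows.

```python
import csv
import io

# Hypothetical semicolon-separated data, held in memory instead of a file.
data = "name;favorite_color\nNora Scheffer;Blue\nEmily Adams;Orange\n"

reader = csv.reader(io.StringIO(data), delimiter=";")

# next() pulls the first row off the reader, leaving only the data rows.
header = next(reader)
rows = list(reader)

print(header)
print(rows)
```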
While this approach can be helpful sometimes, it treats the header row the same as any other. A more useful method for reading CSVs while recognizing headers to identify the columns is the DictReader() method. This method knows the first line is a header and saves the rest of the rows as dictionaries with each key as the column name and the value as the column value.
The code below shows how to use the DictReader() method:
import csv

with open('favorite_colors.csv') as file:
    reader = csv.DictReader(file, delimiter=',')
    for row in reader:
        print(row['name'] + " works as a " + row['occupation'] + " and their favorite color is " + row['favorite_color'])
The output for this will be:
Jacob Smith works as a Software Engineer and their favorite color is Purple
Nora Scheffer works as a Digital Strategist and their favorite color is Blue
Emily Adams works as a Marketing Manager and their favorite color is Orange
Much more useful, right?
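Because each row is a dictionary, pulling out a single column becomes straightforward. The sketch below uses part of the sample data from above, held in memory with io.StringIO (an assumption made so the example runs without a file), to read the column names from the reader's fieldnames attribute and collect one column into a list.

```python
import csv
import io

# Sample data from above, held in memory instead of favorite_colors.csv.
data = (
    "name,occupation,favorite_color\n"
    "Jacob Smith,Software Engineer,Purple\n"
    "Nora Scheffer,Digital Strategist,Blue\n"
)

reader = csv.DictReader(io.StringIO(data))
rows = list(reader)

# fieldnames holds the column names taken from the header row.
columns = reader.fieldnames
colors = [row["favorite_color"] for row in rows]

print(columns)
print(colors)
```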
Write to External Files
To understand writing to external files, let’s go back to our web scraping example. We’ve already written the code to extract and transform the data from the UK government services and information website. We have all the titles and descriptions saved as lists of strings. Now we can use the .writer() and .writerow() functions to write the data into a CSV file.
import csv

# Create a list for the headers
headers = ["title", "description"]

# Open a new file called 'data.csv' to write to
with open('data.csv', 'w', newline='') as csvfile:
    # Create a writer object with that file
    writer = csv.writer(csvfile, delimiter=',')
    writer.writerow(headers)
    # Loop through each element in the titles and descriptions lists
    for i in range(len(titles)):
        # Create a new row with the title and description at that point in the loop
        row = [titles[i], descriptions[i]]
        writer.writerow(row)
And there you have it! Your very own file populated with data scraped from the web. Follow along with the screencast below to go through each line.
Now download the code by clicking here and run it on your own in your editor. Take the time to understand what each line does, and feel free to revisit the screencast if needed.
You may have noticed that some instructions in this code repeat. Try to separate some of this functionality into functions on your own. Once you’ve given it a go, check out this file to compare how I’ve done it. There is no right or wrong answer.
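As one possible shape for such a refactor (a sketch only, not the solution file mentioned above), the writing step could be wrapped in a function. The function name write_rows, the sample titles and descriptions, and the temporary output path are all assumptions made for illustration. It pairs the two lists with zip() and writes all rows at once with writerows().

```python
import csv
import os
import tempfile


def write_rows(path, headers, rows):
    """Write a header row followed by the data rows to a CSV file."""
    with open(path, "w", newline="") as csvfile:
        writer = csv.writer(csvfile, delimiter=",")
        writer.writerow(headers)
        writer.writerows(rows)


# Hypothetical stand-ins for the scraped lists.
titles = ["Title A", "Title B"]
descriptions = ["First description", "Second description"]

# Written to a temporary directory so no existing data.csv is overwritten.
path = os.path.join(tempfile.mkdtemp(), "data.csv")
write_rows(path, ["title", "description"], zip(titles, descriptions))

# Read the file back to check what was saved.
with open(path, newline="") as csvfile:
    saved = list(csv.reader(csvfile))

print(saved)
```

Note that zip() stops at the shorter list, so this version silently drops unmatched items if the two lists differ in length, whereas the index-based loop would raise an error.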
Level-Up: Create, Read, and Write to Files
Time for some practice in this exercise! 😁
Then, check your work by:
submitting your project,
navigating back to the course team page (click on the exercise title on the top left of your screen, then the team name "OCpythonbasics")
clicking on "Fork the solution".
Level-Up, Bonus Round: Work With CSV Files
Here's a chance to get more comfortable with CSV Files in the following interactive activity. 😁 Don't forget to check your work in the last interactive exercise of this course.
You load data by reading from or writing to a file.
You can read and write to files using Python’s built-in open() function.
The .reader(), .writer(), and .DictReader() methods from Python's CSV library make it even easier to work with CSV files in your Python code.
The main modes of writing files are “w” for write and “a” for append.
Awesome! You’ve learned how to web scrape by extracting, transforming, and loading data from the web. Next, we’ll delve into the ethical concerns and challenges with web scraping.