• 6 hours
  • Easy

Last updated on 4/11/24

Load Data With Python

What Does it Mean to Load Data?

Python can be used to read data from a variety of places, including databases and files. Two file types that are often used are .txt and .csv. You can import and export files using built-in Python functionality or Python's CSV library. We’ll go through both options!

Loading data means transferring data from files to code or vice versa.

Load Data With Built-In Python Functions

To both read from and write to a file, you can use the built-in function open(). The two parameters you will use most often are the file name and the mode:

File name: the path to the file that you want to read from or write to.

Mode: the mode you want to use for the file. The main options are:

  • Read:  "r"

  • Write:  "w"

  • Append:  "a"

  • Read and write:  "r+" 
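As a rough illustration of how these modes differ, the sketch below opens the same file in each of them. The file name modes_demo.txt and the temporary directory are assumptions made for the example; the with statement, covered later in this chapter, closes each file automatically.

```python
import os
import tempfile

# Work in a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

    # "w" creates the file, or empties it if it already exists.
    with open(path, "w") as f:
        f.write("first line\n")

    # "a" keeps the existing content and adds to the end of the file.
    with open(path, "a") as f:
        f.write("second line\n")

    # "r" (the default mode) opens an existing file for reading only.
    with open(path, "r") as f:
        content = f.read()

    # "r+" opens an existing file for reading and writing,
    # starting at the beginning and without emptying it.
    with open(path, "r+") as f:
        first_line = f.readline()

print(content)
```

Note that "w" silently discards whatever the file contained before, so "a" is usually the safer choice when adding to a file you want to keep.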

To create a new file called “hello.txt” and write “Hello, world!” to it, use the following code:

file = open("hello.txt", "w")
file.write("Hello, world!")
file.close()

You can also use the with statement to read a file line by line. It closes the file automatically when the block ends, so no call to close() is needed:

with open("file.txt") as f:
    for line in f:
        #do something with line
        print(line)

This will print out the input file line by line.
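One detail worth knowing: each line read from a file keeps its trailing newline character, so print(line) shows a blank line between lines. The sketch below, which assumes a small two-line file created just for the example, compares the raw lines, the lines with the newline stripped, and the whole file read in one call with read().

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "file.txt")

    # Create a small sample file (placeholder content).
    with open(path, "w") as f:
        f.write("alpha\nbeta\n")

    # Iterating over the file yields one line at a time, newline included.
    with open(path) as f:
        raw_lines = [line for line in f]

    # rstrip("\n") removes the trailing newline from each line.
    with open(path) as f:
        clean_lines = [line.rstrip("\n") for line in f]

    # read() returns the entire file as a single string.
    with open(path) as f:
        whole_text = f.read()

print(raw_lines)    # ['alpha\n', 'beta\n']
print(clean_lines)  # ['alpha', 'beta']
```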

The CSV Library

While the open() function can read and write both .txt and .csv files, Python’s CSV library (the csv module) is designed specifically for reading from and writing to CSV files. It takes care of details such as delimiters and quoted fields for you.

When using the CSV library, you still use the open() function to open the file, but you then pass the file object to csv.reader() or csv.writer() to read from or write to it.
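To give a sense of what the library adds, here is a minimal sketch comparing a naive split on commas with csv.reader(). The sample text is hypothetical: it contains a quoted field with a comma inside it, and io.StringIO stands in for an opened file.

```python
import csv
import io

# Hypothetical sample: the name field contains a comma, so it is quoted.
sample = 'name,occupation\n"Smith, Jacob",Software Engineer\n'

# Naive approach: split every line on commas.
naive_rows = [line.split(",") for line in sample.splitlines()]

# csv.reader() understands the quotes and keeps the field in one piece.
parsed_rows = list(csv.reader(io.StringIO(sample)))

print(naive_rows[1])   # ['"Smith', ' Jacob"', 'Software Engineer']
print(parsed_rows[1])  # ['Smith, Jacob', 'Software Engineer']
```

The naive split breaks the name into two pieces, while the CSV library returns the two fields the row actually has.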

Read External Files

Let's start with reading external files. Suppose you have a CSV file named favorite_colors.csv that looks like the following:

name,occupation,favorite_color
Jacob Smith,Software Engineer,Purple
Nora Scheffer,Digital Strategist,Blue
Emily Adams,Marketing Manager,Orange

The csv.reader() function parses the text in a CSV file line by line and converts each row into a list of strings. The delimiter parameter sets the character that separates the values in each row; the most common one is a comma. The code snippet below reads the CSV file and prints each row.

import csv

# newline='' lets the csv module handle line endings itself
with open('favorite_colors.csv', newline='') as file:
    reader = csv.reader(file, delimiter=',')
    for row in reader:
        print(row)

The output will be the following:

['name', 'occupation', 'favorite_color']
['Jacob Smith', 'Software Engineer', 'Purple']
['Nora Scheffer', 'Digital Strategist', 'Blue']
['Emily Adams', 'Marketing Manager', 'Orange']
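The delimiter does not have to be a comma. As a small sketch, assume the same kind of data were stored with semicolons between the values instead; the sample text below is made up for the example, and io.StringIO stands in for an opened file.

```python
import csv
import io

# Hypothetical semicolon-separated version of the data.
sample = "name;favorite_color\nJacob Smith;Purple\nNora Scheffer;Blue\n"

# Passing delimiter=";" tells the reader where each value ends.
reader = csv.reader(io.StringIO(sample), delimiter=";")
rows = list(reader)

for row in rows:
    print(row)
```

With the default comma delimiter, each of these lines would come back as a single-element list, because no comma is found to split on.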

While this approach can be helpful sometimes, it treats the header row the same as any other. A more useful way to read CSVs while using the headers to identify the columns is csv.DictReader. By default, it treats the first line as the header and returns each remaining row as a dictionary, with the column names as keys and the row's values as values.

The code below shows how to use DictReader().

import csv

with open('favorite_colors.csv') as file:
    reader = csv.DictReader(file, delimiter=',')
    for row in reader:
        print(row['name'] + " works as a " + row['occupation'] + " and their favorite color is " + row['favorite_color'])

The output for this will be:

Jacob Smith works as a Software Engineer and their favorite color is Purple
Nora Scheffer works as a Digital Strategist and their favorite color is Blue
Emily Adams works as a Marketing Manager and their favorite color is Orange

Much more useful, right?
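For a closer look at what DictReader() hands back, the sketch below reads two of the rows above from an in-memory string (io.StringIO is used here only so the example runs without a file on disk). It shows the column names the reader picked up and builds a lookup from name to favorite color.

```python
import csv
import io

sample = (
    "name,occupation,favorite_color\n"
    "Jacob Smith,Software Engineer,Purple\n"
    "Nora Scheffer,Digital Strategist,Blue\n"
)

reader = csv.DictReader(io.StringIO(sample))
rows = list(reader)

# The header row is not returned as data; it becomes the dictionary keys.
fieldnames = reader.fieldnames

# Each row can be indexed by column name.
colors_by_name = {row["name"]: row["favorite_color"] for row in rows}

print(fieldnames)      # ['name', 'occupation', 'favorite_color']
print(colors_by_name)  # {'Jacob Smith': 'Purple', 'Nora Scheffer': 'Blue'}
```

Every value comes back as a string, so numeric columns need an explicit conversion such as int() before you do arithmetic with them.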

Write to External Files

To understand writing to external files, let’s go back to our web scraping example. We’ve already written the code to extract and transform the data from the UK government services and information website. We have all the titles and descriptions saved as lists of strings. Now we can use the csv.writer() function and the writer's writerow() method to write the data into a CSV file.

import csv

# Create a list for the headers
headers = ["title", "description"]

# Open a new file called 'data.csv' to write to
# (titles and descriptions are the lists built in the earlier scraping steps)
with open('data.csv', 'w', newline='') as csvfile:
    # Create a writer object with that file
    writer = csv.writer(csvfile, delimiter=',')
    writer.writerow(headers)
    # Loop through each element in the titles and descriptions lists
    for i in range(len(titles)):
        # Create a new row with the title and description at that point in the loop
        row = [titles[i], descriptions[i]]
        writer.writerow(row)

And there you have it! Your very own file populated with data scraped from the web. Follow along with the screencast below to go through each line.

Now download the code by clicking here and run it on your own in your editor. Take the time to understand what each line does, and feel free to revisit the screencast if needed.

You may have noticed that some instructions in this code repeat. Try and separate some of this functionality into functions on your own. Once you’ve given it a go, check out this file to compare how I’ve done it, but there is no right or wrong answer.
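As one possible starting point for the loading step only, here is a hedged sketch that moves the CSV writing into a function. It is not the version from the course file: the function name write_to_csv and the placeholder titles and descriptions are assumptions made for the example, and zip() with writerows() replaces the index-based loop.

```python
import csv
import os
import tempfile


def write_to_csv(path, headers, titles, descriptions):
    """Write a header row, then one row per (title, description) pair."""
    with open(path, "w", newline="") as csvfile:
        writer = csv.writer(csvfile, delimiter=",")
        writer.writerow(headers)
        # zip() pairs the two lists element by element.
        writer.writerows(zip(titles, descriptions))


# Placeholder values standing in for the scraped data.
titles = ["Example title A", "Example title B"]
descriptions = ["Example description A", "Example description B, with a comma"]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.csv")
    write_to_csv(path, ["title", "description"], titles, descriptions)

    # Read the file back to check what was written.
    with open(path, newline="") as csvfile:
        rows_read_back = list(csv.reader(csvfile))

print(rows_read_back)
```

The second description contains a comma, and it still comes back as a single value because the writer quotes it automatically.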

Level-Up: Create, Read, and Write to Files

Context:

Suppose you are an HR manager and you need to create a file containing the salaries of your employees. We will read the employees' names and hours worked from a CSV file, then create another CSV file with their calculated salaries.

Instructions:

  • Write a script to read the contents of our file input.csv, which has the following format:

name,hours_worked
Tavin Quickshadow,35
Elara Sunleaf,41
Mirelle Starwhisper,40

  • Create a new CSV file named output.csv. Salaries are calculated using the formula hours_worked * 15. (Note: Keys must be lowercase for the tests to pass.) The file should have the following format:

name,salary
Tavin Quickshadow,525
Elara Sunleaf,615
Mirelle Starwhisper,600

Once you have completed the exercise, you can run the following command in the VS Code terminal: pytest tests.py
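If you get stuck, the sketch below shows one possible shape for the script; treat it as an illustration under stated assumptions rather than the expected solution. The function name write_salaries is hypothetical, csv.DictWriter (the writing counterpart of DictReader) is one option among several, and the code assumes hours_worked always holds a whole number. It runs against a temporary copy of two of the rows above rather than your real files.

```python
import csv
import os
import tempfile

HOURLY_RATE = 15  # taken from the formula hours_worked * 15


def write_salaries(input_path, output_path):
    """Read name/hours_worked rows and write name/salary rows."""
    with open(input_path, newline="") as infile:
        with open(output_path, "w", newline="") as outfile:
            reader = csv.DictReader(infile)
            writer = csv.DictWriter(outfile, fieldnames=["name", "salary"])
            writer.writeheader()
            for row in reader:
                # Assumption: hours_worked always holds a whole number.
                salary = int(row["hours_worked"]) * HOURLY_RATE
                writer.writerow({"name": row["name"], "salary": salary})


with tempfile.TemporaryDirectory() as tmp_dir:
    input_path = os.path.join(tmp_dir, "input.csv")
    output_path = os.path.join(tmp_dir, "output.csv")

    # Recreate part of the input file for the demonstration.
    with open(input_path, "w", newline="") as f:
        f.write("name,hours_worked\nElara Sunleaf,41\nMirelle Starwhisper,40\n")

    write_salaries(input_path, output_path)

    # Read the result back; values come back as strings.
    with open(output_path, newline="") as f:
        output_rows = list(csv.DictReader(f))

print(output_rows)
```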

Let’s Recap!

  • You load data by reading from or writing to a file.

  • You can read and write to files using Python’s built-in open() function.

  • csv.writer() and csv.DictReader() from Python's CSV library make it even easier to work with CSV files in your Python code.

  • The main modes for opening files are "r" for read, "w" for write, and "a" for append.

Awesome! You’ve learned how to web scrape by extracting, transforming, and loading data from the web. Next, we’ll delve into the ethical concerns and challenges with web scraping.
