- 6 hours
- Easy
Free online content available in this course.
Last updated on 1/30/24
Check Your Knowledge About Cleansing a Dataset
Evaluated skills
- Cleanse a Dataset
Question 1
The following table shows the Rainfall (in mm) and the Average Temperature (in °C) for each city over the month of April.
| City | Country | Rainfall (mm) | Average Temperature (°C) |
| --- | --- | --- | --- |
| Vancouver | Canada | 100.8 | 9.8 |
| Bogota | Colombia | 90.1 | 15.2 |
| Queenstown | New Zealand | 67 | 11 |
| Paris | France | 25 | 32 |
| Mombasa | Kenya | 150 | 27 |
Taking a look at this, you notice that the average temperature for Paris, France, seems abnormally high! In previous years, the average temperature in April has been much lower. You are unsure whether or not the value is erroneous.
What are your options?
- Delete the value and impute another value.
- Decide that the statistical treatment you are using is robust (i.e., not sensitive to outliers), and keep the value.
- Decide that the statistical treatment is not robust, and delete the value.
- All of the above.
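The trade-off behind these options can be seen numerically. The sketch below uses only the temperatures from the table and Python's standard `statistics` module to compare a statistic that is sensitive to outliers (the mean) with a more robust one (the median), with and without the Paris value. The variable names are illustrative, and with only five cities this is a toy illustration rather than a real analysis.

```python
from statistics import mean, median

# Average April temperatures (°C), copied from the table above.
temperatures = {
    "Vancouver": 9.8,
    "Bogota": 15.2,
    "Queenstown": 11,
    "Paris": 32,
    "Mombasa": 27,
}

all_values = list(temperatures.values())
without_paris = [t for city, t in temperatures.items() if city != "Paris"]

# How far each statistic moves when the suspect value is included.
mean_shift = mean(all_values) - mean(without_paris)
median_shift = median(all_values) - median(without_paris)

print(f"mean:   {mean(all_values):.2f} with Paris, {mean(without_paris):.2f} without")
print(f"median: {median(all_values):.2f} with Paris, {median(without_paris):.2f} without")
print(f"shift:  mean {mean_shift:.2f}, median {median_shift:.2f}")
```

On this small sample the mean moves by about 3.25 °C and the median by about 2.1 °C, so the choice of statistic changes how much the suspect value matters.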
Question 2
Examine this block of code. What does it do?
```python
import pandas as pd

STATUS_VALUES = ["GUEST", "EMPLOYER", "EMPLOYEE"]

df = pd.read_csv("mylittlecompany.csv")

def process(value):
    if value not in STATUS_VALUES:
        return "GUEST"
    else:
        return value

df["status"] = df["status"].map(process)
```
- It assigns the value “GUEST” to individuals that do not have the value “EMPLOYER” or “EMPLOYEE” for the “status” variable.
- It assigns the value “GUEST” only to individuals that have no value for the “status” variable.
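To explore how `process` behaves, it can be called directly on a few hand-written values, without loading the CSV file. The sample values below are hypothetical and not taken from `mylittlecompany.csv`. `float("nan")` stands in for a missing cell, on the assumption that pandas represents it as NaN and that `Series.map` passes it to the function (its default behavior when `na_action` is not set).

```python
STATUS_VALUES = ["GUEST", "EMPLOYER", "EMPLOYEE"]

def process(value):
    # Same logic as in the question: anything outside the allowed list becomes "GUEST".
    if value not in STATUS_VALUES:
        return "GUEST"
    else:
        return value

# Hypothetical "status" cells: valid values, a lower-case variant,
# an unknown label, and a missing value (NaN).
samples = ["EMPLOYEE", "EMPLOYER", "GUEST", "employee", "INTERN", float("nan")]

# Element-by-element application, mirroring what Series.map does.
cleaned = [process(value) for value in samples]

for before, after in zip(samples, cleaned):
    print(f"{before!r} -> {after!r}")
```

Note that the membership test is case-sensitive, so a value such as `"employee"` is treated as unknown.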
Question 3
Which column in this table contains an irregularity error?
| identifier | age | first name | score |
| --- | --- | --- | --- |
| 2873 | 27 | Leila | 39 points |
| 1028 | 999 | | 45 points |
| 3892 | 78 | Samir | 89% |
| 8273 | 12 | Cindy | 24 points |
- Identifier
- Age
- First Name
- Score
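Irregularities, meaning values in one column that do not all follow the same format or unit, can often be found by checking each value against an expected pattern. The sketch below is one possible approach using Python's standard `re` module. The helper `find_irregular` and the `weights` column are hypothetical and deliberately unrelated to the table in this question.

```python
import re

def find_irregular(values, pattern):
    """Return (index, value) pairs for values that do not fully match the pattern."""
    regex = re.compile(pattern)
    return [(i, v) for i, v in enumerate(values) if not regex.fullmatch(v)]

# Hypothetical column in which one value uses a different unit.
weights = ["72 kg", "65 kg", "150 lb", "80 kg"]

# Expected format: one or more digits, a space, then "kg".
irregular = find_irregular(weights, r"\d+ kg")
print(irregular)
```

Flagged values still need a decision: convert them to the common format when the conversion is known, or treat them as missing otherwise.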