• 15 hours
  • Easy


Last updated on 1/23/20

Check Your Knowledge About Cleansing a Dataset


Evaluated skills

  • Cleanse a Dataset
  • Question 1

    The following table shows the rainfall (in mm) and the average temperature (in °C) for each city during the month of April:

    City         Country        Rainfall (mm)   Average Temperature (°C)
    Vancouver    Canada         100.8           9.8
    Bogota       Colombia       90.1            15.2
    Queenstown   New Zealand    67              11
    Paris        France         25              32
    Mombasa      Kenya          150             27

    Taking a look at this, you notice that the average temperature for Paris, France, seems abnormally high! In previous years, the average temperature in April has been much lower. You are unsure whether or not the value is erroneous. 

    What are your options?

    • Delete the value and impute a replacement value.

    • Decide that the statistical treatment you are using is robust (i.e., not sensitive to outliers), and keep the value.

    • Decide that the statistical treatment is not robust, and delete the value.

    • All of the above
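To see how these options play out in practice, here is a minimal sketch using Python's standard `statistics` module. It compares the mean, which is sensitive to outliers, with the median, which is more robust, when the Paris value is kept, deleted, or deleted and imputed. Imputing with the median of the other cities is only one illustrative choice, and the helper `summarize` is hypothetical.

```python
from statistics import mean, median

# April average temperatures (°C) from the table; Paris (32) is the suspicious value.
temps = {"Vancouver": 9.8, "Bogota": 15.2, "Queenstown": 11, "Paris": 32, "Mombasa": 27}


def summarize(values):
    """Return (mean, median): the mean is sensitive to outliers, the median is more robust."""
    values = list(values)
    return mean(values), median(values)


# Option "keep": use every value as-is.
kept = summarize(temps.values())

# Option "delete": drop the suspicious value entirely.
without_paris = {city: t for city, t in temps.items() if city != "Paris"}
deleted = summarize(without_paris.values())

# Option "delete and impute": replace the value with the median of the remaining cities
# (median imputation is just one possible strategy).
imputed_temps = dict(temps, Paris=median(without_paris.values()))
imputed = summarize(imputed_temps.values())

for label, (m, med) in [("keep", kept), ("delete", deleted), ("impute", imputed)]:
    print(f"{label:>6}: mean = {m:.2f} °C, median = {med:.2f} °C")
```

With this data, keeping the Paris value raises the mean by 3.25 °C compared with deleting it, whereas the median shifts by 2.1 °C, which is why a robust statistic makes keeping a doubtful value less risky.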

  • Question 2

    Examine this block of code. What does it do?

    import pandas as pd

    STATUS_VALUES = ["GUEST", "EMPLOYER", "EMPLOYEE"]

    df = pd.read_csv("mylittlecompany.csv")

    def process(value):
        if value not in STATUS_VALUES:
            return "GUEST"
        else:
            return value

    df["status"] = df["status"].map(process)

     

    • It assigns the value “GUEST” to individuals that do not have the value “EMPLOYER” or “EMPLOYEE” for the “status” variable.

    • It assigns the value “GUEST” only to individuals that have no value for the “status” variable.
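One way to check the behavior of this code is to run the same `process` function on a small in-memory DataFrame instead of `mylittlecompany.csv`. The sample names and status values below are hypothetical. By default, `Series.map` also calls the function on missing values, so they are mapped too.

```python
import pandas as pd

STATUS_VALUES = ["GUEST", "EMPLOYER", "EMPLOYEE"]


def process(value):
    # Any value outside the allowed list (other labels, different casing, missing values)
    # is replaced by "GUEST"; allowed values are returned unchanged.
    if value not in STATUS_VALUES:
        return "GUEST"
    return value


# Hypothetical sample data standing in for mylittlecompany.csv.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Chloe", "Dev", "Eli"],
    "status": ["EMPLOYEE", "employer", None, "INTERN", "EMPLOYER"],
})

df["status"] = df["status"].map(process)
print(df["status"].tolist())
```

Note that the comparison is case-sensitive: a lowercase “employer” is not in `STATUS_VALUES`, so it is replaced just like an unknown label or a missing value.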

  • Question 3

    Which column in this table contains an irregularity error?

    identifier   age   first name   score
    2873         27    Leila        39 points
    1028         999                45 points
    3892         78    Samir        89%
    8273         12    Cindy        24 points

    • Identifier

    • Age

    • First Name

    • Score
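An irregularity of this kind can also be spotted programmatically by checking that every value in a column uses the same unit. The sketch below assumes each score is written as a number followed by an optional unit suffix; it extracts that suffix and flags values that differ from the most common one. The helper `unit_of` is hypothetical.

```python
import re
from collections import Counter

# The "score" column from the table above.
scores = ["39 points", "45 points", "89%", "24 points"]


def unit_of(value):
    """Return the unit suffix of a score string (e.g. 'points' or '%'), or None if unparseable."""
    match = re.fullmatch(r"\s*\d+(?:\.\d+)?\s*(.*?)\s*", value)
    return match.group(1) if match else None


units = [unit_of(s) for s in scores]
majority_unit, _ = Counter(units).most_common(1)[0]

# Values whose unit differs from the most common one are irregular.
irregular = [s for s, u in zip(scores, units) if u != majority_unit]
print(majority_unit, irregular)
```

Choosing the majority unit as the reference is a simple heuristic; in a real dataset you would confirm the expected unit with whoever produced the data before converting or discarding values.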