Last updated on 1/23/20
Check Your Knowledge About Cleansing a Dataset
- Cleanse a Dataset
The following table shows the Rainfall (in mm) and the Average Temperature (in °C) for each city over the month of April.
Average Temperature (°C)
Looking at the table, you notice that the average temperature for Paris, France, seems abnormally high: in previous years, the April average has been much lower. You are unsure whether the value is erroneous.
What are your options?
Delete the value and impute another value
Decide that the statistical treatment you are using is robust (i.e. not sensitive to outliers), and keep the value.
Decide that the statistical treatment is not robust, and delete the value.
All of the above
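To make the options above concrete, here is a minimal sketch using Python's standard `statistics` module. The temperature values are hypothetical (they are not taken from the table), and the last one plays the role of the suspect reading. The sketch compares a non-robust statistic (the mean) with a robust one (the median), then shows deletion and a simple median imputation.

```python
import statistics

# Hypothetical April average temperatures (°C), invented for illustration;
# the last value stands in for the suspect reading.
temps = [11.0, 12.5, 10.8, 11.9, 35.0]
others = temps[:-1]  # the same data with the suspect value deleted

# Keep the value: a robust statistic such as the median barely moves,
# while the mean is pulled upward by the suspect value.
mean_all = statistics.mean(temps)
median_all = statistics.median(temps)

# Delete the value: recompute the non-robust statistic without it.
mean_without = statistics.mean(others)

# Delete and impute: replace the suspect value with the median of the rest.
imputed = others + [statistics.median(others)]

print(f"mean with suspect value:    {mean_all:.2f}")
print(f"median with suspect value:  {median_all:.2f}")
print(f"mean without suspect value: {mean_without:.2f}")
print(f"imputed series:             {imputed}")
```

Median imputation is only one possible choice here; which option is appropriate depends on the analysis you plan to run afterwards.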
Examine this block of code. What does it do?

    import pandas as pd

    STATUS_VALUES = ["GUEST","EMPLOYER","EMPLOYEE"]

    df = pd.read_csv("mylittlecompany.csv")

    def process(value):
        if value not in STATUS_VALUES:
            return "GUEST"
        else:
            return value

    df["status"] = df["status"].map(process)
It assigns the value “GUEST” to individuals that do not have the value “EMPLOYER” or “EMPLOYEE” for the “status” variable.
It assigns the value “GUEST” only to individuals that have no value for the “status” variable.
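One way to check your reasoning is to run the same `process` function on a small in-memory Series instead of the CSV file. The sample values below are hypothetical and chosen to cover four cases: an allowed value, a value in the wrong case, an unexpected value, and a missing value.

```python
import pandas as pd

STATUS_VALUES = ["GUEST", "EMPLOYER", "EMPLOYEE"]

def process(value):
    # Anything outside the allowed list, including a missing value,
    # falls back to "GUEST"; allowed values pass through unchanged.
    if value not in STATUS_VALUES:
        return "GUEST"
    else:
        return value

# Hypothetical sample standing in for the "status" column of the CSV file.
status = pd.Series(["EMPLOYEE", "employee", None, "EMPLOYER", "GUEST", "INTERN"])

# Series.map calls process on each element and returns a new Series.
cleaned = status.map(process)
print(cleaned.tolist())
```

Note that the membership test is case-sensitive, so a lowercase "employee" is treated like any other unexpected value.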
Which column in this table contains an irregularity error?