- 8 hours
- Medium
Last updated on 8/5/21
Apply Your Feature Engineering Skills to the Titanic Dataset
Evaluated skills
- Prepare data with feature engineering techniques
Description
In this exercise, you will analyze the Titanic dataset from Kaggle.
The following is a description of the features in the data:
Feature | Definition | Key |
---|---|---|
survival | Survival | 0 = No, 1 = Yes |
pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd |
sex | Sex | |
Age | Age in years | |
sibsp | # of siblings / spouses aboard the Titanic | |
parch | # of parents / children aboard the Titanic | |
ticket | Ticket number | |
fare | Passenger fare | |
cabin | Cabin number | |
embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown, S = Southampton |
A copy of the dataset can be found on the course GitHub repository as titanic.csv. If you download the data from Kaggle, just use the file called train.csv. Let's start by loading the dataset and taking a quick peek at the head:
import pandas as pd
df = pd.read_csv("titanic.csv")
df.head()
Question 1
Use the isnull() function to find the columns containing nulls. Which feature contains the most nulls?
- Age
- Cabin
- Fare
- Embarked
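As a rough illustration of the mechanics only, the sketch below counts nulls per column on a small invented table. The DataFrame `toy` and its column names (`city`, `score`, `height`) are hypothetical and unrelated to the Titanic data; the same pattern should apply to `df`.

```python
import numpy as np
import pandas as pd

# Invented toy data (NOT the Titanic dataset), used only to show the pattern.
toy = pd.DataFrame({
    "city": ["Paris", None, "Lyon", "Nice"],
    "score": [np.nan, np.nan, 3.5, np.nan],
    "height": [1.70, 1.82, 1.65, 1.90],
})

# isnull() returns a boolean frame; summing it counts the nulls in each column.
null_counts = toy.isnull().sum().sort_values(ascending=False)
print(null_counts)

# idxmax() returns the label of the column with the most nulls.
most_nulls = null_counts.idxmax()
print(most_nulls)
```

Running `df.isnull().sum()` on the loaded dataset in the same way gives one null count per feature.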
Question 2
The Pclass feature is the ticket class. Use the unique() and value_counts() functions to understand the feature. What would be a good strategy for processing it?
- It's a continuous-valued feature, so keep it as an integer.
- It's a categorical feature, so convert it to text such as Class1, Class2, and Class3.
- It's a categorical feature, so use one-hot encoding to convert it to dummy variables.
- It has too few distinct values and is therefore of little value, so delete it.
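The following sketch shows how unique() and value_counts() behave, and how two of the transformations named in the options could be written in pandas. It uses an invented column, `deck_level`, in a hypothetical `toy` DataFrame; it does not indicate which strategy is the right answer for Pclass.

```python
import pandas as pd

# Invented toy data (NOT the Titanic dataset); "deck_level" is a hypothetical column.
toy = pd.DataFrame({"deck_level": [1, 3, 3, 2, 3, 1]})

# unique() lists the distinct values in order of first appearance.
levels = toy["deck_level"].unique()

# value_counts() gives the frequency of each value, most common first.
counts = toy["deck_level"].value_counts()

# Candidate transformation A: convert the integer codes to text labels.
labels = "Level" + toy["deck_level"].astype(str)

# Candidate transformation B: one-hot encode into one dummy column per level.
dummies = pd.get_dummies(toy["deck_level"], prefix="deck_level")

print(levels)
print(counts)
print(labels.tolist())
print(list(dummies.columns))
```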
Question 3
Use binning to split the Fare feature into four equal bands based on the quartile boundaries. Call the bands Q1, Q2, Q3, and Q4. Use the describe() function to determine the quartile boundaries, then use cut() to create the bins. Finally, group on the new binned category, and find the range of values in each band.
What are the minimum and maximum Fare values in band Q1?
- 0 and 7.8958
- 0 and 7.9104
- 4.0125 and 7.8958
- 0 and 7.9250
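The describe, cut, and group steps can be sketched as follows on an invented `price` column in a hypothetical `toy` DataFrame. The use of `include_lowest=True` and `observed=True` is an assumption of this sketch, not something the exercise prescribes.

```python
import pandas as pd

# Invented toy data (NOT the Titanic dataset); "price" is a hypothetical column.
toy = pd.DataFrame({"price": [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]})

# describe() reports min, 25%, 50%, 75%, and max, which serve as bin edges.
stats = toy["price"].describe()
edges = [stats["min"], stats["25%"], stats["50%"], stats["75%"], stats["max"]]

# cut() assigns each value to a labelled band. Bins are right-closed by default,
# so include_lowest=True (an assumption here) keeps the minimum in the first band.
toy["band"] = pd.cut(
    toy["price"],
    bins=edges,
    labels=["Q1", "Q2", "Q3", "Q4"],
    include_lowest=True,
)

# Group on the band and report the smallest and largest value in each.
ranges = toy.groupby("band", observed=True)["price"].agg(["min", "max"])
print(ranges)
```

Without `include_lowest=True`, the smallest value falls outside the first interval and its band becomes NaN, which changes the minimum reported for Q1.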