- 8 hours
- Medium
Last updated on 8/5/21
Build a Classification Model With the Titanic Dataset
Evaluated skills
- Build a supervised learning model to address a classification task
Description
In this exercise, you will carry out a classification task using the Titanic dataset from Kaggle. We used this dataset in the feature engineering exercise in Part 2.
https://www.kaggle.com/c/titanic
The following is a description of the features in the data:
Feature | Definition | Key |
---|---|---|
survival | Survival | 0 = No, 1 = Yes |
pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd |
sex | Sex | |
Age | Age in years | |
sibsp | # of siblings / spouses aboard the Titanic | |
parch | # of parents / children aboard the Titanic | |
ticket | Ticket number | |
fare | Passenger fare | |
cabin | Cabin number | |
embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown, S = Southampton |
You can find the code and data for this activity on the course GitHub repository.
The file titanic_clean.csv contains data that has already been cleaned up as follows:
- Nulls in Age have been imputed with the mean age.
- The first letter of the cabin number has been extracted to provide a new deck feature. This has then been one-hot encoded, with nulls going to a column Deck_nan.
- Sex has been one-hot encoded.
- Embarked has been one-hot encoded, with nulls going to a column Embarked_nan.
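The cleaning steps listed above can be sketched in pandas as follows. This is only an illustration under stated assumptions, not the code that produced titanic_clean.csv: the tiny DataFrame is made up, the raw column names (Age, Sex, Cabin, Embarked) are assumed to follow the Kaggle file, and the exact column order of the real cleaned file may differ.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame using the assumed raw column names; not real passenger data.
raw = pd.DataFrame({
    "Age": [22.0, np.nan, 30.0, 38.0],
    "Sex": ["male", "female", "female", "male"],
    "Cabin": ["C85", np.nan, "E46", np.nan],
    "Embarked": ["S", "C", np.nan, "Q"],
})

clean = raw.copy()

# Impute missing ages with the mean of the observed ages.
clean["Age"] = clean["Age"].fillna(clean["Age"].mean())

# The deck is the first letter of the cabin; missing cabins stay null.
clean["Deck"] = clean["Cabin"].str[0]
clean = clean.drop(columns="Cabin")

# One-hot encode Deck and Embarked; dummy_na=True adds a *_nan column for nulls.
clean = pd.get_dummies(clean, columns=["Deck", "Embarked"], dummy_na=True)

# One-hot encode Sex (no nulls expected, so no extra column).
clean = pd.get_dummies(clean, columns=["Sex"])

print(clean.columns.tolist())
```

Printing the column list is a quick way to confirm that the Deck_nan and Embarked_nan columns exist before moving on.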
The Jupyter Notebook classification_activity.ipynb contains a template for the code you will run. Open the template in Jupyter and write the code as guided within the template. The objective is to predict the survival of passengers based on the available features.
Then answer the following questions.
Question 1
In the correlation visualization, select the two features below that have the most significant correlation to the target feature, Survived.
Careful, there are several correct answers.
Sex
Age
Pclass
Sibsp
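When reading a correlation visualization, it can help to rank the features by the absolute value of their correlation with the target, since a strong negative correlation is as informative as a strong positive one. The sketch below shows one way to do that with pandas. The values are made up for illustration and the column names are assumptions, so the ranking it prints says nothing about the real dataset.

```python
import pandas as pd

# Made-up numeric values for illustration only; not the real Titanic data.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass": [3, 1, 2, 3, 1, 3],
    "Age": [22.0, 38.0, 26.0, 35.0, 54.0, 2.0],
    "Sibsp": [1, 1, 0, 0, 0, 3],
})

# Pearson correlation of every feature with the target column.
corr_with_target = df.corr()["Survived"].drop("Survived")

# Rank by absolute value so that sign does not affect the ordering.
ranked = corr_with_target.abs().sort_values(ascending=False)

print(ranked)
```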
Question 2
Which feature should be selected for the target?
Fare
Survived
Age
Sex
Question 3
After scaling with the MinMaxScaler, which of the following are correct statements about the data?
Careful, there are several correct answers.
All features have a mean of 0.5.
All features have a min of 0.
All features have a max of 1.
The features are sorted from lowest to highest importance.
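Statements like these can be checked empirically. With its default settings, scikit-learn's MinMaxScaler rescales each feature independently as (x - min) / (max - min). The sketch below applies it to a small made-up frame (the values and column choice are assumptions, not the course data) and prints the summary statistics worth inspecting.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up values for illustration only; not the real Titanic data.
features = pd.DataFrame({
    "Age": [2.0, 22.0, 38.0, 54.0],
    "Fare": [7.25, 8.05, 71.28, 512.33],
})

# Fit the scaler on each column and transform it; the result is a NumPy array,
# so wrap it in a DataFrame again to keep the column names.
scaler = MinMaxScaler()
scaled = pd.DataFrame(scaler.fit_transform(features), columns=features.columns)

# Compare the per-feature minimum, mean, and maximum after scaling.
print(scaled.describe().loc[["min", "mean", "max"]])
```

Running the same `describe()` call on the scaled features in the notebook gives the evidence needed for each statement.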