Last updated on 8/5/21
Build a Classification Model With the Titanic Dataset
- Build a supervised learning model to address a classification task
In this exercise, you will carry out a classification task using the Titanic dataset from Kaggle. We used this dataset in the feature engineering exercise in Part 2.
The following is a description of the features in the data:
| Feature | Description | Values |
|---|---|---|
| survival | Survival | 0 = No, 1 = Yes |
| pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd |
| Age | Age in years | |
| sibsp | # of siblings / spouses aboard the Titanic | |
| parch | # of parents / children aboard the Titanic | |
| embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown, S = Southampton |
You can find the code and data for this activity on the course GitHub repository.
titanic_clean.csv contains data that has already been cleaned up as follows:
- Nulls in Age have been imputed with the mean age.
- The first letter of the cabin has been extracted to provide a new deck feature. This has then been one-hot encoded, with nulls going to a column Deck_nan.
- Sex has been one-hot encoded.
- Embarked has been one-hot encoded, with nulls going to a column Embarked_nan.
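The cleaning steps listed above can be sketched in pandas as follows. This is only an illustration under stated assumptions, not the script that produced titanic_clean.csv: the sample rows are made up, the raw column names (Age, Cabin, Sex, Embarked) are assumed to follow the Kaggle file, and the intermediate column name Deck and the helper `clean` are hypothetical.

```python
import pandas as pd

# Tiny made-up sample using the assumed raw Kaggle column names.
raw = pd.DataFrame({
    "Age": [22.0, None, 38.0],
    "Cabin": ["C85", None, "E46"],
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", None],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring the four cleaning steps described above."""
    out = df.copy()
    # Impute missing ages with the mean of the known ages.
    out["Age"] = out["Age"].fillna(out["Age"].mean())
    # Keep only the first letter of the cabin as the deck; nulls stay null.
    out["Deck"] = out["Cabin"].str[0]
    out = out.drop(columns="Cabin")
    # One-hot encode; dummy_na=True sends nulls to Deck_nan / Embarked_nan.
    out = pd.get_dummies(out, columns=["Deck", "Embarked"], dummy_na=True)
    # Sex is one-hot encoded without a separate null column.
    out = pd.get_dummies(out, columns=["Sex"])
    return out


cleaned = clean(raw)
print(cleaned.columns.tolist())
```

Printing the column list is a quick way to confirm that the Deck_nan and Embarked_nan columns exist before moving on to modeling.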
The Jupyter Notebook classification_activity.ipynb contains a template for the code you will run. Open the template in Jupyter and write the code as guided within the template. The objective is to predict the survival of passengers based on the available features.
Then answer the following questions.
In the correlation visualization, select the two features below that have the most significant correlation to the target feature, Survived. Careful, there are several correct answers.
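If the visualization is hard to read, one possible approach is to rank the features numerically by the absolute value of their correlation with the target. The sketch below uses a small made-up DataFrame, so its numbers say nothing about the real data; only the column name Survived comes from the exercise, and the other columns are assumptions for illustration.

```python
import pandas as pd

# Made-up toy values; replace with the DataFrame loaded in the notebook.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass": [3, 1, 2, 3, 1, 3],
    "Age": [22.0, 38.0, 26.0, 35.0, 54.0, 2.0],
    "Sex_female": [0, 1, 1, 0, 1, 0],
})

# Pearson correlation of every feature with the target column.
corr_with_target = df.corr()["Survived"].drop("Survived")

# Strength is judged by absolute value: a strongly negative
# correlation is as informative as a strongly positive one.
ranked = corr_with_target.abs().sort_values(ascending=False)
print(ranked)
```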
Which feature should be selected for the target?
After scaling with the MinMaxScaler, which of the following are correct statements about the data? Careful, there are several correct answers.
All features have a mean of 0.5.
All features have a min of 0.
All features have a max of 1.
The features are sorted from lowest to highest importance.
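To check statements like these yourself, it can help to apply min-max scaling to a small array and inspect the summary statistics. The sketch below applies the formula (x - min) / (max - min) per column with NumPy, which is what scikit-learn's MinMaxScaler computes with its default feature range of (0, 1). The array values are made up for illustration.

```python
import numpy as np

# Made-up data: two features with very different ranges and skew.
X = np.array([
    [1.0, 10.0],
    [2.0, 10.5],
    [3.0, 40.0],
    [10.0, 11.0],
])

# Column-wise min-max scaling to the range [0, 1].
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_scaled = (X - col_min) / (col_max - col_min)

# Inspect the per-feature statistics after scaling.
print("min: ", X_scaled.min(axis=0))
print("max: ", X_scaled.max(axis=0))
print("mean:", X_scaled.mean(axis=0))
```

In the notebook, the same check can be run on the scaled Titanic features, for example by calling describe() on a DataFrame built from the scaler's output.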