• 10 hours
  • Medium

Last updated on 4/24/20

Reduce Dimensions in your Data Using Principal Component Analysis

Evaluated skills

  • Carry out a principal component analysis

Description

For questions 1 to 6, you will carry out a principal component analysis (PCA) on the wine quality dataset.

Before we dive in, let's import the libraries we need using the following code:

import pandas as pd
import numpy as np

# Local helper module (functions.py), expected in the working directory
from functions import *

Now, let's load the data into a pandas DataFrame called original_data:

# Load the red wine quality data
original_data = pd.read_csv('winequality-red.csv')
# Preview the first five rows
original_data.head()
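Before answering the questions, it can help to inspect column types and missing values. The sketch below runs on a small hypothetical DataFrame named `sample` (illustrative values for three of the wine columns, an invented text column `note`, and two deliberately missing entries) so it works without the CSV file; on the real data, the same calls would be made on `original_data`. Dropping incomplete rows with `dropna()` is only one possible cleaning choice, shown here as an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for original_data: illustrative values for three of
# the wine columns, plus an invented text column and two missing values
sample = pd.DataFrame({
    "fixed acidity": [7.4, np.nan, 11.2],
    "volatile acidity": [0.70, 0.88, np.nan],
    "citric acid": [0.00, 0.00, 0.56],
    "note": ["a", "b", "c"],  # hypothetical qualitative column
})

# Data type of each column: float/int columns are quantitative
print(sample.dtypes)

# Keep only the numeric (quantitative) columns
quantitative = sample.select_dtypes(include="number")
print(list(quantitative.columns))

# Count missing values per column, then in total
nulls_per_column = sample.isnull().sum()
total_nulls = int(nulls_per_column.sum())
print(nulls_per_column)
print(total_nulls)

# One possible cleaning step (an assumption): drop rows with any missing value
cleaned = quantitative.dropna()
print(cleaned.shape)
```

Here `select_dtypes(include="number")` filters out the text column, and chaining `.sum()` twice turns per-column null counts into a single total.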

  • Question 1

    Which of the variables should you keep for your analysis? 

    Careful, there are several correct answers.
    • Only fixed acidity, volatile acidity and citric acid, because they are the most relevant to our analysis

    • All of them, because they are all quantitative variables

    • All of them, because they all look relevant to our analysis of wine

    • Only fixed acidity, volatile acidity and citric acid, because they are the only qualitative variables

  • Question 2

    How many nulls does our data contain? 

    • 1

    • 3

    • 0

    • 2

  • Question 3

    After cleaning and preparing the data, you feel ready to carry out a PCA, but a colleague recommends that you use the describe() method first. Why should you do this?

    Careful, there are several correct answers.
    • To check the range of values for the variables.

    • To decide which variables need to be normalized.

    • To evaluate the confidence interval of the variables.

    • To select the variables for the PCA.
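To see where describe() fits before a PCA, the sketch below builds a small hypothetical DataFrame `X` (illustrative values for three of the wine columns), prints the statistics that reveal differences in scale, standardizes each column, and then runs a minimal PCA through NumPy's singular value decomposition. The SVD route is used only to keep the example dependency-free; a library implementation such as scikit-learn's `PCA` would be the usual choice in practice.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the cleaned, quantitative wine data
# (illustrative values only; note the very different scales)
X = pd.DataFrame({
    "fixed acidity": [7.4, 7.8, 7.8, 11.2, 7.4, 7.9],
    "volatile acidity": [0.70, 0.88, 0.76, 0.28, 0.66, 0.60],
    "total sulfur dioxide": [34.0, 67.0, 54.0, 60.0, 40.0, 59.0],
})

# describe() reports count, mean, std, min, quartiles and max per column;
# comparing std, min and max shows whether the variables share a scale
summary = X.describe()
print(summary.loc[["mean", "std", "min", "max"]])

# Standardize each column to mean 0 and (population) standard deviation 1
# so that no variable dominates the PCA just because its values are larger
X_std = (X - X.mean()) / X.std(ddof=0)

# PCA via SVD of the centered, scaled data: X_std = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(X_std.to_numpy(), full_matrices=False)
components = Vt                          # each row is a principal axis
explained_variance_ratio = s**2 / np.sum(s**2)
projected = X_std.to_numpy() @ Vt.T      # coordinates on the principal axes

print(explained_variance_ratio.round(3))
```

In this toy data, `total sulfur dioxide` has a standard deviation several times larger than the other columns, so skipping the standardization step would let it dominate the first component.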
