• 20 hours
  • Medium

Last updated on 5/27/20

Test Your Knowledge on Linearity, Correlation, and Hypothesis Testing

Evaluated skills

  • Understand the Fundamentals of Statistical Modeling

Description

In this exercise, you will analyze a new dataset for linearity and correlation, test whether the means of certain categories differ significantly, and check whether some variables follow a normal distribution.

The dataset is the bike sharing dataset available from the UCI repository. It contains both daily and hourly data. We will work with the day.csv file, which has 731 samples and 16 variables.

We will focus on the following five attributes:

  • Season: Season (1:spring, 2:summer, 3:fall, 4:winter)
  • Temp: Normalized temperature in Celsius. 
  • Hum: Normalized humidity. 
  • Wind speed: Normalized wind speed. 
  • CNT: Count of total rental bikes. 

You can load the dataset with:

import pandas as pd
df = pd.read_csv('day.csv')

Then drop the non-essential columns by keeping only these five:

df = df[['season', 'temp', 'hum', 'windspeed', 'cnt']]
  • Question 1

    Draw the scatter plots of the variables (use sns.pairplot(df) from the Seaborn library).

    Looking at the scatter plots, which relationship looks the most linear?

    • hum vs. temp

    • cnt vs. temp

    • wind speed vs. hum 

    • All of the above.

       

  • Question 2

    Calculate the correlation of the different variables using the Pearson method.

    Which two variables are both negatively correlated with the number of users (cnt)?

    • season & temp

    • temp & hum

    • wind speed & hum

    • temp & wind speed
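
One way to compute the Pearson correlations is pandas' DataFrame.corr. The sketch below uses a random stand-in DataFrame named demo (hypothetical data, so the printed values say nothing about the real dataset); replace it with the df loaded from day.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data with the same column names as the exercise.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "season": rng.integers(1, 5, size=100),
    "temp": rng.random(100),
    "hum": rng.random(100),
    "windspeed": rng.random(100),
    "cnt": rng.integers(0, 9000, size=100),
})

# Full Pearson correlation matrix (symmetric, with 1.0 on the diagonal).
corr = demo.corr(method="pearson")
print(corr.round(2))

# Correlation of every other variable with cnt, sorted from most negative.
cnt_corr = corr["cnt"].drop("cnt").sort_values()
print(cnt_corr.round(2))
```

Negative entries in the cnt column mark variables that tend to decrease as the number of users increases.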

  • Question 3

    Consider the correlation of the wind speed versus the other variables.

    What can you conclude when there's more wind?

    Careful, there are several correct answers.
    • A slight decrease in the number of people biking. 

    • A very large decrease in the number of people biking.

    • Colder temperatures and less humidity.

    • A slight increase in the number of users.
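
To look at a single variable against all the others, DataFrame.corrwith is one option. This is a sketch on a random stand-in DataFrame named demo (hypothetical data, not the bike sharing values); use the df loaded from day.csv in its place.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data with the same column names as the exercise.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "season": rng.integers(1, 5, size=100),
    "temp": rng.random(100),
    "hum": rng.random(100),
    "windspeed": rng.random(100),
    "cnt": rng.integers(0, 9000, size=100),
})

# Pearson correlation of each column with windspeed (the default method).
wind_corr = demo.corrwith(demo["windspeed"]).drop("windspeed")
print(wind_corr.round(2))
```

The sign of each coefficient gives the direction of the relationship, and its absolute value gives the strength: values near 0 indicate a slight effect, values near 1 a strong one.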