Last updated on 6/23/22

# Test Your Knowledge on Building Linear Regression Models!

### Evaluated skills

• Build Linear Regression Models

### Description

In this quiz, we're going to build several models based on the bike-sharing dataset.

The bike-sharing dataset is available from the UCI repository.

The dataset has over 17k samples and 16 different variables. We are going to focus on the following five attributes:

• Season: season (1:spring, 2:summer, 3:fall, 4:winter).
• Temp: Normalized temperature in Celsius.
• Hum: Normalized humidity.
• Windspeed: Normalized wind speed.
• Cnt: count of total rental bikes.

You can load the dataset with:

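```python
import pandas as pd

# Assumption: the hourly file from the UCI archive has been downloaded
# locally as "hour.csv"; adjust the path to wherever your copy lives.
df = pd.read_csv("hour.csv")
```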

Remove the non-essential columns with:

```python
df = df[['season', 'temp', 'hum', 'windspeed', 'cnt']]
```

To do this quiz, you should first import the following packages:

```python
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf
import statsmodels.api as sm
import numpy as np
import pandas as pd
```

### Question 1

Let's find out which predictor is driving the usage of bikes (the cnt outcome variable). First, build the three univariate regression models:

1. cnt ~ temp
2. cnt ~ hum
3. cnt ~ windspeed
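
As a minimal sketch of how univariate fits like these can be compared, the snippet below uses made-up data rather than the bike-sharing dataset. The column names `x_strong`, `x_weak`, `x_noise`, and `y` are hypothetical and chosen only for illustration. It fits one model per predictor with `smf.ols` and collects `res.rsquared` for each.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (assumption: NOT the real bike-sharing dataset).
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "x_strong": rng.uniform(0, 1, n),
    "x_weak": rng.uniform(0, 1, n),
    "x_noise": rng.uniform(0, 1, n),
})
# Made-up relationship: y depends strongly on x_strong, weakly on x_weak,
# and not at all on x_noise.
demo["y"] = 200 * demo["x_strong"] - 50 * demo["x_weak"] + rng.normal(0, 30, n)

# Fit one univariate model per predictor and store its R-squared.
r_squared = {}
for variable in ["x_strong", "x_weak", "x_noise"]:
    res = smf.ols("y ~ {}".format(variable), data=demo).fit()
    r_squared[variable] = res.rsquared

for variable, value in r_squared.items():
    print("R^2 for {}: {:.2f}".format(variable, value))

# The predictor with the highest R-squared explains the most variability.
best = max(r_squared, key=r_squared.get)
print("Highest R^2:", best)
```

The same pattern should carry over to the quiz by replacing `demo` with `df`, `y` with `cnt`, and the hypothetical predictors with `temp`, `hum`, and `windspeed`.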

Looking at the R-squared metric, which variable explains the most variability in the outcome cnt?

• Temp

• Hum

• Wind speed

### Question 2

Look at the influence of each predictor for the different seasons.
The seasons are defined as:

```python
seasons = {1:'spring', 2:'summer', 3:'fall', 4:'winter'}
```

You can limit the regression to a specific season, for instance, spring, with the following line:

```python
res = smf.ols(formula, data = df[df.season == 1]).fit()
```

Looking at the R-squared for each season and univariate model:

• cnt ~ temp
• cnt ~ hum
• cnt ~ windspeed

Which of the following assertions is true?

Feel free to loop over the seasons and the predictors with the following code:

```python
seasons = {1:'spring', 2:'summer', 3:'fall', 4:'winter'}

for season in range(1,5):
    print("--"* 20)
    print("season {}".format(seasons[season]))
    for variable in ['temp', 'hum', 'windspeed']:
        formula = "cnt ~ {}".format(variable)
        res = smf.ols(formula, data = df[df.season == season]).fit()
        print("- R^2 for {}: {:.2f}".format(variable, res.rsquared))
```
• Temp has the most influence in spring.

• Humidity is the most important factor in the fall.

• Wind speed always has very little influence on the usage.

• All of the above.

### Question 3

When looking at the data over all four seasons, all p-values are well below 0.05, so all three predictors are statistically significant.
However, when the regression is restricted to a specific season, some predictors are no longer significant.

Consider the p-values of the predictors in each univariate model for each season:

• cnt ~ temp
• cnt ~ hum
• cnt ~ windspeed
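
One way to read p-values per subgroup is sketched below on made-up data, not the bike-sharing dataset. The columns `group`, `x`, and `y` are hypothetical, and `x` is constructed to affect `y` in group 1 only. The sketch reads the predictor's p-value from `res.pvalues` and compares it with the 0.05 threshold used above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (assumption: NOT the real bike-sharing dataset).
rng = np.random.default_rng(1)
n = 400
toy = pd.DataFrame({
    "group": rng.integers(1, 3, n),  # two hypothetical groups: 1 and 2
    "x": rng.uniform(0, 1, n),
})
# Made-up effect: x drives y in group 1 only.
slope = np.where(toy["group"] == 1, 100.0, 0.0)
toy["y"] = slope * toy["x"] + rng.normal(0, 10, n)

# Fit the same univariate model on each subgroup and keep the p-value of x.
p_values = {}
for group in [1, 2]:
    subset = toy[toy["group"] == group]
    res = smf.ols("y ~ x", data=subset).fit()
    p_values[group] = res.pvalues["x"]
    verdict = "significant" if p_values[group] < 0.05 else "not significant"
    print("group {}: p-value for x = {:.3g} ({})".format(
        group, p_values[group], verdict))
```

For the quiz, the same lookup would presumably be `res.pvalues[variable]` inside a loop over seasons and predictors. Keep in mind that with a 0.05 threshold, a predictor with no real effect still falls below the threshold about 5% of the time.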

Which assertion is true?

• Humidity is never significant.

• Temperature is not significant in the fall.

• Wind speed is always a significant factor.

• None of the predictors are significant in winter.
