12 hours
- Medium
Last updated on 6/23/22
Test Your Knowledge on Building Linear Regression Models!
Evaluated skills
- Build Linear Regression Models
Description
In this quiz, we're going to build several models based on the bike-sharing dataset.
The bike-sharing dataset is available from the UCI repository.
The dataset has over 17k samples and 16 different variables. We are going to focus on the following five attributes:
- Season: season (1:spring, 2:summer, 3:fall, 4:winter).
- Temp: Normalized temperature in Celsius.
- Hum: Normalized humidity.
- Wind speed: Normalized wind speed.
- Cnt: count of total rental bikes.
You can load the dataset with:
import pandas as pd
df = pd.read_csv('bike_sharing_day.csv')
Remove the non-essential columns with:
df = df[['season', 'temp','hum','windspeed','cnt']]
To do this quiz, you should first import the following packages:
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf
import statsmodels.api as sm
import numpy as np
import pandas as pd
Question 1
Let's find out which predictor is driving the usage of bikes (the cnt outcome variable). First, build the three univariate regression models:
- cnt ~ temp
- cnt ~ hum
- cnt ~ windspeed
Looking at the R-squared metric, which variable explains the most variability in the outcome cnt?
Temp
Hum
Wind speed
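To compare the three univariate fits in Question 1, one option is a small helper like the one below. It is a minimal sketch rather than the course's reference solution: the function name `rsquared_by_predictor` is hypothetical, and it assumes `df` has already been loaded and trimmed to the five columns as shown earlier.

```python
import pandas as pd
import statsmodels.formula.api as smf


def rsquared_by_predictor(data, outcome="cnt", predictors=("temp", "hum", "windspeed")):
    """Fit one univariate OLS model per predictor and collect each R-squared.

    Hypothetical helper for illustration; `data` is assumed to be a DataFrame
    that contains the outcome column and every predictor column.
    """
    scores = {}
    for variable in predictors:
        formula = "{} ~ {}".format(outcome, variable)
        res = smf.ols(formula, data=data).fit()
        scores[variable] = res.rsquared
    # Sort so the predictor that explains the most variance comes first.
    return pd.Series(scores).sort_values(ascending=False)
```

With the bike-sharing DataFrame, `print(rsquared_by_predictor(df))` would list the three predictors from highest to lowest R-squared.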
Question 2
Look at the influence of each predictor for the different seasons.
The seasons are defined as:
seasons = {1:'spring', 2:'summer', 3:'fall', 4:'winter'}
You can limit the regression to a specific season, for instance, spring, with the following line:
res = smf.ols(formula, data = df[df.season == 1]).fit()
Looking at the R-squared for each season and univariate model:
- cnt ~ temp
- cnt ~ hum
- cnt ~ windspeed
Which of the following assertions is true?
Don't hesitate to loop over the seasons and the predictors with the following code:
seasons = {1:'spring', 2:'summer', 3:'fall', 4:'winter'}
for season in range(1,5):
    print("--"* 20)
    print("season {}".format(seasons[season]))
    for variable in ['temp', 'hum', 'windspeed']:
        formula = "cnt ~ {}".format(variable)
        res = smf.ols(formula, data = df[df.season == season]).fit()
        print("- R^2 for {}: {:.2f}".format(variable, res.rsquared))
Temp has the most influence in spring.
Humidity is the most important factor in the fall.
Wind speed always has very little influence on the usage.
All of the above.
Question 3
When looking at the data over all four seasons, all p-values are well below 0.05, and the three predictors are relevant.
However, when selecting a specific season, some predictors are no longer significant.
Consider the p-values of the predictors in each univariate model for each season:
- cnt ~ temp
- cnt ~ hum
- cnt ~ windspeed
Which assertion is true?
Humidity is never significant.
Temperature is not significant in the fall.
Wind speed is always a significant factor.
None of the predictors are significant in winter.
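The per-season p-values for Question 3 can be gathered into one table, as in the sketch below. The helper name `pvalues_by_season` is hypothetical and not part of the course material. It assumes `df` has a `season` column coded 1 to 4 and takes the `seasons` dictionary defined in Question 2.

```python
import pandas as pd
import statsmodels.formula.api as smf


def pvalues_by_season(data, seasons, predictors=("temp", "hum", "windspeed"), outcome="cnt"):
    """Return a season-by-predictor table of slope p-values.

    Hypothetical helper for illustration: fits one univariate OLS model
    per (season, predictor) pair on the rows belonging to that season.
    """
    rows = {}
    for code, name in seasons.items():
        subset = data[data.season == code]
        rows[name] = {}
        for variable in predictors:
            res = smf.ols("{} ~ {}".format(outcome, variable), data=subset).fit()
            # res.pvalues is indexed by term name; keep the predictor's slope only.
            rows[name][variable] = res.pvalues[variable]
    return pd.DataFrame.from_dict(rows, orient="index")
```

Calling `pvalues_by_season(df, seasons) < 0.05` would then give a True/False table showing which predictors pass the usual 0.05 threshold in each season.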