- 20 hours
- Medium

Last updated on 5/27/20

# Test Your Knowledge on Building Linear Regression Models!

### Evaluated skills

- Build Linear Regression Models

### Description

In this quiz, we're going to build several models based on the bike-sharing dataset.

The bike-sharing dataset is available from the UCI Machine Learning Repository.

The full dataset has over 17k hourly samples and 16 variables; the daily file loaded below aggregates them into 731 daily records. We are going to focus on the following five attributes:

- **Season** (`season`): 1: spring, 2: summer, 3: fall, 4: winter.
- **Temp** (`temp`): normalized temperature in Celsius.
- **Hum** (`hum`): normalized humidity.
- **Wind speed** (`windspeed`): normalized wind speed.
- **Cnt** (`cnt`): count of total rental bikes.

You can load the dataset with:

```
import pandas as pd
df = pd.read_csv('bike_sharing_day.csv')
```

Remove the non-essential columns with:

```
df = df[['season', 'temp','hum','windspeed','cnt']]
```

To do this quiz, you should first import the following packages:

```
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf
import statsmodels.api as sm
import numpy as np
import pandas as pd
```

### Question 1

Let's find out which predictor is driving the usage of bikes (the *cnt* outcome variable). First, build the three univariate regression models:

- cnt ~ temp
- cnt ~ hum
- cnt ~ windspeed
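A minimal sketch of the three fits, assuming `df` has been loaded and trimmed as above. So that the snippet runs on its own, it builds a small synthetic stand-in for `df`: the column names match, but the values are invented and say nothing about the quiz answer, so swap in the real DataFrame when working the question. The helper name `univariate_r2` is our own, not part of statsmodels.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def univariate_r2(data, outcome="cnt", predictors=("temp", "hum", "windspeed")):
    """Fit one OLS model per predictor (outcome ~ predictor) and return each R^2."""
    r2 = {}
    for variable in predictors:
        res = smf.ols(f"{outcome} ~ {variable}", data=data).fit()
        r2[variable] = res.rsquared
    return r2


# Synthetic stand-in for the bike-sharing columns (NOT the real data).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "season": rng.integers(1, 5, n),   # season codes 1..4
    "temp": rng.uniform(0, 1, n),
    "hum": rng.uniform(0, 1, n),
    "windspeed": rng.uniform(0, 0.5, n),
})
# In this fake data, cnt depends on temp only, plus noise.
df["cnt"] = 1000 + 6000 * df["temp"] + rng.normal(0, 500, n)

for variable, value in univariate_r2(df).items():
    print(f"- R^2 for {variable}: {value:.2f}")
```

With the real `df`, the predictor with the largest printed R^2 is the one that explains the most variability in *cnt*.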

Looking at the R-squared metric, which variable explains the most variability in the outcome *cnt*?

Temp

Hum

Wind speed
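As a reminder of what the R-squared metric measures: it is the share of the outcome's variance captured by the model, R^2 = 1 - SS_res / SS_tot. For a univariate OLS model with an intercept, it also equals the squared Pearson correlation between the predictor and the outcome, so the ranking in Question 1 can be cross-checked with correlations. The sketch below checks both identities against statsmodels' `rsquared` on synthetic data (invented values, not the bike-sharing dataset).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic predictor/outcome pair (NOT the real dataset).
rng = np.random.default_rng(1)
data = pd.DataFrame({"temp": rng.uniform(0, 1, 300)})
data["cnt"] = 1500 + 5000 * data["temp"] + rng.normal(0, 800, 300)

res = smf.ols("cnt ~ temp", data=data).fit()

# R^2 from its definition: 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum(res.resid ** 2)
ss_tot = np.sum((data["cnt"] - data["cnt"].mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# In simple regression with an intercept, R^2 equals the squared Pearson correlation.
r2_from_corr = data["temp"].corr(data["cnt"]) ** 2

print(f"statsmodels: {res.rsquared:.4f}")
print(f"definition:  {r2_manual:.4f}")
print(f"corr^2:      {r2_from_corr:.4f}")
```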

### Question 2

Look at the influence of each predictor for the different seasons.

The seasons are defined as:

```
seasons = {1: 'spring', 2: 'summer', 3: 'fall', 4: 'winter'}
```

You can limit the regression to a specific season, for instance *spring*, with the following line:

```
formula = 'cnt ~ temp'  # any of the three univariate formulas
res = smf.ols(formula, data=df[df.season == 1]).fit()
```

Look at the R-squared for each season and each univariate model:

- cnt ~ temp
- cnt ~ hum
- cnt ~ windspeed

Which of the following assertions is true?

*Don't hesitate to loop over the seasons and the predictors with the following code:*

```
seasons = {1: 'spring', 2: 'summer', 3: 'fall', 4: 'winter'}
for season in range(1, 5):
    print("--" * 20)
    print("season {}".format(seasons[season]))
    for variable in ['temp', 'hum', 'windspeed']:
        formula = "cnt ~ {}".format(variable)
        res = smf.ols(formula, data=df[df.season == season]).fit()
        print("- R^2 for {}: {:.2f}".format(variable, res.rsquared))
```

Temp has the most influence in spring.

Humidity is the most important factor in the fall.

Wind speed always has very little influence on the usage.

All of the above.
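The suggested loop prints one line per fit. A variant, sketched below under the same assumptions, stores the R-squared values in a table with one row per season and one column per predictor, so the seasons can be compared side by side, and `idxmax` picks the predictor with the highest R^2 in each season. The helper `r2_by_season` is hypothetical, and the data is again a synthetic stand-in whose numbers are not the quiz answers.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

SEASONS = {1: "spring", 2: "summer", 3: "fall", 4: "winter"}
PREDICTORS = ["temp", "hum", "windspeed"]


def r2_by_season(data):
    """Return R^2 values: one row per season, one column per predictor."""
    rows = {}
    for code, name in SEASONS.items():
        subset = data[data.season == code]  # restrict the fit to one season
        rows[name] = {
            variable: smf.ols(f"cnt ~ {variable}", data=subset).fit().rsquared
            for variable in PREDICTORS
        }
    return pd.DataFrame.from_dict(rows, orient="index")[PREDICTORS]


# Synthetic stand-in (NOT the real bike-sharing data).
rng = np.random.default_rng(2)
n = 400
df = pd.DataFrame({
    "season": rng.integers(1, 5, n),
    "temp": rng.uniform(0, 1, n),
    "hum": rng.uniform(0, 1, n),
    "windspeed": rng.uniform(0, 0.5, n),
})
df["cnt"] = 1000 + 6000 * df["temp"] + rng.normal(0, 500, n)

table = r2_by_season(df)
print(table.round(2))
print(table.idxmax(axis=1))  # predictor with the highest R^2 in each season
```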

### Question 3

When the models are fit on the data from all four seasons, all p-values are well below 0.05, so all three predictors are significant.

However, when selecting a specific season, some predictors are no longer significant.

Consider the p-values of the predictors in each univariate model for each season:

- cnt ~ temp
- cnt ~ hum
- cnt ~ windspeed
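A sketch for reading those p-values, assuming the same `df`. In statsmodels, `res.pvalues` is a pandas Series indexed by term name (`Intercept` plus the predictor), so the slope's p-value is `res.pvalues[variable]`. The hypothetical helper `slope_pvalues` collects it for every season and predictor, and comparing the result with the 0.05 threshold gives a table of which predictors are significant. Synthetic data stands in for the real file, so the printed values are not the quiz answers.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

SEASONS = {1: "spring", 2: "summer", 3: "fall", 4: "winter"}
PREDICTORS = ["temp", "hum", "windspeed"]
ALPHA = 0.05  # significance threshold used in the question


def slope_pvalues(data):
    """p-value of the slope in each univariate model cnt ~ predictor, per season."""
    table = pd.DataFrame(index=list(SEASONS.values()), columns=PREDICTORS, dtype=float)
    for code, name in SEASONS.items():
        subset = data[data.season == code]
        for variable in PREDICTORS:
            res = smf.ols(f"cnt ~ {variable}", data=subset).fit()
            # res.pvalues holds one entry per term: 'Intercept' and the predictor.
            table.loc[name, variable] = res.pvalues[variable]
    return table


# Synthetic stand-in (NOT the real bike-sharing data).
rng = np.random.default_rng(3)
n = 400
df = pd.DataFrame({
    "season": rng.integers(1, 5, n),
    "temp": rng.uniform(0, 1, n),
    "hum": rng.uniform(0, 1, n),
    "windspeed": rng.uniform(0, 0.5, n),
})
df["cnt"] = 1000 + 6000 * df["temp"] + rng.normal(0, 500, n)

pvals = slope_pvalues(df)
print(pvals.round(3))
print(pvals < ALPHA)  # True where the predictor is significant at the 5% level
```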

Which assertion is true?

Humidity is never significant.

Temperature is not significant in the fall.

Wind speed is always a significant factor.

None of the predictors are significant in winter.