In the previous chapter, we looked at how a classification task (predicting high or low happiness) can be carried out by building a decision tree. But decision trees are not the only algorithm you can use for classification! In this chapter, we will look at another algorithm, logistic regression.
But First - Linear Regression
Before we dive into logistic regression, let’s look at its cousin, linear regression. We will be taking a deeper look at linear regression in the next part of this course. But for now, you need to know just enough to help you understand logistic regression.
Take a look at the following scatter plot showing income against happiness for 100 countries.
The mean income per person is measured in thousands of dollars. Happiness is measured by a survey, with each country receiving a score between 0 and 10, where 0 is deeply unhappy and 10 is incredibly happy.
We can fit a line to this data:
You will see how this line is fit when we look at regression in the next part of this course. You may be familiar with the general idea of fitting a line to a plot of sample points. You may have done this when analyzing the results of a science experiment at school!
The fitted line gives a function you can use to predict happiness from income levels. For example, a country with a mean income of $70,000 per person has a predicted happiness rating of 5.9:
We've built a linear regression model!
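A minimal sketch of this idea in Python, using scikit-learn. The income and happiness numbers here are illustrative, made-up values, not the chapter's actual 100-country dataset, so the fitted line will only roughly match the one in the plot:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data: income (thousands of dollars) vs. happiness (0-10).
# These are made-up points chosen to loosely resemble the chapter's plot.
income = np.array([[20], [40], [55], [70], [90], [110]])
happiness = np.array([2.1, 3.8, 4.7, 5.9, 7.2, 8.4])

# Fit a straight line through the points
model = LinearRegression()
model.fit(income, happiness)

# Predict happiness for a country with a mean income of $70,000
predicted = model.predict([[70]])
print(round(predicted[0], 1))  # close to the chapter's 5.9
```

Note that scikit-learn expects the features as a 2D array (one row per sample, one column per feature), which is why the single income value is wrapped as `[[70]]`.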
Binomial Logistic Regression
Instead of a happiness value between 0 and 10, what if we measured countries just with high and low happiness, as in the example in Part 1 of this course?
The plot would look more like this:
The machine learning task now is to use income to predict high or low happiness. But the relationship is not linear, so you can't just fit a line like this:
Consider three areas of the chart: one where countries with income < $40k have low happiness, another where countries with income > $90k have high happiness, and a middle region, between those incomes, where the results are less clear-cut:
A straight line can't model this relationship, but what if you could do this:
This is exactly what logistic regression does!
The curve is called a sigmoid curve, because it is S-shaped.
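The standard sigmoid (logistic) function is defined as σ(z) = 1 / (1 + e⁻ᶻ). A quick sketch shows the S shape numerically: it squashes any real number into the range (0, 1), which is what lets us read its output as a probability:

```python
import math

def sigmoid(z):
    """The logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Large negative inputs approach 0, large positive inputs approach 1,
# and an input of 0 maps to exactly 0.5 -- the middle of the S.
print(sigmoid(-6))  # close to 0
print(sigmoid(0))   # 0.5
print(sigmoid(6))   # close to 1
```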
You can use this curve to obtain a probability of a low or high classification. Here we have an 88% probability of high happiness for a country with a mean income of $75,000.
Now we can use a threshold of 50% to make a hard high or low prediction.
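Here is a hedged sketch of both steps, probability then hard prediction, using scikit-learn. The income figures and 0/1 labels are made-up illustrative data, not the chapter's dataset, so the exact probability will differ from the 88% in the example:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative data: income (thousands of dollars) and a binary label
# (0 = low happiness, 1 = high happiness). Made-up numbers.
income = np.array([[20], [30], [40], [55], [65], [80], [95], [110]])
label = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(income, label)

# predict_proba returns [P(low), P(high)] for each sample;
# predict applies the default 50% threshold to give a hard class.
prob_high = model.predict_proba([[75]])[0][1]
hard_pred = model.predict([[75]])[0]
print(prob_high, hard_pred)
```

With these labels, a mean income of $75,000 sits in the high-happiness region, so `predict_proba` returns a probability above 0.5 and `predict` returns the hard class 1.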
Multinomial Logistic Regression
What if there are more than two classes? For example, what if we split the countries into high, medium, and low happiness?
There are a few strategies for dealing with this. The one-vs-rest (OVR) algorithm trains a model for each class against the others. So you would train the following models:
High vs. medium and low
Medium vs. high and low
Low vs. high and medium
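The one-vs-rest strategy above can be sketched with scikit-learn's `OneVsRestClassifier`, which trains one binary logistic model per class behind the scenes. The three-class income data below is made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Illustrative data: income (thousands of dollars) and a three-way
# happiness label. Made-up numbers, not the chapter's dataset.
income = np.array([[20], [30], [40], [55], [65], [80], [95], [110]])
label = np.array(['low', 'low', 'low', 'medium',
                  'medium', 'medium', 'high', 'high'])

# One-vs-rest: one binary logistic regression model per class
ovr = OneVsRestClassifier(LogisticRegression())
ovr.fit(income, label)

print(len(ovr.estimators_))  # one fitted binary model per class -> 3
print(ovr.predict([[25]]))   # a low income should land in 'low'
```

To classify a new country, each of the three binary models scores it, and the class whose model is most confident wins.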
Logistic regression is a regression algorithm. Regression predicts continuous variables.
With logistic regression, the continuous variable is the probability of a categorical prediction.
Binomial Logistic Regression predicts one of two categories.
Multinomial Logistic Regression makes predictions when there are more than two categories.
In the next chapter, we will build and evaluate a classification model.