

Last updated on 3/4/22

Apply Classifier Models for Sentiment Analysis


Understand the Goal of Sentiment Analysis

Sentiment analysis is one of the most common classification applications of vectorization, so we've dedicated a whole chapter to this subject!

Sentiment analysis is mostly used to infer customers' views on a product or a service on social networks. You can also apply it to political discourse, digital sociology, or information extraction.

Consider the following statement taken from a movie review:

"This movie was amazing."

Rather easy to identify the positive word, right? Now, what about this one?

"That movie was amazingly bad."

Slightly more complex! Now what about this social media comment:

"It's not unattractive!"

Even more complex!

In some cases, sentiment analysis takes more than spotting positive and negative words: negations, idioms, slang, sarcasm, and sentence complexity all make the task harder.
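To see why, here is a deliberately naive scorer that sums the scores of the words it knows. It is a minimal sketch, not any library's method, and its four-word lexicon is hypothetical. It gets the first review right but misreads the other two.

```python
# Hypothetical toy lexicon: +1 for positive words, -1 for negative words
LEXICON = {"amazing": 1, "amazingly": 1, "bad": -1, "unattractive": -1}


def naive_score(text: str) -> int:
    """Sum the lexicon scores of the words in `text`, ignoring context."""
    words = text.lower().replace(".", " ").replace("!", " ").split()
    return sum(LEXICON.get(word, 0) for word in words)


for review in ["This movie was amazing.",
               "That movie was amazingly bad.",
               "It's not unattractive!"]:
    print(f"{review!r}: {naive_score(review)}")
```

The second review comes out neutral (0) and the third negative (-1): the scorer cannot see that "amazingly" intensifies "bad" or that "not" flips "unattractive".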

In this chapter, we will look at different ways to calculate a sentiment score for a piece of text.

Perform an Ontology-Based Sentiment Analysis

If you google "sentiment analysis in Python," you will at some point stumble upon the TextBlob library. It is a simple library that offers many NLP tasks right out of the box, such as tokenization, part-of-speech tagging, language detection, and sentiment analysis. Let's take it for a ride!

Install the library and download the data with:

pip install -U textblob
python -m textblob.download_corpora

Then score each sentence of an extract from Alice in Wonderland:

from textblob import TextBlob

text = "That was a narrow escape, Alice said. A good deal frightened at the sudden change. But very glad to find herself still in existence;"
blob = TextBlob(text)
# print each sentence with its polarity (-1 to 1) and subjectivity (0 to 1)
for sentence in blob.sentences:
    print(sentence.raw, sentence.sentiment.polarity, sentence.sentiment.subjectivity)

You get:

That was a narrow escape, Alice said. -0.2 
A good deal frightened at the sudden change. 0.35 
But very glad to find herself still in existence; 0.65 

TextBlob has correctly identified the first sentence as slightly negative (narrow, escape), and the third one as more positive (glad).

However, the polarity score of the second sentence (0.35) is off. It indicates a positive sentence, but Alice is saying that she is a good deal... frightened! Not such a positive statement after all.

What's happening here? Let's try to understand how TextBlob scores the sentence for polarity by looking at some variations of the initial sentence:

from textblob import TextBlob

def polarity(text):
    # polarity of the first (and only) sentence of the text
    polarity_score = TextBlob(text).sentences[0].sentiment.polarity
    print(f"{text} \t {polarity_score}")

# original sentence without 'sudden': positive
polarity("A good deal frightened at the change.")
# remove 'a good deal', you get neutral
polarity("Frightened at the change.")
# replace 'frightened' with a positive adjective
polarity("Happy at the change.")
# or use just the word 'very' instead of 'a good deal'
polarity("Very frightened at the change.")

You get:

A good deal frightened at the change.  0.7
Frightened at the change.              0.0
Happy at the change.                   0.8
Very frightened at the change.         0.2

TextBlob is looking at the polarity of the adjectives (with some additional rules to handle negations) to calculate the sentence's polarity. Although the word frightened is obviously negative, it is not being taken into account since TextBlob considers it a verb instead of an adjective!
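The scoring idea can be illustrated with a simplified sketch in plain Python. It is not TextBlob's actual code: the mini-lexicon is hypothetical (it reuses the 0.7 and 0.8 scores observed above for "good" and "happy" and deliberately leaves out "frightened"), and the ×1.3 intensifier and ×-0.5 negation factors are assumptions. Sentence polarity is the average score of the lexicon words found, and a modifier word changes the score of the next lexicon word.

```python
# Hypothetical mini-lexicon: "frightened" is deliberately missing
LEXICON = {"good": 0.7, "happy": 0.8, "bad": -0.7}
INTENSIFIERS = {"very": 1.3}    # assumed multiplier applied to the next scored word
NEGATIONS = {"not", "never"}
NEGATION_FACTOR = -0.5          # assumed: a negation flips and dampens the next score


def lexicon_polarity(sentence: str) -> float:
    """Average the scores of lexicon words, letting intensifiers and negations
    modify the next lexicon word (a simplified sketch, not TextBlob's code)."""
    words = sentence.lower().strip(".!?;").split()
    scores = []
    modifier = 1.0
    for word in words:
        if word in INTENSIFIERS:
            modifier *= INTENSIFIERS[word]
        elif word in NEGATIONS:
            modifier *= NEGATION_FACTOR
        elif word in LEXICON:
            # clip to [-1, 1], the range of TextBlob's polarity scores
            scores.append(max(-1.0, min(1.0, LEXICON[word] * modifier)))
            modifier = 1.0  # a modifier only applies to the next scored word
    return round(sum(scores) / len(scores), 2) if scores else 0.0


for s in ["A good deal frightened at the change.",
          "Frightened at the change.",
          "Happy at the change.",
          "Not happy at the change."]:
    print(f"{s} \t {lexicon_polarity(s)}")
```

Like TextBlob, the sketch rates "A good deal frightened at the change." as positive (0.7), because "good" is the only word it recognizes.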

So, where does TextBlob get the scores for adjectives? o_O

Scores come directly from a file that lists over 2,900 entries, mostly adjectives along with some verbs. This dictionary file is available on the TextBlob GitHub repository.

Let's look at the entries for the word amateur:

<word form="amateur" cornetto_synset_id="n_a-502167" wordnet_id="a-01870636" pos="JJ" sense="lacking professional skill or expertise" polarity="-0.5" subjectivity="0.5" intensity="1.0" confidence="0.9" />

<word form="amateur" cornetto_synset_id="n_a-525291" wordnet_id="a-01869634" pos="JJ" sense="engaged in as a pastime" polarity="0.0" subjectivity="0.0" intensity="1.0" confidence="0.9" />

You can see that the word amateur has two different meanings with different polarities: it is negative when referring to a lack of expertise and neutral when referring to a pastime.
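Here is a minimal sketch, using only Python's standard library, that parses the two entries above and groups the senses by word form. Wrapping them in a `<sentiment>` root element is an assumption made so the snippet parses on its own. Averaging the senses illustrates what any dictionary storing a single score per word has to do; it is an assumption for this sketch, not a description of how TextBlob resolves them.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The two "amateur" entries shown above, wrapped in an assumed root element
ENTRIES = """
<sentiment>
<word form="amateur" cornetto_synset_id="n_a-502167" wordnet_id="a-01870636" pos="JJ" sense="lacking professional skill or expertise" polarity="-0.5" subjectivity="0.5" intensity="1.0" confidence="0.9" />
<word form="amateur" cornetto_synset_id="n_a-525291" wordnet_id="a-01869634" pos="JJ" sense="engaged in as a pastime" polarity="0.0" subjectivity="0.0" intensity="1.0" confidence="0.9" />
</sentiment>
"""

root = ET.fromstring(ENTRIES.strip())
senses = defaultdict(list)
for entry in root.iter("word"):
    # each <word> element is one sense of a word form, with its own polarity
    senses[entry.get("form")].append((entry.get("sense"), float(entry.get("polarity"))))

for sense, polarity in senses["amateur"]:
    print(f"{sense!r}: {polarity}")

# One score per word form: average the senses, losing the distinction between them
lexicon = {form: sum(p for _, p in pairs) / len(pairs) for form, pairs in senses.items()}
print(lexicon["amateur"])  # -0.25
```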

You can also see two different IDs: wordnet_id and cornetto_synset_id. These IDs refer to two major dictionaries: WordNet (which we worked with previously) and Cornetto, a lexical resource for the Dutch language.

WordNet and Cornetto are dictionaries with extra information linking concepts (synsets) together. Such dictionaries, sometimes called ontologies, can be used to find:

  • Synonyms.

  • Hypernyms and hyponyms: a hyponym is a specific type of its hypernym. For example, a car (hyponym) is a type of vehicle (hypernym).

  • Meronyms (parts) and holonyms (wholes): a finger (meronym) is a part of a hand (holonym).
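To make these relations concrete, here is a toy, hand-built ontology stored as plain Python dictionaries. The data is hypothetical and does not reflect WordNet's actual structure or API; in practice, you would query WordNet itself, for instance through NLTK's `wordnet` corpus reader (`hypernyms()`, `part_meronyms()`, and `part_holonyms()` on a synset).

```python
# Hypothetical mini-ontology stored as plain dictionaries (not WordNet's data)
SYNONYMS = {"car": {"automobile", "auto"}}
HYPERNYMS = {"car": "motor_vehicle", "motor_vehicle": "vehicle"}  # hyponym -> hypernym
PART_OF = {"finger": "hand", "palm": "hand", "hand": "arm"}       # meronym -> holonym


def hypernym_chain(concept: str) -> list:
    """Follow 'is a type of' links upward until reaching a root concept."""
    chain = []
    while concept in HYPERNYMS:
        concept = HYPERNYMS[concept]
        chain.append(concept)
    return chain


def meronyms(whole: str) -> set:
    """Return the concepts recorded as direct parts of `whole`."""
    return {part for part, holonym in PART_OF.items() if holonym == whole}


print(sorted(SYNONYMS["car"]))    # ['auto', 'automobile']
print(hypernym_chain("car"))      # ['motor_vehicle', 'vehicle']
print(sorted(meronyms("hand")))   # ['finger', 'palm']
print(PART_OF["finger"])          # hand
```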

This information is fascinating. As a seasoned NLP practitioner, you should be aware that these resources exist. :)

Back to sentiment analysis with TextBlob.

As you've seen in the Alice in Wonderland extract, the dictionary-based approach used by TextBlob, where each word has a fixed negative or positive polarity, is too rough to give reliable results:

  • It cannot handle words that are not already in the dictionary (out-of-vocabulary or OOV).

  • It ignores context. For instance, "green beans" is scored as negative (-0.2) because green has a negative score in the TextBlob dictionary.

  • It's arbitrary. Why would green be negative, whereas blue is neutral?

  • It propagates bias (you can verify that on some of the entries in the TextBlob dictionary). 
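The first two limitations are easy to reproduce with a minimal lookup-and-average sketch. This is hypothetical code, not TextBlob's implementation, and it only knows the green and blue scores quoted above: out-of-vocabulary words contribute nothing, and "green" weighs the same whatever noun it describes.

```python
from typing import List, Optional

# Scores quoted above for the TextBlob dictionary; any other word is out-of-vocabulary
LEXICON = {"green": -0.2, "blue": 0.0}


def phrase_polarity(text: str) -> Optional[float]:
    """Average the scores of dictionary words; return None if every word is OOV."""
    known: List[float] = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(known) / len(known) if known else None


print(phrase_polarity("green beans"))      # -0.2: only "green" is known
print(phrase_polarity("green energy"))     # -0.2: same score, whatever the noun
print(phrase_polarity("delicious beans"))  # None: no word is in the dictionary
```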

Perform a Cloud-Based Sentiment Analysis

Since sentiment analysis is so widely used, multiple companies offer out-of-the-box sentiment analysis services, including AWS Comprehend, Microsoft Text Analytics, and Google NLP.

These services require you to register and get an API key. Once that's done, the process is to:

  • Send a request to the API with your text.

  • Get back a JSON response with information on your text, such as polarity, named entities, part-of-speech tags, or the detected language, depending on the endpoint you call.

Let's take a closer look at the Google Natural Language service, as an example.

You can send a request to the Google NLP API with the following code:

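import requests
import json

# REST endpoint of the Google Natural Language API (v1) for sentiment analysis
url = "https://language.googleapis.com/v1/documents:analyzeSentiment"
key = {"key": "<Your API KEY here>"}
data = {
    "document": {
        "type": "PLAIN_TEXT",
        "content": "Alice was very frightened.",
    }
}
results = requests.post(url, params=key, json=data)
content = json.loads(results.content.decode("utf-8"))
# document-level polarity score and magnitude
print(content["documentSentiment"])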

This returns:

{'magnitude': 0.8, 'score': -0.8}

The Google sentiment analysis returns a polarity score (-0.8), between -1.0 and 1.0, and a magnitude score (0.8), which measures the overall strength of the emotion expressed in the text, whether positive or negative.
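One way to turn the two numbers into a label is sketched below. The cut-off values are hypothetical and should be tuned on your own data. Per Google's documentation, the score ranges from -1.0 to 1.0 while the magnitude is not normalized and tends to grow with the length of the text, so a near-zero score with a high magnitude suggests mixed rather than neutral sentiment.

```python
def label_sentiment(score: float, magnitude: float,
                    score_threshold: float = 0.25,
                    magnitude_threshold: float = 1.0) -> str:
    """Map a (score, magnitude) pair to a coarse label.

    The default thresholds are hypothetical and should be tuned per application.
    """
    if score >= score_threshold:
        return "positive"
    if score <= -score_threshold:
        return "negative"
    # score close to 0: strong emotions that cancel out vs. little emotion at all
    return "mixed" if magnitude >= magnitude_threshold else "neutral"


print(label_sentiment(-0.8, 0.8))  # negative ("Alice was very frightened.")
print(label_sentiment(0.0, 4.0))   # mixed
print(label_sentiment(0.1, 0.0))   # neutral
```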

If you send it the original Alice in Wonderland extract, Google NLP returns a score for the whole document and a score for each sentence.

Document sentiment {'magnitude': 1.2, 'score': 0.3}
     - That was a narrow escape, Alice said. {'magnitude': 0.1, 'score': -0.1}
     - A good deal frightened at the sudden change. {'magnitude': 0.2, 'score': 0.2}
     - But very glad to find herself still in existence; {'magnitude': 0.8, 'score': 0.8}
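The response behind this printout holds a `documentSentiment` object plus a `sentences` list with one `sentiment` object per sentence. The sketch below parses a hand-written response of that shape, so it runs without an API key. The scores are copied from the printout above; the surrounding JSON is reconstructed for illustration, not a captured API response.

```python
import json

# Hand-written response in the shape returned by documents:analyzeSentiment;
# the scores are copied from the printout above.
RESPONSE = """
{
  "documentSentiment": {"magnitude": 1.2, "score": 0.3},
  "language": "en",
  "sentences": [
    {"text": {"content": "That was a narrow escape, Alice said."},
     "sentiment": {"magnitude": 0.1, "score": -0.1}},
    {"text": {"content": "A good deal frightened at the sudden change."},
     "sentiment": {"magnitude": 0.2, "score": 0.2}},
    {"text": {"content": "But very glad to find herself still in existence;"},
     "sentiment": {"magnitude": 0.8, "score": 0.8}}
  ]
}
"""


def sentence_scores(response_text: str) -> list:
    """Return (sentence, score, magnitude) tuples from an analyzeSentiment response."""
    response = json.loads(response_text)
    return [
        (s["text"]["content"], s["sentiment"]["score"], s["sentiment"]["magnitude"])
        for s in response.get("sentences", [])
    ]


print("Document sentiment", json.loads(RESPONSE)["documentSentiment"])
for text, score, magnitude in sentence_scores(RESPONSE):
    print(f"  - {text} score={score} magnitude={magnitude}")
```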

The analysis still struggles with the sentence "A good deal frightened," giving it a slightly positive score of 0.2. If you replace "A good deal" with a more neutral intensifier such as "Very" or "Quite," you get clearly negative sentiment scores:

     - Very frightened at the sudden change. {'magnitude': 0.7, 'score': -0.7}
     - Quite frightened at the sudden change. {'magnitude': 0.6, 'score': -0.6}

Nonetheless, using an off-the-shelf sentiment analysis service is a valid option that will often give more reliable results than a model trained in-house. If I had to implement a sentiment analysis service in production, I would probably use a cloud-based solution for its reliability and out-of-the-box implementation!

Let's Recap!

  • The goal of sentiment analysis is to classify an opinion as positive or negative.

  • It is one of the most widely used classification applications of vectorization.

  • Negations, idioms, slang, sarcasm, and sentence complexity confuse sentiment analysis.

  • You can calculate a sentiment score using dictionary-based sentiment analysis with the TextBlob library. It is easy yet quite restricted in its ability to extract sentiment from text.

  • The cloud-based sentiment analysis method is quite reliable but can become costly at scale.

This concludes the second part of the course! You've seen how to classify text by first vectorizing it and then applying a standard classifier model. In Part 3 of the course, you will learn about a powerful vectorization technique called word embeddings, which you will use primarily for exploration.
