• 10 hours
  • Difficult

This course is available for free online.

Updated on 15/12/2022

Apply Classifier Models for Sentiment Analysis

Understand the Goal of Sentiment Analysis

Sentiment analysis is one of the most common classification applications of vectorization, so we’ve dedicated a whole chapter to this subject!

Sentiment analysis is mostly used to infer customers’ views on a product or a service on social networks. However, you can also apply it to political discourse, digital sociology, or information extraction.

Consider the following statement taken from a movie review:

“This movie was amazing.”

Rather easy to identify the positive word, right? Now, what about this one?

“That movie was amazingly bad.”

Slightly more complex! Now, what about this social media comment:

“It’s not unattractive!”

Even more complex!

In some cases, sentiment analysis is more complicated than just looking at positive and negative words. Negations, idioms, slang, sarcasm, and sentence complexity can make it more difficult to understand the polarity of a piece of text.
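To make this concrete, here is a minimal sketch of the most naive approach: count the words that appear in a positive list and in a negative list. The two word lists are hypothetical and tiny, chosen only for this illustration.

```python
import re

# Tiny hypothetical word lists, for illustration only
POSITIVE = {"amazing", "amazingly", "attractive", "good", "glad"}
NEGATIVE = {"bad", "unattractive", "frightened"}


def naive_score(text: str) -> int:
    """Return (number of positive words) - (number of negative words)."""
    # lowercase the text and split it into words, keeping apostrophes
    words = re.findall(r"[a-z’']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


reviews = [
    "This movie was amazing.",
    "That movie was amazingly bad.",
    "It’s not unattractive!",
]
for review in reviews:
    print(f"{naive_score(review):+d}  {review}")
```

This prints +1, +0 and -1: the first review is scored correctly, the second comes out neutral because “amazingly” cancels out “bad”, and the third comes out negative because the counter has no notion of negation.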

In this chapter, we will look at two different ways to calculate a sentiment score for a piece of text.

Perform a Dictionary-Based Sentiment Analysis

If you google “sentiment analysis in Python,” you will stumble upon the TextBlob library at some point. It is a simple library that offers tools for many NLP tasks, such as tokenization, part-of-speech tagging, language detection, and sentiment analysis. Let’s take it for a ride!

Install the library and download the data with:

pip install -U textblob 
python -m textblob.download_corpora

As with spaCy, when you apply TextBlob to a text, the text is parsed, and you get a list of all the sentences in the text. For each sentence, you get its:

  • Sentiment, also known as its polarity: a float between -1.0 (negative) and 1.0 (positive). 

  • Subjectivity: a float between 0.0 (very objective) and 1.0 (very subjective). 

Let’s apply it to a text:

from textblob import TextBlob

text = '''That was a narrow escape, Alice said.
A good deal frightened at the sudden change.
But very glad to find herself still in existence;
'''

blob = TextBlob(text)
# For each sentence, print its sentiment (polarity) and subjectivity
for sentence in blob.sentences:
    print(sentence.raw, sentence.sentiment.polarity, sentence.sentiment.subjectivity)

You get:

That was a narrow escape, Alice said. -0.2 0.4 
A good deal frightened at the sudden change. 0.35 0.55 
But very glad to find herself still in existence; 0.65 1.0

TextBlob has correctly identified the first sentence as slightly negative (narrow, escape) and the third one as more positive (glad).

However, the second sentence’s polarity score (0.35) is off. The score indicates a positive sentence, but Alice is saying that she is a good deal... frightened! Not such a positive statement after all.

What’s happening here? Let’s try to understand how TextBlob scores the sentence for polarity by looking at some variations of the initial sentence:

def polarity(text):
    # score only the first sentence of the text
    polarity_score = TextBlob(text).sentences[0].sentiment.polarity
    print(f"{text} \t {polarity_score}")


# the original sentence (without 'sudden'), positive
polarity("A good deal frightened at the change.")
> A good deal frightened at the change. 	 0.7

# remove 'a good deal', you get neutral
polarity("Frightened at the change.")
> Frightened at the change. 	 0.0

# what if we change the adjective
polarity("Happy at the change.")
> Happy at the change. 	 0.8

# or instead add just the word 'very'
polarity("Very frightened at the change.")
> Very frightened at the change. 	 0.2

To score the sentiment of a sentence, TextBlob takes into account the adjectives and their polarity (with some additional rules to handle negations). Here, although the word “frightened” is obviously negative, it is ignored because TextBlob considers it a verb instead of an adjective!
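The general idea of a dictionary-based scorer can be sketched as follows: look up each word in a polarity lexicon, let intensifiers and negations modify the next known word, and average the scores found. This is a toy illustration, not TextBlob's implementation; the lexicon values, the 1.3 intensifier factor and the -0.5 negation factor are all hypothetical.

```python
import re

# Hypothetical toy lexicon and rules; NOT TextBlob's actual data or code
LEXICON = {"good": 0.7, "happy": 0.8, "glad": 0.5, "bad": -0.7}
INTENSIFIERS = {"very": 1.3, "really": 1.3}
NEGATIONS = {"not", "never"}


def lexicon_polarity(text: str) -> float:
    """Average the polarity of lexicon words, applying intensifiers and negations."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = []
    modifier = 1.0  # accumulated effect of modifiers seen since the last scored word
    for word in words:
        if word in NEGATIONS:
            modifier *= -0.5  # flip and dampen the next scored word
        elif word in INTENSIFIERS:
            modifier *= INTENSIFIERS[word]  # amplify the next scored word
        elif word in LEXICON:
            # clamp so that intensified scores stay within [-1, 1]
            scores.append(max(-1.0, min(1.0, LEXICON[word] * modifier)))
            modifier = 1.0
        # any other word (such as "frightened") is ignored
    return round(sum(scores) / len(scores), 2) if scores else 0.0


for sentence in [
    "A good deal frightened at the change.",
    "Frightened at the change.",
    "Very happy at the change.",
    "Not good at all.",
]:
    print(sentence, lexicon_polarity(sentence))
```

As in the TextBlob examples above, a word that the scoring rules do not cover, such as “frightened” here, contributes nothing, so the first sentence is judged on “good” alone (0.7) and the second gets 0.0.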

So, where does TextBlob get the scores for adjectives? 🤔

Scores come directly from a file that lists over 2,900 entries, mostly adjectives plus some verbs. This dictionary file is available in the TextBlob GitHub repository.

Let’s look at the entries for the word “amateur”:

<word form="amateur" cornetto_synset_id="n_a-502167" wordnet_id="a-01870636" pos="JJ" sense="lacking professional skill or expertise" polarity="-0.5" subjectivity="0.5" intensity="1.0" confidence="0.9" />

<word form="amateur" cornetto_synset_id="n_a-525291" wordnet_id="a-01869634" pos="JJ" sense="engaged in as a pastime" polarity="0.0" subjectivity="0.0" intensity="1.0" confidence="0.9" />

You can see that the word “amateur” has two different meanings with different polarities: it is negative when referring to a lack of expertise and neutral when referring to a pastime.
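Because the file is plain XML, you can inspect it with Python's standard library. The sketch below parses the two “amateur” entries shown above, trimmed to a few attributes and wrapped in a `<sentiment>` root element (an assumption made so that the snippet parses on its own), and lists each sense with its polarity.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The two "amateur" entries shown above, trimmed to a few attributes and wrapped
# in a root element (the <sentiment> wrapper is an assumption for this sketch)
LEXICON_XML = """<sentiment>
<word form="amateur" pos="JJ" sense="lacking professional skill or expertise" polarity="-0.5" subjectivity="0.5" />
<word form="amateur" pos="JJ" sense="engaged in as a pastime" polarity="0.0" subjectivity="0.0" />
</sentiment>"""

# Group every sense of a word form together with its polarity
senses = defaultdict(list)
for entry in ET.fromstring(LEXICON_XML).iter("word"):
    senses[entry.get("form")].append((entry.get("sense"), float(entry.get("polarity"))))

for sense, polarity in senses["amateur"]:
    print(f"{polarity:+.2f}  {sense}")

# A context-blind lookup must collapse the senses into one score, e.g. by averaging
polarities = [polarity for _, polarity in senses["amateur"]]
print("average:", sum(polarities) / len(polarities))
```

Averaging, used here only as one simple way to collapse the senses, gives -0.25 whichever meaning the writer intended: the lookup itself has no way to tell them apart.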

Let’s go back to sentiment analysis with TextBlob.

Remember the Alice in Wonderland extract? The dictionary-based approach used by TextBlob gives each word a fixed negative or positive polarity, which is too rough to give reliable results: 

  • It cannot handle words that are not already in the dictionary (out-of-vocabulary, or OOV, words).

  • It does not consider the word context. For instance, “green beans” is scored as negative (-0.2) because “green” has a negative score in the TextBlob dictionary. 

  • It’s arbitrary. Why would “green” be negative, whereas “blue” is neutral?

  • It propagates bias (you can verify that on some of the entries in the TextBlob dictionary). 

Perform a Cloud-Based Sentiment Analysis

Since sentiment analysis is so widely used, multiple companies offer out-of-the-box sentiment analysis services, such as Amazon Comprehend, Microsoft Text Analytics, and Google NLP.

These services require you to register and obtain credentials, such as an API key. Once that is done, the process is to:

  • Send a request to the API with your text.

  • Get back a JSON response with information on your text, such as its polarity, named entities, part-of-speech tags, or detected language, depending on the analysis you request.

Let’s take a closer look at the Google Natural Language service.

Once the client library is installed (pip install google-cloud-language), you can send a request to the Google NLP API with the following code:

# import the library and instantiate the client
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

# specify the content you want to score
document = {
    "content": "Alice was very frightened.",
    "type_": language_v1.Document.Type.PLAIN_TEXT,
    "language": "en",
}

# send your query
response = client.analyze_sentiment(
    request={
        "document": document,
        "encoding_type": language_v1.EncodingType.UTF8,
    }
)
# It returns the sentiment polarity (score) and magnitude
print("sentiment score:", response.document_sentiment.score)
print("sentiment magnitude:", response.document_sentiment.magnitude)

This returns:

sentiment score: -0.8 
sentiment magnitude: 0.8

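The Google sentiment analysis returns a polarity score (-0.8), which ranges from -1.0 to 1.0, and a magnitude (0.8), which measures the overall strength of the emotion in the text regardless of its sign. Unlike the score, the magnitude is not normalized and tends to grow with the length of the text.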
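Google's guidance on interpreting these values suggests that a near-zero score with a high magnitude points to mixed sentiment, while a near-zero score with a low magnitude points to neutral text. The helper below sketches that idea; the function name and both thresholds are hypothetical and would need tuning on your own data.

```python
# Hypothetical thresholds, not values defined by Google
SCORE_THRESHOLD = 0.25
MAGNITUDE_THRESHOLD = 1.0


def label_sentiment(score: float, magnitude: float) -> str:
    """Map a (score, magnitude) pair to a coarse label."""
    if score >= SCORE_THRESHOLD:
        return "positive"
    if score <= -SCORE_THRESHOLD:
        return "negative"
    # near-zero score: a high magnitude suggests positive and negative
    # statements cancelling out; a low magnitude suggests little emotion
    return "mixed" if magnitude >= MAGNITUDE_THRESHOLD else "neutral"


print(label_sentiment(-0.8, 0.8))  # "Alice was very frightened."
print(label_sentiment(0.2, 0.2))   # "A good deal frightened at the sudden change."
```

With these thresholds, the first call prints "negative" and the second "neutral".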

If you send it the original text, Google NLP will return a score for the whole document and a score for each sentence.

Document sentiment {'magnitude': 1.2, 'score': 0.3}
That was a narrow escape, Alice said. {'magnitude': 0.1, 'score': -0.1}
A good deal frightened at the sudden change. {'magnitude': 0.2, 'score': 0.2}
But very glad to find herself still in existence; {'magnitude': 0.8, 'score': 0.8}
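With the Python client, these per-sentence results are available as `response.sentences`, where each item has a `text.content` and a `sentiment` with `score` and `magnitude`; the REST API returns the same information as JSON. The sketch below reads such a JSON response. It is trimmed (character offsets are omitted, for instance) and filled with the values shown above, so treat it as an illustration of the structure rather than a verbatim API response.

```python
import json

# Trimmed analyzeSentiment-style JSON response, filled with the values shown above
RESPONSE_JSON = """{
  "documentSentiment": {"magnitude": 1.2, "score": 0.3},
  "language": "en",
  "sentences": [
    {"text": {"content": "That was a narrow escape, Alice said."},
     "sentiment": {"magnitude": 0.1, "score": -0.1}},
    {"text": {"content": "A good deal frightened at the sudden change."},
     "sentiment": {"magnitude": 0.2, "score": 0.2}},
    {"text": {"content": "But very glad to find herself still in existence;"},
     "sentiment": {"magnitude": 0.8, "score": 0.8}}
  ]
}"""

response = json.loads(RESPONSE_JSON)
print("Document sentiment", response["documentSentiment"])
for sentence in response["sentences"]:
    print(sentence["text"]["content"], sentence["sentiment"])

# Pick the sentence with the lowest (most negative) score
most_negative = min(response["sentences"], key=lambda s: s["sentiment"]["score"])
print("Most negative:", most_negative["text"]["content"])
```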

The analysis still struggles with the sentence “A good deal frightened,” giving it a slightly positive score of 0.2. If you replace “a good deal” with a more neutral modifier such as “very” or “quite,” you get clearly negative sentiment scores:

Very frightened at the sudden change. {'magnitude': 0.7, 'score': -0.7}
Quite frightened at the sudden change. {'magnitude': 0.6, 'score':

Nonetheless, using off-the-shelf sentiment analysis services is a valid option that, more often than not, will produce more reliable results than an in-house trained model. If I had to implement a sentiment analysis service in production, I would probably use a cloud-based solution for its reliability and ease of implementation!

Let’s Recap!

  • The goal of sentiment analysis is to classify an opinion as positive or negative.

  • It is one of the most widely used applications of vectorization for classification.

  • Negations, idioms, slang, sarcasm, and sentence complexity confuse sentiment analysis.

  • You can calculate a sentiment score using dictionary-based sentiment analysis with the TextBlob library. It is easy yet quite restricted in its ability to extract sentiment from text.

  • The cloud-based sentiment analysis method is reliable but, at scale, could become costly.  

This concludes the second part of the course! You’ve seen how to classify text by first vectorizing it and applying a standard classifier model. In Part 3 of the course, you will learn about a powerful vectorization technique called word embeddings.
