Introduction to Natural Language Processing

Natural Language Processing (NLP) allows us to classify, correct, predict, and even translate large quantities of text data. In this course, you will discover how to transform text into vectors for exploration and classification. We will explore bag-of-words, word embeddings, and sentiment analysis.

Hard

10 hours

Natural language processing, otherwise known as NLP, is the technology behind Siri, autocorrect, chatbots, and Google Translate. It’s what helps you translate text, filter spam, and detect fake news. In short, this technology allows a machine to understand and process human language.

But how does it work under the hood? How can you use NLP to transform human language into something a computer can understand? Look no further; this course has the answer!

In Part 1 of this course, we will explore how to preprocess text data and prepare it for further processing by a computer.
In Part 2, we will explore a text vectorization technique called bag-of-words and solve text classification problems such as sentiment analysis.
In Part 3, you will learn a more powerful vectorization technique called word embeddings and apply it to infer meaning from a text.
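As a rough preview of the first two parts, the sketch below lowercases and tokenizes two short sentences, then turns each one into a bag-of-words count vector. It uses only the Python standard library rather than the libraries the course introduces, and the function names `tokenize` and `bag_of_words` are illustrative choices, not names taken from the course.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, vectors) with one count vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    # The vocabulary is the sorted set of all tokens seen in any document.
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Counter returns 0 for words that do not appear in this document.
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


docs = ["I love this movie", "I hate this movie"]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)  # ['hate', 'i', 'love', 'movie', 'this']
print(vectors)     # [[0, 1, 1, 1, 1], [1, 1, 0, 1, 1]]
```

The two vectors differ only in the positions for "hate" and "love", which is the kind of signal a sentiment classifier can learn from.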

Once you complete this course, you will have a basic understanding of how NLP models work and how to use them in machine-learning projects. We will also introduce you to the spaCy 3.4, scikit-learn 1.1, and NLTK 3.7 libraries in Python 3.10.

Ready to dive into one of the most innovative domains in artificial intelligence? Then let’s get started!

Learning objectives

Preprocess Text Data
Vectorize Text for Classification Using Bag-of-Words
Vectorize Text for Exploration Using Word Embeddings

Prerequisites

To get the most out of this course, you must be familiar with Python 3.10 and be able to use Python libraries to manipulate data. You must also be familiar with basic linear algebra and statistics and with the main concepts behind machine learning, including training and scoring a model.

If you are unfamiliar with these concepts, take the following courses:

Required tools:

Python 3.10, including:

spaCy 3.4
NLTK 3.7
scikit-learn 1.1
pandas 1.5
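One possible way to check which of these libraries are already installed is sketched below. It relies on the standard-library `importlib.metadata` module; the distribution names in `PACKAGES` are assumed to match the names used on PyPI, and `installed_versions` is a hypothetical helper, not part of the course material.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as assumed to be published on PyPI.
PACKAGES = ["spacy", "nltk", "scikit-learn", "pandas"]


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

Comparing the printed versions with the list above shows whether your environment matches the one the course was written for.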

Table of contents

Part 1
Preprocess Text Data
Part 2
Vectorize Text for Classification Using Bag-of-Words
Part 3
Vectorize Text for Exploration Using Word Embeddings

Contributors

Teacher

Alexis Perrier

Author and teacher in data science, machine learning expert. Follow @alexip on Twitter.

Created by

OpenClassrooms

Updated on 23/01/2025
