Table of Contents

Part 1
Get Ready to Navigate the Data-Driven World
Part 2
Turn Data Into Information to Make It Useful
Part 3
Visualize Your Data
Part 4
Make Your Data Impactful

Explore the Concept of a Data Pipeline

Discover the Data Pipeline

By now, it is pretty clear what data is, but where does it come from? And where does it go?

In the data world, you'll often hear about data pipelines, which can be compared to a factory. For example, in a cookie factory, raw ingredients (flour, butter, and sugar) go in one end, and the product, cookies, comes out of the other. In the middle is a transformation process (i.e., a set of operations executed in the correct sequence):

A picture shows a data pipeline for the biscuit-making process. The raw ingredients sourced from the farms (butter, flour, and sugar) undergo mixing, baking, and packing to become the final product, biscuits, which are sent to their final destination: shops.

The factory has taken raw ingredients and added value, turning them into something more refined and useful to the everyday person. Likewise, as you become more data literate, you will be able to transform raw data into something useful, just like the factory workers do with flour, sugar, and butter. So let’s start looking at the magical transformation process for your data!

Consider the scenario of a newly health-conscious individual. Your friend Zara has bought a fitness tracker that can track her steps and the distance she's moved (whether by walking, running, cycling, etc.). She hopes this is the start of a journey to better all-around health and fitness. She’s even taken up cycling! Since you have expressed an interest in working with data, Zara has asked you to help her on this journey. She wants you to use your data skills to add value to her fitness data.

What data can I extract from the app to follow Zara’s fitness journey?

Data Sources and Raw Data

You can start by building the first stage of the data pipeline, which shows the data sources and the raw data you’re dealing with.

Let’s say the app records the number of steps and Zara’s resting heart rate.

A picture shows a smartphone screen with an open fitness tracker app. On the app dashboard, we can see information about Zara’s heart rate, number of steps per day, water consumption, sleep quality, length of training, and calories spent.

You can show this on the pipeline like this:

A picture shows the beginning of a data pipeline for Zara’s fitness tracker data. The data source is the fitness tracker app on Zara’s phone. The raw data coming from it are the number of steps and the resting heart rate.

Operations and Information

The next stage in the data pipeline shows the operations you can perform on this data to transform it into useful information. In this case, the app can produce a report that shows Zara’s achievements, such as weekly improvements or beating her previous best day.

A picture shows the continuation of the previous data pipeline for Zara’s fitness tracker data. The operation applied to both pieces of raw data is creating a fitness report in the phone app. The resulting information is fitness achievements.
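To make the idea of an operation more concrete, here is a minimal Python sketch of how raw step counts could be turned into achievements such as a weekly improvement or a new best day. The step counts and the function name `build_fitness_report` are made up for illustration; the course does not say how the app actually computes its report.

```python
# Hypothetical raw data: daily step counts for two consecutive weeks.
# The numbers are invented for illustration only.
last_week = [4200, 5100, 3900, 6000, 4800, 7000, 5500]
this_week = [5000, 5300, 4100, 6500, 5200, 8100, 6000]


def build_fitness_report(previous_steps, current_steps):
    """Operation: turn raw daily step counts into a small achievements report."""
    previous_total = sum(previous_steps)
    current_total = sum(current_steps)
    return {
        "weekly_total": current_total,
        # Positive when this week's total beats last week's total.
        "weekly_improvement": current_total - previous_total,
        # True when this week's best day beats last week's best day.
        "new_best_day": max(current_steps) > max(previous_steps),
    }


report = build_fitness_report(last_week, this_week)
print(report)
```

The raw lists on their own say little. The report answers the questions Zara actually cares about: did she improve, and did she beat her best day?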

Destination

Then you can see the destination (i.e., where this data might go next after processing). Zara might want to share her successes with her friends!

A picture shows the continuation of the previous data pipeline for Zara’s fitness tracker data. The destination of the information is sharing achievements with Zara’s friends.

Now you have a data pipeline for Zara’s fitness achievements. It’s like an information factory: raw data goes in one end, and refined, processed information comes out on the other.
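The three stages described so far (data source, operation, destination) can be sketched as three small Python functions chained together. This is only an illustration under stated assumptions: the function names, the sample values, and the idea of sharing through a text message are all hypothetical.

```python
def read_tracker_data():
    """Data source: stands in for the fitness tracker app (hypothetical values)."""
    return {
        "steps": [5000, 5300, 4100, 6500, 5200, 8100, 6000],
        "resting_heart_rate": [62, 61, 63, 60, 61, 59, 60],
    }


def create_report(raw_data):
    """Operation: summarize the raw data into information."""
    steps = raw_data["steps"]
    heart_rates = raw_data["resting_heart_rate"]
    return {
        "best_day_steps": max(steps),
        "average_resting_heart_rate": round(sum(heart_rates) / len(heart_rates), 1),
    }


def share_with_friends(report):
    """Destination: format the information as a message Zara could send."""
    return (
        f"Best day this week: {report['best_day_steps']} steps. "
        f"Average resting heart rate: {report['average_resting_heart_rate']} bpm."
    )


def run_pipeline():
    """Run the stages in order: source, then operation, then destination."""
    raw_data = read_tracker_data()
    report = create_report(raw_data)
    return share_with_friends(report)


message = run_pipeline()
print(message)
```

Each function only depends on the output of the previous stage, so a stage can be replaced (for example, a different destination) without touching the others.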

Adding More Data to the Pipeline

Can we add more ingredients to our information factory?

Sure! Let’s ask your health-conscious friend to start recording other details about her health, particularly any worrying symptoms she has.

Zara starts to record this data in a spreadsheet that looks like this:

Date	Symptoms
February 01, 2022
February 02, 2022	Headache
February 03, 2022	Fatigue
February 04, 2022
…	…

How would you add this new data source to the data pipeline above?

Hopefully, you came up with something like this, which shows the new spreadsheet as a data source and the symptom data coming from the spreadsheet above:

A picture shows the continuation of the previous data pipeline for Zara, but with a new data source: a health log. The raw data coming from it are Zara's symptoms.
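In code, adding a data source means adding another function that reads raw data. The sketch below reads the first four rows of the health log shown above, assuming the spreadsheet is exported as tab-separated text. The export format and the function name `read_health_log` are assumptions made for this example.

```python
import csv
import io

# The first four rows of Zara's health log as tab-separated text.
# In practice this would come from a file exported from her spreadsheet;
# that export format is an assumption.
HEALTH_LOG = (
    "Date\tSymptoms\n"
    "February 01, 2022\t\n"
    "February 02, 2022\tHeadache\n"
    "February 03, 2022\tFatigue\n"
    "February 04, 2022\t\n"
)


def read_health_log(text):
    """Data source: one record per day, with None when no symptom was logged."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {"date": row["Date"], "symptom": row["Symptoms"] or None}
        for row in reader
    ]


symptoms = read_health_log(HEALTH_LOG)
days_with_symptoms = [entry["date"] for entry in symptoms if entry["symptom"]]
print(days_with_symptoms)
```

The symptom records are still raw data at this point. They sit next to the step counts and heart rate in the pipeline, waiting for an operation that turns them into information.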

Of course, Zara will want to do something useful with this new data, which we will explore in the next chapter!

Let’s Recap!

In this chapter, you learned that:

A data pipeline is like an information factory. You add value to raw data to transform it into information.
The raw data comes from different data sources.
Operations transform data into information.
You can send the final information to different data destinations.

In this chapter, you saw the basics of turning data into low-value information. Now that you understand the process, you are ready to take it to the next level. In the next chapter, we will extract the maximum value out of data by tailoring the pipeline to specific objectives.
