• 8 hours
  • Easy

Last updated on 3/15/23

Design Your Data Journey and Destination

Explore Ways of Improving a Data Pipeline

Zara’s data pipeline from the previous chapter is a good way for her to understand what can be done with her data. But is there more? Can you:

  • Add value to create more useful information?

  • Drive some actionable insights?

  • Think about this process as a journey toward a destination where data informs decision-making?

Define a New Data-Driven Objective

At this stage, there are a few questions you can ask to clarify data processing objectives.

1. What eventual outcomes would Zara want to see after collecting and processing the data?
Think about actionable insights you want to draw from the information. Who will this information be shared with?

 

2. What questions must the data answer to generate the desired outcomes?
Write down a question you would like the data to answer.

 

3. What additional data must Zara gather to answer those questions?

The answer could be “none,” or it could come from existing or new sources.

 

Zara wants to work more effectively with her personal trainer by giving them access to her activity stats. A data-driven objective might look something like this:

1. What eventual outcomes would Zara want to see after collecting and processing the data?

Think about what actionable insights you want to draw from the information. Who will this information be shared with?

To work with her personal trainer to devise a fitness plan.

2. What questions must the data answer to generate the desired outcomes?

Write down a question you would like the data to answer.

Is Zara’s exercise plan improving her fitness?

3. What additional data must Zara gather to answer those questions?

The answer could be “none,” or it could come from existing or new sources.

Which days she cycled.

The above questions and answers help clarify your overall data-driven destination. You now want to use these answers to modify the data pipeline. Think about each question and answer in turn (this time working backward from the new data):

3. What additional data must Zara gather to answer those questions?

Which days she cycled.

This tells you what new data sources you need. Zara must record the days she goes cycling. She can do this in her health log:

A picture shows the data pipeline sourcing from Zara’s health log. The raw data extracted from it are the symptoms and which days she cycled.
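
As a minimal sketch of what this new raw data might look like, the Python snippet below models a few health log entries and extracts the days Zara cycled. The field names (`day`, `symptoms`, `cycled`) and the sample entries are hypothetical, chosen only for illustration; the course does not prescribe a format for the log.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class HealthLogEntry:
    """One day in Zara's health log (hypothetical structure)."""
    day: date
    symptoms: list[str] = field(default_factory=list)
    cycled: bool = False


# Made-up sample entries, for illustration only.
health_log = [
    HealthLogEntry(date(2023, 3, 6), symptoms=["headache"], cycled=True),
    HealthLogEntry(date(2023, 3, 7)),
    HealthLogEntry(date(2023, 3, 8), cycled=True),
]


def cycling_days(log: list[HealthLogEntry]) -> list[date]:
    """Return the days on which cycling was recorded."""
    return [entry.day for entry in log if entry.cycled]


print(cycling_days(health_log))
```

Keeping the cycling flag in the same entry as the symptoms means the health log stays a single source that feeds two kinds of raw data.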

2. What questions must the data answer to generate the desired outcomes?

Is Zara’s exercise plan improving her fitness?

This tells you what new information you need. To answer the question about Zara’s fitness, you can combine the number of steps and whether she cycled with her resting heart rate:

A picture shows the information rectangle that says “fitness report” (showing improvements in resting heart rate with exercise).

1. What eventual outcomes would Zara want to see after collecting and processing the data?

To work with her personal trainer to devise a fitness plan.

This tells you your destination and what actionable insights you hope to get:

A picture shows the destination rectangle that says “fitness plan”. Below it there’s an icon representing a person that is titled “fitness instructor”.

Reflect on Your Data Pipeline

Now that you have identified some specific data sources, information, and destinations, let's add them to the data pipeline. I’ve removed the previous operations, information, and destinations to keep the diagram simple. They are still there, but currently don’t support your specific objectives:

A picture shows the data pipeline for Zara without the Operations step.

How do we connect Zara’s raw data to the new information? 

You could perform various operations on the data (you will see some later in this course). But a handy one is to combine data from different sources. So let’s combine some raw data to create the fitness report:

A picture shows the completed version of the previous data pipeline for Zara with Combine in Operations.
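
To make the combine operation concrete, here is a hedged Python sketch that joins three sets of raw data by date to build the rows of a fitness report. The dictionaries, their values, and the `combine_by_day` helper are assumptions made for illustration, not part of Zara's actual pipeline.

```python
from datetime import date

# Hypothetical raw data, keyed by day. All values are made up.
steps = {date(2023, 3, 6): 8200, date(2023, 3, 7): 4100, date(2023, 3, 8): 9600}
resting_heart_rate = {date(2023, 3, 6): 64, date(2023, 3, 7): 66, date(2023, 3, 8): 63}
cycled_days = {date(2023, 3, 6), date(2023, 3, 8)}


def combine_by_day(steps, resting_heart_rate, cycled_days):
    """Combine the sources into one row per day that appears in both
    the steps data and the resting heart rate data."""
    common_days = sorted(steps.keys() & resting_heart_rate.keys())
    return [
        {
            "day": day,
            "steps": steps[day],
            "cycled": day in cycled_days,
            "resting_heart_rate": resting_heart_rate[day],
        }
        for day in common_days
    ]


fitness_report = combine_by_day(steps, resting_heart_rate, cycled_days)
for row in fitness_report:
    print(row)
```

Joining on the date keeps only the days where both measurements exist. A different rule, such as keeping days with missing values, would be an equally valid design choice depending on what Zara and her trainer want to see.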

Finally, you can add a data-driven action to the pipeline, even if you don’t know exactly what Zara and her trainer will come up with in their fitness plan.

A picture shows the completed version of the previous data pipeline for Zara

Your Turn!

 

You’ve improved the fitness aspect of Zara’s data pipeline. Let’s now pay attention to the health side of things. Zara is a bit concerned about her health, particularly some recurring symptoms. Also, she doesn’t always feel motivated and energetic for the day ahead. She is wondering if there is some underlying pattern.  

Define a new data-driven objective for Zara using the template below:

1. What are the eventual desired outcomes I would want to see after collecting and processing the data?
Think about what actionable insights you want to draw from the information. Who will this information be shared with?

 

2. What questions do I want the data to answer?
List a question you would like the data to answer.

 

3. What additional data do I need to answer those questions?

The answer could be “none,” or it could come from existing or new sources.

 

Then extend the data pipeline to address this new objective.

✅ Check your work: How did you do? You can download my answers here to compare with yours.

Let’s Recap!

In this chapter, you:

  • defined the objective as creating a data destination where decisions and actions are data-driven.

  • asked the right questions to clarify the data destination.

  • used the answers to the three questions posed at the beginning of this chapter to identify the changes needed to improve the data pipeline.

Congratulations on reaching the end of the first part of this course. I hope you are excited about the possibilities that data brings! In the next part, we will get hands-on with the data and turn it into useful information.
