• 6 hours
  • Easy

Last updated on 4/11/24

Meet the Challenges to Web Scraping

Ethical Concerns With Web Scraping

Web scraping is a super cool and powerful tool, but sometimes there are morally gray areas. 

You probably wouldn’t want people to scrape your personal data, right? Or collect information online about you or your friends? These are not just legal but moral issues. Whether or not there is a law in the country you’re in that forbids web scraping, it’s important to make sure you are scraping the web ethically. We’ll go over how to do just that.

Know Which Websites to Scrape

Not all websites have policies that permit scraping their data. To know if a company is OK with it, make sure to check their terms and conditions.

For many websites, you can simply add "/robots.txt" after the site’s root URL to see which parts of the site automated programs are asked to stay away from. Here’s an excerpt from Facebook’s robots.txt file, which very clearly states that scraping is prohibited:

#Notice: collection of data on Facebook by automated means is prohibited
The collection of data on Facebook is prohibited by the website.
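
The robots.txt check can also be done in code. Here is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt content, the user agent name "my-course-scraper", and the example.com URLs are all made up for illustration, so the example runs without contacting any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here so the example runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def can_scrape(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "my-course-scraper" is an assumed name for your own scraper.
print(can_scrape(SAMPLE_ROBOTS_TXT, "my-course-scraper", "https://example.com/articles/"))
print(can_scrape(SAMPLE_ROBOTS_TXT, "my-course-scraper", "https://example.com/private/data"))
```

Against a live site, you would instead call `set_url()` with the address of the site's robots.txt file and then `read()` before asking `can_fetch()`. Keep in mind that robots.txt does not replace the terms and conditions, so check both.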

Solutions to Ethical Problems

The following tips will help you avoid potential moral and ethical problems with data collection:

  • Use a public API if possible and available (learn more about APIs from my course Build Your Web Projects With REST APIs).

  • Provide a User-Agent string in your request headers that identifies you and your scraper, so the site owner knows who is collecting the data.

  • Always credit the owner of the data, and check the site’s terms for reuse, to reduce the risk of copyright issues.

  • Request data at a reasonable rate.
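
Two of the tips above, identifying yourself and requesting data at a reasonable rate, can be sketched with the standard library alone. The user agent string, the contact address, the two-second default delay, and the helper names `build_request` and `fetch_politely` are all assumptions made for this example, not fixed rules.

```python
import io
import time
import urllib.request

# Hypothetical identity: replace with your own project name and contact address.
USER_AGENT = "my-course-scraper/0.1 (contact: you@example.com)"


def build_request(url: str) -> urllib.request.Request:
    """Create a request whose User-Agent header identifies the scraper."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_politely(urls, delay_seconds=2.0, opener=urllib.request.urlopen):
    """Fetch each URL in turn, pausing between requests to limit the load."""
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay_seconds)  # wait before every request after the first
        with opener(build_request(url)) as response:
            pages.append(response.read())
    return pages


# Offline demonstration with a stand-in opener, so no real site is contacted.
fake_opener = lambda request: io.BytesIO(b"<html></html>")
demo_urls = ["https://example.com/a", "https://example.com/b"]
print(fetch_politely(demo_urls, delay_seconds=0.1, opener=fake_opener))
```

With the default `opener`, the same function would send real requests. What counts as a reasonable delay depends on the site, so treat the value here as a starting point only.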

Challenges With Web Scraping

Web scraping is a powerful and useful tool, but it can have its difficulties.

Variety of HTML Structures Across Websites

If you recall, to scrape data from the UK government site, we had to go into the page’s source code and look up its precise class names. If we tried to use the same code to extract data from another site, it wouldn’t work, because that site would use different tags and class names. We’d need a specific web scraper for each site.
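
To make this concrete, here is a small sketch that keeps one class name per site and extracts the matching text with Python's built-in `html.parser`. The site names, class names, and HTML snippets are invented for the example, and the extractor is deliberately simplified: it assumes well-formed HTML.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; ignored when tracking nesting.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element that carries a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0  # greater than 0 while inside a matching element
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1  # nested tag inside a matching element
        elif self.class_name in classes:
            self.depth = 1
            self.results.append("")

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.results[-1] += data


# Hypothetical per-site settings: each site needs its own class name.
SITE_CLASS_NAMES = {
    "site-a.example": "headline",
    "site-b.example": "article-title",
}


def extract_titles(site, html):
    """Extract titles from html using the class name configured for site."""
    extractor = ClassTextExtractor(SITE_CLASS_NAMES[site])
    extractor.feed(html)
    extractor.close()
    return [text.strip() for text in extractor.results]


page_a = '<div><h2 class="headline">First story</h2></div>'
page_b = '<ul><li class="article-title big">Second <b>story</b></li></ul>'

print(extract_titles("site-a.example", page_a))
print(extract_titles("site-b.example", page_b))
# Using site A's class name on site B's page finds nothing.
print(extract_titles("site-a.example", page_b))
```

The last call returns an empty list, which is exactly the failure described above: the code is valid, but the class name it looks for does not exist on the other site.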

Durability of Scripts

As you know, the internet is constantly changing and updating. While that is great for getting the most up-to-date information, web scraping scripts that rely on a particular HTML structure can quickly become outdated. If you use a script repeatedly over a long period, check regularly that it still works the way you intended.
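
One simple way to notice an outdated script is to make it fail loudly when a scrape returns fewer items than expected. The sketch below shows the idea; the names `ScraperOutdatedError` and `check_scrape` and the default threshold are hypothetical choices for this example.

```python
class ScraperOutdatedError(Exception):
    """Raised when a scrape returns fewer items than the script expects."""


def check_scrape(items, minimum=1, source="page"):
    """Return items unchanged, or raise if too few were found."""
    if len(items) < minimum:
        raise ScraperOutdatedError(
            f"Expected at least {minimum} item(s) from {source}, "
            f"got {len(items)}. The HTML structure may have changed."
        )
    return items


# A healthy scrape passes straight through.
titles = check_scrape(["First story", "Second story"], source="news page")
print(titles)

# An empty result, a common symptom of a changed page, raises an error.
try:
    check_scrape([], source="news page")
except ScraperOutdatedError as error:
    print(f"Scraper needs attention: {error}")
```

A check like this does not repair the scraper, but it turns a silent empty result into a visible error, so you know when to revisit the class names.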

Let’s Recap!

  • There are legitimate ethical and privacy concerns when scraping the web.

  • Never scrape anyone’s personal data, and never scrape a website that prohibits it.

  • Because the internet is constantly updating, web scraping scripts require maintenance.

Congratulations!

Well done on reaching the end of this course! Once you’ve completed the quiz, don’t hesitate to review the chapters and exercises that were more difficult.

If you’re feeling confident in your Python skills, consider moving on to our more advanced courses, such as Set Up a Python Environment.

From there, you can apply your skills in the field of data with the course Use Python Libraries for Data Science or in software programming with the course Learn Programming With Python.

Wherever you head next, I’ve enjoyed preparing this course for you and wish you the best of luck in your future projects! 😁
