Ethical Concerns With Web Scraping
Web scraping is a super cool and powerful tool, but sometimes there are morally gray areas.
You probably wouldn’t want people to scrape your personal data, right? Or collect information online about you or your friends? These are not just legal but moral issues. Whether or not there is a law in the country you’re in that forbids web scraping, it’s important to make sure you are scraping the web ethically. We’ll go over how to do just that.
Know Which Websites to Scrape
Not all websites have policies that permit scraping their data. To know if a company is OK with it, make sure to check their terms and conditions.
For many websites, you can simply add "/robots.txt" after the site's root URL to see which pages automated tools are asked to stay away from. Here’s a screenshot of Facebook’s robots.txt page, which very clearly states that scraping is prohibited:
[Screenshot: Facebook’s robots.txt page]
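You can also run this check from Python instead of reading the file by eye. Here is a minimal sketch using the standard library's urllib.robotparser. The robots.txt content, the "my-course-scraper" name, and the example.com URLs are all made up for illustration so the example runs offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption, not a real site's file),
# used so the example runs without a network connection.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "my-course-scraper" is a placeholder name for your own scraper.
print(is_allowed(SAMPLE_ROBOTS_TXT, "my-course-scraper",
                 "https://example.com/articles/1"))    # True
print(is_allowed(SAMPLE_ROBOTS_TXT, "my-course-scraper",
                 "https://example.com/private/data"))  # False
```

Against a live site, you would call the parser's set_url() with the address of the site's robots.txt and then read() to download it, instead of passing text to parse(). Keep in mind that robots.txt is only part of the picture: the terms and conditions still apply.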
Solutions to Ethical Problems
The following tips will help you avoid potential moral and ethical problems with data collection:
Use a public API if possible and available (learn more about APIs from my course Build Your Web Projects With REST APIs).
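Provide a user agent in your request header that identifies you and your scraper, so the site owner knows who is making the requests.
Always credit the owner of the data. Attribution reduces the risk of copyright issues, but it does not replace checking the site's terms.
Request data at a reasonable rate so that you don't overload the site's server.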
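The user agent and request rate tips can be sketched in a few lines. This is only one way to do it, under some assumptions: the user agent string, the one-second delay, and the helper names build_request and polite_fetch_all are placeholders chosen for this example, and the standard library's urllib is used so nothing extra needs installing.

```python
import time
from urllib.request import Request

# Hypothetical identity string: replace it with your own project name
# and a way for the site owner to contact you.
USER_AGENT = "my-course-scraper/0.1 (contact: you@example.com)"


def build_request(url: str) -> Request:
    """Create a request that announces who is scraping."""
    return Request(url, headers={"User-Agent": USER_AGENT})


def polite_fetch_all(urls, fetch, delay_seconds=1.0):
    """Fetch each URL in turn, pausing between requests.

    `fetch` is any function that takes a Request and returns a result,
    for example urllib.request.urlopen. The one-second default is an
    assumption; pick a delay that suits the site you are scraping.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay_seconds)  # wait before every request after the first
        results.append(fetch(build_request(url)))
    return results
```

To use it for real, you could call polite_fetch_all(urls, urllib.request.urlopen). If you prefer the requests library, the same header can be passed with requests.get(url, headers={"User-Agent": USER_AGENT}).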
Challenges With Web Scraping
Web scraping is a powerful and useful tool - but it can have its difficulties.
Variety of Websites in HTML Structures
If you recall, to scrape data from the UK government site, we had to go into the page's source code and look up the precise class names it used. If we tried to use the same code to extract data from another site, it wouldn’t work, because that site would use different tags and class names. We’d need to have a specific web scraper for each site.
Durability of Scripts
As you know, the internet is constantly changing and updating. While that is great for getting the most up-to-date information, web scraping scripts that rely on a particular HTML structure can quickly become outdated. If you use the same script repeatedly over a long time, check regularly that it still works the way you intended.
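One simple safeguard is to make your script fail loudly when the class name it depends on disappears, rather than quietly returning nothing. The sketch below uses the standard library's html.parser so it runs without extra installs (a parsing library such as Beautiful Soup would need less code). The "headline" class and both sample pages are invented for illustration.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element that carries a given CSS class."""

    def __init__(self, class_name: str):
        super().__init__()
        self.class_name = class_name
        self.texts = []
        self._depth = 0    # greater than 0 while inside a matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth > 0:
            self._depth += 1
        elif self.class_name in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._depth == 0:
            return
        self._depth -= 1
        if self._depth == 0:
            self.texts.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def extract_or_fail(html: str, class_name: str) -> list:
    """Return matching texts, or raise if the expected class is missing."""
    parser = ClassTextExtractor(class_name)
    parser.feed(html)
    parser.close()
    if not parser.texts:
        raise ValueError(
            f"No elements with class {class_name!r} found; "
            "the page layout may have changed."
        )
    return parser.texts


# Hypothetical pages: the second simulates a redesign that renamed the class.
OLD_PAGE = '<ul><li class="headline">First story</li><li class="headline">Second story</li></ul>'
NEW_PAGE = '<ul><li class="title">First story</li></ul>'

print(extract_or_fail(OLD_PAGE, "headline"))  # ['First story', 'Second story']
```

Calling extract_or_fail(NEW_PAGE, "headline") raises a ValueError, which tells you right away that the scraper needs updating. The same idea explains why a scraper written for one site won't work on another: the class name is specific to that page's HTML.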
There are legitimate ethical and privacy concerns when scraping the web.
Never scrape anyone’s personal data or a website that prohibits it.
Because the internet is constantly updating, web scraping scripts require maintenance.
Well done on reaching the end of this course! Once you’ve completed the quiz, don’t hesitate to review the chapters and exercises that were more difficult.
If you’re feeling confident in your Python skills, consider moving on to our more advanced courses, such as Set Up a Python Environment.
Wherever you head next, I’ve enjoyed preparing this course for you and wish you the best of luck in your future projects! 😁