Learn Python web scraping and collect data from the web automatically
Master HTTP requests, HTML parsing, pagination, and data storage. Py guides you step by step through every concept so you understand what you're building — not just how to copy it.
From 800+ Python learners
What you'll be able to scrape
These are real use cases you can start building after the first few modules. Each one is practical, not a toy example.
Product prices across e-commerce
Track price changes on Amazon, eBay, or any retailer. Build alerts that notify you when a price drops below your target.
Job listings
Aggregate job postings from multiple boards into one spreadsheet. Filter by location, salary, or keywords automatically.
News headlines
Monitor news sources for keywords relevant to your industry or research. Build a daily digest that runs without your input.
Financial data
Collect stock prices, economic indicators, or earnings reports from public sources for analysis or backtesting.
Real estate listings
Scrape property listings, track price changes over time, and identify market trends across neighbourhoods.
Sports statistics
Build datasets of player stats, match results, or league tables for analytics, fantasy sports, or personal projects.
The web scraping curriculum
8 modules. Prerequisite: Python Intermediate. Each module builds toward the capstone project.
How the web works: HTTP, HTML, and the DOM
Understand what happens when a browser requests a page: HTTP methods, status codes, HTML structure, and how the DOM represents a document as a tree. This foundation makes everything else click.
Fetching pages with requests
Use Python's requests library to send HTTP GET and POST requests, pass headers and parameters, handle redirects, and inspect response objects. Fetch your first real web page in code.
Parsing HTML with BeautifulSoup
Load HTML into BeautifulSoup, navigate the parse tree, and extract text, attributes, and nested elements. Understand the difference between find(), which returns the first matching element, and find_all(), which returns every match.
Finding elements by tag, class, and ID
Use CSS selectors and BeautifulSoup's search methods to target exactly the elements you need. Handle inconsistent HTML structure gracefully.
Handling pagination
Scrape data across multiple pages by detecting next-page links, constructing page URLs programmatically, and looping through paginated results without duplicating records.
Saving data to CSV and JSON
Write scraped data to structured formats: CSV for spreadsheets and tabular analysis, JSON for APIs and more complex nested data. Handle encoding issues correctly.
Respecting robots.txt and scraping ethics
Read and interpret robots.txt files, implement polite rate limiting with delays, and understand the legal and ethical boundaries of scraping public data.
Capstone: real price tracker or job monitor
Build a complete scraper that fetches data on a schedule, stores results over time, and detects changes. Choose between a price tracker or a job listing monitor.
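To give a feel for the pagination and storage modules above, here is a minimal sketch. It is an illustration under stated assumptions, not course material: the PAGES dictionary is a hypothetical in-memory stand-in for a paginated job board (so nothing is fetched over the network), and the names collect_all and to_csv are invented for this example. It follows next-page links, skips records it has already seen, and writes the result as CSV and JSON.

```python
import csv
import io
import json

# Hypothetical paginated site held in memory. In a real scraper each entry
# would come from fetching a URL and parsing the returned HTML.
PAGES = {
    "/jobs?page=1": {
        "items": [
            {"id": "a1", "title": "Data Engineer"},
            {"id": "b2", "title": "Python Developer"},
        ],
        "next": "/jobs?page=2",
    },
    "/jobs?page=2": {
        # "b2" appears again, as often happens when listings shift between pages.
        "items": [
            {"id": "b2", "title": "Python Developer"},
            {"id": "c3", "title": "QA Analyst"},
        ],
        "next": None,
    },
}


def collect_all(start_url: str, pages: dict) -> list:
    """Follow next-page links from start_url, keeping each record id once."""
    seen_ids = set()
    visited_urls = set()
    records = []
    url = start_url
    # Stop when there is no next link, or if a link loops back to a visited page.
    while url is not None and url not in visited_urls:
        visited_urls.add(url)
        page = pages[url]
        for item in page["items"]:
            if item["id"] not in seen_ids:
                seen_ids.add(item["id"])
                records.append(item)
        url = page["next"]
    return records


def to_csv(records: list) -> str:
    """Serialise records to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=["id", "title"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = collect_all("/jobs?page=1", PAGES)
csv_text = to_csv(records)
# ensure_ascii=False keeps non-ASCII characters readable in the JSON output.
json_text = json.dumps(records, ensure_ascii=False, indent=2)
```

When writing to a real file instead of a string, opening it with open(path, "w", newline="", encoding="utf-8") avoids blank rows on Windows and makes the encoding explicit.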
Scraping ethics: what you need to know
Before you scrape anything, check the site's robots.txt file (e.g. https://example.com/robots.txt). This file tells you which paths the site owner does not want crawlers to access. Respecting it is both a courtesy and a practical precaution: many sites take action against scrapers that ignore it.
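As a rough illustration of that check, Python's standard library includes urllib.robotparser. The sketch below parses an inlined, hypothetical robots.txt so it runs without network access; the rules, the "my-scraper" user agent, and the URLs are all assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "my-scraper"  # hypothetical name for your scraper


def build_parser(robots_text: str) -> RobotFileParser:
    """Create a parser from robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# True: /products/ is not covered by any Disallow rule.
allowed = parser.can_fetch(USER_AGENT, "https://example.com/products/1")
# False: /private/ is disallowed for every user agent.
blocked = parser.can_fetch(USER_AGENT, "https://example.com/private/report")
# The delay in seconds the site asks for, or None if it does not specify one.
delay = parser.crawl_delay(USER_AGENT)
```

Against a live site you would call parser.set_url("https://example.com/robots.txt") followed by parser.read() instead of passing the text in yourself.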
Rate limiting matters. Sending hundreds of requests per second to a site can overload its server, get your IP banned, and even constitute a denial-of-service attack under some laws. A simple time.sleep(1) between requests is often all that's needed. If a site has an API, use it instead: it's faster, more stable, and explicitly allowed.
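One way to apply that pause, sketched under a few assumptions: fetch_politely is a hypothetical helper name, and fake_fetch stands in for a real HTTP call so the example runs offline. The delay is shortened here only to keep the demo quick; one second or more is a more considerate default against a real site.

```python
import time
from typing import Callable, Iterable, List


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 1.0,
) -> List[str]:
    """Call fetch(url) for each URL, sleeping between consecutive requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one.
            time.sleep(delay_seconds)
        results.append(fetch(url))
    return results


def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP request, so the sketch needs no network."""
    return f"<html>{url}</html>"


pages = fetch_politely(
    ["https://example.com/page/1", "https://example.com/page/2"],
    fake_fetch,
    delay_seconds=0.1,  # shortened for the demo
)
```

Passing the fetch function in as an argument keeps the pacing logic separate from the HTTP code, so the same loop works whichever library performs the request.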
Terms of service vary. Scraping publicly accessible data for personal use or research is generally low-risk. Scraping at scale for commercial purposes, or storing personally identifiable information, requires more care. The course covers how to read a site's terms, what language to look for, and when scraping is and isn't appropriate. We give you honest, practical guidance — not just "consult a lawyer."
Web scraping is part of MyPyMentor's Automation path
Web scraping doesn't live in isolation. In the Automation path, scraping is module 3 of 7 — and it connects directly to the skills before and after it. Before scraping, you'll have covered file handling and CSV/Excel automation, so you already know how to store structured data. After scraping, you'll learn email automation and task scheduling — which means you can scrape data, process it, and email a summary report on a cron schedule.
The capstone project ties all of this together: build a price tracker or job monitor that runs automatically, saves results over time, detects changes, and sends you an alert. That's not a toy project: it's a real automation pipeline, and many professionals rely on variations of it every day.
What scraping learners say
“I track prices for 80 products across three sites. Built the whole thing during week 6. My business was spending hours doing this manually every week. Now it runs itself.”
“I'm a grad student and needed data for my thesis. Learned web scraping in two weeks and collected a dataset that would have taken months to gather manually. Genuinely life-changing for research.”
“Our agency now tracks competitor mentions and campaign data automatically. I built the pipeline after completing this path. Saved our team at least 5 hours a week from day one.”
Frequently asked questions
Is web scraping legal?
Web scraping publicly accessible data is generally legal in most jurisdictions, but it depends on how you scrape and what you do with the data. Always check a site's robots.txt file, respect rate limits, and read the terms of service. Scraping data behind a login or in ways that violate terms of service can create legal exposure. The course covers this in detail.
What Python libraries are used for web scraping?
The two most common are requests (for fetching pages via HTTP) and BeautifulSoup (for parsing HTML and extracting data). For JavaScript-rendered pages, Playwright or Selenium handle dynamic content. For large-scale scraping, Scrapy is a full framework. This course focuses on requests and BeautifulSoup, which handle the majority of real-world tasks.
How hard is Python web scraping for beginners?
Web scraping requires Python Intermediate knowledge — you should be comfortable with functions, loops, dictionaries, and basic file I/O before starting. The scraping concepts themselves are not difficult once you understand HTML structure. Most learners get their first working scraper running within the first two modules.
Can you scrape any website?
Not every website can be scraped with simple requests and BeautifulSoup. Some sites render content with JavaScript, which means the HTML you receive won't contain the visible data. For those sites you need Playwright or Selenium. This course focuses on static HTML scraping, which covers a large portion of public websites.
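A quick way to tell which kind of page you are dealing with is to look for text you can see in the browser inside the raw HTML the server returns. The sketch below is a simple heuristic, not a definitive test; the function name appears_in_raw_html and both sample pages are made up for illustration.

```python
def appears_in_raw_html(raw_html: str, expected_text: str) -> bool:
    """Return True if text visible in the browser is present in the fetched HTML."""
    return expected_text.lower() in raw_html.lower()


# Hypothetical static page: the product name is in the HTML itself.
STATIC_PAGE = (
    "<html><body><h1>Blue Kettle</h1>"
    '<span class="price">24.99</span></body></html>'
)

# Hypothetical JavaScript-rendered page: the HTML is an empty shell
# that a script fills in after it loads in the browser.
JS_PAGE = (
    '<html><body><div id="root"></div>'
    '<script src="/app.js"></script></body></html>'
)

static_ok = appears_in_raw_html(STATIC_PAGE, "Blue Kettle")
js_ok = appears_in_raw_html(JS_PAGE, "Blue Kettle")
```

If the text is missing from the raw HTML, the page is probably rendered client-side, and a browser-automation tool such as Playwright or Selenium is the likelier fit.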
What is the difference between BeautifulSoup and Scrapy?
BeautifulSoup is a parsing library you combine with requests to fetch and extract data from HTML. Scrapy is a full scraping framework with built-in request handling, middleware, pipelines, and scheduling. BeautifulSoup is the right starting point: simpler, more Pythonic, and sufficient for most tasks. Scrapy becomes worth learning when you need to scrape thousands of pages at scale.
Related learning paths
- Python Automation path: files, spreadsheets, email, scheduling
- Python Intermediate: OOP, decorators, file I/O (prerequisite for scraping)
- Python for Data Analysts: turn scraped data into insights with pandas
- Python for Finance: scrape and analyse financial data
- Full Python curriculum: overview of all 8 learning paths