The Ultimate Web Scraping With Python Bootcamp 2023

Learn to extract data from the web with Python in just one course, covering selectolax, Playwright, Scrapy, and more.

What you’ll learn

Understand the fundamentals of web scraping in Python from absolute scratch
Scrape information from static and dynamic websites and extract it into a variety of formats
Intercept and emulate hidden APIs to find highly productive alternative ways of getting your data
Master the requests library for working with HTTP
Parse and extract content from HTML using Beautiful Soup, selectolax, and Microsoft Playwright
Master complex CSS selectors, including descendant, child, and sibling combinators
Understand how the web works, including HTTP, HTML, CSS, and JavaScript
Create Scrapy crawlers and practice with items, item loaders, and custom pipelines
Integrate Scrapy with Playwright for highly performant, fine-tuned dynamic website crawling
Practice processing and extracting data to a variety of formats, including CSV, JSON, XML, and SQL

Requirements

No programming experience is needed – I’ll teach you everything you need to know.
No paid software is required – we’ll be using open-source Python libraries.
A computer with access to the internet
Prepare to learn real skills you can put into practice right away

Description

Welcome to the Ultimate Web Scraping With Python Bootcamp, the only course you need to go from a complete beginner in Python to a very competent web scraper.

Web scraping is the process of programmatically extracting data from the web. Scraping agents visit a web resource, extract content from it, and then process the resulting data to parse some specific information of interest.
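The extract-and-process steps described above can be sketched in a few lines. This is a minimal illustration rather than the course's own code: it uses only Python's standard-library `html.parser`, and `SAMPLE_HTML`, `LinkExtractor`, and `extract_links` are hypothetical names standing in for a page that a scraping agent has already fetched.

```python
from html.parser import HTMLParser

# Hypothetical page content standing in for a fetched HTTP response body.
SAMPLE_HTML = """
<html><body>
  <h1>Weekly deals</h1>
  <a href="/games/1">Game One</a>
  <a href="/games/2">Game Two</a>
  <a>No link here</a>
</body></html>
"""


class LinkExtractor(HTMLParser):
    """Collects (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) tuples; href may be missing.
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


def extract_links(html):
    """Extract step: parse the HTML and return the links of interest."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


links = extract_links(SAMPLE_HTML)
```

In a real agent, the HTML would come from an HTTP response, and a dedicated parsing library would usually replace the hand-written parser.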

Scraping is a programming skill that offers immediate feedback and can be used to automate various data collection and processing tasks.

Over the next 17+ hours, we will methodically cover everything you need to know to write web scraping agents in Python.

This bootcamp is organized in three parts of increasing difficulty designed to help you progressively build your skill.

Part I – Begin

We’ll start by understanding how the web works by taking a closer look at HTTP, the modern web’s key application layer communication protocol. Next, we’ll explore HTML, CSS, and JavaScript from first principles to better understand how websites are built. Finally, we’ll learn how to use Python to send HTTP requests and parse the resulting HTML, CSS, and JavaScript to extract the needed data. Our goal in the first part of the course is to build a solid foundation in web scraping and Python and put those skills into practice by building functional web scrapers from scratch. Selected topics include:

a detailed overview of the request-response cycle
understanding user agents, HTTP verbs, headers, and statuses
understanding why custom headers can often be used to bypass paywalls
mastering the requests library to work with HTTP in Python
what stateless means and how cookies work
exploring the role of proxies in modern web architectures
mastering Beautiful Soup for parsing and data extraction
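To make the headers and cookies topics concrete, here is a small sketch of how a client carries state across otherwise stateless HTTP requests. It deliberately uses the standard library (`urllib.request` and `http.cookies`) instead of the requests library so that it runs without installing anything. The URL, user agent string, cookie value, and the helper names `build_request` and `parse_set_cookie` are all made up for illustration, and the request is built but never sent.

```python
from http.cookies import SimpleCookie
from urllib.request import Request


def build_request(url, user_agent, cookies=None):
    """Build (but do not send) a GET request with a custom User-Agent and optional cookies."""
    headers = {"User-Agent": user_agent}
    if cookies:
        # HTTP is stateless, so the client resends its cookies on every request.
        headers["Cookie"] = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return Request(url, headers=headers, method="GET")


def parse_set_cookie(header_value):
    """Turn a Set-Cookie header value into a {name: value} dict."""
    jar = SimpleCookie()
    jar.load(header_value)
    return {name: morsel.value for name, morsel in jar.items()}


# Hypothetical values: a cookie the server set earlier, echoed back on the next request.
cookies = parse_set_cookie("session_id=abc123; Path=/; HttpOnly")
request = build_request("https://example.com/account", "my-scraper/0.1", cookies)
```

With the requests library, a `Session` object handles this cookie bookkeeping automatically; the manual version is shown only to expose what happens underneath.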

Part II – Refine

In the second part of the course, we’ll build on the foundation we’ve already laid to explore more advanced topics in web scraping. We’ll learn how to scrape dynamic websites that use JavaScript to render their content by setting up Microsoft Playwright as a headless browser to automate this process. We’ll also learn how to identify and emulate API calls to scrape data from websites that don’t have formally public APIs. Our projects in this section will include an image scraper that can download a set number of high-resolution images given some keyword, and another scraping agent that extracts the price and content of discounted video games from a dynamically rendered website. Topics include:

identifying and using hidden APIs and understanding the benefits they offer
emulating headers, cookies, and body content with ease
automatically generating Python code from intercepted API requests using Postman and httpie
working with the highly performant selectolax parsing library
mastering CSS selectors
introducing Microsoft Playwright for headless browsing and dynamic rendering
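One benefit of hidden APIs is that they typically return structured data, so no HTML parsing is needed. The sketch below assumes a JSON payload shaped like the one in `API_RESPONSE`; the field names, the values, and the helpers `discounted_games` and `to_csv` are hypothetical and chosen only to echo the discounted video games project mentioned above.

```python
import csv
import io
import json

# Hypothetical JSON body, standing in for what an intercepted API call might return.
API_RESPONSE = """
{"results": [
  {"title": "Game One", "price": 9.99, "discount_percent": 50},
  {"title": "Game Two", "price": 19.50, "discount_percent": 0},
  {"title": "Game Three", "price": 4.25, "discount_percent": 75}
]}
"""


def discounted_games(raw_json):
    """Return only the records whose discount is greater than zero."""
    payload = json.loads(raw_json)
    return [game for game in payload["results"] if game["discount_percent"] > 0]


def to_csv(records, fieldnames):
    """Serialize a list of dicts to CSV text, keeping only the named fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


games = discounted_games(API_RESPONSE)
csv_text = to_csv(games, ["title", "price"])
```

Swapping `io.StringIO()` for an open file handle would write the same rows to disk.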

Part III – Master

In the final part of the course, we’ll introduce Scrapy. This will give us an excellent, time-tested framework for building more complex and robust web scrapers. We’ll learn how to set up Scrapy within a virtual environment and create spiders and pipelines to extract data from websites in various formats. Having learned how to use Scrapy, we’ll then explore how to integrate it with Playwright so that we can tackle the challenge of scraping dynamic websites from right within Scrapy. We’ll conclude this section by building a scraping agent that executes custom JavaScript code before returning the resulting HTML to Scrapy. Some topics from this section:

learning how to set up Scrapy and explore its command-line interface (“the Scrapy tool”)
dynamically exploring response objects using Scrapy shell
understanding and defining item schemas and loading data using item loaders and input/output processors
integrating Playwright into Scrapy to tackle dynamically rendered JavaScript sites
writing PageMethods to give highly specific instructions to the headless browser from right within Scrapy
defining custom pipelines for saving into SQL databases and highly customized output formats
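As a rough idea of what a custom pipeline for saving into a SQL database can look like, the sketch below defines a class with the `open_spider`, `process_item`, and `close_spider` methods that Scrapy's item pipeline interface describes. It does not import Scrapy, so it runs on its own. The class name `SQLitePipeline`, the `games` table, and the item fields are assumptions made for illustration, and the pipeline is driven by hand here, whereas Scrapy would call these methods itself.

```python
import sqlite3


class SQLitePipeline:
    """A pipeline-shaped class; with Scrapy it would be enabled via the ITEM_PIPELINES setting."""

    def __init__(self, db_path=":memory:"):
        # An in-memory database keeps the sketch self-contained.
        self.db_path = db_path
        self.connection = None

    def open_spider(self, spider):
        # Called once when the crawl starts: open the database and create the table.
        self.connection = sqlite3.connect(self.db_path)
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS games (title TEXT, price REAL)"
        )

    def process_item(self, item, spider):
        # Called for every scraped item: insert it, then pass it along unchanged.
        self.connection.execute(
            "INSERT INTO games (title, price) VALUES (?, ?)",
            (item["title"], item["price"]),
        )
        self.connection.commit()
        return item

    def close_spider(self, spider):
        # Called once when the crawl ends.
        self.connection.close()


# Driving the pipeline manually with hypothetical items; no spider object is needed here.
pipeline = SQLitePipeline()
pipeline.open_spider(spider=None)
pipeline.process_item({"title": "Game One", "price": 9.99}, spider=None)
pipeline.process_item({"title": "Game Two", "price": 19.5}, spider=None)
rows = pipeline.connection.execute(
    "SELECT title, price FROM games ORDER BY title"
).fetchall()
pipeline.close_spider(spider=None)
```

Parameterized queries (the `?` placeholders) keep scraped text from being interpreted as SQL.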

In this bootcamp, I will take you step by step through engaging video lectures and teach you everything you need to know to start with Python web scraping.

By the end of this course, you will have a complete toolset to conceptualize and implement scraping agents for any website you can imagine.

See you inside!