The Ultimate Web Scraping With Python Bootcamp 2023

Learn to extract data from the web with Python in a single course covering selectolax, Playwright, Scrapy, and more.

What you’ll learn

  • Understand the fundamentals of web scraping in Python from absolute scratch

  • Scrape information from static and dynamic websites and extract it into a variety of formats

  • Intercept and emulate hidden APIs to identify highly productive alternatives to getting your data

  • Master the requests library for working with HTTP

  • Parse and extract content from HTML using Beautiful Soup, selectolax, and Microsoft Playwright

  • Master complex CSS selectors, including descendant, child, and sibling combinators

  • Understand how the web works, including HTTP, HTML, CSS, and JavaScript

  • Create Scrapy crawlers and practice items, item loaders, and custom pipelines

  • Integrate Scrapy with Playwright for highly performant, fine-tuned dynamic website crawling

  • Practice processing and extracting data to a variety of formats, including CSV, JSON, XML, and SQL

Requirements

  • No programming experience is needed – I’ll teach you everything you need to know.

  • No paid software is required – we’ll be using open-source Python libraries.

  • A computer with access to the internet

  • A readiness to learn real skills you can put into practice right away

Description

Welcome to the Ultimate Web Scraping With Python Bootcamp, the only course you need to go from a complete beginner in Python to a very competent web scraper.

Web scraping is the process of programmatically extracting data from the web. Scraping agents visit a web resource, extract content from it, and then process the resulting data to parse some specific information of interest.
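
As a rough illustration of those three stages (visit, extract, process), the sketch below uses only Python's standard library rather than the libraries taught in the course. The HTML snippet, the class name LinkExtractor, and the rule for which links to keep are assumptions made for this example; a real agent would download the page over HTTP instead of reading an inline string.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag it encounters (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Stage 1 (visit): a stand-in for a downloaded page; the content is made up.
page = """
<html><body>
  <a href="/courses/python">Python</a>
  <a href="https://example.com/about">About</a>
  <a href="/courses/scraping">Scraping</a>
</body></html>
"""

# Stage 2 (extract): pull every link out of the markup.
extractor = LinkExtractor()
extractor.feed(page)

# Stage 3 (process): keep only the site-relative course links.
course_links = [link for link in extractor.links if link.startswith("/courses/")]
print(course_links)
```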







Scraping is a programming skill that offers immediate feedback and can be used to automate various data collection and processing tasks.

Over the next 17+ hours, we will methodically cover everything you need to know to write web scraping agents in Python.

This boot camp is organized in three parts of increasing difficulty designed to help you progressively build your skill.



Part I – Begin



We’ll start with how the web works, taking a closer look at HTTP, the modern web’s key application-layer communication protocol. Next, we’ll explore HTML, CSS, and JavaScript from first principles to better understand how websites are built. Finally, we’ll learn how to use Python to send HTTP requests and parse the resulting HTML, CSS, and JavaScript to extract the needed data. Our goal in the first part of the course is to build a solid foundation in web scraping and Python and put those skills into practice by building functional web scrapers from scratch. Selected topics include:

  • a detailed overview of the request-response cycle
  • understanding user agents, HTTP verbs, headers, and statuses
  • understanding why custom headers can often be used to bypass paywalls
  • mastering the requests library to work with HTTP in Python
  • what stateless means and how cookies work
  • exploring the role of proxies in modern web architectures
  • mastering Beautiful Soup for parsing and data extraction
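
To make the statelessness and cookie topics a little more concrete, here is a minimal sketch that uses the standard library instead of the requests library covered in the course. Because HTTP itself keeps no memory between requests, the client has to send back whatever cookie the server handed out earlier. The Set-Cookie value, the URL, the user agent string, and the helper name cookie_header_from are all assumptions for illustration, and no request is actually sent.

```python
from http.cookies import SimpleCookie
from urllib.request import Request


def cookie_header_from(set_cookie: str) -> str:
    """Turn a Set-Cookie response header into a Cookie request header."""
    jar = SimpleCookie()
    jar.load(set_cookie)
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


# Hypothetical header a server might return on a first response.
set_cookie = "session_id=abc123; Path=/; HttpOnly"

# The follow-up request carries the cookie back, plus a custom user agent.
follow_up = Request(
    "https://example.com/account",  # placeholder URL
    headers={
        "User-Agent": "Mozilla/5.0 (compatible; demo-scraper/0.1)",
        "Cookie": cookie_header_from(set_cookie),
    },
)

print(follow_up.get_method())          # HTTP verb used when no body is attached
print(follow_up.get_header("Cookie"))  # cookie the server would see
```

Only the name and value of the cookie travel back to the server; attributes such as Path and HttpOnly are instructions to the client and are not echoed.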











Part II – Refine



In the second part of the course, we’ll build on the foundation we’ve already laid to explore more advanced topics in web scraping. We’ll learn how to scrape dynamic websites that use JavaScript to render their content by setting up Microsoft Playwright as a headless browser to automate this process. We’ll also learn how to identify and emulate API calls to scrape data from websites that don’t have formally public APIs. Our projects in this section will include an image scraper that can download a set number of high-resolution images given some keyword and another scraping agent that extracts the price and content of discounted video games from a dynamically rendered website. Topics include:

  • identifying and using hidden APIs and understanding the benefits they offer
  • emulating headers, cookies, and body content with ease
  • automatically generating Python code from intercepted API requests using Postman and HTTPie
  • working with the highly performant selectolax parsing library
  • mastering CSS selectors
  • introducing Microsoft Playwright for headless browsing and dynamic rendering
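
One way to picture emulating a hidden API call is sketched below, again with the standard library only and without sending anything over the network. The endpoint, payload fields, cookie name, and sample response are invented for this example; in practice they would be copied from a request intercepted in the browser's developer tools.

```python
import json
from urllib.request import Request


def build_api_request(url: str, payload: dict, session_cookie: str) -> Request:
    """Recreate an intercepted JSON POST: same headers, cookie, and body."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Cookie": f"session={session_cookie}",
        },
    )


# Hypothetical endpoint and parameters.
request = build_api_request(
    "https://example.com/api/search",
    {"query": "games", "page": 1},
    "abc123",
)

# A made-up response body of the kind such an endpoint might return.
sample_response = (
    '{"results": [{"title": "Game A", "price": 9.99},'
    ' {"title": "Game B", "price": 14.5}]}'
)

# The benefit of a hidden API: structured data, no HTML parsing required.
rows = [(r["title"], r["price"]) for r in json.loads(sample_response)["results"]]
print(request.get_method(), rows)
```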



Part III – Master



In the final part of the course, we’ll introduce Scrapy. This will give us an excellent, time-tested framework for building more complex and robust web scrapers. We’ll learn how to set up Scrapy within a virtual environment and create spiders and pipelines to extract data from websites in various formats. Having learned how to use Scrapy, we’ll then explore how to integrate it with Playwright so that we can tackle the challenge of scraping dynamic websites from right within Scrapy. We’ll conclude this section by building a scraping agent that executes custom JavaScript code before returning the resulting HTML to Scrapy. Some topics from this section:







  • learn how to set up Scrapy and explore its command-line interface (“the Scrapy tool”)
  • dynamically explore response objects using the Scrapy shell
  • understand and define item schemas and load data using item loaders and input/output processors
  • integrate Playwright into Scrapy to tackle dynamically rendered JavaScript sites
  • write PageMethods to send highly specific instructions to the headless browser from right within Scrapy
  • define custom pipelines for saving into SQL databases and highly customized output formats
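
As a hedged sketch of what a custom SQL pipeline might look like, the class below follows the method names Scrapy's documentation has long described for item pipelines (open_spider, process_item, close_spider) but deliberately does not import Scrapy, so it can run on its own. The table name, the item fields, and the in-memory SQLite database are assumptions for illustration; the course's own pipelines may differ.

```python
import sqlite3


class SQLitePipeline:
    """Plain class shaped like a Scrapy item pipeline (hypothetical example)."""

    def __init__(self, db_path=":memory:"):
        self.db_path = db_path
        self.connection = None

    def open_spider(self, spider):
        # Called once when the crawl starts: open the database, create the table.
        self.connection = sqlite3.connect(self.db_path)
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS products (title TEXT, price REAL)"
        )

    def process_item(self, item, spider):
        # Called for every scraped item: insert it, then pass it along.
        self.connection.execute(
            "INSERT INTO products (title, price) VALUES (?, ?)",
            (item["title"], item["price"]),
        )
        self.connection.commit()
        return item

    def close_spider(self, spider):
        # Called once when the crawl ends.
        self.connection.close()


# Drive the pipeline by hand with made-up items; Scrapy would do this itself.
pipeline = SQLitePipeline()
pipeline.open_spider(spider=None)
for scraped in [{"title": "Game B", "price": 14.5}, {"title": "Game A", "price": 9.99}]:
    pipeline.process_item(scraped, spider=None)

stored = pipeline.connection.execute(
    "SELECT title, price FROM products ORDER BY title"
).fetchall()
pipeline.close_spider(spider=None)
print(stored)
```

In a real Scrapy project, a class like this would be enabled through the ITEM_PIPELINES setting rather than called directly.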

In this boot camp, I will take you step-by-step through engaging video lectures and teach you everything you need to know to start with Python web scraping.

By the end of this course, you will have a complete toolset to conceptualize and implement scraping agents for any website you can imagine.

See you inside!

Who this course is for:

  • Anyone who wants to learn how to collect data from the web programmatically
  • Students with or without web scraping experience looking to level up
  • Complete beginners with no experience



Last updated 4/2023



English



English [Auto]
