18 and older
Have you ever visited a website rich with the data you want, only to find that it provides no "download" button for retrieving it conveniently? Or have you ever found the perfect source for the CSV or other data files you need, but had to tediously click the "next" and "download" buttons dozens or even hundreds of times to get them?
In this course, you'll learn how to use Python to automate data collection from sites that require logins, present data in tables, and more, making your job easier and more efficient. We'll also discuss the ethics surrounding these practices, so you understand when it's okay to use scraping and when you need to find an alternative.
Whether you work with data for personal, professional, or academic reasons, you'll walk away with a concrete new skill that helps you automate and streamline cumbersome tasks.
Takeaways
- Explore the ethical debate surrounding web scraping
- Understand how web scraping works and why Python is an excellent tool to programmatically extract data from websites
- Gain practice scraping web pages with Python using Requests, BeautifulSoup, and Selenium
- Learn how to properly format and store the scraped data as a CSV
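To give a feel for the last two takeaways, here is a minimal sketch of turning an HTML table into CSV text. It is an illustration under stated assumptions, not the workshop's actual material: the sample HTML, the `TableParser` class, and the `table_to_csv` function are all hypothetical, and Python's built-in `html.parser` stands in for BeautifulSoup so the example runs without installing anything. A real scraper would first download the page (for example with Requests) rather than use an inline string.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this first.
SAMPLE_HTML = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Springfield</td><td>30000</td></tr>
  <tr><td>Shelbyville</td><td>25000</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # row being built, or None when outside a <tr>
        self._cell = None   # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html):
    """Parse the table rows in `html` and return them as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


print(table_to_csv(SAMPLE_HTML))
```

Writing to an in-memory buffer keeps the sketch self-contained; to save a file instead, the same `csv.writer` call can be pointed at a file opened with `open("output.csv", "w", newline="")`.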
Prereqs & Preparation
What to bring to class:
- This workshop is designed for students with a basic knowledge of Python, or experience programming in another language.
- Anyone who has taken a Python workshop at GA will be well-equipped for this workshop, but self-taught learners and anyone who is willing to follow along are welcome!
- Knowledge of basic HTML syntax will be useful, but is not required.
- All students must bring their own laptops with an installation of Anaconda 3.6, a free distribution of Python that includes libraries of open source Python tools.
- In case of technical difficulties on your local computer, opening an account on Google Colaboratory, a cloud-based Python environment, is highly encouraged.
For students enrolling in 12-week part-time and immersive classes, we do not recommend booking more than one class at the same time.