Of course, the first step in any data science project is to get the data.
You can get the data in three ways: 1) download (easy), 2) API (medium), 3) scrape (hard).
For downloadable data, see the Data sets page.
Examples of APIs you can pull data from:
- Google Maps
- Facebook (only your friends)
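As a rough illustration of the API route, most web APIs follow the same pattern: build a URL with query parameters, send a request, and decode the JSON that comes back. The sketch below uses only the Python standard library. The endpoint `https://api.example.com/v1/places`, the parameter names `query` and `key`, and the shape of the sample response are all made up for illustration; they are not the Google Maps or Facebook API, so check the provider's documentation for the real ones.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint and parameter names, for illustration only.
BASE_URL = "https://api.example.com/v1/places"


def build_url(base_url, **params):
    """Attach query parameters (such as an API key) to the endpoint URL."""
    return base_url + "?" + urlencode(params)


def fetch_json(url, timeout=10):
    """Send a GET request and decode the JSON body of the response."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def place_names(payload):
    """Pull the 'name' field out of each record in a response shaped like SAMPLE."""
    return [record["name"] for record in payload.get("results", [])]


# A made-up response body, so the parsing step can run without network access.
SAMPLE = '{"results": [{"name": "Cafe A", "rating": 4.5}, {"name": "Cafe B", "rating": 4.1}]}'

url = build_url(BASE_URL, query="coffee", key="YOUR_API_KEY")
names = place_names(json.loads(SAMPLE))
print(url)
print(names)
```

Against a real API you would call `fetch_json(url)` instead of parsing `SAMPLE`, and adapt `place_names` to the fields that API actually returns.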
Via Scraping (really hard)
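Scraping can be done in Python with the Beautiful Soup package. Beautiful Soup does not open web pages itself: you first download the page's HTML (for example with an HTTP library such as requests), and Beautiful Soup then parses it into a tree you can search by tag, attribute, or CSS class. You can also use regex on the HTML to pull the data you are interested in out of specific fields.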
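A minimal sketch of that workflow, assuming the `beautifulsoup4` package is installed. To keep it runnable offline, the HTML is a small made-up string standing in for a page you have already downloaded; the table layout, class names, and the `extract_prices` helper are hypothetical.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical HTML, standing in for a page you have already downloaded.
HTML = """
<html><body>
  <table id="prices">
    <tr><td class="item">Coffee</td><td class="price">$3.50</td></tr>
    <tr><td class="item">Tea</td><td class="price">$2.25</td></tr>
  </table>
  <p class="note">Prices updated daily.</p>
</body></html>
"""


def extract_prices(html):
    """Return {item name: price as float} for each table row in the HTML."""
    # "html.parser" is Python's built-in parser, so no extra parser is needed.
    soup = BeautifulSoup(html, "html.parser")
    prices = {}
    for row in soup.find_all("tr"):
        item = row.find("td", class_="item")
        price = row.find("td", class_="price")
        if item is None or price is None:
            continue  # skip rows that do not have both fields
        # Regex pulls the numeric part out of text such as "$3.50".
        match = re.search(r"\d+(?:\.\d+)?", price.get_text())
        if match:
            prices[item.get_text(strip=True)] = float(match.group())
    return prices


prices = extract_prices(HTML)
print(prices)
```

For a live page, you would fetch the HTML first and pass the response text to `extract_prices`; the selectors would need to match that page's actual markup.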
Other viable packages to scrape include:
Some web pages look different depending on the browser, so many of the packages include or need a “web driver” that controls a real browser. A web driver also helps when you need to interact with the page, for example by clicking buttons.