Web scraping projects aimed at collecting data for further analysis.
Project | Description | Source |
---|---|---|
Cryptocurrency Prices Dataset | Price, Volume, Market Cap, CMC Rank | https://www.coinmarketcap.com |
H&M Clothing | Products and price data | https://www2.hm.com/en_us/index.html |
Daraz | Category-wise products and price data | https://www.daraz.com.bd/ |
Evaly | Products and price data | https://evaly.com.bd/ |
Pharmeasy | Products and price data | https://pharmeasy.in/ |
Pickaboo | Products and price data | https://www.pickaboo.com/ |
Ryans | Laptop price and specification data | https://www.ryanscomputers.com/ |
A few of these projects need browser automation, for which I use Selenium; most do not. The libraries used across the projects are:
- Requests
- BeautifulSoup
- Selenium
- Scrapy
- Pandas
These projects are designed to give you experience with web scraping, but they assume some basic familiarity with at least Requests and BeautifulSoup. Selenium is not used extensively enough to require prior familiarity, but you will need to install it for the few projects that use it.
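As a rough gauge of the familiarity assumed, here is a minimal sketch of the Requests and BeautifulSoup pattern. It is not taken from any project in this repository: the `SAMPLE_HTML` markup, the class names `product`, `name`, and `price`, and the helpers `fetch_html` and `parse_products` are all hypothetical, and each real site needs its own selectors.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; real sites use their own structure.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Laptop A</span><span class="price">55,000</span></li>
  <li class="product"><span class="name">Laptop B</span><span class="price">72,500</span></li>
</ul>
"""


def fetch_html(url, timeout=10):
    """Download a page and return its HTML, raising on HTTP error codes."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def parse_products(html):
    """Extract name/price pairs from the hypothetical product markup."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for item in soup.find_all("li", class_="product"):
        name = item.find(class_="name").get_text(strip=True)
        price_text = item.find(class_="price").get_text(strip=True)
        # Strip the thousands separator before converting to an integer.
        products.append({"name": name, "price": int(price_text.replace(",", ""))})
    return products


if __name__ == "__main__":
    # Parse the inline sample so the sketch runs without network access.
    for product in parse_products(SAMPLE_HTML):
        print(product)
```

Against a live page you would pass the result of `fetch_html(url)` to `parse_products` instead of the inline sample, and the resulting list of dictionaries can be handed to `pandas.DataFrame` for analysis.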
While I will try to keep these projects updated, please keep in mind that websites can change at any time, rendering an existing scraper useless. This is unfortunately the nature of web scraping. Any scraper you run in production will need ongoing attention and maintenance to ensure it keeps delivering the data and results that you expect.
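One way to notice a site change early is to sanity-check the scraped rows before saving them. The sketch below is only an illustration of that idea and is not part of the projects above; the function name `validate_rows` and the field names in the usage example are assumptions.

```python
def validate_rows(rows, required_fields, min_rows=1):
    """Return a list of problems found in scraped rows; an empty list means OK."""
    problems = []
    # A sudden drop to zero rows usually means the page layout has changed.
    if len(rows) < min_rows:
        problems.append(f"expected at least {min_rows} row(s), got {len(rows)}")
    for index, row in enumerate(rows):
        # Treat absent, None, and empty-string values as missing.
        missing = [field for field in required_fields if row.get(field) in (None, "")]
        if missing:
            problems.append(f"row {index} is missing {', '.join(missing)}")
    return problems


if __name__ == "__main__":
    scraped = [{"name": "Laptop A", "price": 55000}, {"name": "", "price": None}]
    for problem in validate_rows(scraped, ["name", "price"]):
        print(problem)
```

Running a check like this after every scrape turns a silent layout change into a visible failure instead of an empty or half-filled dataset.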