Abstraction of Archit Sharma's procedural scraper: Selenium Python scraper to collect photos of different rooms for training ML models to distinguish between {bathroom, kitchen, living room, outside, bedroom, terrace} types.
This Scraper class is an abstraction and improvement over Archit Sharma's original procedural Selenium Python scraper. The original code was designed to collect photos of different rooms for training machine learning models to distinguish between various room types like bathroom, kitchen, living room, etc. This class-based version aims to offer the same functionality but in a more reusable and maintainable format.
- Object-Oriented Design: The original procedural code has been refactored into a class-based design, making it easier to integrate into larger, multi-functional projects.
- Modularization: Methods like get_images_from_google() and download_image() encapsulate specific functionalities, improving code readability and maintainability.
- Error Handling: Improved error handling mechanisms have been implemented in the class methods.
- Code Reusability: The class-based structure allows for easy reusability across different scraping tasks without duplicating code.
Instantiate an object of the Scraper class and call its do_scrape() method with the appropriate query file path.
scraper = Scraper()
scraper.do_scrape("path/to/query_file.csv")
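The query file format is not spelled out here beyond the fact that it holds queries and download paths. A minimal sketch of how such a file could be prepared, assuming two hypothetical columns named query and path (check the Scraper source for the names it actually reads), and using the standard csv module rather than Pandas to keep the example dependency-free:

```python
import csv
from pathlib import Path

# Hypothetical column names; the real Scraper may expect different ones.
FIELDNAMES = ["query", "path"]


def write_query_file(csv_path, rows):
    """Write (search query, download directory) pairs to a CSV file."""
    csv_path = Path(csv_path)
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        for query, download_dir in rows:
            writer.writerow({"query": query, "path": download_dir})
    return csv_path


def read_query_file(csv_path):
    """Read the file back as a list of dicts, one per row."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

For example, write_query_file("query_file.csv", [("bathroom interior", "images/bathroom"), ("kitchen interior", "images/kitchen")]) would produce a two-row file that could then be passed to do_scrape(), provided the column names match what the method expects.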
- Selenium
- Pandas
- urllib
- requests
- PIL (Pillow)
- time
- os
- io
- datetime
MIT License
We plan to extend the functionality of this class to include more advanced features, such as AI-driven scraping based on user behavior and preferences.
- __init__(): Initializes the webdriver and sets the maximum number of images to scrape and the delay between actions.
- get_images_from_google(): Fetches image URLs based on the query string.
- download_image(): Downloads an image from a URL and saves it to a specified path.
- do_scrape(): Reads a CSV file containing queries and download paths, then performs the scraping.
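To illustrate the download step, here is a rough standalone sketch of a function that fetches a URL and saves the bytes to a given path. It is not the class's actual implementation: the signature (url, save_path, timeout) is an assumption, and it uses the standard urllib.request module in place of requests and Pillow, so it does not validate or convert the image.

```python
import urllib.request
from pathlib import Path


def download_image(url, save_path, timeout=10):
    """Fetch `url` and write the raw bytes to `save_path`.

    Returns the saved Path, or None if the download or write failed.
    Hypothetical signature; the real method may differ.
    """
    target = Path(save_path)
    try:
        # Create the destination directory if it does not exist yet.
        target.parent.mkdir(parents=True, exist_ok=True)
        with urllib.request.urlopen(url, timeout=timeout) as response:
            data = response.read()
        target.write_bytes(data)
    except (OSError, ValueError) as exc:
        # URLError and HTTPError subclass OSError; ValueError covers malformed URLs.
        print(f"Failed to download {url}: {exc}")
        return None
    return target
```

Returning None instead of raising lets a caller looping over many URLs skip a failed image and continue, which is one plausible way to realize the error handling mentioned earlier.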
- The script may encounter issues if the initial user interaction with the browser is not handled properly.
Feel free to contribute to this project to make it more robust and feature-rich.