The DentalStall Scraper Tool is a Python-based application developed to automate the process of scraping product information from the DentalStall online shop. Utilizing the FastAPI framework, this tool efficiently extracts product names, prices, and images from the catalogue pages without the need to open each product card individually. The scraped data is stored locally as a JSON file, with flexibility built into the design for easy adaptation to other storage strategies.
- Selective Scraping: Ability to limit the scraping process to a specified number of pages.
- Proxy Support: Option to use a proxy for scraping, enhancing privacy and bypassing potential scraping blocks.
- Local Storage: Scraped data is stored in a JSON format on the local storage, with an easy option to switch to other storage methods.
- Notification System: At the end of each scraping cycle, the tool notifies designated recipients about the number of products scraped and updated in the database, with flexibility for different notification strategies.
- Data Integrity: Validates the type and structure of scraped data before it is stored.
- Retry Mechanism: Implements a simple retry mechanism for handling server errors during the scraping process (see the sketch after this list).
- Authentication: Secures the endpoint with simple authentication using a static token.
- Caching: Utilizes an in-memory database for caching scraping results, avoiding unnecessary updates for products whose prices have not changed.
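The retry behavior can be pictured with a minimal sketch like the one below. This is purely illustrative, not the project's actual implementation; the retry count, delay, and use of the requests library are assumptions.

    import time
    import requests

    def fetch_page(url: str, retries: int = 3, delay: float = 2.0) -> str:
        """Fetch a catalogue page, retrying on 5xx server errors (illustrative sketch)."""
        for attempt in range(1, retries + 1):
            response = requests.get(url, timeout=10)
            if response.status_code < 500:
                response.raise_for_status()  # surface 4xx errors immediately
                return response.text
            if attempt < retries:
                time.sleep(delay)  # brief pause before retrying the same page
        raise RuntimeError(f"Giving up on {url} after {retries} attempts")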
POST /scrape: Initiates the scraping process for a specified number of pages or products.
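A call to this endpoint might look like the sketch below. The x_token header name and the pages query parameter are assumptions based on the configuration keys documented later; depending on the implementation, these values may instead be read from config.json.

    import requests

    response = requests.post(
        "http://127.0.0.1:8000/scrape",
        headers={"x_token": "some_static_token"},  # assumed header carrying the static token
        params={"pages": 5},                       # assumed query parameter; may come from config.json instead
    )
    response.raise_for_status()
    print(response.json())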
Create a .env file in the project root with the following values (SMTP_PASSWORD is intentionally left blank; supply your own credential):
SMTP_USERNAME="deployprojects@gmail.com"
SMTP_PASSWORD=""
DENTAL_BASE_URL="https://dentalstall.com/shop/"
SMTP_SERVER="smtp.gmail.com"
SMTP_PORT=587
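These SMTP values drive the end-of-cycle notification. A minimal sketch of how such an email could be sent with Python's standard smtplib is shown below; the function name, message wording, and the use of STARTTLS are illustrative assumptions, not the project's actual code.

    import os
    import smtplib
    from email.message import EmailMessage

    def notify(recipient: str, scraped: int, updated: int) -> None:
        """Email a summary of the scraping cycle (illustrative sketch)."""
        msg = EmailMessage()
        msg["Subject"] = "DentalStall scraping cycle finished"
        msg["From"] = os.environ["SMTP_USERNAME"]
        msg["To"] = recipient
        msg.set_content(f"Scraped {scraped} products; updated {updated} in the database.")
        with smtplib.SMTP(os.environ["SMTP_SERVER"], int(os.environ["SMTP_PORT"])) as server:
            server.starttls()
            server.login(os.environ["SMTP_USERNAME"], os.environ["SMTP_PASSWORD"])
            server.send_message(msg)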
- Clone the repository:
git clone https://github.com/imsahil007/dental-scraper-fastapi.git
- Navigate to the project directory:
cd dental-scraper-fastapi
- Install the dependencies:
pip install -r requirements.txt
- Set the environment variables:
set -a && source .env
- Run the application:
uvicorn settings.main:app --reload
This starts the FastAPI application; the interactive API documentation is available at http://127.0.0.1:8000/docs in your web browser.
Before running the scraper, configure the following settings in the config.json file:
- pages: The maximum number of pages to scrape. Takes an integer.
- proxy: The proxy string to use for scraping. Takes an IP address.
- x_token: The static token for authenticating the endpoint. Takes the value some_static_token; requests with a missing or incorrect token receive a 401 response.
- email: The email address to notify when a scraping cycle completes.
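Putting these together, a config.json could look like the example below; the proxy and email values are placeholders, not working settings.

    {
      "pages": 5,
      "proxy": "127.0.0.1:8080",
      "x_token": "some_static_token",
      "email": "recipient@example.com"
    }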
The application can be easily containerized using Docker, simplifying deployment and execution.
To build and run the application in Docker:
- Build the Docker image:
docker build -t dentalstall-scraper .
- Run the Docker container:
docker run -p 8000:8000 dentalstall-scraper
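If the repository does not already include a Dockerfile, a minimal one along these lines would work; the Python base image version is an assumption, and the module path mirrors the uvicorn command above.

    FROM python:3.11-slim
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    COPY . .
    EXPOSE 8000
    CMD ["uvicorn", "settings.main:app", "--host", "0.0.0.0", "--port", "8000"]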