I built this project to sharpen my Python skills after a long time without coding.
It's a web scraper that collects information from the pdfdrive.com site and saves it to a file and to a MongoDB database.
- Install requirements
pip3 install poetry
- Launch Spider
Before launching, update the .env file with your MongoDB URI and Redis settings.
poetry install && cd pdfdrive && poetry run scrapy crawl pdfdrive
docker pull darixsamani/pdfdrive
docker run -it -e MONGO_URI="mongodb://localhost" -e MONGO_DATABASE="pdfdrive" -e REDIS_HOST="localhost" -e REDIS_PORT=6379 -e REDIS_PASSWORD="" darixsamani/pdfdrive
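The environment variables passed to the container above could be read on the Python side roughly as follows. This is only a minimal sketch, not the project's actual settings code: the `load_settings` helper is hypothetical, and the defaults simply mirror the values shown in the `docker run` command.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect MongoDB and Redis settings from environment variables.

    Hypothetical helper for illustration; defaults mirror the
    `docker run` example, not necessarily the project's own defaults.
    """
    env = os.environ if env is None else env
    return {
        "MONGO_URI": env.get("MONGO_URI", "mongodb://localhost"),
        "MONGO_DATABASE": env.get("MONGO_DATABASE", "pdfdrive"),
        "REDIS_HOST": env.get("REDIS_HOST", "localhost"),
        # Environment values are strings, so the port is converted to int.
        "REDIS_PORT": int(env.get("REDIS_PORT", "6379")),
        "REDIS_PASSWORD": env.get("REDIS_PASSWORD", ""),
    }


if __name__ == "__main__":
    # Print the effective configuration, masking a non-empty password.
    for key, value in load_settings().items():
        shown = "***" if key == "REDIS_PASSWORD" and value else value
        print(f"{key} = {shown}")
```

Running the script before launching the spider is a quick way to check which values would be picked up from the current shell or container environment.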