A web scraper to scrape the content of some car auctions from Vavato.
Notes: you will need python and pipenv installed in your system.
- Clone the repository
- Install the dependencies with
pipenv install
- Run the script with
pipenv run python main.py
Some adjustemnts:
- You should change the destination folder
save_path_prefix
- You can filter the "categories" of auctions from the variable
allowed_auctions_roots
- You can change the request delay in order to not be blocked because of too many requests by changing the variable
request_delay
- In case a block happens, the seconds before retrying can be changed in the variable
retry_delay
. At every retry it gets doubled.
The script gets the auctions information and for each auction it gets the urls to the cars in the lots. Everything is divided into archived auctions and current/future actions. The result is an excel file with four sheets, one for the auctions and another for the car lots, for both archived and new auctions.
In example_result
some example files are available.
The script uses normal http requests to navigate the website and get the information. This is much faster than using for example Selenium. The websites returns content that is already rendered and does not use AJAX to load the content. This makes it possible to get the content of the pages with requests.
In particular, each page has in the end a tag <script type="application/json">
that contains the information of the page. This is the information that is used to get the number of pages and the content for each page.