python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python ./main.py 2010-01-01 2024-06-19
- Inspect https://www.investing.com/economic-calendar/ with Chrome DevTools
- Since it is an infinite-scroll website, there should be an API that the frontend calls to load more data
- Found the API at `/economic-calendar/Service/getCalendarFilteredData`
- Use `dateFrom` and `dateTo` to request data from a time range
- Use `limitFrom` to loop through the pagination
- The response from the API is JSON, but the actual data is rendered in HTML
- Handle rows that are not data, such as date separators and holidays
- Scale the scraper
- Modular design: separate network requests, parsing, and storage
- Communicate via task queues
- Data integrity check
- Retry and resume from failed jobs
- Stream processing instead of bulk
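
The notes above can be tied together in a minimal sketch, which is not necessarily how `main.py` is written. It pages through the API with `limitFrom`, pulls the HTML out of the JSON response, skips date-separator and holiday rows, retries failed requests, and yields rows one at a time instead of collecting them in bulk. Several details are assumptions made for illustration only: that the HTML sits under a `data` key, that `limitFrom` is a page index starting at 0, that an empty `data` value marks the last page, that one-cell rows are date separators, that holiday rows contain a cell reading `Holiday`, and the request headers. The names `RowParser`, `parse_rows`, `fetch_page`, `with_retry`, and `iter_events` are hypothetical.

```python
import json
import time
import urllib.parse
import urllib.request
from html.parser import HTMLParser

API_URL = "https://www.investing.com/economic-calendar/Service/getCalendarFilteredData"


class RowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._row.append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append([cell.strip() for cell in self._row])
            self._row = None

    def handle_data(self, data):
        # Text inside nested tags (<a>, <span>) still belongs to the open cell.
        if self._in_cell and self._row:
            self._row[-1] += data


def parse_rows(html):
    """Parsing step: return event rows, dropping separators and holidays."""
    parser = RowParser()
    parser.feed(html)
    parser.close()
    # Assumption: one-cell rows are date separators, "Holiday" cells mark holidays.
    return [row for row in parser.rows if len(row) > 1 and "Holiday" not in row]


def fetch_page(date_from, date_to, limit_from):
    """Network step: POST the form fields and decode the JSON response."""
    form = urllib.parse.urlencode(
        {"dateFrom": date_from, "dateTo": date_to, "limitFrom": limit_from}
    ).encode()
    # Assumption: the endpoint expects an AJAX-style request with these headers.
    request = urllib.request.Request(
        API_URL,
        data=form,
        headers={"X-Requested-With": "XMLHttpRequest", "User-Agent": "Mozilla/5.0"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def with_retry(call, attempts=3, delay=1.0):
    """Retry a call on network errors, waiting a little longer each time."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except OSError:  # urllib.error.URLError and timeouts are OSError subclasses
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)


def iter_events(date_from, date_to, fetch=fetch_page, max_pages=10_000):
    """Yield event rows page by page until the API returns no more HTML."""
    for limit_from in range(max_pages):
        payload = with_retry(lambda: fetch(date_from, date_to, limit_from))
        html = payload.get("data", "")  # assumption: HTML lives under "data"
        if not html.strip():
            return
        yield from parse_rows(html)
```

Passing the fetch function in as an argument keeps the network step separate from parsing, so either side can be swapped out or tested without touching the other. Because `iter_events` is a generator, rows can be written to storage as they arrive, for example by looping over `iter_events("2010-01-01", "2024-06-19")` and appending each row to a CSV file.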