This is a simple web scraper to download text from a website. The content will be written to two text files in the output
folder.
- Content.txt
- Headers.txt
The environment file contains three values:
BASE_URL=Base URL of the content
HEADER_SLUGS=Slug containing the headers
CONTENT_SLUGS=Slug containing all the content
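As a rough illustration of how these three values might be read, here is a minimal sketch using only the standard library. The function names `parse_env` and `load_settings` are hypothetical, the example values are placeholders, and the actual main.py may well load the environment file differently (for example with a dotenv package).

```python
# Hypothetical helper names; not taken from main.py.
REQUIRED_KEYS = ("BASE_URL", "HEADER_SLUGS", "CONTENT_SLUGS")


def parse_env(text):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_settings(text):
    """Return the parsed values, raising ValueError if a required key is missing or empty."""
    values = parse_env(text)
    missing = [key for key in REQUIRED_KEYS if not values.get(key)]
    if missing:
        raise ValueError("Missing environment values: " + ", ".join(missing))
    return values


# Placeholder values for illustration only.
example = "BASE_URL=https://example.com\nHEADER_SLUGS=/headers\nCONTENT_SLUGS=/content\n"
settings = load_settings(example)
```

Failing early on a missing value keeps a misconfigured environment file from surfacing later as a confusing request error.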
I use PyCharm as my IDE. Really nice UI, similar to Rider. You can also just run the solution from within PyCharm.
- Download and install Python3
- Install the required packages:
  pip3 install -r requirements.txt
- Make sure you have a folder called output at the same level as main.py
- Run the command:
  python3 main.py
This guide contains guides for all operating systems.
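The manual step of creating the output folder could also be handled in code. The sketch below is only an assumption about how that might look; `ensure_output_files` is a hypothetical helper and is not part of main.py as described here.

```python
from pathlib import Path


def ensure_output_files(base_dir, names=("Content.txt", "Headers.txt")):
    """Create <base_dir>/output if needed and return the paths of the two text files."""
    out_dir = Path(base_dir) / "output"
    # exist_ok=True makes this a no-op when the folder is already there.
    out_dir.mkdir(parents=True, exist_ok=True)
    return [out_dir / name for name in names]
```

Calling `ensure_output_files(Path(__file__).parent)` from a script would place the folder next to that script, matching the layout described above.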