- Clone the project
git clone <repo>
- Go to the project directory
cd <repo>
- Install dependencies
pip install -r requirements.txt
- Edit the config file: add URLs with pre-selected categories (go to the main page, apply the filters, then copy the URL) and add your email(s) if you want to send messages via email or Telegram. Also add the name of the file you want to save links to; a plain 'links.txt' will do. This matters mainly if you're running 2+ instances of the app as cron jobs, since each needs its own 'DB'.
- If using Telegram to send messages, you need to a) manually set up a Telegram bot, and b) create a folder and JSON file, authentication\telegram.json, with the newly created bot's info: bot_id as the API key given by Telegram, and chat_id of the group you created, where you add the bot and the users you want to notify. If unsure, google how to find the chat ID.
- Run the app
python run.py
This results in scraping everything once into a .txt file. The file then serves as a "database" so it is updated every time the scraper runs. Proprietary solution.
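
The two files mentioned above can be illustrated with a short Python sketch. It is not the app's actual code: `write_telegram_config` and `record_new_links` are hypothetical helpers, the values passed in are placeholders, and whether the app expects `chat_id` as a string or a number is an assumption. Only the file path and the key names `bot_id` and `chat_id` come from the steps above.

```python
import json
from pathlib import Path
from typing import List


def write_telegram_config(bot_id: str, chat_id: str,
                          folder: str = "authentication") -> Path:
    """Create <folder>/telegram.json with the two keys named in the steps above."""
    path = Path(folder) / "telegram.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"bot_id": bot_id, "chat_id": chat_id}, indent=2))
    return path


def record_new_links(scraped: List[str], db_file: str = "links.txt") -> List[str]:
    """Return the links not yet stored in db_file and append them to it.

    Assumed behaviour: one link per line, the file acting as the "database".
    """
    path = Path(db_file)
    seen = set(path.read_text().splitlines()) if path.exists() else set()
    # dict.fromkeys drops duplicates within one scrape while keeping order
    new = [link for link in dict.fromkeys(scraped) if link not in seen]
    if new:
        with path.open("a") as fh:
            fh.writelines(link + "\n" for link in new)
    return new
```

Under these assumptions, the first run stores every scraped link, and later runs only return (and store) links that were not seen before, which is what would make them candidates for a notification.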
- Create a free AWS account
- Create an Amazon Linux t2.micro EC2 instance (free for ~1 year) and create a private key
- Connect to instance in the browser (using the "connect" option in the instance menu)
- Download WinSCP and connect to the server via a session (the private key created in step 2 is needed) - see tutorial
- Copy the folder cloned from git into the server using WinSCP
- In the server command line, first install Chrome
sudo curl https://intoli.com/install-google-chrome.sh | bash
sudo mv /usr/bin/google-chrome-stable /usr/bin/google-chrome
google-chrome --version && which google-chrome
- Then install pip3
sudo curl -O https://bootstrap.pypa.io/get-pip.py
python3 get-pip.py --user
pip3 --version
- Install all requirements
pip3 install -r requirements.txt
- Edit the config file: add URLs with pre-selected categories (go to the main page, apply the filters, then copy the URL) and add your email(s)
- Run the cron job (every 15 minutes by default; edit the file in Notepad for different settings)
crontab run.cron
- Verify the cron job is installed - it should be listed by the following command
crontab -l
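
For changing the schedule, it may help to see how a cron line for a given interval is built. The contents of run.cron are not shown here, so the sketch below is only an assumption about what such an entry could look like: `cron_entry` is a hypothetical helper, and the project path and the `python3 run.py` command are placeholders to adapt to your server.

```python
def cron_entry(minutes: int, project_dir: str,
               command: str = "python3 run.py") -> str:
    """Build a crontab line that runs `command` from project_dir.

    "*/N" in the minute field fires at minutes 0, N, 2N, ... of every hour,
    so the interval is only even when N divides 60 (e.g. 5, 10, 15, 30).
    """
    if not 1 <= minutes <= 59:
        raise ValueError("minutes must be between 1 and 59")
    # Fields: minute hour day-of-month month day-of-week command
    return f"*/{minutes} * * * * cd {project_dir} && {command}"


# Placeholder path; replace with the folder you copied to the server.
print(cron_entry(15, "/home/ec2-user/scraper"))
```

Changing the first argument to 30 would give a half-hourly schedule; the resulting line would replace the existing one in run.cron before re-running the crontab command above.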