A web scraper utilizing Scrapy to scrape live stock prices from the Nairobi Stock Exchange. After each scrape, the prices are saved to a PostgreSQL database; we use SQLAlchemy as the ORM and psycopg2 as the database connector.
The accompanying article (Part One) can be found here. Part Two, detailing deployment and notifications, is here.
The platform we are scraping is the AFX website.
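To illustrate the flow before diving in: each scraped item is persisted through a Scrapy item pipeline backed by SQLAlchemy. The sketch below is a minimal, assumed version of that idea; the model, table, and field names are illustrative, not the repo's exact code.

```python
# A minimal sketch of the persistence idea (illustrative names, not the repo's exact code)
from datetime import datetime

from sqlalchemy import create_engine, Column, DateTime, Float, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class StockPrice(Base):
    __tablename__ = "stock_prices"
    id = Column(Integer, primary_key=True)
    ticker = Column(String)          # the stock's trading symbol
    price = Column(Float)
    scraped_at = Column(DateTime, default=datetime.utcnow)

class PostgresPipeline:
    """Scrapy pipeline that writes each scraped item to Postgres."""

    def open_spider(self, spider):
        # DATABASE is the connection string assembled in settings.py (shown later)
        engine = create_engine(spider.settings.get("DATABASE"))
        Base.metadata.create_all(engine)
        self.Session = sessionmaker(bind=engine)

    def process_item(self, item, spider):
        session = self.Session()
        session.add(StockPrice(ticker=item["ticker"], price=float(item["price"])))
        session.commit()
        session.close()
        return item
```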
To run the scraper locally, you will need:
- Python and pip (I am currently using 3.9.2); any version above 3.7 should work.
- An Africa's Talking account.
- An API key and username from your account. Create an app and take note of the API key.
- A PostgreSQL database.
Clone this repo
git clone https://github.com/KenMwaura1/stock-price-scraper
Change into the directory
cd stock-price-scraper
Create a virtual environment (venv) to hold all the required dependencies. Here we use the built-in venv module.
python -m venv env
Activate the virtual environment
source env/bin/activate
Alternatively, if you are using pyenv:
pyenv virtualenv nse_scraper
pyenv activate nse_scraper
Install the required dependencies:
pip install -r requirements.txt
Change into the nse_scraper folder and create an environment file.
cd nse_scraper
touch .env
Add your credentials as specified in the example file.
OR
Copy the provided example and edit as required:
cp .env-example .env
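For reference, the variables boil down to the Postgres settings read in settings.py (shown further below) plus the Africa's Talking credentials. All values here are placeholders, and the AT_* variable names are assumptions:

```text
# .env -- placeholder values only
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=postgres
POSTGRES_PASS=xxxx
POSTGRES_DB=nse_data
# Africa's Talking credentials (variable names assumed)
AT_USERNAME=sandbox
AT_API_KEY=xxxx
```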
Navigate back up to the main stock-price-scraper folder, then run the scraper and check the logs for any errors.
cd ..
scrapy crawl afx_scraper
Or run the scraper and have it output to a JSON file to preview:
scrapy crawl afx_scraper -o test.json
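If the run succeeds, test.json should contain one object per listed stock, along these lines (the field names and values are illustrative assumptions, not actual output):

```json
[
  {"ticker": "SCOM", "name": "Safaricom Plc", "price": "28.50"},
  {"ticker": "EQTY", "name": "Equity Group Holdings", "price": "45.00"}
]
```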
Heroku account registration

First and foremost, you need to register an account with Heroku; it's free!

Installing Heroku CLI

After registering your account, let's use the Heroku CLI to create and manage our project. You may check out the installation steps for other operating systems here.
# for arch-linux
sudo pacman -S heroku
To log in using Heroku's CLI, simply cd into your project folder and run heroku login.
$ cd nse_scraper
$ heroku login
heroku: Press any key to open up the browser to login or q to exit:
Opening browser to https://cli-auth.heroku.com/auth/cli/browser/xxxx-xxxx-xxxx-xxxx-xxxx?requestor=xxxx.xxxx.xxxx
Logging in... done
Logged in as &*^@gmail.com
...
Check out the heroku_deployment branch:
git checkout heroku_deployment
At this stage, you should already have your project under version control with git init, git commit, and so on. The next steps are to create a Heroku app, attach it as a git remote, and push your code to it.
# i. To create a Heroku application:
$ heroku apps:create daily-nse-scraper
# ii. Add a remote to your local repository:
$ heroku git:remote -a daily-nse-scraper
# iii. Deploy your code
$ git push heroku heroku_deployment:main
Tweak the project name as necessary.
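Once the push completes, you can confirm the release worked and watch the scraper's output with the Heroku CLI:

```bash
$ heroku logs --tail -a daily-nse-scraper
```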
This step is fairly simple: go to the ‘Resources’ tab on your Heroku dashboard, look for ‘Heroku Postgres’, and select the free tier (or whichever plan you deem fit). Finally, adjust your Scrapy project’s connection to your database accordingly.
# settings.py
import os

# POSTGRES SETTINGS
host = os.environ.get("POSTGRES_HOST")
port = os.environ.get("POSTGRES_PORT")
username = os.environ.get("POSTGRES_USER")
password = os.environ.get("POSTGRES_PASS")
database = os.environ.get("POSTGRES_DB")
drivername = "postgresql"
DATABASE = f"{drivername}://{username}:{password}@{host}:{port}/{database}"
# Or alternatively:
DATABASE_CONNECTION_STRING = "postgresql://xxxx:xxxx@ec2-xx-xxx-xxx-xx.compute-1.amazonaws.com:5432/xxxxxx"
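One gotcha worth noting: the Heroku Postgres add-on also injects a DATABASE_URL config var that begins with postgres://, a scheme SQLAlchemy 1.4+ no longer accepts. If you read that variable directly, normalize the prefix first (a small sketch):

```python
import os

# SQLAlchemy >= 1.4 rejects the legacy "postgres://" scheme that
# Heroku's DATABASE_URL uses, so rewrite it to "postgresql://".
url = os.environ.get("DATABASE_URL", "")
if url.startswith("postgres://"):
    url = url.replace("postgres://", "postgresql://", 1)
```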
Ensure you add your configuration variables under ‘Settings’ → ‘Reveal Config Vars‘. This lets Heroku supply the environment variables our web scraper needs at runtime.
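Alternatively, the same variables can be set from the CLI (placeholders shown):

```bash
$ heroku config:set POSTGRES_HOST=xxxx POSTGRES_PORT=5432 \
    POSTGRES_USER=xxxx POSTGRES_PASS=xxxx POSTGRES_DB=xxxx \
    -a daily-nse-scraper
```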
This section of the article shows you how to run your crawlers/spiders periodically. Though Heroku offers several schedulers that can run your application on a timer, I personally prefer ‘Heroku Scheduler’: it has a free tier and is super simple to use.
To use the free tier of this add-on, Heroku requires you to add a payment method to your account.
- Configuration: Inside your newly added ‘Heroku Scheduler’ add-on, select ‘Add Job’ in the top right corner and the job configuration screen appears.
To run the scrapy crawl afx_scraper command periodically, simply select a time interval and save the job.
2. How do I schedule a daily job?
Simply configure our ‘Heroku Scheduler’ to run our Python script at a specified time; in our case, every hour at 10 minutes past the hour. It should then run our command:
scrapy crawl afx_scraper
Now we need to add a scheduler for Heroku to run our notification script, which will in turn send us texts. Since we already have an instance of Heroku Scheduler running in our app, we need an alternative. Advanced Scheduler is a good option, as it offers a free trial and, if need be, a $5-per-month upgrade.
- Setup: Inside our daily-nse-scraper app, search for the Advanced Scheduler add-on. Select the Trial (free) plan and submit the order form.
- Configuration: Click on the Advanced Scheduler add-on. Inside the overview page, click on the Create trigger button; the free trial allows up to 3 triggers. We'll set a trigger for 11:00 am each day and specify the command python nse_scraper/stock_notification.py to run. Remember to select the correct timezone (in my case, Africa/Nairobi) and save the changes.
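For context, the notification script essentially reads the latest prices and sends an SMS via the Africa's Talking SDK. A minimal sketch (the env variable names, message text, and phone number are assumptions):

```python
# stock_notification.py -- minimal sketch, not the repo's exact code
import os

import africastalking

# Credentials from .env / Heroku config vars (variable names assumed)
africastalking.initialize(
    username=os.environ["AT_USERNAME"],
    api_key=os.environ["AT_API_KEY"],
)
sms = africastalking.SMS

# In the real script the message would be built from the scraped prices
response = sms.send("NSE price update: ...", ["+2547xxxxxxxx"])
print(response)
```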
3. Testing
To ensure the scheduler executes as expected, we can manually run the trigger: on the overview page, click on the More button and select Execute trigger.
You should receive a notification text if everything went as expected.