/finn_scraper

Data mining and analysis for old houses for sale.

Primary LanguageJupyter Notebook

Finn Scraper

This application scrapes, stores and notifies the user through Discord bot of scraping process. I wrote it to collect data for a data analysis task as finn doesnt show more than 50 pages of house at a time so it collects new house data everyday. I have done some preliminary data exploration in the jupyter notebook but there wasnt enough data make prediction models so I wrote a script that can collect new data every day.

What it does

  • Mines house on sale data from finn
  • Stores the data in a database in datastore/ directory
  • Notifies the user over discord on how many houses has been scraped
Tested on
  • Arch Linux on a Laptop
  • RaspberryPi OS on RaspberryPi 4B

The project also includes a bash script cronjob.sh that can be used to periodically run the application using crontab on linux. Im using pipenv for package and environment management, TinyDB for database and Discord's webhook to send message to discord server. The webhook needs to be saved in a .env file in the project directory as an environment variable named DISCORD_WEBHOOK, pipenv will load the environment variable.

TODO:

  • Deterministic Build System
  • More robust scraping error reporting
  • Scraping Job posts
  • Dashboarding