Fiserv scraper

A script to scrape and download .txt files from Fiserv.

Disclaimer (?)

This script was made to solve a personal problem at work and to automate a repetitive task (manually downloading ~35 files every month).

While I've tried to follow Python best practices, there may be bugs, typos, or things that could simply be improved (see the end of the README).

I am not responsible for how this script is used.

Steps to follow on the web

  1. Go to Fiserv and log in
  2. Go to the "Movimientos" tab, then "Liquidación electrónica"
  3. Scrape the filenames to determine which files to download (XPath: div//div//table//tbody//tr[n]//td[1]//b); see the sketch after this list
  4. Click the button (div//div//table//tbody//tr[n]//td[5]//b) to download each file
  5. Process the files with the procesar_liquidacion.py script (private script 🤪)
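
A minimal Selenium sketch of steps 3 and 4, assuming chromedriver is installed and the table structure described above; the placeholder URL, waits, and element lookups here are illustrative and may not match the actual code in src/extractor.py.

import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

FISERV_URL = "https://..."  # placeholder; steps 1 and 2 (login, navigation) are omitted

driver = webdriver.Chrome()  # chromedriver must be on the PATH
driver.get(FISERV_URL)

# Wait until the settlements table has rendered
wait = WebDriverWait(driver, 30)
rows = wait.until(EC.presence_of_all_elements_located(
    (By.XPATH, "//div//div//table//tbody//tr")))

for n in range(1, len(rows) + 1):
    # Step 3: read the filename from the first column of row n
    filename = driver.find_element(
        By.XPATH, f"//div//div//table//tbody//tr[{n}]//td[1]//b").text
    print(f"Found file: {filename}")

    # Step 4: click the element in the fifth column to download the file
    driver.find_element(
        By.XPATH, f"//div//div//table//tbody//tr[{n}]//td[5]//b").click()
    time.sleep(2)  # crude pause so the download can start

driver.quit()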

Initial setup

First, install the required packages, either system-wide or in a virtual environment. In a terminal inside the project directory, run:

pip install -r requirements.txt

Set your credentials in a config.ini file located in the project root. An example is provided here to show the file structure.
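
For illustration only, a config.ini could be structured like this; the section and key names below are hypothetical, so check the linked example for the real ones:

[FISERV]
user = your_username
password = your_password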

Note: the logger can be configured as desired in the base_logger.py file.
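
As a rough illustration of the kind of setup that lives there (the actual base_logger.py may differ), a minimal configuration could be:

import logging

logging.basicConfig(
    level=logging.INFO,  # raise to logging.DEBUG for more detail
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
)

logger = logging.getLogger("fiserv_extractor")  # hypothetical logger name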

Run the script

To run the script, open a terminal in the project directory and execute:

python src/extractor.py

Crontab example:

To edit your crontab on Linux, open a terminal and run crontab -e, then add the following line:

0 12 * * * /path/to/code/script.py >> /path/to/some/log.txt 2>&1

This runs the script every day at 12:00, appending its output (including errors) to a file called log.txt.

Note: for the script to work with cron, chromedriver must be on the PATH, the .py file needs a shebang line and execute permissions (sudo chmod a+x script.py), and possibly a few other Linux "tricks" depending on your system 🥴
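
An alternative that avoids the shebang and chmod steps is to call the interpreter explicitly in the crontab line (paths are placeholders):

0 12 * * * cd /path/to/code && /usr/bin/python3 src/extractor.py >> /path/to/some/log.txt 2>&1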

Things to improve

In order of importance:

  • Improve performance and speed by avoiding unnecessary sleeps and for loops
  • Improve the validations in try-except blocks
  • Implement and improve check_if_exists, check_local_files and delete_unnecessary_files
  • Add CLI options (maybe with python-click) to choose the number of files to download (adding pagination to the scraper) and to set the log level (see the sketch after this list)
  • Implement some tool to run the script without relying on crontab
  • Add Airflow to automate the processing
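
For the CLI idea, a rough sketch of what a python-click entry point could look like; the option names and defaults are hypothetical:

import click

@click.command()
@click.option("--num-files", default=35, help="Number of files to download.")
@click.option("--log-level", default="INFO", help="Log level (DEBUG, INFO, WARNING...).")
def main(num_files, log_level):
    """Run the Fiserv extractor with the given options."""
    click.echo(f"Downloading up to {num_files} files at log level {log_level}")
    # ...call the scraper from here...

if __name__ == "__main__":
    main()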

Obviously, feel free to fork the code and make a pull request 😊


Made with ❤ and 🐍 by akalautaro.