Script made to scrape and download txt files from Fiserv
This script was made to solve a personal problem at work and to automate a repetitive task (downloading ~35 files manually, every month).
While I've tried to follow Python best practices, there may be bugs, typos, or things that could simply be improved (see end of readme).
I am not responsible for how the script is used.
- Go to Fiserv and log in
- Go to the tab "Movimientos", then "Liquidación electrónica"
- Scrape the filenames to know which ones to download (div//div//table//tbody//tr[n]//td[1]//b)
- Click the button (div//div//table//tbody//tr[n]//td[5]//b) to download each file
- Process the files with the script procesar_liquidacion.py (private script 🤪)
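The scrape-and-click steps above can be sketched roughly as follows. This is a minimal illustration, not the actual code in src/extractor.py: the helper names, the row_count parameter, and the leading "//" added to the XPath patterns are all assumptions, and the function expects an already logged-in Selenium driver.

```python
# Row-level XPath patterns taken from the steps above. The leading "//" is an
# assumption, added so the expression can match anywhere in the page.
FILENAME_XPATH = "//div//div//table//tbody//tr[{n}]//td[1]//b"
BUTTON_XPATH = "//div//div//table//tbody//tr[{n}]//td[5]//b"


def filename_xpath(n):
    """Return the XPath of the filename cell in table row n (1-based)."""
    return FILENAME_XPATH.format(n=n)


def button_xpath(n):
    """Return the XPath of the download button in table row n (1-based)."""
    return BUTTON_XPATH.format(n=n)


def download_rows(driver, row_count, already_downloaded):
    """Click the download button of every row whose file is not yet local.

    driver is a Selenium WebDriver already on the "Liquidación electrónica"
    page; row_count and already_downloaded are hypothetical parameters.
    """
    # Imported lazily so the XPath helpers work without Selenium installed.
    from selenium.webdriver.common.by import By

    clicked = []
    for n in range(1, row_count + 1):
        name = driver.find_element(By.XPATH, filename_xpath(n)).text.strip()
        if name in already_downloaded:
            continue  # skip files we already have
        driver.find_element(By.XPATH, button_xpath(n)).click()
        clicked.append(name)
    return clicked
```

Keeping the XPath construction in small functions makes it easy to adjust the patterns if the page layout changes, without touching the loop.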
First, install the required packages, either system-wide or in a virtual environment. In a terminal inside the project directory, run:
pip install -r requirements.txt
Configure your credentials in a config.ini file, located in the project root directory. An example is provided to show the file structure.
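Reading the credentials could look something like the sketch below. The section name and the key names are assumptions made for illustration; check the example file for the real structure.

```python
import configparser

# Hypothetical layout: the [fiserv] section and its keys are assumptions,
# not necessarily the structure used by the project's config.ini.
EXAMPLE_INI = """
[fiserv]
username = my_user
password = my_password
"""


def load_credentials(parser, section="fiserv"):
    """Return (username, password) from the section, failing loudly if absent."""
    if not parser.has_section(section):
        raise KeyError(f"missing [{section}] section in config.ini")
    return parser.get(section, "username"), parser.get(section, "password")


# interpolation=None keeps a literal "%" in a password from being
# treated as configparser interpolation syntax.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(EXAMPLE_INI)  # in a real run: parser.read("config.ini")
username, password = load_credentials(parser)
```

Failing early on a missing section gives a clearer error than a login that silently fails later in the browser.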
Note: the logger can be configured as desired in the base_logger.py file.
To run the script, open a terminal in the project directory and execute:
python src/extractor.py
To edit the crontab file on Linux, open a terminal and run crontab -e. Then add the following line:
0 12 * * * /path/to/code/script.py >> /path/to/some/log.txt 2>&1
This way the script runs every day at 12:00 (noon), appending both its output and its errors to a file called log.txt.
Note: to make the script work with cron we need to add chromedriver to our PATH, add a shebang line, and change the permissions of the .py file (sudo chmod a+x script.py), and maybe apply some other Linux "tricks" depending on our system 🥴
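Since cron usually runs jobs with a minimal environment, one way to handle the shebang and PATH requirements is at the top of the script itself. The sketch below is only an assumption about how this could be done; the chromedriver directory is a hypothetical location and should be replaced with the real one.

```python
#!/usr/bin/env python3
# The shebang above lets cron execute the file directly once it is executable.
import os


def ensure_on_path(directory):
    """Prepend directory to PATH unless it is already listed."""
    entries = [e for e in os.environ.get("PATH", "").split(os.pathsep) if e]
    if directory not in entries:
        os.environ["PATH"] = os.pathsep.join([directory] + entries)


# Hypothetical location of the chromedriver binary; adjust for your system.
CHROMEDRIVER_DIR = "/usr/local/bin"
ensure_on_path(CHROMEDRIVER_DIR)
```

Setting PATH inside the script keeps the crontab line short; the alternative is to define PATH at the top of the crontab file.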
Planned improvements, in order of importance:
- Improve performance and speed by avoiding unnecessary sleeps and for loops
- Improve validations in try-except blocks
- Implement and improve check_if_exists, check_local_files and delete_unnecessary_files
- Add commands (maybe with python-click) to choose the number of files to download (adding pagination to the scraper) and also to set the log level
- Implement some tool to run the script without using crontab
- Add Airflow to automate the processing
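As a starting point for the first and third items, fixed sleeps could be replaced by polling for a condition, such as the expected files appearing in the download folder. This is a sketch under assumptions: wait_until and missing_files are hypothetical helpers, and missing_files is only a guess at what check_local_files might do.

```python
import time
from pathlib import Path


def wait_until(condition, timeout=30.0, interval=0.5):
    """Poll condition() until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)  # short pause between checks, not a fixed wait


def missing_files(expected_names, download_dir):
    """Return the expected filenames not yet present in download_dir."""
    present = {p.name for p in Path(download_dir).iterdir() if p.is_file()}
    return [name for name in expected_names if name not in present]
```

For example, wait_until(lambda: not missing_files(names, "downloads"), timeout=60) returns as soon as every file in names exists, instead of always sleeping for the worst case.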
Obviously, feel free to fork the code and make a pull request 😊
Made with ❤ and 🐍 by akalautaro.