/SI664-scripts

Miscellaneous Python scripts

Primary LanguagePythonMIT LicenseMIT

SI664-scripts

Miscellaneous Python scripts

1.0 Install

Either fork this repo and then clone to your working directory or download a *.zip file of the code. Create a virtual environment and then run pip to install package dependencies listed in the requirements.txt file.

macOS

$ cd path/to/SI664/scripts
$ source venv/bin/activate
(venv) $ pip3 install -r requirements.txt

Windows

> cd path/to/SI664/scripts
> venv\Scripts\activate
(venv) >

Next, install the mysqlclient package. You must utilize Christoph Gohlke's collection of [Unoffical Windows Binaries for Python Extension Packages](https://www.lfd.uci .edu/~gohlke/pythonlibs/) to install the mysqlclient package. Download the appropriate the mysqlclient the wheel (*.whl) file. For Python 3.7 click on "mysqlclient‑1.3.13‑cp37‑cp37m‑win_amd64.whl" and it will download to your machine. Then perform a manual install of the package via pip:

(venv) > pip install C:\Users\someuser\Downloads\mysqlclient-1.3.13-cp37-cp37m-win_amd64.whl
Processing c:\users\someuser\downloads\mysqlclient-1.3.13-cp37-cp37m-win_amd64.whl
Installing collected packages: mysqlclient
Successfully installed mysqlclient-1.3.13

After the mysqlclient package is installed manually run pip install using requirements.txt to ensure that the remaining required packages are installed.

(venv) > pip install -r requirements.txt

2.0 Available scripts

2.1 run_mysql_script.py

This python script is designed to process MySQL scripts. The run_mysql_script.py script requires a valid database connection provided via local *.yaml config file. After opening a connection and creating a cursor, the script creates a list of SQL statements after splitting the SQL script on each semi-colon encountered (;). The script then loops through the statements, attempting to execute each. If successful, the script commits the changes, closes the cursor and then closes the connection. Otherwise, it rolls back the transaction and reports the error encountered.

2.1.1 Create a .yaml configuration file

Create a .yaml configuration file. The run_mysql_script.py script reads this file in order to retrieve the database connection settings. Add the following database connection settings to your .yaml file. Make sure you set the user and passwd variables to the correct values.

mysql:
  host: localhost
  port: [yer port number, typically 3306]
  user: [yer MySQL user]
  passwd: [yer MySQL user password]
  db: unesco_heritage_sites
  local_infile: True

2.1.2 Script arguments

The run_mysql_script.py features the following arguments:

  • -h, --help (show this help message and exit)
  • -c, --config (path to config file)
  • -p, --path (path to script)

2.1.3 Running the script

Run run_mysql_script.py as follows, tailoring the *.yaml and *.sql file paths as necessary:

macOS
(venv) $ python3 run_mysql_script.py -c ./path/to/config/file/*.yaml -p ./path/to/sql/script/*.sql
Windows
(venv) > python run_mysql_script.py -c ./path/to/config/file/*.yaml -p ./path/to/sql/script/*.sql

2.2 inspect_un_data_sets.py

Run inspect_un_data_sets.py to "inspect" two UN data sets included in the project /input directory:

  • un_area_country_codes-m49.csv
  • unesco_heritage_sites.csv

The script utilizes the Pandas library to peruse the data sets, generating a set of column-based *.csv files that contain distinct column values (duplicate values and NaN values are filtered out) sorted in ascending order. The files are stored in the project /output directory.

2.2.1 Running the script

macOS
(venv) $ python3 inspect_un_data_sets.py
Output
INFO: Source file read /absolute/path/to/input/un_area_country_codes-m49.csv
INFO: UNSD M49 regions written to file /absolute/path/to/output/unsd_region.csv
INFO: UNSD M49 sub-regions written to file /absolute/path/to/output/unsd_sub_region.csv
INFO: UNSD M49 intermediate regions written to file /absolute/path/to/output/unsd_intermed_region.csv
INFO: UNSD M49 countries and areas written to file /absolute/path/to/output/unsd_country_area.csv
INFO: UNSD M49 development status written to file /absolute/path/to/output/unsd_dev_status.csv
INFO: Source file read /absolute/path/to/input/unesco_heritage_sites.csv
INFO: UNESCO heritage site countries/areas written to file /absolute/path/to/output/unesco_heritage_site_country_area.csv
INFO: UNESCO heritage site categories written to file /absolute/path/to/output/unesco_heritage_site_category.csv
INFO: UNESCO heritage site regions written to file /absolute/path/to/output/unesco_heritage_site_region.csv
INFO: UNESCO heritage site transboundary values written to file /absolute/path/to/output/unesco_heritage_site_transboundary.csv
Windows 10
(venv) > python inspect_un_data_sets.py
Output
INFO: Source file read C:\path\to\input\un_area_country_codes-m49.csv
INFO: UNSD M49 regions written to file C:\path\to\output\unsd_region.csv
INFO: UNSD M49 sub-regions written to file C:\path\to\output\unsd_sub_region.csv
INFO: UNSD M49 intermediate regions written to file C:\path\to\output\unsd_intermed_region.csv
INFO: UNSD M49 countries and areas written to file C:\path\to\output\unsd_country_area.csv
INFO: UNSD M49 development status written to file C:\path\to\output\unsd_dev_status.csv
INFO: Source file read C:\path\to\input\unesco_heritage_sites.csv
INFO: UNESCO heritage site countries/areas written to file C:\path\to\output\unesco_heritage_site_country_area.csv
INFO: UNESCO heritage site categories written to file C:\path\to\output\unesco_heritage_site_category.csv
INFO: UNESCO heritage site regions written to file C:\path\to\output\unesco_heritage_site_region.csv
INFO: UNESCO heritage site transboundary values written to file C:\path\to\output\unesco_heritage_site_transboundary.csv