US East (N. Virginia) us-east-1
Key File | Description |
---|---|
cleaning.py | Contains code to create the accumulative file for input into the database. This will need to be run after each year. |
cumulative.sql | Contains the sql code to create the database. |
run_mysql_script.py | Contains code to create the accumulative file for input into the database. This will need to be run after each year. |
In root directory:
pip install virtualenv #if you don't have virtualenv installed
Create virtualenv
virtualenv <Name_of_Virtual_Environment>
Activate virtualenv
source <Name_of_Virtual_Environment>/bin/activate
Install project requirements usings the requirements.text
pip install -r requirements.txt
Use (MySQL Workbench)[https://www.mysql.com/products/workbench/] to configure a database quickly for the upcoming steps. After configuring mysql, open mysql workbench and do the following:
- Create a connection. Make sure to remember all the information you enter in here! This is the information you'll be entering in the
config/my.yaml
file. - Enter the information you used to create a connection into a *.yaml file in the
config
folder. The rest of this setup assumes you've used root as your username (which isn't a good practice). Be warned - (Optional) Create a database/schema named
cio4good
INSIDE the connection you just created
- Run the
cleaning.py
file.
python cleaning.py
- Run
run_mysql_script.py
as follows, tailoring the *.yaml and *.sql file paths as necessary (see Troubleshooting below if errors occur with loading data in this step):
macOS
python3 run_mysql_script.py -c ./path/to/config/file/*.yaml -p ./path/to/sql/script/*.sql
Windows
python run_mysql_script.py -c ./path/to/config/file/*.yaml -p ./path/to/sql/script/*.sql
so
python run_mysql_script.py -c ./config/my.yaml -p ./cumulative.sql
- In your terminal run
mysqldump
to create a.sql
file
sudo mysqldump cio4good cio4good -p > mySQLdump.sql
This way works if your username/user is root.
Now we have to push our MySQL database to AWS. Here's the reference material for this section
- Run the aws connection string with to input the new database.
mysql -h <awsendpoint> -P 3306 -u <mymasteruser> -p
- Double check some queries to make sure that the import was successful and all fields mapped.
Example:
mysql -h <HOSTNAME> -P <PORT> -u <MASTER_USER> -p <DATABASE> < <PATH/TO/DUMP.SQL>
This python script is designed to process MySQL scripts. The run_mysql_script.py script requires a valid database connection provided via local *.yaml config file. After opening a connection and creating a cursor, the script creates a list of SQL statements after splitting the SQL script on each semi-colon encountered (;). The script then loops through the statements, attempting to execute each. If successful, the script commits the changes, closes the cursor and then closes the connection. Otherwise, it rolls back the transaction and reports the error encountered.
Create a .yaml configuration file. The run_mysql_script.py script reads this file in order to retrieve the database connection settings. Add the following database connection settings to your .yaml file. Make sure you set the user and passwd variables to the correct values.
mysql: host: localhost port: [yer port number, typically 3306] user: [yer MySQL user] passwd: [yer MySQL user password] db: unesco_heritage_sites local_infile: True
The run_mysql_script.py features the following arguments:
-h, --help (show this help message and exit) -c, --config (path to config file) -p, --path (path to script)
Make sure to validate the data after running the cleaning.py
file to create the agrgregate CSV file for input into the database. There is a known issue with the # of Organization Staff
field putting data from 2001 in a different column of it's own toward the beginning and the rest in a duplicate column in subsequent years. Just move them over into the later column.
If you run into issues with step #2 of creating a MySQL database, verify that the data is being loaded from a local infile (in run_mysql_script.py
) and the mysql 'local_infile' variable is set to ON (mysql> SET GLOBAL local_infile=1
). (Reference: https://stackoverflow.com/questions/59993844/error-loading-local-data-is-disabled-this-must-be-enabled-on-both-the-client)