This is a basic script that performs the following functions:
- Accepts input parameters for data filtering
- Connects to a mySQL database and extracts data from a single table for NY cab data
- Load a csv file for Weather data
- Transforms the mySQL and CSV data into a single aggregated view by date
- Exports the result to a pipe delimited CSV file
- Upload the generated csv file to Azure Blob storage
The script is dependent on 3 input parameters:
- Start Date - Mandatory - yyyy-mm-dd
- End Date - Mandatory - yyyy-mm-dd
- Medallion - Optional - Must be a comma seperated list eg. xxx,xxx
The script can be executed in 2 ways:
- Local runtime; and
- Docker
- Run pip install -r requirements.txt
- Configure the database_config.json file to point to your mySQL datasource
- Execute the assessement.py script
python assessment.py 2013-12-01 2013-12-31 D7D598CD99978BD012A87A76A7C891B7,5455D5FF2BD94D10B304A15D4B7F2735
- To run docker, ensure you have docker-compose installed
- Change directory to the root of the project folder where the docker-compose.yml file is located
- Excute docker up command
docker-compose up db
Wait a few minutes for the database to be created and initialized with the taxi data. The create script locks the tables, so the app won't be able to access the database. After a few minutes run the application
docker-compose up app
If both containers have already been created before the following can command can be executed at any time
docker-compose up
To change the arguments in the docker run, open the docker-compose.yml file and change the parameters as below
command: python assessment.py 2013-12-01 2013-12-31
With both methods, an app.log file will be generated and a results.csv file will be generated if no errors occurred
If the AZURE_STORAGE_CONNECTION_STRING environment variable is set, the output file will be uploaded to Azure Blob storage
In order to run the unit test, follow the follow step:
- pip install -r requirements-test.txt
- cd to project directory
- Execute pytest
pytest
13 Tests will be executed
tests\test_functions.py ............. [100%]
======================================================= 13 passed in 0.71s ============================================