All query information is stored in the bigquery_config.json file and can be configured there.
Please complete all of the following steps for the scripts to work properly:
- Set up a virtual environment
- pip install -r requirements.txt
- Install MySQL Server and write the host, user, and password in sql_config.json. If your OS is Ubuntu, you can use this. Make sure to set your user's authentication method to "mysql_native_password"; the default plugin is "auth_socket". CSV data insertion doesn't work for now.
- Download the Google API credentials for your project
- Create a bucket in Cloud Storage
- In bigquery_config.json, set target_project_id according to your credentials and bucket_name to the bucket created in the previous step
- In bigquery_config.json, set target_dataset_id, target_table_id, and target_column as you wish (a sketch of how both config files are read follows this list)
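
For reference, a minimal sketch of how the two config files can be loaded; only the keys mentioned in the steps above are shown, and anything beyond them is an assumption about your local setup:

```python
import json

# Minimal sketch: key names follow the setup steps above; any other keys in your
# local config files are not covered here.
with open("sql_config.json") as f:
    sql_cfg = json.load(f)   # expects "host", "user", "password"

with open("bigquery_config.json") as f:
    bq_cfg = json.load(f)    # expects "target_project_id", "bucket_name",
                             # "target_dataset_id", "target_table_id", "target_column"

print(bq_cfg["target_project_id"], bq_cfg["bucket_name"])
```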
big_query_manager.py
- cast_to_timestamp(): casts "repository_created_at" to the TIMESTAMP type for later querying
- export_table_to_storage(): exports a table to Cloud Storage; tables larger than 500 MB are exported in chunks
- create_query_table(): queries a table and saves the result in a new table
- download_blob(): downloads a directory from Cloud Storage
- get_date_range(): queries a date interval and downloads the results
- create_sql_from_table(): builds a "CREATE TABLE" script from the schema of "target_table_id" (a rough sketch of the underlying client calls follows this list)
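
The functions above live in big_query_manager.py itself; as a rough illustration only, here is a hedged sketch of the kind of google-cloud-bigquery / google-cloud-storage calls they correspond to. All project, dataset, table, and bucket names below are placeholders, and the 500 MB chunking logic belongs to the manager, not to these client calls:

```python
from google.cloud import bigquery, storage  # pip install google-cloud-bigquery google-cloud-storage

# Placeholder values; the real ones come from bigquery_config.json (see above).
PROJECT = "my-project"                       # target_project_id
DATASET, TABLE = "my_dataset", "my_table"    # target_dataset_id, target_table_id
BUCKET = "my-bucket"                         # bucket_name

bq = bigquery.Client(project=PROJECT)

# create_query_table()-style: run a query and save the result to a new table.
# The query also shows a cast_to_timestamp()-style cast (only the cast column
# is selected here, for brevity).
dest = f"{PROJECT}.{DATASET}.{TABLE}_filtered"
job_config = bigquery.QueryJobConfig(destination=dest, write_disposition="WRITE_TRUNCATE")
bq.query(
    f"SELECT TIMESTAMP(repository_created_at) AS repository_created_at "
    f"FROM `{PROJECT}.{DATASET}.{TABLE}`",
    job_config=job_config,
).result()

# export_table_to_storage()-style: the wildcard URI lets BigQuery shard the
# export into multiple files in the bucket.
bq.extract_table(dest, f"gs://{BUCKET}/{TABLE}/part-*.csv").result()

# download_blob()-style: pull every exported file back down locally.
gcs = storage.Client(project=PROJECT)
for blob in gcs.list_blobs(BUCKET, prefix=f"{TABLE}/"):
    blob.download_to_filename(blob.name.replace("/", "_"))

# create_sql_from_table()-style: build a CREATE TABLE script from the table schema.
schema = bq.get_table(f"{PROJECT}.{DATASET}.{TABLE}").schema
columns = ",\n  ".join(f"{field.name} {field.field_type}" for field in schema)
print(f"CREATE TABLE {TABLE} (\n  {columns}\n);")
```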
sql_table_manager.py: A simple SQL manager suitable for this project's purposes only
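
The actual interface is defined in the script itself; the snippet below is only a minimal sketch of such a manager, assuming mysql-connector-python and the sql_config.json keys from the setup steps:

```python
import json
import mysql.connector  # pip install mysql-connector-python


class SqlTableManager:
    """Minimal sketch of a project-specific MySQL manager (not the real implementation)."""

    def __init__(self, config_path="sql_config.json"):
        with open(config_path) as f:
            cfg = json.load(f)
        # Relies on the "mysql_native_password" auth method mentioned in the setup steps.
        self.conn = mysql.connector.connect(
            host=cfg["host"], user=cfg["user"], password=cfg["password"]
        )

    def execute(self, statement, params=None):
        """Run a single statement; return rows for SELECTs, [] otherwise."""
        cur = self.conn.cursor()
        cur.execute(statement, params or ())
        rows = cur.fetchall() if cur.with_rows else []
        self.conn.commit()
        cur.close()
        return rows
```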
TO-DO:
- Insert CSV files into the SQL table (one possible approach is sketched below)
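
One possible approach for the CSV item (a sketch only, not existing functionality): read rows with the standard csv module and bulk-insert them with parameterized statements. The connection object is assumed to come from a manager like the sketch above, and the column order of the target table is assumed to match the file:

```python
import csv


def insert_csv(conn, csv_path, table_name):
    """Bulk-insert a CSV file into an existing MySQL table (column order must match the file)."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)                           # skip the header row
        placeholders = ", ".join(["%s"] * len(header))
        cur = conn.cursor()
        cur.executemany(f"INSERT INTO {table_name} VALUES ({placeholders})", list(reader))
        conn.commit()
        cur.close()
```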