Using an API within a Cloud Function to load content into Google Cloud Storage.
In this case I am using the Google Trends API (Pytrends).
There are 2 folders in this repo:
- The Option A folder is for programmers who know their code works perfectly on their PC and just want to get the very same .csv files into a Cloud Storage bucket. The fast, cool way.
- Option B is for programmers who, for some reason, need to check things out: they want to run the script locally and also reach the bucket, loading the CSV into it. Once the script runs as you want, it will work fine within the Cloud Function.
Working with objects:
- Option C: Suppose you have a pipeline within a Cloud Function that generates pictures (with Matplotlib, for instance), and you need to send them to Cloud Storage to work with them.
If that is your case, use this script (copied literally from the Google documentation; good job, Google). Save your generated figures in ../tmp/ and then use this function, as sketched below.
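For reference, a minimal sketch of such an upload helper, based on the standard google-cloud-storage client (the bucket, file and path names below are placeholders, not this repo's values):

```python
from google.cloud import storage


def upload_blob(bucket_name, source_file_name, destination_blob_name):
    """Upload a local file (e.g. a Matplotlib figure saved in the tmp folder) to a GCS bucket."""
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)


# Example: after plt.savefig("../tmp/my_figure.png")
# upload_blob("my-bucket", "../tmp/my_figure.png", "figures/my_figure.png")
```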
Option A:
- You need a Cloud Storage bucket ready to play with. If you do not have one, check it out here => https://github.com/albertovpd/automated_etl_google_cloud-social_dashboard
Now to the code itself: make the script work on your PC. When it works as you want (check the comments in pytrends_request.py):
- Activate the first and last line of the script.
- Zip all the scripts in option_a.
- Upload that zip to the Cloud Function (take a glance at the link provided before).
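As an illustration only (not the repo's exact code), a minimal Pytrends request written to a CSV could look like this; the keyword, timeframe and output path are placeholders:

```python
from pytrends.request import TrendReq


def pytrends_to_csv():
    # Open a Google Trends session and request interest over time for a keyword.
    pytrends = TrendReq(hl="en-US", tz=360)
    pytrends.build_payload(kw_list=["python"], timeframe="today 3-m")
    df = pytrends.interest_over_time()

    # Locally any path works; inside the Cloud Function point the output to the
    # tmp folder (or straight to gs://your-bucket/file.csv if gcsfs is installed).
    df.to_csv("../tmp/google_trends.csv")
```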
Option B:
- You will need service account credentials: a JSON file with permissions. Why and how to get it is described (in the corresponding section) here => https://github.com/albertovpd/automated_etl_google_cloud-social_dashboard
- Once you have your beautiful JSON, do not forget to have a .gitignore file. Open it and add .json and .env on separate lines. Now write your environment variables in the .env file (see the sketch after this list):
  TOKEN_NAME="the name of your credentials.json"
  PROJECT_NAME="your project in Google Cloud"
  PROJECT_PATH="your bucket in GCS/the_csv_name_there.csv"
  PROJECT_TMP="../tmp/the_csv_you_will_create.csv"
- Follow the architecture of the folder option_b_gcsfs_library: everything in the same folder, with all outputs pointing to ../tmp (that folder will be generated by the Cloud Function).
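A minimal sketch of how those variables could be read and used to push the CSV into the bucket with gcsfs. Reading the .env file through python-dotenv is an assumption for local runs; inside the Cloud Function the variables come from its own configuration instead:

```python
import os

import gcsfs
from dotenv import load_dotenv  # assumption: python-dotenv is used for local runs

load_dotenv()  # reads the .env file when running on your PC

TOKEN_NAME = os.environ["TOKEN_NAME"]      # e.g. "credentials.json"
PROJECT_NAME = os.environ["PROJECT_NAME"]  # your Google Cloud project
PROJECT_PATH = os.environ["PROJECT_PATH"]  # "your-bucket/the_csv_name_there.csv"
PROJECT_TMP = os.environ["PROJECT_TMP"]    # "../tmp/the_csv_you_will_create.csv"

# Authenticate against GCS with the service-account JSON and copy the local CSV
# into the bucket, overwriting the object if it already exists.
fs = gcsfs.GCSFileSystem(project=PROJECT_NAME, token=TOKEN_NAME)
fs.put(PROJECT_TMP, PROJECT_PATH)
```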
Once the code works as you want:
- Zip the elements of this folder (not the folder itself, just the scripts) together with requirements.txt and your JSON credentials, and upload the zip to GCF.
- When creating the Cloud Function, in the Advanced section, write your environment variables (TOKEN_NAME, PROJECT_NAME... and their values). Enter both names and values without declaring them as strings (no quotes).
- You need to run the code manually on your PC first, get the CSV, and upload it to the bucket in GCS: afterwards the process is more like overwriting an existing file than writing a new one.
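Inside the Cloud Function there is no .env file: the variables typed in the Advanced section arrive through the environment and are read with os.environ. A minimal sketch, assuming a Pub/Sub-triggered entry point (the function name and trigger type are placeholders, not necessarily what this repo uses):

```python
import os


def main(event, context):
    """Entry point of the Cloud Function (use whatever name you configured in GCF)."""
    # The variables declared in the Advanced section, written there without quotes.
    project_name = os.environ["PROJECT_NAME"]
    project_path = os.environ["PROJECT_PATH"]
    project_tmp = os.environ["PROJECT_TMP"]

    # ...run the Pytrends request, write the CSV to project_tmp,
    # then upload/overwrite it in the bucket as shown in the gcsfs sketch above.
    return "ok"
```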
Warning: the problem with the Google Trends API.
Check info about it, for example, here:
Does that mean Pytrends is useless? Of course not. It works as follows:
- Google Trends takes the maximum over the specified period, makes that maximum 100% of the Trend Index, and scales everything else against that top. If you request information weekly, you will have a point with a 100% Trend Index every week.
- If you request a list of elements, all elements will be scaled against the top one.
- If you request each of your keywords separately, each keyword will be scaled over time against its own top (a sketch follows below).
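To make the last point concrete, this is a sketch (not this repo's code) of requesting each keyword in its own payload so every series is normalised by its own maximum; the keywords and timeframe are placeholders:

```python
import pandas as pd
from pytrends.request import TrendReq

keywords = ["bitcoin", "ethereum", "dogecoin"]  # placeholder keywords
pytrends = TrendReq(hl="en-US", tz=360)

series = []
for kw in keywords:
    # One payload per keyword => each series peaks at 100 independently.
    pytrends.build_payload(kw_list=[kw], timeframe="today 12-m")
    df = pytrends.interest_over_time()
    if not df.empty:
        series.append(df[kw])

# Unlike a single multi-keyword request, the columns are not comparable in
# absolute terms, but each one shows its own trend at full resolution.
result = pd.concat(series, axis=1)
```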
This Cloud Function is part of a full ETL in Google Cloud. Complete instructions are available at the following links:
- Without Dataprep => https://github.com/albertovpd/automated_etl_google_cloud-social_dashboard
- With Dataprep, before it became stupidly expensive => https://towardsdatascience.com/creation-of-an-etl-in-google-cloud-platform-for-automated-reporting-8a0309ee8a78