We would like to analyze the market data for indices and track daily market changes.
We will use the Deriv API to retrieve market data for indices and various other assets. Refer to the API documentation here for more details.
Data of Interest:
- Dimensions: Symbol, Country, Asset
- Closing tick: our primary interest is capturing closing tick data for a select group of actively traded symbols, which will be stored in BigQuery.
- Daily candle data for the past 30 days for each symbol.
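Both items above can be requested through the Deriv API's `ticks_history` call. The following is a minimal sketch, assuming the public WebSocket schema where `style` is either `ticks` or `candles` and `granularity` is given in seconds. The helper names and the symbol `R_100` are illustrative and not taken from this project, and the field names should be checked against the API documentation. The sketch only builds the JSON payloads and does not open a connection.

```python
import json

SECONDS_PER_DAY = 86_400  # granularity of one daily candle, in seconds


def last_tick_request(symbol: str) -> dict:
    """Build a payload asking for the single most recent tick of `symbol`."""
    return {
        "ticks_history": symbol,
        "style": "ticks",
        "end": "latest",
        "count": 1,
    }


def daily_candles_request(symbol: str, days: int = 30) -> dict:
    """Build a payload asking for the last `days` daily candles of `symbol`."""
    return {
        "ticks_history": symbol,
        "style": "candles",
        "granularity": SECONDS_PER_DAY,
        "end": "latest",
        "count": days,
    }


if __name__ == "__main__":
    # "R_100" is a placeholder symbol used only for illustration.
    for payload in (last_tick_request("R_100"), daily_candles_request("R_100")):
        print(json.dumps(payload))
```

Each printed line is the JSON message that would be sent over the WebSocket connection for one symbol.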
- The plan is to extract market data, store it in BigQuery for further analysis, and create a dashboard in Data Studio.
- To start, we will create a data pipeline that fetches the last tick for some active symbols and stores them in GCS (which will act as our data lake). We also plan to store the historical data for the last 30 days for each symbol, then load it into our data warehouse, BigQuery.
- This data pipeline will be created using Google Cloud Composer, a fully managed workflow orchestration service that empowers you to author, schedule, and monitor pipelines that span across clouds and on-premises data centers.
- Next, we'll model our data using dbt, a command-line tool that allows data analysts and engineers to effectively transform data in their warehouses.
- Finally, we will create a dashboard in Data Studio to visualize the data.
- Check the dashboard here 📈📊.
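Before files in GCS can be loaded into BigQuery, the API responses need a load-friendly layout such as newline-delimited JSON. The sketch below shows one way to flatten daily candles into that layout. It assumes each candle is a dict with `epoch`, `open`, `high`, `low` and `close` keys; the function name and output column names are hypothetical and may differ from what this project's DAGs actually write.

```python
import json
from datetime import datetime, timezone
from typing import Iterable


def candles_to_ndjson(symbol: str, candles: Iterable[dict]) -> str:
    """Flatten candle dicts into newline-delimited JSON, one row per day."""
    lines = []
    for candle in candles:
        # Convert the epoch (seconds since 1970, UTC) to a calendar date.
        day = datetime.fromtimestamp(candle["epoch"], tz=timezone.utc).date()
        row = {
            "symbol": symbol,
            "date": day.isoformat(),
            "open": candle["open"],
            "high": candle["high"],
            "low": candle["low"],
            "close": candle["close"],
        }
        lines.append(json.dumps(row))
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = [{"epoch": 86400, "open": 1.0, "high": 2.0, "low": 0.5, "close": 1.5}]
    print(candles_to_ndjson("R_100", sample))
```

Writing one JSON object per line keeps each file appendable and lets BigQuery load it as newline-delimited JSON without further transformation.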
- A Google Cloud Platform account. If you do not have a GCP account, create one now from here.
- The gcloud CLI installed locally.
- Terraform 0.15.3+ installed locally.
- Docker installed locally.
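The prerequisites above can be checked before starting the setup. The following is a small sketch, not part of this project, that looks for the three command-line tools on `PATH` and compares the installed Terraform version against 0.15.3. It assumes `terraform version` prints a first line of the form `Terraform vX.Y.Z`.

```python
import re
import shutil
import subprocess

REQUIRED_TOOLS = ("gcloud", "terraform", "docker")
MIN_TERRAFORM = (0, 15, 3)


def parse_terraform_version(output: str) -> tuple:
    """Extract (major, minor, patch) from the output of `terraform version`."""
    match = re.search(r"Terraform v(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("could not find a Terraform version in the output")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version: tuple, minimum: tuple = MIN_TERRAFORM) -> bool:
    """Tuples compare element by element, so (1, 0, 0) >= (0, 15, 3)."""
    return version >= minimum


if __name__ == "__main__":
    missing = [tool for tool in REQUIRED_TOOLS if shutil.which(tool) is None]
    for tool in missing:
        print(f"missing: {tool}")
    if "terraform" not in missing:
        result = subprocess.run(
            ["terraform", "version"], capture_output=True, text=True
        )
        version = parse_terraform_version(result.stdout)
        status = "ok" if meets_minimum(version) else "too old"
        print("terraform", ".".join(map(str, version)), status)
```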
Check the full architecture here
- Log in to create Application Default Credentials:
  gcloud auth application-default login
- Display the project IDs for your Google Cloud projects:
  gcloud projects list
- Using the applicable project ID from the previous step, set the default project to the one in which you want to enable the API:
  gcloud config set project YOUR_PROJECT_ID
- Display the project number for your Google Cloud project:
  gcloud projects describe YOUR_PROJECT_ID
- Open terraform/terraform.tfvars in your text editor, and paste in the configuration below. Be sure to replace <PROJECT_ID> with your project's ID and <PROJECT_NUMBER> with your project's number, then save the file.
- Open terraform/variables.tf in your text editor and replace the default value of <market_data_bucket> with a name of your choice, then save the file. Bucket names must be unique across all GCP projects.
- Enable the Compute Engine API:
  gcloud services enable compute.googleapis.com
- Enable the Cloud Composer API:
  gcloud services enable composer.googleapis.com
- Go to the terraform directory:
  cd terraform
- Run terraform init to initialize the Terraform configuration.
- Run terraform plan to view the resources that Terraform will create.
- Run terraform apply -auto-approve to create the resources.
- Run terraform show to view the resources that Terraform created.
Because of quota limits on environment size and the number of workers, you may need to run the DAGs in batches, enabling them one at a time. Avoid running all the DAGs at once, as this may exceed the worker quota and cause the environment to fail.
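One way to enable DAGs in small batches is to generate the unpause commands up front and run them group by group. The sketch below assumes an Airflow 2 environment, where the CLI subcommand is `dags unpause`, and uses the `gcloud composer environments run` wrapper. The environment name, location and DAG IDs are placeholders, not this project's actual values. The script only prints the commands; it does not run them.

```python
import shlex


def batches(items: list, size: int):
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def unpause_command(environment: str, location: str, dag_id: str) -> list:
    """Build the gcloud command that unpauses one DAG in a Composer environment."""
    return [
        "gcloud", "composer", "environments", "run", environment,
        "--location", location,
        "dags", "unpause", "--", dag_id,
    ]


if __name__ == "__main__":
    # Placeholder names: replace with your own environment, region and DAG IDs.
    dag_ids = ["example_dag_a", "example_dag_b", "example_dag_c"]
    for number, group in enumerate(batches(dag_ids, size=1), start=1):
        print(f"# batch {number}")
        for dag_id in group:
            print(shlex.join(unpause_command("my-environment", "us-central1", dag_id)))
```

Waiting for each batch to finish before running the next group of commands keeps the number of concurrently scheduled tasks within the worker quota.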