


WARNING: This project is no longer under active development; further improvements are made by customers and their partners.

Initial Setup

Enable APIs

Enable the following APIs & Services in your GCP Project:

You can ignore the prompt to create credentials for each API.

Environment Setup

  • Check the gcloud configuration to confirm the project ID and region.

    gcloud config list

    If the intended project is not set, run:

    gcloud config set project <project name>
    gcloud auth login
  • Edit setup/set_env.sh and set the properties to the desired values.

  • Set up the required environment variables in your console session:

    source setup/set_env.sh
  • Set up "Private Google Access" on the subnet in the region to be used. This allows Dataflow to start worker VMs without assigning them external IP addresses.

    gcloud compute networks subnets update default \
        --region ${GCP_REGION} \
  • Create the GCS Bucket and BQ Dataset:

    gsutil ls -L "gs://${GCP_BUCKET}" 2>/dev/null \
        || gsutil mb -c regional -l "${GCP_REGION}" "gs://${GCP_BUCKET}"
    bq --location=${BQ_LOCATION} mk --dataset \
        --description "${THIS_PROJECT} working dataset." \
  • Create a Cloud Composer environment (This step will take ~40min):

    gcloud composer environments create $GCP_COMPOSER_ENV_NAME \
        --location=$GCP_REGION \
        --disk-size=20 \
        --python-version=3 \
  • Update the installed dependencies with requirements.txt (This step will take ~45min):

      gcloud composer environments update $GCP_COMPOSER_ENV_NAME \
        --location=$GCP_REGION \
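
Every command in this section relies on variables exported by setup/set_env.sh, so a quick check that none of them is empty can save a failed long-running step. The following is a minimal sketch, not part of the project: the function name missing_env_vars is hypothetical, and which names to pass is an assumption based on the variables that appear in the commands in this guide.

```shell
# Hypothetical helper (not part of this repository): print the name of every
# variable in the argument list that is unset or empty in the current shell.
missing_env_vars() {
  for name in "$@"; do
    # Indirect lookup of the variable called $name; ":-" keeps this safe under `set -u`.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf '%s\n' "$name"
    fi
  done
}
```

After sourcing setup/set_env.sh, a call such as the following should print nothing; any name it prints is not set in the current session:

    missing_env_vars GCP_PROJECT_ID GCP_PROJECT_NUMBER GCP_REGION GCP_BUCKET \
        BQ_LOCATION GCP_COMPOSER_ENV_NAME DATA_STORAGE_PROJECT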

Grant service account permissions

  • Create an AutoML Service Account:

      gcloud iam service-accounts create service-${GCP_PROJECT_NUMBER}@gcp-sa-automl.iam.gserviceaccount.com \
          --description="Auto ML service" \
          --display-name="Auto ML service"
  • Grant Owner permissions to the default compute service account:

      gcloud projects add-iam-policy-binding $GCP_PROJECT_ID \
        --member="serviceAccount:${GCP_PROJECT_NUMBER}-compute@developer.gserviceaccount.com" \
  • Grant Data Viewer permissions to the AutoML Tables service account:

      gcloud projects add-iam-policy-binding $DATA_STORAGE_PROJECT \
        --member="serviceAccount:service-${GCP_PROJECT_NUMBER}@gcp-sa-automl.iam.gserviceaccount.com" \
      gcloud projects add-iam-policy-binding $GCP_PROJECT_ID \
        --member="serviceAccount:service-${GCP_PROJECT_NUMBER}@gcp-sa-automl.iam.gserviceaccount.com" \
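
The service account addresses used in the commands above follow fixed patterns built from the project number. The sketch below shows how those strings are assembled; the function names are hypothetical and not part of the project, and the lookup command in the comment is only one way to obtain the project number if GCP_PROJECT_NUMBER is not already set.

```shell
# Hypothetical helpers (not part of this repository) that build the service
# account addresses used in the IAM commands above from a project number.
# The number can be looked up with, for example:
#   gcloud projects describe "$GCP_PROJECT_ID" --format="value(projectNumber)"

# Default Compute Engine service account.
compute_service_account() {
  printf '%s-compute@developer.gserviceaccount.com\n' "$1"
}

# AutoML service account.
automl_service_account() {
  printf 'service-%s@gcp-sa-automl.iam.gserviceaccount.com\n' "$1"
}
```

For example, automl_service_account 123456789 prints service-123456789@gcp-sa-automl.iam.gserviceaccount.com, which is the value that follows serviceAccount: in the --member flag.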

Set Up Cloud Composer Variables

  • If kubectl is not installed, run the following command:

    sudo apt-get install kubectl
  • Copy the generated variables.json file to the environment and import it:

    gcloud composer environments storage data import \
      --environment=$GCP_COMPOSER_ENV_NAME \
      --location=$GCP_REGION \
    gcloud composer environments run $GCP_COMPOSER_ENV_NAME \
      --location $GCP_REGION \
      variables -- --i /home/airflow/gcs/data/variables.json
  • Upload the features to the Composer environment (modify the features.json file if needed):

    gcloud composer environments storage data import \
      --environment=$GCP_COMPOSER_ENV_NAME \
      --location=$GCP_REGION \
    gcloud composer environments run $GCP_COMPOSER_ENV_NAME \
      --location $GCP_REGION \
      variables -- --i /home/airflow/gcs/data/features.json
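
The kubectl installation step in this section applies only when the tool is missing. A minimal way to check, sketched here for a POSIX shell with a hypothetical function name that is not part of the project:

```shell
# Hypothetical helper (not part of this repository): succeed if the named
# program can be found on PATH, fail otherwise.
have_command() {
  command -v "$1" >/dev/null 2>&1
}

# Print a hint instead of installing anything automatically.
if have_command kubectl; then
  echo "kubectl is already installed"
else
  echo "kubectl not found; install it before importing the Composer variables"
fi
```

The check only reports the state of the session; it installs nothing, so the install command stays an explicit, manual step.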

Generate ML Windowing Pipeline Templates

  • In a working directory on your local machine, clone the "Cloud for Marketing" git repository and initiate the build (This step will take ~5 min):

      git clone https://github.com/GoogleCloudPlatform/cloud-for-marketing.git
      cd cloud-for-marketing/marketing-analytics/predicting/ml-data-windowing-pipeline/ && \
      gcloud builds submit \
        --config=cloud_build.json \
        --substitutions=_BUCKET_NAME=${GCP_BUCKET} && \
      cd ../../../../
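
The build command above chains cd, gcloud builds submit, and a final cd ../../../../ with &&, so if the build fails the last cd never runs and the shell is left inside the cloned repository. One alternative, sketched here with a hypothetical helper that is not part of the project, is to run the command in a subshell so the caller's working directory is never changed.

```shell
# Hypothetical helper (not part of this repository): run a command inside a
# directory without changing the caller's working directory. The parentheses
# start a subshell, so the `cd` is undone when the command finishes or fails.
run_in_dir() {
  target_dir=$1
  shift
  ( cd "$target_dir" && "$@" )
}
```

Under that assumption, the build step could be written as:

    run_in_dir cloud-for-marketing/marketing-analytics/predicting/ml-data-windowing-pipeline \
        gcloud builds submit \
        --config=cloud_build.json \
        --substitutions=_BUCKET_NAME=${GCP_BUCKET}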

Update DAGs

  • Get the GCS location associated with your Composer instance:

    export BB_DAG_BUCKET=$(
      gcloud composer environments describe $GCP_COMPOSER_ENV_NAME \
      --location $GCP_REGION \
      --format "value(config.dagGcsPrefix)")
  • Copy the contents of the dags folder to that location:

    gsutil -m cp -r ./dags/* ${BB_DAG_BUCKET}
  • Verify that the DAGs loaded correctly. Run the following:

      gcloud composer environments run $GCP_COMPOSER_ENV_NAME \
        --location $GCP_REGION \

    If you completed all of the steps above correctly, the result will include:

  • Open the Airflow UI. You can find the URL on the Cloud Composer page in GCP, or run the following to get the URL (note that it may take 1-2 minutes for the UI to show the new DAGs):

      gcloud composer environments describe $GCP_COMPOSER_ENV_NAME \
        --location $GCP_REGION \
        --format "value(config.airflowUri)"
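
In the DAG copy step of this section, BB_DAG_BUCKET is filled from the output of a gcloud command. If that command fails, the variable is empty and the gsutil copy will not go where intended. A small guard can catch this; the sketch below uses a hypothetical function name that is not part of the project.

```shell
# Hypothetical helper (not part of this repository): succeed only if the
# argument looks like a Cloud Storage URI, i.e. "gs://" followed by at least
# one more character.
is_gcs_uri() {
  case "$1" in
    gs://?*) return 0 ;;
    *) return 1 ;;
  esac
}
```

With that helper defined, the copy can be made conditional on the check:

    is_gcs_uri "${BB_DAG_BUCKET}" && gsutil -m cp -r ./dags/* ${BB_DAG_BUCKET}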