/GCPNotes

Quick Notes for Google Cloud Platform

Google Cloud Platform Reference

Notes

API Documents

  • BigQuery (bq)
    • BigQuery Machine Learning (BQML)
  • Storage (gsutil)
  • Google Cloud (gcloud)
    • Project (gcloud projects): config settings
    • Compute Engine (gcloud compute): firewall rules, backend services
    • Pub/Sub (gcloud pubsub)
    • Cloud Functions (gcloud functions)
    • Container (gcloud container): K8s clusters
    • IAM (gcloud iam)
    • IoT (gcloud beta iot)
  • Vision (Vision Annotation)
    • RESTful API
  • Speech (Speech to Text)
    • RESTful API
  • Translation (Translation from one language to another)
    • RESTful API
  • Natural Language (Classify text into categories)
    • RESTful API
  • ML Engine (a.k.a. AI Platform, gcloud ai-platform)
  • Video Intelligence
    • RESTful API
  • Datalab (datalab)

Topics

Machine Learning / Deep Learning / Artificial Intelligence

A workflow showing how to do end-to-end ML or AI work on Google Cloud Platform.

Big Data / Data Engineering

Dataprep: Data Transformation Pipeline via Trifacta
   |
Dataflow: Batch or Streaming Data Processing Pipeline
   |
Dataproc: Hadoop or Spark Computing Core

  • Dataprep: Qwik Start (GSP105)

    • This tutorial helps you preprocess datasets with Dataprep, which is actually a data wrangling tool named Trifacta.
  • Dataprep: Creating a Data Transformation Pipeline with Cloud Dataprep (GSP430)

    • This tutorial guides you through using Dataprep (which is actually Trifacta) to preprocess a BigQuery table and then export the processed results into a new BigQuery table.
  • Dataflow: Qwik Start - Templates (GSP192)

    • This tutorial guides you through using a Dataflow template to process a dataset in BigQuery and insert the processed data into a new BigQuery table.
  • Dataflow: Qwik Start - Python (GSP207)

    • This tutorial guides you through running a Python script on Dataflow. The script processes a dataset stored in a bucket and then writes the result back to the bucket.
  • Dataflow: Run a Big Data Text Processing Pipeline in Cloud Dataflow (GSP047)

    • This tutorial guides you through running a processing task on Dataflow using a given Maven project.
  • Dataproc: Qwik Start - Console (GSP103)

    • This tutorial guides you through using Dataproc, a cloud service for Hadoop or Spark, and submitting a job to run on it.
  • Dataproc: Qwik Start - Command Line (GSP104)

    • This tutorial guides you through using shell commands to operate a cluster on Dataproc and submit a job to run on it.
  • Cloud IoT Core: Building an IoT Analytics Pipeline on Google Cloud Platform (GSP088)

    • The tutorial shows you how to operate the Cloud IoT Core module and its components (registries and devices, the device manager, and the protocol bridge).
    • The tutorial guides you through integrating Cloud IoT Core with the Pub/Sub module, parsing the subscribed data with Dataflow, and finally writing the data into BigQuery.
  • Cloud Pub/Sub: Streaming IoT Kafka to Google Cloud Pub/Sub (GSP285)

    • The tutorial guides you through integrating two different streaming architectures, Kafka and Pub/Sub.
    • The integrated architecture may not be the best solution, but it can serve as a basis for extending chained systems, for example with the Cloud IoT Core module.
  • ETL Processing on GCP Using Dataflow and BigQuery (GSP290)

    • This tutorial shows you how to use the Dataflow service for a specific data processing task such as ETL by running Python scripts on it.
    • After that, you can have the script insert the processed data into a BigQuery table.
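
The extract and transform steps that the Dataflow tutorials above describe can be sketched in plain Python, without any cloud dependency. This is only an illustration, not the lab's actual script: the column names in FIELDS and the helper names parse_csv_line and run_local are hypothetical. In a real Dataflow job, a function like parse_csv_line would typically be applied to each input line with Apache Beam's beam.Map, and the resulting rows written out with a sink such as beam.io.WriteToBigQuery.

```python
import csv
import io
import json

# Hypothetical column names; the lab's actual schema is not given in these notes.
FIELDS = ("name", "year", "count")


def parse_csv_line(line):
    """Turn one CSV line into a dict row keyed by FIELDS, or None if malformed."""
    values = next(csv.reader(io.StringIO(line)), None)
    if values is None or len(values) != len(FIELDS):
        return None
    row = dict(zip(FIELDS, (value.strip() for value in values)))
    try:
        # Numeric columns are converted so they match an integer table schema.
        row["year"] = int(row["year"])
        row["count"] = int(row["count"])
    except ValueError:
        # Header lines and rows with non-numeric values are dropped.
        return None
    return row


def run_local(lines):
    """Apply the transform to every line and keep only the valid rows."""
    rows = (parse_csv_line(line) for line in lines)
    return [row for row in rows if row is not None]


if __name__ == "__main__":
    sample = ["Alice,2019,42", "broken line", "Bob, 2020, 7"]
    for row in run_local(sample):
        # Each dict has the shape a BigQuery row insert would expect.
        print(json.dumps(row))
```

Keeping the transform as a pure function makes it easy to test locally before submitting the job to Dataflow.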

Cloud Infrastructure

The following courses mainly relate to GCP Essentials on Qwiklabs.

Docker with Kubernetes