Data Engineering with Google Cloud Platform

This is the code repository for Data Engineering with Google Cloud Platform, published by Packt.

A practical guide to operationalizing scalable data analytics systems on GCP

What is this book about?

This book shows how the highly scalable Google Cloud Platform (GCP) enables data engineers to build end-to-end data pipelines, from storing and processing data, through workflow orchestration, to presenting data in visualization dashboards.

This book covers the following topics:

  • Load data into BigQuery and materialize its output for downstream consumption
  • Build data pipeline orchestration using Cloud Composer
  • Develop Airflow jobs to orchestrate and automate a data warehouse
  • Build a Hadoop data lake, create ephemeral clusters, and run jobs on a Dataproc cluster
  • Leverage Pub/Sub for messaging and ingestion in event-driven systems
  • Use Dataflow to perform ETL on streaming data
  • Unlock the power of your data with Data Studio
  • Estimate the GCP cost of your end-to-end data solutions

If you feel this book is for you, get your copy today!

https://www.packtpub.com/

Instructions and Navigation

All of the code is organized into folders.

The code will look like the following:

html, body, #map {
  height: 100%;
  margin: 0;
  padding: 0;
}

Who this book is for: data engineers, data analysts, and anyone looking to design and manage data processing pipelines using GCP. You'll also find this book useful if you are preparing to take Google's Professional Data Engineer exam. A beginner-level understanding of data science, the Python programming language, and Linux commands is necessary. A basic understanding of data processing and cloud computing in general will help you get the most out of this book.

With the following software and hardware list, you can run all the code files in the book (Chapters 1-12).

Software and Hardware List

To successfully follow the examples in this book, you need a GCP account and project. If, at this point, you don't have a GCP account and project, don't worry. We will cover that as part of the exercises in this book. Occasionally, we will use the free tier from GCP for practice, but be aware that some products might not have free tiers. Notes will be provided if this is the case. All the exercises in this book can be completed without any additional software installation. The exercises will be done in the GCP console that you can open from any operating system using your favorite browser.

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. Click here to download it.

Get to Know the Author

Adi Wijaya is a strategic cloud data engineer at Google. He holds a bachelor's degree in computer science from Binus University and co-founded DataLabs in Indonesia. Currently, he dedicates himself to big data and analytics and has spent a good chunk of his career helping global companies in different industries.