This repository contains a collection of demos illustrating the integration of Databricks with various AWS native services, focusing on data ingestion, transformation, and serving.
- Databricks Intro
- Ingestion
- Data Transformation/Enrichment
- Data Serving/Consumption
- Prerequisites
- Setup and Running Demos
- Contributing
- License
- Contact
- What is Databricks
- Databricks on AWS - networking considerations
- Ingesting data from S3 with Autoloader (Directory Listing and File Notification)
- Ingesting data from Kinesis
  - Structured Streaming
  - Kinesis Firehose + Autoloader
- Ingesting data from an RDS Database
  - Bulk load using AWS Database Migration Service (DMS)
  - CDC using AWS Database Migration Service (DMS)
- Building an ETL Pipeline using Delta Live Tables
- Building a data pipeline using Databricks Workflows
- Reduce TCO by using AWS Graviton
- Query your Delta Lake using Amazon Athena
- Pushing Gold data to DynamoDB for low latency use cases
- Real-Time ML Inference using SageMaker Serverless Endpoints
- Real-Time ML Inference using Databricks Model Serving V2
- Visualization using Amazon QuickSight
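
As a rough illustration of the two Autoloader file-discovery modes named in the ingestion demos (Directory Listing and File Notification), the sketch below builds the reader options for each mode. It is not taken from the demo notebooks: the helper names `autoloader_options` and `read_with_autoloader`, the bucket, and the schema location are hypothetical, and the stream itself can only be started inside a Databricks runtime where a `spark` session exists.

```python
from typing import Dict


def autoloader_options(file_format: str, schema_location: str,
                       use_notifications: bool = False) -> Dict[str, str]:
    """Build Autoloader ("cloudFiles") reader options for one discovery mode.

    use_notifications=False -> directory listing mode
    use_notifications=True  -> file notification mode
    """
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
        "cloudFiles.useNotifications": str(use_notifications).lower(),
    }


def read_with_autoloader(spark, source_path: str, options: Dict[str, str]):
    """Return a streaming DataFrame. Requires a Databricks `spark` session,
    so this function is defined here but not called."""
    return spark.readStream.format("cloudFiles").options(**options).load(source_path)


# Hypothetical paths, for illustration only.
listing_opts = autoloader_options("json", "s3://example-bucket/_schemas/events")
notify_opts = autoloader_options("json", "s3://example-bucket/_schemas/events",
                                 use_notifications=True)

print(listing_opts["cloudFiles.useNotifications"])  # false
print(notify_opts["cloudFiles.useNotifications"])   # true
```

In a notebook, either dictionary could be passed to `read_with_autoloader(spark, "s3://example-bucket/events/", listing_opts)`. File notification mode depends on SNS and SQS resources for the bucket, so the cluster's IAM role would need the corresponding permissions; the individual demo README is the authority on the exact setup.
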
To run these demos, you will need:
- An AWS account with the permissions needed to create and manage resources
- A Databricks account
- Basic knowledge of AWS services and Databricks
- Clone this repository to your local machine.
- Set up your AWS and Databricks credentials.
- Follow the individual READMEs in each demo's folder to set up and run the demos.
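
Before starting a demo, it can help to confirm that credentials are visible to your local tools. The following is a minimal sketch, assuming credentials are supplied through the environment variables conventionally read by the AWS SDKs and the Databricks CLI/SDK. This repository does not state which mechanism each demo expects, so treat the variable list and the helper name `missing_credentials` as assumptions; AWS profiles or instance roles are equally valid alternatives.

```python
import os
from typing import List, Mapping

# Assumed variable names: the ones conventionally read by the AWS SDKs and
# the Databricks CLI/SDK. A given demo may rely on a different mechanism.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "DATABRICKS_HOST",
    "DATABRICKS_TOKEN",
)


def missing_credentials(env: Mapping[str, str]) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials(os.environ)
    if missing:
        print("Missing credentials: " + ", ".join(missing))
    else:
        print("All expected credential variables are set.")
```

The check only reports whether the variables are present; it does not validate them against AWS or Databricks.
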
We welcome contributions to this project. Please refer to the CONTRIBUTING.md file for more details.
© 2023 Databricks, Inc. All rights reserved. The source in this notebook is provided subject to the Databricks License [https://databricks.com/db-license-source].
For any questions or feedback, please open an issue on this GitHub repository.
We hope you find these demos useful as you explore the capabilities of Databricks on AWS! Happy data engineering!
Please note that the code in this project is provided for your exploration only and is not formally supported by Databricks with Service Level Agreements (SLAs). It is provided AS-IS, and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of this project. The source in this project is provided subject to the Databricks License. All included or referenced third-party libraries are subject to the licenses set forth below.
Any issues discovered through the use of this project should be filed as GitHub Issues on the repository. They will be reviewed as time permits, but there are no formal SLAs for support.