A curated list of awesome Amazon Redshift libraries, utilities, and resources.
- Awesome Redshift
- General Resources
- Contributing
Hands-on workshops to learn Redshift.
- Amazon Redshift Deep Dive Workshop - A hands-on workshop covering topics such as: Data API, Spectrum, Redshift ML, Lambda UDF, Query federation, SageMaker, Apache Hudi, QuickSight, PowerBI, Oracle/SQL Server migrations.
- Redshift Immersion Labs Workshop - A hands-on workshop covering topics such as: ELT, Materialized Views, Data Sharing, and Redshift ML.
Amazon Redshift Serverless resources
- Amazon Redshift Serverless RSQL ETL Framework - A Serverless ETL framework.
- Self-service analytics with Amazon Redshift Serverless - A video session on getting started and best practices with Redshift Serverless.
Data Sharing for sharing data between Redshift clusters
- Seamless Data Sharing Using Amazon Redshift - A hands-on workshop to share live data across Amazon Redshift clusters.
- Optimize Data Pattern using Data Sharing - A hands-on workshop using data sharing to reduce the provisioned storage required to support your workload.
Resources related to Data APIs for accessing Redshift from web services–based applications
- Getting Started with Redshift Data API - A sample project to access Redshift Data API from AWS Lambda.
Resources related to Federated Queries for querying live data from external databases
- Best Practices for Amazon Redshift Federated Queries - A blog post listing best practices to apply when using Redshift Federated Queries.
Resources related to Streaming Ingestion for querying stream data from Amazon Kinesis Data Streams and Amazon Managed Streaming for Apache Kafka
- Real-time Analytics with Amazon Redshift Streaming Ingestion - A blog post describing how to query stream data in the same account.
- Near Real-time Fraud Detection using Amazon Redshift Streaming Ingestion - A blog post describing how to use Amazon Redshift Streaming Ingestion, Amazon Kinesis Data Streams, and Amazon Redshift ML to detect fraud in near real time.
- Cross-Account Streaming Ingestion for Amazon Redshift - A blog post describing how to query stream data across accounts.
- Amazon Redshift Streaming Workshop - A hands-on workshop and sample library to build a near-real-time logistics dashboard using Amazon Redshift and Amazon Managed Grafana.
Resources related to Redshift Spectrum for querying S3 data
- Redshift Spectrum Row and Cell Level Security - A blog post describing how to use row and cell level security defined in AWS Lake Formation.
Collections of user-defined functions (UDFs)
- UDFs Collection - A collection of useful UDFs, such as bitwise ops, URL parsing, masking, KMS encryption, DynamoDB lookups, and converting JSON to upper case.
- Text UDFs - UDFs to analyze text, such as translating, detecting language, detecting sentiment, detecting and redacting entities, detecting and redacting PII.
Resources related to Amazon Redshift ML
- Create and train ML Models using Amazon Redshift ML - A hands-on workshop using Redshift ML to predict customer churn.
- Streaming Ingestion and ML Predictions with Amazon Redshift - A hands-on workshop using Streaming Ingestion and Redshift ML to detect fraud in near real time.
Tools and tips to measure and tune Redshift's performance.
- Top 10 Redshift Performance Tuning Techniques - A blog post outlining performance tuning techniques.
- Test Drive - A collection of utilities and automation to compare performance of different Redshift configurations for a given workload.
- Simple Replay - A library to record your queries and replay them on a different cluster to test performance. (DEPRECATED: Use Test Drive)
- Node Configuration Compare - A library to compare performance of different cluster sizes and configurations by recording and replaying your queries (uses Simple Replay under the hood). (DEPRECATED: Use Test Drive)
- Admin Scripts - A collection of queries and scripts to inspect performance and other administrative tasks.
- Admin Views - A collection of views to inspect performance and other administrative tasks.
- Benchmark Redshift Using TPC-DS and TPC-H - A collection of commands and queries to set up and run TPC-DS/TPC-H on Redshift.
- The adx-tpc-ds Benchmark Scripts - A library to benchmark Redshift without having to generate and load data.
- ClickBench - Compare Analytical DBMS - A comparison of performance of various data warehouses and analytical DBMS.
Redshift connectors and drivers
- Amazon Redshift Python Driver - Amazon Redshift's connector for Python.
- Amazon Redshift JDBC driver - Amazon Redshift JDBC driver.
- Amazon Redshift ODBC driver - Amazon Redshift ODBC driver.
- Amazon Redshift Integration with Apache Spark on EMR and Glue - Connecting to Redshift from Amazon EMR 6.9, EMR Serverless, and AWS Glue 4.0.
- Redshift Data Source for Apache Spark - Community Edition - Connecting to Redshift from Apache Spark - community edition.
- Query Amazon Redshift with Databricks - Connecting to Redshift from Databricks Runtime.
Tools and scripts to automate management and operations of Redshift.
- Visualize Redshift Operational Metrics Using Grafana - A blog post on how to use the Amazon Redshift plugin for Grafana to query and visualize Redshift operational metrics.
- Redshift Stored Procedures - A collection of stored procedures to perform common data tasks, such as integrity checks, permissions, and changing your data model.
- Redshift Automation - A library to automate common tasks using AWS CloudWatch events and AWS Lambda.
- QMR Notifications Utility - A library to set SNS notifications for changes in WLM Query Monitoring Rules (QMR).
Libraries and resources to help integrate Redshift with other frameworks and AWS services
- AWS SDK for Pandas - A library to transfer data between Pandas, Redshift, and other AWS services.
- Best Practices for Leveraging Amazon Redshift and dbt - A white paper covering best practices and performance tuning when using dbt and Amazon Redshift.
- Using DBT with Amazon Redshift Workshop - A hands-on workshop on integrating dbt and Redshift.
- Amazon Redshift Plugin for Grafana - Redshift plugin for Grafana.
- Amazon Redshift SQL Operator - An operator allowing Apache Airflow users to execute statements against Redshift in workflows.
- Use the Amazon Redshift SQLAlchemy dialect to interact with Amazon Redshift - A blog post covering how to use the sqlalchemy-redshift dialect with SQLAlchemy.
- Adding Amazon Redshift Query engine to Querybook - A step-by-step guide showing how to add an Amazon Redshift query engine to Querybook.
- Execute Amazon Redshift Commands using AWS Glue - A library to use an AWS Glue Python Shell Job to execute SQL scripts on Amazon Redshift.
- Amazon Redshift User Defined Functions to Call Amazon Location Service APIs - A library using Lambda-based User Defined Functions (UDF) to call Amazon Location Service APIs.
General resources for Redshift's security
- AWS Summit NY 2022 - Amazon Redshift Security Enhancements - A video session covering authentication, access control, audit, and encryption.
- AWS Config Rules for Redshift Security - An AWS Config Rules conformance pack to apply Amazon Redshift's security best practices.
Integration with SSO providers
- Integrate Amazon Redshift with Microsoft Azure AD - A blog post describing how to integrate Amazon Redshift native IdP federation with Microsoft Azure AD using a SQL client.
- Federate Amazon Redshift Access with Microsoft Azure AD SSO - A blog post describing how to federate Amazon Redshift access with Microsoft Azure AD single sign-on.
- Federate SSO Access to Amazon Redshift with Okta - A blog post describing how to federate single sign-on access to Amazon Redshift query editor v2 with Okta.
- Federate Access to Your Amazon Redshift cluster with Active Directory Federation Services - A 3-part blog post describing how to federate access to an Amazon Redshift cluster with Active Directory Federation Services (AD FS).
Using role-based access control (RBAC) to manage database permissions
- Simplify Management of Database Privileges in Amazon Redshift - A blog post providing a step-by-step guide to setting up role-based access control.
- Introducing Role Based Access Control (RBAC) in Amazon Redshift - A video providing an overview and step-by-step guide to setting up role-based access control.
Using row-level security (RLS) to gain granular access control
- Achieve Fine-Grained Data Security with Row-Level Access Control - A blog post providing a step-by-step guide to setting up row-level security.
- AWS Summit NY 2022 - Amazon Redshift Security Enhancements - A video session explaining how to protect data with role-based access controls, row-level security, and other AWS security features.
Protect your data using encryption
- Encrypt Amazon Redshift Data Loads with Amazon S3 and AWS KMS - A blog post describing how to encrypt data loads end-to-end.
- Accelerate Resize and Encryption of Amazon Redshift Clusters with Asynchronous Resize - A blog post on how to asynchronously resize and encrypt an existing cluster.
Tools and resources to help reduce Redshift cost
- Cost Optimization Guidelines for Amazon Redshift - A white paper of best practices to optimize Redshift's cost.
- Query to Analyze Redshift's Cost and Usage Report (CUR) - A SQL query to analyze Redshift's cost and usage using Amazon Athena.
- How to Attribute Amazon Redshift Costs to your End-Users - A blog post detailing step-by-step instructions on how to attribute Redshift costs to end users.
Libraries and resources to help provision Redshift using CI/CD tools
- Apply CI/CD DevOps Principles to Amazon Redshift Development - A blog post and accompanying repo with a step-by-step guide to provisioning Redshift as part of a deployment pipeline, using AWS CodeCommit, AWS CodeBuild, and AWS CodePipeline.
- Amazon Redshift Infrastructure Automation - A library to help automate provisioning of Redshift, including data migration.
- Terraform Redshift Example Module - A template repository to deploy Redshift using Terraform.
- CDK Redshift Project - An AWS Cloud Development Kit (CDK) construct to run SQL in Redshift using AWS Step Functions.
Redshift's internal architecture and design
- Amazon Redshift Re-invented - A paper outlining Redshift's internal system architecture, data organization, and query processing flow.
Blogs, forums, and other online Redshift resources
- AWS re:Post - A Q&A forum and knowledge sharing community.
- AWS Big Data Blog - AWS official data blog.
- AWS Event YouTube Channel - Recorded presentations and talks from AWS events, such as re:Invent and AWS summits.
Your contributions are always welcome! Please take a look at the contribution guidelines first.
We will keep some pull requests open if we aren't sure whether those resources are awesome; you can vote for them by adding 👍 to them.