Pinned Repositories
airflow-aws-shared-secrets
SecretsManagerBackend with cross-account access
awesome-spark
A curated list of awesome Apache Spark packages and resources.
data-platform-tutorial
Data Platform Tutorial
mini-data-platform
Mini Data Platform
mlflow-workshop
First steps to interact with MLflow (mlflow.org)
pytest-dbt-postgres
Unit tests for dbt Postgres projects
rabbitmq-poc
Notification System PoC with Delayed/Expired queues.
afranzi's Repositories
afranzi/mlflow-workshop
First steps to interact with MLflow (mlflow.org)
afranzi/mini-data-platform
Mini Data Platform
afranzi/pytest-dbt-postgres
Unit tests for dbt Postgres projects
afranzi/airflow-aws-shared-secrets
SecretsManagerBackend with cross-account access
afranzi/awesome-spark
A curated list of awesome Apache Spark packages and resources.
afranzi/rabbitmq-poc
Notification System PoC with Delayed/Expired queues.
afranzi/airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
afranzi/aws-glue-data-catalog-client-for-apache-hive-metastore
The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore compatible, metadata repository. Customers can use the Data Catalog as a central repository to store structural and operational metadata for their data. AWS Glue provides out-of-box integration with Amazon EMR that enables customers to use the AWS Glue Data Catalog as an external Hive Metastore. This is an open-source implementation of the Apache Hive Metastore client on Amazon EMR clusters that uses the AWS Glue Data Catalog as an external Hive Metastore. It serves as a reference implementation for building a Hive Metastore-compatible client that connects to the AWS Glue Data Catalog. It may be ported to other Hive Metastore-compatible platforms such as other Hadoop and Apache Spark distributions.
afranzi/data-access-layer
Library to facilitate accessing Data from Databricks
afranzi/datahub
A Generalized Metadata Search & Discovery Tool
afranzi/airflow-charts
The User-Community Airflow Helm Chart is the standard way to deploy Apache Airflow on Kubernetes with Helm. Originally created in 2017, it has since helped thousands of companies create production-ready deployments of Airflow on Kubernetes.
afranzi/datahub-helm
Repository of helm charts for deploying DataHub on a Kubernetes cluster
afranzi/dbt-common
afranzi/dbt-core
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
afranzi/helm-charts
Lightdash Community helm charts
afranzi/json-schema
JSON Schema validator for Java, based on the org.json API
afranzi/kafdrop
Kafka Web UI
afranzi/kafka-connect-field-and-time-partitioner
Kafka Connect Store Partitioner by a custom field and time
afranzi/loka-tor
Lokalise Chrome Extension
afranzi/prefect
The easiest way to automate your data
afranzi/prefect-poc
Prefect Flow Evaluation
afranzi/presto
Distributed SQL query engine for big data
afranzi/quinn
PySpark methods to enhance developer productivity 📣 👯 🎉
afranzi/redis-poc
Notification System PoC with ZSETs using the time to send as Scores
afranzi/rudderstack-helm
Open-source, warehouse-first Customer Data Pipeline and Segment-alternative. Collects and routes clickstream data and builds your customer data lake on your data warehouse.
afranzi/rust-efimer
PoC to try & learn Rust
afranzi/scala-skeleton
Scala Skeleton
afranzi/spark-daria
Essential Spark extensions and helper methods ✨😲
afranzi/spark-json-schemas
Create Spark schemas using JSON-schemas
afranzi/thunderstruck
CDP based on ray.io