Pinned Repositories
airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
airflow-eks-helm-chart
Airflow helm chart for AWS EKS
airflow-pipeline
Project repository of Apache Airflow, deployed on Docker in Amazon EC2 via GitLab.
airflow-toolkit
On day 1 of any Airflow project, you can spin up a local desktop Kubernetes Airflow environment and one in Google Cloud Composer with tested data pipelines (DAGs) :desktop_computer: >> [ :rocket:, :ship: ]
alpakka-kafka
Alpakka Kafka connector - Alpakka is a Reactive Enterprise Integration library for Java and Scala, based on Reactive Streams and Akka.
amundsen
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
amundsen-custom
Apache-Kafka-Series---Learn-Apache-Kafka-for-Beginners
Code Repository for Apache Kafka Series - Learn Apache Kafka for Beginners, Published by Packt
atom
:atom: The hackable text editor
autoscaler
Autoscaling components for Kubernetes
tufanrakshit's Repositories
tufanrakshit/airflow-pipeline
Project repository of Apache Airflow, deployed on Docker in Amazon EC2 via GitLab.
tufanrakshit/airflow-toolkit
On day 1 of any Airflow project, you can spin up a local desktop Kubernetes Airflow environment and one in Google Cloud Composer with tested data pipelines (DAGs) :desktop_computer: >> [ :rocket:, :ship: ]
tufanrakshit/awesome-scalability
The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
tufanrakshit/dataframe-rules-engine
Extensible Rules Engine for custom Dataframe / Dataset validation
tufanrakshit/datahub
A Metadata Platform for the Modern Data Stack
tufanrakshit/dbt-event-logging
A dbt package that makes auditing dbt runs easy.
tufanrakshit/deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
tufanrakshit/delta
An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
tufanrakshit/dione
Dione - a Spark and HDFS indexing library
tufanrakshit/dr-elephant
Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
tufanrakshit/FASTER
Fast persistent recoverable log and key-value store + cache, in C# and C++.
tufanrakshit/flink-statefun
Apache Flink Stateful Functions
tufanrakshit/fraud-detection-demo
Repository for Advanced Flink Application Patterns series
tufanrakshit/kafka-delta-ingest
A highly efficient daemon for streaming data from Kafka into Delta Lake
tufanrakshit/magellan
Geospatial data analytics on Spark
tufanrakshit/modern-unix
A collection of modern/faster/saner alternatives to common unix commands.
tufanrakshit/netflix-graph
Compact in-memory representation of directed graph data
tufanrakshit/OpenMetadata
Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
tufanrakshit/platform-spark-kubernetes-samples
Spark on Kubernetes samples
tufanrakshit/presto-gateway
A load balancer / proxy / gateway for prestodb
tufanrakshit/querybook
Querybook is a Big Data Querying UI, combining collocated table metadata and a simple notebook interface.
tufanrakshit/rocksdb
A library that provides an embeddable, persistent key-value store for fast storage.
tufanrakshit/rocksplicator
RocksDB Replication
tufanrakshit/schemaspy
SchemaSpy code home
tufanrakshit/spark
Apache Spark - A unified analytics engine for large-scale data processing
tufanrakshit/sparklens
Qubole Sparklens tool for performance tuning Apache Spark
tufanrakshit/sqlfluff
A SQL linter and auto-formatter for Humans
tufanrakshit/strimzi-kafka-operator
Apache Kafka running on Kubernetes
tufanrakshit/trino-getting-started
tufanrakshit/zingg
Scalable fuzzy matching for data mastering, deduplication and entity resolution.