gchatzip's Stars
TheAlgorithms/Python
All Algorithms implemented in Python
mingrammer/diagrams
🎨 Diagram as Code for prototyping cloud system architectures
ray-project/ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
sebastianruder/NLP-progress
Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
duckdb/duckdb
DuckDB is an analytical in-process SQL database management system
andkret/Cookbook
The Data Engineering Cookbook
oxnr/awesome-bigdata
A curated list of awesome big data frameworks, resources and other awesomeness.
dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.
dbt-labs/dbt-core
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
lib/pq
Pure Go Postgres driver for database/sql
vaexio/vaex
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
igorbarinov/awesome-data-engineering
A curated list of data engineering tools for software developers
fluentpython/example-code
Example code for the book Fluent Python, 1st Edition (O'Reilly, 2015)
apache/pinot
Apache Pinot - A realtime distributed OLAP datastore
cloudtools/troposphere
troposphere - Python library to create AWS CloudFormation descriptions
gregmalcolm/python_koans
Python Koans - Learn Python through TDD
linkedin/databus
Source-agnostic distributed change data capture system
awslabs/amazon-redshift-utils
Amazon Redshift Utils contains utilities, scripts, and views that are useful in a Redshift environment
h2oai/datatable
A Python package for manipulating 2-dimensional tabular data structures
mre/the-coding-interview
Programming exercises, code katas and puzzles for your job interview training - or just for fun.
Sceptre/sceptre
Build better AWS infrastructure
pinterest/pinball
Pinball is a scalable workflow manager
pythonspeed/filprofiler
A Python memory profiler for data processing and scientific computing applications
awslabs/aws-glue-libs
AWS Glue Libraries are additions and enhancements to Spark for ETL operations.
grycap/scar
Serverless Container-aware ARchitectures (e.g. Docker in AWS Lambda)
awsdocs/aws-glue-developer-guide
The open source version of the AWS Glue docs. You can submit feedback & requests for changes by submitting issues in this repo or by making proposed changes & submitting a pull request.
aws-samples/amazon-serverless-datalake-workshop
A workshop demonstrating the capabilities of S3, Athena, Glue, Kinesis, and Quicksight.
laughingman7743/PyAthenaJDBC
PyAthenaJDBC is an Amazon Athena JDBC driver wrapper for the Python DB API 2.0 (PEP 249).
aws-samples/flink-stream-processing-refarch
Reference architecture for real-time stream processing with Apache Flink on Amazon EMR, Amazon Kinesis, and Amazon Elasticsearch Service.
knowsuchagency/airflow-cdk
Constructs to deploy Airflow via the AWS CDK