vingov's Stars
basilysf1709/distributed-systems
Comprehensive guide, algorithms and tools on distributed systems
ashishps1/awesome-leetcode-resources
Awesome LeetCode resources to learn Data Structures and Algorithms and prepare for Coding Interviews.
apache/incubator-xtable
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
linkedin/openhouse
Open Control Plane for Tables in Data Lakehouse
langchain-ai/langchain
🦜🔗 Build context-aware reasoning applications
nat/natbot
Drive a browser with GPT-3
llSourcell/AI_Humanities
This is the curriculum for AI Humanities by Siraj Raval on YouTube
ray-project/llm-numbers
Numbers every LLM developer should know
huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
databrickslabs/dolly
Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform
meta-llama/llama
Inference code for Llama models
vijayaphanindra/kafka-streams-dataquality
Data Quality for real-time streaming datasets
apache/gobblin
A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.
igorbarinov/awesome-data-engineering
A curated list of data engineering tools for software developers
abti/awesome-data-engineering
A curated list of data engineering tools for software developers
Liquid-Prep/Liquid-Prep
Liquid Prep offers an end-to-end solution for farmers looking to optimize their water usage, especially during times of drought.
xizhengszhang/Leetcode_company_frequency
Collection of LeetCode company tag problems. Periodically updated.
google/zetasql
ZetaSQL - Analyzer Framework for SQL
re-data/re-data
re_data - fix data issues before your users & CEO would discover them 😊
sodadata/soda-core
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
koxudaxi/datamodel-code-generator
Pydantic model and dataclasses.dataclass generator for easy conversion of JSON, OpenAPI, JSON Schema, and YAML data sources.
jitsucom/jitsu
Jitsu is an open-source Segment alternative. Fully-scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days
debezium/debezium
Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
airbytehq/airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
open-metadata/OpenMetadata
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
vingov/dbt-spark
Spark plugin for dbt
dbt-labs/dbt-spark
dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks
linkedin/coral
Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages.
tokern/piicatcher
Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and Datahub
apache/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows