Pinned Repositories
cs133
Final Project for cs133
duckdb
DuckDB is an in-process SQL OLAP Database Management System
gobblin
A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.
hadoop
Apache Hadoop
haystack
🔍 LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
iceberg
Apache Iceberg
incubator-superset
Apache Superset (incubating) is a modern, enterprise-ready business intelligence web application
llama
Inference code for Llama models
pybigquery
SQLAlchemy dialect for BigQuery
zynx
sumedhsakdeo's Repositories
sumedhsakdeo/cs133
Final Project for cs133
sumedhsakdeo/pybigquery
SQLAlchemy dialect for BigQuery
sumedhsakdeo/zynx
sumedhsakdeo/duckdb
DuckDB is an in-process SQL OLAP Database Management System
sumedhsakdeo/gobblin
A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.
sumedhsakdeo/hadoop
Apache Hadoop
sumedhsakdeo/haystack
🔍 LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
sumedhsakdeo/iceberg
Apache Iceberg
sumedhsakdeo/incubator-superset
Apache Superset (incubating) is a modern, enterprise-ready business intelligence web application
sumedhsakdeo/llama
Inference code for Llama models
sumedhsakdeo/one_1.4.1
sumedhsakdeo/openhouse
OpenHouse - A Control Plane for Tables
sumedhsakdeo/orc
Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
sumedhsakdeo/spark
Apache Spark - A unified analytics engine for large-scale data processing
sumedhsakdeo/TheOne
sumedhsakdeo/TheOneProject
sumedhsakdeo/trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
sumedhsakdeo/unitycatalog
Open, Multi-modal Catalog for Data & AI