vishalkhondre's Stars
malloydata/malloy
Malloy is an experimental language for describing data relationships and transformations.
StarRocks/starrocks
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
run-llama/llama-hub
A community-made library of data loaders for LLMs, for use with LlamaIndex and/or LangChain
langchain-ai/langchain
🦜🔗 Build context-aware reasoning applications
great-expectations/great_expectations
Always know what to expect from your data.
logicalclocks/hopsworks
Hopsworks - Data-Intensive AI platform with a Feature Store
rilldata/rill
Rill is a tool for effortlessly transforming data sets into powerful, opinionated dashboards using SQL. BI-as-code.
julianhyde/sqlline
Shell for issuing SQL to relational databases via JDBC
linkedin/Hoptimator
Multi-hop declarative data pipelines
mage-ai/mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
tobymao/sqlglot
Python SQL Parser and Transpiler
elyase/geotext
Geotext extracts country and city mentions from text
apache/dolphinscheduler
Apache DolphinScheduler is a modern data orchestration platform for quickly creating high-performance workflows with low code.
apache/seatunnel
SeaTunnel is a next-generation, high-performance, distributed tool for massive-scale data integration.
superstreamlabs/memphis
Memphis.dev is a highly scalable and effortless data streaming platform
Breaka84/Spooq
aws/aws-cdk
The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
Beuth-Erdelt/Benchmark-Experiment-Host-Manager
This Python tool helps manage DBMS benchmarking experiments in a Kubernetes-based HPC cluster environment. It enables users to configure hardware/software setups so that tests can easily be repeated over varying configurations.
Beuth-Erdelt/DBMS-Benchmarker
DBMS-Benchmarker is a Python-based application-level blackbox benchmark tool for Database Management Systems (DBMS). It connects to a given list of DBMS (via JDBC) and runs a given list of parametrized and randomized (SQL) benchmark queries. Evaluations are available via a Python interface and on an interactive multi-dimensional dashboard.
Swiple/swiple
Swiple enables you to easily observe, understand, validate and improve the quality of your data
petl-developers/petl
Python: Extract, Transform and Load Tables of Data
frictionlessdata/frictionless-py
Data management framework for Python that provides functionality to describe, extract, validate, and transform tabular data
frictionlessdata/datapackage
Data Package is a standard consisting of a set of simple yet extensible specifications to describe datasets, data files and tabular data. It is a data definition language (DDL) and data API that facilitates findability, accessibility, interoperability, and reusability (FAIR) of data.
papers-we-love/papers-we-love
Papers from the computer science community to read and discuss.
sfu-db/connector-x
Fastest library to load data from a DB into DataFrames in Rust and Python
zsvoboda/ngods
New-generation open-source data stack
edornd/clidantic
Typed Command Line Interfaces powered by Click and Pydantic
apache/iceberg
Apache Iceberg
zsvoboda/dbd
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
zsvoboda/ngods-stocks
New-Generation Open-Source Data Stack Demo