mrhallak's Stars
langchain-ai/langchain
🦜🔗 Build context-aware reasoning applications
zed-industries/zed
Code at the speed of thought – Zed is a high-performance, multiplayer code editor from the creators of Atom and Tree-sitter.
mlabonne/llm-course
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
ray-project/ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
spacedriveapp/spacedrive
Spacedrive is an open source cross-platform file explorer, powered by a virtual distributed filesystem written in Rust.
pola-rs/polars
Dataframes powered by a multithreaded, vectorized query engine, written in Rust
microsoft/JARVIS
JARVIS, a system to connect LLMs with the ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
astral-sh/uv
An extremely fast Python package and project manager, written in Rust.
Netflix/chaosmonkey
Chaos Monkey is a resiliency tool that helps applications tolerate random instance failures.
DataExpert-io/data-engineer-handbook
This is a repo with links to everything you'd ever want to learn about data engineering
sqlfluff/sqlfluff
A modular SQL linter and auto-formatter with support for multiple dialects and templated code.
datafuselabs/databend
Data, Analytics & AI. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
delta-io/delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
tobymao/sqlglot
Python SQL Parser and Transpiler
delta-io/delta-rs
A native Rust library for Delta Lake, with bindings into Python
quarylabs/quary
Open-source BI for engineers
elementary-data/elementary
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
sodadata/soda-core
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
TobikoData/sqlmesh
Efficient data transformation and modeling framework that is backwards compatible with dbt.
dbt-labs/metricflow
MetricFlow allows you to define, build, and maintain metrics in code.
ddotta/awesome-polars
A curated list of Polars talks, tools, examples & articles. Contributions welcome!
qubole/sparklens
Qubole Sparklens tool for performance tuning Apache Spark
paypal/data-contract-template
Template for a data contract used in a data mesh.
facebookincubator/nimble
New file format for storage of large columnar datasets.
dbt-labs/dbt-spark
dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks
quarylabs/sqruff
Fast SQL formatter/linter
datacontract/datacontract-specification
The Data Contract Specification Repository
dbt-msft/dbt-sqlserver
dbt adapter for SQL Server and Azure SQL
airbytehq/airbyte-platform
The platform that powers Airbyte. Please file issues in https://github.com/airbytehq/airbyte
mrhallak/pycutter
Python project cookiecutter template