chloeh13q's Stars
jwasham/coding-interview-university
A complete computer science study plan to become a software engineer.
twitter/the-algorithm
Source code for Twitter's Recommendation Algorithm
run-llama/llama_index
LlamaIndex is a data framework for your LLM applications
Lightning-AI/pytorch-lightning
Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
eugeneyan/applied-ml
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
apache/flink
Apache Flink
MostlyAdequate/mostly-adequate-guide
Mostly adequate guide to FP (in JavaScript)
deepset-ai/haystack
AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
guipsamora/pandas_exercises
Practice your pandas skills!
theanalyst/awesome-distributed-systems
A curated list to learn about distributed systems
kedro-org/kedro
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
chiphuyen/machine-learning-systems-design
A booklet on machine learning systems design with exercises. NOT the repo for the book "Designing Machine Learning Systems"
apache/datafusion
Apache DataFusion SQL Query Engine
paperswithcode/ai-deadlines
⏰ AI conference deadline countdowns
ibis-project/ibis
The portable Python dataframe library
hibayesian/awesome-automl-papers
A curated list of automated machine learning papers, articles, tutorials, slides and projects
Netflix/maestro
Maestro: Netflix’s Workflow Orchestrator
quixio/quix-streams
Python stream processing for Kafka
substrait-io/substrait
A cross-platform way to express data transformation, relational algebra, standardized record expression and plans.
airbnb/chronon
Chronon is a data platform for serving AI/ML applications.
pytorch/torcharrow
High performance model preprocessing library on PyTorch
bytewax/awesome-public-real-time-datasets
A list of publicly available datasets with real-time data maintained by the team at bytewax.io
RecList/reclist
Behavioral "black-box" testing for recommender systems
jayinai/kaggle-classification
A compiled list of Kaggle competitions and their winning solutions for classification problems.
jacopotagliabue/post-modern-stack
Joining the modern data stack with the modern ML stack
adijo/data-science-prep
Problems from https://datascienceprep.com/
ibis-project/ibis-ml
IbisML is a library for building scalable ML pipelines using Ibis.
ibis-project/ibis-substrait
Ibis Substrait Compiler
TU-Berlin-DIMA/scotty-window-processor
This repository provides Scotty, a framework for efficient window aggregations for out-of-order stream processing.
daviddwlee84/MachineLearningPractice
Practice exercises applying statistical machine learning techniques to various datasets (notes and from-scratch implementations).