tpulkit's Stars
OpenInterpreter/open-interpreter
A natural language interface for computers
ray-project/ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
karpathy/LLM101n
LLM101n: Let's build a Storyteller
fastai/fastai
The fastai deep learning library
huggingface/datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
unslothai/unsloth
Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
kubeflow/kubeflow
Machine Learning Toolkit for Kubernetes
horovod/horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
stas00/ml-engineering
Machine Learning Engineering Open Book
statsmodels/statsmodels
Statsmodels: statistical modeling and econometrics in Python
daytonaio/daytona
The Open Source Dev Environment Manager.
scikit-learn-contrib/imbalanced-learn
A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning
NVIDIA/nvidia-container-toolkit
Build and run containers leveraging NVIDIA GPUs
unitycatalog/unitycatalog
Open, Multi-modal Catalog for Data & AI
huggingface/datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
Guang000/Awesome-Dataset-Distillation
A curated list of awesome papers on dataset distillation and related applications.
huggingface/nanotron
Minimalistic large language model 3D-parallelism training
NVIDIA-Merlin/NVTabular
NVTabular is a feature engineering and preprocessing library for tabular data, designed to quickly and easily manipulate terabyte-scale datasets used to train deep-learning-based recommender systems.
uclaml/SPIN
The official implementation of Self-Play Fine-Tuning (SPIN)
a16z-infra/llm-app-stack
NVIDIA-Merlin/HugeCTR
HugeCTR is a high-efficiency GPU framework designed for training Click-Through-Rate (CTR) estimation models
stas00/the-art-of-debugging
The Art of Debugging
AssemblyAI-Community/Machine-Learning-From-Scratch
Implementation of popular ML algorithms from scratch
huggingface/lighteval
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
huggingface/text-clustering
Easily embed, cluster and semantically label text datasets
PatrickZH/DeepCore
Code for coreset selection methods
KellerJordan/cifar10-airbench
94% on CIFAR-10 in 3.09 seconds 💨 96% in 27 seconds
facebookresearch/ssl-data-curation
PyTorch code for hierarchical k-means -- a data curation method for self-supervised learning
ZaydH/influence_analysis_papers
Influence Analysis and Estimation - Survey, Papers, and Taxonomy
granica-ai/DataSelectionICLR2024