zyxie's Stars
linkedin/Liger-Kernel
Efficient Triton Kernels for LLM Training
scrapy/scrapy
Scrapy, a fast, high-level web crawling & scraping framework for Python.
livekit/python-sdks
LiveKit real-time and server SDKs for Python
google/gemma.cpp
Lightweight, standalone C++ inference engine for Google's Gemma models.
facebookresearch/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
prometheus/prometheus
The Prometheus monitoring system and time series database.
meta-llama/llama-recipes
Scripts for fine-tuning Meta Llama 3 with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default & custom datasets for applications such as summarization and Q&A, and a number of candidate inference solutions such as HF TGI and vLLM for local or cloud deployment. Includes demo apps showcasing Meta Llama 3 for WhatsApp & Messenger.
hidet-org/hidet
An open-source efficient deep learning framework/compiler, written in python.
karpathy/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
huggingface/text-generation-inference
Large Language Model Text Generation Inference
FasterDecoding/Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
google/aqt
ptillet/triton-llvm-releases
weaviate/weaviate
Weaviate is an open-source vector database that stores both objects and vectors, combining vector search and structured filtering with the fault tolerance and scalability of a cloud-native database.
karpathy/llama2.c
Inference Llama 2 in one file of pure C
ggerganov/ggml
Tensor library for machine learning
NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
facebookincubator/AITemplate
AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
ggerganov/llama.cpp
LLM inference in C/C++
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
twitter/the-algorithm
Source code for Twitter's Recommendation Algorithm
dmlc/xgboost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on a single machine, Hadoop, Spark, Dask, Flink and DataFlow.
horovod/horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
apache/mxnet
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, JavaScript and more.