stu1130
Contributor to Apache MXNet; co-author of DJL (Deep Java Library). Currently focused on distributed training.
Amazon AI · Taiwan · USA
stu1130's Stars
langchain-ai/langchain
🦜🔗 Build context-aware reasoning applications
ggerganov/llama.cpp
LLM inference in C/C++
facebookresearch/llama
Inference code for LLaMA models
karpathy/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
ml-explore/mlx
MLX: An array framework for Apple silicon
huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
unslothai/unsloth
Finetune Llama 3, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
karpathy/nn-zero-to-hero
Neural Networks: Zero to Hero
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
microsoft/LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
mistralai/mistral-src
Reference implementation of the Mistral AI 7B v0.1 model.
facebookresearch/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
NVIDIA/TensorRT-LLM
TensorRT-LLM provides an easy-to-use Python API for defining Large Language Models (LLMs) and building TensorRT engines that contain state-of-the-art optimizations for efficient inference on NVIDIA GPUs. It also includes components for creating Python and C++ runtimes that execute those engines.
dair-ai/ML-Papers-Explained
Explanations of key concepts in ML
facebookresearch/metaseq
Repo for external large-scale work
Lightning-AI/lit-llama
Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.
TimDettmers/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
NVIDIA/FasterTransformer
Transformer-related optimizations, including BERT and GPT
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
ServiceWeaver/weaver
Programming framework for writing and deploying cloud applications.
mosaicml/llm-foundry
LLM training code for Databricks foundation models
turboderp/exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
FranxYao/chain-of-thought-hub
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
microsoft/Megatron-DeepSpeed
Ongoing research on training transformer language models at scale, including BERT & GPT-2
aws/aws-parallelcluster
AWS ParallelCluster is an AWS-supported open-source cluster management tool for deploying and managing HPC clusters in the AWS cloud.
AIoT-MLSys-Lab/Efficient-LLMs-Survey
[TMLR 2024] Efficient Large Language Models: A Survey
stanford-futuredata/megablocks
MegaBlocks: a light-weight library for mixture-of-experts (MoE) training
NVIDIA/DCGM
NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs
NVIDIA/NeMo-Aligner
Scalable toolkit for efficient model alignment
facebookresearch/param
PArametrized Recommendation and AI Model (PARAM) benchmark: a repository for developing numerous microbenchmarks as well as end-to-end networks for evaluating training and inference platforms.