stu1130
Contributor to Apache MXNet; co-author of DJL (Deep Java Library). Currently focused on distributed training.
Amazon AI · Taiwan · USA
stu1130's Stars
langchain-ai/langchain
🦜🔗 Build context-aware reasoning applications
ggerganov/llama.cpp
LLM inference in C/C++
facebookresearch/llama
Inference code for LLaMA models
karpathy/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
ml-explore/mlx
MLX: An array framework for Apple silicon
huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
unslothai/unsloth
Finetune Llama 3, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
karpathy/nn-zero-to-hero
Neural Networks: Zero to Hero
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
microsoft/LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
mistralai/mistral-src
Reference implementation of the Mistral AI 7B v0.1 model.
facebookresearch/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
NVIDIA/TensorRT-LLM
TensorRT-LLM provides an easy-to-use Python API for defining Large Language Models (LLMs) and building TensorRT engines that contain state-of-the-art optimizations for efficient inference on NVIDIA GPUs. It also includes components for creating Python and C++ runtimes that execute those engines.
dair-ai/ML-Papers-Explained
Explanations of key concepts in ML
facebookresearch/metaseq
Repo for external large-scale work
Lightning-AI/lit-llama
Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.
TimDettmers/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
NVIDIA/FasterTransformer
Transformer-related optimizations, including BERT and GPT
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
ServiceWeaver/weaver
Programming framework for writing and deploying cloud applications.
mosaicml/llm-foundry
LLM training code for Databricks foundation models
turboderp/exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
FranxYao/chain-of-thought-hub
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
microsoft/Megatron-DeepSpeed
Ongoing research on training transformer language models at scale, including BERT & GPT-2
aws/aws-parallelcluster
AWS ParallelCluster is an AWS-supported open-source cluster management tool for deploying and managing HPC clusters in the AWS cloud.
AIoT-MLSys-Lab/Efficient-LLMs-Survey
[TMLR 2024] Efficient Large Language Models: A Survey
stanford-futuredata/megablocks
MegaBlocks: a light-weight library for mixture-of-experts (MoE) training
NVIDIA/DCGM
NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs
NVIDIA/NeMo-Aligner
Scalable toolkit for efficient model alignment
facebookresearch/param
PArametrized Recommendation and AI Model (PARAM) benchmark: a repository for developing numerous microbenchmarks as well as end-to-end networks for evaluating training and inference platforms.