jinzex's Stars
meta-llama/llama
Inference code for Llama models
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
ml-explore/mlx
MLX: An array framework for Apple silicon
tensorflow/tensor2tensor
Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
THUDM/GLM-130B
GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
huggingface/accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
pytorch-labs/gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 lines of Python.
google/gemma_pytorch
The official PyTorch implementation of Google's Gemma models
google-deepmind/gemma
Open weights LLM from Google DeepMind.
NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
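The FP8 formats TransformerEngine targets are tiny: E4M3 has 1 sign, 4 exponent, and 3 mantissa bits. As a minimal illustration (not TransformerEngine's API — just a hand-rolled decoder following the OCP E4M3 layout), the whole format can be decoded in a few lines:

```python
def e4m3_to_float(byte: int) -> float:
    # Decode an FP8 E4M3 value: 1 sign | 4 exponent | 3 mantissa bits.
    # OCP E4M3 has no infinities; 0x7F/0xFF encode NaN (not handled here).
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0:                              # subnormal, exponent bias 7
        return sign * (man / 8) * 2.0 ** -6
    return sign * (1 + man / 8) * 2.0 ** (exp - 7)

print(e4m3_to_float(0x7E))  # 448.0, the E4M3 maximum normal value
```

With only 256 encodings, the entire dynamic range tops out at 448, which is why FP8 training relies on per-tensor scaling factors.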
kyegomez/BitNet
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
CalculatedContent/WeightWatcher
The WeightWatcher tool for predicting the accuracy of Deep Neural Networks
NVIDIA/MatX
An efficient C++17 GPU numerical computing library with Python-like syntax
Lightning-AI/lightning-thunder
Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.
NVIDIA/cccl
CUDA Core Compute Libraries
triton-inference-server/tensorrtllm_backend
The Triton TensorRT-LLM Backend
alibaba/Megatron-LLaMA
Best practice for training LLaMA models in Megatron-LM
Azure/MS-AMP
Microsoft Automatic Mixed Precision Library
NVIDIA/TensorRT-Model-Optimizer
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization and sparsity. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
NVIDIA/DCGM
NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs
pytorch-labs/float8_experimental
This repository contains the experimental PyTorch-native float8 training UX
microsoft/microxcaling
PyTorch emulation library for Microscaling (MX)-compatible data formats
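The core idea behind Microscaling (MX) formats is that a small block of elements shares one power-of-two scale, while each element is stored in a narrow type. A rough sketch of that scheme in plain Python (this is an assumption-laden illustration of the concept, not microxcaling's actual API or bit layout):

```python
import math

def mx_quantize(block, elem_bits=8):
    # Microscaling sketch: one shared power-of-two scale per block,
    # elements stored as symmetric low-bit signed integers.
    qmax = 2 ** (elem_bits - 1) - 1
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 1.0, [0] * len(block)
    # Smallest power-of-two scale that keeps every element within [-qmax, qmax]
    scale = 2.0 ** math.ceil(math.log2(amax / qmax))
    q = [max(-qmax, min(qmax, round(v / scale))) for v in block]
    return scale, q

def mx_dequantize(scale, q):
    return [scale * v for v in q]

scale, q = mx_quantize([0.5, -1.0, 0.25, 0.125])
print(scale, q)  # power-of-two values in this block round-trip exactly
```

A power-of-two scale keeps dequantization to a cheap exponent adjustment, which is what makes MX blocks attractive for hardware.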
canbula/ieee754
A Python module that finds the IEEE-754 representation of a floating-point number.
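The same bit-level inspection the module performs can be done with the standard library alone (a sketch using `struct`, not canbula/ieee754's own interface):

```python
import struct

def f32_bits(x: float) -> str:
    # 32-bit IEEE-754 pattern: 1 sign | 8 exponent | 23 fraction bits
    b = format(struct.unpack('>I', struct.pack('>f', x))[0], '032b')
    return f"{b[0]} {b[1:9]} {b[9:]}"

print(f32_bits(0.15625))  # 0 01111100 01000000000000000000000
```

Here 0.15625 = 1.25 × 2⁻³, so the biased exponent is 124 (01111100) and the fraction holds the 0.25 part.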
Oletus/float16-simulator.js
A simulator for low-precision floating-point calculations running in the browser
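The rounding such a simulator demonstrates can also be reproduced in Python, since `struct` supports the IEEE-754 binary16 format (`'e'`) directly — a stdlib sketch, unrelated to the simulator's own implementation:

```python
import struct

def to_f16(x: float) -> float:
    # Round-trip a Python float through IEEE-754 binary16 (half precision)
    return struct.unpack('<e', struct.pack('<e', x))[0]

print(to_f16(0.1))  # 0.0999755859375: binary16 keeps only 11 significand bits
```

The gap between 0.1 and its half-precision neighbor is exactly the kind of error a low-precision simulator makes visible.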
opencomputeproject/FP8