jinzex's Stars
meta-llama/llama
Inference code for Llama models
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
ml-explore/mlx
MLX: An array framework for Apple silicon
tensorflow/tensor2tensor
Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
THUDM/GLM-130B
GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
huggingface/accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
pytorch-labs/gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 lines of Python.
google/gemma_pytorch
The official PyTorch implementation of Google's Gemma models
google-deepmind/gemma
Open weights LLM from Google DeepMind.
NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
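The FP8 formats TransformerEngine targets are tiny: E4M3 has 1 sign, 4 exponent, and 3 mantissa bits. As a minimal illustration (not TransformerEngine's API — just a hand-rolled decoder following the OCP E4M3 layout), the whole format can be decoded in a few lines:

```python
def e4m3_to_float(byte: int) -> float:
    # Decode an FP8 E4M3 value: 1 sign | 4 exponent | 3 mantissa bits.
    # OCP E4M3 has no infinities; 0x7F/0xFF encode NaN (not handled here).
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0:                              # subnormal, exponent bias 7
        return sign * (man / 8) * 2.0 ** -6
    return sign * (1 + man / 8) * 2.0 ** (exp - 7)

print(e4m3_to_float(0x7E))  # 448.0, the E4M3 maximum normal value
```

With only 256 encodings, the entire dynamic range tops out at 448, which is why FP8 training relies on per-tensor scaling factors.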
kyegomez/BitNet
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
CalculatedContent/WeightWatcher
The WeightWatcher tool for predicting the accuracy of Deep Neural Networks
NVIDIA/MatX
An efficient C++17 GPU numerical computing library with Python-like syntax
Lightning-AI/lightning-thunder
Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.
NVIDIA/cccl
CUDA Core Compute Libraries
triton-inference-server/tensorrtllm_backend
The Triton TensorRT-LLM Backend
alibaba/Megatron-LLaMA
Best practice for training LLaMA models in Megatron-LM
Azure/MS-AMP
Microsoft Automatic Mixed Precision Library
NVIDIA/TensorRT-Model-Optimizer
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization and sparsity. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
NVIDIA/DCGM
NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs
pytorch-labs/float8_experimental
This repository contains the experimental PyTorch-native float8 training UX
microsoft/microxcaling
PyTorch emulation library for Microscaling (MX)-compatible data formats
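The core idea behind Microscaling (MX) formats is that a small block of elements shares one power-of-two scale, while each element is stored in a narrow type. A rough sketch of that scheme in plain Python (this is an assumption-laden illustration of the concept, not microxcaling's actual API or bit layout):

```python
import math

def mx_quantize(block, elem_bits=8):
    # Microscaling sketch: one shared power-of-two scale per block,
    # elements stored as symmetric low-bit signed integers.
    qmax = 2 ** (elem_bits - 1) - 1
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 1.0, [0] * len(block)
    # Smallest power-of-two scale that keeps every element within [-qmax, qmax]
    scale = 2.0 ** math.ceil(math.log2(amax / qmax))
    q = [max(-qmax, min(qmax, round(v / scale))) for v in block]
    return scale, q

def mx_dequantize(scale, q):
    return [scale * v for v in q]

scale, q = mx_quantize([0.5, -1.0, 0.25, 0.125])
print(scale, q)  # power-of-two values in this block round-trip exactly
```

A power-of-two scale keeps dequantization to a cheap exponent adjustment, which is what makes MX blocks attractive for hardware.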
canbula/ieee754
A Python module that finds the IEEE-754 representation of a floating-point number.
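The same bit-level inspection the module performs can be done with the standard library alone (a sketch using `struct`, not canbula/ieee754's own interface):

```python
import struct

def f32_bits(x: float) -> str:
    # 32-bit IEEE-754 pattern: 1 sign | 8 exponent | 23 fraction bits
    b = format(struct.unpack('>I', struct.pack('>f', x))[0], '032b')
    return f"{b[0]} {b[1:9]} {b[9:]}"

print(f32_bits(0.15625))  # 0 01111100 01000000000000000000000
```

Here 0.15625 = 1.25 × 2⁻³, so the biased exponent is 124 (01111100) and the fraction holds the 0.25 part.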
Oletus/float16-simulator.js
A simulator for low-precision floating-point calculations running in the browser
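The rounding such a simulator demonstrates can also be reproduced in Python, since `struct` supports the IEEE-754 binary16 format (`'e'`) directly — a stdlib sketch, unrelated to the simulator's own implementation:

```python
import struct

def to_f16(x: float) -> float:
    # Round-trip a Python float through IEEE-754 binary16 (half precision)
    return struct.unpack('<e', struct.pack('<e', x))[0]

print(to_f16(0.1))  # 0.0999755859375: binary16 keeps only 11 significand bits
```

The gap between 0.1 and its half-precision neighbor is exactly the kind of error a low-precision simulator makes visible.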
opencomputeproject/FP8