Pinned Repositories
flash-attention
Fast and memory-efficient exact attention (see the usage sketch after this list)
tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
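As a rough illustration of what "fast and memory-efficient exact attention" means in practice, here is a minimal sketch assuming the upstream flash_attn package's flash_attn_func; the tensor shapes are illustrative assumptions, not taken from this profile.

```python
import torch
from flash_attn import flash_attn_func

# Illustrative shapes (assumed, not from the repo):
batch, seqlen, nheads, headdim = 2, 1024, 16, 64

# FlashAttention expects fp16/bf16 CUDA tensors laid out as
# (batch, seqlen, nheads, headdim).
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Exact (not approximate) attention, computed tile-by-tile in on-chip
# SRAM so the full seqlen x seqlen score matrix is never materialized.
out = flash_attn_func(q, k, v, causal=True)  # same shape as q
```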
leizhao1234's Repositories
leizhao1234/cogvlm2
Large Language Model Text Generation Inference
leizhao1234/cute-gemm
cute-gemm
leizhao1234/cutlass
CUDA Templates for Linear Algebra Subroutines
leizhao1234/FasterTransformer
Transformer-related optimizations, including BERT and GPT
leizhao1234/flash-attention
Fast and memory-efficient exact attention
leizhao1234/Megatron-LM
Ongoing research on training transformer models at scale
leizhao1234/SwissArmyTransformer
SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.
leizhao1234/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating point (FP8) precision on Hopper and Ada GPUs, for better performance and lower memory utilization in both training and inference.
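To make the FP8 claim above concrete, here is a minimal sketch assuming TransformerEngine's documented PyTorch API (transformer_engine.pytorch.Linear, fp8_autocast, and the DelayedScaling recipe); the layer sizes are illustrative assumptions, and running it requires an FP8-capable GPU (Hopper or Ada).

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Delayed-scaling FP8 recipe: E4M3 for forward activations/weights,
# E5M2 for backward gradients (the HYBRID format).
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

# te.Linear is a drop-in replacement for torch.nn.Linear with FP8 support.
layer = te.Linear(768, 768, bias=True).cuda()  # sizes are illustrative
x = torch.randn(16, 768, device="cuda")

# GEMMs inside this context run in FP8 on supported hardware.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)

y.sum().backward()  # backward GEMMs also use FP8 where applicable
```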