NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating point (FP8) precision on Hopper and Ada GPUs, for better performance and lower memory utilization in both training and inference.
Python · Apache-2.0
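For context on what the library provides, here is a minimal FP8 usage sketch in the style of the project's quickstart. The layer sizes, tensor shapes, and recipe settings below are illustrative assumptions, not taken from this page:

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Illustrative dimensions (assumed); FP8 GEMMs need dims divisible by 16.
model = te.Linear(768, 768, bias=True)
inp = torch.randn(16, 768, device="cuda")

# Delayed-scaling FP8 recipe: HYBRID uses E4M3 in the forward pass
# and E5M2 in the backward pass.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

# The forward pass runs its GEMMs in FP8 on supported GPUs (Hopper/Ada);
# the backward pass reuses the FP8 state recorded inside the context.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)

out.sum().backward()
```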
Issues
Error when installing
#887 opened by ziyang-arch · 2 comments
When ub_overlap_rs_dgrad is set to True, the error "Caught signal 8 (Floating point exception: integer divide by zero)" is raised.
#788 opened by JJGSBGQ · 3 comments
Ubuntu session closes during the build-wheel step
#873 opened by Ciclarion · 1 comment
Cannot import or use transformer_engine after successful installation: No module named 'transformer_engine_extensions'
#856 opened by sam-h-bean · 2 comments
Failed to build Transformer Engine
#881 opened by zirui · 9 comments
Can't find `nvToolsExt` during build
#879 opened by kvablack · 1 comment
import transformer_engine initializes CUDA
#872 opened by szmigacz · 1 comment
Strange behavior when importing torch after importing te
#871 opened by GGGGGGXY · 5 comments
[URGENT] Malware hosted somewhere in this repo
#864 opened by andrei-cb · 0 comments
Release the GIL when calling C extensions
#868 opened by szmigacz · 0 comments
[ERROR] Cannot install the package
#803 opened by xju2 · 4 comments
te.Checkpoint does not work for nested autocast
#787 opened by tohinz · 4 comments
Request for Adaptive Layer Norm MLP
#789 opened by fordflip · 1 comment
[Question] Why can tensor parallel communication/GEMM overlap happen only when sequence parallelism is enabled?
#746 opened by hxdtest · 2 comments
The package name passed to `find_package_handle_standard_args` (LIBRARY) does not match the name of the calling package (CUDNN)
#752 opened by shanepeckham · 1 comment
When using FP8, resuming training after an interruption can produce a small difference in loss. Is this caused by the FP8 mechanism?
#759 opened by zte-tcb · 7 comments
CPU overhead of te.Linear FP8 layers
#761 opened by tohinz · 0 comments
Output scale not being used with `te_gemm` in FP8
#778 opened by snarayan21 · 0 comments
MLP without LayerNorm
#817 opened by sriniiyer · 3 comments
[ERROR] cuBLAS error when launching training with Megatron-LM and TransformerEngine
#847 opened by Btlmd · 9 comments
[Question] ub_tp_comm_overlap config setup
#827 opened by tylaar · 1 comment
Can TE optimize how it finds cuDNN?
#823 opened by MARD1NO · 3 comments
How to disable fused_attention when building?
#822 opened by janelu9 · 1 comment
v1.6: FP8GlobalStateManager seems to be preserving state in distributed settings
#814 opened by kshitij12345 · 5 comments
`warnings.simplefilter('default')` in global scope causes excessive DeprecationWarnings
#812 opened by jckhan · 0 comments
[JAX] Support fused SwiGLU MLP
#708 opened by irhum · 0 comments
ncclIpcSocketSendFd failed in register_user_buffer_collective(alloc=true) with --tp-comm-overlap
#801 opened by jingjie01ai · 0 comments
Cannot import transformer_engine.pytorch
#792 opened by sriniiyer · 4 comments
Build fails when using the JAX NGC image
#767 opened by bmac3 · 6 comments
When A and B are FP8 tensors, the compute type could be `CUBLAS_COMPUTE_16F`
#758 opened by condy0919 · 4 comments
[PyTorch] SwiGLU implementation not aligned with jiterator version in probability
#717 opened by tylaar · 4 comments
Primary weights profiling question
#712 opened by afcruzs · 1 comment