NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating point (FP8) precision on Hopper and Ada GPUs, for better performance and lower memory utilization in both training and inference.
Python · Apache-2.0
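For context on what the library provides, here is a minimal FP8 usage sketch in the style of the project's quickstart. The layer sizes, tensor shapes, and recipe settings below are illustrative assumptions, not taken from this page:

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Illustrative dimensions (assumed); FP8 GEMMs need dims divisible by 16.
model = te.Linear(768, 768, bias=True)
inp = torch.randn(16, 768, device="cuda")

# Delayed-scaling FP8 recipe: HYBRID uses E4M3 in the forward pass
# and E5M2 in the backward pass.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

# The forward pass runs its GEMMs in FP8 on supported GPUs (Hopper/Ada);
# the backward pass reuses the FP8 state recorded inside the context.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)

out.sum().backward()
```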
Issues
Error when installing
#887 opened by ziyang-arch · 2 comments
When ub_overlap_rs_dgrad is set to True, the error "Caught signal 8 (Floating point exception: integer divide by zero)" is raised.
#788 opened by JJGSBGQ · 3 comments
Ubuntu session closes during the build-wheel step
#873 opened by Ciclarion · 1 comment
Cannot import or use transformer_engine after successful installation: No module named 'transformer_engine_extensions'
#856 opened by sam-h-bean · 2 comments
Failed to build Transformer Engine
#881 opened by zirui · 9 comments
Can't find `nvToolsExt` during build
#879 opened by kvablack · 1 comment
import transformer_engine initializes CUDA
#872 opened by szmigacz · 1 comment
Strange behavior when importing torch after importing te
#871 opened by GGGGGGXY · 5 comments
[URGENT] Malware hosted somewhere in this repo
#864 opened by andrei-cb · 0 comments
Release the GIL when calling C extensions
#868 opened by szmigacz · 0 comments
[ERROR] Cannot install the package
#803 opened by xju2 · 4 comments
te.Checkpoint does not work for nested autocast
#787 opened by tohinz · 4 comments
Request for Adaptive Layer Norm MLP
#789 opened by fordflip · 1 comment
[Question] Why can tensor parallel communication/GEMM overlap happen only when sequence parallelism is enabled?
#746 opened by hxdtest · 2 comments
The package name passed to `find_package_handle_standard_args` (LIBRARY) does not match the name of the calling package (CUDNN)
#752 opened by shanepeckham · 1 comment
When using FP8, resuming training after an interruption can produce a small difference in loss. Is this caused by the FP8 mechanism?
#759 opened by zte-tcb · 7 comments
CPU overhead of te.Linear FP8 layers
#761 opened by tohinz · 0 comments
Output scale not being used with `te_gemm` in FP8
#778 opened by snarayan21 · 0 comments
MLP without LayerNorm
#817 opened by sriniiyer · 3 comments
[ERROR] cuBLAS error when launching training with Megatron-LM and TransformerEngine
#847 opened by Btlmd · 9 comments
[Question] ub_tp_comm_overlap config setup
#827 opened by tylaar · 1 comment
Can TE optimize how it finds cuDNN?
#823 opened by MARD1NO · 3 comments
How to disable fused_attention when building?
#822 opened by janelu9 · 1 comment
v1.6: FP8GlobalStateManager seems to be preserving state in distributed settings
#814 opened by kshitij12345 · 5 comments
`warnings.simplefilter('default')` in global scope causes excessive DeprecationWarnings
#812 opened by jckhan · 0 comments
[JAX] Support fused SwiGLU MLP
#708 opened by irhum · 0 comments
ncclIpcSocketSendFd failed in register_user_buffer_collective(alloc=true) with --tp-comm-overlap
#801 opened by jingjie01ai · 0 comments
Cannot import transformer_engine.pytorch
#792 opened by sriniiyer · 4 comments
Build fails when using the JAX NGC image
#767 opened by bmac3 · 6 comments
When A and B are FP8 tensors, the compute type could be `CUBLAS_COMPUTE_16F`
#758 opened by condy0919 · 4 comments
[PyTorch] SwiGLU implementation not aligned with jiterator version in probability
#717 opened by tylaar · 4 comments
Primary weights profiling question
#712 opened by afcruzs · 1 comment