AttributeError: module 'torch.amp' has no attribute 'GradScaler'
djl70 opened this issue · 4 comments
🐛 Describe the bug
Hello, I'm using PyTorch 2.2.1 and torchtnt 0.2.3, and the change from #697 seems to be causing an AttributeError for me when I try to import fit:
from torchtnt.framework import fit
Full traceback:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/djl70/miniconda3/envs/tnt-pull-697/lib/python3.10/site-packages/torchtnt/framework/__init__.py", line 7, in <module>
from .auto_unit import AutoPredictUnit, AutoUnit
File "/home/djl70/miniconda3/envs/tnt-pull-697/lib/python3.10/site-packages/torchtnt/framework/auto_unit.py", line 19, in <module>
from torchtnt.framework._loop_utils import _step_requires_iterator
File "/home/djl70/miniconda3/envs/tnt-pull-697/lib/python3.10/site-packages/torchtnt/framework/_loop_utils.py", line 15, in <module>
from torchtnt.framework.state import State
File "/home/djl70/miniconda3/envs/tnt-pull-697/lib/python3.10/site-packages/torchtnt/framework/state.py", line 13, in <module>
from torchtnt.utils.timer import BoundedTimer, TimerProtocol
File "/home/djl70/miniconda3/envs/tnt-pull-697/lib/python3.10/site-packages/torchtnt/utils/__init__.py", line 50, in <module>
from .precision import convert_precision_str_to_dtype
File "/home/djl70/miniconda3/envs/tnt-pull-697/lib/python3.10/site-packages/torchtnt/utils/precision.py", line 41, in <module>
) -> Optional[torch.amp.GradScaler]:
AttributeError: module 'torch.amp' has no attribute 'GradScaler'
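The traceback shows the error firing while precision.py is being imported, before any scaler is created. On Python 3.10 (the reporter's version), a function's return annotation is evaluated when its def statement runs, so an annotation naming a missing attribute breaks the import itself. The sketch below reproduces this without PyTorch. fake_amp is a hypothetical stand-in for a torch.amp module with no GradScaler, and the function names are illustrative only. It also shows that a string annotation is not evaluated at definition time. Python 3.14 defers annotation evaluation by default (PEP 649), so the eager case does not raise there.

```python
import sys
import types
from typing import Optional

# Hypothetical stand-in for a torch.amp module that has no GradScaler,
# as on the reporter's PyTorch 2.2.1 install.
fake_amp = types.SimpleNamespace()

# Before Python 3.14 (PEP 649), annotations are evaluated when "def" runs.
EAGER_ANNOTATIONS = sys.version_info < (3, 14)


def define_eager_annotation():
    # The return annotation references the missing attribute directly,
    # mirroring "-> Optional[torch.amp.GradScaler]" in the traceback.
    def get_grad_scaler() -> Optional[fake_amp.GradScaler]:
        return None

    return get_grad_scaler


try:
    define_eager_annotation()
    eager_error = None
except AttributeError as exc:
    eager_error = exc


# A string (forward-reference) annotation is not evaluated at definition
# time, so this function can be defined and called without error.
def get_grad_scaler_lazy() -> "Optional[fake_amp.GradScaler]":
    return None


print("eager annotations:", EAGER_ANNOTATIONS)
print("error while defining:", type(eager_error).__name__ if eager_error else None)
print("lazy version returns:", get_grad_scaler_lazy())
```

Adding "from __future__ import annotations" at the top of a module has the same effect as quoting each annotation on Python 3.7 and later.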
Steps to reproduce:
$ conda create -n tnt-pull-697 python=3.10
$ conda activate tnt-pull-697
$ conda install pytorch==2.2.1 torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
$ conda install -c conda-forge torchtnt==0.2.3
$ python
>>> from torchtnt.framework import fit
Versions
Collecting environment information...
PyTorch version: 2.2.1
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-91-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.1.105
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA RTX A5000
GPU 1: NVIDIA RTX A5000
GPU 2: NVIDIA RTX A5000
GPU 3: NVIDIA RTX A5000
Nvidia driver version: 535.129.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 43 bits physical, 48 bits virtual
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD Ryzen Threadripper PRO 3975WX 32-Cores
Stepping: 0
Frequency boost: enabled
CPU MHz: 2200.000
CPU max MHz: 4368.1641
CPU min MHz: 2200.0000
BogoMIPS: 6986.99
Virtualization: AMD-V
L1d cache: 1 MiB
L1i cache: 1 MiB
L2 cache: 16 MiB
L3 cache: 128 MiB
NUMA node0 CPU(s): 0-63
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec rstack overflow: Mitigation; safe RET
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es
Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] torch==2.2.1
[pip3] torchaudio==2.2.1
[pip3] torchsnapshot==0.1.0
[pip3] torchtnt==0.2.3
[pip3] torchvision==0.17.1
[pip3] triton==2.2.0
[conda] blas 1.0 mkl
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] libjpeg-turbo 2.0.0 h9bf148f_0 pytorch
[conda] mkl 2023.1.0 h213fc3f_46344
[conda] mkl-service 2.4.0 py310h5eee18b_1
[conda] mkl_fft 1.3.8 py310h5eee18b_0
[conda] mkl_random 1.2.4 py310hdb19cb5_0
[conda] numpy 1.26.4 py310h5f9d8c6_0
[conda] numpy-base 1.26.4 py310hb5e798b_0
[conda] pytorch 2.2.1 py3.10_cuda12.1_cudnn8.9.2_0 pytorch
[conda] pytorch-cuda 12.1 ha16c6d3_5 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchaudio 2.2.1 py310_cu121 pytorch
[conda] torchsnapshot 0.1.0 pyhd8ed1ab_0 conda-forge
[conda] torchtnt 0.2.3 pyhd8ed1ab_0 conda-forge
[conda] torchtriton 2.2.0 py310 pytorch
[conda] torchvision 0.17.1 py310_cu121 pytorch
Hey @djl70, thanks for reporting and apologies for this. We'll work on a fix and release a new version.
In the meantime, if you only need fit and are using a plain Unit rather than AutoUnit, you can import it with "from torchtnt.framework.fit import fit". Importing from torchtnt.framework pulls in all of the dependencies listed in torchtnt/framework/__init__.py, which include AutoUnit.
Hi @galrotem, it looks like the correct import is already at the top of precision.py, so someone just needs to delete "torch.amp." on line 42 and it's fixed!
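The one-line fix in this comment makes the annotation use the GradScaler name already imported at the top of precision.py. If code instead needs to run on PyTorch releases both with and without torch.amp.GradScaler, one alternative is to look the class up at runtime with a fallback. This is a minimal sketch, not what torchtnt shipped. The helper resolve_grad_scaler and the stand-in classes are hypothetical, and the stand-ins let the sketch run without PyTorch installed. With a real install, you would pass torch.amp and torch.cuda.amp as the two arguments.

```python
import types
from typing import Any, Optional


def resolve_grad_scaler(amp_module: Any, cuda_amp_module: Any) -> Optional[type]:
    """Return GradScaler from amp_module if present, else from cuda_amp_module."""
    # Prefer the device-generic location, then fall back to the CUDA-specific one.
    for module in (amp_module, cuda_amp_module):
        scaler_cls = getattr(module, "GradScaler", None)
        if scaler_cls is not None:
            return scaler_cls
    # Neither module provides GradScaler.
    return None


# Hypothetical stand-ins so the sketch runs without PyTorch installed.
class CudaGradScaler:
    pass


class GenericGradScaler:
    pass


old_amp = types.SimpleNamespace()  # like torch.amp on 2.2.1: no GradScaler
new_amp = types.SimpleNamespace(GradScaler=GenericGradScaler)
cuda_amp = types.SimpleNamespace(GradScaler=CudaGradScaler)

print(resolve_grad_scaler(old_amp, cuda_amp).__name__)  # CudaGradScaler
print(resolve_grad_scaler(new_amp, cuda_amp).__name__)  # GenericGradScaler
```

Because the lookup uses getattr with a default, a missing attribute yields None instead of raising at import time.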
Hi @yiminglin-ai, we aim to release within the next day or two, sorry for the delay!