aws/amazon-sagemaker-examples

Out of Memory when running the notebook according to instructions

Leggerla opened this issue · 2 comments

Link to the notebook
Notebook

Describe the bug
Running the cell
!docker run -it --gpus all -v ${PWD}:/mount nvcr.io/nvidia/pytorch:22.10-py3 /bin/bash /mount/export.sh --verbose | tee conversion.txt
fails with the following error:
Error Code 2: OutOfMemory (no further information) [05/08/2024-20:16:40] [W] [TRT] Requested amount of GPU memory (34359738368 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
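To put the numbers in the error into perspective, here is a small sketch that converts the requested allocation into GiB and compares it with the device memory that trtexec reports for the Tesla T4 in the logs ("Device Global Memory: 15102 MiB"). Both constants are copied from the log output; the variable names are only illustrative.

```python
# Values copied from the log output in this issue.
REQUESTED_BYTES = 34359738368   # "Requested amount of GPU memory (... bytes)"
DEVICE_MEMORY_MIB = 15102       # "Device Global Memory: 15102 MiB" (Tesla T4)

MIB = 1024 ** 2
GIB = 1024 ** 3

# Convert both figures to GiB so they can be compared directly.
requested_gib = REQUESTED_BYTES / GIB
device_gib = DEVICE_MEMORY_MIB * MIB / GIB

# How many times larger the request is than the whole device memory.
ratio = REQUESTED_BYTES / (DEVICE_MEMORY_MIB * MIB)

print(f"requested: {requested_gib:.2f} GiB")   # 32.00 GiB
print(f"device:    {device_gib:.2f} GiB")      # 14.75 GiB
print(f"ratio:     {ratio:.2f}x")              # 2.17x
```

So the tactic asks for 32 GiB on a GPU with about 14.75 GiB in total, which can never succeed on this instance type. The log itself suggests lowering the workspace limit with IBuilderConfig::setMemoryPoolLimit(); whether that is enough to make the conversion finish here has not been verified.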

To reproduce
In AWS SageMaker, create a notebook instance of type ml.g4dn.xlarge and run the notebook.

Logs

Unable to find image 'nvcr.io/nvidia/pytorch:22.10-py3' locally
22.10-py3: Pulling from nvidia/pytorch
fb0b3276a519: Pulling fs layer
2416db5e3ba6: Pulling fs layer
2ba01ce48f03: Pulling fs layer
1953d8b854c3: Pulling fs layer
76cd223c882b: Pulling fs layer
45bae771bc00: Pulling fs layer
416ceba70e02: Pulling fs layer
9f29debe0d89: Pulling fs layer
94cb84c1285d: Pulling fs layer
d8dcc244fe18: Pulling fs layer
33a5fab03e15: Pulling fs layer
02fe0924ac3c: Pulling fs layer
608c8a053303: Pulling fs layer
4f4fb700ef54: Pulling fs layer
079063e0d5ea: Pulling fs layer
3eb787a0a71b: Pulling fs layer
688b2c892903: Pulling fs layer
b706a6c654f3: Pulling fs layer
6f1e1da7bd7b: Pulling fs layer
dfb5d79a074f: Pulling fs layer
9cb4ae6c9b9e: Pulling fs layer
78180d3f3014: Pulling fs layer
7f0de540b633: Pulling fs layer
f874466111b2: Pulling fs layer
a113f7ab8786: Pulling fs layer
ef89dda3be8a: Pulling fs layer
0a7a609209c7: Pulling fs layer
d6c9f654232d: Pulling fs layer
0552db6fc3c7: Pulling fs layer
6faa6f074f1c: Pulling fs layer
76290bc2ba87: Pulling fs layer
4d6f0741709b: Pulling fs layer
d0f39b540bbd: Pulling fs layer
cb0ff236cd2f: Pulling fs layer
fe469f00cd2e: Pulling fs layer
8c9265b5196f: Pulling fs layer
a1ce2bb994e5: Pulling fs layer
745802940cf0: Pulling fs layer
fad8d441dc48: Pulling fs layer
62e169860c65: Pulling fs layer
56fc6ee11a76: Pulling fs layer
8079d7cc9429: Pulling fs layer
c2959f3c1f79: Pulling fs layer
b2f869cedbea: Pulling fs layer
342f9ecd7d0b: Pulling fs layer
7ed5a40d20ce: Pulling fs layer
b75f99413198: Pulling fs layer
9b13389ffa92: Pulling fs layer
f874466111b2: Waiting
a113f7ab8786: Waiting
a1ce2bb994e5: Waiting
ef89dda3be8a: Waiting
745802940cf0: Waiting
0a7a609209c7: Waiting
fad8d441dc48: Waiting
62e169860c65: Waiting
1953d8b854c3: Waiting
56fc6ee11a76: Waiting
76cd223c882b: Waiting
8079d7cc9429: Waiting
45bae771bc00: Waiting
c2959f3c1f79: Waiting
416ceba70e02: Waiting
b2f869cedbea: Waiting
342f9ecd7d0b: Waiting
9f29debe0d89: Waiting
7ed5a40d20ce: Waiting
94cb84c1285d: Waiting
d8dcc244fe18: Waiting
b75f99413198: Waiting
9b13389ffa92: Waiting
33a5fab03e15: Waiting
6f1e1da7bd7b: Waiting
608c8a053303: Waiting
dfb5d79a074f: Waiting
4f4fb700ef54: Waiting
9cb4ae6c9b9e: Waiting
079063e0d5ea: Waiting
78180d3f3014: Waiting
3eb787a0a71b: Waiting
7f0de540b633: Waiting
688b2c892903: Waiting
b706a6c654f3: Waiting
d6c9f654232d: Waiting
76290bc2ba87: Waiting
0552db6fc3c7: Waiting
4d6f0741709b: Waiting
6faa6f074f1c: Waiting
fe469f00cd2e: Waiting
d0f39b540bbd: Waiting
cb0ff236cd2f: Waiting
8c9265b5196f: Waiting
fb0b3276a519: Verifying Checksum
fb0b3276a519: Download complete
1953d8b854c3: Verifying Checksum
1953d8b854c3: Download complete
fb0b3276a519: Pull complete
2416db5e3ba6: Verifying Checksum
2416db5e3ba6: Download complete
2ba01ce48f03: Verifying Checksum
2ba01ce48f03: Download complete
45bae771bc00: Verifying Checksum
45bae771bc00: Download complete
416ceba70e02: Verifying Checksum
416ceba70e02: Download complete
9f29debe0d89: Verifying Checksum
9f29debe0d89: Download complete
94cb84c1285d: Verifying Checksum
94cb84c1285d: Download complete
d8dcc244fe18: Verifying Checksum
d8dcc244fe18: Download complete
02fe0924ac3c: Verifying Checksum
02fe0924ac3c: Download complete
2416db5e3ba6: Pull complete
33a5fab03e15: Verifying Checksum
33a5fab03e15: Download complete
4f4fb700ef54: Verifying Checksum
4f4fb700ef54: Download complete
079063e0d5ea: Verifying Checksum
079063e0d5ea: Download complete
2ba01ce48f03: Pull complete
1953d8b854c3: Pull complete
608c8a053303: Verifying Checksum
608c8a053303: Download complete
688b2c892903: Download complete
b706a6c654f3: Verifying Checksum
b706a6c654f3: Download complete
6f1e1da7bd7b: Verifying Checksum
6f1e1da7bd7b: Download complete
3eb787a0a71b: Verifying Checksum
3eb787a0a71b: Download complete
9cb4ae6c9b9e: Verifying Checksum
9cb4ae6c9b9e: Download complete
dfb5d79a074f: Verifying Checksum
dfb5d79a074f: Download complete
78180d3f3014: Verifying Checksum
78180d3f3014: Download complete
f874466111b2: Download complete
7f0de540b633: Verifying Checksum
7f0de540b633: Download complete
ef89dda3be8a: Verifying Checksum
ef89dda3be8a: Download complete
a113f7ab8786: Verifying Checksum
a113f7ab8786: Download complete
d6c9f654232d: Verifying Checksum
d6c9f654232d: Download complete
0552db6fc3c7: Download complete
0a7a609209c7: Verifying Checksum
0a7a609209c7: Download complete
76290bc2ba87: Verifying Checksum
76290bc2ba87: Download complete
4d6f0741709b: Download complete
d0f39b540bbd: Verifying Checksum
d0f39b540bbd: Download complete
cb0ff236cd2f: Verifying Checksum
fe469f00cd2e: Verifying Checksum
fe469f00cd2e: Download complete
6faa6f074f1c: Verifying Checksum
6faa6f074f1c: Download complete
a1ce2bb994e5: Verifying Checksum
a1ce2bb994e5: Download complete
745802940cf0: Download complete
fad8d441dc48: Verifying Checksum
fad8d441dc48: Download complete
62e169860c65: Verifying Checksum
62e169860c65: Download complete
56fc6ee11a76: Verifying Checksum
56fc6ee11a76: Download complete
8079d7cc9429: Verifying Checksum
8079d7cc9429: Download complete
c2959f3c1f79: Download complete
b2f869cedbea: Verifying Checksum
b2f869cedbea: Download complete
342f9ecd7d0b: Download complete
7ed5a40d20ce: Verifying Checksum
7ed5a40d20ce: Download complete
b75f99413198: Verifying Checksum
b75f99413198: Download complete
9b13389ffa92: Verifying Checksum
9b13389ffa92: Download complete
76cd223c882b: Verifying Checksum
76cd223c882b: Download complete
8c9265b5196f: Verifying Checksum
8c9265b5196f: Download complete
76cd223c882b: Pull complete
45bae771bc00: Pull complete
416ceba70e02: Pull complete
9f29debe0d89: Pull complete
94cb84c1285d: Pull complete
d8dcc244fe18: Pull complete
33a5fab03e15: Pull complete
02fe0924ac3c: Pull complete
608c8a053303: Pull complete
4f4fb700ef54: Pull complete
079063e0d5ea: Pull complete
3eb787a0a71b: Pull complete
688b2c892903: Pull complete
b706a6c654f3: Pull complete
6f1e1da7bd7b: Pull complete
dfb5d79a074f: Pull complete
9cb4ae6c9b9e: Pull complete
78180d3f3014: Pull complete
7f0de540b633: Pull complete
f874466111b2: Pull complete
a113f7ab8786: Pull complete
ef89dda3be8a: Pull complete
0a7a609209c7: Pull complete
d6c9f654232d: Pull complete
0552db6fc3c7: Pull complete
6faa6f074f1c: Pull complete
76290bc2ba87: Pull complete
4d6f0741709b: Pull complete
d0f39b540bbd: Pull complete
cb0ff236cd2f: Pull complete
fe469f00cd2e: Pull complete
8c9265b5196f: Pull complete
a1ce2bb994e5: Pull complete
745802940cf0: Pull complete
fad8d441dc48: Pull complete
62e169860c65: Pull complete
56fc6ee11a76: Pull complete
8079d7cc9429: Pull complete
c2959f3c1f79: Pull complete
b2f869cedbea: Pull complete
342f9ecd7d0b: Pull complete
7ed5a40d20ce: Pull complete
b75f99413198: Pull complete
9b13389ffa92: Pull complete
Digest: sha256:7ad18fc3d2b9cdc35f9e5f0043987e8391fcf592c88177fdd9daa31b3b886be9
Status: Downloaded newer image for nvcr.io/nvidia/pytorch:22.10-py3

=============
== PyTorch ==

NVIDIA Release 22.10 (build 46164382)
PyTorch Version 1.13.0a0+d0d6b1f

Container image Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Copyright (c) 2014-2022 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Copyright (c) 2015 Google Inc.
Copyright (c) 2015 Yangqing Jia
Copyright (c) 2013-2016 The Caffe contributors
All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
insufficient for PyTorch. NVIDIA recommends the use of the following flags:
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 ...

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com/
Collecting transformers
Downloading transformers-4.40.2-py3-none-any.whl (9.0 MB)
|████████████████████████████████| 9.0 MB 25.9 MB/s eta 0:00:01
Collecting ftfy
Downloading ftfy-6.2.0-py3-none-any.whl (54 kB)
|████████████████████████████████| 54 kB 62.4 MB/s eta 0:00:01
Requirement already satisfied: scipy in /opt/conda/lib/python3.8/site-packages (1.6.3)
Collecting tokenizers<0.20,>=0.19
Downloading tokenizers-0.19.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.6 MB)
|████████████████████████████████| 3.6 MB 74.7 MB/s eta 0:00:01
Requirement already satisfied: requests in /opt/conda/lib/python3.8/site-packages (from transformers) (2.28.1)
Requirement already satisfied: pyyaml>=5.1 in /opt/conda/lib/python3.8/site-packages (from transformers) (6.0)
Requirement already satisfied: numpy>=1.17 in /opt/conda/lib/python3.8/site-packages (from transformers) (1.22.2)
Collecting huggingface-hub<1.0,>=0.19.3
Downloading huggingface_hub-0.23.0-py3-none-any.whl (401 kB)
|████████████████████████████████| 401 kB 84.4 MB/s eta 0:00:01
Requirement already satisfied: tqdm>=4.27 in /opt/conda/lib/python3.8/site-packages (from transformers) (4.64.1)
Requirement already satisfied: regex!=2019.12.17 in /opt/conda/lib/python3.8/site-packages (from transformers) (2022.9.13)
Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.8/site-packages (from transformers) (21.3)
Requirement already satisfied: filelock in /opt/conda/lib/python3.8/site-packages (from transformers) (3.8.0)
Collecting safetensors>=0.4.1
Downloading safetensors-0.4.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
|████████████████████████████████| 1.2 MB 70.5 MB/s eta 0:00:01
Collecting wcwidth<0.3.0,>=0.2.12
Downloading wcwidth-0.2.13-py2.py3-none-any.whl (34 kB)
Collecting fsspec>=2023.5.0
Downloading fsspec-2024.3.1-py3-none-any.whl (171 kB)
|████████████████████████████████| 171 kB 82.4 MB/s eta 0:00:01
Requirement already satisfied: typing-extensions>=3.7.4.3 in /opt/conda/lib/python3.8/site-packages (from huggingface-hub<1.0,>=0.19.3->transformers) (4.4.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.8/site-packages (from packaging>=20.0->transformers) (3.0.9)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.8/site-packages (from requests->transformers) (2022.9.24)
Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.8/site-packages (from requests->transformers) (3.3)
Requirement already satisfied: charset-normalizer<3,>=2 in /opt/conda/lib/python3.8/site-packages (from requests->transformers) (2.1.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.8/site-packages (from requests->transformers) (1.26.11)
Installing collected packages: fsspec, huggingface-hub, wcwidth, tokenizers, safetensors, transformers, ftfy
Attempting uninstall: fsspec
Found existing installation: fsspec 2022.8.2
Uninstalling fsspec-2022.8.2:
Successfully uninstalled fsspec-2022.8.2
Attempting uninstall: wcwidth
Found existing installation: wcwidth 0.2.5
Uninstalling wcwidth-0.2.5:
Successfully uninstalled wcwidth-0.2.5
Successfully installed fsspec-2024.3.1 ftfy-6.2.0 huggingface-hub-0.23.0 safetensors-0.4.3 tokenizers-0.19.1 transformers-4.40.2 wcwidth-0.2.13
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com/
Requirement already satisfied: transformers[onnxruntime] in /opt/conda/lib/python3.8/site-packages (4.40.2)
Requirement already satisfied: tokenizers<0.20,>=0.19 in /opt/conda/lib/python3.8/site-packages (from transformers[onnxruntime]) (0.19.1)
Requirement already satisfied: huggingface-hub<1.0,>=0.19.3 in /opt/conda/lib/python3.8/site-packages (from transformers[onnxruntime]) (0.23.0)
Requirement already satisfied: numpy>=1.17 in /opt/conda/lib/python3.8/site-packages (from transformers[onnxruntime]) (1.22.2)
Requirement already satisfied: requests in /opt/conda/lib/python3.8/site-packages (from transformers[onnxruntime]) (2.28.1)
Requirement already satisfied: tqdm>=4.27 in /opt/conda/lib/python3.8/site-packages (from transformers[onnxruntime]) (4.64.1)
Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.8/site-packages (from transformers[onnxruntime]) (21.3)
Requirement already satisfied: regex!=2019.12.17 in /opt/conda/lib/python3.8/site-packages (from transformers[onnxruntime]) (2022.9.13)
Requirement already satisfied: safetensors>=0.4.1 in /opt/conda/lib/python3.8/site-packages (from transformers[onnxruntime]) (0.4.3)
Requirement already satisfied: pyyaml>=5.1 in /opt/conda/lib/python3.8/site-packages (from transformers[onnxruntime]) (6.0)
Requirement already satisfied: filelock in /opt/conda/lib/python3.8/site-packages (from transformers[onnxruntime]) (3.8.0)
Collecting onnxruntime-tools>=1.4.2
Downloading onnxruntime_tools-1.7.0-py3-none-any.whl (212 kB)
|████████████████████████████████| 212 kB 27.1 MB/s eta 0:00:01
Collecting onnxruntime>=1.4.0
Downloading onnxruntime-1.17.3-cp38-cp38-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (6.8 MB)
|████████████████████████████████| 6.8 MB 51.6 MB/s eta 0:00:01
Requirement already satisfied: fsspec>=2023.5.0 in /opt/conda/lib/python3.8/site-packages (from huggingface-hub<1.0,>=0.19.3->transformers[onnxruntime]) (2024.3.1)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /opt/conda/lib/python3.8/site-packages (from huggingface-hub<1.0,>=0.19.3->transformers[onnxruntime]) (4.4.0)
Collecting sympy
Downloading sympy-1.12-py3-none-any.whl (5.7 MB)
|████████████████████████████████| 5.7 MB 77.3 MB/s eta 0:00:01
Collecting flatbuffers
Downloading flatbuffers-24.3.25-py2.py3-none-any.whl (26 kB)
Collecting coloredlogs
Downloading coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)
|████████████████████████████████| 46 kB 60.7 MB/s eta 0:00:01
Requirement already satisfied: protobuf in /opt/conda/lib/python3.8/site-packages (from onnxruntime>=1.4.0->transformers[onnxruntime]) (3.20.3)
Requirement already satisfied: psutil in /opt/conda/lib/python3.8/site-packages (from onnxruntime-tools>=1.4.2->transformers[onnxruntime]) (5.9.2)
Collecting py3nvml
Downloading py3nvml-0.2.7-py3-none-any.whl (55 kB)
|████████████████████████████████| 55 kB 63.4 MB/s eta 0:00:01
Requirement already satisfied: onnx in /opt/conda/lib/python3.8/site-packages (from onnxruntime-tools>=1.4.2->transformers[onnxruntime]) (1.12.0)
Collecting py-cpuinfo
Downloading py_cpuinfo-9.0.0-py3-none-any.whl (22 kB)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.8/site-packages (from packaging>=20.0->transformers[onnxruntime]) (3.0.9)
Collecting humanfriendly>=9.1
Downloading humanfriendly-10.0-py2.py3-none-any.whl (86 kB)
|████████████████████████████████| 86 kB 74.2 MB/s eta 0:00:01
Collecting protobuf
Downloading protobuf-3.20.1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.0 MB)
|████████████████████████████████| 1.0 MB 83.3 MB/s eta 0:00:01
Collecting xmltodict
Downloading xmltodict-0.13.0-py2.py3-none-any.whl (10.0 kB)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.8/site-packages (from requests->transformers[onnxruntime]) (2022.9.24)
Requirement already satisfied: charset-normalizer<3,>=2 in /opt/conda/lib/python3.8/site-packages (from requests->transformers[onnxruntime]) (2.1.0)
Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.8/site-packages (from requests->transformers[onnxruntime]) (3.3)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.8/site-packages (from requests->transformers[onnxruntime]) (1.26.11)
Collecting mpmath>=0.19
Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)
|████████████████████████████████| 536 kB 82.4 MB/s eta 0:00:01
Installing collected packages: xmltodict, protobuf, mpmath, humanfriendly, sympy, py3nvml, py-cpuinfo, flatbuffers, coloredlogs, onnxruntime-tools, onnxruntime
Attempting uninstall: protobuf
Found existing installation: protobuf 3.20.3
Uninstalling protobuf-3.20.3:
Successfully uninstalled protobuf-3.20.3
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorboard 2.10.1 requires protobuf<3.20,>=3.9.2, but you have protobuf 3.20.1 which is incompatible.
Successfully installed coloredlogs-15.0.1 flatbuffers-24.3.25 humanfriendly-10.0 mpmath-1.3.0 onnxruntime-1.17.3 onnxruntime-tools-1.7.0 protobuf-3.20.1 py-cpuinfo-9.0.0 py3nvml-0.2.7 sympy-1.12 xmltodict-0.13.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com/
Collecting diffusers
Downloading diffusers-0.27.2-py3-none-any.whl (2.0 MB)
|████████████████████████████████| 2.0 MB 19.3 MB/s eta 0:00:01
Requirement already satisfied: safetensors>=0.3.1 in /opt/conda/lib/python3.8/site-packages (from diffusers) (0.4.3)
Requirement already satisfied: huggingface-hub>=0.20.2 in /opt/conda/lib/python3.8/site-packages (from diffusers) (0.23.0)
Requirement already satisfied: filelock in /opt/conda/lib/python3.8/site-packages (from diffusers) (3.8.0)
Requirement already satisfied: importlib-metadata in /opt/conda/lib/python3.8/site-packages (from diffusers) (5.0.0)
Requirement already satisfied: regex!=2019.12.17 in /opt/conda/lib/python3.8/site-packages (from diffusers) (2022.9.13)
Requirement already satisfied: Pillow in /opt/conda/lib/python3.8/site-packages (from diffusers) (9.0.1)
Requirement already satisfied: requests in /opt/conda/lib/python3.8/site-packages (from diffusers) (2.28.1)
Requirement already satisfied: numpy in /opt/conda/lib/python3.8/site-packages (from diffusers) (1.22.2)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /opt/conda/lib/python3.8/site-packages (from huggingface-hub>=0.20.2->diffusers) (4.4.0)
Requirement already satisfied: tqdm>=4.42.1 in /opt/conda/lib/python3.8/site-packages (from huggingface-hub>=0.20.2->diffusers) (4.64.1)
Requirement already satisfied: packaging>=20.9 in /opt/conda/lib/python3.8/site-packages (from huggingface-hub>=0.20.2->diffusers) (21.3)
Requirement already satisfied: fsspec>=2023.5.0 in /opt/conda/lib/python3.8/site-packages (from huggingface-hub>=0.20.2->diffusers) (2024.3.1)
Requirement already satisfied: pyyaml>=5.1 in /opt/conda/lib/python3.8/site-packages (from huggingface-hub>=0.20.2->diffusers) (6.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.8/site-packages (from packaging>=20.9->huggingface-hub>=0.20.2->diffusers) (3.0.9)
Requirement already satisfied: zipp>=0.5 in /opt/conda/lib/python3.8/site-packages (from importlib-metadata->diffusers) (3.9.0)
Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.8/site-packages (from requests->diffusers) (3.3)
Requirement already satisfied: charset-normalizer<3,>=2 in /opt/conda/lib/python3.8/site-packages (from requests->diffusers) (2.1.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.8/site-packages (from requests->diffusers) (1.26.11)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.8/site-packages (from requests->diffusers) (2022.9.24)
Installing collected packages: diffusers
Successfully installed diffusers-0.27.2
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling transformers.utils.move_cache().
0it [00:00, ?it/s]
Cannot initialize model with low cpu memory usage because accelerate was not found in the environment. Defaulting to low_cpu_mem_usage=False. It is strongly recommended to install accelerate for faster and less memory-intense model loading. You can do so with:

pip install accelerate

.
/opt/conda/lib/python3.8/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
vae/config.json: 100% 551/551 [00:00<00:00, 84.5kB/s]
diffusion_pytorch_model.safetensors: 100% 335M/335M [00:00<00:00, 387MB/s]
tokenizer_config.json: 100% 905/905 [00:00<00:00, 139kB/s]
vocab.json: 100% 961k/961k [00:00<00:00, 40.9MB/s]
merges.txt: 100% 525k/525k [00:00<00:00, 48.5MB/s]
special_tokens_map.json: 100% 389/389 [00:00<00:00, 239kB/s]
tokenizer.json: 100% 2.22M/2.22M [00:00<00:00, 27.9MB/s]
config.json: 100% 4.52k/4.52k [00:00<00:00, 904kB/s]
model.safetensors: 100% 1.71G/1.71G [00:03<00:00, 443MB/s]
/opt/conda/lib/python3.8/site-packages/diffusers/models/upsampling.py:149: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert hidden_states.shape[1] == self.channels
/opt/conda/lib/python3.8/site-packages/diffusers/models/upsampling.py:165: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if hidden_states.shape[0] >= 64:
/opt/conda/lib/python3.8/site-packages/diffusers/models/autoencoders/autoencoder_kl.py:306: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if not return_dict:
/opt/conda/lib/python3.8/site-packages/torch/onnx/_patch_torch.py:69: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at /opt/pytorch/pytorch/torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1880.)
torch._C._jit_pass_onnx_node_shape_type_inference(
/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py:649: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at /opt/pytorch/pytorch/torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1880.)
_C._jit_pass_onnx_graph_shape_type_inference(
/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py:1125: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at /opt/pytorch/pytorch/torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1880.)
_C._jit_pass_onnx_graph_shape_type_inference(
Here is the shape of the input -----------------------------------------------------
torch.Size([1, 77])
/opt/conda/lib/python3.8/site-packages/transformers/modeling_attn_mask_utils.py:86: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if input_shape[-1] > 1 or self.sliding_window is not None:
/opt/conda/lib/python3.8/site-packages/transformers/modeling_attn_mask_utils.py:162: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if past_key_values_length > 0:
/opt/conda/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py:279: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/opt/conda/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py:287: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if causal_attention_mask.size() != (bsz, 1, tgt_len, src_len):
/opt/conda/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py:319: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
/opt/conda/lib/python3.8/site-packages/torch/onnx/symbolic_opset9.py:4595: UserWarning: Exporting aten::index operator of advanced indexing in opset 14 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.
warnings.warn(
&&&& RUNNING TensorRT.trtexec [TensorRT v8500] # trtexec --onnx=vae.onnx --saveEngine=vae.plan --minShapes=latent_sample:1x4x64x64 --optShapes=latent_sample:4x4x64x64 --maxShapes=latent_sample:8x4x64x64 --fp16
[05/08/2024-19:50:50] [I] === Model Options ===
[05/08/2024-19:50:50] [I] Format: ONNX
[05/08/2024-19:50:50] [I] Model: vae.onnx
[05/08/2024-19:50:50] [I] Output:
[05/08/2024-19:50:50] [I] === Build Options ===
[05/08/2024-19:50:50] [I] Max batch: explicit batch
[05/08/2024-19:50:50] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[05/08/2024-19:50:50] [I] minTiming: 1
[05/08/2024-19:50:50] [I] avgTiming: 8
[05/08/2024-19:50:50] [I] Precision: FP32+FP16
[05/08/2024-19:50:50] [I] LayerPrecisions:
[05/08/2024-19:50:50] [I] Calibration:
[05/08/2024-19:50:50] [I] Refit: Disabled
[05/08/2024-19:50:50] [I] Sparsity: Disabled
[05/08/2024-19:50:50] [I] Safe mode: Disabled
[05/08/2024-19:50:50] [I] DirectIO mode: Disabled
[05/08/2024-19:50:50] [I] Restricted mode: Disabled
[05/08/2024-19:50:50] [I] Build only: Disabled
[05/08/2024-19:50:50] [I] Save engine: vae.plan
[05/08/2024-19:50:50] [I] Load engine:
[05/08/2024-19:50:50] [I] Profiling verbosity: 0
[05/08/2024-19:50:50] [I] Tactic sources: Using default tactic sources
[05/08/2024-19:50:50] [I] timingCacheMode: local
[05/08/2024-19:50:50] [I] timingCacheFile:
[05/08/2024-19:50:50] [I] Heuristic: Disabled
[05/08/2024-19:50:50] [I] Preview Features: Use default preview flags.
[05/08/2024-19:50:50] [I] Input(s)s format: fp32:CHW
[05/08/2024-19:50:50] [I] Output(s)s format: fp32:CHW
[05/08/2024-19:50:50] [I] Input build shape: latent_sample=1x4x64x64+4x4x64x64+8x4x64x64
[05/08/2024-19:50:50] [I] Input calibration shapes: model
[05/08/2024-19:50:50] [I] === System Options ===
[05/08/2024-19:50:50] [I] Device: 0
[05/08/2024-19:50:50] [I] DLACore:
[05/08/2024-19:50:50] [I] Plugins:
[05/08/2024-19:50:50] [I] === Inference Options ===
[05/08/2024-19:50:50] [I] Batch: Explicit
[05/08/2024-19:50:50] [I] Input inference shape: latent_sample=4x4x64x64
[05/08/2024-19:50:50] [I] Iterations: 10
[05/08/2024-19:50:50] [I] Duration: 3s (+ 200ms warm up)
[05/08/2024-19:50:50] [I] Sleep time: 0ms
[05/08/2024-19:50:50] [I] Idle time: 0ms
[05/08/2024-19:50:50] [I] Streams: 1
[05/08/2024-19:50:50] [I] ExposeDMA: Disabled
[05/08/2024-19:50:50] [I] Data transfers: Enabled
[05/08/2024-19:50:50] [I] Spin-wait: Disabled
[05/08/2024-19:50:50] [I] Multithreading: Disabled
[05/08/2024-19:50:50] [I] CUDA Graph: Disabled
[05/08/2024-19:50:50] [I] Separate profiling: Disabled
[05/08/2024-19:50:50] [I] Time Deserialize: Disabled
[05/08/2024-19:50:50] [I] Time Refit: Disabled
[05/08/2024-19:50:50] [I] NVTX verbosity: 0
[05/08/2024-19:50:50] [I] Persistent Cache Ratio: 0
[05/08/2024-19:50:50] [I] Inputs:
[05/08/2024-19:50:50] [I] === Reporting Options ===
[05/08/2024-19:50:50] [I] Verbose: Disabled
[05/08/2024-19:50:50] [I] Averages: 10 inferences
[05/08/2024-19:50:50] [I] Percentiles: 90,95,99
[05/08/2024-19:50:50] [I] Dump refittable layers:Disabled
[05/08/2024-19:50:50] [I] Dump output: Disabled
[05/08/2024-19:50:50] [I] Profile: Disabled
[05/08/2024-19:50:50] [I] Export timing to JSON file:
[05/08/2024-19:50:50] [I] Export output to JSON file:
[05/08/2024-19:50:50] [I] Export profile to JSON file:
[05/08/2024-19:50:50] [I]
[05/08/2024-19:50:50] [I] === Device Information ===
[05/08/2024-19:50:50] [I] Selected Device: Tesla T4
[05/08/2024-19:50:50] [I] Compute Capability: 7.5
[05/08/2024-19:50:50] [I] SMs: 40
[05/08/2024-19:50:50] [I] Compute Clock Rate: 1.59 GHz
[05/08/2024-19:50:50] [I] Device Global Memory: 15102 MiB
[05/08/2024-19:50:50] [I] Shared Memory per SM: 64 KiB
[05/08/2024-19:50:50] [I] Memory Bus Width: 256 bits (ECC enabled)
[05/08/2024-19:50:50] [I] Memory Clock Rate: 5.001 GHz
[05/08/2024-19:50:50] [I]
[05/08/2024-19:50:50] [I] TensorRT version: 8.5.0
[05/08/2024-19:50:51] [I] [TRT] [MemUsageChange] Init CUDA: CPU +13, GPU +0, now: CPU 26, GPU 103 (MiB)
[05/08/2024-19:50:55] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +340, GPU +74, now: CPU 418, GPU 177 (MiB)
[05/08/2024-19:50:55] [I] Start parsing network model
[05/08/2024-19:50:55] [I] [TRT] ----------------------------------------------------------------
[05/08/2024-19:50:55] [I] [TRT] Input filename: vae.onnx
[05/08/2024-19:50:55] [I] [TRT] ONNX IR version: 0.0.7
[05/08/2024-19:50:55] [I] [TRT] Opset version: 14
[05/08/2024-19:50:55] [I] [TRT] Producer name: pytorch
[05/08/2024-19:50:55] [I] [TRT] Producer version: 1.13.0
[05/08/2024-19:50:55] [I] [TRT] Domain:
[05/08/2024-19:50:55] [I] [TRT] Model version: 0
[05/08/2024-19:50:55] [I] [TRT] Doc string:
[05/08/2024-19:50:55] [I] [TRT] ----------------------------------------------------------------
[05/08/2024-19:50:55] [W] [TRT] parsers/onnx/onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/08/2024-19:50:56] [I] Finish parsing network model
[05/08/2024-19:50:57] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 639, GPU 827 (MiB)
[05/08/2024-19:50:57] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 639, GPU 837 (MiB)
[05/08/2024-19:50:57] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[05/08/2024-20:16:40] [E] Error[2]: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[05/08/2024-20:16:40] [E] Error[2]: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[05/08/2024-20:16:40] [W] [TRT] Requested amount of GPU memory (34359738368 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[05/08/2024-20:16:40] [W] [TRT] Skipping tactic 2 due to insufficient memory on requested size of 34359738368 detected for tactic 0x0000000000000002.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[05/08/2024-20:16:51] [E] Error[2]: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[05/08/2024-20:16:51] [E] Error[2]: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[05/08/2024-20:16:51] [W] [TRT] Requested amount of GPU memory (34359738368 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[05/08/2024-20:16:51] [W] [TRT] Skipping tactic 7 due to insufficient memory on requested size of 34359738368 detected for tactic 0x000000000000003a.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[05/08/2024-20:21:17] [E] Error[2]: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[05/08/2024-20:21:17] [E] Error[2]: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[05/08/2024-20:21:17] [W] [TRT] Requested amount of GPU memory (34359738368 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[05/08/2024-20:21:17] [W] [TRT] Skipping tactic 2 due to insufficient memory on requested size of 34359738368 detected for tactic 0x0000000000000002.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[05/08/2024-20:21:22] [E] Error[2]: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[05/08/2024-20:21:22] [E] Error[2]: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[05/08/2024-20:21:22] [W] [TRT] Requested amount of GPU memory (34359738368 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[05/08/2024-20:21:22] [W] [TRT] Skipping tactic 7 due to insufficient memory on requested size of 34359738368 detected for tactic 0x000000000000003a.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[05/08/2024-20:26:27] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[05/08/2024-20:26:27] [I] [TRT] Total Host Persistent Memory: 169088
[05/08/2024-20:26:27] [I] [TRT] Total Device Persistent Memory: 16685568
[05/08/2024-20:26:27] [I] [TRT] Total Scratch Memory: 33554432
[05/08/2024-20:26:27] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 97 MiB, GPU 12290 MiB
[05/08/2024-20:26:27] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 105.026ms to assign 7 blocks to 311 nodes requiring 3556769796 bytes.
[05/08/2024-20:26:27] [I] [TRT] Total Activation Memory: 3556769796
[05/08/2024-20:26:27] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1210, GPU 1355 (MiB)
[05/08/2024-20:26:27] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1210, GPU 1363 (MiB)
[05/08/2024-20:26:27] [W] [TRT] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[05/08/2024-20:26:27] [W] [TRT] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[05/08/2024-20:26:27] [W] [TRT] Check verbose logs for the list of affected weights.
[05/08/2024-20:26:27] [W] [TRT] - 53 weights are affected by this issue: Detected subnormal FP16 values.
[05/08/2024-20:26:27] [W] [TRT] - 27 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[05/08/2024-20:26:27] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +96, GPU +112, now: CPU 96, GPU 112 (MiB)
[05/08/2024-20:26:28] [I] Engine built in 2137.59 sec.
[05/08/2024-20:26:28] [I] [TRT] Loaded engine size: 97 MiB
[05/08/2024-20:26:28] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 664, GPU 779 (MiB)
[05/08/2024-20:26:28] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 664, GPU 787 (MiB)
[05/08/2024-20:26:28] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +112, now: CPU 0, GPU 112 (MiB)
[05/08/2024-20:26:28] [I] Engine deserialized in 0.0565643 sec.
[05/08/2024-20:26:28] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 664, GPU 797 (MiB)
[05/08/2024-20:26:28] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 664, GPU 805 (MiB)
[05/08/2024-20:26:28] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +3408, now: CPU 0, GPU 3520 (MiB)
[05/08/2024-20:26:28] [I] Setting persistentCacheLimit to 0 bytes.
[05/08/2024-20:26:28] [I] Using random values for input latent_sample
[05/08/2024-20:26:28] [I] Created input binding for latent_sample with dimensions 4x4x64x64
[05/08/2024-20:26:28] [I] Using random values for output sample
[05/08/2024-20:26:28] [I] Created output binding for sample with dimensions 4x3x512x512
[05/08/2024-20:26:28] [I] Starting inference
[05/08/2024-20:26:34] [I] Warmup completed 1 queries over 200 ms
[05/08/2024-20:26:34] [I] Timing trace has 10 queries over 6.31709 s
[05/08/2024-20:26:34] [I]
[05/08/2024-20:26:34] [I] === Trace details ===
[05/08/2024-20:26:34] [I] Trace averages of 10 runs:
[05/08/2024-20:26:34] [I] Average on 10 runs - GPU latency: 576.906 ms - Host latency: 579.155 ms (enqueue 2.51767 ms)
[05/08/2024-20:26:34] [I]
[05/08/2024-20:26:34] [I] === Performance summary ===
[05/08/2024-20:26:34] [I] Throughput: 1.58301 qps
[05/08/2024-20:26:34] [I] Latency: min = 563.076 ms, max = 591.083 ms, mean = 579.155 ms, median = 580.306 ms, percentile(90%) = 590.798 ms, percentile(95%) = 591.083 ms, percentile(99%) = 591.083 ms
[05/08/2024-20:26:34] [I] Enqueue Time: min = 2.23538 ms, max = 2.66504 ms, mean = 2.51767 ms, median = 2.52121 ms, percentile(90%) = 2.66113 ms, percentile(95%) = 2.66504 ms, percentile(99%) = 2.66504 ms
[05/08/2024-20:26:34] [I] H2D Latency: min = 0.0797439 ms, max = 0.0925293 ms, mean = 0.0872102 ms, median = 0.0871582 ms, percentile(90%) = 0.0898438 ms, percentile(95%) = 0.0925293 ms, percentile(99%) = 0.0925293 ms
[05/08/2024-20:26:34] [I] GPU Compute Time: min = 560.796 ms, max = 588.808 ms, mean = 576.906 ms, median = 578.167 ms, percentile(90%) = 588.523 ms, percentile(95%) = 588.808 ms, percentile(99%) = 588.808 ms
[05/08/2024-20:26:34] [I] D2H Latency: min = 1.90869 ms, max = 2.19568 ms, mean = 2.16147 ms, median = 2.1897 ms, percentile(90%) = 2.19324 ms, percentile(95%) = 2.19568 ms, percentile(99%) = 2.19568 ms
[05/08/2024-20:26:34] [I] Total Host Walltime: 6.31709 s
[05/08/2024-20:26:34] [I] Total GPU Compute Time: 5.76906 s
[05/08/2024-20:26:34] [W] * GPU compute time is unstable, with coefficient of variance = 1.55722%.
[05/08/2024-20:26:34] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[05/08/2024-20:26:34] [I] Explanations of the performance metrics are printed in the verbose logs.
[05/08/2024-20:26:34] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8500] # trtexec --onnx=vae.onnx --saveEngine=vae.plan --minShapes=latent_sample:1x4x64x64 --optShapes=latent_sample:4x4x64x64 --maxShapes=latent_sample:8x4x64x64 --fp16

@Leggerla The error message indicates that the requested GPU memory allocation (34359738368 bytes, which is 32 GiB or about 34 GB) could not be satisfied. The ml.g4dn.xlarge instance type has a single Tesla T4 GPU with 16 GB of memory (the log reports 15102 MiB of device global memory), which is less than half of the requested allocation.
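
As a quick sanity check on those numbers, the arithmetic can be sketched as follows. Both values are copied from the log above; the variable names are only illustrative.

```python
# Values copied from the trtexec log in this issue.
requested_bytes = 34_359_738_368  # "Requested amount of GPU memory"
device_mib = 15_102               # "Device Global Memory: 15102 MiB"

MIB = 1024 ** 2
GIB = 1024 ** 3

# Convert both figures to GiB so they can be compared directly.
requested_gib = requested_bytes / GIB
device_gib = device_mib * MIB / GIB

print(f"requested: {requested_gib:.2f} GiB")
print(f"available: {device_gib:.2f} GiB")
print(f"requested / available: {requested_gib / device_gib:.2f}x")
```

The request is exactly 2^35 bytes, so it looks like a fixed workspace size asked for by one tactic rather than something derived from the model's weights.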
You might try the following:

  1. Reduce memory requirements: modify the script or model configuration so that it needs less GPU memory.
  2. Use a larger instance: choose an instance type with more memory per GPU. A single allocation cannot span GPUs, so per-GPU memory matters more than the total. For example:
    ml.p3.2xlarge with one Tesla V100 GPU (16 GB, the same per-GPU memory as the T4)
    ml.p3.8xlarge with four Tesla V100 GPUs (16 GB each, 64 GB total)
    ml.p4d.24xlarge with eight A100 GPUs (40 GB each, 320 GB total)
    (though not recommended)
  3. Optimize the model:
    a. Model pruning:
import torch
import torch.nn.utils.prune as prune
import torchvision.models as models

# Load a pre-trained ResNet model as a stand-in example
model = models.resnet18(pretrained=True)

# Zero out the 20% of weights with the smallest magnitude in each convolutional layer
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Conv2d):
        prune.l1_unstructured(module, name='weight', amount=0.2)

# Make the pruning permanent by removing the reparameterization.
# The weight tensors stay dense: pruned entries are stored as zeros.
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Conv2d):
        prune.remove(module, 'weight')

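    b. Model quantization:

import torchvision.models.quantization as qmodels

# fuse_model() exists only on torchvision's quantizable ResNet, so copy the
# float weights from the model above into that variant first
qat_model = qmodels.resnet18(quantize=False)
qat_model.load_state_dict(model.state_dict())

# Quantization-aware training requires training mode
qat_model.train()

# Fuse Conv, BatchNorm and ReLU layers for quantization
qat_model.fuse_model()

# Specify the quantization configuration (the fbgemm backend targets x86 CPUs)
qat_model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')

# Prepare the model for quantization-aware training
torch.quantization.prepare_qat(qat_model, inplace=True)

(You can also fine-tune the model at this point.)

# Convert the prepared model into a quantized model
quantized_model = torch.quantization.convert(qat_model.eval(), inplace=False)

    c. Save the optimized model:

torch.save(quantized_model.state_dict(), 'optimized_model.pth')

    d. Load and use the optimized model:

# Rebuild the same quantized architecture before loading the saved weights
model = qmodels.resnet18(quantize=False)
model.train()
model.fuse_model()
model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
torch.quantization.prepare_qat(model, inplace=True)
quantized_model = torch.quantization.convert(model.eval(), inplace=False)
quantized_model.load_state_dict(torch.load('optimized_model.pth'))

# Set the model to evaluation mode
quantized_model.eval()

# Perform inference (runs on the CPU)
input_tensor = torch.randn(1, 3, 224, 224)
output = quantized_model(input_tensor)
print(output)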
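
For option 1, the log itself suggests decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit(). With trtexec, the corresponding flag is --memPoolSize, which takes sizes in MiB. Below is a minimal sketch that rebuilds the command shown on the PASSED line of the log with a capped workspace. The helper name `build_trtexec_cmd` and the 8192 MiB value are assumptions for illustration; I have not checked what export.sh in the notebook actually contains, so the flag would need to be added wherever that script invokes trtexec.

```python
import shlex
from typing import List


def build_trtexec_cmd(workspace_mib: int) -> List[str]:
    """Return the trtexec arguments from the log plus a workspace cap in MiB.

    Hypothetical helper for illustration; it is not part of the notebook.
    """
    if workspace_mib <= 0:
        raise ValueError("workspace_mib must be positive")
    return [
        "trtexec",
        "--onnx=vae.onnx",
        "--saveEngine=vae.plan",
        "--minShapes=latent_sample:1x4x64x64",
        "--optShapes=latent_sample:4x4x64x64",
        "--maxShapes=latent_sample:8x4x64x64",
        "--fp16",
        # Cap the builder workspace so tactics stay within the T4's memory.
        f"--memPoolSize=workspace:{workspace_mib}",
    ]


# 8192 MiB is an arbitrary example that fits in the T4's 15102 MiB.
cmd = build_trtexec_cmd(workspace_mib=8192)
print(shlex.join(cmd))
```

Note that in the log above the build still finished (the last line reports PASSED): the tactics that asked for 32 GiB were skipped rather than aborting the build. Capping the workspace may therefore mainly remove the error messages, and possibly shorten the build; I have not measured either effect.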

Please let me know whether this helps resolve the issue.
Thanks