Is there something wrong in torch_neuronx.trace?
mhokchuekchuek opened this issue · 3 comments
I compiled YOLOv10 on both inf1 and inf2.

model compilation
- inf1: compilation is OK.
- inf2: I got an error from the is_leaf check on the params (see the full error in the compiler output below). After hitting the error, I commented out the assert param.is_leaf line; with that change I can compile my model for inf2. I then checked PyTorch v.1.1.3, which also checks is_leaf on params in the same place, yet everything is fine on inf1. Can you explain what I did wrong when compiling for inf2?
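For context, here is a minimal sketch (plain PyTorch, independent of the YOLOv10 code) of what the is_leaf check is about:

```python
import torch

# A tensor created directly by the user is a leaf of the autograd graph.
w = torch.randn(3, 3, requires_grad=True)
assert w.is_leaf

# A tensor produced by an autograd operation is NOT a leaf.
fused = w * 2.0
assert not fused.is_leaf

# Module.to(device) asserts is_leaf on each parameter when moving it to a
# new device, so a module holding tensors like `fused` as parameters cannot
# be moved (e.g. to the XLA device during tracing). Detaching and
# re-wrapping restores leaf status:
recovered = torch.nn.Parameter(fused.detach())
assert recovered.is_leaf
```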
how to compile
- Follow this instruction to start the EC2 instance, then activate the environment:
  source /opt/aws_neuronx_venv_pytorch_2_1/bin/activate
- Clone this repo, then cd into the compile directory in yolov10.
- Download the YOLOv10 weights:
  wget -P ./weights -q https://github.com/THU-MIG/yolov10/releases/download/v1.1/yolov10l.pt
- Install the YOLOv10 requirements:
  pip install -r requirements-inf.txt
- Run the model compiler command:
  python complier.py --checkpoint weights/yolov10l.pt --output_dir . --mode neuronx
compiler output
06/18/2024 04:26:16 - INFO - __main__ - Tracing the model on CPU
YOLOv10l summary (fused): 461 layers, 25839728 parameters, 25839712 gradients
/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/torch/nn/modules/module.py:844: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at aten/src/ATen/core/TensorBody.h:489.)
if param.grad is not None:
06/18/2024 04:26:16 - INFO - torch_neuron - PJRT_DEVICE not set, defaulting to NEURON
/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/torch/nn/modules/module.py:844: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at aten/src/ATen/core/TensorBody.h:489.)
if param.grad is not None:
Traceback (most recent call last):
File "/home/ubuntu/yolov10/compile/complier.py", line 84, in <module>
traced_model = torch_neuronx.trace(yolo_model, preprocess_img)
File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 556, in trace
neff_filename, metaneff, flattener, packer, weights = _trace(
File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 614, in _trace
) = generate_hlo(
File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 404, in generate_hlo
) = xla_trace(
File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/torch_neuronx/xla_impl/hlo_conversion.py", line 114, in xla_trace
placement.move(state, xla_device)
File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/torch_neuronx/xla_impl/placement.py", line 51, in move
func.to(device)
File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1160, in to
return self._apply(convert)
File "/home/ubuntu/yolov10/compile/ultralytics/nn/tasks.py", line 270, in _apply
self = super()._apply(fn)
File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 840, in _apply
assert param.is_leaf
AssertionError
env
Package Version
----------------------------- -------------------
absl-py 2.1.0
aiohttp 3.9.5
aiosignal 1.3.1
amqp 5.2.0
annotated-types 0.7.0
ansicolors 1.1.8
anyio 4.4.0
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
arrow 1.3.0
astroid 3.2.2
asttokens 2.4.1
async-lru 2.0.4
async-timeout 4.0.3
attrs 23.2.0
Automat 22.10.0
aws-neuronx-runtime-discovery 2.9
awscli 1.32.113
Babel 2.15.0
beautifulsoup4 4.12.3
billiard 4.2.0
bleach 6.1.0
boto3 1.34.113
botocore 1.34.113
build 1.2.1
cachetools 5.3.3
celery 5.4.0
certifi 2024.2.2
cffi 1.16.0
charset-normalizer 3.3.2
click 8.1.7
click-didyoumean 0.3.1
click-plugins 1.1.1
click-repl 0.3.0
cloud-tpu-client 0.10
cloudpickle 3.0.0
cmake 3.29.3
colorama 0.4.6
comm 0.2.2
constantly 23.10.4
contourpy 1.2.1
cryptography 42.0.7
cssselect 1.2.0
cycler 0.12.1
dask 2024.5.1
debugpy 1.8.1
decorator 5.1.1
defusedxml 0.7.1
dill 0.3.8
distlib 0.3.8
dnspython 2.6.1
docutils 0.16
dparse 0.6.3
ec2-metadata 2.10.0
email_validator 2.1.1
entrypoints 0.4
environment-kernels 1.2.0
exceptiongroup 1.2.1
executing 2.0.1
fastapi 0.111.0
fastapi-cli 0.0.4
fastjsonschema 2.19.1
filelock 3.14.0
fonttools 4.52.1
fqdn 1.5.1
frozenlist 1.4.1
fsspec 2024.5.0
google-api-core 1.34.1
google-api-python-client 1.8.0
google-auth 2.29.0
google-auth-httplib2 0.2.0
googleapis-common-protos 1.63.0
h11 0.14.0
httpcore 1.0.5
httpie 3.2.2
httplib2 0.22.0
httptools 0.6.1
httpx 0.27.0
huggingface-hub 0.23.3
hyperlink 21.0.0
idna 3.7
imageio 2.34.1
importlib_metadata 7.1.0
incremental 22.10.0
iniconfig 2.0.0
ipykernel 6.29.4
ipython 8.24.0
ipywidgets 8.1.2
islpy 2023.1
isoduration 20.11.0
isort 5.13.2
itemadapter 0.9.0
itemloaders 1.2.0
jedi 0.19.1
Jinja2 3.1.4
jmespath 1.0.1
joblib 1.4.2
json5 0.9.25
jsonpointer 2.4
jsonschema 4.22.0
jsonschema-specifications 2023.12.1
jupyter 1.0.0
jupyter_client 8.6.2
jupyter-console 6.6.3
jupyter_core 5.7.2
jupyter-events 0.10.0
jupyter-lsp 2.2.5
jupyter_server 2.14.0
jupyter_server_terminals 0.5.3
jupyterlab 4.2.1
jupyterlab_pygments 0.3.0
jupyterlab_server 2.27.2
jupyterlab_widgets 3.0.10
kiwisolver 1.4.5
kombu 5.3.7
libneuronxla 2.0.965
llvmlite 0.42.0
locket 1.0.0
lockfile 0.12.2
lxml 5.2.2
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.9.0
matplotlib-inline 0.1.7
mccabe 0.7.0
mdurl 0.1.2
mistune 3.0.2
mpmath 1.3.0
multidict 6.0.5
nbclient 0.10.0
nbconvert 7.16.4
nbformat 5.10.4
nest-asyncio 1.6.0
networkx 2.6.3
neuronx-cc 2.13.72.0+78a426937
neuronx-distributed 0.7.0
notebook 7.2.0
notebook_shim 0.2.4
numba 0.59.1
numpy 1.25.2
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.18.1
nvidia-nvjitlink-cu12 12.5.40
nvidia-nvtx-cu12 12.1.105
oauth2client 4.1.3
opencv-python 4.9.0.80
orjson 3.10.3
overrides 7.7.0
packaging 21.3
pandas 2.2.2
pandocfilters 1.5.1
papermill 2.6.0
parsel 1.9.1
parso 0.8.4
partd 1.4.2
pexpect 4.9.0
pgzip 0.3.5
pillow 10.3.0
pip 24.0
pip-tools 7.4.1
pipenv 2023.12.1
platformdirs 4.2.2
plotly 5.22.0
pluggy 1.5.0
prometheus_client 0.20.0
prompt-toolkit 3.0.43
Protego 0.3.1
protobuf 3.19.6
psutil 5.9.8
ptyprocess 0.7.0
pure-eval 0.2.2
pyasn1 0.6.0
pyasn1_modules 0.4.0
pycparser 2.22
pydantic 2.7.1
pydantic_core 2.18.2
PyDispatcher 2.0.7
Pygments 2.18.0
pyinstrument 4.6.2
pylint 3.2.2
pyOpenSSL 24.1.0
pyparsing 3.1.2
pyproject_hooks 1.1.0
PySocks 1.7.1
pytest 8.2.1
python-daemon 3.0.1
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
python-json-logger 2.0.7
python-multipart 0.0.9
pytz 2024.1
PyYAML 6.0.1
pyzmq 26.0.3
qtconsole 5.5.2
QtPy 2.4.1
queuelib 1.7.0
referencing 0.35.1
requests 2.32.2
requests-file 2.1.0
requests-toolbelt 1.0.0
requests-unixsocket 0.3.0
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rich 13.7.1
rpds-py 0.18.1
rsa 4.7.2
ruamel.yaml 0.18.6
ruamel.yaml.clib 0.2.8
s3transfer 0.10.1
safety 2.3.5
scikit-learn 1.5.0
scipy 1.11.2
Scrapy 2.11.2
seaborn 0.13.2
Send2Trash 1.8.3
service-identity 24.1.0
setuptools 70.0.0
shap 0.45.1
shellingham 1.5.4
six 1.16.0
slicer 0.0.8
sniffio 1.3.1
soupsieve 2.5
stack-data 0.6.3
starlette 0.37.2
sympy 1.12
tenacity 8.3.0
terminado 0.18.1
threadpoolctl 3.5.0
tinycss2 1.3.0
tldextract 5.1.2
tomli 2.0.1
tomlkit 0.12.5
toolz 0.12.1
torch 2.1.2
torch-neuronx 2.1.2.2.1.0
torch-xla 2.1.2
torchvision 0.16.2
tornado 6.4
tqdm 4.66.4
traitlets 5.14.3
triton 2.1.0
Twisted 24.3.0
typer 0.12.3
types-python-dateutil 2.9.0.20240316
typing_extensions 4.12.0
tzdata 2024.1
ujson 5.10.0
uri-template 1.3.0
uritemplate 3.0.1
urllib3 2.2.1
uvicorn 0.29.0
uvloop 0.19.0
vine 5.1.0
virtualenv 20.26.2
w3lib 2.1.2
watchfiles 0.22.0
wcwidth 0.2.13
webcolors 1.13
webencodings 0.5.1
websocket-client 1.8.0
websockets 12.0
wget 3.2
wheel 0.43.0
widgetsnbextension 4.0.10
yarl 1.9.4
zipp 3.19.0
zope.interface 6.4.post2
Hi @mhokchuekchuek,
Would you be able to provide instructions on how to reproduce this error? Which version of the YOLOv10 model code are you executing?
A minimal reproduction would allow us to debug on our end and let us diagnose which component is failing. The error that you see most likely occurs when moving parameters to the XLA device, but it is unclear from the context why this is happening.
I apologize for the previous unclear description. I have added the steps to compile YOLOv10 to the description above.
Hi @mhokchuekchuek,
Looking at the code, it's unnecessary to set fuse=True here, since our compiler will fuse operators together optimally for our hardware. Furthermore, when fuse=True, the manipulations done to the module code result in a model that cannot change its device, due to the existence of non-leaf tensors. This is the reason that torch_neuronx.trace failed in the first place.
When we set fuse=False, the model compiles and we're able to get 8-10 ms latency on Neuron vs. 140 ms on CPU. However, we've found that the resulting model produces incorrect output. We are working on fixing the correctness issue and will respond as soon as we have an update.