
Compilation Error with torch_neuronx.trace on EC2 inf2.xlarge

SergioMartinezAvahitech opened this issue

Hello,

We are currently using an EC2 inf2.xlarge instance that we set up according to the PyTorch Neuron setup instructions for Ubuntu 22. We successfully ran some of the inference examples from the AWS Neuron Samples for Torch-Neuron, and they performed as expected.

However, we are facing issues when trying to compile models from the Coqui TTS repository. Specifically, when attempting to compile the PyTorch models using torch_neuronx.trace, we encounter the following error:

PyTorch model: (code shown in a screenshot attached to the original issue)

Error: (compiler error shown in a screenshot attached to the original issue)

The PyTorch model mentioned above runs fine on the instance's CPU.

Any insights or suggestions on how to resolve this issue would be greatly appreciated!

Thank you for your assistance.

I am trying to reproduce this, but can't. Please share the output of pip list so I can see the exact versions of all the packages you are using. When I run the test code I have to use the TTS version of Tacotron, since I could not find an alternative compatible package.

from TTS.tts.models.tacotron import Tacotron

When I run the code you provided (after retyping it from the image), it produces multiple outputs, many of which contain NaN values (CPU output).

Alternatively, since this appears to be a compilation failure: if you can directly share the hlo_module.pb file referenced on the last line of the error, I can ask our compiler folks to take a look.

Hello @mrnikwaws,

This is the output of pip list (attached as requirements.txt).

This is my code:

# Import necessary libraries
import torch
import torch_neuronx
# Import the required TTS classes
from TTS.tts.configs.tacotron_config import TacotronConfig
from TTS.tts.models.tacotron import Tacotron
from TTS.config import load_config

# Configuration of the Tacotron Model
config = TacotronConfig(num_chars=32, num_speakers=5, out_channels=513, decoder_output_dim=80)
config.use_speaker_embedding = False
config.num_speakers = 1

# Load the model
model = Tacotron(config)

# Get an example input
input_dummy = torch.randint(0, 24, (8, 128)).long()
input_lengths = torch.randint(100, 129, (8,)).long()
input_lengths[-1] = 128
mel_spec = torch.rand(8, 30, config.audio["num_mels"])
mel_lengths = torch.randint(20, 30, (8,)).long()
mel_lengths[-1] = mel_spec.size(1)

example_inputs = input_dummy, input_lengths, mel_spec, mel_lengths

# Run inference on CPU
output_cpu = model(*example_inputs)

# Compile the model for Neuron with torch_neuronx.trace
model_neuron = torch_neuronx.trace(model, example_inputs)

This is the last line of the error I get:

2024-05-13 21:21:14.000156:  5468  ERROR ||NEURON_CC_WRAPPER||: Got a cached failed neff at /var/tmp/neuron-compile-cache/neuronxcc-2.13.72.0+78a426937/MODULE_662100174625409431+d41d8cd9/model.neff. Will skip compilation, please set --retry_failed_compilation for recompilation: 
 Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/ubuntu/neuroncc_compile_workdir/e625350d-bc77-4a22-8235-21c865f66ea4/model.MODULE_662100174625409431+d41d8cd9.hlo_module.pb', '--output', '/tmp/ubuntu/neuroncc_compile_workdir/e625350d-bc77-4a22-8235-21c865f66ea4/model.MODULE_662100174625409431+d41d8cd9.neff', '--verbose=35']: 2024-05-07T22:38:17Z [TEN404] Internal tensorizer error - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new.
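
As an aside, the cache message above asks for --retry_failed_compilation. A minimal sketch of one way to set it before tracing, assuming the NEURON_CC_FLAGS environment variable is picked up by the Neuron compile-cache wrapper:

import os

# Assumption: NEURON_CC_FLAGS set before torch_neuronx.trace() is read by the
# compile-cache wrapper; this forces recompilation instead of reusing the
# cached failed NEFF.
os.environ["NEURON_CC_FLAGS"] = (
    os.environ.get("NEURON_CC_FLAGS", "") + " --retry_failed_compilation"
)

# Deleting the cached entry under /var/tmp/neuron-compile-cache should have a similar effect.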

Thanks

I can reproduce the failure now (thanks). However, there are many compilations, which leads me to believe the model is being badly fragmented. I'm taking a closer look.

Hi @mrnikwaws,

Is there any update on this? I'm looking to solve the same issue.

This model is an encoder-decoder that contains GRU operators, a type of model that is known to have some limitations on Neuron.

In the case of this model, it's likely that the GRU operators are causing a high amount of graph fragmentation, leading to the high number of compiled graphs. As a first step, our recommendation is to get the model into a form that can be traced with torch.jit.trace() (see: https://pytorch.org/docs/stable/generated/torch.jit.trace.html). This will enable you to run the torch_neuronx.analyze() API, which will identify the specific operators that are causing the fragmentation.
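
For reference, a rough sketch of that flow. The wrapper below is hypothetical (it only pins the forward signature to four positional tensors so tracing has a fixed interface), and it assumes the model's forward can be traced at all:

import torch
import torch_neuronx

# Hypothetical wrapper: fixes the forward signature to positional tensors.
class TacotronTraceWrapper(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, text, text_lengths, mel_spec, mel_lengths):
        return self.model(text, text_lengths, mel_spec, mel_lengths)

wrapped = TacotronTraceWrapper(model)

# Sanity-check that TorchScript tracing works; strict=False tolerates dict/list outputs.
traced = torch.jit.trace(wrapped, example_inputs, strict=False)

# analyze() checks operator support on Neuron and reports which ops are problematic.
report = torch_neuronx.analyze(wrapped, example_inputs)
print(report)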

If you can share the output of the torch_neuronx.analyze() API once you have it working, we can help advise you on the next steps for running this model on Neuron. Ideally the problematic operators can be run manually extracted and run on CPU to get the model into a compilable state. In the worst case, we'll find that the problematic operators will cause such a high degree of fragmentation that we cannot create a large enough single graph to get adequate performance.