aws-neuron/aws-neuron-sdk

RuntimeError: Bad StatusOr access: INVALID_ARGUMENT: PJRT_Client_Create: error condition nullptr != (args)->client->Error(): Init: error condition !(num_devices > 0):

PrateekAg1511 opened this issue · 4 comments

Hi,

I am facing this error when trying to trace a model using torch_neuronx 2.1.

RuntimeError: Bad StatusOr access: INVALID_ARGUMENT: PJRT_Client_Create: error condition nullptr != (args)->client->Error(): Init: error condition !(num_devices > 0):

Packages:

torch_neuronx : '2.1.2.2.1.0'

neuron-cc:
NeuronX Compiler version 2.11.0.34+c5231f848

Python version 3.10.12
HWM version 2.11.0.2-e34678757
NumPy version 1.23.5

torch : '2.1.2+cu121'

torch_xla: '2.1.2'

Can someone help me debug this?

@PrateekAg1511 are you running on a trn1/inf2 instance? Do you have the rest of the Neuron SDK installed?

The failed check num_devices > 0 suggests that the Neuron driver is not installed. Did you follow the setup steps? https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/setup/torch-neuronx.html#setup-torch-neuronx

@awsilya I am using the AWS SageMaker Neuron Image. It works fine when I use torch neuronx 1.3.

When I upgrade it to torch neuronx 2.1, I am getting this error.

The reason for moving to neuronx 2.1 is that with neuronx 1.3 I get a warning that input tensors are not being used.

> When I upgrade it to torch neuronx 2.1, I am getting this error.

This error indicates that the frontend framework cannot find any Neuron devices on the instance. Either the instance type does not have any NeuronCores available (only trn1/inf2-type instances expose these devices) or the driver is not installed.

Can you confirm if the NeuronCores are accessible by using the neuron-ls command-line tool?
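If it is easier to check from the same Python environment (for example inside a notebook), something like the following rough sketch may help. It assumes the Neuron driver exposes device nodes under /dev/neuron* and that neuron-ls is on the PATH when the Neuron tools are installed. The helper names find_neuron_device_nodes and run_neuron_ls are made up for this example and are not part of any Neuron package.

```python
import glob
import shutil
import subprocess


def find_neuron_device_nodes(pattern="/dev/neuron*"):
    """Return the sorted device node paths matching the pattern.

    Assumption: the Neuron driver creates nodes such as /dev/neuron0.
    An empty list suggests the driver is missing or the instance
    type has no Neuron devices.
    """
    return sorted(glob.glob(pattern))


def run_neuron_ls():
    """Run neuron-ls if it is on PATH and return (found, output)."""
    exe = shutil.which("neuron-ls")
    if exe is None:
        return False, "neuron-ls not found on PATH"
    result = subprocess.run([exe], capture_output=True, text=True)
    return True, result.stdout + result.stderr


if __name__ == "__main__":
    nodes = find_neuron_device_nodes()
    print("Neuron device nodes:", nodes if nodes else "none found")
    found, output = run_neuron_ls()
    print(output)
```

If no device nodes are listed and neuron-ls is missing or reports no devices, the num_devices > 0 failure is expected regardless of the torch_neuronx version.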

> the reason for moving to neuronx 2.1 is that when using neuronx 1.3 , I am getting warning that input tensors are not being used.

It is unlikely that moving to neuronx 2.1 will resolve that warning. However, it is still a good idea to validate that the NeuronCores are accessible before you begin testing the trace functionality.