aws-neuron/aws-neuron-sdk

Compiling a model with torch.neuron locally is possible, but compiling a model with torch.neuronx locally is not.

newgrit1004 opened this issue · 2 comments

Hi, I want to compile a PyTorch model into an AWS Neuron or NeuronX model locally to reduce server costs.

I can compile ResNet50 into a Neuron model locally, but some models, such as transformers, cannot be converted into a NeuronX model locally.

The scripts I used are below.

Compile a ResNet50 PyTorch model into a Neuron model

Setup environment

python3 -m venv neuron_venv
. neuron_venv/bin/activate
pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com
pip install torch-neuron "neuron-cc[tensorflow]" "protobuf" torchvision

Run the Python script

import torch
import torch.neuron
import torchvision.models as models

# Dummy input with the shape ResNet50 expects
input_tensor = torch.zeros([1, 3, 224, 224], dtype=torch.float32)

model = models.resnet50(pretrained=True)
model.eval()

# Report which operators are supported on Neuron
torch.neuron.analyze_model(model, example_inputs=[input_tensor])

# Compile (trace) the model for Inferentia and save it as TorchScript
model_neuron = torch.neuron.trace(model, example_inputs=[input_tensor])
model_neuron.save("model_neuron.pt")
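
The saved artifact is plain TorchScript, so it can be loaded back with torch.jit.load. A minimal sketch (actually running inference on the loaded model still requires an inf1 instance):

import torch
import torch.neuron  # the import registers the Neuron ops needed to deserialize the model

model_neuron = torch.jit.load("model_neuron.pt")
output = model_neuron(torch.zeros([1, 3, 224, 224], dtype=torch.float32))
print(output.shape)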

Compile a Transformer model into a NeuronX model

Setup environment

python3 -m venv neuronx_venv
. neuronx_venv/bin/activate
sudo apt-get install aws-neuronx-collectives=2.* -y
sudo apt-get install aws-neuronx-runtime-lib=2.* -y
pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com
pip install neuronx-cc==2.* torch-neuronx torchvision

Run the Python script

import os
import torch
import torch_neuronx
import argparse
from transformers import AutoTokenizer, AutoModelForSequenceClassification


def generate_sample_inputs(tokenizer, inputs, max_length=128):
    # Tokenize to fixed-length tensors; the tuple preserves the tokenizer's output
    # order (e.g. input_ids, attention_mask), matching the model's forward signature.
    # truncation=True keeps longer inputs from exceeding the traced shape.
    embeddings = tokenizer(
        inputs, max_length=max_length, padding="max_length", truncation=True,
        return_tensors="pt"
    )
    return tuple(embeddings.values())


def get_model_neuron(args):
    neuron_model_dir = os.path.join(args.save_path, args.model_id)
    neuron_model_filepath = os.path.join(neuron_model_dir, "model_neuron.pt")
    # Reuse a previously compiled artifact if one exists on disk
    if os.path.exists(neuron_model_filepath):
        print("Load pre-compiled model")
        tokenizer = AutoTokenizer.from_pretrained(neuron_model_dir)
        model_neuron = torch.jit.load(neuron_model_filepath)
    else:
        print("Compile model")
        os.makedirs(neuron_model_dir, exist_ok=True)
        tokenizer = AutoTokenizer.from_pretrained(args.model_id)
        model = AutoModelForSequenceClassification.from_pretrained(
            args.model_id, torchscript=True  # return plain tensors so the model can be traced
        )
        model.eval()
        model_neuron = compile_model(model, tokenizer, args.max_length)
        torch.jit.save(model_neuron, neuron_model_filepath)
        tokenizer.save_pretrained(neuron_model_dir)
    return model_neuron, tokenizer, neuron_model_filepath


def compile_model(model, tokenizer, max_length=128):
    # Trace with dummy inputs; the compiled model is specialized to this input shape
    dummy_inputs = generate_sample_inputs(tokenizer, "dummy", max_length)
    model_neuron = torch_neuronx.trace(model, example_inputs=dummy_inputs)
    return model_neuron


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--max_length", type=int, default=128)
    parser.add_argument("--save_path", type=str, default="neuron_model")
    parser.add_argument(
        "--model_id",
        type=str,
        default="distilbert-base-uncased-finetuned-sst-2-english",
    )

    parser_args, _ = parser.parse_known_args()
    return parser_args


def predict(model, tokenizer, inputs, postprocess_output=False):
    payloads = generate_sample_inputs(tokenizer, inputs)
    outputs = model(*payloads)

    if postprocess_output:
        # Convert logits to probabilities and take the most likely class index
        softmax_fn = torch.nn.Softmax(dim=1)
        softmax_outputs = softmax_fn(outputs[0])
        _, pred = torch.max(softmax_outputs, dim=1)
        return (outputs, pred)
    else:
        return outputs


def main(args):
    print(args)
    model_neuron, tokenizer, neuron_model_filepath = get_model_neuron(args)
    inputs = "I do not like you"
    outputs = predict(model_neuron, tokenizer, inputs, postprocess_output=True)
    print(outputs)


if __name__ == "__main__":
    main(parse_args())

Error log

2024-Aug-12 13:15:49.958851 1216810:1216810 ERROR  TDRV:tdrv_get_dev_info                       No neuron device available
Traceback (most recent call last):
  File "/home/sewonkim/Desktop/projects/neuron_test/neuronx_compile.py", line 78, in <module>
    main(parse_args())
  File "/home/sewonkim/Desktop/projects/neuron_test/neuronx_compile.py", line 71, in main
    model_neuron, tokenizer, neuron_model_filepath = get_model_neuron(args)
  File "/home/sewonkim/Desktop/projects/neuron_test/neuronx_compile.py", line 30, in get_model_neuron
    model_neuron = compile_model(model, tokenizer, args.max_length)
  File "/home/sewonkim/Desktop/projects/neuron_test/neuronx_compile.py", line 38, in compile_model
    model_neuron = torch_neuronx.trace(model, example_inputs=dummy_inputs)
  File "/home/sewonkim/Desktop/projects/neuron_test/neuronx_venv/lib/python3.9/site-packages/torch_neuronx/xla_impl/trace.py", line 592, in trace
    neff_filename, metaneff, flattener, packer, weights = _trace(
  File "/home/sewonkim/Desktop/projects/neuron_test/neuronx_venv/lib/python3.9/site-packages/torch_neuronx/xla_impl/trace.py", line 651, in _trace
    ) = generate_hlo(
  File "/home/sewonkim/Desktop/projects/neuron_test/neuronx_venv/lib/python3.9/site-packages/torch_neuronx/xla_impl/trace.py", line 424, in generate_hlo
    ) = xla_trace(
  File "/home/sewonkim/Desktop/projects/neuron_test/neuronx_venv/lib/python3.9/site-packages/torch_neuronx/xla_impl/hlo_conversion.py", line 72, in xla_trace
    xla_device = xla_model.xla_device()
  File "/home/sewonkim/Desktop/projects/neuron_test/neuronx_venv/lib/python3.9/site-packages/torch_xla/core/xla_model.py", line 207, in xla_device
    return runtime.xla_device(n, devkind)
  File "/home/sewonkim/Desktop/projects/neuron_test/neuronx_venv/lib/python3.9/site-packages/torch_xla/runtime.py", line 82, in wrapper
    return fn(*args, **kwargs)
  File "/home/sewonkim/Desktop/projects/neuron_test/neuronx_venv/lib/python3.9/site-packages/torch_xla/runtime.py", line 111, in xla_device
    return torch.device(torch_xla._XLAC._xla_get_default_device())
RuntimeError: Bad StatusOr access: INVALID_ARGUMENT: PJRT_Client_Create: error condition nullptr != (args)->client->Error(): Init: error condition !(num_devices > 0): 

I want to know whether it is possible to compile a PyTorch model into a NeuronX model without using an Inferentia2 instance.
I heard that one of the advantages of Inferentia instances is that models can be compiled locally. ResNet and EfficientNet work, but what I am interested in are stable diffusion and transformer models.

I also tried DLC images, but I got the same error log (the Neuron device error).

If you have any information about how to compile a PyTorch model into a NeuronX model, please let me know.

Hi @newgrit1004,

Please confirm that you are running the torch-neuron code on an inf1 and the torch-neuronx code on a trn1 or inf2 (the two APIs target different hardware), and that you have the most recent packages installed.
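
One quick way to check is to look for the Neuron device files the driver creates (the neuron-ls tool from aws-neuronx-tools reports the same information). A minimal sketch, assuming the usual /dev/neuron* naming:

import glob

# The Neuron driver exposes one /dev/neuronN file per device; an empty list
# here is consistent with the "No neuron device available" error in your log.
devices = glob.glob("/dev/neuron*")
print(f"Found {len(devices)} Neuron device(s): {devices}")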

I ran both of your examples with no issue, and I am going to close this ticket. Feel free to reopen if you are still experiencing problems and need more help.

I ran the second example first, using your exact installation script and commands:

(neuronx_venv) ubuntu@ip-172-31-18-243:~/waldronn/test/github-944$ python test.py 
Namespace(max_length=128, save_path='neuron_model', model_id='distilbert-base-uncased-finetuned-sst-2-english')
Compile model
tokenizer_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 48.0/48.0 [00:00<00:00, 497kB/s]
config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 629/629 [00:00<00:00, 9.32MB/s]
vocab.txt: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 232k/232k [00:00<00:00, 63.0MB/s]
/home/ubuntu/waldronn/test/github-944/neuronx_venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
model.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 268M/268M [00:00<00:00, 522MB/s]
.
Compiler status PASS
((tensor([[ 3.8870, -3.1771]]),), tensor([0]))
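
For reference, the tensor([0]) above is the predicted class index; it maps back to a label via the checkpoint's config. A minimal sketch, assuming the stock id2label mapping shipped with this checkpoint (0 = NEGATIVE, 1 = POSITIVE):

from transformers import AutoConfig

# Map the index returned by predict() to a human-readable label
config = AutoConfig.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
print(config.id2label[0])  # -> "NEGATIVE", matching the input "I do not like you"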

The first example needs a little tweaking to run on my trn2 (torch.neuron becomes torch_neuronx, the analyze_model call is dropped, and the example input is passed positionally to trace):

import torch
import torch_neuronx
import torchvision.models as models

input_tensor = torch.zeros([1, 3, 224, 224], dtype=torch.float32)

model = models.resnet50(pretrained=True)
model.eval()

model_neuron = torch_neuronx.trace(model, input_tensor)
model_neuron.save("model_neuron.pt")

This also runs fine:

(neuronx_venv) ubuntu@ip-172-31-18-243:~/waldronn/test/github-944$ python test2.py 
/home/ubuntu/waldronn/test/github-944/neuronx_venv/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/home/ubuntu/waldronn/test/github-944/neuronx_venv/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
:
:
.
Compiler status PASS

This error implies you either had all devices bound to another process (maybe something stalled?), your setup was incorrect (e.g. the driver was not correctly installed), or you were not running on an inf2 / trn2.

2024-Aug-12 13:15:49.958851 1216810:1216810 ERROR  TDRV:tdrv_get_dev_info                       No neuron device available

We should improve this error message for clarity - it tells you there were zero neuron devices to work with. Please review the setup instructions, or https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/programming-guide/inference/core-placement.html?highlight=visible#example-neuron-rt-visible-cores if you are running multiple processes on the same instance.
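
For the multi-process case, the core placement guide linked above boils down to pinning each process to its own cores before the runtime initializes. A minimal sketch using the NEURON_RT_VISIBLE_CORES variable from that page:

import os

# Must be set before the first Neuron model is loaded (the runtime reads it
# at init); this process will then claim only NeuronCore 0, leaving the rest
# free for other processes.
os.environ["NEURON_RT_VISIBLE_CORES"] = "0"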

Hi @mrnikwaws,

Thank you for testing my scripts on the inf1 and inf2 instances. However, your answer is not what I expected.

To my knowledge, compiling a Neuron model doesn't require an Inferentia chip. This is why I can compile a ResNet50 PyTorch model into a Neuron model locally after setting up the environment.

What I'm particularly interested in is whether it's possible to compile a NeuronX model (for Inferentia2) without using an Inferentia2 instance. From your answer, I understand that this might not be possible. Is this correct?

Could you please clarify this point? Your insights would be greatly appreciated.