Denys88/rl_games

Exporting Dofbot Policy to Onnx From Omni Isaac Gym

DJT777 opened this issue · 23 comments

Hi,

I'm working on exporting a policy trained on the example from this repo https://github.com/j3soon/OmniIsaacGymEnvs-DofbotReacher.

I've based my export on the example you gave from your code in the notebook here: https://github.com/Denys88/rl_games/blob/master/notebooks/train_and_export_onnx_example_continuous.ipynb

I think this config should be sufficient, since it describes the exact same kind of model:
https://github.com/j3soon/OmniIsaacGymEnvs-DofbotReacher/blob/main/omniisaacgymenvs/cfg/train/DofbotReacherPPO.yaml

To export my model, I incorporated your code into rlgames_train.py from that repo as rlgames_export.py, with my alterations on lines 48-124: https://github.com/DJT777/Excolligere/blob/main/omniisaacgymenvs/scripts/rlgames_export.py. You can see I am restoring the default checkpoint as the weights.

This is exporting a runnable ONNX model, and I am trying to use it here: https://github.com/DJT777/Excolligere/blob/main/notebooks/Load%20ONNX%20and%20Predict.ipynb

However, I am not getting the same results as when the simulator is running. I am wondering if you can help me understand whether I am exporting correctly in rlgames_export.py, and whether I am performing inference with the ONNX model correctly in that notebook.

Hi, could you give me access to the repo?

I've invited you to have access to the repo.

At first glance your code looks good.
Could you check the observation normalization?
It should be inside the model, but earlier versions had it inside the training code.
Could you confirm that you're using the latest rl_games version?

This project is my first journey into RL, so I'm new to the terminology. Excuse me if I don't answer your questions correctly.

Here is the Netron view of the model:

[Netron screenshot of dofbotreacherDefault.onnx]

I suppose the observation normalization would be the Sub and Div operations shown in Netron, as well as the Clip and Flatten.
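If it helps, here is a minimal numpy sketch of what those Sub/Div/Clip ops typically compute when rl_games bakes its running-mean-std normalizer into the exported graph (the epsilon and the ±5 clamp are the library defaults as I understand them, not values read from this particular model):

import numpy as np

def normalize_obs(obs, running_mean, running_var, eps=1e-5):
    # Sub -> Div -> Clip: the ops visible in the Netron graph
    return np.clip((obs - running_mean) / np.sqrt(running_var + eps), -5.0, 5.0)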

The version of rl_games is the one used in Isaac Sim 2022.1.1, which shows as 1.5.2 in the conda environment shipped with Isaac Sim.

If this is helpful for debugging:

What I'm doing is printing out the placement tensor found here:
https://github.com/DJT777/Excolligere/blob/main/omniisaacgymenvs/tasks/dofbot_reacher.py
On lines 144-151

These coordinates are then plugged in to the target position for the inference in the notebook.

I'm also printing out my actions from the step function here on the first line after the function definition:
https://github.com/DJT777/Excolligere/blob/71936c837db36a4930e40320d1575aaccebc2398/omniisaacgymenvs/envs/vec_env_rlgames.py

Then I'm comparing the outputs from Isaac Sim and the notebook I linked above.

One consideration I've had is that I might not be reading the output correctly, based on this documentation suggesting that there is a different ordering of the outputs: https://github.com/DJT777/Excolligere/blob/71936c837db36a4930e40320d1575aaccebc2398/docs/transfering_policies_from_isaac_gym.md
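If the orderings really do differ, fixing it would just be an index permutation, for example (a sketch only; the identity mapping below is a placeholder, the real mapping would have to come from that doc and the robot asset):

import numpy as np

# perm[i] = index of the policy output that drives simulator joint i.
# Identity mapping as a placeholder, not the actual Dofbot ordering.
perm = np.array([0, 1, 2, 3, 4, 5])

def reorder(values):
    return values[..., perm]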

The best check is to pass the same input to both the ONNX and PyTorch models and compare the outputs.
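For example, something like this minimal sketch (sess, input_name, and agent stand for the onnxruntime session and the restored rl_games player; the variable names here are assumptions):

import numpy as np
import torch

obs_np = np.random.randn(1, 29).astype(np.float32)   # any fixed test observation

onnx_mu = sess.run(None, {input_name: obs_np})[0]     # first output head is mu

obs_t = torch.from_numpy(obs_np).to(agent.device)
torch_action = agent.get_action(obs_t, True)          # deterministic action from the player

# the player may clip/rescale to the action space, so compare against the clipped mu
print(np.abs(np.clip(onnx_mu, -1.0, 1.0) - torch_action.cpu().numpy()).max())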

Here is a sample of the different outputs as you suggested:

From the Isaac Gym:



Observation:tensor([[-0.0059,  0.0736, -0.0185,  0.0088, -0.0200, -0.0931,  0.0462, -0.1982,
          0.1095, -0.0083,  0.7498,  0.0108,  0.1826, -0.1787,  0.2417,  0.0152,
          0.3937, -0.0355, -0.9184,  0.6569,  0.2588,  0.2059, -0.6776,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000]], device='cuda:0')
action before clip tensor([[-0.1526,  0.3710, -0.6942,  0.6228, -1.0000, -0.1611]],
       device='cuda:0')



Whereas running this code:


import numpy as np

# Create the individual arrays
jointPos = np.zeros(6)
jointVel = np.zeros(6)
goalPos = np.array([.1826, -.1787, .2417])
goalRot = np.zeros(4)
goalRotRel = np.zeros(4)
prevAct = np.full(6, 0)

# Define a function to clip the actions
def clip_actions(actions):
    return np.clip(actions, -1.0, 1.0)

# Observation vector copied from the simulator printout above
# (sess and input_name come from the ONNX-loading cell shown later in the thread)
input_data = np.array([-0.0059,  0.0736, -0.0185,  0.0088, -0.0200, -0.0931,  0.0462, -0.1982,
          0.1095, -0.0083,  0.7498,  0.0108,  0.1826, -0.1787,  0.2417,  0.0152,
          0.3937, -0.0355, -0.9184,  0.6569,  0.2588,  0.2059, -0.6776,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000], dtype=np.float32)
input_dict = {input_name: input_data.reshape(1, -1)}




while True:
    #input_data = np.concatenate((jointPos, jointVel, goalPos, goalRot, goalRotRel, prevAct), dtype=np.float32)
    #print(str(input_data))
    input_dict = {input_name: input_data.reshape(1, -1)}
    actions = []
    output = sess.run(None, input_dict)
    # Run the model on the input data
    mus = output[0][0]
    sigmas = output[1][0]
    for mu, sigma in zip(mus, sigmas):
        #print("mu is " + str(mu))
        #print("sigma is " + str(sigma))
        sigma = np.exp(sigma)
        action = np.random.normal(mu, sigma)
        action = clip_actions(action)
        #print(str(action))
        actions.append(action)
        # Print the output
    prevAct = np.array(actions, dtype=np.float32)
    jointPos = prevAct
    mus = clip_actions(mus)
    #print("From Mu: " + str(mus))
    print("Sampled Actions: " + str(actions)+"\n")

produces very different results in the action output:



Sampled Actions: [1.0, -0.44891554912899617, 0.4055521331764843, 1.0, 0.044669989955658046, 0.6681064546999487]

Sampled Actions: [1.0, -0.6341698266623044, 0.42099329096228416, 1.0, -0.03949862364674661, 0.5454127401465692]

Sampled Actions: [1.0, -0.9018518161188407, 0.43396113195718267, 0.8811261401993116, -0.1814963516831179, 0.6290464824858719]

Sampled Actions: [1.0, -0.5114059775648212, 0.21491532992916937, 1.0, 0.22194688542268481, 0.8891714238151054]

Sampled Actions: [1.0, -0.555514435619405, 0.32891076544248093, 1.0, 0.2197003265248571, 0.5689791679823528]

Could you print only the 'mu' outputs from both models? Differences are the expected behavior if you are sampling.

I've not been able to find where in Nvidia's Isaac Gym code they are doing the actual inference and getting predictions. A lot of their RL code is based on your rl_games repo. Would you know where to look?

I mean you don't need the sampling action = np.random.normal(mu, sigma);
if you want a deterministic policy you can just use action = mu.
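For example, the notebook loop could take mu directly (a minimal sketch reusing sess and input_dict from your snippet above):

import numpy as np

output = sess.run(None, input_dict)
mu = output[0][0]                    # first output head: the mean action
actions = np.clip(mu, -1.0, 1.0)     # clip to [-1, 1] like the simulator does
print("Deterministic actions:", actions)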

Right, but I want to verify that I'm getting the same, correct predictions from both the simulator and the ONNX model. My experiments with deploying the ONNX model led me to believe that I was getting inaccurate predictions from it, because the reaches were not matching what they were in simulation.

You can see here that even with the deterministic approach, the expected value is vastly different from the actions actually being taken in the simulator:

Case 1:

Simulator:

new position tensor([[ 0.1826, -0.1787,  0.2417]], device='cuda:0')
Observation:tensor([[-0.0059,  0.0736, -0.0185,  0.0088, -0.0200, -0.0931,  0.0462, -0.1982,
          0.1095, -0.0083,  0.7498,  0.0108,  0.1826, -0.1787,  0.2417,  0.0152,
          0.3937, -0.0355, -0.9184,  0.6569,  0.2588,  0.2059, -0.6776,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000]], device='cuda:0')
action before clip tensor([[-0.1526,  0.3710, -0.6942,  0.6228, -1.0000, -0.1611]],
       device='cuda:0')

ONNX


input_data = np.array([-0.0059,  0.0736, -0.0185,  0.0088, -0.0200, -0.0931,  0.0462, -0.1982,
          0.1095, -0.0083,  0.7498,  0.0108,  0.1826, -0.1787,  0.2417,  0.0152,
          0.3937, -0.0355, -0.9184,  0.6569,  0.2588,  0.2059, -0.6776,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000], dtype=np.float32)
          
From Mu: [ 1.         -0.47443828  0.32300434  1.          0.15414204  0.65032417]

Case 2:

Simulator

Observation:tensor([[-0.0297,  0.1214, -0.1320,  0.1025, -0.1395, -0.0864, -0.4589,  0.4480,
         -0.9389,  0.8410, -2.5514,  0.0103,  0.1826, -0.1787,  0.2417,  0.0152,
          0.3937, -0.0355, -0.9184,  0.6169,  0.3250,  0.2644, -0.6662, -0.1526,
          0.3710, -0.6942,  0.6228, -1.0000,  0.0000]], device='cuda:0')
action before clip tensor([[ 0.5072,  0.2407,  0.3249,  1.0000, -1.0000, -0.3381]],
       device='cuda:0')

ONNX

input_data = np.array([-0.0297,  0.1214, -0.1320,  0.1025, -0.1395, -0.0864, -0.4589,  0.4480,
         -0.9389,  0.8410, -2.5514,  0.0103,  0.1826, -0.1787,  0.2417,  0.0152,
          0.3937, -0.0355, -0.9184,  0.6169,  0.3250,  0.2644, -0.6662, -0.1526,
          0.3710, -0.6942,  0.6228, -1.0000,  0.0000], dtype=np.float32)
          
From Mu: [ 0.56676865 -1.         -0.08646691  1.          0.14331801  0.48769447]

Case 3

From Simulation

observation:tensor([[ 0.0510,  0.1173, -0.1063,  0.2193, -0.3571, -0.0850,  1.4090,  0.1383,
          0.7315,  0.5217, -3.8810,  0.0157,  0.1826, -0.1787,  0.2417,  0.0152,
          0.3937, -0.0355, -0.9184,  0.5332,  0.3558,  0.2370, -0.7300,  0.5072,
          0.2407,  0.3249,  1.0000, -1.0000,  0.0000]], device='cuda:0')
action before clip tensor([[ 0.1546,  0.7010,  0.4620,  0.8540, -0.6394, -0.5318]],
       device='cuda:0')

From ONNX

input_data = np.array([0.0510,  0.1173, -0.1063,  0.2193, -0.3571, -0.0850,  1.4090,  0.1383,
          0.7315,  0.5217, -3.8810,  0.0157,  0.1826, -0.1787,  0.2417,  0.0152,
          0.3937, -0.0355, -0.9184,  0.5332,  0.3558,  0.2370, -0.7300,  0.5072,
          0.2407,  0.3249,  1.0000, -1.0000,  0.0000], dtype=np.float32)
From Mu: [1.         0.59151983 1.         1.         0.71608543 0.70238996]


Could you check that the PyTorch model returns deterministic outputs too?

I've no idea where in the codebase they are actually generating the model's outputs. I'm wondering if you would lend your expertise in helping me find where, in Nvidia's use of your rl_games library, they are making predictions and producing outputs.

I mean you can call my model twice with the same input.

Do you mean use your library to generate the model then compare outputs?

Why not identify where in their code they are generating the outputs of their model, then print/store the outputs and the observation buffer and compare with the ONNX notebook? It might also help to see how they are generating the actions from the model's output.

You can call:
agent.get_action(obs, True)
and make sure it is the same checkpoint that was exported to ONNX.

Okay, going to try that.

Here is the code I've used (it's also pushed to the Excolligere repo you have access to)

These should have equal output on the same data, correct? All of the mus, log_stds, and values are different.

Omni Isaac


        agent = runner.create_player()
        agent.restore('/home/dylan/Desktop/repos/OmniIsaacGymEnvs-DofbotReacher/runs/DofbotReacher/nn/DofbotReacher.pth')
        input_data = np.array([0.0510, 0.1173, -0.1063, 0.2193, -0.3571, -0.0850, 1.4090, 0.1383,
                               0.7315, 0.5217, -3.8810, 0.0157, 0.1826, -0.1787, 0.2417, 0.0152,
                               0.3937, -0.0355, -0.9184, 0.5332, 0.3558, 0.2370, -0.7300, 0.5072,
                               0.2407, 0.3249, 1.0000, -1.0000, 0.0000], dtype=np.float32)
        obs = torch.from_numpy(input_data).unsqueeze(0).to('cuda:0')
        #input_dict = {input_name: input_data.reshape(1, -1)}
        print("Actions from agent: " + str(agent.get_action(obs, True)))

Here is the output:


Actions from agent: tensor([ 0.1547,  0.7007,  0.4620,  0.8539, -0.6395, -0.5317], device='cuda:0')
(tensor([[-0.5273,  0.1546, -0.9393,  0.7021, -0.5191, -0.9458]],
       device='cuda:0'), tensor([[-2.3403, -1.9765, -1.5348, -1.6698, -1.4579, -0.8412]],
       device='cuda:0'), tensor([[9.6810]], device='cuda:0'))

Whereas this is the output from the notebook:

Notebook With ONNX

import numpy as np

# Create the individual arrays
jointPos = np.zeros(6)
jointVel = np.zeros(6)
goalPos = np.array([.1826, -.1787, .2417])
goalRot = np.zeros(4)
goalRotRel = np.zeros(4)
prevAct = np.full(6, 0)

# Define a function to clip the actions
def clip_actions(actions):
    return np.clip(actions, -1.0, 1.0)

# Observation vector copied from the simulator printout above
input_data = np.array([0.0510, 0.1173, -0.1063, 0.2193, -0.3571, -0.0850, 1.4090, 0.1383,
                               0.7315, 0.5217, -3.8810, 0.0157, 0.1826, -0.1787, 0.2417, 0.0152,
                               0.3937, -0.0355, -0.9184, 0.5332, 0.3558, 0.2370, -0.7300, 0.5072,
                               0.2407, 0.3249, 1.0000, -1.0000, 0.0000], dtype=np.float32)
input_dict = {input_name: input_data.reshape(1, -1)}



while True:
    input_dict = {input_name: input_data.reshape(1, -1)}
    output = sess.run(None, input_dict)
    print(output)

Notebook output:

[array([[1.6248189 , 0.59151983, 2.016268  , 1.9443805 , 0.71608543,
        0.70238996]], dtype=float32), array([[-2.7289395, -2.1363962, -2.010333 , -2.3337207, -1.4635202,
        -1.023427 ]], dtype=float32), array([[0.5652896]], dtype=float32)]

You can see I'm using the same checkpoint and model.

Full run function in OmniIsaac

    def run(self):

        # create runner and set the settings
        runner = Runner(RLGPUAlgoObserver())
        runner.load(self.rlg_config_dict)
        runner.reset()

        # dump config dict
        experiment_dir = os.path.join('runs', self.cfg.train.params.config.name)
        os.makedirs(experiment_dir, exist_ok=True)
        with open(os.path.join(experiment_dir, 'config.yaml'), 'w') as f:
            f.write(OmegaConf.to_yaml(self.cfg))


        agent = runner.create_player()
        agent.restore('/home/dylan/Desktop/repos/OmniIsaacGymEnvs-DofbotReacher/runs/DofbotReacher/nn/DofbotReacher.pth')
        input_data = np.array([0.0510, 0.1173, -0.1063, 0.2193, -0.3571, -0.0850, 1.4090, 0.1383,
                               0.7315, 0.5217, -3.8810, 0.0157, 0.1826, -0.1787, 0.2417, 0.0152,
                               0.3937, -0.0355, -0.9184, 0.5332, 0.3558, 0.2370, -0.7300, 0.5072,
                               0.2407, 0.3249, 1.0000, -1.0000, 0.0000], dtype=np.float32)
        obs = torch.from_numpy(input_data).unsqueeze(0).to('cuda:0')
        #input_dict = {input_name: input_data.reshape(1, -1)}
        print("Actions from agent" + str(agent.get_action(obs, True)))

        # Build a dummy observation, trace the wrapped model, and export it to ONNX
        inputs = {
            'obs': torch.zeros((1,) + agent.obs_shape).to(agent.device),
        }

        with torch.no_grad():
            adapter = flatten.TracingAdapter(ModelWrapper(agent.model), inputs, allow_non_tensor=True)
            traced = torch.jit.trace(adapter, adapter.flattened_inputs, check_trace=False)
            flattened_outputs = traced(*adapter.flattened_inputs)
            print(flattened_outputs)

        # Outputs are exported in the order (mu, log_std, value)
        torch.onnx.export(traced, *adapter.flattened_inputs, "dofbotreacherDefault.onnx", verbose=True, input_names=['obs'],
                          output_names=['mu', 'log_std', 'value'])

        runner.run({
            'train': not self.cfg.test,
            'play': self.cfg.test,
            'checkpoint': self.cfg.checkpoint,
            'sigma': None
        })

Notebook loading the model


import onnxruntime as ort
import numpy as np

# Load the ONNX model
sess = ort.InferenceSession('/home/dylan/Desktop/repos/OmniIsaacGymEnvs-DofbotReacher/dofbotreacherDefault.onnx')

# Get the input and output names of the model
input_name = sess.get_inputs()[0].name
output_name = sess.get_outputs()[0].name

# Get the input names and shapes
input_info = sess.get_inputs()
output_info = sess.get_outputs()

for i in input_info:
    print("Input name:", i.name)
    print("Input shape:", i.shape)
    
for i in output_info:
    print("Output name:", i.name)
    print("Output shape:", i.shape)

Actually, reloading my model has now generated similar results:

Isaac Gym

Actions from agent: tensor([ 0.1547,  0.7007,  0.4620,  0.8539, -0.6395, -0.5317], device='cuda:0')
(tensor([[-0.5273,  0.1546, -0.9393,  0.7021, -0.5191, -0.9458]],
       device='cuda:0'), tensor([[-2.3403, -1.9765, -1.5348, -1.6698, -1.4579, -0.8412]],
       device='cuda:0'), tensor([[9.6810]], device='cuda:0'))

ONNX


[array([[ 0.15400116,  0.7014214 ,  0.4610022 ,  0.85426027, -0.63877946,
        -0.53179383]], dtype=float32), array([[-2.3402638, -1.9764799, -1.5347915, -1.6698207, -1.4579096,
        -0.8412483]], dtype=float32), array([[-0.01199287]], dtype=float32)]

More data confirms successful exporting of models:

(Note that all results in Isaac Sim are clipped to [-1, 1], whereas the ONNX model's outputs shown in this post have not yet been clipped.)

From Isaac Sim:

**input**

input_data = np.array([0.1639,  0.1001,  0.1125, -0.1421, -0.0237, -0.0309, -0.5712, -0.2141,
         -0.2014,  0.3321,  0.8778,  0.0034,  0.1712, -0.1732,  0.2197,  0.1061,
          0.3119, -0.3040, -0.8939,  0.3324,  0.6025, -0.5092, -0.5170,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000], dtype=np.float32)
        obs = torch.from_numpy(input_data).unsqueeze(0).to('cuda:0')
        #input_dict = {input_name: input_data.reshape(1, -1)}
        print("Actions from agent: " + str(agent.get_action(obs, True)))
**Output**

Actions from agent: tensor([-0.1589,  0.3981, -1.0000,  0.7192, -1.0000, -0.0660], device='cuda:0')
(tensor([[-0.5273,  0.1546, -0.9393,  0.7021, -0.5191, -0.9458]],
       device='cuda:0'), tensor([[-2.3403, -1.9765, -1.5348, -1.6698, -1.4579, -0.8412]],
       device='cuda:0'), tensor([[9.6810]], device='cuda:0'))


From Notebook

**input**
    # Concatenate the arrays into a single array
input_data = np.array([0.1639,  0.1001,  0.1125, -0.1421, -0.0237, -0.0309, -0.5712, -0.2141,
         -0.2014,  0.3321,  0.8778,  0.0034,  0.1712, -0.1732,  0.2197,  0.1061,
          0.3119, -0.3040, -0.8939,  0.3324,  0.6025, -0.5092, -0.5170,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000], dtype=np.float32)
input_dict = {input_name: input_data.reshape(1, -1)}


input_dict = {input_name: input_data.reshape(1, -1)}
actions = []
output = sess.run(None, input_dict)
print(output)


**Output**

[array([[-0.15905663,  0.3985302 , -1.0090353 ,  0.7184627 , -1.4696665 ,
        -0.06618351]], dtype=float32), array([[-2.3402638, -1.9764799, -1.5347915, -1.6698207, -1.4579096,
        -0.8412483]], dtype=float32), array([[1.3237184]], dtype=float32)]
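As a quick sanity check with the numbers above: clipping the ONNX mu to [-1, 1] reproduces the simulator's actions up to small float differences.

import numpy as np

mu = np.array([-0.15905663, 0.3985302, -1.0090353, 0.7184627, -1.4696665, -0.06618351])
print(np.clip(mu, -1.0, 1.0))
# -> [-0.159  0.399 -1.     0.718 -1.    -0.066]
# vs. the agent: tensor([-0.1589,  0.3981, -1.0000,  0.7192, -1.0000, -0.0660])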


Closing issue

I followed the example in https://github.com/Denys88/rl_games/blob/master/notebooks/train_and_export_onnx_example_lstm_continuous.ipynb to train and export the ONNX model. But when I tried to modify the config file to change the network structure, I found that the observation size of the LSTM layer was always 3 and could not be changed, and the number of actions output by the network could not be changed either.
I guess this is related to the 'env_config': {'env_name': 'Pendulum-v1', 'seed': 5} and 'env_name': 'envpool' entries. How can I set up my own environment and change the number of observations and actions?

Looking forward to your reply, thank you!