Denys88/rl_games

Exporting Dofbot Policy to Onnx From Omni Isaac Gym

DJT777 opened this issue · 23 comments

Hi,

I'm working on exporting a policy trained on the example from this repo https://github.com/j3soon/OmniIsaacGymEnvs-DofbotReacher.

I've based my export on the example you gave from your code in the notebook here: https://github.com/Denys88/rl_games/blob/master/notebooks/train_and_export_onnx_example_continuous.ipynb

I think this config should be sufficient, since it describes the exact same kind of model:
https://github.com/j3soon/OmniIsaacGymEnvs-DofbotReacher/blob/main/omniisaacgymenvs/cfg/train/DofbotReacherPPO.yaml

To export my model, I incorporated your code into rlgames_train.py from that repo as rlgames_export.py, with my alterations on lines 48-124: https://github.com/DJT777/Excolligere/blob/main/omniisaacgymenvs/scripts/rlgames_export.py. You can see I am restoring the default checkpoint as the weights.

This is exporting a runnable ONNX model, and I am trying to use it here: https://github.com/DJT777/Excolligere/blob/main/notebooks/Load%20ONNX%20and%20Predict.ipynb

However, I am not getting the same results as when the simulator is running. I am wondering if you can help me understand whether I am exporting correctly in rlgames_export.py, and whether I am performing inference with the ONNX model correctly in that notebook.

Hi, could you give me access to the repo?

I've invited you to have access to the repo.

At first glance your code looks good.
Could you check the observation normalization?
It should be inside the model, but earlier versions had it inside the training code.
Could you confirm that you're using the latest rl_games version?

This project is my first journey into RL, so I'm new to the terminology. Excuse me if I don't answer your questions correctly.

Here is the Netron view of the model:

[Netron screenshot of dofbotreacherDefault.onnx]

I suppose the observation normalization would be the Sub and Div operations shown in Netron, as well as the Clip and Flatten.
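If it helps, here is a minimal numpy sketch of what those Sub/Div/Clip ops typically compute when rl_games bakes its running-mean-std normalizer into the exported graph (the epsilon and the ±5 clamp are the library defaults as I understand them, not values read from this particular model):

import numpy as np

def normalize_obs(obs, running_mean, running_var, eps=1e-5):
    # Sub -> Div -> Clip: the ops visible in the Netron graph
    return np.clip((obs - running_mean) / np.sqrt(running_var + eps), -5.0, 5.0)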

The version of rl_games is the one used in Isaac Sim 2022.1.1, which shows as 1.5.2 in the conda environment shipped with Isaac Sim.

If this is helpful for debugging:

What I'm doing is printing out the placement tensor found here:
https://github.com/DJT777/Excolligere/blob/main/omniisaacgymenvs/tasks/dofbot_reacher.py
On lines 144-151

These coordinates are then plugged in to the target position for the inference in the notebook.

I'm also printing out my actions from the step function here on the first line after the function definition:
https://github.com/DJT777/Excolligere/blob/71936c837db36a4930e40320d1575aaccebc2398/omniisaacgymenvs/envs/vec_env_rlgames.py

Then I'm comparing the outputs from Isaac Sim and the notebook I linked above.

One consideration I've had is that I might not be reading the output correctly, based on this documentation suggesting that there is a different ordering of the outputs: https://github.com/DJT777/Excolligere/blob/71936c837db36a4930e40320d1575aaccebc2398/docs/transfering_policies_from_isaac_gym.md
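If the orderings really do differ, fixing it would just be an index permutation, for example (a sketch only; the identity mapping below is a placeholder, the real mapping would have to come from that doc and the robot asset):

import numpy as np

# perm[i] = index of the policy output that drives simulator joint i.
# Identity mapping as a placeholder, not the actual Dofbot ordering.
perm = np.array([0, 1, 2, 3, 4, 5])

def reorder(values):
    return values[..., perm]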

The best check is to pass the same input to both the ONNX and PyTorch models and compare the outputs.
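For example, something like this minimal sketch (sess, input_name, and agent stand for the onnxruntime session and the restored rl_games player; the variable names here are assumptions):

import numpy as np
import torch

obs_np = np.random.randn(1, 29).astype(np.float32)   # any fixed test observation

onnx_mu = sess.run(None, {input_name: obs_np})[0]     # first output head is mu

obs_t = torch.from_numpy(obs_np).to(agent.device)
torch_action = agent.get_action(obs_t, True)          # deterministic action from the player

# the player may clip/rescale to the action space, so compare against the clipped mu
print(np.abs(np.clip(onnx_mu, -1.0, 1.0) - torch_action.cpu().numpy()).max())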

Here is a sample of the different outputs as you suggested:

From the Isaac Gym:



Observation:tensor([[-0.0059,  0.0736, -0.0185,  0.0088, -0.0200, -0.0931,  0.0462, -0.1982,
          0.1095, -0.0083,  0.7498,  0.0108,  0.1826, -0.1787,  0.2417,  0.0152,
          0.3937, -0.0355, -0.9184,  0.6569,  0.2588,  0.2059, -0.6776,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000]], device='cuda:0')
action before clip tensor([[-0.1526,  0.3710, -0.6942,  0.6228, -1.0000, -0.1611]],
       device='cuda:0')



Whereas running this code:


import numpy as np

# Create the individual arrays
jointPos = np.zeros(6)
jointVel = np.zeros(6)
goalPos = np.array([.1826, -.1787, .2417])
goalRot = np.zeros(4)
goalRotRel = np.zeros(4)
prevAct = np.full(6, 0)

# Define a function to clip the actions
def clip_actions(actions):
    return np.clip(actions, -1.0, 1.0)

# Observation vector copied from the simulator printout above
# (sess and input_name come from the ONNX-loading cell shown later in the thread)
input_data = np.array([-0.0059,  0.0736, -0.0185,  0.0088, -0.0200, -0.0931,  0.0462, -0.1982,
          0.1095, -0.0083,  0.7498,  0.0108,  0.1826, -0.1787,  0.2417,  0.0152,
          0.3937, -0.0355, -0.9184,  0.6569,  0.2588,  0.2059, -0.6776,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000], dtype=np.float32)
input_dict = {input_name: input_data.reshape(1, -1)}




while True:
    #input_data = np.concatenate((jointPos, jointVel, goalPos, goalRot, goalRotRel, prevAct), dtype=np.float32)
    #print(str(input_data))
    input_dict = {input_name: input_data.reshape(1, -1)}
    actions = []
    output = sess.run(None, input_dict)
    # Run the model on the input data
    mus = output[0][0]
    sigmas = output[1][0]
    for mu, sigma in zip(mus, sigmas):
        #print("mu is " + str(mu))
        #print("sigma is " + str(sigma))
        sigma = np.exp(sigma)
        action = np.random.normal(mu, sigma)
        action = clip_actions(action)
        #print(str(action))
        actions.append(action)
        # Print the output
    prevAct = np.array(actions, dtype=np.float32)
    jointPos = prevAct
    mus = clip_actions(mus)
    #print("From Mu: " + str(mus))
    print("Sampled Actions: " + str(actions)+"\n")

produces very different results in the action output:



Sampled Actions: [1.0, -0.44891554912899617, 0.4055521331764843, 1.0, 0.044669989955658046, 0.6681064546999487]

Sampled Actions: [1.0, -0.6341698266623044, 0.42099329096228416, 1.0, -0.03949862364674661, 0.5454127401465692]

Sampled Actions: [1.0, -0.9018518161188407, 0.43396113195718267, 0.8811261401993116, -0.1814963516831179, 0.6290464824858719]

Sampled Actions: [1.0, -0.5114059775648212, 0.21491532992916937, 1.0, 0.22194688542268481, 0.8891714238151054]

Sampled Actions: [1.0, -0.555514435619405, 0.32891076544248093, 1.0, 0.2197003265248571, 0.5689791679823528]

Could you print only the 'mu' outputs from both models? Differences are the expected behavior if you are sampling.

I've not been able to find where in Nvidia's Isaac Gym code they are doing the actual inference and getting predictions. A lot of their RL code is based on your rl_games repo. Would you know where to look?

I mean you don't need the sampling action = np.random.normal(mu, sigma);
if you want a deterministic policy you can just use action = mu.
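For example, the notebook loop could take mu directly (a minimal sketch reusing sess and input_dict from your snippet above):

import numpy as np

output = sess.run(None, input_dict)
mu = output[0][0]                    # first output head: the mean action
actions = np.clip(mu, -1.0, 1.0)     # clip to [-1, 1] like the simulator does
print("Deterministic actions:", actions)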

Right, but I want to verify that I'm getting the same, correct predictions from both the simulator and the ONNX model. My experiments with deploying the ONNX model led me to believe that I was getting inaccurate predictions from it, because the reaches were not matching what they were in simulation.

You can see here that even with the deterministic approach, the expected value is vastly different from the actions actually being taken in the simulator:

Case 1:

Simulator:

new position tensor([[ 0.1826, -0.1787,  0.2417]], device='cuda:0')
Observation:tensor([[-0.0059,  0.0736, -0.0185,  0.0088, -0.0200, -0.0931,  0.0462, -0.1982,
          0.1095, -0.0083,  0.7498,  0.0108,  0.1826, -0.1787,  0.2417,  0.0152,
          0.3937, -0.0355, -0.9184,  0.6569,  0.2588,  0.2059, -0.6776,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000]], device='cuda:0')
action before clip tensor([[-0.1526,  0.3710, -0.6942,  0.6228, -1.0000, -0.1611]],
       device='cuda:0')

ONNX


input_data = np.array([-0.0059,  0.0736, -0.0185,  0.0088, -0.0200, -0.0931,  0.0462, -0.1982,
          0.1095, -0.0083,  0.7498,  0.0108,  0.1826, -0.1787,  0.2417,  0.0152,
          0.3937, -0.0355, -0.9184,  0.6569,  0.2588,  0.2059, -0.6776,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000], dtype=np.float32)
          
From Mu: [ 1.         -0.47443828  0.32300434  1.          0.15414204  0.65032417]

Case 2:

Simulator

Observation:tensor([[-0.0297,  0.1214, -0.1320,  0.1025, -0.1395, -0.0864, -0.4589,  0.4480,
         -0.9389,  0.8410, -2.5514,  0.0103,  0.1826, -0.1787,  0.2417,  0.0152,
          0.3937, -0.0355, -0.9184,  0.6169,  0.3250,  0.2644, -0.6662, -0.1526,
          0.3710, -0.6942,  0.6228, -1.0000,  0.0000]], device='cuda:0')
action before clip tensor([[ 0.5072,  0.2407,  0.3249,  1.0000, -1.0000, -0.3381]],
       device='cuda:0')

ONNX

input_data = np.array([-0.0297,  0.1214, -0.1320,  0.1025, -0.1395, -0.0864, -0.4589,  0.4480,
         -0.9389,  0.8410, -2.5514,  0.0103,  0.1826, -0.1787,  0.2417,  0.0152,
          0.3937, -0.0355, -0.9184,  0.6169,  0.3250,  0.2644, -0.6662, -0.1526,
          0.3710, -0.6942,  0.6228, -1.0000,  0.0000], dtype=np.float32)
          
From Mu: [ 0.56676865 -1.         -0.08646691  1.          0.14331801  0.48769447]

Case 3

From Simulation

observation:tensor([[ 0.0510,  0.1173, -0.1063,  0.2193, -0.3571, -0.0850,  1.4090,  0.1383,
          0.7315,  0.5217, -3.8810,  0.0157,  0.1826, -0.1787,  0.2417,  0.0152,
          0.3937, -0.0355, -0.9184,  0.5332,  0.3558,  0.2370, -0.7300,  0.5072,
          0.2407,  0.3249,  1.0000, -1.0000,  0.0000]], device='cuda:0')
action before clip tensor([[ 0.1546,  0.7010,  0.4620,  0.8540, -0.6394, -0.5318]],
       device='cuda:0')

From ONNX

input_data = np.array([0.0510,  0.1173, -0.1063,  0.2193, -0.3571, -0.0850,  1.4090,  0.1383,
          0.7315,  0.5217, -3.8810,  0.0157,  0.1826, -0.1787,  0.2417,  0.0152,
          0.3937, -0.0355, -0.9184,  0.5332,  0.3558,  0.2370, -0.7300,  0.5072,
          0.2407,  0.3249,  1.0000, -1.0000,  0.0000], dtype=np.float32)
From Mu: [1.         0.59151983 1.         1.         0.71608543 0.70238996]


Could you check that the PyTorch model returns deterministic outputs too?

I've no idea where in the codebase they are actually generating the model's outputs. I'm wondering if you would lend your expertise in helping me find where, in Nvidia's use of your rl_games library, they are making predictions and producing outputs.

I mean you can call my model twice with the same input.

Do you mean use your library to generate the model then compare outputs?

Why not identify where in their code they are generating the outputs of their model, then print/store the outputs and the observation buffer and compare with the ONNX notebook? It might also help to see how they are generating the actions from the model's output.

You can call:
agent.get_action(obs, True)
and make sure it is the same checkpoint that was exported to ONNX.

Okay, going to try that.

Here is the code I've used (it's also pushed to the Excolligere repo you have access to)

These should have equal output on the same data, correct? All of the mus, log_stds, and values are different.

Omni Isaac


        agent = runner.create_player()
        agent.restore('/home/dylan/Desktop/repos/OmniIsaacGymEnvs-DofbotReacher/runs/DofbotReacher/nn/DofbotReacher.pth')
        input_data = np.array([0.0510, 0.1173, -0.1063, 0.2193, -0.3571, -0.0850, 1.4090, 0.1383,
                               0.7315, 0.5217, -3.8810, 0.0157, 0.1826, -0.1787, 0.2417, 0.0152,
                               0.3937, -0.0355, -0.9184, 0.5332, 0.3558, 0.2370, -0.7300, 0.5072,
                               0.2407, 0.3249, 1.0000, -1.0000, 0.0000], dtype=np.float32)
        obs = torch.from_numpy(input_data).unsqueeze(0).to('cuda:0')
        #input_dict = {input_name: input_data.reshape(1, -1)}
        print("Actions from agent: " + str(agent.get_action(obs, True)))

Here is the output:


Actions from agent: tensor([ 0.1547,  0.7007,  0.4620,  0.8539, -0.6395, -0.5317], device='cuda:0')
(tensor([[-0.5273,  0.1546, -0.9393,  0.7021, -0.5191, -0.9458]],
       device='cuda:0'), tensor([[-2.3403, -1.9765, -1.5348, -1.6698, -1.4579, -0.8412]],
       device='cuda:0'), tensor([[9.6810]], device='cuda:0'))

Whereas this is the output from the notebook:

Notebook With ONNX

import numpy as np

# Create the individual arrays
jointPos = np.zeros(6)
jointVel = np.zeros(6)
goalPos = np.array([.1826, -.1787, .2417])
goalRot = np.zeros(4)
goalRotRel = np.zeros(4)
prevAct = np.full(6, 0)

# Define a function to clip the actions
def clip_actions(actions):
    return np.clip(actions, -1.0, 1.0)

# Observation vector copied from the simulator printout above
input_data = np.array([0.0510, 0.1173, -0.1063, 0.2193, -0.3571, -0.0850, 1.4090, 0.1383,
                               0.7315, 0.5217, -3.8810, 0.0157, 0.1826, -0.1787, 0.2417, 0.0152,
                               0.3937, -0.0355, -0.9184, 0.5332, 0.3558, 0.2370, -0.7300, 0.5072,
                               0.2407, 0.3249, 1.0000, -1.0000, 0.0000], dtype=np.float32)
input_dict = {input_name: input_data.reshape(1, -1)}



while True:
    input_dict = {input_name: input_data.reshape(1, -1)}
    output = sess.run(None, input_dict)
    print(output)

Notebook output:

[array([[1.6248189 , 0.59151983, 2.016268  , 1.9443805 , 0.71608543,
        0.70238996]], dtype=float32), array([[-2.7289395, -2.1363962, -2.010333 , -2.3337207, -1.4635202,
        -1.023427 ]], dtype=float32), array([[0.5652896]], dtype=float32)]

You can see I'm using the same checkpoint and model.

Full run function in OmniIsaac

    def run(self):

        # create runner and set the settings
        runner = Runner(RLGPUAlgoObserver())
        runner.load(self.rlg_config_dict)
        runner.reset()

        # dump config dict
        experiment_dir = os.path.join('runs', self.cfg.train.params.config.name)
        os.makedirs(experiment_dir, exist_ok=True)
        with open(os.path.join(experiment_dir, 'config.yaml'), 'w') as f:
            f.write(OmegaConf.to_yaml(self.cfg))


        agent = runner.create_player()
        agent.restore('/home/dylan/Desktop/repos/OmniIsaacGymEnvs-DofbotReacher/runs/DofbotReacher/nn/DofbotReacher.pth')
        input_data = np.array([0.0510, 0.1173, -0.1063, 0.2193, -0.3571, -0.0850, 1.4090, 0.1383,
                               0.7315, 0.5217, -3.8810, 0.0157, 0.1826, -0.1787, 0.2417, 0.0152,
                               0.3937, -0.0355, -0.9184, 0.5332, 0.3558, 0.2370, -0.7300, 0.5072,
                               0.2407, 0.3249, 1.0000, -1.0000, 0.0000], dtype=np.float32)
        obs = torch.from_numpy(input_data).unsqueeze(0).to('cuda:0')
        #input_dict = {input_name: input_data.reshape(1, -1)}
        print("Actions from agent" + str(agent.get_action(obs, True)))

        # Build a dummy observation, trace the wrapped model, and export it to ONNX
        inputs = {
            'obs': torch.zeros((1,) + agent.obs_shape).to(agent.device),
        }

        with torch.no_grad():
            adapter = flatten.TracingAdapter(ModelWrapper(agent.model), inputs, allow_non_tensor=True)
            traced = torch.jit.trace(adapter, adapter.flattened_inputs, check_trace=False)
            flattened_outputs = traced(*adapter.flattened_inputs)
            print(flattened_outputs)

        # Outputs are exported in the order (mu, log_std, value)
        torch.onnx.export(traced, *adapter.flattened_inputs, "dofbotreacherDefault.onnx", verbose=True, input_names=['obs'],
                          output_names=['mu', 'log_std', 'value'])

        runner.run({
            'train': not self.cfg.test,
            'play': self.cfg.test,
            'checkpoint': self.cfg.checkpoint,
            'sigma': None
        })

Notebook loading the model


import onnxruntime as ort
import numpy as np

# Load the ONNX model
sess = ort.InferenceSession('/home/dylan/Desktop/repos/OmniIsaacGymEnvs-DofbotReacher/dofbotreacherDefault.onnx')

# Get the input and output names of the model
input_name = sess.get_inputs()[0].name
output_name = sess.get_outputs()[0].name

# Get the input names and shapes
input_info = sess.get_inputs()
output_info = sess.get_outputs()

for i in input_info:
    print("Input name:", i.name)
    print("Input shape:", i.shape)
    
for i in output_info:
    print("Output name:", i.name)
    print("Output shape:", i.shape)

Actually, reloading my model has now generated similar results:

Isaac Gym

Actions from agent: tensor([ 0.1547,  0.7007,  0.4620,  0.8539, -0.6395, -0.5317], device='cuda:0')
(tensor([[-0.5273,  0.1546, -0.9393,  0.7021, -0.5191, -0.9458]],
       device='cuda:0'), tensor([[-2.3403, -1.9765, -1.5348, -1.6698, -1.4579, -0.8412]],
       device='cuda:0'), tensor([[9.6810]], device='cuda:0'))

ONNX


[array([[ 0.15400116,  0.7014214 ,  0.4610022 ,  0.85426027, -0.63877946,
        -0.53179383]], dtype=float32), array([[-2.3402638, -1.9764799, -1.5347915, -1.6698207, -1.4579096,
        -0.8412483]], dtype=float32), array([[-0.01199287]], dtype=float32)]

More data confirms successful exporting of models:

(Note that all results in Isaac Sim are clipped to [-1, 1], whereas the ONNX model's outputs shown in this post have not yet been clipped.)

From Isaac Sim:

**input**

input_data = np.array([0.1639,  0.1001,  0.1125, -0.1421, -0.0237, -0.0309, -0.5712, -0.2141,
         -0.2014,  0.3321,  0.8778,  0.0034,  0.1712, -0.1732,  0.2197,  0.1061,
          0.3119, -0.3040, -0.8939,  0.3324,  0.6025, -0.5092, -0.5170,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000], dtype=np.float32)
        obs = torch.from_numpy(input_data).unsqueeze(0).to('cuda:0')
        #input_dict = {input_name: input_data.reshape(1, -1)}
        print("Actions from agent: " + str(agent.get_action(obs, True)))
**Output**

Actions from agent: tensor([-0.1589,  0.3981, -1.0000,  0.7192, -1.0000, -0.0660], device='cuda:0')
(tensor([[-0.5273,  0.1546, -0.9393,  0.7021, -0.5191, -0.9458]],
       device='cuda:0'), tensor([[-2.3403, -1.9765, -1.5348, -1.6698, -1.4579, -0.8412]],
       device='cuda:0'), tensor([[9.6810]], device='cuda:0'))


From Notebook

**input**
    # Concatenate the arrays into a single array
input_data = np.array([0.1639,  0.1001,  0.1125, -0.1421, -0.0237, -0.0309, -0.5712, -0.2141,
         -0.2014,  0.3321,  0.8778,  0.0034,  0.1712, -0.1732,  0.2197,  0.1061,
          0.3119, -0.3040, -0.8939,  0.3324,  0.6025, -0.5092, -0.5170,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000], dtype=np.float32)
input_dict = {input_name: input_data.reshape(1, -1)}


input_dict = {input_name: input_data.reshape(1, -1)}
actions = []
output = sess.run(None, input_dict)
print(output)


**Output**

[array([[-0.15905663,  0.3985302 , -1.0090353 ,  0.7184627 , -1.4696665 ,
        -0.06618351]], dtype=float32), array([[-2.3402638, -1.9764799, -1.5347915, -1.6698207, -1.4579096,
        -0.8412483]], dtype=float32), array([[1.3237184]], dtype=float32)]
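As a quick sanity check with the numbers above: clipping the ONNX mu to [-1, 1] reproduces the simulator's actions up to small float differences.

import numpy as np

mu = np.array([-0.15905663, 0.3985302, -1.0090353, 0.7184627, -1.4696665, -0.06618351])
print(np.clip(mu, -1.0, 1.0))
# -> [-0.159  0.399 -1.     0.718 -1.    -0.066]
# vs. the agent: tensor([-0.1589,  0.3981, -1.0000,  0.7192, -1.0000, -0.0660])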


Closing issue

I followed the example in https://github.com/Denys88/rl_games/blob/master/notebooks/train_and_export_onnx_example_lstm_continuous.ipynb to train and export the ONNX model. But when I tried to modify the config file to change the network structure, I found that the observation size of the LSTM layer was always 3 and could not be changed, and the number of actions output by the network could not be changed either.
I guess this is related to the 'env_config': {'env_name': 'Pendulum-v1', 'seed': 5} and 'env_name': 'envpool' entries. How can I set up my own environment and change the number of observations and actions?

Looking forward to your reply, thank you!