MyoHub/myosuite

Help request to apply inverse dynamics on myohand model

andreh1111 opened this issue · 7 comments

Hi,
I'm trying to use MuJoCo's inverse dynamics to obtain ctrl given a target qpos. Following MuJoCo's documentation, I implemented the following function:

import scipy.optimize as optimize
from copy import deepcopy
import numpy as np
import mujoco

def get_ctrl(model, data, target_qpos):
    data_copy = deepcopy(data)
    # ---- qfrc computation
    data_copy.qpos = target_qpos.copy()
    mujoco.mj_forward(model, data_copy)
    data_copy.qacc = 0
    mujoco.mj_inverse(model, data_copy)
    qfrc = data_copy.qfrc_inverse.copy()
    # ---- afrc computation
    amnt = data_copy.actuator_moment.copy()
    afrc = np.zeros(model.nu,)
    sign_constr = ({'type':'ineq', 'fun': lambda afrc: -afrc}) # only negative actuator forces allowed
    cost_function = lambda afrc: np.linalg.norm(afrc @ amnt - qfrc)
    result = optimize.minimize(cost_function, afrc, method='COBYLA', constraints=sign_constr)
    actuator_force = result.x
    # ---- act computation
    act = np.zeros(actuator_force.shape)
    for idx_actuator, force in enumerate(actuator_force):
        length = data_copy.actuator_length[idx_actuator]
        lengthrange = model.actuator_lengthrange[idx_actuator]
        acc0 = model.actuator_acc0[idx_actuator]
        prmb = model.actuator_biasprm[idx_actuator][:9]
        prmg = model.actuator_gainprm[idx_actuator][:9]
        bias = mujoco.mju_muscleBias(length, lengthrange, acc0, prmb)
        gain = mujoco.mju_muscleGain(length, 0, lengthrange, acc0, prmg)
        act[idx_actuator] = min(max((force - bias) / gain, 0), 1)
    return act

Sometimes this function works quite well (i.e., applying the obtained ctrl gets me to the target position), but most of the time individual qpos values are off by about 90°, and other times the obtained position is completely wrong.
I therefore suspect I'm doing something very wrong without realising it.

Do you have any hints about this?
Thank you very much for your support.

Hi @andreh1111, can you add an example with a few trajectories to test, so we can see when the wrong qpos values are generated?

Hi @Vittorio-Caggiano, thanks for your answer. Here is full code to test both a well-working condition and a badly-working one (qpos2reach = 1 or 2, respectively):

import scipy.optimize as optimize
from threading import Thread
from copy import deepcopy
import numpy as np
import mujoco
import mujoco.viewer
import time

def get_ctrl(model, data, target_qpos):
    data_copy = deepcopy(data)
    # ---- qfrc computation
    data_copy.qpos = target_qpos.copy()
    mujoco.mj_forward(model, data_copy)
    data_copy.qacc = 0
    mujoco.mj_inverse(model, data_copy)
    qfrc = data_copy.qfrc_inverse.copy()
    print(f'qfrc resulting from inverse dynamics:\n{qfrc}\n')
    # ---- afrc computation
    amnt = data_copy.actuator_moment.copy()
    afrc = np.zeros(model.nu,)
    sign_constr = ({'type':'ineq', 'fun': lambda afrc: -afrc}) # only negative actuator forces allowed
    cost_function = lambda afrc: np.linalg.norm(afrc @ amnt - qfrc)
    result = optimize.minimize(cost_function, afrc, method='COBYLA', constraints=sign_constr)
    actuator_force = result.x
    # ---- act computation
    act = np.zeros(actuator_force.shape)
    for idx_actuator, force in enumerate(actuator_force):
        length = data_copy.actuator_length[idx_actuator]
        lengthrange = model.actuator_lengthrange[idx_actuator]
        acc0 = model.actuator_acc0[idx_actuator]
        prmb = model.actuator_biasprm[idx_actuator][:9]
        prmg = model.actuator_gainprm[idx_actuator][:9]
        bias = mujoco.mju_muscleBias(length, lengthrange, acc0, prmb)
        gain = mujoco.mju_muscleGain(length, 0, lengthrange, acc0, prmg)
        act[idx_actuator] = min(max((force - bias) / gain, 0), 1)
    return act

def see_pos(model, qpos):
    data_copy = mujoco.MjData(model)
    data_copy.qpos = qpos
    # ----
    azimuth = 166.553
    distance = 1.493
    elevation = -36.793
    lookat = [-0.89994152, -0.18327544, 0.818737]
    with mujoco.viewer.launch_passive(model, data_copy) as viewer:
        viewer.cam.azimuth = azimuth
        viewer.cam.distance = distance
        viewer.cam.elevation = elevation
        viewer.cam.lookat = np.array(lookat)
        viewer.scn.flags[:] = 0
        viewer.opt.flags[:] = 0
        viewer.opt.geomgroup[1:] = 0
        while viewer.is_running():
            viewer.sync()
            time.sleep(0.1)

if __name__ == '__main__':
    model_name = '<your path>\\myohand.xml'
    model = mujoco.MjModel.from_xml_path(model_name)
    # model.opt.disableflags = mujoco.mjtDisableBit.mjDSBL_GRAVITY
    data = mujoco.MjData(model)

    qpos2reach = 1  # 1 for the well-working condition, or 2 for the badly-working one
    
    if qpos2reach == 1: # this results in a good replication, except for an 87° error on pm4_flexion (see the result of the last print)
        target_qpos = np.array([0.29865028,  0.46311759,  0.80771347, -0.67851221,  0.74317296,
                                -0.78729943, -1.35795467,  1.5902815 ,  0.29188749,  1.54643497,
                                -0.01068739,  1.67790172,  0.02214216,  1.64058934,  1.5890064 ,
                                0.04424725,  0.29232586,  1.51057946, -0.03351803, -0.00766682,
                                0.26878197, -0.04846721, -0.04105932])

    else: # this completely fails (NOTE: if gravity is disabled, the qfrc resulting from inverse dynamics is all 0)
        target_qpos = np.array([ 0.        ,  0.        ,  0.        , -0.03095467,  0.7       ,
                                -0.785398  , -0.19860439,  0.2690076 ,  0.        ,  0.2690076 ,
                                0.        ,  1.43710411,  0.        ,  1.43710411,  0.        ,
                                0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
                                0.        ,  0.        ,  0.        ])

    Thread(target = lambda: see_pos(model, target_qpos)).start()

    data.ctrl = get_ctrl(model, data, target_qpos)
    for _ in range(int(1e3)):
        mujoco.mj_step(model, data)
    qpos_achieved = data.qpos

    Thread(target = lambda: see_pos(model, qpos_achieved)).start()

    joints_names_list = np.array([mujoco.mj_id2name(model, mujoco.mjtObj.mjOBJ_JOINT, i) for i in range(model.njnt)])
    print('qpos errors (deg):')
    print(np.column_stack((joints_names_list, np.rad2deg(target_qpos - qpos_achieved).round(1))))

@andreh1111 thanks for the example... from a quick look a few considerations:

  1. The posture you want to achieve seems very extreme, and it wouldn't be achievable in one time step. Indeed, when you simulate it, you run it for 1e3 time steps. You should consider the inverse dynamics (ID) over the same time frame as the dynamics you want to simulate.
  2. In addition to gravity, there is also a passive force in each muscle. If you run the model (e.g. in simulate) and apply forces to the joints, you will notice these effects; see the sketch below.
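
For example, here is a minimal sketch (reusing model, data, and target_qpos from your script above) to see those passive contributions at the target posture:

# Evaluate the model at the target posture, with zero velocity and zero activation.
data.qpos = target_qpos.copy()
data.qvel[:] = 0
data.ctrl[:] = 0
data.act[:] = 0
mujoco.mj_forward(model, data)

# With act = 0 each muscle still produces its passive (bias) force...
print('passive muscle forces (per actuator):', data.actuator_force)
# ...which, together with gravity, loads the joints:
print('gravity forces on the joints:        ', data.qfrc_bias)
print('passive muscle forces on the joints: ', data.qfrc_actuator)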

Thanks for your answer @Vittorio-Caggiano!
I made some modifications to the code, and now, given a trajectory (and not a single position, as I was mistakenly doing before), I can get a sequence of muscle activations suitable for replicating it.
Since I struggled to find examples on the web (particularly on the use of inverse dynamics with muscle actuators), I provide my current solution below, hoping it is helpful; feedback is welcome in case of mistakes.

def get_act(model, data, target_qpos):
    data_copy = deepcopy(data)
    # ---- qfrc computation: inverse dynamics at the target state
    data_copy.qpos = target_qpos.copy()
    data_copy.qvel = (target_qpos - data.qpos) / model.opt.timestep  # velocity needed to reach the target in one timestep
    mujoco.mj_forward(model, data_copy)
    data_copy.qacc = 0
    mujoco.mj_inverse(model, data_copy)
    qfrc = data_copy.qfrc_inverse.copy()
    # ---- act computation: find activations whose muscle forces reproduce qfrc
    gain = np.array([])
    bias = np.array([])
    # NOTE: optimizing w.r.t. act instead of actuator_force keeps act bounded between 0 and 1,
    # rather than producing too-high actuator_force values that would then be clamped when converted to act
    bounds = []
    bounds_margin = 0.10  # prevents the solution from changing too much w.r.t. the previous state
    for idx_actuator in range(model.nu):
        length = data_copy.actuator_length[idx_actuator]
        lengthrange = model.actuator_lengthrange[idx_actuator]
        velocity = data_copy.actuator_velocity[idx_actuator]
        acc0 = model.actuator_acc0[idx_actuator]
        prmb = model.actuator_biasprm[idx_actuator,:9]
        prmg = model.actuator_gainprm[idx_actuator,:9]
        bias = np.append(bias, mujoco.mju_muscleBias(length, lengthrange, acc0, prmb))
        gain = np.append(gain, mujoco.mju_muscleGain(length, velocity, lengthrange, acc0, prmg))
        bounds.append((max(0, data.act[idx_actuator]-bounds_margin), min(1, data.act[idx_actuator]+bounds_margin)))
    cost_function = lambda act: np.linalg.norm((gain * act + bias) @ data_copy.actuator_moment - qfrc)
    result = optimize.minimize(cost_function, data.act, method='SLSQP', bounds=tuple(bounds))
    return result.x

At first, my intention was to compute ctrl from the obtained act, i.e.:

for target_qpos in trajectory:  # 'trajectory' is a placeholder for the sequence of target qpos samples
    act = get_act(model, data, target_qpos)
    act_dot = (act - data.act) / model.opt.timestep
    # invert the muscle activation dynamics act_dot = (ctrl - act) / tau:
    tau = model.actuator_dynprm[:, 0] * (0.5 + 1.5 * act)
    tau[act_dot <= 0] = model.actuator_dynprm[act_dot <= 0, 1] / (0.5 + 1.5 * act[act_dot <= 0])
    ctrl = act_dot * tau + act
    data.ctrl = ctrl
    mujoco.mj_step(model, data)

but I then realized that, by doing so, the act resulting from applying ctrl would differ from the target act computed previously. Not knowing how to efficiently account for the muscle filter dynamics inside get_act, I decided to directly set data.act = get_act(model, data, target_qpos) instead, with model.actuator_dyntype = mujoco.mjtDyn.mjDYN_INTEGRATOR instead of mjDYN_MUSCLE.
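
In code, that amounts to something like the following minimal sketch (trajectory is again just a placeholder for the sequence of target qpos samples; I also keep ctrl at zero so the integrator leaves act unchanged within a step):

# Replace the muscle activation filter with a pure integrator, so that the act
# values set below are not re-filtered by mj_step (with ctrl = 0, act_dot = 0).
model.actuator_dyntype[:] = int(mujoco.mjtDyn.mjDYN_INTEGRATOR)
data.ctrl[:] = 0

for target_qpos in trajectory:  # placeholder: one target qpos sample per simulation step
    data.act[:] = get_act(model, data, target_qpos)
    mujoco.mj_step(model, data)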

Note 1: Since I have not yet implemented any closed-loop feedback in this solution, if the replication of the target trajectory generates contact forces strong enough to prevent a finger of myohand from moving, in the long run the resulting trajectory of that finger will diverge from the target one.
Note 2: Although the purpose of this implementation is related to the control of MyoSuite (of myohand in particular), I realize I perhaps should have opened this conversation in the MuJoCo issue tracker instead; if so, I apologize.

Hi @andreh1111, I think having this as an example/tutorial could be useful for the community. Could we add it to our tutorials section?

Done, happy to contribute. Thanks again @Vittorio-Caggiano.