Quaternion difference in rotation reward

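It's not mathematically sound to subtract quaternions directly: they follow their own algebra, and q and -q represent the same rotation, so the norm of an element-wise difference is not a reliable measure of rotation error.
Such a subtraction is used to compute the rotational reward here: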

Line 363 in 4d146e1

return np.exp(-np.linalg.norm(5.0 * (self.sim.data.qpos[3:7] - target_rot)))

It should be replaced by a proper quaternion difference, for example the angle of the relative rotation between the current and the target orientation.
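
A minimal sketch of one possible replacement, not a tested patch for this repository. It assumes both quaternions use the same component order and are non-zero. The names `quat_angle` and `rotation_reward` are hypothetical, and `scale=5.0` is only carried over from the current code; since the error is now an angle in radians, it would probably need retuning.

```python
import numpy as np


def quat_angle(q_current, q_target):
    """Angle (radians, in [0, pi]) of the rotation between two quaternions."""
    q_current = np.asarray(q_current, dtype=float)
    q_target = np.asarray(q_target, dtype=float)
    # Normalize defensively so that small numerical drift does not matter.
    q_current = q_current / np.linalg.norm(q_current)
    q_target = q_target / np.linalg.norm(q_target)
    # For unit quaternions, |<q1, q2>| = |cos(theta / 2)|.
    # The absolute value makes q and -q (same rotation) give a zero angle.
    dot = np.clip(abs(np.dot(q_current, q_target)), 0.0, 1.0)
    return 2.0 * np.arccos(dot)


def rotation_reward(q_current, q_target, scale=5.0):
    """Hypothetical reward: 1 when aligned, decaying with the angular error."""
    return np.exp(-scale * quat_angle(q_current, q_target))
```

In the line quoted above, this would be called as `rotation_reward(self.sim.data.qpos[3:7], target_rot)`.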