una-dinosauria/human-motion-prediction

Visualization Code and Data Clarification

kaufManu opened this issue · 2 comments

Thank you for taking the time to make your code publicly available! I also really liked your paper and found it very interesting.

I am a bit confused regarding the data representation, though, and how the visualization works. Specifically, I am referring to this code snippet in forward_kinematics.fkl:

    for i in np.arange(njoints):

        if not rotInd[i]:  # If the list is empty
            xangle, yangle, zangle = 0, 0, 0
        else:
            xangle = angles[rotInd[i][0] - 1]
            yangle = angles[rotInd[i][1] - 1]
            zangle = angles[rotInd[i][2] - 1]

        r = angles[expmapInd[i]]

        thisRotation = data_utils.expmap2rotmat(r)
        thisPosition = np.array([xangle, yangle, zangle])

        if parent[i] == -1:  # Root node
            xyzStruct[i]['rotation'] = thisRotation
            xyzStruct[i]['xyz'] = np.reshape(offset[i, :], (1, 3)) + thisPosition
        else:
            xyzStruct[i]['xyz'] = (offset[i, :] + thisPosition).dot(xyzStruct[parent[i]]['rotation']) + \
                                  xyzStruct[parent[i]]['xyz']
            xyzStruct[i]['rotation'] = thisRotation.dot(xyzStruct[parent[i]]['rotation'])

What confuses me is that thisPosition = np.array([xangle, yangle, zangle]) is added to the offset, i.e. to the final 3D position of each joint. The data (i.e. angles) has shape (99,). I believe that the first three dimensions are the position of the root (I read this somewhere in a comment, but forgot where :)). So the remaining 96 dimensions are the 32 exponential map coordinates for the 32 joints, right? rotInd points into angles, so thisPosition = np.array([xangle, yangle, zangle]) is actually joint angle data and thus should not be added to a position vector, in my opinion. In fact, I tried setting thisPosition to zero and the plot looks very similar. I assume the angles (given in radians) are just small enough not to make a huge difference.

Another thing that confuses me is that we seem to have 32*3 exponential map coordinates, implying we have 32 joints. However, the H3.6M skeletons only have 25 joints (I checked this by downloading the H3.6M code files). I believe that the remaining 7 "joints" are in fact end effector nodes, for which joint angles are typically not defined. This is also confirmed by the contents of your rotInd matrix (the end effectors being the entries in rotInd that are empty). I checked the contents of S1 walking_1.txt, and the data corresponding to the 7 end effectors, i.e. indices [5, 10, 15, 21, 23, 29, 31], is empty anyway (i.e. zero vectors in every frame). This is a minor thing, as the visualization is not affected by it and I believe that you remove those entries from the data before you feed it to the model. However, this confused me a lot, so I wanted to ask if you could confirm it, and also to write it down somewhere as a reference for future readers.
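For future readers, the zero-vector check described above can be sketched roughly as follows. This is only an illustration under assumptions: the helper name zero_joints is made up, the joint indices are taken to be zero-based, the layout is assumed to be 3 root values followed by 32 x 3 exponential map values, and the demo uses synthetic data rather than the real S1 walking_1.txt.

```python
import numpy as np

def zero_joints(frames, tol=0.0):
    """Return indices of joints whose exponential map is zero in every frame.

    frames: array of shape (n_frames, 99). Assumed layout: 3 root-translation
    values followed by 32 joints x 3 exponential-map values.
    """
    frames = np.asarray(frames, dtype=float)
    expmaps = frames[:, 3:].reshape(len(frames), 32, 3)
    # A joint counts as "empty" if all 3 components are zero in all frames.
    is_zero = np.all(np.abs(expmaps) <= tol, axis=(0, 2))
    return np.flatnonzero(is_zero).tolist()

# Synthetic stand-in for a real sequence (NOT actual H3.6M data).
rng = np.random.default_rng(0)
expm = rng.uniform(0.1, 1.0, size=(5, 32, 3))   # strictly non-zero values
expm[:, [5, 10], :] = 0.0                       # pretend joints 5 and 10 are end effectors
demo = np.hstack([rng.uniform(size=(5, 3)), expm.reshape(5, 96)])

found = zero_joints(demo)
print(found)
```

On a real sequence, the frames array would first have to be loaded from the text file (for example with np.loadtxt, assuming a comma-separated file with one frame per row).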

Hi @kaufManu,

> I believe that the first three dimensions are the position of the root (I read this somewhere in a comment, but forgot where :)). So the remaining 96 dimensions are the 32 exponential map coordinates for the 32 joints, right?

Yes, the first 3 numbers are the translation of the root joint (i.e., the "global" translation), and yes, the remaining 96 dimensions are angles for 32 "joints".
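As a quick illustration of that layout, here is a minimal sketch that splits one 99-dimensional frame into the root translation and 32 exponential map vectors. The function name split_pose is hypothetical and the frame is dummy data; it only mirrors the layout confirmed above, not any function in the repository.

```python
import numpy as np

def split_pose(angles):
    """Split one 99-dim frame into root translation and per-joint exponential maps."""
    angles = np.asarray(angles, dtype=float)
    if angles.shape != (99,):
        raise ValueError(f"expected shape (99,), got {angles.shape}")
    root_translation = angles[:3]          # global (x, y, z) of the root joint
    expmaps = angles[3:].reshape(32, 3)    # one axis-angle vector per "joint"
    return root_translation, expmaps

frame = np.arange(99, dtype=float)         # dummy frame, not real data
root, expmaps = split_pose(frame)
print(root.shape, expmaps.shape)
```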

> rotInd points into angles, so thisPosition = np.array([xangle, yangle, zangle]) is actually joint angle data and thus should not be added to a position vector in my opinion. In fact, I tried to just set thisPosition to zero and the plot looks very similar. I assume the angles (given in radians) are just small enough to not make a huge difference.

🤔 That actually makes a lot of sense. The angles are probably small enough, though. It doesn't help that I seem to have mixed up the names of the variables: in https://github.com/asheshjain399/RNNexp/blob/7fc5a53292dc0f232867beb66c3a9ef845d705cb/structural_rnn/CRFProblems/H3.6m/mhmublv/Motion/exp2xyz.m, these variables are called [xpos, ypos, zpos], but they are indexed from a posInd(ex?) variable, which is conspicuously missing from my code.
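To get a feel for how much this could matter, here is a toy sketch that runs the same kind of loop on a made-up 3-joint chain, once adding the angle values to the offsets and once not. Everything in it is an assumption for illustration: the chain, the offsets (a few hundred units long), the angle values, and the simplification that the values added to the offset are the joint's own exponential map components. expmap_to_rotmat is a plain Rodrigues formula written for this example, not data_utils.expmap2rotmat.

```python
import numpy as np

def expmap_to_rotmat(r):
    """Rodrigues' formula: axis-angle vector r -> 3x3 rotation matrix."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    k = r / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def toy_fk(expmaps, offsets, parents, add_angles_to_offset):
    """Forward kinematics on a toy chain, using the row-vector convention of the snippet."""
    n = len(parents)
    xyz = np.zeros((n, 3))
    rot = [None] * n
    for i in range(n):
        R = expmap_to_rotmat(expmaps[i])
        # Toy stand-in for thisPosition: either the angle values or zeros.
        extra = expmaps[i] if add_angles_to_offset else np.zeros(3)
        if parents[i] == -1:  # root node
            rot[i] = R
            xyz[i] = offsets[i] + extra
        else:
            xyz[i] = (offsets[i] + extra) @ rot[parents[i]] + xyz[parents[i]]
            rot[i] = R @ rot[parents[i]]
    return xyz

# Hypothetical chain: root -> joint 1 -> joint 2, with long bone offsets.
parents = [-1, 0, 1]
offsets = np.array([[0.0, 0.0, 0.0], [0.0, -400.0, 0.0], [0.0, -400.0, 0.0]])
expmaps = np.array([[0.1, 0.0, 0.0], [0.2, -0.1, 0.05], [0.0, 0.3, 0.1]])

with_angles = toy_fk(expmaps, offsets, parents, add_angles_to_offset=True)
without_angles = toy_fk(expmaps, offsets, parents, add_angles_to_offset=False)
max_shift = np.abs(with_angles - without_angles).max()
print(max_shift)
```

In this toy setup the shift stays below one unit while the bones are hundreds of units long, which would be consistent with the two plots looking nearly identical. Whether the real skeleton behaves the same way depends on its actual offsets.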

Regarding the 32 joints, I believe only 17 are independent, and the rest are end effectors as you call them. IIRC some joints are repeated -- I remember observing this when I plotted the index in 3d as I was going down the tree, but you may want to confirm it yourself.

If I were you, I would plot a couple of sequences using the MATLAB code, then run this Python code, compare the two outputs, and tweak it until they match (that's my standard procedure for porting, e.g. MATLAB -> Python, NumPy -> TF, C++ -> CUDA...).
You always have to leave some room for numerical error though, so I might have erroneously assumed that the small differences caused by adding the angles to the position were numerical error. If that's the case, the good news is that it doesn't matter too much 😅
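A minimal sketch of that kind of comparison, with an explicit tolerance so that a systematic offset is not mistaken for rounding noise. The helper name compare_ports and the tolerance value are assumptions; in practice the reference array would be joint positions exported from the MATLAB code, and the candidate the output of the Python port.

```python
import numpy as np

def compare_ports(reference, candidate, atol=1e-6):
    """Return (agrees, max_abs_err) for two arrays of joint positions."""
    reference = np.asarray(reference, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    max_abs_err = float(np.abs(reference - candidate).max())
    return max_abs_err <= atol, max_abs_err

# Dummy data standing in for exported joint positions.
ref = np.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]])
ok_close, err_close = compare_ports(ref, ref + 1e-9)   # rounding-level noise
ok_far, err_far = compare_ports(ref, ref + 0.2)        # systematic difference
print(ok_close, err_close, ok_far, err_far)
```

Printing the maximum error rather than only a pass/fail flag makes it easier to tell genuine floating-point noise apart from a small but real discrepancy between the two implementations.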

Many thanks for your quick answer and the clarifications :)!