Evaluation with past research
Hi dear authors,
I would like to start by saying thank you for your amazing work.
Did you re-implement past research (Lin et al. / JL2P / Ghosh et al.)?
How can I evaluate them with your code?
Hello @L190201301,
I will put more information in the README in the next few weeks. (I may also distribute their motions.)
To tell you what I use for comparison with previous works (while waiting for me to update the README):
- Lin et al.: as their code was not released, I use the reimplementation and pretrained model of Language2Pose (JL2P)
- JL2P: code and pretrained model: https://github.com/chahuja/language2pose
- Ghosh et al.: code and pretrained model: https://github.com/anindita127/Complextext2animation (this code is heavily based on JL2P)
To get the motions as npy files, I follow each README.md to do the installation. Then I run:
`python sample_wordConditioned.py`
to get motion samples (still as rifke features). To get proper xyz joints, I "hack" their `render.py` script so that it saves motions as npy files instead of rendering them. (I do it this way to make sure the rifke => xyz conversion is done correctly.) I will explain it more clearly in the README.
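To make the idea concrete, here is a minimal sketch of the kind of patch I mean (the function name, the output directory, and the (frames, joints, 3) shape are my own illustration, not their actual code):

```python
import os
import numpy as np

def save_motion_instead_of_rendering(xyz_joints: np.ndarray, name: str,
                                     out_dir: str = "motions") -> None:
    """Save xyz joint positions of shape (frames, joints, 3) as a .npy file.

    Call this inside the render loop of their render.py, right after the
    rifke -> xyz conversion, and skip the actual rendering call.
    """
    os.makedirs(out_dir, exist_ok=True)
    np.save(os.path.join(out_dir, f"{name}.npy"), xyz_joints)
```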
I will also update the `eval.py` script and upload a script to create a table with all the results.
Hello, I also have a question about the evaluation.
As far as I know, previous studies have not reported results with variable sequence lengths.
Were all the results presented in the paper obtained using a fixed length?
Hello,
That's a very good question. Actually, what I am doing is not ideal; we can discuss it if you think of something better.
After generating motions (from any method), for each sequence in the test set, I load the GT motion and the generated one. Then I compute the maximum number of frames in common (the minimum of the two lengths) and compute the metrics (APE on the root joint, etc.) on those frames. (This is an average, so sometimes the metrics are computed with fewer elements.)
TEMOS always generates motions of the appropriate length, as the duration is one of the inputs to the model (all poses are generated in one pass). Previous works are generally auto-regressive and are trained to generate a fixed number of poses at a time (which requires several passes through the model). When I evaluate, I take what they generate.
If you are interested in the code, you can check it out:
- Compute the min length: Line 177 in ea12cf6
- Compute the metrics: `TEMOS/temos/model/metrics/compute.py`, Line 82 in ea12cf6
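For readers without the repo at hand, here is a minimal sketch of the length-matching logic described above (the helper name is hypothetical; I assume motions are stored as (frames, joints, 3) numpy arrays with the root as joint 0):

```python
import numpy as np

def ape_root(gt_motion: np.ndarray, gen_motion: np.ndarray) -> float:
    """APE on the root joint, computed on the frames both motions share.

    Both inputs are (frames, joints, 3) xyz arrays; their lengths may differ.
    """
    # Maximum number of frames in common = minimum of the two lengths
    num_frames = min(len(gt_motion), len(gen_motion))
    gt, gen = gt_motion[:num_frames], gen_motion[:num_frames]
    # Mean L2 distance between the root joint trajectories
    return float(np.linalg.norm(gt[:, 0] - gen[:, 0], axis=-1).mean())
```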
Hi,
I updated the README.md. You can use the command line `bash prepare/download_previous_works.sh` to download the motions generated by previous works. Then run `python evaluate.py folder=previous_work/ghosh` to evaluate on Ghosh et al.