cvlab-epfl/tf-lift

no pre-trained models and high training difficulty

hangong opened this issue · 3 comments

Hello. I enjoyed reading the paper. However, I find it very difficult to prepare data for training. If the following issues are resolved, it will be much less difficult for other academic users to adopt it.

  1. Should there be some scripts to automate the training data preparation (i.e. users only need to provide images)?

  2. Or, are there some pre-trained models for other users to compare with? I have not seen them here.

  3. I have tried the Theano implementation. I noticed that there is a pre-trained model there. However, the code is really slow: a 300x300 image takes about 8 minutes to process on a Titan X GPU. I noticed some comments in the code stating that some steps are for "proof of concept only". Still, it would be good to have a more efficient official implementation. I'm not sure about the speed of this TF implementation. Is it actually faster?

Thanks very much for your contribution.

kmyi commented
  1. Should there be some scripts to automate the training data preparation (i.e. users only need to provide images)?

We currently don't provide that.

  2. Or, are there some pre-trained models for other users to compare with? I have not seen them here.

We are trying to find the time to release models for TF for easier use. But right now, we are struggling to find time to work on this.

  3. I have tried the Theano implementation. I noticed that there is a pre-trained model there. However, the code is really slow: a 300x300 image takes about 8 minutes to process on a Titan X GPU. I noticed some comments in the code stating that some steps are for "proof of concept only". Still, it would be good to have a more efficient official implementation. I'm not sure about the speed of this TF implementation. Is it actually faster?

We'll have to see, but for the detector part you can turn Theano off, which makes it much faster. However, the NMS step is still quite slow because of its pure-Python implementation, which is not that great.
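For reference, a pure-Python per-pixel NMS loop can usually be vectorized with numpy. The sketch below is a generic grid-NMS over a dense score map (an illustrative replacement, not the implementation in this repo): a pixel survives only if it is a strict local maximum in its 3x3 neighbourhood and above a threshold.

```python
import numpy as np

def nms_local_max(score_map, threshold=0.0):
    """Keep pixels that are strict local maxima in their 3x3
    neighbourhood and above `threshold`. Fully vectorized with numpy,
    replacing a per-pixel Python loop."""
    h, w = score_map.shape
    padded = np.pad(score_map, 1, mode="constant", constant_values=-np.inf)
    # Stack the 8 neighbours of every pixel into one (8, h, w) array.
    neighbours = np.stack([
        padded[dy:dy + h, dx:dx + w]
        for dy in range(3) for dx in range(3)
        if not (dy == 1 and dx == 1)
    ])
    is_max = (score_map > neighbours.max(axis=0)) & (score_map > threshold)
    return np.argwhere(is_max)  # (row, col) keypoint coordinates

score = np.zeros((5, 5))
score[2, 2] = 1.0
score[0, 4] = 0.5
print(nms_local_max(score))  # both isolated peaks survive
```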

For the latter two parts, the descriptor and the orientation estimator, the majority of the time is due to compilation. You can probably bypass this by applying the same compiled function object to multiple images and text files.
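To illustrate the amortization, the sketch below uses a simulated compile step (`make_pipeline` is a hypothetical stand-in, not a tf-lift function): the one-time compilation cost dominates the first call, and reusing the same function object makes every later image cheap.

```python
import time

def make_pipeline():
    """Stand-in for building and compiling the descriptor/orientation
    graphs: slow once, then the returned callable is fast to reuse."""
    time.sleep(0.5)  # simulated one-time compilation cost
    return lambda image: sum(image)  # simulated fast forward pass

t0 = time.perf_counter()
forward = make_pipeline()          # compile once...
first = forward([1, 2, 3])
t_first = time.perf_counter() - t0

t0 = time.perf_counter()
rest = [forward(img) for img in ([4, 5], [6], [7, 8, 9])]  # ...reuse many times
t_rest = time.perf_counter() - t0

print(first, rest)
print(t_first > t_rest)  # compilation dominates; later images are cheap
```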

A 300x300 image takes about 8 minutes to get processed on a Titan X GPU.

Once the initial compilation is done with TF, it does not take as long.

Please understand we don't have a person dedicated to maintaining this repo.

Btw, why is NMS used instead of SoftArgMax as stated in the paper?

kmyi commented

Hi, SoftArgMax is used during training, and NMS at test time. It's shown in Fig. 4.
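For illustration, a numpy sketch of the two choices on a dense score map (this is not the repo's actual TF op): soft-argmax takes a softmax-weighted expectation of pixel coordinates, so it is differentiable and gradients can flow at training time, while at test time a hard peak (argmax/NMS) is taken instead.

```python
import numpy as np

def soft_argmax_2d(score_map, beta=10.0):
    """Differentiable keypoint location: softmax-weighted expectation
    of pixel coordinates. `beta` sharpens the softmax; large beta
    approaches the hard argmax."""
    h, w = score_map.shape
    weights = np.exp(beta * (score_map - score_map.max()))  # stable softmax
    weights /= weights.sum()
    ys, xs = np.mgrid[0:h, 0:w]
    return (weights * ys).sum(), (weights * xs).sum()

score = np.zeros((5, 5))
score[1, 3] = 5.0                                      # one strong peak
soft = soft_argmax_2d(score)                           # training-time estimate
hard = np.unravel_index(score.argmax(), score.shape)   # test-time hard pick
print(soft, hard)  # soft estimate converges to the hard peak (1, 3)
```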

Cheers,
Kwang