cvlab-epfl/LIFT

How to generate the data for training the descriptor network

13331151 opened this issue · 12 comments

Hi,
I'm Jack. I recently trained a descriptor network but it didn't work well. Could you tell me how you generate the data set for training the descriptor network? And could you tell me the validation error you get when training the descriptor network (mine is about 2.1)?
My process:
1. Pick a structure (3D) point, then re-project it into a feature point in each of two corresponding images;
2. Pick another structure point very close to the former one and re-project it as well;
3. In each image I now have two projected feature points, from which I can get a direction and a scale (see the sketch after this list);
4. Then I crop a patch according to each feature point's direction and scale.
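
For concreteness, here is a minimal sketch of step 3; p_main and p_aux are made-up names for the two projections (of the chosen structure point and its close neighbour) in one image:

import numpy as np

def direction_and_scale(p_main, p_aux):
    # Direction and scale of the feature at p_main, derived from its neighbour.
    d = p_aux - p_main
    scale = np.linalg.norm(d)                          # distance between the two projections
    orientation = np.degrees(np.arctan2(d[1], d[0]))   # angle pointing towards the neighbour
    return orientation, scale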

Here is an example of what I get:
[attached image: examples of the extracted patches]

Thanks!!! :)

@kmyid

kmyi commented

Hi Jack,

Could you tell me how you generate the data set for training the descriptor network?

We do a similar process. However, in our case we use the actual raw SIFT points detected in each image, not the reprojected points. We crop a region of 6 times the scale at the feature point location, which is the same support region the SIFT descriptor looks at.
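
For illustration, a rough sketch of such a crop, assuming the keypoint is given as (x, y, scale, orientation in degrees) and using OpenCV for the warp; the function name and patch size are made up, while the 6x default follows the support region mentioned above:

import cv2

def crop_sift_support(image, x, y, scale, orientation_deg, patch_size=64, support=6.0):
    # Map a (support * scale)-wide region around the keypoint, rotated by its
    # orientation, onto a patch_size x patch_size patch.
    M = cv2.getRotationMatrix2D((float(x), float(y)), orientation_deg,
                                patch_size / (support * scale))
    M[0, 2] += patch_size / 2.0 - x
    M[1, 2] += patch_size / 2.0 - y
    return cv2.warpAffine(image, M, (patch_size, patch_size), flags=cv2.INTER_LINEAR)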

And could you tell me the validation error you get when training the descriptor network (mine is about 2.1)?

I am not really sure I can give you a value that can be compared, as we multiply in several constants to balance the positives and negatives. One very important thing is to apply hard mining, depending on the data, and this hard mining should become progressively more aggressive as learning proceeds. Have a look at Eduard's descriptor paper, as it is specifically about this learning strategy.
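
For what it's worth, here is a minimal sketch of that kind of progressive hard mining (not the actual LIFT code): compute the per-pair losses for a batch, keep only the hardest fraction for the update, and shrink that fraction as training proceeds. The schedule and the function names are assumptions.

import numpy as np

def mining_ratio(iteration, start=1.0, end=0.25, ramp_iters=50000):
    # Fraction of the batch kept for the update: starts at 1 (no mining) and is
    # progressively reduced towards `end` over `ramp_iters` iterations.
    t = min(iteration / float(ramp_iters), 1.0)
    return start + t * (end - start)

def hard_mine(per_pair_losses, iteration):
    # per_pair_losses: 1-D numpy array with one loss value per pair in the batch.
    keep = max(1, int(round(mining_ratio(iteration) * len(per_pair_losses))))
    return np.argsort(per_pair_losses)[::-1][:keep]   # indices of the hardest pairs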

Hope my answer helps!
Kwang

I really appreciate your help, Kwang. Your answer does help a lot! One more question: do you generate the data using VisualSFM's output? I'm new to VisualSFM, and I fail to find the file that stores information about the structure points (including their scale, position, and orientation in the corresponding image pairs).

What I can retrieve so far:
from *.sift: [x, y, color, scale, orientation]
from *.nvm: information about the structure points (without scale and orientation) as well as their corresponding image IDs.

And below is the loss function for the keypoint training. Is it the same as what you describe in the paper?

# Externally defined in the surrounding script: layers, floatX, config,
# givens_train, myNet.
import numpy as np
import theano
import theano.tensor.nnet
import lasagne

# Classification loss on the keypoint score maps: a squared hinge that pushes
# the scores of the three keypoint patches up and the non-keypoint patch down;
# the 1/6 and 3/6 weights balance the three positives against the one negative.
score1 = lasagne.nonlinearities.softmax(
    lasagne.layers.get_output(layers[0]["kp-scoremap"], deterministic=False))
prediction1_class = np.cast[floatX](1. / 6) * theano.tensor.nnet.relu(np.cast[floatX](1.) - score1)**2

score2 = lasagne.nonlinearities.softmax(
    lasagne.layers.get_output(layers[1]["kp-scoremap"], deterministic=False))
prediction2_class = np.cast[floatX](1. / 6) * theano.tensor.nnet.relu(np.cast[floatX](1.) - score2)**2

score3 = lasagne.nonlinearities.softmax(
    lasagne.layers.get_output(layers[2]["kp-scoremap"], deterministic=False))
prediction3_class = np.cast[floatX](1. / 6) * theano.tensor.nnet.relu(np.cast[floatX](1.) - score3)**2

# The fourth patch is the non-keypoint one, so the hinge is applied on (score + 1).
score4 = lasagne.nonlinearities.softmax(
    lasagne.layers.get_output(layers[3]["kp-scoremap"], deterministic=False))
prediction4_class = np.cast[floatX](3. / 6) * theano.tensor.nnet.relu(score4 + np.cast[floatX](1.))**2

loss_class = prediction1_class + prediction2_class + prediction3_class + prediction4_class
loss_class = lasagne.objectives.aggregate(loss_class, mode='mean')

# Pair loss: squared Euclidean distance between the descriptors of the two
# corresponding patches.
prediction1 = lasagne.layers.get_output(layers[0]["desc-output"], deterministic=False)
prediction2 = lasagne.layers.get_output(layers[1]["desc-output"], deterministic=False)

loss_pair = theano.tensor.sum((prediction1 - prediction2)**2 + 1e-7, axis=1)
loss_pair = lasagne.objectives.aggregate(loss_pair, mode='mean')

loss = loss_class + loss_pair

# Only the keypoint-branch parameters are updated here.
params = lasagne.layers.get_all_params(layers[0]["kp-scoremap"], trainable=True)

print("Kp-output params: ", params)

updates = lasagne.updates.sgd(loss, params, np.cast[floatX](config.learning_rate))

myNet.train_ori_stochastic = theano.function(inputs=[], outputs=loss,
                                             givens=givens_train, updates=updates)

Thanks again! @kmyid

kmyi commented

I think it's better if @etrulls answers this :-)

I fail to find the file that stores information about the structure points (including their scale, position, and orientation in the corresponding image pairs).

kmyi commented

And below is the loss function for the keypoint training. Is it the same as what you describe in the paper?

You also need to include the overlap loss, at least in the pre-training phase. As for the class loss, I think it's similar to what we did. You also need a hyperparameter to balance loss_class and loss_pair, and this parameter should be data-dependent.
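
As a minimal illustration of that balancing (the default weight below is a placeholder, not a value from the paper):

def combine_losses(loss_class, loss_pair, loss_overlap=None, gamma_pair=1.0):
    # gamma_pair trades off the descriptor pair loss against the keypoint class
    # loss and has to be tuned for the data at hand; the overlap term is only
    # added during pre-training.
    loss = loss_class + gamma_pair * loss_pair
    if loss_overlap is not None:
        loss = loss + loss_overlap
    return loss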

Sorry about the delay, I wasn't receiving issue notifications. Extracting patches from the NVM and SIFT files is quite easy; this does most of the work: https://github.com/jheinly/visual_sfm_support (it's mostly self-explanatory).

You should be able to retrieve the SIFT keypoints used by the reconstruction, and from there you can extract the patches from the original images.
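
For illustration, a rough sketch of that pipeline; load_nvm_points and load_sift_keypoints are hypothetical parsers, crop_sift_support is the helper sketched earlier in this thread, and the measurement layout (image index, feature index, x, y) follows the NVM format description:

import cv2

def extract_patch_pairs(nvm_path):
    # Each NVM 3D point carries a list of measurements (image index, feature index,
    # x, y); the feature index points back into that image's .sift file, which
    # stores x, y, color, scale and orientation for every keypoint.
    images, points = load_nvm_points(nvm_path)                        # hypothetical parser
    pairs = []
    for point in points:
        patches = []
        for image_idx, feature_idx, _, _ in point.measurements:
            # Hypothetical parser returning (x, y, scale, orientation), colour dropped.
            keypoints = load_sift_keypoints(images[image_idx].sift_path)
            x, y, scale, orientation = keypoints[feature_idx]
            image = cv2.imread(images[image_idx].path, cv2.IMREAD_GRAYSCALE)
            patches.append(crop_sift_support(image, x, y, scale, orientation))
        pairs.append(patches)
    return pairs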

Thanks for your reply, I will check it out immediately :)

Hi,
Sorry to bother you guys, but I am still wondering how to extract the data to train the model described in LIFT. In the paper you say Roman Forum has 1.6k images and 51k unique points, but the dataset I downloaded has 7k images. Even after VisualSFM's 3D reconstruction there are still 1.8k images remaining, and about 400k unique 3D points across all the nvm files. Given the bad performance of my trained model, I suspect I did something wrong or different from you.

I generated the nvm files in this manner:
Start VisualSFM -> Open Multiple Images -> choose all images in the Roman Forum dataset (7k in total) -> Compute Missing Matches -> Compute 3D Reconstruction -> Save NView Matches. This gave me 22 nvm files for 22 different scenes. I parsed each nvm file and below is my parsing log; you can see that the number of points is very large... Could you tell me where I went wrong, please? Thank you so much. @kmyid @etrulls

Parsing log (all files under ../data/TrainingData/Roman_Forum/):
roman1.nvm:  1669 images, 377543 points
roman2.nvm:  38 images, 10450 points
roman3.nvm:  32 images, 3917 points
roman4.nvm:  19 images, 4145 points
roman5.nvm:  17 images, 3085 points
roman6.nvm:  13 images, 6765 points
roman7.nvm:  12 images, 4045 points
roman8.nvm:  11 images, 597 points
roman9.nvm:  8 images, 2841 points
roman10.nvm: 8 images, 1665 points
roman11.nvm: 7 images, 2446 points
roman12.nvm: 5 images, 1080 points
roman13.nvm: 5 images, 486 points
roman14.nvm: 4 images, 1518 points
roman15.nvm: 4 images, 1467 points
roman16.nvm: 4 images, 1282 points
roman17.nvm: 4 images, 122 points
roman18.nvm: 3 images, 1485 points
roman19.nvm: 3 images, 654 points
roman20.nvm: 3 images, 324 points
roman21.nvm: 3 images, 207 points
roman22.nvm: 3 images, 187 points

http://www.cs.cornell.edu/projects/1dsfm/

This is the link where I downloaded the Roman Forum dataset.
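
One possible way to cut the point count down (just a guess, not something confirmed in this thread) is to keep only the largest reconstruction and drop 3D points with short tracks. A minimal sketch, assuming each parsed point carries its measurement list as in the earlier sketch:

def filter_points(points, min_track_length=5):
    # Keep only 3D points observed in at least min_track_length images; the
    # threshold is an illustrative guess, not a value from the LIFT paper.
    return [p for p in points if len(p.measurements) >= min_track_length]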

kmyi commented

Hi Jack,

As ICCV is approaching, I think I won't have much time to answer you. I'll try to get back to you as soon as I can!

Cheers,
Kwang

Wow, I am really looking forward to seeing your new work, and I wish you great success at ICCV~ :)

kmyi commented

Hi Jack,

I believe I am a bit late now. We are working on releasing the training part as well, hopefully soon. This time it will be TensorFlow.