terminology clarification
mrgransky opened this issue · 1 comment
I wonder if you could kindly clarify the differences between the following terms:
- batchSize vs cacheBatchSize vs cacheRefreshRate
- self.nNegSample vs self.nNeg
- self.nontrivial_positives vs self.potential_positives
- self.potential_negatives and self.negCache
For Pittsburgh 30k, for instance, I can print this info for the two classes WholeDatasetFromStruct and QueryDatasetFromStruct:
```
Loading pittsburgh in train mode
>> Defining whole_train_set...
>> whole_train_set [17416]:
WholeDatasetFromStruct
dataset: pitts30k mode: train
IMGs (db: 10000 qu: 7416) onlyDB: False => |IMGs|: 17416
positives: None
Transforms (if any): Compose(
ToTensor()
Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)
>> Defining whole_training_data_loader given whole_train_set using torch.utils.data.DataLoader...
>> ok!
>> Defining train_set for queries with 0.1 margin...
>> train_set [7320]:
QueryDatasetFromStruct
Dataset: pitts30k mode: train margin: 0.1
nontrivial (+) th: 10.0 m potential (+) th: 25 m
Negs: 10 Neg samples: 1000 potential Negs (> 25 m): 7416
nontrivial pos: 7416 potential pos: 7416
IMGs (db: 10000 qu: 7416)
All queries without nontrivial positives: 7320
negative Cache: 7416
Transforms (if any): Compose(
ToTensor()
Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)
>> Defining whole_test_set...
>> whole_test_set [17608]:
WholeDatasetFromStruct
dataset: pitts30k mode: val
IMGs (db: 10000 qu: 7608) onlyDB: False => |IMGs|: 17608
positives: None
Transforms (if any): Compose(
ToTensor()
Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)
>> Evaluating on val set, query count: 7608
Done
```
I have already read #33, #26, #4, and #9!
I am trying to adapt NetVLAD to another dataset with GPS info, and I am confused about how to modify the code accordingly.
These terms are carried over from the original NetVLAD codebase and are mostly described in the paper as well (pay attention to Section 4 and Appendix A); in addition, most of them have some documentation in the code:
batchSize, cacheBatchSize, cacheRefreshRate: https://github.com/Nanne/pytorch-NetVlad/blob/master/main.py#L29-L33
nNegSample & nNeg: https://github.com/Nanne/pytorch-NetVlad/blob/master/pittsburgh.py#L179-L180
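For intuition on how the three loop parameters interact, here is a rough, runnable sketch. The numbers are just illustrative arithmetic, not the actual training loop, and the defaults are my reading of the argparse definitions linked above (verify them there):

```python
from math import ceil

# Assumed defaults from main.py (check against the argparse definitions linked above).
batchSize = 4            # triplets (query, positive, negatives) per SGD step
cacheBatchSize = 24      # plain batch size used only to pre-compute the descriptor cache
cacheRefreshRate = 1000  # number of queries processed between cache rebuilds

num_queries = 7320  # queries with nontrivial positives, from the log above
refreshes_per_epoch = ceil(num_queries / cacheRefreshRate)
steps_between_refreshes = ceil(cacheRefreshRate / batchSize)
print(f"{refreshes_per_epoch} cache rebuilds per epoch, "
      f"one every ~{steps_between_refreshes} SGD steps")
```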
nontrivial_positives are the images within self.dbStruct.nonTrivPosDistSqThr**0.5 (10 meters) of the query position, and are used for the multiple-instance-learning selection of the positive image in the triplet.
potential_positives are the images within self.dbStruct.posDistThr (25 meters) of the query. They are used for negative selection (i.e., these for sure should not be picked as negatives), and during evaluation they are labeled as correct.
potential_negatives: https://github.com/Nanne/pytorch-NetVlad/blob/master/pittsburgh.py#L198 i.e., the images that are more than 25 meters away from the query; negatives are mined from this set.
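The three sets above boil down to two radius queries on the UTM coordinates plus a set difference. A condensed, runnable sketch, with toy coordinates standing in for dbStruct.utmDb/utmQ and radii matching the thresholds quoted above:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy stand-ins for dbStruct.utmDb / dbStruct.utmQ (real values are UTM easting/northing, in meters).
rng = np.random.default_rng(0)
utm_db = rng.uniform(0, 200, size=(100, 2))  # database image positions
utm_q = rng.uniform(0, 200, size=(5, 2))     # query image positions

knn = NearestNeighbors(n_jobs=-1).fit(utm_db)

# nontrivial_positives: database images within 10 m of each query
# (in the repo the radius is dbStruct.nonTrivPosDistSqThr ** 0.5, i.e. sqrt(100) = 10).
nontrivial_positives = knn.radius_neighbors(utm_q, radius=10.0, return_distance=False)

# potential_positives: everything within 25 m (dbStruct.posDistThr); never picked as a negative,
# and counted as a correct retrieval at evaluation time.
potential_positives = knn.radius_neighbors(utm_q, radius=25.0, return_distance=False)

# potential_negatives: the complement of potential_positives, i.e. images > 25 m away.
potential_negatives = [np.setdiff1d(np.arange(len(utm_db)), pos, assume_unique=True)
                       for pos in potential_positives]

print(len(nontrivial_positives[0]), len(potential_positives[0]), len(potential_negatives[0]))
```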
negCache is the cache of the negatives, i.e., the 10 images selected as negatives the last time this query was seen (last epoch). It implements this step from the paper (p. 11):
> The mining is done by keeping the 10 hardest negatives from a pool of 1000 randomly sampled negatives and 10 hardest negatives from the previous epoch. We find that remembering previous hard negatives adds stability to the training process.
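In code, that mining step looks roughly like the simplified sketch below. It is not the repo's exact implementation: the repo additionally filters negatives by the margin-violation criterion, which is omitted here, and the function name is my own:

```python
import numpy as np

def mine_hard_negatives(q_feat, db_feats, potential_negs, neg_cache,
                        n_neg_sample=1000, n_neg=10):
    """Sketch of the mining in QueryDatasetFromStruct.__getitem__ (simplified).

    Pool = n_neg_sample randomly sampled potential negatives, plus the negatives
    cached the last time this query was seen; keep the n_neg closest to the query
    in descriptor space, and return them so they can be cached for next epoch.
    """
    sample = np.random.choice(potential_negs,
                              min(n_neg_sample, len(potential_negs)),
                              replace=False)
    pool = np.unique(np.concatenate([neg_cache, sample])).astype(int)

    # "Hardest" = smallest descriptor distance to the query.
    dists = np.linalg.norm(db_feats[pool] - q_feat, axis=1)
    hardest = pool[np.argsort(dists)[:n_neg]]
    return hardest  # becomes negCache[index] for the next epoch

# Toy usage with random descriptors (real features come from the cached model output).
rng = np.random.default_rng(0)
db_feats = rng.normal(size=(10000, 64))
q_feat = rng.normal(size=64)
neg_cache = np.array([], dtype=int)  # empty on the first epoch
neg_cache = mine_hard_negatives(q_feat, db_feats, np.arange(10000), neg_cache)
print(neg_cache)
```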
If you study this codebase and the NetVLAD paper, you should be able to figure out how it works; from there you can see how to adapt it to your own dataset.
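As a starting point for a custom GPS dataset, the main thing to replicate is a dbStruct-like object holding the image lists, planar (UTM) coordinates in meters, and the distance thresholds. A hedged sketch: the field names mirror what I see in pittsburgh.py (double-check against parse_dbStruct there), the utm package is a third-party convenience, and it assumes all your images fall in a single UTM zone:

```python
from collections import namedtuple

import numpy as np
import utm  # third-party: pip install utm

# Field names mirror the dbStruct namedtuple in pittsburgh.py.
dbStruct = namedtuple('dbStruct', ['whichSet', 'dataset', 'dbImage', 'utmDb',
                                   'qImage', 'utmQ', 'numDb', 'numQ',
                                   'posDistThr', 'posDistSqThr', 'nonTrivPosDistSqThr'])

def build_struct(db_images, db_latlon, q_images, q_latlon, which_set='train'):
    # Convert GPS (lat, lon) to UTM easting/northing so the thresholds below are in meters.
    # Caveat: this assumes every image lies in one UTM zone.
    utm_db = np.array([utm.from_latlon(lat, lon)[:2] for lat, lon in db_latlon])
    utm_q = np.array([utm.from_latlon(lat, lon)[:2] for lat, lon in q_latlon])
    # Thresholds copied from Pittsburgh: 25 m positives, 10 m (= sqrt(100)) nontrivial positives.
    return dbStruct(which_set, 'mydataset', db_images, utm_db, q_images, utm_q,
                    len(db_images), len(q_images),
                    posDistThr=25, posDistSqThr=625, nonTrivPosDistSqThr=100)
```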