aalto-ics-kepaco/msms_rt_ssvm

Open questions regarding implementation details

  • If two MS-features, e.g. within one sequence, have the same candidate set, should the randomly sub-sampled candidate sets, e.g. those used during training, be identical for both features?
  • Is it sufficient to calculate the average (e.g. top-k) accuracy over the sequences in the sample? That is, we first calculate the average (e.g. top-k) accuracy within each sequence and subsequently average those values over the samples.
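To make the second question concrete, here is a minimal sketch of the two-stage averaging next to the pooled alternative. It is not taken from the repository: the function names and the representation of a sequence as a list of 1-based ranks of the correct candidate are assumptions made for illustration.

```python
from statistics import mean


def topk_accuracy_per_sequence(ranks, k):
    """Fraction of features in ONE sequence whose correct candidate is ranked <= k.

    `ranks` holds the (1-based) rank of the correct candidate for each feature.
    This input format is an assumption for the sketch.
    """
    return sum(r <= k for r in ranks) / len(ranks)


def averaged_topk_accuracy(sequences, k):
    """Two-stage average: top-k accuracy within each sequence, then mean over sequences."""
    return mean(topk_accuracy_per_sequence(ranks, k) for ranks in sequences)


def pooled_topk_accuracy(sequences, k):
    """Alternative: pool all features first, so longer sequences weigh more."""
    all_ranks = [r for ranks in sequences for r in ranks]
    return sum(r <= k for r in all_ranks) / len(all_ranks)


# Hypothetical ranks for two sequences of different length.
example_sequences = [[1, 3, 7, 2], [1, 12]]
print(averaged_topk_accuracy(example_sequences, k=3))
print(pooled_topk_accuracy(example_sequences, k=3))
```

The two numbers only differ when the sequences have different lengths, since the two-stage average gives every sequence the same weight regardless of how many features it contains.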

Answer to first question: I think it does not really matter. My experiments showed that we can even draw a different random candidate subset each time a specific spectrum re-appears in a training sequence. It only matters during testing, but there we do not randomly sub-sample anyway.
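For reference, a rough sketch of the two sub-sampling options discussed here. None of the names below come from the repository. Keeping the correct candidate in every subset is also an assumption of this sketch, not something stated in this issue.

```python
import random


def subsample_candidates(candidates, correct, max_size, rng):
    """Random subset of at most `max_size` candidates.

    Assumption for this sketch: the correct candidate is always kept.
    """
    others = [c for c in candidates if c != correct]
    n_other = min(max_size - 1, len(others))
    return [correct] + rng.sample(others, n_other)


def get_candidates(candidates, correct, max_size, rng, training, cache=None):
    """Return the candidate set to use for one MS-feature.

    - testing: the full candidate set, no random sub-sampling
    - training, cache=None: a fresh random subset on every call
    - training, cache=dict: identical subset for identical candidate sets
    """
    if not training:
        return list(candidates)
    if cache is None:
        return subsample_candidates(candidates, correct, max_size, rng)
    key = (frozenset(candidates), correct)
    if key not in cache:
        cache[key] = subsample_candidates(candidates, correct, max_size, rng)
    return cache[key]


rng = random.Random(0)
cands = [f"mol_{i}" for i in range(20)]  # hypothetical candidate identifiers

# Option A: independent draw each time the spectrum re-appears.
a1 = get_candidates(cands, "mol_3", 5, rng, training=True)
a2 = get_candidates(cands, "mol_3", 5, rng, training=True)

# Option B: shared subset for features with the same candidate set.
shared = {}
b1 = get_candidates(cands, "mol_3", 5, rng, training=True, cache=shared)
b2 = get_candidates(cands, "mol_3", 5, rng, training=True, cache=shared)
```

With option A the two subsets will usually differ, which according to the experiments mentioned above does not hurt training. Option B is only needed if identical subsets are explicitly wanted.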