sameerkhurana10/DSOL_rv0.2

Number of test examples

Closed this issue · 6 comments

Hi,

Thanks a lot for the great work. I would like to reuse the data you provide to compare various techniques on the same task. While preparing the dataset I found a small inconsistency, and it would be great if you could briefly clarify it for me.

In your paper, you mention 2001 test examples, which matches the number of examples in test_src_bio, but test_src and test_tgt each contain only 1999 examples. It seems two negative examples are missing, as only 999 are available. Not a big deal, but depending on which examples are missing, the alignment between the biological data and the provided sequences will be skewed.

Thanks a lot for your clarification.

Great, thanks a lot for the fast reaction!

By the alignment of the biological data, I mean that if a missing sequence sits somewhere in the middle of the test_src file, every following sequence will be off by one from the corresponding line in test_src_bio. Consequently, when I combine the two files, the wrong SCRATCH features are assigned to a sequence.
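To illustrate the off-by-one shift described above, here is a minimal sketch with toy data. The sequence names, feature labels, and helper functions are all made up for illustration; they are not the repository's files or code.

```python
from typing import Dict, List, Tuple


def pair_by_position(sequences: List[str], feature_rows: List[str]) -> List[Tuple[str, str]]:
    """Pair the i-th sequence with the i-th feature row, as a naive line-by-line merge would."""
    # zip stops at the shorter input, so surplus feature rows are dropped silently.
    return list(zip(sequences, feature_rows))


def count_mismatches(pairs: List[Tuple[str, str]], expected: Dict[str, str]) -> int:
    """Count pairs whose feature row is not the one that belongs to the sequence."""
    return sum(1 for seq, feat in pairs if expected[seq] != feat)


# Hypothetical stand-ins for the lines of test_src_bio and test_src.
feature_rows = ["feat_A", "feat_B", "feat_C", "feat_D", "feat_E"]
all_sequences = ["seqA", "seqB", "seqC", "seqD", "seqE"]
expected = dict(zip(all_sequences, feature_rows))

# Simulate one sequence missing from the middle of the sequence file.
sequences_missing_one = [s for s in all_sequences if s != "seqB"]

pairs = pair_by_position(sequences_missing_one, feature_rows)
mismatches = count_mismatches(pairs, expected)

print(pairs)
print("mismatched pairs:", mismatches)
```

In this toy case only the pair before the gap is correct; every sequence after the gap receives its predecessor's features, and no error is raised.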

Just to make sure: does the last column in test_src_bio correspond to the target variable?

Right. I will try to find it. It's been a year. Maybe it's just the first 1999 sequences from src_bio.

Why don't you try to run it? The code won't throw an error, because it just takes the first 1999 sequences.
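Since truncating to the first 1999 sequences raises no error, an explicit length check before merging can surface the mismatch. A minimal sketch, assuming the line counts have already been collected per file; `assert_aligned` is a hypothetical helper, not part of the repository's code:

```python
from typing import Dict


def assert_aligned(line_counts: Dict[str, int]) -> None:
    """Raise ValueError unless every file has the same number of lines."""
    if len(set(line_counts.values())) > 1:
        raise ValueError(f"Line counts differ: {line_counts}")


# Counts as reported in this issue.
counts = {"test_src": 1999, "test_tgt": 1999, "test_src_bio": 2001}

try:
    assert_aligned(counts)
    aligned = True
except ValueError as err:
    aligned = False
    print(err)
```

Equal counts are only a necessary condition: they do not prove that line i of one file belongs with line i of another.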

@raghvendra5688 pinging Raghavendra to see if he remembers.

Hi,
I have added the two missing samples and double-checked that there is no alignment problem with the biological data.

@svgsponer Which methods do you plan to run? We are writing a continuation paper comparing the latest deep learning methods on the same dataset.

Hi,

Great, thanks a lot!

@raghvendra5688 I am currently working on various methods that learn linear models in the unlimited-length k-mer feature space, based on the work done for https://github.com/svgsponer/SqLoss.
A continuation paper sounds interesting and I'm curious to see new improvements. What architectures are you planning to try out?

We are planning to use GANs and VAEs for the same problem. I will update you on the results when we have a draft ready.