haowenz/sigmap

Can I re-configure chunk size?

harisankarsadasivan opened this issue · 7 comments

I think the real-time chunk size is traditionally 4000 samples (about 1 second), but I'm doing some offline testing.
Say all my input reads have more than 2000 raw trimmed squiggle samples and I want to test Sigmap with only 2000 raw samples; how do I configure Sigmap accordingly? Kindly point me to the relevant code.

I am not sure I understand your question correctly. Sigmap maps the signal chunk by chunk, and once the best mapping is much better than the second-best mapping, it stops and outputs the mapping. If you want to control the number of chunks used in the mapping, given that you already have the reads at full length, you can set --stop-mapping and --stop-mapping-mean to a large number, and set --min-num-anchors together with --max-num-chunks to the same very large number. In that case the mapping may not stop early, so the specified number of chunks might be used in the mapping. Since Sigmap is not designed for this purpose, I have never tried it and am not sure it would work, but you may give it a try if you want.
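For example, a command along these lines might work (the flag names are the ones mentioned above; the other required mapping arguments are left as a placeholder, and the exact values are untested):

sigmap <usual mapping arguments> --stop-mapping 1000000 --stop-mapping-mean 1000000 --min-num-anchors 1000000 --max-num-chunks 1000000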

@haowenz I think my application is slightly different. What I want is to always test Sigmap on only 2000 raw samples, however long each read may be. I do not want to map the entire read. How can I achieve this?

Thank you so much for helping out.

Currently it is not exposed as a parameter, but you can easily change the code at the following line and recompile the program.

uint32_t chunk_size = 4000;

Just use 2000 instead of 4000 and the chunk size will be changed to 2000.
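For example (a sketch of the edit, assuming this is the only place chunk_size is set):

uint32_t chunk_size = 2000;  // was 4000; each chunk now covers 2000 raw samples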

If you want to trim the signal, you may also change the following line and recompile.

size_t signal_length =

For example, if you change this to 6000, then at most the first 6000 current readings will be used.
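A hedged sketch of that edit (the original right-hand side is elided above, so it appears only as a placeholder; the idea is to cap the usable length at 6000 samples, adding #include <algorithm> if it is not already there):

size_t signal_length = std::min<size_t>(/* original length expression */, 6000);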

Hope this answers your questions.

Awesome, thanks a lot.

@haowenz I get the following error when I set uint32_t chunk_size = 100 on line 639 of sigmap.cc, recompile, and run with the options --max-num-chunks 11 --min-num-anchors-output 2 --step-size 1. However, it works when chunk_size = 500.
I was facing the same error when I tried to change size_t signal_length to a fixed number on line 641 of sigmap.cc.
My intention is to correctly benchmark Sigmap for Read Until (with due diligence) on read prefixes of length 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, ..., 900 bases (1 base ≈ 10 samples), after possibly trimming the first 1000 samples because of adapter noise. Kindly guide me.

sigmap: src/event.h:188: sigmap::Event sigmap::CreateEvent(size_t, size_t, const float*, const float*, size_t): Assertion `start < signal_length' failed.

A chunk size of 100 (i.e., 100 current values) can be too small. At roughly 10 samples per base, that covers only about 10 bases, so there are almost no events for some types of data; even if the code were modified so that it runs, its logic might not work with so few events.

@haowenz Thanks for your response. Also, I wanted to trim the beginning of the signal, not the end, because that is where the adapters are. Could you help me with that?

Please also suggest a minimum chunk size required for the code to work.

@haowenz Thanks for your response. Also, I wanted to trim the beginning of the signal, not the end, because that is where the adapters are. Could you help me with that?

I think it would require some code changes to do that, and I don't really have the bandwidth right now. You may have a look at the code to see whether the trimming is easy to add.
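As a rough, untested sketch (the variable names here are hypothetical, not taken from the Sigmap sources): once the raw signal pointer and its usable length are known, the adapter region could be skipped before events are created, e.g.

const size_t adapter_trim = 1000;   // number of leading samples to drop
if (signal_length > adapter_trim) {
  signal += adapter_trim;           // skip the adapter region at the start
  signal_length -= adapter_trim;    // shrink the usable length accordingly
}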

Please also suggest a minimum chunk size required for the code to work.

We didn't test Sigmap in the way you described, so I have no idea of the minimum chunk size it would require to work well.