Questions about Index Size and Short Mode
adlpecer commented
Hi @yfukasawa,
First of all, thank you for developing LongQC.
I am currently testing the tool to better understand its parameters and choose an optimal configuration. However, I have several questions about the index size and short mode, because my test results are unclear to me.
I used two public datasets for my tests: flnc.bam (PacBio, transcriptomic, ~4 Gb) and pb.bam (PacBio, genomic, ~12 Gb).
These are the results of my tests:
Test 1 - flnc.bam
Command (only the index size changes between runs):
longQC.py sampleqc -o /tmp/results -x pb-hifi -n 10000 -p 8 -m 2 -i 1G -t /data/input/flnc.bam
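For anyone who wants to repeat the sweep, here is a minimal sketch of how the runs could be generated, assuming the command above with only the `-i` value changed. The helper name `build_sweep` and the per-run output directory suffix are hypothetical and not part of LongQC; the script only prints the commands (a dry run) and does not execute them.

```python
import shlex

# Base command copied from the post; only the -i value is varied.
BASE = (
    "longQC.py sampleqc -o {out} -x pb-hifi -n 10000 -p 8 -m 2 "
    "-i {index} -t /data/input/flnc.bam"
)


def build_sweep(index_sizes, out_root="/tmp/results"):
    """Return one argv list per index size (hypothetical helper)."""
    commands = []
    for size in index_sizes:
        # Assumption: one output directory per run, so that the
        # results of different index sizes are kept apart.
        out_dir = f"{out_root}_{size}"
        commands.append(shlex.split(BASE.format(out=out_dir, index=size)))
    return commands


if __name__ == "__main__":
    for argv in build_sweep(["1G", "8G"]):
        # Dry run: print each command instead of executing it.
        print(shlex.join(argv))
```

To actually launch the runs, each `argv` list could be passed to `subprocess.run(argv, check=True)`, ideally wrapped with whatever tool is used to record CPU and memory use over time.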
Results
Metrics table
CPUs and Memory use over time
Index Size = 1G
Index Size = 8G
Test 2 - pb.bam
Command (this time I varied both the index size and short mode):
longQC.py sampleqc -o /tmp/results -x pb-sequel -n 10000 -p 8 -m 2 -i 1G -b /data/input/pb.bam
Results
Metrics table
CPUs and Memory use over time
Index Size = 1G
Index Size = 8G
Conclusions
- The results vary significantly across index sizes.
- However, the non-sense read fraction is quite similar between runs.
- I did not find a linear correlation between index size and execution time.
- Enabling short mode tends to reduce the non-sense read fraction slightly.
Based on these results, my questions are the following:
- Since larger index sizes are much more costly in memory and time, what are the advantages of choosing a large index size (e.g., 8G) over a smaller one (e.g., 1G)?
- In your opinion, which index size should give the most accurate results?
- In which cases is it advisable to enable short mode?
Thanks!
Adolfo