marbl/MashMap

Problems with Illumina files

Closed this issue · 2 comments

Hello,

I want to use your code with Illumina sequences encoded in Sanger / Illumina 1.9. I have two files, R1 and R2, with a read length of 77, and I use human DNA v37 (human_g1k_v37_decoy.fasta) as the reference. I tried to run the code with the command
./mashmap -s ../ref/human_g1k_v37_decoy.fasta --ql input.txt -m 75 -o illu.txt
The input.txt lists my two Illumina files (sub_270654-003_4_S48_L004_R1_001.fastq.gz and sub_270654-003_4_S48_L004_R2_001.fastq.gz).

But the program consumes a lot of memory (45 GB) and terminates with "Killed: 9".
The console output is this:
./mashmap -s ../ref/human_g1k_v37_decoy.fasta --ql input.txt -m 75 -o illu.txt

Reference = [../ref/human_g1k_v37_decoy.fasta]
Query = [../data/illumina/sub_270654-003_4_S48_L004_R1_001.fastq.gz, ../data/illumina/sub_270654-003_4_S48_L004_R2_001.fastq.gz]
Kmer size = 16
Window size = 2
Read length >= 75
Alphabet = DNA
P-value = 0.001
Percentage identity threshold = 85
Mapping output file = illu.txt

INFO, skch::Sketch::build, minimizers picked from reference = 1933153479
Killed: 9

Am I doing something wrong?
Thanks.

MashMap is not designed for short reads (77 bp, as you mentioned). I suggest using bwa mem or bowtie for this purpose. They should be faster and more memory-efficient than MashMap in this case.
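
To give a rough sense of why the index gets so large with short reads, here is a back-of-the-envelope sketch in Python. It assumes the textbook expectation for random minimizer sampling, where a window of w k-mers keeps about 2/(w+1) of all k-mer positions; whether MashMap's index follows this exactly is an assumption, not something confirmed in this thread. The minimizer count and window size are copied from the console output above, and the window of 100 is purely hypothetical, chosen only to illustrate a setting for much longer reads.

```python
# Rough arithmetic only. The 2/(w+1) density is the standard expectation for
# random minimizer sampling; applying it to MashMap's index is an assumption.

def minimizer_density(window_size: int) -> float:
    """Expected fraction of k-mer positions kept as minimizers (window of w k-mers)."""
    if window_size < 1:
        raise ValueError("window_size must be >= 1")
    return 2.0 / (window_size + 1)

# Values copied from the console output in the original post.
observed_minimizers = 1_933_153_479
window_size = 2

density = minimizer_density(window_size)

# Number of reference k-mer positions implied by the observed count under this model.
implied_positions = observed_minimizers / density

# Hypothetical larger window (not taken from MashMap), for comparison only.
hypothetical_window = 100
hypothetical_minimizers = implied_positions * minimizer_density(hypothetical_window)

print(f"density at w={window_size}: {density:.3f}")
print(f"implied reference positions: {implied_positions:.3e}")
print(f"minimizers at w={hypothetical_window} (hypothetical): {hypothetical_minimizers:.3e}")
```

Under these assumptions, a window size of 2 keeps about two out of every three k-mer positions, so the 1.9 billion minimizers correspond to roughly 2.9 billion reference positions. The hypothetical window of 100 would keep only about 57 million, which is the kind of gap that makes a short minimum read length so expensive in memory.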

I see.
Thanks!