Problem with the -F option: the names of the output files get messed up
thecorz opened this issue · 2 comments
thecorz commented
Hello, thank you for developing the tool; it's really useful.
When I try to demultiplex my reads into separate per-sample files, I get output files whose names are several sample names concatenated.
I run the following command on Ubuntu 20.04 with Python 3.8:
`python3 minibar.py minibar_barcode.txt FAT74402_pass_fc9452d2_1.fasta -F`
If I name my samples '1', '2', '3' and '4', I get files like the following:
- `sample1.fasta`
- `sample1_2.fasta`
- `sample1_2_3.fasta`
- `sample2`
- `sample2_4`
What I tried:
- Changing the names of the files to something very simple
- Playing with the options `-M {1,2}`, `-C`, `-CC`, `-e`, `-l` and `-n`
- Converting the input from FASTQ to FASTA
- Running the test files from the repo: with those, the names of the output files are fine
Is it a Python version issue?
Thank you very much
thecorz commented
I can now see the problem happening in both Python 3.8 and Python 2.7, so I don't think it's a matter of the Python version.
jbh-cas commented
Javier,
The Python version should be fine. The problem is in one code path: when multiple barcodes matched in a particular way, the matched IDs became part of the output file name. In the other cases the output behaved as intended, which is to append a sequence that matches multiple barcodes to the <prefix>_Multiple_Matches.fastq file.
Version 0.23, now on GitHub, should fix the code path I was able to reproduce, so that all of these sequences go to the Multiple_Matches file.
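The intended routing can be sketched roughly as follows. This is a hypothetical helper written for illustration, not minibar's actual code; the function name `output_name`, the per-sample naming `<id>.fastq` and the prefix `out` are assumptions. Only the `<prefix>_Multiple_Matches.fastq` destination comes from the reply above.

```python
from __future__ import annotations  # lets list[str] / str | None annotations run on Python 3.8


def output_name(matched_ids: list[str], prefix: str) -> str | None:
    """Choose the output file for one read (hypothetical helper, not minibar's code).

    matched_ids: IDs of the samples whose barcodes matched the read.
    - exactly one distinct sample -> that sample's own file
    - more than one sample        -> the shared <prefix>_Multiple_Matches.fastq
    - no match                    -> None (unmatched reads are left to the caller)
    The IDs are never joined into a file name; joined names such as
    sample1_2_3.fasta are the symptom reported in this issue.
    """
    unique_ids = sorted(set(matched_ids))
    if not unique_ids:
        return None
    if len(unique_ids) == 1:
        # Assumed naming: the per-sample file is simply "<id>.fastq".
        return f"{unique_ids[0]}.fastq"
    return f"{prefix}_Multiple_Matches.fastq"
```

For example, `output_name(["1"], "out")` gives `1.fastq`, while `output_name(["1", "2", "3"], "out")` gives `out_Multiple_Matches.fastq` instead of a file named after `1_2_3`.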
You can try to reduce this by using the -e <num> option to allow fewer sequencing errors in the barcode, but that comes at the cost of more unmatched sequences.
Also, to check how close the barcodes are to each other, you can run minibar.py <barcode_filename> -info all; that will give you a sense of a good -e value.
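The tradeoff between -e, multiple matches and unmatched reads can be illustrated with plain edit distance. This is a simplified sketch under assumptions: it compares whole barcodes with Levenshtein distance, whereas minibar searches for barcode and primer within the read and may count errors differently; the barcodes, sample IDs and helper names are made up, and the output of `-info all` is not reproduced here. The key point is the triangle inequality: if the two closest barcodes are `d` edits apart and `2 * e < d`, no sequence can be within `e` edits of both, so it cannot match multiple samples.

```python
from __future__ import annotations  # allows dict[str, str] / list[str] annotations on Python 3.8

from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions and deletions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, or match at cost 0
        prev = cur
    return prev[-1]


def matching_samples(observed: str, barcodes: dict[str, str], max_errors: int) -> list[str]:
    """Sample IDs whose barcode is within max_errors edits of the observed sequence."""
    return [sid for sid, bc in barcodes.items() if edit_distance(observed, bc) <= max_errors]


def min_pairwise_distance(barcodes: dict[str, str]) -> int:
    """Smallest edit distance between any two barcodes (needs at least two barcodes)."""
    return min(edit_distance(x, y) for x, y in combinations(barcodes.values(), 2))


def largest_unambiguous_e(barcodes: dict[str, str]) -> int:
    """Largest e with 2 * e < d_min, so no sequence can match two barcodes at once."""
    return (min_pairwise_distance(barcodes) - 1) // 2


# Hypothetical barcodes that differ at two positions.
barcodes = {"1": "ACGTACGT", "2": "ACGTTCGA"}
read_barcode = "ACGTACGA"  # one edit away from both barcodes

for e in (0, 1):
    print(e, matching_samples(read_barcode, barcodes, e))
print("largest unambiguous e:", largest_unambiguous_e(barcodes))
```

With these two barcodes only two edits apart, `e = 0` keeps every match unambiguous but leaves `ACGTACGA` unmatched, while `e = 1` recovers it as a match to both samples, i.e. a multiple match. The bound is conservative: a larger `e` does not guarantee ambiguity, it only removes the guarantee.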
best,
Jim Henderson
---------------------------- Original Message ----------------------------
Subject: Re: [calacademy-research/minibar] Problem with the -F option: the names of the output files get messed up (Issue #7)
From: "Javier Cuadrado Corz" ***@***.***>
Date: Wed, October 19, 2022 3:51 am
To: "calacademy-research/minibar" ***@***.***>
Cc: "Subscribed" ***@***.***>
--------------------------------------------------------------------------