Get single marker gene from read

Question

Opened this issue 2 years ago · 0 comments

I have a FASTA file containing a marker gene, and I would like to extract that gene from raw Illumina reads.

The marker is in a FASTA file with only one sequence:

LLX10@00074518
MAIEDNPYVFRFEGRLWVSEEPRETAAAQLRAQREWDRQNARLQHWWVAISVSAVAGVAV
TLYLGTSAGLAPAIYLVLLPIGFGAGAVLGALVNKRFFAPELQHGSLPPRPELAKLTRIP
SRVARAAPDNASARDLIDWSTRGFVD

I tried to build a custom database with MATAM, but I get this error:

$ singularity exec --bind /scratch/ulg/bioec/lcornet/matam:/mnt matam.sif matam_db_preprocessing.py -i /mnt/marker.fasta -d /mnt/marker/ --cpu 1 --max_memory 10000 -v

#################################
MATAM db pre-processing
#################################

CMD: /opt/miniconda/opt/matam-1.6.0/scripts/matam_db_preprocessing.py --verbose --cpu 1 --max_memory 10000 --min_length 10 --max_consecutive_n 5 --clustering_id_threshold 0.95 --db_dir /mnt/marker --input_ref /mnt/marker.fasta

INFO - Starting ref db pre-processing
INFO - Extracting taxonomies from reference DB
INFO - Cleaning reference db
1 sequences were rejected
INFO - Starting ref db clustering
INFO - Clustering sequences @ 95 pct id
vsearch v2.15.2_linux_x86_64, 251.8GB RAM, 64 cores
https://github.com/torognes/vsearch

Reading file /mnt/marker/marker.cleaned.fasta 100%
0 nt in 0 seqs
minseqlength 32: 1 sequence discarded.
Masking 100%
Sorting by length 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 0
Singletons: 0
Traceback (most recent call last):
File "/opt/miniconda/opt/matam-1.6.0/scripts/fasta_clean_name.py", line 62, in
sequence_id = header.split()[0]
IndexError: list index out of range
INFO - Renaming output files as MATAM db files
INFO - Indexing complete ref db

WARNING: no write permissions in directory /tmpscratch: No such file or directory
will try /tmp/.

Program: SortMeRNA version 2.1b, 03/03/2016
Copyright: 2012-16 Bonsai Bioinformatics Research Group:
LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
2014-16 Knight Lab:
Department of Pediatrics, UCSD, La Jolla,
Disclaimer: SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU Lesser General Public License for more details.
Contact: Evguenia Kopylova, jenya.kopylov@gmail.com
Laurent Noé, laurent.noe@lifl.fr
Hélène Touzet, helene.touzet@lifl.fr

Parameters summary:
K-mer size: 19
K-mer interval: 1
Maximum positions to store per unique K-mer: 10000

Total number of databases to index: 1

Begin indexing file /mnt/marker/marker_NR95.complete.fasta under index name /mnt/marker/marker_NR95.complete:

ERROR: at least one of your reads is shorter than the seed length 19, please filter out all reads shorter than 19 to continue index construction.

Collecting sequence distribution statistics ..
INFO - Indexing clustered ref db
The input file is empty, an index was not built.

Output MATAM db: /mnt/marker/marker_NR95

matam_db_preprocessing.py terminated with some errors. Check the log for additional infos
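
The log reports that the only sequence was rejected during cleaning, that vsearch then discarded a record under `minseqlength 32`, and that SortMeRNA refused to index anything shorter than its seed length of 19. The marker pasted above contains letters such as E, F, I, L, P and Q, which are not IUPAC nucleotide codes, so one possible reading is that the input is a protein sequence where a nucleotide reference is expected. This is only a guess from the log, not a confirmed diagnosis. A minimal Python sketch for checking an input FASTA before preprocessing follows. It is not part of MATAM; the names `parse_fasta` and `check_record` are hypothetical, and it assumes header lines start with `>`.

```python
from typing import Dict, Iterable, Iterator, Tuple

# IUPAC nucleotide codes (including ambiguity codes and U for RNA).
NUCLEOTIDE_CODES = set("ACGTUNRYSWKMBDHV")


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; lines before the first '>' are ignored."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def check_record(seq: str, min_length: int = 32) -> Dict[str, object]:
    """Summarise one sequence; min_length=32 mirrors the vsearch message in the log."""
    seq = seq.upper()
    non_nucleotide = sorted(set(seq) - NUCLEOTIDE_CODES)
    return {
        "length": len(seq),
        "long_enough": len(seq) >= min_length,
        "non_nucleotide": non_nucleotide,
        "looks_nucleotide": bool(seq) and not non_nucleotide,
    }
```

To use it, open the FASTA file, pass the file object to `parse_fasta`, and call `check_record` on each sequence. A record with `looks_nucleotide` set to `False`, or with no header detected at all, would be worth fixing before running the preprocessing again.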