ParkinsonLab/MetaPro

config file not being read

Closed this issue · 9 comments

ramay commented

Hi,
I have modified the provided config.ini file to point to the location of the downloaded databases, but I don't think it is being used. The run also stops when it cannot find this file:

file name: nr
/project/j/jparkin/Lab_Databases/nr/nr
file does not exists

I have only changed database_path in the config.ini and it is pasted below.
Please let me know if I am doing something wrong.
Thanks!
Hena

Here is the command I used:

singularity exec -C metapro_latest.sif  python3 /pipeline/MetaPro.py -c config.ini -1 con_1.fastq.gz -2 con_2.fastq.gz -o test
INFO:    underlay of /etc/localtime required more than 50 (110) bind mounts
METAPRO metatranscriptomic analysis pipeline
output folder does not exist.  Now building directory.
=====================================
no-host: False
verbose_mode: quiet
CHECKING CONFIG
USING CONFIG config.ini
Settings no section found, using default: genus
Settings no section found, using default: 30
Settings no section found, using default: No
Settings no section found, using default: bypass_log.txt
Settings no section found, using default: none
Settings no section found, using default: 32
Settings no section found, using default: 0.1
Settings no section found, using default: chocophlan
Settings no section found, using default: 0.01
Settings no section found, using default: 90
Settings no section found, using default: 85
Settings no section found, using default: 0.65
Settings no section found, using default: 60
Settings no section found, using default: 85
Settings no section found, using default: 0.65
Settings no section found, using default: 60
Settings no section found, using default: 5
Settings no section found, using default: 5
Settings no section found, using default: 10
Settings no section found, using default: 50
Settings no section found, using default: 10
Settings no section found, using default: 50
Settings no section found, using default: 50
Settings no section found, using default: 50
Settings no section found, using default: 50
Settings no section found, using default: 50
Settings no section found, using default: 50
Settings no section found, using default: 50
Settings no section found, using default: 5
Settings no section found, using default: 50
Settings no section found, using default: 50
Settings no section found, using default: 5
Settings no section found, using default: 24
Settings no section found, using default: 24
Settings no section found, using default: 24
Settings no section found, using default: 24
Settings no section found, using default: 1000
Settings no section found, using default: 1000
Settings no section found, using default: 24
Settings no section found, using default: 24
Settings no section found, using default: 24
Settings no section found, using default: 24
Settings no section found, using default: 24
Settings no section found, using default: 1
Settings no section found, using default: 24
Settings no section found, using default: 5
Settings no section found, using default: 5
Settings no section found, using default: 5
Settings no section found, using default: 5
Settings no section found, using default: 5
Settings no section found, using default: 5
Settings no section found, using default: 5
Settings no section found, using default: 5
Settings no section found, using default: 5
Settings no section found, using default: 5
Settings no section found, using default: 5
Settings no section found, using default: 10
Settings no section found, using default: 1
Settings no section found, using default: yes
Settings no section found, using default: no
Settings no section found, using default: no
Settings no section found, using default: no
Settings no section found, using default: no
Settings no section found, using default: no
Settings no section found, using default: no
Settings no section found, using default: no
Settings no section found, using default: no
Settings no section found, using default: no
Settings no section found, using default: no
Settings no section found, using default: no
Settings no section found, using default: no
Settings no section found, using default: no
Settings no section found, using default: high
Settings no section found, using default: 25000
Settings no section found, using default: 50000
Settings no section found, using default: 50000
Labels no section found, using default: quality_filter
Labels no section found, using default: host_filter
Labels no section found, using default: vector_filter
Labels no section found, using default: rRNA_filter
Labels no section found, using default: rRNA_filter_split
Labels no section found, using default: rRNA_filter_convert
Labels no section found, using default: rRNA_filter_barrnap
Labels no section found, using default: rRNA_filter_barrnap_merge
Labels no section found, using default: rRNA_filter_barrnap_pp
Labels no section found, using default: rRNA_filter_infernal
Labels no section found, using default: rRNA_filter_infernal_prep
Labels no section found, using default: rRNA_filter_splitter
Labels no section found, using default: rRNA_filter_post
Labels no section found, using default: duplicate_repopulation
Labels no section found, using default: assemble_contigs
Labels no section found, using default: destroy_contigs
Labels no section found, using default: GA_pre_scan
Labels no section found, using default: GA_split
Labels no section found, using default: GA_BWA
Labels no section found, using default: GA_BWA_pp
Labels no section found, using default: GA_BWA_merge
Labels no section found, using default: GA_BLAT
Labels no section found, using default: GA_BLAT_cleanup
Labels no section found, using default: GA_BLAT_cat
Labels no section found, using default: GA_BLAT_pp
Labels no section found, using default: GA_BLAT_merge
Labels no section found, using default: GA_DMD
Labels no section found, using default: GA_DMD_pp
Labels no section found, using default: GA_final_merge
Labels no section found, using default: taxonomic_annotation
Labels no section found, using default: enzyme_annotation
Labels no section found, using default: enzyme_annotation_detect
Labels no section found, using default: enzyme_annotation_priam
Labels no section found, using default: enzyme_annotation_priam_split
Labels no section found, using default: enzyme_annotation_priam_cat
Labels no section found, using default: enzyme_annotation_DMD
Labels no section found, using default: enzyme_annotation_pp
Labels no section found, using default: outputs
Labels no section found, using default: output_copy_gene_map
Labels no section found, using default: output_clean_ec
Labels no section found, using default: output_copy_taxa
Labels no section found, using default: output_network_generation
Labels no section found, using default: output_unique_hosts_singletons
Labels no section found, using default: output_unique_hosts_pair_1
Labels no section found, using default: output_unique_hosts_pair_2
Labels no section found, using default: output_unique_vectors_singletons
Labels no section found, using default: output_unique_vectors_pair_1
Labels no section found, using default: output_unique_vectors_pair_2
Labels no section found, using default: output_combine_hosts
Labels no section found, using default: output_per_read_scores
Labels no section found, using default: output_contig_stats
Labels no section found, using default: output_ec_heatmap
Labels no section found, using default: output_taxa_groupby
Labels no section found, using default: output_read_count
Databases no section found, using default: /project/j/jparkin/Lab_Databases/univec_core/UniVec_Core.fasta
Databases no section found, using default: /project/j/jparkin/Lab_Databases/Trimmomatic_adapters/TruSeq3-PE-2.fa
Databases no section found, using default: /project/j/jparkin/Lab_Databases/Mouse_cds/Mouse_cds.fasta
Databases no section found, using default: /project/j/jparkin/Lab_Databases/Rfam/Rfam.cm
Databases no section found, using default: /project/j/jparkin/Lab_Databases/ChocoPhlAn/ChocoPhlAn.fasta
Databases no section found, using default: /project/j/jparkin/Lab_Databases/family_llbs
Databases no section found, using default: /project/j/jparkin/Lab_Databases/nr/nr
Databases no section found, using default: /project/j/jparkin/Lab_Databases/nr/nr
Databases no section found, using default: /project/j/jparkin/Lab_Databases/accession2taxid/accession2taxid
Databases no section found, using default: /project/j/jparkin/Lab_Databases/WEVOTE_db/nodes.dmp
Databases no section found, using default: /project/j/jparkin/Lab_Databases/WEVOTE_db/names.dmp
Databases no section found, using default: /project/j/jparkin/Lab_Databases/kaiju_db/kaiju_db_nr.fmi
Databases no section found, using default: /project/j/jparkin/Lab_Databases/centrifuge_db/nt
Databases no section found, using default: /project/j/jparkin/Lab_Databases/swiss_prot_db/swiss_prot_db
Databases no section found, using default: /project/j/jparkin/Lab_Databases/swiss_prot_db/SwissProt_EC_Mapping.tsv
Databases no section found, using default: /project/j/jparkin/Lab_Databases/PRIAM_db/
Databases no section found, using default: /project/j/jparkin/Lab_Databases/DETECTv2
Databases no section found, using default: /project/j/jparkin/Lab_Databases/WEVOTE_db/
Databases no section found, using default: /project/j/jparkin/Lab_Databases/EC_pathway.txt
Databases no section found, using default: /pipeline/custom_databases/pathway_to_superpathway.csv
Databases no section found, using default: /pipeline_tools/mgm/MetaGeneMark_v1.mod
Databases no section found, using default: /pipeline/custom_databases/FREQ_EC_pairs_3_mai_2020.txt
Databases no section found, using default: /pipeline/custom_databases/taxid_trees/family_tree.tsv
Databases no section found, using default: /pipeline/custom_databases/kraken2_db
dir name: /project/j/jparkin/Lab_Databases/nr
file name: nr
/project/j/jparkin/Lab_Databases/nr/nr
file does not exists

config file:

[Databases]
database_path: /bulk/IMCbinf_bulk/hramay/projects/Pfeffer/python_lib
UniVec_Core: %(database_path)s/univec_core/UniVec_Core.fasta
Adapter: %(database_path)s/Trimmomatic_adapters/TruSeq3-PE-2.fa
Host: %(database_path)s/human_genome/GRCh37_38_human_genome.fasta
Rfam: %(database_path)s/Rfam/Rfam.cm
source_taxa_db: /home/billy/storage/choco_h3/order_group
DNA_DB: $(database_path)s/choco_mpro_h3/order_group
Prot_DB: %(database_path)s/nr/nr
Prot_DB_reads: %(database_path)s/nr/nr
accession2taxid: %(database_path)s/accession2taxid/accession2taxid
nodes: %(database_path)s/WEVOTE_db/nodes_wevote.dmp
names: %(database_path)s/WEVOTE_db/names_wevote.dmp
Kaiju_db: %(database_path)s/kaiju_db/kaiju_db_nr.fmi
Centrifuge_db: %(database_path)s/centrifuge_db/nt
SWISS_PROT: %(database_path)s/swiss_prot_db/swiss_prot_db
SWISS_PROT_map: %(database_path)s/swiss_prot_db/SwissProt_EC_Mapping.tsv
PriamDB: %(database_path)s/PRIAM_db/
DetectDB: %(database_path)s/DETECTv2
WEVOTEDB: %(database_path)s/WEVOTE_db/
EC_pathway: %(database_path)s/EC_pathway/EC_pathway.txt
path_to_superpath: %(database_path)s/path_to_superpath/pathway_to_superpathway.csv
MetaGeneMark_model: /pipeline_tools/mgm/MetaGeneMark_v1.mod
taxid_tree: $(database_path)s/order_tree.tsv
kraken2_db: $(database_path)s/kraken2/db

[code]
ga_pre_scan_get_lib = /home/billy/storage/human_flu/30785/ga_pre_scan_get_libs.py
ga_pre_scan_assemble_lib = /home/billy/storage/human_flu/30785/assemble_libs.py
ta_combine = /home/billy/storage/human_flu/30785/ta_combine_v3.py

[Tools]
Python = python3
Java = java -jar
cdhit_dup = /pipeline_tools/cdhit_dup/cd-hit-dup
Timmomatic = /pipeline_tools/Trimmomatic/trimmomatic-0.36.jar
AdapterRemoval = /pipeline_tools/adapterremoval/AdapterRemoval
vsearch = /pipeline_tools/vsearch/vsearch
Flash = /pipeline_tools/FLASH/flash
BWA = /pipeline_tools/BWA/bwa
SAMTOOLS = /pipeline_tools/samtools/samtools
BLAT = /pipeline_tools/PBLAT/pblat
DIAMOND = /pipeline_tools/DIAMOND/diamond
Blastp = /pipeline_tools/BLAST_p/blastp
Barrnap = /pipeline_tools/Barrnap/bin/barrnap
#note: needle is quite slow.  This argument can be swapped for stretcher, but stretcher is not as accurate
Needle = /pipeline_tools/EMBOSS-6.6.0/emboss/needle
Blastdbcmd = /pipeline_tools/BLAST_p/blastdbcmd
Makeblastdb = /pipeline_tools/BLAST_p/makeblastdb
Infernal = /pipeline_tools/infernal/cmscan
Kaiju = /pipeline_tools/kaiju/kaiju
kraken2 = /pipeline_tools/kraken2-2.1.2/kraken2
Centrifuge = /pipeline_tools/centrifuge/centrifuge
Priam = /pipeline_tools/PRIAM_search/PRIAM_search.jar
Detect = /pipeline/Scripts/Detect_2.2.9.py
BLAST_dir = /pipeline_tools/BLAST_p
WEVOTE = /pipeline_tools/WEVOTE/WEVOTE
Spades = /pipeline_tools/SPAdes/bin/spades.py
MetaGeneMark = /pipeline_tools/mgm/gmhmmp

#default is 30, to coincide with cdhit_dup's default.
[Settings]
AdapterRemoval_minlength = 30
bypass_log_name = order_bypass_log_1.txt
debug_stop_flag = order_ta_1
taxa_existence_cutoff = 1
DNA_DB_mode = chocophlan

Show_unclassified = No
RPKM_cutoff = 0.01
BWA_cigar_cutoff = 90
BLAT_identity_cutoff = 85
BLAT_length_cutoff = 0.65
BLAT_score_cutoff = 60
DIAMOND_identity_cutoff = 85
DIAMOND_length_cutoff = 0.65
DIAMOND_score_cutoff = 60

#determines how much memory should be free for use with each tool.  [1-99]
BWA_mem_threshold = 75
BLAT_mem_threshold = 75
DIAMOND_mem_threshold = 80
BWA_pp_mem_threshold = 30
BLAT_pp_mem_threshold = 75
DIAMOND_pp_mem_threshold = 80
DETECT_mem_threshold = 80
Infernal_mem_threshold = 75
Barrnap_mem_threshold = 75

#maximum allowable concurrent instances of the tool [1-number of cores on your machine]
BWA_job_limit = 40
BLAT_job_limit = 40
DIAMOND_job_limit = 32
BWA_pp_job_limit = 40
BLAT_pp_job_limit = 40
DETECT_job_limit = 40
Infernal_job_limit = 40
Barrnap_job_limit = 40

#determines how long (seconds) each job should wait before they are launched (imposes delay.  for memory-use-detection)
BWA_job_delay = 0.5
BLAT_job_delay = 5
DIAMOND_job_delay = 5
BWA_pp_job_delay = 0.01
BLAT_pp_job_delay = 0.05
DIAMOND_pp_job_delay = 5

#toggle these options to delete the interim data the steps will generate
keep_all = yes
keep_quality = no
keep_host = no
keep_vector = no
keep_rRNA = no
keep_repop = no
keep_assemble_contigs = yes
keep_GA_BWA = no
keep_GA_BLAT = no
keep_GA_DIAMOND = no
keep_GA_final = no
keep_TA = no
keep_EC = no
keep_outputs = no

#Data gets chunked in the pipeline.  Control the chunk size to reduece the number of files generated.  Increase for speed, and reduced memory load per tool-run [1-99999]
rRNA_chunk_size = 50000
GA_chunk_size = 10000
EC_chunk_size = 1000

#Decides the lossiness of host + vector filtration.  For paired-end annotation only.  [high/low] high: only if both pairs pass through the filter.  low: if either pair passes through filter, both pairs pass
filter_stringency = high


TA_mem_threshold = 80
TA_job_delay = 10


[Labels]
GA_pre_scan = order_1_GA_pre_scan
GA_BWA = order_test_GA_BWA_1
GA_BLAT = order_test_GA_BLAT_1
host_filter = host_read_filter
vector_filter = vector_read_filter
ta = order_ta_1
#GA_DIAMOND = order_test_GA_dmd_1

ramay commented

Hi guys, wondering if anyone can answer this question. I really want to try MetaPro on my samples but cannot get it to work at the moment.
Thanks!
Hena

ramay commented

Hi all, any input on this?
Thanks!
Hena

In your launch command, you only specify -c config.ini.

Where is config.ini?
Which directory is your console in when you launch MetaPro?
This is a hunch, but try giving it an absolute path instead, to be sure that the command is actually getting the correct config.ini.

ramay commented

I used -c config.ini because it is in the folder where I am launching metapro from.

But I changed it as you asked and there is no difference. The config.ini specifies a different location for nr, but the pipeline is still not using the paths provided in this config.ini file.
Any other suggestions?
Thanks!
Hena


singularity exec -C metapro_v2.1.1.sif  python3 /pipeline/MetaPro.py -c /bulk/IMCbinf_bulk/hramay/projects/Pfeffer/config.ini -1 Neg_con_1.fastq.gz -2 Neg_con_2.fastq.gz -o test
INFO:    underlay of /etc/localtime required more than 50 (109) bind mounts
METAPRO metatranscriptomic analysis pipeline
output folder does not exist.  Now building directory.
=====================================
no-host: False
verbose_mode: quiet
CHECKING CONFIG
USING CONFIG
Databases no section found, using default: /project/j/jparkin/Lab_Databases/univec_core/UniVec_Core.fasta
Databases no section found, using default: /project/j/jparkin/Lab_Databases/Trimmomatic_adapters/TruSeq3-PE-2.fa
Databases no section found, using default: /project/j/jparkin/Lab_Databases/Mouse_cds/Mouse_cds.fasta
Databases no section found, using default: /project/j/jparkin/Lab_Databases/Rfam/Rfam.cm
Databases no section found, using default: /project/j/jparkin/Lab_Databases/ChocoPhlAn/ChocoPhlAn.fasta
Databases no section found, using default: /project/j/jparkin/Lab_Databases/nr/nr
Databases no section found, using default: /project/j/jparkin/Lab_Databases/nr/nr
Databases no section found, using default: /project/j/jparkin/Lab_Databases/accession2taxid/accession2taxid
Databases no section found, using default: /project/j/jparkin/Lab_Databases/WEVOTE_db/nodes.dmp
Databases no section found, using default: /project/j/jparkin/Lab_Databases/WEVOTE_db/names.dmp
Databases no section found, using default: /project/j/jparkin/Lab_Databases/kaiju_db/kaiju_db_nr.fmi
Databases no section found, using default: /project/j/jparkin/Lab_Databases/centrifuge_db/nt
Databases no section found, using default: /project/j/jparkin/Lab_Databases/swiss_prot_db/swiss_prot_db
Databases no section found, using default: /project/j/jparkin/Lab_Databases/swiss_prot_db/SwissProt_EC_Mapping.tsv
Databases no section found, using default: /project/j/jparkin/Lab_Databases/PRIAM_db/
Databases no section found, using default: /project/j/jparkin/Lab_Databases/DETECTv2
Databases no section found, using default: /project/j/jparkin/Lab_Databases/WEVOTE_db/
Databases no section found, using default: /project/j/jparkin/Lab_Databases/EC_pathway.txt
Databases no section found, using default: /pipeline/custom_databases/pathway_to_superpathway.csv
Databases no section found, using default: /pipeline_tools/mgm/MetaGeneMark_v1.mod
Databases no section found, using default: /pipeline/custom_databases/FREQ_EC_pairs_3_mai_2020.txt
dir name: /project/j/jparkin/Lab_Databases/nr
file name: nr
/project/j/jparkin/Lab_Databases/nr/nr
file does not exists

That singularity command looks suspicious.
It may be missing the bind mounts. In that situation, even though your files are where you say they are, Singularity can't see them.

Bind mounts tell Singularity where to look. Singularity runs the pipeline in an isolated container (think of it as a mini VM), so it needs host drives and folders mounted into it. A bind mount maps a path on your filesystem into the Singularity instance.
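
To illustrate why a config file that the container cannot see would produce a wall of "no section found, using default" messages rather than a hard error: the sketch below assumes MetaPro loads the file with Python's standard `configparser` (the `%(database_path)s` syntax suggests this, but nothing in this thread confirms it). `load_config` is a hypothetical helper for illustration, not part of MetaPro.

```python
import configparser


def load_config(path):
    """Read an INI file and fail loudly if it cannot be opened."""
    parser = configparser.ConfigParser()
    # ConfigParser.read() skips files it cannot open and returns the
    # list of files it actually parsed, so an empty list means "not read".
    parsed = parser.read(path)
    if not parsed:
        raise FileNotFoundError(f"config file not readable: {path}")
    return parser


# A path that is not visible (for example, one that is not bind-mounted
# into the container) yields a parser with no sections at all.
silent = configparser.ConfigParser()
print(silent.read("/no/such/dir/config.ini"))  # []
print(silent.sections())                       # []
```

With a check like this, an unreadable config stops the run immediately instead of falling back to every default.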

ramay commented

Hi Billy,
Thanks for the tip. I fixed that problem, but now here is a new one :). It seems I don't have the correct file that BWA needs. The files in the nr folder are from July 2021.

Command I used was:
singularity exec -B /bulk:/bulk -C metapro_v2.1.1.sif python3 /pipeline/MetaPro.py -c /bulk/IMCbinf_bulk/hramay/projects/Pfeffer/config.ini -1 Neg_con_1.fastq.gz -2 Neg_con_2.fastq.gz -o test

Output:

INFO:    underlay of /etc/localtime required more than 50 (109) bind mounts
METAPRO metatranscriptomic analysis pipeline
output folder does not exist.  Now building directory.
=====================================
no-host: False
verbose_mode: quiet
CHECKING CONFIG
USING CONFIG
enzyme_db no inner section found. using default /pipeline/custom_databases/FREQ_EC_pairs_3_mai_2020.txt
dir name: /bulk/IMCbinf_bulk/hramay/projects/Pfeffer/python_lib/nr
file name: nr
/bulk/IMCbinf_bulk/hramay/projects/Pfeffer/python_lib/nr/nr
2023-09-20 10:08:22.111588 /bulk/IMCbinf_bulk/hramay/projects/Pfeffer/python_lib/nr/nr exists
2023-09-20 10:08:22.112136 DMD index is ok
Error: no fasta file found. BWA only accepts .fasta extensions
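
When an error like this appears, it can help to list which configured database paths actually resolve to something on disk. The following is a minimal sketch, assuming the config is a standard `configparser` INI file; `find_missing_paths` is a hypothetical helper and does not reproduce MetaPro's own checks (including its `.fasta` extension test).

```python
import configparser
import glob
import os


def find_missing_paths(config_path, section="Databases"):
    """Return {key: value} for entries with no matching file, directory, or prefix."""
    parser = configparser.ConfigParser()
    if not parser.read(config_path):
        raise FileNotFoundError(f"config file not readable: {config_path}")
    missing = {}
    for key, value in parser.items(section):
        if os.path.exists(value):
            continue
        # Some entries may be index prefixes rather than files, so also
        # accept anything on disk whose name starts with the configured value.
        if not glob.glob(glob.escape(value) + "*"):
            missing[key] = value
    return missing
```

Calling `find_missing_paths("/bulk/IMCbinf_bulk/hramay/projects/Pfeffer/config.ini")` from inside the container, with the same bind mounts as the pipeline run, would report the entries that the pipeline cannot see either. Note that `configparser` lowercases keys, so `DNA_DB` is reported as `dna_db`.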

Apologies. A refactor is currently under way to address these configuration inconveniences.

Hi @ramay!

I was running into a similar issue with this and found that the resolution for this specific issue (at least on my end) was to change all of the colons in the [Databases] section to equal signs. So the first few lines read as:

[Databases]
database_path = /my/database/location
UniVec_Core = %(database_path)s/univec_core/UniVec_Core.fasta
Adapter = %(database_path)s/Trimmomatic_adapters/TruSeq3-PE-2.fa

I also needed to fix source_taxa_db (changing it to %(database_path)s like above) and remove the [Code] section that points to additional files I don't need.

Hope that's helpful!
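
For anyone checking their own config against this suggestion, here is a small sketch of how Python's standard `configparser` treats the syntax involved. It assumes stock `configparser` behaviour with default settings; MetaPro's own parsing is not shown in this thread and may differ, which could be why the delimiter change mattered here. The sample values are taken from the snippets in this thread.

```python
import configparser

SAMPLE = """
[Databases]
database_path = /my/database/location
UniVec_Core = %(database_path)s/univec_core/UniVec_Core.fasta
Rfam: %(database_path)s/Rfam/Rfam.cm
taxid_tree: $(database_path)s/order_tree.tsv
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)
db = parser["Databases"]

# Both "=" and ":" are accepted as delimiters by default.
print(db["UniVec_Core"])  # /my/database/location/univec_core/UniVec_Core.fasta
print(db["Rfam"])         # /my/database/location/Rfam/Rfam.cm

# Only the %(name)s form is interpolated; $(name)s is left as literal text.
print(db["taxid_tree"])   # $(database_path)s/order_tree.tsv
```

If the same holds inside MetaPro, the `DNA_DB`, `taxid_tree` and `kraken2_db` entries in the config posted above, which use `$(database_path)s`, would not resolve to real paths and may be worth changing to `%(database_path)s`.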

I'll patch this in the config of the next release. Thanks!