bfosso/MetaShot

MetaShot fails when run inside a Python virtualenv

Closed this issue · 30 comments

Hi,

psutil functions crash when they are called from a Python virtualenv:

source ~/metashot_env/bin/activate
python2.7 ./MetaShot_Master_script.py -p ./parameters_file.txt -m ./filelist.txt

['/home/RAW_DATA/63_R1.fastq.gz', '/home/RAW_DATA/63_R2.fastq.gz']
/home/data/MetaShot
/home/data/MetaShot/MetaShot_reference_data
/home/data/MetaShot/MetaShot_reference_data/Bacteria/bacterial_bowtie_index_reference.lst
correctly formatted read_list file
fastq control
QC.1.trimmed.fastq
QC.2.trimmed.fastq
START 22/02/2019 10:09:46
Traceback (most recent call last):
File "/home/data/MetaShot/find_microbiome.py", line 217, in <module>
process_status = pid_status( proc_id )
File "/home/data/MetaShot/find_microbiome.py", line 123, in pid_status
status = psutil.Process( process_pid ).status( )
File "/home/metashot_env/lib/python2.7/site-packages/psutil/__init__.py", line 438, in __init__
self._init(pid)
File "/home/metashot_env/lib/python2.7/site-packages/psutil/__init__.py", line 478, in _init
raise NoSuchProcess(pid, None, msg)
psutil.NoSuchProcess: psutil.NoSuchProcess no process found with pid 6293
Feb 22 10:25:05 ..... started STAR run
Feb 22 10:25:05 ..... loading genome
correctly formatted read_list file
Mapping on the genome
Feb 22 10:25:05 ..... started STAR run
Feb 22 10:25:05 ..... loading genome
Feb 22 10:25:05 ..... started mapping
Feb 22 10:30:26 ..... finished successfully
sam parsing
HOST MAPPING IS DONE
1 1
Feb 22 10:30:27 ..... started STAR run
Feb 22 10:30:27 ..... loading genome
the indicated /usr/local/data/prov/MetaShot/trimmed_data_1/R1_micro_candidates.fastq fastq file doesn't exist
This script works in two different steps:
(1) it maps the sequencing data against the Bacterial division;
(2) it extract the mapped pairs and tries to map them on the human reference;
Options:
-i paired-end file list: a line containing the R1, the R2 files [MANDATORY]
-p Number of threads/processors available [MANDATORY]
-r reference path [MANDATORY]
-s script path path [MANDATORY]
-h print this help.
Usage:
python2.7 division_analyser -i read_list

/home/data/MetaShot/MetaShot_reference_data/Virus/bowtie_index/Viral_division

Could you please describe how your virtualenv was created?
The error seems to be associated with the process ID rather than with loading the module.
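For context, psutil raises NoSuchProcess when the PID it is asked about no longer exists, which typically happens when the child process has already exited by the time its status is queried. The snippet below is only a minimal stdlib sketch of that situation, not MetaShot's code: `pid_is_alive` and `run_and_wait` are hypothetical helper names, the check is POSIX-only, and the child command is a stand-in for a pipeline step.

```python
import errno
import os
import subprocess
import sys


def pid_is_alive(pid):
    """Return True if a process with this PID exists right now (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence; nothing is delivered
    except OSError as err:
        if err.errno == errno.ESRCH:  # no such process
            return False
        if err.errno == errno.EPERM:  # process exists but belongs to another user
            return True
        raise
    return True


def run_and_wait(command):
    """Start a child process and wait on its own handle instead of looking up its PID."""
    child = subprocess.Popen(command)
    exit_code = child.wait()  # safe even if the child has already finished
    return child.pid, exit_code


if __name__ == "__main__":
    # Hypothetical stand-in for a pipeline step: a child that exits immediately.
    pid, code = run_and_wait([sys.executable, "-c", "pass"])
    print("child exit code: {}".format(code))
    print("child still alive: {}".format(pid_is_alive(pid)))
```

If the failing step exits early for another reason (for example a missing input file), a status lookup by PID would fail in the same way, so the exit code of the child is usually the more informative thing to inspect.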

The problem is that the MetaShot pipeline uses Python 2.7,
which will no longer be maintained in one year. Today most machines use Python 3.
Why not update the code for Python 3?

python2.7 -m virtualenv ~/metashot_env
python2.7 -m pip install numpy
python2.7 -m pip install pysam
python2.7 -m pip install psutil
python2.7 -m pip install biopython

source ~/metashot_env/bin/activate
python2.7 ./MetaShot_Master_script.py -p ./parameters_file.txt -m ./filelist.txt
deactivate
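To confirm which interpreter and which packages are actually picked up, a small check can be run with the virtualenv activated. This is just a sketch: `check_modules` is a hypothetical helper, and the only assumption about the packages is their import names (biopython is imported as `Bio`).

```python
import importlib
import sys


def check_modules(names):
    """Map each module name to its version string, or None if it cannot be imported."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            # Not every module exposes __version__, so fall back to "unknown".
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    print("interpreter: {}".format(sys.executable))
    print("environment prefix: {}".format(sys.prefix))
    # Import names of the four packages installed above.
    wanted = ["numpy", "pysam", "psutil", "Bio"]
    for name, version in sorted(check_modules(wanted).items()):
        print("{:<8} {}".format(name, version if version else "NOT IMPORTABLE"))
```

If the printed prefix is not ~/metashot_env, the packages were probably installed into a different Python than the one running the pipeline.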

We are working on a new version of MetaShot, and it will be written in Python 3.
Regarding your environment, I suggest using conda to create it.

Yes, I'm going to try with conda.

I installed older versions of numpy and pysam under Python 2.7 and now it works.
This kind of workaround is not obvious to all users.
It would be best to create a conda package dedicated to installing MetaShot and make it available with your future update.

I just saw that there is a problem with the call from FaQCs.pl:
Bwa extension trimming algorithm is used.
Processing /usr/local/data/prov/MetaShot/phix_removal_1/R1_no_phix.fastq /usr/local/data/prov/MetaShot/phix_removal_1/R2_no_phix.fastq file
Warning: unable to close filehandle properly: Bad file descriptor during global destruction.

It may be a problem related to the FaQCs installation. Which version are you using?

Version: 1.34

Please run FaQCs.pl outside of MetaShot and let me know whether it works.
FaQCs.pl -p read1.fq read2.fq -mode BWA_plus -q 25 -min_L 50 -n 2 -lc 0.70 -t 10 -d test

Are read1.fq and read2.fq located in the phix_removal_1 or the trimmed_data_1 directory?

read1.fq = phix_removal_1/R1_no_phix.fastq
read2.fq = phix_removal_1/R2_no_phix.fastq

I would like to understand if it is a problem related to the FaQCs installation or to an error during the previous MetaShot steps.
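To separate the two cases, the same FaQCs.pl call can be run on its own while capturing the exit code and stderr. This is a sketch only: `build_faqcs_command` and `run_faqcs` are hypothetical helpers, and the flags are copied as-is from the command above rather than taken from the FaQCs documentation.

```python
import subprocess


def build_faqcs_command(read1, read2, out_dir):
    """Return the FaQCs.pl argument list quoted in this thread for one read pair."""
    return [
        "FaQCs.pl",
        "-p", read1, read2,
        "-mode", "BWA_plus",
        "-q", "25",
        "-min_L", "50",
        "-n", "2",
        "-lc", "0.70",
        "-t", "10",
        "-d", out_dir,
    ]


def run_faqcs(read1, read2, out_dir):
    """Run FaQCs.pl and return (exit_code, stdout_text, stderr_text) for inspection."""
    process = subprocess.Popen(
        build_faqcs_command(read1, read2, out_dir),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text instead of bytes
    )
    stdout_text, stderr_text = process.communicate()
    return process.returncode, stdout_text, stderr_text
```

For example, `run_faqcs("phix_removal_1/R1_no_phix.fastq", "phix_removal_1/R2_no_phix.fastq", "test")` would show whether the standalone run ends with a non-zero exit code, which would point at the FaQCs installation rather than at the earlier MetaShot steps.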

FaQCs.pl -p phix_removal_1/R1_no_phix.fastq phix_removal_1/R2_no_phix.fastq -mode BWA_plus -q 25 -min_L 50 -n 2 -lc 0.70 -t 10 -d test
Bwa extension trimming algorithm is used.
Processing phix_removal_1/R1_no_phix.fastq phix_removal_1/R2_no_phix.fastq file
Warning: unable to close filehandle properly: Bad file descriptor during global destruction.
Processed 215398/2215398
Post Trimming Length(Mean, Std, Median, Max, Min) of 214918 reads with Overall quality 37.67
(140.96, 20.63, 151.0, 151, 50)

It seems to work, but very slowly, even though my input files are not very big.

Why not use a more powerful and faster tool than FaQCs, such as fastp (https://github.com/OpenGene/fastp)?

I will consider fastp in the new version.
By the way, which perl version are you using?

Sorry, the FaQCs command failed:

FaQCs.pl -p phix_removal_1/R1_no_phix.fastq phix_removal_1/R2_no_phix.fastq -mode BWA_plus -q 25 -min_L 50 -n 2 -lc 0.70 -t 10 -d test
Bwa extension trimming algorithm is used.
Processing phix_removal_1/R1_no_phix.fastq phix_removal_1/R2_no_phix.fastq file
Warning: unable to close filehandle properly: Bad file descriptor during global destruction.
Processed 215398/2215398
Post Trimming Length(Mean, Std, Median, Max, Min) of 214918 reads with Overall quality 37.67
(140.96, 20.63, 151.0, 151, 50)
Warning: unable to close filehandle properly: Bad file descriptor during global destruction.
Processed 2215398/2215398
Post Trimming Length(Mean, Std, Median, Max, Min) of 1995530 reads with Overall quality 37.50
(138.30, 22.33, 150.0, 151, 50)
Warning: unable to close filehandle properly: Bad file descriptor during global destruction.

perl -v
This is perl 5, version 24, subversion 1 (v5.24.1) built for x86_64-linux-gnu-thread-multi
(with 85 registered patches, see perl -V for more detail)

... micro_candidates files are not created:

psutil.NoSuchProcess: psutil.NoSuchProcess no process found with pid 15994
Feb 22 15:53:59 ..... started STAR run
Feb 22 15:53:59 ..... loading genome
correctly formatted read_list file
Mapping on the genome
Feb 22 15:54:13 ..... started STAR run
Feb 22 15:54:13 ..... loading genome
Feb 22 15:54:13 ..... started mapping
Feb 22 15:59:35 ..... finished successfully
sam parsing
HOST MAPPING IS DONE
1 1
Feb 22 15:59:37 ..... started STAR run
Feb 22 15:59:37 ..... loading genome
the indicated /data/prov/MetaShot/trimmed_data_1/R1_micro_candidates.fastq fastq file doesn't exist
This script works in two different steps:
(1) it maps the sequencing data against the Bacterial division;
(2) it extract the mapped pairs and tries to map them on the human reference;
Options:
-i paired-end file list: a line containing the R1, the R2 files [MANDATORY]
-p Number of threads/processors available [MANDATORY]
-r reference path [MANDATORY]
-s script path path [MANDATORY]
-h print this help.
Usage:
python2.7 division_analyser -i read_list

It may be related to the Perl version. I'm using version 5.26.

It's annoying to have to update several tools (perl, python...) for the program to work properly, especially when you're not the machine administrator.
It would help if it were packaged in a working conda environment.
But that may be heavy to deploy.

We are currently developing the new version, and we plan to resolve all the current MetaShot limitations in order to make it easier to use. I'm sorry that you're unable to properly configure your environment.

OK.
Do you have any idea why R1_micro_candidates.fastq and R2_micro_candidates.fastq are not created?
I have the impression that the processes get mixed up and communicate badly with each other.

It is probably because there are no trimmed files.
If you send me your input files, I can check them.
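A quick way to see at which step the files stop being produced is to check each expected output for existence and non-zero size. The sketch below assumes a directory layout pieced together from the logs in this thread; in particular, the location of the QC.*.trimmed.fastq files is a guess, and `missing_or_empty` is a hypothetical helper.

```python
import os


def missing_or_empty(paths):
    """Return the subset of paths that do not exist or have zero size."""
    problems = []
    for path in paths:
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            problems.append(path)
    return problems


if __name__ == "__main__":
    # Assumed layout, relative to the MetaShot working directory.
    expected = [
        "phix_removal_1/R1_no_phix.fastq",
        "phix_removal_1/R2_no_phix.fastq",
        "trimmed_data_1/QC.1.trimmed.fastq",  # location is an assumption
        "trimmed_data_1/QC.2.trimmed.fastq",  # location is an assumption
        "trimmed_data_1/R1_micro_candidates.fastq",
        "trimmed_data_1/R2_micro_candidates.fastq",
    ]
    for path in missing_or_empty(expected):
        print("missing or empty: {}".format(path))
```

The first file reported in pipeline order would indicate the step that failed, since every later file depends on it.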

I've modified all the scripts and updated the FaQCs version. Please have a look at the User Guide.

Hi,
OK, thank you so much for this update.
I'm going to try this new version as soon as possible.
Is it updated to run with Python 3 and FaQCs v1.34?

Thank you

FaQCs 2.08
Python 2.7

So now I need to upgrade my FaQCs version, don't I?

Yes, just install it by using conda:
conda install faqcs -c bioconda

Is it possible to install it directly from GitHub (https://github.com/chienchi/FaQCs)?

I've tested the conda version. Actually, I don't know if the GitHub version works.

It's strange, I can't find version 2.08 on GitHub.

OK, thank you so much for the link. I'm going to install this new version.