metashot failed when running into python env

Question

Closed this issue 5 years ago · 30 comments

Hi,

psutil functions crash when they are called from a Python virtualenv:

source ~/metashot_env/bin/activate
python2.7 ./MetaShot_Master_script.py -p ./parameters_file.txt -m ./filelist.txt

['/home/RAW_DATA/63_R1.fastq.gz', '/home/RAW_DATA/63_R2.fastq.gz']
/home/data/MetaShot
/home/data/MetaShot/MetaShot_reference_data
/home/data/MetaShot/MetaShot_reference_data/Bacteria/bacterial_bowtie_index_reference.lst
correctly formatted read_list file
fastq control
QC.1.trimmed.fastq
QC.2.trimmed.fastq
START 22/02/2019 10:09:46
Traceback (most recent call last):
File "/home/data/MetaShot/find_microbiome.py", line 217, in <module>
process_status = pid_status( proc_id )
File "/home/data/MetaShot/find_microbiome.py", line 123, in pid_status
status = psutil.Process( process_pid ).status( )
File "/home/metashot_env/lib/python2.7/site-packages/psutil/__init__.py", line 438, in __init__
self._init(pid)
File "/home/metashot_env/lib/python2.7/site-packages/psutil/__init__.py", line 478, in _init
raise NoSuchProcess(pid, None, msg)
psutil.NoSuchProcess: psutil.NoSuchProcess no process found with pid 6293
Feb 22 10:25:05 ..... started STAR run
Feb 22 10:25:05 ..... loading genome
correctly formatted read_list file
Mapping on the genome
Feb 22 10:25:05 ..... started STAR run
Feb 22 10:25:05 ..... loading genome
Feb 22 10:25:05 ..... started mapping
Feb 22 10:30:26 ..... finished successfully
sam parsing
HOST MAPPING IS DONE
1 1
Feb 22 10:30:27 ..... started STAR run
Feb 22 10:30:27 ..... loading genome
the indicated /usr/local/data/prov/MetaShot/trimmed_data_1/R1_micro_candidates.fastq fastq file doesn't exist
This script works in two different steps:
(1) it maps the sequencing data against the Bacterial division;
(2) it extract the mapped pairs and tries to map them on the human reference;
Options:
-i paired-end file list: a line containing the R1, the R2 files [MANDATORY]
-p Number of threads/processors available [MANDATORY] -r reference path [MANDATORY]
-s script path path [MANDATORY]
-h print this help.
Usage:
python2.7 division_analyser -i read_list

/home/data/MetaShot/MetaShot_reference_data/Virus/bowtie_index/Viral_division

Answer 1 · 2019-02-22T10:21:51.000Z

Could you please describe how your virtualenv environment was created?
It seems to be an error associated with the process ID rather than with loading the module.
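
The traceback is raised while `psutil.Process( process_pid )` is being constructed, which means the PID had already disappeared when it was looked up. As a minimal sketch, assuming the pipeline starts its own child processes, one way to avoid a lookup by PID is to poll the `subprocess.Popen` handle instead. `run_and_wait` is a hypothetical helper, not MetaShot code, and it deliberately uses the standard library rather than psutil:

```python
import subprocess
import sys
import time


def run_and_wait(cmd, poll_interval=0.05):
    """Start cmd and poll its Popen handle until it exits; return the exit code.

    Hypothetical helper, not part of MetaShot. Polling the handle does not
    raise for a child that has already finished, unlike a lookup by PID.
    """
    proc = subprocess.Popen(cmd)
    while proc.poll() is None:  # None means the child is still running
        time.sleep(poll_interval)
    return proc.returncode


if __name__ == "__main__":
    # A child that exits almost immediately, like a very short pipeline step.
    code = run_and_wait([sys.executable, "-c", "pass"])
    print("child exit code: %d" % code)
```

`Popen.poll()` returns `None` while the child is running and its exit code afterwards, so a child that finishes between two checks is reported as finished instead of raising an exception.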

Answer 2 · 2019-02-22T10:30:03.000Z

The problem is that the MetaShot pipeline uses Python 2.7.
In one year Python 2.7 will no longer be maintained, and today most machines use Python 3.
Why not update the code for Python 3?

python2.7 -m virtualenv ~/metashot_env
python2.7 -m pip install numpy
python2.7 -m pip install pysam
python2.7 -m pip install psutil
python2.7 -m pip install biopython

source ~/metashot_env/bin/activate
python2.7 ./MetaShot_Master_script.py -p ./parameters_file.txt -m ./filelist.txt
deactivate
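
To confirm which interpreter and which package versions are actually picked up once the environment is activated, a small check can be run with the same `python2.7` binary. This is only a sketch: `in_virtualenv` and `module_versions` are illustrative names, and the module list mirrors the four packages installed above (biopython is imported as `Bio`).

```python
import importlib
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.base_prefix differ from sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


def module_versions(names):
    """Map each module name to its __version__, "unknown", or None if not importable."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    print("interpreter: %s" % sys.executable)
    print("inside virtualenv: %s" % in_virtualenv())
    found = module_versions(["numpy", "pysam", "psutil", "Bio"])
    for name in sorted(found):
        version = found[name]
        print("%s: %s" % (name, version if version is not None else "NOT INSTALLED"))
```

If the interpreter path printed here is not inside `~/metashot_env`, the packages were probably installed for a different Python than the one running the pipeline.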

Answer 3 · 2019-02-22T10:32:29.000Z

We are working on a new version of MetaShot, which will be written in Python 3.
Regarding your environment, I suggest using conda to create it.

Answer 4 · 2019-02-22T10:42:10.000Z

Yes, I'm going to try with conda.

Answer 5 · 2019-02-22T10:54:17.000Z

I installed older versions of numpy and pysam with Python 2.7 and now it works.
This kind of manipulation is not obvious to all users.
The best solution would be to create a conda package dedicated to installing MetaShot and to release it with your future update.

Answer 6 · 2019-02-22T10:57:34.000Z

I just saw that there is a problem with the call from FaQCs.pl:
Bwa extension trimming algorithm is used.
Processing /usr/local/data/prov/MetaShot/phix_removal_1/R1_no_phix.fastq /usr/local/data/prov/MetaShot/phix_removal_1/R2_no_phix.fastq file
Warning: unable to close filehandle properly: Bad file descriptor during global destruction.

Answer 7 · 2019-02-22T11:09:22.000Z

It may be a problem related to the FaQCs installation. Which version are you using?

Answer 8 · 2019-02-22T11:10:16.000Z

Version: 1.34

Answer 9 · 2019-02-22T11:12:54.000Z

Please run FaQCs.pl outside of MetaShot and let me know whether it works.
FaQCs.pl -p read1.fq read2.fq -mode BWA_plus -q 25 -min_L 50 -n 2 -lc 0.70 -t 10 -d test
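
Since a single mistyped option changes what the tool does, the same call can also be assembled programmatically. The sketch below only reuses the options quoted in the command above; `build_faqcs_command` and `run_faqcs` are hypothetical helper names, and `FaQCs.pl` is assumed to be on the `PATH`.

```python
import subprocess


def build_faqcs_command(read1, read2, out_dir, threads=10):
    """Return the FaQCs.pl argument list with the options quoted in this thread."""
    return [
        "FaQCs.pl",
        "-p", read1, read2,
        "-mode", "BWA_plus",
        "-q", "25",
        "-min_L", "50",
        "-n", "2",
        "-lc", "0.70",
        "-t", str(threads),
        "-d", out_dir,
    ]


def run_faqcs(read1, read2, out_dir, threads=10):
    """Run FaQCs.pl and return its exit code (assumes FaQCs.pl is on PATH)."""
    return subprocess.call(build_faqcs_command(read1, read2, out_dir, threads))


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(" ".join(build_faqcs_command("read1.fq", "read2.fq", "test")))
```

Passing the arguments as a list also avoids shell quoting problems when the FASTQ paths contain spaces.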

Answer 10 · 2019-02-22T11:15:21.000Z

Are read1.fq and read2.fq located in the phix_removal_1 or the trimmed_data_1 directory?

Answer 11 · 2019-02-22T11:16:46.000Z

read1.fq = phix_removal_1/R1_no_phix.fastq
read2.fq = phix_removal_1/R2_no_phix.fastq

I would like to understand if it is a problem related to the FaQCs installation or to an error during the previous MetaShot steps.
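
One way to separate the two possibilities is to sanity-check the inputs before handing them to FaQCs. The following is a rough sketch, assuming plain four-line FASTQ records (no wrapped sequences); `count_fastq_reads` and `pair_is_consistent` are illustrative names and are not part of MetaShot or FaQCs.

```python
import gzip
import os
import tempfile


def count_fastq_reads(path):
    """Count reads in a FASTQ file, assuming exactly four lines per record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError("%s has %d lines, not a multiple of 4" % (path, n_lines))
    return n_lines // 4


def pair_is_consistent(read1, read2):
    """Return True when both files are non-empty and hold the same number of reads."""
    n1 = count_fastq_reads(read1)
    n2 = count_fastq_reads(read2)
    return n1 > 0 and n1 == n2


if __name__ == "__main__":
    # Tiny self-made example; real inputs would be the phix_removal_1 files.
    record = b"@r1\nACGT\n+\nIIII\n"
    tmp_dir = tempfile.mkdtemp()
    paths = []
    for name in ("R1.fastq", "R2.fastq"):
        path = os.path.join(tmp_dir, name)
        with open(path, "wb") as handle:
            handle.write(record * 3)
        paths.append(path)
    print(pair_is_consistent(paths[0], paths[1]))
```

If the pair is already inconsistent or empty at this point, the fault lies in an earlier MetaShot step rather than in the FaQCs installation.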

Answer 12 · 2019-02-22T11:32:46.000Z

FaQCs.pl -p phix_removal_1/R1_no_phix.fastq phix_removal_1/R2_no_phix.fastq -mode BWA_plus -q 25 -min_L 50 -n 2 -lc 0.70 -t 10 -d test
Bwa extension trimming algorithm is used.
Processing phix_removal_1/R1_no_phix.fastq phix_removal_1/R2_no_phix.fastq file
Warning: unable to close filehandle properly: Bad file descriptor during global destruction.
Processed 215398/2215398
Post Trimming Length(Mean, Std, Median, Max, Min) of 214918 reads with Overall quality 37.67
(140.96, 20.63, 151.0, 151, 50)

It seems to work, but very slowly, even though my input files are not very big.

Why not use a faster and more powerful tool than FaQCs, such as fastp (https://github.com/OpenGene/fastp)?

Answer 13 · 2019-02-22T11:42:08.000Z

I will consider fastp in the new version.
By the way, which perl version are you using?

Answer 14 · 2019-02-22T15:26:20.000Z

Sorry, the FaQCs command failed:

FaQCs.pl -p phix_removal_1/R1_no_phix.fastq phix_removal_1/R2_no_phix.fastq -mode BWA_plus -q 25 -min_L 50 -n 2 -lc 0.70 -t 10 -d test
Bwa extension trimming algorithm is used.
Processing phix_removal_1/R1_no_phix.fastq phix_removal_1/R2_no_phix.fastq file
Warning: unable to close filehandle properly: Bad file descriptor during global destruction.
Processed 215398/2215398
Post Trimming Length(Mean, Std, Median, Max, Min) of 214918 reads with Overall quality 37.67
(140.96, 20.63, 151.0, 151, 50)
Warning: unable to close filehandle properly: Bad file descriptor during global destruction.
Processed 2215398/2215398
Post Trimming Length(Mean, Std, Median, Max, Min) of 1995530 reads with Overall quality 37.50
(138.30, 22.33, 150.0, 151, 50)
Warning: unable to close filehandle properly: Bad file descriptor during global destruction.

perl -v
This is perl 5, version 24, subversion 1 (v5.24.1) built for x86_64-linux-gnu-thread-multi
(with 85 registered patches, see perl -V for more detail)

Answer 15 · 2019-02-22T15:29:12.000Z

... micro_candidates files are not created:

psutil.NoSuchProcess: psutil.NoSuchProcess no process found with pid 15994
Feb 22 15:53:59 ..... started STAR run
Feb 22 15:53:59 ..... loading genome
correctly formatted read_list file
Mapping on the genome
Feb 22 15:54:13 ..... started STAR run
Feb 22 15:54:13 ..... loading genome
Feb 22 15:54:13 ..... started mapping
Feb 22 15:59:35 ..... finished successfully
sam parsing
HOST MAPPING IS DONE
1 1
Feb 22 15:59:37 ..... started STAR run
Feb 22 15:59:37 ..... loading genome
the indicated /data/prov/MetaShot/trimmed_data_1/R1_micro_candidates.fastq fastq file doesn't exist
This script works in two different steps:
(1) it maps the sequencing data against the Bacterial division;
(2) it extract the mapped pairs and tries to map them on the human reference;
Options:
-i paired-end file list: a line containing the R1, the R2 files [MANDATORY]
-p Number of threads/processors available [MANDATORY] -r reference path [MANDATORY]
-s script path path [MANDATORY]
-h print this help.
Usage:
python2.7 division_analyser -i read_list

Answer 16 · 2019-02-22T15:29:51.000Z

It may be related to the Perl version. I'm using version 5.26.

Answer 17 · 2019-02-22T15:34:27.000Z

It's annoying to have to update several tools (Perl, Python...) for the program to work properly, especially when you're not the machine administrator.
It would need to be packaged as a working conda environment.
But that may be heavy to deploy.

Answer 18 · 2019-02-22T15:40:00.000Z

We are currently developing the new version, and we plan to resolve all the current MetaShot limitations in order to make it easier to use. I'm sorry if you're unable to properly configure your environment.

Answer 19 · 2019-02-22T15:46:14.000Z

OK.
Do you have any idea why R1_micro_candidates.fastq and R2_micro_candidates.fastq are not created?
I have the impression that the processes get mixed up and communicate badly with each other.

Answer 20 · 2019-02-22T15:47:20.000Z

It is probably because there are no trimmed files.
If you send me your input files I can check them.
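
A quick way to see where the chain breaks is to check that each step's expected outputs exist and are non-empty before the next step starts. This is a minimal sketch: `missing_outputs` is a hypothetical helper, and the directory and file names are simply the ones that appear in the log in this thread.

```python
import os


def missing_outputs(directory, filenames):
    """Return the expected files that are absent or empty in directory."""
    missing = []
    for name in filenames:
        path = os.path.join(directory, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(path)
    return missing


if __name__ == "__main__":
    expected = ["R1_micro_candidates.fastq", "R2_micro_candidates.fastq"]
    problems = missing_outputs("trimmed_data_1", expected)
    if problems:
        print("stop before mapping, missing or empty: %s" % ", ".join(problems))
    else:
        print("all expected files are present")
```

Running the same check against the trimming output directory would show whether the trimmed files themselves were ever written.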

Answer 21 · 2019-03-22T15:23:45.000Z

I've modified all the scripts and updated the FaQCs version. Please have a look at the User Guide.

Answer 22 · 2019-03-22T15:38:55.000Z

Hi,
OK, thank you so much for this update.
I'm going to try this new version as soon as possible.
Has it been updated to run with Python 3 and FaQCs v1.34?

Thank you

Answer 23 · 2019-03-22T15:40:21.000Z

FaQCs 2.08
Python 2.7

Answer 24 · 2019-03-22T15:43:10.000Z

So now I need to upgrade my FaQCs version, don't I?

Answer 25 · 2019-03-22T15:44:23.000Z

Yes, just install it by using conda:
conda install faqcs -c bioconda

Answer 26 · 2019-03-22T15:46:39.000Z

Is it possible to install it directly from GitHub (https://github.com/chienchi/FaQCs)?

Answer 27 · 2019-03-22T15:47:56.000Z

I've tested the conda version. Actually, I don't know if the GitHub version works.

Answer 28 · 2019-03-22T15:59:18.000Z

It's strange, I can't find version 2.08 on GitHub.

Answer 29 · 2019-03-22T16:04:00.000Z

https://github.com/LANL-Bioinformatics/FaQCs

Answer 30 · 2019-03-22T19:43:16.000Z

OK, thank you so much for the link. I'm going to install this new version.