ParkinsonLab/MetaPro

output errors

Closed this issue · 9 comments

mimhmw commented

I ran the command below in tutorial mode, but I get the errors shown below. Could you help me figure out what the problem is?

python3 /pipeline/MetaPro.py -c /pipeline/Config.ini -s /input/PF_TruSeq.fastq --no-host --tutorial output -o /output
.
.
.
making Taxa summary
Traceback (most recent call last):
File "/pipeline/Scripts/output_table_v3.py", line 186, in
rank_name.append(names_dict[taxid])
KeyError: '2'
Reformat RPKM for EC heatmap
Traceback (most recent call last):
File "/pipeline/Scripts/output_reformat_rpkm_table.py", line 12, in
with open(input_rpkm, "r") as rpkm_file:
FileNotFoundError: [Errno 2] No such file or directory: '/output/outputs/final_results/RPKM_table.tsv'
2023-08-09 20:12:35.180214 running: output_network_generation
2023-08-09 20:12:35.180310 running: output_read_count
2023-08-09 20:12:35.182646 running: output_per_read_scores
2023-08-09 20:12:35.184891 running: output_ec_heatmap
2023-08-09 20:12:35.187970 output report phase 3 launched. waiting for sync
2023-08-09 20:12:35.188069 closing down processes: 3
2023-08-09 20:12:35.180455 generating read count table
collecting per-read quality
2023-08-09 20:12:35.185002 forming EC heatmap
Traceback (most recent call last):
File "/pipeline/Scripts/output_EC_metrics.py", line 40, in
super_df = pd.read_csv(pathway_superpathway_file, sep = ',', skip_blank_lines = False)
File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 688, in read_csv
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 454, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 948, in init
self._make_engine(self.engine)
File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 1180, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 2010, in init
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 382, in pandas._libs.parsers.TextReader.cinit
File "pandas/_libs/parsers.pyx", line 674, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] No such file or directory: '/database/path_to_superpath/pathway_to_superpathway.csv'
2023-08-09 20:14:04.020475 running: output_read_count
2023-08-09 20:14:04.020641 running: output_ec_heatmap
2023-08-09 20:14:04.020733 running: output_per_read_scores
Outputs: 122.7 s
Outputs cleanup: 0.0 s
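A quick way to check whether this is a path problem is to confirm, from inside the container, that the files the output stage complains about actually exist. A minimal sketch, with the paths copied from the tracebacks above (adjust them to your own -o and database mounts):

```python
# Quick diagnostic, not part of MetaPro: check that the files the output
# stage failed to open are actually visible inside the container.
import os

expected = [
    "/output/outputs/final_results/RPKM_table.tsv",              # from output_reformat_rpkm_table.py
    "/database/path_to_superpath/pathway_to_superpathway.csv",   # from output_EC_metrics.py
]

for path in expected:
    print("OK     " if os.path.isfile(path) else "MISSING", path)
```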

Dear MetaPro developer,

I have the exact same issue when running the whole pipeline with the mouse tutorial dataset and the databases downloaded with lib_downloader.py.

python3 /pipeline/MetaPro.py -c /meta_tut/config_mouse.ini -s /meta_tut/tutorial/mouse1.fastq -o /meta_tut/tutorial/231116_mouse_trial

The errors I get:

2023-11-15 09:59:08.690826 running: output_taxa_groupby
Generating RPKM and Cytoscape network
making Taxa summary
Traceback (most recent call last):
File "/pipeline/Scripts/output_table_v3.py", line 186, in
rank_name.append(names_dict[taxid])
KeyError: '2'
Reformat RPKM for EC heatmap
Traceback (most recent call last):
File "/pipeline/Scripts/output_reformat_rpkm_table.py", line 12, in
with open(input_rpkm, "r") as rpkm_file:
FileNotFoundError: [Errno 2] No such file or directory: '/meta_tut/231115_trial/outputs/final_results/RPKM_table.tsv'
2023-11-15 09:59:08.691932 output report phase 2 launched. waiting for sync
2023-11-15 09:59:08.691985 closing down processes: 2
2023-11-15 09:59:08.691998 closed down: 0/2
2023-11-15 09:59:12.253750 closed down: 1/2
2023-11-15 09:59:12.254583 running: output_network_generation
2023-11-15 09:59:12.254880 running: output_read_count
2023-11-15 09:59:12.258975 running: output_per_read_scores
2023-11-15 09:59:12.262682 running: output_ec_heatmap
2023-11-15 09:59:12.255344 generating read count table
collecting per-read quality
2023-11-15 09:59:12.262799 forming EC heatmap
Traceback (most recent call last):
File "/pipeline/Scripts/output_EC_metrics.py", line 78, in
rpkm_df = pd.read_csv(rpkm_table_file, sep = '\t', skip_blank_lines = False)
File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 688, in read_csv
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 454, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 948, in init
self._make_engine(self.engine)
File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 1180, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 2010, in init
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 540, in pandas._libs.parsers.TextReader.cinit
pandas.errors.EmptyDataError: No columns to parse from file

Is there any way this could be fixed?

Looking forward to your answer.

Kind regards,
Morten Schostag - DTU

Hi, there seems to be something funny happening in your table.
The error concerns taxid 2 (bacteria).
The problem is that whatever file you're using for names.dmp has no entry for the key "2".
What does your config look like? Specifically, which file are you pointing to for the names entry under [Databases]?
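For reference, the failing lookup boils down to building a taxid-to-name dictionary from the names file and then indexing it with "2". Here's a rough sketch of that check, assuming the standard NCBI .dmp layout (fields separated by "|"); the exact parsing inside output_table_v3.py may differ, and the path below is a placeholder for whatever your config's names entry points to:

```python
# Sketch only: reproduce the failing lookup against whatever file your
# [Databases] names entry points to. The path below is a placeholder.
names_dmp = "/path/to/your/names.dmp"

names_dict = {}
with open(names_dmp, "r") as f:
    for line in f:
        fields = [x.strip() for x in line.split("|")]
        if len(fields) > 3 and fields[3] == "scientific name":
            names_dict[fields[0]] = fields[1]  # taxid -> scientific name

# If this prints the fallback, the file has no entry for taxid 2 (Bacteria)
# and output_table_v3.py will hit the KeyError shown above.
print(names_dict.get("2", "taxid 2 not found"))
```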

Dear Billy,

Thanks for replying. For names.dmp I have used names_wevote.dmp, just like in the example config you have uploaded on GitHub. Is this wrong?

I have attached the config file
config_low_mem.txt

Thanks for looking into this.

Kind regards,
Morten

walking through the checklist here:

Is your names_wevote.dmp located at /meta_tut/databases/WEVOTE_db/names_wevote.dmp?

What does your Singularity call look like? I need to know whether it's simply not detecting the WEVOTE file.

The brief explanation is: the output stage is having trouble looking up taxid 2 when it tries to find it in names_wevote.dmp.
The only way it runs into this error is if names_wevote.dmp doesn't contain 2, but that's bacteria.
So something is off with the import of the names file, which is improbable since it's the file we supplied, so I'm following the chain.

Is your names_wevote.dmp located at /meta_tut/databases/WEVOTE_db/names_wevote.dmp?

  • Yes
    What does your Singularity call look like?
    I use Docker. First I log in:
    docker run -it -v /mnt/raid2/mdesc/:/meta_tut parkinsonlab/metapro
    Then I ran this command:
    python3 /pipeline/MetaPro.py -c /meta_tut/config_low_mem.ini -1 /meta_tut/DSM2_10_3_Unmapped.out_F.fq -2 /meta_tut/DSM2_10_3_Unmapped.out_R.fq -o /meta_tut/231123_no_host_reads --nhost

I have also tried the mouse tutorial data, for which I downloaded all the databases from your server. Again I get the same error.

But there seems to be a difference between the two files names_wevote.dmp and names.dmp that you provide. Is there a reason for having both? There are also a nodes.dmp and a nodes_wevote.dmp, so which one should I choose?

  1. I'm not seeing duplicate names_wevote and names files on my end. What version of the pipeline are you using?

  2. names.dmp and its counterpart nodes.dmp are created by NCBI as part of their taxonomy dump.

  3. names_wevote.dmp is the same file; the special name is just there for WEVOTE. <We'll be getting rid of WEVOTE in a future version; I need time to polish the next release.>

  4. You could pull your own copy of names.dmp and nodes.dmp, but WEVOTE would need them named accordingly, and you would have to restart taxonomic classification. (A rough sketch of that follows this list.)

  5. Are you able to confirm that taxonomic classification ran with no issues? <Specifically WEVOTE: if it can't see names_wevote.dmp, then I suspect the failure is upstream too.>

  6. Are you able to confirm that /mnt/raid2/mdesc contains a WEVOTE_db folder with names_wevote.dmp inside? <I'm seeing a bunch of issues from other users stemming from a bad bind-mount.>
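For point 4, a minimal sketch of what pulling and renaming a fresh taxonomy dump could look like. This is not an official MetaPro step: the URL is the standard NCBI taxdump location, the WEVOTE_db path is the one used in this thread, and taxonomic classification would then have to be rerun.

```python
# Sketch, not an official MetaPro step: fetch a fresh NCBI taxonomy dump and
# give WEVOTE copies of names.dmp/nodes.dmp under the file names it expects.
import shutil
import tarfile
import urllib.request

taxdump_url = "https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz"
wevote_db = "/meta_tut/databases/WEVOTE_db"  # adjust to your own mount

urllib.request.urlretrieve(taxdump_url, "taxdump.tar.gz")
with tarfile.open("taxdump.tar.gz") as tar:
    tar.extract("names.dmp", path=wevote_db)
    tar.extract("nodes.dmp", path=wevote_db)

# WEVOTE looks for the *_wevote.dmp file names, per the discussion above.
shutil.copy(f"{wevote_db}/names.dmp", f"{wevote_db}/names_wevote.dmp")
shutil.copy(f"{wevote_db}/nodes.dmp", f"{wevote_db}/nodes_wevote.dmp")
```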

  1. I'm using the latest version.
    docker run -it -v /mnt/raid2/mdesc/:/meta_tut parkinsonlab/metapro:latest
    1-3. Okay, thanks for the info
  2. Everything ran perfectly except the last part of the pipeline. See the attached stderr and stdout file
    output_231115_trial.txt
  3. Yes. Here is the folder content.
    root@b1674dc3fabf:/# ls -lh meta_tut/databases/WEVOTE_db/
    total 577M
    -rw-r--r-- 1 1714099238 132000513 17M Jul 19 2021 citations.dmp
    -rw-r--r-- 1 1714099238 132000513 3.7M Jul 19 2021 delnodes.dmp
    -rw-r--r-- 1 1714099238 132000513 442 Jul 19 2021 division.dmp
    -rw-r--r-- 1 1714099238 132000513 15K Jul 19 2021 gc.prt
    -rw-r--r-- 1 1714099238 132000513 4.5K Jul 19 2021 gencode.dmp
    -rw-r--r-- 1 1714099238 132000513 907K Jul 19 2021 merged.dmp
    -rw-r--r-- 1 1714099238 132000513 151M Jul 19 2021 names.dmp
    -rw-r--r-- 1 1714099238 132000513 103M Jul 19 2021 names_wevote.dmp
    -rw-r--r-- 1 1714099238 132000513 118M Jul 19 2021 nodes.dmp
    -rw-r--r-- 1 1714099238 132000513 118M Jul 19 2021 nodes_wevote.dmp
    -rw-r--r-- 1 1714099238 132000513 2.6K Jul 19 2021 readme.txt

Everything was downloaded through your script: lib_downloader.py

I just tried changing from names_wevote.dmp to names.dmp, and the same for nodes_wevote.dmp to nodes.dmp, and now it works fine. So there must be something wrong with the nodes_wevote.dmp and names_wevote.dmp files coming from https://compsysbio.org/metapro_libs/.
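If it helps with debugging on your end, a quick way to see how the two copies differ is to compare their taxid columns. A minimal sketch, using the WEVOTE_db path listed above:

```python
# Sketch: list taxids present in names.dmp but absent from names_wevote.dmp.
def taxids(path):
    with open(path, "r") as f:
        return {line.split("|")[0].strip() for line in f if line.strip()}

db = "/meta_tut/databases/WEVOTE_db"
missing = taxids(f"{db}/names.dmp") - taxids(f"{db}/names_wevote.dmp")

print(len(missing), "taxids are in names.dmp but not in names_wevote.dmp")
print("taxid 2 (Bacteria) is among them:", "2" in missing)
```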

Thanks for the info! I'll fix that.
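For anyone hitting this in the meantime: one possible workaround on the script side, shown purely as an illustration and not as the actual MetaPro patch, is to make the lookup tolerant of missing taxids so the output stage can still finish.

```python
# Illustration only, not the actual MetaPro fix: fall back to a placeholder
# name instead of raising KeyError when a taxid is missing from names_dict.
names_dict = {"9606": "Homo sapiens"}  # toy dictionary; "2" deliberately absent

rank_name = []
for taxid in ["9606", "2"]:
    rank_name.append(names_dict.get(taxid, "unclassified (taxid %s)" % taxid))

print(rank_name)  # ['Homo sapiens', 'unclassified (taxid 2)']
```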