KULL-Centre/PRISM

PDB download error

Closed this issue · 3 comments

I'm not sure why it thinks the PDB ID is LEGA - that's the first 4 letters of the output directory, but -s and -o should have sorted that out.

run_ddG_pipeline_dev_amelie -s ./1PGA.pdb -o legacy_no_opt_pro -i create --overwrite_path true
current env paths & exec: /groups/sbinlab/software/muscle/muscle3.8.31_i86linux64 /groups/sbinlab/amelie/projects/PRISM/PRISM/software/rosetta_ddG_pipeline/ /groups/sbinlab/software/Rosetta_2020_July_dc83fa/source/ /sbinlab/software/Rosetta_tools/tools/ /groups/sbinlab/software/Rosetta_2020_July_dc83fa/database/ linuxgccrelease /sbinlab/tiemann/repos/PRISM/prism/scripts
WARNING:root:Directory legacy_no_opt_pro already exists. overwrite_path set to True, so we will use this path.
WARNING:root:Directory legacy_no_opt_pro/logs already exists. overwrite_path set to True, so we will use this path.
WARNING:root:Directory legacy_no_opt_pro/input already exists. overwrite_path set to True, so we will use this path.
WARNING:root:Directory legacy_no_opt_pro/prepare already exists. overwrite_path set to True, so we will use this path.
WARNING:root:Directory legacy_no_opt_pro/relax already exists. overwrite_path set to True, so we will use this path.
WARNING:root:Directory legacy_no_opt_pro/ddG already exists. overwrite_path set to True, so we will use this path.
WARNING:root:Directory legacy_no_opt_pro/output already exists. overwrite_path set to True, so we will use this path.
WARNING:root:Directory legacy_no_opt_pro/analysis already exists. overwrite_path set to True, so we will use this path.
WARNING:root:Directory legacy_no_opt_pro/prepare/input already exists. overwrite_path set to True, so we will use this path.
WARNING:root:Directory legacy_no_opt_pro/prepare/mutfiles already exists. overwrite_path set to True, so we will use this path.
WARNING:root:Directory legacy_no_opt_pro/prepare/cleaning already exists. overwrite_path set to True, so we will use this path.
WARNING:root:Directory legacy_no_opt_pro/prepare/checking already exists. overwrite_path set to True, so we will use this path.
WARNING:root:Directory legacy_no_opt_pro/prepare/output already exists. overwrite_path set to True, so we will use this path.
WARNING:root:Directory legacy_no_opt_pro/relax/input already exists. overwrite_path set to True, so we will use this path.
WARNING:root:Directory legacy_no_opt_pro/relax/run already exists. overwrite_path set to True, so we will use this path.
WARNING:root:Directory legacy_no_opt_pro/relax/output already exists. overwrite_path set to True, so we will use this path.
WARNING:root:Directory legacy_no_opt_pro/ddG/input already exists. overwrite_path set to True, so we will use this path.
WARNING:root:Directory legacy_no_opt_pro/ddG/run already exists. overwrite_path set to True, so we will use this path.
WARNING:root:Directory legacy_no_opt_pro/ddG/output already exists. overwrite_path set to True, so we will use this path.
2020-07-03 13:11:32 - INFO - Preparation started
INFO:Pipeline_logger:Preparation started
2020-07-03 13:11:32 - INFO - Creating structure instance
INFO:Pipeline_logger:Creating structure instance
Special residues in structure = 0
Special residues in structure = 0
2020-07-03 13:11:32 - INFO - Prepare the pdb and extract fasta file
INFO:Pipeline_logger:Prepare the pdb and extract fasta file
2020-07-03 13:11:32 - INFO - Running clean_pdb.py script
INFO:Pipeline_logger:Running clean_pdb.py script
File for legacy_no_opt_pro/prepare/input/input.pdb doesn't exist, downloading from internet.
wget --quiet http://www.rcsb.org/pdb/files/LEGA.pdb.gz -O /lustre/hpc/sbinlab/amelie/projects/PRISM/ddg_benchmark/legacy_no_opt_pro/prepare/cleaning/LEGA.pdb.gz

gzip: /lustre/hpc/sbinlab/amelie/projects/PRISM/ddg_benchmark/legacy_no_opt_pro/prepare/cleaning/LEGA.pdb.gz: unexpected end of file
LEGA A 0 --- --- --- --- BAD
2020-07-03 13:11:32 - INFO - end of output from clean_pdb.py
INFO:Pipeline_logger:end of output from clean_pdb.py
Traceback (most recent call last):
  File "/groups/sbinlab/amelie/projects/PRISM/PRISM/software/rosetta_ddG_pipeline/run_pipeline.py", line 279, in <module>
    predict_stability(args)
  File "/groups/sbinlab/amelie/projects/PRISM/PRISM/software/rosetta_ddG_pipeline/run_pipeline.py", line 115, in predict_stability
    structure_instance.path_to_cleaned_pdb, struc_dic_cleaned = structure_instance.clean_up_and_isolate()
  File "/lustre/hpc/sbinlab/amelie/projects/PRISM/PRISM/software/rosetta_ddG_pipeline/structure_input.py", line 59, in clean_up_and_isolate
    fasta_lines = open(self.path_to_cleaned_fasta, 'r').readlines()
FileNotFoundError: [Errno 2] No such file or directory: 'legacy_no_opt_pro/prepare/cleaning/input_A.fasta'

It seems it tried to download a file. Could you try the run again, providing the full (absolute) path to the file? If I remember correctly, Anders was not yet done with converting the file paths to safe mode.
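A minimal sketch of that suggestion, in case it helps: resolve the `-s` argument to an absolute path before anything else touches it, and fail right away if the file is missing. The helper name `resolve_structure_path` is hypothetical and not part of the pipeline; it only illustrates the idea.

```python
import os


def resolve_structure_path(structure_arg: str) -> str:
    """Hypothetical helper: turn the -s argument into an absolute path.

    Raises FileNotFoundError immediately if the file does not exist, so
    later steps never see a relative or missing path.
    """
    # Expand "~" and make the path absolute relative to the current directory.
    abs_path = os.path.abspath(os.path.expanduser(structure_arg))
    if not os.path.isfile(abs_path):
        raise FileNotFoundError(f"Structure file not found: {abs_path}")
    return abs_path
```

On the command line, the rough equivalent would be passing something like `-s "$(realpath ./1PGA.pdb)"` instead of `-s ./1PGA.pdb`.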

It seems that clean_pdb.py couldn't find the PDB file and therefore tried to download it. I am unsure why the error occurs, and I can't seem to replicate it.
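To illustrate where `LEGA` could come from, here is a rough sketch of the kind of fallback the log suggests. This is an assumption based only on the output above, not the actual clean_pdb.py code, and `guess_pdb_id` is a hypothetical name: if the given path is not an existing file, the first four characters of the argument are treated as a PDB ID.

```python
import os
from typing import Optional


def guess_pdb_id(path_or_id: str) -> Optional[str]:
    """Hypothetical fallback, NOT the real clean_pdb.py logic.

    Returns None when the argument is an existing file (use it as-is).
    Otherwise treats the first four characters as a PDB ID to download.
    """
    if os.path.isfile(path_or_id):
        return None
    # A missing relative path such as "legacy_no_opt_pro/..." ends up here.
    return path_or_id[:4].upper()


# The relative path from the log, which was not found at the time of the run:
print(guess_pdb_id("legacy_no_opt_pro/prepare/input/input.pdb"))
```

Under that assumption, a relative path that cannot be found from the script's working directory would be turned into the ID `LEGA`, which matches the wget URL in the log.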

This issue should be solved now. Automatic download of PDB IDs is no longer possible, and it is also not recommended. We highly recommend pre-processing the PDB file (e.g. with PDB-REDO) before handing it to the pipeline, to reduce errors.