IndexError: list index out of range
I’m attempting to get the DockQ score of a model of CAPRI target #50, from the score_set dataset. The model is named Target50_0000.pdb and the correct crystal structure is named Target50_3r2x.pdb. Both are attached here (with a .txt extension added, since GitHub doesn't allow .pdb files to be uploaded):
Target50_0000.pdb.txt
Target50_3r2x.pdb.txt
Running
scripts/fix_numbering.pl /path/to/Target50_0000.pdb /path/to/Target50_3r2x.pdb
works fine, but running
python3 DockQ.py /path/to/Target50_0000.pdb.fixed /path/to/Target50_3r2x.pdb -native_chain1 A B -native_chain2 C -model_chain1 A B -model_chain2 C
results in the following error:
Traceback (most recent call last):
File "/dartfs/rc/lab/G/Grigoryanlab/home/coy/DockQ/DockQ.py", line 732, in <module>
main()
File "/dartfs/rc/lab/G/Grigoryanlab/home/coy/DockQ/DockQ.py", line 510, in main
model_chains=get_pdb_chains(model)
File "/dartfs/rc/lab/G/Grigoryanlab/home/coy/DockQ/DockQ.py", line 387, in get_pdb_chains
pdb_struct = pdb_parser.get_structure("reference", pdb)[0]
File "/dartfs-hpc/rc/home/4/f002v94/.conda/envs/myenv/lib/python3.9/site-packages/Bio/PDB/PDBParser.py", line 100, in get_structure
self._parse(lines)
File "/dartfs-hpc/rc/home/4/f002v94/.conda/envs/myenv/lib/python3.9/site-packages/Bio/PDB/PDBParser.py", line 123, in _parse
self.trailer = self._parse_coordinates(coords_trailer)
File "/dartfs-hpc/rc/home/4/f002v94/.conda/envs/myenv/lib/python3.9/site-packages/Bio/PDB/PDBParser.py", line 198, in _parse_coordinates
resseq = int(line[22:26].split()[0]) # sequence identifier
IndexError: list index out of range
I got the same error when comparing the native and predicted structures:
python3 DockQ.py ./complex.1.pdb ./5j13_native.pdb -native_chain1 A B -model_chain1 A B -native_chain2 C -model_chain2 C > decoy1.log
Traceback (most recent call last):
File "/home/randd/Desktop/Desktop_Office/October2023/ThirdWeek/dockq/DockQ/DockQ.py", line 732, in <module>
main()
File "/home/path/to/dockq/DockQ/DockQ.py", line 648, in main
info=calc_DockQ(model_fixed,native,use_CA_only)
File "/home/path/to/dockq/DockQ/DockQ.py", line 137, in calc_DockQ
sample_structure = pdb_parser.get_structure("model", model)
File "/home/#####/.local/lib/python3.8/site-packages/Bio/PDB/PDBParser.py", line 100, in get_structure
self._parse(lines)
File "/home/#####/.local/lib/python3.8/site-packages/Bio/PDB/PDBParser.py", line 123, in _parse
self.trailer = self._parse_coordinates(coords_trailer)
File "/home/#####/.local/lib/python3.8/site-packages/Bio/PDB/PDBParser.py", line 198, in _parse_coordinates
resseq = int(line[22:26].split()[0]) # sequence identifier
IndexError: list index out of range
I'm also experiencing this issue. It seems the problem is with the file written by the renumbering step: Biopython is unable to parse this PDB file. For me the file looks like this:
ATOM 8808 HZ PHE B9017X 34.056 40.265 41.472 1.00 0.00 H
ATOM 8809 N THR B 37.287 41.477 35.884 1.00 0.00 N
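The second line here has no resseq entry, which causes the problem for Biopython: line[22:26].split() returns an empty list, so taking element [0] raises the IndexError shown in the tracebacks above.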
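For anyone who wants to check a file before handing it to DockQ, here is a minimal sketch of how the failure can be reproduced and detected. It assumes standard fixed-width PDB columns (resseq in columns 23-26). The helper names `make_atom_line`, `parse_resseq` and `find_blank_resseq` are hypothetical and are not part of DockQ or Biopython; only the expression inside `parse_resseq` is copied from the traceback.

```python
def make_atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Build a fixed-width ATOM record (hypothetical helper, for illustration only)."""
    return (
        f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}"
    )


def parse_resseq(line):
    """Same expression as the PDBParser line in the traceback."""
    return int(line[22:26].split()[0])


def find_blank_resseq(lines):
    """Return (line_number, line) for ATOM/HETATM records with an empty resseq field."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith(("ATOM  ", "HETATM")) and not line[22:26].strip():
            bad.append((lineno, line.rstrip("\n")))
    return bad


# Two records modelled on the lines quoted above: one with a resseq, one without.
good = make_atom_line(8808, "HZ", "PHE", "B", 9017, 34.056, 40.265, 41.472)
bad = make_atom_line(8809, "N", "THR", "B", "", 37.287, 41.477, 35.884)

print(parse_resseq(good))
try:
    parse_resseq(bad)
except IndexError as err:
    print("IndexError:", err)

for lineno, line in find_blank_resseq([good, bad]):
    print(f"line {lineno} has no resseq: {line!r}")
```

To scan a real file, pass an open file handle instead of the list, for example `find_blank_resseq(open("model.pdb.fixed"))`, where the path is a placeholder.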
Hi, check the newly released version of DockQ (v2.0). It now works for me with your attached files. Note that I had to use the new --allowed_mismatches flag, since the two structures don't have identical sequences:
DockQ ~/Downloads/Target50_0000.pdb.txt ~/Downloads/Target50_3r2x.pdb.txt --allowed_mismatches 4
****************************************************************
* DockQ *
* Scoring function for protein-protein docking models *
* Statistics on CAPRI data: *
* 0.00 <= DockQ < 0.23 - Incorrect *
* 0.23 <= DockQ < 0.49 - Acceptable quality *
* 0.49 <= DockQ < 0.80 - Medium quality *
* DockQ >= 0.80 - High quality *
* Ref: S. Basu and B. Wallner, DockQ: A quality measure for *
* protein-protein docking models *
* doi:10.1371/journal.pone.0161879 *
* For comments, please email: bjorn.wallner@.liu.se *
****************************************************************
Model : /home/claudio/Downloads/Target50_0000.pdb.txt
Native : /home/claudio/Downloads/Target50_3r2x.pdb.txt
Total DockQ over 3 native interfaces: 0.977
Native chains: A, B
Model chains: A, B
DockQ_F1: 0.937
DockQ: 0.950
irms: 0.520
Lrms: 0.883
fnat: 0.969
Native chains: A, C
Model chains: A, C
DockQ_F1: 0.014
DockQ: 0.014
irms: 14.961
Lrms: 47.828
fnat: 0.000
Native chains: B, C
Model chains: B, C
DockQ_F1: 0.013
DockQ: 0.013
irms: 16.792
Lrms: 47.373
fnat: 0.000
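As a reading aid for the output above, the quality bands printed in the banner can be applied to the per-interface scores with a few lines of Python. This is only a sketch: `capri_class` is a hypothetical helper, not a DockQ function, and the scores are typed in by hand from the output rather than parsed from it.

```python
def capri_class(dockq):
    """Map a DockQ score to the quality band listed in the DockQ banner."""
    if dockq < 0.23:
        return "Incorrect"
    if dockq < 0.49:
        return "Acceptable quality"
    if dockq < 0.80:
        return "Medium quality"
    return "High quality"


# Per-interface DockQ values copied from the output above.
scores = {("A", "B"): 0.950, ("A", "C"): 0.014, ("B", "C"): 0.013}

for (chain1, chain2), score in scores.items():
    print(f"{chain1}-{chain2}: {score:.3f} -> {capri_class(score)}")

# The reported total matches the sum of the three interface scores.
total = round(sum(scores.values()), 3)
print(f"Total over {len(scores)} interfaces: {total:.3f}")
```

By these bands only the A-B interface is modelled well (high quality); the two interfaces involving chain C fall in the incorrect band.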
Thanks so much! This update looks great; I appreciate the ability to match the sequence differences now!