[stability-pipeline] residue mismatch when PDB is cleaned

Question

Closed this issue 2 years ago · 11 comments

If residues have missing atoms, the PDB-clean step removes them (e.g. residues 22-28 in the example). The check uses the pre-cleaning information, so a mismatch can arise. This was observed with a prism file as input for an MP call; it likely also occurs for soluble proteins.

Bug in the main pipeline. The error is raised by this check:

File "/lustre/hpc/sbinlab/tiemann/repos/PRISM/PRISM/software/rosetta_ddG_pipeline/structure_input.py", line 224, in make_mutfiles
check = self.fasta_seq[residue_number_ros-1] in list(
IndexError: string index out of range

preceded by these mismatch warnings:

INFO:Pipeline_logger:Convert prism file: /groups/sbinlab/tiemann/projects/PRISM/debug-files/issue24/output/input/prism_mave_input.txt
2020-09-28 11:28:26 - WARNING - MissmatchK, 57,W
WARNING:Pipeline_logger:MissmatchK, 57,W
2020-09-28 11:28:26 - WARNING - MissmatchG, 143,W
WARNING:Pipeline_logger:MissmatchG, 143,W

The problem occurs during the conversion of the prism file to a mutfile: it uses a wrongly aligned converter (built from the checking step, not from the cleaned structure).
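To illustrate the failure mode, here is a minimal sketch with toy data, not the pipeline's actual code. The function names (`build_resnum_to_index`, `wildtype_matches`) and the toy sequences are hypothetical. It assumes the list of residue numbers that survive cleaning is available; indexing the cleaned sequence with a pre-cleaning residue number can run past the end of the string, whereas a lookup built from the surviving residue numbers cannot.

```python
def build_resnum_to_index(kept_resnums):
    """Map each surviving residue number to its 0-based index in the cleaned sequence."""
    return {resnum: index for index, resnum in enumerate(kept_resnums)}


def wildtype_matches(cleaned_seq, kept_resnums, resnum, expected_wt):
    """Return True only if the residue survived cleaning and has the expected wild type."""
    index = build_resnum_to_index(kept_resnums).get(resnum)
    if index is None:
        return False  # residue was removed by cleaning, so there is nothing to compare
    return cleaned_seq[index] == expected_wt


# Toy data (hypothetical): 10 residues, cleaning removes residues 4-6.
original_seq = "ACDEFGHIKL"
kept_resnums = [1, 2, 3, 7, 8, 9, 10]
cleaned_seq = "ACDHIKL"

# Naive lookup with the pre-cleaning number 9 overruns the 7-residue cleaned sequence.
try:
    cleaned_seq[9 - 1]
except IndexError as err:
    print("naive lookup failed:", err)

# Lookup through the mapping finds residue 9 at index 5 of the cleaned sequence.
print(wildtype_matches(cleaned_seq, kept_resnums, 9, "K"))  # True
```

With such a mapping, a variant at a removed residue can be reported explicitly instead of producing a spurious mismatch warning or an IndexError.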

Example call:

python /groups/sbinlab/tiemann/repos/PRISM/PRISM/software/rosetta_ddG_pipeline/run_pipeline.py \
    --structure /groups/sbinlab/tiemann/projects/PRISM/debug-files/issue36/1bxw-clean.pdb \
    --mutate_mode prism \
    --prism /groups/sbinlab/tiemann/projects/PRISM/debug-files/issue36/prism_mave_103_OmpA_ecoli_unfolding_dG-stability_MP_tmp.txt \
    --outputpath /groups/sbinlab/tiemann/projects/PRISM/debug-files/issue36/output \
    --mode fullrun --chainid A --is_membrane True --mp_calc_span_mode DSSP --mp_align_ref 1bxw_A \
    --mp_prep_align_mode OPM --benchmark_mp_repack 8.0 --benchmark_mp_repeat 5 --benchmark_mp_relax_repeat 1 \
    --benchmark_mp_relax_strucs 1 --slurm_partition sbinlab --overwrite_path True

Answer 1 · 2020-09-28T09:34:51.000Z

@andershbf this likely needs a discussion about the checks. We could avoid this by using refined PDB files (e.g. from PDB-REDO).

Answer 2 · 2020-09-28T09:37:32.000Z

Actually it should only remove residues where backbone atoms are missing - could you check what exactly happens?
In any case, the safest is to keep a record of the coordinate sequence as Rosetta reads it, so e.g. after relax, and use that for resfile generation.
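A rough sketch of recording the coordinate sequence from a structure file, e.g. the one written after relax. This is plain Python over the fixed-column ATOM records, not Rosetta's own reader, and the function name `coordinate_sequence` is hypothetical. The rule used here (keep a residue only if N, CA and C are all present) is an assumption that approximates the behaviour described above; non-standard residues are written as X.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}
BACKBONE = frozenset({"N", "CA", "C"})


def coordinate_sequence(pdb_lines, chain_id="A"):
    """Return (sequence, residue_numbers) for residues with a complete N/CA/C backbone."""
    resnames = {}  # (resnum, insertion code) -> residue name, in file order
    backbone = {}  # (resnum, insertion code) -> backbone atom names seen
    for line in pdb_lines:
        if not line.startswith("ATOM") or len(line) < 26 or line[21] != chain_id:
            continue
        key = (int(line[22:26]), line[26:27].strip())
        resnames.setdefault(key, line[17:20].strip())
        atom_name = line[12:16].strip()
        if atom_name in BACKBONE:
            backbone.setdefault(key, set()).add(atom_name)
    # Keep only residues whose backbone atom set covers N, CA and C.
    kept = [key for key in resnames if backbone.get(key, set()) >= BACKBONE]
    sequence = "".join(THREE_TO_ONE.get(resnames[key], "X") for key in kept)
    return sequence, [resnum for resnum, _icode in kept]
```

Called on an open file handle for the relaxed structure, this returns both the sequence and the residue numbers it came from, which is the information needed to generate resfiles or mutfiles against the coordinates that are actually present.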

Answer 3 · 2020-09-28T10:22:31.000Z

I didn't check all of them, but it definitely removes residues where the bb is there. @andershbf wrote the check script, so it makes more sense that he looks into it and removes or adds the lookup dicts that are unused or wrong. Happy to assist and discuss what is best to do!

Answer 4 · 2020-09-28T11:07:44.000Z

Thanks. It's sort of an independent issue, but if there are residues removed you think are fine (well-defined bb atoms) that seems wrong. Could you post this in Rosetta Slack, with a specific example?

Answer 5 · 2020-09-30T08:24:23.000Z

I would first like to check, either with Anders or by myself when I have more time, whether this is caused by something within our pipeline.

Answer 6 · 2020-09-30T08:32:12.000Z

Sure - it doesn't seem urgent, especially as long as we don't have external users. Just look into it whenever it becomes a problem.

Answer 7 · 2020-09-30T08:37:07.000Z

It might become a problem that goes unnoticed - so everyone who uses the pipeline should check their relax/output structure!

Answer 8 · 2020-09-30T08:50:13.000Z

That's generally true, certainly at this stage :) Please put such a note in the README if it's not there already.
It might be good to have the pipeline write the sequence of the coordinates after relax to an easily checked location.

Answer 9 · 2020-09-30T09:10:15.000Z

Will do!
A sequence file is written, but not one that is aligned with the input sequence - so you can't easily see if something is missing.

Add note to README
Align input and output PDB sequences
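One possible way to do the second item, sketched with Python's standard `difflib` rather than a proper sequence aligner. The function name `sequence_differences` and the toy sequences are hypothetical. It reports every stretch where the input and output sequences differ, using 1-based positions in the input sequence.

```python
from difflib import SequenceMatcher


def sequence_differences(input_seq, output_seq):
    """List (tag, start, end, input_part, output_part) for each non-matching stretch.

    start and end are 1-based positions in input_seq; tag is one of
    'delete', 'insert' or 'replace' as reported by difflib.
    """
    matcher = SequenceMatcher(None, input_seq, output_seq, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            differences.append((tag, i1 + 1, i2, input_seq[i1:i2], output_seq[j1:j2]))
    return differences


# Toy example (hypothetical sequences): residues 4-6 are missing from the output.
print(sequence_differences("ACDEFGHIKL", "ACDHIKL"))
# [('delete', 4, 6, 'EFG', '')]
```

An empty list means the two sequences are identical. Note that `difflib` matches longest common blocks heuristically, so for sequences with repeats the reported stretch may not be the biologically sensible one.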

Answer 10 · 2020-09-30T09:40:36.000Z

Thanks! I think it's fine to just have the plain sequence, then one can quickly check if it's identical - if yes, all is fine (wrt mutfiles and such at least). If not, what to do will depend on what exactly the issue is, which we will probably need to figure out on a case-by-case basis for now.

Answer 11 · 2022-03-10T12:03:00.000Z

The issue is solved: missing atoms are now added, the initial sequence alignment and the final output are checked (via the prism parser), and a note has been added to the README.