keitaroyam/servalcat

Consecutive non-neighbouring nucleotides get joined (when part of the sequence is not modelled)

Closed this issue · 20 comments

Hi,

We are routinely using servalcat for our refinements and modelling tasks, so far the experience has been great!

One thing I noticed though is that nucleotides from the same chain get connected when there is a gap in between (when we have removed part of the sequence). This does not happen for protein chains, so I am assuming its maybe a bug? In the case shown in the picture the residues were deliberately quite far away before the refinement, so the refinement really moved one nucleotide in to do this (I initially assumed that close nucleotides get merged in this way).

nucleotide-joining

Bonus question: Is there a way to link the nascent chain to the tRNA (which we often encounter in our ribosome structures) and refine them together? It would be super nice if these could also be moved in Coot, but just having a refined model for deposition with the link intact would already help us tremendously.

Thanks for reading and best wishes,

Helge

Thank you for your feedback!

One thing I noticed though is that nucleotides from the same chain get connected when there is a gap in between (when we have removed part of the sequence).

This is quite strange. Perhaps your pdb file has a LINK record (_struct_conn record in case of mmcif) for these residues? What is the distance between O3' and P before the refinement? Please also let me know the version of Refmac5 and Servalcat.

Is there a way to link the nascent chain to the tRNA

You mean aminoacylation, right? Actually another person requested this recently. I think we should add this link to the monomer library. For now, could you try this file? AA-RNA_link.zip You can give it to servalcat via --ligand and read it into Coot from "File - Import CIF dictionary". You need Coot 0.9.8.4 (just recently released, as part of CCP4 8.0.005).

Hi,

I checked the .pdb file and could not find LINK records for these residues. I have also deleted them one by one and the effect is the same, so I think its not related to this?

I am using servalcat 0.2.116 and CCP4 8.0.001: Refmac version 5.8.0350 : 05/11/22

For the distances I have checked two locations where this is happening:

  1. 14.76A distance before refinement.

Before Refinement:
1-before

After Refinement:
1-after

  1. 16.02A before refinement.

Before Refinement:
2-before

After Refinement:
2-after

For the aminoacylation: Thanks for the restraint file! We will test this out (I have Coot 0.9.6 on my work laptop, as the never version did not appear to have any restraints, even for standard nucleotides...). I will check with some colleagues who run the newest Coot version.

That Refmac version had a bug in linking. It should not have happened even with that bug, but it is worth trying the latest version. So please apply the latest CCP4 update.

If it will not solve your problem, could you share some files here or privately? Servalcat/Refmac log files (servalcat.log and shifted_refined.log) and part of model files before/after the refinement (all header part and ~four residues) would be sufficient.

I tried updating, but only CCP4 version 8.0.004 was available. Will get back to you when the newer version is released (I managed to get Coot 9.8.3 working now so I assume it will also work once 9.8.4 is installed... then I can also look into the tRNA-NC linking).

Sorry, it seems 8.0.005 has not been officially released yet (it will come soon hopefully). But 8.0.003-005 do not include Refmac updates, so 004 is fine.

I now ran the refinement with Refmac 5.8.0352 and the linking is still occuring. I managed to create a minimal example from the published PDB 7K00 by opening the .cif from the PDB in Coot 0.9.8.3 and deleting three stretches of RNA
Chain A 439-446
Chain a 1724-1731
Chain a 2099-2197

For this model I could not find half-maps, thats why I just used two copies of the deposited map (I assume this does not create problems for the issue at hand?).

After refinement the gaps are merged, as observed before for my own model.

There is output in the log about wrong bond distances for these regions (among other ones inherent in the model):

Bond distance deviations from the ideal >10.000Sigma will be monitored

A 204 G O3' . - A 214 C P . mod.=17.407 id.= 1.606 dev=-15.801 sig.= 0.010 Symmetry flag = N
A 438 U O3' . - A 447 G P . mod.=23.678 id.= 1.606 dev=-22.072 sig.= 0.010 Symmetry flag = N
A 840 C O3' . - A 846 G P . mod.=15.615 id.= 1.606 dev=-14.009 sig.= 0.010 Symmetry flag = N
K 119 IAS C . - K 120 GLY N . mod.= 4.468 id.= 1.337 dev= -3.131 sig.= 0.011 Symmetry flag = N
Y 15 G O3' . - Y 18 G P . mod.=10.132 id.= 1.606 dev= -8.526 sig.= 0.010 Symmetry flag = N
a 276 U O3' . - a 426 C P . mod.=33.879 id.= 1.606 dev=-32.273 sig.= 0.010 Symmetry flag = N
a 1052 C O3' . - a 1107 G P . mod.=17.840 id.= 1.606 dev=-16.234 sig.= 0.010 Symmetry flag = N
a 1172 C O3' . - a 1177 G P . mod.=15.179 id.= 1.606 dev=-13.573 sig.= 0.010 Symmetry flag = N
a 1723 G O3' . - a 1732 C P . mod.=17.715 id.= 1.606 dev=-16.109 sig.= 0.010 Symmetry flag = N
a 2098 U O3' . - a 2191 A P . mod.=18.113 id.= 1.606 dev=-16.507 sig.= 0.010 Symmetry flag = N
a 6016 MG MG . - k 99 ASN OD1 . mod.= 2.299 id.= 1.990 dev= -0.309 sig.= 0.020 Symmetry flag = N
a 6113 MG MG . - d 115 GLY O . mod.= 2.243 id.= 1.990 dev= -0.253 sig.= 0.020 Symmetry flag = N
l 81 4D4 CZ . - l 81 4D4 NH2 . mod.= 1.432 id.= 1.322 dev= -0.110 sig.= 0.010 Symmetry flag = N

You can find the models before and after refinement plus the logs and the map here:
https://wolke.chemie.uni-hamburg.de/s/Ysid5jp4iJoP5zi

Thank you very much! Very useful. We will look into it. Let me check one thing - in your case you used pdb file (not mmcif) as input?

By the way, when (only when!) half maps are not available, you can use --map instead of --halfmaps. Giving two identical maps to --halfmaps should cause problems (I should add code to check this).

Let me check one thing - in your case you used pdb file (not mmcif) as input?

Yes I did, although I checked once using .cif (by saving it in Coot as such). In my model I have chains with two-letter chain IDs, this was a problem for phenix cryo-EM vaildation where I had to supply the .cif file output from servalcat. Its on the list to change the chain IDs to one letter down the line.

Thank you again for looking into the issue, I hope its nothing too complicated.

OK thanks. I think we can provide a solution shortly. I thought phenix supported PDB with two-letter chain IDs (called hybrid-36 extension) so it sounds strange.. By the way, Servalcat internally converts PDB to mmcif in case two-letter chain IDs are there, because Refmac does not support them. This is just an internal thing you do not have to worry (FYI).

Now CCP4 8.0.005 and Coot 0.9.8.5 are released, so I hope aminoacylation works with the cif file above.

I added a workaround in Servalcat 0.2.132. It is not available from PyPI, so you need to install it from github:

pip install -U git+https://github.com/keitaroyam/servalcat.git

Could you test it with your structure? This Refmac bug happened when

  • mmcif file is given to Refmac (that includes Refmac-incompatible pdb file as explained above), and
  • polymer chain includes non-polymer residue (in 7k00 test case, PAR residue in chain A)

so I added a workaround to change chain IDs of non-polymer residues and change them back after Refmac.

One more thing: 7k00 has IAS and it seems to cause another problem in Refmac. This can be avoided by defining "gap" link. If PDB file, just put this to header:

LINKR            IAS K 119                     GLY K 120                gap

If mmcif file, following kind of thing is needed.

loop_
_struct_conn.id
_struct_conn.conn_type_id
_struct_conn.ptnr1_auth_asym_id
_struct_conn.ptnr1_label_asym_id
_struct_conn.ptnr1_label_comp_id
_struct_conn.ptnr1_label_seq_id
_struct_conn.ptnr1_label_atom_id
_struct_conn.pdbx_ptnr1_label_alt_id
_struct_conn.ptnr1_auth_seq_id
_struct_conn.pdbx_ptnr1_PDB_ins_code
_struct_conn.ptnr1_symmetry
_struct_conn.ptnr2_auth_asym_id
_struct_conn.ptnr2_label_asym_id
_struct_conn.ptnr2_label_comp_id
_struct_conn.ptnr2_label_seq_id
_struct_conn.ptnr2_label_atom_id
_struct_conn.pdbx_ptnr2_label_alt_id
_struct_conn.ptnr2_auth_seq_id
_struct_conn.pdbx_ptnr2_PDB_ins_code
_struct_conn.ptnr2_symmetry
_struct_conn.details
_struct_conn.pdbx_dist_value
_struct_conn.ccp4_link_id
? . K K IAS 119 ? ? 119 ? 1_555 K K GLY 120 ? ? 120 ? ? ? ? gap

This bug has been fixed in Refmac5.8.0382 and will be hopefully released from the next CCP4 update.

Unfortunately, with my model the linking is still happening. Here are some excerpts from my servalcat.log:

# Servalcat ver. 0.2.132 (Python 3.10.5)
# Library vers. gemmi 0.5.7, scipy 1.9.0, numpy 1.23.2, pandas 1.4.3
# Started on 2022-10-17 11:20:25.810533
# Host: ws01 User: agwilson
# Command-line args:
# refine_spa --model Coot-76.cif --resolution 2.4 --halfmaps ../data/run_half1_class001_unfil.mrc ../data>/run_half2_class001_unfil.mrc --ncycle 10 --mask_for_fofc ../data/mask.mrc --ligand ../restraints/acedrg_* --weight_auto_scale=0.7

**** Bond distance outliers ****

Bond distance deviations from the ideal >10.000Sigma will be monitored

B 216 LEU C . - B 217 TYR N . mod.= 1.943 id.= 1.337 dev= -0.606 sig.= 0.011 Symmetry flag = N
1 437 G O3' . - 1 495 G P . mod.= 8.388 id.= 1.606 dev= -6.782 sig.= 0.010 Symmetry flag = N
1 1951 C O3' . - 1 2095 G P . mod.= 8.715 id.= 1.606 dev= -7.109 sig.= 0.010 Symmetry flag = N
1 2445 A O3' . - 1 2497 U P . mod.=13.078 id.= 1.606 dev=-11.472 sig.= 0.010 Symmetry flag = N
1 4241 K K . - L3 3 HIS O . mod.= 3.088 id.= 2.790 dev= -0.298 sig.= 0.020 Symmetry flag = N
C4 135 ARG C . - C4 136 ARG N . mod.= 1.094 id.= 1.337 dev= 0.243 sig.= 0.011 Symmetry flag = N

My rRNA chain here has the ID "1", could that be part of the problem?

For the aminoacylation issue: I could not get the new Coot standalone to work on my computer (which is still running Ubuntu 18.04); there seems to be some libraries missing. Normally, I use Coot from CCP4 which always seems to fix these kind of issues for me. Do you know when the new Coot release will be included in CCP4?

Well, chain ID "1" should be ok; I tested changing chain ID to "1" in 7k00 case and it worked. Did you see Refmac workaround (nonpolymer-fix) ... message in servalcat.log?

Have the number of problematic links decreased? (For example it happened in many chains before but now only with chain 1?) If only happened with chain 1, how many non-RNA residues in the same chain? Could you try changing chain ID of non-RNA residues?

Aminoacylation: could you try CCP4 8.0.005? Then Coot 0.9.8.4 will be there.

Did you see Refmac workaround (nonpolymer-fix) ... message in servalcat.log?

I have not seen this message in any run,yet

I managed to set up a minimal example with my own chain1 and a map from the EMDB. First, I took my model and extracted only chain1 from it. If this is refined alone the issues does not arise. Then I consecutively removed the other chains from the original model to see which part causes the issue. As it turns out only keeping one extra atom does it!

I have now refined two models: 1) Just Chain1, as a negative control 2) Chain1 plus 1 atom which shows the issue for me.

You can find the files here:

https://wolke.chemie.uni-hamburg.de/s/6qCQFB7BoMb4KYQ

It seems that with EMDB-26034 the nucleotides are not moved completely on top of each other as before, but they are moved towards each other nevertheless (e. g. C1951-G2095 or G437-G495).

Regarding the Aminoacylation: I have added this LINKR record to my file:

LINKR O3' A Y 76 C VAL V 8 AA-RNA

When I open my model in Coot 0.9.8.4 and load the .cif restraint you provided, the atoms are not linked and move apart, both in chain refine and sphere refine (what is the correct way to do this in Coot? I assume sphere refine?).

I then tried to refine using servalcat and here I see that the link is recognized:

Link detected: AA-RNA atom1= V/VAL 8/C atom2= Y/A 76/O3' dist= 1.37 ideal= 1.28

To prepare the linking, I had already removed the excess atoms before I introdruced the LINKR record, I assume that is why servalcat mentions this:

WARNING: atom(s) not found for link: atom1= A/ -4565266/O3.' atom2= L/VA -4565298/C id=

So in the future, I do not have to manually remove atoms, just adding the LINKR record is enough?

The good news is that the link is still there after servalcat refinement! It just that when I open it in Coot, it does not persist (as mentioned above; A fix by you is in the changelog for Coot 0.9.8.5, maybe this addresses this issue?). Anyway, this is already great news so thank you again for the effort, we really appreciate it!

I really appreciate your detailed feedback! It turned out that my workaround was not sufficient, which was only enabled when mmcif format was given, but in your case the program decided to use mmcif (because of two-letter chain ID) after that procedure. I changed the code and it should work now. Please try the latest Servalcat 0.2.133.

If you are using Linux, could you also try the latest (pre-release) Refmac refmac5.8.0386.zip which includes several bug fixes? You can specify the location of the binary using --exe option of refine_spa (do not forget to add the execute permission after extraction, with chmod +x refmac5.8.0386).

Aminoacylation: You do not have to add LINKR record manually, but need to remove the excess atom (OXT) in any case. In my understanding Coot does not use LINK(R) for restraint generation (just for drawing dotted lines.. I suppose). Most likely your LINKR header is misaligned which is the reason you saw the WARNING message with wrong chain/residue/atom names.

I tested Coot 0.9.8.4 and 0.9.8.5, and found RNA and amino acid should be in the same chain to form the link. In this version, inter-chain disulfide bond problem was fixed, but unfortunately it does not seem to allow inter-chain RNA/amino acid links..

It works now, I get the Refmac workaround (nonpolymer-fix) ... message and the nucleotides stay where they should!
I tried using the new Refmac, but this required a libfortran.so.3 or similar which is not so easy to get under Ubuntu 20.04...

Aminoacylation: I renumbered the nascent chain by an offset of 1000 and then merged it into the tRNA chain, afterwards the refinement in Coot is indeed working! Thank you again for the prompt fixes! I guess I can refine the model this way and then move the nascent chain back to its own chain before deposition?

While looking into the last servalcat.log, I noticed something else that I would like to ask you. It is about restraints that I created according to your earlier suggestion from the PDB Expo .cif files using acedrg. I see now in the servalcat.log that one of these seems to have some bad bond angles, that also do not improve after the final iteration:

eI 51 5CT H23 - eI 51 5CT H24 mod.= 0.00 id.= 109.03 dev=109.030 sig.= 2.410
eI 51 5CT H23 - eI 51 5CT H25 mod.= 0.00 id.= 109.03 dev=109.030 sig.= 2.410
eI 51 5CT H24 - eI 51 5CT H25 mod.= 0.00 id.= 109.03 dev=109.030 sig.= 2.410

I was wondering what I should do here, of if I can ignore this.

Good. Thanks for your patience! I will try to resolve the libfortran issue but hopefully it will be officially released from CCP4 soon..

I guess I can refine the model this way and then move the nascent chain back to its own chain before deposition?

It does not seem unusual to have them in the same chain, even with contiguous residue numbers (e.g. 1vq8 and 5afi). So perhaps you may leave it.

eI 51 5CT H23 - eI 51 5CT H24 mod.= 0.00 id.= 109.03 dev=109.030 sig.= 2.410
eI 51 5CT H23 - eI 51 5CT H25 mod.= 0.00 id.= 109.03 dev=109.030 sig.= 2.410
eI 51 5CT H24 - eI 51 5CT H25 mod.= 0.00 id.= 109.03 dev=109.030 sig.= 2.410

I assume 5CT is a part of polymer chain. H23, H24, and H25 are hydrogen atoms attached to N (nitrogen in main chain), while their names should be H, H2, and H3 in standard "peptide" monomers (and H2 and H3 are removed upon linking, and N becomes sp2). This is why 5CT is classified as NON-POLYMER. In this case, automatic linking between 5CT and other peptides is suboptimal, because programs do not know what to do for these hydrogen atoms. In the current Refmac, generation of link-involved hydrogen atoms does not work properly.

This is not too harmful, although these hydrogen atoms would contribute to geometry. To eliminate these outliers, you need to create link restraints between 5CT and peptide using AceDRG and give it to Servalcat/Refmac.

This "NON-POLYMER" peptide linking is a long standing problem, and we are trying to solve it (actually just starting).

I think, I might have discovered a similar issue to what I had described here for nucleotides with the IAS-link that you fixed before. I have succesfully used servalcat for refining this link in the past, but now in a new model it is always, I think, trying to move the "C" to make a bond instead of the C-gamma.

The model before refinement:

IAS-1

After the refinement it looks like this:

IAS-linking-Issue

The log mentions something along the lines:

Bond distance deviations from the ideal >10.000Sigma will be monitored

K 117 PRO C . - K 118 HIS N . mod.= 1.932 id.= 1.337 dev= -0.595 sig.= 0.011 Symmetry flag = N
K 118 HIS CA . - K 118 HIS C . mod.= 1.719 id.= 1.533 dev= -0.186 sig.= 0.010 Symmetry flag = N
K 119 IAS C . - K 120 GLY N . mod.= 4.367 id.= 1.337 dev= -3.030 sig.= 0.011 Symmetry flag = N

This is from a model where I already manually tried to reposition the residues at reasonable distances, but it also keeps happening when I copy and replace this region from the parental model PDB-7K00.

Sorry about this. This is actually a bug in Refmac from CCP4 8.0.002. We need to solve the libgfortran problem to give you the latest Refmac build.
The workaround was actually explained after "One more thing" of #6 (comment). Could you try this for now?

I am sure the new Refmac will be included in the next CCP4 update, but don't know when it will happen..

Its working with the fix you mentioned above. I had forgotten about it (was in "yeast" mode at the time, now I am back to E. coli ;) ).

Glad it worked out. You need it every time so it is far from ideal. We will try to release the new version as soon as possible.