lhatsk/AlphaLink

Running "predict_with_crosslinks.py" with "restraints.csv --distograms" gives the error

BelyaevaJuly opened this issue · 6 comments

Dear AlphaLink developers,

I am trying to run "predict_with_crosslinks.py" as follows:

# Running
predict_with_crosslinks.py $FASTA_FILE restraints.csv --distograms $UNIREF90_PATH $MGNIFY_PATH $PDB70_PATH $MMCIF_PATH $UNICLUST30_PATH --features features.pkl --checkpoint_path $ALPHALINK_WEIGHTS

where "restraints.csv" looks like:

55,236,35.0,0.5,normal
236,311,26.0,1.5,normal

$ALPHALINK_WEIGHTS corresponds to <'PATH'>finetuning_model_5_ptm_CACA_10A.pt.
'features.pkl' is the output file of an AlphaFold2 run on my protein.

I receive the following error message:

Traceback (most recent call last):
  File "<'PATH'>/predict_with_crosslinks.py", line 550, in <module>
    main(args)
  File "<'PATH'>/predict_with_crosslinks.py", line 367, in main
    model, output_directory = load_models_from_command_line(args, config)
  File "<'PATH'>/predict_with_crosslinks.py", line 270, in load_models_from_command_line
    model.load_state_dict(sd)
  File "<'PATH'>/python3.9/site-packages/torch/nn/modules/module.py", line 1482, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for AlphaFold:
        size mismatch for xl_embedder.linear.weight: copying a param with shape torch.Size([128, 1]) from checkpoint, the shape in current model is torch.Size([128, 128]).

Could you please clarify where my mistake is?

lhatsk commented

There are two problems. First, you are trying to use the distogram settings with the "normal" network. If you use the --distograms flag, you have to use the finetuning_model_5_ptm_distogram.pt network weights. Second, you would need to transform your restraints.csv first into a distogram. This can be done with the preprocessing_distributions.py script, see https://github.com/lhatsk/AlphaLink#crosslinking-data
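The size mismatch in the traceback can be caught before a prediction is launched. The following is a minimal sketch, not part of AlphaLink. It assumes the checkpoint is a plain state dict and that `xl_embedder.linear.weight` (the key named in the error message) has shape (out_features, in_features), as for a `torch.nn.Linear` layer. The helper names `restraint_mode` and `check_flags`, and the reading of an input width of 1 as the "normal" network, are illustrative assumptions based only on the two shapes in the error.

```python
# Hypothetical helpers, not part of AlphaLink.
XL_KEY = "xl_embedder.linear.weight"  # key name taken from the traceback


def restraint_mode(shapes):
    """Guess which kind of restraints a checkpoint expects.

    `shapes` maps parameter names to shape tuples. With PyTorch it could be
    built as {k: tuple(v.shape) for k, v in torch.load(path).items()},
    assuming the file holds a plain state dict.
    """
    out_features, in_features = shapes[XL_KEY]
    # Assumption from the error message: an input width of 1 means one value
    # per residue pair, a wider input means one value per distogram bin.
    return "distogram" if in_features > 1 else "normal"


def check_flags(shapes, use_distograms):
    """Return a message if the weights and the --distograms setting disagree."""
    mode = restraint_mode(shapes)
    wanted = "distogram" if use_distograms else "normal"
    if mode != wanted:
        return f"weights expect {mode} restraints, but the run is set up for {wanted}"
    return None


# The combination from the original report: [128, 1] weights with --distograms.
print(check_flags({XL_KEY: (128, 1)}, use_distograms=True))
```

A check like this only compares shapes; it does not verify that the restraint file itself is in the matching format.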

I was eventually able to run AlphaLink on the provided test example.
I will try further on my own case.

Thank you!

Dear AlphaLink developer, sorry again.

I managed to run AlphaLink on the test case with crosslink restraints that look like this (it failed with a space-separated file):

52,148,0.05
55,148,0.05
78,148,0.05
129,164,0.05

Now I am trying to run the code with the --distograms flag.
I performed the following steps:

  1. created file rest.csv:
52,148,15.0,5.0,normal
54,151,20.0,5.0,normal
124,152,25.0,6.0,normal

  2. launched preprocessing_distributions.py and obtained distr.csv:
52,148,0.000502613590671492,0.00130679533574588,0.00130679533574588,0.00100522718134298,0.00211097708082027,0.00170888620828307,0.00241254523522316,0.00291515882589465,0.00321672698029755,0.00462404503417772,0.00432247687977483,0.00361881785283474,0.00502613590671492,0.00542822677925211,0.00874547647768395,0.00763972657820667,0.00804181745074387,0.00995174909529554,0.0101527945315641,0.0106554081222356,0.0109569762766385,0.0121632488942501,0.0131684760755931,0.0129674306393245,0.0152794531564133,0.0155810213108162,0.018194611982308,0.0185967028548452,0.017189384800965,0.0201045436268597,0.021210293526337,0.0205066344993969,0.021913952553277,0.0237233614796944,0.0206071572175312,0.0220144752714113,0.0211097708082027,0.0254322476879775,0.0279453156413349,0.0250301568154403,0.023924406915963,0.02291917973462,0.0245275432247688,0.0280458383594692,0.0231202251708886,0.0265379975874548,0.0223160434258142,0.0206071572175312,0.0228186570164857,0.0225170888620828,0.0214113389626055,0.0210092480900684,0.0188982710092481,0.0208082026537998,0.0165862484921592,0.0150784077201448,0.0164857257740249,0.0154804985926819,0.0127663852030559,0.0109569762766385,0.0115601125854443,0.0111580217129071,0.00753920386007238,0.00944913550462404,0.00944913550462404,0.00753920386007238,0.00693606755126659,0.00733815842380378,0.00572979493365501,0.00392038600723764,0.00472456775231202,0.00462404503417772,0.00170888620828307,0.00402090872537193,0.00201045436268597,0.00301568154402895,0.00221149979895456,0.00211097708082027,0.00221149979895456,0.00100522718134298,0.000703659026940088,0.00140731805388018,0.000402090872537193,0.000703659026940088,0.00100522718134298,0.000804181745074387,0.000100522718134298,0.000201045436268597,0.000301568154402895,0.000201045436268597,0.000100522718134298,0,0,0.000100522718134298,0.000100522718134298,0,0,0.000301568154402895,0,0,0,0,0,0,0,0.000100522718134298,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
54,151,0.0001,0,0.0002,0.0001,0,0,0.0001,0.0003,0.0003,0.0005,0.0004,0.0006,0.0005,0.0007,0.0007,0.0007,0.0013,0.0012,0.0015,0.0016,0.0012,0.0023,0.0024,0.0028,0.002,0.0035,0.0043,0.0041,0.0082,0.0062,0.006,0.0064,0.0063,0.0092,0.0092,0.0106,0.0135,0.0108,0.0151,0.0143,0.0155,0.0171,0.0178,0.0199,0.0172,0.0189,0.0198,0.0221,0.0205,0.0236,0.0215,0.0212,0.0241,0.0239,0.0258,0.0229,0.0239,0.0249,0.0232,0.0248,0.0252,0.0242,0.0234,0.024,0.0232,0.0214,0.0206,0.019,0.019,0.0168,0.0162,0.0174,0.0157,0.0126,0.014,0.0101,0.0135,0.0116,0.0109,0.009,0.0095,0.0058,0.0076,0.0075,0.0055,0.0057,0.0032,0.0043,0.0032,0.0026,0.0028,0.0029,0.0016,0.0013,0.0019,0.0011,0.0016,0.0008,0.0017,0.0005,0.0003,0.0004,0.0004,0.0004,0.0003,0.0002,0.0001,0.0003,0.0002,0.0002,0.0002,0,0,0.0001,0,0.0001,0,0,0,0,0,0.0001,0,0,0,0,0,0
124,152,0,0,0.000100240577385726,0,0,0,0,0.000100240577385726,0.000200481154771451,0.000200481154771451,0,0.000200481154771451,0.000200481154771451,0,0.000200481154771451,0.000200481154771451,0.000400962309542903,0.000200481154771451,0.000200481154771451,0.000601443464314354,0.00070168404170008,0.000601443464314354,0.000801924619085806,0.000801924619085806,0.000601443464314354,0.000902165196471532,0.00120288692862871,0.000801924619085806,0.00130312750601443,0.00160384923817161,0.00160384923817161,0.00240577385725742,0.00200481154771451,0.00300721732157177,0.00250601443464314,0.00370890136327185,0.0035084202085004,0.00330793905372895,0.00421010425020048,0.00471130713712911,0.00501202886928629,0.00571371291098637,0.00721732157177225,0.00761828388131516,0.00651563753007217,0.00801924619085806,0.00831996792301524,0.00972333600641539,0.0100240577385726,0.00982357658380112,0.0110264635124298,0.0110264635124298,0.0134322373696873,0.0118283881315156,0.0107257417802727,0.0136327185244587,0.0138331996792302,0.0147353648757017,0.0145348837209302,0.0163392141138733,0.0176423416198877,0.0172413793103448,0.0177425821972735,0.0191459502806736,0.0203488372093023,0.0195469125902165,0.0188452285485164,0.019747393744988,0.020850040096231,0.0216519647153168,0.0241579791499599,0.0203488372093023,0.0216519647153168,0.0211507618283881,0.0191459502806736,0.0205493183640738,0.0224538893344026,0.0211507618283881,0.0183440256615878,0.0189454691259022,0.0168404170008019,0.0196471531676022,0.0180433039294306,0.0176423416198877,0.0203488372093023,0.0165396952686447,0.0173416198877306,0.0163392141138733,0.0155372894947875,0.0129310344827586,0.0117281475541299,0.0123295910184443,0.0132317562149158,0.011327185244587,0.0106255012028869,0.0106255012028869,0.00872093023255814,0.00801924619085806,0.00831996792301524,0.00721732157177225,0.00721732157177225,0.00691659983961508,0.00651563753007217,0.00471130713712911,0.00461106655974338,0.00571371291098637,0.00431034482758621,0.0035084202085004,0.0035084202085004,0.00280673616680032,0.00370890136327185,0.00230553327987169,0.00260625501202887,0.00300721732157177,0.00200481154771451,0.00180433039294306,0.00130312750601443,0.00100240577385726,0.00150360866078589,0.000902165196471532,0.000601443464314354,0.00070168404170008,0.00070168404170008,0.000300721732157177,0.000300721732157177,0.000501202886928629,0.000601443464314354,0.000100240577385726

Then I executed:

ALPHALINK_WEIGHTS=/software/databases/alphalink/finetuning_model_5_ptm_distogram.pt
FASTA_FILE=qna<...>/test/CDK.fasta

predict_with_crosslinks.py $FASTA_FILE distr.csv --distograms $UNIREF90_PATH $MGNIFY_PATH $PDB70_PATH $MMCIF_PATH $UNICLUST30_PATH --features CDK_neff10.pkl --checkpoint_path $ALPHALINK_WEIGHTS

And got the following error:

INFO:/software/all/AlphaLink/1.0-foss-2021a-CUDA-11.3.1/bin/predict_with_crosslinks.py:Loaded OpenFold parameters at /software/databases/alphalink/finetuning_model_5_ptm_distogram.pt...
Traceback (most recent call last):
  File "/software/all/AlphaLink/1.0-foss-2021a-CUDA-11.3.1/bin/predict_with_crosslinks.py", line 550, in <module>
    main(args)
  File "/software/all/AlphaLink/1.0-foss-2021a-CUDA-11.3.1/bin/predict_with_crosslinks.py", line 394, in main
    crosslinks, grouping = load_crosslinks(args.crosslinks, args.fdr, seq)
  File "/software/all/AlphaLink/1.0-foss-2021a-CUDA-11.3.1/bin/predict_with_crosslinks.py", line 304, in load_crosslinks
    for i_, (i,j) in enumerate(links):
ValueError: too many values to unpack (expected 2)

I tried both space-separated and comma-separated .csv files, but got errors with each.
With the space-separated distr.csv I got the following error:

ValueError: could not convert string to float: '52 148 0.000502613590671492 0.00130679533574588 0.00130679533574588 0.00100522718134298 0.00211097708082027 0.00170888620828307 0.00241254523522316 0.00291515882589465 0.00321672698029755 0.00462404503417772 0.00432247687977483 0.00361881785283474 0.00502613590671492 0.00542822677925211 0.00874547647768395 0.00763972657820667 0.00804181745074387 0.00995174909529554 0.0101527945315641 0.0106554081222356 0.0109569762766385 0.0121632488942501 0.0131684760755931 0.0129674306393245 0.0152794531564133 0.0155810213108162 0.018194611982308 0.0185967028548452 0.017189384800965 0.0201045436268597 0.021210293526337 0.0205066344993969 0.021913952553277 0.0237233614796944 0.0206071572175312 0.0220144752714113 0.0211097708082027 0.0254322476879775 0.0279453156413349 0.0250301568154403 0.023924406915963 0.02291917973462 0.0245275432247688 0.0280458383594692 0.0231202251708886 0.0265379975874548 0.0223160434258142 0.0206071572175312 0.0228186570164857 0.0225170888620828 0.0214113389626055 0.0210092480900684 0.0188982710092481 0.0208082026537998 0.0165862484921592 0.0150784077201448 0.0164857257740249 0.0154804985926819 0.0127663852030559 0.0109569762766385 0.0115601125854443 0.0111580217129071 0.00753920386007238 0.00944913550462404 0.00944913550462404 0.00753920386007238 0.00693606755126659 0.00733815842380378 0.00572979493365501 0.00392038600723764 0.00472456775231202 0.00462404503417772 0.00170888620828307 0.00402090872537193 0.00201045436268597 0.00301568154402895 0.00221149979895456 0.00211097708082027 0.00221149979895456 0.00100522718134298 0.000703659026940088 0.00140731805388018 0.000402090872537193 0.000703659026940088 0.00100522718134298 0.000804181745074387 0.000100522718134298 0.000201045436268597 0.000301568154402895 0.000201045436268597 0.000100522718134298 0 0 0.000100522718134298 0.000100522718134298 0 0 0.000301568154402895 0 0 0 0 0 0 0 0.000100522718134298 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0'

Without --distograms, using the "FDR xl restraints", everything works on the test example.

Could you please check again where the problem may be?

distr.csv should be space-separated, not comma-separated.

Are you by chance using an old version of AlphaLink? The load_crosslinks function call in your error message no longer exists and doesn't include the distogram flag. It's unfortunate, but it looks like the "paper release" commit was before the distogram flag. The code release v1.0 on the other hand should contain it. But if you can, you should update to the newest version with git pull.

One more thing: if you need to use the databases, they now have to be passed as named arguments, e.g., --uniref90_database_path uniref90.fasta --mgnify_database_path ... etc. See the README.
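For a distogram file that was written with commas, the separator can be changed without touching the values. This is a small sketch under assumptions: the function name `commas_to_spaces` is hypothetical, and the row layout (two residue indices followed by the bin values) is taken from the distr.csv shown earlier in this thread.

```python
import io


def commas_to_spaces(src, dst):
    """Copy rows from `src` to `dst` with single spaces as separators.

    Returns the number of columns in each row so the caller can confirm
    that all rows have the same layout.
    """
    widths = []
    for line in src:
        fields = line.replace(",", " ").split()
        if not fields:
            continue  # skip blank lines
        # Fail early on malformed rows: two integer residue indices,
        # then floating-point bin values.
        int(fields[0])
        int(fields[1])
        for value in fields[2:]:
            float(value)
        dst.write(" ".join(fields) + "\n")
        widths.append(len(fields))
    return widths


# Small in-memory demonstration. With real files one would pass
# open("distr.csv") and open("dist.csv", "w") instead.
demo_in = io.StringIO("52,148,0.25,0.75\n54,151,0.5,0.5\n")
demo_out = io.StringIO()
print(commas_to_spaces(demo_in, demo_out))
print(demo_out.getvalue(), end="")
```

Rows that are already space-separated pass through unchanged, so running it twice is harmless.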

Together with our system administrator, we managed to install what is (probably) the correct version you described above.
With it, I got the following error:

info:/software/all/AlphaLink/1.0.122023-foss-2021a-CUDA-11.3.1/bin/predict_with_crosslinks.py:Loaded OpenFold parameters at /software/databases/alphalink/finetuning_model_5_ptm_distogram.pt...
info:/software/all/AlphaLink/1.0.122023-foss-2021a-CUDA-11.3.1/bin/predict_with_crosslinks.py:Loaded 3 restraints...
info:/software/all/AlphaLink/1.0.122023-foss-2021a-CUDA-11.3.1/bin/predict_with_crosslinks.py:Running inference for sp|P24941|CDK2_HUMAN...
info:/software/all/AlphaLink/1.0.122023-foss-2021a-CUDA-11.3.1/bin/predict_with_crosslinks.py:Inference time: 29.482414484024048
info:/software/all/AlphaLink/1.0.122023-foss-2021a-CUDA-11.3.1/bin/predict_with_crosslinks.py:Output written to /work/users/ig962nbia/progs/afexplorer/sfb_retreat_agt2r1_data/december_2023/alpha_link/test_00/test3/predictions/sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_unrelaxed.pdb...
Traceback (most recent call last):
  File "/software/all/AlphaLink/1.0.122023-foss-2021a-CUDA-11.3.1/bin/predict_with_crosslinks.py", line 570, in <module>
    main(args)
  File "/software/all/AlphaLink/1.0.122023-foss-2021a-CUDA-11.3.1/bin/predict_with_crosslinks.py", line 469, in main
    relaxed_pdb_str, _, _ = amber_relaxer.process(prot=unrelaxed_protein)
  File "/software/all/AlphaLink/1.0.122023-foss-2021a-CUDA-11.3.1/bin/openfold/np/relax/relax.py", line 61, in process
    out = amber_minimize.run_pipeline(
  File "/software/all/AlphaLink/1.0.122023-foss-2021a-CUDA-11.3.1/bin/openfold/np/relax/amber_minimize.py", line 528, in run_pipeline
    ret.update(get_violation_metrics(prot))
  File "/software/all/AlphaLink/1.0.122023-foss-2021a-CUDA-11.3.1/bin/openfold/np/relax/amber_minimize.py", line 394, in get_violation_metrics
    structural_violations, struct_metrics = find_violations(prot)
  File "/software/all/AlphaLink/1.0.122023-foss-2021a-CUDA-11.3.1/bin/openfold/np/relax/amber_minimize.py", line 373, in find_violations
    violations = loss.find_structural_violations_np(
  File "/software/all/AlphaLink/1.0.122023-foss-2021a-CUDA-11.3.1/bin/openfold/utils/loss.py", line 1177, in find_structural_violations_np
    out = find_structural_violations(batch, atom14_pred_positions, **config)
  File "/software/all/AlphaLink/1.0.122023-foss-2021a-CUDA-11.3.1/bin/openfold/utils/loss.py", line 1099, in find_structural_violations
    restype_atom14_bounds = residue_constants.make_atom14_dists_bounds(
  File "/software/all/AlphaLink/1.0.122023-foss-2021a-CUDA-11.3.1/bin/openfold/np/residue_constants.py", line 1225, in make_atom14_dists_bounds
    residue_bonds, residue_virtual_bonds, _ = load_stereo_chemical_props()
  File "/software/all/AlphaLink/1.0.122023-foss-2021a-CUDA-11.3.1/bin/openfold/np/residue_constants.py", line 456, in load_stereo_chemical_props
    with open(stereo_chemical_props_path, "rt") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'openfold/resources/stereo_chemical_props.txt'

It was fixed as follows:

# Edited openfold/np/residue_constants.py: added "import os" and changed the
# stereo_chemical_props path so that it is resolved relative to the module file
# instead of the current working directory.
import os
stereo_chemical_props_path = '../resources/stereo_chemical_props.txt'
stereo_chemical_props_path = os.path.join(os.path.dirname(__file__), stereo_chemical_props_path)
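The idea behind this fix, resolving a resource relative to the module that needs it rather than relative to the working directory, can be sketched on its own. The helper name `resource_path` and the `/opt/pkg` layout are hypothetical and only serve the illustration; inside a real module the first argument would be `__file__`.

```python
import os


def resource_path(module_file, relative):
    """Resolve `relative` against the directory that contains `module_file`."""
    base = os.path.dirname(os.path.abspath(module_file))
    # normpath collapses the ".." so the result is a clean path.
    return os.path.normpath(os.path.join(base, relative))


# A literal path stands in for __file__ so the example runs anywhere.
print(resource_path("/opt/pkg/openfold/np/residue_constants.py",
                    "../resources/stereo_chemical_props.txt"))
```

Because the result no longer depends on where the script is started from, the same installation works from any working directory.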

After that, the following code was executed successfully:

ALPHALINK_WEIGHTS=/software/databases/alphalink/finetuning_model_5_ptm_distogram.pt
FASTA_FILE=<...>/test/CDK.fasta

predict_with_crosslinks.py $FASTA_FILE dist.csv --distograms --uniref90_database_path $UNIREF90_PATH --mgnify_database_path $MGNIFY_PATH --pdb70_database_path $PDB70_PATH --uniclust30_database_path $UNICLUST30_PATH --features CDK_neff10.pkl --checkpoint_path $ALPHALINK_WEIGHTS

I got the following output files:

  1. 'sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_output_dict.pkl'
  2. 'sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_relaxed.pdb'
  3. 'sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_unrelaxed.pdb'

And dist.csv is now really space-separated :)

Great! You might need to copy the stereo_chemical_props.txt file to the openfold/resources folder or, if you installed it in some kind of environment, to the respective folder in that environment. But setting the path directly obviously also works.