ml-struct-bio/cryodrgn

Way to import cisTEM .star and .mrc files into cryodrgn?

Closed this issue · 7 comments

Hello,

I am trying to run cryodrgn on a particle stack created in cisTEM (.star, .mrc). Do you know of any way to convert these file types into inputs that cryodrgn can use?

Thanks!

Have you tried cryodrgn parse_pose_star and cryodrgn parse_ctf_star?

Yes. I get the following error:

Traceback (most recent call last):
  File "/home/laina/.local/bin/cryodrgn", line 8, in <module>
    sys.exit(main())
  File "/home/laina/.local/lib/python3.7/site-packages/cryodrgn/__main__.py", line 72, in main
    args.func(args)
  File "/home/laina/.local/lib/python3.7/site-packages/cryodrgn/commands/parse_pose_star.py", line 33, in main
    s = starfile.Starfile.load(args.input)
  File "/home/laina/.local/lib/python3.7/site-packages/cryodrgn/starfile.py", line 46, in load
    return cls._parse_block(starfile, block_header="data")
  File "/home/laina/.local/lib/python3.7/site-packages/cryodrgn/starfile.py", line 97, in _parse_block
    ), f"Error in parsing. Number of columns {words.shape[1]} != number of headers {len(headers)}"

cryoDRGN expects .mrcs files for the particle stack. You could try renaming from .mrc to .mrcs and then correcting the paths the same way in your star file. If that doesn't fix the issue, can you share the top 50 lines or so of your star file? It sounds like there could be something in the formatting or column naming causing this error.
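The rename-and-fix-paths suggestion could be scripted roughly as follows. This is a minimal sketch and not part of cryoDRGN: `rewrite_mrc_paths` and `link_as_mrcs` are hypothetical helper names, and the pattern assumes each path ends in `.mrc` followed by a quote, whitespace, or end of line. Symlinking instead of renaming keeps the original stack usable from cisTEM.

```python
import re
from pathlib import Path

# Matches ".mrc" only where a path ends: before a quote, whitespace, or end of line.
# Paths that already end in ".mrcs" are left alone.
_MRC_SUFFIX = re.compile(r"\.mrc(?=['\"\s]|$)", re.MULTILINE)


def rewrite_mrc_paths(star_text: str) -> str:
    """Return star-file text with every path ending in .mrc changed to .mrcs."""
    return _MRC_SUFFIX.sub(".mrcs", star_text)


def link_as_mrcs(stack: Path) -> Path:
    """Create a .mrcs symlink next to a .mrc stack, leaving the original in place."""
    target = stack.with_suffix(".mrcs")
    if not target.exists():
        target.symlink_to(stack.resolve())
    return target
```

Note that the pattern rewrites every path ending in `.mrc`, so in a file with more than one filename column it changes all of them; narrow it if only the stack column should be touched.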

Here are the top 50 lines, thank you!

# Written by cisTEM Version 2.0.0-alpha-6-8cf890c on 2023-08-30 19:50:00
data_

loop_
_cisTEMPositionInStack #1
_cisTEMAnglePsi #2
_cisTEMAngleTheta #3
_cisTEMAnglePhi #4
_cisTEMXShift #5
_cisTEMYShift #6
_cisTEMDefocus1 #7
_cisTEMDefocus2 #8
_cisTEMDefocusAngle #9
_cisTEMPhaseShift #10
_cisTEMImageActivity #11
_cisTEMOccupancy #12
_cisTEMLogP #13
_cisTEMSigma #14
_cisTEMScore #15
_cisTEMScoreChange #16
_cisTEMPixelSize #17
_cisTEMMicroscopeVoltagekV #18
_cisTEMMicroscopeCsMM #19
_cisTEMAmplitudeContrast #20
_cisTEMBeamTiltX #21
_cisTEMBeamTiltY #22
_cisTEMImageShiftX #23
_cisTEMImageShiftY #24
_cisTEMBest2DClass #25
_cisTEMBeamTiltGroup #26
_cisTEMStackFilename #27
_cisTEMOriginalImageFilename #28
#    POS     PSI   THETA     PHI       SHX       SHY      DF1      DF2  ANGAST  PSHIFT  STAT     OCC      LogP      SIGMA   SCORE  CHANGE    PSIZE    VOLT      Cs    AmpC  BTILTX  BTILTY  ISHFTX  ISHFTY 2DCLS  TGRP                                      STACK_FILENAME                             ORIGINAL_IMAGE_FILENAME                               REFERENCE_3D_FILENAME    PaGRP  SUBSET  PREEXP  TOTEXP
       1  151.40  125.40  237.00      0.00      0.00   8884.8   9067.5  161.78    0.00     1    1.00      5000    10.0000   50.00    0.00  1.06000  300.00    2.70  0.0700   0.000   0.000   0.000   0.000     0     0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
       2  134.00   26.70    5.25      0.00      0.00   8984.8   9167.5  161.78    0.00     1    1.00      5000    10.0000   50.00    0.00  1.06000  300.00    2.70  0.0700   0.000   0.000   0.000   0.000     0     0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
       3  204.70   45.50   15.12      0.00      0.00   8624.8   8807.5  161.78    0.00     1    1.00      5000    10.0000   50.00    0.00  1.06000  300.00    2.70  0.0700   0.000   0.000   0.000   0.000     0     0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
       4  107.10  131.30  190.18      0.00      0.00   8444.8   8627.5  161.78    0.00     1    1.00      5000    10.0000   50.00    0.00  1.06000  300.00    2.70  0.0700   0.000   0.000   0.000   0.000     0     0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
       5  180.80   61.70  297.80      0.00      0.00   8944.8   9127.5  161.78    0.00     1    1.00      5000    10.0000   50.00    0.00  1.06000  300.00    2.70  0.0700   0.000   0.000   0.000   0.000     0     0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
       6  196.70  152.80   47.39      0.00      0.00   8684.8   8867.5  161.78    0.00     1    1.00      5000    10.0000   50.00    0.00  1.06000  300.00    2.70  0.0700   0.000   0.000   0.000   0.000     0     0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
       7  260.20   96.80    8.45      0.00      0.00   8824.8   9007.5  161.78    0.00     1    1.00      5000    10.0000   50.00    0.00  1.06000  300.00    2.70  0.0700   0.000   0.000   0.000   0.000     0     0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
       8   19.30   62.20   74.43      0.00      0.00   8884.8   9067.5  161.78    0.00     1    1.00      5000    10.0000   50.00    0.00  1.06000  300.00    2.70  0.0700   0.000   0.000   0.000   0.000     0     0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
       9  184.20   73.70  164.98      0.00      0.00   8244.8   8427.5  161.78    0.00     1    1.00      5000    10.0000   50.00    0.00  1.06000  300.00    2.70  0.0700   0.000   0.000   0.000   0.000     0     0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
      10   53.70  112.90   99.32      0.00      0.00   8564.8   8747.5  161.78    0.00     1    1.00      5000    10.0000   50.00    0.00  1.06000  300.00    2.70  0.0700   0.000   0.000   0.000   0.000     0     0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
      11    9.10  150.50  298.60      0.00      0.00   8924.8   9107.5  161.78    0.00     1    1.00      5000    10.0000   50.00    0.00  1.06000  300.00    2.70  0.0700   0.000   0.000   0.000   0.000     0     0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
      12  308.90   63.10  106.07      0.00      0.00   8384.8   8567.5  161.78    0.00     1    1.00      5000    10.0000   50.00    0.00  1.06000  300.00    2.70  0.0700   0.000   0.000   0.000   0.000     0     0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
      13    6.10   77.30   94.34      0.00      0.00   8964.8   9147.5  161.78    0.00     1    1.00      5000    10.0000   50.00    0.00  1.06000  300.00    2.70  0.0700   0.000   0.000   0.000   0.000     0     0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
      14  332.20  118.50  203.70      0.00      0.00   8724.8   8907.5  161.78    0.00     1    1.00      5000    10.0000   50.00    0.00  1.06000  300.00    2.70  0.0700   0.000   0.000   0.000   0.000     0     0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
      15   12.00  164.30  261.15      0.00      0.00   8064.8   8247.5  161.78    0.00     1    1.00      5000    10.0000   50.00    0.00  1.06000  300.00    2.70  0.0700   0.000   0.000   0.000   0.000     0     0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
      16  187.60  117.40   24.51      0.00      0.00   8724.8   8907.5  161.78    0.00     1    1.00      5000    10.0000   50.00    0.00  1.06000  300.00    2.70  0.0700   0.000   0.000   0.000   0.000     0     0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
      17   17.30  151.80  217.12      0.00      0.00   8484.8   8667.5  161.78    0.00     1    1.00      5000    10.0000   50.00    0.00  1.06000  300.00    2.70  0.0700   0.000   0.000   0.000   0.000     0     0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc

Thanks for sharing! These are definitely different column names than what cryodrgn expects in star files.
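To see how the header labels and data rows line up in a file like the one above, a small stand-alone check can help. This is a minimal sketch that is independent of cryoDRGN's own parser: `summarize_star` is a hypothetical helper, and it assumes a single `loop_` block, labels that start with `_`, and file paths without whitespace.

```python
def summarize_star(text: str):
    """Return (labels, field_counts) for the loop_ block of a star file.

    labels       : column labels in header order, without the trailing "#N" index
    field_counts : set of whitespace-separated field counts seen in the data rows
    """
    labels = []
    field_counts = set()
    in_loop = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines such as a column legend
        if line == "loop_":
            in_loop = True
        elif in_loop and line.startswith("_"):
            labels.append(line.split()[0])
        elif in_loop and labels:
            # Assumption: paths contain no spaces, so a plain split is enough.
            field_counts.add(len(line.split()))
    return labels, field_counts
```

Comparing `len(labels)` with `field_counts` shows whether every row has one field per label, and `[name for name in labels if not name.startswith("_rln")]` lists the labels that do not follow RELION naming. For the header shown above, that is all 28 of them.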

Can you try exporting from cisTEM? The export option is under Assets > Refine pkgs. > Export. On the second page of the export pop-up you can select RELION format.

Thank you, Ryan, that was super helpful! I was able to export from cisTEM the way you described, but now I am getting this error:

Traceback (most recent call last):
  File "/usr/local/EMAN_2.91/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '_rlnAngleRot'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/laina/.local/bin/cryodrgn", line 8, in <module>
    sys.exit(main())
  File "/home/laina/.local/lib/python3.7/site-packages/cryodrgn/__main__.py", line 72, in main
    args.func(args)
  File "/home/laina/.local/lib/python3.7/site-packages/cryodrgn/commands/parse_pose_star.py", line 50, in main
    euler[:, 0] = s.df["_rlnAngleRot"]
  File "/usr/local/EMAN_2.91/lib/python3.7/site-packages/pandas/core/frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/usr/local/EMAN_2.91/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
    raise KeyError(key) from err
KeyError: '_rlnAngleRot'

No problem! It now looks like the _rlnAngleRot column might be missing or not recognized. Can you double-check that the output star file looks normal? If you are unsure, you can share the top of that file here.
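One way to check the exported file is to list its column labels and look for the one named in the `KeyError`. This is a minimal sketch using only the standard library. `star_labels` and `missing_columns` are hypothetical helpers, and apart from `_rlnAngleRot` (taken from the traceback) the labels in `EXPECTED` are assumptions based on standard RELION naming, so adjust them to whatever your cryoDRGN version reads.

```python
# "_rlnAngleRot" comes from the KeyError above; the other two are assumed.
EXPECTED = ("_rlnAngleRot", "_rlnAngleTilt", "_rlnAnglePsi")


def star_labels(text: str):
    """Collect every column label (a line starting with "_") from a star file."""
    return [
        line.split()[0]  # drop a trailing "#N" index if present
        for line in text.splitlines()
        if line.strip().startswith("_")
    ]


def missing_columns(text: str, expected=EXPECTED):
    """Return the expected labels that do not appear in the star file."""
    present = set(star_labels(text))
    return [name for name in expected if name not in present]
```

Calling `missing_columns` on the contents of the exported file returns an empty list when all expected labels are present. The helper gathers labels from every block in the file, so it does not tell you which block a label belongs to.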