Way to import cisTEM .star and .mrc files into cryodrgn?
Hello,
I am trying to run cryodrgn on a particle stack created in cisTEM (.star, .mrc). Do you know of any way to convert these file types into inputs that cryodrgn can use?
Thanks!
Have you tried cryodrgn parse_pose_star and cryodrgn parse_ctf_star?
Yes. I get the following error:
Traceback (most recent call last):
File "/home/laina/.local/bin/cryodrgn", line 8, in <module>
sys.exit(main())
File "/home/laina/.local/lib/python3.7/site-packages/cryodrgn/__main__.py", line 72, in main
args.func(args)
File "/home/laina/.local/lib/python3.7/site-packages/cryodrgn/commands/parse_pose_star.py", line 33, in main
s = starfile.Starfile.load(args.input)
File "/home/laina/.local/lib/python3.7/site-packages/cryodrgn/starfile.py", line 46, in load
return cls._parse_block(starfile, block_header="data_")
File "/home/laina/.local/lib/python3.7/site-packages/cryodrgn/starfile.py", line 97, in _parse_block
), f"Error in parsing. Number of columns {words.shape[1]} != number of headers {len(headers)}"
cryoDRGN expects .mrcs files for the particle stack. You could try renaming from .mrc to .mrcs and then correcting the paths the same way in your star file. If that doesn't fix the issue, can you share the top 50 lines or so of your star file? It sounds like there could be something in the formatting or column naming causing this error.
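A minimal Python sketch of the rename-and-repoint suggestion above. The helper names (with_mrcs_suffix, retarget_star_text, convert) are hypothetical and not part of cryoDRGN. The sketch assumes the star file refers to the stack by one exact path string and that only that entry should change, so other entries ending in .mrc (for example micrograph paths) are left alone.

```python
import re
from pathlib import Path


def with_mrcs_suffix(path: str) -> str:
    """Return path with a trailing '.mrc' changed to '.mrcs'."""
    if not path.endswith(".mrc"):
        raise ValueError(f"expected a path ending in .mrc, got {path!r}")
    return path + "s"


def retarget_star_text(star_text: str, stack_path: str) -> str:
    """Rewrite every occurrence of stack_path in star_text to end in .mrcs.

    Only the exact stack path is touched; other .mrc entries are kept.
    """
    # The lookahead stops an entry that already ends in .mrcs
    # from being rewritten to .mrcss on a second run.
    pattern = re.escape(stack_path) + r"(?![A-Za-z0-9])"
    new_path = with_mrcs_suffix(stack_path)
    # A function replacement avoids backslash escapes being interpreted.
    return re.sub(pattern, lambda match: new_path, star_text)


def convert(star_in: str, star_out: str, stack_path: str) -> None:
    """Rename the stack on disk and write a star file that points at it.

    Assumption: stack_path resolves from the current working directory
    and is written the same way inside the star file.
    """
    stack = Path(stack_path)
    stack.rename(stack.with_name(stack.name + "s"))  # x.mrc -> x.mrcs
    text = Path(star_in).read_text()
    Path(star_out).write_text(retarget_star_text(text, stack_path))
```

Here stack_path is the stack entry exactly as it is written in the star file. Writing to a new star file keeps the original untouched in case the rename turns out not to be the cause of the error.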
Here are the top 50 lines, thank you!
# Written by cisTEM Version 2.0.0-alpha-6-8cf890c on 2023-08-30 19:50:00
data_
loop_
_cisTEMPositionInStack #1
_cisTEMAnglePsi #2
_cisTEMAngleTheta #3
_cisTEMAnglePhi #4
_cisTEMXShift #5
_cisTEMYShift #6
_cisTEMDefocus1 #7
_cisTEMDefocus2 #8
_cisTEMDefocusAngle #9
_cisTEMPhaseShift #10
_cisTEMImageActivity #11
_cisTEMOccupancy #12
_cisTEMLogP #13
_cisTEMSigma #14
_cisTEMScore #15
_cisTEMScoreChange #16
_cisTEMPixelSize #17
_cisTEMMicroscopeVoltagekV #18
_cisTEMMicroscopeCsMM #19
_cisTEMAmplitudeContrast #20
_cisTEMBeamTiltX #21
_cisTEMBeamTiltY #22
_cisTEMImageShiftX #23
_cisTEMImageShiftY #24
_cisTEMBest2DClass #25
_cisTEMBeamTiltGroup #26
_cisTEMStackFilename #27
_cisTEMOriginalImageFilename #28
# POS PSI THETA PHI SHX SHY DF1 DF2 ANGAST PSHIFT STAT OCC LogP SIGMA SCORE CHANGE PSIZE VOLT Cs AmpC BTILTX BTILTY ISHFTX ISHFTY 2DCLS TGRP STACK_FILENAME ORIGINAL_IMAGE_FILENAME REFERENCE_3D_FILENAME PaGRP SUBSET PREEXP TOTEXP
1 151.40 125.40 237.00 0.00 0.00 8884.8 9067.5 161.78 0.00 1 1.00 5000 10.0000 50.00 0.00 1.06000 300.00 2.70 0.0700 0.000 0.000 0.000 0.000 0 0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
2 134.00 26.70 5.25 0.00 0.00 8984.8 9167.5 161.78 0.00 1 1.00 5000 10.0000 50.00 0.00 1.06000 300.00 2.70 0.0700 0.000 0.000 0.000 0.000 0 0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
3 204.70 45.50 15.12 0.00 0.00 8624.8 8807.5 161.78 0.00 1 1.00 5000 10.0000 50.00 0.00 1.06000 300.00 2.70 0.0700 0.000 0.000 0.000 0.000 0 0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
4 107.10 131.30 190.18 0.00 0.00 8444.8 8627.5 161.78 0.00 1 1.00 5000 10.0000 50.00 0.00 1.06000 300.00 2.70 0.0700 0.000 0.000 0.000 0.000 0 0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
5 180.80 61.70 297.80 0.00 0.00 8944.8 9127.5 161.78 0.00 1 1.00 5000 10.0000 50.00 0.00 1.06000 300.00 2.70 0.0700 0.000 0.000 0.000 0.000 0 0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
6 196.70 152.80 47.39 0.00 0.00 8684.8 8867.5 161.78 0.00 1 1.00 5000 10.0000 50.00 0.00 1.06000 300.00 2.70 0.0700 0.000 0.000 0.000 0.000 0 0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
7 260.20 96.80 8.45 0.00 0.00 8824.8 9007.5 161.78 0.00 1 1.00 5000 10.0000 50.00 0.00 1.06000 300.00 2.70 0.0700 0.000 0.000 0.000 0.000 0 0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
8 19.30 62.20 74.43 0.00 0.00 8884.8 9067.5 161.78 0.00 1 1.00 5000 10.0000 50.00 0.00 1.06000 300.00 2.70 0.0700 0.000 0.000 0.000 0.000 0 0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
9 184.20 73.70 164.98 0.00 0.00 8244.8 8427.5 161.78 0.00 1 1.00 5000 10.0000 50.00 0.00 1.06000 300.00 2.70 0.0700 0.000 0.000 0.000 0.000 0 0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
10 53.70 112.90 99.32 0.00 0.00 8564.8 8747.5 161.78 0.00 1 1.00 5000 10.0000 50.00 0.00 1.06000 300.00 2.70 0.0700 0.000 0.000 0.000 0.000 0 0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
11 9.10 150.50 298.60 0.00 0.00 8924.8 9107.5 161.78 0.00 1 1.00 5000 10.0000 50.00 0.00 1.06000 300.00 2.70 0.0700 0.000 0.000 0.000 0.000 0 0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
12 308.90 63.10 106.07 0.00 0.00 8384.8 8567.5 161.78 0.00 1 1.00 5000 10.0000 50.00 0.00 1.06000 300.00 2.70 0.0700 0.000 0.000 0.000 0.000 0 0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
13 6.10 77.30 94.34 0.00 0.00 8964.8 9147.5 161.78 0.00 1 1.00 5000 10.0000 50.00 0.00 1.06000 300.00 2.70 0.0700 0.000 0.000 0.000 0.000 0 0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
14 332.20 118.50 203.70 0.00 0.00 8724.8 8907.5 161.78 0.00 1 1.00 5000 10.0000 50.00 0.00 1.06000 300.00 2.70 0.0700 0.000 0.000 0.000 0.000 0 0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
15 12.00 164.30 261.15 0.00 0.00 8064.8 8247.5 161.78 0.00 1 1.00 5000 10.0000 50.00 0.00 1.06000 300.00 2.70 0.0700 0.000 0.000 0.000 0.000 0 0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
16 187.60 117.40 24.51 0.00 0.00 8724.8 8907.5 161.78 0.00 1 1.00 5000 10.0000 50.00 0.00 1.06000 300.00 2.70 0.0700 0.000 0.000 0.000 0.000 0 0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc'
17 17.30 151.80 217.12 0.00 0.00 8484.8 8667.5 161.78 0.00 1 1.00 5000 10.0000 50.00 0.00 1.06000 300.00 2.70 0.0700 0.000 0.000 0.000 0.000 0 0 'data/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW_stack.mrc' '/nrs/lucasb/Lucas_lab/Joshua_2023/BY4741_CHX_Ga+30kVpolish_mrc_correctedSums/2022-06-24_14.15.46_BY4741+CHX_30kV_polish_81_001_correctedSum_DW.mrc
Thanks for sharing! These are definitely different column names than what cryodrgn expects in star files.
Can you try exporting from cisTEM? The export option is under Assets > Refine pkgs. > Export. On the second page of the export pop-up you can select the Relion format.
Thank you Ryan, that was super helpful! I was able to export from cisTEM in the way you described but now am getting this error.
Traceback (most recent call last):
File "/usr/local/EMAN_2.91/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '_rlnAngleRot'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/laina/.local/bin/cryodrgn", line 8, in <module>
sys.exit(main())
File "/home/laina/.local/lib/python3.7/site-packages/cryodrgn/__main__.py", line 72, in main
args.func(args)
File "/home/laina/.local/lib/python3.7/site-packages/cryodrgn/commands/parse_pose_star.py", line 50, in main
euler[:, 0] = s.df["_rlnAngleRot"]
File "/usr/local/EMAN_2.91/lib/python3.7/site-packages/pandas/core/frame.py", line 3024, in __getitem__
indexer = self.columns.get_loc(key)
File "/usr/local/EMAN_2.91/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
raise KeyError(key) from err
KeyError: '_rlnAngleRot'
No problem! It now looks like the _rlnAngleRot column might be missing or not recognized. Can you double-check that the output star file looks normal? You can share the top of that file here if you are unsure.
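One way to check the exported file is to list the column labels in each data block and see which expected ones are absent. The sketch below is a plain-text reader, not cryoDRGN's own parser, and the function names are hypothetical. Only _rlnAngleRot is confirmed by the traceback above; _rlnAngleTilt and _rlnAnglePsi are the standard RELION labels for the other two Euler angles and are included as an assumption about what a pose parser would also need.

```python
from typing import Dict, List, Sequence

# _rlnAngleRot comes from the KeyError in the traceback; the other two
# are standard RELION labels and are assumed, not confirmed, here.
EXPECTED_POSE_LABELS = ("_rlnAngleRot", "_rlnAngleTilt", "_rlnAnglePsi")


def read_star_labels(star_text: str) -> Dict[str, List[str]]:
    """Map each data block name to the labels found inside that block."""
    blocks: Dict[str, List[str]] = {}
    current = None
    for raw in star_text.splitlines():
        line = raw.strip()
        if line.startswith("data_"):
            # Start of a new block, e.g. "data_" or "data_particles".
            current = line
            blocks[current] = []
        elif current is not None and line.startswith("_"):
            # "_rlnAngleRot #2" -> "_rlnAngleRot"
            blocks[current].append(line.split()[0])
    return blocks


def missing_labels(
    star_text: str, expected: Sequence[str] = EXPECTED_POSE_LABELS
) -> List[str]:
    """Return the expected labels that appear in no block of the file."""
    found = {
        label
        for labels in read_star_labels(star_text).values()
        for label in labels
    }
    return [label for label in expected if label not in found]
```

Calling missing_labels on the text of the exported star file returns an empty list when all three labels are present. A non-empty list, or labels that still start with _cisTEM, would suggest the export did not produce Relion-style columns.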