CryoNeFEN: High-resolution real-space reconstruction of cryo-EM structures using a neural field network
CryoNeFEN is a neural network based algorithm for cryo-EM reconstruction. In particular, the method models an isotropic representation of 3D structures using neural fields in the spatial domain.
# clone the repo.
git clone https://github.com/yuehuang2023/cryoNeFEN.git
cd CryoNeFEN
# Make a conda environment.
conda create -n cryonefen python=3.9 pytorch==1.13.0 torchvision==0.14.0 pytorch-cuda=11.6 -c pytorch -c nvidia
conda activate cryonefen
# Install required packages
conda install matplotlib
pip install starfile mrcfile scipy
Note: Please ensure that you have met the prerequisites for PyTorch, the code is tested with pytorch version 1.13, 2.0, and 2.1.
Dependencies (click to expand)
- pytorch
- starfile
- mrcfile
- matplotlib
- scipy
Perform a homogeneous refinement in cryoSPARC software. We will use the poses and CTF parameters from this "consensus reconstruction".
- In cryoSPARC, 1) import the particles, 2) run an ab initio reconstruction job, and 3) run a homogeneous refinement job, all with default parameters.
- CryoNeFEN extracts image poses from a
.cs
file directly. Copy the path of cryoSPARC's metadata file (.cs
file) that contains particle poses and CTF parameters.
When the input image stack (.cs
file) has been prepared, a cryoNeFEN model can be trained with python train.py
:
python train.py -h
usage: train.py [-h] -o OUTDIR [--poses POSES] [--ctf pkl] [--mask mrc] [--split {1,2}] [--load WEIGHTS.PKL] [--checkpoint CHECKPOINT] [--log-interval LOG_INTERVAL] [--seed SEED] [--uninvert-data] [--no-window]
[--window-r WINDOW_R] [--ind IND] [--lazy] [--datadir DATADIR] [-n NUM_EPOCHS] [-b BATCH_SIZE] [--wd WD] [--lr LR] [--norm NORM NORM] [--layers LAYERS] [--dim DIM] [--l-extent L_EXTENT]
[--pe-type {geom_ft,geom_full,geom_lowf,geom_nohighf,linear_lowf,gaussian,none}] [--pe-dim PE_DIM] [--activation {relu,leaky_relu}]
particles
positional arguments:
particles Input particles (.mrcs, .star, .cs, or .txt)
optional arguments:
-h, --help show this help message and exit
-o OUTDIR, --outdir OUTDIR
Output directory to save model
--poses POSES Image poses (.pkl)
--ctf pkl CTF parameters (.pkl)
--mask mrc Optional mask (.mrc, default: sphere mask)
--split {1,2} Split dataset for computing GSFSC
--load WEIGHTS.PKL Initialize training from a checkpoint
--checkpoint CHECKPOINT
Checkpointing interval in N_EPOCHS (default: 1)
--log-interval LOG_INTERVAL
Logging interval in N_IMGS (default: 100)
--seed SEED Random seed
--symmetry SYMMETRY Symmetry for training
Dataset loading:
--uninvert-data Do not invert data sign
--no-window Turn off real space windowing of dataset
--window-r WINDOW_R Windowing radius (default: 0.85)
--ind IND Filter particle stack by these indices
--lazy Lazy loading if full dataset is too large to fit in memory
--datadir DATADIR Path prefix to particle stack if loading relative paths from a .star or .cs file
Training parameters:
-n NUM_EPOCHS, --num-epochs NUM_EPOCHS
Number of training epochs (default: 20)
-b BATCH_SIZE, --batch-size BATCH_SIZE
Minibatch size (default: 4)
--wd WD Weight decay in Adam optimizer (default: 0)
--lr LR Learning rate in Adam optimizer (default: 0.001)
--norm NORM NORM Data normalization as shift, 1/scale (default: 0, 1)
Network Architecture:
--layers LAYERS Number of hidden layers (default: 2)
--dim DIM Number of nodes in hidden layers (default: 256)
--l-extent L_EXTENT Coordinate lattice size (if not using positional encoding) (default: 0.5)
--pe-type {geom_ft,geom_full,geom_lowf,geom_nohighf,linear_lowf,gaussian,none}
Type of positional encoding (default: geom_ft)
--pe-dim PE_DIM Number of sinusoid features in positional encoding (default: 32)
--activation {relu,leaky_relu}
Activation (default: relu)
The required arguments are:
particles
, an input image stack (.cs
or other listed file types)-o
, a clean output directory for storing results
Additional parameters that are typically set include:
-n
, Number of epochs to train-b
, Batchsize of the image stack during the training--lazy
, Lazy loading if the full dataset is too large--mask
, Mask for accelerating the training--symmetry
, Enforce symmetry during training--split
, Split the image stack randomly
Neural network architecture settings
--layers
, Number of hidden layers--dim
, Number of hidden dims--pe-dim
, Number of sinusoid features in positional encoding
If the golden standard Fourier shell correlation (GSFSC) is required in further benchmarking, run commands:
python train.py {cryoSPARC directory}/xxx_particles.cs --mask {cryoSPARC directory}/xxx_volume_mask_refine.mrc --lazy --outdir ./tutorial/ --split 1
python train.py {cryoSPARC directory}/xxx_particles.cs --mask {cryoSPARC directory}/xxx_volume_mask_refine.mrc --lazy --outdir ./tutorial/ --split 2
Notes:
{cryoSPARC directory}/xxx_particles.cs
is the.cs
file processed in step 1.{cryoSPARC directory}/xxx_volume_mask_refine.mrc
is the mask refined by cryoSPARC.
We strongly recommend using a mask to accelerate the training. In cryoSPARC, the refined mask can be found from the web interface as the "mask_refine" output.
Once the model has finished training, the generated density maps are saved in outdir
for further visualization, and analysis.
GSFSC of final results can be computed with python analysis.py
:
python analysis.py -h
usage: analysis.py [-h] [--mask mrc] volumes
positional arguments:
volumes Half-maps directory (.mrc)
optional arguments:
-h, --help show this help message and exit
--mask mrc FSC mask (.mrc)
Example usage:
python analysis.py ./tutorial/ --mask {cryoSPARC directory}/xxx_volume_mask_refine.mrc
Masked FSC curves and reconstructed maps will be plotted.
Heterogeneous reconstruction can be trained with python train_heter.py
:
Example usage:
python train_heter.py {cryoSPARC directory}/xxx_particles.cs --mask mask.mrc --zdim 8 --lazy --outdir ./tutorial/
Notes:
- The applied mask should contain all the heterogeneous particle volumes. A spherical mask is the default.
- Heterogeneous reconstruction requires more GPU memory and training time than standard cryoNeFEN.
After the training, the heterogeneous density maps can be generated with commands in the file analysis_heter.ipynb
.
Trained models and reconstructed maps for EMPIAR-10005, EMPIAR-10049, EMPIAR-10076, EMPIAR-10492 are deposited here.
Huang, Y., Zhu, C., Yang, X. et al. High-resolution real-space reconstruction of cryo-EM structures using a neural field network. Nat Mach Intell (2024). https://doi.org/10.1038/s42256-024-00870-2