A simple app to fetch acquisition metadata from a EPU session or SerialEM. It parses the first found xml/mdoc/mrc/tif file (from EPU/SerialEM) associated with a data collection session and launches Relion 4 or Scipion 3 pipeline.
You can install either using pip (recommended) or from sources.
Dependencies
Dependencies are installed by pip automatically:
- python
- pyqt6 (GUI)
- mrcfile (to parse MRC header)
- tifffile (to parse TIF header)
- emtable (STAR file parser)
Install from pip
pip install --user MDCatch
Install from sources
Create conda env (requires miniconda3 installed):
conda create -n mdcatch python=3
conda activate mdcatch
git clone https://github.com/azazellochg/MDCatch.git
cd MDCatch
pip install -e .
To run simply type mdcatch.
Important
Make sure the detected dose per frame is correct! The reported dose is obtained from an image (at the camera level), so it can differ due to sample thickness, obj. aperture and energy filtering. If you are collecting EER data, the reported dose is per EER frame! EER movies will be fractionated such that final frames will have 1 e/A2.
Here you can find information about how the app works and how to configure it for your setup.
General information
The app is installed on a pre-processing server with GPU(s). The server requires the following software installed:
- RELION 4.0 or/and Scipion 3
- CTFFIND4
- Topaz or/and crYOLO 1.9+ (installed in a separate conda environment)
Relion and/or Scipion should be available from your shell PATH. For Relion's schemes you also need to define the following variables:
export RELION_SCRATCH_DIR="/ssd/$USER"
export RELION_CTFFIND_EXECUTABLE=/home/gsharov/soft/ctffind
export RELION_TOPAZ_EXECUTABLE=/home/gsharov/soft/topaz
export RELION_PYTHON=/home/gsharov/soft/miniconda3/envs/topaz-0.2.4/bin/python # is used by Relion's PyTorch for 2D cls sorting
/home/gsharov/soft/topaz is a bash script like below, that activates topaz environment:
#!/bin/bash
source /home/gsharov/soft/miniconda3/bin/activate topaz-0.2.4
topaz $@
If you are using crYOLO, you need to edit a few variables at the top of external_job_cryolo.py file. This script can also be used completely independently from MDCatch.
Additionally, this server needs access to both EPU session folder (with metadata files) and raw movies folder. In our case both storage systems are mounted via NFSv4.
Configuration
Most of the configuration is done in config.py. For the very first time it is useful to set DEBUG=1 to see additional output and make sure it all works as expected.
Important points to mention:
- camera names in the SCOPE_DICT must match the names in EPU_MOVIES_DICT, GAIN_DICT and MTF_DICT
- since in EPU Falcon cameras are called "BM-Falcon" or "EF-Falcon" and Gatan cameras are called "EF-CCD", MOVIE_PATH_DICT keys should not be changed, only the values
- Relion schemes use two GPUs: 0-1
Below is an example of the folders setup on our server. Data points to movies storage, while Metadata is for EPU sessions.
/mnt
├── Data
│ ├── Krios1
│ │ ├── Falcon3
│ │ └── K3 (with DoseFractions folder inside)
│ ├── Krios2
│ │ ├── Falcon4
│ │ └── K2 (with DoseFractions folder inside)
│ ├── Krios3
│ │ ├── Falcon3
│ │ └── K3 (with DoseFractions folder inside)
│ ├── Krios4
│ │ └── Falcon4
│ └── Glacios
│ └── Falcon3
└── MetaData
├── Krios1
├── Krios2
├── Krios3
└── Krios4
Working principle
- find and parse the first metadata file, getting all acquisition metadata
- create a Relion/Scipion project folder
username_microscope_date_time
inside PROJECT_PATH (or inside Scipion default projects folder) - create symlink for movies folder; copy gain reference, defects file, MTF into the project folder
- save found acquisition params in a text file (e.g.
EPU_session_params
), save Relion params inrelion_it_options.py
- modify existing Relion Schemes/Scipion template, copy them to the project folder then launch Relion/Scipion on-the-fly processing
While EPU xml files are most rich in terms of needed metadata, other formats can be used as well. If you set PATTERN_EPU to mrc format, the app will try to parse MRC header of unaligned movie sums in the EPU session folder. However we cannot detect number of movie frames and super-resolution mode from such a header, so you would need to check and input correct pixel size and/or fluence per frame.
In case of SerialEM, mdoc file is expected to contain a microscope D-number (see example in tests/testdata). If you set PATTERN_SEM to tif, the TIF header of a movie will be parsed. Unfortunately SerialEM does not save much metadata in such header, so a lot of values will be missing. Default values will be used for microscope ID, detector, voltage and binning (see utils/tiff.py). So, parsing tif is not recommended. EER header parsing is also possible, but again, it's just a special kind of TIF format.
When choosing EPU option, the user must browse to the EPU session folder (that contains Images-Disc folder) with the GUI. The app will search and parse the first found xml or mrc file from that folder (depending on PATTERN_EPU). The metadata folder name (EPU session name) matches the folder name with movies on a storage server.
In case of SerialEM, the movies and metadata (mdoc file) are expected to be in the same folder, so here user must select a folder with movies in the GUI.
From MDCatch v2.2 onwards crYOLO picker can be run in helical mode (crYOLO v1.9+ required). Instead of a particle size, user provides the filament width. A pre-trained crYOLO model is also required. The suggested parameters in this case are:
- tube diameter = 1.2 x filament width
- box size = 1.5 x tube diameter
- mask size = 0.9 x box size
- inter-box distance = 0.1 x box size
When running standard SPA, the suggested parameters are:
- box size = 1.5 x particle size
- mask size = 1.1 x particle size
More details can be found in the code, see calcBox() inside parser.py
So far RELION runs are more tested than Scipion. In the latter case, the app provides a single template.json, so irrespective of particle picker choice crYOLO will always be used. Have a look into the json file to see what pipeline will be launched.
Scipion project will be created in the default Scipion projects folder.
Relion schemes description
There are two schemes: prep and proc-cryolo (or proc-topaz). The latter is available in 3 variants: cryolo, topaz and log. Both schemes launched at the same time and will run for 18 hours
The prep scheme includes 3 jobs that run in a loop, processing batches of 50 movies each time:
- import movies
- motion correction (relion motioncor)
- ctffind4-4.1.14
Important
The movie frames will be grouped if the dose per frame is < 0.8 e/A2. EER movies are fractionated such that final frames have 1 e/A2.
The proc scheme starts once ctffind results are available. Proc includes multiple jobs:
- micrograph selection (CTF resolution < 6A)
- particle picking: Cryolo (proc-cryolo) or Topaz/Logpicker (proc-topaz)
- binned particles extraction
- 2D classification with 50 classes
- auto-selection of good 2D classes (thr=0.35)
- 3D initial model if number of good particles from previous step is > 5000
- 3D refinement
The last four steps are always executed as new jobs (not overwriting previous results).
Testing installation
The test only checks if the parsers are working correctly using files from tests/testdata folder.
python -m unittest mdcatch.tests
The MDCatch package provides extra command-line scripts to parse MRC, XML, MDOC or TIF file headers. Simply use one of the commands below followed by a filename:
- parse-mrc filename.mrc
- parse-xml filename.xml
- parse-mdoc filename.mdoc
- parse-tif filename.tiff
Kimanius D, Dong L, Sharov G, Nakane T, Scheres SHW. New tools for automated cryo-EM single-particle analysis in RELION-4.0. Biochem J. 2021, 478(24), p. 4169-4185. doi:10.1042/BCJ20210708
Please report bugs and suggestions for improvements as a Github issue.