GisSPA

ATTENTION PLEASE

The large files in Data_TEST are stored on GitHub with Git LFS, so git lfs must be installed before cloning this project!

Compile time: short.
No installation required.
For Data_TEST, processing one 720 × 720 image takes about 40 seconds on an A100.
The template stack is 182 × 182 × 2295.
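The template size gives a sense of how large the search is. The sketch below is a back-of-envelope calculation. It assumes the stack holds 2295 projections of 182 × 182 pixels stored as 4-byte floats, and that phistep (2 in the example config) is an in-plane step in degrees. These are illustrative assumptions, not values read from the code.

```python
# Back-of-envelope scale of the Data_TEST search.
# Assumptions: float32 templates, phistep in degrees, one template per projection.
TEMPLATE_NX, TEMPLATE_NY, N_PROJ = 182, 182, 2295
PHISTEP_DEG = 2
BYTES_PER_PIXEL = 4  # assumed float32


def n_inplane(phistep_deg: float) -> int:
    """Number of in-plane rotations sampled over a full 360 degrees."""
    return int(round(360 / phistep_deg))


def search_size(n_proj: int, phistep_deg: float) -> int:
    """Orientations scored per position: projection directions x in-plane angles."""
    return n_proj * n_inplane(phistep_deg)


def stack_bytes(nx: int, ny: int, n: int, bpp: int = BYTES_PER_PIXEL) -> int:
    """Raw size of an nx x ny x n template stack in bytes."""
    return nx * ny * n * bpp


if __name__ == "__main__":
    print("in-plane angles:", n_inplane(PHISTEP_DEG))
    print("orientations per position:", search_size(N_PROJ, PHISTEP_DEG))
    mib = stack_bytes(TEMPLATE_NX, TEMPLATE_NY, N_PROJ) / 2**20
    print(f"template stack: {mib:.0f} MiB")
```

Under these assumptions each position is scored against 413,100 orientations, and the template stack alone occupies about 290 MiB.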

Introduction

A GPU-parallel implementation of isSPA that picks particles by cross-correlation (CC).
The CPU version is by CJ, Beijing.

Structure

./deprated_code/: deprecated code; ignore this directory
./EMReader/: reads .hdf and .mrc files (data only). Extracted from EMAN 1.9.
./main.cu: main function
./GPU_func.cu(h): functions executed on the GPU
./Makefile: build file
./hdf5: the HDF5 source package (not needed)
Temporary files:
./Data_TEST: test data
./Output: results for the test data, and object files
./Config_example: config file for running the test data

Install

Install git lfs to fetch the test data.
On Ubuntu: sudo apt install libhdf5-dev. On CentOS: sudo yum install hdf5-devel.x86_64 gcc-c++. CUDA is required for compiling.
On Ubuntu, also create the library symlinks: cd /usr/lib/x86_64-linux-gnu && sudo ln -s libhdf5_serial.so.xx.xx.xx libhdf5.so && sudo ln -s libhdf5_serial_hl.so.xx.xx.xx libhdf5_hl.so

Replace xx.xx.xx with the actual version suffix of the installed library.

Compile with make (Makefile) on Ubuntu, or make -f Makefile_for_centos on CentOS.
Write a config file.
Run ./main config_file

Quick start

-> install git lfs

sudo apt-get update
sudo apt-get install git-lfs
git lfs install
git lfs clone https://github.com/Autumn1998/GisSPA.git

-> run GisSPA

cd GisSPA
vim Makefile, then set LIB_HDF5="your hdf5 install path"/lib and INCLUDE_HDF5="your hdf5 install path"/include
make clean (the clean target also removes ./main)
make
If you see "can not create /Output/Objects/main.o", create the directory: mkdir -p Output/Objects
./main ./Data_TEST/config

WARNING: Data_TEST needs at least 13 GB of GPU memory!
If an error occurs with temp_size = 10,10 and img_size = 10,10, check git lfs.
If you see "error while loading shared libraries", check your LD_LIBRARY_PATH and source the file where you set it. If that does not work, run
cp "your hdf5 install path"/lib/libhdf5.so.10 .
so that ./main and libhdf5.so.10 are in the same directory.
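When temp_size and img_size are reported as 10,10, one likely cause is that the data files are still Git LFS pointer files rather than the real data. A Git LFS pointer is a small text file (under 1024 bytes) whose first line is "version https://git-lfs.github.com/spec/v1". The helper below is a hypothetical sketch, not part of GisSPA, that flags such files.

```python
import os

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: str) -> bool:
    """Return True if the file looks like an un-fetched Git LFS pointer."""
    # Pointer files are tiny text files; real .hdf/.mrc data are far larger.
    if os.path.getsize(path) >= 1024:
        return False
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX
```

For example, is_lfs_pointer("Data_TEST/emd_9976_apix3p336_proj.hdf") returning True means the template was never downloaded; running git lfs pull inside the repository fetches the real files.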

The output is written to GisSPA/Output/test_Image_output.lst. A reference result is attached at ./Data_TEST/Output/test_Image_output.lst.

Parameters

This program detects targets and determines their orientations and translations.

Usage: ./main config_file
Use ./main -h to print the help message.

--------------------------------config example:--------------------------------------
# Do leave spaces on either side of '='
# Requested parameters
input = Data_TEST/test_Image.lst
template = Data_TEST/emd_9976_apix3p336_proj.hdf
eulerfile = Data_TEST/proj_step3_c2.lst
angpix = 3.336
phistep = 2
kk = 3
energy = 300
cs = 2.7
Highres = 9
Lowres = 100
diameter = 180

# Optional parameters
threshold = 8
output = Data_TEST/test_Image_output.lst
first = 0
last = 12
window_size = 480
phase_flip = 1
GPU_ID = 0
overlap = 180

--------------------------------------------------------------------------------------
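The config above is a plain key = value file with # comments. The sketch below shows one way to read that format and check that the requested parameters are present. It illustrates the format only and is not the parser used by ./main; the REQUIRED tuple is copied from the list of requested parameters in this README.

```python
REQUIRED = ("input", "template", "eulerfile", "angpix", "phistep", "kk",
            "energy", "cs", "Highres", "Lowres", "diameter")


def parse_config(text: str) -> dict:
    """Parse 'key = value' lines, ignoring blank lines and '#' comments."""
    params = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'key = value': {raw!r}")
        params[key.strip()] = value.strip()
    return params


def missing_required(params: dict) -> list:
    """Requested parameters absent from a parsed config."""
    return [k for k in REQUIRED if k not in params]
```

Values are kept as strings; convert them where needed, e.g. float(params["angpix"]).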

The HDF5 and CUDA libraries are required to run.
All parameters are set in the config file (e.g. "input = /Data/inputfile") and are listed below.
Requested parameters:
input = lst file of contrast-inverted images, with CTF information
template = 2D projection templates in .hdf format
eulerfile = file of Euler angles
angpix = pixel size in angstroms
phistep = in-plane rotation sampling step
kk = overlapping density parameter; default is 3
energy = accelerating voltage in kV
cs = spherical aberration in mm
Highres = high-resolution cutoff
Lowres = low-resolution cutoff
diameter = target diameter in pixels
Optional parameters:
threshold = CC threshold; only scores above this value are output
output = output lst file name
first = ID of the first image to process
last = ID of the last image to process
window_size = size of the windows into which the raw image is split
GPU_ID = ID of the GPU device
phase_flip = whether to filter the images: 1 = filter, 0 = do not filter (e.g. if the input is already filtered)
overlap = size of the overlap between adjacent windows
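Highres and Lowres are resolution cutoffs. Assuming they are given in angstroms and applied as a band-pass on a square window of N pixels (both assumptions, not stated above), a cutoff of d angstroms corresponds to a Fourier radius of N × angpix / d pixels. The sketch computes the pass band for the example config.

```python
def fourier_radius(n_pixels: int, angpix: float, resolution_a: float) -> float:
    """Fourier-space radius (pixels) of a given resolution on an N x N box."""
    return n_pixels * angpix / resolution_a


def bandpass_radii(n_pixels: int, angpix: float, highres: float, lowres: float):
    """Return (r_low, r_high): inner and outer radii of the pass band."""
    return (fourier_radius(n_pixels, angpix, lowres),
            fourier_radius(n_pixels, angpix, highres))


if __name__ == "__main__":
    # Example config values: window_size = 480, angpix = 3.336,
    # Highres = 9, Lowres = 100 (units assumed to be angstroms).
    r_low, r_high = bandpass_radii(480, 3.336, 9, 100)
    print(f"pass band: {r_low:.1f} to {r_high:.1f} pixels")
```

This prints a pass band of 16.0 to 177.9 pixels. Highres must also be coarser than the Nyquist limit, 2 × angpix = 6.672 angstroms for this data.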

Attention:

With template size tx,ty and image size ix,iy, padding_size should satisfy:
a) max(tx,ty) < padding_size < max(ix,iy)
b) padding_size % 32 = 0
As padding_size increases, memory consumption first decreases, then increases.
For the parameter "input", a prefix is added automatically; the prefix is the current directory.
All images must have the same size.
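Notes:
1. padding_size must be larger than the template side length, smaller than the short side of the raw image, and a multiple of 32. The image is split into padding_size × padding_size sub-regions, with an overlap of 13% of padding_size.
2. Too small a padding_size leaves too little parallelism; too large a padding_size increases memory-allocation time and overlap cost. With enough GPU memory, a padding_size of about 320 works well, but tune it based on actual experiments (increase it somewhat for large datasets).
3. The directory containing the --input file must be the same directory that holds the .MRC/.hdf files it lists; that is, when parsing raw images, the program uses the directory (prefix) of --input.
4. All images must have the same size.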
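The split into overlapping sub-regions can be pictured as tiling each axis with a stride of window minus overlap, shifting the last window so it ends at the image edge. This is a hypothetical sketch of such a tiling, not the program's exact scheme; window and overlap stand for the window_size and overlap parameters.

```python
def window_starts(image_len: int, window: int, overlap: int) -> list:
    """Start offsets of overlapping windows covering [0, image_len) on one axis."""
    if not 0 <= overlap < window <= image_len:
        raise ValueError("need 0 <= overlap < window <= image_len")
    stride = window - overlap
    starts = list(range(0, image_len - window + 1, stride))
    # If the stride left an uncovered strip, add one window flush with the edge.
    if starts[-1] + window < image_len:
        starts.append(image_len - window)
    return starts


def tiles(nx: int, ny: int, window: int, overlap: int) -> list:
    """(x0, y0) origins of all window x window tiles on an nx x ny image."""
    return [(x, y)
            for y in window_starts(ny, window, overlap)
            for x in window_starts(nx, window, overlap)]
```

For a 720 × 720 Data_TEST image with window_size = 480 and overlap = 180, this gives window starts [0, 240] on each axis, i.e. four windows.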

Python scripts

relion2lst_only_particles.py (adapted from Wen Jiang's script in JSPR)
This script reads a particles.star file, generates a file in .lst format, and rewrites the images in .hdf format.
Run ./relion2lst_only_particles.py -h for details.

remove_repeat_particles_from_list.py

This Python script merges duplicate detections. Arguments:
- detection list to merge (e.g. Data_TEST/test_Image_bin2_output.lst)
- last - first (e.g. 12)
- threshold of coordinate distance (e.g. 4)
- threshold of Euler distance (e.g. 6)
- merged list filename (e.g. Data_TEST/test_Image_bin2_output_merged.lst)

Detections that are within both the center and Euler thresholds are considered duplicates.
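The merging rule can be sketched as a greedy pass: sort detections by score, keep the best, and drop any later detection from the same image that lies within both thresholds of a kept one. The Detection fields, the ZXZ Euler convention (EMAN-style az, alt, phi in degrees), and the use of the rotation-geodesic angle as the Euler distance are assumptions for illustration; the script itself may measure distances differently.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:  # hypothetical record; field names are illustrative
    image_id: int
    x: float
    y: float
    az: float   # Euler angles in degrees, assumed ZXZ convention
    alt: float
    phi: float
    score: float


def _rz(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def _rx(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation(d: Detection):
    """Rotation matrix Rz(phi) Rx(alt) Rz(az)."""
    r = math.radians
    return _matmul(_rz(r(d.phi)), _matmul(_rx(r(d.alt)), _rz(r(d.az))))


def euler_distance_deg(a: Detection, b: Detection) -> float:
    """Angle (degrees) of the relative rotation between two orientations."""
    ra, rb = rotation(a), rotation(b)
    tr = sum(ra[i][j] * rb[i][j] for i in range(3) for j in range(3))  # trace(Ra^T Rb)
    cos_t = max(-1.0, min(1.0, (tr - 1.0) / 2.0))
    return math.degrees(math.acos(cos_t))


def merge_duplicates(dets, center_thr, euler_thr):
    """Keep the highest-scoring detection within each duplicate cluster."""
    kept = []
    for d in sorted(dets, key=lambda d: d.score, reverse=True):
        dup = any(k.image_id == d.image_id
                  and math.hypot(k.x - d.x, k.y - d.y) <= center_thr
                  and euler_distance_deg(k, d) <= euler_thr
                  for k in kept)
        if not dup:
            kept.append(d)
    return kept
```

Symmetry is ignored here (the demo eulerfile is named for C2); a symmetry-aware version would take the minimum distance over symmetry-equivalent orientations.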

convert_my-output_to_relion.py

Arguments:
- detection file after removing duplicate detections (e.g. Data_TEST/test_Image_output_merged.lst)
- lst file of contrast-inverted images with CTF information (e.g. test_Image.lst)
- scale factor of the images used in localization (e.g. 2)
- window size used in localization (e.g. 720)
- pixel size of the original micrograph (e.g. 1.668)
- output file in .star format
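The scale factor and pixel sizes are linked: in the demo, localization ran on images binned by the scale factor, and the config angpix equals scale × original pixel size (2 × 1.668 = 3.336 angstroms). Below is a minimal sketch of mapping a detection back to original-micrograph pixels. It assumes the origin (x0, y0) of each clipped window in the binned micrograph is known; that bookkeeping and the function names are hypothetical, and the script's own convention may differ.

```python
def binned_pixel_size(orig_apix: float, scale: int) -> float:
    """Pixel size of a micrograph binned by an integer scale factor."""
    return orig_apix * scale


def to_original_coords(x: float, y: float, x0: float, y0: float, scale: int):
    """Map (x, y) in a clipped, binned window to original-micrograph pixels.

    x0, y0: origin of the window in the binned micrograph (hypothetical input).
    """
    return ((x0 + x) * scale, (y0 + y) * scale)
```

For instance, a detection at (100, 50) in the window whose binned origin is (720, 0) maps to (1640, 100) in the unbinned micrograph when scale = 2.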

A complete workflow for the demo data is given in Data_TEST/workflow.txt.
Micrographs are clipped into multiple images to save GPU memory. With sufficient GPU memory this step can be skipped, but the contrast of the micrographs must still be inverted.

Contributors

LT & CJ.