Based on PyTorch
By Jing Wang and Siwei Yu (siweiyu@hit.edu.cn)
Center of Geophysics, Harbin Institute of Technology, Harbin, China
If you find this toolbox useful, please cite the following paper (accepted by Geophysics):
Deep learning for denoising (https://arxiv.org/abs/1810.11614)
Note that the results from the examples in this toolbox are not identical to those in the paper, because the training set, test set, and programming language differ.
- This code generates sample data from .segy seismic files for deep learning with PyTorch.
- It can be used for denoising or interpolation of seismic data.
- This code is modified from KaiZhang's code.
- You need your own .segy or .sgy seismic data, or you can download .segy/.sgy data online with the code we provide.
- The model we provide is trained on the Model94_shots and 7m_shots_0201_0329 datasets (mode: DnCNN); a minimal architecture sketch follows.
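For reference, DnCNN is a residual convolutional denoiser: a stack of Conv-BN-ReLU layers predicts the noise, which is then subtracted from the input. The code below is only a minimal, generic DnCNN-style sketch in PyTorch (the depth and channel count are illustrative), not necessarily the exact network shipped with this toolbox:

import torch
import torch.nn as nn

class DnCNN(nn.Module):
    """Minimal DnCNN-style denoiser: the network predicts the noise and
    subtracts it from the input (residual learning)."""
    def __init__(self, depth=17, channels=64, image_channels=1):
        super(DnCNN, self).__init__()
        layers = [nn.Conv2d(image_channels, channels, 3, padding=1),
                  nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                       nn.BatchNorm2d(channels),
                       nn.ReLU(inplace=True)]
        layers.append(nn.Conv2d(channels, image_channels, 3, padding=1))
        self.dncnn = nn.Sequential(*layers)

    def forward(self, x):
        # clean estimate = noisy input - predicted noise
        return x - self.dncnn(x)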
import numpy as np
import torch

from get_patch import *
from gain import *

data_dir = 'data/train'  # folder that holds the .segy/.sgy files
# generate training patches from the original .segy data
train_data = datagenerator(data_dir, patch_size=(128, 128), stride=(32, 32),
                           train_data_num=float('inf'), download=False, datasets=[],
                           aug_times=0, scales=[1], verbose=True, jump=1, agc=True)
train_data = train_data.astype(np.float64)
# reorder (N, H, W, C) -> (N, C, H, W) as expected by PyTorch
xs = torch.from_numpy(train_data.transpose((0, 3, 1, 2)))
# add Gaussian noise with sigma = 25 to build noisy/clean training pairs
DDataset = DenoisingDataset(xs, 25)
'''
# random downsampling, rate: the sampling rate
DDataset = DownsamplingDataset(xs, rate=0.7, regular=False)
# regular downsampling, rate: the sampling interval
DDataset = DownsamplingDataset(xs, rate=2, regular=True)
'''
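Once the dataset is built, it can be consumed with a standard torch.utils.data.DataLoader. The loop below is only a minimal sketch, assuming each dataset item yields a (noisy patch, clean patch) pair; the network, loss, optimizer, and hyperparameters are placeholders rather than the settings used by the training scripts:

import torch
from torch.utils.data import DataLoader

# minimal training-loop sketch over the dataset built above; batch size, learning
# rate and epoch count are placeholders, and each item is assumed to be a
# (noisy patch, clean patch) pair
loader = DataLoader(DDataset, batch_size=128, shuffle=True, drop_last=True)
model = DnCNN()                                   # e.g. the DnCNN sketch above
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(50):
    for noisy, clean in loader:
        optimizer.zero_grad()
        loss = criterion(model(noisy.float()), clean.float())
        loss.backward()
        optimizer.step()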
Parameters of datagenerator (a usage sketch follows this list):
data_dir : the directory that contains the .segy/.sgy files, or the directory to download them into
patch_size : the size of each patch
stride : the step size used to slide over the data when extracting patches
train_data_num : int or float('inf'), default = float('inf'), meaning all of the data is used to generate patches;
    if you only need 3000 patches, set train_data_num = 3000
download (bool) : whether to download datasets from the internet; we provide 7 online datasets, in the following order:
1. http://s3.amazonaws.com/open.source.geoscience/open_data/bpmodel94/Model94_shots.segy.gz
2. http://s3.amazonaws.com/open.source.geoscience/open_data/bpstatics94/7m_shots_0201_0329.segy.gz
3. https://s3.amazonaws.com/open.source.geoscience/open_data/bp2.5d1997/1997_2.5D_shots.segy.gz
4. http://s3.amazonaws.com/open.source.geoscience/open_data/bpvelanal2004/shots0001_0200.segy.gz
5. http://s3.amazonaws.com/open.source.geoscience/open_data/bptti2007/Anisotropic_FD_Model_Shots_part1.sgy.gz
6. https://s3.amazonaws.com/open.source.geoscience/open_data/hessvti/timodel_shot_data_II_shot001-320.segy.gz
7. http://s3.amazonaws.com/open.source.geoscience/open_data/Mobil_Avo_Viking_Graben_Line_12/seismic.segy
datasets (int) : the number of datasets to download from the list above when download = True;
    e.g. datasets = 2 means that you will download two datasets, 1. http://s3.amazonaws.com/open.source.geoscience/open_data/bpmodel94/Model94_shots.segy.gz
    and 2. https://s3.amazonaws.com/open.source.geoscience/open_data/bp2.5d1997/1997_2.5D_shots.segy.gz
aug_times (int) : the number of augmentation passes to perform, used to increase the diversity of the samples;
    each pass applies one randomly chosen operation, e.g. flip up-down, or rotate 90 degrees and then flip up-down
scales (list) : the ratios at which the data is scaled; default = [1], i.e. no scaling
verbose (bool) : whether to print progress information while generating patches
jump (int) : default = 1, meaning shots are read one by one; when jump >= 2, shots are read at that interval instead,
    e.g. jump = 3 uses shots 1, 4, 7, ...
agc (bool) : whether to apply AGC (normalize each trace by amplitude) to the data
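As a usage sketch of these parameters (assuming, per the description above, that datasets can also be given as the number of datasets to fetch), a call that downloads the first two provided datasets, reads every second shot, applies AGC, and stops after 3000 patches might look like this:

# hypothetical example call; data_dir and the values below are illustrative,
# not required settings
train_data = datagenerator('data/train',
                           patch_size=(128, 128),   # size of each patch
                           stride=(32, 32),         # sliding step between patches
                           train_data_num=3000,     # stop after 3000 patches
                           download=True,           # fetch data from the URL list above
                           datasets=2,              # use the first two provided datasets
                           aug_times=1,             # one augmentation pass per patch
                           scales=[1],              # no rescaling
                           verbose=True,            # print generation progress
                           jump=2,                  # read every second shot
                           agc=True)                # normalize each trace by amplitude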
- Note: the "jump" parameter is only available when every shot has the same dimensions. We provide a small .segy file in 'data/test' to test the "datagenerator" function, or you can simply run
python get_patch.py
to test it and visualize some of the generated data.
To train the denoising or interpolation network, run:
python main_train_denoise.py --data_dir data/train
python main_train_inter.py --data_dir data/train
(Note: we assume you have put the .segy files in the "data/train" folder. If not, please use --download True --datasets 2 (2 means two datasets from the default list will be used). Sometimes the network is unstable and the datasets cannot be downloaded; we therefore provide a Baidu Yun link for some of the datasets: https://pan.baidu.com/s/1YBO8-GOvk6JJGQZSKBdgJg)
To test the trained networks, run:
python main_test_denoise.py --data_dir data/test --sigma 50
python main_test_inter.py --data_dir data/test --rate 2
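After training, applying a saved model to new data is ordinary PyTorch inference. The sketch below assumes the whole network was saved with torch.save(model, path); the model path and the stand-in patch are placeholders:

import numpy as np
import torch

# load a trained network (the file name is a placeholder) and switch to eval mode
model = torch.load('models/model.pth', map_location='cpu')
model.eval()

# denoise one 128 x 128 patch; replace the stand-in array with a real noisy patch
noisy_patch = np.random.randn(128, 128).astype(np.float32)
x = torch.from_numpy(noisy_patch).unsqueeze(0).unsqueeze(0)   # shape (1, 1, 128, 128)
with torch.no_grad():
    denoised = model(x).squeeze().numpy()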
- For more tasks: salt body classification, wave equation inversion, and tests on field data
- Parallel computing
- Support for MatConvNet