python_segy

Seismic data sample generation

Based on PyTorch

Introduction

  • This code generates sample data from .segy seismic data for deep learning based on PyTorch.
  • It can be used for denoising or interpolation of seismic data.
  • The code is modified from the work of KaiZhang.

Prerequisites

  • Python 3 with the dependencies scipy, numpy, h5py, glob, PyTorch, and segyio (one possible install command is sketched below)
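If you need to install the third-party dependencies, one way is via pip (glob ships with the Python standard library; the exact PyTorch install command may depend on your CUDA setup, see pytorch.org):

pip install scipy numpy h5py segyio torch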

Datasets

  • Your own .segy or .sgy seismic data, or .segy/.sgy data downloaded from the internet with the code we provide
  • The model we provide is trained on the Model94_shots and 7m_shots_0201_0329 datasets (mode: DNCNN)

Generating training data

from get_patch import *
from gain import *
import numpy as np
import torch

data_dir = 'data/train'   # folder containing the .segy/.sgy files

# generate patches from the original data
train_data = datagenerator(data_dir, patch_size=(128, 128), stride=(32, 32),
                           train_data_num=float('inf'), download=False, datasets=[],
                           aug_times=0, scales=[1], verbose=True, jump=1, agc=True)

train_data = train_data.astype(np.float64)
# (N, H, W, C) -> (N, C, H, W) for PyTorch
xs = torch.from_numpy(train_data.transpose((0, 3, 1, 2)))

# add noise (noise level 25) for the denoising task
DDataset = DenoisingDataset(xs, 25)

'''
# random downsampling, rate: the sampling rate
DDataset = DownsamplingDataset(xs, rate=0.7, regular=False)
# regular downsampling, rate: the sampling interval
DDataset = DownsamplingDataset(xs, rate=2, regular=True)
'''
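Either dataset can then be fed to a standard PyTorch DataLoader for training. This is only a minimal sketch: it assumes each item of DDataset is a pair of input and target patch tensors (noisy/clean for denoising, subsampled/complete for interpolation), and the batch size is arbitrary:

from torch.utils.data import DataLoader

# assumption: DDataset yields (input_patch, target_patch) tensor pairs
loader = DataLoader(DDataset, batch_size=128, shuffle=True, drop_last=True)

for inputs, targets in loader:
    # forward pass, loss computation, and backward pass go here
    pass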

Parameters in datagenerator:

data_dir      : the path where the .segy files are located (or will be downloaded to)
patch_size    : the size of each patch
stride        : the step size used to slide over the data when extracting patches
train_data_num: int or float('inf'); default = float('inf'), meaning all the data are used to generate patches;
                if you only need 3000 patches, set train_data_num = 3000

download(bool): whether to download the dataset from the internet; we provide 7 online datasets, in this order:
                1. http://s3.amazonaws.com/open.source.geoscience/open_data/bpmodel94/Model94_shots.segy.gz
                2. http://s3.amazonaws.com/open.source.geoscience/open_data/bpstatics94/7m_shots_0201_0329.segy.gz
                3. https://s3.amazonaws.com/open.source.geoscience/open_data/bp2.5d1997/1997_2.5D_shots.segy.gz
                4. http://s3.amazonaws.com/open.source.geoscience/open_data/bpvelanal2004/shots0001_0200.segy.gz
                5. http://s3.amazonaws.com/open.source.geoscience/open_data/bptti2007/Anisotropic_FD_Model_Shots_part1.sgy.gz
                6. https://s3.amazonaws.com/open.source.geoscience/open_data/hessvti/timodel_shot_data_II_shot001-320.segy.gz
                7. http://s3.amazonaws.com/open.source.geoscience/open_data/Mobil_Avo_Viking_Graben_Line_12/seismic.segy

datasets(int) : how many of the datasets listed above will be downloaded when download = True;
                e.g. datasets = 2 means the first two datasets (Model94_shots and 7m_shots_0201_0329) are downloaded

aug_times(int): the number of augmentation passes, used to increase the diversity of the samples;
                each pass applies one randomly chosen operation, e.g. flip up/down, or rotate 90 degrees and flip up/down
scales(list)  : the ratios by which the data are scaled; default = [1], i.e. no scaling

verbose(bool) : whether to print progress information while generating patches

jump(int)     : default = 1, meaning shots are read one by one; when jump >= 2, shots are read at that interval
                instead, e.g. jump = 3 uses shots 1, 4, 7, ...

agc(bool)     : whether to apply AGC (normalize each trace by its amplitude) to the data
  • Note: the parameter "jump" is only available when every shot has the same dimensions. We also provide a small .segy file in 'data/test' for testing the "datagenerator" function; you can simply run python get_patch.py to generate patches and visualize some of them.
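For illustration, here is a call of datagenerator that combines several of the parameters described above; the values are arbitrary and only meant to show the options (this example assumes datasets accepts the integer form documented above, whereas the snippet earlier passes datasets=[] as the no-download default):

from get_patch import *

# example only: download the first two datasets from the list, keep 3000 patches,
# read every 3rd shot, apply one augmentation pass, and normalize each trace with AGC
train_data = datagenerator('data/train', patch_size=(128, 128), stride=(32, 32),
                           train_data_num=3000, download=True, datasets=2,
                           aug_times=1, scales=[1], verbose=True, jump=3, agc=True)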

Training

python main_train_denoise.py --data_dir data/train
python main_train_inter.py --data_dir data/train

(Note: we assume you have put the .segy files in the "data/train" folder. If not, use --download True --datasets 2 (2 means the first 2 datasets in the default list will be downloaded). Sometimes the network is unstable and the datasets cannot be downloaded, so we also provide a Baidu Yun link for some of the datasets: link: https://pan.baidu.com/s/1VuRC40rugaoD2-hRzC1cbQ code: x0nq)
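For example, the following command would download the first two datasets into data/train and then start training (flags as described in the note above; combining them in a single call is assumed to be supported by the script):

python main_train_denoise.py --data_dir data/train --download True --datasets 2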

Test

python main_test_denoise.py --data_dir data/test --sigma 50

python main_test_inter.py --data_dir data/test --rate 2