This repository contains the code to display the materials and velocity fields downloaded from the HEMEW-3D repository on Recherche Data Gouv. Additional data can also be created by following the notebook Create_materials.ipynb.
This repository can also be used to train four neural operators on the HEMEW-3D dataset: Fourier Neural Operator (FNO), U-shaped Neural Operator (U-NO), Group-equivariant Fourier Neural Operator (G-FNO), and Factorized Fourier Neural Operator (F-FNO). Pre-processing, training, and post-processing are described below.
The HEMEW-3D dataset contains 30,000 simulation results of the 3D elastic wave equation. Results have been obtained with the earthquake simulator SEM3D. This equation governs the propagation of waves in a 3D propagation medium (also called material in the following). Two types of data are given in this dataset.
The first type of data is the collection of 30,000 materials. The materials are 3D domains built from non-stationary random fields. Their size is 32 x 32 x 32 points. Physically, they correspond to a domain of length 9600m with 300m spacing between two points. Materials contain the values of the shear-wave velocity. The minimum value is 1071m/s and the maximum is 4500m/s. All materials contain a 1800m-thick bottom layer with a constant velocity of 4500m/s.
Materials are provided as .npy arrays, readable with Python: a = np.load('materials0-1999.npy')
Each file contains 2000 materials. Therefore, a is of shape (2000, 32, 32, 32). Indices correspond to the material index, the x coordinate (from West to East), the y coordinate (from South to North), and the z coordinate (from bottom to top).
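The axis convention above can be illustrated with a minimal sketch. It uses a small synthetic array saved under a hypothetical file name in place of a real materials file, and memory-maps it so that only the requested material is read; the slices chosen are for illustration only.

```python
import os
import tempfile

import numpy as np

# Synthetic stand-in for a materials file (a real file holds 2000 materials).
rng = np.random.default_rng(0)
fake = rng.uniform(1071.0, 4500.0, size=(4, 32, 32, 32)).astype(np.float32)
path = os.path.join(tempfile.mkdtemp(), "materials_demo.npy")  # hypothetical name
np.save(path, fake)

# mmap_mode="r" keeps the file on disk and reads only what is indexed.
a = np.load(path, mmap_mode="r")

material = np.asarray(a[0])    # one material, axes (x, y, z)
surface = material[:, :, -1]   # horizontal slice at the top (z runs bottom to top)
bottom = material[:, :, 0]     # horizontal slice at the bottom
column = material[16, 16, :]   # vertical velocity profile at one (x, y) location

print(a.shape, surface.shape, column.shape)
```

With a real file, only the path passed to np.load changes.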
The 15 material files amount to 3.9GB. They are downloadable individually on Recherche Data Gouv. A batch of 10 materials is given in the data folder for illustration purposes.
Metadata are given in the data folder. They contain the minimum, mean, maximum and standard deviation of each material.
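For materials created locally, the same four statistics can be recomputed with NumPy. This is a minimal sketch: the function name is hypothetical, and the layout of the provided metadata files is not described here, so the column order below is an assumption.

```python
import numpy as np

def material_statistics(a):
    """Per-material statistics for an array of shape (N, nx, ny, nz).

    Returns an array of shape (N, 4) with columns (min, mean, max, std).
    The column order is an illustrative choice.
    """
    flat = a.reshape(a.shape[0], -1)  # one row per material
    return np.stack(
        [flat.min(axis=1), flat.mean(axis=1), flat.max(axis=1), flat.std(axis=1)],
        axis=1,
    )

# Tiny synthetic example: two "materials" of 2 x 2 x 2 points.
demo = np.arange(16, dtype=float).reshape(2, 2, 2, 2)
stats = material_statistics(demo)
print(stats)
```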
The second type of data is the collection of surface velocity fields. They have been generated by solving the 3D elastic wave equation with the high-performance computing code SEM3D based on the Spectral Element Method (https://github.com/sem3d/SEM). To each material described above corresponds one velocity field, obtained by the propagation of waves through this material.
Velocity fields were recorded by a grid of 16 x 16 virtual sensors located at the surface of the propagation domain between 150m and 450m (600m between consecutive sensors). Each sensor records the 3-component velocity with a 100Hz sampling.
Computational details: The computational mesh was designed with elements of size 300m and 7 Gauss-Lobatto-Legendre quadrature points. It can accurately represent the propagation of waves up to a frequency of 5Hz. Waves were generated by a point source placed at the bottom of the domain, inside the constant layer (the position of the source is (4800m, 4800m, -8400m)). The seismic source is described by a moment tensor with fixed orientation (strike = 48°, dip = 45°, and rake = 88°) and amplitude (seismic moment M0 = 2.47 × 10^16 N.m).
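The amplitude above is expressed in N.m, which is the unit of a seismic moment. As a rough cross-check, it can be converted to a moment magnitude Mw with the standard relation Mw = (2/3) (log10(M0) - 9.1) for M0 in N.m. The sketch below assumes the value reads as 2.47 × 10^16 N.m; the function name is illustrative and not part of this repository.

```python
import math

def moment_magnitude(m0):
    """Moment magnitude Mw from a seismic moment m0 given in N.m."""
    return (2.0 / 3.0) * (math.log10(m0) - 9.1)

mw = moment_magnitude(2.47e16)  # assumed reading of the amplitude quoted above
print(f"Mw = {mw:.2f}")  # prints "Mw = 4.86"
```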
Results are given in .feather dataframes, readable with the pandas library in Python: v = pd.read_feather('velocity0-99.feather'). Each dataframe contains 100 simulation results. Each row of the dataframe has the following format:
run | field | x | y | z | 0.0 | 0.01 | 0.02 | … | 19.98 | 19.99 |
---|---|---|---|---|---|---|---|---|---|---|
12 | Veloc E | 150.0 | 770.0 | -1.0 | 0 | 0 | … | … | 1.1e-5 | 1.0e-5 |
where run indicates the index of the material used in this simulation, field indicates the component of the velocity field (Veloc E for East-West, Veloc N for North-South, Veloc Z for Vertical). x, y, z are the coordinates of the sensor (in meters). The next 2000 columns contain the velocity field for times 0, 0.01, …, 19.99.
The 300 velocity field files amount to 369.9 GB. They are downloadable individually (1.2 GB per file) on Recherche Data Gouv. A batch of velocity fields corresponding to 10 materials is given in the data folder for illustration purposes.
Metadata are given in the data folder. They contain the first wave arrival time at the surface, the minimum, mean, and maximum Peak Ground Velocity.
Due to the large size of the database, it cannot be loaded entirely into CPU or GPU memory. Therefore, the preprocessing step consists of writing individual files sample_i.h5 that contain the material a and the three components of the velocity fields uE, uN, and uZ.
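A per-sample file of this kind can be written and read back with h5py, as in the minimal sketch below. The dataset names a, uE, uN, and uZ follow the description above; the helper names, array shapes, and the absence of compression or extra attributes are assumptions, and the actual scripts in this repository may organize the files differently.

```python
import os
import tempfile

import h5py
import numpy as np

NAMES = ("a", "uE", "uN", "uZ")

def write_sample(path, a, uE, uN, uZ):
    """Write one material and its three velocity components to an HDF5 file."""
    with h5py.File(path, "w") as f:
        for name, array in zip(NAMES, (a, uE, uN, uZ)):
            f.create_dataset(name, data=array)

def read_sample(path):
    """Read one sample back as a dict of NumPy arrays."""
    with h5py.File(path, "r") as f:
        return {name: f[name][:] for name in NAMES}

# Small synthetic arrays stand in for real data.
rng = np.random.default_rng(0)
a = rng.uniform(1071.0, 4500.0, size=(32, 32, 32)).astype(np.float32)
uE, uN, uZ = (rng.normal(size=(16, 16, 20)).astype(np.float32) for _ in range(3))

path = os.path.join(tempfile.mkdtemp(), "sample_0.h5")
write_sample(path, a, uE, uN, uZ)
sample = read_sample(path)
print({name: array.shape for name, array in sample.items()})
```

Storing one sample per file lets a data loader open only the files needed for the current batch.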
To reduce the computational time of machine learning applications, velocity fields are downsampled from 100 Hz to 50 Hz, and restricted to the time interval [1 s, 7.4 s] (leading to 320 time steps). They are also spatially interpolated from 16 x 16 sensors to 32 x 32 points to match the dimensions of the inputs.
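These three operations can be sketched with NumPy as follows. This is not the repository's preprocessing script: the function names are hypothetical, the time window is treated as half-open so that it yields exactly 320 samples, downsampling is done by keeping every second sample without additional filtering, and the spatial step is a separable linear interpolation that aligns the outermost sensors with the edges of the output grid.

```python
import numpy as np

def linear_interp_matrix(n_in, n_out):
    """Matrix W of shape (n_out, n_in) such that W @ f linearly resamples f."""
    src = np.linspace(0.0, 1.0, n_in)
    dst = np.linspace(0.0, 1.0, n_out)
    # Interpolating each unit vector gives the columns of the linear operator.
    return np.stack([np.interp(dst, src, e) for e in np.eye(n_in)], axis=1)

def preprocess_velocity(u, fs=100.0, t_min=1.0, t_max=7.4, step=2, n_out=32):
    """Crop, downsample and spatially interpolate u of shape (ny, nx, nt).

    u is assumed to be sampled at fs Hz starting at t = 0.
    """
    i0 = int(round(t_min * fs))
    i1 = int(round(t_max * fs))
    u = u[:, :, i0:i1:step]  # samples at t_min, t_min + step / fs, ... < t_max
    w_y = linear_interp_matrix(u.shape[0], n_out)
    w_x = linear_interp_matrix(u.shape[1], n_out)
    # Apply the 1D interpolation along y, then along x, for every time step.
    return np.einsum("ai,bj,ijt->abt", w_y, w_x, u)

rng = np.random.default_rng(0)
u = rng.normal(size=(16, 16, 2000))  # synthetic 20 s record at 100 Hz
out = preprocess_velocity(u)
print(out.shape)  # (32, 32, 320)
```

Plain decimation is used here because the simulations are stated to be accurate up to 5Hz, well below the 25 Hz Nyquist frequency of a 50 Hz signal; a low-pass filter before decimation would be the more conservative choice.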
To create the inputs, run python3 create_data_materials.py @Ntrain 27000 @Nval 3000 and then python3 create_data_velocityfields.py @Ntrain 27000 @Nval 3000 @interpolate.
The FNO, U-NO, G-FNO, and F-FNO models can be trained with the default options by running models.train_fno3d.py, models.train_uno3d.py, models.train_gfno3d.py, and models.train_ffno3d.py. The provided code supports CPU and single-GPU training.
To display the loss history and the model predictions in the form of time series of snapshots, use the notebook Neural_Operators_Predictions.ipynb. For detailed metrics of the neural operators' performance, the notebook Intensity_Measures.ipynb computes the Root Mean Squared Error (RMSE), Peak Ground Velocity (PGV), Cumulative Absolute Velocity (CAV), Relative Significant Duration (RSD), and Fourier coefficients in three frequency ranges.
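For orientation, three of these metrics can be sketched for a single velocity time series with NumPy. These are common textbook definitions and not necessarily the ones implemented in Intensity_Measures.ipynb: PGV is taken as the maximum absolute velocity, and CAV as the time integral of the absolute acceleration, with the acceleration obtained by finite differences. The function names are illustrative.

```python
import numpy as np

def rmse(pred, true):
    """Root mean squared error between two arrays of the same shape."""
    return float(np.sqrt(np.mean((pred - true) ** 2)))

def pgv(v):
    """Peak ground velocity: maximum absolute value of a velocity trace."""
    return float(np.max(np.abs(v)))

def cav(v, dt):
    """Cumulative absolute velocity: integral of |acceleration| over time."""
    acc = np.abs(np.gradient(v, dt))  # finite-difference acceleration
    return float(np.sum(0.5 * (acc[1:] + acc[:-1])) * dt)  # trapezoidal rule

# Synthetic trace sampled at 50 Hz: velocity ramps linearly from 0 to 1 in 1 s.
dt = 0.02
t = np.arange(51) * dt
v = t.copy()

print(rmse(v, v), pgv(v), cav(v, dt))
```

For this ramp the acceleration is constant and equal to 1, so the CAV equals the duration of the record.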