/instance

Description

INSTANCE is a dataset of seismic waveform data and associated metadata suited to machine-learning analysis. It includes:

  • 54,008 earthquakes for a total of 1,159,249 3-channel waveforms;
  • 132,330 3-channel noise waveforms;
  • 115 precomputed observable quantities providing information on station, trace, source, path and quality;
  • 19 networks;
  • 620 seismic stations.

Maps of earthquakes (a) and stations (b) in INSTANCE. Symbol size is proportional to earthquake magnitude and to the number of arrival phases recorded by each station, respectively.

Example waveform plots: events with magnitude in the range [2-4]; events selected from the HN channel; noise selected from the HH channel.

How to cite the article

Michelini, A., Cianetti, S., Gaviano, S., Giunchi, C., Jozinović, D., and Lauciani, V., INSTANCE – the Italian seismic dataset for machine learning, Earth Syst. Sci. Data, 13 (12), 5509 – 5544, doi:10.5194/essd-13-5509-2021.

How to cite the dataset

INSTANCE The Italian Seismic Dataset For Machine Learning, Alberto Michelini, Spina Cianetti, Sonja Gaviano, Carlo Giunchi, Dario Jozinović & Valentino Lauciani, Seismic Waveforms And Associated Metadata published 2021 in Istituto Nazionale di Geofisica e Vulcanologia (INGV) https://doi.org/10.13127/instance

Downloads

To obtain the full INSTANCE dataset, download the following files:

  • Events metadata version 2 (csv, 238 MB bz2 file, 1.1 GB after decompression, doi:10.13127/instance/eventsmetadata.2). Version 2 fixes the metadata parameter name source_mt_scalar_moment_Nm.

  • Events metadata version 1 (csv, 238 MB bz2 file, 1.1 GB after decompression, doi:10.13127/instance/eventsmetadata.1)

  • Events data in counts as single hdf5 file (39 GB bz2 file, 156 GB after decompression) or 10 GB parts (part-a, part-b, part-c, part-d, doi:10.13127/instance/events.1)

  • Events data in ground motion units as single hdf5 file (151 GB bz2 file, 156 GB after decompression) or 20 GB parts (part-a, part-b, part-c, part-d, part-e, part-f, part-g, part-h, doi:10.13127/instance/groundmotion.1). Ground motion units are m/s for the HH and EH channels and m/s² for the HN channel.

  • Noise metadata (csv, 6.7 MB bz2 file, 53 MB after decompression, doi:10.13127/instance/noisemetadata.1)

  • Noise data in counts (hdf5, 3.9 GB bz2 file, 18 GB after decompression, doi:10.13127/instance/noise.1)

  • Stations inventory (StationXML, 15 MB)

All of the above downloads are bzip2-compressed files. The multipart files must be concatenated into a single archive and then decompressed, for example, for the events data file in counts:

cat Instance_events_counts.hdf5.bz2.part-* > Instance_events_counts.hdf5.bz2
bzip2 -d Instance_events_counts.hdf5.bz2
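
Where `cat` and `bzip2` are not available (for example on Windows), the same two steps can be sketched in Python using only the standard library. This is a minimal illustration rather than part of the INSTANCE tooling: the function names `reassemble_parts` and `decompress_bz2` are hypothetical, and the sketch assumes the parts sort alphabetically in the right order (part-a, part-b, ...), as the shell glob above also does.

```python
import bz2
import glob
import shutil


def reassemble_parts(pattern, output_path):
    """Concatenate all files matching `pattern`, in sorted order, into `output_path`.

    Returns the list of part files that were joined.
    """
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no files match {pattern!r}")
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # copyfileobj copies in chunks, so a 10 GB part is never
                # held in memory all at once.
                shutil.copyfileobj(src, out)
    return parts


def decompress_bz2(archive_path, output_path):
    """Stream-decompress a bzip2 file into `output_path`."""
    with bz2.open(archive_path, "rb") as src, open(output_path, "wb") as out:
        shutil.copyfileobj(src, out)


# Example calls (commented out so that importing this file has no side effects):
# reassemble_parts("Instance_events_counts.hdf5.bz2.part-*",
#                  "Instance_events_counts.hdf5.bz2")
# decompress_bz2("Instance_events_counts.hdf5.bz2",
#                "Instance_events_counts.hdf5")
```

Unlike `bzip2 -d`, this sketch leaves the compressed archive in place, so the disk must temporarily hold the parts, the joined archive and the decompressed file.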

A sample dataset of about 1.7 GB is provided to run the notebooks. It contains 10,000 events and 1,000 noise waveforms together with the associated metadata, so that interested users can evaluate INSTANCE data and metadata without downloading the whole dataset.
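
Before loading a metadata table in full, it can help to look at its column names and first few rows directly from the compressed file. The sketch below uses only the Python standard library and assumes the metadata CSV is comma-separated, UTF-8 encoded and starts with a header row; the function names and the file name in the usage comment are hypothetical and should be replaced with the actual file you downloaded.

```python
import bz2
import csv
from itertools import islice


def read_header(csv_bz2_path):
    """Return the column names of a bzip2-compressed CSV file."""
    with bz2.open(csv_bz2_path, "rt", encoding="utf-8", newline="") as handle:
        return next(csv.reader(handle))


def preview_rows(csv_bz2_path, n=5):
    """Return the first `n` data rows as dictionaries keyed by column name."""
    with bz2.open(csv_bz2_path, "rt", encoding="utf-8", newline="") as handle:
        # islice stops after n rows, so only the start of the file is decompressed.
        return list(islice(csv.DictReader(handle), n))


# Example calls (hypothetical file name):
# columns = read_header("events_metadata.csv.bz2")
# print(len(columns), "columns")
# print("source_mt_scalar_moment_Nm" in columns)
# for row in preview_rows("events_metadata.csv.bz2", n=3):
#     print(row)
```

Because both functions read the file as a stream, they return quickly even for the 238 MB events metadata archive.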

Notebooks

The following notebooks provide examples of reading the waveforms and metadata of INSTANCE. They refer to the sample dataset; to use them with the full dataset, the filenames must be changed accordingly.

Plots.ipynb explores the distribution of significant parameters in INSTANCE using the metadata

Waveforms.ipynb selects and plots 3-channel waveforms

Station_Hypocenter_MomentTensor.ipynb maps the earthquakes included in INSTANCE

Requirements

To run the notebooks please make sure the following packages are properly installed in your environment:

  • obspy
  • jupyter
  • basemap
  • pandas
  • seaborn
  • h5py
  • hdf5

Alternatively, create a dedicated conda environment for INSTANCE and fetch the sample dataset:

conda create -n instance python=3.7 obspy jupyter basemap pandas seaborn h5py hdf5
conda activate instance
git clone https://github.com/cjunkk/instance
cd instance
curl http://repo.pi.ingv.it/instance/Instance_sample_dataset.tar.bz2 | tar xj
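
If the sample archive has already been downloaded to disk, it can also be unpacked from Python instead of piping `curl` into `tar`. The following is a minimal sketch using the standard `tarfile` module; the function name `extract_archive` is hypothetical, and the archive's internal layout is not assumed, so the function simply returns the member names it finds.

```python
import tarfile


def extract_archive(archive_path, destination="."):
    """Extract a bzip2-compressed tar archive into `destination`.

    Returns the list of member names contained in the archive.
    """
    with tarfile.open(archive_path, mode="r:bz2") as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters; "data"
            # rejects members that would be written outside `destination`.
            archive.extractall(destination, filter="data")
        else:
            archive.extractall(destination)
        return archive.getnames()


# Example call, assuming the archive was saved in the current directory:
# names = extract_archive("Instance_sample_dataset.tar.bz2")
# print(names[:10])
```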

Licence

Creative Commons Attribution 4.0 International license (CC BY 4.0)