INSTANCE is a dataset of seismic waveform data and associated metadata suited for analysis based on machine learning. It includes:
- 54,008 earthquakes for a total of 1,159,249 3-channel waveforms;
- 132,330 3-channel noise waveforms;
- 115 precomputed observable quantities providing information on station, trace, source, path and quality;
- 19 networks;
- 620 seismic stations.
Earthquakes (a) and stations (b) in INSTANCE. Symbol sizes are proportional to earthquake magnitude and to the number of arrival phases recorded by each station, respectively.
Figure panels: events with magnitude in the range [2-4]; events selected from the HN channel; noise selected from the HH channel.
Michelini, A., Cianetti, S., Gaviano, S., Giunchi, C., Jozinović, D., and Lauciani, V., INSTANCE – the Italian seismic dataset for machine learning, Earth Syst. Sci. Data, 13 (12), 5509 – 5544, doi:10.5194/essd-13-5509-2021.
INSTANCE The Italian Seismic Dataset For Machine Learning, Alberto Michelini, Spina Cianetti, Sonja Gaviano, Carlo Giunchi, Dario Jozinović & Valentino Lauciani, Seismic Waveforms And Associated Metadata published 2021 in Istituto Nazionale di Geofisica e Vulcanologia (INGV) https://doi.org/10.13127/instance
To get the full INSTANCE dataset you have to download:
- Events metadata version 2 (csv, 238 MB bz2 file, 1.1 GB after decompression, doi:10.13127/instance/eventsmetadata.2). Fixed the metadata parameter name source_mt_scalar_moment_Nm.
- Events metadata version 1 (csv, 238 MB bz2 file, 1.1 GB after decompression, doi:10.13127/instance/eventsmetadata.1)
- Events data in counts as single hdf5 file (39 GB bz2 file, 156 GB after decompression) or 10 GB parts (part-a, part-b, part-c, part-d, doi:10.13127/instance/events.1)
- Events data in ground motion units as single hdf5 file (151 GB bz2 file, 156 GB after decompression) or 20 GB parts (part-a, part-b, part-c, part-d, part-e, part-f, part-g, part-h). Ground motion units are m/s for HH and EH channels and m/s² for HN channel, doi:10.13127/instance/groundmotion.1
- Noise metadata (csv, 6.7 MB bz2 file, 53 MB after decompression, doi:10.13127/instance/noisemetadata.1)
- Noise data in counts (hdf5, 3.9 GB bz2 file, 18 GB after decompression, doi:10.13127/instance/noise.1)
- Stations inventory (StationXML, 15 MB)
All the above downloads are bzip2-compressed files. The multipart files must first be reassembled and then decompressed (e.g., for the event data file):
cat Instance_events_counts.hdf5.bz2.part-* > Instance_events_counts.hdf5.bz2
bzip2 -d Instance_events_counts.hdf5.bz2
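On systems without `cat` and `bzip2`, the same two steps can be sketched in Python with the standard library. This is only an illustration under stated assumptions: the function name `reassemble_and_decompress` is hypothetical, the parts are assumed to sort correctly by file name (part-a, part-b, ...), and the output is streamed in chunks so the archive never has to fit in memory.

```python
import bz2
import glob


def reassemble_and_decompress(part_pattern, output_path, chunk_size=1024 * 1024):
    """Concatenate multipart bz2 files in name order and write the decompressed data.

    Hypothetical helper, not part of INSTANCE. Returns the list of parts used.
    """
    parts = sorted(glob.glob(part_pattern))
    if not parts:
        raise FileNotFoundError(f"no files match {part_pattern!r}")

    decompressor = bz2.BZ2Decompressor()
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                while True:
                    data = src.read(chunk_size)
                    if not data:
                        break
                    # Feed the chunk; if one bz2 stream ends inside it, start a
                    # new decompressor on the leftover bytes (multi-stream files).
                    while data:
                        out.write(decompressor.decompress(data))
                        if decompressor.eof:
                            data = decompressor.unused_data
                            decompressor = bz2.BZ2Decompressor()
                        else:
                            data = b""
    return parts
```

For the event data in counts, the call would be `reassemble_and_decompress("Instance_events_counts.hdf5.bz2.part-*", "Instance_events_counts.hdf5")`; as listed above, the decompressed file needs about 156 GB of free disk space.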
A sample dataset of about 1.7 GB is provided to run the notebooks. It contains 10,000 events and 1,000 noise waveforms together with the associated metadata, so interested users can evaluate INSTANCE data and metadata without downloading the whole dataset.
- Sample dataset version 2 (1.7 GB bz2 file, 2.74 GB after decompression). Fixed the metadata parameter name source_mt_scalar_moment_Nm.
- Sample dataset version 1 (1.7 GB bz2 file, 2.74 GB after decompression)
The following notebooks provide examples of reading INSTANCE waveforms and metadata. They refer to the sample dataset; to use them with the full dataset, the filenames must be changed accordingly.
- Plots.ipynb: to explore the distribution of significant parameters in INSTANCE using metadata
- Waveforms.ipynb: to select and plot 3-channel waveforms
- Station_Hypocenter_MomentTensor.ipynb: to draw maps of the earthquakes included in INSTANCE
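Before opening the notebooks, it can be useful to check which columns a metadata CSV actually contains. The sketch below uses only the Python standard library and reads just the header and the first few rows, so it also works on the bz2-compressed metadata file without decompressing it fully. The helper names `open_text` and `peek_metadata` are hypothetical, the file path is whatever you downloaded, and UTF-8 encoding is an assumption.

```python
import bz2
import csv
import itertools


def open_text(path):
    """Open a CSV as text, reading through bz2 when the name ends in .bz2."""
    if str(path).endswith(".bz2"):
        return bz2.open(path, "rt", newline="", encoding="utf-8")
    return open(path, "r", newline="", encoding="utf-8")


def peek_metadata(path, n_rows=5):
    """Return (column_names, first n_rows rows as dicts) without loading the file.

    Hypothetical helper, not part of INSTANCE.
    """
    with open_text(path) as handle:
        reader = csv.DictReader(handle)
        rows = list(itertools.islice(reader, n_rows))
        columns = list(reader.fieldnames or [])
    return columns, rows
```

For example, `"source_mt_scalar_moment_Nm" in peek_metadata(path)[0]` tells you whether a given metadata file uses the corrected parameter name from version 2.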
To run the notebooks please make sure the following packages are properly installed in your environment:
- obspy
- jupyter
- basemap
- pandas
- seaborn
- h5py
- hdf5
or just create a dedicated environment for INSTANCE:
conda create -n instance python=3.7 obspy jupyter basemap pandas seaborn h5py hdf5
conda activate instance
git clone https://github.com/cjunkk/instance
cd instance
curl http://repo.pi.ingv.it/instance/Instance_sample_dataset.tar.bz2 | tar xj
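Once the environment is active, a quick way to confirm that the required packages are importable is a small check like the one below. It is a sketch, not part of INSTANCE: the function name `missing_modules` is hypothetical, and the import names are assumptions (`mpl_toolkits.basemap` for basemap, `jupyter_core` for jupyter). The hdf5 package is a C library with no Python module of its own, so it is covered indirectly through `h5py`.

```python
import importlib.util

# Assumed import names for the packages listed above; adjust if your
# installation exposes them differently.
REQUIRED_MODULES = [
    "obspy",
    "jupyter_core",
    "mpl_toolkits.basemap",
    "pandas",
    "seaborn",
    "h5py",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of module names that cannot be located for import."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package is absent or a module has no spec.
            found = False
        if not found:
            missing.append(name)
    return missing
```

Running `print(missing_modules())` inside the `instance` environment should print an empty list when everything is installed; any name it does print points to a package to install.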
Creative Commons Attribution 4.0 International license (CC BY 4.0)