Toolbox for automated station quality control for MT inversion. Please contact me for further description, help or when something does not behave as expected ;-) : gesap@gfz-potsdam.de
- Catalog search for teleseismic events with uniform azimuthal coverage around array
- Download of data & metadata for these events + computation of synthetic data
- Relative gain factors in time domain (relative to reference station or to synthetic data)
- Rayleigh wave polarization analysis for detection of sensor misorientations
- Comparison of obs. and synth. PSDs; determining frequency ranges suitable for MT inversion
- Test for large timing errors (resolution depends on sampling rate of synthetic data)
- second independent and interactive test for exact and reliable amplitude corrections based on phase picking in snuffler and correlating waveforms
Petersen, G. M., Cesca, S., Kriegerowski, M. (2019): Automated Quality Control for Large Seismic Networks: Implementation and Application to the AlpArray Seismic Network. - Seismological Research Letters. 90 (3): 1177–1190. DOI: http://doi.org/10.1785/0220180342
- Improved feedback and error logging for easier user support.
- Option to generate automated maps without setting mid point and radius manuallyby setting
pl_opt: ['automatic']
in the config file (see example file below.) - data is downsampled to 10 Hz prior to restitution. This is necessary in case of high sampling rates to not introduce filtering errors.
- new independent and interactive test for amplitude corrections based on waveform correlations of teleseismic P phases --> please contact me for detailed instructions, I didn't have time to document the work-flow yet, but it is a great new test ;-)
- new simple test for large timing errors (see below...)
- added debug mode for a nice to gain and orientation test window selection experience;) can also be used to test filter settings
- python3
- Seismology toolbox pyrocko: https://pyrocko.org/ (Heimann et al. 2017) (and all requirements needed for pyrocko)
- To compute synthetic data a pre-calculated GF database can be downloaded using
fomosto
. For instance,fomosto download kinherd global_2s
from the pyrocko environment. https://greens-mill.pyrocko.org/ - for new interactive test: Grond (from the pyrocko suite of applications).
Tested on Ubuntu 16.04 and openSUSE, using matplotlib version 1.5.1 and gmt version 5.4.2 (Wessel et al., 2013).
- cd into the folder where you want to do the installation
- git clone https://github.com/gesape/AutoStatsQ
- cd AutoStatsQ
- (sudo) python3 setup.py install
... AutoStatsQ is updated whenever new ideas are implemented or bugs found...
- cd into your AutoStatsQ installation directory
- git pull origin main
- (sudo) python3 setup.py install
show basic commands/ help:
autostatsq -h
generate an example config file:
autostatsq --generate_config
run
autostatsq --config name_of_config_file --run
To get detailed error/ info logging, use the -l option:
autostatsq --config name_of_config_file --run -l INFO
Helpful for debugging: Forward terminal output into a logging file by using the option --logoutput FILENAME
.
The default config file should look like this (without the comments):
--- !autostatsq.config.AutoStatsQConfig
Settings:
- !autostatsq.config.GeneralSettings
work_dir: /some/data/directory/
list_station_lists: [/path/to/station-file/file.csv, /path/to/station-file/file.xml]
st_use_list: [STATION]
# if set, only stations in this list are considered. remove or set to [] to use all stations
# in station files.
- !autostatsq.config.CatalogConfig
search_events: true
# search gCMT catalog for events?
use_local_catalog: false
# Or use a local (already downloaded) catalog?
# Needed re-runs using same catalog.
subset_of_local_catalog: true
# Find a subset of the full catalog?
use_local_subsets: false
# Use local (already saved) subset instead?
subset_fns: {}
# if so, give here paths to subset-catalog-files: e.g.
# {'deep': 'catalog_deep_subset.txt',
# 'shallow': 'catalog_shallow_subset.txt'}
min_mag: 6.5
max_mag: 8.5
tmin_str: '2000-01-01 00:00:00'
tmax_str: '2018-10-01 00:00:00'
min_dist_km: 4000.0
max_dist_km: 20000.0
depth_options:
deep: [25000, 600000] # [m]
shallow: [100, 40000] # [m]
wedges_width: 15
# backazimuthal step for subset generation
# adjust to get more events, especially if time range is small
mid_point: [46.98, 10.74]
# give a rough estimate of midpoint of array/ network
# optional, if not provided a geographic station midpoint is calculated
### catalog plotting options ###
plot_catalog_all: false
# plots entire catalog on a map
plot_hist_wedges: false
# catalog statistics plot
plot_catalog_subset: false
# plots the subset(s) on a map
- !autostatsq.config.ArrTConfig
calc_first_arr_t: true
# Should first arrivals be computed?
phase_select: P|p|P(cmb)P(icb)P(icb)p(cmb)p|P(cmb)Pv(icb)p(cmb)p|P(cmb)P<(icb)(cmb)p
# which phases?
calc_est_R: true
# compute arrival time of Rayleigh waves for each station-event pair?
# (needed for orientation test)
v_rayleigh: 4.0 # [km/s] default
- !autostatsq.config.MetaDataDownloadConfig
# download of metadata and data
download_data: false
download_metadata: false
use_downmeta: true
# Set to true if downloaded metadata should be used.
# local_metadata: [stations.xml]
# list of local metadata files (uncomment if needed)
# local_data: [./data]
# list with paths to local waveform data (uncomment if needed)
# sds_structure: true
# if the local waveform data is saved in sds structure, set to true!
# otherwise assessing local data might be very slow in case of large amounts of data.
# working on it...
# local_data_only: true
# if only local, no freshly downloaded data is used
channels_download: HH*
# '*' would download all and analyse the most broadband channel for each
# station
token:
geofon: /path/to/token/token.asc
# delete token-dictionary, if no token needed for fdsn query
sites: [geofon, orfeus, iris]
dt_start: 0.1
# start time before origin time [h]
dt_end: 1.5
# end time after origin time [h]
- !autostatsq.config.RestDownRotConfig
# restitution, downsampling and rotation of data
# required for all tests
rest_data: false
freqlim: [0.005, 0.01, 0.2, 0.25] # [Hz]
rotate_data: false
deltat_down: 2 [s]
# set deltat_down to 0.0 if no downsampling is wanted. (This will slow down everything,
# and the PSD-test does only work if the sampling freuqency of synthtic and real data is
# the same.)
- !autostatsq.config.SynthDataConfig
# computation of synthetic data
# needed for PSD-test only, can otherwise be left out
make_syn_data: false
engine_path: /path/to/GF_stores
store_id: global_2s
- !autostatsq.config.GainfactorsConfig
# settings for first test
calc_gainfactors: false
gain_factor_method:
- reference_nsl
- [GE, MATE]
### describe different methods
fband:
corner_hp: 0.01 # [Hz]
corner_lp: 0.2 # [Hz]
order: 4
taper_xfrac: 0.25 # [s]
wdw_st_arr: 5
wdw_sp_arr: 60
# time window around P phase onset, start [s] before and end [s] after theo. arrival time
snr_thresh: 2. # threshold for snr of used event
debug_mode: false # if true, time windows are opened in snuffler to check window settings.
phase_select: first(P|p|P(cmb)P(icb)P(icb)p(cmb)p|P(cmb)Pv(icb)p(cmb)p|P(cmb)P<(icb)(cmb)p)
components: [Z, R, T]
# plotting options
plot_median_gain_on_map: false
plot_allgains: false
- !autostatsq.config.PSDConfig
# settings for PSD test
calc_psd: false
tinc: 600 # [s]
tpad: 200 # [s]
dt_start: 60 # [s] start before arrival of first P phases
dt_end: 120 # [s] end before arrival of Rayleigh waves
n_poly: 25
norm_factor: 50
f_ign: 0.02 # [Hz]
only_first: true # outputs only first "flat" frequency range
plot_psds: false
plot_ratio_extra: false
plot_m_rat: false
plot_flat_ranges: false
- !autostatsq.config.OrientConfig
# settings for orientation test
orient_rayl: false
bandpass: [3.0, 0.01, 0.05] # [Hz]
start_before_ev: 30.0 # start befor theo. Rayleigh wave arrival, [s]
stop_after_ev: 480.0 # end after theo. Rayleigh wave arrival, [s]
ccmin: 0.80
# min. cross-correlation value. results below this value will not be
# considered
debug_mode: false
# if true, time windows are opened in snuffler to check window settings.
plot_heatmap: false
# plot correction angle vs. cross-correlation value as imshow heatmap
# usually distibution plot is better.
plot_distr: false
# plot correction angle vs. cross-correlation value
# usually distibution plot is better.
plot_orient_map_fromfile: false
# plot a map with correction angles as lines
plot_angles_vs_events: false
# plot angle vs single events, one plot for each station
- !autostatsq.config.TimingConfig
# simple test for large timing errors (> 2s)
timing_test: false
bandpass: [3, 0.01, 0.1]
time_wdw: [firstP, 1200]
# needs a long time window for correlation
cc_thresh: 0.6
# test appropriate setting with debug mode, depends on frequency range
search_locations: false
# uses all stations in station list if false, otherwise all found
# in traces are used
debug_mode: false
# starts in interactive mode in snuffler showing the traces and the obtained
# cross correlation function
- !autostatsq.config.TeleCheckConfig
tele_check: false
- !autostatsq.config.maps
# settings for all output maps
map_size: [30.0, 30.0]
pl_opt: [46, 11.75, 800000]
# mid point of map (lat, lon) and radius [m]
# to use automated map dimensions:
# pl_opt: ['automatic']
pl_topo: false
# plotting topography can be very slow,
# topographic data will be downloaded first
Lists of stations as input can be in pyrocko station format, as station-xml or as comma-spread-file with columns: network code, station code, latitude (float), longitude(float), station elevation [km], station depth [km]. Please use the file extensions
- make a working directory
- prepare station list for input
- generate a template config file (
autostatsq --generate_config GENERATE_CONFIG
) and adjust the settings - it might be helpful to not run the toolbox in one run, but go though it step-by-step, setting the current step to
true
and the others tofalse
... (autostatsq --config my_config_file --run RUN
)
Gain test:
- yaml files with median, mean and standard deviation of Ai,j/Ai,ref; i: event, j: station
- csv files with Ai,j/Ai,ref for all events and all stations
- optional: maps showing median log. Ai,j/Ai,ref for all stations and components
- optional: ...
Orientation test:
- yaml file with median, mean and standard deviation of obtained correction angle
- yaml file with polarity errors
- yaml file containing all results for all stations and single events
- optional: map showing the median correction angle; figures showing cc value vs correction angle for each event and station; figure showing correction angle over time for each station
PSD test:
- flat freq ranges in yaml file for each station and component
- optional: synth. and real PSDs
- optional: plots showing fit through PSD ratios
--> The other tests are described in detail here: http://doi.org/10.1785/0220180342
The small implemented check for timing errors is based on the cross-correlation between the recorded traces and the synthetic vertical traces: (1) First, for each event and station the two (syn. + obs.) traces are correlated to obtain the time shift for which the correlation is highest. (2) In a second step the median time shift of each event over all stations is determined and the time shift values at the single stations are corrected for this median value. This is done to avoid errors from wrong origin times in the catalog, to take into account large deviations between orgin time and centroid time, and to consider large path effects of the teleseismic test events which effect all stations in a similar manner.
The test is run in low frequency ranges (e.g. 0.01-0.10 Hz) and using synthetics computed from a global GFDB with a sampling of 2s. Therefore this test can only be applied to detect large timing errors in the order of several seconds. This error range is only useful to check prior to other seismological applications which use a similar frequency range as e.g. MT inversions.
The output is returned as a yaml file with mean, median, standard deviations and number of considered events. Additionally one figure shows the correlation matrices, before and after correction with the median of each event, and a second figure shows the obtained timing errors for each station.
For the input parameters please check the exemplary config file above.
Wessel, P., Smith, W. H.~F., Scharroo, R., Luis, J.~F., and Wobbe, F. (2013). Generic Mapping Tools: Improved version released. EOS Trans. AGU, 94:409--410.
Heimann, S., Kriegerowski, M., Isken, M., Cesca, S., Daout, S., Grigoli, F., Juretzek, C., Megies, T., Nooshiri, N., Steinberg, A., Sudhaus, H., Vasyura-Bathke, H., Willey, T., and Dahm, T. (2017). Pyrocko - an open-source seismology toolbox and library. v. 0.3. GFZ Data Services. http://doi.org/10.5880/GFZ.2.1.2017.001.