zhengyang-c/cy1400-eqt

Matching REAL catalogue to USGS catalogue for largest events


There are 24 USGS events in the timeframe 1 Jan 2020 to 30 Jun 2021 (right?).

Assuming a 4-second time window (± from the USGS origin time), 7 events match.

Assuming a 30-second time window, 8 events match.
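
For reference, a minimal sketch of this window matching, assuming the catalogues are reduced to lists of datetime origin times (variable names are hypothetical):

```python
from datetime import datetime

def match_by_origin_time(usgs_times, real_times, window_s=4.0):
    """Pair up events whose origin times differ by <= window_s seconds."""
    return [
        (u, r)
        for u in usgs_times
        for r in real_times
        if abs((u - r).total_seconds()) <= window_s
    ]

# The "event in balance" below: 29 s apart, so it matches only at 30 s.
usgs = [datetime(2020, 9, 24, 9, 34, 6)]
real = [datetime(2020, 9, 24, 9, 34, 35)]
print(len(match_by_origin_time(usgs, real, window_s=4.0)))   # 0: no match
print(len(match_by_origin_time(usgs, real, window_s=30.0)))  # 1: matches
```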

The one event in the balance:
USGS: 2020-09-24 09:34:06, 96.68°E 5.2464°N, mb 4.1, depth 220 km, likely plate interface
REAL: 2020-09-24 09:34:35, 96.1407°E 5.4246°N, ML 4.5, depth 32 km

These are likely different events. Could this be triggered seismicity? The REAL event seems to have its own event cluster while not appearing in the USGS catalogue.

Both of these are offshore to the northwest.

The station coverage here is not fantastic.


REAL timestamps are all slightly later than the USGS catalogue timings, which is not a serious issue until we do something like a precise grid search.

After generating hypothetical travel times, I only consider events shallower than 80 km in the USGS catalogue.

I use a P-arrival fudge window of 4 s and an S-arrival fudge window of 8 s, which is very generous, but deliberately so (a sketch of the check follows this list), because:

  1. I'm not sure what the accuracy of the USGS catalogue origin times is
  2. There will be velocity-model errors anyway (Muksin et al. 2019 suggests that the P-wave velocity near the surface is much lower?)
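
For concreteness, a minimal sketch of the fudge-window check, assuming predicted arrivals come from the travel-time table and EQT picks are (station, phase, time) tuples; all names here are illustrative, not the actual cy1400-eqt API:

```python
# Fudge windows in seconds: generous on purpose (see points 1 and 2 above).
FUDGE = {"P": 4.0, "S": 8.0}

def has_matching_pick(predicted_arrival, phase, station, picks):
    """True if any EQT pick of this phase at this station falls within
    the fudge window around the predicted arrival time (datetimes)."""
    window = FUDGE[phase]
    return any(
        pk_station == station
        and pk_phase == phase
        and abs((pk_time - predicted_arrival).total_seconds()) <= window
        for (pk_station, pk_phase, pk_time) in picks
    )
```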

I find that all the events up to May have corresponding phases as recorded by EQT (they made it past the customfilter), which is not surprising, since these are the largest-magnitude events and you'd imagine they would be difficult to miss; there are 16 of them.

Of these 16, 7 are found in the REAL catalogue.
Of these 16, 3 have only 1 or 2 detections, such that they would not be accepted by the association.

But since this is the customfilter set (S SNR > 8, agreement == 20), if I look at the merged detections (which, by the way, drop EQT picks with only one phase), there are at least 53 detections (hence 106 phases), along with maybe 1 or 2 more events (could be aftershocks, or something else entirely).
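
As an aside, a minimal sketch of that merge rule, assuming picks come as (detection_id, phase, time) tuples; the data layout here is an assumption, not the actual cy1400-eqt format:

```python
from collections import defaultdict

def merge_detections(picks):
    """Keep only detections that carry both a P and an S pick, so each
    surviving detection contributes two phases."""
    phases = defaultdict(set)
    for det_id, phase, _time in picks:
        phases[det_id].add(phase)
    keep = {d for d, ph in phases.items() if {"P", "S"} <= ph}
    return [p for p in picks if p[0] in keep]
```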

Hence, the recommendation would be to use the larger data set to run the association, with the downside that it would probably take a long time.

I think I'll want to find a good grid spacing, which depends on the station-station distance, which I'll have to compute myself (see the sketch below). I also want to run tests on a few specific days to check the number of events that REAL produces (which is also partly why I wrote the testbench).
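
A minimal sketch for getting that distance, assuming stations are given as (lat, lon) pairs; the function names are mine, not the repo's:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    R = 6371.0
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * R * asin(sqrt(a))

def min_interstation_km(stations):
    """stations: list of (lat, lon). Smallest pairwise distance, km."""
    return min(
        haversine_km(*a, *b)
        for i, a in enumerate(stations)
        for b in stations[i + 1:]
    )
```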

Note: the maximum event-station distance is less than 200 km, so that will limit the search size for REAL grids.

  • Find out the smallest grid spacing you're okay with

Speed

  • Vary no. of processors, keeping thread count at 32
  • Vary no. of vertical cells
  • Vary no. of horizontal cells

No. of events

  • Vary grid spacing
  • Vary threshold (unlikely that this is limiting...); a rough driver for these sweeps is sketched below
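
A rough benchmark driver for those sweeps might look like the following; the `real_wrapper.py` script name and its flags are placeholders, not the actual wrapper interface in this repo:

```python
import subprocess
import time

# Vary the processor count while holding OMP threads at 32,
# timing one day of data per run.
for n_proc in (1, 2, 4, 8, 16):
    t0 = time.perf_counter()
    subprocess.run(
        ["python", "real_wrapper.py", "--day", "2021-04-05",
         "--n-proc", str(n_proc), "--omp-threads", "32"],
        check=True,
    )
    print(f"{n_proc} cores: {time.perf_counter() - t0:.1f} s")
```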

On the bright side, most of the nodes are occupied right now, so :) Merry Christmas to me, I guess.

Along the linear array, you more or less want a grid size of 0.5 km, because that's around half of the station spacing (~1.3 km, ish).

There are some very densely spaced stations, but that probably doesn't matter too much for the initial association.

Unfortunately, the grid-search algorithm naively searches every node of the grid, so it's very inefficient.

If you already know what the possible locations are, you can use pre-defined templates for the locations, so it's no longer a naive search.
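
To make the inefficiency concrete, a toy version of a naive grid search (everything here is a placeholder, not REAL's internals); the cost is one `score_fn` evaluation per node, i.e. nx * ny * nz per candidate origin time:

```python
def naive_grid_search(score_fn, lons, lats, depths):
    """Evaluate score_fn at every (lon, lat, depth) node; return the best."""
    best_score, best_node = float("-inf"), None
    for lon in lons:
        for lat in lats:
            for depth in depths:
                s = score_fn(lon, lat, depth)  # e.g. picks within travel-time windows
                if s > best_score:
                    best_score, best_node = s, (lon, lat, depth)
    return best_node, best_score
```

With pre-defined templates, you would call `score_fn` only at the known candidate locations instead of at every node.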

Test period: 5 Apr to 12 Apr 2021, since that includes 2 events in the USGS catalogue.

Speed benchmark: 5 Apr

I should add an option to the REAL wrapper such that it automatically distributes the file list over some number of array jobs.
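
A minimal sketch of how that distribution could work, assuming SLURM-style array jobs (the wrapper integration and variable names are hypothetical):

```python
import os

def chunk(files, n_jobs):
    """Split files into n_jobs contiguous, near-equal chunks."""
    k, r = divmod(len(files), n_jobs)
    out, start = [], 0
    for i in range(n_jobs):
        end = start + k + (1 if i < r else 0)
        out.append(files[start:end])
        start = end
    return out

# Each array task picks its own chunk of the day-file list.
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", 0))
# my_files = chunk(all_day_files, n_jobs=16)[task_id]
```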

No. of cores benchmark, 5 Apr 2021
1 day, 32 OMP threads

16 cores: 9m 40s
8 cores: 9m 51s
4 cores: 13m 50s
2 cores: 13m 48s
1 core: 13m 41s

Grid size of 0.1 deg (horizontal) and 5 km (vertical)
Range of 2 deg (horizontal) and 60 km (vertical)

Settled on ~11 min run time per day with a 1 deg range and 0.05 deg spacing.

Running on Jan to March, all EQT phases (no custom filter) with 16 workers.

Want to add an n_worker option to make it even faster.

To compile the number of events, check for collisions with the original catalogue.