Matching REAL catalogue to USGS catalogue for largest events
Out of the 24 USGS events in the timeframe 1 Jan 2020 to 30 Jun 2021 (right?):
Assuming a 4-second time window (+ and - from the USGS origin time), 7 events match.
Assuming a 30-second time window, 8 events match.
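The window matching above can be sketched like this (a minimal sketch; function and variable names are mine, not from the actual pipeline):

```python
from datetime import datetime, timedelta

def match_events(usgs_times, real_times, window_s):
    """Pair each USGS origin time with the closest REAL origin
    within +/- window_s seconds, if any."""
    tol = timedelta(seconds=window_s)
    pairs = []
    for u in usgs_times:
        hits = [r for r in real_times if abs(r - u) <= tol]
        if hits:
            pairs.append((u, min(hits, key=lambda r: abs(r - u))))
    return pairs

# The borderline event from this issue: origins 29 s apart
usgs = [datetime(2020, 9, 24, 9, 34, 6)]
real = [datetime(2020, 9, 24, 9, 34, 35)]
print(len(match_events(usgs, real, 4)))   # 0: outside +/- 4 s
print(len(match_events(usgs, real, 30)))  # 1: inside +/- 30 s
```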
The 1 event in the balance (matched at 30 s but not at 4 s):
USGS: 2020-09-24 09:34:06, 96.68°E 5.2464°N, mb 4.1, depth 220 km, likely plate interface
REAL: 2020-09-24 09:34:35, 96.1407°E 5.4246°N, ML 4.5, depth 32 km
Likely different events. Could be triggered seismicity?
The REAL event seems to have its own event cluster despite not being in the USGS catalogue.
Both of these are offshore to the Northwest:
The station coverage here is not fantastic.
After generating hypothetical travel times (considering only events shallower than 80 km in the USGS catalogue), I use a P-arrival fudge window of 4 s and an S-arrival fudge window of 8 s. This is very generous, but:
- I'm not sure what the accuracy of the USGS catalogue origin time is
- There will be velocity model errors anyway (Muksin 2019 suggests that the near-surface P-wave velocity is much lower?)
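The fudge-window check itself is simple; here is a minimal sketch (my own version, with placeholder names, not the real code):

```python
from datetime import datetime, timedelta

FUDGE_S = {"P": 4.0, "S": 8.0}  # fudge windows in seconds, generous on purpose

def pick_matches(origin, travel_time_s, pick_time, phase):
    """True if an observed pick falls within the fudge window
    around the predicted arrival (origin time + travel time)."""
    predicted = origin + timedelta(seconds=travel_time_s)
    return abs((pick_time - predicted).total_seconds()) <= FUDGE_S[phase]

# Hypothetical example: predicted P at origin + 25 s
origin = datetime(2021, 4, 5, 12, 0, 0)
print(pick_matches(origin, 25.0, origin + timedelta(seconds=28), "P"))  # True: 3 s late
print(pick_matches(origin, 25.0, origin + timedelta(seconds=30), "P"))  # False: 5 s late
```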
I find that all the events up to May have corresponding phases as recorded by EQT (they made it past the customfilter), which is not surprising: these are the largest-magnitude events and you'd imagine it would be difficult to miss them. There are 16 of them.
Of these 16, 7 are found in the REAL catalogue.
Of these 16, 3 have only 1 or 2 detections, so they would not be accepted by the association.
But since this is the customfilter set (S SNR > 8, agreement == 20), if I look at the merged detections (which, by the way, drop EQT picks with only 1 phase), there are at least 53 detections (hence 106 phases), along with maybe 1 or 2 more events (could be aftershocks, or something else entirely).
Hence the recommendation: run the association on the larger data set, with the downside that it would probably take a long time.
I think I'll want to find a good grid spacing, which depends on the station-station distance, which I'll have to work out myself. I should also run tests on a few specific days to check the number of events REAL produces (which is also partly why I wrote the testbench).
Note: the maximum event-station distance is less than 200 km, so that will limit the search size for REAL grids.
- Find out what's the smallest grid spacing you're okay with
Speed
- Vary no. of processors, keeping thread count at 32
- Vary no. of vertical cells
- Vary no. of horizontal cells
No. of events
- Vary grid spacing
- Vary threshold (unlikely that this is limiting...)
On the bright side, most of the nodes are occupied right now so :) Merry Christmas to me I guess
Along the linear array, you more or less want a grid spacing of 0.5 km because that's around half of the station distance (~1.3 km, ish).
There are some very dense series but that probably doesn't matter too much for the initial association.
Unfortunately the grid search algorithm naively searches in a grid so it's very inefficient.
If you already know what the possible locations are, you could use pre-defined templates for the locations, so it's no longer a naive search.
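The half-station-spacing rule of thumb above can be computed directly. A sketch, using made-up coordinates roughly matching the ~1.3 km spacing (real station coordinates would go in `stations`):

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def suggested_spacing_km(stations):
    """Half the minimum inter-station distance, as a grid-spacing guide."""
    dmin = min(haversine_km(*a, *b) for a, b in combinations(stations, 2))
    return dmin / 2

# Hypothetical linear-array coordinates ~1.3 km apart
stations = [(5.20, 96.00), (5.20, 96.012), (5.20, 96.024)]
print(round(suggested_spacing_km(stations), 2))  # ~0.66 km
```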
Test period: 5 Apr to 12 Apr 2021 since that includes 2 events in the USGS catalogue.
Speed benchmark: 5 Apr
I should add an option to the REAL wrapper so that it automatically distributes the file list over some number of array jobs.
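A round-robin split would be enough for that (a sketch; `distribute` is a hypothetical helper, not an existing wrapper option):

```python
def distribute(files, n_jobs):
    """Split a file list into n_jobs near-equal chunks, one per array job."""
    chunks = [[] for _ in range(n_jobs)]
    for i, f in enumerate(files):
        chunks[i % n_jobs].append(f)  # round-robin assignment
    return chunks

# Example: the 5-12 Apr test period spread over 3 array jobs
days = [f"2021-04-{d:02d}" for d in range(5, 13)]
for job_id, chunk in enumerate(distribute(days, 3)):
    print(job_id, chunk)
```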
No. of cores benchmark, 5 Apr 21
1 day, 32 OMP threads
16 cores: 9m 40s
8 cores: 9m 51s
4 cores: 13m 50s
2 cores: 13m 48s
1 core: 13m 41s
Grid spacing of 0.1 deg (horizontal) and 5 km (vertical)
Range of 2 deg (horizontal) and 60 km (vertical)
Settled on ~11 min run time per day with a 1 deg range and 0.05 deg spacing.
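Back-of-envelope node counts for these two settings (my rough count, assuming the search simply visits every node of a horizontal × horizontal × vertical grid with the ranges as total extents; this is not REAL's actual accounting):

```python
def grid_nodes(range_deg, spacing_deg, range_km, spacing_km):
    """Rough node count: horizontal nodes squared times vertical levels,
    endpoints included."""
    nh = round(range_deg / spacing_deg) + 1
    nv = round(range_km / spacing_km) + 1
    return nh * nh * nv

print(grid_nodes(2.0, 0.1, 60.0, 5.0))   # coarse benchmark run: 5733
print(grid_nodes(1.0, 0.05, 60.0, 5.0))  # settled run: also 5733
```

Under this crude count the two settings visit the same number of nodes, which would be consistent with the similar per-day run times.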
Running on Jan to March, all EQT phases (no custom filter) with 16 workers.
Want to add an n_worker option to make it even faster.
To compile the number of events, check for collisions with the original catalogue.