uafgeotools/mtuq

Error in using FK metadata for Green's functions

ammcpherson opened this issue · 19 comments

There seems to be a point in MTUQ where, while the data are being read in (for example), the event depth is read from the SAC headers. Later, when the data processing class is mapped onto the data, an np.ceil operation is applied to this event depth.

This causes problems when using local FK databases if the database contains a layer boundary at that rounded depth. For example, if I have an event at depth 8850 meters, and MTUQ pulls that depth from the SAC header and rounds up to 9000 meters, then MTUQ will fail if I am using an FK database with a boundary at 9000 meters depth. The error it throws is:

 File "run_mtuq.py", line 181, in <module>
    data_bw = data.map(process_bw)
  File "/home/ammcpherson/REPOSITORIES/mtuq/mtuq/dataset.py", line 143, in map
    processed += [function(stream, *args)]
 File "/home/ammcpherson/REPOSITORIES/mtuq/mtuq/process_data.py", line 497, in __call__
    sac_headers = obspy.read('%s/%s_%s/%s.grn.0' %
  File "/home/ammcpherson/.conda/envs/mtuq/lib/python3.8/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/home/ammcpherson/.conda/envs/mtuq/lib/python3.8/site-packages/obspy/core/util/decorator.py", line 300, in _map_example_filename
    return func(*args, **kwargs)
  File "/home/ammcpherson/.conda/envs/mtuq/lib/python3.8/site-packages/obspy/core/stream.py", line 212, in read
    st = _generic_reader(pathname_or_url, _read, **kwargs)
  File "/home/ammcpherson/.conda/envs/mtuq/lib/python3.8/site-packages/obspy/core/util/base.py", line 700, in _generic_reader
    raise IOError(2, "No such file or directory", pathname)
FileNotFoundError: [Errno 2] No such file or directory: '/store/wf/FK_synthetics/scak/scak_9/69.grn.0'

I'm not sure how to address this, but currently it means there are events that are simply unusable in MTUQ with local FK databases. I have run into at least two of them so far. Perhaps the function could instead take the event depth from the Origin object we set up earlier in the code, so that the user can fudge the depth as needed to account for these boundaries?
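For reference, here is a rough reconstruction of the lookup that fails. The variable names below are my guesses from the traceback, not the actual MTUQ source:

import numpy as np

# hypothetical reconstruction of the failing path lookup; names are illustrative
fk_database = '/store/wf/FK_synthetics/scak'
model = 'scak'
distance_in_km = 69
depth_in_m = 8850.                               # event depth from the SAC header
depth_in_km = int(np.ceil(depth_in_m / 1000.))   # 8.85 km rounds up to 9 km

# the scak_9 directory is empty because the model has a boundary at 9 km depth
path = '%s/%s_%s/%s.grn.0' % (fk_database, model, depth_in_km, distance_in_km)
print(path)   # /store/wf/FK_synthetics/scak/scak_9/69.grn.0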

Thanks.

Hi Amanda,

I think I see what's going on, but please respond further if I'm missing anything.

The problem doesn't seem to happen when the Green's functions are originally read in. Indeed, for the Green's function reader, origin is a required input argument, so it's clear what origin is being passed.

The problem occurs instead when process_data is called. In this case, origin is an optional input argument. Maybe the thinking was that for many data processing operations, e.g. bandpass filtering, origin information is not required. But it turns out that some data processing operations involve on-the-fly calculations using origin information. If the optional origin argument is not supplied, then it seems the code defaults to the catalog origin.

So maybe the simplest thing to do is to pass the origin directly to the process_data function.

There are probably other less direct workarounds too, like extending the FK database or using a taup model instead of FK metadata for the takeoff angle calculation.
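For the taup workaround, the processing setup might look something like the sketch below. I'm writing this from memory, so treat the parameter names (pick_type, taup_model) and the 'ak135' model name as things to double-check against the examples:

from mtuq.process_data import ProcessData

# sketch only: body-wave processing with taup picks instead of FK metadata;
# filter corners and window length are placeholders
process_bw = ProcessData(
    filter_type='Bandpass',
    freq_min=0.25,
    freq_max=0.6667,
    pick_type='taup',
    taup_model='ak135',
    window_type='body_wave',
    window_length=15.,
    )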

Please let us know if one of these workarounds succeeds, or if the issue remains unresolved.

thanks,
ryan

Hi Ryan,

Passing the origin to ProcessData yields the same error:

process_bw = ProcessData(
    filter_type='Bandpass',
    freq_min=0.25,
    freq_max=0.6667,
    pick_type='FK_metadata',
    FK_database=fk_path,
    window_type='body_wave',
    window_length=15.,
    capuaf_file=path_weights,
    origin=catalog_origin,
    )

If I instead try to put it in data.read:

data = read(path_data, 
                    format='sac',
                    origin=catalog_origin,
                    event_id=eid,
                    station_id_list=station_id_list,
                    tags=['units:cm', 'type:velocity'])

I get TypeError: read() got an unexpected keyword argument 'origin'.

Finally, if I try to put it in data.map:

data_bw = data.map(process_bw,origin=catalog_origin)

I get the same TypeError:

TypeError: map() got an unexpected keyword argument 'origin'

Can you also offer some clarity on why MTUQ is trying to read in a Green's Function to process the data initially? This happens before greens = db.get_greens_tensors(stations, origins) is called.

Thanks,
Amanda

I think you could do the data processing in the following way:

for stream in data_bw:
    processed_stream = process_bw(stream, origin=origin)

Alternatively, you could follow the syntax for map which requires an iterable for each input argument.

I think FK Green's functions are being read during data processing because you selected pick_type='FK_metadata'. With this option, P and S travel times are read from the FK SAC headers. Another workaround could be to use pick_type='taup' instead.

Trying your code snippet, I get: TypeError: 'ProcessData' object is not iterable, which seems to make sense to me since process_bw is the function being applied to the iterable data using map.

Can you point me toward where MTUQ is reading in the SAC event depth, and I will put a try/except statement in my fork to deal with this locally?

I think something like the following should work:

for stream in data_bw:
    processed_stream = process_bw(stream, origin=origin)

I could be making a small typo though. I don't follow how you get TypeError: 'ProcessData' object is not iterable.

The SAC depth header is read from the observed data by io/readers/SAC.py.
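If it helps, you can inspect the depth value stored in the headers directly with ObsPy. The file path below is just a placeholder for one of your own SAC files:

import obspy

# quick check of the event depth (evdp) stored in the SAC headers
st = obspy.read('path/to/observed_data.sac', format='SAC')
print(st[0].stats.sac.evdp)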

Here is what I have in my run script:

print('Reading data...\n')
data = read(path_data,
    format='sac',
    event_id=eid,
    station_id_list=station_id_list,
    tags=['units:cm', 'type:velocity'])

data.sort_by_distance()
stations = data.get_stations()

print('Processing data...\n')
for stream in data:
    processed_stream = process_bw(stream, origin=catalog_origin)

data_bw = data.map(processed_stream)

Which yields:

Traceback (most recent call last):
  File "run_mtuq.py", line 187, in <module>
    data_bw = data.map(processed_stream)
  File "/home/ammcpherson/REPOSITORIES/mtuq/mtuq/dataset.py", line 143, in map
    processed += [function(stream, *args)]
TypeError: 'Stream' object is not callable

Thanks for the detailed info, which helps a lot.

Maybe it would be easier to talk this through, just to avoid so many emails. If you like, call me or send a Zoom link, using the contact info I just emailed. Or if it is not urgent now, feel free to wait until later.

I think we are getting very close.

Hi Ryan,

Per our discussion, I added the following lines of code to my run script:

data_bw = Dataset()
data_sw = Dataset()
for stream in data:
    processed_stream_bw = process_bw(stream,origin=catalog_origin)
    data_bw += processed_stream_bw

    processed_stream_sw = process_sw(stream,origin=catalog_origin)
    data_sw += processed_stream_sw

This solved the initial error, which came from my FK database being built from a velocity model with a horizontal discontinuity at the catalog event depth.

However, this caused an error later in my script when I attempted to perform the grid search (results_bw = grid_search(data_bw, greens_bw, misfit_bw, origins, grid)):

File "/home/ammcpherson/REPOSITORIES/mtuq/mtuq/misfit/waveform/__init__.py", line 176, in __call__
    raise Exception("Inconsistent container lengths\n\n  "+
Exception: Inconsistent container lengths

  len(data): 50
  len(greens): 27

I went on to add the following lines of code to process the Green's functions similarly to the data:

greens_bw = Dataset()
greens_sw = Dataset()
for stream in greens:
    processed_greens_bw = process_bw(stream,origin=catalog_origin)
    greens_bw += processed_greens_bw

    processed_greens_sw = process_sw(stream,origin=catalog_origin)
    greens_sw += processed_greens_sw

This solved the container length problem, but threw another error when attempting the grid search:

File "/home/ammcpherson/REPOSITORIES/mtuq/mtuq/grid_search.py", line 130, in grid_search
    results_bw = grid_search(
File "/home/ammcpherson/REPOSITORIES/mtuq/mtuq/util/__init__.py", line 232, in timed_func
    values = _grid_search_serial(
File "/home/ammcpherson/REPOSITORIES/mtuq/mtuq/grid_search.py", line 172, in _grid_search_serial
    return func(*args, **kwargs)
File "/home/ammcpherson/REPOSITORIES/mtuq/mtuq/dataset.py", line 100, in select
    data, greens.select(origin), sources, msg_handle)]
File "/home/ammcpherson/REPOSITORIES/mtuq/mtuq/dataset.py", line 33, in __init__
    return self.__class__(
  File "/home/ammcpherson/REPOSITORIES/mtuq/mtuq/dataset.py", line 91, in <lambda>
    selected = lambda stream: stream.origin==selector
AttributeError: 'Trace' object has no attribute 'origin'

Thanks,
Amanda

Does the following give any improvement?

from mtuq import GreensTensorList
greens_bw = GreensTensorList()
greens_sw = GreensTensorList()
for stream in greens:
    processed_greens_bw = process_bw(stream,origin=catalog_origin)
    greens_bw += processed_greens_bw

    processed_greens_sw = process_sw(stream,origin=catalog_origin)
    greens_sw += processed_greens_sw

The end error is essentially the same:

File "/home/ammcpherson/REPOSITORIES/mtuq/mtuq/greens_tensor/base.py", line 274, in <lambda>
    lambda tensor: tensor.origin==selector, self))
AttributeError: 'Trace' object has no attribute 'origin'

Both observed data and Green's functions are stored in list-like containers, which contain other nested lists.

Observed data should be stored in the following way:

mtuq Dataset
    ObsPy Stream objects
        ObsPy Trace objects

Green's functions should be stored in the following way:

mtuq GreensTensorList
    mtuq GreensTensor objects
        ObsPy Trace objects

The exception occurs because a Stream object is expected, but a Trace object is actually being received.

So for some reason the nesting is getting messed up.

To debug, you could add print(type(...)) statements throughout the script.
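For example, something along these lines, using the variable names from your snippets above:

# check the nesting at each level; per the layout described above we expect
#   Dataset -> Stream -> Trace                  for the observed data
#   GreensTensorList -> GreensTensor -> Trace   for the Green's functions
print(type(data_bw), type(data_bw[0]), type(data_bw[0][0]))
print(type(greens_bw), type(greens_bw[0]), type(greens_bw[0][0]))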

It could be a good debugging exercise, giving experience with a common Python pitfall.

In other words, the difference between

greens_sw += processed_greens_sw

and

greens_sw += [processed_greens_sw]

can be a common Python pitfall.

See also the difference between append() and extend() methods.
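A plain-Python illustration of the same pitfall:

outer = []
inner = ['a', 'b']

outer += inner      # extends: outer is now ['a', 'b']   (one nesting level lost)
outer += [inner]    # appends: outer is now ['a', 'b', ['a', 'b']]

outer2 = []
outer2.extend(inner)    # same effect as +=  inner
outer2.append(inner)    # same effect as += [inner]
print(outer == outer2)  # True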

For example, what do you get from the following?

print(type(process_bw(greens_tensor,origin=catalog_origin)))

Perhaps a fix might be something like

greens_bw = GreensTensorList()
for greens_tensor in greens:
    processed_greens_bw = process_bw(greens_tensor,origin=catalog_origin)
    greens_bw += GreensTensor(processed_greens_bw)

After this snippet of code:

greens.convolve(wavelet)
print(type(process_bw(greens,origin=catalog_origin)))

I get the following attribute error:

Traceback (most recent call last):
  File "run_mtuq.py", line 212, in <module>
    print(type(process_bw(greens,origin=catalog_origin)))
  File "/home/ammcpherson/REPOSITORIES/mtuq/mtuq/process_data.py", line 365, in __call__
    station.latitude,
AttributeError: 'NoneType' object has no attribute 'latitude'

And after trying your proposed fix:

Traceback (most recent call last):
  File "run_mtuq.py", line 219, in <module>
    greens_bw += GreensTensor(processed_greens_bw)
  File "/home/ammcpherson/REPOSITORIES/mtuq/mtuq/greens_tensor/base.py", line 48, in __init__
    raise TypeError
TypeError

I also tried greens_sw += [processed_greens_sw] and got a similar error to what initially prompted me to process the Green's Functions similarly to the data:

Exception: Inconsistent container lengths

  len(data): 50
  len(greens): 27

After looking at it closely just now, I think the troubleshooting suggestions above were on the right track. It's possible to do what you want in a few lines of code; it just takes some familiarity with Python syntax.

I'll put together a working example. In the meantime, continue troubleshooting if you like. I think the tracebacks above might start to make sense if you add some print(type(...)) debugging statements.

Attached is a working example, where I've taken one of the default examples, and replaced the data processing map() calls with for loops.
working_example.txt
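In outline, the change looks roughly like the sketch below (not the attached file verbatim). Here data, greens, process_bw, and catalog_origin are the same variables as in your script, and the bracketed += keeps one element per station, per the append/extend discussion above:

from mtuq import GreensTensorList
from mtuq.dataset import Dataset

# sketch only: replace the map() calls with explicit loops, passing the origin
# at each call and wrapping each result in a list so the nesting is preserved
data_bw = Dataset()
for stream in data:
    data_bw += [process_bw(stream, origin=catalog_origin)]

greens_bw = GreensTensorList()
for tensor in greens:
    greens_bw += [process_bw(tensor, origin=catalog_origin)]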

Hi Ryan,

Thanks for your hard work on this - I tried out the new loops and all seems well. I tested it with a fixed magnitude and depth on one of my problem events just now and arrived at a solution!

Just for posterity's sake, I would like to restate the problem succinctly, for whenever someone (maybe me) gets a chance to look at it more closely.

  1. Suppose you have an FK database that you would like to compute travel times and synthetics from, and that FK database is based on a velocity model with a horizontal discontinuity at some depth (or at several depths). The depth directory corresponding to each such discontinuity will be empty.
  2. When you choose to use that FK database to compute the P and S travel times, MTUQ takes the event depth from the SAC header and returns the ceiling of that depth, which may coincide with the depth of a horizontal discontinuity. There is no way to know these depths without the parameter file.
  3. MTUQ then attempts to read in files from an empty directory, and thus crashes.

Potential hard-coded solutions, as I see them:

  1. MTUQ does not read in the event depth from the SAC headers at all, and instead takes the Origin depth specified by the user, potentially throwing a warning and offering an opportunity in the command line to change the depth. Otherwise, the user can change the depth in their Origin object at the beginning, if they know the depths of the discontinuities.
  2. Use os.listdir to check if the directory is empty before attempting to read in files from it. If the directory is empty, choose the depth immediately above or below.
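A rough sketch of what option 2 could look like (the helper below and its names are just illustrative, not actual MTUQ code):

import os

def nearest_populated_depth(fk_database, model, depth_in_km, max_offset_km=10):
    """Illustrative helper: if the FK directory for the requested depth is
    empty or missing, step outward one kilometer at a time until a populated
    depth directory is found."""
    for offset in range(max_offset_km + 1):
        for candidate in (depth_in_km + offset, depth_in_km - offset):
            dirname = os.path.join(fk_database, '%s_%d' % (model, candidate))
            if candidate > 0 and os.path.isdir(dirname) and os.listdir(dirname):
                return candidate
    raise IOError('No populated FK depth directory near %d km' % depth_in_km)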

Thanks again!

Very helpful to have this recap, thanks!

As you were saying, the issue arises from a quirk of FK, in which gaps are present in the Green's function databases at depths corresponding to layer discontinuities. The two workarounds suggested seem very good.

Other workarounds entirely on the Green's function side might be possible. For example, use soft links to fill in the gaps in FK databases. Or use AxiSEM, which provides actual depth interpolation (although, as we've seen, AxiSEM has quirks too!).
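For example, a one-off script along these lines (illustrative only, reusing the database path from earlier in this thread) could fill the empty depth directories with soft links to the nearest populated neighbor:

import os

fk_database, model = '/store/wf/FK_synthetics/scak', 'scak'

# depth directories are named like scak_9, scak_10, ...
depth_dirs = sorted(
    d for d in os.listdir(fk_database) if d.startswith(model + '_'))
populated = [d for d in depth_dirs if os.listdir(os.path.join(fk_database, d))]

for d in depth_dirs:
    if os.listdir(os.path.join(fk_database, d)):
        continue   # already populated
    depth = int(d.split('_')[-1])
    nearest = min(populated, key=lambda p: abs(int(p.split('_')[-1]) - depth))
    for fname in os.listdir(os.path.join(fk_database, nearest)):
        os.symlink(os.path.join(fk_database, nearest, fname),
                   os.path.join(fk_database, d, fname))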

In any case, good to know about this issue!