adjtomo/pyatoa

Resample can introduce numerical noise

Closed this issue · 2 comments

rdno commented

Description of the problem

Hi,

During 2D synthetic seisflows runs in class, we encountered a weird issue with the adjoint sources. Waveform adjoint was computed as non-zero where it should be zero. I think I have pinpointed an issue with the standardize method. Since the synthetic examples have the same sampling rate, when the obsd synthetic is resampled it introduces some numerical noise. While computing adjoint source, it produces some undesirable effects.

Here obsd and synt has the same first arrivals. Hence, the adjoint source should be zero for it.

pyatoa_num_adj

If I load the same file for both obsd and synt, the effect can be seen more clearly.

pyatoa_num

If I add the following line at https://github.com/adjtomo/pyatoa/blob/master/pyatoa/core/manager.py#L730 to avoid the resampling, it fixes the issue.

        # avoid resampling, if they have the same sampling rate.
        if self.st_syn[0].stats.sampling_rate != self.st_obs[0].stats.sampling_rate: 
            if standardize_to == "syn":
                self.st_obs.resample(self.st_syn[0].stats.sampling_rate)
            else:
                self.st_syn.resample(self.st_obs[0].stats.sampling_rate)

pyatoa_num_adj_fixed

This might be fine fix for an edge case and I can't think of a down side. But you can also think about using interpolate which might be more accurate, especially if you can combine it with the trimming.

I am using the current master branch version.
In case, you want to test it yourself, here are the files I was using. data.zip

Minimal Complete Verifiable Example

import obspy
import numpy as np

def read_ascii_trace(filename):
    data = np.loadtxt(filename)
    dt = data[:, 0][1]-data[:, 0][0]
    stats = {"delta": dt, "channel": "BXY", "b": data[0, 0]}
    return obspy.Trace(data[:, 1], stats)



import pyatoa
from pyatoa import logger
logger.setLevel("DEBUG")

conf = pyatoa.Config(min_period=0.05, max_period=100.,
                     adj_src_type="waveform",
                     pyflex_preset=None,
                     component_list="Y",
                     st_obs_type="syn")

obsd = obspy.Stream(read_ascii_trace("obsd.semd"))
# synt = obspy.Stream(read_ascii_trace("obsd.semd")) # to test the same file
synt = obspy.Stream(read_ascii_trace("synt.semd"))

mgmt = pyatoa.Manager(config=conf,
                      st_obs=obsd,
                      st_syn=synt,
                      windows=None)

# mgmt.standardize(standardize_to="syn")
mgmt.flow()
mgmt.plot(choice="wav")
# print(mgmt.st_obs[0].data)
# print(mgmt.st_syn[0].data)

Full error message

No error message.

System information

Using linux with the current master branch.
bch0w commented

Hi @rdno, sorry for the late response I'm just getting back into coding after a few weeks in and out of the office.

That's a great catch! Thanks for finding and bringing this up. I was able to reproduce this with data in the test directory, and have created a new test to keep an eye on this behavior. I would have hoped ObsPy had some check statement to not resample data that matches the requested sampling rate, but it looks like it just goes for it. Maybe there is some sub-sample shifting going on, causing a very small time shift which is exacerbated by the preprocessing and eventually picked up by the misfit measure.

Here's two figures from the internal test data. I think this is a good lesson on the dangers of introducing time series manipulations when you don't need them!

Figure_1
Figure_2

Your suggested fix has things working correctly. I'll open up a PR with your suggested change and push to the latest devel branch of the code. Like you mention, perhaps in the future interpolate might be preferable to the current resample function.

rdno commented

Hi @bch0w,

Glad to see it is fixed. I think we can close this issue now :)