iris-edu/irisws-syngine

Set a smaller default dt

krischer opened this issue · 8 comments

We mentioned this before but I think this is really important so I'll bring it up once again here.

Please consider setting the default dt to a smaller value to avoid errors that will likely not get noticed by users.

The current version of the syngine service by default returns seismograms with the sampling rate of the database. That sampling rate is naturally as low as possible to save disc space. If people request velocity seismograms everything is good. When people request either displacement or acceleration we have to either numerically integrate or differentiate. These operations are not accurate and perform at their worst for frequencies close to Nyquist.

Thus significant errors are introduced if people just request seismograms (in displacement/acceleration) without manually specifying a smaller dt. For small dt values, Instaseis internally resamples before it performs the differentiation/integration, thus avoiding this problem.
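To make the effect concrete, here is a minimal sketch that differentiates a pure sine close to Nyquist with `np.gradient`, once at a coarse sampling interval and once at a tenth of it. The 0.5 s interval and the choice of 80% of Nyquist are made-up illustration values, and sampling the analytic sine on a finer grid merely stands in for resampling; this is not how Instaseis does it internally.

```python
import numpy as np


def derivative_amplitude_ratio(freq_hz, dt, duration_s=60.0):
    """Amplitude of the numerical derivative of a sine relative to the exact one."""
    t = np.arange(0.0, duration_s, dt)
    signal = np.sin(2.0 * np.pi * freq_hz * t)
    # np.gradient uses central differences for all interior samples.
    numeric = np.gradient(signal, dt)
    exact = 2.0 * np.pi * freq_hz * np.cos(2.0 * np.pi * freq_hz * t)
    # Drop the two end points, where np.gradient falls back to one-sided differences.
    inner = slice(1, -1)
    return float(np.linalg.norm(numeric[inner]) / np.linalg.norm(exact[inner]))


database_dt = 0.5                 # hypothetical database sampling interval in seconds
nyquist_hz = 0.5 / database_dt    # Nyquist frequency of the coarse sampling
freq_hz = 0.8 * nyquist_hz        # test frequency close to Nyquist

coarse_ratio = derivative_amplitude_ratio(freq_hz, database_dt)
fine_ratio = derivative_amplitude_ratio(freq_hz, database_dt / 10.0)

print(f"ratio at database dt:    {coarse_ratio:.3f}")
print(f"ratio at database dt/10: {fine_ratio:.3f}")
```

A ratio of 1 would mean the numerical derivative has the correct amplitude. At the coarse interval most of the amplitude is lost, while on the ten times finer grid the ratio is close to 1.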

Very few people will be aware of this. To avoid people requesting bad seismograms, I propose setting the default dt to a tenth of the database sampling interval. The data is then effectively oversampled by a factor of 10, which puts more load on IRIS's connection, but the error is no longer an issue.

In the following plot the green and dashed black lines should be identical, but they are not. The reason for the difference is that the differentiation acts as a strong lowpass filter.

screen shot 2015-11-20 at 13 26 36

See here for a more detailed explanation: http://nbviewer.ipython.org/gist/krischer/a0260437b675ba2c1993

Thank you Lion. I'm not sure why this escaped us, but it absolutely makes sense and we'll try to get this in place.

Model-specific default values for dt are now used.
The default values are listed in the table here: http://ds.iris.edu/ds/products/syngine/#models. They are also included in the output of http://service.iris.edu/irisws/syngine/1/models (which is currently broken due to improper JSON generation for the info route).
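For anyone who wants to control the oversampling rather than rely on the default, dt can still be passed explicitly. Below is a minimal sketch that only builds a request URL and does not contact the service. The `query` route, the parameter names (`model`, `units`, `dt`, `network`, `station`, `eventid`), the model name and the example event are assumptions based on my reading of the syngine documentation and should be checked against it. The 0.48 s database interval is the value quoted in this thread for a 2 s model.

```python
from urllib.parse import urlencode

# Assumed query route of the syngine web service.
BASE_URL = "http://service.iris.edu/irisws/syngine/1/query"


def build_query_url(model, database_dt, oversampling=5, units="acceleration", **extra):
    """Build a request URL with dt set to a fraction of the database sampling interval."""
    dt = database_dt / oversampling
    params = {"model": model, "units": units, "dt": f"{dt:.4f}"}
    params.update(extra)  # e.g. receiver and source parameters
    return f"{BASE_URL}?{urlencode(params)}"


# Hypothetical example values; the model name and event ID are assumptions.
url = build_query_url(
    "ak135f_2s",
    database_dt=0.48,
    oversampling=5,
    network="IU",
    station="ANMO",
    eventid="GCMT:C201002270634A",
)
print(url)
```

Keeping the oversampling factor as an argument makes it easy to compare a fifth against a tenth for a particular use case.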

Below are the acceleration waveform comparisons before and after, without the "reconstructed" trace. Is this what is expected?

Before default dt ("large dt" means system default):
defaultdt-before

After default dt ("large dt" means system default):
defaultdt-after

Here is the P wave for an acceleration trace using a 2 s resolution model (dt=0.48 s). Black is dt=0.05 s, red is dt=0.1 s. Both panels show the same image, just zoomed in differently on the P wave, which has the most high-frequency content. As Lion shows in the example, velocity and displacement will look even more similar. So I think yes, we should set a default dt lower than the database's, but I'm OK with making it a fifth rather than a tenth. The discrepancy in amplitudes in the example is at most 1.3%. Going to a fifth will reduce the volume streamed to users by 2x, a worthy tradeoff.

dt_granularity

Below are the acceleration waveform comparisons before and after, without the "reconstructed" trace. Is this what is expected?

Yes. In this case I guess they are actually exactly the same requests due to the new default value. But that is exactly what we want.

👍 That also looks good from my point of view.

For making it a fifth: I think the actual error is even smaller than 1.3% if you reconstruct/resample the lower-frequency one to the higher-frequency one. Your 1.3% will matter if people use something like SAC's interpolate method, which is really quite bad for that purpose.

I also think it's a worthy trade-off but I'm no amplitude person. @tnissen, @sstaehler: opinions?

I chose the tenth due to this: http://www.holoborodko.com/pavel/numerical-methods/numerical-derivative/central-differences/

We are at N=3 so if we go to a tenth of the original period there is almost no filtering effect.
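As a rough way to compare the candidate factors, here is a small sketch assuming the N=3 case is the plain three-point central difference (f(t+h) - f(t-h)) / (2h). For a sinusoid with angular frequency w, that stencil returns the exact derivative scaled by sin(w*h) / (w*h). The function name and the chosen frequencies are illustrative only.

```python
import math


def central_diff_gain(oversampling, fraction_of_nyquist):
    """Gain of the three-point central difference relative to the exact derivative.

    fraction_of_nyquist is the signal frequency divided by the Nyquist frequency
    of the database sampling; oversampling is database dt divided by output dt.
    """
    # w*h = 2*pi*f*h with f = fraction / (2*dt_db) and h = dt_db / oversampling
    x = math.pi * fraction_of_nyquist / oversampling
    return math.sin(x) / x


for factor in (1, 5, 10):
    at_nyquist = central_diff_gain(factor, 1.0)
    at_half_nyquist = central_diff_gain(factor, 0.5)
    print(
        f"oversampling {factor:2d}: "
        f"gain at Nyquist {at_nyquist:.3f}, at half Nyquist {at_half_nyquist:.3f}, "
        f"relative data volume {factor}x"
    )
```

Under these assumptions the gain at the database Nyquist frequency is about 0.94 for a factor of 5 and about 0.98 for a factor of 10. Energy well below Nyquist is affected much less in both cases.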

@krischer Do you agree that 1/5 of the database sampling interval is OK? How did you come to the 1/10 recommendation?

We do not want to generate significantly more volume of data unnecessarily. OTOH if it's needed then it's needed.

I'm fine with the 1/5 - I think that's a worthy trade-off. But I would also like to hear some other opinions.

See the link in my previous comment why I chose the 1/10.

Sorry, I had an edit window open and hadn't seen your response. Other opinions would be good, so we'll wait before changing any defaults.

Defaults set to 1/5 of the database sampling interval.