ATaCR: A MATLAB repository from eilonzach

Based on an email from Helen Janiszewski dated 2/2/18

The first step to running the code is to have the data properly formatted. For the noise data, the package expects daily time series files. In theory it should be able to accept different lengths of data, but I haven't tested it and that might require changing the quality control parameters a bit. The script download_data.m can be run, which uses the matlab interface with irisFetch to download the data for a particular station or network; or you may have data downloaded already that you would like to use. In the two NOISETC_SAMPLE folders, the downloaded raw data is saved in the folder DATA/datacache_day.

The next step is to preprocess the data. There's flexibility in how you do this, but the important thing is that the time series that you are using to calculate the transfer functions must be processed identically to the event data when you apply the corrections. The script daydata_preprocess.m is set up for standard preprocessing steps. Typically I downsample the data (make sure that half your Nyquist frequency is above the maximum frequency you are interested in) - you want the sample rate to be as low as possible for the frequencies you are interested in (the saved spectra files will be huge if you use a high sample rate, which is fine as long as you are prepared for the file size; I typically use 5 sps for surface waves and went up to 25 sps when I tried it for receiver functions/body waves). I remove the response unless there is an error, which includes putting a very long period high pass filter on the data. There's an option for gain adjustment as well. Again in the two NOISETC_SAMPLE folders, the processed day data is saved in the folder DATA/datacache_day_preproc.

If you prefer to use your own tools to process your data, you just need to make sure that you have the folder and file structure set up to match the example files in that folder. Also, for the transfer functions the sample rate of all four channels needs to be equal. There's a check for this in the later scripts, and it will automatically downsample the data to the greatest common factor of all the sample rates, but is generally easier to set up prior to starting the code.

For the event data (aka data that you want to apply the corrections to, such as earthquake data), the download and processing codes are download_event.m and eventdata_preprocess.m and example data and folder structure are in the DATA/datacache and DATA/datacache_preproc folders. One last caveat is that the length of your "event" data files should equal the length of the window of data used to calculate the spectra (which will be specified in the next set of steps). The default is that the code will break the day up into overlapping windows 7200 s in length. You can change this, but obviously the window length will affect frequency resolution if you make it too short. I would recommend keeping it at 7200 s if possible, and if you are planning on using event data that is shorter, I would just center the time period of interest in the 7200 s window and cut it after the corrections have been applied. A taper is applied to the time series during the corrections, so you don't want any signals of interest too close to the edges.

At this point, all the data should be downloaded and processed and now the actual noise and tilt and compliance corrections are calculated. You can skip to here if you have data downloaded and set up already.

For the noise package:

setup_parameter will set up all the parameters for the code (with the exception of some of the event correction parameters which are set up in that specific script). For a first run I would only change the directory, channel names, etc. and the TF list for whichever transfer functions you want to calculate. Probably don't change any of the other numerical parameters until you can evaluate if the code is working properly. Everything in the code should be commented, but let me know if any of the variable descriptions are unclear.

dailystaspectra is the first code you will run. This is set up to run for a single station and loop over all day files. For each station it will calculate daily spectral properties and save mat files. These files can be pretty large, so it's important to not use a sample rate higher than what you need. Toggles for plotting and overwriting are at the top of the script. The two plot types are spectrograms, which will let you check that it is throwing away bad time windows for a given day, and the daily spectra. For starting out these may be worth looking at, but this will spit out a ton of plots so you probably don't want to have those turned on all the time. The first step of quality control is applied in this script, parameters specified in setup_parameter.

cleanstaspectra is the second code.This also is set up to run for a particular station. Once all the daily spectra have been calculated for a particular station, you will run this code which performs a second step of quality control throwing out spectra from days that look different than the average deployment. For both this and the quality control in the dailystaspectra... the parameters are in the setup file. Like I said, I wouldn't change anything there to begin with, but I would definitely recommend that the QC parameters for this step are more lenient than for the daily window QC (since it's reasonable to expect there may be some long term variation in noise parameters over a deployment related to seasons, instrument re-leveling, etc). This script will also calculate and save an average spectra properties file for each station in a separate folder. For this to be accurate you will want at least a relatively diverse sample of days to calculate the noise spectra from over the entire deployment, but all days are not necessary. Again plotting can be turned on or off... this will only make 4 plots per station so much more reasonable. I recommend plotting here being turned on.

One important thing to note about these first two codes: the dailystaspectra will not be able to flag days where the majority looks bad either due to instrument failure or many earthquakes happening (since the quality control is based on differences from the mean, the mean will no longer represent noise for such days). This is where the cleastaspectra comes in to find and flag such days. But if your station fails for say half your deployment, the cleanstaspectra is going to run into problems for the same reason. In that case, you should go into the SPECTRA folder, manually delete the files for the days where you know the station has failed, and rerun the cleanstaspectra script.

At this point you can also make bad component text files (see examples in the NOISETC_SAMPLE folder). Should be formatted networkcodestationname. These aren't used until the correctevent code, but if you see a particular station where a component is failed for the entire deployment you may want to make a note of it. As of right now, these are made manually and are mainly used just to keep track of potentially bad corrections (e.g. if your H1 component is bad, all tilt corrections will be useless). The code will check if these files exist, so you can just avoid making the files if all your stations look great.

In theory these first two codes should look for four components but still run if a component is missing (e.g. you can get spectral properties for a land station that doesn't have a pressure gauge). I think the transfer function code (next) should work this way as well, although it might break if you ask it to calculate a TF for a component that doesn't exist... I haven't tested it yet.

So.... at this point, you should have run the first two codes for all stations you are interested in.

calctransfunc is the next code. This code loops over all the stations for which spectra have been calculated. It will calculate the TFs you have specified in the list. See setup_parameters for a list of transfer function code names that this script is set up to recognize. It works for either a 1 component transfer function (e.g. ratio of Z to P) or 3 component (e.g. ratio of Z to P with H1 and H2 already taken into account). But a 2 component that is a step of the 3 component can work (e.g. Z2-1 is possible as long as ZP-21 is also in that vector but not by itself). You should be able to put weird ones in too, such as 1P which would remove the compliance noise from the H1 component. The rotational tilt correction is also implemented (e.g. Bell et al., 2015) but that is hardwired to only accept Z component corrections.

correctevent is the last code. This will go through and apply corrections based on all the transfer functions that you have specified in your vector. This one is again a loop over everything (all stations and all events). This is the only code that has some substantial variables to change in the top of it besides the overwrite and figures. T1 and T2 are periods for filtering for plotting your seismograms. You'll need to specify the input path for where the actual event data is contained (e.g. what comes out of event_preprocess). It also is looking for lists for bad Z, H and P components if they exist. After all of that, you should have your corrected seismograms which you can then put into whatever codes you want to look at them.

To summarize run dailystaspectra and cleanstaspectra for each station you are interested in. Once that is finished, run calctransfunc and correctevent.

And that's it! Let me know if anything is confusing or if you have any questions. Example figure folders are included, so you should be able to compare your output to those to assess if something is going wrong. And please let me know if any of this description isn't clear, or if you have any questions about changing the default parameters. Once I upload the code to make it available on github, I'm planning on having a pdf manual to go with it so any feedback on my description would be appreciated.

eilonzach/ATaCR