noisepy/NoisePy

Issue with RawDataStore Selection and Usage for Custom Data in NoisePy

I encountered issues when trying to use my own data with NoisePy after successfully following the tutorial. To prepare my data for cross-correlation, I used "S0B_to_ASDF.py" to convert my mseed data to h5 format, which took care of the necessary pre-processing steps.

Now, I need to perform cross-correlation using the "cross_correlate" function, which requires a RawDataStore, config_parameters, and a CrossCorrelationDataStore as inputs. The tutorial explains three options to create a RawDataStore:

ASDFRawDataStore
PNWDataStore
SCEDCS3DataStore

However, I am having trouble applying these options to my data, which is already preprocessed and needs no further preprocessing steps.

I attempted to use the ASDFRawDataStore and SCEDCS3DataStore with my preprocessed .h5 data, but encountered errors. When using ASDFRawDataStore, I followed the tutorial's guidance by providing the path of the h5 file as input, like this: "raw_store = ASDFRawDataStore(raw_data_path)". This step didn't raise an error, but an error occurred when calling the "cross_correlate" function: "cross_correlate(raw_store, config, cc_store)". I tried to understand and resolve the error by examining the ASDFRawDataStore class, but without success.

Regarding SCEDCS3DataStore, following the tutorial, I defined the stations with stations = "HAUP,PYKE".split(",") and loaded the StationXML file with catalog = XMLStationChannelCatalog(S3_STATION_XML). Then, I created the SCEDCS3DataStore as follows:

S3_DATA = '/Volumes/GeoPhysics_23/users-data/juarezilma/Noisepy/RAWDATA/' (this is the folder containing the preprocessed .h5 file)
raw_store = SCEDCS3DataStore(S3_DATA, catalog, channel_filter(stations, "HH1"), range)

However, the output showed:
2023-07-11 11:07:35,157 INFO scedc_s3store._load_channels(): Loading 0 files from /Volumes/GeoPhysics_23/users-data/juarezilma/Noisepy/RAWDATA/2022/2022_001/
2023-07-11 11:07:35,157 INFO scedc_s3store._load_channels(): Init: 0 timespans and 0 channels

I suspect that the SCEDCS3DataStore expects raw data in .h5 format within directories following the pattern "2022/2022_001/". As I don't have .h5 raw files, it couldn't find any files.
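For reference, the day directory seen in the log path can be rebuilt from a date, which makes it easy to check what a store would actually find there. This is a minimal sketch using only the standard library; it assumes the layout is year/year_dayofyear, as the logged path suggests, and the helper names are hypothetical (they are not part of NoisePy):

```python
from datetime import date
from pathlib import Path


def day_directory(root: str, day: date) -> Path:
    """Build <root>/<YYYY>/<YYYY_DDD>, where DDD is the zero-padded day of year.

    Assumption: the layout is inferred from the logged path only.
    """
    return Path(root) / day.strftime("%Y") / day.strftime("%Y_%j")


def list_day_files(root: str, day: date) -> list:
    """Return the sorted file names in the day directory, or [] if it does not exist."""
    directory = day_directory(root, day)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if p.is_file())
```

Calling list_day_files(S3_DATA, date(2022, 1, 1)) shows which file names are present in that directory, so they can be compared with what the store is looking for.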

My data is structured as follows:
juarezilma@:/Volumes/GeoPhysics_23/users-data/juarezilma/Noisepy/RAWDATA/2022/2022_001$ ls
2022.001.HAUP.10-HH1.ZX.D.IRremoved 2022.001.HOST.10-HH1.ZX.D.IRremoved 2022.001.PYKE.10-HH1.ZX.D.IRremoved
2022.001.HAUP.10-HH2.ZX.D.IRremoved 2022.001.HOST.10-HH2.ZX.D.IRremoved 2022.001.PYKE.10-HH2.ZX.D.IRremoved
2022.001.HAUP.10-HHZ.ZX.D.IRremoved 2022.001.HOST.10-HHZ.ZX.D.IRremoved 2022.001.PYKE.10-HHZ.ZX.D.IRremoved
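To make the naming scheme above explicit, here is a minimal sketch that splits one of these file names into its parts. The meaning of each field (year, day of year, station, location-channel, network, a quality flag, and a processing suffix) is my assumption from the names listed, and parse_raw_filename is a hypothetical helper, not a NoisePy function:

```python
from typing import NamedTuple


class RawFileName(NamedTuple):
    year: int
    day_of_year: int
    station: str
    location: str
    channel: str
    network: str      # assumed: "ZX" is the network code
    quality: str      # assumed: "D" is a data quality flag
    suffix: str       # e.g. "IRremoved"


def parse_raw_filename(name: str) -> RawFileName:
    """Split a name like 2022.001.HAUP.10-HH1.ZX.D.IRremoved into fields."""
    parts = name.split(".")
    if len(parts) != 7:
        raise ValueError(f"expected 7 dot-separated fields, got {len(parts)}: {name}")
    year, doy, station, loc_chan, network, quality, suffix = parts
    # The fourth field combines location and channel, e.g. "10-HH1".
    location, sep, channel = loc_chan.partition("-")
    if not sep:
        raise ValueError(f"expected <location>-<channel>, got: {loc_chan}")
    return RawFileName(int(year), int(doy), station, location, channel,
                       network, quality, suffix)
```

For example, parse_raw_filename("2022.001.HAUP.10-HH1.ZX.D.IRremoved").channel gives "HH1", which is the channel I pass to channel_filter.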

I would greatly appreciate your assistance in creating a RawDataStore using my own data.

Thank you very much for pointing us to these issues.
We are in the midst of a full refactoring, and it is clear to us that we need to provide more guidance on how to use the new functions (as opposed to the old scripts) and how to build a DataStore from scratch using our templates. We will address this over the next month and appreciate your patience.