See https://github.com/rweigel/cdaweb-metadata, this repository is no longer in use.

See also https://github.com/spase-group/adapt/tree/main/CDAWEB

Motivation

This repository was developed to improve the HAPI metadata served at https://cdaweb.gsfc.nasa.gov/hapi. Metadata-related code is discussed in Section 2. Output files are available at mag.gmu.edu.

This repository also contains a discussion of alternative approaches for using CDAWeb to serve HAPI data streams. Data-related code is discussed in Section 3.

Metadata Code

Three developers have written software to produce HAPI metadata using different methods. Each approach has limitations, and differences exist in the metadata each produces. This repository can be used to compare the metadata generated by each method.

This repository contains

  • scripts for a fourth method that uses the CDAS REST service. The script that generates metadata is CDAS2HAPIinfo.js (~1,000 lines). The output files generated by this script are placed locally in hapi/bw. The input files used to create the output files are stored locally in cache/bw.

    Output files are also available at http://mag.gmu.edu/git-data/cdaweb-hapi-metadata/hapi/bw.


  • a script, compare/compare-meta.js, that compares the metadata results. The file compare/meta/compare-meta.json contains the content of the four all.json files, with keys indicating the method. The keys are bw, nl, bh, and jf, which are the initials of the developers who wrote the software that generates the HAPI metadata. See below for additional details.

The four methods that produce HAPI metadata are

  1. bw, which uses the new code in this repository to generate HAPI metadata (stored at mag.gmu.edu). The steps are:

    1. Extracts dataset ids and their start and stop dates from https://spdf.gsfc.nasa.gov/pub/catalogs/all.xml

    2. For each dataset, makes a /variables request to https://cdaweb.gsfc.nasa.gov/WebServices/REST/ to get the list of variables in the dataset, which is needed for the next step

    3. Makes a /data request to https://cdaweb.gsfc.nasa.gov/WebServices/REST/ to obtain a sample data file. The time range used is that of the last file returned from a /orig_data request for all files over the time range in `

    4. After step 3, all of the metadata needed to form a HAPI response is available. The final step is to generate HAPI /info responses for each dataset. There is one complication. HAPI mandates that all variables in a dataset have the same time tags, but some CDAWeb datasets have variables with different time tags. So prior to creating /info responses, new HAPI datasets are formed. These new datasets have ids that match the CDAWeb ids but have an "@0", "@1", ... appended, where the number indicates the time tag variable index in the original dataset.

    The initial generation of the HAPI all.json file using CDAS2HAPIinfo.js can take up to 30 minutes, which is similar to the daily update time required by the nl server. In contrast, subsequent updates using CDAS2HAPIinfo.js take less than a second; on a daily basis, only the startDate and stopDate must be updated, which requires reading all.xml and updating all-info.json. When CDAWeb adds datasets or a master CDF changes, the process outlined above is only required for those datasets; it typically takes less than 10 seconds per dataset.

  2. nl, which uses an approach similar to the above for datasets with virtual variables and an approach similar to jf below otherwise. Output files are available at mag.gmu.edu.

    This code is used for the production CDAWeb HAPI server.

    This production HAPI server has many datasets for which the metadata or data responses are not valid. It appears the HAPI verifier was never run on all datasets. I have found that when randomly selecting datasets and parameters at https://hapi-server.org/servers, one frequently encounters issues.

    The production HAPI server becomes unresponsive at 9 am daily due to a full metadata update that appears to block the main thread. However, in general, a full update is only needed when content other than the startDate and stopDate changes.

  3. bh, which uses SPASE records. Output files are available at mag.gmu.edu.

    This server is a prototype and it serves only CDAWeb datasets for which a SPASE record is available.

  4. jf, which uses master CDFs, raw CDF files, and code from Autoplot. Output files are available at mag.gmu.edu.

    The code that produces HAPI metadata is also a prototype and is not intended for production use.
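The time-tag splitting described in step 4 of the bw method can be sketched as follows. This is a simplified illustration, not the actual CDAS2HAPIinfo.js code; the variable names, the DEPEND_0-based grouping, and the choice to keep the original id when only one time variable exists are all assumptions.

```javascript
// Sketch: split a CDAWeb dataset into HAPI datasets, one per time variable.
// Each variable is assumed to carry a DEPEND_0 attribute naming its time tags.
function splitByTimeTags(cdawebId, variables) {
  const groups = new Map(); // time variable name -> [variable names]
  for (const v of variables) {
    if (!groups.has(v.DEPEND_0)) groups.set(v.DEPEND_0, []);
    groups.get(v.DEPEND_0).push(v.name);
  }
  // One HAPI dataset per distinct time variable; "@0", "@1", ... appended
  // in encounter order when a split is needed (single-group behavior is an
  // assumption here).
  return [...groups.entries()].map(([timeVar, names], i) => ({
    id: groups.size === 1 ? cdawebId : `${cdawebId}@${i}`,
    timeVariable: timeVar,
    parameters: names
  }));
}

// Hypothetical dataset whose variables use two different time tags:
const out = splitByTimeTags('AC_H0_MFI', [
  { name: 'Magnitude', DEPEND_0: 'Epoch' },
  { name: 'BGSEc',     DEPEND_0: 'Epoch' },
  { name: 'SC_pos',    DEPEND_0: 'Epoch_state' }
]);
console.log(out.map(d => d.id)); // [ 'AC_H0_MFI@0', 'AC_H0_MFI@1' ]
```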

Data Code

This repository also contains code to compare HAPI CSV generated using N different methods. The script bin/HAPIdata.js is a wrapper that can be used to generate HAPI CSV using the N methods given a HAPI dataset, parameter(s), and start and stop times.

The methods are

  1. cdaweb-csv-using-js
  2. cdaweb-cdf-using-pycdf
  3. cdas-text-using-js
  4. cdas-cdf-using-pycdf
  5. cdas-cdf-using-pycdas
  6. nl-hapi
  7. bh-hapi
  8. apds

Other options include

  1. text-raw
  2. text-noheader
  3. nl-hapi
  4. cdas-cdf-using-pycdf

This script with pre-set choices of HAPI request inputs can be executed using

cd compare
make compare-data1
make compare-data2
make compare-data3

Use

Generating HAPI /info responses

Requires Node.js.

node CDAS2HAPIinfo.js --keepids '^AC_'

generates /info responses for all datasets with IDs starting with AC_.
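The --keepids value is treated as a regular expression against dataset ids. A sketch of the assumed filtering behavior (the dataset ids below are examples, and CDAS2HAPIinfo.js may apply the pattern differently):

```javascript
// Sketch: filter a catalog of dataset ids by a --keepids regex.
function keepIds(ids, pattern) {
  const re = new RegExp(pattern);
  return ids.filter(id => re.test(id));
}

const ids = ['AC_H0_MFI', 'AC_H1_EPM', 'WI_H0_MFI'];
console.log(keepIds(ids, '^AC_')); // [ 'AC_H0_MFI', 'AC_H1_EPM' ]
```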

Comparing HAPI /data responses

cd compare; make compare-data1 IDREGEX='^AC_'

Creating Metadata

make all

In reference to the above four options, to create metadata for all CDAWeb IDs that start with "AC_", use

  1. node CDAS2HAPIinfo.js --keepids '^AC_', which creates HAPI metadata that is written to hapi/bw.

  2. node HAPI2HAPIinfo.js --version 'nl' --keepids '^AC_', which creates

    hapi/nl/all.json and hapi/nl/info/

  3. node HAPI2HAPIinfo.js --version 'bh' --keepids '^AC_', which creates

    hapi/bh/all.json and the individual info responses in hapi/bh/info/.

  4. node HAPI2HAPIinfo.js --version 'jf' --keepids '^AC_', which creates

    hapi/jf/all.json and the individual info responses in hapi/jf/info/.

After these files are created, compare/compare-meta.js can be executed to generate the file compare/compare-meta.json, which shows the metadata created by the four approaches in a single file.
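The merge that produces the combined file can be sketched as follows. This is an illustration of the assumed structure (dataset id mapped to one entry per method key), not the actual compare-meta.js output format.

```javascript
// Sketch: combine per-method catalogs into one object keyed by dataset id,
// with each method's metadata stored under its key (bw, nl, bh, jf).
function mergeByMethod(catalogs) {
  const merged = {};
  for (const [method, catalog] of Object.entries(catalogs)) {
    for (const [id, info] of Object.entries(catalog)) {
      if (!merged[id]) merged[id] = {};
      merged[id][method] = info;
    }
  }
  return merged;
}

// Hypothetical info responses from two of the four methods:
const merged = mergeByMethod({
  bw: { 'AC_H0_MFI@0': { startDate: '1997-09-02T00:00:12Z' } },
  nl: { 'AC_H0_MFI@0': { startDate: '1997-09-02T00:00:12Z' } }
});
console.log(Object.keys(merged['AC_H0_MFI@0'])); // [ 'bw', 'nl' ]
```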