ESGF/esg-search

remove duplicates in cf_standard_names in publication service

alaniwi opened this issue · 0 comments

Several variables in a THREDDS catalog can share the same CF standard name. For example:

      <variable name="hur" vocabulary_name="relative_humidity" units="%">Relative Humidity</variable>
      <variable name="rhs" vocabulary_name="relative_humidity" units="%">Near-Surface Relative Humidity</variable>
      <variable name="rhsmax" vocabulary_name="relative_humidity" units="%">Surface Daily Maximum Relative Humidity</variable>
      <variable name="rhsmin" vocabulary_name="relative_humidity" units="%">Surface Daily Minimum Relative Humidity</variable>

When this happens, the same value currently appears several times in the cf_standard_name field in Solr. The repeated entries carry no extra information, so this is probably not useful.

I believe that the relevant code is in https://github.com/ESGF/esg-search/blob/devel/src/java/main/esg/search/publish/thredds/parsers/VariablesParser.java
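
One possible approach, sketched below, is to collect the standard names into an insertion-ordered set before they are written to the Solr record. This is only an illustration and not the actual VariablesParser code: the class name `StandardNameDedup` and the method `distinctStandardNames` are hypothetical, and it assumes the names are available as a plain list of strings at the point where the field is populated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper, not part of esg-search.
public class StandardNameDedup {

    /** Returns each distinct value once, keeping first-seen order. */
    public static List<String> distinctStandardNames(List<String> names) {
        // LinkedHashSet drops repeats while preserving insertion order.
        return new ArrayList<>(new LinkedHashSet<>(names));
    }

    public static void main(String[] args) {
        // vocabulary_name values from the four variables in the example above.
        List<String> fromCatalog = Arrays.asList(
                "relative_humidity",
                "relative_humidity",
                "relative_humidity",
                "relative_humidity");

        // prints [relative_humidity]
        System.out.println(distinctStandardNames(fromCatalog));
    }
}
```

Keeping first-seen order means the indexed values for catalogs without duplicates would stay exactly as they are now.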