microbiomedata/DataHarmonizer

DH Section review

Closed this issue · 2 comments

DataHarmonizer groups columns into sections. We should review:

  • Appearance (determined mostly by the Sections_order tab in Soil-NMDC-Template_Compiled)
    • Names of sections
      • What happens to the user experience when section names are especially long?
      • In the past, we talked about folding the contents of the category columns into the section names when they add something, for example biogeochemistry from the mixs_packages_x_slots tab.
      • Do we want to include all of these facets in section names?
        • MIxS vs MIxS modified
        • required, recommended, optional?
    • Ordering of sections
    • Assignment of columns to sections
    • Ordering of columns within sections
  • Implementation
    • The section modelling should be injected into the LinkML schema, not created on the fly when the DH data.tsv is generated. A solution was started in #123.
    • Is it worth having both a short name and a more descriptive title for each section? For now, I have replaced every appearance of names like 'biosample_id' with titles like 'Biosample Identification' across the whole Google Sheet. The one exception is a short_name column kept in the Sections_order tab, in case we want to revert.
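
For discussion, here is a minimal sketch of what injecting the section model into the schema could look like. It is not the approach taken in #123. It assumes sections are modelled as slots of their own and that columns point at them through LinkML's `slot_group` metaslot, with `rank` carrying the ordering. The function name `inject_sections`, the section short name `mixs_core`, and the example columns are all hypothetical, and the schema is a plain dict rather than a loaded LinkML object.

```python
# Hypothetical sketch: plain dicts stand in for the LinkML schema and the
# Sections_order tab. Nothing here is taken from the real NMDC sheets.

def inject_sections(schema, sections_order, column_sections):
    """Return a copy of `schema` with sections written into its slots.

    sections_order:  list of (short_name, title) pairs in display order.
    column_sections: list of (column_name, section_short_name) pairs,
                     listed in the order columns should appear.
    """
    # Copy each slot body so the caller's schema is left untouched.
    slots = {name: dict(body) for name, body in schema.get("slots", {}).items()}

    # One slot per section: the short name is the key, the descriptive
    # title is kept alongside it, and `rank` records section order.
    for rank, (short_name, title) in enumerate(sections_order, start=1):
        slots[short_name] = {"title": title, "rank": rank}

    known_sections = {short_name for short_name, _ in sections_order}
    next_rank = {short_name: 1 for short_name in known_sections}

    for column, section in column_sections:
        if section not in known_sections:
            raise ValueError(f"{column!r} is assigned to unknown section {section!r}")
        if column not in slots:
            raise KeyError(f"column {column!r} is not a slot in the schema")
        # `slot_group` assigns the column to its section; `rank` orders
        # the column within that section.
        slots[column]["slot_group"] = section
        slots[column]["rank"] = next_rank[section]
        next_rank[section] += 1

    return {**schema, "slots": slots}


# Illustrative inputs only.
schema = {
    "name": "example",
    "slots": {
        "samp_name": {"title": "sample name"},
        "depth": {"title": "depth"},
        "elev": {"title": "elevation"},
    },
}
sections_order = [
    ("biosample_id", "Biosample Identification"),
    ("mixs_core", "MIxS Core"),
]
column_sections = [
    ("samp_name", "biosample_id"),
    ("depth", "mixs_core"),
    ("elev", "mixs_core"),
]

result = inject_sections(schema, sections_order, column_sections)
```

Keeping both the short name (as the slot key) and the title on the section slot would let the display name change, for example to something shorter, without touching the column assignments.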

I changed Biosample Identification to Sample ID because the longer title spills over the gray line on the left that marks the column freeze.

I don't believe any of this is relevant anymore. If there are problems with the way slots are grouped into sections, please open an issue in the submission-schema repo if it is specifically about NMDC's schema, or in the main DataHarmonizer repo if it is about how DataHarmonizer interprets a schema.