daltare/pfas-database

Example Files Formatting


Field Names

Make sure names are consistent across files where possible. In general, use underscores to replace any spaces, and use consistent capitalization. In particular, these field names could be fixed:

  • AOF Files
    • Sample ID to Sample_ID (underscore)
    • Is it possible to remove the "#" and quotes from the first field name? #"Lab_ELAP_CertID"
  • NTA Files
    • PS Code to PS_Code (underscore)
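    • Collection Date to Collection_Date (underscore)
    • Collection Time to Collection_Time (underscore)
    • Sample Type to Sample_Type (underscore)
    • Analyte Name to Analyte_Name (underscore)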
  • Field Data Files
    • PS_code to PS_Code (lowercase to uppercase "C")
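The renames above could be applied automatically when the files are read in. Here is a minimal Python sketch of that idea. The function name `normalize_field_name` and the `CANONICAL` lookup table are hypothetical, and the table only covers the capitalization cases mentioned in this issue.

```python
import re

# Hypothetical lookup of preferred spellings, keyed by lowercase name.
# Only the cases raised in this issue are listed; extend as needed.
CANONICAL = {"ps_code": "PS_Code", "sample_id": "Sample_ID"}


def normalize_field_name(name: str) -> str:
    """Return a cleaned-up field name with underscores instead of spaces."""
    # Drop a leading "#" and any surrounding quotes, e.g. '#"Lab_ELAP_CertID"'
    cleaned = name.strip().lstrip("#").strip().strip("\"'")
    # Replace each run of whitespace with a single underscore
    cleaned = re.sub(r"\s+", "_", cleaned.strip())
    # Apply the preferred capitalization where one is defined
    return CANONICAL.get(cleaned.lower(), cleaned)


if __name__ == "__main__":
    for raw in ['#"Lab_ELAP_CertID"', "Sample ID", "PS Code", "PS_code", "Collection Date"]:
        print(f"{raw!r} -> {normalize_field_name(raw)!r}")
```

This only standardizes names on import. Fixing the headers in the source files themselves would still be preferable, so that every consumer of the files sees the same names.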

PS_Code Values

Make sure the PS_Code values use consistent formatting. Assuming the well list file is correct (e.g. CA3610009_007_007), then:

  • Use underscores (not dashes)
  • Always start with CA
  • Make sure the values in the data files match one of the records in the well list
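The three checks above could be sketched in Python roughly as follows. The pattern (CA, seven digits, then two three-digit groups) is an assumption inferred from the single example CA3610009_007_007 and may not hold for every well. Prepending a missing "CA" is also an assumption about how the values deviate. The function names are hypothetical.

```python
import re

# Assumed shape, inferred only from the example CA3610009_007_007
PS_CODE_PATTERN = re.compile(r"^CA\d{7}_\d{3}_\d{3}$")


def normalize_ps_code(value: str) -> str:
    """Swap dashes for underscores and make sure the code starts with CA."""
    code = value.strip().upper().replace("-", "_")
    if not code.startswith("CA"):
        code = "CA" + code  # assumption: a missing prefix is the only deviation
    return code


def check_ps_codes(values, well_list):
    """Map each problematic raw value to a short description of the problem."""
    known = set(well_list)
    problems = {}
    for raw in values:
        code = normalize_ps_code(raw)
        if not PS_CODE_PATTERN.match(code):
            problems[raw] = "unexpected format"
        elif code not in known:
            problems[raw] = "not in well list"
    return problems


if __name__ == "__main__":
    wells = ["CA3610009_007_007"]
    data = ["CA3610009_007_007", "3610009-007-007", "CA1010007_204_204", "CA36-7"]
    print(check_ps_codes(data, wells))
```

Values that normalize cleanly and appear in the well list are not reported, so an empty result means every record in the data file matched a well.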

Other Questions

  • AOF files use some of the same Lab_Sample_ID values across the two different files - is that expected?
  • Is the first part of the AOF filename supposed to be the batch? If so, should probably make consistent with what's in the dataset.
  • Date formats - may not matter, but is there flexibility on this? Possible to collect and/or store data in YYYY-MM-DD format?
  • Some fields are empty in the example files - is it important to know the data types for these fields?
    • e.g. will Turbidity_Measure be a numeric value?

Some files have extra data / blank rows:

  • Example 3K06046_NTA_533_EDD_Rev1.csv (extra data - bottom of column D)
  • Example 4K00000_NTA_533_EDD.csv (extra data / blank lines - re-starts at row 2060)
  • Example Well List_9 wells.csv (extra data / blank lines - re-starts at row 3829)
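Blank rows like the ones listed above can be located with a short script before the files are loaded. This is a minimal sketch using Python's standard `csv` module. The function name `find_blank_rows` is hypothetical, and a row is treated as blank when every cell is empty or whitespace.

```python
import csv
import io


def find_blank_rows(csv_text: str) -> list[int]:
    """Return the 1-based record numbers of rows with no non-empty cells."""
    reader = csv.reader(io.StringIO(csv_text))
    blank = []
    for row_number, row in enumerate(reader, start=1):
        # csv.reader yields [] for a completely empty line, and a list of
        # empty strings for a line that contains only delimiters
        if all(cell.strip() == "" for cell in row):
            blank.append(row_number)
    return blank


if __name__ == "__main__":
    sample = "PS_Code,Value\nCA3610009_007_007,1\n,\n\nCA3610009_007_007,2\n"
    print(find_blank_rows(sample))
```

The numbers count CSV records rather than physical lines, so they can differ from spreadsheet row numbers if any quoted field contains a line break.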

In the NTA files, there's a discrepancy between the example files and the PowerPoint list:

  • first column of the NTA files is labeled LabID, but the PowerPoint list says Lab_Sample_ID
  • sixth column of the NTA files is labeled Lab_Sample_ID, but the PowerPoint list says Sample_ID

PS_Code CA1010007_204_204 is in both well list files (but with different information).
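Conflicts like this one could be found across the well list files with a check along these lines. This is a sketch that assumes each well list has already been read into a list of dictionaries keyed by field name. The function name and the `Well_Name` field in the usage example are hypothetical.

```python
from collections import defaultdict


def conflicting_ps_codes(*well_lists):
    """Return PS_Code values that appear with differing information."""
    seen = defaultdict(set)
    for records in well_lists:
        for record in records:
            # Freeze the remaining fields so that rows with different
            # contents count as distinct variants of the same PS_Code
            rest = tuple(sorted((k, v) for k, v in record.items() if k != "PS_Code"))
            seen[record["PS_Code"]].add(rest)
    return sorted(code for code, variants in seen.items() if len(variants) > 1)


if __name__ == "__main__":
    list_a = [{"PS_Code": "CA1010007_204_204", "Well_Name": "A"}]
    list_b = [
        {"PS_Code": "CA1010007_204_204", "Well_Name": "B"},
        {"PS_Code": "CA3610009_007_007", "Well_Name": "C"},
    ]
    print(conflicting_ps_codes(list_a, list_b))
```

A PS_Code that appears in several files with identical information is not reported, only codes whose records disagree.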