Sage-Bionetworks/schematic

KeyError when a regex validation rule is set on the Filename attribute

adamjtaylor opened this issue · 3 comments

Describe the bug

As a data model developer for HTAN, I would like to enforce that the Filename attribute must not contain whitespace characters. This ensures compatibility with downstream scripts, repositories, and workflows that do not handle whitespace in filenames gracefully.

e.g.:

  • imaging_level_2/HT 123 P1 S1H1 1.svs (with spaces) should fail validation.
  • imaging_level_2/HT_123_P1_S1H1_1.svs (with underscores) should pass validation.
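To make the intended behavior of the rule concrete, here is a minimal Python sketch (not schematic code) that applies the same pattern to the two example filenames. It assumes the rule's pattern reaches the regex engine as ^\S*$, i.e. that the doubled backslash in this issue is escaping in the data model source rather than part of the pattern. The helper name filename_is_valid is hypothetical.

```python
import re

# Assumption: the data model's "regex search ^\\S*$" rule reaches the regex
# engine as ^\S*$ (the doubled backslash is treated as source escaping).
NO_WHITESPACE = re.compile(r"^\S*$")


def filename_is_valid(filename: str) -> bool:
    """Hypothetical helper: True if the filename contains no whitespace.

    With ^ and $ anchors, re.search only matches when every character
    between them is non-whitespace (\\S).
    """
    return NO_WHITESPACE.search(filename) is not None


for name in [
    "imaging_level_2/HT 123 P1 S1H1 1.svs",  # spaces: expected to fail
    "imaging_level_2/HT_123_P1_S1H1_1.svs",  # underscores: expected to pass
]:
    print(f"{name!r}: {'pass' if filename_is_valid(name) else 'fail'}")
```

One edge case: in Python, $ also matches just before a single trailing newline, so "abc\n" passes this pattern. If that matters, \A\S*\Z or re.fullmatch(r"\S*", ...) is stricter.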

After adding the validation rule regex search ^\\S*$ to the Filename attribute in the HTAN data model, validating against a template with schematic fails with the error KeyError: 'Filename'.

More details are in this Slack thread and this ncihtan/data-models issue.

To Reproduce
See the linked issue.

Expected behavior
Schematic runs successfully and reports a validation error if a Filename attribute entry contains a whitespace character.
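As an illustration of the expected behavior, the sketch below validates the Filename column of a CSV manifest and returns error messages instead of raising KeyError. It is a hypothetical helper written for this issue, not schematic's API or its actual fix. The manifest layout (header on row 1) and the Component value used in the usage example are assumptions.

```python
import csv
import io
import re
from typing import List

# Assumption: same no-whitespace pattern as the HTAN rule, ^\S*$.
NO_WHITESPACE = re.compile(r"^\S*$")


def validate_filename_column(manifest_csv: str, column: str = "Filename") -> List[str]:
    """Hypothetical helper: return one message per invalid entry.

    A missing column is reported as a message rather than raising KeyError.
    """
    reader = csv.DictReader(io.StringIO(manifest_csv))
    # fieldnames is None for empty input; guard before indexing rows by column.
    if reader.fieldnames is None or column not in reader.fieldnames:
        return [f"Column {column!r} not found in manifest header"]

    errors: List[str] = []
    # Row 1 is the header, so the first data row is reported as row 2.
    for row_number, row in enumerate(reader, start=2):
        value = row[column] or ""  # short rows yield None for missing fields
        if NO_WHITESPACE.search(value) is None:
            errors.append(f"Row {row_number}: {column} value {value!r} contains whitespace")
    return errors


manifest = (
    "Filename,Component\n"
    "imaging_level_2/HT 123 P1 S1H1 1.svs,ImagingLevel2\n"
    "imaging_level_2/HT_123_P1_S1H1_1.svs,ImagingLevel2\n"
)
for message in validate_filename_column(manifest):
    print(message)
```

Checking the header once, before looking up any row by column name, is what keeps a missing or misnamed attribute from surfacing as a bare KeyError.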

Priority (select one)

  • Minor ⬇️
  • Major 📢
  • Critical 🆘


@MiekoHash and @abbywall, just an FYI that this is currently blocking implementation of the linked HTAN issue. That prevents us from catching whitespace in file names (a recurring issue across data models) and affects validation; see the linked issue for additional details.

Thanks for the info, @aclayton555. I will follow up on this with @milen-sage.

This is in the queue for the next triage, which is early next week.