Allow empty strings in string_or_none contract
Closed this issue · 7 comments
What is the reasoning for checking if the string has a length in the string_or_none
contract ? The name is confusing since passing an empty string results in an exception whereas None
is allowed. Is there a reason for this? I would suggest changing the name to non_empty_string_or_none
or removing the length restriction. Happy to add a PR depending on what is wanted! :) [My preference would be to make an empty string allowed, since it makes handling code to pass to the tracker simpler]
Question for @chuwy !
That's a good question. I see two opposite points here:
- On one side - empty strings are useless in general, it doesn't make much sense to send and store them.
- On the other side - most schemas implicitly allow them (by not using
minLength
property), and schemas supposed to be single source of truth for what is allowed and what is not.
For me second point seems stronger - if we think that empty strings are harmful let's use minLength: 1
everywhere in Iglu Central (using severityLevel 2). Tracker should follow this restrictions and should not decide for user.
Based on above I think that loosing string_or_none
seems like a right direction. But we need to check that all 28 dependent arguments are fine with empty string, I know that at least currency
in track_ecommerce_transaction
should accept string of exactly 3 chars (which is anyway not current string_or_none
).
Empty strings are semantically different from a null/missing/undefined, so my preference is to tolerate them unless there is a strong reason on a field-by-field basis not to (e.g. we know that a field must always be set to a UUID).
Equally I'm not comfortable enforcing string length limits in Iglu Central without having a strong confirmation that zero-length strings are impossible for that given property...
Remember too that in the Snowplow Tracker Protocol, empty strings are elided (removed from the final payload) because a null and an empty string are indistinguishable as part of a GET name-value pair.
So I wonder if that's how this crept in...
Agree. I also didn't mean we need to violently add minLength
everywhere. My whole point is that tracker should not prevent user from sending data that tracker can consider suspicious - precise validation is not on tracker's behalf.