snowplow/snowplow-python-tracker

Allow empty strings in string_or_none contract

Closed this issue · 7 comments

What is the reasoning for checking if the string has a length in the string_or_none contract ? The name is confusing since passing an empty string results in an exception whereas None is allowed. Is there a reason for this? I would suggest changing the name to non_empty_string_or_none or removing the length restriction. Happy to add a PR depending on what is wanted! :) [My preference would be to make an empty string allowed, since it makes handling code to pass to the tracker simpler]

Question for @chuwy !

chuwy commented

That's a good question. I see two opposite points here:

  • On one side - empty strings are useless in general, it doesn't make much sense to send and store them.
  • On the other side - most schemas implicitly allow them (by not using minLength property), and schemas supposed to be single source of truth for what is allowed and what is not.

For me second point seems stronger - if we think that empty strings are harmful let's use minLength: 1 everywhere in Iglu Central (using severityLevel 2). Tracker should follow this restrictions and should not decide for user.

Based on above I think that loosing string_or_none seems like a right direction. But we need to check that all 28 dependent arguments are fine with empty string, I know that at least currency in track_ecommerce_transaction should accept string of exactly 3 chars (which is anyway not current string_or_none).

Empty strings are semantically different from a null/missing/undefined, so my preference is to tolerate them unless there is a strong reason on a field-by-field basis not to (e.g. we know that a field must always be set to a UUID).

Equally I'm not comfortable enforcing string length limits in Iglu Central without having a strong confirmation that zero-length strings are impossible for that given property...

Remember too that in the Snowplow Tracker Protocol, empty strings are elided (removed from the final payload) because a null and an empty string are indistinguishable as part of a GET name-value pair.

So I wonder if that's how this crept in...

chuwy commented

Agree. I also didn't mean we need to violently add minLength everywhere. My whole point is that tracker should not prevent user from sending data that tracker can consider suspicious - precise validation is not on tracker's behalf.

@chuwy any update on this? What solution will you be going with?

chuwy commented

@joshblum I think we're all agree that making empty strings allowed is right decision. I think we don't plan python tracker release in near future, but PR would be welcome.