cal-itp/data-infra

Remove unmaintained `uri` field from `dim_gtfs_datasets`

Opened this issue · 2 comments

As a data user, I don't want the unmaintained `uri` field to exist on the mart `dim_gtfs_datasets` table, because it is misleading and may lead people to use an outdated field. (Specifically, this happened here: #3066 (comment).)

Acceptance criteria:

  • Remove the `uri` field from `dim_gtfs_datasets` (first double-check whether anyone is using it).
  • Add a reconstructed `pipeline_url` field, either by decoding `base64_url` as described in this comment or by bringing `pipeline_url` through from the intermediate table. The only open question is whether bringing `pipeline_url` through would affect the versioning; if it would, err on the side of decoding the existing column.
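
For reference, a rough sketch of what decoding `base64_url` back into a URL could look like. This assumes the column holds a URL-safe base64 encoding of the UTF-8 URL string, which should be confirmed against the pipeline code before relying on it. The function name `decode_base64_url` and the example URL are hypothetical, for illustration only.

```python
import base64


def decode_base64_url(base64_url: str) -> str:
    """Decode a URL-safe base64 string back into the original URL.

    Assumption: the value was produced by URL-safe base64 encoding
    of the UTF-8 URL; this is not confirmed by the issue itself.
    """
    # Restore any '=' padding that may have been stripped upstream.
    padded = base64_url + "=" * (-len(base64_url) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


# Round trip with a made-up URL (not a real feed).
example_url = "https://example.com/gtfs.zip"
encoded = base64.urlsafe_b64encode(example_url.encode("utf-8")).decode("ascii")
decoded = decode_base64_url(encoded)
```

In the warehouse itself the same decoding would presumably be done in SQL inside the dbt model; the Python version is only meant to make the expected round trip explicit.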

I'm a little confused. Both uri and pipeline_url are maintained in Airtable.

Apologies, I think I got it confused. I do think that having the templated URL with braces in the warehouse is confusing, given that `pipeline_url` is what is actually used for downloads and what can key to other data, so the proposal would be to suppress `uri` and defer to `pipeline_url`. Does that seem OK? I can update the ticket to say not that `uri` is unmaintained, but that it is perhaps confusing because it does not align with `base64_url`, which is the widespread key.