cal-itp/data-infra

Document all columns in `mart_gtfs_schedule_latest` & review YAML

Closed this issue · 0 comments

As an analytics engineer, I want all of our columns to be documented in dbt so that future maintainers and users of the warehouse will understand what each column is and how it should be used.

Acceptance criteria:

  • Add documentation for all columns in `mart_gtfs_schedule_latest` that currently lack it (see query below)
  • Audit associated dbt YAML:
    • For YAML files that document more than ~10 models and share anchors across more than ~3 of them, define the anchors at the very top of the file, as done here: https://github.com/cal-itp/data-infra/blob/main/warehouse/models/mart/gtfs/_mart_gtfs_dims.yml#L3
    • Check that anchors are used appropriately. If a common field has an equivalent description across models, it should use an anchor. If a field shares an anchor's name but doesn't use the anchor, consider using it and overriding the part that differs, or add a comment explaining why this instance can't use the anchor
    • Review field documentation and evaluate for completeness/correctness

-- Identify columns missing documentation
SELECT *
FROM `cal-itp-data-infra`.`mart_gtfs_schedule_latest`.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
WHERE description IS NULL
ORDER BY table_name, field_path
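
For the anchor audit, the pattern described in the acceptance criteria might look roughly like the sketch below. The holder key `x-common-columns`, the model names, and the descriptions are hypothetical and not taken from the repo. Whether an extra top-level key is accepted may depend on the dbt version in use, so check that before copying the layout.

```yaml
version: 2

# Hypothetical holder key for shared column definitions (name is illustrative).
# Anchors live at the very top so every model below can reference them.
x-common-columns:
  - &feed_key
    name: feed_key
    description: Foreign key to the feed that this record belongs to.

models:
  - name: dim_example_a  # hypothetical model
    columns:
      # Reuse the shared definition unchanged.
      - *feed_key

  - name: dim_example_b  # hypothetical model
    columns:
      # Reuse the anchor via the YAML merge key, overriding only the description.
      - <<: *feed_key
        description: Foreign key to the feed, populated only for the latest feed version.
```

The first model takes the anchored column as-is. The second merges it and replaces only `description`, which keeps the shared `name` (and any other shared properties) in one place.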