Ingest and transform old style Benefits enrollments into Metabase
Closed this issue ยท 7 comments
User story / feature request
As a Metabase dashboard creator or consumer, I want to be able to see the full history of Benefits enrollments in a single format together in a single dataset/chart to avoid having to manually add to the numbers currently available.
Acceptance Criteria
- Data from Amplitude is back-filled into the GCS data destination for the date range
2021-12-08
through2022-08-11
Pacific
DMV events
This Amplitude chart shows events that match the following criteria that should be transformed and ingested into Metabase in the new format: https://app.amplitude.com/analytics/compiler/chart/aztqyfek
- For all events before
2022-08-12
Pacific, of typeviewed page
withpath = /enrollment/success
,region = California
orregion = (none)
and all values ofcity
exceptLos Angeles
, create a new event of typereturned enrollment
withstatus = success
- Each event should be created in the
mart_benefits.fct_benefits_events
table according to the existing structure, and re-using as much relevant event information from the originalviewed page
event as possible (e.g. dates, times) - For each event created, assign the
transit_agency
properties the value ofMonterey-Salinas Transit
- For each event created, assign the
eligibility_types
properties the value ofsenior
- For each event created, assign the
eligibility_verifier
properties the value ofDMV
- For each event created, assign the
payment_group
property the value of5170d37b-43d5-4049-899c-b4d850e14990
- There should be 18 new events created matching the city breakdown below:
City | Total events |
---|---|
Salinas | 8 |
San Jose | 3 |
Monterey | 2 |
(none) | 2 |
Yuba City | 1 |
Pacific Grove | 1 |
San Francisco | 1 |
Total | 18 |
Login.gov events
This Amplitude chart shows events that match the following criteria that should be transformed and ingested into Metabase in the new format: https://app.amplitude.com/analytics/compiler/chart/l6lx6x7i
- For all events including and after
2022-08-12
Pacific, of typeviewed page
withpath = /enrollment/success
,region = California
orregion = (none)
and all values ofcity
exceptLos Angeles
, create a new event of typereturned enrollment
withstatus = success
- Each event should be created in the
mart_benefits.fct_benefits_events
table according to the existing structure, and re-using as much relevant event information from the originalviewed page
event as possible (e.g. dates, times) - For each event created, assign the
transit_agency
properties the value ofMonterey-Salinas Transit
- For each event created, assign the
eligibility_types
properties the value ofsenior
- For each event created, assign the
eligibility_verifier
properties the value of(MST) CDT claims via Login.gov
- For each event created, assign the
payment_group
property the value of5170d37b-43d5-4049-899c-b4d850e14990
- There should be 4 new events created matching the city breakdown below:
City | Total events |
---|---|
Monterey | 4 |
Total | 4 |
Notes
This came out of a discussion with @o-ram and @indexing as we start to roll out more public-facing Benefits metrics in e.g. the new Benefits newsletter, where numbers are based on the newer format for enrollment events, it is time-consuming and error-prone (and maybe not even possible from e.g. a Metabase chart perspective) to have to remember to add back these "old style" enrollments into the metrics.
The Benefits Timeline document helps explain some of the evolution in the Benefits app and reasons why there are differences in events captured during the "DMV days" when Benefits was hosted by the DMV and used driver's license information to verify eligibility for Older Adults.
TLDR; in cal-itp/benefits/pull/873 enrollments were defined as a specific event type rather than interpreted from a generic viewed page
event type. This change occurred after the Benefits relaunch with CDT and Login.gov, so there are old enrollments from both the DMV/pre-Login.gov days, and after.
Implementation
I think this can be done in the main DBT model/script for fct_benefits_events
.
We may need something like a separate WITH
clause to create a new subquery/Common Table Expression (CTE) to select out the old events and transform them according to the criteria above, and then the final SELECT
would need to be modified to JOIN
the 2 CTEs together into the resultant table.
Query to get started
This query matches the conditions above for which enrollment events should be converted:
SELECT * FROM `cal-itp-data-infra.staging.stg_amplitude__benefits_events`
WHERE client_event_time >= '2021-12-08T08:00:00Z'
and client_event_time < '2022-08-29T07:00:00Z'
and (region = 'California' or region is null)
and city <> 'Los Angeles'
and event_type = 'viewed page'
and JSON_VALUE(event_properties, '$.path') = '/enrollment/success'
Confirming that data exists in GCP for this back-fill.
Bucket
E.g. for the beginning of the date range:
And the end:
Staging table
Running this query in the console:
SELECT city, count(client_event_time) as enrollments
FROM `cal-itp-data-infra.staging.stg_amplitude__benefits_events`
WHERE client_event_time >= '2021-12-08T08:00:00Z'
and client_event_time < '2022-08-29T07:00:00Z'
and (region = 'California' or region is null)
and (city <> 'Los Angeles' or city is null)
and event_type = 'viewed page'
and JSON_VALUE(event_properties, '$.path') = '/enrollment/success'
GROUP BY city
ORDER BY enrollments desc, city
combining the date criteria above
Produces the expected results:
city | enrollments |
---|---|
Salinas | 8 |
Monterey | 6 |
San Jose | 3 |
null | 2 |
Pacific Grove | 1 |
San Francisco | 1 |
Yuba City | 1 |
Total | 22 |
Query I'm starting with to process the two groups (DMV and Login.gov) separately before combining:
WITH
fct_old_enrollments AS (
SELECT *
FROM `cal-itp-data-infra.staging.stg_amplitude__benefits_events`
WHERE client_event_time >= '2021-12-08T08:00:00Z'
and client_event_time < '2022-08-29T07:00:00Z'
and (region = 'California' or region is null)
and (city <> 'Los Angeles' or city is null)
and event_type = 'viewed page'
and JSON_VALUE(event_properties, '$.path') = '/enrollment/success'
ORDER BY client_event_time desc
),
fct_dmv_events AS (
SELECT
app,
device_id,
user_id,
client_event_time,
event_id,
session_id,
"returned enrollment" as event_type,
version_name,
os_name,
os_version,
device_family,
device_type,
country,
language,
library,
city,
region,
event_time,
client_upload_time,
server_upload_time,
server_received_time,
amplitude_id,
start_version,
uuid,
processed_time,
"senior" as event_properties_eligibility_types,
"DMV" as event_properties_eligibility_verifier,
"5170d37b-43d5-4049-899c-b4d850e14990" as event_properties_payment_group,
"success" as event_properties_status,
"Monterey-Salinas Transit" as event_properties_transit_agency,
"senior" as user_properties_eligibility_types,
"DMV" as user_properties_eligibility_verifier,
"Monterey-Salinas Transit" as user_properties_transit_agency
FROM fct_old_enrollments
WHERE client_event_time < '2022-08-12T07:00:00Z'
),
fct_login_events AS (
SELECT
app,
device_id,
user_id,
client_event_time,
event_id,
session_id,
"returned enrollment" as event_type,
version_name,
os_name,
os_version,
device_family,
device_type,
country,
language,
library,
city,
region,
event_time,
client_upload_time,
server_upload_time,
server_received_time,
amplitude_id,
start_version,
uuid,
processed_time,
version_name,
platform,
os_name,
os_version,
"senior" as event_properties_eligibility_types,
"OAuth claims via Login.gov" as event_properties_eligibility_verifier,
"5170d37b-43d5-4049-899c-b4d850e14990" as event_properties_payment_group,
"success" as event_properties_status,
"Monterey-Salinas Transit" as event_properties_transit_agency,
"senior" as user_properties_eligibility_types,
"OAuth claims via Login.gov" as user_properties_eligibility_verifier,
"Monterey-Salinas Transit" as user_properties_transit_agency
FROM fct_old_enrollments
WHERE client_event_time >= '2022-08-12T07:00:00Z'
)
SELECT * FROM fct_dmv_events
UNION DISTINCT
SELECT * FROM fct_login_events
This is producing the expected 22 records in the new enrollment format
Keeping this In review
while we wait for the warehouse refresh and can confirm the new data appears in Metabase.
@thekaveman Legacy enrollments for older adults are now appearing in Metabase. Total enrollment count is now accurate as well. Woot woot! As far as I know, we didn't modify the query; they simply appeared this morning. cc// @o-ram