cal-itp/data-infra

Payments: littlepay_transaction_id missing from some fct_payments_rides_v2 rows

Closed this issue · 1 comments

Describe the bug

Captured by dbt tests reflected in Sentry issues #72503 and #72504 - a subset of rows in fct_payments_rides_v2 lack a value for littlepay_transaction_id.

To Reproduce

The failing rows in question are almost entirely composed of payments from MST whose corresponding micropayments have a charge_type value of pending_charge_fare. Almost every micropayment with that charge_type for that agency results in a fct_payments_rides_v2 row that isn't mapped to a littlepay_transaction_id. These rows do map to valid littlepay_transaction_id values in int_littlepay__cleaned_micropayment_device_transactions, so the culprit is probably a disconnect somewhere in an adjacent stg/int model or in one of the joins within the fct_payments_rides_v2 model definition. Other agencies' micropayments rows with the same charge_type value appear to map correctly, so comparing tables across agencies for presence/absence of different columns could be a good start.

In addition to that identified pattern that accounts for most of the test failures, there are a handful of other rows in fct_payments_rides_v2 that lack a littlepay_transaction_id value for other reasons. These may have to be treated individually - some may just be due to bad data or testing data.

Expected behavior

The fct_payments_rides_v2 model is intended to reflect a broader, perhaps less tightly controlled dataset than fct_aggregations, so it is possible that a few rows could end up as true mysteries at the end of an investigation into these test failures. If so, the tests in question should likely be loosened with, say, a 0.99 threshold.

Investigation notes:

  • A significant subset of the affected rows (98%) are caused by issue #647; these are rows that are specifically being dropped in int_littlepay__cleaned_micropayment_device_transactions but are subsequently being reintroduced into payments_rides.
  • All rows did originally have a transaction ID.
WITH problem_rows AS (SELECT *
FROM `cal-itp-data-infra.mart_payments.fct_payments_rides_v2`
WHERE littlepay_transaction_id IS NULL),

map AS (
  SELECT *
  FROM `cal-itp-data-infra.staging.stg_littlepay__micropayment_device_transactions`
),

cleaned AS (
  SELECT *
  FROM `cal-itp-data-infra.staging.int_littlepay__cleaned_micropayment_device_transactions`
),

compare AS (
  SELECT 
    problem_rows.*,
    map.littlepay_transaction_id AS original_lp_id,
    cleaned.littlepay_transaction_id AS cleaned_lp_id
  FROM problem_rows
  LEFT JOIN map USING (micropayment_id)
  LEFT JOIN cleaned USING (micropayment_id)
)

SELECT
  original_lp_id IS NULL,
  cleaned_lp_id IS NULL,
  participant_id,
  COUNT(*),
  COUNT(DISTINCT compare.original_lp_id)
FROM compare
GROUP BY 1,2, 3
ORDER BY 4 DESC