mint-metrics/mojito-r-analytics

Customisable table references for reports

Opened this issue · 2 comments

I think we can make reports more customisable by changing the way we reference tables in our knits / Mojito.

Currently tables are referenced based on the client ID and subject types defined in wave_params. This makes for a tidy wave_params object:

wave_params <- list(
  client_id="client",
  wave_id="w143",
  start_date="2020-05-25 11:33:00",
  stop_date="2020-06-13 14:42:22",
  time_grain="days",
  subject="usercookie",
  recipes=c("Control","Treatment")
)

This yields tables like so:

  • mojito.exposures_usercookie
  • mojito.segments_usercookie
  • mojito.client_conversions_usercookie

There are some issues with this though:

  1. It is quite rigid because users aren't able to deviate their schema naming or table naming conventions to fit their data warehouses.
  2. It's also inefficient in Redshift where goals that would normally be defined once-off, inside a report, need to be committed to our datamodelling steps (users can't just define a custom goal table for a goal e.g. (SELECT domain_userid as subject, 'conversion' as goal, 10.00 as revenue, derived_tstamp as conversion_time FROM client.events WHERE event_name = 'custom_schema')).
  3. Another inefficiency is requiring users to specify a client ID. Not all users will be multi-tenanted and the additional column uses extra space in the DWH, whilst not strictly needed.

Whilst slightly uglier, I think we can make Mojito easier-to-adopt through customisable table references, like so:

wave_params <- list(
  wave_id="w143",
  start_date="2020-05-25 11:33:00",
  stop_date="2020-06-13 14:42:22",
  time_grain="days",
  tables=list(
    exposure="mojito.exposures_usercookie",
    goal="mojito.client_conversions_usercookie",
    segment="mojito.segments_usercookie",
    failure="mojito.recipe_errors_2"),
  recipes=c("Control","Treatment")
)

Thoughts @dapperdrop ?

@kingo55

I think this is fine. It should be minimal impact for us as well because the table references won't change around often.

With the example you gave, how would we adopt it for our use case (multi-tenanted DWH)?

@dapperdrop - we can create a multi-tenanted exposures table, like we currently do. Where it has the following columns:

client_id, subject, wave_id, recipe, exposure_time

And derive views from that, like:

(SELECT subject, wave_id, recipe, exposure_time FROM all_exposures WHERE client_id = 'client')