
This project answers questions on product analytics topics such as acquisition, activity, and retention. It was completed as a requirement of the Mastering Product Analytics course at Central European University in Hungary.

The accompanying article can be found on Medium.com.


The data is based on a real SaaS product, subsampled and simplified for the task.

Two datasets

  • registrations.csv: unique users with basic demographics

  • activity.csv: which users have been active in which month

Registration data

  • Columns

    • id: unique identifier of a registered user (e.g. id_5)

    • registration_month: number of the month from 1 to 21

      • Month 1 and Month 13 are both January, in consecutive years

      • Months are plain integers, so there is no need to deal with a date type

    • region: America, EMEA, or ROW (every other country)

    • operating_system: Windows, Mac, Linux, Unknown

  • About 40K records

  • Each record is a unique registered user
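
The month numbering described above can be turned into calendar months with a little integer arithmetic. The following is a minimal sketch using only the Python standard library. The sample rows are made up for illustration (they are not taken from registrations.csv), and the helper names `calendar_month` and `registrations_per_month` are hypothetical. It assumes the CSV header uses the column names listed above.

```python
import csv
import io
from collections import Counter

# Made-up sample rows in the shape described above (NOT real data).
SAMPLE = """id,registration_month,region,operating_system
id_1,1,America,Windows
id_2,1,EMEA,Mac
id_3,13,ROW,Linux
id_4,14,EMEA,Unknown
"""


def calendar_month(month_number):
    """Map a running month number (1..21) to (year_index, calendar_month).

    Month 1 and Month 13 both map to calendar month 1 (January),
    in year 1 and year 2 respectively.
    """
    year_offset, month_offset = divmod(month_number - 1, 12)
    return year_offset + 1, month_offset + 1


def registrations_per_month(rows):
    """Count registered users per running month number (acquisition)."""
    return Counter(int(row["registration_month"]) for row in rows)


# With the real file, replace io.StringIO(SAMPLE) with
# open("registrations.csv", newline="").
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
per_month = registrations_per_month(rows)

for month in sorted(per_month):
    year, cal_month = calendar_month(month)
    print(f"month {month} (year {year}, calendar month {cal_month}): "
          f"{per_month[month]} registrations")
```

Because the months are plain integers, comparing the same calendar month across the two years reduces to comparing month `m` with month `m + 12`.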

Activity data

  • Columns

    • id: unique identifier of a registered user (e.g. id_5)

    • activity_month: number of the month from 1 to 21

      • Month 1 and Month 13 are both January, in consecutive years

      • Months are plain integers, so there is no need to deal with a date type

    • Additional Columns:

      • Registration data joined to the activity records as a convenience

  • About 79K activity events

    • One record represents an active user in a specific month

    • With about 40K registered users, this means a user is active in about two months on average (long-tail distribution)
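
As a rough illustration of how the activity records support activity and retention questions, here is a minimal sketch using only the Python standard library. The sample rows are made up (they are not taken from activity.csv). The names of the joined registration columns are an assumption, taken to match those in registrations.csv, and the helper names are hypothetical. The cohort here is built from the activity rows alone, so users who registered but were never active are not counted. Using registrations.csv for the cohort would be more accurate.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows (NOT real data). Joined column names are assumed.
ACTIVITY_SAMPLE = """id,activity_month,registration_month,region,operating_system
id_1,1,1,America,Windows
id_1,2,1,America,Windows
id_1,3,1,America,Windows
id_2,1,1,EMEA,Mac
id_3,13,13,ROW,Linux
id_3,14,13,ROW,Linux
"""


def active_months_per_user(rows):
    """Return {user id: number of distinct months in which the user was active}."""
    months = defaultdict(set)
    for row in rows:
        months[row["id"]].add(int(row["activity_month"]))
    return {user: len(active) for user, active in months.items()}


def next_month_retention(rows, registration_month):
    """Share of the cohort registered in a month that is active one month later."""
    cohort = {row["id"] for row in rows
              if int(row["registration_month"]) == registration_month}
    retained = {row["id"] for row in rows
                if row["id"] in cohort
                and int(row["activity_month"]) == registration_month + 1}
    return len(retained) / len(cohort) if cohort else 0.0


# With the real file, replace io.StringIO(ACTIVITY_SAMPLE) with
# open("activity.csv", newline="").
rows = list(csv.DictReader(io.StringIO(ACTIVITY_SAMPLE)))
months_per_user = active_months_per_user(rows)

# Mean over users that appear in the activity data at least once.
mean_active_months = sum(months_per_user.values()) / len(months_per_user)

print(months_per_user)
print(mean_active_months)
print(next_month_retention(rows, 1))
```

Given the long-tail distribution, the median or a histogram of `months_per_user` values is likely to describe typical behaviour better than the mean.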