harrystech/arthur-redshift-etl

Manage users access and group membership using configuration

thomas-vogels opened this issue · 0 comments

Is your feature request related to a problem?

This proposal addresses multiple issues commonly faced in a data warehouse by extending the current approach in Arthur:

  • Needing a "system" user, which a BI tool uses
  • Adding users from the Data or Analytics team with wider permissions
  • Grouping users so that permissions can be based on groups
  • Adding users and removing users from groups, thus changing their permissions
  • Ensuring that users lose permissions (and access) when leaving the team.

(The first already exists. The rest is limited or doesn't yet exist.)

Instead of taking a manual approach of:

  • Using a SQL client like psql to run commands (which wouldn't be recorded)
  • Using arthur.py update_user to add a user to a group

this suggests to instead create a configuration section

  • with users
  • and their groups

So that users, groups and memberships can be configured and automatically managed.
This means that we have one place that describes who has access to which schemas (via groups). And we know that that's reflected in the state of the database as well. We make onboarding and offboarding faster and easier, thus ensuring that we're using the right tools consistently and timely.

How this plays out:

  • If a new group is mentioned (as user's group, as a reader or as a writer group), that group is created.
  • If a new user is mentioned, that user is created.
  • The user's groups in the database are compared with the groups in the configuration file.
    • The user is removed from any group not listed in its configuration.
    • The user is added to any group, where they are not already a member.
  • A user who is no longer listed in the configuration loses access to the database. (Removing the user would require dropping any objects so managing owned objects is a follow-up feature.)

Describe the solution you'd like

Here's an example:

Example Step 1:

{
    "data_warehouse": {
        "users": [
            {
                "name": "default",
                "group": "analyst"
            },
            {
                "name": "alice",
                "groups": ["analyst", "etl_ro"]
            }
        ]
    }
}
  • This creates groups analyst and etl_ro.
  • This creates user alice.
  • This adds alice to analyst and etl_ro.

Example Step 2:

{
    "data_warehouse": {
        "users": [
            {
                "name": "default",
                "group": "analyst"
            },
            {
                "name": "alice",
                "groups": ["etl_ro"]
            },
            {
                "name": "bob",
                "groups": ["analyst"]
            }
        ]
    }
}
  • This creates the new user bob.
  • This removes alice from the analyst group.
  • This adds bob to the analyst group.

Example Step 3:

{
    "data_warehouse": {
        "users": [
            {
                "name": "default",
                "group": "analyst"
            },
            {
                "name": "bob",
                "groups": ["etl_ro", "daas_ro"]
            },
            {
                "name": "carol",
                "groups": ["analyst", "daas_ro"]
            }
        ]
    }
}
  • This removes the user alice (in this step, just by using PASSWORD DISABLE)
  • This adds the user `carol.
  • This adds the group daas_ro.
  • This removes bob from analyst and adds him to etl_ro and daas_ro.
  • This adds carol to analyst and daas_ro.

Describe alternatives you've considered

  • Still looking for a tool that would do that for us.
  • Regarding database access, we should be leveraging IAM.

Also, out of scope of the configuration: managing a user's credentials by updating their passwords. (For now, continue to use arthur.py update_user.

Additional context

We have sometimes schemas that aren't managed by Arthur or external. For those schemas, we have to decide whether we want to have Arthur manage permissions as well.