counteractive/o365beat

User Data Enhancement

Michael-vdL opened this issue · 3 comments

Just a suggestion:
In the o365beat.yml file, it might be prudent to map UserKey to the user.id field.
UserId in O365 typically relates to user.email and user.name.

My suggestion is:

  - dissect:
      field: UserId
      tokenizer: '[%{user.name}@%{user.domain}]'
      when:
        contains:
          UserId: '@'

convert:

  - {from: UserKey, to: 'user.id', type: 'string'}

This maps UserId directly to ECS fields (which I believe is the primary use case for this beat), but it could be modified to dissect into UserName and UserDomain fields first and then convert those to ECS in the convert processor.
I am still new to ELK, so I am not sure how to modify the mappings. I have been doing these in Logstash, but it is clear they would fit here.
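
The alternative mentioned above, dissecting into intermediate fields and then converting them to ECS, might look roughly like the following. This is a sketch only: the field names UserName and UserDomain come from the suggestion, while the `processors:` wrapper, `target_prefix: ''`, `mode: rename`, and the `ignore_missing`/`fail_on_error` settings are assumptions about how it would sit in o365beat.yml. The tokenizer also assumes UserId is a bare address with no surrounding brackets.

```yaml
processors:
  # Split UserId on '@' into two intermediate fields.
  # Only runs when UserId actually contains an '@'.
  - dissect:
      when:
        contains:
          UserId: '@'
      field: UserId
      tokenizer: '%{UserName}@%{UserDomain}'
      # Assumption: write keys at the event root instead of under
      # the default "dissect" prefix.
      target_prefix: ''

  # Map the intermediate fields onto ECS names.
  - convert:
      fields:
        - {from: UserName, to: 'user.name', type: 'string'}
        - {from: UserDomain, to: 'user.domain', type: 'string'}
      # Assumption: rename rather than copy, so the intermediate
      # fields do not remain on the event.
      mode: rename
      # Events without an '@' in UserId will not have these fields.
      ignore_missing: true
      fail_on_error: false
```

Keeping the intermediate names separate from the ECS names would make it easy to drop or change only the user.domain mapping later.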

Thank you for the issue! I appreciate another perspective, and there are certainly multiple reasonable ways to parse the API output into ECS. A few thoughts:

  • Parsing the first part of UserId into user.name seems like a fine addition
  • Parsing the email domain into user.domain is a little trickier. The ECS field user.domain is designed to hold the "name of the directory the user is a member of, [f]or example, an LDAP or Active Directory domain name," rather than a domain in the DNS sense. For some users those might be the same, in which case they could benefit from dissecting it that way. For others, the 365 email domain might differ from their directory domain name, and might introduce some confusion ... and I'm not sure if AD-integrated o365 deployments log another field to capture the AD domain. Maybe I'm being overly cautious here, and users could comment out that dissector if they don't use it. I'll look at the docs and include this if it doesn't stomp on another use of user.domain.
  • As far as parsing UserKey into user.id, I'm not sure what most folks would gain: the UserId is both named and used as a unique identifier for the user in the API results. The UserKey is used less frequently, and suffers from not being human-readable. Apparently ECS supports multiple user.id values ("one or multiple unique identifiers of the user"), though I'm not sure how that'd work cleanly in practice.

Happy to have more discussion on any of the above. Thanks again!

I agree with the UserKey to user.id not being a great fit. My initial argument was from a personal assumption that ECS user.id is expecting something more like a UUID rather than something a human understands. But they are certainly both unique IDs from a user perspective. The only thing users really gain is the ability to put more fields into ECS format, and Elastic seems to think that where ECS is concerned, the more the merrier.

I definitely see your reasoning for Domain vs DNS domain. For us, they are the same at the moment, but we are standing up a new AD and O365 environment which will be domain joined. In a few months, I will have the answer to that question.

Thank you for the response.

Added a dissect processor in de5ef64 to pull out user.name and user.domain. It'll be in the next release, and it's easy enough to add yourself (no upgrade required, as it doesn't change the beat). It's a nice and reasonable ECS addition; we'll reconsider if we get reports of confusion or collisions with AD domains. Thanks again for the issue and discussion.
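
For anyone adding this by hand, a processor along these lines could go under `processors:` in o365beat.yml. This is a sketch based on the suggestion earlier in the thread, not a copy of de5ef64. `target_prefix: ''` is an assumption, added so the keys are written at the root of the event rather than under the dissect processor's default `dissect` prefix, and the tokenizer assumes UserId is a bare address without surrounding brackets.

```yaml
processors:
  - dissect:
      # Skip events whose UserId is not an email-style identifier.
      when:
        contains:
          UserId: '@'
      field: UserId
      # Text before '@' goes to user.name, text after it to user.domain.
      tokenizer: '%{user.name}@%{user.domain}'
      # Assumption: empty prefix so fields land at user.* and not
      # dissect.user.*.
      target_prefix: ''
```

If the email domain does not match your directory domain, replacing `%{user.domain}` with a differently named key of your choosing would keep user.domain free for the AD or LDAP value.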