Understanding Authentication Data
Von-dy opened this issue · 4 comments
Hello,
This is more of a comment that I think anyone who is interested in collecting O365 Audit data should understand.
Authentication data is heavily skewed because MS designed it from a developer perspective rather than for audit purposes.
There are 3 main categories of Authentication Operations:
UserLoggedIn
MailboxLogin
UserLoginFailed
A quick look at these would suggest that all failures fall under UserLoginFailed. From what I can tell, though, only a failure that is the RESULT of an incorrect user action falls under UserLoginFailed, while a failure caused by non-user-specific controls can fall under UserLoggedIn. The difference is like that between mistyping a password and using incorrect cached data: the first is a user error, the second is a system error.
Users should understand that LogonError is the key to determining whether a UserLoggedIn operation is reporting a successful login or just an attempted login that went wrong. I have also noticed that a LogonError is reported on logouts. MS is not good at documenting this information, and I will be posting to the forums about these findings.
I can understand why, from a developer perspective, things are reported this way: events are framed as "System Tried to Take Response to Action, System Successfully Responded". From an audit perspective, though, there is a lot to parse out here.
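As a rough illustration of reading LogonError out of an event, here is a minimal Python sketch. It assumes the event has already been parsed from JSON into a dict and that ExtendedProperties is a list of Name/Value pairs, as in the Management Activity API schema. The helper names and the sample event are hypothetical, not part of any library.

```python
from typing import Any, Dict, Optional


def extended_properties(event: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten the ExtendedProperties Name/Value pairs into a plain dict."""
    pairs = event.get("ExtendedProperties") or []
    return {p["Name"]: p.get("Value") for p in pairs if "Name" in p}


def logon_error(event: Dict[str, Any]) -> Optional[str]:
    """Return the LogonError value, or None when the property is absent."""
    return extended_properties(event).get("LogonError")


# Hypothetical, trimmed-down event; real records carry many more fields.
sample = {
    "Operation": "UserLoggedIn",
    "ExtendedProperties": [
        {"Name": "UserAgent", "Value": "example-agent"},
        {"Name": "LogonError", "Value": "KmsiInterrupt"},
    ],
}

print(logon_error(sample))
```

Keeping the raw value, rather than collapsing it to a boolean straight away, leaves room to decide later which error values should count as failures.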
Elastic 7.6 also allows filtering on nested terms, and ExtendedProperties is going to be your friend when validating successful vs. unsuccessful login attempts.
I am working on a way to keep the original data the same in my beats system and modify it to be properly reported in the ECS event fields. I think it is crucial to report all the base MS information, but cleaning the data before leaving Beats will be important for not having costly Logstash pipelines down the road.
This is not an O365 development issue but rather an MS O365 API development issue that I think everyone here should be aware of. MS needs to get their documentation in line with what it is reporting.
Good point.
Microsoft's documentation makes a reference to this here: https://docs.microsoft.com/en-us/office/office-365-management-api/office-365-management-activity-api-schema#common-schema
See the point on ResultStatus.
I did some testing on this as well, and I don't think ExtendedProperties is reliable for determining whether the logon was successful. For example, I see some events where the ResultStatusDetail is "Success" but a LogonError property still exists with a value of KmsiInterrupt, which is documented here: https://docs.microsoft.com/en-gb/azure/active-directory/develop/reference-aadsts-error-codes#aadsts-error-codes
I haven't seen any events where the LogonError property exists (except for when the value is "None") but the event isn't an authentication failure, so maybe that's the most reliable way.
Actually, it looks like LogonError is always "None" for Logout events, but LogonError "None" also seems to exist for some successful login events (not sure why this is the case).
So you could filter for successful authentication events by searching for events where Operation = UserLoggedIn, LogonError doesn't exist or equals "None", and ExtendedProperties doesn't contain "Logout".
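That filter could be sketched in Python roughly as follows. This is only one reading of the rule, not a verified implementation: it assumes events are parsed into dicts with ExtendedProperties as a list of Name/Value pairs, and it interprets "doesn't contain Logout" as "no property name or value contains the substring Logout". The function name is hypothetical.

```python
from typing import Any, Dict


def is_successful_login(event: Dict[str, Any]) -> bool:
    """Apply the filter described above to a single audit event."""
    if event.get("Operation") != "UserLoggedIn":
        return False

    pairs = event.get("ExtendedProperties") or []
    props = {p.get("Name"): p.get("Value") for p in pairs}

    # LogonError must be absent or carry the literal string "None".
    if props.get("LogonError") not in (None, "None"):
        return False

    # Assumption: "contains Logout" means a substring match on any
    # property name or value.
    for pair in pairs:
        for text in (pair.get("Name"), pair.get("Value")):
            if "Logout" in str(text):
                return False
    return True
```

For example, an event with Operation set to UserLoggedIn and a LogonError of KmsiInterrupt would be rejected by this check, while the same event with no LogonError property would pass.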
Thank you for this issue, and for the discussion! I'd love to consolidate these lessons and move them to the right location. I'll try to distill and capture this in a section in the README soon, with a link to this issue. I'll let you know when that happens; I don't want to close this until we have a good place to direct curious folks.
Referenced in README in release v1.5.1, and we can keep this (closed) issue for reference and any additional discussion. Thanks again!