nextcloud/suspicious_login

Validate the model with realistic and strict data sets

ChristophWurst opened this issue · 0 comments

In machine learning theory, the training set and the validation set should not overlap. In this application, however, the classifier has to classify not only IPs it has never seen before but also ones it sees again. I therefore chose to validate not only on data that has only been seen recently, but also on IPs that have both historic and recent appearances.

These results might not be a true representation of the model's quality, so I'd like to add a second, stricter validation that only uses truly recent data.
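To make the distinction concrete, here is a minimal Python sketch of how the two validation sets could be derived from one list of login records. It is not the app's actual code: the `Login` record, the `split_validation_sets` helper, the time-based `cutoff`, and the choice to treat a `(uid, ip)` pair as "seen" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Login:
    """Hypothetical login record; field names are assumptions."""
    uid: str
    ip: str
    timestamp: int  # seconds since epoch


def split_validation_sets(logins, cutoff):
    """Split logins into (training, realistic, strict).

    training:  every login before the cutoff
    realistic: every login at or after the cutoff, including
               (uid, ip) pairs that also appear in training
    strict:    only logins at or after the cutoff whose (uid, ip)
               pair never appears in training
    """
    training = [l for l in logins if l.timestamp < cutoff]
    recent = [l for l in logins if l.timestamp >= cutoff]

    # Assumption: "seen before" means the same user logged in from the same IP.
    seen = {(l.uid, l.ip) for l in training}

    realistic = list(recent)
    strict = [l for l in recent if (l.uid, l.ip) not in seen]
    return training, realistic, strict


if __name__ == "__main__":
    logins = [
        Login("alice", "10.0.0.1", 100),
        Login("alice", "10.0.0.1", 250),  # recent, but also historic
        Login("alice", "10.0.0.2", 260),  # truly recent
        Login("bob", "10.0.0.1", 270),    # truly recent for bob
    ]
    training, realistic, strict = split_validation_sets(logins, cutoff=200)
    print(len(training), len(realistic), len(strict))
```

Reporting the metrics for both sets side by side would show how much of the measured quality comes from IPs the model has already seen during training.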