user-behavior-anomaly-detector

About

A user anomaly detector that applies machine learning to logs generated by the osquery framework. The machine learning algorithms currently implemented are a recurrent neural network with Long Short-Term Memory (LSTM) and a One-Class Support Vector Machine (OCSVM). This project is part of a Master's thesis at the Faculty of Electrical Engineering and Computing, University of Zagreb.

Note: Only Linux platforms are supported. Tested on Xubuntu 16.04.

The code includes:

  • dataset for different users to train and test models
  • preprocessing functions to properly tokenize the data
  • train and predict functions for existing (saved) data
  • training new model on raw osquery logs
  • saving and loading trained models
  • predict functions for incoming new data
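
As a rough illustration of the tokenization step listed above, here is a minimal sketch that turns one line of the osquery results log (a JSON object with "name", "action" and "columns" fields) into a flat list of tokens. It is not the repository's preprocessing code: the function name tokenize_result_line, the query name in the sample and the "key=value" token layout are assumptions made for this example.

```python
import json
import re


def tokenize_result_line(line):
    """Turn one osquery results-log line (JSON) into a flat list of tokens.

    Hypothetical helper: the token layout is an assumption for illustration,
    not the scheme used by this repository.
    """
    record = json.loads(line)
    tokens = [record.get("name", "unknown_query"),
              record.get("action", "unknown_action")]
    columns = record.get("columns", {})
    for key in sorted(columns):
        value = str(columns[key])
        # Split paths and command lines on anything that is not a word character.
        parts = [p.lower() for p in re.split(r"[^A-Za-z0-9_]+", value) if p]
        tokens.extend("{}={}".format(key, p) for p in parts)
    return tokens


# Sample line; the query name and column values are made up.
sample = ('{"name": "pack_user-behavior_processes", "action": "added", '
          '"columns": {"path": "/usr/bin/vim", "username": "alice"}}')
print(tokenize_result_line(sample))
```

Sorting the column names keeps the token order stable between lines, which matters if the tokens are later fed to a sequence model.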

Motivation

Existing security solutions mostly rely on blocking known malicious threats or enforcing a defined set of rules, so threats that fall outside those signatures and rules, whether they come from outside or inside, often end as successful attacks. The idea was to build an adaptive system that learns each user's actions, so it can predict and detect anomalous behavior in real time.

Installation

osquery
  • Download osquery.
  • Move the osquery query pack file user-behavior.conf and the osquery configuration file osquery.conf from the conf directory to the osquery directory, which is usually /etc/osquery/ or /usr/local/.
  • To capture syslog events, additional configuration is required. On Ubuntu, add the following to your rsyslog configuration (usually /etc/rsyslog.conf or a file in /etc/rsyslog.d/):
template(
  name="OsqueryCsvFormat"
  type="string"
  string="%timestamp:::date-rfc3339,csv%,%hostname:::csv%,%syslogseverity:::csv%,%syslogfacility-text:::csv%,%syslogtag:::csv%,%msg:::csv%\n"
)
*.* action(type="ompipe" Pipe="/var/osquery/syslog_pipe" template="OsqueryCsvFormat")

If you use syslog-ng, or want to read more about this configuration, see the syslog section of the osquery documentation. If no logs are available, read the debugging suggestions there.

Repository
  • Download the repository using git clone.
  • Install the pip requirements with the following command: pip install -r requirements.txt. Python 2.7 is required. I don't guarantee that everything works with Python 3+, but please feel free to try.

How to use

  • Start osquery first, e.g. sudo service osqueryd start or /usr/local/bin/osqueryd. Osquery in this case doesn't require root access.
  • Start the program and choose the algorithm with the -a flag, e.g. python main.py -a OCSVM. The default algorithm is LSTM. You can also change the query results log file with the -l flag; the default is /var/log/osquery/osqueryd.results.log. If you are running osquery as root, run the Python script with sudo, because it needs to be able to read the log file.
  • A separate model is trained for each user, since each user is expected to behave differently. The first part of the main script trains a new model by processing all actions in the log file and treating them as normal behavior. That model is then used to make a prediction for each new action that comes in.
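
The per-user flow described in the last step (train on everything already in the log as normal, then score each new action) can be sketched as below. This is not the code in main.py. To stay dependency-free, a toy "seen-set" baseline stands in for the LSTM and OCSVM models, and the class and function names, the "username" and "path" columns and the sample lines are all assumptions made for this example.

```python
import json
from collections import defaultdict


class SeenSetBaseline(object):
    """Toy stand-in for LSTM/OCSVM: remembers the actions seen in training."""

    def __init__(self):
        self.seen = set()

    def fit(self, actions):
        self.seen.update(actions)
        return self

    def is_anomalous(self, action):
        return action not in self.seen


def action_key(record):
    # Assumption: an action is identified by the query name and a "path" column.
    return (record.get("name"), record.get("columns", {}).get("path"))


def user_of(record):
    # Assumption: the query pack exposes a "username" column.
    return record.get("columns", {}).get("username", "unknown")


def train_per_user(lines):
    """Treat every existing log line as normal and build one model per user."""
    by_user = defaultdict(list)
    for line in lines:
        record = json.loads(line)
        by_user[user_of(record)].append(action_key(record))
    return {user: SeenSetBaseline().fit(actions)
            for user, actions in by_user.items()}


def check_new_line(models, line):
    """Return True if a newly arrived log line looks anomalous for its user."""
    record = json.loads(line)
    model = models.get(user_of(record))
    if model is None:
        return True  # no baseline exists for this user yet
    return model.is_anomalous(action_key(record))


# Made-up history and new events.
history = [
    '{"name": "processes", "columns": {"path": "/usr/bin/vim", "username": "alice"}}',
    '{"name": "processes", "columns": {"path": "/usr/bin/gcc", "username": "bob"}}',
]
models = train_per_user(history)
known = '{"name": "processes", "columns": {"path": "/usr/bin/vim", "username": "alice"}}'
novel = '{"name": "processes", "columns": {"path": "/bin/nc", "username": "alice"}}'
print(check_new_line(models, known), check_new_line(models, novel))
```

Keeping one model per user means an action that is routine for one user (bob running gcc) can still be flagged when another user performs it for the first time.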

License

The MIT License Copyright (c) 2017-present