Source code and submissions made to the Keystroke Biometrics Ongoing Competition (KBOC).
This repository also contains the code to reproduce the results in the companion paper, where the anomaly detection systems are described in detail.
Results were obtained with the following software versions:
> %watermark -v -p numpy,pandas,scikit-learn,tensorflow,pohmm
CPython 3.5.1
IPython 4.1.2
numpy 1.11.0
pandas 0.18.1
scikit-learn 0.17.1
tensorflow 0.8.0
pohmm 0.2
We recommend using Anaconda to create a virtual environment with the above dependencies installed.
To reproduce the main results, place the KBOC databases (zip files) in the data/ folder. Then run the main.py script:
> python main.py
This will create the validation and submission files for 21 different systems. Systems 1-15 were submitted to the KBOC. Note that this script may take several hours to complete, depending on the CPU, use of GPU for neural network training, and available memory. The resulting validation and submission score files may also vary slightly from those in the repository depending on the GPU used for training neural network models.
To plot the score distributions of any system (requires matplotlib and seaborn), use the plot_scores.py script:
> python plot_scores.py system6
Unless noted otherwise, each system below uses SD score normalization, SD feature normalization, and keystroke correspondence between the given and target sequences (described in the paper).
1. Deep autoencoder with three hidden layers of dimensions 5, 4, and 3.
2. Variational autoencoder with two hidden layers of dimension 5.
3. Partially observable hidden Markov model with 2 hidden states and lognormal emissions.
4. One-class support vector machine (SVM) using press-press latency and duration features.
5. Contractive autoencoder with hidden layer of dimension 400.
6. Manhattan distance.
7. Autoencoder with a single hidden layer of dimension 5.
8. Contractive autoencoder with hidden layer of dimension 200.
9. Mean ensemble of systems 3, 4, and 5.
10. Mean ensemble of systems 1-8.
11. Same as system 3, except using min/max score normalization.
12. Same as system 4, except using min/max score normalization.
13. Same as system 5, except using min/max score normalization.
14. Same as system 8, except using min/max score normalization.
15. Mean ensemble of systems 11-14.
16. Manhattan distance using min/max score normalization.
17. Manhattan distance using no score normalization.
18. Manhattan distance without the keystroke alignment.
19. Manhattan distance without keystroke alignment and using min/max score normalization.
20. Manhattan distance without keystroke alignment and using no score normalization.
21. Manhattan distance using min/max feature normalization.
22. Manhattan distance discarding modifier keys.
23. Manhattan distance discarding modifier keys and using min/max score normalization.
24. Manhattan distance discarding modifier keys and using no score normalization.
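
Several of the systems above combine three simple ideas: a Manhattan distance detector, min/max score normalization, and a mean ensemble of scores. The sketch below illustrates those ideas in plain Python. It is not the code from main.py: the function names, the use of a per-feature mean as the enrollment template, and the convention that a higher score means a closer match are all assumptions made for illustration. The paper gives the actual definitions.

```python
from statistics import mean


def fit_template(enroll_samples):
    """Per-feature mean of the enrollment samples (assumed template)."""
    return [mean(column) for column in zip(*enroll_samples)]


def manhattan_score(template, sample):
    """Negated Manhattan (L1) distance, so higher means more similar.

    The sign convention is an assumption for this sketch.
    """
    return -sum(abs(t - s) for t, s in zip(template, sample))


def minmax_normalize(scores):
    """Rescale a list of scores linearly to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal, so there is no spread to rescale.
        return [0.5 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def mean_ensemble(*score_lists):
    """Average the scores of several systems, sample by sample."""
    return [mean(column) for column in zip(*score_lists)]


# Hypothetical timing features (milliseconds) for one user.
enroll = [[100.0, 80.0, 120.0], [110.0, 90.0, 130.0], [90.0, 70.0, 110.0]]
queries = [[100.0, 80.0, 120.0], [105.0, 85.0, 125.0], [200.0, 150.0, 60.0]]

template = fit_template(enroll)
raw_scores = [manhattan_score(template, q) for q in queries]
normalized = minmax_normalize(raw_scores)
```

Normalizing each system's scores to a common range before averaging keeps one system's scale from dominating the ensemble, which is one reason to normalize scores before calling a function like mean_ensemble.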