/sigver

Signature verification package, for learning representations from signature data, training user-dependent classifiers.

Primary LanguagePythonBSD 3-Clause "New" or "Revised" LicenseBSD-3-Clause

Learning representations for Offline Handwritten Signature Verification

This repository contains code to train CNNs for feature extraction for Offline Handwritten Signatures, code to train writer-dependent classifiers [1] and code to train meta-learners [3]. It also contains code to train two countermeasures for Adversarial Examples, as described in [2] (the code to run the experiments from this second paper can be found in this repository).

[1] Hafemann, Luiz G., Robert Sabourin, and Luiz S. Oliveira. "Learning Features for Offline Handwritten Signature Verification using Deep Convolutional Neural Networks" http://dx.doi.org/10.1016/j.patcog.2017.05.012 (preprint)

[2] Hafemann, Luiz G., Robert Sabourin, and Luiz S. Oliveira. "Characterizing and evaluating adversarial examples for Offline Handwritten Signature Verification" https://doi.org/10.1109/TIFS.2019.2894031 (preprint)

[3] Hafemann, Luiz G., Robert Sabourin, and Luiz S. Oliveira. "Meta-learning for fast classifier adaptation to new users of Signature Verification systems" https://doi.org/10.1109/TIFS.2019.2949425 (preprint)

This code for feature extraction and writer-dependent classifiers is a re-implementation in Pytorch (original code for [1] was written in theano+lasagne: link).

Installation

This package requires python 3. Installation can be done with pip:

pip install git+https://github.com/luizgh/sigver.git  --process-dependency-links

You can also clone this repository and install it with pip install -e <path/to/repository> --process-dependency-links

Usage

Data preprocessing

The functions in this package expect training data to be provided in a single .npz file, with the following components:

  • x: Signature images (numpy array of size N x 1 x H x W)
  • y: The user that produced the signature (numpy array of size N )
  • yforg: Whether the signature is a forgery (1) or genuine (0) (numpy array of size N )

We provide functions to process some commonly used datasets in the script sigver.datasets.process_dataset. As an example, the following code pre-process the MCYT dataset with the procedure from [1] (remove background, center in canvas and resize to 170x242)

python -m sigver.preprocessing.process_dataset --dataset mcyt \
 --path MCYT-ORIGINAL/MCYToffline75original --save-path mcyt_170_242.npz

During training a random crop of size 150x220 is taken for each iteration. During test we use the center 150x220 crop.

Training a CNN for Writer-Independent feature learning

This repository implements the two loss functions defined in [1]: SigNet (learning from genuine signatures only) and SigNet-F (incorporating knowledge of forgeries). In the training script, the flag --users is use to define the users that are used for feature learning. In [1], GPDS users 300-881 were used (--users 300 881).

Training SigNet:

python -m sigver.featurelearning.train --model signet --dataset-path  <data.npz> --users [first last]\ 
--model signet --epochs 60 --logdir signet  

Training SigNet-F with lambda=0.95:

python -m sigver.featurelearning.train --model signet --dataset-path  <data.npz> --users [first last]\ 
--model signet --epochs 60 --forg --lamb 0.95 --logdir signet_f_lamb0.95  

For checking all command-line options, use python -m sigver.featurelearning.train --help. In particular, the option --visdom-port allows real-time monitoring using visdom (start the visdom server with python -m visdom.server -port <port>).

Training WD classifiers and evaluating the result

For training and testing the WD classifiers, use the sigver.wd.test script. Example:

python -m sigver.wd.test -m signet --model-path <path/to/trained_model> \
    --data-path <path/to/data> --save-path <path/to/save> \
    --exp-users 0 300 --dev-users 300 881 --gen-for-train 12

Where trained_model is a .pth file (trained with the script above, or pre-trained - see the section below). This script will split the dataset into train/test, train WD classifiers and evaluate then on the test set. This is performed for K random splits (default 10). The script saves a pickle file containing a list, where each element is the result of one random split. Each item contains a dictionary with:

  • 'all_metrics': a dictionary containing:
    • 'FRR': false rejection rate
    • 'FAR_random': false acceptance rate for random forgeries
    • 'FAR_skilled': false acceptance rate for skilled forgeries
    • 'mean_AUC': mean Area Under the Curve (average of AUC for each user)
    • 'EER': Equal Error Rate using a global threshold
    • 'EER_userthresholds': Equal Error Rate using user-specific thresholds
    • 'auc_list': the list of AUCs (one per user)
    • 'global_threshold': the optimum global threshold (used in EER)
  • 'predictions': a dictionary containing the predictions for all images on the test set:
    • 'genuinePreds': Predictions to genuine signatures
    • 'randomPreds': Predictions to random forgeries
    • 'skilledPreds': Predictions to skilled forgeries

The example above train WD classifiers for the exploitation set (users 0-300) using a development set (users 300-881), with 12 genuine signatures per user (this is the setup from [1] - refer to the paper for more details). For knowing all command-line options, run python -m sigver.wd.test -m signet.

Pre-trained models

Pre-trained models can be found here:

  • SigNet (link)
  • SigNet-F lambda 0.95 (link)

These models contains the weights for the feature extraction layers.

Important: These models were trained with pixels ranging from [0, 1]. Besides the pre-processing steps described above, you need to divide each pixel by 255 to be in the range. This can be done as follows: x = x.float().div(255). Note that Pytorch does this conversion automatically if you use torchvision.transforms.totensor, which is used during training.

Usage:

import torch
from sigver.featurelearning.models import SigNet

# Load the model
state_dict, classification_layer, forg_layer = torch.load('models/signet.pth')
base_model = SigNet().eval()
base_model.load_state_dict(state_dict)

# Extract features
with torch.no_grad(): # We don't need gradients. Inform torch so it doesn't compute them
    features = base_model(input)

See example.py for a complete example. For a jupyter notebook, see this interactive example.

Meta-learning

To train a meta-learning model, use the sigver.metalearning.train script:

python -m sigver.metalearning.train --dataset-path <path/to/datataset.npz> \
    --pretrain-epochs <number_pretrain_epochs> --num-updates <number_gradient_steps> --num-rf <num_random_forgeries> \
    --epochs <number_epochs> --num-sk-test <number_skilled_in_Dtest> --model <model>

Where num-updates is the number of gradient descent steps in the classifier adaptation (K in the paper), and num-rf refers to the number of random forgeries in the classifier adaptation steps (set --num-rf 0 for the one-class formulation). Refer to the sigver/metalearning/train.py script for a complete list of arguments.

Citation

If you use our code, please consider citing the following papers:

[1] Hafemann, Luiz G., Robert Sabourin, and Luiz S. Oliveira. "Learning Features for Offline Handwritten Signature Verification using Deep Convolutional Neural Networks" http://dx.doi.org/10.1016/j.patcog.2017.05.012 (preprint)

[2] Hafemann, Luiz G., Robert Sabourin, and Luiz S. Oliveira. "Characterizing and evaluating adversarial examples for Offline Handwritten Signature Verification" https://doi.org/10.1109/TIFS.2019.2894031 (preprint)

[3] Hafemann, Luiz G., Robert Sabourin, and Luiz S. Oliveira. "Meta-learning for fast classifier adaptation to new users of Signature Verification systems" https://doi.org/10.1109/TIFS.2019.2949425 (preprint)

License

The source code is released under the BSD 3-clause license. Note that the trained models used the GPDS dataset for training, which is restricted for non-commercial use.

Please do not contact me requesting access to any particular dataset. These requests should be addressed directly to the groups that collected them.