[ICAPR 2017] Image Hash Minimization for Tamper Detection

Primary language: MATLAB. License: GNU General Public License v3.0 (GPL-3.0).

This is the official implementation of the paper "Image Hash Minimization for Tamper Detection" by S. Maity and R. K. Karsh, published at ICAPR 2017.

📌 Requirements

📌 Guidelines to Use

📌 FAQ

📌 Citation

Methodological Flow

Sample Qualitative Depiction

Quantitative Performance Measures

Measure | Pun et al. | Ours
Hash length | 634 digits | 64 bits
Robustness against noise and compression | Yes | Yes
Detection accuracy | 60% | Approximately 77%

🚀 Requirements

  • MathWorks MATLAB R2016b or later

📝 Guidelines to Use

Dataset

Running the Scripts

Dataset

  • To test the accuracy of our model, we used the CASIA 2.0 dataset, which is no longer available from its official source. However, the official dataset and a correctly annotated version can be downloaded from here.
  • The dataset we curated, comprising 200 tampered images with a tampered area below 5%, is private and not available for use.
  • The dataset should be extracted into the following structure:
├── dataset                      # Dataset root directory
   ├── CASIAv2                   # CASIAv2.0 dataset root directory
      ├── original               # Directory containing original images
      |  ├── 1.jpg
      |  ├── 2.jpg
      |  ├── ...
      |
      └── tampered               # Directory containing tampered images
         ├── (1).jpg
         ├── (2).jpg
         ├── ...
  • Images in the 'original' directory are named <image_number>.jpg, and images in the 'tampered' directory are named (<image_number>).jpg, so that each original image and its tampered counterpart share the same number. The image numbers must be consecutive, with no gaps.
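The layout and naming rules above can be checked before running any script. The following is a minimal sketch, not part of the repository: the variable `missing` and the check itself are hypothetical, the path is the same placeholder used in the scripts, and the directory names are assumed to match the tree shown above.

```matlab
% Hypothetical pre-flight check (not part of the repository): confirm that
% every original/tampered pair 1..count exists under the expected layout.
dataset_path = 'path/to/dataset/root/CASIAv2/';   % placeholder, as in the scripts
count = 30;                                       % expected number of image pairs

missing = {};                                     % collects paths that were not found
for i = 1:count
    orig_file = fullfile(dataset_path, 'original', sprintf('%d.jpg', i));
    tamp_file = fullfile(dataset_path, 'tampered', sprintf('(%d).jpg', i));
    if ~exist(orig_file, 'file')
        missing{end+1} = orig_file; %#ok<SAGROW>
    end
    if ~exist(tamp_file, 'file')
        missing{end+1} = tamp_file; %#ok<SAGROW>
    end
end

if isempty(missing)
    fprintf('All %d image pairs found.\n', count);
else
    fprintf('Missing file: %s\n', missing{:});    % one line per missing path
end
```

A gap in the numbering shows up as a missing file, which makes it easy to spot before the scripts fail partway through the dataset.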

Running the Scripts

  1. Open the codes directory in MATLAB.
  2. Set the path and hyper-parameters in data_from_original.m and data_from_tampered.m. To reproduce our process, keep K=1, since we used a single cluster to determine the deviation of the centroid. The threshold thres, which sets the minimum strength of the SURF features detected in the images, needs to be tuned to the dataset. Set dataset_path to the CASIAv2.0 root directory and count to the total number of original and tampered image pairs.
count = 30;                                               % number of samples <n> in dataset
K = 1;                                                    % setting the number of clusters to be formed
thres = 1000;                                             % setting the threshold for SURF feature strength
dataset_path = 'path/to/dataset/root/CASIAv2/';           % setting the dataset path
maxiter_k = 1000000;                                      % setting up the maximum iterations for clustering
  3. Run the data_from_original.m script and make sure that the centroids are saved as centers_original.mat in the codes directory. The script visualizes the SURF features extracted from each original image.
  4. Run the data_from_tampered.m script and make sure that the centroids are saved as centers_tampered.mat in the codes directory. The script visualizes the SURF features extracted from each tampered image.
  5. Set the relevant parameters in tampered.m. As in data_from_original.m and data_from_tampered.m, set count to the total number of original and tampered image pairs and keep K=1 to follow the method described in the paper.
count = 30;                                               % number of samples <n> in dataset
K = 1;                                                    % setting the number of clusters to be formed
  6. Run the tampered.m script. It prints a tampered or not-tampered status for each sample in the dataset and saves the Euclidean distance matrix to a file named distance.mat, where NaN marks images that are not tampered.
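To make the idea behind these steps concrete, here is a rough sketch of the centroid comparison for a single image pair. It is not the repository's code. It assumes that the (x, y) locations of the SURF points are what gets clustered and that a pair is flagged when its centroid moves by more than a hypothetical tolerance `tol`. The helper `surf_centroid` is also hypothetical. It needs the Computer Vision Toolbox (`detectSURFFeatures`) and the Statistics and Machine Learning Toolbox (`kmeans`).

```matlab
% Illustrative sketch only. Assumptions: SURF point locations are clustered,
% and a pair is flagged when the centroid shifts by more than `tol`.
dataset_path = 'path/to/dataset/root/CASIAv2/';   % placeholder, as in the scripts
K = 1;                 % single cluster, as in the paper setting
thres = 1000;          % SURF feature strength threshold
maxiter_k = 1000000;   % maximum k-means iterations
tol = 0;               % hypothetical decision tolerance
i = 1;                 % image pair to inspect

img_o = imread(fullfile(dataset_path, 'original', sprintf('%d.jpg', i)));
img_t = imread(fullfile(dataset_path, 'tampered', sprintf('(%d).jpg', i)));

c_o = surf_centroid(img_o, K, thres, maxiter_k);
c_t = surf_centroid(img_t, K, thres, maxiter_k);

d = norm(c_o - c_t);   % Euclidean distance between the centroids (valid for K = 1)
if isnan(d)
    fprintf('Pair %d: too few SURF features to decide\n', i);
elseif d > tol
    fprintf('Pair %d: tampered (distance %.4f)\n', i, d);
else
    fprintf('Pair %d: not tampered\n', i);
end

function C = surf_centroid(img, K, thres, maxiter_k)
    % detectSURFFeatures expects a 2-D grayscale image.
    if size(img, 3) == 3
        img = rgb2gray(img);
    end
    points = detectSURFFeatures(img, 'MetricThreshold', thres);
    if points.Count < K
        C = NaN(K, 2);   % not enough features to form K clusters
        return;
    end
    % Cluster the (x, y) locations; each row of C is one centroid.
    [~, C] = kmeans(double(points.Location), K, ...
                    'Start', 'plus', 'MaxIter', maxiter_k);
end
```

Raising `thres` keeps only the strongest features, so the centroid becomes less sensitive to noise but also to small tampered regions. That trade-off is why the threshold has to be tuned per dataset.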

🔍 FAQ

  • The initial seed for k-means clustering is chosen by the k-means++ algorithm. It can also be chosen at random.
  • Different seeds, whether from k-means++ or random selection, may cause minor deviations from the reported accuracy.
  • We recommend k-means++, as it generates more stable seeds than random selection.
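The two seeding strategies map onto the 'Start' option of MATLAB's `kmeans` (Statistics and Machine Learning Toolbox). The sketch below uses synthetic points as a stand-in for feature data. The data, `K = 2`, and the fixed random stream are assumptions chosen for illustration, not settings taken from the repository.

```matlab
% Sketch on synthetic data: compare k-means++ seeding with random seeding.
rng(0);                                    % fix the random stream so runs are repeatable
X = [randn(50, 2); randn(50, 2) + 5];      % two synthetic 2-D point clouds
K = 2;

[~, C_plus]   = kmeans(X, K, 'Start', 'plus');    % k-means++ seeding
[~, C_sample] = kmeans(X, K, 'Start', 'sample');  % K observations picked at random

disp(sortrows(C_plus));                    % sorted so the cluster order is comparable
disp(sortrows(C_sample));
```

Seeding only affects the result when K > 1. With K = 1 the single centroid is simply the mean of the points. Calling `rng` with a fixed value before clustering is one way to make a run repeatable.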

BibTeX

If you use our code for your research, please cite our paper. Many thanks!

@inproceedings{maity2017image,
title={Image Hash Minimization for Tamper Detection},
author={Maity, Subhajit and Karsh, Ram Kumar},
booktitle={Ninth International Conference on Advances in Pattern Recognition (ICAPR)},
year={2017}}