This is the official implementation of the paper "Image Hash Minimization for Tamper Detection" by S. Maity and R. K. Karsh, published at ICAPR 2017.
Metric | Pun et al. | Ours |
---|---|---|
Hash Length | 634 digits | 64 bits |
Robustness against Noise & Compression | Yes | Yes |
Detection Accuracy | Approximately 60% | 77% |
- MathWorks MATLAB R2016b or later
✅ Dataset
- To test the accuracy of our model, we used the CASIA 2.0 dataset, which is no longer available from its official source. However, the official dataset as well as a correctly annotated version can be downloaded from here.
- The dataset we curated, containing 200 tampered images with a tampered area below 5%, is private and not available for use.
- The dataset should be extracted to have the following structure.
├── dataset                 # Dataset root directory
    ├── CASIAv2             # CASIAv2.0 dataset root directory
        ├── original        # Directory containing original images
        |   ├── 1.jpg
        |   ├── 2.jpg
        |   ├── ...
        |
        └── tampered        # Directory containing tampered images
            ├── (1).jpg
            ├── (2).jpg
            ├── ...
- For each original and tampered image pair, the image in the `original` directory is named `<image_number>.jpg` and the image in the `tampered` directory is named `(<image_number>).jpg`. The image numbers must be consecutive, without any gaps.
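A quick sanity check of the layout described above can save a failed run. The following is a minimal sketch and not part of the repository: it assumes the `original` and `tampered` folder names shown in the tree, reuses the `dataset_path` and `count` settings, and collects every expected file that is absent in a variable called `missing` (a name chosen here for illustration).

```matlab
% Assumed values; replace them with your own settings.
dataset_path = 'path/to/dataset/root/CASIAv2/';  % CASIAv2 root directory
count = 30;                                      % number of image pairs

missing = {};  % paths of expected files that were not found
for i = 1:count
    orig_file = fullfile(dataset_path, 'original', sprintf('%d.jpg', i));
    tamp_file = fullfile(dataset_path, 'tampered', sprintf('(%d).jpg', i));
    % exist(..., 'file') returns 2 when the path is an existing file
    if exist(orig_file, 'file') ~= 2
        missing{end+1} = orig_file; %#ok<SAGROW>
    end
    if exist(tamp_file, 'file') ~= 2
        missing{end+1} = tamp_file; %#ok<SAGROW>
    end
end

if isempty(missing)
    fprintf('All %d image pairs were found.\n', count);
else
    fprintf('%d expected files are missing, for example:\n  %s\n', ...
            numel(missing), missing{1});
end
```

If the check reports missing files, the numbering most likely has a gap or `count` is larger than the number of pairs on disk.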
- Open the `codes` directory in MATLAB.
- Set the path and hyper-parameters in `data_from_original.m` and `data_from_tampered.m`. To reproduce our process, set `K=1`, as we used a single cluster to determine the deviation of the centroid. The threshold `thres` for the strength of the SURF features detected in the images is a hyper-parameter that needs to be tuned for the dataset. Set `dataset_path` to the CASIAv2.0 root path and `count` to the total number of original and tampered image pairs.
count = 30; % number of samples <n> in dataset
K = 1; % setting the number of clusters to be formed
thres = 1000; % setting the threshold for SURF feature strength
dataset_path = 'path/to/dataset/root/CASIAv2/'; % setting the dataset path
maxiter_k = 1000000; % setting up the maximum iterations for clustering
- Run the `data_from_original.m` script and make sure that the centroids are saved as `centers_original.mat` in the `codes` directory. The script visualizes the SURF features extracted from each original image.
- Run the `data_from_tampered.m` script and make sure that the centroids are saved as `centers_tampered.mat` in the `codes` directory. The script visualizes the SURF features extracted from each tampered image.
- Set the relevant parameters in `tampered.m`. As in `data_from_original.m` and `data_from_tampered.m`, set `count` to the total number of original and tampered image pairs and `K=1` to reproduce the method described in the paper.
count = 30; % number of samples <n> in dataset
K = 1; % setting the number of clusters to be formed
- Run the `tampered.m` script. The script prints the tampered or not-tampered status for each sample in the dataset and saves the Euclidean distance matrix in a file named `distance.mat`, where `NaN` marks the images that are not tampered.
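Once `distance.mat` exists, the results can be summarized with a few lines. The sketch below is an illustration under stated assumptions: the name of the variable stored inside the file is not documented here, so the code reads the first variable it finds, and it writes a small stand-in file with made-up values to a temporary location so that it runs without the real output.

```matlab
% Stand-in for distance.mat with made-up values, written to a temporary
% location so the real output file is not overwritten. Remove this part
% and point mat_file at codes/distance.mat when using real results.
mat_file = fullfile(tempdir, 'distance_demo.mat');
distance = [12.4; NaN; 3.1; NaN; 27.9];   % hypothetical distances
save(mat_file, 'distance');

% Load whatever variable the file holds, without assuming its name.
S = load(mat_file);
names = fieldnames(S);
D = S.(names{1});

is_tampered = ~isnan(D(:));               % NaN marks "not tampered"
n_tampered = sum(is_tampered);
fprintf('%d of %d samples flagged as tampered (%.1f%%)\n', ...
        n_tampered, numel(is_tampered), ...
        100 * n_tampered / numel(is_tampered));
```

Reading the first field of the loaded struct keeps the snippet independent of the variable name chosen in `tampered.m`; if the file stores more than one variable, select the field explicitly.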
- The initial seeds for k-means clustering are chosen by the k-means++ algorithm. They can also be chosen at random.
- Different seeds, whether from k-means++ or from random initialization, may cause minor deviations from the reported accuracy.
- We recommend k-means++, as it generates more stable seeds than the random strategy.
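To make the seeding options concrete, here is a minimal sketch that assumes the `kmeans` function from the Statistics and Machine Learning Toolbox; the repository scripts may configure clustering differently. It runs on synthetic points rather than SURF features, and it uses `K = 2` and the hypothetical `max_iter` value purely for illustration. Fixing the random state with `rng` before each call is one way to make runs repeatable.

```matlab
% Synthetic 2-D points standing in for SURF feature locations (made-up data).
rng(0);                                   % fix the random stream for the demo data
X = [randn(50, 2); randn(50, 2) + 5];

K = 2;            % illustration only; the paper setting is K = 1
max_iter = 1000;  % hypothetical iteration cap

% k-means++ seeding ('plus'), run twice from the same random state.
rng(42);
[~, C_plus_a] = kmeans(X, K, 'Start', 'plus', 'MaxIter', max_iter);
rng(42);
[~, C_plus_b] = kmeans(X, K, 'Start', 'plus', 'MaxIter', max_iter);

% Random seeding ('sample' picks K observations at random as seeds).
rng(42);
[~, C_rand] = kmeans(X, K, 'Start', 'sample', 'MaxIter', max_iter);

disp(isequal(C_plus_a, C_plus_b));   % same random state, same centroids
disp(sortrows(C_plus_a));            % centroids from k-means++ seeding
disp(sortrows(C_rand));              % centroids from random seeding
```

Comparing the two centroid sets over several random states gives a rough sense of how much the seeding strategy affects the result on a given dataset.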
If you use our code for your research, please cite our paper. Many thanks!
@inproceedings{maity2017image,
title={Image Hash Minimization for Tamper Detection},
author={Maity, Subhajit and Karsh, Ram Kumar},
booktitle={Ninth International Conference on Advances in Pattern Recognition (ICAPR)},
year={2017}}