/hashfunc-comparison

Collision resistency non-cryptographic hash functions

Primary LanguagePythonGNU General Public License v3.0GPL-3.0

hashfunc-comparison

Test collision resistency non-cryptographic hash functions. This is the experience part of project for 2017 NTU CSIE Probability lecture.

Presentation part

We uses some optimizations to have xor8 hash behaves as good as djb2, even in less time.

$ ./hashfunc.py --hash xor8 data/studentID | ./barchart.py -g
$ ./hashfunc.py --hash djb2 data/studentID | ./barchart.py -g

# Optimization 1
$ ./hashfunc.py --hash xor8 --size 64 data/studentID | ./barchart.py -g

# Optimization 2
$ ./hashfunc.py --hash xor8 --size 64 --regroup data/studentID | ./tilechart.py -g

# Optimization 3
$ ./hashfunc.py --hash xor8 --size 64 --regroup --hist-eq=0,6 data/studentID | ./tilechart.py -g

Notice that the data is slightly transformed due to privacy, thus the result would be different from the slide above. But there’s same effect on each optimization.

Structure

This program can also be used to analyze other hash functions. hashfunc.py will read lines in file given as first argument, printing out a hashtable. barchart.py and tilechart.py will read the hashtable and show the responding chart.

  1. Options in hashfunc.py
    • -h, --hash: specify the hash function, sum8, xor8, djb2 is implemented
    • -s, --size: specify the bucket number in the hashtable
    • -r, --regroup: specify whether to use shift folding, suitable only for data/studentID now
    • -e, --hist-eq: specify the subscription for each line to perform histogram equilization
  2. Options in barchart.py, tilechart.py
    • -g: specify using pyplot to show the chart, instead of terminal

Environments

  • Python3 with numpy, matplotpy installed

References