rigtorp/spartan

hash function for symbol key strings to ints

Closed this issue · 7 comments

to map from symbol name to an int (which then can be used to index an array)

i think this is the fastest way to map a symbol name to a unique int. i think its faster than a trie as well.

$ gperf -e '\n' -L C++ -7 -C -E -k '*,1,$' -m 6 symbols.txt > perfhash.cpp

the '-m 6' indicates 6 elements in list.csv

test it out yourself, then look at perfhash.cpp

it will create a perfect hash function

create symbols.txt with 6 symbol names in it like:
MSFT
AAPL
AA
ZG
T
A

I'm aware of perfect hashing. Have you tried benchmarking it? It's less robust (need to regenerate when symbols are added) so I doubt it's worthwhile.

thats true you do need to add when you add new symbols

On Fri, Nov 13, 2015 at 10:41 AM, Erik Rigtorp notifications@github.com
wrote:

I'm aware of perfect hashing. Have you tried benchmarking it? It's less
robust (need to regenerate when symbols are added) so I doubt it's
worthwhile.


Reply to this email directly or view it on GitHub
#2 (comment).

but the latency will be lower because its just pure arithmetic vs branching

On Fri, Nov 13, 2015 at 11:30 AM, Peter Krey peterjkrey@gmail.com wrote:

thats true you do need to add when you add new symbols

On Fri, Nov 13, 2015 at 10:41 AM, Erik Rigtorp notifications@github.com
wrote:

I'm aware of perfect hashing. Have you tried benchmarking it? It's less
robust (need to regenerate when symbols are added) so I doubt it's
worthwhile.


Reply to this email directly or view it on GitHub
#2 (comment).

With a open addressing hash table with linear probing (what I'm using) all
branches are predictable when there is no collision. You could use the
perfect hash as the integer finalizer and get zero collisions in the common
case, but robustness if new symbols are added.

On Fri, Nov 13, 2015 at 1:32 PM peterjkrey notifications@github.com wrote:

but the latency will be lower because its just pure arithmetic vs branching

On Fri, Nov 13, 2015 at 11:30 AM, Peter Krey peterjkrey@gmail.com wrote:

thats true you do need to add when you add new symbols

On Fri, Nov 13, 2015 at 10:41 AM, Erik Rigtorp <notifications@github.com

wrote:

I'm aware of perfect hashing. Have you tried benchmarking it? It's less
robust (need to regenerate when symbols are added) so I doubt it's
worthwhile.


Reply to this email directly or view it on GitHub
#2 (comment).


Reply to this email directly or view it on GitHub
#2 (comment).

Have you tried using a trie with each node a struct containing a letter and
the integer key if you have reached the final node

On Friday, November 13, 2015, Erik Rigtorp notifications@github.com wrote:

With a open addressing hash table with linear probing (what I'm using) all
branches are predictable when there is no collision. You could use the
perfect hash as the integer finalizer and get zero collisions in the common
case, but robustness if new symbols are added.

On Fri, Nov 13, 2015 at 1:32 PM peterjkrey <notifications@github.com
javascript:_e(%7B%7D,'cvml','notifications@github.com');> wrote:

but the latency will be lower because its just pure arithmetic vs
branching

On Fri, Nov 13, 2015 at 11:30 AM, Peter Krey <peterjkrey@gmail.com
javascript:_e(%7B%7D,'cvml','peterjkrey@gmail.com');> wrote:

thats true you do need to add when you add new symbols

On Fri, Nov 13, 2015 at 10:41 AM, Erik Rigtorp <
notifications@github.com
javascript:_e(%7B%7D,'cvml','notifications@github.com');

wrote:

I'm aware of perfect hashing. Have you tried benchmarking it? It's
less
robust (need to regenerate when symbols are added) so I doubt it's
worthwhile.


Reply to this email directly or view it on GitHub
#2 (comment).


Reply to this email directly or view it on GitHub
#2 (comment).


Reply to this email directly or view it on GitHub
#2 (comment).

No, I suspect it'll be bad. Lots of branch misspredictions and a cache
unfriendly memory access pattern.

On Fri, Nov 13, 2015 at 2:49 PM peterjkrey notifications@github.com wrote:

Have you tried using a trie with each node a struct containing a letter and
the integer key if you have reached the final node

On Friday, November 13, 2015, Erik Rigtorp notifications@github.com
wrote:

With a open addressing hash table with linear probing (what I'm using)
all
branches are predictable when there is no collision. You could use the
perfect hash as the integer finalizer and get zero collisions in the
common
case, but robustness if new symbols are added.

On Fri, Nov 13, 2015 at 1:32 PM peterjkrey <notifications@github.com
javascript:_e(%7B%7D,'cvml','notifications@github.com');> wrote:

but the latency will be lower because its just pure arithmetic vs
branching

On Fri, Nov 13, 2015 at 11:30 AM, Peter Krey <peterjkrey@gmail.com
javascript:_e(%7B%7D,'cvml','peterjkrey@gmail.com');> wrote:

thats true you do need to add when you add new symbols

On Fri, Nov 13, 2015 at 10:41 AM, Erik Rigtorp <
notifications@github.com
javascript:_e(%7B%7D,'cvml','notifications@github.com');

wrote:

I'm aware of perfect hashing. Have you tried benchmarking it? It's
less
robust (need to regenerate when symbols are added) so I doubt it's
worthwhile.


Reply to this email directly or view it on GitHub
<#2 (comment)
.


Reply to this email directly or view it on GitHub
#2 (comment).


Reply to this email directly or view it on GitHub
#2 (comment).


Reply to this email directly or view it on GitHub
#2 (comment).

You never need to add symbols in the middle of he day - I think the best
way to do it is to recompile the perfect hash function when there are
symbol changes

On Friday, November 13, 2015, Erik Rigtorp notifications@github.com wrote:

No, I suspect it'll be bad. Lots of branch misspredictions and a cache
unfriendly memory access pattern.

On Fri, Nov 13, 2015 at 2:49 PM peterjkrey <notifications@github.com
javascript:_e(%7B%7D,'cvml','notifications@github.com');> wrote:

Have you tried using a trie with each node a struct containing a letter
and
the integer key if you have reached the final node

On Friday, November 13, 2015, Erik Rigtorp <notifications@github.com
javascript:_e(%7B%7D,'cvml','notifications@github.com');>
wrote:

With a open addressing hash table with linear probing (what I'm using)
all
branches are predictable when there is no collision. You could use the
perfect hash as the integer finalizer and get zero collisions in the
common
case, but robustness if new symbols are added.

On Fri, Nov 13, 2015 at 1:32 PM peterjkrey <notifications@github.com
javascript:_e(%7B%7D,'cvml','notifications@github.com');
<javascript:_e(%7B%7D,'cvml','notifications@github.com
javascript:_e(%7B%7D,'cvml','notifications@github.com');');>> wrote:

but the latency will be lower because its just pure arithmetic vs
branching

On Fri, Nov 13, 2015 at 11:30 AM, Peter Krey <peterjkrey@gmail.com
javascript:_e(%7B%7D,'cvml','peterjkrey@gmail.com');
<javascript:_e(%7B%7D,'cvml','peterjkrey@gmail.com
javascript:_e(%7B%7D,'cvml','peterjkrey@gmail.com');');>> wrote:

thats true you do need to add when you add new symbols

On Fri, Nov 13, 2015 at 10:41 AM, Erik Rigtorp <
notifications@github.com
javascript:_e(%7B%7D,'cvml','notifications@github.com');
<javascript:_e(%7B%7D,'cvml','notifications@github.com
javascript:_e(%7B%7D,'cvml','notifications@github.com');');>

wrote:

I'm aware of perfect hashing. Have you tried benchmarking it? It's
less
robust (need to regenerate when symbols are added) so I doubt it's
worthwhile.


Reply to this email directly or view it on GitHub
<
#2 (comment)
.


Reply to this email directly or view it on GitHub
<#2 (comment)
.


Reply to this email directly or view it on GitHub
#2 (comment).


Reply to this email directly or view it on GitHub
#2 (comment).


Reply to this email directly or view it on GitHub
#2 (comment).