Support hashtables with more than 2G elements
Closed this issue · 4 comments
Some users run Haskell on machines with over 1TB memory, it's not impossible to have over 2G elements of hashtable.
vector-hashtables/src/Data/Vector/Hashtables/Internal.hs
Lines 905 to 914 in 6a671f6
What's the principle behind the choice of prime numbers here? How can it be extended further, to cover entire Int64
range?
the principle is:
[3, 7 ... primeNumber_n, primeNumber_n+k > 1.2*primeNumber_n ...
so it can easily be extended given a sequence of prime numbers
Something like this
Prelude Math.NumberTheory.Primes Data.List> takeWhile (\x -> x < fromIntegral(maxBound :: Int)) . map unPrime $ iterate (\p -> nextPrime (ceiling $ 1.2 * fromIntegral (unPrime p))) (nextPrime 2050765853)
[2050765853,2460919037,2953102909,3543723499,4252468211,5102961869,6123554263,7348265123,8817918149,10581501797,12697802207,15237362659,18284835221,21941802293,26330162767,31596195347,37915434433,45498521329,54598225597,65517870827,78621445013,94345734019,113214880859,135857857043,163029428461,195635314181,234762377023,281714852449,338057822959,405669387577,486803265103,584163918127,700996701773,841196042177,1009435250623,1211322300767,1453586761009,1744304113229,2093164935877,2511797923093,3014157507727,3616989009287,4340386811153,5208464173387,6250157008087,7500188409719,9000226091663,10800271310027,12960325572053,15552390686477,18662868823783,22395442588543,26874531106253,32249437327559,38699324793071,46439189751707,55727027702123,66872433242599,80246919891143,96296303869429,115555564643329,138666677571997,166400013086401,199680015703691,239616018844471,287539222613371,345047067136049,414056480563273,496867776675943,596241332011217,715489598413487,858587518096247,1030305021715513,1236366026058641,1483639231270373,1780367077524493,2136440493029393,2563728591635279,3076474309962373,3691769171954869,4430123006345843,5316147607615027,6379377129138049,7655252554965709,9186303065958853,11023563679150639,13228276414980799,15873931697977019,19048718037572429,22858461645086941,27430153974104333,32916184768925261,39499421722710373,47399306067252449,56879167280702959,68255000736843607,81906000884212361,98287201061054833,117944641273265887,141533569527919211,169840283433503041,203808340120203719,244570008144244481,293484009773093401,352180811727712129,422616974073254567,507140368887905497,608568442665486697,730282131198584087,876338557438300843,1051606268925960971,1261927522711153157,1514313027253383839,1817175632704060721,2180610759244872743,2616732911093847187,3140079493312616483,3768095391975139873,4521714470370167831,5426057364444201011,6511268837333041219,7813522604799648773]
but probably fewer than that because each element is a bunch of words
the principle is:
[3, 7 ... primeNumber_n, primeNumber_n+k > 1.2*primeNumber_n ...
so it can easily be extended given a sequence of prime numbers
Thanks for the explanation! So fundamentally there is enough freedom to select a prime around +20%.
I have an idea to choose primes such that division by them (which is ubiquitous throughout all routines) can be replaced by wide multiplication and right shift, but have not worked out all details yet.