Tessil/robin-map

Adding ints with poor hash leads to explosive reserve sequence in some cases

Predelnik opened this issue · 1 comments

I run the following simple program. I do believe on libstdc++, wrapper could be replaced just with unsigned long long but I tested with VC++ so I had to write it.

#include "tsl/robin_set.h"

class wrapper
{
public:
  bool operator== (const wrapper &other) const { return x == other.x;}
  unsigned long long x;
};

namespace std
{
  template <>
  struct hash<wrapper>
  {
    size_t operator()(const wrapper &w) const noexcept { return w.x; }
  };
}

int main ()
{
  tsl::robin_set<wrapper> s;
  auto fp = fopen ("a.txt", "r");
  size_t x;
  while (fscanf (fp, "%llu", &x) == 1)
    {
      s.insert (wrapper {x});
      std::cout << s.bucket_count () << '\n';
    }
  fclose (fp);
}

Basically I add numbers from attached a.txt to robin_set

Print results at the end of execution are akin to the following:

8192
8192
8192
16384
16384
16384
32768
65536
131072
262144
524288
1048576
2097152
4194304
8388608
16777216
33554432
67108864
134217728
268435456
536870912

With program taking 6-8 GB of RAM in the end. I believe adding a single number to the end of the file results in one more 2x reserved memory increase.
We could observe this problem only after recent update so it probably has to do with some recent changes in robin_map. While I understand that this hash is probably poor it is still default in libstdc++ and it would be nice if such behavior would be fixed.

Edit:
Bisect shows that 73959d6 is where it started to happen

Unfortunately I don't think there're a lot of solutions without changing the hash function or the hash collision resolution algorithm itself. I could raise the DIST_FROM_IDEAL_BUCKET_LIMIT limit but the problem will remain as the size of distance_type is limited.

The identity hash is a really poor hash function, even more when you only insert even numbers as in the a.txtfile, and I would recommend to change it to something more robust as it would reduce the collisions and improve the performances.