Collision Testing
Sorry to mix forums, but @msricher did some collision testing. To quote:
I tried a bunch of wave functions with 250 GB of RAM on computecanada, and couldn't generate anything that gives a hash collision. This is just using FullCI, though. With HCI... you can try much larger spaces of Nbasis, Nalpha, Nbeta, and it could happen.
I'm curious exactly what you tested @msricher . For n determinants hashed into a 64-bit space, the probability of a collision is (asymptotically, for n well below 2^64) P ≈ 1 - exp(-n(n-1)/2^65), which is roughly n^2/2^65 when n is small compared to 2^32.
My rough calculation (from the memory you requested) is that it must have been about 10 billion CI coefficients, so I would have expected collisions to be extremely frequent, since 1e10 is bigger than 2^32.
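For concreteness, here is a minimal sketch of that estimate. It assumes the determinant hash is 64 bits wide (the thread doesn't state the width explicitly; if it's narrower, collisions are correspondingly more likely):

```python
import math

def collision_prob(n, bits=64):
    """Birthday-bound probability that at least two of n uniformly random
    hash values drawn from a 2**bits space coincide (asymptotic form)."""
    return 1.0 - math.exp(-n * (n - 1) / 2.0 / 2.0**bits)

# Rough numbers for the determinant counts discussed above:
for n in (1e9, 1e10):
    print(f"n = {n:.0e}: P(at least one collision) ~ {collision_prob(n):.3f}")
```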
Can you share more about what you were testing, specifically, @msricher ?
I only tried a few things, with about 1e9 determinants. I just did (Nbasis, Nocc) = (32, 32//2), (33, 33//2), (34, 34//2), etc., and I was already using a lot of RAM, hitting out-of-memory errors, bumping up the requested RAM, and repeating.... I was wondering if there was a smarter way to do it.
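Maybe something like sampling occupations directly and hashing them, without running FullCI at all? A rough sketch of what I mean; hash_det here is just a stand-in for the actual determinant hash being tested, and I'm ignoring the alpha/beta structure for simplicity:

```python
import hashlib
import random

def hash_det(occ, bits=64):
    """Stand-in 64-bit hash of a determinant, given as a sorted tuple of
    occupied orbital indices.  Swap in the hash actually under test."""
    h = hashlib.blake2b(str(occ).encode("ascii"), digest_size=bits // 8)
    return int.from_bytes(h.digest(), "little")

def collision_search(nbasis, nocc, ndets, seed=0):
    """Sample ndets random occupations of nocc electrons in nbasis orbitals
    and return the first pair of distinct occupations whose hashes collide,
    or None if no collision is found."""
    rng = random.Random(seed)
    seen = {}  # hash -> one representative occupation
    for _ in range(ndets):
        occ = tuple(sorted(rng.sample(range(nbasis), nocc)))
        h = hash_det(occ)
        if h in seen and seen[h] != occ:
            return occ, seen[h]
        seen.setdefault(h, occ)
    return None

# e.g. a 10^6-sample probe of a (64, 32) space; expect None for a good 64-bit hash
print(collision_search(nbasis=64, nocc=32, ndets=10**6))
```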
So with 1e9 determinants, the chance of a collision is <3%. It would take >400 trials (if my math is right) to have a 99% chance of observing a collision. So I'm not sure how many times you looked for a collision, but you probably need to try ~500 times to really know whether it's a problem.
That said, if the collision rate is ~3%, it's a bit disappointing, because it means that in a lot of computational studies, where >50 calculations are performed, a collision may occur. Of course, the most likely collision is between two determinants with very low coefficients, so it probably doesn't matter much....
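For what it's worth, here's the arithmetic behind those estimates in one place (again assuming a 64-bit hash; the trial count moves around a lot depending on the per-run probability you plug in):

```python
import math

def collision_prob(n, bits=64):
    """Birthday-bound probability of at least one collision among n hashes
    drawn uniformly from a 2**bits space."""
    return 1.0 - math.exp(-n * (n - 1) / 2.0 / 2.0**bits)

def runs_needed(p_per_run, confidence=0.99):
    """Independent runs needed to see at least one collision with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_per_run))

p = collision_prob(1e9)           # per-run collision probability, a few percent
print(runs_needed(p))             # runs for a 99% chance; very sensitive to p
print(runs_needed(0.01))          # e.g. a 1% per-run rate needs ~460 runs
print(1.0 - (1.0 - p) ** 50)      # chance of >=1 collision across 50 calculations
```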
Yeah, that's a fine way to explain it, I think. I will use this for the paper if that's ok.
The only way to mitigate this is to use GUGA.