theochem/PyCI

Collision Testing

Closed this issue · 4 comments

Collision Testing

Sorry to mix forums, but @msricher did some collision testing. To quote:

I tried a bunch of wave functions with 250GB of RAM on computecanada, couldn't generate anything that gives a hash collision.This is just using FullCI though. With HCI... you can try much larger spaces of Nbasis, Nalpha, Nbeta, and it could happen.

I'm curious exactly what you tested @msricher . The probability of a collision is (asymptotic for $n \ll 2^64$ but it should be good enough) $\frac{n_{CI}^2}{2^{65}}$, though that's for a random uniform hash and one could potentially make it so that it's less likely to occur for the CI coefficients (low excitation) that are most likely to appear.

My rough calculation (from the memory you requested) is that it must be about 10 billion CI coefficients, so I would have expected a collission to be extremely frequent since 1e10 is bigger than 2^32.

Can you share more about what you were testing, specifically, @msricher ?

I only tried a few things, with about 1e9 determinants. I just did (Nbasis, Nocc) = (32, 32//2), (33,33//2), (34,34//2), etc., and I was already using a lot of RAM, getting out-of-memory errors and changing the requested RAM, and repeating.... I was wondering if there was a smarter way to do it.

So with 1e9 determinants, the chance of a collision is <3%. It would take >400 trials (if my math is right) to have a 99% chance of observing a collision. So I'm not sure how many times you looked for the collision, but you probably need to try ~500 times to really know if it's a problem.

That said, if the collision rate is ~3%, it's a bit disappointing because it means that in a lot of computational studies, where >50 calculations are performed, a collision may occur. Of course, the most likely collision is between two determinants with very low coefficient, so it probably doesn't matter much....

Yeah, that's a fine way to explain it, I think. I will use this for the paper if that's ok.
The only way to mitigate this is to use GUGA.