cuckaroo/mean.cu: small variance in edge indexes count after Round(0)
xiongyw opened this issue · 2 comments
Hi, Tromp,
I ran cuda29 (of cuckaroo/mean.cu) on an NVIDIA 1080 Ti multiple times with the same nonce (e.g., -n 671), dumped the buckets and indexes after SeedA()/SeedB()/Round(0), and then used md5sum to check whether the data changes between runs. So far I have observed that:
- For SeedA(): the md5sums of the bucket contents differ (due to the nondeterministic thread execution order, I guess), while the index md5sums are exactly the same.
- For SeedB(): both the bucket content and index md5sums differ (again, I guess this is due to nondeterministic thread execution, plus zero padding during flushing that pads in a different number of null edges on each run).
- For Round(0): I was expecting that the bucket content md5sums could differ (because the edges' order may differ) but that the indexes would be identical (since there are no null edges). However, the index md5sums also differ. Comparing the index contents further, only a relatively small number of edge counts differ, and each difference is very small (most are just 1 edge more or less, and all seem to be within a single digit). I am just wondering why this is the case. Do you know the reason?
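The index comparison described above can be sketched roughly as follows. This is only a minimal illustration, not the dump code from the fork: it assumes each index dump has already been loaded into a vector holding one edge count per bucket, and the names compare_counts and CountDiff are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Summary of how two per-bucket count dumps differ.
struct CountDiff {
    std::size_t differing_buckets;  // buckets whose counts are not equal
    std::uint32_t max_abs_diff;     // largest |a[i] - b[i]| over all buckets
};

// Compare two index dumps of equal length (one edge count per bucket).
CountDiff compare_counts(const std::vector<std::uint32_t>& a,
                         const std::vector<std::uint32_t>& b) {
    assert(a.size() == b.size());
    CountDiff d{0, 0};
    for (std::size_t i = 0; i < a.size(); ++i) {
        // Unsigned-safe absolute difference.
        std::uint32_t diff = a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
        if (diff != 0) ++d.differing_buckets;
        if (diff > d.max_abs_diff) d.max_abs_diff = diff;
    }
    return d;
}
```

A summary like this says more than an md5sum mismatch alone, because it separates "a few counts off by one" from "the whole index is different".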
Btw, my dump/print code can be found in a fork.
Thanks!
I guess I found the reason: FLUSHA=16 and FLUSHB=8 are not big enough, so some edges may get lost when being stored into the tmp[][] array. If these two values are doubled, the problem is gone!
Correct; when too many threads access the same tmp row (in https://github.com/tromp/cuckoo/blob/master/src/cuckatoo/mean.cu#L120 for instance), multiple threads write to tmp[row][FLUSHA2-1], and all but one of those edges get lost.
This will also affect the bucket counts in all later trimming rounds.
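The loss mechanism can be illustrated with a small host-side model. This is a simplified sketch, not the actual kernel code: it assumes that each writer takes the next counter value for a row, clamped to the last slot, and that a burst of edges arrives before the row is flushed. The function name surviving_edges and its parameters are hypothetical; capacity stands in for the row width (FLUSHA2 in the reply above).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified model: `burst` edges are written to one row before any flush.
// Each write uses the next counter value clamped to capacity-1, so every
// overflowing write lands on the last slot and overwrites what was there.
// Returns how many distinct edges are still present in the row.
std::size_t surviving_edges(std::size_t burst, std::size_t capacity) {
    assert(capacity > 0);
    std::vector<std::int64_t> row(capacity, -1);  // -1 marks an empty slot
    std::size_t counter = 0;
    for (std::size_t edge = 0; edge < burst; ++edge) {
        std::size_t slot = std::min(counter++, capacity - 1);
        row[slot] = static_cast<std::int64_t>(edge);  // overwrite on overflow
    }
    return static_cast<std::size_t>(
        std::count_if(row.begin(), row.end(),
                      [](std::int64_t e) { return e >= 0; }));
}
```

In this model, surviving_edges(20, 16) returns 16, so 4 edges are lost, while surviving_edges(20, 32) returns 20. That matches the observation that doubling the flush sizes makes the differences disappear, at least for bursts that fit in the larger row.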