curtisseizert/CUDASieve

-bs Block size command line option


I am running Ubuntu 18.04 LTS and downloaded the binary provided for CUDASieve. Even though I pass -bs 1024 as an option, it keeps using 16 kb as the sieve size. Is this a bug, or is it intended? My GPU is a GTX 1080.

This option only changes the size of the large sieve. The sieve uses on-chip shared memory (similar to L1 cache) for most of its writes (those for primes < 2^20, I think). On-chip memory is only 128 kb per SM on the 1080, and each block is allowed only 64 kb of shared memory, so the small sieve size is limited to that amount.
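To illustrate why a small, fixed-size sieve buffer is enough, here is a minimal host-side C++ sketch of a segmented sieve. It is not CUDASieve's kernel: the function name `count_primes_segmented` and the `segment_bytes` parameter are hypothetical, and it uses one byte per candidate for clarity rather than whatever packing the real code uses. The segment buffer plays the role that shared memory plays on the GPU: it is reused for every window of the range, so its size stays constant no matter how large the range is.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Count primes in [2, limit] with a segmented sieve whose working buffer
// holds segment_bytes candidates (one byte each, for clarity).
// Hypothetical illustration only; not taken from CUDASieve.
std::uint64_t count_primes_segmented(std::uint64_t limit, std::size_t segment_bytes) {
    if (limit < 2 || segment_bytes == 0) return 0;

    // floor(sqrt(limit)): only primes up to this bound are needed.
    std::uint64_t root = 1;
    while ((root + 1) * (root + 1) <= limit) ++root;

    // Plain sieve of Eratosthenes for the base (sieving) primes.
    std::vector<char> is_composite(root + 1, 0);
    std::vector<std::uint64_t> base_primes;
    for (std::uint64_t i = 2; i <= root; ++i) {
        if (is_composite[i]) continue;
        base_primes.push_back(i);
        for (std::uint64_t j = i * i; j <= root; j += i) is_composite[j] = 1;
    }

    // One reusable buffer, analogous to a block's shared memory.
    std::vector<char> segment(segment_bytes);
    std::uint64_t count = 0;
    for (std::uint64_t low = 2; low <= limit; low += segment_bytes) {
        std::uint64_t high = std::min<std::uint64_t>(low + segment_bytes - 1, limit);
        std::fill(segment.begin(), segment.end(), 0);
        for (std::uint64_t p : base_primes) {
            if (p * p > high) break;
            // First multiple of p inside the window, but never below p*p.
            std::uint64_t start = std::max(p * p, ((low + p - 1) / p) * p);
            for (std::uint64_t m = start; m <= high; m += p) segment[m - low] = 1;
        }
        for (std::uint64_t n = low; n <= high; ++n) {
            if (!segment[n - low]) ++count;
        }
    }
    return count;
}
```

Calling `count_primes_segmented(1000000, 16 * 1024)` and `count_primes_segmented(1000000, 2 * 1024)` gives the same count; only the number of windows, and therefore the speed, changes with the buffer size.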

Above 2^40, primes larger than 2^20 are required to run the sieve, since sieving up to N requires primes up to sqrt(N). For this operation, the shared memory for each block is copied to a larger array. This array is designed to fit in L2 cache, of which there is 2048 kb on the GTX 1080. A somewhat more complicated mechanism is used to sieve with the larger primes, but for speed the array needs to fit into L2 cache, which the programmer cannot explicitly control. I have found 1024 kb to be the optimal value here. At 2048 kb, I think the array is moved into global memory. On my machine, a certain range took 32 ms at 1024 kb, 53 ms at 2048 kb, and 79 ms at 4096 kb. Reducing the block size to 512 kb also slowed the operation down, to 53 ms.
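The 2^40 threshold follows from the fact that sieving up to N only needs primes up to sqrt(N). A small C++ sketch of that arithmetic is below. The names `isqrt`, `kSmallPrimeLimit`, and `needs_large_primes` are hypothetical, and the 2^20 cutoff is simply the figure quoted in this thread, not a constant read from the CUDASieve source.

```cpp
#include <cstdint>

// Integer floor(sqrt(n)) by binary search, avoiding floating-point rounding.
std::uint64_t isqrt(std::uint64_t n) {
    std::uint64_t lo = 0;
    std::uint64_t hi = 4294967295ULL;  // sqrt(2^64 - 1) is below 2^32
    while (lo < hi) {
        std::uint64_t mid = lo + (hi - lo + 1) / 2;  // always >= 1 here
        if (mid <= n / mid) {
            lo = mid;       // mid * mid <= n
        } else {
            hi = mid - 1;   // mid * mid > n
        }
    }
    return lo;
}

// Assumed cutoff between "small" and "large" sieving primes (from the thread).
constexpr std::uint64_t kSmallPrimeLimit = 1ULL << 20;

// True when sieving up to `upper` needs primes above the assumed cutoff.
bool needs_large_primes(std::uint64_t upper) {
    return isqrt(upper) > kSmallPrimeLimit;
}
```

With these definitions, `needs_large_primes(1ULL << 40)` is false and `needs_large_primes(1ULL << 42)` is true, which matches the point at which the larger array comes into play.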

Long story short, the small sieve size is hard-coded at 16 kb for ranges >= 2^20 and 2 kb for ranges < 2^20, values that came from some optimization experiments. You can change these values in the code if you want, but you will be limited by the hardware limit of 64 kb. The large sieve size is also optimized, but you can change it from the command line with -bs if you want.

Hope this helps.