A series of tests for CUDA memory operations.
This project contains four different programs:
- An ordinary CUDA program. It consists of a memory load, kernel execution, and a memory store.
- A stream-optimized program. It uses streams (2 by default) to hide the memory operations.
- A program that uses zero-copy memory.
- A program that uses unified memory. (Its result may be explained by the data being moved to the GPU once and never changed afterwards. Adding some CPU operations would give different results.)
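
The stream-optimized variant can be sketched roughly as follows. This is a minimal illustration and not the project's actual source: the `scale` kernel, the element count, and the chunking scheme are assumptions made for the example, and error checking is omitted for brevity. The idea is that each stream handles its own chunk, so the copies in one stream can overlap with the kernel in the other.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the real workload: doubles each element.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int kStreams = 2;           // the default stream count mentioned above
    const int n = 1 << 22;            // assumed total element count
    const int chunk = n / kStreams;   // elements processed by each stream
    const size_t chunkBytes = chunk * sizeof(float);

    float *host = nullptr, *dev = nullptr;
    // Pinned (page-locked) host memory lets cudaMemcpyAsync run truly asynchronously.
    cudaMallocHost(&host, n * sizeof(float));
    cudaMalloc(&dev, n * sizeof(float));
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    cudaStream_t streams[kStreams];
    for (int s = 0; s < kStreams; ++s) cudaStreamCreate(&streams[s]);

    // Each stream copies its chunk in, runs the kernel on it, and copies it back.
    for (int s = 0; s < kStreams; ++s) {
        int offset = s * chunk;
        cudaMemcpyAsync(dev + offset, host + offset, chunkBytes,
                        cudaMemcpyHostToDevice, streams[s]);
        scale<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(dev + offset, chunk);
        cudaMemcpyAsync(host + offset, dev + offset, chunkBytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();          // wait for all streams to finish

    printf("host[0] = %.1f, host[n-1] = %.1f\n", host[0], host[n - 1]);

    for (int s = 0; s < kStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFreeHost(host);
    cudaFree(dev);
    return 0;
}
```

How much overlap is actually achieved depends on the device's copy engines and on the ratio of transfer time to kernel time, so the speedup from this pattern will vary between GPUs.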
To compile all the programs:

    mkdir bin
    cd src
    make

To run the tests:

    make run

The results are listed below (measured on an RTX 3080 Laptop GPU):

| Program | Time |
|---|---|
| no stream | 61.27 + 69.79 + 60.58 = 191.64 ms |
| with stream | 110.16 ms |
| zero copy | 73.31 ms |
| unified | 68.73 ms |