NVIDIA/gdrcopy

A question about the AVX memcpy implementations

hongbilu opened this issue · 2 comments

For the memcpy_uncached_store_avx and memcpy_cached_store_avx implementations: when the source pointer is aligned, why does memcpy_uncached_store_avx copy 8 * sizeof(__m256d) bytes per loop iteration, while memcpy_cached_store_avx copies only 4 * sizeof(__m256d)? Is that an empirical value?
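To make the two unroll factors concrete, here is a minimal portable sketch, not the gdrcopy code itself. It assumes a hypothetical `chunk256` struct as a 32-byte stand-in for one `__m256d`, and a hypothetical helper `copy_unrolled` that uses plain `memcpy` where the real functions would use AVX loads and stores. It only shows how much data each loop iteration moves; it says nothing about which factor is faster.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one __m256d register: 4 doubles = 32 bytes. */
typedef struct { double lane[4]; } chunk256;

/*
 * Copy n bytes in blocks of `unroll` 32-byte chunks and return the number
 * of outer-loop iterations. n must be a multiple of the block size.
 * unroll = 8 models the 8 * sizeof(__m256d) loop (256 bytes per iteration),
 * unroll = 4 models the 4 * sizeof(__m256d) loop (128 bytes per iteration).
 */
static size_t copy_unrolled(void *dst, const void *src, size_t n, size_t unroll)
{
    const size_t block = unroll * sizeof(chunk256);
    assert(unroll > 0 && n % block == 0);

    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t iters = 0;

    for (size_t off = 0; off < n; off += block) {
        /* Inner loop: the part a hand-unrolled AVX version writes out explicitly. */
        for (size_t i = 0; i < unroll; ++i) {
            memcpy(d + off + i * sizeof(chunk256),
                   s + off + i * sizeof(chunk256),
                   sizeof(chunk256));
        }
        ++iters;
    }
    return iters;
}
```

For a 1024-byte buffer, an unroll of 8 takes 4 iterations and an unroll of 4 takes 8 iterations; both copy exactly the same bytes.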

I am trying to recall... no good reason I believe.

Thanks for the honest feedback.