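The best Bitonic sort implementation ever written

Parallel sorting can be far faster than sequential sorting when you have the processors to run it on. Within each stage of Bitonic sort, every compare-and-swap is independent of the others, so thousands of GPU threads can perform them at once. This implementation of Bitonic sort uses CUDA to achieve monstrous performance.

Test sample: sorting a list with 2^29 elements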
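The core of a CUDA Bitonic sort can be sketched as follows. This is a minimal illustration, not this repository's actual code. The names `bitonicStep` and `bitonicSortGPU` are hypothetical. The sketch assumes `int` keys and one thread per element, and it pads the input with `INT_MAX` up to the next power of two, because the classic bitonic network requires a power-of-two length. The host loops over stage size `k` and sub-stage distance `j`. Each kernel launch performs one layer of independent compare-and-swaps.

```cuda
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <limits>
#include <vector>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",          \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// One layer of the bitonic network (hypothetical name).
// k: size of the bitonic sequences being merged in this stage.
// j: distance between the two elements compared in this sub-stage.
__global__ void bitonicStep(int* data, unsigned int j, unsigned int k,
                            unsigned int n) {
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    unsigned int partner = i ^ j;       // element this thread is paired with
    if (partner <= i) return;           // only the lower index of each pair acts
    bool ascending = (i & k) == 0;      // direction of this k-sized block
    int a = data[i];
    int b = data[partner];
    if ((ascending && a > b) || (!ascending && a < b)) {
        data[i] = b;                    // swap the out-of-order pair
        data[partner] = a;
    }
}

// Sorts `values` ascending on the GPU (hypothetical helper).
void bitonicSortGPU(std::vector<int>& values) {
    if (values.size() < 2) return;

    // Pad to the next power of two; INT_MAX padding sorts to the end.
    unsigned int n = 1;
    while (n < values.size()) n <<= 1;
    std::vector<int> padded(values);
    padded.resize(n, std::numeric_limits<int>::max());

    int* d = nullptr;
    CUDA_CHECK(cudaMalloc(&d, n * sizeof(int)));
    CUDA_CHECK(cudaMemcpy(d, padded.data(), n * sizeof(int),
                          cudaMemcpyHostToDevice));

    const unsigned int threads = 256;
    const unsigned int blocks = (n + threads - 1) / threads;
    // Launches on the default stream run in order, so each layer
    // sees the results of the previous one.
    for (unsigned int k = 2; k <= n; k <<= 1)
        for (unsigned int j = k >> 1; j > 0; j >>= 1)
            bitonicStep<<<blocks, threads>>>(d, j, k, n);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());

    CUDA_CHECK(cudaMemcpy(padded.data(), d, n * sizeof(int),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d));

    // Drop the padding: the original elements occupy the first slots.
    std::copy_n(padded.begin(), values.size(), values.begin());
}
```

For the 2^29-element test case, the 32-bit keys alone occupy 2^29 × 4 bytes = 2 GiB of device memory. The sketch issues log2(n)·(log2(n)+1)/2 = 29·30/2 = 435 kernel launches. A tuned implementation would typically fuse the small-`j` sub-stages into a single launch that works in shared memory, which reduces global-memory traffic.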