JeffreySarnoff/SortingNetworks.jl

Restricted to Int64 / native Int?

Jutho opened this issue · 8 comments

Jutho commented

This looks interesting and a nice addition to TupleTools.jl. However, it seems that minmax is only fast for Int64. Comparing to the naive O(N^2) algorithm in TupleTools with

for T in (Int16,Int32,Int64)
for n=2:12
       println("sorting Tuple with $(n) elements of type $(T)")
       t=(collect(T(1):T(n))...,)
       println("TupleTools.sort:")
       @btime TupleTools.sort($t)
       println("SortingNetworks.swapsort:")
       @btime SortingNetworks.swapsort($t)
       println("---------------------------------")
       end
       end;

I obtain

sorting Tuple with 2 elements of type Int16
TupleTools.sort:
  5.622 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  5.821 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 3 elements of type Int16
TupleTools.sort:
  12.541 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  15.323 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 4 elements of type Int16
TupleTools.sort:
  16.996 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  24.519 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 5 elements of type Int16
TupleTools.sort:
  26.470 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  42.479 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 6 elements of type Int16
TupleTools.sort:
  29.567 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  53.294 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 7 elements of type Int16
TupleTools.sort:
  34.843 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  68.078 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 8 elements of type Int16
TupleTools.sort:
  38.545 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  79.180 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 9 elements of type Int16
TupleTools.sort:
  44.858 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  101.845 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 10 elements of type Int16
TupleTools.sort:
  52.250 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  118.193 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 11 elements of type Int16
TupleTools.sort:
  56.757 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  148.154 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 12 elements of type Int16
TupleTools.sort:
  66.788 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  155.918 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 2 elements of type Int32
TupleTools.sort:
  5.299 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  5.821 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 3 elements of type Int32
TupleTools.sort:
  12.822 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  20.338 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 4 elements of type Int32
TupleTools.sort:
  16.100 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  27.865 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 5 elements of type Int32
TupleTools.sort:
  20.060 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  45.119 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 6 elements of type Int32
TupleTools.sort:
  24.549 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  53.295 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 7 elements of type Int32
TupleTools.sort:
  28.785 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  68.606 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 8 elements of type Int32
TupleTools.sort:
  34.062 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  80.461 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 9 elements of type Int32
TupleTools.sort:
  39.646 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  102.902 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 10 elements of type Int32
TupleTools.sort:
  49.592 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  119.766 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 11 elements of type Int32
TupleTools.sort:
  55.696 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  141.927 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 12 elements of type Int32
TupleTools.sort:
  64.977 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  158.029 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 2 elements of type Int64
TupleTools.sort:
  5.560 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  3.090 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 3 elements of type Int64
TupleTools.sort:
  8.374 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  3.928 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 4 elements of type Int64
TupleTools.sort:
  11.884 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  4.513 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 5 elements of type Int64
TupleTools.sort:
  16.730 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  6.153 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 6 elements of type Int64
TupleTools.sort:
  21.399 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  7.868 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 7 elements of type Int64
TupleTools.sort:
  26.948 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  10.326 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 8 elements of type Int64
TupleTools.sort:
  33.021 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  10.398 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 9 elements of type Int64
TupleTools.sort:
  38.572 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  18.921 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 10 elements of type Int64
TupleTools.sort:
  46.221 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  20.364 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 11 elements of type Int64
TupleTools.sort:
  54.407 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  22.488 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 12 elements of type Int64
TupleTools.sort:
  64.721 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  25.938 ns (0 allocations: 0 bytes)
---------------------------------

That is unexpected. Looking at the code generated for minmax with Int64 and Int32, pairwise they appear to be equally properly handled. ```julia
julia> code_native(minmax, (Int64, Int64))
.text
; Function minmax {
; Location: promotion.jl:438
; Function <; {
; Location: promotion.jl:438
cmpq %rsi, %rdx
;}
jge L16
movq %rdx, (%rdi)
movq %rsi, 8(%rdi)
movq %rdi, %rax
retq
L16:
movq %rsi, (%rdi)
movq %rdx, 8(%rdi)
movq %rdi, %rax
retq
nopl (%rax,%rax)
;}

julia> code_native(minmax, (Int32, Int32))
.text
; Function minmax {
; Location: promotion.jl:438
; Function <; {
; Location: promotion.jl:438
cmpl %edi, %esi
;}
jge L9
movl %esi, %eax
movl %edi, %edx
retq
L9:
movl %edi, %eax
movl %esi, %edx
retq
nop
;}

--- my code is good --- I obtain

julia> using TupleTools

julia> for T in (Int16,Int32,Int64)
       for n=2:12
              println("sorting Tuple with $(n) elements of type $(T)")
              t=(collect(T(1):T(n))...,)
              println("TupleTools.sort:")
              @btime TupleTools.sort($t)
              println("SortingNetworks.swapsort:")
              @btime SortingNetworks.swapsort($t)
              println("---------------------------------")
              end
              end;
sorting Tuple with 2 elements of type Int16
TupleTools.sort:
  2.407 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  1.688 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 3 elements of type Int16
TupleTools.sort:
  6.983 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  1.694 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 4 elements of type Int16
TupleTools.sort:
  13.927 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  2.523 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 5 elements of type Int16
TupleTools.sort:
  24.247 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  3.920 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 6 elements of type Int16
TupleTools.sort:
  34.275 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  5.023 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 7 elements of type Int16
TupleTools.sort:
  45.483 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  6.424 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 8 elements of type Int16
TupleTools.sort:
  58.392 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  7.555 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 9 elements of type Int16
TupleTools.sort:
  71.382 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  9.508 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 10 elements of type Int16
TupleTools.sort:
  86.730 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  10.891 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 11 elements of type Int16
TupleTools.sort:
  103.827 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  12.857 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 12 elements of type Int16
TupleTools.sort:
  125.794 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  14.519 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 2 elements of type Int32
TupleTools.sort:
  3.625 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  1.495 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 3 elements of type Int32
TupleTools.sort:
  7.263 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  1.970 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 4 elements of type Int32
TupleTools.sort:
  16.998 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  2.521 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 5 elements of type Int32
TupleTools.sort:
  26.469 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  3.923 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 6 elements of type Int32
TupleTools.sort:
  36.495 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  5.025 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 7 elements of type Int32
TupleTools.sort:
  47.348 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  6.442 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 8 elements of type Int32
TupleTools.sort:
  59.923 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  7.557 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 9 elements of type Int32
TupleTools.sort:
  74.566 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  9.498 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 10 elements of type Int32
TupleTools.sort:
  89.782 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  10.906 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 11 elements of type Int32
TupleTools.sort:
  106.034 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  12.855 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 12 elements of type Int32
TupleTools.sort:
  128.897 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  14.546 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 2 elements of type Int64
TupleTools.sort:
  2.526 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  1.788 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 3 elements of type Int64
TupleTools.sort:
  10.863 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  1.964 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 4 elements of type Int64
TupleTools.sort:
  19.773 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  2.534 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 5 elements of type Int64
TupleTools.sort:
  30.093 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  3.929 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 6 elements of type Int64
TupleTools.sort:
  40.111 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  5.024 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 7 elements of type Int64
TupleTools.sort:
  51.245 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  6.447 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 8 elements of type Int64
TupleTools.sort:
  64.075 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  7.545 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 9 elements of type Int64
TupleTools.sort:
  77.901 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  9.498 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 10 elements of type Int64
TupleTools.sort:
  93.856 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  10.907 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 11 elements of type Int64
TupleTools.sort:
  110.873 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  12.903 ns (0 allocations: 0 bytes)
---------------------------------
sorting Tuple with 12 elements of type Int64
TupleTools.sort:
  135.839 ns (0 allocations: 0 bytes)
SortingNetworks.swapsort:
  14.548 ns (0 allocations: 0 bytes)
---------------------------------

julia> 

what OS, what is sizeof(Int), which version of Julia are using

Jutho commented

That's weird, and in your case TupleTools sort seems to be performing worse than on my system.

My versioninfo() is

Julia Version 0.6.3
Commit d55cadc350 (2018-05-28 20:20 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, skylake)

I have not used v0.6x for a long time. If you can do so without overwriting your own setup, try downloading the prebuilt v0.7-alpha and rerun the timing using that -- there are a number of important changes low level. And if there is still such a discrepancy -- I will raise an issue with the dev team.

julia> versioninfo()
Julia Version 0.7.0-alpha.0
Commit 22590d5* (2018-05-31 00:07 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: Intel(R) Core(TM) i7-6820HK CPU @ 2.70GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.0 (ORCJIT, skylake)

The easiest way to temp install is just to download it (unzip) and run it from the download directory [if that works with the Mac] then use Pkg.add() and ignore all the warnings. You can erase the local download. Some new dirs will remain wherever Julia keeps its pkgs. That should not matter at all when using v0.6x and should not interfere with installing v0.7.0 when that happens. You can erase those extra dirs too -- I will post them here later today.

Jutho commented

Indeed, everything seems to work fine on julia 0.7.