facebookincubator/gloo

Use better default sum routines

pietern opened this issue · 1 comment

See https://github.com/pytorch/benchmark/blob/master/timing/cpp/benchmarks/avx_sum.cpp#L94 for a survey of different SIMD sum routines. Benchmarks indicate that sum_simple_128 is one of the fastest if AVX is available, per @cpuhrsch.
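For readers unfamiliar with this kind of routine, here is a minimal sketch of an AVX-based sum, assuming the routines in question reduce a float array to a single value. It is not `sum_simple_128` from the linked benchmark and not gloo's current code; the name `sum_floats_sketch` is hypothetical. It uses 8-wide AVX adds when the compiler targets AVX (e.g. `-mavx`) and falls back to a scalar loop otherwise.

```cpp
#include <cstddef>
#include <vector>
#ifdef __AVX__
#include <immintrin.h>
#endif

// Hypothetical illustration only; not the benchmarked sum_simple_128.
// Sums n floats, using 256-bit AVX adds when compiled with AVX support.
float sum_floats_sketch(const float* data, std::size_t n) {
  std::size_t i = 0;
  float total = 0.0f;
#ifdef __AVX__
  // Accumulate 8 floats per iteration in a single vector register.
  __m256 acc = _mm256_setzero_ps();
  for (; i + 8 <= n; i += 8) {
    acc = _mm256_add_ps(acc, _mm256_loadu_ps(data + i));
  }
  // Spill the 8 partial sums and add them together.
  alignas(32) float lanes[8];
  _mm256_store_ps(lanes, acc);
  for (float lane : lanes) {
    total += lane;
  }
#endif
  // Scalar tail (or the whole array when AVX is unavailable).
  for (; i < n; ++i) {
    total += data[i];
  }
  return total;
}
```

Note that the vectorized path adds elements in a different order than a plain loop, so floating-point results can differ slightly in the last bits for general inputs.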

No need for this.