emscripten-core/emscripten

_mm_cvttpd_epi32 emulation incorrectly rounds values near integer boundaries

Closed · 2 comments

Version of emscripten/emsdk:
4.0.8


Description and Reproduction

Emscripten’s emulation of _mm_cvttpd_epi32 returns the wrong result for inputs that lie just below an integer boundary.
For example, the following program should print 1022 1022 but instead prints 1023 1023:

// Compile with: emcc -msimd128 -msse3 repro.c
#include <emmintrin.h>
#include <stdio.h>

int main() {
  __m128d v = _mm_set1_pd(1022.99998194495);
  __m128i i = _mm_cvttpd_epi32(v);
  int out[4];
  _mm_storeu_si128((__m128i*)out, i);
  printf("%d %d\n", out[0], out[1]);
}

Expected output:

1022 1022

Actual output:

1023 1023

Root Cause

The current emulation converts each double lane to float before truncating. A float has only a 24-bit significand, so a double just below an integer can round up to that integer during the conversion.
In this case, 1022.99998194495 rounds to 1023.0f, so truncation yields 1023 instead of 1022.
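The rounding can be reproduced in plain C, without any SIMD. This is a minimal sketch: the helper names trunc_via_float and trunc_direct are made up for illustration and are not part of Emscripten.

```c
#include <math.h>

/* Hypothetical helper: truncate through an intermediate float,
   mimicking the lossy path described above. */
static int trunc_via_float(double x) {
  /* Floats in [512, 1024) are spaced 2^-14 (about 6.1e-5) apart, so a
     double closer than half that spacing to 1023 rounds to 1023.0f. */
  float f = (float)x;
  return (int)f; /* truncation toward zero, applied to the rounded value */
}

/* Hypothetical helper: truncate the double directly, keeping full precision. */
static int trunc_direct(double x) {
  return (int)x;
}
```

Calling both helpers with 1022.99998194495 shows the mismatch: the float path gives 1023, the direct path gives 1022.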


Suggested Fix

Avoid the lossy cast to float and use the f64 truncation builtin:

// Needs <math.h> for isfinite and <wasm_simd128.h> for wasm_i32x4_make.
static inline __m128i _mm_cvttpd_epi32(__m128d a) {
  int m[4] = {0, 0, 0, 0};
  for (int i = 0; i < 2; ++i) {
    double e = a[i];
    if (!isfinite(e) || e >= 2147483648.0 || e < -2147483648.0) {
      m[i] = (int)0x80000000; // match Intel semantics for NaN / out of range
    } else {
      m[i] = __builtin_wasm_trunc_s_i32_f64(e); // <-- f64 truncation, not f32
    }
  }
  // The upper two lanes are zeroed, as on x86.
  return wasm_i32x4_make(m[0], m[1], 0, 0);
}

Truncating directly from f64 keeps the full double precision, so 1022.99998194495 truncates to 1022 as expected.
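For testing the fix outside of WebAssembly, a portable scalar reference for a single lane may be useful. This is only a sketch: ref_cvttpd_lane is a hypothetical name, and it mirrors the range check from the suggested fix rather than anything in the Emscripten headers.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical scalar reference for one lane of _mm_cvttpd_epi32:
   truncate toward zero, or return 0x80000000 (INT32_MIN) for NaN,
   infinities, and values outside the int32 range. */
static int32_t ref_cvttpd_lane(double e) {
  if (!isfinite(e) || e >= 2147483648.0 || e < -2147483648.0) {
    return INT32_MIN; /* x86 "integer indefinite" value */
  }
  /* e is finite and within [-2^31, 2^31), so the cast is well defined
     and truncates toward zero. */
  return (int32_t)e;
}
```

Comparing each of the two low output lanes against this reference over a list of edge cases (values just below an integer, NaN, infinities, and the int32 limits) would catch regressions like the one reported here.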

juj commented

Thanks! Posted #25130

juj commented

In your example you dropped the (lrint(elem) != 0 || fabs(elem) < 2.0) check in favor of just an isfinite check. That looks like a possible optimization, though it is better suited to a separate PR. Looking at the interesting float values, further testing may be needed to confirm that it works.