cockroachdb/cockroach

build: Better support for pre-SSE4.2 CPUs

ansarizafar opened this issue · 7 comments

I am still getting the same error that was mentioned in issue #14443.

@ansarizafar sorry to hear you're still hitting this error! As mentioned in #14443, the root cause is that our release binaries include RocksDB compiled with SSE4.2, which your machine seems not to support.

I'm going to investigate making our default behaviour better on older hardware. In the meantime, building from source should produce a binary that will run on your machine. Could you try that?

I think there is something relatively straightforward that can be done here: very early in the lifecycle of the process, we can detect whether the CPU supports SSE4.2 and exit with a detailed error message if it does not. The code to detect SSE4.2 support already exists in RocksDB's util/crc32c.cc.
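For illustration, here is a minimal sketch of that fail-fast check. It is not the RocksDB code from util/crc32c.cc; it assumes GCC or Clang on x86 and uses the `__builtin_cpu_supports` builtin. The helper names (`DetectSse42`, `CheckCpuRequirements`, `ExitIfUnsupportedCpu`) and the error wording are hypothetical.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: reports whether the running CPU advertises SSE4.2.
// On non-x86 targets (or unknown compilers) there is no SSE4.2 requirement
// to enforce, so this sketch simply returns true there.
bool DetectSse42() {
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
  return __builtin_cpu_supports("sse4.2") != 0;
#else
  return true;
#endif
}

// Hypothetical helper: returns an empty string if the CPU is usable,
// otherwise a human-readable explanation. Kept separate from the detection
// so the message logic can be tested on any machine.
std::string CheckCpuRequirements(bool has_sse42) {
  if (has_sse42) {
    return "";
  }
  return "this binary was built with SSE4.2 instructions, which this CPU "
         "does not support; building from source should produce a binary "
         "that runs on this machine";
}

// Intended to be called as the first statement of main(), before any code
// that might execute SSE4.2 instructions.
void ExitIfUnsupportedCpu() {
  const std::string err = CheckCpuRequirements(DetectSse42());
  if (!err.empty()) {
    std::fprintf(stderr, "error: %s\n", err.c_str());
    std::exit(1);
  }
}
```

One caveat with this approach: the check itself has to live in code that is not compiled with -msse4.2, otherwise the compiler is free to emit SSE4.2 instructions before the check runs.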

FWIW, if that RocksDB code worked as intended, we wouldn't have this problem, since we'd gracefully fall back to the slow, non-SSE4.2 CRC32 implementation. @tamird and I discussed this at some point, and the conclusion was that compiling all of RocksDB with -msse4.2 causes the autovectorizer to kick in and use SSE4.2 instructions outside of the fast CRC32 implementation, and thus outside of the cpuid check. I believe we'd need to move the fast implementations to their own translation unit, and compile only that file with -msse4.2.

Would we prefer to run on these old CPUs, albeit with a slow checksum? If so, we could upstream the above fix, rather than failing fast.

Yes, running on old CPUs, albeit with a slow checksum, would be preferable. I wonder if there is a performance hit to not compiling the rest of RocksDB with -msse4.2.

See also facebook/rocksdb#2488 for an upstream issue.

GCC has a -mcrc32 flag, which looks like it would enable the CRC32 functions for explicit use (guarded by the runtime check) without turning on auto-vectorization throughout the codebase.

The compiler is using SSE4.2 instructions elsewhere, so it must expect some benefit. Besides the CRC32 function, the rest of SSE4.2 appears to be optimized string-comparison instructions. It's unclear how much difference this makes (I also assume that Go is not doing this for the Go parts of our codebase). We could also try -msse3 to still get some auto-vectorization while setting the minimum bar lower.

Clang doesn't support -mcrc32, which is rather unfortunate. Not a dealbreaker, just something to be aware of.

What I think we want is to apply the target("sse4.2") function attribute to only the FastCRC function.
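A rough sketch of what that could look like, assuming a reasonably recent GCC or Clang on x86 (where `<nmmintrin.h>` can be included without -msse4.2 and the intrinsics are usable inside a function carrying the target attribute). This is not RocksDB's actual code: `FastCrc32c`, `SlowCrc32c` and `Crc32c` are hypothetical names, and the byte-at-a-time loops are deliberately simple rather than fast.

```cpp
#include <cstddef>
#include <cstdint>

#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
#include <nmmintrin.h>  // _mm_crc32_u8
#define HAVE_SSE42_TARGET 1
#endif

// Portable bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
// Slow, but uses no special instructions, so it runs on any CPU.
static uint32_t SlowCrc32c(uint32_t crc, const unsigned char* p, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    crc ^= p[i];
    for (int k = 0; k < 8; ++k) {
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return crc;
}

#ifdef HAVE_SSE42_TARGET
// Only this function is compiled with SSE4.2 enabled; the rest of the file
// is built with the default flags, so the compiler has no licence to emit
// SSE4.2 instructions outside of it.
static __attribute__((target("sse4.2"))) uint32_t FastCrc32c(
    uint32_t crc, const unsigned char* p, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    crc = _mm_crc32_u8(crc, p[i]);
  }
  return crc;
}
#endif

// Dispatches at runtime: hardware CRC32 when the CPU has SSE4.2,
// otherwise the portable fallback. Both paths give the same result.
uint32_t Crc32c(const void* data, size_t n) {
  const unsigned char* p = static_cast<const unsigned char*>(data);
  const uint32_t init = 0xFFFFFFFFu;
#ifdef HAVE_SSE42_TARGET
  if (__builtin_cpu_supports("sse4.2")) {
    return FastCrc32c(init, p, n) ^ 0xFFFFFFFFu;
  }
#endif
  return SlowCrc32c(init, p, n) ^ 0xFFFFFFFFu;
}
```

With this shape, the rest of the library could be compiled without -msse4.2 (or with a lower bar such as -msse3), and the runtime check would guard the only code that is allowed to use SSE4.2.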