rust-lang/rfcs

Math support in core

japaric opened this issue · 36 comments

Background

Currently the core crate doesn't provide support for mathematical functions like sqrt or sin.
To do math in a #![no_std] program one has the following options:

  • Link to a C implementation of libm, i.e. libm.a. This is cumbersome as the programmer needs to
    obtain a compiled version of libm for their target, or compile libm themselves which implies a C
    cross toolchain when the target system and the build system are not the same architecture / OS.

  • Use a pure Rust implementation of libm, like the libm crate. On stable, (a) the performance of
    such an implementation won't be on par with a C implementation, or (b) to achieve the same
    performance the user would require a C (cross) toolchain.

To elaborate on (a) and (b). Consider the following contrived program that computes the square root
of a number:

#![no_std]

extern crate libm;

use core::ptr;

use libm::F32Ext;

#[no_mangle]
pub unsafe fn foo() {
    // volatile memory accesses to prevent the compiler from optimizing away everything
    let x: f32 = ptr::read_volatile(0x2000_0000 as *const _);
    let y = x.sqrt();
    ptr::write_volatile(0x2000_1000 as *mut _, y);
}

When compiled for the thumbv7em-none-eabihf target it produces the following machine code:

00000000 <foo>:
   0:   f04f 5000       mov.w   r0, #536870912  ; 0x20000000
   4:   ed90 0a00       vldr    s0, [r0]
   8:   ee10 1a10       vmov    r1, s0
   c:   f001 40ff       and.w   r0, r1, #2139095040     ; 0x7f800000
  10:   f1b0 4fff       cmp.w   r0, #2139095040 ; 0x7f800000
  14:   d108            bne.n   28 <foo+0x28>
  16:   ee00 0a00       vmla.f32        s0, s0, s0
  (..)
 2f4:   ed80 0a00       vstr    s0, [r0]
 2f8:   4770            bx      lr

This is extremely inefficient machine code because the target has a hardware FPU that supports
computing the square root in a single instruction. Ideally, the program should compile down to the
following machine code:

00000000 <foo>:
   0:   f04f 5000       mov.w   r0, #536870912  ; 0x20000000
   4:   ed90 0a00       vldr    s0, [r0]
   8:   f241 0000       movw    r0, #4096       ; 0x1000
   c:   f2c2 0000       movt    r0, #8192       ; 0x2000
  10:   eeb1 0ac0       vsqrt.f32       s0, s0
  14:   ed80 0a00       vstr    s0, [r0]
  18:   4770            bx      lr

If the target had access to the standard library the program would compile down to that machine code
because the implementation of f32.sqrt in std looks like this:

#![feature(core_intrinsics)]

use std::intrinsics;

impl f32 {
    fn sqrt(self) -> Self {
        intrinsics::sqrtf32(self)
    }
}

sqrtf32 is an unstable, thin wrapper around an LLVM intrinsic that either compiles down to a
hardware implementation of square root if the target architecture supports it in its instruction
set, or it produces a call to the sqrtf routine if it doesn't (*). std makes use of 30+ of such
LLVM intrinsics for performance of math functions.

(*) The llvm.sqrt.* LLVM intrinsic, which sqrtf32 wraps, is not quite specified like that but
that's the observable effect.

The libm crate can't make use of this intrinsic on stable because it's unstable and feature
gated. However, the libm crate could replicate the behavior of the sqrtf32 intrinsic using
conditional compilation and external assembly files as shown below:

// crate: libm

// NOTE heavily simplified because it ignores architectures other than ARM
impl F32Ext for f32 {
    #[cfg(target_arch = "arm")]
    fn sqrt(self) -> Self {
        extern "C" {
            // provided by an external assembly file
            fn vsqrt_f32(x: f32) -> f32;
        }

        unsafe { vsqrt_f32(self) }
    }

    #[cfg(not(target_arch = "arm"))]
    fn sqrt(self) -> Self {
        // software implementation
    }
}

But this would heavily complicate the implementation of the libm crate, which would likely
introduce bugs. Also, as it's not possible to use inline assembly (asm!) on stable the vsqrt.f32
instruction would have to be invoked via FFI and an external assembly file. External assembly files
mean that the user would require a C (cross) toolchain to build the crate negating the main benefit
of using a pure Rust implementation of libm.

Possible solutions

I see two options for improving the situation here:

a. We stabilize the family of sqrtf32 LLVM intrinsics. This way crates like libm can achieve the
performance of the std implementation on stable without requiring complex conditional
compilation and C toolchains. Or,

b. We move all the existing math support from std to core. For the user this means that e.g.
f32.sqrt will also work in #![no_std] programs.

Option (a) is kind of bad (maybe?) for alternative backends like cranelift as they would have to
support / implement these LLVM intrinsics to be on parity with the rustc+LLVM compiler.

Option (b) requires us (*) to provide an implementation of math functions (symbols) like sqrtf
for targets that do not link to libm by default. If we don't do this those targets will hit
"undefined reference to sqrtf" linker errors when using math methods like f32.sqrt.

(*) "us" as in: we must provide symbols like sqrtf in the compiler-builtins crate. Note that we
are already providing such symbols for the wasm32-unknown-unknown target, and we are
using the libm crate to do that.

If we go ahead with option (b) we must be careful to not provide the math symbols in
compiler-builtins for targets that are currently using system libm (e.g.
x86_64-unknown-linux-gnu). Because if we do provide the symbols then all existing programs will
start using the libm crate implementation instead of the system libm implementation -- this is due
to how we invoke the linker: libcompiler_builtins.rlib appears before -lm in the linker
arguments -- and that may degrade performance in some cases where system libm has architecture
optimized implementations of some functions.

With option (b) I believe that #![no_std] programs that are currently linking to some C
implementation of libm for math support will end up using the libm crate implementation as a side
effect. I don't see a way to avoid this: even if we mark the math symbols in compiler-builtins as
weak the way we invoke the linker will cause the program to use the libm crate implementation.

Final thoughts

IMO, math support should be in the core crate as it doesn't depend on OS, or I/O, abstractions
like other std-only API does (e.g. std::fs, std::net). Also, std makes math like sqrt feel built-in because the functionality is provided as inherent methods -- it feels weird that such "built-in" functionality is not available in #![no_std].


Thoughts? Should we do (a) or (b)? Or is there some other solution? Or should we leave math out of core?

cc @SimonSapin (T-libs), @jethrogb @Ericson2314 (T-portability), @joshtriplett @korken89 (some stakeholders)

Previous discussion of this: rust-lang/rust#50145

I don’t understand the difference between option (a) and (b), they seem to be effectively the same. Stabilizing an intrinsic is typically done by adding a stable (and safe) wrapper function. The existing support in std is a stable wrapper function/method.

feel built-in because the functionality is provided as inherent methods

For what it’s worth we have other precedent of inherent methods not being present in libcore, for example [T]::to_vec.

Option (a) is kind of bad (maybe?) for alternative backends like cranelift as they would have to
support / implement these LLVM intrinsics to be on parity with the rustc+LLVM compiler.

A cranelift backend for rustc already needs to gracefully handle a long list of rustc intrinsics. In this case always translating the rustc intrinsic to a sqrt call should be easy & functional, it wouldn't even have to impact cranelift code. (cc @eddyb)

@SimonSapin

I don’t understand the difference between option (a) and (b), they seem to be effectively the same.

They are pretty similar but they are not exactly the same. In option (b) core becomes the
provider of math functionality. With option (a) any crate can provide math functionality that's on
par with the performance of std (cf. sqrtf32 example).

Thinking about it some more I think option (a) will also run into the problems that option (b) has
with respect to shadowing system libm.

Also, I originally thought (but didn't comment above) that having option (a) would be good because the
ecosystem could grow architecture optimized versions of the libm crate and a user would be free to
pick one of those for their project. I don't think that would work well on stable because
architecture optimized versions would very likely involve external assembly files, which require a C
cross toolchain. Additionally, having a bunch of crates like libm-x86_64 and libm-arm-neon on
crates.io would probably lead to fragmentation. It seems better to have core as the only provider
of math functions and to have everyone contribute architecture optimized implementations to it --
core can make use of unstable features like asm! and global_asm! and it comes pre-compiled so
it doesn't add a dependency on a C toolchain.

TL;DR Option (b) (still) sounds best to me.

For what it’s worth we have other precedent of inherent methods not being present in libcore, for
example [T]::to_vec.

I think that's the only inherent method that involves a type that's not built into the language
(Vec), which makes it a bit of a special case / exception. I believe to_vec was turned into an
inherent method before the ToOwned trait was added to the standard library, and I think that
having ToOwned in the std prelude, and potentially in an alloc (pseudo) prelude, reduces or
eliminates the need for it.

@nagisa

I cc-ed T-portability people because they also look into issues that aim to close the gap between
standard programs and no_std programs like having io::{Read,Write} in core, and less so
because this is related to the portability lint.

I think math functions should not be cfg-ed away. Even if the target doesn't have a FPU, software
emulation of floats is always possible; 128-bit integer support is in a similar situation and we
don't cfg that today. (I know some targets have problems with LLVM bugs that may prevent core
compiling for them if it gained math functions; in that case we could temporarily cfg away math
support until the LLVM bug is fixed. I believe pure MIR rlibs would also sidestep the LLVM problem
and I would prefer that over using #[cfg])

@rkruppe

Right, other backends will have to port other Rust intrinsics as things stand today. Perhaps, 30
more intrinsics is not that much extra work.


Finally, this is a bit of speculation because I have not tested it but I think that putting the math
symbols / functions in core, rather than in compiler-builtins, may let the compiler perform
inlining -- the intrinsics / functions in compiler-builtins never get inlined, not even with LTO
enabled, because that crate is always a separate object file.

For what it’s worth we have other precedent of inherent methods not being present in libcore, for example [T]::to_vec.

I think that's the only inherent method that involves a type that's not built into the language
(Vec), which makes it a bit of a special case / exception.

Currently there are 21 inherent methods defined on primitive types outside of libcore because they involve Vec or String in their signature or in their implementation.

This doesn’t invalidate this issue, I think having math support in libcore would be good. I’m only saying it wouldn’t entirely remove the weirdness of inherent methods being "magically" added just by adding a dependency on a non-core standard library crate.

#[lang = "slice_alloc"]
impl<T> [T] {
    pub fn sort(&mut self) where T: Ord {}
    pub fn sort_by<F>(&mut self, mut compare: F) where F: FnMut(&T, &T) -> Ordering {}
    pub fn sort_by_key<K, F>(&mut self, mut f: F) where F: FnMut(&T) -> K, K: Ord {}
    pub fn sort_by_cached_key<K, F>(&mut self, f: F) where F: FnMut(&T) -> K, K: Ord {}
    pub fn to_vec(&self) -> Vec<T>  where T: Clone {}
    pub fn into_vec(self: Box<Self>) -> Vec<T> {}
    pub fn repeat(&self, n: usize) -> Vec<T> where T: Copy {}
}

#[lang = "slice_u8_alloc"]
impl [u8] {
    pub fn to_ascii_uppercase(&self) -> Vec<u8> {}
    pub fn to_ascii_lowercase(&self) -> Vec<u8> {}
}

#[lang = "str_alloc"]
impl str {
    pub fn into_boxed_bytes(self: Box<str>) -> Box<[u8]> {}
    pub fn replace<'a, P: Pattern<'a>>(&'a self, from: P, to: &str) -> String {}
    pub fn replacen<'a, P: Pattern<'a>>(&'a self, pat: P, to: &str, count: usize) -> String {}
    pub fn to_lowercase(&self) -> String {}
    pub fn to_uppercase(&self) -> String {}
    pub fn escape_debug(&self) -> String {}
    pub fn escape_default(&self) -> String {}
    pub fn escape_unicode(&self) -> String {}
    pub fn into_string(self: Box<str>) -> String {}
    pub fn repeat(&self, n: usize) -> String {}
    pub fn to_ascii_uppercase(&self) -> String {}
    pub fn to_ascii_lowercase(&self) -> String {}
}

Back on topic:

@japaric, I think I don’t quite understand how LLVM intrinsics, the libm crate, and system/toolchain-provided -lm C library all interact with each other, in your (a) and (b) options. In particular:

We stabilize the family of sqrtf32 LLVM intrinsics. This way crates like libm can achieve

Do you mean that the libm crate would call the intrinsic? Then what would provide the sqrtf symbol called by the intrinsic on targets that don’t have a dedicated instruction?

(Note that rust-lang/rust#27823 moved a number of f32 and f64 methods from libcore to libstd in order to make libcore not depend on a libm C library that needs to be provided separately.)

Regardless, I think this is largely independent of what user-facing API we want to stabilize, which at first I thought was what your (a) v.s. (b) was about:

  • Unsafe functions in the core::intrinsics modules, or
  • Safe inherent methods on the f32 and f64 types.

I think the latter API is obviously superior, assuming identical implementations.

@japaric is sqrtf32 the only intrinsic that needs to be stabilized or is that just an example and there's others for other math functions?

I do sort of agree with @japaric about the goal here of basically moving everything to libcore, but my main point of hesitation would be performance and accuracy of these intrinsics vs various libm implementations. @japaric would it be possible to collect some data about the efficiency of the various implementations in the Rust libm vs some native libm implementations? I'm less worried about things like sqrt which have intrinsics and can be optimized, but am more interested in things like trigonometric functions which don't have inherent compiler/architecture support. I think it'd also be interesting to check out the performance across platforms; it may be the case that the Rust libm is mega-fast on OSX (or something like that) but super slow on Linux.

@jethrogb It’s an example. This is what’s in src/libstd/f64.rs today. f32 is similar.

#[lang = "f64_runtime"]
impl f64 {
    pub fn floor(self) -> f64 {} // intrinsics::floorf64
    pub fn ceil(self) -> f64 {} // intrinsics::ceilf64
    pub fn round(self) -> f64 {} // intrinsics::roundf64
    pub fn trunc(self) -> f64 {} // intrinsics::truncf64
    pub fn abs(self) -> f64 {} // intrinsics::fabsf64
    pub fn signum(self) -> f64 {} // intrinsics::copysignf64
    pub fn mul_add(self, a: f64, b: f64) -> f64 {} // intrinsics::fmaf64
    pub fn powi(self, n: i32) -> f64 {} // intrinsics::powif64
    pub fn powf(self, n: f64) -> f64 {} // intrinsics::powf64
    pub fn sqrt(self) -> f64 {} // intrinsics::sqrtf64
    pub fn exp(self) -> f64 {} // intrinsics::expf64
    pub fn exp2(self) -> f64 {} // intrinsics::exp2f64
    pub fn ln(self) -> f64 {} // intrinsics::logf64
    pub fn log2(self) -> f64 {} // intrinsics::log2f64
    pub fn log10(self) -> f64 {} // intrinsics::log10f64
    pub fn sin(self) -> f64 {} // intrinsics::sinf64
    pub fn cos(self) -> f64 {} // intrinsics::cosf64

    pub fn abs_sub(self, other: f64) -> f64 {} // cmath::fdim
    pub fn cbrt(self) -> f64 {} // cmath::cbrt
    pub fn hypot(self, other: f64) -> f64 {} // cmath::hypot
    pub fn tan(self) -> f64 {} // cmath::tan
    pub fn asin(self) -> f64 {} // cmath::asin
    pub fn acos(self) -> f64 {} // cmath::acos
    pub fn atan(self) -> f64 {} // cmath::atan
    pub fn atan2(self, other: f64) -> f64 {} // cmath::atan2
    pub fn exp_m1(self) -> f64 {} // cmath::expm1
    pub fn ln_1p(self) -> f64 {} // cmath::log1p
    pub fn sinh(self) -> f64 {} // cmath::sinh
    pub fn cosh(self) -> f64 {} // cmath::cosh
    pub fn tanh(self) -> f64 {} // cmath::tanh

    // Based on other methods, but not directly on intrinsics or cmath
    pub fn log(self, base: f64) -> f64 {}
    pub fn fract(self) -> f64 {}
    pub fn div_euc(self, rhs: f64) -> f64 {}
    pub fn mod_euc(self, rhs: f64) -> f64 {}
    pub fn sin_cos(self) -> (f64, f64) {}
    pub fn asinh(self) -> f64 {}
    pub fn acosh(self) -> f64 {}
    pub fn atanh(self) -> f64 {}
}

Where cmath contains #[link_name = "m"] extern {…} bindings, and Rust intrinsics map to LLVM intrinsics:

$ grep "llvm.*f64" src/librustc_codegen_llvm/intrinsic.rs
        "sqrtf64" => "llvm.sqrt.f64",
        "powif64" => "llvm.powi.f64",
        "sinf64" => "llvm.sin.f64",
        "cosf64" => "llvm.cos.f64",
        "powf64" => "llvm.pow.f64",
        "expf64" => "llvm.exp.f64",
        "exp2f64" => "llvm.exp2.f64",
        "logf64" => "llvm.log.f64",
        "log10f64" => "llvm.log10.f64",
        "log2f64" => "llvm.log2.f64",
        "fmaf64" => "llvm.fma.f64",
        "fabsf64" => "llvm.fabs.f64",
        "copysignf64" => "llvm.copysign.f64",
        "floorf64" => "llvm.floor.f64",
        "ceilf64" => "llvm.ceil.f64",
        "truncf64" => "llvm.trunc.f64",
        "rintf64" => "llvm.rint.f64",
        "nearbyintf64" => "llvm.nearbyint.f64",
        "roundf64" => "llvm.round.f64",

cmath binds to a libm library that is expected to be provided by the C toolchain, and LLVM intrinsics may compile to calls that do the same.

However, note that (to the best of my knowledge) the vast majority of those intrinsics are lowered to libcalls rather than single instructions on most or all architectures. They are still useful to LLVM for optimizations (constant folding, code motion and dead code elimination based on the fact that they don't access errno, etc.), but most of them don't impact instruction selection the way @japaric demonstrated with sqrt.

@SimonSapin

I think I don’t quite understand how LLVM intrinsics, the libm crate, and
system/toolchain-provided -lm C library all interact with each other

The (observable) behavior of the sqrtf32 intrinsic is depicted below:

// user writes
fn my_sqrt(x: f32) -> f32 {
    intrinsics::sqrtf32(x)
}

// For ARM Cortex-M4F, LLVM lowers `my_sqrt` to
fn my_sqrt(x: f32) -> f32 {
    let y;
    unsafe {
        asm!("vsqrt $0, $1" : "=w"(y) : "w"(x));
    }
    y
}

// For targets that don't have an instruction for the sqrt operation, LLVM lowers `my_sqrt` to
fn my_sqrt(x: f32) -> f32 {
    extern "C" {
        fn sqrtf(_: f32) -> f32;
    }

    unsafe {
        sqrtf(x)
    }
}

On targets like x86_64 Linux std contains an extern block with #[link(name = "m")]. This makes
rustc pass -lm to the linker and libm.a provides the sqrtf symbol.

Do you mean that the libm crate would call the intrinsic?

Yes, the libm crate would use intrinsics::sqrtf32 to implement F32Ext.sqrt.

Then what would provide the sqrtf symbol called by the intrinsic on targets that don’t have a
dedicated instruction?

The libm crate itself can do that:

impl F32Ext for f32 {
    fn sqrt(self) -> f32 {
        unsafe {
            intrinsics::sqrtf32(self)
        }
    }
}

#[no_mangle]
pub extern "C" fn sqrtf(x: f32) -> f32 {
    // Software implementation
}
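
To make the "Software implementation" placeholder above a bit more concrete, here is a minimal sketch of what such a fallback could look like. It is not the algorithm the libm crate uses (that one is a port of MUSL); it is plain Newton iteration, the name soft_sqrtf is hypothetical, and the result is not guaranteed to be correctly rounded. It uses only operations that are available in core.

```rust
/// Hypothetical software square root (illustration only, not the libm crate's algorithm).
/// Uses only `core`-level operations: comparisons, arithmetic and bit casts.
pub fn soft_sqrtf(x: f32) -> f32 {
    // NaN and negative inputs have no real square root.
    if x.is_nan() || x < 0.0 {
        return f32::NAN;
    }
    // +0.0, -0.0 and +infinity are their own square roots.
    if x == 0.0 || x == f32::INFINITY {
        return x;
    }
    // Initial guess: roughly halve the exponent by shifting the bit pattern
    // and re-adding half of the exponent bias (0x3f80_0000 >> 1).
    let mut y = f32::from_bits((x.to_bits() >> 1) + 0x1fc0_0000);
    // A few Newton steps for y*y = x. This is enough for normal inputs;
    // subnormal inputs would need more iterations or prior scaling.
    for _ in 0..4 {
        y = 0.5 * (y + x / y);
    }
    y
}
```

In the setup described above, a routine like this would sit behind the #[no_mangle] sqrtf symbol and only be reached on targets where LLVM cannot select a hardware instruction.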

@alexcrichton

@japaric would it be possible to collect some data about the efficiency of the various
implementations in the Rust libm vs some native libm implementations?

We can, but we don't have to force our libm implementation on the targets that are currently using
the system libm.a. For now, we can hold off on providing symbols like sqrtf in compiler-builtins
for those targets (see below) and they would see no change in performance / accuracy.

// crate: core
impl f32 {
    fn sqrt(self) -> Self {
        unsafe { intrinsics::sqrtf32(self) }
    }
}

// crate: compiler-builtins
#[cfg(any(target_os = "none", target_os = "unknown"))]
#[no_mangle]
pub extern "C" fn sqrtf(x: f32) -> f32 {
    // Software implementation
}

Targets like x86_64-unknown-linux-gnu would continue to use the sqrtf symbol that comes from
-lm.

Also, right off the bat, I can tell you that most of the f32 math functions that we have ported
from MUSL have pretty bad runtime performance on 32-bit architectures because they internally use
f64 operations / functions. We want to replace the implementations of f32 ops with a port of
newlib. newlib implements f32 functions using only f32 operations. See rust-lang/libm#118 for
details.

@japaric So your (a) proposal is adding APIs to libcore that, when used, add a dependency on a symbol being provided externally somehow. The precedent of rust-lang/rust#27823 and rust-lang/rust#32110 (comment) seems to be that libcore should avoid precisely this.

Agree with japaric that plan (b) is the way to go. Normally I [very much! Haha] want things to be stable code, and hopefully move to a nursery crate, but the case of polyfilling code that may just be generated instead is clearly a special case, where the compiler coupling is inherent to the problem. And tying ourselves to LLVM in stable interfaces is definitely no good.

Let me also throw out that on the general front of the compiler-builtins <-> core dependency cycle, #2492 may be of some assistance. That might remove the downsides of plan (b) eventually.

@japaric

For now, we can hold off on providing symbols like sqrtf in compiler-builtins
for those targets (see below) and they would see no change in performance / accuracy.

True! We don't have a great way of adding -lm to core though, on targets like x86_64-unknown-linux-gnu. :(

@SimonSapin

hmm, I believe that if the intrinsic is marked as #[inline] the undefined symbol would end up in the libm crate and not in the libcore crate.

@alexcrichton

We don't have a great way of adding -lm to core though, on targets like x86_64-unknown-linux-gnu

Having core inject -lm sounds wrong; core should not depend on C libraries being present on the host.

However, I don't think we would need to have core pass the -lm flag to the linker. Keeping the current behavior of having std pass -lm to the linker would be sufficient: even if a #![no_std] crate uses the math support in core the crate will end up being used in a binary that links to std, so the -lm requirement would be satisfied.

I don't know of any use case of #![no_std] executables for x86_64-unknown-linux-gnu but I expect building such a crate requires passing -lc to the linker via #[link] or build.rs. If the use case requires custom linking from the get go then having the users manually pass an extra -lm flag to the linker to get math support doesn't seem too bad ...

Do we even officially support #![no_std] programs on x86_64-unknown-linux-gnu? x86_64-unknown-linux-gnu without std (just core) seems indistinguishable from x86_64-unknown-linux-musl without std, and sounds more like an OS agnostic x86_64-none-elf target.

if the intrinsic is marked as #[inline] the undefined symbol would end up in the libm crate and not in the libcore crate.

From the linker’s point of view, sure. But does it matter? From a user’s point of view there would be an API in libcore that, when used, might cause undefined symbol errors.

@SimonSapin Quite a few of the functions in core::intrinsics have the same behavior; they can produce undefined symbol errors. If you mean to say that we should not stabilize API that has such behavior; I would agree. That policy would also eliminate option (a).

core::intrinsics is in a kind of blurry area where it’s mostly intended as an implementation detail of other things, and probably never to be stabilized directly. So in a sense it’s not "really" part of the public API of the standard library.

rust-lang/rust#27823 and rust-lang/rust#32110 (comment) suggest that, at least so far, libcore (or the subset of it reachable through its stable public API) is intended to be "dependency-free".

But then based on some of the discussion in #2480, maybe we should rethink the whole libcore / no_std thing. As you suggest, maybe Linux x86-64 without libc/libm should be a different target.

The llvm math intrinsics also support vector types (e.g. f32x4). We recently had to split core::simd out of core because the number of issues due to these intrinsics missing in some targets (wasm, softfloat targets, ...) was piling up but the plan is to eventually put it back in core.

Ideally, all math intrinsics provided would not only work on f32 and f64, but also on packed vectors (f32x2, f32x4, f32x8, f32x16, f64x2, f64x4, f64x8).

That would mean, however, that libm isn't enough, but that we would need something like libmvec as well to handle the vector types in targets that do not provide a standard libmvec.

Please correct me if I'm wrong, but is it the case that when using libm in a program targeting thumbv7em-none-eabihf, calling the libm::sin function won't use hardware instructions, but a software implementation instead?

I assume that the problem described in the starting issue comment for sqrt (where a direct call to libm always gets the software implementation) applies just as much to sin and any other libm function.

What if we switch LLVM and bootstrap a new transpiler?

This should also work with rust-lang/rust#57241 const fn support, i.e. ideally writing something.sqrt() should:

  • if it's constant, evaluate at compile time, done
  • if we're compiling for e.g. eabihf, use FPU
  • otherwise use libm

If anyone is interested, I'm developing a set of core sin, cos, ln, exp etc. for the SIMD library here:

rust-lang/portable-simd#126

They are very simple, mostly a single polynomial evaluation and probably significantly faster
than any LLVM implementation, especially as they autovectorise and interleave.

With a little extra compiler support, we could also make them const-able - great for FFT evaluation
and other algorithms.

They are generated using the doctor_syn crate which extends syn to enable arbitrary
precision compile time evaluation of expressions and polynomial coefficient calculation.
This is still a work in progress and is due a rewrite once the dust settles.
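
For readers who want a feel for what "a single polynomial evaluation" means, here is a minimal sketch. It is not taken from the portable-simd work and does not use doctor_syn output: the coefficients are plain Taylor coefficients (a fitted polynomial would normally do better), the name sin_poly is hypothetical, and the input is assumed to be already reduced to [-PI/2, PI/2].

```rust
/// Hypothetical polynomial sine for inputs already reduced to [-PI/2, PI/2].
/// Evaluates x - x^3/3! + x^5/5! - x^7/7! + x^9/9! in Horner form,
/// using only multiplications and additions (no libm call, no intrinsic).
pub fn sin_poly(x: f32) -> f32 {
    let x2 = x * x;
    // Horner evaluation of the inner polynomial in x^2,
    // starting from the highest-order coefficient 1/9!.
    let p = 1.0 / 362_880.0;
    let p = p * x2 - 1.0 / 5_040.0; // -1/7!
    let p = p * x2 + 1.0 / 120.0; // +1/5!
    let p = p * x2 - 1.0 / 6.0; // -1/3!
    // sin(x) ~= x + x^3 * p(x^2)
    x + x * x2 * p
}
```

Because the body is branch-free straight-line arithmetic, it can be inlined into a loop over a slice, which is the situation where the compiler has a chance to vectorise it. Whether it actually does depends on the target and optimization level.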

This seems obvious and simple to me so why hasn't this been done yet? Math functions like ceil() and floor() are very simple yet are not available in core. I don't understand what the hold up is. If some subset of the functions are problematic, then just leave those out. Getting the trivial functions into core seems like a priority.

vks commented

It is not obvious and simple, because std currently relies on libc for math functions, which core cannot use. The different options are discussed at length in the issue description, and they are non-trivial.

The discussion in this thread fizzled out 5 years ago. Are we still in the same place?

From what I understand, core pulling in either -lm or the libm crate implicitly is not desired. Can we just make the user do that, like with -lc and compiler-builtins? This would improve the ergonomics of writing Rust code, and would move the issue of doing no_std math from library authors to binary builders.

Why do we need to rely on external system libraries? What's the problem with Rust implementing these directly?

vks commented

@mlindner That's approach (b) from the issue description. One possible issue might be that libm can yield different results than libc, and that you can already use it in your no_std crates. (In my crates, I usually have a libm feature that works on no_std.)

It would be nice to make that fallback official, but having explicit features might be better for reproducibility.
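
The "libm feature" pattern mentioned above can be sketched roughly as follows. This is an assumption about how such a crate might be laid out, not code from any particular crate: the Cargo feature name libm and the helper name sqrt are hypothetical, and the libm branch assumes the libm crate is declared as an optional dependency.

```rust
// Hypothetical crate-internal helper. With the (assumed) `libm` Cargo feature
// enabled, the pure-Rust routine from the libm crate is used, which also
// works in #![no_std] builds.
#[cfg(feature = "libm")]
#[inline]
pub fn sqrt(x: f32) -> f32 {
    libm::sqrtf(x)
}

// Without the feature, fall back to the inherent method, which is only
// available when std is linked.
#[cfg(not(feature = "libm"))]
#[inline]
pub fn sqrt(x: f32) -> f32 {
    x.sqrt()
}
```

The rest of the crate calls sqrt(x) and never touches f32::sqrt directly, so the choice of backend stays explicit in Cargo.toml. That is the reproducibility argument made above.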

If I could provide a little context: The libm that's part of the Rust project gets nearly no attention. Thus, Rust assumes that any system's libm will be better performing than ours. Thus, we don't have our libm export function symbols unless we know for sure that the symbol won't be there (e.g. wasm targets).

Until our own libm is given much more attention, it's unlikely to be put into the default compilation mix.

Thanks for the context. As a workaround, could we add (if it's not already there) a feature flag that says to use Rust's built-in math intrinsics, and by that means allow them in no_std?

The problem here is the use of LLVM intrinsics to implement maths functions.

For example f32::cos here:

https://github.com/rust-lang/rust/blob/master/library/std/src/f32.rs#L610-L612

This is a pragmatic choice as no work needs to be done on many platforms, but LLVM
often falls back on calling libm, especially on older targets. It may have improved this
since I last sampled the code base.

This is very much the B-grade choice as calling any function has a high cold code overhead
and the inability to inline the code prevents efficient loop transformations. The performance
difference is often as bad as 1000:1 on modern hardware but a lot less on older targets.

I'm planning to highlight this in an upcoming book on rust code performance, feedback is welcome.

A better choice would be to at least bite the bullet and use x86/ARM specific intrinsics for those platforms
for primitives like ceil() and round()

It would be interesting to see what core::intrinsics::cosf32 does on X86 platforms. If I am not mistaken,
it is not allowed to call libm.

Answers on a postcard.

core::intrinsics::cosf32 is codegened to a call on x86 and all other modern CPU targets. The x86 ISA has hardware sin/cos instructions, but they have poor accuracy and they're slow, so they're rarely used. The lowering to a call happens in codegen, after all loop optimizations are complete.

if you enable appropriate target features, floor, ceil, fma, etc. intrinsics are lowered to native instructions on x86, no lib calls.

https://rust.godbolt.org/z/WGs6aKhqT

@programmerjake is correct. When a modern target is enabled, it works well, but the default is always disappointing.

https://rust.godbolt.org/z/Kqsf3YG7K

It would be lovely if -C target-cpu=native was the default for cargo install, but the argument against this is probably docker images which must always be the lowest common denominator. However, if you install on your own hardware, you expect maximum performance.

Much of the SIMD group's excellent work is hard to use without this option unless you use target_feature, which in turn
is hard to integrate into libraries.

I'm not a regular reader of Rust discussions, but would imagine this has been discussed before.
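
As an illustration of the target_feature route mentioned above, here is a minimal sketch of runtime dispatch for floor on x86_64. The function names are hypothetical and the example needs std for the feature detection macro, so it is not a no_std solution. It also shows the extra boilerplate a library has to carry.

```rust
// Compiled with SSE4.1 enabled for this one function, so the backend is
// allowed to select a native rounding instruction instead of a libcall.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.1")]
unsafe fn floor_sse41(x: f32) -> f32 {
    x.floor()
}

/// Hypothetical dispatcher: use the SSE4.1 version when the running CPU
/// supports it, otherwise fall back to the baseline code path.
pub fn floor_dispatch(x: f32) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("sse4.1") {
            // Safe to call: the required CPU feature was just detected.
            return unsafe { floor_sse41(x) };
        }
    }
    x.floor()
}
```

The per-call check and the loss of inlining across the target_feature boundary are part of why this is awkward to integrate into libraries, compared with building the whole program for the native CPU.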