Math support in core
japaric opened this issue · 36 comments
Background
Currently the core crate doesn't provide support for mathematical functions like sqrt or sin.
To do math in a #![no_std] program one has the following options:
- Link to a C implementation of libm, i.e. libm.a. This is cumbersome as the programmer needs to
  obtain a compiled version of libm for their target, or compile libm themselves, which implies a C
  cross toolchain when the target system and the build system are not the same architecture / OS.
- Use a pure Rust implementation of libm, like the libm crate. On stable, (a) the performance of
  such an implementation won't be on par with a C implementation, or (b) to achieve the same
  performance the user would require a C (cross) toolchain.
To elaborate on (a) and (b), consider the following contrived program that computes the square root
of a number:
#![no_std]
extern crate libm;
use core::ptr;
use libm::F32Ext;
#[no_mangle]
pub unsafe fn foo() {
// volatile memory accesses to prevent the compiler from optimizing away everything
let x: f32 = ptr::read_volatile(0x2000_0000 as *const _);
let y = x.sqrt();
ptr::write_volatile(0x2000_1000 as *mut _, y);
}
When compiled for the thumbv7em-none-eabihf target it produces the following machine code:
00000000 <foo>:
0: f04f 5000 mov.w r0, #536870912 ; 0x20000000
4: ed90 0a00 vldr s0, [r0]
8: ee10 1a10 vmov r1, s0
c: f001 40ff and.w r0, r1, #2139095040 ; 0x7f800000
10: f1b0 4fff cmp.w r0, #2139095040 ; 0x7f800000
14: d108 bne.n 28 <foo+0x28>
16: ee00 0a00 vmla.f32 s0, s0, s0
(..)
2f4: ed80 0a00 vstr s0, [r0]
2f8: 4770 bx lr
This is extremely inefficient machine code because the target has a hardware FPU that supports
computing the square root in a single instruction. Ideally, the program should compile down to the
following machine code:
00000000 <foo>:
0: f04f 5000 mov.w r0, #536870912 ; 0x20000000
4: ed90 0a00 vldr s0, [r0]
8: f241 0000 movw r0, #4096 ; 0x1000
c: f2c2 0000 movt r0, #8192 ; 0x2000
10: eeb1 0ac0 vsqrt.f32 s0, s0
14: ed80 0a00 vstr s0, [r0]
18: 4770 bx lr
If the target had access to the standard library the program would compile down to that machine code
because the implementation of f32.sqrt in std looks like this:
#![feature(core_intrinsics)]
use std::intrinsics;
impl f32 {
fn sqrt(self) -> Self {
intrinsics::sqrtf32(self)
}
}
sqrtf32 is an unstable, thin wrapper around an LLVM intrinsic that either compiles down to a
hardware implementation of square root if the target architecture supports it in its instruction
set, or produces a call to the sqrtf routine if it doesn't (*). std makes use of 30+ such LLVM
intrinsics for the performance of its math functions.
(*) The llvm.sqrt.* LLVM intrinsic, which sqrtf32 wraps, is not quite specified like that but
that's the observable effect.
The libm crate can't make use of this intrinsic on stable because it's unstable and feature
gated. However, the libm crate could replicate the behavior of the sqrtf32 intrinsic using
conditional compilation and external assembly files as shown below:
// crate: libm
// NOTE heavily simplified because it ignores architectures other than ARM
impl F32Ext for f32 {
#[cfg(target_arch = "arm")]
fn sqrt(self) -> Self {
extern "C" {
// provided by an external assembly file
fn vsqrt_f32(x: f32) -> f32;
}
unsafe { vsqrt_f32(self) }
}
#[cfg(not(target_arch = "arm"))]
fn sqrt(self) -> Self {
// software implementation
}
}
But this would heavily complicate the implementation of the libm crate, which would likely
introduce bugs. Also, as it's not possible to use inline assembly (asm!) on stable, the vsqrt.f32
instruction would have to be invoked via FFI and an external assembly file. External assembly files
mean that the user would require a C (cross) toolchain to build the crate, negating the main benefit
of using a pure Rust implementation of libm.
Possible solutions
I see two options for improving the situation here:
a. We stabilize the family of sqrtf32 LLVM intrinsics. This way crates like libm can achieve the
performance of the std implementation on stable without requiring complex conditional
compilation and C toolchains. Or,
b. We move all the existing math support from std to core. For the user this means that e.g.
f32.sqrt will also work in #![no_std] programs.
Option (a) is kind of bad (maybe?) for alternative backends like cranelift as they would have to
support / implement these LLVM intrinsics to reach parity with the rustc+LLVM compiler.
Option (b) requires us (*) to provide an implementation of math functions (symbols) like sqrtf
for targets that do not link to libm by default. If we don't do this those targets will hit
"undefined reference to sqrtf" linker errors when using math methods like f32.sqrt.
(*) "us" as in: we must provide symbols like sqrtf in the compiler-builtins crate. Note that we
are already providing such symbols for the wasm32-unknown-unknown target, and we are using the libm
crate to do that.
If we go ahead with option (b) we must be careful not to provide the math symbols in
compiler-builtins for targets that are currently using the system libm (e.g.
x86_64-unknown-linux-gnu). If we do provide the symbols, all existing programs will start using
the libm crate implementation instead of the system libm implementation. This is due to how we
invoke the linker: libcompiler_builtins.rlib appears before -lm in the linker arguments. That may
degrade performance in cases where the system libm has architecture-optimized implementations of
some functions.
With option (b) I believe that #![no_std] programs that are currently linking to some C
implementation of libm for math support will end up using the libm crate implementation as a side
effect. I don't see a way to avoid this: even if we mark the math symbols in compiler-builtins as
weak, the way we invoke the linker will cause the program to use the libm crate implementation.
Final thoughts
IMO, math support should be in the core crate as it doesn't depend on OS, or I/O, abstractions
like other std-only APIs do (e.g. std::fs, std::net). Also, std makes math like sqrt feel built-in
because the functionality is provided as inherent methods -- it feels weird that such "built-in"
functionality is not available in #![no_std].
Thoughts? Should we do (a) or (b)? Or is there some other solution? Or should we leave math out of core?
cc @SimonSapin (T-libs), @jethrogb @Ericson2314 (T-portability), @joshtriplett @korken89 (some stakeholders)
Previous discussion of this: rust-lang/rust#50145
I don’t understand the difference between option (a) and (b), they seem to be effectively the same. Stabilizing an intrinsic is typically done by adding a stable (and safe) wrapper function. The existing support in std is a stable wrapper function/method.
> feel built-in because the functionality is provided as inherent methods
For what it’s worth we have other precedent of inherent methods not being present in libcore, for example [T]::to_vec.
> Option (a) is kind of bad (maybe?) for alternative backends like cranelift as they would have to
> support / implement these LLVM intrinsics to be on parity with the rustc+LLVM compiler.
A cranelift backend for rustc already needs to gracefully handle a long list of rustc intrinsics. In this case always translating the rustc intrinsic to a sqrt call should be easy & functional, it wouldn't even have to impact cranelift code. (cc @eddyb)
> I don’t understand the difference between option (a) and (b), they seem to be effectively the same.
They are pretty similar but they are not exactly the same. In option (b) core becomes the
provider of math functionality. With option (a) any crate can provide math functionality that's on
par with the performance of std (cf. sqrtf32 example).
Thinking about it some more I think option (a) will also run into the problems that option (b) has
with respect to shadowing system libm.
Also, I originally thought (but didn't comment above) that having option (a) would be good because
the ecosystem could grow architecture optimized versions of the libm crate and a user would be free
to pick one of those for their project. I don't think that would work well on stable because
architecture optimized versions would very likely involve external assembly files, which require a C
cross toolchain. Additionally, having a bunch of crates like libm-x86_64 and libm-arm-neon on
crates.io would probably lead to fragmentation. It seems better to have core as the only provider
of math functions and to have everyone contribute architecture optimized implementations to it --
core can make use of unstable features like asm! and global_asm! and it comes pre-compiled so
it doesn't add a dependency on a C toolchain.
TL;DR Option (b) (still) sounds best to me.
> For what it’s worth we have other precedent of inherent methods not being present in libcore, for
> example [T]::to_vec.
I think that's the only inherent method that involves a type that's not built into the language
(Vec), which makes it a bit of a special case / exception. I believe to_vec was turned into an
inherent method before the ToOwned trait was added to the standard library, and I think that
having ToOwned in the std prelude, and potentially in an alloc (pseudo) prelude, reduces or
eliminates the need for it.
I cc-ed T-portability people because they also look into issues that aim to close the gap between
standard programs and no_std programs, like having io::{Read,Write} in core, and less so
because this is related to the portability lint.
I think math functions should not be cfg-ed away. Even if the target doesn't have an FPU, software
emulation of floats is always possible; 128-bit integer support is in a similar situation and we
don't cfg that today. (I know some targets have problems with LLVM bugs that may prevent core from
compiling for them if it gained math functions; in that case we could temporarily cfg away math
support until the LLVM bug is fixed. I believe pure MIR rlibs would also sidestep the LLVM problem
and I would prefer that over using #[cfg].)
Right, other backends will have to port other Rust intrinsics as things stand today. Perhaps, 30
more intrinsics is not that much extra work.
Finally, this is a bit of speculation because I have not tested it but I think that putting the math
symbols / functions in core, rather than in compiler-builtins, may let the compiler perform
inlining -- the intrinsics / functions in compiler-builtins never get inlined, not even with LTO
enabled, because that crate is always a separate object file.
> > For what it’s worth we have other precedent of inherent methods not being present in libcore, for example [T]::to_vec.
>
> I think that's the only inherent method that involves a type that's not built into the language
> (Vec), which makes it a bit of a special case / exception.
Currently there are 21 inherent methods defined on primitive types outside of libcore because they involve Vec or String in their signature or in their implementation.
This doesn’t invalidate this issue, I think having math support in libcore would be good. I’m only saying it wouldn’t entirely remove the weirdness of inherent methods being "magically" added just by adding a dependency on a non-core standard library crate.
#[lang = "slice_alloc"]
impl<T> [T] {
pub fn sort(&mut self) where T: Ord {…}
pub fn sort_by<F>(&mut self, mut compare: F) where F: FnMut(&T, &T) -> Ordering {…}
pub fn sort_by_key<K, F>(&mut self, mut f: F) where F: FnMut(&T) -> K, K: Ord {…}
pub fn sort_by_cached_key<K, F>(&mut self, f: F) where F: FnMut(&T) -> K, K: Ord {…}
pub fn to_vec(&self) -> Vec<T> where T: Clone {…}
pub fn into_vec(self: Box<Self>) -> Vec<T> {…}
pub fn repeat(&self, n: usize) -> Vec<T> where T: Copy {…}
}
#[lang = "slice_u8_alloc"]
impl [u8] {
pub fn to_ascii_uppercase(&self) -> Vec<u8> {…}
pub fn to_ascii_lowercase(&self) -> Vec<u8> {…}
}
#[lang = "str_alloc"]
impl str {
pub fn into_boxed_bytes(self: Box<str>) -> Box<[u8]> {…}
pub fn replace<'a, P: Pattern<'a>>(&'a self, from: P, to: &str) -> String {…}
pub fn replacen<'a, P: Pattern<'a>>(&'a self, pat: P, to: &str, count: usize) -> String {…}
pub fn to_lowercase(&self) -> String {…}
pub fn to_uppercase(&self) -> String {…}
pub fn escape_debug(&self) -> String {…}
pub fn escape_default(&self) -> String {…}
pub fn escape_unicode(&self) -> String {…}
pub fn into_string(self: Box<str>) -> String {…}
pub fn repeat(&self, n: usize) -> String {…}
pub fn to_ascii_uppercase(&self) -> String {…}
pub fn to_ascii_lowercase(&self) -> String {…}
}
Back on topic:
@japaric, I think I don’t quite understand how LLVM intrinsics, the libm crate, and the system/toolchain-provided -lm C library all interact with each other in your (a) and (b) options. In particular:
> We stabilize the family of sqrtf32 LLVM intrinsics. This way crates like libm can achieve
Do you mean that the libm crate would call the intrinsic? Then what would provide the sqrtf symbol called by the intrinsic on targets that don’t have a dedicated instruction?
(Note that rust-lang/rust#27823 moved a number of f32 and f64 methods from libcore to libstd in order to make libcore not depend on a libm C library that needs to be provided separately.)
Regardless, I think this is largely independent of what user-facing API we want to stabilize, which at first I thought was what your (a) vs. (b) was about:
- Unsafe functions in the core::intrinsics module, or
- Safe inherent methods on the f32 and f64 types.
I think the latter API is obviously superior, assuming identical implementations.
CC @alexcrichton who discussed this in rust-lang/rust#32110 (comment).
@japaric is sqrtf32 the only intrinsic that needs to be stabilized or is that just an example and there's others for other math functions?
I do sort of agree with @japaric about the goal here of basically moving everything to libcore, but my main point of hesitation would be performance and accuracy of these intrinsics vs various libm implementations. @japaric would it be possible to collect some data about the efficiency of the various implementations in the Rust libm vs some native libm implementations? I'm less worried about things like sqrt which have intrinsics and can be optimized, but am more interested in things like trigonometric functions which don't have inherent compiler/architecture support. I think it'd also be interesting to check out the performance across platforms, it may be the case that the Rust libm is mega-fast on OSX (or something like that) but super slow on Linux.
@jethrogb It’s an example. This is what’s in src/libstd/f64.rs today. f32 is similar.
#[lang = "f64_runtime"]
impl f64 {
pub fn floor(self) -> f64 {…} // intrinsics::floorf64
pub fn ceil(self) -> f64 {…} // intrinsics::ceilf64
pub fn round(self) -> f64 {…} // intrinsics::roundf64
pub fn trunc(self) -> f64 {…} // intrinsics::truncf64
pub fn abs(self) -> f64 {…} // intrinsics::fabsf64
pub fn signum(self) -> f64 {…} // intrinsics::copysignf64
pub fn mul_add(self, a: f64, b: f64) -> f64 {…} // intrinsics::fmaf64
pub fn powi(self, n: i32) -> f64 {…} // intrinsics::powif64
pub fn powf(self, n: f64) -> f64 {…} // intrinsics::powf64
pub fn sqrt(self) -> f64 {…} // intrinsics::sqrtf64
pub fn exp(self) -> f64 {…} // intrinsics::expf64
pub fn exp2(self) -> f64 {…} // intrinsics::exp2f64
pub fn ln(self) -> f64 {…} // intrinsics::logf64
pub fn log2(self) -> f64 {…} // intrinsics::log2f64
pub fn log10(self) -> f64 {…} // intrinsics::log10f64
pub fn sin(self) -> f64 {…} // intrinsics::sinf64
pub fn cos(self) -> f64 {…} // intrinsics::cosf64
pub fn abs_sub(self, other: f64) -> f64 {…} // cmath::fdim
pub fn cbrt(self) -> f64 {…} // cmath::cbrt
pub fn hypot(self, other: f64) -> f64 {…} // cmath::hypot
pub fn tan(self) -> f64 {…} // cmath::tan
pub fn asin(self) -> f64 {…} // cmath::asin
pub fn acos(self) -> f64 {…} // cmath::acos
pub fn atan(self) -> f64 {…} // cmath::atan
pub fn atan2(self, other: f64) -> f64 {…} // cmath::atan2
pub fn exp_m1(self) -> f64 {…} // cmath::expm1
pub fn ln_1p(self) -> f64 {…} // cmath::log1p
pub fn sinh(self) -> f64 {…} // cmath::sinh
pub fn cosh(self) -> f64 {…} // cmath::cosh
pub fn tanh(self) -> f64 {…} // cmath::tanh
// Based on other methods, but not directly on intrinsics or cmath
pub fn log(self, base: f64) -> f64 {…}
pub fn fract(self) -> f64 {…}
pub fn div_euc(self, rhs: f64) -> f64 {…}
pub fn mod_euc(self, rhs: f64) -> f64 {…}
pub fn sin_cos(self) -> (f64, f64) {…}
pub fn asinh(self) -> f64 {…}
pub fn acosh(self) -> f64 {…}
pub fn atanh(self) -> f64 {…}
}
Where cmath contains #[link_name = "m"] extern {…} bindings, and Rust intrinsics map to LLVM intrinsics:
$ grep "llvm.*f64" src/librustc_codegen_llvm/intrinsic.rs
"sqrtf64" => "llvm.sqrt.f64",
"powif64" => "llvm.powi.f64",
"sinf64" => "llvm.sin.f64",
"cosf64" => "llvm.cos.f64",
"powf64" => "llvm.pow.f64",
"expf64" => "llvm.exp.f64",
"exp2f64" => "llvm.exp2.f64",
"logf64" => "llvm.log.f64",
"log10f64" => "llvm.log10.f64",
"log2f64" => "llvm.log2.f64",
"fmaf64" => "llvm.fma.f64",
"fabsf64" => "llvm.fabs.f64",
"copysignf64" => "llvm.copysign.f64",
"floorf64" => "llvm.floor.f64",
"ceilf64" => "llvm.ceil.f64",
"truncf64" => "llvm.trunc.f64",
"rintf64" => "llvm.rint.f64",
"nearbyintf64" => "llvm.nearbyint.f64",
"roundf64" => "llvm.round.f64",
cmath binds to a libm library that is expected to be provided by the C toolchain, and LLVM intrinsics may compile to calls that do the same.
However, note that (to the best of my knowledge) the vast majority of those intrinsics are lowered to libcalls rather than single instructions on most or all architectures. They are still useful to LLVM for optimizations (constant folding, code motion and dead code elimination based on the fact that they don't access errno, etc.), but most of them don't impact instruction selection the way @japaric demonstrated with sqrt.
> I think I don’t quite understand how LLVM intrinsics, the libm crate, and
> system/toolchain-provided -lm C library all interact with each other
The (observable) behavior of the sqrtf32 intrinsic is depicted below:
// user writes
fn my_sqrt(x: f32) -> f32 {
intrinsics::sqrtf32(x)
}
// For ARM Cortex-M4F, LLVM lowers `my_sqrt` to
fn my_sqrt(x: f32) -> f32 {
let y;
unsafe {
asm!("vsqrt $0, $1" : "=w"(y) : "w"(x));
}
y
}
// For targets that don't have an instruction for the sqrt operation, LLVM lowers `my_sqrt` to
fn my_sqrt(x: f32) -> f32 {
extern "C" {
fn sqrtf(_: f32) -> f32;
}
unsafe {
sqrtf(x)
}
}
On targets like x86_64 Linux, std contains an extern block with #[link(name = "m")]. This makes
rustc pass -lm to the linker, and libm.a provides the sqrtf symbol.
> Do you mean that the libm crate would call the intrinsic?
Yes, the libm crate would use intrinsics::sqrtf32 to implement F32Ext.sqrt.
> Then what would provide the sqrtf symbol called by the intrinsic on targets that don’t have a
> dedicated instruction?
The libm crate itself can do that:
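impl F32Ext for f32 {
    // `sqrt` is a method, so it takes `self` (matching the trait sketch earlier in the thread)
    fn sqrt(self) -> Self {
        unsafe {
            intrinsics::sqrtf32(self)
        }
    }
}
// Exported under the C name so that the `sqrtf` call LLVM emits on targets
// without a dedicated instruction resolves to this function
#[no_mangle]
pub extern "C" fn sqrtf(x: f32) -> f32 {
    // Software implementation
}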
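As a rough illustration of what the "Software implementation" placeholder stands for, here is a minimal sketch using Newton's method. It is an assumption made for illustration only: the routines in the libm crate are mostly ports of MUSL code rather than this algorithm, and the name soft_sqrtf is hypothetical (it is deliberately not exported as sqrtf, so it cannot shadow a system symbol on a hosted target). The result is not guaranteed to be correctly rounded, and subnormal inputs are not handled accurately.

```rust
/// Hypothetical software fallback for `sqrtf`, using Newton's method.
/// Sketch only: not correctly rounded, subnormal inputs are not handled accurately.
pub fn soft_sqrtf(x: f32) -> f32 {
    // NaN and negative inputs have no real square root.
    if x.is_nan() || x < 0.0 {
        return f32::NAN;
    }
    // sqrt(+0) = +0, sqrt(-0) = -0 and sqrt(+inf) = +inf.
    if x == 0.0 || x == f32::INFINITY {
        return x;
    }
    // Initial guess: roughly halve the exponent by shifting the bit pattern right
    // and re-adding half of the bit pattern of 1.0 (0x3f80_0000 >> 1).
    let mut y = f32::from_bits((x.to_bits() >> 1) + 0x1fc0_0000);
    // Each Newton step y <- (y + x / y) / 2 roughly doubles the number of correct bits.
    for _ in 0..4 {
        y = 0.5 * (y + x / y);
    }
    y
}
```

On a target with a hardware square root none of this would run, since the intrinsic would be lowered to the instruction directly; the fallback only matters where LLVM emits a call.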
> @japaric would it be possible to collect some data about the efficiency of the various
> implementations in the Rust libm vs some native libm implementations?
We can, but we don't have to force our libm implementation on the targets that are currently using
the system libm.a. For now, we can hold off on providing symbols like sqrtf in compiler-builtins
for those targets (see below) and they would see no change in performance / accuracy.
// crate: core
impl f32 {
fn sqrt(self) -> Self {
unsafe { intrinsics::sqrtf32(self) }
}
}
// crate: compiler-builtins
#[cfg(any(target_os = "none", target_os = "unknown"))]
#[no_mangle]
// `extern "C"` so the symbol uses the C ABI that callers of `sqrtf` expect
pub extern "C" fn sqrtf(x: f32) -> f32 {
    // Software implementation
}
Targets like x86_64-unknown-linux-gnu would continue to use the sqrtf symbol that comes from -lm.
Also, right off the bat, I can tell you that most of the f32 math functions that we have ported
from MUSL have pretty bad runtime performance on 32-bit architectures because they internally use
f64 operations / functions. We want to replace the implementations of f32 ops with a port of
newlib. newlib implements f32 functions using only f32 operations. See rust-lang/libm#118 for
details.
@japaric So your (a) proposal is adding APIs to libcore that, when used, adds a dependency on a symbol being provided externally somehow. The precedent of rust-lang/rust#27823 and rust-lang/rust#32110 (comment) seems to be that libcore should avoid precisely this.
Agree with japaric that plan (b) is the way to go. Normally I [very much! Haha] want things to be stable code, and hopefully move to a nursery crate, but the case of polyfilling code that may just be generated instead is clearly a special case, where the compiler coupling is inherent to the problem. And tying ourselves to LLVM in stable interfaces is definitely no good.
Let me also throw out that on the general front of the compiler-builtins–core dependency cycle, #2492 may be of some assistance. That might remove the downsides of plan (b) eventually.
> For now, we can hold off on providing symbols like sqrtf in compiler-builtins
> for those targets (see below) and they would see no change in performance / accuracy.
True! We don't have a great way of adding -lm to core though, on targets like x86_64-unknown-linux-gnu. :(
hmm, I believe that if the intrinsic is marked as #[inline] the undefined symbol would end up in the libm crate and not in the libcore crate.
> We don't have a great way of adding -lm to core though, on targets like x86_64-unknown-linux-gnu
Having core inject -lm sounds wrong; core should not depend on C libraries being present on the host.
However, I don't think we would need to have core pass the -lm flag to the linker. Keeping the current behavior of having std pass -lm to the linker would be sufficient: even if a #![no_std] crate uses the math support in core, the crate will end up being used in a binary that links to std, so the -lm requirement would be satisfied.
I don't know of any use case for #![no_std] executables on x86_64-unknown-linux-gnu, but I expect building such a crate requires passing -lc to the linker via #[link] or build.rs. If the use case requires custom linking from the get-go, then having the users manually pass an extra -lm flag to the linker to get math support doesn't seem too bad ...
Do we even officially support #![no_std] programs on x86_64-unknown-linux-gnu? x86_64-unknown-linux-gnu without std (just core) seems indistinguishable from x86_64-unknown-linux-musl without std, and sounds more like an OS-agnostic x86_64-none-elf target.
> if the intrinsic is marked as #[inline] the undefined symbol would end up in the libm crate and not in the libcore crate.
From the linker’s point of view, sure. But does it matter? From a user’s point of view there would be an API in libcore that, when used, might cause undefined symbol errors.
@SimonSapin Quite a few of the functions in core::intrinsics have the same behavior: they can produce undefined symbol errors. If you mean to say that we should not stabilize an API that has such behavior, I would agree. That policy would also eliminate option (a).
core::intrinsics is in a kind of blurry area where it’s mostly intended as an implementation detail of other things, and probably never to be stabilized directly. So in a sense it’s not "really" part of the public API of the standard library.
rust-lang/rust#27823 and rust-lang/rust#32110 (comment) suggest that, at least so far, libcore (or the subset of it reachable through its stable public API) is intended to be "dependency-free".
But then based on some of the discussion in #2480, maybe we should rethink the whole libcore / no_std thing. As you suggest, maybe Linux x86-64 without libc/libm should be a different target.
The LLVM math intrinsics also support vector types (e.g. f32x4). We recently had to split core::simd out of core because the number of issues caused by these intrinsics missing on some targets (wasm, softfloat targets, ...) was piling up, but the plan is to eventually put it back in core.
Ideally, all math intrinsics provided would not only work on f32 and f64, but also on packed vectors (f32x2, f32x4, f32x8, f32x16, f64x2, f64x4, f64x8).
That would mean, however, that libm isn't enough: we would need something like libmvec as well to handle the vector types on targets that do not provide a standard libmvec.
Please correct me if I'm wrong, but is it the case that in a program targeting thumbv7em-none-eabihf, calling the libm::sin function won't use hardware instructions, but a software implementation instead?
I assume that the problem described in the starting issue comment for sqrt (where a direct call to libm always gets the software implementation) applies just as much to sin and any other libm function.
What if we switch LLVM and bootstrap a new transpiler?
This should also work with rust-lang/rust#57241 const fn support, i.e. ideally writing something.sqrt() should:
- if it's constant, evaluate at compile time, done
- if we're compiling for e.g. eabihf, use FPU
- otherwise use libm
If anyone is interested, I'm developing a set of core sin, cos, ln, exp etc. for the SIMD library here:
They are very simple, mostly a single polynomial evaluation and probably significantly faster
than any LLVM implementation, especially as they autovectorise and interleave.
With a little extra compiler support, we could also make them const-able - great for FFT evaluation
and other algorithms.
They are generated using the doctor_syn crate which extends syn to enable arbitrary
precision compile time evaluation of expressions and polynomial coefficient calculation.
This is still a work in progress and is due a rewrite once the dust settles.
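To make "a single polynomial evaluation" concrete, here is a minimal sketch of the general technique. It is not the code generated by doctor_syn: the function name sin_poly is made up, the coefficients are plain Taylor terms rather than fitted ones, and the range reduction is the naive kind that only holds up for inputs of moderate size. Under those assumptions the error is on the order of 1e-7.

```rust
use std::f64::consts::PI;

/// Hypothetical polynomial sine: reduce to [-PI/2, PI/2], then evaluate an
/// odd degree-11 Taylor polynomial with Horner's scheme.
pub fn sin_poly(x: f64) -> f64 {
    // Range reduction: x = k*PI + r with r in [-PI/2, PI/2].
    let k = (x / PI).round();
    let r = x - k * PI;
    let r2 = r * r;
    // sin(r) ~= r * (1 - r^2/3! + r^4/5! - r^6/7! + r^8/9! - r^10/11!)
    let p = r
        * (1.0
            + r2 * (-1.0 / 6.0
                + r2 * (1.0 / 120.0
                    + r2 * (-1.0 / 5040.0
                        + r2 * (1.0 / 362880.0 + r2 * (-1.0 / 39916800.0))))));
    // sin(k*PI + r) = (-1)^k * sin(r)
    if (k as i64) % 2 == 0 {
        p
    } else {
        -p
    }
}
```

The body is straight-line arithmetic apart from the final sign selection, which is the property that makes this style of implementation a candidate for inlining and vectorization.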
This seems obvious and simple to me, so why hasn't this been done yet? Math functions like ceil() and floor() are very simple yet are not available in core. I don't understand what the holdup is. If some subset of the functions are problematic, then just leave those out. Getting the trivial functions into core seems like a priority.
It is not obvious and simple, because std currently relies on libc for math functions, which core cannot use. The different options are discussed at length in the issue description, and they are non-trivial.
The discussion in this thread fizzled out 5 years ago. Are we still in the same place?
From what I understand, core pulling in either -lm or the libm crate implicitly is not desired. Can we just make the user do that, like with -lc and compiler-builtins? This would improve the ergonomics of writing Rust code, and would move the issue of doing no_std math from library authors to binary builders.
Why do we need to rely on external system libraries? What's the problem with Rust implementing these directly?
@mlindner That's approach (b) from the issue description. One possible issue might be that libm can yield different results than libc, and that you can already use it in your no_std crates. (In my crates, I usually have a libm feature that works on no_std.)
It would be nice to make that fallback official, but having explicit features might be better for reproducibility.
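A sketch of what such a feature-gated shim could look like. The Cargo feature name libm and the module name math are assumptions made for this example; libm::sqrtf is the free function exported by the libm crate. With the feature off, the code falls back to the inherent method that std provides.

```rust
/// Hypothetical shim so the rest of the crate never names a math backend directly.
pub mod math {
    /// Selected when the (assumed) `libm` Cargo feature is enabled, e.g. for no_std builds.
    #[cfg(feature = "libm")]
    pub fn sqrt(x: f32) -> f32 {
        libm::sqrtf(x)
    }

    /// Selected otherwise: the inherent method, which needs std.
    #[cfg(not(feature = "libm"))]
    pub fn sqrt(x: f32) -> f32 {
        x.sqrt()
    }
}
```

Callers then write math::sqrt(x) everywhere. Because the backend is chosen by an explicit feature, a given build always uses the same implementation, which is the reproducibility point made above.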
If I could provide a little context: the libm that's part of the Rust project gets nearly no attention. Thus, Rust assumes that any system's libm will perform better than ours, and we don't have our libm export function symbols unless we know for sure that the symbol won't be there (e.g. wasm targets).
Until our own libm is given much more attention, it's unlikely to be put into the default compilation mix.
Thanks for the context. As a workaround, could we add (if it's not already there) a feature flag that says to use Rust's built-in math intrinsics, and by that means allow them into no_std?
The problem here is the use of LLVM intrinsics to implement maths functions.
For example f32::cos here:
https://github.com/rust-lang/rust/blob/master/library/std/src/f32.rs#L610-L612
This is a pragmatic choice as no work needs to be done on many platforms, but LLVM
often falls back on calling libm, especially on older targets. It may have improved this
since I last sampled the code base.
This is very much the B-grade choice as calling any function has a high cold code overhead
and the inability to inline the code prevents efficient loop transformations. The performance
difference is often as bad as 1000:1 on modern hardware but a lot less on older targets.
I'm planning to highlight this in an upcoming book on rust code performance, feedback is welcome.
A better choice would be to at least bite the bullet and use x86/ARM specific intrinsics for those platforms
for primitives like ceil() and round().
It would be interesting to see what core::intrinsics::cosf32 does on x86 platforms. If I am not mistaken,
it is not allowed to call libm.
Answers on a postcard.
core::intrinsics::cosf32 is codegened to a call on x86 and all other modern CPU targets. The x86 ISA has hardware sin/cos instructions, but they have poor accuracy and they're slow, so they're rarely used. The lowering to a call happens in codegen, after all loop optimizations are complete.
If you enable appropriate target features, the floor, ceil, fma, etc. intrinsics are lowered to native instructions on x86, with no lib calls.
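A minimal sketch of enabling a target feature for one function and selecting it at run time. The function names floor_sse41 and floor_dispatch are made up for this example, and it relies on the is_x86_feature_detected! macro, which is only available with std. The assumption is that, with SSE4.1 enabled for the function, LLVM can lower the floor call to a rounding instruction instead of a library call; that is worth confirming in the generated assembly rather than taking on faith.

```rust
/// Compiled with SSE4.1 enabled, so the backend is free to use its rounding instruction.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.1")]
unsafe fn floor_sse41(x: f32) -> f32 {
    x.floor()
}

/// Picks the SSE4.1 version when the running CPU supports it.
#[cfg(target_arch = "x86_64")]
pub fn floor_dispatch(x: f32) -> f32 {
    if is_x86_feature_detected!("sse4.1") {
        // Safe to call: the feature was just detected at run time.
        unsafe { floor_sse41(x) }
    } else {
        x.floor()
    }
}

/// On other architectures, use the default lowering.
#[cfg(not(target_arch = "x86_64"))]
pub fn floor_dispatch(x: f32) -> f32 {
    x.floor()
}
```

The run-time check has a cost of its own, so in practice this kind of dispatch is usually placed around a whole loop rather than around a single floor call.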
@programmerjake is correct. When a modern target is enabled, it works well, but the default is always disappointing.
https://rust.godbolt.org/z/Kqsf3YG7K
It would be lovely if -C target-cpu=native was the default for cargo install, but the argument against this is probably docker images which must always be the lowest common denominator. However, if you install on your own hardware, you expect maximum performance.
Much of the SIMD group's excellent work is hard to use without this option unless you use target_feature, which in turn is hard to integrate into libraries.
I'm not a regular reader of Rust discussions, but would imagine this has been discussed before.