rust-lang/rust

Enable `f16` and `f128` in assembly on platforms that support it

tgross35 opened this issue · 5 comments

The code below should work, but it currently errors, saying that f16 is not usable for registers:

#![feature(f16, f128)]

use core::arch::asm;

#[inline(never)]
pub fn f32_to_f16(a: f32) -> f16 {
    a as f16
}

#[inline(never)]
pub fn f32_to_f16_asm(a: f32) -> f16 {
    let ret: f16;
    unsafe {
        asm!(
            "fcvt    {ret:h}, {a:s}",
            a = in(vreg) a,
            ret = lateout(vreg) ret,
            options(nomem, nostack),
        );
    }

    ret
}

On aarch64 the first function generates:

example::f32_to_f16::hc897184dfb47f3d6:
        fcvt    h0, s0
        ret

f16 should be supported as a vreg operand type on aarch64 so that the inline assembly version can reproduce that code.


The following other platforms also apparently have some level of instruction support, but are less well documented:

Additionally, for f128:

  • s390x supports f128, referred to as "BFP Extended Format" in https://publibfp.dhe.ibm.com/epubs/pdf/a227832c.pdf. I am not sure if this comes with any special instructions.
  • PowerPC with -Ctarget-cpu=pwr9 seems to have f128 support via instructions like xsaddqp.

Tracking issue: #116909

I'm adding E-Easy because a PR that just enables support for aarch64 should be pretty easy. Start around

ty::Float(FloatTy::F32) => Some(InlineAsmType::F32),
ty::Float(FloatTy::F64) => Some(InlineAsmType::F64),
and massage the new types in. Actually figuring out rules for the rest of the platforms will be harder, but that can come later.
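To make "massage the new types in" a bit more concrete, here is a rough standalone sketch of the shape of that change. The `Ty`, `FloatTy`, and `InlineAsmType` enums below are simplified stand-ins I made up to mirror the names in the snippet above, and `asm_type_for` is a hypothetical helper; none of this is the actual compiler code, which has many more cases and more context.

```rust
// Simplified stand-ins for the compiler's types. These are assumptions for
// illustration only; the real definitions in rustc are larger.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FloatTy {
    F16,
    F32,
    F64,
    F128,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ty {
    Float(FloatTy),
    // Any type that is not valid as an inline asm operand in this model.
    Str,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InlineAsmType {
    F16,
    F32,
    F64,
    F128,
}

// Hypothetical helper: maps a type to the asm operand type it lowers to,
// or `None` if it cannot be used as an operand at all.
pub fn asm_type_for(ty: Ty) -> Option<InlineAsmType> {
    match ty {
        // New arms for the two new float types.
        Ty::Float(FloatTy::F16) => Some(InlineAsmType::F16),
        // Existing arms, as in the snippet above.
        Ty::Float(FloatTy::F32) => Some(InlineAsmType::F32),
        Ty::Float(FloatTy::F64) => Some(InlineAsmType::F64),
        Ty::Float(FloatTy::F128) => Some(InlineAsmType::F128),
        Ty::Str => None,
    }
}
```

In this model, `asm_type_for(Ty::Float(FloatTy::F16))` gives `Some(InlineAsmType::F16)`. Whether a given register class on a given target accepts that type is a separate check.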

Sample for reference: https://rust.godbolt.org/z/zK4qha1qo

@rustbot label +T-compiler +E-Easy +F-f16_and_f128 +A-inline-assembly -needs-triage

@tgross35 I can try to submit a PR, can you give me some guidance?

Hi @lengrongfu, thanks for the interest!

This should be pretty easy I think. Start by making a test in tests/ui/asm/ that contains the assembly function from my original post. Make sure this fails when you run ./x t --stage 1 path/to/your/new/test.rs.

Then just find where the error is emitted (search the codebase for "cannot use value of type") and work backwards from that until the test passes. This will probably mean adding F16 to InlineAsmType and then chasing down errors.

We will need to make sure that this works on platforms with support (e.g. aarch64) but still fails on those without it (e.g. x86). Just focus on getting aarch64 to build first.
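As a rough mental model of the "works on aarch64, still fails on x86" requirement, the check could look something like the sketch below. Everything here is a made-up simplification: `Arch`, `AsmTy`, `float_reg_supported_types`, and `check_operand` are hypothetical names, the per-architecture lists are only the ones implied by this issue, and the error text is illustrative rather than the compiler's real diagnostic.

```rust
// Hypothetical, simplified model of per-architecture operand type checking.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Arch {
    AArch64,
    X86,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AsmTy {
    F16,
    F32,
    F64,
}

// Types accepted by a floating-point capable register class on each
// architecture in this model. Assumption: only aarch64 gains F16 for now.
pub fn float_reg_supported_types(arch: Arch) -> &'static [AsmTy] {
    match arch {
        Arch::AArch64 => &[AsmTy::F16, AsmTy::F32, AsmTy::F64],
        Arch::X86 => &[AsmTy::F32, AsmTy::F64],
    }
}

// Accepts the operand if the architecture lists its type, otherwise returns
// an error message (wording is illustrative only).
pub fn check_operand(arch: Arch, ty: AsmTy) -> Result<(), String> {
    if float_reg_supported_types(arch).contains(&ty) {
        Ok(())
    } else {
        Err(format!("type {:?} is not usable for registers on {:?}", ty, arch))
    }
}
```

With this model, `check_operand(Arch::AArch64, AsmTy::F16)` is `Ok(())` while `check_operand(Arch::X86, AsmTy::F16)` is an `Err`, which is the pair of behaviors the tests should pin down.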

There is a compiler help stream on Zulip (https://rust-lang.zulipchat.com/#narrow/stream/182449-t-compiler.2Fhelp). Feel free to ask there if you get stuck! It's also not a bad idea to post a draft PR as soon as you have some basic work done, even if it is not yet passing.

I think the E-easy label should be removed from this issue.

Fair enough - it is still pretty easy for a compiler change, but does require some background knowledge.