Enable `f16` and `f128` in inline assembly on platforms that support them

The following should work, but it fails with an error saying that `f16` is not usable for registers:

```rust
#![feature(f16, f128)]

use core::arch::asm;

#[inline(never)]
pub fn f32_to_f16(a: f32) -> f16 {
    a as f16
}

#[inline(never)]
pub fn f32_to_f16_asm(a: f32) -> f16 {
    let ret: f16;
    unsafe {
        // aarch64: narrow the single-precision input (`s` view of a SIMD/FP
        // register) into the half-precision output (`h` view).
        asm!(
            "fcvt    {ret:h}, {a:s}",
            a = in(vreg) a,
            ret = lateout(vreg) ret,
            options(nomem, nostack),
        );
    }

    ret
}
```

On aarch64 the first function generates:

```asm
example::f32_to_f16::hc897184dfb47f3d6:
        fcvt    h0, s0
        ret
```

`f16` should be accepted as a `vreg` operand type on aarch64 so that inline assembly can reproduce this code.
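For reference, `fcvt h0, s0` narrows an IEEE 754 binary32 value to binary16 (1 sign bit, 5 exponent bits, 10 fraction bits), rounding according to FPCR, which defaults to round-to-nearest-even. Below is a minimal stable-Rust sketch of that conversion on raw bits, assuming the default rounding mode. `f32_to_f16_bits` is a hypothetical helper, and its NaN handling (quiet the NaN, keep the top payload bits) is a simplification that may not match the hardware in every mode.

```rust
/// Convert an `f32` to IEEE 754 binary16 bits, rounding to nearest, ties to even.
/// Hypothetical helper for illustration only; not a library API.
fn f32_to_f16_bits(x: f32) -> u16 {
    let bits = x.to_bits();
    let sign = ((bits >> 16) & 0x8000) as u16; // move sign from bit 31 to bit 15
    let exp = ((bits >> 23) & 0xff) as i32; // biased f32 exponent
    let mant = bits & 0x007f_ffff; // 23 fraction bits

    // Infinity and NaN.
    if exp == 0xff {
        if mant == 0 {
            return sign | 0x7c00;
        }
        // Quiet NaN, keeping the top 10 payload bits (simplification).
        return sign | 0x7e00 | (mant >> 13) as u16;
    }

    // Re-bias the exponent from f32 (bias 127) to f16 (bias 15).
    let e = exp - 127 + 15;

    if e >= 0x1f {
        // Too large for f16: overflow to infinity.
        return sign | 0x7c00;
    }

    if e <= 0 {
        // The result is an f16 subnormal or zero.
        if e < -10 {
            // Below half of the smallest subnormal (2^-25): rounds to zero.
            return sign;
        }
        let m = mant | 0x0080_0000; // restore the implicit leading 1
        let shift = (14 - e) as u32; // 14..=24
        let mut half_m = m >> shift;
        let rem = m & ((1u32 << shift) - 1);
        let halfway = 1u32 << (shift - 1);
        if rem > halfway || (rem == halfway && (half_m & 1) == 1) {
            half_m += 1; // may carry into the smallest normal, 0x0400
        }
        return sign | half_m as u16;
    }

    // Normal result: keep the top 10 fraction bits, round on the 13 dropped bits.
    let mut h = ((e as u32) << 10) | (mant >> 13);
    let rem = mant & 0x1fff;
    if rem > 0x1000 || (rem == 0x1000 && (h & 1) == 1) {
        h += 1; // a carry may bump the exponent, or reach infinity (0x7c00)
    }
    sign | h as u16
}
```

For example, `f32_to_f16_bits(0.1)` yields `0x2e66`, and `65520.0` (the midpoint between the largest finite `f16`, 65504, and the next step up) rounds to infinity, `0x7c00`.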

The following other platforms also apparently have some level of instruction support, but are less well documented:

- arm-\*, armv7-\*, aarch64-\*: https://developer.arm.com/documentation/den0024/a/Porting-to-A64/Data-types
  - #126555
  - #127043
- PowerPC: the PowerISA apparently has a half-precision format according to section 7.3.2.1 of https://files.openpower.foundation/s/dAYSdGzTfW4j2r2, but I can't get LLVM to emit any instructions for it. According to others, the VSX feature on PowerISA 3.1+ has conversion support for `f16`, and the SVP64 feature (which I can't find documented anywhere) adds full hardware support.
- MIPS with the MSA extension...? Section 3.1 of https://s3-eu-west-1.amazonaws.com/downloads-mips/documents/MD00868-1D-MSA64-AFP-01.12.pdf says "16-bit floating-point storage format is supported through conversion instructions to/from 32-bit floating-point data." I am unsure whether its vector registers have any special support.
- riscv64gc with the Q (quad-precision, i.e. `f128`) extension: https://github.com/riscv/riscv-isa-manual/blob/riscv-isa-release-2023-05-23/src/q-st-ext.adoc
  - #126530
- x86 specifies an ABI for these types, and AVX512fp16 can use them.
  - #126417

Additionally, for f128:

- s390x supports `f128`, referred to as "BFP Extended Format" in https://publibfp.dhe.ibm.com/epubs/pdf/a227832c.pdf. I am not sure if this comes with any special instructions.
- PowerPC with `-Ctarget-cpu=pwr9` seems to have `f128` support via instructions like `xsaddqp`.

Tracking issue: #116909

I'm adding E-easy because a PR that only enables support for aarch64 should be fairly easy. Start around lines 65 to 66 of `compiler/rustc_hir_analysis/src/check/intrinsicck.rs` in the rust repository (at commit b54dd08):

```rust
ty::Float(FloatTy::F32) => Some(InlineAsmType::F32),
ty::Float(FloatTy::F64) => Some(InlineAsmType::F64),
```

and massage the new types in. Actually figuring out the rules for the rest of the platforms will be harder, but that can come later.
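The mapping change can be sketched with hypothetical, simplified stand-ins for rustc's `FloatTy` and `InlineAsmType` (the real compiler types have many more variants and live in different crates). The sketch extends the float arm with `F16` and `F128`, and adds a hypothetical `size_in_bytes` helper to show why new variants ripple outward: every exhaustive `match` on the enum stops compiling until it handles them.

```rust
// Hypothetical stand-ins for rustc's `FloatTy` and `InlineAsmType`; not the real types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FloatTy {
    F16,
    F32,
    F64,
    F128,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InlineAsmType {
    F16,
    F32,
    F64,
    F128,
}

impl InlineAsmType {
    /// Exhaustive match: adding a variant makes this fail to compile until handled.
    fn size_in_bytes(self) -> u64 {
        match self {
            InlineAsmType::F16 => 2,
            InlineAsmType::F32 => 4,
            InlineAsmType::F64 => 8,
            InlineAsmType::F128 => 16,
        }
    }
}

/// Map a float type to its inline-assembly type, now including f16 and f128.
fn float_asm_ty(ty: FloatTy) -> Option<InlineAsmType> {
    match ty {
        FloatTy::F16 => Some(InlineAsmType::F16),
        FloatTy::F32 => Some(InlineAsmType::F32),
        FloatTy::F64 => Some(InlineAsmType::F64),
        FloatTy::F128 => Some(InlineAsmType::F128),
    }
}
```

Mapping a type is only the first gate; whether a given register class accepts it is a separate, per-target question.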

Sample for reference: https://rust.godbolt.org/z/zK4qha1qo

@rustbot label +T-compiler +E-Easy +F-f16_and_f128 +A-inline-assembly -needs-triage

@tgross35 I can try to submit a PR. Can you give me some guidance?

Hi @lengrongfu, thanks for the interest!

This should be pretty easy, I think. Start by making a test in `tests/ui/asm/` that contains the assembly function from my original post. Make sure this fails when you run `./x t --stage 1 path/to/your/new/test.rs`.

Then just find where the error is emitted (search the codebase for "cannot use value of type") and work backwards from that until the test passes. This will probably mean adding `F16` to `InlineAsmType` and then chasing down errors.

We will need to make sure that this works on platforms with support (e.g. aarch64) but still fails on those without it (e.g. x86). Just focus on getting aarch64 to build first.
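The "works on aarch64, still fails on x86" requirement amounts to a per-target list of the types each register class accepts. The sketch below models that with hypothetical `Arch`, `AsmType`, `vreg_supported_types`, and `check_vreg_operand` names, none of which are rustc's real API; the error text is illustrative rather than copied from the compiler.

```rust
// Hypothetical model of per-target register-class gating; not rustc's real API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Arch {
    AArch64,
    X86_64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AsmType {
    F16,
    F32,
    F64,
}

impl AsmType {
    fn name(self) -> &'static str {
        match self {
            AsmType::F16 => "f16",
            AsmType::F32 => "f32",
            AsmType::F64 => "f64",
        }
    }
}

/// Types accepted by the float/vector register class on each target (illustrative).
fn vreg_supported_types(arch: Arch) -> &'static [AsmType] {
    match arch {
        // Proposed: aarch64 `vreg` also accepts f16 (the `h` register view).
        Arch::AArch64 => &[AsmType::F16, AsmType::F32, AsmType::F64],
        // Left unchanged in this sketch, so f16 is still rejected.
        Arch::X86_64 => &[AsmType::F32, AsmType::F64],
    }
}

/// Accept or reject an operand of type `ty` in the float/vector register class.
fn check_vreg_operand(arch: Arch, ty: AsmType) -> Result<(), String> {
    if vreg_supported_types(arch).contains(&ty) {
        Ok(())
    } else {
        Err(format!(
            "type `{}` cannot be used with this register class",
            ty.name()
        ))
    }
}
```

Keeping the allow-list per target means enabling aarch64 first changes one entry and leaves every other target's behavior, and its tests, untouched.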

There is a compiler help stream on Zulip (https://rust-lang.zulipchat.com/#narrow/stream/182449-t-compiler.2Fhelp); feel free to ask there if you get stuck! It's also not a bad idea to post a draft PR as soon as you have some basic work done, even if it is not yet passing.

I think the E-easy label should be removed from this issue.

Fair enough - it is still pretty easy for a compiler change, but does require some background knowledge.
