
Use fclass.{s|d|q} instruction for float point classification in RISC-V targets #73015


Open
luojia65 opened this issue Jun 5, 2020 · 6 comments
Labels
A-floating-point Area: Floating point numbers and arithmetic
C-enhancement Category: An issue proposing an enhancement or a PR with one.
I-slow Issue: Problems and improvements with respect to performance of generated code.
O-riscv Target: RISC-V architecture
T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@luojia65
Contributor

luojia65 commented Jun 5, 2020

Recently I was working on floating-point classification and found the function f32::classify useful. Regardless of instruction set architecture, this function is currently implemented in the standard library like this:

#[stable(feature = "rust1", since = "1.0.0")]
pub fn classify(self) -> FpCategory {
    const EXP_MASK: u32 = 0x7f800000;
    const MAN_MASK: u32 = 0x007fffff;
    
    let bits = self.to_bits();
    match (bits & MAN_MASK, bits & EXP_MASK) {
        (0, 0) => FpCategory::Zero,
        (_, 0) => FpCategory::Subnormal,
        (0, EXP_MASK) => FpCategory::Infinite,
        (_, EXP_MASK) => FpCategory::Nan,
        _ => FpCategory::Normal,
    }
}
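
For reference, a quick check of what each match arm above produces, using only stable std APIs (the chosen sample values are mine):

```rust
use std::num::FpCategory;

fn main() {
    // One sample value per category handled by the match arms above.
    assert_eq!(0.0f32.classify(), FpCategory::Zero);
    assert_eq!((f32::MIN_POSITIVE / 2.0).classify(), FpCategory::Subnormal);
    assert_eq!(f32::INFINITY.classify(), FpCategory::Infinite);
    assert_eq!(f32::NAN.classify(), FpCategory::Nan);
    assert_eq!(1.5f32.classify(), FpCategory::Normal);
}
```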

However, this standard library function compiles to a long sequence of instructions. On RISC-V RV64GC, it compiles into:

very long assembly code
example::classify_std:
  fmv.w.x ft0, a0
  fsw ft0, -20(s0)
  lwu a0, -20(s0)
  sd a0, -48(s0)
  j .LBB0_1
.LBB0_1:
  lui a0, 2048
  addiw a0, a0, -1
  ld a1, -48(s0)
  and a0, a0, a1
  lui a2, 522240
  and a2, a2, a1
  sw a0, -32(s0)
  sw a2, -28(s0)
  mv a2, zero
  bne a0, a2, .LBB0_3
  j .LBB0_2
.LBB0_2:
  lw a0, -28(s0)
  mv a1, zero
  beq a0, a1, .LBB0_7
  j .LBB0_3
.LBB0_3:
  lwu a0, -28(s0)
  mv a1, zero
  sd a0, -56(s0)
  beq a0, a1, .LBB0_8
  j .LBB0_4
.LBB0_4:
  lui a0, 522240
  ld a1, -56(s0)
  bne a1, a0, .LBB0_6
  j .LBB0_5
.LBB0_5:
  lw a0, -32(s0)
  mv a1, zero
  beq a0, a1, .LBB0_9
  j .LBB0_10
.LBB0_6:
  addi a0, zero, 4
  sb a0, -33(s0)
  j .LBB0_11
.LBB0_7:
  addi a0, zero, 2
  sb a0, -33(s0)
  j .LBB0_11
.LBB0_8:
  addi a0, zero, 3
  sb a0, -33(s0)
  j .LBB0_11
.LBB0_9:
  addi a0, zero, 1
  sb a0, -33(s0)
  j .LBB0_11
.LBB0_10:
  mv a0, zero
  sb a0, -33(s0)
  j .LBB0_11
.LBB0_11:
  lb a0, -33(s0)
  ret

To solve this problem, RISC-V provides the fclass.{s|d|q} instructions. According to Section 11.9 of the RISC-V specification, the instruction fclass.s rd, rs1 examines rs1 as a 32-bit floating-point number and writes a mask describing its class into rd. We can then use the value in rd to decide which variant of the standard library's FpCategory enum to return.
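
To make the mapping concrete, here is a software model of what fclass.s computes (the function name and structure are mine, not from the spec). The instruction returns a one-hot mask: bit 0 is −infinity, bits 1/6 are negative/positive normal, bits 2/5 negative/positive subnormal, bits 3/4 −0/+0, bit 7 is +infinity, and bits 8/9 are signaling/quiet NaN:

```rust
/// Software model of `fclass.s`: returns the 10-bit one-hot class mask
/// that the hardware instruction would write into `rd`.
fn fclass_s_model(x: f32) -> u32 {
    let bits = x.to_bits();
    let sign = bits >> 31 != 0;
    let exp = (bits >> 23) & 0xff;
    let man = bits & 0x007f_ffff;
    let bit = match (exp, man) {
        (0xff, 0) => if sign { 0 } else { 7 },         // -inf / +inf
        (0xff, m) => if m >> 22 == 1 { 9 } else { 8 }, // quiet / signaling NaN
        (0x00, 0) => if sign { 3 } else { 4 },         // -0 / +0
        (0x00, _) => if sign { 2 } else { 5 },         // negative / positive subnormal
        _ => if sign { 1 } else { 6 },                 // negative / positive normal
    };
    1 << bit
}

fn main() {
    assert_eq!(fclass_s_model(1.0), 1 << 6);           // positive normal
    assert_eq!(fclass_s_model(-0.0), 1 << 3);          // negative zero
    assert_eq!(fclass_s_model(f32::INFINITY), 1 << 7); // +infinity
    assert_eq!(fclass_s_model(f32::NAN), 1 << 9);      // quiet NaN
}
```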

I'd like to illustrate this procedure in Rust. The new implementation looks like this:

pub fn classify_riscv_rvf(input: f32) -> FpCategory {
    let ans: usize;
    // step 1: map this f32 value to the RISC-V-defined class mask
    // this step could be a compiler built-in
    unsafe { llvm_asm!(
        "fclass.s a0, fa0"
        :"={a0}"(ans)
        :"{fa0}"(input)
    ) };
    // step 2: convert from return flags to FpCategory enum value
    if ans & 0b10000001 != 0 {
        return FpCategory::Infinite;
    }
    if ans & 0b01000010 != 0 {
        return FpCategory::Normal;
    }
    if ans & 0b00100100 != 0 {
        return FpCategory::Subnormal;
    }
    if ans & 0b00011000 != 0 {
        return FpCategory::Zero;
    }
    FpCategory::Nan
}

It compiles into the following assembly, which is shorter and should execute faster:

example::classify_riscv_rvf:
        fclass.s a0, fa0
        andi    a2, a0, 129
        addi    a1, zero, 1
        beqz    a2, .LBB0_2
.LBB0_1:
        add     a0, zero, a1
        ret
.LBB0_2:
        andi    a2, a0, 66
        addi    a1, zero, 4
        bnez    a2, .LBB0_1
        andi    a2, a0, 36
        addi    a1, zero, 3
        bnez    a2, .LBB0_1
        andi    a0, a0, 24
        snez    a0, a0
        slli    a1, a0, 1
        add     a0, zero, a1
        ret

For f64 values we can use the fclass.d instruction instead of fclass.s, and if we ever introduce an f128 primitive type, there is also a fclass.q instruction. Using these instructions would speed up this function on RISC-V platforms; the improvement is especially significant for embedded devices. I suggest changing the implementation of this function in the standard library. We could implement it in either of the following ways:

  1. Implement fclassf32 and fclassf64 intrinsic functions in core::intrinsics and call them from f32::classify and f64::classify. These intrinsics can be lowered to the special instruction on RISC-V, with a fallback on other platforms;
  2. Use inline assembly directly in the standard library, guarded by #[cfg(..)] so that it is compiled only on RISC-V targets with the F or D floating-point extension respectively, falling back to the current implementation on other platforms.
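
As a rough sketch of option 2 (everything below is illustrative: the function and helper names are mine, the cfg condition is an assumption about how the gate might look, and only the fallback path is exercised off RISC-V):

```rust
use std::num::FpCategory;

// Decode the one-hot fclass mask into an FpCategory (same bit masks as above).
#[allow(dead_code)]
fn decode_fclass_mask(mask: usize) -> FpCategory {
    if mask & 0b10_000_0001 != 0 { FpCategory::Infinite }       // bits 0, 7
    else if mask & 0b01_000_0010 != 0 { FpCategory::Normal }    // bits 1, 6
    else if mask & 0b00_100_0100 != 0 { FpCategory::Subnormal } // bits 2, 5
    else if mask & 0b00_001_1000 != 0 { FpCategory::Zero }      // bits 3, 4
    else { FpCategory::Nan }                                    // bits 8, 9
}

// Fast path: only compiled on RISC-V targets with the F extension.
#[cfg(all(any(target_arch = "riscv32", target_arch = "riscv64"),
          target_feature = "f"))]
fn classify_f32(x: f32) -> FpCategory {
    let mask: usize;
    unsafe {
        std::arch::asm!("fclass.s {}, {}", out(reg) mask, in(freg) x,
                        options(pure, nomem, nostack));
    }
    decode_fclass_mask(mask)
}

// Fallback: the portable bit-mask implementation everywhere else.
#[cfg(not(all(any(target_arch = "riscv32", target_arch = "riscv64"),
              target_feature = "f")))]
fn classify_f32(x: f32) -> FpCategory {
    x.classify()
}

fn main() {
    assert_eq!(classify_f32(f32::INFINITY), FpCategory::Infinite);
    assert_eq!(classify_f32(0.0), FpCategory::Zero);
    assert_eq!(classify_f32(1.0), FpCategory::Normal);
}
```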
@jonas-schievink jonas-schievink added C-enhancement Category: An issue proposing an enhancement or a PR with one. T-libs Relevant to the library team, which will review and decide on the PR/issue. O-riscv Target: RISC-V architecture I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Jun 5, 2020
@nagisa
Member

nagisa commented Jun 5, 2020

Would probably start with adding a target-specific intrinsic for this instruction to LLVM.

@luojia65
Contributor Author

luojia65 commented Jun 5, 2020

Yes, I think so. Is there any existing work in LLVM? @nagisa

@nagisa
Member

nagisa commented Jun 5, 2020

Not that I’m aware of.

@nbdd0121
Contributor

In the new asm! syntax:

asm!(
    "fclass.s {}, {}",
    out(reg) ans,
    in(freg) input,
    options(pure, nomem, nostack)
);

@nagisa is there any reason to prefer adding an intrinsic to LLVM rather than just using inline assembly?

@nagisa
Member

nagisa commented Jun 27, 2020

An LLVM intrinsic allows LLVM to properly understand the underlying operation, and optimise it better (e.g. const-evaluating it is the most trivial example).

@nbdd0121
Contributor

nbdd0121 commented Jun 27, 2020

I just did a test and found that the classify function we have in libstd does the job pretty well: https://godbolt.org/z/b_Fyh8. I suspect the original "long assembly" was compiled without optimisations enabled.

I think the generated code is good enough, and we probably don't need to bother adding an intrinsic or using assembly.

@rustbot modify labels: -I-heavy

@rustbot rustbot removed the I-heavy Issue: Problems and improvements with respect to binary size of generated code. label Jun 27, 2020
@workingjubilee workingjubilee added the A-floating-point Area: Floating point numbers and arithmetic label Jan 27, 2023