StanfordAHA/lassen

No way to map FP round currently to lassen.

Opened this issue · 13 comments

Being able to do a FP round will give us round, ceil, and floor which are required from the Halide applications.

To be very clear here, the round function that we need takes in a float and produces a float. It does not cast it into an int.

Round can be implemented (f is a bloat number)
fi = int(f)
ff = frac(f)
fi = fi + (ff>=0x80)
fr = int2float(fi)

I have added int2float, so this is now doable.

@rdaly525 so...looking at the comments...this is resolved?

The explicit task is the following:

  • Write 'Round' complex op in lassen/stdlib
  • Write a test for Round using hwtypes.FPVector (please do not use float2bfbin/bfbin2float)
  • Write 'Ceil' complex op in lassen/stdlib
  • Write a test for Ceil using hwtypes.FPVector (please do not use float2bfbin/bfbin2float)
  • Write 'Floor' complex op in lassen/stdlib
  • Write a test for Floor using hwtypes.FPVector (please do not use float2bfbin/bfbin2float)

@nikhilbhagdikar, if you need an example of how to use FPVector, you can refer to tests/test_pe.py

I can try to take on round

We can partially implement round by cascading FCnvInt2F and FGetFint (int2float and float2int). However it only works for a sub range of floats* also it implements round to 0 (floor on positive, ceil on negative). So round(-.7) == round(.7) == 0 and round(1.2) == round(1.7) == 1

*empirically it works (-BOUND, BOUND) for BOUND=215-26-1

I think we could implement a full range round to 0 with the following:

def round(x):
  if exponent(x) > mantissa_size(x):
     return x #x is already an integer
  else:
     return i2f(f2i(x))

However this will take 5 PEs (extract exponent, compare, mux, i2f, f2i) instead of 2.

To get a proper floor we could use the following algorithm:

def floor(x):
   sign = x & (1 << 15) #gets the sign bit
   a_x = abs(x)
   r_x = round(a_x)
   return r_x | sign #sets the sign bit

which takes 3 additional PEs over round (&, abs, |)
ceil would be 1 PE over floor as we could use basically the same procedure but need the negative absolute value.

round to nearest could also be implementable as 1 PE over floor

def round_nearest(x):
  return floor(x+0.5)

Long story short:
round_towards_zero(x) where x is guaranteed to be in some limited range takes 2 PEs
round_towards_zero(x) where x can be anything takes 5 PEs
floor(x) +3 PEs over round_towards_zero
ceil(x) +4 PEs over round_towards_zero
round_nearest(x) +4 PEs over round_towards_zero

@nikhilbhagdikar If you have any ideas on how to do any of this with fewer PEs let me know

@cdonovick, thanks for doing this analysis. @jeffsetter, could you comment on the range issue for halide applications? Specifically, what sets of applications could we get away with using the Bounded round?

*empirically it works (-BOUND, BOUND) for BOUND=2^15-2^6-1

Since the range of bfloat is approximately +- 2^127, this rounding loses many orders of magnitude of the bfloat range. Essentially this renders the rounding to integers and omits rounding of large floating point numbers entirely.

Sorry, I missed that the range does not have to be bounded, but instead takes 5 PEs. I believe we are most likely to use that mapping in most cases.

It is unclear to me if the above strategy is the only one possible for rounding. The algorithm I use in Halide is below, which may be useful in finding a more efficient mapping.
https://github.com/StanfordAHA/Halide-to-Hardware/blob/handcrafted/src/EmulateFloat16Math.cpp#L220

@jeffsetter just to be clear we talking about implementing a round operations bfloat->bfloat (round(1.2) == 1.0, round(12.0) == 12.0) where only fractional parts (mantissa bits that have value < 1) are rounded. It looks like you are rounding ieee 754 singles to bfloat which is a very different task.

Github is stupid, partially resolves != resolves