Lassen

The PE for the second generation CGRA (garnet).

Instructions to run

git clone https://github.com/phanrahan/peak.git
cd peak
python setup.py install --user

git clone git@github.com:StanfordAHA/lassen.git
cd lassen
python setup.py install --user

pytest

Date	Person	Status	Task
Feb 13	Alex	Complete	Add BFloat16 add and multiply functional model
			Add Float (configurable width?) type to CoreIR, create float add and multiply operator implementations in CoreIR by wrapping designware module
			Add Float type to Magma
			Generate verilog from Peak
			Figure out what to do about different latencies (if they are different for the floating point ops)
			Add multi-PE support to Peak
			Change CoreIR mapper, PnR to support multi-PE

Compared to the first generation PE (Diablo), Lassen shall have two new features:

BFloat addition and multiplication in every PE
Transcendental functions (div, log, e^x, sin, pow) on BFloats implemented using a cluster of PEs and memory. The PEs and memory will get some small special instructions to support these operations.
It will not support denormalized numbers

+/- 1.mantissa * 2 ^ exponent, where mantissa is a 7 bit unsigned integer, and exponent is 8 bit signed integer (. dot means decimal point)

Implements out = a/b, where a, b and out are all BFloats
It is performed using out = a * (1/b), the BFloat multiply already exists as an instruction. So we basically have to implement reciprocal, 1/b
Let us say b = +/- 1.f * 2 ^ x
1/b = +/- (1/1.f) * 2 ^ (-x)
(1/1.f) is stored as a Bfloat in a look up table in a memory tile. It is a table with 128 entries as f is 7 bits. So you read this entry out, let us say it is some +/- 1.g * 2 ^ y
Then 1/b = +/- 1.g * 2 ^ y * 2 ^ (-x) = +/- 1.g * 2 ^ (y - x)
We implement subtraction of the exponent portions of two BFloats as a new instruction in the PE
So div boils down to a 16 bit lookup from a 128 entry table, one 8 bit signed integer subtraction and 1 BFloat multiply
We must take into account corner cases when a, b, out are not normal numbers

Implements out = ln(a) where a and out are BFloats
Let us say a = +/- 1.f * 2 ^ x
ln(a) should error out when a < 0
Otherwise, ln(a) = ln(1.f * 2 ^ x) = ln(1.f) + x * ln(2)
ln(1.f) is a look up table, similar to what we did for div
We add a special instruction to PE to convert 8 bit signed integer x to a BFloat, ln(2) is also a BFloat
So ln boils down to a lookup, 8 bit signed integer to BFloat conversion, a BFloat multiply and a BFloat add

e^x = (2^(1/ln(2)))^x = 2^(x/ln(2)) = 2^y
We can get y with existing instructions. Bfloat multiply it with a constant 1/ln(2).
Let us just work out the case when y is positive. If y is negative 2^y = 1/(2^-y) and we have already implemented reciprocal.
Convert Bfloat y to a+b where a is integer part and b is fractional part. When b is smaller than 2^-6 it is zero in Bfloat16.
We look up 2^b from a table, this has 64 entries.
Then we increment exponent of the looked up number by a.

To compute sin(x), first we calculate y = x mod (pi/2)
If y is less than some number, return y, else lookup in table. This gets rid of most negative exponents, and table it basically dependent on mantissa.

a^x = e^(ln(a^x)) = e^(x * ln(a))
We already have BFloat multiply, ln, and exponential, so we can implement power.