Blarney is a Haskell library for hardware description that builds a range of HDL abstractions on top of a small set of core circuit primitives. It is a modern variant of Lava, requiring GHC 8.6.1 or later. Below, we introduce the library by example, supplementing the Haddock docs.
- Example 1: Two-sort
- Example 2: Bubble sort
- Example 3: Polymorphism
- Example 4: Mutable registers
- Example 5: Queues
- Example 6: Mutable wires
- Example 7: Recipes
- Example 8: Bits class
- Example 9: FShow class
- Example 10: Bit selection
- Example 11: Block RAMs
- Example 12: Streams
- Example 13: Modular compilation
- Example 14: Master-slave pattern
- Example 15: Bit-string pattern matching
- Example 16: Tiny 8-bit CPU
- Example 17: 32-bit RISC-V CPU
Example 1: Two-sort
Sorting makes for a good introduction to the library. Let's start with perhaps the simplest kind of sorter possible: one that sorts just two inputs. Given a pair of 8-bit values, the function `twoSort` returns the sorted pair.
```haskell
import Blarney

twoSort :: (Bit 8, Bit 8) -> (Bit 8, Bit 8)
twoSort (a, b) = a .<. b ? ((a, b), (b, a))
```
This definition makes use of three Blarney constructs: the `Bit` type for bit vectors (parameterised by the size of the vector); the unsigned comparison operator `.<.`; and the ternary conditional operator `?`.
A quick test bench to check that it works:
```haskell
top :: Module ()
top = always do
  display "twoSort (1,2) = " (twoSort (1,2))
  display "twoSort (2,1) = " (twoSort (2,1))
  finish
```
We use Blarney's `always` construct
```haskell
always :: Action a -> Module a
```
which performs the given action on every clock cycle. Blarney actions include statements for displaying values during simulation (`display`), terminating the simulator (`finish`), and mutating state (see below). All statements in an `Action` execute in parallel, within a clock cycle. We can generate Verilog for the test bench as follows.
```haskell
main :: IO ()
main = writeVerilogTop top "top" "/tmp/twoSort/"
```
Assuming the above code is in a file named `Sorter.hs`, it can be compiled at the command line using
```
> blc Sorter.hs
```
where `blc` stands for Blarney compiler. This is just a script that invokes GHC with the appropriate compiler flags. For it to work, the `BLARNEY_ROOT` environment variable needs to be set to the root of the repository, and `BLARNEY_ROOT/Scripts` must be in your `PATH`.
Running the resulting executable `./Sorter` will produce Verilog in the `/tmp/twoSort` directory, including a makefile to build a Verilator simulator (`sudo apt-get install verilator`). The simulator can be built and run as follows.
```
> cd /tmp/twoSort
> make
> ./top
twoSort (1,2) = (01,02)
twoSort (2,1) = (01,02)
```
Looks like `twoSort` is working!
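As a cross-check, the behaviour of `twoSort` can be mirrored by a small plain-Haskell model that runs without Blarney. This is a sketch, not library code: `Bool` stands in for `Bit 1`, `Word8` stands in for `Bit 8`, and both the `(?)` operator and the name `twoSortModel` are hypothetical stand-ins defined here.

```haskell
import Data.Word (Word8)

-- Hypothetical software stand-in for Blarney's ternary operator:
-- select the first component when the condition holds.
infixl 3 ?
(?) :: Bool -> (a, a) -> a
c ? (t, e) = if c then t else e

-- Plain-Haskell model of twoSort on 8-bit values
twoSortModel :: (Word8, Word8) -> (Word8, Word8)
twoSortModel (a, b) = a < b ? ((a, b), (b, a))
```

The fixity `infixl 3` is chosen here only so that `a < b ? ...` parses as `(a < b) ? ...`; Blarney's own fixity for `?` may differ.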
Example 2: Bubble sort
We can build a general N-element sorter by connecting together multiple two-sorters. One of the simplest ways to do this is the bubble sort network. The key component of this network is a function `bubble` that takes a list of inputs and returns a new list in which the smallest element comes first (the smallest element "bubbles" to the front).
```haskell
bubble :: [Bit 8] -> [Bit 8]
bubble [] = []
bubble [x] = [x]
bubble (x:y:rest) = bubble (small:rest) ++ [large]
  where (small, large) = twoSort (x, y)
```
If we repeatedly call `bubble`, then we end up with a sorted list.
```haskell
sort :: [Bit 8] -> [Bit 8]
sort [] = []
sort xs = smallest : sort rest
  where smallest:rest = bubble xs
```
Running the test bench
```haskell
top :: Module ()
top = always do
  let inputs = [3, 4, 1, 0, 2]
  display "sort " inputs " = " (sort inputs)
  finish
```
in simulation yields:
```
sort [03,04,01,00,02] = [00,01,02,03,04]
```
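One simulation run checks a single input. To gain more confidence that the bubble network sorts every input, the same recursive structure can be run as ordinary Haskell and compared against `Data.List.sort` on every permutation of a small list. This is a sketch that assumes the software model faithfully reflects the circuit; `twoSortModel`, `bubbleModel`, `networkSort` and `agreesOnAllPermutations` are hypothetical names, not Blarney functions.

```haskell
import Data.List (permutations, sort)
import Data.Word (Word8)

-- Software model of the two-sorter on 8-bit values
twoSortModel :: (Word8, Word8) -> (Word8, Word8)
twoSortModel (a, b) = if a < b then (a, b) else (b, a)

-- Same structure as the Blarney bubble: the smallest element ends up first
bubbleModel :: [Word8] -> [Word8]
bubbleModel [] = []
bubbleModel [x] = [x]
bubbleModel (x : y : rest) = bubbleModel (small : rest) ++ [large]
  where
    (small, large) = twoSortModel (x, y)

-- Repeatedly extract the smallest element, as in the Blarney sort
networkSort :: [Word8] -> [Word8]
networkSort [] = []
networkSort xs = case bubbleModel xs of
  smallest : rest -> smallest : networkSort rest
  [] -> []

-- True when the network agrees with Data.List.sort on every permutation of xs
agreesOnAllPermutations :: [Word8] -> Bool
agreesOnAllPermutations xs = all (\p -> networkSort p == sort xs) (permutations xs)
```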
To see that the `sort` function really is describing a circuit, let's draw the circuit diagram for a 5-element bubble sorter.
```
           -->.
              |
           -->+---.
              |   |
Inputs     -->+---+---.
              |   |   |
           -->+---+---+---.
              |   |   |   |
           -->+---+---+---+---.
              |   |   |   |   |
              v   v   v   v   v
                  Outputs
```
The input list is supplied on the left, and the sorted output list is produced at the bottom. Each `+` denotes a two-sorter that takes inputs from the top and the left, and produces the smaller value to the bottom and the larger value to the right.
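The diagram contains 4 + 3 + 2 + 1 = 10 two-sorters. A small instrumented model (plain Haskell rather than Blarney, with the hypothetical names `bubbleCount` and `sortCount`) shows that the recursive definitions of `bubble` and `sort` use n(n-1)/2 two-sorters for n inputs, matching the triangular shape above.

```haskell
import Data.Word (Word8)

-- bubble, instrumented to also return the number of two-sorters it uses
bubbleCount :: [Word8] -> ([Word8], Int)
bubbleCount [] = ([], 0)
bubbleCount [x] = ([x], 0)
bubbleCount (x : y : rest) = (ys ++ [large], n + 1)
  where
    (small, large) = if x < y then (x, y) else (y, x)
    (ys, n) = bubbleCount (small : rest)

-- sort, instrumented the same way: one bubble pass per output element
sortCount :: [Word8] -> ([Word8], Int)
sortCount [] = ([], 0)
sortCount xs = case bubbleCount xs of
  (smallest : rest, n) ->
    let (sorted, m) = sortCount rest
    in (smallest : sorted, n + m)
  ([], n) -> ([], n)
```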
See "The design and verification of a sorter core" for a more in-depth exploration of sorting circuits in Haskell.
Example 3: Polymorphism
For simplicity, we've made our sorter specific to lists of 8-bit values. But if we look at the types of the primitive functions it uses, we can see that it actually has a more general type.
```haskell
(.<.) :: Cmp a => a -> a -> Bit 1
(?)   :: Bits a => Bit 1 -> (a, a) -> a
```
So `.<.` can be used on any type in the `Cmp` (comparator) class. Similarly, `?` can be used on any type in the `Bits` class (which allows serialisation to a bit vector and back again). So a more generic definition of `twoSort` would be:
```haskell
twoSort :: (Bits a, Cmp a) => (a, a) -> (a, a)
twoSort (a, b) = a .<. b ? ((a, b), (b, a))
```
Indeed, this would be the type inferred by the Haskell compiler if no type signature was supplied.
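The same inference happens in plain Haskell, where the standard `Ord` class plays a role loosely analogous to `Cmp`. In this sketch (ordinary Haskell, not Blarney; `twoSortAny` and `examples` are hypothetical names), the function is written without a signature and GHC infers `Ord a => (a, a) -> (a, a)`, so it works on characters, strings and tuples alike.

```haskell
-- No type signature: GHC infers twoSortAny :: Ord a => (a, a) -> (a, a)
twoSortAny (a, b) = if a < b then (a, b) else (b, a)

-- The generic version used at three different types
examples :: ((Char, Char), (String, String), ((Int, Char), (Int, Char)))
examples =
  ( twoSortAny ('z', 'a')
  , twoSortAny ("pear", "apple")
  , twoSortAny ((2, 'x'), (1, 'y'))
  )
```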
Example 4: Mutable registers
So far, we've only seen `display` and `finish` actions inside a Blarney module. Blarney also supports the creation and assignment of registers. To illustrate, here is a module that creates a 4-bit `cycleCount` register and increments it on each cycle, stopping when it reaches 10.
```haskell
top :: Module ()
top = do
  -- Create a register
  cycleCount :: Reg (Bit 4) <- makeReg 0

  always do
    -- Increment on every cycle
    cycleCount <== cycleCount.val + 1

    -- Display value on every cycle
    display "cycleCount = %0d" (cycleCount.val)

    -- Terminate simulation when count reaches 10
    when (cycleCount.val .==. 10) do
      display "Finished"
      finish
```
This example introduces a number of new library functions: `makeReg` creates a register, initialised to the given value; `val` returns the value of a register; the `.` operator is defined by Blarney as reverse function application rather than the usual function composition; and `when` allows conditional actions to be introduced.
One can also use `if`/`then`/`else` in an `Action` context, thanks to Haskell's rebindable syntax feature.
```haskell
-- Terminate simulation when count reaches 10
if cycleCount.val .==. 10
  then do
    display "Finished"
    finish
  else
    display "Not finished"
```
Running `top` in simulation gives
```
cycleCount = 0
cycleCount = 1
cycleCount = 2
cycleCount = 3
cycleCount = 4
cycleCount = 5
cycleCount = 6
cycleCount = 7
cycleCount = 8
cycleCount = 9
cycleCount = 10
Finished
```
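The trace can be reproduced with a small cycle-level model in plain Haskell. This is an illustrative sketch, not how Blarney simulates. It assumes that a register write becomes visible on the next cycle and that `display` output within one cycle appears in program order. `cycleTrace` is a hypothetical name.

```haskell
import Data.Word (Word8)

-- Hypothetical cycle-level model of the cycleCount example (not Blarney code).
-- The register value read during a cycle is the value written on the
-- previous cycle, and the 4-bit register wraps around at 16.
cycleTrace :: [String]
cycleTrace = go 0
  where
    go :: Word8 -> [String]
    go val
      | val == 10 = [line, "Finished"] -- finish stops the simulation
      | otherwise = line : go ((val + 1) `mod` 16)
      where
        line = "cycleCount = " ++ show val
```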
Example 5: Queues
Queues (also known as FIFOs) are a commonly used abstraction in hardware design. Blarney provides a range of different queue implementations, all of which implement the following interface, available when importing `Blarney.Queue`.
```haskell
-- Queue interface
data Queue a =
  Queue {
    notEmpty :: Bit 1          -- Is the queue non-empty?
  , notFull  :: Bit 1          -- Is there any space in the queue?
  , enq      :: a -> Action () -- Insert an element (assuming notFull)
  , deq      :: Action ()      -- Remove the first element (assuming canDeq)
  , canDeq   :: Bit 1          -- Guard on the deq and first methods
  , first    :: a              -- View the first element (assuming canDeq)
  }
```
The type `Queue a` represents a queue holding elements of type `a`, and its fields provide the standard operations on queues. The `enq` method should only be called when `notFull` is true, and the `deq` method should only be called when `canDeq` is true. Similarly, the `first` element of the queue is only valid when `canDeq` is true.
Below, we present the simplest possible implementation of a one-element queue.
```haskell
import Blarney.Queue

-- Simple one-element queue implementation
makeSimpleQueue :: Bits a => Module (Queue a)
makeSimpleQueue = do
  -- Register holding the one element
  reg :: Reg a <- makeReg dontCare

  -- Register defining whether or not queue is full
  full :: Reg (Bit 1) <- makeReg 0

  -- Methods
  let notFull = full.val .==. 0
  let notEmpty = full.val .==. 1
  let enq a = do
        reg <== a
        full <== 1
  let deq = full <== 0
  let canDeq = full.val .==. 1
  let first = reg.val

  -- Return interface
  return (Queue notEmpty notFull enq deq canDeq first)
```
The following simple test bench illustrates how to use a queue.
```haskell
-- Small test bench for queues
top :: Module ()
top = do
  -- Instantiate a queue of 8-bit values
  queue :: Queue (Bit 8) <- makeSimpleQueue

  -- Create an 8-bit count register
  count :: Reg (Bit 8) <- makeReg 0

  always do
    count <== count.val + 1

    -- Writer side
    when (queue.notFull) do
      enq queue (count.val)
      display "Enqueued " (count.val)

    -- Reader side
    when (queue.canDeq) do
      deq queue
      display "Dequeued " (queue.first)

    -- Terminate after 100 cycles
    when (count.val .==. 100) finish
```
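It is worth predicting what this test bench does. Because `notFull` depends only on `full.val`, the one-element queue cannot accept a new element on the same cycle that its current element is dequeued, so the writer and the reader should take turns. The plain-Haskell model below captures that alternation. Its names (`Event`, `simulateQueueBench`) are hypothetical. It assumes that register writes take effect on the next cycle and that the bench stops on the cycle where `count` reaches 100.

```haskell
-- Events produced on each cycle of the queue test bench
data Event = Enqueued Int | Dequeued Int
  deriving (Eq, Show)

-- Hypothetical cycle-level model of makeSimpleQueue driven by the test bench.
-- The queue state is Nothing (empty) or Just x (full, holding x).
simulateQueueBench :: Int -> [Event]
simulateQueueBench lastCycle = go 0 Nothing
  where
    go :: Int -> Maybe Int -> [Event]
    go count slot
      | count > lastCycle = []
      | otherwise = case slot of
          -- notFull holds: the writer enqueues the current count
          Nothing -> Enqueued count : go (count + 1) (Just count)
          -- canDeq holds (and notFull does not): the reader dequeues
          Just x -> Dequeued x : go (count + 1) Nothing
```

Under this model, only even values of `count` pass through the queue. A queue that allowed an enqueue and a dequeue on the same cycle would avoid this loss of throughput.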
Example 6: Mutable wires
Wires are a feature of the `Action` monad that offer a way for separate action blocks to communicate within the same clock cycle. Whereas assignment to a register becomes visible on the clock cycle after the assignment occurs, assignment to a wire is visible on the same cycle as the assignment. If no assignment is made to a wire on a particular cycle, then the wire emits its default value on that cycle. When multiple assignments to the same wire occur on the same cycle, the wire emits the bitwise disjunction (OR) of all the assigned values.
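The rule for what a wire emits on a given cycle fits in a two-line function. The following is a plain-Haskell sketch, not Blarney's implementation. It uses the hypothetical name `wireOutput` and takes `Word8` as the wire's type.

```haskell
import Data.Bits ((.|.))
import Data.Word (Word8)

-- Value emitted by a wire on one cycle, given its default value and
-- the list of values assigned to it during that cycle.
wireOutput :: Word8 -> [Word8] -> Word8
wireOutput dflt [] = dflt                  -- no assignment: default value
wireOutput _ writes = foldr (.|.) 0 writes -- otherwise: bitwise OR of all writes
```

Note that the assigned values replace the default rather than being ORed with it.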
To illustrate, let's implement an n-bit counter module that supports increment and decrement operations.
```haskell
-- Interface for an n-bit counter
data Counter n =
  Counter {
    inc    :: Action ()
  , dec    :: Action ()
  , output :: Bit n
  }
```
We'd like the counter to support parallel calls to `inc` and `dec`. That is, if `inc` and `dec` are called on the same cycle, then the counter's `output` is unchanged. We'll achieve this using wires.
```haskell
makeCounter :: KnownNat n => Module (Counter n)
makeCounter = do
  -- State
  count :: Reg (Bit n) <- makeReg 0

  -- Wires
  incWire :: Wire (Bit 1) <- makeWire 0
  decWire :: Wire (Bit 1) <- makeWire 0

  always do
    -- Increment
    when (incWire.val .&. decWire.val.inv) do
      count <== count.val + 1

    -- Decrement
    when (incWire.val.inv .&. decWire.val) do
      count <== count.val - 1

  -- Interface
  let inc = incWire <== 1
  let dec = decWire <== 1
  let output = count.val

  return (Counter inc dec output)
```
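The next-state behaviour of `makeCounter` can be summarised by a pure function. This is a sketch, not something generated by Blarney. It uses the hypothetical name `counterNext` and models `Bit n` as an `Integer` reduced modulo 2^n.

```haskell
-- Next value of an n-bit counter register, given whether inc and dec
-- were called on the current cycle.
counterNext :: Int -> Bool -> Bool -> Integer -> Integer
counterNext n incCalled decCalled count
  | incCalled && not decCalled = (count + 1) `mod` modulus
  | decCalled && not incCalled = (count - 1) `mod` modulus
  | otherwise = count -- neither, or both in parallel: unchanged
  where
    modulus = 2 ^ n
```

Folding `counterNext 4` over the per-cycle calls `[(True, False), (True, False), (True, True)]`, starting from 0, gives 2.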
Example 7: Recipes
State machines are a common way of defining the control path of a circuit. They are typically expressed by doing case analysis of the current state and manually setting the next state. Quite often, however, they can be expressed more neatly as a `Recipe`: a simple imperative language with various control-flow statements.
```haskell
data Recipe =
    Skip                 -- Do nothing (in zero cycles)
  | Tick                 -- Do nothing (in one cycle)
  | Action (Action ())   -- Perform action (in one cycle)
  | Seq [Recipe]         -- Execute recipes in sequence
  | Par [Recipe]         -- Fork-join parallelism
  | If (Bit 1) Recipe    -- Conditional recipe
  | While (Bit 1) Recipe -- Loop
  | Wait (Bit 1)         -- Block until condition holds
```
To illustrate, here is a small state machine that computes the factorial of 10.
```haskell
fact :: Module ()
fact = do
  -- State
  n :: Reg (Bit 32) <- makeReg 0
  acc :: Reg (Bit 32) <- makeReg 1

  -- Compute factorial of 10
  let recipe =
        Seq [
          Action do
            n <== 10
        , While (n.val .>. 0) (
            Action do
              n <== n.val - 1
              acc <== acc.val * n.val
          )
        , Action do
            display "fact(10) = %0d" (acc.val)
            finish
        ]

  runOnce recipe
```
Blarney provides a lightweight compiler for the `Recipe` language (under 100 lines of code), which we invoke above through the call to `runOnce`.
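The two assignments in the loop body of `fact` happen in parallel, so `acc` is multiplied by the value of `n` from before the decrement. The plain-Haskell sketch below uses the hypothetical names `factParallel` and `factSequential`. It contrasts this with a sequential reading of the same loop body, which would multiply by the already-decremented value and end at zero.

```haskell
-- Parallel semantics (as in a Blarney Action): acc sees the old n
factParallel :: Integer -> Integer
factParallel n0 = go n0 1
  where
    go n acc
      | n > 0 = go (n - 1) (acc * n)
      | otherwise = acc

-- Sequential reading of the same loop body: acc sees the new n
factSequential :: Integer -> Integer
factSequential n0 = go n0 1
  where
    go n acc
      | n > 0 = let n' = n - 1 in go n' (acc * n')
      | otherwise = acc
```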
A very common use of recipes is to define test sequences. For example, here is a simple test sequence for the `Counter` module defined earlier.
```haskell
-- Test-bench for a counter
top :: Module ()
top = do
  -- Instantiate a 4-bit counter
  counter :: Counter 4 <- makeCounter

  -- Sample test sequence
  let test =
        Seq [
          Action do
            counter.inc
        , Action do
            counter.inc
        , Action do
            counter.inc
            counter.dec
        , Action do
            display "counter = %0d" (counter.output)
            finish
        ]

  runOnce test
```
Here, we increment `counter` on the first cycle, and then again on the second. On the third cycle, we both increment and decrement it in parallel, which leaves its value unchanged. On the fourth cycle, we display the value (which should be 2) and terminate the simulator.
Example 8: Bits class
Any type in the `Bits` class can be represented in hardware, e.g. stored in a wire, a register, or a RAM.
```haskell
class Bits a where
  type SizeOf a :: Nat
  sizeOf :: a -> Int
  pack   :: a -> Bit (SizeOf a)
  unpack :: Bit (SizeOf a) -> a
```
The `Bits` class supports generic deriving. For example, suppose we have a simple data type for memory requests:
```haskell
data MemReq =
  MemReq {
    memOp   :: Bit 1  -- Is it a load or a store request?
  , memAddr :: Bit 32 -- 32-bit address
  , memData :: Bit 32 -- 32-bit data for stores
  }
  deriving (Generic, Bits)
```
To make this type a member of the `Bits` class, we have suffixed it with `deriving (Generic, Bits)`. The generic deriving mechanism for `Bits` does not support sum types: there is no way to convert a bit vector (a run-time circuit value) to a sum type (a circuit-generation-time value) using the circuit primitives provided by Blarney.
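To make `pack` and `unpack` concrete, here is a plain-Haskell sketch of a 65-bit encoding of `MemReq` that uses an `Integer` as the bit vector. The field layout (first field in the most significant bits) is an assumption for illustration; the layout produced by Blarney's generic deriving may differ. The names `packMemReq`, `unpackMemReq` and `sizeOfMemReq` are hypothetical.

```haskell
import Data.Bits (shiftL, shiftR, testBit, (.&.), (.|.))
import Data.Word (Word32)

-- Plain-Haskell stand-in for the MemReq record
data MemReq = MemReq {memOp :: Bool, memAddr :: Word32, memData :: Word32}
  deriving (Eq, Show)

-- Total width: 1 + 32 + 32 bits
sizeOfMemReq :: Int
sizeOfMemReq = 65

-- Assumed layout: memOp in bit 64, memAddr in bits 63..32, memData in bits 31..0
packMemReq :: MemReq -> Integer
packMemReq (MemReq op addr dat) =
  (toInteger (fromEnum op) `shiftL` 64)
    .|. (toInteger addr `shiftL` 32)
    .|. toInteger dat

unpackMemReq :: Integer -> MemReq
unpackMemReq w =
  MemReq
    { memOp = testBit w 64
    , memAddr = fromInteger ((w `shiftR` 32) .&. 0xFFFFFFFF)
    , memData = fromInteger (w .&. 0xFFFFFFFF)
    }
```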
Example 9: FShow class
Any type in the `FShow` class can be passed as an argument to the variadic `display` function.
```haskell
class FShow a where
  fshow     :: a -> Format
  fshowList :: [a] -> Format -- Has default definition

-- Abstract data type for things that can be displayed
-- (representation not shown)
newtype Format

-- Format constructors
mempty :: Format                     -- Empty (from Monoid class)
(<>)   :: Format -> Format -> Format -- Append (from Semigroup class)
```
As an example, here is how the `FShow` instance for pairs is defined.
```haskell
-- Example instance: displaying pairs
instance (FShow a, FShow b) => FShow (a, b) where
  fshow (a, b) = fshow "(" <> fshow a <> fshow "," <> fshow b <> fshow ")"
```
Like the `Bits` class, the `FShow` class supports generic deriving: just include `FShow` in the `deriving` clause for the data type.
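To see how `fshow`, `fshowList` and `<>` fit together, here is a self-contained toy version in plain Haskell in which `Format` is simply a wrapped `String`. This is a sketch, not Blarney's definition: the `render` helper and the bracketed default for `fshowList` are hypothetical. It borrows the trick used by the standard `Show` class, where `fshowList` lets strings be displayed verbatim while other lists get brackets.

```haskell
-- Toy Format: a wrapped String (not Blarney's representation)
newtype Format = Format String deriving (Eq, Show)

instance Semigroup Format where
  Format a <> Format b = Format (a ++ b)

instance Monoid Format where
  mempty = Format ""

render :: Format -> String
render (Format s) = s

class FShow a where
  fshow :: a -> Format
  fshowList :: [a] -> Format
  -- Default for lists: "[x,y,z]"
  fshowList xs = Format "[" <> commaSep (map fshow xs) <> Format "]"
    where
      commaSep [] = mempty
      commaSep (y : ys) = y <> mconcat [Format "," <> z | z <- ys]

instance FShow Char where
  fshow c = Format [c]
  fshowList s = Format s -- strings are shown verbatim

instance FShow a => FShow [a] where
  fshow = fshowList

instance FShow Int where
  fshow = Format . show

-- The pair instance from above, unchanged
instance (FShow a, FShow b) => FShow (a, b) where
  fshow (a, b) = fshow "(" <> fshow a <> fshow "," <> fshow b <> fshow ")"
```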
Example 10: Bit selection
Bit selection is important when we want to extract a subset of bits out of a bit vector. There are different flavours, depending on whether the indices are type-level numbers or circuit-generation-time values.
For type-level indices, we provide functions `index` and `range`, and use type application to specify the type-level indices:
```haskell
-- Extract most-significant bit of a byte
msb :: Bit 8 -> Bit 1
msb x = index @7 x

-- Extract upper 4 bits of a byte
upperNibble :: Bit 8 -> Bit 4
upperNibble x = range @7 @4 x
```
For circuit-generation-time indices of type `Int`, we provide `bit` and `bits`:
```haskell
-- Extract most-significant bit of a byte
msb :: Bit 8 -> Bit 1
msb x = bit 7 x

-- Extract upper 4 bits of a byte
upperNibble :: Bit 8 -> Bit 4
upperNibble x = bits (7, 4) x
```
While `index` and `range` are type-safe, `bit` and `bits` are not. For example, the argument to `bit` could be out of range, and the result of `bits` could have a different width to that implied by the range. Such cases will lead to confusing error messages at circuit-generation time, so use with care!
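One way to picture the trade-off is a plain-Haskell sketch of a checked bit-range extraction on 8-bit values. It validates the indices at run time and returns `Nothing` for a bad range, a mistake that `range` rules out at compile time but `bits` cannot. The name `bitsChecked` is hypothetical and this is not Blarney's implementation.

```haskell
import Data.Bits (shiftL, shiftR, (.&.))
import Data.Word (Word8)

-- Extract bits hi..lo (inclusive) of an 8-bit value, checking the indices
bitsChecked :: (Int, Int) -> Word8 -> Maybe Word8
bitsChecked (hi, lo) x
  | lo < 0 || hi > 7 || hi < lo = Nothing -- out of range or reversed
  | otherwise = Just ((x `shiftR` lo) .&. mask)
  where
    width = hi - lo + 1
    -- For width 8, 1 `shiftL` 8 wraps to 0 in Word8, so the mask becomes 0xFF
    mask = (1 `shiftL` width) - 1
```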
Example 11: Block RAMs
Blarney provides a variety of block RAM modules commonly supported on FPGAs. They are all based around the following interface.
```haskell
-- Block RAM interface
-- (Parameterised by the address type a and the data type d)
data RAM a d =
  RAM {
    load  :: a -> Action ()
  , store :: a -> d -> Action ()
  , out   :: d
  }
```
When a `load` is issued for a given address, the value at that address appears on `out` on the next clock cycle. When a `store` is issued, the value is written to the RAM on the current cycle, and a load of the new value can be requested on the subsequent cycle. A parallel `load` and `store` should only be issued on the same cycle if the RAM has been created as a dual-port RAM (as opposed to a single-port RAM).
To illustrate, here is a test bench that creates a single-port block RAM and performs a `store` followed by a `load`.
```haskell
top :: Module ()
top = do
  -- Instantiate a 256-element RAM of 5-bit values
  ram :: RAM (Bit 8) (Bit 5) <- makeRAM

  -- Write 10 to ram[0] and read it back again
  let test =
        Seq [
          Action do
            store ram 0 10
        , Action do
            load ram 0
        , Action do
            display "Got 0x%0x" (ram.out)
            finish
        ]

  runOnce test
```
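The one-cycle read latency can be captured by a small plain-Haskell model. This sketch uses the hypothetical names `Cmd` and `simulateRAM` and assumes one command per cycle on a single-port RAM. It reports `Nothing` for `out` whenever no load was issued on the previous cycle or the address was never written, since the text above does not specify `out` in those cases.

```haskell
-- One command per cycle on a single-port RAM
data Cmd = Load Int | Store Int Int | Idle

-- Value of `out` on each cycle, starting from cycle 0
simulateRAM :: [Cmd] -> [Maybe Int]
simulateRAM = go [] Nothing
  where
    -- mem: most recent writes first; out: value visible on the current cycle
    go :: [(Int, Int)] -> Maybe Int -> [Cmd] -> [Maybe Int]
    go _ out [] = [out]
    go mem out (c : cs) =
      out : case c of
        Load a -> go mem (lookup a mem) cs -- result appears next cycle
        Store a v -> go ((a, v) : mem) Nothing cs
        Idle -> go mem Nothing cs
```

For the test bench above, `simulateRAM [Store 0 10, Load 0, Idle]` has `Just 10` at index 2, the third cycle, which is when `display` runs.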
Somewhat related to block RAMs are register files. The difference is that a register file allows the value at an address to be determined within a clock cycle. It also allows any number of reads and writes to be performed within the same cycle. Register files have the following interface.
```haskell
data RegFile a d =
  RegFile {
    (!)    :: a -> d              -- Read
  , update :: a -> d -> Action () -- Write
  }
```
Unlike block RAMs, register files (especially large ones) do not always map efficiently onto hardware, so use with care!
Example 12: Streams
Streams are another commonly used abstraction in hardware description. They are often used to implement hardware modules that consume data at a variable rate, depending on internal details of the module that the implementer does not wish to (or is unable to) expose. In Blarney, streams are captured by the following interface.
```haskell
type Stream a = Source a

data Source a =
  Source {
    canPeek :: Bit 1
  , peek    :: a
  , consume :: Action ()
  }
```
Streams are closely related to queues. Indeed, any queue can be converted to a stream:
```haskell
-- Convert a queue to a stream
toStream :: Queue a -> Stream a
toStream q =
  Source {
    canPeek = q.canDeq
  , peek    = q.first
  , consume = deq q
  }
```
As an example, here's a function that increments each value in the input stream to produce the output stream.
```haskell
inc :: Stream (Bit 8) -> Module (Stream (Bit 8))
inc xs = do
  -- Output buffer
  buffer <- makeQueue

  always do
    -- Incrementer
    when (xs.canPeek .&. buffer.notFull) do
      consume xs
      enq buffer (xs.peek + 1)

  -- Convert buffer to a stream
  return (buffer.toStream)
```
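The essence of the `Source` interface (check `canPeek`, read `peek`, then `consume`) can be illustrated with a purely functional analogue in which `consume` returns the rest of the stream instead of being an `Action`. This is a plain-Haskell sketch. `fromList`, `mapSource` and `drain` are hypothetical helpers. Here `mapSource (+ 1)` plays the role of `inc`, but without any buffering or back-pressure.

```haskell
import Data.Word (Word8)

-- Pure analogue of Blarney's Source: consume yields the remaining stream
data Source a = Source
  { canPeek :: Bool
  , peek :: a
  , consume :: Source a
  }

fromList :: [a] -> Source a
fromList [] = Source False (error "peek on empty stream") (fromList [])
fromList (x : xs) = Source True x (fromList xs)

-- Apply f to each element, like the inc module without its output buffer
mapSource :: (a -> b) -> Source a -> Source b
mapSource f s = Source (canPeek s) (f (peek s)) (mapSource f (consume s))

-- Collect elements while canPeek holds
drain :: Source a -> [a]
drain s
  | canPeek s = peek s : drain (consume s)
  | otherwise = []
```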
Example 13: Modular compilation
So far we've seen examples of top-level modules, i.e. modules with no inputs or outputs, being converted to Verilog. In fact, any Blarney function whose inputs and outputs are members of the `Interface` class can be converted to Verilog (and the `Interface` class supports generic deriving). To illustrate, we can convert the function `inc` (defined in Example 12) into a Verilog module as follows.
```haskell
main :: IO ()
main = writeVerilogModule inc "inc" "/tmp/inc"
```
The generated Verilog module `/tmp/inc/inc.v` has the following interface:
```verilog
module inc(
    input  wire       clock
  , output wire [0:0] in_consume_en
  , input  wire [0:0] in_canPeek
  , input  wire [7:0] in_peek
  , input  wire [0:0] out_consume_en
  , output wire [7:0] out_peek
  , output wire [0:0] out_canPeek
);
```
Considering the definition of the `Stream` type, the correspondence between the Blarney and the Verilog is quite clear:
Signal | Description |
---|---|
`in_consume_en` | Output asserted whenever the module consumes an element from the input stream. |
`in_canPeek` | Input signalling when there is data available in the input stream. |
`in_peek` | Input containing the next value in the input stream. |
`out_canPeek` | Output asserted whenever there is data available in the output stream. |
`out_peek` | Output containing the next value in the output stream. |
`out_consume_en` | Input signalling when the caller consumes an element from the output stream. |
It is also possible to instantiate a Verilog module inside a Blarney description. To illustrate, here is a function that creates an instance of the Verilog `inc` module shown above.
```haskell
-- This function creates an instance of a Verilog module called "inc"
makeInc :: Stream (Bit 8) -> Module (Stream (Bit 8))
makeInc = makeInstance "inc"
```
Notice that the interface of the Verilog module being instantiated is determined from the type signature. Here's a sample top-level module that uses the `makeInc` function:
```haskell
top :: Module ()
top = do
  -- Counter
  count :: Reg (Bit 8) <- makeReg 0

  -- Input buffer
  buffer <- makeQueue

  -- Create an instance of inc
  out <- makeInc (buffer.toStream)

  always do
    -- Fill input
    when (buffer.notFull) do
      enq buffer (count.val)
      count <== count.val + 1

    -- Consume
    when (out.canPeek) do
      consume out
      display "Got 0x%0x" (out.peek)
      when (out.peek .==. 100) finish
```
Using the following `main` function, we can generate both the `inc` module and a top-level module that instantiates it.
```haskell
main :: IO ()
main = do
  let dir = "/tmp/inc"
  writeVerilogModule inc "inc" dir
  writeVerilogTop top "top" dir
```
Using this approach, we can maintain the module hierarchy of a Blarney design whenever we generate Verilog, rather than having to flatten it to a massive netlist. This technique can also be used to instantiate any Verilog module within a Blarney design.
Example 14: Master-slave pattern
The master-slave pattern is common in hardware design. Suppose we wish to move multiplication out of a module and into a separate slave module, where the slave takes requests (pairs of 32-bit integers to multiply) and produces responses (32-bit results).
```haskell
type MulReq  = (Bit 32, Bit 32)
type MulResp = Bit 32
```
The slave component might be defined as:
```haskell
slave :: Stream MulReq -> Module (Stream MulResp)
slave reqs = do
  resps <- makeQueue

  always do
    when (reqs.canPeek .&. resps.notFull) do
      consume reqs
      let (a, b) = reqs.peek
      enq resps (a * b)

  return (resps.toStream)
```
The master component produces requests for the slave, and consumes responses from the slave. In the example below, the master simply asks the slave to multiply 2 by 2, and then terminates the simulation.
```haskell
master :: Stream MulResp -> Module (Stream MulReq)
master resps = do
  reqs <- makeQueue

  let recipe =
        Seq [
          Wait (reqs.notFull)
        , Action do
            enq reqs (2, 2)
        , Wait (resps.canPeek)
        , Action do
            consume resps
            display "Result: %0d" (resps.peek)
            finish
        ]

  runOnce recipe

  return (reqs.toStream)
```
The top-level module, which connects the master and the slave, needs to introduce a cycle, which can be achieved simply using Haskell's recursive-do (`mdo`) notation:
```haskell
top :: Module ()
top = mdo
  resps <- slave reqs
  reqs <- master resps
  return ()
```
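The same kind of cyclic wiring can be expressed in pure Haskell with lazy, mutually recursive bindings, which is the idea that `mdo` lifts into a monad. In this sketch, streams are modelled as lazy lists, and `slaveModel`, `masterModel` and `topModel` are hypothetical names. The knot can be tied because the master's requests do not depend on the responses.

```haskell
-- The slave multiplies each pair in the request stream
slaveModel :: [(Int, Int)] -> [Int]
slaveModel = map (uncurry (*))

-- The master issues one request and reports the first response
masterModel :: [Int] -> ([(Int, Int)], Int)
masterModel resps = ([(2, 2)], firstResp)
  where
    firstResp = case resps of
      r : _ -> r
      [] -> error "no response"

-- Connect them in a cycle: reqs depends on resps and vice versa
topModel :: Int
topModel = result
  where
    (reqs, result) = masterModel resps
    resps = slaveModel reqs
```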
Example 15: Bit-string pattern matching
Recent work on specifying and implementing ISAs led us to develop two libraries for doing bit-string pattern matching. The first, BitPat, is statically typed and based on the paper "Type-safe pattern combinators". The second, BitScan, is dynamically typed but more expressive.
As an example, BitScan lets us define the following instruction decoder for a tiny subset of RISC-V.
```haskell
import Blarney.BitScan

-- Semantics of add instruction
add :: Bit 5 -> Bit 5 -> Bit 5 -> Action ()
add rs2 rs1 rd =
  display "add r%0d" rd ", r%0d" rs1 ", r%0d" rs2

-- Semantics of addi instruction
addi :: Bit 12 -> Bit 5 -> Bit 5 -> Action ()
addi imm rs1 rd =
  display "addi r%0d" rd ", r%0d" rs1 ", 0x%0x" imm

-- Semantics of store-word instruction
sw :: Bit 12 -> Bit 5 -> Bit 5 -> Action ()
sw imm rs2 rs1 =
  display "sw r%0d" rs2 ", %0d" imm "(r%0d" rs1 ")"

top :: Module ()
top = always do
  -- Sample RISC-V store-word instruction
  let instr :: Bit 32 = 0b1000000_00001_00010_010_00001_0100011

  -- Dispatch
  match instr
    [
      "0000000 rs2[4:0] rs1[4:0] 000 rd[4:0] 0110011" ==> add,
      " imm[11:0] rs1[4:0] 000 rd[4:0] 0010011" ==> addi,
      "imm[11:5] rs2[4:0] rs1[4:0] 010 imm[4:0] 0100011" ==> sw
    ]

  finish
```
The nice thing about this decoder is that the scattered immediate field `imm` in the `sw` instruction is automatically assembled by the library. That is, the `imm[11:5]` part of the immediate is combined with the `imm[4:0]` part to give the final 12-bit immediate value passed to the right-hand-side function. Scattered immediates appear a lot in the RISC-V specification. Thanks to Jon Woodruff for suggesting this feature!
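The assembly that BitScan performs for `sw` can be spelled out by hand. This plain-Haskell sketch checks the opcode and funct3 fields and then concatenates `imm[11:5]` and `imm[4:0]`. Its names (`field`, `SW`, `decodeSW`) are hypothetical, and it does not use the BitScan API. Applied to the example instruction above, which is `0x801120A3` in hex, it gives `imm = 2049`, `rs2 = 1` and `rs1 = 2`.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word32)

-- Bits hi..lo (inclusive) of a 32-bit instruction word
field :: Int -> Int -> Word32 -> Word32
field hi lo w = (w `shiftR` lo) .&. ((1 `shiftL` (hi - lo + 1)) - 1)

-- Decoded fields of a RISC-V store-word instruction
data SW = SW {swImm :: Word32, swRs2 :: Word32, swRs1 :: Word32}
  deriving (Eq, Show)

decodeSW :: Word32 -> Maybe SW
decodeSW instr
  | field 6 0 instr == 0x23 && field 14 12 instr == 2 =
      Just (SW imm (field 24 20 instr) (field 19 15 instr))
  | otherwise = Nothing
  where
    -- Scattered immediate: imm[11:5] from bits 31..25, imm[4:0] from bits 11..7
    imm = (field 31 25 instr `shiftL` 5) .|. field 11 7 instr
```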
Example 16: Tiny 8-bit CPU
As a way of bringing together a number of the ideas introduced above, let's look at a very simple 8-bit CPU with the following ISA.
Encoding | Meaning |
---|---|
`00 rd[1:0] imm[3:0]` | Write value `imm` (zero-extended) to register `rd` |
`01 rd[1:0] ra[1:0] rb[1:0]` | Add register `ra` to register `rb` and store in register `rd` |
`10 imm[3:0] rb[1:0]` | Branch back by `imm` instructions if register `rb` is non-zero |
`11 XXXXXX` | Halt |
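Before looking at a pipelined implementation, it can help to have a non-pipelined reference model of the ISA to compare against. The sketch below is plain Haskell, not part of Blarney, and all of its names are hypothetical. It assumes that a taken branch sets the program counter to the branch's own address minus `imm`, since the table does not pin this down. The sample program doubles `r0` until it overflows to zero, counting the iterations in `r1`.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- Decoded instructions: register indices are 0..3, immediates are 4-bit
data Instr = Li Int Word8 | Add Int Int Int | Bnz Word8 Int | Halt
  deriving (Eq, Show)

encode :: Instr -> Word8
encode (Li rd imm) = (fromIntegral rd `shiftL` 4) .|. (imm .&. 0xF)
encode (Add rd ra rb) =
  0x40 .|. (fromIntegral rd `shiftL` 4) .|. (fromIntegral ra `shiftL` 2) .|. fromIntegral rb
encode (Bnz off rb) = 0x80 .|. ((off .&. 0xF) `shiftL` 2) .|. fromIntegral rb
encode Halt = 0xC0

decode :: Word8 -> Instr
decode w = case w `shiftR` 6 of
    0 -> Li (fld 5 4) (w .&. 0xF)
    1 -> Add (fld 5 4) (fld 3 2) (fld 1 0)
    2 -> Bnz ((w `shiftR` 2) .&. 0xF) (fld 1 0)
    _ -> Halt
  where
    fld :: Int -> Int -> Int
    fld hi lo = fromIntegral ((w `shiftR` lo) .&. ((1 `shiftL` (hi - lo + 1)) - 1))

-- Execute from pc = 0 until Halt; returns the final register file [r0, r1, r2, r3]
run :: [Word8] -> [Word8]
run prog = go 0 [0, 0, 0, 0]
  where
    go :: Int -> [Word8] -> [Word8]
    go pc regs = case decode (prog !! pc) of
      Li rd imm -> go (pc + 1) (set rd imm regs)
      Add rd ra rb -> go (pc + 1) (set rd (regs !! ra + regs !! rb) regs)
      Bnz off rb
        | regs !! rb /= 0 -> go (pc - fromIntegral off) regs
        | otherwise -> go (pc + 1) regs
      Halt -> regs
    set i v rs = take i rs ++ [v] ++ drop (i + 1) rs

sampleProgram :: [Instr]
sampleProgram =
  [ Li 0 1    -- r0 := 1
  , Li 2 1    -- r2 := 1
  , Li 1 0    -- r1 := 0
  , Add 0 0 0 -- r0 := r0 + r0
  , Add 1 1 2 -- r1 := r1 + r2
  , Bnz 2 0   -- if r0 /= 0, branch back 2 instructions (to the first Add)
  , Halt
  ]
```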
We have developed a 4-stage pipeline implementation of the ISA. Although the ISA is very simple, it does contain a few challenges for a pipelined implementation, namely control hazards (due to the branch instruction) and data hazards (due to the add instruction). We resolve data hazards using register forwarding, and control hazards by performing a pipeline flush when branches are taken. The CPU will execute the program defined in the file `instrs.hex`.
```haskell
-- Instructions
type Instr = Bit 8

-- Register identifiers
type RegId = Bit 2

-- Extract opcode
opcode :: Instr -> Bit 2
opcode instr = range @7 @6 instr

-- Extract register A
rA :: Instr -> RegId
rA instr = range @3 @2 instr

-- Extract register B
rB :: Instr -> RegId
rB instr = range @1 @0 instr

-- Extract destination register
rD :: Instr -> RegId
rD instr = range @5 @4 instr

-- Extract immediate
imm :: Instr -> Bit 4
imm instr = range @3 @0 instr

-- Extract branch offset
offset :: Instr -> Bit 4
offset instr = range @5 @2 instr

-- CPU
makeCPU :: Module ()
makeCPU = do
  -- Instruction memory
  instrMem :: RAM (Bit 8) Instr <- makeRAMInit "instrs.hex"

  -- Two block RAMs allow two operands to be read,
  -- and one result to be written, on every cycle
  regFileA :: RAM RegId (Bit 8) <- makeDualRAMForward 0
  regFileB :: RAM RegId (Bit 8) <- makeDualRAMForward 0

  -- Instruction register
  instr :: Reg (Bit 8) <- makeReg dontCare

  -- Instruction operand registers
  opA :: Reg (Bit 8) <- makeReg dontCare
  opB :: Reg (Bit 8) <- makeReg dontCare

  -- Program counter
  pcNext :: Wire (Bit 8) <- makeWire 0
  let pc = reg 0 (pcNext.val)

  -- Result of the execute stage
  result :: Wire (Bit 8) <- makeWire 0

  -- Wire to trigger a pipeline flush
  flush :: Wire (Bit 1) <- makeWire 0

  -- Cycle counter
  count :: Reg (Bit 32) <- makeReg 0
  always (count <== count.val + 1)

  -- Trigger for each pipeline stage
  go1 :: Reg (Bit 1) <- makeDReg 0
  go2 :: Reg (Bit 1) <- makeDReg 0
  go3 :: Reg (Bit 1) <- makeDReg 0

  always do
    -- Stage 0: Instruction Fetch
    -- ==========================

    -- Index the instruction memory
    load instrMem (pcNext.val)

    -- Start the pipeline after one cycle
    go1 <== 1

    -- Stage 1: Operand Fetch
    -- ======================

    when (go1.val) do
      when (flush.val.inv) do
        pcNext <== pc + 1
        go2 <== 1

    load regFileA (instrMem.out.rA)
    load regFileB (instrMem.out.rB)

    -- Stage 2: Latch Operands
    -- =======================

    -- Latch instruction
    instr <== instrMem.out.old

    -- Register forwarding logic
    let forward rS other =
          (result.active .&. (instr.val.rD .==. instrMem.out.old.rS)) ?
            (result.val, other)

    -- Latch operands
    opA <== forward rA (regFileA.out)
    opB <== forward rB (regFileB.out)

    -- Trigger stage 3
    when (flush.val.inv) do
      go3 <== go2.val

    -- Stage 3: Execute
    -- ================

    -- Instruction dispatch
    when (go3.val) do
      switch (instr.val.opcode)
        [
          -- Load-immediate instruction
          0b00 --> result <== zeroExtend (instr.val.imm),
          -- Add instruction
          0b01 --> result <== opA.val + opB.val,
          -- Branch instruction
          0b10 --> when (opB.val .!=. 0) do
                     pcNext <== pc - zeroExtend (instr.val.offset) - 2
                     -- Control hazard
                     flush <== 1,
          -- Halt instruction
          0b11 --> finish
        ]

    -- Writeback
    when (result.active) do
      store regFileA (instr.val.rD) (result.val)
      store regFileB (instr.val.rD) (result.val)
      display "%0d: " (count.val) "rf[r%0d]" (instr.val.rD) " := 0x%0x" (result.val)
```
Example 17: 32-bit RISC-V CPU
Pebbles is a 5-stage 32-bit RISC-V core implemented in Blarney, aiming for a high-level definition of the RV32I instruction set with moderate performance. See the Pebbles page for further details.