This repo contains applications and demos for embedded FPGA systems developed using the magma HDL.
Please refer to the magmathon repo for how to set up magma with the IceStick/HX8K.
Here is a breakdown of the various directories.
These directories contain magma programs written for the Lattice iCE40 architecture (either the IceStick or the HX8K board). They consist of modules (high-level circuits), each performing a specific function, that link together to form a contiguous pipeline for capturing, processing, and classifying handwritten digits, along with unit tests that check some of these modules standalone.
To build and run the tests/examples, first edit the bake file to include the name of the program to compile in the tests list, then run ./bake to generate verilog and pcf files. Next, run ./cat_verilog if necessary (more explanation on this in the aetherling section below), then navigate to the build directory and edit the Makefile so that the NAME field equals the name of the program to run, then call make followed by make upload.
A magma module which exposes the SPI interface of an ArduCAM Mini 2MP peripheral for initiating a capture and receiving image data.
The module has three phases or states: a capture command phase, a wait phase, and a burst read phase. During the capture command phase, the program steps through a set of 2-byte commands, each consisting of a register and a value, which are pre-loaded into ROM during initialization, and sends them to the ArduCAM over the MOSI line. When the ArduCAM slave receives these commands, it initiates a capture, and the program enters the wait phase, continually checking the MISO line for the capture completion flag to be set. Once the flag is read as set, the program sends the burst read command to the ArduCAM, then proceeds to continually read image data from the MISO line.
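The three phases can be modeled in software as a small state machine. This is a minimal behavioral sketch, not the magma circuit. The register constants are assumptions modeled on the public ArduCAM library (writes set bit 7 of the address; 0x04 is the FIFO control register, 0x41 the status register with the capture-done bit 0x08, and 0x3C the burst read command). The actual bytes pre-loaded into this repo's command ROM may differ. FakeArduCAM is a hypothetical stand-in for the camera.

```python
from enum import Enum, auto

# Assumed constants, modeled on the public ArduCAM library (not read from this repo).
WRITE_BIT = 0x80          # set on the address byte of a register write
ARDUCHIP_FIFO = 0x04      # FIFO control register
FIFO_CLEAR_MASK = 0x01    # clears the capture-done flag
FIFO_START_MASK = 0x02    # starts a capture
ARDUCHIP_TRIG = 0x41      # status register polled during the wait phase
CAP_DONE_MASK = 0x08      # capture-complete flag
BURST_FIFO_READ = 0x3C    # burst read command

# Hypothetical capture-command ROM: (register, value) pairs sent over MOSI.
COMMAND_ROM = [
    (WRITE_BIT | ARDUCHIP_FIFO, FIFO_CLEAR_MASK),
    (WRITE_BIT | ARDUCHIP_FIFO, FIFO_START_MASK),
]


class Phase(Enum):
    CAPTURE_CMD = auto()
    WAIT = auto()
    BURST_READ = auto()
    DONE = auto()


class FakeArduCAM:
    """Hypothetical stand-in for the camera's SPI slave."""

    def __init__(self, image, polls_until_done=3):
        self.image = bytes(image)
        self.polls_left = polls_until_done
        self.writes = []

    def write(self, reg, value):
        self.writes.append((reg, value))

    def read(self, reg):
        # Report the capture-done flag only after a few status polls.
        if reg == ARDUCHIP_TRIG:
            self.polls_left -= 1
            return CAP_DONE_MASK if self.polls_left <= 0 else 0
        return 0

    def burst_read(self, command):
        assert command == BURST_FIFO_READ
        return self.image


def capture(cam):
    """Step through the three phases; return (image bytes, number of status polls)."""
    phase, polls, data = Phase.CAPTURE_CMD, 0, b""
    while phase is not Phase.DONE:
        if phase is Phase.CAPTURE_CMD:
            for reg, value in COMMAND_ROM:  # send every ROM command over MOSI
                cam.write(reg, value)
            phase = Phase.WAIT
        elif phase is Phase.WAIT:
            polls += 1                      # poll the status register until done
            if cam.read(ARDUCHIP_TRIG) & CAP_DONE_MASK:
                phase = Phase.BURST_READ
        else:                               # Phase.BURST_READ
            data = cam.burst_read(BURST_FIFO_READ)
            phase = Phase.DONE
    return data, polls
```

For example, `capture(FakeArduCAM([1, 2, 3, 4]))` returns `(b'\x01\x02\x03\x04', 3)`: the fake device raises its done flag on the third poll.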
In order to initialize the ArduCAM for operation, a Raspberry Pi Model A+ is used. The library https://github.com/ArduCAM/RaspberryPi is installed on the Pi (along with the necessary supporting libraries: i2cdev, i2c-tools, WiringPi) and arducam_ov2640_capture.c is modified to set the capture format to BMP instead of JPEG (line 76). Now, every time the ArduCAM is first powered up, the ov2640_capture program must be run to properly initialize the I2C registers. One way to set up the ArduCAM so it can be used standalone with the FPGA is to first power the ArduCAM from the FPGA, connect the remaining SPI and I2C headers to the appropriate GPIOs on the Raspberry Pi, and also connect the grounds of the FPGA and Pi together. Then, run ov2640_capture to initialize the camera settings. Afterward, the I2C connections may be disconnected and the SPI connections wired to the FPGA's GPIOs.
Here are the pin mappings for the ArduCAM / Raspberry Pi interface
ArduCAM    Raspberry Pi
CS         pin 11
MOSI       pin 19
MISO       pin 21
SCK        pin 23
GND        pin 6, 9
VCC        pin 1
SDA        pin 3
SCL        pin 5
Currently the I2C registers of the ArduCAM are configured for a 320x240 BMP capture, and thus the image data sent over the MISO line consists of 153600 bytes (2 bytes per pixel, 320x240 = 76800 pixels). The resolution of the capture can be adjusted by running ov2640_capture with different arguments, for example: sudo ./ov2640_capture -c test.jpg 320x240
The 2 byte, 16-bit pixels are formatted as RGB565, where the red, green, and blue values are allocated 5, 6, and 5 bits respectively.
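The frame size and the RGB565 bit layout can be checked with a few lines of Python. This is a minimal sketch. It assumes the high byte of each pixel arrives first on MISO, which the text above does not state.

```python
WIDTH, HEIGHT, BYTES_PER_PIXEL = 320, 240, 2
FRAME_BYTES = WIDTH * HEIGHT * BYTES_PER_PIXEL  # 153600 bytes per capture


def unpack_rgb565(high, low):
    """Split one 16-bit RGB565 pixel into (r, g, b).

    Assumes the high byte is received first (an assumption, not documented here).
    """
    pixel = (high << 8) | low
    r = (pixel >> 11) & 0x1F  # bits 15..11: 5-bit red
    g = (pixel >> 5) & 0x3F   # bits 10..5:  6-bit green
    b = pixel & 0x1F          # bits 4..0:   5-bit blue
    return r, g, b
```

For example, `unpack_rgb565(0xF8, 0x00)` gives `(31, 0, 0)` (pure red) and `unpack_rgb565(0x07, 0xE0)` gives `(0, 63, 0)` (pure green).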
A magma module which converts RGB pixel data to grayscale values. It is designed to accept 2 bytes of pixel data at a time from the arducam module output, interpret them as a 16-bit RGB565 pixel, extract the red, green, and blue color values, and sum them to obtain a grayscale value for that pixel. These grayscale values range from 0 to a maximum of 31 + 63 + 31 = 125.
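A software reference for this step can be useful when checking the module's output. The sketch below consumes the byte stream two bytes at a time and yields one grayscale sum per pixel. As before, high-byte-first order is an assumption, and dropping a trailing odd byte is an illustrative choice.

```python
def rgb565_to_gray(high, low):
    """Sum the 5-bit red, 6-bit green, and 5-bit blue fields (range 0..125)."""
    pixel = (high << 8) | low
    return ((pixel >> 11) & 0x1F) + ((pixel >> 5) & 0x3F) + (pixel & 0x1F)


def gray_stream(byte_stream):
    """Consume bytes two at a time and yield one grayscale value per pixel."""
    it = iter(byte_stream)
    for high in it:
        low = next(it, None)
        if low is None:
            return  # incomplete final pixel: drop it
        yield rgb565_to_gray(high, low)
```

Because the maximum is 125, each grayscale value fits in 7 bits. For example, `list(gray_stream([0xFF, 0xFF, 0x00, 0x00]))` is `[125, 0]`.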
This magma module takes grayscale image data output from the process module and scales the image down while also converting it to binary. It first uses the downscale aetherling module (more details below) to map over the 320x240 image with a 20x15 sliding window, summing the pixel values in each window to downscale to a 16x16 image. Then a binary threshold is applied so that pixels below a certain value are set to 0 and pixels above it are set to 1. This produces a 16x16 binary version of the original 320x240 RGB image captured by the ArduCAM.
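A frame-at-a-time Python reference for this rescale step may help when checking results. It is a sketch, not the streaming circuit. The 20x15 windows step by their own size (320/20 = 240/15 = 16), so they do not overlap. The default threshold of 18750 (half the maximum block sum of 300 × 125 = 37500) is a hypothetical value, and treating a sum equal to the threshold as 1 is an assumption.

```python
def rescale(gray, width=320, height=240, win_w=20, win_h=15, threshold=18750):
    """Sum non-overlapping win_w x win_h blocks of a row-major grayscale image,
    then binarize each block sum against a (hypothetical) threshold."""
    out_w, out_h = width // win_w, height // win_h  # 16 x 16 for the defaults
    sums = [[0] * out_w for _ in range(out_h)]
    for y in range(height):
        row = gray[y * width:(y + 1) * width]
        for x, value in enumerate(row):
            sums[y // win_h][x // win_w] += value   # accumulate into its block
    return [[1 if s >= threshold else 0 for s in row] for row in sums]
```

With a frame whose left half is all 125 and whose right half is all 0, every output row is eight 1s followed by eight 0s.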
BNN digit classifier adapted from https://github.com/MIT-HAN-LAB/BNN_IceStick. Instead of loading the image into ROM at initialization time, the circuit has a RAM for storing image bits so that the input of the BNN may be wired to another circuit (e.g. the output of the ArduCAM).
The network itself consists of only a single fully connected layer. It flattens the 16x16 image into a size-256 vector and takes its dot product with each of 10 (the number of classes) size-256 weight vectors. The multiply-accumulate operation of each dot product is computed using an XNOR followed by a popcount in order to take advantage of the binary nature of the data. The results of the dot products are successively compared to each other in order to determine the maximum, whose class is then chosen as the output of the classifier.
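The XNOR-popcount trick works because, when bits encode ±1 values, the dot product of two n-bit vectors equals 2·popcount(XNOR) − n. Ranking classes by popcount is therefore the same as ranking them by the true dot product. The sketch below stores each 256-bit vector as a Python int. The tie-breaking rule (keep the earlier class) is an assumption.

```python
N = 256  # 16x16 image flattened to 256 bits


def xnor_popcount(x, w, n=N):
    """Binary dot product: count the bit positions where x and w agree."""
    mask = (1 << n) - 1
    return bin(~(x ^ w) & mask).count("1")


def classify(image_bits, weights):
    """Compare the class scores one after another and keep the running maximum.

    On ties the earlier class wins (an assumption).
    """
    best_class, best_score = 0, -1
    for c, w in enumerate(weights):
        score = xnor_popcount(image_bits, w)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

With all-zero weights except class 3, whose weights are all ones, an all-ones image scores 256 for class 3 and 0 for every other class, so `classify` returns 3.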
The main program, which defines a top-level module (main). It first uses the arducam module to capture a raw image and feeds it to the process module, which converts it to grayscale. The grayscale version is passed to the rescale module, which scales it down to a 16x16 binary image. This image is then fed to the pipeline module and classified as a decimal digit, with the result shown in binary on the LEDs of the FPGA. All processing up to pipeline is done in a streaming fashion, i.e. each pixel coming in from the camera is processed before the next one arrives. Currently, the SPI clock speed for the ArduCAM is set to 3 MHz, which determines the processing rate for the process and rescale modules. The output of the rescale module is collected in a RAM in the pipeline module, while the rest of the pipeline is kept inactive by disabling its clock. When the SPI image transfer is done and the RAM contains a full image, a flag is set to enable the clock and run the BNN classifier.
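Two quantities follow directly from the numbers above. At 3 MHz, one 153600-byte frame takes 153600 × 8 / 3,000,000 = 0.4096 s on the SPI bus (ignoring command and polling overhead). The classifier can only start after that point. The sketch below models the gating behavior in software. GatedClassifier and its classify hook are hypothetical names, and the enabled flag stands in for the clock-enable signal.

```python
SPI_HZ = 3_000_000
FRAME_BYTES = 320 * 240 * 2
IMAGE_BITS = 16 * 16

# SPI time for one frame, ignoring command and polling overhead.
TRANSFER_S = FRAME_BYTES * 8 / SPI_HZ  # 0.4096 s


class GatedClassifier:
    """Hypothetical model: buffer rescale output bits in a RAM and run the
    classifier only once the transfer is done and the RAM holds a full image."""

    def __init__(self, classify):
        self.classify = classify  # e.g. a BNN model; hypothetical hook
        self.ram = []
        self.enabled = False      # stands in for the gated clock

    def push(self, bit):
        """Store one output bit while the classifier is still disabled."""
        if not self.enabled and len(self.ram) < IMAGE_BITS:
            self.ram.append(bit & 1)

    def transfer_done(self):
        """Called when the SPI transfer ends; enables the classifier if the RAM is full."""
        if len(self.ram) == IMAGE_BITS:
            self.enabled = True
            return self.classify(self.ram)
        return None
```

Passing `sum` as the classifier hook and pushing 256 ones makes `transfer_done()` return 256. With fewer bits buffered, it returns None and the classifier stays disabled.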
Pin mappings for the ArduCAM / HX8K board interface
ArduCAM    HX8K
CS         J2 pin 4
MOSI       J2 pin 5
MISO       J2 pin 8
SCK        J2 pin 3
GND        J4 pin 0
VCC        J4 pin 6
and for the logic analyzer debugging pins
HX8K          Logic Analyzer Channel    Description
J2 pin 9      0                         arducam output
J2 pin 10     1                         process output
J2 pin 11     2                         rescale output
J2 pin 12     3                         transfer done
This folder contains a "library" of high-level magma circuits defined using the aetherling library. Because these circuits must be compiled using the magma coreir backend and coreir mantle target, they cannot be used directly with the ice40 magma circuits in the IceStick/HX8K folders. Instead, they must be compiled to verilog separately and then combined, at the verilog level, with any programs that use them. For example, the rescale ice40 module uses the aetherling module downscale_sub, so in rescale.py the downscale circuit is declared and wired up, and the verilog for the program (say rescale_test.v) is generated. However, the definition and implementation of the downscale circuit exist in downscale_sub.py, so the verilog downscale_sub.v must be generated separately: first call python downscale_sub.py to create a json file, then run ./coreir_compile downscale_sub HX8K. This places the output verilog in the build folder of the HX8K directory, where it can be concatenated to rescale_test.v, and the resulting file can be synthesized and uploaded to the FPGA.
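The two-step compile and the final concatenation could be scripted roughly as follows. This is a hypothetical helper, not part of the repo. It only reproduces the commands described above, and it assumes they are run from the aetherling folder (pass a different cwd otherwise). The combined output file name is up to the caller. The repo's own ./cat_verilog script is the documented way to do the concatenation.

```python
import subprocess
from pathlib import Path


def aetherling_commands(module, board="HX8K"):
    """The two steps described above: emit the json, then compile it to verilog."""
    return [
        ["python", f"{module}.py"],
        ["./coreir_compile", module, board],
    ]


def compile_aetherling(module, board="HX8K", cwd="."):
    """Run both steps; check=True raises CalledProcessError if either fails."""
    for cmd in aetherling_commands(module, board):
        subprocess.run(cmd, check=True, cwd=cwd)


def concat_verilog(out_path, *sources):
    """Concatenate verilog files (e.g. rescale_test.v, downscale_sub.v) into one file."""
    parts = []
    for src in sources:
        text = Path(src).read_text()
        parts.append(text if text.endswith("\n") else text + "\n")
    Path(out_path).write_text("".join(parts))
    return Path(out_path)
```

For example, `concat_verilog("combined.v", "rescale_test.v", "downscale_sub.v")` run inside HX8K/build would produce a single file for synthesis. The name combined.v is illustrative.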
The downscale.py
module takes a grayscale image as a stream of pixels (1 pixel per clock) and produces a downscaled version by summing together neighboring blocks of pixel values. It uses a linebuffer which stores sections of the image and outputs a sliding window of pixels and a parallel add module to sum these groups of pixels when they become valid. This module is used in the arducam pipeline to scale the 320x240 image to 16x16 using a 20x15 window.
However, summing over the entire 20x15 window (300 pixels) requires a 2^9 = 512-input parallel add module, which is impractical due to the size constraints of the IceStick and HX8K board. Thus, the downscale_sub.py module performs the same function as the downscale.py module, using the same linebuffer to output 20x15 windows of pixels, but only sums 16 of the 300 pixels in each window to calculate the new pixel value.
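The adder sizing and the subsampled sum can be sketched in Python. The text does not say which 16 pixels downscale_sub.py selects, so the evenly spaced 4x4 grid below (rows 0, 3, 7, 11 and columns 0, 5, 10, 15) is a hypothetical choice for illustration only.

```python
import math

WIN_W, WIN_H = 20, 15
WINDOW = WIN_W * WIN_H                            # 300 pixels per window
ADDER_INPUTS = 2 ** math.ceil(math.log2(WINDOW))  # next power of two: 512 = 2**9


def subsampled_sum(window, rows=4, cols=4):
    """Sum a rows x cols grid of evenly spaced pixels (16 of 300 by default).

    The spacing is hypothetical; downscale_sub.py may pick different pixels.
    """
    ys = [i * WIN_H // rows for i in range(rows)]  # 0, 3, 7, 11
    xs = [j * WIN_W // cols for j in range(cols)]  # 0, 5, 10, 15
    return sum(window[y][x] for y in ys for x in xs)
```

Besides shrinking the adder tree from 512 inputs to 16, the largest possible sum drops from 300 × 125 = 37500 to 16 × 125 = 2000, so the output needs fewer bits.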
This module performs binary image convolution with a streaming input image and filter kernel. A linebuffer is used to output a sliding window and the multiply-accumulate of the window with the kernel is carried out with XNOR-popcount since the inputs are binary.
Diagram showing linebuffer used to output a 3x3 sliding window in the convolution of an image (taken from http://blog.teledynedalsa.com/2012/05/image-filtering-in-fpgas/)
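The linebuffer idea in the diagram can be modeled in software. Only the previous k−1 rows plus the current partial row are kept. A k×k window is emitted as soon as each new pixel completes one, without waiting for the whole image. This is a minimal sketch of the technique, not the aetherling implementation. As in CNN layers, the kernel is not flipped, so strictly speaking this is a correlation.

```python
from collections import deque


def linebuffer_windows(pixels, width, k=3):
    """Yield k x k windows from a row-major pixel stream, buffering only
    the previous k-1 rows plus the current partial row."""
    lines = deque(maxlen=k - 1)  # previous complete rows
    current = []
    for p in pixels:
        current.append(p)
        x = len(current) - 1
        if len(lines) == k - 1 and x >= k - 1:  # window is valid
            window = [line[x - k + 1:x + 1] for line in lines]
            window.append(current[x - k + 1:x + 1])
            yield window
        if len(current) == width:  # row finished: push it into the buffer
            lines.append(current)
            current = []


def binary_conv(pixels, width, kernel):
    """Each output is the XNOR-popcount of one window with the binary kernel."""
    k = len(kernel)
    flat_kernel = [b for row in kernel for b in row]
    for window in linebuffer_windows(pixels, width, k):
        flat = [b for row in window for b in row]
        yield sum(1 for a, b in zip(flat, flat_kernel) if a == b)  # XNOR + popcount
```

For a 4x4 image streamed as `range(16)`, the first window emitted is `[[0, 1, 2], [4, 5, 6], [8, 9, 10]]`, produced as soon as pixel 10 arrives.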
This folder contains some miscellaneous python scripts. bmp2png.py creates a png image from a csv file of 16-bit RGB pixel values (as bytes), bytes2jpeg.py creates a jpg image from a csv of raw JPEG bytes, gray2img.py displays a grayscale image from a csv file of grayscale values, and int2bitarray.py takes either a csv or a list of int values representing rows of a binary image and displays it as a grid of black and white pixels.
monitor.py is a simple program to receive UART data from the IceStick.
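The core of what int2bitarray.py does (turning each integer row into a row of pixels) can be sketched in a few lines. This is not the script itself. The sketch assumes the most significant bit is the leftmost pixel and renders to text rather than an image. The characters used are arbitrary.

```python
def int_to_bits(value, width=16):
    """Return the bits of `value`, most significant bit first (assumed leftmost pixel)."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def render(rows, width=16, on="#", off="."):
    """Render integer rows of a binary image as text, one line per row."""
    return "\n".join(
        "".join(on if bit else off for bit in int_to_bits(r, width)) for r in rows
    )
```

For example, `render([0xF0F0])` returns `"####....####...."`.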
bmp2png.py, gray2img.py, and int2bitarray.py are designed to be used with the logic analyzer to debug/check the output of the arducam, process, and rescale modules respectively. All of these modules include a "UART" output port, which transmits output data from the module at twice the SPI clock (SCLK) speed. These UART ports can be wired to GPIOs on the FPGA, which in turn may be connected to a logic analyzer. In the Saleae Logic software, there is an option to add an analyzer to a channel, so we can add an "Async Serial" analyzer to the channels connected to the UART ports and set the bit rate and data width appropriately in order to extract output data from these modules.
Screenshot of a logic analyzer capture showing UART data from the three modules.
The analyzer settings for each module's UART port are:
Module     Bit rate (bits/s)    Bits per transfer
arducam    6000000              8
process    6000000              16
rescale    6000000              16
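A quick timing check shows the 6 Mbit/s rate (twice the 3 MHz SCLK) is fast enough for each UART port to finish a frame before the next output word is ready. This sketch assumes one start bit, one stop bit, and no parity per frame. The text does not state the frame format. An arducam byte takes 10 / 6 MHz ≈ 1.67 µs on the UART versus 8 / 3 MHz ≈ 2.67 µs on SPI. A process word takes 18 / 6 MHz = 3 µs versus 16 / 3 MHz ≈ 5.33 µs.

```python
SCLK_HZ = 3_000_000
BAUD = 2 * SCLK_HZ  # 6,000,000 bit/s, as in the analyzer settings


def frame_bits(data_bits, start_bits=1, stop_bits=1):
    """Bits on the wire per UART frame, assuming no parity bit."""
    return start_bits + data_bits + stop_bits


def keeps_up(data_bits, spi_bytes_per_output):
    """True if one UART frame finishes before the next output word is ready.

    spi_bytes_per_output: SPI bytes consumed per module output word.
    """
    uart_time = frame_bits(data_bits) / BAUD
    spi_time = spi_bytes_per_output * 8 / SCLK_HZ
    return uart_time <= spi_time
```

`keeps_up(8, 1)` (arducam) and `keeps_up(16, 2)` (process) both return True under these assumptions.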
The data from each of these analyzers can also be exported to a csv file, which can then be fed as input to the appropriate python script to visualize the output data.
Images generated by bmp2png.py, gray2img.py, and int2bitarray.py, using csv data from the arducam, process, and rescale modules respectively.