py-videocore6

Python library for GPGPU programming on Raspberry Pi 4.

The V3D DRM driver code in this library is based on linux/drivers/gpu/drm/v3d, and the assembler is based on mesa/src/broadcom/qpu. In particular, the disassembler in the Mesa repository is a great help in reverse-engineering the VideoCore VI QPU instruction set. You can try it with Terminus-IMRC/vc6qpudisas.

For Raspberry Pi 1/2/3, use nineties/py-videocore instead.

About VideoCore VI QPU

Raspberry Pi 4 has a GPU named VideoCore VI QPU in its SoC. Although the basic instruction set (add/mul ALU dual issue, three delay slots, etc.) remains the same as that of the VideoCore IV QPU in Raspberry Pi 1/2/3, the way some units are used has changed dramatically. For instance, the TMU can now write to memory in addition to reading from it. See the tests directory for more examples.

The theoretical peak performance of the QPUs is as follows. Note that the V3D DRM driver does not currently seem to support multi-slice CSD jobs.

  • VideoCore IV QPU @ 250MHz: 250 [MHz] x 3 [slice] x 4 [qpu/slice] x 4 [physical core/qpu] x 2 [op/cycle] = 24 [Gflop/s]
  • VideoCore IV QPU @ 300MHz: 300 [MHz] x 3 [slice] x 4 [qpu/slice] x 4 [physical core/qpu] x 2 [op/cycle] = 28.8 [Gflop/s]
  • VideoCore VI QPU @ 500MHz: 500 [MHz] x 2 [slice] x 4 [qpu/slice] x 4 [physical core/qpu] x 2 [op/cycle] = 32 [Gflop/s]
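Each figure above is simply the product of the bracketed factors. As a sketch, the helper below reproduces them; `peak_gflops` is a hypothetical name for illustration, not part of py-videocore6, and its defaults (4 QPUs per slice, 4 physical cores per QPU, 2 ops per cycle) are taken from the list above.

```python
def peak_gflops(clock_mhz, slices, qpus_per_slice=4, cores_per_qpu=4, ops_per_cycle=2):
    """Theoretical peak: MHz x slices x QPUs/slice x cores/QPU x ops/cycle."""
    mflops = clock_mhz * slices * qpus_per_slice * cores_per_qpu * ops_per_cycle
    return mflops / 1000.0  # Mflop/s -> Gflop/s


# Hypothetical helper; configurations copied from the list above.
configs = {
    "VideoCore IV QPU @ 250MHz": peak_gflops(250, slices=3),
    "VideoCore IV QPU @ 300MHz": peak_gflops(300, slices=3),
    "VideoCore VI QPU @ 500MHz": peak_gflops(500, slices=2),
}
for name, gflops in configs.items():
    print(f"{name}: {gflops:g} Gflop/s")
```

If a single CSD job is indeed limited to one slice, its ceiling would be `peak_gflops(500, slices=1)`, i.e. half of the VideoCore VI figure; this is an inference from the note above, not a measured number.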

Installation

You can install py-videocore6 directly using pip:

$ sudo apt-get update
$ sudo apt-get install python3-pip
$ pip3 install --user git+https://github.com/Idein/py-videocore6.git

Testing

$ pip3 install --user nose
$ git clone https://github.com/Idein/py-videocore6.git
$ cd py-videocore6/
$ nosetests -v -s

To run all tests, including those that require root privileges:

$ pip3 install --target dest . nose
$ sudo PYTHONPATH=dest nosetests -v -s

Running examples

In the py-videocore6 directory:

$ PYTHONPATH=. python3 examples/sgemm.py
==== sgemm example (123x567 times 567x512) ====
numpy: 0.03433 sec, 2.086 Gflop/s
QPU:   0.8327 sec, 0.08599 Gflop/s
Minimum absolute error: 0.0
Maximum absolute error: 0.0
Minimum relative error: 0.0
Maximum relative error: 0.0

Note that the current sgemm implementation is quite naive, so its performance is low, at least for now.
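The Gflop/s figures in the sample output are the operation count divided by the elapsed time. The sketch below assumes the count is 2·m·k·n for the matrix product plus 3·m·n for computing alpha·(A·B) + beta·C; we have not checked examples/sgemm.py, but this assumption reproduces both printed figures. `sgemm_flops` and `gflops` are hypothetical helpers, not part of the library.

```python
def sgemm_flops(m, k, n):
    # Assumption: 2*m*k*n flops for the (m x k) @ (k x n) product, plus
    # 3*m*n for scaling by alpha, scaling C by beta, and adding the two.
    return 2 * m * k * n + 3 * m * n


def gflops(m, k, n, seconds):
    # Operation count divided by wall-clock time, in Gflop/s.
    return sgemm_flops(m, k, n) / seconds / 1e9


# Sizes and timings copied from the sample output (123x567 times 567x512).
print(f"numpy: {gflops(123, 567, 512, 0.03433):.4g} Gflop/s")
print(f"QPU:   {gflops(123, 567, 512, 0.8327):.4g} Gflop/s")
```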

$ pip3 install --target dest .
$ sudo PYTHONPATH=dest python3 examples/pctr_gpu_clock.py
==== QPU clock measurement with performance counters ====
500.535482 MHz
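A counter-based clock measurement generally divides a number of elapsed GPU cycles by the elapsed wall-clock time. We have not checked how examples/pctr_gpu_clock.py implements this; the sketch below only shows the arithmetic, and `clock_mhz` is a hypothetical helper. The cycle count and interval are illustrative values, not readings from a real performance counter.

```python
def clock_mhz(cycles, seconds):
    # Frequency = elapsed GPU cycles / elapsed wall-clock seconds, in MHz.
    return cycles / seconds / 1e6


# Hypothetical values for illustration only: 500,535,482 cycles in 1.0 s.
print(f"{clock_mhz(500_535_482, 1.0):f} MHz")
```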