mvduin/py-uio

examples pypruss

Closed this issue · 21 comments

Dear @mvduin ,

Looks like a great repository. Could you also provide examples of the old Pruss library? I am especially interested in how to write to the PRU 0 memory, as I am working on a laser scanner for printed circuit board manufacturing which uses that.
Specifically, how to port memwrite and memread.

Greetings from Nijmegen,

Rik

It seems like a rather odd thing to file an issue here asking for examples for a different project which I have never worked on.

@mvduin and @dromer it seems I misunderstood this library. I was looking for RPROC support.
The pypruss library uses the old UIO interface, and I thought this library fixed that. I guess I have to wait for someone to fix this in a GSOC project. My apologies for the confusion.

rproc is useful for kernel drivers that wish to interface with pru firmware. Compared to uio it offers no benefit to userspace code.

I'm curious, why would you want to use rproc instead of uio?

I thought uio would eventually be replaced by rproc. In some new images, it takes quite some additional work to disable rproc and enable uio. rproc is the standard, so I would prefer to use it. Anyhow, my current setup does work with uio. If you are interested, I am working on a transparent polygon scanner which uses the BeagleBone. Hzeller made a laser scanner with a reflective polygon. That scanner is limited in resolution; a transparent polygon removes this limitation.

In current beagleboard.org images, rproc-pru and uio-pruss are equally easy to use: you can simply uncomment a line in /boot/uEnv.txt to enable one or the other. Also the -bone kernels (based on mainline linux rather than TI's linux tree) only support uio-pruss. UIO itself has been around for ages, much longer than remoteproc, and isn't going anywhere.

As far as I can tell, remoteproc-pru merely adds complication while reducing functionality and without providing any benefit to userspace applications (that I've been able to discern so far), hence I personally don't really feel any urge to invest any time into it.

Note btw that, unlike libprussdrv and the old pypruss, my code supports loading ELF executables produced by clpru (in addition to raw binaries produced by pasm).

I also had to modify the old bootloader on my eMMC as it was blocking U-Boot overlays, and I had to install some modules for uio_pruss as they were not on the image by default.
Thank you for clarifying the differences between remoteproc and uio-pruss.
Your library has several advantages: it is more up to date and written in Python 3. Still, I would appreciate additional examples. An example of how to write to and read from PRU0 dram would be great.

An old u-boot on eMMC can indeed cause problems when booting from SD card. Reflashing or simply wiping eMMC (sudo blkdiscard /dev/mmcblk1) fixes that. A few versions of the 4.9-ti kernel series were accidentally missing the uio-pruss driver, but that was fixed a while ago, so upgrading to the latest 4.9-ti or 4.14-ti kernel should fix that problem.

PRU0 dram can be accessed via pruss.dram0 or pruss.core0.dram. This is a memory region object whose API is documented on the wiki. Various examples use shared memory:

  • intc-test.py (and intc-test-asyncio.py) use both dram0 and dram1
  • elf-test.py uses pruss.dram2 (aka core.shared_dram)
  • ddr-ping.py uses the ddr memory region allocated by uio-pruss
  • stream.py uses multiple memories
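
For a feel of what reading and writing a 32-bit word at a byte offset amounts to, here is a minimal sketch that runs without any PRU hardware. It assumes a plain bytearray as a stand-in for dram0, and the helper names write_u32 and read_u32 are hypothetical, not part of py-uio. On real hardware you would go through the memory region object's own methods as documented on the wiki.

```python
from ctypes import c_uint32

# Stand-in for the 8 KiB PRU0 data RAM (assumption: on real hardware this
# would be the memory region object, e.g. pruss.dram0, not a bytearray).
dram0 = bytearray(8 * 1024)

def write_u32(mem, offset, value):
    """Store a 32-bit word at the given byte offset (hypothetical helper)."""
    c_uint32.from_buffer(mem, offset).value = value

def read_u32(mem, offset):
    """Load a 32-bit word from the given byte offset as a plain int."""
    return c_uint32.from_buffer(mem, offset).value

write_u32(dram0, 0x100, 0xbabe0000)
print(hex(read_u32(dram0, 0x100)))  # 0xbabe0000

# A ctypes object created with from_buffer is a live view of the memory:
# changing its value changes the underlying bytes.
word = c_uint32.from_buffer(dram0, 0x104)
word.value = 0x12345678
print(hex(read_u32(dram0, 0x104)))  # 0x12345678
```

The second half shows why a mapped object is convenient for memory that is accessed often: it is created once and then read or written like an ordinary variable.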

I have almost ported memread. I can read dram0 and dram1 but am unable to read the shared RAM: I get 0x0 but expect 0xbabe0002, see the code below.
I like your library a lot better. It has a lot more features and does not require sudo privileges.

#!/usr/bin/python3
""" mem_read.py - test script for reading PRU memory using pyuio library """

from pyuio.ti.icss import Icss
from ctypes import c_uint32

pruss = Icss("/dev/uio/pruss/module")
pruss.initialize()

pruss.core0.load('./mem_read.bin')

pruss.core0.run()

while not pruss.core0.halted:
    pass

pru0mem = pruss.core0.dram.map(c_uint32)
print(hex(pru0mem.value))
pru1mem = pruss.core1.dram.map(c_uint32)
print(hex(pru1mem.value))
sharedmem = pruss.core0.shared_dram.map(c_uint32)
print(hex(sharedmem.value))  # NOT WORKING GET 0 expect 0xbabe0002
shmem = pruss.ddr.map(c_uint32)
print(hex(shmem.value))

Your pru program is setting c28 (of both cores, rude!) to 0x12000, i.e. offset 0x2000 in the shared dram. However, this is not where your python program is reading.
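
The arithmetic behind "0x12000 is offset 0x2000 in the shared dram" can be sketched as a small lookup. This assumes the PRU-local data address map of the AM335x (own data RAM at 0x0000, the other core's data RAM at 0x2000, shared RAM at 0x10000, with sizes of 8 KiB, 8 KiB and 12 KiB). The function name locate is hypothetical and only the region names are borrowed from py-uio.

```python
# PRU-local data addresses as seen from one core (assumed AM335x layout).
REGIONS = [
    ("dram",        0x00000, 0x2000),  # this core's own 8 KiB data RAM
    ("peer_dram",   0x02000, 0x2000),  # the other core's 8 KiB data RAM
    ("shared_dram", 0x10000, 0x3000),  # 12 KiB shared RAM
]

def locate(address):
    """Translate a PRU-local address into (region name, offset in region)."""
    for name, base, size in REGIONS:
        if base <= address < base + size:
            return name, address - base
    raise ValueError("address 0x%x is not in a PRU-local data memory" % address)

name, offset = locate(0x12000)
print(name, hex(offset))  # shared_dram 0x2000
```

So a Python program that reads the shared memory at offset 0 is looking 0x2000 bytes before the word the PRU program wrote.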

A nicer approach is to set c28 in python before calling core.run(). Here's an example:

from ti.icss import Icss
from ctypes import c_uint32

pruss = Icss("/dev/uio/pruss/module")
pruss.initialize()

core = pruss.core0  # core1 also works

core.load('fw/mem_read.bin')
core.c24 = 0x00100
core.c25 = 0x02f00
core.c28 = 0x12300
core.r4  = pruss.ddr.address + 0x12344
core.run()

while not core.halted:
    pass

print( hex( core.dram       .read( c_uint32, 0x100 ) ) )
print( hex( core.peer_dram  .read( c_uint32, 0xf00 ) ) )
print( hex( core.shared_dram.read( c_uint32, 0x2300 ) ) )
print( hex( pruss.ddr       .read( c_uint32, 0x12344 ) ) )

with corresponding pru code:

#include "common.h"

.entrypoint start
start:
        mov     r0, 0xbabe0000
        sbco    r0, c24, 0, 4

        mov     r0, 0xbabe0001
        sbco    r0, c25, 0, 4

        mov     r0, 0xbabe0002
        sbco    r0, c28, 0, 4

        mov     r0, 0xbabe0003
        sbbo    r0, r4, 0, 4

        halt

Note: commit 45cabc4 is needed to make those .read( c_uint32, offset ) calls return python integers instead of c_uint32 objects.

BTW, keep in mind that memories retain their contents, so if you accidentally break the pru program you might not notice at first if the correct values are still in memory from a previous test. One solution would be to zero-fill all the memories as part of initialization. I've just made doing so a lot easier by adding a fill method to MemRegion (commit 6a6fc60) and an option to fill all memories (commit bb98a8b) using:

pruss.initialize( fill_memories=True )

Also, just to point out a technicality: from a theoretical point of view, it is not actually guaranteed that python will read the correct value from pruss.ddr. The pru core halts after the final sbbo instruction has been executed, but the corresponding write-transfer may still be in flight on the main interconnect that connects various cores, memories, and peripherals on the SoC. Since the pruss.ddr.read from the Cortex-A8 takes a different path to the DDR memory controller, it could theoretically arrive there before the PRU's write does.

In practice this will probably never happen, since the timing window is extremely small (transit times on the interconnect are normally a few dozen nanoseconds). It is however also quite easy to guarantee that it will never happen: any read from DDR memory by PRU will implicitly wait for all previous writes to DDR memory by PRU to complete. Adding a dummy read after the write and before notifying the cortex-a8 will therefore ensure that the data has safely landed and therefore the cortex-a8 cannot read stale data.

@mvduin thanks for all your help. The project has been quite successful on Hackaday: I have 2200 followers, am in the semi-finals, and have a working prototype. At the moment, I use the most recent version of your software.
I also have two questions, as you seem to be the expert in this kind of thing.

  1. If I configure my pins after a boot via config-pin -f firestarter.bbio, the laser turns on. The only solution I can think of is running a script on the PRU. For some reason, I have to reset the PRU. Do you know of a more elegant solution? The relevant pin is set to the pruout mode here.
  2. At the moment, I use a ring buffer to send data to the laser head. The approach is similar to your intc-test. The PRU and CPU communicate via register 31. The photodiode is read out via register 31 as well, which can give conflicts. The PRU needs to know exactly when the photodiode is high, so it keeps a hold on register 31. A request from the CPU can introduce noise. At the moment, I fix this by passing the following
irq = Uio("/dev/uio/pruss/irq%d" % IRQ, blocking=False)

and using

    while True:
        result = irq.irq_recv()
        if result:
            break
        else:
            sleep(1E-3)

I also keep the window of my photo-diode very small.
What are my other options here? Should I read out the pin via GPIO? Can I use another register for communication between the CPU and PRU?

The following parts of PRUSS are not initialized by hardware, and will therefore initially contain random data after power-on until they are initialized by software:

  • registers r0-r30 of both cores (r31 is not a real register and has no state)
  • instruction memory for both cores (iram0, iram1)
  • all three pruss-local data memories (dram0, dram1, dram2)

When using py-uio, you typically call pruss.initialize() which will call core.full_reset() on both cores, which will (among other things) reset r0-r30 to zero. If you pass fill_memories=True to pruss.initialize() it will also zero-fill the data memories and fill the instruction memories with the HALT instruction.

When you configure pins to 'pruout', they will immediately begin driving their output value from r30 of the relevant pru core, hence you should ensure this register is initialized before you configure the pins to pruout. A simple solution is to configure the pins from your python script after having initialized pruss. Note that all config-pin does is read/write the pinmux state attributes in sysfs, which you can just as easily do in pure python, e.g.

def sysfs_write( path, data ):
    if isinstance( data, str ):
        data = data.encode()
    with open( path, 'wb', buffering=0 ) as f:
        f.write( data )

def config_pin( pin, mode ):
    sysfs_write( '/sys/bus/platform/devices/ocp:'+pin+'_pinmux/state', mode )

config_pin( 'P8_11', 'pruout' )

(Should you ever feel the desire to declare pinmux in DT instead of using cape-universal, I have a kernel patch that allows userspace to select the pinmux state of UIO devices via an ioctl(). You can then use DT to declare a "default" state that doesn't mux any pruout-pins and an "active" state that does, and switch the pinmux state from default to active after initializing pruss)

As for your second question, I can't even begin to guess what you're talking about. The code you're showing is related to receiving events from pru and the changes you made (making the irq device non-blocking and polling it 1000 times per second) just makes your python script waste CPU time and cause it to respond more slowly to events generated by PRU. It has no effect whatsoever on PRU itself.
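
To illustrate why a blocking wait is preferable to polling, here is a minimal sketch that runs without hardware. It assumes an ordinary pipe as a stand-in for the UIO irq device, and a thread as a stand-in for the PRU. The names wait_for_event and fake_pru are hypothetical. The only property carried over from UIO is that a read returns a native 32-bit event count once an event has occurred.

```python
import os
import struct
import threading
import time

def wait_for_event(fd):
    """Sleep in the kernel until an event arrives, then return its count."""
    data = os.read(fd, 4)                 # blocks; uses no CPU while waiting
    return struct.unpack("=i", data)[0]   # native-endian 32-bit integer

read_fd, write_fd = os.pipe()             # stand-in for the irq device

def fake_pru():
    time.sleep(0.05)                      # the "PRU" does some work first
    os.write(write_fd, struct.pack("=i", 1))

sender = threading.Thread(target=fake_pru)
sender.start()
count = wait_for_event(read_fd)           # returns as soon as data is written
sender.join()
os.close(read_fd)
os.close(write_fd)
print("event count:", count)              # event count: 1
```

The waiting side wakes up as soon as the event is written, whereas a loop that polls and sleeps for 1 ms can add up to 1 ms of latency per event while still waking the CPU a thousand times per second.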

Like I said before, R31 is not a real register: reading from it reads the pruin pins (and two irq outputs from the pruss interrupt controller), writing to it is used for generating events (sent to the pruss interrupt controller), and these two uses of R31 are completely unrelated to each other and do not influence each other in any way.

Thank you for your reply. I still have to add fill_memories to the code but will do so as soon as possible.
Your code for configuring pins to pruout is useful. I think this code is best implemented by the adafruit io library. I have found a bug and already suggested your code to them.
In case you're interested, the pru code I use is here. The python code I use is here. The conflict occurs between line 111 in my python code and line 80 in my pru code. I think the PRU keeps quite a strong lock on R31. Typically, this should at least give an error like "time over run" in line 384, but for some reason this doesn't occur. I will talk to zeller about it.

My impression is that the adafruit python libraries are mostly terrible, and often superfluous.

I have no idea what you mean by "conflict", there is no interaction possible whatsoever between the two lines you indicated, so I think you're just confused about the root cause of whatever problem you're experiencing.

The adafruit library does have bugs, but the less code I have to write/maintain the better.

The conflict arises as pru core 0 looks at register 31 to determine whether there is an input from the photodiode, i.e. QBBS bit_is_set, r31, 16, and the CPU looks at register 31, result = irq.irq_recv(), to determine whether there is a new trigger from the pru, MOV R31.b0, PRU0_ARM_INTERRUPT+16.
This causes competition between the PRU and CPU over register 31.
I am pretty sure the python line causes the conflict, as I can disable the script and let the PRU run. This affects the speed at which the polygon spins and how well the photodiode is detected.
What I could do is try to do something else with the pru, like flip a GPIO pin I don't use, and check with the CPU whether this GPIO pin is flipped.

The cortex-a8 cannot access PRU registers while the PRU core is running, and in particular never accesses R31, nor does it have any reason to access it.

You still seem very confused about how R31 works. Please read carefully what I said in my comment from yesterday:

R31 is not a real register: reading from it reads the pruin pins (and two irq outputs from the pruss interrupt controller), writing to it is used for generating events (sent to the pruss interrupt controller), and these two uses of R31 are completely unrelated to each other and do not influence each other in any way.

To emphasize: when PRU does MOV R31.b0, PRU0_ARM_INTERRUPT+16 this has absolutely no effect on the value read from R31, nor does the irq handling on the cortex-a8 have any effect whatsoever on the value PRU reads from R31.
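
The independence of the two uses of R31 can be pictured with a toy model. This is only an illustration under stated assumptions, not a description of py-uio or of the hardware in full: the class R31Model is hypothetical, the write encoding (bit 5 as strobe, low four bits selecting one of the system events 16 to 31) is recalled from the AM335x reference manual, and PRU0_ARM_INTERRUPT = 19 is assumed from the prussdrv headers.

```python
PRU0_ARM_INTERRUPT = 19  # assumption: value used by the prussdrv headers

class R31Model:
    """Toy model: reads and writes of R31 touch two unrelated things."""

    def __init__(self):
        self.input_pins = 0    # what a read returns (pruin pins, irq outputs)
        self.events_sent = []  # what writes produce (events sent to the intc)

    def read(self):
        return self.input_pins

    def write(self, value):
        if value & (1 << 5):                          # strobe bit set
            self.events_sent.append(16 + (value & 0xF))

r31 = R31Model()
r31.input_pins = 1 << 16                 # photodiode input high on bit 16
before = r31.read()
r31.write(PRU0_ARM_INTERRUPT + 16)       # MOV R31.b0, PRU0_ARM_INTERRUPT+16
after = r31.read()

print(before == after)                   # True
print(r31.events_sent)                   # [19]
```

Writing generates an event and leaves the value that a later read returns untouched, which is why the event signalling cannot disturb the photodiode test.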

Your ringbuffer code looks really confusing to me btw and I suspect it is wrong, but I'll need to study it a bit more to try to understand what it's trying to do.

Some other random remarks:

  • It is pointless to do pruss.intc.out_enable_one(IRQ) twice in the loop. It only needs to be done once before the loop and once after receiving an IRQ (irq_recv() returns true).
  • Every use of dram.map in your code is a silly use of this method and can be replaced by dram.read to get the same result more efficiently. For memory accessed frequently (e.g. the ringbuffer) using map can be more efficient and/or convenient, but this isn't how you use that method (you use map only once and then read/write the fields/elements of the mapped object to access the underlying shared memory).
  • It feels a bit weird to use halted as loop condition in an irq-handling loop: if the core halts without sending an event it will hang forever, and even if the core sends an event before halting there's technically a race condition in the python-side check (although unlikely to ever trigger in practice). I suggest you have PRU send an event after it indicates it's exiting with CMD_DONE and use that as criterion to break out of the irq-handling loop. You can optionally add a second loop to wait for halted after that if you have a reason to care about making sure the core has actually halted, and in fact you already have such a loop.
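
The loop structure suggested in the first and last remarks could look roughly like the sketch below. Only irq_recv(), out_enable_one() and the name CMD_DONE come from this thread. The function serve_until_done, the callbacks read_command and handle, the value of CMD_DONE, and the fake classes are all hypothetical and exist only so the sketch runs without hardware. Clearing the event in the interrupt controller is assumed to happen inside handle, since the calls for that are not shown here.

```python
CMD_DONE = 0xFF  # hypothetical value; use whatever the firmware writes

def serve_until_done(irq, intc, irq_number, read_command, handle):
    """Handle PRU events until the PRU reports that it is exiting."""
    intc.out_enable_one(irq_number)        # once before the loop
    while True:
        irq.irq_recv()                     # blocking wait for the next event
        command = read_command()
        if command == CMD_DONE:            # exit on the event, not on halted
            return
        handle(command)                    # assumed to clear the event too
        intc.out_enable_one(irq_number)    # once after each received IRQ

# Stand-ins so the sketch can run without a PRU.
class FakeIrq:
    def irq_recv(self):
        return True

class FakeIntc:
    def __init__(self):
        self.enables = 0
    def out_enable_one(self, number):
        self.enables += 1

commands = iter([1, 2, CMD_DONE])
handled = []
intc = FakeIntc()
serve_until_done(FakeIrq(), intc, 2, lambda: next(commands), handled.append)
print(handled, intc.enables)               # [1, 2] 3
```

A separate loop that waits for the core to report halted can follow this function if it matters that the core has actually stopped.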

Thanks for the feedback. I will clean up my code and see if I can simplify my problem and maybe that will solve it. It could take some time, but will keep you posted.

The problems have been resolved. There were two issues. One was a mistake in the state machine on the assembly side. The other was that the laser affected the current supplied to the polygon motor, which created crosstalk: if the laser was turned on, the motor would get slightly less current.
If not enough data was sent to the state machine, it had to restart, which created even more noise. There was interplay between these two mistakes.