google-coral/edgetpu

Coral TPU over PCI Express (PCIe) on Raspberry Pi 5 randomly hangs


Description

Hi. I know this GitHub repository is not very active, but I have decided to document my situation:

TL;DR: On a Raspberry Pi 5 with a Pineboard HAT for PCIe, both the examples provided by PyCoral and my own model randomly hang after reading from the TPU.
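
Since a hang inside a native driver call generally cannot be cancelled from within the same Python process, one workaround while debugging is to run each inference script in a child process and kill it if it exceeds a time limit. This is a minimal sketch, not part of PyCoral; the helper name `run_with_watchdog` and the timeout value are hypothetical choices.

```python
import subprocess
import sys


def run_with_watchdog(cmd, timeout_s):
    """Run cmd in a child process and kill it if it runs longer than timeout_s.

    Returns the child's exit code, or None if it was killed for hanging.
    """
    try:
        # On timeout, subprocess.run() sends SIGKILL to the child, waits for it,
        # then re-raises TimeoutExpired.
        # Caveat: if the child is stuck in uninterruptible kernel sleep (D state),
        # the kill only takes effect once it leaves that state, so the wait can block.
        completed = subprocess.run(cmd, timeout=timeout_s)
        return completed.returncode
    except subprocess.TimeoutExpired:
        return None
```

For example, `run_with_watchdog([sys.executable, "classify_image.py"], timeout_s=30)` returns `None` if the script hangs; the script name and arguments are placeholders for the PyCoral example or your own model.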

Details:

  • Hardware: Raspberry Pi 5, 8 GB
    • Pineboard HAT, from here. That full product line has since been discontinued, and the power supply was changed to USB-C. Could that be the problem?
  • Software: Python 3.8.19
    • Linux raspberryb03 6.1.0-rpi7-rpi-v8 1 SMP PREEMPT Debian 1:6.1.63-1+rpt1 (2023-11-24) aarch64 GNU/Linux
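
The software details above can be collected in one place with a small standard-library script, which may help when comparing setups. This is a sketch; the distribution names `pycoral` and `tflite-runtime` are assumptions about how the packages were installed and may differ on your system.

```python
import platform
from importlib import metadata


def environment_report(packages=("pycoral", "tflite-runtime")):
    """Return a short text summary of the Python, kernel, and package versions."""
    lines = [
        f"Python: {platform.python_version()}",
        # release() gives the kernel version string, e.g. as shown by `uname -r`.
        f"Kernel: {platform.system()} {platform.release()} {platform.machine()}",
    ]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            version = "not installed"
        lines.append(f"{name}: {version}")
    return "\n".join(lines)
```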

I have also tracked the TPU temperature, which slowly increases after the code hangs, so I decided to kill the process.
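
A minimal sketch of sampling the TPU temperature over time, assuming the PCIe (apex) driver exposes it at `/sys/class/apex/apex_0/temp` in millidegrees Celsius. The path, the unit, and the helper names are assumptions; check what your system actually provides.

```python
import time
from pathlib import Path

# Assumed sysfs location and unit (millidegrees Celsius) of the TPU temperature.
DEFAULT_TEMP_PATH = Path("/sys/class/apex/apex_0/temp")


def read_tpu_temp_celsius(path=DEFAULT_TEMP_PATH):
    """Read the raw sysfs value and convert millidegrees to degrees Celsius."""
    raw = Path(path).read_text().strip()
    return int(raw) / 1000.0


def sample_temperatures(path=DEFAULT_TEMP_PATH, samples=10, interval_s=1.0):
    """Return (unix_time, celsius) pairs, sleeping interval_s between readings."""
    readings = []
    for i in range(samples):
        readings.append((time.time(), read_tpu_temp_celsius(path)))
        if i < samples - 1:
            time.sleep(interval_s)
    return readings
```

Running the sampler in a separate terminal while the inference script runs makes it easier to see whether the temperature starts rising before or after the hang.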

Here is the actual output at the point where it hangs, with verbose = 10:

I driver/request.cc:47] Adding input "serving_default_images:0" with 307200 bytes.
I driver/request.cc:58] Adding output "Sigmoid" with 38412 bytes.
I driver/request.cc:58] Adding output "concat_1" with 76824 bytes.
I driver/request.cc:167] Request prepared, total batch size: 1, total TPU requests required: 1.
I driver/driver.cc:307] Request [5]: Submitting P0 request immediately.
I driver/single_tpu_request.cc:80] [6] Request constructed.
I driver/single_tpu_request.cc:113] Adding input "serving_default_images:0" with 307200 bytes.
I driver/single_tpu_request.cc:187] Adding output "Sigmoid" with 38412 bytes.
I driver/single_tpu_request.cc:187] Adding output "concat_1" with 76824 bytes.
I driver/package_registry.cc:639] Reusing old instruction buffers.
I driver/device_buffer_mapper.cc:75] Mapped scratch : Buffer(ptr=(nil)) -> 0x0000000000000000, 0 bytes.
I driver/kernel/kernel_mmu_mapper.cc:135] MmuMapper#Map() : 0000007f7009e000 -> 8000000000800000 (76 pages) flags=00000002.
I driver/memory/mmio_address_space.cc:55] MapMemory() page-aligned : device_address = 0x8000000000800000
I driver/device_buffer_mapper.cc:222] Mapped "serving_default_images:0" : Buffer(ptr=0x7f7009e040) -> 0x8000000000800040, 307200 bytes. Direction=1
I driver/kernel/kernel_mmu_mapper.cc:135] MmuMapper#Map() : 00000055a373a000 -> 8000000000880000 (19 pages) flags=00000004.
I driver/memory/mmio_address_space.cc:55] MapMemory() page-aligned : device_address = 0x8000000000880000
I driver/kernel/kernel_mmu_mapper.cc:135] MmuMapper#Map() : 00000055a374e000 -> 80000000008a0000 (19 pages) flags=00000004.
I driver/memory/mmio_address_space.cc:55] MapMemory() page-aligned : device_address = 0x80000000008a0000
I driver/device_buffer_mapper.cc:222] Mapped "concat_1" : Buffer(ptr=0x55a374e000) -> 0x80000000008a0000, 76824 bytes. Direction=2
I driver/device_buffer_mapper.cc:222] Mapped "Sigmoid" : Buffer(ptr=0x55a373a000) -> 0x8000000000880000, 76824 bytes. Direction=2
I driver/single_tpu_request.cc:365] MapDataBuffers() done.
I driver/executable_util.cc:93] Linking serving_default_images:0[0]: 0x8000000000800040
I driver/executable_util.cc:93] Linking Sigmoid[0]: 0x8000000000880000
I driver/executable_util.cc:93] Linking concat_1[0]: 0x80000000008a0000
I driver/kernel/kernel_mmu_mapper.cc:135] MmuMapper#Map() : 00000055a3677000 -> 80000000008c0000 (64 pages) flags=00000002.
I driver/memory/mmio_address_space.cc:55] MapMemory() page-aligned : device_address = 0x80000000008c0000
I driver/kernel/kernel_mmu_mapper.cc:135] MmuMapper#Map() : 00000055a36b8000 -> 8000000000900000 (64 pages) flags=00000002.
I driver/memory/mmio_address_space.cc:55] MapMemory() page-aligned : device_address = 0x8000000000900000
I driver/kernel/kernel_mmu_mapper.cc:135] MmuMapper#Map() : 00000055a36f9000 -> 8000000000940000 (28 pages) flags=00000002.
I driver/memory/mmio_address_space.cc:55] MapMemory() page-aligned : device_address = 0x8000000000940000
I driver/device_buffer_mapper.cc:222] Mapped "instructions" : Buffer(ptr=0x55a3677000) -> 0x80000000008c0000, 260416 bytes. Direction=1
I driver/device_buffer_mapper.cc:222] Mapped "instructions" : Buffer(ptr=0x55a36b8000) -> 0x8000000000900000, 259600 bytes. Direction=1
I driver/device_buffer_mapper.cc:222] Mapped "instructions" : Buffer(ptr=0x55a36f9000) -> 0x8000000000940000, 113296 bytes. Direction=1
I driver/single_tpu_request.cc:381] MapInstructionBuffers() done.
I driver/single_tpu_request.cc:478] [6] SetState old=0, new=1.
I driver/single_tpu_request.cc:390] [6] NotifyRequestSubmitted()
I driver/single_tpu_request.cc:478] [6] SetState old=1, new=2.
I driver/single_queue_dma_scheduler.cc:82] Request[6]: Submitted
I driver/single_tpu_request.cc:398] [6] NotifyRequestActive()
I driver/single_tpu_request.cc:478] [6] SetState old=2, new=3.
I driver/single_queue_dma_scheduler.cc:132] Request[6]: Scheduling DMA[0]
I ./driver/mmio/host_queue.h:383] Adding an element to the host queue.
I driver/kernel/kernel_registers.cc:190] Write: offset = 0x00000000000485a8, value = 0x0000000000000011
I driver/single_queue_dma_scheduler.cc:132] Request[6]: Scheduling DMA[1]
I ./driver/mmio/host_queue.h:383] Adding an element to the host queue.
I driver/kernel/kernel_registers.cc:190] Write: offset = 0x00000000000485a8, value = 0x0000000000000012
I driver/single_queue_dma_scheduler.cc:132] Request[6]: Scheduling DMA[2]
I ./driver/mmio/host_queue.h:383] Adding an element to the host queue.
I driver/kernel/kernel_registers.cc:190] Write: offset = 0x00000000000485a8, value = 0x0000000000000013
I driver/kernel/linux/kernel_event_linux.cc:75] event_fd=7. Monitor thread got num_events=1.
I ./driver/mmio/host_queue.h:416] Completed 1 elements.
I driver/kernel/kernel_registers.cc:190] Write: offset = 0x00000000000485c8, value = 0x0000000000000000
I driver/single_queue_dma_scheduler.cc:154] Completing DMA[0]
I driver/kernel/linux/kernel_event_linux.cc:75] event_fd=7. Monitor thread got num_events=1.
I ./driver/mmio/host_queue.h:416] Completed 1 elements.
I driver/kernel/kernel_registers.cc:190] Write: offset = 0x00000000000485c8, value = 0x0000000000000000
I driver/single_queue_dma_scheduler.cc:154] Completing DMA[1]
I driver/kernel/linux/kernel_event_linux.cc:75] event_fd=7. Monitor thread got num_events=1.
I ./driver/mmio/host_queue.h:416] Completed 0 elements.
I driver/kernel/kernel_registers.cc:190] Write: offset = 0x00000000000485c8, value = 0x0000000000000000
I driver/kernel/linux/kernel_event_linux.cc:75] event_fd=11. Monitor thread got num_events=1.
I driver/kernel/kernel_registers.cc:190] Write: offset = 0x00000000000486a8, value = 0x000000000000000e
I driver/kernel/kernel_registers.cc:211] Read: offset = 0x00000000000486d0, value: = 0x0000000000000007

After that, it hangs forever.
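
In the log above, DMA[0], DMA[1], and DMA[2] are scheduled, but only DMA[0] and DMA[1] are reported as completing; the third monitor event reports "Completed 0 elements". A small parser can flag scheduled-but-uncompleted DMAs in longer logs. This is a sketch that assumes the excerpt covers a single request, because the "Completing DMA[n]" lines do not include a request ID.

```python
import re

SCHEDULED = re.compile(r"Request\[(\d+)\]: Scheduling DMA\[(\d+)\]")
COMPLETED = re.compile(r"Completing DMA\[(\d+)\]")


def pending_dmas(log_text):
    """Return DMA indices that were scheduled but never reported as completed."""
    scheduled = []
    completed = set()
    for line in log_text.splitlines():
        m = SCHEDULED.search(line)
        if m:
            scheduled.append(int(m.group(2)))
            continue
        m = COMPLETED.search(line)
        if m:
            completed.add(int(m.group(1)))
    return [d for d in scheduled if d not in completed]
```

Applied to the log in this issue, it returns `[2]`.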