raspberrypi/linux

I2C issues with 5.4 + TC358743

Closed this issue · 115 comments

Describe the bug
After updating the kernel from version 4.19.118 to 5.4.35 an attempt to get an image from the tc358743 device twice in a row (reopen device file) results to I2C timeouts. In some cases, this leads to a hang (see log1), sometimes it causes errors when working with the MMC card and a reboot (see log2).

UPD: The first message was not entirely accurate. The problem occur when the first reading process is interrupted. That is, I run yavta, press Ctrl+C and immediately got a dead kernel. I played around with yavta a bit and found out that the problem occurs either when closing /dev/video0 or when executing ioctl VIDIOC_STREAMOFF. I think the tc358743 driver is trying to command something over I2C, and everything stops working.

/UPD

[   79.678547] ------------[ cut here ]------------
[   79.678554] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   79.678569] tc358743 10-000f: i2c_wr: writing register 0x4 from 0xf failed
[   79.683241] WARNING: CPU: 3 PID: 34 at drivers/firmware/raspberrypi.c:63 rpi_firmware_transaction+0xe8/0x124
[   79.705713] Firmware transaction timeout
[   79.705715] Modules linked in: usb_f_mass_storage usb_f_hid usb_f_acm u_serial btsdio bluetooth ecdh_generic ecc brcmfmac brcmutil cfg80211 raspberrypi_hwmon hwmon i2c_mux_pinctrl i2c_mux bcm2835_unicam i2c_bcm2835 iproc_rng200 rng_core bcm2835_codec(C) bcm2835_v4l2(C) bcm2835_isp(C) bcm2835_mmal_vchiq(C) v4l2_mem2mem videobuf2_dma_contig videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common rpivid_mem uio_pdrv_genirq uio sch_fq_codel snd_bcm2835(C) snd_pcm snd_timer snd tc358743 v4l2_dv_timings v4l2_fwnode videodev mc cec libcomposite dwc2 udc_core drm drm_panel_orientation_quirks ip_tables x_tables ipv6 nf_defrag_ipv6
[   79.766074] CPU: 3 PID: 34 Comm: kworker/3:1 Tainted: G         C        5.4.35-1-ARCH #1
[   79.774292] Hardware name: BCM2711
[   79.777717] Workqueue: events dbs_work_handler
[   79.782192] [<c0211424>] (unwind_backtrace) from [<c020c6fc>] (show_stack+0x10/0x14)
[   79.789977] [<c020c6fc>] (show_stack) from [<c0ccc638>] (dump_stack+0x94/0xb4)
[   79.806238] [<c0ccc638>] (dump_stack) from [<c022ceb8>] (__warn+0xd0/0xf8)
[   79.817766] [<c022ceb8>] (__warn) from [<c022d29c>] (warn_slowpath_fmt+0x98/0xc4)
[   79.834301] [<c022d29c>] (warn_slowpath_fmt) from [<c0b36468>] (rpi_firmware_transaction+0xe8/0x124)
[   79.852463] [<c0b36468>] (rpi_firmware_transaction) from [<c0b36550>] (rpi_firmware_property_list+0xac/0x168)
[   79.871582] [<c0b36550>] (rpi_firmware_property_list) from [<c0b3666c>] (rpi_firmware_property+0x60/0x108)
[   79.890551] [<c0b3666c>] (rpi_firmware_property) from [<c0928494>] (raspberrypi_clock_property+0x48/0x78)
[   79.909488] [<c0928494>] (raspberrypi_clock_property) from [<c09285f8>] (raspberrypi_fw_set_rate+0x44/0xb8)
[   79.928821] [<c09285f8>] (raspberrypi_fw_set_rate) from [<c0921c94>] (clk_change_rate+0xe0/0x558)
[   79.947361] [<c0921c94>] (clk_change_rate) from [<c0922284>] (clk_core_set_rate_nolock+0x178/0x1a0)
[   79.966220] [<c0922284>] (clk_core_set_rate_nolock) from [<c09222dc>] (clk_set_rate+0x30/0x88)
[   79.984877] [<c09222dc>] (clk_set_rate) from [<c0b03da8>] (dev_pm_opp_set_rate+0x364/0x460)
[   80.003253] [<c0b03da8>] (dev_pm_opp_set_rate) from [<c0b0d0bc>] (set_target+0x2c/0x54)
[   80.021370] [<c0b0d0bc>] (set_target) from [<c0b07e08>] (__cpufreq_driver_target+0x220/0x534)
[   80.040342] [<c0b07e08>] (__cpufreq_driver_target) from [<c0b0b32c>] (od_dbs_update+0xb4/0x160)
[   80.059658] [<c0b0b32c>] (od_dbs_update) from [<c0b0c4c4>] (dbs_work_handler+0x2c/0x58)
[   80.078284] [<c0b0c4c4>] (dbs_work_handler) from [<c02489ec>] (process_one_work+0x1f0/0x588)
[   80.097449] [<c02489ec>] (process_one_work) from [<c0248dd0>] (worker_thread+0x4c/0x528)
[   80.116348] [<c0248dd0>] (worker_thread) from [<c024ec28>] (kthread+0x128/0x154)
[   80.134600] [<c024ec28>] (kthread) from [<c02010d8>] (ret_from_fork+0x14/0x3c)
[   80.152846] Exception stack(0xef2a9fb0 to 0xef2a9ff8)
[   80.163429] 9fa0:                                     00000000 00000000 00000000 00000000
[   80.182818] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   80.202646] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[   80.215376] ---[ end trace 91bfd0c131224965 ]---
[   80.226095] raspberrypi-clk firmware-clocks: Failed to change pllb frequency: -110
[   80.718577] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   80.730213] tc358743 10-000f: i2c_rd: reading register 0x2 from 0xf failed
[   81.758599] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   81.770193] tc358743 10-000f: i2c_wr: writing register 0x2 from 0xf failed
[   82.798624] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   82.810166] tc358743 10-000f: i2c_wr: writing register 0x2 from 0xf failed
[   83.838648] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   83.850108] tc358743 10-000f: i2c_wr: writing register 0x14c from 0xf failed
[   84.878670] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   84.890186] tc358743 10-000f: i2c_wr: writing register 0x150 from 0xf failed
[   85.918697] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   85.930211] tc358743 10-000f: i2c_wr: writing register 0x210 from 0xf failed
[   86.958710] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   86.970186] tc358743 10-000f: i2c_wr: writing register 0x214 from 0xf failed
[   87.998734] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   88.010264] tc358743 10-000f: i2c_wr: writing register 0x218 from 0xf failed
[   89.038749] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   89.050252] tc358743 10-000f: i2c_wr: writing register 0x21c from 0xf failed
[   90.078765] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   90.090261] tc358743 10-000f: i2c_wr: writing register 0x220 from 0xf failed
[   91.118779] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   91.130418] tc358743 10-000f: i2c_wr: writing register 0x224 from 0xf failed
[   92.158791] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   92.170372] tc358743 10-000f: i2c_wr: writing register 0x228 from 0xf failed
[   93.198809] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   93.210184] tc358743 10-000f: i2c_wr: writing register 0x22c from 0xf failed

To reproduce
Set up Auvidea B101 with kernel 4.4.35, run any capture software that supports DV-timings (https://github.com/pikvm/ustreamer, yavta, etc) and stop it.

Expected behaviour
No crashes when closing the device file

Actual behaviour
Subj

System

  • Which model of Raspberry Pi? e.g. Pi3B+, PiZeroW
    4B 1Gb
  • Which OS and version (cat /etc/rpi-issue)?
    Arch Linux ARM (uses the kernel from this repo).
  • Which firmware version (vcgencmd version)?
May  1 2020 17:56:42
Copyright (c) 2012 Broadcom
version 688a8f8f3d84e788f07f53c93693e1705f68c407 (clean) (release) (start_x)
  • Which kernel version (uname -a)?
Linux pikvm 5.4.35-1-ARCH #1 SMP PREEMPT Sun May 3 21:53:42 UTC 2020 armv7l GNU/Linux

Logs
Attached kernel log from tty.
Just I2C errors: log1.txt

MMC crash: log2.txt

@6by9 I'm sorry to call you directly, but you may be interested in this issue. However, I'm not sure if this is related to the tc358743 driver, because somehow it affected the MMC card driver as well. If you need more logs, I will try to provide them to you as soon as possible.

6by9 commented

Please provide your full configuration from config.txt.

5.4 has reworked i2c-0 to use the i2c-mux-pinctrl module to switch it between GPIOs 0&1, and whichever GPIOs the display/camera connector are attached to (either 28&29, or 44&45 dependent on variant). Any userspace messing with pinmuxing for i2c-0 should be avoided. i2c-0 is always GPIOs 0&1. i2c-10 is the camera/display.

We've also had a fairly significant update to the bcm2835-unicam driver as part of the work for libcamera, but that seems sound.

I'll have a look to see if there is anything obvious using standard v4l2-ctl commands.

MMC and the firmware blocked warnings generally mean you've managed to kill the firmware, typically a memory trample somehow.

6by9 commented

It looks like my B101 board has died - it is recognised fine, set EDID and timings fine, but it doesn't produce any output data with either 5.4.38 or 4.19.118.
I'll see if I can get a B102 working on a CM as I don't think I have another B101 around.

6by9 commented

OK, working fine with v4l2-ctl and yavta on a CM3L with B102, with kernel 5.4.35 and firmware 688a8f8f3d84e788f07f53c93693e1705f68c407 (rpi-update 8d64ec70 - top of tree today).
Source is 720p60 from a Pi3. Works in either RGB3 or UYVY.

More information required, particularly the source resolution and frame rate, format chosen from the B101, and what processing you're doing.

I have a suspicion that it may be a firmware change that is the problem, not the kernel. There were a few changes to the video_encoder and ISP components relatively recently.
Can you do rpi-update to get the latest kernel and firmware, and then sudo SKIP_KERNEL=1 rpi-update e5d3320 to revert the firmware to that hash (28 Feb in this case). If that works, work forward in commits to https://github.com/Hexxeh/rpi-firmware/commits/master until it fails.

config.txt:

initramfs initramfs-linux.img followkernel

hdmi_force_hotplug=1
gpu_mem=256
start_x=1
enable_uart=1
dtoverlay=tc358743
dtoverlay=pi3-disable-bt
dtoverlay=dwc2
dtparam=i2c_arm=on

The last line does not affect the presence of the problem, I added it out of desperation when trying to understand what is happening. If you delete it, the problem persists.

More information required, particularly the source resolution and frame rate, format chosen from the B101, and what processing you're doing.

I tried to simplify the way to reproduce the problem. The initial info about the launch of two times in a row, quite correct, the error appears upon termination of the read stream. I used yavta with EDID from your repository. Tried 720p/60Hz and 1080p/60Hz with UYVY - both resulted in a hang when pressing Ctrl+C.

./yavta --capture=1000 -n 3 --encode-to=file.h264 -f UYVY -m -T /dev/video0
...
Ctrl+C
// Kernel crash

It looks like my B101 board has died - it is recognised fine, set EDID and timings fine, but it doesn't produce any output data with either 5.4.38 or 4.19.118.

By the way. This is not relevant, but as my users have found out, the B101 may burn due to an accidentally flown CSI cable.

Both 4.x and 5.x kernels used firmware 688a8f8. I am using Arch Linux and using rpi-update will completely break the system. So I installed minimal Raspbian and used same config.txt exclude initramfs and i2c_arm lines. I also updated the system and executed rpi-update 8d64ec70. The problem was successfully reproduced wih yavta:

pi@raspberrypi:~ $ vcgencmd version
May  1 2020 17:56:42
Copyright (c) 2012 Broadcom
version 688a8f8f3d84e788f07f53c93693e1705f68c407 (clean) (release) (start_x)
pi@raspberrypi:~ $ uname -a
Linux raspberrypi 5.4.35-v7l+ #1314 SMP Fri May 1 17:47:34 BST 2020 armv7l GNU/Linux

Then I started moving through the firmware versions, as you said.

pi@raspberrypi:~ $ sudo SKIP_KERNEL=1 rpi-update e5d3320
<reboot>
pi@raspberrypi:~ $ uname -a
Linux raspberrypi 5.4.35-v7l+ #1314 SMP Fri May 1 17:47:34 BST 2020 armv7l GNU/Linux
pi@raspberrypi:~ $ vcgencmd version
Feb 28 2020 18:45:36
Copyright (c) 2012 Broadcom
version 9520663ed4fa4ca8c2a6923a522bff85fcdedef5 (clean) (release) (start_x)

The problem was successfully reproduced again.

I assume you failed to reproduce it because you used CM3, not RPi4. Perhaps the model of the video capture board has some significance (but they seem to differ slightly from each other).

6by9 commented

1080p60 can't be supported using a B101.

The B101 and B102 only differ in the number of CSI2 lanes that they present on the output connector, and the physical connector itself.
CM3 vs Pi4 have very few differences in the hardware blocks involved here. We're using the Unicam driver heavily for the recent libcamera work, so know that is sound over multiple runs.

I currently have no way to reproduce, so can't actually help.
Things to try (all after having provided an EDID):

  • provide the output from raspigpio get after the first run of yavta, and again after the second run has failed to start streaming (ideally when the second run is progress too).
  • v4l2-ctl --set-dv-bt-timings query v4l2-ctl --stream-mmap=3 --stream-to=/dev/null --stream-count=1000 and run the stream command for second time. That is the simplest V4L2 setup going.
  • drop out the --encode-to=file.h264 from yavta. If that works then it points to an issue with H264 encode.
  • use rpi-update e1050e9482 to drop back the kernel to the last 4.19 kernel and nearly latest firmware.

1080p60 can't be supported using a B101.

Sorry, probably the KDE dialog is deceiving me and we are talking about 50 Hz. I forgot about this feature.

CM3 vs Pi4 have very few differences in the hardware blocks involved here.

I found a 3B+ board and tried. The problem is not reproduced on it even without --encode-to=file.h264. I believe that the problem is still directly related to RPi4.

provide the output from raspigpio get after the first run of yavta, and again after the second run has failed to start streaming (ideally when the second run is progress too).

I can't. I made a mistake in the initial diagnostics; immediately when the yavta process is interrupted, the core becomes inoperable.

drop out the --encode-to=file.h264 from yavta. If that works then it points to an issue with H264 encode.

Reproduced.

v4l2-ctl --stream-mmap=3 --stream-to=/dev/null --stream-count=1000
It doesn't do anything, just returns New timings found. Are some parameters missing?

rpi-update e1050e9482

Not reproduced.


Given the fact that I also tested on RPi 3B+ and the problem did not reproduce there, it seems that the problem is reproduced exactly and only on 5.4 and RPi4. The problem exists on two different B101 boards and several RPi4 users I contacted.

What else could I do to help you debug this?

6by9 commented

You can try adding core_freq_min=250 to /boot/config.txt, although I'd expect that to impact both kernel versions.

v4l2-ctl --stream-mmap=3 --stream-to=/dev/null --stream-count=1000
should print out a '>' for every frame received, and print an fps value each second.

You've got me fairly stumped if rpi-update e1050e9482 (4.19) works but 8d64ec7 (5.4) doesn't, and only on a Pi4.

I'll have a dig through my parts boxes to see if I have another B101. Unfortunately there isn't a way to attach a B102 to a standard Pi - the 15pin to 22pin adapters are not reversible. The FFC cable on the one that was working is fairly kinked, so if nothing else I can try swapping the cables. It'd be nice to know what the cause of the failure was, but probably just static onto the CSI lines (shame it's that sensitive).

6by9 commented

@pelwell Any thoughts as to any setup that has changed only on Pi4 between 4.19 and 5.4 that would trip I2C this up?
We have got Maxime's updated clock driver, but that applies to both Pi3 and Pi4.

Any thoughts as to any setup that has changed only on Pi4 between 4.19 and 5.4 that would trip I2C this up?

There's nothing specific to Pi 4 that I'm aware of that only affects I2C. It's probably worth making sure the EEPROM is up-to-date:

$ sudo rpi-eeprom-update

@6by9

You can try adding core_freq_min=250 to /boot/config.txt, although I'd expect that to impact both kernel versions.

Made with both kernels. It didn't affect anything.

@pelwell

BCM2711 detected
BOOTLOADER: up-to-date
CURRENT: Thu 16 Apr 17:11:26 UTC 2020 (1587057086)
 LATEST: Thu 16 Apr 17:11:26 UTC 2020 (1587057086)
 FW DIR: /lib/firmware/raspberrypi/bootloader/critical
VL805: up-to-date
CURRENT: 000137ad
 LATEST: 000137ad

There are a lot of under-voltage warnings in your "i2c" log above. Either you are using a bad power supply or you are trying to power too much from your Pi.

The Pi 4 includes under-voltage detection logic because there is a real possibility your Pi will malfunction in some way or other. Every silicon die is unique in its analogue electrical properties, causing the voltage tolerance to vary between devices, but as you reduce the voltage gradually eventually the weakest link in the chain will break. If you're lucky, one of the ARM cores will panic immediately, but the failure could be memory corruption that has time to propagate and perhaps even make it to persistent storage.

You won't get support here until the under-voltage is resolved - it's not in your best interests.

(Sorry for the deleted message, I attached the wrong log)

@pelwell Okay, good point. I changed the power supply to a more powerful one and again successfully reproduced the problem, but this time without a single report of power problems.

The terminal log is attached. Messages may differ slightly from launch to launch (as I've shown before), but it always starts with I2C messages.

screenlog.txt

I understand that the problem looks strange. But it really exists not only for me, but for five other users that I know. All of them updated the kernel to 5.4 on RPi4 and all of them started having problems with B101 by I2C.

I have excluded all previous assumptions about the nature of this bug. How else can I help fix it?

I'm starting to think that I2C is just collateral damage caused by a firmware lockup.

Your original reports says "an attempt to get an image from the tc358743 device twice in a row (reopen device file) results [in] I2C timeouts." Let's see if we can narrow down the cause of the failure by breaking the process down into a sequence of smaller steps.

  1. Disable the initial loading of the driver by commenting out the dtoverlay line in config.txt.
  2. Boot the Pi and use i2cdetect, i2cget, etc. to see if I2C seems to be working.
  3. Apply the overlay manually from the command line - "sudo dtoverlay tc358743".
  4. More i2cdetect, i2cget, etc.
  5. Try to get one image from the device.
  6. i2cdetect.
  7. Try to get another image from the device.
  8. i2cdetect.

The first message was not entirely accurate, I wrote. The problem does not occur on the second start, but when the first reading process is interrupted. That is, I run yavta, press Ctrl+C and immediately got a dead kernel. I played around with yavta a bit and found out that the problem occurs either when closing /dev/video0 or when executing ioctl VIDIOC_STREAMOFF. I think the tc358743 driver is trying to command something over I2C, and everything stops working.

So I tried to follow your instructions:

  1. Done.
  2. I didn't see any i2c devices in /dev and added in i2c_arm=on in config.txt. After that, I saw the i2c-1, but there were no devices in it. If I don't disable the overlay, I can execute and get the following:
pi@raspberrypi:~ $ i2cdetect -y 10
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- -- -- -- -- -- -- -- UU
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --
pi@raspberrypi:~ $ dmesg | grep tc3
[    6.492908] tc358743 10-000f: tc358743 found @ 0x1e (i2c-11-mux (chan_id 1))
pi@raspberrypi:~ $ sudo i2cget 10 0x1e
WARNING! This program can confuse your I2C bus, cause data loss and worse!
I will read from device file /dev/i2c-10, chip address 0x1e, current data
address, using read byte.
Continue? [Y/n]
Error: Read failed
  1. Disabling the overlay and trying to load it manually:
pi@raspberrypi:~ $ sudo dtoverlay tc358743
* Failed to apply overlay '0_tc358743' (kernel)
pi@raspberrypi:~ $ dmesg | tail -n 20
... // Nothing related
[   93.592265] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /soc/i2c0mux/i2c@1/status
[   93.592364] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /soc/csi@7e801000/status
[   93.592388] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /soc/csi@7e801000/port/endpoint/remote-endpoint
[   93.592448] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /soc/i2c@7e205000/status
[   93.592466] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /soc/i2c0mux/status
[   93.592490] OF: overlay: ERROR: multiple fragments add and/or delete node /soc/i2c0mux/i2c@1/tc358743@0f
[   93.592631] OF: overlay: ERROR: multiple fragments add, update, and/or delete property /soc/i2c0mux/i2c@1/tc358743@0f/name
[   93.592724] OF: overlay: ERROR: multiple fragments add and/or delete node /soc/i2c0mux/i2c@1/tc358743@0f/port
[   93.592742] OF: overlay: ERROR: multiple fragments add, update, and/or delete property /soc/i2c0mux/i2c@1/tc358743@0f/port/name
[   93.592763] OF: overlay: ERROR: multiple fragments add and/or delete node /soc/i2c0mux/i2c@1/tc358743@0f/port/endpoint
[   93.593019] OF: overlay: ERROR: multiple fragments add, update, and/or delete property /soc/i2c0mux/i2c@1/tc358743@0f/port/endpoint/name

The errors in 3 suggest that you have failed to disable the overlay in config.txt. You can verify that the overlay isn't present by looking in /proc/device-tree/soc/i2c0mux/i2c@1.

6by9 commented

You'll need dtparam=i2c_vc=on as we're looking for the camera/display i2c, not the one on GPIOs 2&3.
Addr 0x0f showing up as UU also denotes that there is a kernel driver already loaded and using that address, which stops userspace access (unless you add a force flag - -f in i2cget).

@pelwell I clarified that. The driver is not loaded and the overlay is missing:

pi@raspberrypi:~ $ cat /boot/config.txt
hdmi_force_hotplug=1
gpu_mem=256
start_x=1
enable_uart=1
#dtoverlay=tc358743
dtoverlay=pi3-disable-bt
#dtoverlay=dwc2
#core_freq_min=250
#dtparam=i2c_arm=on
pi@raspberrypi:~ $ dmesg | grep tc3
pi@raspberrypi:~ $ ls /proc/device-tree/soc/i2c0mux/i2c@1
'#address-cells'   name   phandle   reg  '#size-cells'
pi@raspberrypi:~ $ find /proc/device-tree/soc/i2c0mux/
/proc/device-tree/soc/i2c0mux/
/proc/device-tree/soc/i2c0mux/i2c@0
/proc/device-tree/soc/i2c0mux/i2c@0/#address-cells
/proc/device-tree/soc/i2c0mux/i2c@0/#size-cells
/proc/device-tree/soc/i2c0mux/i2c@0/phandle
/proc/device-tree/soc/i2c0mux/i2c@0/reg
/proc/device-tree/soc/i2c0mux/i2c@0/name
/proc/device-tree/soc/i2c0mux/compatible
/proc/device-tree/soc/i2c0mux/pinctrl-1
/proc/device-tree/soc/i2c0mux/status
/proc/device-tree/soc/i2c0mux/i2c@1
/proc/device-tree/soc/i2c0mux/i2c@1/#address-cells
/proc/device-tree/soc/i2c0mux/i2c@1/#size-cells
/proc/device-tree/soc/i2c0mux/i2c@1/phandle
/proc/device-tree/soc/i2c0mux/i2c@1/reg
/proc/device-tree/soc/i2c0mux/i2c@1/name
/proc/device-tree/soc/i2c0mux/#address-cells
/proc/device-tree/soc/i2c0mux/i2c-parent
/proc/device-tree/soc/i2c0mux/#size-cells
/proc/device-tree/soc/i2c0mux/phandle
/proc/device-tree/soc/i2c0mux/pinctrl-0
/proc/device-tree/soc/i2c0mux/name
/proc/device-tree/soc/i2c0mux/pinctrl-names
pi@raspberrypi:~ $

You're right - the overlay was written in a time when overlays modifying their own fragments was a bad idea, for which there is a workaround with an unfortunate side effect where you can't apply the overlay at runtime. You should be on a recent firmware with a 5.4 kernel, so I'll tweak the overlay to make it work in both scenarios.

@6by9 Thanks! Correct me if I'm doing something wrong. I have never worked with I2C.

pi@raspberrypi:~ $ cat /boot/config.txt 
hdmi_force_hotplug=1
gpu_mem=256
start_x=1
enable_uart=1
#dtoverlay=tc358743
dtoverlay=pi3-disable-bt
dtparam=i2c_vc=on
#dtoverlay=dwc2
#core_freq_min=250
#dtparam=i2c_arm=on
pi@raspberrypi:~ $ ls /dev/i2c-*
/dev/i2c-0  /dev/i2c-10  /dev/i2c-11
pi@raspberrypi:~ $ i2cdetect -y 0
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- -- -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --
pi@raspberrypi:~ $ i2cdetect -y 10
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- -- -- -- -- -- -- -- 0f
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --
pi@raspberrypi:~ $ i2cdetect -y 11
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- -- -- -- -- -- -- -- 0f
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --
pi@raspberrypi:~ $ dmesg | grep tc3

@pelwell I didn't quite understand - are you going to make some kind of fix now and I will need to download it via rpi-update? Or is something else required of me?

Can you test this updated overlay for me? It applies for me without any errors:

// SPDX-License-Identifier: GPL-2.0-only
// Definitions for Toshiba TC358743 HDMI to CSI2 bridge on VC I2C bus
/dts-v1/;
/plugin/;

/{
	compatible = "brcm,bcm2835";

	fragment@0 {
		target = <&i2c_csi_dsi>;
		__overlay__ {
			#address-cells = <1>;
			#size-cells = <0>;
			status = "okay";

			tc358743@0f {
				compatible = "toshiba,tc358743";
				reg = <0x0f>;
				status = "okay";

				clocks = <&tc358743_clk>;
				clock-names = "refclk";

				tc358743_clk: bridge-clk {
					compatible = "fixed-clock";
					#clock-cells = <0>;
					clock-frequency = <27000000>;
				};

				port {
					tc358743: endpoint {
						remote-endpoint = <&csi1_ep>;
						clock-lanes = <0>;
						clock-noncontinuous;
						link-frequencies =
							/bits/ 64 <486000000>;
					};
				};
			};
		};
	};

	fragment@1 {
		target = <&csi1>;
		__overlay__ {
			status = "okay";

			port {
				csi1_ep: endpoint {
					remote-endpoint = <&tc358743>;
				};
			};
		};
	};

	fragment@2 {
		target = <&tc358743>;
		__overlay__ {
			data-lanes = <1 2>;
		};
	};

	fragment@3 {
		target = <&tc358743>;
		__dormant__ {
			data-lanes = <1 2 3 4>;
		};
	};

	fragment@4 {
		target = <&i2c0if>;
		__overlay__ {
			status = "okay";
		};
	};

	fragment@5 {
		target = <&i2c0mux>;
		__overlay__ {
			status = "okay";
		};
	};

	__overrides__ {
		4lane = <0>, "-2+3";
		link-frequency = <&tc358743>,"link-frequencies#0";
	};
};

@pelwell Sure. How do I do this?

Save the code to a file - tc358743-overlay.dts is the usual name. Then run:

$ dtc -@ -I dts -O dtb -o tc358743.dtbo tc358743-overlay.dts
$ sudo cp tc358743.dtbo /boot/overlays
6by9 commented

/dev/i2c-0 is GPIOs 0&1 (pins 28&29)
/dev/i2c-10 is the camera/display I2C
/dev/i2c-11 is an anomally, and is whichever of the above was last selected :-/

pi@raspberrypi:~ $ dmesg | grep tc3
[    6.492908] tc358743 10-000f: tc358743 found @ 0x1e (i2c-11-mux (chan_id 1))

would imply that you did still have the overlay loaded from config.txt, but we're beyond that now.

[ corrected typo in the cp line above ]

@6by9 I have only pi3-disable-bt to access to UART. The rest is disabled, it can be seen from config.txt. I don't know where it came from.

@pelwell Okay give me 5 minutes.

Use sudo vcdbg log msg to read diagnostic messages from the firmware - you should see overlays being processed (or not).

6by9 commented

Use sudo vcdbg log msg to read diagnostic messages from the firmware - you should see overlays being processed (or not).

Doesn't that need dtdebug=1 in config.txt to show up overlay stuff?

You will get more info that way, but the basic yes/no is there from the file accesses.

e.g.

004510.857: Loading 'bcm2711-rpi-4-b.dtb' to 0x100 size 0xb4ac
004587.307: brfs: File read: 46252 bytes
004599.989: brfs: File read: /mfs/sd/overlays/overlay_map.dtb
005060.356: brfs: File read: 1371 bytes
005062.675: brfs: File read: /mfs/sd/config.txt
005062.760: dtparam: audio=on
005154.713: brfs: File read: 269 bytes
005167.340: brfs: File read: /mfs/sd/overlays/vc4-fkms-v3d.dtbo
005305.932: Loaded overlay 'vc4-fkms-v3d'
005656.871: brfs: File read: 1238 bytes
005659.221: brfs: File read: /mfs/sd/cmdline.txt
005659.373: Read command line from file 'cmdline.txt':
005659.463: 'zswap.zpool=zbud xearlycon=uart8250,mmio32,0xfe215040 console=serial0,115200 console=tty1 root=PARTUUID=ea7d04d6-02 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait plymouth.ignore-serial-consoles loglevel=8'
007400.053: brfs: File read: 218 bytes
007900.456: brfs: File read: /mfs/sd/kernel7l.img
007900.493: Loading 'kernel7l.img' to 0x8000 size 0x5b1628
007900.540: Device tree loaded to 0x2eff4500 (size 0xba75)

I don't know if it's important:

pi@raspberrypi:~ $ dtc -@ -I dts -O dtb -o tc358743.dtbo tc358743-overlay.dts
tc358743.dtbo: Warning (unit_address_vs_reg): /fragment@3: node has a unit name, but no reg property
tc358743.dtbo: Warning (unit_address_format): /fragment@0/__overlay__/tc358743@0f: unit name should not have leading 0s

Test 1: dtoverlay enabled in config.txt
The video device has appeared, everything looks as usual. The problem persists. Also log:

pi@raspberrypi:~$ sudo vcdbg log msg
002905.214: arasan: arasan_emmc_open
002905.395: arasan: arasan_emmc_set_clock C0: 0x00000000 C1: 0x000e7187 emmc: 250000000 actual: 200000 div: 0x00000271 target: 200000 min: 100000 max: 400000 delay: 10
003010.156: arasan: arasan_emmc_set_clock C0: 0x00000000 C1: 0x000e7187 emmc: 250000000 actual: 200000 div: 0x00000271 target: 200000 min: 100000 max: 400000 delay: 10
003010.315: arasan: arasan_emmc_set_clock C0: 0x00000f00 C1: 0x000ec2c7 emmc: 250000000 actual: 129937 div: 0x000003c2 target: 130000 min: 100000 max: 400000 delay: 15
003065.069: arasan: arasan_emmc_set_clock C0: 0x00000f06 C1: 0x000e0307 emmc: 250000000 actual: 41666666 div: 0x00000003 target: 40000000 min: 0 max: 40000000 delay: 1
003336.362: brfs: File read: /mfs/sd/config.txt
003729.286: brfs: File read: 175 bytes
003755.583: brfs: File read: /mfs/sd/config.txt
004818.382: gpioman: gpioman_get_pin_num: pin DISPLAY_DSI_PORT not defined
004820.853: *** Restart logging
004820.891: brfs: File read: 175 bytes
004834.033: hdmi: HDMI:hdmi_get_state is deprecated, use hdmi_get_display_state instead
004834.078: HDMI0: hdmi_pixel_encoding: 300000000

004834.090: HDMI1: hdmi_pixel_encoding: 300000000

004847.151: brfs: File read: /mfs/sd/bcm2711-rpi-4-b.dtb
004847.190: Loading 'bcm2711-rpi-4-b.dtb' to 0x100 size 0xb494
004921.359: brfs: File read: 46228 bytes
004940.560: brfs: File read: /mfs/sd/overlays/overlay_map.dtb
005388.771: brfs: File read: 1371 bytes
005393.997: brfs: File read: /mfs/sd/config.txt
005394.885: brfs: File read: 175 bytes
005409.107: brfs: File read: /mfs/sd/overlays/tc358743.dtbo
005520.570: Loaded overlay 'tc358743'
005853.887: dtwarn: overlay 'pi3-disable-bt' has been renamed 'disable-bt'
005854.627: brfs: File read: 2071 bytes
005861.742: brfs: File read: /mfs/sd/overlays/disable-bt.dtbo
005962.834: Loaded overlay 'pi3-disable-bt'
006241.088: brfs: File read: 954 bytes
006246.055: brfs: File read: /mfs/sd/cmdline.txt
006246.172: Read command line from file 'cmdline.txt':
006246.245: 'console=ttyAMA0,115200 console=tty1 root=PARTUUID=738a4d67-02 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait cma=128M'
008139.703: brfs: File read: 130 bytes
009313.353: brfs: File read: /mfs/sd/kernel7l.img
009313.439: Loading 'kernel7l.img' to 0x8000 size 0x5b1ce0
009313.527: Device tree loaded to 0x2eff4500 (size 0xba38)
009314.266: gpioman: gpioman_get_pin_num: pin SDCARD_CONTROL_POWER not defined
011073.242: vchiq_core: vchiq_init_state: slot_zero = 0xe6d80000, is_master = 1
011076.615: hdmi: HDMI:hdmi_get_state is deprecated, use hdmi_get_display_state instead
011083.469: TV service:host side not connected, dropping notification 0x00000002, 0x00000002, 0x00000004
015382.209: camsubs: Ignoring camera 0 as unicam device not available

Test 2: manual dtoverlay

[   79.904868] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /soc/i2c0mux/i2c@1/status
[   79.916570] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /soc/csi@7e801000/status
[   79.928013] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /soc/csi@7e801000/port/endpoint/remote-endpoint
[   79.941549] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /soc/i2c@7e205000/status
[   79.953121] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /soc/i2c0mux/status
[   79.980701] i2c i2c-11: Added multiplexed i2c bus 0
[   79.987784] i2c i2c-11: Added multiplexed i2c bus 10
002903.685: arasan: arasan_emmc_open
002903.866: arasan: arasan_emmc_set_clock C0: 0x00000000 C1: 0x000e7187 emmc: 250000000 actual: 200000 div: 0x00000271 target: 200000 min: 100000 max: 400000 delay: 10
003008.626: arasan: arasan_emmc_set_clock C0: 0x00000000 C1: 0x000e7187 emmc: 250000000 actual: 200000 div: 0x00000271 target: 200000 min: 100000 max: 400000 delay: 10
003008.785: arasan: arasan_emmc_set_clock C0: 0x00000f00 C1: 0x000ec2c7 emmc: 250000000 actual: 129937 div: 0x000003c2 target: 130000 min: 100000 max: 400000 delay: 15
003063.354: arasan: arasan_emmc_set_clock C0: 0x00000f06 C1: 0x000e0307 emmc: 250000000 actual: 41666666 div: 0x00000003 target: 40000000 min: 0 max: 40000000 delay: 1
003334.717: brfs: File read: /mfs/sd/config.txt
003727.635: brfs: File read: 176 bytes
003753.931: brfs: File read: /mfs/sd/config.txt
004816.725: gpioman: gpioman_get_pin_num: pin DISPLAY_DSI_PORT not defined
004819.196: *** Restart logging
004819.234: brfs: File read: 176 bytes
004832.380: hdmi: HDMI:hdmi_get_state is deprecated, use hdmi_get_display_state instead
004832.426: HDMI0: hdmi_pixel_encoding: 300000000

004832.438: HDMI1: hdmi_pixel_encoding: 300000000

004845.496: brfs: File read: /mfs/sd/bcm2711-rpi-4-b.dtb
004845.534: Loading 'bcm2711-rpi-4-b.dtb' to 0x100 size 0xb494
004919.767: brfs: File read: 46228 bytes
004938.994: brfs: File read: /mfs/sd/overlays/overlay_map.dtb
005387.191: brfs: File read: 1371 bytes
005392.550: brfs: File read: /mfs/sd/config.txt
005392.988: dtwarn: overlay 'pi3-disable-bt' has been renamed 'disable-bt'
005393.742: brfs: File read: 176 bytes
005397.717: brfs: File read: /mfs/sd/overlays/disable-bt.dtbo
005497.973: Loaded overlay 'pi3-disable-bt'
005773.171: brfs: File read: 954 bytes
005778.156: brfs: File read: /mfs/sd/cmdline.txt
005778.279: Read command line from file 'cmdline.txt':
005778.360: 'console=ttyAMA0,115200 console=tty1 root=PARTUUID=738a4d67-02 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait cma=128M'
007663.137: brfs: File read: 130 bytes
008856.027: brfs: File read: /mfs/sd/kernel7l.img
008856.106: Loading 'kernel7l.img' to 0x8000 size 0x5b1ce0
008856.204: Device tree loaded to 0x2eff4700 (size 0xb86f)
008856.948: gpioman: gpioman_get_pin_num: pin SDCARD_CONTROL_POWER not defined
010625.001: vchiq_core: vchiq_init_state: slot_zero = 0xe6d80000, is_master = 1
010628.448: hdmi: HDMI:hdmi_get_state is deprecated, use hdmi_get_display_state instead
010635.271: TV service:host side not connected, dropping notification 0x00000002, 0x00000002, 0x00000004
014915.139: camsubs: Looking for camera 0: i2c_port = 0, led gpio = -1, power enable gpio = 133
015215.947: camsubs: Camera not found
015216.042: camsubs: Looking for camera 0: i2c_port = 0, led gpio = -1, power enable gpio = 133
015516.839: camsubs: Camera not found
015516.932: camsubs: Looking for camera 0: i2c_port = 0, led gpio = -1, power enable gpio = 133
015517.716: camsubs: Camera not found
015517.808: camsubs: Looking for camera 0: i2c_port = 0, led gpio = -1, power enable gpio = 133
015518.592: camsubs: Camera not found
015518.683: camsubs: Looking for camera 0: i2c_port = 0, led gpio = -1, power enable gpio = 133
015519.278: camsubs: Camera found OK
pi@raspberrypi:~ $ i2cdetect -y 0
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- -- -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --
pi@raspberrypi:~ $ i2cdetect -y 10
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- -- -- -- -- -- -- -- 0f
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --
pi@raspberrypi:~ $ i2cdetect -y 11
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- -- -- -- -- -- -- -- 0f
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --
pi@raspberrypi:~ $ dmesg | grep tc3
pi@raspberrypi:~ $ sudo i2cget -f 10 0x0f
WARNING! This program can confuse your I2C bus, cause data loss and worse!
I will read from device file /dev/i2c-10, chip address 0x0f, current data
address, using read byte.
Continue? [Y/n]
0x00

After that, I tried to load the tc358743 module, but nothing happened. There was no message about the video device in the dmesg.

The dtc warning is annoying but expected.

005853.887: dtwarn: overlay 'pi3-disable-bt' has been renamed 'disable-bt'

This line tells you that pi3-disable-bt has been replaced by the more generic disable-bt. At some point in the future the old name will be removed.

The fact that you are getting the same results at runtime as with the firmware-applied overlay is good because it takes one variable out of the equation. It also suggests that the patched overlay is working.

This line tells you that pi3-disable-bt has been replaced by the more generic disable-bt. At some point in the future the old name will be removed.

I've used this on our embedded systems. It is good that you warned me about this, I will release a patch that will make the necessary correction to the device configs. Thanks!

It also suggests that the patched overlay is working.

Cool. Although everything else has not yet been applied-the tc358743 module loads, but does nothing after manually applying dtoverlay.

If necessary, I can provide you with remote access to our test subject's serial console. Well, if it helps in any way.

6by9 commented

This sounds remarkably familiar to an issue I was having trying to dynamically load imx477.dtbo. The clock appears not to get created, therefore we get -E_PROBE_DEFER repeatedly, and it never loads. I was going to ask @pelwell about it when it came back to the top of my list.

It's because of the location of the clock node. Try this one:

// SPDX-License-Identifier: GPL-2.0-only
// Definitions for Toshiba TC358743 HDMI to CSI2 bridge on VC I2C bus
/dts-v1/;
/plugin/;

/{
	compatible = "brcm,bcm2835";

	fragment@0 {
		target = <&i2c_csi_dsi>;
		__overlay__ {
			#address-cells = <1>;
			#size-cells = <0>;
			status = "okay";

			tc358743@0f {
				compatible = "toshiba,tc358743";
				reg = <0x0f>;
				status = "okay";

				clocks = <&tc358743_clk>;
				clock-names = "refclk";

				port {
					tc358743: endpoint {
						remote-endpoint = <&csi1_ep>;
						clock-lanes = <0>;
						clock-noncontinuous;
						link-frequencies =
							/bits/ 64 <486000000>;
					};
				};
			};
		};
	};

	fragment@1 {
		target = <&csi1>;
		__overlay__ {
			status = "okay";

			port {
				csi1_ep: endpoint {
					remote-endpoint = <&tc358743>;
				};
			};
		};
	};

	fragment@2 {
		target = <&tc358743>;
		__overlay__ {
			data-lanes = <1 2>;
		};
	};

	fragment@3 {
		target = <&tc358743>;
		__dormant__ {
			data-lanes = <1 2 3 4>;
		};
	};

	fragment@4 {
		target = <&i2c0if>;
		__overlay__ {
			status = "okay";
		};
	};

	fragment@5 {
		target = <&i2c0mux>;
		__overlay__ {
			status = "okay";
		};
	};

	fragment@6 {
		target-path = "/";
		__overlay__ {
			tc358743_clk: bridge-clk {
				compatible = "fixed-clock";
				#clock-cells = <0>;
				clock-frequency = <27000000>;
			};
		};
	};

	__overrides__ {
		4lane = <0>, "-2+3";
		link-frequency = <&tc358743>,"link-frequencies#0";
	};
};

I'll explain more fully after lunch.

[   95.157695] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /soc/i2c0mux/i2c@1/status
[   95.168929] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /soc/csi@7e801000/status
[   95.180039] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /soc/csi@7e801000/port/endpoint/remote-endpoint
[   95.193234] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /soc/i2c@7e205000/status
[   95.204443] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /soc/i2c0mux/status
[   95.241861] i2c i2c-11: Added multiplexed i2c bus 0
[   95.250502] i2c i2c-11: Added multiplexed i2c bus 10
[   95.299606] link-frequencies 0 value 486000000
[   95.382201] tc358743 10-000f: tc358743 found @ 0x1e (i2c-11-mux (chan_id 1))
002906.011: arasan: arasan_emmc_open
002906.192: arasan: arasan_emmc_set_clock C0: 0x00000000 C1: 0x000e7187 emmc: 250000000 actual: 200000 div: 0x00000271 target: 200000 min: 100000 max: 400000 delay: 10
003010.953: arasan: arasan_emmc_set_clock C0: 0x00000000 C1: 0x000e7187 emmc: 250000000 actual: 200000 div: 0x00000271 target: 200000 min: 100000 max: 400000 delay: 10
003011.112: arasan: arasan_emmc_set_clock C0: 0x00000f00 C1: 0x000ec2c7 emmc: 250000000 actual: 129937 div: 0x000003c2 target: 130000 min: 100000 max: 400000 delay: 15
003065.843: arasan: arasan_emmc_set_clock C0: 0x00000f06 C1: 0x000e0307 emmc: 250000000 actual: 41666666 div: 0x00000003 target: 40000000 min: 0 max: 40000000 delay: 1
003337.273: brfs: File read: /mfs/sd/config.txt
003730.192: brfs: File read: 172 bytes
003756.514: brfs: File read: /mfs/sd/config.txt
004819.307: gpioman: gpioman_get_pin_num: pin DISPLAY_DSI_PORT not defined
004821.779: *** Restart logging
004821.817: brfs: File read: 172 bytes
004834.959: hdmi: HDMI:hdmi_get_state is deprecated, use hdmi_get_display_state instead
004835.004: HDMI0: hdmi_pixel_encoding: 300000000

004835.017: HDMI1: hdmi_pixel_encoding: 300000000

004848.091: brfs: File read: /mfs/sd/bcm2711-rpi-4-b.dtb
004848.130: Loading 'bcm2711-rpi-4-b.dtb' to 0x100 size 0xb494
004922.318: brfs: File read: 46228 bytes
004941.184: brfs: File read: /mfs/sd/overlays/overlay_map.dtb
005389.254: brfs: File read: 1371 bytes
005394.628: brfs: File read: /mfs/sd/config.txt
005395.522: brfs: File read: 172 bytes
005399.510: brfs: File read: /mfs/sd/overlays/disable-bt.dtbo
005499.780: Loaded overlay 'disable-bt'
005775.165: brfs: File read: 954 bytes
005780.130: brfs: File read: /mfs/sd/cmdline.txt
005780.253: Read command line from file 'cmdline.txt':
005780.334: 'console=ttyAMA0,115200 console=tty1 root=PARTUUID=738a4d67-02 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait cma=128M'
007663.754: brfs: File read: 130 bytes
008839.707: brfs: File read: /mfs/sd/kernel7l.img
008839.789: Loading 'kernel7l.img' to 0x8000 size 0x5b1ce0
008839.886: Device tree loaded to 0x2eff4700 (size 0xb86f)
008840.622: gpioman: gpioman_get_pin_num: pin SDCARD_CONTROL_POWER not defined
010608.229: vchiq_core: vchiq_init_state: slot_zero = 0xe6d80000, is_master = 1
010611.630: hdmi: HDMI:hdmi_get_state is deprecated, use hdmi_get_display_state instead
010618.477: TV service:host side not connected, dropping notification 0x00000002, 0x00000002, 0x00000004
015021.214: camsubs: Looking for camera 0: i2c_port = 0, led gpio = -1, power enable gpio = 133
015322.023: camsubs: Camera not found
015322.119: camsubs: Looking for camera 0: i2c_port = 0, led gpio = -1, power enable gpio = 133
015622.913: camsubs: Camera not found
015623.009: camsubs: Looking for camera 0: i2c_port = 0, led gpio = -1, power enable gpio = 133
015623.793: camsubs: Camera not found
015623.906: camsubs: Looking for camera 0: i2c_port = 0, led gpio = -1, power enable gpio = 133
015624.689: camsubs: Camera not found
015624.784: camsubs: Looking for camera 0: i2c_port = 0, led gpio = -1, power enable gpio = 133
015625.380: camsubs: Camera found OK

Also:

pi@raspberrypi:~$ i2cdetect -y 10
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- -- -- -- -- -- -- -- UU
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --

This time the module loaded and the device started working. However, the subj problem still persists.

@6by9 Device nodes can't just be placed arbitrarily if we expect them to be recognised and cause kernel device objects to be instantiated. Children of buses are instantiated by their parents, as are all children of the root. Certain other compatible strings are searched for globally during kernel initialisation - "fixed-clock" is one of them - but this only works if the node is already present when the kernel boots; if it is added later anywhere other than on a bus (or in the root) it get ignored.

I'll look for other affected overlays - I know that the new imx477 overlay (#3605) is one of them.

6by9 commented

I hate magic stuff that you just have to know about :-(

All the CSI devices will be affected as they all had the same parentage
tc358743, adv7282, ov5647, imx219, imx477, and irs1125.
I thought I'd tried moving the clock to / and it hadn't worked. Perhaps I dropped it under the more logical /clk instead, which I guess then has the same issue.

It's logical when you think about it - each bus needs to be able to look at its children to decide how and when to initialise them - but it's unfortunate that the fixed-clocks of this world muddy the waters.

My colleague (hw engineer) and I did another check. We checked the level on this pad (I2C SDA).

image

Before the video capture process, it was high. At the moment of hanging, it becomes permanently low, and keeps at this level until the B101 is reseted by connecting RESET and GND. After resetting, the SDA again shows a high level. My colleague's hypothesis is that the I2C state machine is hanging on waiting for NACK, possibly on both sides.

At this point you could really benefit from a logic analyser, but you might get something useful from the kernels event tracing. Load the overlay in your preferred way, then before starting yavta run:

$ sudo sh -c "echo 1 >/sys/kernel/debug/tracing/events/i2c/enable"

Now run the capture, exiting with Ctrl-C in the usual way, and capture the trace events to a file:

$ sudo cp /sys/kernel/debug/tracing/trace trace_1.txt

N.B.: /sys/kernel/debug requires root privileges to access, so you won't get the usual command TAB-completion behaviour unless you are running as root.

Run this sequence 2 or 3 times (so we can verify that the trace ends the same way every time - or not), using a different output file name each time, and upload the results somewhere like pastebin, DropBox, Google Drive etc.

I was lucky. I managed to make two dumps. Usually the kernel breaks down so that even the serial terminal no longer works. The dumps turned out to be quite small. Each dump contains an EDID loading stage, getting five frames, and calling VIDIOC_STREAMOFF. I didn't perform a RESET.

In addition, I have attached another log of kernel messages.

trace_1.txt
trace_2.txt
screenlog.txt

In fact, I can go as far as using a logic analyzer, but this is currently problematic. But if there is no way out...

Thanks - they're interesting logs, identical except for some minor timing effects.

The timeout occurs after a particular write to I2C register 0x0004, known to the driver as CONFCTL. Here is a list of all of them:

i2c_write: i2c-11 #0 a=00f f=0000 l=2 [00-04]
i2c_read: i2c-11 #1 a=00f f=0001 l=2
i2c_reply: i2c-11 #1 a=00f f=0001 l=2 [14-0c]
i2c_result: i2c-11 n=2 ret=2
i2c_reply: i2c-10 #1 a=00f f=0001 l=2 [14-0c]
i2c_result: i2c-10 n=2 ret=2
i2c_write: i2c-10 #0 a=00f f=0000 l=4 [00-04-14-0c]
i2c_write: i2c-11 #0 a=00f f=0000 l=4 [00-04-14-0c]
i2c_result: i2c-11 n=1 ret=1
i2c_result: i2c-10 n=1 ret=1
--
i2c_write: i2c-11 #0 a=00f f=0000 l=2 [00-04]
i2c_read: i2c-11 #1 a=00f f=0001 l=2
i2c_reply: i2c-11 #1 a=00f f=0001 l=2 [14-0c]
i2c_result: i2c-11 n=2 ret=2
i2c_reply: i2c-10 #1 a=00f f=0001 l=2 [14-0c]
i2c_result: i2c-10 n=2 ret=2
i2c_write: i2c-10 #0 a=00f f=0000 l=4 [00-04-14-0c]
i2c_write: i2c-11 #0 a=00f f=0000 l=4 [00-04-14-0c]
i2c_result: i2c-11 n=1 ret=1
i2c_result: i2c-10 n=1 ret=1
--
i2c_write: i2c-11 #0 a=00f f=0000 l=2 [00-04]
i2c_read: i2c-11 #1 a=00f f=0001 l=2
i2c_reply: i2c-11 #1 a=00f f=0001 l=2 [14-0c]
i2c_result: i2c-11 n=2 ret=2
i2c_reply: i2c-10 #1 a=00f f=0001 l=2 [14-0c]
i2c_result: i2c-10 n=2 ret=2
i2c_write: i2c-10 #0 a=00f f=0000 l=4 [00-04-14-0c]
i2c_write: i2c-11 #0 a=00f f=0000 l=4 [00-04-14-0c]
i2c_result: i2c-11 n=1 ret=1
i2c_result: i2c-10 n=1 ret=1
--
i2c_write: i2c-11 #0 a=00f f=0000 l=2 [00-04]
i2c_read: i2c-11 #1 a=00f f=0001 l=2
i2c_reply: i2c-11 #1 a=00f f=0001 l=2 [14-0c]
i2c_result: i2c-11 n=2 ret=2
i2c_reply: i2c-10 #1 a=00f f=0001 l=2 [14-0c]
i2c_result: i2c-10 n=2 ret=2
i2c_write: i2c-10 #0 a=00f f=0000 l=4 [00-04-d4-0c]
i2c_write: i2c-11 #0 a=00f f=0000 l=4 [00-04-d4-0c]
i2c_result: i2c-11 n=1 ret=1
i2c_result: i2c-10 n=1 ret=1
--
i2c_write: i2c-11 #0 a=00f f=0000 l=2 [00-04]
i2c_read: i2c-11 #1 a=00f f=0001 l=2
i2c_reply: i2c-11 #1 a=00f f=0001 l=2 [d4-0c]
i2c_result: i2c-11 n=2 ret=2
i2c_reply: i2c-10 #1 a=00f f=0001 l=2 [d4-0c]
i2c_result: i2c-10 n=2 ret=2
i2c_write: i2c-10 #0 a=00f f=0000 l=4 [00-04-d4-0c]
i2c_write: i2c-11 #0 a=00f f=0000 l=4 [00-04-d4-0c]
i2c_result: i2c-11 n=1 ret=1
i2c_result: i2c-10 n=1 ret=1
--
i2c_write: i2c-11 #0 a=00f f=0000 l=2 [00-04]
i2c_read: i2c-11 #1 a=00f f=0001 l=2
i2c_reply: i2c-11 #1 a=00f f=0001 l=2 [d4-0c]
i2c_result: i2c-11 n=2 ret=2
i2c_reply: i2c-10 #1 a=00f f=0001 l=2 [d4-0c]
i2c_result: i2c-10 n=2 ret=2
i2c_write: i2c-10 #0 a=00f f=0000 l=4 [00-04-d7-0c]
i2c_write: i2c-11 #0 a=00f f=0000 l=4 [00-04-d7-0c]
i2c_result: i2c-11 n=1 ret=1
i2c_result: i2c-10 n=1 ret=1
--
i2c_write: i2c-11 #0 a=00f f=0000 l=2 [00-04]
i2c_read: i2c-11 #1 a=00f f=0001 l=2
i2c_reply: i2c-11 #1 a=00f f=0001 l=2 [d7-0c]
i2c_result: i2c-11 n=2 ret=2
i2c_reply: i2c-10 #1 a=00f f=0001 l=2 [d7-0c]
i2c_result: i2c-10 n=2 ret=2
i2c_write: i2c-10 #0 a=00f f=0000 l=4 [00-04-d4-0c]
i2c_write: i2c-11 #0 a=00f f=0000 l=4 [00-04-d4-0c]
i2c_result: i2c-11 n=1 ret=-110
i2c_result: i2c-10 n=1 ret=-110

The addition of the mux duplicates all the activities, but it's still clear enough. The -110 is the timeout, of course.

Decoding this a bit further, the sequence of operations to CONFCTL is (again, from the driver):

read 0c14 -> write 0c14   # AUDCHNUM_2 | AUDOUTSEL_I2S | AUTOINDEX
read 0c14 -> write 0c14
read 0c14 -> write 0c14
read 0c14 -> write 0cd4   # Set YCBCRFMT_422_8_BIT
read 0cd4 -> write 0cd4
read 0cd4 -> write 0cd7   # Set ABUFEN & VBUFEN = enable_stream(true)
read 0cd7 -> write 0cd4   # Clear ABUFEN & VBUFEN = enable_stream(false)
[ timeout ]

I haven't been able to find a datasheet for the device, so my questions are:

  1. Are there restrictions on when ABUFEN and VBUFEN can be cleared?
  2. Does the device return NACK (or something) until some event has occurred such as a FIFO emptying?
  3. Might the driver shutdown order as the result of a Ctrl-C signal have changed?

I'd suggest adding a few tactical pr_err()s into enable_stream (showing the value of "enable") or even a WARN_ON(1), and other key points. Do something similar to the brcm_unicam driver as well.

6by9 commented

Just to keep you updated, the hunch is that there is something going wrong with the I2C mux.

pelwell is digging into the behaviour, whilst I'll try and sort a dtoverlay that disables it for you. It's all quite puzzling still.

Thanks! I really appreciate it.

6by9 commented

An alternate tc358743-overlay.dts to try

// SPDX-License-Identifier: GPL-2.0-only
// Definitions for Toshiba TC358743 HDMI to CSI2 bridge on VC I2C bus
/dts-v1/;
/plugin/;

/{
	compatible = "brcm,bcm2835";

	fragment@0 {
		target = <&i2c0if>;
		__overlay__ {
			#address-cells = <1>;
			#size-cells = <0>;
			status = "okay";

			tc358743@0f {
				compatible = "toshiba,tc358743";
				reg = <0x0f>;
				status = "okay";

				clocks = <&tc358743_clk>;
				clock-names = "refclk";

				port {
					tc358743: endpoint {
						remote-endpoint = <&csi1_ep>;
						clock-lanes = <0>;
						clock-noncontinuous;
						link-frequencies =
							/bits/ 64 <486000000>;
					};
				};
			};
		};
	};

	fragment@1 {
		target = <&csi1>;
		__overlay__ {
			status = "okay";

			port {
				csi1_ep: endpoint {
					remote-endpoint = <&tc358743>;
				};
			};
		};
	};

	fragment@2 {
		target = <&tc358743>;
		__overlay__ {
			data-lanes = <1 2>;
		};
	};

	fragment@3 {
		target = <&tc358743>;
		__dormant__ {
			data-lanes = <1 2 3 4>;
		};
	};

	fragment@4 {
		target = <&i2c0if>;
		__overlay__ {
			pinctrl-0 = <&i2c0_gpio44>;
			pinctrl-names = "default";
		};
	};

	fragment@5 {
		target-path = "/";
		__overlay__ {
			tc358743_clk: bridge-clk {
				compatible = "fixed-clock";
				#clock-cells = <0>;
				clock-frequency = <27000000>;
			};
		};
	};

	__overrides__ {
		4lane = <0>, "-2+3";
		link-frequency = <&tc358743>,"link-frequencies#0";
	};
};

and do NOT include dtparam=i2c_vc=on as it does cause grief (I've wasted a chunk of time on that one!)
You should get just a /dev/i2c11, and that should be configured correctly for the camera connector on a Pi4 with no magic muxing.

Checked. The problem still remains, /dev/i2c-11 exists.

On the fourth attempt I was able to make traces.
screenlog.txt
trace.txt

That's really helpful, as it means I can forget the mux locking and concentrate on the TC358743 again. But if you can attach a logic analyser then it is likely to shorten the time to a solution.

This is a bit difficult, because I personally don't have a logic analyzer, but a colleague 200 km away has one. We will try to organize this process remotely. I think we'll figure it out in an hour.

If it doesn't test your friendship with the colleague too much then that sounds like a good option, but for now I'll assume that it doesn't work out.

Well, we can always try it. I have carefully recorded all my actions, so there should be no problems with reproducing the test conditions. Unless he has a Chinese replica of B101, although it differs slightly from the original. Schematically, I mean. From the driver's point of view, they look the same.

6by9 commented

Can you try again with v4l2-ctl instead of yavta please?
yavta uses a load of VPU stuff. The fact that you're getting errors setting clocks does imply that you've crashed the VPU.

Each v4l2-ctl launch only prints "New timings found". The kernel writes a trace, but it doesn't hang. I can make a call several times, and each time there will be the same stacktraces. I recorded a single call and the I2C trace.

screenlog.txt
trace_v4l2.txt

Okay, I couldn't do it fast enough, so we'll try the logic analyzer tomorrow.

Up to now you've always spoken of exiting yavta with Ctrl-C. I notice that there is a command line option to specify the number of frames - --capture=<nframes>. Does specifying a frame count, and therefore hopefully getting an orderly shutdown rather as the result of a signal, change the behaviour?

If so, can I get an i2c trace of it?

All traces starting from this #3602 (comment) I performed with a limit on the number of frames, so that the result was the same. Each time there were exactly 5 of them.

6by9 commented

Can you drop the MMAL side out of yavta please?
./yavta --capture=1000 -n 3 -f UYVY -T
you won't get a preview, but it should capture frames and print timestamps to the screen.

The other thought is dropping out the more recent changes to bcm2835-unicam. I suspect that means building the kernel for yourself having reverted:
f79b211 media: bcm2835: unicam: Fix uninitialized warning
504574e media: bcm2835-unicam: Fix reference counting in unicam_open
563c4ad media: bcm2835-unicam: Do not stop streaming in unicam_release
51a5955 media: bcm2835-unicam: Add support for VIDIOC_[S|G]_SELECTION
0ac8108 media: bcm2835-unicam: Re-fetch mbus code from subdev on a g_fmt call
4a7398e media: bcm2835-unicam: Add support for the FRAME_SYNC event
7410f0f media: bcm2835-unicam: Disable event-related ioctls on metadata node
c9e2f8f media: bcm2835-unicam: Use dummy buffer if none have been queued
3ccd6c4 media: bcm2835-unicam: Add embedded data node.
e73ab21 media: bcm2835-unicam: Add support for mulitple device nodes.

Build instructions are at https://www.raspberrypi.org/documentation/linux/kernel/building.md

I'm in the process of modifying the overlay to use i2c-gpio instead of the hardware interface (to rule out another factor), but it's not co-operating yet.

Results of ./yavta --capture=1000 -n 3 -f UYVY -T:
yavta.txt
screenlog.txt
trace.txt

About the kernel: do I need to start doing something, or wait for the overlay? Also, no ready-made builds?

6by9 commented

So the reduced yavta died as well? It's odd as the I2C comms appear to pick up again at 270.348485 according to the i2c trace. A bundle of -ETIMEDOUT, a couple of EIO, and then back to normal. Did you try a second run?

Sorry, as we're in the process of upstreaming the driver we waited until all the cleanups could be done, and then merged it in one swoop onto 5.4. AFAIK There are no intermediate builds (there certainly aren't in rpi-update).
Wait for Phil's updated overlay first.

I'll head into the office later to scoop up all the B101's I can find in the hope that one might work. I think I've had 3 over time, so one might play ball. I'll get a couple more on order too (probably the Chinese clones as they're half the price).

Sorry about the delay - I hit the weirdest issue applying the overlay at runtime. A slight reshuffle has avoided it, but it remains a problem for another day.

Anyway, try this:

// SPDX-License-Identifier: GPL-2.0-only
// Definitions for Toshiba TC358743 HDMI to CSI2 bridge on VC I2C bus
/dts-v1/;
/plugin/;

/{
	compatible = "brcm,bcm2835";

	fragment@0 {
		target-path = "/";
		__overlay__ {
			i2c_gpio: i2c-gpio-tc358743@0 {
				compatible = "i2c-gpio";
				gpios = <&gpio 44 0 /* sda */
					 &gpio 45 0 /* scl */
					>;
				i2c-gpio,delay-us = <2>;        /* ~100 kHz */
				#address-cells = <1>;
				#size-cells = <0>;

				status = "okay";

				tc358743@0f {
					compatible = "toshiba,tc358743";
					reg = <0x0f>;
					status = "okay";

					clocks = <&tc358743_clk>;
					clock-names = "refclk";

					port {
						tc358743: endpoint {
							remote-endpoint = <&csi1_ep>;
							clock-lanes = <0>;
							clock-noncontinuous;
							link-frequencies =
								/bits/ 64 <486000000>;
						};
					};
				};
			};
		};
	};

	fragment@1 {
		target-path = "/";
		__overlay__ {
			tc358743_clk: bridge-clk {
				compatible = "fixed-clock";
				#clock-cells = <0>;
				clock-frequency = <27000000>;
			};
		};
	};

	fragment@2 {
		target = <&csi1>;
		__overlay__ {
			status = "okay";

			port {
				csi1_ep: endpoint {
					remote-endpoint = <&tc358743>;
				};
			};
		};
	};

	fragment@3 {
		target = <&tc358743>;
		__overlay__ {
			data-lanes = <1 2>;
		};
	};

	fragment@4 {
		target = <&tc358743>;
		__dormant__ {
			data-lanes = <1 2 3 4>;
		};
	};

	__overrides__ {
		4lane = <0>, "-3+4";
		link-frequency = <&tc358743>,"link-frequencies#0";
	};
};

@6by9 The kernel freezes (only prints messages to the serial console and does not respond to input), but after a few minutes it comes to, and I get to run yavta again. I attached two traces: taken after the first launch and after the second.

screenlog.txt
yavta.txt
trace1.txt
trace2.txt

[ updated the 4lane parameter with the modified fragment ]

@pelwell my logs were for the previous comment, now I will try your overlay.

6by9 commented

Prints messages to the serial console

Do you have an HDMI display attached?
(Thought process is whether it is the display side deciding it has nothing to do and shutting clocks down)

The other thought that I have remembered is that earlier versions of this driver height aligned their buffers to a multiple of 16. If you're capturing 1080p50 (as it appears you are), then 1080 is not a multiple of 16, and would previously have been rounded up to 1088.
We also have 192021080 = 4147200, or 1012.5 pages. 192021088 would have been 4177920, or 1020 pages. Is it possible a page table is going adrift?

There is a way to get V4L2 to allocate larger buffers than that specified in sizeimage. I'll see if I can dig them out and make it work.

Do you have an HDMI display attached?

No, sorry. I don't have an adapter.

Okay, here we go. Here the dmesg about the device:

[    5.364606] gpio-44 (i2c-gpio-tc358743@0): enforced open drain please flag it properly in DT/ACPI DSDT/board file
[    5.364705] gpio-45 (i2c-gpio-tc358743@0): enforced open drain please flag it properly in DT/ACPI DSDT/board file
[    5.370188] i2c-gpio i2c-gpio-tc358743@0: using lines 44 (SDA) and 45 (SCL)
[    6.373303] tc358743 11-000f: tc358743 found @ 0x1e (i2c-gpio-tc358743@0)

On v4l2-ctl --stream-mmap=3 --stream-to=/dev/null --stream-count=100 only "New timings found". Also I still get stacktraces to the console, but now the kernel doesn't freeze.

When I try to use ./yavta --capture=1000 -n 3 -f UYVY -T /dev/video0 after this, I get a hung process that doesn't respond to ctrl+c. But the kernel doesn't freeze.

pi@raspberrypi:~$ ps aux | grep yavta
pi         601  0.8  0.0      0     0 pts/0    D+   11:29   0:00 [yavta]

screenlog.txt
terminal.txt
trace.txt

I also tried my ustreamer which uses the V4L2 API. I tried starting and stopping this several times. This worked and even gave a picture, but it seems to have broken the definition of DV timings. I kept getting a resolution of 640x480 with an incorrect picture, although I expected 1920x1080. After a few launches, the core broke down.

./ustreamer --host 0.0.0.0 \
    --format=uyvy \ # Device input format
    --encoder=cpu \ # CPU encoding
    --workers=3 \ # Maximum workers
    --persistent \ # Don't re-initialize device on timeout (for example when HDMI cable was disconnected)
    --dv-timings \ # Use DV-timings
    --drop-same-frames=30 \ # Save the traffic

image

screenlog.txt
The trace could not be maked because the kernel died.

Ad break: I wrote this thing specifically for B101. It is able to stream MJPG very quickly, can use multithreaded compression and/or RPi's GPU using OMX. I managed to achieve 24 fps for FullHD. Well, if someone needs it.

Could I get a trace of a 5 frame capture with yavta and the i2c-gpio-based overlay?

I mean, without running v4l2-ctl first.

That appears to have shut down successfully, or have I missed something?

6by9 commented

@naushir
How sure are we of Unicam stopping at the end address? You changed the mode it was in with your recent patches from stop at end of buffer to wrap around? (Added UNICAM_IBOB). Can you remind me why?
If Unicam has been configured for 640x480 (because that's what has been reported), but the frame is actually 1920x1080, are we sure that it doesn't check the end address at the end of the current line, which is now going to be off the end of the buffer.

@pelwell yes. The problem has become irregular. After 10 or 20 launches, I may get a crash.

Is that a change since switching to the i2c-gpio overlay?

Yes

If you do manage to capture a trace during a crash then I'd like to see it.

@naushir
How sure are we of Unicam stopping at the end address? You changed the mode it was in with your recent patches from stop at end of buffer to wrap around? (Added UNICAM_IBOB). Can you remind me why?

We had used warp around in the firmware driver in all our production code, I do not recall it causing us any problems. Wrapping was also used for the N8 RMI stuff, but the peripheral had changed a bit since then.

We are also currently using the wrapping behaviour by spinning in a dummy buffer (not sized to the image) if we have no buffers queued by userland - this should also go wrong if we have a problem in Unicam with IBOB.

If I recall, I switched to IBOB because (ironically) I was seeing problems the original behaviour where I suspected it was writing past the end of the buffer.

If Unicam has been configured for 640x480 (because that's what has been reported), but the frame is actually 1920x1080, are we sure that it doesn't check the end address at the end of the current line, which is now going to be off the end of the buffer.

I would have expected it to do the address test on AXI burst granularity and not lines, but I cannot verify this.

@pelwell I tried to do this, but every time I try, the kernel freezes completely, so I can't get a dump. I can only give you stacktrace.
screenlog.txt

The stacktrace shows that it's stuck in unicam_stop_streaming, and it might even be doing the same I2C write. I'm surprised it isn't showing up as a timeout, but that could be something to do with it being a regular output rather than an open-drain output.

6by9 commented

I have been to the office and have one B101 that I suspect will be dead, but also one dev board that I have hope for. Hopefully I'll be able to reproduce this now.

6by9 commented

It looks like I have a result. One of my dead B101s was the FFC kinked and broken. Replaced the FFC (why does it have to be contacts on the same side, when all Pi cameras and front/rear?!) and I've just streamed from it at 720p60. Ctrl-C and it has locked up!

Now I can investigate it!

why does it have to be contacts on the same side, when all Pi cameras and front/rear

As far as I remember, Auvidea did this for compatibility with some of its other boards. Note that Chinese copies use the usual loops from RPi cameras.

Now I can investigate it!

Great!

Let me know if you still need the logic analyzer output.

6by9 commented

After a day of two of us digging into I2C issues, it turned out not to be.
The Unicam shutdown procedure in unicam_stop_streaming is wrong in that it stops servicing all interrupts, and then tells the source and hardware to stop streaming. Any interrupt that occurs between the two causes a lockup :-(

Why it doesn't happen on a Pi3 is a bit of a mystery. There is a difference in the I2C config there, but I don't see why that would avoid the bug in the Unicam driver. This is new code in 5.4 (for libcamera support), so that explains why it affects 5.4 but not 4.19.

The fix is to drop the check for streaming or not in unicam_isr (https://github.com/raspberrypi/linux/blob/rpi-5.4.y/drivers/media/platform/bcm2835/bcm2835-unicam.c#L810). I'll sort out a PR, and it should be in the next rpi-update build.

But a bonus is that there was a potential I2C problem we'd not spotted, and probably wouldn't have without digging into this.

Then I just have to wait for the update. Please let me know when it is ready, and I will test it immediately.

By the way, can I ask one quick question not related to this issue, but related to B101? I noticed that this resolutions don't work correctly: 720x480 or 720x576 (I don't remember which one is available) from EDID at yavta. The image fast jumps up and down. A more minor problem is observed at the 800x600 resolution: a line of green pixels flickers at the very bottom of the screen.

Can this have something to do with the unicam buffers you wrote about above? The problem is observed on all kernels.

6by9 commented

By the way, can I ask one quick question not related to this issue, but related to B101? I noticed that this resolutions don't work correctly: 720x480 or 720x576 (I don't remember which one is available) from EDID at yavta. The image fast jumps up and down. A more minor problem is observed at the 800x600 resolution: a line of green pixels flickers at the very bottom of the screen.

No, I suspect this is the FIFO trigger level in the TC358743.

You have HDMI data coming in at whatever pixel rate is determined by the mode. Data out over CSI2 is at a fixed 972Mbit/s/lane, but with a dynamically switchable number of lanes between 1 and 4.
The data is read into a FIFO, and one of the register settings is the fill level on that FIFO for when to start transmitting out over CSI2. Too high and the FIFO overflows on high input pixel rates. Too low and the FIFO underflows on low input pixel rates.

Ideally the value should be dynamically adjusted, but the calculations behind it are pretty horrid, and trying to make it cover all resolutions hasn't been addressed. Previous users only worried about 720p60 (over 2 lanes) and 1080p60 (over 4 lanes) and at a lower link frequency than we run at. Toshiba produce a big spreadsheet to compute all the values from (and tell you when they're invalid), but it's under NDA so we can't release it, nor really code which replicates it.

There is a patch (aecd105) that increases it from 374 to 16 to cover 1080p modes on 2 lanes. You could try tweaking it and rebuilding the kernel.

Previous users only worried about 720p60 (over 2 lanes) and 1080p60 (over 4 lanes)

Well, they probably didn't try to use B101 to configure the BIOS and reinstall OS on a server. Of course, the problem is not critical, since the 800x600 works passably, and the 720x... sees rarely. But the perfectionist in me wants to find a solution.

Ideally the value should be dynamically adjusted

Okay, I think I understand. FIFO is in the chip itself, right? And since I don't have a spreadsheet, I'll have to pick a parameter at random?

6by9 commented

Yes, the FIFO is in the TC358743.

Small FIFO trigger values are good for roughly matched data rates, or CSI speed lower than HDMI rate (otherwise the FIFO overflows).
Large FIFO trigger values are needed for faster CSI data rate than HDMI data rate, otherwise the FIFO is emptied too soon.

You can also load the DT overlay with dtoverlay=tc358743,link-frequency=297000000 to drop the link speed from 972Mbit/s/lane (486MHz) to 574Mbit/s/lane (297MHz link speed). It was at the lower speed that a FIFO trigger of 16 was used for 720p60 and 1080p60.

Sup. Do you need my help in testing? What release should the fix be included in?

6by9 commented

https://github.com/Hexxeh/rpi-firmware/commits/master
Hexxeh/rpi-firmware@7a4e85f

kernel: media: bcm2835-unicam: Always service interrupts
See: raspberrypi/linux#3608

So using rpi-update at any point since 20th May would have got you the fix.

Everything seems to be working on Raspbian with latest kernel obtained using rpi-update. But I have another problem on the same kernel in Arch Linux ARM that I can't reproduce on Raspbian yet. When I try to stream using ustreamer I get a crash in dmesg:

[  148.617613] 8<--- cut here ---
[  148.620716] Unhandled fault: unknown 3 (0x2a03) at 0xffeee000
[  148.626514] pgd = 21038de6
[  148.629239] [ffeee000] *pgd=80000000007003, *pmd=2ffbe003, *pte=c00fff54e3e71f
[  148.636518] Internal error: : 2a03 [#1] PREEMPT SMP ARM
[  148.641779] Modules linked in: usb_f_mass_storage usb_f_hid usb_f_acm u_serial btsdio bluetooth ecdh_generic ecc brcmfmac brcmutil cfg80211 raspberrypi_hwmon hwmon i2c_mux_pinctrl i2c_mux bcm2835_unicam i2c_bcm2835 bcm2835_v4l2(C) bcm2835_codec(C) bcm2835_isp(C) v4l2_mem2mem iproc_rng200 rng_core bcm2835_mmal_vchiq(C) videobuf2_vmalloc videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videobuf2_common rpivid_mem uio_pdrv_genirq uio sch_fq_codel snd_bcm2835(C) snd_pcm snd_timer snd tc358743 v4l2_dv_timings v4l2_fwnode videodev mc cec libcomposite dwc2 udc_core drm drm_panel_orientation_quirks ip_tables x_tables ipv6 nf_defrag_ipv6
[  148.698262] CPU: 1 PID: 155 Comm: vchiq-slot/0 Tainted: G         C        5.4.42-1-ARCH #1
[  148.706633] Hardware name: BCM2711
[  148.710048] PC is at b15_dma_inv_range+0x20/0x50
[  148.718599] LR is at dma_cache_maint_page+0xa8/0x150
[  148.727498] pc : [<c0219d6c>]    lr : [<c0215bb0>]    psr: 000b0013
[  148.737726] sp : eeb67db8  ip : 00400fff  fp : c1404f98
[  148.746883] r10: 00000030  r9 : 00000002  r8 : c1555440
[  148.756068] r7 : c1407c84  r6 : fff54e3e  r5 : 00000000  r4 : 00000020
[  148.766656] r3 : 0000003f  r2 : 00000040  r1 : ffeef000  r0 : ffeee000
[  148.777100] Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  148.788171] Control: 30c5383d  Table: 204a5580  DAC: 55555555
[  148.797915] Process vchiq-slot/0 (pid: 155, stack limit = 0x41d87e62)
[  148.808346] Stack: (0xeeb67db8 to 0xeeb68000)
[  148.816732] 7da0:                                                       eeb67dfc c0219e14
[  148.832977] 7dc0: ffeee000 ffffc4dc ef085c00 ed6a3ba0 00000fff ef138810 00000fe0 00000020
[  148.849495] 7de0: 00000000 e6580194 e6545000 c0216114 c0219e14 00000000 54e3e020 00000fff
[  148.866438] 7e00: ef138810 00000000 00000002 c029d728 00000fe0 00000002 eeb67e64 c0cedc28
[  148.883883] 7e20: 54e3e020 00000fff ef138810 c029d8f8 00000fe0 00000002 ef0da000 ef0d2d00
[  148.901904] 7e40: 9a48a783 e6545020 00000001 00000003 ef138810 c029d940 00000fe0 00000002
[  148.920474] 7e60: 00000000 c0ce842c 00000002 e6545068 00002540 e6545014 c15b1d38 c14a78f8
[  148.939531] 7e80: 00000003 c0b66f60 00000000 c0b5b8dc 00000004 c1591848 400b0013 c1591848
[  148.959039] 7ea0: 200b0013 c159178c c15917f4 e18634cc 00000008 e1863400 e6585b28 c14a78f8
[  148.978971] 7ec0: e1863570 e6580194 c15917f4 c0b60ff4 eeb67f1c c0cedc28 ef668880 c0255080
[  148.999580] 7ee0: ffffffff c1400000 00000000 00000002 00000010 00000009 0000000e 00003b38
[  149.020752] 7f00: c10d7804 e6580020 c15b1b34 c1591b30 e6580194 c1591814 c1591884 c0e8bf80
[  149.042467] 7f20: c10d8b58 c10d8b2c c1591848 c10d85cc c10d8718 e18634e0 c15918ec c10650a8
[  149.064604] 7f40: c10d86e4 c10d86c0 00000000 ef399e00 c0273544 eeb67f54 eeb67f54 c1404f88
[  149.087150] 7f60: ef0b1d04 eeafb440 eeb61700 00000000 eeb66000 c15917f4 c0b60094 eeafb45c
[  149.110327] 7f80: ef0b1d04 c024ec5c 00000000 eeb61700 c024eb34 00000000 00000000 00000000
[  149.134033] 7fa0: 00000000 00000000 00000000 c02010d8 00000000 00000000 00000000 00000000
[  149.157814] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  149.181586] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[  149.205350] [<c0219d6c>] (b15_dma_inv_range) from [<c0215bb0>] (dma_cache_maint_page+0xa8/0x150)
[  149.229387] [<c0215bb0>] (dma_cache_maint_page) from [<c0216114>] (__dma_page_dev_to_cpu+0x2c/0xc4)
[  149.253280] [<c0216114>] (__dma_page_dev_to_cpu) from [<c029d728>] (dma_direct_sync_single_for_cpu+0xbc/0xc0)
[  149.277603] [<c029d728>] (dma_direct_sync_single_for_cpu) from [<c029d8f8>] (dma_direct_unmap_page+0xb4/0xb8)
[  149.301254] [<c029d8f8>] (dma_direct_unmap_page) from [<c029d940>] (dma_direct_unmap_sg+0x44/0x60)
[  149.323356] [<c029d940>] (dma_direct_unmap_sg) from [<c0b66f60>] (vchiq_complete_bulk+0x1a4/0x260)
[  149.344992] [<c0b66f60>] (vchiq_complete_bulk) from [<c0b60ff4>] (slot_handler_func+0xf60/0x1560)
[  149.366167] [<c0b60ff4>] (slot_handler_func) from [<c024ec5c>] (kthread+0x128/0x154)
[  149.385806] [<c024ec5c>] (kthread) from [<c02010d8>] (ret_from_fork+0x14/0x3c)
[  149.404578] Exception stack(0xeeb67fb0 to 0xeeb67ff8)
[  149.415213] 7fa0:                                     00000000 00000000 00000000 00000000
[  149.434245] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  149.453142] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  149.465086] Code: e1a02312 e2423001 e1100003 e1c00003 (1e070f3e) 
[  149.476410] ---[ end trace ee601eaafd00e3bb ]---
[  149.486095] note: vchiq-slot/0[155] exited with preempt_count 1

Both OS has kernel 5.4.42, the latest firmare and 369ed4e44cb5a080a2dfa7f854ae4ff46b7c9ef9 version by vgencmd.

This problem doesn't seem to be directly related to tc358743. But before I figure out the exact conditions for reproducing the problem and create a separate issue, maybe this kernel trace tell you anything?

A bit of investigation: the Arch kernel is built from a commit: 5e5024f

rpi-update in Raspbian delivers this kernel (and everything works fine): 971a2bb

In other words, the Arch kernel is slightly newer, and contains two commits that could have caused the problem. At least they are about DMA, and that's what my stacktrace is all about: d8cbdaa and 79495a5.

6by9 commented

Seeing as this is now totally unrelated to tc358743, you really should be opening a new issue.

Created #3647.

PS: Thanks for your work!

@6by9 I seem to have found another related problem with unicam. All the same TC357743 stopped working on ZeroW with 5.4 kernel. Just trying to start a stream and getting this:

[  197.603129] kernel BUG at drivers/media/platform/bcm2835/bcm2835-unicam.c:687!
[  197.610522] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[  197.616488] Modules linked in: usb_f_mass_storage usb_f_hid usb_f_acm u_serial 8021q garp stp mrp llc brcmfmac brcmutil cfg80211 raspberrypi_hwmon i2c_mux_pinctrl i2c_mux bcm2835_unicam bcm2835_codec(C) bcm2835_v4l2(C) bcm2835_isp(C) i2c_bcm2835 v4l2_mem2mem bcm2835_mmal_vchiq(C) videobuf2_dma_contig videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common bcm2835_rng rng_core uio_pdrv_genirq uio fixed sch_fq_codel snd_bcm2835(C) snd_pcm snd_timer snd tc358743 v4l2_dv_timings v4l2_fwnode videodev mc cec libcomposite dwc2 udc_core drm drm_panel_orientation_quirks ip_tables x_tables ipv6 nf_defrag_ipv6
[  197.671736] CPU: 0 PID: 588 Comm: stream Tainted: G         C        5.4.45-1-ARCH #1
[  197.679733] Hardware name: BCM2835
[  197.683257] PC is at unicam_wr_dma_addr+0x60/0x64 [bcm2835_unicam]
[  197.689591] LR is at unicam_start_streaming+0x4c0/0x8ec [bcm2835_unicam]
[  197.696431] pc : [<7f272ab8>]    lr : [<7f273294>]    psr: 80000113
[  197.702831] sp : 91ec9d98  ip : 00000001  fp : 7f275378
[  197.708166] r10: 00000100  r9 : 00000122  r8 : 91ec9da8
[  197.713501] r7 : 921802f0  r6 : 00000000  r5 : 00000000  r4 : 92180040
[  197.720163] r3 : 00000000  r2 : 53696000  r1 : 53600000  r0 : 92180088
[  197.726828] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  197.737532] Control: 00c5387d  Table: 11ecc008  DAC: 00000055
[  197.746941] Process stream (pid: 588, stack limit = 0xb22e5c17)
[  197.756593] Stack: (0x91ec9d98 to 0x91eca000)
[  197.764555] 9d80:                                                       20000013 00000005
[  197.779776] 9da0: 00000400 53600000 00000000 81204f88 921803c0 92180408 92180508 7f168bfc
[  197.795076] 9dc0: 00000001 923c2900 92180548 7f208664 92180408 40045612 00000000 7f168bfc
[  197.810739] 9de0: 00000001 7f209adc 92180560 7f16d2f8 92310240 7586a000 fff9e02c 923cf880
[  197.826669] 9e00: 00000000 91ec9e6c 00000001 7f1809e8 92310300 00013800 00000000 00096000
[  197.842977] 9e20: 757d4000 81204f88 91e91e00 40045612 00000000 00000004 00000000 00000000
[  197.859596] 9e40: 91ec9e6c 7f16d0d4 00000001 7f16d8f8 00000001 00000000 76abdd30 923c2900
[  197.876585] 9e60: 00000000 00000000 812b62c0 00000001 00000002 00000000 00000001 00000000
[  197.893892] 9e80: 8134f800 923102a0 923105a0 7557a000 00000000 81183313 81183304 80296704
[  197.911518] 9ea0: 93170194 00000000 00000000 93170194 0000004d 80284164 0000004d 8011a04c
[  197.929558] 9ec0: 00000000 802afb20 00000000 802b5590 91eac134 00000000 00000000 923102a0
[  197.948026] 9ee0: 00000055 00000cc0 0007544d 81204f88 91ecdd50 76abdd30 92fbcb18 923c2900
[  197.966921] 9f00: 76abdd30 923c2900 91ec8000 00000036 00000008 8030a4f8 0007544d 91ec9fb0
[  197.986147] 9f20: 7544d000 00000817 91e91e00 915fb000 923102a0 915fb040 00000055 80c145a8
[  198.005680] 9f40: 00000000 00000000 0012d000 00000003 923c2900 00000008 00000001 00004000
[  198.025631] 9f60: 920f1500 91ec8000 00000036 81204f88 923c2901 00000008 40045612 76abdd30
[  198.046025] 9f80: 923c2900 91ec8000 00000036 8030a9b0 00000000 00000004 00000001 00000036
[  198.066402] 9fa0: 80101204 80101000 00000000 00000004 00000008 40045612 76abdd30 015723f8
[  198.086780] 9fc0: 00000000 00000004 00000001 00000036 40045612 00470038 76abdd30 00000008
[  198.107225] 9fe0: 0046fe08 76abdd0c 00429bc4 76cb1f0c 20000010 00000008 00000000 00000000
[  198.127836] [<7f272ab8>] (unicam_wr_dma_addr [bcm2835_unicam]) from [<7f273294>] (unicam_start_streaming+0x4c0/0x8ec [bcm2835_unicam])
[  198.152277] [<7f273294>] (unicam_start_streaming [bcm2835_unicam]) from [<7f208664>] (vb2_start_streaming+0x5c/0x16c [videobuf2_common])
[  198.176525] [<7f208664>] (vb2_start_streaming [videobuf2_common]) from [<7f209adc>] (vb2_core_streamon+0x7c/0x14c [videobuf2_common])
[  198.200487] [<7f209adc>] (vb2_core_streamon [videobuf2_common]) from [<7f16d2f8>] (__video_do_ioctl+0x224/0x44c [videodev])
[  198.223269] [<7f16d2f8>] (__video_do_ioctl [videodev]) from [<7f16d8f8>] (video_usercopy+0x28c/0x6c8 [videodev])
[  198.244702] [<7f16d8f8>] (video_usercopy [videodev]) from [<8030a4f8>] (do_vfs_ioctl+0x388/0x80c)
[  198.264259] [<8030a4f8>] (do_vfs_ioctl) from [<8030a9b0>] (ksys_ioctl+0x34/0x60)
[  198.281989] [<8030a9b0>] (ksys_ioctl) from [<80101000>] (ret_fast_syscall+0x0/0x4c)
[  198.299739] Exception stack(0x91ec9fa8 to 0x91ec9ff0)
[  198.309773] 9fa0:                   00000000 00000004 00000008 40045612 76abdd30 015723f8
[  198.327672] 9fc0: 00000000 00000004 00000001 00000036 40045612 00470038 76abdd30 00000008
[  198.345444] 9fe0: 0046fe08 76abdd0c 00429bc4 76cb1f0c
[  198.355297] Code: e12fff1e e1a0cf22 e35c0003 0affffeb (e7f001f2)
[  198.366097] ---[ end trace 9cbe57c15d3f6770 ]---

Everything worked fine on 4.19.
/boot/cmdline.txt:

root=/dev/mmcblk0p2 ro cma=64M rootwait console=ttyAMA0,115200 console=tty1 selinux=0 plymouth.enable=0 smsc95xx.turbo_mode=N dwc_otg.lpm_enable=0 kgdboc=ttyAMA0,115200 elevator=noop audit=0

/boot/config.txt:

initramfs initramfs-linux.img followkernel
hdmi_force_hotplug=1
gpu_mem=64
start_x=1
enable_uart=1
dtoverlay=tc358743,i2c_pins_28_29=1
dtoverlay=disable-bt
dtoverlay=dwc2

This would have been triggered by BUG_ON((dmaaddr >> 30) != 0x3 && (endaddr >> 30) != 0x3);

@6by9, is my assumption about the dma address returned incorrect?

6by9 commented

No, you're quite right that we need the 0xC alias. This would imply something wrong in device-tree again.

@pelwell I'm assuming dma aliases are the same on Pi0/1 as on PI 2/3/4 and use the uncached alias, otherwise how do the other peripherals work?