brgl/libgpiod

Why would libgpiod not detect rising edges on some pins, when sysfs was able to?

Closed this issue · 15 comments

I am working to upgrade a sysfs event polling rising edge detection implementation to libgpiod to make use of the kernel event timestamping.

In testing I found that the gpiod is not able to detect rising edge's on my pins for some reason. I verified this was not a usage issue in my code by testing with gpiomon in command line. gpiomon was also not detecting rising edges, although it did detect falling edges.

Relevant info:

  • kernel 4.19.94-ti-r73
  • libgpiod 1.4.1
  • am33xx processor
  • debian-buster OS

Device Tree is a combination of these:

https://github.com/RobertCNelson/dtb-rebuilder/blob/4.19.x/src/arm/am33xx.dtsi
https://github.com/RobertCNelson/dtb-rebuilder/blob/4.19.x/src/arm/am335x-osd335x-common.dtsi
and this:

&am33xx_pinmux {
    ecap0_pins_default: ecap0_pins_default {
        pinctrl-single,pins = <
            AM33XX_IOPAD(0x964, PIN_INPUT | MUX_MODE0) /* gpio0[7] */
        >;
    };
    gpio3_pins_default: gpio3_pins_default {
        pinctrl-single,pins = <
            AM33XX_IOPAD(0x99c, PIN_INPUT | MUX_MODE7) /* gpio3[17] */
        >;
    };
};

When I run gpiomon on one pin I get both rising and falling edges:

gpiomon gpiochip0 7
event:  RISING EDGE offset: 7 timestamp: [1709072918.200972377]
event: FALLING EDGE offset: 7 timestamp: [1709072918.700979793]
event:  RISING EDGE offset: 7 timestamp: [1709072919.200997043]
event: FALLING EDGE offset: 7 timestamp: [1709072919.701002002]
event:  RISING EDGE offset: 7 timestamp: [1709072920.201015460]
event: FALLING EDGE offset: 7 timestamp: [1709072920.701029335]
event:  RISING EDGE offset: 7 timestamp: [1709072921.201066377]

But When I run it on another I get only falling edges:

gpiomon gpiochip3 17
event: FALLING EDGE offset: 17 timestamp: [1709073108.897098295]
event: FALLING EDGE offset: 17 timestamp: [1709073109.897153795]
event: FALLING EDGE offset: 17 timestamp: [1709073110.897172545]
event: FALLING EDGE offset: 17 timestamp: [1709073111.897198629]
event: FALLING EDGE offset: 17 timestamp: [1709073112.897222629]
event: FALLING EDGE offset: 17 timestamp: [1709073113.897253671]

/sys/kernel/debug/gpio contents:

gpiochip0: GPIOs 0-31, parent: platform/44e07000.gpio, gpio-0-31:
gpio-7   (ECAP0_IN_PWM0_OUT   |gpiomon             ) in  hi IRQ
gpiochip3: GPIOs 96-127, parent: platform/481ae000.gpio, gpio-96-127:
gpio-113 (MCASP0_AHCLKR       |gpiomon             ) in  lo IRQ

Running a sysfs based monitor on either pin is able to detect both rising and falling edges.

I suspect this is either a kernel issue in my kernel version or a device tree issue, but I can't figure it out. The part that really throws me is that the sysfs based gpio event polling works fine on the same pins.

Note: I realize it would be great to update the kernel/OS versions, but unless I can only do that if I have a specific fix to point at to move those updates up on the development pipeline.

brgl commented

@DanFTRX this is a really ancient release of both the kernel and libgpiod. Any chance you could verify if the issue persists on a more recent release? Because that would be my first step, not debugging an issue that may have long been fixed since.

Also: @warthog618 any idea at first glance?

At the moment, no I can't. This development effort was meant to be a stopgap effort used prior to a later scheduled update of the OS/Kernel for this system.

I reached out here because I was hoping it might be something immediately obvious to those familiar with the technology, I suspect its likely not an issue with libgpiod itself (even on the older version). Possibly an issue with the older kernel, and maybe more likely an issue with my device tree (Although I don't know why it would work with sysfs if this were the case).

I understand this may mean you cannot support this issue.

@DanFTRX Firstly, it is libgpiod, not libgpio. Doesn't matter much here, but there are libgpios out there so this is a source of confusion in other settings.

Also: @warthog618 any idea at first glance?

I've already replied to this on SO, basically saying the same thing you have - most likely an issue with hardware or kernel, ideally move to more recent releases and see if that helps.

One thing of note is the huge rate difference between the good and bad examples. What is the reason for that? The faster rate in the bad example could elicit event buffer overflow, but no idea why that would selectively filter rising edges. v2 sequence numbers would help eliminate that possibility, but that is off the table given the old kernel.

It is also on a different chip - maybe the driver for gpiochip3 is buggy?
How does the "sysfs based monitor" work? Is that literally polled, or does it poll() the edge file? If the latter then that would indicate that the driver works and the problem is somewhere in the GPIO subsystem.

What do the different MUX_MODEs mean for those pins?

At the moment, no I can't. This development effort was meant to be a stopgap effort used prior to a later scheduled update of the OS/Kernel for this system.

Earlier you indicated it was to get kernel timestamps. I would suggest skipping over libgpiod v1 and going direct to v2, if the later kernel update allows, as the two versions have different APIs and v1 is deprecated.

@DanFTRX Firstly, it is libgpiod, not libgpio. Doesn't matter much here, but there are libgpios out there so this is a source of confusion in other settings.

Corrected

I've already replied to this on SO, basically saying the same thing you have - most likely an issue with hardware or kernel, ideally move to more recent releases and see if that helps.

The tough part here is I am using a hardware specific offshoot of the linux kernel, of which I am on the latest release of. And short of updating to their 5.10.X offshoot (Which is a larger development effort), or personally updating and testing their patches against the latest 4.19 mainline I am stuck.

One thing of note is the huge rate difference between the good and bad examples. What is the reason for that? The faster rate in the bad example could elicit event buffer overflow, but no idea why that would selectively filter rising edges. v2 sequence numbers would help eliminate that possibility, but that is off the table given the old kernel.

I'm not sure what you are referring to here? Both pins are transitioning approximately once per second.

It is also on a different chip - maybe the driver for gpiochip3 is buggy? How does the "sysfs based monitor" work? Is that literally polled, or does it poll() the edge file? If the latter then that would indicate that the driver works and the problem is somewhere in the GPIO subsystem.

Yes I am referring to setting the "edge" attribute to "rising" and running poll() on the value file as described here https://www.[kernel.org/doc/Documentation/gpio/sysfs.txt](https://www.kernel.org/doc/Documentation/gpio/sysfs.txt)

What do the different MUX_MODEs mean for those pins?

The MUX_MODEs for the working one is setting it as an ECAP pin, which for the purposes of this should be identical to a plain gpio, and the non-working one is being set as just a plain gpio as opposed to being configured as a mcasp pin.

At the moment, no I can't. This development effort was meant to be a stopgap effort used prior to a later scheduled update of the OS/Kernel for this system.

Earlier you indicated it was to get kernel timestamps. I would suggest skipping over libgpiod v1 and going direct to v2, if the later kernel update allows, as the two versions have different APIs and v1 is deprecated.

Yes, I will be upgrading this system to kernel 5.x + libgpiod v2 in the future, but I was investigating using libgpiod v1 as it exists in my system as it is right now to use kernel timestamping as a stopgap implementation prior to that larger kernel/OS update effort.

This investigation led to me finding this unexpected behavior difference regarding the rising edges for these pins between the sysfs poll() system and the libgpiod v1 implementation. I was hoping it might be something immediately obvious like perhaps the kernel gpio uAPI needing more specific device tree configuration versus the sysfs API.

I'm not sure what you are referring to here? Both pins are transitioning approximately once per second.

Ah - ignore me. I was focused at the nanos in the bad example and overlooked the seconds incrementing.

Yes I am referring to setting the "edge" attribute to "rising" and running poll() on the value file as described here

Ok, it is definitely odd that sysfs is seeing rising edges but cdev isn't, and I would have to take a look through the kernel code to see if I can find a reason for that - AFAIAA they should be equivalent. Both your kernel and libgpiod are from before my time, so I have no idea what the code looked like back then. Are there any changes to the gpio subsystem in your branch that you are aware of, or is it as per the mainline 4.19? Can you provide a link to the branch you are using?

Wrt the libgpiod version, is updating to the more recent v1.6.3 a possibility, or are you stuck with v1.4.1?

@warthog618 I can probably update to 1.6.3 libgpiod. As for the kernel I am using: https://openbeagle.org/beagleboard/linux/-/tree/4.19.94-ti-r74

Looks like the latest v1 is now v1.6.4, though a lot of the distros are still on v1.6.3, which I suppose is why I incorrectly think of that being the latest v1.

Ok, that kernel branch (specifically looking at gpiolib.c) looks to be v4.19 with some backports applied. Nothing odd there. But there is one concerning patch titled "hack: gpiolib: yes we have drivers stomping on each other..." that appears to allow multiple requests on the same line. That isn't good.

Perhaps that line is also being used by a driver?

Any chance you could remove that patch and see what happens?

Ok, that kernel branch (specifically looking at gpiolib.c) looks to be v4.19 with some backports applied. Nothing odd there. But there is one concerning patch titled "hack: gpiolib: yes we have drivers stomping on each other..." that appears to allow multiple requests on the same line. That isn't good.

Perhaps that line is also being used by a driver?

Any chance you could remove that patch and see what happens?

Oh that is a concerning patch. You are suggesting maybe that a driver is consuming all the rising edge events on that line and this patch has made it such that instead of receiving a "Resource Busy" when requesting that line we just get the unconsumed falling edge events? That sounds like a plausible theory. I'll give it a go at removing the patch, but it's going to be a couple days. I'll post here when I have results.

You are suggesting maybe that a driver is consuming all the rising edge events on that line and this patch has made it such that instead of receiving a "Resource Busy" when requesting that line we just get the unconsumed falling edge events?

Yup, the DT associates line 17 with mcasp0 (MCASP0_AHCLKR), so I'm thinking its irq handler is stealing the rising edge interrupts. The difference between the sysfs and cdev interrupt handling is that the sysfs handles everything in the irq handler, while cdev timestamps the event in the irq handler and defers the rest of the processing to an irq thread. I'm guessing that the mcasp0 irq handler marks the rising edge interrupts as handled so the cdev irq thread is cancelled for those edges.

Not something we expect to see as normally you would get EBUSY when requesting a used line.

If you aren't using mcasp0 then completely disable it, otherwise you need to find another line.

Hmm, unfortunately mcasp0 is already disabled, so I don't think that pans out. But I had another thought though now that I know where all these kernel drivers are:
I noticed the SysFs implementation (https://openbeagle.org/beagleboard/linux/-/blob/4.19.94-ti-r74/drivers/gpio/gpiolib-sysfs.c#L158) just handles the IRQ and immediately notifies.

Whereas the gpiolib implementation (https://openbeagle.org/beagleboard/linux/-/blob/4.19.94-ti-r74/drivers/gpio/gpiolib.c#L827) first gets the new value to determine if its a rising or falling edge event. If the gpio signal is short enough (say 20us) is that going to toggle back to low before that get value call thereby causing these symptoms? Thats a big difference between my working and non-working pins. One is a much longer signal.

Any other differences between the two cases we should know about?

Hmmm, the cdev interrupt handling has been changed a little since 4.19.
I was going to suggest you only look for rising edges using cdev, but looking at the 4.19 code, that always checks the current level and will filter out short pulses.
Since 4.19 that has been changed to ignore the value if you are only looking for one type of edge. The value is only used to distinguish between rising and falling edges when both are requested.

Do you need both edges, or just rising? If just rising then you could backport that change.

Any other differences between the two cases we should know about?

I don't think so, my apologies it had not occurred to me as a possible cause until now.

Hmmm, the cdev interrupt handling has been changed a little since 4.19. I was going to suggest you only look for rising edges using cdev, but looking at the 4.19 code, that always checks the current level and will filter out short pulses. Since 4.19 that has been changed to ignore the value if you are only looking for one type of edge. The value is only used to distinguish between rising and falling edges when both are requested.

Do you need both edges, or just rising? If just rising then you could backport that change.

I do only need the rising edge, so I think this is probably my root cause and I can backport the changes to not read the current state when only listening for one edge.

I will run some tests tomorrow with a longer signal to confirm.

Btw, the change I'm referring to is fa38869b0161 "gpiolib: Don't support irq sharing for userspace" from 20/08/2018 that was merged into 4.20.

I have verified that the kernel patch fixes my issue. Thank you for the help this was a tough one and I really appreciate the help even with my constraints of being on an older baseline.