[Linux-kernel-mentees] Any other ways to debug GPIO interrupt controller (pinctrl-amd) for broken touchpads of a new laptop model?

Coiby Xu coiby.xu at gmail.com
Thu Oct 15 03:27:04 UTC 2020


On Wed, Oct 14, 2020 at 01:46:14PM +0200, Hans de Goede wrote:
>Hi,
>
>On 10/14/20 6:24 AM, Coiby Xu wrote:
>>On Tue, Oct 06, 2020 at 11:29:40AM +0200, Hans de Goede wrote:
>>>
>>>
>>>On 10/6/20 11:28 AM, Hans de Goede wrote:
>>>>Hi,
>>>>
>>>>On 10/6/20 10:55 AM, Hans de Goede wrote:
>>>>>Hi,
>>>>>
>>>>>On 10/6/20 10:31 AM, Coiby Xu wrote:
>>>>>>On Tue, Oct 06, 2020 at 08:28:40AM +0200, Hans de Goede wrote:
>>>>>>>Hi,
>>>>>>>
>>>>>>>On 10/6/20 6:49 AM, Coiby Xu wrote:
>>>>>>>>Hi Hans and Linus,
>>>>>>>>
>>>>>>>>I've found the direct evidence proving the GPIO interrupt controller is
>>>>>>>>malfunctioning.
>>>>>>>>
>>>>>>>>I've found a way to let the GPIO chip trigger an interrupt by accident
>>>>>>>>when playing with the GPIO sysfs interface,
>>>>>>>>
>>>>>>>> - export pin130 which is used by the touchad
>>>>>>>> - set the direction to be "out"
>>>>>>>> - `echo 0 > value` will trigger the GPIO controller's parent irq and
>>>>>>>>   "echo 1 > value" will make it stop firing
>>>>>>>>
>>>>>>>>(I'm not sure if this is yet another bug of the GPIO chip. Anyway I can
>>>>>>>>manually trigger an interrupt now.)
>>>>>>>>
>>>>>>>>I wrote a C program is to let GPIO controller quickly generate some
>>>>>>>>interrupts then disable the firing of interrupts by toggling pin#130's
>>>>>>>>value with an specified time interval, i.e., set the value to 0 first
>>>>>>>>and then after some time, re-set the value to 1. There is no interrupt
>>>>>>>>firing unless time internal > 120ms (~7Hz). This explains why we can
>>>>>>>>only see 7 interrupts for the GPIO controller's parent irq.
>>>>>>>
>>>>>>>That is a great find, well done.
>>>>>>>
>>>>>>>>My hypothesis is the GPIO doesn't have proper power setting so it stays
>>>>>>>>in an idle state or its clock frequency is too low by default thus not
>>>>>>>>quick enough to read interrupt input. Then pinctrl-amd must miss some
>>>>>>>>code to configure the chip and I need a hardware reference manual of this
>>>>>>>>GPIO chip (HID: AMDI0030) or reverse-engineer the driver for Windows
>>>>>>>>since I couldn't find a copy of reference manual online? What would you
>>>>>>>>suggest?
>>>>>>>
>>>>>>>This sounds like it might have something to do with the glitch filter.
>>>>>>>The code in pinctrl-amd.c to setup the trigger-type also configures
>>>>>>>the glitch filter, you could try changing that code to disable the
>>>>>>>glitch-filter. The defines for setting the glitch-filter bits to
>>>>>>>disabled are already there.
>>>>>>>
>>>>>>
>>>>>>Disabling the glitch filter works like a charm! Other enthusiastic
>>>>>>Linux users who have been troubled by this issue for months would
>>>>>>also feel great to know this small tweaking could bring their
>>>>>>touchpad back to life:) Thank you!
>>>>>
>>>>>That is good to hear, I'm glad that we have finally found a solution.
>>>>>
>>>>>>$ git diff
>>>>>>diff --git a/drivers/pinctrl/pinctrl-amd.c b/drivers/pinctrl/pinctrl-amd.c
>>>>>>index 9a760f5cd7ed..e786d779d6c8 100644
>>>>>>--- a/drivers/pinctrl/pinctrl-amd.c
>>>>>>+++ b/drivers/pinctrl/pinctrl-amd.c
>>>>>>@@ -463,7 +463,7 @@ static int amd_gpio_irq_set_type(struct irq_data *d, unsigned int type)
>>>>>>                 pin_reg &= ~(ACTIVE_LEVEL_MASK << ACTIVE_LEVEL_OFF);
>>>>>>                 pin_reg |= ACTIVE_LOW << ACTIVE_LEVEL_OFF;
>>>>>>                 pin_reg &= ~(DB_CNTRl_MASK << DB_CNTRL_OFF);
>>>>>>-               pin_reg |= DB_TYPE_PRESERVE_HIGH_GLITCH << DB_CNTRL_OFF;
>>>>>>+               /** pin_reg |= DB_TYPE_PRESERVE_HIGH_GLITCH << DB_CNTRL_OFF; */
>>>>>>                 irq_set_handler_locked(d, handle_level_irq);
>>>>>>                 break;
>>>>>>
>>>>>>I will learn more about the glitch filter and the implementation of
>>>>>>pinctrl and see if I can disable glitch filter only for this touchpad.
>>>>>
>>>>>The glitch filter likely also has settings for how long a glitch
>>>>>lasts, which apparently goes all the way up to 120ms. If it would
>>>>>only delay reporting by say 0.1ms and consider any pulse longer
>>>>>then 0.1s not a glitch, then having it enabled would be fine.
>>>>>
>>>>>I don't think we want some sort of quirk here to only disable the
>>>>>glitch filter for some touchpads. One approach might be to simply
>>>>>disable it completely for level type irqs.
>>>>>
>>>>>What we really need here is some input from AMD engineers with how
>>>>>this is all supposed to work.
>>>>>
>>>>>E.g. maybe the glitch-filter is setup by the BIOS and we should not
>>>>>touch it all ?
>>>>>
>>>>>Or maybe instead of DB_TYPE_PRESERVE_HIGH_GLITCH low level interrupts
>>>>>should use DB_TYPE_PRESERVE_LOW_GLITCH ?   Some docs for the hw
>>>>>would really help here ...
>>>>
>>>>So I've been digging through the history of the pinctrl-amd.c driver
>>>>and once upon a time it used to set a default debounce time of
>>>>2.75 ms.
>>>>
>>>>See the patch generated by doing:
>>>>
>>>>git format-patch 8cf4345575a416e6856a6856ac6eaa31ad883126~..8cf4345575a416e6856a6856ac6eaa31ad883126
>>>>
>>>>In a linux kernel checkout.
>>>>
>>>>So it would be interesting to add a debugging printk to see
>>>>what the value of pin_reg & DB_TMR_OUT_MASK is for the troublesome
>>>>GPIO.
>>>>
>>>>I guess that it might be all 1s (0xfffffffff) or some such which
>>>>might be a way to check that we should disable the glitch-filter
>>>>for this pin?
>>>
>>>p.s.
>>>
>>>Or maybe we should simply stop touching all the glitch-filter
>>>related bits, in the same way as that old commit has already
>>>removed the code setting the timing of the filter ?
>>>
>>>At least is seems that forcing the filter to be on without
>>>sanitizing the de-bounce time is not a good idea.
>>>
>>Today I find an inconsistency in drivers/pinctrl/pinctrl-amd.c
>>so there must be a bug. As far as I can understand pinctrl-amd,
>>"pin_reg & ~DB_CNTRl_MASK" is used to mask out the debouncing
>>feature,
>>
>>static int amd_gpio_set_debounce(struct gpio_chip *gc, unsigned offset,
>>         unsigned debounce)
>>{
>>     ...
>>     if (debounce) {
>>         ...
>>         if (debounce < 61) {
>>             pin_reg |= 1;
>>             pin_reg &= ~BIT(DB_TMR_OUT_UNIT_OFF);
>>             pin_reg &= ~BIT(DB_TMR_LARGE_OFF);
>>         ...
>>         } else if (debounce < 1000000) {
>>             time = debounce / 62500;
>>             pin_reg |= time & DB_TMR_OUT_MASK;
>>             pin_reg |= BIT(DB_TMR_OUT_UNIT_OFF);
>>             pin_reg |= BIT(DB_TMR_LARGE_OFF);
>>         } else {
>>             pin_reg &= ~DB_CNTRl_MASK;
>>             ret = -EINVAL;
>>         }
>>
>>     } else {
>>         ...
>>         pin_reg &= ~DB_CNTRl_MASK;
>>     }
>>     ...
>>}
>>
>>However in amd_gpio_irq_set_type, "ping_reg & ~(DB_CNTRl_MASK << DB_CNTRL_OFF)"
>>is used,
>>
>>static int amd_gpio_irq_set_type(struct irq_data *d, unsigned int type)
>>{
>>
>>     ...
>>     case IRQ_TYPE_LEVEL_LOW:
>>         pin_reg |= LEVEL_TRIGGER << LEVEL_TRIG_OFF;
>>         pin_reg &= ~(ACTIVE_LEVEL_MASK << ACTIVE_LEVEL_OFF);
>>         pin_reg |= ACTIVE_LOW << ACTIVE_LEVEL_OFF;
>>         pin_reg &= ~(DB_CNTRl_MASK << DB_CNTRL_OFF);
>>         pin_reg |= DB_TYPE_PRESERVE_HIGH_GLITCH << DB_CNTRL_OFF;
>>         irq_set_handler_locked(d, handle_level_irq);
>>         break;
>>     ...
>>}
>>
>>If "pin_reg & ~DB_CNTRl_MASK" is used instead, the touchpad will work
>>flawlessly. So I believe "pin_reg & ~DB_CNTRl_MASK" is the correct way
>>to mask out the debouncing filter and the bug lies in amd_gpio_set_type.
>
>I'm afraid that that is not the case, the current code is correct,
>it clears bit 5 and 6 of the register which are the bits which control
>the debounce type.
>
Thank you for the explanation. As mentioned in another email (that email
was supposed to be delivered much yesterday, but I forgot to run
msmtp-runqueue.sh to send offline emails), this hack led to some issues.
So it must be amd_gpio_set_debounce that makes the mistake of incorrectly
masking out the bits of controlling the debounce type. Btw,
amd_gpio_set_debounce seems to be never used because
"struct acpi_gpio_info" doesn't has the debounce_timeout field. So the
bug has never been exposed.
>You mentioned in an earlier mail that the value of the register is
>0x500e8 before this function runs.
>
>If you drop the "<< DB_CNTRL_OFF" part then instead you are masking out
>bits 0 and 1 which are already 0, so the mask becomes a no-op.
>
>>Btw, can you explain what's the difference between glitch filter and
>>debouncing filter?
>
>There is no difference the driver mixes the terms, but they both refer
>to the same thing this is most clear in the defines for the DB_CNTRL bits
>(bits 5 and 6 of the register):
>
>#define DB_TYPE_NO_DEBOUNCE               0x0UL
>#define DB_TYPE_PRESERVE_LOW_GLITCH       0x1UL
>#define DB_TYPE_PRESERVE_HIGH_GLITCH      0x2UL
>#define DB_TYPE_REMOVE_GLITCH             0x3UL
>
Thank you for the clarification! This makes it much easier to
understanding the behaviour of the GPIO controller.

>Which is interesting because bits 5 and 6 are both 1 as set by the BIOS,
>so with your little hack to dro the "<< DB_CNTRL_OFF" you are in essence
>keeping bits 5 and 6 as DB_TYPE_REMOVE_GLITCH.
>
But the line before the hacked line is,
                 pin_reg &= ~(DB_CNTRl_MASK << DB_CNTRL_OFF);

which will mask out bits 5 and 6. So my little hack essentially disables
the glitch filter.
>So it seems that the problem is that the irq_set_type code changes
>the glitch filter type from DB_TYPE_REMOVE_GLITCH (filter out all
>glitches) to DB_TYPE_PRESERVE_HIGH_GLITCH, which apperently breaks
>things.
>
>To test this you could replace the:
>
>DB_TYPE_PRESERVE_HIGH_GLITCH
>
>bit in the case IRQ_TYPE_LEVEL_LOW path with:
>
>DB_TYPE_REMOVE_GLITCH
>
>Which I would expect to also fix your touchpad.
>
Changing to DB_TYPE_REMOVE_GLITCH doesn't completely fix the
touchpad. The touchpad is not as sensitive as the hack of disabling
the glitch filter, for example, often two fingers touching will
trigger right-mouse action. hid-recorder shows there are duplicate
HID reports being received.

However, if I set the debounce timeout to be 610us, the touchpad would
work flawlessly and no issue of duplicate HID reports.
>If that is the case an interesting experiment would be to
>replace DB_TYPE_PRESERVE_HIGH_GLITCH with
>DB_TYPE_PRESERVE_LOW_GLITCH instead.
>
Changing to DB_TYPE_PRESERVE_LOW_GLITCH could save the touchpad.
Although hid-recorder shows there are also duplicate HID reports, but
the touchpad work flawlessly (at least I couldn't notice any problem).

I also did other experiments and found if we use
DB_TYPE_PRESERVE_HIGH_GLITCH but set the debounce timeout to 610us, this
could save the touchpad.

Btw, based on on the code of set_debounce, I calculated the debounce
timeout set by the BIOS and found the value is 124.8ms. This may explain
why ~7 interrupts are fired when DB_TYPE_PRESERVE_HIGH_GLITCH is used.

I tried to come up with a minimum set of rules to explain all observations
about this GPIO controller,
  - the read value from the register equal to the written value written
    to the register
  - when the touchpad sends signal through its interrupt line to indicate
    arrival of new data, there are multiple cycles of signal debouncing,
    i.e., spurious interrupts would fired
  - When the debounce timeout is set, the GPIO chip will wait for
    specified time to collect enough data to judge if this is a valid
    signal thus eliminating spurious interrupts
  - DB_TYPE_PRESERVE_HIGH_GLITCH is for filtering high input while
    DB_TYPE_PRESERVE_LOW_GLITCH for filtering low input

but obviously the above set of rules could not explain,
  - when debounce filter is disabled, no duplicate HID reports read by
    hid-recorder which indicates no spurious interrupts
  - with DB_TYPE_REMOVE_GLITCH and the default debounce timeout of
    124.8ms, the interrupt fires at a much higher rate than 7Hz

>I've never seen this kinda glitch/debounce filter where
>you can filter out only one type of level before, so
>I wonder if the code maybe simply got it wrong, also for
>a level type irq I really see no objection to just
>use DB_TYPE_REMOVE_GLITCH instead of the weird "half"
>filters.
>
>So I just ran a git blame and the DB_TYPE_PRESERVE_HIGH_GLITCH
>has been there from the very first commit of this driver,
>I wonder if it has been wrong all this time and should be
>inverted (so DB_TYPE_PRESERVE_LOW_GLITCH instead).
>
>I think we may want to just play it safe though and simply
>switch to DB_TYPE_REMOVE_GLITCH as we already do for all
>edge types and when amd_gpio_set_config() gets called!
>
>Linus, what do you think about just switching to
>DB_TYPE_REMOVE_GLITCH for level type irqs (unifying them
>with all the other modes) and not mucking with this weird,
>undocumented "half" filter modes ?
>
>>Or can you point to some references? I've gain some
>>experience about how to configure the GPIO controller by studying the
>>code of pinctrl-amd and pinctrl-baytrail (I can't find the hardware
>>reference manual for baytrail either). I also tweaked the configuration
>>in pinctrl-amd, for example, setting the debounce timeout to 976 usec
>>and 3.9 msec without disabling the glitch filter could also save the
>>touchpad. But I need some knowledge to understand why this touchpad [1]
>>which also uses the buggy pinctrl-amd isn't affected.
>>
>>[1] https://github.com/Syniurge/i2c-amd-mp2/issues/11#issuecomment-707427095
>
>My guess would be that it uses edge types interrupts instead ?
>I have seen that quite a few times, even though it is weird
>to do that for i2c devices.
>
>Regards,
>
>Hans
>

--
Best regards,
Coiby


More information about the Linux-kernel-mentees mailing list