[Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches

Mon Sep 10 23:38:04 UTC 2018

On Mon, Sep 10, 2018 at 04:45:19PM -0400, Steven Rostedt wrote:
>On Mon, 10 Sep 2018 19:43:11 +0000
>Sasha Levin <Alexander.Levin at microsoft.com> wrote:
>
>> On Fri, Sep 07, 2018 at 08:52:40AM -0700, Linus Torvalds wrote:
>> >So this is what my argument really boils down to: the more critical a
>> >patch is, the more likely it is to be pushed more aggressively, which
>> >in turn makes it statistically much more likely to show up not only
>> >during the latter part of the development cycle, but it will directly
>> >mean that it looks "less tested".
>> >
>> >And AT THE SAME TIME, the more critical a patch is, the more likely it
>> >is to also show up as a problem spot for distros. Because, by
>> >definition, it touched something critical and likely subtle.
>> >
>> >End result: BY DEFINITION you'll see a correlation between "less
>> >testing" and "more problems".
>> >
>> >But THAT is correlation. That's not the fundamental causation.
>> >
>> >Now, I agree that it's correlation that makes sense to treat as
>> >causation. It just is very tempting to say: "less testing obviously
>> >means more problems". And I do think that it's very possibly a real
>> >causal property as well, but my argument has been that it's not at all
>> >obviously so, exactly because I would expect that correlation to exist
>> >even if there was absolutely ZERO causality.
>> >
>> >See what my argument is? You're arguing from correlation. And I think
>> >there is a much more direct causal argument that explains a lot of the
>> >correlation.
>>
>> Both of us agree that patches in later -rc cycles are buggier. We don't
>> agree on why, but I think that it actually doesn't matter much. For the
>> sake of the argument, let's go with what you're saying and assume that
>> they're buggier because they are more critical, tricky and subtle.
>>
>> So we have this time period of a few weeks where we know that we're
>> going to see tricky patches. What can we do to better deal with it?
>> Saying that we'll just see more bugs and we should just live with it
>> because it's "BY DEFINITION" is not really a good answer IMO.
>>
>> For stable trees, we can address that by waiting even longer before
>> picking up -rc5+ stuff, but that will move us further away from your
>> tree which is an undesirable effect.
>>
>> I don't have anything beyond guesses, but I don't think the
>> solution here is WONTFIX.
>>
>
>I think it may be more of CANTFIX.
>
>The bugs introduced after -rc5 are more subtle and harder to trigger. I
>(and I presume Linus, but he can talk for himself) don't believe that
>keeping it in linux-next any longer will help find them, unless the
>bots get better to do so. The problem is that these bugs are not going
>to be triggered until they get into the mainline kernel and perhaps not
>even until they get into the distros. We want to find them before that,
>but it's not until they are used in production environments that they
>will get found.

If you're fixing something in -rc8, which is, according to Linus, only
for *critical* fixes that are usually complex, you had better have tested
that code before pushing it in.

Is it on obscure hardware no one has access to? I can't imagine what
makes that bug critical then.

Otherwise, yes, it should be a requirement that a patch was reasonably
tested before being merged; this is even more true for those late -rc
critical fixes.

>The best we can do is make the automated testing of linux-next better
>such that there's less -rc5 patches that need to go in in the first
>place.

Being in -next is not only about running it through automatic bots.
Being on 0day means, in practice, "amount of days humans had to
review/test that code".

I didn't want to count days-in-next just to credit automatic testing,
but also as an indicator of how many eyeballs a commit attracted before
being merged.

>I do think that anything that goes into -rc5 or later should be tested
>by the developer and the 0day bot, to make sure they don't introduce
>some silly bug. But linux-next was mainly to deal with bugs caused by
>integration of various sub systems. But -rc5 fixes only care about
>integrating with mainline. And as Linus pointed out, when it gets into
>mainline, it will then be pulled into linux-next where it gets
>integrated with new code coming into the next merge window.

It would be nice if every fix coming in that late would have a
Tested-by: tag. Isn't it a requirement that patches should be tested
anyways?

Require that every patch be sent to lkml? Is that a big ask?

If the patches are so complex and subtle, require at least one
reviewed-by/acked-by?

--
Thanks,
Sasha