[Ksummit-discuss] [MAINTAINERS SUMMIT] Bug-introducing patches

Mon Sep 10 20:45:19 UTC 2018

On Mon, 10 Sep 2018 19:43:11 +0000
Sasha Levin <Alexander.Levin at microsoft.com> wrote:

> On Fri, Sep 07, 2018 at 08:52:40AM -0700, Linus Torvalds wrote:
> >So this is what my argument really boils down to: the more critical a
> >patch is, the more likely it is to be pushed more aggressively, which
> >in turn makes it statistically much more likely to show up not only
> >during the latter part of the development cycle, but it will directly
> >mean that it looks "less tested".
> >
> >And AT THE SAME TIME, the more critical a patch is, the more likely it
> >is to also show up as a problem spot for distros. Because, by
> >definition, it touched something critical and likely subtle.
> >
> >End result: BY DEFINITION you'll see a correlation between "less
> >testing" and "more problems".
> >
> >But THAT is correlation. That's not the fundamental causation.
> >
> >Now, I agree that it's correlation that makes sense to treat as
> >causation. It just is very tempting to say: "less testing obviously
> >means more problems". And I do think that it's very possibly a real
> >causal property as well, but my argument has been that it's not at all
> >obviously so, exactly because I would expect that correlation to exist
> >even if there was absolutely ZERO causality.
> >
> >See what my argument is? You're arguing from correlation. And I think
> >there is a much more direct causal argument that explains a lot of the
> >correlation.  
> 
> Both of us agree that patches in later -rc cycles are buggier. We don't
> agree on why, but I think that it actually doesn't matter much. For the
> sake of the argument, let's go with what you're saying and assume that
> they're buggier because they are more critical, tricky and subtle.
> 
> So we have this time period of a few weeks where we know that we're
> going to see tricky patches. What can we do to better deal with it?
> Saying that we'll just see more bugs and we should just live with it
> because it's "BY DEFINITION" is not really a good answer IMO.
> 
> For stable trees, we can address that by waiting even longer before
> picking up -rc5+ stuff, but that will move us further away from your
> tree which is an undesirable effect.
> 
> I don't have anything beyond guesses, but I don't think the
> solution here is WONTFIX.
> 

I think it may be more of CANTFIX.

The bugs introduced after -rc5 are more subtle and harder to trigger. I
(and I presume Linus, but he can talk for himself) don't believe that
keeping it in linux-next any longer will help find them, unless the
bots get better at doing so. The problem is that these bugs are not going
to be triggered until they get into the mainline kernel and perhaps not
even until they get into the distros. We want to find them before that,
but it's not until they are used in production environments that they
will get found.

The best we can do is make the automated testing of linux-next better,
so that fewer -rc5 patches need to go in in the first
place.

I do think that anything that goes into -rc5 or later should be tested
by the developer and the 0day bot, to make sure it doesn't introduce
some silly bug. But linux-next was mainly meant to deal with bugs caused
by the integration of various subsystems, whereas -rc5 fixes only need to
integrate with mainline. And as Linus pointed out, once a fix gets into
mainline, it is then pulled into linux-next, where it gets integrated
with the new code coming in for the next merge window.

-- Steve