[Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking

Wed Jul 5 15:36:55 UTC 2017

On Wed, 2017-07-05 at 11:27 -0400, Steven Rostedt wrote:
> On Wed, 5 Jul 2017 08:16:33 -0700
> Guenter Roeck <linux at roeck-us.net> wrote:
> 
> > 
> > The reproducers for several of the USB fixes I submitted recently
> > took hours of stress testing to reproduce the underlying problems. I
> > have one more to fix which takes days to reproduce, if at all (I
> > have seen that problem only two or three times during weeks of
> > stress testing). Due to the nature of the problems, reproducing
> > them heavily depended on the underlying hardware. None of the
> > reproducers can guarantee that the problem is fixed; they are
> > intended to show the problem, not that it is fixed. This happens a
> > lot with race conditions - in many cases it is impossible to prove
> > that the problem is fixed; one can only prove that it still exists.
> > 
> > Echoing what you said, I have no idea how it would even be possible
> > to write unit tests to verify if the problems I fixed are really
> > fixed.
> > 
> > Several of the fixes I have submitted are based on single-instance
> > error logs with no reproducer. Many others are compile time fixes
> > or fix problems found with code inspection (manual or automatic).
> > 
> > If we start shaming people for not providing unit tests, all we'll
> > accomplish is that people will stop providing bug fixes.
> 
> I need to be clearer on this. What I meant was: if someone has a
> test that easily reproduces a bug, and no test for that bug gets
> added to selftests, then we should shame those responsible into
> adding one.
> 
> Bugs that are found by inspection, or that only show up in
> hard-to-reproduce test cases, are not applicable, as they don't have
> tests that can show a regression.
> 
> And I'm betting that those bugs are NOT REGRESSIONS! Most likely they
> are bugs that always existed, but because of the unpredictable hitting of
> the bug (as you said, it required hours of stress tests to
> reproduce), the bug was not previously hit during development. That's
> not a regression, that's a feature.
> 
> Are we tracking regressions or simply bugs?

A lot of device driver regressions are bugs that previously existed in
the code but which didn't manifest until something else happened.  A
huge number of locking and timing issues are like this.  The irony is
that a lot of them go from the race always being won (so the bug is
never noticed) to the race being lost often enough to make something
unusable.  To a user, that is a kernel regression: a bug in the
current kernel, not seen previously, that makes the system unusable
for them.

I've got to vote with my users here: that's a regression not a
"feature".

James