[Ksummit-discuss] [MAINTAINER SUMMIT] Distribution kernel bugzillas considered harmful

James Bottomley James.Bottomley at HansenPartnership.com
Wed Sep 5 10:13:52 UTC 2018


I'm seeing a lot of wasted effort by our customers on kernel bugs and
trying to engage the distribution to fix them.  As a caveat, I'm
working in the cloud, so the distributions in question are usually
community ones, not enterprise ones.  However, we do have a fair few
customers on LTS kernels from distributions.

Mostly they find a cloud performance regression, they try to engage the
distro, spend ages working on it or submitting bugs and usually end up
with an unsatisfactory result.  By the time they call my team in, we've
likely only got a week to fix the issue.  However, step one is always
confirming whether upstream works (95% of the time it does) and then
finding the fix by bisection (usually assisted by knowledge of where
the bug is).  To do the bisection we usually have to build a kernel
package with our guesses and get them to try it, so it can be a bit
slow.  Once we have the backport, we send it to stable and notify the
distribution to include it in their next kernel release.
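
As a rough illustration of why bisecting for the fix is cheap in test
rounds even over a long range, here is a minimal sketch in Python.  It
assumes a linear list of commits whose first entry is known broken and
whose last entry (upstream) is known fixed.  The names find_first_fixed
and is_fixed are hypothetical; is_fixed stands in for "build a kernel
package at this commit and have the customer test it".  Real git bisect
also copes with merges, which this sketch does not.

```python
from typing import Callable, Sequence, Tuple


def find_first_fixed(
    commits: Sequence[str],
    is_fixed: Callable[[str], bool],
) -> Tuple[str, int]:
    """Return (first commit where the bug is fixed, kernels tested).

    Assumes commits[0] is known broken and commits[-1] is known fixed,
    and that the history between them is linear.
    """
    if len(commits) < 2:
        raise ValueError("need at least one broken and one fixed commit")

    lo, hi = 0, len(commits) - 1  # commits[lo] broken, commits[hi] fixed
    tested = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested += 1  # one kernel build plus one customer test round
        if is_fixed(commits[mid]):
            hi = mid  # the fix landed at or before mid
        else:
            lo = mid  # the fix landed after mid
    return commits[hi], tested
```

Over a range of 1000 commits this needs at most 10 test rounds.  With
git itself, the same "look for the fix" direction can be expressed by
starting the bisection with custom terms, for example
"git bisect start --term-old broken --term-new fixed".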

Here's the rub: community distributions (even LTS ones) don't have the
resources even to triage cloud bugs in environments they likely can't
reproduce, so we really need to develop assistive tools for customers
to perform bisections to identify what caused the bug or (in the 95%
case) what fixed it.  Having a bugzilla and using it as the first line
of support implies a service expectation (usually coming from
enterprise distributions) that simply isn't met, so distributions need
to fix this at the point of interaction: bugzilla.

The first suggestion is that kernel builds are pretty much automated
and we try to make every commit buildable, so could we automate the
machinery that allows a customer to do bisection simply by installing a
kernel package?  ("We" here obviously means the distro, but going from
git bisect to kernel package would be the useful link.)
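
A minimal sketch of that link, assuming a Debian-style distro and a
kernel tree that already contains the distro's .config: for each commit
the bisection proposes, generate the commands that turn it into an
installable package.  The function name package_build_commands and the
"-bisect-" version suffix are hypothetical; olddefconfig, bindeb-pkg
(or binrpm-pkg) and LOCALVERSION are existing kbuild features.  The
function only returns the commands and does not run them, and a real
distro pipeline would have its own packaging, signing and publishing
steps.

```python
from typing import List


def package_build_commands(
    commit: str,
    jobs: int = 8,
    target: str = "bindeb-pkg",  # "binrpm-pkg" for RPM-based distros
) -> List[List[str]]:
    """Return the commands to build a kernel package at one bisect step.

    Assumes the current directory is a kernel git tree with the distro
    .config already in place.
    """
    short = commit[:12]  # abbreviated id, embedded in the package version
    return [
        # Check out the commit the bisection wants tested.
        ["git", "checkout", "--detach", commit],
        # Update the existing .config, taking defaults for new options.
        ["make", "olddefconfig"],
        # Build a package whose version identifies the commit, so the
        # customer can report exactly which kernel they installed.
        ["make", f"-j{jobs}", target, f"LOCALVERSION=-bisect-{short}"],
    ]
```

Each command list could be passed to subprocess.run(cmd, check=True) by
whatever build service the distro uses; the customer would then only
install the resulting package and report "broken" or "fixed".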

The second suggestion is that the bugzillas need to say much more strongly
that the reporter really needs to confirm the fix in upstream and do
the bisection themselves (and ideally request the backport to stable
themselves).

James


