[Ksummit-discuss] [Stable kernel] feature backporting collaboration

Bird, Timothy Tim.Bird at am.sony.com
Fri Sep 2 17:06:34 UTC 2016


> -----Original Message-----
>  James Bottomley wrote:
> On Fri, 2016-09-02 at 10:54 +0100, Mark Brown wrote:
> > On Thu, Sep 01, 2016 at 09:25:31PM -0400, Levin, Alexander via
> > Ksummit-discuss wrote:
> > > On Wed, Aug 31, 2016 at 10:01:13PM -0400, Alex Shi wrote:
> >
> > > > I am a Linaro stable kernel maintainer. Our stable kernel is based
> > > > on LTS plus many upstream features backported onto it. Here
> > > > is the detailed
> >
> > > I really disagree with this approach. I think that backporting
> > > board support like what LTSI does might make sense since it's self-
> > > contained, but what LSK does is just crazy.
> >
> > The bulk of these features are exactly that - they're isolated driver-
> > specific code or new subsystems.  There are also some things with
> > wider impact but it's nowhere near all of them.
> 
> It's crazy because it encourages precisely the wrong behaviour: vendors
> target this tree not upstream.
> 
> > > Stable kernels have very strict restrictions that are focused on
> > > not taking commits that have high potential to cause unintended
> > > side effects, incorrect interactions with the rest of the kernel or
> > > just introduce new bugs.
> >
> > > Mixing in new features that interact with multiple subsystems is a
> > > recipe for disaster. We barely pull off backporting what looks like
> > > trivial fixes; trying to do the same for more than that is bound to
> > > be broken.
> >
> > It's what people are doing for products: they want newer features but
> > they also don't want to rebase their product kernel onto mainline as
> > that's an even bigger integration risk.  People aren't using this
> > kernel raw, they're using it as the basis for product kernels.  What
> > this is doing is getting a bunch of people using the same backports
> > which shares effort and hopefully makes it more likely that some of
> > the security relevant features will get deployed in products.
> 
> 
> And history repeats itself: this is almost the precise rationale the
> distros used for all their out of tree patches in their 2.4 enterprise
> kernels.  The disaster that ended up with (patch sets bigger than the
> kernel itself with no way of getting them all upstream) is what led
> directly to their upstream first policy.
> 
> The fact that all the distros track upstream more closely also means it's better
> tested: the farther away from upstream you move, the more problems you'll
> have.
> 
> >  Ideally some of the saved time can be spent on upstreaming things
> > though I fear that's a little optimistic.
> 
> Such as a diff to mainline that grows without bound ...
> 
> > > As an alternative, why not use more recent stable kernels and
> > > customize the config specifically for each user to enable only the
> > > features that that specific user wants to have.
> >
> > That's just shipping a kernel - I don't think anyone is silly enough
> > to ship an allmodconfig or similar in production (though I'm sure
> > someone can come up with an example).
> >
> > > The benefit here is that if used correctly you'll get to use all
> > > the new shiny features you want on a more recent kernel, and none
> > > of the things you don't want. So yes, you're upgrading to a newer
> > > kernel all the time, but if I understand your use-case right it
> > > shouldn't matter too much, more so if you're already taking chances
> > > on backporting major features yourself.
> >
> > Like I say, in this case updating to a newer kernel also means
> > rebasing the out of tree patch stack and taking a bunch of test risk
> > from that
> 
> Risk you wouldn't have if you just followed upstream first.  You can
> add this to the list of problems you created by not upstreaming the
> patches.
> 
> >  - in product development for the sorts of products that end up
> > including the LSK the churn and risk from targeted backports is seen
> > as much safer than updating to an entire new upstream kernel.
> 
> This is the attitude that needs to change.  If enterprises can finally
> realise that tracking upstream more closely is a good strategy: shared
> testing on the trunk, why can't embedded?  What is this huge risk they
> see with the upstream kernel?  Granted, they have this vicious circle
> where they need stuff that's not upstream because they targeted a
> non-upstream kernel, which leads to them not wanting to upport it, but
> surely it's Linaro's job to break this circle?

I'll take a crack at this (the "why can't embedded?" question).
In many cases, when these embedded SoC vendors first started
with Linux, there was a combination of:
1) the kernel not being sufficient for their needs (particularly in the mobile space),
2) inexperience with mainlining and the community, and
3) a rush to market (there's always a rush to market).

The solution that Android came up with, to ship first and worry about
mainlining either later or not at all, worked tremendously well for them.
It caused pain throughout the supply chain (that I've felt personally),
but it got the job done.  My argument is that they could not possibly
have followed any kind of "upstream-first" strategy, even if they had
had the skills or inclination. As one example, it took 3 years before
the kernel community accepted their strategy for power management
in the mobile space.

With some of these SoCs, we are now at millions of lines of
out-of-tree code.  It's being reduced, slowly, but there are still
significant areas where the mainline kernel just doesn't have the
support needed for shipping product. My pet peeve is support for
charging over USB, where a Linaro patch set has been
stalled and/or ignored by the USB maintainer for 2 years!!

This discussion trivializes the difficulty of making progress on mainlining
some of these pieces. At least the enterprise guys were working on
the same chip architecture. The problem for some of these
SoC vendors is that they have completely different approaches
to some issues, already shipping in products, and there's very little by
way of synergy with other vendors' patches.  That's an issue
of fragmentation (in the embedded space) that the enterprise
distros didn't have (well, not from my perspective - likely I'm trivializing
their issues :-).

Anyway, add to that the enormous churn caused by device tree, and it's
a miracle any of them have made significant progress upstreaming stuff
in the last few years.

The stark reality is that in the near term, mainline just won't be feasible to
ship in products on some SoCs.  And no SoC vendor is going to stop and
wait for it to mature before shipping product.

Now - I'm not holding the SoC vendors blameless.  Many of them have
done their upstreaming work in about the worst way possible, when
they've made attempts at all.  I sympathize with the notion that somehow
we've failed them by not convincing them to do things the right way.

Sorry to rant... This discussion just hit an area I feel deeply about, and
have tried to do something about.
 -- Tim
