[Ksummit-discuss] [CORE TOPIC] [nomination] Move Fast and Oops Things

NeilBrown neilb at suse.de
Wed May 21 10:11:08 UTC 2014


On Wed, 21 May 2014 01:36:55 -0700 Dan Williams <dan.j.williams at intel.com>
wrote:

> On Wed, May 21, 2014 at 1:25 AM, NeilBrown <neilb at suse.de> wrote:
> > On Wed, 21 May 2014 00:48:48 -0700 Dan Williams <dan.j.williams at intel.com>
> > wrote:
> >
> >> On Fri, May 16, 2014 at 8:04 AM, Chris Mason <clm at fb.com> wrote:
> >> > -----BEGIN PGP SIGNED MESSAGE-----
> >> > Hash: SHA1
> >> >
> >> > On 05/15/2014 10:56 PM, NeilBrown wrote:
> >> >> On Thu, 15 May 2014 16:13:58 -0700 Dan Williams
> >> >> <dan.j.williams at gmail.com> wrote:
> >> >>
> >> >>> What would it take and would we even consider moving 2x faster
> >> >>> than we are now?
> >> >>
> >> >> Hi Dan, you seem to be suggesting that there is some limit other
> >> >> than "competent engineering time" which is slowing Linux "progress"
> >> >> down.
> >> >>
> >> >> Are you really suggesting that?  What might these other limits be?
> >> >>
> >> >> Certainly there is a limit to the minimum gap between
> >> >> conceptualisation and release (at least one release cycle), but is
> >> >> there really a limit to the parallelism that can be achieved?
> >> >
> >> > I haven't compared the FB commit rates with the kernel, but I'll
> >> > pretend Dan's basic thesis is right and talk about which parts of the
> >> > facebook model may move faster than the kernel.
> >> >
> >> > The facebook model is pretty similar to the way the kernel works.
> >> > The merge window lasts a few days and the major releases are every
> >> > week, but overall the cadence isn't too far from the kernel's.
> >> >
> >> > One big difference is that we have a centralized tool for
> >> > reviewing the patches, and once it has been reviewed by a specific
> >> > number of people, you push it in.
> >> >
> >> > The patch submission tool runs the patch through lint and various
> >> > static analysis to make sure it follows proper coding style and
> >> > doesn't include patterns of known bugs.  This cuts down on the review
> >> > work because the silly coding style mistakes are gone before it gets
> >> > to the tool.
> >> >
> >> > When you put in a patch, you have to put in reviewers, and they get a
> >> > little notification that your patch needs review.  Once the reviewers
> >> > are happy, you push the patch in.
> >> >
> >> > The biggest difference: there are no maintainers.  If I want to go
> >> > change the calendar tool to fix a bug, I patch it, get someone else to
> >> > sign off and push.
> >> >
> >> > All of which is my way of saying the maintainers (me included) are the
> >> > biggest bottleneck.  There are a lot of reasons I think the maintainer
> >> > model fits the kernel better, but at least for btrfs I'm trying to
> >> > speed up the patch review process and use patchwork more effectively.
> >>
> >> To be clear, I'm not arguing for a maintainer-less model.  We don't
> >> have the tooling or operational data to support that.  We need
> >> maintainers to say "no".  But what I think we can do is give
> >> maintainers more varied ways to say it.  The goal: de-escalate the
> >> merge event so that it is no longer a declaration that the code
> >> quality/architecture conversation is over.
> >>
> >> Release early, release often, and with care merge often.
> >
> > I think this falls foul of the "no regressions" rule.
> >
> > The kernel policy is that once the functionality gets to users, it cannot be
> > taken away.  Individual drivers in 'staging' manage to avoid this rule
> > because they are clearly separate things.
> > New system calls, sysfs attributes, etc. seem to be much harder to
> > "partially" release.
> 
> My straw man is something like the following for driver "foo"
> 
> if (gatekeeper_foo_new_awesome_sauce)
>    do_new_thing();
> 
> Where setting gatekeeper_foo_new_awesome_sauce taints the kernel and
> warns that there is no guarantee of this functionality being present
> in the same form or at all going forward.

Interesting idea.
Trying to imagine how this might play out in practice....
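
For concreteness, here is a minimal sketch of how that straw man might look
as a module parameter.  Everything in it is hypothetical: the "foo" driver,
the parameter name, and the reuse of TAINT_CRAP (normally the staging-driver
taint) as a stand-in, since there is no dedicated taint flag for
experimental interfaces:

/* Hypothetical "foo" driver carrying Dan's gatekeeper straw man. */
#include <linux/module.h>
#include <linux/kernel.h>

static bool gatekeeper_foo_new_awesome_sauce;
module_param(gatekeeper_foo_new_awesome_sauce, bool, 0444);
MODULE_PARM_DESC(gatekeeper_foo_new_awesome_sauce,
		 "Enable experimental behaviour; taints the kernel");

static void do_new_thing(void)
{
	pr_info("foo: experimental code path active\n");
}

static int __init foo_init(void)
{
	if (gatekeeper_foo_new_awesome_sauce) {
		/* No dedicated taint flag exists for this, so borrow
		 * TAINT_CRAP; a real patch would add its own flag. */
		add_taint(TAINT_CRAP, LOCKDEP_STILL_OK);
		pr_warn("foo: experimental interface enabled; it may change or disappear in a future release\n");
		do_new_thing();
	}
	return 0;
}

static void __exit foo_exit(void)
{
}

module_init(foo_init);
module_exit(foo_exit);
MODULE_LICENSE("GPL");

Loading with "modprobe foo gatekeeper_foo_new_awesome_sauce=1" would flip
the switch, and the taint would show up in any oops report, so we would at
least know that a bug report came from someone who opted in.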

You talk about "value delivered to users".  But users tend to use
applications, and applications are the users of kernel features.

Will anyone bother writing or adapting an application to use a feature which
is not guaranteed to hang around?
Maybe they will, but will the users of the application know that it might
stop working after a kernel upgrade?  Maybe...
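
(Presumably such an application would have to probe for the feature and fall
back gracefully.  A sketch of what that looks like from userspace, with an
invented sysfs path:)

/* Hypothetical application-side probe for a gatekept attribute. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* The path is invented for illustration. */
	if (access("/sys/class/foo/new_awesome_sauce", R_OK) == 0)
		printf("experimental interface present; using it\n");
	else
		printf("interface missing (kernel upgrade?); falling back\n");
	return 0;
}

That is more defensive coding than most application authors will bother with
just to use an interface that may vanish.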

Maybe it would help if we had some concrete examples of features that could
have been delayed using a gatekeeper.

The one that springs to my mind is cgroups.  Clearly useful, but clearly
controversial.  It appears that the original implementation was seriously
flawed; Tejun is doing a massive amount of work to "fix" it, and this
apparently will lead to API changes.  And this is happening without any
gatekeepers.  Would it have been easier in some way with gatekeepers?
... I don't see how it would be, except that fewer people would have used
cgroups, and then maybe we wouldn't have as much collective experience to
know what the real problems were(?).

I think that is the key.  With a user-facing option, people will try it and
probably cope if it disappears (though they might complain loudly and sign
petitions declaring facebook to be the anti-$DEITY).  However, with
kernel-internal options, applications are unlikely to use them without some
expectation of stability.  So finding the problems would be a lot harder.

Which doesn't mean that it can't work, but it would be nice to create some
real-life examples to see how it plays out in practice.

NeilBrown
