[Ksummit-discuss] [MAINTAINER SUMMIT] Stable trees and release time

Wed Sep 5 16:26:17 UTC 2018

On Wed, Sep 5, 2018 at 6:19 PM, Sasha Levin
<Alexander.Levin at microsoft.com> wrote:
> On Wed, Sep 05, 2018 at 05:54:47PM +0200, Daniel Vetter wrote:
>>On Wed, Sep 5, 2018 at 4:05 PM, Greg KH <gregkh at linuxfoundation.org> wrote:
>>> On Wed, Sep 05, 2018 at 03:27:58PM +0200, Daniel Vetter wrote:
>>>> On Wed, Sep 5, 2018 at 3:03 PM, Takashi Iwai <tiwai at suse.de> wrote:
>>>> > On Wed, 05 Sep 2018 14:24:18 +0200,
>>>> > James Bottomley wrote:
>>>> >>
>>>> >> On September 5, 2018 11:47:00 AM GMT+01:00, Mark Brown <broonie at kernel.org> wrote:
>>>> >> >On Wed, Sep 05, 2018 at 10:58:45AM +0100, James Bottomley wrote:
>>>> >> >
>>>> >> >> This really shouldn't be an issue: stable trees are backported from
>>>> >> >> upstream.  The patch (should) work in upstream, so it should work in
>>>> >> >> stable.  There are only a few real cases you need to worry about:
>>>> >> >
>>>> >> >>    1. Buggy patch in upstream backported to stable. (will be caught
>>>> >> >and
>>>> >> >>       the fix backported soon)
>>>> >> >>    2. Missing precursor causing issues in stable alone.
>>>> >> >>    3. Bug introduced when hand applying.
>>>> >> >
>>>> >> >> The chances of one of these happening is non-zero, but the criteria
>>>> >> >for
>>>> >> >> stable should mean its still better odds than the odds of hitting the
>>>> >> >> bug it was fixing.
>>>> >> >
>>>> >> >Some of those are substantial enough to be worth worrying about,
>>>> >> >especially the missing precursor issues.  It's rarely an issue with the
>>>> >> >human generated backports but the automated ones don't have a sense of
>>>> >> >context in the selection.
>>>> >> >
>>>> >> >There's also a risk/reward tradeoff to consider with more minor issues,
>>>> >> >especially performance related ones.  We want people to be enthusiastic
>>>> >> >about taking stable updates and every time they find a problem with a
>>>> >> >backport that works against them doing that.
>>>> >>
>>>> >> I absolutely agree.  That's why I said our process is expediency
>>>> >> based:  you have to trade off the value of applying the patch vs the
>>>> >> probability of introducing bugs.  However the maintainers are mostly
>>>> >> considering this which is why stable is largely free from trivial
>>>> >> but pointless patches.  The rule should be: if it doesn't fix a user
>>>> >> visible bug, it doesn't go into stable.
>>>> >
>>>> > Right, and here the current AUTOSEL (and some other not-stable-marked)
>>>> > patches coming to a gray zone.  The picked-up patches are often right
>>>> > as "some" fixes, but they are not necessarily qualified as "stable
>>>> > fixes".
>>>> >
>>>> > How about allowing to change the choice of AUTOSEL to be opt-in and
>>>> > opt-out, depending on the tree?  In my case, usually the patches
>>>> > caught by AUTOSEL aren't really the patches with forgotten stable
>>>> > marker, but rather left intentionally by various reasons.  Most of
>>>> > them are fine to apply in anyway, but it was uncertain whether they
>>>> > are really needed / qualifying as stable fixes.  So, I'd be happy to
>>>> > see them as opt-in, i.e. applied only via manual approval.
>>>> >
>>>> > Meanwhile, some trees have no stable-maintenance, and AUTOSEL would
>>>> > help for them.  They can be opt-out, i.e. kept until someone rejects.
>>>>
>>>> +1 on AUTOSEL opt-in. It's annyoing at best, when it backports cleanup
>>>> patches (because somehow those look like stealthy security fixes
>>>> sometimes) and breaks a bunch of people's boxes for no good reason.
>>>>
>>>> In general it'd be really good if -stable had a clearer audit path.
>>>> Every patch have a recorded reason why it's being applied (e.g. Cc:
>>>> stable in upstream, Link to the lkml thread/bug report, AUTOSEL mail,
>>>> whatever), so that after the fact I can figure out why a -stable patch
>>>> happend, that would be really great. Atm -stable occasionally blows
>>>> up, with a patch we didn't mark as cc: stable, and we have no idea
>>>> whyiit showed up in -stable even. That makes it really hard to do
>>>> better next time around.
>>>
>>> I try to keep the audit thread here, as I get asked all the time why
>>> stuff got added.
>>>
>>> Here's what I do, it's not exactly obvious, sorry:
>>>         - if it came from a stable@ tag, just leave it alone and add my
>>>           signed-off-by
>>>         - if it was manually requested by someone, I add a "cc:
>>>           requestor" to the signed-off-by area and add my s-o-b
>>
>>Cc-stable-requested-by: would be more obvious. If you have, lkml
>>archive link with the bug report is even better.
>>
>>An additional quirk in drm is that we have committers, so normal Cc:
>>rules (author + committer + anyone already on Cc:) has a good chance
>>of leaving out maintainers. And generally committers don't care one
>>bit about some multi-year old LTS kernel, not their job ... You'll
>>never get any review from them.
>>
>>>         - if it came from Sasha's tree, Sasha's s-o-b is on it
>>
>>How do things end up in Sasha's tree? Is that just AUTOSEL, or also
>>other patches?
>
> Just autosel. Other patches take the regular way into Stable.
>
>>>         - if it came from David Miller's patchset, his s-o-b is on it.
>>
>>Ok, that's netdev and Dave knows what's he doing :-)
>>
>>> That should cover all types of patches currently going into the trees,
>>> right?
>>>
>>> So always, you can cc: everyone on the s-o-b area and get the people
>>> involved in the patch and someone involved in reviewing it for stable
>>> inclusion.
>>
>>Let's pick a concrete example:
>>
>>commit c81350c31d0d20661a0aa839b79182bcb0e7a45d
>>Author: Satendra Singh Thakur <satendra.t at samsung.com>
>>Date:   Thu May 3 11:19:32 2018 +0530
>>
>>    drm/atomic: Handling the case when setting old crtc for plane
>>
>>    [ Upstream commit fc2a69f3903dfd97cd47f593e642b47918c949df ]
>>
>>    In the func drm_atomic_set_crtc_for_plane, with the current code,
>>    if crtc of the plane_state and crtc passed as argument to the func
>>    are same, entire func will executed in vein.
>>    It will get state of crtc and clear and set the bits in plane_mask.
>>    All these steps are not required for same old crtc.
>>    Ideally, we should do nothing in this case, this patch handles the same,
>>    and causes the program to return without doing anything in such scenario.
>>
>>    Signed-off-by: Satendra Singh Thakur <satendra.t at samsung.com>
>>    Cc: Madhur Verma <madhur.verma at samsung.com>
>>    Cc: Hemanshu Srivastava <hemanshu.s at samsung.com>
>>    Signed-off-by: Daniel Vetter <daniel.vetter at ffwll.ch>
>>    Link: https://patchwork.freedesktop.org/patch/msgid/1525326572-25854-1-git-send-email-satendra.t@samsung.com
>>    Signed-off-by: Sasha Levin <alexander.levin at microsoft.com>
>>    Signed-off-by: Greg Kroah-Hartman <gregkh at linuxfoundation.org>
>>
>>Upstream patch doesn't have a cc: stable. I tried looking for it in my
>>mail archives (and it's a patch committed by myself, so I guess I'll
>>get cc'ed?), didn't find anything.
>
> I'm really not sure why you don't see the mail. Can you maybe see if it
> got filtered as spam?

Nothing in spam either. Maybe gmail cleaned it out already.

>>I have no idea why this got added at all. Looking at the discussion on
>>dri-devel, it's purely a cleanup for consistency with another
>>function. And it blew up :-/
>
> On the flip side, what about:
>
> commit 3fd34ac02ae8cc20d78e3aed2cf6e67f0ae109ea
> Author: Hang Yuan <hang.yuan at linux.intel.com>
> Date:   Mon Jul 23 20:15:46 2018 +0800
>
>     drm/i915/gvt: fix cleanup sequence in intel_gvt_clean_device
>
>     Create one vGPU and then unbind IGD device from i915 driver. The following
>     oops will happen. This patch will free vgpu resource first and then gvt
>     resource to remove these oops.
>
>     BUG: unable to handle kernel NULL pointer dereference at       00000000000000a8
>       PGD 80000003c9d2c067 P4D 80000003c9d2c067 PUD 3c817c067 P      MD 0
>       Oops: 0002 [#1] SMP PTI
>       RIP: 0010:down_write+0x1b/0x40
>     Call Trace:
>       debugfs_remove_recursive+0x46/0x1a0
>       intel_gvt_debugfs_remove_vgpu+0x15/0x30 [i915]
>       intel_gvt_destroy_vgpu+0x2d/0xf0 [i915]
>       intel_vgpu_remove+0x2c/0x30 [kvmgt]
>       mdev_device_remove_ops+0x23/0x50 [mdev]
>       mdev_device_remove+0xdb/0x190 [mdev]
>       mdev_device_remove+0x190/0x190 [mdev]
>       device_for_each_child+0x47/0x90
>       mdev_unregister_device+0xd5/0x120 [mdev]
>       intel_gvt_clean_device+0x91/0x120 [i915]
>       i915_driver_unload+0x9d/0x120 [i915]
>       i915_pci_remove+0x15/0x20 [i915]
>       pci_device_remove+0x3b/0xc0
>       device_release_driver_internal+0x157/0x230
>       unbind_store+0xfc/0x150
>       kernfs_fop_write+0x10f/0x180
>       __vfs_write+0x36/0x180
>       ? common_file_perm+0x41/0x130
>       ? _cond_resched+0x16/0x40
>       vfs_write+0xb3/0x1a0
>       ksys_write+0x52/0xc0
>       do_syscall_64+0x55/0x100
>       entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
>     BUG: unable to handle kernel NULL pointer dereference at 0      000000000000038
>       PGD 8000000405bce067 P4D 8000000405bce067 PUD 405bcd067 PM      D 0
>       Oops: 0000 [#1] SMP PTI
>       RIP: 0010:hrtimer_active+0x5/0x40
>     Call Trace:
>       hrtimer_try_to_cancel+0x25/0x120
>       ? tbs_sched_clean_vgpu+0x1f/0x50 [i915]
>       hrtimer_cancel+0x15/0x20
>       intel_gvt_destroy_vgpu+0x4c/0xf0 [i915]
>       intel_vgpu_remove+0x2c/0x30 [kvmgt]
>       mdev_device_remove_ops+0x23/0x50 [mdev]
>       mdev_device_remove+0xdb/0x190 [mdev]
>       ? mdev_device_remove+0x190/0x190 [mdev]
>       device_for_each_child+0x47/0x90
>       mdev_unregister_device+0xd5/0x120 [mdev]
>       intel_gvt_clean_device+0x89/0x120 [i915]
>       i915_driver_unload+0x9d/0x120 [i915]
>       i915_pci_remove+0x15/0x20 [i915]
>       pci_device_remove+0x3b/0xc0
>       device_release_driver_internal+0x157/0x230
>       unbind_store+0xfc/0x150
>       kernfs_fop_write+0x10f/0x180
>       __vfs_write+0x36/0x180
>       ? common_file_perm+0x41/0x130
>       ? _cond_resched+0x16/0x40
>       vfs_write+0xb3/0x1a0
>       ksys_write+0x52/0xc0
>       do_syscall_64+0x55/0x100
>       entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
>     Fixes: bc7b0be316ae("drm/i915/gvt: Add basic debugfs infrastructure")
>     Fixes: afe04fbe6c52("drm/i915/gvt: create an idle vGPU")
>     Signed-off-by: Hang Yuan <hang.yuan at linux.intel.com>
>     Signed-off-by: Zhenyu Wang <zhenyuw at linux.intel.com>
>
> Which wasn't tagged for (and is not in any) stable trees?

Not stable material, it fixes just a driver unload bug. That's for
developers only. Worst case you break some user's box for this, which
I don't think is cool. Since we're a 100% upstream driver team this
won't harm developers if it's not backported.

Note that because of fbcon and other reasons, an rmmod i915 will fail.
You need to enable a bunch of CONFIG_EXPERT options (with scary texts
and stuff) and have a script from our test suite to be able to even
make this happen.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch