[Ksummit-discuss] [MAINTAINER SUMMIT] Stable trees and release time
Daniel Vetter
daniel.vetter at ffwll.ch
Wed Sep 5 16:26:17 UTC 2018
On Wed, Sep 5, 2018 at 6:19 PM, Sasha Levin
<Alexander.Levin at microsoft.com> wrote:
> On Wed, Sep 05, 2018 at 05:54:47PM +0200, Daniel Vetter wrote:
>>On Wed, Sep 5, 2018 at 4:05 PM, Greg KH <gregkh at linuxfoundation.org> wrote:
>>> On Wed, Sep 05, 2018 at 03:27:58PM +0200, Daniel Vetter wrote:
>>>> On Wed, Sep 5, 2018 at 3:03 PM, Takashi Iwai <tiwai at suse.de> wrote:
>>>> > On Wed, 05 Sep 2018 14:24:18 +0200,
>>>> > James Bottomley wrote:
>>>> >>
>>>> >> On September 5, 2018 11:47:00 AM GMT+01:00, Mark Brown <broonie at kernel.org> wrote:
>>>> >> >On Wed, Sep 05, 2018 at 10:58:45AM +0100, James Bottomley wrote:
>>>> >> >
>>>> >> >> This really shouldn't be an issue: stable trees are backported from
>>>> >> >> upstream. The patch (should) work in upstream, so it should work in
>>>> >> >> stable. There are only a few real cases you need to worry about:
>>>> >> >
>>>> >> >> 1. Buggy patch in upstream backported to stable. (will be caught
>>>> >> >> and the fix backported soon)
>>>> >> >> 2. Missing precursor causing issues in stable alone.
>>>> >> >> 3. Bug introduced when hand applying.
>>>> >> >
>>>> >> >> The chances of one of these happening are non-zero, but the criteria
>>>> >> >> for stable should mean it's still better odds than the odds of hitting
>>>> >> >> the bug it was fixing.
>>>> >> >
>>>> >> >Some of those are substantial enough to be worth worrying about,
>>>> >> >especially the missing precursor issues. It's rarely an issue with the
>>>> >> >human generated backports but the automated ones don't have a sense of
>>>> >> >context in the selection.
>>>> >> >
>>>> >> >There's also a risk/reward tradeoff to consider with more minor issues,
>>>> >> >especially performance related ones. We want people to be enthusiastic
>>>> >> >about taking stable updates and every time they find a problem with a
>>>> >> >backport that works against them doing that.
>>>> >>
>>>> >> I absolutely agree. That's why I said our process is expediency
>>>> >> based: you have to trade off the value of applying the patch vs the
>>>> >> probability of introducing bugs. However the maintainers are mostly
>>>> >> considering this which is why stable is largely free from trivial
>>>> >> but pointless patches. The rule should be: if it doesn't fix a user
>>>> >> visible bug, it doesn't go into stable.
>>>> >
>>>> > Right, and here the current AUTOSEL (and some other not-stable-marked)
>>>> > patches fall into a gray zone. The picked-up patches are often right
>>>> > as "some" fixes, but they are not necessarily qualified as "stable
>>>> > fixes".
>>>> >
>>>> > How about allowing to change the choice of AUTOSEL to be opt-in and
>>>> > opt-out, depending on the tree? In my case, usually the patches
>>>> > caught by AUTOSEL aren't really the patches with forgotten stable
>>>> > marker, but rather left intentionally for various reasons. Most of
>>>> > them are fine to apply anyway, but it was uncertain whether they
>>>> > are really needed / qualifying as stable fixes. So, I'd be happy to
>>>> > see them as opt-in, i.e. applied only via manual approval.
>>>> >
>>>> > Meanwhile, some trees have no stable-maintenance, and AUTOSEL would
>>>> > help for them. They can be opt-out, i.e. kept until someone rejects.
>>>>
>>>> +1 on AUTOSEL opt-in. It's annoying at best, when it backports cleanup
>>>> patches (because somehow those look like stealthy security fixes
>>>> sometimes) and breaks a bunch of people's boxes for no good reason.
>>>>
>>>> In general it'd be really good if -stable had a clearer audit path.
>>>> If every patch had a recorded reason why it's being applied (e.g. Cc:
>>>> stable in upstream, Link to the lkml thread/bug report, AUTOSEL mail,
>>>> whatever), so that after the fact I can figure out why a -stable patch
>>>> happened, that would be really great. Atm -stable occasionally blows
>>>> up, with a patch we didn't mark as cc: stable, and we have no idea
>>>> why it showed up in -stable even. That makes it really hard to do
>>>> better next time around.
>>>
>>> I try to keep the audit thread here, as I get asked all the time why
>>> stuff got added.
>>>
>>> Here's what I do, it's not exactly obvious, sorry:
>>> - if it came from a stable@ tag, just leave it alone and add my
>>> signed-off-by
>>> - if it was manually requested by someone, I add a "cc:
>>> requestor" to the signed-off-by area and add my s-o-b
>>
>>Cc-stable-requested-by: would be more obvious. If you have one, an lkml
>>archive link with the bug report is even better.
>>
>>An additional quirk in drm is that we have committers, so normal Cc:
>>rules (author + committer + anyone already on Cc:) has a good chance
>>of leaving out maintainers. And generally committers don't care one
>>bit about some multi-year old LTS kernel, not their job ... You'll
>>never get any review from them.
>>
>>> - if it came from Sasha's tree, Sasha's s-o-b is on it
>>
>>How do things end up in Sasha's tree? Is that just AUTOSEL, or also
>>other patches?
>
> Just autosel. Other patches take the regular way into Stable.
>
>>> - if it came from David Miller's patchset, his s-o-b is on it.
>>
>>Ok, that's netdev and Dave knows what he's doing :-)
>>
>>> That should cover all types of patches currently going into the trees,
>>> right?
>>>
>>> So always, you can cc: everyone on the s-o-b area and get the people
>>> involved in the patch and someone involved in reviewing it for stable
>>> inclusion.
>>
>>Let's pick a concrete example:
>>
>>commit c81350c31d0d20661a0aa839b79182bcb0e7a45d
>>Author: Satendra Singh Thakur <satendra.t at samsung.com>
>>Date: Thu May 3 11:19:32 2018 +0530
>>
>> drm/atomic: Handling the case when setting old crtc for plane
>>
>> [ Upstream commit fc2a69f3903dfd97cd47f593e642b47918c949df ]
>>
>> In the func drm_atomic_set_crtc_for_plane, with the current code,
>> if crtc of the plane_state and crtc passed as argument to the func
>> are same, entire func will executed in vein.
>> It will get state of crtc and clear and set the bits in plane_mask.
>> All these steps are not required for same old crtc.
>> Ideally, we should do nothing in this case, this patch handles the same,
>> and causes the program to return without doing anything in such scenario.
>>
>> Signed-off-by: Satendra Singh Thakur <satendra.t at samsung.com>
>> Cc: Madhur Verma <madhur.verma at samsung.com>
>> Cc: Hemanshu Srivastava <hemanshu.s at samsung.com>
>> Signed-off-by: Daniel Vetter <daniel.vetter at ffwll.ch>
>> Link: https://patchwork.freedesktop.org/patch/msgid/1525326572-25854-1-git-send-email-satendra.t@samsung.com
>> Signed-off-by: Sasha Levin <alexander.levin at microsoft.com>
>> Signed-off-by: Greg Kroah-Hartman <gregkh at linuxfoundation.org>
>>
>>Upstream patch doesn't have a cc: stable. I tried looking for it in my
>>mail archives (and it's a patch committed by myself, so I guess I'll
>>get cc'ed?), didn't find anything.
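
The provenance rules Greg lists earlier in this thread can be checked
mechanically against a commit message like the one quoted above. What
follows is only a rough sketch under assumptions: the function names and
the returned labels are made up for illustration, sign-offs are matched by
name, and the manually requested case cannot be detected, since a
"cc: requestor" line looks like any other Cc: line.

```python
import re


def signed_off_by(message):
    """Return the names from all Signed-off-by: lines, in order."""
    pattern = r"^[ \t]*Signed-off-by:[ \t]*(.+?)[ \t]*<"
    return re.findall(pattern, message, flags=re.MULTILINE)


def stable_provenance(message):
    """Guess why a commit is in -stable (labels are hypothetical)."""
    names = [name.lower() for name in signed_off_by(message)]
    # Rule 1: an explicit Cc: stable@ tag carried over from upstream.
    if re.search(r"^[ \t]*cc:.*stable@", message, flags=re.I | re.M):
        return "stable-tag"
    # Rule 2: Sasha's sign-off means the patch came through his tree.
    if any("sasha levin" in name for name in names):
        return "autosel"
    # Rule 3: David Miller's sign-off. This is a weak signal, because
    # netdev commits usually carry his sign-off upstream as well.
    if any("david" in name and "miller" in name for name in names):
        return "netdev-patchset"
    # A manual request cannot be told apart from an ordinary Cc: line.
    return "unknown"


example = """\
drm/atomic: Handling the case when setting old crtc for plane

[ Upstream commit fc2a69f3903dfd97cd47f593e642b47918c949df ]

Signed-off-by: Satendra Singh Thakur <satendra.t@samsung.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
"""

print(stable_provenance(example))  # prints "autosel"
```

The fallback label is the interesting part: without a dedicated tag for
manual requests, a script like this has nothing to distinguish them by.
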
>
> I'm really not sure why you don't see the mail. Can you maybe see if it
> got filtered as spam?

Nothing in spam either. Maybe Gmail cleaned it out already.

>>I have no idea why this got added at all. Looking at the discussion on
>>dri-devel, it's purely a cleanup for consistency with another
>>function. And it blew up :-/
>
> On the flip side, what about:
>
> commit 3fd34ac02ae8cc20d78e3aed2cf6e67f0ae109ea
> Author: Hang Yuan <hang.yuan at linux.intel.com>
> Date: Mon Jul 23 20:15:46 2018 +0800
>
> drm/i915/gvt: fix cleanup sequence in intel_gvt_clean_device
>
> Create one vGPU and then unbind IGD device from i915 driver. The following
> oops will happen. This patch will free vgpu resource first and then gvt
> resource to remove these oops.
>
> BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
> PGD 80000003c9d2c067 P4D 80000003c9d2c067 PUD 3c817c067 PMD 0
> Oops: 0002 [#1] SMP PTI
> RIP: 0010:down_write+0x1b/0x40
> Call Trace:
> debugfs_remove_recursive+0x46/0x1a0
> intel_gvt_debugfs_remove_vgpu+0x15/0x30 [i915]
> intel_gvt_destroy_vgpu+0x2d/0xf0 [i915]
> intel_vgpu_remove+0x2c/0x30 [kvmgt]
> mdev_device_remove_ops+0x23/0x50 [mdev]
> mdev_device_remove+0xdb/0x190 [mdev]
> mdev_device_remove+0x190/0x190 [mdev]
> device_for_each_child+0x47/0x90
> mdev_unregister_device+0xd5/0x120 [mdev]
> intel_gvt_clean_device+0x91/0x120 [i915]
> i915_driver_unload+0x9d/0x120 [i915]
> i915_pci_remove+0x15/0x20 [i915]
> pci_device_remove+0x3b/0xc0
> device_release_driver_internal+0x157/0x230
> unbind_store+0xfc/0x150
> kernfs_fop_write+0x10f/0x180
> __vfs_write+0x36/0x180
> ? common_file_perm+0x41/0x130
> ? _cond_resched+0x16/0x40
> vfs_write+0xb3/0x1a0
> ksys_write+0x52/0xc0
> do_syscall_64+0x55/0x100
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
> PGD 8000000405bce067 P4D 8000000405bce067 PUD 405bcd067 PMD 0
> Oops: 0000 [#1] SMP PTI
> RIP: 0010:hrtimer_active+0x5/0x40
> Call Trace:
> hrtimer_try_to_cancel+0x25/0x120
> ? tbs_sched_clean_vgpu+0x1f/0x50 [i915]
> hrtimer_cancel+0x15/0x20
> intel_gvt_destroy_vgpu+0x4c/0xf0 [i915]
> intel_vgpu_remove+0x2c/0x30 [kvmgt]
> mdev_device_remove_ops+0x23/0x50 [mdev]
> mdev_device_remove+0xdb/0x190 [mdev]
> ? mdev_device_remove+0x190/0x190 [mdev]
> device_for_each_child+0x47/0x90
> mdev_unregister_device+0xd5/0x120 [mdev]
> intel_gvt_clean_device+0x89/0x120 [i915]
> i915_driver_unload+0x9d/0x120 [i915]
> i915_pci_remove+0x15/0x20 [i915]
> pci_device_remove+0x3b/0xc0
> device_release_driver_internal+0x157/0x230
> unbind_store+0xfc/0x150
> kernfs_fop_write+0x10f/0x180
> __vfs_write+0x36/0x180
> ? common_file_perm+0x41/0x130
> ? _cond_resched+0x16/0x40
> vfs_write+0xb3/0x1a0
> ksys_write+0x52/0xc0
> do_syscall_64+0x55/0x100
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Fixes: bc7b0be316ae("drm/i915/gvt: Add basic debugfs infrastructure")
> Fixes: afe04fbe6c52("drm/i915/gvt: create an idle vGPU")
> Signed-off-by: Hang Yuan <hang.yuan at linux.intel.com>
> Signed-off-by: Zhenyu Wang <zhenyuw at linux.intel.com>
>
> Which wasn't tagged for (and is not in any) stable trees?

Not stable material, it just fixes a driver unload bug. That's for
developers only. Worst case you break some user's box for this, which
I don't think is cool. Since we're a 100% upstream driver team, it
won't harm developers if this isn't backported.

Note that because of fbcon and other reasons, an rmmod i915 will fail
by default. You need to enable a bunch of CONFIG_EXPERT options (with
scary texts and stuff) and have a script from our test suite to be able
to even make this happen.

-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch