memrlimit controller merge to mainline
hugh at veritas.com
Mon Aug 4 14:52:36 PDT 2008
On Tue, 5 Aug 2008, Balbir Singh wrote:
> Hugh Dickins wrote:
> > BUG: unable to handle kernel paging request at 6b6b6b8b
> > IP: [<7817078f>] memrlimit_cgroup_uncharge_as+0x18/0x29
> > Pid: 22500, comm: swapoff Not tainted (2.6.26-rc8-mm1 #7)
> > [<78161323>] ? exit_mmap+0xaf/0x133
> > [<781226b1>] ? mmput+0x4c/0xba
> > [<78165ce3>] ? try_to_unuse+0x20b/0x3f5
> > [<78371534>] ? _spin_unlock+0x22/0x3c
> > [<7816636a>] ? sys_swapoff+0x17b/0x37c
> > [<78102d95>] ? sysenter_past_esp+0x6a/0xa5
> I am unable to reproduce the problem,
Me neither, I've spent many hours trying 2.6.27-rc1-mm1 and then
back to 2.6.26-rc8-mm1. But I've been SO stupid: saw it originally
on one machine with SLAB_DEBUG=y, have been trying since mostly on
another with SLUB_DEBUG=y, but never thought to boot with
slub_debug=P,task_struct until now.
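
For what it's worth, the faulting address in the oops is consistent with poisoning: 0x6b is the POISON_FREE byte the slab allocators write over freed objects, and 6b6b6b8b is 0x6b6b6b6b + 0x20. A minimal sketch of that arithmetic follows. The helper name and the max_off window are made up for illustration, and reading 0x20 as a field offset behind a pointer fetched from a freed object is my assumption, not something the trace proves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Byte written over freed slab objects when poisoning is enabled. */
#define POISON_FREE 0x6bu

/*
 * Hypothetical helper (not a kernel function): if addr looks like
 * "poisoned 32-bit pointer + small offset", store the offset in *off
 * and return true; otherwise return false and leave *off untouched.
 */
static bool poisoned_deref_offset(uint32_t addr, uint32_t max_off,
                                  uint32_t *off)
{
    /* Replicate the poison byte into all four bytes: 0x6b6b6b6b. */
    const uint32_t base = POISON_FREE * 0x01010101u;

    assert(off != NULL);
    if (addr < base || addr - base > max_off)
        return false;
    *off = addr - base;
    return true;
}
```

Called with 0x6b6b6b8b and a window of, say, one page, it reports an offset of 0x20; an ordinary kernel text address such as the IP in the oops falls outside the window.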
> but I do have an initial hypothesis
> CPU0                                  CPU1
> task 1 starts exiting                 look at mm = task1->mm
> ..                                    increment mm_users
> task 1 exits
> mm->owner needs to be updated, but
> no new owner is found
> (mm_users > 1, but no other task
> has task->mm = task1->mm)
> mm_update_next_owner() leaves
>
> grace period
>                                       user count drops, call mmput(mm)
> task 1 freed
>                                       dereferencing mm->owner fails
Yes, that looks right to me: seems obvious now. I don't think your
careful alternation of CPU0/1 events at the end matters: the swapoff
CPU simply dereferences mm->owner after that task has gone.
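
To make the sequence concrete, here is a single-threaded userspace replay of it, a toy model only. struct toy_task, struct toy_mm and the toy_* functions are invented for illustration and are not kernel code; the "free" just poisons the object in place so that the later read stays well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POISON_FREE 0x6b

/* Toy stand-ins for task_struct and mm_struct (assumed layout). */
struct toy_task { int pid; void *cgroup; };
struct toy_mm   { int mm_users; struct toy_task *owner; };

/* Model a poisoning free: scribble over the object in place. */
static void toy_free_task(struct toy_task *t)
{
    assert(t != NULL);
    memset(t, POISON_FREE, sizeof(*t));
}

/* Return 1 if every byte behind mm->owner carries the poison value. */
static int toy_owner_is_poisoned(const struct toy_mm *mm)
{
    const unsigned char *p = (const unsigned char *)mm->owner;

    for (size_t i = 0; i < sizeof(struct toy_task); i++)
        if (p[i] != POISON_FREE)
            return 0;
    return 1;
}

/* Replay the hypothesised interleaving; return what swapoff would see. */
static int toy_replay(void)
{
    static int cgroup;
    static struct toy_task task1;
    struct toy_mm mm;

    task1 = (struct toy_task){ .pid = 1, .cgroup = &cgroup };
    mm = (struct toy_mm){ .mm_users = 1, .owner = &task1 };

    mm.mm_users++;          /* swapoff: takes a reference on task1's mm */
    mm.mm_users--;          /* task 1 exits: drops its own reference    */
    /* No other task uses this mm, so mm.owner is left pointing at task1. */
    toy_free_task(&task1);  /* task 1 is freed after the grace period   */

    mm.mm_users--;          /* swapoff: mmput(), last reference gone    */
    assert(mm.mm_users == 0);
    return toy_owner_is_poisoned(&mm);  /* teardown looks at mm->owner  */
}
```

In this model toy_replay() returns 1: by the time the last reference is dropped, everything behind mm->owner is poison, whichever CPU the earlier steps ran on.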
(That's a shame, I'd always hoped that mm->owner->comm was going to
be good for use in mm messages, even when tearing down the mm.)
> I do have a potential solution in mind, but I want to make sure my
> hypothesis is correct.
It seems wrong that memrlimit_cgroup_uncharge_as should be called
after mm->owner may have been changed, even if it's to something safe.
But I forget the mm/task exit details, surely they're tricky.
By the way, is the ordering in mm_update_next_owner the best?
Would there be less movement if it searched amongst siblings before
it searched amongst children? Ought it to make a first pass trying
to stay within the same cgroup?
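
A rough userspace sketch of what I mean, to compare the orderings. Everything here (struct toy_task, toy_rel, toy_next_owner and the two flags) is hypothetical and only models the search order; it is not the mm_update_next_owner code, and it assumes the current order is children, then siblings, then everyone else.

```c
#include <assert.h>
#include <stddef.h>

/* Toy task: who its parent is, which mm it uses, which cgroup it is in. */
struct toy_task { int pid; int parent; int mm; int cgroup; };

/* Relationship of t to the exiting owner: 0 child, 1 sibling, 2 other. */
static int toy_rel(const struct toy_task *t, const struct toy_task *old)
{
    if (t->parent == old->pid)
        return 0;
    if (t->parent == old->parent)
        return 1;
    return 2;
}

/*
 * Pick a new owner for the mm of tasks[old]; return its index or -1.
 * siblings_first:    search siblings before children.
 * same_cgroup_first: run a first round restricted to the old owner's cgroup.
 */
static int toy_next_owner(const struct toy_task *tasks, int n, int old,
                          int siblings_first, int same_cgroup_first)
{
    static const int children_first_order[3] = { 0, 1, 2 };
    static const int siblings_first_order[3] = { 1, 0, 2 };
    const int *order = siblings_first ? siblings_first_order
                                      : children_first_order;

    assert(tasks != NULL && old >= 0 && old < n);
    for (int strict = same_cgroup_first ? 1 : 0; strict >= 0; strict--)
        for (int k = 0; k < 3; k++)
            for (int i = 0; i < n; i++) {
                if (i == old || tasks[i].mm != tasks[old].mm)
                    continue;   /* not a user of this mm */
                if (toy_rel(&tasks[i], &tasks[old]) != order[k])
                    continue;   /* wrong relationship for this pass */
                if (strict && tasks[i].cgroup != tasks[old].cgroup)
                    continue;   /* restricted round: same cgroup only */
                return i;
            }
    return -1;
}
```

With a child sharing the mm in another cgroup and a sibling sharing it in the same cgroup, the children-first order picks the child and so moves ownership across cgroups, while either alternative picks the sibling.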