memrlimit controller merge to mainline
balbir at linux.vnet.ibm.com
Mon Aug 4 21:53:04 PDT 2008
Hugh Dickins wrote:
> On Tue, 5 Aug 2008, Balbir Singh wrote:
>> Hugh Dickins wrote:
>>> BUG: unable to handle kernel paging request at 6b6b6b8b
>>> IP: [<7817078f>] memrlimit_cgroup_uncharge_as+0x18/0x29
>>> Pid: 22500, comm: swapoff Not tainted (2.6.26-rc8-mm1 #7)
>>> [<78161323>] ? exit_mmap+0xaf/0x133
>>> [<781226b1>] ? mmput+0x4c/0xba
>>> [<78165ce3>] ? try_to_unuse+0x20b/0x3f5
>>> [<78371534>] ? _spin_unlock+0x22/0x3c
>>> [<7816636a>] ? sys_swapoff+0x17b/0x37c
>>> [<78102d95>] ? sysenter_past_esp+0x6a/0xa5
>> I am unable to reproduce the problem,
> Me neither, I've spent many hours trying 2.6.27-rc1-mm1 and then
> back to 2.6.26-rc8-mm1. But I've been SO stupid: saw it originally
> on one machine with SLAB_DEBUG=y, have been trying since mostly on
> another with SLUB_DEBUG=y, but never thought to boot with
> slub_debug=P,task_struct until now.
Unfortunately, I haven't tried 32-bit, and haven't tried SLAB_DEBUG=y at all. I'll
give the latter a trial run and see what I get.
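The faulting address in the oops is itself a hint. 0x6b is the byte the kernel's slab debugging writes over freed objects (POISON_FREE in include/linux/poison.h). So 6b6b6b8b reads as 0x6b6b6b6b + 0x20: a pointer loaded from a freed, poisoned object and then dereferenced at offset 0x20. That is consistent with mm->owner pointing at a freed task_struct, and it would explain why booting with slub_debug=P,task_struct (poisoning on the task_struct cache) matters. The helper below is a hypothetical decoder for that pattern on a 32-bit address. It is not a kernel tool, and the offset bound is an assumption chosen by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Freed-object poison byte used by kernel slab debugging
 * (POISON_FREE in include/linux/poison.h). */
#define POISON_FREE 0x6b

/* Build a 32-bit word made entirely of POISON_FREE bytes: 0x6b6b6b6b. */
static uint32_t poison_word32(void)
{
    uint32_t w = 0;
    for (int i = 0; i < 4; i++)
        w = (w << 8) | POISON_FREE;
    return w;
}

/* Hypothetical helper: if fault_addr looks like "poison word + offset"
 * with offset below max_off (an assumed bound for a struct member
 * offset), return that offset; otherwise return -1. */
static long poison_offset(uint32_t fault_addr, uint32_t max_off)
{
    assert(max_off > 0);
    uint32_t base = poison_word32();
    if (fault_addr >= base && fault_addr - base < max_off)
        return (long)(fault_addr - base);
    return -1;
}
```

For the oops above, poison_offset(0x6b6b6b8b, 4096) gives 0x20, while the instruction pointer 0x7817078f does not match the pattern.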
>> but I do have an initial hypothesis
>> CPU0                                 CPU1
>> task 1 starts exiting                look at mm = task1->mm
>> ..                                   increment mm_users
>> task 1 exits
>> mm->owner needs to be updated, but
>> no new owner is found
>> (mm_users > 1, but no other task
>> has task->mm = task1->mm)
>> mm_update_next_owner() leaves
>> grace period
>>                                      user count drops, call mmput(mm)
>> task 1 freed
>>                                      dereferencing mm->owner fails
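The hypothesized interleaving can be replayed deterministically in a small user-space model. This is a hypothetical sketch, not kernel code. struct task, struct mm and replay_race() are invented names, the "freed" flag stands in for the task_struct being freed, and the two CPUs are replaced by straight-line code that follows the CPU0/CPU1 order of the hypothesis. It only shows the core problem. When mm_update_next_owner() finds no new owner, mm->owner keeps pointing at the exiting task, so whoever holds the last mm_users reference still reaches it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the kernel's definitions. */
struct task {
    bool freed;            /* models "the task_struct has been freed" */
};

struct mm {
    int mm_users;          /* users of the address space */
    struct task *owner;    /* borrowed pointer: holds no reference on the task */
};

/* Replays the CPU0/CPU1 interleaving in one thread. Returns true if the
 * last mmput() would reach an owner that has already been freed. */
static bool replay_race(void)
{
    struct task task1 = { .freed = false };
    struct mm mm = { .mm_users = 1, .owner = &task1 };

    /* CPU1: look at mm = task1->mm, then increment mm_users. */
    struct mm *held = &mm;
    held->mm_users++;

    /* CPU0: task 1 exits. mm_users is still > 1, so a new owner is
     * needed, but no other task has task->mm == &mm: owner is left as is. */
    assert(mm.mm_users > 1);
    mm.mm_users--;           /* task 1's own mmput() */

    /* CPU0: after the grace period, task 1 is freed. */
    task1.freed = true;

    /* CPU1: user count drops to zero; teardown would now use mm->owner. */
    held->mm_users--;
    return held->mm_users == 0 && held->owner != NULL && held->owner->freed;
}
```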
> Yes, that looks right to me: seems obvious now. I don't think your
> careful alternation of CPU0/1 events at the end matters: the swapoff
> CPU simply dereferences mm->owner after that task has gone.
> (That's a shame, I'd always hoped that mm->owner->comm was going to
> be good for use in mm messages, even when tearing down the mm.)
The problem we have is that tasks and mm_structs are, in some ways, independent of each other:
they are associated much as a database associates two entities through keys.
>> I do have a potential solution in mind, but I want to make sure my
>> hypothesis is correct.
> It seems wrong that memrlimit_cgroup_uncharge_as should be called
> after mm->owner may have been changed, even if it's to something safe.
> But I forget the mm/task exit details, surely they're tricky.
The fix would be to uncharge at the point where a new owner can no longer be found (I have
yet to code and test it, though).
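Here is a minimal sketch of that idea, under the assumption that the whole address-space charge can be dropped while the exiting owner is still alive. The types and functions (cgroup_counter, owner_search_failed(), final_uncharge()) are hypothetical. final_uncharge() merely stands in for the uncharge that the trace appears to reach from exit_mmap. Clearing mm->owner after uncharging is also an assumption of this sketch. It is there so that no later teardown path can dereference the stale owner.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not kernel code. */
struct cgroup_counter {
    long usage;                /* pages charged to this group */
};

struct task {
    struct cgroup_counter *cg; /* the group this task is charged to */
};

struct mm {
    int mm_users;
    long total_vm;             /* pages charged for this address space */
    struct task *owner;        /* borrowed pointer, no reference held */
};

/* Called while the exiting owner is still alive, after the search for a
 * new owner has failed: drop the charge now, then forget the owner. */
static void owner_search_failed(struct mm *mm)
{
    assert(mm->owner != NULL);
    mm->owner->cg->usage -= mm->total_vm;
    mm->owner = NULL;
}

/* Final teardown path: only uncharge if an owner is still recorded. */
static void final_uncharge(struct mm *mm)
{
    if (mm->owner == NULL)
        return;                /* already uncharged when ownership was lost */
    mm->owner->cg->usage -= mm->total_vm;
}
```

In this model the later uncharge becomes a no-op, so the charge is neither dropped twice nor dropped through a freed task.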
> By the way, is the ordering in mm_update_next_owner the best?
> Would there be less movement if it searched amongst siblings before
> it searched amongst children? Ought it to make a first pass trying
> to stay within the same cgroup?
Yes, we need to make a first pass at keeping it in the same cgroup. You might be
right about the sibling optimization.
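The two-pass search discussed here might look roughly like the sketch below. It searches siblings before children, then every task. The first pass only accepts candidates in the exiting owner's cgroup, and the second pass accepts any task that shares the mm. The message implies the current code searches children first. The list representation, the cgroup_id field and pick_next_owner() are hypothetical; the kernel walks its own task lists instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of candidate lists; not kernel data structures. */
struct task {
    const void *mm;   /* address space this task uses */
    int cgroup_id;    /* memrlimit group (assumed >= 0) */
};

struct task_list {
    struct task **items;
    size_t n;
};

/* First task in list, other than old, that uses old's mm and, when
 * want_cgroup >= 0, also belongs to that cgroup. */
static struct task *scan(const struct task_list *list,
                         const struct task *old, int want_cgroup)
{
    for (size_t i = 0; i < list->n; i++) {
        struct task *t = list->items[i];
        if (t == old || t->mm != old->mm)
            continue;
        if (want_cgroup < 0 || t->cgroup_id == want_cgroup)
            return t;
    }
    return NULL;
}

/* Pass 1 keeps ownership in old's cgroup; pass 2 accepts any cgroup.
 * Within each pass: siblings, then children, then all tasks. */
static struct task *pick_next_owner(const struct task *old,
                                    const struct task_list *siblings,
                                    const struct task_list *children,
                                    const struct task_list *all)
{
    assert(old->cgroup_id >= 0);
    const struct task_list *order[] = { siblings, children, all };
    int passes[] = { old->cgroup_id, -1 };   /* -1 means "any cgroup" */

    for (size_t p = 0; p < 2; p++)
        for (size_t k = 0; k < 3; k++) {
            struct task *t = scan(order[k], old, passes[p]);
            if (t != NULL)
                return t;
        }
    return NULL;                              /* no new owner found */
}
```

With this ordering, a child in the same cgroup is preferred over a sibling in a different cgroup. A sibling is only chosen across cgroups when no same-cgroup candidate exists anywhere.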
Linux Technology Center
More information about the Containers