[Bugme-janitors] [Bug 9182] Critical memory leak (dirty pages)

Wed Dec 19 11:21:44 PST 2007

http://bugzilla.kernel.org/show_bug.cgi?id=9182

------- Comment #51 from anonymous at kernel-bugs.osdl.org  2007-12-19 11:21 -------
Reply-To: torvalds at linux-foundation.org

On Wed, 19 Dec 2007, Ingo Molnar wrote:
> 
> ha! It triggered when i gave up after 15 minutes of trying to trigger it 
> via various stress tools and logged out of the testbox, over its 
> console:

Goodie. So this path does indeed seem to be the reason. 

>  WARNING: at mm/filemap.c:132 __remove_from_page_cache()
> Pid: 3238, comm: bash Not tainted 2.6.24-rc5 #111
>  [<c0105c46>] show_trace_log_lvl+0x12/0x25
>  [<c01063ea>] show_trace+0xd/0x10
>  [<c010670a>] dump_stack+0x57/0x5f
>  [<c01615cf>] __remove_from_page_cache+0x78/0xd4
>  [<c016164f>] remove_from_page_cache+0x24/0x2f
>  [<c0167183>] truncate_complete_page+0x2d/0x41
>  [<c0167252>] truncate_inode_pages_range+0xbb/0x29d
>  [<c0167440>] truncate_inode_pages+0xc/0x10
>  [<c016d329>] vmtruncate+0x7d/0x11d
>  [<c018e9e7>] inode_setattr+0x5e/0x139
>  [<c01be487>] ext3_setattr+0x189/0x1e5
>  [<c018ec0f>] notify_change+0x14d/0x2de
>  [<c017c3b7>] do_truncate+0x62/0x7b
>  [<c0184af6>] may_open+0x1a9/0x1f4
>  [<c01869b2>] open_namei+0x254/0x555
>  [<c017bd39>] do_filp_open+0x1f/0x35
>  [<c017bd8f>] do_sys_open+0x40/0xb5
>  [<c017be30>] sys_open+0x16/0x18
>  [<c0104bae>] sysenter_past_esp+0x5f/0xa5
>  =======================
> 
> so it's ext3 inode attributes and vmtruncate ... hmm .... fun :-)

No, it's not inode attributes in the "extended attribute" meaning - the 
"setattr()" thing is just the VFS's internal name for setting various 
perfectly standard state in an inode. 

In this case, it simply seems to be a regular O_TRUNC that causes us to 
truncate the file, which in turn causes a "notify_change()" with the new 
inode size, and that causes the VFS layer to call down and tell the 
low-level filesystem that an "inode attribute" has changed (namely the 
size and the inode modification times). That in turn just causes a 
regular vmtruncate().
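
For reference, the user-space side of that path is nothing more exotic than an open() with O_TRUNC on a file that already has data in it. The following is only a minimal sketch of that trigger, not a reproducer for the bug: the helper name size_after_o_trunc() and the idea of writing a few bytes first are assumptions made for illustration, and whether any given run reaches the warning depends on timing and mount options that this code does not control.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the bug report): put some data into
 * `path`, then reopen it with O_TRUNC, which is the user-visible
 * operation at the bottom of the backtrace (sys_open -> do_truncate).
 * Returns the file size seen after the truncating open, or -1 on error.
 */
static long size_after_o_trunc(const char *path)
{
    static const char payload[] = "dirty data\n";
    const ssize_t len = (ssize_t)(sizeof payload - 1);
    struct stat st;
    int fd;

    /* Create the file and give it some page-cache contents. */
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, payload, (size_t)len) != len) {
        close(fd);
        return -1;
    }
    close(fd);

    /* The truncating open: the kernel sets the inode size to zero. */
    fd = open(path, O_WRONLY | O_TRUNC);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    close(fd);

    return (long)st.st_size;
}
```

Calling size_after_o_trunc() on a scratch file should report a size of zero, which is all the second open() is asking the filesystem to do.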

However, the interesting thing is that "truncate_complete_page()" already 
did a "cancel_dirty_page()" on that page, which should have cleared the 
dirty bit. And we do all of this with the page lock held, and after having 
unmapped the page from any user mappings, so how the *heck* did that page 
get to be dirty again by the time we do the "remove_from_page_cache()" 
right afterwards?

Regardless, very interesting, and this does seem to be the cause. The 
trivial patch for 2.6.24 - and any backports - may well be to just remove 
the warnings (and just keep the "fixup" in remove_from_page_cache()), but 
I'd really like to understand how that page got marked dirty again, and 
why it seems to be related to "data=journal".
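
To make the "warning plus fixup" idea concrete, here is a toy user-space model of the situation. It is a sketch under stated assumptions, not kernel code: every toy_* name is invented for illustration, the single counter merely stands in for whatever dirty-page accounting is leaking, and the "redirty" step is a placeholder for the still-unexplained event that marks the page dirty between the cancel and the removal.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy model only; none of these names exist in the kernel. */
struct toy_page {
    bool dirty;
    bool in_cache;
};

static long toy_nr_dirty;   /* stand-in for a global dirty-page count */
static long toy_warnings;   /* how often the sanity check fired */

static void toy_set_dirty(struct toy_page *p)
{
    if (!p->dirty) {
        p->dirty = true;
        toy_nr_dirty++;
    }
}

static void toy_cancel_dirty(struct toy_page *p)
{
    if (p->dirty) {
        p->dirty = false;
        toy_nr_dirty--;
    }
}

/*
 * Remove a page from the toy cache. A page that is still dirty here
 * should never happen; warn about it, and optionally repair the
 * accounting so the counter does not leak.
 */
static void toy_remove_from_cache(struct toy_page *p, bool fixup)
{
    if (p->dirty) {
        toy_warnings++;
        fprintf(stderr, "toy warning: removing a dirty page\n");
        if (fixup)
            toy_cancel_dirty(p);
    }
    p->in_cache = false;
}

/*
 * Truncate one page: cancel the dirty state, then remove it. When
 * `redirty` is set, something dirties the page again in between.
 * Returns the dirty counter left behind once the page is gone.
 */
static long toy_truncate(bool redirty, bool fixup)
{
    struct toy_page p = { .dirty = false, .in_cache = true };

    toy_nr_dirty = 0;
    toy_set_dirty(&p);
    toy_cancel_dirty(&p);
    if (redirty)
        toy_set_dirty(&p);      /* the step nobody can explain yet */
    toy_remove_from_cache(&p, fixup);

    return toy_nr_dirty;
}
```

In this model, toy_truncate(true, false) leaves the counter at 1 with no page left to account for it, while toy_truncate(true, true) brings it back to 0. Dropping the warning and keeping only the fixup hides the symptom in the same way, which is why the open question about who re-dirties the page still matters.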

(Of course, the "data=journal" may just be a timing/IO-pattern thing, and 
maybe this is a totally generic race that is just hard to hit under normal 
circumstances, but we really shouldn't be marking locked pages dirty!)

Anyway, I'd really love to have a confirmation from Krzysztof that this 
really does fix it for him too (with hopefully the same backtrace).

                        Linus
