[PATCHv4] locks: Filter /proc/locks output on proc pid ns

Thu Aug 4 15:21:31 UTC 2016

Nikolay Borisov <kernel at kyup.com> writes:

> On 08/04/2016 05:09 PM, Eric W. Biederman wrote:
>> Jeff Layton <jlayton at poochiereds.net> writes:
>> 
>>> On Thu, 2016-08-04 at 10:26 +0300, Nikolay Borisov wrote:
>>>> On busy container servers reading /proc/locks shows all the locks
>>>> created by all clients. This can cause large latency spikes. In my
>>>> case I observed lsof taking up to 5-10 seconds while processing around
>>>> 50k locks. Fix this by limiting the locks shown only to those created
>>>> in the same pidns as the one the proc fs was mounted in. When reading
>>>> /proc/locks from the init_pid_ns proc instance then perform no
>>>> filtering
>>>>
>>>>> Signed-off-by: Nikolay Borisov <kernel at kyup.com>
>>>> ---
>>>>  fs/locks.c | 4 ++++
>>>>  1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/fs/locks.c b/fs/locks.c
>>>> index ee1b15f6fc13..df038c27b19f 100644
>>>> --- a/fs/locks.c
>>>> +++ b/fs/locks.c
>>>> @@ -2648,9 +2648,13 @@ static int locks_show(struct seq_file *f, void *v)
>>>>  {
>>>>>  	struct locks_iterator *iter = f->private;
>>>>>  	struct file_lock *fl, *bfl;
>>>>> +	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
>>>>  
>>>>>  	fl = hlist_entry(v, struct file_lock, fl_link);
>>>>  
>>>>> +	if (fl->fl_nspid && !pid_nr_ns(fl->fl_nspid, proc_pidns))
>>>>> +		return 0;
>>>> +
>>>>>  	lock_get_status(f, fl, iter->li_pos, "");
>>>>  
>>>>>  	list_for_each_entry(bfl, &fl->fl_block, fl_block)
>>>
>>> Looks reasonable to me. Eric, any comments? If this looks alright I'll
>>> go ahead and merge into my -next branch for v4.9.
>> 
>> Generally this looks good to me.
>> 
>> Some related nits.
>> - We are not filtering the processes that are blocked waiting on the
>>   lock.
>> 
>> - The same issue shows up in show_fd_locks.
>> 
>> - In lock_get_status the code should say:
>>   if (fl->fl_nspid) {
>>   	/* Don't let fl_pid change depending on who is reading the file */
>>   	fl_pid = pid_nr_ns(fl->fl_nspid, proc_pidns);
>>         /* If there isn't a fl_pid don't display who is waiting on the lock */
>>         if (fl_pid == 0)
>>            return;
>>   } else {
>>   	fl_pid = fl->fl_pid;
>>   }
>> 
>>   All of which implies that lock_get_status needs to take proc_pidns
>>   from it's caller, or derive proc_pidns from the seq_file.
>
> Just had a quick look at the code. If the aforementioned change is
> introduced in lock_get_status and proc_pidns is derived from the
> seq_file, then the issue in show_fd_locks would also be fixed, correct?

Yes I believe so.

> We essentially want to skip showing locks for whose owner we don't have
> a mapping in the current pidns hierarchy?

Yes.  That is the semantic reason why this change is ok.  Don't display
things that are not parts of the current pid namespace.

It probably makes sense to fold all of these fixes together as they are
logically one semantic change.  Only show locks that are valid in procs
pid namespace.

Eric