For review: rewritten pivot_root(2) manual page

Michael Kerrisk (man-pages) mtk.manpages at gmail.com
Wed Oct 9 10:22:49 UTC 2019


Hello Eric,

Thank you. I was hoping you might jump in on this thread.

Please see below.

On 10/9/19 10:46 AM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages at gmail.com> writes:
> 
>> Hello Philipp,
>>
>> My apologies that it has taken a while to reply. (I had been hoping
>> and waiting that a few more people might weigh in on this thread.)
>>
>> On 9/23/19 3:42 PM, Philipp Wendler wrote:
>>> Hello Michael,
>>>
>>> Am 23.09.19 um 14:04 schrieb Michael Kerrisk (man-pages):
>>>
>>>> I'm considering to rewrite these pieces to exactly
>>>> describe what the system call does (which I already
>>>> do in the third paragraph) and remove the "may or may not"
>>>> pieces in the second paragraph. I'd welcome comments
>>>> on making that change.

What did you think about my proposal above? To put it in context,
this was my initial comment in the mail:

[[
One area of the page that I'm still not really happy with
is the "vague" wording in the second paragraph and the note
in the third paragraph about the system call possibly
changing. These pieces survive (in somewhat modified form)
from the original page, which was written before the
system call was released, and it seems there was some
question about whether the system call might still change
its behavior with respect to the root directory and current
working directory of other processes. However, after 19
years, nothing has changed, and surely it will not in the
future, since that would constitute an ABI breakage.
I'm considering to rewrite these pieces to exactly
describe what the system call does (which I already
do in the third paragraph) and remove the "may or may not"
pieces in the second paragraph. I'd welcome comments
on making that change.
]]

And the second and third paragraphs of the manual page currently
read:

[[
       pivot_root()  may  or may not change the current root and the cur‐
       rent working directory of any processes or threads  that  use  the
       old  root  directory  and which are in the same mount namespace as
       the caller of pivot_root().  The  caller  of  pivot_root()  should
       ensure  that  processes  with root or current working directory at
       the old root operate correctly in either case.   An  easy  way  to
       ensure  this is to change their root and current working directory
       to  new_root  before  invoking  pivot_root().   Note   also   that
       pivot_root()  may  or may not affect the calling process's current
       working directory.  It is therefore recommended to call chdir("/")
       immediately after pivot_root().

       The  paragraph  above  is  intentionally vague because at the time
       when pivot_root() was first implemented, it  was  unclear  whether
       its  affect  on  other process's root and current working directo‐
       ries—and the caller's current working  directory—might  change  in
       the  future.   However, the behavior has remained consistent since
       this system call was first implemented: pivot_root()  changes  the
       root  directory  and the current working directory of each process
       or thread in the same mount namespace to new_root if they point to
       the  old  root  directory.   (See also NOTES.)  On the other hand,
       pivot_root() does not change the caller's current  working  direc‐
       tory  (unless it is on the old root directory), and thus it should
       be followed by a chdir("/") call.
]]

>>> I think that it would make the man page significantly easier to
>>> understand if if the vague wording and the meta discussion about it are
>>> removed.
>>
>> It is my inclination to make this change, but I'd love to get more
>> feedback on this point.
>>
>>>> DESCRIPTION
>>> [...]>        pivot_root()  changes  the
>>>>        root  directory  and the current working directory of each process
>>>>        or thread in the same mount namespace to new_root if they point to
>>>>        the  old  root  directory.   (See also NOTES.)  On the other hand,
>>>>        pivot_root() does not change the caller's current  working  direc‐
>>>>        tory  (unless it is on the old root directory), and thus it should
>>>>        be followed by a chdir("/") call.
>>>
>>> There is a contradiction here with the NOTES (cf. below).
>>
>> See below.
>>
>>>>        The following restrictions apply:
>>>>
>>>>        -  new_root and put_old must be directories.
>>>>
>>>>        -  new_root and put_old must not be on the same filesystem as  the
>>>>           current root.  In particular, new_root can't be "/" (but can be
>>>>           a bind mounted directory on the current root filesystem).
>>>
>>> Wouldn't "must not be on the same mountpoint" or something similar be
>>> more clear, at least for new_root? The note in parentheses indicates
>>> that new_root can actually be on the same filesystem as the current
>>> note. However, ...
>>
>> For 'put_old', it really is "filesystem".
> 
> If we are going to be pedantic "filesystem" is really the wrong concept
> here.  The section about bind mount clarifies it, but I wonder if there
> is a better term.

Thanks. My aim was to try to distinguish "mount point" from
"a mount somewhere inside the file system associated with a
certain mount point"--in other words, I wanted to make it clear
that 'put_old' (and 'new_root') could not be subdirectories
under the current root mount point (which is correct, right?).

Using "mount" does seem better. (My only concern is that some
people may take it to mean "the mount point", but perhaps that
just my own confusion.)

> I think I would say: "new_root and put_old must not be on the same mount
> as the current root."

I've made that change.

> I think using "mount" instead of "filesystem" keeps the concepts less
> confusing.
> 
> As I am reading through this email and seeing text that is trying to be
> precise and clear then hitting the term "filesystem" is a bit jarring.
> pivot_root doesn't care a thing for file systems.  pivot_root only cares
> about mounts.
> 
> And by a "mount" I mean the thing that you get when you create a bind
> mount or you call mount normally.

Thanks for the above comments.

Hmm, doI need to make similar changes in the initial paragraph of
the manual page as well? It currently reads:

       pivot_root() changes the root filesystem in the mount namespace of
       the calling process.  More precisely, it moves the root filesystem
       to  the directory put_old and makes new_root the new root filesys‐
       tem.  The calling process must have the  CAP_SYS_ADMIN  capability
       in the user namespace that owns the caller's mount namespace.

Furthermore the one line NAME of the man page reads:

       pivot_root - change the root filesystem

Is a change needed there also?

> Michael do you have man pages for the new mount api yet?

David Howells wrote pages in mid-2018, well before the syscalls got
merged in the kernel (in mid-2019). I did not merge them because
the code was not yet in the kernel, and lacking time, I never chased
David when the syscalls did get merged to see if the pages were still
up to date. I pinged David just now.

>> For 'new_root', see below.
>>
>>>>        -  put_old must be at or underneath new_root; that  is,  adding  a
>>>>           nonnegative  number  of /.. to the string pointed to by put_old
>>>>           must yield the same directory as new_root.
>>>>
>>>>        -  new_root must be a mount point.  (If  it  is  not  otherwise  a
>>>>           mount  point,  it  suffices  to  bind  mount new_root on top of
>>>>           itself.)
>>>
>>> ... this item actually makes the above item almost redundant regarding
>>> new_root (except for the "/") case. So one could replace this item with
>>> something like this:
>>>
>>> - new_root must be a mount point different from "/". (If it is not
>>>   otherwise a mount point, it suffices  to bind mount new_root on top
>>>   of itself.)
>>>
>>> The above item would then only mention put_old (and maybe use clarified
>>> wording on whether actually a different file system is necessary for
>>> put_old or whether a different mount point is enough).
>>
>> Thanks. That's a good suggestion. I simplified the earlier bullet
>> point as you suggested, and changed the text here to say:
>>
>>        -  new_root must be a mount point, but can't be "/".  If it is not
>>           otherwise  a mount point, it suffices to bind mount new_root on
>>           top of itself.  (new_root can be a bind  mounted  directory  on
>>           the current root filesystem.)
> 
> How about:
>           - new_root must be the path to a mount, but can't be "/".  Any

Surely here it must be "mount point" not "mount"? (See my discussion
above.)

>           path that is not already a mount can be converted into one by
>           bind mounting the path onto itself.
>>>> NOTES
>>> [...]
>>>>        pivot_root() allows the caller to switch to a new root  filesystem
>>>>        while  at  the  same time placing the old root mount at a location
>>>>        under new_root from where it can subsequently be unmounted.   (The
>>>>        fact  that  it  moves  all processes that have a root directory or
>>>>        current working directory on the old root filesystem  to  the  new
>>>>        root  filesystem  frees the old root filesystem of users, allowing
>>>>        it to be unmounted more easily.)
>>>
>>> Here is the contradiction:
>>> The DESCRIPTION says that root and current working dir are only changed
>>> "if they point to the old root directory". Here in the NOTES it says
>>> that any root or working directories on the old root file system (i.e.,
>>> even if somewhere below the root) are changed.
>>>
>>> Which is correct?
>>
>> The first text is correct. I must have accidentally inserted
>> "filesystem" into the paragraph just here during a global edit.
>> Thanks for catching that.
>>
>>> If it indeed affects all processes with root and/or current working
>>> directory below the old root, the text here does not clearly state what
>>> the new root/current working directory of theses processes is.
>>> E.g., if a process is at /foo and we pivot to /bar, will the process be
>>> moved to /bar (i.e., at / after pivot_root), or will the kernel attempt
>>> to move it to some location like /bar/foo? Because the latter might not
>>> even exist, I suspect that everything is just moved to new_root, but
>>> this could be stated explicitly by replacing "to the new root
>>> filesystem" in the above paragraph with "to the new root directory"
>>> (after checking whether this is true).
>>
>> The text here now reads:
>>
>>        pivot_root() allows the caller to switch to a new root  filesystem
>>        while  at  the  same time placing the old root mount at a location
>>        under new_root from where it can subsequently be unmounted.   (The
>>        fact  that  it  moves  all processes that have a root directory or
>>        current working directory on the old root  directory  to  the  new
>>        root  frees the old root directory of users, allowing the old root
>>        filesystem to be unmounted more easily.)
> 
> 
> Please "mount" instead of "filesystem".

Changed.


>>>> EXAMPLE>        The program below demonstrates the use of  pivot_root()  inside  a
>>>>        mount namespace that is created using clone(2).  After pivoting to
>>>>        the root directory named in the program's first command-line argu‐
>>>>        ment,  the  child  created  by  clone(2) then executes the program
>>>>        named in the remaining command-line arguments.
>>>
>>> Why not use the pivot_root(".", ".") in the example program?
>>> It would make the example shorter, and also works if the process cannot
>>> write to new_root (e..g., in a user namespace).
>>
>> I'm not sure. Some people have a bit of trouble to wrap their head
>> around the pivot_root(".", ".") idea. (I possibly am one of them.)
>> I'd be quite keen to hear other opinions on this. Unfortunately,
>> few people have commented on this manual page rewrite.
> 
> I am happy as long as it is pivot_root(".", ".") is documented
> somewhere.  There is real code that uses it so it is not going away.
> Plus pivot_root(".", ".") is really what is desired in a lot of
> situations where the caller of pivot_root is an intermediary and
> does not control the new root filesystem.  At which point the only
> path you can be guaranteed to exit on the new root filesystem is "/".

Good. There is documentation of pivot_root(".", ".") i the page!

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


More information about the Containers mailing list