[GIT PULL] userns fixes for 4.17-rc2

Eric W. Biederman ebiederm at xmission.com
Tue Jun 19 11:23:47 UTC 2018


Linus,

Please pull the userns-linus branch from the git tree:

   git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git userns-linus

   HEAD: 04035aa33a1258ca3c30f58138897ca3e97485f1 proc: Don't change mount options on remount failure.

Mount options for proc have been something of a mess since they were
added in the beginning of 2012.  I compounded that in 2016 by merging
a change that in practice ignored the proc mount options except on remount.

Ordinarily, on noticing in 2018 that something had been broken for 2
years without complaint, I would think "Hmm, can we just get rid of these
things?"  Unfortunately it was someone who uses the proc hidepid option
who noticed this problem.  So fixed it must be.

I stared at this code for quite a while and I finally concluded that the
best course forward is to simplify things and remove the internal kernel
mount of proc.  The internal mount of proc is directly responsible for
this regression and it has been the source of pain over the years.

The cost of this simplification is that proc_flush_task gains two more
atomic operations.

The upside is that proc is no longer special.  So following the same
idioms as other filesystems will no longer be a problem.


While I was looking at the mount options of proc I found two more issues
that date back to the original change that added them.  A remount of
proc that fails did not return the proper error code.  A remount of proc
that fails could wind up changing the proc mount options.
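
The two remount issues above can be illustrated with a small user-space
sketch.  This is not the kernel code from the branch; struct proc_opts,
parse_num and parse_options are hypothetical names, and the accepted
options (hidepid=0..2 and gid=N) are an assumption made for the example.
The idea shown is to parse into a temporary copy, return a negative
errno on any bad option, and only commit to the live options once the
whole string has parsed.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-mount proc options (not the kernel's). */
struct proc_opts {
	int  hide_pid;	/* value of hidepid=N */
	long pid_gid;	/* value of gid=N */
};

/* Parse a decimal number in [0, max]; 0 on success, -EINVAL otherwise. */
static int parse_num(const char *s, long max, long *out)
{
	char *end;
	long v;

	if (*s == '\0')
		return -EINVAL;
	errno = 0;
	v = strtol(s, &end, 10);
	if (errno || *end != '\0' || v < 0 || v > max)
		return -EINVAL;
	*out = v;
	return 0;
}

/*
 * Parse a comma separated option string.  All parsing is done on a
 * temporary copy, so a failure leaves *opts exactly as it was, and the
 * caller gets a negative errno instead of a bare success/failure flag.
 */
int parse_options(const char *data, struct proc_opts *opts)
{
	struct proc_opts tmp = *opts;
	const char *p = data;
	char tok[64];
	long v;
	int err;

	while (p && *p) {
		size_t len = strcspn(p, ",");

		if (len >= sizeof(tok))
			return -EINVAL;
		memcpy(tok, p, len);
		tok[len] = '\0';
		p += len;
		if (*p == ',')
			p++;
		if (len == 0)
			continue;	/* skip empty options such as ",," */

		if (strncmp(tok, "hidepid=", 8) == 0) {
			err = parse_num(tok + 8, 2, &v);
			if (err)
				return err;
			tmp.hide_pid = (int)v;
		} else if (strncmp(tok, "gid=", 4) == 0) {
			err = parse_num(tok + 4, INT_MAX, &v);
			if (err)
				return err;
			tmp.pid_gid = v;
		} else {
			return -EINVAL;	/* unknown option */
		}
	}

	*opts = tmp;	/* commit only after everything parsed cleanly */
	return 0;
}
```

With this shape, a call such as parse_options("hidepid=1,gid=bogus", &o)
returns -EINVAL and leaves o untouched, even though hidepid=1 was valid
and came first in the string.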

I have personally tested all of these changes and verified everything
works correctly.  Alistair Strachan has tested and verified that
Android's use of proc's hidepid option works with this change.

I had hoped to let this sit a little longer in linux-next just in case
some of the build bots might turn up something I had missed.  But with
the parallel fscontext changes to proc that testing won't happen.

With linux-next useless, I figure the better part of valor is a pull
request that explains the reasons for this change and highlights the
subtle issues with mount option handling.  Hopefully those issues are
something we can solve with the new fscontext userspace API before it
gets merged.


This also looks like a good time to revisit killing off sysctl syscall support.
I don't believe anyone compiles it into their kernel any more and
keeping the support made this change more difficult than necessary.

Eric W. Biederman (3):
      proc: Simplify and fix proc by removing the kernel mount
      proc: Change proc_parse_options to return an errno value
      proc: Don't change mount options on remount failure.

 arch/um/drivers/mconsole_kern.c |  4 +--
 fs/proc/base.c                  | 36 ++++++++++++++++++++-----
 fs/proc/inode.c                 | 17 +++++++++---
 fs/proc/internal.h              |  7 ++++-
 fs/proc/root.c                  | 58 ++++++++++++++++++++++-------------------
 include/linux/pid_namespace.h   |  3 +--
 include/linux/proc_ns.h         |  7 ++---
 kernel/pid.c                    |  8 ------
 kernel/pid_namespace.c          |  7 -----
 kernel/sysctl_binary.c          |  5 ++--
 10 files changed, 87 insertions(+), 65 deletions(-)

Eric

