[PATCH 0/2] namespaces: log namespaces per task

Serge Hallyn serge.hallyn at ubuntu.com
Fri May 2 21:00:44 UTC 2014


Quoting Richard Guy Briggs (rgb at redhat.com):
> On 14/05/02, Serge E. Hallyn wrote:
> > Quoting Richard Guy Briggs (rgb at redhat.com):
> > > I saw no replies to my questions when I replied a year after Aris' posting, so
> > > I don't know if it was ignored or got lost in stale threads:
> > >         https://www.redhat.com/archives/linux-audit/2013-March/msg00020.html
> > >         https://www.redhat.com/archives/linux-audit/2013-March/msg00033.html
> > > 	(https://lists.linux-foundation.org/pipermail/containers/2013-March/032063.html)
> > >         https://www.redhat.com/archives/linux-audit/2014-January/msg00180.html
> > > 
> > > I've tried to answer a number of questions that were raised in that thread.
> > > 
> > > The goal is not quite identical to Aris' patchset.
> > > 
> > > The purpose is to track namespaces in use by logged processes from the
> > > perspective of init_*_ns.  The first patch defines a function to list them.
> > > The second patch provides an example of usage for audit_log_task_info() which
> > > is used by syscall audits, among others.  audit_log_task() and
> > > audit_common_recv_message() would be other potential use cases.
> > > 
> > > Use a serial number per namespace (unique across one boot of one kernel)
> > > instead of the inode number (for which the right to change has reportedly been
> > > reserved, and which is not necessarily unique if there is more than one proc
> > > fs).  It could be argued that the inode numbers have now become a de facto
> > > interface and can't change now, but I'm proposing this approach to see if it
> > > helps address some of the objections to the earlier patchset.
> > > 
> > > Messages could also be added to track the creation and the destruction
> > > of namespaces, listing the parent for hierarchical namespaces such as pidns,
> > > userns, and listing other ids for non-hierarchical namespaces, as well as other
> > > information to help identify a namespace.
> > > 
> > > There has been some progress made for audit in net namespaces and pid
> > > namespaces since this previous thread.  net namespaces are now served as peers
> > > by one auditd in the init_net namespace with processes in a non-init_net
> > > namespace being able to write records if they are in the init_user_ns and have
> > > CAP_AUDIT_WRITE.  Processes in a non-init_pid_ns can now similarly write
> > > records.  As for CAP_AUDIT_READ, I just posted a patchset to check capabilities
> > > of userspace processes that try to join netlink broadcast groups.
> > > 
> > > 
> > > Questions:
> > > Is there a way to link serial numbers of namespaces involved in migration of a
> > > container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
> > > identifier for each running instance of a kernel?  Or at least some identifier
> > > within the container migration realm?
> > 
> > Eric Biederman has always been adamantly opposed to adding new namespaces
> > of namespaces, so the fact that you're asking this question concerns me.
> 
> I have seen that position and I don't fully understand the justification
> for it other than added complexity.
> 
> One way that occurred to me to be able to identify a kernel instance was
> to look at CPU serial numbers or other CPU entity intended to be
> globally unique, but that isn't universally available.

That's one issue, which is uniqueness of namespaces across machines.

But it gets worse if we consider that after allowing in-container audit,
we'll have a nested container running, then have the parent container
migrated to another host (or just checkpointed and restarted).  Now the
nested container's indexes will all be changed.  Is there any way audit
can track who's who after the migration?

That's not an indictment of the serial # approach, since (a) we don't
have in-container audit yet and (b) we don't have c/r/migration of nested
containers.  But it's worth considering whether we can solve the issue
with serial #s, and, if not, whether we can solve it with any other
approach.

I guess one approach to solve it would be to allow userspace to request
a next serial #.  Which will immediately lead us to a namespace of serial
#s (since the requested # might be lower than the last used one on the
new host).
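
A rough userspace sketch of why that happens, assuming a plain per-boot
counter.  The struct and function names are made up for illustration and
are not from the patchset:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-kernel allocator state; not taken from the patchset. */
struct ns_serial_alloc {
	uint64_t last;	/* highest serial handed out so far on this host */
};

/* Normal path: hand out the next serial, unique for this boot. */
static uint64_t ns_serial_next(struct ns_serial_alloc *a)
{
	assert(a->last != UINT64_MAX);	/* wrap-around would break uniqueness */
	return ++a->last;
}

/*
 * Restore path (assumed interface): userspace asks for a specific serial
 * after a migration.  A plain counter can only honour requests above the
 * last serial it has used; anything lower may already be taken here.
 */
static bool ns_serial_request(struct ns_serial_alloc *a, uint64_t want)
{
	if (want <= a->last)
		return false;	/* possibly in use on the new host */
	a->last = want;
	return true;
}
```

With a single flat counter the only way to make the refused case succeed
is to scope the numbers per container, which is the namespace of serial
#s again.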

As you've said inode #s for /proc/self/ns/* probably aren't sufficiently
unique, though perhaps we could attach a generation # for the sake of
audit.  Then after a c/r/migration the generation # may be different,
but we may have a better shot at at least using the same ino#.
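
Something like the following is what I have in mind, purely as a sketch.
The struct, its field names and the helpers are hypothetical; nothing
like this exists in the posted patches:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical audit-side identifier; field names are illustrative. */
struct ns_audit_id {
	uint64_t ino;	/* inode # as seen under /proc/self/ns/ */
	uint32_t gen;	/* generation #, may differ after c/r/migration */
};

/* Exact match: the same namespace instance on the same kernel. */
static bool ns_audit_id_equal(struct ns_audit_id a, struct ns_audit_id b)
{
	return a.ino == b.ino && a.gen == b.gen;
}

/* Looser match: candidate for "same namespace, restored elsewhere". */
static bool ns_audit_id_same_ino(struct ns_audit_id a, struct ns_audit_id b)
{
	return a.ino == b.ino;
}
```

The looser match only helps if restore manages to reuse the same ino#,
which is the open question.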

> Another possibility was RTC reading at time of boot, but that isn't good
> enough either.
> 
> Both are dubious in VMs anyways.
> 
> > The way things are right now, since audit belongs to the init userns,
> > we can get away with saying if a container 'migrates', the new kernel
> > will see a different set of serials, and no one should care.  However,
> > if we're going to be allowing containers to have their own audit
> > namespace/layer/whatever, then this becomes more of a concern.
> 
> Having a container have its own audit daemon (partitioned appropriately
> in the kernel) would be a long-term goal.

Agreed, fwiw.

> > That said, I'll now look at the patches while pretending that problem
> > does not exist :)  If I ack, it'll be on correctness of the code, but
> > we'll still have to deal with this issue.
> 
> Getting some discussion about this migration challenge was a significant
> motivation for posting this patch, so I'm hoping others will weigh in.
> 
> Thanks for your review, Serge.
> 
> > > What additional events should list this information?
> > > 
> > > Does this present any kind of information leak?  Only CAP_AUDIT_CONTROL (and
> > > proposed CAP_AUDIT_READ) in init_user_ns can get to this information in the
> > > init namespace at the moment.
> > > 
> > > 
> > > Proposed output format:
> > > This differs slightly from Aristeu's patch because of the label conflict with
> > > "pid=" due to including it in existing records rather than it being a separate
> > > record:
> > >         type=SYSCALL msg=audit(1398112249.996:65): arch=c000003e syscall=272 success=yes exit=0 a0=40000000 a1=ffffffffffffffff a2=0 a3=22 items=0 ppid=1 pid=566 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="(t-daemon)" exe="/usr/lib/systemd/systemd" mntns=5 netns=97 utsns=2 ipcns=1 pidns=4 userns=3 subj=system_u:system_r:init_t:s0 key=(null)
> > > 
> > > 
> > > Note: This set does not try to solve the non-init namespace audit messages and
> > > auditd problem yet.  That will come later, likely with additional auditd
> > > instances running in another namespace with a limited ability to influence the
> > > master auditd.  I echo Eric B's idea that messages destined for different
> > > namespaces would have to be tailored for that namespace with references that
> > > make sense (such as the right pid number reported to that pid namespace, and
> > > not leaking info about parents or peers).
> > > 
> > > 
> > > Richard Guy Briggs (2):
> > >   namespaces: give each namespace a serial number
> > >   audit: log namespace serial numbers
> > > 
> > >  fs/mount.h                     |    1 +
> > >  fs/namespace.c                 |    1 +
> > >  include/linux/audit.h          |    7 +++++++
> > >  include/linux/ipc_namespace.h  |    1 +
> > >  include/linux/nsproxy.h        |    8 ++++++++
> > >  include/linux/pid_namespace.h  |    1 +
> > >  include/linux/user_namespace.h |    1 +
> > >  include/linux/utsname.h        |    1 +
> > >  include/net/net_namespace.h    |    1 +
> > >  init/version.c                 |    1 +
> > >  ipc/msgutil.c                  |    1 +
> > >  ipc/namespace.c                |    2 ++
> > >  kernel/audit.c                 |   38 ++++++++++++++++++++++++++++++++++++++
> > >  kernel/nsproxy.c               |   24 ++++++++++++++++++++++++
> > >  kernel/pid.c                   |    1 +
> > >  kernel/pid_namespace.c         |    2 ++
> > >  kernel/user.c                  |    1 +
> > >  kernel/user_namespace.c        |    2 ++
> > >  kernel/utsname.c               |    2 ++
> > >  net/core/net_namespace.c       |    4 +++-
> > >  20 files changed, 99 insertions(+), 1 deletions(-)
> > > 
> > > _______________________________________________
> > > Containers mailing list
> > > Containers at lists.linux-foundation.org
> > > https://lists.linuxfoundation.org/mailman/listinfo/containers
> 
> - RGB
> 
> --
> Richard Guy Briggs <rbriggs at redhat.com>
> Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
> Remote, Ottawa, Canada
> Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

