[PATCH V4 1/8] namespaces: assign each namespace instance a serial number

Tue Sep 2 21:40:35 UTC 2014

On 14/08/28, Eric W. Biederman wrote:
> Richard Guy Briggs <rgb at redhat.com> writes:
> > On 14/08/23, Eric W. Biederman wrote:
> >> Richard Guy Briggs <rgb at redhat.com> writes:
> >> 
> >> > Generate and assign a serial number per namespace instance since boot.
> >> >
> >> > Use a serial number per namespace (unique across one boot of one kernel)
> >> > instead of the inode number (for which the right to change has reportedly
> >> > been reserved, and which is not necessarily unique if there is more than
> >> > one proc fs) to uniquely identify it per kernel boot.
> >> 
> >> This approach is just broken.
> >> 
> >> For this to work with migration (aka criu) you need to implement a
> >> namespace of namespaces.  You haven't done this, and therefore
> >> such an interface will break existing userspace.
> >> 
> >> Inside of audit I can understand not caring about these issues,
> >> but you go forward and expose these serial numbers in proc,
> >> and generally make this infrastructure available to others.
> >> 
> >> The deep issue with migration is that we move tasks from one machine
> >> to another and on the destination machine we need to have all of the
> >> same global identifiers for software to function properly.
> >> 
> >> My weasel words around the proc inode numbers are there to preserve
> >> room for us to restore those ids if it ever becomes relevant for
> >> migration.
> >
> > What do you do if the inode number is already in use on the target
> > host?
> 
> Since the inode numbers are relative to a superblock or a pid namespace
> the numbers that are in use can be restored on the target system
> by creating them in the appropriate namespace.

So you seem to be advocating for a namespace of namespaces, since
neither host can create a new namespace without consulting the others in
its pool for a new free number.
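
To make the clash concrete, here is a toy model in Python. It is not kernel
code and not part of this patch set. The IdContext class and its allocate()
and restore() methods are hypothetical names invented for illustration, and
the only assumption is that an ID has to be unique within one context (for
example one proc superblock or one pid namespace), not across contexts.

```python
class IdContext:
    """Toy model of one ID context (hypothetical; e.g. one proc superblock)."""

    def __init__(self, first_id=1):
        self._in_use = set()
        self._next = first_id

    def allocate(self):
        """Hand out the lowest ID at or above the cursor that is not in use."""
        while self._next in self._in_use:
            self._next += 1
        new_id = self._next
        self._in_use.add(new_id)
        self._next += 1
        return new_id

    def restore(self, wanted_id):
        """Re-create a specific ID, as a migration tool would need to."""
        if wanted_id in self._in_use:
            raise ValueError(f"id {wanted_id} already in use in this context")
        self._in_use.add(wanted_id)
        return wanted_id


def demo():
    """Return (clash_in_shared_context, id_restored_in_fresh_context)."""
    target_host = IdContext()
    target_host.allocate()          # the target host already uses id 1
    target_host.allocate()          # ... and id 2
    migrated_id = 2                 # id the migrated namespace had at the source

    try:
        target_host.restore(migrated_id)
        clash = False
    except ValueError:
        clash = True                # one flat number space: the id is taken

    fresh = IdContext()             # a separate context created for the migration
    restored = fresh.restore(migrated_id)
    return clash, restored


if __name__ == "__main__":
    print(demo())
```

In this model a single flat number space cannot take the migrated ID, while
a context created for the migrated container can.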

> The support does not exist in the kernel today for doing that because no
> one has cared but as architected the support can be added if needed to
> support migration.
> 
> >> That is, the proc inode numbers (technically) live in a pid namespace,
> >> (aka a mount of proc).  So depending on the pid namespace you are in
> >> or the mount of proc you look in the numbers could change.
> >> 
> >> Qualifications like that must exist to have a prayer of ever supporting
> >> process migration in the crazy corner cases where people start caring
> >> about inode numbers.
> >> 
> >> We currently don't and inode numbers for a namespace will never change
> >> after a namespace is created.  So I think you really are ok using the
> >> proc inode numbers.  I am happy declaring by fiat that the inode numbers
> >> that audit uses are the numbers connected to the initial pid namespace.
> >
> > But once a namespace/container is migrated, it is a different audit that
> > is looking at it (unless we create an audit manager or entity that
> > functions at the level of a container manager), so audit should not care.
> 
> These numbers were exported to everyone as a general purpose facility in
> proc.  If audit is global and audit doesn't migrate you are right it
> doesn't matter.  However if these numbers are used by anyone else for
> anything else it causes a problem.

So let us restrict their use to audit, by removing them from
/proc/<pid>/ns/ and only exposing them via netlink calls to audit gated
by CAP_AUDIT_WRITE or CAP_AUDIT_CONTROL.

> Further given that people run entire distributions in containers we may
> reach the point where we wish to run auditd in a container in the
> future.  I would hate to paint ourselves into a corner with a design
> that could never allow audit to migrate.  Supporting that case someday
> seems a valid naive desire.

Agreed.  That is an option we do not want to rule out at this point.
I'll need to think about this one more.

> >> At a fairly basic level anything that is used to identify namespaces for
> >> any general purpose use needs to have most if not all of the same
> >> properties of the proc inode numbers.  The most important of which is
> >> being tied to some context/namespace so there is an ability if we ever
> >> need it to migrate those numbers from one machine to another.
> >
> > Sooo...  does it make any sense to have those inode or serial numbers be
> > blank inside the namespace/container itself, but only visible to its
> > manager outside the container (unless it is the initial namespace)?
> 
> Mostly I think it makes sense to use the inode numbers from the initial
> pid namespace.  They already exist.  They already are unique.  (Which
> means I don't need to maintain more code and more special cases).  And
> they do what you need now.

Will inode numbers never be re-used once they are freed?  Guaranteed?
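
For reference, the inode numbers under discussion are the ones userspace can
already read from the /proc/<pid>/ns/ symlinks, whose targets look like
"pid:[4026531836]". A minimal Python sketch of reading one follows. The
helper names parse_ns_link() and ns_inode() are made up for this example, it
assumes a Linux system with proc mounted at /proc, and it says nothing about
whether a number can be re-used after its namespace is freed.

```python
import os
import re

# Symlink targets under /proc/<pid>/ns/ have the form "type:[inode]".
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a target such as 'pid:[4026531836]' into ('pid', 4026531836)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link target: {target!r}")
    return match.group(1), int(match.group(2))


def ns_inode(pid="self", ns_type="pid"):
    """Return the inode number behind /proc/<pid>/ns/<ns_type> (Linux only)."""
    target = os.readlink(f"/proc/{pid}/ns/{ns_type}")
    return parse_ns_link(target)[1]


if __name__ == "__main__":
    # Prints the pid namespace inode number as seen through this mount of proc.
    print(ns_inode())
```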

> I probably haven't followed closely enough but I don't see what makes
> inode numbers undesirable.

This posting:
	https://www.redhat.com/archives/linux-audit/2013-March/msg00032.html

> Eric

- RGB

--
Richard Guy Briggs <rbriggs at redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545