Virtualizing /proc/sys/kernel/random/boot_id per container ?

Daniel P. Berrange berrange at redhat.com
Thu Aug 30 22:50:02 UTC 2012


On Thu, Aug 30, 2012 at 03:15:17PM -0700, Eric W. Biederman wrote:
> "Daniel P. Berrange" <berrange at redhat.com> writes:
> 
> > One of the features that SystemD folks have asked us to fix in LXC, is
> > to make sure that /proc/sys/kernel/random/boot_id changes each time a
> > container is started.
> 
> There may be a good reason for this.  Most of the time what I have seen
> of kernel requests from the direction of SystemD is that while there may
> be a real problem, their imagined solution is usually not a
> particularly good solution.  So a description of the problem is needed.
> 
> Justifying something with just "SystemD wants this" is a good way to get
> a nack.

SystemD records log messages for all system services in their journal.
They can show you all log messages for the current service execution,
all log messages for a service since system boot, or all log messages
ever. The boot_id value is used as a unique tag to allow grouping of
the log messages per system boot. When we run systemd inside a container
we want to get that grouping of log messages generated by services inside
the container, to take account of the container boot, not the host boot.
Hence the desire to have the boot_id value reflect when a container is
booted.
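
To make the grouping concrete, here is a minimal sketch of tagging log
records with boot_id and grouping on it. This is only an illustration,
not how the journal is actually implemented; read_boot_id() and
group_by_boot() are hypothetical names made up for this example.

```python
from collections import defaultdict
from pathlib import Path

BOOT_ID_PATH = "/proc/sys/kernel/random/boot_id"


def read_boot_id(path=BOOT_ID_PATH):
    """Return the boot_id string visible to the calling process."""
    return Path(path).read_text().strip()


def group_by_boot(records):
    """Group (boot_id, message) pairs by boot_id, keeping message order."""
    groups = defaultdict(list)
    for boot_id, message in records:
        groups[boot_id].append(message)
    return dict(groups)


if __name__ == "__main__":
    # Inside a container that sees the host's boot_id, every container
    # start carries the same tag, so separate container boots end up
    # in one group.
    tag = read_boot_id()
    records = [(tag, "first container start"), (tag, "second container start")]
    print(group_by_boot(records))
```

With the current semantics both messages land in the same group, which
is exactly the behaviour we want to change.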

> > The current semantics are that this file produces a new random UUID each
> > time the host OS is booted. Obviously each time we start a container now,
> > they just see the host's random boot_id, so from a container's POV this
> > does not change each time it starts.
> 
> That is correct.  As I recall the contract with boot_id is to provide
> a unique per boot value to assist in dealing with boots etc.  I seem
> to recall emacs uses the combination of hostname+boot_id to help
> generate unique lock files names.
> 
> I would definitely need a refresher on how boot_id is used in practice
> by applications other than SystemD before I could suggest a good design.
> 
> There is also a question of uptime.

Agreed. As you say, this is one of many /proc values that need
virtualizing for containers.

> > There seems to be general agreement that, aside from the PID directories,
> > changes to data in /proc should be done by a FUSE filesystem overlay of
> > some kind.
> 
> No.  I have yet to see a justification for using FUSE in containers on
> top of proc files.
> 
> I have seen a lot of bad ideas suggested like hacking /proc/cpuinfo
> instead of providing a proper mechanism to tell applications how
> parallel they can/should be.
> 
> For hacks and controversial ideas FUSE is good because it makes it
> someone else's problem and it means it isn't something we have to
> support in the kernel for the indefinite future.  At the same time in
> general a FUSE solution does not really solve anything it just sort of
> papers over a problem.
> 
> For some problems papering over them is good enough, for other problems
> they really should be solved properly.

Ok, well I guess things aren't as clear cut as I understood then. I've
been told that FUSE was the desired approach to dealing with all the
various files in /proc that might need changing for containers. Personally
I don't much care what approach is used - if the kernel wants to do more
stuff that's fine with me from a libvirt LXC POV. I'll just follow whatever
the consensus is in this area.

> > We could use that mechanism to fix 'boot_id' in userspace, but
> > I'm wondering if this is a better candidate for dealing with in kernel
> > space, since as well as the /proc/sys tree, the data is also visible via
> > the sysctl() system call which a FUSE overlay won't address.
> 
> Any application that uses the sysctl() system call needs to be fixed.
> When I looked years ago the number of applications using sysctl() could
> be numbered on one hand and most of those applications were the fedora
> installer, and the fedora installer hasn't used sysctl.

Ok, I did wonder whether anyone would actually use sysctl() instead
of reading /proc/sys. If we can ignore sysctl(), that gives us more
options.
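
As a rough sketch of one userspace option, assuming only file reads of
/proc/sys matter: a container manager could generate a fresh random
UUID at container start and write it to a file, which would then be
presented to the container in place of the real boot_id file (by an
overlay or similar, not shown here). The function names are
hypothetical and this is not what libvirt currently does.

```python
import uuid
from pathlib import Path


def new_container_boot_id():
    """Generate a fresh random UUID string for one container start."""
    return str(uuid.uuid4())


def write_boot_id_file(directory):
    """Write a new boot_id into `directory`; return (path, boot_id).

    The value is written with a trailing newline, as is conventional
    for text files under /proc/sys.
    """
    boot_id = new_container_boot_id()
    path = Path(directory) / "boot_id"
    path.write_text(boot_id + "\n")
    return path, boot_id
```

Anything that reads the file sees the per-container value, while a
caller of sysctl() would still get the host's value.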

> > The kernel doesn't have a real concept of a 'container' to associate
> > a boot_id value with as such, but maybe it is reasonable to associate
> > a boot_id value with each PID namespace ?
> 
> There is also the question of uptime and clocks and things like that.
> 
> The utsnamespace might be a more reasonable place to tack on that kind
> of extended functionality.
>
> Just changing boot_id itself and not all of the other bits that track
> when we have booted does not seem reasonable.
> 
> Once we can sort out the details a kernel implementation should be quite
> trivial.  It just requires the appropriate sysctl registration dance.

Ok, I'll try to identify a list of other related parts which need changing
wrt boot.
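
As a starting point, two obvious candidates are the uptime reported in
/proc/uptime and the "btime" line in /proc/stat. Which values really
belong on the list is still an open question; the sketch below just
shows how they are read today, with hypothetical helper names.

```python
def parse_uptime(text):
    """Parse /proc/uptime content into (uptime_seconds, idle_seconds)."""
    up, idle = text.split()[:2]
    return float(up), float(idle)


def parse_btime(stat_text):
    """Return the boot time (seconds since the epoch) from /proc/stat text.

    Returns None if no 'btime' line is present.
    """
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "btime":
            return int(fields[1])
    return None


if __name__ == "__main__":
    # In a container today, both values describe the host boot.
    with open("/proc/uptime") as f:
        print(parse_uptime(f.read()))
    with open("/proc/stat") as f:
        print(parse_btime(f.read()))
```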

Thanks for the feedback.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

