Virtualizing /proc/sys/kernel/random/boot_id per container?

Tue Sep 4 14:45:53 UTC 2012

On 09/04/2012 06:44 PM, Serge Hallyn wrote:
> Quoting Eric W. Biederman (ebiederm at xmission.com):
>> Glauber Costa <glommer at parallels.com> writes:
>>
>>> On 08/31/2012 04:13 AM, Eric W. Biederman wrote:
>>>> "Daniel P. Berrange" <berrange at redhat.com> writes:
>>>>
>>>>> On Thu, Aug 30, 2012 at 03:15:17PM -0700, Eric W. Biederman wrote:
>>>>>> "Daniel P. Berrange" <berrange at redhat.com> writes:
>>>>>>
>>>>>>> One of the features that SystemD folks have asked us to fix in LXC, is
>>>>>>> to make sure that /proc/sys/kernel/random/boot_id changes each time a
>>>>>>> container is started.
>>>>>>
>>>>>> There may be a good reason for this.  Most of the time, what I have
>>>>>> seen of kernel requests from the direction of SystemD is that while
>>>>>> there may be a real problem, their imagined solution is usually not a
>>>>>> particularly good one.  So a description of the problem is needed.
>>>>>>
>>>>>> Justifying something with just "SystemD wants this" is a good way to
>>>>>> get a NACK.
>>>>>
>>>>> SystemD records log messages for all system services in their journal.
>>>>> They can show you all log messages for the current service execution,
>>>>> all log messages for a service since system boot, or all log messages
>>>>> ever. The boot_id value is used as a unique tag to allow grouping of
>>>>> the log messages per system boot. When we run systemd inside a container,
>>>>> we want the grouping of log messages generated by services inside the
>>>>> container to reflect the container boot, not the host boot. Hence the
>>>>> desire to have the boot_id value change each time a container is
>>>>> booted.
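To make the grouping concrete, here is a minimal Python sketch, assuming each log record is stored as a hypothetical `(boot_id, message)` pair tagged with the value read from /proc/sys/kernel/random/boot_id at logging time. It is not journald's code; `read_boot_id` and `group_by_boot` are illustrative names. It shows why the request arises: every process on a host, containerized or not, reads the same boot_id, so two container boots within one host boot fall into a single group.

```python
from collections import defaultdict
from pathlib import Path

# Path discussed in this thread; on Linux it yields a UUID string plus "\n".
BOOT_ID_PATH = "/proc/sys/kernel/random/boot_id"


def read_boot_id(path: str = BOOT_ID_PATH) -> str:
    """Return the boot identifier with the trailing newline removed."""
    return Path(path).read_text().strip()


def group_by_boot(records):
    """Group hypothetical (boot_id, message) records by boot_id.

    Messages keep their original order within each group, and groups
    appear in the order their boot_id was first seen.
    """
    groups = defaultdict(list)
    for boot_id, message in records:
        groups[boot_id].append(message)
    return dict(groups)


if __name__ == "__main__":
    # Two container "boots" on the same host boot read the same value,
    # so their messages end up in one group.
    host_boot = "host-boot-1"  # placeholder, not a real UUID
    records = [
        (host_boot, "container started (1st)"),
        (host_boot, "container started (2nd)"),
    ]
    print(group_by_boot(records))
```

Per-container grouping would only appear if each container boot read a different value, which is exactly what the request asks the kernel to provide.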
>>>>
>>>> Since SystemD post-dates containers, and since the logging feature is
>>>> not currently in wide use, that use case is completely non-persuasive.
>>>>
>>>> So far this just sounds like a plain SystemD bug and something that can
>>>> be easily changed at this point in time.
>>>>
>>>> It has been a long time, but my fuzzy memory says that the original
>>>> boot_id justification was based on use cases that could not be solved
>>>> any other way.
>>>>
>>>> My memory says it was this thread https://lkml.org/lkml/1999/5/31/233
>>>> that inspired the implementation of boot_id.  However, reading the
>>>> current Emacs source code, it appears Emacs gave up before boot_id
>>>> was implemented and instead stats /var/run/random-seed (which we seem
>>>> to have removed) or looks in wtmp or utmp for the latest boot record.
>>>>
>>>> I did a quick grep through the binaries on my system and I could not
>>>> find anything using /proc/sys/kernel/random/boot_id.
>>>>
>>>> That suggests to me that the proper solution is to actually just remove
>>>> boot_id.
>>>>
>>>> Hmm.  And then there is another interesting detail.  What should boot_id
>>>> return after the processes have migrated from one system to another?
>>>>
>>>
>>> Since this would be a per-boot id, this clearly has to be carried over
>>> with migration, along with all the tons of data we already carry.
>>
>> The twist, of course, is what a boot means.  If we are really after
>> machine boots, then the current behavior is correct.
>>
>> Looking back in the archives the desired behavior appears to be a value
>> that can be used to see if a pid value must be stale.
>>
>> As a stale pid detector boot_id is pretty lousy.  Pids can still be
>> reused.
>>
>> Still, a role as a stale pid detector makes it clear which namespace
>> boot_id should be in and how we should treat boot_id upon migration.
>>
>> You can only serve as a stale pid detector if you are in the pid
>> namespace.
>>
>> So at this point patches are welcome, hopefully with a summary
>> of the discussion.
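The stale-pid-detector role described above can be sketched as follows. This is a minimal Python illustration under assumptions: a daemon writes a hypothetical pidfile line of the form `<boot_id> <pid>`, and a later reader compares the recorded boot_id with the current one. The names `format_pid_record`, `process_exists`, and `classify_pid_record` are illustrative, not an existing API. A boot_id mismatch proves the pid is stale; a match proves only that some process currently holds that pid, which is the weakness noted above, since pids can be reused. The comparison is also only meaningful when the boot_id and the pid come from the same pid namespace.

```python
import os


def format_pid_record(pid: int, boot_id: str) -> str:
    """Serialize a hypothetical pidfile line: '<boot_id> <pid>'."""
    return f"{boot_id} {pid}\n"


def process_exists(pid: int) -> bool:
    """Probe for a process with signal 0, which checks but delivers nothing."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one pid
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the pid exists but belongs to another user
    return True


def classify_pid_record(record: str, current_boot_id: str, exists=process_exists) -> str:
    """Return 'stale' or 'possibly-live' for a recorded pid."""
    recorded_boot_id, pid_text = record.split()
    if recorded_boot_id != current_boot_id:
        # Different boot: the recorded pid cannot name a process we started.
        return "stale"
    if not exists(int(pid_text)):
        return "stale"
    # Same boot and the pid is in use, but it may have been reused by an
    # unrelated process, so boot_id alone cannot confirm liveness.
    return "possibly-live"
```

The `exists` parameter is injectable so the decision logic can be exercised without real processes.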
> 
> I don't understand why this should be provided by the kernel, especially
> given that we've proven that everyone really wants this to be per-container
> as well.
> 
> So why not just have init, on startup, create a /run/boot_id file, perhaps
> by sha1summing the time at which it started plus some nonce?
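The userspace alternative suggested here can be sketched briefly. This minimal Python version assumes init calls it once at startup; `make_boot_id` and `write_boot_id` are hypothetical names, and the SHA-1-of-start-time-plus-nonce recipe follows the suggestion above rather than any existing init implementation. The nonce matters: two containers whose init starts within the same clock tick would otherwise hash to the same id.

```python
import hashlib
import os
import time
import uuid


def make_boot_id(start_time_ns=None, nonce=None) -> str:
    """Derive a boot identifier from init's start time plus a random nonce.

    SHA-1 yields 20 bytes; the first 16 are formatted like a UUID so the
    result resembles the kernel's boot_id text. The RFC 4122 version and
    variant bits are not set, so it is UUID-shaped rather than a true v4 UUID.
    """
    if start_time_ns is None:
        start_time_ns = time.time_ns()
    if nonce is None:
        nonce = os.urandom(16)
    digest = hashlib.sha1(start_time_ns.to_bytes(8, "big") + nonce).digest()
    return str(uuid.UUID(bytes=digest[:16]))


def write_boot_id(boot_id: str, path: str = "/run/boot_id") -> None:
    """Write the id atomically: readers see either the old file or the new one."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(boot_id + "\n")
    os.replace(tmp_path, path)
```

Because the file lives in each container's own /run, every container boot gets a fresh id without kernel support; the trade-off is that every init has to implement the same convention.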
> 
Why shouldn't it be provided by the kernel? That is the real question.

The way I see it, every file we need to set up from the outside is a
hassle. Among many other things, it is just asking for duplicated
effort across multiple userspaces.

netns does this for its proc files. The only reason we don't do it for
cgroup-driven files is that their semantics are very ill-defined. That
doesn't seem to be the case for this file.