RFC(v2): Audit Kernel Container IDs

Wed Oct 18 20:56:06 UTC 2017

On Tue, Oct 17, 2017 at 11:44 AM, James Bottomley
<James.Bottomley at hansenpartnership.com> wrote:
> On Tue, 2017-10-17 at 11:28 -0400, Simo Sorce wrote:
>> > Without a *kernel* policy on containerIDs you can't say what
>> > security policy is being exempted.
>>
>> The policy has been basically stated earlier.
>>
>> A way to track a set of processes from a specific point in time
>> forward. The name used is "container id", but it could be anything.
>> This marker is mostly used by user space to track process hierarchies
>> without races, these processes can be very privileged, and must not
>> be allowed to change the marker themselves when granted the current
>> common capabilities.
>>
>> Is this a good enough description? If not, can you clarify your
>> expectations?
>
> I think you mean you want to be able to apply a label to a process
> which is inherited across forks.  The label should only be susceptible
> to modification by something possessing a capability (which one TBD).
> The idea is that processes spawned into a container would be labelled
> by the container orchestration system.  It's unclear what should happen
> to processes using nsenter after the fact, but policy for that should
> be up to the orchestration system.
>
> The label will be used as a tag for audit information.
>
> I think you were missing label inheritance above.

That is a pretty good summary of what we want to do, and what Richard
and I have discussed while brainstorming this offline.  The details
may not have translated well into those initial emails from Richard,
but I think you've got the idea, even if some of the smaller details
are still TBD.  FWIW, right now I'm not as worried about the exact
capability or the size of the audit container ID; I think those things
will sort themselves out as we progress through the implementation,
especially once we get to the next stage, when we start to allow copies
of the audit records to be routed to audit daemons running inside
containers (note well that I said "copies"; the host system still sees
all).
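
To make the "copies" point concrete, here is a rough user-space sketch
of the routing idea.  Nothing here is a real kernel or auditd
interface: the route() helper, the (contid, message) record shape, and
the use of None for an unset ID are all assumptions for illustration.

```python
from collections import defaultdict


def route(records):
    """Split audit records into a host log and per-container copies.

    Each record is a (contid, message) pair; contid is None when the
    audit container ID is unset (hypothetical representation).
    """
    host_log = []
    container_logs = defaultdict(list)
    for contid, message in records:
        # The host always keeps the original record.
        host_log.append((contid, message))
        # A container daemon only ever gets a copy of its own records.
        if contid is not None:
            container_logs[contid].append((contid, message))
    return host_log, dict(container_logs)


records = [(None, "boot"), (7, "open"), (9, "exec"), (7, "close")]
host_log, container_logs = route(records)
```

In this model the host log has all four records, while the daemon for
container 7 sees only its own two.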

> The security implications are that anything that can change the label
> could also hide itself and its doings from the audit system and thus
> would be used as a means to evade detection.  I actually think this
> means the label should be write once (once you've set it, you can't
> change it) ...

Richard and I have talked about a write once approach, but the
thinking was that you may want to allow a nested container
orchestrator (Why? I don't know, but people always want to do the
craziest things.) and a write-once policy makes that impossible.  If
we punt on the nested orchestrator, I believe we can seriously think
about a write-once policy to simplify things.
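
A toy model of why write-once and nesting conflict, assuming the label
is inherited on fork and that setting it needs some capability (which
one is still TBD).  The Task class, the can_set_contid flag, and
set_contid() are made-up names for this sketch, not proposed
interfaces.

```python
class Task:
    """Toy stand-in for a process; not a kernel structure."""

    def __init__(self, name, contid=None, can_set_contid=False):
        self.name = name
        self.contid = contid                  # None models "unset"
        self.can_set_contid = can_set_contid  # hypothetical capability bit

    def fork(self, name):
        # The child inherits both the label and the capability bit.
        return Task(name, self.contid, self.can_set_contid)


def set_contid(setter, target, contid, write_once=True):
    """Label `target` on behalf of `setter`, enforcing the policy."""
    if not setter.can_set_contid:
        raise PermissionError("setter lacks the (TBD) capability")
    if write_once and target.contid is not None:
        raise PermissionError("audit container ID is already set")
    target.contid = contid


orchestrator = Task("orchestrator", can_set_contid=True)
nested = orchestrator.fork("nested-orchestrator")
set_contid(orchestrator, nested, 1)        # first write succeeds

inner = nested.fork("inner-container")     # inherits ID 1 from its parent
try:
    set_contid(nested, inner, 2)           # nested orchestrator relabels
    nested_allowed = True
except PermissionError:
    nested_allowed = False                 # write-once blocks the relabel
```

The nested orchestrator's children are born already labelled, so under
write-once it has nothing left to set; passing write_once=False is the
only way the second call goes through in this model.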

A bit off topic, but I've also wondered about not even implementing
read access, just to help ensure the audit container ID wouldn't be
abused, but I'm not sure how practical that will be.  Something else
to sort out during the RFC phase of the implementation with the
container orchestrators.

> ... and orchestration systems should begin as unlabelled
> processes allowing them to do arbitrary forks.

My current thinking is that the default state is to start unlabeled (I
just vomited a little into my SELinux hat); in other words
init/systemd/PID-1 in the host system starts with an "unset" audit
container ID.  This not only helps define the host system (anything
that has an unset audit container ID) but provides a blank slate for
the orchestrator(s).
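
Roughly, and only as a sketch of the semantics rather than of any
implementation: the Task class and is_host() below are hypothetical,
and None stands in for the "unset" value.

```python
class Task:
    """Toy process model; contid=None means the ID is unset."""

    def __init__(self, pid, contid=None):
        self.pid = pid
        self.contid = contid

    def fork(self, pid):
        # A child starts with whatever its parent has, including "unset".
        return Task(pid, self.contid)


def is_host(task):
    # Anything with an unset audit container ID belongs to the host.
    return task.contid is None


init = Task(1)                            # PID 1 starts unset
orchestrator = init.fork(100)             # still unset: a blank slate
container_init = orchestrator.fork(200)
container_init.contid = 42                # labelled by the orchestrator
worker = container_init.fork(201)         # inherits 42
```

Here init and the orchestrator both count as host, while
container_init and worker do not.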

> For nested containers, I actually think the label should be
> hierarchical, so you can add a label for the new nested container but
> it still also contains its parent's label as well.

I haven't made up my mind on this completely just yet, but I'm
currently of the mindset that supporting multiple audit container IDs
on a given process is not a good idea.

-- 
paul moore
www.paul-moore.com