[PATCH 5/9] cr: capabilities: define checkpoint and restore fns

Andrew G. Morgan morgan at kernel.org
Thu Jun 4 07:13:33 PDT 2009


On Wed, Jun 3, 2009 at 9:45 AM, Serge E. Hallyn <serue at us.ibm.com> wrote:
> Quoting Andrew G. Morgan (morgan at kernel.org):
>> You guys are well on the road to having something functional. I do not
>> want to get in the way of that.
>>
>> What I'm trying to question is the way the code is abstracted. My
>> sense is that its not very friendly to the subsystems being
>> checkpointed. I could offer an alternative patch for the capabilities
>> code, but if it breaks some invisible constraint I'd rather you tell
>> me about it first.
>>
>> There are three sets of code in these changes:
>>
>>   1. the capabilities code (but I think I'm arguing about that because
>> it is an example I'm familiar with and this probably applies to other
>> subsystems in the kernel)
>>   2. the glue code in the kernel that implements sys_restore() etc.
>>   3. the user space code
>>
>> What seems to have happened with these patches is that the glue code
>> and the user space code have knowledge of each other and things are
>> written to make an API/ABI that they can agree on. (Part of this is
>> evidenced by the u64 reference to capabilities and non-use of the more
>> natural kernel_cap_t data that the kernel itself uses.)
>>
>> It seems to me that a more natural abstraction is that 1 and 3 know
>> about each other (actually 3 knows about how 1 chooses to define
>> things), and 2 is a generic data packager interface.
>
> One reason (3) and (2) need to know about each is other is bc
> userspace needs to read the checkpoint image enough to read
> the task states, clone off suitable children, and then each
> child calls sys_restart().  So to this end, there are certainly
> details which userspace doesn't need to be able to parse.
>
> The other reason is that userspace will expect to be able to
> make changes to the checkpoint file.  For instance, if restarting
> a process from one kernel version with 34 capabilities defined,
> on a kernel version with 35 defined, then userspace may want to
> fill in an extra cap.
>
> But I strongly agree with your concern that subsystems will get
> out of sync with the checkpoint code.
>
> If we just move the struct ckpt_capabilities definition into
> include/linux/capabilities.h, does that suffice?  The c/r
> userspace could then do something like
>
> cat > ckpthdr.sed << EOF
> /^[^ \t]*struct ckpt[^\n]*{/,/}/p
> EOF
> cat /usr/include/linux/*.h /usr/include/asm/*.h | sed -n -f ckpthdr.sed \
>        > checkpoint_headers.h
>
> Now you mention using kernel_cap_t's...  we can go partway
> down that road, but the inherent problem is serializing the
> relevant data so it can be streamed to some file.  So I
> think it's better if the subsystems are required to specify
> a useful format (like struct ckpt_capabilities) and define
> the checkpoint/restart helpers to serialize data into the
> struct.  We can try and get cute with dual mirrored
> struct defs, one which is defined in terms the caps code
> can understand, and one defined in more arch-independent
> terms (which is why we need __u64s and packed, for instance).
> But that seems more fragile than having clear and simple
> requirements for the $SUBYSTEM_checkpoint() and $SUBSYSTEM_restart()
> helpers.

I like this $SUBSYSTEM_checkpoint() etc. thing.

I like the ckpthdr.sed thing. I think a similar rule could be used to
generate the calls to the list of $SUBSYSTEM_checkpoint() functions.

For serialization, could a kernel "gcc -E checkpoint-headers.h >
this-kernel-checkpoint-file.h" build rule be enough?

Cheers

Andrew

>
> (I'll wait to hear whether you think I'm on the right track
> before reworking my patch)
>
>> This should be functionally equivalent, but my belief is an
>> implementation based on the current abstraction will be broken more
>> often because some subsystem (eg. 1) evolves its critical state and
>> the corresponding (2+3)'s relationship is too confusing/constraining
>> for the kernel subsystem developer to keep in sync (someone else's
>> problem). And subtle problems of ignored state in checkpoints can be
>> really hard to debug!
>>
>> Bottom line. Don't wait for me to Ack this change. I intend to lurk
>> and if I see a way to "clean it up", I'll offer a patch, which you can
>> of course reject... But don't wait on me.
>
> Thanks, it's much appreciated.  But please don't hold back,
> as your points are imo valid, and we need to work out how to
> address them.
>
> -serge
> --
> To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


More information about the Containers mailing list