[PATCH 1/3] powerpc: bare minimum checkpoint/restart implementation

Nathan Lynch ntl at pobox.com
Thu Jan 29 13:40:35 PST 2009


Hey Oren, thanks for taking a look.

Oren Laadan wrote:
> 
> Nathan Lynch wrote:
> > 
> > What doesn't work:
> > * restarting a 32-bit task from a 64-bit task and vice versa
> 
> Is there a test to bail if we attempt to checkpoint such tasks ?

No, but I'll add one if it looks too hard to fix for the next round.


> > +struct cr_hdr_cpu {
> > +	struct pt_regs pt_regs;
> 
> It has been suggested (as done in x86/32 code) not to use 'struct pt_regs'
> because it "can (and has) changed on x86" and because "it only container
> the registers that the kernel trashes, not all usermode registers".
> 
> https://lists.linux-foundation.org/pipermail/containers/2008-August/012355.html

Yeah, I considered that discussion, but the situation is different for
powerpc (someone on linuxppc-dev smack me if I'm wrong here :)
pt_regs is part of the ABI, and it encompasses all user mode registers
except for floating point, which are handled separately.


> > +	/* relevant fields from thread_struct */
> > +	double fpr[32][TS_FPRWIDTH];
> 
> Can TS_FPRWIDTH change between sub-archs or kernel versions ?  If so, it
> needs to be stated explicitly.
> 
> > +	unsigned int fpscr;
> > +	int fpexc_mode;
> > +	/* unsigned int align_ctl; this is never updated? */
> > +	unsigned long dabr;
> 
> Are these fields always guarantee to compile to the same number of bytes
> regardless of 32/64 bit choice of compiler (or sub-arch?) ?
> 
> In the x86(32/64) architecture we use types with explicit size such as
> __u32 and the like to ensure that it always compiled to the same
> size.

Yeah, I'll have to fix these up.



> > +static void cr_hdr_init(struct cr_hdr *hdr, __s16 type, __s16 len, __u32 parent)
> > +{
> > +	hdr->type = type;
> > +	hdr->len = len;
> > +	hdr->parent = parent;
> > +}
> > +
> 
> This function is rather generic and useful to non-arch-dependent and other
> architectures code. Perhaps put in a separate patch ?

Alright.  By the way, why are cr_hdr->type and cr_hdr->len signed
types?


> > +int cr_write_cpu(struct cr_ctx *ctx, struct task_struct *t)
> > +{
> > +	struct cr_hdr_cpu *cpu_hdr;
> > +	struct pt_regs *pt_regs;
> > +	struct cr_hdr cr_hdr;
> > +	u32 parent;
> > +	int ret;
> > +
> > +	cpu_hdr = cr_hbuf_get(ctx, sizeof(*cpu_hdr));
> > +	if (!cpu_hdr)
> > +		return -ENOMEM;
> > +
> > +	parent = task_pid_vnr(t);
> > +
> > +	cr_hdr_init(&cr_hdr, CR_HDR_CPU, sizeof(*cpu_hdr), parent);
> > +
> > +	/* pt_regs: GPRs, MSR, etc */
> > +	pt_regs = task_pt_regs(t);
> > +	cpu_hdr->pt_regs = *pt_regs;
> > +
> > +	/* FP state */
> > +	memcpy(cpu_hdr->fpr, t->thread.fpr, sizeof(cpu_hdr->fpr));
> 
> As note above, is sizeof(cpu_hdr->fpr) the same on all chips ?

It can differ depending on kernel configuration.


> > +/* restart APIs */
> > +
> 
> The restart APIs belong in a separate file: arch/powerpc/mm/restart.c

Explain why, please?  This isn't a lot of code, and it seems likely
that checkpoint and restart paths will share data structures and tend
to be modified together over time.


> > +		pr_debug("%s: unexpected thread_hdr contents: 0x%lx\n",
> > +			 __func__, (unsigned long)thread_hdr->unimplemented);
> 
> Given the macro for 'pr_fmt' in include/linux/checkpoint.h, the use of
> __func__ is redunant.

It seems to me that defining your own pr_fmt in a "public" header like
that is inappropriate, or at least unconventional.  Any file that
happens to include linux/checkpoint.h will have any prior definitions
of pr_fmt overridden, no?


> > +	regs = task_pt_regs(current);
> > +	*regs = cpu_hdr->pt_regs;
> > +
> > +	regs->msr = sanitize_msr(regs->msr);
> > +
> > +	/* FP state */
> > +	memcpy(current->thread.fpr, cpu_hdr->fpr, sizeof(current->thread.fpr));
> > +	current->thread.fpscr.val = cpu_hdr->fpscr;
> > +	current->thread.fpexc_mode = cpu_hdr->fpexc_mode;
> > +
> > +	/* debug registers */
> > +	current->thread.dabr = cpu_hdr->dabr;
> 
> I'm unfamiliar with powerpc; is it necessary to sanitize any of the registers
> here ?  For instance, can the user cause harm with specially crafted values
> of some registers ?

I had this in mind with the treatment of MSR, but I'll check on the
others, thanks.


> > +int cr_read_mm_context(struct cr_ctx *ctx, struct mm_struct *mm, int rparent)
> > +{
> > +	struct cr_hdr_mm_context *mm_hdr;
> > +	int ret;
> > +
> > +	mm_hdr = cr_hbuf_get(ctx, sizeof(*mm_hdr));
> > +	if (!mm_hdr)
> > +		return -ENOMEM;
> > +
> > +	ret = cr_read_obj_type(ctx, mm_hdr, sizeof(*mm_hdr),
> > +				  CR_HDR_MM_CONTEXT);
> > +	if (ret != rparent)
> > +		goto out;
> 
> Seems like 'ret' isn't set to an error value if the 'goto' executes.

It returns whatever error value cr_read_obj_type() returns.  Hrm.  I
guess if the image is garbage, cr_read_obj_type can potentially return
a non-error value that still isn't the desired value, is that right?



More information about the Containers mailing list