[PATCH 0/6][RFC] user-cr: restart: Make task table portable

Matt Helsley matthltc at us.ibm.com
Tue Feb 9 15:22:52 PST 2010


On Mon, Feb 08, 2010 at 06:26:08PM -0500, Oren Laadan wrote:
> Matt,
> 
> Thanks for the patch-set.
> 
> Matt Helsley wrote:
> >This series modifies the task table entries to use indexes rather than
> >pointers to create the tree. This is one step that enables the same
> >table to be shared between multiple restart processes regardless of
> >whether they are 32 or 64-bit.
> >
> >Further steps, not in this set, include:
> >	1. Mark bitness of each task in the table.
> >	2. Share the table contents.
> >		Probably via an fd passed during execve() then mmap()'ed
> 
> As I said before, I'm concerned about the security implications.
> 
> Assume the 'restart' is setuid.
> 
> When 'restart' starts with a switch, e.g. --cont-fd=FD --cont-nr=NN,
> it will map that FD to memory and expect valid data there. But what
> if the data is bogus ?

As far as I can see there are two broad security concerns with a setuid
restart:

	1. Make sure it can't be used to invoke arbitrary code.
	2. Make sure the incoming data is as "safe" as what
		would come in for the "non-exec" design.

#2 can be taken care of by making a setuid "frontend" which builds the
tables and runs a "backend" that isn't runnable by normal users and
does not have the suid bit set. Roughly something like:

chmod executable
----------------
u+s   /bin/restart
og-x  /foo/restart32
og-x  /foo/restart64

/bin/restart only execs the two helpers, does not use exec*p() [glibc],
and uses constant strings (e.g. no sprintf) for locating the executable.
At that point I think we can be sure that users cannot pass arbitrary
data into the helpers which did not also get into the suid restart
"frontend".

In the frontend we can check things like number of tasks in the checkpoint
image to make sure they won't exceed per-user limits. There are different
ways/times we can enforce that. Since the table is one block we can check
indices or even pointers for illegal values. Perhaps we could check the pids
somehow, or modify the way pids are specified to avoid the checks.
I'd need to think about the pid checking some more..

> At the very least, we'll need to verify that the data in the array
> is valid. That is, iterating through entries to verify contents.
> 
> (We might as well pass the data via a pipe and make a local copy of
> the data at the exec'ed instance)
> 
> 
> >	3. Find/modify restart to do execve() at the right spot.
> >
> >The patches:
> >	1/6 Make context global
> 
> I suppose this is only necessary to because the ->ctx pointer in
> the @task will be invalid in the address space of the exec'ed
> instance.
> 
> To avoid the need for @ctx as global, we can (as noted above) make
> a local copy of the tasks array and set adjust the @ctx pointer in
> each entry.

Except pointers are different sizes and have different alignment
constraints too. If we need to change too much of the table we may
as well rebuild the table from scratch, no? That may even be
perferrable if the code would be simpler to read.

> I actually want to remove globals altogether, so that we can make
> the restart functionality available as a library. Unfortunately I'm
> not sure it's possible because we use most of them in the signal
> handling context.

I don't think removing the globals is necessary for a library. We
just need to be careful about the symbols it exports.

Libraries handling signals? I've never written one and conventional
wisdom says to run away from such a beasts. Libraries that handle
signals greatly limit usefulness (applications use the same signals)
and introduce ample opportunities for ugly bugs. Most applications
frequently mess up signal handling in obscure, possibly
arch or system-dependent ways. Combining those applications with a
library that handles signals is just asking for trouble.

There are better ways of doing things. Perhaps eventfds. [Ab]using pipes.
Abusing robust futexes. Put any/all fds you need under a single epoll fd
and let the library API caller be responsible for poll'ing it if you must.
If the library caller is ok with blocking calls then the library itself
can poll those fds. waitid() with WNOWAIT -- which sort of peeks at the
wait "events" -- might also be useful.

Cheers,
	-Matt Helsley


More information about the Containers mailing list