Roadmap for features planned for containers, and some future feature ideas.

Peter Dolding oiaohm at gmail.com
Tue Jul 22 17:56:46 PDT 2008


On Wed, Jul 23, 2008 at 12:05 AM, Oren Laadan <orenl at cs.columbia.edu> wrote:
>
>
> Eric W. Biederman wrote:
>>
>> "Peter Dolding" <oiaohm at gmail.com> writes:
>>
>>> On Mon, Jul 21, 2008 at 10:13 PM, Eric W. Biederman
>>> <ebiederm at xmission.com> wrote:
>>>>
>>>> "Peter Dolding" <oiaohm at gmail.com> writes:
>>>>
>>>>> http://opensolaris.org/os/community/brandz/  I would like to see
>>>>> whether something equivalent to this is on the roadmap in particular.
>>>>> Being able to run Solaris and AIX closed-source binaries in a
>>>>> container would be useful.
>>>>
>>>> There have been projects to do this at various times on linux.  Having
>>>> a namespace dedicated to a certain kind of application is no big deal.
>>>> Someone would need to care enough to test and implement it though.
>>>>
>>>>> Another useful feature would be some way to share a single process
>>>>> between PID containers, like a container bridge.  For containers used
>>>>> for desktop applications, not having a single X11 server interfacing
>>>>> with the video card is an issue.
>>>>
>>>> X allows network connections, and I think unix domain sockets will work.
>>>> The latter I need to check on.
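For what it is worth, the TCP route needs nothing special inside the
container: the client simply points its display at the server, and a
unix-domain display only needs /tmp/.X11-unix shared into the container's
mount namespace.  A minimal sketch, with "hostip:0" as a placeholder
display string:

#include <X11/Xlib.h>
#include <stdio.h>

int main(void)
{
    /* "hostip:0" is a placeholder; ":0" over the unix socket works
     * only if /tmp/.X11-unix is visible in this mount namespace. */
    Display *dpy = XOpenDisplay("hostip:0");
    if (!dpy) {
        fprintf(stderr, "cannot open display\n");
        return 1;
    }
    printf("connected, default screen %d\n", DefaultScreen(dpy));
    XCloseDisplay(dpy);
    return 0;
}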
>>>
>>> It does, to a point, until you see that local X11 uses shared memory
>>> for speed.  The hardest issue is getting GLX working.
>>
>> That is easier in general.  Don't unshare the sysvipc namespace.
>> Or share the mount of /dev/shmem at least for the file X cares about.
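A minimal sketch of that approach, assuming a kernel whose clone()
accepts CLONE_NEWPID: set up the container with new mount, UTS and pid
namespaces but deliberately leave CLONE_NEWIPC out, so the SysV shm
segments that MIT-SHM uses stay visible to both the X server and the
contained clients.

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static int child(void *arg)
{
    /* Same IPC namespace as the host, so shmget()/shmat() against
     * the X server's segments still work from in here. */
    execlp("/bin/sh", "sh", (char *)NULL);
    return 1;
}

int main(void)
{
    static char stack[64 * 1024];
    /* New mount, UTS and pid namespaces; CLONE_NEWIPC intentionally
     * omitted so sysvipc stays shared. */
    int flags = CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWPID | SIGCHLD;
    pid_t pid = clone(child, stack + sizeof(stack), flags, NULL);
    if (pid < 0) {
        perror("clone");
        exit(1);
    }
    waitpid(pid, NULL, 0);
    return 0;
}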
>>
>>>> The pid namespace is well defined, and no, a task will not be able
>>>> to change its pid namespace while running.  That is nasty.
>>>
>>> OK, so that is impossible or extremely risky.
>>>
>>> What about a form of proxy pid in the pid namespace, proxying
>>> application chatter from one namespace to another?  Applications being
>>> the bridge: if it's not possible to do it invisibly, applications could
>>> be made aware of it, so they can provide shared memory and the like
>>> across pid namespaces, but only where they have an activated proxy to
>>> do their bidding.  This also allows applications to maintain their own
>>> internal security between namespaces.
>>>
>>> I.e. the application is one pid number in its source container and
>>> virtual pid numbers in the other containers.  Symbolic linking at the
>>> task level, yes, a little warped.  Yes, this will annoyingly mean a
>>> special set of syscalls and a special set of capabilities and
>>> restrictions, like PID containers forbidding or allowing proxy pids
>>> at startup.
>>>
>>> If I am thinking right, that avoids a task having to change its pid;
>>> instead you send and receive the messages you need in the other
>>> namespace through a small proxy.  Yes, I know that will cost some
>>> performance.
>>
>> Proxy pids don't actually do anything for you, unless you want to send
>> signals.  Because all of the namespaces are distinct.  So even at the
>> best of it you can see the X server but it still can't use your
>> network sockets or ipc shm.
>>
>> Better is working out the details of how to manipulate multiple
>> sysvipc and network namespaces from a single application.  Mostly
>> that is supported now by the objects; there is just no easy way
>> of dealing with it.
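For what it is worth, later kernels grew setns(2), which gives a single
process exactly that sort of access by entering another task's namespaces
through /proc.  Nothing like it existed at the time of this thread, but a
rough sketch of where it ends up:

#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Enter one namespace of an existing process in the target container. */
static void enter_ns(pid_t pid, const char *ns, int nstype)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/ns/%s", (int)pid, ns);
    int fd = open(path, O_RDONLY);
    if (fd < 0 || setns(fd, nstype) < 0) {
        perror(path);
        exit(1);
    }
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <pid-in-container>\n", argv[0]);
        return 1;
    }
    pid_t target = atoi(argv[1]);
    enter_ns(target, "ipc", CLONE_NEWIPC);  /* see its SysV shm segments */
    enter_ns(target, "net", CLONE_NEWNET);  /* and its network sockets   */
    /* From here on, shmget()/socket() operate in the target's namespaces. */
    return 0;
}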
>>
>>> Basically I want to set up a neat, universal container way of handling
>>> stuff like http://www.cs.toronto.edu/~andreslc/xen-gl/ without having
>>> to go over the network, and hopefully in a way where those limitations
>>> don't have to exist, since messages really only have to be sent through
>>> one X11 server to one driver system.  The only real problem is sending
>>> the correct messages to the correct place.  There will most likely be
>>> other services where a single entity is at times preferred.  The worst
>>> outcome is if a proxying .so is required.
>>
>> Yes.  I agree that is essentially desirable.  Given that I think
>> high-end video cards actually have multiple hardware contexts that
>> can be mapped into different user-space processes, there may be other
>> ways of handling this.
>>
>> Ideally we can find a high performance solution to X that also gives
>> us good isolation and migration properties.  Certainly something to talk
>> about tomorrow in the conference.
>
> In particular, if you wish to share private resources of a container
> with more than a single container, then you won't be able to use
> checkpoint/restart on either container (unless you make special
> provisions in the code).
>
> I agree with Eric that the way to handle this is via virtualization
> as opposed to direct sharing. The same goes for other hardware, e.g.
> in the context of a user desktop - /dev/rtc, sound, and so on. My
> experience is that a proxy/virtualized device is what we probably
> want.
>
> Oren.
>
Giving up the ability to checkpoint containers cleanly and independently
of each other when X11 is in use may simply be a requirement.  The reason
is GPU processing: if you want to provide it, a lot of GPUs don't have a
good way to freeze a segment of their state; it's either park the full
GPU or risk problems on restart.  Features need to be added to GPUs
before we can suspend individual OpenGL contexts and make that work.  So
any application using the GPU will most likely have to be lost in a
checkpoint/restore, independently of the other X11 applications on the
desktop.  Even suspending the GPU as a whole, there are still issues with
some cards.

Sorry Oren, but from using http://www.virtualgl.org I know that
suspending GPUs is trouble.

http://www.cs.toronto.edu/~andreslc/xen-gl/ blocks all use of the GPU
for advanced processing, effectively crippling the card.  A purely
virtualised device is basically not going to cut it: particular software
needs access to the real GPU to work.

This is more about containers being used by desktop users to run many
distributions at once.

Of course there is nothing stopping the checkpoint process from informing
the user that it cannot go past this point until the following
applications are closed, i.e. the ones using GPU shader processing and
the like.  We just have to wait for video card makers to provide us with
something equivalent to Intel's and AMD's CPU virtualisation
instructions, so that independent OpenGL contexts can be suspended.

Multiple hardware contexts are effectively many independent GPUs stuck on
one card, just like putting more video cards in a computer.  Yes, they
can be suspended independently, and yes, how they are allocated should be
controllable, but they are not on every card out there.  And if you want
migration, sorry, there is really bad news here: suspended GPU state has
to be loaded back onto exactly the same type of GPU or you are stuffed;
two different models of card will not work.  So this does not help you at
all with migration, or even worse, with video card death.  Most people
forget that state suspended by compiz or anything else running on the GPU
cannot be restored if you have changed to a different GPU.  Staying with
the same brand of card does not help you here.

Full X11 with fully functional OpenGL will mean giving some things up.
Keeping every application running through a migration or checkpoint is
impossible.  Applications that are container/suspend aware could
internally rebuild their OpenGL context after a restore, from a point
where they can restart their processing loop, but they will have to redo
all their shader code and other in-GPU processing code (and possibly
their engine's internal paths) in case the GPU type has changed.  This
alteration would bring back dependable checkpointing and migration, but
only for aware applications.
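Roughly what the restore path of such an aware application might look
like (the on_restore() hook here is purely hypothetical, and this sketch
assumes OpenGL 2.0 capable headers and that the X connection and window
have already been re-established):

#define GL_GLEXT_PROTOTYPES
#include <GL/gl.h>
#include <GL/glx.h>

extern Display     *dpy;        /* X connection reopened after restore */
extern XVisualInfo *visual;
extern Window       win;
extern const char  *vertex_src, *fragment_src;  /* shader sources kept in
                                                   system memory so they
                                                   can be recompiled */
static GLXContext ctx;
static GLuint     program;

/* Hypothetical hook an aware application would run after a restore,
 * possibly onto a different GPU: throw away the stale context and
 * rebuild everything that lived in video memory. */
void on_restore(void)
{
    ctx = glXCreateContext(dpy, visual, NULL, True);
    glXMakeCurrent(dpy, win, ctx);

    /* Recompile and relink shaders for whatever GPU we landed on. */
    GLuint vs = glCreateShader(GL_VERTEX_SHADER);
    glShaderSource(vs, 1, &vertex_src, NULL);
    glCompileShader(vs);

    GLuint fs = glCreateShader(GL_FRAGMENT_SHADER);
    glShaderSource(fs, 1, &fragment_src, NULL);
    glCompileShader(fs);

    program = glCreateProgram();
    glAttachShader(program, vs);
    glAttachShader(program, fs);
    glLinkProgram(program);

    /* Textures, vertex buffers and the like would be re-uploaded here
     * before the application restarts its rendering loop. */
}

The point is that everything GPU-resident is treated as rebuildable from
system memory, which is the only thing the checkpoint can actually save.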

X11 2D can suspend and restore without major issues, as
http://partiwm.org/wiki/xpra shows.  3D is a bugger.

There is basically no magic trick to get around this problem; containers
alone cannot solve it.  The occasional loss has to be accepted to make it
work.  But having it work at all will be like Xen: once it started, it
got the CPU makers looking at making it better.

Restart should be a non-issue.  Clearing the OpenGL context displayed on
the X11 server already gets done when an application crashes, and an
outright reset would be equivalent.  When the application restarts it
creates its OpenGL context anew, so there is no 3D issue.

Video cards are different from most other hardware you are dealing with.
They are a second processing core that you don't have full control over,
and they differ from card to card to the point of being 100 percent
incompatible with each other.


Peter Dolding

