pid namespace bug ?

Daniel Lezcano daniel.lezcano at free.fr
Fri May 7 01:51:57 PDT 2010


Sukadev Bhattiprolu wrote:
> Daniel Lezcano [daniel.lezcano at free.fr] wrote:
>   
>> Ferenc Wagner wrote:
>>
>>     
>>> I noticed something strange:
>>>
>>> # lxc-start -n jail -s lxc.mount.entry="/ /tmp/jail none bind 0 0" -s lxc.rootfs=/tmp/jail -s lxc.pivotdir=/mnt /bin/sleep 1000
>>> (in another terminal)
>>> # lxc-ps --lxc
>>> CONTAINER    PID TTY          TIME CMD
>>> jail        4173 pts/1    00:00:00 sleep
>>> # kill 4173
>>> (this does not kill the sleep!)
>>> # strace -p 4173
>>> Process 4173 attached - interrupt to quit
>>> restart_syscall(<... resuming interrupted call ...> = ? ERESTART_RESTARTBLOCK (To be restarted)
>>> --- SIGTERM (Terminated) @ 0 (0) ---
>>> Process 4173 detached
>>> # lxc-ps --lxc
>>> CONTAINER    PID TTY          TIME CMD
>>> jail        4173 pts/1    00:00:00 sleep
>>> # fgrep -i sig /proc/4173/status
>>> SigQ:	1/16382
>>> SigPnd:	0000000000000000
>>> SigBlk:	0000000000000000
>>> SigIgn:	0000000000000000
>>> SigCgt:	0000000000000000
>>> # kill -9 4173
>>>
>>> That is, the jailed sleep process could be killed by SIGKILL only, even
>>> though (according to strace) SIGTERM was delivered and it isn't handled
>>> specially.  Why does this happen?
>>>       
>
> Yes, SIGKILL is the only reliable way to terminate a container-init.
> container-init needs to be immune to signals from within the container
> but be open to receiving signals from the parent container.  These
> requirements complicate the implementation of allowing SIGINT/SIGTERM etc.
> to container-init from the parent container.
>
> Besides, a realistic container-init would block such signals, in which case
> the complexity in the kernel could be viewed as unnecessary.
>   

I am not sure it is good to have pid 1 immune to signals sent from
outside the container.
From the POV of the parent process, the container init is like any
other process, and the parent may want to send it a signal (as a
notification, or to terminate it gracefully instead of killing it).

If the container init is a real init process, these signals will be
blocked anyway, but if we launch something different, e.g. a 'sleep',
Ctrl+C won't work: 'lxc-start -n foo sleep 3600' is not interruptible.

That's a bit annoying if we need to plug containers into batch
managers or use them for HPC jobs.
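
A possible workaround for the 'sleep' case, as a minimal sketch rather
than anything lxc provides: run the command under a small shell wrapper
that installs a handler for SIGTERM and SIGINT and forwards the request
to the real command. As far as I understand the pid namespace semantics,
a signal sent from the parent namespace is delivered to the container
init once the init has installed a handler for it. The function name
forward_signals_and_run is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of lxc: run a command as a child and
# pass termination requests on to it.

forward_signals_and_run() {
    child=
    got_signal=0

    # Installing a handler is the point: pid 1 of a pid namespace only
    # receives signals it has a handler for (assumption stated above).
    # Both signals are forwarded as SIGTERM, because a background child
    # started by a non-interactive sh inherits SIGINT as ignored.
    trap 'got_signal=1; [ -n "$child" ] && kill -TERM "$child" 2>/dev/null' TERM INT

    "$@" &
    child=$!

    # wait returns early (status > 128) when a trapped signal arrives.
    wait "$child"
    status=$?

    if [ "$got_signal" -eq 1 ]; then
        # Wait again to collect the child's real exit status; 127 means
        # the first wait had already collected it.
        wait "$child"
        rc=$?
        [ "$rc" -ne 127 ] && status=$rc
    fi

    trap - TERM INT
    return "$status"
}
```

To use it as a container init, the function could be saved in a script
that ends with forward_signals_and_run "$@", and that script given to
lxc-start in front of 'sleep 3600'. A child that itself ignores SIGTERM
would still need SIGKILL.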






