[PATCH 1/1] s390: actually restart syscalls after sys_restart

Oren Laadan orenl at cs.columbia.edu
Thu Jan 21 08:00:02 PST 2010


On Thu, 21 Jan 2010, Serge E. Hallyn wrote:

> Quoting Oren Laadan (orenl at cs.columbia.edu):
> > Hi,
> > 
> > I'm not sure thix fix is totally correct. 
> > 
> > First, a reminder of how -ERESTART... errors behave:
> > 
> > 1) If a (user) signal handler was *not* called then:
> >   -ERESTARTSYS:    re-execute syscall
> >   -ERESTARTNOHAND:    re-execute syscall   
> >   -ERESTARTNOINTR:    re-execute syscall
> >   -ERESTART_RESTARTBLOCK:   arrange to call sys_restart_syscall
> > 
> > 2) If a (user) signal handler *was* called then:
> >   -ERESTARTSYS:    re-execute syscall iff SA_RESTART (else EINTR)
> >   -ERESTARTNOHAND:    return -EINTR
> >   -ERESTARTNOINTR:    re-execute syscall
> >   -ERESTART_RESTARTBLOCK:   return -EINTR
> > 
> > The difference between x86 and s390 is _when_ (and how) the task state is
> > altered (to arrange for re-execution), in case of task freezing:
> > 
> > * In x86, changes occur _after_ the task is frozen, and the work is split 
> > between handle_signal() and do_signal().
> > 
> > * In s390, some changes occur _before_ the task is frozen, and possibly 
> > undone after the freeze if a signal handler was called. All work is done 
> > is do_signal(). Because of that, our checkpoint only sees a partial view 
> > of the task's state.
> > 
> > Now back to the proposed fix: it isn't sufficient because:
> > 
> > (a) If the process has a signal pending when checkpointed, then the state 
> > already reflects some changes by do_signal(). If the signal leads to a 
> > (user) signal handler after restart, then do_signal() (after restart) will 
> > _not_ undo the changes originally done before the checkpoint.
> > 
> > This issue holds for ERESTARTSYS, ERESTARTNOHAND and ERESTART_RESTARTBLOCK.
> > 
> > Note that do_signal() after restart will not do the changes again either
> > in case of -ERESTART.... value, because the regs->svcnr was set to 0 and
> > recorded that way.
> > 
> > (b) Same, but also if the process didn't have a signal at checkpoint 
> > time, but will have one during/after restart but before resuming.
> > 
> > (c) Because do_restart() selects its return value from gprs[2] (upon 
> > successful completion), then when it tries to resume userspace and 
> > enters do_signal() it will have -EINTR (which isn't the original return 
> > value!), and therefore not set your new TIF_RESTARTBLOCK bit. If the task 
> > is now checkpointed again, that state will be lost.
> 
> Sorry if I'm misunderstanding what you're saying, but the return value
> after exiting sys_restart() won't be -EINTR now, bc my patch sets it to
> __NR_restart_syscall.

True ... but the consequences are the same:

Consider task A is checkpointed, then restarted, but before it completely
resumes userspace, it is checkpointed again. For the second checkpoint,
do_signal() will see svcnr==0, and even if you set svcnr, then gprs[2]>0.
Therefore, do_signal() will not set (again) the TIF_RESTARTBLOCK flag
because that happens only for svcnr!=0 and gprs[2]=-ERESTART_RESTARTBLOCK.

So the second checkpoint image lost that particular state you were 
interested. Therefore, when you restart from that image, the gprs[2] will
be set correctly (if ignoring issues a, b), but the TIF_RESTART_SVC won't
be set again.

> 
> > As a side note: I noticed that on x86 there are {checkpoint,restore}_thread() 
> > that save the thread flags and restore the necessary flags. They also 
> > check the flags - at checkpoint to ensure the task is checkpointable, and 
> > at restore for sanity.  Is there not a need for something similar for s390?
> > 
> > That would be an appropriate place to svae and restore a TIF_RESTARTBLOCK
> > thread flag and fix (c) above, should you stick to that method.
> 
> We actually want that flag cleared - the flag is only there so that
> checkpoint can tell that it needs to set the 'should_restart' flag in
> the thread info in the checkpoint image.  sys_estart does not then reset
> that flag, it instead does the steps which are done in the last block
> of do_signal(): set gprs[2] to NR_syscall_restart and add the TIF_RESTART_SVC
> thread flag.

The problem is that sys_restart *does not even set* that flag, and 
do_signal() will not set that flag either after restarting from a 
checkpoint that had should_restart set.  Therefore a subsequent checkpoint
that occurs _before_ the process really resumes to userspace will not 
carry that information.

Oren.


More information about the Containers mailing list