[PATCH] c/r: tolerate X86_EFLAGS_RF on restart

Oren Laadan orenl at librato.com
Fri Oct 23 11:37:04 PDT 2009


On restart, the c/r code that sanitizes a task's EFLAGS failed with an
error if the X86_EFLAGS_RF was set. However, there are cases in which
a task legitimately has this flag recorded at checkpoint, such as:

1) We are running inside a KVM guest, and the guest is being debugged
(see arch/x86/kvm/{svm,vmx}.c, via callbacks from qemu.

2) The kernel itself is being debugged using kgbd.

3) Exceptions that occur under certain condition will set this flag
on the currently running task. For details see page 679 in Intel's
manual (http://www.intel.com/Assets/PDF/manual/253668.pdf), like:

  When calling an event handler, Intel 64 and IA-32 processors
  establish the value of the RF flag in the EFLAGS image pushed on the
  stack:
    • For any fault-class exception except a debug exception
      generated in response to an instruction breakpoint, the value
      pushed for RF is 1.
    • For any interrupt arriving after any iteration of a repeated
      string instruction but the last iteration, the value pushed for
      RF is 1.
    • For any trap-class exception generated by any iteration of a
      repeated string instruction but the last iteration, the value
      pushed for RF is 1.
    • For other cases, the value pushed for RF is the value that was
      in EFLAG.RF at the time the event handler was called.
  [...]

Case #3 is consistent with the fact that in all cases where we
observed the issue there was a sendmail process doing IO.

Clearly, the X86_EFLAGS_RF is valid to be set.

This patches fixes the restart to:

1) Silently ignore the X86_EFLAGS_RF if was set at checkpoint: this
allows restart to tolerate the flag and safely complete the restart.

2) Preserve the existing X86_EFLAGS_RF of a restarting task (instead
of overwriting it): this ensures that restarting does not interfere
with kvm, kgdb, etc.

Signed-off-by: Oren Laadan <orenl at cs.columbia.edu>
Cc: Alexey Dobriyan <adobriyan at gmail.com>
Cc: Matt Helsley <matthltc at us.ibm.com>
Cc: Avi Kivity <avi at redhat.com>
---
 arch/x86/mm/checkpoint.c |   40 ++++++++++++++++++++++++++++++++++++++--
 1 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/checkpoint.c b/arch/x86/mm/checkpoint.c
index 9dd8e12..a93bb6f 100644
--- a/arch/x86/mm/checkpoint.c
+++ b/arch/x86/mm/checkpoint.c
@@ -27,13 +27,49 @@ static int check_eflags(__u32 eflags)
 #define X86_EFLAGS_CKPT_MASK  \
 	(X86_EFLAGS_CF | X86_EFLAGS_PF | X86_EFLAGS_AF | X86_EFLAGS_ZF | \
 	 X86_EFLAGS_SF | X86_EFLAGS_TF | X86_EFLAGS_DF | X86_EFLAGS_OF | \
-	 X86_EFLAGS_NT | X86_EFLAGS_AC | X86_EFLAGS_ID)
+	 X86_EFLAGS_NT | X86_EFLAGS_AC | X86_EFLAGS_ID | X86_EFLAGS_RF)
 
 	if ((eflags & ~X86_EFLAGS_CKPT_MASK) != (X86_EFLAGS_IF | 0x2))
 		return 0;
 	return 1;
 }
 
+static void restore_eflags(struct pt_regs *regs, __u32 eflags)
+{
+	/*
+	 * A task may have had X86_EFLAGS_RF set at checkpoint, .e.g:
+	 * 1) It ran in a KVM guest, and the guest was being debugged,
+	 * 2) The kernel was debugged using kgbd,
+	 * 3) From Intel's manual: "When calling an event handler, 
+	 *    Intel 64 and IA-32 processors establish the value of the
+	 *    RF flag in the EFLAGS image pushed on the stack:
+	 *  - For any fault-class exception except a debug exception
+	 *    generated in response to an instruction breakpoint, the
+	 *    value pushed for RF is 1.
+	 *  - For any interrupt arriving after any iteration of a
+	 *    repeated string instruction but the last iteration, the
+	 *    value pushed for RF is 1.
+	 *  - For any trap-class exception generated by any iteration
+	 *    of a repeated string instruction but the last iteration,
+	 *    the value pushed for RF is 1.
+	 *  - For other cases, the value pushed for RF is the value
+	 *    that was in EFLAG.RF at the time the event handler was
+	 *    called.
+	 *  [from: http://www.intel.com/Assets/PDF/manual/253668.pdf]
+	 *
+	 * So: (1) silently ignore X86_EFLAGS_RF from checkpoint time,
+	 * as it's irrelevant for the current process, (2) preserve
+	 * the existing X86_EFLAGS_RF of this process as it it may be
+	 * used by kvm/kgdb. Disable preemption to protect test and
+	 * change of our eflags.
+	 */
+	preempt_disable();
+	eflags &= ~X86_EFLAGS_RF;
+	eflags |= (regs->flags & X86_EFLAGS_RF);
+	regs->flags = eflags;
+	preempt_enable();
+}
+
 static int check_tls(struct desc_struct *desc)
 {
 	if (!desc->a && !desc->b)
@@ -433,7 +469,7 @@ static int load_cpu_regs(struct ckpt_hdr_cpu *h, struct task_struct *t)
 	regs->orig_ax = h->orig_ax;
 	regs->ip = h->ip;
 
-	regs->flags = h->flags;
+	restore_eflags(regs, h->flags);
 	regs->sp = h->sp;
 
 	regs->ds = decode_segment(h->ds);
-- 
1.6.0.4



More information about the Containers mailing list