[RFC][PATCH] clone_with_pids()^w eclone() for x86_64

Louis Rilling Louis.Rilling at kerlabs.com
Thu Nov 19 13:29:05 PST 2009


On Thu, Nov 19, 2009 at 10:26:46PM +0100, Louis Rilling wrote:
> On Thu, Nov 19, 2009 at 09:48:49AM -0800, Dave Hansen wrote:
> > On Thu, 2009-11-19 at 10:58 +0100, Louis Rilling wrote:
> > > > int clone_with_pids(long flags_low, struct clone_args *clone_args, long args_size,
> > > >                  int *pids)
> > > > {
> > > >         long retval;
> > > > 
> > > >         __asm__  __volatile__(
> > > >                  "movq %3, %%r10\n\t"           /* pids in r10*/
> > > >                  "pushq %%rbp\n\t"              /* save value of ebp */
> > > >                 :
> > > >                 :"D" (flags_low), /* rdi */
> > > >                  "S" (clone_args),/* rsi */
> > > >                  "d" (args_size), /* rdx */
> > > >                  "a" (pids)       /* use rax, which gets moved to r10 */
> > > >                 );
> > > 
> > > 1. The fourth C arg is not in rax, but in rcx.
> > 
> > Hey Louis,
> > 
> > So, try as I might, I couldn't get that to work.  I thought it was rcx,
> > too.
> > 
> > So, changing that instruction to:
> > 
> >                 "movq %3, %%rcx\n\t"           /* pids in r10*/
> 
> Hm, no.
> 
> I meant (without taking into account my other comments):
> 
>          __asm__  __volatile__(
>                   "movq %3, %%r10\n\t"           /* pids in r10*/
>                   "pushq %%rbp\n\t"              /* save value of ebp */
>                  :
>                  :"D" (flags_low), /* rdi */
>                   "S" (clone_args),/* rsi */
>                   "d" (args_size), /* rdx */
>                   "c" (pids)       /* use rcx, which gets moved to r10 */
>                  );
> 
> But actually this is even better :D:
> 
>          __asm__  __volatile__(
>                   "movq %3, %%r10\n\t"           /* pids in r10*/
>                   "pushq %%rbp\n\t"              /* save value of ebp */
>                  :
>                  :"D" (flags_low), /* rdi */
>                   "S" (clone_args),/* rsi */
>                   "d" (args_size), /* rdx */
>                   "r10" (pids)     /* Linux reads its fourth arg from r10 */
>                  );
> 

Grrrr, I sent the email too quickly! This is the better version:

         __asm__  __volatile__(
                  "pushq %%rbp\n\t"              /* save value of ebp */
                 :
                 :"D" (flags_low), /* rdi */
                  "S" (clone_args),/* rsi */
                  "d" (args_size), /* rdx */
                  "r10" (pids)     /* Linux reads its fourth arg from r10 */
                 );

Louis


> > 
> > and putting 0x11111, etc... in for the args the strace output for the
> > syscall looks like this:
> > 
> >         syscall_299(0x11111, 0x22222, 0x33333, 0x1, 0x1, 0x2, 0, 0, 0,
> >         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> >         0, 0) = -1 (errno 22)
> > 
> > and I get -EFAULT back from the function doing the copy_from_user() of
> > the pids argument, even when using good values.
> > 
> > If I use the asm posted above, I get this:
> >         
> >         syscall_299(0x11111, 0x22222, 0x33333, 0x44444, 0x1, 0x2, 0, 0,
> >         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> >         0, 0, 0) = -1 (errno 22)
> >         
> > Or, this from a real call:
> >         
> >         syscall_299(0x1100011, 0x7fff19f0fd40, 0x38, 0x602070, 0x1, 0x2,
> >         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> >         0, 0, 0, 0, 0[2992, 377]: Child:
> >         
> > I had to find r10 basically by trial and error.  I have no idea why it
> > works.
> 
> r10 is used to pass the fourth arg to the kernel because the syscall instruction
> puts next rip (return address) in rcx. Using r10 instead of rcx is defined as part
> of Linux ABI for x86_64.
> 
> For all the details, read the comments in
> arch/x86/kernel/entry_64.S:ENTRY(system_call).
> 
> > 
> > > > 
> > > >         __asm__ __volatile__(
> > > >                  "syscall\n\t"  /* Linux/x86_64 system call */
> > > >                  "testq %0,%0\n\t"      /* check return value */
> > > >                  "jne 1f\n\t"           /* jump if parent */
> > > >                  "popq %%rbx\n\t"       /* get subthread function */
> > > >                  "call *%%rbx\n\t"      /* start subthread function */
> > > >                  "movq %2,%0\n\t"
> > > >                  "syscall\n"            /* exit system call: exit subthread */
> > > >                  "1:\n\t"
> > > >                  "popq %%rbp\t"         /* restore parent's ebp */
> > > >                 :"=a" (retval)
> > > >                 :"0" (__NR_clone3), "i" (__NR_exit)
> > > >                 :"ebx", "ecx", "edx"
> > > >                 );
> > > 
> > > 2. You should probably not separate this into two asm statements. In particular,
> > >    the compiler has no way to know that r10 should be preserved between the two
> > >    statements, and may be confused by the change of rsp.
> > 
> > Yeah, I wondered about that.  Suka, we should probably fix your tests
> > and the i386 code, too.
> > 
> > > 3. r10 and r11 should be listed as clobbered.
> > 
> > D'oh!  I didn't even touch the bottom registers because it continued to
> > work from the i386 version that I stole from Suka.  
> 
> That's again because of the syscall instruction, which saves EFLAGS to r11
> (and sysret restores EFLAGS from r11).
> 
> > 
> > > 4. I fail to see the magic that puts the subthread function pointer in the
> > >    stack.
> > > 
> > > 5. Maybe rdi should contain the subthread argument before calling the subthread?
> > > 
> > > 6. rdi, rsi, rdx, rcx, r8 and r9 should be added to the clobber list because of
> > >    the call to the subthread function.
> > > 
> > > 7. rsi could be used in place of rbx to hold the function pointer, which would
> > >    allow you to remove ebx from the clobber list.
> > > 
> > > 8. I don't see why rbp should be saved. The ABI says it must be saved by the
> > >    callee.
> > > 
> > > 9. Before calling exit(), maybe put some exit code in rdi?
> > 
> > Thanks for looking through this, Louis.  I'll send out another version
> > in a bit.
> 
> Thanks,
> 
> Louis
> 
> -- 
> Dr Louis Rilling			Kerlabs
> Skype: louis.rilling			Batiment Germanium
> Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
> http://www.kerlabs.com/			35700 Rennes



-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: Digital signature
Url : http://lists.linux-foundation.org/pipermail/containers/attachments/20091119/54dbb194/attachment.pgp 


More information about the Containers mailing list