[RFC][PATCH] clone_with_pids()^w eclone() for x86_64

Louis Rilling Louis.Rilling at kerlabs.com
Thu Nov 19 13:26:47 PST 2009


On Thu, Nov 19, 2009 at 09:48:49AM -0800, Dave Hansen wrote:
> On Thu, 2009-11-19 at 10:58 +0100, Louis Rilling wrote:
> > > int clone_with_pids(long flags_low, struct clone_args *clone_args, long args_size,
> > >                  int *pids)
> > > {
> > >         long retval;
> > > 
> > >         __asm__  __volatile__(
> > >                  "movq %3, %%r10\n\t"           /* pids in r10*/
> > >                  "pushq %%rbp\n\t"              /* save value of ebp */
> > >                 :
> > >                 :"D" (flags_low), /* rdi */
> > >                  "S" (clone_args),/* rsi */
> > >                  "d" (args_size), /* rdx */
> > >                  "a" (pids)       /* use rax, which gets moved to r10 */
> > >                 );
> > 
> > 1. The fourth C arg is not in rax, but in rcx.
> 
> Hey Louis,
> 
> So, try as I might, I couldn't get that to work.  I thought it was rcx,
> too.
> 
> So, changing that instruction to:
> 
>                 "movq %3, %%rcx\n\t"           /* pids in r10*/

Hm, no.

I meant (without taking into account my other comments):

         __asm__  __volatile__(
                  "movq %3, %%r10\n\t"           /* pids in r10*/
                  "pushq %%rbp\n\t"              /* save value of rbp */
                 :
                 :"D" (flags_low), /* rdi */
                  "S" (clone_args),/* rsi */
                  "d" (args_size), /* rdx */
                  "c" (pids)       /* use rcx, which gets moved to r10 */
                 );

But actually this is even better :D. gcc has no constraint letter for r10, so
pin a local register variable to it, and the explicit movq goes away:

         register long r10 __asm__("r10") = (long)pids;

         __asm__  __volatile__(
                  "pushq %%rbp\n\t"              /* save value of rbp */
                 :
                 :"D" (flags_low), /* rdi */
                  "S" (clone_args),/* rsi */
                  "d" (args_size), /* rdx */
                  "r" (r10)        /* Linux reads its fourth arg from r10 */
                 );


> 
> and putting 0x11111, etc... in for the args the strace output for the
> syscall looks like this:
> 
>         syscall_299(0x11111, 0x22222, 0x33333, 0x1, 0x1, 0x2, 0, 0, 0,
>         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>         0, 0) = -1 (errno 22)
> 
> and I get -EFAULT back from the function doing the copy_from_user() of
> the pids argument, even when using good values.
> 
> If I use the asm posted above, I get this:
>         
>         syscall_299(0x11111, 0x22222, 0x33333, 0x44444, 0x1, 0x2, 0, 0,
>         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>         0, 0, 0) = -1 (errno 22)
>         
> Or, this from a real call:
>         
>         syscall_299(0x1100011, 0x7fff19f0fd40, 0x38, 0x602070, 0x1, 0x2,
>         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>         0, 0, 0, 0, 0[2992, 377]: Child:
>         
> I had to find r10 basically by trial and error.  I have no idea why it
> works.

r10 is used to pass the fourth arg to the kernel because the syscall instruction
puts the next rip (the return address) in rcx, so rcx cannot carry an argument.
Using r10 instead of rcx is part of the Linux ABI for x86_64.

For all the details, read the comments in
arch/x86/kernel/entry_64.S:ENTRY(system_call).
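
A minimal sketch of that convention, not taken from the patch: the helper
names raw_syscall4() and query_sigmask() are made up for illustration, and
rt_sigprocmask is used only because it is a convenient four-argument call
that fails with EINVAL when the fourth argument is wrong. The fourth argument
goes through a local register variable pinned to r10, and rcx and r11 are
listed as clobbered because the syscall instruction overwrites them.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: the kernel's sigset is 8 bytes, as on x86_64. */
#define KERNEL_SIGSET_BYTES 8

/* Hypothetical helper: issue a system call that takes four arguments. */
static long raw_syscall4(long nr, long a1, long a2, long a3, long a4)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    /* No constraint letter exists for r10, so pin a variable to it. */
    register long r10 __asm__("r10") = a4;

    __asm__ __volatile__(
        "syscall"
        : "=a" (ret)                     /* result comes back in rax */
        : "0" (nr),                      /* syscall number in rax */
          "D" (a1),                      /* rdi */
          "S" (a2),                      /* rsi */
          "d" (a3),                      /* rdx */
          "r" (r10)                      /* r10, not rcx */
        : "rcx", "r11", "memory");       /* overwritten by syscall */
    return ret;                          /* negative errno on failure */
#else
    /* Fallback so the sketch builds elsewhere; returns -1 and sets errno. */
    return syscall(nr, a1, a2, a3, a4);
#endif
}

/* Read the current signal mask. With a NULL new set, "how" is ignored. */
static long query_sigmask(unsigned long long *mask)
{
    return raw_syscall4(SYS_rt_sigprocmask, 0, 0, (long)mask,
                        KERNEL_SIGSET_BYTES);
}
```

If the fourth argument were left in rcx instead, the kernel would see
whatever happened to be in r10 as the sigset size and would most likely
reject the call.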

> 
> > > 
> > >         __asm__ __volatile__(
> > >                  "syscall\n\t"  /* Linux/x86_64 system call */
> > >                  "testq %0,%0\n\t"      /* check return value */
> > >                  "jne 1f\n\t"           /* jump if parent */
> > >                  "popq %%rbx\n\t"       /* get subthread function */
> > >                  "call *%%rbx\n\t"      /* start subthread function */
> > >                  "movq %2,%0\n\t"
> > >                  "syscall\n"            /* exit system call: exit subthread */
> > >                  "1:\n\t"
> > >                  "popq %%rbp\t"         /* restore parent's ebp */
> > >                 :"=a" (retval)
> > >                 :"0" (__NR_clone3), "i" (__NR_exit)
> > >                 :"ebx", "ecx", "edx"
> > >                 );
> > 
> > 2. You should probably not separate this into two asm statements. In particular,
> >    the compiler has no way to know that r10 should be preserved between the two
> >    statements, and may be confused by the change of rsp.
> 
> Yeah, I wondered about that.  Suka, we should probably fix your tests
> and the i386 code, too.
> 
> > 3. r10 and r11 should be listed as clobbered.
> 
> D'oh!  I didn't even touch the bottom registers because it continued to
> work from the i386 version that I stole from Suka.  

That's again because of the syscall instruction, which saves RFLAGS to r11
(and sysret restores RFLAGS from r11).
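
To make the clobbering visible, here is a small sketch that captures rcx and
r11 right after a syscall instruction. The struct and function names are
hypothetical, and getpid is chosen only because it has no side effects. On
Linux/x86_64 the captured values are typically the return address and the
saved flags, but the only thing to rely on is that both registers are
destroyed.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record of what one syscall left behind. */
struct syscall_trace {
    long ret;                 /* value returned in rax */
    unsigned long rcx_after;  /* rcx as seen after the syscall */
    unsigned long r11_after;  /* r11 as seen after the syscall */
};

static struct syscall_trace traced_getpid(void)
{
    struct syscall_trace t = { 0, 0, 0 };
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    unsigned long rcx;
    register unsigned long r11 __asm__("r11");

    /* Declaring rcx and r11 as outputs tells gcc they are overwritten. */
    __asm__ __volatile__(
        "syscall"
        : "=a" (ret), "=c" (rcx), "=r" (r11)
        : "0" ((long)SYS_getpid)
        : "memory");

    t.ret = ret;
    t.rcx_after = rcx;
    t.r11_after = r11;
#else
    /* Fallback so the sketch builds elsewhere; nothing to observe here. */
    t.ret = (long)getpid();
#endif
    return t;
}
```

Code that omits rcx and r11 from its outputs or clobber list may appear to
work, as the i386-derived version did, until the compiler decides to keep a
live value in one of them.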

> 
> > 4. I fail to see the magic that puts the subthread function pointer in the
> >    stack.
> > 
> > 5. Maybe rdi should contain the subthread argument before calling the subthread?
> > 
> > 6. rdi, rsi, rdx, rcx, r8 and r9 should be added to the clobber list because of
> >    the call to the subthread function.
> > 
> > 7. rsi could be used in place of rbx to hold the function pointer, which would
> >    allow you to remove ebx from the clobber list.
> > 
> > 8. I don't see why rbp should be saved. The ABI says it must be saved by the
> >    callee.
> > 
> > 9. Before calling exit(), maybe put some exit code in rdi?
> 
> Thanks for looking through this, Louis.  I'll send out another version
> in a bit.
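
For point 9, a rough sketch of passing an exit code in rdi. The helper names
are hypothetical, and fork() stands in for the clone call under review, so
this only illustrates the exit path: the parent reads back the status that
the child placed in rdi.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: exit the calling thread with an explicit status. */
static void raw_exit(long status)
{
#if defined(__x86_64__) && defined(__linux__)
    __asm__ __volatile__(
        "syscall"
        :
        : "a" ((long)SYS_exit),   /* syscall number in rax */
          "D" (status)            /* exit code in rdi */
        : "rcx", "r11", "memory");
#else
    /* Fallback so the sketch builds elsewhere. */
    _exit((int)status);
#endif
    for (;;) { }                  /* not reached */
}

/* Fork a child that exits through raw_exit() and report what wait sees. */
static int exit_status_seen_by_parent(int status)
{
    int wstatus;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        raw_exit(status);         /* child: never returns */
    if (waitpid(pid, &wstatus, 0) != pid)
        return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

Without an explicit value, rdi holds whatever the subthread function left
there, so the status reported to the parent is effectively arbitrary.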

Thanks,

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes