[PATCHv5 25/37] x86/vdso: Switch image on setns()/clone()

hpa at zytor.com hpa at zytor.com
Thu Aug 1 06:09:20 UTC 2019

On July 31, 2019 10:34:26 PM PDT, Andy Lutomirski <luto at kernel.org> wrote:
>On Mon, Jul 29, 2019 at 2:58 PM Dmitry Safonov <dima at arista.com> wrote:
>> As it has been discussed on timens RFC, adding a new conditional
>> `if (inside_time_ns)` on VDSO for all processes is undesirable.
>> It will add a penalty for everybody as branch predictor may
>> the jump. Also there are instruction cache lines wasted on cmp/jmp.
>> +#ifdef CONFIG_TIME_NS
>> +int vdso_join_timens(struct task_struct *task)
>> +{
>> +       struct mm_struct *mm = task->mm;
>> +       struct vm_area_struct *vma;
>> +
>> +       if (down_write_killable(&mm->mmap_sem))
>> +               return -EINTR;
>> +
>> +       for (vma = mm->mmap; vma; vma = vma->vm_next) {
>> +               unsigned long size = vma->vm_end - vma->vm_start;
>> +
>> +               if (vma_is_special_mapping(vma, &vvar_mapping) ||
>> +                   vma_is_special_mapping(vma, &vdso_mapping))
>> +                       zap_page_range(vma, vma->vm_start, size);
>> +       }
>This is, unfortunately, fundamentally buggy.  If any thread is in the
>vDSO or has the vDSO on the stack (due to a signal, for example), this
>will crash it.  I can think of three solutions:
>1. Say that you can't setns() if you have other mms and ignore the
>signal issue.  Anything with green threads will disapprove.  It's also
>rather gross.
>2. Make it so that you can flip the static branch safely.  As in my
>other email, you'll need to deal with CoW somehow,
>3. Make it so that you can't change timens, or at least that you can't
>turn timens on or off, without execve() or fork().
>BTW, that static branch probably needs to be aligned to a cache line
>or something similar to avoid all the nastiness with trying to poke
>text that might be concurrently executing.  This will be a mess.

Since we are talking about different physical addresses I believe we should be okay as long as they don't cross page boundaries, and even if they do it can be managed with proper page invalidation sequencing – it's not like the problems of having to deal with XMC on live pages like in the kernel.

Still, you really need each instruction sequence to be present, with the only difference being specific patch sites.

Any fundamental reason this can't be strictly data driven? Seems odd to me if it couldn't, but I might be missing something obvious.
Sent from my Android device with K-9 Mail. Please excuse my brevity.

More information about the Containers mailing list