[PATCH RFC 3/5] tun: vringfd receive support.

Max Krasnyanskiy maxk at qualcomm.com
Thu Apr 10 10:18:52 PDT 2008


Rusty Russell wrote:
> On Wednesday 09 April 2008 05:49:15 Max Krasnyansky wrote:
>> Rusty Russell wrote:
>>> This patch modifies tun to allow a vringfd to specify the receive
>>> buffer.  Because we can't copy to userspace in bh context, we queue
>>> like normal then use the "pull" hook to actually do the copy.
>>>
>>> More thought needs to be put into the possible races with ring
>>> registration and a simultaneous close, for example (see FIXME).
>>>
>>> We use struct virtio_net_hdr prepended to packets in the ring to allow
>>> userspace to receive GSO packets in future (at the moment, the tun
>>> driver doesn't tell the stack it can handle them, so these cases are
>>> never taken).
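
For reference, a minimal sketch of how userspace might consume one of these
buffers, assuming exactly the layout described above (a struct virtio_net_hdr
immediately followed by the packet data); handle_packet() is just a
placeholder, not something from the patch:

#include <linux/virtio_net.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder for whatever the application does with a raw packet. */
static void handle_packet(const uint8_t *pkt, size_t len)
{
        (void)pkt;
        (void)len;
}

static void consume_ring_buffer(const uint8_t *buf, size_t len)
{
        const struct virtio_net_hdr *hdr = (const struct virtio_net_hdr *)buf;

        if (len < sizeof(*hdr))
                return;         /* truncated buffer, nothing to do */

        /* The tun driver does not advertise GSO yet, so gso_type should
         * always be VIRTIO_NET_HDR_GSO_NONE for now; checking it anyway
         * keeps the code ready for when it does. */
        if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE)
                return;         /* GSO handling would go here */

        handle_packet(buf + sizeof(*hdr), len - sizeof(*hdr));
}
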
>> In general the code looks good. The only thing I could not convince myself
>> of is whether having a generic ring buffer makes sense or not.
>> At least the TUN driver would be more efficient if it had its own simple
>> ring implementation. Less indirection, fewer callbacks, fewer if()s, etc.
>> TUN already has the file descriptor, and having two additional fds for the
>> rx and tx rings is a waste (think of a VPN server that has to have a bunch
>> of TUN fds). Also, as I mentioned before, Jamal and I wanted to expose some
>> of the SKB fields through the TUN device. With rx/tx rings the natural way
>> of doing that would be the ring descriptor itself. It can of course be done
>> the same way we copy proto info (PI) and GSO stuff before the packet, but
>> that means more copy_to_user() calls and yet more checks.
>>
>> So, what am I missing? Why do we need a generic ring for the TUN? I looked
>> at the lguest code a bit and it seems that we need a bunch of
>> network-specific code anyway. The cool thing is that you can now mmap the
>> rings into the guest directly, but the same thing can be done with
>> TUN-specific rings.
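
Just to illustrate what I mean (purely hypothetical, none of these names
exist anywhere), a tun-specific descriptor could carry the skb metadata
directly instead of prepending another header to the packet data:

#include <stdint.h>

struct tun_ring_desc {
        uint64_t addr;          /* userspace address of the packet buffer */
        uint32_t len;           /* length filled in by the kernel */
        uint16_t flags;         /* e.g. checksum state */
        /* skb fields exposed straight from the descriptor, avoiding the
         * extra copy_to_user() of a prepended PI/GSO header: */
        uint32_t mark;          /* skb->mark */
        uint32_t priority;      /* skb->priority */
        uint16_t queue_mapping; /* skb->queue_mapping */
};
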
> 
> I started modifying tun to do this directly, but it ended up with a whole heap
> of code just for the rings, and a lot of current code (e.g. read, write, poll)
> ended up inside an 'if (tun->rings) ... else {'.  Having a natural poll()
> interface for the rings made more sense, so having them be their own fds fell
> out naturally.
Hmm, the version that I sent you a while ago (remember, I sent you an attachment 
with a prototype of the new tun driver and user-space code) was not that bad in 
that area. I mean, it did not touch the existing read()/write() path. The 
difference was that it allocated the rings and the data buffer in the kernel 
and mapped them into user space. That is not what you guys need, but that's a 
separate issue.
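
To give an idea of what that looked like from the application side (the size
below is made up; the real prototype is not shown here), the ring and its
buffers were simply mmap'd off the existing tun fd:

#include <stdio.h>
#include <sys/mman.h>

#define TUN_RING_BYTES (1 << 20)        /* made-up size: ring plus buffers */

static void *map_tun_ring(int tun_fd)
{
        void *ring = mmap(NULL, TUN_RING_BYTES, PROT_READ | PROT_WRITE,
                          MAP_SHARED, tun_fd, 0);

        if (ring == MAP_FAILED) {
                perror("mmap tun ring");
                return NULL;
        }
        /* One fd per tun device; no separate ring fds needed. */
        return ring;
}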

The fd thing could be an issue. As I mentioned, the example would be a VPN 
server (OpenVPN, etc.) with a bunch of client connections (typically one tun 
per connection).
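
To put a number on it: with separate rx and tx ring fds, the server's event
loop needs three descriptors per client instead of one. A rough sketch
(conn[], n_conns and the pollfd layout are just for illustration):

#include <poll.h>

struct conn {
        int tun_fd;             /* the tun device itself */
        int rx_ring_fd;         /* vringfd used for receive */
        int tx_ring_fd;         /* vringfd used for transmit */
};

/* pfds must have room for 3 * n_conns entries. */
static int wait_for_events(const struct conn *conn, int n_conns,
                           struct pollfd *pfds)
{
        int i;

        for (i = 0; i < n_conns; i++) {
                pfds[3 * i + 0].fd = conn[i].tun_fd;
                pfds[3 * i + 0].events = POLLIN;
                pfds[3 * i + 1].fd = conn[i].rx_ring_fd;
                pfds[3 * i + 1].events = POLLIN;
                pfds[3 * i + 2].fd = conn[i].tx_ring_fd;
                pfds[3 * i + 2].events = POLLOUT;
        }
        return poll(pfds, 3 * n_conns, -1);
}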

> I decided to float this version because it does minimal damage to tun, and I
> know that other people have wanted rings before: I'd like to know if this is
> likely to be generic enough for them.
I see.

I'll try to spend some time on this in the near future and take a crack at a 
version with TUN-specific rings. Then again, I have said that many times now, 
and it may not happen in a near-enough future :). In the meantime, if your 
current version helps you guys a lot, I do not mind us putting it in. We can 
always add another mode or something that uses internal rings and gradually 
obsolete the old read()/write() path and the generic rings.

Max
