[Bridge] VLAN tags in mac_len

Toshiaki Makita toshiaki.makita1 at gmail.com
Tue Jul 2 10:00:48 UTC 2019


On 2019/06/28 20:02, Johannes Berg wrote:
> On Mon, 2019-06-17 at 20:15 +0900, Toshiaki Makita wrote:
>> I'll try to explain the problem I see, which cannot be fixed by option 1...
>> The bug is in tcf_vlan_act(), and mainly in skb->data, not in mac_len.
>>
>> Consider about vlan packets from NIC, but non-hw-accelerated, where
>> vlan devices are configured to receive them.
>>
>> When __netif_receive_skb_core() is called, skb is like this.
>>
>> +-----+------+--------
>>> eth | vlan | TCP/IP
>>
>> +-----+------+--------
>>         ^
>>        data
>>
>> skb->data is at the beginning of the vlan header.
> 
> Right.
> 
>> This is reasonable because we did not process the vlan tag at this point.
> 
> I think with this simple sentence you just threw a whole new semantic
> issue into the mix, one that I at least hadn't considered.
>
> However, it's not clear to me whether we should consider a tag as
> processed or not when we push it.

It's clear that we always insert a tag as unprocessed in a single tag case.
The tag is inserted as hw-offloaded one, and hw-offloaded one is treated
as an unprocessed tag in __netif_receive_skb_core().
The single tag is the most common usage I think, so unprocessed should be
the expected behavior.

Also, tc vlan act was originally introduced to replace OVS vlan action.
It's in fact used as HW offload path of ovs-vswitchd, and should behave the same as
OVS push_vlan action. And push_vlan action inserts a tag as unprocessed one.

> In a sense, this means we should have two different VLAN tag push
> options - considering it processed or unprocessed. Or maybe it should
> always be considered unprocessed, but that's not what we do today.
>
>> Then after vlan_do_receive() (receive the skb on a vlan device), the skb is like this.
>>
>> +-----+--------
>>> eth | TCP/IP
>>
>> +-----+--------
>>         ^
>>        data
>>
>> Or if reorder_hdr is off (which does not remove vlan tags when receiving on vlan devices),
>>
>> +-----+------+--------
>>> eth | vlan | TCP/IP
>>
>> +-----+------+--------
>>                ^
>>               data
>>
>> Relying on this mechanism, we are currently able to handle multiple vlan tags.
>>
>> For example if we have 2 tags,
>>
>> - On __netif_receive_skb_core() invocation
>>
>> +-----+------+------+--------
>>> eth | vlan | vlan | TCP/IP
>>
>> +-----+------+------+--------
>>         ^
>>        data
>>
>> - After first vlan_do_receive()
>>
>> +-----+------+--------
>>> eth | vlan | TCP/IP
>>
>> +-----+------+--------
>>         ^
>>        data
>>
>> Or if reorder_hdr is off,
>>
>> +-----+------+------+--------
>>> eth | vlan | vlan | TCP/IP
>>
>> +-----+------+------+--------
>>                ^
>>               data
>>
>> When we process one tag, the data goes forward by one tag.
> 
> Right, that's a very good point.
> 
>> Now looking at TC vlan case...
>>
>> After it inserts two tags, the skb looks like:
>>
>> (The first tag is in vlan_tci)
>> +-----+------+--------
>>> eth | vlan | TCP/IP
>>
>> +-----+------+--------
>>                ^
>>               data
>>
>> The data pointer went forward before we process it.
>> This is apparently wrong. I think we don't want to (or cannot?) handle cases like this
>> after tcf_vlan_act(). This is why I said we should remember mac_len there.
> 
> Right, makes a lot of sense.
> 
> If you consider a tc VLAN pop, you'd argue that it should pop the next
> unprocessed tag I guess, since if it was processed then it doesn't
> really exist any more (semantically, you still see it if reorder_hdr is
> off), right?

Right.

>> So, my opinion is:
>> On ingress, data pointer can be at the end of vlan header and mac_len probably should
>> include vlan tag length, but only after the vlan tag is processed.
> 
> You're basically arguing for option (3), I think, making VLAN push/pop
> not manipulate mac_len since they can just push/pop *unprocessed* tags,
> right?

Ah, true, on the second thought (2b) is not an appropriate fix but (3) is.
(3) more correctly emulates already tagged packets from wire so it should
cause least confusion.

> I fear this will cause all kinds of trouble in other code. Perhaps we
> need to make this processed/unprocessed state more explicit.

But (3) makes mac_len the same as the already tagged packets from NICs.
If other code cannot handle the packets correctly, they need to be fixed anyway.

As you explained OVS MPLS seems to rely on mac_len adjustment by skb_vlan_push(), but
it means the OVS MPLS code cannot correctly handle double-tagged packets from NICs,
so it needs a fix anyway (use __vlan_get_protocol() to get the real mac_len?).

>> Bridge may need to handle mac_len that is not equal to ETH_HLEN but to me it's a
>> different problem.
> 
> Yes. Like I just said to Daniel, I think we should make bridge handle
> mac_len so that we can just exclude it from this whole discussion.
> Regardless of the mac_len and processed/unprocessed tags, it would just
> work as expected.

That's OK, it should help packets from vlan devices with reorder_hdr off.

Toshiaki Makita


More information about the Bridge mailing list