[PATCH 1/4] Unify skb read/write functions and fix for fragmented buffers

Mon Nov 16 11:05:18 PST 2009

Dan Smith wrote:
>>> struct ckpt_hdr_socket_buffer {
>>> struct ckpt_hdr h;
>>> +	__u64 transport_header;
>>> +	__u64 network_header;
>>> +	__u64 mac_header;
>>> +	__u64 lin_len; /* Length of linear data */
>>> +	__u64 frg_len; /* Length of fragment data */
>>> +	__u64 skb_len; /* Length of skb (adjusted) */
>>> +	__u64 hdr_len; /* Length of skipped header */
>>> +	__u64 mac_len;
> 
> OL> Can you use u32 (or even less ?) for these ?
> 
> Well, in the structure the len is an unsigned int, so u64 seemed
> appropriate.
> 
>>> __s32 sk_objref;
>>> __s32 pr_objref;
>>> +	__u16 protocol;
>>> +	__u16 nr_frags;
>>> +	__u8 cb[48];
> 
> OL> Do you it will ever be required that cb[] be aligned ?
> 
> It's not aligned in the real structure and it's mostly opaque data, so
> no, I wouldn't think so.
> 
> OL> I'm unsure how much of a performance issue this is - I sort of
> OL> expected a comment from Dave Hansen about this; Did you consider
> OL> reusing the restore_read_page() from checkpoint/memory.c ?  (and
> OL> the matching checkpoint_dump_page() for the checkpoint).
> 
> Nope, I'll take a look.
> 
>>> +	for (i = 0; i < h->nr_frags; i++) {
> 
> OL> Sanitize h->nr_frags ?
> 
> To what?  I've already checked h->frg_len at this point, so I think
> maybe just making sure frag_len never crosses zero would be
> sufficient.

MAX_SKB_FRAGS.

You use @i when calling skb_add_rx_frag() which in turn uses it
to index in the skb's fragments array without checking.

> 
>>> +	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
>>> +		skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
>>> +		u8 *vaddr = kmap(frag->page);
> 
> OL> Here, too, consider checkpoint_dump_page() to avoid kmap() ?
> 
> OL> It makes sense to have a scratch page to be used (also) for this,
> OL> on ckpt_ctx - to avoid alloc/dealloc repeatedly.
> 
> Hmm, is there an alloc/dealloc here?

Sorry, should have been more explicit: these two functions use an
temporary page to read the data to (write the data from) without
holding kmap(); instead they use kmap_atomic(), copy the data to
the scratch page, kunmap_atomic(), and then write the data.

When dumping memory, the scratch page is allocated once for every
big group of pages and then freed Here, you will need to allocate
for each skb, which makes less sense - hence my suggestion that
we keep a scratch page for that purpose. (and it can be used by
the memory dump/restore too).

> 
> I'll take a look at checkpoint_dump_page()...
> 

Oren.