[Lightning-dev] Quick analysis of channel_update data

Fabrice Drouin fabrice.drouin at acinq.fr
Mon Feb 18 15:34:47 UTC 2019


I'll start collecting and checking data again, but from what I see now,
using our checksum extension still significantly reduces gossip
traffic.
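
Roughly, the idea behind the checksum extension is to be able to tell
"this update only re-signs and re-timestamps what I already have"
without downloading it. A minimal sketch (offsets follow the BOLT 7
channel_update wire layout; zlib's CRC32 stands in here for the CRC32C
the extension actually specifies):

    import zlib

    def update_checksum(channel_update: bytes) -> int:
        # channel_update layout (BOLT 7): signature (64 bytes),
        # chain_hash (32), short_channel_id (8), timestamp (4), then
        # the fields that actually carry routing information.
        body = channel_update[64:]        # drop the signature
        body = body[:40] + body[44:]      # drop the timestamp
        return zlib.crc32(body)

    # Two updates that differ only in signature/timestamp get the same
    # checksum, so a pure refresh never needs to be fetched again.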

I'm not saying that heuristics to reduce the number of updates cannot
help, but I don't think they should be our primary way of handling
such traffic. If you've opened channels to nodes that are unreliable
then you should eventually close these channels, but delaying the
publication of updates that disable/enable them has an impact on
everyone, especially on nodes that mostly send payments (as opposed to
relaying or receiving them).
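
To be clear about what "delaying" means here: the heuristic under
discussion is roughly a hold timer on locally-originated disable/enable
updates, so that a short flap never reaches the network. A sketch (the
120-second figure is from Rusty's message below; the class and method
names are made up):

    import time

    HOLD_SECONDS = 120  # grace period before broadcasting a state change

    class FlapDamper:
        def __init__(self):
            self.pending = {}    # short_channel_id -> (enabled, deadline)
            self.announced = {}  # short_channel_id -> last broadcast state

        def on_local_state_change(self, scid, enabled):
            # Schedule the update instead of broadcasting immediately.
            self.pending[scid] = (enabled, time.monotonic() + HOLD_SECONDS)

        def poll(self, broadcast):
            # Called periodically: only states that survived the hold
            # period (and actually differ from what was last announced)
            # are signed and broadcast.
            now = time.monotonic()
            for scid, (enabled, deadline) in list(self.pending.items()):
                if now >= deadline:
                    if self.announced.get(scid) != enabled:
                        broadcast(scid, enabled)
                        self.announced[scid] = enabled
                    del self.pending[scid]

Note that this only helps at the source: it does nothing for the
periodic refreshes, which (per Rusty's numbers below) are the bulk of
the traffic.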

Cheers,

Fabrice

On Mon, 18 Feb 2019 at 13:10, Rusty Russell <rusty at rustcorp.com.au> wrote:
>
> BTW, I took a snapshot of our gossip store from two weeks back, which
> simply stores all gossip in order (compacting every week or so).
>
> channel_updates which updated existing channels: 17766
> ... which changed *only* the timestamps: 12644
>     ... which were a week since the last: 7233
> ... which only changed the disable/enable: 4839
>
> So there are about 5100 timestamp-only updates less than a week apart
> (about 2000 are 1036 seconds apart, who is this?).
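
As a rough illustration, a breakdown like the one above can be
computed from a store of raw updates along these lines (the field
names are made up, not the actual gossip_store format):

    from collections import defaultdict

    def classify(updates):
        # `updates` is an iterable of dicts with illustrative keys:
        # scid, timestamp, disabled, and the routing fields.
        ROUTING = ("fee_base_msat", "fee_proportional_millionths",
                   "cltv_expiry_delta", "htlc_minimum_msat", "disabled")
        last = {}
        counts = defaultdict(int)
        for u in sorted(updates, key=lambda u: (u["scid"], u["timestamp"])):
            prev = last.get(u["scid"])
            if prev is not None:
                changed = [f for f in ROUTING if u[f] != prev[f]]
                if not changed:
                    counts["timestamp_only"] += 1
                elif changed == ["disabled"]:
                    counts["disable_enable_only"] += 1
                else:
                    counts["other_change"] += 1
            last[u["scid"]] = u
        return dict(counts)
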
>
> 1. I'll look at getting even more conservative with flapping (120-second
>    delay if we've just sent an update) but that doesn't seem to be the
>    majority of traffic.
> 2. I'll also slow down refreshes to every 12 days, rather than 7, but
>    again it's only a marginal change.
>
> But basically, the majority of updates I saw two weeks ago are actually
> refreshes, not spam.
>
> Hope that adds something?
> Rusty.
>
> Fabrice Drouin <fabrice.drouin at acinq.fr> writes:
> > Additional info on channel_update traffic:
> >
> > Comparing daily backups of routing tables over the last 2 weeks shows
> > that nearly all channels get at least one new update every day. This
> > means that channel_update traffic is not primarily caused by nodes
> > publishing new updates when channels are about to become stale:
> > otherwise we would see 1/14th of our channels getting a new update on
> > the first day, then another 1/14th on the second day, and so on. This
> > is confirmed by comparing routing table backups over a single day:
> > nearly all channels were updated, on average once, with an update that
> > almost always does not include new information.
> >
> > It could be caused by "flapping" channels, probably because the hosts
> > running them are unreliable (as in, often offline).
> >
> > Heuristics can be used to reduce traffic, but that is orthogonal to
> > the problem of improving our current sync protocol.
> > Also, these heuristics would probably be used to close channels to
> > unreliable nodes instead of filtering/delaying the publication of
> > updates for them.
> >
> > Finally, this is not just obsessing over bandwidth (though bandwidth
> > is a real issue for most mobile users). I'm also obsessing over
> > startup time and payment UX :), because they do matter a lot for
> > mobile users, and I would like to push the current gossip design as
> > far as it can go. I also think that we'll face the same issue when
> > designing inventory messages for channel_update messages.
> >
> > Cheers,
> >
> > Fabrice
> >
> >
> >
> > On Wed, 9 Jan 2019 at 00:44, Rusty Russell <rusty at rustcorp.com.au> wrote:
> >>
> >> Fabrice Drouin <fabrice.drouin at acinq.fr> writes:
> >> > I think there may even be a simpler case where not replacing updates
> >> > will result in nodes not knowing that a channel has been re-enabled:
> >> > suppose you got 3 updates U1, U2, U3 for the same channel, U2 disables
> >> > it, U3 enables it again and is the same as U1. If you discard U3 and
> >> > just keep U1, and your peer has U2, how will you tell them that the
> >> > channel has been enabled again? Unless "discard" here means keep the
> >> > update but don't broadcast it?
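
To make the scenario concrete (timestamps are made up): a sync that
only offers updates newer than what the peer already has can never
repair this.

    # Three updates for one channel:
    U1 = {"timestamp": 1000, "disabled": False}  # enabled
    U2 = {"timestamp": 2000, "disabled": True}   # disabled
    U3 = {"timestamp": 3000, "disabled": False}  # re-enabled, same content as U1

    ours = U1      # we "discarded" U3 because its content equals U1
    theirs = U2    # a peer that saw the disable but missed U3

    # Timestamp-based sync only sends updates newer than the peer's:
    print(ours["timestamp"] > theirs["timestamp"])   # False: the peer
    # never learns that the channel has been enabled again.
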
> >>
> >> This can only happen if you lose connection to the peer(s) which
> >> sent U2 before they send U3.
> >>
> >> Again, this corner case penalizes flapping channels.  If we also
> >> ratelimit our own enables to 1 per 120 seconds, you won't hit this case?
> >>
> >> > But then there's a risk that nodes would discard channels as stale
> >> > because they don't get new updates when they reconnect.
> >>
> >> You need to accept redundant updates after 1 week, I think.
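
In other words, a content-identical update still has to be accepted
and relayed once the previous one is about a week old, otherwise peers
will prune the channel as stale at the two-week mark. A sketch of that
rule, with illustrative field names:

    REFRESH_AFTER = 7 * 24 * 3600   # accept redundant updates after a week

    def accept_update(known, new):
        # `known` is the update we already have for this channel (or None),
        # `new` is the incoming one; both are dicts with a timestamp and
        # the remaining (routing) content.
        if known is None:
            return True
        if new["timestamp"] <= known["timestamp"]:
            return False
        if new["content"] != known["content"]:
            return True                    # real change: always propagate
        # Redundant content: only useful as a keep-alive refresh.
        return new["timestamp"] >= known["timestamp"] + REFRESH_AFTER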
> >>
> >> Cheers,
> >> Rusty.

