[Lightning-dev] Quick analysis of channel_update data

Tue Jan 8 23:38:12 UTC 2019

Christian Decker <decker.christian at gmail.com> writes:
> Assume that we have a network in which a node D receives the updates
> from a node A through two or more separate paths:
>
> A --- B --- D
>  \--- C ---/
>
> And let's assume that some channel of A (c_A) is flapping (not the ones
> to B and C). A will send out two updates, one disables and the other one
> re-enables c_A, otherwise they are identical (timestamp and signature
> are different as well of course).

> The flush interval in B is sufficient
> to see both updates before flushing, hence both updates get dropped and
> nothing apparently changed (D doesn't get told about anything from
> B). The flush interval of C triggers after getting the re-enable, and D
> gets the disabling update, followed by the enabling update once C's
> flush interval triggers again.

Yes, we save gossip from B->D, but not C->D.  That's OK.

In general we won't get coalescing if the DOWN/UP combo spans a gossip
flush.  If everyone uses the same 60-second timer, this will continue to
happen across the network AFAICT?  We should probably change our gossip
timer to 90 +/- 30 seconds, which would (I think?) give more chance of
flap suppression.
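
A minimal sketch of the "90 +/- 30 seconds" timer, for illustration only.
The names (BASE_SECONDS, JITTER_SECONDS, next_flush_delay) are hypothetical,
and drawing the delay uniformly is an assumption; the mail does not say how
the spread should be distributed.

```python
import random

BASE_SECONDS = 90    # centre of the proposed flush interval
JITTER_SECONDS = 30  # proposed +/- spread around the centre


def next_flush_delay(rng=random):
    """Return a flush delay in seconds, drawn uniformly from [60, 120].

    The uniform distribution is an assumption made for this sketch.
    """
    low = BASE_SECONDS - JITTER_SECONDS
    high = BASE_SECONDS + JITTER_SECONDS
    return rng.uniform(low, high)
```

Picking a fresh delay before each flush, rather than once at startup, keeps
neighbouring nodes from settling into the same rhythm.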

> Worse if the connection A-C gets severed
> between the updates, now C and D learned that the channel is disabled
> and will not get the re-enabling update since B has dropped that one
> altogether. If B now gets told by D about the disable, it'll also go
> "ok, I'll disable it as well", leaving the entire network believing that
> the channel is disabled.

You're right; B needs to remember the timestamp of the last update it
discarded, and ignore any update with an earlier timestamp.
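
The fix can be sketched roughly as follows.  This is a simplified model,
not any implementation's actual code: ChannelUpdate carries only a channel
id, a timestamp and a disabled flag (the real message has more fields), and
GossipRelay and its method names are hypothetical.  The relay coalesces
updates between flushes, records the timestamp of whatever it drops, and
refuses anything that is not newer than that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelUpdate:
    # Simplified stand-in for a channel_update; fields are an assumption.
    channel_id: str
    timestamp: int
    disabled: bool


class GossipRelay:
    def __init__(self):
        self.last_sent = {}     # channel_id -> last update relayed to peers
        self.pending = {}       # channel_id -> newest update awaiting flush
        self.discarded_ts = {}  # channel_id -> timestamp of newest dropped update

    def _newest_seen(self, channel_id):
        # Newest timestamp we have relayed, queued, or discarded.
        stamps = [self.discarded_ts.get(channel_id, -1)]
        for table in (self.last_sent, self.pending):
            if channel_id in table:
                stamps.append(table[channel_id].timestamp)
        return max(stamps)

    def receive(self, update):
        """Queue an update; return False if it is stale and was ignored."""
        if update.timestamp <= self._newest_seen(update.channel_id):
            return False
        self.pending[update.channel_id] = update
        return True

    def flush(self):
        """Return the updates to relay, dropping ones that change nothing."""
        out = []
        for cid, update in self.pending.items():
            prev = self.last_sent.get(cid)
            if prev is not None and prev.disabled == update.disabled:
                # Flap was coalesced away: remember how far we have seen.
                self.discarded_ts[cid] = update.timestamp
            else:
                self.last_sent[cid] = update
                out.append(update)
        self.pending.clear()
        return out
```

In the scenario above, B sees the disable and the re-enable before flushing,
relays nothing, and records the re-enable's timestamp.  When D later offers
the older disable, receive() returns False and B's view stays enabled.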

So, in this (fairly obscure) scenario, the flapping channel gets
penalized.  But the network is happier, and this suppression is a nice
local policy.

> If the routing
> protocol is too chatty, we should make efforts towards local policies at
> the senders of the update to reduce the number of flapping updates, not
> build in-network deduplications. Maybe something like "eager-disable"
> and "lazy-enable" is what we should go for, in which disables are sent
> right away, and enables are put on an exponential backoff timeout (after
> all what use are flappy nodes for routing?).

Well, we lazy-disable because we assume it's still advertised as
available.  We eager-enable (iff we sent a disable) because we assume
it's advertised as unavailable so we won't get traffic through it.
Though we could set a delay of 30 seconds on the enable, I think we're
already following current best practice?
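
A rough sketch of that sender-side policy, under a loudly stated
assumption: "lazy" is taken here to mean the disable is only broadcast once
someone actually tries to forward through the channel while the peer is
down.  The class and method names are hypothetical, and the optional
30-second enable delay is left out.

```python
class ChannelGossipPolicy:
    """Lazy-disable / eager-enable for one channel (illustrative only)."""

    def __init__(self):
        self.peer_connected = True
        self.announced_disabled = False
        self.sent = []  # log of broadcasts: "disable" or "enable"

    def on_disconnect(self):
        # Lazy: note the peer is gone, but broadcast nothing yet.
        self.peer_connected = False

    def on_forward_attempt(self):
        """Return True if the forward can proceed."""
        if self.peer_connected:
            return True
        if not self.announced_disabled:
            # Assumed trigger: first failed forward sends the disable.
            self.announced_disabled = True
            self.sent.append("disable")
        return False

    def on_reconnect(self):
        self.peer_connected = True
        if self.announced_disabled:
            # Eager: re-enable at once, but only if we sent a disable.
            self.announced_disabled = False
            self.sent.append("enable")
```

Under this model a channel that flaps while nobody routes through it
produces no updates at all, which is the point of being lazy on the
disable side.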

Cheers,
Rusty.