[Lightning-dev] Improve Lightning payment reliability through better error attribution

Fri Jun 14 10:59:26 UTC 2019

Hi ZmnSCPxj,

> > That is definitely a concern. It is up to senders how to interpret the
> > received timestamps. They can decide to tolerate slight variations, or they
> > could just look at the difference between the in and out timestamps,
> > abandoning the synchronization requirement altogether (a node could also
> > just report that difference instead of two timestamps). The held duration
> > is enough to identify a pair of nodes, one of which is responsible for the
> > delay.
> >
> > Example (held durations in parentheses):
> >
> > A (15 secs) -> B (14 secs) -> C (3 secs) -> D (2 secs)
> >
> > In this case either B or C is delaying the payment. We'd penalize the
> > channel between B and C.
>
> This seems better.
> If B is at fault, it could lie and reduce its reported delta time, but
> that simply means it will be punished with A.
> If C is at fault, it could lie and increase its reported delta time, but
> that simply means it will be punished with D.
>
> I presume that the delta time is the time difference from when it sends
> `update_add_htlc` and when it receives `update_fulfill_htlc`, or when it
> gets an irrevocably committed `update_fail_htlc` + `revoke_and_ack`.
> Is that accurate?
>

Yes, that is accurate, although using the time difference between receiving
the `update_add_htlc` and sending back the `update_fail_htlc` would work
too. It would then also include the node's own processing time.
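The attribution rule from the example above can be sketched as follows. This
is a hypothetical illustration, not part of the spec or any implementation:
`blame_pair` is an invented name, and it assumes each hop reports a single
held duration in milliseconds (one delta rather than two timestamps). It
picks the adjacent pair with the largest drop in held duration, since that
time must have been spent by one of the two nodes in the pair.

```python
from typing import List, Tuple


def blame_pair(nodes: List[str], held_ms: List[int]) -> Tuple[str, str]:
    """Return the adjacent pair (X, Y) with the largest drop in held duration.

    Hypothetical helper. held_ms[i] is the duration node i reports holding
    the HTLC. The drop held_ms[i] - held_ms[i + 1] is time spent by node i
    or node i + 1, so one of them is responsible for it.
    """
    if len(nodes) != len(held_ms) or len(nodes) < 2:
        raise ValueError("need at least two nodes, one duration per node")
    drops = [held_ms[i] - held_ms[i + 1] for i in range(len(nodes) - 1)]
    # On ties, max() returns the first (closest to the sender) pair.
    worst = max(range(len(drops)), key=lambda i: drops[i])
    return nodes[worst], nodes[worst + 1]


route = ["A", "B", "C", "D"]
print(blame_pair(route, [15000, 14000, 3000, 2000]))  # ('B', 'C')
```

For the example route the largest drop is between B and C. If B were the
slow node and under-reported its duration (say 4000 ms), the largest drop
would move to the A/B pair; if C were slow and over-reported (say 13000 ms),
it would move to C/D. This matches the argument that lying only shifts the
blame onto another channel of the liar.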

> Unit should probably be milliseconds.
>

Yes, we probably want sub-second resolution for this.

> An alternative that comes to mind is to use active probing and tracking
> persistent data per node.
>
> For each node we record two pieces of information:
>
> 1.  Total imposed delay.
> 2.  Number of attempts.
>
> Suppose a probe or payment takes N milliseconds on a route with M nodes to
> fulfill or irrevocably fail at the payer.
> For each node on the route, we increase Total imposed delay by N / M
> rounded up, and increment Number of attempts.
> For error reports we can shorten the route if we get an error response
> that points to a specific failing node, or penalize the entire route in
> case of a completely undecodable error response.
>
> When finding a route for a "real" payment, we adjust the cost of
> traversing a node by the ratio Total imposed delay / Number of attempts (we
> can avoid undefined math by starting both fields at 1).
> For probes we can probably ignore this factor in order to give nodes that
> happened to be borked by a different slow node on the trial route another
> chance to exonerate their apparent slowness.
>
> This does not need changes in the current spec.
>
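A minimal sketch of the per-node bookkeeping described above, assuming the
route is passed as the list of nodes to charge for the attempt, and that the
caller has already truncated it at the reported node when an attributable
error came back (or passes the whole route for an undecodable one).
`DelayTracker`, `record` and `cost_factor` are hypothetical names; how the
factor is combined with the existing fee/CLTV cost in path finding is left
open.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NodeStats:
    # Both fields start at 1 so the ratio below is always defined.
    total_delay_ms: int = 1
    attempts: int = 1

    def delay_factor(self) -> float:
        return self.total_delay_ms / self.attempts


@dataclass
class DelayTracker:
    stats: Dict[str, NodeStats] = field(default_factory=dict)

    def record(self, route: List[str], elapsed_ms: int) -> None:
        """Charge each of the M nodes on the route N / M ms, rounded up."""
        if not route:
            return
        m = len(route)
        share = (elapsed_ms + m - 1) // m  # integer ceiling of N / M
        for node in route:
            s = self.stats.setdefault(node, NodeStats())
            s.total_delay_ms += share
            s.attempts += 1

    def cost_factor(self, node: str, is_probe: bool = False) -> float:
        """Multiplier on the cost of traversing `node` (1.0 for probes)."""
        if is_probe:
            # Ignored for probes so slow-looking nodes get another chance.
            return 1.0
        return self.stats.get(node, NodeStats()).delay_factor()


tracker = DelayTracker()
tracker.record(["A", "B", "C"], 3000)  # each node is charged 1000 ms
print(tracker.cost_factor("A"))  # 500.5, i.e. (1 + 1000) / (1 + 1)
```

The ceiling is computed with integer arithmetic to avoid floating-point
rounding on large millisecond values; unseen nodes get a neutral factor of
1.0 because of the initial values of 1.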

I think we could indeed do more with the information that we currently have
and gather some more by probing. But in the end we would still be sampling
a noisy signal. That means more scenarios to take into account, less
accurate results and probably more non-ideal payment attempts. Failed, slow
or stuck payments degrade the user experience of lightning, while "fat
errors" arguably don't impact the user in a noticeable way.

Joost