[Lightning-dev] Improving Payment Latency by Fast Forwards

ZmnSCPxj ZmnSCPxj at protonmail.com
Wed Apr 24 08:32:26 UTC 2019


Introduction
============

Currently, the protocol for forwarding requires 1.5 round trips before the next node can safely forward the payment.
This creates much greater latency for payments, and even with the current network at a nascent stage, payments can take entire seconds to complete.
As the network grows and some nodes start becoming unable to store the entire routemap, remote route lookups are needed.
Remote route lookups are likely to increase the route length, thus payment latency will increase even more.

So, we should consider ways of improving payment latency, before the increasing LN size causes further increases in payment latency.
As with all optimizations, it is likely that we will need to run as fast as we can just to stay in one place, like the Red Queen.

Slow Forwarding
---------------

The current protocol for a single forward looks like:

      Alice                     Bob                     Carol
        |                        |                        |
        | ---update_add_htlc---> |                        |
        | --commitment_signed--> |                        |
        |                        |                        |
        | <-commitment_signed--- |                        |
        | <--revoke_and_ack----- |                        |
        |                        |                        |
        | ---revoke_and_ack----> |                        |
        |                        | ---update_add_htlc---> |

Bob cannot safely forward `update_add_htlc` immediately since there is no commitment transaction that contains the HTLC yet.
Further, even after Alice signs a new commitment transaction containing the HTLC, Bob still holds two commitment transactions that can safely be put onchain, one of which does not contain the HTLC.
Only when Alice and Bob have revoked their previous commitment transactions can Bob safely forward to Carol.

The above concept is called "irrevocably committed" by the BOLT spec.

Reaching this "irrevocably committed" state requires the above 1.5 round trips.
If Alice and Bob are physically distant from each other, communication latency can be very large.

Further, both Alice and Bob want to reduce bandwidth usage.
This sometimes means that Alice and Bob will wait a short while before signing commitments and revoking previous ones, in order to keep the number of new signatures being passed around small.
C-lightning, for example, defers signing a new commitment by 10ms after sending an update in case another forward on the same channel is requested.
This causes additional latency on top of the communication latency.
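
For concreteness, below is a minimal Python sketch of the condition Bob tracks before forwarding; the class and field names are my own invention for illustration, not anything in the BOLT spec:

    # Sketch of when Bob may safely forward an incoming HTLC.
    # Class and field names are invented for illustration only.
    class IncomingHtlc:
        def __init__(self):
            self.in_local_commitment = False   # Alice sent commitment_signed
            self.in_remote_commitment = False  # Bob sent commitment_signed
            self.old_local_revoked = False     # Bob sent revoke_and_ack
            self.old_remote_revoked = False    # Alice sent revoke_and_ack

        def irrevocably_committed(self):
            # Safe to forward only when the HTLC is in BOTH commitment
            # transactions AND both previous commitments (which lack the
            # HTLC) have been revoked: 1.5 round trips in total.
            return (self.in_local_commitment and self.in_remote_commitment
                    and self.old_local_revoked and self.old_remote_revoked)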

Poon-Dryja Outputs
------------------

Let me now digress, to investigate the outputs of a Poon-Dryja commitment transaction.

There are at least two commitment transactions that are valid at any one time, one for each node in the channel.
The two are symmetrical to each other, but not identical.
Thus, we have the concepts below:

1.  "local" commitment transaction, is the valid commitment transaction I am holding.
2.  "remote" commitment transaction, is the valid commitment transaction my counterparty is holding.

Each commitment transaction has at least two outputs (although one may be elided if too small or 0).
Thus, each commitment transaction has these two outputs:

1.  "to-local" output.
    On the local commitment transaction, this is my "main" output.
    On the remote commitment transaction, this is my counterparty's "main" output.
2.  "to-remote" output.
    On the local commitment transaction, this is my counterparty's "main" output.
    On the remote commitment transaction, this is my "main" output.

In the original Poon-Dryja formulation, the "to-remote" output pays directly to a P2WPKH.
However, the "to-local" output is encumbered by a CSV, and is revocable.
In the BOLT 1.0 spec, the SCRIPT is:

    OP_IF
        # Penalty transaction
        <revocationpubkey>
    OP_ELSE
        `to_self_delay`
        OP_CSV
        OP_DROP
        <local_delayedpubkey>
    OP_ENDIF
    OP_CHECKSIG

Of note is that the `revocationpubkey` is actually a combination of a local-node revocation key and a remote-node key.
It is like a 2-of-2 that cannot be signed cooperatively by both parties, but which the local node can give entirely to the remote node so that the remote node can sign by itself (revocation).
It could have been implemented as a 2-of-2 multisignature, but the above formulation takes less block space.
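
As an aside, the exact combination used by the BOLT spec derives the `revocationpubkey` from a revocation basepoint and a per-commitment point, such that neither side alone knows the combined private key until the per-commitment secret is handed over. Below is a minimal pure-Python sketch of that derivation, for illustration only; real implementations use libsecp256k1:

    import hashlib

    # secp256k1 curve parameters
    P = 2**256 - 2**32 - 977
    N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
    G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
         0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

    def point_add(a, b):
        if a is None: return b
        if b is None: return a
        if a[0] == b[0] and (a[1] + b[1]) % P == 0: return None
        if a == b:
            lam = 3 * a[0] * a[0] * pow(2 * a[1], -1, P) % P
        else:
            lam = (b[1] - a[1]) * pow(b[0] - a[0], -1, P) % P
        x = (lam * lam - a[0] - b[0]) % P
        return (x, (lam * (a[0] - x) - a[1]) % P)

    def point_mul(k, pt):
        r = None
        while k:
            if k & 1: r = point_add(r, pt)
            pt = point_add(pt, pt)
            k >>= 1
        return r

    def compress(pt):
        return bytes([2 + (pt[1] & 1)]) + pt[0].to_bytes(32, 'big')

    def h(*data):
        return int.from_bytes(hashlib.sha256(b''.join(data)).digest(), 'big') % N

    # revocationpubkey = revocation_basepoint * SHA256(rb || pcp)
    #                  + per_commitment_point * SHA256(pcp || rb)
    # Neither summand's private key alone suffices to sign for the sum.
    def revocation_pubkey(rb, pcp):
        return point_add(point_mul(h(compress(rb), compress(pcp)), rb),
                         point_mul(h(compress(pcp), compress(rb)), pcp))

    # Once the per-commitment secret is handed over (revocation), the
    # basepoint holder can compute the full private key by itself:
    def revocation_privkey(rb_secret, pc_secret):
        rb, pcp = point_mul(rb_secret, G), point_mul(pc_secret, G)
        return (rb_secret * h(compress(rb), compress(pcp))
                + pc_secret * h(compress(pcp), compress(rb))) % N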

In the recent Lightning Developer Summit in 2018, it was decided that the "to-remote" output will also be encumbered by a CSV.
I will propose in this writeup that a modification of the above script be used for both the "to-local" and "to-remote" outputs.

Fast Forwards
=============

Ideally, we would like to be able to say that an HTLC is "irrevocably committed" using only a single message from Alice to Bob.
That way, communication latencies when forwarding payments can be reduced, which should improve payment speed over the network in general.

I observe that one may consider any offchain system a specialization of an offchain transaction cut-through system.
Thus, one may model changes to the offchain system state as the creation of some transactions, followed by a cut-through of those transactions into the new state.

Thus, I propose that to-local outputs be encumbered with the script:

    OP_IF
        # Penalty transaction/Fast forward
        <local_revokepubkey> OP_CHECKSIGVERIFY <remote_penaltyclaimpubkey>
    OP_ELSE
        `to_self_delay`
        OP_CSV
        OP_DROP
        <local_delayedpubkey>
    OP_ENDIF
    OP_CHECKSIG

Then, symmetrically, to-remote outputs are encumbered with the script:

    OP_IF
        # Penalty transaction/Fast forward
        <local_revokepubkey> OP_CHECKSIGVERIFY <remote_penaltyclaimpubkey>
    OP_ELSE
        `to_self_delay`
        OP_CSV
        OP_DROP
        <remote_delayedpubkey>
    OP_ENDIF
    OP_CHECKSIG

When doing a `revoke_and_ack`, the sender gives the `local_revokeprivkey` to the remote side, who then knows both keys of the penalty branch and can penalize the sender if the revoked commitment transaction is published.

Then, we define a new message, `fastforward_add_htlc`.
This creates a pair of transactions, the fast-forward HTLC transactions, on top of the latest commitment transactions of both nodes.
For simplicity, they can be restricted to be sent only while one commitment transaction is valid on both sides and while no update is in-flight in the channel.
(Alternatively, a channel using fast-forwards might be restricted to *only* using fast-forwards, with updates of commitment transactions being in strong synchrony rather than the weak synchrony currently used.)

The local/remote fast-forward HTLC transaction spends the to-local/to-remote output of the commitment transaction.
It pays the value of the HTLC being forwarded to the normal HTLC construction used in Lightning.
The remaining change is placed into "the same" script that was spent (with pubkeys changed).
This change output can now be considered the "next" main output (it is used to chain the next `fastforward_add_htlc` from that side).
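
A sketch of the resulting transaction shape follows; amounts only, with the script placeholders and names invented for illustration, and onchain fees ignored (see the Fees section):

    from dataclasses import dataclass

    @dataclass
    class TxOut:
        amount_msat: int
        script: str

    # Illustrative shape of a fast-forward HTLC transaction.  Its single
    # input is the to-local (or to-remote) main output of the latest
    # commitment transaction, spent via the penalty/fast-forward branch.
    def fast_forward_htlc_tx(main_output_msat, htlc_msat):
        assert htlc_msat <= main_output_msat
        return [
            # Forwarded value goes to the normal Lightning HTLC script.
            TxOut(htlc_msat, "<normal HTLC script>"),
            # Change recreates "the same" script with fresh pubkeys; this
            # becomes the next main output, from which the next
            # fastforward_add_htlc is chained.
            TxOut(main_output_msat - htlc_msat,
                  "<fast-forward script, new pubkeys>"),
        ]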

The `fastforward_add_htlc` includes the originating node signatures for both the local and remote fast-forward HTLC transactions.
It also contains the signatures needed to support the revocable HTLC construction.

For example, the local fast-forward HTLC transaction spends the to-local output of the local commitment.
The sending node signs using the private key of `local_revokepubkey` and includes this signature in the `fastforward_add_htlc` message.
The remote fast-forward HTLC transaction spends the to-remote output of the remote commitment.
The sending node signs using the private key of `remote_penaltyclaimpubkey` and includes this signature in the `fastforward_add_htlc` message.

The receiver of the message can now consider this HTLC to be irrevocably committed.
This is because the receiver can now spend the main output of the counterparty using the fast-forward HTLC transactions, by providing the one missing signature.
Further, the sender cannot revoke it, since the sender cannot double-spend the spent main output until after the CSV delay expires.
The CSV delay is precisely how long the receiver can be offline before a successful theft can be performed, so it should not be an issue for the receiver.
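
Concretely, the receiver's processing might look like the sketch below; the message fields and the `build_ff_tx`/`verify_sig` helpers are assumptions for illustration, not proposed wire format:

    # Sketch of receiver-side handling of fastforward_add_htlc.
    # Helpers and message fields are illustrative assumptions.
    def on_fastforward_add_htlc(msg, channel, build_ff_tx, verify_sig):
        # Reconstruct both fast-forward HTLC transactions locally.
        local_tx = build_ff_tx(channel.local_commitment, msg.htlc)
        remote_tx = build_ff_tx(channel.remote_commitment, msg.htlc)
        # Check the sender's signatures on both transactions.
        if not (verify_sig(msg.sig_for_local_tx, local_tx)
                and verify_sig(msg.sig_for_remote_tx, remote_tx)):
            raise ValueError("bad fast-forward signatures")
        # We hold the other key of the penalty/fast-forward branch, so we
        # can complete either transaction unilaterally: the HTLC is
        # already enforceable onchain, i.e. irrevocably committed.
        channel.committed_htlcs.append(msg.htlc)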

Thus, upon receipt of the `fastforward_add_htlc`, it is now possible for Bob to immediately begin forwarding the payment onward:

      Alice                     Bob                     Carol
        |                        |                        |
        | -fastforward_add_htlc> |                        |
        |                        | -fastforward_add_htlc> |

(If Carol does not support fast forwards, Bob can send the old `update_add_htlc` instead.)
Then, the next commitment transaction will "cut-through" any built up fast-forward HTLC transactions, collapsing the HTLC outputs to the commitment transactions.
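
In terms of amounts, the cut-through is simply the below toy sketch (fees ignored):

    # Toy sketch: the next commitment replaces a chain of fast-forward
    # transactions with a flat list of outputs (fees ignored).
    def cut_through(main_output_msat, pending_ff_htlcs_msat):
        remaining = main_output_msat - sum(pending_ff_htlcs_msat)
        # HTLCs become ordinary HTLC outputs of the commitment itself.
        return [remaining] + list(pending_ff_htlcs_msat)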

Unilateral Closes
-----------------

Unfortunately, unilateral closes mean that, if the counterparty is not paying attention, they will not be able to claim any HTLC added via `fastforward_add_htlc`.
Now, the CSV setting one selects should reflect how long one feels their node can remain offline.
And as long as we are able to come back online before the CSV is reached, we can apply any fast-forward HTLC transactions and claim the HTLCs that have dropped onchain.

Of course, the real world has many ways to surprise our expectations.
Thus, using fast-forwards implies higher risk for nodes that accept fast forwards and then themselves immediately forward onward.


Fast Failures
=============

Fast forwards are not enough: due to incomplete information, failures of individual payment routing attempts are common.
The directive is to simply try and try again until the payment pushes through or no routes remain or too much time has been spent trying to route.

Further, success is already fast: as soon as you receive `update_fulfill_htlc`, you can immediately and safely send `update_fulfill_htlc` upstream, without the commitment signing and revocation of the previous commitment.
(The preimage revealed by the fulfill is by itself enough to claim the incoming HTLC onchain, so there is nothing to wait for.)

However, failures via `update_fail_htlc` cannot be propagated immediately for a similar reason that payments via `update_add_htlc` cannot be propagated immediately: there exists a valid commitment that still has the HTLC by which the money can still be claimed onchain.

I observe that the "fast forward" technique simply reuses the revocation path to root a new transaction.
I also observe that the HTLC construction used by Lightning is revocable.

Thus, since it is possible to revoke the HTLC construction used by Lightning, we can reuse the revocation path of an HTLC as the "fast failure" path, using the same technique we used in fast forward.
Care must be taken to provide signatures for failing the HTLC itself, as well as the HTLC-success and HTLC-timeout transactions.

The difference here is that failed HTLCs do not contribute back to the "main" output immediately.
The transaction used to fail the HTLC is a simple one-input one-output transaction.
Only when failure of the HTLCs has been put in a new commitment transaction can their value be reused for adding new HTLCs.

For example, suppose we start with 3mBTC on my side of the channel.
I offer two different HTLCs to you, of 1mBTC each, via two `fastforward_add_htlc` messages.
However, both of them fail.
This leaves me with only 1mBTC on my main output, with two different 1mBTC outputs that have not been "merged" back into it yet.
So I can no longer forward a 2mBTC HTLC, until we have resynchronized (signed new commitments and revoked) and combined the failed HTLC outputs back to my main output.
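
The bookkeeping of the above example, as a toy calculation (values in mBTC for readability):

    # Toy bookkeeping for the example above, in mBTC.
    main_output = 3.0
    failed_htlc_outputs = []

    # Offer two 1 mBTC HTLCs via fastforward_add_htlc; each is carved
    # out of the (shrinking) main output.
    for _ in range(2):
        main_output -= 1.0

    # Both fail via fast failure: each failed HTLC now sits in its own
    # one-input one-output transaction, not yet merged back.
    failed_htlc_outputs = [1.0, 1.0]
    assert main_output == 1.0   # cannot forward a 2 mBTC HTLC yet

    # Only after signing a new commitment and revoking the old one do
    # the failed outputs merge back into the main output:
    main_output += sum(failed_htlc_outputs)
    failed_htlc_outputs.clear()
    assert main_output == 3.0   # full balance available again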

Still, this removes the commitment transaction synchronization from the critical path of the overall Lightning try-and-try-until-you-die routing algorithm, improving payment latency overall.
Thus, this may be an acceptable tradeoff when considering payment latency.

Fees
====

Oh no.

Please do not ask about fees.

In case of a unilateral close after a fast-forward or fast-fail, additional transactions need to be put onchain, beyond just the commitment transactions.
These transactions need to pay for onchain fees.

Thus, channels offering low-latency fast forwards need to charge higher offchain fees to offset the risk that they need to pay onchain fees.
Further, channels offering low-latency fast forwards also need to offset the unilateral close risk with higher fees.

Perhaps the two nodes on the channel can attest that they have low-latency fast forwards.
However, merely because they claim it does not make it so.

Nodes could make known-failing payments (generate a random payment hash, route through the "fast forward"-claiming channel, measure latency) to determine the truth of the fast forward.
In fact, if nodes do this "in the background" continuously, they can map out which channels have good latency (regardless of the use of fast forward or not: two nodes located physically close to each other with low-latency internet connections may very well have good enough latency even without fast forwards).
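
A sketch of such a probe follows; the `send_to_route` callback stands in for whatever pay-to-route facility the node software exposes, and is an assumption rather than an existing API:

    import os, time

    # Sketch of a latency probe using a known-failing payment.
    def probe_latency(route, send_to_route):
        # A random payment hash: with overwhelming probability nobody
        # knows a preimage, so the probe always fails and costs nothing
        # except time.
        fake_payment_hash = os.urandom(32)
        start = time.monotonic()
        try:
            send_to_route(route, payment_hash=fake_payment_hash,
                          amount_msat=1000)
        except Exception:
            pass  # expected: unknown payment hash always fails
        # Elapsed time measures the round-trip latency of the route.
        return time.monotonic() - start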

Fast Forwards on Decker-Russell-Osuntokun
=========================================

Decker-Russell-Osuntokun does not, in fact, need fast forwards, if we design the link-level protocol properly.

Each "add HTLC" "fulfill HTLC" "fail HTLC" "change fee" update message includes the signature needed for the next update transaction and the next state transaction, that immediately has the new state.
Then the peer should reply with the remaining signatures needed immediately.

Upon receiving an "add HTLC", one can now construct the full next update transaction/next state transaction, and the existence of this update transaction is enough to invalidate any previous update transaction.
So it is safe to forward the "add HTLC" to the next hop as soon as the node has updated its local database.
This is different from Poon-Dryja, where the existence of the next commitment transactions does not imply that the previous commitment transaction is revoked.
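
A sketch of the receiving side under this link-level protocol (field names and helpers are invented for illustration):

    # Sketch: under Decker-Russell-Osuntokun, a fully signed update n+1
    # can spend, and thereby replace, ANY earlier update transaction, so
    # no separate revocation step is needed before forwarding.
    def on_add_htlc(msg, channel, verify_sig, forward_to_next_hop):
        next_n = channel.state_number + 1
        update_tx, state_tx = channel.build_next(next_n, [msg.htlc])
        if not (verify_sig(msg.update_sig, update_tx)
                and verify_sig(msg.state_sig, state_tx)):
            raise ValueError("bad signatures")
        channel.state_number = next_n
        channel.persist()              # update the local database first...
        forward_to_next_hop(msg.htlc)  # ...then forward immediately
        channel.send_remaining_signatures(next_n)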

Of course, there is now the issue of "how do we handle when both nodes want to update at the same time and sent conflicting 'next update' messages?"
Perhaps it can be left as an exercise to the reader how to do this while not requiring any round trips in the critical path of forwarding, in the typical case.
For instance, if one has sent an update but receives an update in return, do not forward yet; instead, coordinate with the counterparty to figure out a common "next" update/state transaction that contains both updates, and only then continue with forwarding the update received.
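
One possible shape of that reconciliation, as a sketch (the deterministic merge rule is my own invention for illustration):

    # Sketch: both sides proposed a "next" update on the same basis
    # state, so neither supersedes the other; merge both updates into a
    # single common next state, ordered by a deterministic rule.
    def reconcile(our_msg, their_msg, channel):
        merged = sorted([our_msg.htlc, their_msg.htlc],
                        key=lambda h: h.payment_hash)
        next_n = channel.state_number + 1
        update_tx, state_tx = channel.build_next(next_n, merged)
        # Exchange signatures for the merged transactions, then continue
        # with forwarding the update received; only this conflicting case
        # pays an extra round trip.
        return update_tx, state_tx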

