[Lightning-dev] [Proposal] Payment Route Reservation

g0b1el g0b1el at proton.me
Sun Feb 26 13:40:28 UTC 2023


Hi, lightning devs

It's my first mail, so first, I'd like to say a big thanks to everyone involved in the development of the lightning network.

I've just finished going through this mailing list, so I'm not sure if Payment Route Reservation is already proposed in some other place. If there is a similar proposal, please point me to the right place. 

I'll first list some potential improvements with this proposal:

1)  Higher reliability of payments
2)  Lower payment latency (on average)
3)  Lower fees
4)  Increase in privacy
5)  Trampoline payment improvements
6)  Significant reduction of gossip messages
7)  Healthier, more decentralized network
8)  Simplified routing algorithm
9)  LN Wallet UX improvements
10) Fast spam prevention
11) Channel liquidity probing prevention
12) Eliminates the need for splice-in (and partially splice-out)
13) Eliminates the need for rebalancing and fee update scripts 
14) Eliminates the need for global nodes' reputation

So what is a payment route reservation?
  

Payment Route Reservation 
========================= 

The idea behind Payment Route Reservation is to split the payment into two steps. In the first step, we reserve the route, and in the second, we send a payment. In the reservation step, routing nodes will not sign a new commitment state. Instead, they will just reserve a specified amount of satoshis. Those satoshis might be used with the next payment step or not if the reservation is canceled. Each reservation has a timeout (let's say 1 min). After a timeout, the reservation is removed from the channel reservation set and considered canceled. Reservation can be canceled manually by any upstream node.

For example:


          add_reservation(payment_hash, amount) -->
             ┌───┐         ┌───┐         ┌───┐
     ───────►│   ├────────►│   ├────────►│   ├────────►
   S         │ A │         │ B │         │ C │          R
     ◄───────┤   │◄────────┤   │◄────────┤   │◄────────
             └───┘         └───┘         └───┘
          <- reservation_success(fees_sum, cltv_delta_sum)


S is a sender, R is a receiver, and A, B, and C are routing nodes.

S wants to send `amount` satoshis with `payment hash`[1] to R. S first creates a reservation onion for A->B->C->R route and then sends the onion to the first node in the route, using `add_reservation` call. A unwraps the onion, and if he has enough unreserved balance in A->B channel, he will add a new reservation for the `amount` request to the channel reservation set and then forward the onion to B. Reservation in reservation set can be addressed by `payment hash`.
If A has an insufficient channel reservation balance, A will return 'reservation_failed' to the upstream node(sender S). B and C will do the same. Assuming there is enough reservation balance in all channels to reserve a payment, reservation flow eventually reaches R. R unwraps the onion and checks if he knows the preimage. If he knows, R returns `reservation_success` to upstream node C.

At this point, we have successfully reserved the payment route for `amount` satoshis from S to R. But, our route nodes haven't included the fees(or cltv_delta) of downstream nodes. `reservation_success` returns to upstream route node tuple (fees_sum, cltv_delta_sum). In this case, R will return (0, 0) to C because he is the last node in the route[2]. C will try to extend his reservation by fees from the R. If reservation extension is successful, C sends `reservation_success` to B, with a tuple increased by his fees[3] and his cltv delta. If C can't extend the reservation, he will send to B `reservation_failure` and to R `cancel_reservation`[4]. A will do the same. Eventually, if everything is ok with nodes A and B, S will receive a tuple (fees_sum, cltv_delta_sum). Now S knows precisely how much fees and cltv_delta route expects, and he can now send the payment onion. If S doesn't like the fees or cltv_delta, he can try some other route.

Let's compare current payment forwarding to payment with reservation forwarding:


  +-------+                               +-------+             +-------+                               +-------+
  |       |--(1)---- update_add_htlc ---->|       |             |       |--(1)---- add_reservation ---->|       |
  |       |                               |       |             |       |<-(2)---- reservation_success--|       | reservation onion
  |       |                               |       |           ──|─────────────────────────────────────────────────
  |       |--(2)--- commitment_signed --->|       |             |       |--(3)--- commitment_signed --->|       |
  |   A   |<-(3)---- revoke_and_ack ------|   B   |             |   A   |<-(4)---- revoke_and_ack ------|   B   |
  |       |                               |       |             |       |                               |       | HTLC onion
  |       |<-(4)--- commitment_signed ----|       |             |       |<-(5)--- commitment_signed ----|       |
  |       |--(5)---- revoke_and_ack ----->|       |             |       |--(6)---- revoke_and_ack ----->|       |
  |       |                               |       |             |       |                               |       |
  +-------+                               +-------+             +-------+                               +-------+

One `update_add_htlc` is now replaced with two calls - `add_reservation` and `reservation_success`. Also, we have two onions now.

When node A receives reservation_success with a tuple (fees_sum, cltv_delta_sum), nodes A and B have all the information to sign a new commitment transaction. The new commitment transaction amount would be `amount` + fees_sum, and cltv_delta would be cltv_delta_sum. 

For now, we've just increased the latency. I have yet to check any implementation, so it is hard to guess how significant this increase is. The thing to note is that the reservation set is stored only in memory. No information needs to be stored in DB or replicated. If the node crashes with a live reservation, nothing bad can happen. So increase in latency would probably be proportional to 1 additional call between hops and how fast hops can unwrap the onion.

But before we tackle latency, let's first address the reliability of payments.


[1] For PTLC case, we can use points
[2] R can return some low random values to prevent C from discovering that R is the receiver.
[3] Fees can be ppm, base, or whatever fee function the node operator prefers.
[4] To ensure the node can always extend a reservation, we can set aside part of the channel reserve balance just for this case (let's say 0.5 percent).


Route Split Reservation 
=======================

If the routing node can't add a new reservation because the channel reservation balance is insufficient, the node can try to find some other route to reserve the missing amount for the next downstream node.

For instance, A needs to forward a reservation of 10 mBTC to B, but A->B channel has only 8 mBTC left of its channel reservation capacity[5].

           ┌───┐           ┌───┐
       10  │   │     8     │   │   10
      ────►│ A ├──────────►│ B ├───────►
           │   │           │   │
           └─┬─┘           └─▲─┘
             │               │
             │2    ┌───┐     │2
             │     │   │     │
             └────►│ C ├─────┘
                   │   │
                   └───┘


Rather than returning `reservation_failure` and losing the possibility to earn fees on the remaining 8 mBTC, A will try to find a route for the remaining 2 mBTC. Let's say A picks A->C->B route. It is important to note that the new route should share the same payment hash as the main route(->A->B->)[6]. Now B will accept to forward the reservation of 10 mBTC further because he knows if a payment route is ever constructed, and if the preimage is revealed, he will receive his 10 mBTC.

There can be optimization in terms of fees. For instance, A can renounce his A->C channel fees. So the only fee A needs to pay for the new route is for 2 mBTC on C->B route. Assuming network convergences to roughly the same fees on all nodes, cumulative route split fees effect would be the same as if the whole amount was routed through A->B. Although this example looks a bit too perfect, it's quite common on the plebnet part of the lightning network, where most nodes are created using Triangle Liquidity Swaps.
It is important to note that A now needs to take into account the fees and cltv_delta of the new route. Fees are the sum of the fees in A->B channel and A->C->B route. Cumulative cltv_delta would be maximum between A->B and A->C->B route.

Note that A can try multiple reservation routes. Also, if C doesn't have enough reserve balance, C can split a route through some node D to B, etc. To prevent infinite reservation recursion calls (in case of huge payment), the route split node will pass a tuple (max_fees, max_cltv_delta) that he will accept from the new route. Each node in the route will first decrement tuple by his fees and cltv_delta. If the new tuple contains any element below 0, the node will return `reservation_error`. If tuple elements are still positive, the reservation will continue, and decremented tuple will be passed to the next downstream node.

To hide the difference between the main route and route split, the payment sender will also send the tuple(max_fees, max_cltv_delta) that he accepts for the main payment route. Now, if someone is observing lightning network traffic, he wouldn't know if a reservation is for payment or route split. Thus we have an increase in privacy of payment.

Note that now we can also send payments larger than channel capacity A->B.


# Trampoline route split reservation

What if A and B don't have a direct channel, or the sender doesn't know the whole network topology or all the channels between A and B are suddenly disabled?


                          ┌───┐         .         ┌───┐
                       10 │   │---->   . .   ---->│   │ 10
                     ────►│ A │---->   . .   ---->│ B ├─────►
                          │   │---->    .    ---->│   │
                          └───┘                   └───┘

A still can try to reserve payments through different routes to B. Idea is similar to what is currently known as Trampoline Routing. 
The biggest benefit of using Trampoline route split reservation payment over Trampoline Routing payment is that a sender, before initiating the payment, knows exactly how much in fees he needs to pay. As well as how big cltv_delta will be.



How expensive is a route split reservation?
A needs to create a new reservation onion, and until that route split reservation propagates to B, B can't forward the reservation further downstream.
So we increased latency again. 

My guess is that reservations with route split payment will increase the reliability of single payments to the range of 90-95%. But can we reach 99% and finally lower the average latency?

[5] The outbound liquidity can be greater than 10 mBTC, but the reservation set has only 8 mBTC left.
[6] For PTLC A->B and C->B, payment hops need to commit to the same point.


Redundant Multi Route Payment Reservation
========================================

The idea is similar to what is now called Multi-Path Payments and Stuckless Payments[7].


                          ┌───┐       ┌───┐       ┌───┐
                          │   ├──────►│   ├──────►│   │
                     ────►│ R │       │ T │       │ U ├─────►
                          │   ├──────►│   │   ┌──►│   │
                          └───┘       └───┘   │   └───┘
                                              │
                             .......................

                          ┌───┐       ┌───┐  │    ┌───┐
              S           │   ├──────►│   │  └───►│   │
                     ────►│ D │       │ E │       │ F ├─────►    R
                          │   ├────┐  │   ├──┐    │   │
                          └───┘    │  └───┘  │    └───┘
                                   │         │
                          ┌───┐    │  ┌───┐  │    ┌───┐        
                          │   │    └─►│   │◄─┘    │   │
                     ────►│ A │       │ B │       │ C ├─────►
                          │   ├──────►│   ├─────► │   │
                          └───┘       └───┘       └───┘

The problem with a single route reservation is that reservation fees(or cltv) might be too high, and the sender will have to create a new reservation route hoping to get a better fee. Also, reservations might take longer than we are willing to wait. 

To overcome this issue, we first split the amount into N parts[8], and then we create 2*N[9] reservation routes each for the amount of `payment amount`/N. Now a sender needs to wait for just N routes out of 2N to return `reservation_success`, and then initiate payment. Note that multiple reservation routes can go through the same node.

This now creates an interesting market dynamic where the fastest route wins. And the fastest route is with the fastest nodes. Now nodes are incentivized to compete with better internet and better hardware. There would also be competition between lightning node developers to create the fastest lightning node. 

If a sender is not that interested in the speed of payment, he can wait for all 2N routes to return reservation results and then pick the cheapest N routes.
Now nodes also need to compete in fees as well. Cheaper the fees, the higher probability of the sender picking the route.

All this eventually will benefit lightning network users.

It doesn't matter if the channels are balanced or not. If the node is well connected and has enough outbound liquidity, the reservation route split should(in most cases) find a route to the next node. So there is no need for rebalancing or fee update scripts. 

Just because nodes reserved the route doesn't mean they have to route the payment. There is still a probability of payment failure. Payment can fail because of reservation time out, node going offline, griefing node, etc. To further increase reliability, we can also add stuckles payments.

With Redundant Multi Route Payment Reservations, I believe the reliability of payments would be around the 99% percent range.
Latency, on average, should converge around a value much lower than the average value we have now. 

[7] https://lists.linuxfoundation.org/pipermail/lightning-dev/2019-June/002029.html
[8] Unless it is a small amount, for instance, below 0.5 mBTC. 
[9] Or some other multiplier


LN Wallet UX improvements
=========================

Not every payment is the same. For instance, if I'm paying a groceries bill in a supermarket, I value the speed of payments more than fees. On the other hand, if I'm paying my electricity bill at home, I value fees more than the speed of transactions. Also, if I'm paying a VPN subscription, I care more about privacy and cost over transaction speed.

With route reservation, we can present the exact user fees he is going to pay, so wallet developers can add the following element to UX:

Radio box: Fast Payment | Cheap Payment | MaxPrivacy

- If a user selects `Fast Payment` wallet will wait for the first N fastest routes and initiate payment automatically(unless fees exceed in percents some config value). 
- If a user selects the `Cheap Payments` node will wait for all 2N routes to return, pick N cheapest routes, and then present the user with fees and a send button. If fees are too high, the user can press the `Try again` button.
- For `MaxPrivacy` wallet will behave as in `Cheap Payment` mode, except the routing algorithm will prefer longer routes over smaller nodes, with preferably at least one Tor node.

*N routes can be N trampoline routes if a wallet doesn't have a whole network topology.


Gossip messages spam
====================

Currently, 97 percent or more LN gossip messages go to `channel_update` messages. The reason is that most nodes on the network are running some form of automatic fee adjustment script. If the number of nodes and channels continues to grow, soon, this can become unsustainable.

Reservation is not used just to reserve the route but also to inform a sender of fees and cltvs. Thus there is no need for nodes to publish their fees to the whole network. And thus, information like base_fee, ppm fee, and cltv can be removed from the `channel_update` message.

With route reservation, there is no need for channel rebalancing or fee update scripts. Node operators can still run fee update scripts without restrictions, but those updates would not propagate to the network.


Routing algorithm and network decentralization
=============================================

In terms of network topology, a current lightning network can be seen as a decentralized network. But in terms of a payment network, it is more a hub base network, where the majority of payments go through ~30 largest nodes. Medium and smaller nodes rarely forward any payment, and they are mostly used by big nodes for channel rebalancing. This is a big issue in terms of fairness(the rich are getting richer faster), privacy, reliability of the network, cost of a transaction, etc.

I would argue that this problem is not because the rich are rich but because of the routing algorithm. Currently, used routing algorithms usually calculate the probability of routing a payment over some channel as a channel capacity function. If the channel has a large capacity probability is higher, small channels will have a lower probability of payment routing. Because all wallets are effectively running in `Fast Payment` mode, the routing algorithm will almost always choose bigger channel routes than smaller ones and thus create the hub-based model we have today.

If we remove `fees` and `cltvs` from the gossip, what is left for the routing algorithm is `channel_capacity`, `htlc_minimum_msat` and `htlc_maximum_msat`. 
`htlc_maximum_msat` - can be a local node config. If the reservation attempts to reserve a greater amount than the maximum allowed for a single payment, a node can try to split the reservation over multiple channels.
`htlc_maximum_msat` - can remain a global config to avoid unnecessary reservation failures.

What about channel capacity? 

Channel capacity can be the biggest misleading factor for LN routing algorithms. For instance, I can create a node and then buy 100 channels with 1BTC each of income liquidity. Then I set all channel fees to 0. Every routing algorithm will try to route payments through my node, but my node can't route a single satoshi. I would suspect that today's most popular nodes on the LN network probably have way more inbound than outbound liquidity.
So I think that capacity should not be used at all as a heuristic in the routing algorithms.
I think a better approach would be if nodes would just publish their cumulative outbound liquidity over all channels. If there are routing node A and routing node B, and they are well connected, we know that we can send(in theory) from node A to node B, with route split, a maximum payment of `min(A.balance, B.balance)` amount.
This would also signal to the rest of the network where liquidity is needed and where there is too much liquidity. There would be no liquidity sinks, and network liquidity could be better deployed. Automatic channel opening becomes trivial.

But would the nodes lie? 
If the node lies, the reservation will fail on his node, and if there is a lot of reservation failure on misleading nodes, neighbor nodes might close the channel with him. Also, while processing huge payment reservations, which will eventually fail, a node could have routed a smaller payment and earned fees. So nodes will be incentives to report the real outbound liquidity or lower if privacy is the issue.

Now routing algorithm has become very simple. We make routing decisions just on hop count from our node, on node reported balance, and a bit of randomness. Randomness is the most important part. If some route fails, we can't know which node to blame. It can be any node in the route. The routing algorithm should not track any blame information about individual nodes and would not try to blacklist any particular node. If the node is well connected and the routing algorithm is random enough, we should be able eventually to find enough routes to the sender. If there is a faulty, griefing, or just slow node in the network, we let nodes handle deputies themselves. Each node will track the local reputation of each of its neighbors. If there are a lot of failed payments from one neighbor, the node operator might contact that neighbor for an explanation or just close the channel to that neighbor node.
 
For privacy currencies, not publishing channel capacity is a necessity. Privacy currency users (like Monero) don't want to relieve any of their UTXO amounts. Privacy currency LN can avoid publishing their channel capacity and outbound node liquidity, and only make routing decisions on hop counts and randomness. There would probably be more reservation failures, especially for larger amounts, but it should be worth privacy preserved.


Fast spam prevention
====================

Fast spam is a situation on the network when malicious nodes start creating a lot of payments to the random nodes, which will never be resolved because the receiver does not know the payment preimage. The receiver can only fail the payment. 
This is not possible with a reservation anymore because receivers will reject a reservation, and there will be no payment. Now, this is a lot better because the reservation does not commit to a new channel state, and no DB operations are involved.

But how to prevent reservation spam?
If the node sees a lot of reservation errors per minute coming from one channel, the node can throttle reservations coming from that channel. Eventually, if spam continues node operator can inform the neighbor node operator to investigate the reason. The neighbor node will do the same. And if the problem persists, the node operator can just close the channel.

But what if an attacker controls both nodes(receiver and sender) or if the attacker creates a circular route?

The malicious receiver node will accept the reservation, and then when HTLC comes, it just fails the payment. This would be basically fast spam again, with a bit more work to be done by the attacker.
As proposed before[10], we can demand a prepayment. If a node sees a large amount of failed payments per minute on some of its channels, a node can start demanding some prepayment to route HTLC. The problem is how much to charge. If we charge too little, spam will continue. If we charge too much, this will affect regular micro-payments. 
We can extend reservation_success tuple with a prepayment fee. And we can adjust the fee on the fly. There is no need to propagate it to the whole network. A node can initially charge 0 prepayment fees. This is in their best interest because the routing algorithm will prefer routes without prepayment. The reason is that there is still a chance of payment failure. In a payment failure case, the sender loses the pre-payed amount unconditionally. Now if a node observes a large number of payment failures per minute, he can start aggressively increasing his prepayment fees, thus making the attack very costly.

There was also a discussion on how to send a prepayment. Wouldn't the first node in the route just steal all prepayment and then fail the payment?
There are two possible scenarios here:
 - prepayment is less than or equal to the fees the node is going to earn routing the payment. In this case, it is in the node's best interest to route the payment.
 - prepayment amount is greater than node payment fees. I can see two possible solutions:
    - let a first node steal the prepayment. This should be fine because if prepayment gets so big, most likely, the attacker is the one being robbed. Honest users will prefer routes without a prepayment, especially large prepayments.
    - we can also try to send multiple keysend prepayments, but there are some privacy issues here because the length of the route is revealed to the first node, and the last route node finds out who the receiver is. A slightly less bad option is to send prepayments for the parts of the route. For instance, one keysend is for the first half of the route. The second is for the second half of the route.

[10] https://lists.linuxfoundation.org/pipermail/lightning-dev/2020-February/002547.html

Channel liquidity probing
=========================

With a route reservation split, inspecting channel liquidity from other nodes becomes significantly more demanding. The reason is the fact that reservations should rarely fail. Now observer needs to deduct the channel liquidity from the reservation response (fee,cltv_delta). If there is a reservation split, fees and cltv_delta will most likely be higher than if there is no reservation split. But every node can change fee and cltv_delta on the fly, and by just randomizing those values on every return, potential channel observer jobs get much harder.
Also, if the attacker starts sending a lot of reservation requests, those requests might be seen as a DOS attempt, and the node might start throttling the reservation requests.


No need for splice-in(and partially splice-out)
===============================================

The idea behind the splicing is to increase the capacity of the channel, so that channel can route bigger payments. Also, the routing algorithm prefers a bigger channel so the node would route more payments.
With route split reservations, this is not necessary anymore. For example, if nodes A and B have N channels between them, node A  would have the option to reserve a route through all N channels during reservation. Thus N smaller channels can be seen as one big channel, not by routing algorithm but by node A. Routing algorithm will just care if there is a connection between A and B.

If nodes A and B have N channels between them, closing one channel basically becomes a splice-out operation.



Looking forward to your feedback.

Best Regards,
g0b1el






More information about the Lightning-dev mailing list