[Lightning-dev] Just in Time Routing (JIT-Routing) and a channel rebalancing heuristic as an add on for improved routing success in BOLT 1.0

Wed Mar 6 10:38:27 UTC 2019

Good morning Rene,

The base idea is good at first glance.

However, let us consider this situation:

          0.1   1.0
        A --------- B
    1.5 |         / 0.5
        |        /
        |       /
        |      /
        |     /
        |    /
        |   /
        |  /
    0.5 | / 0.75
        C

A must pay B, 1.0 BTC.

A knows the exact state of AB and AC channels, but only knows that BC has total 1.25 capacity.

A sends via route A->C->B.
It knows that it has sufficient capacity in AC.
It also knows that the total capacity in CB has a chance of transferring 1.0BTC.

So what happens (let us assume Lightning fees are negligible).

1.  A locks 1.0 in AC in an HTLC.
2.  C cannot forward since it has only 0.75 in CB.
    It initiates a rebalance.
3.  C notices it has 0.5 BTC in AC.
    It can transfer 0.25BTC from AC to CB to be able to get 1.0 in CB that it can forward.
    So it routes 0.25 BTC via C->A->B->C.
4.  C locks 0.25 in CA in an HTLC.
5.  A cannot forward the C->A->B->C payment since it only has 0.1BTC in AB.
    It initiates a rebalance.
6.  A notices it has 0.5 BTC in AC (1.0 is currently locked in an HTLC, leaving it 0.5BTC).
    It can transfer 0.15BTC in AC to AB to be able to get 0.25 in AC that it can forward.
    So it routes 0.15 BTC via A->C->B->C.
7.  A locks 0.15 in AC in an HTLC.
8.  C is now in a bind.
    If it forwards the 0.15 BTC, then it will still fail the earlier A->C->B 1.0 payment.
    This is because it will lose 0.15 BTC from its side here, leading to 0.6 BTC, then receive 0.25 BTC from the rebalance, only having 0.85 BTC.
    If it does not forward, A is unable to rebalance and fails the rebalance of C, which fails the original payment from A->C->B.

Thus it seems to me, that precisely because of the lack of a global view, such kinds of complicated HTLC networks will form, then still lead to payment failure.

--

Perhaps it is better to consider that a high-level payment failure requires failure of all possible routes.
>From that point of view, a failure of a single payment attempt is not a big deal.

--

With that said, it may still be valuable to try doing this.
It will massively increase the complexity in c-lightning.
We would need a new persistent db table (meaning not in Python, at least until we make c-lightning into a bus connecting its actual running components that send commands to each other via the same bus that external commands and plugins use).

Then we can evaluate a small network of such nodes on mainnet LN and see how often rebalance failures occur.

Regards,
ZmnSCPxj

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, March 5, 2019 9:47 PM, René Pickhardt via Lightning-dev <lightning-dev at lists.linuxfoundation.org> wrote:

> Hey everyone,
>
> In this mail I introduce the Just in Time Routing schema (aka JIT Routing). Its main idea is to mitigate the disadvantages from our current source based routing (i.e.: guessing a route that will work in the sense that it has enough liquidity in each channel) and make the routing process a little bit more like the best effort routing that we know from IP-forwarding. As far as I know this will not decrease the privacy of the nodes. As part of this Routing scheme nodes need to be able to quickly rebalance their channels. Thus in this mail I also propose a heuristic for doing this efficiently which I have implemented and seems to provide pretty good results. Obviously the heuristic should be tested with the help of a simulation. I did not have the chance to do that yet. Partly also because I am lacking a proper dataset and I don't want to do this on artificial data.
>
> The advantages of JIT Routing are:
> * it is possible to do now without any protocol modification. In particular no modifications of the onions are necessary.
> * routing nodes can already easily implement it. By implementing it they will increase the routing success even for nodes which are running older implementations
> * it seems to be logically equivalent to AMP Routing. In particular its properties will also help base AMP once it is part of the protocol.
> * local channel balance information along the route can now be part of the path finding process while not decreasing the privacy by sharing information about channel balances with others. In fact the privacy of nodes is even being increased. 
>
> The disadvantages seem: 
> * it might economically not be incentivized for a routing node in every situation. Theoretically it can even happen that a node pays a fee in order to use this technique but can't earn the routing fee as the onion fails later. Nodes can implement risk management strategies to mitigate this issue.
> * The routing process might take a longer time as it starts sub routing processes.  
> * While doing JIT routing the capacity for channels should be reserved even before HTLCs are set up (to prevent hostile recursive chains of rebalancing operations)
>
> Obviously routing single big payments is a challenge for the lightning network. During the developer summit in Adelaide we have agreed to put Base AMP to BOLT 1.1. To review Base AMP the idea is basically after receiving a payment hash to create several onions on various routes to the recipient. While Base AMP in theory can find the maxflow / min cut and achieve maximum liquidity it is not clear yet how well Base AMP will really work.
>
> While it has been shown that smaller payments have a higher chance to be routed successfully there is the downside that we have more payments which increases the likelihood that any one of those payment eventually fails. As far as I know there have not been any studies researching this fact. Also the fact remains that Base AMP is still a source base routing protocol putting the sender into a tough spot as it has to guess which routes might work.
>
> How to JIT Routing?
>
> For the BOLTs we basically need one Recommendation (in fact even today nodes can do this without this explicit recommendation, but I would suggest to add the recommendation): 
>
> If a node cannot forward an incoming HTLC because the node has not enough funds on the outgoing channel the node MAY pause the routing process and try to rebalance the channel that misses liquidity. If it isn't able to rebalance the channel it should fail the onion sending back an insufficient wire funds error `temporary_channel_failure`
>
> Let us consider the following Graph and situation for an example: 
>
>   100 / 110     80 / 200
> S ----------> B --------> R
>               \         /
>         80/200  \     /  100/200
>                   \ / 
>                    T
>
> Meaning we have the following channels:
> S ---> B capacity: 110   A funds: 100  B funds:  10
> B ---> R capacity: 200   B funds:  80  R funds: 120
> B ---> T capacity: 200   B funds:  80  T funds: 120
> T ---> R capacity: 200   T funds: 100  R funds: 100
>
> Let us assume S wants to send a payment of 90 to R. With this distribution of funds this will not work with a single route S ---> B ---> R as the channel B--->R can only forward 78 (taking the channel reserve of 1% into consideration)
>
> Now with Base AMP after a protocol update this would be resolved by making two onions forwarding for example 45 each. Also S would have to pay more routing fees as the basefee of B will be charged twice. 
>
> Onion 1: S ---> B ---> R
> Onion 2: S ---> B ---> T ---> R
>
> However it is S again who has to guess how the problems with the liquidity are it does not know how B has spread its funds between R and T (and potentially other channels)
>
> With the above recommendation in place B can create a different onion with lets say moving 50 from the B---> T channel to the B ---> R channel resulting in the following situation: 
>
>   100 / 110     130 / 200
> S ----------> B --------> R
>               \         /
>         30/200  \     /  50/200
>                   \ / 
>                    T
>
> Meaning we have the following channels:
> S ---> B capacity: 110   A funds: 100  B funds:  10
> B ---> R capacity: 200   B funds: 130  R funds:  70
> B ---> T capacity: 200   B funds:  30  T funds: 170
> T ---> R capacity: 200   T funds:  50  R funds: 150
>
> Now B can easily pass the original onion with a value of 90 which was designed for the route S ---> B ---> R
>
> Obviously there can also be issues when B tries to rebalance their channels as the attempted event of rebalancing might trigger another JIT operation at another node (potentially making a later stage of the original onion harder to be forwarded). Also B does not have full information about the T---> R channel. Yet B has more information about B's neighbourhood than S does as B knows which channels are liquid and how many paths exist between the endpoints of those channels. This B should be able to make a more educated decision as the originator node S. Since B would only do a rebalancing if the routing fees for the rebalancing are cheaper than the fees they collect (and other nodes would do the same) the risk for and infinit cascade of rebalancing operations should be mitigated. Also the total CLTV for the rebalancing operation should be smaller than the original onion.  A sender of the onion could obviously start with larger cltv_deltas to allow routing nodes to have some buffer. This is also good in the sense of shadow routing as described in the BOLTs. 
>
> Another problem that might arise is the question if the routing node is able to quickly compute rebalancing paths. I have been working on a c-lightning plugin for rebalancing over the weekend (it is actually when I came up with the idea of JIT routing). Currently I use the following promising heuristic for rebalancing:
> I look at the friend of a friend network of the node that wants to rebalance channels. This network consists of alle channel partners of the node and their channels. This will in particular have all triangles and rectangles of the lightning network that the node that wants to rebalance is part of. Now I remove the node that wants to rebalance from the friend of the friend network. The resulting graph should always stay pretty small. In a regular small world network the friend of a friend graph consists of roughly 10k nodes. In particular after pruning nodes with degree 1 (which can't be used to rebalance on this subgraph) we have a pretty small network. After extracting this subnetwork (which for general operation could always be maintained while gossip messages are coming in) the computation to suggest several hundred rebalancing cycles and even ordering them by price takes less than 1 second on my machine. 
> So far I have been pretty successful finding several (cheap!) rebalancing suggestions in this network from my lightning node (which has about 40 channels) 
>
> I will release the code for the rebalancing schema soon but I wanted to have the idea already out in order to get more feedback while working on this. 
>
> best Rene
>
>  
>
> --
> https://www.rene-pickhardt.de
>
> Skype: rene.pickhardt 
>
> mobile: +49 (0)176 5762 3618