[Lightning-dev] A New Routing Paradigm: Ant Routing + `getroutequick` + Trampoline

ZmnSCPxj ZmnSCPxj at protonmail.com
Wed Feb 12 04:08:32 UTC 2020


Good morning Cezary,

> That was really interesting to read.
> Why do you want to send the Node ID inside the pheromone? Why not send a random number, which would then be passed through the invoice?
> To increase privacy, a node could generate a new random number every week and distribute the pheromone again.

Our spam-limiting mechanism in current Lightning is tied to node IDs, and node IDs are effectively anchored onchain by the channels they have (and onchain is public anyway...).
I was hoping to reuse the same mechanism to give an upper bound on how much spam you can ride on the Lightning pheromone network (the same bound applies to the current Lightning gossip, you can only spam Lightning by spamming Bitcoin, and spamming Bitcoin is costly).

However, if you have other ideas about how to limit pheromone spam, I would gladly hear them.

Regards,
ZmnSCPxj


> Best regards,
> Cezary Dziemian
>
> pon., 10 lut 2020 o 11:35 ZmnSCPxj via Lightning-dev <lightning-dev at lists.linuxfoundation.org> napisał(a):
>
> > Overview of Components
> > ======================
> >
> > Ant Routing
> > -----------
> >
> > https://lists.linuxfoundation.org/pipermail/lightning-dev/2020-February/002505.html
> >
> > Ant Routing is a distributed pathfinding algorithm where nodes emit "pheromones", which are broadcasted over the entire network.
> > When a node receives a pheromone (a UUID plus a distance counter) from a peer it has a channel with, it records that pheromone locally, then broadcasts it to other peers it has channels with, but with the distance counter increased by one.
> >
> > A payee can then provide the pheromone identifier to a payer, which checks whether it has received that pheromone, and from which channel it received it.
> > It can then forward the payment via that channel.
> > The next node can itself perform this operation, looking up the pheromone identifier and determining which channel it came from, and so on, until the payment reaches the destination.
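> >
> > As a rough sketch of the tables this implies (Python, with all names hypothetical), each node keeps a mapping from pheromone UUID to the best distance seen and the peer it arrived from, and forwards payments back along that peer:
> >
```python
# Hypothetical sketch of the basic Ant Routing tables described above.
# The table maps pheromone UUID -> (distance, peer the pheromone arrived on).

def record_pheromone(table, uuid, distance, from_peer):
    """Keep only the lowest-distance copy of each pheromone."""
    best = table.get(uuid)
    if best is None or distance < best[0]:
        table[uuid] = (distance, from_peer)
        return True   # caller should rebroadcast with distance + 1
    return False      # equal or worse copy: do not rebroadcast

def next_hop(table, uuid):
    """Which peer should a payment bearing this pheromone ID go to?"""
    entry = table.get(uuid)
    return entry[1] if entry is not None else None
```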
> >
> > `getroutequick`
> > ---------------
> >
> > https://lists.linuxfoundation.org/pipermail/lightning-dev/2019-August/002095.html
> > https://zmnscpxj.github.io/activity/2019-10-18/presentation.odp
> >
> > The overall `getroutequick` algorithm simply uses a Greedy Best First Search, which requires a heuristic that returns a distance to the target.
> > The heuristic is generated from cached data: the cache itself is generated by first performing a Dijkstra on the entire routemap, recording the distance from our own node at each mapped node.
> > The `getroutequick` algorithm then starts at one end of the route, and the other end must be our own node; this typically makes sense since both ends of the route are the payer and the payee, and have an interest in the payment succeeding by having a route available.
> >
> > The power of `getroutequick` lies in pre-caching the Dijkstra run; Dijkstra is heavy as it requires traversing the entire routemap, but its result can be cached and then used in Greedy Best First Search, which is likely to traverse only the shortest route.
> > This moves the heavy lifting from the point in time in which a payment is initiated, improving payment experience by reducing the amount of time needed to find a route to a particular payee.
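> >
> > A minimal sketch of the two phases (Python, unit hop costs, illustrative names): a full Dijkstra run caches distances from our node, and a later Greedy Best First Search from the payee toward us uses those cached distances as its heuristic:
> >
```python
import heapq

def dijkstra_distances(graph, source):
    """Precompute hop distances from `source` to every node (the cache).
    `graph` maps node -> list of neighbor nodes, all edges cost 1 hop."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nbr in graph[node]:
            if d + 1 < dist.get(nbr, float("inf")):
                dist[nbr] = d + 1
                heapq.heappush(queue, (d + 1, nbr))
    return dist

def getroutequick(graph, cached_dist, payee, our_node):
    """Greedy Best First Search from the payee end back to our own node,
    always expanding the node the cache says is nearest to us."""
    frontier = [(cached_dist.get(payee, float("inf")), payee)]
    came_from = {payee: None}
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == our_node:
            # Follow parent pointers from our node back to the payee;
            # this yields the route in us -> payee order directly.
            route = []
            while node is not None:
                route.append(node)
                node = came_from[node]
            return route
        for nbr in graph[node]:
            if nbr not in came_from:
                came_from[nbr] = node
                heapq.heappush(
                    frontier, (cached_dist.get(nbr, float("inf")), nbr))
    return None  # no route found
```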
> >
> > Trampoline
> > ----------
> >
> > https://github.com/lightningnetwork/lightning-rfc/pull/654
> >
> > Trampoline simply means doing onion routing at a level distinct from the individual channel level.
> > An onion route of "trampoline" nodes is formed, without specifying the exact channels in the route between them.
> > Currently, the idea is that there is an "outer" onion that describes a path to the next trampoline node, and an "inner" onion that describes the next-next trampoline node.
> >
> > Digression: On the Current Gossip System
> > ========================================
> >
> > Currently, we use a gossip system where nodes and channels publicly show that particular outputs on the blockchain layer are actually funding transaction outputs that back channels.
> >
> > Whenever a new node joins the Lightning network, it anchors its existence to the blockchain by having a channel opened with it (either it funds, or is funded to).
> > Then, on the basis of this anchored channel, it announces its existence.
> > This implies that bandwidth usage for this gossip system is bounded by the bandwidth limits we have imposed in the blockchain layer.
> > Thus, even though this gossip system requires the data to be broadcast globally to all nodes, the utilized resource is still bounded by the space limits of the Bitcoin blockchain layer.
> >
> > Further, the gossip system allows announcements to be refreshed periodically.
> > However, ultimately the amount of bandwidth used for routing information broadcast via gossip is bounded by the amount of bandwidth we have allowed the blockchain layer to use (i.e. the block size and block rate parameters of the blockchain).
> >
> > Combined Proposal
> > =================
> >
> > Combining Ant Routing and `getroutequick`
> > -----------------------------------------
> >
> > Ant routing, we can observe, is equivalent to a distributed form of the Dijkstra algorithm.
> > Its major drawback is that every payment causes the emission of a pheromone, which must be broadcasted to the entire network.
> > This implies using resources proportional to the total number of payments globally, unlike the current Lightning network where each node only gets data regarding payments it is involved in routing.
> > This reverses the advantage of the Lightning network, making it as inefficient (in terms of big-O, not necessarily in the lower-level details) as a blockchain.
> >
> > `getroutequick` works by pre-caching the result of a Dijkstra run, with the result then used as the heuristic in a subsequent, faster Greedy Best First Search.
> >
> > This leads to the realization that we could have a possible payee release a pheromone *once* (equivalent to the pre-cached Dijkstra run, now made distributed) and then subsequent payments to that payee just use the pre-cached Dijkstra by referring to that pheromone.
> > (This reduces privacy as the pheromone persistently identifies the node, but I fix this later; please bear with me.)
> >
> > ### Pheromone Handling
> >
> > So what we do is, every node that wants to be a forwarding node keeps a mapping, from a non-self node ID to a distance counter.
> > A pheromone is then composed of:
> >
> > * Node ID (33 bytes).
> > * A reference to a channel that the node is a participant of (8 bytes, a short channel ID).
> > * A proof that the Node ID is indeed a participant in the indicated channel (65 bytes for the node signature, 33 bytes for the node ID of the counterparty of the node, maybe some more bytes for whatever details on how the channel script is derived or whatever).
> > * A distance counter (1 byte can probably fit; do we expect Lightning diameter to exceed 255? maybe in some really insane conditions? how attackable is this?).
> >
> > When a node receives a pheromone from a peer it has a channel with, it checks if the node already exists in its mapping.
> > If it exists, and the distance counter in the pheromone is larger than or equal to the distance counter in the mapping, it ignores the pheromone and does not propagate it.
> > However, if the distance counter in the pheromone is less than that in the mapping, or the mapping does not contain that node ID, it updates its mapping, then sends the pheromone to the peers it has channels with, but with the distance counter one greater than what it has now.
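> >
> > The rules above might look like this in a sketch (Python; the `Pheromone` fields and class names are placeholders for whatever wire format is actually chosen, and the channel-participation proof is not checked here):
> >
```python
from dataclasses import dataclass

@dataclass
class Pheromone:
    node_id: bytes         # 33-byte node ID
    short_channel_id: int  # 8-byte channel reference anchoring the node
    proof: bytes           # signature proving channel participation
                           # (verification omitted in this sketch)
    distance: int          # hop counter, incremented on each rebroadcast

class ForwardingNode:
    def __init__(self):
        self.distance_map = {}  # node_id -> best known distance
        self.peers = []         # peers we have channels with

    def on_pheromone(self, p, from_peer):
        best = self.distance_map.get(p.node_id)
        if best is not None and p.distance >= best:
            return  # no improvement: ignore, do not propagate
        self.distance_map[p.node_id] = p.distance
        relayed = Pheromone(p.node_id, p.short_channel_id, p.proof,
                            p.distance + 1)
        for peer in self.peers:
            if peer is not from_peer:
                peer.on_pheromone(relayed, self)
```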
> >
> > ### Distance Queries and Payment Routing
> >
> > A node can ask its direct peers about their distance to a specific node ID.
> > (The peer could lie, but I cover this case later; please bear with me.)
> >
> > When a payer wants to route, the payer knows its (supposed) distance to the payee.
> > It then creates an HTLC whose timelock budget covers only that distance to the payee.
> > The payer then queries each online peer for its distance counter for the payee node ID.
> > It selects any online peer with sufficient payer-side capacity in the channel and a distance counter lower than its own, and sends the HTLC out via the channel with that peer.
> > If there is no such peer, it can select the peer with the lowest distance counter (and update its own distance counter to one higher than that peer's).
> >
> > The same operation is done by each forwarding node.
> > It queries each peer to which it has sufficient capacity to forward, getting their distance counters for the destination node.
> > Then it forwards to any such peer whose distance counter is lower than its own.
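> >
> > This selection rule, used by both the payer and each forwarding node, could be sketched as follows (Python, names assumed; `peers` pairs each peer with its reported distance and the capacity we can push toward it):
> >
```python
def select_next_hop(peers, own_distance, amount):
    """peers: list of (name, reported_distance, capacity_towards_peer).
    Returns (chosen_peer_or_None, possibly_updated_own_distance)."""
    usable = [(name, d) for name, d, cap in peers if cap >= amount]
    if not usable:
        return None, own_distance
    nearer = [(name, d) for name, d in usable if d < own_distance]
    if nearer:
        # Any strictly-nearer peer works; pick the nearest one here.
        return min(nearer, key=lambda x: x[1])[0], own_distance
    # No nearer peer: fall back to the lowest-distance usable peer and
    # bump our own counter to one more than that peer's.
    name, d = min(usable, key=lambda x: x[1])
    return name, d + 1
```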
> >
> > ### Forwarding Failures
> >
> > There are only two failures:
> >
> > * The destination has absolutely no idea what payment you are talking about.
> >   * This case should include a proof that it comes from the destination, e.g. a signature.
> > * A peer is unable to forward because the HTLC it receives has insufficient timelock remaining for it to forward to the destination, and it reports back how many more hops need to be added (this has to be 1 or higher).
> >
> > Failures are not encrypted, i.e. they are in plaintext.
> >
> > Suppose a forwarding node currently thinks it is 2 hops away from the payee.
> > However, the channel to the node that is 1 hop away is closed, or that node goes offline, etc.
> >
> > Since the forwarding node also propagated the pheromone outwards such that it is 2 hops away from the payee, the payer has allocated it just enough timelock in the HTLC that reaches it to be worth 2 hops of timelock.
> > The forwarding node then checks its *other* peers, in the hope that one of them is 1 hop away from the payee.
> > If it finds a peer that is one hop away, all is well with the world and it just forwards to that peer.
> >
> > On the other hand, suppose it finds that the peer it has that is nearest to the payee is 4 hops away from the payee.
> > It updates its own distance counter to 5 (one higher than the peer that is nearest), then (because the HTLC has too little timelock left for that number of hops) it propagates a failure back to the incoming HTLC, with the indication +3 (i.e. it had to add +3 to its previous distance counter of 2 to get its current distance counter of 5).
> > When the previous forwarding node (which presumably had a distance counter of 3) receives this failure, it can try other peers that have a distance counter of 2 or less and silently retry with those.
> > If there are none, it can look for peers that are still nearer than the adjusted counter of the failing peer (it was 2, but with the +3 is now at 5, so any peer at distance 4 or less could be nearer, and it can lower the adjustment of +3 to something smaller).
> > If there are still none, it propagates the +3 hops failure back to the previous node, and so on.
> >
> > Similarly, when the ultimate payer receives this +N hops failure, it can readjust its own sense of how distant it is from that payee, then retry with the same peer, or a different peer that is now nearer than the updated distance of the peer that propagated the failure.
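> >
> > The readjustment arithmetic above could be sketched like so (Python; names are illustrative), using the example numbers, where the failing peer reported +3 on top of its old distance of 2:
> >
```python
def handle_hops_failure(failing_peer_distance, n_more_hops, other_peers):
    """other_peers: list of (peer, distance).
    Returns (retry_peer_or_None, adjusted_distance_of_failing_peer)."""
    adjusted = failing_peer_distance + n_more_hops  # e.g. 2 + 3 = 5
    # Any peer strictly nearer than the adjusted counter is worth a retry.
    nearer = [(p, d) for p, d in other_peers if d < adjusted]
    if nearer:
        return min(nearer, key=lambda x: x[1])[0], adjusted
    # Nothing nearer: the +N failure must propagate further upstream.
    return None, adjusted
```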
> >
> > This case partially handles liars who underreport their distance to the payee.
> > They cannot forward the payment and expect the payment to push through, since some node beyond it will see that it has too little time left in the HTLC to propagate and report this as a +N hops failure.
> > The lying node can then only admit its lie and report back a +N hops failure, or keep mum and keep the HTLC until it is about to time out (the latter behavior is something current forwarding nodes can already do).
> > (But note: a lying node could effectively censor a particular node, by misreporting its distance to that node as 1 and then keeping the HTLC around until near the locktime limit, then admit just a +1 hops failure (if it did not, its peer will drop the channel with the lying node onchain in order to enforce the locktime limit and will then search among *its* peers for how distant it truly is), slowly reducing its censorship but still able to do so among a number of nodes on the network; this will settle over time, but would still be a nasty attack.)
> > (This could be mitigated if we had a clever mathematical construct such that if I have proof that somebody committed to N, I can forge a proof that the same somebody committed to N+1 but cannot forge a proof that somebody committed to N-1; maybe sinking signatures, with "height" in sinking signatures being some constant minus the distance? But then if you get N you could propagate N; and you could also be running multiple nodes that you propagate N to but not actually have channels with.)
> >
> > ### Periodic Updates
> >
> > Periodically, a node may want to update the Dijkstra run for itself, by re-emitting a new pheromone.
> > This refreshes the rest of the network's sense of their distance to the re-emitting node.
> >
> > We already allow the current gossip protocol to publish similar periodic updates to the channel and node announcements, thus we expect the big-O bandwidth use to be similar with periodic pheromone re-emission.
> >
> > Plugging Privacy With Trampoline Routing
> > ----------------------------------------
> >
> > The core idea of trampoline routing is simply to add onion routing at a level above individual channel hops.
> >
> > In current Lightning, we use onion routing to encode individual channel hops.
> > Individual channel hops, with our new Ant Routing + `getroutequick` scheme, are handled by the distributed `getroutequick`.
> > This has the major drawback that `getroutequick` needs to know the exact destination.
> >
> > For this new routing paradigm, we simply add onion routing on a layer above the channel-level routing, in much the same way that Tor adds onion routing on top of TCP's router-level routing.
> > That is, we move the onion packet from the hop level to the trampoline node level.
> >
> > An onion packet is sent to the destination that we reveal to the `getroutequick` channel-level routing layer.
> > This onion packet indicates openly how much of the HTLC timelock is allocated to reach that destination, and how much is allocated to the destination itself.
> > On reaching the destination, it unwraps the onion by one layer, and it might find that it is the final destination, or that it has to use the allocated timelock as well to forward to the next destination.
> >
> > ### Payer Allocation
> >
> > Each node already has a mapping of nodes to distance counters.
> > Thus, to make a payment to a node, it need only find some number of nodes in this mapping.
> >
> > Ideally, it finds two other nodes that are nearer to it than the payee node (i.e. have distance counters lower than the distance to the actual payee) in the hope that they will be in "the same" direction or area as the final destination.
> >
> > Suppose that it currently has a destination, and selects an intermediate node in the trampoline onion.
> > It got both nodes from its local map of node ID to distance, thus it knows the distance from itself to both nodes (assuming this data is not too stale).
> > By considering itself and the two nodes as a triangle, we can conclude that the maximum distance between those two nodes is equal to the sum of the distances between itself and each of the nodes.
> > So for example if the trampoline node is 5 hops away and the payee is 3 hops away, then the maximum distance between both is 8 hops.
> > So, the payer will allocate 8 hops for the route from the intermediate node to the current destination.
> >
> > The payer can repeat that however many times it wants, in order to add more layers to the trampoline onion.
> > For example, it could add a second trampoline with a distance of 6 hops; adding that to the 5-hop distance of the previous trampoline node gives 11 hops allocated for the route from the second to the first trampoline, with the above 8 hops allocated for the first trampoline to the payee.
> > Then, it sends out an HTLC of a total of 6 (distance to second trampoline) + 11 (route from second to first trampoline) + 8 (route from first trampoline to destination) hops of timelock and fees.
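> >
> > This allocation can be written as a small helper (Python; hypothetical names), bounding each trampoline leg by the triangle inequality and summing the legs into the total HTLC budget:
> >
```python
def trampoline_budget(distances_from_payer):
    """distances_from_payer: the payer's hop distance to each node in the
    trampoline route, in route order and ending at the payee,
    e.g. [6, 5, 3] for second trampoline, first trampoline, payee.
    Returns (per_leg_budgets, total_hop_budget)."""
    total = distances_from_payer[0]  # payer -> first node in the onion
    legs = [total]
    for a, b in zip(distances_from_payer, distances_from_payer[1:]):
        leg = a + b  # triangle-inequality upper bound between the two hops
        legs.append(leg)
        total += leg
    return legs, total
```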
> >
> > ### Trampoline-level Failures
> >
> > A trampoline node might, for some reason, not be able to find the next trampoline.
> > It might also find that the payer information is so stale that the budgeted number of hops for that trampoline hop is insufficient.
> > In those cases, the trampoline node can report back a failure.
> >
> > These would have a distinct failure code compared to the +N hops needed failure at the channel hop level.
> > However, it may be possible to add the destination's "I have no idea what payment you are talking about" failure to these failures.
> > Like the onion-wrapped failures in the current Lightning onion routing, it may be possible to also onion-wrap the failures at the trampoline node level.
> > Then failures seen at the channel hop level are either "some onion-wrapped failure" or "add +N hops plz".
> > Of note is that a trampoline node that finds it has been given insufficient budget should not use the "add +N hops plz" message, but an onion-wrapped failure (which itself might be just "add +N hops plz, but to my budget").
> >
> > ### Payment Point Rotation
> >
> > With payment point+scalar, we can have "path decorrelation", where each hop in the onion route changes the outgoing PTLC by a constant scalar that is secretly shared between itself and the original payer.
> >
> > This will still work by moving the onion routing one level higher, from the level of channel hops to the level of trampoline nodes.
> >
> > ### Hidden Destinations
> >
> > Another advantage of trampoline onions is that the payee can provide a pre-encrypted trampoline onion which the payer can prepend its own trampoline nodes to, i.e. hidden destinations.
> > This feature remains unchanged when done on top of the hop-level Ant Routing + `getroutequick` scheme.
> >
> > Advantages and Disadvantages
> > ----------------------------
> >
> > Advantages:
> >
> > * Far fewer channels have to be published offchain; a node that wants to receive in the future needs only publish *one* of its channels in order to act as proof that it is on Lightning.
> >   A node that expects many incoming payments can have multiple channels opened towards it, but only admit ownership of one channel.
> >   Two cooperating nodes that have a channel between them can use that channel to prove their existence in their individual pheromones, thereby revealing to the rest of the network only a small number of channels.
> > This makes it much harder to map out the full network, and this hiding is much more consistently distributed among nodes, for better risk-sharing.
> >   * Nodes that do not expect to receive in the future, except from direct peers, do not need to publish *any* channels.
> >     Even if they do not publish *any* channels, however, they can still participate in forwarding, since forwarding nodes perform local queries to determine the next hop, and participation in forwarding is an important privacy technique; a node that regularly forwards can always claim that its payment is actually forwarded from somebody else, a property lost with the current "private"^H^H^H^H^H^H^H^H^Hunpublished channels.
> >
> > Disadvantages:
> >
> > * Censorship attacks are trivial: just lie about your distance to the node you want to censor and say you know them, then just HODL on to the HTLC that should go to them.
> >   This is a major disadvantage, thus possibly a good reason to avoid this paradigm.
> >   My hope is someone else can come up with some way to mitigate or remove this weakness.
> > * We would need to impose a consistent feerate and CLTV-delta at each node.
> >   * On the other hand, one can argue that it is precisely the ability of forwarding nodes to give arbitrary feerates that allows surveillors to effectively "pay to surveill" by giving below-market-rate feerates, so ***maybe*** central planning of this parameter (yuck!) would be useful... nah non-free-market, boo.
> >
> > Regards,
> > ZmnSCPxj
> > _______________________________________________
> > Lightning-dev mailing list
> > Lightning-dev at lists.linuxfoundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev



