[Lightning-dev] A New Routing Paradigm: Ant Routing + `getroutequick` + Trampoline

Cezary Dziemian cezary.dziemian at gmail.com
Tue Feb 11 18:58:51 UTC 2020


That was really interesting to read.

Why do you want to send the Node ID inside the pheromone? Why not send a
random number, which would then be passed through the invoice?

To increase privacy, a node could generate a new random number every week
and distribute the pheromone again.

Best regards,
Cezary Dziemian


On Mon, 10 Feb 2020 at 11:35, ZmnSCPxj via Lightning-dev <
lightning-dev at lists.linuxfoundation.org> wrote:

> Overview of Components
> ======================
>
> Ant Routing
> -----------
>
>
> https://lists.linuxfoundation.org/pipermail/lightning-dev/2020-February/002505.html
>
> Ant Routing is a distributed pathfinding algorithm where nodes emit
> "pheromones", which are broadcast over the entire network.
> When a node receives a pheromone (a UUID plus a distance counter) from a
> peer it has a channel with, it records that pheromone locally, then
> broadcasts it to other peers it has channels with, but with the distance
> counter increased by one.
>
> Subsequently, a payee can provide the pheromone identifier to a payer,
> which can check whether it has received that pheromone, and from which
> channel it received it.
> It can then forward the payment via that channel.
> The next node can itself perform this operation, looking up the pheromone
> identifier and determining which channel it came from, and so on, until the
> payment reaches the destination.
>
> `getroutequick`
> ---------------
>
>
> https://lists.linuxfoundation.org/pipermail/lightning-dev/2019-August/002095.html
> https://zmnscpxj.github.io/activity/2019-10-18/presentation.odp
>
> The overall `getroutequick` algorithm simply uses a Greedy Best First
> Search, which requires a heuristic that returns a distance to the target.
> The heuristic is generated from cached data: the cache itself is generated
> by first performing a Dijkstra on the entire routemap, recording the
> distance from our own node at each mapped node.
> The `getroutequick` algorithm then starts at one end of the route, and the
> other end must be our own node; this typically makes sense since both ends
> of the route are the payer and the payee, and have an interest in the
> payment succeeding by having a route available.
>
> The power of `getroutequick` lies in pre-caching the Dijkstra run;
> Dijkstra is heavy as it requires traversing the entire routemap, but its
> result can be cached and then used in Greedy Best First Search, which is
> likely to traverse only the shortest route.
> This moves the heavy lifting away from the point in time at which a
> payment is initiated, improving the payment experience by reducing the
> time needed to find a route to a particular payee.
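>
> As a minimal sketch of this (names are hypothetical; a dict routemap and
> hop count as the only cost metric, purely for illustration):
>
>     import heapq
>
>     def cached_dijkstra(routemap, own_node):
>         # Heavy, run once and cached: hop distance from our own node
>         # to every mapped node, by Dijkstra over the whole routemap.
>         dist = {own_node: 0}
>         frontier = [(0, own_node)]
>         while frontier:
>             d, node = heapq.heappop(frontier)
>             if d > dist.get(node, float('inf')):
>                 continue
>             for peer in routemap[node]:
>                 if d + 1 < dist.get(peer, float('inf')):
>                     dist[peer] = d + 1
>                     heapq.heappush(frontier, (d + 1, peer))
>         return dist
>
>     def getroutequick(routemap, cached_dist, payee, own_node):
>         # Cheap, run per payment: Greedy Best First Search from the
>         # payee toward ourselves, using the cached distances as the
>         # heuristic, so we tend to expand only the shortest route.
>         frontier = [(cached_dist.get(payee, float('inf')), payee)]
>         came_from = {payee: None}
>         while frontier:
>             _, node = heapq.heappop(frontier)
>             if node == own_node:
>                 route = []
>                 while node is not None:
>                     route.append(node)
>                     node = came_from[node]
>                 return route  # ordered from ourselves to the payee
>             for peer in routemap[node]:
>                 if peer not in came_from:
>                     came_from[peer] = node
>                     heapq.heappush(
>                         frontier,
>                         (cached_dist.get(peer, float('inf')), peer))
>         return None  # payee not reachable in our routemap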
>
> Trampoline
> ----------
>
> https://github.com/lightningnetwork/lightning-rfc/pull/654
>
> Trampoline simply means doing onion routing at a level distinct from the
> individual channel level.
> An onion route of "trampoline" nodes is formed, without specifying the
> exact channels in the route between them.
> Currently, the idea is that there is an "outer" onion that describes a
> path to the next trampoline node, and an "inner" onion that describes the
> next-next trampoline node.
>
> Digression: On the Current Gossip System
> ========================================
>
> Currently, we use a gossip system where nodes and channels publicly show
> that particular outputs on the blockchain layer are actually funding
> transaction outputs that back channels.
>
> Whenever a new node joins the Lightning network, it anchors its existence
> to the blockchain by having a channel opened with it (either one it funds,
> or one that is funded to it).
> Then, on the basis of this anchored channel, it announces its existence.
> This implies that bandwidth usage for this gossip system is bounded by the
> bandwidth limits we have imposed in the blockchain layer.
> Thus, even though this gossip system requires the data to be broadcast
> globally to all nodes, the utilized resource is still bounded by the space
> limits of the Bitcoin blockchain layer.
>
> Further, the gossip system allows announcements to be refreshed
> periodically.
> However, ultimately the amount of bandwidth used for routing information
> broadcast via gossip is bounded by the amount of bandwidth we have allowed
> the blockchain layer to use (i.e. the block size and block rate parameters
> of the blockchain).
>
> Combined Proposal
> =================
>
> Combining Ant Routing and `getroutequick`
> -----------------------------------------
>
> Ant Routing, we can observe, is equivalent to a distributed form of the
> Dijkstra algorithm.
> Its major drawback is that every payment causes the emission of a
> pheromone, which must be broadcasted to the entire network.
> This implies using resources proportional to the total number of payments
> globally, unlike the current Lightning network where each node only gets
> data regarding payments it is involved in routing.
> This reverses the advantage of the Lightning Network, making it as
> inefficient (in terms of big-O, not necessarily in the lower-level
> details) as a blockchain.
>
> `getroutequick` works by pre-caching the result of a Dijkstra run, with
> the result then used as the heuristic in a subsequent, faster Greedy Best
> First Search.
>
> This leads to the realization that we could have a possible payee release
> a pheromone *once* (equivalent to the pre-cached Dijkstra run, now made
> distributed) and then subsequent payments to that payee just use the
> pre-cached Dijkstra by referring to that pheromone.
> (This reduces privacy as the pheromone persistently identifies the node,
> but I fix this later; please bear with me.)
>
> ### Pheromone Handling
>
> So what we do is this: every node that wants to be a forwarding node keeps
> a mapping from non-self node IDs to distance counters.
> A pheromone is then composed of:
>
> * Node ID (33 bytes).
> * A reference to a channel that the node is a participant of (8 bytes, a
> short channel ID).
> * A proof that the Node ID is indeed a participant in the indicated
> channel (65 bytes for the node signature, 33 bytes for the node ID of the
> counterparty of the node, maybe some more bytes for whatever details on how
> the channel script is derived or whatever).
> * A distance counter (1 byte can probably fit; do we expect Lightning
> diameter to exceed 255? maybe in some really insane conditions? how
> attackable is this?).
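>
> As a rough sketch of the above (field names are mine, not a concrete wire
> format; sizes as listed):
>
>     from dataclasses import dataclass
>
>     @dataclass(frozen=True)
>     class Pheromone:
>         node_id: bytes           # 33 bytes: the emitting node
>         short_channel_id: bytes  # 8 bytes: a channel it is in
>         channel_proof: bytes     # 65-byte signature + 33-byte
>                                  # counterparty node ID + whatever
>                                  # channel-script details are needed
>         distance: int            # hop counter; 1 byte on the wire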
>
> When a node receives a pheromone from a peer it has a channel with, it
> checks if the node already exists in its mapping.
> If it exists, and the distance counter in the pheromone is larger than or
> equal to the distance counter in the mapping, it ignores the pheromone and
> does not propagate it.
> However, if the distance counter in the pheromone is less than that in the
> mapping, or the mapping does not yet contain that node, it updates its
> mapping, then sends the pheromone to the peers it has channels with, but
> with the distance counter one greater than its own recorded value.
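>
> In sketch form, using the `Pheromone` above (the `mapping` from node ID to
> (distance, incoming peer) and the peer `send` interface are assumptions
> for illustration):
>
>     from collections import namedtuple
>     from dataclasses import replace
>
>     Record = namedtuple('Record', ['distance', 'via'])
>
>     def on_pheromone(mapping, peers, from_peer, pheromone):
>         # Assumed: the channel-membership proof was verified already.
>         known = mapping.get(pheromone.node_id)
>         if known is not None and pheromone.distance >= known.distance:
>             return  # not strictly better: ignore, do not propagate
>         # First or strictly better sighting: remember which peer it
>         # came from, then re-broadcast with the counter bumped by one.
>         mapping[pheromone.node_id] = Record(pheromone.distance, from_peer)
>         bumped = replace(pheromone, distance=pheromone.distance + 1)
>         for peer in peers:
>             if peer is not from_peer:
>                 peer.send(bumped)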
>
> ### Distance Queries and Payment Routing
>
> A node can ask each of its direct peers about that peer's distance to a
> specific node ID.
> (The peer could lie, but I cover this case later; please bear with me.)
>
> When a payer wants to route, the payer knows its (supposed) distance to
> the payee.
> It then creates an HTLC whose timelock budget covers only the given
> distance to the payee.
> The payer then queries each online peer for its mapping of the payee node
> ID and distance counter.
> It selects any online peer with sufficient payer-side capacity in the
> channel, whose distance counter is lower than its own, and sends the HTLC
> out via the channel with that peer.
> If there is no such peer, it could select the peer with the lowest
> distance counter (and also update its own distance counter to be one
> higher than that peer's).
>
> The same operation is done by each forwarding node.
> It queries each peer to which it has sufficient capacity to forward,
> getting their distance counters for the payee node.
> Then it forwards to any such peer whose distance counter is lower than its
> own.
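>
> A sketch of this selection rule (the peer query interface is again a
> hypothetical stand-in):
>
>     def pick_next_hop(peers, payee_id, amount, own_distance):
>         # Returns (next-hop peer, our possibly-updated distance).
>         # Ask each online peer with enough outgoing capacity for its
>         # recorded distance to the payee.
>         candidates = [(peer.query_distance(payee_id), peer)
>                       for peer in peers
>                       if peer.online
>                       and peer.outgoing_capacity >= amount]
>         if not candidates:
>             return None, own_distance
>         best_dist, best_peer = min(candidates, key=lambda c: c[0])
>         if best_dist < own_distance:
>             return best_peer, own_distance  # normal case: go nearer
>         # No nearer peer: use the nearest anyway, and bump our own
>         # distance counter to one higher than that peer's.
>         return best_peer, best_dist + 1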
>
> ### Forwarding Failures
>
> There are only two failures:
>
> * The destination has absolutely no idea what payment you are talking
> about.
>   * This case should include a proof that it comes from the destination,
> e.g. a signature.
> * A peer is unable to forward because the HTLC it receives has
> insufficient timelock remaining for it to forward to the destination, and
> it reports back how many more hops need to be added (this has to be 1 or
> higher).
>
> Failures are not encrypted, i.e. they are in plaintext.
>
> Suppose a forwarding node currently thinks it is 2 hops away from the
> payee.
> However, the channel to the node that is 1 hop away is closed, or that
> node goes offline, etc.
>
> Since the forwarding node also propagated the pheromone outwards such that
> it is 2 hops away from the payee, the payer has allocated it just enough
> timelock in the HTLC that reaches it to be worth 2 hops of timelock.
> The forwarding node then checks its *other* peers, in the hope that one of
> them is 1 hop away from the payee.
> If it finds a peer that is one hop away, all is well with the world and it
> just forwards to that peer.
>
> On the other hand, suppose it finds that the peer it has that is nearest
> to the payee is 4 hops away from the payee.
> It updates its own distance counter to 5 (one higher than the peer that is
> nearest), then (because the HTLC has too little timelock left for that
> number of hops) it propagates a failure back to the incoming HTLC, with the
> indication +3 (i.e. it had to add +3 to its previous distance counter of 2
> to get its current distance counter of 5).
> When the previous forwarding node (which presumably had a distance counter
> of 3) receives this failure, it can instead try other peers that have a
> distance counter of 2 or less and silently retry with those.
> If there are none, it could look for peers that are still nearer than the
> adjusted counter of the failing peer (it was 2, but with the +3 is now at
> 5, so any peer it has that is 4 or less could be nearer, and it can lower
> the adjustment of +3 to something lower).
> If there are still none, it propagates the +3 hops failure back to the
> previous node, and so on.
>
> Similarly, when the ultimate payer receives this +N hops failure, it can
> readjust its own sense of how distant it is from that payee, then retry
> with the same peer, or a different peer that is now nearer than the updated
> distance of the peer that propagated the failure.
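>
> Roughly, the readjustment on receiving a `+N hops` failure (again with
> hypothetical bookkeeping; `believed` maps each peer to the distance we
> last saw it claim):
>
>     def on_hops_failure(peers, believed, failing_peer, extra_hops):
>         # The failing peer has admitted it is `extra_hops` further
>         # away than it previously claimed; update our view of it.
>         believed[failing_peer] += extra_hops
>         adjusted = believed[failing_peer]
>         # E.g. it claimed 2 and reported +3, so it is now at 5: any
>         # other peer believed to be at 4 or less may still be nearer.
>         nearer = [p for p in peers
>                   if p is not failing_peer
>                   and believed.get(p, 255) < adjusted]
>         if nearer:
>             return min(nearer, key=lambda p: believed[p])  # retry here
>         return None  # nobody nearer: propagate the +N failure back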
>
> This case partially handles liars who underreport their distance to the
> payee.
> They cannot forward the payment and expect it to push through, since some
> node beyond them will see that it has too little time left in the HTLC to
> propagate, and will report this as a +N hops failure.
> The lying node can then only admit its lie and report back a +N hops
> failure, or keep mum and keep the HTLC until it is about to time out (the
> latter behavior is something current forwarding nodes can already do).
> (But note: a lying node could effectively censor a particular node, by
> misreporting its distance to that node as 1 and then keeping the HTLC
> around until near the locktime limit, then admit just a +1 hops failure (if
> it did not, its peer will drop the channel with the lying node onchain in
> order to enforce the locktime limit and will then search among *its* peers
> for how distant it truly is), slowly reducing its censorship but still able
> to do so among a number of nodes on the network; this will settle over
> time, but would still be a nasty attack.)
> (This could be mitigated if we had a clever mathematical construct such
> that if I have proof that somebody committed to N, I can forge a proof that
> the same somebody committed to N+1 but cannot forge a proof that somebody
> committed to N-1; maybe sinking signatures, with "height" in sinking
> signatures being some constant minus the distance? But then if you get N
> you could propagate N, and you could as well also be running multiple nodes
> that you propagate N to without actually having channels with them.)
>
> ### Periodic Updates
>
> Periodically, a node may want to update the Dijkstra run for itself, by
> re-emitting a new pheromone.
> This refreshes the rest of the network's sense of their distance to each
> node on the network.
>
> We already allow the current gossip protocol to publish similar periodic
> updates to the channel and node announcements, thus we expect the big-O
> bandwidth use to be similar with periodic pheromone re-emission.
>
> Plugging Privacy With Trampoline Routing
> ----------------------------------------
>
> The core idea of trampoline routing is simply to add onion routing at a
> level above individual channel hops.
>
> In current Lightning, we use onion routing to encode individual channel
> hops.
> Individual channel hops, with our new Ant Routing + `getroutequick`
> scheme, are handled by the distributed `getroutequick`.
> This has the major drawback that the `getroutequick` needs to know the
> exact destination.
>
> For this new routing paradigm, we simply add onion routing on a layer
> above the channel-level routing, in much the same way that Tor adds onion
> routing on top of the router-level routing of TCP/IP.
> That is, we move the onion packet from the hop level to the trampoline
> node level.
>
> An onion packet is sent to the destination that we admit to the
> `getroutequick` channel-level routing layer.
> This onion packet indicates openly how much of the HTLC timelock is
> allocated to reach that destination, and how much is allocated to the
> destination itself.
> On reaching the destination, that node unwraps the onion by one layer; it
> might find that it is the final destination, or that it has to use the
> allocated timelock to forward to the next destination.
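>
> A toy sketch of the unwrap step (a plain list of layers stands in for the
> real encrypted onion; `forward` and `settle` are injected, hypothetical
> callbacks):
>
>     from dataclasses import dataclass
>     from typing import Optional
>
>     @dataclass
>     class TrampolineHop:
>         next_node: Optional[bytes]  # None: we are the final payee
>         hops_to_next: int           # openly-visible timelock budget
>
>     def process_trampoline(layers, forward, settle):
>         # In reality each node can only decrypt its own layer; here
>         # we simply pop it off a plaintext list.
>         hop, rest = layers[0], layers[1:]
>         if hop.next_node is None:
>             settle()  # we are the real payee
>         else:
>             # Spend our allocated hop budget to reach the next
>             # trampoline via the distributed `getroutequick` layer.
>             forward(hop.next_node, rest, budget=hop.hops_to_next)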
>
> ### Payer Allocation
>
> Each node already has a mapping of nodes to distance counters.
> Thus, to make a payment to a node, it need only find some number of nodes
> in this mapping.
>
> Ideally, it finds two other nodes that are nearer to it than the payee
> node (i.e. have distance counters lower than the distance to the actual
> payee) in the hope that they will be in "the same" direction or area as the
> final destination.
>
> Suppose that it currently has a destination, and selects an intermediate
> node in the trampoline onion.
> It got both nodes from its local map of node ID to distance, thus it knows
> the distance from itself to both nodes (assuming this data is not too
> stale).
> By considering itself and the two nodes as a triangle, we can conclude,
> by the triangle inequality, that the maximum distance between those two
> nodes is equal to the sum of the distances between itself and each of the
> nodes.
> So for example if the trampoline node is 5 hops away and the payee is 3
> hops away, then the maximum distance between both is 8 hops.
> So, the payer will allocate 8 hops for the route from the intermediate
> node to the current destination.
>
> The payer can repeat that however many times it wants, in order to add
> more layers to the trampoline onion.
> For example, it could add second trampoline, with a distance of 6 hops
> away, and it gets the 5 hops distance of the previous trampoline node and
> adds those together, allocating 11 hops for the route from second to first
> trampoline, and allocating the above 8 hops for the first trampoline to the
> payee.
> Then, it sends out an HTLC of a total of 6 (distance to second trampoline)
> + 11 (route from second to first trampoline) + 8 (route from first
> trampoline to destination) hops of timelock and fees.
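>
> The budgeting arithmetic of the example, as a tiny helper (purely
> illustrative):
>
>     def trampoline_budgets(hop_distances):
>         # hop_distances: our cached distance to each trampoline in
>         # route order, ending with the payee; [6, 5, 3] in the
>         # example above.
>         # By the triangle inequality, the distance between two
>         # consecutive nodes is at most the sum of our distances to
>         # each of them.
>         budgets = [hop_distances[0]]
>         for a, b in zip(hop_distances, hop_distances[1:]):
>             budgets.append(a + b)
>         return budgets, sum(budgets)
>
>     # trampoline_budgets([6, 5, 3]) == ([6, 11, 8], 25)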
>
> ### Trampoline-level Failures
>
> A trampoline node might, for some reason, not be able to find the next
> trampoline.
> It might also find that the payer information is so stale that the
> budgeted number of hops for that trampoline hop is insufficient.
> In those cases, the trampoline node can report back a failure.
>
> These would have a distinct failure code compared to the +N hops needed
> failure at the channel hop level.
> However, it may be possible to add the destination failure "I have no idea
> what payment you are talking about" to these failures.
> Like the onion-wrapped failures in the current Lightning onion routing, it
> may be possible to also onion-wrap the failures at the trampoline node
> level.
> Then failures seen at the channel hop level are either "some onion-wrapped
> failure" or "add +N hops plz".
> Of note is that a trampoline node that finds it has been given
> insufficient budget should not use the "add +N hops plz" message, but an
> onion-wrapped failure (which itself might be just "add +N hops plz, but to
> my budget").
>
> ### Payment Point Rotation
>
> With payment point+scalar, we can have "path decorrelation", where each
> hop in the onion route changes the outgoing PTLC by a constant scalar that
> is secretly shared between itself and the original payer.
>
> This will still work by moving the onion routing one level higher, from
> the level of channel hops to the level of trampoline nodes.
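>
> A toy illustration of the decorrelation (integer addition modulo a prime
> stands in for the actual elliptic-curve group; real PTLCs would use
> secp256k1 points):
>
>     P = 2**255 - 19  # a prime, used here as a stand-in group order
>
>     def decorrelate_path(payment_point, hop_scalars):
>         # Each trampoline hop tweaks the outgoing PTLC point by a
>         # scalar secretly shared between that hop and the original
>         # payer, so points seen at successive hops do not correlate.
>         points = [payment_point]
>         for s in hop_scalars:
>             points.append((points[-1] + s) % P)
>         return points  # one distinct-looking point per hop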
>
> ### Hidden Destinations
>
> Another advantage of trampoline onions is that the payee can provide a
> pre-encrypted trampoline onion which the payer can prepend its own
> trampoline nodes to, i.e. hidden destinations.
> This feature remains unchanged when done on top of the hop-level Ant
> Routing + `getroutequick` scheme.
>
>
> Advantages and Disadvantages
> ----------------------------
>
> Advantages:
>
> * Far fewer channels have to be published offchain; a node that wants to
> receive in the future needs only publish *one* of its channels in order to
> act as proof that it is on Lightning.
>   A node that expects many incoming payments can have multiple channels
> opened towards it, but only admit ownership of one channel.
>   Two cooperating nodes that have a channel between them can use that
> channel to prove their existence in their individual pheromones, thereby
> revealing to the rest of the network only a small number of channels.
> This makes it much harder to map out the full network, with the hiding
> much more consistently distributed among nodes, for better risk-sharing.
>   * Nodes that do not expect to receive in the future, except from direct
> peers, do not need to publish *any* channels.
>     Even if they do not publish *any* channels, however, they can still
> participate in forwarding, since forwarding nodes perform local queries to
> determine the next hop, and participation in forwarding is an important
> privacy technique; a node that regularly forwards can always claim that its
> payment is actually forwarded from somebody else, a property lost with the
> current "private"^H^H^H^H^H^H^H^H^Hunpublished channels.
>
> Disadvantages:
>
> * Censorship attacks are trivial: just lie about your distance to the node
> you want to censor and say you know them, then just HODL on to the HTLC
> that should go to them.
>   This is a major disadvantage, thus possibly a good reason to avoid this
> paradigm.
>   My hope is someone else can come up with some way to mitigate or remove
> this weakness.
> * We would need to impose a consistent feerate and CLTV-delta at each node.
>   * On the other hand, one can argue that it is precisely the ability of
> forwarding nodes to give arbitrary feerates that allows surveillors to
> effectively "pay to surveill" by giving below-market-rate feerates, so
> ***maybe*** central planning of this parameter (yuck!) would be useful...
> nah non-free-market, boo.
>
> Regards,
> ZmnSCPxj
> _______________________________________________
> Lightning-dev mailing list
> Lightning-dev at lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
>