[Lightning-dev] Routemap scaling (was: Just in Time Routing (JIT-Routing) and a channel rebalancing heuristic as an add on for improved routing success in BOLT 1.0)
ZmnSCPxj at protonmail.com
Tue Apr 9 10:32:00 UTC 2019
Good morning m.a.holden,
> > Let me clarify: When you say "node" here, do you mean Lightning Network node?
> > Or do you mean instead an in-memory node?
> Neither. I meant a node in a tree. I tried to use the term bucket to make the distinction between this and Lightning node.
> The tree is not strictly in-memory. There is a purely conceptual global quadtree.
Can you then be more specific what is the behavior of a specific node?
Note I use the term "node" specifically to refer to Lightning Network Node.
Given the node has space for N channels or N nodes (or whatever), how does the node prune its routemap?
What gossip messages do it ignore and which ones do it check?
If that node has to pay to a node that is not in its routemap, how does it perform the routemap lookup?
> Each bucket is essentially a view over the union of its 4 children. The bucket 00b is (0000b U 0001b U 0010b U 0011b). The bucket 0011b is (001100b U 001101b U 001110b 001111b), etc. Each client choses a suitable max_depth <= HASHLEN/2, and the leaves of their tree contain a set of all PKH whose prefix matches the bucket index. A client will keep, at minimum, a subtree of the global quadtree (and both nodes for all channels where at least one of the nodes is in the same subtree).
When you refer to "sub-tree", do you mean each node stores a tree of certain maximum height, with the root of that tree being some random node on the global tree?
When you refer to "bucket spilling", what does it mean when the buvket has to spill beyond the maximum height of the tree that the node is willing to store?
Obviously some kind of pruning has to be done.
That is the whole point of this exercise.
I still do not see the benefit of your lookup scheme if the lookup scheme is just in-memory.
That is not what is interesting at all.
What is interesting is remote lookup, over the network.
It is what must be carefully designed, as the network cannot be trusted and anyone can be an attacker and anyone can be a victim.
Suppose a node decides to store the prefix 0b00000000, storing the subtree at that point with a height of only 2 (for simplicity).
What happens when it needs to look for a node with prefix 0b01101101?
Does it mean the node has to somehow store nodes that are outside of the prefix it decides to store?
Just what data does it have to store?
And why not just use a dist() measure like I proposed?
> > As I understand your proposal, taking over or taking down a node near the root of the tree will make it difficult to look up several nodes at once.
> > This is because the tree nature makes nodes near the root of the tree (top of the hierarchy) much more important than leaf nodes, which only worry about themselves.
> If a node advertises itself near the top of the tree then it will be a better information provider than others, but this is never at the exclusion of others below it. All of the information a node in the parent bucket holds is held in part by the nodes in the child buckets. The parent does not know anything that other people don't know too. No nodes are "more important", but they might potentially be "more optimal".
We should face the fact that nodes that know more about the network are "more important", really.
Thus, they should be protected.
The simplest protection is to anonymize them by never revealing how much each node actually knows of the network.
This enforces that all nodes are peers, with nobody being easily visible as more important than anyone else.
Instead, each node that is used in lookup can simply silently delegate (without informing whoever is asking) the lookup to some other arbitrary node that it knows is more likely to know the destination.
That node might know more than this node, or less, but it is immaterial --- no node knows how much each node knows.
> If you know about some nodes in bucket 0011b, but none in 0001b where a payment destination is, then you could query any node in 0011b and ask about any nodes in bucket 0001b. Since the buckets are nearby, there is a greater probability that they'll know about more nodes than somebody further away. This would be similar to how your proposal operates.
Then just use my proposal.
It is simpler and I already presented the algorithms used by each node in detail, including how each node limits the memory it uses for routemap.
As I mentioned, the point is to provide anonymity by not revealing how much you know.
There is a tradeoff between anonymity and some measure of efficiency.
Obviously, I prioritize anonymity, else I would not be ZmnSCPxj.
> you could query any node in 0011b and ask about any nodes in bucket 0001b
In order to query a node, you need to know that node, meaning it (and at least one route to it) is stored somewhere in your (limited) memory.
Note that even now in the network, some nodes do not provide a way to contact them directly --- you can only contact them over the channels they have advertised.
Thus to contact a node on a different bucket, you must store the route to the node in that bucket.
Thus my proposal, which allows a node to delegate to some other node, on the assumption that different nodes have different local views.
Just use a continuous distance function rather than buckets, buckets don't buy you anything.
> If somebody did advertise being in bucket 00b, then they're able to find a potentially better (shorter) path to the destination because they know more information and you don't need to find a path through multiple buckets. If they are under DDoS, it doesn't bring down the network or limit access to the child buckets - it just makes it trivially less optimal to route because it might (low probability) require more queries to reach the destination.
> When querying, it is expected that if you know about any nodes in the same bucket as the payment destination, then there's a high probability that they will know a route to the destination. A parent bucket of that bucket is not any more likely to know about the destination, they have the same information. I've shown that at small depth, there's a high probability that you will have knowledge about a reasonable quantity of paths to every other bucket in the global network - so the majority of payments will only need to visit one bucket outside your own. The reason to specifically query parent buckets is limited, and not ultimately necessary.
> The only way I can see the problem you raise being a concern is if misconfigured clients have an unreasonably large depth for the amount of information in the network, such that there are few, or no channels from their own bucket to other buckets. In that case, they might become over-reliant on the parent's buckets to receive payments, but there are likely to be numerous parents at every depth of the tree (correctly configured clients), meaning there isn't a clear-cut target that an attacker could DDoS to try and disrupt a bucket.
> Nodes do not need to present the real depth at which they keep gossip, but there are potentially greater fee-earning opportunities if publicly advertising their maximum information capability. Nodes could counterbalance potential DDoS risk with higher routing fees, where people might pay more to have a greater chance of finding shorter routes in fewer queries, but a frugal client would simply chose the cheapest routes, and the more expensive routes via parent buckets would be less desirable to them - meaning DDoS on nodes in parent buckets may be wasted effort because it would drive people towards saving money on fees. Also, since the opportunity cost for missed routing fees would be increased for nodes nearer the top of the tree, they are incentivized to make the efforts to maintain uptime and try to mitigate DDoS attacks against themselves.
> It's hard to say for certain whether the risks would be real cause for concern without testing on real world data, which we don't yet have the scale to test, but I just can't see it being realistic that there would be so few targets which make DDoS feasible to begin with, and that if attacks against a significant number of nodes were successful, the potential degrading of the network would be trivial or even unnoticeable. As the network grows, it seems the attack vector would get even more unrealistic because the number of targets that one would need DDoS would increase.
> > What happens if a node near the root of the tree is brought down?
> Buckets closer to the root do not hold any exclusive information. If you bring down every node in the root bucket, and every node at depth=1, then nodes at depth>=2 will continue to communicate and still collectively know the entire network map, with still a good probability that any node can find any other node in the network by asking just one other node.
Can you describe how each node decides what parts of the routemap to prune and what parts to keep?
Given that each node has a specific number of nodes+Channels it can store?
More information about the Lightning-dev