[Lightning-dev] Routemap scaling (was: Just in Time Routing (JIT-Routing) and a channel rebalancing heuristic as an add on for improved routing success in BOLT 1.0)
m.a.holden at protonmail.com
Sat Apr 6 02:40:25 UTC 2019
Good morning ZmnSCPxj,
I'll try to clarify my proposal further, but also have some questions about yours.
> Now, it seems to me what you propose, is to have octrees contain octrees, and so on.
There's one global tree, which is the same for all users. Every node in the tree has a bucket and exactly 4 child nodes, except leaves have no children. The tree has a max depth, which each client sets itself. The theoretical maximum being HASHLEN/2 (In practice, we're likely to be concerned about <8). Note that every parent's bucket contains all of the information of all of its children's buckets - meaning the root node of the global quadtree is equivalent to the unfiltered global network. Nodes pick a depth and concern themselves with the bucket at that depth, unless it overflows, in which case they increase the depth by 1.
> Now let us return, to my proposal of a distance measurement.
> This effectively maps the LN universe onto a circle.
> Each node commits itself to storing an arc of this circle, and possibly various other nodes it happens to be directly connected to that may be far from the arc near it.
The quadtree can also be looked at from the perspective of a circle, assuming there is a reference point P on it. Each bucket is represented by an arc of size 2pi/(4^depth), and the PKH-prefix represents a displacement of the arc from P (Eg, the bucket 00b would represent an arc from P to pi/2+P). Bucket 00b includes all gossip information for the buckets 0000b, 0001b, 0010b and 0011b, which are sub-arcs of it, and so forth. Spilling a bucket is the same as narrowing the arc to 1/4 its previous size.
Although different from your arbitrary distance k (arc size 2*k?), they're not too dissimilar in the kind of distance covered - with the primary distinction being that mine proposes using set interval ranges (based on powers of 4), and the node picks the range which it's PKH fits into, rather than the range being centred on the node.
> I can overload those N nodes by generating node addresses that also lie in that octant.
> By just iterating over scalars, about 1/8 of the generated node addresses will lie in the target octant.
If you target the first layer of the tree only, then once nodes spill to the second layer, they will filter out any gossip which does not concern them. It isn't sufficient to just brute force the first 2 bits, but you need to do it for 2^depth, where depth is the target's maximum they're willing to spill to (which is not shared).
However, commodity hardware can attempt millions of ec_mult/hash per second, so getting a node into the bucket you want is trivial anyway for small depth.
> In order to disrupt a particular node, I must place fake nodes near that node, in an attempt to force its neighbors to reduce the arc of the circle that they can map.
> However, I need to generate fake nodes that are nearer to that node than genuine honest nodes.
> This means the probability of me generating such node addresses are much lower than 1/8, thus requiring that I devote more energy to generating the falsified attack nodes.
I would argue that the above is also true in your approach. It would still be trivial to brute force the prefixes which minimize the distance between themselves and the target, even with a network of tens of millions of nodes. The amount of energy which would need devoting does not seem like it would a deciding factor for this attack.
> Further, in executing this attack, while I disrupt one node very well, and nearby nodes somewhat, my effect will be far less than disrupting 1/8 of the network.
Since the attack needs to target the maximum depth that nodes might spill to, then the amount of the network which could be affected by the attack would be 1/4^depth. I can't imagine it being very different in your distance based approach, since we're considering the same kind of distances, from a different perspective.
> Once I get a node address in the targeted octant, I can commit a nominal amount of Bitcoin into some 2-of-2 with an existing actual node I control and the generated node address, then synthesize some node gossip containing those node addresses.
The way I see it, this potential attack affects the global network generally. Somebody could try this regardless of our proposals, and try to flood the whole network with spam.
BOLT#7 only states that a node MAY forward information about nodes for which it does not know any channels. One could reconsider this approach if constrainted on resources, such as to say if depth>X, MUST NOT forward any information about nodes with no known channels, or perhaps have a variation of gossip which combines node information into channel broadcasts to ensure that such node spam can't occur. I think the vagueness of the spec on this rule is inidicative that it needs addressing.
To try and mount an attack with greater chance of success, the attacker should need to open many new channels, for which they face obvious constraints on the bitcoin network, and real costs. And since the attack can only attempt to degrade LN at best, and offers no guarantee of censoring, it seems unlikely that an attacker would exhaust significant resources to try and target somebody with a low chance of success.
The attacker would also likely use nominal amounts for the channels they're trying to attack with, as they would not want to lock up significant funding. It may be trivial to filter many tiny capacity channels as they're not likely to be useful for making payments anyway.
> This unbalances the octree such that one octant has a far lerger number of nodes inside it than the other octants.
> The N nodes that promised to keep track of the routemap within the octant find that they need to take up more and more memory to store the octant data because of this targeted attack.
> Eventually, the N nodes start dropping out of this responsibility since they run out of resources.
Each node would treat the quadtree as a perfectly balanced one at the depth they're concerned. The buckets themselves vary in capacity, and if the gossip reaches capacity at any depth, the bucket spills to the next depth. If an attacker managed to overflow a leaf bucket, then the potential victim would need to consider another gossip approach.
On determining whether one is being attacked though, it should be possible to detect based on information size before and after spilling. If one is being specifically targeted, then the information before and after spilling will be almost the same, with little saving on gossip, where normally spilling might expect a 50%+ saving. A client could set a low threshold on the minimum saving they expect to get from spilling, and if it is not met, they might determine that they are being targeted, and take countermeasures.
To me it seems like your distance narrowing suffers the same issue. As the size of your gossip grows, you will narrow the distance k for which you concern yourself with gossip - but at some point you must have a minimum k, else it will be too narrow to have any useful information. What happens if you decrease k to some minimum, and the amount of spam still overflows the capacity limits you've set?
> Once the number of nodes that hold that octant drops, I can then switch to targeted DDoS attacks on the remaining octant-mapping nodes, especially easy to locate them since they openly broadcast the fact that they promise to map the octant they are in.
> Now the octant becomes hard to access for 7/8 of the network.
> I have successfully executed a partitioning attack and disrupted the operation of an entire octant.
There are no "octant-mapping nodes". Every node knows about every channel from its own bucket to other buckets. This remains true even after spilling - although the quantity of information is reduced. Performing a DoS on a node who stores the whole network information does not affect other users, because they're not exclusive holder of information. Every single participant of the network could operate at depth=2. There does not need to exist anybody who has the full network map.
Also, if most nodes follow a reasonable autopilot strategy, then every node in a bucket also has at least some channels to other buckets - meaning the target of a DDoS would essentially be the entire bucket to try and completely isolate it. Remember that I'm suggesting half (or more) of the open channels in a bucket are to other buckets.
A node doesn't necessarily need to broadcast the depth at which they gossip, but could indicate a deeper depth, so as to reduce the amount of queries made to it, and hide the truth about the full information they carry. They shouldn't indicate a smaller depth than the one at which they gossip as they'll be unable to answer to a majority of queries.
If a node is being targeted they could specify a large depth at which it would be infeasible for an attacker to target, whilst secretly maintaining information at a smaller depth, but by doing so, they're also stating that they're not going to be widely useful.
I'm also not suggesting this be the only gossip strategy. At minimum I would also want friend-of-a-friend-of-a-friend type gossip in addition to this, because those channels are the ones we'll most likely be routing payments over anyway, so we should have them cached separately from this quadtree proposal. Several other gossip approaches could be used too.
> Crucially no node admits to how large an arc of this circle they map out, only that they are more likely to map points nearer to themselves.
> This is because nodes only commit to *probabilistically* being more likely to know nodes near to them, than nodes that are not near to them.
I can see how using a probabilistic approach might mitigate the issue to some extent, but it doesn't seem like a cure. Since the information nearest to you is more proabable to be forwarded to you, an attacker could still try to overwhelm your resources by plaing enough nodes nearby. How is the probability decided? Does your node probabilistically filter incoming information, or do you signal to your peers to probabilisitcally filter it?
In the latter case, while not necessarily admitting what you store, it seems like you'll still leak hints about it. What information are you proposing gets communicated (if any) between peers to filter gossip being sent?
My approach is based on the idea that the broadcaster will always know whether or not to forward you information based on the filter you give them, with the default filter being a bit-mask for the depth of the quadtree matching your PKH. A node will forward you all information about any channel which has one or both nodes matching the bit mask, and both of the nodes for every channel. Everything else gets filtered (unless other filters or strategies are agreed). A node forwarding you gossip you've specifically requested to be filtered could be considered malicious.
The filter a node communicates with its peer does not necessarily need to indicate the amount of the network they actually gossip about. For example, a node wishing to have all information about bucket 00b might request filters for buckets 0000b, 0001b, 0010b and 0011b from different peers. If each peer forwards information for their buckets, then the node will learn of all information in 00b without ever revealing to anybody, at the cost of some duplication.
> They do not commit, as in your proposal, to absolutely knowing everything within their committed area.
Ultimately, my proposal is a convention and not a hard rule. Any commitment comes with limits and it isn't to be *assumed* that a node will know everything about their local topology, only that they are very likely to know it under normal circumstances. It doesn't prevent anyone from using other techniques, or to try routing through different buckets (which one might want to try for increasing privacy).
Any feature must be considered the same. Nodes could advertize that they support a feature and then make no commitment to following it.
> Nodes that openly broadcast that they know the nodes within the same octant they are, are nodes that want to be DDoS'ed.
> Thus, my definition of a "special" node is a lot looser than you seem to define it.
> Anything that makes it possible to point to a node and say "this is a good node to attack, in order to disrupt Lightning" is special.
Acknowledged, but I'm not convinced there is a DDoS motive. Disrupting specific nodes doesn't necessarily disrupt other users. Obviously, it can disrupt the friend and friend-of-a-friend of the target, but this will always be true. The gossip network specifically intends to publish friend-of-a-friend information, so targeted DDoS attacks on individuals and their friends are not necessarily avoidable.
> * No node shall reveal the extent of its knowledge of the network, since if it reveals that it knows the entire network, it may become a target for attack in order to degrade the network.
My main question about your proposal is, given that bandwidth is a key resource constraint, what strategy are you using to prevent the sending/receiving of unwanted gossip information, which also prevents leaking information about which gossip you're interested in?
> * This also implies that if a node does not know the location of some node, instead of admitting its ignorance, it should instead delegate to another node that might have better information; otherwise it would be possible to profile nodes to determine how much of the network they know.
I'm not sure I agree. I'd prefer to go the opposite way and always fail fast.
It could be specified that if you've indicated that you'll answer queries for a specific bucket, and somebody requests information about a node or channel outside of the bucket, and which there exists no direct channel from within your bucket to that node, then you should automatically fail the query, even if you know the information.
A query will then either return a very likely yes (on information within the bucket), or a definite no (anything not in the bucket). Nothing more than which was previously specified is leaked, even if you keep more gossip information than you have publicized. If probing can only reveal information that is already public, there is nothing to be gained.
PS, don't take any of this to be a dismissal of your proposal because I fully acknowledge your perspective and concerns.
I think there are potential useful optimizations in my approach which I'm not sure how equivalent could be acheived with yours.
But if I am convinced there are unresolvable problems with the idea I'll drop it and focus on your approach.
More information about the Lightning-dev