[Bitcoin-development] Proposed additional options for pruned nodes

Tue May 12 19:03:55 UTC 2015

It's a little frustrating to see this just repeated without even
paying attention to the desirable characteristics from the prior
discussions.

Summarizing from memory:

(0) Block coverage should have locality; historical blocks are
(almost) always needed in contiguous ranges. Having random peers
with totally random blocks would be horrific for performance, as you'd
have to hunt down a working peer and make a connection for each block
with high probability.

(1) Block storage on nodes with a fraction of the history should not
depend on believing random peers, because listening to peers can
easily create attacks (e.g. someone could break the network by
convincing nodes to become unbalanced) and is not useful: it's not like
the blockchain is substantially different for anyone. If you're at the
point of needing to know coverage to fill gaps then something is wrong.
Gaps would be handled by archive nodes, so there is no reason to
increase vulnerability by doing anything but behaving uniformly.

(2) The decision to contact a node should need O(1) communications,
not just because of the delay of chasing around just to find out who
has what, but because that chasing usually makes the process
_highly_ sybil vulnerable.

(3) The expression of what blocks a node has should be compact (e.g.
not a dense list of blocks) so it can be rumored efficiently.

(4) Figuring out what block (ranges) a peer has, given what it
communicated, should be computationally efficient.

(5) The communication about what blocks a node has should be compact.

(6) The coverage created by the network should be uniform, and should
remain uniform as the blockchain grows; ideally you shouldn't need
to update your state to know what blocks a peer will store in the
future, assuming that it doesn't change the amount of data it's
planning to use. (What Tier Nolan proposes sounds like it fails this
point.)

(7) Growth of the blockchain shouldn't cause much (or any) need to
refetch old blocks.

I've previously proposed schemes which come close but fail one of the above.

(e.g. a scheme based on reservoir sampling that gives uniform
selection of contiguous ranges, communicating only 64 bits of data to
know what blocks a node claims to have, remaining totally uniform as
the chain grows, without any need to refetch -- but needs O(height)
work to figure out what blocks a peer has from the data it
communicated; or another scheme based on consistent hashes that has
log(height) computation, but sometimes may result in a node needing to
go refetch an old block range it previously didn't store -- creating
re-balancing traffic.)
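
To make the trade-off in the first example concrete, here is a minimal
sketch of how a seed-driven reservoir sampling over contiguous ranges
might look. It is not the scheme referred to above, whose details are not
given here; the range size, the use of SHA-256 as the deterministic
random source, and the names RANGE_SIZE, _draw and kept_ranges are all
assumptions made for illustration. The point it shows is that a peer only
needs to announce a 64-bit seed (plus how many ranges it keeps), that old
ranges are only ever evicted and never re-added, and that anyone wanting
to know what the peer holds has to replay the whole history, which is the
O(height) cost.

```python
import hashlib
from typing import List

RANGE_SIZE = 1000  # blocks per contiguous range (assumed value)


def _draw(seed: int, index: int, bound: int) -> int:
    """Deterministic pseudo-random integer in [0, bound) for one range index.

    SHA-256 is an assumed stand-in for the random source; the modulo
    introduces a negligible bias that a real design would want to avoid.
    """
    data = seed.to_bytes(8, "big") + index.to_bytes(8, "big")
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], "big") % bound


def kept_ranges(seed: int, height: int, slots: int) -> List[int]:
    """Return the indices of the ranges a peer with this seed keeps.

    Classic reservoir sampling (Algorithm R) over range indices, replayed
    from range 0 up to the current tip, hence O(height) work.
    """
    n_ranges = height // RANGE_SIZE  # only complete ranges are considered
    reservoir: List[int] = []
    for i in range(n_ranges):
        if i < slots:
            # The first `slots` ranges fill the reservoir directly.
            reservoir.append(i)
        else:
            # Range i replaces a random slot with probability slots/(i+1).
            j = _draw(seed, i, i + 1)
            if j < slots:
                reservoir[j] = i
    return sorted(reservoir)
```

For instance, kept_ranges(0x1234, 350_000, 8) lists the eight 1000-block
ranges such a peer would claim at height 350,000. Every range in that list
that already existed at height 200,000 was also in the peer's set at that
height, so chain growth never forces a refetch in this sketch.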

So far something that meets all those criteria (and/or whatever ones
I'm not remembering) has not been discovered; but I don't really think
much time has been spent on it. I think it's very likely possible.