[bitcoin-dev] A design for Probabilistic Partial Pruning

Sat Feb 27 22:09:48 UTC 2021

On Fri, 26 Feb 2021 at 20:57, Keagan McClelland via bitcoin-dev
<bitcoin-dev at lists.linuxfoundation.org> wrote:

> The goals are to increase data redundancy in a way that more uniformly shares the load across nodes, alleviating some of the pressure of full archive nodes on the IBD problem. I am working on a draft BIP for this proposal but figured I would submit it as a high level idea in case anyone had any feedback on the initial design before I go into specification levels of detail.

You might be interested in an approach (henceforth "SeF") by Swanand
Kadhe, Jichan Chung and Kannan Ramchandran which employs fountain
codes, presented at Scaling Bitcoin 2019:
https://arxiv.org/abs/1906.12140

>From a cursory search it appears there is quite a bit of
related/followup work as well.

The simplest fountain code, the Luby Transform (applied in this work)
the encoder divides a message into smaller chunks, and then constructs
an infinite stream of codewords which are XORs of d randomly selected
chunks where d is sampled from the robust soliton distribution. The
simplest decoder simply XORs new k=1 codewords from the relevant k>1
codewords, and the full message can be recovered with overwhelming
probability and in linear time with a sufficiently large random sample
of codewords from the encoded stream. Note that the decoder must know
which chunks went into a codeword, but this is usually addressed using
pseudorandomness, which has other motivations in an adversarial
setting.

In SeF, the general idea is that "droplet nodes" are pruning nodes
which retain some number (equivalent to your threshold parameter) of
codewords from the encoding concatenated blocks (to obtain a fixed
message size), and serve these to compatible clients. This is more
robust than retaining a random sample of blocks, and also performs
well according to their simulations.

Even if this paper is not directly applicable to your efforts, the
theory of fountain codes should be of interest (many variants exist),
and there is work on fountain codes. There is also some work on
fountain codes in an adversarial setting (Falcon codes) but I'm only
vaguely familiar with it, and if i'm not mistaken most of the
considerations are either already implicitly addressed by the Bitcoin
protocol or explicitly addressed in the SeF paper, whose results also
take into account malicious droplet nodes.