[Bitcoin-ml] Partial UTXO tree as commitment

Tomas tomas at bitcrust.org
Tue Sep 5 12:40:18 UTC 2017

I would like to propose an efficient UTXO commitment scheme.

A UTXO commitment can be useful for:

1. Fast syncing a full node, by downloading the UTXO-set
2. Proofing (non) existence of a UTXO..

Various schemes have been proposed:

* Merkle/radix trees and variants; all of which have the problem that
they significantly increase the burden of maintaining the UTXO set.
Furthermore, such schemes tend to practically prescribe the UTXO storage
format severely limiting continuous per-implementation optimizations.
* A "flat" rolling hash, eg the ECMH proposed by Pieter Wiulle which is
cheap to calculate but only solves (1) and not (2).

I propose a hybrid approach, with very limited extra burden to maintain
and reasonably small proofs: 

We divide the UTXO set in buckets by prefix of their TXID, then maintain
a rolling hash for each bucket. The commitment is then the root of the
tree constructed from the resulting bucket hashes. To construct the
tree: For each depth, we group the hashes of the next depth per 64
hashes and calculate the rolling hash of each. (Effectively, this is a
prefix tree with a fixed branch-size of 64).

txcount = number of TXIDs in the UTXO set
bucketcount = (smallest power of 2 larger than sqrt(txcount))  << 6

Rationale for bucketcount:

* This currently gives a bucketcount of 2^19, which is very cheap to
maintain with a 16mb array of rolling hashes.
* This currently gives an average bucket size of 4kb. With a rolling
hash, full nodes don't need to maintain the buckets themselves, but they
are used for proofs.
* The burden of future UTXO growth is divided among maintaining the
rolling hashes and size of the proof: 10,000x as large UTXO set (20TB),
gives ~400kb buckets and ~1.6gb in maintaining rolling hashes.
* This gives a tree depth of 5, which means the cost of every UTXO
update is increased  by ~3 rolling hashes (and a double SHA), as the
lowest depths don't benefit from caching.
* A proof for (non) existence of a UTXO is ~ 4*64*32 =8kb (branch-nodes)
+ 4kb (bucket) = ~12kb

Specification [WIP]
We define the "UTXO commitment" as the serialized byte array: "U" "T"
"X" "O" VARINT(version) VARINT(txcount) UINT256(UTXO-root)    [todo

A block that contains an output in the coinbase whose scriptPubKey
consists solely of OP_RETURN [UTXO commitment] must be rejected if in
the UTXO commitment the version equals 1 and either 
* After updating the UTXO state, the number of distinct TXIDs in the
UTXO set is not equal to the txcount value of the UTXO commitment
* After updating the UTXO state, the UTXO-root in the UTXO commitment is
not equal to the UTXO-root defined below.

The UTXO-root can be calculated as follows:

* Define _bucketcount_ as (smallest power of 2 larger than
sqrt(txcount))  << 6
* Given a TXID in the UTXO set, define UTXO(TXID) as the double SHA256
of (TXID + coins). (coins is the serialization of unspent outputs to be
* Let bucket N be the set of values UTXO(TXID) for each TXID in the
UTXO-set where (TXID mod _bucketcount_) equals N.
* Let rhash N be the rolling hash (TBD) of all values in bucket N
* Let the hash sequence be the ordered sequence  rhash

1. If the hash sequence contains at most 64 entries, then the UTXO-root
is the rolling hash of all entries in the hash sequence, otherwise:
2. Group the hash sequence in ordered subsequences of 64 entries each.
3. Find the rolling hash of each subsequence
4. Continue with 1., with the hash sequence being the ordered sequence
of these rolling hashes.

Note: an implementation may want to maintain and update the set of
rolling hashes at higher depths on each UTXO set operation.

Note: the secure ECMH is a good candidate for the bucket hash. This
could also be used for the branch rolling hashes, but it might be worth
considering XOR for those as there seem to be simply not enough
candidates to find a colliding set?

Note: two magic numbers are used: "<< 6" for the bucket count, and "64"
for the branch size. They work nicely but are pulled out of a dark place
and merit some experimentation.

Use cases for light clients
These UTXO proofs could be used as compact fraud proofs, although the
benefit of this is not generally agreed upon.

They can also be used to increase low-conf security to light clients, by
validating the signatures and order-validity of incoming transactions
against the right bucket of the current UTXO set.

An interesting use case may be another type of light client. It could be
interesting for a light client to abandon the bloom filters, and instead
use the UTXO proofs to verify whether an incoming or outgoing
transaction is confirmed. This could be beneficial for "rarely active"
light clients such as smartphone apps, as it prevents the need to
synchronize previous blocks with bloom filters, and allows syncing to
the latest block with 12kb/output.

* Allows fast full node syncing.
* Costs full nodes ~20mb extra in RAM
* Costs full nodes ~3 rolling hash operations per UTXO operation.
* Allows UTXO (non) existence proofs for currently avg ~12kb.
* Size of proof grows O(sqrt(N)) with UTXO set
* Size of extra full node memory grows O(sqrt(N)) with UTXO set

Tomas van der Wansem

(Thanks to deadalnix for slack discussion)

More information about the bitcoin-ml mailing list