[Bitcoin-ml] Partial UTXO tree as commitment

Chris Pacia ctpacia at gmail.com
Tue Sep 5 18:07:11 UTC 2017

Given the primary use case of fraud proofs would likely be rarely used I
think the proof size is pretty good. Haven't had time to digest the rest
of it yet.

On 09/05/2017 08:40 AM, Tomas via bitcoin-ml wrote:
> I would like to propose an efficient UTXO commitment scheme.
> A UTXO commitment can be useful for:
> 1. Fast syncing a full node, by downloading the UTXO-set
> 2. Proofing (non) existence of a UTXO..
> Various schemes have been proposed:
> * Merkle/radix trees and variants; all of which have the problem that
> they significantly increase the burden of maintaining the UTXO set.
> Furthermore, such schemes tend to practically prescribe the UTXO storage
> format severely limiting continuous per-implementation optimizations.
> * A "flat" rolling hash, eg the ECMH proposed by Pieter Wiulle which is
> cheap to calculate but only solves (1) and not (2).
> I propose a hybrid approach, with very limited extra burden to maintain
> and reasonably small proofs: 
> We divide the UTXO set in buckets by prefix of their TXID, then maintain
> a rolling hash for each bucket. The commitment is then the root of the
> tree constructed from the resulting bucket hashes. To construct the
> tree: For each depth, we group the hashes of the next depth per 64
> hashes and calculate the rolling hash of each. (Effectively, this is a
> prefix tree with a fixed branch-size of 64).
> Bucketcount
> -------------------
> txcount = number of TXIDs in the UTXO set
> bucketcount = (smallest power of 2 larger than sqrt(txcount))  << 6
> Rationale for bucketcount:
> * This currently gives a bucketcount of 2^19, which is very cheap to
> maintain with a 16mb array of rolling hashes.
> * This currently gives an average bucket size of 4kb. With a rolling
> hash, full nodes don't need to maintain the buckets themselves, but they
> are used for proofs.
> * The burden of future UTXO growth is divided among maintaining the
> rolling hashes and size of the proof: 10,000x as large UTXO set (20TB),
> gives ~400kb buckets and ~1.6gb in maintaining rolling hashes.
> * This gives a tree depth of 5, which means the cost of every UTXO
> update is increased  by ~3 rolling hashes (and a double SHA), as the
> lowest depths don't benefit from caching.
> * A proof for (non) existence of a UTXO is ~ 4*64*32 =8kb (branch-nodes)
> + 4kb (bucket) = ~12kb
> Specification [WIP]
> ---------------------------
> We define the "UTXO commitment" as the serialized byte array: "U" "T"
> "X" "O" VARINT(version) VARINT(txcount) UINT256(UTXO-root)    [todo
> clarify]
> A block that contains an output in the coinbase whose scriptPubKey
> consists solely of OP_RETURN [UTXO commitment] must be rejected if in
> the UTXO commitment the version equals 1 and either 
> * After updating the UTXO state, the number of distinct TXIDs in the
> UTXO set is not equal to the txcount value of the UTXO commitment
> * After updating the UTXO state, the UTXO-root in the UTXO commitment is
> not equal to the UTXO-root defined below.
> The UTXO-root can be calculated as follows:
> * Define _bucketcount_ as (smallest power of 2 larger than
> sqrt(txcount))  << 6
> * Given a TXID in the UTXO set, define UTXO(TXID) as the double SHA256
> of (TXID + coins). (coins is the serialization of unspent outputs to be
> spec'ed). 
> * Let bucket N be the set of values UTXO(TXID) for each TXID in the
> UTXO-set where (TXID mod _bucketcount_) equals N.
> * Let rhash N be the rolling hash (TBD) of all values in bucket N
> * Let the hash sequence be the ordered sequence  rhash
> [0,_bucketcount_).
> 1. If the hash sequence contains at most 64 entries, then the UTXO-root
> is the rolling hash of all entries in the hash sequence, otherwise:
> 2. Group the hash sequence in ordered subsequences of 64 entries each.
> 3. Find the rolling hash of each subsequence
> 4. Continue with 1., with the hash sequence being the ordered sequence
> of these rolling hashes.
> Note: an implementation may want to maintain and update the set of
> rolling hashes at higher depths on each UTXO set operation.
> Note: the secure ECMH is a good candidate for the bucket hash. This
> could also be used for the branch rolling hashes, but it might be worth
> considering XOR for those as there seem to be simply not enough
> candidates to find a colliding set?
> Note: two magic numbers are used: "<< 6" for the bucket count, and "64"
> for the branch size. They work nicely but are pulled out of a dark place
> and merit some experimentation.
> Use cases for light clients
> -------------------------------------
> These UTXO proofs could be used as compact fraud proofs, although the
> benefit of this is not generally agreed upon.
> They can also be used to increase low-conf security to light clients, by
> validating the signatures and order-validity of incoming transactions
> against the right bucket of the current UTXO set.
> An interesting use case may be another type of light client. It could be
> interesting for a light client to abandon the bloom filters, and instead
> use the UTXO proofs to verify whether an incoming or outgoing
> transaction is confirmed. This could be beneficial for "rarely active"
> light clients such as smartphone apps, as it prevents the need to
> synchronize previous blocks with bloom filters, and allows syncing to
> the latest block with 12kb/output.
> Summary 
> --------------
> * Allows fast full node syncing.
> * Costs full nodes ~20mb extra in RAM
> * Costs full nodes ~3 rolling hash operations per UTXO operation.
> * Allows UTXO (non) existence proofs for currently avg ~12kb.
> * Size of proof grows O(sqrt(N)) with UTXO set
> * Size of extra full node memory grows O(sqrt(N)) with UTXO set
> Tomas van der Wansem
> bitcrust
> (Thanks to deadalnix for slack discussion)
> _______________________________________________
> bitcoin-ml mailing list
> bitcoin-ml at lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-ml

More information about the bitcoin-ml mailing list