[Bitcoin-ml] UTXO Commitments

Tomas tomas at bitcrust.org
Thu Mar 1 11:23:12 UTC 2018

Let me share some progress and ideas related to UTXO commitments.


Adding a UTXO commitment to the blocks has two major use cases:

1. Allow fast syncing of full nodes as they can download the UTXO set instead of historic blocks
2. Provide UTXO inclusion/exclusion proofs.

I believe especially 1. is an enormous gain, as it saves full nodes from downloading the entire chain without compromising trust or security.

Let me address three different ways to approach this:

A. Flat ECMH commitment

We use an Elliptic Curve Multiset Hash to merge all UTXO's into a single hash. Such multiset hash allows us to add and remove UTXO's to it in place, and can therefore be cheaply maintained.

This only solves (1.) fast syncing as it is only possible to verify the *entire* UTXO set against this hash.

The current ECMH implementation takes about 10us per operation, which is rather acceptable.

Note that adding the commitment is only the first step in fast syncing. The next steps would include providing getutxo/utxo p2p messages, and after that implementing fast sync.

* ECMH: https://github.com/tomasvdw/bips/blob/master/ecmh.mediawiki
* Commitment: TODO

Code (WIP): 
* https://github.com/tomasvdw/bitcoin-abc/tree/utxoflat  (reviews on reviews.bitcoin-abc.org)

More info:
* https://github.com/tomasvdw/bitcoin-abc/blob/utxoflat/src/secp256k1/src/modules/multiset/README.md
* https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-May/014337.html
* https://arxiv.org/pdf/1601.06502.pdf
* https://eprint.iacr.org/2009/226.pdf

B. Bucketed ECMH commitment

This approach is similar to A, except that we divide the UTXO set in different buckets (or "shards") based on prefix. The different prefixes are than hashed together in a tree.

This would allow UTXO inclusion and exclusion proofs, with the size of a single bucket. The drawback is that maintaining a large number of ECMH commitments is considerably more heavy than maintaining a single one. Although the ECMH operations themselves are still 10us, gathering everything into a single hash requires normalizing each ECMH bucket and this takes 150ms-300ms per block depending on the bucket size (and thus proof-size).

We will need to look at improvements before this is feasible

* ECMH (same as ^): https://github.com/tomasvdw/bips/blob/master/ecmh.mediawiki
* Commitment: https://github.com/tomasvdw/bips/blob/master/BIP-UtxoCommitBucket.mediawiki

Code (WIP)
* https://github.com/tomasvdw/bitcoin-abc/tree/utxocommit

More info:
* https://lists.linuxfoundation.org/pipermail/bitcoin-ml/2017-September/000240.html

C. Commitment enabled UTXO storage

An alternative to ECMH would be to maintain the UTXO set in a storage that directly provides a commitment, instead of maintaining or calculating it separately. This could be done by hashing the UTXOs in a binary merkle tree based on prefix, or a "merklix" tree.

Even if the hash is specified as a binary merklix, its not necessary to use an actual binary tree for storage; an implementation can opt for a much more efficient concurrent hashtree (CTrie) structure, while calculating the root hash *as if* it were a binary tree.

The challenge of this approach is that - like for instance LevelDB - such storage does not utilize the MRU access pattern of UTXO's, which makes it rather cache-inefficient. Yet unlike the current implementations we would not be able to easily use a custom cache. I can imagine it may  be possible to improve on this by adding a clever block height branch at the right position in the tree, where we trade multiple branch lookups for cache efficiency (which is effectively what  the currrent cache does). This however needs more research.

More info:
* https://www.deadalnix.me/2016/11/06/using-merklix-tree-to-shard-block-validation/
* https://en.wikipedia.org/wiki/Ctrie

Versioning and conclusion

It is never easy to come to agreement on the details. Luckily it is trivial to version the commitment by making it optional. We only specify that *if* present it must be correct.

Such versioning allows us to start with the simplest "A." flat ECMH solution as "version 1 commitment", and continue our research from there. 

Progress with A. is being made, and in my opinion including it has major benefits and little drawbacks. Hopefully we can agree on the value of such a first step though if not, feel free to comment.

Tomas van der Wansem

More information about the bitcoin-ml mailing list