[bitcoin-dev] A Better MMR Definition

Peter Todd pete at petertodd.org
Wed Mar 1 22:31:01 UTC 2017

On Fri, Feb 24, 2017 at 10:23:20PM -0800, Bram Cohen wrote:
> > Your merkle-set implementation is 1500 lines of densely written Python
> The reference implementation which is included in those 1500 lines is less
> than 300 lines and fairly straightforward. The non-reference implementation
> always behaves semantically identically to the reference implementation, it
> just does so faster and using less memory.


But do you see my point here? Even though I spent some time reading through
that code, I didn't realise you had a 300 line reference implementation
embedded within those 1500 lines. This makes it less likely for you to get any
review on either.

A better way to present your work would have been to at least explain that at
the top of the file, and perhaps even better, split the reference
implementation and optimized implementation into two separate files. If you did
this, you'd be more likely to get others to review your work.

> > with
> > almost no comments,
> The comments at the top explain both the proof format and the in-memory
> data structures very precisely. The whole codebase was reviewed by a
> coworker of mine and comments were added explaining the subtleties which
> tripped him up.

Yes, and it's good that you have those comments. But the codebase itself could
definitely use more, and adding those comments would help get more people
reviewing your work.

> > and less than a 100 lines of (also uncommented) tests.
> Those tests get 98% code coverage and extensively hit not only the lines of
> code but the semantic edge cases as well. The lines which aren't hit are
> convenience functions and error conditions of the parsing code for when
> it's passed bad data

Great! But you see how without comments, it'll take a tremendous amount of work
for an external reviewer like myself to determine what is being tested, and
what edge cases you're targeting.

In fact, I'd suggest that for things like edge cases, you test edge cases in
separate unit tests that explain what edge cases you're trying to catch.

> > By
> > comparison, my Python MMR implementation is 300 lines of very readable
> > Python
> > with lots of comments, a 200 line explanation at the top, and 200 lines of
> > (commented) tests. Yet no-one is taking the (still considerable) effort to
> > understand and comment on my implementation. :)
> >
> Given that maaku's Merkle prefix trees were shelved because of performance
> problems despite being written in C and operating in basically the same way
> as your code and my reference code, it's clear that non-optimized Python
> won't be touching the bitcoin codebase any time soon.

To be clear, I gave my implementation as an example of how hard it is to get
external review, not to suggest it's going to be a part of Bitcoin; I've
pointed a lot of people to it when they asked for a MMR implementation, and I'm
sure if some of those people had reviewed it carefully they would have
suggested changes. Yet they haven't, because doing good review is a lot of

> > In particular, while at the top of merkle_set.py you have a list of
> > advantages,
> > and a bunch of TODO's, you don't explain *why* the code has any of these
> > advantages. To figure that out, I'd have to read and understand 1500 lines
> > of
> > densely written Python. Without a human-readable pitch, not many people are
> > going to do that, myself included.
> >
> It's all about cache coherence. When doing operations it pulls in a bunch
> of things which are near each other in memory instead of jumping all over
> the place. The improvements it gets should be much greater than the ones
> gained from insertion ordering, although the two could be accretive.

That's good, but that paragraph should be part of your MerkleSet git repo,
preferably in the README, where reviewers will immediately find it and get
excited about reviewing your code.

https://petertodd.org 'peter'[:-1]@petertodd.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 455 bytes
Desc: Digital signature
URL: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/attachments/20170301/53de0322/attachment.sig>

More information about the bitcoin-dev mailing list