[bitcoin-dev] SHA1 collisions make Git vulnerable to attacks by third-parties, not just repo maintainers

Steve Davis steven.charles.davis at gmail.com
Fri Feb 24 23:49:36 UTC 2017


If the 20 byte SHA1 is now considered insecure (with good reason), what about RIPEMD-160 which is the foundation of Bitcoin addresses?

Is that also susceptible to such an attack vector?

What does that mean for old addresses?

etc
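
For reference, a Bitcoin P2PKH address does not expose RIPEMD-160 alone:
its 20-byte payload is RIPEMD160(SHA256(pubkey)), so a collision would
have to be found for the composed function. A minimal sketch in Python
(the pubkey bytes are placeholders, and hashlib's 'ripemd160' is only
available where the underlying OpenSSL build provides it):

    import hashlib

    def hash160(pubkey: bytes) -> bytes:
        # 20-byte payload of a P2PKH address: RIPEMD160(SHA256(pubkey)).
        sha = hashlib.sha256(pubkey).digest()
        return hashlib.new('ripemd160', sha).digest()

    # Placeholder 33-byte compressed pubkey, for illustration only.
    print(hash160(b'\x02' + b'\x11' * 32).hex())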

/s


> Date: Fri, 24 Feb 2017 11:04:54 +0100
> From: Tim Ruffing <tim.ruffing at mmci.uni-saarland.de>
> To: bitcoin-dev at lists.linuxfoundation.org
> Subject: Re: [bitcoin-dev] SHA1 collisions make Git vulnerable to
> 	attacks by third-parties, not just repo maintainers
> Message-ID: <1487930694.1528.1.camel at mmci.uni-saarland.de>
> Content-Type: text/plain; charset="UTF-8"
> 
> On Fri, 2017-02-24 at 00:57 +0100, Aymeric Vitte via bitcoin-dev wrote:
>> 
>> I have not worked on this for some time, so these are just thoughts,
>> but maybe it can render things much more difficult
>> than computing two files until the same hash is found
>> 
> 
> You basically rely on the idea that specific collisions are more
> difficult to find. This trick or similar tricks will not help. (And
> actually, the more files you add to the hash, the more freedom you give
> the attacker.)
> 
> Even if certain collisions are more difficult to find today (which is
> certainly true), the general rule is that someone will prove you wrong
> in a year.
> 
> Even if we ignore security entirely, switching to a new hash function
> is much simpler than trying to fix the usage of a broken hash function.
> 
> Relying on SHA1 is hopeless. We have to get rid of it.
> 
> Best,
> Tim
> 
> 
> 
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Fri, 24 Feb 2017 16:18:43 +0100
> From: Aymeric Vitte <vitteaymeric at gmail.com>
> To: bitcoin-dev at lists.linuxfoundation.org
> Subject: Re: [bitcoin-dev] SHA1 collisions make Git vulnerable to
> 	attacks by third-parties, not just repo maintainers
> Message-ID: <15848c1b-2873-35e8-0588-c636126257df at gmail.com>
> Content-Type: text/plain; charset=utf-8
> 
> Not sure that you really read closely what I sent, because the claim
> that hashing files continuously instead of hashing the intermediate
> steps just gives more latitude to the attacker can't be true when the
> attacker has absolutely no control over the past files
> 
> I did not write this as a workaround to fix SHA1, which will be dead
> sooner or later, but as a general concept that could possibly help
> whatever hash function you are using for objects that are not frozen
> but extending (i.e. the original email stating that trees might be
> worse candidates for collisions reminded me of this). Indeed it makes
> no sense to patch SHA1 or play around with it, but this kind of
> proposal could accompany whatever replaces the defunct function.
> 
> The drawback is that you have to keep the hash state when you close the
> latest hash computation in order to start the next one
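> 
> A minimal sketch of the idea in Python (sha256 just stands in for
> whatever hash function you are using; .copy() clones the live internal
> state, so nothing is rehashed):
> 
>     import hashlib
> 
>     a, b = b'first file bytes', b'second file bytes'   # placeholders
> 
>     h = hashlib.sha256()               # running state, kept across rounds
>     h.update(a)
>     digest_a = h.copy().hexdigest()    # hash(a); the live state survives
>     h.update(b)
>     digest_ab = h.copy().hexdigest()   # hash(a+b), without rehashing a
> 
>     assert digest_ab == hashlib.sha256(a + b).hexdigest()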
> 
> Then the question is: knowing the hash state, is it as easy to find a
> collision between two files that will be computed in the next round as
> finding a collision between two files only?
> 
> Knowing that you can probably modify the hash state with some
> unpredictable patterns
> 
> Most likely the answer is: no, it's (astronomically?) more difficult
> 
> Please take it as a suggestion that might be explored (PS: I have the
> code for this if needed) rather than an affirmation. I am still amazed,
> as shown in the few links provided (among others), that each time I
> raise this subject nobody really pays attention ("what's the use
> case?", etc.) and by the fact that it's apparently used by only one
> project in the world and not supported by any library
> 
> 
> On 24/02/2017 at 11:04, Tim Ruffing via bitcoin-dev wrote:
>> On Fri, 2017-02-24 at 00:57 +0100, Aymeric Vitte via bitcoin-dev wrote:
>>> I have not worked on this for some time, so these are just thoughts,
>>> but maybe it can render things much more difficult
>>> than computing two files until the same hash is found
>>> 
>> You basically rely on the idea that specific collisions are more
>> difficult to find. This trick or similar tricks will not help. (And
>> actually, the more files you add to the hash, the more freedom you give
>> the attacker.)
>> 
>> Even if certain collisions are more difficult to find today (which is
>> certainly true), the general rule is that someone will prove you wrong
>> in a year.
>> 
>> Even if we ignore security entirely, switching to a new hash function
>> is much simpler than trying to fix the usage of a broken hash function.
>> 
>> Relying on SHA1 is hopeless. We have to get rid of it.
>> 
>> Best,
>> Tim
>> 
>> 
>> 
>> _______________________________________________
>> bitcoin-dev mailing list
>> bitcoin-dev at lists.linuxfoundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
> 
> -- 
> Zcash wallets made simple: https://github.com/Ayms/zcash-wallets
> Bitcoin wallets made simple: https://github.com/Ayms/bitcoin-wallets
> Get the torrent dynamic blocklist: http://peersm.com/getblocklist
> Check the 10 M passwords list: http://peersm.com/findmyass
> Anti-spies and private torrents, dynamic blocklist: http://torrent-live.org
> Peersm : http://www.peersm.com
> torrent-live: https://github.com/Ayms/torrent-live
> node-Tor : https://www.github.com/Ayms/node-Tor
> GitHub : https://www.github.com/Ayms
> 
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Fri, 24 Feb 2017 17:30:49 +0100
> From: Tim Ruffing <tim.ruffing at mmci.uni-saarland.de>
> To: bitcoin-dev at lists.linuxfoundation.org
> Subject: Re: [bitcoin-dev] SHA1 collisions make Git vulnerable to
> 	attacks by third-parties, not just repo maintainers
> Message-ID: <1487953849.5148.2.camel at mmci.uni-saarland.de>
> Content-Type: text/plain; charset="UTF-8"
> 
> On Fri, 2017-02-24 at 16:18 +0100, Aymeric Vitte via bitcoin-dev wrote:
>> Not sure that you really read closely what I sent, because the claim
>> that hashing files continuously instead of hashing the intermediate
>> steps just gives more latitude to the attacker can't be true when the
>> attacker has absolutely no control over the past files
> What prevents the attacker from providing different past files when
> talking to parties who are still in the initial state?
> 
>> Then the question is: knowing the hash state, is it as easy to find a
>> collision between two files that will be computed in the next round as
>> finding a collision between two files only?
> With the original usage of the hash function, the hash state is always
> the initial state. Now the attacker even has some control over the hash
> state. In other words, if the original use of the hash function was
> vulnerable, then your scheme is vulnerable for the initial state.
> 
> Concrete attack: If you can find x != y with H(x) = H(y), then you can
> also find m, x != y, with H(m||x) = H(m||y), just by setting m = "". 
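> 
> To make the reduction concrete, a sketch in Python, assuming a known
> SHA1 collision pair on disk (the shattered.io PDFs; the filenames below
> are placeholders):
> 
>     import hashlib
> 
>     x = open('shattered-1.pdf', 'rb').read()
>     y = open('shattered-2.pdf', 'rb').read()
>     assert x != y
>     assert hashlib.sha1(x).digest() == hashlib.sha1(y).digest()
> 
>     m = b''   # the empty prefix: a valid choice of m in the attack above
>     assert hashlib.sha1(m + x).digest() == hashlib.sha1(m + y).digest()
> 
> (And since those two files reach a colliding internal state at a block
> boundary, any common suffix s keeps them colliding as well:
> sha1(x + s) == sha1(y + s).)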
> 
> Not sure if this is the right place to discuss that issue though...
> 
> Best,
> Tim
> 
> 
> ------------------------------
> 
> Message: 4
> Date: Fri, 24 Feb 2017 18:29:50 +0100
> From: Aymeric Vitte <vitteaymeric at gmail.com>
> To: Tim Ruffing <tim.ruffing at mmci.uni-saarland.de>,	Bitcoin Protocol
> 	Discussion <bitcoin-dev at lists.linuxfoundation.org>
> Subject: Re: [bitcoin-dev] SHA1 collisions make Git vulnerable to
> 	attacks by third-parties, not just repo maintainers
> Message-ID: <b557a0de-2492-80a1-eff7-229503ae382d at gmail.com>
> Content-Type: text/plain; charset=windows-1252
> 
> Apparently we are not discussing the same thing
> 
> Maybe I did not provide the right links (reading them again, I don't
> find them so clear myself); see again
> https://github.com/whatwg/streams/issues/33#issuecomment-28045860
> 
> a - b - c - d
> 
> hash(a)
> 
> hash(a+b)
> 
> etc
> 
> But you are not going to rehash from the beginning; instead:
> 
> update a --> keep the remaining bytes a_ (+ hash state 1) --> digest
> a = hash(a)
> 
> update a_+b from hash state 1 --> keep the remaining bytes b_ (+ hash
> state 2) --> digest a_+b = hash(a+b)
> 
> etc
> 
> Basically that's similar to a real-time progressive hash of the chunks
> of a file that you are streaming, where you don't know what will come
> next (as opposed to hashing a file that you already have); this could
> apply to trees
> 
> This is different from something like:
> 
> hash(a)
> 
> hash(hash(a) +hash(b))
> 
> etc
> 
> There is no initial state, and the attacker can't modify what was
> already hashed; to make it more difficult you can probably modify the
> hash state N
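> 
> To put the two schemes side by side, a sketch in Python (the chunk
> contents are placeholders and sha256 again stands in for whatever hash
> function is used):
> 
>     import hashlib
> 
>     a, b = b'chunk-a', b'chunk-b'
> 
>     # Progressive scheme: one running state, digests taken along the way.
>     h = hashlib.sha256()
>     h.update(a)
>     d1 = h.copy().hexdigest()        # hash(a)
>     h.update(b)
>     d2 = h.copy().hexdigest()        # hash(a+b): b absorbed into a's state
> 
>     # Hash-list scheme: every step restarts from the initial state.
>     e1 = hashlib.sha256(a).hexdigest()              # hash(a)
>     e2 = hashlib.sha256(hashlib.sha256(a).digest()
>                         + hashlib.sha256(b).digest()).hexdigest()
>     # e2 = hash(hash(a)+hash(b))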
> 
> 
> On 24/02/2017 at 17:30, Tim Ruffing via bitcoin-dev wrote:
>> On Fri, 2017-02-24 at 16:18 +0100, Aymeric Vitte via bitcoin-dev wrote:
>>> Not sure that you really read closely what I sent, because the claim
>>> that hashing files continuously instead of hashing the intermediate
>>> steps just gives more latitude to the attacker can't be true when the
>>> attacker has absolutely no control over the past files
>> What prevents the attacker from providing different past files when
>> talking to parties who are still in the initial state?
>> 
>>> Then the question is: knowing the hash state, is it as easy to find a
>>> collision between two files that will be computed in the next round as
>>> finding a collision between two files only?
>> With the original usage of the hash function, the hash state is always
>> the initial state. Now the attacker even has some control over the hash
>> state. In other words, if the original use of the hash function was
>> vulnerable, then your scheme is vulnerable for the initial state.
>> 
>> Concrete attack: If you can find x != y with H(x) = H(y), then you can
>> also find m, x != y, with H(m||x) = H(m||y), just by setting m = "". 
>> 
>> Not sure if this is the right place to discuss that issue though...
>> 
>> Best,
>> Tim
>> _______________________________________________
>> bitcoin-dev mailing list
>> bitcoin-dev at lists.linuxfoundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
> 
> -- 
> Zcash wallets made simple: https://github.com/Ayms/zcash-wallets
> Bitcoin wallets made simple: https://github.com/Ayms/bitcoin-wallets
> Get the torrent dynamic blocklist: http://peersm.com/getblocklist
> Check the 10 M passwords list: http://peersm.com/findmyass
> Anti-spies and private torrents, dynamic blocklist: http://torrent-live.org
> Peersm : http://www.peersm.com
> torrent-live: https://github.com/Ayms/torrent-live
> node-Tor : https://www.github.com/Ayms/node-Tor
> GitHub : https://www.github.com/Ayms
> 
> 
> 
> ------------------------------
> 
> Message: 5
> Date: Fri, 24 Feb 2017 14:20:19 -0800
> From: Bram Cohen <bram at bittorrent.com>
> To: Peter Todd <pete at petertodd.org>
> Cc: Bitcoin Protocol Discussion
> 	<bitcoin-dev at lists.linuxfoundation.org>
> Subject: Re: [bitcoin-dev] A Better MMR Definition
> Message-ID:
> 	<CA+KqGkpi4GvgU-K6vt-U5ZN4AkpjZ0rruzddoJS4-V0TcnyqUQ at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> So your idea is to cluster entries by entry time because newer things are
> more likely to leave and updating multiple things near each other is
> cheaper?
> 
> That can be done with my tool. Instead of using hashes for the values being
> stored, you use position entries. The first entry gets a value of all
> zeros, the next one a one followed by all zeros, then the next two
> correspond to the first two with the second bit flipped to one, then the
> next four the first four with the third bit flipped to one, etc. It
> probably performs a little bit better to do it two bits at a time instead
> of one so that the entries are 00, 01, 10, 11, 0001, 0010, 0011, 0101,
> 0110, 0111, 1001, etc. If you were to really use this you'd probably want
> to add some optimizations to use the fact that the terminals fit in 64
> bits instead of 256, but it mostly works unchanged, and gets whatever
> benefits there are to this clustering plus the high-performance
> implementation tricks I've built, which I keep complaining that nobody's
> giving feedback on.
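> 
> Read one way, the one-bit-at-a-time enumeration above is just a
> bit-reversed counter. A sketch in Python with toy 8-bit keys (a real
> implementation would use 64- or 256-bit values):
> 
>     def position_key(i: int, width: int = 8) -> str:
>         # Key of the i-th inserted entry: i's bits, zero-padded, reversed.
>         return bin(i)[2:].zfill(width)[::-1]
> 
>     for i in range(5):
>         print(i, position_key(i))
>     # 0 00000000
>     # 1 10000000
>     # 2 01000000
>     # 3 11000000
>     # 4 00100000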
> 
> I'm not sold on this being a win: The empirical access patterns are
> unknown, it requires an extra cache miss per lookup to find the entry
> number, it may be that everything is optimized well enough without it for
> there to be no meaningful gains, and it's a bunch of extra complexity. What
> should be done is that a plain vanilla UTXO set solution is optimized as
> well as it can be first, and then the insertion ordering trick is tried as
> an optimization to see if it's an improvement. Without that baseline
> there's no meaningful basis for comparison, and I'm quite confident that a
> naive implementation which just allocates individual nodes will
> underperform the thing I've come up with, even without adding optimizations
> related to fitting in 64 bits.
> 
> On Thu, Feb 23, 2017 at 8:36 PM, Peter Todd <pete at petertodd.org> wrote:
> 
>> On Thu, Feb 23, 2017 at 07:32:43PM -0800, Bram Cohen wrote:
>>> On Thu, Feb 23, 2017 at 7:15 PM, Peter Todd <pete at petertodd.org> wrote:
>>> 
>>>> 
>>>> Glad we're on the same page with regard to what's possible in TXO
>>>> commitments.
>>>> 
>>>> Secondly, am I correct in saying your UTXO commitments scheme requires
>>>> random access? While you describe it as a "merkle set", obviously to
>>>> be merkelized it'll have to have an ordering of some kind. What do you
>>>> propose that ordering to be?
>>>> 
>>> 
>>> The ordering is by the bits in the hash. Technically it's a Patricia
>>> Trie. I'm using 'merkle tree' to refer to basically anything with a
>>> hash root.
>> 
>> The hash of what? The values in the set?
>> 
>>>> Maybe more specifically, what exact values do you propose to be in the
>>>> set?
>>>> 
>>> That is unspecified in the implementation; it just takes a 256-bit
>>> value which is presumably a hash of something. The intention is to
>>> nail down a simple format, demonstrate good performance, and leave
>>> those semantics to a higher layer. The simplest thing would be to hash
>>> together the txid and output number.
>> 
>> Ok, so let's assume the values in the set are the unspent outpoints.
>> 
>> Since we're ordering by the hash of the values in the set, outpoints
>> will be distributed uniformly in the set, and thus the access pattern
>> of data in the set is uniform.
>> 
>> Now let's fast-forward 10 years. For the sake of argument, assume that
>> for every 1 UTXO in the set that corresponds to funds in someone's
>> wallet that are likely to be spent, there are 2^12 = 4096 UTXO's that
>> have been permanently lost (and/or created in spam attacks) and thus
>> will never be spent.
>> 
>> Since lost UTXO's are *also* uniformly distributed, if I'm processing a
>> new block that spends 2^12 = 4096 UTXO's, on average for each UTXO
>> spent, I'll have to update log2(4096) = 12 more digests than I would
>> have had those "dead" UTXO's not existed.
>> 
>> Concretely, imagine our UTXO set had just 8 values in it, and we were
>> updating two of them:
>> 
>>               #
>>              / \
>>             /   \
>>            /     \
>>           /       \
>>          /         \
>>         #           #
>>        / \         / \
>>       /   \       /   \
>>      #     .     .     #
>>     / \   / \   / \   / \
>>    .   X .   . .   . X   .
>> 
>> To mark two coins as spent, we've had to update 5 inner nodes.
>> 
>> 
>> Now let's look at what happens in an insertion-ordered TXO commitment
>> scheme. For the sake of argument, let's assume the best possible case,
>> where every UTXO spent in that same block was recently created. Since
>> the UTXO's are recently created, chances are almost every single one of
>> those "dead" UTXO's will have been created in the past. Thus, since
>> this is an insertion-ordered data structure, those UTXO's exist in an
>> older part of the data structure that our new block doesn't need to
>> modify at all.
>> 
>> Concretely, again let's imagine a TXO commitment with 8 values in it,
>> and two of them being spent:
>> 
>>               #
>>              / \
>>             /   \
>>            /     \
>>           /       \
>>          /         \
>>         .           #
>>        / \         / \
>>       /   \       /   \
>>      .     .     .     #
>>     / \   / \   / \   / \
>>    .   . .   . .   . X   X
>> 
>> To mark two coins as spent, we've only had to update 3 inner nodes;
>> while our tree is higher with those lost coins, those extra inner nodes
>> are amortised across all the coins we have to update.
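>> 
>> A quick way to sanity-check the two counts above: the inner nodes
>> touched are the union of the root paths of the updated leaves. A sketch
>> in Python (8-leaf perfect tree, inner levels numbered 0..2 from the
>> root):
>> 
>>     def touched_inner_nodes(leaves, depth=3):
>>         # Union of (level, index) inner nodes on each leaf's root path.
>>         touched = set()
>>         for node in leaves:
>>             for level in range(depth - 1, -1, -1):
>>                 node //= 2           # parent's index within its level
>>                 touched.add((level, node))
>>         return touched
>> 
>>     print(len(touched_inner_nodes({1, 6})))   # 5: the uniform case
>>     print(len(touched_inner_nodes({6, 7})))   # 3: the clustered case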
>> 
>> 
>> The situation gets even better when we look at the *new* UTXO's that
>> our block creates. Suppose our UTXO set has size n. To mark a single
>> coin as spent, we have to update log2(n) inner nodes. We do get to
>> amortise this a bit at the top levels in the tree, but even if we
>> assume the amortisation is totally free, we're updating at least
>> log2(n) - log2(m) inner nodes "under" the amortised nodes at the top of
>> the tree for *each* of the m new nodes added in the block.
>> 
>> Meanwhile with an insertion-ordered TXO commitment, each new UTXO added
>> to the data set goes in the same place - the end. So almost none of the
>> existing data needs to be touched to add the new UTXOs. Equally, the
>> hashing required for the new UTXO's can be done in an incremental
>> fashion that's very L1/L2 cache friendly.
>> 
>> 
>> tl;dr: Precisely because access patterns in TXO commitments are *not*
>> uniform, I think we'll find that from a L1/L2/etc cache perspective
>> alone, TXO commitments will result in better performance than UTXO
>> commitments.
>> 
>> 
>> Now it is true that Bitcoin's current design means we'll need a map of
>> confirmed outpoints to TXO insertion order indexes. But it's not
>> particularly hard to add that "metadata" to transactions on the P2P
>> layer in the same way that segwit added witnesses to transactions
>> without modifying how txids were calculated; if you only connect to
>> peers who provide you with TXO index information in blocks and
>> transactions, you don't need to keep that map yourself.
>> 
>> Finally, note how this makes transactions *smaller* in many
>> circumstances: it's just an 8-byte max index rather than a 36-byte
>> outpoint.
>> 
>> --
>> https://petertodd.org 'peter'[:-1]@petertodd.org
>> 
> 
> ------------------------------
> 
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev at lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
> 
> 
> End of bitcoin-dev Digest, Vol 21, Issue 34
> *******************************************


