[bitcoin-dev] Progress on bech32 for future Segwit Versions (BIP-173)

Sat Dec 5 22:59:17 UTC 2020

> On Wednesday, October 7, 2020 5:21 PM, Rusty Russell via bitcoin-dev bitcoin-dev at lists.linuxfoundation.org wrote:
>
> > I propose an alternative to length restrictions suggested by
> > Russell in https://github.com/bitcoin/bips/pull/945: use the
> > https://gist.github.com/sipa/a9845b37c1b298a7301c33a04090b2eb variant,
> > unless the first byte is 0.
>
> Hi all,
>
> starting a slight side-thread here.

Another update, and a longer write-up.

Introduction
------------

Bech32's checksum algorithm was designed to be strong against substitution
errors, but it also provides some protection against more general classes of
errors. The final constant M that is XOR'ed into the checksum influences that
protection. BIP173 today uses M=1, but it is now known that this has a
weakness: if the final character is a "p", any number of "q" characters can be
inserted or erased right before it, without invalidating the checksum.

As it was recognized that other constants do not have this issue, the obvious
question is whether this is the only possible type of weakness, and if not, if
there is an optimal constant to use that minimizes the largest number of
weaknesses.

Since my last mail I've realized that it is actually possible to analyse the
behavior of these final constants under a wide variety of error classes
(substitutions, deletions, insertions, swaps, duplications) programatically.
Greg Maxwell and I have used this to perform an exhaustive analysis of certain
error patterns for all 2^30 possible M values, selected a number of criteria
to optimize for, and conclude that we should use as constant:

  M = 0x2bc830a3

The code used to do this analysis, as well as the code used to verify some
expected properties of the final result, and more, can be found on
https://gist.github.com/sipa/14c248c288c3880a3b191f978a34508e

See results_final.txt to see how this constant compares with the previously
suggested constants 1, 0x3fffffff, and 0x3fefffff.

Properties
----------

If we define an error as a deletion of one position, a swap of adjacent
positions, a substitution in a specific position with a random character, an
insertion of one random character in a specific position, or a duplication of
the character in a specific position, then this M constant above gives us the
following properties:

* For generic HRPs and errors that don't affect the first 6 data characters,
  or alternatively averaged over all HRPs (see details futher):
  * Always detected:
    * (P) Up to 4 substitution errors (true for any constant).
    * (Q) Any valid code with the new constant, fed without modification to
          software that uses the old constant 1 (true for any constant).
  * Any error pattern has failure to detect probability <= 2^-30:
    * (L) Any number of errors restricted to a single window of up to 4
          characters.
    * (B) Up to two errors restricted to a window of up to 68 characters.
    * (D) Any one error made to a valid code with the new constant, and fed to
          software that uses the old constant 1
  * Most error patterns have probability <= 2^-30:
    * (C) Up to two errors in general: out of 23926796 such error patterns,
          0.0040% have probability 2^-25.
    * (N) Up to three errors restricted to a window of up to 69 characters:
          out of 284708444 such patterns, 0.033% have probability 2^-25.
    * (O) Up to three errors in general: out of 295744442 such error patterns,
          0.034% have probability 2^-25; 0.000065% have probability 2^-20.
    * (G) Up to two errors made to a valid code with the new constant, and fed
          to software that uses the old constant 1: out of 2831622 such error
          patterns, 0.048% have probability 2^-25.

* Specifically for the bc1 HRP, with the BIP173 length restrictions:
  * Always detected:
    * (R) Up to 4 substitution errors (true for any constant).
    * (A) Up to 3 substitution errors made to a valid code with the new
          constant, and fed to software that uses the old constant 1.
  * Any error pattern has failure to detect probability <= 2^-30:
    * (E) Any one error.
    * (F) Any one error made to a valid code with the new constant, and fed to
          software that uses the old constant 1.
    * (H) Up to two errors restricted to a window of 28 characters.
  * Most error patterns have probability <= 2^-30:
    * (J) Up to two errors in general: out of 455916 such error patterns,
          0.039% have probability 2^-25; 0.0053% have 2^-20.
    * (K) Any number of errors restricted to a window of 4 characters: out of
          5813139 such error patterns, 0.0016% have probability 2^-25.
    * (M) Up to three errors: out of 50713466 such error patterns, 0.078% have
          probability 2^-25; 0.00063% have 2^-20.
    * (I) Up to two errors made to a valid code with the new constant, and fed
          to software that uses the old constant 1: out of 610683 such error
          patterns, 0.092% have probability 2^-25; 0.00049% have probability
          2^-20.

To give an idea of what these probabilities mean, consider the known BIP173
insertion issue. It admits an error pattern of 1 error (insertion in
penultimate position) that has a failure to detect probability of 2^-10:
it requires the final character to be 'p', and the inserted character to be
'q'. Assuming those are both random, we have a chance of 1 in 32*32 to hit it.
Note that the choice of *what* the error pattern is (whether it's insertion,
and where) isn't part of our probabilities: we try to make sure that *every*
pattern behaves well, not just randomly chosen ones, because presumably humans
make some kinds of errors more than others, and we don't know which ones.

All the analyzed patterns above are guaranteed to be detected with probability
2^-20 or better (and most are 2^-30). Of course, if we'd search for even
larger classes of errors, say any 4 independent errors of any type, we would
probably discover patterns with worse probabilities, but at that point the
probability of the pattern itself being hit should be taken into account.

The selection was made based on these same properties:
* Start with the set of all 2^30 constants.
* The generic properties (L), (B), (D), (C), (N), (O), and (G) were selected
  for by rejecting all constants that left any worse error patterns (e.g.
  all codes for which patterns matching (N) existed with failure probability
  above 2^-25 were removed). All these restrictions are as strong as they
  can be: making them over longer strings, wider windows, or more errors with
  the same restrictions removes all remaining constants. This leaves us with
  just 12054 acceptable constants.
* The same was then done for the bc1/BIP173 specific properties (A), (E), (J),
  (F), (H), (K), (M), and (I). This reduces the set further to 79 acceptable
  constants. The full analysis output for all of these can be found in
  output.txt.
* Finally, the constant with the minimal number of worst-probability patterns
  was chosen for the generic property (N). The single constant 0x2bc830a3
  remains.
* This solution and a few of its expected properties were then validated using
  a simple program that makes random errors (see the monte_carlo.py file).

Technical details
-----------------

For the purpose of this analysis, define an "error pattern" as a starting
length (of a valid string consisting of otherwise randomly chosen characters)
combined with a sequence of the following (in this order):
* 0 or more deletions of characters at specific positions (without
  constraining what those characters are)
* 0 or more swaps of characters at specific positions with the character
  following it
* 0 or more substitutions of characters at specific positions with a uniformly
  randomly selected character
* 0 or more insertions of uniformly randomly selected characters at specific
  positions
* 0 or more duplications of characters at specific positions (including
  duplications of characters inserted/substituted in the previous steps)

Examples:
* "Start with a random valid 58 character string, remove the 17th character,
  swap the 11th character with the 12th character, and insert a random
  character in the 24th position" is an error pattern.
* "Replace the 17th through 24th characters in a 78 character string with
  'aardvark'" is not an error pattern, because substituted characters have to
  be random, and can't be specific values.

Given such a pattern, assign variable names to every input character, and to
every inserted/substituted character. For example, the pattern "Start with
a 6 character string, delete the 1st character, swap the 2nd and 3rd
character, and insert a random character between those" would be represented
as [v0 v1 v2 v3 v4 v5] and [v1 v3 v6 v2 v4 v5]. Treat these variables as
elements of GF(32), and write out the equations that both the first and second
list have a valid checksum. Due to the fact that BCH codes are linear, this is
just a linear set of equations over GF(32), and we can use Gaussian
elimination to find the size of the solution space. If the input and output
are the same length, we need to subtract the number of solutions for which the
input and output are exactly the same, which is easy to find with another set
of equations. Now compute the ratio of this number divided by (32^numvars /
32^6), where the 32^6 is due to the precondition that the input string is
valid. This gives us the probability of failure, assuming input and output are
random, apart from the known relation between the two, and the fact that both
are valid.

This technique has an important limitation: it can only reason about randomly-
chosen input strings, and the presence of the HRP and version numbers at the
start violates that assumption. These are not random, and we're forced to
make one of these concessions:
1) Ignore the problem, and treat the HRP as random. This lets us derive
   properties that hold over all potential HRPs on average, but will thus fail
   to account for the possibility that for a small numbers of potential HRPs
   some error patterns may exist that behave worse. For technical reasons,
   this averaging makes all constants behave identically for error patterns
   that don't change the length of the string. Given that substitution/swap
   only errors are already dealt with well due to the BCH design this is
   perhaps not too important. One exception is frame-shifting errors (a
   deletion in one place compensated with an insertion in another place).
2) Restrict analysis to error patterns that don't affect the first 6 actual
   characters. Doing so "masks" the effect of the HRP completely.
3) Do analysis for specific HRPs only, allowing much more accurate statements,
   but HRP-specific ones that may not hold for every HRP.

Our final selection primarily optimizes for 1) and 2) as those benefit all
potential uses of the encoding, but do optimize for 3) the "bc1" prefix
specifically (and the BIP173 length restriction) as a tiebreaker.

The code for this can be found under the link above, in const_analysis.cpp.

Cheers,

--
Pieter