[Bitcoin-development] BIP39 word list

Allen Piscitello allen.piscitello at gmail.com
Fri Nov 1 23:41:53 UTC 2013


The problem with this is that you might have word A which is similar to B,
but B is also similar to C.  So we scrub B from the list, someone enters B,
and we have no way to know if it means A or C.  It leads to a much more
complicated scheme to ensure that all errors are correctable.
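
To make the A/B/C case concrete, here is a minimal Python sketch. It
assumes two words count as similar when they have the same length and
differ in exactly one position by a pair from the similar-character matrix
quoted below. That rule, the helper names, and the word "fat" are
illustrative assumptions, and only a subset of the matrix is included.

```python
# Subset of the similar-character pairs quoted later in this thread.
SIMILAR = {("a", "c"), ("a", "e"), ("c", "e"), ("c", "o"), ("e", "f"), ("e", "o")}

def is_similar(w1, w2):
    """Assumed rule: same length, exactly one differing position,
    and that differing character pair is listed in SIMILAR."""
    if len(w1) != len(w2) or w1 == w2:
        return False
    diffs = [tuple(sorted(p)) for p in zip(w1, w2) if p[0] != p[1]]
    return len(diffs) == 1 and diffs[0] in SIMILAR

def candidates(entered, kept_words):
    """Return every kept word the entered word could be a slip for."""
    return sorted(w for w in kept_words if is_similar(entered, w))

kept = ["cat", "fat"]            # "eat" (B) was scrubbed; A and C were kept
print(candidates("eat", kept))   # ['cat', 'fat'] -> two candidates, so ambiguous
```

With more than one candidate there is nothing principled to auto-correct
to, which is the complication described above.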

Scrubbing A, B, and C is preferable, since it leads to no ambiguity and
there is no need to try to correct an error.
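
A rough sketch of that policy, assuming the pairs of similar words have
already been computed by some comparison rule (the function name and the
sample data are hypothetical): every word that appears in any similar pair
is dropped, so a rejected input never has to be guessed at.

```python
def scrub_all(words, similar_word_pairs):
    """Drop every word that takes part in at least one similar pair."""
    tainted = {w for pair in similar_word_pairs for w in pair}
    return [w for w in words if w not in tainted]

words = ["cat", "eat", "fat", "zoo"]
pairs = [("cat", "eat"), ("eat", "fat")]   # A~B and B~C
print(scrub_all(words, pairs))             # ['zoo']
```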


On Fri, Nov 1, 2013 at 3:14 PM, Brooks Boyd <boydb at midnightdesign.ws> wrote:

> I was inspired to join the mailing list to comment on some of these
> discussions about BIP39, which I think will have great use in the Bitcoin
> community and outside it as a way to transcribe binary data.
>
> One thought I had, as the discussions about similar characters result in
> culling words from the list, is that culling only helps to validate input;
> it does not help the user recover when the input is incorrect.
>
> For example, if both "cat" and "eat" were in the word list, and someone
> wrote down "eat", but later mis-translated it and put "cat" back into the
> translator, the result would be a checksum error; "cat" is a different
> number, so the checksum would fail.
>
> As it currently stands, "cat" would not be a valid word ("eat" is the real
> word, and no number maps to "cat"), so the translator can throw a
> different error which is more helpful (i.e. "'cat' isn't a valid word
> choice"), but that still doesn't get the user to the proper translation.
>
> What if the wordlist included those "words that are so similar to each
> other that we only kept one of them" and had them all refer to the same
> number? I propose that the wordlist allow multiple words on a single
> line, with the first word on the line being the "primary" or "real" word
> to be used, and the other similar words included so that a translation
> program could, if it wanted to assist the user, fix their input for them
> (verbosely or not), along the lines of "'cat' isn't a valid word choice;
> assuming you meant 'eat', which is valid". You might still hit a checksum
> error if that similar word is still the wrong word. As it stands now, you
> have culled a bunch of words from the wordlist as "too similar", but if I
> want to help the user fix a bad input, I need to write a translation
> program with a full English dictionary alongside the BIP39 dictionary.
>
> I'd be willing to create a pull request for such an update, but before I
> delve into that, does this sound like a good idea? I could see it becoming
> a slippery slope if every number in the 2048 set had a dozen word
> variations (misspellings, similar words, slang terms for the real word,
> etc.), which could make it hard to decide how similar is similar enough to
> be added as an alternate. The standard would also need to be clear that
> when translating binary to words, you only use the "main" word for that
> row, not any of the variations.
>
> MidnightLightning
>
>
> > I've just pushed an updated wordlist which is filtered for similar
> > characters taken from this matrix.
> > BIP39 now considers the following character pairs as similar:
> >         similar = (
> >             ('a', 'c'), ('a', 'e'), ('a', 'o'),
> >             ('b', 'd'), ('b', 'h'), ('b', 'p'), ('b', 'q'), ('b', 'r'),
> >             ('c', 'e'), ('c', 'g'), ('c', 'n'), ('c', 'o'), ('c', 'q'), ('c', 'u'),
> >             ('d', 'g'), ('d', 'h'), ('d', 'o'), ('d', 'p'), ('d', 'q'),
> >             ('e', 'f'), ('e', 'o'),
> >             ('f', 'i'), ('f', 'j'), ('f', 'l'), ('f', 'p'), ('f', 't'),
> >             ('g', 'j'), ('g', 'o'), ('g', 'p'), ('g', 'q'), ('g', 'y'),
> >             ('h', 'k'), ('h', 'l'), ('h', 'm'), ('h', 'n'), ('h', 'r'),
> >             ('i', 'j'), ('i', 'l'), ('i', 't'), ('i', 'y'),
> >             ('j', 'l'), ('j', 'p'), ('j', 'q'), ('j', 'y'),
> >             ('k', 'x'),
> >             ('l', 't'),
> >             ('m', 'n'), ('m', 'w'),
> >             ('n', 'u'), ('n', 'z'),
> >             ('o', 'p'), ('o', 'q'), ('o', 'u'), ('o', 'v'),
> >             ('p', 'q'), ('p', 'r'),
> >             ('q', 'y'),
> >             ('s', 'z'),
> >             ('u', 'v'), ('u', 'w'), ('u', 'y'),
> >             ('v', 'w'), ('v', 'y')
> >         )
> > Feel free to review and comment on the current wordlist, but I think we're
> > slowly moving toward the final list.
> > slush