[Lightning-dev] Blinded channel observation

Thu Aug 18 06:47:49 UTC 2016

Tadge Dryja <tadge at lightning.network> writes:
> There's two approaches with encrypted vs non-encrypted: the non-encrypted
> design which I kindof like, is to make all the information given to the
> observer not mean anything on its own.  With encrypted, you achieve the
> same result, but have some decryption key stuffed somewhere in the observed
> transaction to reveal meaningful data which identifies the channel, but is
> encrypted.
>
> Non-encrypted can be more efficient, because it's hard to squeeze down
> compact encrypted data (though see below for an attempt!).  But most things
> in the channel states can be obfuscated such that even if you tell
> everything to the observer, they don't learn anything.  (Even in the case
> where the observer is watching both sides of the channel, they shouldn't be
> able to match them... well other than timing, which is admittedly a very
> effective way to do it!)
>
> I skipped over HTLCs though because they didn't fit with this model.  And
> they really don't -- unlike the updating pubkeys in the commit tx, HTLC's
> are passed though multiple nodes, so information about them can get to the
> observer pretty easily.  So I think HTLCs would need to be in some kind of
> encrypted blob to send to the observer.

Yeah, I think that makes the non-encrypted idea a non-starter; we need
to steal those HTLCs without introducing nasty restrictions.

> I really like txid[0:16] as the truncated txid for the observer and
> txid[16:32] as the decryption key because it's simple and quite fast.  This
> would allow constant-time lookups into the observer's database regarless of
> how many channels it's watching, which HMAC'ing the txid doesn't have.
>  (You could hash txid[16:32] again for the decryption key if you want 32
> bytes.)

Yeah; I'd just hash the whole txid again though.  I mean, why not?

> The non-HTLC data can be sent unencrypted -- it's pretty much just a
> signature and hash from the tree.  If there is a new HTLC (or a few) added
> in that state, the node can elect to send that to the observer as well.  I
> think the format can be something like:

You can't send hash in the clear unless you're using variable keys in
commit txs (since that hash is what makes the previous txids
unguessable).  But that's OK, encrypting it doesn't hurt us AFAICT.

> htlc #1 expiry (4 bytes)
> htlc #1 preimage (20 bytes)
> htlc #2 expiry (4 bytes)
> htlc #2 preimage (20 bytes)
> offset to previous blob (2 bytes)
> decrypt key for previous blob (16 bytes)
>
> having pointers to previous states can save a lot of space if HTLCs are
> added incrementally.  The "blobs" can be kept in a separate data store
> indexed by state number, so it's quick to see that, e.g, state 471 also has
> an HTLC from state 465, which has HTLCs from state 442.  This chained
> decryption may end up revealing more HTLCs than are needed (which are quick
> for the observer to detect and discard) but if the fraud has occurred then
> anonymity is gone anyway and it's no big deal if the observer learns a
> little more -- they already learned all the important stuff.

Hmm, this is more sophisticated that my suggestion, by allowing
reference to *any* prev blob.

Would that win much though?  If HTLCs have been added then removed, it's
possible, but I'm not sure how much it saves.

eg.
                HTLCs
Commit tx:
1               1
2               1       2
3                       2 
4                       2       3       4       5
5                                               5
6                                                       6       7
7                                                       6               8
8                                                       6

Using a heuristic says "don't ever leak more than <current-htlcs>
unnessary HTLCs" gives:

#1: HTLC1
#2: (references #1) HTLC2
#3: (references #2)
#4: HTLC2 HTLC3 HTLC4 HTLC5
#5: HTLC5
#6: HTLC6 HTLC7
#7: (references #6) HTLC8
#8: (refefences #6) HTLC6

Assuming 24 bytes per HTLC, and 18 bytes per backref, the naive encoding
would be 14 * 24 = 336, backref = 11 * 24 + 3 * 18 = 318 bytes.  Not
much.

OK, let me write a quick simulator, with exponential distribution times
(because bitcoin!).  Hmm, attached below.  It only references the
previous HTLC, not any previous, but it does show that it's hard to make
the savings more than 50% (reduce the chance of an HTLC expiring to 1%,
for example).

See horrible hacked-up code below.

> Also, yeah, padding (handwave) and timing are what make hiding the channel
> very tricky, especially for HTLCs.  With non-HTLC updates, it can be hard
> to know when 2 nodes are updating a channel state, but with HTLCs there are
> more nodes in the mix with more points for data to leak out to the
> observer.  That's another reason you might want to omit sending out some
> portion of HTLC recovery data.

Yes, this is the hard part.

> I will try coding some of this and see, because it seems to work in my head
> but that's no indication it'll work on the computer :)

For sure!  Here's my hacky simulation results for 1M commitment txs
with the mean and stddev in the brackets:

Num htlcs:              0-24(3.99+/-2.5)
Raw-encoding bytes:     0-576(95+/-60)
Backref-encoding bytes: 16-520(40+/-34)
Best-encoding bytes:    0-528(65+/-51)

Cheers,
Rusty.

// 2>/dev/null; set -e; OUT=/tmp/`basename $0 .c`; if [ ! -f "$OUT" ] || [ "$OUT" -ot "$0" ]; then gcc -g -Wall -o "$OUT".$$ $0 -lm && mv "$OUT".$$ "$OUT"; fi; exec "$OUT" "$@"
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <assert.h>
#include <string.h>

/* For convenience, avoid allocation */
#define MAX_HTLCS 1000

/* Percentage chance of HTLC being removed. */
#define REMOVE_CHANCE 25
/* Percentage chance of HTLC being added. */
#define ADD_CHANCE 50

#define BYTES_PER_HTLC 24
#define BACKREF_BYTES 16

static void add_some_htlcs(unsigned int htlcs[], size_t *n_htlcs,
			   size_t *counter)
{
	/* Exponential distribution again. */
	while (random() < (RAND_MAX * (long long)ADD_CHANCE / 100)) {
		assert(*n_htlcs < MAX_HTLCS);
		htlcs[*n_htlcs] = *counter;
		(*counter)++;
		(*n_htlcs)++;
	}
}

static size_t calc_encoding(const unsigned int old_htlcs[], size_t n_old,
			    const unsigned int new_htlcs[], size_t n_new,
			    size_t prev_leaks)
{
	size_t o, n, num_new = 0, num_old = 0;
	size_t raw_bytes, bytes_using_backref;

	/* Figure out number only in new, only in old. */
	for (o = n = 0; o < n_old || n < n_new;) {
		if (o == n_old) {
			num_new++;
			n++;
		} else if (n == n_new) {
			num_old++;
			o++;
		} else if (old_htlcs[o] == new_htlcs[n]) {
			o++;
			n++;
		} else if (old_htlcs[o] < new_htlcs[n]) {
			num_old++;
			o++;
		} else {
			num_new++;
			n++;
		}	
	}

	raw_bytes = n_new * BYTES_PER_HTLC;
	bytes_using_backref = num_new * BYTES_PER_HTLC + BACKREF_BYTES;

	/* Num htlcs, raw-encoding, backref-encoding, encoding-used */
	printf("%zu,%zu,%zu,", n_new, raw_bytes, bytes_using_backref);

	/* Don't use backref if we would leak more than #current. */
	if (!num_old || num_old + prev_leaks > n_new) {
		printf("%zu\n", raw_bytes);
		return 0;
	} else {
		printf("%zu\n", bytes_using_backref);
		return num_old + prev_leaks;
	}
}

static void iterate(unsigned int htlcs[], size_t *n_htlcs, size_t *counter,
		    size_t *leaks)
{
	size_t i, num;
	unsigned int tmp_htlcs[MAX_HTLCS];

	/* REMOVE_CHANCE percentage chance of HTLC being closed. */
	for (i = 0, num = 0; i < *n_htlcs; i++) {
		if (random() < (RAND_MAX * (long long)REMOVE_CHANCE / 100))
			continue;
		tmp_htlcs[num++] = htlcs[i];
	}
	add_some_htlcs(tmp_htlcs, &num, counter);

	*leaks = calc_encoding(htlcs, *n_htlcs, tmp_htlcs, num, *leaks);
	memcpy(htlcs, tmp_htlcs, num * sizeof(htlcs[0]));
	*n_htlcs = num;
}

int main(int argc, char *argv[])
{
	size_t i, n_iter;
	unsigned int htlcs[MAX_HTLCS];
	size_t n_htlcs = 0, counter = 0, leaks = 0;

	printf("Num htlcs, raw-encoding bytes, backref-encoding bytes, best-encoding bytes\n");
	n_iter = argv[1] ? atoi(argv[1]) : 100;
	for (i = 0; i < n_iter; i++)
		iterate(htlcs, &n_htlcs, &counter, &leaks);

	return 0;
}