[Lightning-dev] Proof-of-closure as griefing attack mitigation

ZmnSCPxj ZmnSCPxj at protonmail.com
Wed Apr 1 06:19:03 UTC 2020


Given the fact that contracts on offchain protocols need to be enforceable onchain as well, timelocks involved in multi-hop payments are measured in blocks.
This is because the blockchain can only (third-party-verifiably) enforce timeouts in units of entire blocks.
This leads to very long timeouts for payment delivery, thus multi-hop offchain payment attempts can be, deliberately or accidentally, be in a "pending" state up to the very large timeouts involved.

Deliberately setting up a multi-hop payment such that it will be in a "pending" state for long periods of time is colloquially known as a "griefing attack".
In this article, we assess various proposed solutions to mitigate the effects of griefing attacks, and propose a particular solution, proof-of-closure, as well, that requires significant changes to the update state machine.

Digression: Why Grief?

Before embarking on our investigation for solutions to the griefing problem, we should first wonder if griefing is, in fact, a problem.

This brings up the question of: why would anybody grief at all?

Humans, like cats and other less-sapient pieces of walking meat, often find enjoyment in causing the suffering of others for no immediate direct gain to themselves, as a public demonstration of dominance over those they make suffer (aka "shits and giggles", which, if executed correctly, can lead to eventual direct gains to themselves or their progeny or relatives or allies, but such details are often outside the ken of the very beings who execute such survival strategies: brains are pieces of meat that have been hacked to act as action-reaction engines, but are not sophisticated enough to execute as pure rationality engines at all times).
Fortunately, in the Bitcoin world, only purely rational beings of pure selfishness can exist in the long run, thus we can neglect such motivations as mere noise.

First, let us investigate *how* griefing attacks can be performed.

* An intermediate node in a multi-hop attempt can delay forwarding or failing an incoming HTLC.
* A final node in a payment attempt can delay claiming an incoming HTLC.

Let us consider a purely rational intermediate node of pure selfishness:

* If it forwards as soon as possible, it can earn fees, and also speed up the release of the HTLC-locked funds so that they can reuse those funds as liquidity for further payment attempts.
* Thus, delaying an HTLC is not selfishly-rational for an intermediate node.

Thus, for an intermediate node, it seems there is no selfishly-rational motivation to execute a griefing attack on an arbitrary payment attempt.
We can then conclude that an intermediate that delays a payment would do so, not of its own rational self-interest, but as an accident, such as an unforeseen connectivity or power failure.

However, things are different when we consider a non-arbitrary payment.
Suppose a node were to make a payment attempt to itself, and deliberately delay claiming this self-payment.
This lets any single node, *who happens to own large liquidity*, to lock up the liquidity of other nodes.

The motivation to lock up the liquidity of other nodes is to *eliminate competition*.
Suppose we have a network as below:

    A -- B -- C
      \     /
       \   /
        \ /

When A and C want to transact with one another, they may choose to route via either B or E.
B and E are therefore competitors in the business of forwarding payments.

But suppose E has much larger channels AE and CE than the channels of AB and CB.
For example, suppose E has 100mBTC perfectly-balanced channels while B has only 10mBTC perfectly-balanced channels, as all things should be in simplified models of reality.
E can then "take out the competition" by making a 5mBTC self-payment along E->A->B->C->E and a 5mBTC self-payment along E->C->B->A->E, then refusing to claim the payment, tying up all the liquidity of the channels of B.
By doing so, it can ensure that A and C will always fail to pay via B, even if they wish to transact in amounts less than 5mBTC.
E thereby eliminates B as a competitor.

This demonstrates that griefing attacks will be motivated, such that such attacks will be performed by payers and payees *against intermediate nodes*.
Intermediate nodes have no motivation to attack payers and payees (those are their potential customers in the business of forwarding payments, and attacking potential customers is bad business: such attacking intermediate nodes will be removed economically in the long run).
However, payers and payees can become motivated to attack intermediate nodes, if the "payer" and "payee" are actually competitor intermediate nodes.

(We can observe that this is always a possibility even outside of Lightning: a service or product provider has no incentive to attack its customers ("the customer is always right"), but have an incentive to *pretend* to be a customer of a competitor and attack them.)

We will keep this fact in mind: active griefing attacks are attacks *on* intermediate nodes, not *by* intermediate nodes, because there is no economic incentive for intermediate nodes to attack their customers.

Previous Proposed Solutions

Time-Spent Reporting

At each channel along the route, the time spent by a node to handle its forwarding is recorded, and reported upstream in the route.

Unfortunately, this solution protects payers from intermediate nodes and payees: it does not protect intermediate nodes from colluding payers and payees.
Even if an intermediate node knows that a particular node is consistently slow via a previous time-spent report, it will not be able, with our current onion routing, determine if an onion packet it just received will or will not go through the known-slow node.
Thus, an intermediate node would not be able to defend against distant payees that, with a colluding payer, will not claim a particular payment.

As we have established, an active griefing atttack will never be deliberately performed by a selfishly-rational intermediate node.
Thus, this solution protects against the wrong thing: it protects payers against slow/unreliable intermediate nodes, it does not protect intermediate nodes against malicious payer/payee collusions.
It protects only against intermediate nodes that inadvertently go offline during forwarding, but such nodes will inevitably lose out on the forwarding market anyway, and will disappear in the long run.

Up-Front Payment

Payers pay for an attempt, not just the successful completion of an attempt.

A variation on this is that the payer (or payee) continuously pays as long as the payment is pending.
Further variations include paying by other means, such as just locking funds or paying with proof-of-work.

While it certainly erects economic barriers against payer/payee collusions attacking intermediate nodes, it *also* erects economic barriers against normal, non-malicious payments.

We can consider that economic barriers against non-malicious, low-value, high-frequency payments ("micropayments") may be enough that such payments become infeasible if we impose up-front payment for mere attempts.
Thus, while this solution is certainly something we can consider, we must be reluctant to use it due to its up-front, strict-evaluation behavior.


Observing the above, we want the properties for a "good" solution to griefing attacks to be:

* It should protect intermediate nodes against payer/payee collusions.
* It should only come into play upon detection of an attack.

We now present proof-of-closure, which (we hope) has the above properties.

We can consider instead a softer timeout, distinct from the HTLC block-based timeout.
This softer timeout is measurable in fractions of a second, e.g. units of 0.1 seconds.

Each node on the network advertises, in addition to a block-based `cltv_delta`, a `timeout_delta` in units of 0.1 seconds.
Further, each invoice contains, in addition to a block-based `final_cltv`, a `final_timeout` in units of 0.1 seconds.

Thus, there are two timeouts:

* The current "hard" block-based timeout that is enforceable onchain.
* A new "soft" sidereal-time-based timeout that is not onchain enforceable.

The soft timeout, as mentioned, is not enforceable onchain.
Instead, enforcement of the soft timeout *is* the act of putting the channel state onchain.

Now, for the current "hard" block-based timeout, we already have a reaction.
If the HTLC "hard" timeout is approaching:

* Drop the channel onchain and enforce the hard timeout onchain to reclaim the funds in the HTLCs.
* Wait for the onchain action to be deeply resolved (either timelock or hashlock branch is confirmed deeply) and report the result (success or fail) upstream.

What happens if the "soft" timeout is violated?

* Drop the channel onchain.
* Report the channel closure upstream.

The "hard" timeout is cancelled in any of these two conditions:

* A success is reported via `update_fulfill_htlc`, OR,
* A failure is reported via `update_fail_htlc` AND the HTLC is irrevocably removed from the latest commitments/state(s) of the channel.

The "soft" timeout is cancelled in any of these three conditions, the first two of which are the same as above:

* A success is reported via `update_fulfill_htlc`, OR,
* A failure is reported via `update_fail_htlc` AND the HTLC is irrevocably removed from the latest commitments/state(s) of the channel, OR
* A channel closure is reported.

Let us fill this in more detail.

Suppose we have a payment route A->B->C->E.

Both the "hard" block timeouts and the "soft" second timeouts decrement monotonically at each hop.
Thus, the payee E has the shortest "hard" and "soft" timeouts (as normal).

* Suppose E then delays claiming the payment and violates the "soft" timeout.
* C then drops the CE channel onchain.
* C reports, before its own timeout (slightly larger than the timeout imposed on E), the closing of the channel CE, to B.
* B validates this report, and if valid, propagates the report to A.
* A validates this report, and if valid, accepts that the payment will be "stuck" for up to the hard timeout it imposed on B.

C has to report back to B in order to prevent B from closing the BC channel, and B has to report back to A in order to prevent A from closing the AB channel.
The decrementing seconds-unit timeouts are needed for each hop, for the same reason that decrementing block-unit timeouts are needed.

Since E is motivated to attack intermediate nodes because it wants to redirect payment forwards through itself rather than its competitotrs, having one of its channels closed (which prevents it from being used for forwarding) is directly opposed to its end goal of getting more money, thus, we can believe the action of closing a channel involved in a griefing attack is sufficient disincentive.

The major drawback is that enforcement of the soft timeout *is* a channel closure, which is generally a negative for the network.
This is not a remote attack vector, since a node can only trigger this closure if it is able to stall the fulfillment or failure of an HTLC on a channel, which generally means the node triggering this closure can only do so for its own channels (or it is able to, via a separate mechanism, remotely crash a different node).

Proving Channel Closes

What C *really* needs to prove is that:

* It is *willing* to close a channel due to a violation of the soft timeout.
* The channel it is willing to close was, in fact, involved in the same payment attempt.

With the above, B can believe that C was innocent of wrongdoing, because:

* C would only be wiling to close a channel in case of a protocol violation, in this case, a violation of the soft timeout.
* The channel it closed was closed because of this payment attempt, and not because of another payment attempt, or some other unrelated channel being unilaterally closed.

First, what C needs to prove is *NOT*, in fact, actual channel closure: it needs to prove a *willingness* to close a channel.
Thus, it does not require the channel to actually be *closed* yet, i.e. it does not have to wait for onchain activity that the channel closure is in a mempool and is confirmed deeply onchain etc etc.

Thus, to prove a *willingness to close* rather than an actual close, C can provide the unilateral close of the channel CE.
The act of unilaterally closing a channel is the publication of the transaction(s) making up the unilateral close.
Thus, if C is *willing* to close the channel, it is willing to publish the transaction(s) involved, and thus, providing the unilateral close to B and further upstream, shows a willingness to close the channel.

B then validates the provided proof-of-closure by checking that the unilateral close transaction is either onchain, in the mempool, or that it spends a TXO that is not currently spent by another transaction.
In the case the unilateral close transaction is not confirmed and in the mempool, B can speed up its propagation on the Bitcoin layer by putting it in its own mempool as well --- after all, C is willing to close the channel to exonerate itself and punish the actual culprit, and B putting the unilateral close in its own mempool can only help C in what it is willing to do.

Secondly, C needs to prove that the channel it is willing to close involves the payment attempt, and is not some other channel closure that it is attempting to use to fulfill its own soft timeout.
Since the unilateral close transaction *is* the proof-of-closure, B (and A) can inspect the transaction outputs and see (with some additional data from C) that one of the outputs is to an HTLC that matches the payment hash.

Thus, B (and A) can believe that the proof-of-closure proves that whoever is presenting it is free of wrongdoing, as whoever is actually causing the delay has been punished (by someone being willing to close a channel with the culprit), and that the proof-of-closure commits to this particular payment attempt and no other (because it commits to a particular payment hash).

Further, if CE is closed by E dropping it onchain rather than C, C will still be able to fulfill its own soft timeout by taking the closing transaction from E, which should still contain the HTLC.
Indeed, neither A nor B will particularly care (nor need to know) who dropped the channel onchain, or (for A) that the channel participants are C and E.

Update State Shenanigans

Bitcoin update mechanisms are complicated things, and it may be possible for an attacking payee E to fool around with the update state machine to make it difficult for C to report a willingness to close CE.

In particular, I quote here the relevant passages from `lightning-rfc`, `02-peer-protocol.md`, which is an implementation of the Poon-Dryja update mechanism:

> Thus each update traverses through the following states:
> 1. pending on the receiver
> 2. in the receiver's latest commitment transaction
> 3. ... and the receiver's previous commitment transaction has been revoked,
>    and the update is pending on the sender
> 4. ... and in the sender's latest commitment transaction
> 5. ... and the sender's previous commitment transaction has been revoked

The payee E is the "receiver" in this context.

In this case, once the update has reached step 2, then E has a commitment transaction that it can put onchain, that contains an HTLC it can claim.
>From this step onward, C cannot send a failure (i.e. it cannot send back an `update_fail_htlc`) back to B, because E could drop its latest commitment onchain and claim the HTLC onchain.

However, until step 4, C does not have a unilateral close containing the HTLC, and thus cannot provide a proof-of-closure that contains an HTLC that refers to the payment.

Thus, between steps 2 to 4, C cannot safely respond to its own soft timeout.
C cannot respond with a failure, as E could then drop its latest commitment transaction onchain and claim the payment from C, and extract money from C that way.
C also cannot respond with a proof-of-closure, as it does not have a transaction that it can use to provide this proof.

The best that C can do would be to impose an even shorter timeout between steps 2 and 4 above, and to drop its current commitment transaction (which does not contain the HTLC yet and thus does not constitute a valid proof-of-closure) onchain.
In between the time it drops the commitment transaction and its own incoming soft timeout, there is a chance, however small, that this transaction will be confirmed, and the channel will (with high probability) settle in a state where the HTLC is not instantiated, thus C can safely fail its incoming HTLC (not show a proof-of-closure, since that is not possible for C to do) without risk of loss, just prior to its own soft timeout.

Of course, C is still at risk here: E could collude with miners via a side-channel fee offer to confirm its commitment transaction with the HTLC present, and ensure that C is liable for the HTLC value.

With Decker-Russell-Osuntokun, we can remove this risk by requiring a ritual as follows:

1.  C requests exclusive access to update their single shared state.
  * This can be done via a variety of sub-protocols, including a fair coin toss in case of near-simultaneous requests for exclusive locks on both sides.
2.  C provides the details of the new HTLC to E.
3.  C and E generate the new state transaction and exchange signatures for it.
4.  C and E generate (without signing) the new update transaction.
5.  E provides the signature (or share of signature, if MuSig) for the new update transaction to C.
6.  C provides the signature for the new update transaction to E, which releases the exclusive lock on the shared state atomically with the finalization of the new update transaction.

Prior to step 5, C can simply fail the incoming HTLC from B in case its own soft timeout is near.
Even if E performs step 5 after C has already failed the incoming HTLC from B, C can simply not perform step 6 and drop the channel onchain with the previous update and state transactions.

With Poon-Dryja, we will have to rearrange the order in which we perform things, effectively adding an extra communications turnaround between the participants.
Specifically, the order would have to be revised to:

> 1. pending on the sender
> 2. in the sender's latest commitment transaction
> 3. ... and the sender's previous commitment transaction has been revoked,
>    and the update is pending on the receiver
> 4. ... and in the receiver's latest commitment transaction
> 5. ... and the receiver's previous commitment transaction has been revoked

This allows the sender (C in our context) to provide a proof-of-closure after step 2, and before step 2, C can safely return a failure with `update_fail_htlc` (and refuse to proceed beyond step 2, thus ensuring it can still use the previous commitment that still has no HTLC).

Of course, this change will require redesigning the update state machine, increasing the number of communication turnarounds, and creating a subtle incompatbility when transitioning a payment from a hop that knows only the old update state machine to a hop that knows the new update state machine.

Purely Falsified Proof-Of-Closure

Of course, the attacking node E might want to create a false proof-of-closure.
E can do this by simulating a Lightning channel: lock an amount of funds in a 2-of-2 (where E controls both keys), then spend it in a set of transactions mimicking the unilateral close.

We observe, however, that the overhead of simulating a Lightning channel is the same as the overhead of actually creating and closing a Lightning channel.
Since the punishment of proof-of-closure is to force attackers to have their channels closed, we can consider that this simulation of a channel open and close is sufficient as well.


This sketch of proof-of-closure can be used for any update mechanism:

* With Poon-Dryja, C can use its own commitment transaction as the proof-of-closure.
* With Decker-Wattenhofer, C can give all the offchain transactions up to the last stage in the multi-stage decrementing-`nSequence` mechanism.
* With Deckker-Russell-Osuntokun, C can give the latest update and state trnsaction.

Basically, we expect that for now, and in the future, any update mechanism worth consideration will have a concept of "unilateral close" where a channel can be dropped onchain, using data that only one of the channel participants holds.

Such a unilateral close will be a sequence of one or more valid transactions, terminating in a transaction containing an HTLC-like contract in one of its outputs.

Thus, to validate the unilateral close, it is only required to validate all the transactions contained in the proof-of-closure, and see that the last transaction has an HTLC output.

The limitations are thus:

* The acceptable forms of HTLC would need to be agreed-upon by the entire network.
* Implementations would need to be able to assess, in a Bitcoin-consensus-compatible way, whether a transaction is valid or not.

Payment Decorrelation and Payment Points

Of course, having a single payment hash for the entire payment attempt is a privacy loss, which we intend to fix in the near future by using payment points, and adding a blinding scalar at each hop, aka. payment decorrelation.

Thus, in the future, there will not be any HTLC, but instead a PTLC.
Further, the payment point at each hop will be changed at each hop, in order to prevent decorrelation.

Thus, C needs to provide proofs:

* That an apparent singlesig on the unilateral close output is in fact a PTLC.
  C needs to provide:
  * A target point P.
  * A partial signature that would spend that singlesig for a particular sighash.
  * An adaptor signature which, with knowledge of the completed signature, adaptor signature, and sighash message, would have revealed the scalar behind P.
* That the PTLC belongs to the same payment attempt as what B offered to C.
  C needs to provide:
  * The C-only blinding factor that is the difference between the payment point of the B-to-C PTLC and the C-to-E PTLC on the unilateral close.

Then, when B needs to propagate the proof-of-closure back to A, B simply adds its own blinding factor to the reported blinding factor, in order to convince A that this is the same payment attempt.

As we have brought up privacy, we observe that, when this mechanism triggers, there is a mild privacy loss, in that intermediate nodes now know some channel closure that is related to this payment, and can thus determine the exact path that the payment attempt went through, at least until the channel being closed.
However, proof-of-closure is only propagated in case of violation of the soft timeout, so for normal non-malicious payments, proof-of-closure does not cause any privacy loss.

More information about the Lightning-dev mailing list