Skip to content

rules+firewall: rules and privacy mappings for channel creation #752

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Jul 9, 2024

Conversation

bitromortac
Copy link
Contributor

Based on #746

This PR adds functionality to litd in order to make automatic channel opens possible.

We add two new rules:

  • ChannelConstraint: to limit channel sizes
  • OnChainBudget: to limit onchain interactions with a per session budget
  • update the peer restriction rule to be active for channel opening endpoints

Automatic channel creation involves the following additional endpoints:

  • WalletBalance: check whether enough funds are available
  • PendingChannels: check if a channel is already pending with a node
  • ClosedChannels: for peer sanity checks
  • ConnectPeer: connect to a peer before opening
  • BatchOpenChannel: the request for opening

@levmi levmi requested review from ellemouton, guggero and ViktorTigerstrom and removed request for guggero May 2, 2024 14:16
Copy link
Member

@ellemouton ellemouton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass - looking soooo good 🔥 very clean.

Gonna need a second pass to just really get into the budget logic.

Main suggestion on first pass is to perhaps add s "private only" restriction to open-chan constraints?

Copy link
Member

@ellemouton ellemouton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really great work! Pretty much g2g I think. One main question though:

How does session linking worth with the on-chain budget? Does the same budget carry over & do we prevent them from providing a new one? or do we update the budget when a session is linked?

return &ChannelConstraint{
MinCapacitySat: channelBounds.MinCapacitySat,
MaxCapacitySat: channelBounds.MaxCapacitySat,
MaxPushSat: channelBounds.MaxPushSat,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing the new public/private allowed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think both are false here meaning no channels allowed?

should we make the default (false) => allowed & then rather have "NoPublic"/"NoPrivate"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does a call to OpenChan work with this as is given that both are false here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does a call to OpenChan work with this as is given that both are false here?

I think UnmarshalRuleValues is only used for sanity checking upon session registration, but the actual used final rule values are the ones from the server, so even if those are false, channel openings work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not super sure what you mean?

we get the litrpc.RuleValue here and convert it to ChannelConstraint which we then use in AddAutopilotSession to determine the rules (caveats) to set in the macaroon for the session.

so am not sure what you mean by "the final rule values are the ones from the server"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isnt that bad then? that if the user was to set both to false, that channel opens would still work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we get the litrpc.RuleValue here and convert it to ChannelConstraint which we then use in AddAutopilotSession to determine the rules (caveats) to set in the macaroon for the session.

thanks for the clarification! I think I misunderstood the flow of the rules, was confused by the default values provided by the autopilot if no rules are requested. Registering with both values false will not be allowed here https://github.com/lightninglabs/lightning-terminal/blob/master/session_rpcserver.go#L1074, so I think it works as intended

@bitromortac
Copy link
Contributor Author

Really great work! Pretty much g2g I think. One main question though:

How does session linking worth with the on-chain budget? Does the same budget carry over & do we prevent them from providing a new one? or do we update the budget when a session is linked?

Thank you for the reviews 🙏! It's possible to change the budget, it's used as a mechanism to top up funds for a session, so we don't prevent it. We could enforce it that it's only allowed to be increased though.

Copy link
Contributor

@ViktorTigerstrom ViktorTigerstrom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice PR 🔥🚀!! Just like Elle says, I think this is pretty much ready to go, so great work 🎉!

I mainly added some test coverage request comments, and also commented one potential issue for the onchain_budget handling. Let me know what you think about that potential issue. But otherwise this looks great 🔥!

Comment on lines 237 to 236
handleOpenChannelRequest,
func(ctx context.Context,
r *lnrpc.ChannelPoint) (proto.Message, error) {

return nil, o.handlePaymentConfirmed(ctx)
},
handleOpenChannelError,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that this is something we actually can/want to handle, but I thought I'd just point to a potential issue here with using the temp store in between requests and their responses, that could potentially allow on-chain spending above the rule bounds:

Since the temp store is cleared on every restart of litd, we MUST process the response for every successful request in order for it to actually be saved permanently and therefore also be used in the calculation for the current spent on-chain budget after litd is restarted.

Now, if either:

  1. litd is shutdown after a successful request has been sent, but before we've had time to process the response
  2. The processing of the response errors (causing a shutdown)

The spent on-chain amount will be lost and will not be used for the future on-chain budget calculations once litd is restarted.

Like I said though, I'm not sure if these edge cases is something we should address, but I just to make you aware of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is (a minor) issue. I'm not sure how that would be solved, because one would need a message buffer on lnd that would persist them in order for litd to be able to ack them? So unless we don't find an easier solution, I'd suggest to keep it this way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree! Just thought it be worth pointing out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a good point and I do think we should consider handling it. In the accounts system, we are "restart" proof so probably worth trying to be restart proof here too if possible.

for the accounts system, we can do this quite easily cause we know the payment hash to track before sending the request through, so then on startup, we call LND to see if the payment with that hash did in fact go through.

So is there any way here that we can force there to be some identifier? For example, I see that OpenChannelRequest has an optional Memo string field and same for `BatchOpenChannel. Can we not force the caller to set these (to something unique that we have not seen before) and then on startup, can we then not list channels & pending channels and then subscribe to channel open events to continue tracking these? For example I see that the Channel open event does contain the memo field.

If we do this then we also would not use the temp store. Instead we would have some kind of permanent "memo -> amount" map that we can check on startup.

thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I explored the idea of using the memo to keep track of pending channel opens (see commits after the itest). A missing piece is to add the memo field to the ClosedChannels api. With that we could then delete any pending actions for which we don't find an outcome (so that must have either been some internal failure to open the channel or a channel open was abandoned). In my opinion, the complexity is not really worth it. I think we should not use the temporary store, but persist all pending actions and remove them once we get a response/error. This way we are conservative and don't allow overspending. In the worst case this leads to a decreased budget, which may lead to UX issues, but I would consider a session to be something ephemeral and a possible problem can be fixed by just creating a new (non-linked) one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, the complexity is not really worth it. I think we should not use the temporary store, but persist all pending actions and remove them once we get a response/error

agreed. but we should have some identifier persisted with them that is restart safe (ie, not the request ID as used today). We want to make sure that whatever we do now we can potentially recover from in future.

but I would consider a session to be something ephemeral and a possible problem can be fixed by just creating a new (non-linked) one.

ah yes that is a good point!

Copy link
Contributor

@ViktorTigerstrom ViktorTigerstrom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great work 🔥🚀🔥!

Question, do we want to merge this before deciding to marge automatic channel opens, in case we'd find out that more functionality is needed?

Copy link
Member

@ellemouton ellemouton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we definitely need itest coverage here. I think there are a few things slipping through the cracks that would be picked up via an itest.

Namely:

  1. a OpenChannel request wont currently make it through at all since the rule enforcer doesnt currently allow streaming calls. ALso - calls to OpenChannel currently look like they succeed from the PoV of autopilot when they actually do not.
  2. The privacy mapper doesnt currently have support for the OpenChannel and OpenChannelSync calls.

I put together a rough itest demonstrating these things:
itest.patch

return &ChannelConstraint{
MinCapacitySat: channelBounds.MinCapacitySat,
MaxCapacitySat: channelBounds.MaxCapacitySat,
MaxPushSat: channelBounds.MaxPushSat,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not super sure what you mean?

we get the litrpc.RuleValue here and convert it to ChannelConstraint which we then use in AddAutopilotSession to determine the rules (caveats) to set in the macaroon for the session.

so am not sure what you mean by "the final rule values are the ones from the server"?

return &ChannelConstraint{
MinCapacitySat: channelBounds.MinCapacitySat,
MaxCapacitySat: channelBounds.MaxCapacitySat,
MaxPushSat: channelBounds.MaxPushSat,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isnt that bad then? that if the user was to set both to false, that channel opens would still work?

@@ -0,0 +1,354 @@
package rules
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need an itest for this too


// TestChannelConstraintCheckers ensures that the ChannelConstraint values
// correctly accepts or denies a request.
func TestChannelConstraintCheckers(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that none of these tests cover the OpenChannel call (only the sync one) and since OpenChannel is a stream call, it currently just lets any call through....

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We either need to start intercepting stream calls or we should remove the OpenChannel handling & explicitly dissallow it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently im also finding that these open chan calls (except for BatchOpenChannel) are not supported by the privacy mapper. So they can only work if the "no privacy" option is set on the session (ie, privacy flags dont matter here)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for testing this 🙏. Right, the OpenChannel and OpenChannelSync calls were not fully implemented and tested. I removed OpenChannel (and disallowed any streaming requests), but added the privacy functionality for OpenChannelSync.

@bitromortac
Copy link
Contributor Author

Thanks a lot for the suggestions and the itest @ellemouton 🙏 💯.

I removed the OpenChannel streaming RPC functionality, which was not fully built out and added privacy mapping for OpenChannelSync.

During itests I found that there was an issue when combining other rules with the budget rule, which is stateful. It may happen that a request adds a temporary budget reservation while another rule is violated, resulting in the reserved budget not being removed. I tried to fix this by adding a RollbackRequest method to the Enforcer interface, which is called after any rule is violated for a request. In principle we would need to add something similar to commit a balance reservation after knowing that all response rules were followed. We don't have restrictions on responses right now which is why I didn't add that.

@bitromortac bitromortac requested a review from ellemouton May 31, 2024 14:11
@lightninglabs-deploy
Copy link

@ellemouton: review reminder
@bitromortac, remember to re-request review from reviewers when ready

Copy link
Member

@ellemouton ellemouton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the itest 🙏 definitely improves the confidence here!

Two main comments from me on the latest review


// RollbackRequest reverts the enforcer's state after errors.
RollbackRequest(ctx context.Context, uri string) error
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this if we have HandleErrorResponse?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooooh i see: this is for when the request doesnt get through all the enforcers and errors on one later down the stack.

The only issue I see here is: what about the case when it isnt another rule interceptor that errors on the request but instead another interceptor (like: accounts system).

So im wondering if we dont instead need take a more general approach here so that we can trigger the other possibly stateful interceptors to also roll-back.

Ie, if we encounter an error on the request, can we somehow turn that into an error response instead so that layers above can know to roll-back too?

Copy link
Contributor Author

@bitromortac bitromortac Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ie, if we encounter an error on the request, can we somehow turn that into an error response instead so that layers above can know to roll-back too?

I think that interceptors handle sequentially. So if a validation error happens in the rule enforcer, the error is converted to a RPCErr. Does this error then propagate back through the other interceptors that were already passed? If so, then I think everything works as expected and each stateful interceptor should handle a failure case its own way

Comment on lines 237 to 236
handleOpenChannelRequest,
func(ctx context.Context,
r *lnrpc.ChannelPoint) (proto.Message, error) {

return nil, o.handlePaymentConfirmed(ctx)
},
handleOpenChannelError,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a good point and I do think we should consider handling it. In the accounts system, we are "restart" proof so probably worth trying to be restart proof here too if possible.

for the accounts system, we can do this quite easily cause we know the payment hash to track before sending the request through, so then on startup, we call LND to see if the payment with that hash did in fact go through.

So is there any way here that we can force there to be some identifier? For example, I see that OpenChannelRequest has an optional Memo string field and same for `BatchOpenChannel. Can we not force the caller to set these (to something unique that we have not seen before) and then on startup, can we then not list channels & pending channels and then subscribe to channel open events to continue tracking these? For example I see that the Channel open event does contain the memo field.

If we do this then we also would not use the temp store. Instead we would have some kind of permanent "memo -> amount" map that we can check on startup.

thoughts?

@ellemouton
Copy link
Member

re-requesting from @ViktorTigerstrom since quite a bit has changed

@bitromortac bitromortac force-pushed the autoopen branch 3 times, most recently from 2df24e6 to fc0d5f7 Compare July 1, 2024 09:29
Copy link
Contributor

@ViktorTigerstrom ViktorTigerstrom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks great, awesome job 🚀fire:!!!!

Just leaving one small suggestion in my comments below. Other than that, there's one last thing I noticed which wasn't really added by this PR:
If obfuscating an amounts field, the exact returned varies for every API request. This behaviour is different than how our normal obfuscation works, as we for example will return the same obfuscated pubkey with every unique API request in the firewall. I'm not sure if this is something we should address for now, but just thought I'd brought it up!


// removeReqId removes the request ID from the memo if present.
func removeReqId(memo string) string {
parts := strings.Split(memo, ":")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth already implementing removal of the memo in by looping the over the parts and to look for the onBudget prefix to avoid that older versions of litd in the future leak extra privacy, and to have a reference how it should be done if we implement useage of the memo field for fetching the reqId in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have implemented that idea and tested the current prefix together with a different one

// TestRemoveMemo tests that request identifiers are correcly removed from the
// memo string.
func TestRemoveMemo(t *testing.T) {
tests := []struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you implement my suggestion above, I'd add another test that shows that you can have another part before the onBudget-connID-reqID part.
If you decide not to do it, I'd probably add another test to show that it's intended behaviour that we can't have another part before it currently.

Copy link
Member

@ellemouton ellemouton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work 🔥

A couple of minor suggestions

Comment on lines 258 to 276
_, err := rule.HandleErrorResponse(ctx, ri.URI, nil)
if err != nil {
return nil, err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm if we return here then we dont undo all the persisted changes.... perhaps log instead and then return later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point 🙏

Comment on lines 33 to 39
// spendAmtKey is the key that will be used in the persisted KV store to
// store the total amount that has been spent.
onChainSpent = "spent-amt"

// pendingSpentKey is the key that will be used in the persisted KV
// store to keep track of the total pending spent amount.
onChainPending = "pending-spent"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

variable names dont match those in the comments

Comment on lines +51 to +60
func (o *OnChainBudgetMgr) Stop() error {
return nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

surely pretty easy to test? just extend TestOnChainBudgetCheckRequest: call a request, call Stop - see that it waits, then call response & see that Stop exits

Comment on lines +51 to +60
func (o *OnChainBudgetMgr) Stop() error {
return nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would be a bit complicated to actually implement though afaict since what happens if LND stops before lit? we would need to hook in to that to know which requests we can just not wait for


// OnChainBudgetEnforcer enforces requests and responses against a
// OnChainBudget rule.
type OnChainBudgetEnforcer struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. Worth noting that we cant really rely on the memo field not changing though. So this is a nice to have, temp measure. Worst case scenario - user just links a session and updates the budget

@@ -0,0 +1,785 @@
package rules
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps this is a good time to start a doc where we explain small details like the memo field so that it is easy to recap in future?

so perhaps let's add a rules/docs/onchain_budget.md where we explain some things here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, I added a small doc, let me know if I missed anything there

bitromortac and others added 16 commits July 2, 2024 15:04
We collect errors of all rule enforcers, handling errors in all of them
should an error occur. This is to roll back state consistently.
This allows us to rewrite a request.
We pass a random lnd connection identifier to the rule enforcer that is
unique per lnd connection lifetime. It is used to generate unique
request identifiers that amend the non-unique request identifiers that
are passed from lnd.
This adds an on-chain budget that handles requests and responses for the
following endpoints:

* OpenChannelSync
* BatchOpenChannel

The budget rule checks that the onchain fee rate is not violated and
that pending and confirmed amounts are handled correctly.

An edge case can occur when lit crashes or shuts down after the budget
rule has forwarded a request but didn't receive a response yet. The
pending budget is not removed in case the request didn't go through and
is not accounted towards the spent budget in case the request did go
through. To be able to handle these cases in the future we add a unique
identifier to the request, that can be checked by calling LND's channel
bookkeeping APIs. Not all of them expose the identifier yet, which is
why pending actions cannot be deleted yet. This leads to underspending
of the budget and can be fixed by user intervention by creating a new
session. The memo prefix is removed when reading forwarding the
bookkeeping requests for privacy reasons.
We add a new channel constraint rule that enforces limits on channel
opening for the OpenChannelSync and BatchOpenChannel end points.

The channel constraint rule takes care about:
* min channel size
* max channel size
* max push amount
* no closing address was used
We add a peer restriction for channel opening for OpenChannelSync and
BatchOpenChannel.
Channel policy boundaries are enforced for channel openings.
Pull out transaction related constants to the top of the test.
Adds a debug comment that is useful for this code. It is often needed to
check the human readable representation of a message.
For closes we need to know the close type and settle balances to know
which peers should be avoided in the future.
Only obfuscate pending open channels for now.
We obfuscate fields from the batch channel open requests and
responses.
Also adds a privacy flag that controls obfuscation of network addresses.
@bitromortac bitromortac force-pushed the autoopen branch 2 times, most recently from 3ff4876 to e6bd797 Compare July 4, 2024 09:57
@guggero guggero merged commit ef1ddc9 into master Jul 9, 2024
13 checks passed
@guggero guggero deleted the autoopen branch July 9, 2024 08:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants