query cancel #1392

Closed
wants to merge 8 commits into from

Conversation

christophemacabiau
Contributor

The cancel method no longer seems to work, and its test has disappeared.
P.S. The new documentation at node-postgres.com is great, congratulations! If it would help, I can write some documentation on client.cancel.

var Query = helper.pg.Query

// before running this test make sure you run the script create-test-tables
test('simple query interface', function () {
Collaborator

This test doesn’t accept a done callback or return a promise, so its assertions won’t fail at the right time. The assert(true) can’t fail at all and doesn’t confirm that row is called. Both queries are queued, so the important cancellation path isn’t tested at all. (Actually, the first one is queued too, because the test doesn’t wait for the client to connect.)

Queries spliced out of the queue should probably reject/emit an error like they would if cancelled while running. Then this test can use the promise API and:

  • wait for the connection to complete
  • start a query, asserting that it resolves
  • start two more queries, cancelling them both and asserting that both reject
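A minimal in-memory model of the suggested behaviour and of the three test steps. `QueuedClient`, `submit`, `finishActive` and `canceledError` are hypothetical names, not the node-postgres API, and a real test would run against a live server. In this model, cancelling a query that is still queued splices it out of the queue and rejects its promise with the same SQLSTATE 57014 error a running query gets when it is cancelled.

```javascript
// Hypothetical model of a client-side query queue; NOT the node-postgres code.
// Each submitted query gets a handle whose promise settles when it finishes or is cancelled.
class QueuedClient {
  constructor() {
    this.queue = [] // handles waiting to run, in submission order
    this.active = null // handle currently "executing"
  }

  submit(text) {
    const handle = { text }
    handle.promise = new Promise((resolve, reject) => {
      handle.resolve = resolve
      handle.reject = reject
    })
    this.queue.push(handle)
    this._next()
    return handle
  }

  // Simulate the server finishing the active query.
  finishActive(rows) {
    const handle = this.active
    this.active = null
    handle.resolve({ rows })
    this._next()
  }

  cancel(handle) {
    const i = this.queue.indexOf(handle)
    if (i !== -1) {
      // Still queued: splice it out and reject it, as if it had been cancelled while running.
      this.queue.splice(i, 1)
      handle.reject(canceledError())
    } else if (handle === this.active) {
      // Running: a real client would send a CancelRequest; the model just rejects.
      this.active = null
      handle.reject(canceledError())
      this._next()
    }
  }

  // Start the next queued query if nothing is running.
  _next() {
    if (this.active === null && this.queue.length > 0) {
      this.active = this.queue.shift()
    }
  }
}

function canceledError() {
  const err = new Error('canceling statement due to user request')
  err.code = '57014' // SQLSTATE query_canceled
  return err
}
```

The test then follows the plan directly: the first query resolves, and both cancelled queries reject.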

@christophemacabiau
Contributor Author

You are right! Both queries are queued :-(
I have rewritten the test; is it better now?

lib/client.js Outdated
@@ -379,6 +377,8 @@ Client.prototype.query = function (config, values, callback) {
rejectOut = reject
})
query.callback = (err, res) => err ? rejectOut(err) : resolveOut(res)
} else {
result = query
Collaborator

This shouldn’t be added.

Contributor Author

I need a reference to the query to be able to cancel it later. Is this breaking something?

Collaborator

Well, it was removed intentionally and doesn’t work with promises. I’m not sure what the right fix is, though… requiring the Query object to be passed explicitly like in the tests? Or a separate method?

Contributor Author

My idea was that if you want to cancel a request, you don't use the promise API. Most users don't cancel queries; the few who need to will have to use the callback API. That seemed to me to be the best tradeoff.

In my application I have written a wrapper around the callback API to use promises. Cancelable queries return something like:

{
  data: Promise { <pending> },
  cancel: [Function: bound cancel]
}
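A sketch of such a wrapper. To avoid assuming any particular pg version, the two operations are injected: `runQuery(callback)` must start a callback-style query and return a reference to it, and `sendCancel(ref)` must request cancellation of that reference. `cancelableQuery`, `runQuery` and `sendCancel` are hypothetical names; the point is that the wrapper has to hold on to the query reference, as described above.

```javascript
// Hypothetical wrapper: turns a callback-style query into { data, cancel }.
function cancelableQuery(runQuery, sendCancel) {
  let ref
  let settled = false
  const data = new Promise((resolve, reject) => {
    // The executor runs synchronously, so `ref` is set before we return.
    ref = runQuery((err, res) => {
      settled = true
      if (err) reject(err)
      else resolve(res)
    })
  })
  return {
    data,
    // Only forward the cancel while the query is still pending on our side.
    cancel: () => {
      if (!settled) sendCancel(ref)
    },
  }
}
```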

But this is more complex for users: the cancelable query API becomes different from the "normal" query API, and maybe it's not pg's responsibility to do this kind of thing (too high level?). And the wrapper itself needs to keep references to the query...

Do you remember why returning the query was removed?

@christophemacabiau
Contributor Author

@charmander have you seen my reply to your comment?
I have added the two lines in client.js to keep a reference to the query. I can try to help if you have any questions.

@sehrope
Contributor

sehrope commented Sep 4, 2017

I don't think we need to change the return type of .query(...). Instead the Query object being passed in needs to have a .cancel() function that operates on its connection, or, a new SimpleQuery can be created that has just that (simpler / less code impact).

That would still require an extra step for a user's queries to be cancellable (i.e. use Query instead of just a string) but as you've noted, that's the exception not the common case. Most people don't cancel their queries.

On the plus side it'd be pool friendly as the .cancel() would now be bubbled up to the top Query. Otherwise the user does not have access to a .cancel() as they don't have access to the specific Client being used to execute the query.

@christophemacabiau
Contributor Author

We can't use the connection of the query to cancel it (see the postgresql doc https://www.postgresql.org/docs/9.3/static/protocol-flow.html#AEN99752). We have to open a new connection.
I'm not sure I see the problem with the return type of .query: it already returns the config object, a promise, or undefined. I only change the returned value when it is undefined (in the current code), so this shouldn't break anything, should it?
Furthermore, I didn't want to break the old cancel API, which already worked this way.

Can you explain what the problem is with the return value of .query?

@sehrope
Contributor

sehrope commented Sep 4, 2017

We can't use the connection of the query to cancel it (see the postgresql doc https://www.postgresql.org/docs/9.3/static/protocol-flow.html#AEN99752). We have to open a new connection.

I'm not saying to reuse the connection. I'm saying the Query object would have a .cancel property: a function that issues a cancel for that query. The function is meant to be a way to expose the cancel functionality outward in a situation where the user does not have access to the original client, which is needed to know where to issue the cancel (i.e. same host / port but with a cancel request rather than a login). When issuing a query from a pool, the user does not have access to the original Client object, as it's abstracted away by the pool.query(...) function (to simplify common resource management).

Can you explain what the problem is with the return value of .query?

The previous work for 7.0 (see #1299) normalized the return type of .query(...) to be either:

  • A Promise if no callback was specified (presumed to be the common case)
  • The first parameter if a Query type object was specified (i.e. rather than just text)
  • Undefined if a callback was specified

The return type also shouldn't vary depending on whether it's invoked on a pooled connection or a standalone Client.
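The three rules can be summarised as a small dispatch function. This is a sketch, not the actual `Client.prototype.query` source: treating anything with a `submit()` method as a Query object mirrors how node-postgres recognises Query-like objects, while `normalizeQueryReturn` and the injected `execute(config, callback)` are hypothetical.

```javascript
// Sketch of the 7.0 return-type rules for query().
// `execute` is a hypothetical function that runs `config` and calls back with (err, res).
function normalizeQueryReturn(config, callback, execute) {
  // 1. A Query-like object (anything with a submit() method) is returned as-is.
  if (config && typeof config.submit === 'function') {
    execute(config, callback || (() => {}))
    return config
  }
  // 2. No callback: return a Promise.
  if (typeof callback !== 'function') {
    return new Promise((resolve, reject) => {
      execute(config, (err, res) => (err ? reject(err) : resolve(res)))
    })
  }
  // 3. Callback given: results go to the callback and nothing is returned.
  execute(config, callback)
  return undefined
}
```

Because the rules depend only on the arguments, the same function gives the same return type whether it sits behind a pool or a standalone client.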

Your PR would break that up again with a special case for cancel. Rather than supporting it as a special case, I'm suggesting we keep the current behavior and expose the method through the Query object itself. It'd likely require some internal work to add the hooks but longer term I think keeping the .query(...) function consistent with the above logic is a good idea.

@sehrope
Contributor

sehrope commented Sep 6, 2017

Yes something like that. Haven't had a chance to really review it yet though (just eyeballed it a bit).

I wonder if it'd be worth having the Query keep track of whether it actually has completed prior to issuing the cancel. PostgreSQL is kind of annoying in that cancellation isn't tied to queries, it's a basic message that says, "Cancel whatever connection XYZ is doing", even if the original bad/slow/hanging action is no longer the one that's running. It's impossible to completely work around that race condition but we could make it a bit more robust by having the Query check if it's still running on the client side prior to issuing the cancel.

@christophemacabiau
Contributor Author

From the PostgreSQL server's point of view, it's not a problem:

The cancellation signal might or might not have any effect — for example, if it arrives after the backend has finished processing the query, then it will have no effect. If the cancellation is effective, it results in the current command being terminated early with an error message.

So when cancelling a request, the user must not assume they will get ERROR: 57014: canceling statement due to user request.
Is your idea to avoid creating a useless client/connection?

@sehrope
Contributor

sehrope commented Sep 6, 2017

I mean that it's a problem for an app relying on a cancel impacting a specific query. If you're using a pooled connection it can be a problem because you can end up cancelling a different command subsequently executed using the same client.

Example:

  1. Fetch connection A from pool
  2. Execute query X on connection A
  3. Issue cancel for query X... i.e. issue cancel for connection A
  4. Query X finishes and connection A is returned to pool
  5. Connection A is given to a different part of the app
  6. Query Y begins executing on connection A
  7. PostgreSQL server begins processing cancel request (remember everything is async...)
  8. PostgreSQL server cancels query Y

There's no way to avoid this entirely because of how the v3 protocol works, but you can reduce the risk by ignoring the cancel request if we know the query has already finished. Specifically, if step 3 above happens after the client is notified of the completion (i.e. step 4), then we can safely skip the cancel. That assumes cancellation is meant to be query driven and not client driven.
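A minimal sketch of that client-side guard. `TrackedQuery` is hypothetical: it records when the client has observed the query complete and only forwards the cancel while the query is still running. The injected `sendCancelRequest` stands in for opening a new connection and sending a CancelRequest. As noted above, this narrows the window but cannot close it: a cancel sent just before completion can still land on the next query.

```javascript
// Hypothetical query wrapper that remembers whether it has finished client-side.
class TrackedQuery {
  constructor(sendCancelRequest) {
    this.sendCancelRequest = sendCancelRequest // stands in for a CancelRequest on a new connection
    this.state = 'running' // 'running' | 'done'
  }

  // Call when the client sees this query complete (e.g. on ReadyForQuery).
  markDone() {
    this.state = 'done'
  }

  // Returns true if a cancel request was actually sent.
  cancel() {
    if (this.state !== 'running') {
      // Completion already observed: cancelling now could only hit a later query.
      return false
    }
    this.sendCancelRequest()
    return true
  }
}
```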

@christophemacabiau
Contributor Author

native version added

@christophemacabiau
Contributor Author

You are right, but we cannot do much to avoid it. Canceling a query is an exception, and your case is an exception to the exception :-)
We could check again whether the query is still active just before sending the cancel message to the connection in Query.prototype.cancel:

connection.on('connect', function () {
  if (runningClient.activeQuery === self) {
    connection.cancel(self.client.processID, self.client.secretKey)
  }
})

I think that's the latest point at which we can check. Or can we do better?

@christophemacabiau
Contributor Author

@sehrope @charmander any progress on this issue? We need to fix this regression...

lib/query.js Outdated
const connection = cancelingClient.connection
connection.connect(runningClient.port, runningClient.host)
const self = this
connection.on('connect', function () {

Could use arrow function and remove const self = this.
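With an arrow function, the listener inherits `this` from the enclosing method, so the `self` alias is unnecessary. A self-contained illustration using Node's `EventEmitter`; `FakeConnection`, `Canceller` and the `cancel(processID, secretKey)` call are stand-ins for the code in the diff, not the real classes.

```javascript
const { EventEmitter } = require('events')

// Stand-in for the connection used to send the cancel (hypothetical).
class FakeConnection extends EventEmitter {
  constructor() {
    super()
    this.sent = []
  }
  connect() {
    // Emit asynchronously, as a real socket connection would.
    setImmediate(() => this.emit('connect'))
  }
  cancel(processID, secretKey) {
    this.sent.push({ processID, secretKey })
  }
}

class Canceller {
  constructor(client) {
    this.client = client // { processID, secretKey }
  }
  sendCancel(connection) {
    connection.connect()
    // Arrow function: `this` is still the Canceller here, so no `self` is needed.
    connection.on('connect', () => {
      connection.cancel(this.client.processID, this.client.secretKey)
    })
  }
}
```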

reject();
})
query1.on('error', (err) => {
assert.equal(err.code, 57014);

Could define the error code above in a constant, so that it's not so much of a magic number.
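For example, assuming the error object carries the SQLSTATE as a string in `err.code` (node-postgres does this for server errors), the check could look like this. `QUERY_CANCELED` and `isQueryCanceled` are hypothetical names. Since a cancel may also have no effect at all, callers should not rely on ever seeing this error.

```javascript
// SQLSTATE 57014 (query_canceled): "canceling statement due to user request".
const QUERY_CANCELED = '57014'

// True if `err` is a server error reporting that the statement was cancelled.
function isQueryCanceled(err) {
  return Boolean(err) && err.code === QUERY_CANCELED
}
```

In the test above, the assertion would then become `assert.equal(err.code, QUERY_CANCELED)`.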

@christophemacabiau
Contributor Author

Thanks for your review @nicolasroger17

@nicolasroger17

You're welcome.

@christophemacabiau
Contributor Author

@brianc @sehrope @charmander
Sorry to insist, but do you plan to merge this PR at some point? I fixed this regression in July; with the changes requested by @sehrope, this is the third version of cancel I have written (counting my initial commit from 2011!), and it's still not fixed. I am not the only one who needs this feature.

Could you at least tell us what the problem is?

@yocontra
Contributor

yocontra commented Jan 28, 2018

Don't usually like to +1 or ping, but this ticket has been open for a while. Anything I can help out with to get this merged? We're spending a lot of $$$ on RDS bills for queries that could be killed when a request ends before the query is done (user closes the page, clicks around, etc.).

@charmander
Collaborator

Sorry to raise this after the API has already been rewritten once, but: I really think cancellation should, to prevent surprises, reflect the way it actually works – i.e. on a connection, not a query. If my understanding is correct, both the original implementation and this PR suffer from a race condition that could end up cancelling an arbitrary future query made by the client instead of the one explicitly referenced.

So, proposed alternative:

  • remove Client.prototype.cancel
  • add a method with a different name to Client.prototype that only has a client as a parameter and always sends a cancellation message to that client using the context client’s connection, producing an error if the context client is not connected

Then, to make typical use convenient and correct, Pool can have a cancel method added to it that:

  • requires the client being cancelled to have come from the pool
  • checks out a client from the pool and uses it for cancellation (potentially involving a separate pool limit to avoid cancellation being blocked on the completion of the very long-running queries that would be cancelled)
  • prevents the client being cancelled from returning to the pool until the cancellation is complete (this depends on it being possible to tell when cancellation is complete, however – if it isn’t, discard the target client, since the pool will be replenished by the client used to cancel anyway?)

making it explicit that what’s cancelled is whatever query the client is executing when the message reaches it while allowing pools to continue working. There’s some extra overhead in making a new full connection to send the message, but reduced overhead if the pool has a free client.

This should definitely be tested with SSL, too! I will try adding that to CI, and am up for implementing this afterwards if people think it’s a good idea. (@sehrope?)

@contra I’m not sure that brianc is available anyway, though, so in the meantime you can approximate this by getting the PID in advance for each client running a query you’d potentially want to cancel with SELECT pg_backend_pid(), then sending SELECT pg_cancel_backend($1) to cancel it.
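A sketch of that workaround, assuming the promise API of a `pg` `Pool` (`pool.connect()`, `client.query()`, `client.release()`, `pool.query()`). It also follows the advice later in this thread to discard a client after cancelling it: passing a truthy value to `release()` tells the pool to destroy the client instead of reusing it. `runCancelable` is a hypothetical helper name.

```javascript
// Run `text` on a dedicated pooled client and return { data, cancel }.
// Assumes `pool` behaves like a pg Pool (promise API).
async function runCancelable(pool, text, values) {
  const client = await pool.connect()
  let pid
  try {
    // Backend process ID of this connection, fetched before the real query starts.
    const { rows } = await client.query('SELECT pg_backend_pid() AS pid')
    pid = rows[0].pid
  } catch (err) {
    client.release(err)
    throw err
  }

  let finished = false
  let cancelRequested = false
  const data = client.query(text, values).finally(() => {
    finished = true
    // A client that had a cancel aimed at it is destroyed rather than reused,
    // so a late cancel cannot hit someone else's query.
    client.release(cancelRequested)
  })

  async function cancel() {
    if (finished || cancelRequested) return false
    cancelRequested = true
    // Sent on another pooled connection; cancels whatever backend `pid` is running now.
    await pool.query('SELECT pg_cancel_backend($1)', [pid])
    return true
  }

  return { data, cancel }
}
```

For the use case mentioned above (a request ending before its query does), `cancel` could be called from the request's `close` handler.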

@sehrope
Contributor

sehrope commented Jan 29, 2018

Sorry to raise this after the API has already been rewritten once, but: I really think cancellation should, to prevent surprises, reflect the way it actually works – i.e. on a connection, not a query.

+1

If my understanding is correct, both the original implementation and this PR suffer from a race condition that could end up cancelling an arbitrary future query made by the client instead of the one explicitly referenced.

Yes that's a problem with PostgreSQL query cancellation in general, not just node-postgres. To issue a cancel you specify only the process ID of the backend executing the query you'd like to cancel. You don't specify anything specific to the query itself. Whatever happens to be running at the time the cancel gets processed will be cancelled. That could be the original query (if it's still running), nothing (if it finished and the connection is idle), or some subsequently executed query.

The query cancellation is also asynchronous on the backend so the completion of the issuance of the request doesn't mean anything was (or will be) cancelled. The issuer of the cancel doesn't get any feedback whatsoever: https://www.postgresql.org/docs/current/static/protocol-flow.html#idm46428663888448

So, proposed alternative:

  • remove Client.prototype.cancel
  • add a method with a different name to Client.prototype that only has a client as a parameter and always sends a cancellation message to that client using the context client’s connection, producing an error if the context client is not connected

For the cancellation itself you have two choices. You can either use a different connection and issue a SELECT pg_cancel_backend(:processId) or use the v3 protocol cancellation message which includes the to-be-cancelled process ID and a secret token (4 random bytes) associated with that connection. The existing cancel code and all the other drivers I've worked on use the protocol level cancellation mechanism. For that you wouldn't reuse an existing connection, you'd create a new one but issue a cancel instead of a startup message.
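For reference, in protocol 3.0 the CancelRequest that such a fresh connection sends in place of a startup message is 16 bytes: an Int32 length (16), the Int32 request code 80877102, then the target backend's process ID and secret key as received in BackendKeyData. A sketch that builds it with Node's `Buffer`; `buildCancelRequest` is a hypothetical name, and opening the socket (and any SSL negotiation) is left out.

```javascript
// Build a PostgreSQL protocol 3.0 CancelRequest message.
// processID and secretKey come from the BackendKeyData of the connection to cancel.
function buildCancelRequest(processID, secretKey) {
  const buf = Buffer.alloc(16)
  buf.writeInt32BE(16, 0) // message length, including itself (no type byte)
  buf.writeInt32BE(80877102, 4) // cancel request code: (1234 << 16) | 5678
  buf.writeInt32BE(processID, 8)
  buf.writeInt32BE(secretKey, 12)
  return buf
}
```

A client would write this buffer to a new connection to the same host and port and then close it; the server sends no reply, which is why the issuer gets no feedback.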

Then, to make typical use convenient and correct, Pool can have a cancel method added to it that:

  • requires the client being cancelled to have come from the pool

Mixing cancellation and connection pools is pretty messy. It's likely doable but there's a lot of bookkeeping involved to keep things consistent. This gets particularly messy due to the client query queue (in memory, not sent to the backend) and the pool handing out raw connections (so after pool return you can shoot yourself in the foot if you try to reuse the connection).

  • checks out a client from the pool and uses it for cancellation (potentially involving a separate pool limit to avoid cancellation being blocked on the completion of the very long-running queries that would be cancelled)

Per above, most other drivers open a new connection for cancellation. Presumably cancellation isn't something that happens frequently enough to warrant a connection pool. Plus, using a pooled connection runs the risk of the connection being tainted by something else (ex: a dangling errant transaction), which would cause the pg_cancel_backend(...) call to error.

  • prevents the client being cancelled from returning to the pool until the cancellation is complete (this depends on it being possible to tell when cancellation is complete, however – if it isn’t, discard the target client, since the pool will be replenished by the client used to cancel anyway?)

Yes you have to discard it. I don't even think you can track the error responses on the original connection as it's possible that someone else cancelled the query. The only safe thing to do with a cancelled pooled connection is to close it.

making it explicit that what’s cancelled is whatever query the client is executing when the message reaches it while allowing pools to continue working. There’s some extra overhead in making a new full connection to send the message, but reduced overhead if the pool has a free client.

A better separation of the client / pool would simplify a lot of this. See #1414 (comment) for some detail on what I mean. With that approach the .cancel() function on a PoolClient could handle things a bit differently than the real method on a Client (by say discarding the connection).

This should definitely be tested with SSL, too!

+1. SSL all the things!

I will try adding that to CI, and am up for implementing this afterwards if people think it’s a good idea. (@sehrope?)

I want to take a stab at the Client / PoolClient separation I mentioned above, as I think it'd simplify implementing a saner .cancel() and prevent a lot of footgun issues with pool client reuse. Not sure when I'll get around to that, though. I'd want to get rid of the client query queue too. I'll write up my thoughts on that in a separate issue to get some feedback, as I don't want to make semver-major changes, which that certainly would be, unless we agree they make sense.

@contra I’m not sure that brianc is available anyway, though, so in the meantime you can approximate this by getting the PID in advance for each client running a query you’d potentially want to cancel with SELECT pg_backend_pid(), then sending SELECT pg_cancel_backend($1) to cancel it.

+1 to this approach if you want more fine grained control over the cancellation. I'd also suggest manually managing your connections. You can still use the pool as a source of connections but do explicit checkout / checkin so that you only return connections that you're done with and know for certain will not have async cancels invoked on them. Anything that's been cancelled should be discarded. Or just use explicit clients without pools.

@charmander
Collaborator

For the cancellation itself you have two choices. You can either use a different connection and issue a SELECT pg_cancel_backend(:processId) or use the v3 protocol cancellation message which includes the to-be-cancelled process ID and a secret token (4 random bytes) associated with that connection. The existing cancel code and all the other drivers I've worked on use the protocol level cancellation mechanism. For that you wouldn't reuse an existing connection, you'd create a new one but issue a cancel instead of a startup message.

Mixing cancellation and connection pools is pretty messy. It's likely doable but there's a lot of bookkeeping involved to keep things consistent. This gets particularly messy due to the client query queue (in memory, not sent to the backend) and the pool handing out raw connections (so after pool return you can shoot yourself in the foot if you try to reuse the connection).

The idea was to send a cancellation message on a regularly-connected client to allow that client to be taken from and returned to the pool… if it’s valid to send that message after a normal startup. I didn’t check. But this was only to allow…

Presumably cancellation isn't something that happens frequently enough to warrant a connection pool.

… limits to be enforced if the cancellation could be triggered by some outside action. Maybe the user can handle that.

Plus, using a pooled connection runs the risk of the connection being tainted by something else (ex: a dangling errant transaction)

(This is a pool bug, though.)

Yes you have to discard it. I don't even think you can track the error responses on the original connection as it's possible that someone else cancelled the query. The only safe thing to do with a cancelled pooled connection is to close it.

At least that makes the implementation straightforward! Thanks.

I'd want to get rid of the client query queue too.

Very much looking forward to this.

I’ll do that SSL CI thing in the meantime, then.

@arthurfiorette

Any updates? There's still no way to cancel a query...

@charmander
Collaborator

@arthurfiorette There’s still pool.query('SELECT pg_cancel_backend($1)', [clientToCancel.processID]).

@brianc
Owner

brianc commented May 30, 2024

@arthurfiorette honestly, what @charmander is suggesting is almost as good as a built-in cancel function. Read up on Postgres and how it does (or doesn't) cancel queries and you'll see why. I'm going to implement this eventually, but it's not a high priority given how race-condition-prone and edge-case-ridden cancelling queries in code is (obviously you want to cancel queries in code, but a built-in cancel function is SO fraught with edge cases that you might as well just send a cancel query on an available client).

7 participants