Conditional delete/update based on rows affected #2156

Closed
steve-chavez opened this issue Feb 7, 2022 · 15 comments
Labels
idea Needs of discussion to become an enhancement, not ready for implementation

Comments

@steve-chavez
Member

steve-chavez commented Feb 7, 2022

Problem

Even with safeupdate, a DELETE /employees?salary=gt.1000 can still delete a large number of rows.

We already have code for handling the special case of 1 affected row for update/delete, when using application/vnd.pgrst.object+json (CTSingularJSON):

failNotSingular :: ContentType -> Int64 -> Wai.Response -> DbHandler Wai.Response
failNotSingular contentType queryTotal response =
  if contentType == CTSingularJSON && queryTotal /= 1 then
    do
      lift SQL.condemn
      throwError $ Error.singularityError queryTotal
  else
    return response

We can extend this logic to accept an arbitrary number of affected rows and reject the request if it doesn't comply. With this a user can ensure that an expected number of rows will be updated/deleted.
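A minimal sketch of that generalization, following the shape of failNotSingular above (the function name, the expectedRows parameter and Error.expectedRowsError are hypothetical, nothing of this exists yet):

-- Sketch only: reject the response unless exactly the expected number of rows was affected.
failNotExpected :: Int64 -> Int64 -> Wai.Response -> DbHandler Wai.Response
failNotExpected expectedRows queryTotal response =
  if queryTotal /= expectedRows then
    do
      -- condemn the transaction so the mutation is rolled back, as failNotSingular does
      lift SQL.condemn
      throwError $ Error.expectedRowsError expectedRows queryTotal -- hypothetical error constructor
  else
    return response

A Prefer: rows-affected=1 would then behave like today's application/vnd.pgrst.object+json check, minus the object-shaped response.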

Proposal

Use a new Prefer header to specify the affected rows, a couple of options:

  1. Prefer: rows-affected=<number>

Specifies the expected number of affected rows in the header value. The current default behavior would be equivalent to Prefer: rows-affected=unlimited/infinity. A Prefer: rows-affected=1 would be equivalent to application/vnd.pgrst.object+json.

  2. Prefer: rows-affected=same-as-limit

This would reuse the ?limit=<number> query param value. The default behavior would also be "unlimited". To do the equivalent of application/vnd.pgrst.object+json, the request would be curl -X DELETE "host/tbl?limit=1" -H "Prefer: rows-affected=same-as-limit". We could also have a rows-affected=less-than-limit to use the limit as a threshold instead of an exact value, though maybe just a Prefer: rows-affected=limited with a behavior of "rows affected <= limit" would be enough.

With this option, to globally disallow unlimited deletes/updates, we could change the default Prefer to rows-affected=limited and combine it with max-rows-mutated=<number> (#2155). If users want to do a full-table delete, they'd need to be explicit and specify a Prefer: rows-affected=unlimited.

Notes

The Prefer RFC says:

A server that does not recognize or is unable to comply with
particular preference tokens in the Prefer header field of a request
MUST ignore those tokens and continue processing instead of signaling
an error.

Before, I understood that as "Prefer cannot reject a request", but handling=strict says:

The "handling=strict"
preference can be used to indicate that, while any particular error
may be recoverable, the client would prefer that the server reject
the request.

So I think the first paragraph above is mostly saying that we should ignore unrecognized/conflicting prefers; we can still reject requests with Prefer.

@steve-chavez steve-chavez added the idea Needs of discussion to become an enhancement, not ready for implementation label Feb 7, 2022
@wolfgangwalther
Member

Overall, this seems quite complex. Not necessarily because of the proposed solution, but because of the problem itself, I guess.

Maybe we can find a way to make this whole thing simpler?

I think we don't need too many options. The basic motivation should be: "When sending a request for mutation, I should be able to tell PostgREST what kind of mutation (number of rows) I roughly expect. If the result differs from that (heavily), then the request should be rejected because it's assumed to be faulty."

I think this is also the basic idea behind db-max-rows, only that this is set server-side.

From my perspective, we should:

  • Disallow limit and offset on mutation requests entirely. Those are not useful and likely to lead to confusion.
  • Fix db-max-rows, so that it works as expected for mutation requests.
  • Simplify the suggested Prefer header and name it something like Prefer: max-rows=xxx - similar to the server option. The upper limit for this value would be the server-side db-max-rows, but the user can prefer to choose a lower value for this specific request. This makes it easier to understand, because things behave the same way.

I'm not sure whether I like the idea of using a Prefer header here in general, will have to think about it. At first glance it kind of "conflicts" with the accept single object header - two different ways to express the same intent. Maybe we can find a more similar way to express both.

So I think the first paragraph above is mostly saying that we should ignore unrecognized/conflicting prefers; we can still reject requests with Prefer.

Agreed.

@steve-chavez
Member Author

steve-chavez commented Feb 7, 2022

Disallow limit and offset on mutation requests entirely. Those are not useful and likely to lead to confusion.

True, not that useful. I think they were implemented like that for consistency (there are tests too).

offset on mutations doesn't look useful at all, but limit could be useful in this type of scenario:

https://blog.crunchydata.com/blog/simulating-update-or-delete-with-limit-in-postgres-ctes-to-the-rescue

Basically for perf reasons: deleting a large number of rows in one go can lock a table for too long.
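For reference, the technique in that post boils down to selecting the ctids of a bounded batch of rows in a CTE and deleting only those. A minimal sketch, reusing the employees/salary example from above:

WITH batch AS (
  -- pick at most 1000 matching rows; ctid identifies each row's physical location
  SELECT ctid FROM employees
  WHERE salary > 1000
  LIMIT 1000
)
DELETE FROM employees
WHERE ctid IN (SELECT ctid FROM batch);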

At first glance it kind of "conflicts" with the accept single object header - two different ways to express the same intent.

We could deprecate vnd.pgrst.object+json in favor of the Prefer header; it's much clearer in its intent. Scratch that, I forgot that vnd.pgrst.object+json makes the result a JSON object (not a JSON array of 1).

@steve-chavez
Member Author

Maybe we can find a way to make this whole thing simpler?

The simplest thing could be to make ?limit on mutations behave differently (a breaking change): it would instead serve as a threshold for the affected rows, and the request would fail if surpassed.

@steve-chavez
Member Author

https://blog.crunchydata.com/blog/simulating-update-or-delete-with-limit-in-postgres-ctes-to-the-rescue

Or instead make limit behave like in the blog post: just limit the number of deleted/updated rows but succeed. A Prefer header could make the request fail instead.

@wolfgangwalther
Member

wolfgangwalther commented Feb 7, 2022

We could deprecate vnd.pgrst.object+json in favor of the Prefer header; it's much clearer in its intent. Scratch that, I forgot that vnd.pgrst.object+json makes the result a JSON object (not a JSON array of 1).

This reminds me of #1655. There we discussed how the accept header has two different things mixed together: the shape of the output and the "I need exactly 1 row" expectation.

Maybe we can decouple a few different things:

  • Disallow offset for mutating requests.
  • Re-implement limit for mutating requests as mentioned in the blog post. Just for batching - but no errors.
  • Use the Accept header strictly for the shape of the json response (this is basically what I asked for in Return 404 instead of 406 for json object response with no rows? #1655 - but without a replacement for the "exactly 1 row" aspect, yet)
  • Add some prefer headers that match db-max-rows closely, so that the user can lower their expected affected rows. This will throw an error if exceeded. Question: Should the counting for that depend on the Prefer count header, too? Is the threshold comparing against db-max-rows currently doing so?
  • Add something like a Prefer min-rows header additionally. If you wanted to do the same thing as before with the single accept header, you could do Prefer: min-rows=1,max-rows=1 - but with a lot more flexibility.

@wolfgangwalther
Member

wolfgangwalther commented Feb 8, 2022

Add some prefer headers that match db-max-rows closely, so that the user can lower their expected affected rows. This will throw an error if exceeded. Question: Should the counting for that depend on the Prefer count header, too? Is the threshold comparing against db-max-rows currently doing so?

Eh, I think I mis-remembered how db-max-rows works right now. db-max-rows is basically the server-side variant of ?limit right now, right?

So part of the suggestion would then be to change the way db-max-rows works for mutation requests, too. But that's discussed in #2155 I guess.

Edit: And that would still be inconsistent. To make it fully consistent even a GET request should error out when reaching db-max-rows. And that makes sense, too. After all, db-max-rows is about preventing malicious requests. Why would I want to return 1000 rows instead of an error, when I think the request is malicious?

@steve-chavez
Member Author

steve-chavez commented Feb 8, 2022

db-max-rows is basically the server-side variant of ?limit right now, right?

Yes, correct.

Disallow offset for mutating requests.
Re-implement limit for mutating requests as mentioned in the blog post. Just for batching - but no errors.

Cool, I agree.

To make it fully consistent even a GET request should error out when reaching db-max-rows. And that makes sense, too. After all, db-max-rows is about preventing malicious requests. Why would I want to return 1000 rows instead of an error, when I think the request is malicious?

Another benefit of db-max-rows is to prevent OOMs from happening (accidental requests on big tables). An error might be too harsh for reads, I think.

Add some prefer headers that match db-max-rows closely, so that the user can lower their expected affected rows. This will throw an error if exceeded.

How about reusing the specified ?limit here coupled with a Prefer: limit-handling=fail-if-surpassed (can be other name) to throw an error? That seems much simpler than duplicating the number in the header.

If you wanted to do the same thing as before with the single accept header, you could do Prefer: min-rows=1,max-rows=1 - but with a lot more flexibility.

Same could be done with ?limit=1 plus a Prefer: limit-handling=fail-if-not-equal as well.

It just doesn't seem right to add an additional header to specify a number when ?limit can already do the job.

In fact, we already have the Range header, which acts the same as limit; we could reuse that as well.

@wolfgangwalther
Member

Another benefit of db-max-rows is to prevent OOMs from happening (accidental requests on big tables). An error might be too harsh for reads, I think.

If a request is deemed as "accidental", it should not return success.

The problem with not returning an error if db-max-rows is hit: The user doesn't know about it. They might assume that they succeeded in getting all the data. Basically the same confusion as in #2155. I'm not even sure whether the RFCs allow responding with a partial response to a request without Range headers. Couldn't find anything right now in the RFCs.

How about reusing the specified ?limit here coupled with a Prefer: limit-handling=fail-if-surpassed (can be other name) to throw an error? That seems much simpler than duplicating the number in the header.

This is the part that I think makes it so complicated.

In fact, we already have the Range header, which acts the same as limit; we could reuse that as well.

And this is exactly the thing I think we should not do. The Range headers have quite clear semantics - and I don't think they make any sense for mutation requests. The RFC even says:

An origin server indicates response semantics by choosing an
appropriate status code depending on the result of processing the
POST request; almost all of the status codes defined by this
specification might be received in a response to POST (the exceptions
being 206 (Partial Content), 304 (Not Modified), and 416 (Range Not
Satisfiable)).

Clearly, POST is not meant to be used with range headers.

And I understand our limit parameter to be the querystring alternative to Range headers - but other than that, they should be pretty much identical. We can't deprecate it, because nested limits cannot be set via headers. But other than that, Range headers are the "right thing" to do instead, in an HTTP sense.

@steve-chavez
Member Author

If a request is deemed as "accidental", it should not return success.
The problem with not returning an error if db-max-rows is hit: The user doesn't know about it.

Yeah, I agree somewhat. However, until now max-rows has been useful in production instances for keeping OOMs at bay, so I wouldn't be keen on causing a breaking change there. I have another idea (basically running an EXPLAIN and rejecting results that are too wide) for an additional safeguard that would err instead of succeed; that could be evaluated as an alternative later.

The RFC even says:
almost all of the status codes defined by this specification might be received in a response to POST (the exceptions
being 206 (Partial Content) ... and 416 (Range Not Satisfiable)).

So that means a POST request can never return "206 Partial Content" and thus it shouldn't be limited with ?limit or Range... that seems reasonable. If we abide by that, then we shouldn't apply limit on POST and maybe not even on PATCH/DELETE... thus not implementing the batching limit mentioned above.

Prefer: min-rows=1,max-rows=1

So with that we're left with a new header. Hm, max-rows seems confusing because the config max-rows has a different behavior. Maybe a Prefer: max-affected/mutated?

@steve-chavez
Member Author

Maybe a Prefer: max-affected/mutated?

Maybe just rows-mutated; that would fit with the max-rows-mutated proposed in #2155 (whatever name we choose, it should match the global config).

@wolfgangwalther
Member

Yeah, I agree somewhat. However, until now max-rows has been useful in production instances for keeping OOMs at bay, so I wouldn't be keen on causing a breaking change there.

Hm. The change required from the client-side isn't much really - client requests would just need to add a ?limit=1000 to the querystring to get the same as before. This could even be done in a reverse proxy.

I think the current implementation of db-max-rows is just flawed and we should fix it, even at the cost of a breaking change.

Alternatively, we implement the same thing with a different name and deprecate db-max-rows - to be removed later. But keeping db-max-rows as flawed as it is long-term... I don't like that idea too much.

I have another idea (basically running an EXPLAIN and rejecting results that are too wide) for an additional safeguard that would err instead of succeed; that could be evaluated as an alternative later.

This reminds me of the discussion around aggregates and using query cost estimation to block queries, too. So there are for sure some things we can do to prevent malicious requests.

So that means a POST request can never return "206 Partial Content" and thus it shouldn't be limited with ?limit or Range... that seems reasonable. If we abide by that, then we shouldn't apply limit on POST and maybe not even on PATCH/DELETE... thus not implementing the batching limit mentioned above.

Agreed.

@steve-chavez
Member Author

steve-chavez commented Mar 16, 2022

Since #2164 has gotten a bit complicated, how about taking a different path?

Re-implement limit for mutating requests as mentioned in the blog post. Just for batching - but no errors.

The above behavior is useful (other RDBMSs have it, it's a common use case in pg, and it's something that users would expect), so I'll go ahead and implement it. This will also mitigate the problem of updating/deleting too much, which is really the main goal here.

The discussion above about the RFC regarding "206 Partial Content" mostly applies in the context of Range headers, but our limit is clearly a more advanced thing:

?limit is basically a range header on steroids, because it can be used for nested embeddings.

(From #2164 (comment))

@steve-chavez
Member Author

The limited update/delete technique from https://blog.crunchydata.com/blog/simulating-update-or-delete-with-limit-in-postgres-ctes-to-the-rescue has some limitations:

  1. The ctid system column is not available on views (unless explicitly included).
  2. Querying the ctid system column requires the SELECT privilege.

1 could be circumvented by inferring a PK or unique column from the view's source table; this would depend on the internal cache freshness.
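A sketch of what the generated query could look like for a view in that case, joining on the inferred PK instead of ctid (employees_view and its id column are hypothetical, and the PK column has to be exposed by the view):

WITH batch AS (
  -- limit the mutation to at most 100 matching rows, identified by the inferred PK
  SELECT id FROM employees_view
  WHERE salary > 1000
  LIMIT 100
)
UPDATE employees_view v
SET salary = 1000
FROM batch
WHERE v.id = batch.id;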

2 doesn't have a workaround unfortunately, so UPDATE + SELECT privileges will be a requirement.

@steve-chavez
Member Author

2 doesn't have a workaround unfortunately, so UPDATE + SELECT privileges will be a requirement.

Actually, it doesn't make much sense to have an UPDATE privilege without having a SELECT privilege on at least the PK - this is needed for updating particular rows; otherwise it would all be full-table UPDATEs. So this requirement seems fine.

@steve-chavez
Member Author

Continuing this on #2887; a conditional request looks like a better fit than a Prefer header.
