[sql39]: Actions schemas, queries and SQL store impl #1072
base: master
Conversation
In this commit we define the schema for the `actions` table along with various queries we will need for interacting with the table.
In this commit we let the firewalldb.SQLDB implement the ActionsDB. We also ensure that all the action unit tests now run against the SQL impl.
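For context, a rough sketch of the sort of schema being described is shown below. This is not the actual migration from the PR; it only pieces together the columns and index names quoted elsewhere in this review, and the remaining columns, types, and constraints are assumptions.

```sql
-- Illustrative sketch only: the real migration in this PR is authoritative.
-- The column comments are taken from snippets quoted in this review; the
-- primary key, session_id, and created_at definitions are guesses.
CREATE TABLE IF NOT EXISTS actions (
    id INTEGER PRIMARY KEY,

    -- The session that this action is associated with, if any.
    session_id BIGINT REFERENCES sessions(id) ON DELETE CASCADE,

    -- The account that this action is associated with, if any.
    account_id BIGINT REFERENCES accounts(id) ON DELETE CASCADE,

    -- An ID derived from the macaroon used to perform the action.
    macaroon_identifier BLOB,

    feature_name TEXT,

    -- Meta info detailing what caused this action to be executed.
    trigger TEXT,

    created_at TIMESTAMP NOT NULL
);

CREATE INDEX IF NOT EXISTS actions_session_id_idx ON actions(session_id);
```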
Generally this looks great, really awesome work 🔥🚀!!
Leaving some comments below; I think the most substantial one is in regard to the actual usage of the indexes.
feature_name TEXT,

-- Meta info detailing what caused this action to be executed.
trigger TEXT,
nit: `trigger` is a reserved keyword in SQL, so we could consider using another name for this column.
OFFSET @num_offset;

-- name: ListActions :many
SELECT a.*
Hmmm, I'm not sure that the indexes will end up getting used with these queries, due to the `OR ... IS NULL` clauses for all of the index filter params. It requires some more investigation to tell whether that's the case or not, but if it is an issue, we should consider whether it's worth creating separate queries that are used when, for example, the `sessionID` is not nil at the call site. That way, we'd ensure that the indexes actually get used.

Just for some reference, I copied the table + this query into GPT and asked if the indexes will get used. I'm posting the response below:
Because each filter is written as `(column = :param OR :param IS NULL)`, PostgreSQL's planner can only turn that into an index-based lookup when it can prove, at plan-time, that `:param` is non-NULL and that the condition simplifies to a simple `column = constant`. As soon as you wrap it in an `OR … IS NULL`, you introduce uncertainty:

1. For a single invocation with literal parameters
   - If you prepare the statement and bind, say, `session_id = 42`, then `(a.session_id = 42 OR 42 IS NULL)` is semantically equivalent to `a.session_id = 42`, so in principle the `actions_session_id_idx` can be used for that clause.
2. When the planner builds a "generic" plan
   - PostgreSQL distinguishes custom plans (where it knows your parameter values at plan-time) from generic plans (where it has to pick one plan for all future binds).
   - By default, after a few executions it switches to a generic plan, where it doesn't know whether each `:param IS NULL` will turn the filter on or off. In that case it almost always chooses a sequential scan, because it can't assume you'll eliminate most rows.
3. Multiple optional filters multiply the uncertainty
   - You have six of these `(… OR … IS NULL)` filters all ANDed together. Even if one of them would by itself use an index, the planner sees the whole WHERE clause as "maybe trivial, maybe selective" and still tends to fall back to a seq-scan.
4. Dynamic ordering makes it worse
   - The `CASE WHEN :reversed … END DESC, CASE WHEN NOT :reversed … END ASC` trick also prevents an "index-only scan in index-order" on `created_at_idx`; PostgreSQL won't match that to a simple `ORDER BY created_at ASC/DESC`.

Bottom line

- In custom plans (i.e. if the planner does re-optimize on each bind, or if you force `plan_cache_mode = force_custom_plan`), then when you supply non-NULL parameters each `(col = constant OR constant IS NULL)` really does reduce to `col = constant`, and the corresponding single-column index can be used.
- In generic plans (the default after a few executions of a prepared statement), the planner assumes the worst (that your filters might all be turned off), so it chooses a sequential scan and then a regular sort.

How to guarantee index usage

- Generate the SQL dynamically so you only include `WHERE session_id = …` clauses for the parameters you actually passed. No ORs, no uncertainty; the planner sees plain equality and happily uses the index.
- Use a stored procedure with `EXECUTE format()`. Inside PL/pgSQL you can build the WHERE list piecewise and then `EXECUTE` the resulting text; the final plan is always custom.
- Split into two code-paths:
  - One static SQL with all parameters NOT NULL (index usage guaranteed).
  - One static SQL with no filters (a trivial `SELECT * … ORDER BY …`; index usage isn't needed).
- Partial indexes or multicolumn indexes won't help here unless you know in advance which combinations of columns you'll filter on.

TL;DR

Will the indexes actually end up getting used? Not reliably. With the `(col = :param OR :param IS NULL)` pattern inside a prepared statement, Postgres will usually fall back to a sequential scan and a regular sort, unless you force it to rebuild a custom plan each time or rewrite the query to eliminate the `OR … IS NULL` from your WHERE clause.
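To make the "split into separate code-paths" idea concrete, here is a minimal sketch of a dedicated sqlc query for the case where the caller does have a session ID. It is only illustrative: the query name, the `num_limit` parameter, and the ordering are assumptions, not part of the PR.

```sql
-- name: ListActionsBySessionID :many
-- Hypothetical companion to ListActions, used only when the caller actually
-- has a session ID. The plain equality lets the planner use
-- actions_session_id_idx instead of reasoning about an OR ... IS NULL filter.
SELECT a.*
FROM actions a
WHERE a.session_id = @session_id
ORDER BY a.created_at ASC
LIMIT @num_limit OFFSET @num_offset;
```

The call site would then pick between this and the generic ListActions query depending on which filters are actually set.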
account_id BIGINT REFERENCES accounts(id) ON DELETE CASCADE,

-- An ID derived from the macaroon used to perform the action.
macaroon_identifier BLOB,
This could potentially be `BLOB NOT NULL` instead. We could also add a constraint on the length of this column, since the unmarshaling of it requires exactly 4 bytes.

Note that even for actions where we don't set any macaroon, we pass a `[4]byte` array with each byte set to `0`, which will get written to this column. If this is not intentional, we should change this so that an empty array gets persisted instead, and then also handle that case during the unmarshaling.
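A minimal sketch of what such a constraint could look like (illustrative only; whether the all-zero placeholder should still be allowed is the open question above):

```sql
-- Hypothetical tightening of the column definition: require a value and
-- enforce the 4-byte length that the unmarshaling code expects.
macaroon_identifier BLOB NOT NULL CHECK (length(macaroon_identifier) = 4),
```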
if req.ActorName != "" {
	actor.Valid = true
}
Instead of these `if` cases here and below, we can just set:

actor.Valid = req.ActorName != ""

We can actually also set this during the `var` initialization, so that it instead becomes:

actor = sql.NullString{
	String: req.ActorName,
	Valid:  req.ActorName != "",
}
if opts.actorName != "" {
	actorName.Valid = true
}
if opts.featureName != "" {
	feature.Valid = true
}
if opts.methodName != "" {
	rpcMethod.Valid = true
}
if opts.state != 0 {
	actionState.Valid = true
}
if !opts.startTime.IsZero() {
	startTime.Valid = true
}
if !opts.endTime.IsZero() {
	endTime.Valid = true
}
Same as above.
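As a rough sketch of the pattern being suggested, applied to a few of the fields above (the `opts` struct and the field types are assumed from the diff, so treat this as illustrative rather than a drop-in change):

```go
// Hypothetical consolidated initialization: set Valid together with the value
// instead of flipping it in a separate if statement afterwards.
actorName := sql.NullString{
	String: opts.actorName,
	Valid:  opts.actorName != "",
}
feature := sql.NullString{
	String: opts.featureName,
	Valid:  opts.featureName != "",
}
startTime := sql.NullTime{
	Time:  opts.startTime,
	Valid: !opts.startTime.IsZero(),
}
```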
@@ -20,3 +22,33 @@ func NewTestDBWithSessions(t *testing.T, sessionStore session.Store,

	return NewSQLDB(sessions.BaseDB, clock)
}

// NewTestDBWithSessionsAndAccounts creates a new test BoltDB Store with access
nit:
- // NewTestDBWithSessionsAndAccounts creates a new test BoltDB Store with access
+ // NewTestDBWithSessionsAndAccounts creates a new test SQLDB Store with access
This is the final PR for part 1 of #917 🎉

In this PR, we introduce the schema for the `actions` table. See the diagram in #917 for how this fits in with regard to our other tables (namely sessions & accounts).

We then implement the SQL version of the Actions store using the new queries.

Finally, our `dev` build is updated to use the SQL version of the actions table.