@mastermanu mastermanu commented Jul 21, 2020

This PR fixes the following issues:

  1. SQL stores were using the workflow open time instead of the close time for pagination, which does not match the Cassandra store and was not the intent.

  2. The SQL store pagination logic had a bug where the tie-breaker logic of using run_id for two rows with the same close_time was not implemented properly.

  3. The SQL store pagination logic was using MinStartTime instead of MaxStartTime for pagination. (This appears to be a bug, but that is still to be confirmed.)

  4. Fixes Makefile so that schema installation for MySQL / Postgres functions properly.

  5. Fixes the Postgres username / password in the Postgres development environment.

  6. Adds appropriate MySQL/Postgres indexes for querying by close_time.

Testing:

  • Tested manually with the development environment + tctl for both MySQL and Postgres.
  • Surprisingly, no unit tests failed as a result of these changes. If the integration tests also don't fail, I will open a separate issue to add proper tests for the above scenarios.

mdb.converter.ToMySQLDateTime(*filter.MaxStartTime),
*filter.RunID,
*filter.MinStartTime,
*filter.MaxStartTime,
Member Author

@mastermanu mastermanu Jul 21, 2020

@shawnhathaway - this also seemed like a bug to me. Can you confirm that this change makes sense?

Contributor

Yeah, this does look like a bug. MinStartTime doesn't appear to be affected by the paging token at all, and using it here appears to break the filter's contract by returning records earlier than the filter specifies.

pdb.converter.ToPostgresDateTime(*filter.MaxStartTime),
*filter.RunID,
*filter.MinStartTime,
*filter.MaxStartTime,
Member Author

@shawnhathaway - ditto here

CREATE INDEX by_type_start_time ON executions_visibility (namespace_id, workflow_type_name, status, start_time DESC, run_id);
CREATE INDEX by_workflow_id_start_time ON executions_visibility (namespace_id, workflow_id, status, start_time DESC, run_id);
CREATE INDEX by_status_by_close_time ON executions_visibility (namespace_id, status, start_time DESC, run_id);
CREATE INDEX by_status_by_start_time ON executions_visibility (namespace_id, status, start_time DESC, run_id);
Member Author

@mastermanu mastermanu Jul 21, 2020

@shawnhathaway - technically we can create partial indexes here (https://www.postgresql.org/docs/8.0/indexes-partial.html). MySQL doesn't support them, but Postgres does. That would let some indexes cover only workflows in the closed state and others only workflows in the open state. The downside is that there are more updates to the indexes. Let me know if you think the differentiation is worth doing here, or if it is fine as-is.

Contributor

IMHO, this should be fine as-is unless adding the partials is very low-hanging fruit effort-wise. It adds size to the index, but my understanding is that users who need high-performance visibility will use Elasticsearch. Let's sync today.

@mastermanu mastermanu merged commit 9d81f1d into temporalio:master Jul 21, 2020
@Alex-Tideman Alex-Tideman mentioned this pull request Jun 12, 2022

Development

Successfully merging this pull request may close these issues.

Change default order for listClosedWorkflowExecutions to by close time [SQL Visibility Store]

2 participants