@mastermanu mastermanu commented Sep 15, 2020

The current implementations of UpsertWorkflowExecution in the Cassandra and SQL visibility persistence stores return a service error, because those stores do not support query operations.

The problem is that we allow customers to invoke UpsertWorkflowExecution successfully even when Elasticsearch isn't enabled. This clogs our transfer task queues, because transfer task processing keeps retrying a persistence operation that will always fail.

The fix is to make every UpsertWorkflowExecution call on Cassandra/SQL a no-op, so that it no longer fails the transfer task perpetually. We still fail with explicit errors on List/Scan/Count, so users can easily tell that their application logic is broken if it depends on Elasticsearch.

The change was verified as follows:

  1. The existing unit test was modified to ensure that UpsertWorkflowExecution always behaves as a no-op.
  2. Running the bench test on Temporal Server without an Elasticsearch cluster stopped spamming the "Critical error processing task" error log once this change was made.

This is a very low risk change.

@mastermanu mastermanu merged commit a09867d into temporalio:master Sep 15, 2020