Description
Expected Behavior
If ExecuteWorkflow is called with args that serialize to larger than the max history size, a proper error should be returned from ExecuteWorkflow and the workflow history should reflect that error.
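Until the server rejects such requests itself, a client-side guard could catch oversized args before ExecuteWorkflow is called. The following is a minimal sketch, not Temporal's own validation: `checkArgsSize` and `maxBytes` are hypothetical names, the limit value is made up for illustration, and JSON encoding is assumed to approximate how the args are serialized.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// checkArgsSize is a hypothetical helper (not part of any Temporal SDK).
// It JSON-encodes each argument, sums the encoded sizes, and returns an
// error if the total exceeds maxBytes.
func checkArgsSize(maxBytes int, args ...interface{}) error {
	total := 0
	for i, arg := range args {
		data, err := json.Marshal(arg)
		if err != nil {
			return fmt.Errorf("argument %d is not serializable: %w", i, err)
		}
		total += len(data)
	}
	if total > maxBytes {
		return fmt.Errorf("workflow args serialize to %d bytes, over the %d-byte limit", total, maxBytes)
	}
	return nil
}

func main() {
	// Assumed limit for illustration only; the real value depends on
	// the server configuration.
	const maxBytes = 1024

	small := map[string]int{"points": 3}
	large := make([]int, 4096) // encodes to well over 1024 bytes

	fmt.Println("small:", checkArgsSize(maxBytes, small))
	fmt.Println("large:", checkArgsSize(maxBytes, large))
}
```

A caller would run this check on the args right before ExecuteWorkflow and return the error instead of starting the workflow.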
Actual Behavior
A workflow execution is created with no history. Subsequent attempts to retrieve the workflow via tctl or temporal-web fail with "Failed to get history on workflow" or "corrupted history event batch, eventID is not continuous". Subsequent workflow executions in the same namespace (even for different workflows) also appear to get stuck behind the now-corrupt workflow.
We haven't yet found a way to clear this history-less execution using Temporal-provided tools, so we end up having to clear it from the database manually.
Some tctl logs:
$ docker-compose run tctl --ns default wf list -op
WORKFLOW TYPE | WORKFLOW ID | RUN ID | TASK QUEUE | START TIME | EXECUTION TIME
TS.Load | ts-TS.Load-GTMTj7XPrsW6d8iVnohKGs6LZ | 2e30564c-8f4d-45e2-a5ed-d0af34c0a337 | TIMESERIES_TASK_QUEUE | 17:42:22 | 17:42:22
$ docker-compose run tctl --ns default wf show -wid ts-TS.Load-GTMTj7XPrsW6d8iVnohKGs6LZ
Error: Failed to get history on workflow id: ts-TS.Load-GTMTj7XPrsW6d8iVnohKGs6LZ, run id: .
Error Details: context deadline exceeded
('export TEMPORAL_CLI_SHOW_STACKS=1' to see stack traces)
$ docker-compose run tctl --ns default admin wf desc -wid ts-TS.Load-GTMTj7XPrsW6d8iVnohKGs6LZ
Cache mutable state:
{
"executionInfo": {
"namespaceId": "67f5aeeb-0190-43d7-9ecf-1827acc18083",
"workflowId": "ts-TS.Load-GTMTj7XPrsW6d8iVnohKGs6LZ",
"taskQueue": "TIMESERIES_TASK_QUEUE",
"workflowTypeName": "TS.Load",
"workflowExecutionTimeout": "0s",
"workflowRunTimeout": "0s",
"defaultWorkflowTaskTimeout": "10s",
"lastEventTaskId": "1048613",
"lastFirstEventId": "4",
"startTime": "2021-02-07T17:42:22.593879300Z",
"lastUpdateTime": "2021-02-08T15:34:19.889Z",
"workflowTaskScheduleId": "5",
"workflowTaskTimeout": "10s",
"workflowTaskAttempt": 7832,
"workflowTaskScheduledTime": "2021-02-08T15:34:19.889413600Z",
"workflowTaskOriginalScheduledTime": "2021-02-08T15:34:19.889412300Z",
"workflowTaskRequestId": "emptyUuid",
"stickyScheduleToStartTimeout": "0s",
"attempt": 1,
"autoResetPoints": {
},
"versionHistories": {
"histories": [
{
"branchToken": "CiQyZTMwNTY0Yy04ZjRkLTQ1ZTItYTVlZC1kMGFmMzRjMGEzMzcSJDY1ZDlhMjlmLTc3NzQtNGFiYS1hYjZjLTdiOWFjYWNiMjM0Nw==",
"items": [
{
"eventId": "4"
}
]
}
]
},
"firstExecutionRunId": "2e30564c-8f4d-45e2-a5ed-d0af34c0a337",
"executionStats": {
"historySize": "53699"
},
"workflowRunExpirationTime": "0001-01-01T00:00:00Z"
},
"executionState": {
"createRequestId": "db985d3e-5659-4c1c-bc68-184858dcb9e7",
"runId": "2e30564c-8f4d-45e2-a5ed-d0af34c0a337",
"state": "Running",
"status": "Running"
},
"nextEventId": "5"
}
Database mutable state:
{
"executionInfo": {
"namespaceId": "67f5aeeb-0190-43d7-9ecf-1827acc18083",
"workflowId": "ts-TS.Load-GTMTj7XPrsW6d8iVnohKGs6LZ",
"taskQueue": "TS_TASK_QUEUE",
"workflowTypeName": "TS.Load",
"workflowExecutionTimeout": "0s",
"workflowRunTimeout": "0s",
"defaultWorkflowTaskTimeout": "10s",
"lastEventTaskId": "1048613",
"lastFirstEventId": "4",
"startTime": "2021-02-07T17:42:22.593879300Z",
"lastUpdateTime": "2021-02-08T15:34:19.889Z",
"workflowTaskScheduleId": "5",
"workflowTaskTimeout": "10s",
"workflowTaskAttempt": 7832,
"workflowTaskScheduledTime": "2021-02-08T15:34:19.889413600Z",
"workflowTaskOriginalScheduledTime": "2021-02-08T15:34:19.889412300Z",
"workflowTaskRequestId": "emptyUuid",
"stickyScheduleToStartTimeout": "0s",
"attempt": 1,
"autoResetPoints": {
},
"versionHistories": {
"histories": [
{
"branchToken": "CiQyZTMwNTY0Yy04ZjRkLTQ1ZTItYTVlZC1kMGFmMzRjMGEzMzcSJDY1ZDlhMjlmLTc3NzQtNGFiYS1hYjZjLTdiOWFjYWNiMjM0Nw==",
"items": [
{
"eventId": "4"
}
]
}
]
},
"firstExecutionRunId": "2e30564c-8f4d-45e2-a5ed-d0af34c0a337",
"executionStats": {
"historySize": "53699"
},
"workflowRunExpirationTime": "0001-01-01T00:00:00Z"
},
"executionState": {
"createRequestId": "db985d3e-5659-4c1c-bc68-184858dcb9e7",
"runId": "2e30564c-8f4d-45e2-a5ed-d0af34c0a337",
"state": "Running",
"status": "Running"
},
"nextEventId": "5"
}
Current branch token:
{
"tree_id": "2e30564c-8f4d-45e2-a5ed-d0af34c0a337",
"branch_id": "65d9a29f-7774-4aba-ab6c-7b9acacb2347"
}
History service address: 172.22.0.3:7234
Shard Id: 1
$ docker-compose run tctl --ns default admin wf show --db_address 127.0.0.1 --db_port 9042 --tree_id 2e30564c-8f4d-45e2-a5ed-d0af34c0a337
Error: ReadHistoryBranch err
Error Details: ReadHistoryBranch. Close operation failed. Error: invalid UUID ""
('export TEMPORAL_CLI_SHOW_STACKS=1' to see stack traces)
Steps to Reproduce the Problem
- Execute a workflow with args that serialize to larger than the max history size
- Try to get any details about the "Running" execution
Specifications
- Version: 1.6.3
- Platform: