enforce WorkflowExecutionTimeout in test framework #562

Groxx · 2018-09-06T01:53:42Z

NewTestWorkflowEnvironment().SetTestTimeout(...) does not appear to work in all cases.

Given a diff like this plus go test -timeout 10s ./internal 2>&1 > out.txt:

diff --git a/internal/internal_workflow_test.go b/internal/internal_workflow_test.go
index 99824b6..21a8f2a 100644
--- a/internal/internal_workflow_test.go
+++ b/internal/internal_workflow_test.go
@@ -57,6 +57,7 @@ func (s *WorkflowUnitTest) SetupSuite() {
        RegisterWorkflow(activityOptionsWorkflow)
        RegisterWorkflow(receiveAsync_CorruptSignalOnClosedChannelWorkflowTest)
        RegisterWorkflow(receive_CorruptSignalOnClosedChannelWorkflowTest)
+       RegisterWorkflow(neverEndingWorkflow)
 
        s.activityOptions = ActivityOptions{
                ScheduleToStartTimeout: time.Minute,
@@ -766,3 +767,24 @@ func (s *WorkflowUnitTest) Test_ActivityOptionsWorkflow() {
        env.GetWorkflowResult(&result)
        s.Equal("id1 id2", result)
 }
+
+func neverEndingWorkflow(ctx Context) error {
+       for {
+               Sleep(ctx, time.Hour)
+               fmt.Println("context canceled?", ctx.Err() != nil)
+       }
+}
+
+func (s *WorkflowUnitTest) Test_TestTimeoutRespectedInBusyLoop() {
+       env := s.NewTestWorkflowEnvironment()
+       timeout := 100*time.Millisecond
+       upperLimit := 150*time.Millisecond // relatively generous upper bound
+
+       env.SetTestTimeout(timeout)
+
+       before := time.Now()
+       env.ExecuteWorkflow(neverEndingWorkflow)
+       elapsed := time.Now().Sub(before)
+
+       s.Truef(elapsed < upperLimit, "wall-clock elapsed time of %v should be near test timeout of %v", elapsed, timeout)
+}

the test suite eventually times out after 10 seconds (normal run-time is about 5 seconds), and produces 50MB of output like:

2018-09-05T18:24:48.881-0700    DEBUG   internal/internal_workflow_testsuite.go:554     Auto fire timer {"TimerID": 0, "TimerDuration": "1h0m0s", "TimeSkipped": "1h0m0s"}
context canceled? false
2018-09-05T18:24:48.881-0700    DEBUG   internal/internal_workflow_testsuite.go:554     Auto fire timer {"TimerID": 1, "TimerDuration": "1h0m0s", "TimeSkipped": "1h0m0s"}
context canceled? false
2018-09-05T18:24:48.881-0700    DEBUG   internal/internal_workflow_testsuite.go:554     Auto fire timer {"TimerID": 2, "TimerDuration": "1h0m0s", "TimeSkipped": "1h0m0s"}
context canceled? false
2018-09-05T18:24:48.881-0700    DEBUG   internal/internal_workflow_testsuite.go:554     Auto fire timer {"TimerID": 3, "TimerDuration": "1h0m0s", "TimeSkipped": "1h0m0s"}
context canceled? false

... snip about 300,000 lines ...

2018-09-05T18:24:54.002-0700    DEBUG   internal/internal_workflow_testsuite.go:554     Auto fire timer {"TimerID": 242721, "TimerDuration": "1h0m0s", "TimeSkipped": "1h0m0s"}
context canceled? false
2018-09-05T18:24:54.002-0700    DEBUG   internal/internal_workflow_testsuite.go:554     Auto fire timer {"TimerID": 242722, "TimerDuration": "1h0m0s", "TimeSkipped": "1h0m0s"}
context canceled? false
2018-09-05T18:24:54.002-0700    DEBUG   internal/internal_workflow_testsuite.go:554     Auto fire timer {"TimerID": 242723, "TimerDuration": "1h0m0s", "TimeSkipped": "1h0m0s"}
panic: test timed out after 10s

goroutine 16155 [running]:
testing.(*M).startAlarm.func1()
        /usr/local/Cellar/go/1.10.3/libexec/src/testing/testing.go:1240 +0xfc
created by time.goFunc
        /usr/local/Cellar/go/1.10.3/libexec/src/time/sleep.go:172 +0x44

goroutine 1 [chan receive]:
testing.(*T).Run(0xc42035a780, 0x183513a, 0x14, 0x1865748, 0x107a201)
        /usr/local/Cellar/go/1.10.3/libexec/src/testing/testing.go:825 +0x301
testing.runTests.func1(0xc420364000)
        /usr/local/Cellar/go/1.10.3/libexec/src/testing/testing.go:1063 +0x64
testing.tRunner(0xc420364000, 0xc420079df8)
        /usr/local/Cellar/go/1.10.3/libexec/src/testing/testing.go:777 +0xd0
testing.runTests(0xc42030cb60, 0x1cd5940, 0x5a, 0x5a, 0x1012389)
        /usr/local/Cellar/go/1.10.3/libexec/src/testing/testing.go:1061 +0x2c4
testing.(*M).Run(0xc420326200, 0x0)
        /usr/local/Cellar/go/1.10.3/libexec/src/testing/testing.go:978 +0x171
main.main()
        _testmain.go:220 +0x151

... more stacktraces ...

Which is well over the 100ms timeout. Even logs show it:

2018-09-05T18:24:48.881-0700 DEBUG
...
2018-09-05T18:24:54.002-0700 DEBUG

Note that this does work if the workflow does not "busy"-loop. E.g. this workflow panics the whole suite in 100ms:

func foreverSelectingWorkflow(ctx Context) error {
       f, _ := NewFuture(ctx)
       NewSelector(ctx).AddFuture(f, func(Future){}).Select(ctx)
       return errors.New("should not complete")
}

IMO this is a three-part issue / feature request.

SetTestTimeout should work in more scenarios
SetTestTimeout should lead to env.IsWorkflowCompleted() == false instead of panicking.
- Panics are kinda hard to debug, log to stdout instead of the associated test, and do not currently flush any logs produced by t.Log(...), which is the default behavior when you set env.Test(t) (which I'd very much recommend in general).
- There is an issue about this, and a "fix" slated for Go 1.12, but it requires running tests verbosely to receive output, which is kinda crappy. CI usually does this though, so maybe it's fine.
When starting a workflow IRL, you set the execution time limit, and the workflow will be stopped (roughly) at that time. There does not appear to be an equivalent API during tests, though it seems like it would generally be useful.
- In the busy-looping case, it'd likely stop trying after only a couple hundred cycles, which is far better than burning CPU time until the hard timeout occurs.
- In the deadlock case, it'd time out essentially immediately.

The text was updated successfully, but these errors were encountered:

yiminc-zz · 2018-09-06T18:40:21Z

The SetTestTimeout(idleTimeout) is to set the idle timeout for the test. What it means is that if there is nothing happen to the workflow (means workflow is idle) beyond the idleTimeout, the test framework will exit. For the example that you have with busy-looping case, the workflow is not idle and events (timer fired) are happening because auto forward clock.
I agree the API name is confusing, but the behavior for this is expected.

I think there 2 things we might want. 1) a control nob that set the timeout based on workflow time (based on the mock clock). This is like the WorkflowExecutionTimeout. Currently, the test framework does not require this setting and does not enforce a limit on the tested workflow.
2) a control nob that set the end to end test timeout for the test, which is more like a feature for go to offer. golang/go#10529

Groxx · 2018-09-06T20:18:03Z

The SetTestTimeout(idleTimeout) is to set the idle timeout for the test. What it means is that if there is nothing happen to the workflow (means workflow is idle) beyond the idleTimeout, the test framework will exit.

It should probably be documented differently then, currently it just says:

// SetTestTimeout sets the wall clock timeout for this workflow test run. When test timeout happen, it means workflow is
// blocked and cannot make progress. This could happen if workflow is waiting for activity result for too long.
// This is real wall clock time, not the workflow time (a.k.a workflow.Now() time).

which doesn't really imply "when test is idle for [duration]" to me, just "the wall clock timeout for this workflow test run". If it's intended to be an idle timeout though, I'll go crank our default one down to like 10ms, they should effectively never block except in integration tests.

So ignoring idle test timeout: yeah, workflow-time timeout / WorkflowExecutionTimeout covers most of our needs. Hopefully not too hard to build?

yiminc-zz · 2018-09-06T23:50:28Z

Change the idle timeout to 10ms might not be what you want. If you have activity that takes 20ms to finish and your tested workflow is idle while waiting on that activity, it could fail the test.

yiminc-zz · 2018-09-06T23:50:55Z

Will keep this issue to track the request to enforce WorkflowExecutionTimeout.

Groxx changed the title ~~test workflow environment's test timeout does not work~~ test workflow environment's test timeout does not work in some cases Sep 6, 2018

Groxx mentioned this issue Sep 6, 2018

Mocks do not retain knowledge of panicked calls #493

Closed

yiminc-zz changed the title ~~test workflow environment's test timeout does not work in some cases~~ enforce WorkflowExecutionTimeout in test framework Sep 6, 2018

yiminc-zz mentioned this issue Sep 7, 2018

Add support for WorkflowTimeout to test framework #567

Merged

yiminc-zz closed this as completed Sep 10, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

enforce WorkflowExecutionTimeout in test framework #562

enforce WorkflowExecutionTimeout in test framework #562

Groxx commented Sep 6, 2018

yiminc-zz commented Sep 6, 2018

Groxx commented Sep 6, 2018 •

edited

Loading

yiminc-zz commented Sep 6, 2018

yiminc-zz commented Sep 6, 2018

enforce WorkflowExecutionTimeout in test framework #562

enforce WorkflowExecutionTimeout in test framework #562

Comments

Groxx commented Sep 6, 2018

yiminc-zz commented Sep 6, 2018

Groxx commented Sep 6, 2018 • edited Loading

yiminc-zz commented Sep 6, 2018

yiminc-zz commented Sep 6, 2018

Groxx commented Sep 6, 2018 •

edited

Loading