Skip to content

enforce WorkflowExecutionTimeout in test framework #562

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Groxx opened this issue Sep 6, 2018 · 4 comments
Closed

enforce WorkflowExecutionTimeout in test framework #562

Groxx opened this issue Sep 6, 2018 · 4 comments

Comments

@Groxx
Copy link
Member

Groxx commented Sep 6, 2018

NewTestWorkflowEnvironment().SetTestTimeout(...) does not appear to work in all cases.

Given a diff like this plus go test -timeout 10s ./internal 2>&1 > out.txt:

diff --git a/internal/internal_workflow_test.go b/internal/internal_workflow_test.go
index 99824b6..21a8f2a 100644
--- a/internal/internal_workflow_test.go
+++ b/internal/internal_workflow_test.go
@@ -57,6 +57,7 @@ func (s *WorkflowUnitTest) SetupSuite() {
        RegisterWorkflow(activityOptionsWorkflow)
        RegisterWorkflow(receiveAsync_CorruptSignalOnClosedChannelWorkflowTest)
        RegisterWorkflow(receive_CorruptSignalOnClosedChannelWorkflowTest)
+       RegisterWorkflow(neverEndingWorkflow)
 
        s.activityOptions = ActivityOptions{
                ScheduleToStartTimeout: time.Minute,
@@ -766,3 +767,24 @@ func (s *WorkflowUnitTest) Test_ActivityOptionsWorkflow() {
        env.GetWorkflowResult(&result)
        s.Equal("id1 id2", result)
 }
+
+func neverEndingWorkflow(ctx Context) error {
+       for {
+               Sleep(ctx, time.Hour)
+               fmt.Println("context canceled?", ctx.Err() != nil)
+       }
+}
+
+func (s *WorkflowUnitTest) Test_TestTimeoutRespectedInBusyLoop() {
+       env := s.NewTestWorkflowEnvironment()
+       timeout := 100*time.Millisecond
+       upperLimit := 150*time.Millisecond // relatively generous upper bound
+
+       env.SetTestTimeout(timeout)
+
+       before := time.Now()
+       env.ExecuteWorkflow(neverEndingWorkflow)
+       elapsed := time.Now().Sub(before)
+
+       s.Truef(elapsed < upperLimit, "wall-clock elapsed time of %v should be near test timeout of %v", elapsed, timeout)
+}

the test suite eventually times out after 10 seconds (normal run-time is about 5 seconds), and produces 50MB of output like:

2018-09-05T18:24:48.881-0700    DEBUG   internal/internal_workflow_testsuite.go:554     Auto fire timer {"TimerID": 0, "TimerDuration": "1h0m0s", "TimeSkipped": "1h0m0s"}
context canceled? false
2018-09-05T18:24:48.881-0700    DEBUG   internal/internal_workflow_testsuite.go:554     Auto fire timer {"TimerID": 1, "TimerDuration": "1h0m0s", "TimeSkipped": "1h0m0s"}
context canceled? false
2018-09-05T18:24:48.881-0700    DEBUG   internal/internal_workflow_testsuite.go:554     Auto fire timer {"TimerID": 2, "TimerDuration": "1h0m0s", "TimeSkipped": "1h0m0s"}
context canceled? false
2018-09-05T18:24:48.881-0700    DEBUG   internal/internal_workflow_testsuite.go:554     Auto fire timer {"TimerID": 3, "TimerDuration": "1h0m0s", "TimeSkipped": "1h0m0s"}
context canceled? false

... snip about 300,000 lines ...

2018-09-05T18:24:54.002-0700    DEBUG   internal/internal_workflow_testsuite.go:554     Auto fire timer {"TimerID": 242721, "TimerDuration": "1h0m0s", "TimeSkipped": "1h0m0s"}
context canceled? false
2018-09-05T18:24:54.002-0700    DEBUG   internal/internal_workflow_testsuite.go:554     Auto fire timer {"TimerID": 242722, "TimerDuration": "1h0m0s", "TimeSkipped": "1h0m0s"}
context canceled? false
2018-09-05T18:24:54.002-0700    DEBUG   internal/internal_workflow_testsuite.go:554     Auto fire timer {"TimerID": 242723, "TimerDuration": "1h0m0s", "TimeSkipped": "1h0m0s"}
panic: test timed out after 10s

goroutine 16155 [running]:
testing.(*M).startAlarm.func1()
        /usr/local/Cellar/go/1.10.3/libexec/src/testing/testing.go:1240 +0xfc
created by time.goFunc
        /usr/local/Cellar/go/1.10.3/libexec/src/time/sleep.go:172 +0x44

goroutine 1 [chan receive]:
testing.(*T).Run(0xc42035a780, 0x183513a, 0x14, 0x1865748, 0x107a201)
        /usr/local/Cellar/go/1.10.3/libexec/src/testing/testing.go:825 +0x301
testing.runTests.func1(0xc420364000)
        /usr/local/Cellar/go/1.10.3/libexec/src/testing/testing.go:1063 +0x64
testing.tRunner(0xc420364000, 0xc420079df8)
        /usr/local/Cellar/go/1.10.3/libexec/src/testing/testing.go:777 +0xd0
testing.runTests(0xc42030cb60, 0x1cd5940, 0x5a, 0x5a, 0x1012389)
        /usr/local/Cellar/go/1.10.3/libexec/src/testing/testing.go:1061 +0x2c4
testing.(*M).Run(0xc420326200, 0x0)
        /usr/local/Cellar/go/1.10.3/libexec/src/testing/testing.go:978 +0x171
main.main()
        _testmain.go:220 +0x151

... more stacktraces ...

Which is well over the 100ms timeout. Even logs show it:

2018-09-05T18:24:48.881-0700 DEBUG
...
2018-09-05T18:24:54.002-0700 DEBUG

Note that this does work if the workflow does not "busy"-loop. E.g. this workflow panics the whole suite in 100ms:

func foreverSelectingWorkflow(ctx Context) error {
       f, _ := NewFuture(ctx)
       NewSelector(ctx).AddFuture(f, func(Future){}).Select(ctx)
       return errors.New("should not complete")
}

IMO this is a three-part issue / feature request.

  1. SetTestTimeout should work in more scenarios
  2. SetTestTimeout should lead to env.IsWorkflowCompleted() == false instead of panicking.
    • Panics are kinda hard to debug, log to stdout instead of the associated test, and do not currently flush any logs produced by t.Log(...), which is the default behavior when you set env.Test(t) (which I'd very much recommend in general).
    • There is an issue about this, and a "fix" slated for Go 1.12, but it requires running tests verbosely to receive output, which is kinda crappy. CI usually does this though, so maybe it's fine.
  3. When starting a workflow IRL, you set the execution time limit, and the workflow will be stopped (roughly) at that time. There does not appear to be an equivalent API during tests, though it seems like it would generally be useful.
    • In the busy-looping case, it'd likely stop trying after only a couple hundred cycles, which is far better than burning CPU time until the hard timeout occurs.
    • In the deadlock case, it'd time out essentially immediately.
@Groxx Groxx changed the title test workflow environment's test timeout does not work test workflow environment's test timeout does not work in some cases Sep 6, 2018
@yiminc-zz
Copy link
Contributor

The SetTestTimeout(idleTimeout) is to set the idle timeout for the test. What it means is that if there is nothing happen to the workflow (means workflow is idle) beyond the idleTimeout, the test framework will exit. For the example that you have with busy-looping case, the workflow is not idle and events (timer fired) are happening because auto forward clock.
I agree the API name is confusing, but the behavior for this is expected.

I think there 2 things we might want. 1) a control nob that set the timeout based on workflow time (based on the mock clock). This is like the WorkflowExecutionTimeout. Currently, the test framework does not require this setting and does not enforce a limit on the tested workflow.
2) a control nob that set the end to end test timeout for the test, which is more like a feature for go to offer. golang/go#10529

@Groxx
Copy link
Member Author

Groxx commented Sep 6, 2018

The SetTestTimeout(idleTimeout) is to set the idle timeout for the test. What it means is that if there is nothing happen to the workflow (means workflow is idle) beyond the idleTimeout, the test framework will exit.

It should probably be documented differently then, currently it just says:

// SetTestTimeout sets the wall clock timeout for this workflow test run. When test timeout happen, it means workflow is
// blocked and cannot make progress. This could happen if workflow is waiting for activity result for too long.
// This is real wall clock time, not the workflow time (a.k.a workflow.Now() time).

which doesn't really imply "when test is idle for [duration]" to me, just "the wall clock timeout for this workflow test run". If it's intended to be an idle timeout though, I'll go crank our default one down to like 10ms, they should effectively never block except in integration tests.


So ignoring idle test timeout: yeah, workflow-time timeout / WorkflowExecutionTimeout covers most of our needs. Hopefully not too hard to build?

@yiminc-zz yiminc-zz changed the title test workflow environment's test timeout does not work in some cases enforce WorkflowExecutionTimeout in test framework Sep 6, 2018
@yiminc-zz
Copy link
Contributor

Change the idle timeout to 10ms might not be what you want. If you have activity that takes 20ms to finish and your tested workflow is idle while waiting on that activity, it could fail the test.

@yiminc-zz
Copy link
Contributor

Will keep this issue to track the request to enforce WorkflowExecutionTimeout.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants