Improved time counter used to compute test durations. #6939


Merged
merged 42 commits into pytest-dev:master on Mar 29, 2020

Conversation

@smarie (Contributor) commented Mar 19, 2020

CallInfo.start is now retro-computed as CallInfo.stop - duration, where duration is counted using time.perf_counter(). Fixes #4391
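
A rough sketch of the idea (simplified and illustrative; the helper name is made up and this is not the actual pytest implementation):

import time

def timed_call(func):
    # the duration is measured with the monotonic, high-resolution counter
    counter_start = time.perf_counter()
    outcome = func()
    duration = time.perf_counter() - counter_start

    # stop is read from the system clock; start is retro-computed from it
    stop = time.time()
    start = stop - duration
    return outcome, start, stop, duration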

  • Include documentation when adding new features.
  • Include new tests or update existing tests when applicable.
  • Allow maintainers to push and squash when merging my commits. Please uncheck this if you prefer to squash the commits yourself.


Sylvain MARIE added 2 commits March 19, 2020 15:08
…` is now retro-computed as `CallInfo.stop - duration`, where `duration` is counted using `time.perf_counter()`. Fixes pytest-dev#4391
@smarie (Contributor, Author) commented Mar 19, 2020

As of now I rely on time.perf_counter() as suggested by @Zac-HD in #4391 (comment), and not on timeit.default_timer as used in pytest-benchmark.
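
(Side note, not part of the PR: on CPython 3.3 and later, timeit.default_timer is bound to time.perf_counter, so both suggestions end up using the same clock:)

import time
import timeit

# On CPython 3.3+ the timeit default timer is perf_counter itself
assert timeit.default_timer is time.perf_counter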

@smarie (Contributor, Author) commented Mar 19, 2020

Failing tests seem to be caused by more precise duration counting. I'll check this and fix.

@bluetech (Member)

The retro-computing trick is clever but I'm not completely sure about it. From what I understand:

  1. If start and stop are not defined to have a reference point, then simply using a monotonic clock (time.perf_counter) directly is more appropriate, since any hybrid solution will inevitably lead to misunderstanding and incorrect assumptions.
  2. If start and stop are defined to be real/system time, then this is a breaking change since start is no longer that.

Since I've learned that Hyrum's Law definitely holds for pytest, I don't think (1) is an option.

One possible solution is to add a new duration attribute to CallInfo, which is calculated using perf_counter, and use that in the duration reports. And then document that start and stop use system time (clock can jump), but duration is precise.
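
What that could look like, roughly (a sketch assuming CallInfo remains an attrs class; the helper name is invented and this is not the implementation that was eventually merged):

import time
import attr

@attr.s
class CallInfo:
    start = attr.ib()     # system time (time.time()); can jump if the clock is adjusted
    stop = attr.ib()      # system time (time.time())
    duration = attr.ib()  # measured with time.perf_counter(): monotonic and precise

def call_and_time(func):
    counter_start = time.perf_counter()
    start = time.time()
    func()
    duration = time.perf_counter() - counter_start
    return CallInfo(start=start, stop=time.time(), duration=duration)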

@smarie (Contributor, Author) commented Mar 19, 2020

Thanks for the feedback @bluetech!

Well it seems that this suggestion from @Zac-HD (#4391 (comment)) was made with that concern in mind: we preserve the start/stop API because users may rely on it, and we just make sure that their difference is more accurate than before.

I do not see any reason why this would be considered a breaking change to the API, do you?

Sorry, I was not familiar with the "law" you mention, and in general I am not very fond of developers naming laws after themselves :). But I admit it makes a good reference point when users need to be convinced that an API should remain stable.

@smarie (Contributor, Author) commented Mar 19, 2020

Something very strange is happening: all Windows and macOS targets of the continuous integration fail, while the Linux ones are OK.

On macOS targets the issue is not related to my modifications of the pytest code, but to my improvement of the test itself. The test acceptance_tests.py::TestDurations.test_calls fails because the three tests always appear in the terminal, while test 1 should not (its duration should be too small, so it should be hidden). If I revert the test to a less precise match, it succeeds as before, but should we be satisfied with these durations? I do not think so:

0.03s call test_calls.py::test_2
0.03s call test_calls.py::test_3
0.02s call test_calls.py::test_1

(As a reminder: test 1 does sleep(0.005), test 2 does sleep(0.01), and test 3 does sleep(0.02). This is clearly not reflected in the numbers above.)

On Windows targets things behave much more randomly, with one or several of the tests below failing:

  • acceptance_tests.py::TestDurations.test_calls
  • acceptance_tests.py::TestDurations.test_calls_showall
  • acceptance_tests.py::TestDurations.test_calls
  • acceptance_tests.py::TestDurations.test_setup_function
  • acceptance_tests.py::TestDurationWithFixture.test_setup_function

I cannot reproduce this on my machine. Could this be related to parallelization with pytest-xdist?

Another possibility is that perf_counter is less precise than the documentation states?

@bluetech (Member)

I do not see any reason why this would be considered a breaking change to the API, do you?

Suppose a test runs for 10 minutes, and in the middle of that the system time is adjusted (correctly, or not). Before this change, start would hold the (system) time as it was when the test started. After it, start will be the (system) time when the test ended minus the duration, which is not the same.

I know it seems unlikely that someone relies on this, but that's why I cited the "law" and offered an alternative.

@@ -915,7 +915,8 @@ def test_calls(self, testdir):
["*durations*", "*call*test_3*", "*call*test_2*"]
)
result.stdout.fnmatch_lines(
["(0.00 durations hidden. Use -vv to show these durations.)"]
["(8 items with duration < 0.005s hidden. Use -vv to show these "
Member:

How come it shows 8 items, I wonder? I count 4 items in total, and of those only 2 are supposed to count.

Member:

That's because it also includes setup/teardown durations; here's the run with --durations=999 -vv:

===================== slowest 99 test durations =====================
0.02s call     test_durations.py::test_3
0.01s call     test_durations.py::test_2
0.00s call     test_durations.py::test_1
0.00s setup    test_durations.py::test_3
0.00s setup    test_durations.py::test_something
0.00s teardown test_durations.py::test_2
0.00s setup    test_durations.py::test_1
0.00s teardown test_durations.py::test_something
0.00s teardown test_durations.py::test_3
0.00s call     test_durations.py::test_something
0.00s teardown test_durations.py::test_1
0.00s setup    test_durations.py::test_2
========================= 4 passed in 0.08s =========================

With --durations=10 we get:

===================== slowest 10 test durations =====================
0.02s call     test_durations.py::test_3
0.01s call     test_durations.py::test_2

(8 items with duration < 0.005s hidden.  Use -vv to show these durations.)
========================= 4 passed in 0.06s =========================

Perhaps "items" is a bit misleading? How about "entries" instead?

Member:

i propose the word reports since i believe it may match better

@smarie (Contributor, Author), Mar 20, 2020:

@RonnyPfannschmidt, I also prefer "entries" over "reports": even if the latter more accurately reflects the inner objects (TestReport), it is a bit misleading for the average user. But obviously that is not my call here; I'll let the team decide.

Shall I also modify the "slowest xx test durations" header to remove the word "test"?

For example:

===================== slowest 10 durations =====================
0.02s call     test_durations.py::test_3
0.01s call     test_durations.py::test_2

(8 entries with duration < 0.005s hidden.  Use -vv to show these durations.)
========================= 4 passed in 0.06s =========================

Member:

@smarie of course, i'm sure that with more discussion and input we will reach a conclusion that will serve the users best,

i do have to admit that my bias is structurally technical (aka i like the ux and the technicalities to match, so that there is no dissonance between what happens on the technical level and what happens on the ux level), but it may be better to change the technical aspect to match the comprehensible ux rather than the other way around

Contributor (Author):

I eventually modified it this way:

===================== slowest 10 durations =====================
0.02s call     test_durations.py::test_3
0.01s call     test_durations.py::test_2

(8 durations < 0.005s hidden.  Use -vv to show these durations.)
========================= 4 passed in 0.06s =========================

Let me know if that works for everyone

Member:

I like it 👍

@bluetech (Member)

Something very strange is happening: all Windows and macOS targets of the continuous integration fail, while the Linux ones are OK.

I don't think that test is a good idea if it relies on actual system behavior. The system can be slow or switch threads, etc., and the time of a single test can be unbounded, so using an exact count would be flaky. Maybe just monkeypatch the time functions for this test to get consistent results?
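
A minimal illustration of that idea (a hypothetical, self-contained test, not the one used in this PR): monkeypatching time.perf_counter with a fake counter makes the measured duration deterministic regardless of machine speed:

import time
import pytest

def test_duration_is_deterministic(monkeypatch):
    # every call to the fake counter advances by exactly 0.1 "seconds"
    ticks = (i * 0.1 for i in range(1000))
    monkeypatch.setattr(time, "perf_counter", lambda: next(ticks))

    start = time.perf_counter()
    stop = time.perf_counter()
    assert stop - start == pytest.approx(0.1)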


@smarie (Contributor, Author) commented Mar 20, 2020

For information, I did some tests (gist available here) with the Python timing tools, and I have to say that I am quite puzzled by the results: there is no apparent stability on Windows 10. So I will not try to reinvent the wheel and will assume that the official Python documentation is the source to trust: therefore we should use perf_counter or perf_counter_ns for the best precision.

Now,

  • I agree with @bluetech: in the long run it will probably be much clearer / more readable to have a duration attribute on CallInfo, and to use it in the TestReport. I also suggest adding a duration_ns attribute when Python >= 3.7 (see the sketch after this list). EDIT: I implemented this in the PR now, you can check it out. Note that I could not find any test in the acceptance tests checking whether "duration" is available as an attribute on the report object, so I did not know where I should test for the existence of the new duration_ns and its relation to duration (duration_ns is 1e9 times the other).

  • I am OK with monkeypatching the time module for the tests, but... what can we possibly use as a replacement so that the results are OK? An alternative is to have more permissive tests on Windows and macOS targets.
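
Regarding the first point, here is a sketch of the duration / duration_ns relation mentioned above (illustrative only; as discussed further down, duration_ns was eventually dropped from the PR):

import time

def measure(func):
    # Python >= 3.7: perf_counter_ns() returns integer nanoseconds,
    # avoiding any float rounding in the counter itself
    start_ns = time.perf_counter_ns()
    func()
    duration_ns = time.perf_counter_ns() - start_ns
    duration = duration_ns / 1e9  # seconds; duration_ns is 1e9 times duration
    return duration, duration_ns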

""" Result/Exception info a function invocation. """
""" Result/Exception info a function invocation.

:param excinfo: `ExceptionInfo` describing the captured exception if any
Member:

Suggested change
:param excinfo: `ExceptionInfo` describing the captured exception if any
:param excinfo: (Optional[ExceptionInfo]) The captured exception, if any.

Contributor (Author):

I'm not fond of using typing symbols in docstrings; they are not for humans. Especially as the real official meaning of Optional is that the type is optional (hence the attribute can be None).

Also, Sphinx might like the use of backticks, no?

Member:

The sphinx docs already display type annotations for annotated functions, for example: https://docs.pytest.org/en/latest/reference.html#pytest-exit. Normally with annotations we don't need to write types in comments, but attrs is a special case.

I would not say type annotations are not for humans -- programmers are humans too, at least for now :) I expect a lot of Python programmers will eventually learn what they mean.

I like to have the Optional, it tells me immediately it might be None, no need to read the text.

I'm not sure about the backticks; I suggested removing them because you didn't add them for the others. I'm fine with whatever works...

Contributor (Author):

Oh thanks, I forgot to include them for the others.

@bluetech (Member)

Seems like another rebase is needed to fix CI, now due to #6976 -- bad luck :)

@smarie (Contributor, Author) commented Mar 27, 2020

Regarding duration_ns, I can think of two reasons to have it, but looking closer I'm not sure it's very beneficial:

You are right, this is probably overkill. I had a look at PEP 564 and it seems to mostly state that the issue is keeping nanosecond resolution when measuring very long durations (> 104 days).

So completely out of scope for pytest. I'll remove this.

@smarie (Contributor, Author) commented Mar 27, 2020

Everything should now be OK. Let me know!

@@ -226,7 +226,7 @@ def call_runtest_hook(item, when: "Literal['setup', 'call', 'teardown']", **kwds
class CallInfo:
""" Result/Exception info a function invocation.

:param excinfo: (`ExceptionInfo`, optional) describing the captured exception if any
:param excinfo: (`ExceptionInfo`, optional) describing the captured exception if any.
:param start: (`float`) The system time when the call started, in seconds since the epoch.
:param stop: (`float`) The system time when the call ended, in seconds since the epoch.
:param duration: (`float`) The call duration, in seconds.
@blueyed (Contributor), Mar 27, 2020:

The format is :param type name:, e.g. :param float start:.
You could also add it below the attribs:

start = attr.ib()
"""The system time when the call started, in seconds since the epoch.
:type: float"""

Check with make -C doc/en html / tox -e docs though, of course.

Contributor (Author):

Thanks for the feedback. I'll simply update the class docstring then, if it does not make much difference.

Note that I can't check the doc generation, as tox -e docs does not seem to work on Windows:

docs run-test: commands[0] | sh -c 'towncrier --draft > doc/en/_changelog_towncrier_draft.rst'
ERROR: InvocationError for command could not find executable sh

And with make I seem to get an error:

(tools_py37) C:\_dev\python_ws\_Libs_OpenSource\pytest>make -C doc/en html
make: Entering directory `C:/_dev/python_ws/_Libs_OpenSource/pytest/doc/en'
Running Sphinx v2.4.4

Configuration error:
There is a programmable error in your configuration file:

Traceback (most recent call last):
  File "c:\miniconda3\envs\tools_py37\lib\site-packages\sphinx\config.py", line 348, in eval_config_file
    execfile_(filename, namespace)
  File "c:\miniconda3\envs\tools_py37\lib\site-packages\sphinx\util\pycompat.py", line 81, in execfile_
    exec(code, _globals)
  File "C:\_dev\python_ws\_Libs_OpenSource\pytest\doc\en\conf.py", line 22, in <module>
    from _pytest.compat import TYPE_CHECKING
ImportError: cannot import name 'TYPE_CHECKING' from '_pytest.compat' (c:\miniconda3\envs\tools_py37\lib\site-packages\_pytest\compat.py)

make: *** [html] Error 2
make: Leaving directory `C:/_dev/python_ws/_Libs_OpenSource/pytest/doc/en'

So for the sake of completing the PR I chose the easy way, as you seem to say that either way works OK :)

Member:

I'm also on Windows, and I use tox -e docs to check/generate documentation. 😁

Contributor (Author):

@nicoddemus which installer do you use for the missing sh required by tox -e docs? (see error message above)
Git Bash, Cygwin, or some conda package?

Member:

Ahh sorry missed that one. My bash comes from git:

λ where bash
C:\Program Files\Git\usr\bin\bash.exe

It is a bit annoying to require bash on Windows just to generate the docs...

@nicoddemus (Member) left a review comment:

Thanks a lot for the great work and patience @smarie!

@bluetech bluetech merged commit 95fadd5 into pytest-dev:master Mar 29, 2020
@bluetech (Member)

Squashed and merged. Thanks @smarie, and @nicoddemus and @blueyed for review.

Successfully merging this pull request may close these issues:

  • Get the best available time counter for node call duration calculation