Improved time counter used to compute test durations. #6939
Conversation
`CallInfo.start` is now retro-computed as `CallInfo.stop - duration`, where `duration` is counted using `time.perf_counter()`. Fixes pytest-dev#4391
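For illustration, a minimal sketch of that retro-computation; `timed_call` is a hypothetical helper, not pytest's actual code, only the pattern is taken from this PR:

import time

def timed_call(func):
    # Duration measured with the monotonic, high-resolution counter,
    # so it is immune to system-clock adjustments.
    counter_start = time.perf_counter()
    func()
    duration = time.perf_counter() - counter_start
    stop = time.time()       # wall-clock time when the call ended
    start = stop - duration  # CallInfo.start retro-computed from stop
    return start, stop, duration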
As of now I rely on […]
Failing tests seem to be caused by more precise duration counting. I'll check this and fix.
The retro-computing trick is clever but I'm not completely sure about it. From what I understand: […]
Since I've learned that Hyrum's Law definitely holds for pytest, I don't think (1) is an option. One possible solution is to add a new attribute.
Thanks for the feedback @bluetech! Well, it seems that this suggestion from @Zac-HD (#4391 (comment)) was made with that concern in mind: we preserve the existing `duration` attribute. I do not see any reason why this would be considered breaking backward compatibility of the API, do you? Sorry not to refer to the "law" you mention, I did not know this name and in general am not very fond of developers defining laws after their own name :). But I admit that it makes a good reference point if users need to be convinced that an API should be stable.
…to display the number of skipped items with small duration < 0.005s. Fixed pytest-dev#6940
Something very strange is happening: all Windows and macOS targets of the continuous integration fail, while the Linux ones are OK. On macOS targets the issue is not related to my modifications in pytest code, but to my improvement of the test itself. The test expects `0.03s call test_calls.py::test_2` (I remind that test 1 is doing a […]). On Windows targets things behave much more randomly, with one or several of the tests below failing:
I do not manage to reproduce on my platform. Could this be related to parallelization with xdist? Another thing is maybe that […]
Suppose a test runs for 10 minutes, and in the middle of that the system time is adjusted (correctly, or not). Before this change the reported duration would absorb the adjustment; now it would not, but the retro-computed start would drift instead. I know it seems unlikely that someone relies on this, but that's why I cited the "law" and offered an alternative.
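To make the concern concrete, a worked example with made-up numbers (the values are hypothetical, just to illustrate the trade-off):

began        = 1000.0  # wall clock when the call really started
true_runtime = 600.0   # the call really took 10 minutes
clock_jump   = -300.0  # system clock adjusted back 5 minutes mid-test

stop = began + true_runtime + clock_jump  # 1300.0, wall clock at the end

# Old behavior: duration from two wall-clock readings absorbs the jump.
old_duration = stop - began               # 300.0 -- wrongly halved

# New behavior: perf_counter() gives the true duration, but the
# retro-computed start no longer matches the original wall-clock start.
new_duration = true_runtime               # 600.0
new_start = stop - new_duration           # 700.0 != began (1000.0)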
testing/acceptance_test.py
Outdated
@@ -915,7 +915,8 @@ def test_calls(self, testdir):
             ["*durations*", "*call*test_3*", "*call*test_2*"]
         )
         result.stdout.fnmatch_lines(
-            ["(0.00 durations hidden. Use -vv to show these durations.)"]
+            ["(8 items with duration < 0.005s hidden. Use -vv to show these "
How come it shows 8 items, I wonder? I count 4 items in total, and of those only 2 are supposed to count.
That's because it also includes setup/teardown durations; here's the run with `--durations=999 -vv`:
===================== slowest 99 test durations =====================
0.02s call test_durations.py::test_3
0.01s call test_durations.py::test_2
0.00s call test_durations.py::test_1
0.00s setup test_durations.py::test_3
0.00s setup test_durations.py::test_something
0.00s teardown test_durations.py::test_2
0.00s setup test_durations.py::test_1
0.00s teardown test_durations.py::test_something
0.00s teardown test_durations.py::test_3
0.00s call test_durations.py::test_something
0.00s teardown test_durations.py::test_1
0.00s setup test_durations.py::test_2
========================= 4 passed in 0.08s =========================
With `--durations=10` we get:
===================== slowest 10 test durations =====================
0.02s call test_durations.py::test_3
0.01s call test_durations.py::test_2
(8 items with duration < 0.005s hidden. Use -vv to show these durations.)
========================= 4 passed in 0.06s =========================
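For the record, the arithmetic: 4 tests x 3 phases (setup, call, teardown) yield 12 duration entries; `--durations=10` keeps the 10 slowest, of which 2 are at or above 0.005s and get printed, leaving 10 - 2 = 8 hidden.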
Perhaps "items" is a bit misleading? How about "entries" instead?
I propose the word "reports", since I believe it may match better.
@RonnyPfannschmidt, I also prefer "entries" over "reports": even if the latter more accurately reflects the inner objects (`TestReport`), it is a bit misleading for the average user. But obviously that is not my call here; I'll let the team decide.
Shall I also modify the `slowest xx test durations` header to remove the word `test`?
For example:
===================== slowest 10 durations =====================
0.02s call test_durations.py::test_3
0.01s call test_durations.py::test_2
(8 entries with duration < 0.005s hidden. Use -vv to show these durations.)
========================= 4 passed in 0.06s =========================
@smarie of course, I'm sure that with more discussion and input we will reach a conclusion that will serve the users best.
I do have to admit that my bias is structurally technical (i.e. I like the UX and the technicalities to match, so that there is no dissonance between what happens at the technical level and what happens at the UX level), but it may be better to change the technical aspect to match the comprehensible UX rather than the other way around.
I eventually modified it that way:
===================== slowest 10 durations =====================
0.02s call test_durations.py::test_3
0.01s call test_durations.py::test_2
(8 durations < 0.005s hidden. Use -vv to show these durations.)
========================= 4 passed in 0.06s =========================
Let me know if that works for everyone.
I like it 👍
I don't think that test is a good idea if it relies on actual system behavior. The system can be slow or switch threads, etc., and the time of a single test can be unbounded, so using an exact count would be flaky. Maybe just monkeypatch the time function?
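A minimal sketch of that monkeypatching idea, assuming the durations code reads `time.perf_counter`; the test name and fake values here are illustrative only:

import time

def test_durations_are_deterministic(monkeypatch):
    # Each call to the patched counter advances by exactly one "second",
    # so measured durations no longer depend on real system behavior.
    ticks = iter(float(i) for i in range(1000))
    monkeypatch.setattr(time, "perf_counter", lambda: next(ticks))

    start = time.perf_counter()  # 0.0
    stop = time.perf_counter()   # 1.0
    assert stop - start == 1.0   # deterministic "duration"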
For information, I did some tests (gist available here) with the Python tools for timing, and I have to say that I am quite puzzled by the results: there is no apparent stability on Windows 10. So I will not try to re-invent the wheel and will assume that the official Python documentation is the source to trust: therefore we should use `time.perf_counter()`.
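For reference, a small snippet (not from the gist, just a quick way to reproduce the comparison) that prints what Python itself reports for each clock on the current platform:

import time

# Inspect the documented properties of the candidate clocks.
for name in ("time", "monotonic", "perf_counter"):
    info = time.get_clock_info(name)
    print(f"{name:13s} resolution={info.resolution:.3e} "
          f"monotonic={info.monotonic} adjustable={info.adjustable}")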
`time.perf_counter_ns` is now used instead of `time.perf_counter` to measure execution time, and `CallInfo` and `TestReport` objects have an extra `duration_ns` attribute to store the highest-precision duration.
…but it is just `None` on Python < 3.7
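A sketch of the version guard these commit messages imply; the attribute and function names follow the messages above, while the surrounding helper is hypothetical:

import sys
import time

def read_counter_ns():
    # time.perf_counter_ns() only exists on Python >= 3.7, so older
    # interpreters get None for the nanosecond-precision value.
    if sys.version_info >= (3, 7):
        return time.perf_counter_ns()  # integer nanoseconds, no float loss
    return None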
src/_pytest/runner.py
Outdated
""" Result/Exception info a function invocation. """ | ||
""" Result/Exception info a function invocation. | ||
|
||
:param excinfo: `ExceptionInfo` describing the captured exception if any |
-    :param excinfo: `ExceptionInfo` describing the captured exception if any
+    :param excinfo: (Optional[ExceptionInfo]) The captured exception, if any.
I'm not fond of using `typing` symbols in docstrings; they are not for humans. Especially as the real official meaning of `Optional` is that the type is optional (hence the attribute is None-able).
Also, Sphinx might like the use of backticks, no?
The Sphinx docs already display type annotations for annotated functions, for example: https://docs.pytest.org/en/latest/reference.html#pytest-exit. Normally with annotations we don't need to write types in comments, but attrs is a special case.
I would not say type annotations are not for humans -- programmers are humans too, at least for now :) I expect a lot of Python programmers will eventually learn what they mean.
I like having the `Optional`; it tells me immediately that it might be `None`, no need to read the text.
I'm not sure about the backticks; I suggested removing them because you didn't add them for the others. I'm fine with whatever works...
Oh thanks, I forgot to include them for the others.
Seems like another rebase is needed to fix CI, now due to #6976 -- bad luck :)
Co-Authored-By: Ran Benita <[email protected]>
You are right, this is probably overkill. I had a look at PEP 564 and it seems to mostly state that the issue is keeping nanosecond resolution when measuring very long durations (> 104 days), so completely out of scope for pytest. I'll remove this.
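For the record, the 104-day figure follows from float precision: a double carries 53 bits of mantissa, so nanosecond counts above 2**53 can no longer be represented exactly.

# 2**53 nanoseconds expressed in days: roughly the point where a float
# stops resolving individual nanoseconds.
print(2**53 / 1e9 / 86400)  # ~104.25 days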
Everything should now be OK. Let me know!
src/_pytest/runner.py
Outdated
@@ -226,7 +226,7 @@ def call_runtest_hook(item, when: "Literal['setup', 'call', 'teardown']", **kwds
 class CallInfo:
     """ Result/Exception info a function invocation.

-    :param excinfo: (`ExceptionInfo`, optional) describing the captured exception if any
+    :param excinfo: (`ExceptionInfo`, optional) describing the captured exception if any.
     :param start: (`float`) The system time when the call started, in seconds since the epoch.
     :param stop: (`float`) The system time when the call ended, in seconds since the epoch.
     :param duration: (`float`) The call duration, in seconds.
The format is `:param type name:`, e.g. `:param float start:`.
You could also add it below the attribs:
start = attr.ib()
"""The system time when the call started, in seconds since the epoch.
:type: float"""
Check with `make -C doc/en html` / `tox -e docs` though, of course.
Thanks for the feedback. I'll simply update the class docstring then, if it does not make much difference.
Note that I can't check the doc generation, as `tox -e docs` does not seem to work on Windows:
docs run-test: commands[0] | sh -c 'towncrier --draft > doc/en/_changelog_towncrier_draft.rst'
ERROR: InvocationError for command could not find executable sh
And with `make` I seem to get an error:
(tools_py37) C:\_dev\python_ws\_Libs_OpenSource\pytest>make -C doc/en html
make: Entering directory `C:/_dev/python_ws/_Libs_OpenSource/pytest/doc/en'
Running Sphinx v2.4.4
Configuration error:
There is a programmable error in your configuration file:
Traceback (most recent call last):
File "c:\miniconda3\envs\tools_py37\lib\site-packages\sphinx\config.py", line 348, in eval_config_file
execfile_(filename, namespace)
File "c:\miniconda3\envs\tools_py37\lib\site-packages\sphinx\util\pycompat.py", line 81, in execfile_
exec(code, _globals)
File "C:\_dev\python_ws\_Libs_OpenSource\pytest\doc\en\conf.py", line 22, in <module>
from _pytest.compat import TYPE_CHECKING
ImportError: cannot import name 'TYPE_CHECKING' from '_pytest.compat' (c:\miniconda3\envs\tools_py37\lib\site-packages\_pytest\compat.py)
make: *** [html] Error 2
make: Leaving directory `C:/_dev/python_ws/_Libs_OpenSource/pytest/doc/en'
So for the sake of completing the PR, I chose the easy way, as you seem to say that either way works OK :)
I'm also on Windows, and I use `tox -e docs` to check/generate the documentation. 😁
@nicoddemus which installer do you use for the missing `sh` required by `tox -e docs`? (See the error message above.) Git Bash, Cygwin, or some conda package?
Ahh sorry, missed that one. My `bash` comes from Git:
λ where bash
C:\Program Files\Git\usr\bin\bash.exe
It is a bit annoying to require bash on Windows just to generate the docs...
Thanks a lot for the great work and patience @smarie!
Squashed and merged. Thanks @smarie, and @nicoddemus and @blueyed for review.
`CallInfo.start` is now retro-computed as `CallInfo.stop - duration`, where `duration` is counted using `time.perf_counter()`. Fixes #4391

Unless your change is trivial or a small documentation fix (e.g., a typo or reword of a small section) please:

Create a new changelog file in the `changelog` folder, with a name like `<ISSUE NUMBER>.<TYPE>.rst`. See changelog/README.rst for details.
Write sentences in the past or present tense, examples: […]
Also make sure to end the sentence with a `.`.
Add yourself to `AUTHORS` in alphabetical order.