Preserve chronological order of stdout and stderr with capsys
#6690
Thanks for working on this. |
btw: see also #6671 (feedback would be appreciated). This should also be done for |
Right, I wanted to put it in In theory, if I'll put tuples into |
If you would be using the same tmpfile you could more easily achieve your goal (ordered/single output).
However, I think that the goal here is to have an additional "joined/merged streams" property, and therefore it makes sense to have it via a queue (with tuples) internally.
This should also support --capture=sys-merged then later (I've used merged here for what you call ord).
I've changed the capture class for |
(Took a quick look only)
I think this can be simplified / made clearer in general.
@aklajnert |
@blueyed, I think we want to just add the |
The idea is to only use a single |
FWIW I still think that the feature itself is good (since using a single tmpfile would not allow for distinguishing between which stream was used). I've also just come across 6f385fb, where an unused "mixed" mode (using a single tmpfile) was removed.
b46ed10
to
cbeda28
Compare
@blueyed I think I have the code in quite good shape now (except https://github.com/pytest-dev/pytest/pull/6690/files#r382912377, where I'm not sure how to proceed). I need to update the PR description. Do you think it would be better to create a fresh PR with a new description, or to continue here?
src/_pytest/capture.py
Outdated
    """Read and return the captured output so far, resetting the internal buffer.

    :return: captured content as a namedtuple with ``out`` and ``err`` string attributes
    """
    if combined:
        return CaptureResult(self._get_combined(), None)
Just a quick thought: couldn't CaptureResult gain an optional combined attribute/property?
It would still provide out and err then, but would also have combined.
combined could even default to out + err then (when not combining).
(I think it's bad to actually have the combined output as out here)
Yeah, I was thinking about it initially, but since the CaptureResult is a namedtuple I see 2 potential solutions:
- It should be changed to a regular class with a new attribute or a method, that behaves as a tuple to preserve current behavior.
- Instead of returning CaptureResult for combined=True, just return a single string with the combined output.
I think both approaches are bad - the first one has no added value and may cause confusion, and the second one will make readouterr() inconsistent and also potentially confusing.
Maybe adding a new read() or readcombined() method with a different return type would make sense here?
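For readers following this exchange, here is a minimal sketch of the "optional combined attribute" idea that still unpacks as a two-item tuple. It is an illustration only, not code from this PR: `CaptureResultSketch` and its `combined` argument are hypothetical names, and pytest's real `CaptureResult` may differ.

```python
import collections

# Hypothetical two-field base, mirroring the (out, err) shape discussed above.
_OutErr = collections.namedtuple("_OutErr", ["out", "err"])


class CaptureResultSketch(_OutErr):
    """Tuple of (out, err) that also carries an optional combined text."""

    def __new__(cls, out, err, combined=None):
        self = super().__new__(cls, out, err)
        # Stored outside the tuple so that `out, err = result` keeps working.
        self._combined = combined
        return self

    @property
    def combined(self):
        # Fall back to out + err when no ordered text was recorded.
        if self._combined is not None:
            return self._combined
        return self.out + self.err
```

Whether this counts as "no added value" (the objection raised above) is a design judgment; the sketch only shows that tuple compatibility and an extra attribute are not mutually exclusive.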
450d18c
to
8ee8a41
Compare
57510e9
to
600c055
Compare
8ad560a
to
2ba6c66
Compare
Hi @aklajnert, I'd like to review this PR, though it might take me a bit to get to it. From a cursory look, I think the functionality would be good to have, but I also suspect there is a better way to achieve the goal; it will take me some time to think about how I would do that. In the meantime, some comments:
Thanks for the comments @bluetech. I'll do another approach for removing the deque from global variables. |
bc1e5bd
to
e429277
Compare
@bluetech I've created a new method to read combined output which returns the single output. Also, the I didn't find how to make the |
e429277
to
0199962
Compare
@bluetech @nicoddemus - should I discard this PR, or is there still a chance for it to get merged? |
Apologies for the delay. We had some internal issues that have since been resolved, and we are in the process of picking up old PRs and clearing them up, so it might still take a while to get back to your PR. Thanks for the contribution and for the patience! Feel free to ping back in a week or so if still nobody posts further comments here. 👍
@aklajnert thanks again for submitting this PR, we greatly appreciate it. I decided to take the time to review now, given the work you have put into this already and the usefulness of the feature. 👍
As @bluetech commented, it would probably be better if we could get away with the global variables and tight coupling between OrderedCapture and OrderedWriter, but looking at the overall design, I see why it was coded this way: currently MultiCapture always demands 3 captures (stdout, stderr, stdin), so we need a way to communicate/order the writes.
I think the code as you wrote it is as good as we can get under the current design. I also think we (the maintainers) should take some time later to refactor things to make them nicer; I'm sure this is possible given that this code has evolved organically a lot since the first time it was written.
Having said that, we just need to make sure the API doesn't prevent us from refactoring this to something we believe is better from the maintenance point of view. This is hard to decide now, given that we don't even have a specific refactoring in mind. Just as an example of a future API that I can imagine coming up later with a redesign:
    def test(capsys):
        capsys.combine()
        out, err = capsys.readouterr()
        # out is combined stream, err is None
From a user POV this looks reasonable to me because that's how [...]
So I propose we mark the [...]
So in summary, what I propose: [...]
This of course comes from my belief that this can't be easily accomplished on the current design, but if someone else has an idea on how to implement this cleanly now then by all means let us know. 😁 Thoughts @bluetech @asottile @RonnyPfannschmidt? |
Experimental and go 👍 |
in general implying ordering when dealing with standard streams leads to bugs so I'm slightly hesitant for this (as they are not usually synchronized at the OS level). the only time that ordering is well defined is when one of the streams is set to the other (2>&1 for instance)
that said, if this api helps someone with something it's probably fine to include (especially if we're marking it experimental with the idea that maybe it doesn't succeed) -- though as soon as someone depends on it I imagine we'll have a hard time removing it even if we mark it experimental 🤔
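To make the `2>&1` remark concrete, here is a small sketch using only the standard library: when a child's stderr is redirected into its stdout, both streams share one pipe, so the parent reads the writes in the order they were issued. With two separate pipes, the parent gets each stream's text but no information about how they interleaved. The child program and the function names are made up for illustration, and `-u` is used so the child's own buffering does not reorder anything.

```python
import subprocess
import sys

# Tiny child program that alternates between stdout and stderr.
CHILD = (
    "import sys\n"
    "print('out 1')\n"
    "print('err 1', file=sys.stderr)\n"
    "print('out 2')\n"
)


def run_merged() -> str:
    """Run the child with stderr sent to the same pipe as stdout (like 2>&1)."""
    done = subprocess.run(
        [sys.executable, "-u", "-c", CHILD],  # -u: unbuffered child streams
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=True,
    )
    return done.stdout


def run_separate():
    """Run the child with two pipes; relative order across streams is lost."""
    done = subprocess.run(
        [sys.executable, "-u", "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        check=True,
    )
    return done.stdout, done.stderr
```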
    def __init__(self, fd: int) -> None:
        super().__init__()
        self._fd = fd  # type: int
I think mypy can infer this value
class OrderedCapture(SysCapture):
    """Capture class that keeps streams in order."""

    streams = collections.deque()  # type: collections.deque
seems a bit odd for this to be a class member and not an instance member (as written this makes it effectively a global variable). I'd rather see it as an instance variable
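The concern about a class-level deque can be seen in isolation with the sketch below. The class names are hypothetical and unrelated to pytest's capture classes; the point is only that a mutable class attribute is one object shared by every instance, while an attribute assigned in `__init__` is per instance.

```python
import collections


class SharedStreams:
    # Class attribute: a single deque shared by all instances.
    streams = collections.deque()


class OwnStreams:
    def __init__(self):
        # Instance attribute: every object gets its own deque.
        self.streams = collections.deque()


a, b = SharedStreams(), SharedStreams()
a.streams.append(("out", "hello"))
shared_view = list(b.streams)  # b observes the write made through a

c, d = OwnStreams(), OwnStreams()
c.streams.append(("out", "hello"))
isolated_view = list(d.streams)  # d is unaffected by c
```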
Definitely, but see my comment in #6690 (comment):
As @bluetech commented, it would probably be better if we could get away with the global variables and tight coupling between OrderedCapture and OrderedWriter, but looking at the overall design, I see why it was coded this way: currently MultiCapture always demands 3 captures (stdout, stderr, stdin), so we need a way to communicate/order the writes.
IOW I don't know how to easily fix this, as @aklajnert also commented.
I think all that's needed is to grow another api on the capture classes which represents combined? if that's possible I think that would also fix the problem below of the backward-class-coupling
Hmm but currently who deals with each capture class is MultiCapture, which always creates 3 capture instances (Capture(0), Capture(1), and Capture(2)).
Or do you mean MultiCapture to grow this new API, and forward to the underlying Capture classes?
oh I see :/ maybe nevermind then
Yeah it is a bit frustrating, that's why I think we have to rethink the whole design (in a future PR that is).
    the correct order of messages.
    """
    if self.captureclass is not OrderedCapture:
        raise AttributeError("Only capsys is able to combine streams.")
it is usually a design smell for a superclass to know about a particular subclass, perhaps this method should defer to the captured data's class and the particular subclasses can decide whether they implement this api or not
(I think it would also be possible to implement this for capfd, but as I've implied above it's difficult / impossible to preserve exact ordering there in software -- maybe there's an api for fd duping that does the right thing but I haven't come across it yet) |
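A rough sketch of the alternative suggested in this comment: the fixture-level object forwards the call, and each capture class decides for itself whether it supports combining. All names here are hypothetical stand-ins, not pytest's classes, and the PR did not implement it this way.

```python
class BaseCaptureSketch:
    """Default behaviour: combining is not supported."""

    def read_combined(self) -> str:
        raise NotImplementedError(
            f"{type(self).__name__} cannot combine streams"
        )


class OrderedCaptureSketch(BaseCaptureSketch):
    """Subclass that opts in by recording writes in call order."""

    def __init__(self) -> None:
        self._chunks = []

    def record(self, stream: str, text: str) -> None:
        self._chunks.append((stream, text))

    def read_combined(self) -> str:
        text = "".join(chunk for _, chunk in self._chunks)
        self._chunks.clear()  # reading resets the buffer
        return text


class FixtureSketch:
    """Forwards to the capture object without checking its concrete type."""

    def __init__(self, capture: BaseCaptureSketch) -> None:
        self._capture = capture

    def read_combined(self) -> str:
        return self._capture.read_combined()
```

With this shape, the `captureclass is not OrderedCapture` check in the fixture is no longer needed, because an unsupported capture class raises on its own.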
    print("I'm in stderr", file=sys.stderr)
    print("Hey, stdout again!")

    output = capsys.read_combined()
Hmm this just occurred to me, to follow with the naming of the other methods (readouterr), we should stick to readcombined.
(If we all agree on the approach I suggested, I can finish up the PR by making the appropriate experimental/docs changes required). |
The status on my end is: the typing work and this PR prompted me to look at My main point is that I'm not sure that capture is currently in a state where it should be getting new features, instead it should be cleaned up first, to have a solid foundation for new work. Hopefully I'll get to it in a ~month's time. But I understand that it is not fair to block work on some refactor for an undetermined amount of time, so I'm perfectly fine if another maintainer feels the code is good and the feature is worth having. I'll just rebase my work on top. |
Thanks, that's fair and very reasonable. However in this specific case, we are taking some measures to at least let us change/remove the
I'm not an expert on that code, but it seems it should definitely be possible to redesign it to support the feature. And if turns out to be wrong, we can remove the feature if we mark it as experimental; it might upset some people at that point, but then we can point out that the feature would not even be included in the first place otherwise. 😁 |
I do have an additional concern that pytest would be giving the false illusion that stdout and stderr can be deterministically ordered if this feature were integrated -- but I'm fine being overridden on that :)
As it stands this works only for |
my point is, even though we can guarantee the calls are in order for |
Ahh sorry yes, I did misunderstand what you said before, thanks for the clarification. I wonder what sort of |
I've never been able to successfully get a process consuming another process's output on >1 streams to have a consistent ordering without adding significant but I guess as long as this PR is "know that write() was called in this order" and not really guaranteeing what the actual behaviour will end up being then maybe it's fine?
Agree, and I think so IMHO. 😁 |
Just to make it clear though, can we all vote (in this comment) if we should go with the proposal above or not? 👍: We mark the function 👎: We leave this unmerged until the time comes to refactor the module and implement the overall feature in a more elegant manner. |
Thank you, folks, for the feedback. I've spent some time figuring out how to implement this and the final solution came out a bit hacky, so I wouldn't mind postponing merge after the module will be refactored (happy to help with the refactor if you want). I wasn't aware that the streams are not synchronized by the OS. I thought that if there is no threading involved, both streams will have a deterministic order. Thanks @asottile for pointing that out. I'll play with it a bit when I'll find some time. I guess that everything should be in the correct order when interpreter is executed with |
Hi everyone, @bluetech you voted to postpone this until the refactor; do you still maintain this position? If so, I believe we should close this for now and revisit it once the refactoring is done. 👍 What do you think?
I think we shouldn't merge the approach taken in this PR, so I'll close this. Just to be clear, the feature itself seems useful, so it's not a rejection of the idea itself. Thanks @aklajnert for working on this! |
This is a solution that will allow preserving chronological order of stdout and stderr while capturing (#5449).
The solution changes capsys to use deque for keeping the order of streams. It is possible to retrieve both streams traditionally or as a combined version.
I've added two arguments to readouterr() but only for capsys. I wonder if that's a good approach, or should I add a new method for retrieving combined output? An additional argument is to not flush streams while retrieving. This will allow retrieving the streams traditionally and combined at the same time.
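As a rough, standalone illustration of the deque idea described here (not the code from this PR), two writer objects can append `(stream name, text)` tuples to one shared deque, which records the order of `write()` calls and still allows splitting by stream afterwards. `TaggedWriter` and `capture_ordered` are hypothetical names, and the sketch only covers Python-level writes through `sys.stdout` and `sys.stderr`, not output written directly to file descriptors.

```python
import collections
import contextlib
import io
import sys


class TaggedWriter(io.TextIOBase):
    """Text stream that logs each write with the name of its stream."""

    def __init__(self, name, log):
        self._name = name
        self._log = log

    def writable(self):
        return True

    def write(self, text):
        self._log.append((self._name, text))
        return len(text)


def capture_ordered(func):
    """Call func and return (out, err, combined) for its printed output."""
    log = collections.deque()
    with contextlib.redirect_stdout(TaggedWriter("out", log)), \
            contextlib.redirect_stderr(TaggedWriter("err", log)):
        func()
    out = "".join(text for name, text in log if name == "out")
    err = "".join(text for name, text in log if name == "err")
    combined = "".join(text for _, text in log)  # order of write() calls
    return out, err, combined


def demo():
    print("stdout first")
    print("I'm in stderr", file=sys.stderr)
    print("Hey, stdout again!")
```

Calling `capture_ordered(demo)` returns the two streams separately plus the interleaved text, which matches the "retrieve both streams traditionally or as a combined version" goal stated above.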