-
Notifications
You must be signed in to change notification settings - Fork 157
Fix t5516 flakiness in Visual Studio builds #428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
FYI @SyntevoAlex |
So apparently it is not only t0001.21 init allows insanely long --template, but also t3510.1[23] that fail due to this change (in the latter case, it would appear that a multi-line message is broken apart, something I thought not possible with Worse, t5516.81 now possibly fails with a "Broken pipe" on macOS. Will investigate more on Monday, unless you beat me to it, @SyntevoAlex. |
Please let me finish my patches instead of jumping in and reauthoring them. |
Sure. FWIW this here PR is mostly an experiment to see how quickly I could get a fix or t5516 without the additional complexity of allowing unlimited message lengths, in the hope to get this in before v2.24.0 (which is now unlikely). It just looked to me as if you were ready to wait for others to confirm my suggestion that the unlimited message length is irrelevant, or at least substantially less relevant. What I would have loved for you to do was to disentangle the concerns that your single patch tangled up, and to do a much better job at separating what information to put into code comments and what information to put into the commit message. But as I said, it did not appear to me as if you were eager to do these things, but rather wanted to wait, and I was not ready to join you in the waiting room. |
No, I'm actually aimed at getting things done. Just wanted to make sure before discarding a few days worth of work and going into another effort. Trimming is harder then it seems. You've already run into segfault, because vsnprintf() return value is desired buffer size and not what it has written. Ok, that's easy to fix, but appending '\n' is harder: need to avoid issues with partial multibyte characters. |
I've seen a broken pipe and an unexpected re-ordering of output lines, where did you see the segmentation fault?
Valid point. |
As for commit message, I actually think that having info about t5516 in code comment is better. This is what people are going to read, and commit message is often forgotten for archeologists. |
If |
On Monday, I will rework my patch. Please let me submit it in my name. If you figure what's wrong with broken pipe and/or reordering before that, I would be happy to co-author you. Otherwise, I will deal with all issues on Monday. FYI, https://public-inbox.org/git/[email protected]/
|
I suspect that broken pipe arises from parent process exiting before child process. If so, that is a separate issue and will not block my patch. |
Well, that's almost certainly true. But that also puts a dent into the hope of fixing the flakiness of t5516 any time soon. |
a6bb6d1
to
455026c
Compare
I already pointed out that a code comment this far removed from the actual problem is prone to become stale. Whereas the commit message is the exact spot to describe a breakage that is being fixed. Besides, it is poor practice to talk about an already-fixed problem in a code comment.
Not in the Git project: there, contributors (such as yourself, as witnessed in your very own commit message) do investigate the commit history of the code in question.
Sure.
If you take (most of) my alternative patch, I would like to be mentioned as co-author, too. FWIW I am not sure that it is worth doing this fix if it does not address the broken pipe issue on macOS: the test would be just as flaky, just in a different job of the Pipeline. That does not buy me much. And my patch still results in two failures in t3510, although I think I figured it out (I force-pushed, and the Azure Pipeline will tell me whether I managed to fix things). That is a very illuminating thread. |
The comment, and whole patch, isn't about fixing t5516. It's about fixing a real problem where user will observe interleaved outputs. Code comment isn't talking about fixing t5516, it's talking about various problems that could arise in fprintf(), and explains the need to preformat into memory buffer. This is what makes me believe that it's well placed.
That's good to hear.
Let's first see what investigation shows.
You've ran into partial multibyte character issue that I talked about before. |
Since the tests pass here, locally, without any multi-byte character stunt, I believe that such a stunt is not necessary. AFAICT most consoles (probably even all of them) have code in place to handle invalid/incomplete multi-byte characters gracefully. |
Seems that the tests now all pass with my patch. |
But how to explain broken pipe? |
It is a flaky test, that's why it only breaks the build sometimes. |
Yes, that's what I meant, you expressed concern about broken pipe, that it's an obstacle in getting this patch merged, and it seems to still be there. |
Any reason why |
Yes.
No. It is not an obstacle to getting it merged, but it is an obstacle to help with the flakiness of t5516 in general: that broken pipe problem is still there.
Since |
I see, I misunderstood it.
Didn't want to mix up |
@SyntevoAlex since this PR essentially does what I want it to do, I plan on submitting it in about 2h. Did you still plan on working on #407? |
Yes, I am working on #407 and I kindly ask you to let me do that in peace and without rush. |
Well, I kind of have something here that works, you know? And that is clean enough to get maybe even code-reviewed before v2.24.0 is due. I have no idea what you want to do on top of what I already did, so I am a bit at a loss why I should be waiting, i.e. what I should be waiting for. |
First, it's pretty impolite to ignore issue for months and then mess with someone else's work when he volunteers. |
I am sorry that it took so long for this patch to get reviewed. We're all busy.
I apologize if you have the impression that I mess with your work. From my point of view, I came up with a much smaller, much more elegant solution, and since you had already provided a good description of the "prior art" in your commit message, I copy-edited that, and made you a co-author. Further, I only started to work for real on this when I sensed reluctance on your part to accept that my suggestion had merit. You asked for a second opinion about the truncation, after all, when it was pretty obvious from the commit history you yourself found that there is a strong preference for truncating the error message rather than risking OOMs, and you never addressed my concerns about the unnecessary complexity of your code and about the inappropriate location of the comments about t5516, instead trying to double-down on commenting on a particular test case far away from that test case, in the code that implements a generic error reporting mechanism. If I had had the impression that we could reach consensus where I was purely reviewing your code and providing suggestions how to improve it, I would have gladly refrained from implementing a fix myself. It does not appear to me that I was wrong in my assessment.
It still has two bugs? Please point them out to me.
I pushed a tentative version and let the PR build figure out what still needs to be dealt with, which is how I work. I am sorry if you don't like that style.
On my side, I would place a higher priority in reducing complexity. Complexity is never good, and I see way too much complexity for no benefit in #407. As to "rushing"... Sending a patch to the mailing list for discussion is not at all rushing anything. It is opening a discussion among interested and highly competent Git developers, and I fully intend on benefittin from that expertise to the fullest. Finally, it is not exactly "rushing" if I hold back a perfectly fine patch for over a day just because you said you wanted to work on your version (without any indication what you find wrong in my version, which I tolerated, but find unsatisfying, to say the least). I am beginning to suspect that the time I waited was not well spent. |
I actually saw that patch was implemented with a fixed buffer as most simple and then it repeatedly failed to work well, until there was an attempt to remove the limit.
This is dependent on decision about limit, which wasn't clear to me.
You jumped to conclusions without giving me a single working day to react.
This doesn't feel right to me when I'm trying to finish my work and you jump in and offer your patch, also not perfect, and I'm expected to switch to polishing your patch instead of polishing mine.
It's best to submit perfect patches firsthand. |
Can we please reset the discussion? Let's get it back into a relaxed flow. I'm working on my contribution to ensure perfect quality, give me some time, and then you can see it. |
The MSVC runtime behavior differs from glibc's with respect to `fprintf(stderr, ...)` in that the former writes out the message character by character. In t5516, this leads to a funny problem where a `git fetch` process as well as the `git upload-pack` process spawned by it _both_ call `die()` at the same time. The output can look like this: fatal: git uploadfata-lp: raemcokte :error: upload-pnot our arcef k6: n4ot our ea4cr1e3f 36d45ea94fca1398e86a771eda009872d63adb28598f6a9 8e86a771eda009872d6ab2886 Let's avoid this predicament altogether by rendering the entire message, including the prefix and the trailing newline, into the buffer we already have (and which is still fixed size) and then write it out via `write_in_full()`. We still clip the message to at most 4095 characters. The history of `vreportf()` with regard to this issue includes the following commits: d048a96 (2007-11-09) - 'char msg[256]' is introduced to avoid interleaving 389d176 (2009-03-25) - Buffer size increased to 1024 to avoid truncation 625a860 (2009-11-22) - Buffer size increased to 4096 to avoid truncation f4c3edc (2015-08-11) - Buffer removed to avoid truncation b5a9e43 (2017-01-11) - Reverts f4c3edc to be able to replace control chars before sending to stderr 9ac13ec (2006-10-11) - Another attempt to solve interleaving. This is seemingly related to d048a96. 137a0d0 (2007-11-19) - Addresses out-of-order for display() 34df8ab (2009-03-10) - Switches xwrite() to fprintf() in recv_sideband() to support UTF-8 emulation eac14f8 (2012-01-14) - Removes the need for fprintf() for UTF-8 emulation, so it's safe to use xwrite() again 5e5be9e (2016-06-28) - recv_sideband() uses xwrite() again Note that we print nothing if the `vsnprintf()` call failed to render the error message; There is little we can do in that case, and it should not happen anyway. The process may have written to `stderr` and there may be something left in the buffer kept in the stdio layer. Call `fflush(stderr)` before writing the message we prepare in this function. Helped-by: Jeff King <[email protected]> Helped-by: Alexandr Miloslavskiy <[email protected]> Helped-by: SZEDER Gábor <[email protected]> Helped-by: Junio C Hamano <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
fce0894
to
f6d6c81
Compare
/submit |
Submitted as [email protected] |
This patch series was integrated into pu via git@80f5cd0. |
This patch series was integrated into pu via git@41faec3. |
This patch series was integrated into pu via git@69eb89d. |
This patch series was integrated into pu via git@33193cc. |
This patch series was integrated into pu via git@4c8b5bf. |
This patch series was integrated into pu via git@a11a2c9. |
This patch series was integrated into pu via git@7090f4e. |
This patch series was integrated into pu via git@6ae1307. |
This patch series was integrated into pu via git@00c0217. |
This patch series was integrated into next via git@cae61b9. |
This patch series was integrated into pu via git@137e3f0. |
This patch series was integrated into pu via git@f3df0fc. |
This patch series was integrated into pu via git@ee27365. |
This patch series was integrated into pu via git@c0c6e16. |
This patch series was integrated into pu via git@9139157. |
This patch series was integrated into pu via git@5444d52. |
This patch series was integrated into next via git@5444d52. |
This patch series was integrated into master via git@5444d52. |
Closed via 5444d52. |
Among the flaky tests, it seems that the Azure Pipeline suffers relatively frequently from t5516 failing with the Visual Studio builds. Essentially, we grep for an error message, but that error message is produced twice, once by a
fetch
and once by theupload-pack
spawned from it, and those error messages are usually interleaved because of MSVC runtimefprintf()
idiosyncracies.As a consequence, I have to re-run about half a dozen failed builds a day, which I would like to avoid. My plan is therefore to merge this patch into Git for Windows v2.24.0-rc2.
The commit message of this patch is based, in part, on #407.
This fixes #240.
Changes since v3:
fflush(stderr);
for accuracy, as suggested by Junio.prefix
handling as well as providing a properBUG
message andabort()
ing when theprefix
is too long. Thanks Junio!Changes since v2:
write_in_full()
instead ofxwrite()
again (to make sure that the entire message is printed).vsnprintf()
fails, now we at least print the prefix.Changes since v1:
xwrite()
function this patch uses, rather than thewrite_in_full()
function used by an earlier iteration, thanks Gábor).Cc: Jeff King [email protected], SZEDER Gábor [email protected], Alexandr Miloslavskiy [email protected]