-
Notifications
You must be signed in to change notification settings - Fork 141
mingw: Fix crash in needs_hiding() #414
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mingw: Fix crash in needs_hiding() #414
Conversation
For strings ending with slash, loop jumped over terminating null. Signed-off-by: Alexandr Miloslavskiy <[email protected]>
@dscho Would you like to review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I would like to see a less intrusive version of this patch, as indicated, other than that it's really good.
} while (is_dir_sep(*path)); | ||
/* ignore trailing slashes */ | ||
if (*path) | ||
basename = path; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about an else break;
here? That would make your patch substantially simpler.
And yes, this is a bug...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nested loops with single counter are prone to bugs like the one I'm fixing. That's why I decided to also remove the surface where problems arise. If you really want to, I'll change it, but I'd rather submit current patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to change the shape of the function, that's fine, but please do that as a separate patch that does not claim to fix a bug at the same time. It is hard to spot bugs when a patch tries to do too many things, and I have to admit that I could not convince myself even after reading the patch for a full 2 minutes that it does not introduce a regression...
if (is_dir_sep(path[1])) | ||
continue; | ||
|
||
basename = path + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's way too clever for my liking. I had to contort my brain a little to see that you are essentially skipping chains of directory separators in a convoluted way.
So let me propose something completely different instead, something that should be uncontroversial:
basename = find_last_dir_sep(path);
if (basename)
basename++;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I admit that switching from one algorithm to another is a bit hard to understand. Reading my algorithm with a fresh eye is probably easier then reading old code, though. But yes, this can vary from reader to reader.
The new code you suggested will fail if path ends with slash - it will not achieve the goal of finding last non-empty component.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea of my algorithm is that it inspects every slash and decides yes/no.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new code you suggested will fail if path ends with slash - it will not achieve the goal of finding last non-empty component.
Good point.
Reading my algorithm with a fresh eye is probably easier then reading old code, though. But yes, this can vary from reader to reader.
I will readily admit that I find your code harder to follow.
In case of a chain of slashes (as in hello///this/
), to understand your code, the reader has to follow the code through several iterations with different continue
clauses. That's worse than a nested loop. It is a nested logic inside a single loop. It is true that every algorithm can be rewritten as a single loop over a single switch
statement. Does that mean you should do it? I think it shouldn't.
And I had another look at your version vs the original version with the added else break;
, and quite honestly, that latter version does read a lot more obvious to me.
It is more obvious because it saves me brain cycles: we're looking for a directory separator, okay. If we found one, we skip over any directory separators. Okay. Now we see whether there is anything left, and if so, assign basename
, otherwise we stop.
It requires only linear thinking.
In your version, you artificially avoid one indentation level by starting with an if (!dir_sep) continue
. That requires convoluted thinking on the readers' part because the "are we looking at something other than a directory separator" is the exact opposite of what we actually want: we do want to look for a directory separator. This is compounded by the third if
which uses the exact opposite condition, but again, guarding a continue
rather than the thing we actually want to do. And I guarantee you that it will caus even yourself some head scratching, six months from now, when you read the code and think: okay, to pass this first if
, we have to see a directory separator, but why do we then not assign basename
if the next character is a directory separator? Ah, because then we get into the next loop, and we obviously pass the first if
because obviously what was the next character now is the current one, and that is a directory separator, but we had to test it twice to get so far.
See where this is going? It requires quite a bit of twisting, and that was not the case with the original version.
Now, your version could be improved by skipping the third if
and by avoiding the opposite conditions of what we're asking for:
/* find last non-empty path component */
for (basename = path; *path; path++)
if (is_dir_sep(*path) && path[1] && !is_dir_sep(path[1]))
basename = path + 1;
I do not particularly like this condition (is_dir_sep(*path) && path[1] && !is_dir_sep(path[1])
), it is not all that obvious, so I would add a comment. Further, it could be DRYed up a bit more by writing this:
for (basename = path; *path; )
/*
* A basename
* - starts after a directory separator,
* - is not empty, and
* - does not begin with a directory separator
*/
if (is_dir_sep(*(path++)) && *path && !is_dir_sep(*path))
basename = path;
Oh, and I just noticed that you mix spaces and tabs for indentation. Please don't do that. Git uses 8-column tabs only, with spaces for the final part of the indentation if required.
BTW have you noticed that all discussed versions mishandle a path that consists only of slashes? Even on Windows, \
is a valid path...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In short, I think that this is a much better version:
static inline int needs_hiding(const char *path)
{
const char *basename = NULL;
if (hide_dotfiles == HIDE_DOTFILES_FALSE)
return 0;
/* We cannot use basename(), as it would remove trailing slashes */
win32_skip_dos_drive_prefix((char **)&path);
if (!is_dir_sep(*path))
basename = path;
while (*path)
/*
* A basename
* - starts after a directory separator,
* - is not empty, and
* - does not begin with a directory separator
*/
if (is_dir_sep(*(path++) && *path && !is_dir_sep(*path)))
basename = path;
if (!basename)
return 0;
if (hide_dotfiles == HIDE_DOTFILES_TRUE)
return *basename == '.';
assert(hide_dotfiles == HIDE_DOTFILES_DOTGITONLY);
return !strncasecmp(".git", basename, 4) &&
(!basename[4] || is_dir_sep(basename[4]));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The disagreement is negligible, so I'm fine with whatever code you like more.
It's a pity the patch is no longer mine. Boasting rights, you know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with whatever code you like more.
It's a pity the patch is no longer mine. Boasting rights, you know.
Sorry, I meant to give you credit via that footer. Would you like a Co-authored-by:
footer instead? The analysis is essentially yours...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes please :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see my algorithm this way:
1. We're not interested if it's not a slash. 2. Now we're on slash. 3. We're not interested if null follows. 4. We're not interested if another slash follows.
Right, but there is also:
- we will still be interested in what follows, but only once we found the last directory separator of the chain, and this will be handled via a subsequent iteration of the same loop.
However, this is pretty much one of the most interesting aspects of the algorithm, and I find it harder to reason about the correctness of your algorithm in that respect than about my version.
BTW have you noticed that all discussed versions mishandle a path that consists only of slashes? Even on Windows,
\
is a valid path...It doesn't really mishandle it.
But it does: the basename
will still point to the first character, i.e. we assume a base name of \
.
Instead, what we would want is basename == NULL
to indicate that there is not even a base name to begin with.
If loop searches for last non-empty, then there's nothing to find in such string. Also, this fits into the goal of
needs_hiding()
.
It happens to work because needs_hiding()
really is only interested in the question whether the base name starts with a .
or whether it is .git
. In both cases, \
will not match, so there... 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK :) We just have slightly different preferences.
How do we handle upstream git now? |
How about #427? |
OK! (didn't receive notification for that PR, 'cc' didn't auto-subscribe me on gitgitgadget) |
Sorry, my fault, I should have used the GitHub method also ( |
For strings ending with slash, loop jumped over terminating null.
Signed-off-by: Alexandr Miloslavskiy [email protected]