Skip to content

mingw: Fix crash in needs_hiding() #414

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

SyntevoAlex
Copy link

For strings ending with slash, loop jumped over terminating null.

Signed-off-by: Alexandr Miloslavskiy [email protected]

For strings ending with slash, loop jumped over terminating null.

Signed-off-by: Alexandr Miloslavskiy <[email protected]>
@SyntevoAlex
Copy link
Author

@dscho Would you like to review?

Copy link
Member

@dscho dscho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I would like to see a less intrusive version of this patch, as indicated, other than that it's really good.

} while (is_dir_sep(*path));
/* ignore trailing slashes */
if (*path)
basename = path;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about an else break; here? That would make your patch substantially simpler.

And yes, this is a bug...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nested loops with single counter are prone to bugs like the one I'm fixing. That's why I decided to also remove the surface where problems arise. If you really want to, I'll change it, but I'd rather submit current patch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to change the shape of the function, that's fine, but please do that as a separate patch that does not claim to fix a bug at the same time. It is hard to spot bugs when a patch tries to do too many things, and I have to admit that I could not convince myself even after reading the patch for a full 2 minutes that it does not introduce a regression...

if (is_dir_sep(path[1]))
continue;

basename = path + 1;
Copy link
Member

@dscho dscho Oct 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's way too clever for my liking. I had to contort my brain a little to see that you are essentially skipping chains of directory separators in a convoluted way.

So let me propose something completely different instead, something that should be uncontroversial:

basename = find_last_dir_sep(path);
if (basename)
    basename++;

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admit that switching from one algorithm to another is a bit hard to understand. Reading my algorithm with a fresh eye is probably easier then reading old code, though. But yes, this can vary from reader to reader.

The new code you suggested will fail if path ends with slash - it will not achieve the goal of finding last non-empty component.

Copy link
Author

@SyntevoAlex SyntevoAlex Oct 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of my algorithm is that it inspects every slash and decides yes/no.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new code you suggested will fail if path ends with slash - it will not achieve the goal of finding last non-empty component.

Good point.

Reading my algorithm with a fresh eye is probably easier then reading old code, though. But yes, this can vary from reader to reader.

I will readily admit that I find your code harder to follow.

In case of a chain of slashes (as in hello///this/), to understand your code, the reader has to follow the code through several iterations with different continue clauses. That's worse than a nested loop. It is a nested logic inside a single loop. It is true that every algorithm can be rewritten as a single loop over a single switch statement. Does that mean you should do it? I think it shouldn't.

And I had another look at your version vs the original version with the added else break;, and quite honestly, that latter version does read a lot more obvious to me.

It is more obvious because it saves me brain cycles: we're looking for a directory separator, okay. If we found one, we skip over any directory separators. Okay. Now we see whether there is anything left, and if so, assign basename, otherwise we stop.

It requires only linear thinking.

In your version, you artificially avoid one indentation level by starting with an if (!dir_sep) continue. That requires convoluted thinking on the readers' part because the "are we looking at something other than a directory separator" is the exact opposite of what we actually want: we do want to look for a directory separator. This is compounded by the third if which uses the exact opposite condition, but again, guarding a continue rather than the thing we actually want to do. And I guarantee you that it will caus even yourself some head scratching, six months from now, when you read the code and think: okay, to pass this first if, we have to see a directory separator, but why do we then not assign basename if the next character is a directory separator? Ah, because then we get into the next loop, and we obviously pass the first if because obviously what was the next character now is the current one, and that is a directory separator, but we had to test it twice to get so far.

See where this is going? It requires quite a bit of twisting, and that was not the case with the original version.

Now, your version could be improved by skipping the third if and by avoiding the opposite conditions of what we're asking for:

	/* find last non-empty path component */
	for (basename = path; *path; path++)
		if (is_dir_sep(*path) && path[1] && !is_dir_sep(path[1]))
			basename = path + 1;

I do not particularly like this condition (is_dir_sep(*path) && path[1] && !is_dir_sep(path[1])), it is not all that obvious, so I would add a comment. Further, it could be DRYed up a bit more by writing this:

	for (basename = path; *path; )
		/*
		 * A basename
		 * - starts after a directory separator,
		 * - is not empty, and
		 * - does not begin with a directory separator
		 */
		if (is_dir_sep(*(path++)) && *path && !is_dir_sep(*path))
			basename = path;

Oh, and I just noticed that you mix spaces and tabs for indentation. Please don't do that. Git uses 8-column tabs only, with spaces for the final part of the indentation if required.

BTW have you noticed that all discussed versions mishandle a path that consists only of slashes? Even on Windows, \ is a valid path...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In short, I think that this is a much better version:

static inline int needs_hiding(const char *path)
{
	const char *basename = NULL;

	if (hide_dotfiles == HIDE_DOTFILES_FALSE)
		return 0;

	/* We cannot use basename(), as it would remove trailing slashes */
	win32_skip_dos_drive_prefix((char **)&path);

	if (!is_dir_sep(*path))
		basename = path;

	while (*path)
		/*
		 * A basename
		 * - starts after a directory separator,
		 * - is not empty, and
		 * - does not begin with a directory separator
		 */
		if (is_dir_sep(*(path++) && *path && !is_dir_sep(*path)))
			basename = path;

	if (!basename)
		return 0;

	if (hide_dotfiles == HIDE_DOTFILES_TRUE)
		return *basename == '.';

	assert(hide_dotfiles == HIDE_DOTFILES_DOTGITONLY);
	return !strncasecmp(".git", basename, 4) &&
		(!basename[4] || is_dir_sep(basename[4]));
}

Copy link
Author

@SyntevoAlex SyntevoAlex Oct 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The disagreement is negligible, so I'm fine with whatever code you like more.

git-for-windows#2371

It's a pity the patch is no longer mine. Boasting rights, you know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with whatever code you like more.

git-for-windows#2371

It's a pity the patch is no longer mine. Boasting rights, you know.

Sorry, I meant to give you credit via that footer. Would you like a Co-authored-by: footer instead? The analysis is essentially yours...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see my algorithm this way:

1. We're not interested if it's not a slash.

2. Now we're on slash.

3. We're not interested if null follows.

4. We're not interested if another slash follows.

Right, but there is also:

  1. we will still be interested in what follows, but only once we found the last directory separator of the chain, and this will be handled via a subsequent iteration of the same loop.

However, this is pretty much one of the most interesting aspects of the algorithm, and I find it harder to reason about the correctness of your algorithm in that respect than about my version.

BTW have you noticed that all discussed versions mishandle a path that consists only of slashes? Even on Windows, \ is a valid path...

It doesn't really mishandle it.

But it does: the basename will still point to the first character, i.e. we assume a base name of \.

Instead, what we would want is basename == NULL to indicate that there is not even a base name to begin with.

If loop searches for last non-empty, then there's nothing to find in such string. Also, this fits into the goal of needs_hiding().

It happens to work because needs_hiding() really is only interested in the question whether the base name starts with a . or whether it is .git. In both cases, \ will not match, so there... 😄

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK :) We just have slightly different preferences.

@SyntevoAlex
Copy link
Author

How do we handle upstream git now?

@dscho
Copy link
Member

dscho commented Oct 25, 2019

How do we handle upstream git now?

How about #427?

@SyntevoAlex
Copy link
Author

OK! (didn't receive notification for that PR, 'cc' didn't auto-subscribe me on gitgitgadget)

@dscho
Copy link
Member

dscho commented Oct 25, 2019

Sorry, my fault, I should have used the GitHub method also (cc @SyntevoAlex).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants