Block paths which might be misinterpreted as alternate file streams #686
ref git-for-windows#679

Windows disallows the colon `:` character in file names. However, many Win32 file APIs allow path specifications of the form `<file path>:<stuff>` when reading or writing files. These are interpreted as pointing to the *alternate data stream* named `<stuff>` within the `<file path>` file.

Documentation on alternate data streams: https://msdn.microsoft.com/en-us/library/windows/desktop/aa364404(v=vs.85).aspx

Git for Windows, unaware of file streams, will incorrectly map a Unix file named like `foo:bar` into the `bar` alternate stream of a file `foo`. This results in an unexpected file `foo` with size 0 in the working tree, and (depending on the `core.fscache` setting) the expected `foo:bar` file may or may not be flagged as deleted.

It would be preferable if Git for Windows detected such files and issued errors, similar to how it does for various other invalid path situations. This would help reduce pain and make things less confusing for those working in a mixed Unix/Windows team.

This change adds a check for `:` so that we never accidentally unpack a file into an alternate stream. Any file path with a `:` is considered invalid, which is perfectly sensible for the purposes of Git.

If such a file is detected and blocked, users can instruct Git to ignore it entirely via `git update-index --assume-unchanged`, just as they need to today for other invalid path situations.

NB: a determined Windows user can still confuse the system in certain ways by explicitly creating alternate streams, but that requires exceptional user effort and is judged to be not worth pursuing at this time.

Signed-off-by: Lincoln Atkinson <[email protected]>
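To make the proposed rule concrete, here is a minimal sketch of the kind of check the description talks about. It is not the code from this PR: the function name `path_has_colon` is hypothetical, and the sketch assumes the path being tested is a repository-relative path (so it never carries a drive prefix such as `C:`).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only (not the function touched
 * by this PR). Returns 1 if `path` contains a ':' anywhere, 0 otherwise.
 *
 * Assumption: `path` is relative to the work tree, so a ':' can only
 * come from a tracked file name such as "foo:bar", which Win32 file
 * APIs would treat as stream "bar" of file "foo".
 */
int path_has_colon(const char *path)
{
	return strchr(path, ':') != NULL;
}
```

A caller would reject the path when the helper returns 1, in the same place where other invalid path situations are already reported.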
4e1a55d to c4a42f2
I wonder if it would make sense to refuse other special characters as well. Edit: sorry, Other special characters would include glob chars ( And maybe make the feature configurable to be more acceptable to upstream folks?
@kblees this would not be wrong to do, but it's not necessary. Eventually these paths are passed through Issues only arise when the path in question is not valid as a literal file name, but is still acceptable to Windows APIs due to some kind of special encoded format. e.g. there is already a check for this in the same function I updated
This catches files named like I am merely extending this check to also catch encodings of alternate data streams.
This depends on what you want to achieve with this patch. If you just want proper error detection on Windows, the additional check should be made Windows-specific (e.g. wrapped in a macro that evaluates to nothing on other platforms, similar to If, on the other hand, you want to prevent illegal file names on all platforms (which IMO would be a Good Thing in a multi-platform project), the check should be platform independent (as your patch currently is), it should probably be configurable, and it should check for more illegal characters (as Git on Mac/Unix will obviously not be able to call Windows APIs for file name validation).
All I aim to do with this patch is to handle this one Windows-specific edge case. I didn't realize code from this repo flowed back to non-Windows repos, thank you for raising that. I've refactored into a macro-style approach with a no-op implementation for non-Windows, matching I'd be happy to add some tests, but I'm finding it challenging to understand the test system or to see where a sensible place to plug in would be. It's somewhat difficult, too, as it's not possible to add a file with a problematic filename. Is there a facility for unit-testing?
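The macro-style approach discussed above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: the names `path_has_stream_separator` and `is_valid_platform_path` are hypothetical, and `_WIN32` stands in for whatever platform switch the real code uses.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical check: 1 if `path` contains ':' and could therefore be
 * read by Win32 file APIs as "<file>:<stream>", 0 otherwise.
 */
int path_has_stream_separator(const char *path)
{
	return strchr(path, ':') != NULL;
}

#ifdef _WIN32
/* On Windows the macro performs the real check. */
#define is_valid_platform_path(path) (!path_has_stream_separator(path))
#else
/* Elsewhere it is a no-op: every path passes. */
#define is_valid_platform_path(path) 1
#endif
```

With this shape, shared code can call `is_valid_platform_path(path)` unconditionally, and non-Windows builds pay nothing for it.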
Thank you, @latkin, for your contribution!

Since we are a (Windows-specific) downstream project of the Git project, we inherit their rules. And some of those rules are very strict because the Git developers not only care a lot about clean code, but also about a clean history. A lot. 😄

So when the patch that claims to refactor a filename check actually changes the formatting of dozens of comments, it is safe to assume that the Git maintainer would simply reject the patch.

The reason this patch would be rejected is not just arbitrary pettiness, but it is a very good reason: every patch has to be reviewed carefully to ensure that it does not introduce inadvertent regressions. And it is just really, really (and unnecessarily) hard to review a patch that is dominated by formatting changes (that would even violate the coding conventions of Git) and whose actually functional changes are consequently really hard to find.

If you do not believe me how hard it is to see whether your changes contain any bugs, have a look yourself: https://github.com/git-for-windows/git/pull/686/files (GitHub even refuses to show the diff for The aim of the Git for Windows project always has been to collect, vet and mature Windows-specific patches with the end goal of contributing the patches upstream. That means if our patches do not conform to the upstream conventions, either I have to clean them up, or I have to ask the contributors to change them to conform to Git's norms.

The good news is that I actually had worked on the problem this Pull Request tries to address. Well, not the alternate file streams, but preventing Git from trying to check out file names that are illegal in DOS file names (whose convention NTFS inherited). The issue came up in the context of preventing anything from being checked out into I brushed up the patch and pushed it here: master...dscho:dos-filenames

@latkin please have a look and comment.
Oh sheesh, I did not realize so much formatting had changed, of course you shouldn't have to deal with that. I must have accidentally triggered auto-format in Visual Studio. My apologies!
ad8b721 to 1ee81d7
… Windows Signed-off-by: Lincoln Atkinson <[email protected]>
1ee81d7 to e6806a6
@dscho I have corrected the egregious formatting changes, sorry about that
@latkin have you had a chance to look at my
Re: 8.3 short names won't always be problematic - an isolated file named This PR has been untouched for 4 months. Please let me know what else is required for acceptance, or if you aren't interested and would prefer to use the approach in your branch, go ahead and close this out.
The issue is not so much whether you can create it. The issue is whether it can interfere with your collaborators' work trees' files. And a file
Sorry, I should have made it clearer that I require some effort on your side, too. I am a maintainer and need to rely on contributors to perform the bulk of the work. Anything else would not scale.
Yes, I am asking explicitly: what remaining effort do you require? What still needs to be done here? I am happy to continue volunteering my time to take care of it. Otherwise, if this isn't a contribution you are interested in, please close it.
That is easy.
In other words, I would strongly suggest to take the By the way, I would like to point out that I suggested to rewrite the history to avoid uglifying
@latkin to be quite honest, I would be really excited and thankful if you continued to work on this.
Ah, had forgotten about this. I might be able to revisit in the next couple weeks, thanks for the reminder.
I may be entirely wrong here, but I think Anyway, my point here is. Suppose you have drives
this should give an error in a freshly opened command prompt. Now run the following sequence:
That should open the file Anyway, because Git for Windows also runs in the command interpreter, perhaps it's not the best idea to block this kind of path? ...
The official documentation resides over at "Naming Files, Paths, and Namespaces". Another quite useful and comprehensive article over on Project Zero explains some further intricacies of path names on Windows. Since Git for Windows aims for the Win32 subsystem, it seems logical, however, to assume the strictest notion of what constitutes a valid path name (which includes stuff like no trailing dot - The alternative namespaces (Win32, Win32 device and NT "native") would in theory still allow escaping some of the rules imposed by Win32, but I am not sure how much of that would get swallowed by "Cygwin". That is, a file or folder named
I'll close this, as it has been addressed as part of git@817ddd6.
For the record, we also integrated it into Git for Windows'