-
-
Notifications
You must be signed in to change notification settings - Fork 199
[16.0][FIX] fs_storage: Resolve rooted_dir sub path is inside path virtually #522
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 16.0
Are you sure you want to change the base?
[16.0][FIX] fs_storage: Resolve rooted_dir sub path is inside path virtually #522
Conversation
|
5b11f19 to
c662cc9
Compare
Ready for review, tagging @ivs-cetmix as the requestor of the PR from the issue. |
|
Hi @hnavarro-kernet thank you for your contribution! Unfortunately I don't possess enough technical knowledge of this module to do a comprehensive review, however I would kindly ask @sbidoul or maybe even @pedrobaeza (why not 😄 ) to assist here. |
|
Not using them. |
|
I'll leave that to @lmignon But to better understand, why do you mention |
As seen on the traceback reported in #513, the usage of make_path_posix in the original rooted_dir_file_system storage/fs_storage/rooted_dir_file_system.py Lines 28 to 29 in b50f468
https://github.com/fsspec/filesystem_spec/blob/e12aa7571244f6695264c92c4867978fed5ad092/fsspec/implementations/local.py#L319 |
Before this fix, a FileNotFoundError would ocassionally arise when 'os.getcwd()' is called due to the rooted_dir _join method to check whether the sub path is part of the parent path. This is probably caused by the number and/or combination of workers and threads and the transactions to the FS storage, having race issues. After this commit, the sub path is checked if it is contained inside the parent path virtually via posixpath and pathlib modules avoiding 'os.getcwd()' calls and running into this possible race condition.
c662cc9 to
d23c455
Compare
|
Ah, I see it now, thanks! |
|
This PR has the |
sbidoul
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure... since this is security sensitive I'm going to mark it as Request changes to be sure everything is crystal clear (I may totally be missing something, though).
| root = PurePosixPath(self.path).as_posix() | ||
| rnorm = posixpath.normpath(root) | ||
|
|
||
| jnorm = posixpath.normpath(joined or ".") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced by this. Is it guaranteed that self.path will parse correctly as a PurePosixPath? Is it guaranteed that joined will be handled correctly by posixpath.normpath (Looking at the DirFileSystem implementation, it could be a dict or a list, and there is no guarantee the separator will be /).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed, ... this code lack of tests...
Considering the remarks mentioned here, the solution should take into account the separator to detect the kind of path (Posix or Windows)..... @sbidoul I may be wrong, but regardless of the type of the path argument, the result of the call to super will always be a string... We could add a check before the added logic to make sure, but in my opinion, it's useless.
def _join(self, path):
joined = super()._join(path)
if not isinstance(joined, str):
return joined
# Uses the rigth path abstraction according to the
# path separator
if self.sep == '/':
PathClass = PurePosixPath
normpath = posixpath.normpath
elif self.sep == '\\':
PathClass = PureWindowsPath
normpath = ntpath.normpath
else:
raise ValueError(f"Unknown path separator: {self.sep!r}")
root = PathClass(self.path)
joined_path = PathClass(joined or '.')
rnorm = normpath(str(root))
jnorm = normpath(str(joined_path))
if not (jnorm == rnorm or jnorm.startswith(rnorm + self.sep)):
raise PermissionError(
f"Path {path!r} resolves to {jnorm!r} which is outside "
f"the root path {rnorm!r}"
)
return joinedThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe. But since we see that normpath is involving os.getcwd, does it make sense at all to use it on paths that may have nothing to do with the local file system?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
os.getcwd was called by the make_path_posix from fsspec.implementations.local AFAIK, it's not the case with the methods from python core.
Yeah, understandable. I'll be happy to keep reviewing and raising concerns once I'm back from PTO next week. |
Before this fix, a FileNotFoundError would ocassionally arise when 'os.getcwd()' is called due to the rooted_dir _join method to check whether the sub path is part of the parent path.
This is probably caused by the number and/or combination of workers and threads and the transactions to the FS storage, having race issues.
After this commit, the sub path is checked if it is contained inside the parent path virtually via posixpath and pathlib modules avoiding 'os.getcwd()' calls and running into this possible race condition.
closes #513