@hnavarro-kernet

Before this fix, a FileNotFoundError would occasionally be raised by 'os.getcwd()', which is called indirectly when the rooted_dir _join method checks whether the sub path is part of the parent path.
This is probably a race condition related to the number and/or combination of workers and threads and the transactions to the FS storage.

After this commit, the check that the sub path is contained inside the parent path is done virtually, via the posixpath and pathlib modules, which avoids the 'os.getcwd()' calls and therefore this possible race condition.

closes #513
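
For readers following along, here is a minimal sketch of what a "virtual" containment check can look like. It is not the code from this PR: the function name `is_within_root` and the example paths are hypothetical, and it assumes POSIX-style paths only. It works on the path strings alone, so it does not resolve symlinks.

```python
import posixpath
from pathlib import PurePosixPath


def is_within_root(root, candidate):
    """Return True if `candidate` is `root` itself or lies below it.

    Hypothetical helper: the check is purely lexical, so neither the file
    system nor the current working directory is consulted.
    """
    # normpath collapses "." and ".." segments without touching the disk.
    root_norm = PurePosixPath(posixpath.normpath(root))
    cand_norm = PurePosixPath(posixpath.normpath(candidate))
    try:
        # relative_to raises ValueError when cand_norm is not under root_norm.
        cand_norm.relative_to(root_norm)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    print(is_within_root("/srv/store", "/srv/store/sub/file.txt"))
    print(is_within_root("/srv/store", "/srv/store/../etc/passwd"))
```

Comparing whole path components, rather than raw string prefixes, is what keeps a sibling directory such as `/srv/store_backup` from being accepted as part of `/srv/store`.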

@hnavarro-kernet hnavarro-kernet marked this pull request as draft October 2, 2025 08:32
@hnavarro-kernet
Author

hnavarro-kernet commented Oct 2, 2025

  • Fix tests

@hnavarro-kernet hnavarro-kernet force-pushed the fix/16.0-hnavarro-kernet-fs_storage-virtual-join branch from 5b11f19 to c662cc9 Compare October 2, 2025 10:52
@hnavarro-kernet
Author

  • Fixed an issue with the patch that created a duplicated candidate: subdirectory/here/subdirectory/here, although it would not report an error
  • Changed the implementation which caused absolute path candidates to fail even if they were an expansion of the root path

Ready for review, tagging @ivs-cetmix as the requestor of the PR from the issue.

@ivs-cetmix
Member

Hi @hnavarro-kernet thank you for your contribution! Unfortunately I don't possess enough technical knowledge of this module to do a comprehensive review, however I would kindly ask @sbidoul or maybe even @pedrobaeza (why not 😄 ) to assist here.

@pedrobaeza pedrobaeza added this to the 16.0 milestone Oct 8, 2025
@pedrobaeza
Member

Not using them.

@sbidoul
Member

sbidoul commented Oct 8, 2025

I'll leave that to @lmignon

But to better understand, why do you mention os.getcwd? I don't see that in the diff, so I don't understand the problem this PR is solving.

@hnavarro-kernet
Author

> I'll leave that to @lmignon
>
> But to better understand, why do you mention os.getcwd? I don't see that in the diff, so I don't understand the problem this PR is solving.

As seen in the traceback reported in #513, the use of make_path_posix in the original rooted_dir_file_system

path_posix = os.path.normpath(make_path_posix(path))
root_posix = os.path.normpath(make_path_posix(self.path))

is the root cause of the os.getcwd calls. Here is the original implementation:
https://github.com/fsspec/filesystem_spec/blob/e12aa7571244f6695264c92c4867978fed5ad092/fsspec/implementations/local.py#L319

@hnavarro-kernet hnavarro-kernet force-pushed the fix/16.0-hnavarro-kernet-fs_storage-virtual-join branch from c662cc9 to d23c455 Compare October 8, 2025 14:00
@sbidoul
Member

sbidoul commented Oct 8, 2025

Ah, I see it now, thanks!

@OCA-git-bot
Contributor

This PR has the approved label and has been created more than 5 days ago. It should therefore be ready to merge by a maintainer (or a PSC member if the concerned addon has no declared maintainer). 🤖

Member

@sbidoul sbidoul left a comment


I'm not sure... since this is security sensitive I'm going to mark it as Request changes to be sure everything is crystal clear (I may totally be missing something, though).

root = PurePosixPath(self.path).as_posix()
rnorm = posixpath.normpath(root)

jnorm = posixpath.normpath(joined or ".")
Member


I'm not convinced by this. Is it guaranteed that self.path will parse correctly as a PurePosixPath? Is it guaranteed that joined will be handled correctly by posixpath.normpath? (Looking at the DirFileSystem implementation, it could be a dict or a list, and there is no guarantee the separator will be /.)

Contributor


Indeed... this code lacks tests.
Considering the remarks mentioned here, the solution should take the separator into account to detect the kind of path (POSIX or Windows). @sbidoul I may be wrong, but regardless of the type of the path argument, the result of the call to super will always be a string. We could add a check before the added logic to make sure, but in my opinion it's useless.

def _join(self, path):
    joined = super()._join(path)
    if not isinstance(joined, str):
        return joined
    # Use the right path abstraction according to the
    # path separator
    if self.sep == '/':
        PathClass = PurePosixPath
        normpath = posixpath.normpath
    elif self.sep == '\\':
        PathClass = PureWindowsPath
        normpath = ntpath.normpath
    else:
        raise ValueError(f"Unknown path separator: {self.sep!r}")

    root = PathClass(self.path)
    joined_path = PathClass(joined or '.')

    rnorm = normpath(str(root))
    jnorm = normpath(str(joined_path))

    if not (jnorm == rnorm or jnorm.startswith(rnorm + self.sep)):
        raise PermissionError(
            f"Path {path!r} resolves to {jnorm!r} which is outside "
            f"the root path {rnorm!r}"
        )

    return joined
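
Since missing tests were raised above, here is a rough sketch of how the separator-aware check could be exercised in isolation. It is an illustration only: `is_inside_root` is a hypothetical standalone function, not part of fs_storage or fsspec, and the paths are made up. It also departs from the snippet above in one place: it avoids doubling the separator when the root already ends with it (for example a root of `/`), which is an assumption about the desired behaviour.

```python
import ntpath
import posixpath


def is_inside_root(root, joined, sep="/"):
    """Lexically test whether `joined` is `root` or lies below it.

    Hypothetical helper mirroring the prefix check discussed above.
    """
    if sep == "/":
        normpath = posixpath.normpath
    elif sep == "\\":
        normpath = ntpath.normpath
    else:
        raise ValueError(f"Unknown path separator: {sep!r}")

    rnorm = normpath(root)
    jnorm = normpath(joined or ".")
    # The trailing separator keeps siblings such as "/data/root_evil" from
    # matching. Assumption: a root that already ends with the separator
    # (e.g. "/") must not get a second one appended.
    prefix = rnorm if rnorm.endswith(sep) else rnorm + sep
    return jnorm == rnorm or jnorm.startswith(prefix)


# (root, joined, separator, expected result)
CASES = [
    ("/data/root", "/data/root/sub/file.txt", "/", True),
    ("/data/root", "/data/root", "/", True),
    ("/data/root", "/data/root/../other", "/", False),
    ("/data/root", "/data/root_evil/x", "/", False),
    ("/", "/etc/passwd", "/", True),
    ("C:\\store", "C:\\store\\a\\..\\b", "\\", True),
    ("C:\\store", "C:\\store\\..\\secret", "\\", False),
]

if __name__ == "__main__":
    for root, joined, sep, expected in CASES:
        assert is_inside_root(root, joined, sep) is expected, (root, joined)
    print("all cases passed")
```

The table of cases covers the situations mentioned in this thread: traversal with `..`, a sibling directory sharing the root's prefix, and Windows-style separators.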

Member


Maybe. But since we see that normpath involves os.getcwd, does it make sense at all to use it on paths that may have nothing to do with the local file system?

Contributor


AFAIK, os.getcwd was called by make_path_posix from fsspec.implementations.local; that is not the case with the methods from Python core.
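
One way to check this claim locally is to patch os.getcwd and see which standard library helpers reach it. The sketch below is an illustration under stated assumptions: `calls_getcwd` is a hypothetical helper, and `posixpath.abspath` is used only as a stand-in for any function that makes relative paths absolute. It does not exercise fsspec's make_path_posix.

```python
import os
import posixpath
from unittest import mock


def calls_getcwd(func, path):
    """Return True if calling func(path) invokes os.getcwd()."""
    # Simulate a vanished working directory: any os.getcwd() call raises.
    with mock.patch.object(
        os, "getcwd", side_effect=FileNotFoundError("cwd is gone")
    ) as fake_getcwd:
        try:
            func(path)
        except FileNotFoundError:
            pass
        return fake_getcwd.called


if __name__ == "__main__":
    # normpath only rewrites the string, so the working directory is unused.
    print(calls_getcwd(posixpath.normpath, "relative/../file"))
    # abspath must anchor a relative path somewhere, so it asks for the cwd.
    print(calls_getcwd(posixpath.abspath, "relative/file"))
```

With the patch in place, normalising a relative path succeeds without consulting the working directory, while making it absolute triggers the simulated FileNotFoundError.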

@hnavarro-kernet
Author

> I'm not sure... since this is security sensitive I'm going to mark it as Request changes to be sure everything is crystal clear (I may totally be missing something, though).

Yeah, understandable.
I might have jumped too quickly to a fix that works for us, since the issue was rapidly becoming more reproducible and more problematic.

I'll be happy to keep reviewing and raising concerns once I'm back from PTO next week.

@lmignon lmignon changed the title [FIX] fs_storage: Resolve rooted_dir sub path is inside path virtually [16.0][FIX] fs_storage: Resolve rooted_dir sub path is inside path virtually Oct 20, 2025