Skip to content

Set file permissions accoding to umask #325

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
funkey opened this issue Nov 7, 2018 · 19 comments · Fixed by #493
Closed

Set file permissions accoding to umask #325

funkey opened this issue Nov 7, 2018 · 19 comments · Fixed by #493

Comments

@funkey
Copy link
Contributor

funkey commented Nov 7, 2018

Zarr uses temporary files that are moved in place to write meta-files and chunks. As a side effect, the created containers are only readable by the user who created them (as is default for temporary files). A better approach might be to set the file permissions after the move according to the users umask.

@alimanfoo
Copy link
Member

Thanks @funkey.

Do you know if there a reliable way to determine what the current user's umask is? I found something like this, but don't know if it's the best way, and involves temporarily obliterating the umask just to find out what it is, which seems a little awkward:

import os
umask = os.umask(0)
os.umask(umask)

Is this something that can be done once and cached somewhere, or would we need to query the umask before every time you want to change a file's permissions?

Also wondering if file permissions should be surfaced as an argument to the DirectoryStore constructor, so users can control explicitly if they want to?

@funkey
Copy link
Contributor Author

funkey commented Nov 8, 2018

I think this is the only way to get the umask, this seems to go back to the POSIX standard (see man umask).

Caching should be fine, I don't think the umask of a process can be changed from the outside.

Having file permissions as an optional argument to DirectoryStore sounds like a good idea.

@alimanfoo
Copy link
Member

Cool. How about adding in a chmod argument to the DirectoryStore constructor. If None (default) then files will be set to permissions based on the user's umask. If False then permissions will not be changed (current behaviour). If some other value then file permissions will be changed to that value, assuming the value is a numeric mode that can be passed to os.chmod(). Thoughts?

@funkey
Copy link
Contributor Author

funkey commented Nov 8, 2018

That sounds very reasonable to me!

@alimanfoo
Copy link
Member

Great, thanks @funkey. A PR would be welcome if you have the time, otherwise we can label this as "help wanted".

@jakirkham
Copy link
Member

What if we just used a normal file (instead of a temp file)? Would that solve this issue?

@funkey
Copy link
Contributor Author

funkey commented Nov 8, 2018

Actually, yes. That might be the easiest solution.

@alimanfoo
Copy link
Member

alimanfoo commented Nov 8, 2018 via email

@jakirkham
Copy link
Member

Meaning the randomized part? We could use a UUID.

@alimanfoo
Copy link
Member

So the benefit of using a normal file is (1) no need to query current umask, and (2) no need to chmod?

I.e., in the default case (file should be created according to current umask) would you generate a temporary file name using UUID, then use the built-in open() function to open it? (Then use os.replace() to move it into place when writing is finished successfully, or os.remove() if any errors occur.)

In the case where you want to allow the user to provide an explicit file mode, would you open the file with the lower-level os.open() function? Or open with built-in open() then os.chmod() if write succeeds? Or do we not worry about supporting this now (it's beyond the scope of the original issue)?

@alimanfoo
Copy link
Member

Thinking about this has actually reminded me that we might want to reconsider the strategy of using a different randomly-named temporary file for each atomic write. I've tried to write up the issue in #328. It's somewhat orthogonal to the issue here of file permissions, however if we are considering changes to the way files are created then #328 is also relevant.

@jakirkham
Copy link
Member

So the benefit of using a normal file is (1) no need to query current umask, and (2) no need to chmod?

That's right.

I.e., in the default case (file should be created according to current umask) would you generate a temporary file name using UUID, then use the built-in open() function to open it? (Then use os.replace() to move it into place when writing is finished successfully, or os.remove() if any errors occur.)

Sounds like a good plan to me.

In the case where you want to allow the user to provide an explicit file mode, would you open the file with the lower-level os.open() function? Or open with built-in open() then os.chmod() if write succeeds? Or do we not worry about supporting this now (it's beyond the scope of the original issue)?

Wouldn't worry about this yet. This basically hasn't come up AFAICT. Except of course because temp files are a little unusual. Moving away from temp files should eliminate the need to think about this.

@eddienko
Copy link

eddienko commented Jun 26, 2019

This is bothering me a lot, since the directory storage created by Zarr cannot be shared between members of the group. See my quick fix in the PR, does that seem reasonable?

eddienko added a commit to eddienko/zarr that referenced this issue Jul 3, 2019
@mcherkassky
Copy link

+1 can this be included this in the next release?

@manolinux
Copy link

+1 In my opinion, this is absolutely necessary in UNIX enviroments, should be released ASAP ...

@manolinux
Copy link

manolinux commented Jul 29, 2019

Just in case someone needs it:

I've solved this in a dirty way ... The problem comes with tempfile__mkstemp_inner, that creates the temp file with 0600 permissions. I've set up this to use current mask (mine takes into account group is 0007):

def _mkstemp_inner(dir, pre, suf, flags, output_type):
    """Code common to mkstemp, TemporaryFile, and NamedTemporaryFile."""

    names = tempfile._get_candidate_names()
    if output_type is bytes:
        names = map(tempfile._os.fsencode, names)

    for seq in range(tempfile.TMP_MAX):
        name = next(names)
        file = tempfile._os.path.join(dir, pre + name + suf)
        try:
            umask = os.umask(0)
            os.umask(umask)
            fd = tempfile._os.open(file, flags, 0o666 & ~umask)
        except FileExistsError:
            continue    # try again
        except PermissionError:
            # This exception is thrown when a directory with the chosen name
            # already exists on windows.
            if (tempfile._os.name == 'nt' and tempfile._os.path.isdir(dir) and
                tempfile._os.access(dir, tempfile._os.W_OK)):
                continue
            else:
                raise
        return (fd, tempfile._os.path.abspath(file))

    raise tempfile.FileExistsError(tempfile._errno.EEXIST,
                          "No usable temporary file name found")

tempfile._mkstemp_inner = _mkstemp_inner

@d-v-b
Copy link
Contributor

d-v-b commented Aug 31, 2019

+1 A fix to this would be greatly appreciated.

@adcroft
Copy link

adcroft commented Oct 9, 2019

+1 We also ran into this limitation just recently. A fix seems necessary to be able share data as expected @raphaeldussin

@raphaeldussin
Copy link
Contributor

+1 with @d-v-b and @adcroft

this seems like a blocking issue for collaborative work

jakirkham pushed a commit that referenced this issue Oct 28, 2019
* Set file permissions in DirectoryStorage according to umask

Fixes #325

* Use built-in open function

* Adds link to GH issue

* Adds release notes entry
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this issue Aug 24, 2023
### Description
The `download_url_to_file` function in torch.hub uses a temporary file to prevent overriding a local working checkpoint with a broken download.This temporary file is created using `NamedTemporaryFile`. However, since `NamedTemporaryFile` creates files with overly restrictive permissions (0600), the resulting download will not have default permissions and will not respect umask on Linux (since moving the file will retain the restrictive permissions of the temporary file). This is especially problematic when trying to share model checkpoints between multiple users as other users will not even have read access to the file.

The change in this PR fixes the issue by using custom code to create the temporary file without changing the permissions to 0600 (unfortunately there is no way to override the permissions behaviour of existing Python standard library code). This ensures that the downloaded checkpoint file correctly have the default permissions applied. If a user wants to apply more restrictive permissions, they can do so via usual means (i.e. by setting umask).

See these similar issues in other projects for even more context:
* borgbackup/borg#6400
* borgbackup/borg#6933
* zarr-developers/zarr-python#325

### Issue
#81297

### Testing
Extended the unit test `test_download_url_to_file` to also check permissions.

Pull Request resolved: #82869
Approved by: https://github.com/vmoens
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
9 participants