gh-117841: Add C implementation of ntpath.lexists
#117842
Co-authored-by: Eryk Sun <[email protected]>
…lexists()` Use `os.path.lexists()` rather than `os.lstat()` to test whether paths exist. This is equivalent on POSIX, but faster on Windows.
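For readers skimming the thread, here is a minimal sketch of the two patterns that commit message contrasts. The helper names `lexists_via_lstat` and `lexists_direct` are hypothetical and exist only for illustration; they are not taken from the patch.

```python
import os


def lexists_via_lstat(path):
    """Older pattern: call os.lstat() and treat a failure as 'missing'."""
    try:
        os.lstat(path)
    except (OSError, ValueError):
        return False
    return True


def lexists_direct(path):
    """Newer pattern: ask os.path.lexists() directly.

    Neither helper follows a final symlink, so a dangling link still
    counts as existing.
    """
    return os.path.lexists(path)
```

Both helpers should agree on any given path; the commit message's claim is only that the second form is cheaper on Windows.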
ntpath.lexists
I'm working on what I hope will be an improved version compared to the first draft. AFAIK, the |
Here's the revised implementation:
static PyObject *
nt_exists(PyObject *path, int follow_symlinks)
{
    path_t _path = PATH_T_INITIALIZE("exists", "path", 0, 1);
    HANDLE hfile;
    BOOL traverse = follow_symlinks;
    int result = 0;
    if (!path_converter(path, &_path)) {
        path_cleanup(&_path);
        if (PyErr_ExceptionMatches(PyExc_ValueError)) {
            PyErr_Clear();
            Py_RETURN_FALSE;
        }
        return NULL;
    }
    Py_BEGIN_ALLOW_THREADS
    if (_path.fd != -1) {
        hfile = _Py_get_osfhandle_noraise(_path.fd);
        if (hfile != INVALID_HANDLE_VALUE) {
            result = 1;
        }
    }
    else if (_path.wide) {
        BOOL slow_path = TRUE;
        FILE_STAT_BASIC_INFORMATION statInfo;
        if (_Py_GetFileInformationByName(_path.wide, FileStatBasicByNameInfo,
                                         &statInfo, sizeof(statInfo)))
        {
            if (!(statInfo.FileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) ||
                !follow_symlinks &&
                IsReparseTagNameSurrogate(statInfo.ReparseTag))
            {
                slow_path = FALSE;
                result = 1;
            }
            else {
                // reparse point but not name-surrogate
                traverse = TRUE;
            }
        }
        else if (_Py_GetFileInformationByName_ErrorIsTrustworthy(
                    GetLastError()))
        {
            slow_path = FALSE;
        }
        if (slow_path) {
            BOOL traverse = follow_symlinks;
            if (!traverse) {
                hfile = CreateFileW(_path.wide, FILE_READ_ATTRIBUTES, 0, NULL,
                                    OPEN_EXISTING, FILE_FLAG_OPEN_REPARSE_POINT |
                                    FILE_FLAG_BACKUP_SEMANTICS, NULL);
                if (hfile != INVALID_HANDLE_VALUE) {
                    FILE_ATTRIBUTE_TAG_INFO info;
                    if (GetFileInformationByHandleEx(hfile,
                            FileAttributeTagInfo, &info, sizeof(info)))
                    {
                        if (!(info.FileAttributes &
                              FILE_ATTRIBUTE_REPARSE_POINT) ||
                            IsReparseTagNameSurrogate(info.ReparseTag))
                        {
                            result = 1;
                        }
                        else {
                            // reparse point but not name-surrogate
                            traverse = TRUE;
                        }
                    }
                    else {
                        // device or legacy filesystem
                        result = 1;
                    }
                    CloseHandle(hfile);
                }
                else {
                    STRUCT_STAT st;
                    switch (GetLastError()) {
                    case ERROR_ACCESS_DENIED:
                    case ERROR_SHARING_VIOLATION:
                    case ERROR_CANT_ACCESS_FILE:
                    case ERROR_INVALID_PARAMETER:
                        if (!LSTAT(_path.wide, &st)) {
                            result = 1;
                        }
                    }
                }
            }
            if (traverse) {
                hfile = CreateFileW(_path.wide, FILE_READ_ATTRIBUTES, 0, NULL,
                                    OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, NULL);
                if (hfile != INVALID_HANDLE_VALUE) {
                    CloseHandle(hfile);
                    result = 1;
                }
                else {
                    STRUCT_STAT st;
                    switch (GetLastError()) {
                    case ERROR_ACCESS_DENIED:
                    case ERROR_SHARING_VIOLATION:
                    case ERROR_CANT_ACCESS_FILE:
                    case ERROR_INVALID_PARAMETER:
                        if (!STAT(_path.wide, &st)) {
                            result = 1;
                        }
                    }
                }
            }
        }
    }
    Py_END_ALLOW_THREADS
    path_cleanup(&_path);
    if (result) {
        Py_RETURN_TRUE;
    }
    Py_RETURN_FALSE;
}
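To make the fast-path branch easier to follow, here is a small Python model of the decision taken when the by-name query succeeds. It is only a sketch: the function `fast_path_decision` and its string return values are hypothetical, and the numeric constants are copied from the Windows SDK headers as I understand them rather than from the patch.

```python
# Values as defined in the Windows SDK headers (assumed, not from the patch).
FILE_ATTRIBUTE_REPARSE_POINT = 0x400
IO_REPARSE_TAG_SYMLINK = 0xA000000C
IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003
IO_REPARSE_TAG_APPEXECLINK = 0x8000001B
NAME_SURROGATE_BIT = 0x20000000  # the bit IsReparseTagNameSurrogate() tests


def is_name_surrogate(reparse_tag):
    """Model of the IsReparseTagNameSurrogate() macro."""
    return bool(reparse_tag & NAME_SURROGATE_BIT)


def fast_path_decision(attributes, reparse_tag, follow_symlinks):
    """Return "exists" when the fast path can answer, else "traverse".

    "traverse" means the real code has to open the path and let the
    system resolve the reparse point before it can report existence.
    """
    is_reparse_point = bool(attributes & FILE_ATTRIBUTE_REPARSE_POINT)
    if not is_reparse_point:
        return "exists"
    if not follow_symlinks and is_name_surrogate(reparse_tag):
        # lexists(): a symlink or junction exists regardless of its target.
        return "exists"
    return "traverse"
```

Under this model, a symlink is answered immediately for `lexists()` but needs traversal for `exists()`, while a reparse point that is not a name surrogate (an app execution alias, for instance) needs traversal in both cases.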
Co-authored-by: Eryk Sun <[email protected]>
Are you happy with the new implementation, or do you have more ideas?
The performance for existent files has already greatly improved. Nice job!
That's all I have for now.
At first glance it looks correct, although I am not happy that we added so much complicated code purely to optimize functions that should not be used in performance-critical code (os.scandir() should be used instead).
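For context, the os.scandir() alternative mentioned here might look roughly like the sketch below. The helper `existing_names` and the `demo` wrapper are hypothetical; the sketch assumes all candidates live in one directory and that exact name matching is acceptable.

```python
import os
import tempfile


def existing_names(directory, candidates):
    """Return the candidate names that are entries of directory.

    A single os.scandir() pass replaces one existence check per name.
    Entries are matched by exact name, and a dangling symlink counts as
    present, which mirrors lexists() rather than exists().
    """
    wanted = set(candidates)
    with os.scandir(directory) as entries:
        return {entry.name for entry in entries if entry.name in wanted}


def demo():
    """Create two files in a scratch directory and look up three names."""
    with tempfile.TemporaryDirectory() as scratch:
        for name in ("a.txt", "b.txt"):
            open(os.path.join(scratch, name), "w").close()
        return existing_names(scratch, ["a.txt", "b.txt", "missing.txt"])
```

This only helps when the paths share a parent directory; for scattered paths a per-path check is still needed, which is where a faster `lexists()` would matter.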
The performance of
A disappointment with these builtin functions is that they can't be leveraged in
Regarding
We can at least use it from I wouldn't mind if we made the |
Co-authored-by: Serhiy Storchaka <[email protected]>
cc @zooba |
So I'm hesitant to take this for three reasons (and these do apply to previous enhancements as well, but didn't exist at that time):
If someone can show a scenario where you would have a significant (hundreds+) list of paths, need to check whether they exist, but couldn't use one of
I don't think we can really reduce the amount of code. If it happened to be shorter and easier to follow then I'd be less concerned about long-term maintenance, but I'm pretty sure it's as good as it gets (without adding indirection and hurting the performance again - same tradeoff we made with the earlier
On this specifically, one example would be globbing for Lines 508 to 520 in a7711a2
Note that glob results can include dangling symlinks, hence |
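The point about dangling symlinks can be checked with a short experiment. This is a sketch only: `demo_dangling_glob` is a hypothetical helper, and creating symlinks may need extra privileges on Windows.

```python
import glob
import os
import tempfile


def demo_dangling_glob():
    """Glob a directory whose only entry is a symlink to a missing target."""
    with tempfile.TemporaryDirectory() as scratch:
        link = os.path.join(scratch, "dangling")
        # The target is never created, so the link dangles.
        os.symlink(os.path.join(scratch, "no-such-target"), link)
        matches = glob.glob(os.path.join(scratch, "*"))
        return {
            "globbed": [os.path.basename(match) for match in matches],
            "exists": os.path.exists(link),    # follows the link
            "lexists": os.path.lexists(link),  # does not follow the link
        }
```

The glob result includes the link even though `os.path.exists()` reports it as missing, so a check that has to agree with glob output needs `lexists()` semantics.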
The implementation was consolidated with |
Could we simplify this? It already seems to know if it's a directory of file: Lines 5123 to 5129 in a6b610a
Lines 5220 to 5226 in a6b610a
Yes, the fast path can be simplified. If it's not a reparse point, |
Tracking further in #118755. |
Benchmark
script
ntpath.lexists
#117841