gh-111495: Add PyFile_* CAPI tests #111709
Tests fail on Windows (I have very limited experience with this platform):
Is it correct?
This family has few functions, but they should be tested with many cases.
Because the default encoding on Windows is not UTF-8. Always specify the encoding for text files.
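A minimal sketch of the advice above, assuming a throwaway temporary file in place of the `os_helper.TESTFN` path that the PR's tests use. Passing `encoding="utf-8"` on both the write and the read makes the round trip independent of the platform's default encoding:

```python
import os
import tempfile

text = "text with юникод 统一码\n"

# Temporary path used only for this illustration.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # Explicit encoding on both sides, so the locale default is never consulted.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, encoding="utf-8") as f:
        round_tripped = f.readline()
finally:
    os.remove(path)
```

Without the explicit `encoding`, the same code may fail or produce different text on a system whose default encoding cannot represent these characters.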
@serhiy-storchaka thanks a lot for your detailed review! You are one of the best reviewers I know :)
Address sanitizer build fails with:
Maybe I should use a different string? Suggestions?
LGTM.
It's unrelated to Address Sanitizer. It's just that this CI builds Python in release mode. In release mode, the error handler is only used if the string cannot be decoded (decoding error). In debug mode, the error handler is always checked. You can skip this test if
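The release-versus-debug difference described in this comment can be observed from pure Python. The sketch below is only an illustration: the helper names are hypothetical, and detecting a debug build through `sys.gettotalrefcount` is an assumption about CPython builds, not something taken from the PR.

```python
import sys


def handler_checked_eagerly():
    # Assumption: debug builds expose sys.gettotalrefcount, and Python
    # Development Mode (-X dev) performs the same eager check.
    return hasattr(sys, "gettotalrefcount") or sys.flags.dev_mode


def decode_with_unknown_handler(data):
    # The handler name is deliberately bogus. With decodable input, a release
    # build never looks the handler up, so the call succeeds there.
    try:
        return data.decode("utf-8", "no-such-handler")
    except LookupError:
        return None


result = decode_with_unknown_handler(b"abc")
```

On a release build outside development mode, `result` is expected to be the decoded string; where the handler is checked eagerly, the lookup fails first and the helper returns `None`.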
To reproduce the Address Sanitizer issue, I used:
Sorry, I have not finished the review yet. It is difficult with so many tests, so I may find other issues later.
The main problem is that they incorrectly create non-decodable files. You should use binary files to write them.
It would also be nice to reduce the number of lines where possible.
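One way to create a genuinely non-decodable file, as suggested above, is to write raw bytes in binary mode. This is a sketch only; the temporary path stands in for `os_helper.TESTFN`, and the variable names are made up for the illustration:

```python
import os
import tempfile

# 0xC3 starts a two-byte UTF-8 sequence, but 0x28 is not a continuation byte.
invalid_line = b"\xc3\x28\n"

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # Binary mode writes the bytes unchanged, so the invalid sequence survives.
    with open(path, "wb") as f:
        f.write(invalid_line)
    with open(path, "rb") as f:
        on_disk = f.read()
    # Reading the same file as UTF-8 text now fails to decode.
    try:
        with open(path, encoding="utf-8") as f:
            f.readline()
        decode_failed = False
    except UnicodeDecodeError:
        decode_failed = True
finally:
    os.remove(path)
```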
Lib/test/test_capi/test_file.py
Outdated
def test_name_invalid_utf(self):
    with open(os_helper.TESTFN, "w", encoding="utf-8") as f:
        file_obj = _testcapi.file_from_fd(
            f.fileno(), "abc\xe9", "w",
It is not invalid UTF-8. When you pass the Python string, it is encoded to UTF-8, therefore the C string is always valid UTF-8. You have to pass a bytes object, e.g. b'\xff'. See for example tests for PyDict_GetItemString() or PyObject_GetAttrString().
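The point can be checked without the C API. In this sketch, `is_valid_utf8` is a hypothetical helper written for the illustration; it shows that encoding a Python string always yields valid UTF-8, while a raw bytes object such as `b'\xff'` does not decode:

```python
def is_valid_utf8(data):
    # Hypothetical helper: True if the bytes decode as strict UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "\xe9" in a str literal is the code point U+00E9, not the byte 0xE9.
name_bytes = "abc\xe9".encode("utf-8")

# Likewise "\xc3\x28" is two code points, each encoded separately.
line_bytes = "\xc3\x28\n".encode("utf-8")

raw_bytes = b"\xff"  # never valid in UTF-8
```

Both `name_bytes` and `line_bytes` are well-formed UTF-8, which is why the tests under review never reach the decoding-error path.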
Lib/test/test_capi/test_file.py
Outdated
first_line = "\xc3\x28\n"
with open(os_helper.TESTFN, "w", encoding="utf-8") as f:
    f.writelines([first_line])
Again, it does not create invalid UTF-8.
with open(os_helper.TESTFN, "w", encoding="utf-8") as f:
    f.writelines([first_line, second_line])
Many tests can use StringIO. E.g.
f = io.StringIO('first_line\nsecond_line\n')
I have explicit tests for both file object and io.StringIO:

def test_file_get_multiple_lines(self):
    first_line = "text with юникод 统一码\n"
    second_line = "second line\n"
    with open(os_helper.TESTFN, "w", encoding="utf-8") as f:
        f.writelines([first_line, second_line])
    with open(os_helper.TESTFN, encoding="utf-8") as f:
        self.assertEqual(self.get_line(f, 0), first_line)
        self.assertEqual(self.get_line(f, 0), second_line)

def test_file_get_line_from_file_like(self):
    first_line = "text with юникод 统一码\n"
    second_line = "second line\n"
    contents = io.StringIO(f"{first_line}{second_line}")
    self.assertEqual(self.get_line(contents, 0), first_line)
    self.assertEqual(self.get_line(contents, 0), second_line)
@sobolevn: What's the status of this PR? Do you plan to attempt to address @serhiy-storchaka's latest review?
yes, sure! adding this to my queue.
@serhiy-storchaka @vstinner I partially addressed your review. The only part that I didn't implement is invalid utf8 tests. I want to ask for advice on how it should be done. For example, right now I cannot pass

if (!PyArg_ParseTuple(args, "izzizzzi",
                      &fd,
                      &name, &mode,
                      &buffering,
                      &encoding, &errors, &newline,
                      &closefd)) {
    return NULL;
}

What is the best way to pass

static PyObject *
file_from_fd_with_bytes(PyObject *Py_UNUSED(self), PyObject *args)

and allow passing bytes there?
What are your issues with passing a bytes object?
        raise ValueError("str raised")

with self.assertRaisesRegex(ValueError, "str raised"):
    self.write_and_return(StrRaises(), flags=_testcapi.Py_PRINT_RAW)
It is not clear what the difference between these tests is if it raises in any case. You should either define __str__ and __repr__ that do not raise in the corresponding classes and test both classes with and without Py_PRINT_RAW, or just make both __str__ and __repr__ in the same class raise different exceptions and test that writing with and without Py_PRINT_RAW gives different errors. The former option will duplicate other tests, so I suggest the latter way.
Oh, and you do not need to use write_and_return here.
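The second option might look roughly like the sketch below. It does not call the real C API: `write_object` is a hypothetical pure-Python stand-in that follows the documented behavior of `PyFile_WriteObject()` (write `str()` of the object when `Py_PRINT_RAW` is given, `repr()` otherwise), and the constant is redefined locally for the illustration.

```python
import io

Py_PRINT_RAW = 1  # local stand-in for the C API flag


class StrAndReprRaise:
    # One class whose two conversions fail with distinguishable messages.
    def __str__(self):
        raise ValueError("str raised")

    def __repr__(self):
        raise ValueError("repr raised")


def write_object(obj, file, flags):
    # Hypothetical stand-in: str() with Py_PRINT_RAW, repr() without it.
    text = str(obj) if flags & Py_PRINT_RAW else repr(obj)
    file.write(text)


def error_message(flags):
    try:
        write_object(StrAndReprRaise(), io.StringIO(), flags)
    except ValueError as exc:
        return str(exc)
    return None
```

Because the two messages differ, a test can tell which conversion the flag actually selected, which the original pair of always-raising classes could not show.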
self.assertRaises(AttributeError, self.write, NULL, object(), 0)
self.assertRaises(TypeError, self.write, NULL, NULL, 0)
wr = self.write
self.assertRaises(TypeError, wr, object(), io.BytesIO(), 0)
Use a string instead of object(). It will be clearer what you write and why this fails.
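The reason a string makes the failure obvious can be shown with plain `io` objects. In this sketch, `try_write` is a hypothetical helper for the illustration: a binary stream rejects `str` data with `TypeError`, while a text stream accepts it.

```python
import io


def try_write(target, data):
    # Hypothetical helper: name of the exception raised by write(), or None.
    try:
        target.write(data)
    except TypeError as exc:
        return type(exc).__name__
    return None


binary_result = try_write(io.BytesIO(), "some text")  # str into a bytes stream
text_result = try_write(io.StringIO(), "some text")   # str into a text stream
```

With `object()` the reader has to work out that its repr is a `str` before the `TypeError` makes sense; with a literal string the mismatch against `io.BytesIO()` is visible at a glance.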
Oh no, I did it again :-( I forgot about this PR and I wrote a new one (that I just merged): #129449. Sorry about that. It seems like this PR has more tests.
@vstinner thanks a lot for your PR, I forgot about that one several times already :) You can port some of the tests from here to your version if it helps.
I will try to add tests from this PR.
Looks like PyFile_SetOpenCodeHook is already tested here: cpython/Programs/_testembed.c, lines 1177 to 1232 in 20cfab9