gh-124188: Fix PyErr_ProgramTextObject() #124189
Conversation
* Detect source file encoding.
* Use the "replace" error handler even for UTF-8 (default) encoding.
* Remove the BOM.
* Fix detection of too long lines if they contain NUL.
* Return the head rather than the tail for truncated long lines.
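As a rough illustration of the effect (a hedged sketch, not the PR's own test code; the file name and the cp1251 example are made up): a script whose source uses a non-UTF-8 coding declaration should now have its failing line decoded correctly when the interpreter prints the traceback.

# Hypothetical demo, not from the PR: the C-level traceback printer should
# now honor the coding declaration when echoing the source line.
import subprocess
import sys

source = '# -*- coding: cp1251 -*-\nassert 1 > 2, "повідомлення"\n'
with open('cp1251_assert.py', 'wb') as f:
    f.write(source.encode('cp1251'))

proc = subprocess.run([sys.executable, 'cp1251_assert.py'],
                      capture_output=True)
# stderr is expected to contain the decoded assert line rather than mojibake.
print(proc.stderr.decode('utf-8', 'replace'))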
if isinstance(source, str):
    with open(TESTFN, 'w', encoding='utf-8') as testfile:
        testfile.write(dedent(source))
else:
    with open(TESTFN, 'wb') as testfile:
        testfile.write(source)
Should this be replaced by a call to script_helper.make_script()?
I considered this option, but I doubt that it would make the code simpler. This code calls dedent() for string sources, so you need branching in any case. The rest of make_script() is not useful here: you would need to pass os.curdir as the first argument, the .py suffix is not relevant here, and neither is the importlib cache invalidation.
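For comparison, here is a hedged sketch of the two options being weighed; only script_helper.make_script() and TESTFN come from the stdlib test support, the two wrapper functions are hypothetical.

# Sketch for illustration only; the PR keeps the first, inline variant.
import os
from textwrap import dedent
from test.support.os_helper import TESTFN
from test.support.script_helper import make_script

def write_source_inline(source):
    # Variant used in the PR: branch on the source type and dedent str sources.
    if isinstance(source, str):
        with open(TESTFN, 'w', encoding='utf-8') as testfile:
            testfile.write(dedent(source))
    else:
        with open(TESTFN, 'wb') as testfile:
            testfile.write(source)

def write_source_via_make_script(source):
    # Suggested alternative: the same branching is still needed, plus a
    # directory argument; make_script() also appends a .py suffix and
    # invalidates importlib caches, neither of which matters here.
    if isinstance(source, str):
        return make_script(os.curdir, TESTFN, dedent(source))
    with open(TESTFN, 'wb') as testfile:
        testfile.write(source)
    return TESTFN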
} while (*pLastChar != '\0' && *pLastChar != '\n');
const char *line = linebuf;
/* Skip BOM. */
if (lineno == 1 && line_size >= 3 && memcmp(line, "\xef\xbb\xbf", 3) == 0) {
Maybe #define BOM_PREFIX "\xef\xbb\xbf"?
This string only occurs once in the sources. I do not think that defining a name for it (potentially in a place far from its use) would make the code clearer. If the comment is not enough, suggest a better one.
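For context, a hedged Python-level illustration of the same BOM-skip idea (not the PR's code), using the stdlib codecs.BOM_UTF8 constant instead of a hard-coded literal:

# Python illustration of the check above: only line 1 may carry a UTF-8 BOM.
import codecs

def strip_bom(line: bytes, lineno: int) -> bytes:
    if lineno == 1 and line.startswith(codecs.BOM_UTF8):   # b'\xef\xbb\xbf'
        return line[len(codecs.BOM_UTF8):]
    return line

assert strip_bom(b'\xef\xbb\xbfimport sys\n', 1) == b'import sys\n'
assert strip_bom(b'import sys\n', 2) == b'import sys\n'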
@@ -1960,20 +1960,41 @@ PyErr_ProgramText(const char *filename, int lineno)
    return res;
}

/* Function from Parser/tokenizer/file_tokenizer.c */
extern char* _PyTokenizer_FindEncodingFilename(int, PyObject *);

PyObject *
_PyErr_ProgramDecodedTextObject(PyObject *filename, int lineno, const char* encoding)
should this be _PyErr_DecodedProgramTextObject?
This already was here.
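For readers following along, a hedged Python-level analogue of what this C helper is meant to do (detect the declared encoding or BOM, then decode the requested line with a forgiving error handler); this is an illustration, not the PR's implementation:

# Rough Python equivalent of the C-level line extraction (illustrative only).
import io
from tokenize import detect_encoding

def program_text(filename, lineno, encoding=None):
    with open(filename, 'rb') as fp:
        if encoding is None:
            # Reads the coding declaration and/or BOM from the first lines;
            # a BOM yields 'utf-8-sig', so it is stripped on decoding.
            encoding, _ = detect_encoding(fp.readline)
            fp.seek(0)
        text = io.TextIOWrapper(fp, encoding=encoding, errors='replace')
        for current, line in enumerate(text, start=1):
            if current == lineno:
                return line
    return None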
@@ -2052,11 +2057,32 @@ def test_assertion_error_location(self):
                'AssertionError',
            ],
        ),
-       ('assert 1 > 2, "message"',
+       ('assert 1 > 2, "messäge"',
These tests are not relevant to compiler errors. I updated/added them just to test that tracebacks for other errors also extract non-ASCII lines correctly.
A test fails on WASI:
Oh, it's a bug in unittest. It calls str() on a bytes object indirectly. Workaround:

diff --git a/Lib/unittest/case.py b/Lib/unittest/case.py
index 55c79d35353..4e00267e77c 100644
--- a/Lib/unittest/case.py
+++ b/Lib/unittest/case.py
@@ -1455,6 +1455,8 @@ class _SubTest(TestCase):

     def __init__(self, test_case, message, params):
         super().__init__()
+        if message is not _subtest_msg_sentinel and not isinstance(message, str):
+            message = repr(message)
         self._message = message
         self.test_case = test_case
         self.params = params
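For what it's worth, a hypothetical minimal reproduction of that unittest issue (assumed, not taken from the discussion; it presumes the test run uses python -bb, which turns BytesWarning into an error): a bytes subtest message gets formatted into the subtest description on failure, which implicitly calls str() on the bytes object.

# repro_subtest_bytes.py -- hypothetical repro, run as:
#   python -bb -m unittest repro_subtest_bytes
import unittest

class Repro(unittest.TestCase):
    def test_bytes_message(self):
        # When this subtest fails, unittest formats the bytes message into the
        # description, implicitly calling str() on bytes; under -bb that
        # BytesWarning becomes an error.
        with self.subTest(b'\xef\xbb\xbf'):
            self.fail("boom")

if __name__ == "__main__":
    unittest.main()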
Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
Sorry, @serhiy-storchaka, I could not cleanly backport this to
GH-124423 is a backport of this pull request to the 3.13 branch.
GH-124426 is a backport of this pull request to the 3.12 branch.
That's a big piece of work. While the _PyErr_ProgramDecodedTextObject() change is short, you added tons of tests for it: thanks.
Because it contained tons of bugs. Not all tests were backported to 3.12, because they exposed other bugs that are not fixed in 3.12.