Skip to content

gh-124188: Fix PyErr_ProgramTextObject() #124189

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Sep 24, 2024

Conversation

serhiy-storchaka
Copy link
Member

@serhiy-storchaka serhiy-storchaka commented Sep 17, 2024

  • Detect source file encoding.
  • Use the "replace" error handler even for UTF-8 (default) encoding.
  • Remove the BOM.
  • Fix detection of too long lines if they contain NUL.
  • Return the head rather than the tail for truncated long lines.

* Detect source file encoding.
* Use the "replace" error handler even for UTF-8 (default) encoding.
* Remove the BOM.
* Fix detection of too long lines if they contain NUL.
* Return the head rather than the tail for truncated long lines.
@serhiy-storchaka serhiy-storchaka added 3.12 only security fixes 3.13 bugs and security fixes 3.14 bugs and security fixes labels Sep 17, 2024
Comment on lines +2016 to +2021
if isinstance(source, str):
with open(TESTFN, 'w', encoding='utf-8') as testfile:
testfile.write(dedent(source))
else:
with open(TESTFN, 'wb') as testfile:
testfile.write(source)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be replaced by a call to script_helper.make_script"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered this option, but I have doubt that this would make the code simpler. This code calls dedent() for string source, so you need a branching in any case. The rest of make_script() is not useful here. You would need to pass os.curdir as the first argument, the .py suffix is not relevant here, as well as the importlib cache invalidation.

} while (*pLastChar != '\0' && *pLastChar != '\n');
const char *line = linebuf;
/* Skip BOM. */
if (lineno == 1 && line_size >= 3 && memcmp(line, "\xef\xbb\xbf", 3) == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe #define BOM_PREFIX "\xef\xbb\xbf"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This string is only occurs once in the sources. I do not think that defining a name for it (potentially in a place far from its use) would make the code more clearer. If the comment is not enough, suggest better comment.

@@ -1960,20 +1960,41 @@ PyErr_ProgramText(const char *filename, int lineno)
return res;
}

/* Function from Parser/tokenizer/file_tokenizer.c */
extern char* _PyTokenizer_FindEncodingFilename(int, PyObject *);

PyObject *
_PyErr_ProgramDecodedTextObject(PyObject *filename, int lineno, const char* encoding)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be _PyErr_DecodedProgramTextObject?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This already was here.

Comment on lines +2016 to +2021
if isinstance(source, str):
with open(TESTFN, 'w', encoding='utf-8') as testfile:
testfile.write(dedent(source))
else:
with open(TESTFN, 'wb') as testfile:
testfile.write(source)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered this option, but I have doubt that this would make the code simpler. This code calls dedent() for string source, so you need a branching in any case. The rest of make_script() is not useful here. You would need to pass os.curdir as the first argument, the .py suffix is not relevant here, as well as the importlib cache invalidation.

@@ -2052,11 +2057,32 @@ def test_assertion_error_location(self):
'AssertionError',
],
),
('assert 1 > 2, "message"',
('assert 1 > 2, "messäge"',
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are not relevant to compiler errors. I updated/added them just to test that tracebacks for other errors also extract non-ASCII lines correctly.

@@ -1960,20 +1960,41 @@ PyErr_ProgramText(const char *filename, int lineno)
return res;
}

/* Function from Parser/tokenizer/file_tokenizer.c */
extern char* _PyTokenizer_FindEncodingFilename(int, PyObject *);

PyObject *
_PyErr_ProgramDecodedTextObject(PyObject *filename, int lineno, const char* encoding)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This already was here.

} while (*pLastChar != '\0' && *pLastChar != '\n');
const char *line = linebuf;
/* Skip BOM. */
if (lineno == 1 && line_size >= 3 && memcmp(line, "\xef\xbb\xbf", 3) == 0) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This string is only occurs once in the sources. I do not think that defining a name for it (potentially in a place far from its use) would make the code more clearer. If the comment is not enough, suggest better comment.

@vstinner
Copy link
Member

A test fails on WASI:

ERROR: test_assertion_error_location (test.test_exceptions.AssertionErrorTests.test_assertion_error_location)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Lib/test/test_exceptions.py", line 2124, in test_assertion_error_location
    result = run_script(source)
  File "/Lib/test/test_exceptions.py", line 2022, in run_script
    _rc, _out, err = script_helper.assert_python_failure('-Wd', '-X', 'utf8', TESTFN)
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Lib/test/support/script_helper.py", line 193, in assert_python_failure
    return _assert_python(False, *args, **env_vars)
  File "/Lib/unittest/case.py", line 156, in skip_wrapper
    raise SkipTest(reason)
unittest.case.SkipTest: requires subprocess support

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Lib/test/support/__init__.py", line 2814, in wrapper
    return func(*args, **kwargs)
  File "/Lib/test/test_exceptions.py", line 2123, in test_assertion_error_location
    with self.subTest(source):
         ~~~~~~~~~~~~^^^^^^^^
  File "/Lib/contextlib.py", line 162, in __exit__
    self.gen.throw(value)
    ~~~~~~~~~~~~~~^^^^^^^
  File "/Lib/unittest/case.py", line 555, in subTest
    with self._outcome.testPartExecutor(self._subtest, subTest=True):
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Lib/contextlib.py", line 162, in __exit__
    self.gen.throw(value)
    ~~~~~~~~~~~~~~^^^^^^^
  File "/Lib/unittest/case.py", line 63, in testPartExecutor
    _addSkip(self.result, test_case, str(e))
    ~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Lib/unittest/case.py", line 89, in _addSkip
    addSkip(test_case, reason)
    ~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "/Lib/test/libregrtest/testresult.py", line 123, in addSkip
    super().addSkip(test, reason)
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "/Lib/unittest/runner.py", line 119, in addSkip
    self._write_status(test, "skipped {0!r}".format(reason))
    ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Lib/unittest/runner.py", line 71, in _write_status
    self.stream.write(self.getDescription(test))
                      ~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/Lib/unittest/runner.py", line 54, in getDescription
    return str(test)
  File "/Lib/unittest/case.py", line 1487, in __str__
    return "{} {}".format(self.test_case, self._subDescription())
                                          ~~~~~~~~~~~~~~~~~~~~^^
  File "/Lib/unittest/case.py", line 1469, in _subDescription
    parts.append("[{}]".format(self._message))
                 ~~~~~~~~~~~~~^^^^^^^^^^^^^^^
BytesWarning: str() on a bytes instance

@vstinner
Copy link
Member

Oh, it's a bug in unittest. It calls str() on a bytes object indirectly. Workaround:

diff --git a/Lib/unittest/case.py b/Lib/unittest/case.py
index 55c79d35353..4e00267e77c 100644
--- a/Lib/unittest/case.py
+++ b/Lib/unittest/case.py
@@ -1455,6 +1455,8 @@ class _SubTest(TestCase):
 
     def __init__(self, test_case, message, params):
         super().__init__()
+        if message is not _subtest_msg_sentinel and not isinstance(message, str):
+            message = repr(message)
         self._message = message
         self.test_case = test_case
         self.params = params

@serhiy-storchaka serhiy-storchaka added needs backport to 3.12 only security fixes needs backport to 3.13 bugs and security fixes and removed 3.12 only security fixes 3.13 bugs and security fixes 3.14 bugs and security fixes labels Sep 24, 2024
@serhiy-storchaka serhiy-storchaka merged commit e2f7107 into python:main Sep 24, 2024
42 checks passed
@miss-islington-app
Copy link

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

@serhiy-storchaka serhiy-storchaka deleted the err-programtext branch September 24, 2024 08:01
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Sep 24, 2024
* Detect source file encoding.
* Use the "replace" error handler even for UTF-8 (default) encoding.
* Remove the BOM.
* Fix detection of too long lines if they contain NUL.
* Return the head rather than the tail for truncated long lines.
(cherry picked from commit e2f7107)

Co-authored-by: Serhiy Storchaka <[email protected]>
@miss-islington-app
Copy link

Sorry, @serhiy-storchaka, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker e2f710792b0418b8ca1ca3b8cdf39588c7268495 3.12

@bedevere-app
Copy link

bedevere-app bot commented Sep 24, 2024

GH-124423 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Sep 24, 2024
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this pull request Sep 24, 2024
* Detect source file encoding.
* Use the "replace" error handler even for UTF-8 (default) encoding.
* Remove the BOM.
* Fix detection of too long lines if they contain NUL.
* Return the head rather than the tail for truncated long lines.
(cherry picked from commit e2f7107)

Co-authored-by: Serhiy Storchaka <[email protected]>
@bedevere-app
Copy link

bedevere-app bot commented Sep 24, 2024

GH-124426 is a backport of this pull request to the 3.12 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.12 only security fixes label Sep 24, 2024
serhiy-storchaka added a commit that referenced this pull request Sep 24, 2024
* Detect source file encoding.
* Use the "replace" error handler even for UTF-8 (default) encoding.
* Remove the BOM.
* Fix detection of too long lines if they contain NUL.
* Return the head rather than the tail for truncated long lines.
(cherry picked from commit e2f7107)
@vstinner
Copy link
Member

That's a big piece of work. While _PyErr_ProgramDecodedTextObject() change is short, you added tons of tests for that: thanks.

@serhiy-storchaka
Copy link
Member Author

Because it contained tons of bugs.

Not all tests were backported to 3.12, because they exposed other bugs, not fixed in 3.12.

@serhiy-storchaka serhiy-storchaka removed their assignment Sep 29, 2024
serhiy-storchaka added a commit that referenced this pull request Oct 7, 2024
* Detect source file encoding.
* Use the "replace" error handler even for UTF-8 (default) encoding.
* Remove the BOM.
* Fix detection of too long lines if they contain NUL.
* Return the head rather than the tail for truncated long lines.
(cherry picked from commit e2f7107)

Co-authored-by: Serhiy Storchaka <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants