Skip to content

gh-123682: Fix Unicode category check #123683

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

SnoopJ
Copy link
Contributor

@SnoopJ SnoopJ commented Sep 4, 2024

This changeset fixes the faulty predicate identified by #123682 when testing if an input character is in one of the Unicode Other categories

I assume this doesn't need a news fragment, but if I'm wrong, let me know and I'll add one.
(Sorry about the force-push chatter, I started this branch from an unrelated commit 😬)

@SnoopJ SnoopJ force-pushed the bugfix/gh-123682_tweak-pyrepl-unicodedata-usage branch from 495bb25 to 52fa7be Compare September 4, 2024 16:42
@SnoopJ SnoopJ force-pushed the bugfix/gh-123682_tweak-pyrepl-unicodedata-usage branch from 52fa7be to 0ab70cf Compare September 4, 2024 16:42
vstinner
vstinner previously approved these changes Sep 5, 2024
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

cc @ambv

@vstinner vstinner added needs backport to 3.13 bugs and security fixes topic-repl Related to the interactive shell labels Sep 5, 2024
@vstinner
Copy link
Member

vstinner commented Sep 6, 2024

Ah, you have to update test_pyrepl:

ERROR: test_control_character (test.test_pyrepl.test_pyrepl.TestPyReplOutput.test_control_character)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/test/test_pyrepl/test_pyrepl.py", line 657, in test_control_character
    output = multiline_input(reader)
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/test/test_pyrepl/support.py", line 18, in multiline_input
    return reader.readline()
           ~~~~~~~~~~~~~~~^^
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/_pyrepl/reader.py", line 786, in readline
    self.handle1()
    ~~~~~~~~~~~~^^
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/_pyrepl/reader.py", line 743, in handle1
    event = self.console.get_event(block)
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/test/test_pyrepl/support.py", line 123, in get_event
    return next(self.events)
StopIteration

======================================================================
FAIL: test_control_characters (test.test_pyrepl.test_reader.TestReader.test_control_characters)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/test/test_pyrepl/test_reader.py", line 126, in test_control_characters
    self.assert_screen_equals(reader, 'flag = "🏳️\\u200d🌈"')
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/test/test_pyrepl/test_reader.py", line 16, in assert_screen_equals
    self.assertListEqual(actual, expected)
    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
AssertionError: Lists differ: ['flag = "🏳️🌈"'] != ['flag = "🏳️\\u200d🌈"']

First differing element 0:
'flag = "🏳️🌈"'
'flag = "🏳️\\u200d🌈"'

- ['flag = "🏳️🌈"']
+ ['flag = "🏳️\\u200d🌈"']
?             +++++++

@SnoopJ
Copy link
Contributor Author

SnoopJ commented Sep 6, 2024

Oops! Hmm, I think that needs more attention than just in the tests.

The second failure is because the ZWJ is in category Cf and is being erroneously treated as invalid. If I change the predicate to test only for Cc membership, that passes.

The first failure looks more serious. The proximate cause is (I think) the missing newline (also in Cc) but an ad-hoc tweak to the predicate to try and let the newline pass through still fails.

I think overall this means that a category check might not be the right way to express the intent of the original code. @pablogsal, do you mind chiming in here? My understanding of this check is that it's meant to filter out control characters like what's in C0 and C1, but it seems like some of those characters are not wanted in this predicate, so just testing ord(key) <= 0x1F wouldn't be appropriate either. The \x1d in the test should definitely pass through, for instance.

@vstinner vstinner dismissed their stale review September 8, 2024 11:20

tests fail, the current change looks wrong in fact

@serhiy-storchaka serhiy-storchaka added the needs backport to 3.14 bugs and security fixes label May 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting review needs backport to 3.13 bugs and security fixes needs backport to 3.14 bugs and security fixes topic-repl Related to the interactive shell
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants