gh-98401: Invalid escape sequences emits SyntaxWarning #99011
Previous attempt in 2018:
I tested the Python test suite with SyntaxWarning treated as error: it does pass.
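A minimal sketch of the same idea on a single snippet, not the command used for the test suite run above: escalate warnings to errors and see whether a piece of source still compiles. The helper name `compiles_without_warnings` and the sample sources are hypothetical. All warnings are escalated, rather than only SyntaxWarning, so the check behaves the same on interpreters that still report DeprecationWarning.

```python
import warnings


def compiles_without_warnings(source: str) -> bool:
    """Return True if `source` compiles when every warning is an error."""
    with warnings.catch_warnings():
        # Escalate all warnings so the result does not depend on whether the
        # interpreter reports DeprecationWarning (older) or SyntaxWarning.
        warnings.simplefilter("error")
        try:
            compile(source, "<example>", "exec")
        except (SyntaxError, Warning):
            return False
    return True


bad = 'pattern = "\\d+"\n'    # source text contains the invalid escape \d
good = 'pattern = r"\\d+"\n'  # same pattern written as a raw string

print(compiles_without_warnings(bad))
print(compiles_without_warnings(good))
```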
This issue mostly hit code defining regular expressions. Example in BeautifulSoup 3.2.2:
Examples of code, re.compile() calls:
Lib/test/test_codecs.py
Outdated
  for i in range(97, 123):
      b = bytes([i])
      if b not in b'abfnrtvx':
-         with self.assertWarns(DeprecationWarning):
+         with self.assertWarns(SyntaxWarning):
SyntaxWarning is not related to codecs. It only should be emitted by the compiler.
Which warning should be emitted if SyntaxWarning is not the best choice? UnicodeWarning?
Does UnicodeWarning make sense for codecs.escape_decode()?
DeprecationWarning, as for all other deprecated features.
Did you remove all pyc files and regenerate frozen modules before tests? Also try to regenerate generated code using the new Python binary as PYTHON_FOR_REGEN.
I ran |
Ok, I updated my PR: Use SyntaxWarning for invalid octal sequence.
A backslash-character pair that is not a valid escape sequence now generates a SyntaxWarning, instead of DeprecationWarning. For example, re.compile("\d+\.\d+") now emits a SyntaxWarning ("\d" is an invalid escape sequence), use raw strings for regular expression: re.compile(r"\d+\.\d+"). In a future Python version, SyntaxError will eventually be raised, instead of SyntaxWarning.

Octal escapes with value larger than 0o377 (ex: "\477"), deprecated in Python 3.11, now produce a SyntaxWarning, instead of DeprecationWarning. In a future Python version they will be eventually a SyntaxError.

codecs.escape_decode() and codecs.unicode_escape_decode() are left unchanged: they still emit DeprecationWarning.

* The parser only emits SyntaxWarning for Python 3.12 (feature version), and still emits DeprecationWarning on older Python versions.
* Fix SyntaxWarning by using raw strings in Tools/c-analyzer/ and wasm_build.py.
Hum, I messed up my PR so I squashed two commits and I fixed my PR:
I also mentioned the future conversion to SyntaxError in the docs (What's New / NEWS entries).
@mdickinson @serhiy-storchaka @hugovk: Would you mind reviewing my PR?
Did you remove all pyc files? find -name '*.py[co]' -exec rm -rf '{}' +
Let me try these commands:
The last command fails as expected with:
Oops, test_string_literals didn't work when run with I ran the test suite with:
Note: I would prefer to run the whole test suite with |
Fixes this warning experienced with Python 3.12 (python/cpython#98401, python/cpython#99011):

faa_cs_aan.py:690: SyntaxWarning: invalid escape sequence '\.'
  email_match = re.search('For Inquiries: ([0-9a-z._-]+@[0-9a-z.-]+)\.?$',
Use r-strings for all regular expressions.

Fixes these warnings experienced with Python 3.12 (python/cpython#98401, python/cpython#99011, https://docs.python.org/3/whatsnew/3.12.html#other-language-changes point 2):

run_tests.py:200: SyntaxWarning: invalid escape sequence '\d'
  FINAL_LINE_RE = re.compile('status=(\d+)$')
run_tests.py:441: SyntaxWarning: invalid escape sequence '\*'
  re.match('^\* daemon .+ \*$', line) or line == ''):

Change-Id: I71ddfb1a2ca62654378ae67a99e9aeb4ce7b7394
Reviewed-on: https://chromium-review.googlesource.com/c/crashpad/crashpad/+/6254063
Commit-Queue: Mark Mentovai <[email protected]>
Reviewed-by: Nico Weber <[email protected]>
All other regular expressions in the file already used r-strings, but this newer one added in beba95a did not.

Fixes this warning experienced with Python ≥3.12 (python/cpython#98401, python/cpython#99011, python/cpython@a60ddd31be7f):

bisect-builds.py:1260: SyntaxWarning: invalid escape sequence '\.'
  if not re.search("\.apks?$", apk_name):

Change-Id: Ia8f1981efde3b14d5c36c3dfc2bca7737fd4cb89
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6647645
Commit-Queue: Mark Mentovai <[email protected]>
Reviewed-by: Kuan Huang <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1474605}
Since 61fad56 (https://chromium-review.googlesource.com/c/5848450, 2024-09-11), autoninja under Python 3.12 presents these warnings:

```
…/autoninja.py:73: SyntaxWarning: invalid escape sequence '\s'
  m = re.match('instance\s*=\s*projects/([^/]*)/instances/.*', line)
…/autoninja.py:92: SyntaxWarning: invalid escape sequence '\s'
  m = re.match('SISO_PROJECT=\s*(\S*)\s*', line)
```

This warning appears in Python 3.12 ([1], [2], [3]). '\s' and '\S' are not valid escape sequences in strings. r'\s' and r'\S' are valid in regular expressions, but outside of raw strings, they would need to be written as '\\s' and '\\S'. There is no reason to not use raw strings in this case, so the new regular expression pattern strings introduced in 61fad56 are changed to raw strings.

[1] https://docs.python.org/3/whatsnew/3.12.html#:~:text=A%20backslash%2Dcharacter%20pair%20that%20is%20not%20a%20valid%20escape%20sequence%20now%20generates%20a%20SyntaxWarning%2C%20instead%20of%20DeprecationWarning.
[2] python/cpython#98401
[3] python/cpython#99011

Bug: b/364318216
Change-Id: I0f237976fe9c39208541ae78205f5bdbf126fa82
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/tools/depot_tools/+/5859159
Commit-Queue: Mark Mentovai <[email protected]>
Reviewed-by: Philipp Wollermann <[email protected]>
Auto-Submit: Mark Mentovai <[email protected]>
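The equivalence described in that commit message can be checked directly. This is a small sketch, not the actual autoninja code: the pattern is the one quoted in the warning, and the sample input line is made up for illustration.

```python
import re

# A raw string keeps the backslash as-is, so it equals the doubled form.
raw_equals_escaped = r"\s" == "\\s" and r"\S" == "\\S"

# Pattern from the warning above, rewritten as a raw string.
pattern = re.compile(r"SISO_PROJECT=\s*(\S*)\s*")

# Hypothetical input line, for illustration only.
match = pattern.match("SISO_PROJECT= my-project  ")
project = match.group(1)

print(raw_equals_escaped, project)
```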
A backslash-character pair that is not a valid escape sequence now generates a SyntaxWarning, instead of DeprecationWarning. For example, re.compile("\d+\.\d+") now emits a SyntaxWarning ("\d" is an invalid escape sequence), use raw strings for regular expression: re.compile(r"\d+\.\d+"). In a future Python version, SyntaxError will eventually be raised, instead of SyntaxWarning.
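To see which category a given interpreter reports, one option is to compile a snippet while recording warnings. This is a sketch under assumptions: the helper name `escape_warning_categories` is hypothetical, and the category should be SyntaxWarning on Python 3.12 and later and DeprecationWarning on earlier 3.x versions that warn at all.

```python
import warnings


def escape_warning_categories(source: str) -> list:
    """Compile `source` and return the categories of the warnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let default filters hide anything
        compile(source, "<example>", "exec")
    return [w.category for w in caught]


# The source text passed to compile() contains "\d+\.\d+" in a normal string.
plain = 'import re\nre.compile("\\d+\\.\\d+")\n'
# The same pattern written as a raw string.
raw = 'import re\nre.compile(r"\\d+\\.\\d+")\n'

print([c.__name__ for c in escape_warning_categories(plain)])
print(escape_warning_categories(raw))
```

The raw-string version compiles without any warning, which is why the fix recommended throughout this thread is to add the `r` prefix.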
Octal escapes with value larger than 0o377 (ex: "\477"), deprecated in Python 3.11, now produce a SyntaxWarning, instead of DeprecationWarning. In a future Python version they will be eventually a SyntaxError.
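The octal case can be observed the same way. This is a sketch, assuming an interpreter where "\477" is still accepted: the string value is the code point 0o477, and the recorded warning list may be empty on versions older than 3.11, where this escape was not yet deprecated.

```python
import warnings

source = 's = "\\477"\n'  # the compiled source text contains the escape \477

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    namespace = {}
    exec(compile(source, "<example>", "exec"), namespace)

# In a str literal the escape evaluates to code point 0o477 (decimal 319).
value_matches = namespace["s"] == chr(0o477)
category_names = [w.category.__name__ for w in caught]

print(value_matches, category_names)
```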
codecs.escape_decode() and codecs.unicode_escape_decode() are left unchanged: they still emit DeprecationWarning.
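A short sketch of the unchanged codecs behaviour: both decoders pass the invalid escape through and report it at runtime rather than at compile time. Note that `codecs.escape_decode()` is an undocumented helper, so treat its use here as illustrative only.

```python
import codecs
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The input is the two bytes backslash + "d", an invalid escape sequence.
    decoded, consumed = codecs.escape_decode(b"\\d")
    text, _ = codecs.unicode_escape_decode(b"\\d")

category_names = [w.category.__name__ for w in caught]
print(decoded, consumed, text, category_names)
```

Because the bytes are decoded at runtime and not parsed by the compiler, the category stays DeprecationWarning, as requested in the review thread above.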