gh-129173: simplify `PyCodec_XMLCharRefReplaceErrors` logic #129894

picnixz · 2025-02-09T12:52:40Z

Issue: Refactor codecs error handlers to use _PyUnicodeError_GetParams and extract complex logic into separate functions #129173

picnixz · 2025-02-09T13:09:28Z

Huh, I probably messed up my index somewhere. Will fix it later or tomorrow.

Python/codecs.c

encukou · 2025-02-26T12:39:47Z

I filed my suggestion as https://github.com/picnixz/cpython/pull/1/files

picnixz · 2025-03-01T09:27:05Z

I've applied your suggestion and tweaked it a bit. By the way, I noticed that I forgot to remove an "object is ready" comment. Would you mind if, after this PR and the following one for backslashreplace, I went through the code base to remove the unnecessary related comments? There are a few places saying that some Unicode object is "ready".

encukou · 2025-03-03T09:20:50Z

That's a style change, which we generally only do when touching neighboring code.
On the other hand, the comments will become increasingly confusing as PyUnicode_READY/PyUnicode_WCHAR_KIND are forgotten and people no longer associate “ready” with them.

How many are there? I'd need to review each of those removals in context.

Python/codecs.c

picnixz · 2025-03-03T09:56:08Z

> How many are there? I'd need to review each of those removals in context.

Apart from those already in codecs.c that I forgot to remove, not many:

cpython/Modules/unicodedata.c

Lines 594 to 596 in a85eeb9

    
```c
/* result is guaranteed to be ready, as it is compact. */
kind = PyUnicode_KIND(result);
data = PyUnicode_DATA(result);
```

cpython/Modules/unicodedata.c

Lines 655 to 661 in a85eeb9

    
```c
result = nfd_nfkd(self, input, k);
if (!result)
    return NULL;
/* result will be "ready". */
kind = PyUnicode_KIND(result);
data = PyUnicode_DATA(result);
len = PyUnicode_GET_LENGTH(result);
```

cpython/Modules/_io/textio.c

Lines 357 to 363 in a85eeb9

    
```c
kind = PyUnicode_KIND(modified);
out = PyUnicode_DATA(modified);
PyUnicode_WRITE(kind, out, 0, '\r');
memcpy(out + kind, PyUnicode_DATA(output), kind * output_len);
Py_SETREF(output, modified); /* output remains ready */
self->pendingcr = 0;
output_len++;
```

cpython/Modules/_io/textio.c

Lines 1821 to 1824 in a85eeb9

    
```c
/* decoded_chars is guaranteed to be "ready". */
avail = (PyUnicode_GET_LENGTH(self->decoded_chars)
         - self->decoded_chars_used);
```

cpython/Parser/lexer/lexer.c

Lines 311 to 315 in a85eeb9

    
```c
/* Verify that the identifier follows PEP 3131.
   All identifier strings are guaranteed to be "ready" unicode objects.
 */
static int
verify_identifier(struct tok_state *tok)
```

cpython/Parser/pegen.c

Lines 505 to 513 in a85eeb9

    
```c
PyObject *
_PyPegen_new_identifier(Parser *p, const char *n)
{
    PyObject *id = PyUnicode_DecodeUTF8(n, (Py_ssize_t)strlen(n), NULL);
    if (!id) {
        goto error;
    }
    /* PyUnicode_DecodeUTF8 should always return a ready string. */
    assert(PyUnicode_IS_READY(id));
```

cpython/Python/tracemalloc.c

Lines 252 to 259 in a85eeb9

    
```c
    if (!PyUnicode_IS_READY(filename)) {
        /* Don't make a Unicode string ready to avoid reentrant calls
           to tracemalloc_alloc() or tracemalloc_realloc() */
#ifdef TRACE_DEBUG
        tracemalloc_error("filename is not a ready unicode string");
#endif
        return;
    }
```

The tracemalloc one seems to be dead code.

encukou

Thanks!

encukou · 2025-03-03T10:59:22Z

You might want to split the comment removals into 5 PRs (each to be seen by different subject experts), and combine them with removing the remaining calls to PyUnicode_IS_READY.

Careful, the one in unicodeobject.c refers to a different kind of "ready", PyUnicode_Type.tp_flags & Py_TPFLAGS_READY.

picnixz · 2025-03-03T11:08:22Z

> Careful, the one in unicodeobject.c refers to a different kind of "ready", PyUnicode_Type.tp_flags & Py_TPFLAGS_READY.

Oops, you're right.

Python/codecs.c

picnixz · 2025-03-03T11:17:33Z

I really want to be able to preview the commit message when I'm enabling auto-merge...

…thon#129894)

Writing the decimal representation of a Unicode code point only requires knowing the number of digits.

---------

Co-authored-by: Petr Viktorin <[email protected]>

Use new helpers in the xmlcharrefreplace handler.

0c9d5ad
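The squashed commit message says that writing the decimal form of a code point only requires knowing its number of digits. The C sketch below illustrates that sizing step under our own assumptions: it works on plain `uint32_t` code points rather than on a Unicode object, and `xmlcharref_ndigits`, `xmlcharref_len` and `xmlcharref_total_len` are hypothetical names, not CPython functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of decimal digits of a code point.
   Code points never exceed 0x10FFFF (1114111), so the result is 1..7. */
static int
xmlcharref_ndigits(uint32_t ch)
{
    assert(ch <= 0x10FFFF);
    int ndigits = 1;
    /* Count how many powers of ten (10, 100, ...) are <= ch. */
    for (uint32_t base = 10; ch >= base; base *= 10) {
        ndigits++;
    }
    return ndigits;
}

/* Hypothetical helper: length of "&#<digits>;" for a single code point. */
static size_t
xmlcharref_len(uint32_t ch)
{
    return 2 /* "&#" */ + (size_t)xmlcharref_ndigits(ch) + 1 /* ";" */;
}

/* Hypothetical helper: bytes needed to replace chars[0..n) with
   character references (no overflow check in this sketch). */
static size_t
xmlcharref_total_len(const uint32_t *chars, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += xmlcharref_len(chars[i]);
    }
    return total;
}
```

For example, U+00E9 (233) and U+20AC (8364) expand to `&#233;` and `&#8364;`, so the total for both is 6 + 7 = 13 bytes. A real handler would also need to guard the running total against overflow before allocating the output.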

picnixz added the skip news label Feb 9, 2025

bedevere-app bot mentioned this pull request Feb 9, 2025

Refactor codecs error handlers to use _PyUnicodeError_GetParams and extract complex logic into separate functions #129173

Closed

picnixz commented Feb 9, 2025

Python/codecs.c (outdated, resolved)

Fix tests

cb7114a

picnixz commented Feb 9, 2025

Python/codecs.c (outdated, resolved)

Update Python/codecs.c

c624693

picnixz commented Feb 9, 2025

Python/codecs.c (outdated, resolved)

Python/codecs.c (outdated, resolved)

Python/codecs.c (outdated, resolved)

picnixz commented Feb 9, 2025

Python/codecs.c (outdated, resolved)

Python/codecs.c (outdated, resolved)

picnixz added 2 commits February 9, 2025 15:14

Fix tests

bf2f4de

Merge branch 'main' into feat/codecs/xmlcharrefreplace-handler-129173

1eb2503

picnixz changed the title from "gh-129173: Use new helpers in the xmlcharrefreplace handler." to "gh-129173: Use new helpers in the xmlcharrefreplace handler" Feb 24, 2025

picnixz added 2 commits February 25, 2025 14:25

Merge remote-tracking branch 'upstream/main' into feat/codecs/xmlchar…

329c039

…refreplace-handler-129173

Merge remote-tracking branch 'upstream/main' into feat/codecs/xmlchar…

7dfec2e

…refreplace-handler-129173

picnixz marked this pull request as ready for review February 25, 2025 13:28

picnixz requested a review from encukou February 25, 2025 13:28

bedevere-app bot added the awaiting core review label Feb 25, 2025

encukou added 2 commits February 26, 2025 13:33

Get log10 only, fill buffer backwards

713ece5

Remove obsolete comment

6edcfef
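The commit "Get log10 only, fill buffer backwards" describes the writing step: once the digit count is known, the digits can be produced least significant first, directly into their final positions. Below is a minimal sketch of that technique, assuming a plain `char` output buffer with enough room and a caller that already knows the digit count; `write_xmlcharref` is a hypothetical name and this is not the code that was merged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: write "&#<decimal>;" for code point `ch` at `out`.
   The caller passes `ndigits`, the number of decimal digits of `ch`, and
   guarantees room for ndigits + 3 bytes. No NUL terminator is written.
   Returns a pointer just past the ';'. */
static char *
write_xmlcharref(char *out, uint32_t ch, int ndigits)
{
    assert(ndigits >= 1 && ndigits <= 7);
    *out++ = '&';
    *out++ = '#';
    /* Fill the digit slots backwards: the last slot receives the least
       significant digit, so no temporary buffer or reversal is needed. */
    for (int i = ndigits - 1; i >= 0; i--) {
        out[i] = (char)('0' + ch % 10);
        ch /= 10;
    }
    assert(ch == 0);  /* fails if ndigits was too small for ch */
    out += ndigits;
    *out++ = ';';
    return out;
}
```

Calling `write_xmlcharref(buf, 8364, 4)` writes the 7 bytes `&#8364;`. Knowing the digit count in advance is what makes writing in place possible, without an intermediate digit buffer or a formatting routine such as `sprintf`.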

encukou mentioned this pull request Feb 26, 2025

cpython#129894: just get the log10 picnixz/cpython#1

Merged

picnixz added 4 commits March 1, 2025 10:19

Merge pull request #1 from encukou/feat/codecs/xmlcharrefreplace-hand…

dd36c99

…ler-129173 cpython#129894: just get the log10

post-merge

b8fe3b6

post-merge

51664c1

post-merge

c6feca6

picnixz requested review from encukou and removed request for encukou March 2, 2025 12:01

encukou reviewed Mar 3, 2025

Python/codecs.c (outdated, resolved)

picnixz commented Mar 3, 2025

Python/codecs.c (outdated, resolved)

Invoke forgotten PEP-7 rule

97c04b5

picnixz requested a review from encukou March 3, 2025 10:39

encukou approved these changes Mar 3, 2025

bedevere-app bot added awaiting merge and removed awaiting core review labels Mar 3, 2025

picnixz commented Mar 3, 2025

Python/codecs.c (resolved)

add empty line

f9a305d

picnixz enabled auto-merge (squash) March 3, 2025 11:16

picnixz changed the title from "gh-129173: Use new helpers in the xmlcharrefreplace handler" to "gh-129173: simplify PyCodec_XMLCharRefReplaceErrors logic" Mar 3, 2025

picnixz disabled auto-merge March 3, 2025 11:16

picnixz enabled auto-merge (squash) March 3, 2025 11:17

picnixz merged commit f693f84 into python:main Mar 3, 2025
41 checks passed

bedevere-app bot removed the awaiting merge label Mar 3, 2025

picnixz deleted the feat/codecs/xmlcharrefreplace-handler-129173 branch March 3, 2025 12:17

picnixz mentioned this pull request Mar 3, 2025

Remove references to Unicode objects being ready #130790

Closed

sergey-miryanov mentioned this pull request Mar 3, 2025

gh-130790: Remove references about unicode's readiness from comments #130801

Merged
