Skip to content

gh-129173: simplify PyCodec_XMLCharRefReplaceErrors logic #129894

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

picnixz
Copy link
Member

@picnixz picnixz commented Feb 9, 2025

@picnixz
Copy link
Member Author

picnixz commented Feb 9, 2025

Huh, I probably messed up my index somewhere. Will fix it later or tomorrow.

@picnixz picnixz changed the title gh-129173: Use new helpers in the xmlcharrefreplace handler. gh-129173: Use new helpers in the xmlcharrefreplace handler Feb 24, 2025
@picnixz picnixz marked this pull request as ready for review February 25, 2025 13:28
@picnixz picnixz requested a review from encukou February 25, 2025 13:28
@encukou
Copy link
Member

encukou commented Feb 26, 2025

@picnixz
Copy link
Member Author

picnixz commented Mar 1, 2025

I've applied your suggestion and tweaked it a bit. By the way, I observe that I forgot to remove a "object is ready" comment. Do you mind that after this PR and the following one for backslashreplace, I skim through the code base to remove the un-necessary related comments? there are a few occurrences of saying that some Unicode object is "ready"

@picnixz picnixz requested review from encukou and removed request for encukou March 2, 2025 12:01
@encukou
Copy link
Member

encukou commented Mar 3, 2025

That's a style change, which we generally only do when touching neighboring code.
On the other hand, the comments will be getting increasingly confusing as PyUnicode_READY/PyUnicode_WCHAR_KIND is forgotten and people don't associate “ready” with them.

How many are there? I'd need to review each of those removals in context.

@picnixz
Copy link
Member Author

picnixz commented Mar 3, 2025

How many are there? I'd need to review each of those removals in context.

Apart from those already in codecs.c that I forgot to remove, not many:

/* result is guaranteed to be ready, as it is compact. */
kind = PyUnicode_KIND(result);
data = PyUnicode_DATA(result);

result = nfd_nfkd(self, input, k);
if (!result)
return NULL;
/* result will be "ready". */
kind = PyUnicode_KIND(result);
data = PyUnicode_DATA(result);
len = PyUnicode_GET_LENGTH(result);

kind = PyUnicode_KIND(modified);
out = PyUnicode_DATA(modified);
PyUnicode_WRITE(kind, out, 0, '\r');
memcpy(out + kind, PyUnicode_DATA(output), kind * output_len);
Py_SETREF(output, modified); /* output remains ready */
self->pendingcr = 0;
output_len++;

cpython/Modules/_io/textio.c

Lines 1821 to 1824 in a85eeb9

/* decoded_chars is guaranteed to be "ready". */
avail = (PyUnicode_GET_LENGTH(self->decoded_chars)
- self->decoded_chars_used);

/* Verify that the identifier follows PEP 3131.
All identifier strings are guaranteed to be "ready" unicode objects.
*/
static int
verify_identifier(struct tok_state *tok)

cpython/Parser/pegen.c

Lines 505 to 513 in a85eeb9

PyObject *
_PyPegen_new_identifier(Parser *p, const char *n)
{
PyObject *id = PyUnicode_DecodeUTF8(n, (Py_ssize_t)strlen(n), NULL);
if (!id) {
goto error;
}
/* PyUnicode_DecodeUTF8 should always return a ready string. */
assert(PyUnicode_IS_READY(id));

if (!PyUnicode_IS_READY(filename)) {
/* Don't make a Unicode string ready to avoid reentrant calls
to tracemalloc_alloc() or tracemalloc_realloc() */
#ifdef TRACE_DEBUG
tracemalloc_error("filename is not a ready unicode string");
#endif
return;
}

The tracemalloc one seems to be dead code.

@picnixz picnixz requested a review from encukou March 3, 2025 10:39
Copy link
Member

@encukou encukou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@encukou
Copy link
Member

encukou commented Mar 3, 2025

You might want to split the comment removals to 5 PRs (each to be seen by different subject experts), and combine them with removing the remaining calls to PyUnicode_IS_READY.

Careful, the one in unicodeobject.c refers to a different kind of "ready", PyUnicode_Type.tp_flags & Py_TPFLAGS_READY.

@picnixz
Copy link
Member Author

picnixz commented Mar 3, 2025

Careful, the one in unicodeobject.c refers to a different kind of "ready", PyUnicode_Type.tp_flags & Py_TPFLAGS_READY.

Oups, you're right.

@picnixz picnixz enabled auto-merge (squash) March 3, 2025 11:16
@picnixz picnixz changed the title gh-129173: Use new helpers in the xmlcharrefreplace handler gh-129173: simplify PyCodec_XMLCharRefReplaceErrors logic Mar 3, 2025
@picnixz picnixz disabled auto-merge March 3, 2025 11:16
@picnixz picnixz enabled auto-merge (squash) March 3, 2025 11:17
@picnixz
Copy link
Member Author

picnixz commented Mar 3, 2025

I really want to be able to preview the commit message when I'm enabling auto-merge...

@picnixz picnixz merged commit f693f84 into python:main Mar 3, 2025
41 checks passed
@picnixz picnixz deleted the feat/codecs/xmlcharrefreplace-handler-129173 branch March 3, 2025 12:17
seehwan pushed a commit to seehwan/cpython that referenced this pull request Apr 16, 2025
…thon#129894)

Writing the decimal representation of a Unicode codepoint only requires to know the number of digits.

---------

Co-authored-by: Petr Viktorin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants