gh-129173: Use `_PyUnicodeError_GetParams` in `PyCodec_NameReplaceErrors` #129135

picnixz · 2025-01-21T14:41:32Z

I extracted the logic of rendering hexadecimal digits but this requires a double pointer (alternatively, I could make the function return the number of characters that were written and advance the pointer after the call).
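For readers following along, a minimal sketch of what such a double-pointer helper could look like. This is not the code from `Python/codecs.c`: the name `write_hex_digits`, the `unsigned char` buffer type, and the `ndigits` parameter are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (illustration only, not the CPython code):
 * write the `ndigits` lowest hexadecimal digits of `ch`, most
 * significant first, at *p and advance *p past them. */
static inline void
write_hex_digits(unsigned char **p, uint32_t ch, int ndigits)
{
    static const char hexdigits[] = "0123456789abcdef";
    assert(ndigits >= 1 && ndigits <= 8);
    for (int shift = 4 * (ndigits - 1); shift >= 0; shift -= 4) {
        /* Store one digit, then move the caller's write position. */
        *(*p)++ = (unsigned char)hexdigits[(ch >> shift) & 0xf];
    }
}
```

The caller passes the address of its write pointer (`write_hex_digits(&out, ch, 4)`), so the pointer is already advanced when the call returns.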

If inlined calls are preferred, we can convert the function into a macro to avoid duplication, but I don't think a function call is that slow (especially if I add the `inline` qualifier to nudge the compiler into inlining it, which may then let it vectorize the code if it's smart). Ideally, I would like to extract that logic from `PyCodec_XMLCharRefReplaceErrors` as well, but it's the only place where we use the decimal base instead. OTOH, if we extract the logic, it's a bit cleaner to read.
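To illustrate the decimal case mentioned above, here is a rough sketch of a `static inline` counterpart that writes a code point in base 10. The name `write_decimal_digits` and the buffer type are hypothetical, and this is not how `PyCodec_XMLCharRefReplaceErrors` is actually written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decimal counterpart (illustration only):
 * write `ch` in base 10 at *p and advance *p past the digits. */
static inline void
write_decimal_digits(unsigned char **p, uint32_t ch)
{
    assert(*p != NULL);
    /* Find the largest power of ten not exceeding ch (1 when ch < 10). */
    uint32_t base = 1;
    while (ch / base >= 10) {
        base *= 10;
    }
    /* Emit digits from most significant to least significant. */
    for (; base > 0; base /= 10) {
        *(*p)++ = (unsigned char)('0' + (ch / base) % 10);
    }
}
```

Unlike the hexadecimal case, the number of digits is not fixed by the caller here, which is one reason the two helpers do not trivially share a single implementation.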

After the refactoring of the surrogate handlers is done, I will wrap up this list of PRs by cleaning up the handlers that I fixed previously (I just didn't want to do both cleanup and fixes in the same PR). The idea is to extract the handling of each Unicode error type into a separate function (unless the handler only handles a single exception type, as is the case for the namereplace handler).

Issue: Refactor codecs error handlers to use _PyUnicodeError_GetParams and extract complex logic into separate functions #129173

picnixz · 2025-01-24T15:33:08Z

Since I will be leaving for two weeks and won't be able to commit or review anything (except on mobile), I'm keeping this as a draft and we'll discuss it later. What remains to do on the codecs part is mainly refactoring, so it's just a feature.

Python/codecs.c

Co-authored-by: Petr Viktorin <[email protected]>

encukou · 2025-02-08T13:43:09Z

> If inlined calls are preferred, then we can convert the function into a macro to avoid duplications but I think a function call is not that slow (especially if I add the inline qualifier to force a bit the compiler to inline it and then allow it to possibly vectorize it if it's smart).

`static inline` functions are definitely preferred to macros :)

picnixz · 2025-02-08T13:45:36Z

In this case, it's just that I need to pass a double pointer, which makes it a bit uglier, but I think it's better that way than keeping everything in the loop (it's much harder to see what's actually happening there, and I doubt this will slow down the handler by much).
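For comparison, the alternative mentioned in the PR description (returning the number of characters written and letting the caller advance the pointer) might look roughly like this. The name `write_hex_digits_n` and its parameters are hypothetical, chosen only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-pointer variant (illustration only):
 * write `ndigits` hexadecimal digits of `ch` starting at p and
 * return how many characters were written. */
static inline size_t
write_hex_digits_n(unsigned char *p, uint32_t ch, int ndigits)
{
    static const char hexdigits[] = "0123456789abcdef";
    assert(ndigits >= 1 && ndigits <= 8);
    for (int i = 0; i < ndigits; i++) {
        int shift = 4 * (ndigits - 1 - i);
        p[i] = (unsigned char)hexdigits[(ch >> shift) & 0xf];
    }
    return (size_t)ndigits;
}
```

The call site then reads `out += write_hex_digits_n(out, ch, 8);`, which avoids the double pointer at the cost of making every caller responsible for advancing `out`.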

encukou · 2025-02-08T15:05:24Z

Well, I wouldn't want to see `*(*p)++` on a whiteboard at an interview, but in existing code its purpose should be rather clear to readers :)
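For anyone who does meet `*(*p)++` on a whiteboard: postfix `++` binds tighter than unary `*`, so the expression parses as `*((*p)++)`. A small sketch with two hypothetical helpers (`put_char` and `put_char_verbose`, not taken from the PR) showing the idiom next to its spelled-out equivalent:

```c
#include <assert.h>

/* Store c at the position *p points to, then advance *p by one. */
static inline void
put_char(char **p, char c)
{
    /* Parsed as *((*p)++): the postfix ++ applies to the inner
     * pointer *p and yields its old value, which the outer *
     * dereferences for the store. */
    *(*p)++ = c;
}

/* The same operation, written step by step. */
static inline void
put_char_verbose(char **p, char c)
{
    char *cur = *p;  /* current write position */
    *cur = c;        /* store the character there */
    *p = cur + 1;    /* advance the caller's pointer */
}
```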

Use _PyUnicodeError_GetParams for the 'namereplace' handler

64290ce

picnixz added skip issue skip news labels Jan 21, 2025

picnixz changed the title ~~Use _PyUnicodeError_GetParams for the 'namereplace' handler~~ gh-129173: Use _PyUnicodeError_GetParams for the 'namereplace' handler Jan 22, 2025

picnixz removed the skip issue label Jan 22, 2025

bedevere-app bot mentioned this pull request Jan 22, 2025

Refactor codecs error handlers to use _PyUnicodeError_GetParams and extract complex logic into separate functions #129173

Closed

picnixz changed the title ~~gh-129173: Use _PyUnicodeError_GetParams for the 'namereplace' handler~~ gh-129173: Use _PyUnicodeError_GetParams in PyCodec_NameReplaceErrors Jan 22, 2025

picnixz added 7 commits January 24, 2025 11:25

Merge branch 'main' into feat/codecs/name-handler

35c9af7

post-merge

a8880d1

extract some logic

9d99097

markup fixup

3e2a7c2

use public names

a790669

use public names

578a8f8

post-merge

0f60651

picnixz marked this pull request as ready for review January 24, 2025 15:21

bedevere-app bot added the awaiting core review label Jan 24, 2025

picnixz marked this pull request as draft January 24, 2025 15:23

bedevere-app bot removed the awaiting core review label Jan 24, 2025

picnixz marked this pull request as ready for review February 8, 2025 11:34

bedevere-app bot added the awaiting core review label Feb 8, 2025

Merge branch 'main' into feat/codecs/name-handler

ae69a17

encukou approved these changes Feb 8, 2025

bedevere-app bot added awaiting merge and removed awaiting core review labels Feb 8, 2025

fix typo

b2c5ddd

Co-authored-by: Petr Viktorin <[email protected]>

encukou merged commit a56ead0 into python:main Feb 8, 2025
41 checks passed

bedevere-app bot removed the awaiting merge label Feb 8, 2025

picnixz deleted the feat/codecs/name-handler branch February 8, 2025 15:02
