Set specific non-UTF8 encodings for 2 known files in .gitattributes#2496

Merged
Avasam merged 5 commits into mhammond:main from Avasam:renormalize-encodings
Dec 4, 2025
Conversation

@Avasam
Collaborator

@Avasam Avasam commented Mar 16, 2025

I added the entries to .gitattributes.
I opened com/TestSources/PyCOMTest/PyCOMTest.idl as ISO-8859-1, fixed the "unknown characters", then saved it under that encoding.
Finally, I ran git add --renormalize . for the changed files.
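
To illustrate why the file has to be opened as ISO-8859-1, here is a minimal Python sketch. It is not part of the PR: the sample string is copied from the diff, and the variable names are made up for illustration. It shows that © is a single byte in ISO-8859-1, that this byte becomes an "unknown character" when the file is read as UTF-8, and that the reverse mistake produces mojibake.

```python
# Illustrative sample only: the help string from the diff in this PR.
text = 'helpstring("Python COM Test Harness 1.0 Type Library, © pywin32 contributors")'

latin1_bytes = text.encode("iso-8859-1")  # © is the single byte 0xA9
utf8_bytes = text.encode("utf-8")         # © is the two bytes 0xC2 0xA9

# Reading ISO-8859-1 bytes as UTF-8: 0xA9 is not valid on its own,
# so a lenient decoder substitutes U+FFFD (the "unknown character").
misread = latin1_bytes.decode("utf-8", errors="replace")

# Reading UTF-8 bytes as ISO-8859-1: every byte maps to some character,
# so nothing fails, but © shows up as the two characters "Â©".
mojibake = utf8_bytes.decode("iso-8859-1")

# Decoding with the encoding the file was saved in round-trips cleanly.
roundtrip = latin1_bytes.decode("iso-8859-1")

print(misread)
print(mojibake)
print(roundtrip == text)
```

Saving through an editor that assumes the wrong encoding bakes the substituted character into the file, which is presumably how such characters end up committed.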

  version(1.1),
  // an extended character in the help string should stress things...
- helpstring("Python COM Test Harness 1.0 Type Library, pywin32 contributors")
+ helpstring("Python COM Test Harness 1.0 Type Library, © pywin32 contributors")
Collaborator Author

@Avasam Avasam Mar 16, 2025

This is one of only 2 places where this symbol is used. The other is:

Copyright © 2003-2012.

Everywhere else uses either (c) (136 times) or (C) (17 times).

It could just be changed to (c).

@mhammond
Owner

In general we're not testing a specific encoding, just trying to avoid mojibake etc. I'd be fine with making everything utf-8 except where the encoding really does specifically matter.

@Avasam
Collaborator Author

Avasam commented Mar 17, 2025

com/TestSources/PyCOMTest/PyCOMTest.idl

For com/TestSources/PyCOMTest/PyCOMTest.idl, see this comment for details: #2493 (comment)

It'd be nice if it worked as UTF-8 as well. But I'm not certain of the proper change to make that work in UTF-8.

For Pythonwin/pywin/test/_dbgscript.py, iirc it's non-utf8 on purpose, to test the handling of reading non-utf8 files.
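
As a rough illustration of what reading a non-UTF-8 Python source file involves, the sketch below uses the standard library's tokenize.detect_encoding to find the encoding a source file declares. The sample bytes and the helper name declared_encoding are hypothetical. The actual contents of _dbgscript.py are not shown in this thread, so this is only an assumption about the kind of file being tested.

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the encoding Python would use to read these source bytes."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


# Hypothetical script with a coding cookie and a raw Latin-1 byte (0xE9).
with_cookie = b"# -*- coding: latin-1 -*-\nname = '\xe9'\n"
# Hypothetical script without a cookie: Python falls back to UTF-8.
without_cookie = b"name = 'e'\n"

# Blindly decoding the first script as UTF-8 fails on the 0xE9 byte.
try:
    with_cookie.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the declared encoding recovers the intended text.
decoded = with_cookie.decode(declared_encoding(with_cookie))

print(declared_encoding(with_cookie))
print(declared_encoding(without_cookie))
print(utf8_ok)
```

A tool that assumes UTF-8 for every file would fail on the first sample, which is why such a file needs to keep its original bytes in the repository.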


  // @pymethod <o PyIStream>|PyIStorage|CreateStream|Creates and opens a stream object with the specified name contained
- // in this storage object. All elements within a storage objectboth streams and other storage objects are kept in
+ // in this storage object. All elements within a storage object, both streams and other storage objects, are kept in
Owner

@mhammond mhammond left a comment

it looks like the commit message no longer matches the patch? ie, aren't you just replacing a unicode char with an ascii one?

@Avasam
Collaborator Author

Avasam commented Dec 4, 2025

@mhammond maybe you're only looking at the last commit? The full PR description still applies. .gitattributes changes for 2 necessary files (everything else is utf-8) and fixing <?> chars due to previously incorrect re-encoding.

@mhammond
Owner

mhammond commented Dec 4, 2025

oops, indeed I was - I followed the wrong link - thanks!

@Avasam Avasam merged commit f62a275 into mhammond:main Dec 4, 2025
30 checks passed
@Avasam Avasam deleted the renormalize-encodings branch December 4, 2025 04:43