Look into string sharing for deep-frozen code

When code objects are created by the compiler or by marshal, certain strings are interned using `PyUnicode_InternInPlace()`. But the strings in code objects obtained from the deep-freeze process are not interned.

Strings are deduped within a module, but certain strings (e.g. dunders, 'name', 'self') occur frequently in different modules, and each deep-frozen module carries its own copy. As a result, deep-frozen code objects use slightly more space, and certain operations are slightly slower (e.g. comparison of two strings that are both interned is done by pointer comparison, which doesn't apply when the strings aren't interned).
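To illustrate the sharing that interning buys, here is a minimal Python-level sketch using `sys.intern()`. It only shows the effect on object identity; it is not how the deep-freeze code would do it in C, and the string value is an arbitrary example.

```python
import sys

# Build two equal strings at runtime so that the compiler cannot
# fold them into a single shared constant.
a = "".join(["dunder", "_", "name"])
b = "".join(["dunder", "_", "name"])

# Equal contents, but two separate objects: two copies in memory,
# and equality cannot be decided by comparing pointers alone.
equal_but_distinct = (a == b) and (a is not b)

# After interning, both names refer to the one canonical object,
# so an identity (pointer) check is enough.
ia = sys.intern(a)
ib = sys.intern(b)
shared_after_intern = ia is ib
```

Un-interned strings in two deep-frozen modules are in the same position as `a` and `b` before the `sys.intern()` calls.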

We could do a number of different things (or combine several):

- Add `PyUnicode_InternInPlace()` calls to the "get toplevel" function in each deepfrozen .c file. This would still waste the space though.
- Merge all deepfrozen files into a single file and dedupe strings when that file is written. This requires changing the deepfreeze build procedures for Windows and Unix. An advantage is that we could dedupe other things this way (basically all constants, even bytecode). (@markshannon seems to lean towards this one.)
- Give strings that are likely candidates (i.e., that look like ASCII identifiers -- see `all_name_chars()` in codeobject.c) an external name with "weak linkage" so that the linker can dedupe them. (Props to @lpereira for this one.)
- For strings that occur in the array of known `_Py_Identifier`s, replace the string with a reference into that array, for pure savings. (@ericsnowcurrently has more details; IIRC this array isn't in "main" yet.)
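As a rough sketch of the "dedupe when the file is written" idea, restricted to identifier-like strings: the Python below is hypothetical and not taken from the actual deepfreeze tool. `looks_like_identifier()` is an assumed stand-in for `all_name_chars()`, and `dedupe_strings()` and its input format are made up for illustration.

```python
import string

# Assumption: "identifier-like" means ASCII letters, digits and underscore only.
_NAME_CHARS = frozenset(string.ascii_letters + string.digits + "_")


def looks_like_identifier(s):
    """Hypothetical Python analogue of all_name_chars() in codeobject.c."""
    return all(ch in _NAME_CHARS for ch in s)


def dedupe_strings(modules):
    """Give each identifier-like string one shared slot across all modules.

    `modules` maps a module name to the list of strings it uses.
    Returns (shared, per_module): `shared` maps each shared string to its
    slot index; `per_module` replaces each string with either
    ("shared", index) or ("local", original_string).
    """
    shared = {}
    per_module = {}
    for modname, strings in modules.items():
        refs = []
        for s in strings:
            if looks_like_identifier(s):
                # The first occurrence allocates the next slot; later
                # occurrences in any module reuse that slot.
                index = shared.setdefault(s, len(shared))
                refs.append(("shared", index))
            else:
                refs.append(("local", s))
        per_module[modname] = refs
    return shared, per_module


modules = {
    "a": ["self", "__init__", "hello world"],
    "b": ["self", "name"],
}
shared, refs = dedupe_strings(modules)
```

Here `"self"` gets a single slot that both modules reference, while `"hello world"` stays local to module `a`. A real implementation would emit C definitions rather than tuples, and could apply the same table to other constants.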

[I am not planning to attack this any time soon, so if somebody wants to tackle this, go ahead and assign to yourself.]

Look into string sharing for deep-frozen code #218

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Look into string sharing for deep-frozen code #218

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions