Skip to content

bpo-45949: Pure Python freeze module for cross builds (GH-29899) #29899

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 13, 2021

Conversation

tiran
Copy link
Member

@tiran tiran commented Dec 2, 2021

Use _bootstrap_python interpreter and pure Python implementation of
freeze_module to generate frozen byte code files. Only importlib
bootstrap files are generated with Programs/_freeze_module.

This simplifies cross building, as the build system no longer needs a
_freeze_module binary. A standard Python installation with same
version is sufficient.

Signed-off-by: Christian Heimes [email protected]

https://bugs.python.org/issue45949

@gvanrossum
Copy link
Member

Where would the frozen importlib files come from? Are you adding those back to git? (Otherwise I don't see how this helps.)

@tiran
Copy link
Member Author

tiran commented Dec 2, 2021

Where would the frozen importlib files come from? Are you adding those back to git? (Otherwise I don't see how this helps.)

  • For non-cross builds
    • Python/frozen_modules/importlib._bootstrap* are generated with Program/_freeze_module. The files are needed to create _bootstrap_python.
    • other Python/frozen_modules/*.h are generated with _bootstrap_python and freeze_module.py
  • For cross builds all Python/frozen_modules/*.h are generated with an host Python interpreter (must have same version) and freeze_module.py. Cross builds can neither use ./_bootstrap_python nor Program/_freeze_module from the current build because all files are for a different architecture or platform.

@gvanrossum
Copy link
Member

Cross builds don't just need the same version -- they need to use the same commit, unless we're in beta or beyond.

I'm not sure what's the point of ever using freeze_module.py for non-cross builds. Is it just so that freeze_module.py gets exercised regularly?

@tiran
Copy link
Member Author

tiran commented Dec 2, 2021

Cross builds don't just need the same version -- they need to use the same commit, unless we're in beta or beyond.

Good point, I'll update the documentation.

I'm not sure what's the point of ever using freeze_module.py for non-cross builds. Is it just so that freeze_module.py gets exercised regularly?

Yes, you are correct. It is not necessary to use freeze_module.py for standard builds. It is the easiest way to test that the code does not regress. I'll add a comment.

@tiran
Copy link
Member Author

tiran commented Dec 2, 2021

The main point of the PR is to get rid of ./configure --with-freeze-module=Programs/_freeze_module again. It makes cross compiling more complicated. Users need to provide the helper tool for the build platform.

This patch removes the dependency. Users will be able to cross build a 3.11 interpreter with a 3.11 host Python (once byte code and API has stabilized in beta).

@kumaraditya303
Copy link
Contributor

How about committing the importlib bootstrap related files to git and generate the rest with pure python version on both standard and cross builds provided the bytecode is same, this would simplify things a lot ?

@tiran
Copy link
Member Author

tiran commented Dec 3, 2021

How about committing the importlib bootstrap related files to git and generate the rest with pure python version on both standard and cross builds provided the bytecode is same, this would simplify things a lot ?

In the past we had the importlib bootstrap files in git. The approach has the downside that it increases the size of git history a lot. Really a lot a lot.

The files are fairly large, about 350k in total. They also must be regenerated every time the source Python files are updated or any aspect of the byte code generator and optimizer is changed. Even a tiny change results in a large diff, often affecting all lines of the header files.

@kumaraditya303
Copy link
Contributor

In the past we had the importlib bootstrap files in git. The approach has the downside that it increases the size of git history a lot. Really a lot a lot.

Yes, it will increase the size of git history but does anyone clones the repo in full ?, for builds most would do a shallow clone.

The files are fairly large, about 350k in total. They also must be regenerated every time the source Python files are updated or any aspect of the byte code generator and optimizer is changed. Even a tiny change results in a large diff, often affecting all lines of the header files.

For diff, these files can be marked as linguist-generated then github would suppress them in PRs

@tiran
Copy link
Member Author

tiran commented Dec 3, 2021

Yes, it will increase the size of git history but does anyone clones the repo in full ?, for builds most would do a shallow clone.

I have a full clone and usually everybody else with a local checkout has a full clone as well.

@arhadthedev
Copy link
Member

For diff, these files can be marked as linguist-generated then github would suppress them in PRs

It opens a great possibility to introduce a vulnerability into the Python codebase. Huge (especially suppressed) diffs are hard to analyze so they allow alterations of a generated file that will go unnoticed.

In addition, GitHub is just a part of a development conveyor. Before it there ara local non-Github tools (git diff/gitk, TortoiseGit, Sublime Merge, and so on) that do not support linguist-generated=true extension.

[...] but does anyone clones the repo in full?

Actually this is the very purpose of Git as a distributed version control system, to distribute full clones and work with them offline.

@kumaraditya303
Copy link
Contributor

It opens a great possibility to introduce a vulnerability into the Python codebase. Huge (especially suppressed) diffs are hard to analyze so they allow alterations of a generated file that will go unnoticed.

If generated files were to be added to the repo then it would have been verified on CI that the generated files and files generated on CI are equal.

@tiran
Copy link
Member Author

tiran commented Dec 3, 2021

If generated files were to be added to the repo then it would have been verified on CI that the generated files and files generated on CI are equal.

You would also have to make sure that non of the validating tools rely on the generated files.

@arhadthedev
Copy link
Member

If generated files were to be added to the repo then it would have been verified on CI that the generated files and files generated on CI are equal.

If bots are/will be involved, then I agree and retract my note on vulnerability planting.

In fact, I am aware that bootstrapping in a bare environment with no predecessor (so _bootstrap_python minimal interpreter is introduced) is a can of worms that force hardly bearable compromises.

@arhadthedev
Copy link
Member

Indeed, maintainability of the diffs is still a question. Even with suppression, unanalysable black boxes carried along with actual proposed changed is not a good idea.

@tiran tiran force-pushed the bpo-45949-pyfreeze branch from 7d9cda1 to dce68d0 Compare December 3, 2021 15:25
@tiran tiran marked this pull request as ready for review December 3, 2021 15:25
@tiran
Copy link
Member Author

tiran commented Dec 3, 2021

I have added more comments to explain the bootstrap process.

Sorry for the squash merge + force push. I ran into merge conflicts and eventually gave up. Merge was less work.

@tiran tiran force-pushed the bpo-45949-pyfreeze branch 2 times, most recently from f7e07fd to 2a44284 Compare December 3, 2021 18:36
@tiran tiran changed the title bpo-45949: Pure Python freeze module bpo-45949: Pure Python freeze module for cross builds Dec 5, 2021
@tiran tiran force-pushed the bpo-45949-pyfreeze branch 4 times, most recently from 454514c to 55e1527 Compare December 9, 2021 14:26
Copy link
Member

@ericsnowcurrently ericsnowcurrently left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM.

I've left a few minor comments. The only matter of consequence is about make rule dependencies in the cross-compiling case, as well as running make regen-frozen as one of the steps for a cross-compiled build.

I'm approving the PR under the assumption that you'll resolve my concerns before merging. 🙂

Makefile.pre.in Outdated
Comment on lines 1119 to 1138
# We manually freeze getpath.py rather than through freeze_modules
Python/frozen_modules/getpath.h: Modules/getpath.py $(FREEZE_MODULE_C)
$(FREEZE_MODULE_BOOTSTRAP) getpath $(srcdir)/Modules/getpath.py Python/frozen_modules/getpath.h
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of "manually", can this be generated now? (added to Tools/scripts/freeze_modules.py)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getpath is not like the others. It would need special handling in Tools/scripts/freeze_modules.py, too. Let's tackle the getpath target in another PR. I only moved existing block to a different location so it is closer to other early bootstrap blocks.

reference count > 1. Marshal adds the `FLAG_REF` flag and creates a
reference `hashtable`.
"""

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding a short note here (or even a TODO) about the relationship with Tools/freeze/freeze.py?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea!

@ericsnowcurrently
Copy link
Member

and sorry for the delay in getting you a review!

Use `_bootstrap_python` interpreter and pure Python implementation of
`freeze_module` to generate frozen byte code files. Only importlib
bootstrap files are generated with `Programs/_freeze_module`.

This simplifies cross building, as the build system no longer needs a
`_freeze_module` binary. A standard Python installation with same
version is sufficient.

Signed-off-by: Christian Heimes <[email protected]>
@tiran tiran force-pushed the bpo-45949-pyfreeze branch from 55e1527 to 4fd065f Compare December 11, 2021 16:41
- Fix typo
- Remove confusing comment
- Improve dependency handling for freezing
@tiran tiran force-pushed the bpo-45949-pyfreeze branch from 4fd065f to 0d328a9 Compare December 11, 2021 16:42
@tiran
Copy link
Member Author

tiran commented Dec 13, 2021

@ericsnowcurrently Are you satisfied with the current state of the patch?

Copy link
Member

@ericsnowcurrently ericsnowcurrently left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tiran tiran changed the title bpo-45949: Pure Python freeze module for cross builds bpo-45949: Pure Python freeze module for cross builds (GH-29899) Dec 13, 2021
@tiran tiran merged commit eb483c4 into python:main Dec 13, 2021
@tiran tiran deleted the bpo-45949-pyfreeze branch December 13, 2021 19:48
@bedevere-bot
Copy link

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot AMD64 Windows8.1 Non-Debug 3.x has failed when building commit eb483c4.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/405/builds/831) and take a look at the build logs.
  4. Check if the failure is related to this commit (eb483c4) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/405/builds/831

Summary of the results of the build (if available):

== Tests result: ENV CHANGED ==

400 tests OK.

10 slowest tests:

  • test_peg_generator: 2 min 30 sec
  • test_mmap: 2 min 25 sec
  • test_io: 2 min
  • test_multiprocessing_spawn: 1 min 59 sec
  • test_concurrent_futures: 1 min 43 sec
  • test_largefile: 1 min 38 sec
  • test_mailbox: 1 min 13 sec
  • test_asyncio: 59.8 sec
  • test_unparse: 45.5 sec
  • test_socket: 43.0 sec

1 test altered the execution environment:
test_asyncio

30 tests skipped:
test_curses test_dbm_gnu test_dbm_ndbm test_devpoll test_epoll
test_fcntl test_fork1 test_gdb test_grp test_ioctl test_kqueue
test_multiprocessing_fork test_multiprocessing_forkserver test_nis
test_openpty test_ossaudiodev test_pipes test_poll test_posix
test_pty test_pwd test_readline test_resource test_spwd
test_syslog test_threadsignals test_wait3 test_wait4
test_xxtestfuzz test_zipfile64

Total duration: 10 min 8 sec

Click to see traceback logs
Traceback (most recent call last):
  File "D:\buildarea\3.x.ware-win81-release.nondebug\build\Lib\asyncio\proactor_events.py", line 116, in __del__
    self.close()
    ^^^^^^^^^^^^
  File "D:\buildarea\3.x.ware-win81-release.nondebug\build\Lib\asyncio\proactor_events.py", line 108, in close
    self._loop.call_soon(self._call_connection_lost, None)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\buildarea\3.x.ware-win81-release.nondebug\build\Lib\asyncio\base_events.py", line 745, in call_soon
    self._check_closed()
    ^^^^^^^^^^^^^^^^^^^^
  File "D:\buildarea\3.x.ware-win81-release.nondebug\build\Lib\asyncio\base_events.py", line 506, in _check_closed
    raise RuntimeError('Event loop is closed')
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Event loop is closed
k

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants