Skip to content

Generate the interpreter #98831

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
gvanrossum opened this issue Oct 28, 2022 · 2 comments
Closed

Generate the interpreter #98831

gvanrossum opened this issue Oct 28, 2022 · 2 comments
Assignees
Labels
3.12 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@gvanrossum
Copy link
Member

gvanrossum commented Oct 28, 2022

Overview

Over in faster-cpython/ideas we've been exploring the idea of generating the interpreter from a set of instruction definitions. This will eventually enable having multiple versions of the interpreter (e.g. with and without tracing enabled), and it will allow us to automatically combine instructions using a powerful notation (e.g. super(LOAD_FAST__LOAD_FAST) = LOAD_FAST + LOAD_FAST;). It will also let us auto-generate things like stack_effect().

We are planning to land at least an early version of this work in 3.12. We have a tentative grammar for instruction definition DSL -- which will undoubtedly undergo several iterations before we've settled. We have a first draft of the tooling ready for review (which currently reproduces the status quo) done.

Once the first version of the tooling has landed we expect to iterate quickly, using this issue an umbrella issue for our PRs to link to.

References

@gvanrossum gvanrossum added the type-feature A feature request or enhancement label Oct 28, 2022
@kumaraditya303 kumaraditya303 added interpreter-core (Objects, Python, Grammar, and Parser dirs) 3.12 only security fixes labels Oct 29, 2022
@gvanrossum gvanrossum self-assigned this Oct 31, 2022
@gvanrossum gvanrossum changed the title Generating the interpreter Generate the interpreter Oct 31, 2022
gvanrossum added a commit that referenced this issue Nov 3, 2022
The switch cases (really TARGET(opcode) macros) have been moved from ceval.c to generated_cases.c.h. That file is generated from instruction definitions in bytecodes.c (which impersonates a C file so the C code it contains can be edited without custom support in e.g. VS Code).

The code generator lives in Tools/cases_generator (it has a README.md explaining how it works). The DSL used to describe the instructions is a work in progress, described in https://github.com/faster-cpython/ideas/blob/main/3.12/interpreter_definition.md.

This is surely a work-in-progress. An easy next step could be auto-generating super-instructions.

**IMPORTANT: Merge Conflicts**

If you get a merge conflict for instruction implementations in ceval.c, your best bet is to port your changes to bytecodes.c. That file looks almost the same as the original cases, except instead of `TARGET(NAME)` it uses `inst(NAME)`, and the trailing `DISPATCH()` call is omitted (the code generator adds it automatically).
gvanrossum added a commit that referenced this issue Nov 3, 2022
gvanrossum added a commit that referenced this issue Nov 4, 2022
gvanrossum added a commit that referenced this issue Nov 6, 2022
gvanrossum added a commit that referenced this issue Nov 10, 2022
)

Also mark those opcodes that have no stack effect as such.

Co-authored-by: Brandt Bucher <[email protected]>
gvanrossum added a commit to gvanrossum/cpython that referenced this issue Nov 10, 2022
python#99271)

Also mark those opcodes that have no stack effect as such.

Co-authored-by: Brandt Bucher <[email protected]>
ethanfurman pushed a commit to ethanfurman/cpython that referenced this issue Nov 12, 2022
python#99271)

Also mark those opcodes that have no stack effect as such.

Co-authored-by: Brandt Bucher <[email protected]>
gvanrossum added a commit that referenced this issue Nov 18, 2022
Also complete cache effects for BINARY_SUBSCR family.
gvanrossum added a commit that referenced this issue Nov 23, 2022
Newly supported interpreter definition syntax:
- `op(NAME, (input_stack_effects -- output_stack_effects)) { ... }`
- `macro(NAME) = OP1 + OP2;`

Also some other random improvements:
- Convert `WITH_EXCEPT_START` to use stack effects
- Fix lexer to balk at unrecognized characters, e.g. `@`
- Fix moved output names; support object pointers in cache
- Introduce `error()` method to print errors
- Introduce read_uint16(p) as equivalent to `*p`

Co-authored-by: Brandt Bucher <[email protected]>
gvanrossum added a commit that referenced this issue Dec 8, 2022
Stack effects can now have a type, e.g. `inst(X, (left, right -- jump/uint64_t)) { ... }`.

Instructions converted to the non-legacy format:

* COMPARE_OP
* COMPARE_OP_FLOAT_JUMP
* COMPARE_OP_INT_JUMP
* COMPARE_OP_STR_JUMP
* STORE_ATTR
* DELETE_ATTR
* STORE_GLOBAL
* STORE_ATTR_INSTANCE_VALUE
* STORE_ATTR_WITH_HINT
* STORE_ATTR_SLOT, and complete the store_attr family
* Complete the store_subscr family: STORE_SUBSCR{,DICT,LIST_INT}
  (STORE_SUBSCR was alread half converted,
  but wasn't using cache effects yet.)
* DELETE_SUBSCR
* PRINT_EXPR
* INTERPRETER_EXIT (a bit weird, ends in return)
* RETURN_VALUE
* GET_AITER (had to restructure it some)
  The original had mysterious `SET_TOP(NULL)` before `goto error`.
  I assume those just account for `obj` having been decref'ed,
  so I got rid of them in favor of the cleanup implied by `ERROR_IF()`.
* LIST_APPEND (a bit unhappy with it)
* SET_ADD (also a bit unhappy with it)

Various other improvements/refactorings as well.
gvanrossum added a commit that referenced this issue Dec 8, 2022
This makes it easier to see what changed in the generated code
when converting an instruction to super or macro.
gvanrossum added a commit that referenced this issue Dec 17, 2022
…#100205)

The presence of this macro indicates that a particular instruction
may be considered for conversion to a register-based format
(see faster-cpython/ideas#485).

An invariant (currently unchecked) is that `DEOPT_IF()` may only
occur *before* `DECREF_INPUTS()`, and `ERROR_IF()` may only occur
*after* it. One reason not to check this is that there are a few
places where we insert *two* `DECREF_INPUTS()` calls, in different
branches of the code. The invariant checking would have to be able
to do some flow control analysis to understand this.

Note that many instructions, especially specialized ones,
can't be converted to use this macro straightforwardly.
This is because the generator currently only generates plain
`Py_DECREF(variable)` statements, and cannot generate
things like `_Py_DECREF_SPECIALIZED()` let alone deal with
`_PyList_AppendTakeRef()`.
gvanrossum added a commit to gvanrossum/cpython that referenced this issue Dec 17, 2022
… input (python#100205)

The presence of this macro indicates that a particular instruction
may be considered for conversion to a register-based format
(see faster-cpython/ideas#485).

An invariant (currently unchecked) is that `DEOPT_IF()` may only
occur *before* `DECREF_INPUTS()`, and `ERROR_IF()` may only occur
*after* it. One reason not to check this is that there are a few
places where we insert *two* `DECREF_INPUTS()` calls, in different
branches of the code. The invariant checking would have to be able
to do some flow control analysis to understand this.

Note that many instructions, especially specialized ones,
can't be converted to use this macro straightforwardly.
This is because the generator currently only generates plain
`Py_DECREF(variable)` statements, and cannot generate
things like `_Py_DECREF_SPECIALIZED()` let alone deal with
`_PyList_AppendTakeRef()`.
shihai1991 added a commit to shihai1991/cpython that referenced this issue Dec 18, 2022
* origin/main: (1306 commits)
  Correct CVE-2020-10735 documentation (python#100306)
  pythongh-100272: Fix JSON serialization of OrderedDict (pythonGH-100273)
  pythongh-93649: Split tracemalloc tests from _testcapimodule.c (python#99551)
  Docs: Use `PY_VERSION_HEX` for version comparison (python#100179)
  pythongh-97909: Fix markup for `PyMethodDef` members (python#100089)
  pythongh-99240: Reset pointer to NULL when the pointed memory is freed in argument parsing (python#99890)
  pythongh-99240: Reset pointer to NULL when the pointed memory is freed in argument parsing (python#99890)
  pythonGH-98831: Add DECREF_INPUTS(), expanding to DECREF() each stack input (python#100205)
  pythongh-78707: deprecate passing >1 argument to `PurePath.[is_]relative_to()` (pythonGH-94469)
  pythongh-99540: Constant hash for _PyNone_Type to aid reproducibility (pythonGH-99541)
  pythongh-100039: enhance __signature__ to work with str and callables (pythonGH-100168)
  pythongh-99830: asyncio: Document returns of remove_{reader,writer} (python#100302)
  "Compound statement" docs: Fix with-statement step indexing (python#100286)
  pythonGH-90043: Handle NaNs in COMPARE_OP_FLOAT_JUMP (pythonGH-100278)
  Improve stats presentation for calls. (pythonGH-100274)
  Better stats for `LOAD_ATTR` and `STORE_ATTR` (pythonGH-100295)
  pythongh-81057: Move the Cached Parser Dummy Name to _PyRuntimeState (python#100277)
  Document that zipfile's pwd parameter is a `bytes` object (python#100209)
  pythongh-99767: mark `PyTypeObject.tp_watched` as internal use only in table (python#100271)
  Fix typo in introduction.rst (python#100266)
  ...
carljm added a commit to carljm/cpython that referenced this issue Dec 19, 2022
* main:
  pythongh-89727: Fix os.walk RecursionError on deep trees (python#99803)
  Docs: Don't upload CI artifacts (python#100330)
  pythongh-94912: Added marker for non-standard coroutine function detection (python#99247)
  Correct CVE-2020-10735 documentation (python#100306)
  pythongh-100272: Fix JSON serialization of OrderedDict (pythonGH-100273)
  pythongh-93649: Split tracemalloc tests from _testcapimodule.c (python#99551)
  Docs: Use `PY_VERSION_HEX` for version comparison (python#100179)
  pythongh-97909: Fix markup for `PyMethodDef` members (python#100089)
  pythongh-99240: Reset pointer to NULL when the pointed memory is freed in argument parsing (python#99890)
  pythongh-99240: Reset pointer to NULL when the pointed memory is freed in argument parsing (python#99890)
  pythonGH-98831: Add DECREF_INPUTS(), expanding to DECREF() each stack input (python#100205)
  pythongh-78707: deprecate passing >1 argument to `PurePath.[is_]relative_to()` (pythonGH-94469)
mdboom pushed a commit to mdboom/cpython that referenced this issue Jan 31, 2023
carljm added a commit to carljm/cpython that referenced this issue Jan 31, 2023
* main:
  pythongh-101440: fix json snippet error in logging-cookbook.rst (python#101439)
  pythongh-99276 - Updated Doc/faq/general.rst (python#101396)
  Add JOBS parameter to docs Makefile (python#101395)
  pythongh-98831: rewrite GET_LEN, GET_ITER, BEFORE_WITH and a few simple opcodes in the instruction definition DSL (python#101443)
  pythongh-77607: Improve accuracy of os.path.join docs (python#101406)
  Fixes typo in asyncio.TaskGroup context manager code example (python#101449)
  pythongh-98831: Clean up and add cache size static_assert to macro (python#101442)
  pythongh-99955: use SUCCESS/ERROR return values in optimizer and assembler. Use RETURN_IF_ERROR where appropriate. Fix a couple of bugs. (python#101412)
carljm added a commit to carljm/cpython that referenced this issue Jan 31, 2023
* main:
  pythonGH-100288: Skip extra work when failing to specialize LOAD_ATTR (pythonGH-101354)
  pythongh-101409: Improve generated clinic code for self type checks (python#101411)
  pythongh-98831: rewrite BEFORE_ASYNC_WITH and END_ASYNC_FOR in the instruction definition DSL (python#101458)
  pythongh-101469: Optimise get_io_state() by using _PyModule_GetState() (pythonGH-101470)
carljm added a commit to carljm/cpython that referenced this issue Feb 1, 2023
* main:
  pythongh-98831: rewrite PUSH_EXC_INFO and conditional jumps in the instruction definition DSL (python#101481)
  pythongh-98831: Modernize the LOAD_ATTR family (python#101488)
  pythongh-101498 : Fix asyncio.Timeout example in docs (python#101499)
  pythongh-101454: fix documentation for END_ASYNC_FOR (python#101455)
  pythongh-101277: Isolate itertools, add group and _grouper types to module state (python#101302)
  pythongh-101317: Add `ssl_shutdown_timeout` parameter for `asyncio.StreamWriter.start_tls` (python#101335)
  datetime.rst: fix combine() signature (python#101490)
iritkatriel added a commit that referenced this issue Feb 3, 2023
iritkatriel added a commit to iritkatriel/cpython that referenced this issue Feb 3, 2023
gvanrossum added a commit that referenced this issue Feb 7, 2023
iritkatriel added a commit that referenced this issue Feb 7, 2023
@gvanrossum
Copy link
Member Author

gvanrossum commented Feb 7, 2023

Once these PRs land, every opcode has cache/stack effect specifications, and I will disable the legacy syntax from the parser.

gvanrossum added a commit that referenced this issue Feb 7, 2023
New generator feature: Generate useful glue for output arrays, so you can just write values to the output array (no bounds checking). Rewrote UNPACK_SEQUENCE_TWO_TUPLE to use this, and also UNPACK_SEQUENCE_{TUPLE,LIST}.
gvanrossum added a commit that referenced this issue Feb 8, 2023
Generator update: support balanced parentheses and brackets in conditions and size expressions.
gvanrossum added a commit that referenced this issue Feb 8, 2023
New generator feature: Move CHECK_EVAL_BREAKER() call to just before DISPATCH().
gvanrossum added a commit that referenced this issue Feb 8, 2023
Includes a slight improvement to `DECREF_INPUTS()`.
gvanrossum added a commit that referenced this issue Feb 9, 2023
* Write output and metadata in a single run
  This halves the time to run the cases generator
  (most of the time goes into parsing the input).
* Declare or define opcode metadata based on NEED_OPCODE_TABLES
* Use generated metadata for stack_effect()
* compile.o depends on opcode_metadata.h
* Return -1 from _PyOpcode_num_popped/pushed for unknown opcode
carljm added a commit to carljm/cpython that referenced this issue Feb 9, 2023
* main: (82 commits)
  pythongh-101670: typo fix in PyImport_ExtendInittab() (python#101723)
  pythonGH-99293: Document that `Py_TPFLAGS_VALID_VERSION_TAG` shouldn't be used. (#pythonGH-101736)
  no-issue: Add Dong-hee Na as the cjkcodecs codeowner (pythongh-101731)
  pythongh-101678: Merge math_1_to_whatever() and math_1() (python#101730)
  pythongh-101678: refactor the math module to use special functions from c11 (pythonGH-101679)
  pythongh-85984: Remove legacy Lib/pty.py code. (python#92365)
  pythongh-98831: Use opcode metadata for stack_effect() (python#101704)
  pythongh-101283: Version was just released, so should be changed in 3.11.3 (pythonGH-101719)
  pythongh-101283: Fix use of unbound variable (pythonGH-101712)
  pythongh-101283: Improved fallback logic for subprocess with shell=True on Windows (pythonGH-101286)
  pythongh-101277: Port more itertools static types to heap types (python#101304)
  pythongh-98831: Modernize CALL and family (python#101508)
  pythonGH-101696: invalidate type version tag in `_PyStaticType_Dealloc` (python#101697)
  pythongh-100221: Fix creating dirs in `make sharedinstall` (pythonGH-100329)
  pythongh-101670: typo fix in PyImport_AppendInittab() (pythonGH-101672)
  pythongh-101196: Make isdir/isfile/exists faster on Windows (pythonGH-101324)
  pythongh-101614: Don't treat python3_d.dll as a Python DLL when checking extension modules for incompatibility (pythonGH-101615)
  pythongh-100933: Improve `check_element` helper in `test_xml_etree` (python#100934)
  pythonGH-101578: Normalize the current exception (pythonGH-101607)
  pythongh-47937: Note that Popen attributes are read-only (python#93070)
  ...
carljm added a commit to carljm/cpython that referenced this issue Mar 14, 2023
* main: (50 commits)
  pythongh-102674: Remove _specialization_stats from Lib/opcode.py (python#102685)
  pythongh-102660: Handle m_copy Specially for the sys and builtins Modules (pythongh-102661)
  pythongh-102354: change python3 to python in docs examples (python#102696)
  pythongh-81057: Add a CI Check for New Unsupported C Global Variables (pythongh-102506)
  pythonGH-94851: check unicode consistency of static strings in debug mode (python#102684)
  pythongh-100315: clarification to `__slots__` docs. (python#102621)
  pythonGH-100227: cleanup initialization of global interned dict (python#102682)
  doc: Remove a duplicate 'versionchanged' in library/asyncio-task (pythongh-102677)
  pythongh-102013: Add PyUnstable_GC_VisitObjects (python#102014)
  pythonGH-102670: Use sumprod() to simplify, speed up, and improve accuracy of statistics functions (pythonGH-102649)
  pythongh-102627: Replace address pointing toward malicious web page (python#102630)
  pythongh-98831: Use DECREF_INPUTS() more (python#102409)
  pythongh-101659: Avoid Allocation for Shared Exceptions in the _xxsubinterpreters Module (pythongh-102659)
  pythongh-101524: Fix the ChannelID tp_name (pythongh-102655)
  pythongh-102069: Fix `__weakref__` descriptor generation for custom dataclasses (python#102075)
  pythongh-98169 dataclasses.astuple support DefaultDict (python#98170)
  pythongh-102650: Remove duplicate include directives from multiple source files (python#102651)
  pythonGH-100987: Don't cache references to the names and consts array in `_PyEval_EvalFrameDefault`. (python#102640)
  pythongh-87092: refactor assemble() to a number of separate functions, which do not need the compiler struct (python#102562)
  pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives (python#102631)
  ...
@gvanrossum
Copy link
Member Author

Let's close this issue, the interpreter is now definitely being generated. Subsequent changes may reference this (if small) or have their own issue.

@github-project-automation github-project-automation bot moved this from Todo to Done in Fancy CPython Board Mar 15, 2023
Fidget-Spinner pushed a commit to Fidget-Spinner/cpython that referenced this issue Mar 27, 2023
warsaw pushed a commit to warsaw/cpython that referenced this issue Apr 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

2 participants