
Define the interface of a CodeLike object #117087


Open
iritkatriel opened this issue Mar 20, 2024 · 7 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@iritkatriel
Member

iritkatriel commented Mar 20, 2024

The C API for monitoring (#111997) works with code-like objects, so that the user is not required to create a CodeObject where there isn't one already.

We need to define the Python API of a CodeLike so that it's useful for tools that use monitoring. This issue is to define which fields of CodeObject we want to have on a CodeLike.

@markshannon @scoder @nedbat @gaogaotiantian


@gaogaotiantian
Member

So basically the CodeLike object is a Python object for Python libraries to get information from, right? If it complies with the existing monitoring callbacks, then we'd expect any callback function to work even with a CodeLike object, but that's not entirely possible, right? A user of sys.monitoring can do whatever they want with the code object, so unless we mimic a full CodeObject, they can always fall into some trap.

Or are we actually talking about a useful object that provides some information about the code and could be used by a variety of tools? For example, when Cython triggers a LINE monitoring event, it provides a CodeLike object that contains a line number for the Cython code, even though it has nothing to do with a CPython CodeObject. Then the question would be: what information do tools need for monitoring events? Debuggers might be too complicated to satisfy, but profilers might have something in common.

@markshannon
Member

markshannon commented Apr 12, 2024

The main purpose of the code-like object, from the perspective of tools, is to convert a code_like/offset pair into a full location: filename, startline, startcolumn, endline, endcolumn.

We also want to support the introspection methods/attributes of code objects, with the same names, to ease porting coverage.py, profile, etc.

So, how about this:

```python
from abc import ABCMeta, abstractmethod


class CodeLike(metaclass=ABCMeta):

    @abstractmethod
    def offset_to_location(self, offset):
        """Returns the 5-tuple (filename, startline, startcolumn, endline, endcolumn) for the given offset.
        May return None if the offset is valid, but there is no location information for it.
        If the offset is not valid, a ValueError should be raised."""

    @property
    @abstractmethod
    def co_name(self):
        "The (short) name of this callable"

    @property
    @abstractmethod
    def co_qualname(self):
        "The full, qualified name of this callable"

    @property
    @abstractmethod
    def co_filename(self):
        "The name of the primary file defining this callable"

    @property
    @abstractmethod
    def co_argcount(self):
        "The maximum number of arguments for this callable"

    @property
    @abstractmethod
    def co_posonlyargcount(self):
        "The number of positional-only arguments for this callable"

    @property
    @abstractmethod
    def co_kwonlyargcount(self):
        "The maximum number of keyword-only arguments for this callable"
```
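As an illustration of how a non-bytecode implementation could satisfy this ABC, here is a sketch using a trimmed copy of the proposal (the argument-count properties are omitted for brevity) and a hypothetical `TableCode` class whose offsets index into a location table, roughly what a compiled extension might provide. `TableCode` and `Incomplete` are invented names, not part of the proposal.

```python
from abc import ABCMeta, abstractmethod


class CodeLike(metaclass=ABCMeta):
    # Trimmed copy of the proposal: argument-count properties omitted.
    @abstractmethod
    def offset_to_location(self, offset): ...

    @property
    @abstractmethod
    def co_name(self): ...

    @property
    @abstractmethod
    def co_qualname(self): ...

    @property
    @abstractmethod
    def co_filename(self): ...


class TableCode(CodeLike):
    """Hypothetical code-like object for code without CPython bytecode."""

    def __init__(self, name, qualname, filename, locations):
        self._name = name
        self._qualname = qualname
        self._filename = filename
        # offset -> (startline, startcolumn, endline, endcolumn), or None
        self._locations = locations

    @property
    def co_name(self):
        return self._name

    @property
    def co_qualname(self):
        return self._qualname

    @property
    def co_filename(self):
        return self._filename

    def offset_to_location(self, offset):
        if offset not in self._locations:
            raise ValueError(f"invalid offset: {offset}")
        loc = self._locations[offset]
        if loc is None:
            return None  # valid offset, but no location information
        return (self._filename, *loc)


class Incomplete(CodeLike):
    """Forgets the co_* properties, so it cannot be instantiated."""

    def offset_to_location(self, offset):
        return None


code = TableCode("parse", "Parser.parse", "parser.pyx", {0: (10, 4, 10, 20), 1: None})
print(code.offset_to_location(0))  # ('parser.pyx', 10, 4, 10, 20)
print(code.offset_to_location(1))  # None

try:
    Incomplete()
except TypeError as exc:
    print("rejected:", exc)
```

The ABC only enforces that the members exist; whether the returned values are meaningful is still up to the implementer.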

@scoder
Contributor

scoder commented May 27, 2024

My impression is that most code that uses tracing currently expects a real CodeObject. The current implementation also requires it; see #111997 (comment) and here:

```c
allocate_instrumentation_data(PyCodeObject *code)
{
    ASSERT_WORLD_STOPPED_OR_LOCKED(code);
    if (code->_co_monitoring == NULL) {
        code->_co_monitoring = PyMem_Malloc(sizeof(_PyCoMonitoringData));
        if (code->_co_monitoring == NULL) {
            PyErr_NoMemory();
            return -1;
        }
        code->_co_monitoring->local_monitors = (_Py_LocalMonitors){ 0 };
        code->_co_monitoring->active_monitors = (_Py_LocalMonitors){ 0 };
        code->_co_monitoring->tools = NULL;
        code->_co_monitoring->lines = NULL;
        code->_co_monitoring->line_tools = NULL;
        code->_co_monitoring->per_instruction_opcodes = NULL;
        code->_co_monitoring->per_instruction_tools = NULL;
    }
    return 0;
}
```

Can't we split the CodeObject type somehow and expose a public part (or subclass) of it, so that code that needs the public interface keeps working, while internal data stays internal and is reached through, e.g., a generic "here's more internal stuff" pointer? The _PyCoMonitoringData already goes in that direction.

The current CodeObject is really two things in one: a CallableMetadataObject and a BytecodeObject. They are not the same, even in CPython. Builtin functions should have the first but not the second.

@JacobCoffee JacobCoffee added type-feature A feature request or enhancement interpreter-core (Objects, Python, Grammar, and Parser dirs) and removed interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Oct 10, 2024
@markshannon
Member

In the sys.monitoring docs, CodeType is used in the various event callbacks and in sys.monitoring.get_local_events and sys.monitoring.set_local_events.

Changing the callback signature to expect CodeLike instead of just CodeType is mainly a documentation and social issue: we need buy-in from tool authors, not just a change to the docs.
Pure Python code without explicit isinstance checks should just work, though.

To support sys.monitoring.get_local_events and sys.monitoring.set_local_events we'll need two more methods on CodeLike objects:

```python
from abc import ABCMeta, abstractmethod


class CodeLike(metaclass=ABCMeta):

    ...  # offset_to_location() and the co_* properties proposed above

    @abstractmethod
    def __get_local_events__(self, tool_id: int) -> int:
        """Returns the local events previously set by __set_local_events__.
        Called by sys.monitoring.get_local_events(tool_id, self)"""

    @abstractmethod
    def __set_local_events__(self, tool_id: int, event_set: int) -> None:
        "Sets the local events. Called by sys.monitoring.set_local_events(tool_id, self, event_set)"
```
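To show how these two hooks might be used, here is a sketch of a hypothetical pure-Python dispatcher: real code objects go to the existing `sys.monitoring` functions (Python 3.12+), while any other object is asked to store its own event set. The dispatcher functions and `SyntheticCode` are assumptions for illustration; the thread does not specify how CPython would implement the dispatch.

```python
import sys
import types


def set_local_events(tool_id: int, code, event_set: int) -> None:
    """Hypothetical dispatcher modelled on sys.monitoring.set_local_events."""
    if isinstance(code, types.CodeType):
        sys.monitoring.set_local_events(tool_id, code, event_set)
    else:
        code.__set_local_events__(tool_id, event_set)


def get_local_events(tool_id: int, code) -> int:
    """Hypothetical dispatcher modelled on sys.monitoring.get_local_events."""
    if isinstance(code, types.CodeType):
        return sys.monitoring.get_local_events(tool_id, code)
    return code.__get_local_events__(tool_id)


class SyntheticCode:
    """Hypothetical code-like object that keeps one event set per tool."""

    def __init__(self):
        self._events = {}  # tool_id -> event set (bit mask)

    def __get_local_events__(self, tool_id: int) -> int:
        return self._events.get(tool_id, 0)

    def __set_local_events__(self, tool_id: int, event_set: int) -> None:
        self._events[tool_id] = event_set


synthetic = SyntheticCode()
# Arbitrary example mask; real tools would combine sys.monitoring.events flags.
set_local_events(2, synthetic, 0b100001)
print(get_local_events(2, synthetic), get_local_events(3, synthetic))
```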

@nedbat
Member

nedbat commented Mar 19, 2025

I'm a tool author, but a bit lost on what is being asked of me here. Coverage.py now fully supports branch coverage using sys.monitoring as it is.

@markshannon
Member

markshannon commented Mar 19, 2025

@nedbat We are asking which parts of the code object API coverage (and other tools that use sys.monitoring) actually use, and how easy or difficult it would be to change them to use only the API proposed above.

For example, does coverage use the co_positions() method, which is not in the proposed API?
And, if it does use co_positions, how easy would it be to use the proposed offset_to_location() method instead?

@nedbat
Member

nedbat commented Mar 20, 2025

Things coverage.py does with code objects:

  • iterates over code.co_consts to find nested code objects (including using isinstance(c, CodeType) to identify them).
  • checks if c.co_name != "__annotate__" to skip annotations while looking for code objects.
  • creates them using the compile() built-in.
  • reads them from .pyc files using marshal.load().
  • uses c.co_lines() to get line numbers of executable code. Also still uses c.co_lnotab and c.co_firstlineno for the same, but that will be gone when we drop 3.9 in the fall.
  • for debug logging, uses c.co_name, c.co_filename, and c.co_firstlineno.
  • calls dis.get_instructions(c) to analyze bytecode.
  • uses c.co_firstlineno to record possible jump destinations.
  • uses id(code) to uniquely identify them (we had a discussion about equality of code objects).

I don't use co_positions() at all.


7 participants