Skip to content

gh-106608: make uop trace variable length #107531

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Aug 5, 2023
Merged

Conversation

rdrf2838
Copy link
Contributor

@rdrf2838 rdrf2838 commented Aug 1, 2023

The _PyUOpExecutorObject was hard-coded to have a length-32 trace buffer of _PyUOpInstructions. The goal is to allow the buffer to have a variable length which can be determined at runtime.

To do so, I changed the type of the base _PyExecutorObject to a variable object, and used the PyObject_NewVar function to initialize the object instead.

After using gdb to trace the usage of the _PyUOpExecutorObject, I did not find any use which needed to be modified to work with variable objects, and so I did not make any additional changes.

This is my first time contributing, so please let me know if I made a mistake anywhere!

@bedevere-bot
Copy link

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

@ghost
Copy link

ghost commented Aug 1, 2023

All commit authors signed the Contributor License Agreement.
CLA signed

@@ -698,7 +698,7 @@ uop_optimize(
return trace_length;
}
OBJECT_STAT_INC(optimization_traces_created);
_PyUOpExecutorObject *executor = PyObject_New(_PyUOpExecutorObject, &UOpExecutor_Type);
_PyUOpExecutorObject *executor = PyObject_NewVar(_PyUOpExecutorObject, &UOpExecutor_Type, trace_length+1);
Copy link
Contributor Author

@rdrf2838 rdrf2838 Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trace_length+1 value is used, as line 708 adds a sentinel uop which causes the number of instructions to be trace_length+1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a comment here so that we don't forget, or to help others understand?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then again, I think the allocator will allocate an extra item anyways because of

.tp_basicsize = sizeof(_PyUOpExecutorObject),

If you compare to PyTuple_Type, it uses this:

sizeof(PyTupleObject) - sizeof(PyObject *),

so that the basic size excludes the size of the dummy array in the PyTupleObject type:

    PyObject *ob_item[1];

I do like that we explicitly have +1 here to make it clear that we intend to use the sentinel value.

However, now that the executor is variable-length and always has room for the sentinel, I think that on line 708 below you need to set the sentinel unconditionally, i.e. remove the condition on line 707.

Please also update uop_len to just return ob_size instead of counting until the sentinel. (This also means we could just get rid of the sentinel -- I think it only exists for uop_size, and it's never reachable by the interpreter.)

Copy link
Member

@Fidget-Spinner Fidget-Spinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useful improvement

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm only requesting changes to counteract @Fidget-Spinner's premature approval.)

@@ -698,7 +698,7 @@ uop_optimize(
return trace_length;
}
OBJECT_STAT_INC(optimization_traces_created);
_PyUOpExecutorObject *executor = PyObject_New(_PyUOpExecutorObject, &UOpExecutor_Type);
_PyUOpExecutorObject *executor = PyObject_NewVar(_PyUOpExecutorObject, &UOpExecutor_Type, trace_length+1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then again, I think the allocator will allocate an extra item anyways because of

.tp_basicsize = sizeof(_PyUOpExecutorObject),

If you compare to PyTuple_Type, it uses this:

sizeof(PyTupleObject) - sizeof(PyObject *),

so that the basic size excludes the size of the dummy array in the PyTupleObject type:

    PyObject *ob_item[1];

I do like that we explicitly have +1 here to make it clear that we intend to use the sentinel value.

However, now that the executor is variable-length and always has room for the sentinel, I think that on line 708 below you need to set the sentinel unconditionally, i.e. remove the condition on line 707.

Please also update uop_len to just return ob_size instead of counting until the sentinel. (This also means we could just get rid of the sentinel -- I think it only exists for uop_size, and it's never reachable by the interpreter.)

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@rdrf2838
Copy link
Contributor Author

rdrf2838 commented Aug 4, 2023

In response to Guido's comments, I have made the following changes:

  1. I have changed .tp_basicsize so that no excess memory is allocated.
  2. I verified that the sentinel value is used only in uop_len.
  3. Following 2., I removed the sentinel value and used Py_SIZE to access the object's length instead.
  4. I verified that the changes work by checking that the tests do call the uop_len function, such as in
    uops = {opname for opname, _, _ in ex}

I have made the requested changes; please review again

@bedevere-bot
Copy link

Thanks for making the requested changes!

@gvanrossum, @Fidget-Spinner: please review the changes made to this pull request.

@Fidget-Spinner
Copy link
Member

Woops yeah I overlooked what Guido mentioned. The new PR LGTM though (FWIW :).

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Peachy perfect. Thanks! I'll merge now.

@gvanrossum gvanrossum merged commit 4e6fac7 into python:main Aug 5, 2023
@gvanrossum
Copy link
Member

There might be buildbot failures. Don't worry I will handle those if necessary (but we've had a lot of false positives there lately).

@rdrf2838 rdrf2838 deleted the gh-106608 branch August 5, 2023 23:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants