-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
bpo-38328: Speed up the creation time of constant list and set literals. #17114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-38328: Speed up the creation time of constant list and set literals. #17114
Conversation
@benjaminp, care to take a look at this when you have a chance? |
Would you merge master and |
Thanks @methane. This branch is no longer stale. |
Python/compile.c
Outdated
@@ -3655,6 +3656,27 @@ starunpack_helper(struct compiler *c, asdl_seq *elts, | |||
{ | |||
Py_ssize_t n = asdl_seq_LEN(elts); | |||
Py_ssize_t i, nsubitems = 0, nseen = 0; | |||
if (n > 1 && are_all_items_const(elts, 0, n)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is n > 1
really good threshold? How about n > 2
?
$ python3 -m timeit -s "a=frozenset((1,2))" -- "{*a}"
5000000 loops, best of 5: 63.4 nsec per loop
$ python3 -m timeit "{1,2}"
5000000 loops, best of 5: 61.5 nsec per loop
$ python3 -m timeit "[*(1,2)]"
5000000 loops, best of 5: 46.5 nsec per loop
$ python3 -m timeit "[1,2]"
5000000 loops, best of 5: 46.6 nsec per loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only chose n > 1
because it always emits fewer opcodes than before the change. If you think n > 2
is a better heuristic, I'm fine with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your first example is only slower because it pays the price of a LOAD_NAME
instead of a LOAD_CONST
(just a guess, though). It's not a direct comparison, unfortunately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fewer opcodes is not always more efficient.
BUILD_LIST_UNPACK calls PyList_New(0)
and PyList_Extend(iterable)
. It causes over-allocation.
On the other hand, BUILD_LIST just calls PyList_New(n)
.
So I think we should be conservative about choosing the threshold.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback @methane. I think this makes sense.
I've bumped the threshold to your recommended value of n > 2
. Anything else?
Thanks for the guidance and merge @methane! |
At @methane's request, here is an alternate PR that takes place in the compiler, not the peephole optimizer. It would replace both #16498 and #16878.
I think I like this one better.
https://bugs.python.org/issue38328