Skip to content

Duplicated unused bytecodes at end of function #86862

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
nedbat opened this issue Dec 20, 2020 · 7 comments
Closed

Duplicated unused bytecodes at end of function #86862

nedbat opened this issue Dec 20, 2020 · 7 comments
Assignees
Labels
3.10 only security fixes

Comments

@nedbat
Copy link
Member

nedbat commented Dec 20, 2020

BPO 42696
Nosy @rhettinger, @nedbat, @markshannon

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = 'https://github.com/markshannon'
closed_at = None
created_at = <Date 2020-12-20.21:29:18.434>
labels = ['3.10']
title = 'Duplicated unused bytecodes at end of function'
updated_at = <Date 2020-12-22.20:46:01.201>
user = 'https://github.com/nedbat'

bugs.python.org fields:

activity = <Date 2020-12-22.20:46:01.201>
actor = 'Mark.Shannon'
assignee = 'Mark.Shannon'
closed = False
closed_date = None
closer = None
components = []
creation = <Date 2020-12-20.21:29:18.434>
creator = 'nedbat'
dependencies = []
files = []
hgrepos = []
issue_num = 42696
keywords = []
message_count = 6.0
messages = ['383460', '383510', '383512', '383598', '383602', '383613']
nosy_count = 3.0
nosy_names = ['rhettinger', 'nedbat', 'Mark.Shannon']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue42696'
versions = ['Python 3.10']

@nedbat
Copy link
Member Author

nedbat commented Dec 20, 2020

(Using CPython commit c95f8bc.)

This program has extra bytecodes:

    def f():
        for i in range(10):
            break
        return 17

The dis output is:

  1           0 LOAD_CONST               0 (<code object f at 0x109cf7450, file "afterbreak.py", line 1>)
              2 LOAD_CONST               1 ('f')
              4 MAKE_FUNCTION            0
              6 STORE_NAME               0 (f)
              8 LOAD_CONST               2 (None)
             10 RETURN_VALUE

Disassembly of <code object f at 0x109cf7450, file "afterbreak.py", line 1>:
  2           0 LOAD_GLOBAL              0 (range)
              2 LOAD_CONST               1 (10)
              4 CALL_FUNCTION            1
              6 GET_ITER
              8 FOR_ITER                 8 (to 18)
             10 STORE_FAST               0 (i)

  3          12 POP_TOP

  4          14 LOAD_CONST               2 (17)
             16 RETURN_VALUE
        >>   18 LOAD_CONST               2 (17)
             20 RETURN_VALUE

The break has something to do with it, because if I change the Python to:

    def f():
        for i in range(10):
            a = 1
        return 17

then the dis output is:

  1           0 LOAD_CONST               0 (<code object f at 0x10bf8f450, file "afterbreak.py", line 1>)
              2 LOAD_CONST               1 ('f')
              4 MAKE_FUNCTION            0
              6 STORE_NAME               0 (f)
              8 LOAD_CONST               2 (None)
             10 RETURN_VALUE

Disassembly of <code object f at 0x10bf8f450, file "afterbreak.py", line 1>:
  2           0 LOAD_GLOBAL              0 (range)
              2 LOAD_CONST               1 (10)
              4 CALL_FUNCTION            1
              6 GET_ITER
        >>    8 FOR_ITER                 8 (to 18)
             10 STORE_FAST               0 (i)

  3          12 LOAD_CONST               2 (1)
             14 STORE_FAST               1 (a)
             16 JUMP_ABSOLUTE            8

  4     >>   18 LOAD_CONST               3 (17)
             20 RETURN_VALUE

@nedbat nedbat added the 3.10 only security fixes label Dec 20, 2020
@markshannon
Copy link
Member

My guess is that we are changing the CFG after computing reachability leaving unreachable blocks marked as reachable.

@markshannon markshannon self-assigned this Dec 21, 2020
@markshannon
Copy link
Member

Our reachability analysis is correct (in this case at least).
There is no unreachable code here.

In theory we could merge the two basic blocks at the end, but searching for identical blocks and merging them is potentially quite expensive (and fiddly).

What might be better is to be change the order of optimizations, so that we duplicate BBs after we have performed jump-to-jump elimination.

Is this a problem in practice?

@nedbat
Copy link
Member Author

nedbat commented Dec 22, 2020

This isn't a problem for me. I noticed it and figured I'd mention it.

@rhettinger
Copy link
Contributor

Is this a problem in practice?

It's just a hint that code generation has imperfections.

From a teacher's or author's point of view, it is something we have to explain away when demonstrating dis(). It leaves the impression that Python is kludgy.

@markshannon
Copy link
Member

In what way is it "kludgy"?
There is a mathematical theorem that states that generated code will be imperfect,
so we will have to live with that :)
https://en.wikipedia.org/wiki/Full_employment_theorem

3.10a produces more efficient bytecode than 3.9.

3.9:

2 0 LOAD_GLOBAL 0 (range)
2 LOAD_CONST 1 (10)
4 CALL_FUNCTION 1
6 GET_ITER
>> 8 FOR_ITER 8 (to 18)
10 STORE_FAST 0 (i)

3 12 POP_TOP
14 JUMP_ABSOLUTE 18
16 JUMP_ABSOLUTE 8

4 >> 18 LOAD_CONST 2 (17)
20 RETURN_VALUE

3.10a:

2 0 LOAD_GLOBAL 0 (range)
2 LOAD_CONST 1 (10)
4 CALL_FUNCTION 1
6 GET_ITER
8 FOR_ITER 8 (to 18)
10 STORE_FAST 0 (i)

3 12 POP_TOP

4 14 LOAD_CONST 2 (17)
16 RETURN_VALUE
>> 18 LOAD_CONST 2 (17)
20 RETURN_VALUE

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@iritkatriel
Copy link
Member

TL;DR: the return code is no longer duplicated for this example (but if you remove the return 17 then the implicit return None is still duplicated).

The long story:

The reason for the duplication is a small-exit-block-inlineing heuristic, where if we have an unconditional jump to an exit block of at most 4 instructions, that exit block is inlined and the jump is eliminated. I recently disabled this optimisation for the case where the exit block is associated with a line number, because it caused issues for the debugger (see #94592).

For cases where the block doesn't have a line number (as in the case of the implicit return None), we need to duplicate the block because the return needs to be traced with the line number of the last instruction that was executed that has a line number. Since it's too costly to keep track of this line number at runtime, the line info is static and this means that we need a copy of the return None for each location that it can be reached from.

Not duplicated:

>>> def f():
...         for i in range(10):
...             break
...         return 17
... 
>>> import dis
>>> dis.dis(f)
  1           0 RESUME                   0

  2           2 LOAD_GLOBAL              1 (NULL + range)
             14 LOAD_CONST               1 (10)
             16 CALL                     1
             26 GET_ITER
             28 FOR_ITER                 2 (to 36)
             32 STORE_FAST               0 (i)

  3          34 POP_TOP

  4     >>   36 LOAD_CONST               2 (17)
             38 RETURN_VALUE

Duplicated:

>>> def g():
...         for i in range(10):
...             break
... 
>>> dis.dis(g)
  1           0 RESUME                   0

  2           2 LOAD_GLOBAL              1 (NULL + range)
             14 LOAD_CONST               1 (10)
             16 CALL                     1
             26 GET_ITER
             28 FOR_ITER                 4 (to 40)
             32 STORE_FAST               0 (i)

  3          34 POP_TOP
             36 LOAD_CONST               0 (None)
             38 RETURN_VALUE

  2     >>   40 LOAD_CONST               0 (None)
             42 RETURN_VALUE

@iritkatriel iritkatriel added the pending The issue will be closed if no feedback is provided label Jul 9, 2022
@iritkatriel iritkatriel closed this as not planned Won't fix, can't repro, duplicate, stale Jul 15, 2022
@erlend-aasland erlend-aasland removed the pending The issue will be closed if no feedback is provided label Jul 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.10 only security fixes
Projects
None yet
Development

No branches or pull requests

5 participants