gh-106812: Refactor to allow uops with array stack effects #107564
@Fidget-Spinner I am tempted to just merge this without fixing everything I listed above -- we can improve things iteratively, and I feel it's more important to go back to gh-106581 (this started as a giant yak to shave for that). What do you think?
I'd prefer we work on this iteratively as well. It's easier to review that way too. Also, surprisingly
looks good in general
Tools/cases_generator/analysis.py
if vars[eff.name] != eff:
    self.error(
        f"Instruction {instr.name!r} has "
        f"inconsistent types for variable {eff.name!r}: "
Should this be inconsistent types or just inconsistent in general?
Yeah, it could mean that either `type`, `cond` or `size` is inconsistent. I'll change it.
Tools/cases_generator/analysis.py
for name, eff in vars.items():
    if name in all_vars:
        if all_vars[name] != eff:
            self.warning(
                f"Macro {mac.name!r} has "
                f"inconsistent types for variable {name!r}: "
                f"{all_vars[name]} vs {eff} in {part.instr.name!r}",
                mac.macro,
            )
    else:
        all_vars[name] = eff
Will this block ever warn? Wouldn't it all be consistent, as it's already checked in `get_var_names`?
This checks for inconsistency across different instructions. E.g.
op(A, (args[oparg] --)) { ... }
op(B, (args[oparg+1] --)) { ... }
macro(M) = A + B;
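To make the cross-instruction case concrete, here is a minimal Python sketch of that kind of check. It is not the code from this PR: `Effect` and `find_conflicts` are hypothetical stand-ins, and the only assumption carried over from the thread is that two effects with the same name conflict when their `type`, `cond` or `size` differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Effect:
    # Hypothetical stand-in for the generator's stack effect record.
    name: str
    type: str = ""
    cond: str = ""
    size: str = ""


def find_conflicts(parts: dict[str, list[Effect]]) -> list[str]:
    """Return one message per variable whose effect differs between uops."""
    seen: dict[str, tuple[str, Effect]] = {}
    messages = []
    for uop_name, effects in parts.items():
        for eff in effects:
            if eff.name not in seen:
                # First time this variable name shows up in the macro.
                seen[eff.name] = (uop_name, eff)
            elif seen[eff.name][1] != eff:
                first_uop, first = seen[eff.name]
                messages.append(
                    f"variable {eff.name!r}: {first} in {first_uop!r} "
                    f"vs {eff} in {uop_name!r}"
                )
    return messages


# Mirrors macro(M) = A + B from the example above: same name, different size.
conflicts = find_conflicts({
    "A": [Effect("args", size="oparg")],
    "B": [Effect("args", size="oparg+1")],
})
```

Each uop is consistent on its own here, so a per-instruction check would pass; only the comparison across the parts of the macro reports the mismatch.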
@dataclasses.dataclass
class StackOffset:
Could this be represented instead as an index from the TOS, then a capture of the "stack"? E.g. for a `PEEK(1)` it would be `index=-2, stack=[item1, item2, item3, item4]`.
Hm... That's closer to the old way of doing this, where the stack was represented (implicitly) by variables `_tmp_1`, `_tmp_2`, etc., and the stack offset of an effect was represented as being mapped to one of those variables (using the old `input_mapping` and `output_mapping` members of `Component`). I ran into some problems there when an effect is conditional, and managed to hack that in (but only for the output effects of the last component). But with array effects for guard instructions, like we will need for CALL guards, I just couldn't hack it any more, and instead I came up with this abstraction, which can handle conditional and array effects anywhere in a macro.
The only thing this cannot handle is a situation where a value is temporarily pushed onto the stack, stays there for the next uop, and then is popped later:
op(A, (-- temp)) { ... } // stack: [] -> [temp]
op(B, (--)) { ... } // stack: [temp] -> [temp]
op(C, (temp --)) { ... } // stack: [temp] -> []
macro(M1) = A + C;
macro(M2) = A + B + C;
Here both `M1` and `M2` have a net stack effect of 0, `n_popped` is 0, and `n_pushed` is 0, and we cannot express using `n_popped` and `n_pushed` that this uses one temporary stack item. For `M1` that's not a problem, because the algorithm translates the push in `A` and the pop in `C` into a copy, which doesn't require stack space. (Also, the copy disappears because the variable name is the same.) But for `M2` the push and pop are not adjacent, so they are not optimized away like that.
I could improve the algorithm to recognize this situation and use a copy for M2, but it's more complicated and it's unlikely that we'll need this. (Most likely in a real case there would be a result pushed onto the stack at the end, so the problem of phantom stack space wouldn't occur.) I ought to at least detect it and warn, but I'd rather do that in a future PR, since this one is complex enough as it is.
Note: the generated code for `M2` clearly shows the problem:
TARGET(M2) {
    PyObject *temp;
    // A
    {
        ...
    }
    stack_pointer[0] = temp;
    // B
    {
        ...
    }
    // C
    temp = stack_pointer[0];
    {
        ...
    }
    DISPATCH();
}
Note that `stack_pointer[0]` is an invalid stack item, pointing just above the current stack top. (The actual top is `stack_pointer[-1]`.)
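The phantom slot in `M2` can be illustrated with a small Python model. This is a simplified sketch, not the algorithm in stacking.py: `Uop` and `analyze_macro` are hypothetical names, sizes are plain integers, and the only rule modelled is the one described above, namely that a push immediately followed by a pop in the next uop becomes a copy, while anything left over has to be written to the real stack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uop:
    # Hypothetical record: how many items a uop pops and pushes.
    name: str
    pops: int
    pushes: int


def analyze_macro(uops: list[Uop]) -> tuple[int, int]:
    """Return (net_effect, phantom_slots) for a macro made of `uops`."""
    on_stack = 0   # items written to the real stack, relative to the start
    in_locals = 0  # outputs of the previous uop still held in C locals
    peak = 0       # highest intermediate stack level seen so far
    for uop in uops:
        # Adjacent push/pop pairs turn into copies and need no stack space.
        copies = min(in_locals, uop.pops)
        # Outputs the next uop does not consume must be spilled to the stack.
        on_stack += in_locals - copies
        peak = max(peak, on_stack)
        # Inputs not satisfied by copies come off the real stack.
        on_stack -= uop.pops - copies
        in_locals = uop.pushes
    net = on_stack + in_locals  # final outputs are written to the stack
    # Slots used in the middle that neither the start nor the end accounts for.
    phantom = max(0, peak - max(0, net))
    return net, phantom


A = Uop("A", pops=0, pushes=1)
B = Uop("B", pops=0, pushes=0)
C = Uop("C", pops=1, pushes=0)

m1 = analyze_macro([A, C])     # (0, 0): the push and pop collapse into a copy
m2 = analyze_macro([A, B, C])  # (0, 1): temp sits on the stack while B runs
```

Under this model a trailing uop that pushes a result, as expected in a real macro, brings the final level up to the peak and the phantom count back to 0.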
(stack_analysis.py was no longer being called!)
Tools/cases_generator/analysis.py
@@ -371,7 +439,7 @@ def stack_analysis(
                 eff.size for eff in instr.input_effects + instr.output_effects
             ):
                 # TODO: Eventually this will be needed, at least for macros.
-                self.error(
+                self.warning(
FWIW, the checks in this function are no longer needed, and in fact it's not called any more, so I'm deleting it. :-)
(Just me mumbling to myself.)
res = f"stack_pointer[{index}]"
if not lax:
    # Check that we're not reading or writing above stack top.
    # Skip this for output variable initialization (lax=True).
I'm still pondering if I can add a working check for writing arrays above stack level. This is tricky because output array variables are initialized before the body of the opcode, at a point where `stack_pointer` is still low. E.g.
inst(UNPACK_SEQUENCE_TUPLE, (unused/1, seq -- values[oparg])) {
...
}
which produces output like this:
TARGET(UNPACK_SEQUENCE_TUPLE) {
    PyObject *seq;
    PyObject **values;
    seq = stack_pointer[-1];
    values = stack_pointer - 1;
    ...
    STACK_SHRINK(1);
    STACK_GROW(oparg);
    next_instr += 1;
    DISPATCH();
}
The assert would trigger, because the code (`...`) may well write above `stack_pointer`, but all is well because of the following `STACK_GROW(oparg)`. Hence the `lax` flag, which disables the assert for output array variables (L372 below). It's hard to conceive of a realistic example where the failing assert would not be a false positive. A theoretical example would be:
op(A, (-- temp[oparg])) { ... }
op(B, (temp[oparg] --)) { ... }
macro(M) = A + B;
But I don't expect we'll ever write such code.
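As a rough illustration of what the `lax` flag relaxes, here is a Python sketch with concrete numbers. It is an assumption-laden simplification: `array_base` is a hypothetical helper, and it takes integer sizes, whereas the generator works with symbolic sizes such as `oparg`.

```python
def array_base(start: int, size: int, lax: bool = False) -> str:
    """Return a C pointer expression for an array of `size` stack slots.

    `start` is relative to stack_pointer (-1 is the current top).
    Hypothetical helper; raises ValueError where the real code asserts.
    """
    if not lax and start + size > 0:
        # Strict mode: the array would extend above the current stack top.
        raise ValueError(
            f"array at offset {start} with {size} items extends above stack top"
        )
    if start == 0:
        return "stack_pointer"
    sign = "-" if start < 0 else "+"
    return f"stack_pointer {sign} {abs(start)}"


# UNPACK_SEQUENCE_TUPLE with oparg == 3: `values` starts where `seq` was
# and extends above the old top, so only the lax form accepts it.
values_expr = array_base(-1, 3, lax=True)  # "stack_pointer - 1"

try:
    array_base(-1, 3)
    strict_rejected = False
except ValueError:
    strict_rejected = True
```

The strict call rejects exactly the layout that `UNPACK_SEQUENCE_TUPLE` relies on, which is the false positive described above.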
This adds a new file, stacking.py, which tracks pushes and pops across the uops comprising a macro. Instruction writing for non-macro instructions is also unified with this.
The generated files look quite different, but I have carefully verified that everything works. (And usually if it doesn't, it won't even build. :-)
TODO:
- [...] `Analyzer` around and turn a few asserts into error messages
- [...] `Analyzer.check_macro_consistency` (fold into write_components)
- [...] `Analyzer.stack_analysis` that are no longer errors
- [...] `StackItem` so the effect itself is included in `deep`/`high`
- [...] `StackItem` to have a `StackOffset` member instead of inheriting it
- [...] `StackOffset` operations to use `__add__`, `__sub__` etc.