Proposed spec changes + rethrow question #125
Comments
I agree that we should do this. The presentation and previous discussion show why, but here's my tl;dr version if it's helpful: The 2 reasons we originally had for moving to use I think getting consensus on switching away from exnref is the critical thing (that's item 1 in @aheejin's list above). The proposed way to solve the unwind mismatch problem is |
(also direct ping of some folks who've expressed opinions or participated in discussion previously: @rossberg @RossTate @ioannad @lukewagner @takikawa @fgmccabe @tlively @conrad-watt @KronicDeth @taralx) |
I find the example in the slides a bit confusing. Why would you not move |
I also am confused the same as @taralx. I assumed the example was just an oversimplification to fit it all in a slide and there is some pattern where the try can’t surround just the one call. |
Yes, that example is an oversimplification for presentation purposes. There are cases that The below is a portion of a real CFG. It looks a little complicated, but apart from the BB names being long and cumbersome (because it is from a real application), it is not that bad. This is just to show that this kind of CFG exists, so people who aren't interested can skip it. In this example,
try ;; _ZN10PixelArrayIhE6ResizeERK16PixelArrayFormathi.exit / matching 'catch' at ehcleanup59
try ;; _ZN10PixelArrayIhE6ResizeERK16PixelArrayFormathi.exit / matching 'catch' at ehcleanup54
...
code for for.body.i.i.i.i.preheader.i.218 ;; unwind destination mismatch!
...
code for for.body.i.i.i.i.preheader.i.247
...
catch ;; ehcleanup54
end
catch ;; ehcleanup59
end
|
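The mismatch flagged in the listing above can be modeled in a few lines. This is only a minimal sketch under stated assumptions, not the LLVM backend's CFG stackification: the block names (`b0`, `b1`, `b2`), the handler names (`outer`, `inner`), and the function `find_unwind_mismatches` are all hypothetical, and try ranges are assumed to be given explicitly and properly nested. It compares the handler each block is supposed to unwind to with the innermost `try` that actually encloses it in the linear layout.

```python
def find_unwind_mismatches(layout, tries, wanted):
    """Return (block, wanted_handler, actual_handler) for every mismatched block.

    layout: block names in their linear order.
    tries:  (first_index, last_index, handler) tuples, outermost first,
            assumed to be properly nested (hypothetical representation).
    wanted: block name -> handler the CFG says it should unwind to
            (None means "unwind to the caller").
    """
    mismatches = []
    for i, block in enumerate(layout):
        actual = None
        for first, last, handler in tries:
            if first <= i <= last:
                actual = handler  # later entries are more deeply nested
        if actual != wanted.get(block):
            mismatches.append((block, wanted.get(block), actual))
    return mismatches


# Two try ranges start at the same point, as in the listing above.
layout = ["b0", "b1", "b2"]
tries = [(0, 2, "outer"), (0, 2, "inner")]
wanted = {"b0": "inner", "b1": "outer", "b2": "inner"}
mismatches = find_unwind_mismatches(layout, tries, wanted)
```

In this toy layout, `b1` sits between two blocks that unwind to `inner`, so it ends up inside the `inner` range even though it should unwind to `outer`.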
I don't understand here. While it is true that this is the nearest common dominator, this CFG is irreducible even without the exception paths. So it's going to undergo rewriting, and any decision about where to put the Is it possible to distill this into a case with a reducible CFG and exception mismatch? |
@taralx Can I ask why this is irreducible? Currently our backend, which has our own pass for fixing irreducible CFGs, does not seem to rewrite this. I also tried LLVM's target-independent pass for fixing irreducible CFG and it doesn't seem to change the CFG either. |
You're right, not irreducible. But still seems ok after lowering:
|
@taralx Not sure what you mean by 'lowering'. The reason |
cc @backes too |
Here's my problem. If I try to convert your CFG into wasm, I get to this point:
And now what? I said "irreducible" earlier, but that was the wrong word -- it's not in a structured flow form. |
@taralx You can use |
@taralx And those long names in the CFG are BB names, not function names, and each BB can have any number of instructions, including calls. The technique of placing The algorithm is called CFG stackification, and the full code is here just in case you are interested, but reading this is not necessary for this discussion at all. Maybe there exists a complicated general algorithm that can convert all CFGs that cause this mismatch problem so we don't need |
Ah, I see, this is an artifact of how LLVM converts its CFG, not an inherent limitation of the current model in wasm. FWIW, the Dream decompiler has an algorithm that reconstructs structured control flow from an arbitrary CFG. Importantly, it does so by constructing boolean predicates that can be used to linearize complex flows. That's where that if statement came from in my earlier comment. |
@taralx While our LLVM backend's CFG stackification isn't the only way to make structural control flow, I'm not sure other more powerful algorithms like you pointed out can solve our problem without any spec changes (if we remove The restriction here is, The currently proposed solution is one way to fix this, and we also think it can be a small addition to the first version of the proposal. But maybe there can be other alternative solutions, such as
What I'd like to say is, the unwind mismatch problem itself will very likely be present even outside of our LLVM backend, as long as someone tries to convert a CFG, basically BB soup, to linearized wasm. The specific solution we proposed might be more suited to our LLVM backend, but I think other compilers can easily make use of it too. |
To be clear, we are talking about an extra byte for each call instruction. However, I think there is a stronger argument against this (unfortunately): it would require either a set of completely new call instructions, or it would require invalidating existing code. Neither option seems palatable. |
As a meta point, my understanding is that the most pressing item is to determine if we want to go in the overall direction of these changes. Once that's decided, we can have more focused discussions on specifics (like the various options @aheejin mentions), but first we need to establish the high-level direction. |
Yes, that's also one of the concerns I posted.
I listed other options just to illustrate that our proposal wouldn't be the only solution in the world, and while it is possible that other changes can fix this too, they have downsides, and the unwind mismatch problem itself will likely remain in other toolchains if we don't do anything. We can still fine-tune the changes, but I'm not sure if we should spend more time discussing all those different paths in detail (whether we should introduce |
We decided to make the change (without the immediate argument to |
Sorry that I couldn't make it to the meeting yesterday. But I strongly suggest reconsidering dropping the immediate, since it creates a language that no longer composes. Consider a pseudo-code source snippet like
This is something you would want to write, and it works. But now imagine the failure is itself signalled by an exception. Then you need to be able to write:
This would no longer translate. Such failures of composability of nestable constructs due to implicit naming are a common mistake in programming languages. Please let's not repeat that old mistake! |
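The composability concern can be illustrated by analogy in Python, which has the same split between an implicit rethrow and an explicit one. This is only a hedged analogy, not wasm semantics and not the snippet from the comment above: the function `handle` and the exception types are hypothetical. Inside a nested handler, a bare `raise` always refers to the innermost exception being handled, so rethrowing the outer one requires naming it, which plays the role of the immediate on `rethrow`.

```python
def handle(explicit):
    """Fail while handling an exception, then rethrow.

    explicit=True  names the outer exception (analogue of an immediate).
    explicit=False uses a bare raise, which targets the innermost handler.
    """
    try:
        raise ValueError("original failure")
    except ValueError as outer:
        try:
            # The recovery attempt itself signals failure with an exception.
            raise RuntimeError("recovery failed")
        except RuntimeError:
            if explicit:
                raise outer  # rethrows the original ValueError
            raise  # rethrows the RuntimeError, not the original
```

Without a way to name the outer exception, wrapping the recovery step in its own handler changes which exception a rethrow refers to.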
It's still unclear what the use cases for |
We should make that an orthogonal discussion. As long as the instruction exists, the reference to the catch should be explicit. |
Makes sense. |
We didn't include it in yesterday's poll, but we can certainly add it if we need it. It was more like we didn't include that in the poll than we decided to drop it. C++ does not need that, but as @rossberg pointed out, other languages might. We can discuss this in a separate issue. |
I'm noticing that |
I don't think we can apply an algorithm similar to fixing irreducible CFG to this problem, and I haven't been able to come up with a general transformation algorithm that solves this problem. Unless we have an algorithm at hand, I'm not sure what you're suggesting. Also, even if we come up with some complicated algorithm someday later, I don't think we necessarily need to remove it. As I said in the discussions on And also, I don't think it is "adding a lot of complexity and questions" as you suggested; the confusion was whether it targets |
As you point out above, the issue comes up when the nearest common dominator of the blocks that a |
@RossTate I don't understand how that would work. Suppose there's a big And even if it can be done in some way, unwind mismatches are rare but not that rare. I've seen dozens of them within a single big file in a real application. One I'm not saying there cannot ever exist an algorithm that can do this transformation. There may be. But that algorithm will undoubtedly be complicated and involve a lot of code duplication. I think this is a sufficient reason to have |
Just to clarify, my thought experiment only requires duplicating the |
|
Apparently it is a new control construct. |
@RossTate Can you clarify how we can duplicate catch blocks to solve unwind mismatch problem? |
Yes, and to elaborate on why, consider the following:
If this were a purely syntactic construct the way @RossTate means that, the However, |
Thanks @tlively. And there's more discussion in #130. (Sorry, I replied to @taralx on my phone and thought we were on that thread.)
Happy to! Let's consider the useful concrete example you found for us above: If you triplicate So one way to modify |
This makes sense. I'm not sure where the code of the handler would live, but "range splitting" the try does make sense, and does avoid the "jump over" which has more complex semantics. |
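The "range splitting" idea can be sketched as follows. This is a flat, simplified model under loud assumptions, not the transformation proposed in the thread (whose details are elided above): it ignores nesting, the names `split_try_ranges`, `handler_copies`, `b0`..`b2`, `inner`, and `outer` are hypothetical, and each maximal run of consecutive blocks with the same unwind destination gets its own `try` with its own copy of the handler.

```python
from collections import Counter
from itertools import groupby


def split_try_ranges(layout, wanted):
    """Split the linear layout into maximal runs sharing one unwind handler.

    Returns a list of (handler, [blocks]) pairs; blocks that unwind to the
    caller (handler None) are left outside any try.
    """
    runs = []
    for handler, group in groupby(layout, key=lambda block: wanted.get(block)):
        blocks = list(group)
        if handler is not None:
            runs.append((handler, blocks))
    return runs


def handler_copies(runs):
    """Count how many times each handler body would have to be emitted."""
    return Counter(handler for handler, _ in runs)


layout = ["b0", "b1", "b2"]
wanted = {"b0": "inner", "b1": "outer", "b2": "inner"}
runs = split_try_ranges(layout, wanted)
copies = handler_copies(runs)
```

The copy count makes the trade-off visible: no block sits in the wrong range, but the `inner` handler is emitted twice, which is the code size cost raised elsewhere in the thread.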
@RossTate That might work. I think we considered something similar briefly 3 years ago when we first discovered the unwind mismatch problem but dismissed it quickly over concerns about code size increase. In this particular example we only need to duplicate a small portion, but it can be a large I'm a little curious how large the code size increase caused by this would be in real-world programs, and I may want to experiment on that later, but removing this at the moment seems risky to me. Also, your concerns about this instruction don't sound very concrete at the moment, and you mostly talked about hypothetical uncertainties wrt your future instructions. I'd like to keep the same argument as I did for |
Remember that a Having looked at the code you need to change, I am not so sure that I understand that redundancies across instructions are fine. But every instruction is additional implementation and testing complexity, especially so when it involves control and so can be particularly dangerous from a browser-security perspective. So culling the instruction set can help get the feature out the door quicker. |
Not really.
No, it is... easier to have
What is the new mismatch problem you are talking about?
You've been saying the same thing for every single instruction that you are not going to use or are not interested in. Even now, you are trying to remove Apparently other people, including VM developers, haven't expressed concerns with this so far, and you've been the only person arguing this is a serious security problem. So I'm curious what "get the feature out the door quicker" means. Does that mean you won't let it get out the door unless I remove the instructions you are telling me to remove? |
@RossTate Can you answer this question?
What is the new mismatch problem you are talking about? |
If I understand the code correctly, As for the concern about size caused by code-duplication, what I was pointing out is that the code in |
I understand (and was aware of) the former part that we need to insert a new try-delegate in that case, but I don't understand "previous checked function calls in the
Did you read my response above? |
I did. My point was that the handler is small as it is just the filter code, so it's fine to duplicate it with |
Can you answer this question too? |
I am not saying it needs to be extracted. Duplicate the entire Working on the other question. |
@RossTate You might be able to say |
Given C++ code:
this should presumably compile to roughly
Note that none of the user code is inside the wasm- |
|
I have investigated a number of languages and found they do not need
And the algorithm I am suggesting would work just as well for them.
Certainly. But we are already designing under the expectation that |
I think that was your conclusion; there are languages that certainly preserve stack traces when rethrown. You said some of them don't preserve it consistently, but that doesn't mean we should remove stack traces altogether. Usually they are auxiliary info and people want to have them for their debugging. Also as I pointed out in multiple issues,
Currently you are the only person arguing this is a security threat, and I asked many stakeholders, including VM developers, about this instruction and they didn't express concerns. This will increase code size and also code generation complexity. We don't have data on exactly what percentage, and I don't think you do either. Stakeholders and VM developers agreed to this proposal and the CG passed it, so I think you should provide enough evidence that the code size increase will be negligible in real-world programs to overturn this. Also, I don't think you convinced people that this is a security threat as you keep suggesting, which you also did for all the other instructions you want to remove.
Are you going to answer this question? |
First step towards the new exception handling proposal: WebAssembly/exception-handling#125
This is essentially a revert of: "[wasm] Switch to new 'catch' and 'br_on_exn' proposal."
The changes are:
- "catch" instruction takes a tag immediate,
- "rethrow" instruction takes a label immediate,
- Add "catch_all" instruction,
- Remove "br_on_exn" instruction,
- Do not push exceptions on the stack, only the encoded values
[email protected] CC=[email protected]
Bug: v8:8091
Change-Id: Iea4d8d5a5d3ad50693f645e93c13e8de117aa884
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2484514
Commit-Queue: Thibaud Michaud <[email protected]>
Reviewed-by: Clemens Backes <[email protected]>
Cr-Commit-Position: refs/heads/master@{#71602}
I did a presentation on why we need some spec changes to be extensible to future two-phase unwinding on 8/18 CG meeting: slides
I pre-uploaded slides on the spec changes I am planning to propose in the next week's 9/15 CG meeting: slides
This is basically the same as what I discussed in #123. The changes consist of:
- catch_br instruction
- try-unwind instruction
If you have any feedback or comments, I'd appreciate them.
Also, I'm not sure if we still want the immediate argument of rethrow in the first version of the proposal. rethrow used to have an immediate argument that specifies which exception in the current EH pad stack to rethrow. I don't have a use case for this in C++, but other languages might need it, so if people have any opinions on keeping this or not, I'd appreciate that.