Framework for defining behavior beyond standard semantics #1424
Comments
I will try to summarize the discussion so far as I understand it, and address some of the issues:
The problem
There are a few proposals that concern the execution of a wasm module, but don't add any observable behavior:
A basic implementation of a Wasm engine may ignore these features and still get a correct result.
The trend
Modifying the core spec and requiring all compliant implementations to handle these features is an unnecessary burden. Moreover, not all implementations may even gain anything meaningful from some of these features (e.g. an interpreter doesn't gain anything from branch hinting). As a result, the CG is (rightfully) hesitant in handling these proposals. One "escape hatch" that can move the proposals forward is to use custom sections instead of adding new instructions. Custom sections are just one possible mechanism, and while they are the preferred way of handling this kind of proposal, they may not always be desirable (in the latest CG meeting, people seemed to agree that new instructions are probably a better way to implement Constant Time, even if the new instructions don't add observable behavior, and basic implementations will probably implement them the same way as the normal instructions).
Open questions
There are currently some open questions, both organizational and technical.
If the answer to the above questions is "yes":
My answers
My answers to the first two questions:
As for the remaining questions, I don't have definite answers:
I am curious to see what other people think about this. |
To elaborate on my technical section, here's a "standard" that I think these "modifying" sections could follow.
This way a tool or engine that understands a modifying section can maintain a cursor within that section, then as it parses the section being modified it can advance the cursor appropriately. If there are many such sections, it's easy to maintain many such cursors (though we'll have to come up with some disambiguation strategy if two cursors insert modifications at the same spot). Furthermore, even if a tool doesn't understand a modifying section, it can still decorate the modified section with these black-box annotations and make an effort to preserve them through any transformations it employs. After transformation, it's still easy for the tool to generate the new modifying section with the target-locations of the modifying instructions adjusted appropriately. |
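A minimal sketch of the "adjust the target locations after transformation" step, assuming a modifying section is a list of (byte offset, opaque payload) pairs and that the tool records each insertion it makes as (original offset, number of bytes inserted before the instruction at that offset). The names `remap_annotations`, `Annotation` and `Insertion` are hypothetical and not part of any proposal; deletions and reordering are not handled here.

```python
from bisect import bisect_right
from typing import List, Tuple

Annotation = Tuple[int, bytes]  # (byte offset in the modified section, opaque payload)
Insertion = Tuple[int, int]     # (original byte offset, bytes inserted before it)


def remap_annotations(annotations: List[Annotation],
                      insertions: List[Insertion]) -> List[Annotation]:
    """Shift annotation offsets to account for bytes a tool inserted.

    Payloads are treated as black boxes: only the offsets change.
    """
    points = sorted(insertions)
    offsets = [off for off, _ in points]
    # prefix[i] = total bytes added by the first i insertion points
    prefix = [0]
    for _, count in points:
        prefix.append(prefix[-1] + count)

    remapped = []
    for off, payload in annotations:
        # Every insertion at or before this offset pushes the target forward.
        k = bisect_right(offsets, off)
        remapped.append((off + prefix[k], payload))
    return remapped
```

The tool never inspects the payload, which is the black-box property described above; whether that is actually safe for a given section is a separate question.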
Another question would be where custom sections get specified. There likely is going to be a broad range of them, with widely varying relevance to the broader ecosystem. It probably doesn't make sense to put them all into the core spec, even if they get standardised. Some layering scales better, so we may want to create separate documents for distinct custom sections. |
Sorry, the response ended up running a bit long, I am trying to break it into sections 😃 We can try to meet in some online form, this usually helps.
Spec
In my personal view the answer is "it depends". I think we definitely need these features, however custom sections, the way they are defined now, pose some challenges. One issue is there isn't a way to meaningfully test or examine them, which can lead to divergent implementations. We can check if the section is there, but we don't have a way to test if the implementation or (more importantly) a tool does all the right things with it. However a more serious problem is that any given tool that is not properly aware of any given custom section implementation can break it by not reflecting bytecode changes. Maintaining this type of parallel information is generally error-prone and potentially expensive. This of course is made worse by the first problem, that we can't easily test the tools 😃 I think @RossTate's proposal can help with some of these issues though.
I think so, even if just to make layouts of those sections available to any implementation that wants to support them.
Better format
Second that - if we can get to black-box treatment of annotations, then we can solve most of the tooling problems.
I feel like we need to be able to attach tags to particular instructions (the ones they would move with), and whether they would be interpreted as "before", "after" or "at" would be up to the implementation. Also, what about situations where we want to annotate not just instructions, but ranges - for example if you are timing something you don't want start and end to be reordered, or some other sequence dropped in.
Better testing
Even if we improve the annotation format, this still would not cover the testing issue (we would just make it easier for tools to pass it through undamaged). I am not sure we have a way to fix this, as modifications encoded by custom sections would result in changes on levels below Wasm bytecode, which is beyond the spec. |
Sorry, I meant the "insert" instruction to be just an example. As you mention, there are other kinds of modifying instructions we will want, and we can also add more in the future should new needs arise. |
So it sounds like we need these components:
|
I am worried about annotations that change their meaning depending on their position or other alterations. For example, a branch hint should be flipped if a tool flips the condition and the arms of an if, and an inlining hint might no longer be applicable if the toolchain inlines, etc. At least for custom sections where the annotation is optional (like a branch hint), I would propose that
Tools could then easily remove sections with data that they cannot safely update. |
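To make the branch-hint case concrete, here is a small sketch of a transformation that swaps the arms of an `if` and keeps the hint consistent. `IfNode` and `flip_if` are hypothetical names for illustration, and reading hint value 1 as "likely" and 0 as "unlikely" is an assumption, not something settled in this thread.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class IfNode:
    then_arm: str
    else_arm: str
    negated: bool = False       # True if the condition is wrapped in a negation
    hint: Optional[int] = None  # assumed: 1 = likely, 0 = unlikely, None = no hint


def flip_if(node: IfNode) -> IfNode:
    """Swap the arms and negate the condition, inverting the hint to match."""
    flipped_hint = None if node.hint is None else 1 - node.hint
    return replace(
        node,
        then_arm=node.else_arm,
        else_arm=node.then_arm,
        negated=not node.negated,
        hint=flipped_hint,
    )
```

A tool that performs this rewrite without knowing what the hint means would carry the old value across unchanged, which is exactly the silent breakage described above.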
A very naive question: Taking a hypothetical scenario where a module may have quite a few custom sections defining various bytecode offsets which would cause native codegen to be modified, would there be any startup performance concerns for streaming compilers needing to repeatedly check whether one of the relevant offsets is reached? I guess with some careful organising of the offsets, one could make the check in the regular case no more expensive than just checking whether the "next" offset has been reached? |
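A rough sketch of the "check only the next offset" idea, assuming each section contributes a list of byte offsets and that these are merged into one sorted list before decoding starts. `walk_with_hints` and the section names used with it are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def walk_with_hints(instr_offsets: Iterable[int],
                    sections: Dict[str, List[int]]
                    ) -> Iterator[Tuple[int, List[str]]]:
    """Yield each instruction offset with the sections that annotate it."""
    # One merged, sorted list means the decoder tracks a single cursor.
    merged = sorted((off, name)
                    for name, offs in sections.items()
                    for off in offs)
    cursor = 0
    for off in instr_offsets:
        hits: List[str] = []
        # Common case: one comparison against the next pending offset.
        while cursor < len(merged) and merged[cursor][0] <= off:
            if merged[cursor][0] == off:
                hits.append(merged[cursor][1])
            # Entries that do not land on an instruction boundary are skipped.
            cursor += 1
        yield off, hits
```

With this layout an instruction that carries no annotation costs a single comparison, independent of how many sections are present; the up-front sort is the price paid for that.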
I agree with @kripken that having tools handle sections that they don't fully understand is not feasible. We are acting like there will be hundreds of such standardized sections, but in practice I don't expect there to be more than a dozen of them. And adding explicit support to a tool for one of them is not actually complex (not more than supporting new instructions). Especially if we stick to some guidelines on the format so that the parsing will be somewhat similar for many of these sections. Also I would like to argue about byte offsets in the code section: |
The advantage to using offsets into the code section is that they are already used for relocations in the code and in DWARF, so wasm-ld already knows how to read and update them. I would prefer not to introduce a new indexing scheme on top of that. |
I remember that a problem with code section offsets was that there may be multiple code sections in the future. How do DWARF and relocations handle that? |
Adding something to both LLVM and Binaryen, plus ensuring that it keeps working for releases to come, is a much more complex effort than initial wabt support. And those are the most common tools; there are others as well. A single unsupported tool that the module passes through would be the kryptonite for the feature, no matter how good the support in other tools is. To me it feels that when adding new instructions currently there is a better chance to minimize inter-dependence between unrelated features/passes. Additionally, even if adding initial support is not really complex, maintenance is still needed - if nobody is looking to keep the feature up it would eventually decay. Supporting multiple custom sections also has the potential to increase the surface area of tool changes, as they can (and would) introduce new things for the tools to track, which is an incentive to turn the custom section feature off for maintenance reasons. And again, the lack of ways to test the semantics of custom sections does not make it easier. I personally see two potential issues with how custom sections are defined now:
I think discarding unsupported sections is better than producing incorrect ones, but that is still going to be a barrier for the users. I think optimally we should improve indexing, if that is possible. |
About the custom section vs instruction issue regarding tools: Let's say that we add branch hinting as new instructions. These instructions are semantically a NOP, but we nonetheless require all tools and engines to support them, since they will be part of the core spec. I want to focus on the differences in testing and tool support.
Testing
How do we add tests? Since semantically these instructions are nops, we can't really test that branches are actually hinted. We can only test that the instructions are parsed correctly (and that they are self-consistent, e.g. the only valid hints are 0 and 1, no other values). But this is no different than testing that the custom section version parses correctly and is self-consistent. The only tricky part imho is that if the custom section is indeed inconsistent or contains garbage, engines won't error out, so even the basic tests need the addition of some kind of diagnostic (probably console warnings) to be usable by engines. If we use instructions instead, any tool/engine already has a way to signal an error and abort. So we would gain this minor convenience, at the expense of requiring the whole ecosystem to "implement" these instructions, even as nops.
Tool support
Instructions are not magically safe from rotting. For example a tool that flips branches may "forget"/break branch hinting implemented with instructions as easily as with custom sections. Here the main issue of custom sections is that a tool will need to keep track of changing offsets if it modifies the code section. This doesn't mean that it can be done automatically for any such custom section. I still believe that a tool needs to understand the meaning of the section to be able to safely update it (same for instructions, btw), but a common format can still be useful to reduce the chance of code decay even for less used sections, and can help code reuse. |
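The self-consistency check mentioned under Testing could look roughly like this: a sketch that collects warnings instead of aborting, assuming hints arrive as (byte offset, value) pairs and that the engine already knows the offsets of branch instructions. `check_branch_hints` is a hypothetical name, and requiring strictly increasing offsets is an assumption for illustration.

```python
from typing import List, Set, Tuple


def check_branch_hints(hints: List[Tuple[int, int]],
                       branch_offsets: Set[int]) -> List[str]:
    """Return diagnostics for a hint list; an empty list means consistent."""
    warnings: List[str] = []
    last = -1
    for off, value in hints:
        if value not in (0, 1):
            warnings.append(f"hint at offset {off}: invalid value {value}")
        if off not in branch_offsets:
            warnings.append(f"hint at offset {off}: not a branch instruction")
        if off <= last:
            # Assumed rule: offsets must be strictly increasing.
            warnings.append(f"hint at offset {off}: out of order")
        last = off
    return warnings
```

An engine could print these warnings and carry on executing, which keeps a garbage section from changing behavior while still giving a test harness something to assert on.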
I generally support developing a standard way to reference instructions in code sections so that they become visible in tools, and second @tlively's point that we should use byte offsets to do so. One thing I mentioned in the meeting as a criterion for whether something should be an out-of-band instruction modifier or a new instruction, even if semantically equivalent, was interpreters. Interpreters haven't been a big factor in our design calculus, but I think this will change (actually, in point of fact, I aim to change this :)). |
I am trying to summarize the discussion into a concrete plan for updating the branch hinting proposal. I agree with the general idea of this comment, but also with @kripken that each annotation should have its own section. The section name could be prefixed to signal that it is following this format: So, the blueprint format of these sections could be:
In the specific case of branch hinting the new format would look like:
(In this case the "data" vector would just be empty) The annotations proposal seems to me like a good starting point for having a textual representation of the branch hints (and other "code annotation" proposals), but I think that it should proceed separately. I would like to know what people think about @rossberg's point of potentially putting the specification of these sections in a separate document and not in the core spec. |
Filing to continue the discussion in the CG meeting today.
Some background, from the point of view of testing, can be found in WebAssembly/spec#1341. Other considerations would include text representation and whether or not we can make this aspect of Wasm runtimes completely separate from the spec (i.e. not requiring proposal process in order to use).
As for what was suggested in the meeting as possible solutions:
/cc @yuri91