Binary encoding of annotations #5
Comments
That is part of it from my pov.
The other part is whether to formalize how annotations are attached to
different elements of a wasm module. E.g., to functions, modules, imports,
parameters etc.
The model for this is the JVM where annotations processing is common.
…On Tue, May 14, 2019 at 2:33 PM Ben Smith ***@***.***> wrote:
In the May 14 CG meeting, there was some discussion about how best to
roundtrip an annotation through the binary format (i.e. text -> binary ->
text), and how to associate it with a particular node in the text source.
(Did I understand correctly?)
cc @titzer <https://github.com/titzer> @fgmccabe
<https://github.com/fgmccabe> @rossberg <https://github.com/rossberg>
@jgravelle-google <https://github.com/jgravelle-google>
--
Francis McCabe
SWE
|
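To clarify the goals of the annotation syntax, they are the following:
1. Have a user-friendly way to represent certain custom sections in text format.
2. Added bonus: allow round-tripping binary-text-binary.
Non-goals are:
A. Changing the notion of custom sections in the binary format.
B. Round-tripping text-binary-text in the presence of annotations that a tool does not understand.
In short, annotations are intended as a way to represent custom sections in the text format, not a new way of providing custom information. Almost the entire discussion has been about the latter, which is out of scope for this proposal.
Wrt that discussion, I think we are talking about an intractable problem. There was the suggestion of associating annotations in custom sections with specific elements of a module in a generic fashion that all tools would understand and that would reflect in-place textual annotations 1-to-1. But AFAICS, that has serious problems:
- Wasm binaries are not just sequences of byte codes, they represent non-trivial ASTs. A generic format for referencing all kinds of AST nodes would likely be verbose or inconvenient to use.
- Would we force all custom sections to either choose this complicated format or forbid them to define in-place annotation syntax?
- Existing custom sections, like the name section, already do not follow this format, yet would benefit from convenient in-place text representation.
- We'd need a backwards-compatible way to distinguish this new kind of structured custom sections.
- Even if we ignored all that, any tool that needs to perform even the slightest modification/transformation of a module still has no way of knowing how that ought to affect custom annotations that it does not understand.
The last point in particular is a fundamental problem that no amount of design sophistication can overcome. It's simply impossible.
To me this smells of over-engineering. We deliberately made custom sections as simple and generic as they are. Imposing something way more complicated now would likely be counter-productive. |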
If the round-tripping is the primary concern, then I suggest removing
annotations from other sections. I.e., no @name annotations.
The issue of 'active comments' is serious, and it already affects JS
today: should a tool preserve JS comments? If it does not, things break,
because enough JS processors rely on 'comments'.
(Not all processing of wasm will be by the authors of the module.)
…On Wed, May 15, 2019 at 5:22 AM Andreas Rossberg ***@***.***> wrote:
To clarify the goals of the annotation syntax, they are the following:
1. Have a user-friendly way to represent certain custom sections in
text format.
2. Added bonus: allow round-tripping binary-text-binary.
Non-goals are:
A. Changing the notion of custom sections in the binary format.
B. Round-tripping text-binary-text in the presence of annotations that a
tool does not understand.
In short, *annotations are intended as a way to represent custom sections
in the text format, not a new way of providing custom information.*
Almost the entire discussion has been about the latter, which is out of
scope for this proposal.
------------------------------
Wrt that discussion, I think we are talking about an intractable problem.
There was the suggestion of associating annotations in custom sections with
specific elements of a module in a generic fashion that all tools would
understand and that would reflect in-place textual annotations 1-to-1. But
AFAICS, that has serious problems:
- Wasm binaries are not just sequences of byte codes, they represent
non-trivial ASTs. A generic format for referencing all kinds of AST nodes
would likely be verbose or inconvenient to use.
- Would we force all custom sections to either choose this complicated
format or forbid them to define in-place annotation syntax?
- Existing custom sections, like the name section, already do not
follow this format, yet would benefit from convenient in-place text
representation.
- We'd need a backwards-compatible way to distinguish this new kind of
structured custom sections.
- Even if we ignored all that, any tool that needs to perform even the
slightest modification/transformation of a module still has no way of
knowing how that ought to affect custom annotations that it does not
understand.
The last point in particular is a fundamental problem that no amount of
design sophistication can overcome. It's simply impossible.
To me this smells of over-engineering. We deliberately made custom
sections as simple and generic as they are. Imposing something way more
complicated now would likely be counter-productive.
--
Francis McCabe
SWE
|
I'm less pessimistic that the custom information description problem is completely intractable, but I'm beyond certain that it is entirely orthogonal to this proposal.
That is the wrong question. Preserving the comments through the tool is not the intention, but allowing the tool to read the comments is. Meaning, the tool should not (indeed, cannot) be expected to preserve arbitrary comments. But a tool may interpret a subset of the comments in a module. I'm thinking that "this allows custom sections to round-trip" has become confused with "this mandates that annotations round-trip". The latter is the JVM model, the former is this proposal. This simply provides a primitive to extend the text format in tool-specified ways. And nothing more. |
I kind of regret ever having mentioned round-tripping. Probably my mistake to give the impression that that was an important motivation for this proposal. :) |
To see why the robust annotation problem is impossible to solve in any interesting generality, it may be necessary to abstract a little.
Wasm is a programming language. A language consists of two parts: syntax and semantics. A given definition of custom sections or annotations essentially extends Wasm the language with both syntax and semantics of some form.
Any of the ideas we have been discussing can only ever hope to make custom syntax (partially) understood by tools. There is no way a tool can second-guess an unknown semantics. However, transforming syntax has to be done in a way that maintains semantics. If you don't know what that semantics is, you cannot maintain it. It is the exception rather than the rule that a semantics is so trivial that any syntactically correct program also is semantically correct (and moreover, equivalent to the original).
To give a concrete example: one application for custom sections that I have been discussing with various folks is typing. You could refine Wasm's type system by overlaying it with more precise or rigid rules, ensuring additional properties, e.g., security ones like information flow isolation or the absence of out-of-bounds errors. That would require encoding additional type annotations in various places of a program that a custom module manager would check beforehand. We cannot possibly expect a tool to be able to transform such a program while maintaining well-typedness (a semantic condition) under this custom type system.
That may be an advanced use case, but the basic observation applies universally: in the presence of any non-trivial semantics it is insufficient to just maintain syntactic coherence (and thus, IMO, pointless to go to great lengths to try). |
I generally agree that the general problem of understanding and transforming arbitrary annotations is intractable and shouldn't be solved by this proposal. Interpreting annotations is indeed a matter left to tools. However, tools have to play nice together, and "just drop if you don't understand" fundamentally makes all tools non-interoperable. That's the wrong default, IMHO. Instead, I think the default should be "preserve if you don't understand".
I also think that designing an annotation mechanism around the text format won't scale to large modules because the text format is so much larger than binary; eventually we would want binary annotations too.
The tractable part of preservation is maintaining the mapping of annotations to their locations in the syntax, which necessitates a binary encoding that can reproduce the exact location and contents of annotations. This isn't as hard as it seems at first. Java does this. There are plenty of ways to accomplish this, e.g. by having a binary "annotations" section that refers to other sections and lists which annotations to insert where, keyed by locations such as a byte offset within a function body, a parameter to a function, a function start, a section start, etc. It's essentially an index of annotations, and could be organized by syntax location, by annotation type, or otherwise. It can be densely encoded yet be inflated to match the original textual annotations.
It seems weird to me that we would define a text format for a syntax tree and then modify that syntax tree with additional syntactic nodes that are both discarded by default and not preserved in the binary format. Especially if that defines tokens and syntactic elements that must be at least parsed by a text parser. That's not really syntax then; it's comments, but more restrictive than comments in that it enforces a syntactic structure that comments don't.
In short, I think full roundtripping of annotations to binary and text is the only reason to standardize an annotation proposal at all. |
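For concreteness, an annotation index of the kind described above might look roughly like the following Python sketch. Nothing here comes from the proposal: the `AnnotationEntry` fields, the entry layout, and all function names are made up for illustration. Only the unsigned LEB128 integer encoding is borrowed from the existing binary format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotationEntry:
    """One hypothetical index entry: where an annotation goes and what it says."""
    section_id: int   # id of the section the annotation is attached to
    byte_offset: int  # byte offset within that section's contents
    ident: str        # annotation id, e.g. "name" for (@name ...)
    payload: bytes    # opaque annotation contents


def encode_u32(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_u32(data, pos):
    """Decode unsigned LEB128 at `pos`; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_bytes(blob):
    """Length-prefixed byte string."""
    return encode_u32(len(blob)) + blob


def encode_index(entries):
    """Serialize entries, ordered by syntax location (section, then offset)."""
    ordered = sorted(entries, key=lambda e: (e.section_id, e.byte_offset))
    out = bytearray(encode_u32(len(ordered)))
    for e in ordered:
        out += encode_u32(e.section_id)
        out += encode_u32(e.byte_offset)
        out += encode_bytes(e.ident.encode("utf-8"))
        out += encode_bytes(e.payload)
    return bytes(out)


def decode_index(data):
    """Inverse of encode_index."""
    count, pos = decode_u32(data, 0)
    entries = []
    for _ in range(count):
        section_id, pos = decode_u32(data, pos)
        byte_offset, pos = decode_u32(data, pos)
        n, pos = decode_u32(data, pos)
        ident = data[pos:pos + n].decode("utf-8")
        pos += n
        n, pos = decode_u32(data, pos)
        payload = data[pos:pos + n]
        pos += n
        entries.append(AnnotationEntry(section_id, byte_offset, ident, payload))
    return entries
```

An index like this can reproduce positions and contents, but it only records where an annotation was; it does not say whether the entry is still meaningful after a tool has rewritten the module.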
@titzer, the purpose of this proposal is to provide a generic way to represent custom sections in the text format in human-readable form. What alternative do you propose? A few more comments:
|
I understand the highest priority item is to roundtrip custom sections. What is the role of annotations that are attached to expressions within function bodies and elsewhere? Perhaps we can split that part out? |
Ah, annotations aren't syntactically attached to anything per this proposal. They can be lexically inserted anywhere in a source file, just like comments. There are no a priori rules or semantics regarding placement or interpretation whatsoever. Just as with custom sections. But tools may impose certain requirements on those they want to interpret. Again, just as with custom sections. So I'm not sure what can be split out. It's already as minimal as it can possibly get. |
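To illustrate the "lexically inserted anywhere, interpreted only by tools that care" model, here is a rough Python sketch of a scanner that finds `(@id ...)` spans and drops the ones a tool does not recognize. It is a simplification, not a conforming lexer: it assumes `(@` never appears inside a string literal or comment outside an annotation, and the id pattern is an approximation. The `@hint` annotation in the usage below is invented.

```python
import re

# Approximate annotation opener: "(@" followed by an id (simplified pattern).
ANNOT_START = re.compile(r'\(@([^\s()";]+)')


def find_annotations(text):
    """Return (id, start, end) for each outermost (@id ...) span in `text`."""
    found = []
    pos = 0
    while True:
        m = ANNOT_START.search(text, pos)
        if m is None:
            return found
        depth = 0
        in_string = False
        i = m.start()
        while i < len(text):
            ch = text[i]
            if in_string:
                if ch == "\\":
                    i += 1            # skip the escaped character
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    break             # matching close of the annotation
            i += 1
        if depth != 0:
            raise ValueError("unbalanced annotation starting at %d" % m.start())
        found.append((m.group(1), m.start(), i + 1))
        pos = i + 1                   # nested annotations are not reported


def strip_unknown(text, known):
    """Remove every outermost annotation whose id is not in `known`."""
    out = []
    pos = 0
    for ident, start, end in find_annotations(text):
        if ident not in known:
            out.append(text[pos:start])
            pos = end
    out.append(text[pos:])
    return "".join(out)
```

For example, `strip_unknown('(module (@name "m") (func (@hint (nested "a)b")) (param i32)))', {"name"})` keeps the `(@name "m")` span and deletes the `(@hint ...)` span, including the parenthesis inside its string.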
I do not see the problem with “preserve if you do not understand”.
It can be modeled as “this entity has an annotation”.
If a tool purports to transform any entity it must understand that entity,
including the presence of annotations. If it cannot handle the annotation
(if for example the tool materially modifies the semantics of the same entity)
then it is as though the tool does not recognize the entity in its entirety.
|
@fgmccabe, how do you know that the contents of an annotation do not depend on other entities? For example, it refers to some definition, local, block, type; assumes some value, type, offset, size? How do you know that a local change to one entity does not affect annotations on other entities? |
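A toy Python sketch of the dependency problem raised here, under invented assumptions: functions are modeled as a plain list, and an annotation (the made-up `inline` id) refers to its target by function index. A tool that deletes an unrelated function while preserving the annotation verbatim ends up pointing it at the wrong function.

```python
def remove_function(functions, annotations, dead_index):
    """Drop one function but copy annotations through untouched.

    This models a tool that follows "preserve if you don't understand".
    """
    remaining = [f for i, f in enumerate(functions) if i != dead_index]
    return remaining, [dict(a) for a in annotations]


def target_of(functions, annotation):
    """Resolve the function an annotation claims to refer to."""
    return functions[annotation["func_index"]]


functions = ["init", "unused", "hot_loop", "cleanup"]
annotations = [{"id": "inline", "func_index": 2}]  # meant for hot_loop

before = target_of(functions, annotations[0])
functions2, annotations2 = remove_function(functions, annotations, dead_index=1)
after = target_of(functions2, annotations2[0])

print(before, "->", after)  # prints "hot_loop -> cleanup"
```

The annotation bytes are unchanged, yet their meaning has shifted, and the tool has no way to notice without knowing that `func_index` is an index into the function list.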
Based on my admittedly limited experience with comments in JS, and
annotations in Java, this is part of the process. When a third party
designs an annotation scheme, he/she needs to be aware of the potential
impact on tools. I see no reason why annotations are special here: your
arguments apply to wasm itself too.
(to take an example, a tool that processes wasm to remove common
sub-expressions had better understand the full implications of that).
…On Thu, May 16, 2019 at 8:19 AM Andreas Rossberg ***@***.***> wrote:
@fgmccabe <https://github.com/fgmccabe>, how do you know that the
contents of an annotation do not depend on other entities? For example, it
refers to some definition, local, block, type; assumes some value, type,
offset, size? How do you know that a local change to one entity does not
affect annotations on other entities?
--
Francis McCabe
SWE
|
Well, the whole purpose of custom sections was that some parties can add additional stuff to their binaries without having to coordinate with all tool writers in the universe. That would require collaboration to the degree of quasi standardisation, which defeats the purpose. |
Preserving unmodeled annotations or custom sections is an incredibly dangerous game, if your tool makes any transformations. To me, this + roundtripping can be solved with two extremely simple conventions.
It's important to remember that tools want to be as interoperable as is reasonable. This proposal gives tools an additional primitive by which to coordinate.
To me this is a non-sequitur, which makes me think we have very different understandings of the problem this proposal is attempting to solve, so I want to dig a little deeper here. Also, we already do have such a mechanism for the binary format. Custom sections. Tools drop custom sections they don't understand as well. Or preserve them, depending on what the tool does. This is already a consideration we make.
Yes. That is the point. The restricted comment structure allows the lexer to produce tokens that the parser can reason about, or not. It was more trivial to implement than to argue for. |
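As a sketch of what "drop or preserve custom sections" looks like at the binary level, the following Python walks the sections of a module and filters the custom ones (section id 0, whose contents begin with a length-prefixed name). The function names and the `understood` / `keep_unknown` parameters are made up for illustration; error handling is minimal.

```python
WASM_HEADER = b"\x00asm\x01\x00\x00\x00"  # magic number + version 1


def read_u32(data, pos):
    """Decode unsigned LEB128 at `pos`; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def split_sections(binary):
    """Return a list of (section_id, custom_name_or_None, raw_section_bytes)."""
    if binary[:8] != WASM_HEADER:
        raise ValueError("not a version-1 wasm binary")
    sections = []
    pos = 8
    while pos < len(binary):
        start = pos
        section_id = binary[pos]
        size, body = read_u32(binary, pos + 1)
        end = body + size
        name = None
        if section_id == 0:  # custom section: contents start with a name
            name_len, name_start = read_u32(binary, body)
            name = binary[name_start:name_start + name_len].decode("utf-8")
        sections.append((section_id, name, binary[start:end]))
        pos = end
    return sections


def filter_custom(binary, understood, keep_unknown):
    """Rebuild the module, dropping unknown custom sections unless asked to keep them."""
    out = bytearray(WASM_HEADER)
    for section_id, name, raw in split_sections(binary):
        if section_id == 0 and name not in understood and not keep_unknown:
            continue
        out += raw
    return bytes(out)
```

A tool that only reads a module can pass `keep_unknown=True`; a tool that rewrites code would more plausibly pass `False` and keep only the section names it knows how to update.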
Even that would already go beyond what the binary format currently offers, since it cannot represent that distinction. |
Which is a useful property to have. Showerthought: a custom section that is itself an index of other custom sections, saying whether to drop or preserve by default. Or extend that to "preserve section X under conditions Y". |
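A very rough Python sketch of that showerthought, purely hypothetical: the index is modeled as a mapping from custom section name to a default policy, with one conditional policy standing in for "preserve section X under conditions Y". The `Policy` values and the `module_modified` condition are invented for illustration.

```python
from enum import Enum


class Policy(Enum):
    DROP = "drop"
    PRESERVE = "preserve"
    PRESERVE_IF_UNMODIFIED = "preserve-if-unmodified"


def should_keep(name, index, understood, module_modified, default=Policy.DROP):
    """Decide whether a tool keeps the custom section called `name`."""
    if name in understood:
        return True  # the tool can update this section itself
    policy = index.get(name, default)
    if policy is Policy.PRESERVE:
        return True
    if policy is Policy.PRESERVE_IF_UNMODIFIED:
        return not module_modified
    return False


# Made-up section names, just to exercise the three policies.
index = {
    "producers": Policy.PRESERVE,
    "my.types": Policy.PRESERVE_IF_UNMODIFIED,
    "scratch": Policy.DROP,
}
```

With this index, a tool that transformed the module would keep `producers`, drop `my.types` and `scratch`, and drop any section the index does not mention.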
Let's close this issue as out-of-scope. There are a bunch of use cases (e.g. branch hinting, compilation hints, various ideas in Binaryen) that depend on this proposal and none of them require a generic binary format for arbitrary annotations. |
ping @rossberg. I don't have permissions to close this myself. |
In the May 14 CG meeting, there was some discussion about how best to roundtrip an annotation through the binary format (i.e. text -> binary -> text), and how to associate it with a particular node in the text source. (Did I understand correctly?)
cc @titzer @fgmccabe @rossberg @jgravelle-google