Phase 2 #38

Closed
ngzhian opened this issue Sep 17, 2021 · 27 comments

Comments

@ngzhian
Member

ngzhian commented Sep 17, 2021

Hi all, I would like to try and move this proposal to phase 2 at an upcoming CG meeting. Filing this issue to gather feedback and concerns.

To recap, phase 2 entry requirements are:

  • Full proposed English spec text available in a forked repo around which a reasonably high level of consensus exists.
  • Updates to the formal notation, test suite, and reference interpreter are NOT yet required.

We have text in the overview, though it isn't exactly what will appear in the actual spec. I believe we have a high level of consensus around the instructions we want in the proposal and the use of fpenv to state dependency.

Note that the instructions aren't fixed yet, but there is consensus around the kind of instructions we want.

@ngzhian
Member Author

ngzhian commented Sep 23, 2021

WebAssembly/meetings#885 requesting to present and poll at the next CG meeting (2021-09-28)

@ngzhian
Member Author

ngzhian commented Sep 28, 2021

We presented an update at the CG meeting today (will add link to meeting notes when they are available).
Most of the discussion was around the fpenv. Some points:

  • a module can have multiple fpenv, what sort of use cases does that have? When would we want to intentionally have 2 different fpenvs for child modules?
  • do we need it for an MVP? a placeholder 0 byte could be enough for now
  • fpenv looks like a workaround for the spec missing language to specify an "environment" and consistency guarantees within an environment; maybe that's what's needed instead? (This is likely insufficient for different modules to express that they require the same consistencies.)

We did not continue polling for phase 2, since there wasn't consensus around this design. We should examine use cases more, and see if we can come up with a lighter way for modules to capture their consistency requirements.

@dtig
Member

dtig commented Sep 28, 2021

Thanks @ngzhian for presenting at the meeting!

For fpenv, from the meeting it sounded like most of the implementations (both engines and tools) are looking at defaulting to a 0 byte. Are there implementations that would do anything different right now? If so, it would be useful to hear from them, and the use cases would also be interesting to evaluate.

@sunfishcode
Member

fpenv is about preserving the property that one can always link two wasm modules together into a semantically-equivalent wasm module, and split a wasm module into a semantically-equivalent set of modules.

If we don't have fpenv, we'll likely have some kind of rule along the lines of "results have to be consistent within a module (or function, or store, or some other boundary)", and we'll lose this property. Once we lose this property, we'll never have it again.

As such, the question isn't whether existing implementations will do anything different with them. It's about preserving a property of core wasm that, so far, we've been careful to protect.

@penzn
Contributor

penzn commented Sep 28, 2021

fpenv is about preserving the property that one can always link two wasm modules together into a semantically-equivalent wasm module, and split a wasm module into a semantically-equivalent set of modules.

As @lukewagner pointed out, can you think of a use case where you would merge two modules with different fpenvs? Even the GPU portion of GPGPU code does not execute together with the CPU portion; instead, there is a mechanism to queue work on the GPU.

@sunfishcode
Member

Multiple fpenvs come up in the case where you have two modules that you want to link together. Right now, one can link any two modules, and the semantics don't change. If we say "consistent relaxed-simd results guaranteed within a module", then linking two modules changes the semantics: the boundary within which results must be consistent changes.

That may not be a huge practical problem because it'd only make the semantics more deterministic. However, if you want to split a module into two, you'd be increasing the scope for nondeterminism, so it wouldn't be valid to do unless you could prove that the module didn't care about the resulting nondeterminism.

I submit that being able to link and split wasm modules without changing semantics is a useful property that we should preserve.
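To make the linking argument above concrete, here is a minimal sketch of a toy model. It is not taken from the proposal: the names `allowed_behaviours`, `link`, `per_module`, and `per_fpenv` are hypothetical, and each relaxed op is reduced to a `(module, fpenv)` pair whose lowering is either `"fma"` or `"mul_add"`.

```python
from itertools import product

# Toy model (all names are hypothetical, not from the proposal).
STRATEGIES = ("fma", "mul_add")


def allowed_behaviours(sites, group_of):
    """Enumerate every lowering a consistency rule permits.

    sites: list of (module_name, fpenv_name) pairs, one per relaxed op.
    group_of: maps a site to the group within which results must agree.
    """
    groups = sorted({group_of(site) for site in sites})
    results = set()
    for choice in product(STRATEGIES, repeat=len(groups)):
        by_group = dict(zip(groups, choice))
        results.add(tuple(by_group[group_of(site)] for site in sites))
    return results


def link(sites, new_name):
    """Merge all sites into one module; fpenv references are kept as-is."""
    return [(new_name, env) for (_module, env) in sites]


def per_module(site):
    return site[0]  # "consistent within a module" rule


def per_fpenv(site):
    return site[1]  # "consistent within an fpenv" rule


separate = [("A", "envA"), ("B", "envB")]
linked = link(separate, "AB")

print(len(allowed_behaviours(separate, per_module)))  # two independent modules
print(len(allowed_behaviours(linked, per_module)))    # one merged module
print(allowed_behaviours(separate, per_fpenv) == allowed_behaviours(linked, per_fpenv))
```

Under the per-module rule in this model, linking shrinks the set of permitted behaviours, and splitting would enlarge it. Under the per-fpenv rule the set is unchanged, because the consistency boundary travels with the ops rather than with the module.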

@ngzhian
Member Author

ngzhian commented Sep 28, 2021

I think CG was concerned about adding a new construct without strong use cases (please correct me if my reading of the room is off), and we would also want to preserve the semantics after linking/splitting using some sort of mechanism.

One suggestion that came up was a placeholder immediate 0 byte (where the current fpenv immediate goes). This is useful for capturing the consistency dependencies, but useless if modules want flexibility.

  • Linking any number of modules with relaxed-simd will result in all of them having the same fpenv (which is still semantically correct).
  • Splitting a module into multiple modules with relaxed-simd will also preserve semantics, because all of them will have 0 byte.

Then, as a future extension, this 0 byte will evolve into an fpenv index, as outlined in the overview currently, and fpenv will be an importable/exportable construct.

This 0 byte essentially captures what web engines will implement for relaxed-simd, keeps the initial language semantic changes smaller, and leaves room for future extensions.

What do y'all think about this 0 byte placeholder?

@sunfishcode
Member

It's not clear to me how a plain 0 byte addresses the consistency concern. If we have a "consistency within a module" rule, and I link two modules together, plain 0 bytes don't contain the information of the location of the original module boundary, so the semantics are not preserved. Similarly, if I split a module, and produce two modules with a 0 byte, it doesn't seem distinguishable from two independently produced modules that both have a 0 byte, so the information about the split modules coming from the same source module and having an expectation of consistency is lost.

@Maratyszcza
Collaborator

Maratyszcza commented Sep 28, 2021

I came up with the wording "WebAssembly implementations are required to be consistent, and either always generate FMA instruction, or always generate multiplication+addition pair for a QFMA instruction within a module.", and I had something different in mind than what is denoted as module in WAsm specification or what @sunfishcode's comments refer to. My idea was that code that shares the same address space lowers QFMA instructions the same way (i.e. always to FMA or always to FMUL+FADD). Thus, two WAsm modules linked into the same Web app would either both use FMA or both use FMUL+FADD.
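As an illustration of why the two lowerings are observably different, here is a small sketch. The helper names `qfma_fused` and `qfma_unfused` are hypothetical. The fused path is emulated with exact rational arithmetic and a single final rounding, which matches a hardware FMA only for finite inputs and ignores the sign of a zero result.

```python
from fractions import Fraction


def qfma_unfused(a: float, b: float, c: float) -> float:
    """Multiply, round, then add and round again (FMUL + FADD lowering)."""
    return a * b + c


def qfma_fused(a: float, b: float, c: float) -> float:
    """Emulated FMA: compute a*b+c exactly, then round once.

    Assumes finite inputs; float(Fraction) performs the single rounding.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so that the product 1 - 2**-60 rounds to 1.0 as a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print(qfma_unfused(a, b, c))  # the intermediate rounding loses the low bits
print(qfma_fused(a, b, c))    # the single rounding keeps them
```

With these inputs the unfused result is exactly zero while the fused result is a small negative number, so code that mixes both lowerings could see the same expression produce two different values.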

@ngzhian
Member Author

ngzhian commented Sep 28, 2021

It's not clear to me how a plain 0 byte addresses the consistency concern.

Not sure if this sounds reasonable: the 0 byte imposes an extreme view of consistency, it means all modules must have the same fpenv - there is only 1 fpenv, no matter how the modules are split (or combined) - the "runner" needs to make sure they get the same env. That's why I said this is "useless". A future extension that adds an importable/exportable fpenv will "relax" this requirement.

Similarly, if I split a module, and produce two modules with a 0 byte, it doesn't seem distinguishable from two independently produced modules that both have a 0 byte

We assume the strict case, that they came from the same module and require the same fpenv. I.e. in the 0-byte world, all split modules come from a single parent source module, and expect to be consistent. The ability to be flexible comes later with fpenv.

@conrad-watt
Contributor

We assume the strict case, that they came from the same module and require the same fpenv. I.e. in the 0-byte world, all split modules come from a single parent source module, and expect to be consistent. The ability to be flexible comes later with fpenv.

Yes, this is my interpretation. Essentially, all tools involved with splitting and merging modules would assume that all provided/produced modules are morally using the same "fpenv".

If there's an example of a collection of modules which you'd want to split/merge for which this wouldn't work, that would be a good motivating example for the fpenv design.

@sunfishcode
Member

@Maratyszcza Is "address space" the host's virtual address space? I would be opposed to making wasm semantics aware of host virtual address spaces.

@ngzhian Does this mean fully deterministic? If so, that would seem to defeat the entire purpose of relaxed-simd. I would be opposed to a wasm proposal advancing with no purpose other than to be enabled by a future wasm proposal.

@penzn
Contributor

penzn commented Sep 28, 2021

I think CG was concerned about adding a new construct without strong use cases (please correct me if my reading of the room is off)

I think that is correct.

Strictly speaking, I don't object to adding a zero byte placeholder. However, I still do not understand why fpenv is necessary for the proposal to move forward. To me this feature seems to be similar to what we already have in Future Features: it represents an idea that is useful, but one we don't have a way to implement yet. To provide an analogy, we don't expect rounding mode to be part of this proposal, so why fpenv?

Another point against it in my view is that even its intended use is going to be inconsistent with identical existing behavior (example linked in #11). This would break existing code, since code that only does platform detection and "strict" SIMD would be allowed to move to an incompatible machine.

In my personal opinion, preserving FP semantics while moving execution between nodes should go to future features - it would require changes to existing spec (we already allow the behavior this is supposed to guard against), and we don't have a way to practically use it just yet.

@ngzhian
Member Author

ngzhian commented Sep 28, 2021

Does this mean fully deterministic? If so, that would seem to defeat the entire purpose of relaxed-simd. I would be opposed to a wasm proposal advancing with no purpose other than to be enabled by a future wasm proposal.

No, the instructions themselves can still return different results depending on underlying platform, but it is incorrect to return different results for the same instruction within the same function, module, 2 modules running in a VM, etc.

Hm, but I do see a problem once I start to write this down: it's hard to draw the boundary for what "all modules must be consistent" means.

@conrad-watt
Contributor

conrad-watt commented Sep 28, 2021

Hm, but I do see a problem once I start to write this down: it's hard to draw the boundary for what "all modules must be consistent" means.

As a first stab at how this would look formally, within the core Wasm spec, each instantiated module could be implicitly passed an fpenv by the host (with all relaxed SIMD functions implicitly referencing this fpenv). It would be up to the host to document what guarantees exist across multiple instantiations.

In 99% of cases, the host (e.g. the JS level) would document (e.g. in the JS API) that across an execution every instantiated module gets exactly the same fpenv. In a system where some modules are instantiated for execution on CPU and some for execution on GPU, the host would document that this choice changes the fpenv which is provided.

Tools such as binaryen would almost certainly default to assuming that all provided/produced modules get the same fpenv, although there would be room to be more nuanced if needed.

If per-instance granularity of fpenv isn't sufficient (e.g. one instantiation with some functions executed on CPU and some on GPU), that would motivate something like the language-level fpenv design currently proposed.

EDIT: the simpler and more brutal option would be to model 1 fpenv globally in the store, which would suffice for the Web but would likely be insufficient for mixed CPU/GPU cases where code in a CPU instance can call a function in a GPU instance.
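The host-provided fpenv idea described above could be sketched roughly as follows. This is only an illustration under assumptions: `FpEnv`, `Instance`, and `Host` are hypothetical names, not part of any Wasm API, and the fpenv is reduced to a single QFMA lowering choice.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class FpEnv:
    """Hypothetical record of how relaxed ops are lowered."""
    qfma_lowering: str  # "fma" or "mul_add"


@dataclass
class Instance:
    name: str
    fpenv: FpEnv  # implicitly referenced by every relaxed op in the instance


@dataclass
class Host:
    default_env: FpEnv
    instances: List[Instance] = field(default_factory=list)

    def instantiate(self, name: str, fpenv: Optional[FpEnv] = None) -> Instance:
        """Pass the host's fpenv unless the host documents an exception."""
        instance = Instance(name, fpenv if fpenv is not None else self.default_env)
        self.instances.append(instance)
        return instance


# Web-like host: every instance gets exactly the same fpenv.
web = Host(FpEnv("fma"))
m1 = web.instantiate("m1")
m2 = web.instantiate("m2")
print(m1.fpenv is m2.fpenv)

# Mixed CPU/GPU host: the host documents that GPU instances get another fpenv.
mixed = Host(FpEnv("fma"))
cpu = mixed.instantiate("cpu_part")
gpu = mixed.instantiate("gpu_kernel", FpEnv("mul_add"))
print(cpu.fpenv == gpu.fpenv)
```

The "1 fpenv globally in the store" variant corresponds to never passing the optional argument, which is why it cannot express the mixed CPU/GPU case.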

@Maratyszcza
Collaborator

Is "address space" the host's virtual address space? I would be opposed to making wasm semantics aware of host virtual address spaces.

AFAICT, the right definition from WAsm specification is "linear memory".

@rossberg
Member

I'm still a bit confused by this discussion. A few observations/questions:

  • If there is a practical use case for having multiple different fpenv "values" in a single engine then I think the design along the lines of fpenv declarations is the right one. But the use cases I've heard so far are GPUs and code mobility, which both seem somewhat hypothetical at this point. So it seems fine to leave that for post-MVP?

  • The obvious semantics of leaving it out would be that of a single engine-global fpenv that every op is referencing implicitly. So every execution environment has a global choice how to instantiate the relaxed semantics, but it's the same everywhere. That obviously does not break modularity. In terms of the spec, this would become a parameter to the semantics as a whole.

  • FWIW, I would absolutely not tie this implicit parameter to individual module instantiation. Modules should be thought of as purely a grouping mechanism, module boundaries should never affect runtime behaviour. Doing so would be harmful to modularity, for the reasons @sunfishcode points out, among others. If there is some "state" in a module that affects its semantics then it should be explicit, explicitly referenced, non-singleton, and importable/exportable. That is exactly what the fpenv design achieves, I think.

  • The part I don't fully understand about the fpenv design is how and when an engine would practically choose to instantiate an individual fpenv with a non-default "value". Is the idea that the jit inspects the use sites to make an informed choice?

@conrad-watt
Contributor

conrad-watt commented Oct 1, 2021

Modules should be thought of as purely a grouping mechanism, module boundaries should never affect runtime behaviour.

The part I don't fully understand about the fpenv design is how and when an engine would practically choose to instantiate an individual fpenv with a non-default "value".

IIUC, the choice of fpenv is morally determined at the point the Wasm code is compiled to a particular platform (i.e. when you commit to the platform instruction sequence that the Wasm op will be compiled to)? So there is some link to module boundaries, since a module is our unit of compilation at the Wasm level. Given this, I don't see "non-default" fpenv as being meaningfully providable at instantiation-time (if compilation is a separate prior phase), unless the engine is committed to not emitting the platform instruction sequence directly at compile-time and instead doing some kind of dynamic dispatch/patching based on the instantiation-time fpenv.

Could fpenv naturally be thought of as a compile-time import? I'm not suggesting anything concrete about whether the proposal/spec should be changed, just trying to draw analogies to previous conversations about compile-time imports.

@lars-t-hansen

Some thoughts.

Say I have an engine that optionally uses a portable interpreter for fast startup and an optimizing compiler for execution speed. At the latest when the interpreter executes a relaxed instruction with fpenv E it commits E to a certain implementation strategy. Suppose my portable interpreter can't do anything other than execute the qfma as separate multiply and add steps. Then it commits the jit to using that same strategy. (In practice, the interpreter may not be able to decide that late, but needs to commit E to the separate-instructions strategy at least when it executes the first instruction in a module that has access to E. Or indeed, the engine must commit to a strategy for E when it chooses the interpreter+jit execution mode.)
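A rough sketch of the commit-once behaviour described above, assuming a two-tier engine. `CommittedFpEnv` and `qfma_strategy` are hypothetical names used only for illustration.

```python
class CommittedFpEnv:
    """Hypothetical fpenv whose QFMA strategy is fixed by its first user."""

    def __init__(self) -> None:
        self._qfma = None  # not committed yet

    def qfma_strategy(self, preferred: str) -> str:
        """Return the committed strategy, committing `preferred` if unset."""
        if self._qfma is None:
            self._qfma = preferred
        return self._qfma


env = CommittedFpEnv()

# The portable interpreter runs first and can only do multiply-then-add.
interpreter_choice = env.qfma_strategy("mul_add")

# The optimizing compiler would prefer a hardware FMA, but must follow
# whatever the interpreter already committed for this fpenv.
jit_choice = env.qfma_strategy("fma")

print(interpreter_choice, jit_choice)
```

In this sketch the later tier asks for its preferred lowering but always receives the earlier commitment, which is the constraint the comment describes.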

I think @Maratyszcza once said something about different families of Intel chips having different lookup tables for some of the reciprocal functions, and that this could be detected in the output of those instructions. Presumably this is a somewhat more relevant concern for code migration than migrating between two instruction sets.

I think for fpenv to be credible we need some very concrete and detailed use cases on the table and some suggestions for plausible implementation strategies in environments that might benefit from having multiple fpenvs. I have a nagging suspicion that it only solves half of a problem. The bigger problem than code migration is data migration, i.e. distributed/cloud computation where some pieces of a computation might be done on one architecture and some pieces on another.

@lars-t-hansen

@lukewagner brought up another issue, namely NaN - which is "nondeterministic" in exactly the sense of relaxed SIMD. What constraints do we have on NaN behavior that might carry over?

@ngzhian
Member Author

ngzhian commented Oct 1, 2021

Note that there are 2 issues on fpenv, #19 was filed to discuss how fpenv is used, this issue contains comments/questions from the CG meeting. Both contain important comments so I would like to keep them open for discoverability. Let's try to direct any discussions about fpenv to #19. Thanks!

@ngzhian
Member Author

ngzhian commented Nov 1, 2021

FYI, I signed up to give an update at the Nov 9 CG meeting: https://github.com/WebAssembly/meetings/blob/main/main/2021/CG-11-09.md. I will be presenting what we went through in https://github.com/WebAssembly/meetings/blob/main/simd/2021/SIMD-10-29.md with respect to spec changes, plus some additional work looking into whether the current relaxed semantics will work for PowerPC and RISC-V.

@ngzhian
Member Author

ngzhian commented Nov 2, 2021

Updated https://www.ngzhian.com/relaxed-simd/core/exec/numerics.html#relaxed-operations to reflect changes after looking at PowerPC and RISC-V.

The only change needed is for relaxed min/max: RISC-V is slightly different and corresponds to minimumNumber/maximumNumber of IEEE 754-2019. (PowerPC and ARM are minimum/maximum; x86 is its own thing.)
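For readers unfamiliar with the differences, the three families of min behaviour can be sketched on scalars as below. The function names are hypothetical. The x86 model is an assumption based on the `a < b ? a : b` behaviour of MINPS-style instructions and is not taken from the proposal text.

```python
import math


def ieee_minimum(a: float, b: float) -> float:
    """IEEE 754-2019 minimum: any NaN operand yields NaN; -0 < +0."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == 0.0 and b == 0.0:
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b


def ieee_minimum_number(a: float, b: float) -> float:
    """IEEE 754-2019 minimumNumber: a single NaN operand is ignored."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return ieee_minimum(a, b)


def x86_like_min(a: float, b: float) -> float:
    """Assumed x86-style min: `a < b ? a : b`, so NaN or a tie returns b."""
    return a if a < b else b


for lhs, rhs in [(math.nan, 1.0), (1.0, math.nan)]:
    print(lhs, rhs,
          ieee_minimum(lhs, rhs),
          ieee_minimum_number(lhs, rhs),
          x86_like_min(lhs, rhs))
```

The two NaN inputs are enough to tell the three behaviours apart: one always returns NaN, one always returns the number, and one returns whichever value is the second operand.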

@ngzhian
Member Author

ngzhian commented Nov 9, 2021

We polled successfully for phase 2 today.
Some comments from CG:

  • the choice of fixed projection means that the individual cases for an instruction are correlated, e.g. for relaxed min we cannot have the case where you return the first operand when either operand is NaN (like x86 but returning first operand)
  • the instructions are using set notations, it should be a list of sets
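One way to read the first comment is sketched below. This is an interpretation, not spec text: the table `RESULTS` is a hypothetical simplification in which each input case lists the result under three candidate behaviours (minimum, minimumNumber, x86-like), and a "fixed projection" picks the same index in every case.

```python
from itertools import product

# Hypothetical table, in the order (minimum, minimumNumber, x86-like).
RESULTS = {
    "lhs is NaN": ("NaN", "number", "number"),
    "rhs is NaN": ("NaN", "number", "NaN"),
}
cases = list(RESULTS)

# Fixed projection: one index is chosen and applied to every case.
fixed = {tuple(RESULTS[case][i] for case in cases) for i in range(3)}

# Uncorrelated reading: each case independently picks from its own set.
independent = set(product(*(set(RESULTS[case]) for case in cases)))

# "Return the first operand when either operand is NaN".
first_operand = ("NaN", "number")

print(first_operand in independent)  # permitted if cases are independent
print(first_operand in fixed)        # not reachable by any single index
```

In this simplified table the first-operand behaviour is built from results that are individually allowed in each case, yet no single index produces both, which is the correlation the comment points at.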

@penzn
Contributor

penzn commented Nov 10, 2021

  • the choice of fixed projection means that the individual cases for an instruction are correlated, e.g. for relaxed min we cannot have the case where you return the first operand when either operand is NaN (like x86 but returning first operand)

I personally understood that comment a bit differently, that we should have consistency between similar ops, multiply add and multiply subtract, for example.

@ngzhian
Member Author

ngzhian commented Nov 10, 2021

  • the choice of fixed projection means that the individual cases for an instruction are correlated, e.g. for relaxed min we cannot have the case where you return the first operand when either operand is NaN (like x86 but returning first operand)

I personally understood that comment a bit differently, that we should have consistency between similar ops, multiply add and multiply subtract, for example.

Thanks for pointing this out, I missed this comment in the summary.
These are 2 separate comments, what you are talking about is inter-instruction consistency.
What I was talking about was correlation between results within the same instruction.

@ngzhian
Member Author

ngzhian commented Feb 18, 2022

#53 tracks TODOs for spec text based on comments in the Phase 2 poll. And since we have successfully advanced to Phase 2, closing this issue.

@ngzhian ngzhian closed this as completed Feb 18, 2022

8 participants