Negotiated heap size and methods of resizing the heap. #331

ghost · 2015-09-05T00:09:50Z

A wasm application should declare:

A minimum usable heap size. The runtime would not start the application unless this could be allocated, and might use this to defer the start of the app until resources are available, or use the size in planning etc.
A requested maximum heap size. The runtime can choose to allocate a size between the minimum and this maximum inclusive.
Request the use of on-the-fly heap resizing. The runtime can accept or decline this request.
Request an ability to grow or shrink the heap after exiting the wasm execution context. The wasm app would be re-entered after the size has changed, and the module might have been re-compiled. The runtime could decline this request. Global variable state would probably want to be preserved.

The runtime can:

Allocate a size between the minimum and maximum. This allows the runtime to round up the size to better support bounds checking and to allocate a larger initial size where there is ample address space. For example, the runtime can round up the size to a page boundary. For example, on ARM 32-bit the runtime can round up the size to a value that can be encoded as an immediate value in bounds check instructions.
Choose if on-the-fly heap resizing is supported. For example, a runtime that has degraded performance when using on-the-fly heap resizing but has ample address space can decline the request for on-the-fly heap resizing and allocate a large fixed heap size and optimize for this. For example, a device with limited cpu power and limit memory could run all-in on one application, giving it all the memory available and optimizing performance for a fixed heap size.
Emit optimized asm.js code when decoding, avoiding the heap resizing code if this has been declined so that the host asm.js compiler emits optimal code.
Emit constant globals with the accepted sizes and choices. For example: a constant length if a fixed heap size is chosen. Perhaps also index masking constants if a fixed heap size is chosen (but this can be another issue).

Alternatives to the on-the-fly resizing should be considered for the MVP. The motivation for the on-the-fly resizing appears to be legacy applications that hit OOM in deeply nested call contexts and need to expand the heap while preserving this call context. If applications can exit execution while resizing and re-enter when done then the runtime has far more options and this approach should be advocated for new applications and legacy applications should be encouraged to be reworked to use this approach.

The specification should warn developers that the use of on-the-fly heap resizing limits optimization of the code and does not perform well across browsers and limits the usability of the application by limiting the ability of the runtime to manage resources.

There should be an interface for applications to call to request a larger heap size after exiting the wasm context. This interface could also include a minimum and maximum size. If the runtime could not allocate the minimum then than app would enter an OOM state. The runtime can choose to resize between the minimum and maximum requested sizes, rounding for performance. A runtime could choose to re-compile a module with a new fixed size if warranted to affect a heap size increase.

lukewagner · 2015-09-08T16:46:14Z

Lot of interesting ideas in here:

The runtime can ... Choose if on-the-fly heap resizing is supported.

I definitely agree that modules should have to declare their intent to resize the heap (one way or another) for the benefit of optimization. Currently, this is possible by having the initial and maximum memory sizes (declared by the module [1][2]) be equal. resize_memory can also fail any time it wants (it's a fallible operation). I guess your point here is that perhaps some wasm implementations will not want to support resizing ever so it might be nice to give the app an explicit feature test rather than just having resize_memory always fail. I don't think we should add this query (effectively making resize_memory optional) unless we find an engine that is seriously considering doing so (which it doesn't sound like atm).

Request an ability to grow or shrink the heap after exiting the wasm execution context.

This is an interesting variation on synchronous heap resizing. One issue is that wasm doesn't have a notion of event loop so while this feature would make sense in a browser, it'd make less sense in a traditional shell environment where all the time is spent under a single invocation of main. Also, while sync resize has the obvious malloc use case, I think apps would have to be re-architected to take advantage of this async resize and, in that case, I think this belongs as a future feature, prioritized by demand/feedback.

A requested maximum heap size. The runtime can choose to allocate a size between the minimum
and this maximum inclusive.

The current design has initial/max memory, but, iiuc, the difference is that you're proposing that these two numbers form a range from which the host may nondeterministically pick the initial heap size. I'm not sure what problem this generalization fixes: an engine that doesn't want to deal with (moving) resizing can always allocate as much virtual address space up front as it wants internally and then map this memory as needed in response to resize_memory calls.

jfbastien · 2015-09-08T19:52:38Z

All important to discuss, but we should also take into account what these proposals entail for developers and users (by users I mean folks browsing the web, loading .wasm without knowing it).

Specifically, these ideas are very much oriented towards making the implementation better. That's good, but we need to understand when the cost imposed on developers is too big, or when a compiler targeting wasm will just ragequit and e.g. always resize memory and just ask for 0 bytes to 4 GiB of min/max. I'm worried that diverging too much from "it just works" will make the wasm platform hard to use, or will mean that we expose a bunch of features nobody uses (and are therefore broken).

#306 and linked discussions touched on some of this (#302 #53 #227).

If we revisit memory management, I'd like to make sure we address at least:

madvise scenarios where pysical memory can be given back.
Dynamic linking.

ghost · 2015-09-10T15:38:29Z

@lukewagner Having a module declare that it is not resizing by declaring the initial and maximum memory the same would be useful, but only for apps that have a fixed memory requirement. It would not handle the case of apps that have a growing or variable memory requirement and when the runtime has a limit between the minimum and maximum and/or wants to optimize for no-resizing. Resizing needs to be optional and a choice for the runtime and user, for performance and usability reasons, a technical matter.

You suggest allocating the runtime chosen virtual address space up front and mapping from this as needed in response to resize_memory calls, but I don't see how this changes anything as the virtual address space would be equivalent to the linear memory. Adding another 'committed' memory layer is not going to address this issue. I presume the a wasm module can safely address all the allocated virtual address space (safe for the runtime) so bounds check would be optimized for this virtual address length and this is what this issue is about, not adding another 'committed' memory layer which would be orthogonal.

@jfbastien It gives the runtime the choice, so runtimes can offer users a range of options - this can only be good for users. The runtime could choose to allocate the minimum requested length and support resizing, and then there is no difference for the user from what @lukewagner wants. With the current index-masking this is done by setting the masks to 0xffffffff and the compiler optimizes them away. The choice can be dynamic - if a runtime has a good reserve of address space it can keep allocating the maximum requested and optimize for performance - good for the user again. Developers have more choices, a good thing, and I can't imagine that code that has specified a min/max range will break if given a larger allocation than the min to start with.

madvise seems orthogonal to the issue here. I explored and implemented this on firefox some time ago as buffer.discard and this would be useful to support. Practically it requires the linear memory to be page aligned. With the buffer at absolute zero there can be nothing before this allocation, such as a header, and this is the case on firefox, not sure about v8 and jsc.

lukewagner · 2015-09-10T16:49:35Z

@JSStats My point with talking about of front allocation virtual address space is that the runtime already has a lot of choice: it can allocate a big slab up front, it can realloc as it goes, or it can use a combination.

Moreover, it's difficult to discuss the merits of this negotiated heap size proposal without a concrete list of problems and how they are addressed by this proposal but not otherwise.

jfbastien · 2015-09-11T03:23:08Z

Yes choice is good but only up to a point, e.g. it introduces noticeable implementation differences, and drives all implementations to do the same, raising maintenance cost.

I think at this time we'll want to try thing out once the implementations are more concrete. As @lukewagner says we'll want to have concrete upsides with new proposals, and as I suggested we want to ensure all memory related things are addressed, including madvise, mprotect, dynamic linking, ...

ghost · 2015-09-11T03:46:26Z

@lukewagner

This is basically about performance. Runtimes could follow your memory-resizing scheme if they ignored performance. I think there are other very significant matters to do with resource planning, but we can put those aside for now.

As you would well know the runtime must be safe from the wasm code. Wasm targets a range of systems, some 32-bit with limited VM and memory, some 64-bit. Also a range or processors, some with more limited addressing modes and immediate argument support.

The key point is (optionally) exposing the linear memory size to the wasm code, allowing (not requiring) the wasm code to apply constraints on the memory access indexes that prove that the index is within bounds so that the runtime can avoid redundant bounds checks. For example a comparison test, or high bit masking. This creates some further technical problems.

The wasm code has limited type information and the wasm code generator likely knows more about the valid range for indexes. For example, it might know that a wasm int32 is a pointer. This potentially allows the code generator to apply much more aggressive optimizations and hoisting of index constraints than a wasm runtime could.

A runtime might better (or only) support some linear memory sizes exposed to the wasm code. For example on the ARM only a limited set of immediate constants can be encoded in a comparison instruction.

With the compiled code specialized to the linear memory size, re-compilation will be required for top performance which would be better avoided.

Proving that bounds checks are unnecessary is more effective if the limit is a constant and at least does not decrease across calls etc.

The runtime needs to be able to choose the linear memory size, a size greater than the minimum necessary size requested by the wasm application. This can avoid unnecessary re-specialization. The runtime chosen size might also be the limit that the runtime can (or will) allocate anyway and the runtime can optimize the code knowing that the memory limits will not decrease.

I understand some runtimes want to optimize for the allocated virtual address space by supporting fine incremental memory resizing, even if this sacrifices performance, and this should be supported too. The runtime should be able to choose not to expose a fixed linear memory size to the wasm code when compiling so that it is sure that the code is not specialize to this size, but only when the wasm code either does not need it or can work without it.

ghost · 2015-09-11T05:01:45Z

@jfbastien I've explored this in asm.js, have an emscripten branch to dynamically (optionally) emit the index masking code, patches for Jsc, V8, odin, Ion that also help regular JS code too. If you 'want to have concrete upsides', consider an asm.js zlib benchmark:

v8 tf x64, resizeable buffer: 92.8 sec
v8 no-tf x64, resizeable buffer: 22.7
v8 tf x64: fixed buffer: 19.4
v8 tf x64 dynamic index masking: 14.6u
odin x64, resizeable buffer: 14.7u

V8 is very competitive with Odin which uses a memory protection scheme to omit bounds check, but v8 is only competitive when the asm.js uses index masking. I understand from the discussions that the Odin memory protection scheme is not viable, that some systems do not have the VM for many such wasm apps, and it does not scale to 64-bit, so v8 TF seems representative of a wasm implementation?

V8 could do better as index masking also supports moving a scale factor into the x64 addressing mode - something JSC can do.

Even if people doubt that an application might be better off emitting support for index masking, surely everyone would agree that there are applications that naturally mask indexes and this support would significantly improve their performance.

If you do not understand a 'concrete' use or technical issue then please bounce back your understanding so I can try to focus some more?

As I have said, I think this is orthogonal to mprotect and madvise. It may well impact dynamic linking and other matters, but they seem so poorly defined that it is hard to evaluate.

lukewagner · 2015-09-11T13:50:16Z

Agreed with @jfbastien that choice is good up to a point but can hurt too. We've definitely seen this in JS.

Since it is possible for wasm to specify that it wants a fixed heap size which the impl can know statically, I still don't see any specific problems/solutions in your above post, just broad assertions.

I don't think v8/asm.js measurements can be taken as representative since v8 is using a completely different compilation strategy with asm.js than it's using with wasm. In particular, the 'v8 tf resizable buffer' figures represent a bizarre performance cliff and are thus have no bearing on the question of resizing.

ghost · 2015-09-11T14:21:27Z

@lukewagner The problem is not solved by allowing the wasm code to dictate a fixed size. The wasm dictated size might give shocking performance for a given runtime, for example limits that can not be encoded efficiently in an ARM instruction for bounds checking. Further, it does not handle the case of wasm apps with a range of memory size requirements where the runtime might want to choose the maximum size, or allocate the most memory it can, so that it can then allow the wasm code to specialize on this fixed size.

I worry more about the complexity that will be needed to achieve top performance from a design that allows only resizing. It might end up needing a JIT, and we have will have the same problem as JS all over.

lukewagner · 2015-09-11T15:11:01Z

The wasm dictated size might give shocking performance for a given runtime, for example limits that
can not be encoded efficiently in an ARM instruction for bounds checking.

Ah, so here is a very specific issue; I agree this is a problem and I think we can address it independently by specifying that both the initial memory size and the size after resize_memory can be rounded up by an implementation-defined amount to provide a more efficient length. The application can observe this rounded-up value via memory_size. The downside is slightly more nondeterminism, but I think it'd be worth it. This could be split off into a nice, specific separate issue.

Further, it does not handle the case of wasm apps with a range of memory size requirements
where the runtime might want to choose the maximum size, or allocate the most memory it can,
so that it can then allow the wasm code to specialize on this fixed size.

This is still too vague a problem statement to be actionable.

I worry more about the complexity that will be needed to achieve top performance from a design that
allows only resizing. It might end up needing a JIT, and we have will have the same problem as JS
all over.

If there ends up being a significant-enough perf difference, as I already said, apps can specify that they don't want heap resizing (setting initial=max). Devs care about performance, so they will do this if it is worth it to them. Letting the wasm engine decide if it wants to support resizing takes away control from app devs and also reduces the portability of the web as a platform (where only certain browsers/archs deny resizing).

lukewagner · 2015-09-11T15:18:01Z

Another specific use case I think maybe you're getting at is: an application decides it doesn't want resizing (for performance reasons) and wants to ask the browser for "as much memory as you can give me in this [min, max] range". I can see why that is attractive (if indeed there is a performance reason; that remains to be demonstrated). It's also a small generalization of what's in the design now and I think could be added in a backwards compatible way (it's just a new option for the memory section) so I think it is best added a future feature that we should prioritize based on user feedback and experience.

ghost · 2015-09-11T20:32:45Z

@lukewagner You are still dis-regarding the use case of 'wasm apps with a range of memory size requirements' which is exactly the use case for memory resizing, and I presume well recognised here!

Given this use case, plus the technical performance problem, allowing the wasm app to dictate the memory size is not a solution.

Perhaps you recognize this in the follow up comment, but the request is not about the wasm app dictating that it get all the memory it can, rather this is just one choice for the runtime which might be a good choice if it has ample VM.

I see no point moving the 'rounding-up' use case off into a separate issue as there may be other unanticipated runtime needs that also need some flexibility in the choice of the memory size.

The performance benefits of passing the memory size to the app and allowing it to specialize on this are obvious, and not a matter of dispute. Does anyone one here dispute that a compiler optimizing away a redundant bounds check is not a performance win? As mention many times, some applications naturally masking pointers, or will want to check indexes are within bounds for their own internal protection.

It also needs to be optional if this memory size is exposed to the wasm app, for the case in which the memory is to be resized at runtime, (your preferred solution).

jfbastien · 2015-09-11T21:44:46Z

I'm sorry to say this, but I think this discussion has gone past its usefulness point. It would be great to move relevant points to separate issues that are shorter and easier to get into without reading what is now a wall of text.

ghost · 2015-09-23T01:39:37Z

Another related matter noted while exploring WAVM is that resize_memory could be used to 'commit' memory from a much larger reserve - this is the current state in WAVM. This could result in developers using resize_memory to make many small incremental increases in the committed memory, and this usage seems like an unintended consequence of adding resize_memory while not adding support for 'commiting' memory.

In the case of WAVM, which can reserve a large area of address space, the wasm code could take advantage of this to optimize away bounds checks when they the proven to be within the reserved memory size, not just the committed memory size. In order to allow the wasm code to explicitly optimize bounds checking, it would need to know this reserved size. This would mean that a memory access could fail even when within the memory_size, because it accessed un-committed memory (or protected memory in future), but that there is a concept of these accesses being 'safe' wrt the runtime sandbox.

I don't think the implications of this have been factored into the design, and that this needs some more consideration, and not taking this into account for the MVP looks like leading to unintended consequences - namely the high use of resize_memory to make small incremental increases in the committed memory.

Adding a commit_memory operation and having memory_size return the reserved size might address this. Asm.js has no concept of committing memory, but could this operation be defined as optional for the MVP. Could some of the popular web browsers bring forward support for this to help ensure the code works as expected when required to commit memory before use?

ghost · 2015-09-26T09:08:42Z

Some preliminary implementation results, for the zlib benchmark. WAVM (x64 linux llvm 3.8) modified to place the buffer at zero plus a slight memory access optimization when using pointer-masking. WAVM already uses pointer-masking, but internally on each access, and this is redundantly when the application explicitly masks the index and explicitly masking the index can support better code generation. WAVM is compared against Odin x64 nightly (the fastest asm.js implementation), running the non-pointer-masking version of the benchmark for which Odin is faster. Excluding compile times, to focus on code potential.

Odin: 14.51 sec, WAVM: 11.43 sec [78.7%]

ghost · 2015-09-26T09:22:50Z

Oh, and WAVM without pointer-masking (masking internally at each access) with buffer at zero again:
WAVM non-pointer-masking app: 13.06 sec, WAVM pointer masking app: 11.43 sec [87.5%]

ghost · 2015-09-28T13:19:38Z

x86 32-bit results:
Odin non-pointer-masking: 19.052 sec, WAVM pointer-masking: 13.18 sec [68.3%]
WAVM non-pointer-masking: 14.51 sec, WAVM pointer-masking: 13.18 sec [90.8%]

Bit hard to compare here as WAVM with implicit masking might not even be semantically correct wasm, and Odin x86 is not very competitive even though it does have an optimized bounds checking scheme.

jfbastien · 2015-09-28T23:05:54Z

As discussed, let's split this up into manageable sub-issues which refer to this one. More data in this issue is of course welcome.

ghost · 2015-10-02T15:20:26Z

ARMv7 32-bit RPi2 results:
Odin non-pointer-masking: 257.3 sec, WAVM pointer-masking: 155.4 sec [60.2%]
WAVM non-pointer-masking: 211.5 sec, WAVM pointer-masking: 155.4 sec [73.5%]

Again a bit hard to compare as WAVM with implicit masking might not even be semantically correct wasm. If wasm demanded non power of two bounds checking on ARM then the performance might be much worse than even the Odin result above (although an implementation could mask to a power of two and use page protection if this helped performance)

These are all results for an application that is using pointer-masking simply for performance, and would work fine if the runtime chose not to use this by supplying a mask of -1 at compile time.

For an application that exploits the masking to remove pointer tags the performance difference would be expected to even greater, such as for a VM implemented in wasm. This is not a matter for the wasm designers to decide, it is an app developers decision. The decisions for the wasm designers is if they will support this extra use case well, and if they will support exposing the memory size at compile time to help optimize away bounds checks.

ghost mentioned this issue Sep 25, 2015

Remove the max from the module's memory declaration. #370

Closed

jfbastien closed this as completed Sep 28, 2015

This was referenced Sep 29, 2015

Memory size tiers #376

Closed

Define a 'management' module for loading. #378

Closed

Imported global constants #333

Closed

Negotiated heap size and methods of resizing the heap. #331

Negotiated heap size and methods of resizing the heap. #331

Comments

ghost commented Sep 5, 2015

lukewagner commented Sep 8, 2015

Uh oh!

jfbastien commented Sep 8, 2015

Uh oh!

ghost commented Sep 10, 2015

Uh oh!

lukewagner commented Sep 10, 2015

Uh oh!

jfbastien commented Sep 11, 2015

Uh oh!

ghost commented Sep 11, 2015

Uh oh!

ghost commented Sep 11, 2015

Uh oh!

lukewagner commented Sep 11, 2015

Uh oh!

ghost commented Sep 11, 2015

Uh oh!

lukewagner commented Sep 11, 2015

Uh oh!

lukewagner commented Sep 11, 2015

Uh oh!

ghost commented Sep 11, 2015

Uh oh!

jfbastien commented Sep 11, 2015

Uh oh!

ghost commented Sep 23, 2015

Uh oh!

ghost commented Sep 26, 2015

Uh oh!

ghost commented Sep 26, 2015

Uh oh!

ghost commented Sep 28, 2015

Uh oh!

jfbastien commented Sep 28, 2015

Uh oh!

ghost commented Oct 2, 2015

Uh oh!