-
Notifications
You must be signed in to change notification settings - Fork 40
Inadvertent exporting of Wasm functions #31
Comments
Technically it's not any function; just functions that are statically referrenced by Maybe the issue you're getting at, though, which I do think we should consider, is that, in the MVP, we know all the external (exported or stored in an external table) functions before the code section starts, whereas with |
I agree with @lukewagner here. It'd be nice to know up front which functions are used in a first-class way, before the code section. |
So, if I was a lazy producer that simply put all their functions into this section, what would be the penalty? |
On V8, the penalty is that we generate wrapper code per signature (essentially a machine code stub whose complexity is proportional to the number of arguments) plus a JavaScript function per Wasm function (minimum two objects--a JSFunction and a SharedFunctionInfo). We're working to make the generation of those wrappers cheaper (and happen in parallel), but the JSFunctions we'd need to do lazily to avoid the upfront cost. |
Hm, how difficult would a lazy approach be for an implementation? An extra section adds somewhat non-trivial language complexity in this case: we would need to decode that section, validate it, extend static function contexts to track the information whether a func is "first-class", employ that to validate uses of func.ref, etc. All doable, but somewhat ugly and ad-hoc. Shame, I suppose a custom section for this sort of (primarily JS-relevant?) optimisation hint would not cut it, since you cannot rely on its being accurate. |
Indirect callability could end up influencing codegen for streaming non-JS hosts as well. I would think, spec-wise, this could be handled as a purely binary encoding detail, similar to how we did with the data segments up-front size declaration. |
Yeah, I'm not sure. The "binary encoding detail" approach already became slightly hacky for the simple data count section. I suspect it would be borderline for something like this, which requires more involved validation. I'll think about it. |
I'm about to implement this; which means either harvesting I still think we should maintain the general property that all aliased functions are known before the code section. I realized that aliased functions are not just externally callable (requiring entry stubs, which should be lazily generated for a couple of reasons) but also indirectly callable (since any |
Ah, thanks for the reminder. I have been throwing this around in my head for a while and so far think that a clean and forward-looking solution involves the following:
I also have considered the idea of only making it a detail of the binary format. But it seems that would be stretching it too far in this case, since the validation conditions are substantially more extensive than just comparing two numbers. WDYT? |
FWIW, another option might be to piggyback on (and abuse) segments and the validation exception that we'll probably need to introduce for them anyway. That is:
This solution is a bit more hacky, but would require fewer extensions to AST, binary, and text format (only adding the new segment kind). |
I think using element segments came up before but was shot down as too hacky. The declared bit (expressible more to the point?) reduces the hackiness by quite a lot, I think. If other kinds of values require a reference section in the future so be it, but until then we keep the complexity down. |
Well, I wouldn't want to end up in a future where we have both. If we go for the segment approach, then we should design it such that the new segment kind is extensible to other reference kinds later. |
@rossberg Your "reference section" idea is exactly what I was imagining. Regarding binary-encoding vs. validation: I think it'd be kindof a pain to have to forward-declare references in the text format, but if we have them implicitly declared/unified (as we do with function types), then we won't be able to write negative tests. If the distinction is not present in the text format, I'd be inclined to keep it in the binary format as well, but no strong opinion here. Regarding the elem section idea: scanning the elem section encoding just now, it's already a bit hairy, so I'd be reticent to add another case. Plus, it raises some questions without very satisfying answers:
|
@lukewagner, wasn't the text format meant to be a faithful representation of the AST, plus some sugar? That implies that everything expressible in the AST should be expressible in the text format. Making ref decls implicit would violate that (unlike implicit function types, which are just a shorthand). Analogous to
plus the shorthand
Regarding elem sections, I think question (1) may have to be answered either way: does the ref section have to repeat indices already mentioned in a segment? If no then the segment approach would actually make that more uniform. For (2), I was imagining something like I don't understand the distinction you are making between "stored in table" and "first-class". Under this proposal, you can always emulate But in general, I agree that a proper reference section would be cleaner. |
Yes, but whether it's in the AST or not is what I was (lightly) suggesting may not be necessary :) If it goes into the AST, I agree it should be fully explicit in the text format. The sugar you mention seems totally reasonable, though. (Actually, if we end up extending the elem section, the sugar could also be used to put a function into a normal (non-declared) elem section too, allowing the sugar to bind the elem index (which you need to implement address-of-function) to a
Oh right, I was only thinking of |
Yeah, as I argued above, not putting it into the AST would require cramping all the associated validation into decoding, which would be too much in this case (and by the same token, too difficult to write tests for).
Right, which may make the declare-by-segment approach look somewhat less hacky. |
I agree with the overall goals here, even though we use the same ABI for
direct and indirect calls, so we don't need the code generator to know
about refs up front, and we switched to lazy generation of the wrapper
stubs needed for table.get from JS. Given that both element segments and
ref.func are essentially the same from the code generator's point of view,
I think the ref-by-segment approach is more attractive. An extra ref
section seems a little weird to me. So I like Andreas's suggestion of
another elem segment type. Would that then mean that the ref.func bytecode
has immediates of [seg_index][elem_index]?
As for designing for future compatibility, what else do we anticipate being
able to ref? Are all of those things potentially table elements too? If so,
then that would suggest this is the right path.
…On Fri, Apr 26, 2019 at 8:53 AM Andreas Rossberg ***@***.***> wrote:
@lukewagner <https://github.com/lukewagner>:
Yes, but whether it's in the AST or not is what I was (lightly) suggesting
may not be necessary :)
Yeah, as I argued above, not putting it into the AST would require
cramping all the associated validation into decoding, which would be too
much in this case (and by the same token, too difficult to write tests for).
From that perspective, there's a lot more symmetry between ref.funcd and
table-initialized functions than I initially thought...
Right, which may make the declare-by-segment approach look somewhat less
hacky.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#31 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AC46VVHZILWMCZ2GV4R4IVTPSKREZANCNFSM4GXQW3AQ>
.
|
Ah, no, that might be yet another way to do it (requiring ref.* to make an indirection through a segment), but that would be tedious because you'd probably need separate segments per type, to make the type information propagate. The suggestion was simpler: keep ref.func as is, but make it a validation error to use it with a funcidx that does not also appear in an element segment. The new form of element segment would just provide a way to fulfill that requirement with a dummy
I think we want to keep the door open to potentially allow refs to all module entities (func, table, mem, global). But both proposals could easily allow that. |
Ok, elem segments sound good then. Looking more carefully at the bulk-memory-ops elem segment encoding, though, I noticed that, if we encode "declared" segments like passive segments (just with a different flags value), then functions are only implicitly declared by their use in an This makes me actually question whether we should re-think our encoding of passive segments; what if each segment declared a single "definition kind" (instead of Thoughts? (cc @binji ) |
@lukewagner, I think "definition kinds" are insufficient for more general table types. Just consider a table of type anyref -- you might reasonably want to define a segment consisting of heterogeneous kinds of entries. |
I think it may be a bit late for this -- we've enabled bulk memory by default now in v8. (I would have assumed SM did too...). We could still use a different encoding for the new segment type though, if we were concerned about size. |
TBH if the change is very small and could be agreed upon right away, we could backport a v8 fix to M74, but toolchains also need to have a fix, no? This is thin ice, though. |
@rossberg Yeah, I considered that case, but thought it might be rare and thus it'd be Good Enough to require using multiple passive segments for that case. But I didn't realize V8 was about to ship (in FF, it's still locked to Nightly; looks like bulk-memory-ops are still Stage 3), so I guess that raises the bar for mucking with passive segments. Anyhow, "declared" elem segments don't have the same issue; the order doesn't matter so it should be perfectly fine to have one declared elem segment per definition kind. Does that sound good? |
@lukewagner, yes, as mentioned somewhere upthread, I would imagine the format |
Ah, I missed that; great. Then my next question is, to avoid even more asymmetric cases for each flags value, should we change the bulk-mem-ops elem segments to have the same format (where only flags=0 has the hardcoded array-of-func-indices-and-table-index-0). I know bulk-mem-ops is near shipping, but it's also technically only Stage 3 and I'd guess that, while toolchains are already producing passive data segments (b/c worker-based threads), they may not be using any of the new elem segments, so there might not be any significant retooling effort. |
We could do that, but it would make null references inexpressible, unless we use a sentinel index value or perhaps a special |
Down the road it would make any non-homogeneous segment inexpressible, null refs are just the most obvious case. It will also make segment slots taken from imported globals impossible. So such a change would greatly reduce the value of table segments. |
Given |
So you're suggesting we modify the flags=2 case? For the bulk-memory proposal, that case isn't very useful anyway, since we only allow one table. Maybe we should make that illegal in the bulk-memory proposal (i.e. disallow the flags=2 index=0 case). I'd guess that's a small enough change to merge. |
Yes, and :) Really I'd love to see is symmetry between all the cases, using the format " I expect V8's immediate use case is the passive data segments since, with worker-based threads, MVP elem segments are good enough, so a conservative way to ship the uncontroversial part would be to just disable flags!=0 in release. |
I would rather vote for getting rid of flag 2 segments entirely. ;) |
The overhead of the current passive element segment design is not great, but not too bad either. Initializing a table cost 3 bytes per function reference (assuming 1-byte function indexes) plus the cost of
Yes, but I don't think we want to half-ship the bulk-memory proposal. I'd rather change the proposal minimally and ship that.
Well, if we punt flags=2 to the reference-types proposal, we can decide then whether to keep or throw away. :) |
Multiple segments will probably not work in those cases, since you'd have to split into groups of same-kind consecutive entries, which could easily be singletons. And if we are talking about exporting segments that's even worse, because the splitting is dependent on the data in there, so that's not modular or robust against changes.
Well, a more important goal is keeping data and elem segments alike, so that would only make sense if we punted on both. ;) |
@binji If the whole point of elem segments is size (compared to @rossberg Are we really considering importing/exporting segments? There's a fair amount of overhead to complete the set of functionality needed for each definition kind (giving them their own reference type, JS API reflection, |
I don't think it is --
I can see
OK, but we could do that for bulk-memory anyway, since MVP only allows 1 memory, 1 table. TBH, I'd be OK with this change. |
Yes, but using the same manner of argument as with size: if it's removing branches we're optimizing for, performance would be further improved by removing the branching on the definition kind from the inner loop of
But frequently interleaved throughout an otherwise contiguous segment? Note that this isn't even possible in the MVP's elem segment. If nothing else, this seems like a waste of binary size. |
Let's discuss this tomorrow in the CG meeting: WebAssembly/meetings#405 |
Just to confirm, is the consensus above that for a |
Yes, |
This commit implements the 'ref.func' instruction by emitting an instance call to WasmInstanceObject::getExportedFunction. The referenced function must be used in an element segment to validate. See [1] for more details. [1] WebAssembly/reference-types#31 Differential Revision: https://phabricator.services.mozilla.com/D40586 --HG-- extra : moz-landing-system : lando
This commit implements the 'ref.func' instruction by emitting an instance call to WasmInstanceObject::getExportedFunction. The referenced function must be used in an element segment to validate. See [1] for more details. [1] WebAssembly/reference-types#31 Differential Revision: https://phabricator.services.mozilla.com/D40586
This commit implements the 'ref.func' instruction by emitting an instance call to WasmInstanceObject::getExportedFunction. The referenced function must be used in an element segment to validate. See [1] for more details. [1] WebAssembly/reference-types#31 Differential Revision: https://phabricator.services.mozilla.com/D40586 UltraBlame original commit: 00e5548e9c435314beb0c572e1476330b06e1d6d
This commit implements the 'ref.func' instruction by emitting an instance call to WasmInstanceObject::getExportedFunction. The referenced function must be used in an element segment to validate. See [1] for more details. [1] WebAssembly/reference-types#31 Differential Revision: https://phabricator.services.mozilla.com/D40586 UltraBlame original commit: 00e5548e9c435314beb0c572e1476330b06e1d6d
This commit implements the 'ref.func' instruction by emitting an instance call to WasmInstanceObject::getExportedFunction. The referenced function must be used in an element segment to validate. See [1] for more details. [1] WebAssembly/reference-types#31 Differential Revision: https://phabricator.services.mozilla.com/D40586 UltraBlame original commit: 00e5548e9c435314beb0c572e1476330b06e1d6d
Actually, declared segments haven't landed yet, neither in the interpreter nor in the spec nor are there tests. I have an earlier version of them on a branch, but need to rebase on the bulk proposal, since that significantly changed the segment representation more recently. Discovered various holes in the spec & implementation of that proposal on the way that had to be fixed first. I'll try to get to it next week. |
…mory.init`, `memory.drop`, `table.init`, and `table.drop` identify passive segments; only that they index a valid segment. (#31)
Fixed via #73. |
The anyref/anyfunc allows a function to exported from a WASM module dynamically - as part of the returned value of another exported function. Similarly for imports.
This effectively blows apart any discipline for what is exported from a WASM module. It also complicates implementation because potentially every WASM function may become accessible from JS.
The table.set operator in conjunction with indirect calling of functions from any table allows monkey-patching of functions.
The text was updated successfully, but these errors were encountered: