WebAssembly · tlively · Oct 20, 2021 · Oct 5, 2021 · Oct 20, 2021 · Oct 20, 2021
diff --git a/gc/2021/GC-10-05.md b/gc/2021/GC-10-05.md
@@ -0,0 +1,194 @@
+![WebAssembly logo](/images/WebAssembly.png)
+
+## Agenda for the October 5 video call of WebAssembly's Garbage Collection Subgroup
+
+- **Where**: zoom.us
+- **When**: October 5, 4pm-6pm UTC (October 5, 9am-11am Pacific Daylight Time)
+- **Location**: *link on calendar invite*
+
+### Registration
+
+None required if you've attended before. Fill out the form here to sign up if
+it's your first time: https://forms.gle/JehrAB4gWbtHjybt9. The meeting is open
+to CG members only.
+
+## Logistics
+
+The meeting will be on a zoom.us video conference.
+Installation is required, see the calendar invite.
+
+## Agenda items
+
+1. Opening, welcome and roll call
+    1. Opening of the meeting
+    1. Introduction of attendees
+1. Find volunteers for note taking (acting chair to volunteer)
+1. Adoption of the agenda
+1. Proposals and discussions
+    1. Announcement: Proposal update in wider CG
+    1. Discussion: Prototyping and investigation plans (10 - 15 minutes)
+    1. Presentation: dart2wasm (Aske Simon Christenson, 20-30 minutes)
+1. Closure
+
+## Meeting Notes
+
+### Introduction of attendees
+
+- Thomas Lively
+- Aske Simon Christensen
+- Ben Titzer
+- Zhi An Ng
+- Roberto Lublinerman
+- Rick Battagline
+- Ross Tate
+- Zalim Bashorov
+- Lars Hansen
+- Keith Miller
+- Tim Steenvoorden
+- Alon Zakai
+- Jakob Kummerow
+- Andreas Rossberg
+- Conrad Watt
+- Goktug Gokdogan
+- Luke Wagner
+- Francis McCabe
+- Mark (??)
+- Asumu Takikawa
+- Slaval Kuzmich
+
+###  Announcement: Proposal update in wider CG
+
+TL: Let me know if you want to join me, Alon, and Jakob in presenting recent progress on the GC proposal.
+
+### Presentation: dart2wasm (Aske Simon Christenson, 20-30 minutes)
+
+[slides](presentations/2021-10-05-christensen-compiling-dart.pdf)
+
+ASC: Dart is very dynamic. Fields can be overridden by getters and setters or vice versa. Even adding two integers is a call in theory. We depend on type flow analysis heavily for devirtualization in the front end. Every object has a class ID. Every method has a selector offset into the table. Call dispatch uses Wasm tables and call_indirect. Any class can be used as an interface, so all calls are interface calls?
+
+BT: you have reference types in func types, how closely do they need to match for call_indirect?
+
+ASC: I will get back to that, but parameters can change types in overrides. So the wasm signature needs to be the least upper bound. More details later.
+
+ZB (chat): Why do you have tree shaking 2 times?
+
+SK (chat): Does it mean that table size is (num_interface_methods * num_ClassIds)?
+
+ASC: no, these are interleaved. If you have the complete square table it will be very sparse. They are packed together. Each row corresponding to a particular selector will have entries for all the class ids that have that particular method, rows will be sparsed, and interleaved into a 1 dimensional table, so it's usually not very big
+
+FM: on this slide, do you mean each dart object is translated to a Wasm struct rather than each class?
+
+ASC: each class translates to a struct definition
+
+RT: What is the `context` field in _Function?
+
+ASC: It’s a pointer to the context.
+
+RT: Could you have the context be fields after the `functionRef` in e.g. _Function0?
+
+ASC: If you have two different functions with different contexts and different variables, they need to have the same type so the functions are interchangeable.
+
+RT: The two different functions being subclasses of _Function0 would accomplish that.
+
+ASC: There needs to be a cast anyway since the contexts can all be different, so I didn’t do that.
+
+AR: like flattened closures, save 1 indirection and allocation per closure
+
+ASC: you would put function ref in the same object as the context
+
+ASC (Post-meeting addendum): Multiple functions inside the same function can share a context. Mutations from one function are seen by the others. Therefore, there needs to be an indirection from the closure to the context (i.e. deep closures).
+
+AR: if functions are divided by arity, does it mean any parameter and result has to be boxed
+
+ASC: yes at the moment. It is somewhat trivial to do that and still have this be assignable through the dart subtyping rules, worth looking into to avoid boxing. Currently prototype does not support optional parameters at all for function objects, working on this at the moment.
+
+AR: can avoid boxing by using anyref instead of top type, can use i31ref or bools or small ints
+
+ASC: for small ints, there are a few unsolved challenges (see slide)
+
+RT: In the system I briefly presented last time, we did something similar with call tags to solve some of these problems.
+
+BT: About complex type tests: In Jawa I implemented interfaces with a search, but because you used call_indirect you don’t need to do that. But then search comes back for type tests, and I think that it’s unavoidable that it shows up somewhere.
+
+ASC: yea, the way it is implemented in Dart VM, the class ids are numbered such that for most classes, all of its subclasses constitutes an interval of class ids. When testing against a class, you can test class id is inside an interval. An interface with a number of classes implementing it, you check against multiple intervals.
+
+ASC: It would be useful if we could use the JS weakmap on Wasm GC objects.
+
+FM: question about async/await. Is this the only place you needed the arbitrary goto?
+
+ASC: One other situation that creates non-reducible control flow is a switch statement that continues to a previous switch label, which is possible in Dart. That can also be implemented by jumping to the beginning of the switch with an appropriate value on the stack, but is slightly less efficient.
+
+FM: That sounds like CW's multi-loop thing. On async/await, is this transformation part of the semantics, exposed to the programmer? Or can you implement async/await using the stack-switching primitives?
+
+ASC: Stack switching feature could be used, the async/await is not specified in terms of the transformation, it specifies that it suspends at a particular point, then resumes at the same point. If that can be achieved with stack switching, it will be a feasible implementation strategy. In the case of Dart, it's always just 1 stack frame.
+
+ASC: tree-shaking in type flow analysis is based on the result of the type flow, will figure out certain parts are never called, classes never instantiated. Second tree-shaking is there because intrinsicifaction can make methods calls unnecessary because they have be intrinsified.
+
+
+(chat)
+
+From Luke Wagner to Everyone:
+Also, if the captured variables are mutable and there are multiple live closures, you in general need to share the activation state
+From Ben Titzer to Everyone:
+Yeah, you might need to share contexts if functions can be nested multiple levels deep
+i.e. linked closures
+From Mark to Everyone:
+Very cool and interesting presentation! Thanks so much for sharing. From all of the issues that you mentioned have you come across anything so far that you feel like would prevent Dart from becoming a showcase language of how GC languages can work with Wasm or do you feel like the trade offs are too much at the moment? What do you think the main challenges are from here from your point of view?
+
+ASC: not really, with the way Wasm GC is progressing so far, I have a good impression of the group being open to implementor feedback, if something shows up, will have confidence that we can solve it. Have not encountered that looks like a complete blocker.
+
+TL: already beating JS on most benchmarks?
+
+ASC: yes, on big benchmarks, it's something like factor of 2 for time to first frame. After it gets up to speed, a factor of 3. Not done thorough analysis, this particular benchmarks does a lot of megamorphic calls in JS, but efficient when using global dispatch table in Wasm.
+
+AR: does the JS version of Dart support separate compilation in any way? This is about whole program right?
+
+ASC: JS is a whole program optimizing compiler as well.
+
+RL: how does it compare to the performance when running in the Dart VM?
+
+ASC: varies between a little slower to 2-3x slower. It is not as fast as the Dart VM, in some cases it comes close, especially for peak performance on large benchmarks. Native AOT clearly beats it on startup time, doesn't do anything except load code into memory. On the execution speed, not as fast, but not much slower.
+
+AR: any feel for specific language constructs in Dart that are significantly slower in the Wasm version? Or equally distributed? Hard to tell?
+
+ASC: definitely some overhead in call_indirect that is not there in the VM. call_indirect has to do a bounds check and a type check, which are not needed in VM runtime because it trusts the table. Also VM trusts the Dart type system, all the places Wasm needs to insert casts, VM doesn't need to.
+
+AR: are there specific constructs where casting is particularly bad? or across the board?
+
+ASC: one thing I can definitely see in performance is null check, so many of them. This will likely change when V8 implements null check elimination (not implemented yet), at least in current implementation, seen quite some overhead by modifying V8 and looking at the difference there.
+
+AR: V8 does implement nullable and non-nullable?
+
+ASC: no checks for non-nullable. Locals have to be nullable.
+
+AR: goes back to issue with let or uninitialized locals
+
+JK: we implement non-nullable locals, even the completely unsafe version where V8 trusts you to do the right thing
+
+ASC: haven't tried that yet, will try that
+
+AR: you mentioned call_indirect, why can't you use call_ref, which saves some checks?
+
+RT: have to load from vtable anyways, a lot of the same overhead
+
+ASC: will need to load from an array, then do the call, you could have a huge struct with the specific types there, it wouldn't work since you have to index into it. Not really a good fit for call_ref. call_indirect does exactly what it needs to do, but it cannot trust the table.
+
+GG: in our experience, wonder why different in two langauges, is it possible you have less efficient structure? Memamorphic JS beats Wasm quite highly. In J2CL, megamorphic code happening in Wasm is much slower than JS. Have benchmark sthat show that Wasm is significantly slower than V8 dynamic dispatch.
+
+RT: megamorphic class method calls or to interface methods?
+
+GG: class methods? In cases we cannot devirtualize, class with multiple implementation. If you look at delta benchmark from octane, ported to Java, can see Wasm is much slower, JS is close to JVM, Wasm is 50x slower.
+
+JK: hard to say for sure without taking a closer look. 10 years spending making the JS fast, Wasm can't compete with that yet. Could be inlining helps, maybe JS can do that. We can take a closer look offline.
+
+GG: surprising that Dart didn't see that in their benchmarks, but might be just the way they generate JS, don't get full benefit of V8
+
+RT: we have measurements to suggest call_indrect can be done faster than call_ref
+
+BT: also implementation details about call_ref in V8, some devil in the details, without seeing the actual machine instruction hard to tell
+
+JK: call_indirect and call_ref should be roughly the same. CPU internal optimizations or lack of play a big role. In some cases the branch predictor can help. Hard to make general statements there.
+
+### Discussion: Prototyping and investigation plans (10 - 15 minutes)
+
+(Deferred to GitHub / next meeting)
diff --git a/gc/2021/presentations/2021-10-05-christensen-compiling-dart.pdf b/gc/2021/presentations/2021-10-05-christensen-compiling-dart.pdf