Better JS size for small programs #5794
Comments
I think before we do this, there's a lot we can optimize with feature-specific flags and other methods. Not too long ago I wrote a tiny chess game in GLES2, at https://github.com/juj/tiny_chess, and looking at its build output, in one evening I was able to remove about 2/3rds of the runtime boilerplate as unnecessary (from 150KB uncompressed down to 50KB uncompressed). There is a lot of low-hanging fruit here, and I think it would be best to start splitting these types of items off on a per-feature basis; that way it won't be a wholesale "old runtime" or "new runtime" question. Longjmp support is one such example. Another thing is that I'd like to start slimming down individual items in the runtime.
Good points. Thinking on this some more, maybe we should clarify the target goal. Is it (1) a significant but incremental reduction in the JS we emit for small programs, or (2) getting as close as possible to truly minimal JS output?
I think the first goal is reachable with an incremental improvement approach, but to reach the second, starting "from scratch" in a mode where we add only necessary JS seems more likely to succeed.
Crazy thought: if the goal is to have a minimal wasm hello world, can we change users' expectations around what that looks like? As a strawman:

```c
#include <js_console.h>

int main() {
  console_log("Hello world!");
  return 0;
}
```

can generate

```wat
(import "env" "console_log" (func $console_log ...))
(data (i32.const 1024) ("Hello world!\00"))
(func $main (result i32)
  (call $console_log
    (i32.const 1024)
  )
  (i32.const 0)
)
```

Then the JS glue would just be the code that provides the console_log import. The idea being, if people want minimal wasm, they're either doing something web-first, and/or they're experimenting with the wasm format, and C just happens to compile to that today. Supporting printf and the rest of libc could then be left to the normal, non-minimal output.
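For concreteness, a rough sketch of what that glue might look like; this is only an illustration of the idea, and the file name, the exported memory, and the exact import shape are assumptions, not anything Emscripten emits today:

```js
// Hypothetical glue for the strawman above: provide console_log, read the
// NUL-terminated string out of linear memory, and call main().
let memory; // set once the module is instantiated

const imports = {
  env: {
    console_log: (ptr) => {
      const bytes = new Uint8Array(memory.buffer, ptr);
      const str = new TextDecoder().decode(bytes.subarray(0, bytes.indexOf(0)));
      console.log(str);
    },
  },
};

WebAssembly.instantiateStreaming(fetch('hello.wasm'), imports).then(({ instance }) => {
  memory = instance.exports.memory;
  instance.exports.main();
});
```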
Yeah, that's very relevant here - people that want minimal output should avoid printf etc. We do already have stuff for them, like …
I wonder if we could save code size by making a C++ API that does string conversion.

```cpp
class Console {
 public:
  template <typename... Rest>
  void log(const char* str, Rest... rest) {
    add_to_inner_buffer(str);
    log(rest...);
  }
  template <typename... Rest>
  void log(int x, Rest... rest) {
    add_to_inner_buffer(itoa(x));
    log(rest...);
  }
  // Base case: flush the accumulated buffer in one console_log call.
  void log() {
    console_log(inner_buffer);
    clear_inner_buffer();
  }
} console;

// Later
console.log("Foo = ", foo, ", bar = ", bar);
```

Anyway all that is to say that I think …
If you call printf, that's going to take a lot of code. I wouldn't consider that overhead, though; it's the true cost of a powerful C function. Of course it would be good to make it easier to use cheaper logging functions.

I don't think it would be helpful long-term to have two modes. If a minimal JS mode is started, it should eventually become the default and then the only mode.

Changes I think could help: …
I did some experiments on this. The result is pretty long, so I put it in a repo. Please see https://github.com/rongjiecomputer/minimal-emscripten. The demo is the GameBoy emulator by @binji.
@rongjiecomputer Yeah, I specifically made sure that binjgb didn't use much host functionality (GL, audio, etc.) so it was easier to port. I'm not sure many developers will want to (or even be able to) do that. But it's worth exploring that direction -- I really wish I could have a minimal bindings layer like the one that you generated!

Perhaps it would be good to explore which use cases you want to support, and have that point you toward the goals. I can think of a few: …

And I'm sure there are many more! Thoughts?
My experiment is basically option 2 mentioned by @kripken, that is, a new mode that starts from nothing and slowly adds more things when needed. A non-user-facing breaking change is the use of ES6 template strings instead of the hacky C preprocessor. This allows more complex operations in the template. The downside is that it requires a Node.js version with ES6 support.
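As a rough illustration of that idea (the settings names and generated snippets here are made up for the example, not the actual Emscripten runtime code), a template-literal-based generator could look like:

```js
// Hypothetical sketch: build a piece of the runtime JS from an ES6 template
// literal, driven by settings, instead of C-preprocessor-style #if blocks.
function generateRuntime(settings) {
  return `
var memory = new WebAssembly.Memory({ initial: ${settings.TOTAL_MEMORY / 65536} });
${settings.FILESYSTEM ? 'var FS = createFS();' : ''}
${settings.ASSERTIONS ? 'function assert(x, msg) { if (!x) throw new Error(msg); }' : ''}
`.trim();
}

console.log(generateRuntime({ TOTAL_MEMORY: 16 * 1024 * 1024, FILESYSTEM: false, ASSERTIONS: true }));
```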
This is me as well. In my opinion, this offers the best performance due to fewer wrapper layers and temporary JS objects.
I want to support this, but users are expected to write their own JS code to handle the filesystem as well. Most big C/C++ projects like …

Currently C++ exceptions are not supported, as I have not implemented …

Currently my plan is to let the user call …

I have a final exam to prepare, so I am going to submerge myself for a while. I might still have time to participate in the discussion, but will only have time to work on this some time in December.
A nice way to do simple prints from C is

```c
#include <emscripten.h>

int main()
{
  int foo = 42;
  EM_ASM(console.log('hello ' + $0), foo);
}
```

which is also size efficient (although it could be even smaller if …).
I think we could do both; but what I'm saying is that we should have an extremely strong bias toward doing the first item until we run out of items to optimize, because refactoring >>> rewriting (the usual Joel on Software and Coding Horror articles, etc.). Jumping in feet first to rewrite here would feel like a bad engineering call.

I don't like the idea of having two runtimes and scenarios where people end up asking "how do I do this in the new vs. old runtime?", "is this thing X still compatible with the new runtime?", or "will this thing X ever work in the new runtime?". There is nothing fundamentally incompatible or impossible about DCEing the current runtime one item at a time, except the engineering time needed to start looking for opportunities to optimize. Different directions are raising these kinds of "hands up" reactions of "what if we just started over with the runtime?", and those thoughts come mainly from not understanding why the undesired lines exist in the runtime, when they are needed, what problems they solve, and what the path would be to optimizing them away. Sure, nobody likes having to deal with overhead from other developers' problems; that is understandable. But as first-party developers of all the features that go into the runtime, we do have that knowledge, so we should be able to cut those items down one at a time.

Taking a peek at building the above code example and looking at the output, there is a lot of content there that gives us an opportunity to implement better DCE machinery. If we "started over" but with no better DCE machinery than we currently have, we would still lack the capability to DCE, and would eventually end up in a similar situation, where we don't have the means to DCE undesired code away and will then start to implement the exact DCE methods that should probably have been built in the first place. Having flags …

Then after we have slimmed down all that we have any chance to, we can look at what remains and figure out why those lines are so fundamentally difficult that they could not become manually exportable in some fashion.

I could be wrong in that the engineering effort to do the above might prove too much and we won't be able to pull it off. Still, I think that would be the path most likely to succeed while allowing all features to be used: compartmentalize the different features into boxes, then either have the compiler automatically choose which boxes are used, or, if that's not possible, do it manually, so that developers can pay exactly for the features they use.
Oh, on this specific note, I want to kill …
@juj I think your suggestion of modularizing the current runtime is a good one! I agree that starting over from scratch is probably the wrong way to go. But I'm also a bit concerned about the idea of adding more -s flags.
@rongjiecomputer - thanks for sharing that experiment! Very interesting and helpful to think about this. I think that experiment shows it is possible to offer a "minimal JS" runtime option. The details are tricky, though - I would suggest different stuff be enabled in that mode than in the experiment ;) - which perhaps proves one of @juj's points. @binji - what type of interfaces do you mean here? And what type of use cases do you have in mind for developers replacing parts of the runtime?
Basically just imports/exports to each layer. The compiler provides an initial layer, and each additional layer depends on a previous one. It should be analogous to a standard module system, though probably won't be exactly that. So something like cwrap depends on very little, the filesystem layer depends on more, the IndexedDB layer even more, etc. It seems like you have a lot of this behavior already implemented with the …
Well, I gave some examples of use cases above, but personally I don't need a lot of the features, so all I want is enough to get the C/C++ code off the ground (set up linear memory, run static initializers, etc.) and then let me call into the module. I typically won't use many C library or POSIX features (probably just printf) and will instead plumb through my own functions.
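To make the layering idea described above concrete, here is a hedged sketch using ES modules; the module names and APIs are invented for illustration and don't correspond to anything Emscripten currently ships:

```js
// core.mjs - the compiler-emitted bottom layer: instantiate the wasm and
// hand back its exports.
export async function instantiate(wasmBytes, imports = {}) {
  const { instance } = await WebAssembly.instantiate(wasmBytes, { env: imports });
  return instance.exports;
}

// strings.mjs - an optional layer that depends only on core's exports.
export function readCString(exports, ptr) {
  const bytes = new Uint8Array(exports.memory.buffer, ptr);
  return new TextDecoder().decode(bytes.subarray(0, bytes.indexOf(0)));
}

// A filesystem layer would sit on top of these, and an IndexedDB persistence
// layer on top of the filesystem - each layer only included if something
// actually imports it.
```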
@juj I also agree that starting over from scratch may be bad, because we will end up supporting two runtime versions, adding to the already heavy maintenance burden. Due to the lack of DCE and the fact that everything is exposed globally, whatever we do here will most likely break someone's code.
I share the same concern. New users won't know all these flags and will be completely puzzled about why their hello world is large even though they don't use …

Any thoughts about the possibility of using ES6 template strings as the template engine instead of the C preprocessor and …?
Some of the things I want to kill: …
I think it's not concerning at all to default to exporting fewer functions or features, since in 99% of cases the breakage will then manifest as some code being missing (…). If there was a new "completely from scratch" runtime, then we'd be scrambling to figure out how to make it compatible, and what use cases it's for. It would have a chance of splitting the developer community in two, like what happened with Python 2 and 3.

Your list of things to optimize is a good one, and I think that's already a great start on boxing up features into separate compartments that can be controlled specifically. I still think that the C preprocessor with …

Baby steps to success, in reviewable units. #5826 to start off with.
I think "in addition to", rather than "instead", except unless this can be proven to be superior in practice? We will migrate to latest Node.js LTS in emsdk in next tagged release, so this will be available at least out of the box. |
With more …

And could emcc.py be made more modular? It's huge and very hard to approach as someone new to the codebase. (I know this would be quite the undertaking.) I also don't think it should directly contain JS code gen.
@binji I see, thanks. That sounds good in general, but I admit I don't have a clear idea of how the layers would look yet. I think, though, that such a design could be done separately from the better-JS-size issue? Maybe it would even become easier to do after we do some of the JS size shrinking, as that will entail refactoring and modularization.

@curiousdannii Both your suggestions are very good; we should do those. For emcc.py modularization, we can split code out into smaller components in …
This is a key point, good to bring it up. If we focus on the incremental modularization approach (instead of a new "minimal JS" mode), then I think we need to agree that …

Do those principles make sense?
@kripken Yes, I think layering can help make DCE simpler, but it seems like there is low-hanging fruit to remove first.
Awesome! I'm very excited to see this. Backward compatibility is great, but I think we as developers understand that sometimes things must break to move the project forward. As long as I can pin my emscripten version and only upgrade as desired, and there are clear errors when things break, then I am OK.
This is a good idea, but I don't think hello world is a good example. It's true that everyone tries this out and having it be slim is good PR, but it isn't very realistic for actual users. Emscripten is pretty widely used; perhaps you can just take some concrete projects as examples and measure reduction for them instead?
Fair enough, yeah. We can figure out what to measure on (and the size target to aim for on it) later, assuming we agree to go down this route. And yeah, maybe a real-world project could be good (box2D or ammo maybe?).
I appreciate both the argument for why Option 2 can achieve the better end result and why we should avoid splitting the ecosystem and duplicating work. It seems like both are achievable, though: build the Option 2 ideal in terms of general primitives while refactoring Emscripten to be implemented in terms of those same primitives. I've seen this strategy work in practice a number of times. Really, this is the path we've started on already with the upstream llvm and lld wasm integration projects; it seems like we just need to continue this refactoring "up the stack". IMHO, the ideal end state here is that Emscripten is like a Linux distro: pulling together a collection of tools and packages, providing an easy install with good defaults and regularity, filling in the gaps with custom bits, and hosting a community. Does that make sense to everyone else?

One refinement I'd propose making to Option 2, though: in addition to defining Option 2 in terms of "minimal output size", could we also say that Option 2 specifically generates ES Modules which are designed to be used as part of a bigger app that uses lots of ES Modules? This change in output form can help explain why the Option 2 environment doesn't attempt to provide a full POSIX environment and why certain things must be passed as explicit parameters instead of using the global scope. I think being able to explain Emscripten output in terms of familiar ES Module concepts will help adoption almost as much as getting the "hello world" output size down.
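A hedged sketch of what ES-Module-shaped output might look like; the init-function shape and all names here are assumptions for illustration, not a proposed final design:

```js
// module.mjs - hypothetical compiler output as an ES module: no globals,
// imports passed in explicitly, exports returned to the caller.
export default async function init(imports = {}) {
  const url = new URL('module.wasm', import.meta.url);
  const { instance } = await WebAssembly.instantiateStreaming(fetch(url), { env: imports });
  return instance.exports;
}

// A consuming app would then do something like:
//   import init from './module.mjs';
//   const exports = await init({ console_log: ptr => console.log(ptr) });
//   exports.main();
```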
I don't think I understand how that would work, and I'm not sure what the suggested primitives here would be. I am also unclear on why refactoring to use general primitives would help decrease code size, which is the issue here - for example, using lld (one of your examples) will likely increase our code size. (But it's worth doing for other reasons of course!)
After some offline discussion with @lukewagner, I opened #5828 for discussion of the modularization approach. For this issue, I think we should make sure that what we decide makes sense with the long-term goal of having ES6-module-based output, as mentioned there: …
For now, I can accept the plan to just guard more components with flags. I like the idea of …

"In addition to" sounds good.
Rather than "how much code size it wins", I see it as "how does one deal with the breakage?". There are a lot of different types of breakage, and this is something we analyze constantly when communicating with external parties.

When there's an easy "you'll start to get a compilation/runtime error about X, so then do Y" model to follow, it's not much of an issue to disrupt users, since the path to action is clear. Even if you don't save too many characters with such a change, it's easy to justify if it simplifies things, since the action to resolve it is easy.

A more difficult one is a bidirectional breakage: old code will not compile/work on a newer compiler version, and the fixed code will not compile/work on an older compiler version. We had this with the Wasm debug table formats change. This is much more annoying than the above, because one can't write any "one ideal form of code" that is compatible with different versions. We want to avoid bidirectional breakages whenever possible, since they impact distribution update paths in ecosystems. If that's not possible, then we should be diligent about identifying where such bidirectional breakages lie, so that they will be easy to discover. Removing the …

Then there are the changes where we know something will break, but do not want to think about how it will manifest, which existing features are (in)compatible with the changes, or how to migrate or debug. These are in the red-flag zone - users will get angry if they need to research new breakages after someone else's PR lands and discover the breakage was not an accident but intended.

Shrinking code size and ES6 modularization are very orthogonal; we can certainly add ES6 module structure without putting any effort into shrinking code at all. Both of these features can be worked on in parallel as well.
@juj - good points, agreed. I'd add that I think many of the changes here could either have compile-time error messages, or if not, then at least run-time errors, something like this:
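As an illustrative sketch (the method name, message, and remediation wording are hypothetical), such a run-time error could be a stub along these lines:

```js
// Hypothetical stub emitted in ASSERTIONS builds for a runtime method that
// was not included in this build.
Module['ccall'] = function() {
  throw new Error("'ccall' was not included in this build - " +
                  "add it to the exported runtime methods to use it");
};
```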
I think doing that (+ compile-time errors when possible, etc.) would reasonably mitigate the breaking changes we are proposing here.
Opened #5836 with some data on the JS we emit for one testcase, and a list of tasks for it.
Interesting thread! I share the concern about adding more -s flags. Main reasons: …

I think the ability to build without a build system is especially important for beginners (in the sense of "people new to WASM/asm.js"), or when building WASM modules for an app that's primarily written in JS. So from my point of view, a way to move some of these configuration options into the source code would be highly appreciated (I have no good answer for how to achieve this, but EMSCRIPTEN_KEEPALIVE or the custom #pragmas point in the right direction).
This makes us not exit the runtime by default. That means we don't emit code for atexits and other things that happen when the runtime shuts down, like flushing the stdio streams. This is beneficial for 2 reasons:

* For #5794, this helps remove code. It avoids all the support for shutting down the runtime, emitting atexits, etc. It also enables more optimizations (the ctor evaller in wasm can do better without calls to atexit). This removes 3% of hello world's wasm size and 0.5% of its JS.
* A saner default for the web. A program on the web that does anything asynchronous will not want the runtime to exit when main() exits, so we set this flag to 1 for many tests, which this PR lets us remove.

However, this is a breaking change. As already mentioned, the possible breakages are:

* printf("hello") will not console.log since there is no newline. Only when the streams are flushed would that be printed out. So this change would make us not emit that.
* atexits do not run.

Both of those risks are mitigated in this PR: in ASSERTIONS mode, check if there is unflushed stream output, and explain what to do if so. Same if atexit is called.

This PR has a lot of test changes, some that simplify web tests - because the new default is better for the web - but others that add a param to a shell test - because the new default is less optimal in a shell environment. I think the risk here is lower than those shell tests indicate: we do test quite a lot of things in the shell, but just because it's convenient, not because that's what most users care about.

Also:

* this PR found an unnoticed bug: FORCE_FILESYSTEM didn't actually do what the name suggests. I think we just never tested it properly with NO_EXIT_RUNTIME. Fixed in this PR.
* emrun sets NO_EXIT_RUNTIME=0. It is a mode where we specifically want to get the exit code from the running program, as if it were a shell command, not a browser app.
* add faq entry, and mention the faq
* fix an existing emterpreter-async bug: if we are unwinding the stack as we leave main(), then do not call exit, we are not exiting yet - code is yet to run later
* metadce is now more effective, update test
* faq entry on Module.* is not a function
* fix browser.test_emscripten_main_loop - the pthreads part needs the runtime to exit
This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 7 days. Feel free to re-open at any time if this issue is still relevant.
We've historically focused on full support of existing C/C++ code by default, which leads to us emitting a bunch of things that increase the default JS code size, like filesystem support (including things like /dev/random). As a result the JS emitted for "hello world" is not tiny.
In a medium or large project the compiled code is the large bulk anyhow and we have very good code size optimization there. But people do notice the JS size on small programs, and with wasm increasing the interest in compiling to the web, this has been coming up.
I've been thinking that dynamic linking might get us there, as we emit no JS for a standalone wasm dynamic library (SIDE_MODULE). However, dynamic libraries add relocation (unnecessary for many things) and we don't necessarily want 0 JS, we want "minimal" JS. So dynamic libraries are not the solution here.
Two other possible paths we could go down:

1. Add more feature-specific flags, so that we only emit support for a feature when it is explicitly requested (e.g. if your program uses longjmp, you need to compile with -s LONGJMP_SUPPORT=1).
2. Add a new mode, MINIMAL_JS perhaps, which we would design from scratch to be minimal - we'd start from nothing and add just the things we want in that mode. It wouldn't support things like a filesystem or POSIX or atexit etc. (and we'd need to decide on exceptions and longjmp, asm.js or just wasm, etc.). We'd point people to the "normal" non-minimal JS for those things.

Thoughts?