Skip to content

Call populate_args only if we actually need command-line arguments #112

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 25, 2019

Conversation

sunfishcode
Copy link
Member

This avoids linking in the argv/argc initialization code,
and the __wasi_args_sizes_get and __wasi_args_get imports, in
programs that don't use command-line arguments. The way this works is,
if the user writes int main(int argc, char *argv[]), the argument
initialization code is loaded, and if they write int main(void),
it's not loaded.

This promotes the __original_main mechanism into an effective contract
between the compiler and libc, which wasn't its original purpose,
however it seems to fit this purpose quite well.

@sbc100
Copy link
Member

sbc100 commented Oct 17, 2019

I'm not huge fan of depending on the compiler workaround in this way. Especially since using a wrapper function like this is not the only way to implement to workaround in the compiler (I would have preferred to simply re-write main's signature in place).

I'm also not convinced that using the presence of the args in the main signature really tells us whether the args are used by the program, but I guess this will catch at least some such users.

But if this helps deal with some of the code size issues raised in WebAssembly/WASI#109 then.. maybe?

@sunfishcode
Copy link
Member Author

This isn't related to the discussion in WebAssembly/WASI#109; it's just a general code-size reduction that some actual users are asking me for. WASI's argv/argc APIs could doubtlessly be improved, but there will always be some initialization code needed, and this patch makes it easy to skip the initialization if we don't need it.

Rewriting main's signature in place isn't easy, because C programs can call their own main, and there's no way to detect this reliably from within main's translation unit. Unless we do some serious magic, it's likely we'll always have wrappers. And, it happens that the wrapper does exactly what we want here.

I'm also not convinced that using the presence of the args in the main signature really tells us whether the args are used by the program, but I guess this will catch at least some such users.

argv is the only way to get the arguments in C/C++, so if it's not present, arguments aren't being used. This will break three-arg main users, however neither POSIX nor ISO requires any implementation to support that, and such users will get a signature-mismatch warning from the linker, and it's easy to fix in a portable way.

@sbc100
Copy link
Member

sbc100 commented Oct 17, 2019

What I meant was that there will programs that declare a two argument main but never reference the arguments. So this method is conservative.

@sbc100
Copy link
Member

sbc100 commented Oct 17, 2019

We are actively looking at moving the __original_main hacks into the linker which could also change the naming of the wrapper function: https://reviews.llvm.org/D68751

@sunfishcode
Copy link
Member Author

What I meant was that there will programs that declare a two argument main but never reference the arguments. So this method is conservative.

True. A natural place to implement that last piece of the optimization would be in LLVM, around where WebAssemblyFixFunctionBitcasts.cpp is already detecting the "main" function. That leads me to:

We are actively looking at moving the __original_main hacks into the linker which could also change the naming of the wrapper function: https://reviews.llvm.org/D68751

That won't break __original_main, which LLVM emits, unless you also remove that code from LLVM. However I'm proposing you don't remove it, because it's also a natural place to implement the unused-argument version of the optimization too.

@sunfishcode
Copy link
Member Author

I split out the first patch from this PR in #118 for easier reviewing.

@sunfishcode
Copy link
Member Author

#118 is now landed, and this PR is now rebased on top of it, so it's a simpler PR now!

This avoids linking in the argv/argc initialization code,
and the __wasi_args_sizes_get and __wasi_args_get imports, in
programs that don't use command-line arguments. The way this works is,
if the user writes `int main(int argc, char *argv[])`, the argument
initialization code is loaded, and if they write `int main(void)`,
it's not loaded.

This promotes the `__original_main` mechanism into an effective contract
between the compiler and libc, which wasn't its original purpose,
however it seems to fit this purpose quite well.
@sunfishcode
Copy link
Member Author

And the CI now passes :-).

@@ -0,0 +1,53 @@
#include <wasi/core.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This filename now seems misleading. How about __wasilibc_original_main or at least something with main in it?

Copy link
Member Author

@sunfishcode sunfishcode Oct 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. How about __original_main.c, since that's the main symbol it defines?

Edit: no pun intended, oof.

int r = main(argc, argv);
// Call `__original_main` which will either be a compiler-synthesized
// function which calls `main` with no arguemnts, or a libc routine
// which populates `argv` and `argc` and calls `main` with them.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment accurate?

Is this more like.. ".. will either be the application's zero-argument main function (renamed by the compiler) or a libc routine which populates argv and argc and calls the application's two-argument main."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I guess I was trying to keep it abstract here, but it isn't essential. I've now updated the comment to your suggested wording.

@sunfishcode sunfishcode merged commit afbf94c into master Oct 25, 2019
@sunfishcode sunfishcode deleted the lazy-argv branch October 25, 2019 00:30
@sunfishcode
Copy link
Member Author

Unexpected failures:
asan__deep-stack-uaf-1.C.o.wasm
asan__symbolize-callback-1.C.o.wasm

Both of those tests deliberately execute a use-after-free, intending to test asan functionality, but if I understand how the waterfall works, it's not testing asan here, so those aren't meaningful regressions.

@aheejin
Copy link
Member

aheejin commented Oct 25, 2019

Ah you're right. We shouldn't run or care about the results of those test cases for sanitizers after all actually...

aheejin added a commit to WebAssembly/waterfall that referenced this pull request Oct 26, 2019
We don't run sanitizers on them, so all sanitizer tests (ubsan, asan,
tsan, ...) may start to fail at any point, and we shouldn't care about
the results. For now I update the expectation to match the current
behavior, but if this happens frequently enough, we may need to fix the
runner script so that we can skip those tests or something.

The two asan tests started to fail after WebAssembly/wasi-libc#112, just for
reference.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants