Error: "exports count of 143958 exceeds internal limit of 10000" #22863
Comments
When I get rid of the --export-all flags, I end up with this file (https://github.com/anutosh491/xeus-cpp-lite-debug/blob/main/tmptqjqh2js.json), but then I lose symbols that are all present in the earlier file, and I end up with these errors when I host it:
|
I guess I could try adding that, but I was interested in all of them because I am not sure which symbols I need and which ones I can skip! |
Yes, you almost certainly don't want to be passing … In terms of exporting symbols, you should only export symbols that you intend to use directly from JS, i.e. that you want to call from the embedder. What are you trying to build here? Are you just trying to run clang? If so, then you don't need any exports other than the default. |
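A minimal sketch of what "only export what JS calls" can look like, assuming one hypothetical entry point (trimmed_length) and one hypothetical helper (trimWhitespace), both invented for illustration: instead of -sEXPORT_ALL=1 / --export-all, only the function the embedder calls is marked EMSCRIPTEN_KEEPALIVE (listing it with -sEXPORTED_FUNCTIONS=_trimmed_length would also work), and everything else stays internal. The #else branch exists only so the sketch also compiles with a native compiler.

```cpp
#include <cassert>
#include <string>

#ifdef __EMSCRIPTEN__
#include <emscripten.h>  // defines EMSCRIPTEN_KEEPALIVE
#else
#define EMSCRIPTEN_KEEPALIVE  // no-op so the sketch also builds natively
#endif

// Internal helper: not exported, so the linker is free to inline or drop it.
static std::string trimWhitespace(const std::string &s) {
  const auto first = s.find_first_not_of(" \t\n");
  if (first == std::string::npos) return "";
  const auto last = s.find_last_not_of(" \t\n");
  return s.substr(first, last - first + 1);
}

// Hypothetical entry point that JS actually calls. extern "C" keeps the name
// unmangled; EMSCRIPTEN_KEEPALIVE exports it (JS sees it as _trimmed_length).
extern "C" EMSCRIPTEN_KEEPALIVE int trimmed_length(const char *code) {
  return static_cast<int>(trimWhitespace(code ? code : "").size());
}
```

Natively, trimmed_length("  int x = 1;  ") returns 10. The design point is that the export list stays as small as the JS-facing surface, rather than growing with every symbol in the linked libraries.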
The issues I have been opening lately (1 and 2, which I think you self-assigned, and this one) all work towards the same goal: getting xeus-cpp-lite (xeus-cpp + jupyter-lite) working. Xeus-cpp is a Jupyter kernel that lets us run C++ code in a Jupyter notebook. Jupyter-lite runs Jupyter kernels so that everything runs completely in the browser. There are 4 layers here, and all of them need to be compiled with emscripten:
i) LLVM has now been compiled cleanly, so I have access to … But I end up with this error: … The stack trace tells me that …
|
So the error is basically coming from here. Pasting the stack trace too for more info. I am trying to pin down where exactly the error is.
So I am not sure what might be going wrong, hence I was just passing ( … Also, I no longer see any undefined symbols at build time, but I do see the above undefined symbol at runtime. I am curious whether this has something to do with name mangling. I can see that almost everything coming out of clangInterpreter looks like … |
No, I am not running wasm-ld directly; I am running em++ itself. My final link.txt looks like this when I use both export flags: … That being said, I am seeing this:
But when I try option 2) I end up getting the above error |
That being said, I think it would be appropriate to close this issue. After exporting the required symbol with something like … |
Although I have closed the issue, I am curious to know what might be causing the error. For example, I am now running this in a code cell:
and I get this error:
This doesn't look like a missing-symbol issue. |
The code in https://github.com/llvm/llvm-project/blob/9f796159f28775b3f93d77e173c1fd3413c2e60e/clang/lib/Interpreter/Wasm.cpp#L74-L86 doesn't link in any standard libraries, so it's not surprising that things like printf are not defined. Do you want that … |
Hey @sbc100, thanks for the reply.
Ahh yes, that's something I totally missed.
Yes, this is where it all starts. Wasm.cpp is basically a code-level implementation of the approach stated above. So … |
Given the approach above, what matters is how we do the linking. Your reply tells me that we might need more flags, maybe some flags for the standard libraries, as follows:
along with what we already have. Does that look like the way to go here? |
If you want to make the code in https://github.com/llvm/llvm-project/blob/9f796159f28775b3f93d77e173c1fd3413c2e60e/clang/lib/Interpreter/Wasm.cpp#L74-L86 somehow compatible with the emscripten linker, I think that would represent a lot of work. You would basically have to duplicate all/most of the logic of the emscripten linker in that … BTW, if you were going to try to make this work, I would suggest getting it all working first with the normal native/desktop version of clang (i.e. solve the problem of making … |
To give you an idea of the complexity, here is just a small subset of the logic in the emscripten linker for generating the link command: Lines 133 to 293 in 4b3bace
|
A few comments here:
That being said, would this approach make sense?
|
I have been discussing this with @vgvassilev and @argentite, who are the authors of Wasm.cpp. As far as I understand, they got a very similar demo working without necessarily passing those flags to the linker (which is why I think the LinkerArgs in Wasm.cpp don't include flags such as the standard libraries as of now). The demo can be seen here (https://wasmdemo.argentite.me/) |
Hey @sbc100, I would like to try something out. The following is my understanding; maybe you could let me know if you think I am wrong somewhere.
When building with Emscripten we link against -lc or -lc-debug by default, so xcpp.wasm (our main module) has access to printf (as can be seen in this JSON file that gets produced when building with EMCC_DEBUG=1). This tells me that if we add --allow-undefined (equivalent to --unresolved-symbols=ignore-all + --import-undefined) to our linker flags …
I can see here (https://github.com/llvm/llvm-project/blob/0c5bf565ba7059ca8542c522fcab019f2e2c82bb/clang/lib/Interpreter/Wasm.cpp#L91C1-L92C62) that we use this:
And the docs say:
So my understanding is that maybe we don't need to resolve symbols at link time and can instead resolve them when we dynamically load the libraries using dlopen? The change involved is just the following (which would be made here https://github.com/llvm/llvm-project/blob/9f796159f28775b3f93d77e173c1fd3413c2e60e/clang/lib/Interpreter/Wasm.cpp#L74-L86):
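A minimal sketch of the kind of change being discussed (not the patch from this comment), assuming the wasm-ld arguments are collected in a std::vector<const char *> as in Wasm.cpp; the helper name and file names are hypothetical, and the exact upstream flag list may differ. The relevant entry is --allow-undefined, which turns unresolved symbols into imports instead of link errors, leaving them for the dynamic linker to resolve at dlopen time.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper in the shape of the wasm-ld argument vector in Wasm.cpp.
// It stores raw pointers, so both strings must outlive the returned vector.
std::vector<const char *> makeSideModuleLinkArgs(const std::string &objectFile,
                                                 const std::string &outputFile) {
  assert(!objectFile.empty() && !outputFile.empty());
  return {"wasm-ld",
          "-shared",             // emit a side module (shared library)
          "--import-memory",     // import linear memory instead of defining it
          "--experimental-pic",  // wasm-ld requires this for -shared output
          "--stack-first",
          // Same as --import-undefined + --unresolved-symbols=ignore-all: symbols
          // such as printf become imports that are resolved at dlopen time.
          "--allow-undefined",
          objectFile.c_str(),
          "-o",
          outputFile.c_str()};
}
```

Usage would look like std::string obj = "cell.o", out = "cell.wasm"; auto args = makeSideModuleLinkArgs(obj, out);. Passing string literals directly would leave dangling pointers, because the temporary std::string objects die at the end of the call.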
|
Yes, both of those assertions are correct. |
Yes, you can either have the static linker (wasm-ld) resolve the symbols, or you can have the dynamic linker (the code in library_dylink.js) resolve the symbols when …
If you want to load the resulting module with |
Thanks for the quick reply. The approach I pasted above also talks about generating a "shared library", and, as can be seen, it also mentions resolving the symbols at the dynamic loading step. So what I'll do is try building LLVM against emscripten with this patch:
I guess the above confirms that we are looking for a shared library (one that avoids symbol resolution at link time) that can be dynamically loaded using dlopen. Do you see any other necessary changes? |
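On the loading side, a minimal sketch using the POSIX dlopen/dlsym API, which emscripten's dynamic linker emulates for side modules; the helper names are hypothetical. RTLD_NOW asks for all of a module's imports (e.g. printf from the main module) to be resolved at load time, and RTLD_GLOBAL makes the module's own definitions available when resolving modules loaded later, which is what would let a later cell use something an earlier cell defined.

```cpp
#include <cassert>
#include <cstdio>
#include <dlfcn.h>

// Hypothetical helper: load one compiled REPL cell.
// RTLD_NOW:    resolve the module's imports immediately, so a missing symbol
//              is reported here rather than on first call.
// RTLD_GLOBAL: make this module's definitions visible to modules loaded later.
void *loadCellModule(const char *path) {
  void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
  if (!handle)
    std::fprintf(stderr, "dlopen(%s) failed: %s\n",
                 path ? path : "<main program>", dlerror());
  return handle;
}

// Hypothetical helper: find a symbol in a loaded module, or nullptr if absent.
void *lookupSymbol(void *handle, const char *name) {
  assert(handle != nullptr);
  dlerror();  // clear any stale error state before calling dlsym
  return dlsym(handle, name);
}
```

As a quick native check, loadCellModule(nullptr) returns a handle for the main program, and lookupSymbol(handle, "printf") should then find printf in the C library. On older glibc versions, linking may need -ldl.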
Building shared libraries correctly will likely involve mimicking the current logic in emscripten for doing so: Lines 133 to 249 in 5c81135
As you can see, there is a fair amount of complexity there, so trying to reimplement that in Wasm.cpp will likely involve a fair amount of work and back and forth. I'm not saying it's impossible, but be aware that you could be signing up for a fair bit more work/debugging here. |
Hmm, well, this is something I really need to confirm, because the above approach (or the people who implemented it) doesn't necessarily mention this, and they were able to get a REPL running in the browser. You can try it out here too (https://wasmdemo.argentite.me/). Each iteration or code input in the REPL ends up as a shared/side module that needs to be loaded on top of the main module. These modules need access to symbols (I am guessing at least those available in the main module or from the standard libraries), and the side modules should keep propagating their symbols to the other side modules that keep being generated as we provide code through the REPL. That being said, LLVM (Wasm.cpp) currently uses these linker flags (https://github.com/llvm/llvm-project/blob/9f796159f28775b3f93d77e173c1fd3413c2e60e/clang/lib/Interpreter/Wasm.cpp#L74-L86). I can say for sure that the approach above resolves undefined symbols at the dlopen step, so surely we need to use … |
Also, in my comment above (#22863 (comment)), I wrote about an approach:
Maybe you could confirm whether this is a valid approach. I say this because, as you mention, the script in … |
Going through the … What we know as of now:
Regarding the flags that should end up here (https://github.com/llvm/llvm-project/blob/9f796159f28775b3f93d77e173c1fd3413c2e60e/clang/lib/Interpreter/Wasm.cpp#L74-L86):
Okay, this is confusing. According to the approach, each code block in the REPL maps to a side module. So cell 1 defines and implements sqrt … So I guess we need to propagate symbols through the shared libraries being built too?
NOTE
--no-entry is only added when the output is not a side module, but in our case it is a side module, so I am not sure whether it is needed in Wasm.cpp at all! |
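To make the main-module/side-module distinction concrete, a hypothetical and much simplified sketch of choosing wasm-ld flags per module kind. It reflects only the points raised in this thread (side modules get -shared; --no-entry is only added for the non-side module) and is not a copy of emscripten's real link logic. The assumption that wasm-ld does not look for an entry point when linking with -shared, which would make --no-entry redundant for each cell's side module, is worth verifying against the wasm-ld version in use.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical and simplified: picks wasm-ld flags for the two kinds of
// modules discussed in this thread. Not emscripten's actual link logic.
std::vector<std::string> wasmLdFlags(bool sideModule) {
  std::vector<std::string> flags = {"--experimental-pic"};
  if (sideModule) {
    // A side module is a shared library that is loaded later via dlopen.
    flags.push_back("-shared");
  } else {
    // The main module is position-independent too, but it is not a library;
    // --no-entry tells wasm-ld not to require a _start entry point.
    flags.push_back("-pie");
    flags.push_back("--no-entry");
  }
  return flags;
}

// Convenience check for inspecting a flag list.
bool hasFlag(const std::vector<std::string> &flags, const std::string &flag) {
  return std::find(flags.begin(), flags.end(), flag) != flags.end();
}
```

With this split, wasmLdFlags(true) contains -shared but not --no-entry, which matches the observation above that emscripten only adds --no-entry for non-side modules.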
Hmm, so I am basically trying this out (@sbc100)
Maybe you could confirm the last 2 points I raised above (whether export-all and no-entry are required). |
I am also kinda confused as to whether
I don't see any -pie or -shared flag here, so I am guessing it is omitted somehow, which makes me unsure whether we even need it there. |
As an update, I could at least get something running when I used the above patch (#22863 (comment)). Obviously a lot of stuff doesn't work just yet. So my doubts about which flags are necessary still stand (#22863 (comment)), especially the ones with … Also, I can't thank you enough for the help. This at least gets me started! |
Gentle ping @sbc100. It would be really helpful to know your thoughts on the approach/ideas I have written above! |
I would download the compiled module ( |
Ahh yes, this is one doubt I have, but more importantly I need your views on some fundamental questions. Once we have more info here, we can proceed to reviewing the wasm binaries:
Once I have info on these things, we can look into other stuff. Also, I am curious whether you had a chance to look at llvm/llvm-project#114651. |
I have been experimenting and have quite a lot to discuss regarding the wasm binaries being generated (the first couple might be fine, after which the subsequent …). I think I will be in a much better position to discuss this once I get answers to the above questions!
I don't get an incr_module_1.wasm, but I do get an incr_module_2.wasm. I shall share more soon! |
Cc @sbc100 |
Another error I am curious about: even though I can run the REPL, I see this: … Now xcpp.js tells me the following:
I have always been using FORCE_FILESYSTEM (so I think I shouldn't be seeing this error, right?). My link.txt shows this:
Not sure if I am missing something |
If you are building with … Having symbol names would allow you to see where the call to … |
Hey Sam, thanks a lot for the reply; I shall look into it. I am also looking forward to your reply on a couple of my comments above, especially (#22863 (comment)). |
Hey @sbc100, we got the above linker-flag changes merged (llvm/llvm-project#116735). That being said, now that I can get stuff to run, I have a new doubt (if you could also answer #22863 (comment), that would be great). If you look at the output, you can see that the output of cell 1 is duplicated in cell 2. The wasm modules being generated show straightaway why we have the duplication:
So there are two things going on here: … The error comes out of this function: … What I can see is that the 2nd module loads the 1st one again, or maybe the linker combines the two modules. I am not sure of a fix here, because I don't know whether some flag is causing this behaviour, but I think the error comes out of either
If it is the link step that is at fault here, I think what needs to be done is that after every dlopen call we remove the LLVM IR (or rather any sort of static initialization), so that we start clean before the next link occurs? |
To solve the above, I thought we could make … So I framed this:
and placed a call to it after the dlopen step (https://github.com/llvm/llvm-project/blob/32da1fd8c7d45d5209c6c781910c51940779ec52/clang/lib/Interpreter/Wasm.cpp#L91C1-L97C4).
This results in the following: … It says:
I am not sure whether my thinking is correct here, but I thought: once we've made use of a module, what if we just get rid of this … |
Is the notebook-style execution supposed to be a program that you are incrementally building? Each time you add new code, are you building a completely new program or building on top of the old one? If you are building on top of the old one, then it makes sense that the new program (which includes both the old and the new code) would execute the static constructors from both the old and the new program (i.e. it represents the sum of all snippets). How does this notebook work when you execute on non-wasm platforms? Or doesn't it? |
Hey @sbc100, thanks for your reply. Giving you some more context (I think I gave some in the past, but reiterating):
As the approach I pasted above says:
Although we fixed most of the linker args for the linking step, and the loading step through dlopen also looks correct, the problem I face (also pointed out in the above comment, #22863 (comment)) is that content from the previous modules is duplicated into the latest module. So if we have two REPL inputs/code blocks:
The desired output for the shared wasm binary generated from block 2 is to only have … My thought here was: what if we remove the static initializers from the module once it has been loaded using dlopen? Hence I came up with
So that I could call |
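To make the duplication concrete, here is a toy model in plain C++ (no LLVM or emscripten APIs; all names are hypothetical) of what happens when the side module for cell N is built from the code of cells 1..N: loading it runs the top-level code (static constructors) of every earlier cell again, which matches the repeated cell 1 output described above.

```cpp
#include <cassert>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Toy model only: a "static constructor" is the top-level code of one REPL
// cell, here just printing that cell's output.
using Ctor = std::function<void(std::ostream &)>;

// Loading a module runs every static constructor it contains, like the global
// initializers of a module being run when it is dlopen'ed.
void loadModule(const std::vector<Ctor> &ctors, std::ostream &out) {
  for (const Ctor &ctor : ctors) ctor(out);
}

// Accumulating strategy: the module built for cell N is compiled from cells
// 1..N, so it carries the top-level code of every earlier cell as well.
std::string runAccumulating(const std::vector<std::string> &cellOutputs) {
  std::ostringstream out;
  std::vector<Ctor> ctorsSoFar;
  for (const std::string &text : cellOutputs) {
    ctorsSoFar.push_back([text](std::ostream &o) { o << text << '\n'; });
    loadModule(ctorsSoFar, out);  // module N re-runs cells 1..N-1 too
  }
  return out.str();
}
```

runAccumulating({"cell 1", "cell 2"}) produces "cell 1\ncell 1\ncell 2\n": cell 1's output appears twice because module 2 re-runs it.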
Yeah exactly, we build/load on top of the old module. The point is that we want to use global variables, function declarations, and other things from the symbol table or memory, but we don't want to re-execute the function bodies defined in the previous cell blocks! Is there a way to achieve this (through a linker flag or custom logic like what I did)? I thought my logic would work as expected, but it didn't! |
I pasted the code above for the module generated from the second code block.
Can we get to this:
Basically using stuff like |
If you are putting the top-level code from each cell into a static constructor, then how do you avoid this issue on native platforms? i.e. how do you avoid running the static constructors from each cell each time you run the program? I guess my point is that Wasm should be no different from native platforms in how it runs or doesn't run static constructors in the LLVM IR. I doubt you want to be trying to manually prune static constructors from the IR, unless that is the solution you are also using on native platforms? |
The fact that both of these constructors are included in the IR means you must be including both blocks of code, right? Are you compiling each block to its own object file, or just creating one big object file? |
Hmm, well, for native/non-wasm cases we don't need to do anything. We use clang-repl in the background: for native cases I think clang-repl uses the LLVM JIT, but for the wasm case it uses the above approach (the linker plus dynamic loading through dlopen). So everything boils down to the addModule function:
So there are 2 different incremental executors responsible for adding modules. In the first case the LLVM JIT takes care of everything (through addModule and runCtors, I suppose), but I don't think the same is being done by the linker approach in the wasm case. The important point here is that xeus-cpp only provides the frontend (a notebook and a kernel to run stuff). All the heavy lifting is done by clang-repl, so we don't do anything special on our side for any particular build platform. |
Yes, both blocks of code are being included. We want one main module, and each code block should give us a side module that we keep loading on top of the main one as we run code in the REPL. So each code block maps to a separate wasm shared binary! Now if you look at the linker and also the flags passed to dlopen,
you can see we use
This is exactly what I'd like your views on ;)
So can we do something (maybe through a linker arg passed to wasm-ld, by changing the way lld::wasm::link works, or through the method I suggested to get rid of the global initializers after the dlopen) to avoid the duplication?! |
It doesn't sound to me like the approach you are taking will work, since you will end up with side module N including all the code from the N-1 previous side modules. This includes global data and static constructors. You could try to hide the top-level code of the previous N-1 modules when compiling module N, but it does at least need to see all the declarations and types from the N-1 previous modules. For example, imagine the first module declares some global data and mutates it in the static constructor (top-level code).
If the next module duplicates this code, it would get its own copy of the data, and you would want to rerun the constructors to ensure the value of …
The approach sounds really tricky though. Perhaps it might be worth attempting to use the same IncrementalExecutor.cpp method using LLVM JIT? That way the solution will be closer to one that you know already works. |
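For contrast, a toy model (plain C++, no real LLVM or emscripten APIs; all names hypothetical) of the property this thread is after, and which the JIT path is described as providing: each cell's module defines only its own globals and runs only its own top-level code, once, while later cells reach earlier globals through one shared symbol table (standing in for RTLD_GLOBAL) instead of getting their own copies.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for the process-wide symbol table shared by all loaded modules.
using SymbolTable = std::map<std::string, int>;

// One REPL cell: the global it defines (empty if none) and its top-level code.
struct Cell {
  std::string defines;
  std::function<void(SymbolTable &)> topLevel;
};

// Incremental strategy: module N contains only cell N. Its global is created
// once, and only its own top-level code runs; earlier globals are reused
// through the shared table rather than duplicated.
void loadIncrementally(const std::vector<Cell> &cells, SymbolTable &globals) {
  for (const Cell &cell : cells) {
    if (!cell.defines.empty()) globals.emplace(cell.defines, 0);
    cell.topLevel(globals);
  }
}
```

With cell 1 defining counter and running counter += 1, and cell 2 running counter += 10, the table ends with a single counter equal to 11, and each cell's code ran exactly once. In the accumulating scheme sketched in the comment above, module 2 would instead define its own counter and rerun cell 1's code on it, leaving two copies of the data.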
Thanks a lot for the discussion here. I finally got clang-repl running completely in the browser. Sharing a screen recording (Screen.Recording.2024-12-04.at.11.03.06.AM.mov); not sure how clear it turns out to be (I think making it full screen makes it much better). |
Hi,
I am facing this issue
I don't want to miss out on any function/symbol (hence providing EXPORT_ALL=1 to em++ and --export-all to wasm-ld).
When building with EMCC_DEBUG=1 emmake make -j16 install VERBOSE=1, it generates a file (/var/folders/m1/cdn74f917994jd99d_2cpf440000gn/T/emscripten_temp/tmp6qr4um78.json) which gives me more info on the exported functions. I see a lot of them, hence I wanted to confirm whether this is due to the huge list of functions I am interested in, and whether there is a way to get past this error.
I've pasted the contents of the file here for anyone interested (https://raw.githubusercontent.com/anutosh491/xeus-cpp-lite-debug/refs/heads/main/tmp6qr4um78.json)