
Status & Roadblocks for a portable Emscripten #11175


Closed
2 of 5 tasks
jeff-hykin opened this issue May 16, 2020 · 26 comments

Comments

@jeff-hykin

jeff-hykin commented May 16, 2020

What this issue is about

I hope for this to be an ongoing collaboration/discussion for creating some form of Emscripten with fully bundled dependencies. Portable meaning a flash drive inserted into a fresh install of any major OS, with no internet connection, could compile C++ to wasm, and wouldn't break with changes to global versions of system packages. Possibly some other people, along with myself, would like to work on this, but we could use help getting started. (A related discussion was here: #9313, however I wanted this issue to include portability outside of a browser context)

In theory, the basic job of Emscripten is a purely functional operation: input in one language, output in another. So (very much in theory) it is possible for that operation to be done in an OS-independent portable way, and even possibly in a browser using WASM. And to be clear, "portable" doesn't mean practical or primary; this isn't about removing dependencies from emscripten-master. A 1-hour compile time requiring 32 GB of RAM for a hello world.cpp using an out-of-date emscripten would still count as an initial success despite being far from ideal.

Work Overview

The main dependencies as I understand them are python, LLVM with clang & wasm-ld, node.js, binaryen, and the emscripten code itself.

Portability

  • emscripten code itself (trivial: can be bundled in a zip)
  • node.js runtime can be bundled into an electron app
  • Python; there are many not-quite-100% ports to node.js/browser such as Pyodide, MicroPython, and Skulpt
  • LLVM has an alpha-implementation of a WASM version here
  • binaryen I'm assuming can be treated similar to emscripten, although I do not know for certain

Questions

Did I misunderstand any of the dependencies: Is cmake optional for Mac/Linux even though it's in the OS pre-reqs? Are LLVM and binaryen downloaded to the local emsdk folder at installation time?

What obvious & non-obvious challenges are there for having emscripten use portable versions of Python and LLVM?

Current rough plan

Create a python command that is actually using node.js and pyodide in order to be portable. Use decorators to catch python system calls and manually run them with the correct ENV variables and portable versions of the executables. Perform something similar for LLVM.
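A minimal sketch of the decorator idea, assuming list-form commands and a made-up bundle layout. `PORTABLE_ROOT`, `PORTABLE_TOOLS`, and `record_call` are hypothetical names used only for illustration; nothing here is taken from emscripten's code.

```python
import functools
import os
import subprocess

# Hypothetical layout: every bundled tool lives under one portable root.
PORTABLE_ROOT = os.path.abspath("portable")
# Hypothetical mapping from a requested command name to its bundled copy.
PORTABLE_TOOLS = {
    "node": os.path.join(PORTABLE_ROOT, "node", "bin", "node"),
    "clang": os.path.join(PORTABLE_ROOT, "llvm", "bin", "clang"),
}


def use_portable_tools(run):
    """Decorator: rewrite argv[0] and PATH before the wrapped runner is called."""
    @functools.wraps(run)
    def wrapper(cmd, *args, env=None, **kwargs):
        cmd = list(cmd)  # assumes a list-form command, not a shell string
        cmd[0] = PORTABLE_TOOLS.get(cmd[0], cmd[0])
        env = dict(os.environ if env is None else env)
        bundled_dirs = sorted({os.path.dirname(p) for p in PORTABLE_TOOLS.values()})
        env["PATH"] = os.pathsep.join(bundled_dirs + [env.get("PATH", "")])
        return run(cmd, *args, env=env, **kwargs)
    return wrapper


def record_call(cmd, env=None, **kwargs):
    """Stand-in runner for demonstration: returns what would be executed."""
    return cmd, env


# Demonstration wrapper that only records the rewritten call.
portable_record = use_portable_tools(record_call)
# Wrapper around the real runner; defined here but not invoked.
portable_run = use_portable_tools(subprocess.run)
```

Calling `portable_record(["clang", "--version"])` returns the rewritten command and environment without launching anything, which makes the interception easy to inspect before pointing it at real executables.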

@sbc100
Collaborator

sbc100 commented May 16, 2020

Doesn't emsdk serve the purpose of providing emscripten fully bundled with all its dependencies? Indeed that seems to be the sole purpose of emsdk. We could do more (for example, I'm looking at bundling python3 with the macOS version in order to avoid users having to install it themselves).

Admittedly, emsdk only supports 3 operating systems. Do you have specific other platforms you want to run the toolchain on? It sounds like you are defining portable as "runs in JavaScript", is that right? (Not criticizing this definition, just trying to define what you are asking for here).

We also have the emscripten docker image, which packages up even more dependencies and has even fewer system dependencies: https://hub.docker.com/r/trzeci/emscripten/.

@jeff-hykin
Author

jeff-hykin commented May 18, 2020

The portable idea, which I probably should clarify in the first post, would mean you could put it on a flashdrive and plug & play with most Linux distros, MacOS, and Windows 10. Prebuilt binaries would work, but the eventual goal would be to have emscripten be executable on the client side of a website. Node has prebuilt binaries, and could be an intermediary to reach web-JavaScript, but ideally all dependencies would eventually be compiled to Wasm.

Emscripten, especially with docker, is perfectly easy to setup for most devs. My use case is needing WASM compilation in an app with a non-technical audience that wouldn't be comfortable installing, or even checking a version of python. Beyond that it would be desirable to have the global version of python be independent of the one used by emscripten.

@juj
Collaborator

juj commented May 18, 2020

Emscripten SDK is designed to be portable for flashdrive plug&play use, but unfortunately not "all-in-one" for all OSes in one go. You could get around this by preparing a USB flash drive with different emsdk_mac/, emsdk_win/ and emsdk_linux/ directories.
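A small sketch of how a launcher on such a drive might pick the right directory. The three directory names come from the suggestion above; the function name `emsdk_dir_for` and the idea of a launcher script are assumptions.

```python
import platform
from pathlib import Path

# Directory names follow the suggestion above; the drive layout is an assumption.
EMSDK_DIRS = {"Darwin": "emsdk_mac", "Windows": "emsdk_win", "Linux": "emsdk_linux"}


def emsdk_dir_for(drive_root, system=None):
    """Return the emsdk directory on the drive that matches the host OS."""
    system = system or platform.system()
    try:
        return Path(drive_root) / EMSDK_DIRS[system]
    except KeyError:
        raise RuntimeError(f"no bundled emsdk for {system!r}") from None
```

With no `system` argument the function falls back to `platform.system()`, so the same script can sit at the root of the drive and be run unchanged on each OS.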

On Windows, Emsdk operates by bundling python 3, because Windows does not offer a system python installation. On Linux, a system python 3 installation is expected, and on macOS, a system python 2 installation is expected.

@jeff-hykin
Author

jeff-hykin commented May 18, 2020

@juj that general approach would work. What would be the best way to target different OSes and create those three folders? Is there a way to perform the install command beforehand, and target each OS from a single OS?

If a standalone python executable was created, how might I go about directing emscripten to use it instead of looking for a global one?

@juj
Collaborator

juj commented May 18, 2020

Is there a way to perform the install command beforehand, and target each OS from a single OS?

Emsdk itself does not have a cross-compilation architecture. However if you build manually, you would be able to cross-compile to each OS, as long as you set up cross-compiler archs accordingly.

If a standalone python executable was created

The python installation that emsdk ships for Windows is a portable/standalone python installation. You can look at the emsdk_manifest.json files on how it solves that, and reuse the same scheme for other OSes.

Instead of editing emsdk_manifest.json for those OSes, check how emsdk precreates the .emscripten config file and the env. vars for Windows, which locate the portable python installation. The same scheme can be used for the other OSes, it will work identically there.

@sbc100
Collaborator

sbc100 commented May 18, 2020

If you want to see how we bundle the windows python in the emsdk, I wrote this little script that generates it:

https://github.com/emscripten-core/emsdk/blob/master/scripts/update_python.py

@kripken
Member

kripken commented May 18, 2020

@jeff-hykin

I'd love to see this happen!

LLVM and Binaryen are the easy parts. There are ports of both to js+wasm (using node.js file access), although the LLVM side is less polished, but it could be. A week or so ago I helped an LLVM-using project port itself to wasm, it took only about a day, although we did add a bunch of hacks along the way.

If you're fine running node.js in Electron, that sounds fine. (I am personally also interested in a 100% Web solution, which means no Node, or limited functionality.)

The bigger issue is Python. There are ports of it, and they work, but really just the pure computation side. Emscripten uses python to do things like file access, network access, multiprocessing, etc. Those things are harder to port. I see 2 main routes here:

  1. Do the hard work to port them. For example, find whatever system calls python ends up doing to get its multiprocess support (popen etc.?) and map those to Node.js APIs. This might not be easy, and not all APIs might be available in Node. But there can't be that many APIs here, so just trying might be the best thing.
  2. Rewrite our python code to something else. Node.js was extremely new back when emscripten started, so we couldn't consider it, and python was the easy option. But node.js would be fine today. This would mean converting several thousand lines of code, but not all of it would be necessary immediately. Also I've wondered if some kind of script could automatically convert 90% of the syntax properly, and leave 10% to humans (luckily neither of the two languages declares types!).
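As a first step for either route, one could inventory which system-facing modules each Python file actually imports. This is a rough sketch using the standard `ast` module; the `SYSTEM_MODULES` list is an assumption chosen for illustration, not a list taken from emscripten.

```python
import ast

# Modules treated as "system-facing" here; this list is an assumption.
SYSTEM_MODULES = {"subprocess", "multiprocessing", "shutil", "socket", "tempfile", "os"}


def system_imports(source):
    """Return the sorted system-facing modules that a piece of Python source imports."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            top = name.split(".")[0]  # "os.path" counts as "os"
            if top in SYSTEM_MODULES:
                found.add(top)
    return sorted(found)


example = "import os.path\nfrom multiprocessing import Pool\nimport json\n"
print(system_imports(example))
```

Running this over each file in the tree would give a quick picture of where file access, process spawning, and networking are concentrated, and therefore which files are cheap to port and which are not.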

I wish I had time for these myself, but I can help someone else get started and with any questions!

@jeff-hykin
Author

The python installation that emsdk ships for Windows is a portable/standalone python installation.

@juj This is fantastic to hear, I had no idea emsdk was shipped with a portable python for windows.

Instead of editing emsdk_manifest.json for those OSes, check how emsdk precreates the .emscripten config file and the env. vars for Windows, which locate the portable python installation. The same scheme can be used for the other OSes, it will work identically there.

Thanks for the info, this is definitely enough that I can get started with. And thanks @sbc100 that is also really helpful!

@jeff-hykin
Author

@kripken your news is also great to hear. Do you have any links to the LLVM repo you worked on? I reached out to the author of the clang-wasm demo, but the repo was still very alpha (the iostream header doesn't even work).

The python multiprocessing will certainly be very hard to port, I didn't realize that was part of the code base. I'll be exploring all the ways; portable python, auto-convert and fix bugs, or manual translation from python to node. For fun sometime I'll probably make a rough python-to-javascript syntax converter although it'll probably be more like 75% instead of 90% hands-free conversion. The multiprocessing code will be the real work in almost any case though.

@sbc100
Collaborator

sbc100 commented May 20, 2020

If I were you I would start with wasi-sdk. It's just one or two standalone binaries. You will learn a lot in the process and, if you are successful in building a version that suits your needs, then you might consider extending the project to emscripten, which is strictly harder (by several orders of magnitude IMHO).

@kripken
Member

kripken commented May 20, 2020

@jeff-hykin

Do you have any links to the LLVM repo you worked on?

It's not my project, and I'm not sure they want it shared widely yet, sorry. (But there isn't much there that would help you - basically, the port was just: run emcmake cmake and then make, and add a few hacks to avoid missing features.)

For fun sometime I'll probably make a rough python-to-javascript syntax converter

Btw, I did some searching meanwhile, and found JavaScripthon. It looks like a pretty serious effort at translating Python to JavaScript, so might be worth looking at!

@jeff-hykin
Author

jeff-hykin commented May 23, 2020

After getting more familiar with the codebase, I should partially apologize as I should've done a better job reading the readme. I can see why my first few comments on portability were confusing since I was misunderstanding the dependencies. You guys are really friendly despite it not making too much sense! (I meant to post this issue to emsdk instead of core)

In my defense, I think there is a subtle issue on emscripten.org, which I was using instead of the readme. The Download and Install page that's linked by Google has this:

Installation instructions

First check the Platform-specific notes below and install any prerequisites.


And following that link you reach:

Windows

  1. Install Python 2.7.12 or newer (older versions may not work due to a GitHub change with SSL).

Followed by

macOS

If you use MacOS 10.13.3 or later then you should have a new enough version of Python installed (older versions may not work due to a GitHub change with SSL). Otherwise you can manually install and use Python 2.7.12 or newer.

These instructions explain how to install all the required tools.


Following that "all the required tools" link ends up here:

Emscripten tools and dependencies

In general a complete Emscripten environment requires the following tools. First test to see if they are already installed using the instructions below.

  • Node.js (0.8 or above; 0.10.17 or above to run websocket-using servers in node):
  • Python 2.7.12 or above, or Python 3.5 or above (Python 2.7.0 or newer may also work, but is known to have SSL related issues, emsdk install failing #6275)
  • Java (1.6.0_31 or later). Java is optional. It is required to use the Closure Compiler (in order to minify your code).
  • Git client. Git is required if building tools from source.
  • LLVM (LLVM, including clang and wasm-ld)
  • Binaryen (Binaryen, including wasm-opt, wasm-emscripten-finalize, etc.)
  • The Emscripten code, from GitHub

Which seems like the normal requirements, since there are explicit statements about building from source like "Git is required if building tools from source." But upon closer inspection I see now that the whole list is under the building from source page.

That's why I was really surprised @juj when you mentioned bundled python with windows (since emscripten.org said to install python ≥ 2.7.12 on Windows).

@jeff-hykin
Author

jeff-hykin commented May 23, 2020

So to revise the issue: 99% of my needs are met! (Even if it's still far away from a web implementation) I'm really impressed how you guys bundle everything, I can't remember the last time I've used a tool that didn't need global non-preinstalled dependencies. I've been getting started on the last 1%, which is to have a single folder corresponding to a specific version of emscripten where ./emsdk/**/emcc file.c works without needing to run any setup commands, and have it where no ~/.emscripten is created/needed.

I'm still interested in reducing the dependencies as much as possible, but it will probably be a while before I put significant work into converting the whole code base to node or compiling LLVM itself to WASM/WASI. Running it with a portable linux python will probably be my next goal, since I'm working on building my own standalone tool on top of WASM that might end up being run inside a docker container.

@jeff-hykin
Author

jeff-hykin commented May 23, 2020

@kripken thanks for sharing JavaScripthon, I didn't see that in my search. It looks really interesting, I might use it on other projects as well.

Node.js was extremely new back when emscripten started, so we couldn't consider it, and python was the easy option. But node.js would be fine today.

Is there any interest in slowly moving away from python? Either towards node, or towards something like Rust/C++ that can be compiled to WASM.

@kripken
Member

kripken commented May 26, 2020

Is there any interest in slowly moving away from python? Either towards node, or towards something like Rust/C++ that can be compiled to WASM.

Yes, I think that would be good to explore (I don't have a strong opinion between the various options, each has advantages). Would be great if someone experimented with this, something like getting "hello world" to work with an emcc.py replacement.

@jeff-hykin
Author

jeff-hykin commented May 28, 2020

That's good to hear. In that case I may rewrite some files when I'm working to understand them.

I've coincidentally been working on something that should help; a Node recreation of a library for scripting tasks. It seems like a decent chunk of the python is simple tasks like zip/unzip, temp dirs, or converting between windows/unix filepaths.
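For reference, the path-conversion task mentioned above looks roughly like this with Python's standard `pathlib`; a Node port would need an equivalent. The helper names `to_unix` and `to_windows` are made up for this sketch.

```python
from pathlib import PureWindowsPath


def to_unix(path):
    """Convert a Windows-style path string to forward slashes."""
    return PureWindowsPath(path).as_posix()


def to_windows(path):
    """Convert a forward-slash path string to backslashes."""
    return str(PureWindowsPath(path))
```

Both helpers use the "pure" path classes, so they only manipulate strings and behave the same on every host OS.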

I found the mention of the "embedded" mode in the code and took advantage of that. I didn't see anything about the embedded mode in the documentation though. I fully built an emsdk_windows and an emsdk_mac folder. I modified some of the code in shared.py to get it working without absolute paths in .emscripten. I tested the USB drive on a Mac with a different OS version and it worked! I'll have access to a separate windows machine for testing on Saturday.

The only issue in the Mac test was that it rebuilds the cache on the first run even though the cache was already built. It would be really nice to not have to rebuild it since it takes a good 10 minutes, but I'm not sure if that's worth forcing: is the cache dependent on OS version or hardware architecture? I could probably force it to check the relative path, but it's not worth pursuing if it's going to result in a corrupt cache.

@sbc100
Collaborator

sbc100 commented May 28, 2020

If you install with "./emsdk --embedded" the resulting tree should be completely portable/movable.
It works by using relative paths in the config file (it does this by reading the EM_CONFIG environment variable). I'm hoping to make the embedded options the default very soon.
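The idea can be sketched as follows: tool locations are stored relative to the config file, and the directory of `EM_CONFIG` is used as the anchor at run time. This is an illustration of the principle only; the function name `resolve_tool_paths` and the example sub-path are assumptions, and the real config file may be structured differently.

```python
import os


def resolve_tool_paths(environ, relative_tools):
    """Anchor relative tool paths at the directory that holds the config file."""
    config_path = environ["EM_CONFIG"]
    root = os.path.dirname(os.path.abspath(config_path))
    return {name: os.path.join(root, rel) for name, rel in relative_tools.items()}


# Hypothetical entry; the real config may use other names and sub-paths.
tools = {"LLVM_ROOT": os.path.join("upstream", "bin")}

# The same relative entry resolves correctly wherever the tree is mounted.
config_a = os.path.join(os.sep, "mnt", "usb_a", "emsdk", ".emscripten")
config_b = os.path.join(os.sep, "home", "me", "emsdk", ".emscripten")
paths_a = resolve_tool_paths({"EM_CONFIG": config_a}, tools)
paths_b = resolve_tool_paths({"EM_CONFIG": config_b}, tools)
```

Because nothing absolute is stored, moving the tree only changes the value of `EM_CONFIG`, not the contents of the config file.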

If you install emscripten with "emsdk" the cache should these days come fully populated.

The cache should be platform independent, although as it happens the emsdk builders for each OS build their own cache. It would be interesting to figure out why the cache is being cleared. EMCC_DEBUG=1 might help you here.
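A small sketch of setting that variable from a script rather than the shell. The helper names are made up, and the assumption that the debug log arrives on stderr is labeled in the code.

```python
import os
import subprocess


def debug_env(base=None):
    """Copy an environment and switch on emcc's debug logging."""
    env = dict(os.environ if base is None else base)
    env["EMCC_DEBUG"] = "1"
    return env


def run_emcc_with_debug(emcc, source):
    """Run emcc on one source file and return its stderr.

    Assumption: the debug log is written to stderr.
    """
    result = subprocess.run([emcc, source], env=debug_env(),
                            capture_output=True, text=True)
    return result.stderr
```

The returned text could then be searched for cache-related messages on the first run from the moved tree.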

@jeff-hykin
Author

@sbc100 Thanks for the info. I'm using emsdk so I'll figure out what's going on with that debugging tip. Thanks also for the brief explanation of the embedded option. I (of course) found the documentation immediately after reading your comment, so I'll be spending a lot more time reading.

@sbc100
Collaborator

sbc100 commented Jun 1, 2020

Embedded is now the default mode for emsdk as of a couple of days ago!

@jeff-hykin
Author

(did you mean now? Instead of not)

@sbc100
Collaborator

sbc100 commented Jun 1, 2020

Yes! :)

@jeff-hykin
Author

whoop! 🎊 🚀

@npip99

npip99 commented Oct 3, 2020

It would be nice to see emscripten as a complete .wasm file / a single executable that can take C++ as input and output wasm. Right now I have a desire to get emscripten working on the web, but it appears that this may not be possible. I would use pkg as opposed to electron though. Electron is for porting the entire browser environment; pkg is for nodejs-only apps.

@jeff-hykin
Author

jeff-hykin commented Nov 5, 2021

@sbc100
Collaborator

sbc100 commented Nov 5, 2021

Wow! That is very impressive indeed.

@kripken
Member

kripken commented Nov 8, 2021

Thanks for mentioning that @jeff-hykin !

As mentioned in #6432 (comment) , I think we can close this now. See notes there about possible followup issues.

@kripken kripken closed this as completed Nov 8, 2021