Fix the WATT_JIT feature #48

Merged on Jun 15, 2022 (1 commit)

alexcrichton (Collaborator)

I was curious to see the impact of Wasmtime's recent development, since
quite a lot has changed about Wasmtime since I added the `WATT_JIT` env
var feature to `watt` a few years ago. The changes in this PR account for
some ABI changes that have happened in the C API in the meantime; nothing
major.

Taking my old benchmark of `#[derive(Serialize)]` on
`struct S(f32, ...  /* 1000 times */)` the timings I get for the latest
version of `serde_derive` are:

|         | native | watt  | watt (cached) |
|---------|--------|-------|---------------|
| debug   | 156ms  | 280ms | 125ms         |
| release |  70ms  | 257ms | 100ms         |
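The 1000-field input for the table above doesn't need to be written by hand. A minimal sketch of a generator follows; the function name `benchmark_source` is hypothetical, and the emitted file assumes `serde` (with its `derive` feature) is a dependency of the crate it is dropped into.

```rust
/// Hypothetical helper: builds the source of the benchmark input, a tuple
/// struct with `n` `f32` fields annotated with `#[derive(Serialize)]`.
fn benchmark_source(n: usize) -> String {
    // "f32, f32, ..., f32" with exactly `n` entries.
    let fields = vec!["f32"; n].join(", ");
    format!(
        "use serde::Serialize;\n\n#[derive(Serialize)]\nstruct S({});\n",
        fields
    )
}

fn main() {
    // Print the large (1000-field) case to stdout.
    print!("{}", benchmark_source(1000));
}
```

Redirecting the output into a crate's `src/lib.rs` reproduces the large case; `benchmark_source(1)` gives the small `struct S(f32)` case.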

Using instead `#[derive(Serialize)] struct S(f32)` the timings I get are:

|         | native | watt  | watt (cached) |
|---------|--------|-------|---------------|
| debug   |  1ms   | 241ms | 41ms          |
| release |  387µs | 205ms | 46ms          |

So for large inputs, JIT-compiled WebAssembly with the compilation cache
enabled can be faster than the native `serde_derive` (125ms vs. 156ms
above) when `serde_derive` itself is compiled in debug mode. Note that
debug mode is almost always what you get nowadays, since
`cargo build --release` currently builds build-dependencies with no
optimizations. Only through explicit profile configuration can
`serde_derive` be built in optimized mode (as I did to collect the
above numbers).

The `watt (cached)` column is where I enabled Wasmtime's global
compilation cache to avoid recompiling the module every time the
proc-macro is loaded, which is why those timings are much lower. The
difference between `watt` and `watt (cached)` is the compile time of the
module itself. The 40ms or so in `watt (cached)` is almost entirely the
overhead of loading the module from the cache, which involves
decompressing it from disk and shuffling bytes around. More efficient
storage formats exist for Wasmtime modules, so it would actually be
pretty easy to shave a good chunk off that. Additionally, Wasmtime has
its own C API, which differs significantly from the one used in this
repository and would be significantly faster for calls from wasm into
the host. The remaining ~3ms spent running the wasm itself could
probably be reduced further with more optimized calls.
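Since the gap between `watt` and `watt (cached)` is the module's compile time, that cost can be read straight off the timing tables. A small sketch of the arithmetic, with the millisecond figures copied from those tables (the helper name `implied_compile_ms` is hypothetical):

```rust
/// Hypothetical helper: compile time implied by subtracting the cached
/// timing from the uncached one (both in milliseconds).
fn implied_compile_ms(watt_ms: u32, cached_ms: u32) -> u32 {
    watt_ms - cached_ms
}

fn main() {
    // (configuration, `watt` ms, `watt (cached)` ms) from the tables.
    let rows = [
        ("S(f32 x 1000), debug", 280, 125),
        ("S(f32 x 1000), release", 257, 100),
        ("S(f32), debug", 241, 41),
        ("S(f32), release", 205, 46),
    ];
    for (label, watt, cached) in rows {
        println!("{label}: ~{}ms compiling the module", implied_compile_ms(watt, cached));
    }
}
```

By this reading, compiling the module costs roughly 155 to 200ms in every configuration, which is the bulk of the uncached `watt` time.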

Overall, Wasmtime seems to have made pretty good progress since my
initial work in dtolnay#2. In any case, I wanted to post this to get the
`WATT_JIT` feature working again, since it currently segfaults; perhaps
more perf work can be done in the future if necessary!

dtolnay (Owner) left a comment:

Awesome! Thank you.

dtolnay merged commit 9435d3f into dtolnay:master on Jun 15, 2022.