Skip to content

Avoid scalarization in _mm_madd_epi16 #13454

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 17, 2021

Conversation

kleisauke
Copy link
Collaborator

@kleisauke kleisauke commented Feb 9, 2021

The i32x4.dot_i16x8_s instruction was merged into the SIMD proposal with WebAssembly/simd#127.

This instruction is already supported in Chrome stable. On the latest versions of Node, it can be tested with the --wasm-simd-post-mvp V8 flag. See for example:

$ cat >test.c <<EOL
#include <stdio.h>
#include <string.h>
#include <wasm_simd128.h>

void print128_num(v128_t var) {
  uint16_t val[8];
  memcpy(val, &var, sizeof(val));
  printf("%i %i %i %i %i %i %i %i\n",
         val[0], val[1], val[2], val[3],
         val[4], val[5], val[6], val[7]);
}

int main() {
  v128_t i = wasm_i16x8_splat(1);
  print128_num(__builtin_wasm_dot_s_i32x4_i16x8(i, i));
  return 0;
}
EOL
$ emcc -msimd128 test.c -otest.js
$ node -v
v15.8.0
$ node --experimental-wasm-simd --wasm-simd-post-mvp test.js
2 0 2 0 2 0 2 0

@kripken kripken requested a review from tlively February 12, 2021 18:36
@kleisauke
Copy link
Collaborator Author

I just noticed that the --wasm-simd-post-mvp V8 flag is not required anymore since commit v8/v8@01b8b3e (V8 v8.8.101). A alternative would be to not guard it with an ifdef, but Node 14.15.5 (latest LTS release) depends on V8 v8.4.371.19 (and that commit isn't cherry-picked).

Note that on the latest version of Chrome (v88) this just works with the --experimental-wasm-simd flag only (which is in origin trial).

@tlively
Copy link
Member

tlively commented Feb 12, 2021

Thanks for the PR! SIMD will be standardized very soon and at that point I will be removing the experimental-simd128 feature from clang at that point and only engines with up-to-date implementations of the standard will be supported. Since this is already supported in Chrome stable, I think the best option here would be skip the ifdef.

@kleisauke
Copy link
Collaborator Author

OK, I skipped the ifdef and updated the PR description. I'm not blocked, so you could merge this PR after SIMD is standardized, if that's more convenient.

@kleisauke
Copy link
Collaborator Author

... rebased on top of the master branch to (hopefully) resolve the test failures.

@tlively
Copy link
Member

tlively commented Feb 16, 2021

Great, thanks for the update. Would you mind updating the docs on instruction performance to reflect this change as well?

@kleisauke
Copy link
Collaborator Author

I updated the docs, and was able to reproduce the test_sse* failures locally. Commit 952d765 resolves this, but I'm not sure what triggered it (perhaps the use of a __builtin_ function?).

Here's the salient part of the log:

Details
In file included from /root/project/tests/sse/test_avx.cpp:10:
In file included from /root/emsdk/upstream/emscripten/cache/sysroot/include/compat/immintrin.h:11:
In file included from /root/emsdk/upstream/emscripten/cache/sysroot/include/compat/avxintrin.h:14:
In file included from /root/emsdk/upstream/emscripten/cache/sysroot/include/compat/nmmintrin.h:14:
In file included from /root/emsdk/upstream/emscripten/cache/sysroot/include/compat/smmintrin.h:14:
In file included from /root/emsdk/upstream/emscripten/cache/sysroot/include/compat/tmmintrin.h:14:
In file included from /root/emsdk/upstream/emscripten/cache/sysroot/include/compat/pmmintrin.h:14:
/root/emsdk/upstream/emscripten/cache/sysroot/include/compat/emmintrin.h:672:52: error: cannot initialize a parameter of type '__attribute__((__vector_size__(8 * sizeof(short)))) short' (vector of 8 'short' values) with an rvalue of type 'v128_t' (vector of 4 'int32_t' values)
  return (__m128i)__builtin_wasm_dot_s_i32x4_i16x8((v128_t)__a, (v128_t)__b);
                                                   ^~~~~~~~~~~
1 error generated.

@tlively
Copy link
Member

tlively commented Feb 16, 2021

Ah, thanks for looking into that. The issue is that the builtin functions expect specifically typed arguments whereas the intrinsics all operate in terms of a generic v128_t. The intrinsic implementations fix this by casting both arguments and results of the builtins and you should be able to do the same fix here.

@tlively tlively merged commit 050b60b into emscripten-core:master Feb 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants