This repository was archived by the owner on Aug 11, 2023. It is now read-only.

Debugging kernel creation failure on Intel GPU w/Beignet driver (tl;dr use Intel Neo unified driver) #82

Closed
lissyx opened this issue Dec 22, 2017 · 103 comments

@lissyx

lissyx commented Dec 22, 2017

While trying to get OpenCL builds of TensorFlow working, I am running into this kind of failure:

alex@portable-alex:~/tmp/deepspeech/sycl_eigen_hack$ LC_ALL=C ./deepspeech ../models/output_graph.pb ../audio/2830-3980-0043.wav ../models/alphabet.txt
2017-12-21 01:31:01.407356: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Platform name intel gen ocl driver
2017-12-21 01:31:01.476690: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:66] Found following OpenCL devices:
2017-12-21 01:31:01.476719: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:68] id: 0, type: GPU, name: Intel(R) HD Graphics 5500 BroadWell U-Processor GT2, vendor: Intel, profile: FULL_PROFILE
One module without kernel function!
terminate called after throwing an instance of 'cl::sycl::cl_exception'
  what():  Error: [ComputeCpp:RT0101] Failed to create kernel ((Kernel Name: SYCL_cac2b3592d2272412db5415963f17f08_0))
Aborted (core dumped)
alex@portable-alex:~/tmp/deepspeech/sycl_eigen_hack$

I had to force-enable support for Intel GPUs with Beignet, as suggested in #78, but I am seeing the very same error (same kernel name, same error code) on an NVIDIA GTX1080 card. Now, I understand that neither is really expected to work, as documented here: #78 (comment) for NVIDIA, and the blacklist was likely there for a reason.

However, I'd like to dig deeper and understand better why this is failing, especially in case it is not related to hardware / driver support but rather to the model itself. So far, looking into the SDK/source for this RT0101 error code has not been helpful at all, and I could not find anything documenting how to debug ComputeCpp kernel creation further.

Thanks for any debugging pointers, docs and tips :)

@DuncanMcBain
Member

Hi @lissyx, as I might have mentioned in the other thread (and I'm sorry if I didn't), the likely reason this is not working is that Beignet has had issues reliably parsing the output of our compiler. What OS are you using? Are you using a package-manager version of Beignet, or did you compile it yourself? A way to test whether this is a TensorFlow-specific problem would be to run the samples from this repository. If any of them fail with the same error, then my guess is that it's a Beignet incompatibility.

For what it's worth, there is a document in the ComputeCpp package that has descriptions (if brief) of all the error codes. Often ComputeCpp is simply "passing on" an error from the OpenCL implementation and cannot give any more information (like in this case - the "One module without kernel function!" output is actually from Beignet).

Would you be able to try Intel's closed source GPU driver instead?

DuncanMcBain self-assigned this Dec 22, 2017
@lissyx
Author

lissyx commented Dec 22, 2017

Thanks!
So my system is Ubuntu 17.10:

  • beignet 1.3.1-1 from packages
  • kernel 4.13.0-19-generic from packages

I might be able to try the closed source driver, but I don't want to mess too much with my system so far.

I read some things about SPIR-32 vs SPIR-64? I saw some bugs referring to that, but unluckily, trying to build TensorFlow with SYCL support while forcing SPIR-32 fails in an unexpected way inside the SYCL headers, with some redefinition of int at include/SYCL/predefines.h:431 (ComputeCpp 0.4.0).

I'll give a closer look at the samples as suggested :)

@lissyx
Author

lissyx commented Dec 22, 2017

Building the SDK samples against ComputeCpp 0.4.0 fails on my system, with both GCC 5.4 and 6.4:

alex@portable-alex:~/tmp/deepspeech/computecpp-sdk/build-0.4.0$ CC=gcc-6 CXX=g++-6 cmake ../ -DCOMPUTECPP_PACKAGE_ROOT_DIR=/home/alex/tmp/deepspeech/ComputeCpp-CE-0.4.0-Ubuntu-16.04-64bit/
-- The C compiler identification is GNU 6.4.0
-- The CXX compiler identification is GNU 6.4.0
-- Check for working C compiler: /usr/bin/gcc-6
-- Check for working C compiler: /usr/bin/gcc-6 -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/g++-6
-- Check for working CXX compiler: /usr/bin/g++-6 -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- host compiler - gcc 6.4.0
-- Looking for CL_VERSION_2_0
-- Looking for CL_VERSION_2_0 - found
-- Found OpenCL: /usr/lib/x86_64-linux-gnu/libOpenCL.so (found version "2.0") 
-- ComputeCpp package - Found
-- compute++ - Found
-- computecpp_info - Found
-- ComputeCpp runtime: /home/alex/tmp/deepspeech/ComputeCpp-CE-0.4.0-Ubuntu-16.04-64bit/lib/libComputeCpp.so - Found
-- ComputeCpp includes - Found
-- Package version - CE 0.4.0
-- compute++ flags - -O2 -mllvm -inline-threshold=1000 -sycl -emit-llvm -intelspirmetadata
-- platform - your system can support ComputeCpp
-- Configuring done
-- Generating done
-- Build files have been written to: /home/alex/tmp/deepspeech/computecpp-sdk/build-0.4.0
alex@portable-alex:~/tmp/deepspeech/computecpp-sdk/build-0.4.0$ make -j1
[  1%] Built target accessors_accessors.cpp_0_ih
[  3%] Building CXX object samples/accessors/CMakeFiles/accessors.dir/accessors.cpp.o
[  4%] Linking CXX executable accessors
[  4%] Built target accessors
[  6%] Building ComputeCpp integration header file /home/alex/tmp/deepspeech/computecpp-sdk/build-0.4.0/samples/async-handler/async-handler.cpp.sycl
/home/alex/tmp/deepspeech/computecpp-sdk/samples/async-handler/async-handler.cpp:123:13: error: no matching constructor for initialization of 'cl::sycl::context'
    context myContext(myDevice);
            ^         ~~~~~~~~
/home/alex/tmp/deepspeech/ComputeCpp-CE-0.4.0-Ubuntu-16.04-64bit//include/SYCL/context.h:85:12: note: candidate constructor not viable: no known conversion from 'cl::sycl::device' to 'async_handler'
      (aka 'std::function<void (cl::sycl::exception_list)>') for 1st argument
  explicit context(async_handler asyncHandler = nullptr);
           ^
/home/alex/tmp/deepspeech/ComputeCpp-CE-0.4.0-Ubuntu-16.04-64bit//include/SYCL/context.h:92:3: note: candidate constructor not viable: no known conversion from 'cl::sycl::device' to 'cl_context' (aka '_cl_context *') for 1st argument
  context(cl_context context, async_handler asyncHandler = nullptr);
  ^
/home/alex/tmp/deepspeech/ComputeCpp-CE-0.4.0-Ubuntu-16.04-64bit//include/SYCL/context.h:145:3: note: candidate constructor not viable: no known conversion from 'cl::sycl::device' to 'const cl::sycl::context' for 1st argument
  context(const context &rhs);
  ^
/home/alex/tmp/deepspeech/ComputeCpp-CE-0.4.0-Ubuntu-16.04-64bit//include/SYCL/context.h:202:12: note: candidate constructor not viable: no known conversion from 'cl::sycl::device' to 'cl::sycl::detail::context *' for 1st argument
  explicit context(cl::sycl::detail::context *detail);
           ^
/home/alex/tmp/deepspeech/ComputeCpp-CE-0.4.0-Ubuntu-16.04-64bit//include/SYCL/context.h:203:12: note: candidate constructor not viable: no known conversion from 'cl::sycl::device' to 'dcontext_shptr'
      (aka 'std::shared_ptr<cl::sycl::detail::context>') for 1st argument
  explicit context(dcontext_shptr detail);
           ^
/home/alex/tmp/deepspeech/ComputeCpp-CE-0.4.0-Ubuntu-16.04-64bit//include/SYCL/context.h:103:3: note: candidate constructor not viable: requires at least 2 arguments, but 1 was provided
  context(const device_selector &deviceSelector,
  ^
/home/alex/tmp/deepspeech/ComputeCpp-CE-0.4.0-Ubuntu-16.04-64bit//include/SYCL/context.h:115:3: note: candidate constructor not viable: requires at least 2 arguments, but 1 was provided
  context(const device &dev, info::gl_context_interop interopFlag,
  ^
/home/alex/tmp/deepspeech/ComputeCpp-CE-0.4.0-Ubuntu-16.04-64bit//include/SYCL/context.h:127:3: note: candidate constructor not viable: requires at least 2 arguments, but 1 was provided
  context(const platform &plt, info::gl_context_interop interopFlag,
  ^
/home/alex/tmp/deepspeech/ComputeCpp-CE-0.4.0-Ubuntu-16.04-64bit//include/SYCL/context.h:138:3: note: candidate constructor not viable: requires at least 2 arguments, but 1 was provided
  context(vector_class<device> deviceList, info::gl_context_interop interopFlag,
  ^
1 error generated.
samples/async-handler/CMakeFiles/async-handler_async-handler.cpp_0_ih.dir/build.make:61: recipe for target 'samples/async-handler/async-handler.cpp.sycl' failed
make[2]: *** [samples/async-handler/async-handler.cpp.sycl] Error 1
CMakeFiles/Makefile2:289: recipe for target 'samples/async-handler/CMakeFiles/async-handler_async-handler.cpp_0_ih.dir/all' failed
make[1]: *** [samples/async-handler/CMakeFiles/async-handler_async-handler.cpp_0_ih.dir/all] Error 2
Makefile:140: recipe for target 'all' failed
make: *** [all] Error 2

But using 0.5.0, the samples build completed. The sad thing is that I'm building TensorFlow with 0.4.0 because the TensorFlow build fails with 0.5.0 :). And the 0.5.0 hello-world sample fails just like TensorFlow does:

alex@portable-alex:~/tmp/deepspeech/computecpp-sdk/build-0.5.0$ ./samples/hello-world/hello-world 
Running on Intel(R) HD Graphics 5500 BroadWell U-Processor GT2
One module without kernel function!
terminate called after throwing an instance of 'cl::sycl::exception'
Aborted (core dumped)

So I guess that matches the "it's your driver" path :[

@rodburns
Contributor

@lissyx You are seeing those errors with v0.4.0 because of a mismatch between the ComputeCpp compiler and your library path. How do you have LD_LIBRARY_PATH set up?

@lissyx
Author

lissyx commented Dec 22, 2017

@rodburns I made no change to LD_LIBRARY_PATH, but I don't think it's a big deal. Given that 0.5.0 builds and then explodes with the same kind of error as TensorFlow, that's enough to tell me it's more likely related to the driver than to anything else, which is all I needed to know so far :)

@DuncanMcBain
Member

There have been interface changes from ComputeCpp v0.4.0 to v0.5.0, matching the changes between specification versions 1.2 and 1.2.1. All TensorFlow branches after a certain point (roughly two weeks ago) will no longer build with old versions of ComputeCpp, and vice versa. You might have more success compiling Beignet on your own machine, but it's unlikely.

@lissyx
Author

lissyx commented Dec 22, 2017

@DuncanMcBain Thanks, I suspected something like that, but since nothing was written down for sure, I could only speculate. I'll try the Intel closed-source driver to verify :)

lissyx changed the title from "Debugging kernel creation failure" to "Debugging kernel creation failure on Intel GPU w/Beignet driver" Dec 22, 2017
@lissyx
Author

lissyx commented Dec 22, 2017

Turns out that one can get quite a bit of debugging output from Beignet by tracking the use of these macros: BVAR, FVAR, IVAR and SVAR. They are defined here: https://github.com/intel/beignet/blob/8efa803f2f93e377b30ff957a74c5d69beec7744/backend/src/sys/cvar.hpp#L61-L77
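Those macros boil down to environment-variable lookups with a default (and clamping, for the integer ones), so the debug knobs can be flipped at runtime without rebuilding the driver. A rough Python equivalent of the pattern, with hypothetical variable names, might look like:

```python
import os

def bvar(name, default):
    """Boolean debug variable overridable from the environment,
    mirroring the idea behind Beignet's BVAR macro."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip() not in ("0", "false", "")

def ivar(name, minimum, default, maximum):
    """Integer debug variable clamped to [minimum, maximum],
    mirroring the idea behind Beignet's IVAR macro."""
    raw = os.environ.get(name)
    value = int(raw) if raw is not None else default
    return max(minimum, min(value, maximum))

# Illustrative only: the variable name below is made up for the demo.
os.environ["OCL_DEBUG_DEMO"] = "1"
print(bvar("OCL_DEBUG_DEMO", False))  # True
```

So exporting the corresponding variable before running a Beignet-backed program should be enough to toggle the extra output.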

@DuncanMcBain
Member

Hi @lissyx, did you make any progress with this? I'm afraid we can't really support Beignet, but if the problem reproduces on the other Intel drivers we can look into that. Thanks!

@lissyx
Author

lissyx commented Jan 5, 2018

Thanks for pinging; sadly I've had no time, between a bad recovery from jet lag and a shoulder issue, but it's still on my radar for sure :)

@DuncanMcBain
Member

OK, that's fine. Let us know if you find out anything else! :)

@lissyx
Author

lissyx commented Feb 9, 2018

@DuncanMcBain Hello, I still haven't had time for that, but I might be able to hack around soonish. Now, I wonder which version of ComputeCpp I should test. It seems that even right now, tensorflow/master builds with neither 0.5.0 nor 0.5.1. Am I going to waste my time?

@DuncanMcBain
Member

At the moment, we would recommend either dev/eigen_mehdi or dev/amd_gpu. I am not sure which of these two should perform better, but either should work with your setup (and ComputeCpp 0.5.1!)

@lissyx
Author

lissyx commented Feb 9, 2018

Right, https://github.com/lukeiwanski/tensorflow/tree/dev/amd_gpu seems to be the most up-to-date. I'll give it a try, thanks!

BTW, are you planning on merging this upstream? And if so, do you have any ETA?

@DuncanMcBain
Member

Slight clarification: AMD GPU is a branch which is focussing particularly on AMD performance, so I'm not sure what it will look like on Intel.

@lukeiwanski can say more about our upstreaming plans - I'm not entirely sure!

@lissyx
Author

lissyx commented Feb 12, 2018

@DuncanMcBain Okay, so far I could not get anything to build; it fails on something AVX-related:

  external/local_config_sycl/crosstool/computecpp -fPIE -fno-omit-frame-pointer -Wall -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections -DGEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK '-mtune=generic' '-march=x86-64' -msse -msse2 -msse3 -msse4.1 -msse4.2 -mavx '-fvisibility=hidden' '-std=c++11' -MD -MF bazel-out/local_linux-opt/bin/native_client/_objs/generate_trie/native_client/kenlm/util/integer_to_string.d '-frandom-seed=bazel-out/local_linux-opt/bin/native_client/_objs/generate_trie/native_client/kenlm/util/integer_to_string.o' '-DKENLM_MAX_ORDER=6' -iquote . -iquote bazel-out/local_linux-opt/genfiles -iquote external/bazel_tools -iquote bazel-out/local_linux-opt/genfiles/external/bazel_tools -isystem native_client/kenlm -isystem bazel-out/local_linux-opt/genfiles/native_client/kenlm -isystem native_client/boost_locale -isystem bazel-out/local_linux-opt/genfiles/native_client/boost_locale -isystem external/bazel_tools/tools/cpp/gcc3 '-std=c++11' -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c native_client/kenlm/util/integer_to_string.cc -o bazel-out/local_linux-opt/bin/native_client/_objs/generate_trie/native_client/kenlm/util/integer_to_string.o).
In file included from /tmp/integer_to_string-d7ca47.sycl:14:
In file included from /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/ComputeCpp-CE-0.5.1-Ubuntu-16.04-64bit//include/SYCL/common.h:21:
In file included from /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/ComputeCpp-CE-0.5.1-Ubuntu-16.04-64bit//include/SYCL/include_opencl.h:33:
In file included from /usr/include/CL/cl.h:35:
In file included from /usr/include/CL/cl_platform.h:518:
In file included from /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/ComputeCpp-CE-0.5.1-Ubuntu-16.04-64bit/bin/../lib/clang/3.9.0/include/immintrin.h:40:
/home/alexandre/Documents/codaz/Mozilla/DeepSpeech/ComputeCpp-CE-0.5.1-Ubuntu-16.04-64bit/bin/../lib/clang/3.9.0/include/pmmintrin.h:45:19: error: unknown type name '__m128i'; did you mean 'util::__m128i'?
static __inline__ __m128i __DEFAULT_FN_ATTRS
                  ^
/home/alexandre/Documents/codaz/Mozilla/DeepSpeech/ComputeCpp-CE-0.5.1-Ubuntu-16.04-64bit/bin/../lib/clang/3.9.0/include/emmintrin.h:30:19: note: 'util::__m128i' declared here
typedef long long __m128i __attribute__((__vector_size__(16)));
                  ^
In file included from /tmp/integer_to_string-d7ca47.sycl:14:
In file included from /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/ComputeCpp-CE-0.5.1-Ubuntu-16.04-64bit//include/SYCL/common.h:21:
In file included from /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/ComputeCpp-CE-0.5.1-Ubuntu-16.04-64bit//include/SYCL/include_opencl.h:33:
In file included from /usr/include/CL/cl.h:35:
In file included from /usr/include/CL/cl_platform.h:518:
In file included from /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/ComputeCpp-CE-0.5.1-Ubuntu-16.04-64bit/bin/../lib/clang/3.9.0/include/immintrin.h:40:
/home/alexandre/Documents/codaz/Mozilla/DeepSpeech/ComputeCpp-CE-0.5.1-Ubuntu-16.04-64bit/bin/../lib/clang/3.9.0/include/pmmintrin.h:45:27: error: '__target__' attribute only applies to functions
static __inline__ __m128i __DEFAULT_FN_ATTRS
                          ^
/home/alexandre/Documents/codaz/Mozilla/DeepSpeech/ComputeCpp-CE-0.5.1-Ubuntu-16.04-64bit/bin/../lib/clang/3.9.0/include/pmmintrin.h:31:50: note: expanded from macro '__DEFAULT_FN_ATTRS'
  __attribute__((__always_inline__, __nodebug__, __target__("sse3")))

[...]

@lissyx
Author

lissyx commented Feb 16, 2018

Right, never mind the comment above. Turns out it's just the ComputeCpp compiler choking on something in KenLM's code, which we don't care about at all, so I've been able to move forward by adding "kenlm" to the list of folders to skip in third_party/sycl/crosstool/computecpp.tpl. Everything now builds successfully using the latest dev/amd_gpu branch.
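For reference, a skip list of this kind boils down to a substring check on the source path before the wrapper decides whether to run the device compiler or fall back to the plain host compiler. A minimal sketch of the idea (function name and list contents are illustrative, not the actual computecpp.tpl code):

```python
# Hypothetical sketch: folders whose sources should only ever be
# built by the host compiler, never the SYCL device compiler.
SKIP_FOLDERS = ["kenlm"]

def needs_device_compilation(source_path, skip_folders=SKIP_FOLDERS):
    """Return False if the file lives under a folder we want to
    exclude from device compilation."""
    return not any(folder in source_path for folder in skip_folders)

print(needs_device_compilation("native_client/kenlm/util/integer_to_string.cc"))  # False
print(needs_device_compilation("tensorflow/core/kernels/foo.cc"))                 # True
```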

@DuncanMcBain
Member

OK, great! Sorry for not replying, I had some stuff to do this week (compilation improvements, in fact). Glad to hear it's working!

@lissyx
Author

lissyx commented Feb 16, 2018

Well, it's building :-). Still broken on my GTX1080 and with the Intel open-source driver, but at least I have an up-to-date build to test with the closed-source driver and AMD devices when I can :)

@DuncanMcBain
Member

Ah, I see. We don't really support enough PTX output to be able to run all of TensorFlow on any nvidia card, though we'd like to improve that soon. I'm pretty certain that all development on the Beignet driver has stopped, and as such the issues you're seeing are unlikely to go away. Keep us informed of how it's going!

@lissyx
Author

lissyx commented Feb 16, 2018

@DuncanMcBain Do you have actual information from Intel about Beignet being discontinued? Looking at the Git history, it does not seem dead, even though it's not very active: https://cgit.freedesktop.org/beignet/log/. That'd be unfortunate if they stop :/. BTW, the docs mention Windows 7/Intel support; is that the only combination, or can I expect good support for Intel GPUs on any recent Windows?

@DuncanMcBain
Member

Huh, I swear I'd seen an announcement on Phoronix. I might be thinking of this from today: https://www.phoronix.com/scan.php?page=news_item&px=Intel-New-Compute-Runtime

Our test infrastructure doesn't yet fully cover Windows 10/Intel. I imagine it should work, or at least I am unaware of any barriers to it working, but we can't say we support it (since we don't test that combination).

@lissyx
Author

lissyx commented Feb 17, 2018

@DuncanMcBain Thanks for the confirmation about Windows 10. Also, the Phoronix news is not bad; this Neo driver is the new one, then. Aside from that, I've been able to get ComputeCpp 0.5.1 integrated into our build system, so I can hack on this more easily. I'm currently relying on your dev/amd_gpu branch; however, the CUDA builds from this branch are broken (something around Eigen). I guess that's not too much of a surprise :), and it won't stop me from testing.

I'll have a look with the Neo driver, if I can :)

@lissyx
Author

lissyx commented Feb 17, 2018

Haha, the Neo docs state: "Intel Core Processors with Gen8 graphics devices (formerly Broadwell) - OpenCL 2.0" at https://github.com/intel/compute-runtime/blob/89627bd81e407d2d8b8e4c651c53835c1a3048fa/README.md#supported-platforms

My i7 5600U is Broadwell :-)

@lissyx
Author

lissyx commented Feb 19, 2018

@DuncanMcBain So far it's failing on Neo's side, but I have no idea whether it's just an expected failure because the driver is too new, or something else. I've shared some of the debug output on their GitHub issue tracker, if you are curious: intel/compute-runtime#20 (comment)

@DuncanMcBain
Member

OK, thanks for the link! That's cool to see that there's already some engagement there 😄

@DuncanMcBain
Member

Yeah, you're right, this is a bit of a nasty problem, and I am not sure how to proceed! That would be the place to add the flags, but since the host compiler is sometimes compute++, it might not be possible to avoid passing that flag to compute++.

@DuncanMcBain
Member

Hi @lissyx, as it happens we tried a little test here internally. We have been able to get compute++ to use the system assembler by passing -no-integrated-as on the command line; we have then traced it through to the arguments passed to /usr/bin/as. Would you be able to send a small snippet of the error you get when trying this with bazel + TensorFlow?

@lissyx
Author

lissyx commented Mar 22, 2018

Sure, but for the next days / weeks I won't be working, for personal reasons, so I cannot guarantee a quick reply :-/. I should at least be able to get you the error, maybe tomorrow?

@DuncanMcBain
Member

That's fine, no worries! We'll be here when you're able to give it another try :) If you manage to get the error by tomorrow, I'll post what information I can; if not, we can pick it up when you are back at work.

@lissyx
Author

lissyx commented Mar 23, 2018

@DuncanMcBain You are lucky, I've got a spare moment :). This is the error:

external/local_config_sycl/crosstool/computecpp -fPIE -Wall -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections -DGEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK '-mtune=generic' '-march=x86-64' -msse -msse2 -msse3 -msse4.1 -msse4.2 -mavx '-fvisibility=hidden' -DCTC_DISABLE_OMP -no-integrated-as '-Wa,--defsym,powf=powf@GLIBC_2.2.5' '-Wa,--defsym,expf=expf@GLIBC_2.2.5' '-Wa,--defsym,logf=logf@GLIBC_2.2.5' -MD -MF bazel-out/k8-opt/bin/native_client/_objs/deepspeech_utils/native_client/kiss_fft130/tools/kiss_fftr.pic.d -fPIC -iquote . -iquote bazel-out/k8-opt/genfiles -iquote external/bazel_tools -iquote bazel-out/k8-opt/genfiles/external/bazel_tools -isystem native_client/c_speech_features -isystem bazel-out/k8-opt/genfiles/native_client/c_speech_features -isystem native_client/kiss_fft130 -isystem bazel-out/k8-opt/genfiles/native_client/kiss_fft130 -isystem external/bazel_tools/tools/cpp/gcc3 -mno-fma -mno-avx -mno-avx2 -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c native_client/kiss_fft130/tools/kiss_fftr.c -o bazel-out/k8-opt/bin/native_client/_objs/deepspeech_utils/native_client/kiss_fft130/tools/kiss_fftr.pic.o)
gcc: error: unrecognized command line option '-no-integrated-as'; did you mean '-no-integrated-cpp'?

And it happens with bazel build [...] --copt="-no-integrated-as" --copt="-Wa,--defsym,powf=powf@GLIBC_2.2.5" --copt="-Wa,--defsym,expf=expf@GLIBC_2.2.5" --copt="-Wa,--defsym,logf=logf@GLIBC_2.2.5" [...].

@DuncanMcBain
Member

Ah, so it's actually GCC that doesn't support the -no-integrated-as argument. In that case, you could add it to the compute++ specific arguments in computecpp.tpl, starting on line 56. That should work! Probably the best way is to add the -Wa, arguments to compiler_flags on line 87 and only add the -no-integrated-as somewhere after line 56, I think.

It was easy enough that I actually made a diff. This might not apply but it should have enough context to work it out:

diff --git a/third_party/sycl/crosstool/computecpp.tpl b/third_party/sycl/crosstool/computecpp.tpl
index 8f3535dbe1..d51f7d2b7c 100755
--- a/third_party/sycl/crosstool/computecpp.tpl
+++ b/third_party/sycl/crosstool/computecpp.tpl
@@ -68,6 +68,7 @@ def get_device_compiler_flags(compiler_flags):
       '-DEIGEN_HAS_C99_MATH=1',
       '-DEIGEN_HAS_CXX11_MATH=1',
       '-DDISABLE_SKINNY=1',
+      '-no-integrated-as',
   ]
   return compiler_flags + computecpp_flags
 
@@ -84,6 +85,11 @@ def checkComputeCppIsSupported():
 def useDriver(compiler_flags):
   output_file_index = compiler_flags.index('-o') + 1
   output_file_name = compiler_flags[output_file_index]
+  compiler_flags += [
+    '-Wa,--defsym,powf=powf@GLIBC_2.2.5',
+    '-Wa,--defsym,expf=expf@GLIBC_2.2.5',
+    '-Wa,--defsym,logf=logf@GLIBC_2.2.5',
+  ]
 
   # Check whether we should disable double or half support
   if DOUBLE_SUPPORT == "0":

@lissyx
Author

lissyx commented Apr 5, 2018

@DuncanMcBain OK, clearly this is too hacky. I guess I'll just find another solution. After all, Ubuntu bionic will be released soon and ships with libc6 2.27; not worth the pain.

@DuncanMcBain
Member

Oh OK, if you're able to upgrade your test machine then I guess that makes the most sense. If there are any problems we can discuss them here, but otherwise this issue seems fairly sorted for now?

@lissyx
Author

lissyx commented Apr 6, 2018

@DuncanMcBain Yes, we can mark this as fixed. For now I'll keep the T450s with me as well, to be able to test on both GPUs. I'm still on and off on paternity leave, but I want to continue hacking on OpenCL. Should we have that discussion in a better-suited place?

@mirh

mirh commented Apr 6, 2018

(The OP can close his own issues himself.)
I thought we were complaining about debug builds blowing up here?

@DuncanMcBain
Member

I might be misunderstanding, but it seemed to me that the original issue (running on Intel GPU) is more-or-less fixed. If there's another issue here that I've missed (or forgotten!) then we can keep this open (though it might be more efficient to move discussion to a new issue).

@mirh

mirh commented Apr 12, 2018

Well, I was pinged when we were discussing debug builds skyrocketing in memory usage...
Though I guess you are right that the point of the thread is actually something else.

I'm not sure where I should report it, then.

@lissyx
Author

lissyx commented Apr 14, 2018

Yep, we can close this, and I'll file new issues in the future for more targeted work :)

lissyx closed this as completed Apr 14, 2018
@DuncanMcBain
Member

@mirh you're totally right about the memory usage. Can you make a separate issue for that as well, please?

@mirh

mirh commented Apr 15, 2018

I now have that Eigen problem I mentioned here, when building debug.
(Not to mention that if I update from bazel 0.11.1 to 0.12.0, I get an error in external/jpeg/BUILD:126:12.)

So it would seem wrong to open a report that I can't even get to the point of reproducing.

@DuncanMcBain
Member

Bazel's already at version 0.12.0? They move pretty fast...
For the assert-not-compiling issue, we know what is causing it but haven't been able to assign resources to fix it. A quick-and-dirty fix might be to add the header to Eigen/src/Core/util/Macros.h in the local copy of Eigen that bazel downloads. A find command should be able to locate it, but it will be something like bazel-tensorflow/local-dbg/Eigen (I've almost certainly made a mistake there, but it is definitely something like that).

@lissyx
Author

lissyx commented Apr 17, 2018

@DuncanMcBain Just adding some noise here, but looking at the hardware support, I only saw ARM64, nothing about ARMv7. I'd like to explore OpenCL on the RPi3, since it looks like the VC4CL project has moved in a good direction: https://github.com/doe300/VC4CL. They claim OpenCL 1.2 embedded-profile compatibility there, which seems like it would be compatible with ComputeCpp, right? Except RPi3 distros are ARM, not ARM64 :-). Do you have any plans to provide ARM binaries, or should I hack an ARM64 system onto my RPi3 to test this?

@mirh

mirh commented Apr 17, 2018

TIL newer RPis are ARMv8.
ComputeCpp is provided as an arm32 build if you choose "Ubuntu 14.04", then.
Just make sure to have TF_USE_DOUBLE_SYCL and TF_USE_HALF_SYCL set to 0.

@lissyx
Author

lissyx commented Apr 17, 2018

@mirh The SoC is indeed ARMv8, but the Raspbian distro is not. I had missed that there was a 32-bit build under Ubuntu 14.04; thanks for the warning about the types :)

@DuncanMcBain
Member

@mirh is correct, selecting Ubuntu 14.04 gives you the option of an arm32 download. We should make this page more clear.

It looks like that implementation might work, though we've never tried it. It seems to implement enough of the API, as well as both SPIR and SPIR-V support, which is pretty cool! You might try the SDK first to see if it runs (some samples will definitely fail to run). The version of ComputeCpp built on 14.04 uses the older GCC ABI, as documented here: https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html. Apologies if you know this already, but I seem to remember the RPi shipping with a compiler newer than 4.8; there's a flag in the SDK for fixing that: https://github.com/codeplaysoftware/computecpp-sdk/blob/master/cmake/Modules/FindComputeCpp.cmake#L59

@lissyx
Author

lissyx commented May 18, 2018

@DuncanMcBain I'm moving forward on testing this VC4 driver for the RPi3's GPU. I've been able to cross-compile and install it, and it should have been built with the SPIRV frontend config. Yet, using computecpp_info, I get:

pi@rpi3-opencl-20180518:~ $ ComputeCpp-CE-0.7.0-Ubuntu-14.04-ARM_32/bin/computecpp_info 
********************************************************************************

ComputeCpp Info (CE 0.7.0)

********************************************************************************

Toolchain information:

GLIBC version: 2.24
GLIBCXX: 20150426
This version of libstdc++ is supported.

********************************************************************************


Device Info:

Discovered 1 devices matching:
  platform    : <any>
  device type : <any>

--------------------------------------------------------------------------------
Device 0:

  Device is supported                     : NO - Device does not support SPIR
  CL_DEVICE_NAME                          : VideoCore IV GPU
  CL_DEVICE_VENDOR                        : Broadcom
  CL_DRIVER_VERSION                       : 0.4
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU 

If you encounter problems when using any of these OpenCL devices, please consult
this website for known issues:
https://computecpp.codeplay.com/releases/v0.7.0/platform-support-notes

********************************************************************************

Is that a hard blocker, or just a side effect of the device not being one you test?

@lissyx
Author

lissyx commented May 18, 2018

And more verbose output:

pi@rpi3-opencl-20180518:~ $ sudo ComputeCpp-CE-0.7.0-Ubuntu-14.04-ARM_32/bin/computecpp_info --verbose
********************************************************************************

ComputeCpp Info (CE 0.7.0)

********************************************************************************

Toolchain information:

GLIBC version: 2.24
GLIBCXX: 20150426
This version of libstdc++ is supported.

********************************************************************************


Device Info:

Discovered 1 devices matching:
  platform    : <any>
  device type : <any>

--------------------------------------------------------------------------------
Device 0:

  Device is supported                     : NO - Device does not support SPIR
  CL_DEVICE_NAME                          : VideoCore IV GPU
  CL_DEVICE_VENDOR                        : Broadcom
  CL_DRIVER_VERSION                       : 0.4
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU 
  CL_DEVICE_VERSION                       : OpenCL 1.2 VC4CL 0.4
  CL_DEVICE_PROFILE                       : EMBEDDED_PROFILE
  CL_DEVICE_MAX_COMPUTE_UNITS             : 1
  CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS      : 3
  CL_DEVICE_MAX_WORK_ITEM_SIZES           : 12 / 12 / 12
  CL_DEVICE_MAX_WORK_GROUP_SIZE           : 12
  CL_DEVICE_MAX_CLOCK_FREQUENCY           : 300 MHz
  CL_DEVICE_ADDRESS_BITS                  : 32
  CL_DEVICE_HOST_UNIFIED_MEMORY           : YES
  CL_DEVICE_MAX_MEM_ALLOC_SIZE            : 76 MByte
  CL_DEVICE_GLOBAL_MEM_SIZE               : 76 MByte
  CL_DEVICE_ERROR_CORRECTION_SUPPORT      : NO
  CL_DEVICE_LOCAL_MEM_TYPE                : global
  CL_DEVICE_LOCAL_MEM_SIZE                : 77824 KByte
  CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE      : 77824 KByte
  CL_DEVICE_QUEUE_PROPERTIES              : CL_QUEUE_PROFILING_ENABLE
  CL_DEVICE_IMAGE_SUPPORT                 : NO
  CL_DEVICE_MAX_READ_IMAGE_ARGS           : 64
  CL_DEVICE_MAX_WRITE_IMAGE_ARGS          : 64
  CL_DEVICE_IMAGE2D_MAX_WIDTH             : 2048
  CL_DEVICE_IMAGE2D_MAX_HEIGHT            : 2048
  CL_DEVICE_IMAGE3D_MAX_WIDTH             : 2048
  CL_DEVICE_IMAGE3D_MAX_HEIGHT            : 2048
  CL_DEVICE_IMAGE3D_MAX_DEPTH             : 2048
  CL_DEVICE_PREFERRED_VECTOR_WIDTH        : CHAR 16 SHORT 16 INT 16 LONG 0 FLOAT 16 DOUBLE 0 
  CL_DEVICE_EXTENSIONS                    : cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_nv_pragma_unroll cl_arm_core_id cl_ext_atomic_counters_32 cl_khr_initialize_memory


If you encounter problems when using any of these OpenCL devices, please consult
this website for known issues:
https://computecpp.codeplay.com/releases/v0.7.0/platform-support-notes

********************************************************************************

@mirh

mirh commented May 18, 2018

Totally normal afaik.

@DuncanMcBain
Member

If the device doesn't report the Khronos extension "cl_khr_spir" when queried for its device extensions, we say it's not supported. That said, it is definitely possible for a device to... misreport its extensions. I would say that the only way to be sure is to try feeding it actual SPIR IR!
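Using the CL_DEVICE_EXTENSIONS string from the verbose computecpp_info log above, that supported/unsupported verdict amounts to a token lookup in the extension list. A minimal sketch of the check:

```python
# Extension string exactly as VC4CL reported it in the verbose log above.
vc4cl_extensions = (
    "cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics "
    "cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics "
    "cl_khr_byte_addressable_store cl_nv_pragma_unroll cl_arm_core_id "
    "cl_ext_atomic_counters_32 cl_khr_initialize_memory"
)

def supports_spir(extension_string):
    """A device advertises SPIR consumption via the cl_khr_spir
    extension; split into tokens so substrings cannot false-match."""
    return "cl_khr_spir" in extension_string.split()

print(supports_spir(vc4cl_extensions))  # False, matching computecpp_info's verdict
```

In a real program the extension string would come from clGetDeviceInfo with CL_DEVICE_EXTENSIONS; the hard-coded string here is just the one from the log.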

@lissyx
Author

lissyx commented May 21, 2018

Thanks. I checked, and the build does toggle HAS_COMPILER=1, which should expose cl_khr_spir: https://github.com/doe300/VC4CL/blob/7d5d906c8e2e69ff94ae605cdbfe1f7a32c87833/src/vc4cl_config.h#L36-L42 and https://github.com/doe300/VC4CL/blob/7d5d906c8e2e69ff94ae605cdbfe1f7a32c87833/src/Platform.cpp#L48-L51

But those refer to the "platform". There's a Device.cpp returning something else, which does not seem to include cl_khr_spir. So I don't know - a bug in the driver?

@mirh

mirh commented May 21, 2018

Could be.
...
What do you say about moving this to its own issue, and pinging doe300 only there?

@DuncanMcBain
Member

I should have made it clear earlier, but while computecpp_info checks for the presence of these extensions, the actual ComputeCpp library does not - unless you ask it to. So by running, say, the samples in this repo, you'll know pretty quickly whether or not it works.

@lissyx
Author

lissyx commented May 22, 2018

@mirh You're right. I've opened #117 to share my issues around the ARM systems :)

4 participants