Debugging kernel creation failure on Intel GPU w/Beignet driver (tl;dr use Intel Neo unified driver) #82
Hi @lissyx, as I might have mentioned in the other thread (and I'm sorry if I didn't), the reason this is likely not working is that our Beignet was having some issues reliably parsing the output from our compiler. What OS are you using? Are you using a package-manager version of Beignet, or did you compile it yourself? A way to test whether this is a TensorFlow-specific problem would be to try to run the samples from this repository. If any of them fail with the same error, then my guess is that it's a Beignet incompatibility. For what it's worth, there is a document in the ComputeCpp package with descriptions (if brief) of all the error codes. Often ComputeCpp is simply "passing on" an error from the OpenCL implementation and cannot give any more information (like in this case: the "One module without kernel function" output is actually from Beignet). Would you be able to try Intel's closed-source GPU driver instead?
Thanks!
I might be able to try the closed-source driver, but I don't want to mess too much with my system so far. I read some things about SPIR-32 vs SPIR-64? I saw some bugs referring to that, but unluckily, trying to build TensorFlow with SYCL support while forcing SPIR-32 would fail in an unexpected way within the SYCL headers, about some redefinition of int. I'll give a closer look at the samples as suggested :)
Building against ComputeCpp 0.4.0 somehow fails on my system, using GCC 5.4 and 6.4:
But using 0.5.0, the build completed. The sad thing is that I'm building TensorFlow with 0.4.0 because its build fails with 0.5.0 :)
So I guess that matches the "it's your driver" path :[
@lissyx You are seeing those errors with v0.4.0 because of a mismatch between the ComputeCpp compiler and the library on your path. How do you have LD_LIBRARY_PATH set up?
@rodburns I made no change to LD_LIBRARY_PATH, but I don't think it's a big deal: given that 0.5.0 builds and explodes with the same kind of error as TensorFlow, it's enough to know that this is more likely related to the driver than anything else, which is all I needed to know so far :)
There have been interface changes between ComputeCpp v0.4.0 and v0.5.0, matching the changes between specification versions 1.2 and 1.2.1. All TensorFlow branches after a certain point (roughly two weeks ago) will no longer build with old versions of ComputeCpp, and vice versa. You might have more success compiling Beignet on your own machine, but it's unlikely.
@DuncanMcBain Thanks, I suspected that kind of thing, but since nothing was written down for sure I could only speculate. I'll try the Intel closed-source driver to verify :)
Turns out that one can get quite a bit of debugging output from Beignet by tracing the use of those macros:
Hi @lissyx, did you make any progress with this? I'm afraid we can't really support Beignet, but if the problem reproduces on the other Intel drivers we can look into that. Thanks!
Thanks for pinging, sadly I had no time, between a bad recovery from jet lag and a shoulder issue, but it's still on my radar for sure :)
OK, that's fine. Let us know if you find out anything else! :) |
@DuncanMcBain Hello, I still had no time for that, but I might be able to hack around soonish. Now, I wonder what version of ComputeCpp I should test. It seems that even right now, tensorflow/master does not build with 0.5.0 nor 0.5.1. Am I going to waste my time?
At the moment, we would recommend either dev/eigen_mehdi or dev/amd_gpu. I am not sure which of these two should perform better, but either should work with your setup (and ComputeCpp 0.5.1!) |
Right, https://github.com/lukeiwanski/tensorflow/tree/dev/amd_gpu seems to be the most up to date. I'll give it a try, thanks! BTW, are you planning to merge this upstream? And if so, do you have any ETA?
Slight clarification: dev/amd_gpu is a branch focusing particularly on AMD performance, so I'm not sure what it will look like on Intel. @lukeiwanski can say more about our upstreaming plans - I'm not entirely sure!
@DuncanMcBain Okay, so far I could not get anything to build; it's failing on something AVX-related:
Right, never mind the comment above. Turns out it was just the ComputeCpp compiler choking on something in KenLM's code, which we don't care about at all, so I've been able to move forward by adding "kenlm" to the list of folders to skip in
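For context, a folder skip list like the one described here typically works as a simple path check in the compiler wrapper: sources under a skipped folder go to the host compiler only and never reach compute++. A hypothetical sketch (folder names other than "kenlm" are invented for illustration):

```python
# Sketch of a skip-list check in a compute++ wrapper script.
# Files whose path contains a skipped folder bypass device compilation,
# so the device compiler never sees code it might choke on.
SKIP_FOLDERS = ["external", "kenlm"]  # "kenlm" added per the workaround above

def skip_device_compilation(source_path):
    """True if this file should be built by the host compiler only."""
    parts = source_path.split("/")
    return any(folder in parts for folder in SKIP_FOLDERS)

print(skip_device_compilation("native_client/kenlm/lm/model.cc"))     # True
print(skip_device_compilation("tensorflow/core/kernels/conv_ops.cc")) # False
```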
OK, great! Sorry for not replying, I had some stuff to do this week (compilation improvements, in fact). Glad to hear it's working! |
Well, it's building :-). Still broken on my GTX1080 and with the Intel open-source driver. But at least I have an up-to-date build to test with the closed-source driver and AMD devices when I can :)
Ah, I see. We don't really support enough PTX output to be able to run all of TensorFlow on any nvidia card, though we'd like to improve that soon. I'm pretty certain that all development on the Beignet driver has stopped, and as such the issues you're seeing are unlikely to go away. Keep us informed of how it's going! |
@DuncanMcBain Do you have actual information from Intel about Beignet being stopped? Looking at the Git history, it seems not dead, even though not very active: https://cgit.freedesktop.org/beignet/log/. That'd be unfortunate if they stopped :/. BTW, the docs mention Windows 7/Intel support; is that the only combination, or can I expect good support on Intel GPUs on any recent Windows?
Huh, I swear I'd seen an announcement on Phoronix. I might be thinking of this from today: https://www.phoronix.com/scan.php?page=news_item&px=Intel-New-Compute-Runtime Our test infrastructure doesn't yet fully cover Windows 10/Intel. I imagine it should work, or at least I am unaware of any barriers to it working, but we can't say we support it (since we don't test that combination). |
@DuncanMcBain Thanks for the confirmation for Windows 10. Also, the Phoronix news is not bad; this Neo driver is the new one, then. Aside from that, I've been able to get a ComputeCpp 0.5.1 build based on our build system, so I can hack on this more easily. Currently relying on your
I'll have a look at the Neo driver, if I can :)
Haha, the Neo docs state: My i7-5600U is Broadwell :-)
@DuncanMcBain So far, it's failing on Neo's side, but I have no idea if it's just an expected failure because the driver is too new, or anything else. I've shared some of the debug on their github issue tracker, if you are curious: intel/compute-runtime#20 (comment) |
OK, thanks for the link! That's cool to see that there's already some engagement there 😄 |
Yeah, you're right, this is a bit of a nasty problem. I am not sure how to proceed with this! That would be the place to add the flags to, but since sometimes the host compiler is compute++, it might not be possible to avoid passing that flag to compute++. |
Hi @lissyx, as it happens we tried a little test here internally. We have been able to get compute++ to use the system assembler by passing
Sure, but for the next days / weeks I won't be working, for personal reasons, so I cannot guarantee a quick reply :-/. But I should be able to at least get the error, maybe tomorrow?
That's fine, no worries! We'll be here when you're able to give it another try :) If you manage to get the error by tomorrow, I'll post what information I can; if not, we can pick it up when you are back working.
@DuncanMcBain You are lucky, I've got a spare moment :). This is the error:
And it happens with
Ah, so it's actually GCC that doesn't support the -no-integrated-as argument. In that case, you could add it to the compute++-specific arguments in computecpp.tpl, starting on line 56. That should work! Probably the best way is to add the -Wa, arguments in useDriver. It was easy enough that I actually made a diff. This might not apply cleanly, but it should have enough context to work it out:
diff --git a/third_party/sycl/crosstool/computecpp.tpl b/third_party/sycl/crosstool/computecpp.tpl
index 8f3535dbe1..d51f7d2b7c 100755
--- a/third_party/sycl/crosstool/computecpp.tpl
+++ b/third_party/sycl/crosstool/computecpp.tpl
@@ -68,6 +68,7 @@ def get_device_compiler_flags(compiler_flags):
'-DEIGEN_HAS_C99_MATH=1',
'-DEIGEN_HAS_CXX11_MATH=1',
'-DDISABLE_SKINNY=1',
+ '-no-integrated-as',
]
return compiler_flags + computecpp_flags
@@ -84,6 +85,11 @@ def checkComputeCppIsSupported():
def useDriver(compiler_flags):
output_file_index = compiler_flags.index('-o') + 1
output_file_name = compiler_flags[output_file_index]
+ compiler_flags += [
+ '-Wa,--defsym,powf=powf@GLIBC_2.2.5',
+ '-Wa,--defsym,expf=expf@GLIBC_2.2.5',
+ '-Wa,--defsym,logf=logf@GLIBC_2.2.5',
+ ]
# Check whether we should disable double or half support
if DOUBLE_SUPPORT == "0":
@DuncanMcBain Ok, clearly, this is too hacky. I guess I'll just try to find another solution. After all, Bionic will be released soon and ships libc6 2.27; not worth the pain.
Oh OK, if you're able to upgrade your test machine then I guess that makes the most sense. If there are any problems we can discuss them here, but otherwise this issue seems fairly sorted for now? |
@DuncanMcBain Yes, we can mark that as fixed. For now I'll keep the T450s with me as well to be able to test on both GPUs. I'm still on and off for paternity leave, but I want to continue hacking on OpenCL. Should we have a discussion in a better-suited place? |
(The OP can close his own issues himself)
I might be misunderstanding, but it seemed to me that the original issue (running on Intel GPU) is more-or-less fixed. If there's another issue here that I've missed (or forgotten!) then we can keep this open (though it might be more efficient to move discussion to a new issue). |
Well, I was pinged when the discussion turned to debug builds skyrocketing in memory usage... I'm not sure where I should report that, then.
Yep, we can close that, and I'll file new issues in the future for more targeted work :)
@mirh you're totally right about the memory usage. Can you make a separate issue for that as well, please? |
I now have that problem with Eigen I mentioned here, when building debug. So... it would seem wrong to open a report when I can't even get to the point of reproducing it.
Bazel's already at version 0.12.0? They move pretty fast... |
@DuncanMcBain Just adding some noise here, but looking at hardware support, I only saw ARM64, nothing about ARMv7. I'd like to explore OpenCL on the RPi3, since it looks like the VC4CL project has moved in a good direction: https://github.com/doe300/VC4CL. They claim OpenCL 1.2 embedded profile compatibility, which seems like it would be compatible with ComputeCpp, right? Except RPi3 distros are ARM and not ARM64 :-). Do you have any plans to provide ARM binaries, or should I hack an ARM64 system onto my RPi3 to test that?
TIL newer RPis are ARMv8. |
@mirh The SoC is indeed ARMv8, but the Raspbian distro is not. I missed that there was a 32-bit build under Ubuntu 14.04; thanks for the warning on types :)
@mirh is correct: selecting Ubuntu 14.04 gives you the option of an arm32 download. We should make that page clearer. It looks like that implementation might work, though we've never tried it; it seems to implement enough of the API, as well as both SPIR and SPIR-V support, which is pretty cool! You might try the SDK first to see if it runs (some samples will definitely fail). The version of ComputeCpp built on 14.04 uses the older GCC ABI, as documented here: https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html. Apologies if you know this already, but I seem to remember the RPi shipping with a compiler newer than 4.8; there's a flag in the SDK for handling that: https://github.com/codeplaysoftware/computecpp-sdk/blob/master/cmake/Modules/FindComputeCpp.cmake#L59
@DuncanMcBain I'm moving forward on testing this VC4 driver for the RPi3's GPU. I've been able to cross-compile and install it, and it should have been built with the SPIRV frontend config enabled. Yet, using computecpp_info, I get:
Is that a hard blocker, or just a side effect of the device not being tested by you?
And more verbose output:
Totally normal afaik. |
If the device doesn't report the Khronos extension "cl_khr_spir" when queried for its device extensions, we say it's not supported. That said, it is definitely possible for a device to... misreport its extensions. I would say that the only way to be sure is to try actual SPIR-V code IR! |
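What computecpp_info does here essentially amounts to a substring check on the device's reported extension string; a minimal sketch (the extension strings below are made up for illustration):

```python
# Sketch of the extension check computecpp_info performs: OpenCL devices
# report a space-separated extension string, and SPIR support is inferred
# from the presence of "cl_khr_spir" in it. A device can misreport this.
def supports_spir(extension_string):
    return "cl_khr_spir" in extension_string.split()

# Invented example strings: a device reporting SPIR vs. one reporting
# only SPIR-V ingestion (like the VC4CL case discussed here).
print(supports_spir("cl_khr_global_int32_base_atomics cl_khr_spir"))  # True
print(supports_spir("cl_khr_il_program"))                             # False
```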
Thanks. I checked, and the build does toggle
But those are referring to the "platform". There's a Device.cpp returning something else, which does not seem to include
Could be. |
I should have made it clear earlier, but where computecpp_info checks for the presence of these extensions, the actual ComputeCpp library does not - unless you ask it to. So by running, say, the samples on this repo, you'll know pretty quickly whether or not it works. |
Trying to get OpenCL builds on top of TensorFlow, I am running into this kind of failure:
I had to force-enable support for Intel GPUs with Beignet, as suggested on #78, but I am seeing the very same error (same kernel name, same error code) on an NVIDIA GTX1080 card. Now, I understand that neither is really expected to work, as documented here: #78 (comment) for NVIDIA, and the blacklist was likely there for a reason.
However, I'd like to dig deeper and understand why this is failing, especially in case it is not related to hardware/driver support but rather to the model itself. So far, looking into the SDK/source for this RT0101 error code was not helpful at all, and I could not find anything documenting how to debug ComputeCpp kernel creation further.
Thanks for any debugging pointers, docs and tips :)