Unable to get working on Broadwell i7 5600U, or missing steps? #20
The GPU seems to be identified as Broadwell in the source as well, under the define |
Broadwell is supported by the Neo driver. Your issue looks similar to the problem already reported in #9. Could you please check it? To operate correctly, the Neo driver requires the Khronos ICD loader: https://github.com/KhronosGroup/OpenCL-ICD-Loader. It is a known limitation that we are trying to solve. |
Ok, |
Okay, after adding a clone of https://github.com/KhronosGroup/OpenCL-ICD-Loader into the set of sources for the Neo driver and building it by hand, re-running CMake, it was properly picked up as this line shows: Then, it's being properly packaged, and |
@pwilma I'm unable to get any debug output, even when forcing |
Yes, debug variables won't work for Release. Please try to build Debug version. |
@pwilma Thanks, I'm doing that right now. BTW, one step I had to make but I'm not completely sure about is: I did git clone the Khronos ICD Loader next to the neo driver, and I built it by hand. Was building by hand required, or should your CMake build system have handled that for me automagically? |
You did it correctly. A manual build of the Khronos ICD Loader is required. The Neo CMake build system won't build it automatically. |
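For anyone retracing this step: on Linux the Khronos ICD loader conventionally discovers drivers through `.icd` files under `/etc/OpenCL/vendors`, each holding the path (or name) of a driver library. Below is a minimal sketch for checking whether any `.icd` file points at an existing library such as the `/opt/intel/opencl/libigdrcl.so` mentioned in this issue. The helper name `list_icd_entries` is hypothetical, and the vendors directory is only the common default, so it may differ on your install.

```python
import os


def list_icd_entries(vendors_dir="/etc/OpenCL/vendors"):
    """Return (icd_file, library, exists) for each .icd file in vendors_dir.

    vendors_dir defaults to the conventional Linux location (an assumption;
    adjust it if your ICD loader was built with a different path).
    """
    entries = []
    if not os.path.isdir(vendors_dir):
        return entries
    for name in sorted(os.listdir(vendors_dir)):
        if not name.endswith(".icd"):
            continue
        with open(os.path.join(vendors_dir, name)) as f:
            library = f.read().strip()
        # A bare library name is resolved by the dynamic linker at load time,
        # so only absolute paths can be checked for existence here.
        exists = os.path.isabs(library) and os.path.exists(library)
        entries.append((name, library, exists))
    return entries


if __name__ == "__main__":
    for icd, lib, ok in list_icd_entries():
        print(f"{icd}: {lib} ({'found' if ok else 'not verified'})")
```

If no entry references the Neo library, `clinfo` going through the ICD loader would plausibly report zero platforms even though the driver itself was built and installed.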
@pwilma Debug-enabled build is failing: |
@pwilma I've removed the nonexistent member, then built and installed the debug build, but I'm still unable to get any debug message :(. Update: it works, but only through
|
Not sure yet, but there's a |
Enabling DRM's debug in
|
@pwilma Here is most of the debug info I could gather so far. Is it possible it is failing because the driver does not yet support enough of the required calls / opencl for proper operation with TensorFlow / ComputeCpp 0.5.1 ? Or do I need a newer kernel (Ubuntu 17.10 is on 4.13) ? |
@pwilma I've also collected some debug files, if it can be of help, tell me what would be the best way to share them:
|
Useful stuff for debugging:
|
@MichalMrozek Thanks! I have the following in
|
I cannot find anything referring to |
Can you try to use this flag ? |
@MichalMrozek Magic! it's not failing anymore, it seems to be doing actual stuff :) |
Looks like there may be a bug in our clEnqueueWriteBuffer implementation. |
@MichalMrozek I can share the binaries with you to reproduce as well, if you need them. It's DeepSpeech + ComputeCpp 0.5.1. So far it has started the run, but it's not completing anything yet; the output is "blocked" on this:
The |
@MichalMrozek Building release and changing the value of
|
@MichalMrozek Okay, performance is very bad, but it executed successfully:
For reference, the same run on pure CPU takes 4-6 secs instead of 835. But for an experimental driver, and with the flag that we had to flip, it's probably not surprising. The good news is that it works! |
It is expected that some tests will fail in non-default debug variable setting. If you are interested in analyzing why execution is slow I kindly suggest following project Good to start would be This will analyze application for potential pitfalls. If you want to find the GPU execution times I recommend |
@MichalMrozek Thanks for those hints, I'll take a look into that. I pushed the testing to comparing the same model:
And on pure CPU:
Given the current state of your driver (and the flag flipped), are those times expected, or is there something valuable to investigate ? |
This flag only changes how the writeBuffer operation is handled: instead of going through the GPU, it uses the CPU for the transfer. It is unlikely to affect performance so significantly, so I think the problem is somewhere else. I think your investigation may provide valuable feedback. |
@bashbaug Yes I did, I was calling with those env variables: |
@lissyx strange, should be working then. To confirm, you're seeing other output in your log (such as OpenCL API calls, from CallLogging)? Since you've also set LogToFile, your log will be in:
If you're seeing other output in your log, but not the performance hints, please file an issue against the Intercept Layer and we can discuss further there - thanks! |
@bashbaug Oh, my bad, I was expecting that on |
@MichalMrozek Okay, so it's confirmed that the (unaligned) allocations are done on the |
Hi @lissyx , You may now try to run the app without DoCpuCopyOnWriteBuffer set to = 1. |
@mejch Thanks for the heads-up, I'm syncing and rebuilding :) |
@mejch Quick test shows some progress :)
So, with |
Thanks for checking, can you share repro steps for multi file tests? |
@MichalMrozek Sure, I'm sending those by email to you. |
Hi, I am running the first command line provided with new input files: And I get SIGABRT after some mmap operation fails (from strace), which does not come from libigdrcl.so: mmap(NULL, 33558528, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc6b7fee000 As I understand it, the above cmd line is working for you? I've also run the second cmd line (mmap version)
I'll try more times to see if I get the same error as you. Thanks, |
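To rule out plain memory pressure, the mapping from the strace line can be retried in isolation. This is a minimal sketch, assuming a Unix-like system and Python's standard `mmap` module; the function name `try_anonymous_map` is hypothetical, and only the size and the flags are taken from the strace output above.

```python
import mmap

# Size in bytes copied from the strace line above (32 MiB + 4 KiB).
SIZE = 33558528


def try_anonymous_map(size=SIZE):
    """Map `size` bytes of private anonymous memory and touch every page.

    Returns the number of pages written. A failure to map or to back the
    pages surfaces as an exception (for example OSError or MemoryError).
    """
    m = mmap.mmap(
        -1,
        size,
        flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
        prot=mmap.PROT_READ | mmap.PROT_WRITE,
    )
    try:
        touched = 0
        for offset in range(0, size, mmap.PAGESIZE):
            m[offset] = 1  # writing forces the kernel to back this page
            touched += 1
        return touched
    finally:
        m.close()


if __name__ == "__main__":
    print(f"touched {try_anonymous_map()} pages")
```

If this succeeds on the machine where the application aborts, the abort probably has a different cause than this single mapping.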
@mejch Good catch, it requires some memory: |
That would be very helpful for further debugging |
@mejch Should have been shared to your colleague @MichalMrozek, tell me if you run into other issue :-) |
Thanks a lot, but I was not able to reproduce the problem with output_graph.pbmm. From the log above I see that abort was called: If the fix does not resolve the issue, can you share the callstack and errno code when the abort is called? It is now line 172 in drm_buffer_object.cpp in BufferObject::exec(): UNRECOVERABLE_IF(true); That will help us understand what leads to this failure in exec. Thanks, |
@mejch Interesting. Unfortunately, I won't be able to test that soon, for personal reasons, but as soon as I can, I'll keep you posted. |
Ok, no problem, I'll try on a different machine and let you know if I was able to reproduce and fix this. |
Awesome, I'll try to find some time to test that asap, but I cannot promise any ETA :) |
No problem, I hope your issues are fixed now. |
@mejch Okay, I could test updated code on my new laptop (i7-8650U), and it seems the issue is fixed :), all inference can be run with both @BartoszDunajski I'll have to retest on my previous laptop, because I'm getting weird results, it seems not much faster with the i7-8650U compared to i7-5600U. |
In terms of GPU performance we are comparing Intel® HD Graphics 5500 and Intel® UHD Graphics 620. Intel® HD Graphics 5500 has a theoretical compute power of ~364 GFLOPS, while Intel® UHD Graphics 620 has ~384-400 GFLOPS. The bottom line is that for compute-bound workloads the performance delta may follow the theoretical compute power, which is not significantly higher for Intel® UHD Graphics 620. |
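As a rough cross-check of those figures, peak FP32 throughput can be estimated as execution units × FLOPs per EU per clock × clock frequency. The sketch below rests on assumptions that are not stated in this thread: 24 EUs for both GPUs, 16 FP32 FLOPs per EU per clock, and clocks of 0.95 GHz and 1.00 GHz, picked because they reproduce the quoted ~364 and ~384 GFLOPS. The function name `peak_gflops` is hypothetical.

```python
def peak_gflops(execution_units, clock_ghz, flops_per_eu_per_clock=16):
    """Theoretical peak FP32 GFLOPS = EUs * FLOPs/EU/clock * clock (GHz)."""
    return execution_units * flops_per_eu_per_clock * clock_ghz


# Assumed configurations (not taken from the thread), see the note above.
hd_5500 = peak_gflops(24, 0.95)
uhd_620 = peak_gflops(24, 1.00)

if __name__ == "__main__":
    print(f"HD Graphics 5500: ~{hd_5500:.1f} GFLOPS")
    print(f"UHD Graphics 620: ~{uhd_620:.1f} GFLOPS")
    print(f"ratio: {uhd_620 / hd_5500:.2f}x")
```

Under these assumptions the ratio is only about 1.05x, which is consistent with the newer laptop not feeling much faster on a compute-bound workload.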
@lissyx looks like all the problems are resolved, can we close this issue ? |
@MichalMrozek It's fine by me. Is that an issue if I post new comments to update on latest retries ? What would be the best communication channel for further discussion about improvements, if needed ? Filing a new issue ? |
For me I like the 1 problem/suggestion == 1 issue approach. |
@MichalMrozek I totally agree with you, let's close this, but I'll keep the work going :). Thanks for your help, we made a lot of progress :) |
@BartoszDunajski I played a bit with those on my new laptop, and it looks like specific tuning is not required anymore ! Thanks ! |
I have built and installed without any error (following exactly the steps from https://github.com/intel/compute-runtime#building) on my laptop (Ubuntu 17.10, Broadwell i7-5600U), but `clinfo` returns 0 platform found. Checking `clinfo` calls with `strace` shows it is properly picking up `/opt/intel/opencl/libigdrcl.so`.
Is Broadwell supported right now? The README states it is (https://github.com/intel/compute-runtime#supported-platforms), but the "GenX" naming is unclear to me, and https://ark.intel.com/products/85215/Intel-Core-i7-5600U-Processor-4M-Cache-up-to-3_20-GHz reports "5th gen". The GPU is Intel HD Graphics 5500, whose naming would be consistent with 5th gen, but it's not listed on https://www.intel.com/content/www/us/en/architecture-and-technology/visual-technology/graphics-overview.html.
So, am I trying to get it working on an unsupported platform, or did I miss anything to get that working? Thanks!