gpu: Fix compilation errors in NVIDIA GPUs with cuDNN 8 #952

Merged
merged 4 commits into uxlfoundation:master on Jan 28, 2021

Conversation

Dr-Noob
Contributor

@Dr-Noob Dr-Noob commented Jan 18, 2021

Description

The current version does not build with cuDNN 8 because of changes in its API; the build fails, as I described in issue #885.

Deprecated calls in cuDNN 8 used in oneDNN:

  • cudnnGetConvolutionBackwardDataAlgorithm
  • cudnnGetConvolutionBackwardFilterAlgorithm
  • cudnnGetConvolutionForwardAlgorithm

All of them were replaced with their _v7 counterparts. They could also have been replaced with the cudnnFind... counterparts, but those do not behave in the same way, so I decided to use the _v7 functions. The API changed: the _v7 functions now return a list of candidate algorithms instead of the single fastest one, so I implemented a simple heuristic that chooses the fastest suitable algorithm from that list.

Because the _v7 functions also build successfully with cuDNN 7, I did not add any conditional to check the cuDNN version; both cuDNN 7 and 8 build successfully from the same code.

I am unable to run the GPU tests even with an untouched master branch, so I did not include any new tests.

Checklist

Code-change submissions

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally?
  • Have you formatted the code using clang-format?

Bug fixes

  • Have you added relevant regression tests?
  • Have you included information on how to reproduce the issue (either in a github issue or in this PR)?

@vpirogov vpirogov added this to the v2.2 milestone Jan 19, 2021
@dzarukin
Contributor

Hi @Dr-Noob, thank you for your contribution. We will take these changes, test them internally with both cuDNN v7 and v8, and come back to you with our results. Does that sound good to you? Thanks.

@dzarukin dzarukin self-assigned this Jan 19, 2021
@Dr-Noob
Contributor Author

Dr-Noob commented Jan 20, 2021

Great, please let me know if you find any problems or if further modifications are needed.

@dzarukin
Contributor

Hi @Dr-Noob, I've got good news and bad news.
The good news: the proposed changes are fully valid with both the v7 and v8 cuDNN versions of the library and work as expected (at least on the CI dataset our team monitors regularly). It means that we will promote them.
The bad news:

  1. It looks like the CUDA backend flow for convolutions has been broken for 1.5 months, since we de-tangled the service stream from the computational stream inside the library. This results in UNIMPLEMENTED cases when running problems in batch mode (as the --batch=test_conv_gpu_ci file does in benchdnn), starting from the second problem. At some point it just explodes. We decided to revert that change for the CUDA backend and investigate later; the explosion might be either a compiler problem or a problem in the integration of the CUDA code, we don't have a definite conclusion yet. This fix will land before we promote this PR.
  2. It seems that cuDNN v8 has removed or broken some support for convolutions/deconvolutions. This is one of 58 shapes reported as unimplemented:
320:UNIMPLEMENTED __REPRO: --conv --engine=gpu --cfg=f16 g1ic3id5oc17od5kd3pd1n"3d_tail_conv:1st"

With v7 this shape dispatched to CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM; it doesn't any more. Moreover, none of the other seven algorithms available for this problem work either. I'm not sure what that means, but to me it looks like a regression, and I don't know whether that is acceptable for you.

That said, we will inform you when we promote this PR. I would kindly ask you to rebase this PR on top of master (we updated it today, and it contains a fix for the broken names) and squash the last commit into the second one. I've also left some minor comments. Thank you.

@Dr-Noob
Contributor Author

Dr-Noob commented Jan 25, 2021

Hi,

I'm not sure I understand you correctly. I have rebased the PR on top of master and then pushed a new commit with the changes you suggested, but I don't know how to squash my commits. Which commits do you want me to squash? Do you want me to squash commits 1, 2, and 3 (fe6a2b1, f083f0c, and 10ced24) into a single commit and then leave the last one (356315f) as the second commit?

@dzarukin
Contributor

Hi @Dr-Noob, sorry for not being clear. We prefer to have commits that don't break any checks. Currently, commit three ("Fix style with clang-format") fixes an issue introduced in commit two ("gpu: Choose best suitable algorithm for convolutions"). I'm asking you to merge commit three into commit two; I'm fine with keeping the remaining three logically separate commits. Once the requested commits have been squashed, you may use git push --force on your branch. Thanks.
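For reference, this kind of squash is usually done with an interactive rebase. Below is a hypothetical, replayable sketch: the commit messages mirror this PR's, but the repository and file names are made up, and the rebase is driven non-interactively via GIT_SEQUENCE_EDITOR purely so the steps can be run verbatim (by hand, one would simply change "pick" to "fixup" in the editor).

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email you@example.com
git config user.name "Your Name"

# Recreate a four-commit history like the one in this PR.
for msg in "gpu: Fix deprecated cuDNN calls" \
           "gpu: Choose best suitable algorithm for convolutions" \
           "Fix style with clang-format" \
           "Apply review suggestions"; do
    echo "$msg" >> src.txt
    git add src.txt
    git commit -qm "$msg"
done

# Mark the third commit as a fixup of the second one: it is melded into
# its predecessor and its message is discarded.
GIT_SEQUENCE_EDITOR='sed -i "3s/^pick/fixup/"' git rebase -i --root

git log --oneline   # three commits remain
# The rewritten branch is then published with: git push --force
```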

@Dr-Noob
Contributor Author

Dr-Noob commented Jan 26, 2021

A little bit hacky, but I think I did what you requested. I hope it looks better now!

@dzarukin
Contributor

Thanks for updating, looks good to me. I left one more small comment. Thanks @echeresh for spotting it.

@Dr-Noob
Contributor Author

Dr-Noob commented Jan 27, 2021

Whoops...fixed

@dzarukin dzarukin merged commit dcf444e into uxlfoundation:master Jan 28, 2021
@dzarukin
Contributor

Thanks for your contribution!

vpirogov pushed a commit that referenced this pull request Feb 1, 2021
Replace previous calls with _v7 counterparts.
Improved best algorithm pick up.