
Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration #10133


Merged (33 commits) on Nov 7, 2024

Conversation


@zhiyuan1i zhiyuan1i commented Nov 2, 2024

Overview

This update focuses on two major optimizations for RWKV6 operators:

  1. Standardize operator naming for better code readability
  2. Implement CPU multi-core parallel acceleration to improve inference performance

@github-actions github-actions bot added testing Everything test related Nvidia GPU Issues specific to Nvidia GPUs ggml changes relating to the ggml tensor library for machine learning labels Nov 2, 2024
@github-actions github-actions bot added the SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language label Nov 2, 2024
@zhiyuan1i (Contributor, Author)

The SYCL backend of WKV6 is still being tested and may be pushed in the near future.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Nov 2, 2024
@zhiyuan1i zhiyuan1i changed the title Optimize RWKV6 Operator Naming and Implement Multi-core CPU Acceleration Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration Nov 2, 2024
@ggerganov ggerganov (Member) left a comment

@airMeng Can someone on your team review the SYCL changes?

@zhiyuan1i zhiyuan1i requested a review from ggerganov November 4, 2024 13:58
@airMeng airMeng (Collaborator) left a comment

Overall LGTM except for some minor comments.

@zhiyuan1i zhiyuan1i requested a review from ggerganov November 4, 2024 15:57
@NeoZhangJianyu NeoZhangJianyu (Collaborator) left a comment

@uniartisan
It's great work, including the refactoring of the SYCL backend.
I tested the code with the base cases, and they passed.

Thank you!

@airMeng airMeng merged commit 3bcd40b into ggml-org:master Nov 7, 2024
53 checks passed
Alcpz added a commit that referenced this pull request Nov 13, 2024
* Fixes broken build for the SYCL CUDA backend caused by non-explicit gemm call in outprod (merged in with RWKV6 in
Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration #10133)

* Marks permuted MUL_MAT as unsupported to be able to run test-backend-ops

* Fixes asserts in norm to fix debug builds.
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
…eleration (ggml-org#10133)

* rwkv6: rename to wkv6

* rwkv6: support avx2 avx512 armv8 armv9

* rwkv6: update cuda file name

* rwkv6: rename params

* wkv on sycl

* sycl: add some ops

* sycl: Enhance OP support judgment

* wkv6: drop armv9 and transfer to GGML style

ggml-ci

* sync : ggml

* update the function to use appropriate types

* fix define error

* Update ggml/src/ggml-cpu.c

* add appropriate asserts

* move element-wise functions outside

* put the declaration outside the loop

* rewrite to be more inline with the common pattern for distributing threads

* use recommended way GGML_TENSOR_LOCALS

---------

Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
Co-authored-by: Plamen Minev <[email protected]>
Co-authored-by: Yuri Khrustalev <[email protected]>
Co-authored-by: Meng, Hengyu <[email protected]>
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024