@vladimir-paramuzov commented May 23, 2024

Details:

  • Added an SDPA implementation based on microkernels, using the internal oneDNN API, along with the related infrastructure
  • Current limitations:
    • A fused transpose must not change the position of the innermost dimension (head size)
    • is_causal = true is not supported
    • Only fp16 is supported
    • The num heads dimension must be static
    • Indirect KV is not supported
  • Initial version of a KV cache + SDPA functional test
  • Enabled Transpose+SDPA fusion for static shapes as well
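
The fused-transpose limitation listed above can be expressed as a small check on the permutation order. The sketch below is illustrative only: `keeps_innermost_dim` is a hypothetical helper, not a function from the OpenVINO sources, and it assumes the usual convention that `order[i]` names the input axis that becomes output axis `i`.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of OpenVINO): returns true when a transpose
// permutation leaves the innermost dimension (head size) in the last position.
// Assumption: order[i] is the index of the input axis placed at output axis i.
bool keeps_innermost_dim(const std::vector<int64_t>& order) {
    if (order.empty())
        return true;  // no fused transpose, so nothing can move
    const int64_t last_axis = static_cast<int64_t>(order.size()) - 1;
    return order.back() == last_axis;
}
```

Under this assumption, an order such as {0, 2, 1, 3} (swapping the heads and sequence axes) passes the check, while {0, 1, 3, 2} moves the head size out of the innermost position and would not be eligible for fusion.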

Tickets:

@vladimir-paramuzov added this to the 2024.3 milestone May 23, 2024
@github-actions bot added the "category: build" (OpenVINO cmake script / infra) label May 23, 2024
@github-actions bot added the "category: IE Tests" (OpenVINO Test: plugins and common) label Jun 4, 2024
@vladimir-paramuzov force-pushed the micro_sdpa branch 5 times, most recently from 5dbb9c7 to d149dd4, June 13, 2024 13:03
@github-actions bot removed the "category: IE Tests" (OpenVINO Test: plugins and common) label Jun 13, 2024
@vladimir-paramuzov force-pushed the micro_sdpa branch 2 times, most recently from 5cc640c to 6c596a9, June 14, 2024 11:06
@vladimir-paramuzov marked this pull request as ready for review June 14, 2024 11:08
@vladimir-paramuzov requested review from a team as code owners June 14, 2024 11:08
@vladimir-paramuzov changed the title from "[WIP][GPU] Micro sdpa draft" to "[GPU] Micro sdpa draft" Jun 14, 2024
@vladimir-paramuzov changed the title from "[GPU] Micro sdpa draft" to "[GPU] Micro sdpa" Jun 14, 2024
@vladimir-paramuzov force-pushed the micro_sdpa branch 3 times, most recently from 979a0d4 to 9525423, June 29, 2024 11:50
return false;

// For platforms with DPAS support we don't have any shape-based limitations
if (device_info.supports_immad && cldnn::query_microkernels_supported(m_context->get_engine(), config))
Contributor

According to the kernel selector, the micro SDPA kernel supports only the f16 data type and the 4D bfyx format. Should we relax these limitations in the kernel, or restrict such cases here? Also, a dynamic num_heads dimension is not supported, so the sdpa_ref kernel will be used in that case.

Author

Adjusted the callback to decompose those cases.
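
A minimal sketch of the kind of check discussed in this thread, assuming a simplified description of the SDPA node. Everything here is hypothetical: `SdpaDesc`, `DataType`, `kDynamic`, `kNumHeadsAxis` and `needs_decomposition` are illustrative names rather than OpenVINO APIs, the num_heads axis is assumed to be at index 1, and the return convention of the real callback may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified description of an SDPA node (not an OpenVINO type).
enum class DataType { f16, f32 };

constexpr int64_t kDynamic = -1;      // assumed marker for a dynamic dimension
constexpr std::size_t kNumHeadsAxis = 1;  // assumed layout: [batch, num_heads, seq_len, head_size]

struct SdpaDesc {
    DataType type;
    std::vector<int64_t> query_shape;
};

// Returns true when the node falls outside what the micro SDPA kernel is said
// to support in the review comment: f16 only, 4D input, static num_heads.
bool needs_decomposition(const SdpaDesc& d) {
    if (d.type != DataType::f16)
        return true;  // only f16 is supported
    if (d.query_shape.size() != 4)
        return true;  // only 4D (bfyx) inputs are supported
    if (d.query_shape[kNumHeadsAxis] == kDynamic)
        return true;  // num_heads must be static
    return false;
}
```

With these assumptions, an f16 input of shape {1, 8, kDynamic, 64} stays on the fused path because only the sequence length is dynamic, whereas an f32 input or a dynamic num_heads dimension would be decomposed.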

Contributor

@sshlyapn left a comment

Looks good to me


#ifdef ENABLE_ONEDNN_FOR_GPU

#pragma once
Contributor

nit: this could probably be moved to the top of the file

@p-durandin enabled auto-merge July 1, 2024 13:55
@p-durandin added this pull request to the merge queue Jul 1, 2024
Merged via the queue into openvinotoolkit:master with commit 2918322 Jul 1, 2024
@vladimir-paramuzov deleted the micro_sdpa branch July 3, 2024 11:12

Labels

  • category: build (OpenVINO cmake script / infra)
  • category: GPU (OpenVINO GPU plugin)
  • under_perf_check
