[Impeller] add compute pass API for memory barriers, re-enable for Vulkan. #49946
```diff
@@ -104,11 +104,7 @@ fml::Status ComputePassVK::Compute(const ISize& grid_size) {
   // Special case for linear processing.
   if (height == 1) {
-    int64_t minimum = 1;
-    int64_t threadGroups = std::max(
-        static_cast<int64_t>(std::ceil(width * 1.0 / max_wg_size_[0] * 1.0)),
-        minimum);
-    command_buffer_vk.dispatch(threadGroups, 1, 1);
+    command_buffer_vk.dispatch(width, 1, 1);
   } else {
     while (width > max_wg_size_[0]) {
       width = std::max(static_cast<int64_t>(1), width / 2);
```

**Review comment:** This logic was incorrect...
```diff
@@ -216,8 +212,53 @@ bool ComputePassVK::BindResource(size_t binding,
   return true;
 }
 
+// Note:
+// https://github.com/KhronosGroup/Vulkan-Docs/wiki/Synchronization-Examples
+// Seems to suggest that anything more finely grained than a global memory
+// barrier is likely to be weakened into a global barrier. Confirming this on
+// mobile devices will require some experimentation.
+
+// |ComputePass|
+void ComputePassVK::AddBufferMemoryBarrier() {
+  vk::MemoryBarrier barrier;
+  barrier.srcAccessMask = vk::AccessFlagBits::eShaderWrite;
+  barrier.dstAccessMask = vk::AccessFlagBits::eShaderRead;
+
+  command_buffer_->GetEncoder()->GetCommandBuffer().pipelineBarrier(
+      vk::PipelineStageFlagBits::eComputeShader,
+      vk::PipelineStageFlagBits::eComputeShader, {}, 1, &barrier, 0, {}, 0, {});
+}
+
+// |ComputePass|
+void ComputePassVK::AddTextureMemoryBarrier() {
+  vk::MemoryBarrier barrier;
+  barrier.srcAccessMask = vk::AccessFlagBits::eShaderWrite;
+  barrier.dstAccessMask = vk::AccessFlagBits::eShaderRead;
+
+  command_buffer_->GetEncoder()->GetCommandBuffer().pipelineBarrier(
+      vk::PipelineStageFlagBits::eComputeShader,
+      vk::PipelineStageFlagBits::eComputeShader, {}, 1, &barrier, 0, {}, 0, {});
+}
+
+// |ComputePass|
+bool ComputePassVK::EncodeCommands() const {
+  // Since we only use a global memory barrier, we don't have to worry about
+  // compute-to-compute dependencies across cmd buffers. Instead, we pessimize
+  // here and assume that we wrote to a storage image or buffer and that a
+  // render pass will read from it. If there are ever scenarios where we end up
+  // with compute-to-compute dependencies this should be revisited.
+
+  // This does not currently handle image barriers as we do not use them
+  // for anything.
+  vk::MemoryBarrier barrier;
+  barrier.srcAccessMask = vk::AccessFlagBits::eShaderWrite;
+  barrier.dstAccessMask =
+      vk::AccessFlagBits::eIndexRead | vk::AccessFlagBits::eVertexAttributeRead;
+
+  command_buffer_->GetEncoder()->GetCommandBuffer().pipelineBarrier(
+      vk::PipelineStageFlagBits::eComputeShader,
+      vk::PipelineStageFlagBits::eVertexInput, {}, 1, &barrier, 0, {}, 0, {});
+
+  return true;
+}
```

**Review discussion (on the synchronization note):**
- Reviewer: I don't see how that is the case. Though you do need the buffer and texture references to specify the barriers on. So this approach is definitely easier to use, if coarse.
- Author: We could easily add these to the API and create the more specific barriers, but given that we don't have any way to evaluate this yet, I'd prefer to follow the recommendations.
- Reviewer: sgtm

**Review discussion (on the access masks):**
- Reviewer: These to me seem underspecified. Right now, this says "all commands before the barrier must have buffer writes in their compute shaders be visible to buffer reads in the compute shaders of commands after the barrier." The same can apply to transfer read/write (…). You can just (…). This would also match up with the Metal semantics implemented above.
- Author: Why would I need to specify transfer dependencies in a barrier API that is specifically for compute-compute dependencies? I think I clarified in the API docs that it is only for compute-compute dependencies, but I can add a caveat that it does not cover other cases.
- Reviewer: Fair. But the first compute pass still needs to wait for, say, transfer writes on the buffers it works on to be visible. So my expectation would be that compute-compute dependencies, while necessary, are not sufficient.
- Author: We don't use transfer commands anywhere yet, though. My expectation is that if we needed this, we would add blit/transfer -> compute barrier functionality to the blit pass and not to the compute pass.
```diff
@@ -308,6 +308,7 @@ TEST_P(ComputeTest, MultiStageInputAndOutput) {
     CS1::BindOutput(*pass, DeviceBuffer::AsBufferView(output_buffer_1));
 
     ASSERT_TRUE(pass->Compute(ISize(512, 1)).ok());
+    pass->AddBufferMemoryBarrier();
   }
 
   {
```

**Review comment:** Since we're now running concurrently, we need a memory barrier here so that step 2 sees the output of step 1.
**Review comment:** By default we were creating a serial encoder, which acts like it has a global memory barrier between each command. Since we need to specify the barriers for Vulkan, we might as well use the concurrent dispatcher for iOS too.