[Impeller] add compute pass API for memory barriers, re-enable for Vulkan. #49946
Changes from 10 commits
@@ -213,8 +213,53 @@ bool ComputePassVK::BindResource(size_t binding,
  return true;
}

// Note:
// https://github.com/KhronosGroup/Vulkan-Docs/wiki/Synchronization-Examples
Review comment: I don't see how that is the case. Though you do need the buffer and texture references to specify the barriers on. So this approach is definitely easier to use, if coarse.

Reply: We could easily add these to the API and create the more specific barriers, but given that we don't have any way to evaluate this yet, I'd prefer to follow the recommendations.

Reply: sgtm
// Seems to suggest that anything more finely grained than a global memory
// barrier is likely to be ignored. Confirming this on mobile devices will
// require some experimentation.
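The thread above mentions needing buffer and texture references for finer-grained barriers. As a hedged sketch (not code from this PR), a buffer-scoped variant of the global barrier below might look roughly like this; `cmd` and `buffer` are placeholder parameters:

```cpp
#include <vulkan/vulkan.hpp>

// Hypothetical fine-grained alternative: a barrier scoped to one buffer.
// Per the Khronos wiki note above, drivers may treat this no differently
// from a global vk::MemoryBarrier, at the cost of plumbing the buffer
// reference through the API.
void AddScopedBufferBarrier(vk::CommandBuffer cmd, vk::Buffer buffer) {
  vk::BufferMemoryBarrier barrier;
  barrier.srcAccessMask = vk::AccessFlagBits::eShaderWrite;
  barrier.dstAccessMask = vk::AccessFlagBits::eShaderRead;
  barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
  barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
  barrier.buffer = buffer;
  barrier.offset = 0;
  barrier.size = VK_WHOLE_SIZE;

  cmd.pipelineBarrier(vk::PipelineStageFlagBits::eComputeShader,
                      vk::PipelineStageFlagBits::eComputeShader, {}, 0,
                      nullptr, 1, &barrier, 0, nullptr);
}
```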
// |ComputePass|
void ComputePassVK::AddBufferMemoryBarrier() {
  vk::MemoryBarrier barrier;
  barrier.srcAccessMask = vk::AccessFlagBits::eShaderWrite;
Review comment: These to me seem underspecified. Right now, this says "all commands before the barrier must have buffer writes in their compute shaders be visible to buffer reads in the compute shaders of commands after the barrier". The same can apply to transfer read-write (…). You can just (…).

Reply: This would also match up with the Metal semantics implemented above.

Reply: Why would I need to specify transfer dependencies in a barrier API that is specifically for compute-compute dependencies?

Reply: I think I clarified in the API docs that it is only for compute-compute dependencies, but I can add a caveat that it does not cover other cases.

Reply: Fair. But the first compute pass still needs to wait for, say, transfer writes on the buffers it works on to be visible. So my expectation would be that compute-compute dependencies, while necessary, are not sufficient.

Reply: We don't use transfer commands anywhere yet, though. My expectation is that if we needed this, we would add blit/transfer -> compute barrier functionality to the blit pass and not to the compute pass.
  barrier.dstAccessMask = vk::AccessFlagBits::eShaderRead;

  command_buffer_->GetEncoder()->GetCommandBuffer().pipelineBarrier(
      vk::PipelineStageFlagBits::eComputeShader,
      vk::PipelineStageFlagBits::eComputeShader, {}, 1, &barrier, 0, {}, 0, {});
}
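A hedged sketch of what the reviewer appears to be suggesting (the exact flag names in the comment were lost in extraction): `vk::AccessFlagBits::eMemoryWrite` and `eMemoryRead` are the catch-all access masks, and `eTransfer` is added to the stage masks here so transfer commands would be covered too. This is an illustration of the suggestion, not code from the PR:

```cpp
// Reviewer-suggested broadening (illustrative): eMemoryWrite/eMemoryRead
// subsume the specific shader access flags. Covering transfer commands
// also requires eTransfer in the stage masks, since access masks are
// scoped to the stages they are paired with.
void AddGlobalBarrierBroad(vk::CommandBuffer cmd) {
  vk::MemoryBarrier barrier;
  barrier.srcAccessMask = vk::AccessFlagBits::eMemoryWrite;
  barrier.dstAccessMask = vk::AccessFlagBits::eMemoryRead;

  cmd.pipelineBarrier(vk::PipelineStageFlagBits::eComputeShader |
                          vk::PipelineStageFlagBits::eTransfer,
                      vk::PipelineStageFlagBits::eComputeShader |
                          vk::PipelineStageFlagBits::eTransfer,
                      {}, 1, &barrier, 0, nullptr, 0, nullptr);
}
```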
// |ComputePass|
void ComputePassVK::AddTextureMemoryBarrier() {
  vk::MemoryBarrier barrier;
  barrier.srcAccessMask = vk::AccessFlagBits::eShaderWrite;
  barrier.dstAccessMask = vk::AccessFlagBits::eShaderRead;

  command_buffer_->GetEncoder()->GetCommandBuffer().pipelineBarrier(
      vk::PipelineStageFlagBits::eComputeShader,
      vk::PipelineStageFlagBits::eComputeShader, {}, 1, &barrier, 0, {}, 0, {});
}
// |ComputePass|
bool ComputePassVK::EncodeCommands() const {
  // Since we only use a global memory barrier, we don't have to worry about
  // compute-to-compute dependencies across command buffers. Instead, we
  // pessimize here and assume that we wrote to a storage image or buffer and
  // that a render pass will read from it. If there are ever scenarios where
  // we end up with compute-to-compute dependencies, this should be revisited.

  // This does not currently handle image barriers as we do not use them
  // for anything.
  vk::MemoryBarrier barrier;
  barrier.srcAccessMask = vk::AccessFlagBits::eShaderWrite;
  barrier.dstAccessMask =
      vk::AccessFlagBits::eIndexRead | vk::AccessFlagBits::eVertexAttributeRead;

  command_buffer_->GetEncoder()->GetCommandBuffer().pipelineBarrier(
      vk::PipelineStageFlagBits::eComputeShader,
      vk::PipelineStageFlagBits::eVertexInput, {}, 1, &barrier, 0, {}, 0, {});

  return true;
}
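As a worked example of the dependency this pessimistic barrier guards, here is a hypothetical raw-Vulkan sequence (not Impeller code; `cmd` and `vertex_buffer` are placeholders, and render pass setup is elided) in which a compute dispatch fills a buffer that a later draw reads as vertex data:

```cpp
// Hypothetical raw-Vulkan view of the hazard EncodeCommands pessimizes for:
// a compute dispatch fills a buffer that a later draw reads as vertex data.
void ComputeThenDraw(vk::CommandBuffer cmd, vk::Buffer vertex_buffer) {
  cmd.dispatch(512, 1, 1);  // Compute shader writes vertex_buffer.

  // Same shape as the barrier emitted above: compute-shader writes must be
  // visible before the vertex-input stage reads.
  vk::MemoryBarrier barrier;
  barrier.srcAccessMask = vk::AccessFlagBits::eShaderWrite;
  barrier.dstAccessMask =
      vk::AccessFlagBits::eIndexRead | vk::AccessFlagBits::eVertexAttributeRead;
  cmd.pipelineBarrier(vk::PipelineStageFlagBits::eComputeShader,
                      vk::PipelineStageFlagBits::eVertexInput, {}, 1, &barrier,
                      0, nullptr, 0, nullptr);

  // ... begin render pass and bind a graphics pipeline (elided) ...
  vk::DeviceSize offset = 0;
  cmd.bindVertexBuffers(0, 1, &vertex_buffer, &offset);
  cmd.draw(3, 1, 0, 0);
  // ... end render pass (elided) ...
}
```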
@@ -35,6 +35,25 @@ class ComputePass : public ResourceBinder {

  virtual fml::Status Compute(const ISize& grid_size) = 0;

  /// @brief Ensures all previously encoded commands have finished writing to
  ///        buffers before any following commands can read from those buffers.
  ///
  /// Call this method after encoding a command that writes to a storage
  /// buffer that a subsequent command will read from, provided that command
  /// is a compute command and not a render pass command. It does not
  /// matter if the compute command is in a different command buffer.
  virtual void AddBufferMemoryBarrier() = 0;

  /// @brief Ensures all previously encoded commands have finished writing to
Review comment: Minor edit for clarity: "Ensures that side effects (to textures) of all commands previously encoded in the pass are visible to subsequent commands in the pass." Since "finished writing" isn't quite the same as having those writes be visible to subsequent commands, because of caches and such.

Reply: Done

  ///        textures before any following commands can read from those
  ///        textures.
  ///
  /// Call this method after encoding a command that writes to a storage
Review comment: This sentence is quite confusing to me. Did you mean "Call this method", as a recommended usage?

Reply: Yeah, removed. I don't know what I was doing here.

  /// texture that a subsequent command will read from, provided that command
  /// is a compute command and not a render pass command. It does not
  /// matter if the compute command is in a different command buffer.
Review comment: Different command buffer but still subject to queue submission order, right? Perhaps I am being pedantic.

Reply: Right, done. Except of course on Metal, where the command buffers are themselves a unit of synchronization.

  virtual void AddTextureMemoryBarrier() = 0;
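A sketch of intended usage for the texture barrier, assuming a hypothetical two-stage compute filter; `BindDownsamplePass` and `BindBlurPass` are illustrative stand-ins for pipeline and resource binding, not real Impeller API:

```cpp
// Hypothetical two-stage compute filter: the first dispatch writes a storage
// texture that the second dispatch reads.
bool EncodeTwoStageFilter(ComputePass& pass, const ISize& grid_size) {
  BindDownsamplePass(pass);  // Writes a storage texture (assumed helper).
  if (!pass.Compute(grid_size).ok()) {
    return false;
  }

  // Make the first dispatch's texture writes visible to the second dispatch.
  pass.AddTextureMemoryBarrier();

  BindBlurPass(pass);  // Reads the texture written above (assumed helper).
  if (!pass.Compute(grid_size).ok()) {
    return false;
  }

  return pass.EncodeCommands();
}
```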

  //----------------------------------------------------------------------------
  /// @brief Encode the recorded commands to the underlying command buffer.
  ///

@@ -43,6 +62,8 @@ class ComputePass : public ResourceBinder {
  ///
  virtual bool EncodeCommands() const = 0;

  const Context& GetContext() const { return *context_; }

 protected:
  const std::shared_ptr<const Context> context_;
@@ -308,6 +308,7 @@ TEST_P(ComputeTest, MultiStageInputAndOutput) {
    CS1::BindOutput(*pass, DeviceBuffer::AsBufferView(output_buffer_1));

    ASSERT_TRUE(pass->Compute(ISize(512, 1)).ok());
    pass->AddBufferMemoryBarrier();
Review comment: Since we're now running concurrently, we need a memory barrier here so that step 2 sees the output of step 1.
  }

  {
Review comment: By default we were creating a serial encoder, which acts like it has a global memory barrier between each command. Since we need to specify the barriers for Vulkan, we might as well use the concurrent dispatcher for iOS too.
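For reference, a hedged metal-cpp sketch of the serial-versus-concurrent difference described here; the function and pipeline names are illustrative, not Impeller code. A serial encoder orders dispatches as written, while a concurrent encoder needs the explicit barrier, mirroring what AddBufferMemoryBarrier does on Vulkan:

```cpp
#include <Metal/Metal.hpp>

// Assumed: cmd_buffer, step1, and step2 are valid metal-cpp objects. With
// MTL::DispatchTypeSerial the encoder behaves as if a barrier sat between
// dispatches; with MTL::DispatchTypeConcurrent the barrier must be explicit.
void EncodeConcurrent(MTL::CommandBuffer* cmd_buffer,
                      MTL::ComputePipelineState* step1,
                      MTL::ComputePipelineState* step2) {
  MTL::ComputeCommandEncoder* encoder =
      cmd_buffer->computeCommandEncoder(MTL::DispatchTypeConcurrent);

  encoder->setComputePipelineState(step1);
  encoder->dispatchThreadgroups(MTL::Size::Make(512, 1, 1),
                                MTL::Size::Make(32, 1, 1));

  // Without this, step2 may run before step1's buffer writes are visible.
  encoder->memoryBarrier(MTL::BarrierScopeBuffers);

  encoder->setComputePipelineState(step2);
  encoder->dispatchThreadgroups(MTL::Size::Make(512, 1, 1),
                                MTL::Size::Make(32, 1, 1));

  encoder->endEncoding();
}
```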