Commit 1366dbd

[ET-VK] Changing texture access pattern for conv2d pw op to improve performance.
Pull Request resolved: #7476

This diff changes the texture access pattern for conv2d pw op to iterate first on x axis then y and then z to improve performance.

ghstack-source-id: 260166241
@exported-using-ghexport

Differential Revision: [D67769100](https://our.internmc.facebook.com/intern/diff/D67769100/)
1 parent c805499 commit 1366dbd

1 file changed: +4 -4 lines changed


backends/vulkan/runtime/graph/ops/glsl/conv2d_pw.glsl

@@ -43,13 +43,13 @@ shared u16vec2 pos_shared[gl_WorkGroupSize.x * gl_WorkGroupSize.y * gl_WorkGroup
  * size is only 1x1, making it easier to re-use loaded texels from t_kernel.
  */
 void main() {
-  const uint16_t out_limits_y_scaled = uint16_t((out_limits.y + TILE_SIZE - 1) / TILE_SIZE);
+  const uvec2 out_limits_scaled = (out_limits.xy + TILE_SIZE - 1) / TILE_SIZE;
   const uint shared_mem_stride = gl_WorkGroupSize.x * gl_WorkGroupSize.y * gl_WorkGroupSize.z;
 
   const u16vec3 gpos = u16vec3(
-    gl_GlobalInvocationID.x / (out_limits_y_scaled * out_limits.z),
-    (gl_GlobalInvocationID.x / out_limits.z) % out_limits_y_scaled,
-    gl_GlobalInvocationID.x % out_limits.z);
+    gl_GlobalInvocationID.x % out_limits_scaled.x,
+    (gl_GlobalInvocationID.x / out_limits_scaled.x) % out_limits_scaled.y,
+    gl_GlobalInvocationID.x / (out_limits_scaled.x * out_limits_scaled.y));
 
   // Output position for TILE_SIZE = 2
   // +--------+--------+
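
The index arithmetic in this hunk can be checked on the CPU. The following is a minimal Python sketch, not part of the commit: it mirrors the two decompositions of the flat invocation index into a 3D position as they appear in the removed and added lines. The function names (`ceil_div`, `gpos_z_fastest`, `gpos_x_fastest`) and the example extents are hypothetical, chosen only for illustration. The commit message states that the new order improves performance but does not say why, so the sketch only shows which axis varies fastest.

```python
def ceil_div(n: int, d: int) -> int:
    """Integer ceiling division, mirroring (n + d - 1) / d in the shader."""
    return (n + d - 1) // d


def gpos_z_fastest(i: int, limits_y_scaled: int, limits_z: int) -> tuple:
    """Removed lines: z varies fastest, then y, then x."""
    return (
        i // (limits_y_scaled * limits_z),
        (i // limits_z) % limits_y_scaled,
        i % limits_z,
    )


def gpos_x_fastest(i: int, limits_x_scaled: int, limits_y_scaled: int) -> tuple:
    """Added lines: x varies fastest, then y, then z."""
    return (
        i % limits_x_scaled,
        (i // limits_x_scaled) % limits_y_scaled,
        i // (limits_x_scaled * limits_y_scaled),
    )


# Hypothetical output extents (x, y, z) and tile size, for illustration only.
OUT_LIMITS = (5, 3, 2)
TILE_SIZE = 2
SCALED_X = ceil_div(OUT_LIMITS[0], TILE_SIZE)  # 3
SCALED_Y = ceil_div(OUT_LIMITS[1], TILE_SIZE)  # 2

if __name__ == "__main__":
    for i in range(4):
        old = gpos_z_fastest(i, SCALED_Y, OUT_LIMITS[2])
        new = gpos_x_fastest(i, SCALED_X, SCALED_Y)
        print(i, "old:", old, "new:", new)
```

With these example extents, consecutive invocation indices step along z under the removed decomposition and along x under the added one, which matches the "x axis then y and then z" order described in the commit message.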
