[AMD] Skip mfma layout in maybeDuplicate #4170

zhanglx13 · 2024-06-19T22:12:15Z

The workaround introduced in #4048 "forgot" to skip mfma layout.


ThomasRaoux · 2024-06-20T01:17:28Z

include/triton/Conversion/TritonGPUToLLVM/ElementwiseOpToLLVMBase.h

@@ -88,6 +88,8 @@ class ElementwiseOpConversionBase : public ConvertOpToLLVMPattern<SourceOp> {
      // encoding not available
      return resultVals;
    Attribute baseEncoding = encoding;
+    if (isa<AMDMfmaEncodingAttr>(baseEncoding))


Are slices of MFMA okay? Shouldn't this check go on line 95?

zhanglx13 (Jun 20, 2024): Slices of MFMA are OK after #3870.

antiagainst (Jun 21, 2024): Can we add some comments explaining why we skip MFMA here, to make this clear?
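To make the guard placement in this thread concrete, here is a minimal, self-contained C++ sketch. It is not Triton code: `Kind`, `Encoding`, `dedupStandIn`, and `maybeDeduplicateSketch` are hypothetical stand-ins for the MLIR layout attributes and the real `ElementwiseOpConversionBase` logic, and the "deduplication" step is invented purely for illustration. It assumes, based on the diff and the question about line 95, that the MFMA check inspects the encoding itself before any slice unwrapping, so a top-level MFMA encoding is returned untouched while a slice whose parent is MFMA still goes through the workaround.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for MLIR layout attributes; only the kinds needed
// to illustrate the guard are modeled.
enum class Kind { Blocked, Mfma, Slice };

struct Encoding {
  Kind kind;
  std::shared_ptr<Encoding> parent;  // non-null only when kind == Kind::Slice
};

// Hypothetical stand-in for the deduplication workaround: every group of
// `groupSize` consecutive values reuses the group's first value.
// This is NOT Triton's actual logic.
std::vector<int> dedupStandIn(std::vector<int> vals, std::size_t groupSize) {
  assert(groupSize > 0);
  for (std::size_t i = 0; i < vals.size(); ++i)
    vals[i] = vals[i - i % groupSize];  // copy from the start of the group
  return vals;
}

// Sketch of the guard: only a top-level MFMA encoding is skipped. The check
// looks at the encoding itself, not at a slice's parent, so a slice of MFMA
// still reaches the workaround.
std::vector<int> maybeDeduplicateSketch(const Encoding* encoding,
                                        std::vector<int> resultVals,
                                        std::size_t groupSize) {
  if (!encoding)  // encoding not available
    return resultVals;
  if (encoding->kind == Kind::Mfma)  // the early return this PR adds
    return resultVals;
  return dedupStandIn(std::move(resultVals), groupSize);
}
```

Placing the check before any slice handling, rather than after unwrapping to the parent layout, is what keeps slices of MFMA on the workaround path, which matches the reply that slices are fine after #3870.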


A later squashed commit that references this pull request bundles the following changes:

Update

Add a more meaningful check to make sure we are not merging blocks (#4186)

This is a follow-up to #4176 (comment). I am now counting the number of blocks with (17) and without (31) block merging. I double-checked to make sure this does not pass when we use an aggressive region simplification strategy.

[AMD] Skip mfma layout in maybeDuplicate (#4170)

The workaround introduced in #4048 "forgot" to skip mfma layout.

[TEST] Merge duplicate `max_num_imprecise_acc` tests and improve code (#4191)

[DOCS][NFC] Fix doc formatting problems (#4195)

1. f-strings cannot be used as docstrings in Python.
2. URLs should follow the reStructuredText format.
3. Code snippets in a code block should be indented.

Tested and passed on a local machine.

[BACKEND] Fix regression in pipeliner pre-checks. (#4196)

During some previous refactoring we changed the logic and started pipelining cases that had incompatible shared encoding. This was missed because one of the lit tests had not been updated :(

Remove `tl.multiple_of` call from tma persistent kernel (#4198)

[AMD] Guard against null in `BypassEpilogueSMEM` (#4203)

`val.getDefiningOp()` can return `nullptr`. In this case, we must fail the `BypassEpilogueSMEM` rewrite pass for the given op. This prevents run-time crashes.

[FRONTEND][NFC] Fix type checking, conditional logic, and loop structures for improved readability and performance (#4208)

Document TRITON_HOME (#4210)

Document the existence of the `TRITON_HOME` environment variable. The `TRITON_HOME` variable controls the location of the `.triton` directory that stores, among other things, the files downloaded during a `pip install -e python` virtualenv build. By default, this is located in the user's home directory, at `~/.triton`. I was trying to build Triton on a system with a large local disk but limited network home-directory space, and the `pip` command kept failing with out-of-disk-space errors. It turned out that during installation, large files were downloaded to the `~/.triton` directory, causing the failure. After checking that it was not `pip` doing this, I found the `TRITON_HOME` variable, which allowed me to work around the issue and build Triton successfully. After seconding #4007, I decided to contribute this documentation fix.

Co-authored-by: sree <sree@buckyball>

[BACKEND] Fix regression in i1 reduction (#4215)

Recent refactoring broke i1 shared memory load.

[BUILD] update URL for LLVM tarballs (#4216)

[BACKEND] Fix divisibility analysis for shift ops (#4221)

Divisibility does not ensure that a value is not 0; therefore we cannot use divisibility as a minimum for shifted values.

Support FP8 constant (#4222)

This unblocks compilation of kernels like the one below, which do not operate arithmetically on FP8.

```
@triton.jit
def triton_poi_fused__scaled_mm__to_copy_constant_pad_nd_lift_fresh_2(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 400624
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)[:]
    xmask = xindex < xnumel
    x0 = xindex % 784
    x1 = (xindex // 784)
    x2 = xindex
    tmp0 = x0
    tmp1 = tl.full([1], 769, tl.int64)
    tmp2 = tmp0 < tmp1
    tmp3 = tl.load(in_ptr0 + (x0 + (769*x1)), tmp2 & xmask, other=0.0)
    tmp4 = tmp3.to(tl.float8e4nv)
    tmp5 = tl.full(tmp4.shape, 0.0, tmp4.dtype)
    tmp6 = tl.where(tmp2, tmp4, tmp5)
    tl.store(out_ptr0 + (x2), tmp6, xmask)
```

[INTERPRETER] Implement implicit tensor conversion for assignment operators (#4214)


[AMD] Skip mfma layout in maybeDuplicate

c79231d

The workaround introduced in triton-lang#4048 "forgot" to skip mfma layout.

zhanglx13 requested a review from ThomasRaoux June 20, 2024 00:07

zhanglx13 marked this pull request as ready for review June 20, 2024 00:07

zhanglx13 requested a review from ptillet as a code owner June 20, 2024 00:07

zhanglx13 requested a review from antiagainst June 20, 2024 00:07

ThomasRaoux reviewed Jun 20, 2024


Added comments

dca557e

ThomasRaoux approved these changes Jun 24, 2024


zhanglx13 merged commit 0a66c1b into triton-lang:main Jun 24, 2024
6 checks passed

jlebar mentioned this pull request Jun 25, 2024

Pass repr=key when calling JITFunction.cache_hook #4207

Closed

Jokeren pushed a commit that referenced this pull request Jul 1, 2024

[AMD] Skip mfma layout in maybeDuplicate (#4170)

34766d6

The workaround introduced in #4048 "forgot" to skip mfma layout.

bertmaher pushed a commit to bertmaher/triton that referenced this pull request Sep 24, 2024

[AMD] Skip mfma layout in maybeDuplicate (triton-lang#4170)

67874ae

The workaround introduced in triton-lang#4048 "forgot" to skip mfma layout.

bertmaher pushed a commit to bertmaher/triton that referenced this pull request Dec 10, 2024

[AMD] Skip mfma layout in maybeDuplicate (triton-lang#4170)

7d3f8b8

The workaround introduced in triton-lang#4048 "forgot" to skip mfma layout.

