Some tests failing in the M3 Max #10

Closed
@dgazzoni

This is the output of make test on an Apple M3 Max:

Testing AMX_LDX... Failed on iteration 0.0 (operand 0xe7ee80da3d4b9e09)
Testing AMX_LDY... Failed on iteration 0.0 (operand 0xe7ee80da3d4b9e09)
Testing AMX_LDZ... OK   
Testing AMX_LDZI... OK   
Testing AMX_STX... OK   
Testing AMX_STY... OK   
Testing AMX_STZ... OK   
Testing AMX_STZI... OK   
Testing AMX_EXTRX... OK   
Testing AMX_EXTRY... OK   
Testing AMX_MAC16... OK   
Testing AMX_FMA16... OK   
Testing AMX_FMA32... OK   
Testing AMX_FMA64... OK   
Testing AMX_FMS16... OK   
Testing AMX_FMS32... OK   
Testing AMX_FMS64... OK   
Testing AMX_VECINT... OK   
Testing AMX_VECFP... OK   
Testing AMX_MATINT... Failed on iteration 1.575 (operand 0xd060b15ad5995f37)
Testing AMX_MATFP... OK   
Testing AMX_GENLUT... OK

This suggests that either Apple broke compatibility with previous versions, or there are new features that use some of the previously ignored bits in the parameters to these instructions.

I think the former is unlikely: I have been writing lots of AMX code lately, with excellent test coverage, and I have yet to see any unexplained failures in my software tests, e.g. something that behaves differently on the M3 than on the M1 I also have here (using your documented M1 features, and also some of the documented M2 features, which work as expected on the M3). So hopefully there are new features in the M3.

I investigated this a bit by changing the random values. For AMX_LDX and AMX_LDY, out of the previously ignored bits (63, 61 and 59), only bit 61 is set in every failing case; bits 63 and 59 are sometimes set and sometimes not (indeed, I've seen a failure in which bits 63 and 59 were both clear and only bit 61 was set).
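As a quick illustration of the check described above, here is a minimal Python sketch that reports the state of bits 63, 61 and 59 for the failing AMX_LDX/AMX_LDY operand from the test output. The helper name `bit_states` is made up for this example and is not part of the test suite.

```python
# Operand reported by the failing AMX_LDX / AMX_LDY tests above.
FAILING_OPERAND = 0xE7EE80DA3D4B9E09


def bit_states(operand, bits=(63, 61, 59)):
    """Return {bit: True/False} for the given bit positions of a 64-bit operand."""
    return {b: bool((operand >> b) & 1) for b in bits}


if __name__ == "__main__":
    # Bits 63, 61 and 59 are the previously ignored bits mentioned in the text.
    print(bit_states(FAILING_OPERAND))
```

For this particular operand, bits 63 and 61 are set and bit 59 is clear, which is consistent with bit 61 being the one that matters.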

So I wrote a small program to investigate this, and found that bit 61 represents a strided load: when loading pairs, the stride is 4 (that is, if you start at X0, it loads to X0 and X4), whereas when loading 4 at a time, the stride is 2 (e.g. X0, X2, X4, X6). I will attach a test program and its output on my M3. For AMX_LDY, results are identical.
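To make the observed register pattern concrete, here is a rough Python model of which registers a strided load would touch. It only encodes the two cases observed above (stride 4 for pairs, stride 2 for four registers); the function name `strided_targets` is hypothetical, and the wrap-around modulo 8 is an assumption that was not tested here.

```python
def strided_targets(start, count, num_regs=8):
    """Model the destination register indices of a strided load (bit 61 set).

    count=2 (load pair)  -> stride 4, e.g. X0, X4
    count=4 (load four)  -> stride 2, e.g. X0, X2, X4, X6
    Wrapping modulo num_regs is an ASSUMPTION, not something verified on hardware.
    """
    if count == 2:
        stride = 4
    elif count == 4:
        stride = 2
    else:
        raise ValueError("only pair (2) and quad (4) loads were observed")
    return [(start + i * stride) % num_regs for i in range(count)]


if __name__ == "__main__":
    print(strided_targets(0, 2))  # pair load starting at X0
    print(strided_targets(0, 4))  # quad load starting at X0
```

Starting at X0, this reproduces the X0, X4 and X0, X2, X4, X6 patterns described above; results for other starting registers depend on the untested wrap-around assumption.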

As for AMX_MATINT, I collected a bunch of values where the tests fail:

0xd060b15ad5995f37
0xd26ab256885620e0
0xd060b15ad5995f37
0x0e61b375b73c8104
0xbc7870a58e4864bc
0xce6b3046d4af6812
0xa069335db08b4b0e
0x5a71b34bf47fe485
0xee7172c6ce0a04ec
0xa87ef14a1baca54d
0xb662b045bc40cdb0
0xc4697074e454ab6f

ANDing these together, the bits common to all of them are 44, 45, 53 and 54. I see that having bits 53 and 54 set means an indexed load in ALU mode 8. For that mode, there are two lane width modes (i.e. bits 45:42): 10 or any other value. However, having bits 44 and 45 set would correspond to 12 (binary 1100, assuming bits 43 and 42 are clear), which suggests the M3 may treat that value differently.
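The AND can be reproduced with a short Python sketch. It takes the failing AMX_MATINT operands listed above, lists the bits set in all of them, and extracts the bits 45:42 field from each. The helper names `common_bits` and `field` are made up for this example.

```python
from functools import reduce

# Failing AMX_MATINT operands collected above (the repeated value listed once).
FAILING = [
    0xD060B15AD5995F37, 0xD26AB256885620E0, 0x0E61B375B73C8104,
    0xBC7870A58E4864BC, 0xCE6B3046D4AF6812, 0xA069335DB08B4B0E,
    0x5A71B34BF47FE485, 0xEE7172C6CE0A04EC, 0xA87EF14A1BACA54D,
    0xB662B045BC40CDB0, 0xC4697074E454AB6F,
]


def common_bits(values):
    """Return the positions of bits that are set in every value."""
    mask = reduce(lambda a, b: a & b, values)
    return [b for b in range(64) if (mask >> b) & 1]


def field(value, hi, lo):
    """Extract the inclusive bit field value[hi:lo]."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


if __name__ == "__main__":
    print("common bits:", common_bits(FAILING))
    for v in FAILING:
        print(f"{v:#018x}  bits 45:42 = {field(v, 45, 42)}")
```

For the values listed, the common bits come out as 44, 45, 53 and 54, and the bits 45:42 field is 12 in every one of them.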

If you'd like to investigate, but don't have access to an M3, I can run any tests you need; just let me know.

ldx_m3_src.txt
ldx_m3_out.txt
