Description
Originally posted here.
The Apple M1 supports ARMv8.4-A, but Julia/LLVM treats it like an A7/Cyclone CPU:
julia> versioninfo()
Julia Version 1.7.0-DEV.1107
Commit 5aca7a37be* (2021-05-15 16:39 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin20.3.0)
CPU: Apple M1
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, cyclone)
Environment:
JULIA_NUM_THREADS = 4
Cyclone is only ARMv8-a, although the page on the A14 claims the Firestorm/Icestorm cores implement ARMv8.5-a.
As such, atomics are implemented with a load-linked/store-conditional loop rather than the single-instruction LSE atomics introduced in ARMv8.1-a:
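For reference, the CPU name Julia/LLVM selected for the host can be checked directly (a quick sketch; per the `versioninfo()` output above, an M1 currently reports `cyclone`):

```julia
# Print the target CPU name Julia/LLVM detected for this host.
# On an Apple M1 with current Julia this is "cyclone", matching the
# "(ORCJIT, cyclone)" line in versioninfo().
println(Sys.CPU_NAME)
```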
julia> a = Threads.Atomic{Int}(1)
Base.Threads.Atomic{Int64}(1)
julia> @code_native Threads.atomic_add!(a, 2)
.section __TEXT,__text,regular,pure_instructions
; ┌ @ atomics.jl:405 within `atomic_add!'
mov x8, x0
L4:
ldaxr x0, [x8]
add x9, x0, x1
stlxr w10, x9, [x8]
cbnz w10, L4
ret
; └
julia> @code_native Threads.atomic_cas!(a, 5, 2)
.section __TEXT,__text,regular,pure_instructions
; ┌ @ atomics.jl:373 within `atomic_cas!'
mov x8, x0
L4:
ldaxr x0, [x8]
cmp x0, x1
b.ne L28
stlxr w9, x2, [x8]
cbnz w9, L4
ret
L28:
clrex
ret
; └
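The operations being lowered here are Julia's usual fetch-and-add and compare-and-swap; a minimal sketch of their semantics, which runs the same on any target regardless of whether LL/SC loops or LSE instructions are emitted:

```julia
using Base.Threads

a = Atomic{Int}(1)

# atomic_add! returns the *previous* value and stores old + 2.
old = atomic_add!(a, 2)
@assert old == 1 && a[] == 3

# atomic_cas! compares against an expected value, swaps on a match,
# and returns whatever was stored before the operation.
prev = atomic_cas!(a, 3, 10)   # expected matches: a[] becomes 10
@assert prev == 3 && a[] == 10

prev = atomic_cas!(a, 5, 0)    # expected does not match: a[] unchanged
@assert prev == 10 && a[] == 10
```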
However, if I start Julia with `-C'armv8.4-a'`:
julia> a = Threads.Atomic{Int}(1)
Base.Threads.Atomic{Int64}(1)
julia> @code_native Threads.atomic_add!(a, 2)
.section __TEXT,__text,regular,pure_instructions
; ┌ @ atomics.jl:405 within `atomic_add!'
ldaddal x1, x0, [x0]
ret
; └
julia> @code_native Threads.atomic_cas!(a, 5, 2)
.section __TEXT,__text,regular,pure_instructions
; ┌ @ atomics.jl:373 within `atomic_cas!'
casal x1, x2, [x0]
mov x0, x1
ret
; └
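To confirm which `-C`/`--cpu-target` string a running session was started with, the parsed options can be inspected via `Base.JLOptions()`; this is a sketch relying on an internal, unexported API that may change between Julia versions:

```julia
# Base.JLOptions() exposes Julia's parsed command-line options.
# cpu_target is a C string pointer and may be NULL depending on how
# the session was launched, so guard before converting it.
opts = Base.JLOptions()
cpu_target = opts.cpu_target == C_NULL ?
    "(unset)" : unsafe_string(opts.cpu_target)
println(cpu_target)
```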
Starting Julia without `-C` flags:
julia> using Octavian
julia> M = K = N = 72; A = rand(M,K); B = rand(K,N); C1 = @time(A * B); C0 = similar(C1);
0.000087 seconds (2 allocations: 40.578 KiB)
julia> @benchmark matmul!($C0,$A,$B) # threaded matmul uses atomics
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 6.425 μs (0.00% GC)
median time: 6.525 μs (0.00% GC)
mean time: 6.530 μs (0.00% GC)
maximum time: 14.592 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 5
With `-C'armv8.4-a'`:
julia> using Octavian
julia> M = K = N = 72; A = rand(M,K); B = rand(K,N); C1 = @time(A * B); C0 = similar(C1);
0.000100 seconds (2 allocations: 40.578 KiB)
julia> @benchmark matmul!($C0,$A,$B) # threaded matmul uses atomics
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 6.258 μs (0.00% GC)
median time: 6.525 μs (0.00% GC)
mean time: 6.532 μs (0.00% GC)
maximum time: 13.475 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 5
I made non-x86 architectures (including the M1) ramp up thread use more slowly, because earlier performance tests suggested the M1 had higher threading overhead. Maybe that was partly because of atomics, and partly because of the lack of a shared L3 cache, and of course maybe for other reasons I don't know.
There's of course more than just atomics separating armv8.4-a/armv8.5-a from armv8-a.