This patch does four things:

- Explicitly document that ZT0 is a temporary register.

- Clarify that __arm_sme_save and __arm_sme_restore treat both ZA
  and ZT0 specially.

- Add a general note about how platforms should handle asynchronous
  control flow (such as POSIX signals). The specific details are out of
  scope for the AAPCS64, which is why this wasn't mentioned before.
  However, for the sake of an example, we can at least sketch the
  approach that Linux will use.

- Change the notes around setjmp and longjmp to recommend that both
  routines turn ZA off. Just turning ZA off in longjmp isn't enough
  when jumping out of a Linux-style signal handler, since longjmp
  would not be able to see any ZA save buffer that was in use when the
  signal was raised, nor the ZA contents that should be saved to it.

The previous recommendation to turn ZA off only in longjmp was chosen
by analogy with exception unwinding; it seemed to make sense in both
cases to commit a lazy save once it was known that some frames would
be unwound. However, the two cases are not analogous for asynchronous
signals, since exception unwinders can examine each intermediate frame
and see the point at which a signal handler was entered, whereas
longjmp (in general) cannot.

It may not be strictly necessary for longjmp to commit a lazy save
if setjmp does; it might be enough for it to force PSTATE.ZA and
TPIDR2_EL0 to zero. However, that would be a novel state transition
and it would be optimising for what should be a vanishingly rare case
(the one where there is an in-progress SME routine between a setjmp
and a longjmp). It seems safer to use the standard transition instead.
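
The setjmp/longjmp argument above can be illustrated with a toy model.
The following is only a sketch under stated assumptions, not the
platform implementation: it models PSTATE.ZA, TPIDR2_EL0 and ZA as
plain C data, and every name in it (thread_state, tpidr2_block,
model_za_disable, model_enter_handler, caller_recovers_za, ZA_BYTES)
is hypothetical and exists only for this illustration. It assumes a
signal handler is entered with ZA off, as in the approach sketched for
Linux, and that "turning ZA off" commits any pending lazy save first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy size (assumption); the real ZA size depends on the vector length. */
enum { ZA_BYTES = 64 };

/* Hypothetical, simplified stand-in for a TPIDR2 block. */
struct tpidr2_block {
    uint8_t *za_save_buffer;
};

/* Hypothetical model of the per-thread state discussed above. */
struct thread_state {
    bool pstate_za;                  /* models PSTATE.ZA */
    struct tpidr2_block *tpidr2_el0; /* models TPIDR2_EL0 */
    uint8_t za[ZA_BYTES];            /* models the contents of ZA */
};

/* Models "turn ZA off": commit a pending lazy save, then disable ZA. */
void model_za_disable(struct thread_state *t)
{
    if (t->pstate_za && t->tpidr2_el0 && t->tpidr2_el0->za_save_buffer)
        memcpy(t->tpidr2_el0->za_save_buffer, t->za, ZA_BYTES);
    t->tpidr2_el0 = NULL;
    t->pstate_za = false;
}

/* Models handler entry: stash the state in a record that longjmp
   cannot see, then start the handler with ZA off. */
void model_enter_handler(struct thread_state *t, struct thread_state *saved)
{
    *saved = *t;
    t->pstate_za = false;
    t->tpidr2_el0 = NULL;
}

/* A caller sets up a lazy save and calls a private-ZA routine that
   calls setjmp. A signal then arrives and its handler calls longjmp.
   Returns true if the caller's save buffer ends up holding its ZA data. */
bool caller_recovers_za(bool setjmp_turns_za_off)
{
    struct thread_state t = { .pstate_za = true };
    struct thread_state saved;
    uint8_t buf[ZA_BYTES] = { 0 };
    uint8_t expected[ZA_BYTES];
    struct tpidr2_block blk = { .za_save_buffer = buf };

    memset(t.za, 0xAB, ZA_BYTES);     /* caller's live ZA contents */
    memset(expected, 0xAB, ZA_BYTES);
    t.tpidr2_el0 = &blk;              /* lazy save set up: ZA is dormant */

    if (setjmp_turns_za_off)
        model_za_disable(&t);         /* setjmp commits the lazy save */

    model_enter_handler(&t, &saved);  /* signal raised */
    model_za_disable(&t);             /* longjmp: sees only the "off" state */

    /* The caller finds TPIDR2_EL0 null and reloads ZA from buf. */
    return memcmp(buf, expected, ZA_BYTES) == 0;
}
```

In this model, caller_recovers_za(true) holds and
caller_recovers_za(false) does not. When only longjmp turns ZA off, the
dormant contents exist only in the handler's saved record, so the save
buffer is never filled.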
A register used to hold an intermediate value during a calculation (usually, such values are not named in the program source and have a limited lifetime). If a function needs to preserve the value held in such a register over a call to another function, then the calling function must save and restore the value.
 Callee-saved register
@@ -978,6 +984,9 @@ TPIDR2_EL0

 See `TPIDR2_EL0`_ for a description of how the AAPCS64 uses this register.

+In addition, SME2 defines a 512-bit register ZT0, which is accessible when
+PSTATE.ZA is 1. The AAPCS64 defines ZT0 to be a `temporary register`_.
+
 Threads and processes
 ---------------------
@@ -1729,6 +1738,68 @@ it induces ``memset`` to modify ``BLK`` before ``memset`` has returned:
     TPIDR2_EL0 = &BLK;
     memset(&BLK, 0, 16); // Non-conforming

+Asynchronous transfers of control
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+**(Beta)**
+
+The specification of the lazy save scheme in this document says that
+certain things must be true of a thread at all times after initialization,
+rather than requiring those things to be true only at function-call
+boundaries. As a result, the specification also restricts the order
+in which certain state transitions can happen.
+
+This is done to support asynchronous transfers of control, as provided
+by things like POSIX signals. The exact treatment of such transfers
+is platform-specific and is outside the scope of this specification.
+However, in the case where the asynchronous transfer involves calling
+some form of handler, a platform must implement transfers to and from
+the handler in a way that is compatible with the lazy save scheme.
+This applies even if the handler returns abnormally, such as by calling
+``longjmp`` or by raising an exception, if the platform allows such
+abnormal returns.
+
+One approach that a platform can take for such handlers is as follows:
+
+* Save all SME state before entering a handler.
+
+* Leave a record on the stack so that an unwinder can recover this
+  saved information.
+
+* After saving the SME state, set PSTATE.ZA and TPIDR2_EL0 to zero,
+  so that the handler starts with ZA in the “off” state.
+
+* On a normal return from the handler, restore the saved SME state
+  and resume execution.
+
+* Make ``setjmp`` and ``longjmp`` turn ZA off; see `setjmp and longjmp`_
+  for details.
+
+* Make the platform subroutines for throwing exceptions turn ZA off; see
+  `Exceptions`_ for details. When unwinding out of a handler, use the
+  saved TPIDR2_EL0 information to see whether ZA was previously dormant;
+  that is, if TPIDR2_EL0 pointed to a TPIDR2 block that has a nonnull ZA
+  save buffer. If so, copy the saved ZA contents to this buffer.
+
+Note that if an asynchronous exception is thrown while ZA is active,
+the subroutines that are using that active ZA state cannot catch the
+exception and expect to recover the contents of ZA at the point that
+the exception was thrown.
+
+Other state controlled by PSTATE.ZA
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+**(Beta)**
+
+Access to the SME2 ZT0 register is also controlled by PSTATE.ZA.
+As described in `SME state`_, the AAPCS64 defines ZT0 to be a
+`temporary register`_, meaning that its contents may be changed by a
+call to any subroutine, unless the subroutine makes a specific promise
+not to do so. Subroutines that make such a promise are said to
+“preserve ZT0”.
+
+ZT0 is therefore not handled by the lazy save scheme.
+
 Types of subroutine interface
 -----------------------------
@@ -2411,12 +2482,13 @@ by PSTATE.ZA.

 * The subroutine is called ``__arm_sme_save``.

-* The subroutine has a custom ``ZA`` `streaming-compatible interface`_ with
+* The subroutine has a `streaming-compatible interface`_ with
   the following properties:

   * X1-X15, X19-X29 and SP are call-preserved.
   * Z0-Z31 are call-preserved.
   * P0-P15 are call-preserved.
+  * ZA and ZT0 are handled specially, as described below.

 * The subroutine takes the following arguments:
@@ -2472,12 +2544,13 @@ enabled by PSTATE.ZA.

 * The subroutine is called ``__arm_sme_restore``.

-* The subroutine has a custom ``ZA`` `streaming-compatible interface`_ with
-  the following properties:
+* The subroutine has a `streaming-compatible interface`_ with the
+  following properties:

   * X1-X15, X19-X29 and SP are call-preserved.
   * Z0-Z31 are call-preserved.
   * P0-P15 are call-preserved.
+  * ZA and ZT0 are handled specially, as described below.

 * The subroutine takes the following arguments:
@@ -2589,8 +2662,8 @@ conforming way for S to flush any dormant ZA state before S uses ZA itself:

 If S has a `private-ZA`_ interface, the following pseudo-code describes a
 conforming way for S to clear PSTATE.ZA. This procedure is useful for
-things like the C subroutine ``longjmp`` (see `setjmp and longjmp`_) and
-exception unwinders (see `Exceptions`_).
+things like the C subroutines ``setjmp`` and ``longjmp`` (see
+`setjmp and longjmp`_) and exception unwinders (see `Exceptions`_).

 .. code-block:: c++
@@ -3096,27 +3169,22 @@ specifically to ``setjmp`` and ``longjmp``:
 * ZA must be in the “off” state when ``setjmp`` returns to its caller via a
   ``longjmp``.

-``longjmp`` can meet this requirement by using ``__arm_za_disable``
-to `turn ZA off`_.
-
-The intention of this definition is to allow a subroutine that has a
-`private-ZA`_ interface to use ``setjmp`` and ``longjmp`` without being aware
-of ZA.
+A platform can meet this requirement by making both ``setjmp`` and ``longjmp``
+call ``__arm_za_disable`` to `turn ZA off`_.

-This approach to saving ZA is intended to be conservatively correct.
-It may lead to ``longjmp`` saving the ZA contents for a subroutine that is
-about to be “unwound”, in which case the save is wasted work but is
-otherwise harmless.
+.. note::

-``setjmp`` is encouraged not to `commit a lazy save`_. The intention is
-for ``longjmp`` rather than ``setjmp`` to bear the cost of the save,
-because not all calls to ``setjmp`` have a partnering call to
-``longjmp``.
+   A consequence of this is that any existing open-coded copies of
+   ``setjmp`` and ``longjmp`` become invalid.

 .. note::

-   A consequence of this is that any existing open-coded copies of ``longjmp``
-   become invalid.
+   Versions 2025Q1 and earlier of the AAPCS64 instead recommended that
+   only ``longjmp`` call ``__arm_za_disable``. However, that approach
+   was found to be incompatible with the handling of POSIX signals on
+   some platforms. The underlying requirement that ZA must be in the
+   “off” state when ``setjmp`` returns to its caller via a ``longjmp``