Commit 3952cfb

Some tweaks to the SME lazy save scheme (#325)
This patch does four things:

- Explicitly document that ZT0 is a temporary register.

- Clarify that __arm_sme_save and __arm_sme_restore treat both ZA and ZT0 specially.

- Add a general note about how platforms should handle asynchronous control flow (such as POSIX signals). The specific details are out of scope for the AAPCS64, which is why this wasn't mentioned before. However, for the sake of an example, we can at least sketch the approach that Linux will use.

- Change the notes around setjmp and longjmp to recommend that both routines turn ZA off. Just turning ZA off in longjmp isn't enough when jumping out of a Linux-style signal handler, since longjmp would not be able to see any ZA save buffer that was in use when the signal was raised, nor the ZA contents that should be saved to it.

The previous recommendation to turn ZA off only in longjmp was chosen by analogy with exception unwinding; it seemed to make sense in both cases to commit a lazy save once it was known that some frames would be unwound. However, the two cases are not analogous for asynchronous signals, since exception unwinders can examine each intermediate frame and see the point at which a signal handler was entered, whereas longjmp (in general) cannot.

It may not be strictly necessary for longjmp to commit a lazy save if setjmp does; it might be enough for it to force PSTATE.ZA and TPIDR2_EL0 to zero. However, that would be a novel state transition and it would be optimising for what should be a vanishingly rare case (the one where there is an in-progress SME routine between a setjmp and a longjmp). It seems safer to use the standard transition instead.
1 parent 4741495 commit 3952cfb

1 file changed: +90 -22 lines changed
aapcs64/aapcs64.rst

Lines changed: 90 additions & 22 deletions
@@ -264,6 +264,12 @@ changes to the content of the document for that release.
 | | | - Add agnostic-ZA interface and routines to save/restore SME |
 | | | state. |
 +------------+--------------------+------------------------------------------------------------------+
+| | | - Explicitly say that ZT0 is a temporary register. |
+| | | - Add a note about the interaction between the SME lazy save |
+| | | scheme and asynchronous transfers of control. |
+| | | - Recommend that ``setjmp`` as well as ``longjmp`` call |
+| | | ``__arm_za_disable``. |
++------------+--------------------+------------------------------------------------------------------+
 
 References
 ^^^^^^^^^^
@@ -414,7 +420,7 @@ Global register
 Program state
 The state of the program’s memory, including values in machine registers.
 
-Scratch register, temporary register, caller-saved register
+Scratch register, _`temporary register`, caller-saved register
 A register used to hold an intermediate value during a calculation (usually, such values are not named in the program source and have a limited lifetime). If a function needs to preserve the value held in such a register over a call to another function, then the calling function must save and restore the value.
 
 Callee-saved register
@@ -978,6 +984,9 @@ TPIDR2_EL0
 
 See `TPIDR2_EL0`_ for a description of how the AAPCS64 uses this register.
 
+In addition, SME2 defines a 512-bit register ZT0, which is accessible when
+PSTATE.ZA is 1. The AAPCS64 defines ZT0 to be a `temporary register`_.
+
 Threads and processes
 ---------------------
 
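
Reviewer sketch for the hunk above: because ZT0 is a temporary (caller-saved) register, a caller that needs its ZT0 value after a call has to save and restore it itself. The C below is only a toy model of that rule. The array ``model_zt0`` and both function names are hypothetical stand-ins introduced here; real code would move ZT0 with SME2 instructions rather than ``memcpy``.

```c
#include <assert.h>
#include <string.h>

enum { ZT0_BYTES = 64 };  /* ZT0 is 512 bits wide */

/* Hypothetical stand-in for the ZT0 register (not a real API). */
static unsigned char model_zt0[ZT0_BYTES];

/* A callee that makes no promise to preserve ZT0, so it may overwrite it. */
void callee_without_promise(void) {
    memset(model_zt0, 0xFF, ZT0_BYTES);
}

/* A caller that needs its ZT0 value after the call spills and reloads it. */
void caller_keeps_zt0(void) {
    unsigned char spill[ZT0_BYTES];

    memcpy(spill, model_zt0, ZT0_BYTES);   /* save before the call */
    callee_without_promise();              /* may change ZT0 */
    memcpy(model_zt0, spill, ZT0_BYTES);   /* restore after the call */

    assert(memcmp(model_zt0, spill, ZT0_BYTES) == 0);
}
```

A callee that promised to "preserve ZT0" would make the spill and reload unnecessary.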

@@ -1729,6 +1738,68 @@ it induces ``memset`` to modify ``BLK`` before ``memset`` has returned:
 TPIDR2_EL0 = &BLK;
 memset(&BLK, 0, 16); // Non-conforming
 
+Asynchronous transfers of control
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+**(Beta)**
+
+The specification of the lazy save scheme in this document says that
+certain things must be true of a thread at all times after initialization,
+rather than requiring those things to be true only at function-call
+boundaries. As a result, the specification also restricts the order
+in which certain state transitions can happen.
+
+This is done to support asynchronous transfers of control, as provided
+by things like POSIX signals. The exact treatment of such transfers
+is platform-specific and is outside the scope of this specification.
+However, in the case where the asynchronous transfer involves calling
+some form of handler, a platform must implement transfers to and from
+the handler in a way that is compatible with the lazy save scheme.
+This applies even if the handler returns abnormally, such as by calling
+``longjmp`` or by raising an exception, if the platform allows such
+abnormal returns.
+
+One approach that a platform can take for such handlers is as follows:
+
+* Save all SME state before entering a handler.
+
+* Leave a record on the stack so that an unwinder can recover this
+  saved information.
+
+* After saving the SME state, set PSTATE.ZA and TPIDR2_EL0 to zero,
+  so that the handler starts with ZA in the “off” state.
+
+* On a normal return from the handler, restore the saved SME state
+  and resume execution.
+
+* Make ``setjmp`` and ``longjmp`` turn ZA off; see `setjmp and longjmp`_
+  for details.
+
+* Make the platform subroutines for throwing exceptions turn ZA off; see
+  `Exceptions`_ for details. When unwinding out of a handler, use the
+  saved TPIDR2_EL0 information to see whether ZA was previously dormant;
+  that is, if TPIDR2_EL0 pointed to a TPIDR2 block that has a nonnull ZA
+  save buffer. If so, copy the saved ZA contents to this buffer.
+
+Note that if an asynchronous exception is thrown while ZA is active,
+the subroutines that are using that active ZA state cannot catch the
+exception and expect to recover the contents of ZA at the point that
+the exception was thrown.
+
+Other state controlled by PSTATE.ZA
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+**(Beta)**
+
+Access to the SME2 ZT0 register is also controlled by PSTATE.ZA.
+As described in `SME state`_, the AAPCS64 defines ZT0 to be a
+`temporary register`_, meaning that its contents may be changed by a
+call to any subroutine, unless the subroutine makes a specific promise
+not to do so. Subroutines that make such a promise are said to
+“preserve ZT0”.
+
+ZT0 is therefore not handled by the lazy save scheme.
+
 Types of subroutine interface
 -----------------------------
 
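
Reviewer sketch for the hunk above: the handler approach listed under "Asynchronous transfers of control" can be modelled in a few lines of C. This is a simplified illustration under stated assumptions, not the platform's implementation. Every type and function name is hypothetical, ZA is shrunk to a fixed-size byte array, and the TPIDR2 block is reduced to its save-buffer pointer (the real block also records how much of ZA to save).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { ZA_BYTES = 64 };  /* toy size; the real ZA size depends on the vector length */

/* Hypothetical, reduced TPIDR2 block: only the ZA save buffer pointer. */
struct tpidr2_block {
    unsigned char *za_save_buffer;
};

/* Toy model of the per-thread SME state discussed in the text. */
struct sme_state {
    bool pstate_za;
    struct tpidr2_block *tpidr2_el0;
    unsigned char za[ZA_BYTES];
};

/* The record left on the stack so that an unwinder can find the saved state. */
struct signal_record {
    struct sme_state saved;
};

/* Handler entry: save all SME state, then start the handler with ZA off. */
void enter_handler(struct sme_state *thread, struct signal_record *rec) {
    assert(thread != NULL && rec != NULL);
    rec->saved = *thread;
    thread->pstate_za = false;
    thread->tpidr2_el0 = NULL;
    memset(thread->za, 0, ZA_BYTES);
}

/* Normal return: restore the saved SME state and resume. */
void return_from_handler(struct sme_state *thread, const struct signal_record *rec) {
    *thread = rec->saved;
}

/* Unwinding out of the handler: if ZA was dormant when the handler was
 * entered, copy the saved ZA contents to the interrupted code's save buffer.
 * Returns true if a copy was made. */
bool unwind_out_of_handler(const struct signal_record *rec) {
    const struct tpidr2_block *blk = rec->saved.tpidr2_el0;
    if (!rec->saved.pstate_za || blk == NULL || blk->za_save_buffer == NULL)
        return false;
    memcpy(blk->za_save_buffer, rec->saved.za, ZA_BYTES);
    return true;
}
```

In this model, ``unwind_out_of_handler`` returns false when ZA was active rather than dormant, which matches the note that code using active ZA state cannot recover the ZA contents after an asynchronous exception.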

@@ -2411,12 +2482,13 @@ by PSTATE.ZA.
 
 * The subroutine is called ``__arm_sme_save``.
 
-* The subroutine has a custom ``ZA`` `streaming-compatible interface`_ with
+* The subroutine has a `streaming-compatible interface`_ with
   the following properties:
 
   * X1-X15, X19-X29 and SP are call-preserved.
   * Z0-Z31 are call-preserved.
   * P0-P15 are call-preserved.
+  * ZA and ZT0 are handled specially, as described below.
 
 * The subroutine takes the following arguments:
 

@@ -2472,12 +2544,13 @@ enabled by PSTATE.ZA.
 
 * The subroutine is called ``__arm_sme_restore``.
 
-* The subroutine has a custom ``ZA`` `streaming-compatible interface`_ with
-  the following properties:
+* The subroutine has a `streaming-compatible interface`_ with the
+  following properties:
 
   * X1-X15, X19-X29 and SP are call-preserved.
   * Z0-Z31 are call-preserved.
   * P0-P15 are call-preserved.
+  * ZA and ZT0 are handled specially, as described below.
 
 * The subroutine takes the following arguments:
 

@@ -2589,8 +2662,8 @@ conforming way for S to flush any dormant ZA state before S uses ZA itself:
 
 If S has a `private-ZA`_ interface, the following pseudo-code describes a
 conforming way for S to clear PSTATE.ZA. This procedure is useful for
-things like the C subroutine ``longjmp`` (see `setjmp and longjmp`_) and
-exception unwinders (see `Exceptions`_).
+things like the C subroutines ``setjmp`` and ``longjmp`` (see
+`setjmp and longjmp`_) and exception unwinders (see `Exceptions`_).
 
 .. code-block:: c++
 

@@ -3096,27 +3169,22 @@ specifically to ``setjmp`` and ``longjmp``:
 * ZA must be in the “off” state when ``setjmp`` returns to its caller via a
   ``longjmp``.
 
-``longjmp`` can meet this requirement by using ``__arm_za_disable``
-to `turn ZA off`_.
-
-The intention of this definition is to allow a subroutine that has a
-`private-ZA`_ interface to use ``setjmp`` and ``longjmp`` without being aware
-of ZA.
+A platform can meet this requirement by making both ``setjmp`` and ``longjmp``
+call ``__arm_za_disable`` to `turn ZA off`_.
 
-This approach to saving ZA is intended to be conservatively correct.
-It may lead to ``longjmp`` saving the ZA contents for a subroutine that is
-about to be “unwound”, in which case the save is wasted work but is
-otherwise harmless.
+.. note::
 
-``setjmp`` is encouraged not to `commit a lazy save`_. The intention is
-for ``longjmp`` rather than ``setjmp`` to bear the cost of the save,
-because not all calls to ``setjmp`` have a partnering call to
-``longjmp``.
+   A consequence of this is that any existing open-coded copies of
+   ``setjmp`` and ``longjmp`` become invalid.
 
 .. note::
 
-   A consequence of this is that any existing open-coded copies of ``longjmp``
-   become invalid.
+   Versions 2025Q1 and earlier of the AAPCS64 instead recommended that
+   only ``longjmp`` call ``__arm_za_disable``. However, that approach
+   was found to be incompatible with the handling of POSIX signals on
+   some platforms. The underlying requirement that ZA must be in the
+   “off” state when ``setjmp`` returns to its caller via a ``longjmp``
+   is unchanged.
 
 Exceptions
 ----------
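
Reviewer sketch for the hunk above: the reason for making ``setjmp`` as well as ``longjmp`` turn ZA off can be illustrated with a toy C model. It assumes that turning ZA off commits any pending lazy save before clearing PSTATE.ZA and TPIDR2_EL0. All names prefixed ``toy_`` are hypothetical and are not part of the AAPCS64 or of any C library. The sketch models only the ZA bookkeeping and does not perform a real non-local jump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { TOY_ZA_BYTES = 32 };  /* toy size, not the architectural size */

/* Hypothetical, reduced TPIDR2 block: only the ZA save buffer pointer. */
struct toy_tpidr2_block {
    unsigned char *za_save_buffer;
};

struct toy_thread {
    bool pstate_za;
    struct toy_tpidr2_block *tpidr2_el0;
    unsigned char za[TOY_ZA_BYTES];
};

/* Toy analogue of turning ZA off: commit a pending lazy save (if ZA is
 * dormant), then clear TPIDR2_EL0 and PSTATE.ZA. */
void toy_za_disable(struct toy_thread *t) {
    assert(t != NULL);
    if (t->pstate_za && t->tpidr2_el0 != NULL &&
        t->tpidr2_el0->za_save_buffer != NULL) {
        memcpy(t->tpidr2_el0->za_save_buffer, t->za, TOY_ZA_BYTES);
    }
    t->tpidr2_el0 = NULL;
    t->pstate_za = false;
    memset(t->za, 0, TOY_ZA_BYTES);
}

/* Under the new recommendation, both routines start by turning ZA off. */
void toy_setjmp_entry(struct toy_thread *t)  { toy_za_disable(t); }
void toy_longjmp_entry(struct toy_thread *t) { toy_za_disable(t); }
```

In this model, a ``longjmp`` issued from a signal handler that was entered with ZA off finds nothing to save, so any dormant ZA contents belonging to the caller of ``setjmp`` are only preserved if ``setjmp`` itself committed the lazy save.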
