Commit d8bde8f

Update the ULFM Readme
Comment indentation
Signed-off-by: Aurelien Bouteiller <[email protected]>
1 parent b7f9d29 commit d8bde8f

2 files changed: +149 -103 lines changed

docs/features/ulfm.rst

Lines changed: 145 additions & 99 deletions
@@ -6,6 +6,18 @@ User-Level Fault Mitigation (ULFM)
 This chapter documents the features and options specific to the **User
 Level Failure Mitigation (ULFM)** Open MPI implementation.
 
+Quick Start
+-----------
+
+This is an extremely terse summary of how to use ULFM:
+
+.. code-block::
+
+   shell$ ./configure --with-ft=ulfm [...options...]
+   shell$ make [-j N] all install
+   shell$ mpicc my-ft-program.c -o my-ft-program
+   shell$ mpiexec -n 4 --with-ft ulfm my-ft-program
+
 Features
 --------
 
@@ -100,11 +112,12 @@ Available from: https://journals.sagepub.com/doi/10.1177/1094342013488238.
 Building ULFM support in Open MPI
 ---------------------------------
 
-In Open MPI |ompi_ver|, ULFM support is **enabled by default** |mdash|
-when you build Open MPI, unless you specify ``--without-ft``, ULFM
-support will automatically be built.
+In Open MPI |ompi_ver|, ULFM support is **built-in by default** |mdash|
+that is, when you build Open MPI, unless you specify ``--without-ft``, ULFM
+support is automatically available (but is inactive unless enabled at
+runtime).
 
-Optionally, you can specify ``--with-ft`` to ensure that ULFM support
+Optionally, you can specify ``--with-ft ulfm`` to ensure that ULFM support
 is definitely built.
 
 Support notes
@@ -114,89 +127,6 @@ Support notes
   that if you are going to use ULFM, you should disable building
   OpenSHMEM with ``--disable-oshmem``.
 
-* SLURM is tested and supported with fault tolerance.
-
-  .. important:: Do not use ``srun``, or your application gets killed
-                 by the scheduler upon the first failure. Instead,
-                 use ``mpirun`` in an ``salloc/sbatch`` allocation.
-
-* LSF is untested with fault tolerance.
-
-* PBS/Torque is tested and supported with fault tolerance.
-
-  .. important:: Be sure to use ``mpirun`` in a ``qsub`` allocation.
-
-Modified, Untested and Disabled Components
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Frameworks and components which are not listed in the following list
-are unmodified and support fault tolerance. Listed frameworks may be
-**modified** (and work after a failure), **untested** (and work before
-a failure, but may malfunction after a failure), or **disabled** (they
-cause unspecified behavior all around when FT is enabled).
-
-All runtime disabled components are listed in the ``ft-mpi`` aggregate
-MCA param file
-``$installdir/share/openmpi/amca-param-sets/ft-mpi``. You can tune the
-runtime behavior with ULFM by either setting or unsetting variables in
-this file (or by overriding the variable on the command line (e.g.,
-``--mca btl ofi,self``)). Note that if fault tolerance is disabled at
-runtime, these components will load normally (this may change observed
-performance when comparing with and without fault tolerance).
-
-* ``pml``: MPI point-to-point management layer
-
-  * ``monitoring``, ``v``: **untested** (they have not been modified
-    to handle faults)
-  * ``cm``, ``crcpw``, ``ucx``: **disabled**
-
-* ``btl``: Point-to-point Byte Transfer Layer
-
-  * ``ofi``, ``portals4``, ``smcuda``, ``usnic``, ``sm(+knem)``:
-    **untested** (they may work properly, please report)
-
-* ``mtl``: Matching transport layer. Used for MPI point-to-point messages on
-  some types of networks
-
-  * All ``mtl`` components are **disabled**
-
-* ``coll``: MPI collective algorithms
-
-  * ``cuda``, ``inter``, ``sync``, ``sm``: **untested** (they have not
-    been modified to handle faults, but we expect correct post-fault
-    behavior)
-  * ``hcoll``, ``portals4``: **disabled** (they have not been modified
-    to handle faults, and we expect unspecified post-fault behavior)
-
-* ``osc``: MPI one-sided communications
-
-  * All ``osc`` components are **untested** (they have not been
-    modified to handle faults, and we expect unspecified post-fault
-    behavior)
-
-* ``io``: MPI I/O and dependent components
-
-  * ``fs``: File system functions for MPI I/O
-  * ``fbtl``: File byte transfer layer: abstraction for individual
-    read/write operations for OMPIO
-  * ``fcoll``: Collective read and write operations for MPI I/O
-  * ``sharedfp``: Shared file pointer operations for MPI I/O
-  * All components in these frameworks are unmodified, **untested**
-    (we expect clean post-failure abort)
-
-* ``vprotocol``: Checkpoint/Restart components
-
-  * These components have not been modified to handle faults, and are
-    **untested**.
-
-* ``threads``, ``wait-sync``: Multithreaded wait-synchronization
-  object
-
-  * ``argobots``, ``qthreads``: **disabled** (these components have
-    not been modified to handle faults; we expect post-failure
-    deadlock)
-
-
 Running ULFM Open MPI
 ---------------------
 
@@ -215,12 +145,14 @@ Running your application
 
 You can launch your application with fault tolerance by simply using
 the normal Open MPI ``mpiexec`` launcher, with the
-``--with-ft ulfm`` CLI option:
+``--with-ft ulfm`` CLI option (or its synonym ``--with-ft mpi``):
 
 .. code-block::
 
    shell$ mpirun --with-ft ulfm ...
 
+.. important:: By default, fault tolerance is not active.
+
 Running under a batch scheduler
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -231,6 +163,18 @@ process fails. In order to avoid this problem, it is preferred that
 you use ``mpiexec`` within an allocation (e.g., ``salloc``,
 ``sbatch``, ``qsub``) rather than a direct launch (e.g., ``srun``).
 
+* SLURM is tested and supported with fault tolerance.
+
+  .. important:: Do not use ``srun``, or your application gets killed
+                 by the scheduler upon the first failure. Instead,
+                 use ``mpirun`` in an ``salloc/sbatch`` allocation.
+
+* LSF is untested with fault tolerance.
+
+* PBS/Torque is tested and supported with fault tolerance.
+
+  .. important:: Be sure to use ``mpirun`` in a ``qsub`` allocation.
+
 Run-time tuning knobs
 ^^^^^^^^^^^^^^^^^^^^^
 
@@ -240,12 +184,21 @@ most cases. You can change the default settings with ``--mca
 mpi_ft_foo <value>`` for Open MPI options, and with ``--prtemca
 errmgr_detector_bar <value>`` for PRTE options.
 
+.. important:: The main control for enabling/disabling fault tolerance
+               at runtime is the ``--with-ft ulfm`` (or its synonym
+               ``--with-ft mpi``) ``mpiexec`` CLI option. This option
+               sets up multiple subsystems of Open MPI to enable fault
+               tolerance. The options described below are best used to
+               override the default behavior after the ``--with-ft ulfm``
+               option is used.
+
 PRTE level options
 ~~~~~~~~~~~~~~~~~~
 
-* ``prrte_enable_recovery <true|false> (default: false)`` controls
+* ``prrte_enable_ft <true|false> (default: false)`` controls
   automatic cleanup of apps with failed processes within
-  mpirun. Enabling this option also enables ``mpi_ft_enable``.
+  mpirun. This option is automatically set to ``true`` when using
+  ``--with-ft ulfm``.
 * ``errmgr_detector_priority <int> (default: 1005)`` selects the
   PRRTE-based failure detector. Only available when
   ``prte_enable_recovery`` is ``true``. You can set this to ``0`` when
@@ -263,17 +216,33 @@ PRTE level options
 Open MPI level options
 ~~~~~~~~~~~~~~~~~~~~~~
 
-* ``mpi_ft_enable <true|false> (default: same as
-  prrte_enable_recovery)`` permits turning on/off fault tolerance at
-  runtime. When false, failure detection is disabled; Interfaces
-  defined by the fault tolerance extensions are substituted with dummy
-  non-fault tolerant implementations (e.g., ``MPIX_Comm_agree`` is
-  implemented with ``MPI_Allreduce``); All other controls below become
-  irrelevant.
+Some default values are applied to some Open MPI parameters when using
+``mpiexec --with-ft ulfm``. These defaults are obtained from the ``ft-mpi``
+aggregate MCA param file
+``$installdir/share/openmpi/amca-param-sets/ft-mpi``. You can tune the
+runtime behavior with ULFM by either setting or unsetting variables in
+this file, or by overriding the variable on the command line (e.g.,
+``--mca btl ofi,self``).
+
+.. important:: Note that if fault tolerance is disabled at runtime
+               (that is, when not using ``--with-ft ulfm``), the
+               ``ft-mpi`` MCA param file is not loaded, thus
+               components that are unsafe for fault tolerance will
+               load normally (this may change observed performance
+               when comparing with and without fault tolerance).
+
+* ``mpi_ft_enable <true|false> (default: false)``
+  permits turning on/off fault tolerance at runtime. This option is
+  automatically set to ``true`` from the aggregate MCA param file
+  ``ft-mpi`` loaded when using ``--with-ft ulfm``. When false, failure
+  detection is disabled; interfaces defined by the fault tolerance extensions
+  are substituted with dummy non-fault tolerant implementations (e.g.,
+  ``MPIX_Comm_agree`` is implemented with ``MPI_Allreduce``); all other
+  controls below become irrelevant.
 * ``mpi_ft_verbose <int> (default: 0)`` increases the output of the
   fault tolerance activities. A value of 1 will report detected
   failures.
-* ``mpi_ft_detector <true|false> (default: false)``, **EXPERIMENTAL**
+* ``mpi_ft_detector <true|false> (default: false)``, **DEPRECATED**
   controls the activation of the Open MPI level failure detector. When
   this detector is turned off, all failure detection is delegated to
   PRTE (see above). The Open MPI level fault detector is
@@ -291,22 +260,99 @@ Open MPI level options
   latency (typically 1us increase).* You may want to **enable this
   option if you experience false positive** processes incorrectly
   reported as failed with the Open MPI failure detector.
+  This option is only relevant when ``mpi_ft_detector`` is ``true``.
 * ``mpi_ft_detector_period <float> (default: 3e0 seconds)`` heartbeat
   period. Recommended value is 1/3 of the timeout. *Values lower than
   100us may impart a noticeable effect on latency (typically a 3us
   increase).*
+  This option is only relevant when ``mpi_ft_detector`` is ``true``.
 * ``mpi_ft_detector_timeout <float> (default: 1e1 seconds)`` heartbeat
   timeout (i.e. failure detection speed). Recommended value is 3 times
   the heartbeat period.
+  This option is only relevant when ``mpi_ft_detector`` is ``true``.
 
 Known Limitations in ULFM
-^^^^^^^^^^^^^^^^^^^^^^^^^
+-------------------------
 
 * InfiniBand support is provided through the UCT BTL; fault tolerant
   operation over the UCX PML is not yet supported for production runs.
 * TOPO, FILE, RMA are not fault tolerant. They are expected to work
   properly before the occurrence of the first failure.
 
+Modified, Untested and Disabled Components
+------------------------------------------
+
+Frameworks and components which are not listed in the following list
+are unmodified and support fault tolerance. Listed frameworks may be
+**modified** (and work after a failure), **untested** (and work before
+a failure, but may malfunction after a failure), or **disabled** (they
+cause unspecified behavior all around when FT is enabled).
+
+All runtime disabled components are listed in the ``ft-mpi`` aggregate
+MCA param file
+``$installdir/share/openmpi/amca-param-sets/ft-mpi``. You can tune the
+runtime behavior with ULFM by either setting or unsetting variables in
+this file (or by overriding the variable on the command line (e.g.,
+``--mca btl ofi,self``)).
+
+.. important:: Note that if fault tolerance is disabled at runtime,
+               the ``ft-mpi`` MCA param file is not loaded, thus
+               components that are unsafe for fault tolerance will
+               load normally (this may change observed performance
+               when comparing with and without fault tolerance).
+
+* ``pml``: MPI point-to-point management layer
+
+  * ``monitoring``, ``v``: **untested** (they have not been modified
+    to handle faults)
+  * ``cm``, ``crcpw``, ``ucx``: **disabled**
+
+* ``btl``: Point-to-point Byte Transfer Layer
+
+  * ``ofi``, ``portals4``, ``smcuda``, ``usnic``, ``sm(+knem)``:
+    **untested** (they may work properly, please report)
+
+* ``mtl``: Matching transport layer. Used for MPI point-to-point messages on
+  some types of networks
+
+  * All ``mtl`` components are **disabled**
+
+* ``coll``: MPI collective algorithms
+
+  * ``cuda``, ``inter``, ``sync``, ``sm``: **untested** (they have not
+    been modified to handle faults, but we expect correct post-fault
+    behavior)
+  * ``hcoll``, ``portals4``: **disabled** (they have not been modified
+    to handle faults, and we expect unspecified post-fault behavior)
+
+* ``osc``: MPI one-sided communications
+
+  * All ``osc`` components are **untested** (they have not been
+    modified to handle faults, and we expect unspecified post-fault
+    behavior)
+
+* ``io``: MPI I/O and dependent components
+
+  * ``fs``: File system functions for MPI I/O
+  * ``fbtl``: File byte transfer layer: abstraction for individual
+    read/write operations for OMPIO
+  * ``fcoll``: Collective read and write operations for MPI I/O
+  * ``sharedfp``: Shared file pointer operations for MPI I/O
+  * All components in these frameworks are unmodified, **untested**
+    (we expect clean post-failure abort)
+
+* ``vprotocol``: Checkpoint/Restart components
+
+  * These components have not been modified to handle faults, and are
+    **untested**.
+
+* ``threads``, ``wait-sync``: Multithreaded wait-synchronization
+  object
+
+  * ``argobots``, ``qthreads``: **disabled** (these components have
+    not been modified to handle faults; we expect post-failure
+    deadlock)
+
 Changelog
 ---------

ompi/runtime/ompi_mpi_params.c

Lines changed: 4 additions & 4 deletions
@@ -406,10 +406,10 @@ int ompi_mpi_register_params(void)
 
 #if OPAL_ENABLE_FT_MPI
     /* Before loading any other part of the MPI library, we need to load
-     * * the ft-mpi tune file to override default component selection when
-     * * FT is desired ON; this does override openmpi-params.conf, but not
-     * * command line or env.
-     * */
+     * the ft-mpi tune file to override default component selection when
+     * FT is desired ON; this does override openmpi-params.conf, but not
+     * command line or env.
+     */
     if( ompi_ftmpi_enabled ) {
         mca_base_var_load_extra_files("ft-mpi", false);
     }
