Commit d724405

Update the ULFM Readme
Comment indentation

Signed-off-by: Aurelien Bouteiller <[email protected]>
1 parent b7f9d29 commit d724405

2 files changed: +148 -103 lines changed

docs/features/ulfm.rst

Lines changed: 144 additions & 99 deletions
@@ -6,6 +6,17 @@ User-Level Fault Mitigation (ULFM)
 This chapter documents the features and options specific to the **User
 Level Failure Mitigation (ULFM)** Open MPI implementation.

+TL;DR
+-----
+This is an extremely terse summary of how to use ULFM:
+
+.. code-block::
+
+   ./configure --with-ft=ulfm [...options...]
+   make [-j N] all install
+   mpicc my-ft-program.c -o my-ft-program
+   mpiexec -n 4 --with-ft ulfm my-ft-program
+
 Features
 --------

@@ -100,11 +111,12 @@ Available from: https://journals.sagepub.com/doi/10.1177/1094342013488238.
 Building ULFM support in Open MPI
 ---------------------------------

-In Open MPI |ompi_ver|, ULFM support is **enabled by default** |mdash|
-when you build Open MPI, unless you specify ``--without-ft``, ULFM
-support will automatically be built.
+In Open MPI |ompi_ver|, ULFM support is **built-in by default** |mdash|
+that is, when you build Open MPI, unless you specify ``--without-ft``, ULFM
+support is automatically available (but is inactive unless enabled at
+runtime).

-Optionally, you can specify ``--with-ft`` to ensure that ULFM support
+Optionally, you can specify ``--with-ft=ulfm`` to ensure that ULFM support
 is definitely built.

 Support notes
@@ -114,89 +126,6 @@ Support notes
   that if you are going to use ULFM, you should disable building
   OpenSHMEM with ``--disable-oshmem``.

-* SLURM is tested and supported with fault tolerance.
-
-  .. important:: Do not use ``srun``, or your application gets killed
-                 by the scheduler upon the first failure. Instead,
-                 use ``mpirun`` in an ``salloc/sbatch`` allocation.
-
-* LSF is untested with fault tolerance.
-
-* PBS/Torque is tested and supported with fault tolerance.
-
-  .. important:: Be sure to use ``mpirun`` in a ``qsub`` allocation.
-
-Modified, Untested and Disabled Components
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Frameworks and components which are not listed in the following list
-are unmodified and support fault tolerance. Listed frameworks may be
-**modified** (and work after a failure), **untested** (and work before
-a failure, but may malfunction after a failure), or **disabled** (they
-cause unspecified behavior all around when FT is enabled).
-
-All runtime disabled components are listed in the ``ft-mpi`` aggregate
-MCA param file
-``$installdir/share/openmpi/amca-param-sets/ft-mpi``. You can tune the
-runtime behavior with ULFM by either setting or unsetting variables in
-this file, or by overriding the variable on the command line (e.g.,
-``--mca btl ofi,self``). Note that if fault tolerance is disabled at
-runtime, these components will load normally (this may change observed
-performance when comparing with and without fault tolerance).
-
-* ``pml``: MPI point-to-point management layer
-
-  * ``monitoring``, ``v``: **untested** (they have not been modified
-    to handle faults)
-  * ``cm``, ``crcpw``, ``ucx``: **disabled**
-
-* ``btl``: Point-to-point Byte Transfer Layer
-
-  * ``ofi``, ``portals4``, ``smcuda``, ``usnic``, ``sm(+knem)``:
-    **untested** (they may work properly, please report)
-
-* ``mtl``: Matching transport layer, used for MPI point-to-point messages on
-  some types of networks
-
-  * All ``mtl`` components are **disabled**
-
-* ``coll``: MPI collective algorithms
-
-  * ``cuda``, ``inter``, ``sync``, ``sm``: **untested** (they have not
-    been modified to handle faults, but we expect correct post-fault
-    behavior)
-  * ``hcoll``, ``portals4``: **disabled** (they have not been modified
-    to handle faults, and we expect unspecified post-fault behavior)
-
-* ``osc``: MPI one-sided communications
-
-  * All ``osc`` components are **untested** (they have not been
-    modified to handle faults, and we expect unspecified post-fault
-    behavior)
-
-* ``io``: MPI I/O and dependent components
-
-  * ``fs``: File system functions for MPI I/O
-  * ``fbtl``: File byte transfer layer: abstraction for individual
-    read/write operations for OMPIO
-  * ``fcoll``: Collective read and write operations for MPI I/O
-  * ``sharedfp``: Shared file pointer operations for MPI I/O
-  * All components in these frameworks are unmodified, **untested**
-    (we expect clean post-failure abort)
-
-* ``vprotocol``: Checkpoint/Restart components
-
-  * These components have not been modified to handle faults, and are
-    **untested**.
-
-* ``threads``, ``wait-sync``: Multithreaded wait-synchronization
-  object
-
-  * ``argobots``, ``qthreads``: **disabled** (these components have
-    not been modified to handle faults; we expect post-failure
-    deadlock)
-
-
 Running ULFM Open MPI
 ---------------------

@@ -215,12 +144,14 @@ Running your application

 You can launch your application with fault tolerance by simply using
 the normal Open MPI ``mpiexec`` launcher, with the
-``--with-ft ulfm`` CLI option:
+``--with-ft ulfm`` CLI option (or its synonym ``--with-ft mpi``):

 .. code-block::

    shell$ mpirun --with-ft ulfm ...

+.. important:: By default, fault tolerance is not active.
+
 Running under a batch scheduler
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -231,6 +162,18 @@ process fails. In order to avoid this problem, it is preferred that
 you use ``mpiexec`` within an allocation (e.g., ``salloc``,
 ``sbatch``, ``qsub``) rather than a direct launch (e.g., ``srun``).

+* SLURM is tested and supported with fault tolerance.
+
+  .. important:: Do not use ``srun``, or your application gets killed
+                 by the scheduler upon the first failure. Instead,
+                 use ``mpirun`` in an ``salloc/sbatch`` allocation.
+
+* LSF is untested with fault tolerance.
+
+* PBS/Torque is tested and supported with fault tolerance.
+
+  .. important:: Be sure to use ``mpirun`` in a ``qsub`` allocation.
+
 Run-time tuning knobs
 ^^^^^^^^^^^^^^^^^^^^^

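Note on the hunk above: the scheduler guidance (launch with ``mpirun`` inside an allocation, not with ``srun``) could be sketched as a minimal SLURM batch script. This is only an illustration under stated assumptions: the node and task counts and the program name ``./my-ft-program`` are placeholders, and ``DRY_RUN`` is a hypothetical helper variable of this sketch, not an Open MPI or SLURM feature. By default the script only prints the command it would run.

```shell
#!/bin/sh
#SBATCH --nodes=2
#SBATCH --ntasks=4

# Placeholders (assumptions): adjust the process count and program for your job.
NP=4
APP=./my-ft-program

# Launch with mpirun inside the allocation (not srun), with ULFM enabled.
LAUNCH="mpirun -n $NP --with-ft ulfm $APP"

# DRY_RUN belongs to this sketch only: print by default, execute when set to 0.
if [ "${DRY_RUN:-1}" = "0" ]; then
    $LAUNCH
else
    echo "$LAUNCH"
fi
```

Submitting this with ``sbatch`` and ``DRY_RUN=0`` in the environment would run the application under ``mpirun``; leaving ``DRY_RUN`` unset makes it safe to try outside an allocation.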
@@ -240,12 +183,21 @@ most cases. You can change the default settings with ``--mca
 mpi_ft_foo <value>`` for Open MPI options, and with ``--prtemca
 errmgr_detector_bar <value>`` for PRTE options.

+.. important:: The main control for enabling/disabling fault tolerance
+               at runtime is the ``--with-ft ulfm`` (or its synonym
+               ``--with-ft mpi``) ``mpiexec`` CLI option. This option
+               sets up multiple subsystems of Open MPI to enable fault
+               tolerance. The options described below are best used to
+               override the default behavior after the ``--with-ft ulfm``
+               option is used.
+
 PRTE level options
 ~~~~~~~~~~~~~~~~~~

-* ``prrte_enable_recovery <true|false> (default: false)`` controls
+* ``prrte_enable_ft <true|false> (default: false)`` controls
   automatic cleanup of apps with failed processes within
-  mpirun. Enabling this option also enables ``mpi_ft_enable``.
+  mpirun. This option is automatically set to ``true`` when using
+  ``--with-ft ulfm``.
 * ``errmgr_detector_priority <int> (default: 1005)`` selects the
   PRRTE-based failure detector. Only available when
   ``prte_enable_recovery`` is ``true``. You can set this to ``0`` when
@@ -263,17 +215,33 @@ PRTE level options
 Open MPI level options
 ~~~~~~~~~~~~~~~~~~~~~~

-* ``mpi_ft_enable <true|false> (default: same as
-  prrte_enable_recovery)`` permits turning on/off fault tolerance at
-  runtime. When false, failure detection is disabled; Interfaces
-  defined by the fault tolerance extensions are substituted with dummy
-  non-fault tolerant implementations (e.g., ``MPIX_Comm_agree`` is
-  implemented with ``MPI_Allreduce``); All other controls below become
-  irrelevant.
+Some default values are applied to some Open MPI parameters when using
+``mpiexec --with-ft ulfm``. These defaults are obtained from the ``ft-mpi``
+aggregate MCA param file
+``$installdir/share/openmpi/amca-param-sets/ft-mpi``. You can tune the
+runtime behavior with ULFM by either setting or unsetting variables in
+this file, or by overriding the variable on the command line (e.g.,
+``--mca btl ofi,self``).
+
+.. important:: Note that if fault tolerance is disabled at runtime
+               (that is, when not using ``--with-ft ulfm``), the
+               ``ft-mpi`` MCA param file is not loaded, thus
+               components that are unsafe for fault tolerance will
+               load normally (this may change observed performance
+               when comparing with and without fault tolerance).
+
+* ``mpi_ft_enable <true|false> (default: false)``
+  permits turning on/off fault tolerance at runtime. This option is
+  automatically set to ``true`` from the aggregate MCA param file
+  ``ft-mpi`` loaded when using ``--with-ft ulfm``. When false, failure
+  detection is disabled; interfaces defined by the fault tolerance extensions
+  are substituted with dummy non-fault tolerant implementations (e.g.,
+  ``MPIX_Comm_agree`` is implemented with ``MPI_Allreduce``); all other
+  controls below become irrelevant.
 * ``mpi_ft_verbose <int> (default: 0)`` increases the output of the
   fault tolerance activities. A value of 1 will report detected
   failures.
-* ``mpi_ft_detector <true|false> (default: false)``, **EXPERIMENTAL**
+* ``mpi_ft_detector <true|false> (default: false)``, **DEPRECATED**
   controls the activation of the Open MPI level failure detector. When
   this detector is turned off, all failure detection is delegated to
   PRTE (see above). The Open MPI level fault detector is
@@ -291,22 +259,99 @@ Open MPI level options
   latency (typically 1us increase).* You may want to **enable this
   option if you experience false positives** (processes incorrectly
   reported as failed) with the Open MPI failure detector.
+  This option is only relevant when ``mpi_ft_detector`` is ``true``.
 * ``mpi_ft_detector_period <float> (default: 3e0 seconds)`` heartbeat
   period. Recommended value is 1/3 of the timeout. *Values lower than
   100us may impart a noticeable effect on latency (typically a 3us
   increase).*
+  This option is only relevant when ``mpi_ft_detector`` is ``true``.
 * ``mpi_ft_detector_timeout <float> (default: 1e1 seconds)`` heartbeat
   timeout (i.e. failure detection speed). Recommended value is 3 times
   the heartbeat period.
+  This option is only relevant when ``mpi_ft_detector`` is ``true``.

 Known Limitations in ULFM
-^^^^^^^^^^^^^^^^^^^^^^^^^
+-------------------------

 * InfiniBand support is provided through the UCT BTL; fault tolerant
   operation over the UCX PML is not yet supported for production runs.
 * TOPO, FILE, RMA are not fault tolerant. They are expected to work
   properly before the occurrence of the first failure.

+Modified, Untested and Disabled Components
+------------------------------------------
+
+Frameworks and components which are not listed in the following list
+are unmodified and support fault tolerance. Listed frameworks may be
+**modified** (and work after a failure), **untested** (and work before
+a failure, but may malfunction after a failure), or **disabled** (they
+cause unspecified behavior all around when FT is enabled).
+
+All runtime disabled components are listed in the ``ft-mpi`` aggregate
+MCA param file
+``$installdir/share/openmpi/amca-param-sets/ft-mpi``. You can tune the
+runtime behavior with ULFM by either setting or unsetting variables in
+this file, or by overriding the variable on the command line (e.g.,
+``--mca btl ofi,self``).
+
+.. important:: Note that if fault tolerance is disabled at runtime,
+               the ``ft-mpi`` MCA param file is not loaded, thus
+               components that are unsafe for fault tolerance will
+               load normally (this may change observed performance
+               when comparing with and without fault tolerance).
+
+* ``pml``: MPI point-to-point management layer
+
+  * ``monitoring``, ``v``: **untested** (they have not been modified
+    to handle faults)
+  * ``cm``, ``crcpw``, ``ucx``: **disabled**
+
+* ``btl``: Point-to-point Byte Transfer Layer
+
+  * ``ofi``, ``portals4``, ``smcuda``, ``usnic``, ``sm(+knem)``:
+    **untested** (they may work properly, please report)
+
+* ``mtl``: Matching transport layer, used for MPI point-to-point messages on
+  some types of networks
+
+  * All ``mtl`` components are **disabled**
+
+* ``coll``: MPI collective algorithms
+
+  * ``cuda``, ``inter``, ``sync``, ``sm``: **untested** (they have not
+    been modified to handle faults, but we expect correct post-fault
+    behavior)
+  * ``hcoll``, ``portals4``: **disabled** (they have not been modified
+    to handle faults, and we expect unspecified post-fault behavior)
+
+* ``osc``: MPI one-sided communications
+
+  * All ``osc`` components are **untested** (they have not been
+    modified to handle faults, and we expect unspecified post-fault
+    behavior)
+
+* ``io``: MPI I/O and dependent components
+
+  * ``fs``: File system functions for MPI I/O
+  * ``fbtl``: File byte transfer layer: abstraction for individual
+    read/write operations for OMPIO
+  * ``fcoll``: Collective read and write operations for MPI I/O
+  * ``sharedfp``: Shared file pointer operations for MPI I/O
+  * All components in these frameworks are unmodified, **untested**
+    (we expect clean post-failure abort)
+
+* ``vprotocol``: Checkpoint/Restart components
+
+  * These components have not been modified to handle faults, and are
+    **untested**.
+
+* ``threads``, ``wait-sync``: Multithreaded wait-synchronization
+  object
+
+  * ``argobots``, ``qthreads``: **disabled** (these components have
+    not been modified to handle faults; we expect post-failure
+    deadlock)
+
 Changelog
 ---------

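Note on the detector options in this file: the text recommends a heartbeat period of 1/3 of the timeout. A minimal dry-run sketch of deriving the period from a chosen timeout and passing both on the command line follows. The timeout value, the process count, and the program name ``./my-ft-program`` are assumptions for illustration, and since this commit marks ``mpi_ft_detector`` as deprecated, treat the whole invocation as an example of the ``--mca`` override syntax rather than a recommended configuration.

```shell
#!/bin/sh
# Example timeout in seconds (assumption; the documented default is 1e1).
TIMEOUT=10

# Recommended period is 1/3 of the timeout; awk does the floating point division.
PERIOD=$(awk -v t="$TIMEOUT" 'BEGIN { printf "%.2f", t / 3 }')

# Period and timeout are only relevant when mpi_ft_detector is true.
FT_OPTS="--mca mpi_ft_detector true --mca mpi_ft_detector_timeout $TIMEOUT --mca mpi_ft_detector_period $PERIOD"

# Dry run: print the command instead of executing it.
echo "mpiexec -n 4 --with-ft ulfm $FT_OPTS ./my-ft-program"
```

Values given on the command line this way take precedence over those loaded from the ``ft-mpi`` param file, per the comment in ``ompi_mpi_params.c``.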
ompi/runtime/ompi_mpi_params.c

Lines changed: 4 additions & 4 deletions
@@ -406,10 +406,10 @@ int ompi_mpi_register_params(void)

 #if OPAL_ENABLE_FT_MPI
     /* Before loading any other part of the MPI library, we need to load
-     * * the ft-mpi tune file to override default component selection when
-     * * FT is desired ON; this does override openmpi-params.conf, but not
-     * * command line or env.
-     * */
+     * the ft-mpi tune file to override default component selection when
+     * FT is desired ON; this does override openmpi-params.conf, but not
+     * command line or env.
+     */
     if( ompi_ftmpi_enabled ) {
         mca_base_var_load_extra_files("ft-mpi", false);
     }
