Update docs to mention DataSizeBased aggregation (#4639)

scottwittenburg · web-flow · commit c36a9c829971 · 2025-09-17T13:45:48.000-04:00
diff --git a/docs/user_guide/source/advanced/aggregation.rst b/docs/user_guide/source/advanced/aggregation.rst
@@ -18,6 +18,8 @@ There are two implementations of aggregation in BP5, none of them is the same as
 
 **EveryoneWrites** is the same strategy as the previous except that every process immediately writes its own data to its designated file. Since it basically implements an N-to-N write pattern, this method does not scale, so only use it up to a moderate number of processes (1-4 process * number of file system servers). At small scale, as long as the file system can deal with the on-rush of the write requests, this method can provide the fastest I/O. 
 
+**DataSizeBased** is also similar to *EveryoneWritesSerial*, except that before writing any timestep, writer ranks are first partitioned to balance the amount of data written to each subfile. The current greedy partitioning strategy is fast and "best effort", and likely won't produce subfiles of exactly equal size. Once writer chains are built from the partitioned ranks, writing proceeds exactly as in *EveryoneWritesSerial*. In this aggregator, use *NumSubFiles* to control the number of subfiles, as *NumAggregators* is ignored.
+
 **TwoLevelShm** has a subset of processes that actually write to disk (*NumAggregators*). There must be at least one process per compute node, which creates a shared-memory segment for other processes on the node to send their data. The aggregator process basically serializes the writing of data from this subset of processes (itself and the processes that send data to it). TwoLevelShm performs similarly to EveryoneWritesSerial on Lustre, and is the only good option on Summit's GPFS. 
 
 The number of files (*NumSubFiles*) can be smaller than *NumAggregators*, and then multiple aggregators will write to one file concurrently. Such a setup becomes useful when the number of nodes is many times more than the number of file servers.
diff --git a/docs/user_guide/source/engines/bp5.rst b/docs/user_guide/source/engines/bp5.rst
@@ -62,14 +62,14 @@ This engine allows the user to fine tune the buffering operations through the fo
 
 #. Aggregation
 
-   #. **AggregationType**: *TwoLevelShm*, *EveryoneWritesSerial* and
-      *EveryoneWrites* are three data aggregation strategies. See :ref:`Aggregation in BP5`. The default is *TwoLevelShm*.
+   #. **AggregationType**: *TwoLevelShm*, *EveryoneWritesSerial*, *DataSizeBased*, and
+      *EveryoneWrites* are four data aggregation strategies. See :ref:`Aggregation in BP5`. The default is *TwoLevelShm*.
  
-   #. **NumAggregators**: The number of processes that will ever write data directly to storage. The default is set to the number of compute nodes the application is running on (i.e. one process per compute node). TwoLevelShm will select a fixed number of processes *per compute-node* to get close to the intention of the user but does not guarantee the exact number of aggregators.
+   #. **NumAggregators**: The number of processes that will ever write data directly to storage. The default is set to the number of compute nodes the application is running on (i.e. one process per compute node). TwoLevelShm will select a fixed number of processes *per compute-node* to get close to the intention of the user but does not guarantee the exact number of aggregators. *DataSizeBased* will ignore this configuration setting and set the value to *NumSubFiles*.
 
    #. **AggregatorRatio**: An alternative option to NumAggregators to pick every nth process as aggregator. The number of aggregators will be automatically kept to be within 1 and total number of processes no matter what bad number is supplied here. Moreover, TwoLevelShm will select an fixed number of processes *per compute-node* to get close to the intention of this ratio but does not guarantee the exact number of aggregators.
 
-   #. **NumSubFiles**: The number of data files to write to in the *.bp/* directory. Only used by *TwoLevelShm* aggregator, where the number of files can be smaller then the number of aggregators. The default is set to *NumAggregators*. 
+   #. **NumSubFiles**: The number of data files to write to in the *.bp/* directory. Used by *TwoLevelShm* and *DataSizeBased* aggregators. For *TwoLevelShm* the number of files can be smaller than the number of aggregators, while for *DataSizeBased*, the number of aggregators is ignored and set equal to this value. The default is set to *NumAggregators*.
 
    #. **StripeSize**: The data blocks of different processes are aligned to this size (default is 4096 bytes) in the files. Its purpose is to avoid multiple processes to write to the same file system block and potentially slow down the write.  
 
@@ -160,10 +160,10 @@ This engine allows the user to fine tune the buffering operations through the fo
 =============================== ===================== ===========================================================
  OpenTimeoutSecs                 float                 **0** for *ReadRandomAccess* mode, **3600** for *Read* mode, ``10.0``, ``5``
  BeginStepPollingFrequencySecs   float                 **1**, 10.0 
- AggregationType                 string                **TwoLevelShm**, EveryoneWritesSerial, EveryoneWrites
- NumAggregators                  integer >= 1          **0 (one file per compute node)**
+ AggregationType                 string                **TwoLevelShm**, EveryoneWritesSerial, DataSizeBased, EveryoneWrites
+ NumAggregators                  integer >= 1          **0 (one file per compute node)**, ignored when *AggregationType=DataSizeBased*
  AggregatorRatio                 integer >= 1          not used unless set
- NumSubFiles                     integer >= 1          **=NumAggregators**, only used when *AggregationType=TwoLevelShm*
+ NumSubFiles                     integer >= 1          **=NumAggregators**, used when *AggregationType=TwoLevelShm* or *AggregationType=DataSizeBased*
  StripeSize                      integer+units         **4KB**
  MaxShmSize                      integer+units         **4294762496**
  BufferVType                     string                **chunk**, malloc