Description
I can hit the bug below on master when I run on a filesystem that takes the ADIOI_GEN_WriteStrided path for MPI_File_write(). (I was using gpfs; offhand, the users of that path appear to be gpfs, hfs, ntfs, panfs, pfs, sfs, ufs, and xfs.)
Testcase: https://gist.github.com/markalle/d7da240c19e57f095c5d1b13240dae24
% mpicc -o x romio_write_timing.c
% mpirun -mca io romio314 -np 4 ./x
The test has one of the ranks using a small contiguous datatype, e.g.
// for R0 : [ . . . x x . . . ]
while the other ranks use non-contiguous types that surround the above, e.g.
// for R1 : [ . . x . . x . . ]
// for R2 : [ . x . . . . x . ]
// for R3 : [ x . . . . . . x ]
ADIOI_GEN_WriteStrided() in ad_write_str.c has the contiguous rank fall into this code:
ADIO_WriteContig(fd, buf, count, datatype, ADIO_EXPLICIT_OFFSET,
offset, status, error_code);
around line 294, at which point it's not holding any ADIOI_WRITE_LOCK.
The non-contiguous ranks, meanwhile, end up around line 356 in code that goes:
/* contiguous in memory, noncontiguous in file. should be the most
common case. */
...
ADIOI_BUFFERED_WRITE
(that macro boils down to ADIOI_WRITE_LOCK and a read of a contiguous region like "x . . . . x")
...
/* write the buffer out finally */
..
ADIO_WriteContig(...);
if (!(fd->atomicity))
ADIOI_UNLOCK(fd, writebuf_off, SEEK_SET, writebuf_len);
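If all the ranks had non-contiguous data and took the { lock, read, write, unlock } path, it would work. But having some ranks do a direct contiguous write with no lock creates a race condition: a non-contiguous rank can read the surrounding region, and when it later writes that buffer back it can overwrite data the contiguous rank wrote in between.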
I prototyped a fix that adds ADIOI_WRITE_LOCK/ADIOI_UNLOCK around the code the contiguous rank uses, and that worked. But I'd like thoughts from someone with ROMIO knowledge about whether there's a better way, since I assume my fix would have performance implications.