Skip to content

romio ADIOI_GEN_WriteStrided with mixed contig/non-contig types #4144

Open
@markalle

Description

@markalle

I'm able to hit the below bug with master, if I run on a filesystem that takes the ADIOI_GEN_WriteStrided path for MPI_File_write(). (I was using gpfs, and offhand the users of that path appear to be gpfs, hfs, ntfs, panfs, pfs, sfs, ufs, xfs.)

Testcase: https://gist.github.com/markalle/d7da240c19e57f095c5d1b13240dae24
% mpicc -o x romio_write_timing.c
% mpirun -mca io romio314 -np 4 ./x

The test has one of the ranks using a small contiguous datatype, eg

// for R0 : [ . . . x x . . . ]

while the other ranks use non-contiguous types that surround the above, eg

// for R1 : [ . . x . . x . . ]
// for R2 : [ . x . . . . x . ]
// for R3 : [ x . . . . . . x ]

The ADIOI_Gen_writestrided() in ad_write_str.c has the contiguous rank fall into this code:

            ADIO_WriteContig(fd, buf, count, datatype, ADIO_EXPLICIT_OFFSET,
                             offset, status, error_code);

around line 294, at which point it's not holding any ADIO_WRITE_LOCK.

While the non-contiguous ranks end up around line 356 in code that goes

/* contiguous in memory, noncontiguous in file. should be the most
   common case. */
...
                    ADIOI_BUFFERED_WRITE
(that macro boils down to ADIOI_WRITE_LOCK and read of a contiguous region like "x . . . . x")
...
        /* write the buffer out finally */
..
            ADIO_WriteContig(...);
            if (!(fd->atomicity))
                ADIOI_UNLOCK(fd, writebuf_off, SEEK_SET, writebuf_len);

If all the ranks had non-contiguous data and took the { lock, read, write, unlock } path it would work, but having some ranks do a direct contiguous write with no lock creates a race condition.

I prototyped a fix that added ADIOI_WRITE_LOCK/UNLOCK around the code the contiguous rank uses and that worked. But I'd like thoughts from someone with romio knowledge about whether there's a better way. I assume my fix would have performance implications.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions