Skip to content

Commit a4eddfc

Browse files
committed
rcache/grdma: try to prevent erroneous free error messages
It is possible to have parts of an in-use registered region be passed to munmap or madvise. This does not necessarily mean the user has made an error but does mean the entire region should be invalidated. This commit checks that the munmap or madvise base matches the beginning of the cached region. If it does and the region is in-use then we print an error. There will certainly be false-negatives where a user unmaps something that really is in-use but that is preferrable to a false-positive. References #4509 Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from commit d3fa1bb) Signed-off-by: Nathan Hjelm <[email protected]>
1 parent 11431a4 commit a4eddfc

File tree

1 file changed

+17
-7
lines changed

1 file changed

+17
-7
lines changed

opal/mca/rcache/grdma/rcache_grdma_module.c

Lines changed: 17 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@
1414
* Copyright (c) 2006 Voltaire. All rights reserved.
1515
* Copyright (c) 2007 Mellanox Technologies. All rights reserved.
1616
* Copyright (c) 2010 IBM Corporation. All rights reserved.
17-
* Copyright (c) 2011-2016 Los Alamos National Security, LLC. All rights
17+
* Copyright (c) 2011-2017 Los Alamos National Security, LLC. All rights
1818
* reserved.
1919
* Copyright (c) 2013 NVIDIA Corporation. All rights reserved.
2020
* Copyright (c) 2016 Research Organization for Information Science
@@ -421,20 +421,29 @@ static int mca_rcache_grdma_deregister (mca_rcache_base_module_t *rcache,
421421
return rc;
422422
}
423423

424+
struct gc_add_args_t {
425+
void *base;
426+
size_t size;
427+
};
428+
typedef struct gc_add_args_t gc_add_args_t;
429+
424430
static int gc_add (mca_rcache_base_registration_t *grdma_reg, void *ctx)
425431
{
426432
mca_rcache_grdma_module_t *rcache_grdma = (mca_rcache_grdma_module_t *) grdma_reg->rcache;
427-
428-
/* unused */
429-
(void) ctx;
433+
gc_add_args_t *args = (gc_add_args_t *) ctx;
430434

431435
if (grdma_reg->flags & MCA_RCACHE_FLAGS_INVALID) {
432436
/* nothing more to do */
433437
return OPAL_SUCCESS;
434438
}
435439

436-
if (grdma_reg->ref_count) {
437-
/* attempted to remove an active registration */
440+
if (grdma_reg->ref_count && grdma_reg->base == args->base) {
441+
/* attempted to remove an active registration. to handle cases where part of
442+
* an active registration has been unmapped we check if the bases match. this
443+
* *hopefully* will suppress erroneously emitted errors. if we can't suppress
444+
* the erroneous error in all cases then this check and return should be removed
445+
* entirely. we are not required to give an error for a user freeing a buffer
446+
* that is in-use by MPI. Its just a nice to have. */
438447
return OPAL_ERROR;
439448
}
440449

@@ -456,7 +465,8 @@ static int mca_rcache_grdma_invalidate_range (mca_rcache_base_module_t *rcache,
456465
void *base, size_t size)
457466
{
458467
mca_rcache_grdma_module_t *rcache_grdma = (mca_rcache_grdma_module_t *) rcache;
459-
return mca_rcache_base_vma_iterate (rcache_grdma->cache->vma_module, base, size, gc_add, NULL);
468+
gc_add_args_t args = {.base = base, .size = size};
469+
return mca_rcache_base_vma_iterate (rcache_grdma->cache->vma_module, base, size, gc_add, &args);
460470
}
461471

462472
/* Make sure this registration request is not stale. In other words, ensure

0 commit comments

Comments
 (0)