-
Notifications
You must be signed in to change notification settings - Fork 68
Conversation
This commit changes the opal_fifo_push code to use opal_update_counted_pointer to set the head. This fixes a data race that occurs possibly because the read of the fifo head in opal_fifo_pop requires two instructions. This combined with the non-atomic update in opal_fifo_push can lead to an ABA issue that puts the fifo in an inconsistant state. There are other ways this problem could be fixed. One way would be to introduce an opal_atomic_read_128 implementation. On x86_64 this would have to use the cmpxchg16b instruction. Since this instruction would have to be in the pop path (and always executed) it would be slower than the fix in this commit. Closes open-mpi/ompi#1460. Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from open-mpi/ompi@dc00021) Signed-off-by: Nathan Hjelm <[email protected]>
This commit adds support for runtime binary patching. The support is broken down into two parts: util/opal_patcher.[ch] which contains the functionality for runtime patching of symbols, and mca/memory/patcher which patches the various symbols needed to provide support for memory hooks. This work is preliminary and is based off work donated by IBM. The patcher code is disabled if dlopen is disabled. Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from open-mpi/ompi@7aa03d6) Signed-off-by: Nathan Hjelm <[email protected]>
This commit makes it possible to set relative priorities for components. Before the addition of the patched component there was only one component that would run on any system but that is no longer the case. When determining which component to open each component's query function is called and the one that returns the highest priority is opened. The default priority of the patcher component is set slightly higher than the old ptmalloc2/ummunotify component. This commit fixes a long-standing break in the abstration of the memory components. ompi_mpi_init.c was referencing the linux malloc hook initilize function to ensure the hooks are initialized for libmpi.so. The abstraction break has been fixed by adding a memory base function that calls the open memory component's malloc hook init function if it has one. The code is not yet complete but is intended to support ptmalloc in 2.0.0. In that case the base function will always call the ptmalloc hook init if exists. Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from open-mpi/ompi@11e2d78) Signed-off-by: Nathan Hjelm <[email protected]>
Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from open-mpi/ompi@4cac623) Signed-off-by: Nathan Hjelm <[email protected]>
The --enable-static gives us what we want: statically linked components. Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from open-mpi/ompi@b1670f8) Signed-off-by: Nathan Hjelm <[email protected]>
This commit adds a framework to abstract runtime code patching. Components in the new framework can provide functions for either patching a named function or a function pointer. The later functionality is not being used but may provide a way to allow memory hooks when dlopen functionality is disabled. This commit adds two different flavors of code patching. The first is provided by the overwrite component. This component overwrites the first several instructions of the target function with code to jump to the provided hook function. The hook is expected to provide the full functionality of the hooked function. The linux patcher component is based on the memory hooks in ucx. It only works on linux and operates by overwriting function pointers in the symbol table. In this case the hook is free to call the original function using the function pointer returned by dlsym. Both components restore the original functions when the patcher framework closes. Changes had to be made to support Power/PowerPC with the Linux dynamic loader patcher. Some of the changes: - Move code necessary for powerpc/power support to the patcher base. The code is needed by both the overwrite and linux components. - Move patch structure down to base and move the patch list to mca_patcher_base_module_t. The structure has been modified to include a function pointer to the function that will unapply the patch. This allows the mixing of multiple different types of patches in the patch_list. - Update linux patching code to keep track of the matching between got entry and original (unpatched) address. This allows us to completely clean up the patch on finalize. All patchers keep track of the changes they made so that they can be reversed when the patcher framework is closed. At this time there are bugs in the Linux dynamic loader patcher so its priority is lower than the overwrite patcher. Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from open-mpi/ompi@27f8a4e) Signed-off-by: Nathan Hjelm <[email protected]>
This commit fixes bugs that can cause crashes and memory corruption when the mremap hook is called. The problem occurs because of the ellipses (...) in the mremap intercept function. The ellipses cover the optional new_addr argument on Linux. This commit removes the ellipses and adds an explicit 5th argument. This commit also adds a hook for shmdt. The code only works on Linux at the moment as it needs to read /proc/self/maps to determine the size of the shared memory segment. Additionally, this commit removes the mmap hook. There is no apparent benefit for detecting mmap(..., PROT_NONE, ...) and it seems to cause problems when threads are in use. Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from open-mpi/ompi@80ec79c) Signed-off-by: Nathan Hjelm <[email protected]>
This commit changes the configury of memory/linux to only allow it to build if the user specifies --with-memory-manager=linux. If this is specified then the ptmalloc component is forced on. If --with-memory-manager=linux is not specified the patcher memory component will be used. Systems without patcher support will not get a memory manager. Signed-off-by: Nathan Hjelm <[email protected]>
@hjelmn Might want to add open-mpi/ompi@4b3995d to silence a warning |
(cherry picked from open-mpi/ompi@4b3995d) Signed-off-by: Nathan Hjelm <[email protected]>
Done. |
Test PASSed. |
Test PASSed. |
Coverity found a problem last night:
@gpaulsen @markalle Can you confirm that this PR works properly on the v2.x branch? |
Should be ok. The read will fail and errno will be set. Didn't think it was worth checking the return on open. |
Can you fix anyway, please? 😃 |
Fix CID 1358512: Error handling issues (NEGATIVE_RETURNS): C libraries usually handle read (-1, ...) fine but it is safer to avoid calling read with a negative handle. Added negative file descriptor check. Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from open-mpi/ompi@7f27117) Signed-off-by: Nathan Hjelm <[email protected]>
Test PASSed. |
…plementation Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from commit open-mpi/ompi@a8e90e8) Signed-off-by: Nathan Hjelm <[email protected]>
Test PASSed. |
We're getting MTT compile warnings and errors on master from Absoft:
|
Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from open-mpi/ompi@1147fb3) Signed-off-by: Nathan Hjelm <[email protected]>
Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from open-mpi/ompi@5bc9d9d) Signed-off-by: Nathan Hjelm <[email protected]>
Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from open-mpi/ompi@d981a9f) Signed-off-by: Nathan Hjelm <[email protected]>
bot:nolabel:pushed-back |
Do you need open-mpi/ompi@16c2839, too? |
Test FAILed. |
@jsquyres Nope, that is part of the mpool rewrite. Different PR. |
Looks like a false positive from mellanox. :bot:retest: |
Test PASSed. |
Fixes open-mpi/ompi#1565 Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from open-mpi/ompi@db854c3) Signed-off-by: Nathan Hjelm <[email protected]>
Test FAILed. |
bot:retest |
Test PASSed. |
@hjelmn Is this PR now good to go? |
Yup. |
|
||
void opal_memory_base_malloc_init_hook (void) | ||
{ | ||
#if MEMORY_LINUX_PTMALLOC2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it a bit weird to have #if's for specific components in the base... is this really necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better than the #if
being in ompi_mpi_init.c ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I don't understand why they are needed at all...? Can't these callbacks be done in a way that doesn't break abstractions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hjelmn Tells me that there's no component selected at this point. Grumble. Well, it's not a regression, so I guess we'll let this go for v2.0.0...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would call it a partial fix for the abstraction break. Now there is no check for ptmalloc2 in ompi_mpi_init.c.
This is complicated stuff: add some comments so that future maintainers have some rationale to understand the way things have been done. Signed-off-by: Jeff Squyres <[email protected]> (cherry picked from commit 78b367e)
This is complicated stuff: add some comments as to why there's an unfortunate-but-intentional abstraction violation in the memory base. (this commit did not come from master, because ptmalloc has been removed from master -- this commit is only for v2.x) Signed-off-by: Jeff Squyres <[email protected]>
Hjelmn v2.x mem hooks
…ponent. (cherry picked from open-mpi/ompi@ed6f508) Signed-off-by: Nathan Hjelm <[email protected]>
…component (cherry picked from open-mpi/ompi@55a15fb) Signed-off-by: Nathan Hjelm <[email protected]>
@hppritcha I think we're good to go here once CI finishes |
Test PASSed. |
yep I'll merge once travis is done. |
This commit brings the new patcher memory hook to v2.0.0. The requested changes to the master commits have been made. The linux memory manager has not been deleted but it will not always be used if --with-memory-manager=linux is specified and will not be built otherwise.
:bot:assign: @jsquyres
:bot:milestone:v2.0.0
:bot🏷️blocker
:bot🏷️bug