This commit fixes a compile error when the system has mremap but not MREMAP_FIXED. In this case we do not care about the value of new_address as the argument does not exist. Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from open-mpi/ompi@14c34ae) Signed-off-by: Nathan Hjelm <[email protected]>
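Here is a minimal sketch of that pattern. It assumes a Linux-style variadic `mremap()`, and `example_mremap_hook` is a hypothetical name, not the Open MPI hook. The optional fifth argument is read and forwarded only when `MREMAP_FIXED` exists, so the `#else` branch compiles against a four-argument `mremap()`.

```c
#define _GNU_SOURCE /* glibc: exposes mremap() and the MREMAP_* flags */
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <sys/mman.h>

#if defined(__linux__)
/* Hypothetical hook: forwards its arguments to the real mremap(). */
static void *example_mremap_hook(void *old_address, size_t old_size,
                                 size_t new_size, int flags, ...)
{
#if defined(MREMAP_FIXED)
    void *new_address = NULL;

    /* new_address is only meaningful when MREMAP_FIXED is set. */
    if (flags & MREMAP_FIXED) {
        va_list ap;
        va_start(ap, flags);
        new_address = va_arg(ap, void *);
        va_end(ap);
    }
    return mremap(old_address, old_size, new_size, flags, new_address);
#else
    /* No MREMAP_FIXED: the new_address argument does not exist. */
    return mremap(old_address, old_size, new_size, flags);
#endif
}
#endif
```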
Fixed a couple of typos in ia64 code. Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from open-mpi/ompi@f8b3be6) Signed-off-by: Nathan Hjelm <[email protected]>
Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from open-mpi/ompi@52edb43) Signed-off-by: Nathan Hjelm <[email protected]>
The function signature of mremap on BSD (NetBSD, FreeBSD) differs from the Linux version. Added support for the BSD style of mremap. Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from open-mpi/ompi@eb14b34) Signed-off-by: Nathan Hjelm <[email protected]>
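The following hedged sketch illustrates the signature difference using a hypothetical helper, `example_grow_mapping()`. The BSD branch assumes NetBSD's documented argument order `(oldp, oldsize, newp, newsize, flags)`. Passing `NULL` with no flags to mean "no placement request" is also an assumption. On a typical Linux build, only the Linux branch is compiled and exercised.

```c
#define _GNU_SOURCE /* glibc: declares mremap() and MREMAP_MAYMOVE */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: grow a mapping, letting the kernel move it. */
static void *example_grow_mapping(void *old, size_t old_size, size_t new_size)
{
#if defined(__NetBSD__)
    /* BSD style: the new address comes BEFORE the new size.
     * Assumption: NULL + flags 0 means no fixed placement is requested. */
    return mremap(old, old_size, NULL, new_size, 0);
#elif defined(__linux__)
    /* Linux style: (old_address, old_size, new_size, flags[, new_address]). */
    return mremap(old, old_size, new_size, MREMAP_MAYMOVE);
#else
    /* Other platforms are not handled by this sketch. */
    (void) old;
    (void) old_size;
    (void) new_size;
    return MAP_FAILED;
#endif
}
```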
Add a feature check for clflush before trying to use the clflush instruction. As far as I can tell there is no equivalent before the SSE2 instruction set. Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from commit open-mpi/ompi@581e47c) Signed-off-by: Nathan Hjelm <[email protected]>
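A sketch of such a check follows, assuming GCC or Clang on x86. It substitutes the compiler-provided `__get_cpuid()` from `<cpuid.h>` for hand-written inline asm and tests CPUID leaf 1, EDX bit 19 (CLFSH). The names `example_have_clflush()` and `example_flush_line()` are hypothetical.

```c
#include <assert.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h> /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* CPUID leaf 1, EDX bit 19 (CLFSH): the CLFLUSH instruction is available. */
#define EXAMPLE_CPUID_CLFSH (1u << 19)

/* Returns 1 if CLFLUSH may be used, 0 otherwise (including non-x86). */
static int example_have_clflush(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 when leaf 1 is not supported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return 0;
    }
    return (edx & EXAMPLE_CPUID_CLFSH) != 0;
#else
    return 0;
#endif
}

/* Flush the cache line holding *p, but only after the feature check. */
static void example_flush_line(const volatile void *p)
{
#if defined(__i386__) || defined(__x86_64__)
    if (example_have_clflush()) {
        __asm__ __volatile__("clflush %0"
                             :
                             : "m"(*(const volatile char *) p)
                             : "memory");
    }
#else
    (void) p;
#endif
}
```

Running CPUID before every flush is wasteful. Caching the result once at startup would be the obvious refinement.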
@hjelmn I should note that I see nothing in the PR that would address the patcher SEGVs seen on ARMv6 or PPC. -Paul
@PHHargrove Still working on those problems. They are most likely related, and I should have a fix for the PPC one tomorrow.
Test PASSed.
Never mind my request for a tarball; I was able to generate one on my own (after hunting down a system w/ new enough autotools).
On the Linux/Pentium-III and NetBSD7-i386 systems I get past the previous failure only to hit a new one, introduced by your "checker: check for cflush" commit:
The problem comes from the fact that "-fPIC" on x86 uses EBX as the PIC base register (and NetBSD and Darwin produce PIE executables by default, even w/o -fPIC). So, clobbering EBX is NOT AN OPTION and you must save/restore it. Code to do the save/restore exists in opal/include/opal/sys/ia32/timer.h:opal_sys_timer_get_cycles(). If you choose to clone that logic, then please drop the "{" and "}" in the inline asm (but not the characters between them) to avoid breaking things with Solaris Studio compilers (see https://www.open-mpi.org/community/lists/devel/2015/07/17585.php).
On IA64 I get past the build, but "make check" fails the dl_open test with a SEGV in the memory_patcher code:
On the old RHEL AS4 system (where MREMAP_FIXED is in linux/mman.h), I was able to build. Additionally, I verified that MREMAP_FIXED was actually used on that system, since you had made it optional. On NetBSD-7/amd64 I had to configure with --enable-mca-no-build=io-ompio to get around an unrelated issue. And one additional observation for you:
:bot:assign: @hjelmn
EBX cannot be clobbered when using -fPIC, so save and restore the register instead of allowing it to be clobbered. Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from open-mpi/ompi@2d0e2b6) Signed-off-by: Nathan Hjelm <[email protected]>
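Below is a minimal sketch of the save/restore approach, assuming GCC-style extended inline asm. `example_cpuid()` is hypothetical and is not a copy of `opal_sys_timer_get_cycles()`. On i386, the sketch swaps EBX into a scratch register around `cpuid` instead of listing EBX as clobbered. The asm string uses plain `xchgl`/`%%ebx` with no `{...}` dialect alternatives, which matches the Solaris Studio concern Paul raised.

```c
#include <assert.h>

/* Runs CPUID for `leaf` and stores EAX, EBX, ECX, EDX in regs[0..3].
 * Returns 1 on x86, or 0 elsewhere (regs is left untouched). */
static int example_cpuid(unsigned int leaf, unsigned int regs[4])
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int a, b, c, d;
#if defined(__i386__)
    /* With -fPIC (or a default PIE build), EBX may hold the GOT pointer,
     * so it is not declared clobbered. It is swapped into a scratch
     * register around CPUID instead. The early-clobber "=&r" keeps that
     * scratch register from sharing EAX with the `leaf` input. */
    __asm__ __volatile__("xchgl %%ebx, %1\n\t"
                         "cpuid\n\t"
                         "xchgl %%ebx, %1"
                         : "=a"(a), "=&r"(b), "=c"(c), "=d"(d)
                         : "0"(leaf));
#else
    /* x86_64 does not reserve RBX for PIC code, so it can be an output. */
    __asm__ __volatile__("cpuid"
                         : "=a"(a), "=b"(b), "=c"(c), "=d"(d)
                         : "0"(leaf));
#endif
    regs[0] = a;
    regs[1] = b;
    regs[2] = c;
    regs[3] = d;
    return 1;
#else
    (void) leaf;
    (void) regs;
    return 0;
#endif
}
```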
Test PASSed.
@hjelmn
Almost done with the patcher fixes. I can now run with ppc32. Just one more bug to track down.
@hjelmn
Recall that in addition to PPC, I have seen ARM and IA64 failures in the patcher code, and MIPS failures that I could not isolate due to the lack of a working debugger. I can provide pointers to the corresponding devel list postings for ARM and MIPS if needed.
@PHHargrove The ARM and PPC issues are more or less the same. I don't have an ARM system to test on, but the backtrace is identical to one I saw on PPC. As for IA64, I'm not sure what I can do about it, since we don't have any ia64 systems and QEMU doesn't support ia64.
I have ARM h/w I could give you an account on if QEMU is too horribly slow. I do have access to both MIPS64 and IA64 h/w, but "second hand", such that I probably cannot get you an account on either. If there is no way for you and me to fix IA64 support in a reasonable time frame, then I would suggest that "patcher" should be disabled on IA64.
It looks like the linux patcher isn't quite ready for primetime. It catches all calls to munmap except those made from within glibc. I will bring over a set of patches that will 1) disable overwrite on ia64 (memory hooks on this hardware are not worth the time), 2) disable patcher/linux until it can be fixed, and 3) fix the PPC TOC patching to apply only to PPC64. Once these are in, please test.
The table of contents (TOC) code appears to apply only to ppc64. The code was incorrectly assuming the existence of the TOC on ppc32. This commit updates the necessary code so that it only applies to ppc64. Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from commit open-mpi/ompi@71be36d) Signed-off-by: Nathan Hjelm <[email protected]>
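The following hypothetical helper shows what a ppc64-only guard might look like. On 64-bit PowerPC with the ELFv1 ABI, a C function pointer refers to a function descriptor. The first word of that descriptor is the code address, and the second is the TOC pointer. On ppc32 (and on ppc64 ELFv2, which still uses a TOC but has no descriptors), the pointer is the code address itself. The struct and function names are illustrative assumptions, not the Open MPI code.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*example_fn_t)(void);

#if defined(__powerpc64__) && (!defined(_CALL_ELF) || _CALL_ELF == 1)
/* ppc64 ELFv1: a function pointer points at a descriptor, not at code. */
struct example_func_desc {
    uintptr_t entry; /* address of the first instruction */
    uintptr_t toc;   /* TOC pointer (r2) the function expects */
    uintptr_t env;   /* environment pointer, unused by C */
};
#define EXAMPLE_HAVE_FUNC_DESC 1
#else
#define EXAMPLE_HAVE_FUNC_DESC 0
#endif

/* Returns the address where the function's code starts, i.e. the spot a
 * hypothetical patcher would overwrite. */
static uintptr_t example_code_address(example_fn_t fn)
{
#if EXAMPLE_HAVE_FUNC_DESC
    return ((const struct example_func_desc *) (uintptr_t) fn)->entry;
#else
    /* ppc32, ppc64 ELFv2, x86, ...: no descriptor to dereference. */
    return (uintptr_t) fn;
#endif
}
```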
Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from commit open-mpi/ompi@6c9a0e1) Signed-off-by: Nathan Hjelm <[email protected]>
Signed-off-by: Nathan Hjelm <[email protected]>
@PHHargrove Should be good to go. Includes the patches you just tested on master.
Tests queued... should have a report late tonight.
Test PASSed.
@hjelmn 👍
@hjelmn Your comment above makes me nervous: "It looks like the linux patcher isn't quite ready for primetime". Indeed -- it looks like you removed the linux patcher component as the last commit in this PR (at least as of now). Where do we stand with this PR -- are we good to go with the Linux patcher components in v2.0.0? Also, do we need open-mpi/ompi@ff2a54b in this PR?
I think we do want open-mpi/ompi@ff2a54b, given the reasons why UCX did a cleanup.
@jsquyres We are good to go. The overwrite patcher is working on x86, x86_64, ppc, and ppc64. I removed the linux patcher because it misses the munmap calls made by free. UCX gets around that by using the glibc malloc hook interface. I think for now that the overwrite patcher is sufficient, and I plan to fix ia64 support and add sparcv9 support in v2.1.0. ia64 is very low priority and sparcv9 is medium priority.
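The thread does not show the instruction sequence the overwrite patcher writes. As a hedged illustration of the general technique only (not the Open MPI code), the hypothetical helper below encodes an x86_64 absolute jump, `movabs $target, %r11; jmp *%r11`, into a buffer. It uses R11 instead of RAX because, in the SysV x86_64 ABI, AL carries the vector-register count into variadic functions such as `mremap`. R11, by contrast, is a scratch register that carries no arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_JUMP_LEN 13

/* Encode "movabs $target, %r11 ; jmp *%r11" into buf; returns bytes written. */
static size_t example_encode_jump_x86_64(uint8_t buf[EXAMPLE_JUMP_LEN],
                                         uint64_t target)
{
    size_t i;

    buf[0] = 0x49;            /* REX.W + REX.B: 64-bit operand, register r11 */
    buf[1] = 0xBB;            /* MOV r64, imm64 (opcode B8 + 3 for r11) */
    for (i = 0; i < 8; ++i) { /* imm64, little-endian */
        buf[2 + i] = (uint8_t) (target >> (8 * i));
    }
    buf[10] = 0x41;           /* REX.B: register r11 */
    buf[11] = 0xFF;           /* JMP r/m64 (opcode FF /4) */
    buf[12] = 0xE3;           /* ModRM: mod=11, reg=4, rm=3 (r11) */
    return EXAMPLE_JUMP_LEN;
}
```

Writing these bytes over a live function also requires making its code page writable (for example with `mprotect()`), which this sketch omits.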
@hppritcha I left open-mpi/ompi@ff2a54b in master and will only bring it over if I can figure out the best way to handle hooking free(). The cleanup was just to remove a lot of duplicate code. The code is now a bit easier to follow.
@hjelmn Ok.
I'm good with this in its current state. |
Fix a number of issues identified by @PHHargrove in the latest rc.
:bot:assign: @PHHargrove
:bot:label:bug
:bot:milestone:v2.0.0