[libomptarget] Support BE ELF files in plugins-nextgen #83976

uweigand · 2024-03-05T09:00:23Z

Code in plugins-nextgen reading ELF files is currently hard-coded to assume a 64-bit little-endian ELF format. Unfortunately, this assumption is even embedded in the interface between GlobalHandler and Utils/ELF routines, which use ELF64LE types.

To fix this, I've refactored the interface to push all ELF-specific types into Utils/ELF. Specifically, this patch removes both the getSymbol and getSymbolAddress routines and replaces them with a single findSymbolInImage, which takes a MemoryBufferRef identifying the raw object file image as input, and returns a StringRef covering the data addressed by the symbol (address and size) if found, or an empty StringRef otherwise.

This allows properly templating over multiple ELF format variants inside Utils/ELF; specifically, this patch adds support for 64-bit big-endian ELF files in addition to 64-bit little-endian files.
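As a rough illustration of what "templating over multiple ELF format variants" means, here is a minimal sketch that uses only the standard library rather than the LLVM object classes the patch relies on. The names `checkMachineSketch`, `machineMatchesImpl`, and `read16` are hypothetical and are not part of libomptarget; the header offsets and constants are taken from the ELF specification. The sketch inspects `e_ident` at run time and then dispatches to an implementation that is templated on byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Constants from the ELF specification.
constexpr std::size_t EI_CLASS = 4, EI_DATA = 5;
constexpr unsigned char ELFCLASS64 = 2, ELFDATA2LSB = 1, ELFDATA2MSB = 2;
// e_machine follows the 16-byte e_ident and the 2-byte e_type.
constexpr std::size_t EMachineOffset = 18;

enum class Endian { Little, Big };

// Hypothetical helper: read a 16-bit field in the byte order chosen at
// compile time.
template <Endian E>
std::uint16_t read16(std::string_view Image, std::size_t Off) {
  assert(Off + 2 <= Image.size());
  unsigned B0 = static_cast<unsigned char>(Image[Off]);
  unsigned B1 = static_cast<unsigned char>(Image[Off + 1]);
  return static_cast<std::uint16_t>(E == Endian::Little ? (B0 | (B1 << 8))
                                                        : ((B0 << 8) | B1));
}

// Format-specific implementation, instantiated once per byte order.
template <Endian E>
bool machineMatchesImpl(std::string_view Image, std::uint16_t EMachine) {
  return read16<E>(Image, EMachineOffset) == EMachine;
}

// Hypothetical entry point: the caller only sees the raw image, and all
// format-specific types stay behind this function. Returns std::nullopt
// when the image is not a 64-bit ELF file.
std::optional<bool> checkMachineSketch(std::string_view Image,
                                       std::uint16_t EMachine) {
  if (Image.size() < 20 || Image.substr(0, 4) != std::string_view("\x7f" "ELF"))
    return std::nullopt;
  if (static_cast<unsigned char>(Image[EI_CLASS]) != ELFCLASS64)
    return std::nullopt;
  switch (static_cast<unsigned char>(Image[EI_DATA])) {
  case ELFDATA2LSB:
    return machineMatchesImpl<Endian::Little>(Image, EMachine);
  case ELFDATA2MSB:
    return machineMatchesImpl<Endian::Big>(Image, EMachine);
  default:
    return std::nullopt;
  }
}
```

With this shape, supporting another variant means adding one more case to the dispatch, and callers do not change.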

github-actions · 2024-03-05T09:02:50Z

✅ With the latest revision this PR passed the C/C++ code formatter.

The plugin was not getting built as the build_generic_elf64 macro assumes the LLVM triple processor name matches the CMake processor name, which is unfortunately not the case for SystemZ. Fix this by providing two separate arguments instead.

Actually building the plugin exposed a number of other issues causing various test failures. Specifically, I've had to add the SystemZ target to
- CompilerInvocation::ParseLangArgs
- linkDevice in ClangLinuxWrapper.cpp
- OMPContext::OMPContext (to set the device_kind_cpu trait)
- LIBOMPTARGET_ALL_TARGETS in libomptarget/CMakeLists.txt
- a check_plugin_target call in libomptarget/src/CMakeLists.txt

Finally, I've had to set a number of test cases to UNSUPPORTED on s390x-ibm-linux-gnu; all these tests were already marked as UNSUPPORTED for x86_64-pc-linux-gnu and aarch64-unknown-linux-gnu and are failing on s390x for what seem to be the same reason.

In addition, this also requires support for BE ELF files in plugins-nextgen: llvm#83976

jhuber6

Overall seems fine, just some nits. Hard coding this to LE ELF was the easy solution because we didn't have any targets that used otherwise.

jhuber6 · 2024-03-06T04:36:51Z

openmp/libomptarget/plugins-nextgen/common/src/Utils/ELF.cpp

+  // Little-endian 64-bit
+  if (const ELF64LEObjectFile *ELFObj =
+          dyn_cast<ELF64LEObjectFile>(&**ElfOrErr))
+    return checkMachineImpl(*ELFObj, EMachine);
+  // Big-endian 64-bit


Nit, comments probably unnecessary, but if you keep them they should end with punctuation.

I've just removed those now.

jhuber6 · 2024-03-06T04:39:35Z

openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp

  // Setup the global symbol's address and size.
-  ImageGlobal.setPtr(const_cast<void *>(*AddrOrErr));
-  ImageGlobal.setSize((*SymOrErr)->st_size);
+  ImageGlobal.setPtr((void *)(SymOrErr->data()));


C++ casts please

OK. Unfortunately this takes both a static_cast and a const_cast, but I guess this can't be helped here.
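To make the "both a static_cast and a const_cast" point concrete, here is a small sketch. It uses `std::string_view` as a stand-in for `llvm::StringRef`, and `mutableDataPtr` is a hypothetical helper name, not something from the patch. The two steps are needed because `static_cast` cannot drop `const`, while `const_cast` cannot change the pointee type from `char` to `void`.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper: turn the read-only start of a view into a void *.
inline void *mutableDataPtr(std::string_view Data) {
  // Step 1: change the pointee type, keeping const (const char * to
  // const void *).
  const void *ReadOnly = static_cast<const void *>(Data.data());
  // Step 2: drop const; const_cast is the only named cast allowed to do so.
  return const_cast<void *>(ReadOnly);
}

// Usage: the address is unchanged; only the static type differs.
inline void example() {
  char Buffer[] = "global";
  std::string_view View(Buffer, sizeof(Buffer) - 1);
  assert(mutableDataPtr(View) == Buffer);
}
```

Writing through the resulting pointer is only well defined if the underlying storage was not itself declared `const`, which the sketch assumes.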

jhuber6 · 2024-03-06T04:41:48Z

openmp/libomptarget/plugins-nextgen/common/src/Utils/ELF.cpp

+  // If symbol not found, return an empty StringRef.
+  if (!*SymOrErr)
+    return StringRef();


Didn't we use to have a separate boolean check for this? I suppose it works if we want to encode that error logic at the call site.

Yes, the point is that the check needs to be done at the call site. One caller wants to check whether the symbol exists or not (so non-existence should not be an error here), for the other caller non-existence is an error, so that error is (still) generated at the call site.


jhuber6 · 2024-03-06T15:28:18Z

openmp/libomptarget/plugins-nextgen/common/include/Utils/ELF.h

+/// an empty StringRef; otherwise, returns a StringRef covering the symbol's
+/// data in the Obj buffer, based on its address and size
+llvm::Expected<llvm::StringRef>
+findSymbolInImage(const llvm::MemoryBufferRef Obj, llvm::StringRef Name);


All the other functions go off of an llvm::StringRef for the ELF object, can we do the same here?

Well, the caller has a MemoryBufferRef available, and the callee needs a MemoryBufferRef (to pass to ObjectFile::createELFObjectFile), so it seemed preferable to just pass it through rather than stripping out the StringRef in the caller and re-creating another MemoryBufferRef in the callee ...

Caller can use Buffer.getBuffer() to get the StringRef, and we already construct the memory buffer elsewhere. It's just easier to be consistent.

Fair enough, I'll make that change.

jhuber6 · 2024-03-06T15:34:58Z

openmp/libomptarget/plugins-nextgen/common/src/GlobalHandler.cpp

  if (!SymOrErr) {
    consumeError(SymOrErr.takeError());
    return false;
  }

-  return *SymOrErr;
+  return !SymOrErr->empty();


How does this interact with symbols that have no size? I.e. SHT_NOBITS.

I'll probably need to double check how that's handled in the ELF as well, I forget exactly how it's presented in the symbol form since it doesn't have a representation in ELF memory.

It's true that for symbols with no size, we'd also report an empty StringRef, so we cannot distinguish these two cases (easily). I thought this should be OK as the user here actually wants to copy data to/from the memory object identified by the symbol, so it cannot really do anything with a zero-sized symbol either.

If we do need to be able to make that distinction, we'd have to tweak the interface a bit. Either add an explicit boolean, or else expose a bit more details of the implementation (e.g. we could check for SymOrErr->data() != nullptr).

Just make it std::optional if it's not meant to fail.
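The difference between the two return conventions can be sketched as follows. This is a simplified model, not the patch itself: `SymbolInfo`, `SymbolTable`, and `findSymbolSketch` are hypothetical names, a `std::map` stands in for the ELF symbol table, and `std::string_view` stands in for `llvm::StringRef`. The point is that `std::nullopt` means "no such symbol", while an engaged but empty view means "symbol exists with size zero".

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical symbol record: offset and size within the image.
struct SymbolInfo {
  std::size_t Offset;
  std::size_t Size;
};
using SymbolTable = std::map<std::string, SymbolInfo>;

// Returns std::nullopt if the symbol is absent; otherwise a view of the
// symbol's bytes, which may legitimately be empty for a zero-sized symbol.
std::optional<std::string_view> findSymbolSketch(std::string_view Image,
                                                 const SymbolTable &Symbols,
                                                 std::string_view Name) {
  auto It = Symbols.find(std::string(Name));
  if (It == Symbols.end())
    return std::nullopt;
  const SymbolInfo &Sym = It->second;
  assert(Sym.Offset <= Image.size() && Sym.Size <= Image.size() - Sym.Offset);
  return Image.substr(Sym.Offset, Sym.Size);
}
```

A caller that only checks for existence would test `has_value()`, while a caller that needs the data can additionally check `empty()` and report its own error.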

jhuber6

LG, thanks for cleaning this up. Just a small style nit.

jhuber6 · 2024-03-06T19:22:30Z

openmp/libomptarget/plugins-nextgen/common/src/Utils/ELF.cpp

+static Expected<std::optional<StringRef>>
+findSymbolInImageImpl(const object::ELFObjectFile<ELFT> &ELFObj,
+                      StringRef Name) {
+  // Search for the symbol by name.


Nit, a lot of these comments are just restating what is generally observable from the code. I.e. getSymbol(ELFObj, Name) implies we're looking up a symbol by name.

Left in a single comment describing the return value, removed all the others.

Thanks, looks good.

Code in plugins-nextgen reading ELF files is currently hard-coded to assume a 64-bit little-endian ELF format. Unfortunately, this assumption is even embedded in the interface between GlobalHandler and Utils/ELF routines, which use ELF64LE types.

To fix this, I've refactored the interface to push all ELF specific types into Utils/ELF. Specifically, this patch removes both the getSymbol and getSymbolAddress routines and replaces them with a single findSymbolInImage, which gets a StringRef identifying the raw object file image as input, and returns a StringRef covering the data addressed by the symbol (address and size) if found, or std::nullopt otherwise.

This allows properly templating over multiple ELF format variants inside Utils/ELF; specifically, this patch adds support for 64-bit big-endian ELF files in addition to 64-bit little-endian files.

uweigand · 2024-03-06T19:47:17Z

Thanks for the review!

The plugin was not getting built as the build_generic_elf64 macro assumes the LLVM triple processor name matches the CMake processor name, which is unfortunately not the case for SystemZ. Fix this by providing two separate arguments instead.

Actually building the plugin exposed a number of other issues causing various test failures. Specifically, I've had to add the SystemZ target to
- CompilerInvocation::ParseLangArgs
- linkDevice in ClangLinuxWrapper.cpp
- OMPContext::OMPContext (to set the device_kind_cpu trait)
- LIBOMPTARGET_ALL_TARGETS in libomptarget/CMakeLists.txt
- a check_plugin_target call in libomptarget/src/CMakeLists.txt

Finally, I've had to set a number of test cases to UNSUPPORTED on s390x-ibm-linux-gnu; all these tests were already marked as UNSUPPORTED for x86_64-pc-linux-gnu and aarch64-unknown-linux-gnu and are failing on s390x for what seem to be the same reason.

In addition, this also requires support for BE ELF files in plugins-nextgen: #83976

This reverts commit 15b7b31.

uweigand · 2024-03-06T20:46:19Z

Unfortunately, this seems to have caused regressions in the cuda and amdgpu builders. I was able to restore the builds by this commit: b64482e, but the amdgpu builders still failed due to some GPU memory address faults:
https://lab.llvm.org/buildbot/#/builders/193/builds/47890

Not sure what this is all about, I've reverted all patches again for now. If you have any suggestion what might have caused that problem, I'd appreciate it! I'll see if I'm able to reproduce the problem locally somehow.

jhuber6 · 2024-03-06T20:48:39Z

Most recent build seems green https://lab.llvm.org/buildbot/#/builders/193/builds/47893. Those bots sometimes just die for no reason, there's a lot of flaky tests unfortunately.

uweigand · 2024-03-06T20:50:20Z

Most recent build seems green https://lab.llvm.org/buildbot/#/builders/193/builds/47893.

Well, that's exactly the revision of my revert ... It does seem to be related, builds started failing exactly with the revision that checked in this PR, and started passing again with the revert.

jhuber6 · 2024-03-06T20:52:19Z

Most recent build seems green https://lab.llvm.org/buildbot/#/builders/193/builds/47893.

Well, that's exactly the revision of my revert ... It does seem to be related, builds started failing exactly with the revision that checked in this PR, and started passing again with the revert.

Ah, I thought you only landed the one fix, apologies. The most recent messages seem to be a compiler failure and not a test failure. But I wouldn't be surprised if there was some hidden behavior here.

uweigand · 2024-03-06T21:06:48Z

Ok, here's the full sequence of commits:

15b7b31 [libomptarget] Support BE ELF files in plugins-nextgen (#83976)
This PR. This actually caused 4 builders to fail with a compiler error, but before I noticed this, I also committed the next PR.
3ecd38c [libomptarget] Build plugins-nextgen for SystemZ (#83978)
PR 83978, which I had waited to commit as it depends on this PR. After I had committed this, I started getting the builder failures. I noticed the compiler error, thought this was easy to fix, and checked in the following quick fix.
b64482e [libomptarget] Fix CUDA plugin build regression
Quick fix intended to fix the compile error, which it actually did. This caused two of the four failing builders to pass again. The two remaining ones also started compiling successfully again, but still failed during testing, now with the GPU memory access fault. At that point, I decided to revert all three patches to get the builders green.
d4f4f80 Revert "[libomptarget] Fix CUDA plugin build regression"
One builder picked up this intermediate state and again failed with the compiler error.
70677c8 Revert "[libomptarget] Build plugins-nextgen for SystemZ (#83978)"
This intermediate state was also picked up, still failing with the compiler error.
fb7cc73 Revert "[libomptarget] Support BE ELF files in plugins-nextgen (#83976)"
Now all builders are green again.

jhuber6 · 2024-03-06T21:09:11Z

Do you have a GPU to run tests locally on? I would guess that the CPU targets don't require a lot of the implicit argument handling or kernel argument handling, so there's probably some overlooked behavior.

uweigand · 2024-03-14T15:37:57Z

Do you have a GPU to run tests locally on? I would guess that the CPU targets don't require a lot of the implicit argument handling or kernel argument handling, so there's probably some overlooked behavior.

Unfortunately, I don't have an AMD GPU locally. I've done another thorough review, and noticed a number of unintended changes in this PR:

- I had overlooked that cuda/src/rtl.cpp directly accesses Handler.getELFObjectFile.
- When using ObjectFile::createELFObjectFile in the new findSymbolInImage, I was passing /*InitContent=*/false (copied from checkMachine). While this is fine for checkMachine, it may cause problems when doing anything more complicated with the ELFObjectFile.
- There are some corner cases where using the new findSymbolInImage for simply checking symbol existence may return a different result than the original getSymbol. I cannot prove this breaks anything, but I cannot really exclude it either.

As a conservative option, I've now implemented a new approach here: #85246. This keeps the overall structure the same, but just replaces the ELF-specific types with more generic ELF types in the common-code interfaces. This works the same on IBM Z, and I hope it will avoid introducing any breakage elsewhere (which I guess we'll see via build bot results if and when it can get committed).

llvmbot added the openmp:libomptarget OpenMP offload runtime label Mar 5, 2024

uweigand force-pushed the openmp-plugin-elfbe branch from 35cdcd7 to 407ac26 Compare March 5, 2024 09:12

uweigand mentioned this pull request Mar 5, 2024

[libomptarget] Build plugins-nextgen for SystemZ #83978

Merged

uweigand requested review from jhuber6 and jdoerfert March 5, 2024 16:51

uweigand mentioned this pull request Mar 5, 2024

Add openmp support to System z #66081

Merged

jhuber6 reviewed Mar 6, 2024


uweigand force-pushed the openmp-plugin-elfbe branch from 407ac26 to 0433b56 Compare March 6, 2024 15:09

jhuber6 reviewed Mar 6, 2024


uweigand force-pushed the openmp-plugin-elfbe branch from 0433b56 to bb2e42a Compare March 6, 2024 19:19

jhuber6 approved these changes Mar 6, 2024


uweigand force-pushed the openmp-plugin-elfbe branch from bb2e42a to cf93c04 Compare March 6, 2024 19:34

jhuber6 approved these changes Mar 6, 2024


uweigand merged commit 15b7b31 into llvm:main Mar 6, 2024

uweigand deleted the openmp-plugin-elfbe branch March 6, 2024 19:49

uweigand added a commit that referenced this pull request Mar 6, 2024

Revert "[libomptarget] Support BE ELF files in plugins-nextgen (#83976)"

fb7cc73

This reverts commit 15b7b31.

uweigand restored the openmp-plugin-elfbe branch March 6, 2024 20:38

uweigand deleted the openmp-plugin-elfbe branch March 15, 2024 18:03
