-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[libomptarget] Support BE ELF files in plugins-nextgen #83976
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
35cdcd7
to
407ac26
Compare
The plugin was not getting built as the build_generic_elf64 macro assumes the LLVM triple processor name matches the CMake processor name, which is unfortunately not the case for SystemZ. Fix this by providing two separate arguments instead. Actually building the plugin exposed a number of other issues causing various test failures. Specifically, I've had to add the SystemZ target to - CompilerInvocation::ParseLangArgs - linkDevice in ClangLinuxWrapper.cpp - OMPContext::OMPContext (to set the device_kind_cpu trait) - LIBOMPTARGET_ALL_TARGETS in libomptarget/CMakeLists.txt - a check_plugin_target call in libomptarget/src/CMakeLists.txt Finally, I've had to set a number of test cases to UNSUPPORTED on s390x-ibm-linux-gnu; all these tests were already marked as UNSUPPORTED for x86_64-pc-linux-gnu and aarch64-unknown-linux-gnu and are failing on s390x for what seem to be the same reason. In addition, this also requires support for BE ELF files in plugins-nextgen: llvm#83976
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall seems fine, just some nits. Hard coding this to LE ELF was the easy solution because we didn't have any targets that used otherwise.
// Little-endian 64-bit | ||
if (const ELF64LEObjectFile *ELFObj = | ||
dyn_cast<ELF64LEObjectFile>(&**ElfOrErr)) | ||
return checkMachineImpl(*ELFObj, EMachine); | ||
// Big-endian 64-bit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, comments probably unnecessary, but if you keep them they should end with punctuation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just remove those now.
// Setup the global symbol's address and size. | ||
ImageGlobal.setPtr(const_cast<void *>(*AddrOrErr)); | ||
ImageGlobal.setSize((*SymOrErr)->st_size); | ||
ImageGlobal.setPtr((void *)(SymOrErr->data())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C++ casts please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Unfortunately this takes both a static_cast and a const_cast, but I guess this can't be helped here.
// If symbol not found, return an empty StringRef. | ||
if (!*SymOrErr) | ||
return StringRef(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't we used to have a separate boolean check for this? I suppose it works if we want to encode that error logic at the call site.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the point is that the check needs to be done at the call site. One caller wants to check whether the symbol exists or not (so non-existence should not be an error here), for the other caller non-existence is an error, so that error is (still) generated at the call site.
The plugin was not getting built as the build_generic_elf64 macro assumes the LLVM triple processor name matches the CMake processor name, which is unfortunately not the case for SystemZ. Fix this by providing two separate arguments instead. Actually building the plugin exposed a number of other issues causing various test failures. Specifically, I've had to add the SystemZ target to - CompilerInvocation::ParseLangArgs - linkDevice in ClangLinuxWrapper.cpp - OMPContext::OMPContext (to set the device_kind_cpu trait) - LIBOMPTARGET_ALL_TARGETS in libomptarget/CMakeLists.txt - a check_plugin_target call in libomptarget/src/CMakeLists.txt Finally, I've had to set a number of test cases to UNSUPPORTED on s390x-ibm-linux-gnu; all these tests were already marked as UNSUPPORTED for x86_64-pc-linux-gnu and aarch64-unknown-linux-gnu and are failing on s390x for what seem to be the same reason. In addition, this also requires support for BE ELF files in plugins-nextgen: llvm#83976
407ac26
to
0433b56
Compare
/// an empty StringRef; otherwise, returns a StringRef covering the symbol's | ||
/// data in the Obj buffer, based on its address and size | ||
llvm::Expected<llvm::StringRef> | ||
findSymbolInImage(const llvm::MemoryBufferRef Obj, llvm::StringRef Name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the other functions go off of an llvm::StringRef
for the ELF object, can we do the same here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the caller has an MemoryBufferRef
available, and the callee needs a MemoryBufferRef
(to pass to ObjectFile::createELFObjectFile
), so it seemed preferable to just pass it through rather then stripping out the StringRef
in the caller and re-creating another MemoryBufferRef
in the callee ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caller can use Buffer.getBuffer()
to get the StringRef, and we already construct the memory buffer elsewhere. It's just easier to be consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, I'll make that change.
if (!SymOrErr) { | ||
consumeError(SymOrErr.takeError()); | ||
return false; | ||
} | ||
|
||
return *SymOrErr; | ||
return !SymOrErr->empty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this interact with symbols that have no size? I.e. SHT_NOBITS
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll probably need to double check how that's handled in the ELF as well, I forget exactly how it's presented in the symbol form since it doesn't have a representation in ELF memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's true that for symbols with no size, we'd also report an empty StringRef, so we cannot distinguish these two cases (easily). I thought this should be OK as the user here actually wants to copy data to/from the memory object identified by the symbol, so it cannot really do anything with a zero-sized symbol either.
If we do need to be able to make that distinction, we'd have to tweak the interface a bit. Either add an explicit boolean, or else expose a bit more details of the implementation (e.g. we could check for SymOrErr->data() != nullptr
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just make it std::optional
if it's not meant to fail.
0433b56
to
bb2e42a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG, thanks for cleaning this up. Just a small style nit.
static Expected<std::optional<StringRef>> | ||
findSymbolInImageImpl(const object::ELFObjectFile<ELFT> &ELFObj, | ||
StringRef Name) { | ||
// Search for the symbol by name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, a lot of these comments are just restating what is generally observable from the code. I.e. getSymbol(ELFObj, Name)
implies we're looking up a symbol by name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left in a single comment describing the return value, removed all the others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good.
Code in plugins-nextgen reading ELF files is currently hard-coded to assume a 64-bit little-endian ELF format. Unfortunately, this assumption is even embedded in the interface between GlobalHandler and Utils/ELF routines, which use ELF64LE types. To fix this, I've refactored the interface to push all ELF specific types into Utils/ELF. Specifically, this patch removes both the getSymbol and getSymbolAddress routines and replaces them with a single findSymbolInImage, which gets a StringRef identifying the raw object file image as input, and returns a StringRef covering the data addressed by the symbol (address and size) if found, or std::nullopt otherwise. This allows properly templating over multiple ELF format variants inside Utils/ELF; specifically, this patch adds support for 64-bit big-endian ELF files in addition to 64-bit little-endian files.
bb2e42a
to
cf93c04
Compare
Thanks for the review! |
The plugin was not getting built as the build_generic_elf64 macro assumes the LLVM triple processor name matches the CMake processor name, which is unfortunately not the case for SystemZ. Fix this by providing two separate arguments instead. Actually building the plugin exposed a number of other issues causing various test failures. Specifically, I've had to add the SystemZ target to - CompilerInvocation::ParseLangArgs - linkDevice in ClangLinuxWrapper.cpp - OMPContext::OMPContext (to set the device_kind_cpu trait) - LIBOMPTARGET_ALL_TARGETS in libomptarget/CMakeLists.txt - a check_plugin_target call in libomptarget/src/CMakeLists.txt Finally, I've had to set a number of test cases to UNSUPPORTED on s390x-ibm-linux-gnu; all these tests were already marked as UNSUPPORTED for x86_64-pc-linux-gnu and aarch64-unknown-linux-gnu and are failing on s390x for what seem to be the same reason. In addition, this also requires support for BE ELF files in plugins-nextgen: #83976
Unfortunately, this seems to have caused regressions in the cuda and amdgpu builders. I was able to restore the builds by this commit: b64482e, but the amdgpu builders still failed due to some GPU memory address faults: Not sure what this is all about, I've reverted all patches again for now. If you have any suggestion what might have caused that problem, I'd appreciate it! I'll see if I'm able to reproduce the problem locally somehow. |
Most recent build seems green https://lab.llvm.org/buildbot/#/builders/193/builds/47893. Those bots sometimes just die for no reason, there's a lot of flaky tests unfortunately. |
Well, that's exactly the revision of my revert ... It does seem to be related, builds started failing exactly with the revision that checked in this PR, and started passing again with the revert. |
Ah, I thought you only landed the one fix, apologies. The most recent messages seem to be a compiler failure and not a test failure. But I wouldn't be surprised if there was some hidden behavior here. |
Ok, here's the full sequence of commits:
|
Do you have a GPU to run tests locally on? I would guess that the CPU targets don't requires a lot of the implicit argument handling or kernel argument handling so there's probably some overlooked behavior. |
Unfortunately, I don't have an AMD GPU locally. I've done another thorough review, and noticed a number of unintended changes in this PR:
As a conservative option, I've now implemented a new approach here: #85246 This keeps the overall structure the same, but just replaces the ELF-specific types with more generic ELF types in the common-code interfaces. This works the same on IBM Z, and I hope it will avoid introducing an breakage elsewhere (which I guess we'll see via build bot results if and when it can get committed) |
Code in plugins-nextgen reading ELF files is currently hard-coded to assume a 64-bit little-endian ELF format. Unfortunately, this assumption is even embedded in the interface between GlobalHandler and Utils/ELF routines, which use ELF64LE types.
To fix this, I've refactored the interface to push all ELF specific types into Utils/ELF. Specifically, this patch removes both the getSymbol and getSymbolAddress routines and replaces them with a single findSymbolInImage, which gets a MemoryBufferRef identifying the raw object file image as input, and returns a StringRef covering the data addressed by the symbol (address and size) if found, or an empty StringRef otherwise.
This allows properly templating over multiple ELF format variants inside Utils/ELF; specifically, this patch adds support for 64-bit big-endian ELF files in addition to 64-bit little-endian files.