Skip to content

Add IECORE_RTLD_GLOBAL environment variable #1127

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Mar 2, 2021

Conversation

johnhaddon
Copy link
Member

This adds an environment variable which can be used to turn off the RTLD_GLOBAL flag we've used to load our python modules up until now. After an unpleasant day or two of debugging where this was implicated but turned out not to be the issue, I'd really like to make some progress towards to getting rid of it so there's one less thing to worry about. @andrewkaufman, having done the work, I now realise you have something similar already in #954. The difference here is that I've also updated the CI setup to run all the unit tests with the flag turned off, demonstrating (hopefully) that all is well.

The dependencies update here isn't purely coincidental - it seemed that older versions of pyopenvdb would crash on load if the flag was off, independent of our code, but that appears to be fixed in 7.2.2. The next step is to update Gaffer to turn the flag off by default, and I'm getting close with that - everything working so far except for an issue with the Arnold display driver.

@johnhaddon johnhaddon self-assigned this Feb 18, 2021
@andrewkaufman
Copy link
Member

I would need the 2 commits from my other PR affecting IECoreMaya & IECoreHoudini as well. Do you want to cherry-pick those over (and fixup to use the new env var name) or should I try pushing into your branch myself? Or alternatively I could cherry-pick your commits into my PR.

I don't recall why we didn't merge mine back then, but I'd guess it was about not wanting the imath shenanigans. Maybe that'll just be unneeded now we're using 2.4.1 🤞

@johnhaddon
Copy link
Member Author

I've pushed 9654922 which fixes the last problem running Gaffer with IECORE_RTLD_GLOBAL=0

I would need the 2 commits from my other PR affecting IECoreMaya & IECoreHoudini as well.

My aim in this PR is to get the minimum requirement for Gaffer standalone merged (while not changing default behaviour), and then hand over to you to take on the broader DCC/IE stuff as a followup. From the comments on #954 it seems there might be unresolved problems with Maya/Houdini even with those commits, so I'd rather not cherry-pick blindly until those are addressed.

@johnhaddon
Copy link
Member Author

Would be good to get this merged so I can follow up with the matching PR in Gaffer (targeting 0.60). As mentioned above, this shouldn't change anything (except 9654922) unless opted into, so should be safe for the custom IE builds.

johnhaddon and others added 4 commits March 1, 2021 09:10
This is only an alpha release in the sense that we will probably bump major version for some other dependencies (e.g. USD) before calling it 3.0.0. All the dependencies are from officially released (non-alpha) versions. The intention is to use 3.0.0 for Gaffer 0.60, but 3.0.0 is also a better approximation of the environment used for deployment at Image Engine.
This can be used to turn off the use of the `RTLD_GLOBAL` flag in Python, which makes us less vulnerable to symbol clashes between libraries. Updated CI to run the unit tests with it turned off.
This was causing crashes at render shutdown in Gaffer unless we either used `RTLD_GLOBAL`
or we used `LD_PRELOAD=libai.so`. The crashes were always during deallocation, so I suspect
that we were getting mismatched allocation/deallocation between Arnold's allocator and our own.
@andrewkaufman
Copy link
Member

I'm testing this at IE now. Do you have an accompanying Gaffer PR? So far Cortex tests are all passing with RTLD_GLOBAL disabled, but I can't do interactive testing because libIERendering complains about undefined symbol: _ZN14GafferBindings6Detail23dependencyNodeMetaclassEv (GafferBindings::Detail::dependencyNodeMetaclass()), which means I can't verify that the libai linking change is ok on our end.

@andrewkaufman
Copy link
Member

I just pushed the extra commits from my other PR necessary for testing this at IE... hope that's ok...

@andrewkaufman
Copy link
Member

To clarify, these extra commits are good-to-go on our end. The branch works fine in Maya/Houdini/Gaffer when IECORE_RTLD_GLOBAL is enabled (or not specified), the Maya/Houdini unittests pass with IECORE_RTLD_GLOBAL disabled, and the limited interactive testing I was able to do when disabled seems to have gone ok as well.

So for us its ok to merge the PR, but it would be nice to confirm the Arnold change when IECORE_RTLD_GLOBAL is disabled...

@andrewkaufman
Copy link
Member

Looks like the py3 test failure is that I rebased on master. I'm not really sure why what test failure didn't showup on #1128 directly though... maybe I accidentally merged that one before the CI actually started...

@johnhaddon
Copy link
Member Author

Do you have an accompanying Gaffer PR?

I don't have a PR, and won't until I update the dependencies package once this Cortex PR is merged. But this is the branch I've been testing with :

https://github.com/johnhaddon/gaffer/tree/rtldGlobalDisabling

There are no changes really, just something in the wrapper to set IECORE_RTLD_GLOBAL=0.

libIERendering complains about undefined symbol: _ZN14GafferBindings6Detail23dependencyNodeMetaclassEv

At first glance that seems to be exported correctly in Gaffer. Perhaps you need to link against libGafferBindings.so?

Looks like the py3 test failure is that I rebased on master.

I'll open a PR to fix that.

@andrewkaufman andrewkaufman merged commit 18e60af into ImageEngine:master Mar 2, 2021
@johnhaddon johnhaddon deleted the dlopenFlagsPR branch August 2, 2021 15:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants