
Wrapper : Set IECORE_RTLD_GLOBAL=0 #4175

Merged
johnhaddon merged 1 commit into GafferHQ:main from johnhaddon:rtldGlobalDisabling
May 5, 2023

Conversation

@johnhaddon
Member

This builds on ImageEngine/cortex#1127, disabling the use of RTLD_GLOBAL in Gaffer.

@johnhaddon johnhaddon self-assigned this Mar 22, 2021
johnhaddon added a commit to johnhaddon/cortex that referenced this pull request Mar 22, 2021
The release packages we're uploading from GitHub Actions CI are intended to be compatible replacements for GafferHQ/dependencies version 3, and that's only possible if the Arnold versions match. This mismatch is the cause of at least some of the problems on GafferHQ/gaffer#4175.
@johnhaddon johnhaddon force-pushed the rtldGlobalDisabling branch from 4af7b8c to e462ff2 Compare April 1, 2021 10:31
@johnhaddon johnhaddon force-pushed the rtldGlobalDisabling branch from e462ff2 to 40d8849 Compare May 13, 2021 13:21
@johnhaddon
Member Author

johnhaddon commented May 14, 2021

This is crashing in the Arnold unit tests :

* Arnold 6.2.0.1 [903992ac] linux clang-10.0.1 oiio-2.2.1 osl-1.11.6 vdb-7.1.1 clm-2.0.0.235 rlm-12.4.2 optix-6.6.0 2021/02/09 02:32:52
* CRASHED in je_arena_mapbitsp_read 
* signal caught: SIGSEGV -- Invalid memory reference (address not mapped to object)
*
* backtrace:
>> 0 0x00007f1cd0b2c22c [libjemalloc.so ] je_arena_mapbitsp_read                                          [arena.h   :525]
*  1 0x00007f1c648d25dc [alembic_proc.so] std::deque<std::string, std::allocator<std::string> >::~deque() 
*  2 0x00007f1c648cf1a2 [alembic_proc.so] Guard::~Guard()                                                 [crtstuff.c:  ?]
*  3 0x00007f1ccfa53059 [libc.so.6      ] __cxa_finalize                                                  
*  4 0x00007f1c648412f2 [alembic_proc.so] __do_global_dtors_aux                                           [crtstuff.c:  ?]

The problem code can be distilled down to this :

import os

import arnold
import Gaffer
import GafferAppleseed

arnold.AiBegin()

import appleseed as asr

s = Gaffer.ScriptNode()

s["diffuse_edf"] = GafferAppleseed.AppleseedLight( "diffuse_edf" )
s["diffuse_edf"].loadShader( "diffuse_edf" )

s["render"] = GafferAppleseed.AppleseedRender( "AppleseedRender" )
s["render"]["in"].setInput( s["diffuse_edf"]["out"] )
s["render"]["mode"].setValue( s["render"].Mode.SceneDescriptionMode )

projectFilename =  self.temporaryDirectory() + "/test.appleseed"
s["render"]["fileName"].setValue( projectFilename )
s["render"]["task"].execute()

reader = asr.ProjectFileReader()
options = asr.ProjectFileReaderOptions.OmitReadingMeshFiles
project = reader.read( projectFilename, os.path.join( os.environ["APPLESEED"], "schemas", "project.xsd" ), options )

arnold.AiEnd()

But it can then be simplified even further :

import ctypes
import arnold

arnold.AiBegin()

ctypes.CDLL(
	#"/home/john/dev/build/gafferPython3/lib/libImath-2_4.so.24",
	"libstdc++.so.6",
	mode = ctypes.DEFAULT_MODE | ctypes.RTLD_GLOBAL
)

arnold.AiEnd()

The issue seems to be triggered by Appleseed loading its plugins (ieDisplay.so in this case) using RTLD_GLOBAL, which is what is simulated by the CDLL call in the simpler example.
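For readers unfamiliar with the flag, here is a minimal sketch of the two load modes as exposed by ctypes. The helper names `dlopen_mode` and `load_library` and the choice of the C maths library are assumptions made for illustration only; nothing here comes from Gaffer, Appleseed or Arnold.

```python
import ctypes
import ctypes.util


def dlopen_mode(make_global):
    """Return the ctypes `mode` for a global or a local symbol load."""
    # RTLD_GLOBAL makes the library's symbols available for resolving
    # symbols in libraries loaded afterwards; RTLD_LOCAL keeps them
    # private to the returned handle.
    return ctypes.RTLD_GLOBAL if make_global else ctypes.RTLD_LOCAL


def load_library(name, make_global=False):
    """Load the shared library `name` with the requested symbol visibility."""
    return ctypes.CDLL(name, mode=dlopen_mode(make_global))


if __name__ == "__main__":
    # Hypothetical example library: the C maths library, located by name.
    path = ctypes.util.find_library("m")
    if path is not None:
        lib = load_library(path, make_global=False)
        lib.cos.restype = ctypes.c_double
        lib.cos.argtypes = [ctypes.c_double]
        print(lib.cos(0.0))
```

With a local load, symbols from the library cannot be picked up by unrelated plugins that are loaded later, which is the kind of cross-library interference the crash above points to.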

Potential fixes seem to be :

  • Patch Appleseed to drop RTLD_GLOBAL
  • Fix whatever makes alembic_proc.so vulnerable in the first place
  • Don't leave Arnold universes open when we're not using them. We currently do that to allow many "readers" to coexist with a single "writer", but Arnold now claims to provide secondary universes that work for reading. In this case, the "readers" are ArnoldShaderUI and ArnoldShader.loadShader(), which just want to query which plugins are available and what their parameters are. If we go this route, we need to avoid creating a new temporary universe for every single shader load, as that's way too slow.
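For the third option, one possible way to limit the cost is to memoize query results, so that a temporary universe is only opened on a cache miss. The sketch below is an assumption about how that could look, not how Gaffer does it: `create_universe`, `destroy_universe`, `ShaderInfoCache` and the `query` callable are hypothetical stand-ins and are not Arnold API functions.

```python
import threading


def create_universe():
    """Hypothetical stand-in for opening a temporary read-only universe."""
    return object()


def destroy_universe(universe):
    """Hypothetical stand-in for closing the temporary universe."""


class ShaderInfoCache:
    """Memoizes shader queries so a universe is opened only on a cache miss."""

    def __init__(self, query, create=create_universe, destroy=destroy_universe):
        self._query = query  # callable( universe, shaderName ) -> info
        self._create = create
        self._destroy = destroy
        self._lock = threading.Lock()
        self._cache = {}

    def get(self, shaderName):
        with self._lock:
            if shaderName not in self._cache:
                universe = self._create()
                try:
                    self._cache[shaderName] = self._query(universe, shaderName)
                finally:
                    # Always close the universe, even if the query raises.
                    self._destroy(universe)
            return self._cache[shaderName]
```

This still opens one universe per distinct shader name. Querying every available shader inside a single universe would reduce that further, at the price of doing the work up front.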

@johnhaddon
Member Author

We should be able to revisit this PR once #5257 is merged.

@johnhaddon johnhaddon force-pushed the rtldGlobalDisabling branch from 40d8849 to bad846a Compare April 19, 2023 13:39
@johnhaddon
Member Author

johnhaddon commented Apr 19, 2023

I've resurrected this PR - now that we've removed GafferAppleseed we should be in the clear. This time round I've gone further, making it completely unconditional that we never load anything with RTLD_GLOBAL. In #4461 (comment), Andrew indicated that IE has been running Gaffer this way since March 2021, and yesterday Daniel confirmed that is still the case. I think that's a good indication that it should be stable...
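As a rough illustration of what an environment switch like this controls, the sketch below computes the dlopen flags Python would use when importing extension modules. The helper `import_dlopen_flags`, the default of "1" and the exact string comparison are assumptions for illustration; the real logic lives in Cortex and may differ.

```python
import ctypes
import os
import sys


def import_dlopen_flags(environ, current_flags):
    """Return the dlopen flags to use when importing extension modules.

    Assumption: RTLD_GLOBAL is added unless IECORE_RTLD_GLOBAL is "0".
    """
    if environ.get("IECORE_RTLD_GLOBAL", "1") == "0":
        # Leave the interpreter's flags untouched.
        return current_flags
    return current_flags | ctypes.RTLD_GLOBAL


if __name__ == "__main__" and hasattr(sys, "setdlopenflags"):
    # sys.getdlopenflags() and sys.setdlopenflags() are only available on Unix.
    sys.setdlopenflags(import_dlopen_flags(os.environ, sys.getdlopenflags()))
```

Setting the variable to 0 in the wrapper, before Python starts, means every process launched through it sees the same value.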

@johnhaddon
Member Author

Would be good to get your eyes on this one @danieldresser-ie.

@danieldresser-ie
Contributor

I can't say that I understand everything this affects, but the code looks fine, and based on IE's experience, hopefully it will be fine.

@johnhaddon johnhaddon force-pushed the rtldGlobalDisabling branch from bad846a to 59f65bb Compare May 5, 2023 07:25
@johnhaddon johnhaddon merged commit acbea06 into GafferHQ:main May 5, 2023
@johnhaddon johnhaddon deleted the rtldGlobalDisabling branch May 9, 2023 13:05