Skip to content

Support Arnold 7 and 6.2 in a single build#4461

Merged
johnhaddon merged 18 commits intoGafferHQ:masterfrom
johnhaddon:ieCoreArnold
Nov 15, 2021
Merged

Support Arnold 7 and 6.2 in a single build#4461
johnhaddon merged 18 commits intoGafferHQ:masterfrom
johnhaddon:ieCoreArnold

Conversation

@johnhaddon
Copy link
Member

This moves IECoreArnold from Cortex into Gaffer, fixes a few Arnold 7 issues, and adds dual Arnold 7/6.2 support to Gaffer. The intention is that for Gaffer 0.61, a single Gaffer download will serve both Arnold 6.2 and 7, choosing the appropriate version at runtime.

@johnhaddon johnhaddon self-assigned this Nov 9, 2021
andrewkaufman added a commit to andrewkaufman/gaffer that referenced this pull request Nov 10, 2021
This is a temp commit to get the tests passing. I'm assuming we _do_ want global attributes passed to the renderer, so we'd either need to add support for Box2fData (after GafferHQ#4461 is merged) or to somehow indentify the "special case" `render:*` globals that should _not_ be passed through.

Without this, we would get the following:

```
======================================================================
ERROR: testRenderRegion (GafferArnoldTest.ArnoldRenderTest.ArnoldRenderTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/andrewk/apps/gaffer/0.61.0.0dev/cent7.x86_64/cortex/10.3/gaffer/py2/python/GafferTest/TestCase.py", line 126, in __messageHandlerCleanup
    raise RuntimeError( "Unexpected message : " + failureHandler.levelAsString( message.level ) + " : " + message.context + " : " + message.message )
RuntimeError: Unexpected message : WARNING : setParameter : Unsupported data type "Box2fData" for name "render:cropWindow"
```
@johnhaddon johnhaddon force-pushed the ieCoreArnold branch 2 times, most recently from 385adaf to 83970b6 Compare November 10, 2021 10:52
Copy link
Contributor

@danieldresser-ie danieldresser-ie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I also checked with Andrew, and he doesn't think his hesitations about this organization are worth holding this up for.

# _without_ RTLD_GLOBAL. This prevents clashes between the
# LLVM symbols in libai.so and the Mesa OpenGL driver.
# Ideally we wouldn't use RTLD_GLOBAL anywhere - see
# Make sure we _GafferArnold _without_ RTLD_GLOBAL. This prevents clashes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: Make sure we import

I'm assuming the reason why we no longer do this to IECoreArnold here is because as part of moving it from Cortex, we no longer compile it with RTLD_GLOBAL?

Copy link
Member Author

@johnhaddon johnhaddon Nov 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming the reason why we no longer do this to IECoreArnold here is because as part of moving it from Cortex, we no longer compile it with RTLD_GLOBAL?

Ah, no, this was an error actually. It made sense to remove it in the earlier commit because _IECoreArnold no longer existed (it was just additional bindings in _GafferArnold). But now it is a standalone module, it needs to come back. I'll sort it out. I also need to get to the bottom of these Mac failures, and I think I can improve things by taking a more principled approach to the problem of the Arnold log and profile still being a global resource.

As an aside, RTLD_GLOBAL is a runtime thing, not a compile time thing. It gets turned on in IECore/__init__.py to work around historical problems, and I'm keen to remove it completely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, what is holding us back from removing it? At IE we flipped it off back in March and haven’t had any issues.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's documented here : #4175. Problem with Appleseed, which is presumably why it hasn't affected IE?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In preparation for moving the components of the original IECoreArnold from Cortex to Gaffer. Gaffer is effectively the only client of IECoreArnold now, and having all the Arnold code under one roof is going to simplify development and deployment considerably.

Also move the Python module into a Private submodule, mirroring the layout of the header files.
This was documented as needing moving, as it isn't used anywhere else in Cortex or Gaffer.
This is a straightforward copy, except :

- I've omitted the headers for MeshAlgo, SphereAlgo etc. This functionality is used exclusively via NodeAlgo anyway.
- Some of the test classes have been renamed. e.g. MeshTest -> MeshAlgoTest.
- Python 3 deprecation warnings have been fixed in the tests.
- `UniverseTest.testMetadataLoading` now launches a subprocess to ensure the metadata can be loaded.
Building it as a generic "Gaffer.so" Arnold plugin, as I have some plans for adding additional Arnold extensions in future, and it's simpler to have one plugin for them all.
This will allow us to support multiple Arnold versions in a single Gaffer install.
This makes them visible to the SCons scanner, so that SCons can see when we are building with different Arnold versions, and can therefore cache build results appropriately.

The downside of not using `-isystem` is that the headers aren't opted out of the `-Werror` arguments we build with. This exposes a bug in Arnold's own headers in 6.2, whereby `ai_dotass.h` contains calls to deprecated functions. We work around this by including exactly the `ai_*.h` headers we need, instead of using the catch-all `ai.h`.
Some API changed snuck in since the last beta version we tested against.
These are waiting on Arnold 7 fixes.
Also remove Azure workarounds from `installArnold.sh`.
This emphasises the boundary between IECoreArnold and GafferArnold, with GafferArnold depending on IECoreArnold, and IECoreArnold only being dependent on other `IECore*` libs (apart from the temporary dependency on `IECoreScenePreview::Renderer`).
This matches our previous behaviour for Arnold 6, and avoids a recursive initialisation problem when `libIECoreArnold` is first loaded by an Arnold plugin.
For some reason, on Mac only, the standard plugins appear to be loaded _after_ the contents of `ARNOLD_PLUGIN_PATH`. This triggers a warning because our `gaffer.mtd` file tries to register metadata for the `cryptomatte` shader, which only exists once the standard plugins have loaded.
@johnhaddon
Copy link
Member Author

I've rebased this to include the RTLD_GLOBAL change discussed inline, and added 3 further commits on the end. All tests are now passing on Linux, but we have remaining failures on Mac for Arnold 7. I've whittled those down to a very small standalone repro using only the Arnold Python API, and provided it to SolidAngle. I'm hoping they can either point to some misuse of the API or fix on their side.

In the meantime, the question is "what do we do?". Regardless of the Mac failures, I think there's a lot of value in making Arnold 7 available for beta testing, so I'm somewhat inclined to merge this anyway (assuming the new commits look OK). Any objections or other ideas?

@danieldresser-ie
Copy link
Contributor

All looks reasonable, and I agree with disabling the mac test in order to get a release out.

@johnhaddon
Copy link
Member Author

johnhaddon commented Nov 15, 2021

I agree with disabling the mac test in order to get a release out.

Done in 3b8b388. I've also been able to confirm that the tests pass in the as-yet-unreleased 7.0.0.1.

These exposed a bug in Arnold 7's render session handling. This could be reproduced with the following simple script :

```
arnold.AiBegin()

while True :

	# Make a universe, do an interactive render.
	# If we don't do this, then all is well.

	if True :

		universe = arnold.AiUniverse()
		arnold.AiSceneLoad( universe, './render.ass', None )
		session = arnold.AiRenderSession( universe )

		arnold.AiRenderBegin( session, arnold.AI_RENDER_MODE_CAMERA )
		arnold.AiRenderEnd( session )

		arnold.AiRenderSessionDestroy( session )
		arnold.AiUniverseDestroy( universe )

	# Make another universe, and load an ass file.
	# Occasionally this will fail, either crashing or
	# not producing any nodes.

	universe = arnold.AiUniverse()
	arnold.AiSceneLoad( universe, './load.ass', None )
	assert( arnold.AiNodeLookUpByName( universe, "test" ) is not None )
	arnold.AiUniverseDestroy( universe )

arnold.AiEnd()
```

We've confirmed that the upcoming Arnold 7.0.0.1 fixes both this simple repro and the original problem in the unit tests. But while that is still pending release, we want to be able to release Gaffer 0.61. Hence we disable the tests for the specific problematic Arnold version.
@johnhaddon johnhaddon merged commit cda9f40 into GafferHQ:master Nov 15, 2021
@johnhaddon johnhaddon deleted the ieCoreArnold branch November 19, 2021 12:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

3 participants