Building using versions of cuda-python with the new layout breaks runtime compatibility with older versions #274

Open
@vyasr

RAPIDS started rolling out new pinnings to use cuda-python 11.8.5/12.6.2 at build time. Once we did so, we started observing cascading issues like this one. Specifically, once rmm was built against the new-layout cuda-python versions, downstream repositories that were still pinned to older versions of cuda-python (due to #215 and #226) began failing with the following error:

  File "device_buffer.pyx", line 1, in init rmm.pylibrmm.device_buffer
ModuleNotFoundError: No module named 'cuda.bindings'

The last frame in the traceback always points to the initialization of the rmm.pylibrmm.device_buffer module. This suggests to me that the parts of cuda-python that are cimported into this module embed the cuda.bindings namespace into the module initialization code, in a way that likely defeats the trampoline modules that were added to cuda-python for backwards compatibility. As a result, rmm modules compiled against new-layout cuda-python are incompatible with old-layout cuda-python at runtime.
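To make the suspected failure mode concrete, here is a minimal pure-Python sketch. It assumes (and this is only my reading of the traceback) that a compiled extension imports its cimported dependencies by the module name that was baked in at build time. The package `fakecuda` and the submodule names `ccudart` and `bindings.cyruntime` are hypothetical stand-ins, not the real cuda-python modules, and `simulated_module_init` is not what Cython actually generates.

```python
import importlib
import sys
import types


def install_old_layout():
    """Register a fake old-layout package where only `fakecuda.ccudart` exists."""
    pkg = types.ModuleType("fakecuda")
    pkg.__path__ = []  # mark as a package with no further submodules on disk
    mod = types.ModuleType("fakecuda.ccudart")
    mod.__pyx_capi__ = {"cudaMalloc": object()}  # placeholder for a C-API capsule
    pkg.ccudart = mod
    sys.modules["fakecuda"] = pkg
    sys.modules["fakecuda.ccudart"] = mod


def simulated_module_init(embedded_name):
    """Stand-in for extension init: import by the embedded name, read its C-API table."""
    provider = importlib.import_module(embedded_name)
    return provider.__pyx_capi__


install_old_layout()

# Built and run against the old layout: the embedded name resolves.
capi = simulated_module_init("fakecuda.ccudart")

# Built against the new layout, run against the old one: the embedded name is missing.
try:
    simulated_module_init("fakecuda.bindings.cyruntime")
    message = None
except ModuleNotFoundError as exc:
    message = str(exc)

print(message)
```

The import fails on the first missing parent package, so the error names `fakecuda.bindings` rather than the full dotted path, which is the same shape as the error in the traceback above.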

My guess is that this shares a root cause with the issues that force us to define __pyx_capi__ manually. Cython simply isn't designed for mismatched layouts like this. I suspect that some of the objects it defines internally in the cuda-bindings module are being copied directly over to the legacy cuda modules, even though that isn't what was intended. Those modules then carry internal objects that still specify the new-layout module names, which breaks consumers in the mixed build/runtime version case.
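A Python-level analogue of that guess, for illustration only: when a legacy module simply re-exports an object from the new-layout module, the object keeps the module name it was defined under. The names `fakepkg.bindings.runtime`, `fakepkg.runtime`, and `Stream` are hypothetical, and this says nothing definitive about what Cython does internally.

```python
import types

# Hypothetical new-layout module that defines the real object.
new_layout = types.ModuleType("fakepkg.bindings.runtime")
exec("class Stream:\n    pass\n", new_layout.__dict__)

# Hypothetical legacy "trampoline" module that only re-exports it.
legacy = types.ModuleType("fakepkg.runtime")
legacy.Stream = new_layout.Stream

# The re-exported class is the same object and still reports the new-layout name.
print(legacy.Stream is new_layout.Stream)
print(legacy.Stream.__module__)
```

Anything that records the defining module name while being built against the trampoline would therefore end up referring to the new-layout path, not the legacy one.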

I'm not sure this will be fixable without further interactions with Cython internals like the __pyx_capi__ change. It may not be worthwhile, and at this point it might make sense for cuda-python to simply advocate a clean break.

Metadata

Labels: CI/CD (CI/CD infrastructure), bug (Something isn't working), cuda.bindings (Everything related to the cuda.bindings module), wontfix (This will not be worked on)