Move CUDA stuff to an extension #4499

michel2323 · 2025-05-12T14:39:14Z

This PR isolates all CUDA-specific code in src/arch_cuda.jl, which removes any direct CUDA calls from the rest of the Oceananigans code base. That file can serve either as a template for a new GPU architecture or as the basis for a future CUDA extension. @vchuravy

Closes #3481

src/arch_cuda.jl

glwagner · 2025-05-12T15:10:26Z

Perhaps we should simply implement a CUDA extension in this PR, organize the code appropriately, and get on with the breaking change!

tl;dr: after this is merged, anybody running computations on NVIDIA GPUs will have to write

using Oceananigans
using CUDA
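A minimal, self-contained sketch of the proposed layering, assuming the core package defines only an abstract architecture type plus generic functions, while the vendor-specific methods live in a separate module that Julia would load once the user writes using CUDA. All names below (CoreSketch, CUDAExtSketch, FakeCUDAGPU, FakeDeviceArray, move_to, zeros_on) are hypothetical stand-ins; a real package extension would also be declared under [weakdeps] and [extensions] in Project.toml and would use CUDA.jl's own array type.

```julia
# Hypothetical stand-in for the core package: it never mentions CUDA.
module CoreSketch
    abstract type AbstractArchitecture end
    struct CPU <: AbstractArchitecture end

    # Generic entry point; each architecture adds its own method.
    function move_to(arch::AbstractArchitecture, a::AbstractArray)
        error("No method for $(typeof(arch)); is the matching GPU package loaded?")
    end
    move_to(::CPU, a::AbstractArray) = Array(a)

    # Architecture-agnostic code only ever calls the generic function.
    zeros_on(arch::AbstractArchitecture, T, dims...) = move_to(arch, zeros(T, dims...))
end

# Hypothetical stand-in for the extension module, which Julia would load
# automatically once the weak dependency (here, CUDA) is imported.
module CUDAExtSketch
    using ..CoreSketch: AbstractArchitecture
    import ..CoreSketch: move_to

    struct FakeCUDAGPU <: AbstractArchitecture end

    # Stand-in for a device array type such as CUDA's CuArray.
    struct FakeDeviceArray{T, N}
        data::Array{T, N}
    end

    # Adds a method to the core package's generic function.
    move_to(::FakeCUDAGPU, a::AbstractArray) = FakeDeviceArray(Array(a))
end
```

The core module never refers to the GPU types; supporting a new backend only means adding methods from another module, which is the property the PR is after.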

glwagner · 2025-05-12T15:10:37Z

@simone-silvestri curious to hear your thoughts

simone-silvestri · 2025-05-12T17:05:14Z

I think it's a good idea. It provides a template for adding new architectures and makes the code completely architecture-agnostic. The extra using CUDA line is a small price to pay.

src/Utils/versioninfo.jl

glwagner · 2025-05-24T17:23:03Z

@michel2323 let us know when this is ready for prime time

glwagner · 2025-06-11T19:40:54Z

src/DistributedComputations/distributed_architectures.jl

-        isnothing(devices) ? device!(node_rank % ndevices()) : device!(devices[node_rank+1])
+        isnothing(devices) ? device!(child_architecture, node_rank % ndevices(child_architecture)) : device!(child_architecture, devices[node_rank+1])


@simone-silvestri
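The diff threads child_architecture through device! and ndevices, so device selection dispatches on the architecture instead of calling CUDA directly. A minimal sketch of the same round-robin logic, assuming a hypothetical FakeGPU architecture that just records the selected device id (FakeGPU and select_device! are illustrative names, not Oceananigans definitions):

```julia
abstract type AbstractArchitecture end

# Hypothetical GPU architecture: stores how many devices a node has and
# remembers which one was selected, instead of calling a real driver.
mutable struct FakeGPU <: AbstractArchitecture
    num_devices::Int
    current::Int
end

ndevices(arch::FakeGPU) = arch.num_devices
device!(arch::FakeGPU, id::Integer) = (arch.current = id; arch)

# Mirrors the one-liner in the diff: round-robin over the node's devices
# unless an explicit list is given (node_rank is 0-based, like an MPI rank).
function select_device!(child_architecture, node_rank, devices)
    if isnothing(devices)
        device!(child_architecture, node_rank % ndevices(child_architecture))
    else
        device!(child_architecture, devices[node_rank + 1])
    end
    return child_architecture.current
end
```

For example, with 4 devices per node, rank 5 lands on device 1, while passing devices = [3, 2] sends rank 1 to device 2.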

src/Architectures.jl

michel2323 · 2025-06-11T20:34:01Z

@glwagner There are 4 failing tests in total:

oceananigans-distributed: I think it can't find the commit because it runs from a fork.
cpu-turbulence-closure-tests: Bus error. No idea, man.
gpu-multi-region-tests: That's the hard one I still have to sit down with. I dug into it and it's definitely caused by my changes. However, I don't understand what is different at runtime that triggers it.
Documentation: fails for some reason.

michel2323 · 2025-06-12T15:59:37Z

I think the main problem is that I haven't figured out what getdevice actually means across all the objects for which it is implemented. In particular, there are a bunch of getdevice(somearray) calls. @vchuravy What would that look like with KA? Do the arrays know on which device they are?

vchuravy · 2025-06-12T17:31:29Z

Do the arrays know on which device they are?

To my knowledge that's an ill-formed query.

michel2323 · 2025-06-12T17:53:32Z

Do the arrays know on which device they are?

To my knowledge that's an ill-formed query.

@simone-silvestri @glwagner How do you want to proceed with these? Can this be rewritten to only use stuff from Architectures?

https://github.com/michel2323/Oceananigans.jl/blob/2e5f75498e8fa7896a91241351c0e2bac9904adc/src/Utils/multi_region_transformation.jl#L54-L70

navidcy · 2025-07-06T21:14:04Z

src/BoundaryConditions/boundary_condition.jl

-validate_boundary_condition_architecture(::CuArray, ::GPU, bc, side) = nothing
-
-validate_boundary_condition_architecture(::CuArray, ::CPU, bc, side) =
-    throw(ArgumentError("$side $bc must use `Array` rather than `CuArray` on CPU architectures!"))

 validate_boundary_condition_architecture(::Array, ::GPU, bc, side) =
    throw(ArgumentError("$side $bc must use `CuArray` rather than `Array` on GPU architectures!"))


This reads like a CUDA-specific message. Is the error message wrongly CUDA-specific here?

Suggested change

-    throw(ArgumentError("$side $bc must use `CuArray` rather than `Array` on GPU architectures!"))
+    throw(ArgumentError("$side $bc must use `GPUArray` rather than `Array` on GPU architectures!"))

?
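One way to make the check vendor-neutral is to dispatch on the array and architecture types and build the message from the architecture instead of hard-coding CuArray. A sketch with hypothetical stand-in types (CPUArch, GPUArch, DeviceVector, validate_bc_architecture); in a real extension the GPU array type would be the backend's own, e.g. CuArray:

```julia
abstract type AbstractArchitecture end
struct CPUArch <: AbstractArchitecture end
struct GPUArch <: AbstractArchitecture end

# Hypothetical device array type standing in for CuArray, ROCArray, etc.
struct DeviceVector{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(d::DeviceVector) = size(d.data)
Base.getindex(d::DeviceVector, i::Int) = d.data[i]

# Matching array and architecture: nothing to do.
validate_bc_architecture(::Array, ::CPUArch, bc, side) = nothing
validate_bc_architecture(::DeviceVector, ::GPUArch, bc, side) = nothing

# Mismatches: the message names the architecture, not a vendor-specific array type.
validate_bc_architecture(::DeviceVector, ::CPUArch, bc, side) =
    throw(ArgumentError("$side $bc must use `Array` rather than a GPU array on CPU architectures!"))

validate_bc_architecture(::Array, arch::GPUArch, bc, side) =
    throw(ArgumentError("$side $bc must use a GPU array rather than `Array` on $(nameof(typeof(arch))) architectures!"))
```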

navidcy · 2025-07-06T21:45:56Z

I fixed the docs.

The last error seems to come from MultiRegion: the cubed-sphere simulation fails in run!(simulation) when it tries to write output. Maybe the error is coming from an iterator? I wasn't able to figure it out.

Here's an MWE

using Oceananigans
grid = ConformalCubedSphereGrid(CPU(); panel_size = (18, 18, 9), z = (0, 1), radius = 1, horizontal_direction_halo = 6)
model = HydrostaticFreeSurfaceModel(; grid,
                                    momentum_advection = WENOVectorInvariant(order=5),
                                    tracer_advection = WENO(order=5),
                                    free_surface = SplitExplicitFreeSurface(grid; substeps=12),
                                    coriolis = HydrostaticSphericalCoriolis(eltype(grid)),
                                    tracers = :b,
                                    buoyancy = BuoyancyTracer())
simulation = Simulation(model, Δt=60, stop_time=600)

simulation.output_writers[:fields] = JLD2Writer(model, fields(model);
                                                schedule = IterationInterval(2),
                                                filename = "cubed_sphere_output",
                                                verbose = false,
                                                overwrite_existing = true)

run!(simulation)

navidcy · 2025-07-06T21:49:58Z

julia> using Oceananigans
[ Info: Oceananigans will use 12 threads

julia> grid = ConformalCubedSphereGrid(CPU(); panel_size = (18, 18, 9), z = (0, 1), radius = 1, horizontal_direction_halo = 6)
ConformalCubedSphereGrid{Float64, Oceananigans.Grids.FullyConnected, Oceananigans.Grids.FullyConnected, Bounded} partitioned on CPU():
├── grids: 18×18×9 OrthogonalSphericalShellGrid{Float64, Oceananigans.Grids.FullyConnected, Oceananigans.Grids.FullyConnected, Bounded} on CPU with 6×6×6 halo and with precomputed metrics
├── partitioning: CubedSpherePartition with (1 region in each panel)
├── connectivity: CubedSphereConnectivity
└── devices: (CPU(), CPU(), CPU(), CPU(), CPU(), CPU())

julia> model = HydrostaticFreeSurfaceModel(; grid,
                                           momentum_advection = WENOVectorInvariant(order=5),
                                           tracer_advection = WENO(order=5),
                                           free_surface = SplitExplicitFreeSurface(grid; substeps=12),
                                           coriolis = HydrostaticSphericalCoriolis(eltype(grid)),
                                           tracers = :b,
                                           buoyancy = BuoyancyTracer())
HydrostaticFreeSurfaceModel{CPU, MultiRegionGrid}(time = 0 seconds, iteration = 0)
├── grid: 18×18×9 ConformalCubedSphereGrid{Float64, Oceananigans.Grids.FullyConnected, Oceananigans.Grids.FullyConnected, Bounded} on CPU with 6×6×6 halo
├── timestepper: QuasiAdamsBashforth2TimeStepper
├── tracers: b
├── closure: Nothing
├── buoyancy: BuoyancyTracer with ĝ = NegativeZDirection()
├── free surface: SplitExplicitFreeSurface with gravitational acceleration 9.80665 m s⁻²
│   └── substepping: FixedSubstepNumber(8)
├── advection scheme:
│   ├── momentum: MultiRegionObject{NTuple{6, WENOVectorInvariant{3, 3, Float64, Oceananigans.Advection.OnlySelfUpwinding{Centered{2, Float64, Centered{1, Float64, Nothing}}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.divergence_smoothness)}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.divergence_smoothness)}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.u_smoothness)}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.v_smoothness)}}, WENO{3, Float64, Float32, Nothing, WENO{2, Float64, Float32, Nothing, UpwindBiased{1, Float64, Nothing, Centered{1, Float64, Nothing}}, Centered{1, Float64, Nothing}}, Centered{2, Float64, Centered{1, Float64, Nothing}}}, Oceananigans.Advection.VelocityStencil, WENO{3, Float64, Float32, Nothing, WENO{2, Float64, Float32, Nothing, UpwindBiased{1, Float64, Nothing, Centered{1, Float64, Nothing}}, Centered{1, Float64, Nothing}}, Centered{2, Float64, Centered{1, Float64, Nothing}}}, WENO{3, Float64, Float32, Nothing, WENO{2, Float64, Float32, Nothing, UpwindBiased{1, Float64, Nothing, Centered{1, Float64, Nothing}}, Centered{1, Float64, Nothing}}, Centered{2, Float64, Centered{1, Float64, Nothing}}}, WENO{3, Float64, Float32, Nothing, WENO{2, Float64, Float32, Nothing, UpwindBiased{1, Float64, Nothing, Centered{1, Float64, Nothing}}, Centered{1, Float64, Nothing}}, Centered{2, Float64, Centered{1, Float64, Nothing}}}, Oceananigans.Advection.OnlySelfUpwinding{Centered{2, Float64, Centered{1, Float64, Nothing}}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.divergence_smoothness)}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.divergence_smoothness)}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.u_smoothness)}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.v_smoothness)}}}}, NTuple{6, CPU}, KernelAbstractions.CPU}
│   └── b: WENO{3, Float64, Float32}(order=5)
└── coriolis: HydrostaticSphericalCoriolis{Oceananigans.Advection.EnstrophyConserving{Float64}, Float64}

julia> simulation = Simulation(model, Δt=60, stop_time=600)
Simulation of HydrostaticFreeSurfaceModel{CPU, MultiRegionGrid}(time = 0 seconds, iteration = 0)
├── Next time step: 1 minute
├── Elapsed wall time: 0 seconds
├── Wall time per iteration: NaN days
├── Stop time: 10 minutes
├── Stop iteration: Inf
├── Wall time limit: Inf
├── Minimum relative step: 0.0
├── Callbacks: OrderedDict with 4 entries:
│   ├── stop_time_exceeded => 4
│   ├── stop_iteration_exceeded => -
│   ├── wall_time_limit_exceeded => e
│   └── nan_checker => }
├── Output writers: OrderedDict with no entries
└── Diagnostics: OrderedDict with no entries

julia> simulation.output_writers[:fields] = JLD2Writer(model, fields(model);
                                                       schedule = IterationInterval(2),
                                                       filename = "cubed_sphere_output",
                                                       verbose = false,
                                                       overwrite_existing = true)
JLD2Writer scheduled on IterationInterval(2):
├── filepath: cubed_sphere_output.jld2
├── 7 outputs: (u, v, w, b, η, U, V)
├── array type: Array{Float32}
├── including: [:grid, :coriolis, :buoyancy, :closure]
├── file_splitting: NoFileSplitting
└── file size: 1.9 MiB

julia> run!(simulation)
[ Info: Initializing simulation...
ERROR: MethodError: no method matching MultiRegionObject(::NTuple{6, Array{Float32, 3}})

Closest candidates are:
  MultiRegionObject(::KernelAbstractions.Backend, ::Tuple, ::Tuple)
   @ Oceananigans ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Utils/multi_region_transformation.jl:25
  MultiRegionObject(::KernelAbstractions.Backend, Any...; devices)
   @ Oceananigans ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Utils/multi_region_transformation.jl:18
  MultiRegionObject(::Oceananigans.Architectures.AbstractArchitecture, ::Tuple, ::Tuple)
   @ Oceananigans ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Utils/multi_region_transformation.jl:34
  ...

Stacktrace:
  [1] convert_output(mo::MultiRegionObject{…}, writer::JLD2Writer{…})
    @ Oceananigans.MultiRegion ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/MultiRegion/multi_region_output_writers.jl:54
  [2] fetch_and_convert_output(output::Field{…}, model::HydrostaticFreeSurfaceModel{…}, writer::JLD2Writer{…})
    @ Oceananigans.OutputWriters ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/OutputWriters/fetch_output.jl:40
  [3] (::Oceananigans.OutputWriters.var"#36#37"{JLD2Writer{…}, HydrostaticFreeSurfaceModel{…}})(::Tuple{Symbol, Field{…}})
    @ Oceananigans.OutputWriters ./none:0
  [4] iterate
    @ ./generator.jl:47 [inlined]
  [5] merge(a::@NamedTuple{}, itr::Base.Generator{Base.Iterators.Zip{Tuple{…}}, Oceananigans.OutputWriters.var"#36#37"{JLD2Writer{…}, HydrostaticFreeSurfaceModel{…}}})
    @ Base ./namedtuple.jl:360
  [6] NamedTuple
    @ ./namedtuple.jl:151 [inlined]
  [7] macro expansion
    @ ./timing.jl:395 [inlined]
  [8] write_output!(writer::JLD2Writer{…}, model::HydrostaticFreeSurfaceModel{…})
    @ Oceananigans.OutputWriters ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/OutputWriters/jld2_writer.jl:253
  [9] write_output!(writer::JLD2Writer{…}, sim::Simulation{…})
    @ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Simulations/simulation.jl:252
 [10] initialize!(sim::Simulation{…})
    @ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Simulations/run.jl:243
 [11] time_step!(sim::Simulation{…})
    @ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Simulations/run.jl:136
 [12] run!(sim::Simulation{…}; pickup::Bool)
    @ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Simulations/run.jl:105
 [13] run!(sim::Simulation{…})
    @ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Simulations/run.jl:92
 [14] top-level scope
    @ REPL[6]:1
Some type information was truncated. Use `show(err)` to see complete types.

navidcy · 2025-07-06T23:02:41Z

src/Utils/versioninfo.jl

+# This should be deprecated. Calls GPU(), which is only
+# defined when CUDA is loaded and maps to CUDAGPU()


I think at the moment only the NCDatasetExt uses this method?
Do we really need it? cc @tomchor

I guess it depends on how you define "need" lol. This just spits out versioninfo() with the GPU info at the bottom. It's definitely good to have that information in NetCDF files after the runs, although it's not essential. That said, it's a small piece of code, so I don't know if we'd gain much by removing it.

It's also used in benchmark/src/Benchmarks.jl btw.

navidcy · 2025-07-06T23:44:33Z

I added the backend as the first argument of the MultiRegionObject constructor; see

https://github.com/michel2323/Oceananigans.jl/blob/05b1d9927275d7ec74fce2f7a7afab2769bc21d5/src/MultiRegion/multi_region_output_writers.jl#L56

Is this the right thing to do?

Now there is a FieldTimeSeries-related error further down in test_multi_region_cubed_sphere.jl...

navidcy · 2025-07-07T00:19:52Z

Providing CPU() to MultiRegionObject constructor seems to do the job...

convert_output(mo::MultiRegionObject, writer) =
    MultiRegionObject(CPU(), Tuple(convert(writer.array_type, obj) for obj in mo.regional_objects))

But is this the right thing to do? Not sure.
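The MethodError in the stack trace arises because convert_output still calls MultiRegionObject with a bare tuple, while every listed constructor now expects a backend or architecture as its first argument. A simplified sketch of the fix above, assuming a stripped-down MultiRegionSketch type that keeps only an architecture and the regional objects (MultiRegionSketch and WriterSketch are hypothetical; the real MultiRegionObject also stores devices):

```julia
abstract type AbstractArchitecture end
struct CPU <: AbstractArchitecture end

# Hypothetical, simplified stand-in for MultiRegionObject: the architecture
# is a required first argument, so a bare tuple no longer matches.
struct MultiRegionSketch{R<:Tuple, A<:AbstractArchitecture}
    architecture::A
    regional_objects::R
end

# Hypothetical writer that only carries the output array type.
struct WriterSketch
    array_type::Type
end

# Each region is converted to the writer's array type; the result is
# tagged with CPU(), mirroring the fix above.
convert_output(mo::MultiRegionSketch, writer) =
    MultiRegionSketch(CPU(), Tuple(convert(writer.array_type, obj) for obj in mo.regional_objects))
```

Tagging the result with CPU() assumes writer.array_type lives in host memory, which holds for the Array{Float32} writer in the MWE but would not hold for a device array type.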

navidcy · 2025-07-07T01:46:00Z

All tests on tartarus pass!
I'd like to see the distributed CI pass as well tho...

navidcy · 2025-07-07T03:20:46Z

Distributed CI passes 🎉

michel2323 · 2025-07-07T14:56:40Z

Providing CPU() to MultiRegionObject constructor seems to do the job...
convert_output(mo::MultiRegionObject, writer) =
    MultiRegionObject(CPU(), Tuple(convert(writer.array_type, obj) for obj in mo.regional_objects))
But is this the right thing to do? Not sure.

I think the more general form is

convert_output(mo::MultiRegionObject, writer) =
    MultiRegionObject(
        architecture_from_type(writer.array_type),
        Tuple(convert(writer.array_type, obj) for obj in mo.regional_objects)
    )

I had to add:

architecture_from_type(type::Type{<:AbstractArray}) = architecture(type())

@glwagner Opinions?

michel2323 · 2025-07-07T14:57:45Z

@navidcy Thank you so much for the review and fixes!

navidcy · 2025-07-07T21:49:47Z

hi @michel2323,
I doubt that the architecture_from_type method did the job -- the tests still fail.

That's because, I believe, an Array type cannot be instantiated as Array()...

What do we want here? We want it to return the architecture that corresponds to the outer type of the writer.array_type, correct? E.g., if this is Array{Float32} we want CPU() and if it's CuArray{Float64} we want GPU(), etc?
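Since calling the array type with no arguments is not a reliable way to obtain an instance (the point raised above), an alternative is to dispatch on the type itself. A sketch under that assumption, with hypothetical CPU/GPU stand-ins and a DeviceArraySketch type playing the role of a GPU array; commit ff957c8 ("add method for architecture(::Type{<:AbstractArray})") takes a similar route, though its exact definitions may differ:

```julia
abstract type AbstractArchitecture end
struct CPU <: AbstractArchitecture end
struct GPU <: AbstractArchitecture end

# Hypothetical device array type standing in for e.g. CuArray.
struct DeviceArraySketch{T, N} <: AbstractArray{T, N}
    data::Array{T, N}
end

# Dispatch on the (possibly non-concrete) array type; `<:` matches
# Array, Array{Float32}, and Array{Float32, 3} alike, so no instance is needed.
architecture(::Type{<:Array}) = CPU()

# In the real code this method would be added by the GPU extension.
architecture(::Type{<:DeviceArraySketch}) = GPU()
```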

src/MultiRegion/multi_region_output_writers.jl

navidcy · 2025-07-07T22:51:11Z

ff957c8 attempts to resolve this

michel2323 and others added 4 commits April 11, 2025 17:05

Update .gitlab-ci.yml file (6743772)
Adding Aurora CI (314ddea)
Fix (5656758)
Fix (15e1fb1)

michel2323 mentioned this pull request May 12, 2025

add KA.get_backend(dev) JuliaGPU/CUDA.jl#2779 (closed)

vchuravy reviewed May 12, 2025 (4 reviews on src/arch_cuda.jl, all outdated and resolved)

navidcy added the GPU 👾 Where Oceananigans gets its powers from label May 13, 2025

michel2323 force-pushed the ms/ka branch 2 times, most recently from 69eb545 to 59a441d May 16, 2025 16:08

vchuravy reviewed May 19, 2025: src/Utils/versioninfo.jl (outdated, resolved)

michel2323 force-pushed the ms/ka branch from bad037c to 56820c7 May 19, 2025 14:52

michel2323 force-pushed the ms/ka branch 2 times, most recently from 4bbb99e to 3e84145 June 5, 2025 18:04

glwagner mentioned this pull request Jun 10, 2025

Extend FFTBasedPoissonSolver to work on AMDGPU #4593 (open)

glwagner reviewed Jun 11, 2025: src/Architectures.jl (outdated, resolved)

michel2323 mentioned this pull request Jun 12, 2025

Distributed working with AMD issue #4597 (open)

michel2323 force-pushed the ms/ka branch from fc45a83 to 2e5f754 June 12, 2025 15:27

michel2323 force-pushed the ms/ka branch from 266435e to 5e68a49 June 18, 2025 14:14

install CUDA (0a73212)

navidcy reviewed Jul 6, 2025

navidcy added 4 commits July 7, 2025 09:35

delete stray empty line (f065e2a)
add backend when creating MultiRegionObject (f960b63)
missed = (63539ed)
remove some duplicate defs and gather tests together (05b1d99)

navidcy self-requested a review July 6, 2025 23:41

navidcy approved these changes Jul 6, 2025

convert_output(mo::MultiRegionObject, model) always on CPU? (609e5a7)

navidcy added the extensions 🧬 label Jul 7, 2025

navidcy requested review from glwagner and simone-silvestri July 7, 2025 01:20

navidcy mentioned this pull request Jul 7, 2025

Only allow Oceananigans 0.96.x CliMA/ClimaOcean.jl#572 (merged)

MultiRegionOuputWriter fix (984eb19)

navidcy reviewed Jul 7, 2025: src/MultiRegion/multi_region_output_writers.jl (outdated, resolved)

navidcy added 4 commits July 8, 2025 08:06

let go of arch_array (3209c40)
reorganize imports (1c6fa73)
add method for architecture(::Type{<:AbstractArray}) (ff957c8)
Merge branch 'main' into ms/ka (93a75d7)
Merge branch 'main' into ms/ka (3696c43)
