-
Notifications
You must be signed in to change notification settings - Fork 241
Move CUDA stuff to an extension #4499
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Possibly, we should simply implement a CUDA extension in this PR with appropriate organization of the code and get on with the breaking change! tl;dr then after this is merged, anybody doing computations on nvidia GPU has to write using Oceananigans
using CUDA |
@simone-silvestri curious to hear your thoughts |
I think it's a good idea. It provides templates to add new architectures and makes the code completely architecture agnostic. the extra |
69eb545
to
59a441d
Compare
@michel2323 let us know when this is ready for prime time |
4bbb99e
to
3e84145
Compare
isnothing(devices) ? device!(node_rank % ndevices()) : device!(devices[node_rank+1]) | ||
isnothing(devices) ? device!(child_architecture, node_rank % ndevices(child_architecture)) : device!(child_architecture, devices[node_rank+1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@glwagner For the failing tests, we have 4 in total
|
I think the main problem is that I haven't figured out what |
To my knowledge that's an ill-formed query. |
@simone-silvestri @glwagner How do you want to proceed with these? Can this be rewritten to only use stuff from |
validate_boundary_condition_architecture(::CuArray, ::GPU, bc, side) = nothing | ||
|
||
validate_boundary_condition_architecture(::CuArray, ::CPU, bc, side) = | ||
throw(ArgumentError("$side $bc must use `Array` rather than `CuArray` on CPU architectures!")) | ||
|
||
validate_boundary_condition_architecture(::Array, ::GPU, bc, side) = | ||
throw(ArgumentError("$side $bc must use `CuArray` rather than `Array` on GPU architectures!")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this reads like a CUDA-specific message? Is the error message wrongly CUDA-specific here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throw(ArgumentError("$side $bc must use `CuArray` rather than `Array` on GPU architectures!")) | |
throw(ArgumentError("$side $bc must use `GPUArray` rather than `Array` on GPU architectures!")) |
?
I fixed the docs. The last error seems to be coming from MultiRegion. The cubed sphere simulation fails at the run!(simulation) when it tries to write output. Seems that the error is coming from an iterator? I wasn't able to figure it out. Here's an MWE using Oceananigans
grid = ConformalCubedSphereGrid(CPU(); panel_size = (18, 18, 9), z = (0, 1), radius = 1, horizontal_direction_halo = 6)
model = HydrostaticFreeSurfaceModel(; grid,
momentum_advection = WENOVectorInvariant(order=5),
tracer_advection = WENO(order=5),
free_surface = SplitExplicitFreeSurface(grid; substeps=12),
coriolis = HydrostaticSphericalCoriolis(eltype(grid)),
tracers = :b,
buoyancy = BuoyancyTracer())
simulation = Simulation(model, Δt=60, stop_time=600)
simulation.output_writers[:fields] = JLD2Writer(model, fields(model);
schedule = IterationInterval(2),
filename = "cubed_sphere_output",
verbose = false,
overwrite_existing = true)
run!(simulation) |
julia> using Oceananigans
[ Info: Oceananigans will use 12 threads
julia> grid = ConformalCubedSphereGrid(CPU(); panel_size = (18, 18, 9), z = (0, 1), radius = 1, horizontal_direction_halo = 6)
ConformalCubedSphereGrid{Float64, Oceananigans.Grids.FullyConnected, Oceananigans.Grids.FullyConnected, Bounded} partitioned on CPU():
├── grids: 18×18×9 OrthogonalSphericalShellGrid{Float64, Oceananigans.Grids.FullyConnected, Oceananigans.Grids.FullyConnected, Bounded} on CPU with 6×6×6 halo and with precomputed metrics
├── partitioning: CubedSpherePartition with (1 region in each panel)
├── connectivity: CubedSphereConnectivity
└── devices: (CPU(), CPU(), CPU(), CPU(), CPU(), CPU())
julia> model = HydrostaticFreeSurfaceModel(; grid,
momentum_advection = WENOVectorInvariant(order=5),
tracer_advection = WENO(order=5),
free_surface = SplitExplicitFreeSurface(grid; substeps=12),
coriolis = HydrostaticSphericalCoriolis(eltype(grid)),
tracers = :b,
buoyancy = BuoyancyTracer())
HydrostaticFreeSurfaceModel{CPU, MultiRegionGrid}(time = 0 seconds, iteration = 0)
├── grid: 18×18×9 ConformalCubedSphereGrid{Float64, Oceananigans.Grids.FullyConnected, Oceananigans.Grids.FullyConnected, Bounded} on CPU with 6×6×6 halo
├── timestepper: QuasiAdamsBashforth2TimeStepper
├── tracers: b
├── closure: Nothing
├── buoyancy: BuoyancyTracer with ĝ = NegativeZDirection()
├── free surface: SplitExplicitFreeSurface with gravitational acceleration 9.80665 m s⁻²
│ └── substepping: FixedSubstepNumber(8)
├── advection scheme:
│ ├── momentum: MultiRegionObject{NTuple{6, WENOVectorInvariant{3, 3, Float64, Oceananigans.Advection.OnlySelfUpwinding{Centered{2, Float64, Centered{1, Float64, Nothing}}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.divergence_smoothness)}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.divergence_smoothness)}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.u_smoothness)}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.v_smoothness)}}, WENO{3, Float64, Float32, Nothing, WENO{2, Float64, Float32, Nothing, UpwindBiased{1, Float64, Nothing, Centered{1, Float64, Nothing}}, Centered{1, Float64, Nothing}}, Centered{2, Float64, Centered{1, Float64, Nothing}}}, Oceananigans.Advection.VelocityStencil, WENO{3, Float64, Float32, Nothing, WENO{2, Float64, Float32, Nothing, UpwindBiased{1, Float64, Nothing, Centered{1, Float64, Nothing}}, Centered{1, Float64, Nothing}}, Centered{2, Float64, Centered{1, Float64, Nothing}}}, WENO{3, Float64, Float32, Nothing, WENO{2, Float64, Float32, Nothing, UpwindBiased{1, Float64, Nothing, Centered{1, Float64, Nothing}}, Centered{1, Float64, Nothing}}, Centered{2, Float64, Centered{1, Float64, Nothing}}}, WENO{3, Float64, Float32, Nothing, WENO{2, Float64, Float32, Nothing, UpwindBiased{1, Float64, Nothing, Centered{1, Float64, Nothing}}, Centered{1, Float64, Nothing}}, Centered{2, Float64, Centered{1, Float64, Nothing}}}, Oceananigans.Advection.OnlySelfUpwinding{Centered{2, Float64, Centered{1, Float64, Nothing}}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.divergence_smoothness)}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.divergence_smoothness)}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.u_smoothness)}, Oceananigans.Advection.FunctionStencil{typeof(Oceananigans.Advection.v_smoothness)}}}}, NTuple{6, CPU}, KernelAbstractions.CPU}
│ └── b: WENO{3, Float64, Float32}(order=5)
└── coriolis: HydrostaticSphericalCoriolis{Oceananigans.Advection.EnstrophyConserving{Float64}, Float64}
julia> simulation = Simulation(model, Δt=60, stop_time=600)
Simulation of HydrostaticFreeSurfaceModel{CPU, MultiRegionGrid}(time = 0 seconds, iteration = 0)
├── Next time step: 1 minute
├── Elapsed wall time: 0 seconds
├── Wall time per iteration: NaN days
├── Stop time: 10 minutes
├── Stop iteration: Inf
├── Wall time limit: Inf
├── Minimum relative step: 0.0
├── Callbacks: OrderedDict with 4 entries:
│ ├── stop_time_exceeded => 4
│ ├── stop_iteration_exceeded => -
│ ├── wall_time_limit_exceeded => e
│ └── nan_checker => }
├── Output writers: OrderedDict with no entries
└── Diagnostics: OrderedDict with no entries
julia> simulation.output_writers[:fields] = JLD2Writer(model, fields(model);
schedule = IterationInterval(2),
filename = "cubed_sphere_output",
verbose = false,
overwrite_existing = true)
JLD2Writer scheduled on IterationInterval(2):
├── filepath: cubed_sphere_output.jld2
├── 7 outputs: (u, v, w, b, η, U, V)
├── array type: Array{Float32}
├── including: [:grid, :coriolis, :buoyancy, :closure]
├── file_splitting: NoFileSplitting
└── file size: 1.9 MiB
julia> run!(simulation)
[ Info: Initializing simulation...
ERROR: MethodError: no method matching MultiRegionObject(::NTuple{6, Array{Float32, 3}})
Closest candidates are:
MultiRegionObject(::KernelAbstractions.Backend, ::Tuple, ::Tuple)
@ Oceananigans ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Utils/multi_region_transformation.jl:25
MultiRegionObject(::KernelAbstractions.Backend, Any...; devices)
@ Oceananigans ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Utils/multi_region_transformation.jl:18
MultiRegionObject(::Oceananigans.Architectures.AbstractArchitecture, ::Tuple, ::Tuple)
@ Oceananigans ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Utils/multi_region_transformation.jl:34
...
Stacktrace:
[1] convert_output(mo::MultiRegionObject{…}, writer::JLD2Writer{…})
@ Oceananigans.MultiRegion ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/MultiRegion/multi_region_output_writers.jl:54
[2] fetch_and_convert_output(output::Field{…}, model::HydrostaticFreeSurfaceModel{…}, writer::JLD2Writer{…})
@ Oceananigans.OutputWriters ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/OutputWriters/fetch_output.jl:40
[3] (::Oceananigans.OutputWriters.var"#36#37"{JLD2Writer{…}, HydrostaticFreeSurfaceModel{…}})(::Tuple{Symbol, Field{…}})
@ Oceananigans.OutputWriters ./none:0
[4] iterate
@ ./generator.jl:47 [inlined]
[5] merge(a::@NamedTuple{}, itr::Base.Generator{Base.Iterators.Zip{Tuple{…}}, Oceananigans.OutputWriters.var"#36#37"{JLD2Writer{…}, HydrostaticFreeSurfaceModel{…}}})
@ Base ./namedtuple.jl:360
[6] NamedTuple
@ ./namedtuple.jl:151 [inlined]
[7] macro expansion
@ ./timing.jl:395 [inlined]
[8] write_output!(writer::JLD2Writer{…}, model::HydrostaticFreeSurfaceModel{…})
@ Oceananigans.OutputWriters ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/OutputWriters/jld2_writer.jl:253
[9] write_output!(writer::JLD2Writer{…}, sim::Simulation{…})
@ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Simulations/simulation.jl:252
[10] initialize!(sim::Simulation{…})
@ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Simulations/run.jl:243
[11] time_step!(sim::Simulation{…})
@ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Simulations/run.jl:136
[12] run!(sim::Simulation{…}; pickup::Bool)
@ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Simulations/run.jl:105
[13] run!(sim::Simulation{…})
@ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl/src/Simulations/run.jl:92
[14] top-level scope
@ REPL[6]:1
Some type information was truncated. Use `show(err)` to see complete types.
`` |
# This should be deprectated. Calls GPU() which is only | ||
# defined when CUDA is loaded and maps to CUDAGPU() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think at the moment only the NCDatasetExt uses this method?
do we really need it? cc @tomchor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it depends how you defined need lol. This just spits out versioninfo()
with the GPU info at bottom. It's definitely good to have that information in NetCDF files after the runs, although it's not essential. That said, it's a small piece of code so I don't know if we'd gain much by removing it.
It's also used in benchmark/src/Benchmarks.jl
btw.
I added the backend as first arg in the MultiRegionObject constructor; see Is this the right thing to do? Now there is a FieldTimeSeries-related error is further down in the test_multi_region_cubed_sphere.jl... |
Providing CPU() to MultiRegionObject constructor seems to do the job... convert_output(mo::MultiRegionObject, writer) =
MultiRegionObject(CPU(), Tuple(convert(writer.array_type, obj) for obj in mo.regional_objects)) But is this the right thing to do? Not sure. |
All tests on tartarus pass! |
Distributed CI passes 🎉 |
I think the more general form is convert_output(mo::MultiRegionObject, writer) =
MultiRegionObject(
architecture_from_type(writer.array_type),
Tuple(convert(writer.array_type, obj) for obj in mo.regional_objects)
) I had to add: architecture_from_type(type::Type{<:AbstractArray}) = architecture(type()) @glwagner Opinions? |
@navidcy Thank you so much for the review and fixes! |
hi @michel2323, That's because What do we want here? We want it to return the architecture that corresponds to the outer type of the |
ff957c8 attempts to resolve this |
This PR isolates CUDA into
src/arch_cuda.jl
. This removes any direct CUDA calls in the remaining Oceananigans code base. That feel can either serve as a template for a new GPU architecture or for a future CUDA extension. @vchuravyCloses #3481