DRAFT: Optimized Non-Conservative Zonal Average #1180


Closed

Conversation

hongyuchen1030
Contributor

@hongyuchen1030 hongyuchen1030 commented Mar 12, 2025

Closes #1181 #880

Overview

  • Introduce specialized helper functions for the optimized non-conservative zonal average routine in zonal.py
  • Implement caching and reuse of intermediary computational steps to reduce redundant recalculations
  • Retain the general-purpose helper functions (which use data copying) for broader usability
  • Improve overall performance of the zonal averaging computation
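The cache-and-reuse idea in the bullets above can be sketched as follows. This is an illustrative pattern only; `ZonalHelper` and `_build_face_edges` are hypothetical names for this sketch, not the uxarray API:

```python
import numpy as np

class ZonalHelper:
    """Illustrative cache-and-reuse wrapper: an expensive intermediate
    (the per-face edge array) is computed once, then shared by every
    subsequent zonal-average call."""

    def __init__(self, node_xyz, face_node_connectivity):
        self.node_xyz = node_xyz                              # (n_node, 3)
        self.face_node_connectivity = face_node_connectivity  # (n_face, n_per_face)
        self._face_edges = None                               # cached intermediate

    def _build_face_edges(self):
        # Gather both endpoints of every edge of every face:
        # result has shape (n_face, n_per_face, 2, 3).
        conn = self.face_node_connectivity
        nxt = np.roll(conn, -1, axis=1)  # next node along each face's winding
        return np.stack([self.node_xyz[conn], self.node_xyz[nxt]], axis=2)

    @property
    def face_edges(self):
        if self._face_edges is None:     # built only on first access
            self._face_edges = self._build_face_edges()
        return self._face_edges

    def zonal_average(self, data):
        edges = self.face_edges  # later calls reuse the cached array
        # The real routine would intersect `edges` with a latitude circle
        # and compute a weighted mean; this placeholder does not.
        return float(np.mean(data))
```

The key point is that `face_edges` is computed once per helper instance, so repeated zonal-average calls skip the redundant recomputation.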

Expected Usage

import uxarray as ux

grid_path = "/path/to/grid.nc"
data_path = "/path/to/data.nc"

uxds = ux.open_dataset(grid_path, data_path)

# this is how you use this function
some_output = uxds.some_function()

# this is another way to use this function
other_output = uxds.some_function(some_param = True)

PR Checklist

General

  • An issue is created and linked
  • Add appropriate labels
  • Filled out Overview and Expected Usage (if applicable) sections

Testing

  • Adequate tests are created if there is new functionality
  • Tests cover all possible logical paths in your function
  • Tests are not too basic (such as simply calling a function and nothing else)

Documentation

  • Docstrings have been added to all new functions
  • Docstrings have been updated with any function changes
  • Internal functions have a preceding underscore (_) and have been added to docs/internal_api/index.rst
  • User functions have been added to docs/user_api/index.rst

Examples

  • Any new notebook examples added to docs/examples/ folder
  • Clear the output of all cells before committing
  • New notebook files added to docs/examples.rst toctree
  • New notebook files added to new entry in docs/gallery.yml with appropriate thumbnail photo in docs/_static/thumbnails/

@hongyuchen1030 hongyuchen1030 self-assigned this Mar 12, 2025
@hongyuchen1030 hongyuchen1030 marked this pull request as draft March 12, 2025 02:03
@hongyuchen1030 hongyuchen1030 linked an issue Mar 12, 2025 that may be closed by this pull request
6 tasks
@hongyuchen1030 hongyuchen1030 changed the title initial commit Optimized Non-Conservative Zonal Average Mar 12, 2025
@hongyuchen1030 hongyuchen1030 marked this pull request as ready for review March 14, 2025 05:10
@hongyuchen1030 hongyuchen1030 requested a review from philipc2 March 14, 2025 05:14
@hongyuchen1030 hongyuchen1030 added the run-benchmark Run ASV benchmark workflow label Mar 14, 2025
@rajeeja
Contributor

rajeeja commented Mar 14, 2025

ASV Benchmark failing?

@rajeeja rajeeja self-requested a review March 14, 2025 14:12
@philipc2 philipc2 added run-benchmark Run ASV benchmark workflow and removed run-benchmark Run ASV benchmark workflow labels Mar 17, 2025
@philipc2
Member

pre-commit.ci autofix

@hongyuchen1030
Contributor Author

@philipc2

The benchmark still isn't passing for this PR. Can you take a look? Thanks!

Contributor

@rajeeja rajeeja left a comment

Approving, we can fix the benchmarking for forked repos later.

@philipc2 philipc2 removed the run-benchmark Run ASV benchmark workflow label Mar 31, 2025
@hongyuchen1030
Contributor Author

Hi @philipc2 ,

Thanks again for the discussion—I'm glad we reached an agreement. I've added the latest commit, which removes the "faces_edges_cartesian" and "faces_edges_spherical" properties at the end of the zonal_mean call. This change should avoid duplicate-call overhead and prevent unnecessary memory growth.

Please take a look when you have a moment. Once this is merged, I'll open another PR where we can integrate and carefully parallelize the latlonbound and zonal_avg calls.

Thanks!

@hongyuchen1030
Contributor Author

hongyuchen1030 commented Apr 1, 2025

@philipc2 And I also thought carefully about optimizing the loop you mentioned:

Something like the following should work if we put it inside of the loop?

_get_cartesian_face_edge_nodes(
    uxgrid.face_node_connectivity.values[face_indices],  # only candidate faces
    len(face_indices),  # number of candidate faces
    uxgrid.n_max_face_nodes,
    uxgrid.node_x.values,
    uxgrid.node_y.values,
    uxgrid.node_z.values,
)

I know it sounds good to build the connectivity for each face slice inside the loop, and it would be very helpful if the user is querying just a few latitudes. However, consider the case of a batch of latitudes: for every latitude call, we would be querying the associated uxgrid.node_x.values, uxgrid.node_y.values, and uxgrid.node_z.values for the candidate faces, converting them into Cartesian face edge nodes, and then discarding them. Meanwhile, we already have this information available from the latlon_bound build.

This brings up another general design idea (which also applies to S2 geometry): we should separate the single call from the batch call. Our current version works best for a batch call for candidate faces—which is a common use case. In a future optimization, we can also consider optimizing the function for a single latitude call, perhaps even creating a dedicated function tailored for that scenario.

Let me know your thoughts.
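The single-call/batch-call separation proposed above could be sketched as two thin entry points over one core routine. All names here are hypothetical, and the core is a placeholder rather than the actual zonal-average math:

```python
import numpy as np

def _zonal_mean_core(lat_rad, face_edges, data):
    # Placeholder for the real routine, which would intersect `face_edges`
    # with the circle of constant latitude `lat_rad` and return a weighted
    # mean of `data` over the intersected faces.
    return float(np.mean(data))

def zonal_mean(lat_deg, face_edges, data):
    """Single-latitude entry point: no batch bookkeeping, cheap to call."""
    return _zonal_mean_core(np.deg2rad(lat_deg), face_edges, data)

def zonal_mean_batch(lats_deg, face_edges, data):
    """Batch entry point: `face_edges` is built once by the caller and
    reused for every latitude, which is the batch case discussed above."""
    return np.array([_zonal_mean_core(np.deg2rad(lat), face_edges, data)
                     for lat in lats_deg])
```

The batch entry point amortizes the `face_edges` construction over all latitudes, while the single-latitude path stays free to slice only candidate faces.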

@philipc2
Member

philipc2 commented Apr 1, 2025

@philipc2 And I also thought carefully about optimizing the loop you mentioned:

Something like the following should work if we put it inside of the loop? […] Meanwhile, we already have this information available from the latlon_bound build.

This is a good point. Keeping the call outside of the loop should be good for our current application.

This brings up another general design idea (which also applies to S2 geometry): we should separate the single call from the batch call. Our current version works best for a batch call for candidate faces—which is a common use case. In a future optimization, we can also consider optimizing the function for a single latitude call, perhaps even creating a dedicated function tailored for that scenario.

I like that idea.

Comment on lines 1448 to 1485
@property
def faces_edges_cartesian(self):
    """Cartesian Coordinates for each Face Edge.

    Dimensions ``(n_face, n_max_face_edges, 2, 3)``
    """
    if "faces_edges_cartesian" not in self._ds:
        _populate_faces_edges_cartesian(self)
    return self._ds["faces_edges_cartesian"]

@faces_edges_cartesian.setter
def faces_edges_cartesian(self, value):
    self._ds["faces_edges_cartesian"] = value

@faces_edges_cartesian.deleter
def faces_edges_cartesian(self):
    if "faces_edges_cartesian" in self._ds:
        del self._ds["faces_edges_cartesian"]

@property
def faces_edges_spherical(self):
    """Latitude Longitude Coordinates for each Face in radians.

    Dimensions ``(n_face, n_max_face_edges, 2, 2)``
    """
    if "faces_edges_spherical" not in self._ds:
        _populate_faces_edges_spherical(self)
    return self._ds["faces_edges_spherical"]

@faces_edges_spherical.setter
def faces_edges_spherical(self, value):
    self._ds["faces_edges_spherical"] = value

@faces_edges_spherical.deleter
def faces_edges_spherical(self):
    if "faces_edges_spherical" in self._ds:
        del self._ds["faces_edges_spherical"]

Member

This is a step in the right direction, although I don't think we should store the face_edges in self._ds due to the memory concerns.

I'm happy to keep the two faces_edges_spherical and faces_edges_cartesian as part of the API, but have them construct the variables by calling _get_cartesian_face_edge_nodes instead.

Inside of zonal.py, calling this attribute within the scope of the function will automatically manage the memory associated with it once we exit the function.
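A minimal sketch of what this review comment describes: keep the `@property` methods, but have them build the array on access instead of storing it in `Grid._ds`. The builder below is a simplified stand-in for the real `_get_cartesian_face_edge_nodes`, and `GridSketch` is a stand-in for `Grid`:

```python
import numpy as np

def _get_cartesian_face_edge_nodes(conn, n_face, n_max_face_edges, x, y, z):
    # Simplified stand-in for the real helper: pair each node with the next
    # node along the face to form edge endpoints, shape (n_face, n, 2, 3).
    xyz = np.column_stack([x, y, z])
    nxt = np.roll(conn, -1, axis=1)
    return np.stack([xyz[conn], xyz[nxt]], axis=2)

class GridSketch:
    def __init__(self, conn, x, y, z):
        self.face_node_connectivity = conn
        self.node_x, self.node_y, self.node_z = x, y, z

    @property
    def faces_edges_cartesian(self):
        # Built fresh on each access; nothing is stored on the grid, so the
        # array is freed as soon as the caller's reference goes away.
        conn = self.face_node_connectivity
        return _get_cartesian_face_edge_nodes(
            conn, conn.shape[0], conn.shape[1],
            self.node_x, self.node_y, self.node_z,
        )
```

Inside zonal.py, the function would simply read `grid.faces_edges_cartesian` into a local variable, and the array's lifetime would end with the function scope.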

Contributor Author

Keeping face_edges in zonal was my first thought as well. However, I then realized that latlonbound also needs to use this property. That means we would have to modify the latlonbound API to accept incoming face_edges, which involves more API changes than I initially expected—and I’m wary of making such significant modifications to the existing API and functionality.

Do you have any suggestions on where we might store this information to address the memory concerns without causing a large API overhaul?

Thanks!

Member

We can keep the two @property implementations, and have them return the output of _get_cartesian_face_edge_nodes() and _get_lonlat_rad_face_edge_nodes() and replace the following two lines with calls to Grid.face_edges_cartesian and Grid.face_edges_spherical

# Prepare data for Numba functions
faces_edges_cartesian = _get_cartesian_face_edge_nodes(
    grid.face_node_connectivity.values,
    grid.n_face,
    grid.n_max_face_edges,
    grid.node_x.values,
    grid.node_y.values,
    grid.node_z.values,
)
faces_edges_lonlat_rad = _get_lonlat_rad_face_edge_nodes(
    grid.face_node_connectivity.values,
    grid.n_face,
    grid.n_max_face_edges,
    grid.node_lon.values,
    grid.node_lat.values,
)

Member

@hongyuchen1030

Were you able to make this change?

Contributor Author

@hongyuchen1030 hongyuchen1030 Apr 4, 2025

@philipc2

I am unable to find such a leftover design in this PR. I have also pinged you on the corresponding populate_bounds section, because the whole point of this PR was to reduce such duplicate calls. I wonder which commit this file belongs to?

@philipc2
Member

philipc2 commented Apr 1, 2025

@hongyuchen1030

Here's an optimized version of get_cartesian_face_edge_nodes using Numba. It re-uses the face_edge_connectivity and edge_node_connectivity, which can be computed internally once and then shared.

This runs in about 90ms for me, down from the ~500ms we currently have.

@njit(cache=True, parallel=True)
def get_cartesian_face_edge_nodes(face_edge_connectivity, edge_node_connectivity,
                                  node_x, node_y, node_z,
                                  n_edges_per_face, n_face, n_max_face_edges):

    # Allocate result array
    # I used np.nan here instead of INT_FILL_VALUE to be consistent with the double type (INT_FILL_VALUE is for our int64 arrays)
    face_edges = np.full((n_face, n_max_face_edges, 2, 3), np.nan, dtype=np.float64)

    # Construct edge_nodes for each face in parallel
    for face_i in prange(n_face):
        n_edges = n_edges_per_face[face_i]
        edge_indices = face_edge_connectivity[face_i, :n_edges]
        node_indices = edge_node_connectivity[edge_indices]  

        first_nodes = node_indices[:, 0]
        second_nodes = node_indices[:, 1]

        # Assign x coordinates
        face_edges[face_i, :n_edges, 0, 0] = node_x[first_nodes]
        face_edges[face_i, :n_edges, 1, 0] = node_x[second_nodes]

        # Assign y coordinates
        face_edges[face_i, :n_edges, 0, 1] = node_y[first_nodes]
        face_edges[face_i, :n_edges, 1, 1] = node_y[second_nodes]

        # Assign z coordinates
        face_edges[face_i, :n_edges, 0, 2] = node_z[first_nodes]
        face_edges[face_i, :n_edges, 1, 2] = node_z[second_nodes]

    return face_edges

The same can be applied to the spherical one, with changes only needed to the assignment of coordinates.
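For illustration, here is the same construction exercised on a tiny synthetic grid (one triangular face), using a plain-Python loop in place of `@njit`/`prange` so the behavior is easy to trace. The connectivity arrays below are made up for the example:

```python
import numpy as np

def get_cartesian_face_edge_nodes_py(face_edge_connectivity, edge_node_connectivity,
                                     node_x, node_y, node_z,
                                     n_edges_per_face, n_face, n_max_face_edges):
    # Same logic as the Numba version above, minus @njit/prange.
    face_edges = np.full((n_face, n_max_face_edges, 2, 3), np.nan)
    for face_i in range(n_face):
        n_edges = n_edges_per_face[face_i]
        nodes = edge_node_connectivity[face_edge_connectivity[face_i, :n_edges]]
        for c, coord in enumerate((node_x, node_y, node_z)):
            face_edges[face_i, :n_edges, 0, c] = coord[nodes[:, 0]]
            face_edges[face_i, :n_edges, 1, c] = coord[nodes[:, 1]]
    return face_edges

# One triangle with its nodes on the unit axes.
node_x = np.array([1.0, 0.0, 0.0])
node_y = np.array([0.0, 1.0, 0.0])
node_z = np.array([0.0, 0.0, 1.0])
edge_node_connectivity = np.array([[0, 1], [1, 2], [2, 0]])
face_edge_connectivity = np.array([[0, 1, 2]])
n_edges_per_face = np.array([3])

edges = get_cartesian_face_edge_nodes_py(
    face_edge_connectivity, edge_node_connectivity,
    node_x, node_y, node_z,
    n_edges_per_face, n_face=1, n_max_face_edges=3,
)
# edges[0, 0] holds edge (0, 1): endpoints (1, 0, 0) and (0, 1, 0).
```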

@hongyuchen1030
Contributor Author

face_node_connectivity

Thanks for the information! This is actually what I had in mind for get_cartesian_face_edge_nodes using face_edge_connectivity and edge_node_connectivity. However, I recall that we had a lengthy discussion about why we should separate the steps—first populating face_node_connectivity (using face_edge_connectivity and edge_node_connectivity) and storing it in _ds, and then populating face_edge_connectivity when needed.

Do you still recall the rationale behind that decision? I'm trying to avoid duplicate iterations as much as possible.

Thanks!

@philipc2
Member

philipc2 commented Apr 1, 2025

Thanks for the information! This is actually what I had in mind for get_cartesian_face_edge_nodes using face_edge_connectivity and edge_node_connectivity. […] Do you still recall the rationale behind that decision?

Most of the time, the face_node_connectivity is already present and we do not need to construct it. The other two connectivity variables are automatically populated when we call their respective attribute.

It makes sense to me to re-use those connectivity variables instead of doing the extra operations within the function call, which each time essentially "derives" the same information already present in the connectivity.

@hongyuchen1030
Contributor Author

Here's an optimized version of get_cartesian_face_edge_nodes using Numba. […] This runs in about 90ms for me, down from the ~500ms we currently have.

Hi @philipc2

Have you compared the new version's results with the old one? I set up the following test and found that the results differ:

def test_cartesian_test():
    grid = ux.open_grid(gridfile_CSne8)
    new_faces_edges_cartesian = get_cartesian_face_edge_nodes(
        grid.face_edge_connectivity.values,
        grid.edge_node_connectivity.values,
        grid.node_x.values,
        grid.node_y.values,
        grid.node_z.values,
        grid.n_edges_per_face,
        grid.n_face,
        grid.n_max_face_edges,
    )

    old_faces_edges_cartesian = _get_cartesian_face_edge_nodes(
        grid.face_node_connectivity.values,
        grid.n_face,
        grid.n_max_face_edges,
        grid.node_x.values,
        grid.node_y.values,
        grid.node_z.values,
    )

    nt.assert_array_equal(new_faces_edges_cartesian, old_faces_edges_cartesian)

The error messages are:

args = (<built-in function eq>, array([[[[ 0.63933827, -0.42719217, -0.63933827],
         [ 0.57735027, -0.57735027, -0.5773... 0.63933827]],

        [[-0.63933827,  0.42719217,  0.63933827],
         [-0.48565274,  0.48565274,  0.72683068]]]]))
kwds = {'err_msg': '', 'header': 'Arrays are not equal', 'strict': False, 'verbose': True}

    @wraps(func)
    def inner(*args, **kwds):
        with self._recreate_cm():
>           return func(*args, **kwds)
E           AssertionError: 
E           Arrays are not equal
E           
E           Mismatched elements: 4516 / 9216 (49%)
E           Max absolute difference among violations: 0.19509032
E           Max relative difference among violations: 1.60290531e+16
E            ACTUAL: array([[[[ 0.639338, -0.427192, -0.639338],
E                    [ 0.57735 , -0.57735 , -0.57735 ]],
E           ...
E            DESIRED: array([[[[ 0.57735 , -0.57735 , -0.57735 ],
E                    [ 0.639338, -0.427192, -0.639338]],
E           ...
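One plausible cause of the swapped-looking rows in the diff above (a hypothesis for illustration, not a confirmed diagnosis from this PR): `edge_node_connectivity` stores each edge's nodes in a canonical order that need not match the face's winding order, so the two builders can emit the same edge with its endpoints transposed:

```python
import numpy as np

node_xyz = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])

# A face that walks its nodes 0 -> 1 -> 2 sees its first edge as (0, 1)...
edge_in_winding_order = np.array([0, 1])
# ...but the edge table may store that edge as (1, 0), e.g. if node pairs
# were normalized when duplicate edges were merged (hypothetical here).
edge_as_stored = np.array([1, 0])

from_winding = node_xyz[edge_in_winding_order]  # rows: node 0 then node 1
from_table = node_xyz[edge_as_stored]           # rows: node 1 then node 0

# Geometrically the same edge, but element-wise the arrays differ, which
# is exactly how assert_array_equal reports many mismatched elements.
assert not np.array_equal(from_winding, from_table)
assert np.array_equal(from_winding, from_table[::-1])
```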

@philipc2
Member

philipc2 commented Apr 3, 2025

Have you compared the new version's results with the old one? I set up the following test and found that the results differ. […]

Looks like there are some small differences. We can move this optimization into the work I'm putting together in #1195 and keep this PR smaller in scope.

@hongyuchen1030
Contributor Author

Looks like there are some small differences. We can move this optimization into the work I'm putting together in #1195 and keep this PR smaller in scope.

@philipc2
Sounds good. Can you approve this PR so it can be merged? Thanks!

@philipc2
Member

philipc2 commented Apr 3, 2025

Looks like there are some small differences. We can move this optimization into the work I'm putting together in #1195 and keep this PR smaller in scope.

@philipc2 Sounds good. Can you approve this PR so it can be merged? Thanks!

Yep! Wrapping up a final review.

Member

@philipc2 philipc2 left a comment

Just to clarify a couple things.

I would prefer not to handle the deletion of the allocated arrays through the Grid attributes.

If we want to keep the attributes (i.e. Grid.faces_edges_cartesian), these should not store anything as part of the Grid._ds. We would instead use them for convenience within our calls (i.e. what you've already done inside of zonal average).

This loops back to the discussion I had previously with @erogluorhan . Pinging him here if he has anything to add.

I do see the benefit of caching and don't mean to invalidate the optimizations in this PR. However, the issue that still makes some of our functionality impractical for the highest-resolution grids is the execution time of Grid.bounds.

I hope this makes sense.

@hongyuchen1030
Contributor Author

Hi @philipc2 ,

Sorry, I’m losing track a bit here.

I would prefer not to handle the deletion of the allocated arrays through the Grid attributes.

  1. I understand that you’d prefer not to keep these variables in Grid._ds, but you previously mentioned that it’s okay to have them as properties there. So I'm not entirely clear on where you’d like to store them instead.

If we want to keep the attributes (i.e. Grid.faces_edges_cartesian), these should not store anything as part of the Grid._ds. We would instead use them for convenience within our calls (i.e. what you've already done inside of zonal average).

  2. You mentioned “(i.e. what you've already done inside of zonal average)”. In my implementation, I populate these two properties inside Grid._ds once I initialize the zonal call, so that both the zonal call and latlonbounds can benefit from them.

I do see the benefit of caching and don't mean to invalidate the optimizations in this PR, however the current issue that is still making the usage of some of our functionality impractical for the highest resolution grids is still the execution time of Grid.bounds

  3. As I’ve mentioned before, if I remove these properties and instead populate them inside zonal_mean and pass them as references to latlonbounds, it would induce noticeable API changes—which I want to avoid in this PR. This PR is intended to provide a better design idea for the package by identifying duplicate calls and removing unnecessary intermediate steps, not to apply large architectural changes.

Could you please clarify your intended design regarding where to store these two variables specifically?

Thanks!

@philipc2
Member

philipc2 commented Apr 7, 2025

I apologize if I've caused any confusion.

I understand that you’d prefer not to keep these variables in Grid._ds, but you previously mentioned that it’s okay to have them as properties there. So I'm not entirely clear on where you’d like to store them instead.

Our properties don't need to return cached variables only. We could simply wrap the building functions for face_edges_cartesian and face_edges_spherical and use those in our implementations. This is operationally the same, but reduces the overall length of our code.

I do see the benefit in caching the variables, especially considering the breakout of the zonal average call above. However, before we choose to cache and re-use these values, I believe that optimizing the helper function should be the first step.

Since constructing the face_edges only involves the manipulation of connectivity variables and indexing existing coordinates, I'd prefer not to use up extra memory throughout the life-span of the Grid.

The current issue with how you've implemented things is that it assumes the user is going to immediately call .zonal_average() after constructing the bounds, which isn't always the case.

Also, the performance of our Zonal Average functionality is at a good spot right now. The current bottleneck still lies in the initial bounds construction, which has come a long way since our initial implementation, but could still use extra optimizations. The change here would not lead to a noticeable change in performance.

Resolution | Nodes      | Faces      | Edges       | Grid Load Time (s) | Bounds Construction Time (s) | Total Time (s)
-----------|------------|------------|-------------|--------------------|------------------------------|---------------
30km       | 1,310,720  | 655,362    | 1,966,080   | 1.923              | 2.541                        | 4.464
15km       | 5,242,880  | 2,621,442  | 7,864,320   | 7.489              | 10.191                       | 17.68
7.5km      | 20,971,520 | 10,485,762 | 31,457,280  | 29.228             | 37.866                       | 67.095
3.75km     | 83,886,080 | 41,943,042 | 125,829,120 | 113.028            | 155.486                      | 268.514

To summarize:

  • We should investigate further optimizations into constructing the face_edge_ variables before considering caching
    • If we reach a wall here, we could consider adding a configuration parameter that the user can select if they would like to consider caching.
  • We should continue to optimize the Bounds construction, since this is the most noticeable slowdown (even though it is a one-time execution)

I am happy to have a discussion about this during this weeks UXarray meeting.
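The opt-in caching parameter mentioned in the summary could look roughly like this. The flag name and class are made up for the sketch, not proposals for the actual uxarray API:

```python
import numpy as np

class FaceEdgeProvider:
    """Hypothetical opt-in cache: users who will issue many zonal queries
    can pay the memory cost once; the default keeps memory low."""

    def __init__(self, node_xyz, conn, cache_face_edges=False):
        self.node_xyz = node_xyz
        self.conn = conn
        self._cache_enabled = cache_face_edges
        self._cached = None

    def _build(self):
        # (n_face, n_per_face, 2, 3) array of edge endpoints.
        nxt = np.roll(self.conn, -1, axis=1)
        return np.stack([self.node_xyz[self.conn], self.node_xyz[nxt]], axis=2)

    @property
    def face_edges(self):
        if not self._cache_enabled:
            return self._build()          # rebuilt per access, never resident
        if self._cached is None:
            self._cached = self._build()  # built once, kept for reuse
        return self._cached
```

With the flag off, memory behavior matches the build-on-access design; with it on, repeated zonal queries reuse one allocation.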

@hongyuchen1030
Contributor Author

Hi Philip,

Thanks for the clarification. So you want to move away from the new implementation for caching these connectivity variables altogether?

Regarding the comment:

The current issue with how you've implemented things is that it assumes the user is going to immediately call .zonal_average() after constructing the bounds, which isn't always the case.

Before we tackle these individual function blocks one by one, we should consider the entire algorithm as a pipeline. Like we talked about, if you refer to the algorithm remapping paper, you'll see that the reason we need latlon bounds is to enable zonal averaging and remapping in the next step. In other words, latlonbound was intended as an intermediate step—almost internal to the remapping process—rather than being exposed as a user API. The same principle applies in S2, CDO, and CGAL; users typically call a query function and let the package handle the segmentation and arrangement of the grid.

Again, I am open to all kinds of proposed code designs, and I mostly focus on the algorithm development. If you think caching the variable is unnecessary here, I’m happy to remove the changes. For any other design ideas that involve API changes, we should open another PR to discuss those.

Thanks again for the discussion.

@philipc2
Member

philipc2 commented Apr 7, 2025

Hi Hongyu!

Thanks for the clarification. So you want to move away from the new implementation for caching these connectivity variables altogether?

That's correct.

Before we tackle these individual function blocks one by one, we should consider the entire algorithm as a pipeline. Like we talked about, if you refer to the algorithm remapping paper, you'll see that the reason we need latlon bounds is to enable zonal averaging and remapping in the next step. In other words, latlonbound was intended as an intermediate step—almost internal to the remapping process—rather than being exposed as a user API.

That is understandable, however having the bounds is a valuable addition to the user API, as it has applications to visualization and subsetting/cross-section functionality that we have implemented.

The same principle applies in S2, CDO, and CGAL; users typically call a query function and let the package handle the segmentation and arrangement of the grid.

This is the direction we should head towards, and I believe something that we have both discussed as a path forward when we consider refactoring our geometry handling.

Again, I am open to all kinds of proposed code designs, and I mostly focus on the algorithm development. If you think caching the variable is unnecessary here, I’m happy to remove the changes. For any other design ideas that involve API changes, we should open another PR to discuss those.

This is very much appreciated! I've already shared some initial ideas and we've brainstormed about this before. I am currently in the processes of opening an Issue for this to continue our discussion.

@hongyuchen1030
Contributor Author

That is understandable, however having the bounds is a valuable addition to the user API, as it has applications to visualization and subsetting/cross-section functionality that we have implemented.

I can proceed to revert the changes.

Just some comments on this:

That is understandable, however having the bounds is a valuable addition to the user API, as it has applications to visualization and subsetting/cross-section functionality that we have implemented.

This still means the latlonbound serves as an intermediate step. The right approach is to optimize the subsetting/cross-section functionality as a whole, and even to provide a tailored latlonbounds build for these functionalities in the future.

@hongyuchen1030 hongyuchen1030 force-pushed the optimized_nonconservative_avg branch from 9e6a72b to b392046 Compare April 7, 2025 18:42
@philipc2
Member

@hongyuchen1030

I am putting together optimizations to the construction of Grid.bounds in #1205 to compute the face edge nodes for each face as they are needed, instead of pre-computing the entire array.

Once that is merged, we should consider doing something similar for the zonal face weights. We could tackle that as part of this PR.

@erogluorhan erogluorhan marked this pull request as draft April 15, 2025 20:38
@philipc2 philipc2 changed the title Optimized Non-Conservative Zonal Average DRAFT: Optimized Non-Conservative Zonal Average Apr 23, 2025
@erogluorhan
Member

@hongyuchen1030 given that @philipc2 's improvements have gone into release, should we close this one and its corresponding issue #1181?

@erogluorhan
Member

Confirmed with @hongyuchen1030 that this can be closed.

Development

Successfully merging this pull request may close these issues.

Optimize Non-Conservative Zonal Average in zonal.py
4 participants