-
Notifications
You must be signed in to change notification settings - Fork 40
DRAFT: Optimized Non-Conservative Zonal Average #1180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DRAFT: Optimized Non-Conservative Zonal Average #1180
Conversation
ASV Benchmark failing? |
pre-commit.ci autofix |
The benchmark still doesn't work out for this PR, can you take a look? Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving, we can fix the benchmarking for forked repo's later.
Hi @philipc2 , Thanks again for the discussion—I'm glad we reached an agreement. I've added the latest commit, which removes the "faces_edges_cartesian" and "faces_edges_spherical" properties at the end of the zonal_mean call. This change should help save duplicate call time and prevent unnecessary memory expansion. Please take a look when you have a moment. Once this is merged, I'll open another PR where we can integrate and carefully parallelize the latlonbound and zonal_avg calls. Thanks! |
@philipc2 And I also thought carefully about optimizing the loop you mentioned:
I know it sounds good to build the connectivity for each face slice inside the loop, and it would be very helpful if the user is querying just a few latitudes. However, consider the case of a batch of latitudes: for every latitude call, we would be querying the associated This brings up another general design idea (which also applies to S2 geometry): we should separate the single call from the batch call. Our current version works best for a batch call for candidate faces—which is a common use case. In a future optimization, we can also consider optimizing the function for a single latitude call, perhaps even creating a dedicated function tailored for that scenario. Let me know your thoughts. |
This is a good point. Keeping the call outside of the loop should be good for our current application.
I like that idea. |
uxarray/grid/grid.py
Outdated
@property | ||
def faces_edges_cartesian(self): | ||
"""Cartesian Coordinates for each Face Edge. | ||
|
||
Dimensions ``(n_face, n_max_face_edges, 2, 3)`` | ||
""" | ||
if "faces_edges_cartesian" not in self._ds: | ||
_populate_faces_edges_cartesian(self) | ||
return self._ds["faces_edges_cartesian"] | ||
|
||
@faces_edges_cartesian.setter | ||
def faces_edges_cartesian(self, value): | ||
self._ds["faces_edges_cartesian"] = value | ||
|
||
@faces_edges_cartesian.deleter | ||
def faces_edges_cartesian(self): | ||
if "faces_edges_cartesian" in self._ds: | ||
del self._ds["faces_edges_cartesian"] | ||
|
||
@property | ||
def faces_edges_spherical(self): | ||
"""Latitude Longitude Coordinates for each Face in radians. | ||
|
||
Dimensions ``(n_face, n_max_face_edges, 2, 2)`` | ||
""" | ||
if "faces_edges_spherical" not in self._ds: | ||
_populate_faces_edges_spherical(self) | ||
return self._ds["faces_edges_spherical"] | ||
|
||
@faces_edges_spherical.setter | ||
def faces_edges_spherical(self, value): | ||
self._ds["faces_edges_spherical"] = value | ||
|
||
@faces_edges_spherical.deleter | ||
def faces_edges_spherical(self): | ||
if "faces_edges_spherical" in self._ds: | ||
del self._ds["faces_edges_spherical"] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a step in the right direction, although I don't think we should store the face_edges
in self._ds
due to the memory concerns.
I'm happy to keep the two faces_edges_spherical
and faces_edges_cartesian
as part of the API, but instead they construct the variables by calling _get_cartesian_faces_edge_nodes
.
Inside of zonal.py
, calling this attribute within the scope of the function will automatically manage the memory associated with it once we exit the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeping face_edges
in zonal was my first thought as well. However, I then realized that latlonbound
also needs to use this property. That means we would have to modify the latlonbound
API to accept incoming face_edges, which involves more API changes than I initially expected—and I’m wary of making such significant modifications to the existing API and functionality.
Do you have any suggestions on where we might store this information to address the memory concerns without causing a large API overhaul?
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can keep the two @property
implementations, and have them return the output of _get_cartesian_face_edge_nodes()
and _get_lonlat_rad_face_edge_nodes()
and replace the following two lines with calls to Grid.face_edges_cartesian
and Grid.face_edges_spherical
uxarray/uxarray/grid/geometry.py
Lines 1412 to 1428 in a6aa629
# Prepare data for Numba functions | |
faces_edges_cartesian = _get_cartesian_face_edge_nodes( | |
grid.face_node_connectivity.values, | |
grid.n_face, | |
grid.n_max_face_edges, | |
grid.node_x.values, | |
grid.node_y.values, | |
grid.node_z.values, | |
) | |
faces_edges_lonlat_rad = _get_lonlat_rad_face_edge_nodes( | |
grid.face_node_connectivity.values, | |
grid.n_face, | |
grid.n_max_face_edges, | |
grid.node_lon.values, | |
grid.node_lat.values, | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Were you able to make this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unable to find such a leftover design in this PR. I have also pinned you to the corresponding populate_bounds
section because the whole point of the PR was to reduce such duplicate calls. I wonder which commit this file belongs to?
Here's an optimized version of This runs in about @njit(cache=True, parallel=True)
def get_cartesian_face_edge_nodes(face_edge_connectivity, edge_node_connectivity,
node_x, node_y, node_z,
n_edges_per_face, n_face, n_max_face_edges):
# Allocate result array
# I used np.nan here instead of INT_FILL_VALUE to be consistent with the double type (INT_FILL_VALUE is for our int64 arrays)
face_edges = np.full((n_face, n_max_face_edges, 2, 3), np.nan, dtype=np.float64)
# Construct edge_nodes for each face in parallel
for face_i in prange(n_face):
n_edges = n_edges_per_face[face_i]
edge_indices = face_edge_connectivity[face_i, :n_edges]
node_indices = edge_node_connectivity[edge_indices]
first_nodes = node_indices[:, 0]
second_nodes = node_indices[:, 1]
# Assign x coordinates
face_edges[face_i, :n_edges, 0, 0] = node_x[first_nodes]
face_edges[face_i, :n_edges, 1, 0] = node_x[second_nodes]
# Assign y coordinates
face_edges[face_i, :n_edges, 0, 1] = node_y[first_nodes]
face_edges[face_i, :n_edges, 1, 1] = node_y[second_nodes]
# Assign z coordinates
face_edges[face_i, :n_edges, 0, 2] = node_z[first_nodes]
face_edges[face_i, :n_edges, 1, 2] = node_z[second_nodes]
return face_edges The same can be applied to the spherical one, with changes only needed to the assignment of coordinates. |
Thanks for the information! This is actually what I had in mind for Do you still recall the rationale behind that decision? I'm trying to avoid duplicate iterations as much as possible. Thanks! |
Most of the time, the It makes sense to me to re-use those connectivity variables instead of doing the extra operations within the function call, which each time essentially "derives" the same information already present in the connectivity. |
Hi @philipc2 Have you compared the new version's results with the old one? I set up the following test and found that the results differ: def test_cartesian_test():
grid = ux.open_grid(gridfile_CSne8)
new_faces_edges_cartesian = get_cartesian_face_edge_nodes(
grid.face_edge_connectivity.values,
grid.edge_node_connectivity.values,
grid.node_x.values,
grid.node_y.values,
grid.node_z.values,
grid.n_edges_per_face, # or grid.n_face, depending on what you intended
grid.n_max_face_edges
)
old_faces_edges_cartesian = _get_cartesian_faces_edge_nodes(
grid.face_node_connectivity.values, grid.n_face,
grid.n_max_face_edges, grid.node_x.values,
grid.node_y.values, grid.node_z.values
)
nt.assert_array_equal(new_faces_edges_cartesian, old_faces_edges_cartesian) The error messages are: args = (<built-in function eq>, array([[[[ 0.63933827, -0.42719217, -0.63933827],
[ 0.57735027, -0.57735027, -0.5773... 0.63933827]],
[[-0.63933827, 0.42719217, 0.63933827],
[-0.48565274, 0.48565274, 0.72683068]]]]))
kwds = {'err_msg': '', 'header': 'Arrays are not equal', 'strict': False, 'verbose': True}
@wraps(func)
def inner(*args, **kwds):
with self._recreate_cm():
> return func(*args, **kwds)
E AssertionError:
E Arrays are not equal
E
E Mismatched elements: 4516 / 9216 (49%)
E Max absolute difference among violations: 0.19509032
E Max relative difference among violations: 1.60290531e+16
E ACTUAL: array([[[[ 0.639338, -0.427192, -0.639338],
E [ 0.57735 , -0.57735 , -0.57735 ]],
E ...
E DESIRED: array([[[[ 0.57735 , -0.57735 , -0.57735 ],
E [ 0.639338, -0.427192, -0.639338]],
E ... |
Looks like there is some small differences. We can move this optimizing into the work I'm putting together in #1195 and keep this PR smaller in scope. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify a couple things.
I would prefer not to handle the deletion of the allocated arrays through the Grid
attributes.
If we want to keep the attributes (i.e. Grid.faces_edges_cartesian
), these should not store anything as part of the Grid._ds
. We would instead use them for convenience within our calls (i.e. what you've already done inside of zonal average).
This loops back to the discussion I had previously with @erogluorhan . Pinging him here if he has anything to add.
I do see the benefit of caching and don't mean to invalidate the optimizations in this PR, however the current issue that is still making the usage of some of our functionality impractical for the highest resolution grids is still the execution time of Grid.bounds
I hope this makes sense.
Hi @philipc2 , Sorry, I’m losing track a bit here.
Could you please clarify your intended design regarding where to store these two variables specifically? Thanks! |
I apologize if I've caused any confusion.
Our properties don't need to return cached variables only. We could simply wrap the building functions for I do see the benefit in caching the variables, especially considering the breakoutof the zonal average call above. However, before we chose to cache and re-use this values, I believe that optimizing the helper function should be the first step. Since constructing the The current issue with how you've implemented things is that it assumes the user is going to immediately call Also, the performance of our Zonal Average functionality is at a good spot right now. The current bottle neck still lies in the initial bounds construction, which has come a long way since our initial implementation, but could still use extra optimizations. The change here would not lead to a noticeable change in perfromance.
To summarize:
I am happy to have a discussion about this during this weeks UXarray meeting. |
Hi Philip, Thanks for the clarification. So you want to move away from the new implementation for caching these connectivity variables altogether? Regarding the comment:
Before we tackle these individual function blocks one by one, we should consider the entire algorithm as a pipeline. Like we talked about, if you refer to the algorithm remapping paper, you'll see that the reason we need latlon bounds is to enable zonal averaging and remapping in the next step. In other words, latlonbound was intended as an intermediate step—almost internal to the remapping process—rather than being exposed as a user API. The same principle applies in S2, CDO, and CGAL; users typically call a query function and let the package handle the segmentation and arrangement of the grid. Again, I am open to all kinds of proposed code designs, and I mostly focus on the algorithm development. If you think caching the variable is unnecessary here, I’m happy to remove the changes. For any other design ideas that involve API changes, we should open another PR to discuss those. Thanks again for the discussion. |
Hi Hongyu!
That's correct.
That is understandable, however having the bounds is a valuable addition to the user API, as it has applications to visualization and subsetting/cross-section functionality that we have implemented.
This is the direction we should head towards, and I believe something that we have both discussed as a path forward when we consider refactoring our geometry handling.
This is very much appreciated! I've already shared some initial ideas and we've brainstormed about this before. I am currently in the processes of opening an Issue for this to continue our discussion. |
I can proceed to revoke the changes. Just some comments on this:
This still means the latlonbound is served as an intermediate process. And the right way to do is to optimize the subsetting/cross-section functionality as a whole, and even provide a tailored latlonbounds building for these functionalities in the future |
9e6a72b
to
b392046
Compare
I am putting together optimizations to the constructing of Once that is merged, we should considering doing something similar for the zonal face weights. We could tackle that as part of this PR. |
@hongyuchen1030 given that @philipc2 's improvements have gone into release, should we close this one and its corresponding issue #1181? |
Confirmed with @hongyuchen1030 that this can be closed. |
Closes #1181 #880
Overview
zonal.py
Expected Usage
PR Checklist
General
Testing
Documentation
_
) and have been added todocs/internal_api/index.rst
docs/user_api/index.rst
Examples
docs/examples/
folderdocs/examples.rst
toctreedocs/gallery.yml
with appropriate thumbnail photo indocs/_static/thumbnails/