add deterministic xr-metrics to asv benchmark and asv refactor #231
I'm fine with this going in. I'm also ok with not documenting it. Just add a note in the CHANGELOG about it being for advanced users. |
Just update the CHANGELOG and I'll merge if you think this is good to go. |
Co-authored-by: Ray Bell <[email protected]>
How much faster is this for smaller datasets compared to the original np_deterministic? I am afraid of redundancy and the resulting maintenance required if you do add these. Not to mention, for new users, if they see two methods for RMSE, they may be confused as to why they would choose one over the other (e.g. pandas and their redundant methods like pd.read_table, pd.read_csv, etc) |
@ahuang11 raises good points. @aaronspring If you do That said I don't think we have the user base of pandas :) and there are learnings from the benchmark. To Andrew's point I don't think it's worth maintaining and leave it here for advanced users. |
Sorry, I have trouble deciphering the benchmark. I think I only see the after, but not the before. How much faster is it? Never mind, I see it now. From the zen of python: And less is more. In a similar sense, if you compare |
They don't get added, because I didn't add them to api.rst
all these functions are not available for |
I agree my new code is redundant for the user (unless they are interested in trivial speedups in the milliseconds).
No, users won't see this, as it is not in the main API. These functions provide: I am not arguing that we should maintain this code. I think the distance-based xr metrics won't require any maintenance because they are written so simply, and if they break somehow, we can also get rid of them. |
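To illustrate how short an xarray-only distance metric can be, here is a minimal sketch of an RMSE with skipna and weights keywords. The function name xr_rmse and its signature are assumptions made for this example and are not necessarily what the PR adds; it relies only on xarray's DataArray.mean and DataArray.weighted(...).mean.

```python
import numpy as np
import xarray as xr


def xr_rmse(a, b, dim=None, weights=None, skipna=False):
    """Hypothetical xarray-only RMSE; name and signature are illustrative."""
    squared_error = (a - b) ** 2
    if weights is not None:
        # Weighted mean over `dim` via xarray's built-in weighted reductions.
        mse = squared_error.weighted(weights).mean(dim, skipna=skipna)
    else:
        mse = squared_error.mean(dim, skipna=skipna)
    return np.sqrt(mse)


a = xr.DataArray([1.0, 2.0, 3.0], dims="time")
b = xr.DataArray([1.0, 4.0, 3.0], dims="time")
w = xr.DataArray([1.0, 0.0, 1.0], dims="time")

unweighted = float(xr_rmse(a, b, dim="time"))
# The only non-zero error sits at the zero-weight time step.
weighted = float(xr_rmse(a, b, dim="time", weights=w))
```

A function like this could also serve as an independent reference in unit tests, by comparing its output against the numpy-based implementation on random inputs.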
You may have to manually update (rebase) CHANGELOG https://github.com/xarray-contrib/xskillscore/blob/master/CHANGELOG.rst |
I added a small disclaimer to the new files. While I see the risk from this redundancy, I also see the gain from independent testing via xarray-based functions. I leave it up to you guys whether you think this PR is useful. If not, I can also just delete the xr metrics part, and just merge the asv refactoring. I don't have high stakes in this. |
Since it's only a few seconds faster, I would say don't merge it. If it's
about testing, you could add to the unit testing. But I also don't have
high stakes.
|
I'm leaning towards keeping the asv stuff |
20-30% is often worth optimizing. And those numbers actually show how |
do you think implementing |
removed |
I would agree about the 20-30% optimization in speed if the time scale was on minutes / hours and it was able to outscale its counterpart. For example, from https://stackoverflow.com/questions/3650194/are-numpys-math-functions-faster-than-pythons
abs is 26x faster than numpy.abs, but it does not warrant an additional implementation in numpy because the time scale is in microseconds, and numpy outscales the std lib math. |
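The scalar-versus-array point can be checked with a small timing sketch using timeit. Absolute numbers depend on the machine, so no output is shown here; the variable names and sizes are assumptions for illustration.

```python
import timeit

import numpy as np

x = -3.5
arr = np.random.default_rng(0).normal(size=100_000)

# Scalar input: per-call overhead dominates, so the builtin tends to win.
t_builtin_scalar = timeit.timeit(lambda: abs(x), number=10_000)
t_numpy_scalar = timeit.timeit(lambda: np.abs(x), number=10_000)

# Large array: numpy's vectorised loop versus a Python-level loop.
t_numpy_array = timeit.timeit(lambda: np.abs(arr), number=10)
t_python_array = timeit.timeit(lambda: [abs(v) for v in arr], number=1)

print(f"scalar: abs {t_builtin_scalar:.5f}s, np.abs {t_numpy_scalar:.5f}s")
print(f"array: np.abs x10 {t_numpy_array:.5f}s, Python loop x1 {t_python_array:.5f}s")
```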
create a 1e4x1e4x1e4 array, and you get longer compute times
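One way to see how timings scale with input size is a size-parameterised asv benchmark. The sketch below follows asv's convention of a class with params, param_names, setup and time_* methods; the class name and the sizes are assumptions, and it times a plain numpy RMSE rather than this project's functions.

```python
import numpy as np


class TimeRmseBySize:
    """Hypothetical asv benchmark; asv times each time_* method per parameter."""

    params = [10**3, 10**5, 10**6]
    param_names = ["n"]

    def setup(self, n):
        # Called by asv before timing, once per parameter value.
        rng = np.random.default_rng(42)
        self.a = rng.normal(size=n)
        self.b = rng.normal(size=n)

    def time_rmse(self, n):
        # Only the body of this method is timed.
        np.sqrt(np.mean((self.a - self.b) ** 2))
```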
@aaronspring Is this good to go? Sorry I don't get the question here
|
Thanks @aaronspring |
Description
The good news is that xskillscore beats the xr-metrics for large inputs, at least on my laptop, by 10-40%.
The distance metrics also have the keywords skipna and weighted and are much more concise (6 lines of code only).
An xarray PR for xr.corr(weighted, skipna) would be nice.
asv benchmarks
Type of change
asv to detect performance changes)
How Has This Been Tested?