-
Notifications
You must be signed in to change notification settings - Fork 6
Improved to be about 5x faster #28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
package `libblur v0.17.3` requires rustc 1.82.0 or newer
|
I'm pondering whether the decrease in accuracy is worth the speed difference. I'll ask some users to see what the general vibe is before moving forward. The blur has been found to be the biggest bottleneck, but as you also found, the only way that has been discovered to get significant performance gains is to emulate a gaussian blur instead of doing a true gaussian. |
|
Yes. I think it would be better to keep the existing Gaussian blur implementation and separate the part that uses libblur as a distinct feature. It seems preferable to give users the choice. |
|
I edited it. Please check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR!
Looking at the docs for libblur, it seems that EdgeMode::Clamp means that for edges, anything outside the image is assumed to be == the edge pixel value. The gaussian_impl handles this differently IIRC, have you tried different settings to maybe improve the accuracy?
It should also be noted that the current gaussian implementation is already an approximation and not a true gaussian filter. So if anything, the libblur implementation should be compared to a real reference implementation. I would honestly not be surprised if the libblur implementation is both more performant and more accurate.
I like the idea of giving multiple blur algorithm options. (Maybe even at runtime, but that's out of scope for this PR).
Code-wise, the implementation looks good. I'd prefer the git history to be less messy but that probably won't matter for the squash-and-merge. CI should test all feature combinations if possible, if you're not familiar with GitHub actions I/we could just add this at the end.
| - name: Use predefined lockfile | ||
| run: mv Cargo.lock.MSRV Cargo.lock | ||
| - name: Build | ||
| run: cargo check --locked |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is here for a reason, if CI fails here then the lockfile is incorrect. Instead of removing --locked, update the Cargo.lock.MSRV file.
changed SIGMA from 2.3 to 2.2943 EdgeMode::Clamp, // diff 0.000 / 0.932 EdgeMode::Reflect101, // diff 0.008 / 0.933 EdgeMode::Reflect, // diff 0.003 / 0.934 EdgeMode::Constant, // diff 0.089 / 1.099 EdgeMode::Wrap, // diff 0.718 / 1.017 [cfg(not(feature = "rayon"))] added in blur_impl.rs
…ck to its original state. updated yuvxyb 0.4.1 -> 0.4.2
|
I adjusted the SIGMA constant in libblur_impl from 2.3 to 2.2943. This reduced the diff to 0.0/0.932. The libblur implementation shows increasing error as the image size grows, but I don't think this is something I can fix. By adjusting the KERNEL_SIZE and SIGMA constant values, it might be possible to further reduce the error, but currently 11 and 2.2943 seem to be the optimal values. |
|
The AI says that in the libblur source code, gaussian::gaussian_blur_f32 uses classical kernel-based convolution. It claims this is theoretically more accurate than the recursive Gaussian filter used in gaussian_impl, which is hard to believe. More accurate AND faster! But honestly, I'm not sure about this part. It would be most accurate to ask the creator of libblur. I tried testing with KERNEL_SIZE and SIGMA constant values estimated in build.rs, using the values 11 and 1.5 directly. This is the result. It seems correct to consider that the accuracy has improved, not decreased. Also, it seems good to add options for people who want even faster processing to choose libblur::stack_blur or libblur::fast_gaussian. |
|
Sorry, I lost track of this. Within the next day or two I'll give this a proper review, but my thinking is that this is a good option to have, but we should not make it the default due to the accuracy decrease. Rather I'd prefer it to be a separate "fast" version of the algorithm that the CLI can then provide via a Side note: I don't know what I was thinking when I made the CLI a completely separate repo, but I should go ahead and move it into this repo and make this a proper workspace so that it can be more easily updated together with the library. |
|
Just to clarify: If anything, if your current implementation is compiled by customers without explicitly setting From signal theory alone, any Gaussian approximation makes the signal useless for any consequential analysis. After approximation your signal is composed from squares, triangles or whatever else your approximation uses. It makes no sense to perform any analysis on this. |
|
@awxkee Thank you for the clear explanation! I'm not very familiar with Gaussian blur, so I wasn't sure how to proceed and had forgotten about this PR. Thank you for your response. Based on what you've said, it seems like completely replacing the existing implementation with libblur::gaussian_blur would be the right approach. What do you think? It's really impressive that it improves both accuracy and performance. |
|
You'd better wait to see what authors here will say on that matter. I'd even argue that when a true Gaussian is replaced with any kind of approximation, it will very likely produce results that are GIGO. The authors here likely just want the output numbers to roughly match with a reference implementation. Even if that reference's output may itself be quite vague. |
Tested on Ryzen 3900x. Rayon feature has always been activated and measured.
I tried to improve the functions in lib.rs, but
most of them are improvements within the margin of error. Except for the ssim_map function.
The biggest improvement is Blur.
By using libblur crate, the performance improved by at least 5x to at most 7x.
However, the evaluation score accuracy decreased by about 0.001% to 1.1%.
This seems like a tolerable tradeoff.
I've improved the implementation even when using a directly implemented blur function instead of libblur.
The difference with the Older Version is that it transposes the input data, applies the horizontal filter, and then transposes it again.
I applied rayon to the vertical_pass as well. While accuracy decreases by 0.01%, it's about 4 times faster.
Edited. 2025-05-09