
Conversation


@gembleman gembleman commented May 5, 2025

| Benchmark | Older Version | Newer Version (Use Libblur) | Newer Version (Implement directly) |
| --- | --- | --- | --- |
| ssimulacra2 | 20.16 ms | 15.96 ms | 25.04 ms |
| blur | 11.61 ms | 10.60 ms | 15.75 ms |
| downscale_by_2 | 1.94 ms | 1.73 ms | 2.18 ms |
| image_multiply | 2.35 ms | 2.31 ms | 2.90 ms |
| ssim_map | 11.65 ms | 4.43 ms | 4.63 ms |
| xyb_to_planar | 5.49 ms | 5.12 ms | 7.52 ms |

| Image Size | Older Version | New Version (Use Libblur) | Performance Improvement | Score (Old version) | Score (New version / Use Libblur) | Accuracy Change |
| --- | --- | --- | --- | --- | --- | --- |
| 1448x1080 | 1.39 s | 264.70 ms | 5.3x faster | 17.392219 | 17.392059 | -0.001% |
| 2828x4242 | 11.02 s | 1.67 s | 6.6x faster | 84.031931 | 84.963643 | -1.11% |

Tested on a Ryzen 3900X. The rayon feature was enabled for all measurements.

I tried to improve the functions in lib.rs, but most of the changes fall within the margin of error; the exception is the ssim_map function.
The biggest improvement is in blur: by using the libblur crate, performance improved by 5x to 7x.
However, the evaluation score's accuracy decreased by about 0.001% to 1.1%.
This seems like a tolerable tradeoff.

| Image Size | Older Version | New Version (Implement directly) | Performance Improvement | Score (Old version) | Score (New version / Implement directly) | Accuracy Change |
| --- | --- | --- | --- | --- | --- | --- |
| 1448x1080 | 1.39 s | 298.96 ms | 4.5x faster | 17.392219 | 17.385432 | -0.01% |
| 2828x4242 | 11.02 s | 2.69 s | 4.0x faster | 84.031931 | 84.052290 | -0.02% |

I've also improved the directly implemented blur function used instead of libblur.
The difference from the older version is that it transposes the input data, applies the horizontal filter, and then transposes back; I also applied rayon to the vertical pass.
Accuracy decreases by about 0.01%, but it's roughly 4x faster.
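The transpose trick described above can be sketched roughly like this (illustrative names, not the PR's actual code; the vertical pass is just the horizontal pass applied to the transposed image, and edges use a simple clamp):

```rust
/// Transpose a row-major `w` x `h` image into an `h` x `w` one.
fn transpose(src: &[f32], w: usize, h: usize) -> Vec<f32> {
    let mut dst = vec![0.0f32; w * h];
    for y in 0..h {
        for x in 0..w {
            dst[x * h + y] = src[y * w + x];
        }
    }
    dst
}

/// Horizontal 1-D convolution; out-of-bounds samples are clamped to the row edge.
fn horizontal_pass(src: &[f32], w: usize, h: usize, kernel: &[f32]) -> Vec<f32> {
    let r = kernel.len() / 2;
    let mut dst = vec![0.0f32; w * h];
    for y in 0..h {
        for x in 0..w {
            let mut acc = 0.0f32;
            for (k, &wk) in kernel.iter().enumerate() {
                // clamp the sample position to the row bounds
                let sx = (x + k).saturating_sub(r).min(w - 1);
                acc += wk * src[y * w + sx];
            }
            dst[y * w + x] = acc;
        }
    }
    dst
}

/// Separable 2-D blur: horizontal pass, transpose, horizontal pass, transpose back.
fn blur_separable(src: &[f32], w: usize, h: usize, kernel: &[f32]) -> Vec<f32> {
    let hpass = horizontal_pass(src, w, h, kernel);
    let t = transpose(&hpass, w, h);               // now h x w
    let vpass = horizontal_pass(&t, h, w, kernel); // "vertical" pass on rows
    transpose(&vpass, h, w)                        // back to w x h
}
```

The point of the double transpose is that both passes walk memory row-by-row, which is far more cache-friendly than striding down columns; in the PR, rayon can then parallelize the rows of both passes.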

Edited 2025-05-09.

@shssoichiro (Member)

I'm pondering whether the decrease in accuracy is worth the speed difference. I'll ask some users to see what the general vibe is before moving forward. The blur has been found to be the biggest bottleneck, but as you also found, the only way that has been discovered to get significant performance gains is to emulate a gaussian blur instead of doing a true gaussian.

@gembleman (Author)

gembleman commented May 8, 2025

Yes. I think it would be better to keep the existing Gaussian blur implementation and separate the part that uses libblur as a distinct feature. It seems preferable to give users the choice.

@gembleman (Author)

I edited it. Please check.

@FreezyLemon (Contributor) left a comment

Thank you for the PR!

Looking at the docs for libblur, it seems that EdgeMode::Clamp means that anything outside the image is assumed to be equal to the nearest edge pixel value. IIRC, gaussian_impl handles edges differently; have you tried different settings to maybe improve the accuracy?
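For intuition, this is roughly what two of those edge modes mean for an out-of-bounds sample index `i` on an axis of length `n` (a sketch based on the docs' descriptions, not libblur's actual source; assumes `n >= 2`):

```rust
// EdgeMode::Clamp: out-of-range samples take the nearest edge pixel's value.
fn clamp_index(i: i64, n: i64) -> i64 {
    i.max(0).min(n - 1)
}

// EdgeMode::Reflect101: mirror around the edge without repeating the edge
// pixel itself, e.g. for n = 5: -1 -> 1, -2 -> 2, 5 -> 3.
fn reflect_101_index(i: i64, n: i64) -> i64 {
    let m = 2 * (n - 1); // period of the reflected sequence
    let j = ((i % m) + m) % m;
    if j < n { j } else { m - j }
}
```

The modes only differ near the borders, which is consistent with the observed diffs being small but nonzero between Clamp, Reflect, and Reflect101.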

It should also be noted that the current gaussian implementation is already an approximation and not a true gaussian filter. So if anything, the libblur implementation should be compared to a real reference implementation. I would honestly not be surprised if the libblur implementation is both more performant and more accurate.

I like the idea of giving multiple blur algorithm options. (Maybe even at runtime, but that's out of scope for this PR).

Code-wise, the implementation looks good. I'd prefer the git history to be less messy, but that probably won't matter for the squash-and-merge. CI should test all feature combinations if possible; if you're not familiar with GitHub Actions, I/we could just add this at the end:

```yaml
- name: Use predefined lockfile
  run: mv Cargo.lock.MSRV Cargo.lock
- name: Build
  run: cargo check --locked
```
@FreezyLemon (Contributor)

This is here for a reason: if CI fails here, then the lockfile is incorrect. Instead of removing --locked, update the Cargo.lock.MSRV file.

gembleman added 3 commits May 9, 2025 13:54

- changed SIGMA from 2.3 to 2.2943
  - EdgeMode::Clamp,      // diff 0.000 / 0.932
  - EdgeMode::Reflect101, // diff 0.008 / 0.933
  - EdgeMode::Reflect,    // diff 0.003 / 0.934
  - EdgeMode::Constant,   // diff 0.089 / 1.099
  - EdgeMode::Wrap,       // diff 0.718 / 1.017
- `#[cfg(not(feature = "rayon"))]` added in blur_impl.rs
- …ck to its original state.
- updated yuvxyb 0.4.1 -> 0.4.2
@gembleman (Author)

gembleman commented May 9, 2025

I adjusted the SIGMA constant in libblur_impl from 2.3 to 2.2943, which reduced the diff to 0.000 / 0.932.
I also tried various EdgeMode settings, and Clamp seems to be the best.

The libblur implementation's error grows as the image size grows, but I don't think this is something I can fix. Adjusting the KERNEL_SIZE and SIGMA constants might reduce the error further, but for now 11 and 2.2943 seem to be the optimal values.
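For intuition on what tuning SIGMA does, here is how a normalized 1-D Gaussian kernel would be generated for a given KERNEL_SIZE and SIGMA (libblur computes its weights internally; this sketch only illustrates the shape being matched):

```rust
// Build a normalized 1-D Gaussian kernel of odd length `size` for `sigma`.
// A small change in sigma (e.g. 2.3 -> 2.2943) slightly redistributes the
// weights, which is why it can nudge the final score diff.
fn gaussian_kernel(size: usize, sigma: f64) -> Vec<f64> {
    assert!(size % 2 == 1, "kernel size should be odd");
    let r = (size / 2) as i64;
    let mut weights: Vec<f64> = (-r..=r)
        .map(|x| (-((x * x) as f64) / (2.0 * sigma * sigma)).exp())
        .collect();
    let sum: f64 = weights.iter().sum();
    for w in &mut weights {
        *w /= sum; // normalize so the weights sum to 1
    }
    weights
}
```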

@gembleman (Author)

gembleman commented May 9, 2025

According to the AI, libblur's gaussian::gaussian_blur_f32 uses classical kernel-based convolution, which it claims is theoretically more accurate than the recursive Gaussian filter used in gaussian_impl. That's hard to believe: more accurate AND faster! But honestly, I'm not sure about this part; it would be most accurate to ask the creator of libblur.

I tried testing with the KERNEL_SIZE and SIGMA constant values estimated in build.rs, using 11 and 1.5 directly:

```rust
const SIGMA: f64 = 1.5f64;
let radius = 3.2795f64.mul_add(SIGMA, 0.2546).round();
// radius = 3.2795 * 1.5 + 0.2546 = 5.17385, rounded to 5
// KERNEL_SIZE = 2 * 5 + 1 = 11
```

```
tests::test_ssimulacra2
Elapsed time: 269.2497ms
Result:   21.030474
Expected: 17.398505
Diff: 3.632

tests::test2_ssimulacra2
Elapsed time: 1.6811704s
Result:   84.496677
Expected: 84.031931
Diff: 0.465
```

Given these results, it seems correct to say that accuracy has improved, not decreased.
If that's the case, the existing gaussian_impl may be redundant; should we delete it?

Also, it might be good to add options so that people who want even faster processing can choose libblur::stack_blur or libblur::fast_gaussian.
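Exposing those choices might look something like this (a hypothetical API sketch only; the PR currently selects the implementation via a cargo feature, and the enum, names, and ranking below are illustrative):

```rust
/// Hypothetical selector between blur backends discussed in this thread.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum BlurBackend {
    /// Existing recursive approximation (gaussian_impl).
    Recursive,
    /// libblur::gaussian_blur: true separable kernel convolution.
    GaussianConvolution,
    /// libblur::stack_blur / fast_gaussian: faster, rougher approximations.
    Fast,
}

impl BlurBackend {
    /// Rough accuracy ordering per the discussion above (0 = most accurate).
    fn accuracy_rank(self) -> u8 {
        match self {
            BlurBackend::GaussianConvolution => 0,
            BlurBackend::Recursive => 1,
            BlurBackend::Fast => 2,
        }
    }
}
```

A runtime enum like this (rather than compile-time features) would also match FreezyLemon's earlier suggestion of selectable blur algorithms.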

@shssoichiro (Member)

Sorry, I lost track of this. Within the next day or two I'll give this a proper review, but my thinking is that this is a good option to have, though we should not make it the default due to the accuracy decrease. Rather, I'd prefer it to be a separate "fast" version of the algorithm that the CLI can then provide via a --fast parameter.

Side note: I don't know what I was thinking when I made the CLI a completely separate repo, but I should go ahead and move it into this repo and make this a proper workspace so that it can be more easily updated together with the library.

@awxkee

awxkee commented Jul 5, 2025

Just to clarify: libblur::gaussian_blur performs pure 2-D separable convolution, as it should. The only factors that can affect the result are the system-installed libm (used to generate filter weights, although you can invoke libblur::filter1d with your own weights if needed) and FMA availability (checked at runtime). If you're working with extremely large intensities (> 2e10) or extremely small ones (< 2e-12), mandatory FMA or Dekker arithmetic may be required, which libblur does not provide.

If anything, if your current implementation is compiled by customers without explicitly setting -Ctarget-feature=+fma on a platform where LLVM could not detect FMA at compile time, performance might be interesting: Rust will emit software calls to libm::fmaf, and many math libraries do provide a software FMA fallback.

From signal theory alone, any Gaussian approximation makes the signal useless for any consequential analysis. After approximation, your signal is composed of squares, triangles, or whatever else your approximation uses; it makes no sense to perform any analysis on it.

@gembleman (Author)

@awxkee Thank you for the clear explanation! I'm not very familiar with Gaussian blur, so I wasn't sure how to proceed and had forgotten about this PR. Thank you for your response.

Based on what you've said, it seems like completely replacing the existing implementation with libblur::gaussian_blur would be the right approach. What do you think? It's really impressive that it improves both accuracy and performance.

@awxkee

awxkee commented Jul 6, 2025

You'd better wait and see what the authors here say on the matter.

I'd even argue that when a true Gaussian is replaced with any kind of approximation, it will very likely produce results that are GIGO. The authors here likely just want the output numbers to roughly match a reference implementation, even if that reference's output is itself quite vague.
