
Conversation


@gembleman gembleman commented May 5, 2025

| Benchmark | Older Version | Newer Version (Use Libblur) | Newer Version (Implement directly) |
| --- | --- | --- | --- |
| ssimulacra2 | 20.16 ms | 15.96 ms | 25.04 ms |
| blur | 11.61 ms | 10.60 ms | 15.75 ms |
| downscale_by_2 | 1.94 ms | 1.73 ms | 2.18 ms |
| image_multiply | 2.35 ms | 2.31 ms | 2.90 ms |
| ssim_map | 11.65 ms | 4.43 ms | 4.63 ms |
| xyb_to_planar | 5.49 ms | 5.12 ms | 7.52 ms |

| Image Size | Older Version | New Version (Use Libblur) | Performance Improvement | Score (Old version) | Score (New version / Use Libblur) | Accuracy Change |
| --- | --- | --- | --- | --- | --- | --- |
| 1448x1080 | 1.39 s | 264.70 ms | 5.3x faster | 17.392219 | 17.392059 | -0.001% |
| 2828x4242 | 11.02 s | 1.67 s | 6.6x faster | 84.031931 | 84.963643 | -1.11% |

Tested on a Ryzen 3900X. The rayon feature was enabled for all measurements.

I tried to improve the functions in lib.rs, but most of the changes fall within the margin of error; the exception is the ssim_map function.
The biggest improvement is in blur: by using the libblur crate, performance improved by 5x to 7x.
However, the evaluation score's accuracy decreased by about 0.001% to 1.1%.
This seems like a tolerable tradeoff.

| Image Size | Older Version | New Version (Implement directly) | Performance Improvement | Score (Old version) | Score (New version / Implement directly) | Accuracy Change |
| --- | --- | --- | --- | --- | --- | --- |
| 1448x1080 | 1.39 s | 298.96 ms | 4.5x faster | 17.392219 | 17.385432 | -0.01% |
| 2828x4242 | 11.02 s | 2.69 s | 4.0x faster | 84.031931 | 84.052290 | -0.02% |

I've also improved the directly implemented blur function used instead of libblur.
The difference from the older version is that it transposes the input data, applies the horizontal filter, and then transposes back; I also applied rayon to the vertical pass.
Accuracy decreases by about 0.01%, but it's roughly 4x faster.
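The transpose trick described above can be sketched roughly like this (illustrative names, not the PR's actual code; the vertical pass is just the horizontal pass applied to the transposed image, and edges use a simple clamp):

```rust
/// Transpose a row-major `w` x `h` image into an `h` x `w` one.
fn transpose(src: &[f32], w: usize, h: usize) -> Vec<f32> {
    let mut dst = vec![0.0f32; w * h];
    for y in 0..h {
        for x in 0..w {
            dst[x * h + y] = src[y * w + x];
        }
    }
    dst
}

/// Horizontal 1-D convolution; out-of-bounds samples are clamped to the row edge.
fn horizontal_pass(src: &[f32], w: usize, h: usize, kernel: &[f32]) -> Vec<f32> {
    let r = kernel.len() / 2;
    let mut dst = vec![0.0f32; w * h];
    for y in 0..h {
        for x in 0..w {
            let mut acc = 0.0f32;
            for (k, &wk) in kernel.iter().enumerate() {
                // clamp the sample position to the row bounds
                let sx = (x + k).saturating_sub(r).min(w - 1);
                acc += wk * src[y * w + sx];
            }
            dst[y * w + x] = acc;
        }
    }
    dst
}

/// Separable 2-D blur: horizontal pass, transpose, horizontal pass, transpose back.
fn blur_separable(src: &[f32], w: usize, h: usize, kernel: &[f32]) -> Vec<f32> {
    let hpass = horizontal_pass(src, w, h, kernel);
    let t = transpose(&hpass, w, h);               // now h x w
    let vpass = horizontal_pass(&t, h, w, kernel); // "vertical" pass on rows
    transpose(&vpass, h, w)                        // back to w x h
}
```

The point of the double transpose is that both passes walk memory row-by-row, which is far more cache-friendly than striding down columns; in the PR, rayon can then parallelize the rows of both passes.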

Edited 2025-05-09.

@shssoichiro (Member)

I'm pondering whether the decrease in accuracy is worth the speed difference. I'll ask some users to see what the general vibe is before moving forward. The blur has been found to be the biggest bottleneck, but as you also found, the only way that has been discovered to get significant performance gains is to emulate a gaussian blur instead of doing a true gaussian.

@gembleman (Author)

gembleman commented May 8, 2025

Yes. I think it would be better to keep the existing Gaussian blur implementation and separate the part that uses libblur as a distinct feature. It seems preferable to give users the choice.

@gembleman (Author)

I edited it. Please check.

@FreezyLemon (Contributor) left a comment

Thank you for the PR!

Looking at the docs for libblur, it seems that EdgeMode::Clamp means that anything outside the image is assumed to be equal to the nearest edge pixel value. IIRC, gaussian_impl handles edges differently; have you tried different settings to maybe improve the accuracy?
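For intuition, this is roughly what two of those edge modes mean for an out-of-bounds sample index `i` on an axis of length `n` (a sketch based on the docs' descriptions, not libblur's actual source; assumes `n >= 2`):

```rust
// EdgeMode::Clamp: out-of-range samples take the nearest edge pixel's value.
fn clamp_index(i: i64, n: i64) -> i64 {
    i.max(0).min(n - 1)
}

// EdgeMode::Reflect101: mirror around the edge without repeating the edge
// pixel itself, e.g. for n = 5: -1 -> 1, -2 -> 2, 5 -> 3.
fn reflect_101_index(i: i64, n: i64) -> i64 {
    let m = 2 * (n - 1); // period of the reflected sequence
    let j = ((i % m) + m) % m;
    if j < n { j } else { m - j }
}
```

The modes only differ near the borders, which is consistent with the observed diffs being small but nonzero between Clamp, Reflect, and Reflect101.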

It should also be noted that the current gaussian implementation is already an approximation and not a true gaussian filter. So if anything, the libblur implementation should be compared to a real reference implementation. I would honestly not be surprised if the libblur implementation is both more performant and more accurate.

I like the idea of giving multiple blur algorithm options. (Maybe even at runtime, but that's out of scope for this PR).

Code-wise, the implementation looks good. I'd prefer the git history to be less messy, but that probably won't matter for the squash-and-merge. CI should test all feature combinations if possible; if you're not familiar with GitHub Actions, I/we could just add this at the end:

```yaml
- name: Use predefined lockfile
  run: mv Cargo.lock.MSRV Cargo.lock
- name: Build
  run: cargo check --locked
```
@FreezyLemon (Contributor)

This is here for a reason: if CI fails here, then the lockfile is incorrect. Instead of removing --locked, update the Cargo.lock.MSRV file.

gembleman added 3 commits May 9, 2025 13:54

- changed SIGMA from 2.3 to 2.2943
  - EdgeMode::Clamp,      // diff 0.000 / 0.932
  - EdgeMode::Reflect101, // diff 0.008 / 0.933
  - EdgeMode::Reflect,    // diff 0.003 / 0.934
  - EdgeMode::Constant,   // diff 0.089 / 1.099
  - EdgeMode::Wrap,       // diff 0.718 / 1.017
- `#[cfg(not(feature = "rayon"))]` added in blur_impl.rs
- …ck to its original state.
- updated yuvxyb 0.4.1 -> 0.4.2
@gembleman (Author)

gembleman commented May 9, 2025

I adjusted the SIGMA constant in libblur_impl from 2.3 to 2.2943, which reduced the diff to 0.000 / 0.932.
I also tried various EdgeMode settings, and Clamp seems to be the best.

The libblur implementation's error grows as the image size grows, but I don't think this is something I can fix. Adjusting the KERNEL_SIZE and SIGMA constants might reduce the error further, but for now 11 and 2.2943 seem to be the optimal values.
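For intuition on what tuning SIGMA does, here is how a normalized 1-D Gaussian kernel would be generated for a given KERNEL_SIZE and SIGMA (libblur computes its weights internally; this sketch only illustrates the shape being matched):

```rust
// Build a normalized 1-D Gaussian kernel of odd length `size` for `sigma`.
// A small change in sigma (e.g. 2.3 -> 2.2943) slightly redistributes the
// weights, which is why it can nudge the final score diff.
fn gaussian_kernel(size: usize, sigma: f64) -> Vec<f64> {
    assert!(size % 2 == 1, "kernel size should be odd");
    let r = (size / 2) as i64;
    let mut weights: Vec<f64> = (-r..=r)
        .map(|x| (-((x * x) as f64) / (2.0 * sigma * sigma)).exp())
        .collect();
    let sum: f64 = weights.iter().sum();
    for w in &mut weights {
        *w /= sum; // normalize so the weights sum to 1
    }
    weights
}
```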

@gembleman (Author)

gembleman commented May 9, 2025

According to the AI, libblur's gaussian::gaussian_blur_f32 uses classical kernel-based convolution, which it claims is theoretically more accurate than the recursive Gaussian filter used in gaussian_impl. That's hard to believe: more accurate AND faster! But honestly, I'm not sure about this part; it would be most accurate to ask the creator of libblur.

I tried testing with the KERNEL_SIZE and SIGMA constant values estimated in build.rs, using 11 and 1.5 directly:

```rust
const SIGMA: f64 = 1.5f64;
let radius = 3.2795f64.mul_add(SIGMA, 0.2546).round();
// radius = 3.2795 * 1.5 + 0.2546 = 5.17385, rounded to 5
// KERNEL_SIZE = 2 * 5 + 1 = 11
```

```
tests::test_ssimulacra2
Elapsed time: 269.2497ms
Result:   21.030474
Expected: 17.398505
Diff: 3.632

tests::test2_ssimulacra2
Elapsed time: 1.6811704s
Result:   84.496677
Expected: 84.031931
Diff: 0.465
```

Given these results, it seems correct to say that accuracy has improved, not decreased.
If that's the case, the existing gaussian_impl may be redundant; should we delete it?

Also, it might be good to add options so that people who want even faster processing can choose libblur::stack_blur or libblur::fast_gaussian.
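Exposing those choices might look something like this (a hypothetical API sketch only; the PR currently selects the implementation via a cargo feature, and the enum, names, and ranking below are illustrative):

```rust
/// Hypothetical selector between blur backends discussed in this thread.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum BlurBackend {
    /// Existing recursive approximation (gaussian_impl).
    Recursive,
    /// libblur::gaussian_blur: true separable kernel convolution.
    GaussianConvolution,
    /// libblur::stack_blur / fast_gaussian: faster, rougher approximations.
    Fast,
}

impl BlurBackend {
    /// Rough accuracy ordering per the discussion above (0 = most accurate).
    fn accuracy_rank(self) -> u8 {
        match self {
            BlurBackend::GaussianConvolution => 0,
            BlurBackend::Recursive => 1,
            BlurBackend::Fast => 2,
        }
    }
}
```

A runtime enum like this (rather than compile-time features) would also match FreezyLemon's earlier suggestion of selectable blur algorithms.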

@shssoichiro (Member)

Sorry, I lost track of this. Within the next day or two I'll give this a proper review, but my thinking is that this is a good option to have, though we should not make it the default due to the accuracy decrease. Rather, I'd prefer it to be a separate "fast" version of the algorithm that the CLI can then provide via a --fast parameter.

Side note: I don't know what I was thinking when I made the CLI a completely separate repo, but I should go ahead and move it into this repo and make this a proper workspace so that it can be more easily updated together with the library.

@awxkee

awxkee commented Jul 5, 2025

Just to clarify: libblur::gaussian_blur performs pure 2-D separable convolution, as it should. The only factors that can affect the result are the system-installed libm (used to generate filter weights, although you can invoke libblur::filter1d with your own weights if needed) and FMA availability (checked at runtime). If you're working with extremely large intensities (> 2e10) or extremely small ones (< 2e-12), mandatory FMA or Dekker arithmetic may be required, which libblur does not provide.

If anything, if your current implementation is compiled by customers without explicitly setting -Ctarget-feature=+fma on a platform where LLVM could not detect FMA at compile time, performance might be interesting: Rust will emit software calls to libm::fmaf, and many math libraries do provide a software FMA fallback.

From signal theory alone, any Gaussian approximation makes the signal useless for any consequential analysis. After approximation, your signal is composed of squares, triangles, or whatever else your approximation uses; it makes no sense to perform any analysis on it.

@gembleman (Author)

@awxkee Thank you for the clear explanation! I'm not very familiar with Gaussian blur, so I wasn't sure how to proceed and had forgotten about this PR. Thank you for your response.

Based on what you've said, it seems like completely replacing the existing implementation with libblur::gaussian_blur would be the right approach. What do you think? It's really impressive that it improves both accuracy and performance.

@awxkee

awxkee commented Jul 6, 2025

You'd better wait and see what the authors here say on the matter.

I'd even argue that when a true Gaussian is replaced with any kind of approximation, it will very likely produce results that are GIGO. The authors here likely just want the output numbers to roughly match a reference implementation, even if that reference's output is itself quite vague.
