CUDA implementation of Real-time Image Smoothing via Iterative Least Squares for VapourSynth.
It is a global-optimization-based edge-preserving smoothing filter that can avoid the haloing and gradient-reversal artifacts commonly found in weighted-average-based methods such as the bilateral filter and the guided filter.
- CUDA-enabled GPU(s).
- cuFFT library, i.e. cufft64_*.dll on Windows or libcufft.so.* on Linux.
ils.ILS(clip clip[, float lambda=0.5, int iteration=4, float p=0.8, float eps=0.0001, float gamma=None, float c=None, bool use_welsch=False, int device_id=0, int num_streams=2, bool use_cuda_graph=True])

In short, use use_welsch=True with lambda, iteration and gamma for compression artifact removal, or use_welsch=False with lambda, iteration and p for detail manipulation.
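The following is a minimal sketch of how the two modes might be called from a VapourSynth Python script. The helper names `ils_kwargs` and `smooth` are hypothetical, and the parameter values are only placeholders taken from the documented defaults, not tuned recommendations. Because `lambda` is a reserved word in Python, the arguments are passed through a dictionary. The sketch assumes the plugin is loaded under the `ils` namespace, as the signature above suggests.

```python
def ils_kwargs(task):
    """Return keyword arguments for ils.ILS (hypothetical helper, placeholder values)."""
    if task == "detail":
        # Charbonnier penalty: the knobs to tune are lambda, iteration and p.
        return {"lambda": 0.5, "iteration": 4, "p": 0.8, "use_welsch": False}
    if task == "artifacts":
        # Welsch penalty: the knobs to tune are lambda, iteration and gamma.
        # gamma is left out here so the plugin picks its own default.
        return {"lambda": 0.5, "iteration": 4, "use_welsch": True}
    raise ValueError(f"unknown task: {task!r}")


def smooth(clip, task="detail"):
    """Apply ils.ILS to the first plane of `clip` (assumes the plugin is installed)."""
    import vapoursynth as vs  # imported lazily so ils_kwargs works without VapourSynth

    core = vs.core
    # The filter only processes the first plane and requires 32-bit float input.
    gray = core.std.ShufflePlanes(clip, planes=0, colorfamily=vs.GRAY)
    gray = core.resize.Point(gray, format=vs.GRAYS)
    return core.ils.ILS(gray, **ils_kwargs(task))
```

In a script, `smooth(clip, "artifacts").set_output()` would then output the smoothed first plane.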
- clip: The input clip. Must be in 32-bit float format. Only the first plane is processed.
- lambda: Smoothing strength of the filter.
  Default: 0.5
- iteration: Number of optimization iterations. A larger value gives stronger smoothing of large-amplitude details at the expense of a much higher computational cost.
  Default: 4
- p: Power norm of the penalty on the gradient, which controls the sensitivity to edges in the input image. A smaller value tends to blur smooth regions while leaving salient edges untouched. A value between 0.8 and 1 may be suitable for tone and detail manipulation, producing results with few visible artifacts.
  Default: 0.8
- eps: Small constant that makes the penalty function differentiable at the origin. A larger value leads to faster convergence, at the risk of halo artifacts.
  Default: 0.0001
- gamma, c: Computed automatically.
  Default:
  - gamma: 0.5 * p - 1
  - c: p * (eps ** gamma)
- use_welsch: Whether to use the Welsch penalty function. If not, the Charbonnier penalty is used instead. The Welsch penalty is suitable for removing compression artifacts from clip art, while the Charbonnier penalty is suitable for tone and detail manipulation.
  Default: False
- device_id: The GPU to use.
  Default: 0
- num_streams: Number of CUDA streams. Multiple streams enable concurrent kernel execution and data transfer.
  Default: 4
- use_cuda_graph: Whether to use CUDA Graphs to reduce CPU cost and kernel launch overhead.
  Default: True
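As a quick sanity check, the documented default formulas for gamma and c can be evaluated in plain Python. This is only a sketch of the arithmetic listed above. The function name `default_gamma_c` is hypothetical, and it is an assumption that the plugin evaluates the formulas in exactly this way.

```python
def default_gamma_c(p=0.8, eps=0.0001):
    """Evaluate the documented defaults: gamma = 0.5 * p - 1, c = p * (eps ** gamma)."""
    gamma = 0.5 * p - 1
    c = p * (eps ** gamma)
    return gamma, c


gamma, c = default_gamma_c()
# With p=0.8 and eps=0.0001, gamma is about -0.6 and c is about 201.
print(gamma, c)
```

Since gamma is negative for p below 2, a smaller eps gives a larger c.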
cmake -S . -B build -D CMAKE_BUILD_TYPE=Release -D CMAKE_CUDA_FLAGS="--threads 0 --use_fast_math -Wno-deprecated-gpu-targets" -D CMAKE_CUDA_ARCHITECTURES="50;61-real;75-real;86"
cmake --build build --config Release