Compositor: Speedup anisotropic Kuwahara operation #108796

Sergey Sharybin · 2023-06-09T11:05:33+02:00

There are two major sources of speedup:

- Stick to single precision floating point values
- Move towards vectorized types

Using single precision floating point values is something that needs
to be tackled sooner or later to make the code easier to port to the
GPU.
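As a minimal sketch of what sticking to single precision means in C++ (not code from this patch; `half_of_sketch` is a hypothetical name), a double literal or a double-only math function silently promotes a whole expression to double:

```cpp
#include <cmath>
#include <type_traits>

// A double literal promotes the whole expression to double.
static_assert(std::is_same<decltype(1.0f * 0.5), double>::value,
              "float * double literal yields double");

// An f-suffixed literal keeps the expression in single precision.
static_assert(std::is_same<decltype(1.0f * 0.5f), float>::value,
              "float * float literal yields float");

// std::sqrt has a float overload, so the result stays float.
static_assert(std::is_same<decltype(std::sqrt(1.0f)), float>::value,
              "std::sqrt(float) yields float");

// Hypothetical helper: every operand is float, so no double math is emitted.
inline float half_of_sketch(float x)
{
  return x * 0.5f;
}
```

Keeping literals and math calls in float also matches what GPU shading languages typically expect.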

The output images may differ slightly because epsilons are handled
differently. The code now more closely follows how we handle similar
issues in Cycles, and the original image where the NaN issues were
spotted still renders fine.

Using vectorized types explicitly solves the issue of sampling the
input multiple times and calculating luminance for the same pixel
multiple times. It also helps the compiler auto-vectorize the code.

When compositing a 3840 x 2160 image the operation itself is about
4.4x faster on Apple M2 (36.2 sec before, 8.2 after). The final
compositing scales less well (39.6 sec before, 11.3 after, about
3.5x) because other operations are involved in reaching the final
frame; they account for roughly 3 sec in both runs.

Note that the numbers are from the full-frame compositor. The tiled
compositor is also sped up by the same changes, but there the
absolute timings are much higher and the relative speedup is only
about 3x.



 @ -284,0 +284,4 @@
         float4 color;
         image->read_elem(xx, yy, &color.x);
         /* TODO(@zazizizou): only compute lum once per region. */

 @ -11,2 +13,4 @@
 namespace blender::compositor {
 /* Compute x to the given power, in a safe manner which does not produce non-finite values.
  * For non-positive values of x zero is returned. */
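
The hunk above shows only the comment, not the function body. A helper matching that description could look roughly like the sketch below. The name `safe_pow_sketch` is hypothetical, and the clamp on overflow is an extra guard assumed for this example, not something the comment states.

```cpp
#include <cmath>
#include <limits>

// Hypothetical sketch: x to the power y without producing non-finite values.
inline float safe_pow_sketch(float x, float y)
{
  // Zero for non-positive x. The negated comparison also catches NaN.
  if (!(x > 0.0f)) {
    return 0.0f;
  }
  const float result = std::pow(x, y);
  // Assumption: clamp overflow (or a NaN exponent) to the largest finite float.
  if (!std::isfinite(result)) {
    return std::numeric_limits<float>::max();
  }
  return result;
}
```

Returning zero for non-positive bases avoids the NaN that `std::pow` yields for a negative base with a fractional exponent, and the infinity it yields for a zero base with a negative exponent.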


Pull request closed