Compositor: Speedup anisotropic Kuwahara operation #108796

Closed
Sergey Sharybin wants to merge 5 commits from Sergey/blender:kuwahara_speedup into main


There are two major sources of speedup:

  • Stick to single precision floating point values
  • Move towards vectorized types

Using single precision floating point values is something that needs
to be tackled sooner or later in order to make the code easier to
port to the GPU.

The output images may differ slightly because epsilons are handled
differently. The code now follows more closely how we handle similar
issues in Cycles, and the original image where the NaN issues were
spotted still renders fine.

Using vectorized types explicitly solves the issue of sampling the
input multiple times and calculating luminance for the same pixel
multiple times. It also helps the compiler auto-vectorize the code.

When compositing a 3840 x 2160 image the operation itself is about 4x
faster on Apple M2 (36.2 sec before, 8.2 sec after). The final
compositing time does not scale quite as well (39.6 sec before,
11.3 sec after), because other operations are involved in reaching
the final frame.
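
For reference, the ratios implied by the timings above work out as follows. This is plain arithmetic on the quoted numbers; treating the remainder as time spent in "other operations" is an interpretation, not a separate measurement.

```latex
\begin{align*}
\text{operation speedup}   &= \frac{36.2}{8.2}  \approx 4.4 \\
\text{full-frame speedup}  &= \frac{39.6}{11.3} \approx 3.5 \\
\text{remaining time before} &= 39.6 - 36.2 = 3.4\ \text{sec} \\
\text{remaining time after}  &= 11.3 - 8.2 = 3.1\ \text{sec}
\end{align*}
```

The roughly 3 seconds outside the Kuwahara operation stay almost unchanged, which is why the end-to-end ratio is lower than the per-operation one.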

Note that the numbers are from the full-frame compositor. The tiled
compositor is also sped up by the same changes, but there the
absolute timings are much higher, and the relative speedup is only
about 3x.

Sergey Sharybin added the
Interest
Compositing
Module
VFX & Video
labels 2023-06-09 11:05:34 +02:00
Sergey Sharybin added 1 commit 2023-06-09 11:05:39 +02:00
30c676da41 Compositor: Speedup anisotropic Kuwahara operation
Sergey Sharybin requested review from Habib Gahbiche 2023-06-09 11:05:48 +02:00
Habib Gahbiche reviewed 2023-06-10 09:12:41 +02:00
@ -284,0 +284,4 @@
float4 color;
image->read_elem(xx, yy, &color.x);
/* TODO(@zazizizou): only compute lum once per region. */
Member

Since you removed the outer loop for channels this todo is now done. Nicely done ;)

Sergey marked this conversation as resolved
Habib Gahbiche reviewed 2023-06-10 19:54:44 +02:00
@ -11,2 +13,4 @@
namespace blender::compositor {
/* Compute x to the given power, in a safe manner which does not produce non-finite values.
* For non-positive values of x zero si returned. */
Member

typo

Sergey marked this conversation as resolved
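
For readers without the diff at hand, the behavior described in the reviewed comment (zero for non-positive input, never a non-finite result) could look roughly like the sketch below. The name `safe_pow_sketch` is hypothetical and this is not the function from the patch. In particular, clamping an overflowing result to the largest finite float is a choice made for this example only.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: x raised to `power` without producing NaN or infinity.
inline float safe_pow_sketch(const float x, const float power)
{
  // Non-positive x (and NaN, which fails the comparison) yields zero.
  if (!(x > 0.0f)) {
    return 0.0f;
  }
  const float result = std::pow(x, power);
  if (std::isnan(result)) {
    return 0.0f;
  }
  // Assumption of this sketch: clamp overflow to the largest finite float.
  return std::fmin(result, std::numeric_limits<float>::max());
}
```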
Member

I can confirm the speedup, but results are now significantly different than before. In images with many rough surfaces I think the results are visually equal, if not better. But for images with flat surfaces, I can see black patches again.

  • Image "div.png" shows old divided by new implementation. There are significant differences around edges (up to 90%), but visually the new image looks very good
  • Image "flat surface compare.png" shows old (right) vs. new (left) implementation side by side. You can see black lines and patches appearing again.

I find the speedup with float too attractive to throw away though :) so we might need to solve this on the algorithm level, e.g. handle special cases where strength values from the structure tensor are below a threshold.

Sergey Sharybin added 1 commit 2023-06-12 10:43:28 +02:00
76bab989a0 Updates for the reivew
- Mark TODO as resolved
- Fix typo in comment
Sergey Sharybin added 1 commit 2023-06-12 10:49:14 +02:00
Author
Owner

Image "flat surface compare.png" shows old (right) vs. new (left) implementation side by side. You can see black lines and patches appearing again.

I am a bit confused. The black patches are on the right.

I can confirm the black pixels in some of the artificial setups I've made. I think we should have something like this in the automated regression suite.

Replacing

var[i] = math::safe_sqrt(var[i]);

with

var[i] = math::max(var[i], float3(FLT_EPSILON * FLT_EPSILON));
var[i] = math::sqrt(var[i]);

makes the code behave closer to what it did before, and fixes the blackened regions. I'll commit this tweak so that the branch has the latest state of the code.
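
A scalar sketch of the difference between the two variants, for illustration only. `safe_sqrt_sketch` assumes that `math::safe_sqrt` clamps negative input to zero, and both function names are hypothetical. The practical difference is the lower bound: the first variant can return exactly zero for a uniform region, while the second never returns less than `FLT_EPSILON`. That may matter if later arithmetic divides by this value or raises it to a power.

```cpp
#include <algorithm>
#include <cfloat>
#include <cmath>

// Assumed behavior of a "safe" square root: negative input is treated as zero.
inline float safe_sqrt_sketch(const float v)
{
  return std::sqrt(std::max(v, 0.0f));
}

// Scalar form of the replacement: clamp the variance to FLT_EPSILON^2 first,
// so the resulting standard deviation is at least FLT_EPSILON.
inline float clamped_sqrt_sketch(const float v)
{
  return std::sqrt(std::max(v, FLT_EPSILON * FLT_EPSILON));
}
```

For a perfectly flat region with zero variance, `safe_sqrt_sketch(0.0f)` gives `0.0f`, while `clamped_sqrt_sketch(0.0f)` gives `FLT_EPSILON`.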

With the updated resolution in the regression tests, the patch does not pass them. To me it is hard to judge whether this is an actual behavior regression, or just a result of slightly different math (i.e. something that could be declared expected). I am not sure how to go about it.

Sergey Sharybin added 1 commit 2023-06-12 11:14:36 +02:00
buildbot/vexp-code-patch-coordinator Build done. Details
315ee6b626
Fix blackening of almost-uniform regions
Make the variance calculation closer to what it was before the
changes.
Member

I am a bit confused. The black patches are on the right.

Sorry I meant new implementation is on the right of course. I swear I know the difference between left and right :p

From the tests I did, I saw some minor improvements around edges from this patch actually. I will document the results later this week. Most differences are around edges so I also want to test it better in combination with the patch #108858 that improves edge detection.

Member

I had a closer look at the results. I tested with and without the changes from #108858 but the changes from #108858 didn't make much of a difference, so the following comparisons are only between this patch and main.

I can confirm black patches are not a problem anymore, but there are still significant differences around the edges. I attached two images that summarize my findings well.

  • "water drop.png": Left is original image, middle is this patch, right is main. You can see that the water stream (sharp vertical edge) is better preserved in the right image.
  • "window brackets.png": Left is original image, middle is this patch, right is main. You can see the brackets are also better preserved in the right image (main). So diagonal edges are somehow slightly worse with this patch.

Since the anisotropic Kuwahara filter is about better preservation of edges, I think the old implementation performs better. In my opinion the old implementation is also closer to what we want in Blender.

The big question is of course how significant these differences are. From the images I tested, you really need to zoom in to notice the difference. For example, in the image "whole render side by side.png" the images look very similar to me. I would love to get an artist's opinion on such differences.

Sergey Sharybin added this to the Compositing project 2023-06-20 11:34:33 +02:00
Member

@blender-bot build

Habib Gahbiche added 1 commit 2023-06-25 19:13:10 +02:00
Author
Owner

@blender-bot package

Member

Package build started. [Download here](https://builder.blender.org/download/patch/PR108796) when ready.
Author
Owner

There have since been a lot of algorithmic improvements which bring performance to an even better level.
Sergey Sharybin closed this pull request 2023-11-01 10:56:27 +01:00
Sergey Sharybin removed this from the Compositing project 2023-11-01 10:56:37 +01:00
Some checks failed
buildbot/vexp-code-patch-coordinator Build done.

Pull request closed

Reference: blender/blender#108796