WIP: Cycles: Implement blue-noise dithered sampling #118479

Draft
Lukas Stockner wants to merge 1 commit from LukasStockner/blender:blue-noise-dithered into main

Member

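This patch implements blue-noise dithered sampling as described by @nathanvegdahl [here](https://psychopath.io/post/2022_07_24_owen_scrambling_based_dithered_blue_noise_sampling), which in turn is based on ["Screen-Space Blue-Noise Diffusion of Monte Carlo Sampling Error via Hierarchical Ordering of Pixels"](https://repository.kaust.edu.sa/items/1269ae24-2596-400b-a839-e54486033a93).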

The basic idea is simple: Instead of generating independent sequences for each pixel by scrambling them, we use a single sequence for the entire image, with each pixel getting one chunk of the samples. The ordering across pixels is determined by hierarchical scrambling of the pixel's position along a space-filling curve, which ends up being pretty much the same operation as already used for the underlying sequence.
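As a rough illustration of the indexing described above, here is a minimal Python sketch. It is not the code from this patch: the function names, the SHA-256 based choice of permutations and the `rank * spp + sample` layout are all assumptions made for illustration. A real implementation would use a fast hash instead of this slow reference scramble.

```python
import hashlib
from itertools import permutations

# All 24 ways to permute one base-4 digit.
PERMS = list(permutations(range(4)))


def morton2d(x, y, bits):
    """Interleave the bits of x and y into a Z-order (Morton) curve index."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def owen_scramble_base4(index, digits, seed):
    """Slow reference base-4 Owen scramble (hypothetical helper).

    Each base-4 digit is permuted by a permutation that depends only on
    the seed and on the more significant digits of the input.
    """
    result = 0
    for d in reversed(range(digits)):
        prefix = index >> (2 * (d + 1))
        digit = (index >> (2 * d)) & 3
        h = hashlib.sha256(f"{seed}:{d}:{prefix}".encode()).digest()
        perm = PERMS[h[0] % 24]
        result |= perm[digit] << (2 * d)
    return result


def global_sample_index(x, y, sample, spp, bits, seed=0):
    """Index into the single image-wide sequence for one pixel sample.

    Assumes a square image of side 2**bits and that each pixel owns a
    contiguous chunk of `spp` samples.
    """
    rank = owen_scramble_base4(morton2d(x, y, bits), bits, seed)
    return rank * spp + sample
```

With these assumptions, an 8x8 image at 4 SPP uses every index in `range(256)` exactly once, so no two pixels share a sample of the underlying sequence.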

While this initial implementation produces promising results (see below), there are a number of open points remaining:

  • The implementation could still be cleaned up quite a bit.
  • The logic for branching the sequence (e.g. for Bevel and AO) doesn't work anymore and needs to be adapted.
  • Notable improvements are only seen with low maximum SPP values. If we just set 1000 SPP and let adaptive sampling handle it, we don't really have any benefit. This might just be an inherent limitation, though.
  • As usual with RNG stuff, a lot of testing is needed in case there are any correlation issues somewhere.
  • In some cases, denoisers appear to not handle the different noise pattern well and confuse it for detail.

To try it out, enable debug mode for Cycles and switch the sampler to Sobol-Burley (should be renamed as part of the patch).
For now, here's the same test scene as in the linked article, rendered at 1spp:

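| Mode | Noisy | Denoised | Reference |
| - | - | - | - |
| Default | ![noise_white.png](/attachments/5a40d4a3-0203-4292-814b-bc2f80605e01) | ![noise_white_denoised.png](/attachments/c7ea7129-3a59-47f0-8d93-f93a7697914a) | ![noise_ref.png](/attachments/cce976c5-e276-406d-84c7-4a58d9f6caa2) |
| Blue noise | ![noise_blue.png](/attachments/4bf215c0-5b8c-4e0f-b929-ce4b7a831f87) | ![noise_blue_denoised.png](/attachments/f18a7fc7-e04d-4636-8310-394892f3a744) | |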
Lukas Stockner added the Module / Render & Cycles label 2024-02-20 03:40:58 +01:00
Lukas Stockner added 1 commit 2024-02-20 03:41:03 +01:00
Member

Blue Noise dithered sampling replaces Sobol Burley sampling. The name of the sampler should be updated to reflect this, or it should be separated out as its own sampler or an option.

Along with that, the Sobol Burley sampler is hidden behind a debug menu. Ideally this should be moved out of the debug menu if you want Blue Noise dithering to be accessible to the average user.

Sorry for "reviewing" minor features while you probably want feedback on the more important stuff.

Results look great. I hope we can make this the default, and make it work well enough that the sampler choice can remain a debug option.

Member

> Notable improvements are only seen with low maximum SPP values. If we just set 1000 SPP and let adaptive sampling handle it, we don't really have any benefit. This might just be an inherent limitation, though.

Yes, this is an inherent limitation of the technique, unfortunately. The blue noise properties only manifest when you've used all samples allocated to a pixel. I've investigated making partial sample counts also have blue noise properties with this technique, but no luck so far.

My only reservation about making this the default sampler (and the reason I haven't already implemented this for Cycles myself) is precisely this behavior. For example, 4 samples per pixel with the max set to 4 is substantially different from 4 samples per pixel with the max set to 256, and this could make the sampling settings counter-intuitive for users. Given that, and given that the primary benefit of this technique is at low sample counts, it's not clear to me that the benefits will outweigh that potential confusion.

(Having said that, I of course like the technique, and have spent a substantial chunk of time working to improve it. But I'm just trying to be practical about the concrete benefits to Cycles users as the technique currently stands.)
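One way to picture the "max 4 versus max 256" difference is to look at which indices of the shared sequence actually get consumed. The following is a small Python sketch under the assumption that each pixel owns a contiguous chunk of `max_spp` samples. The helper names are hypothetical, and the pixel ordering is ignored because it does not change which indices are used.

```python
def consumed_indices(num_pixels, max_spp, rendered_spp):
    """Global sequence indices used when every pixel owns a chunk of
    max_spp samples but only the first rendered_spp are rendered."""
    return {
        pixel * max_spp + sample
        for pixel in range(num_pixels)
        for sample in range(rendered_spp)
    }


def is_contiguous_prefix(indices):
    """True if the indices are exactly 0 .. len(indices) - 1."""
    return indices == set(range(len(indices)))
```

For 16 pixels, `consumed_indices(16, 4, 4)` is a contiguous prefix of the sequence, while `consumed_indices(16, 256, 4)` picks 4 indices out of every 256, so the two renders draw on very different parts of the sequence even though both use 4 samples per pixel.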

Nathan Vegdahl reviewed 2024-02-21 12:42:19 +01:00
@ -28,0 +29,4 @@
* Performs base-4 Owen scrambling on a reversed-bit unsigned integer.
*
* This is equivalent to the Laine-Karras permutation, but much higher
* quality. See https://psychopath.io/post/2022_08_14_a_fast_hash_for_base_4_owen_scrambling
Member

I suspect this is just a copy/paste oversight, but just want to note that this bit:

> This is equivalent to the Laine-Karras permutation, but much higher quality.

is not true of the base-4 hash. It is not equivalent to the Laine-Karras permutation (which is base 2), and is also not especially high quality, as I outlined in the linked post.
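For context, the base-2 Laine-Karras permutation looks roughly like the Python sketch below. The constants are quoted from memory of Burley's "Practical Hash-based Owen Scrambling" and should be checked against that paper. The Python function names are made up for illustration, and none of this is the code under review.

```python
MASK32 = 0xFFFFFFFF


def laine_karras_permutation(x, seed):
    """Base-2 hash on a reversed-bit 32-bit integer.

    Every step (add, xor with a multiple by an even constant) only lets
    a bit influence bits above it, which is what makes it behave like an
    Owen scramble once the bits are reversed back.
    """
    x = (x + seed) & MASK32
    # Constants assumed from Burley's paper; verify before relying on them.
    for c in (0x6C50B47C, 0xB82F1E52, 0xC7AFE638, 0x8D22F6E6):
        x ^= (x * c) & MASK32
    return x


def reverse_bits32(x):
    """Reverse the bit order of a 32-bit unsigned integer."""
    return int(f"{x:032b}"[::-1], 2)


def owen_scramble_base2(x, seed):
    """Reverse, hash, reverse back: a base-2 Owen-style scramble."""
    return reverse_bits32(laine_karras_permutation(reverse_bits32(x), seed))
```

The base-2 structure shows up in the fact that each single bit is flipped depending on the bits above it, whereas a base-4 scramble permutes pairs of bits as one digit.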

I think it's useful even with the limitations. It seems quite reasonable for someone to set up a viewport render or quick preview render to use e.g. 4 or 16 samples and benefit from this. It may be unintuitive, but for me it's not enough of a reason to make low sample renders more noisy than they could be.

Member

I just wanted to note down some issues that have become more apparent while testing this pull request. The issues also apply to main and may need a bit of work to fix, so it may be best to deal with them in a separate pull request.

As mentioned already, this sampling pattern works best when the max sample count and the number of samples actually rendered are the same (e.g. set to 16 samples per pixel, and all 16 are used). Due to the current setup of the Cycles viewport, this behaviour causes some issues.

  1. While navigating or updating the Cycles viewport, Cycles uses 1, 2, 3, or 4 samples per pixel depending on the resolution of the viewport. However, the sampling pattern in use is the one for normal viewport rendering, so the sample count is usually wrong for navigation and the results are sub-par (e.g. the viewport is set to 1024 SPP, but only 4 SPP from that sequence are used while navigating). Maybe Cycles should use a lower sample count sequence while navigating, to try and get that blue noise benefit?

  2. In the Cycles viewport, the sample count can change without viewport rendering restarting. For example, the user can set their sample count to 4 SPP, render those 4 samples, then increase it to 16, and Cycles will just render 12 SPP on top of the existing 4 SPP. This behaviour, combined with how these sampling patterns work, can result in low quality results. Luckily this isn't too much of an issue: as soon as viewport rendering restarts (e.g. a camera or object moves, or a material is modified), you start from sample 1 again with the right sequence. But it's still something to consider. Maybe viewport rendering should restart whenever the sample count is changed?

  3. The sample offset option can end up reducing the effectiveness of this technique if used improperly. For example, if someone sets their sample count to 4 and then sets their sample offset to something that is not an integer multiple of 4, they lose some of the blue noiseness of the render.
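The sample offset case in point 3 can be sketched in Python as follows. This assumes, hypothetically, that the shared sequence is divided into chunks of `spp` samples and that the offset simply shifts where a pixel starts reading. How Cycles actually combines the offset with this sampler may differ.

```python
def chunks_touched(sample_offset, spp):
    """Chunk ids (each chunk holds spp samples) that a pixel reads from
    when it renders samples sample_offset .. sample_offset + spp - 1."""
    return {(sample_offset + s) // spp for s in range(spp)}


def offset_keeps_chunks_intact(sample_offset, spp):
    """True if the pixel still consumes exactly one whole chunk."""
    return len(chunks_touched(sample_offset, spp)) == 1
```

With 4 SPP, an offset of 8 stays inside a single chunk, while an offset of 6 straddles two chunks, which is the situation where the blue noise benefit would be expected to degrade.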


There is talk of this becoming the default sampling pattern, and other sampling patterns are left behind a debug menu. I have some questions related to this.

  1. Sobol Burley does not support the Scrambling Distance feature. What will happen here?
    • Will Tabulated Sobol with Scrambling Distance remain as a debug feature? Or will it be accessible without the debug menu?
    • Will the Scrambling Distance feature be removed (and Tabulated Sobol remain)? There was talk a while ago about whether or not scrambling distance is even worth it. No conclusions were made back then, but it may be something to re-discuss.
  2. Should Sobol Burley without blue noise dithered sampling still be an option people can select? If so, should it be accessible to the end user, or remain behind a debug menu?

Some of these are more general questions, feel free to shift the discussion elsewhere.

First-time contributor

As a Render TD who used dithered sampling in production (Man In The High Castle, Silicon Valley) back when Lukas first implemented it in 2015-2016, I request that this be a feature exposed to users. Hiding it or automating it serves no benefit for people who are truly trying to squeeze performance out of Cycles and hit budget constraints.

I have production scenes that are below 32 samples, even a few below 16. It’s no easy task getting to these numbers. These days I work in milliseconds, not seconds, hitting upwards of 60 FPS out of Cycles in some scenarios on a single PC (yes, final renders with a frame saved to disk). Even on CPU!

I compete against Unreal and other real-time engines taking over the market, using our beloved Cycles in stock Blender builds.

Let the TDs do their job, we are artists too.

Stefan Werner has a few patches with dithered sampling working with different sampling methods.

This pull request is marked as a work in progress.
This branch is out-of-date with the base branch

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u blue-noise-dithered:LukasStockner-blue-noise-dithered
git checkout LukasStockner-blue-noise-dithered
Reference: blender/blender#118479