WIP: Cycles: Implement blue-noise dithered sampling #118479

Draft
Lukas Stockner wants to merge 1 commit from LukasStockner/blender:blue-noise-dithered into main

Member

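This patch implements blue-noise dithered sampling as described by @nathanvegdahl (https://psychopath.io/post/2022_07_24_owen_scrambling_based_dithered_blue_noise_sampling), which in turn is based on "Screen-Space Blue-Noise Diffusion of Monte Carlo Sampling Error via Hierarchical Ordering of Pixels" (https://repository.kaust.edu.sa/items/1269ae24-2596-400b-a839-e54486033a93).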

The basic idea is simple: Instead of generating independent sequences for each pixel by scrambling them, we use a single sequence for the entire image, with each pixel getting one chunk of the samples. The ordering across pixels is determined by hierarchical scrambling of the pixel's position along a space-filling curve, which ends up being pretty much the same operation as already used for the underlying sequence.
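A minimal Python sketch of the index mapping described above. It assumes a Morton (Z-order) curve as the space-filling curve and a hash-driven base-4 nested (Owen-style) scramble of the Morton index; the helper names (`morton2d`, `hash32`, `owen_scramble_base4`, `global_sample_index`) and the hash constants are illustrative assumptions, not the patch's code.

```python
from itertools import permutations

# All 24 permutations of the base-4 digits {0, 1, 2, 3}.
PERMS4 = list(permutations(range(4)))


def morton2d(x: int, y: int, bits: int) -> int:
    """Interleave the low `bits` bits of x and y into a Z-order index."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def hash32(value: int, seed: int) -> int:
    """Illustrative 32-bit integer mix (assumed, not taken from Cycles)."""
    v = (value ^ seed) & 0xFFFFFFFF
    v = (v * 0x9E3779B1) & 0xFFFFFFFF
    v ^= v >> 16
    v = (v * 0x85EBCA6B) & 0xFFFFFFFF
    v ^= v >> 13
    return v


def owen_scramble_base4(index: int, num_digits: int, seed: int) -> int:
    """Permute each base-4 digit with a permutation chosen from the digits above it."""
    out = 0
    prefix = 1  # leading 1 keeps prefixes of different lengths distinct
    for level in range(num_digits - 1, -1, -1):
        digit = (index >> (2 * level)) & 3
        perm = PERMS4[hash32(prefix, seed) % 24]
        out |= perm[digit] << (2 * level)
        prefix = (prefix << 2) | digit
    return out


def global_sample_index(x: int, y: int, sample: int, spp: int, bits: int, seed: int) -> int:
    """Index into the single image-wide sequence: each pixel owns `spp` consecutive points."""
    pixel_rank = owen_scramble_base4(morton2d(x, y, bits), bits, seed)
    return pixel_rank * spp + sample
```

Because the scramble is a bijection on the Morton indices, every pixel gets a distinct chunk and the chunks together tile the sequence. In the actual technique that index would then drive an Owen-scrambled Sobol sequence (Sobol-Burley in Cycles); that step is omitted here.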

While this initial implementation produces promising results (see below), there are a number of open points remaining:

  • The implementation could still be cleaned up quite a bit.
  • The logic for branching the sequence (e.g. for Bevel and AO) doesn't work anymore and needs to be adapted.
  • Notable improvements are only seen with low maximum SPP values. If we just set 1000 SPP and let adaptive sampling handle it, we don't really have any benefit. This might just be an inherent limitation, though.
  • As usual with RNG stuff, a lot of testing is needed in case there are any correlation issues somewhere.
  • In some cases, denoisers appear to not handle the different noise pattern well and confuse it for detail.

To try it out, enable debug mode for Cycles and switch the sampler to Sobol-Burley (should be renamed as part of the patch).
For now, here's the same test scene as in the linked article, rendered at 1spp:

| Mode | Noisy | Denoised | Reference |
| - | - | - | - |
| Default | noise_white.png | noise_white_denoised.png | noise_ref.png |
| Blue noise | noise_blue.png | noise_blue_denoised.png | |
Lukas Stockner added the Module: Render & Cycles label 2024-02-20 03:40:58 +01:00
Lukas Stockner added 1 commit 2024-02-20 03:41:03 +01:00
Member

Blue-noise dithered sampling replaces Sobol-Burley sampling. The name of the sampler should be updated to reflect this, or it should be separated out as its own sampler or an option.

Along with that, the Sobol-Burley sampler is hidden behind a debug menu. Ideally this should be moved out of the debug menu if you want blue-noise dithering to be accessible to the average user.

Sorry for "reviewing" minor features while you probably want feedback on the more important stuff.

Results look great. I hope we can make this the default, and make it work well enough that the sampler choice can remain a debug option.

Member

> Notable improvements are only seen with low maximum SPP values. If we just set 1000 SPP and let adaptive sampling handle it, we don't really have any benefit. This might just be an inherent limitation, though.

Yes, this is an inherent limitation of the technique, unfortunately. The blue noise properties only manifest when you've used all samples allocated to a pixel. I've investigated making partial sample counts also have blue noise properties with this technique, but no luck so far.

My only reservation about making this the default sampler—and the reason I haven't implemented this for Cycles already myself—is precisely because of this behavior. 4 samples per pixel with the max set to 4 is substantially different than 4 samples per pixel with the max set to 256, for example. And this could make the sampling settings counter-intuitive for users. That combined with the primary benefit of this technique being at low sample counts, it's not clear to me that the benefits of the technique will outweigh that potential confusion.

(Having said that, I of course like the technique, and have spent a substantial chunk of time working to improve it. But I'm just trying to be practical about the concrete benefits to Cycles users as the technique currently stands.)
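To make the "4 of 4" versus "4 of 256" difference concrete, the following sketch only tracks which indices of the image-wide sequence get consumed. It assumes the layout from the PR description (each pixel owns a contiguous chunk of `max_spp` points); the function name is hypothetical.

```python
def used_indices(num_pixels: int, max_spp: int, samples_taken: int) -> set[int]:
    """Global sequence indices consumed when each pixel takes `samples_taken`
    of the `max_spp` points in its chunk (pixel order does not matter here)."""
    return {p * max_spp + s for p in range(num_pixels) for s in range(samples_taken)}


full = used_indices(num_pixels=16, max_spp=4, samples_taken=4)
partial = used_indices(num_pixels=16, max_spp=256, samples_taken=4)

# Max == used: the image consumes one contiguous leading segment of the sequence.
print(full == set(range(16 * 4)))  # True
# Max > used: only the first 4 points of every 256-point chunk are consumed.
print(max(partial))  # 3843
```

This is only index bookkeeping and says nothing about the quality of each subset, but it shows why the two settings draw from different parts of the sequence even though both render 4 samples per pixel.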

Nathan Vegdahl reviewed 2024-02-21 12:42:19 +01:00
@ -28,0 +29,4 @@
* Performs base-4 Owen scrambling on a reversed-bit unsigned integer.
*
* This is equivalent to the Laine-Karras permutation, but much higher
* quality. See https://psychopath.io/post/2022_08_14_a_fast_hash_for_base_4_owen_scrambling
Member

I suspect this is just a copy/paste oversight, but just want to note that this bit:

> This is equivalent to the Laine-Karras permutation, but much higher quality.

Is not true of the base-4 hash. It is not equivalent to the Laine-Karras permutation (which is base 2), and is also not especially high quality, as I outlined in the linked post.
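For comparison, here is a Python sketch of the base-2 Laine-Karras-style permutation, using the constants as published in Burley's "Practical Hash-based Owen Scrambling" (2020). Every multiplier is even, so in the reversed-bit domain output bit i depends only on input bits 0..i, which is what makes it a binary (base-2) nested scramble. A base-4 scramble instead permutes base-4 digits, i.e. pairs of bits. The wrapper names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def laine_karras_permutation(x: int, seed: int) -> int:
    """Base-2 hash permutation applied to a bit-reversed 32-bit integer.

    Each x ^= x * c with even c only pushes information from lower bits to
    higher bits, so bit i of the result depends on input bits 0..i only.
    """
    x = (x + seed) & MASK32
    x ^= (x * 0x6C50B47C) & MASK32
    x ^= (x * 0xB82F1E52) & MASK32
    x ^= (x * 0xC7AFE638) & MASK32
    x ^= (x * 0x8D22F6E6) & MASK32
    return x


def reverse_bits32(x: int) -> int:
    """Reverse the bit order of a 32-bit integer."""
    return int(f"{x & MASK32:032b}"[::-1], 2)


def nested_uniform_scramble_base2(x: int, seed: int) -> int:
    """Reverse, permute, reverse back: an approximate binary Owen scramble."""
    return reverse_bits32(laine_karras_permutation(reverse_bits32(x), seed))
```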


I think it's useful even with the limitations. It seems quite reasonable for someone to set up a viewport render or quick preview render to use e.g. 4 or 16 samples and benefit from this. It may be unintuitive, but for me it's not enough of a reason to make low sample renders more noisy than they could be.

Member

I just wanted to note down some issues that have become more apparent while testing this pull request. The issues also apply to main and may need a bit of work to fix, so it may be best to deal with them in a separate pull request.

As mentioned already, this sampling pattern works best when the max sample count and the number of samples used for rendering are the same (e.g. set to 16 samples per pixel, and all 16 are used). Due to the current setup of the Cycles viewport, this behaviour causes some issues.

  1. While navigating/updating the Cycles viewport, Cycles will use 1, 2, 3, or 4 samples per pixel depending on the resolution of the viewport. However, the sampling pattern being used is the one for normal viewport rendering, so the sample count is usually wrong for navigation and the results are sub-par (e.g. the viewport is set to 1024 SPP, but while navigating, only 4 SPP from that sequence are used). Maybe while navigating the viewport, Cycles should use a sequence built for the lower sample count to get the blue-noise benefit?

  2. In the Cycles viewport, the sample count can change without viewport rendering restarting. For example, the user can set their sample count to 4 SPP, render those 4 samples, then increase it to 16, and Cycles will just render 12 SPP on top of the existing 4 SPP. Combined with how these sampling patterns work, this can give low-quality results. Luckily this isn't too much of an issue: as soon as viewport rendering restarts (e.g. a camera/object moves, or a material is modified), you start from sample 1 again with the right sequence. But it's still something to consider. Maybe viewport rendering should restart whenever the sample count is changed?

  3. The sample offset option can reduce the effectiveness of this technique if used improperly. For example, if someone sets their sample count to 4 and then sets their sample offset to a value that isn't a multiple of 4, they lose some of the blue-noise character of the render.


There is talk of this becoming the default sampling pattern, with the other sampling patterns left behind a debug menu. I have some questions related to this.

  1. Sobol Burley does not support the Scrambling Distance feature. What will happen here?
    • Will Tabulated Sobol with Scrambling Distance remain as a debug feature? Or will it be accessible without the debug menu?
    • Will the Scrambling Distance feature be removed (and Tabulated Sobol remain)? There was talk a while ago about whether or not scrambling distance is even worth it. No conclusions were made back then, but it may be something to re-discuss.
  2. Should Sobol Burley without blue noise dithered sampling still be an option people can select? If so, should it be accessible to the end user, or remain behind a debug menu?

Some of these are more general questions, feel free to shift the discussion elsewhere.

First-time contributor

As a Render TD and someone who has used dithered sampling in production (Man In The High Castle, Silicon Valley) back when Lukas first implemented it in 2015-2016, I request that this be a feature exposed to users. Hiding or automating it serves no benefit for people who are truly trying to squeeze the performance out of Cycles and hit budget constraints.

I have production scenes that are below 32 samples, even a few below 16. It’s no easy task getting to these numbers; these days I work in milliseconds, not seconds, hitting upwards of 60 FPS out of Cycles in some scenarios on a single PC (yes, final renders with each frame saved to disk). Even on CPU!

I compete against Unreal and other real-time engines taking over the market, using our beloved Cycles in stock Blender builds.

Let the TDs do their job, we are artists too.

Stefan Werner has a few patches with dithered sampling working with different sampling methods.

Author
Member

Okay, I've looked into a few ways to improve the behavior here, but not with much success.
As a summary of the requirements:

  • We'd like the method to also work well for a prefix of the full sequence (e.g. when using adaptive sampling)
  • We'd like the 1-SPP case in particular to work well for interactive viewport navigation, even when the max SPP is higher
  • We'd like the method to work well for all SPP values, not just certain magic values (e.g. powers of 2)
  • We'd like to not sacrifice stratification within each pixel

The methods I have kept around for testing are:

  • "Pure": The method as originally implemented, uses a blue noise sequence matching the scene SPP
  • "Round": I had the feeling that "Pure" sometimes had issues for non-power-of-2 SPP counts, so this variant rounds up the sequence length to the next power of 2 and then just doesn't fully use it. Not sure if it's any better though...
  • "Cascade": Starts with a 1-SPP blue noise sequence, then a 2-SPP, then a 4-SPP etc. Keeps going until it reaches the overall sample count, the last sequence therefore is not completely used.
    • Performs quite poorly, probably since the sequences are independent of each other so they don't combine nicely w.r.t. stratification etc.
  • "First": Uses a 1-SPP blue noise sequence for the first sample, then switches to Tabulated Sobol.
    • The most boring one, since it's almost like T-S for higher SPP values. But that's a good thing - a single sample doesn't mess up the stratification that noticeably, and it still gives the nice interactive viewport experience regardless of SPP value.

I've spent way too long staring at noise patterns in renders, so I figure I'll just push a version with all four included and trigger the buildbot so people can give feedback.

My pick at this point would be to have three options in the final enum (not hidden behind debug flags):

  • Classic (= Tabulated Sobol)
  • Blue-Noise (= "Pure" above)
  • Automatic (default, = "First" for viewport and "Pure" for final render)

One thing to consider would be to only use blue noise in the Automatic option if adaptive sampling is off, but I think the pure blue-noise sequence is never noticeably worse than Tabulated Sobol even if you only use a prefix.
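The four variants differ mainly in which image-wide sequence a given pixel sample is drawn from, and at what index. The sketch below models that bookkeeping in Python using the "each pixel owns a chunk" layout from the description; `select_sample`, `next_pow2`, the variant strings and the simplified Tabulated Sobol fallback are hypothetical, not the patch's code.

```python
def next_pow2(n: int) -> int:
    """Smallest power of two >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def select_sample(variant: str, pixel_rank: int, sample: int, spp: int):
    """Return ((sequence kind, chunk length), index) for one pixel sample.

    "tabulated" stands for an ordinary per-pixel sequence; its index is
    simplified here to the plain sample number.
    """
    if variant == "pure":
        # One blue-noise sequence whose chunks are exactly `spp` points long.
        return ("blue", spp), pixel_rank * spp + sample
    if variant == "round":
        # Chunks padded to the next power of two; the tail of each chunk is unused.
        chunk = next_pow2(spp)
        return ("blue", chunk), pixel_rank * chunk + sample
    if variant == "cascade":
        # Samples 0 | 1-2 | 3-6 | 7-14 | ... come from 1-, 2-, 4-, 8-SPP sequences.
        level = (sample + 1).bit_length() - 1
        chunk = 1 << level
        return ("blue", chunk), pixel_rank * chunk + (sample - (chunk - 1))
    if variant == "first":
        # Only sample 0 is blue noise; the rest fall back to the per-pixel sequence.
        if sample == 0:
            return ("blue", 1), pixel_rank
        return ("tabulated", None), sample
    raise ValueError(f"unknown variant: {variant}")
```

For example, with 5 SPP "pure" uses 5-point chunks while "round" uses 8-point chunks and leaves 3 points per pixel unused, and "cascade" draws samples 3 to 6 from a 4-SPP sequence that is independent of the earlier ones.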
Lukas Stockner force-pushed blue-noise-dithered from 4b20562c7f to 6f0f66e439 2024-05-13 01:45:22 +02:00 Compare
Author
Member

@blender-bot package +gpu

Member

Package build started. Download here (https://builder.blender.org/download/patch/PR118479) when ready.
Member

@LukasStockner would you like feedback here, or on a devtalk thread (to avoid clutter in the pull request)?

I think it's fine here, no need for a devtalk thread.


> My pick at this point would be to have three options in the final enum (not hidden behind debug flags):
>
>   • Classic (= Tabulated Sobol)
>   • Blue-Noise (= "Pure" above)
>   • Automatic (default, = "First" for viewport and "Pure" for final render)
>
> One thing to consider would be to only use blue noise in the Automatic option if adaptive sampling is off, but I think the pure blue noise sequence is never noticeably worse than Tabulated Sobol even if you only use a prefix.

This sounds good to me.

Member
>   • "First": Uses a 1-SPP blue noise sequence for the first sample, then switches to Tabulated Sobol.
>     • it's almost like T-S for higher SPP values. But that's a good thing - a single sample doesn't mess up the stratification that noticeably, and it still gives the nice interactive viewport experience regardless of SPP value.

There's still the issue that during viewport navigation, the sample count changes between 1 and 4 SPP. So if only the first sample is blue noise and you're using 4 SPP for navigation, you don't see much of the benefit.

If denoising is enabled, 1 SPP is used while navigating, so it helps there. But in my testing, 1 SPP blue noise with denoising can sometimes result in the noise being visible through the denoiser, while Tabulated Sobol + denoising is just blobby (which I think is preferable here).

My suggestion is:

  1. Fix the issue where only the first sample is blue noise and the following samples are Tabulated Sobol when using >1 SPP for viewport navigation. That way blue noise is visible while navigating more often.
  2. For the proposed "Automatic" mode, viewport could use "First" if denoising is disabled, otherwise Tabulated Sobol if denoising is enabled (to avoid the noise being visible through the denoiser as you navigate). This suggestion may change depending on how common the "noise through the denoiser" issue is.

With the "First" option (in its current form), we can't use "Scrambling Distance" on the Tabulated Sobol samples. This is probably alright, but it might be something to consider enabling?


>   • Classic (= Tabulated Sobol)
>   • Blue-Noise (= "Pure" above)
>   • Automatic (default, = "First" for viewport and "Pure" for final render)

Maybe you could re-add the original Sobol-Burley behind the debug option? Other than that it seems alright.

Member

> I've looked into a few ways to improve the behavior here, but not with much success.

Yeah, that's been the struggle for me as well. But I think Brecht's earlier point (https://projects.blender.org/blender/blender/pulls/118479#issuecomment-1128831) about this being useful regardless is a good one.

> "Round": I had the feeling that "Pure" sometimes had issues for non-power-of-2 SPP counts, so this variant rounds up the sequence length to the next power of 2 and then just doesn't fully use it. Not sure if it's any better though...
>
> [...]
>
> My pick at this point would be to have three options in the final enum (not hidden behind debug flags):
>
>   • Classic (= Tabulated Sobol)
>   • Blue-Noise (= "Pure" above)
>   • Automatic (default, = "First" for viewport and "Pure" for final render)

I like that set of options, but I would suggest using "round" as you described rather than "pure" for the Blue-Noise and Automatic options. It should play nicer with Owen scrambling, IIRC.


Something else I've been wondering about that is relevant but not specific to this PR (and perhaps should be split into a separate discussion) is how we deal with distributed rendering. For example, rendering 256 samples on one machine, 256 on another, etc. and then merging them all afterwards.

Specifically, although Cycles does have the Sample Offset parameter for this use case, that only does the "right" thing with sequences that don't change when the sample count changes.

For Sobol-Burley that holds true. But for Blue Noise Sobol-Burley it doesn't: if on one machine you render 256 samples with offset 0, and on another machine 256 samples with offset 256, intuitively you would expect those machines to be rendering the 1st and 2nd set of 256 samples from the same sequence. But in fact the first one will be rendering from a sequence that distributes sets of 256 samples to each pixel, and the latter will be rendering from a sequence that distributes sets of 512 samples to each pixel.

These two different sequences on the two different machines are not stratified with each other. In practice it should(?) still converge, but it will take more samples than it would if properly stratified. And in any case, will certainly not be blue-noise distributed anymore.

(Also, come to think of it, I think we might(?) have a similar problem with Tabulated Sobol due to one of the optimizations we added: it generates a different-sized table of samples depending on the SPP setting. And I don't think(?) the sampling code ensures that the tables are used such that smaller tables behave as a prefix of larger ones.)

So it might be a good idea to add another parameter that pairs with Sample Offset: the number of samples from the larger sequence to render. That way the normal render settings specify the total sample count for the completed merged render, and Sample Offset + Sample Subset Count (needs a better name) together handle the distributed-render use case. And then Cycles has enough information to ensure that the sampling is coordinated properly between all the machines.
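The mismatch described above can be reproduced with simple chunk bookkeeping. The sketch assumes, as described, that the blue-noise chunk length currently follows offset + count on each machine, and then models the proposed fix, where a hypothetical `total_spp` (the merged sample count) fixes the chunk length and each machine renders `[offset, offset + count)` within it. All names are placeholders, since the new parameter does not have a name yet.

```python
def indices_current(num_pixels: int, offset: int, count: int) -> set[int]:
    """Chunk length follows the local setting (offset + count)."""
    chunk = offset + count
    return {p * chunk + offset + s for p in range(num_pixels) for s in range(count)}


def indices_proposed(num_pixels: int, offset: int, count: int, total_spp: int) -> set[int]:
    """Chunk length is fixed by the total sample count of the merged render."""
    if offset + count > total_spp:
        raise ValueError("offset + count exceeds total_spp")
    return {p * total_spp + offset + s for p in range(num_pixels) for s in range(count)}


n = 4
a = indices_current(n, offset=0, count=256)    # chunks of 256
b = indices_current(n, offset=256, count=256)  # chunks of 512
print(a | b == set(range(n * 512)))  # False: not one coherent 512-SPP sequence

a = indices_proposed(n, 0, 256, total_spp=512)
b = indices_proposed(n, 256, 256, total_spp=512)
print(a.isdisjoint(b) and a | b == set(range(n * 512)))  # True
```

With the chunk length tied to the merged total, the two machines split every pixel's chunk exactly in half, so the merged result is the same set of points a single 512-SPP render would use.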

Author
Member
  • Regarding performance at e.g. 4 SPP in the viewport: Now that I think about it, "first" should probably do 1 SPP blue noise for the first sample, and (N-1) SPP blue noise for the remaining ones. As mentioned, it appears that even an incomplete blue noise sequence isn't notably worse than Tabulated Sobol.
  • Regarding denoising: Instead of trying to work around the denoiser, I'd prefer to train it to handle blue-noise inputs properly. I'll dig up my dataset from the adaptive sampling work, re-render with blue noise and see if training on that helps. If yes, we should try to get this into the official OIDN weights.
  • Regarding distributed rendering: Yes, good point. I'll add that.
  • Regarding "round" vs. "pure": I'll run some tests to see which one behaves better overall. I'd expect "round" to have better stratification, but "pure" to have better blue-noise properties.
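The "first" split and the "round" variant can both be sketched as simple index mappings. This is a sketch under assumptions: `SampleSource`, `first_mode_source` and `round_chunk_size` are hypothetical names, and the patch may organize this differently. "Pure" would simply use the sample count itself as the chunk size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of where one sample of a pixel is drawn from.
struct SampleSource {
  uint32_t sequence_id;  // which independent blue-noise sequence to use
  uint32_t chunk_size;   // sequence points allotted to each pixel in that sequence
  uint32_t index;        // position within this pixel's chunk
};

// "First" as proposed: sample 0 comes from a 1 SPP blue-noise sequence,
// samples 1..N-1 from a second blue-noise sequence with N-1 points per pixel.
SampleSource first_mode_source(uint32_t sample, uint32_t total_samples)
{
  assert(sample < total_samples);
  if (sample == 0) {
    return {0, 1, 0};
  }
  return {1, total_samples - 1, sample - 1};
}

// "Round": allot the next power of two >= total_samples to each pixel and
// leave the tail of every chunk unused.
uint32_t round_chunk_size(uint32_t total_samples)
{
  assert(total_samples >= 1 && total_samples <= (1u << 31));
  uint32_t n = 1;
  while (n < total_samples) {
    n <<= 1;
  }
  return n;
}
```

For example, a 13 SPP render with "round" would allot 16 sequence points per pixel and skip the last 3 of each chunk.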
Author
Member

"Round" appears to perform worse than "pure" from what I can see.

Two examples here (BMW at 13spp and Cube Diorama at 6spp):

| Scene | Tabulated Sobol | Pure | Round |
|-|-|-|-|
| BMW | ![scaled-bmw27-ts.png](/attachments/bc706875-09dc-4c58-84a1-d3b06e953199) | ![scaled-bmw27-pure.png](/attachments/80abc32a-dcf8-4830-bc6c-c8d1f64956c1) | ![scaled-bmw27-round.png](/attachments/da01e418-95a4-4ece-b51b-027bb3aa0838) |
| Diorama | ![scaled-diorama-ts.png](/attachments/53fafe7d-722d-4f3d-87cf-b319d24fc356) | ![scaled-diorama-pure.png](/attachments/b04e5fc2-741c-40e7-90fa-cf3a2e452284) | ![scaled-diorama-round.png](/attachments/eded264a-8d5a-4921-a4e4-507cb9e2c794) |
Lukas Stockner force-pushed blue-noise-dithered from 6f0f66e439 to 8e0810650d 2024-05-21 04:07:59 +02:00 Compare
Author
Member
  • Rebased
  • Changed "first" to use another blue-noise sequence for the following pixels
  • Fixed UI enum mapping
  • Moved UI option from debug panel into Advanced panel (some entries are still only shown with debug enabled)
  • Added automatic option

Please disregard the comparison above: it was using the wrong UI enum, so it actually compares "pure" to "cascade". I now have a better comparison using FFT-based noise spectra, which I'll post next.
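For readers unfamiliar with this kind of comparison, a common way to characterize noise color is the radially averaged power spectrum of the error image (render minus reference). Blue noise shows little power at small radii (low frequencies), while white noise is roughly flat. The sketch below is only an illustration of that idea and is not the tooling used for the comparison in this thread. It uses a naive DFT instead of an FFT for brevity, so it is only practical for small tiles, and `radial_power_spectrum` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// Radially averaged power spectrum of a size x size error image (row-major).
// Returns the mean |F(u,v)|^2 / size^2 per integer radius bin. Frequencies are
// wrapped to [-size/2, size/2] so that the radius measures distance from DC.
std::vector<double> radial_power_spectrum(const std::vector<double> &img, int size)
{
  assert(static_cast<int>(img.size()) == size * size);
  const double two_pi = 2.0 * std::acos(-1.0);
  const int half = size / 2;
  const int num_bins = static_cast<int>(std::lround(std::sqrt(2.0) * half)) + 1;
  std::vector<double> power(num_bins, 0.0);
  std::vector<int> count(num_bins, 0);
  for (int v = 0; v < size; v++) {
    for (int u = 0; u < size; u++) {
      // Naive 2D DFT coefficient F(u, v).
      std::complex<double> sum(0.0, 0.0);
      for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++) {
          const double phase = -two_pi * (double(u) * x + double(v) * y) / size;
          sum += img[y * size + x] * std::polar(1.0, phase);
        }
      }
      // Wrap so that e.g. u = size - 1 is treated as frequency -1.
      const int fu = (u <= half) ? u : u - size;
      const int fv = (v <= half) ? v : v - size;
      const int bin = static_cast<int>(std::lround(std::sqrt(double(fu * fu + fv * fv))));
      power[bin] += std::norm(sum) / (double(size) * size);
      count[bin]++;
    }
  }
  for (int i = 0; i < num_bins; i++) {
    if (count[i] > 0) {
      power[i] /= count[i];
    }
  }
  return power;
}
```

As a sanity check, a +1/-1 checkerboard, the most extreme "blue" pattern, puts all of its power into the outermost bin.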

Member

> Please disregard the comparison above

Ah, got it. I was indeed surprised at the results with round! Of course, my intuitions are often wrong anyway, so I was prepared to believe it. Experiments are always better than assuming. :-)

In any case, looking forward to seeing the fixed results. Thanks for taking the time to do this!

Nikita Sirgienko added this to the 4.2 LTS milestone 2024-05-28 17:09:04 +02:00
Nikita Sirgienko added this to the Render & Cycles project 2024-05-28 17:09:11 +02:00
This pull request is marked as a work in progress.
This branch is out-of-date with the base branch

6 Participants
Reference: blender/blender#118479