Cycles: Implement blue-noise dithered sampling #118479
Reference: blender/blender#118479
This patch implements blue-noise dithered sampling as described by @nathanvegdahl here, which in turn is based on "Screen-Space Blue-Noise Diffusion of Monte Carlo Sampling Error via Hierarchical Ordering of Pixels".
The basic idea is simple: Instead of generating independent sequences for each pixel by scrambling them, we use a single sequence for the entire image, with each pixel getting one chunk of the samples. The ordering across pixels is determined by hierarchical scrambling of the pixel's position along a space-filling curve, which ends up being pretty much the same operation as already used for the underlying sequence.
This results in a more high-frequency noise distribution, which appears smoother despite not being less noisy overall.
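A toy sketch of that idea in Python (illustrative only: the helper names are made up, and a seeded `random.Random` stands in for the hash-based scrambling in the actual patch): each pixel's Morton index along the z-curve is hierarchically scrambled, and the resulting rank selects which spp-sized chunk of the one shared sequence that pixel consumes.

```python
import random

def morton_index(x: int, y: int, bits: int) -> int:
    """Interleave the bits of (x, y) into a Z-order (Morton) index."""
    d = 0
    for i in range(bits):
        d |= ((x >> i) & 1) << (2 * i)
        d |= ((y >> i) & 1) << (2 * i + 1)
    return d

def scramble_base4(index: int, digits: int, seed: int = 7) -> int:
    """Hierarchically permute a base-4 index: the permutation applied to each
    digit depends on all more-significant digits, which shuffles the ordering
    at every scale while remaining a bijection."""
    out = 0
    for level in reversed(range(digits)):
        digit = (index >> (2 * level)) & 3
        prefix = index >> (2 * (level + 1))  # more-significant digits
        rng = random.Random((seed << 24) ^ (prefix << 4) ^ level)
        out |= rng.sample(range(4), 4)[digit] << (2 * level)
    return out

def pixel_sample_range(x: int, y: int, spp: int, bits: int = 3):
    """Half-open range of global sample indices assigned to pixel (x, y)."""
    order = scramble_base4(morton_index(x, y, bits), bits)
    return order * spp, (order + 1) * spp

# Every pixel of an 8x8 image gets its own disjoint spp-sized chunk
# of the single shared sequence:
starts = sorted(pixel_sample_range(x, y, 16)[0]
                for x in range(8) for y in range(8))
assert starts == [i * 16 for i in range(64)]
```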
The main limitation at the moment is that the improvement is only clear if the full sample amount is used per pixel, so interactive preview rendering and adaptive sampling will not receive the benefit. One exception to this is that when using the new "Automatic" setting, the first sample in interactive rendering will also be blue-noise-distributed.
The sampling mode option is now exposed in the UI, with the three options being Blue Noise (the new mode), Classic (the previous Tabulated Sobol method) and the new default, Automatic (blue noise, with the additional property of ensuring the first sample is also blue-noise-distributed in interactive rendering). When debug mode is enabled, additional options appear, such as Sobol-Burley.
Note that the scrambling distance option is not compatible with the blue-noise pattern.
Here's the same test scene as in the linked article, rendered at 1spp:
Blue Noise dithered sampling replaces Sobol Burley sampling. The name of the sampler should be updated to reflect this, or it should be separated out as its own sampler or option.
Along with that, the Sobol Burley sampler is hidden behind a debug menu. Ideally this should be moved out of the debug menu if you want Blue Noise dithering to be accessible to the average user.
Sorry for "reviewing" minor features while you probably want feedback on the more important stuff.
Results look great. I hope we can make this the default, and make it work well enough that the sampler choice can remain a debug option.
Yes, this is an inherent limitation of the technique, unfortunately. The blue noise properties only manifest when you've used all samples allocated to a pixel. I've investigated making partial sample counts also have blue noise properties with this technique, but no luck so far.
My only reservation about making this the default sampler, and the reason I haven't implemented this for Cycles already myself, is precisely this behavior. 4 samples per pixel with the max set to 4 is substantially different from 4 samples per pixel with the max set to 256, for example, and this could make the sampling settings counter-intuitive for users. Combined with the fact that the primary benefit of this technique is at low sample counts, it's not clear to me that the benefits will outweigh that potential confusion.
(Having said that, I of course like the technique, and have spent a substantial chunk of time working to improve it. But I'm just trying to be practical about the concrete benefits to Cycles users as the technique currently stands.)
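The dependence on the max sample count can be made concrete with the chunk model (a hypothetical sketch of the behavior described above, not code from the patch): with a max of 4, a pixel's 4 rendered samples form a complete chunk; with a max of 256, the same 4 rendered samples are only the first 4 of a 256-sample chunk, so genuinely different points of the sequence are consumed.

```python
def pixel_samples(order: int, spp_max: int, rendered: int):
    """Global sequence indices a pixel actually consumes when `rendered`
    samples are taken from a chunk sized for `spp_max`."""
    start = order * spp_max
    return list(range(start, start + rendered))

# Same 4 rendered samples, different max: the consumed indices differ,
# so the two settings sample different points of the shared sequence.
assert pixel_samples(order=3, spp_max=4, rendered=4) == [12, 13, 14, 15]
assert pixel_samples(order=3, spp_max=256, rendered=4) == [768, 769, 770, 771]
```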
```diff
@@ -28,0 +29,4 @@
+ * Performs base-4 Owen scrambling on a reversed-bit unsigned integer.
+ *
+ * This is equivalent to the Laine-Karras permutation, but much higher
+ * quality. See https://psychopath.io/post/2022_08_14_a_fast_hash_for_base_4_owen_scrambling
```
I suspect this is just a copy/paste oversight, but just want to note that this bit:
Is not true of the base-4 hash. It is not equivalent to the Laine-Karras permutation (which is base 2), and is also not especially high quality, as I outlined in the linked post.
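For comparison, base-4 Owen scrambling can be written directly from its definition, where the permutation applied to each digit is derived from all higher-order digits. This is the slow-but-exact form that a fast hash approximates (a sketch with hypothetical helper names, not the patch's code; the `% 24` bias is negligible for illustration):

```python
import hashlib
from itertools import permutations

PERMS4 = list(permutations(range(4)))  # the 24 permutations of {0, 1, 2, 3}

def owen_scramble_base4(x: int, digits: int, seed: int = 1) -> int:
    """Definition-style base-4 Owen scramble of a `digits`-digit value."""
    out = 0
    for level in reversed(range(digits)):
        digit = (x >> (2 * level)) & 3
        prefix = x >> (2 * (level + 1))  # all more-significant digits
        h = hashlib.blake2b(f"{seed}:{level}:{prefix}".encode(), digest_size=4)
        perm = PERMS4[int.from_bytes(h.digest(), "little") % 24]
        out |= perm[digit] << (2 * level)
    return out

# Two defining properties: the scramble is a bijection, and values sharing
# a base-4 prefix still share that prefix afterwards (this is what preserves
# the stratification of the underlying sequence).
vals = [owen_scramble_base4(i, 4) for i in range(256)]
assert sorted(vals) == list(range(256))
assert all((vals[i] >> 4) == (vals[j] >> 4)
           for i in range(256) for j in range(256)
           if (i >> 4) == (j >> 4))
```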
I think it's useful even with the limitations. It seems quite reasonable for someone to set up a viewport render or quick preview render to use e.g. 4 or 16 samples and benefit from this. It may be unintuitive, but for me that's not enough of a reason to make low sample renders more noisy than they could be.
I just wanted to note down some issues that have become more apparent while testing this pull request. They also apply to main and may need a bit of work to fix, so it may be best to deal with them in a separate pull request.

As mentioned already, this sampling pattern works best when the max sample count and the samples used for rendering are the same (e.g. set to 16 samples per pixel, with all 16 used). Due to the current setup of the Cycles viewport, this behaviour causes some issues.

While navigating/updating the Cycles viewport, Cycles will use 1, 2, 3, or 4 samples per pixel depending on the resolution of the viewport. However, the sampling pattern being used is the one for normal viewport rendering, which usually means the sample count is incorrect for navigation and the results are sub-par (e.g. the viewport is set to 1024 SPP, but while navigating only 4 SPP of that sequence are used). Maybe while navigating around the viewport, Cycles should use a lower sample count sequence to try and get that blue noise benefit?

In the Cycles viewport, the sample count can change without viewport rendering restarting. For example, the user can set their sample count to 4 SPP, render those 4 samples, then increase it to 16, and Cycles will just render 12 SPP on top of the existing 4. Combined with how these sampling patterns work, this can produce low quality results. Luckily this isn't too much of an issue: as soon as viewport rendering restarts (e.g. a camera/object moves, or a material is modified), you start from sample 1 again with the right sequence. But it's still something to consider. Maybe viewport rendering should restart whenever the sample count is changed?
The `sample offset` option can end up reducing the effectiveness of this technique if used improperly. For example, if someone sets their sample count to 4 and their sample offset to a non-integer multiple of 4, they lose some of the blue-noiseness of the render.

There is talk of this becoming the default sampling pattern, with the other sampling patterns left behind a debug menu. I have some questions related to this.
Some of these are more general questions, feel free to shift the discussion elsewhere.
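The sample-offset interaction mentioned above can be seen with a little arithmetic (a hypothetical model of the chunk behaviour, not code from the patch): with a per-pixel chunk size of `spp`, an offset that is not a multiple of `spp` makes each pixel straddle two chunks of the global sequence, and the blue-noise property only holds within a complete chunk.

```python
def chunks_touched(spp: int, offset: int, count: int) -> int:
    """How many distinct spp-sized chunks of the global sequence a pixel
    consumes when rendering `count` samples starting at `offset`."""
    first_chunk = offset // spp
    last_chunk = (offset + count - 1) // spp
    return last_chunk - first_chunk + 1

# Offset aligned to the chunk size: the samples stay within one chunk.
assert chunks_touched(spp=4, offset=8, count=4) == 1
# Misaligned offset: the same 4 samples straddle two chunks.
assert chunks_touched(spp=4, offset=6, count=4) == 2
```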
As a Render TD and someone who has used dithered sampling in production (Man In The High Castle, Silicon Valley) back when Lukas first implemented it in 2015-2016, I request that this be a feature exposed to the users. Hiding it or automating it serves no benefit for people who are truly trying to squeeze performance out of Cycles and hit budget constraints.

I have production scenes that are below 32 samples, even a few below 16. It's no easy task getting to these numbers. These days I work in milliseconds, not seconds, hitting upwards of 60 FPS out of Cycles in some scenarios on a single PC (yes, final renders with a frame saved to disk). Even on CPU!

I compete against Unreal and other real-time engines taking over the market, using our beloved Cycles in stock Blender builds.

Let the TDs do their job; we are artists too.
Stefan Werner has a few patches with dithered sampling working with different sampling methods.
Okay, I've looked into a few ways to improve the behavior here, but not with much success.
As a summary of the requirements:
The methods I have kept around for testing are:
I've spent way too long staring at noise patterns in renders, so I figure I'll just push a version with all four included and trigger the buildbot so people can give feedback.
My pick at this point would be to have three options in the final enum (not hidden behind debug flags):
One thing to consider would be to only use blue noise in the Automatic option if adaptive sampling is off, but I think the pure blue noise sequence is never noticeably worse than Tabulated Sobol even if you only use a prefix.
4b20562c7f to 6f0f66e439
@blender-bot package +gpu
Package build started. Download here when ready.
@LukasStockner would you like feedback here, or on a devtalk thread (to avoid clutter in the pull request)?
I think it's fine here, no need for a devtalk thread.
This sounds good to me.
There's still the issue that during viewport navigating, the sample count changes between 1-4 SSP. So if only the first one is blue noise, and you're using 4 SSP for navigation, then you don't see much of the benefit.
If denoising is enabled, then 1 SSP is used while navigating, so it helps there. But in my testing, 1 SSP blue noise with denoising can sometimes result in the noise being visible through the denoiser, while tabulated sobol + denoising is just blobby (which I think is preferred here).
My suggestion is:
With the "First" option (in it's current form), we can't use "Scrambling Distance" on the Tabulated Sobol samples. This is probably alright, but it might be something to consider enabling?
Maybe you could re-add the original Sobol-Burley behind the debug option? Other than that it seems alright.
Yeah, that's been the struggle for me as well. But I think Brecht's earlier point about this being useful regardless is a good one.
I like that set of options, but I would suggest using "round" as you described rather than "pure" for the `Blue-Noise` and `Automatic` options. It should play nicer with Owen scrambling, IIRC.

Something else I've been wondering about that is relevant but not specific to this PR (and perhaps should be split into a separate discussion) is how we deal with distributed rendering. For example, rendering 256 samples on one machine, 256 on another, etc., and then merging them all afterwards.

Specifically, although Cycles does have the `Sample Offset` parameter for this use case, that only does the "right" thing with sequences that don't change when the sample count changes. For Sobol-Burley that holds true. But for Blue Noise Sobol-Burley it doesn't: if on one machine you render 256 samples with offset 0, and on another machine 256 samples with offset 256, intuitively you would expect those machines to be rendering the 1st and 2nd set of 256 samples from the same sequence. But in fact the first one will be rendering from a sequence that distributes sets of 256 samples to each pixel, and the latter will be rendering from a sequence that distributes sets of 512 samples to each pixel.

These two different sequences on the two different machines are not stratified with each other. In practice it should(?) still converge, but it will take more samples than it would if properly stratified. And in any case, it will certainly not be blue-noise distributed anymore.

(Also, come to think of it, I think we might(?) have a similar problem with Tabulated Sobol due to one of the optimizations we added: it generates a different-sized table of samples depending on the SPP setting. And I don't think(?) the sampling code ensures that the tables are used such that smaller tables behave as a prefix of larger ones.)

So it might be a good idea to add another parameter that pairs with `Sample Offset`: the number of samples from the larger sequence to render. That way the normal render settings specify the total sample count for the completed merged render, and `Sample Offset` + `Sample Subset Count` (needs a better name) together handle the distributed-render use case. And then Cycles has enough information to ensure that the sampling is coordinated properly between all the machines.

"Round" appears to perform worse than "pure" from what I can see.
Two examples here (BMW at 13spp and Cube Diorama at 6spp):
6f0f66e439 to 8e0810650d
Please disregard the comparison above, that was using a wrong UI enum so it's actually comparing "pure" to "cascade". I've got a better comparison using FFT-based noise spectra now, I'll post these next.
Ah, got it. I was indeed surprised at the results with round! Of course, my intuitions are often wrong anyway, so I was prepared to believe it. Experiments are always better than assuming. :-)
In any case, looking forward to seeing the fixed results. Thanks for taking the time to do this!
Alright, here's the big data dump.
I've used five test locations: the side door (`bmwdoor`) and the windshield (`bmwwindow`) of the BMW scene, the white wall with direct lighting (`dioramadirect`) and the green wall with indirect lighting (`dioramaindirect`) in the Diorama scene, and a simple test scene with a partially occluded area lamp (`flat`).

In all scenes, the methodology is the same:
The Python script to perform all the analysis on the frames is attached here: fft.py
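The attached script is the authoritative analysis; purely as an illustration of the general approach, a radially averaged power spectrum can be computed along these lines (a minimal NumPy sketch, not the attached `fft.py`):

```python
import numpy as np

def radial_power_spectrum(img: np.ndarray, nbins: int = 16) -> np.ndarray:
    """Radially average the 2D power spectrum of a grayscale image.
    Low bins correspond to low spatial frequencies; blue noise should
    show little energy there and more at high frequencies."""
    # Subtract the mean so the DC peak doesn't dominate the plot.
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(img - img.mean()))) ** 2
    h, w = img.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h / 2, xx - w / 2)
    bins = np.minimum((r / r.max() * nbins).astype(int), nbins - 1)
    return np.array([spectrum[bins == b].mean() for b in range(nbins)])

# A constant image has no energy at any nonzero frequency:
flat_img = np.full((32, 32), 0.5)
assert np.allclose(radial_power_spectrum(flat_img), 0.0)
```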
Here are the results. First of all, the PSNR (to give an indication of the overall noise level):
bmwdoor
bmwwindow
dioramadirect
dioramaindirect
flat
Note that all scenes intentionally use non-power-of-2 SPPs to demonstrate the difference.
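For reference, the PSNR numbers above follow the usual definition (a sketch; the attached script may use a different peak-value convention):

```python
import math

def psnr(ref, test, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length
    flat lists of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Uniform error of 0.1 on a [0, 1] image gives 10 * log10(1 / 0.01) = 20 dB:
assert abs(psnr([0.0] * 100, [0.1] * 100) - 20.0) < 1e-6
```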
Next, the radial plots:
bmwdoor
bmwwindow
dioramadirect
dioramaindirect
flat
Finally, I've attached the 2D spectra, but I'm not going to bother putting them all in a list. Note that the color coding is per-image, don't compare quantitatively between images.
Observations:
- […] in `dioramadirect` and doesn't really do much in `flat`. Strange.
- […] in `dioramadirect`, where it causes a lot of extra high-frequency noise. Strange.
- […] in `dioramaindirect` that causes a peak in the spectrum. Might be numerical-precision-related due to how far it's zoomed in.

8e0810650d to b058a3f2b8
New update, this time:
main
The remaining points are:
```diff
@@ -462,3 +490,3 @@
     description="Random sampling pattern used by the integrator",
     items=enum_sampling_pattern,
-    default='TABULATED_SOBOL',
+    default=6,
```
AUTOMATIC is 5?
Thanks, fixed.
With a proper base-4 Owen scramble it shouldn't matter if you use Hilbert vs a z-curve, since they become equivalent after randomization. So I suspect the root issue is actually the fast hash you're using. Which isn't surprising, since it's not optimized to be high-quality.
Instead of switching to a Hilbert curve you could use the high-quality base-4 Owen scrambling function from my first post on the topic. And you should only need to compute it once per pixel (not per sample), so the performance impact might not be too significant--although you'd have to test to be sure, of course.
Hm, interesting, thanks for that pointer! I've compared the three options (fast hash as currently implemented, tiled 64x64 precomputed Hilbert curve from lukasstockner/blender@591c65bc5e, and slow hash from lukasstockner/blender@72bb5ebde7).
I've tested them in another scene that shows the directional pattern in the fast hash very well. I've attached renders and FFT spectra for comparison (this time, the spectra have a common color map).
Indeed, either of them fixes the directionality. Based on the spectra, the slow hash is properly isotropic, while the Hilbert curve has some remaining bias (but at least it's symmetric w.r.t. the axes now). Visually, at least to me, the Hilbert version looks a bit smoother overall.
Timings (just a quick CPU test, not super reliable):
Unfortunately, computing it only once per pixel isn't very practical in Cycles, but the impact doesn't appear to be too bad either way.
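For context on the Hilbert-curve variant: a tiled precomputed table can be built with the standard iterative xy-to-index conversion. This sketch follows the well-known algorithm (as given in Wikipedia's Hilbert curve article), not the code in the linked commits:

```python
def hilbert_xy2d(n: int, x: int, y: int) -> int:
    """Map cell (x, y) of an n*n grid (n a power of two) to its index
    along the Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the lower bits are in canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

# The mapping is a bijection, and consecutive indices are adjacent cells:
# the locality property that makes the curve useful for dithered ordering.
n = 8
cells = {hilbert_xy2d(n, x, y): (x, y) for x in range(n) for y in range(n)}
assert sorted(cells) == list(range(n * n))
assert all(abs(cells[i][0] - cells[i + 1][0]) +
           abs(cells[i][1] - cells[i + 1][1]) == 1
           for i in range(n * n - 1))
```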
Changed title from WIP: Cycles: Implement blue-noise dithered sampling to Cycles: Implement blue-noise dithered sampling

@blender-bot build +gpu
I'm not sure why some of the non-SSS tests are failing, it's not obvious which changes are responsible for that. The noise differences look fine though. Should be ok to just update them.
Looks like the issue is that the previous code would force the sampling pattern to T-S if the debug option was disabled. Therefore, existing files containing a different enum value suddenly start behaving differently.
I'll add versioning code to set all existing files to T-S to match the previous behavior.
7b2c8618e1 to 86fe14f839
I see weird artifacts with Automatic or Blue-Noise (First) when navigating in the viewport with 1spp, viewed relatively close.
This is the all_light_types regression file.
Hm, I can't reproduce that on Linux using CPU, CUDA or Optix.
The only artifacts I see are boundaries in noise level on the sphere, but those are also present with T-S so I assume it's a result of the light tree structure.
Did you make any changes to the scene other than setting the pattern to Automatic?
I was using CPU on Mac, also can't reproduce with GPU, maybe something platform-related.
I downloaded the latest build from the website, loaded factory settings, loaded all_light_types.blend, changed viewport spp to 1, changed the pattern to Automatic, then the artifact is there.
I can help with investigating if you can't reproduce it, just thinking maybe you can give a hint where to look at because I have very little knowledge of this technique.
Ah, I missed that you had set the viewport SPP to 1. With that, I can reproduce it as well.
However, it seems that the artifact only appears at coarse viewport resolution - once it switches to full resolution, it's gone. That also explains why it doesn't show up on GPU - it's probably just too fast so it always renders full resolution.
I'll look into it, thanks for the report!
Found the issue: during viewport navigation, the number of samples would be set to 4 even if the configured number was lower. In this case, the configured number was 1, so `blue_noise_sequence_length` was 0, so each of the samples past 1 would use the same sequence, which of course causes obvious artifacts.

I feel like going past the configured number is never a good idea, so I've just pushed a fix to clamp navigation SPP to it.
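The clamp described above amounts to one line (an illustrative model with made-up names, not the actual Cycles code): the navigation sample count becomes the minimum of the usual navigation budget and the user-configured SPP, so the blue-noise sequence length can no longer end up as 0.

```python
def navigation_spp(configured_spp: int, navigation_budget: int = 4) -> int:
    """Never render more samples during navigation than the user configured."""
    return min(navigation_budget, configured_spp)

assert navigation_spp(1) == 1      # previously this would have been 4
assert navigation_spp(1024) == 4   # the navigation budget still caps it
```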
Thanks, I can verify the problem is fixed.
@LukasStockner
Ah, cool. Nice to see that the performance is (probably) fine. If that is indeed the case, then I strongly recommend going with the high-quality, slower hash rather than using a Hilbert curve. The former directly addresses the real issue, whereas the latter just masks it.
@LukasStockner Very similar to the problem I reported before, I believe `path_branched_rng_XD()` doesn't work well with Blue Noise (First) with the current implementation. As I understand it, the branching just tries to get samples past 1 even if the SPP is set to 1.

In my ReSTIR branch I sample a few initial lights using

```
float3 rand_light = path_branched_rng_3D(kg, rng_state, i, reservoir.num_light_samples, PRNG_LIGHT);
```

and pick one from them. If I print `rand_light`, as you can see the samples past 1 are the same for all pixels.

I wonder if other configurations of Blue Noise work well with `path_branched_rng_XD()`, as it feels like the sequence is determined by the SPP, but it is not obvious to me.

@LukasStockner `path_branched_rng_2D()` is used in AO and bevel, so for example if you open the regression file `shader/ambient_occlusion.blend`, set the SPP to 1 and set `Samples` in the shader node to 2, it would look weird.

This is probably something of a long shot, but looking at how automatic is going to work, i.e. blue noise distributed for the first sample before switching to TS for all the remaining samples, which makes sense: the bare minimum samples every pixel can have is 1, but what about when a minimum number of samples is set by the user? Because if a user sets a minimum amount of samples for their render, then every pixel will receive at minimum that many samples before adaptive sampling takes over and makes the sample count arbitrary.
Would that work? Could the minimum samples be used as the threshold for switching from blue noise distributed to the classic TS? Presuming every pixel receives that many minimum samples concurrently first, of course. And obviously if the minimum is set to 0 (or 1), automatic would default back to switching at 1SPP.
Or am I missing something (in all likelihood)?
The automatic mode uses the blue noise sequence for the first sample in the viewport. This is so that as you navigate around the viewport or adjust things in it, you get a blue noise distribution.

If the user were to set a minimum sample count of, say, 64, and Cycles used a blue noise sequence for the first 64 samples in the viewport before switching to Tabulated Sobol, then users wouldn't get the blue noise benefit while navigating or editing their scene, because the blue noise effect only appears after using all 64 samples. This makes it significantly less useful while adjusting things in the viewport.

As for a system that switches between blue noise and Tabulated Sobol based on the min sample count set by the user in the final render: I can't comment on whether that would be beneficial or detrimental.
@Alaska Yes, you're quite right, I should've noted that this idea only has any weight for final renders, not viewport navigation, where any potential benefits are obviously lost. It may also be worth noting that scrambling distance should similarly be disabled or ignored internally until the minimum sampling threshold is exceeded and blue noise swaps to TS. Assuming the idea itself is indeed feasible for final renders, naturally.
@Alaska On the other hand, a minimum sample count switch in the viewport is not necessarily a bad thing either, at least not always. Setting a minimum number of samples before the denoiser kicks in in the viewport is already common practice, since allowing the denoiser to gather some samples before engaging can lead to crisper results once it does, making the tradeoff of waiting worth it to some users. Until the minimum sample threshold is exceeded, the noisy image is no more pleasant to interact with than it would be while waiting for blue noise to finish gathering samples for its purposes.

Add to this that a great many users, myself among them, have very powerful GPUs that often blitz through dozens of samples in milliseconds, making the wait for 64 or even 128 samples before blue noise processes something of an eye-blink. The minimum sample count can always be set back down to 1 if the user deems waiting too much of an issue.
I'm guessing there are more technical hurdles lying unseen beneath the surface here than I can grasp, but the possibility that there aren't and this is, in fact, feasible keeps gnawing at me.
The issue is that combining blue noise and regular samples in a single render tends to be worse than not using blue noise at all.
@brecht Ah, I was afraid there would be something insurmountable like that. That is a pity. Oh well.