Cycles: Implement blue-noise dithered sampling #118479
Reference: blender/blender#118479
This patch implements blue-noise dithered sampling as described by @nathanvegdahl here, which in turn is based on "Screen-Space Blue-Noise Diffusion of Monte Carlo Sampling Error via Hierarchical Ordering of Pixels".
The basic idea is simple: Instead of generating independent sequences for each pixel by scrambling them, we use a single sequence for the entire image, with each pixel getting one chunk of the samples. The ordering across pixels is determined by hierarchical scrambling of the pixel's position along a space-filling curve, which ends up being pretty much the same operation as already used for the underlying sequence.
This results in a more high-frequency noise distribution, which appears smoother despite not being less noisy overall.
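A toy sketch of that idea in Python (illustrative only: the helper names are made up, and a seeded `random.Random` stands in for the hash-based scrambling in the actual patch): each pixel's Morton index along the z-curve is hierarchically scrambled, and the resulting rank selects which spp-sized chunk of the one shared sequence that pixel consumes.

```python
import random

def morton_index(x: int, y: int, bits: int) -> int:
    """Interleave the bits of (x, y) into a Z-order (Morton) index."""
    d = 0
    for i in range(bits):
        d |= ((x >> i) & 1) << (2 * i)
        d |= ((y >> i) & 1) << (2 * i + 1)
    return d

def scramble_base4(index: int, digits: int, seed: int = 7) -> int:
    """Hierarchically permute a base-4 index: the permutation applied to each
    digit depends on all more-significant digits, which shuffles the ordering
    at every scale while remaining a bijection."""
    out = 0
    for level in reversed(range(digits)):
        digit = (index >> (2 * level)) & 3
        prefix = index >> (2 * (level + 1))  # more-significant digits
        rng = random.Random((seed << 24) ^ (prefix << 4) ^ level)
        out |= rng.sample(range(4), 4)[digit] << (2 * level)
    return out

def pixel_sample_range(x: int, y: int, spp: int, bits: int = 3):
    """Half-open range of global sample indices assigned to pixel (x, y)."""
    order = scramble_base4(morton_index(x, y, bits), bits)
    return order * spp, (order + 1) * spp

# Every pixel of an 8x8 image gets its own disjoint spp-sized chunk
# of the single shared sequence:
starts = sorted(pixel_sample_range(x, y, 16)[0]
                for x in range(8) for y in range(8))
assert starts == [i * 16 for i in range(64)]
```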
The main limitation at the moment is that the improvement is only clear if the full sample amount is used per pixel, so interactive preview rendering and adaptive sampling will not receive the benefit. One exception to this is that when using the new "Automatic" setting, the first sample in interactive rendering will also be blue-noise-distributed.
The sampling mode option is now exposed in the UI, with the three options being Blue Noise (the new mode), Classic (the previous Tabulated Sobol method) and the new default, Automatic (blue noise, with the additional property of ensuring the first sample is also blue-noise-distributed in interactive rendering). When debug mode is enabled, additional options appear, such as Sobol-Burley.
Note that the scrambling distance option is not compatible with the blue-noise pattern.
Here's the same test scene as in the linked article, rendered at 1spp:
Blue Noise dithered sampling replaces Sobol Burley sampling. The name of the sampler should be updated to reflect this, or it should be separated out as its own sampler or option.
Along with that, the Sobol Burley sampler is hidden behind a debug menu. Ideally this should be moved out of the debug menu if you want Blue Noise dithering to be accessible to the average user.
Sorry for "reviewing" minor features while you probably want feedback on the more important stuff.
Results look great. I hope we can make this the default, and make it work well enough that the sampler choice can remain a debug option.
Yes, this is an inherent limitation of the technique, unfortunately. The blue noise properties only manifest when you've used all samples allocated to a pixel. I've investigated making partial sample counts also have blue noise properties with this technique, but no luck so far.
My only reservation about making this the default sampler, and the reason I haven't implemented this for Cycles already myself, is precisely this behavior. 4 samples per pixel with the max set to 4 is substantially different from 4 samples per pixel with the max set to 256, for example, and this could make the sampling settings counter-intuitive for users. Combined with the fact that the primary benefit of this technique is at low sample counts, it's not clear to me that the benefits will outweigh that potential confusion.
(Having said that, I of course like the technique, and have spent a substantial chunk of time working to improve it. But I'm just trying to be practical about the concrete benefits to Cycles users as the technique currently stands.)
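The dependence on the max sample count can be made concrete with the chunk model (a hypothetical sketch of the behavior described above, not code from the patch): with a max of 4, a pixel's 4 rendered samples form a complete chunk; with a max of 256, the same 4 rendered samples are only the first 4 of a 256-sample chunk, so genuinely different points of the sequence are consumed.

```python
def pixel_samples(order: int, spp_max: int, rendered: int):
    """Global sequence indices a pixel actually consumes when `rendered`
    samples are taken from a chunk sized for `spp_max`."""
    start = order * spp_max
    return list(range(start, start + rendered))

# Same 4 rendered samples, different max: the consumed indices differ,
# so the two settings sample different points of the shared sequence.
assert pixel_samples(order=3, spp_max=4, rendered=4) == [12, 13, 14, 15]
assert pixel_samples(order=3, spp_max=256, rendered=4) == [768, 769, 770, 771]
```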
```diff
@@ -28,0 +29,4 @@
+ * Performs base-4 Owen scrambling on a reversed-bit unsigned integer.
+ *
+ * This is equivalent to the Laine-Karras permutation, but much higher
+ * quality. See https://psychopath.io/post/2022_08_14_a_fast_hash_for_base_4_owen_scrambling
```
I suspect this is just a copy/paste oversight, but just want to note that this bit:
Is not true of the base-4 hash. It is not equivalent to the Laine-Karras permutation (which is base 2), and is also not especially high quality, as I outlined in the linked post.
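For comparison, base-4 Owen scrambling can be written directly from its definition, where the permutation applied to each digit is derived from all higher-order digits. This is the slow-but-exact form that a fast hash approximates (a sketch with hypothetical helper names, not the patch's code; the `% 24` bias is negligible for illustration):

```python
import hashlib
from itertools import permutations

PERMS4 = list(permutations(range(4)))  # the 24 permutations of {0, 1, 2, 3}

def owen_scramble_base4(x: int, digits: int, seed: int = 1) -> int:
    """Definition-style base-4 Owen scramble of a `digits`-digit value."""
    out = 0
    for level in reversed(range(digits)):
        digit = (x >> (2 * level)) & 3
        prefix = x >> (2 * (level + 1))  # all more-significant digits
        h = hashlib.blake2b(f"{seed}:{level}:{prefix}".encode(), digest_size=4)
        perm = PERMS4[int.from_bytes(h.digest(), "little") % 24]
        out |= perm[digit] << (2 * level)
    return out

# Two defining properties: the scramble is a bijection, and values sharing
# a base-4 prefix still share that prefix afterwards (this is what preserves
# the stratification of the underlying sequence).
vals = [owen_scramble_base4(i, 4) for i in range(256)]
assert sorted(vals) == list(range(256))
assert all((vals[i] >> 4) == (vals[j] >> 4)
           for i in range(256) for j in range(256)
           if (i >> 4) == (j >> 4))
```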
I think it's useful even with the limitations. It seems quite reasonable for someone to set up a viewport render or quick preview render to use e.g. 4 or 16 samples and benefit from this. It may be unintuitive, but for me that's not enough of a reason to make low sample renders more noisy than they could be.
I just wanted to note down some issues that have become more apparent while testing this pull request. They also apply to main and may need a bit of work to fix, so it may be best to deal with them in a separate pull request.

As mentioned already, this sampling pattern works best when the max sample count and the samples used for rendering are the same (e.g. set to 16 samples per pixel, with all 16 used). Due to the current setup of the Cycles viewport, this behaviour causes some issues.

While navigating/updating the Cycles viewport, Cycles will use 1, 2, 3, or 4 samples per pixel depending on the resolution of the viewport. However, the sampling pattern being used is the one for normal viewport rendering, which usually means the sample count is incorrect for navigation and the results are sub-par (e.g. the viewport is set to 1024 SPP, but while navigating only 4 SPP of that sequence are used). Maybe while navigating around the viewport, Cycles should use a lower sample count sequence to try and get that blue noise benefit?

In the Cycles viewport, the sample count can change without viewport rendering restarting. For example, the user can set their sample count to 4 SPP, render those 4 samples, then increase it to 16, and Cycles will just render 12 SPP on top of the existing 4. Combined with how these sampling patterns work, this can produce low quality results. Luckily this isn't too much of an issue: as soon as viewport rendering restarts (e.g. a camera/object moves, or a material is modified), you start from sample 1 again with the right sequence. But it's still something to consider. Maybe viewport rendering should restart whenever the sample count is changed?
The `sample offset` option can end up reducing the effectiveness of this technique if used improperly. For example, if someone sets their sample count to 4 and their sample offset to a non-integer multiple of 4, they lose some of the blue-noiseness of the render.

There is talk of this becoming the default sampling pattern, with the other sampling patterns left behind a debug menu. I have some questions related to this.
Some of these are more general questions, feel free to shift the discussion elsewhere.
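The sample-offset interaction mentioned above can be seen with a little arithmetic (a hypothetical model of the chunk behaviour, not code from the patch): with a per-pixel chunk size of `spp`, an offset that is not a multiple of `spp` makes each pixel straddle two chunks of the global sequence, and the blue-noise property only holds within a complete chunk.

```python
def chunks_touched(spp: int, offset: int, count: int) -> int:
    """How many distinct spp-sized chunks of the global sequence a pixel
    consumes when rendering `count` samples starting at `offset`."""
    first_chunk = offset // spp
    last_chunk = (offset + count - 1) // spp
    return last_chunk - first_chunk + 1

# Offset aligned to the chunk size: the samples stay within one chunk.
assert chunks_touched(spp=4, offset=8, count=4) == 1
# Misaligned offset: the same 4 samples straddle two chunks.
assert chunks_touched(spp=4, offset=6, count=4) == 2
```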
As a Render TD and someone who has used dithered sampling in production (Man In The High Castle, Silicon Valley) back when Lukas first implemented it in 2015-2016, I request that this be a feature exposed to the users. Hiding it or automating it serves no benefit for people who are truly trying to squeeze performance out of Cycles and hit budget constraints.

I have production scenes that are below 32 samples, even a few below 16. It's no easy task getting to these numbers. These days I work in milliseconds, not seconds, hitting upwards of 60 FPS out of Cycles in some scenarios on a single PC (yes, final renders with a frame saved to disk). Even on CPU!

I compete against Unreal and other real-time engines taking over the market, using our beloved Cycles in stock Blender builds.

Let the TDs do their job; we are artists too.
Stefan Werner has a few patches with dithered sampling working with different sampling methods.
Okay, I've looked into a few ways to improve the behavior here, but not with much success.
As a summary of the requirements:
The methods I have kept around for testing are:
I've spent way too long staring at noise patterns in renders, so I figure I'll just push a version with all four included and trigger the buildbot so people can give feedback.
My pick at this point would be to have three options in the final enum (not hidden behind debug flags):
One thing to consider would be to only use blue noise in the Automatic option if adaptive sampling is off, but I think the pure blue noise sequence is never noticeably worse than Tabulated Sobol even if you only use a prefix.
4b20562c7f to 6f0f66e439
@blender-bot package +gpu
Package build started. Download here when ready.
@LukasStockner would you like feedback here, or on a devtalk thread (to avoid clutter in the pull request)?
I think it's fine here, no need for a devtalk thread.
This sounds good to me.
There's still the issue that during viewport navigating, the sample count changes between 1-4 SSP. So if only the first one is blue noise, and you're using 4 SSP for navigation, then you don't see much of the benefit.
If denoising is enabled, then 1 SSP is used while navigating, so it helps there. But in my testing, 1 SSP blue noise with denoising can sometimes result in the noise being visible through the denoiser, while tabulated sobol + denoising is just blobby (which I think is preferred here).
My suggestion is:
With the "First" option (in it's current form), we can't use "Scrambling Distance" on the Tabulated Sobol samples. This is probably alright, but it might be something to consider enabling?
Maybe you could re-add the original Sobol-Burley behind the debug option? Other than that it seems alright.
Yeah, that's been the struggle for me as well. But I think Brecht's earlier point about this being useful regardless is a good one.
I like that set of options, but I would suggest using "round" as you described rather than "pure" for the `Blue-Noise` and `Automatic` options. It should play nicer with Owen scrambling, IIRC.

Something else I've been wondering about that is relevant but not specific to this PR (and perhaps should be split into a separate discussion) is how we deal with distributed rendering. For example, rendering 256 samples on one machine, 256 on another, etc., and then merging them all afterwards.

Specifically, although Cycles does have the `Sample Offset` parameter for this use case, that only does the "right" thing with sequences that don't change when the sample count changes. For Sobol-Burley that holds true. But for Blue Noise Sobol-Burley it doesn't: if on one machine you render 256 samples with offset 0, and on another machine 256 samples with offset 256, intuitively you would expect those machines to be rendering the 1st and 2nd set of 256 samples from the same sequence. But in fact the first one will be rendering from a sequence that distributes sets of 256 samples to each pixel, and the latter will be rendering from a sequence that distributes sets of 512 samples to each pixel.

These two different sequences on the two different machines are not stratified with each other. In practice it should(?) still converge, but it will take more samples than it would if properly stratified. And in any case, it will certainly not be blue-noise distributed anymore.

(Also, come to think of it, I think we might(?) have a similar problem with Tabulated Sobol due to one of the optimizations we added: it generates a different-sized table of samples depending on the SPP setting. And I don't think(?) the sampling code ensures that the tables are used such that smaller tables behave as a prefix of larger ones.)

So it might be a good idea to add another parameter that pairs with `Sample Offset`: the number of samples from the larger sequence to render. That way the normal render settings specify the total sample count for the completed merged render, and `Sample Offset` + `Sample Subset Count` (needs a better name) together handle the distributed-render use case. And then Cycles has enough information to ensure that the sampling is coordinated properly between all the machines.

"Round" appears to perform worse than "pure" from what I can see.
Two examples here (BMW at 13spp and Cube Diorama at 6spp):
6f0f66e439 to 8e0810650d
Please disregard the comparison above, that was using a wrong UI enum so it's actually comparing "pure" to "cascade". I've got a better comparison using FFT-based noise spectra now, I'll post these next.
Ah, got it. I was indeed surprised at the results with round! Of course, my intuitions are often wrong anyway, so I was prepared to believe it. Experiments are always better than assuming. :-)
In any case, looking forward to seeing the fixed results. Thanks for taking the time to do this!
Alright, here's the big data dump.
I've used five test locations: the side door (`bmwdoor`) and the windshield (`bmwwindow`) of the BMW scene, the white wall with direct lighting (`dioramadirect`) and the green wall with indirect lighting (`dioramaindirect`) in the Diorama scene, and a simple test scene with a partially occluded area lamp (`flat`).

In all scenes, the methodology is the same:
The Python script to perform all the analysis on the frames is attached here: fft.py
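The attached script is the authoritative analysis; purely as an illustration of the general approach, a radially averaged power spectrum can be computed along these lines (a minimal NumPy sketch, not the attached `fft.py`):

```python
import numpy as np

def radial_power_spectrum(img: np.ndarray, nbins: int = 16) -> np.ndarray:
    """Radially average the 2D power spectrum of a grayscale image.
    Low bins correspond to low spatial frequencies; blue noise should
    show little energy there and more at high frequencies."""
    # Subtract the mean so the DC peak doesn't dominate the plot.
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(img - img.mean()))) ** 2
    h, w = img.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h / 2, xx - w / 2)
    bins = np.minimum((r / r.max() * nbins).astype(int), nbins - 1)
    return np.array([spectrum[bins == b].mean() for b in range(nbins)])

# A constant image has no energy at any nonzero frequency:
flat_img = np.full((32, 32), 0.5)
assert np.allclose(radial_power_spectrum(flat_img), 0.0)
```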
Here are the results. First of all, the PSNR (to give an indication of the overall noise level):
bmwdoor
bmwwindow
dioramadirect
dioramaindirect
flat
Note that all scenes intentionally use non-power-of-2 SPPs to demonstrate the difference.
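For reference, the PSNR numbers above follow the usual definition (a sketch; the attached script may use a different peak-value convention):

```python
import math

def psnr(ref, test, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length
    flat lists of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Uniform error of 0.1 on a [0, 1] image gives 10 * log10(1 / 0.01) = 20 dB:
assert abs(psnr([0.0] * 100, [0.1] * 100) - 20.0) < 1e-6
```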
Next, the radial plots:
bmwdoor
bmwwindow
dioramadirect
dioramaindirect
flat
Finally, I've attached the 2D spectra, but I'm not going to bother putting them all in a list. Note that the color coding is per-image, don't compare quantitatively between images.
Observations:
- […] in `dioramadirect` and doesn't really do much in `flat`. Strange.
- […] in `dioramadirect`, where it causes a lot of extra high-frequency noise. Strange.
- […] in `dioramaindirect` that causes a peak in the spectrum. Might be numerical-precision-related due to how far it's zoomed in.

8e0810650d to b058a3f2b8
New update, this time:
main
The remaining points are:
```diff
@@ -462,3 +490,3 @@
     description="Random sampling pattern used by the integrator",
     items=enum_sampling_pattern,
-    default='TABULATED_SOBOL',
+    default=6,
```
AUTOMATIC is 5?
Thanks, fixed.
With a proper base-4 Owen scramble it shouldn't matter if you use Hilbert vs a z-curve, since they become equivalent after randomization. So I suspect the root issue is actually the fast hash you're using. Which isn't surprising, since it's not optimized to be high-quality.
Instead of switching to a Hilbert curve you could use the high-quality base-4 Owen scrambling function from my first post on the topic. And you should only need to compute it once per pixel (not per sample), so the performance impact might not be too significant--although you'd have to test to be sure, of course.
Hm, interesting, thanks for that pointer! I've compared the three options (fast hash as currently implemented, tiled 64x64 precomputed Hilbert curve from lukasstockner/blender@591c65bc5e, and slow hash from lukasstockner/blender@72bb5ebde7).
I've tested them in another scene that shows the directional pattern in the fast hash very well. I've attached renders and FFT spectra for comparison (this time, the spectra have a common color map).
Indeed, either of them fixes the directionality. Based on the spectra, the slow hash is properly isotropic, while the Hilbert curve has some remaining bias (but at least it's symmetric w.r.t. the axes now). Visually, at least to me, the Hilbert version looks a bit smoother overall.
Timings (just a quick CPU test, not super reliable):
Unfortunately, computing it only once per pixel isn't very practical in Cycles, but the impact doesn't appear to be too bad either way.
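For context on the Hilbert-curve variant: a tiled precomputed table can be built with the standard iterative xy-to-index conversion. This sketch follows the well-known algorithm (as given in Wikipedia's Hilbert curve article), not the code in the linked commits:

```python
def hilbert_xy2d(n: int, x: int, y: int) -> int:
    """Map cell (x, y) of an n*n grid (n a power of two) to its index
    along the Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the lower bits are in canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

# The mapping is a bijection, and consecutive indices are adjacent cells:
# the locality property that makes the curve useful for dithered ordering.
n = 8
cells = {hilbert_xy2d(n, x, y): (x, y) for x in range(n) for y in range(n)}
assert sorted(cells) == list(range(n * n))
assert all(abs(cells[i][0] - cells[i + 1][0]) +
           abs(cells[i][1] - cells[i + 1][1]) == 1
           for i in range(n * n - 1))
```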
Changed title from WIP: Cycles: Implement blue-noise dithered sampling to Cycles: Implement blue-noise dithered sampling

@blender-bot build +gpu
I'm not sure why some of the non-SSS tests are failing, it's not obvious which changes are responsible for that. The noise differences look fine though. Should be ok to just update them.
Looks like the issue is that the previous code would force the sampling pattern to T-S if the debug option was disabled. Therefore, existing files containing a different enum value suddenly start behaving differently.
I'll add versioning code to set all existing files to T-S to match the previous behavior.
7b2c8618e1 to 86fe14f839
I see weird artifacts with Automatic or Blue-Noise (First) when navigating in the viewport with 1spp, viewed relatively close.
This is the all_light_types regression file.
Hm, I can't reproduce that on Linux using CPU, CUDA or Optix.
The only artifacts I see are boundaries in noise level on the sphere, but those are also present with T-S so I assume it's a result of the light tree structure.
Did you make any changes to the scene other than setting the pattern to Automatic?
I was using CPU on Mac, also can't reproduce with GPU, maybe something platform-related.
I downloaded the latest build from the website, loaded factory settings, loaded all_light_types.blend, changed viewport spp to 1, changed the pattern to Automatic, then the artifact is there.
I can help with investigating if you can't reproduce it, just thinking maybe you can give a hint where to look at because I have very little knowledge of this technique.
Ah, I missed that you had set the viewport SPP to 1. With that, I can reproduce it as well.
However, it seems that the artifact only appears at coarse viewport resolution - once it switches to full resolution, it's gone. That also explains why it doesn't show up on GPU - it's probably just too fast so it always renders full resolution.
I'll look into it, thanks for the report!
Found the issue: during viewport navigation, the number of samples would be set to 4 even if the configured number was lower. In this case, the configured number was 1, so `blue_noise_sequence_length` was 0, so each of the samples past 1 would use the same sequence, which of course causes obvious artifacts.

I feel like going past the configured number is never a good idea, so I've just pushed a fix to clamp navigation SPP to it.
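The clamp described above amounts to one line (an illustrative model with made-up names, not the actual Cycles code): the navigation sample count becomes the minimum of the usual navigation budget and the user-configured SPP, so the blue-noise sequence length can no longer end up as 0.

```python
def navigation_spp(configured_spp: int, navigation_budget: int = 4) -> int:
    """Never render more samples during navigation than the user configured."""
    return min(navigation_budget, configured_spp)

assert navigation_spp(1) == 1      # previously this would have been 4
assert navigation_spp(1024) == 4   # the navigation budget still caps it
```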
Thanks, I can verify the problem is fixed.
@LukasStockner
Ah, cool. Nice to see that the performance is (probably) fine. If that is indeed the case, then I strongly recommend going with the high-quality, slower hash rather than using a Hilbert curve. The former directly addresses the real issue, whereas the latter just masks it.
@LukasStockner Very similar to the problem I reported before, I believe `path_branched_rng_XD()` doesn't work well with Blue Noise (First) with the current implementation. As I understand it, the branching just tries to get samples past 1 even if the SPP is set to 1.

In my ReSTIR branch I sample a few initial lights using

```
float3 rand_light = path_branched_rng_3D(kg, rng_state, i, reservoir.num_light_samples, PRNG_LIGHT);
```

and pick one from them. If I print `rand_light`, as you can see the samples past 1 are the same for all pixels.

I wonder if other configurations of Blue Noise work well with `path_branched_rng_XD()`, as it feels like the sequence is determined by the SPP, but it is not obvious to me.

@LukasStockner `path_branched_rng_2D()` is used in AO and bevel, so for example if you open the regression file `shader/ambient_occlusion.blend`, set the SPP to 1 and set `Samples` in the shader node to 2, it would look weird.

This is probably something of a long shot, but looking at how automatic is going to work, i.e. blue noise distributed for the first sample before switching to TS for all the remaining samples, which makes sense: the bare minimum samples every pixel can have is 1, but what about when a minimum number of samples is set by the user? Because if a user sets a minimum amount of samples for their render, then every pixel will receive at minimum that many samples before adaptive sampling takes over and makes the sample count arbitrary.
Would that work? Could the minimum samples be used as the threshold for switching from blue noise distributed to the classic TS? Presuming every pixel receives that many minimum samples concurrently first, of course. And obviously if the minimum is set to 0 (or 1), automatic would default back to switching at 1SPP.
Or am I missing something (in all likelihood)?
The automatic mode uses the blue noise sequence for the first sample in the viewport. This is so that as you navigate around the viewport or adjust things in it, you get a blue noise distribution.

If the user were to set a minimum sample count of, say, 64, and Cycles used a blue noise sequence for the first 64 samples in the viewport before switching to Tabulated Sobol, then users wouldn't get the blue noise benefit while navigating or editing their scene, because the blue noise effect only appears after using all 64 samples. This makes it significantly less useful while adjusting things in the viewport.

As for a system that switches between blue noise and Tabulated Sobol based on the min sample count set by the user in the final render: I can't comment on whether that would be beneficial or detrimental.
@Alaska Yes, you're quite right, I should've noted that this idea only has any weight for final renders, not viewport navigation, where any potential benefits are obviously lost. It may also be worth noting that scrambling distance should similarly be disabled or ignored internally until the minimum sampling threshold is exceeded and blue noise swaps to TS. Assuming the idea itself is indeed feasible for final renders, naturally.
@Alaska On the other hand, a minimum sample count switch in the viewport is not necessarily a bad thing either, at least not always. Setting a minimum number of samples before the denoiser kicks in in the viewport is already common practice, since allowing the denoiser to gather some samples before engaging can lead to crisper results once it does, making the tradeoff of waiting worth it to some users. Until the minimum sample threshold is exceeded, the noisy image is no more pleasant to interact with than it would be while waiting for blue noise to finish gathering samples for its purposes.

Add to this that a great many users, myself among them, have very powerful GPUs that often blitz through dozens of samples in milliseconds, making the wait for 64 or even 128 samples before blue noise processes something of an eye-blink. The minimum sample count can always be set back down to 1 if the user deems waiting too much of an issue.
I'm guessing there are more technical hurdles lying unseen beneath the surface here than I can grasp, but the possibility that there aren't and this is, in fact, feasible keeps gnawing at me.
The issue is that combining blue noise and regular samples in a single render tends to be worse than not using blue noise at all.
@brecht Ah, I was afraid there would be something insurmountable like that. That is a pity. Oh well.