This patch adds two new kernels: SORT_BUCKET_PASS and SORT_WRITE_PASS. These replace PREFIX_SUM and SORTED_PATHS_ARRAY on supported devices (currently implemented on Metal, but will be trivial to enable on the other backends). The new kernels exploit sort partitioning (see D15331) by sorting each partition separately using local atomics. This can give an overall render speedup of 2-3% depending on architecture. As before, we fall back to the original non-partitioned sorting when the shader count is "too high".
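As a rough illustration of the per-partition counting sort, here is a minimal CPU sketch only; the real SORT_BUCKET_PASS/SORT_WRITE_PASS kernels run on the GPU with threadgroup-local atomics, and all names below are placeholders rather than the actual kernel code.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

/* SORT_BUCKET_PASS (sketch): count the paths of one partition per shader bucket,
 * then turn the counts into per-bucket start offsets with an exclusive prefix sum.
 * SORT_WRITE_PASS (sketch): scatter every path index into its bucket's next slot. */
static void sort_partition(const std::vector<uint32_t> &partition_paths,
                           const std::vector<uint32_t> &shader_of_path,
                           const uint32_t num_shaders,
                           std::vector<uint32_t> &sorted_paths)
{
  std::vector<std::atomic<uint32_t>> bucket_count(num_shaders);
  for (auto &c : bucket_count) {
    c.store(0);
  }
  for (const uint32_t path : partition_paths) {
    bucket_count[shader_of_path[path]].fetch_add(1); /* local atomic in the real kernel */
  }

  /* Exclusive prefix sum over the buckets of this partition. */
  std::vector<std::atomic<uint32_t>> bucket_offset(num_shaders);
  uint32_t sum = 0;
  for (uint32_t i = 0; i < num_shaders; i++) {
    bucket_offset[i].store(sum);
    sum += bucket_count[i].load();
  }

  /* Write pass: each path claims the next slot of its shader bucket. */
  sorted_paths.resize(partition_paths.size());
  for (const uint32_t path : partition_paths) {
    sorted_paths[bucket_offset[shader_of_path[path]].fetch_add(1)] = path;
  }
}
```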
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D16909
This is both a cleanup and a preparation for the Principled v2 changes.
Notable changes:
- Clearcoat weight is now folded into the closure weight; there's no reason to
track this separately.
- There's a general-purpose helper for computing a Closure's albedo, which is
currently used by the denoising albedo and diffuse/gloss/transmission color
passes (see the sketch after this list).
- The d/g/t color passes didn't account for closure albedo before; this means
that e.g. metallic shaders with Principled v2 now have their color texture
included in the glossy color pass. Also fixes T104041 (sheen albedo).
- Instead of precomputing and storing the albedo during shader setup, compute
it when needed. This is technically redundant since we still need to compute
it on shader setup to adjust the sample weight, but the operation is cheap
enough that freeing up the storage seems worth it.
- Future changes (Principled v2) are easier to integrate since the Fresnel
handling isn't all over the place anymore.
- Fresnel handling in the Multiscattering GGX code is still ugly, but since
removing that entirely is the next step, putting effort into cleaning it up
doesn't seem worth it.
- Apart from the d/g/t color passes, no changes to render results are expected.
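As a loose sketch of what such a shared albedo helper can look like (illustrative stand-in types only, not the actual Cycles data structures or function names):

```cpp
#include <vector>

struct float3 {
  float x, y, z;
};
static inline float3 operator+(const float3 &a, const float3 &b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static inline float3 operator*(const float3 &a, const float3 &b) { return {a.x * b.x, a.y * b.y, a.z * b.z}; }

enum ClosureType { CLOSURE_DIFFUSE, CLOSURE_GLOSSY, CLOSURE_TRANSMISSION };

struct Closure {
  ClosureType type;
  float3 weight;  /* closure weight, with e.g. clearcoat weight already folded in */
  float3 albedo;  /* stand-in for the albedo computed on demand for this closure */
};

/* One helper can serve both the denoising albedo (sum over all closures) and the
 * d/g/t color passes (sum over closures of the matching type). */
static float3 closures_albedo(const std::vector<Closure> &closures, const ClosureType type)
{
  float3 sum = {0.0f, 0.0f, 0.0f};
  for (const Closure &sc : closures) {
    if (sc.type == type) {
      sum = sum + sc.weight * sc.albedo;
    }
  }
  return sum;
}
```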
Differential Revision: https://developer.blender.org/D17101
The background evaluation samples the sky discretely, so if the sun is
too small, it can be missed in the evaluation. To solve this, the sun is
ignored during the background evaluation and its contribution is
computed separately.
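A minimal sketch of handling the sun on its own (illustrative only; the function below is not the Cycles implementation): sample a direction uniformly inside the sun's cone and weight by the inverse of the cone's solid angle, so even a very small sun is never missed.

```cpp
#include <cmath>

struct Vec3 {
  float x, y, z;
};

/* Uniformly sample a direction within a cone of the given half-angle around +Z;
 * the caller rotates the result to the actual sun direction. The pdf is the
 * reciprocal of the cone's solid angle. */
static Vec3 sample_sun_cone(const float half_angle, const float u1, const float u2, float *pdf)
{
  const float kPi = 3.14159265358979323846f;
  const float cos_max = std::cos(half_angle);
  const float cos_theta = 1.0f - u1 * (1.0f - cos_max); /* uniform in cos(theta) */
  const float sin_theta = std::sqrt(std::fmax(0.0f, 1.0f - cos_theta * cos_theta));
  const float phi = 2.0f * kPi * u2;

  *pdf = 1.0f / (2.0f * kPi * (1.0f - cos_max)); /* 1 / solid angle of the cone */
  return {sin_theta * std::cos(phi), sin_theta * std::sin(phi), cos_theta};
}
```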
wi is the viewing direction, and wo is the illumination direction. Under this notation, BSDF sampling always samples from wi and outputs wo, which is consistent with most of the papers and Mitsuba. This order is reversed compared with PBRT, although PBRT also traces from the camera.
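Purely as an illustration of the convention (placeholder signature, not the exact Cycles API):

```cpp
struct float3 {
  float x, y, z;
};

/* wi (the viewing direction) is the known input; the sampler returns wo
 * (the illumination direction) together with its pdf. This matches most
 * papers and Mitsuba, and is the reverse of PBRT's argument order. */
int bsdf_sample(const float3 &wi, float rand_u, float rand_v, float3 *wo, float *pdf);
```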
When rendering in the viewport (or probably on instanced objects, but I didn't
test that), emissive objects whose scale is negative give the wrong value on the
"backfacing" input when multiple importance sampling is enabled.
The underlying problem was a corner case in how normal transformation is handled,
which is generally a bit messy.
From what I can tell, the pattern appears to be:
- If you first transform vertices to world space and then compute the normal from
them (as triangle light sampling, MNEE and light tree do), you need to flip
whenever the transform has negative scale regardless of whether the transform
has been applied
- If you compute the normal in object space and then transform it to world space
(as the regular shader_setup_from_ray path does), you only need to flip if the
transform was already applied and was negative
- If you get the normal from a local intersection result (as bevel and SSS do),
you only need to flip if the transform was already applied and was negative
- If you get the normal from vertex normals, you don't need to do anything since
the host-side code does the flip for you (arguably it'd be more consistent to
do this in the kernel as well, but meh, not worth the potential slowdown)
So, this patch fixes the logic in the triangle emission code.
Also, it turns out that the MNEE code had the same problem and was also having
problems in the viewport on negative-scale objects; this is also fixed now.
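The rules above can be summarized with a couple of illustrative helpers (a sketch, not the actual Cycles functions):

```cpp
/* `transform_applied` means the object-to-world transform has already been baked
 * into the geometry; `negative_scale` means the transform flips handedness. */

/* Normal rebuilt from world-space vertices (triangle light sampling, MNEE, light tree):
 * flip whenever the scale is negative, whether or not the transform was applied. */
static bool flip_normal_from_world_vertices(const bool negative_scale)
{
  return negative_scale;
}

/* Normal computed in object space and transformed afterwards (shader_setup_from_ray),
 * or taken from a local intersection result (bevel, SSS): flip only if the transform
 * was already applied and has negative scale. Vertex normals need nothing at all,
 * since the host-side code flips them already. */
static bool flip_normal_from_object_space(const bool transform_applied, const bool negative_scale)
{
  return transform_applied && negative_scale;
}
```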
Differential Revision: https://developer.blender.org/D16952
At the first bounce, the diffuse/glossy/transmission weights are stored so that
contributions along the path can be split into the d/g/t indirect passes.
However, volume bounces always set the weight even at indirect bounces, so
even paths that had their first bounce on a purely glossy object would suddenly
start counting towards the diffuse indirect pass after a secondary volume bounce.
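A minimal sketch of the intended behaviour (illustrative names, not the actual integrator state):

```cpp
/* Record the d/g/t split once, at the first bounce; later bounces (including
 * volume scatter events) must not overwrite it. */
struct PassWeights {
  bool written = false;
  float diffuse = 0.0f;
  float glossy = 0.0f;
  float transmission = 0.0f;
};

static void record_pass_weights(PassWeights &w, const float diffuse, const float glossy, const float transmission)
{
  if (w.written) {
    return; /* indirect bounces, including volume scatter, keep the original split */
  }
  w.diffuse = diffuse;
  w.glossy = glossy;
  w.transmission = transmission;
  w.written = true;
}
```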
Partially addresses T72011.
The problem here is that the previous barycentric clamping did not deal well
with skinny triangles and would end up generating "sub-pixel jittering"
locations that were actually >20 pixels away.
Differential Revision: https://developer.blender.org/D16727
The first two dimensions of scrambled, shuffled Sobol and shuffled PMJ02 are
equivalent, so this makes no real difference for the first two dimensions.
But Sobol allows us to naturally extend to more dimensions.
Pretabulated Sobol is now always used, and the sampling pattern setting is now
only available as a debug option.
This in turn allows the following two things (also implemented):
* Use proper 3D samples for combined lens + motion blur sampling. This
notably reduces the noise on objects that are simultaneously out-of-focus
and motion blurred.
* Use proper 3D samples for combined light selection + light sampling.
Cycles was already doing something clever here with 2D samples, but using
3D samples is more straightforward and avoids overloading one of the
dimensions.
In the future this will also allow for proper sampling of e.g. volumetric
light sources and other things that may need three or four dimensions.
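As a rough sketch of the first point above (placeholder names; the real code depends on the kernel's sampling API): one well-stratified 3D sample is split between the lens and the shutter time instead of pairing an unrelated 2D and 1D sample.

```cpp
struct Sample3D {
  float x, y, z;
};

/* Use the x/y components for the lens position and z for the motion blur time,
 * so depth of field and motion blur are stratified together. The same idea
 * applies to light selection (1D) plus the point on the light (2D). */
static void camera_sample_from_3d(const Sample3D &s, float *lens_u, float *lens_v, float *time)
{
  *lens_u = s.x;
  *lens_v = s.y;
  *time = s.z;
}
```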
Differential Revision: https://developer.blender.org/D16443
Uses a light tree to more effectively sample scenes with many lights. This can
significantly reduce noise, at the cost of a somewhat longer render time per
sample.
Light tree sampling is enabled by default. It can be disabled in the Sampling >
Lights panel. Scenes using light clamping or ray visibility tricks may render
differently, as these are biased techniques that depend on the sampling strategy.
The implementation is currently disabled on AMD HIP. This is planned to be fixed
before the release.
Implementation by Jeffrey Liu, Weizhen Huang, Alaska and Brecht Van Lommel.
Ref T77889
This was not working well in non-trivial scenes before the light tree, and now
it is even harder to make it work well with the light tree. It would average
over every light object with equal weight regardless of intensity or distance, and
be quite noisy due to not working with multiple importance sampling.
We may restore this if there are enough good use cases for the previous implementation,
but let's wait and see what the feedback is.
Some use cases for this have been replaced by the shadow catcher passes, which
did not exist when this was added.
Ref T77889
Materials now have an enum to set the emission sampling method, to be
either None, Auto, Front, Back or Front & Back. This replaces the
previous "Multiple Importance Sample" option.
Auto is the new default, and uses a heuristic to estimate the emitted
light intensity to determine whether the mesh should be considered as a light
for sampling. Shaders sometimes have a bit of emission but treating them
as a light source is not worth the memory/performance overhead.
The Front/Back settings are not important yet, but will help when a
light tree is added. In that case setting emission to Front only on
closed meshes can help ignore emission from inside the mesh interior that
does not contribute anything.
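A sketch of the setting as an enum (value names here are illustrative; the actual identifiers live in the Cycles sources):

```cpp
enum EmissionSampling {
  EMISSION_SAMPLING_NONE,       /* never treat the mesh as a light */
  EMISSION_SAMPLING_AUTO,       /* heuristic based on the estimated emitted intensity */
  EMISSION_SAMPLING_FRONT,      /* sample emission from front faces only */
  EMISSION_SAMPLING_BACK,       /* sample emission from back faces only */
  EMISSION_SAMPLING_FRONT_BACK, /* sample emission from both sides */
};
```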
Includes contributions by Brecht Van Lommel and Alaska.
Ref T77889
* Split light types into own files, move light type specific code from
light tree and MNEE.
* Move flat light distribution code into own kernel file and host side
building function, in preparation for the light tree addition. Add light/sample.h
as main entry point to kernel light sampling.
* Better separate calculation of pdf for selecting a light, and pdf for
sampling a point on the light. The selection pdf is now also stored in
LightSampling for MNEE to correctly recalculate the full pdf when the
shading position changes but the point on the light remains fixed.
* Improvement to kernel light storage, using packed_float3, better variable
names, etc.
Includes contributions by Brecht Van Lommel and Weizhen Huang.
Ref T77889
The wrong guiding distribution was used when direct and indirect light
scattering happened at different locations. Now use a different distribution
for each location.
Recording is not quite correct since OpenPGL does not support splitting the
path like this; instead we record at the start of the volume ray. In practice
this seems to make little difference.
Differential Revision: https://developer.blender.org/D16448
This patch generalizes the OSL support in Cycles to include GPU
device types and adds an implementation for that in the OptiX
device. There are some caveats still, including simplified texturing
due to lack of OIIO on the GPU and a few missing OSL intrinsics.
Note that this is incomplete and needs an update to the OSL
library before it can be enabled! The implementation is already
committed now to simplify further development.
Maniphest Tasks: T101222
Differential Revision: https://developer.blender.org/D15902
This patch enables MNEE on macOS >= 13. There was an inefficiency in the calculation of spill requirements, fixed as of macOS 13. This patch also adds a temporary inlining workaround for a Metal compiler bug which causes `mnee_compute_constraint_derivatives` to behave incorrectly.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D16235
Changing volume parameters during rendering could cause a crash
when guiding was enabled. It was due to an uninitialized state parameter
at the beginning of the path tracing process.
In addition, guiding is now disabled when dealing with almost-delta volumes
(i.e., g close to 1.0 or -1.0).
Previously it would bake viewed from above the surface. The new option can be
useful when the baked result is meant to be viewed from a fixed viewpoint or
with limited camera motion.
Some effort is made to give a continuous reflection on parts of the surface
invisible to the camera, but this is necessarily only a rough approximation.
Differential Revision: https://developer.blender.org/D15921
This adds path guiding features into Cycles by integrating Intel's Open Path
Guiding Library. It can be enabled in the Sampling > Path Guiding panel in the
render properties.
This feature helps reduce noise in scenes where finding a path to light is
difficult for regular path tracing.
The current implementation supports guiding directional sampling decisions on
surfaces, when the material contains at least one diffuse component, and in
volumes with isotropic and anisotropic Henyey-Greenstein phase functions.
On surfaces, the guided sampling decision is proportional to the product of
the incident radiance and the normal-oriented cosine lobe and in volumes it
is proportional to the product of the incident radiance and the phase function.
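In formula form (notation chosen here for illustration), with L_i the learned incident radiance, theta the angle to the shading normal, and f_p the phase function:

```latex
p_{\text{surface}}(\omega) \;\propto\; L_i(\omega)\,\max(\cos\theta,\,0)
\qquad
p_{\text{volume}}(\omega) \;\propto\; L_i(\omega)\, f_p(\omega)
```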
The incident radiance field of a scene is learned and updated during rendering
after each per-frame rendering iteration/progression.
At the moment, path guiding is only supported by the CPU backend. Support for
GPU backends will be added in future versions of OpenPGL.
Ref T92571
Differential Revision: https://developer.blender.org/D15286
Simplifies code overall to do it inside the eval function; most of the BSDFs
already compute the dot product.
The refactoring in bsdf_principled_hair_eval() was needed to avoid a HIP
compiler bug. The cause is unclear; changing the implementation enough is
meant to sidestep it.
Ref T92571, D15286
* Return roughness and IOR for BSDF sampling
* Add functions to query IOR and label for given BSDF
* Default IOR to 1.0 instead of 0.0 for BSDFs that don't use it
* Ensure pdf >= 0.0 in case of numerical precision issues
Ref T92571, D15286
Cleans up the file structure to be more similar to that of the SVM
and also makes it possible to build kernels with OSL support, but
without having to include SVM support.
This patch was split from D15902.
Differential Revision: https://developer.blender.org/D15949
The multi-dimensional Sobol pattern required us to carefully use the lowest
dimensions possible, as quality goes down in higher dimensions. Now that we
have two sampling patterns that are at least as good, there is no need to keep
it around and the implementation can be simplified.
Differential Revision: https://developer.blender.org/D15788
Fix two issues in the previous implementation:
* Only power-of-two prefixes were progressively stratified, not suffixes.
This resulted in unnecessarily increased noise when using non-power-of-two
sample counts.
* In order to try to get away with just a single sample pattern, the code
used a combination of sample index shuffling and Cranley-Patterson rotation.
Index shuffling is normally fine, but due to the sample patterns themselves
not being quite right (as described above) this actually resulted in
additional increased noise. Cranley-Patterson, on the other hand, always
increases noise with randomized (t,s) nets like PMJ02, and should be avoided
with these kinds of sequences.
Addressed with the following changes:
* Replace the sample pattern generation code with a much simpler algorithm
recently published in the paper "Stochastic Generation of (t, s) Sample
Sequences". This new implementation is easier to verify, produces fully
progressively stratified PMJ02, and is *far* faster than the previous code,
being O(N) in the number of samples generated.
* It keeps the sample index shuffling, which works correctly now due to the
improved sample patterns. But it now uses a newer high-quality hash instead
of the original Laine-Karras hash.
* The scrambling distance feature cannot (to my knowledge) be implemented with
any decorrelation strategy other than Cranley-Patterson, so Cranley-Patterson
is still used when that feature is enabled. But it is now disabled otherwise,
since it increases noise.
* In place of Cranley-Patterson, multiple independent patterns are generated
and randomly chosen for different pixels and dimensions as described in the
original PMJ paper. In this patch, the pattern selection is done via
hash-based shuffling to ensure there are no repeats within a single pixel
until all patterns have been used.
The combination of these fixes brings the quality of Cycles' PMJ sampler in
line with the previously submitted Sobol-Burley sampler in D15679. They are
essentially indistinguishable in terms of quality/noise, which is expected
since they are both randomized (0,2) sequences.
Differential Revision: https://developer.blender.org/D15746
Based on the paper "Practical Hash-based Owen Scrambling" by Brent Burley,
2020, Journal of Computer Graphics Techniques.
It is distinct from the existing Sobol sampler in two important ways:
* It is Owen scrambled, which gives it a much better convergence rate in many
situations (see the sketch after this list).
* It uses padding for higher dimensions, rather than using higher Sobol
dimensions directly. In practice this is advantageous because high-dimensional
Sobol sequences have holes in their sampling patterns that don't resolve
until an unreasonable number of samples are taken. (See Burley's paper for
details.)
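For intuition, a slow reference-style sketch of hash-based Owen scrambling (the hash and structure below are illustrative; Burley's paper derives the fast hash-only form used in practice):

```cpp
#include <cstdint>

/* Generic 32-bit integer mix used as a stand-in hash (illustration only). */
static uint32_t hash_u32(uint32_t x)
{
  x ^= x >> 16;
  x *= 0x7feb352du;
  x ^= x >> 15;
  x *= 0x846ca68bu;
  x ^= x >> 16;
  return x;
}

/* Flip each bit of the fixed-point sample `x` based on a hash of the bits above it.
 * Deciding every bit from its more significant bits is what makes this a (slow but
 * easy to follow) Owen scramble; `seed` varies the scramble per dimension/pixel. */
static uint32_t owen_scramble_reference(uint32_t x, const uint32_t seed)
{
  for (int bit = 31; bit >= 0; bit--) {
    const uint32_t high_bits = (bit == 31) ? 0u : (x >> (bit + 1));
    const uint32_t flip = hash_u32(high_bits ^ hash_u32(seed + uint32_t(bit))) & 1u;
    x ^= flip << bit;
  }
  return x;
}
```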
The pattern reduces noise in some benchmark scenes; however, it is also slower,
particularly on the CPU. So for now Progressive Multi-Jittered sampling remains
the default.
Differential Revision: https://developer.blender.org/D15679
* Store compact ray differentials in ShaderData and compute full differentials
on demand. This reduces register pressure on the GPU.
* Remove BSDF differential code that was effectively doing nothing as the
differential orientation was discarded when making it compact.
This gives a 1-5% speedup with RTX A6000 + OptiX in our benchmarks, with the
bigger speedups in simpler scenes.
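A hedged sketch of the compact storage mentioned in the first point (types and names are illustrative, not the exact Cycles structs): the two differential vectors collapse to a single radius-like scalar, and isotropic differentials are rebuilt on demand.

```cpp
#include <cmath>

struct Vec3 {
  float x, y, z;
};

/* Collapse full differentials to a single scalar; the orientation is
 * intentionally dropped, which is what made the BSDF differential code moot. */
static inline float differential_make_compact(const Vec3 &dx, const Vec3 &dy)
{
  const float lx = std::sqrt(dx.x * dx.x + dx.y * dx.y + dx.z * dx.z);
  const float ly = std::sqrt(dy.x * dy.x + dy.y * dy.y + dy.z * dy.z);
  return 0.5f * (lx + ly);
}

/* On demand, rebuild isotropic differentials from the scalar using any orthonormal
 * basis around the shading point; callers only need the footprint size. */
static inline void differential_from_compact(const float compact,
                                             const Vec3 &basis_u, const Vec3 &basis_v,
                                             Vec3 *dPdx, Vec3 *dPdy)
{
  *dPdx = {basis_u.x * compact, basis_u.y * compact, basis_u.z * compact};
  *dPdy = {basis_v.x * compact, basis_v.y * compact, basis_v.z * compact};
}
```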
Renders appear to be identical except for the Both displacement option that
does both displacement and bump.
Differential Revision: https://developer.blender.org/D15677