This sampling pattern is particularly suited to adaptive sampling, and will
be used for that upcoming feature.
Based on "Progressive Multi-Jittered Sample Sequences" by Per Christensen,
Andrew Kensler and Charlie Kilpatrick.
Ref D4686
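For reference, here is a minimal Python sketch of the paper's simplest construction, the "progressive jittered" sequence; the progressive multi-jittered (0,2) variant adds further stratification constraints on top of this same idea:

    import random

    def generate_sample_point(i, j, xhalf, yhalf, n):
        # Random point in quadrant (xhalf, yhalf) of cell (i, j) on an n x n grid.
        x = (i + 0.5 * (xhalf + random.random())) / n
        y = (j + 0.5 * (yhalf + random.random())) / n
        return (x, y)

    def occupied_quadrant(pt, n):
        # Cell and quadrant that sample pt occupies on an n x n grid.
        i, j = int(n * pt[0]), int(n * pt[1])
        return i, j, int(2 * (n * pt[0] - i)), int(2 * (n * pt[1] - j))

    def extend_sequence(samples, N):
        # Extend N stratified samples to 4*N, keeping earlier prefixes stratified.
        n = int(round(N ** 0.5))  # current grid resolution
        for s in range(N):  # first: diagonally opposite quadrants
            i, j, xh, yh = occupied_quadrant(samples[s], n)
            samples.append(generate_sample_point(i, j, 1 - xh, 1 - yh, n))
        second = []
        for s in range(N):  # then: the two remaining quadrants, in random order
            i, j, xh, yh = occupied_quadrant(samples[s], n)
            if random.random() < 0.5:
                xh = 1 - xh
            else:
                yh = 1 - yh
            samples.append(generate_sample_point(i, j, xh, yh, n))
            second.append((i, j, 1 - xh, 1 - yh))
        for i, j, xh, yh in second:
            samples.append(generate_sample_point(i, j, xh, yh, n))

    def generate_pj(num_samples):
        samples = [(random.random(), random.random())]
        N = 1
        while N < num_samples:
            extend_sequence(samples, N)  # quadruples the sample count
            N *= 4
        return samples[:num_samples]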
This fixes denoising being delayed until after all rendering has finished. Instead, tile-based
denoising is now part of the "RENDER" task again, so that it is all in one task and does not
cause issues with dedicated task pools where tasks are serialized.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D6940
This patch adds a new user-configurable option for the sample at which viewport
denoising kicks in. Setting it to zero retains the previous behavior (start immediately),
while other values defer denoising until that sample has been reached. The default is now
one, to avoid the weirdness of AI denoising at small resolutions.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D6906
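For example, a sketch assuming the option is exposed in the Python API as `preview_denoising_start_sample` on the Cycles scene settings:

    import bpy

    scene = bpy.context.scene
    # Defer viewport denoising until 8 samples have been rendered;
    # 0 keeps the old behavior of denoising from the first sample.
    scene.cycles.preview_denoising_start_sample = 8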
Rendering with multiple CUDA devices but denoising with OptiX caused parts of the image to go
missing at the start, while the resolution was still scaled down. This happened because the copy
operation in `MultiDevice::map_neighbor_tiles`, which slices the copy across all devices, sliced
based on the full resolution rather than the scaled one and therefore copied incorrect data
between devices.
Since this is not the recommended way of using viewport denoising anyway, simply avoid those
incorrect copies for now by disabling denoising while the resolution is scaled. Doing both rendering
and denoising with OptiX is not affected by this, since it avoids those copies altogether anyway.
Sometimes the viewport buffer size is zero for a frame, which caused the denoising task to
launch CUDA kernels with a launch size of zero, and those launches failed with a CUDA error.
This patch prevents launches from occurring in this case, similar to how it is handled in
`copy_to_display_buffer`.
The OptiX denoiser can be a great help when rendering in the viewport, since it is really fast
and needs few samples to produce convincing results. This patch therefore adds support for
using any Cycles denoiser in the viewport as well (though only the OptiX one is selectable,
because the NLM one is currently too slow to be usable). It also adds support for denoising
on a different device than the one used for rendering (so one can e.g. render with the CPU but
denoise with OptiX).
Reviewed By: #cycles, brecht
Differential Revision: https://developer.blender.org/D6554
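A sketch of enabling this from Python (property names follow the current API and may differ in older versions):

    import bpy

    scene = bpy.context.scene
    # Denoise viewport (preview) renders with the OptiX denoiser,
    # independently of which device does the path tracing.
    scene.cycles.use_preview_denoising = True
    scene.cycles.preview_denoiser = 'OPTIX'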
This patch adds support for the OptiX denoiser as an alternative to the existing NLM denoiser in Cycles. It reuses the same tile-based denoising architecture and therefore implicitly works with multiple GPUs as well.
Reviewed By: sergey
Differential Revision: https://developer.blender.org/D6395
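From Python, selecting the OptiX denoiser for final renders looks roughly like this (the exact property names have varied between Blender versions, so treat this as a sketch):

    import bpy

    # Denoising is enabled per view layer; in current versions the
    # denoiser type is a scene-level Cycles setting.
    bpy.context.view_layer.cycles.use_denoising = True
    bpy.context.scene.cycles.denoiser = 'OPTIX'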
This change allows the user to select a render pass in the 3D viewport.
It adds support for external renderers to extend the `View3DShading` struct, so Blender
doesn't need to know which features an external render engine wants to support.
Note that `View3DShading` is also available in `scene->display.shading`; although this is
supported, it does not make sense for render engines to put anything there, as it is really
scene/workbench related.
Currently Cycles assumes that it always needs to calculate the combined pass and ignores the
`pass_flag` in `KernelFilm`. We could optimize this, but that was not in the scope of this change.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D5689
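A sketch of driving this from Python via `View3DShading.render_pass`:

    import bpy

    # Show the normal pass in every 3D viewport of the current screen.
    for area in bpy.context.screen.areas:
        if area.type == 'VIEW_3D':
            shading = area.spaces.active.shading
            shading.type = 'RENDERED'
            shading.render_pass = 'NORMAL'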
During viewport rendering with the viewport sample count set to 0, the UI
showed 16777216 as the number of samples. The sample count should not be
shown at all when the viewport sample count is set to 0.
Differential Revision: https://developer.blender.org/D5301
The main goal of this change is faster startup when using foreground
rendering.
This patch builds kernels in parallel with the scene update process.
While the optimized kernels are not available yet, an AO kernel is
used instead.
These AO kernels are fast to compile (3-7 seconds) and can be
reused by all scenes. When the final kernels become available, we
switch over to them.
In background mode the AO kernels are not used.
Some kernels are used during scene update (displacement, background
light); when one of them is needed, the process can stall until it
becomes available.
Reviewed By: brecht, #cycles
Maniphest Tasks: T61752
Differential Revision: https://developer.blender.org/D4428
This patch reduces the number of times that we need to recompile
kernels. It does this by enabling or disabling features by default,
so that the kernels are already available when the user needs them.
The defaults differ between background and foreground rendering:
in background rendering the user wants the best render performance,
while in foreground rendering the user wants the fewest recompilations.
Enabling volumetrics or subdivision evaluation will still trigger
a recompilation during foreground rendering.
Reviewed By: #cycles, brecht
Differential Revision: https://developer.blender.org/D4485
Displacement and background kernels are used selectively, but were always compiled. With this patch they are no longer compiled when they are not needed.
The displacement kernel is only used for true displacement.
The background kernel is only used when there is a Cycles `Light` of type `LIGHT_BACKGROUND`.
Reviewed By: brecht, #cycles
Tags: #cycles
Maniphest Tasks: T61971
Differential Revision: https://developer.blender.org/D4412
When using preview rendering through a camera, or final rendering,
`scene.render.use_motion_blur` was not respected when building
the compile directives.
This patch checks whether motion blur is enabled at all when building
the compile directives. This should lead to more efficient kernels
when no motion blur is needed.
Tags: #cycles
Differential Revision: https://developer.blender.org/D4387
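The flag in question is the standard render setting, e.g.:

    import bpy

    # With this patch, disabling motion blur also keeps the motion-blur
    # code paths out of the compiled kernels instead of always including them.
    bpy.context.scene.render.use_motion_blur = False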
This adds a cycles.denoise_animation operator, which denoises an animation
sequence or individual file. Renders must be saved as multilayer EXR files
with denoising data passes.
By default file path and frame range come from the current scene, and EXR
files are denoised in-place. Alternatively, a different input and/or output
file path can be provided.
Denoising settings come from the current view layer. Renders can be denoised
again with different settings, as the original noisy image is preserved along
with other passes and metadata.
There is no user interface yet for this feature, that comes later.
Code by Lukas with modifications by Brecht. This feature was originally
developed for Tangent Animation, thanks for the support!
Differential Revision: https://developer.blender.org/D3889
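Example usage from Python (a sketch; the `input_filepath`/`output_filepath` parameter names are taken from the operator's Python API):

    import bpy

    # Denoise the current scene's frame range in-place, using the
    # file path and denoising settings from the scene and view layer.
    bpy.ops.cycles.denoise_animation()

    # Or denoise a single multilayer EXR into a separate output file.
    bpy.ops.cycles.denoise_animation(
        input_filepath="/tmp/render/frame_0001.exr",
        output_filepath="/tmp/denoised/frame_0001.exr",
    )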
Feature passes are now prefiltered during rendering; the result can be
used for denoising immediately or written as render passes for later
(animation) denoising.
The number of denoising data passes written is reduced because of this,
leaving out the feature variance passes. The passes are now Normal,
Albedo, Depth, Shadowing, Variance and Intensity.
Ref D3889.
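To write these passes for later animation denoising, a setup along these lines should work (a sketch using the view-layer setting `denoising_store_passes`):

    import bpy

    # Store the denoising feature passes in the render result ...
    bpy.context.view_layer.cycles.denoising_store_passes = True

    # ... and save frames as multilayer EXR so all passes land in one file.
    settings = bpy.context.scene.render.image_settings
    settings.file_format = 'OPEN_EXR_MULTILAYER'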
The integrator's maximum number of closures was not set for the CPU/mega
kernels to match the actually available memory. Before relatively recent code
refactoring we did not use this value in those kernels, so it worked fine.
This commit adds a sample-based profiler that runs during CPU rendering and collects statistics on time spent in different parts of the kernel (ray intersection, shader evaluation, etc.) as well as time spent per material and object.
The results are not exposed in the user interface or via Python yet; to see the stats on the console, pass the "--cycles-print-stats" argument to Cycles (e.g. "./blender -- --cycles-print-stats").
Unfortunately, there is no clear way to extend this functionality to CUDA or OpenCL, so it is CPU-only for now.
Reviewers: brecht, sergey, swerner
Reviewed By: brecht, swerner
Differential Revision: https://developer.blender.org/D3892
Now it shows more compact info below the view/object name. Render time and
memory usage are left out, as in most cases they are not so important. They
could be added back optionally if needed.
Needed for the animation denoiser since the denoising filter is done separately there.
Reviewers: brecht, sergey
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D3833
This deduplicates the calls for tile (un)mapping and allows the target buffer to differ from the source buffer (needed for baking and animation denoising).
The goal is to reduce OpenCL kernel recompilations.
Currently viewport renders are still set to use 64 closures, as this seems to
be faster and we don't want to cause a performance regression there. This
needs to be investigated.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2775
* Remove tex_* and pixels_* functions, replace by mem_*.
* Add MEM_TEXTURE and MEM_PIXELS as memory types recognized by devices.
* No longer create device_memory and call mem_* directly; always go
through device_only_memory, device_vector and device_pixels.
Progressive refine undoes the memory saving from save buffers, so enabling
both does not make much sense. Previously, enabling progressive refine
would disable denoising, but it should be the other way around, since
denoising actually affects the render result.
Includes some code refactoring for progressive refine render buffers, and
avoids recomputing tiles for each progressive sample.
This was originally done with the first sample in the kernel for better
performance, but that no longer works with atomics. Any benefit was
very minor anyway, too small to measure it seems.