This patch adds two new kernels: SORT_BUCKET_PASS and SORT_WRITE_PASS. These replace PREFIX_SUM and SORTED_PATHS_ARRAY on supported devices (currently implemented on Metal, but will be trivial to enable on the other backends). The new kernels exploit sort partitioning (see D15331) by sorting each partition separately using local atomics. This can give an overall render speedup of 2-3% depending on architecture. As before, we fall back to the original non-partitioned sorting when the shader count is "too high".
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D16909
These warnings can reveal errors in logic, so quiet them by checking
if the features are enabled before using variables or by assigning
empty strings in some cases.
- Check CMAKE_THREAD_LIBS_INIT is set before use as CMake docs
note that this may be left unset if it's not needed.
- Remove BOOST/OPENVDB/VULKAN references when disable.
- Define INC_SYS even when empty.
- Remove PNG_INC from freetype (not defined anywhere).
All kernel specialisation is now performed in the background regardless of kernel type, meaning that the first render will be visible a few seconds sooner. The only exception is during benchmark warm up, in which case we wait for all kernels to be cached. When stopping a render, we call a new `cancel()` method on the device which causes any outstanding compilation work to be cancelled, and we destroy the device in a detached thread so that any stale queued compilations can be safely purged without blocking the UI for longer than necessary.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D16371
Cycles already treats denoising fairly separate in its code, with a
dedicated `Denoiser` base class used to describe denoising
behavior. That class has been fully implemented for OIDN
(`denoiser_oidn.cpp`), but for OptiX was mostly empty
(`denoiser_optix.cpp`) and denoising was instead implemented in
the OptiX device. That meant denoising code was split over various
files and directories, making it a bit awkward to work with. This
patch moves the OptiX denoising implementation into the existing
`OptiXDenoiser` class, so that everything is in one place. There are
no functional changes, code has been mostly moved as-is. To
retain support for potential other denoiser implementations based
on a GPU device in the future, the `DeviceDenoiser` base class was
kept and slightly extended (and its file renamed to
`denoiser_gpu.cpp` to follow similar naming rules as
`path_trace_work_*.cpp`).
Differential Revision: https://developer.blender.org/D16502
This patch tunes the integrator state sizing for Metal (`num_concurrent_states` and `num_concurrent_busy_states`).
On all GPUs architecture, we adjust the busy:total states ratio to be 1:4 which gives better rendering performance than the previous 1:16 ratio (independent of total state count). This gives a small performance uplift (e.g. 2-3% on M1 Ultra).
Additionally for M2 architectures, we double the overall state size if there is available headroom. Inclusive of the first change, we can expect uplift of close to 10% in future, as this results in larger dispatch sizes and minimises work submission overheads. In order to make an accurate determination of available headroom, we defer the calculation of `num_concurrent_states` and `num_concurrent_busy_states` until the time of integrator state allocation (i.e. after all of the scene data has been allocated). We also refactor `alloc_integrator_soa` to calculate an *exact* single-state-size in a first pass, right before allocating the integrator SoA buffers in a second pass.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D16313
The issue here was that PathTraceWork was set up before checking if
any error occurred, and it didn't account for the dummy device so
it called a non-implemented function.
This fix therefore avoids creating PathTraceWork for dummy devices
and checks for device creation errors earlier in the process.
This adds path guiding features into Cycles by integrating Intel's Open Path
Guiding Library. It can be enabled in the Sampling > Path Guiding panel in the
render properties.
This feature helps reduce noise in scenes where finding a path to light is
difficult for regular path tracing.
The current implementation supports guiding directional sampling decisions on
surfaces, when the material contains a least one diffuse component, and in
volumes with isotropic and anisotropic Henyey-Greenstein phase functions.
On surfaces, the guided sampling decision is proportional to the product of
the incident radiance and the normal-oriented cosine lobe and in volumes it
is proportional to the product of the incident radiance and the phase function.
The incident radiance field of a scene is learned and updated during rendering
after each per-frame rendering iteration/progression.
At the moment, path guiding is only supported by the CPU backend. Support for
GPU backends will be added in future versions of OpenPGL.
Ref T92571
Differential Revision: https://developer.blender.org/D15286
This patch causes the render buffers to be copied to the denoiser
device only once before denoising and output/display is then fed
from that single buffer on the denoiser device. That way usually all
but one copy (from all the render devices to the denoiser device)
can be eliminated, provided that the denoiser device is also the
display device (in which case interop is used to update the display).
As such this patch also adds some logic that tries to ensure the
chosen denoiser device is the same as the display device.
Differential Revision: https://developer.blender.org/D15657
This was added for Metal, but also gives good results with CUDA and OptiX.
Also enable it for future Apple GPUs instead of only M1 and M2, since this has
been shown to help across multiple GPUs so the better bet seems to enable
rather than disable it.
Also moves some of the logic outside of the Metal device code, and always
enables the code in the kernel since other devices don't do dynamic compile.
Time per sample with OptiX + RTX A6000:
new old
barbershop_interior 0.0730s 0.0727s
bmw27 0.0047s 0.0053s
classroom 0.0428s 0.0464s
fishy_cat 0.0102s 0.0108s
junkshop 0.0366s 0.0395s
koro 0.0567s 0.0578s
monster 0.0206s 0.0223s
pabellon 0.0158s 0.0174s
sponza 0.0088s 0.0100s
spring 0.1267s 0.1280s
victor 0.0524s 0.0531s
wdas_cloud 0.0817s 0.0816s
Ref D15331, T87836
This patch partitions the active indices into chunks prior to sorting by material in order to tradeoff some material coherence for better locality. On Apple Silicon GPUs (particularly higher end M1-family GPUs), we observe overall render time speedups of up to 15%. The partitioning is implemented by repeating the range of `shader_sort_key` for each partition, and encoding a "locator" key which distributes the indices into sorted chunks.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D15331
This patch adds a new Cycles device with similar functionality to the
existing GPU devices. Kernel compilation and runtime interaction happen
via oneAPI DPC++ compiler and SYCL API.
This implementation is primarly focusing on Intel® Arc™ GPUs and other
future Intel GPUs. The first supported drivers are 101.1660 on Windows
and 22.10.22597 on Linux.
The necessary tools for compilation are:
- A SYCL compiler such as oneAPI DPC++ compiler or
https://github.com/intel/llvm
- Intel® oneAPI Level Zero which is used for low level device queries:
https://github.com/oneapi-src/level-zero
- To optionally generate prebuilt graphics binaries: Intel® Graphics
Compiler All are included in Linux precompiled libraries on svn:
https://svn.blender.org/svnroot/bf-blender/trunk/lib The same goes for
Windows precompiled binaries but for the graphics compiler, available
as "Intel® Graphics Offline Compiler for OpenCL™ Code" from
https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html,
for which path can be set as OCLOC_INSTALL_DIR.
Being based on the open SYCL standard, this implementation could also be
extended to run on other compatible non-Intel hardware in the future.
Reviewed By: sergey, brecht
Differential Revision: https://developer.blender.org/D15254
Co-authored-by: Nikita Sirgienko <nikita.sirgienko@intel.com>
Co-authored-by: Stefan Werner <stefan.werner@intel.com>
* Rename "texture" to "data array". This has not used textures for a long time,
there are just global memory arrays now. (On old CUDA GPUs there was a cache
for textures but not global memory, so we used to put all data in textures.)
* For CUDA and HIP, put globals in KernelParams struct like other devices.
* Drop __ prefix for data array names, no possibility for naming conflict now that
these are in a struct.
Move MNEE to own kernel, separate from shader ray-tracing. This does introduce
the limitation that a shader can't use both MNEE and AO/bevel, but that seems
like the better trade-off for now.
We can experiment with bigger kernel organization changes later.
Differential Revision: https://developer.blender.org/D15070
* Float/double promotion warnings were mainly meant for avoiding slow
operatiosn in the kernel. Limit it to that to avoid hard to fix warnings
in Hydra.
* Const warnings in Hydra iterators.
* Unused variable warnings when building without glog.
* Wrong camera enum comparisons in assert.
* PASS_UNUSED is not a pass type, only for pass offsets.
Propagate the fp settings from the main thread to all the worker threads (the fp settings includes the FZ settings among other things) - this guarantees consistency in execution of floating point math regardless if its executed in tbb thread arena or on main thread
Add FZ mode to arm64/aarch64 in parallel to the way its been done on intel processors, currently compiling for arm target does not set this mode at all, hence potentially runs slower and with possible results mismatch with intel x86.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D14454
Light groups are a type of pass that only contains lighting from a subset of light sources.
They are created in the View layer, and light sources (lamps, objects with emissive materials
and/or the environment) can be assigned to a group.
Currently, each light group ends up generating its own version of the Combined pass.
In the future, additional types of passes (e.g. shadowcatcher) might be getting their own
per-lightgroup versions.
The lightgroup creation and assignment is not Cycles-specific, so Eevee or external render
engines could make use of it in the future.
Note that Lightgroups are identified by their name - therefore, the name of the Lightgroup
in the View Layer and the name that's set in an object's settings must match for it to be
included.
Currently, changing a Lightgroup's name does not update objects - this is planned for the
future, along with other features such as denoising for light groups and viewing them in
preview renders.
Original patch by Alex Fuller (@mistaed), with some polishing by Lukas Stockner (@lukasstockner97).
Differential Revision: https://developer.blender.org/D12871
This patch adds a Hydra render delegate to Cycles, allowing Cycles to be used for rendering
in applications that provide a Hydra viewport. The implementation was written from scratch
against Cycles X, for integration into the Blender repository to make it possible to continue
developing it in step with the rest of Cycles. For this purpose it follows the style of the rest of
the Cycles code and can be built with a CMake option
(`WITH_CYCLES_HYDRA_RENDER_DELEGATE=1`) similar to the existing standalone version
of Cycles.
Since Hydra render delegates need to be built against the exact USD version and other
dependencies as the target application is using, this is intended to be built separate from
Blender (`WITH_BLENDER=0` CMake option) and with support for library versions different
from what Blender is using. As such the CMake build scripts for Windows had to be modified
slightly, so that the Cycles Hydra render delegate can e.g. be built with MSVC 2017 again
even though Blender requires MSVC 2019 now, and it's possible to specify custom paths to
the USD SDK etc. The codebase supports building against the latest USD release 22.03 and all
the way back to USD 20.08 (with some limitations).
Reviewed By: brecht, LazyDodo
Differential Revision: https://developer.blender.org/D14398
When new display driver is given to the PathTrace ensure that there are
no GPU resources used from it by the work. This solves graphics interop
descriptors leak.
This aqlso fixes Invalid graphics context in cuGraphicsUnregisterResource
error when doing final render on the display GPU.
Fixes T95837: Regression: GPU memory accumulation in Cycles render
Fixes T95733: Cycles Cuda/Optix error message with multi GPU devices. (Invalid graphics context in cuGraphicsUnregisterResource)
Fixes T95651: GPU error (Invalid graphics context in cuGraphicsUnregisterResource)
Fixes T95631: VRAM is not being freed when rendering (Invalid graphics context in cuGraphicsUnregisterResource)
Fixes T89747: Cycles Render - Textures Disappear then Crashes the Render
Maniphest Tasks: T95837, T95733, T95651, T95631, T89747
Differential Revision: https://developer.blender.org/D14146
* Replace license text in headers with SPDX identifiers.
* Remove specific license info from outdated readme.txt, instead leave details
to the source files.
* Add list of SPDX license identifiers used, and corresponding license texts.
* Update copyright dates while we're at it.
Ref D14069, T95597
Use a shorter/simpler license convention, stops the header taking so
much space.
Follow the SPDX license specification: https://spdx.org/licenses
- C/C++/objc/objc++
- Python
- Shell Scripts
- CMake, GNUmakefile
While most of the source tree has been included
- `./extern/` was left out.
- `./intern/cycles` & `./intern/atomic` are also excluded because they
use different header conventions.
doc/license/SPDX-license-identifiers.txt has been added to list SPDX all
used identifiers.
See P2788 for the script that automated these edits.
Reviewed By: brecht, mont29, sergey
Ref D14069
Sometime throughout development some checks got lost during refactor.
This change makes it so that if OIDN is not supported on the current
CPU Cycles will report an error and stop rendering. This behavior is
similar to when an OptiX denoiser is requested and there is no OptiX
compatible device available.
The easiest way to verify this change is to force return false from
the `openimagedenoise_supported()`.
Fixes Cycles part of the T94127.
Differential Revision: https://developer.blender.org/D13944
The root of the issue is caused by Cycles ignoring OpenGL limitation on
the maximum resolution of textures: Cycles was allocating texture of the
final render resolution. It was exceeding limitation on certain GPUs and
driver.
The idea is simple: use multiple textures for the display, each of which
will fit into OpenGL limitations.
There is some code which allows the display driver to know when to start
the new tile. Also added some code to allow force graphics interop to be
re-created. The latter one ended up not used in the final version of the
patch, but it might be helpful for other drivers implementation.
The tile size is limited to 8K now as it is the safest size for textures
on many GPUs and OpenGL drivers.
This is an updated fix with a workaround for freezing with the NVIDIA
driver on Linux.
Differential Revision: https://developer.blender.org/D13385
Sample offset was not accounted for in the adaptive sampling code and caused
issues, like immediately applied adaptive filtering, with non-zero values.
Differential Revision: https://developer.blender.org/D13510
Adds hysteresis to the resolution divider algorithm to avoid having the resolution bounce around when on the boundary of two resolutions.
Reviewed By: brecht, leesonw
Differential Revision: https://developer.blender.org/D12385