The goal of this change is to make it so the emitters are not forced
to have MIS disabled when shadow linking in used. On the user level
it means that lights sources sources which are behind excluded shadow
blocker will be visible in sharp glossy reflection.
In order to support this an extra shadow ray is traced in the direction
of the main path, to gather contribution of emitters behind shadow
blockers. This is done in the two new kernels.
First kernel performs light intersection to see if the extra shadow
ray is needed. It is not needed if, for example, there are no light
sources in the direction of the main path.
The second kernel shades the light source and generates the actual
shadow path.
Such separation allows to keep kernels small (so that no intersection
and shading happens in the same kernel). It also helps having good
occupancy on the GPU: the main path will not wait for the shadow
kernels to complete before continuing (the main path is needed by the
extra shadow path generation, to have access to direction and shading
reading state).
Current implementation is limited to lights only: there is no support
of mesh lights yet.
The MIS weight of the new shadow ray needs to be double-checked. It was
verified to give same result as the main path when the same light is
hit.
To avoid contribution from the same light source counted by the shadow
ray and the main path the light source which was chosen to trace the
shadow ray to is excluded from intersection via transparent bounces.
This seems unideal and feels that it could be done via some MIS wights
as well. Doing so is an exercise for later.
The kernel naming could be improved. Suggestions are welcome.
There are also quiet some TODOs in the code. Not sure how much of those
must be resolved prior to merge to the cycles-light-linking branch.
Would be cool to have extra eyes on some of the MIS aspects, which is
easier in the shared branch.
Tested on the CPU and Metal on M2 GPU.
Ref #104972
Pull Request: blender/blender#107439
When running oneAPI with AoT binaries, on hardware that's not compatible with
these, recompilation could have been missing from the kernels loading phase and
happen during execution instead.
These changes fixes it, any kernel compilation will now happen during the
kernels loading phase.
Updated Embree 4 library with GPU support is required for it to be
compiled - compatiblity with Embree 3 and Embree 4 without GPU support
is maintained.
Enabling hardware raytracing is an opt-in user setting for now.
Pull Request: blender/blender#106266
This functionality is related only to debugging of SYCL implementation
via single-threaded CPU execution and is disabled by default.
Host device has been deprecated in SYCL 2020 spec and we removed it
in 305b92e05f.
Since this is still very useful for debugging, we're restoring a
similar functionality here through SYCL 2020 Host Task.
Test kernel will now test functionalities related to kernel execution
with USM memory allocations instead of with SYCL buffers and accessors
as these aren't currently used in the backend.
JIT compilation of oneAPI kernels now happens during load stage
and proper message gets shown in the GUI during compilation.
Also, this implementation skips kernels that aren't needed for
the used scene, reducing overall (re)compilation time.
This is a minimal set of changes, allowing a lot of cleanup that can
happen afterward as it allows sycl method and objects to be used outside
of kernel.cpp.
Reviewed By: brecht, sergey
Differential Revision: https://developer.blender.org/D15397
Windows drivers 101.3430 fix an important GUI-related crash and it's
best to prevent users from running into it.
Linux drivers weren't affected but still had relevant gpu binary
compatibility fixes, so it makes sense to keep the min-supported version
aligned across OSes.
sycl/L0 runtime reports compute-runtime version since Intel graphics
driver 101.3268 on Windows, when querying driver version from sycl.
Prior to this driver, it was 0. Now we can bump minimum requirement to
this one and filter-out devices returning 0.
Maniphest Tasks: T100648
The number of Execution Units and resident "threads" (simd width * threads
per EUs) are now exposed and used to select the number of states using
a simplified heuristic.
The current specific CentOS7 workaround we have for AoT, which is to
disable __FAST_MATH__ by using -fhonor-nans, now also fixes the
compilation issue for JIT as well since at least driver 23570.
with a very high min-driver version requirement, placeholder until JIT
CentOS runtime compilation issue gets fixed in a defined version.
min-driver version check can be worked around by setting
CYCLES_ONEAPI_ALL_DEVICES environment variable.
We used it only to access device id for explicitly allowing Arc GPUs.
It made the backend require ze_loader.dll which could be problematic if
we end up using direct linking.
I've replaced filtering based on PCI device id by using other HW properties
instead (EUs, threads per EU), that are now available through Level-Zero.
Initially oneAPI implementation have waited after each memory
operation, even if there was no need for this. Now, the implementation
will wait only if it is really necessary - it have improved
performance noticeble for some scenes and a bit for the rest of them.
This patch adds a new Cycles device with similar functionality to the
existing GPU devices. Kernel compilation and runtime interaction happen
via oneAPI DPC++ compiler and SYCL API.
This implementation is primarly focusing on Intel® Arc™ GPUs and other
future Intel GPUs. The first supported drivers are 101.1660 on Windows
and 22.10.22597 on Linux.
The necessary tools for compilation are:
- A SYCL compiler such as oneAPI DPC++ compiler or
https://github.com/intel/llvm
- Intel® oneAPI Level Zero which is used for low level device queries:
https://github.com/oneapi-src/level-zero
- To optionally generate prebuilt graphics binaries: Intel® Graphics
Compiler All are included in Linux precompiled libraries on svn:
https://svn.blender.org/svnroot/bf-blender/trunk/lib The same goes for
Windows precompiled binaries but for the graphics compiler, available
as "Intel® Graphics Offline Compiler for OpenCL™ Code" from
https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html,
for which path can be set as OCLOC_INSTALL_DIR.
Being based on the open SYCL standard, this implementation could also be
extended to run on other compatible non-Intel hardware in the future.
Reviewed By: sergey, brecht
Differential Revision: https://developer.blender.org/D15254
Co-authored-by: Nikita Sirgienko <nikita.sirgienko@intel.com>
Co-authored-by: Stefan Werner <stefan.werner@intel.com>