Commit Graph

129 Commits

Author SHA1 Message Date
Michael Jones
2b88ee50fb Cycles: Tweak inlining policy on Metal
This patch optimises the Metal inlining policy. It gives a small speedup (2-3% on M1 Max) with no notable compilation slowdown vs what is already in master. Previously noted compilation slowdowns (as reported in T100102) were caused by forcing inlining for `ccl_device`, but we get better rendering perf by relying on compiler heuristics in these cases.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16081
2022-09-27 17:01:28 +01:00
Sebastian Herhoz
75a6d3abf7 Cycles: add Path Guiding on CPU through Intel OpenPGL
This adds path guiding features into Cycles by integrating Intel's Open Path
Guiding Library. It can be enabled in the Sampling > Path Guiding panel in the
render properties.

This feature helps reduce noise in scenes where finding a path to light is
difficult for regular path tracing.

The current implementation supports guiding directional sampling decisions on
surfaces, when the material contains a least one diffuse component, and in
volumes with isotropic and anisotropic Henyey-Greenstein phase functions.

On surfaces, the guided sampling decision is proportional to the product of
the incident radiance and the normal-oriented cosine lobe and in volumes it
is proportional to the product of the incident radiance and the phase function.

The incident radiance field of a scene is learned and updated during rendering
after each per-frame rendering iteration/progression.

At the moment, path guiding is only supported by the CPU backend. Support for
GPU backends will be added in future versions of OpenPGL.

Ref T92571

Differential Revision: https://developer.blender.org/D15286
2022-09-27 15:56:32 +02:00
125ac1f914 Cycles: increase min-supported driver version for Intel GPUs
Windows drivers 101.3430 fix an important GUI-related crash and it's
best to prevent users from running into it.
Linux drivers weren't affected but still had relevant gpu binary
compatibility fixes, so it makes sense to keep the min-supported version
aligned across OSes.
2022-09-26 07:41:47 -07:00
Werner, Stefan
0c824837ab Cycles: Cleanup in oneAPI math includes and definitions
Now explicitly including math.h first before #defining funcitons.
This avoids undefined behavior and improves compatibility with
different SYCL compilers and backends.
2022-09-22 11:33:57 +02:00
6d08ba8a50 Fix T100824: Cycles GPU render broken on macOS 13 Beta and Apple silicon
The recent revert of Apple silicon inlining changes to avoid long compile times
worked on macOS 12, but in macOS 13 Beta it results in render errors. This may
be a compiler bug and perhaps get fixed in time, but try to be on the safe side
and ensure Blender 3.3.0 works regardless.

This brings part of the inlining back, which brings improved performance but
also longer compiler times again. Compile time is around 2min now, where the
previous full inlining was about 5-7min.

Patch by Michael Jones.

Differential Revision: https://developer.blender.org/D15897
2022-09-06 19:11:52 +02:00
6c6a53fad3 Cleanup: spelling in comments, formatting, move comments into headers 2022-09-06 16:25:20 +10:00
cf57624764 Cleanup: refactoring of kernel film function names and organization 2022-09-02 17:13:28 +02:00
3e73afb536 Merge branch 'blender-v3.3-release' 2022-08-31 15:34:44 +02:00
b1231e616a Cycles: Enforce Windows driver version requirements for sycl
sycl/L0 runtime reports compute-runtime version since Intel graphics
driver 101.3268 on Windows, when querying driver version from sycl.
Prior to this driver, it was 0. Now we can bump minimum requirement to
this one and filter-out devices returning 0.

Maniphest Tasks: T100648
2022-08-31 15:33:16 +02:00
658ff994c5 Merge branch 'blender-v3.3-release' 2022-08-29 19:21:49 +02:00
805d1063a0 Cycles: Remove "return" and "assert" from oneAPI kernel code 2022-08-29 19:18:50 +02:00
48e1a66af0 Merge branch 'blender-v3.3-release' 2022-08-29 18:21:56 +02:00
1cd8ca49f9 Cycles: Increased minimum supported driver for Windows in oneAPI 2022-08-29 18:10:56 +02:00
d4764a385a Merge branch 'blender-v3.3-release' 2022-08-25 11:50:55 +02:00
9c2bc57cbd Fix Cycles oneAPI for a newer DPC++ compiler version 2022-08-25 11:50:22 +02:00
9961aae1e6 Merge branch 'blender-v3.3-release' 2022-08-18 20:31:34 +02:00
e11c899e71 Cycles: disable Metal inlining optimization on Apple GPUs
This gave a 1.1x speedup, however also leads to very long compile times
that make it seems like Blender has stopped working.

This can be brought back in the future behind an option that users can
explicitly enabled.

Fix T100102

Ref D14923, D14763, T92212
2022-08-18 20:01:29 +02:00
3aeacb9ab3 Merge branch 'blender-v3.3-release' 2022-08-15 13:53:42 +02:00
c2c019dda8 Fix Cycles MetalRT compile error 2022-08-13 19:55:38 +02:00
1988665c3c Cleanup: make vector types make/print functions consistent between CPU and GPU
Now all the same ones are available on CPU and GPU, which was previously not
possible due to lack of operator overloadng in OpenCL. Print functions are
no-ops on some GPUs.

Ref D15535
2022-08-09 16:07:23 +02:00
47f433c776 Merge branch 'blender-v3.3-release' 2022-08-08 11:00:10 -03:00
1382514bf2 Fix: Error in oneAPI image code for texture access with clip extension 2022-08-08 10:47:11 +02:00
fafd1ab9d3 Merge branch 'blender-v3.3-release' 2022-08-05 19:49:12 +02:00
fa514564b0 Fix T99201: Cycles render difference with 3D hair curves between OptiX and Emrbee
It should consistently use the Cycles pirmitive ID for self intersection detection,
not the one from the OptiX or Embree acceleration structure.

Differential Revision: https://developer.blender.org/D15632
2022-08-05 15:03:47 +02:00
19a7a013ce Merge branch 'blender-v3.3-release' 2022-08-01 14:37:16 +02:00
76169472d3 Cycles: Resolve recent performance regression in oneAPI implementation for Intel® Arc™ GPUs
Recently, performance with oneAPI have regressed due some recent
changes in Blender itself. This commit's changes is resolving this
and also improve compilation time for oneAPI backend first
execution (or Blender compilation time in case of AoT).

Regression have appeared after 5152c7c152 and not related to the
changes itself, but increase of kernels complexity introduced with
it. Changes in this commit is marking some Blender functions as
noinlined for oneAPI backend, which helps GPU compiler to deal with
this complexity without any negative side-effects on performance.
2022-08-01 12:45:34 +02:00
79ab76e156 Cleanup: simplifications and consistency for vector types
* OneAPI: remove separate float3 definition
* OneAPI: disable operator[] to match other GPUs
* OneAPI: make int3 compact to match other GPUs
* Use #pragma once
* Add __KERNEL_NATIVE_VECTOR_TYPES__ to simplify checks
* Remove unused vector3
2022-07-28 21:27:13 +02:00
38af5b0501 Cycles: switch Cycles triangle barycentric convention to match Embree/OptiX
Simplifies intersection code a little and slightly improves precision regarding
self intersection.

The parametric texture coordinate in shader nodes is still the same as before
for compatibility.
2022-07-27 21:03:33 +02:00
cd47d1b2ed Fix broken BVH2 on CPU after recent changes
Runtime switching between Embree and BVH2 got lost.
2022-07-27 20:58:02 +02:00
d706d0460c Cycles oneAPI: simplify num_concurrent_states selection
The number of Execution Units and resident "threads" (simd width * threads
per EUs) are now exposed and used to select the number of states using
a simplified heuristic.
2022-07-27 09:45:33 +02:00
f1f89ca751 Cleanup: spelling in comments 2022-07-26 13:21:21 +10:00
4cf6524731 Fix Cycles Metal build errors after recent changes
float8 is a reserved type in Metal, but is not implemented. So rename to
float8_t for now.

Also move back intersection handlers to kernel.metal, they can't be in the
class that encapsulates the other Metal kernel functions.
2022-07-26 00:17:37 +02:00
f26aa186b2 Cleanup: remove __KERNEL_CPU__
This was tested in some places to check if code was being compiled for the
CPU, however this is only defined in the kernel. Checking __KERNEL_GPU__
always works.
2022-07-25 17:43:35 +02:00
7a74d91e32 Cleanup: move device BVH code to kernel/device/*/bvh.h
Having the OptiX/MetalRT/Embree/MetalRT implementations all in one file with
many #ifdefs became too confusing. Instead split it up per device, and also
move it together with device specific hit/filter/intersect functions and
associated data types.
2022-07-25 16:34:22 +02:00
484ad31653 Cycles: simplify handling of ray distance in GPU rendering
All our intersections functions now work with unnormalized ray direction,
which means we no longer need to transform ray distance between world and
object space, they can all remain in world space.

There doesn't seem to be any real performance difference one way or the
other, but it does simplify the code.
2022-07-25 13:27:40 +02:00
5152c7c152 Cycles: refactor rays to have start and end distance, fix precision issues
For transparency, volume and light intersection rays, adjust these distances
rather than the ray start position. This way we increment the start distance
by the smallest possible float increment to avoid self intersections, and be
sure it works as the distance compared to be will be exactly the same as
before, due to the ray start position and direction remaining the same.

Fix T98764, T96537, hair ray tracing precision issues.

Differential Revision: https://developer.blender.org/D15455
2022-07-15 18:46:24 +02:00
bb376da6df Fix Cycles MetalRT error after recent specialization changes 2022-07-15 18:28:13 +02:00
da4ef05e4d Cycles: Apple Silicon optimization to specialize intersection kernels
The Metal backend now compiles and caches a second set of kernels which are
optimized for scene contents, enabled for Apple Silicon.

The implementation supports doing this both for intersection and shading
kernels. However this is currently only enabled for intersection kernels that
are quick to compile, and already give a good speedup. Enabling this for
shading kernels would be faster still, however this also causes a long wait
times and would need a good user interface to control this.

M1 Max samples per minute (macOS 13.0):

                    PSO_GENERIC  PSO_SPECIALIZED_INTERSECT  PSO_SPECIALIZED_SHADE

barbershop_interior       83.4	            89.5                   93.7
bmw27                   1486.1	          1671.0                 1825.8
classroom                175.2	           196.8                  206.3
fishy_cat                674.2	           704.3                  719.3
junkshop                 205.4	           212.0                  257.7
koro                     310.1	           336.1                  342.8
monster                  376.7	           418.6                  424.1
pabellon                 273.5	           325.4                  339.8
sponza                   830.6	           929.6                 1142.4
victor                    86.7              96.4                   96.3
wdas_cloud               111.8	           112.7                  183.1

Code contributed by Jason Fielder, Morteza Mostajabodaveh and Michael Jones

Differential Revision: https://developer.blender.org/D14645
2022-07-15 13:40:04 +02:00
47dd42485e Cycles: fix and enable JIT oneAPI CentOS7 builds for drivers 23570+
The current specific CentOS7 workaround we have for AoT, which is to
disable __FAST_MATH__ by using -fhonor-nans, now also fixes the
compilation issue for JIT as well since at least driver 23570.
2022-07-12 15:55:32 +02:00
0f50ae131f Cycles: enable oneAPI in Linux release builds
with a very high min-driver version requirement, placeholder until JIT
CentOS runtime compilation issue gets fixed in a defined version.
min-driver version check can be worked around by setting
CYCLES_ONEAPI_ALL_DEVICES environment variable.
2022-07-08 15:39:13 +02:00
190ad73590 Cycles oneAPI: Remove direct dependency on Level-Zero
We used it only to access device id for explicitly allowing Arc GPUs.
It made the backend require ze_loader.dll which could be problematic if
we end up using direct linking.
I've replaced filtering based on PCI device id by using other HW properties
instead (EUs, threads per EU), that are now available through Level-Zero.
2022-07-06 18:55:38 +02:00
debb233787 Cleanup: fix comments in oneAPI kernel.cpp 2022-07-06 18:55:38 +02:00
0df574b55e Cycles: Improve an occupancy for Intel GPUs
Initially oneAPI implementation have waited after each memory
operation, even if there was no need for this. Now, the implementation
will wait only if it is really necessary - it have improved
performance noticeble for some scenes and a bit for the rest of them.
2022-07-06 17:26:23 +02:00
41c10ac84a Cycles: fix support for multiple Intel GPUs
Identical Intel GPUs ended up with the same id.
Added PCI BDF to the id to make it unique.
2022-07-01 11:20:00 +02:00
0554537c3c Cleanup: add missing license headers in Cycles oneAPI implementation 2022-07-01 10:13:07 +02:00
b6c28002ac Cleanup: spelling in comments 2022-06-30 12:14:22 +10:00
a02992f131 Cycles: Add support for rendering on Intel GPUs using oneAPI
This patch adds a new Cycles device with similar functionality to the
existing GPU devices.  Kernel compilation and runtime interaction happen
via oneAPI DPC++ compiler and SYCL API.

This implementation is primarly focusing on Intel® Arc™ GPUs and other
future Intel GPUs.  The first supported drivers are 101.1660 on Windows
and 22.10.22597 on Linux.

The necessary tools for compilation are:
- A SYCL compiler such as oneAPI DPC++ compiler or
  https://github.com/intel/llvm
- Intel® oneAPI Level Zero which is used for low level device queries:
  https://github.com/oneapi-src/level-zero
- To optionally generate prebuilt graphics binaries: Intel® Graphics
  Compiler All are included in Linux precompiled libraries on svn:
  https://svn.blender.org/svnroot/bf-blender/trunk/lib The same goes for
  Windows precompiled binaries but for the graphics compiler, available
  as "Intel® Graphics Offline Compiler for OpenCL™ Code" from
  https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html,
  for which path can be set as OCLOC_INSTALL_DIR.

Being based on the open SYCL standard, this implementation could also be
extended to run on other compatible non-Intel hardware in the future.

Reviewed By: sergey, brecht

Differential Revision: https://developer.blender.org/D15254

Co-authored-by: Nikita Sirgienko <nikita.sirgienko@intel.com>
Co-authored-by: Stefan Werner <stefan.werner@intel.com>
2022-06-29 12:58:04 +02:00
Sayak Biswas
abfa09752f Cycles: enable Vega GPU/APU support
Enables Vega and Vega II GPUs as well as Vega APU, using changes in HIP code
to support 64-bit waves and a new HIP SDK version.

Tested with Radeon WX9100, Radeon VII GPUs and Ryzen 7 PRO 5850U with Radeon
Graphics APU.

Ref T96740, T91571

Differential Revision: https://developer.blender.org/D15242
2022-06-28 18:35:43 +02:00
ff1883307f Cleanup: renaming and consistency for kernel data
* Rename "texture" to "data array". This has not used textures for a long time,
  there are just global memory arrays now. (On old CUDA GPUs there was a cache
  for textures but not global memory, so we used to put all data in textures.)
* For CUDA and HIP, put globals in KernelParams struct like other devices.
* Drop __ prefix for data array names, no possibility for naming conflict now that
  these are in a struct.
2022-06-20 12:30:48 +02:00
Michael Jones
4412e14708 Cycles: Useful Metal backend debug & profiling functionality
This patch adds some useful debugging & profiling env vars to the Metal backend:

- `CYCLES_METAL_PROFILING`: output a per-kernel timing report at the end of the render
- `CYCLES_METAL_DEBUG`: enable per-dispatch tracing (very verbose)
- `CYCLES_DEBUG_METAL_CAPTURE_KERNEL`: enable programatic .gputrace capture for a specified kernel index

Here's an example of the timing report with `CYCLES_METAL_PROFILING` enabled:

```
---------------------------------------------------------------------------------------------------
Kernel name                                 Total threads   Dispatches     Avg. T/D    Time   Time%
---------------------------------------------------------------------------------------------------
integrator_init_from_camera                   657,407,232          161    4,083,274   0.24s   0.51%
integrator_intersect_closest                1,629,288,440          681    2,392,494  15.18s  32.12%
integrator_intersect_shadow                   751,652,291          470    1,599,260   5.80s  12.28%
integrator_shade_background                   304,612,074          263    1,158,220   1.16s   2.45%
integrator_shade_surface                    1,159,764,041          676    1,715,627  20.57s  43.52%
integrator_shade_shadow                       598,885,847          418    1,432,741   1.27s   2.69%
integrator_queued_paths_array               2,969,650,130          805    3,689,006   0.35s   0.74%
integrator_queued_shadow_paths_array          593,936,619          379    1,567,115   0.14s   0.29%
integrator_terminated_paths_array              22,205,417          155      143,260   0.05s   0.10%
integrator_sorted_paths_array               2,517,140,043          676    3,723,579   1.65s   3.50%
integrator_compact_paths_array                648,912,748          155    4,186,533   0.03s   0.07%
integrator_compact_states                      20,872,687          155      134,662   0.14s   0.29%
integrator_terminated_shadow_paths_array      374,100,675          438      854,111   0.16s   0.33%
integrator_compact_shadow_paths_array         503,768,657          438    1,150,156   0.05s   0.10%
integrator_compact_shadow_states               37,664,941          202      186,460   0.23s   0.50%
integrator_reset                               25,165,824            6    4,194,304   0.06s   0.12%
film_convert_combined_half_rgba                 3,110,400            6      518,400   0.00s   0.01%
prefix_sum                                            676          676            1   0.19s   0.40%
---------------------------------------------------------------------------------------------------
                                                                 6,760               47.27s 100.00%
---------------------------------------------------------------------------------------------------
```

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D15044
2022-06-07 11:08:39 +01:00