Commit Graph

903 Commits

Author SHA1 Message Date
04857cc8ef Cycles: fully decouple triangle and curve primitive storage from BVH2
Previously the storage here was optimized to avoid indirections in BVH2
traversal. This helps improve performance a bit, but makes performance
and memory usage of Embree and OptiX BVHs a bit worse also. It also adds
code complexity in other parts of the code.

Now decouple triangle and curve primitive storage from BVH2.
* Reduced peak memory usage on all devices
* Bit better performance for OptiX and Embree
* Bit worse performance for CUDA
* Simplified code:
** Intersection.prim/object now matches ShaderData.prim/object
** No more offset manipulation for mesh displacement before a BVH is built
** Remove primitive packing code and flags for Embree and OptiX
** Curve segments are now stored in a KernelCurve struct
* Also happens to fix a bug in baking with incorrect prim/object

Fixes T91968, T91770, T91902

Differential Revision: https://developer.blender.org/D12766
2021-10-06 17:52:04 +02:00
6e268a749f Fix adaptive sampling artifacts on tile boundaries
Implement an overscan support for tiles, so that adaptive sampling can
rely on the pixels neighbourhood.

Differential Revision: https://developer.blender.org/D12599
2021-10-05 16:19:14 +02:00
74f45ed9c5 Cleanup: spelling in comments 2021-10-03 12:13:29 +11:00
a754e35198 Cycles: refactor API for GPU display
* Split GPUDisplay into two classes. PathTraceDisplay to implement the Cycles side,
  and DisplayDriver to implement the host application side. The DisplayDriver is now
  a fully abstract base class, embedded in the PathTraceDisplay.
* Move copy_pixels_to_texture implementation out of the host side into the Cycles side,
  since it can be implemented in terms of the texture buffer mapping.
* Move definition of DeviceGraphicsInteropDestination into display driver header, so
  that we do not need to expose private device headers in the public API.
* Add more detailed comments about how the DisplayDriver should be implemented.

The "driver" terminology might not be obvious, but is also used in other renderers.

Differential Revision: https://developer.blender.org/D12626
2021-09-30 20:48:08 +02:00
6dceaafe5a Cleanup: trailing space, newlines at EOF 2021-09-29 07:30:34 +10:00
86ec9d79ec Fix build without Cycles HIP device 2021-09-28 20:00:55 +02:00
Brian Savery
044a77352f Cycles: add HIP device support for AMD GPUs
NOTE: this feature is not ready for user testing, and not yet enabled in daily
builds. It is being merged now for easier collaboration on development.

HIP is a heterogenous compute interface allowing C++ code to be executed on
GPUs similar to CUDA. It is intended to bring back AMD GPU rendering support
on Windows and Linux.

https://github.com/ROCm-Developer-Tools/HIP.

As of the time of writing, it should compile and run on Linux with existing
HIP compilers and driver runtimes. Publicly available compilers and drivers
for Windows will come later.

See task T91571 for more details on the current status and work remaining
to be done.

Credits:

Sayak Biswas (AMD)
Arya Rafii (AMD)
Brian Savery (AMD)

Differential Revision: https://developer.blender.org/D12578
2021-09-28 19:18:55 +02:00
2189dfd6e2 Cycles: Rework OptiX visibility flags handling
Before the visibility test against the visibility flags was performed in an any-hit program in OptiX
(called `__anyhit__kernel_optix_visibility_test`), which was using the `__prim_visibility` array.
This is not entirely correct however, since `__prim_visibility` is filled with the merged visibility
flags of all objects that reference that primitive, so if one object uses different visibility flags
than another object, but they both are instances of the same geometry, they would appear the same
way. The reason that the any-hit program was used rather than the OptiX instance visibility mask is
that the latter is currently limited to 8 bits only, which is not sufficient to contain all Cycles
visibility flags (12 bits).

To mostly fix the problem with multiple instances and different visibility flags, I changed things to
use the OptiX instance visibility mask for a subset of the Cycles visibility flags (`PATH_RAY_CAMERA`
to `PATH_RAY_VOLUME_SCATTER`, which fit into 8 bits) and only fall back to the visibility test any-hit
program if that isn't enough (e.g. the ray visibility mask exceeds 8 bits or when using the built-in
curves from OptiX, since the any-hit program is then also used to skip the curve endcaps).

This may also improve performance in some cases, since by default OptiX can now perform the normal
scene intersection trace calls entirely on RT cores without having to jump back to the SM on every
hit to execute the any-hit program.

Fixes T89801

Differential Revision: https://developer.blender.org/D12604
2021-09-27 17:12:43 +02:00
a6b53ef994 Cycles: print name of kernels on errors in CUDA queue, for debugging 2021-09-27 15:24:12 +02:00
ab8f24811d Cleanup: remove unused device code and includes 2021-09-24 16:34:14 +02:00
d7f803f522 Fix T91641: crash rendering with 16k environment map in Cycles
Protect against integer overflow.
2021-09-23 17:48:16 +02:00
4d66cbd140 Cleanup: spelling in comments 2021-09-22 14:54:01 +10:00
0803119725 Cycles: merge of cycles-x branch, a major update to the renderer
This includes much improved GPU rendering performance, viewport interactivity,
new shadow catcher, revamped sampling settings, subsurface scattering anisotropy,
new GPU volume sampling, improved PMJ sampling pattern, and more.

Some features have also been removed or changed, breaking backwards compatibility.
Including the removal of the OpenCL backend, for which alternatives are under
development.

Release notes and code docs:
https://wiki.blender.org/wiki/Reference/Release_Notes/3.0/Cycles
https://wiki.blender.org/wiki/Source/Render/Cycles

Credits:
* Sergey Sharybin
* Brecht Van Lommel
* Patrick Mours (OptiX backend)
* Christophe Hery (subsurface scattering anisotropy)
* William Leeson (PMJ sampling pattern)
* Alaska (various fixes and tweaks)
* Thomas Dinges (various fixes)

For the full commit history, see the cycles-x branch. This squashes together
all the changes since intermediate changes would often fail building or tests.

Ref T87839, T87837, T87836
Fixes T90734, T89353, T80267, T80267, T77185, T69800
2021-09-21 14:55:54 +02:00
93eb460dd0 Cleanup: clang-format (re-run after v12 version bump) 2021-07-30 16:19:19 +10:00
073bf8bf52 Cycles: remove WITH_CYCLES_DEBUG, add WITH_CYCLES_DEBUG_NAN
WITH_CYCLES_DEBUG was used for rendering BVH debugging passes. But since we
mainly use Embree an OptiX now, this information is no longer important.

WITH_CYCLES_DEBUG_NAN will enable additional checks for NaNs and invalid values
in the kernel, for Cycles developers. Previously these asserts where enabled in
all debug builds, but this is too likely to crash Blender in scenes that render
fine regardless of the NaNs. So this is behind a CMake option now.

Fixes T90240
2021-07-28 19:27:57 +02:00
cf74cd9367 Cycles: upgrade CUDA to 11.4
This fixes a performance regression on Ampere cards, on specific scenes like
classroom. For cycles-x there is little difference, but this is still helpful
for LTS releases, and we need to upgrade at some point anyway.
2021-07-26 19:46:51 +02:00
f1e4903854 Cleanup: full sentences in comments, improve comment formatting 2021-06-26 21:50:48 +10:00
4b9ff3cd42 Cleanup: comment blocks, trailing space in comments 2021-06-24 15:59:34 +10:00
cd39e3dec1 OptiX: select BVH build options from Scene params
Currently, the OptiX BVH build options are selected based on whether
we are in background mode (final renders) or not (viewport renders).
In background mode, the BVH is built for fast path tracing and low
memory footprint, while in viewport, it is built for fast updates.

However, on platforms without OpenGL support, the background flag is
always set to true and prevents using fast BVH builds in the viewport.

Now, the BVH options derive from the Scene BVH settings:
* if BVH is static, a fast to trace BVH is built
* if BVH is dynamic, a fast to update BVH is built

Reviewed By: #cycles, brecht

Differential Revision: https://developer.blender.org/D11154
2021-06-22 07:38:28 +02:00
b046bc536b Fix T88096: Baking with OptiX and displacement fails
Using displacement runs the shader eval kernel, but since OptiX modules are not loaded when
baking is active, those were not available and therefore failed to launch. This fixes that by falling
back to the CUDA kernels.
2021-05-25 16:56:16 +02:00
ff51c2e89a Cleanup: Use named unused arguments in Cycles Device 2021-05-21 11:19:33 +02:00
0745afeddb Merge remote-tracking branch 'origin/blender-v2.93-release' 2021-05-20 13:00:07 +02:00
0456223cde Fix T87793: Cycles OptiX crash hiding objects in viewport render 2021-05-19 18:30:43 +02:00
3e472d87a8 Cycles OpenCL: disable AO preview kernels
These seem to be causing some stability issues, and really are just not that
useful in practice. Compiling them is slow already, so it does not improve
the user experience much to show an AO preview if it's not nearly instant.
2021-05-19 18:30:43 +02:00
542b8da831 Merge branch 'blender-v2.93-release' 2021-05-17 20:18:39 +02:00
91a5dbbd17 Fix OpenCL group size performance issue on Intel GPUs
Contributed by Intel. On some scenes like classroom with particular integrated
GPUs this speeds up rendering 1.97x. With other benchmarks and GPUs it's
between 0.99-1.14x.
2021-05-17 19:40:57 +02:00
94960250b5 Cycles: Fix build with OptiX 7.3 SDK 2021-04-26 14:55:39 +02:00
7cbd66d42f Cycles: Initialize all OptiX structs to zero before use
This is done to ensure building with newer OptiX SDK releases that add new struct fields gives
deterministic results (no uninitialized fields and therefore random data is passed to OptiX).
2021-04-13 13:56:15 +02:00
f1fe42d912 Cycles: Do not allocate tile buffers on all devices when peer memory is active and denoising is not
Separate tile buffers on all devices only need to exist when denoising is active (so any overlap
being rendered simultaneously does not write to the same memory region).
When denoising is not active they can be distributed like all other memory when peer
memory support is available.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D10858
2021-03-30 14:04:56 +02:00
91c44fe885 Cycles: disable NanoVDB for AMD OpenCL
It is causing issue with AMD OpenCL drivers, due to a potential driver bug.

Ref T84461
2021-03-30 00:00:17 +02:00
8f93386e62 Fix (apparently harmless) Cycles asan warnings 2021-03-15 20:46:57 +01:00
f4f8b6dde3 Cycles: Change device-only memory to actually only allocate on the device
This patch changes the `MEM_DEVICE_ONLY` type to only allocate on the device and fail if
that is not possible anymore because out-of-memory (since OptiX acceleration structures may
not be allocated in host memory). It also fixes high peak memory usage during OptiX
acceleration structure building.

Reviewed By: brecht

Maniphest Tasks: T85985

Differential Revision: https://developer.blender.org/D10535
2021-03-11 14:12:35 +01:00
17e1e2bfd8 Cleanup: correct spelling in comments 2021-02-05 16:23:34 +11:00
b2e00e8f8e Merge branch 'blender-v2.92-release' 2021-01-29 13:35:21 +01:00
9f89166b52 Fix T85148: OptiX viewport denoising regression
Commit 6e74a8b69f changed the denoiser input passes default to
include the normal pass. This does not always produce optimal images though, hence why the
default was previously set to only include the color and albedo passes. This restores that behavior, so
that viewport denoising with OptiX produces the same results as before.
2021-01-29 13:35:00 +01:00
9b80291412 Merge branch 'blender-v2.92-release' 2021-01-27 15:29:39 +01:00
James Horsley
4fbeb3e6be Fix T85089: Crash when rendering scene that does not fit into GPU memory with CUDA/OptiX
The "cuda_mem_map_mutex" was potentially being locked recursively during the call to
"CUDADevice::move_textures_to_host", which crashed. This moves around the locking and
unlocking of "cuda_mem_map_mutex", so that it doesn't call a function that locks it while
still holding the lock.

Reviewed By: pmoursnv

Maniphest Tasks: T85089, T84734

Differential Revision: https://developer.blender.org/D10219
2021-01-27 15:27:57 +01:00
bbe6d44928 Cycles: optimize device updates
This optimizes device updates (during user edits or frame changes in
the viewport) by avoiding unnecessary computations. To achieve this,
we use a combination of the sockets' update flags as well as some new
flags passed to the various managers when tagging for an update to tell
exactly what the tagging is for (e.g. shader was modified, object was
removed, etc.).

Besides avoiding recomputations, we also avoid resending to the devices
unmodified data arrays, thus reducing bandwidth usage. For OptiX and
Embree, BVH packing was also multithreaded.

The performance improvements may vary depending on the used device (CPU
or GPU), and the content of the scene. Simple scenes (e.g. with no adaptive
subdivision or volumes) rendered using OptiX will benefit from this work
the most.

On average, for a variety of animated scenes, this gives a 3x speedup.

Reviewed By: #cycles, brecht

Maniphest Tasks: T79174

Differential Revision: https://developer.blender.org/D9555
2021-01-22 16:08:25 +01:00
10d2cbfa36 Fix T84872: OptiX GPU + CPU rendering uses branched path samples
Branched path tracing is not supported for OptiX, and it would still use the
number of AA samples from there when branched path was enabled by the user
earlier but auto disabled and hidden in the UI when using OptiX.

Ref D10159
2021-01-20 14:59:23 +01:00
0d8948387e Cycles: Fix missing OpenCL extensions in certain cases
If extensions string is longer than 1024 then the old code would have
reported empty string instead of extensions.

Now the code does dynamic string allocation to store result of request,
similar to what is done in `OpenCLInfo::get_hardware_id`.

The code looks a bit ugly, but it didn't really change much with this
patch. In other words, the code can become more modern and clear, but
it is considered to be outside of the scope of this change.

Differential Revision: https://developer.blender.org/D10135
2021-01-18 15:47:00 +01:00
3732508c64 Fix T84745: build error with TBB 2021
task_group::is_canceling() was removed.
2021-01-15 17:29:36 +01:00
Lukas Stockner
688e5c6d38 Fix T82351: Cycles: Tile stealing glitches with adaptive sampling
In my testing this works, but it requires me to remove the min(start_sample...) part in the
adaptive sampling kernel, and I assume there's a reason why it was there?

Reviewed By: brecht

Maniphest Tasks: T82351

Differential Revision: https://developer.blender.org/D9445
2021-01-11 21:04:49 +01:00
c66f00dc26 Fix Cycles rendering with OptiX after instance limit increase when building with old SDK
Commit d259e7dcfb increased the instance limit, but only provided
a fall back for the host code for older OptiX SDKs, not for kernel code. This caused a mismatch when
an old SDK was used (as is currently the case on buildbot) and subsequent rendering artifacts. This
fixes that by moving the bit that is checked to a common location that works with both old an new
SDK versions.
2021-01-08 13:38:26 +01:00
d259e7dcfb Cycles: Increase instance limit for OptiX acceleration structure building
For a while now OptiX had support for 28-bits of instance IDs, instead of the initial 24-bits (see also
value reported by OPTIX_DEVICE_PROPERTY_LIMIT_MAX_INSTANCE_ID). This change makes use of
that and also adds an error reported when the number of instances an OptiX acceleration structure is
created with goes beyond the limit, to make this clear instead of just rendering an image with artifacts.

Manifest Tasks: T81431
2021-01-07 19:23:13 +01:00
3373d14b1b Fix T83925: Crash when rendering on the CPU with OptiX denoiser enabled
Rendering on the CPU uses the Embree BVH layout, whether the OptiX denoiser is enabled or not.
This means the "build_bvh" function gets a "BVHEmbree" object to fill and not a "BVHMulti" as it
was assuming before, which caused crashes due to memory geting overwritten incorrectly. This
fixes that by redirecting Embree BVH builds to the Embree device.

Manifest Tasks: T83925
2021-01-05 18:37:31 +01:00
c4f8aedbc2 Fix T84016: Cycles baking crash with OptiX after recent changes
This worked for CPU + GPU, but not GPU only.
2020-12-24 12:59:35 +01:00
b2edc716c1 Fix Cycles OptiX runtime compilation broken after shader raytracing
Need to pass the appropriate flags as we do for compilation as part of the
CMake build.
2020-12-22 15:08:59 +01:00
001f2c5d50 Cleanup: spelling 2020-12-15 12:34:25 +11:00
f762d37790 Cycles: enable OpenCL rendering on recent Intel GPUs
Based on testing by Intel, rendering on Iris GPUs and upcoming Xe GPUs
should work. This is enabled on Windows and Linux.

More testing is needed to verify correctness and performance in production
scenes, but our basic benchmark files seem to give correct results.
2020-12-11 17:37:54 +01:00
c6626a2f8a Cleanup: compiler warnings
If you mark one function as override in a class, all must be marked.
2020-12-11 17:37:31 +01:00