The NanoVDB sampling implementation behaves different from dense texture sampling, so this
adds a small offset to the voxel indices to correct for that.
Also removes the need to modify the sampling coordinates by moving all the necessary
transformations into the image transform. See also T81454.
Cycles defines some basic integer types since it cannot use the standard headers when
compiling with NVRTC. NanoVDB however only does this when the "__CUDACC_RTC__" define
is set and otherwise includes the standard "stdint.h" header which clashes with those typedefs.
So for compatibility do the same thing in the Cycles kernel headers. See also T81454.
The existing code for this was incomplete. Each instance can now have a set
of attributes stored separately from geometry attributes. Geometry attributes
take precedence over instance attributes.
Ref D2057
This avoids OpenCL inlining heavy volume interpolation code once for every
data type, which could cause a performance regression when we add a float4
data type in the next commit.
Ref D2057
Corrects incorrect usage of contraction for 'it is', when possessive 'its' was required.
Differential Revision: https://developer.blender.org/D9250
Reviewed by Campbell Barton
With this patch the build system checks whether the "CUDA10_NVCC_EXECUTABLE" CMake
variable is set and if so will use that to build sm_30 kernels. Similarily for sm_8x kernels it
checks "CUDA11_NVCC_EXECUTABLE". All other kernels are built using the default CUDA
toolkit. This makes it possible to use either the CUDA 10 or CUDA 11 toolkit by default and
only selectively use the other for the kernels where its a hard requirement.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D9179
NanoVDB is a platform-independent sparse volume data structure that makes it possible to
use OpenVDB volumes on the GPU. This patch uses it for volume rendering in Cycles,
replacing the previous usage of dense 3D textures.
Since it has a big impact on memory usage and performance and changes the OpenVDB
branch used for the rest of Blender as well, this is not enabled by default yet, which will
happen only after 2.82 was branched off. To enable it, build both dependencies and Blender
itself with the "WITH_NANOVDB" CMake option.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D8794
Running Blender on Ampere cards was already possible with ptx, this fix is
needed to support building CUDA binaries.
Note the CUDA version used for official Blender builds is still 10, this is
merely the change to make it possible for those using CUDA 11 and specifying
the sm_8x kernels to be compiled.
Found by Milan Jaros.
On user level this fixes dead-lock of OpenCL render on Intel Iris GPUs.
Note that this patch does not include change in the logic which allows
or disallows OpenCL platforms to be used, that will happen after the
kernel fix is known to be fine for the currently officially supported
platforms.
The dead-lock was caused by wrong usage of memory barriers: as per the
OpenCL specification the barrier is to be executed by the entire work
group. This means, that the following code is invalid:
void foo() {
if (some_condition) {
return;
}
barrier(CLK_LOCAL_MEM_FENCE);
}
void bar() {
foo();
}
The Cycles code was mentioning this as an invalid code on CPU, while in
fact this is invalid as per specification. From the implementation side
this change removes the ifdefs around the CPU-only barrier logic, and
is implementing similar logic in the shader setup kernel.
Tested on NUC8i7HVK NUC.
The root cause of the dead-lock was identified by Max Dmitrichenko.
There is no measurable difference in performance of currently supported
OpenCL platforms.
Differential Revision: https://developer.blender.org/D9039
* Support precompiled libraries on Linux
* Add license headers
* Refactoring to deduplicate code
Includes work by Ray Molenkamp and Grische for precompiled libraries.
Ref D8769
The current 1D Voronoi implementation for the Distance to Edge option
computes the distance to the cells instead. This patch fixes that and
compute the distance to the edge.
Reviewed By: JacquesLucke, brecht
Differential Revision: https://developer.blender.org/D8634
We forgot to update this code as part of D3108. I'd like to include this in 2.90,
it's entirely broken now so can't really get any worse.
Differential Revision: https://developer.blender.org/D8738
The return value of scene_intersect must be checked, the isect struct members
can't be assumed to be initialized if that returns false.
Differential Revision: https://developer.blender.org/D8692
The fisheye camera setup causes the edges of the image to not shoot primary rays. This was not
respected by OptiX because of an optimization that tried to reduce conditionals around trace calls.
Removing that does not seem to have an impact on performance anymore however and it fixes
the issue.
The sky will appear brighter than before by default. To compensate for this,
lower exposure in the Film panel. The default altitude was also changed from
90 to 15 degrees.
Patch contributed by Marco with the help of Ryan Jones.
Differential Revision: https://developer.blender.org/D8285
This patch changes the discovery of pre-compiled kernels, to look for any PTX, even if
it does not match the current architecture version exactly. It works because the driver can
JIT-compile PTX generated for architectures less than or equal to the current one.
This e.g. makes it possible to render on a new GPU architecture even if no pre-compiled
binary kernel was distributed for it as part of the Blender installation.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D8332
The problem here was numerical precision: The code calculates the angle between
sun and view direction, and the usual acos(dot(a, b)) approach for that has
poor numerical performance for almost parallel angles.
As a result, the generally tiny difference between floating point computation
between CPU and GPU was enough to make the sun vanish at different radii,
causing different results.
The new version fixes the difference by making the computation much more robust
on both platforms.
This patch adds support for the curve primitive from OptiX to Cycles. It's currently hidden
behind a debug option, since there can be some slight rendering differences still (because no
backface culling is performed and something seems off with endcaps). The curve primitive
was added with the OptiX 7.1 SDK and requires a r450 driver or newer, so this also updates
the codebase to be able to build with the new SDK.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D8223
For GPU debugging purposes, it is still possible to render with the same BVH2
on the CPU using the Debug panel in the render properties.
Note that building Blender without Embree will now lead to significantly reduced
performance in CPU rendering, and a few of the Cycles regression tests will fail
due to small pixel differences.
Ref T73778
Depends on D8014
Maniphest Tasks: T73778
Differential Revision: https://developer.blender.org/D8015
Also removing the curve system manager which only stored a few curve intersection
settings. These are all changes towards making shape and subdivision settings
per-object instead of per-scene, but there is more work to do here.
Ref T73778
Depends on D8013
Maniphest Tasks: T73778
Differential Revision: https://developer.blender.org/D8014
This keeps render results compatible for combined CPU + GPU rendering.
Peformance and quality primitives is quite different than before. There
are now two options:
* Rounded Ribbon: render hair as flat ribbon with (fake) rounded normals, for
fast rendering. Hair curves are subdivided with a fixed number of user
specified subdivisions.
This gives relatively good results, especially when used with the Principled
Hair BSDF and hair viewed from a typical distance. There are artifacts when
viewed closed up, though this was also the case with all previous primitives
(but different ones).
* 3D Curve: render hair as 3D curve, for accurate results when viewing hair
close up. This automatically subdivides the curve until it is smooth.
This gives higher quality than any of the previous primitives, but does come
at a performance cost and is somewhat slower than our previous Thick curves.
The main problem here is performance. For CPU and OpenCL rendering performance
seems usually quite close or better for similar quality results.
However for CUDA and Optix, performance of 3D curve intersection is problematic,
with e.g. 1.45x longer render time in Koro (though there is no equivalent quality
and rounded ribbons seem fine for that scene). Any help or ideas to optimize this
are welcome.
Ref T73778
Depends on D8012
Maniphest Tasks: T73778
Differential Revision: https://developer.blender.org/D8013