Commit Graph

1140 Commits

Author SHA1 Message Date
a99022e22d Cleanup: spelling in comments 2023-02-07 14:17:01 +11:00
Michael Jones
654e1e901b Cycles: Use local atomics for faster shader sorting (enabled on Metal)
This patch adds two new kernels: SORT_BUCKET_PASS and SORT_WRITE_PASS. These replace PREFIX_SUM and SORTED_PATHS_ARRAY on supported devices (currently implemented on Metal, but will be trivial to enable on the other backends). The new kernels exploit sort partitioning (see D15331) by sorting each partition separately using local atomics. This can give an overall render speedup of 2-3% depending on architecture. As before, we fall back to the original non-partitioned sorting when the shader count is "too high".

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16909
2023-02-06 11:18:26 +00:00
b454416927 Cycles: add non-uniform scaling to spot light size
Cycles ignores the size of spot lights, therefore the illuminated area doesn't match the gizmo. This patch resolves this discrepancy.
| Before (Cycles) | After (Cycles) | Eevee
|{F14200605}|{F14200595}|{F14200600}|
This is done by scaling the ray direction by the size of the cone. The implementation of `spot_light_attenuation()` in `spot.h` matches `spot_attenuation()` in `lights_lib.glsl`.
**Test file**:
{F14200728}

Differential Revision: https://developer.blender.org/D17129
2023-02-03 18:51:14 +01:00
e308b891c8 Cycles: Use faster and exact GGX VNDF sampling algorithm
Based on "Sampling the GGX Distribution of Visible Normals" by Eric Heitz
(https://jcgt.org/published/0007/04/01/).

Also, this removes the lambdaI computation from the Beckmann sampling code and
just recomputes it below. We already need to recompute for two other cases
(GGX and clearcoat), so this makes the code more consistent.

In terms of performance, I don't expect a notable impact since the earlier
computation also was non-trivial, and while it probably was slightly more
accurate, I'd argue that being consistent between evaluation and sampling is
more important than absolute numerical accuracy anyways.

Differential Revision: https://developer.blender.org/D17100
2023-01-24 17:59:29 +01:00
ce25e3e581 Cycles: Cleanup: Add general-purpose conversion between sin and cos 2023-01-24 17:59:29 +01:00
e8d1d1486e Fix T103960: build issue with GCC 13 in Cycles thread code 2023-01-18 16:43:47 +01:00
93ca4eeec1 Cleanup: remove unused functions 2023-01-16 19:40:25 +01:00
a84a8a528d Cycles: remove SSE3 and AVX kernel optimization levels
While keeping SSE2, SSE4.1 and AVX2. This does not affect hardware support, it
only slightly reduces performance for some older CPUs.

To reduce maintenance cost and improve compile times.

Differential Revision: https://developer.blender.org/D16978
2023-01-16 17:53:36 +01:00
317a5f61f0 Cycles: Fix recently introduced off-by-one error in an assert
This was added in rB95696d09bc07, but I got the index wrong.
2023-01-10 02:10:20 +01:00
95696d09bc Fix T88849: Motion blur in cycles leaves bright edge on trailing end of blur
The code that computes and inverts the shutter CDF had some issues that caused
the result to be asymmetric, this tweaks it to be more robust and produce
symmetric outputs for symmetric inputs.
2023-01-09 03:08:36 +01:00
Hallam Roberts
a501a2dbff Images: add mirror extension type
This adds a new mirror image extension type for shaders and
geometry nodes (next to the existing repeat, extend and clip
options).

See D16432 for a more detailed explanation of `wrap_mirror`.

This also adds a new sampler flag `GPU_SAMPLER_MIRROR_REPEAT`.
It acts as a modifier to `GPU_SAMPLER_REPEAT`, so any `REPEAT`
flag must be set for the `MIRROR` flag to have an effect.

Differential Revision: https://developer.blender.org/D16432
2022-12-14 19:27:29 +01:00
Leszek Godlewski
07d3a3962a Fix Cycles build in VS2022, use explicit two's complement in find_first_set()
Compiling Cycles in Visual Studio 2022 yields the error:
C4146: unary minus operator applied to unsigned type, result still unsigned

Replacing it with explicit two's complement achieves the same result as signed
negation but avoids the error.

Differential Revision: https://developer.blender.org/D16616
2022-12-07 18:34:57 +01:00
3124241256 Fix Cycles SSE4 define for fast math rint function.
Differential Revision: https://developer.blender.org/D16708
2022-12-06 19:06:43 +01:00
16b6116b9d Fix Cycles light tree render errors on Windows
Due to mistake in popcount implementation. Thanks to Weizhen for help
figuring this out.
2022-12-06 16:52:15 +01:00
c5e71cebaa Cycles: Remove OpenGL header
It is not really used from any of the sources, including the
standalone app. Since we are moving to a more backend-independent
drawing it makes sense to remove header which was specific to
how Blender integrates Cycles into viewport.

There is probably some cleanup in CMake files is possible, but
there is some inter-dependency with USD.

Differential Revision: https://developer.blender.org/D16681
2022-12-02 17:19:00 +01:00
e028662f78 Cycles: store axis and length of an area light instead of their product 2022-12-02 15:23:09 +01:00
Michael Jones
2c596319a4 Cycles: Cache only up to 5 kernels of each type on Metal
This patch adapts D14754 for the Metal backend. Kernels of the same type are already organised into subdirectories which simplifies type matching.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16469
2022-11-11 18:10:29 +00:00
e6b38deb9d Cycles: Add basic support for using OSL with OptiX
This patch  generalizes the OSL support in Cycles to include GPU
device types and adds an implementation for that in the OptiX
device. There are some caveats still, including simplified texturing
due to lack of OIIO on the GPU and a few missing OSL intrinsics.

Note that this is incomplete and missing an update to the OSL
library before being enabled! The implementation is already
committed now to simplify further development.

Maniphest Tasks: T101222

Differential Revision: https://developer.blender.org/D15902
2022-11-09 15:30:21 +01:00
Brecht Van Lommel
e1b3d91127 Refactor: replace Cycles sse/avx types by vectorized float4/int4/float8/int8
The distinction existed for legacy reasons, to easily port of Embree
intersection code without affecting the main vector types. However we are now
using SIMD for these types as well, so no good reason to keep the distinction.

Also more consistently pass these vector types by value in inline functions.
Previously it was partially changed for functions used by Metal to avoid having
to add address space qualifiers, simple to do it everywhere.

Also removes function declarations for vector math headers, serves no real
purpose.

Differential Revision: https://developer.blender.org/D16146
2022-11-08 12:28:40 +01:00
74c293863d Cycles: Remove use of sprintf() in MD5 code
The new Xcode declares the `sprintf()` function deprecated and
suggests to sue `snprintf()` as a safer alternative.

This change actually moves away from any formatted printing and
uses inlined byte-to-hex-string conversion which is also safe
and is (unmesurably) faster.

Differential Revision: https://developer.blender.org/D16378
2022-11-03 15:10:37 +01:00
4b14b33ea8 Cycles: use packed float3 back for oneAPI
This fixes a 15% performance regression silently introduced by
79ab76e156 that aligned the compact
float3 on 16 bytes for oneAPI.
Current change is minimalist, there are further cleanup opportunities
such as removing packed_float3 definition for oneAPI but for some
reason, it cuts the recovered speedup in half, so we're starting with
this small fix for now.

Reviewed by: brecht

Differential Revision: https://developer.blender.org/D16340
2022-10-26 10:53:23 +02:00
7da85ea35a Fix Cycles build error on 32bit x86 2022-10-25 16:56:35 +02:00
6ad04a031c Cycles: Fix floor intrinsic for ARM Neon 2022-10-17 01:13:43 +02:00
0c50f9c4aa Fix T98672: Noise texture shows incorrect behaviour for large scales
This was a floating point precision issue - or, to be more precise,
an issue with how Cycles split floats into the integer and fractional
parts for Perlin noise.
For coordinates below -2^24, the integer could be wrong, leading to
the fractional part being outside of 0-1 range, which breaks all sorts
of other things. 2^24 sounds like a lot, but due to how the detail
octaves work, it's not that hard to reach when combined with a large
scale.

Since this code is originally based on OSL, I checked if they changed
it in the meantime, and sure enough, there's a fix for it:
https://github.com/OpenImageIO/oiio/commit/5c9dc68391e9

So, this basically just ports over that change to Cycles.

The original code mentions being faster, but as pointed out in the
linked commit, the performance impact is actually irrelevant.

I also checked in a simple scene with eight Noise textures at
detail 15 (with >90% of render time being spent on the noise), and
the render time went from 13.06sec to 13.05sec. So, yeah, no issue.
2022-10-16 02:34:10 +02:00
d20be55c1a Cleanup: in Cycles force inline transform_inverse_impl
We expect this to always happen.

Ref T100891
2022-10-03 17:58:34 +02:00
ea2c41c730 Cleanup: spelling in comments
Also replace "dm" for evaluated mesh in some comments.
2022-10-03 11:03:46 +11:00
Sebastian Herhoz
75a6d3abf7 Cycles: add Path Guiding on CPU through Intel OpenPGL
This adds path guiding features into Cycles by integrating Intel's Open Path
Guiding Library. It can be enabled in the Sampling > Path Guiding panel in the
render properties.

This feature helps reduce noise in scenes where finding a path to light is
difficult for regular path tracing.

The current implementation supports guiding directional sampling decisions on
surfaces, when the material contains a least one diffuse component, and in
volumes with isotropic and anisotropic Henyey-Greenstein phase functions.

On surfaces, the guided sampling decision is proportional to the product of
the incident radiance and the normal-oriented cosine lobe and in volumes it
is proportional to the product of the incident radiance and the phase function.

The incident radiance field of a scene is learned and updated during rendering
after each per-frame rendering iteration/progression.

At the moment, path guiding is only supported by the CPU backend. Support for
GPU backends will be added in future versions of OpenPGL.

Ref T92571

Differential Revision: https://developer.blender.org/D15286
2022-09-27 15:56:32 +02:00
fd1bc90679 Cycles: sync changes from standalone repository
* Windows build fixes
* Workaround for Hydra + OpenColorIO link issue
* Bump version
2022-09-18 17:34:23 +02:00
Lukas Stockner
6951e8890a Mikktspace: Optimized port to C++
This commit is a big overhaul to the Mikktspace module, which is used
to compute tangents. I'm not calling it a rewrite since it's the
result of a lot of iterations on the original code, but pretty much
everything is reworked somehow.

Overall goal was to a) make it faster and b) make it maintainable.

Notable changes:
- Since the callbacks for requesting geometry data were a big
  bottleneck before, I've ported it to C++ and made it header-only,
  templating on the data source. That way, the compiler generates code
  specific to the caller, which allows it to inline the data source and
  specialize for some cases (e.g. subd vs. non-subd in Cycles).
- The one input parameter, an optional angle threshold, was not used
  anywhere. Turns out that removing it allows for considerable
  algorithmic simplification, removing a lot of the complexity in the
  later stages. Therefore, I've just removed the option in the new code.
- The code computes several outputs, but only one (the tangent itself)
  is ever used in Blender. Therefore, I've removed the others to
  simplify the code. They could easily be brought back if needed, none
  of the algorithmic simplifications are conflicting with them.
- The original code had fallback paths for many steps in case temporary
  memory allocation fails, but that never actually gets used anyways
  since malloc() doesn't really ever return NULL in practise, so I
  removed them.
- In general, I've restructured A LOT of the code to make the
  algorithms clearer and make use of some C++ features (vectors,
  std::array, booleans, classes), though there's still some of cleanup
  that could be done.
- Parallelized duplicate detection, neighbor detection, triangle
  tangent computation, degenerate triangle handling and tangent space
  accumulation.
- Replaced several algorithms with faster equivalents: Duplicate
  detection uses a (concurrent) hash set now, neighbor detection uses
  Radixsort and splits vertices by index pairs etc.

As for results, the exact speedup depends on the scene of course, but
let's consider the file from T97378:
- Blender 3.1 (before D14675): 6.07sec
- Blender 3.2 (with D14675): 4.62sec
- rBf0a36599007d (last nightly build): 4.42sec
- With this commit: 0.90sec

This speedup will mostly be noticed at the start of Cycles renders and,
even more importantly, in Eevee when doing something that changes the
geometry (e.g. animating) on a model using normal maps.

Differential Revision: https://developer.blender.org/D15589
2022-09-07 00:35:44 +02:00
Nathan Vegdahl
50df9caef0 Cycles: improve Progressive Multi-Jittered sampling
Fix two issues in the previous implementation:
* Only power-of-two prefixes were progressively stratified, not suffixes.
  This resulted in unnecessarily increased noise when using non-power-of-two
  sample counts.
* In order to try to get away with just a single sample pattern, the code
  used a combination of sample index shuffling and Cranley-Patterson rotation.
  Index shuffling is normally fine, but due to the sample patterns themselves
  not being quite right (as described above) this actually resulted in
  additional increased noise. Cranley-Patterson, on the other hand, always
  increases noise with randomized (t,s) nets like PMJ02, and should be avoided
  with these kinds of sequences.

Addressed with the following changes:
* Replace the sample pattern generation code with a much simpler algorithm
  recently published in the paper "Stochastic Generation of (t, s) Sample
  Sequences". This new implementation is easier to verify, produces fully
  progressively stratified PMJ02, and is *far* faster than the previous code,
  being O(N) in the number of samples generated.
* It keeps the sample index shuffling, which works correctly now due to the
  improved sample patterns. But it now uses a newer high-quality hash instead
  of the original Laine-Karras hash.
* The scrambling distance feature cannot (to my knowledge) be implemented with
  any decorrelation strategy other than Cranley-Patterson, so Cranley-Patterson
  is still used when that feature is enabled. But it is now disabled otherwise,
  since it increases noise.
* In place of Cranley-Patterson, multiple independent patterns are generated
  and randomly chosen for different pixels and dimensions as described in the
  original PMJ paper. In this patch, the pattern selection is done via
  hash-based shuffling to ensure there are no repeats within a single pixel
  until all patterns have been used.

The combination of these fixes brings the quality of Cycles' PMJ sampler in
line with the previously submitted Sobol-Burley sampler in D15679. They are
essentially indistinguishable in terms of quality/noise, which is expected
since they are both randomized (0,2) sequences.

Differential Revision: https://developer.blender.org/D15746
2022-09-01 14:57:39 +02:00
a3e1a9e2aa Cleanup: spelling in comments, format 2022-08-26 12:47:21 +10:00
6b9209ddfa Merge branch 'blender-v3.3-release' 2022-08-19 21:02:02 +02:00
4b62970dd3 Cleanup: replace CHECK_TYPE macro with static_assert
To avoid conflicts with BLI headers and simplify code.
2022-08-19 20:36:02 +02:00
Nathan Vegdahl
a06c9b5ca8 Cycles: add Sobol-Burley sampling pattern
Based on the paper "Practical Hash-based Owen Scrambling" by Brent Burley,
2020, Journal of Computer Graphics Techniques.

It is distinct from the existing Sobol sampler in two important ways:
* It is Owen scrambled, which gives it a much better convergence rate in many
  situations.
* It uses padding for higher dimensions, rather than using higher Sobol
  dimensions directly. In practice this is advantagous because high-dimensional
  Sobol sequences have holes in their sampling patterns that don't resolve
  until an unreasonable number of samples are taken. (See Burley's paper for
  details.)

The pattern reduces noise in some benchmark scenes, however it is also slower,
particularly on the CPU. So for now Progressive Multi-Jittered sampling remains
the default.

Differential Revision: https://developer.blender.org/D15679
2022-08-19 16:27:22 +02:00
8ffc11dbcb Cleanup OpenGL linking and related code after libepoxy merge
This cleans up the OpenGL build flags and linking.
It additionally also removes some dead code.

One of these dead code paths is WITH_X11_ALPHA which actually never was
active even with the build flag on. The call to use this was never
called because the default initializer for GHOST was set to have it off
per default. Nothing called this function with a boolean value to enable it.

These cleanups are needed to support true headless OpenGL rendering.
Without these cleanups libepoxy will fail to load the correct OpenGL
Libraries as we have already linked them to the blender binary.

Reviewed By: Brecht, Campbell, Jeroen

Differential Revision: http://developer.blender.org/D15554
2022-08-15 16:47:20 +02:00
a296b8f694 GPU: replace GLEW with libepoxy
With libepoxy we can choose between EGL and GLX at runtime, as well as
dynamically open EGL and GLX libraries without linking to them.

This will make it possible to build with Wayland, EGL, GLVND support while
still running on systems that only have X11, GLX and libGL. It also paves
the way for headless rendering through EGL.

libepoxy is a new library dependency, and is included in the precompiled
libraries. GLEW is no longer a dependency, and WITH_SYSTEM_GLEW was removed.

Includes contributions by Brecht Van Lommel, Ray Molenkamp, Campbell Barton
and Sergey Sharybin.

Ref T76428

Differential Revision: https://developer.blender.org/D15291
2022-08-15 16:10:29 +02:00
c9d821294f Cycles: take into account time limit for progress bar
This change allows the Cycles progress report system to take into conderation
the time limit property. This allows for more accuracte progress reports for
high sample count renders with short time limits.

Contributed by Alaska.

Differential Revision: https://developer.blender.org/D15599
2022-08-11 19:37:18 +02:00
5cbfdaccd0 Cleanup: minor changes to DebugFlags
Use C++11, remove unused running_inside_blender and move viewport_static_bvh
to BlenderSync.
2022-08-11 17:03:10 +02:00
752fb5dd08 Merge branch 'blender-v3.3-release' 2022-08-09 19:19:54 +02:00
79f1cc601c Cycles: improve ray tracing precision near triangle edges
Detect cases where a ray-intersection would miss the current triangle, which if
the intersection is strictly watertight, implies that a neighboring triangle would
incorrectly be hit instead.

When that is detected, apply a ray-offset. The idea being that we only want to
introduce potential error from ray offsets if we really need to.

This work for BVH2 and Embree, as we are able to match the ray-interesction
bit-for-bit, though doing so for Embree requires ugly hacks. Tiny differences
like fused-multiply-add or dot product intrinstics in matrix inversion and ray
intersection needed to be matched exactly, so this is fragile.

Unfortunately we're not able to do the same for OptiX or MetalRT, since those
implementations are unknown (and possibly impossible to match as hardware
instructions). Still artifacts are much reduced, though not eliminated.

Ref T97259

Differential Revision: https://developer.blender.org/D15559
2022-08-09 18:42:01 +02:00
230f9ade64 Cycles: make transform inverse match Embree exactly
Helps improve ray-tracing precision. This is a bit complicated as it requires
different implementation depending on the CPU architecture.
2022-08-09 16:59:05 +02:00
286e535071 Cleanup: simplify CPU instruction checking
The performance of this will be slightly more important for upcoming changes.
Also removed an unused function and changed includes so these system.h can
be included in more places.
2022-08-09 16:59:05 +02:00
Andrii Symkin
d832d993c5 Cycles: add new Spectrum and PackedSpectrum types
These replace float3 and packed_float3 in various places in the kernel where a
spectral color representation will be used in the future. That representation
will require more than 3 channels and conversion to from/RGB. The kernel code
was refactored to remove the assumption that Spectrum and RGB colors are the
same thing.

There are no functional changes, Spectrum is still a float3 and the conversion
functions are no-ops.

Differential Revision: https://developer.blender.org/D15535
2022-08-09 16:49:34 +02:00
1988665c3c Cleanup: make vector types make/print functions consistent between CPU and GPU
Now all the same ones are available on CPU and GPU, which was previously not
possible due to lack of operator overloadng in OpenCL. Print functions are
no-ops on some GPUs.

Ref D15535
2022-08-09 16:07:23 +02:00
cefd6140f3 Fix T100119: Cycles light object's parametric vector distorted
Caused by 38af5b0501.

Adjust barycentric coordinates used for intersection result in the
ray-to-rectangle intersection check.

Differential Revision: https://developer.blender.org/D15592
2022-08-09 15:56:03 +02:00
25a0124bc8 Fix T100119: Light object's parametric vector distorted in blender 3.4
Caused by 38af5b0501.

Adjust barycentric coordinates used for intersection result in the
ray-to-rectangle intersection check.

Differential Revision: https://developer.blender.org/D15592
2022-08-02 14:17:10 +02:00
d3879e9aaa Merge branch 'blender-v3.3-release' 2022-07-29 15:17:40 +02:00
Tianhao Chai
b862cf0b9f Fix Cycles build error with CUDA on arm64
Checking arm64 assembly support before CUDA/Metal would cause NVCC to
generate inline arm64 assembly.

Differential Revision: https://developer.blender.org/D15569
2022-07-29 14:57:09 +02:00
79ab76e156 Cleanup: simplifications and consistency for vector types
* OneAPI: remove separate float3 definition
* OneAPI: disable operator[] to match other GPUs
* OneAPI: make int3 compact to match other GPUs
* Use #pragma once
* Add __KERNEL_NATIVE_VECTOR_TYPES__ to simplify checks
* Remove unused vector3
2022-07-28 21:27:13 +02:00
38af5b0501 Cycles: switch Cycles triangle barycentric convention to match Embree/OptiX
Simplifies intersection code a little and slightly improves precision regarding
self intersection.

The parametric texture coordinate in shader nodes is still the same as before
for compatibility.
2022-07-27 21:03:33 +02:00