Using OpenCL MegaKernel has been slow and therefore not usefull.
This patch will remove the mega kernel from the OpenCL codebase
and the OpenCLDeviceBase class.
T61736: removal of mega kernel
T61703: baking does not work with mega kernel
Tags: #cycles
Differential Revision: https://developer.blender.org/D4383
There is a generic function to retrieve float and float3 attributes
`primitive_attribute_float` and primitive_attribute_float3`. Inside
these functions an prioritised if-else construction checked where
the attribute is stored and then retrieved from that location.
Actually the calling function most of the time already knows where
the data is stored. So we could simplify this by splitting these
functions and remove the check logic.
This patch splits the `primitive_attribute_float?` functions into
`primitive_surface_attribute_float?` and `primitive_volume_attribute_float?`.
What leads to less branching and more optimum kernels.
The original function is still being used by OSL and `svm_node_attr`.
This will reduce the compilation time and render time for kernels.
Especially in production scenes there is a lot of benefit.
Impact in compilation times
job | scene_name | previous | new | percentage
-------+-----------------+----------+-------+------------
t61513 | empty | 10.63 | 10.66 | 0%
t61513 | bmw | 17.91 | 17.65 | 1%
t61513 | fishycat | 19.57 | 17.68 | 10%
t61513 | barbershop | 54.10 | 24.41 | 55%
t61513 | classroom | 17.55 | 16.29 | 7%
t61513 | koro | 18.92 | 18.05 | 5%
t61513 | pavillion | 17.43 | 16.52 | 5%
t61513 | splash279 | 16.48 | 14.91 | 10%
t61513 | volume_emission | 36.22 | 21.60 | 40%
Impact in render times
job | scene_name | previous | new | percentage
-------+-----------------+----------+--------+------------
61513 | empty | 21.06 | 20.35 | 3%
61513 | bmw | 198.44 | 190.05 | 4%
61513 | fishycat | 394.20 | 401.25 | -2%
61513 | barbershop | 1188.16 | 912.39 | 23%
61513 | classroom | 341.08 | 340.38 | 0%
61513 | koro | 472.43 | 471.80 | 0%
61513 | pavillion | 905.77 | 899.80 | 1%
61513 | splash279 | 55.26 | 54.86 | 1%
61513 | volume_emission | 62.59 | 61.70 | 1%
There is also a possitive impact when using CPU and CUDA, but they are small.
I didn't split the hair logic from the surface logic due to:
* Hair and surface use same attribute types. It was not clear if it could be
splitted when looking at the code only.
* Hair and surface are quick to compile and to read. So the benefit is quite
small.
Differential Revision: https://developer.blender.org/D4375
The code assumed all datablocks were read from .blend files saved with the
same version. This restructures the Cycles versioning code to take into
account libraries.
This patch implements a workaround to get the multithreaded compilation from D2231 working.
So far, it only works for Blender, not for Cycles Standalone. Also, I have only tested the Linux codepath in the helper function.
Depends on D2231.
Patch by lukasstockner97, jbakker, brecht
job | scene_name | compilation_time
----------+-----------------+------------------
Baseline | empty | 22.73
D2264 | empty | 13.94
Baseline | bmw | 56.44
D2264 | bmw | 41.32
Baseline | fishycat | 59.50
D2264 | fishycat | 45.19
Baseline | barbershop | 212.28
D2264 | barbershop | 169.81
Baseline | victor | 67.51
D2264 | victor | 53.60
Baseline | classroom | 51.46
D2264 | classroom | 39.02
Baseline | koro | 62.48
D2264 | koro | 49.03
Baseline | pavillion | 54.37
D2264 | pavillion | 38.82
Baseline | splash279 | 47.43
D2264 | splash279 | 37.94
Baseline | volume_emission | 145.22
D2264 | volume_emission | 121.10
This patch reduced compilation time as the split kernels and base
kernels are compiled in parallel. In cycles debug mode (256) you can set
unmark the opencl single program file, what reduces the compilation time
even further (bmw 17 seconds, barbershop 53 seconds).
Reviewers: brecht, dingto, sergey, juicyfruit, lukasstockner97
Reviewed By: brecht
Subscribers: Loner, jbakker, candreacchio, 3dLuver, LazyDodo, bliblubli
Differential Revision: https://developer.blender.org/D2264
Values outside the 0..1 range produce negative colors, so now clamp to that
range everywhere. Also fixes improper handling of hue > 2.0 in some places.
For OIIO 2.x we must use unique_ptr. This also required updating the
guarded allocator for std::move to work. Since C++11 construct/destroy
have a default implementation that also works this case, so we just
leave it out.
The render layer name is now always included. Best to keep these consistent,
so that animation denoising and sample merging works the same for both and
tests can be the same. Ref D4311.
This adds a cycles.denoise_animation operator, which denoises an animation
sequence or individual file. Renders must be saved as multilayer EXR files
with denoising data passes.
By default file path and frame range come from the current scene, and EXR
files are denoised in-place. Alternatively, a different input and/or output
file path can be provided.
Denoising settings come from the current view layer. Renders can be denoised
again with different settings, as the original noisy image is preserved along
with other passes and metadata.
There is no user interface yet for this feature, that comes later.
Code by Lukas with modifications by Brecht. This feature was originally
developed for Tangent Animation, thanks for the support!
Differential Revision: https://developer.blender.org/D3889
This is backporting a change from 2.8, which may help solve crashes when
activating a window. Previously bringTabletContextToFront() would call
tablet API functions with NULL tablet, which may crash on some drivers.
Ref T60811.
This is the internal implementation, not available from the API or
interface yet. The algorithm takes into account past and future frames,
both to get more coherent animation and reduce noise.
Ref D3889.
Prefiltering of feature passes will happen during rendering, which can
then be used for denoising immediately or written as a render pass for
later (animation) denoising.
The number of denoising data passes written is reduced because of this,
leaving out the feature variance passes. The passes are now Normal,
Albedo, Depth, Shadowing, Variance and Intensity.
Ref D3889.
BF-admins agree to remove header information that isn't useful,
to reduce noise.
- BEGIN/END license blocks
Developers should add non license comments as separate comment blocks.
No need for separator text.
- Contributors
This is often invalid, outdated or misleading
especially when splitting files.
It's more useful to git-blame to find out who has developed the code.
See P901 for script to perform these edits.
We've had many reported crashes on Windows where we suspect there is a
corrupted OpenCL driver. The purpose here is to keep Blender generally
usable in such cases.
Now it always shows None / CUDA / OpenCL in the preferences, and only when
selecting one will it reveal if there are any GPUs available. This should
avoid crashes when opening the preferences or on startup.
Differential Revision: https://developer.blender.org/D4265
The integrator maximum number of closures was not set properly for the CPU/mega
kernels to match the actual available memory. Before relatively recent code
refactoring we did not use this value in those kernels so it worked fine.
Even though it makes sense logically to have displacement actually displace
the mesh, this is causing a lot of confusion for existing users that are used
to the previous behavior. Further, since Eevee does not support displacement
yet and the discrepancy between the viewport and final render is problematic.
Xorg's XIWarpPointer doesn't support multi-head display while
XWarpPointer does.
Revert since this is a known TODO in Xorg and setting a custom
xinput matrix seems not to be used often.
Resolves T50383
This disables touch gesture recognition in Blender, avoiding any initial delay
when drawing with grease pencil, texture paint, etc.
Differential Revision: https://developer.blender.org/D4203
This change brings back old original logic which was checking
whether worker threads do fit into an active CPU group. But
it does it a bit smarter now and is also checking affinity
within that group. This way Cycles will use all threads on a
Threadripper2 CPU if it's set to automatic number of threads,
but on another hand will not change affinity if user requested
16 threads and changed Blender affinity.
Tweaked scheduling so it survives this situation by scattering
"extra" threads uniformly over all the NUMA nodes.
There are still tweaks possible to make some specific hardware
configurations work better.