Commit Graph

2101 Commits

Author SHA1 Message Date
949ab753bb Cycles OpenCL: Remove OpenCL MegaKernel
Using OpenCL MegaKernel has been slow and therefore not usefull.
This patch will remove the mega kernel from the OpenCL codebase
and the OpenCLDeviceBase class.

T61736: removal of mega kernel
T61703: baking does not work with mega kernel

Tags: #cycles

Differential Revision: https://developer.blender.org/D4383
2019-02-20 15:17:22 +01:00
667033e89e T61463: Separate Baking kernels
Cycles OpenCL: Split baking kernels in own program

Fix T61463. Before this patch baking was part of the base kernels. There
are 3 baking kernels that and all 3 uses shader evaluation. Only for one
of these kernels the functionality was wrapped in the __NO_BAKING__
compile directive.

When you start baking this leads to long compile times. By separating
in individual programs will reduce the compile times.

Also wrapped all baking kernels with __NO_BAKING__ to reduce the
compilation times.

Impact on compilation time

    job   |   scene_name    | previous |  new  | percentage
  --------+-----------------+----------+-------+------------
   T61463 | empty           |    10.63 |  7.27 |         32%
   T61463 | bmw             |    17.91 | 14.24 |         20%
   T61463 | fishycat        |    19.57 | 15.08 |         23%
   T61463 | barbershop      |    54.10 | 48.18 |         11%
   T61463 | classroom       |    17.55 | 14.42 |         18%
   T61463 | koro            |    18.92 | 17.15 |          9%
   T61463 | pavillion       |    17.43 | 14.23 |         18%
   T61463 | splash279       |    16.48 | 15.33 |          7%
   T61463 | volume_emission |    36.22 | 34.19 |          6%

Impact on render time

    job   |   scene_name    | previous |   new   | percentage
  --------+-----------------+----------+---------+------------
   T61463 | empty           |    21.06 |   20.54 |          2%
   T61463 | bmw             |   198.44 |  189.59 |          4%
   T61463 | fishycat        |   394.20 |  388.50 |          1%
   T61463 | barbershop      |  1188.16 | 1185.49 |          0%
   T61463 | classroom       |   341.08 |  339.27 |          1%
   T61463 | koro            |   472.43 |  360.70 |         24%
   T61463 | pavillion       |   905.77 |  902.14 |          0%
   T61463 | splash279       |    55.26 |   54.92 |          1%
   T61463 | volume_emission |    62.59 |   39.09 |         38%

I don't have a grounded explanation why koro and volume_emission is this much
faster; I have done several tests though...

Maniphest Tasks: T61463

Differential Revision: https://developer.blender.org/D4376
2019-02-19 16:34:55 +01:00
e6f5632eb1 T61513: Refactored Cycles Attribute Retrieval
There is a generic function to retrieve float and float3 attributes
`primitive_attribute_float` and primitive_attribute_float3`. Inside
these functions an prioritised if-else construction checked where
the attribute is stored and then retrieved from that location.

Actually the calling function most of the time already knows where
the data is stored. So we could simplify this by splitting these
functions and remove the check logic.

This patch splits the `primitive_attribute_float?` functions into
`primitive_surface_attribute_float?` and `primitive_volume_attribute_float?`.
What leads to less branching and more optimum kernels.

The original function is still being used by OSL and `svm_node_attr`.

This will reduce the compilation time and render time for kernels.
Especially in production scenes there is a lot of benefit.

Impact in compilation times

    job  |   scene_name    | previous |  new  | percentage
  -------+-----------------+----------+-------+------------
  t61513 | empty           |    10.63 | 10.66 |          0%
  t61513 | bmw             |    17.91 | 17.65 |          1%
  t61513 | fishycat        |    19.57 | 17.68 |         10%
  t61513 | barbershop      |    54.10 | 24.41 |         55%
  t61513 | classroom       |    17.55 | 16.29 |          7%
  t61513 | koro            |    18.92 | 18.05 |          5%
  t61513 | pavillion       |    17.43 | 16.52 |          5%
  t61513 | splash279       |    16.48 | 14.91 |         10%
  t61513 | volume_emission |    36.22 | 21.60 |         40%

Impact in render times

    job  |   scene_name    | previous |  new   | percentage
  -------+-----------------+----------+--------+------------
  61513 | empty           |    21.06 |  20.35 |          3%
  61513 | bmw             |   198.44 | 190.05 |          4%
  61513 | fishycat        |   394.20 | 401.25 |         -2%
  61513 | barbershop      |  1188.16 | 912.39 |         23%
  61513 | classroom       |   341.08 | 340.38 |          0%
  61513 | koro            |   472.43 | 471.80 |          0%
  61513 | pavillion       |   905.77 | 899.80 |          1%
  61513 | splash279       |    55.26 |  54.86 |          1%
  61513 | volume_emission |    62.59 |  61.70 |          1%

There is also a possitive impact when using CPU and CUDA, but they are small.

I didn't split the hair logic from the surface logic due to:

* Hair and surface use same attribute types. It was not clear if it could be
  splitted when looking at the code only.
* Hair and surface are quick to compile and to read. So the benefit is quite
  small.

Differential Revision: https://developer.blender.org/D4375
2019-02-19 16:28:25 +01:00
9800837b98 Cycles: Support multithreaded compilation of kernels
This patch implements a workaround to get the multithreaded compilation from D2231 working.
So far, it only works for Blender, not for Cycles Standalone. Also, I have only tested the Linux codepath in the helper function.
Depends on D2231.

Patch by lukasstockner97, jbakker, brecht

    job    |   scene_name    | compilation_time
----------+-----------------+------------------
    Baseline | empty           |            22.73
    D2264    | empty           |            13.94
    Baseline | bmw             |            56.44
    D2264    | bmw             |            41.32
    Baseline | fishycat        |            59.50
    D2264    | fishycat        |            45.19
    Baseline | barbershop      |           212.28
    D2264    | barbershop      |           169.81
    Baseline | victor          |            67.51
    D2264    | victor          |            53.60
    Baseline | classroom       |            51.46
    D2264    | classroom       |            39.02
    Baseline | koro            |            62.48
    D2264    | koro            |            49.03
    Baseline | pavillion       |            54.37
    D2264    | pavillion       |            38.82
    Baseline | splash279       |            47.43
    D2264    | splash279       |            37.94
    Baseline | volume_emission |           145.22
    D2264    | volume_emission |           121.10

This patch reduced compilation time as the split kernels and base
kernels are compiled in parallel. In cycles debug mode (256) you can set
unmark the opencl single program file, what reduces the compilation time
even further (bmw 17 seconds, barbershop 53 seconds).

Reviewers: brecht, dingto, sergey, juicyfruit, lukasstockner97

Reviewed By: brecht

Subscribers: Loner, jbakker, candreacchio, 3dLuver, LazyDodo, bliblubli

Differential Revision: https://developer.blender.org/D2264
2019-02-15 08:56:20 +01:00
de0e456a6c Cleanup: fix compiler warnings. 2019-02-14 19:39:39 +01:00
9886ae6331 Fix T61470: incorrect saturation clamping in recent bugfix.
We should clamp the result after multiplication.
2019-02-14 19:28:44 +01:00
ec559912fb Fix T61470: inconsistent HSV node results with saturation > 1.0.
Values outside the 0..1 range produce negative colors, so now clamp to that
range everywhere. Also fixes improper handling of hue > 2.0 in some places.
2019-02-13 17:06:30 +01:00
fccf506ed7 Cycles: animation denoising support in the kernel.
This is the internal implementation, not available from the API or
interface yet. The algorithm takes into account past and future frames,
both to get more coherent animation and reduce noise.

Ref D3889.
2019-02-06 15:18:42 +01:00
c183ac73dc Cycles: tweak outlier detection, preparing for animation denoising.
Ref D3889.
2019-02-06 15:18:38 +01:00
405cacd4cd Cycles: prefilter feature passes separate from denoising.
Prefiltering of feature passes will happen during rendering, which can
then be used for denoising immediately or written as a render pass for
later (animation) denoising.

The number of denoising data passes written is reduced because of this,
leaving out the feature variance passes. The passes are now Normal,
Albedo, Depth, Shadowing, Variance and Intensity.

Ref D3889.
2019-02-06 15:18:29 +01:00
8c68ed6df1 Cleanup: remove redundant, invalid info from headers
BF-admins agree to remove header information that isn't useful,
to reduce noise.

- BEGIN/END license blocks

  Developers should add non license comments as separate comment blocks.
  No need for separator text.

- Contributors

  This is often invalid, outdated or misleading
  especially when splitting files.

  It's more useful to git-blame to find out who has developed the code.

See P901 for script to perform these edits.
2019-02-02 02:40:00 +11:00
d918217d35 OSL: remove fresnel template that was not public domain.
Convention is to only have public domain code templates. Also fixes wrong
license header in Cycles.
2019-01-28 12:04:54 +01:00
10fa3b790f Fix T60450: Cycles broken GPU denoising after recent changes. 2019-01-14 11:42:38 +01:00
e5a1a9288c Fix T60320: Cycles OpenCL denoising filter errors on some drivers. 2019-01-11 11:25:37 +01:00
b486088218 Fix T60320: Cycles OpenCL volume rendering error on some drivers. 2019-01-08 15:59:10 +01:00
8491dba0c6 Fix T60300: Cycles SSS render hanging with AMD OpenCL. 2019-01-08 15:37:16 +01:00
fffdedbcc1 Fix T54962: Cycles crash using subsurface scattering texture blur. 2019-01-03 17:10:37 +01:00
f7e9642da9 Fix T60061: Cycles OSL point density not working.
Add override keywords so we can detect when the function definitions change.
2019-01-02 19:56:49 +01:00
8e331c3431 Fix T59565: NaN/crash with zero radius tip of hair curves. 2018-12-21 18:54:45 +01:00
765795aed7 Fix macOS buildbot build, wrong CUDA version check. 2018-12-11 14:16:48 +01:00
cccc40db51 Fix T57963: Cycles crash using AO for displacement.
Note this is not supported, there exists no geometry at this point, but
it should not crash at least.
2018-12-06 19:50:05 +01:00
f5b46daf52 Fix build with old CMake versions. 2018-12-05 12:53:19 +01:00
f63da3dcf5 Buildbot: enable support for NVIDIA Turing cards in Cycles (like GTX 20xx).
We currently only build the sm_7x kernels with CUDA 10.0, older cards still
use 9.1 until rendering errors are solved for them.
2018-12-04 16:03:18 +01:00
b14ec18601 Cycles: add initial CUDA 10.0 support, but only recommend use for Turing cards.
There may still be rendering errors when used for older graphics cards.
2018-12-04 16:03:18 +01:00
5a6f1fa563 Fix T58600: update OSL scripts to work with OSL 1.10.x. 2018-12-03 15:14:21 +01:00
7fa6f72084 Cycles: Add sample-based runtime profiler that measures time spent in various parts of the CPU kernel
This commit adds a sample-based profiler that runs during CPU rendering and collects statistics on time spent in different parts of the kernel (ray intersection, shader evaluation etc.) as well as time spent per material and object.

The results are currently not exposed in the user interface or per Python yet, to see the stats on the console pass the "--cycles-print-stats" argument to Cycles (e.g. "./blender -- --cycles-print-stats").

Unfortunately, there is no clear way to extend this functionality to CUDA or OpenCL, so it is CPU-only for now.

Reviewers: brecht, sergey, swerner

Reviewed By: brecht, swerner

Differential Revision: https://developer.blender.org/D3892
2018-11-29 02:45:24 +01:00
e742e0934d Cleanup: trailing space 2018-11-25 08:01:14 +11:00
968bf0df14 Fix T57811: Render crashes in certain scenes when AO Bounces are used 2018-11-21 14:17:26 +01:00
6f48bfc7a8 Cycles: Cleanup, use utility function
Replaces inlined platform-specific code.
2018-11-21 13:51:18 +01:00
65143542af Cycles: Cleanup, reduce indentation level 2018-11-21 12:41:24 +01:00
700330afe8 Cycles: Cleanup, comments and dead code 2018-11-21 11:33:11 +01:00
65d01def80 Cycles: Cleanup, CUDA code path is not possible inside AVX2 2018-11-21 11:28:49 +01:00
cd9ab9d99e Cycles: Cleanup, code style 2018-11-15 17:16:40 +01:00
65e9388440 Revert "Cycles: Cleanup, move Embree BVH logic to own file"
While we shouldn't have logic in an entry point, and since one should
not be making typos when moving lines around, there is bigger entanglement
issue with BVH host code using kernel function. This is bad violation,
but is tricky to get solved moments before the weekly.

In order to keep things in a (less) broken state than before own cleanup
reverting the changes.

This reverts commit 2bad10be96.
This reverts commit ddabb21d05
2018-11-09 17:54:09 +01:00
ddabb21d05 Cycles; Cleanup, line length
There are some more sanitization which would be cool to be done
in the neighbourhood of those functions, but that could also happen
later.
2018-11-09 12:31:46 +01:00
2bad10be96 Cycles: Cleanup, move Embree BVH logic to own file
There is no way we can keep generic entry point functions easy to
follow if we start adding actual logic in them.
2018-11-09 12:28:55 +01:00
2d98b198e9 Cycles: Cleanup, indentation in preprocessor 2018-11-09 12:12:11 +01:00
3e76cc494a Cycles: Cleanup, indentation 2018-11-09 12:10:48 +01:00
203de0bbf0 Cycles: Cleanup, space after (void)
It was used in like 95% of places.
2018-11-09 12:08:51 +01:00
cb4b5e12ab Cycles: Cleanup, spacing after preprocessor
It is supposed to be two spaces before comment stating which if
else/endif statements corresponds to. Was mainly violated in the
header guards.
2018-11-09 11:34:54 +01:00
116be3deff Fix build on 32bit after Embree changes. 2018-11-08 14:58:01 +01:00
Stefan Werner
d3320c5488 Cycles: Rearranged macros in kernel_types.h to fix Embree build. 2018-11-07 15:20:24 +01:00
33201a48b0 Fix build with OSL, remove unneeded file after Embree changes. 2018-11-07 14:38:07 +01:00
Stefan Werner
2c5531c0a5 Cycles: Added Embree as BVH option for CPU renders.
Note that this is turned off by default and must be enabled at build time with the CMake WITH_CYCLES_EMBREE flag.
Embree must be built as a static library with ray masking turned on, the `make deps` scripts have been updated accordingly.
There, Embree is off by default too and must be enabled with the WITH_EMBREE flag.

Using Embree allows for much faster rendering of deformation motion blur while reducing the memory footprint.

TODO: GPU implementation, deduplication of data, leveraging more of Embrees features (e.g. tessellation cache).

Differential Revision: https://developer.blender.org/D3682
2018-11-07 12:58:12 +01:00
ea8e45de29 Fix assert rendering hair tests on some systems. 2018-11-04 20:25:57 +01:00
7c0d37deca Fix build error on Windows 32bit, alignment was wrong. 2018-10-30 11:39:44 +01:00
Stefan Werner
e58c6cf0c6 Cycles: Added Cryptomatte output.
This allows for extra output passes that encode automatic object and material masks
for the entire scene. It is an implementation of the Cryptomatte standard as
introduced by Psyop. A good future extension would be to add a manifest to the
export and to do plenty of testing to ensure that it is fully compatible with other
renderers and compositing programs that use Cryptomatte.

Internally, it adds the ability for Cycles to have several passes of the same type
that are distinguished by their name.

Differential Revision: https://developer.blender.org/D3538
2018-10-28 05:37:41 -04:00
c0b3e3daeb Fix T57393: Cycles OSL bevel and AO not working after OSL upgrade. 2018-10-27 15:00:37 +02:00
65b25df801 Cycles: Overhaul ensure_valid_reflection to fix issues with normal- and bumpmapping
This function is supposed to prevent the black artifacts caused by strong normal- or bumpmapping, but failed in some cases.

Now the code correctly handles all test files and previous issues I am aware of and also has extensive comments describing
the algorithm and the math behind it.

Basically, the main problem was that there can be multiple valid solutions that fulfil the reflection angle criterium,
but I had assumed that only one would exist and therefore simply picked the first solution with a positive term in srqt().
Now, the code uses additional validity checks and a simple heuristic to pick the best valid solution.

Additionally, the code messed up very shallow reflections even if the normal map strength was zero due to the constant
limit for the outgoing ray angle, which caused shallow incoming rays to fail the initial test even when reflected directly
on Ng. Now, the code accounts for this by reducing the threshold in the case of a shallow incoming ray, ensuring that at
least N=Ng is always a valid solution.

Reviewers: brecht

Differential Revision: https://developer.blender.org/D3816
2018-10-25 14:50:48 +02:00
15e9d80375 Cycles: Use existing shared temporary memory in reconstruction step of the denoiser
Previously the code allocated its own temporary memory, but it's possible to just use the existing shared one instead.
2018-10-08 22:13:40 +02:00