81 Commits

Author SHA1 Message Date
a3c4091215 Fix Cycles device kernels containing debug assertion code
NanoVDB includes "assert.h" and makes use of "assert" in several places and since the compile
pipeline for CUDA/OptiX kernels does not define "NDEBUG" for release builds, those debug
checks were always added. This is not intended, so this patch disables "assert" for CUDA/OptiX
by defining "NDEBUG" before including NanoVDB headers.
This also fixes a warning about unknown pragmas in NanoVDB thrown by the CUDA compiler.
2020-12-03 15:20:50 +01:00
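The approach in a3c4091215 (defining "NDEBUG" before a header that uses "assert") can be illustrated with a small C++ sketch. The function below is a hypothetical stand-in for third-party header code, not part of NanoVDB or Cycles, and the sketch assumes a C++14 or newer compiler.

```cpp
// Define NDEBUG before <cassert> is included so that every assert()
// expanded after this point compiles to nothing.
#ifndef NDEBUG
#  define NDEBUG
#endif
#include <cassert>

// Hypothetical stand-in for a function from a third-party header.
// With NDEBUG defined, the assert() below expands to ((void)0), so an
// out-of-range index is returned unchanged instead of aborting.
constexpr int example_checked_index(int i, int size)
{
  assert(i >= 0 && i < size);
  (void)size; /* Avoid an unused parameter warning once assert is disabled. */
  return i;
}
```

Because the function is constexpr, a call with an out-of-range index is only usable in a constant expression when the assert has really been compiled out, which gives a cheap compile-time check that the define took effect.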
a63208823c Fix NanoVDB compile errors with recent NanoVDB versions
There were some changes to the NanoVDB API that broke the way Cycles was previously using it.
With these changes it compiles successfully again and also still compiles with the NanoVDB revision
that is currently part of the Blender dependencies. Ref T81454.
2020-11-10 18:28:14 +01:00
ed75a50119 Cycles: Fix function inline attributes
The forceinline attribute is only applicable to functions which are
marked inline. Interestingly, it can be used for class methods
without an explicit inline statement, but for free functions it is
another story.
2020-11-09 14:41:00 +01:00
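The rule described in ed75a50119 can be sketched as follows. The macro and function names are hypothetical and are not the actual Cycles definitions; the sketch only assumes GCC, Clang or MSVC.

```cpp
// Hypothetical macros, not the ones used by Cycles.
#if defined(_MSC_VER)
#  define EXAMPLE_ALWAYS_INLINE __forceinline
#  define EXAMPLE_FORCEINLINE __forceinline
#else
#  define EXAMPLE_ALWAYS_INLINE __attribute__((always_inline))
/* For free functions the attribute is paired with the inline keyword. */
#  define EXAMPLE_FORCEINLINE inline __attribute__((always_inline))
#endif

// Free function: inline is spelled out through the macro.
EXAMPLE_FORCEINLINE float example_lerp(float a, float b, float t)
{
  return a + (b - a) * t;
}

struct ExampleRange {
  float lo, hi;

  // Member function defined in the class body: it is implicitly inline,
  // so the attribute alone is accepted here.
  EXAMPLE_ALWAYS_INLINE float at(float t) const
  {
    return example_lerp(lo, hi, t);
  }
};
```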
118e31a0a9 Cycles: Fix tricubic sampling with NanoVDB
Volumes using tricubic sampling were producing different results with NanoVDB compared
to dense textures. This fixes that by using the same tricubic sampling algorithm in both
cases. It also fixes some remaining offset issues and some minor things that broke OpenCL
kernel compilation on NVIDIA.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D9491
2020-11-09 12:37:47 +01:00
fd9124ed6b Fix Cycles volume render differences with NanoVDB when using linear sampling
The NanoVDB sampling implementation behaves differently from dense texture sampling, so this
adds a small offset to the voxel indices to correct for that.
Also removes the need to modify the sampling coordinates by moving all the necessary
transformations into the image transform. See also T81454.
2020-11-04 15:09:06 +01:00
3df90de6c2 Cycles: Add NanoVDB support for rendering volumes
NanoVDB is a platform-independent sparse volume data structure that makes it possible to
use OpenVDB volumes on the GPU. This patch uses it for volume rendering in Cycles,
replacing the previous usage of dense 3D textures.

Since it has a big impact on memory usage and performance and changes the OpenVDB
branch used for the rest of Blender as well, this is not enabled by default yet, which will
happen only after 2.82 was branched off. To enable it, build both dependencies and Blender
itself with the "WITH_NANOVDB" CMake option.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D8794
2020-10-05 15:03:30 +02:00
Valentin
5ac4778056 Cleanup: convert gforge task ID's to phabricator format
Cleanup old tracker task format to the new. e.g: [#34039] to T34039

Ref D8718
2020-09-30 20:11:06 +10:00
Imre Palik
f6dc6caa15 Fix Cycles build error when disabling some kernel features
Differential Revision: https://developer.blender.org/D8372
2020-09-01 19:14:31 +02:00
b4e1571d0b Cleanup: compiler warnings
2020-06-24 17:25:44 +02:00
d9773edaa3 Cycles: code refactor to bake using regular render session and tiles
There should be no user visible change from this, except that tile size
now affects performance. The goal here is to simplify bake denoising in
D3099, letting it reuse more denoising tiles and pass code.

A lot of code is now shared with regular rendering, with the two main
differences being that we read some render result passes from the bake API
when starting to render a tile, and call the bake kernel instead of the
path trace kernel.

With this kind of design where Cycles asks for tiles from the bake API,
it should eventually be easier to reduce memory usage, show tiles as
they are baked, or bake multiple passes at once, though there's still
quite some work needed for that.

Reviewers: #cycles

Subscribers: monio, wmatyjewicz, lukasstockner97, michaelknubben

Differential Revision: https://developer.blender.org/D3108
2020-05-15 20:25:24 +02:00
006025ead0 Cycles: support for different 3D transform per volume grid
This is not yet fully supported by automatic volume bounds but works fine in
most cases that will have mostly matching bounds.

Ref T73201
2020-03-18 11:23:05 +01:00
26bea849cf Cleanup: add device_texture for images, distinct from other global memory
There was too much image texture specific stuff in device_memory, and too
much code duplication between devices.
2020-03-12 17:28:55 +01:00
f01bc597a8 Cleanup: stop encoding image data type in slot index
This is legacy code from when we had a fixed number of textures.
2020-03-11 17:07:17 +01:00
c8ac760c59 Cleanup: tweak Cycles #includes in preparation for clang-format sorting
2020-03-06 14:44:42 +01:00
Stefan Werner
51e898324d Adaptive Sampling for Cycles.
This feature takes some inspiration from
"RenderMan: An Advanced Path Tracing Architecture for Movie Rendering" and
"A Hierarchical Automatic Stopping Condition for Monte Carlo Global Illumination"

The basic principle is as follows:
While samples are being added to a pixel, the adaptive sampler writes half
of the samples to a separate buffer. This gives it two separate estimates
of the same pixel, and by comparing their difference it estimates convergence.
Once convergence drops below a given threshold, the pixel is considered done.

When a pixel has not converged yet and needs more samples than the minimum,
its immediate neighbors are also set to take more samples. This is done in order
to more reliably detect sharp features such as caustics. A 3x3 box filter that
is run periodically over the tile buffer is used for that purpose.

After a tile has finished rendering, the values of all passes are scaled as if
they were rendered with the full number of samples. This way, any code operating
on these buffers, for example the denoiser, does not need to be changed for
per-pixel sample counts.

Reviewed By: brecht, #cycles

Differential Revision: https://developer.blender.org/D4686
2020-03-05 12:21:38 +01:00
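The principle described in 51e898324d (two estimates of the same pixel, compared to judge convergence, then rescaled to the full sample count) can be sketched in C++ as below. This is a simplified scalar illustration under assumptions: the struct, the function names and the error normalization are hypothetical and do not reproduce the actual Cycles kernel code, and the neighbor filter is left out.

```cpp
#include <cmath>

// Hypothetical per-pixel accumulator (scalar brightness for brevity).
struct ExamplePixel {
  float sum_all = 0.0f;  /* Sum of every sample. */
  float sum_half = 0.0f; /* Sum of every second sample only. */
  int samples = 0;
};

inline void example_add_sample(ExamplePixel &p, float value)
{
  p.sum_all += value;
  if (p.samples % 2 == 0) {
    /* Half of the samples are also written to the separate buffer. */
    p.sum_half += value;
  }
  p.samples++;
}

// Relative difference between the two estimates of the pixel mean.
// The normalization used here is an assumption made for this sketch.
inline float example_error(const ExamplePixel &p)
{
  if (p.samples == 0) {
    return INFINITY;
  }
  const float mean_all = p.sum_all / float(p.samples);
  const float mean_half = p.sum_half / float((p.samples + 1) / 2);
  return std::fabs(mean_all - mean_half) / (std::fabs(mean_all) + 1e-4f);
}

// A pixel is done once it has the minimum sample count and the two
// estimates agree to within the threshold.
inline bool example_converged(const ExamplePixel &p, float threshold, int min_samples)
{
  return p.samples >= min_samples && example_error(p) < threshold;
}

// Rescale an accumulated pass value as if the full sample count had been
// rendered, so later code can ignore per-pixel sample counts.
inline float example_scale_to_full(float sum, int samples_taken, int samples_full)
{
  return sum * (float(samples_full) / float(samples_taken));
}
```

A pixel that receives identical samples converges immediately after the minimum sample count, while a pixel whose two buffers disagree keeps sampling.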
ea8e0df672 Fix T55054: possible use of unsupported instructions in Cycles texture code
Differential Revision: https://developer.blender.org/D5326
2019-08-16 16:49:04 +02:00
e12c08e8d1 ClangFormat: apply to source, most of intern
Apply clang format as proposed in T53211.

For details on usage and instructions for migrating branches
without conflicts, see:

https://wiki.blender.org/wiki/Tools/ClangFormat
2019-04-17 06:21:24 +02:00
fccf506ed7 Cycles: animation denoising support in the kernel.
This is the internal implementation, not available from the API or
interface yet. The algorithm takes into account past and future frames,
both to get more coherent animation and reduce noise.

Ref D3889.
2019-02-06 15:18:42 +01:00
405cacd4cd Cycles: prefilter feature passes separate from denoising.
Prefiltering of feature passes will happen during rendering, which can
then be used for denoising immediately or written as a render pass for
later (animation) denoising.

The number of denoising data passes written is reduced because of this,
leaving out the feature variance passes. The passes are now Normal,
Albedo, Depth, Shadowing, Variance and Intensity.

Ref D3889.
2019-02-06 15:18:29 +01:00
203de0bbf0 Cycles: Cleanup, space after (void)
It was used in like 95% of places.
2018-11-09 12:08:51 +01:00
cb4b5e12ab Cycles: Cleanup, spacing after preprocessor
There are supposed to be two spaces before the comment stating which
#if an #else/#endif statement corresponds to. This was mainly violated
in the header guards.
2018-11-09 11:34:54 +01:00
a0cc7bd961 Cycles: Implement vectorized NLM kernels for faster CPU denoising
2018-10-06 21:49:54 +02:00
1daa20ad9f Cleanup: strip trailing space for cycles
2018-07-06 10:17:58 +02:00
Stefan Werner
4d00e95ee3 Cycles: Adding native support for UINT16 textures.
Textures in 16 bit integer format are sometimes used for displacement, bump and normal maps and can be exported by tools like Substance Painter. Without this patch, Cycles would promote those textures to single precision floating point, causing them to take up twice as much memory as needed.

Reviewers: #cycles, brecht, sergey

Reviewed By: #cycles, brecht, sergey

Subscribers: sergey, dingto, #cycles

Tags: #cycles

Differential Revision: https://developer.blender.org/D3523
2018-07-05 13:53:34 +02:00
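The memory argument in 4d00e95ee3 can be illustrated with a short C++ sketch: the texel stays 16 bits wide in storage and is only converted to float when it is read. The function names are hypothetical, and the normalization to [0, 1] is an assumption for this sketch rather than a statement about the Cycles lookup code.

```cpp
#include <cstddef>
#include <cstdint>

// Convert a stored 16-bit texel to a float in [0, 1] at lookup time.
inline float example_ushort_to_float(std::uint16_t texel)
{
  return float(texel) / 65535.0f;
}

// Bytes needed to store a texture with the given texel type.
template<typename Texel>
constexpr std::size_t example_texture_bytes(std::size_t width,
                                            std::size_t height,
                                            std::size_t channels)
{
  return width * height * channels * sizeof(Texel);
}
```

For the same dimensions, `example_texture_bytes<float>` is twice `example_texture_bytes<std::uint16_t>`, which is the saving the commit describes.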
9db8bdbc65 Cycles Denoising: Cleanup: Rename tiles to tile_info
2018-07-04 14:37:24 +02:00
3ee606621c Cycles: Query XYZ to/from Scene Linear conversion from OCIO instead of assuming sRGB
I've limited it to just the RGB<->XYZ stuff for now, correct image handling is the next step.

Reviewers: brecht, sergey

Differential Revision: https://developer.blender.org/D3478
2018-06-14 22:21:37 +02:00
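The idea in 3ee606621c, using a conversion supplied by the color configuration rather than a hard-coded one, can be sketched as below. All names are hypothetical and no OCIO call is shown; how the matrix is actually queried is outside this sketch. The matrix in `example_srgb_to_xyz_matrix` is the standard linear sRGB (Rec.709 primaries, D65) to XYZ matrix and serves only as an example of one possible configuration.

```cpp
struct ExampleFloat3 {
  float x, y, z;
};

// One row per output component (X, Y, Z).
struct ExampleColorMatrix {
  ExampleFloat3 row_x, row_y, row_z;
};

inline float example_dot(const ExampleFloat3 &a, const ExampleFloat3 &b)
{
  return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Convert scene linear RGB to XYZ with whatever matrix the configuration
// provides, instead of assuming one particular color space.
inline ExampleFloat3 example_rgb_to_xyz(const ExampleColorMatrix &m, const ExampleFloat3 &rgb)
{
  return {example_dot(m.row_x, rgb), example_dot(m.row_y, rgb), example_dot(m.row_z, rgb)};
}

// Standard linear sRGB (D65) to XYZ matrix, used here only as an example.
inline ExampleColorMatrix example_srgb_to_xyz_matrix()
{
  return {{0.4124564f, 0.3575761f, 0.1804375f},
          {0.2126729f, 0.7151522f, 0.0721750f},
          {0.0193339f, 0.1191920f, 0.9503041f}};
}
```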
81060ff6b2 Windows: Add support for building with clang.
This commit contains the minimum to make clang build and work with Blender. ASAN and Ninja build support is forthcoming.

Things to note:

1) Builds and runs, and is able to pass all tests (except for the freestyle_stroke_material.blend test which was broken at that time for all platforms by the looks of it)

2) It's slightly faster than msvc when using cycles. (time in seconds, on an i7-3370)

victor_cpu
	msvc:3099.51
	clang:2796.43

pavillon_barcelona_cpu
	msvc:1872.05
	clang:1827.72

koro_cpu
	msvc:1097.58
	clang:1006.51

fishy_cat_cpu
	msvc:815.37
	clang:722.2

classroom_cpu
	msvc:1705.39
	clang:1575.43

bmw27_cpu
	msvc:552.38
	clang:561.53

barbershop_interior_cpu
	msvc:2134.93
	clang:1922.33

3) clang on windows uses a drop in replacement for the Microsoft cl.exe (takes some of the Microsoft parameters, but not all, and takes some of the clang parameters but not all) and uses ms headers + libraries + linker, so you still need visual studio installed and will use our existing vc14 svn libs.

4) X64 only currently, X86 builds but crashes on startup.

5) Tested with llvm/clang 6.0.0

6) Requires visual studio integration, available at https://github.com/LazyDodo/llvm-vs2017-integration

7) The Microsoft compiler spawns a few copies of cl in parallel to get faster build times, clang doesn't, so the build time is 3-4x slower than with msvc.

8) No openmp support yet. Have not looked at this much, the binary distribution of clang doesn't seem to include it on windows.

9) No ASAN support yet, some of the sanitizers can be made to work, but it was decided to leave support out of this commit.

Reviewers: campbellbarton

Differential Revision: https://developer.blender.org/D3304
2018-05-28 14:34:47 -06:00
1dcd7db73d Code cleanup: remove some more unused code after recent CUDA changes.
2018-02-18 00:53:03 +01:00
54632dc830 Cycles: Remove util_debug include from kernel code
Not sure why it was in there, all the debug flags stuff is to be handled outside
of kernel.
2018-01-19 15:21:34 +01:00
2e8914549b Cycles: Fix difference in image Clip extension method between CPU and GPU
Our own implementation was behaving differently compared to OSL and GPU,
namely on the border pixels OSL and CUDA were doing interpolation with
black, but we were clamping the coordinate.

This partially fixes issue reported in T53452.

Similar change should also be done for 3D interpolation perhaps, but this
is to be investigated separately.
2017-12-08 12:03:11 +01:00
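The border behavior described in 2e8914549b can be sketched in C++ as below: with Clip extension, texels outside the image read as black, so a lookup at the border blends with black instead of repeating the edge texel. The types and names are hypothetical, and the texel-center convention (centers at +0.5) is an assumption of this sketch.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical single-channel image.
struct ExampleImage {
  int width, height;
  std::vector<float> texels;
};

// Clip extension: reads outside the image return black.
inline float example_read_clip(const ExampleImage &img, int x, int y)
{
  if (x < 0 || y < 0 || x >= img.width || y >= img.height) {
    return 0.0f;
  }
  return img.texels[static_cast<std::size_t>(y) * img.width + x];
}

// Bilinear lookup at normalized coordinates (u, v).
inline float example_bilinear_clip(const ExampleImage &img, float u, float v)
{
  const float fx = u * img.width - 0.5f;
  const float fy = v * img.height - 0.5f;
  const int ix = int(std::floor(fx));
  const int iy = int(std::floor(fy));
  const float tx = fx - ix;
  const float ty = fy - iy;

  /* Neighbors outside the image contribute black rather than a clamped texel. */
  const float row0 = (1.0f - tx) * example_read_clip(img, ix, iy) +
                     tx * example_read_clip(img, ix + 1, iy);
  const float row1 = (1.0f - tx) * example_read_clip(img, ix, iy + 1) +
                     tx * example_read_clip(img, ix + 1, iy + 1);
  return (1.0f - ty) * row0 + ty * row1;
}
```

For a 1x1 white image, a lookup at the left edge (u = 0) gives 0.5 with this scheme, whereas clamping the coordinate would give 1.0.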
f31fb4a014 Cycles: Cleanup, split 2D interpolation function
2017-12-08 11:22:04 +01:00
fa3d50af95 Cycles: Improve denoising speed on GPUs with small tile sizes
Previously, the NLM kernels would be launched once per offset with one thread per pixel.
However, with the smaller tile sizes that are now feasible, there wasn't enough work to fully occupy GPUs which results in a significant slowdown.

Therefore, the kernels are now launched in a single call that handles all offsets at once.
This has two downsides: Memory accesses to accumulating buffers are now atomic, and more importantly, the temporary memory now has to be allocated for every shift at once, increasing the required memory.
On the other hand, of course, the smaller tiles significantly reduce the size of the memory.

The main bottleneck right now is the construction of the transformation - there is nothing to be parallelized there, one thread per pixel is the maximum.
I tried to parallelize the SVD implementation by storing the matrix in shared memory and launching one block per pixel, but that wasn't really going anywhere.

To make the new code somewhat readable, the handling of rectangular regions was cleaned up a bit and commented, it should be easier to understand what's going on now.
Also, some variables have been renamed to make the difference between buffer width and stride more apparent, in addition to some general style cleanup.
2017-11-30 07:37:08 +01:00
5801ef71e4 Code refactor: device memory cleanups, preparing for mapped host memory.
2017-11-05 15:22:04 +01:00
7ad9333fad Code refactor: store device/interp/extension/type in each device_memory.
2017-10-24 01:03:59 +02:00
2d92988f6b Cycles: CUDA bicubic and tricubic texture interpolation support.
While cubic interpolation is quite expensive on the CPU compared to linear
interpolation, the difference on the GPU is quite small.
2017-10-07 15:30:57 +02:00
23098cda99 Code refactor: make texture code more consistent between devices.
* Use common TextureInfo struct for all devices, except CUDA fermi.
* Move image sampling code to kernels/*/kernel_*_image.h files.
* Use arrays for data textures on Fermi too, so device_vector<Struct> works.
2017-10-07 14:53:14 +02:00
fb99ea79f8 Code refactor: split displace/background into separate kernels, remove luma.
2017-10-05 17:57:58 +02:00
12f4538205 Code refactor: use split variance calculation for mega kernels too.
There is no significant difference in denoised benchmark scenes and
denoising ctests, so might as well make it all consistent.
2017-10-04 21:11:14 +02:00
e3e16cecc4 Code refactor: remove rng_state buffer and compute hash on the fly.
A little faster on some benchmark scenes, a little slower on others, seems
about performance neutral on average and saves a little memory.
2017-10-04 21:11:14 +02:00
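The idea in e3e16cecc4, deriving the per-pixel random state from a hash when it is needed instead of reading it from a buffer, can be sketched as below. The mixing function is the 32-bit finalizer known from MurmurHash3 and is an assumption for this illustration; it is not claimed to be the hash Cycles uses, and the function names are hypothetical.

```cpp
#include <cstdint>

// 32-bit integer mixing function (MurmurHash3 finalizer), chosen for this
// sketch only.
inline std::uint32_t example_mix(std::uint32_t h)
{
  h ^= h >> 16;
  h *= 0x85ebca6bu;
  h ^= h >> 13;
  h *= 0xc2b2ae35u;
  h ^= h >> 16;
  return h;
}

// Compute the per-pixel hash on the fly from the pixel index and the seed,
// so no per-pixel state buffer has to be stored.
inline std::uint32_t example_pixel_rng_hash(std::uint32_t pixel_index, std::uint32_t seed)
{
  return example_mix(pixel_index ^ example_mix(seed));
}
```

The trade is a few integer operations per lookup against one stored value per pixel, which matches the "about performance neutral, saves a little memory" outcome reported in the commit.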
c961737d0f Cycles: Fix compilation error of filter kernels on 32 bit Windows
We don't enable global SSE optimizations in the regular kernel, and we
keep those disabled on Linux 32bit.

One possible workaround would be to pass arguments by ccl_ref, but
that is quite a lot of code, which had better be done carefully.
2017-08-08 22:01:17 +02:00
ee77c1e917 Code refactor: use float4 instead of intrinsics for CPU denoise filtering.
Differential Revision: https://developer.blender.org/D2764
2017-08-07 14:01:24 +02:00
ea846a4dfc Cycles: Add kernel to enqueue inactive rays
The queue will be used to reuse inactive threads and keep
the GPU busier.
2017-06-10 03:51:18 -04:00
705c43be0b Cycles Denoising: Merge outlier heuristic and confidence interval test
The previous outlier heuristic only checked whether the pixel is more than
twice as bright compared to the 75% quantile of the 5x5 neighborhood.
While this detected fireflies robustly, it also incorrectly marked a lot of
legitimate small highlights as outliers and filtered them away.

This commit adds an additional condition for marking a pixel as a firefly:
In addition to being above the reference brightness, the lower end of the
3-sigma confidence interval has to be below it.
Since the lower end approximates how low the true value of the pixel might be,
this test separates pixels that are supposed to be very bright from pixels that
are very bright due to random fireflies.

Also, since there is now a reliable outlier filter as a preprocessing step,
the additional confidence interval test in the reconstruction kernel is no
longer needed.
2017-06-09 03:46:11 +02:00
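The combined condition from 705c43be0b can be written as a small C++ predicate. The function name and parameters are hypothetical; `reference` stands for the reference brightness from the quantile heuristic, and `variance` for the per-pixel variance estimate, both assumed to be computed elsewhere.

```cpp
#include <algorithm>
#include <cmath>

// A pixel is marked as a firefly only if it is above the reference brightness
// AND the lower end of its 3-sigma confidence interval is below it.
inline bool example_is_firefly(float pixel, float variance, float reference)
{
  const float sigma = std::sqrt(std::max(variance, 0.0f));
  const float lower_bound = pixel - 3.0f * sigma;
  return pixel > reference && lower_bound < reference;
}
```

A bright pixel with low variance (a legitimate highlight) keeps its lower bound above the reference and is left alone, while an equally bright pixel with high variance is flagged.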
90a62404cb Cycles: Cleanup, variable names
Don't use camel case for variable names. Leave that for the structures.
2017-05-19 12:52:12 +02:00
740cd28748 Cycles Denoising: Add more robust outlier heuristic to avoid artifacts
Extremely bright pixels in the rendered image cause the denoising algorithm
to produce extremely noticeable artifacts. Therefore, a heuristic is needed
to exclude these pixels from the filtering process.

The new approach calculates the 75th percentile of the 5x5 neighborhood of
each pixel and flags the pixel if it is more than twice as bright.

During the reconstruction process, flagged pixels are skipped. Therefore,
they don't cause any problems for neighboring pixels, and the outlier pixels
themselves are replaced by a prediction of their actual value based on their
feature pass values and the neighboring pixels.

Therefore, the denoiser now also works as a smarter despeckling filter that
uses a more accurate prediction of the pixel instead of a simple average.
This can be used even if denoising isn't wanted by setting the denoising
radius to 1.
2017-05-18 21:55:56 +02:00
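The flagging step from 740cd28748 can be sketched in C++ as below. The function names are hypothetical, the choice of index 18 of 25 sorted values as the 75th percentile is an assumption of this sketch, and the reconstruction step that replaces flagged pixels is not shown.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// 75th percentile of the 25 brightness values in a 5x5 neighborhood.
// The array is taken by value because nth_element reorders it.
inline float example_percentile75(std::array<float, 25> values)
{
  const std::size_t k = (values.size() * 3) / 4; /* Index 18 of 25. */
  std::nth_element(values.begin(), values.begin() + k, values.end());
  return values[k];
}

// Flag the pixel if it is more than twice as bright as that percentile.
inline bool example_is_outlier(const std::array<float, 25> &neighborhood, float pixel)
{
  return pixel > 2.0f * example_percentile75(neighborhood);
}
```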
966a2681f9 Cycles: Fix building with native only option
Approach suggested by Lukas S.
2017-05-16 16:05:04 -04:00
43b374e8c5 Cycles: Implement denoising option for reducing noise in the rendered image
This commit contains the first part of the new Cycles denoising option,
which filters the resulting image using information gathered during rendering
to get rid of noise while preserving visual features as well as possible.

To use the option, enable it in the render layer options. The default settings
fit a wide range of scenes, but the user can tweak individual settings to
control the tradeoff between a noise-free image, image details, and calculation
time.

Note that the denoiser may still change in the future and that some features
are not implemented yet. The most important missing feature is animation
denoising, which uses information from multiple frames at once to produce a
flicker-free and smoother result. These features will be added in the future.

Finally, thanks to all the people who supported this project:

- Google (through the GSoC) and Theory Studios for sponsoring the development
- The authors of the papers I used for implementing the denoiser (more details
  on them will be included in the technical docs)
- The other Cycles devs for feedback on the code, especially Sergey for
  mentoring the GSoC project and Brecht for the code review!
- And of course the users who helped with testing, reported bugs and things
  that could and/or should work better!
2017-05-07 14:40:58 +02:00
Hristo Gueorguiev
6bf4115c13 Cycles: Split kernel - sort shaders
Reduce thread divergence in kernel_shader_eval.

Rays are sorted in blocks of 2048 according to shader->id.

On R9 290 Classroom is ~30% faster, and Pabellon Barcelona is ~8% faster.

No sorting for CUDA split kernel.

Reviewers: sergey, maiself

Reviewed By: maiself

Differential Revision: https://developer.blender.org/D2598
2017-05-03 15:30:45 +02:00
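The block-wise sorting described in 6bf4115c13 can be illustrated on the host side with the C++ sketch below. It is not the split kernel code: the function name and the index arrays are hypothetical, and a stable sort from the standard library stands in for whatever sorting the kernel performs.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Reorder ray indices so that, within each block of block_size rays, rays
// using the same shader end up next to each other.
inline void example_sort_rays_in_blocks(std::vector<int> &ray_indices,
                                        const std::vector<int> &shader_ids,
                                        std::size_t block_size)
{
  if (block_size == 0) {
    return;
  }
  for (std::size_t begin = 0; begin < ray_indices.size(); begin += block_size) {
    const std::size_t end = std::min(begin + block_size, ray_indices.size());
    std::stable_sort(ray_indices.begin() + begin,
                     ray_indices.begin() + end,
                     [&](int a, int b) { return shader_ids[a] < shader_ids[b]; });
  }
}
```

With the commit's block size this would be called with `block_size` set to 2048. Sorting only inside blocks keeps the cost bounded while still grouping neighboring threads by shader, which is what reduces divergence.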
915766f42d Cycles: Branched path tracing for the split kernel
This implements branched path tracing for the split kernel.

General approach is to store the ray state at a branch point, trace the
branched ray as normal, then restore the state as necessary before iterating
to the next part of the path. A state machine is used to advance the indirect
loop state, which avoids the need to add any new kernels. Each iteration the
state machine recreates as much state as possible from the stored ray to keep
overall storage down.

It's kind of hard to keep all the different integration loops in sync, so this
needs lots of testing to make sure everything is working correctly. We should
probably start trying to deduplicate the integration loops more now.

Nonbranched BMW is ~2% slower, while classroom is ~2% faster, other scenes
could use more testing still.

Reviewers: sergey, nirved

Reviewed By: nirved

Subscribers: Blendify, bliblubli

Differential Revision: https://developer.blender.org/D2611
2017-05-02 14:26:46 -04:00
4245ed360e Cycles: Cleanup, indentation and trailing whitespace and wrapping
2017-04-28 13:21:17 +02:00