Commit Graph

46 Commits

Author SHA1 Message Date
1daa20ad9f Cleanup: strip trailing space for cycles 2018-07-06 10:17:58 +02:00
Stefan Werner
4d00e95ee3 Cycles: Adding native support for UINT16 textures.
Textures in 16 bit integer format are sometimes used for displacement, bump and normal maps and can be exported by tools like Substance Painter. Without this patch, Cycles would promote those textures to single precision floating point, causing them to take up twice as much memory as needed.

Reviewers: #cycles, brecht, sergey

Reviewed By: #cycles, brecht, sergey

Subscribers: sergey, dingto, #cycles

Tags: #cycles

Differential Revision: https://developer.blender.org/D3523
2018-07-05 13:53:34 +02:00
9db8bdbc65 Cycles Denoising: Cleanup: Rename tiles to tile_info 2018-07-04 14:37:24 +02:00
3ee606621c Cycles: Query XYZ to/from Scene Linear conversion from OCIO instead of assuming sRGB
I've limited it to just the RGB<->XYZ stuff for now, correct image handling is the next step.

Reviewers: brecht, sergey

Differential Revision: https://developer.blender.org/D3478
2018-06-14 22:21:37 +02:00
9e717c0495 Cycles: Remove Fermi texture code.
This should be the last Fermi removal commit, unless I missed something.
It's been a pleasure Fermi!
2018-02-17 22:56:58 +01:00
e1ef902058 Cycles: Remove fermi related defines from the code.
Did not touch Texture related defines, that comes next.
2018-02-17 22:19:54 +01:00
fa3d50af95 Cycles: Improve denoising speed on GPUs with small tile sizes
Previously, the NLM kernels would be launched once per offset with one thread per pixel.
However, with the smaller tile sizes that are now feasible, there wasn't enough work to fully occupy GPUs which results in a significant slowdown.

Therefore, the kernels are now launched in a single call that handles all offsets at once.
This has two downsides: Memory accesses to accumulating buffers are now atomic, and more importantly, the temporary memory now has to be allocated for every shift at once, increasing the required memory.
On the other hand, of course, the smaller tiles significantly reduce the size of the memory.

The main bottleneck right now is the construction of the transformation - there is nothing to be parallelized there, one thread per pixel is the maximum.
I tried to parallelize the SVD implementation by storing the matrix in shared memory and launching one block per pixel, but that wasn't really going anywhere.

To make the new code somewhat readable, the handling of rectangular regions was cleaned up a bit and commented, it should be easier to understand what's going on now.
Also, some variables have been renamed to make the difference between buffer width and stride more apparent, in addition to some general style cleanup.
2017-11-30 07:37:08 +01:00
Stefan Werner
58a15b2bfe Cycles: Fixed compilation of CUDA kernels. Follow-up fix for my last commit. 2017-11-21 10:43:40 +01:00
Stefan Werner
1febc85855 Cycles: Workaround for performance loss with the CUDA 9.0 SDK.
CUDA 9.0.176 apparently caused some slow down on high-end Pascal cards that can be mitigated by increasing the number of registers. See https://developer.blender.org/F1142667 for a detailed comparison.
2017-11-21 10:29:11 +01:00
2e50add164 Fix OpenCL performance regression after cubic interpolation.
Reorganize code to reduce register pressure.
2017-10-15 17:46:50 +02:00
8d73ba58b6 Cycles: Fix compilation of sm_20 and sm_21 kernels
Was broken since the bicubic commit for GPU support.
2017-10-10 12:26:02 +05:00
2d92988f6b Cycles: CUDA bicubic and tricubic texture interpolation support.
While cubic interpolation is quite expensive on the CPU compared to linear
interpolation, the difference on the GPU is quite small.
2017-10-07 15:30:57 +02:00
23098cda99 Code refactor: make texture code more consistent between devices.
* Use common TextureInfo struct for all devices, except CUDA fermi.
* Move image sampling code to kernels/*/kernel_*_image.h files.
* Use arrays for data textures on Fermi too, so device_vector<Struct> works.
2017-10-07 14:53:14 +02:00
fb99ea79f8 Code refactor: split displace/background into separate kernels, remove luma. 2017-10-05 17:57:58 +02:00
6da6f8d33f Cycles: CUDA faster rendering of small tiles, using multiple samples like OpenCL.
The work size is still very conservative, and this doesn't help for progressive
refine. For that we will need to render multiple tiles at the same time. But this
should already help for denoising renders that require too much memory with big
tiles, and just generally soften the performance dropoff with small tiles.

Differential Revision: https://developer.blender.org/D2856
2017-10-04 21:58:47 +02:00
12f4538205 Code refactor: use split variance calculation for mega kernels too.
There is no significant difference in denoised benchmark scenes and
denoising ctests, so might as well make it all consistent.
2017-10-04 21:11:14 +02:00
e3e16cecc4 Code refactor: remove rng_state buffer and compute hash on the fly.
A little faster on some benchmark scenes, a little slower on others, seems
about performance neutral on average and saves a little memory.
2017-10-04 21:11:14 +02:00
5b7d6ea54b Code refactor: add WorkTile struct for passing work to kernel.
This makes sharing some code between mega/split in following commits a bit
easier, and also paves the way for rendering multiple tiles later.
2017-10-04 21:11:14 +02:00
45dcd20ca9 Cycles: CUDA split performance tweaks, still far from megakernel.
On Pabellon, 25.8s mega, 35.4s split before, 32.7s split after.
2017-08-05 14:32:59 +02:00
ea846a4dfc Cycles: Add kernel to enqueue inactive rays
The queue will be used to make reuse of inactive threads to keep
the GPU more busy.
2017-06-10 03:51:18 -04:00
705c43be0b Cycles Denoising: Merge outlier heuristic and confidence interval test
The previous outlier heuristic only checked whether the pixel is more than
twice as bright compared to the 75% quantile of the 5x5 neighborhood.
While this detected fireflies robustly, it also incorrectly marked a lot of
legitimate small highlights as outliers and filtered them away.

This commit adds an additional condition for marking a pixel as a firefly:
In addition to being above the reference brightness, the lower end of the
3-sigma confidence interval has to be below it.
Since the lower end approximates how low the true value of the pixel might be,
this test separates pixels that are supposed to be very bright from pixels that
are very bright due to random fireflies.

Also, since there is now a reliable outlier filter as a preprocessing step,
the additional confidence interval test in the reconstruction kernel is no
longer needed.
2017-06-09 03:46:11 +02:00
90a62404cb Cycles: Cleanup, variable names
Don't use camel case for variable names. Leave that for the structures.
2017-05-19 12:52:12 +02:00
de86da521c Cycles: Cleanup, braces after function definition
I wouldn't mind switching fully to Google style, but i am against of
mixing two different styles in same project. So just stick to brace
at the new line after function definition.
2017-05-19 12:43:26 +02:00
803337f3f6 \0;115;0cCycles: Cleanup, use ccl_restrict instead of ccl_restrict_ptr
There were following issues with ccl_restrict_ptr:

- We already had ccl_restrict for all platforms.

- It was secretly adding `const` qualifier to the declaration,
  which is quite weird since non-const pointer can also be
  declared as restricted.

- We never in Blender are using foo_ptr or FooPtr type definitions,
  so not sure why we should introduce such a thing here.

- It is absolutely wrong from semantic point of view to put pointer
  into the restrict macro -- const is a part of type, not part of
  hint for compiler that some pointer is never aliased.
2017-05-19 12:41:03 +02:00
740cd28748 Cycles Denoising: Add more robust outlier heuristic to avoid artifacts
Extremely bright pixels in the rendered image cause the denoising algorithm
to produce extremely noticable artifacts. Therefore, a heuristic is needed
to exclude these pixels from the filtering process.

The new approach calculates the 75% percentile of the 5x5 neighborhood of
each pixel and flags the pixel if it is more than twice as bright.

During the reconstruction process, flagged pixels are skipped. Therefore,
they don't cause any problems for neighboring pixels, and the outlier pixels
themselves are replaced by a prediction of their actual value based on their
feature pass values and the neighboring pixels.

Therefore, the denoiser now also works as a smarter despeckling filter that
uses a more accurate prediction of the pixel instead of a simple average.
This can be used even if denoising isn't wanted by setting the denoising
radius to 1.
2017-05-18 21:55:56 +02:00
43b374e8c5 Cycles: Implement denoising option for reducing noise in the rendered image
This commit contains the first part of the new Cycles denoising option,
which filters the resulting image using information gathered during rendering
to get rid of noise while preserving visual features as well as possible.

To use the option, enable it in the render layer options. The default settings
fit a wide range of scenes, but the user can tweak individual settings to
control the tradeoff between a noise-free image, image details, and calculation
time.

Note that the denoiser may still change in the future and that some features
are not implemented yet. The most important missing feature is animation
denoising, which uses information from multiple frames at once to produce a
flicker-free and smoother result. These features will be added in the future.

Finally, thanks to all the people who supported this project:

- Google (through the GSoC) and Theory Studios for sponsoring the development
- The authors of the papers I used for implementing the denoiser (more details
  on them will be included in the technical docs)
- The other Cycles devs for feedback on the code, especially Sergey for
  mentoring the GSoC project and Brecht for the code review!
- And of course the users who helped with testing, reported bugs and things
  that could and/or should work better!
2017-05-07 14:40:58 +02:00
Hristo Gueorguiev
6bf4115c13 Cycles: Split kernel - sort shaders
Reduce thread divergence in kernel_shader_eval.

Rays are sorted in blocks of 2048 according to shader->id.

On R9 290 Classroom is ~30% faster, and Pabellon Barcelone is ~8% faster.

No sorting for CUDA split kernel.

Reviewers: sergey, maiself

Reviewed By: maiself

Differential Revision: https://developer.blender.org/D2598
2017-05-03 15:30:45 +02:00
915766f42d Cycles: Branched path tracing for the split kernel
This implements branched path tracing for the split kernel.

General approach is to store the ray state at a branch point, trace the
branched ray as normal, then restore the state as necessary before iterating
to the next part of the path. A state machine is used to advance the indirect
loop state, which avoids the need to add any new kernels. Each iteration the
state machine recreates as much state as possible from the stored ray to keep
overall storage down.

Its kind of hard to keep all the different integration loops in sync, so this
needs lots of testing to make sure everything is working correctly. We should
probably start trying to deduplicate the integration loops more now.

Nonbranched BMW is ~2% slower, while classroom is ~2% faster, other scenes
could use more testing still.

Reviewers: sergey, nirved

Reviewed By: nirved

Subscribers: Blendify, bliblubli

Differential Revision: https://developer.blender.org/D2611
2017-05-02 14:26:46 -04:00
0579eaae1f Cycles: Make all #include statements relative to cycles source directory
The idea is to make include statements more explicit and obvious where the
file is coming from, additionally reducing chance of wrong header being
picked up.

For example, it was not obvious whether bvh.h was refferring to builder
or traversal, whenter node.h is a generic graph node or a shader node
and cases like that.

Surely this might look obvious for the active developers, but after some
time of not touching the code it becomes less obvious where file is coming
from.

This was briefly mentioned in T50824 and seems @brecht is fine with such
explicitness, but need to agree with all active developers before committing
this.

Please note that this patch is lacking changes related on GPU/OpenCL
support. This will be solved if/when we all agree this is a good idea to move
forward.

Reviewers: brecht, lukasstockner97, maiself, nirved, dingto, juicyfruit, swerner

Reviewed By: lukasstockner97, maiself, nirved, dingto

Subscribers: brecht

Differential Revision: https://developer.blender.org/D2586
2017-03-29 13:41:11 +02:00
1cad64900e Cycles: Define ccl_local variables in kernel functions
Declaring ccl_local in a device function is not supported
by certain compilers.
2017-03-16 11:27:17 +01:00
96868a3941 Fix T50888: Numeric overflow in split kernel state buffer size calculation
Overflow led to the state buffer being too small and the split kernel to
get stuck doing nothing forever.
2017-03-11 05:39:28 -05:00
4a2cde3f0e Cycles: Enable SSS and volumes for CUDA and Nvidia OpenCL split kernel 2017-03-10 02:09:41 -05:00
306034790f Cycles: Calculate size of split state buffer kernel side
By calculating the size of the state buffer in the kernel rather than the host
less code is needed and the size actually reflects the requested features.

Will also be a little faster in some cases because of larger global work size.
2017-03-08 01:31:30 -05:00
cd7d5669d1 Cycles: Remove sum_all_radiance kernel
This was only needed for the previous implementation of parallel samples. As
we don't have that any more it can be removed.

Real reason for removal tho is this: `per_sample_output_buffers` was being
calculated too small and artifacts resulted. The tile buffer is already
the correct size and calculating the size for `per_sample_output_buffers`
is a bit difficult with the current layout of the code. As
`per_sample_output_buffers` was only needed for `sum_all_radiance`,
removing that kernel and writing output to the tile buffer directly
fixes the artifacts.
2017-03-08 01:31:07 -05:00
4cf501b835 Cycles: Split path initialization into own kernel
This makes it easier to initialize things correctly in the data_init kernel
before they are needed by path tracing.
2017-03-08 01:30:43 -05:00
817873cc83 Cycles: CUDA implementation of split kernel 2017-03-08 01:24:53 -05:00
dde40989f3 Cycles: Store shadow intersections in the kernel globals
Seems CUDA failed to de-duplicate the array across multiple inlined
versions of the shadow_blocked(). Helped it a bit with that now.

Gives about 100MB memory improvement on a scenes after previous
commit and brings up memory "regression" to only 100MB comparing to
the master branch now.
2017-02-08 14:00:48 +01:00
e26eb9c93b Cycles: reduce CUDA stack memory access for Maxwell and up, increasing max registers.
For non-branched path tracing with a GTX 960 and CUDA 7.5, this gives a small reduction
in stack usage but mainly: 8% faster render on BMW, 5% on pabellon, 13% on classroom.
2016-06-19 20:17:26 +02:00
4422b3f919 Some fixes for CUDA runtime compile:
* When Baking wasn't used we got an error.
* On top of Volume Nodes (NODES_FEATURE_VOLUME), we now also check if we need volume sampling code,
so we can disable that as well and save some further compilation time.
2016-05-06 23:13:33 +02:00
700722f686 Cycles: Cleanup, indent nested preprocessor directives
Quite straightforward, main trick is happening in path_source_replace_includes().

Reviewers: brecht, dingto, lukasstockner97, juicyfruit

Differential Revision: https://developer.blender.org/D1794
2016-03-25 13:55:42 +01:00
ff0dcc5d70 Cycles: Make kernel compilable for 3.7 compute capability
It is used by GK210 GPUs which could be found in, i.e. Tesla K80.
2016-01-28 11:56:09 +01:00
Dalai Felinto
9a76354585 Cycles-Bake: Custom Baking passes
The combined pass is built with the contributions the user finds fit.

It is useful for lightmap baking, as well as non-view dependent effects
baking.

The manual will be updated once we get closer to the 2.77 release.
Meanwhile the new page can be found here:

http://dalaifelinto.com/blender-manual/render/cycles/baking.html

Reviewers: sergey, brecht

Differential Revision: https://developer.blender.org/D1674
2016-01-15 13:00:56 -02:00
3918c8b9a5 Cycles: Optionally output luminance from the shader evaluation kernel
This makes it possible to move some parts of evaluation from host to the device
and hopefully reduce memory usage by avoid having full RGBA buffer on the host.

Reviewers: juicyfruit, lukasstockner97, brecht

Reviewed By: lukasstockner97, brecht

Differential Revision: https://developer.blender.org/D1702
2015-12-30 19:04:04 +05:00
de0672436b Add support for compiling the cuda kernel on the Nvidia Jetson TX1 2015-12-07 17:51:24 +01:00
099aaea447 Cycles: Move branched path tracking into own file
Code there started becoming a bit too big, by splitting it up it'll make it
easier to do improvements or extending the features in there.

The layout is not totally final yet, would need to try de-duplicating parts
of code from split kernel with non-split integrators,
2015-06-15 23:02:42 +02:00
2c503d8303 Cycles: Restructure kernel files organization
Since the kernel split work we're now having quite a few of new files, majority
of which are related on the kernel entry points. Keeping those files in the
root kernel folder will eventually make it really hard to follow which files are
actual implementation of Cycles kernel.

Those files are now moved to kernel/kernels/<device_type>. This way adding extra
entry points will be less noisy. It is also nice to have all device-specific
files grouped together.

Another change is in the way how split kernel invokes logic. Previously all the
logic was implemented directly in the .cl files, which makes it a bit tricky to
re-use the logic across other devices. Since we'll likely be looking into doing
same split work for CUDA devices eventually it makes sense to move logic from
.cl files to header files. Those files are stored in kernel/split. This does not
mean the header files will not give error messages when tried to be included
from other devices and their arguments will likely be changed, but having such
separation is a good start anyway.

There should be no functional changes.

Reviewers: juicyfruit, dingto

Differential Revision: https://developer.blender.org/D1314
2015-05-22 16:31:34 +05:00