Commit Graph

1960 Commits

Author SHA1 Message Date
31e96cbf96 Cleanup: style, spelling 2015-05-15 23:38:53 +10:00
c86a6f3efb Cycles: Enable CMJ for Intel/NVidia experimental split kernels
It is still disabled for AMD devices since can't test if it works fine
on this hardware.
2015-05-15 13:22:47 +05:00
2ab909a88c Cycles: Make experimental kernel build option more generic
Previously it was explicitly mentioning it's NVidia kernel related option,
but in fact it's also handy for the OpenCL kernel.
2015-05-15 13:22:47 +05:00
c9e8888f87 Cycles: Disable bake OpenCL kernel for NVidia devices prior to sm_30
Driver fails to compile kernel in reasonable time for those devices here,
so for easier testing of the OpenCL split kernel work disabling bake kernel
for now.
2015-05-15 13:22:47 +05:00
3c10ec96b5 Cycles: Enable object motion blur on Intel OpenCL platform
This required allocating some memory related on object transform needed
by ShaderData and currently it is done for all the platforms. Since we're
targeting full feature-complete platforms this is rather acceptable at
this point and in the future we'll do selective NO_HAIR/NO_SSS/NO_BLUR
kernels.

This is experimental still and in fact there're some major issues on
NVidia platform and it's not really clear if it's a bug in compiler,
some uninitizlied variable or other kind of issue.
2015-05-15 00:48:12 +05:00
f6c6dd44de Cycles: Remove meaningless ifdef checks for features in device_opencl
This file was actually checking for features enabled on CPU and surely all
of them were enabled, so removing them does not cause any difference.

ideally we'll need to do runtime feature detection and just pass some stuff
as NULL to the kernel, or maybe also have variadic kernel entry points which
is also possible quite easily.
2015-05-14 23:44:19 +05:00
5c34266383 Cycles: Enable camera motion blur in split kernel for Intel/NVidia
It's good for testing and seems to work quite reliably here.

This probably not totally cheap in terms of performance, but this we
could solve quite easily by selective kernel compilation once other
things are tested/proved to be reliable.
2015-05-14 23:35:19 +05:00
3d3d805b64 Cycles: Prepare code for OpenCL camera/motion blur
The kernels are now compiling just fine, but there're some issues
during rendering. This is still to be investigated.
2015-05-14 18:48:56 +05:00
5a63edb929 Cycles: Use special _auto versions of transform function in motion blur code
Doing this as a separate commit so it's easier to revert in the future, once
OpenCL 2.0 is becoming our requirement.
2015-05-14 18:48:56 +05:00
79aa50dc53 Cycles: Enable hair for split kernels when using Intel or NVidia drivers
Apart from simply enabling this features needed changes to the code were done.
Technical change, replacing SD access from "simple" structure to SOA.
2015-05-14 18:48:56 +05:00
fc31bae66f Cleanup: Avoid temp variable in portal sampling code. 2015-05-13 19:54:52 +02:00
0a6e32173e Cleanup / Cycles: De-Duplicate Portal data fetch and side check. 2015-05-13 16:05:30 +02:00
583fd3af65 Cycles: Fix typo in global space version of normal transform
It was using direction transform, which is obviously wrong.
2015-05-10 00:53:32 +05:00
2840a5de8f Cycles: Workaround for AMD compiler crashing building the split kernel
It's a but in compiler but it's nice to have working kernel for until
that bug is fixed.
2015-05-09 19:56:38 +05:00
7f4479da42 Cycles: OpenCL kernel split
This commit contains all the work related on the AMD megakernel split work
which was mainly done by Varun Sundar, George Kyriazis and Lenny Wang, plus
some help from Sergey Sharybin, Martijn Berger, Thomas Dinges and likely
someone else which we're forgetting to mention.

Currently only AMD cards are enabled for the new split kernel, but it is
possible to force split opencl kernel to be used by setting the following
environment variable: CYCLES_OPENCL_SPLIT_KERNEL_TEST=1.

Not all the features are supported yet, and that being said no motion blur,
camera blur, SSS and volumetrics for now. Also transparent shadows are
disabled on AMD device because of some compiler bug.

This kernel is also only implements regular path tracing and supporting
branched one will take a bit. Branched path tracing is exposed to the
interface still, which is a bit misleading and will be hidden there soon.

More feature will be enabled once they're ported to the split kernel and
tested.

Neither regular CPU nor CUDA has any difference, they're generating the
same exact code, which means no regressions/improvements there.

Based on the research paper:

  https://research.nvidia.com/sites/default/files/publications/laine2013hpg_paper.pdf

Here's the documentation:

  https://docs.google.com/document/d/1LuXW-CV-sVJkQaEGZlMJ86jZ8FmoPfecaMdR-oiWbUY/edit

Design discussion of the patch:

  https://developer.blender.org/T44197

Differential Revision: https://developer.blender.org/D1200
2015-05-09 19:52:40 +05:00
6fc1669679 Cycles: Initial work towards selective nodes support compilation
The goal is to be able to compile kernel with nodes which are actually needed
to render current scene, hence improving performance of the kernel,

The idea is:

- Have few node groups, starting with a group which contains nodes are used
  really often, and then couple of groups which will be extension of this one.

- Have feature-based nodes disabling, so it's possible to disable nodes related
  to features which are not used with the currently used nodes group.

This commit only lays down needed routines for this approach, actual split will
happen later after gathering statistics from bunch of production scenes.
2015-05-09 19:22:16 +05:00
5068f7dc01 Cycles: Add utility function to graph to query number of closures used in it
Currently unused but will be needed soon for the split kernel work.
2015-05-09 19:13:32 +05:00
d69c80f717 Cycles: Presumably correct workaround for addrspace in camera motion blur 2015-05-09 19:04:19 +05:00
c9133778cf Cycles: Add CPU compat headers to some of the OSL implementation files
This header was already included into some of the implementation files already,
and this change is needed for some upcoming changes in the way how kernel_types.h
works.
2015-05-09 19:04:16 +05:00
900fc43bb4 Cleanup: Remove unused ray type flags.
They were added for completeness, but it seems we don't need them.
2015-05-08 12:10:26 +02:00
9ca2b76a9f Cycles: Cleanup, make it more clear what endif closes what ifdef 2015-05-07 15:02:43 +05:00
165598e49e Correct typo: ifdef'd now, but obviously wrong 2015-05-07 10:12:12 +10:00
7201f6d14c Cycles: Use curve approximation for blackbody instead of lookup table
Now we calculate color in range 800..12000 using an approximation a/x+bx+c for R and G and ((at + b)t + c)t + d) for B.
Max absolute error for RGB for non-lut function is less than 0.0001, which is enough to get the same 8 bit/channel color as for OSL with a noticeable performance difference.
However there is a slight visible difference between previous non-OSL implementation because of lookup table interpolation and offset-by-one mistake.
The previous implementation gave black color outside of soft range (t > 12000), now it gives the same color as for 12000.

Also blackbody node without input connected is being converted to value input at shader compile time.

Reviewers: dingto, sergey

Reviewed By: dingto

Subscribers: nutel, brecht, juicyfruit

Differential Revision: https://developer.blender.org/D1280
2015-05-05 06:11:54 +00:00
4eab0e72b3 Cleanup: Update some comments and add ToDo. 2015-04-29 23:56:46 +02:00
b3def11f5b Cycles: Record all possible volume intersections for SSS and camera checks
This replaces sequential ray moving followed with scene intersection with
single BVH traversal, which gives us all possible intersections.

Only implemented for CPU, due to qsort and a bigger memory usage on GPU
which we rather avoid. GPU still uses the regular bvh volume intersection code, while CPU now uses the new code.

This improves render performance for scenes with:
a) Camera inside volume mesh
b) SSS mesh intersecting a volume mesh/domain

In simple volume files (not much geometry) performance is roughly the same
(slightly faster). In files with a lot of geometry, the performance
increase is larger. bmps.blend with a volume shader and camera inside the
mesh, it renders ~10% faster here.

Patch by Sergey and myself.

Differential Revision: https://developer.blender.org/D1264
2015-04-29 23:31:06 +02:00
7aab5c6ca9 Cycles: Fix wrong termination criteria in SSS volume stack update
Another issue spotted with Thomas.
2015-04-30 01:20:17 +05:00
5e423775da Cleanup: Move Cycles volume stack update for subsurface into kernel_volume.h. 2015-04-28 11:20:27 +02:00
58a2b10a65 Cycles: Initialize portal variable directly, so we can avoid the one NULL check. 2015-04-27 23:12:53 +02:00
f478c2cfbd Cycles: Added support for light portals
This patch adds support for light portals: objects that help sampling the
environment light, therefore improving convergence. Using them tor other
lights in a unidirectional pathtracer is virtually useless.

The sampling is done with the area-preserving code already used for area lamps.
MIS is used both for combination of different portals and for combining portal-
and envmap-sampling.

The direction of portals is considered, they aren't used if the sampling point
is behind them.

Reviewers: sergey, dingto, #cycles

Reviewed By: dingto, #cycles

Subscribers: Lapineige, nutel, jtheninja, dsisco11, januz, vitorbalbio, candreacchio, TARDISMaker, lichtwerk, ace_dragon, marcog, mib2berlin, Tunge, lopataasdf, lordodin, sergey, dingto

Differential Revision: https://developer.blender.org/D1133
2015-04-28 01:30:16 +05:00
ae7d84dbc1 Cycles: Use native saturate function for CUDA
This more a workaround for CUDA optimizer which can't optimize clamp(x, 0, 1)
into a single instruction and uses 4 instructions instead.

Original patch by @lockal with own modification:

  Don't make changes outside of the kernel. They don't make any difference
  anyway and term saturate() has a bit different meaning outside of kernel.

This gives around 2% of speedup in Barcelona file, but in more complex shader
setups with lots of math nodes with clamping speedup could be much nicer.

Subscribers: dingto

Projects: #cycles

Differential Revision: https://developer.blender.org/D1224
2015-04-28 00:38:32 +05:00
bc160d8a85 Cleanup: Code style. 2015-04-26 00:42:26 +02:00
60c5a2f2d2 Cycles: Add Mirror ball mapping to camera panorama options
The projection code was already in place, so this just exposes the option.

Differential Revision: https://developer.blender.org/D1079
2015-04-25 23:51:56 +02:00
b82d571c85 Cleanup: style 2015-04-21 15:53:32 +10:00
828abaf11c Cycles: Split BVH nodes storage into inner and leaf nodes
This way we can get rid of inefficient memory usage caused by BVH boundbox
part being unused by leaf nodes but still being allocated for them. Doing
such split allows to save 6 of float4 values for QBVH per leaf node and 3
of float4 values for regular BVH per leaf node.

This translates into following memory save using 01.01.01.G rendered
without hair:

                   Device memory size   Device memory peak   Global memory peak
Before the patch:  4957                 5051                 7668
With the patch:    4467                 4562                 7332

The measurements are done against current master. Still need to run speed tests
and it's hard to predict if it's faster or not: on the one hand leaf nodes are
now much more coherent in cache, on the other hand they're not so much coherent
with regular nodes anymore.

Reviewers: brecht, juicyfruit

Subscribers: venomgfx, eyecandy

Differential Revision: https://developer.blender.org/D1236
2015-04-20 17:29:51 +05:00
bf11e362c5 Fix T44046: Cycles speed regression in 2.74 (CPU only)
Issue was caused by MSVC not being able to optimize some code out in the same
way as GCC/Clang does, so now that parts of code are explicitly unfolded in
order to help compilers out.

This makes speed loss much less drastic on my laptop. That's probably as good
as we can do with MSVC without investing infinite amount of time looking trying
to workaround the optimizer.
2015-04-08 18:47:25 +05:00
09a746b857 Cycles: Cleanup, typos 2015-04-08 01:15:38 +05:00
858f54f16e Cycles: Cleanup, indentation 2015-04-07 22:41:08 +05:00
e2354e64d2 Cycles: Cleanup, spaces around assignment operator
Did some bad spacing in recent commits, better to get rid of those so
they does not confuse those who're working on sources.
2015-04-07 00:25:54 +05:00
c1d8ddacaf Cycles: Avoid doing paranoid checks in filepath of builtin images
Originally we thought it's needed in order to distinguish builtin file from
filename which starts with '@', but the filepath is actually full path there
and it's unlikely to have file system where '@' is a proper root character.

Surprisingly this does not give visible speed differences, but it's still
nice to get rid of redundant check.
2015-04-07 00:11:47 +05:00
7c19239bf9 Cycles: Support bultin 3d textures with OSL backend 2015-04-06 23:29:29 +05:00
a9bb8d8a73 Cycles: de-duplicate fast/approximate erf function calculation
Our own implementation is in fact the same performance as in fast_math from
OpenShadingLanguage, but implementation from fast_math is using explicit madd
function, which increases chance of compiler deciding to use intrinsics.
2015-04-06 12:49:44 +05:00
ab2d05d958 Fix T44269: Typo in volume_attribute_float:geom_volume.h
Was rather harmless typo since we either pass both dx,dy or pass both NULL.
2015-04-05 19:07:45 +05:00
b06962fcfe Cycles: Avoid using lookup table for Beckmann slopes on GPU
This patch is based on some work done in D788 and re-formulation from Beckmann
implementation in OpenShadingLanguage.

Skipping texture lookup helps a lot on GPUs where it's more expensive to access
texture memory than to do some extra calculation in threads.

CPU code still uses lookup-table based approach since this seems to be still
faster (at least on computers i've got access to).

This change gives about 2% speedup on BMW scene with GTX560TI.
2015-04-05 19:07:45 +05:00
252b36ce77 Cycles: Remove unused Beckmann slope sampling code
It did not preserve stratification too well and lookup-table approach was
working much better. There are now also some more interesting forumlation
from Wenzel and OpenShadingLanguage which should work better than old code.
2015-04-05 19:07:44 +05:00
e5392069cc Cleanup: Typo fix in HSV code. 2015-04-04 07:50:09 +02:00
f1494edf78 Cycles: Make SSS intersection closer to regular triangle intersection 2015-04-01 21:20:04 +05:00
394b947a50 Cycles: Remove unused direction from triangle intersection functions
This argument was unused and got nicely optimized out. But once it
starts to be using registers are getting stressed really crazy,
causing slow down of render.
2015-04-01 21:08:12 +05:00
af399884e1 Fix T44113: Ashikhmin-Shirley distribution of glossy shader at 0 roughness causes artifacts when background uses MIS
Was a division by zero error, solved in the same way as beckmann/ggx
deals with small roughness values.
2015-04-01 14:21:21 +05:00
79918e0577 Cycles: Avoid float/int conversion in few places 2015-03-31 19:52:14 +05:00
7da4c2637d Cycles: Fix typo in distance heuristic for shadow rays
It's not that bad because this typo could only caused not really
efficient BVH traversal, causing higher render times. Not as if
it was causing render artifacts.
2015-03-31 19:52:14 +05:00