Commit Graph

179 Commits

Author SHA1 Message Date
d0a9ec5efc Cycles: Fix SSS object not properly reflected in glossy object with indirect clamping
This fixes the remaining issues reported in T46908.
2015-12-02 16:00:01 +05:00
6147c4037d Cycles: Fix wrong volume stack after SSS bounce
Was introduced by recent fixes; now it should all be correct, and additionally
it solves the TODO mentioned in the code.
2015-11-28 20:07:34 +05:00
f5d1551b6e Cycles: Fix wrong original ray used for SSS baking
Also de-duplicated some code by moving it to a utility function.
2015-11-28 20:07:34 +05:00
1e43f0d742 Cycles: Set of fixes for delayed SSS ray tracing
There were multiple issues which are solved now:

- It was possible that the ray wouldn't be bounced off the BSSRDF, for example
  when the PDF or shader eval is zero. In this case PathState might have been
  left in a pre-bounced state, which would have given incorrect shading
  results.

  This is solved by having a separate PathState for each of the hits.

- Path radiance summing wasn't happening correctly either: indirect rays
  were using the wrong path radiance when more than one hit was
  recorded.

  This now uses a slightly trickier state machine which calculates path
  radiance for just SSS (both direct and indirect) and then sums it back
  into the final radiance.

- The previous commit wasn't totally correct either; it was a bug induced
  by the wrong path state left over from the ray bounce that never happened.

  There should be no special case happening here: BSSRDFs will be replaced
  with diffuse ones due to the PATH_RAY_DIFFUSE_ANCESTOR flag.

- Merged the codebases for "delayed" and "immediate" indirect SSS ray
  tracing back together, hopefully making the code easier to maintain.

Sure, this change brings memory usage back up by about 4-5%, but overall
it's still about a 2x memory reduction for the experimental kernel here.

Thanks Brecht for the review!
2015-11-28 20:07:34 +05:00
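As a rough illustration of the first two points in commit 1e43f0d742 (a separate PathState per hit, and SSS radiance accumulated on its own before being summed into the final result), here is a minimal Python sketch. It is not the Cycles kernel code: `PathState`, `try_bounce`, `sss_indirect` and the `trace` callback are hypothetical names, and the state is reduced to two fields.

```python
import copy
from dataclasses import dataclass


@dataclass
class PathState:
    # Hypothetical, heavily reduced stand-in for the kernel's path state.
    bounce: int = 0
    diffuse_ancestor: bool = False  # stand-in for PATH_RAY_DIFFUSE_ANCESTOR


def try_bounce(state, pdf):
    """Advance `state` past the BSSRDF; return False if no ray can be shot."""
    if pdf <= 0.0:
        return False  # the bounce did not happen for this hit
    state.bounce += 1
    state.diffuse_ancestor = True
    return True


def sss_indirect(shared_state, hit_pdfs, trace):
    """Sum the indirect SSS radiance over all recorded hits."""
    sss_radiance = 0.0
    for pdf in hit_pdfs:
        # Each hit works on a private copy, so a bounce that does or does
        # not happen for one hit cannot affect the state seen by another.
        hit_state = copy.copy(shared_state)
        if try_bounce(hit_state, pdf):
            sss_radiance += trace(hit_state)
    # The caller adds this partial sum to the final radiance exactly once.
    return sss_radiance
```

With three recorded hits where the second has a zero PDF, only two rays are traced and the shared state is left untouched.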
8919ed3a62 Cycles: Fallback to diffuse BSDF for the indirect SSS rays when BSSRDF is hit
This is actually how it was intended to work; I just didn't notice it wasn't
really happening in the main ray loop.

Solves some memory issues reported in T46880.
2015-11-28 20:07:34 +05:00
20fc9c00fd Cycles: Fully roll-back to non-delayed SSS indirect rays for CPU
There are some issues to be solved with the recent optimization we did for
the indirect rays for the SSS. Those issues will still take a bit of time to
be fully solved and we need to unblock the Caminandes team now, so let's
revert some of the changes.

CUDA will still use delayed indirect rays since it's an experimental
feature.

For details about what remains to be done, please refer to T46880.
2015-11-27 17:15:02 +05:00
175f00c89a Revert "Cycles: Fix wrong SSS with regular path tracing and clamping enabled"
This wasn't really a complete fix and only worked if a single scatter
event was recorded. A proper fix requires some more thought to make it correct
without increasing memory use.

This reverts commit bf9e88bfbe.
2015-11-27 17:15:02 +05:00
bf9e88bfbe Cycles: Fix wrong SSS with regular path tracing and clamping enabled
Radiance sum and reset was happening in different order after 26f1c51.

This is a quick fix to unblock the Caminandes team; perhaps we can avoid having
a separate variable to detect when radiance is to be summed.
2015-11-26 16:11:41 +05:00
26f1c51ca6 Cycles: Trace indirect subsurface rays by restarting the integrator loop
This gives much lower stack usage on GPU and reduces kernel memory size to
around 448MB on GTX560Ti (compared to 652MB with the previous commit and 946MB
with the official release). There's also a barely measurable speedup of around
5%, but this is still to be confirmed.

At this stage we're using only ~3% for the experimental kernel, and SSS
rendering seems to be faster by 40%. After some further testing we might
consider making SSS and CMJ official features and removing the experimental
precompiled kernels.
2015-11-25 13:01:22 +05:00
2a5c1fc9cc Cycles: Delay shooting SSS indirect rays
The idea is to delay shooting indirect rays for the SSS sampling and
trace them after the main integration loop has finished.

This reduces GPU stack usage even further and brings it down to around
652MB (compared to 722MB before the change and 946MB with the previous
stable release).

This also solves the speed regression introduced in the previous commit,
and now a simple SSS scene (SSS suzanne on the floor) renders in 0:50
(compared to 1:16 with the previous commit and 1:03 with the official release).
2015-11-25 13:01:22 +05:00
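The delaying idea from commit 2a5c1fc9cc (and the loop restart from 26f1c51ca6) can be pictured with a toy integrator in Python. This is only a sketch under simplifying assumptions: the `shade` callback, its return shape and both function names are hypothetical, and a "ray" is just an opaque value. The point is that indirect SSS rays are queued and traced by re-entering the same loop, instead of being traced by a nested call while the outer path's state is still live.

```python
def integrate_recursive(ray, shade):
    """Immediate variant: SSS indirect rays are traced by a nested call."""
    total = 0.0
    while ray is not None:
        radiance, next_ray, sss_rays = shade(ray)
        total += radiance
        for sss_ray in sss_rays:
            # The nested call keeps the current frame alive (more stack).
            total += integrate_recursive(sss_ray, shade)
        ray = next_ray
    return total


def integrate_deferred(camera_ray, shade):
    """Delayed variant: SSS indirect rays are queued and traced later."""
    total = 0.0
    pending = [camera_ray]
    while pending:
        ray = pending.pop()
        while ray is not None:  # main integration loop for one path
            radiance, next_ray, sss_rays = shade(ray)
            total += radiance
            pending.extend(sss_rays)  # delay instead of recursing
            ray = next_ray
    return total


def toy_shade(ray):
    """Hypothetical shading: (radiance, continuation ray, SSS indirect rays)."""
    depth, throughput = ray
    if depth >= 3:
        return throughput, None, []
    sss_rays = [(depth + 1, throughput * 0.25)] if depth == 0 else []
    return 0.1 * throughput, (depth + 1, throughput * 0.5), sss_rays
```

Both variants visit the same rays and so accumulate the same radiance, up to floating-point summation order. Only the deferred one avoids the nested call.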
8bca34fe32 Cycles: Avoid having ShaderData on the stack
This commit introduces an SSS-oriented intersection structure which replaces
the old logic of having separate arrays for just intersections and shader data,
and encapsulates all the data needed for SSS evaluation.

This gives a huge stack memory saving on GPU. In my own experiments it gave a 25%
memory usage reduction on GTX560Ti (722MB vs. 946MB).

Unfortunately, this gave some performance loss of 20% which only happens on GPU.
This is perhaps due to different memory access pattern. Will be solved in the
future, hopefully.

Famous saying: won in memory, lost in time (which is also valid the other way
around).
2015-11-25 13:01:22 +05:00
099aaea447 Cycles: Move branched path tracing into own file
The code there was becoming a bit too big; splitting it up will make it
easier to improve or extend the features in there.

The layout is not totally final yet; we would need to try de-duplicating parts
of the code between the split kernel and the non-split integrators.
2015-06-15 23:02:42 +02:00
596eadf0e1 Cycles: Add debug pass which shows number of instance pushes during camera ray intersection
TODO: We might want to refactor debug passes into PASS_DEBUG and some
debug_type (similar to Blender's side passes) to avoid the issue of running
out of bits.
2015-06-12 00:12:03 +02:00
2bd6de5bbb Cycles: Add debug pass showing average number of ray bounces per pixel
Quite straightforward implementation, but still needs some work for the split
kernel. Includes both regular and split kernel implementation for that.

The pass is not exposed in the interface yet because it's currently not
easy to have the same pass listed in the menu multiple times.
2015-06-11 14:53:15 +02:00
7f4479da42 Cycles: OpenCL kernel split
This commit contains all the work related to the AMD megakernel split,
which was mainly done by Varun Sundar, George Kyriazis and Lenny Wang, plus
some help from Sergey Sharybin, Martijn Berger, Thomas Dinges and likely
someone else whom we're forgetting to mention.

Currently only AMD cards are enabled for the new split kernel, but it is
possible to force the split OpenCL kernel to be used by setting the following
environment variable: CYCLES_OPENCL_SPLIT_KERNEL_TEST=1.

Not all the features are supported yet: no motion blur, camera blur, SSS
or volumetrics for now. Transparent shadows are also disabled on AMD
devices because of a compiler bug.

This kernel also only implements regular path tracing, and supporting the
branched one will take a while. Branched path tracing is still exposed in the
interface, which is a bit misleading, and it will be hidden there soon.

More features will be enabled once they're ported to the split kernel and
tested.

Neither the regular CPU nor the CUDA kernels are affected; they generate
exactly the same code, which means no regressions or improvements there.

Based on the research paper:

  https://research.nvidia.com/sites/default/files/publications/laine2013hpg_paper.pdf

Here's the documentation:

  https://docs.google.com/document/d/1LuXW-CV-sVJkQaEGZlMJ86jZ8FmoPfecaMdR-oiWbUY/edit

Design discussion of the patch:

  https://developer.blender.org/T44197

Differential Revision: https://developer.blender.org/D1200
2015-05-09 19:52:40 +05:00
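For scripted testing, the CYCLES_OPENCL_SPLIT_KERNEL_TEST variable mentioned in commit 7f4479da42 can be set on a copy of the environment before launching Blender. The Python sketch below only builds that environment. The helper name `split_kernel_env` is hypothetical, and the `blender` executable in the usage note is assumed to be on PATH.

```python
import os


def split_kernel_env(base_env=None):
    """Return a copy of the environment that forces the split OpenCL kernel."""
    env = dict(os.environ if base_env is None else base_env)
    # Variable name and value are taken from the commit message above.
    env["CYCLES_OPENCL_SPLIT_KERNEL_TEST"] = "1"
    return env
```

The result can be passed as `env=split_kernel_env()` to `subprocess.run(["blender"], ...)`, leaving the parent process environment unchanged.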
900fc43bb4 Cleanup: Remove unused ray type flags.
They were added for completeness, but it seems we don't need them.
2015-05-08 12:10:26 +02:00
5e423775da Cleanup: Move Cycles volume stack update for subsurface into kernel_volume.h. 2015-04-28 11:20:27 +02:00
3db0e1ef6a Cycles: Simplify volume light connect code. 2015-03-13 00:09:13 +01:00
60679a171d Revert "Cleanup: Simplify camera sample motion blur code."
This reverts commit 8197f0bb64.
2015-02-26 13:27:02 +01:00
8197f0bb64 Cleanup: Simplify camera sample motion blur code. 2015-02-26 10:30:01 +01:00
d979f39cf1 Cycles: Small improvement for volume render (decoupled)
Simplify branching here a bit, helps ~3% in volume_light_sampling.blend (Branched MIS scene).
2015-02-14 20:44:30 +01:00
25f33e058a Fix T43562: Cycles gets stuck with camera in volume in certain setup
The issue was caused by the way we shoot the ray to see which volumes we're
inside: it might start bouncing back and forth between two nearly parallel
intersecting faces.

The real solution would be to record all the intersections when shooting the ray,
but that's tricky on GPU because of the sorting needed and the uncertainty about
how large the intersection array should be.

For now we'll just limit the number of steps in the check, so in the worst case
we'll have some incorrect samples, which will be compensated for by further
sampling. This shouldn't be an issue since the probability of such a lock is
actually quite small.
2015-02-05 16:10:50 +05:00
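The workaround in commit 25f33e058a amounts to putting an upper bound on an otherwise open-ended intersection loop. A minimal Python sketch of that pattern follows. The `next_hit` callback, the `collect_hits` name and the step limit passed in are all hypothetical, and the actual limit used in the kernel is not stated in the commit message.

```python
def collect_hits(next_hit, max_steps):
    """Walk along a ray, collecting hits, but give up after `max_steps`.

    `next_hit(t)` returns the ray distance of the next intersection after
    `t`, or None when nothing more is hit. Returns (hits, finished).
    """
    hits = []
    t = 0.0
    for _ in range(max_steps):
        hit = next_hit(t)
        if hit is None:
            return hits, True  # ray left the scene, result is complete
        hits.append(hit)
        t = hit
    # Step limit reached: accept a possibly wrong sample instead of hanging.
    return hits, False
```

A well-behaved ray finishes early, while a ray that keeps finding a "next" hit at the same place is cut off after `max_steps` iterations.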
ee36e75b85 Cleanup: Fix Cycles Apache header.
This was already mixed a bit, but the dot belongs there.
2014-12-25 02:50:24 +01:00
4c60aae66c Cleanup: warnings 2014-10-06 23:19:07 +02:00
233de800e2 Cycles: Optimize of volume stack update when sampling SSS
Basically we now skip all non-volume objects in the volume stack function.
Depending on the scene it might give a few percent of speedup.

Most of the speedup would be gained in scenes where an SSS object
intersects the volume and takes up a reasonable amount of frame space.
2014-10-06 12:36:46 +02:00
68f2066602 Cycles: Make OpenCL folks happy to use __KERNEL_DEBUG__
Quite a straightforward change; the only annoying thing is that we can't use
indentation for the include directive, because of the way header inlining
works for OpenCL.

We might do a smarter job in path_source_replace_includes() but don't want to
spend time on this yet.
2014-10-05 16:00:23 +06:00
27d660ad20 Cycles: Add support for debug passes
Currently only the summed number of traversal steps and intersections used by the
camera ray intersection pass is implemented, but in the future we will support
more debug passes which would help in checking what makes the scene slow.
Examples of such extra passes could be the number of bounces, time spent on
shader tree evaluation and so on.

The implementation on the Cycles side is pretty much straightforward; the only
thing worth mentioning is that it's a build-time option disabled by default.

On the Blender side it's implemented as a PASS_DEBUG with several subtypes
possible. This way we don't need to create an extra DNA pass type for each of
the debug passes, saving us some bits.

Reviewers: campbellbarton

Reviewed By: campbellbarton

Differential Revision: https://developer.blender.org/D813
2014-10-04 19:00:26 +06:00
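The bit-saving argument in commit 27d660ad20 can be illustrated in Python with a flag enum plus a separate subtype enum. Only the PASS_DEBUG name comes from the commit message. The other pass names, the `DebugType` members and `describe_pass` are hypothetical and do not mirror the actual DNA definitions.

```python
from enum import IntEnum, IntFlag


class PassType(IntFlag):
    # Each pass type occupies one bit of a limited bitfield.
    PASS_COMBINED = 1 << 0
    PASS_DEPTH = 1 << 1
    PASS_DEBUG = 1 << 2  # one bit shared by every debug pass


class DebugType(IntEnum):
    # Subtypes are plain integers, so adding more costs no extra bits.
    BVH_TRAVERSAL_STEPS = 0
    RAY_BOUNCES = 1


def describe_pass(pass_type, debug_type=None):
    """Build a readable name from a pass type and optional debug subtype."""
    if pass_type & PassType.PASS_DEBUG:
        return f"debug:{DebugType(debug_type).name.lower()}"
    return pass_type.name.lower()
```

Adding a new debug pass then means adding a `DebugType` member, while the number of `PassType` bits stays the same.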
a654512356 Cycles: Implement preliminary test for volume stack update from SSS
This adds an AABB collision check for objects with volumes and if there's a
collision detected then the object will have SD_OBJECT_INTERSECTS_VOLUME flag.

This solves a speed regression introduced by the fix for T39823 by skipping
the volume stack update in cases where no volume intersects the current SSS object.
2014-10-03 10:52:04 +02:00
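The preliminary test in commit a654512356 is an axis-aligned bounding box overlap check. The Python sketch below shows the general technique, not the Cycles implementation: the `AABB` class and both function names are hypothetical, and the returned booleans merely stand in for setting SD_OBJECT_INTERSECTS_VOLUME on an object.

```python
from dataclasses import dataclass


@dataclass
class AABB:
    lo: tuple  # (x, y, z) minimum corner
    hi: tuple  # (x, y, z) maximum corner


def aabb_overlap(a, b):
    """Two boxes overlap only if their extents overlap on every axis."""
    return all(a.lo[i] <= b.hi[i] and b.lo[i] <= a.hi[i] for i in range(3))


def flag_objects(object_boxes, volume_boxes):
    """Return one boolean per object: does it touch any volume's box?"""
    return [any(aabb_overlap(obj, vol) for vol in volume_boxes)
            for obj in object_boxes]
```

The check is conservative: overlapping boxes do not guarantee overlapping geometry, but disjoint boxes guarantee the volume stack update can be skipped for that object.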
21825c4359 Cycles: Avoid temp variable in camera-in-volume check
Was a left-over from some experiments; there is no need for it with the current
implementation, and it likely wouldn't be needed in the future.
2014-09-28 02:35:37 +06:00
53b05e4f06 Cycles: Cleanup of the SSS volume stack update code
Was a leftover after the change to scene_intersect(), which used to
be ifdefed depending on __HAIR__ in the original patch.
2014-09-28 02:19:17 +06:00
ff4a867dc0 Code style. 2014-09-26 02:04:40 +02:00
fe731686fb Cycles: Add support for cameras inside volume
Basically the title says it all: volume stack initialization is now aware that
the camera might be inside a volume. This gives quite noticeable render time
regressions in cases where the camera is in a volume (not measured yet) because
it requires quite a few ray casts per camera ray in order to check which
objects we're inside. Not quite sure if this can be optimized.

But the good thing is that we can do quite a good job on detecting whether
camera is outside of any of the volumes and in this case there should be no
time penalty at all (apart from some extra checks during the sync state).

For now we're only doing rather simple AABB checks between the viewplane and
volume objects. This could give some false positives, but it should be a good
starting point.

Panoramic cameras need a mention here: for them we only check whether
there are volumes in the scene, which leads to speed regressions even if
the camera is outside of the volumes. We would need to figure out a proper check
for such cameras.

There are still quite a few TODOs in the code, but the patch is good enough
to start playing around with and checking whether there are any obvious mistakes
somewhere.

Currently the feature is only available in the Experimental feature set; we need
to solve some of the TODOs and look into making things faster before considering
the feature ready for the official feature set. This would still likely
happen in the current release cycle.

Reviewers: brecht, juicyfruit, dingto

Differential Revision: https://developer.blender.org/D794
2014-09-25 23:28:01 +06:00
ccc5983e2b Fix T39823: SSS scatter doesn't update volume stack, causing shading artifacts
Basically the title says it all: we need to update the volume stack when doing ray
scatter for SSS. This leads to speed regressions in cases where the scene has both
volume and SSS (performance in cases with no SSS or no volume should be the
same).

We might try optimizing kernel_path_subsurface_update_volume_stack() a bit by
either recording all intersections or using some more appropriate visibility
flags.

Reviewers: brecht, juicyfruit, dingto

Differential Revision: https://developer.blender.org/D795
2014-09-25 23:17:45 +06:00
1b5ec32ed9 Cleanup: Avoid some defines for scene_intersect(), related to Min Width. 2014-09-24 11:32:29 +02:00
f670a8aeaa Fix T41709: Bump not rendered correctly behind transparency using Branched Path Tracing 2014-09-06 18:16:38 +06:00
f7062ff3ed Fix T41693: Volumes get brightened with extra volume samples on GPU + BPT 2014-09-03 21:28:43 +06:00
35bc266de7 Cleanup: Silence compiler warning. 2014-09-01 02:49:28 +02:00
ae31b25fb5 Cycles: Fix wrong Volume Scattering in Branched Path integrator, when building without Decoupled Ray Marching.
The wrong throughput was used here.
2014-08-24 23:08:07 +02:00
a25484eefa Cleanup: Remove unused variable in kernel_path_volume_bounce(). 2014-08-24 23:06:30 +02:00
031620aba2 Cycles: Avoid redundant call to volume_stack_is_heterogeneous() for Distance Sampling. 2014-08-24 16:15:57 +02:00
c89287e057 Cycles: Avoid call to volume_stack_sampling_method() on GPU, Decoupled is required for Equi-Angular/MIS. 2014-08-24 15:58:41 +02:00
187d77612b Code refactor: Split __VOLUME__ defines in Cycles.
* __VOLUME__ is basic volume support with Emission and Absorption.
* __VOLUME_SCATTER__ enables volume Scattering support.
* __VOLUME_DECOUPLED__ enables Decoupled Ray Marching.
2014-08-20 23:15:30 +02:00
075f6eff74 Cycles: Further tweak for Decoupled Ray Marching
Avoid some if checks when probalistic_scatter is false.

Differential Revision: https://developer.blender.org/D743
2014-08-20 22:59:08 +02:00
8ff3cf3e56 Cleanup: typos and extra brackets. 2014-08-14 16:31:53 +02:00
83f5d41071 Cleanup: Same thing in path trace setup, we can safely always assign the proper value. 2014-07-10 01:49:34 +02:00
ef22e972b1 Code cleanup: Simplify decoupled scattering code a bit. 2014-07-07 13:28:10 +02:00
5aec61f849 Cycles: Compile fixes for CUDA Volumetrics.
* CUDA can be compiled with Volume support again; change line 78 of kernel_types.h for that.

Volumes are still fragile on GPU though: got some Memory/Address CUDA errors in tests. This needs to be investigated more deeply.
2014-07-05 02:04:07 +02:00
4b209f063c Fix T40695: world surface shader incorrectly visible with world volume. 2014-06-24 11:35:48 +02:00
5fa68133c9 Cycles: volume sampling method can now be set per material/world.
This gives you "Multiple Importance", "Distance" and "Equiangular" choices.

What multiple importance sampling does is make things more robust to certain
types of noise at the cost of a bit more noise in cases where the individual
strategies are always better.

So if you've got a pretty dense volume that's lit from far away then distance
sampling is usually more efficient. If you've got a light inside or near the
volume then equiangular sampling is better. If you have a combination of both,
then the multiple importance sampling will be better.
2014-06-14 13:49:56 +02:00
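To make the trade-off between the three choices concrete, here is a small Python sketch of the two sampling strategies along a single ray through a homogeneous volume, combined with the balance heuristic. It uses the textbook formulas rather than the Cycles kernel code. All function and parameter names are hypothetical: `sigma_t` is the extinction coefficient, `t_max` the ray length, `t_light` the ray distance of the point closest to the light, and `d_light` the (non-zero) distance from the light to the ray.

```python
import math


def distance_pdf(t, sigma_t, t_max):
    """PDF proportional to transmittance exp(-sigma_t * t) on [0, t_max]."""
    norm = 1.0 - math.exp(-sigma_t * t_max)
    return sigma_t * math.exp(-sigma_t * t) / norm


def distance_sample(xi, sigma_t, t_max):
    """Invert the CDF of distance_pdf for a uniform random xi in [0, 1)."""
    norm = 1.0 - math.exp(-sigma_t * t_max)
    return -math.log(1.0 - xi * norm) / sigma_t


def _angles(t_light, d_light, t_max):
    """Angles subtended at the light by the start and end of the ray."""
    theta_a = math.atan((0.0 - t_light) / d_light)
    theta_b = math.atan((t_max - t_light) / d_light)
    return theta_a, theta_b


def equiangular_pdf(t, t_light, d_light, t_max):
    """PDF proportional to 1 / (squared distance to the light)."""
    theta_a, theta_b = _angles(t_light, d_light, t_max)
    u = t - t_light
    return d_light / ((theta_b - theta_a) * (d_light * d_light + u * u))


def equiangular_sample(xi, t_light, d_light, t_max):
    """Pick an angle uniformly between theta_a and theta_b, map it to t."""
    theta_a, theta_b = _angles(t_light, d_light, t_max)
    theta = (1.0 - xi) * theta_a + xi * theta_b
    return t_light + d_light * math.tan(theta)


def mis_weights(t, sigma_t, t_light, d_light, t_max):
    """Balance heuristic weights (distance, equiangular) for a sample at t."""
    p_dist = distance_pdf(t, sigma_t, t_max)
    p_equi = equiangular_pdf(t, t_light, d_light, t_max)
    total = p_dist + p_equi
    return p_dist / total, p_equi / total
```

Near the light the equiangular PDF dominates and receives most of the weight, while in a dense volume far from the light the distance PDF dominates, which matches the guidance in the commit message.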
a29807cd63 Cycles: volume light sampling
* Volume multiple importance sampling support to combine equiangular and distance
  sampling, for both homogeneous and heterogeneous volumes.

* Branched path "Sample All Direct Lights" and "Sample All Indirect Lights" now
  apply to volumes as well as surfaces.

Implementation note:

For simplicity this is all done with decoupled ray marching, the only case we do
not use decoupled is for distance only sampling with one light sample. The
homogeneous case should still compile on the GPU because it only requires fixed
size storage, but the heterogeneous case will be trickier to get working.
2014-06-14 13:49:56 +02:00