This is an alternate fix for T40964 which resolves bad handling of
caustics reported in T45609.
The original fix discarded too many transmission rays, which caused
caustic light to be disabled entirely. There is still some room for
investigating why exactly the original paper didn't work that well; it
could be caused by the way the pdf is calculated.
In any case, the current results seem rather correct now.
This simplification is safe: the call to volume_phase_eval() is guarded
by a CLOSURE_IS_PHASE check, which is equivalent to checking for
CLOSURE_VOLUME_HENYEY_GREENSTEIN_ID. I don't think we will add more phase
functions anytime soon, if at all.
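To illustrate, here's a minimal sketch of the idea with simplified,
hypothetical names (the actual kernel code is structured differently):
since Henyey-Greenstein is the only phase closure, the per-type dispatch
inside the eval function can be dropped.

    #include <cassert>
    #include <cmath>

    enum ClosureType {
      CLOSURE_VOLUME_HENYEY_GREENSTEIN_ID,
      CLOSURE_BSDF_DIFFUSE_ID
    };

    inline bool closure_is_phase(ClosureType type)
    {
      /* With a single phase closure, the class check and the concrete
       * type check are interchangeable. */
      return type == CLOSURE_VOLUME_HENYEY_GREENSTEIN_ID;
    }

    /* Standard Henyey-Greenstein phase function. */
    inline float phase_henyey_greenstein(float g, float cos_theta)
    {
      const float pi = 3.14159265358979f;
      float denom = 1.0f + g * g - 2.0f * g * cos_theta;
      return (1.0f - g * g) / (4.0f * pi * denom * std::sqrt(denom));
    }

    inline float volume_phase_eval(ClosureType type, float g, float cos_theta)
    {
      /* Callers guard with closure_is_phase(), so no dispatch is needed. */
      assert(closure_is_phase(type));
      (void)type;
      return phase_henyey_greenstein(g, cos_theta);
    }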
This commit contains all the work related to the AMD megakernel split,
which was mainly done by Varun Sundar, George Kyriazis and Lenny Wang,
plus some help from Sergey Sharybin, Martijn Berger, Thomas Dinges and
likely someone else whom we're forgetting to mention.
Currently only AMD cards are enabled for the new split kernel, but it is
possible to force the split OpenCL kernel to be used by setting the
following environment variable: CYCLES_OPENCL_SPLIT_KERNEL_TEST=1.
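For reference, a minimal sketch of how such an override could be read
(function and variable names here are illustrative, not the actual device
code):

    #include <cstdlib>

    static bool use_split_kernel(bool device_is_amd)
    {
      /* Forced on for testing on any OpenCL device. */
      if (std::getenv("CYCLES_OPENCL_SPLIT_KERNEL_TEST") != nullptr)
        return true;
      /* Default: only AMD cards use the split kernel. */
      return device_is_amd;
    }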
Not all features are supported yet: there is no motion blur, camera blur,
SSS or volumetrics for now. Transparent shadows are also disabled on AMD
devices because of a compiler bug.
This kernel also only implements regular path tracing; supporting the
branched one will take a bit longer. Branched path tracing is still
exposed in the interface, which is a bit misleading and will be hidden
soon.
More features will be enabled once they're ported to the split kernel and
tested.
Neither regular CPU nor CUDA is affected: they generate exactly the same
code as before, which means no regressions or improvements there.
Based on the research paper:
https://research.nvidia.com/sites/default/files/publications/laine2013hpg_paper.pdf
Here's the documentation:
https://docs.google.com/document/d/1LuXW-CV-sVJkQaEGZlMJ86jZ8FmoPfecaMdR-oiWbUY/edit
Design discussion of the patch:
https://developer.blender.org/T44197
Differential Revision: https://developer.blender.org/D1200
This is more of a workaround for the CUDA optimizer, which can't optimize
clamp(x, 0, 1) into a single instruction and uses 4 instructions instead.
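A minimal sketch of the workaround, assuming a kernel-local helper (the
exact guards in the kernel differ); on CUDA the __saturatef() intrinsic
maps to a single saturating instruction:

    #ifdef __CUDA_ARCH__
    __device__ inline float saturate(float a)
    {
      /* Single saturating instruction on CUDA. */
      return __saturatef(a);
    }
    #else
    inline float saturate(float a)
    {
      /* Plain clamp for other compilers. */
      return a < 0.0f ? 0.0f : (a > 1.0f ? 1.0f : a);
    }
    #endif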
Original patch by @lockal with own modification: don't make changes
outside of the kernel. They don't make any difference anyway, and the term
saturate() has a somewhat different meaning outside of the kernel.
This gives around 2% speedup in the Barcelona file, but in more complex
shader setups with lots of math nodes doing clamping the speedup could be
much nicer.
Subscribers: dingto
Projects: #cycles
Differential Revision: https://developer.blender.org/D1224
Our own implementation in fact has the same performance as the one in
fast_math from OpenShadingLanguage, but the implementation from fast_math
uses an explicit madd function, which increases the chance of the compiler
deciding to use intrinsics.
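For illustration, the explicit-madd pattern looks roughly like this (a
sketch, not the actual fast_math source; the coefficients are
placeholders):

    #include <cmath>

    /* Explicit madd helper; std::fma maps to a hardware fused
     * multiply-add where available. */
    inline float madd(float a, float b, float c)
    {
      return std::fma(a, b, c);
    }

    /* Horner evaluation of a cubic written with explicit madds, the
     * pattern fast_math uses for its polynomial approximations. */
    inline float poly3(float x, float c0, float c1, float c2, float c3)
    {
      return madd(x, madd(x, madd(x, c3, c2), c1), c0);
    }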
This patch is based on some work done in D788 and a re-formulation of the
Beckmann implementation from OpenShadingLanguage.
Skipping the texture lookup helps a lot on GPUs, where it's more expensive
to access texture memory than to do some extra calculation in the threads.
CPU code still uses the lookup-table based approach since this seems to
still be faster (at least on the computers I've got access to).
This change gives about 2% speedup on the BMW scene with a GTX 560 Ti.
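The pattern is roughly the following (an illustrative sketch only: the
tabulated quantity here stands in for the actual Beckmann terms, and
__KERNEL_GPU__ is used the way the kernel headers use such guards):

    #include <cmath>

    static const int N = 1024;
    static float erf_table[N]; /* covers x in [0, 4), filled at startup */

    inline void erf_table_init()
    {
      for (int i = 0; i < N; i++)
        erf_table[i] = std::erf(4.0f * i / N);
    }

    inline float fast_erf(float x)
    {
    #ifdef __KERNEL_GPU__
      /* GPU: ALU is cheap, texture/memory fetch is the bottleneck. */
      return std::erf(x);
    #else
      /* CPU: the lookup table is still faster. */
      int i = (int)(x * (N / 4.0f));
      if (i < 0) i = 0;
      if (i > N - 1) i = N - 1;
      return erf_table[i];
    #endif
    }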
It did not preserve stratification too well, and the lookup-table approach
was working much better. There are now also some more interesting
formulations from Wenzel and OpenShadingLanguage which should work better
than the old code.
This inconsistency drove me totally crazy; it's really confusing when
things are inconsistent, especially when you work on both the Cycles and
Blender sides.
Shouldn't cause merge PITA, it's whitespace changes only; Git should be
able to merge it nicely.
The issue was introduced in 01ee21f, where I didn't notice that the
*_setup() function only does partial initialization, and some of the
parameters are expected to be initialized by the caller.
This was hitting only some setups, so tests with benchmark scenes didn't
reveal the issue. Now it should all be fine.
This is to go to the 2.74 branch and we actually might re-AHOY.
Fix T44007: Cycles Volumetrics: block artifacts with overlapping volumes
The issue was caused by uninitialized parameters of some closures, which
led to unpredictable behavior of shader_merge_closures().
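A sketch of why uninitialized fields break merging (the struct and names
here are hypothetical; the real ShaderClosure has more fields): merging
compares closure parameters, so any field left untouched by a partial
setup makes the comparison nondeterministic.

    #include <cstring>

    struct Closure {
      int type;
      float data0, data1, data2;
    };

    inline bool closures_equal(const Closure &a, const Closure &b)
    {
      /* Any uninitialized field makes this comparison random. */
      return a.type == b.type && a.data0 == b.data0 &&
             a.data1 == b.data1 && a.data2 == b.data2;
    }

    inline void closure_setup(Closure &sc, int type, float roughness)
    {
      /* Fix: zero everything first, so unused parameters compare equal. */
      std::memset(&sc, 0, sizeof(sc));
      sc.type = type;
      sc.data0 = roughness;
    }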
The issue was caused by a lack of a check for whether the fresnel term
actually gives total internal reflection in refraction BSDFs. This led to
the arbitrary vector (0, 0, 0) being used as the reflection, giving
numeric issues in other areas of the kernel.
This gives some visual changes in sharp reflection, but it seems to be
rather proper now. It also corresponds to rough glossy reflection with
roughness set to 0.001 (previously that was totally different from
roughness of 0.0, which is just weird).
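The missing guard is essentially the standard refract() check, sketched
below in a GLSL-style formulation (illustrative, not the kernel's fresnel
code): when the term under the square root goes negative there is no
refracted ray, and the caller must fall back to reflection instead of
using a zero vector.

    #include <cmath>

    struct float3 { float x, y, z; };
    inline float3 operator*(float s, const float3 &v) { return {s * v.x, s * v.y, s * v.z}; }
    inline float3 operator-(const float3 &a, const float3 &b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
    inline float dot(const float3 &a, const float3 &b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

    /* Returns false on total internal reflection. I points toward the
     * surface, N is the normal, eta is the IOR ratio. */
    inline bool refract(const float3 &I, const float3 &N, float eta, float3 *T)
    {
      float n_dot_i = dot(N, I);
      float k = 1.0f - eta * eta * (1.0f - n_dot_i * n_dot_i);
      if (k < 0.0f)
        return false; /* no refracted ray exists */
      *T = eta * I - (eta * n_dot_i + std::sqrt(k)) * N;
      return true;
    }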
Precision of the fast functions seems to be enough there, and since the
code was heavily using inverse trigonometric functions, this change gives
a few percent speedup on Victor's hair.
The test files from the ctests storage show no meaningful difference; hair
on Victor is all below 4% absolute error, and only a few pixels exceed 1%
absolute difference.
In any case, let it be as it is currently, so that we have the fast math
file in the sources for its further evaluation and possible usage in other
areas as well.
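As an example of the kind of function involved, here's one common fast
acos approximation (an Abramowitz & Stegun style polynomial fit; the
actual fast math file may use different coefficients), with error on the
order of 1e-4 radians:

    #include <cmath>

    inline float fast_acosf(float x)
    {
      const float pi = 3.14159265f;
      float f = std::fabs(x);
      /* Cubic polynomial fit, valid for f in [0, 1]. */
      float p = 1.5707288f +
                f * (-0.2121144f + f * (0.0742610f + f * -0.0187293f));
      float r = std::sqrt(1.0f - f) * p;
      /* Mirror for negative inputs: acos(-x) = pi - acos(x). */
      return x >= 0.0f ? r : pi - r;
    }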
The BSDF sample function shouldn't give labels it's not intended to give:
reflection shouldn't give a transmission ray, and transmission shouldn't
give a reflection ray.
Added an assert in the transmission sampling, but reflection still needs
some investigation, because even after recent fixes the check for the
projection onto the reflected ray could give both positive and negative
values.
It shouldn't have any effect on renders; it just makes the internal logic
consistent and exposes an issue to be investigated further.
The issue seems to be caused by not totally proper pdf and eval values
for this closure. Changed them so they match GGX/Beckmann reflection with
roughness set to 0, which is effectively the same as the sharp reflection.
Also reduced the number of branches and multiplications a bit by inlining
them.
This gives a barely measurable speedup, which in the case of the BMW scene
is about 2% here.
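For clarity, the roughness-0 limit behaves like a delta distribution,
roughly as in this sketch (names are illustrative): eval and pdf are zero
for any direction sampled elsewhere, and sampling returns the mirror
direction with pdf 1.

    struct float3 { float x, y, z; };
    inline float3 operator*(float s, const float3 &v) { return {s * v.x, s * v.y, s * v.z}; }
    inline float3 operator-(const float3 &a, const float3 &b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
    inline float dot(const float3 &a, const float3 &b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

    inline float sharp_reflection_eval_pdf(const float3 & /*omega_in*/)
    {
      /* A delta lobe is never hit by a direction sampled elsewhere. */
      return 0.0f;
    }

    inline float3 sharp_reflection_sample(const float3 &I, const float3 &N, float *pdf)
    {
      *pdf = 1.0f;                     /* delta distribution */
      return 2.0f * dot(N, I) * N - I; /* mirror of I around N */
    }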
This was hooked up last year for testing purposes, as we already had some
code for it, but the closure itself is neither really good nor really
useful, so let's remove it.
This adds a fresnel conductive OSL preset to the Text Editor. Based on a patch by Lukas Stockner.
Differential Revision: https://developer.blender.org/D145
See the differential for details.
That was only needed in the beginning, when we did not have support for
tangents yet. It's time to clean some of the defines up; it's getting a
bit too much.
It turns out that the new Beckmann sampling function doesn't work well with
Quasi Monte Carlo sampling, mainly near normal incidence where it can be worse
than the previous sampler. In the new sampler the random number pattern gets
split in two, warped and overlapped, which hurts the stratification, see the
visualization in the differential revision.
Now we use a precomputed table, which is much better behaved. GGX does not seem
to benefit from using a precomputed table.
The disadvantage is that this table adds 1MB of memory usage and 0.03s of
startup time to every render (on my quad core CPU).
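To illustrate why a table behaves well here (a toy sketch, not the actual
Beckmann table, which is 2D and also indexed by incidence angle):
tabulating the inverse CDF makes sampling a single monotone interpolated
lookup, so a stratified random number stays stratified.

    #include <cmath>
    #include <vector>

    struct InverseCdfTable {
      std::vector<float> inv_cdf;

      explicit InverseCdfTable(int n) : inv_cdf(n)
      {
        for (int i = 0; i < n; i++) {
          float u = (i + 0.5f) / n;
          /* Toy distribution p(x) = 2x on [0, 1]; Beckmann has its own CDF. */
          inv_cdf[i] = std::sqrt(u);
        }
      }

      float sample(float u) const
      {
        /* Monotone linear interpolation: stratified u stays stratified. */
        float t = u * (inv_cdf.size() - 1);
        int i = (int)t;
        if (i >= (int)inv_cdf.size() - 1)
          return inv_cdf.back();
        float f = t - i;
        return inv_cdf[i] * (1.0f - f) + inv_cdf[i + 1] * f;
      }
    };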
Differential Revision: https://developer.blender.org/D614
* Anisotropic BSDF now supports GGX and Beckmann distributions, Ward has been
removed because other distributions are superior.
* GGX is now the default distribution for all glossy and anisotropic nodes,
since it looks good, has low noise and is fast to evaluate.
* Ashikhmin-Shirley is now available in the Glossy BSDF.
* Ashikhmin-Shirley anisotropic BSDF was added as a closure
* Anisotropic BSDF node now has two distributions
Reviewers: brecht, dingto
Differential Revision: https://developer.blender.org/D549
Samples render slower than before, but hopefully this is made up for by
reduced noise in most cases. The main slowdown comes from samples that
would previously be wasted and turn out black, which are now continued.
GGX sampling is about the same speed as before, while for Beckmann it is
still slower. Perhaps optimizations are still possible there, but I didn't
find anything easy.
Code from this paper, which comes with sample code:
Importance Sampling Microfacet-Based BSDFs using the Distribution of Visible Normals.
E. Heitz and E. d'Eon, EGSR 2014
Differential Revision: https://developer.blender.org/D572
Z, Index, normal, UV and vector passes are only affected by surfaces with alpha
transparency equal to or higher than this threshold. With value 0.0 the first
surface hit will always write to these passes, regardless of transparency. With
higher values surfaces that are mostly transparent can be skipped until an opaque
surface is encountered.
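The rule amounts to a simple comparison per hit, roughly like this sketch
(names are illustrative, not the kernel's):

    #include <vector>

    struct Hit { float alpha; float z; };

    /* Return the Z value the pass records: the first surface whose alpha
     * reaches the threshold. With threshold 0.0 that's always the first
     * hit. */
    inline float pass_z(const std::vector<Hit> &hits, float threshold)
    {
      for (const Hit &h : hits) {
        if (h.alpha >= threshold)
          return h.z;
      }
      return 1e10f; /* background */
    }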