Previously offsets were calculated based on the BVH node size,
which is wrong and a real PITA in cases when some extra data is
to be added to (or removed from) the node.
Now use offsets which are not calculated from the node size.
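To illustrate the idea (the constants and names below are made up for the
example, not the actual kernel ones): the packed node arrays are addressed
with explicit strides, so adding or removing data in the node means touching
one constant instead of every place that used to multiply by the node size.

/* Illustrative sketch: explicit strides instead of sizeof()-derived offsets. */
#define EXAMPLE_BVH_NODE_SIZE       4  /* float4 elements per inner node (assumption) */
#define EXAMPLE_BVH_NODE_LEAF_SIZE  1  /* float4 elements per leaf node (assumption) */

static int example_inner_node_addr(int node_index)
{
    return node_index * EXAMPLE_BVH_NODE_SIZE;
}

static int example_leaf_node_addr(int leaf_index)
{
    return leaf_index * EXAMPLE_BVH_NODE_LEAF_SIZE;
}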
Visibility flags are set to all visibility anyway, so there was no reason
to perform that test.
TODO: We need to investigate whether having primitive intersection functions
which don't do a visibility check gives any speedup here as well.
This way extending intersection routines with some pre-calculation step wouldn't
explode the single file size, hopefully keeping them all in a nice maintainable
state.
These nodes were assuming sRGB input/output, which is for sure wrong for the
shader pipeline, which works in linear space.
So now conversion to/from linear space happens in these nodes, which makes
them make sense in the shader context but might change the look and feel of
existing scenes.
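Roughly, the fix amounts to the following (function names here are
hypothetical, only the sRGB transfer functions are the standard ones; this
is a sketch, not the actual node code):

#include <cmath>

/* Standard sRGB transfer functions, included just to keep the sketch
 * self-contained. */
static float srgb_to_linear(float c)
{
    return (c <= 0.04045f) ? c / 12.92f : powf((c + 0.055f) / 1.055f, 2.4f);
}

static float linear_to_srgb(float c)
{
    return (c <= 0.0031308f) ? c * 12.92f : 1.055f * powf(c, 1.0f / 2.4f) - 0.055f;
}

/* Hypothetical wrapper: the old node math expected sRGB values, so convert
 * the linear shader input to sRGB, run the original per-channel operation,
 * then convert back to linear for the rest of the shader pipeline. */
static float node_channel_op(float linear_in, float (*legacy_srgb_op)(float))
{
    float srgb = linear_to_srgb(linear_in);
    float result = legacy_srgb_op(srgb);
    return srgb_to_linear(result);
}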
This was caused by an internal optimization which evaluated SSS with a
size of zero as a BSDF but used a different ID, so the evaluation result
didn't appear in the regular diffuse pass.
This led to a situation where SSS data was not stored anywhere if the
size was zero.
Now SSS with zero size and close-to-zero sizes will be handled in the
same way from the passes' point of view.
Since shader closures are now allocated aligned in the OSL memory pool,
this workaround is no longer needed.
Also put a comment which describes the desired layout of the structure,
so an array of shader closures is all nicely aligned.
This solves bugs like T42210 which are caused by the compiler being
smart and using some SSE instructions to operate on closure
classes, which was failing because those classes are not allocated
by the regular allocator but in OSL's memory pool.
With newer versions of OSL it is now possible to force closure
classes to be aligned to a given boundary, and this commit uses
this new functionality.
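As a rough illustration of the mechanism (struct and member names are made
up; this is not the actual Cycles/OSL code): the closure class gets an
explicit alignment attribute, and the pool allocator is expected to hand out
chunks on that boundary, so SSE loads/stores on closure members stay legal.

struct float3 { float x, y, z; };  /* stand-in for the real vector type */

/* Force the closure class onto a 16 byte boundary so SSE instructions the
 * compiler generates for its members are safe even when the object lives in
 * a custom memory pool instead of coming from new/malloc. */
struct alignas(16) ExampleMicrofacetClosure {
    float3 N;
    float alpha_x, alpha_y;
    float ior;
};

static_assert(alignof(ExampleMicrofacetClosure) == 16,
              "pool allocator must hand out 16 byte aligned chunks");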
Unfortunately, it means we're no longer compatible with older
versions of OSL; only the latest git version from upstream and our
branch at github are supported:
https://github.com/Nazg-Gul/OpenShadingLanguage/tree/blender-fixes
For OSX and Windows it's not an issue because the libraries are
already updated there; Linux users would need to run the install_deps
script.
So now, in cases when an object has both hair motion blur and deformation
motion blur, the vector pass is all correct.
We could get rid of the flag in the future; still need to look deeper into all
the areas trying to find a clearer solution.
The issue was caused by a mismatch in pre/post transform matrix spaces for
mesh and curve vectors. This happened because of the current way static
transform apply works: it only stores pre/post in world space if triangle
motion exists. This led to a situation when there is no triangle motion
happening but there is hair motion happening.
After a long time of trying to solve it in a nice way, ended up solving it in
a bit of a slow way -- pre/post transforms are still stored in the same spaces
as they used to be, and the hair pre/post positions are simply converted to
world space in the kernel.
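Roughly, the kernel-side conversion looks like this (the names and the helper
are illustrative, not the actual Cycles functions):

struct float3 { float x, y, z; };
struct Transform { float m[3][4]; };  /* 3x4 affine object-to-world matrix */

static float3 transform_point(const Transform *t, float3 p)
{
    float3 r;
    r.x = t->m[0][0] * p.x + t->m[0][1] * p.y + t->m[0][2] * p.z + t->m[0][3];
    r.y = t->m[1][0] * p.x + t->m[1][1] * p.y + t->m[1][2] * p.z + t->m[1][3];
    r.z = t->m[2][0] * p.x + t->m[2][1] * p.y + t->m[2][2] * p.z + t->m[2][3];
    return r;
}

/* If the mesh has no triangle motion, the stored hair pre/post positions are
 * still in object space, so convert them to world space before the vector
 * pass consumes them. */
static void hair_motion_to_world(const Transform *object_to_world,
                                 bool has_triangle_motion,
                                 float3 *motion_pre, float3 *motion_post)
{
    if (!has_triangle_motion) {
        *motion_pre = transform_point(object_to_world, *motion_pre);
        *motion_post = transform_point(object_to_world, *motion_post);
    }
}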
This is because currently it's not so clear how to deal with cases when curve
and mesh motion need different spaces for the pre/post transform (which
happens in cases when only one of the motions exists).
Will think of some magic; meanwhile artists can be happy with proper
render results.
Create a unique flag for output shaders with displacement data and use it
to calculate the transformed normal. Implementation suggested by Brecht Van
Lommel.
Reviewers: brecht
Differential Revision: https://developer.blender.org/D890
The idea is to avoid memory allocation when only one segment step is to be allocated.
This gives some speedup which is difficult to measure on this trashcan from hell, but
it's about 7% to 10% in the extreme case with a single volume filling the whole
viewport. This seems to depend on the phase of the bug-o-meter in the studio.
On the Linux boxes the speedup is not that spectacular: it's about 2% on my laptop and
about 3% on the studio desktop. This is likely because of the awesomeness of jemalloc.
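The underlying trick is a plain small-buffer optimization; a minimal sketch
(struct names and the payload are made up, not the actual kernel code):

#include <cstdlib>

struct VolumeStep { float t, sigma; };  /* illustrative payload */

/* Keep one step's worth of storage inline and only touch the heap when more
 * than one segment step is actually needed. */
struct VolumeSegmentStorage {
    VolumeStep fixed_step;   /* used when num_steps == 1, no allocation */
    VolumeStep *heap_steps;  /* used otherwise */
    int num_steps;
};

static VolumeStep *volume_steps_get(VolumeSegmentStorage *storage, int num_steps)
{
    storage->num_steps = num_steps;
    if (num_steps == 1) {
        storage->heap_steps = NULL;
        return &storage->fixed_step;
    }
    storage->heap_steps = (VolumeStep *)malloc(sizeof(VolumeStep) * num_steps);
    return storage->heap_steps;
}

static void volume_steps_free(VolumeSegmentStorage *storage)
{
    free(storage->heap_steps);  /* free(NULL) is a no-op */
}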
Add a compile-time check for the particular glibc version which fixed the issue.
This makes it so own-compiled Blender is the fastest in the world, and the
only remaining issue is what we should do for release builds.
After some discussion with Campbell we decided to keep it as is for now
because the slowdown is not that noticeable. We'll disable this workaround
for release builds when the majority of the distros have switched to the
new version of glibc.
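The check itself is just a preprocessor test against the glibc version macros;
a sketch with a placeholder version number (the real check is against whichever
glibc release actually fixed the issue, and the macro name is made up):

#include <climits>  /* any libc header defines __GLIBC__/__GLIBC_MINOR__ on glibc */

/* 2.17 is a placeholder -- enable the workaround only on glibc versions older
 * than the release that fixed the performance issue. */
#if defined(__GLIBC__) && \
    (__GLIBC__ < 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ < 17))
#  define USE_SLOW_LIBM_WORKAROUND 1
#else
#  define USE_SLOW_LIBM_WORKAROUND 0
#endif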
That code was mainly needed for the transition period; now we've
got all platforms updated to the new OSL.
Plus there are some crucial fixes baking in the current upstream
sources which we'll need to have for the next Blender release.
Even though it's not 100% clear when we'll switch to OSL-1.6, we'd better
start preparing for this earlier, so we don't spend time on it later.
Plus this code helps with troubleshooting some OSL issues, which requires
testing with the latest versions of OSL.
This mainly happens when over-saturating an already saturated color.
After some discussion with Campbell and loads of tests we decided
to clamp the resulting RGB color. As an alternative we might want to
clamp the corrected HSV values instead, but that would lead to
larger changes in the render results.
TODO: The same is to be done for the compositor nodes.
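A minimal sketch of what the clamp amounts to (the bound is an assumption
here -- shown as clamping negative channels to zero, which is the typical
artifact of over-saturation; the real node decides what range it considers
valid):

struct float3 { float x, y, z; };

/* Clamp the corrected RGB result so over-saturating an already saturated
 * color can't push channels negative (assumed bound, for illustration). */
static float3 clamp_color_nonnegative(float3 c)
{
    c.x = c.x < 0.0f ? 0.0f : c.x;
    c.y = c.y < 0.0f ? 0.0f : c.y;
    c.z = c.z < 0.0f ? 0.0f : c.z;
    return c;
}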
This is basically just a wrapper class, which maps the generic call from the OSL spec to our closures.
Example usage:
shader microfacet_osl(
    color Color = color(0.8),
    int Distribution = 0,
    normal Normal = N,
    vector Tangent = normalize(dPdu),
    float RoughnessU = 0.0,
    float RoughnessV = 0.0,
    float IOR = 1.4,
    int Refract = 0,
    output closure color BSDF = 0)
{
    if (Distribution == 0)
        BSDF = Color * microfacet("ggx", Normal, Tangent, RoughnessU, RoughnessV, IOR, Refract);
    else
        BSDF = Color * microfacet("beckmann", Normal, Tangent, RoughnessU, RoughnessV, IOR, Refract);
}
This way it's easier to extend the bitfields and see when we start running
out of free bits.
Plus added a brief description of what the SD_VOLUME_CUBIC flag means.
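For illustration, the sort of layout that makes free bits obvious (the bit
positions and neighbouring flag names are made up, only SD_VOLUME_CUBIC is
from the actual code):

/* Explicit bit positions make it easy to see which bits are taken and when
 * the underlying integer type is about to run out of room. */
enum ExampleShaderDataFlag {
    SD_EXAMPLE_FLAG_A = (1 << 0),
    SD_EXAMPLE_FLAG_B = (1 << 1),
    /* SD_VOLUME_CUBIC: use cubic (smoothed) interpolation when sampling voxel
     * data for this volume, instead of linear interpolation. */
    SD_VOLUME_CUBIC   = (1 << 2),  /* placeholder position */
};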
It is a per-material setting which can be found under the Volume settings
in the material and world context buttons.
There could still be some code-wise improvements, like using a variadic
macro for interp3d instead of having interp3d_ex to which you can pass the
interpolation method.
This is the first step towards supporting cubic interpolation for voxel
data (such as smoke and fire). It is not exposed in the interface at all
yet; this is to be done soon after this change.
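For reference, the 1D building block of such interpolation (a Catmull-Rom
style cubic, shown only to illustrate what cubic interpolation buys over
linear; this is not the actual kernel code):

/* Cubic interpolation between p1 and p2, using p0/p3 as neighbours.
 * Compared to linear interpolation this gives smooth gradients, which is
 * what makes smoke/fire volumes look less blocky. */
static float cubic_interp(float p0, float p1, float p2, float p3, float t)
{
    return p1 + 0.5f * t * (p2 - p0 +
           t * (2.0f * p0 - 5.0f * p1 + 4.0f * p2 - p3 +
           t * (3.0f * (p1 - p2) + p3 - p0)));
}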
This is a so-called GPU limitation boundary hit; told the compiler to NOT
inline the volume bound function, otherwise some really weird things used
to happen.
We actually might want to do the same for the CPU; inlining everything is not
the way to get the fastest code.
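A sketch of the kind of annotation used (the macro name and function are
illustrative, not the exact kernel code; the attribute spelling differs per
compiler):

/* Ask the compiler not to inline the function, since on GPU inlining it
 * triggered the weird behaviour mentioned above. */
#ifdef __CUDACC__
#  define ccl_example_noinline __noinline__
#else
#  define ccl_example_noinline __attribute__((noinline))
#endif

ccl_example_noinline float example_volume_bound(float t_near, float t_far)
{
    return t_far - t_near;  /* placeholder body */
}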