Turning on importance sampling on area lights increases noise on diffuse
surfaces. This was caused by PDF calculated for an intersected point on
light instead of original light position.
Patch by Stefan with some own modifications.
Requires having latest El Capitan beta 3 OSX due to ome crucial fixes made in the
compiler. Supports same features as NVidia OpenCL apart from CMJ (there's no
experimental feature set support in megakernel yet).
Uses megakernel internally, which works much better than the split kernel. Split
kernel is not supported on OSX still, needs to be investigated still.
Some more details can be found there:
http://wiki.blender.org/index.php/Dev:2.6/Source/Render/Cycles/OpenCL#AMD_on_OSX
This gives quite the same problems as experimental CUDA kernels
and for until it's found a root cause of the problem we'd just
explicitly uninline the function.
The main loop only handles transparent intersections from the camera ray.
Therefore we can simplify some things.
* Avoid PATH_RAY_CAMERA check, this is always true.
* Avoid path_state_next() call, we can just set transparent flag and increase transparent bounces. This way we avoid the function call and some branching.
Also remove debug num_ray_bounces++, this is incorrect here as no indirect bounce happens here.
Should be no functional changes.
Code there started becoming a bit too big, by splitting it up it'll make it
easier to do improvements or extending the features in there.
The layout is not totally final yet, would need to try de-duplicating parts
of code from split kernel with non-split integrators,
Older OSX has major issues with sincos() function, it's likely a big in OSL
or LLVM. For until we've updated to new versions of this libraries we'll use
a workaround to prevent possible crashes on all the platforms.
Shouldn't be that bad because it's mainly used for anisotropic shader where
angle is usually constant.
This fix is safe for inclusion into final Blender 2.75 release.
Glass BSDF was doing some magic with copying weigths from initial closure
onto refraction one and the code was not checking properly for the number
of closures.
TODO: We might want to refactor debug passes into PASS_DEBUG and some
debug_type (similar to Blender's side passes) to avoid issue of running
out of bits.
They're working just fine on AMD Tonga GPU and probably other architectures,
lets enable it under the experimental feature set and see what exact system
configuration gives issues.
This simplification is safe, as the call to volume_phase_eval() is guarded behind a CLOSURE_IS_PHASE check, which is equal to
CLOSURE_VOLUME_HENYEY_GREENSTEIN_ID. I don't think we will add more phase functions anytime soon, if at all.
Quite straightforward implementation, but still needs some work for the split
kernel. Includes both regular and split kernel implementation for that.
The pass is not exposed to the interface yet because it's currently not really
easy to have same pass listed in the menu multiple times.
This features are now based on the scene settings, so scenes without those features
used are rendered even faster.
This gives about 30% speedup on the AMD A10 APU here, but at the same time it does
not mean such an improvement will happen on all the hardware. That being said, the
Tonga device here seems to have no measurable difference.
In any case it seems handy to have for the future, when we'll want to support SSS
in the kernel or to port selective compilation/split kernel to CUDA devices.
This will help figuring out cases when node was not properly handled by the SVM
by aborting execution on CPU, where all the nodes are expected to be supported.
This commits finishes initial selective nodes compilation into kernel, which
helps a lot performance-wise for AMD OpenCL kernels.
Split by node groups is based on statistics from simple scenes like BMW and
more complex scenes like mango and gooseberry production files. Further
tweaks are always possible, but it should be a good starting point.
TODO: Still need to ignore unused nodes when calculating requested shader
features.
Like Camera Motion, only available in the Experimental kernel.
This should be it for the upcoming release, we now support almost everything, apart from Transparent Shadows, SSS and Volume.
Same as last commit, code is unused and this one actually would have required some fixes,
as these variants output values outside the 0-1 value range, which doesn't fit Cycles shader design.
Let's finally delete this code, after 4 years of being unused,
there really is no excuse anymore.
If we decide to extend the procedural textures in SVM, we can do this anytime in the future.
This commit re-shuffles code in split kernel once again and makes it so common
parts which is in the headers is only responsible to making all the work needed
for specified ray index. Getting ray index, checking for it's validity and
enqueuing tasks are now happening in the device specified part of the kernel.
This actually makes sense because enqueuing is indeed device-specified and i.e.
with CUDA we'll want to enqueue kernels from kernel and avoid CPU roundtrip.
TODO:
- Kernel comments are still placed in the common header files, but since queue
related stuff is not passed to those functions those comments might need to
be split as well.
Just currently read them considering that they're also covering the way how
all devices are invoking the common code path.
- Arguments might need to be wrapped into KernelGlobals, so we don't ened to
pass all them around as function arguments.
It was kept disabled due to render artifacts which weer in fact caused by bad
memory access, which is fixed in the previous commit.
We now also can make it enabled in regular AMD split kernel after someone tests
the updated code.
The code was failing to compile on runtime because of some path differences,
and it seems we don't need to specify full path to the file which originally
seemed to be needed to make include directives expansion working correct.
This was broken after the kernel file restructure.
Variables allocated in the __local address space can only be defined
inside a __kernel function.
We probably need to solve this a bit differently once we do the CUDA
kernel split, but this fix shoud be good enough until then.
Since the kernel split work we're now having quite a few of new files, majority
of which are related on the kernel entry points. Keeping those files in the
root kernel folder will eventually make it really hard to follow which files are
actual implementation of Cycles kernel.
Those files are now moved to kernel/kernels/<device_type>. This way adding extra
entry points will be less noisy. It is also nice to have all device-specific
files grouped together.
Another change is in the way how split kernel invokes logic. Previously all the
logic was implemented directly in the .cl files, which makes it a bit tricky to
re-use the logic across other devices. Since we'll likely be looking into doing
same split work for CUDA devices eventually it makes sense to move logic from
.cl files to header files. Those files are stored in kernel/split. This does not
mean the header files will not give error messages when tried to be included
from other devices and their arguments will likely be changed, but having such
separation is a good start anyway.
There should be no functional changes.
Reviewers: juicyfruit, dingto
Differential Revision: https://developer.blender.org/D1314