This adds support for CUDA Texture objects (also known as Bindless textures) for Kepler GPUs (Geforce 6xx and above).
This is used for all 2D/3D textures, data still uses arrays as before.
User benefits:
* No more limits of image textures on Kepler.
We had 5 float4 and 145 byte4 slots there before, now we have 1024 float4 and 1024 byte4.
This can be extended further if we need to (just change the define).
* Single channel textures slots (byte and float) are now supported on Kepler as well (1024 slots for each type).
ToDo / Issues:
* 3D textures don't work yet, at least don't show up during render. I have no idea whats wrong yet.
* Dynamically allocate bindless_mapping array?
I hope Fermi still works fine, but that should be tested on a Fermi card before pushing to master.
Part of my GSoC 2016.
Reviewers: sergey, #cycles, brecht
Subscribers: swerner, jtheninja, brecht, sergey
Differential Revision: https://developer.blender.org/D1999
If the CUDA Toolkit is installed and the user is on Linux,
adaptive, feature based CUDA runtime compile is now possible to enable via:
* Environment flag CYCLES_CUDA_ADAPTIVE_COMPILE or
* Debug menu (Debug value 256) in the Cycles UI.
This was a hard decision, because going newer CUDA toolkit makes
rendering up to 5% slower. But on another hand, it solves major
speed regressions (up to 30%) with branched path tracing on a
top level cards.
Neither of those regressions have a meaningful and sane workaround
from the code itself.
Toolkit 6.5 could still be used, but it's no longer recommended one.
Supports both smoke/fire and point density textures now.
Reduces number of textures available for sm_20 and sm_21, but you have
to compromise somewhere on such a limited hardware.
Currently limited to linear interpolation only, and decoupled ray
marching is not supported yet. Think those could be considered just a
further improvement.
Some quick example:
https://developer.blender.org/F282934
Code is minimal and we can fully consider it a fix for missing
support of 3D textures with CUDA.
Reviewers: lukasstockner97, brecht, juicyfruit, dingto
Reviewed By: brecht, juicyfruit, dingto
Subscribers: mib2berlin
Differential Revision: https://developer.blender.org/D1806
Basically the idea is to make code robust against extending
enum options in the future by falling back to a known safe
default setting when RNA is set to something unknown.
While this approach solves the issues similar to T47377,
but it wouldn't really help when/if any of the RNA values
gets ever deprecated and removed. There'll be no simple
solution to that apart from defining explicit mapping from
RNA value to Cycles one.
Another part which isn't so great actually is that we now
have to have some enum guards and give some explicit values
to the enum items, but we can live with that perhaps.
Reviewers: dingto, juicyfruit, lukasstockner97, brecht
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D1785
This commit removes the experimental CUDA kernel, making SSS and CMJ
regular features.
Several improvements have been made in the past few
weeks (thanks Sergey!) which make SSS render several times faster (2-3x
compared to 2.76b) on the GPU, and the increased VRAM usage has also been
fixed. Therefore the experimental kernel is no longer needed.
Differential Revision: https://developer.blender.org/D1726
Manual has been updated: too:
https://www.blender.org/manual/render/cycles/features.html
The main purpose of such linking is to make Blender compatible with
NVidia's debuggers and profilers which are doing some LD_PRELOAD
magic to intercept some function calls. Such magic conflicts with
our CUDA wrangler magic and causes segmentation faults.
The option is disabled by default, so there's no affect on any of
artists.
In order to make Blender linked directly against CUDA library use
the WITH_CUDA_DYNLOAD CMake option (it's marked as advanced).
This makes it possible to move some parts of evaluation from host to the device
and hopefully reduce memory usage by avoid having full RGBA buffer on the host.
Reviewers: juicyfruit, lukasstockner97, brecht
Reviewed By: lukasstockner97, brecht
Differential Revision: https://developer.blender.org/D1702
Maybe this is pedantic but I read it’s best to explicitly set the
desired component size.
Also append “_ARB” to float texture formats since those need an
extension in GL 2.1.
Gives few percent of memory improvement for regular feature set kernel
and could give significant memory improvement for Experimental kernel.
It could also give some degree of performance improvement, but this I
didn't really measure reliably yet.
Code is ifdef-ed for now, since it's only working on Linux and requires
CUDA toolkit to be installed (other platform only use precompiled
kernels).
This is just an experiment for now and a base for the proper feature
support in the future (with runtime compilation using CUDA 7?).
This just replaces internal argument `experimental` with `requested_features`
making it possible to access particular requested settings when building
kernels.
it is the same issue as described in the previous commit, original changes
in this area were wrong and only worked on a bugger optimus driver which
simply appeared to work by co-incident and in fact used wrong device..
Since the kernel split work we're now having quite a few of new files, majority
of which are related on the kernel entry points. Keeping those files in the
root kernel folder will eventually make it really hard to follow which files are
actual implementation of Cycles kernel.
Those files are now moved to kernel/kernels/<device_type>. This way adding extra
entry points will be less noisy. It is also nice to have all device-specific
files grouped together.
Another change is in the way how split kernel invokes logic. Previously all the
logic was implemented directly in the .cl files, which makes it a bit tricky to
re-use the logic across other devices. Since we'll likely be looking into doing
same split work for CUDA devices eventually it makes sense to move logic from
.cl files to header files. Those files are stored in kernel/split. This does not
mean the header files will not give error messages when tried to be included
from other devices and their arguments will likely be changed, but having such
separation is a good start anyway.
There should be no functional changes.
Reviewers: juicyfruit, dingto
Differential Revision: https://developer.blender.org/D1314
Previously we only had experimental flag passed to device's load_kernel() which
was all fine. But since we're gonna to have some extra parameters passed there
it makes sense to wrap them into a single struct, which will make it easier to
pass stuff around.
This inconsistency drove me totally crazy, it's really confusing
when it's inconsistent especially when you work on both Cycles and
Blender sides.
Shouldn;t cause merge PITA, it's whitespace changes only, Git should
be able to merge it nicely.
For CPU it gives available instructions set (SSE, AVX and so).
For GPU CUDA it reports most of the attribute values returned by
cuDeviceGetAttribute(). Ideally we need to only use set of those
which are driver-specific (so we don't clutter system info with
values which we can get from GPU specifications and be sure they
stay the same because driver can't affect on them).
This is what was handy troubleshooting issues in the studio,
plus this is exactly the same thing which would be helpful
when solving issues with paths to compiled shaders and cubins
for standalone repository.
This is rather legit case which happens i.e. when having persistent images enabled
and session is updating the lookup tables.
Now device_memory keeps track of amount of memory being allocated on the device,
which makes freeing using the proper allocated size, not the CPU side buffer
size.