The workaround for generated texture coordinates has to be done before
calculating the number of elements for the attribute, otherwise the counter
wouldn't include those attributes.
The idea behind the change is to pre-allocate attribute arrays in advance,
which avoids re-allocating the arrays later for each mesh being handled.
This reduces the peak memory used by the Cycles database from 1.3G to 0.9G for
victor.blend from Gooseberry.
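The gist of it, as a hedged sketch (Attribute and the size lists here are
placeholders, not the actual Cycles sync code):

```cpp
#include <cstddef>
#include <vector>

/* Placeholder for the per-attribute data array filled during mesh sync. */
struct Attribute {
	std::vector<float> data;
};

/* Growing the array per mesh may re-allocate repeatedly; during each
 * re-allocation both the old and the new buffer are alive for a moment,
 * raising peak memory usage. */
void sync_attributes_growing(Attribute &attr, const std::vector<size_t> &mesh_sizes)
{
	for (const size_t size : mesh_sizes)
		attr.data.resize(attr.data.size() + size);
}

/* Counting the total number of elements first and allocating once avoids
 * the intermediate re-allocations entirely. */
void sync_attributes_preallocated(Attribute &attr, const std::vector<size_t> &mesh_sizes)
{
	size_t total = 0;
	for (const size_t size : mesh_sizes)
		total += size;
	attr.data.reserve(total);
	/* ...per-mesh data is then appended without further allocations. */
}
```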
This doesn't mean every file will benefit from the change, since peak memory
usage happens in different places of the rendering code.
Also, unfortunately, attribute export might not be what causes the peak of the
render preparation stage; in fact it's object_to_mesh() which causes the memory
to peak in the same test file. So we really need to optimize that part first in
order to get visible results for artists. But in any case it's now quite easy
to track hotspots in Cycles itself, which is good.
It's actually a bad level call, but it's inside an ifdef block, disabled by
default and only intended to be used for development purposes.
The main idea of this change is to combine statistics coming from Cycles and
Blender during the scene synchronization step, to see whether further changes
actually reduce the memory footprint.
The method is called vector::free_memory(). Use it with care, since it'll
invalidate all pointers into the vector's memory, all iterators and so on.
Currently unused, but it might become handy when clearing unused data.
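A minimal sketch of how such a method can be implemented (the usual
swap-with-a-temporary trick; the actual Cycles vector subclass may differ in
the details):

```cpp
#include <vector>

template<typename T>
class vector : public std::vector<T> {
public:
	/* Release all memory held by the vector. Unlike clear(), this also drops
	 * the capacity, which is why it invalidates all pointers and iterators. */
	void free_memory()
	{
		std::vector<T>().swap(*this);
	}
};
```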
The commit implements a guarded allocator which can be used by STL containers
such as vectors, maps and so on. This allocator keeps track of current and
peak memory usage, which can then be queried.
The new allocator code is only active when building Cycles with the debug flag
(WITH_CYCLES_DEBUG) and doesn't distort regular builds too much.
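A stripped-down sketch of what such an allocator can look like (illustration
only; the real one needs to be thread-safe and only exists behind
WITH_CYCLES_DEBUG):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>

/* Counters that can be queried for statistics; a real implementation would
 * protect them with atomics or a mutex. */
static size_t guarded_mem_current = 0;
static size_t guarded_mem_peak = 0;

template<typename T>
class GuardedAllocator {
public:
	typedef T value_type;

	GuardedAllocator() = default;
	template<typename U> GuardedAllocator(const GuardedAllocator<U> &) {}

	T *allocate(size_t n)
	{
		guarded_mem_current += n * sizeof(T);
		guarded_mem_peak = std::max(guarded_mem_peak, guarded_mem_current);
		return static_cast<T *>(std::malloc(n * sizeof(T)));
	}

	void deallocate(T *p, size_t n)
	{
		guarded_mem_current -= n * sizeof(T);
		std::free(p);
	}
};

template<typename T, typename U>
bool operator==(const GuardedAllocator<T> &, const GuardedAllocator<U> &) { return true; }
template<typename T, typename U>
bool operator!=(const GuardedAllocator<T> &, const GuardedAllocator<U> &) { return false; }

/* Usage: std::vector<float, GuardedAllocator<float> > verts; */
```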
Additionally, we're now using our own subclass of std::vector which allows us
to implement a shrink_to_fit() method that ensures the capacity of the vector
is only as big as it needs to be (without this, making a vector smaller will
still keep all the previously allocated memory).
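The idea behind that shrink_to_fit(), as a standalone sketch:

```cpp
#include <vector>

/* Re-create the buffer at exactly the needed capacity and swap it in, so the
 * excess capacity left behind by earlier growth is released. */
template<typename T>
void shrink_to_fit(std::vector<T> &v)
{
	if (v.capacity() > v.size())
		std::vector<T>(v.begin(), v.end()).swap(v);
}
```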
This replaces our own implementation of aligned malloc with system calls,
which depend on which operating system you're on.
This is probably a barely noticeable change, but at the same time it might
reduce the amount of wasted memory.
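Roughly, the system-provided aligned allocation amounts to the following
(simplified; the function names here are illustrative and the real wrapper
covers more platforms and fallbacks):

```cpp
#include <cstddef>
#include <cstdlib>

#ifdef _WIN32
#  include <malloc.h>
#endif

/* Allocate `size` bytes aligned to `alignment` (a power of two, and for
 * posix_memalign() also a multiple of sizeof(void *)). */
void *aligned_malloc(size_t size, size_t alignment)
{
#ifdef _WIN32
	return _aligned_malloc(size, alignment);
#else
	void *ptr = nullptr;
	if (posix_memalign(&ptr, alignment, size) != 0)
		return nullptr;
	return ptr;
#endif
}

void aligned_free(void *ptr)
{
#ifdef _WIN32
	_aligned_free(ptr);
#else
	free(ptr);
#endif
}
```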
Yes, for me tablets (both Wacom and no-name) were again broken - curse X11!
So now we want ButtonPress; Button1Motion does not work anymore...
Anyway, this patch makes things much cleaner, storing each event type
in its own variable!
Patch by cedricp (Cédric PAILLE) from T43367, thanks a bunch!
A purely developers-only feature which allows disabling some of the CPU
capabilities. This way it's easier to test different kernels on the
same machine.
In the worst case it'll do nothing; in the best case it might give a few percent
of speedup because of better cache coherency.
Currently it's all handled as an override on the blender_python level; I don't
really see a reason to push the boolean flag further into the sync code. This
can always be done later if needed.
This attribute indicates how "pointy" the geometry surface is, which allows doing
effects like dirt maps and wear-off effects on render geometry. The attribute is
calculated for the final mesh, which means no baking (which would imply a UV
unwrap) is needed. Apart from this, the behavior is quite close to how vertex
dirty colors work.
The new attribute is available as an output socket of the Geometry node.
There's no penalty for render time, only some delay in scene preparation
(the delay is linear in the mesh complexity).
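As a rough sketch of the kind of computation involved (not the exact Cycles
implementation; compute_pointiness and the mesh layout here are illustrative):
for every vertex, average the angle between the vertex normal and its adjacent
edge directions, so flat areas map to 0.5 and sharp convex features approach 1.0.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct float3 { float x, y, z; };

static float dot(const float3 &a, const float3 &b)
{
	return a.x * b.x + a.y * b.y + a.z * b.z;
}

static float3 normalize(const float3 &a)
{
	const float len = std::sqrt(dot(a, a));
	return {a.x / len, a.y / len, a.z / len};
}

/* Per-vertex "pointiness": average angle between the vertex normal and the
 * edges connected to that vertex, remapped so flat areas give 0.5, convex
 * peaks go towards 1.0 and concave areas towards 0.0.
 * `edges` is a flat list of vertex index pairs. */
std::vector<float> compute_pointiness(const std::vector<float3> &verts,
                                      const std::vector<float3> &normals,
                                      const std::vector<int> &edges)
{
	std::vector<float> accum(verts.size(), 0.0f);
	std::vector<int> counter(verts.size(), 0);

	for (size_t i = 0; i + 1 < edges.size(); i += 2) {
		const int v0 = edges[i], v1 = edges[i + 1];
		const float3 dir = normalize({verts[v1].x - verts[v0].x,
		                              verts[v1].y - verts[v0].y,
		                              verts[v1].z - verts[v0].z});
		accum[v0] += dot(normals[v0], dir);
		accum[v1] -= dot(normals[v1], dir); /* edge points the other way for v1 */
		counter[v0]++;
		counter[v1]++;
	}

	std::vector<float> pointiness(verts.size(), 0.5f);
	for (size_t v = 0; v < verts.size(); v++) {
		if (counter[v] != 0) {
			const float avg = std::max(-1.0f, std::min(1.0f, accum[v] / counter[v]));
			pointiness[v] = std::acos(avg) / 3.14159265f; /* angle normalized to [0, 1] */
		}
	}
	return pointiness;
}
```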
Reviewers: brecht, juicyfruit
Subscribers: eyecandy, venomgfx
Differential Revision: https://developer.blender.org/D1086
From more investigation of the numeric failures in the kernel, it appears the
check was actually correct. But in theory it's also needed for the motion
triangles.
The issue was caused by the lack of a check for whether the fresnel term
actually gives total internal reflection in refraction BSDFs. This led to the
arbitrary vector (0, 0, 0) being used as the reflection, giving numeric issues
in other areas of the kernel.
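The missing check is essentially the total internal reflection condition from
Snell's law; a standalone sketch (not the exact kernel code, which evaluates
the full fresnel term):

```cpp
/* eta = IOR of the medium the ray enters divided by IOR of the medium it
 * leaves; cos_theta_i is the cosine of the incident angle. Returns false when
 * total internal reflection occurs and no refraction direction exists, in
 * which case the BSDF has to fall back to reflection instead of using an
 * undefined (0, 0, 0) direction. */
bool refraction_exists(float cos_theta_i, float eta)
{
	/* Snell's law: sin^2(theta_t) = (1 - cos^2(theta_i)) / eta^2. */
	const float sin2_theta_t = (1.0f - cos_theta_i * cos_theta_i) / (eta * eta);
	return sin2_theta_t < 1.0f; /* >= 1 means total internal reflection */
}
```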
This gives some visual changes for sharp reflection, but it now seems to be
correct, and it also corresponds with rough glossy reflection with sharpness
set to 0.001 (previously it was totally different from a sharpness of 0.0,
which is just weird).
This is the same issue as T43475: the SSE4 code is more robust against
non-finite values in the ray origin/direction. So for now a check is added
before doing BVH traversal on pre-SSE4 CPUs.
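A sketch of the kind of guard this adds (the actual kernel helpers differ):

```cpp
#include <cmath>

struct float3 { float x, y, z; };

/* Reject rays with a non-finite origin or direction before BVH traversal,
 * since the pre-SSE4 traversal code is not robust against NaN/Inf. */
bool ray_is_finite(const float3 &P, const float3 &D)
{
	return std::isfinite(P.x) && std::isfinite(P.y) && std::isfinite(P.z) &&
	       std::isfinite(D.x) && std::isfinite(D.y) && std::isfinite(D.z);
}
```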
For sure the actual root of the issue is a bit different and much more tricky
to solve, especially without disturbing render results too much. Still looking
into this.
In any case it's kinda fine to have such a check; we might later make it a
kernel_assert() instead of just a return.
The previous fix didn't quite work. For some reason everything worked fine when
using native nvcc in a 32bit environment, but when cross-compiling from a 64bit
platform it was still running out of memory.
For now all the kernels are simply made slower on 32bit CUDA as a temporary
solution. Either it'll be solved in future CUDA releases (by dropping 32bit? =\)
or we'll find a better workaround.
It was filtering for Logitech's USB vendor ID, but 3Dconnexion now uses their
own vendor ID for new products. Mac & Windows don't look for specific vendors,
so they should be fine.
Also added a note to eventually make USE_FINISH_GLITCH_WORKAROUND
available on all platforms.
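The gist of the vendor filtering change, as a sketch (is_3dconnexion_device is
illustrative; the vendor IDs are the publicly listed USB IDs for Logitech and
3Dconnexion, and the actual GHOST/NDOF code is structured differently):

```cpp
/* USB vendor IDs: older 3Dconnexion devices shipped under Logitech's ID,
 * newer ones use 3Dconnexion's own ID. */
const unsigned short VENDOR_ID_LOGITECH = 0x046D;
const unsigned short VENDOR_ID_3DCONNEXION = 0x256F;

bool is_3dconnexion_device(unsigned short vendor_id)
{
	/* Previously only Logitech's ID was accepted, so newer devices were
	 * never picked up on Linux. */
	return vendor_id == VENDOR_ID_LOGITECH || vendor_id == VENDOR_ID_3DCONNEXION;
}
```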
Reined back over-use of virtual functions in GHOST, especially in
"leaves" of the inheritance hierarchy. This eliminates vtables for many
classes and (in some places) turns virtual function dispatch into direct
function calls.
I'll be around to fix things if other coders think this change is too
much.
Still lots of virtual in GHOST_TaskbarWin32 since it just loves virtual.
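For illustration (these are not the actual GHOST classes): when a class sits at
a leaf of the hierarchy and nothing overrides its methods, dropping `virtual`
removes the vtable and lets calls be resolved directly:

```cpp
/* Before: virtual methods force a vtable and indirect dispatch, even though
 * nothing ever overrides them. */
class WindowBefore {
public:
	virtual ~WindowBefore() {}
	virtual void swapBuffers() { /* ... */ }
};

/* After: the class is a leaf of the hierarchy, so the methods can be plain
 * members; no vtable, and calls are resolved at compile time. */
class WindowAfter {
public:
	~WindowAfter() {}
	void swapBuffers() { /* ... */ }
};
```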
The visual representation of fire was generated before the simulation step took place. This caused the fire to essentially lag one simulation step behind, even though all emission areas were up to date.
Runtime costs were horrible. On Gooseberry, in some sequencer edits using
proxies of small size, a cache with about 2000 elements would slow down to
about 6 fps once the cache was full and the system tried to find the smallest
element available.
There are still improvements to be done here, like requesting a number
of good candidates to avoid rerunning through the list, or even using
some heap or ring buffer scheme to keep the data sorted, but nothing suits all
needs, so for now this should bring the cache back to a usable state (25fps
here at the studio).
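One of the directions mentioned above - gathering several good eviction
candidates in a single pass instead of rescanning the full list for every
eviction - could be sketched like this (not the actual moviecache code;
CacheItem and priority are placeholders):

```cpp
#include <cstddef>
#include <queue>
#include <vector>

struct CacheItem {
	size_t priority; /* lower = better candidate for eviction */
	void *data;
};

/* Single pass over the cache that keeps the N best eviction candidates in a
 * small max-heap, so subsequent evictions don't need to rescan the full list
 * of ~2000 elements. */
std::vector<CacheItem> gather_eviction_candidates(const std::vector<CacheItem> &cache,
                                                  size_t num_candidates)
{
	auto worse = [](const CacheItem &a, const CacheItem &b) { return a.priority < b.priority; };
	std::priority_queue<CacheItem, std::vector<CacheItem>, decltype(worse)> heap(worse);

	for (const CacheItem &item : cache) {
		heap.push(item);
		if (heap.size() > num_candidates)
			heap.pop(); /* drop the worst (highest priority value) kept candidate */
	}

	std::vector<CacheItem> candidates;
	while (!heap.empty()) {
		candidates.push_back(heap.top());
		heap.pop();
	}
	return candidates;
}
```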
This is the core code for it; the tools (data transfer and modifier) will come in the next commits.
The RNA API is already there, though.
See the code for details, but basically we define, for each 'smooth fan'
(a set of adjacent loops around the same vertex that are smooth, i.e. share a single normal),
a 'loop normal space' (or lnor space), using the auto-computed normal and relevant edges, and store
the custom normal as two angular factors inside that space. This allows custom normals
to 'follow' deformations of the geometry, and only requires saving two shorts per loop in the new clnor CDLayer.
Normal manipulation (editing, mixing, interpolating, etc.) shall always happen with plain 3D vector normals,
and be converted back into the storage format at the end.
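A simplified sketch of the storage scheme described above (illustrative only;
LnorSpace and the quantization here are stand-ins for the actual clnor code,
which builds the space from the auto-computed normal and fan edges):

```cpp
#include <algorithm>
#include <cmath>

struct float3 { float x, y, z; };

/* An orthonormal basis per smooth fan: `lnor` is the auto-computed normal,
 * `ref_x`/`ref_y` are derived from the relevant fan edges. */
struct LnorSpace {
	float3 lnor, ref_x, ref_y;
};

static float dot(const float3 &a, const float3 &b)
{
	return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Encode a unit-length custom normal as two angles inside the lnor space,
 * quantized into two shorts: alpha is the angle away from the auto-computed
 * normal, beta the angle around it. Decoding does the inverse. */
void custom_normal_to_data(const LnorSpace &space, const float3 &custom_nor, short r_data[2])
{
	const float pi = 3.14159265358979f;
	const float alpha = std::acos(std::max(-1.0f, std::min(1.0f, dot(space.lnor, custom_nor))));
	const float beta = std::atan2(dot(space.ref_y, custom_nor), dot(space.ref_x, custom_nor));

	r_data[0] = (short)(alpha / pi * 32767.0f);         /* alpha in [0, pi] */
	r_data[1] = (short)(beta / (2.0f * pi) * 32767.0f); /* beta in (-pi, pi] */
}
```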
Clnor computation has also been threaded (at least for the Mesh case, not for BMesh), since the process can
be rather heavy with high-poly meshes.
Also, bumping the subversion and fixing a mess in the 2.70 versioning code.
The issue was caused by the way we shoot the ray to check which volumes we're
inside, which might start bouncing back and forth between two close-to-parallel
intersecting faces.
The real solution would be to record all the intersections when shooting the
ray, but that's kinda tricky on the GPU because of the sorting needed and the
uncertainty of how huge the intersection array should be.
For now we'll just limit the number of steps in the check, so in the worst case
some samples won't be correct, which will be compensated by further sampling.
This shouldn't be an issue since the probability of such a lock is actually
quite small.
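A sketch of the step-limited check (illustrative; intersect_next and the
constant are hypothetical stand-ins for the kernel's own intersection loop):

```cpp
/* Hypothetical probe: walk the intersections along a ray and count exit vs.
 * entry events to decide whether the start point is enclosed by a volume.
 * The loop is hard-limited so two nearly parallel intersecting faces cannot
 * make it bounce back and forth forever. */
#define VOLUME_PROBE_MAX_STEPS 100

template<typename IntersectNextFunc>
bool point_inside_volume(IntersectNextFunc intersect_next)
{
	int enclosing = 0; /* exits minus entries seen along the probe ray */
	for (int step = 0; step < VOLUME_PROBE_MAX_STEPS; step++) {
		bool is_entry;
		if (!intersect_next(&is_entry))
			break; /* no more surfaces along the ray */
		enclosing += is_entry ? -1 : 1;
	}
	/* Every volume the start point is inside of contributes one unmatched
	 * exit. If the step limit was hit, the answer may be wrong for this
	 * sample, which further sampling will average out. */
	return enclosing > 0;
}
```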
The issue was in fact only visible in certain circumstances:
- OSL was compiled with Boost Wave
- or the system's cpp didn't handle a space between -I and the path
Now made it so both the Wave and cpp code paths are always happy.
Original patch from Shane Ambler, with own modifications to make the variable
names more verbose about what they hold.