This reverts commit ecbfa31caa.
Original commit broke logic in nodes re-fitting. That area can
access non-existing children momentarely. Not sure what would
be best solution here, for now simply reverting the change/
The issue was caused by some false-positive empty non-AABB intersection.
Tried to tweak it a bit so it does not record intersection anymore.
Hopefully will work for all platforms. Tested here on iMac and Debian.
For proper indexing to work we need to use unaligned node with
identity transform instead of aligned nodes when doing refit.
To be backported to 2.78 release.
This is a special builder type which is allowed to orient nodes to
strands direction, hence minimizing their surface area in comparison
with axis-aligned nodes. Such nodes are much more efficient for hair
rendering.
Implementation of BVH builder is based on Embree, and generally idea
there is to calculate axis-aligned SAH and oriented SAH and if SAH
of oriented node is smaller than axis-aligned SAH we create unaligned
node.
We store both aligned and unaligned nodes in the same tree (which
seems to be different from what Embree is doing) so we don't have
any any extra calculations needed to set up hair ray for BVH
traversal, hence avoiding any possible negative effect of this new
BVH nodes type.
This new builder is currently not in use, still need to make BVH
traversal code aware of unaligned nodes.
This seems to be straightforward way to support heterogeneous nodes
in the same tree.
There is some penalty related on 4gig limit of the address space now,
but here's are the thing:
Traversal code was already using ints to store final offset, so
there can't be regressions really.
This is a required commit to make it possible to encode both aligned
and unaligned nodes in the same array. Also, in the future we can use
this to get rid of __leaf_nodes array (which is a bit tricky to do since
trickery in pack_instances().
There are several internal changes for this:
First idea is to make __tri_verts to behave similar to __tri_storage,
meaning, __tri_verts array now contains all vertices of all triangles
instead of just mesh vertices. This saves some lookup when reading
triangle coordinates in functions like triangle_normal().
In order to make it efficient needed to store global triangle offset
somewhere. So no __tri_vindex.w contains a global triangle index which
can be used to read triangle vertices.
Additionally, the order of vertices in that array is aligned with
primitives from BVH. This is needed to keep cache as much coherent as
possible for BVH traversal. This causes some extra tricks needed to
fill the array in and deal with True Displacement but those trickery
is fully required to prevent noticeable slowdown.
Next idea was to use this __tri_verts instead of __tri_storage in
intersection code. Unfortunately, this is quite tricky to do without
noticeable speed loss. Mainly this loss is caused by extra lookup
happening to access vertex coordinate.
Fortunately, tricks here and there (i,e, some types changes to avoid
casts which are not really coming for free) reduces those losses to
an acceptable level. So now they are within couple of percent only,
On a positive site we've achieved:
- Few percent of memory save with triangle-only scenes. Actual save
in this case is close to size of all vertices.
On a more fine-subdivided scenes this benefit might become more
obvious.
- Huge memory save of hairy scenes. For example, on koro.blend
there is about 20% memory save. Similar figure for bunny.blend.
This memory save was the main goal of this commit to move forward
with Hair BVH which required more memory per BVH node. So while
this sounds exciting, this memory optimization will become invisible
by upcoming Hair BVH work.
But again on a positive side, we can add an option to NOT use Hair
BVH and then we'll have same-ish render times as we've got currently
but will have this 20% memory benefit on hairy scenes.
It was initially unsupported because initial idea of checking visibility
of all children was slowing scenes down a lot. Now the idea has changed
and we only perform visibility check of current node. This avoids huge
slowdown (from tests here it seems to be withing 1-2%, but more tests
would never hurt) and gives nice speedup of ray traversal for complex
scenes which utilized ray visibility.
Here's timing of koro.blend:
Without visibility check With visibility check
Original file 4min 20sec 4min 23sec
Camera rays only 1min 43 sec 55sec
Unfortunately, this doesn't come for free and requires extra data in
BVH node, which increases memory usage of BVH nodes by 15%. This we
can solve with some future trickery of avoiding __tri_storage created
for curve segments.
In certain types of animation it's possible to have some objects
scaling to zero. In this case we can save render times by avoid
traversing such instances.
Better to do ti ahead of a time, so traversal stays simple.
Reviewers: lukasstockner97, dingto, brecht
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2048
Some of these values can get quite large and are hard to read, adding this
makes it easy to read them at a glance.
Reviewed By: sergey
Differential Revision: https://developer.blender.org/D2039
This commits implements threaded sorting of references when looking for
object spatial split. It mainly useful when doing initial binning, which
happens from main thread.
Gives nice speedup of BVH build for Bunny.blend: 36sec vs. 55sec for
the Rabbit mesh BVH build.
On more complex scenes the speedup is probably minimal, but still nice
to have more instant rendering for simplier scenes.
Some further tests with production scenes would be interesting.
Reviewers: juicyfruit, dingto, lukasstockner97, brecht
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D1928
Simple fix: release lock earlier.
Reduces spatial split build time from 96 to 53sec on the Bunny.blend
(using studio Intel for benchmark).
NOTE: Timing difference is not that spectacular when comparing numbers
with builds before memory optimization, but even then it's about 20%
faster build.
This was only visible on systems with lots of threads and root of the issue
was that we've been pre-allocating too much memory for all the threads.
Now we only pre-allocate data for the main thread and rest of the threads
does allocation on-demand.
This brings down memory usage from 36Gig to 6.9Gig when building spatial
split for the Bunny.blend file on our Intel beast.
Originally regression was happened by the threaded spacial split builder
commit.
The title actually covers it all, This commit exploits all the work
being done in previous changes to make it possible to build spatial
splits in threads.
Works quite nicely, but has a downside of some extra memory usage.
In practice it doesn't seem to be a huge problem and that we can
always look into later if it becomes a real showstopper.
In practice it shows some nice speedup:
- BMW27 scene takes 3 now (used to be 4)
- Agent shot takes 5 sec (used to be 80)
Such non-linear speedup is most likely coming from much less amount
of heap re-allocations. A a downside, there's a bit of extra memory
used by BVH arrays. From the tests amount of extra memory is below
0.001% so far, so it's not that bad at all.
Reviewers: brecht, juicyfruit, dingto, lukasstockner97
Differential Revision: https://developer.blender.org/D1820
Policy here is a bit more complicated, if tree becomes too deep we're
forced to create a leaf node and size of that leaf wouldn't be so well
predicted, which means it's quite tricky to use single stack array for
that.
Made it more official feature that StackAllocator will fall-back to
heap when running out of stack memory.
It's still much better than always using heap allocator.
Currently they're staying at 1 (actual size over capacity), but we
will be changing it quite soon in order to avoid having too much
memory re-allocation happening at a BVH build time and will be
playing with different policies for that.
In some files stack memory was overruning the pre-allocated stack.
Perhaps we should fall-back to a hep-allocated stack so release builds
don't crash in works case but just becoming slower.
There are in fact some missing parts to it (Split BVH builder should
be creating bins from result of Object Split constructor).
Doable, but need to quickly fix issue for the studio here, easier to
revert for now.
This has following advantages:
- Localizes all the run-time storage into a single structure,
which could easily be extended further.
- Storage could be created per-thread, so once builder is
threaded we wouldn't have any conflicts between threads.
- Global nature of the storage avoids memory re-allocation
on the runtime, keeping builder as fast as possible.
Currently it's just API changes, which don't affect user at all.
Uses new StackAllocator from util_stack_allocator. Some tweaks to the stack
storage size are possible, read notes in the code about this.
At this point we might want to rename allocator files to util_allocator_foo.c,
so the stay nicely grouped in the folder.
This commit makes it so casting subsurface rays will totally ignore all
the BVH nodes and primitives which do not belong to a current object,
making it much simpler traversal code and reduces number of intersection
tests.
Reviewers: brecht, juicyfruit, dingto, lukasstockner97
Differential Revision: https://developer.blender.org/D1823