This is an optimization, but the gain is still modest because
some extractions still run in a single thread.
**Benchmarking**
||before:|after:
|---|---|---|
|large_mesh_editing:|Average: 14.246502 FPS|Average: 15.438118 FPS
||rdata 9ms iter 31ms (frame 69ms)|rdata 9ms iter 27ms (frame 65ms)
|large_mesh_editing_ledge: |Average: 14.913622 FPS|Average: 15.856538 FPS
||rdata 9ms iter 30ms (frame 67ms)|rdata 9ms iter 26ms (frame 63ms)
|looptris_test:|Average: 3.970774 FPS|Average: 4.095200 FPS
||rdata 11ms iter 90ms (frame 235ms)|rdata 12ms iter 87ms (frame 229ms)
Reviewed By: jbakker
Differential Revision: https://developer.blender.org/D11467
A single threaded task with thread data over 8192 bytes would leak.
While this didn't happen in practice, it could cause issues in the
future.
The free call for `task_parallel_iterator_do` wasn't running
if callbacks weren't set. This is also not an issue in practice,
but fixing it avoids potential problems in the future.
Changes introduced in commit rBe9f2f17e8518
can create different render results when there is
a Math or Mix operation after a TextureOperation
in the tiled execution model.
This is due to WriteBufferOperation forcing a single pixel
resolution when these operations use a preferred
resolution of 0 to check whether their inputs have a resolution.
Fixing this behaviour creates different renders too.
This patch keeps the previous tiled implementation and
adds the new implementation only for full frame execution.
Reviewed By: Jeroen Bakker (jbakker)
Differential Revision: https://developer.blender.org/D11546
In order to reduce stack size, this patch converts full frame
recursive methods into iterative ones.
- No functional changes.
- No performance changes.
- Memory peak may vary slightly depending on the tree because
breadth-first traversal is now used instead of depth-first.
Tests in D11113 have the same results except for test1 memory peak:
360 MB instead of 329.50 MB.
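As a minimal sketch of this kind of conversion, assuming a hypothetical `Node` type with input links (not the compositor's actual classes), a recursive depth-first walk can be replaced by a loop over an explicit queue. The queue lives on the heap, so the call stack stays flat, and nodes are visited breadth-first:

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical operation node; `inputs` holds indices of the nodes it reads from.
struct Node {
  std::vector<int> inputs;
};

// Recursive depth-first visit: call stack depth grows with the depth of the tree.
void visit_recursive(const std::vector<Node> &nodes, int index, std::vector<int> &r_order)
{
  r_order.push_back(index);
  for (int input : nodes[index].inputs) {
    visit_recursive(nodes, input, r_order);
  }
}

// Iterative breadth-first visit: pending nodes are kept in a heap-allocated queue.
std::vector<int> visit_iterative(const std::vector<Node> &nodes, int root)
{
  assert(root >= 0 && root < static_cast<int>(nodes.size()));
  std::vector<int> order;
  std::deque<int> pending = {root};
  while (!pending.empty()) {
    const int index = pending.front();
    pending.pop_front();
    order.push_back(index);
    for (int input : nodes[index].inputs) {
      pending.push_back(input);
    }
  }
  return order;
}
```

Both functions visit the same nodes, only in a different order. That difference in order is the kind of change that can shift when buffers are alive, and so the memory peak.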
Reviewed By: Jeroen Bakker (jbakker)
Differential Revision: https://developer.blender.org/D11515
CMake builder and install deps changes; precompiled libraries are still to be
committed.
Ref T88438, T88434
Differential Revision: https://developer.blender.org/D11486
When transforming an image, a matrix multiplication was done per pixel.
The transformation is always linear, so the multiplication could be
replaced by two additions per pixel.
During testing in debug builds, playing back a movie went from 20 fps to
300 fps.
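The idea can be sketched as follows, assuming a hypothetical `Affine2D` struct rather than the actual image transform code: the source coordinate is evaluated once at the start of a row, and each step to the next pixel only adds the per-pixel delta.

```cpp
#include <cassert>
#include <vector>

struct Float2 {
  float x, y;
};

// Hypothetical 2D affine transform: u = a*x + b*y + c, v = d*x + e*y + f.
struct Affine2D {
  float a, b, c, d, e, f;
};

// Reference: one full transform evaluation per pixel.
std::vector<Float2> row_coords_multiply(const Affine2D &m, int y, int width)
{
  assert(width >= 0);
  std::vector<Float2> coords;
  for (int x = 0; x < width; x++) {
    coords.push_back({m.a * x + m.b * y + m.c, m.d * x + m.e * y + m.f});
  }
  return coords;
}

// Incremental: evaluate once at the row start, then two additions per pixel.
std::vector<Float2> row_coords_incremental(const Affine2D &m, int y, int width)
{
  assert(width >= 0);
  std::vector<Float2> coords;
  float u = m.b * y + m.c;
  float v = m.e * y + m.f;
  for (int x = 0; x < width; x++) {
    coords.push_back({u, v});
    u += m.a; /* Stepping one pixel in x changes u by `a`. */
    v += m.d; /* Stepping one pixel in x changes v by `d`. */
  }
  return coords;
}
```

Note that repeated additions can accumulate floating point rounding error over long rows, so the two variants agree only up to a small tolerance in general.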
Reviewed By: zeddb
Differential Revision: https://developer.blender.org/D11533
Talked with Bastien and we ended up looking into this. The issue is that
duplication through drag & drop should also be considered a
"sub-process", like Shift+D duplicating is. Added a comment explaining
why this is needed.
The current index builder is designed to be used in a single thread.
This makes all index buffer extractions single threaded.
This patch adds a thread safe solution enabling multithreaded
building of index buffers.
To reduce locking, the solution provides a task/thread local
index buffer builder (called a sub builder).
When a thread is finished, its thread local index buffer builder
can be joined with the initial index buffer builder.
`GPU_indexbuf_subbuilder_init`: Initializes a sub builder. The
index list is shared between the parent and sub builder, but the
counters are local to each sub builder, so updating them does
not need any locking.
`GPU_indexbuf_subbuilder_finish`: Merges the information of the
sub builder back into the parent builder. Needs to be invoked outside
the worker thread, or when sure that all worker threads have
finished. Internally the function is not thread safe.
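A minimal sketch of the pattern, using a hypothetical `IndexBuilder` struct and helper names (this is not the actual GPU module API or its fields): the sub builder is a copy that shares the index list but keeps private counters, and the merge step only combines counters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical builder: `data` is shared, the counters are private per builder.
struct IndexBuilder {
  uint32_t *data = nullptr;
  uint32_t index_len = 0;
  uint32_t index_min = UINT32_MAX;
  uint32_t index_max = 0;
};

// A sub builder starts as a copy: same index list, its own counters.
IndexBuilder subbuilder_init(const IndexBuilder &parent)
{
  return parent;
}

// Every element slot is written by exactly one task, so no lock is needed.
void set_point(IndexBuilder &builder, uint32_t elem, uint32_t vert)
{
  assert(builder.data != nullptr);
  builder.data[elem] = vert;
  builder.index_len = std::max(builder.index_len, elem + 1u);
  builder.index_min = std::min(builder.index_min, vert);
  builder.index_max = std::max(builder.index_max, vert);
}

// Work done by one task; in a threaded setup each call would run in its own worker.
void fill_task(IndexBuilder &sub, uint32_t task_id, uint32_t task_count, uint32_t count)
{
  for (uint32_t i = task_id; i < count; i += task_count) {
    set_point(sub, i, i);
  }
}

// Not thread safe: call only after all workers have finished.
void subbuilder_finish(IndexBuilder &parent, const IndexBuilder &sub)
{
  parent.index_len = std::max(parent.index_len, sub.index_len);
  parent.index_min = std::min(parent.index_min, sub.index_min);
  parent.index_max = std::max(parent.index_max, sub.index_max);
}
```

Each task gets its own sub builder from `subbuilder_init`, runs `fill_task`, and the caller merges the results with `subbuilder_finish` once every task is done. Threading is left out of the sketch on purpose so it stays self-contained.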
For testing purposes the extract_points extractor has been migrated to
the new API. This required changes to the mesh extractor:
* When creating tasks, the number of the current task and the total
number of tasks are stored in `ExtractTaskData`.
* Added two functions to `MeshExtract`.
** `task_init` initializes the task specific userdata.
** `task_finish` merges the task specific userdata back.
* Added a `task_id` parameter to the iteration functions so they can
access the correct task data without any need for locking.
There is no noticeable change in end user performance.
Reviewed By: mano-wii
Differential Revision: https://developer.blender.org/D11499
This node creates a boolean face attribute that is "true" for
every face that has the given material.
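Conceptually, and assuming a hypothetical input of one material slot index per face (not the node's actual implementation), the attribute can be sketched as:

```cpp
#include <cassert>
#include <vector>

// Returns one boolean per face: true where the face uses `material_index`.
// `face_material_indices` is a hypothetical per-face material slot array.
std::vector<bool> select_by_material(const std::vector<int> &face_material_indices,
                                     int material_index)
{
  assert(material_index >= 0);
  std::vector<bool> selection;
  selection.reserve(face_material_indices.size());
  for (int index : face_material_indices) {
    selection.push_back(index == material_index);
  }
  return selection;
}
```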
Differential Revision: https://developer.blender.org/D11324
Selection of an FCurve with box/circle select now selects the entire
curve and all its keys:
- Box selecting a curve selects all the keyframes of the curve.
- Ctrl + box selecting of the curve deselects all the keyframes of the
curve.
- Shift + box selecting of the curve extends the keyframe selection,
adding all the keyframes of the curves that were just selected to the
selection.
- In all cases, if the selection area contains a key, nothing is
performed on the curves themselves (the action only impacts the
selected keys).
Reviewed By: sybren, #animation_rigging
Differential Revision: https://developer.blender.org/D11181
Some of the primitive nodes can return null in an error condition.
This is confusing when mixed with adding a material slot in the calling
functions. This is the second crash caused by that confusion. It's
simpler to add the slot right when allocating the mesh, and it will
lend itself better to copy & paste coding in the future.
Differential Revision: https://developer.blender.org/D11530
Under some circumstances using task isolation can cause deadlocks.
Previously, our task pool implementation would run all tasks in an
isolated region. Now using task isolation is optional and can be
turned on/off for individual task pools.
Task pools that spawn new tasks recursively should never enable
task isolation. There is a new check that finds these cases at runtime.
Right now this check is disabled, so that this commit is a pure refactor.
It will be enabled in an upcoming commit.
This fixes T88598.
Differential Revision: https://developer.blender.org/D11415
Simplify vertex normal calculation by moving the main normal
accumulation function to operate on vertices instead of faces.
Using faces had the downside that it needed to zero, accumulate and
normalize the vertex normals in 3 separate passes. Accumulating also
needed a spin-lock for threading, since each face would write its normal
to all of its vertices, which could be shared with other faces.
Now a single loop over vertices is performed without locking.
This gives 5-6% speedup calculating all normals.
This also simplifies partial updates, fixing a problem where
all connected faces were being read from when calculating normals.
While this could have been resolved separately,
it's simpler to operate on vertices directly.
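A simplified sketch of the vertex based approach, assuming a hypothetical `Mesh` layout with precomputed face normals and per-vertex face lists (not BMesh itself). The sketch uses an unweighted sum for brevity and omits any per-corner weighting:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Float3 {
  float x, y, z;
};

// Hypothetical mesh layout: face normals plus, per vertex, the faces that use it.
struct Mesh {
  std::vector<Float3> face_normals;
  std::vector<std::vector<int>> vert_faces;
};

// One pass over vertices: each iteration zeroes, accumulates and normalizes
// only its own output, so iterations could run in parallel without a lock.
std::vector<Float3> calc_vertex_normals(const Mesh &mesh)
{
  std::vector<Float3> normals(mesh.vert_faces.size());
  for (std::size_t v = 0; v < mesh.vert_faces.size(); v++) {
    Float3 sum = {0.0f, 0.0f, 0.0f};
    for (int f : mesh.vert_faces[v]) {
      assert(f >= 0 && f < static_cast<int>(mesh.face_normals.size()));
      sum.x += mesh.face_normals[f].x;
      sum.y += mesh.face_normals[f].y;
      sum.z += mesh.face_normals[f].z;
    }
    const float length = std::sqrt(sum.x * sum.x + sum.y * sum.y + sum.z * sum.z);
    if (length > 0.0f) {
      sum = {sum.x / length, sum.y / length, sum.z / length};
    }
    normals[v] = sum;
  }
  return normals;
}
```

Because every vertex reads shared face data but writes only to its own slot, no two iterations write to the same memory, which is what removes the need for the spin-lock.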
Move BMesh conversion and all loading code into worker.
Reviewed By: Sebastian Parborg (zeddb)
Differential Revision: https://developer.blender.org/D11288
ChildWindowFromPoint retrieves the child of the provided window at a
point. In this case it always returns 0 because HWND_DESKTOP is a flag
defined as 0, which is never a valid window handle and is not intended
for use in place of a window handle.
Forwarding of mousewheel events was added in adb08def61, and later
modified to the current non-working state in e9645806f5. Sending mouse
wheel events to the window under the cursor is a system preference and
should not be overridden by Blender, so the no-op code has been
removed.
Added a new API function to stitch multires grids
on specific faces in a mesh,
subdiv_ccg_average_faces_boundaries_and_corners,
and changed multires normal calc to use it.
VTune profiling showed that this was a major
performance hit once you get above 10,000 or so
base mesh faces and/or have a high number of
subdivision levels.
Here's a video comparing the difference. Note the
bpy.app_debug switch is not in the final commit.
{F10145323}
And the .blend file:
{F10145346}
Reviewed By: Sergey Sharybin (sergey)
Differential Revision: https://developer.blender.org/D11334
Also use Curve as an argument instead of Object, since the object was
only used to retrieve the curve, and the calling code is already working
with curve data.