One drawback to trying to predict the number of threads that will be
used in the `task_graph` is that we are only sure of the number when the
threads are running.
Using `BLI_task_parallel_range` allows the driver to
choose the best thread distribution through `parallel_reduce`.
The benefit is most evident on hardware with fewer cores.
This is the result on an 4-core laptop:
||before:|after:
|---|---|---|
|large_mesh_editing:|Average: 5.203638 FPS|Average: 5.398925 FPS
||rdata 15ms iter 43ms (frame 193ms)|rdata 14ms iter 36ms (frame 187ms)
Differential Revision: https://developer.blender.org/D11558
This is an adaptation of {D11488}.
A disadvantage of manually setting the iter ranges per thread is that
we don't know how many threads are running in the background and so we
don't know how to best distribute the ranges.
To solve this limitation we can use `parallel_reduce` and thus let the
driver choose the best distribution of ranges among the threads.
This proved to be especially beneficial for computers with few cores.
**Benchmarking:**
Here's the result on an 4-core laptop:
||master:|PATCH:
|---|---|---|
|large_mesh_editing:|Average: 5.203638 FPS|Average: 5.398925 FPS
||rdata 15ms iter 43ms (frame 193ms)|rdata 14ms iter 36ms (frame 187ms)
Here's the result on an 8-core PC:
||master:|PATCH:
|---|---|---|
|large_mesh_editing:|Average: 15.267482 FPS|Average: 15.906881 FPS
||rdata 9ms iter 28ms (frame 65ms)|rdata 9ms iter 25ms (frame 63ms)
|large_mesh_editing_ledge: |Average: 15.145966 FPS|Average: 15.520474 FPS
||rdata 9ms iter 29ms (frame 65ms)|rdata 9ms iter 25ms (frame 64ms)
|looptris_test:|Average: 4.001917 FPS|Average: 4.061105 FPS
||rdata 12ms iter 90ms (frame 236ms)|rdata 12ms iter 87ms (frame 230ms)
|subdiv_mesh_cage_and_final:|Average: 1.917769 FPS|Average: 1.971790 FPS
||rdata 7ms iter 37ms (frame 261ms)|rdata 7ms iter 31ms (frame 258ms)
||rdata 7ms iter 38ms (frame 252ms)|rdata 7ms iter 33ms (frame 249ms)
|subdiv_mesh_final_only:|Average: 6.387240 FPS|Average: 6.591251 FPS
||rdata 3ms iter 25ms (frame 151ms)|rdata 3ms iter 16ms (frame 145ms)
|subdiv_mesh_final_only_ledge:|Average: 6.247393 FPS|Average: 6.596024 FPS
||rdata 3ms iter 26ms (frame 158ms)|rdata 3ms iter 16ms (frame 148ms)
**Notes:**
- The improvement can only be noticed if all extracts are multithreaded.
- This patch touches different areas of the code, so it can be split into another patch if the idea is accepted.
These screenshots show how threads behave in a quadcore:
Master:
{F10164664}
Patch:
{F10164666}
Differential Revision: https://developer.blender.org/D11558
This patch adds a specific extraction method when the mesh has only
one material. This method is multi-threaded.
There is a trade-off in this patch as the ibo isn't compressed (it adds
restart indexes for hidden faces). So it depends if threading is faster
than the additional GPU buffer upload.
# Subdivided cube
I used a cube subdivided 7 times, modifiers applied. that gives around 400000 faces.
The test is selecting some vertices and move them. During this test the next buffers are updated on each frame:
* vbo.pos_nor
* vbo.lnor
* vbo.edit_data
* ibo.tris
* ibo.points
System info:
|platform| Linux-5.11.0-7614-generic-x86_64-with-glibc2.33|
| renderer| AMD SIENNA_CICHLID (DRM 3.40.0, 5.11.0-7614-generic, LLVM 11.0.1)|
|vendor| AMD|
|version| 4.6 (Core Profile) Mesa 21.0.1|
|cpu| Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz|
|compiler| gcc version 10.3.0|
Timing have been measured using DEBUG_TIME in `draw_cache_extract_mesh`.
master: `rdata 8ms iter 45ms (frame 153ms)`
this patch `rdata 6ms iter 36ms (frame 132ms)`
Reviewed By: mano-wii
Maniphest Tasks: T88352
Differential Revision: https://developer.blender.org/D11290
This is an optimization, but the difference is still not that
significant as some extractions are still done in single thread.
**Benchmarking**
||before:|after:
|---|---|---|
|large_mesh_editing:|Average: 14.246502 FPS|Average: 15.438118 FPS
||rdata 9ms iter 31ms (frame 69ms)|rdata 9ms iter 27ms (frame 65ms)
|large_mesh_editing_ledge: |Average: 14.913622 FPS|Average: 15.856538 FPS
||rdata 9ms iter 30ms (frame 67ms)|rdata 9ms iter 26ms (frame 63ms)
|looptris_test:|Average: 3.970774 FPS|Average: 4.095200 FPS
||rdata 11ms iter 90ms (frame 235ms)|rdata 12ms iter 87ms (frame 229ms)
Reviewed By: jbakker
Differential Revision: https://developer.blender.org/D11467
Current index builder is designed to be used in a single thread.
This makes all index buffer extractions single threaded.
This patch adds a thread safe solution enabling multithreaded
building of index buffers.
To reduce locking the solution would provide a task/thread local
index buffer builder (called sub builder).
When a thread is finished this thread local index buffer builder
can be joined with the initial index buffer builder.
`GPU_indexbuf_subbuilder_init`: Initialized a sub builder. The
index list is shared between the parent and sub buffer, but the
counters are localized. Ensuring that updating counters would
not need any locking.
`GPU_indexbuf_subbuilder_finish`: merge the information of the
sub builder back to the parent builder. Needs to be invoked outside
the worker thread, or when sure that all worker threads have been
finished. Internal the function is not thread safe.
For testing purposes the extract_points extractor has been migrated to
the new API. Herefore changes to the mesh extractor were needed.
* When creating tasks, the task number of current task is stored in
ExtractTaskData including the total number of tasks.
* Adding two functions in `MeshExtract`.
** `task_init` will initialize the task specific userdata.
** `task_finish` should merge back the task specific userdata back.
* adding task_id parameter to the iteration functions so they can
access the correct task data without any need for locking.
There is no noticeable change in end user performance.
Reviewed By: mano-wii
Differential Revision: https://developer.blender.org/D11499
draw_cache_extract_mesh for task scheduling. Will be refactored to draw_cache_extract_mesh_scheduling later on after migrating to CPP.
draw_cache_extract_mesh_render_data extraction of mesh render data from edit mesh/mesh into a more generic structure.
draw_cache_extract_mesh_extractors containing all the extractors. This will be split up further into a single file per extractor.
This patch replaces / redoes the entire MeshExtractors system.
Although they were useful and facilitated the addition of new buffers, they made it difficult to control the threads and added a lot of threading overhead.
Part of the problem was in traversing the same loop type in different threads. The concurrent access of the BMesh Elements slowed the reading.
This patch simplifies the use of threads by merging all the old callbacks from the extracts into a single series of iteration functions.
The type of extraction can be chosen using flags.
This optimized the process by around 34%.
Initial idea and implementation By @mano-wii.
Fine-tuning, cleanup by @atmind.
MASTER:
large_mesh_editing:
- rdata 9ms iter 50ms (frame 155ms)
- Average: 6.462874 FPS
PATCH:
large_mesh_editing:
- rdata 9ms iter 34ms (frame 136ms)
- Average: 7.379491 FPS
Differential Revision: https://developer.blender.org/D11425
Reuse loose geometry during selection (and other operations) from
previous calculation. Loose geometry stays the same, but was
recalculated to determine the size of GPU buffers. This patch would
reuse the previous loose geometry when geometry wasn't changed.
Although not the main bottleneck during selection it is measurable.
Master.
`rdata 46ms iter 55ms (frame 410ms)`
This patch.
`rdata 5ms iter 52ms (frame 342ms)`
Reviewed By: mano-wii
Differential Revision: https://developer.blender.org/D11339
This was caused by unsafe sqrt calls.
Fixes T86578 white artifacts in EEVEE
Reviewed By: brecht, dfelinto
Differential Revision: https://developer.blender.org/D11428
This patch will use compute shaders to create the VBO for hair.
The previous implementation uses transform feedback.
Timings before: between 0.000069s and 0.000362s.
Timings after: between 0.000032s and 0.000092s.
Speedup isn't noticeable by end-users. The patch is used to test
the new compute shader pipeline and integrate it with the draw
manager. Allowing EEVEE, Workbench and other draw engines to
use compute shaders with the introduction of `DRW_shgroup_call_compute`
and `DRW_shgroup_vertex_buffer`.
Future improvements are possible by generating the index buffer
of hair directly on the GPU.
NOTE: that compute shaders aren't supported by Apple and still use
the transform feedback workaround.
Reviewed By: fclem
Differential Revision: https://developer.blender.org/D11057
This patch adds relatively small changes to the curve draw
cache implementation in order to draw the curve data in the
viewport. The dependency graph iterator is also modified
so that it iterates over the curve geometry component, which
is presented to users as `Curve` data with a pointer to the
`CurveEval`
The idea with the spline data type in geometry nodes is that
curve data itself is only the control points, and any evaluated
data with faces is a mesh. That is mostly expected elsewhere in
Blender anyway. This means it's only necessary to implement
wire edge drawing of `CurveEval` data.
Adding a `CurveEval` pointer to `Curve` is in line with changes
I'd like to make in the future like using `CurveEval` in more places
such as edit mode.
An alternate solution involves converting the curve wire data
to a mesh, however, that requires copying all of the data, and
since avoiding it is rather simple and is in-line with future plans
anyway, I think doing it this way is better.
Differential Revision: https://developer.blender.org/D11351
This reverts commit 8f9599d17e.
Mac seems to have an error with this change.
```
ERROR: /Users/blender/git/blender-vdev/blender.git/source/blender/draw/intern/draw_hair.c:115:44: error: use of undeclared identifier 'shader_src'
ERROR: /Users/blender/git/blender-vdev/blender.git/source/blender/draw/intern/draw_hair.c:123:13: error: use of undeclared identifier 'shader_src'
ERROR: make[2]: *** [source/blender/draw/CMakeFiles/bf_draw.dir/intern/draw_hair.c.o] Error 1
ERROR: make[1]: *** [source/blender/draw/CMakeFiles/bf_draw.dir/all] Error 2
ERROR: make: *** [all] Error 2
```