Commit Graph

37 Commits

Author SHA1 Message Date
622e6f05f1 Fix T92750: sculpt vertex colors missing in object mode
The layers were not aliased properly for usage in the shaders.

Regression caused by rB03013d19d167.
2021-11-15 02:38:52 +01:00
a3b785bc08 Cleanup: clang-format, clang-tidy, spelling 2021-10-27 15:55:36 +11:00
3e3ff1a464 Revert "Revert "Eevee: support accessing custom mesh attributes""
This reverts commit e7fedf6dba.

And also fix a compilation issue on windows.

Differential Revision: https://developer.blender.org/D12969
2021-10-26 18:23:59 -03:00
e7fedf6dba Revert "Eevee: support accessing custom mesh attributes"
This reverts commit 03013d19d1.

This commit broke the windows build pretty badly and I don't
feel confident landing the fix for this without review.

Will post a possible fix in D12969 and we'll take it from there.
2021-10-26 14:49:22 -06:00
03013d19d1 Eevee: support accessing custom mesh attributes
This adds generic attribute rendering support for meshes for Eevee and
Workbench. Each attribute is stored inside of the `MeshBufferList` as a
separate VBO, with a maximum of `GPU_MAX_ATTR` VBOs for consistency with
the GPU shader compilation code.

Since `DRW_MeshCDMask` is not general enough, attribute requests are
stored in new `DRW_AttributeRequest` structures inside of a convenient
`DRW_MeshAttributes` structure. The latter is used in a similar manner
as `DRW_MeshCDMask`, with the `MeshBatchCache` keeping track of needed,
used, and used-over-time attributes. Again, `GPU_MAX_ATTR` is used in
`DRW_MeshAttributes` to prevent too many attributes being used.

To ensure thread-safety when updating the used attributes list, a mutex
is added to the Mesh runtime. This mutex will also be used in the future
for other things when other part of the rendre pre-processing are multi-threaded.

`GPU_BATCH_VBO_MAX_LEN` was increased to 16 in order to accommodate for
this design.

Since `CD_PROP_COLOR` are a valid attribute type, sculpt vertex colors
are now handled using this system to avoid to complicate things. In the
future regular vertex colors will also use this. From this change, bit
operations for DRW_MeshCDMask are now using uint32_t (to match the
representation now used by the compiler).

Due to the difference in behavior for implicit type conversion for scalar types
between OpenGL and what users expect (a scalar `s` is converted to
`vec4(s, 0, 0, 1)` by OpenGL, vs. `vec4(s, s, s, 1)` in Blender's various node graphs) ,
all scalar types are using a float3 internally for now, which increases memory usage.
This will be resolved during or after the EEVEE rewrite as properly handling
this involves much deeper changes.

Ref T85075

Reviewed By: fclem

Maniphest Tasks: T85075

Differential Revision: https://developer.blender.org/D12969
2021-10-26 18:29:30 +02:00
be1891e895 Cleanup: move the buffer list to 'MeshBufferCache'
The cache is used to fill the buffer list.
2021-08-23 13:44:31 -03:00
eb0c50ac78 Cleanup: rename 'MeshBufferExtractionCache' to 'MeshBufferCache'
Matches the existing `MeshBatchCache`.
2021-08-23 13:43:42 -03:00
6e51ef9531 Cleanup: rename 'MeshBufferCache' to 'MeshBufferList'
`MeshBufferList` is more specific and can avoid confusion with
`MeshBufferExtractionCache`.
2021-08-23 13:41:03 -03:00
Germano Cavalcante
ebdae75736 Cleanup: Move 'tris_per_mat' member out of 'MeshBufferCache'
`MeshBufferCache` is a struct representing a list of buffers.

As such, `GPUIndexBuf **tris_per_mat` is out of place as it does not
represent one of the buffers in the list.

In fact this member should be close to `GPUBatch **surface_per_mat` as
they are related.

The code for dependencies between buffer and batch had to be reworked
as it relies on the member's position.

Differential Revision: https://developer.blender.org/D12227
2021-08-23 13:37:32 -03:00
Germano Cavalcante
3059853732 Cleanup: Rearrange mesh extraction files
In the draw module, it's not easy to identify what its header is, and
where the shared functions are.

So move `draw_cache_extract_mesh_extractors.c` and
`draw_cache_extract_mesh_private.h` to the same folder as the extractors
and rename these files to make them more identifiable.

Reviewed By: jbakker

Differential Revision: https://developer.blender.org/D11991
2021-07-26 10:25:39 -03:00
Germano Cavalcante
178086d581 Draw Cache: extract tris in parallel ranges
The `ibo.tris` extraction in multithread is currently only done if the
mesh has only 1 material.

Now we cache a map indicating the index of each polygon after sort and
thus allow the extraction of tris with materials in multithreaded.

As caching is a heavy operation and was already being performed in
multi-thread for triangle offsets, no significant improvements are
expected.

The benefit will be much greater when we can skip updating the cache
while transforming a geometry.

**Profiling:**
||master:|PATCH:
|---|---|---|
|large_mesh_editing_materials:|Average: 13.855380 FPS|Average: 15.525684 FPS
||rdata 9ms iter 36ms (frame 71ms)|rdata 9ms iter 29ms (frame 64ms)
|subdiv_mesh_final_only_materials:|Average: 28.113742 FPS|Average: 28.633599 FPS
||rdata 0ms iter 1ms (frame 36ms)|rdata 0ms iter 1ms (frame 35ms)

1.1x overall speedup

Differential Revision: https://developer.blender.org/D11445
2021-07-21 15:09:43 -03:00
785d87ee42 Fix T90017: Bone widget drawing inconsistent with editing
The `lines_loose` extractor did not trigger loose geometry caching.
2021-07-21 14:46:53 -03:00
9b89de2571 Cleanup: consistent use of tags: NOTE/TODO/FIXME/XXX
Also use doxy style function reference `#` prefix chars when
referencing identifiers.
2021-07-04 00:43:40 +10:00
841b2cea7b Cleanup: compiler & clang-tidy warnings 2021-07-02 12:15:29 +10:00
fd1fec5600 Cleanup: Clang tidy, remove typedef 2021-07-01 16:33:07 -05:00
4a7951fede Cleanup: Separate each extractor into specific compile units
Makes code cleaner and easier to find.
2021-07-01 11:13:58 -03:00
4b9ff3cd42 Cleanup: comment blocks, trailing space in comments 2021-06-24 15:59:34 +10:00
af4167441b Cleanup: clang-tidy 2021-06-18 14:41:24 +10:00
Jeroen Bakker
174ed69c1b DrawManager: Cache material offsets.
When using multiple materials in a single mesh the most time is spend in
counting the offsets of each material for the sorting.

This patch moves the counting of the offsets to render mesh data and
caches it as long as the geometry doesn't change.

This patch doesn't include multithreading of this code.

Reviewed By: mano-wii

Differential Revision: https://developer.blender.org/D11612
2021-06-15 15:31:34 +02:00
0eb9351296 Refactor: use 'BLI_task_parallel_range' in Draw Cache
One drawback to trying to predict the number of threads that will be
used in the `task_graph` is that we are only sure of the number when the
threads are running.

Using `BLI_task_parallel_range` allows the driver to
choose the best thread distribution through `parallel_reduce`.

The benefit is most evident on hardware with fewer cores.

This is the result on an 4-core laptop:
||before:|after:
|---|---|---|
|large_mesh_editing:|Average: 5.203638 FPS|Average: 5.398925 FPS
||rdata 15ms iter 43ms (frame 193ms)|rdata 14ms iter 36ms (frame 187ms)

Differential Revision: https://developer.blender.org/D11558
2021-06-11 10:49:50 -03:00
Germano Cavalcante
2330cec2c6 Refactor: Draw Cache: use 'BLI_task_parallel_range'
This is an adaptation of {D11488}.

A disadvantage of manually setting the iter ranges per thread is that
we don't know how many threads are running in the background and so we
don't know how to best distribute the ranges.

To solve this limitation we can use `parallel_reduce` and thus let the
driver choose the best distribution of ranges among the threads.

This proved to be especially beneficial for computers with few cores.

**Benchmarking:**
Here's the result on an 4-core laptop:
||master:|PATCH:
|---|---|---|
|large_mesh_editing:|Average: 5.203638 FPS|Average: 5.398925 FPS
||rdata 15ms iter 43ms (frame 193ms)|rdata 14ms iter 36ms (frame 187ms)

Here's the result on an 8-core PC:
||master:|PATCH:
|---|---|---|
|large_mesh_editing:|Average: 15.267482 FPS|Average: 15.906881 FPS
||rdata 9ms iter 28ms (frame 65ms)|rdata 9ms iter 25ms (frame 63ms)
|large_mesh_editing_ledge: |Average: 15.145966 FPS|Average: 15.520474 FPS
||rdata 9ms iter 29ms (frame 65ms)|rdata 9ms iter 25ms (frame 64ms)
|looptris_test:|Average: 4.001917 FPS|Average: 4.061105 FPS
||rdata 12ms iter 90ms (frame 236ms)|rdata 12ms iter 87ms (frame 230ms)
|subdiv_mesh_cage_and_final:|Average: 1.917769 FPS|Average: 1.971790 FPS
||rdata 7ms iter 37ms (frame 261ms)|rdata 7ms iter 31ms (frame 258ms)
||rdata 7ms iter 38ms (frame 252ms)|rdata 7ms iter 33ms (frame 249ms)
|subdiv_mesh_final_only:|Average: 6.387240 FPS|Average: 6.591251 FPS
||rdata 3ms iter 25ms (frame 151ms)|rdata 3ms iter 16ms (frame 145ms)
|subdiv_mesh_final_only_ledge:|Average: 6.247393 FPS|Average: 6.596024 FPS
||rdata 3ms iter 26ms (frame 158ms)|rdata 3ms iter 16ms (frame 148ms)

**Notes:**
- The improvement can only be noticed if all extracts are multithreaded.
- This patch touches different areas of the code, so it can be split into another patch if the idea is accepted.

These screenshots show how threads behave in a quadcore:
Master:
{F10164664}
Patch:
{F10164666}

Differential Revision: https://developer.blender.org/D11558
2021-06-11 10:45:12 -03:00
Jeroen Bakker
6e999e08ab T88352: Use threaded ibo.tris extraction for single material meshes.
This patch adds a specific extraction method when the mesh has only
one material. This method is multi-threaded.

There is a trade-off in this patch as the ibo isn't compressed (it adds
restart indexes for hidden faces). So it depends if threading is faster
than the additional GPU buffer upload.

# Subdivided cube
I used a cube subdivided 7 times, modifiers applied. that gives around 400000 faces.

The test is selecting some vertices and move them. During this test the next buffers are updated on each frame:
* vbo.pos_nor
* vbo.lnor
* vbo.edit_data
* ibo.tris
* ibo.points

System info:
|platform| Linux-5.11.0-7614-generic-x86_64-with-glibc2.33|
| renderer|      AMD SIENNA_CICHLID (DRM 3.40.0, 5.11.0-7614-generic, LLVM 11.0.1)|
|vendor|         AMD|
|version|        4.6 (Core Profile) Mesa 21.0.1|
|cpu|              Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz|
|compiler|     gcc version 10.3.0|

Timing have been measured using DEBUG_TIME in `draw_cache_extract_mesh`.

master: `rdata 8ms iter 45ms (frame 153ms)`
this patch `rdata 6ms iter 36ms (frame 132ms)`

Reviewed By: mano-wii

Maniphest Tasks: T88352

Differential Revision: https://developer.blender.org/D11290
2021-06-09 16:20:53 +02:00
Germano Cavalcante
e4c6da29b2 Draw Cache: use threading for Mesh extract lines
This is an optimization, but the difference is still not that
significant as some extractions are still done in single thread.

**Benchmarking**
||before:|after:
|---|---|---|
|large_mesh_editing:|Average: 14.246502 FPS|Average: 15.438118 FPS
||rdata 9ms iter 31ms (frame 69ms)|rdata 9ms iter 27ms (frame 65ms)
|large_mesh_editing_ledge: |Average: 14.913622 FPS|Average: 15.856538 FPS
||rdata 9ms iter 30ms (frame 67ms)|rdata 9ms iter 26ms (frame 63ms)
|looptris_test:|Average: 3.970774 FPS|Average: 4.095200 FPS
||rdata 11ms iter 90ms (frame 235ms)|rdata 12ms iter 87ms (frame 229ms)

Reviewed By: jbakker

Differential Revision: https://developer.blender.org/D11467
2021-06-09 08:58:08 -03:00
Jeroen Bakker
259b9c73d0 GPU: Thread safe index buffer builders.
Current index builder is designed to be used in a single thread.
This makes all index buffer extractions single threaded.
This patch adds a thread safe solution enabling multithreaded
building of index buffers.

To reduce locking the solution would provide a task/thread local
index buffer builder (called sub builder).
When a thread is finished this thread local index buffer builder
can be joined with the initial index buffer builder.

`GPU_indexbuf_subbuilder_init`: Initialized a sub builder. The
index list is shared between the parent and sub buffer, but the
counters are localized. Ensuring that updating counters would
not need any locking.

`GPU_indexbuf_subbuilder_finish`: merge the information of the
sub builder back to the parent builder. Needs to be invoked outside
the worker thread, or when sure that all worker threads have been
finished. Internal the function is not thread safe.

For testing purposes the extract_points extractor has been migrated to
the new API. Herefore changes to the mesh extractor were needed.

* When creating tasks, the task number of current task is stored in
  ExtractTaskData including the total number of tasks.
* Adding two functions  in `MeshExtract`.
** `task_init` will initialize the task specific userdata.
** `task_finish` should merge back the task specific userdata back.
* adding task_id parameter to the iteration functions so they can
  access the correct task data without any need for locking.

There is no noticeable change in end user performance.

Reviewed By: mano-wii

Differential Revision: https://developer.blender.org/D11499
2021-06-08 16:36:06 +02:00
5b014911a5 Revert "Cleanup: use cpp new/delete."
This reverts commit 43464c94f4.
2021-06-08 15:08:09 +02:00
23fd576cf8 Cleanup: replace NULL with nullptr. 2021-06-08 13:14:18 +02:00
43464c94f4 Cleanup: use cpp new/delete. 2021-06-08 13:12:49 +02:00
322a614497 Cleanup: replace typedef structs with structs. 2021-06-08 12:03:06 +02:00
340c535dbf Cleanup: Separate compile unit edituv. 2021-06-08 12:03:06 +02:00
088ea59b7e Cleanup: Separate compile unit lines_adjacency. 2021-06-08 12:03:06 +02:00
cac9828ae3 Cleanup: Separate compile unit lines_paint_mask. 2021-06-08 12:03:06 +02:00
9e9d45ae16 Cleanup: Separate fdots extraction in own compile unit. 2021-06-08 12:03:06 +02:00
4a9c5c60b7 Cleanup: Move extract lines to compile unit. 2021-06-07 16:57:21 +02:00
0e285fa23c Cleanup: Move extract tris in own compile unit. 2021-06-07 16:55:09 +02:00
3da0b52c97 Cleanup: compiler warnings signed/unsigned mismatch 2021-06-08 00:50:25 +10:00
2bf56f7fbb Fix: do not use threading for 'extract_points'
`extract_points` doesn't support multithreading yet.
2021-06-07 11:30:17 -03:00
e517aaa136 Cleanup: move extract points into own compile unit. 2021-06-07 13:27:38 +02:00