BF-admins agree to remove header information that isn't useful,
to reduce noise.
- BEGIN/END license blocks
Developers should add non license comments as separate comment blocks.
No need for separator text.
- Contributors
This is often invalid, outdated or misleading
especially when splitting files.
It's more useful to git-blame to find out who has developed the code.
See P901 for script to perform these edits.
This mean you can store data used for drawing inside the object engine
data.
Also fixes T55243 Crash in ASAN debug builds due to use-after-free memory in draw code - instances issue?
We now alloc a vbo id on creation and let OpenGL manage its memory directly.
We use glMapBuffer to get this memory location.
This enables us to reuse and modify any vertex buffer directly without
destroying it with its associated Batches.
This commit does not really improve performance but will let us implement
more optimizations in the future.
We can also resize the buffer even if this can be slow if we need to keep
the existing data.
The addition of the usage hint makes dynamic buffers not a special case
anymore, simplifying things a bit.
A major bottleneck of current implementation is the call to create_bindings() for basically every drawcalls.
This is due to the VAO being tagged dirty when assigning a new shader to the Batch, defeating the purpose of the Batch (reuse it for drawing).
Since managing hundreds of batches in DrawManager and DrawCache seems not fun enough to me, I prefered rewritting the batches itself.
--- Batch changes ---
For this to happen I needed to change the Instancing to be part of the Batch rather than being another batch supplied at drawtime.
The Gwn_VertBuffers are copied from the batch to be instanciated and a new Gwn_VertBuffer is supplied for instancing attribs.
This mean a VAO can be generated and cached for this instancing case.
A Batch can be rendered with instancing, without instancing attribs and without the need for a new VAO using the GWN_batch_draw_range_ex with the force_instance parameter set to true.
--- Draw manager changes ---
The downside with this approach is that we must track the validity of the instanced batch (the original one). For this the only way (I could think of) is to set a callback for when the batch is getting free.
This means a bit of refactor in the DrawManager with the separation of batching and instancing Batches.
--- VAO cache ---
Each VAO is generated for a given ShaderInterface. This means we can keep it alive as long as the shader interface lives.
If a ShaderInterface is discarded, it needs to destroy every VAO associated to it. Otherwise, a new ShaderInterface with the same adress could be generated and reuse the same VAO with incorrect bindings.
The VAO cache itself is using a mix between a static array of VAO and a dynamic array if the is not enough space in the static.
Using this hybrid approach is a bit more performant than the dynamic array alone.
The array will not resize down but empty entries will be filled up again. It's unlikely we get a buffer overflow from this. Resizing could be done on next allocation if needed.
--- Results ---
Using Cached VAOs means that we are not querying each vertex attrib for each vbo for each drawcall, every redraw!
In a CPU limited test scene (10000 cubes in Clay engine) I get a reduction of CPU drawing time from ~20ms to 13ms.
The only area that is not caching VAOs is the instancing from particles (see comment DRW_shgroup_instance_batch).
This manager allows to distribute existing batches for instancing
attributes. This reduce the number of batches creation.
Querying a batch is done with a vertex format. This format should
be static so that it's pointer never changes (because we are using
this pointer as identifier [we don't want to check the full format
that would be too slow]).
This might make the original Instance Data manager useless but it's currently used by DRW_object_engine_data_ensure().
This is a special memory manager that keeps memory blocks ready to send as vbo data.
Since we loose which memory block was used each DRWShadingGroup we need to redistribute them in the same order/size to avoid to realloc each frame.
This is why DRWInstanceDatas are sorted in a list for each different data size.