This refactor includes:
- Removal of DRWInterface (it was useless).
- Split DRWCallHeader into a new struct DRWCallState that will be reused in the future.
- Use BLI_link_utils for APPEND/PREPEND.
- Creation of the new DRWManager struct type. This will enable us to create more than one manager in the future.
- Removal of some dead code.
User notes
----------
Compositing and rendering of multiple view layers in Eevee should be fully working now.
Development notes
-----------------
Up until now we were still using the same depsgraph for rendering and viewport
evaluation, and we had to go out of our way to be sure the depsgraphs were
updated.
Now we iterate over the (to be rendered) view layers and create a depsgraph for
each one, fully evaluated, and call the render engines (Cycles, Eevee, ...) with
this view layer / depsgraph / evaluation context.
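A minimal sketch of that loop (render_view_layer() is a hypothetical stand-in
for the actual engine entry points):
```
/* Minimal sketch of the per-view-layer loop; render_view_layer() is a
 * hypothetical stand-in for the actual engine entry points. */
for (ViewLayer *view_layer = scene->view_layers.first;
     view_layer != NULL;
     view_layer = view_layer->next)
{
	if ((view_layer->flag & VIEW_LAYER_RENDER) == 0) {
		continue;
	}
	/* Depsgraph created from scratch and fully evaluated for this layer. */
	Depsgraph *depsgraph = BKE_scene_get_depsgraph(scene, view_layer, true);
	render_view_layer(engine, scene, view_layer, depsgraph);
}
```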
At this time we are not handling data persistence; the depsgraph is created
from scratch prior to rendering each frame. So I got rid of most of the partial
update calls we had during the render pipeline.
Cycles: Brecht Van Lommel did a patch to tackle some of the required Cycles
changes, but this commit marks these changes as TODOs. Basically, Cycles needs
to render one layer at a time.
Reviewers: sergey, brecht
Differential Revision: https://developer.blender.org/D3073
- Make the view-dependent and object-dependent matrix calculations isolated and separate, avoiding unneeded computation.
- Add a per-drawcall matrix cache so that we can precompute these in advance in the future (see the sketch below).
- Replace the integer uniform locations of view-only builtins with DRWUniforms that are updated only once per shgroup.
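A hedged sketch of such a per-drawcall cache; the real DRWCallState layout
differs, and the version tag is only illustrative:
```
/* Illustrative per-drawcall matrix cache; the real DRWCallState differs. */
typedef struct DRWCallState {
	float model[4][4];            /* object matrix, set once per call */
	float modelviewproj[4][4];    /* cached result, viewproj * model */
	unsigned short cache_version; /* matches the view's version when valid */
} DRWCallState;

static void drw_call_matrix_ensure(DRWCallState *st,
                                   float viewproj[4][4],
                                   unsigned short view_version)
{
	/* Only recompute when the view matrices changed. */
	if (st->cache_version != view_version) {
		mul_m4_m4m4(st->modelviewproj, viewproj, st->model);
		st->cache_version = view_version;
	}
}
```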
Depth picking needs to read the depth buffer right after drawing. Since
GPU_select_end runs in a different OpenGL context, reading the depth buffer
there wasn't working. This caused the last object to be unselectable.
This separate context allows two things:
- It allows viewports in multi-window configurations.
- F12 render can use this context in a separate thread and do a non-blocking render.
The downside is that the context cannot be used while rendering, so a request to refresh a viewport will lock the UI. This is something that will be addressed in the future.
Under the hood what does that mean:
- Not adding more mess to VAO management in Gawain.
- Depth-only draws for operators / selection need to be done in an offscreen buffer.
- The 3D cursor "autodist" operator still reads the backbuffer, so we need to copy the result to it.
- All FBOs needed by the drawmanager must be created/destroyed with its context active.
- We cannot use batches created for the UI in the DRW context and vice versa. There is a clear separation of resources that enables safe multi-threading.
Turns out this was the call that was destroying performance.
I get an 18ms -> 6ms improvement in drawing time with 10,000 unique objects.
And we can still improve upon this!
Technically the original issue is that xof/yof in the render result are
calculated for drawing border renders. So a simpler patch could be:
```
- rr->xof = re->disprect.xmin;
+ rr->xof = re->disprect.xmin + BLI_rcti_cent_x(&re->disprect) - (re->winx / 2);
```
However, everywhere else in the code we get the border directly from
re->disprect, so we may as well do that here too.
Besides, I'm taking this as a chance to get rid of RenderResult in the internal
loop of Eevee, to help prepare the code for the upcoming rendering pipeline
changes.
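For reference, a sketch of taking the border straight from re->disprect, as
the rest of the code does (variable names are illustrative):
```
/* Sketch: take the border straight from re->disprect (names illustrative). */
rcti rect = re->disprect;
int width  = BLI_rcti_size_x(&rect);
int height = BLI_rcti_size_y(&rect);
```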
A major bottleneck of the current implementation is the call to create_bindings() for basically every drawcall.
This is due to the VAO being tagged dirty when assigning a new shader to the Batch, defeating the purpose of the Batch (reusing it for drawing).
Since managing hundreds of batches in DrawManager and DrawCache did not seem fun enough to me, I preferred rewriting the batches themselves.
--- Batch changes ---
For this to happen I needed to make instancing part of the Batch rather than another batch supplied at draw time.
The Gwn_VertBuffers are copied from the batch to be instanced, and a new Gwn_VertBuffer is supplied for the instancing attribs.
This means a VAO can be generated and cached for this instancing case.
A Batch can be rendered with instancing, without instancing attribs, and without the need for a new VAO, using GWN_batch_draw_range_ex with the force_instance parameter set to true.
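A hedged usage sketch, assuming a signature along these lines (the exact
Gawain prototype may differ):
```
/* Assumed prototype; the exact Gawain signature may differ.
 * force_instance = true draws the batch as instanced geometry even
 * without an instancing attrib buffer, reusing the cached VAO. */
void GWN_batch_draw_range_ex(Gwn_Batch *batch, int v_first, int v_count, bool force_instance);

/* e.g. draw a single instance of the batch: */
GWN_batch_draw_range_ex(batch, 0, 1, true);
```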
--- Draw manager changes ---
The downside of this approach is that we must track the validity of the instanced batch (the original one). The only way I could think of is to set a callback for when the batch is freed.
This means a bit of refactoring in the DrawManager, separating batching and instancing Batches.
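A sketch of the free-callback idea; GWN_batch_callback_free_set() and the
DRWInstancingBatch struct are assumptions here:
```
/* Assumed names: GWN_batch_callback_free_set() and DRWInstancingBatch. */
static void instancing_batch_free_cb(Gwn_Batch *original_batch, void *user_data)
{
	DRWInstancingBatch *ibatch = user_data;
	/* The original batch is gone: invalidate the instancing copy so it
	 * gets rebuilt on next use instead of referencing freed buffers. */
	ibatch->original = NULL;
}

/* At instancing-batch creation time: */
GWN_batch_callback_free_set(original_batch, instancing_batch_free_cb, ibatch);
```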
--- VAO cache ---
Each VAO is generated for a given ShaderInterface. This means we can keep it alive as long as the shader interface lives.
If a ShaderInterface is discarded, it needs to destroy every VAO associated with it. Otherwise, a new ShaderInterface allocated at the same address could reuse a VAO with incorrect bindings.
The VAO cache itself uses a mix of a static array of VAOs and a dynamic array for when there is not enough space in the static one.
This hybrid approach is a bit more performant than the dynamic array alone.
The array never resizes down, but empty entries are filled up again. It's unlikely we get a buffer overflow from this; resizing could be done on the next allocation if needed.
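An illustrative layout of that hybrid cache (the real Gawain struct differs);
each slot maps a Gwn_ShaderInterface pointer to its VAO:
```
#define VAO_STATIC_LEN 3 /* illustrative; tune to the common case */

typedef struct VaoCache {
	/* Static slots, checked first; cover most batches without an alloc. */
	const Gwn_ShaderInterface *static_interfaces[VAO_STATIC_LEN];
	unsigned int static_vaos[VAO_STATIC_LEN];
	/* Dynamic overflow, only used once the static slots are full.
	 * Never resized down; emptied entries are reused. */
	int dynamic_len;
	const Gwn_ShaderInterface **dynamic_interfaces;
	unsigned int *dynamic_vaos;
} VaoCache;
```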
--- Results ---
Using cached VAOs means we are no longer querying each vertex attrib of each VBO for each drawcall, every redraw!
In a CPU-limited test scene (10,000 cubes in the Clay engine) I get a reduction of CPU drawing time from ~20ms to 13ms.
The only area not caching VAOs is the instancing from particles (see the comment in DRW_shgroup_instance_batch).
We need to move the render result logic outside the render engine code.
It makes no sense for Eevee/Clay/... to have to re-implement the render result
creation logic. Besides, the original implementation really got it wrong by
ignoring the different render layers needed for the final render.
Finally, there is no need to re-create the logic for views, so this was also
fixed.
Note 1: This will still break if the depsgraph of the needed view layers is not
updated/created. We need to address this separately. For now, if users want
to test this, just show each view layer in the viewport at least once.
Note 2: We are still getting the depsgraph from the scene, creating it if
needed: `BKE_scene_get_depsgraph(scene, view_layer, true);`. According to Sergey
we need to move the render depsgraph to the Render struct instead. I will do
that separately as well.
This removes the need for custom attribs for instancing.
Instancing now works fully with dynamic batches & Gwn_VertFormat.
This is in preparation for the VAO manager patch.
This manager distributes existing batches for instancing
attributes, which reduces the number of batch creations.
Querying a batch is done with a vertex format. This format should
be static so that its pointer never changes, because we use
this pointer as an identifier (we don't want to check the full
format; that would be too slow).
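A sketch of the pointer-as-key lookup (the manager and function names are
hypothetical):
```
/* The format must live in static storage so its address is stable. */
static Gwn_VertFormat g_instancing_format = {0};

Gwn_Batch *instancing_batch_request(BatchManager *man, const Gwn_VertFormat *format)
{
	/* Compare by pointer only; checking the full format would be too slow. */
	for (int i = 0; i < man->len; i++) {
		if (man->formats[i] == format) {
			return man->batches[i];
		}
	}
	return instancing_batch_create(man, format); /* hypothetical */
}
```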
This might make the original Instance Data manager useless, but it's currently used by DRW_object_engine_data_ensure().
These batches keep their memory chunk allocated after transfer so they can be reused and updated very often.
NOTE: This commit breaks instancing in DRW (it's fixed in the next commit).
This was caused by duplis' ObjectEngineData that were not freed.
This allocates the data using the instance data manager (no alloc/free between frames), though the data should be treated as non-persistent in this case.
For simplicity we chose to execute the rendering of OpenGL engines in the main thread and block the interface.
This might be addressed in the future, at least for video rendering.
A drawmanager wrapper (DRW_render_to_image) is called by the render pipeline to set up the OpenGL state and then call the specific draw_engine->render_to_image function.
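The rough shape of that wrapper, with a simplified signature (the real
function takes more arguments and does more state setup):
```
/* Simplified sketch of DRW_render_to_image; not the exact signature. */
void DRW_render_to_image(RenderEngine *engine, struct Depsgraph *depsgraph)
{
	DrawEngineType *draw_engine = engine->type->draw_engine; /* assumption */

	/* Main thread only: take over the drawmanager's OpenGL context,
	 * blocking the interface until the render finishes. */
	DRW_opengl_context_enable();

	draw_engine->render_to_image(engine, depsgraph); /* assumed callback args */

	DRW_opengl_context_disable();
}
```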
General idea of the fix: skip the whole draw manager callback madness which
was used to tag objects' engine-specific data as dirty. Use a generic recalc
flag in the ObjectEngineData structure instead (see the sketch after this
list). This gives us the following benefits:
- Solves the mentioned bug report.
- Avoids a whole interface lookup over opened viewports for EVERY changed ID.
- Fixes missing updates when a viewport is temporarily invisible.
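A sketch of the recalc-flag approach (the tagging function name and flag
semantics are assumptions):
```
/* On depsgraph update, tag every engine's data on the object
 * (function name and flag semantics are assumptions). */
void drw_object_engine_data_tag_recalc(Object *ob)
{
	for (ObjectEngineData *oed = ob->drawdata.first; oed; oed = oed->next) {
		oed->recalc = true;
	}
}

/* Each engine checks and clears the flag while drawing, replacing the
 * per-viewport callback lookups. */
```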
Reviewers: dfelinto, fclem
Differential Revision: https://developer.blender.org/D3028
The main idea is to make specific engine types subclasses of the generic
ObjectEngineData structure.
This required the following changes:
- Have an extra size argument to the engine data allocation function.
  Not sure whether there is a less error-prone way of doing this.
- Add an init() callback to the engine data allocation function.
Additionally, added some extra checks to Eevee's engine data getters, so we do
not silently cast lamp data to lightprobe data.
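A hedged sketch of the subclassing pattern; the allocation prototype here is
illustrative, not the exact DRW_object_engine_data_ensure signature:
```
/* Engine-specific data embeds the generic struct as its FIRST member,
 * so a pointer to the subclass is also a valid ObjectEngineData pointer. */
typedef struct EEVEE_LampEngineData {
	ObjectEngineData engine_data; /* must be first */
	/* ... lamp-specific members ... */
} EEVEE_LampEngineData;

typedef void (*ObjectEngineDataInitCb)(void *storage);

/* Illustrative allocation prototype: size of the full subclass plus an
 * init() callback run once on creation. */
ObjectEngineData *object_engine_data_ensure(Object *ob,
                                            DrawEngineType *engine_type,
                                            size_t size,
                                            ObjectEngineDataInitCb init);
```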
Reviewers: dfelinto, fclem
Differential Revision: https://developer.blender.org/D3027
`gpu_texture_try_alloc` invalidates zero-sized textures.
The console message is incorrect in this case, because the failure is not due to a lack of memory.
This optimisation only works if no material in the scene requires the AO pass.
For this, either set the AO distance to 0, or set both the Cavity and Edges factors to 0.
This doubles the performance of scenes with very high triangle counts.
This is because certain parts of the engine may require a blank framebuffer to bind textures to.
This is the case when using only array textures, which are unsupported by DRW_framebuffer_init().