The depsgraph was always created within a fixed evaluation context. Passing
both risks the depsgraph and evaluation context not matching, and it
complicates the Python API where we'd have to expose both which is not so
easy to understand.
This also removes the global evaluation context in main, which assumed there
to be a single active scene and view layer.
Differential Revision: https://developer.blender.org/D3152
This adds initial multi-object editing support.
- Selected objects are used when entering edit & pose modes.
- Selection & tools work on all objects however many tools need porting
See: T54641 for remaining tasks.
Indentation will be done separately.
See patch: D3101
This was only used for viewport rendering, where we can just pass the engine
type directly. There is no technical reason why we can't draw the same depsgrpah
with different render engines.
It also led to some weird things like requiring a render engine for snapping
and raycast API functions.
Differential Revision: https://developer.blender.org/D3145
This refactor modernise the use of framebuffers.
It also touches a lot of files so breaking down changes we have:
- GPUTexture: Allow textures to be attached to more than one GPUFrameBuffer.
This allows to create and configure more FBO without the need to attach
and detach texture at drawing time.
- GPUFrameBuffer: The wrapper starts to mimic opengl a bit closer. This
allows to configure the framebuffer inside a context other than the one
that will be rendering the framebuffer. We do the actual configuration
when binding the FBO. We also Keep track of config validity and save
drawbuffers state in the FBO. We remove the different bind/unbind
functions. These make little sense now that we have separate contexts.
- DRWFrameBuffer: We replace DRW_framebuffer functions by GPU_framebuffer
ones to avoid another layer of abstraction. We move the DRW convenience
functions to GPUFramebuffer instead and even add new ones. The MACRO
GPU_framebuffer_ensure_config is pretty much all you need to create and
config a GPUFramebuffer.
- DRWTexture: Due to the removal of DRWFrameBuffer, we needed to create
functions to create textures for thoses framebuffers. Pool textures are
now using default texture parameters for the texture type asked.
- DRWManager: Make sure no framebuffer object is bound when doing cache
filling.
- GPUViewport: Add new color_only_fb and depth_only_fb along with FB API
usage update. This let draw engines render to color/depth only target
and without the need to attach/detach textures.
- WM_window: Assert when a framebuffer is bound when changing context.
This balance the fact we are not track ogl context inside GPUFramebuffer.
- Eevee, Clay, Mode engines: Update to new API. This comes with a lot of
code simplification.
This also come with some cleanups in some engine codes.
Previous approach was not clear enough and caused problems.
UBOs were taking slots and not release them after a shading group even
if this UBO was only for this Shading Group (notably the nodetree ubo,
since we now share the same GPUShader for identical trees).
So I choose to have a better defined approach:
- Standard texture and ubo calls are assured to be valid for the shgrp
they are called from.
- (new) Persistent texture and ubo calls are assured to be valid accross
shgrps unless the shader changes.
The standards calls are still valids for the next shgrp but are not assured
to be so if this new shgrp binds a new texture.
This enables some optimisations by not adding redundant texture and ubo
binds.
This leads to less lookups to the GWNShaderInterface and less uniform upload.
We still keep a legacy path so that Builtin uniforms can still work. We might restrict this path to Builtin shader only in the future.
- put render iterator in own scope
(would shadow it's own variable if used multiple times).
- enforce semicolon at end of iterator macros.
- no need to typedef one-off macro structs.
Also get rid of the static var and initialization.
This enables the user to see the progress on the info header.
Closing blender or reading a file also kill the job which is good.
Unfortunatly, this job cannot be interrupt by users directly. We could make it interruptible but we need a way to resume the compilation.
Selection code relies on being able to set the depth functions
however passes have their own depth settings.
Add DRW_state_lock to ignore passes settings for particular flags.
This fixes occlusion queries cycling through objects under the cursor.
By default select wasn't picking the nearest object,
this could have been fixed by not clearing the depth buffer,
but calling GPU_select_(begin/end) without the binded frame-buffer
caused issues for depth-picking. So move GPU_select begin/end to a
callback.
This also has the advantage that only needs to populate the engines once
to draw two passes.
Note that cycling through objects fails with occlusion queries still,
will fix shortly.
This is very efficient and add a pretty low overhead (0.1ms of drawing time for 10K objects passing through all tests, on my i3-4100M).
The like the rest of the DRWCallState, test is "cached" until the view matrices changes.
Refactor include:
- Removal of DRWInterface. (was useless)
- Split DRWCallHeader into a new struct DRWCallState that will be reused in the future.
- Use BLI_link_utils for APPEND/PREPEND.
- Creation of the new DRWManager struct type. This will enable us to create more than one manager in the future.
- Removal of some dead code.
User notes
----------
Compositing, rendering of multi-layers in Eevee should be fully working now.
Development notes
-----------------
Up until now we were still using the same depsgraph for rendering and viewport
evaluation. And we had to go out of our ways to be sure the depsgraphs were
updated.
Now we iterate over the (to be rendered) view layers and create a depsgraph to
each one, fully evaluated and call the render engines (Cycles, Eevee, ...) with
this viewlayer/depsgraph/evaluation context.
At this time we are not handling data persistency, Depsgraph is created from
scratch prior to rendering each frame. So I got rid of most of the partial
update calls we had during the render pipeline.
Cycles: Brecht Van Lommel did a patch to tackle some of the required Cycles
changes but this commit mark these changes as TODOs. Basically Cycles needs to
render one layer at a time.
Reviewers: sergey, brecht
Differential Revision: https://developer.blender.org/D3073
-Make the view and object dependant matrices calculation isolated and separated, avoiding non-needed calculation.
-Adding a per drawcall matrix cache so that we can precompute these in advance in the future.
-Replaced integer uniform location of only view dependant builtins by DRWUniforms that are only updated once per shgroup.
Depth picking needs to read the depth buffer after drawing
since GPU_select_end runs in a different OpenGL context
reading the depth buffer wasn't working.
This caused the last object to be unelectable.
This separate context allows two things:
- It allows viewports in multi-windows configuration.
- F12 render can use this context in a separate thread and do a non-blocking render.
The downside is that the context cannot be used while rendering so a request to refresh a viewport will lock the UI. This is something that will be adressed in the future.
Under the hood what does that mean:
- Not adding more mess with VAOs management in gawain.
- Doing depth only draw for operators / selection needs to be done in an offscreen buffer.
- The 3D cursor "autodis" operator is still reading the backbuffer so we need to copy the result to it.
- All FBOs needed by the drawmanager must to be created/destroyed with its context active.
- We cannot use batches created for UI in the DRW context and vice-versa. There is a clear separation of resources that enables the use of safe multi-threading.
Turns out to be the call that was destroying performance.
I get 18ms->6ms improvement of drawing time with 10 000 unique objects.
And we can still improve upon this!
Technically the original issue is that xof/yof in render result is calculated
for drawing border render. So a simpler patch could be:
```
- rr->xof = re->disprect.xmin;
+ rr->xof = re->disprect.xmin + BLI_rcti_cent_x(&re->disprect) - (re->winx / 2);
```
However everywhere in the code we are getting border directly from re->disprect
which we may as well do here too.
Besides I'm taking this as a chance to get rid of RenderResult in the internal
loop of eevee, to help prepare the code to the upcoming rendering pipeline
changes.
A major bottleneck of current implementation is the call to create_bindings() for basically every drawcalls.
This is due to the VAO being tagged dirty when assigning a new shader to the Batch, defeating the purpose of the Batch (reuse it for drawing).
Since managing hundreds of batches in DrawManager and DrawCache seems not fun enough to me, I prefered rewritting the batches itself.
--- Batch changes ---
For this to happen I needed to change the Instancing to be part of the Batch rather than being another batch supplied at drawtime.
The Gwn_VertBuffers are copied from the batch to be instanciated and a new Gwn_VertBuffer is supplied for instancing attribs.
This mean a VAO can be generated and cached for this instancing case.
A Batch can be rendered with instancing, without instancing attribs and without the need for a new VAO using the GWN_batch_draw_range_ex with the force_instance parameter set to true.
--- Draw manager changes ---
The downside with this approach is that we must track the validity of the instanced batch (the original one). For this the only way (I could think of) is to set a callback for when the batch is getting free.
This means a bit of refactor in the DrawManager with the separation of batching and instancing Batches.
--- VAO cache ---
Each VAO is generated for a given ShaderInterface. This means we can keep it alive as long as the shader interface lives.
If a ShaderInterface is discarded, it needs to destroy every VAO associated to it. Otherwise, a new ShaderInterface with the same adress could be generated and reuse the same VAO with incorrect bindings.
The VAO cache itself is using a mix between a static array of VAO and a dynamic array if the is not enough space in the static.
Using this hybrid approach is a bit more performant than the dynamic array alone.
The array will not resize down but empty entries will be filled up again. It's unlikely we get a buffer overflow from this. Resizing could be done on next allocation if needed.
--- Results ---
Using Cached VAOs means that we are not querying each vertex attrib for each vbo for each drawcall, every redraw!
In a CPU limited test scene (10000 cubes in Clay engine) I get a reduction of CPU drawing time from ~20ms to 13ms.
The only area that is not caching VAOs is the instancing from particles (see comment DRW_shgroup_instance_batch).