A major bottleneck of current implementation is the call to create_bindings() for basically every drawcalls.
This is due to the VAO being tagged dirty when assigning a new shader to the Batch, defeating the purpose of the Batch (reuse it for drawing).
Since managing hundreds of batches in DrawManager and DrawCache seems not fun enough to me, I prefered rewritting the batches itself.
--- Batch changes ---
For this to happen I needed to change the Instancing to be part of the Batch rather than being another batch supplied at drawtime.
The Gwn_VertBuffers are copied from the batch to be instanciated and a new Gwn_VertBuffer is supplied for instancing attribs.
This mean a VAO can be generated and cached for this instancing case.
A Batch can be rendered with instancing, without instancing attribs and without the need for a new VAO using the GWN_batch_draw_range_ex with the force_instance parameter set to true.
--- Draw manager changes ---
The downside with this approach is that we must track the validity of the instanced batch (the original one). For this the only way (I could think of) is to set a callback for when the batch is getting free.
This means a bit of refactor in the DrawManager with the separation of batching and instancing Batches.
--- VAO cache ---
Each VAO is generated for a given ShaderInterface. This means we can keep it alive as long as the shader interface lives.
If a ShaderInterface is discarded, it needs to destroy every VAO associated to it. Otherwise, a new ShaderInterface with the same adress could be generated and reuse the same VAO with incorrect bindings.
The VAO cache itself is using a mix between a static array of VAO and a dynamic array if the is not enough space in the static.
Using this hybrid approach is a bit more performant than the dynamic array alone.
The array will not resize down but empty entries will be filled up again. It's unlikely we get a buffer overflow from this. Resizing could be done on next allocation if needed.
--- Results ---
Using Cached VAOs means that we are not querying each vertex attrib for each vbo for each drawcall, every redraw!
In a CPU limited test scene (10000 cubes in Clay engine) I get a reduction of CPU drawing time from ~20ms to 13ms.
The only area that is not caching VAOs is the instancing from particles (see comment DRW_shgroup_instance_batch).
This removes the need of custom attribs for instancing.
Instancing works fully with dynamic batches & Gwn_VertFormat now.
This is in prevision of the VAO manager patch.
For simplicity we choose to execute the rendering of Opengl engines in the main thread and block the interface.
This might be addressed in the future at least for video rendering.
A drawmanager wrapper (DRW_render_to_image) is called by the render pipeline to set up the Opengl state and then call the specific draw_engine->render_to_image function.
Main idea is to make specific engine types be a subclass of generic
ObjectEngineData structure.
This required following changes:
- Have extra size argument to engine data allocation function.
Not sure whether there is less error-prone way of doing this.
- Add init() callback to engine data allocation function.
Additionally, added some extra checks to Eevee's engine data getters, so we do
not silently cast lamp data to lightprobe data.
Reviewers: dfelinto, fclem
Differential Revision: https://developer.blender.org/D3027
Both object level and camera datablock properties animation did not work with
copy on write enabled.
The root of the issue is going to the fact, that all interface elements are
referencing original datablock. For example, View3D has pointer to camera it's
using, and all areas which does access v3d->camera should in fact query for
the evaluated version of that camera, within the current context.
Annoying part of this change is that we now need to pass depsgraph in lots
of places. Which is rather annoying.
Alternative would be to cache evaluated camera in viewport itself, but then
it makes it annoying to keep things in sync.
Not sure if there is nicer solution here.
Reviewers: dfelinto, campbellbarton, mont29
Subscribers: dragoneex
Differential Revision: https://developer.blender.org/D3007
This modify the selection code quite a bit but it's for the better.
When using selection we use the same batching / instancing process but we draw each element at a time using a an offset to the first element we want to draw and by drawing only one element.
This result much less memory allocation and better draw time.
This allows a duplicator (as known as dupli parent) to be in a visible
collection so its duplicated objects are visible, however while being
invisible for the final render.
An object that is a particle emitter is also considered a duplicator.
Many thanks for the reviewers for the extense feedback.
Reviewers: sergey, campbellbarton
Differential Revision: https://developer.blender.org/D2966
Users can change the group collection visibility in the outliner
when looking at groups.
Regular collections on the other hand don't have any special visibility control,
if you need a collection to be invisible during render, either don't link it
into the view layer used for F12, or disable it.
This includes:
* Updated unittests - update your lib/tests/layers folder.
* Subversion bump - branches be aware of that.
Note:
Although we are using eval_ctx to determine the visibility of a group collection
when rendering, the depsgraph is still using the same depsgraph for the viewport
and the render engine, so at the moment the render visibility is ignored.
Following next is a workaround for this separately to tag the groups before and
after rendering to tackle that.
The RenderResult struct still has a listbase of RenderLayer, but that's ok
since this is strictly for rendering.
* Subversion bump (to 2.80.2)
* DNA low level doversion (renames) - only for .blend created since 2.80 started
Note: We can't use DNA_struct_elem_find or get file version in init_structDNA,
so we are manually iterating over the array of the SDNA elements instead.
Note 2: This doversion change with renames can be reverted in a few months. But
so far it's required for 2.8 files created between October 2016 and now.
Reviewers: campbellbarton, sergey
Differential Revision: https://developer.blender.org/D2927
This adds a custom depth test that have the benefits to glitch less and be more visually pleasing.
Downside is that it let the grid pass trough the objects a little.
This effect is done in NDC space so that it counteract the logarithmic depth distribution imprecision (read as it's less visible near the camera but more present far away).
This patch also includes some cleanups.
This required some small changes to the data display shaders so that they match the way the object mode renders them.
Strangely enough, I had to remove the normal attribute from the display code because it was being not bound as soon as I created another rendering call in object mode. The problem may be deeper but I did not have time for this so I derive the normal from the sphere pos.
You can change the amount of samples in the user preferences. You do not need to restart blender to see the effect in the new viewport.
This adds another Multisample Framebuffer and textures (so even more memory required).
It works by blitting the default_fb to the multisample_fb each time the renderer need to render one or more "wire" pass.
It it then blit back to the default_fb so that the rest of pipeline is working as expected.
We COULD lower the GPU memory / bandwidth usage to render everything to the same multisample fbo and change the logic depending on if MSAA is enabled or not, but I think it's a bit too much work for now.
Adds a FXAA for smoothing out the extracted outlines.
The Post Process Anti Aliasing is only done on the Alpha channel of the outlines.
Because of that we need to add bleed the outline color out of the silouhette so the AA'd alpha can blend the right color and not pick black when the alpha is smoothed out of the silhouette.
Also because of the AA needs to have clear contrast to work with, I decided to ditch the "bluring" or the occluded outlines.
The FXAA adds an overhead of 0.17ms but we gain back 0.22ms * 4 = 0.88ms by removing the blur.
The FXAA Implementation is from Corey Richardson (cmr) (D2717). I had to modify it a bit to only filter the alpha channel.
Iterate over invisible objects too, so lamps can still lit the scene.
Also, now you can use a collection to set an object to invisible, not
only to visible.
For example:
Scene > Master collection > bedroom > furniture
Scene > View Layer > bedroom (visible)
> furniture (invisible)
The View Layer has two linked collections, bedroom and furniture.
This setup will make the furniture collection invisible.
Note: Unlike what was suggested on D2849, this does not make collection
visibility influence camera visibility. I will keep this as a separate
patch.
Reviewers: sergey
Subscribers: sergey, brecht, fclem
Differential Revision: https://developer.blender.org/D2849
This is in order to use the same texture on multiple sampler.
Also texture counter is reset after each shading group. This mimics the previous behaviour.
We do have an history of those pieces of evil in our code, would be nice
to get fully rid of it, but at the very least let's not add more of them
in new code. :)
Group texture fetches to hide latency. 3.2ms -> 2.2ms (constant time improvement, not depending on scene complexity)
Could optimize further with textureGather (require OpenGL 4.0).
A pointer to the uniform data for the empty drawing was being freed
before the actual draw call, which invalidates the uniform.
This makes the data only be freed after drawing.