This is a usefull feature that can be used to do a lot of precomputation on
the GPU instead of the CPU.
Implementation is simple and only covers the most usefull case.
How to use:
- Create shader with transform feedback.
- Create a pass with DRW_STATE_TRANS_FEEDBACK.
- Create a target Gwn_VertBuf (make sure it's big enough).
- Create a shading group with DRW_shgroup_transform_feedback_create().
- Add your draw calls to the shading group.
- Render your pass normaly.
Current limitation:
- Only one output buffer.
- Cannot pause/resume tfb rendering to interleave with normal drawcalls.
- Cannot get the number of verts drawn.
Initial review of the shard shadows in the workbench engine.
Speed optimizations like transform feedback are not implemented yet. I first want this part to be reviewed and merged.
@fclem please check the note in drw_stencil_set it was holding back nequal == 0 as by default DST.stencil_mask was set to 0. questioin is should we remove the whole check or not.
Also I am still looking for a better name (or split the enum) for DRW_STATE_STENCIL_DEPTH_FAIL_INCR_DECR_WRAP
Reviewers: fclem
Reviewed By: fclem
Tags: #code_quest
Differential Revision: https://developer.blender.org/D3198
This add a callback function that runs after frustum culling test.
This callback returns the final visibility for this object.
Be aware that it's called for EVERY drawcalls that use this callback even
if their visibility has been cached.
This uniforms can be used to have a unique id for each drawcall of a shgrp.
This only works for standard shgroups and is an exception for the outline
drawing.
Blender allows this.
The Cube in the file in the report would always disappear with the non camera view.
The clip_end was too small.
The correction here is only on the assert.
Although somewhat less micro efficient, I decided to separate the `viewinv` matrix to calculate the world position separately.
This makes it easier to understand the code.
The problem was that textures were assigned to different slots on different draw
calls, which caused shader specialization/patching by the driver. So the shader
would be compiled over and over until all possible assignments were used.
Previous approach was not clear enough and caused problems.
UBOs were taking slots and not release them after a shading group even
if this UBO was only for this Shading Group (notably the nodetree ubo,
since we now share the same GPUShader for identical trees).
So I choose to have a better defined approach:
- Standard texture and ubo calls are assured to be valid for the shgrp
they are called from.
- (new) Persistent texture and ubo calls are assured to be valid accross
shgrps unless the shader changes.
The standards calls are still valids for the next shgrp but are not assured
to be so if this new shgrp binds a new texture.
This enables some optimisations by not adding redundant texture and ubo
binds.
This leads to less lookups to the GWNShaderInterface and less uniform upload.
We still keep a legacy path so that Builtin uniforms can still work. We might restrict this path to Builtin shader only in the future.
The idea is to separate the most common case from symmetrical frustum. And to make a simple but efficient calculation.
The new radius is usually 98% the size of the radius size of the asymmetric solution.
Thanks to @fclem for reviewing the patch on IRC
Instead of creating a new instancing shading group without attrib, we now have instancing calls. The benefits is that they can be culled.
They can be used in conjuction with the standard and generate calls but shader must support it (which is generally not the case).
We store a pointer to the actual count so that the number can be tweaked between redraw.
This will makes multi layer rendering more efficient.
Selection code relies on being able to set the depth functions
however passes have their own depth settings.
Add DRW_state_lock to ignore passes settings for particular flags.
This fixes occlusion queries cycling through objects under the cursor.
This is very efficient and add a pretty low overhead (0.1ms of drawing time for 10K objects passing through all tests, on my i3-4100M).
The like the rest of the DRWCallState, test is "cached" until the view matrices changes.