This changes quite a few things.
- Outline is now per object.
- No more outline at object intersection (fix hairs problem).
- Simplify the code quite a bit.
We use a R16UI buffer to save one id per object outline. We convert this id
to color when detecting the outline.
Added textureGatherOffsets to the code but could not test on current hardware
so leaving it commented for now.
The way how particle state is to be accessed or used did not change
in Blender 2.8, so the drawing code should follow old design.
This code is somewhat duplicated from drawobject.c, but old draw
code is on the way to be removed anyway.
This fixes issue with disappearing particles when tweaking number
of particles.
This refactor modernise the use of framebuffers.
It also touches a lot of files so breaking down changes we have:
- GPUTexture: Allow textures to be attached to more than one GPUFrameBuffer.
This allows to create and configure more FBO without the need to attach
and detach texture at drawing time.
- GPUFrameBuffer: The wrapper starts to mimic opengl a bit closer. This
allows to configure the framebuffer inside a context other than the one
that will be rendering the framebuffer. We do the actual configuration
when binding the FBO. We also Keep track of config validity and save
drawbuffers state in the FBO. We remove the different bind/unbind
functions. These make little sense now that we have separate contexts.
- DRWFrameBuffer: We replace DRW_framebuffer functions by GPU_framebuffer
ones to avoid another layer of abstraction. We move the DRW convenience
functions to GPUFramebuffer instead and even add new ones. The MACRO
GPU_framebuffer_ensure_config is pretty much all you need to create and
config a GPUFramebuffer.
- DRWTexture: Due to the removal of DRWFrameBuffer, we needed to create
functions to create textures for thoses framebuffers. Pool textures are
now using default texture parameters for the texture type asked.
- DRWManager: Make sure no framebuffer object is bound when doing cache
filling.
- GPUViewport: Add new color_only_fb and depth_only_fb along with FB API
usage update. This let draw engines render to color/depth only target
and without the need to attach/detach textures.
- WM_window: Assert when a framebuffer is bound when changing context.
This balance the fact we are not track ogl context inside GPUFramebuffer.
- Eevee, Clay, Mode engines: Update to new API. This comes with a lot of
code simplification.
This also come with some cleanups in some engine codes.
This module has no use now with the new DrawManager and DrawEngines and it
is using deprecated paths.
Moving gpu_shader_fullscreen_vert.glsl
to draw/modes/shaders/common_fullscreen_vert.glsl
Instead of creating a new instancing shading group without attrib, we now have instancing calls. The benefits is that they can be culled.
They can be used in conjuction with the standard and generate calls but shader must support it (which is generally not the case).
We store a pointer to the actual count so that the number can be tweaked between redraw.
This will makes multi layer rendering more efficient.
A major bottleneck of current implementation is the call to create_bindings() for basically every drawcalls.
This is due to the VAO being tagged dirty when assigning a new shader to the Batch, defeating the purpose of the Batch (reuse it for drawing).
Since managing hundreds of batches in DrawManager and DrawCache seems not fun enough to me, I prefered rewritting the batches itself.
--- Batch changes ---
For this to happen I needed to change the Instancing to be part of the Batch rather than being another batch supplied at drawtime.
The Gwn_VertBuffers are copied from the batch to be instanciated and a new Gwn_VertBuffer is supplied for instancing attribs.
This mean a VAO can be generated and cached for this instancing case.
A Batch can be rendered with instancing, without instancing attribs and without the need for a new VAO using the GWN_batch_draw_range_ex with the force_instance parameter set to true.
--- Draw manager changes ---
The downside with this approach is that we must track the validity of the instanced batch (the original one). For this the only way (I could think of) is to set a callback for when the batch is getting free.
This means a bit of refactor in the DrawManager with the separation of batching and instancing Batches.
--- VAO cache ---
Each VAO is generated for a given ShaderInterface. This means we can keep it alive as long as the shader interface lives.
If a ShaderInterface is discarded, it needs to destroy every VAO associated to it. Otherwise, a new ShaderInterface with the same adress could be generated and reuse the same VAO with incorrect bindings.
The VAO cache itself is using a mix between a static array of VAO and a dynamic array if the is not enough space in the static.
Using this hybrid approach is a bit more performant than the dynamic array alone.
The array will not resize down but empty entries will be filled up again. It's unlikely we get a buffer overflow from this. Resizing could be done on next allocation if needed.
--- Results ---
Using Cached VAOs means that we are not querying each vertex attrib for each vbo for each drawcall, every redraw!
In a CPU limited test scene (10000 cubes in Clay engine) I get a reduction of CPU drawing time from ~20ms to 13ms.
The only area that is not caching VAOs is the instancing from particles (see comment DRW_shgroup_instance_batch).
This removes the need of custom attribs for instancing.
Instancing works fully with dynamic batches & Gwn_VertFormat now.
This is in prevision of the VAO manager patch.
For simplicity we choose to execute the rendering of Opengl engines in the main thread and block the interface.
This might be addressed in the future at least for video rendering.
A drawmanager wrapper (DRW_render_to_image) is called by the render pipeline to set up the Opengl state and then call the specific draw_engine->render_to_image function.
Main idea is to make specific engine types be a subclass of generic
ObjectEngineData structure.
This required following changes:
- Have extra size argument to engine data allocation function.
Not sure whether there is less error-prone way of doing this.
- Add init() callback to engine data allocation function.
Additionally, added some extra checks to Eevee's engine data getters, so we do
not silently cast lamp data to lightprobe data.
Reviewers: dfelinto, fclem
Differential Revision: https://developer.blender.org/D3027
Both object level and camera datablock properties animation did not work with
copy on write enabled.
The root of the issue is going to the fact, that all interface elements are
referencing original datablock. For example, View3D has pointer to camera it's
using, and all areas which does access v3d->camera should in fact query for
the evaluated version of that camera, within the current context.
Annoying part of this change is that we now need to pass depsgraph in lots
of places. Which is rather annoying.
Alternative would be to cache evaluated camera in viewport itself, but then
it makes it annoying to keep things in sync.
Not sure if there is nicer solution here.
Reviewers: dfelinto, campbellbarton, mont29
Subscribers: dragoneex
Differential Revision: https://developer.blender.org/D3007
This modify the selection code quite a bit but it's for the better.
When using selection we use the same batching / instancing process but we draw each element at a time using a an offset to the first element we want to draw and by drawing only one element.
This result much less memory allocation and better draw time.
This allows a duplicator (as known as dupli parent) to be in a visible
collection so its duplicated objects are visible, however while being
invisible for the final render.
An object that is a particle emitter is also considered a duplicator.
Many thanks for the reviewers for the extense feedback.
Reviewers: sergey, campbellbarton
Differential Revision: https://developer.blender.org/D2966
Users can change the group collection visibility in the outliner
when looking at groups.
Regular collections on the other hand don't have any special visibility control,
if you need a collection to be invisible during render, either don't link it
into the view layer used for F12, or disable it.
This includes:
* Updated unittests - update your lib/tests/layers folder.
* Subversion bump - branches be aware of that.
Note:
Although we are using eval_ctx to determine the visibility of a group collection
when rendering, the depsgraph is still using the same depsgraph for the viewport
and the render engine, so at the moment the render visibility is ignored.
Following next is a workaround for this separately to tag the groups before and
after rendering to tackle that.
The RenderResult struct still has a listbase of RenderLayer, but that's ok
since this is strictly for rendering.
* Subversion bump (to 2.80.2)
* DNA low level doversion (renames) - only for .blend created since 2.80 started
Note: We can't use DNA_struct_elem_find or get file version in init_structDNA,
so we are manually iterating over the array of the SDNA elements instead.
Note 2: This doversion change with renames can be reverted in a few months. But
so far it's required for 2.8 files created between October 2016 and now.
Reviewers: campbellbarton, sergey
Differential Revision: https://developer.blender.org/D2927
This adds a custom depth test that have the benefits to glitch less and be more visually pleasing.
Downside is that it let the grid pass trough the objects a little.
This effect is done in NDC space so that it counteract the logarithmic depth distribution imprecision (read as it's less visible near the camera but more present far away).
This patch also includes some cleanups.
This required some small changes to the data display shaders so that they match the way the object mode renders them.
Strangely enough, I had to remove the normal attribute from the display code because it was being not bound as soon as I created another rendering call in object mode. The problem may be deeper but I did not have time for this so I derive the normal from the sphere pos.
You can change the amount of samples in the user preferences. You do not need to restart blender to see the effect in the new viewport.
This adds another Multisample Framebuffer and textures (so even more memory required).
It works by blitting the default_fb to the multisample_fb each time the renderer need to render one or more "wire" pass.
It it then blit back to the default_fb so that the rest of pipeline is working as expected.
We COULD lower the GPU memory / bandwidth usage to render everything to the same multisample fbo and change the logic depending on if MSAA is enabled or not, but I think it's a bit too much work for now.
Adds a FXAA for smoothing out the extracted outlines.
The Post Process Anti Aliasing is only done on the Alpha channel of the outlines.
Because of that we need to add bleed the outline color out of the silouhette so the AA'd alpha can blend the right color and not pick black when the alpha is smoothed out of the silhouette.
Also because of the AA needs to have clear contrast to work with, I decided to ditch the "bluring" or the occluded outlines.
The FXAA adds an overhead of 0.17ms but we gain back 0.22ms * 4 = 0.88ms by removing the blur.
The FXAA Implementation is from Corey Richardson (cmr) (D2717). I had to modify it a bit to only filter the alpha channel.
Iterate over invisible objects too, so lamps can still lit the scene.
Also, now you can use a collection to set an object to invisible, not
only to visible.
For example:
Scene > Master collection > bedroom > furniture
Scene > View Layer > bedroom (visible)
> furniture (invisible)
The View Layer has two linked collections, bedroom and furniture.
This setup will make the furniture collection invisible.
Note: Unlike what was suggested on D2849, this does not make collection
visibility influence camera visibility. I will keep this as a separate
patch.
Reviewers: sergey
Subscribers: sergey, brecht, fclem
Differential Revision: https://developer.blender.org/D2849
This is in order to use the same texture on multiple sampler.
Also texture counter is reset after each shading group. This mimics the previous behaviour.
We do have an history of those pieces of evil in our code, would be nice
to get fully rid of it, but at the very least let's not add more of them
in new code. :)
Group texture fetches to hide latency. 3.2ms -> 2.2ms (constant time improvement, not depending on scene complexity)
Could optimize further with textureGather (require OpenGL 4.0).