Certain material node graphs can be very expensive to run. This feature aims to produce secondary GPUPass shaders within a GPUMaterial which provide optimal runtime performance. Such optimizations include baking constant data into the shader source directly, allowing the compiler to propogate constants and perform aggressive optimization upfront.
As optimizations can result in reduction of shader editor and animation interactivity, optimized pass generation and compilation is deferred until all outstanding compilations have completed. Optimization is also delayed util a material has remained unmodified for a set period of time, to reduce excessive compilation. The original variant of the material shader is kept to maintain interactivity.
Also adding a new concept to gpu::Shader allowing assignment of a parent shader from which a shader can pull PSO descriptors and any required metadata for asynchronous shader cache warming. This enables fully asynchronous shader optimization, without runtime hitching, while also reducing runtime hitching for standard materials, by using PSO descriptors from default materials, ahead of rendering.
Further shader graph optimizations are likely also possible with this architecture. Certain scenes, such as Wanderer benefit significantly. Viewport performance for this scene is 2-3x faster on Apple-silicon based GPUs.
Authored by Apple: Michael Parkin-White
Ref T96261
Pull Request #104536
`PBVH_Leaf` nodes are now split into a new `PBVH_TexLeaf`
node type when using the paint brush. These nodes are
split by image pixels, not triangles. This greatly
increases performance when working with large
textures on low-poly meshes.
Reviewed By: Jeroen Bakker
Differential Revision: https://developer.blender.org/D14900
Ref: D14900
Texture usage flags can now be provided during texture creation specifying
the ways in which a texture can be used. This allows the GPU backends to
perform contextual optimizations which were not previously possible. This
includes enablement of hardware lossless compression which can result in
a 15%+ performance uplift for bandwidth-limited scenes on hardware such
as Apple-Silicon using Metal.
GPU_TEXTURE_USAGE_GENERAL can be used by default if usage is not known
ahead of time. Patch will also be relevant for the Vulkan backend.
Authored by Apple: Michael Parkin-White
Ref T96261
Reviewed By: fclem
Differential Revision: https://developer.blender.org/D15967
Required by Metal backend for efficient shader compilation. EEVEE material
resource binding permutations now controlled via CreateInfo and selected
based on material options. Other existing CreateInfo's also modified to
ensure explicitness for depth-writing mode. Other missing bindings also
addressed to ensure full compliance with the Metal backend.
Authored by Apple: Michael Parkin-White
Ref T96261
Reviewed By: fclem
Differential Revision: https://developer.blender.org/D16243
This is part of the effor to simplify the View struct in order to implement
multiview rendering.
The CameraTexCoFactors being only valid for a single view, and being only
used in very few places, it make sense to move it to the engine side.
Rewrite PBVH draw to allocate attributes into individual VBOs.
The old system tried to create a single VBO that could feed
every open viewport. This required uploading every color and
UV attribute to the viewport whether needed or not, often exceeding
the VBO limit.
This new system creates one VBO per attribute. Each attribute layout is
given its own GPU batch which is cached inside the owning PBVH node.
Notes:
* This is a full C++ rewrite. The old code is still there; ripping it out
can happen later.
* PBVH nodes now have a collection of batches, PBVHBatches, that keeps
track of all the batches inside the node.
* Batches are built exclusively from a list of attributes.
* Each attribute has its own VBO.
* Overlays, workbench and EEVEE can all have different attribute
layouts, each of which will get its own batch.
Reviewed by: Clement Foucault
Differential Revision: https://developer.blender.org/D15428
Ref D15428
This code slipped through the final review step surely caused by a faulty
merge.
Fixes T101372 Regression: World shader setup crashes Blender in rendered view
Regression introduced by rB697b447c2069bbbbaa9929aab0ea1f66ef8bf4d0
MTLContext provides functionality for command encoding, binding management and graphics device management. MTLImmediate provides simple draw enablement with dynamically encoded data. These draws utilise temporary scratch buffer memory to provide minimal bandwidth overhead during workload submission.
This patch also contains empty placeholders for MTLBatch and MTLDrawList to enable testing of first pixels on-screen without failure.
The Metal API also requires access to the GHOST_Context to ensure the same pre-initialized Metal GPU device is used by the viewport. Given the explicit nature of Metal, explicit control is also needed over presentation, to ensure correct work scheduling and rendering pipeline state.
Authored by Apple: Michael Parkin-White
Ref T96261
(The diff is based on 043f59cb3b)
Reviewed By: fclem
Differential Revision: https://developer.blender.org/D15953
This reverts commit 34051fcc12.
Although for normal use this doesn't make a difference. But when working with
huge scenes and volumetrics + NVIDIA it made a work-around not possible anymore.
For the heist production we added a fix on the render-farm (enable GPU workarounds).
{rB34051fcc12f388375697dcfc6da53e9909058fe1} made another work-around not
accessible anymore and it and was requested to revert this change.
On NVIDIA volumetric resolve failed for large production scenes.
The result would remove most color from the final render. The cause
seems to be a faulty driver.
This change ported the fragment shader to a compute shader which
would select a different compiler branch and didn't show the error.
This is a new implementation of the draw manager using modern
rendering practices and GPU driven culling.
This only ports features that are not considered deprecated or to be
removed.
The old DRW API is kept working along side this new one, and does not
interfeer with it. However this needed some more hacking inside the
draw_view_lib.glsl. At least the create info are well separated.
The reviewer might start by looking at `draw_pass_test.cc` to see the
API in usage.
Important files are `draw_pass.hh`, `draw_command.hh`,
`draw_command_shared.hh`.
In a nutshell (for a developper used to old DRW API):
- `DRWShadingGroups` are replaced by `Pass<T>::Sub`.
- Contrary to DRWShadingGroups, all commands recorded inside a pass or
sub-pass (even binds / push_constant / uniforms) will be executed in order.
- All memory is managed per object (except for Sub-Pass which are managed
by their parent pass) and not from draw manager pools. So passes "can"
potentially be recorded once and submitted multiple time (but this is
not really encouraged for now). The only implicit link is between resource
lifetime and `ResourceHandles`
- Sub passes can be any level deep.
- IMPORTANT: All state propagate from sub pass to subpass. There is no
state stack concept anymore. Ensure the correct render state is set before
drawing anything using `Pass::state_set()`.
- The drawcalls now needs a `ResourceHandle` instead of an `Object *`.
This is to remove any implicit dependency between `Pass` and `Manager`.
This was a huge problem in old implementation since the manager did not
know what to pull from the object. Now it is explicitly requested by the
engine.
- The pases need to be submitted to a `draw::Manager` instance which can
be retrieved using `DRW_manager_get()` (for now).
Internally:
- All object data are stored in contiguous storage buffers. Removing a lot
of complexity in the pass submission.
- Draw calls are sorted and visibility tested on GPU. Making more modern
culling and better instancing usage possible in the future.
- Unit Tests have been added for regression testing and avoid most API
breakage.
- `draw::View` now contains culling data for all objects in the scene
allowing caching for multiple views.
- Bounding box and sphere final setup is moved to GPU.
- Some global resources locations have been hardcoded to reduce complexity.
What is missing:
- ~~Workaround for lack of gl_BaseInstanceARB.~~ Done
- ~~Object Uniform Attributes.~~ Done (Not in this patch)
- Workaround for hardware supporting a maximum of 8 SSBO.
Reviewed By: jbakker
Differential Revision: https://developer.blender.org/D15817
Replaces `DRW_shgroup_call_procedural_triangles_indirect`.
This makes the indirect drawing more flexible.
Not all primitive types are supported but it is just a matter of adding
them.
This commit removes all EEVEE specific code from the `gpu_shader_material*.glsl`
files. It defines a clear interface to evaluate the closure nodes leaving
more flexibility to the render engine.
Some of the long standing workaround are fixed:
- bump mapping support is no longer duplicating a lot of node and is instead
compiled into a function call.
- bump rewiring to Normal socket is no longer needed as we now use a global
`g_data.N` for that.
Closure sampling with upstread weight eval is now supported if the engine needs
it.
This also makes all the material GLSL sources use `GPUSource` for better
debugging experience. The `GPUFunction` parsing now happens in `GPUSource`
creation.
The whole `GPUCodegen` now uses the `ShaderCreateInfo` and is object type
agnostic. Is has also been rewritten in C++.
This patch changes a view behavior for EEVEE:
- Mix shader node factor imput is now clamped.
- Tangent Vector displacement behavior is now matching cycles.
- The chosen BSDF used for SSR might change.
- Hair shading may have very small changes on very large hairs when using hair
polygon stripes.
- ShaderToRGB node will remove any SSR and SSS form a shader.
- SSS radius input now is no longer a scaling factor but defines an average
radius. The SSS kernel "shape" (radii) are still defined by the socket default
values.
Appart from the listed changes no other regressions are expected.
This adds support to show dots for the curves points when in edit mode,
using a specific overlay.
This also adds `DRW_curves_batch_cache_create_requested` which for now
only creates the point buffer for the newly added `edit_points` batch.
In the future, this will also handle other edit mode overlays, and
probably also replace the current curves batch cache creation.
Maniphest Tasks: T95770
Differential Revision: https://developer.blender.org/D14262
This function was not used for anything other than mat4. This
was because of a limitation of the DRW module/
This makes it cleaner for the GLSL and also less tempting to use
it for other unconventional purpose.
Use a shorter/simpler license convention, stops the header taking so
much space.
Follow the SPDX license specification: https://spdx.org/licenses
- C/C++/objc/objc++
- Python
- Shell Scripts
- CMake, GNUmakefile
While most of the source tree has been included
- `./extern/` was left out.
- `./intern/cycles` & `./intern/atomic` are also excluded because they
use different header conventions.
doc/license/SPDX-license-identifiers.txt has been added to list SPDX all
used identifiers.
See P2788 for the script that automated these edits.
Reviewed By: brecht, mont29, sergey
Ref D14069
- Compute ref let the size of dispatch be modified just before drawing.
- Barrier call makes it possible to chain multiple compute passes in one pass.
- DRW_shgroup_vertex_buffer_ref is the analog of DRW_shgroup_uniform_block_ref.
In the original design draw engines had to copy with a limitation that
they were not allowed to reuse complex data structures between drawing
calls. Data that could be reused were limited to:
- GPUFramebuffers
- GPUTextures
- Memory that could be removed calling MEM_freeN (storage list)
- DRWPass
This is fine when the storage list contains arrays or structs but when
more complex data types (vectors, maps) etc wasn't possible.
This patch adds instance_data that can be reused between drawing calls.
The instance_data is controlled by the draw engine and doesn't need to
be limited as described above.
When an engines stores instance_data it must implement the
`DrawEngineType.instance_free` callback to free the data.
The patch originates from eevee rewrite. But was added to master as the
image engine rewrite also has a need for it.
Reviewed By: fclem
Differential Revision: https://developer.blender.org/D13425
This is a necessary step for EEVEE's new arch. This moves more data
to the draw manager. This makes it easier to have the render or draw
engines manage their own data.
This makes more sense and cleans-up what the GPUViewport holds
Also rewrites the Texture pool manager to be in C++.
This also move the DefaultFramebuffer/TextureList and the engine related
data to a new `DRWViewData` struct. This struct manages the per view
(as in stereo view) engine data.
There is a bit of cleanup in the way the draw manager is setup.
We now use a temporary DRWData instead of creating a dummy viewport.
Development: fclem, jbakker
Differential Revision: https://developer.blender.org/D11966
This includes much improved GPU rendering performance, viewport interactivity,
new shadow catcher, revamped sampling settings, subsurface scattering anisotropy,
new GPU volume sampling, improved PMJ sampling pattern, and more.
Some features have also been removed or changed, breaking backwards compatibility.
Including the removal of the OpenCL backend, for which alternatives are under
development.
Release notes and code docs:
https://wiki.blender.org/wiki/Reference/Release_Notes/3.0/Cycleshttps://wiki.blender.org/wiki/Source/Render/Cycles
Credits:
* Sergey Sharybin
* Brecht Van Lommel
* Patrick Mours (OptiX backend)
* Christophe Hery (subsurface scattering anisotropy)
* William Leeson (PMJ sampling pattern)
* Alaska (various fixes and tweaks)
* Thomas Dinges (various fixes)
For the full commit history, see the cycles-x branch. This squashes together
all the changes since intermediate changes would often fail building or tests.
Ref T87839, T87837, T87836
Fixes T90734, T89353, T80267, T80267, T77185, T69800
This patch allows dropping material assets from material slot under the mouse
cursor. Before this change the material slot had to be hand-picked from the
properties panel.
For consistency it is chosen to do this in any shading mode as the tooltip shows
what is exactly going to happen during release.
The feature also works for other object types than Meshes as it uses the drawn surface on the
GPU to detect the material slots. Performance of this patch has been tested with AMD GCN3.0
cards and are very responsive.
Reviewed By: fclem, Severin
Differential Revision: https://developer.blender.org/D12190
This patch will use compute shaders to create the VBO for hair.
The previous implementation uses transform feedback.
Timings before: between 0.000069s and 0.000362s.
Timings after: between 0.000032s and 0.000092s.
Speedup isn't noticeable by end-users. The patch is used to test
the new compute shader pipeline and integrate it with the draw
manager. Allowing EEVEE, Workbench and other draw engines to
use compute shaders with the introduction of `DRW_shgroup_call_compute`
and `DRW_shgroup_vertex_buffer`.
Future improvements are possible by generating the index buffer
of hair directly on the GPU.
NOTE: that compute shaders aren't supported by Apple and still use
the transform feedback workaround.
Reviewed By: fclem
Differential Revision: https://developer.blender.org/D11057
This reverts commit 8f9599d17e.
Mac seems to have an error with this change.
```
ERROR: /Users/blender/git/blender-vdev/blender.git/source/blender/draw/intern/draw_hair.c:115:44: error: use of undeclared identifier 'shader_src'
ERROR: /Users/blender/git/blender-vdev/blender.git/source/blender/draw/intern/draw_hair.c:123:13: error: use of undeclared identifier 'shader_src'
ERROR: make[2]: *** [source/blender/draw/CMakeFiles/bf_draw.dir/intern/draw_hair.c.o] Error 1
ERROR: make[1]: *** [source/blender/draw/CMakeFiles/bf_draw.dir/all] Error 2
ERROR: make: *** [all] Error 2
```