Image engine uses 4 gpu textures that are as large as the area being
drawn. The amount of needed GPU memory can be reduced by dividing the
region in smaller parts. Reducing the GPU memory also reduces the stalls
when updating the textures, improving the performance as well.
This optimization works, but is disabled for now due to some rounding
errors that drawn lines on the screen where the screen is divided.
It used to be a number of fixed full screen images where the
responsibility was at image engine. Now moved the responsibility to the
drawing mode to make sure we can allocate non-full region-size textures
in a future refactoring.
`get_gpu_textures` was created when the image engine didn't support
texture streaming and used the gpu textures that were stored in the
image buffer itself.
This patch implements the Simple Star Glare node. This is only an approximation
of the existing implementation in the CPU compositor, an approximation that
removes the row-column dependency in the original algorithm, yielding an order
of magnitude faster computations. The difference due to the approximation is
readily visible in artificial test cases, but is less visible in actual use
cases, so it was agreed that this approximation is worthwhile.
For the future, we can look into approximating this further using a closed form
IIR recursive filter with parallel interconnection and block-based parallelism.
Which is expected to yield another order of magnitude faster computations.
The different passes can potentially be combined into a single shader with some
preprocessor tricks, but doing that complicated that code in a way that makes
it difficult to experiment with future optimizations, so I decided to leave it
as is for now.
Differential Revision: https://developer.blender.org/D16724
Reviewed By: Clement Foucault
This patch implements the Ghost Glare node. It is implemented using
direct convolution as opposed to a recursive one, which produces
slightly different results---more accurate ones, however, since the
ghosts are attenuated where it matters, the difference is barely
visible and is acceptable as far as I can tell.
A possible performance improvement is to implement all passes in a
single shader dispatch, where an array of all scales and color
modulators is computed recursively on the host then used in the shader
to add all ghosts, avoiding usage of global memory and unnecessary
copies. This optimization will be implemented separately.
Differential Revision: https://developer.blender.org/D16641
Reviewed By: Clement Foucault
These functions were originally implemented because:
- Not all of them existed pre C++17, but now we are using C++17.
- The call stack depth is quite a bit deeper with the std functions, making
debugging slower and more annoying. I didn't find this to be a problem
anymore recently.
No functional changes are expected.
Nodes which are common in multiple editors (RGB, Value, Switch, RGB to BW)
has less width in compositor editor. Patch changes compositor node width to
140 for consistency.
Reviewed by: HooglyBoogly
Differential Revision: https://developer.blender.org/D16719
OpenVDB likes to crash even in release builds when volumes become too small.
To fix this I used the same function that we use in other places already to
determine if the resulting volume will be too small.
Float images loaded in Blender are converted to scene linear and don't
require additional conversion. Image engine can reuse the rect_float of
those images. An assert statement is added tp make this more clear and
to test on missing code paths or future developments.
This splits the logic to detect if the MeshSequenceCache modifier
evaluation is for the ORCO mesh into its own function. This will allow
reusing the logic for when GeometrySet support is added to the modifier
(D11592).
No functionnal changes.
Differential Revision: https://developer.blender.org/D16611
Texture usage flags can now be provided during texture creation specifying
the ways in which a texture can be used. This allows the GPU backends to
perform contextual optimizations which were not previously possible. This
includes enablement of hardware lossless compression which can result in
a 15%+ performance uplift for bandwidth-limited scenes on hardware such
as Apple-Silicon using Metal.
GPU_TEXTURE_USAGE_GENERAL can be used by default if usage is not known
ahead of time. Patch will also be relevant for the Vulkan backend.
Authored by Apple: Michael Parkin-White
Ref T96261
Reviewed By: fclem
Differential Revision: https://developer.blender.org/D15967
Fix a number of small memory leaks in the Metal backend. Unreleased blit
shader objects and temporary textures addressed. Static memory manager
modified to defer creation until use. Added reference count tracker to
shared memory manager across contexts, such that cached memory allocations
will be released if all contexts are destroyed and re-initialized.
Authored by Apple: Michael Parkin-White
Ref T96261
Reviewed By: fclem
Differential Revision: https://developer.blender.org/D16415
Implementing non-geometry-shader path for rendering stencil shadows,
used by the workbench engine.
Patch also contains a few small modifications to Create-info to ensure
usage of gl_FragDepth is explicitly specified.
This is required for testing of the patch.
Authored by Apple: Michael Parkin-White
Ref T96261
Reviewed By: fclem
Differential Revision: https://developer.blender.org/D16436
Implemented geometry shader alternative for rendering of UV edges in Metal, as geometry shaders are unsupported.
Authored by Apple: Michael Parkin-White
Ref T96261
Reviewed By: fclem
Differential Revision: https://developer.blender.org/D16452
There is a more recent implementation as a geometry node, but this code
is used by RNA and Cycles still. In order to help understand the code
to tell if it can be replaced this makes some small changes:
- Use indexing instead of pointer incrementing
- Add const to unchanged arguments
- Avoid unnecessary indentation
- Use references for expected non-null arguments
On linux a man page is generated before the
scripts are installed, this lead to USD getting
a null pointer for it's pluging path and crashing
the process.
Porting conservative depth rendering to use non-geometry shader path for
Metal.
Authored by Apple: Michael Parkin-White
Ref T96261
Reviewed By: fclem
Differential Revision: https://developer.blender.org/D16424
Additional mat3 constructors added, global variable namespace collisions
for uniform and object color avoided via re-name.
Metal vertex format compatibility added for shaders wherein vertex data
goes through a double-conversion and cannot be implicitly converted during
Metal vertex assembly e.g. bitmasks passed directly as unsigned type in
shader interface for certain shader interfaces.
Authored by Apple: Michael Parkin-White
Ref T96261
Reviewed By: fclem
Differential Revision: https://developer.blender.org/D16433
Required by Metal backend for efficient shader compilation. EEVEE material
resource binding permutations now controlled via CreateInfo and selected
based on material options. Other existing CreateInfo's also modified to
ensure explicitness for depth-writing mode. Other missing bindings also
addressed to ensure full compliance with the Metal backend.
Authored by Apple: Michael Parkin-White
Ref T96261
Reviewed By: fclem
Differential Revision: https://developer.blender.org/D16243
I organized the fields so that similar variables were closer together and
more "important" fields were closer to the beginning. I also added
comments to help describe the purpose of most fields.
Differential Revision: https://developer.blender.org/D16710
Based on feedback from Simon Thommes, for link-drag-serach it's most
useful to have the A and B sockets connected, first, then the factor
sockets, then the special color mix operations. This addresses that by
adding the search items in order and decrementing a weight manually
as items are added.
Make the `Object *` and `Mesh *` parameter of `USD_mesh_topology_changed()`
`const`. This function only inspects them, and doesn't need to modify
them.
No functional changes.
Since moving to float scaling, the method of accessing the text
width with the aspect applied wasn't working properly.
Based on contributions by @lone_noel & @harley, see D15043.
The optimization is done by removing the `len` member from the groups
and using fewer `for` loops.
But it's not a really impactful optimization.
Only 1.9% in the weld operation of a high poly mesh.
(disregarding getting the vertex map and all other operations on a
Blender frame).
The readability improvement comes from using more familiar types like
`int` and `int2` instead of `WeldGroup` and `WeldGroupEdge` structs.