Similar to recent issues with gl_shader_interface,
ShaderInput lists need to be sorted to ensure correct
and efficient uniform lookup by name.
Authored by Apple: Michael Parkin-White
Ref #96261
Pull Request #105239
GPencil 3D stroke rendering uses a geometry shader.
This is unsupported by the Metal backend, so implement
fix for this failing compilation by shifting geometry shader
logic into the Vertex shader for Metal backend.
Authored by Apple: Michael Parkin-White
Ref #96261
Pull Request #105143
Metal LineLoop emulation path does not correctly apply
when using SSBO vertex fetch mode alongside 3D line
rendering.
Patch moves line emulation above SSBO
vertex fetch setup to ensure the correct emulation
parameters are passed to the shader.
Authored by Apple: Michael Parkin-White
Ref #96261
Pull Request #105142
Resolves issue with nearest filtering on UI Icons. Note that as
Metal does not support LOD bias as a parameter on a sampler
object, the original code has been modified to perform LOD
biasing at the shader level.
As GPU_SAMPLER_ICON is not widely used, it is more
efficient to apply directly to the affected shaders, rather
than workaround passing in the sampler LOD bias as a
separate value e.g. uniform or push constant.
Original PR feedback addressed to also refactor ICON
shaders to use consistent style for single and multi
Icon rendering.
Authored by Apple: Michael Parkin-White
Ref #96261
Pull Request #105145
When using ShaderCreateInfo with builtin uniform(blocks) there are
cases where the current implementation could not find an existing
block. The reason is that it uses name matching and name matching
requires that the shader inputs are sorted based on the name hash.
This change fixes this by first for the sorting of the shader
inputs before resolving the builtins.
Pull Request #105127
Cycles fallback display shader previously did not use viewport.
This would crash or cause the display not to show when using
GPU backends other than OpenGL, if another display shader
was unavailable.
Now use ShaderCreateInfo for Cycles fallback display.
Authored by Apple: Michael Parkin-White
Ref #96261
Pull Request #104987
The stoke shader of grease pencil uses a geometry shader stage. Apple
devices don't support shaders with geometry shader stage. In the
OpenGL driver there was a pass-through implemented so it didn't fail.
When using the metal backend this needs to be solved more explicitly.
This change patches the grease pencil shader to support both the
backends supporting a geometry stage and those without.
Fixes#105059
Pull Request #105116
Erroneous cache warming case where the generated material is
identical to default material and cached shader is re-used,
resulting in case where the parent shader is identical to the
source.
Authored by Apple: Michael Parkin-White
Ref #96261
The check was triggering the 'this' pointer cannot be null in
well-defined C++ code
We do not check for this pointer in any other areas. If it is
needed due to possible opaque pointer cast to the check prior
to the cast.
Pull Request #104974
This reverts commit 19222627c6.
Something went wrong here, seems like this commit merged the main branch
into the release branch, which should never be done.
This reverts commit 68181c2560.
I merged 3.6 into 3.5 by mistake. Basically I had a PR against main,
then changed it in the last minute to be against 3.5 via the
web-interface unaware that I shouldn't do it without updating the
patch.
Original Pull Request: #104889
Note that the node group has its sockets names
translated, while the built-in nodes don't.
So we need to use data_ for the built-in nodes names,
and the sockets of the created node groups.
Pull Request #104889
When updating a mesh, the GPU Subdivision code makes calls to
`GPU_indexbuf_bind_as_ssbo()`.
This may cause the current VAO index buffer to change due to calls from
`glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo_id_)` in
`GPU_indexbuf_bind_as_ssbo()`.
The solution is to unbind the VAO (by calling `glBindVertexArray(0)`)
before creating the index buffer IBO.
Co-authored-by: Germano Cavalcante <grmncv@gmail.com>
Pull Request #104873
Certain material node graphs can be very expensive to run. This feature aims to produce secondary GPUPass shaders within a GPUMaterial which provide optimal runtime performance. Such optimizations include baking constant data into the shader source directly, allowing the compiler to propogate constants and perform aggressive optimization upfront.
As optimizations can result in reduction of shader editor and animation interactivity, optimized pass generation and compilation is deferred until all outstanding compilations have completed. Optimization is also delayed util a material has remained unmodified for a set period of time, to reduce excessive compilation. The original variant of the material shader is kept to maintain interactivity.
Also adding a new concept to gpu::Shader allowing assignment of a parent shader from which a shader can pull PSO descriptors and any required metadata for asynchronous shader cache warming. This enables fully asynchronous shader optimization, without runtime hitching, while also reducing runtime hitching for standard materials, by using PSO descriptors from default materials, ahead of rendering.
Further shader graph optimizations are likely also possible with this architecture. Certain scenes, such as Wanderer benefit significantly. Viewport performance for this scene is 2-3x faster on Apple-silicon based GPUs.
Authored by Apple: Michael Parkin-White
Ref T96261
Pull Request #104536
This replaces `GPU_SHADER_3D_POINT_FIXED_SIZE_VARYING_COLOR` by
GPU_SHADER_2D_POINT_UNIFORM_SIZE_UNIFORM_COLOR_OUTLINE_AA`.
None of the usage made sense to not use the AA shader.
Scale the point size to account for the rounded shape.
Vulkan has a pluggable memory allocation feature, which allows internal
driver allocations to be done by the client application provided
allocator. Vulkan uses this for more client application allocations
done inside the driver, but can also do it for more internal oriented
allocations.
VK_ALLOCATION_CALLBACKS initializes allocation callbacks for host allocations.
The macro creates a local static variable with the name vk_allocation_callbacks
that can be passed to vulkan API functions that expect
const VkAllocationCallbacks *pAllocator.
When WITH_VULKAN_GUARDEDALLOC=Off the memory allocation implemented
in the vulkan device driver is used for both internal and application
oriented memory operations.
For now this would help during the development of Vulkan backend to
detect hidden memory leaks that are hidden inside the driver part
of the stack. In a later stage we need to measure the overhead and
if this should become the default behavior.
Pull Request #104434
The GPU module has 2 different styles when reading back data from
GPU buffers. The SSBOs used a memcpy to copy the data to a
pre-allocated buffer. IndexBuf/VertBuf gave back a driver/platform
controlled pointer to the memory.
Readback is done for test cases returning mapped pointers is not safe.
For this reason we settled on using the same approach as the SSBO.
Copy the data to a caller pre-allocated buffer.
Reason why this API is currently changed is that the Vulkan API is more
strict on mapping/unmapping buffers that can lead to potential issues
down the road.
Pull Request #104571
Keep using the 3 evaluations dF_branch method for the Displacement output.
The optimized 2 evaluations method used by node_bump is now on its own macro (dF_branch_incomplete).
displacement_bump modifies the normal that nodetree_exec uses, so even with a refactor it wouldn’t be possible to re-use the computation anyway.
Avoid computing the non-derivative height twice.
The height is now computed as part of the main function, while the height at x and y offsets are still computed on a separate function.
The differentials are now computed directly at node_bump.
Co-authored-by: Miguel Pozo <pragma37@gmail.com>
Pull Request #104595
Documented all functions, adding use case and side effects.
Also replace the use of shortened argument name by more meaningful ones.
Renamed `GPU_batch_instbuf_add_ex` and `GPU_batch_vertbuf_add_ex` to remove
the `ex` suffix as they are the main version used (removed the few usage
of the other version).
Renamed `GPU_batch_draw_instanced` to `GPU_batch_draw_instance_range` and
make it consistent with `GPU_batch_draw_range`.
Blender was reporting that the GPU_TEXTURE_USAGE_HOST_READ wasn't set.
This is used to indicate that the textures needs to be read back to
CPU. Textures that don't need to be read back can be optimized by the
GPU backend.
Found during investigation of #104282.
Implements virtual shadow mapping for EEVEE-Next primary shadow solution.
This technique aims to deliver really high precision shadowing for many
lights while keeping a relatively low cost.
The technique works by splitting each shadows in tiles that are only
allocated & updated on demand by visible surfaces and volumes.
Local lights use cubemap projection with mipmap level of detail to adapt
the resolution to the receiver distance.
Sun lights use clipmap distribution or cascade distribution (depending on
which is better) for selecting the level of detail with the distance to
the camera.
Current maximum shadow precision for local light is about 1 pixel per 0.01
degrees.
For sun light, the maximum resolution is based on the camera far clip
distance which sets the most coarse clipmap.
## Limitation:
Alpha Blended surfaces might not get correct shadowing in some corner
casses. This is to be fixed in another commit.
While resolution is greatly increase, it is still finite. It is virtually
equivalent to one 8K shadow per shadow cube face and per clipmap level.
There is no filtering present for now.
## Parameters:
Shadow Pool Size: In bytes, amount of GPU memory to dedicate to the
shadow pool (is allocated per viewport).
Shadow Scaling: Scale the shadow resolution. Base resolution should
target subpixel accuracy (within the limitation of the technique).
Related to #93220
Related to #104472
Straightforward port. I took the oportunity to remove some C vector
functions (ex: copy_v2_v2).
This makes some changes to DRWView to accomodate the alignement
requirements of the float4x4 type.