Complex EEVEE node graphs, particularly those combining multiple Principled BSDF shader nodes, tend to require a large number of simultaneously live registers due to function call depth. In some cases, this causes a substantial performance drop, and corruption if the stack gets too large.
To mitigate this, calls to closure_eval are split so that each call evaluates only a single closure, which reduces the number of live registers required. This is preferred over the compound closure evaluation functions, which require a large amount of in-flight data.
Note that this is generally not faster when the stack does not spill, because the instruction count increases. The trade-off depends on the GPU architecture, so this change is limited to AMD GPUs.
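
As a rough illustration of the idea only (not the actual shader code): the C++ sketch below contrasts a compound evaluation, where intermediates for every closure are live at once, with a split evaluation that handles one closure per call. The names Closure, eval_one, eval_compound and eval_split, and the toy shading term, are hypothetical and do not come from the Blender source.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a shading closure; not a Blender type.
struct Closure {
    float weight;
    float roughness;
};

// Toy per-closure term (assumption: any pure function would do here).
static float eval_one(const Closure &c, float n_dot_l) {
    return c.weight * n_dot_l * (1.0f - 0.5f * c.roughness);
}

// Compound form: one call evaluates all closures, so the intermediates
// for every closure are in flight at the same time.
static float eval_compound(const Closure &a, const Closure &b,
                           const Closure &c, float n_dot_l) {
    float ra = eval_one(a, n_dot_l);
    float rb = eval_one(b, n_dot_l);
    float rc = eval_one(c, n_dot_l);
    return ra + rb + rc;
}

// Split form: each call evaluates a single closure and only the running
// sum stays live between calls, at the cost of more call overhead.
static float eval_split(const Closure *closures, int count, float n_dot_l) {
    assert(count >= 0);
    float radiance = 0.0f;
    for (int i = 0; i < count; ++i) {
        radiance += eval_one(closures[i], n_dot_l);
    }
    return radiance;
}
```

Both forms compute the same sum up to floating-point rounding. Only the amount of state held live at once differs, which is the property the split relies on.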
Authored by Apple: Michael Parkin-White
Ref T96261