Cycles: oneAPI: use local memory for faster shader sorting #107994
After enabling partitioned shader sorting for the oneAPI backend (this functionality was previously used only by the Metal backend), I see a nice performance improvement in benchmark scenes on my machine with an Intel® Arc™ A770 (about 8% on average).
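For readers unfamiliar with the idea, here is a rough CPU-side sketch of what sorting by shader key within partitions can look like. It is not the Cycles kernel: `partitioned_sort_by_key`, `num_keys`, and `partition_size` are hypothetical names chosen for illustration, and the per-partition histogram only stands in for what a GPU implementation would presumably keep in local memory. It assumes every key is in `[0, num_keys)` and `partition_size` is greater than zero.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper, not the Cycles kernel: returns path indices ordered by
// shader key, sorted independently inside each partition of `partition_size`
// consecutive elements. Assumes 0 <= keys[i] < num_keys and partition_size > 0.
std::vector<int> partitioned_sort_by_key(const std::vector<int> &keys,
                                         int num_keys,
                                         std::size_t partition_size)
{
  std::vector<int> sorted_indices(keys.size());

  for (std::size_t start = 0; start < keys.size(); start += partition_size) {
    const std::size_t end = std::min(start + partition_size, keys.size());

    // Small per-partition histogram. On a GPU, this is the data that could
    // live in fast local memory because it only covers one partition.
    std::vector<int> offsets(num_keys + 1, 0);
    for (std::size_t i = start; i < end; i++) {
      offsets[keys[i] + 1]++;
    }

    // Exclusive prefix sum: offsets[k] becomes the first output slot of key k.
    for (int k = 0; k < num_keys; k++) {
      offsets[k + 1] += offsets[k];
    }

    // Stable scatter of the original indices into their sorted positions.
    for (std::size_t i = start; i < end; i++) {
      sorted_indices[start + offsets[keys[i]]++] = static_cast<int>(i);
    }
  }

  return sorted_indices;
}
```

Each partition is sorted on its own, so the result is only ordered by key within a partition, not globally. That is the trade-off of this kind of scheme: less global coordination in exchange for slightly less coherent ordering across partition boundaries.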
If this pull request is approved, I would prefer to keep the initial 2 commits separate.
@brecht, sorry for bothering you when there is so little time before bcon3, but can you please take a look? These changes are quite small, yet they give a nice performance improvement (on my machine at least), and they mostly just reuse already existing source code, so I think it is reasonable to have these changes in the Blender 3.6 LTS release.
CUDA compilation failed as it didn't like the ifdef in kernel arguments. I've pushed e2e7723aa9, which fixes it.
The changes outside of oneAPI seem easy to re-iterate on if we've missed something by accident. That was the main reason why the PR was not merged yesterday (while similarly sized changes were still happening).
So while the timing is not ideal, I think it is acceptable.