Design: Changes required to target SSE42 #116592
Reference: blender/blender#116592
Introduction
The notes from the admin meeting on 2023-09-12 contained the following:
While this sounded very indecisive, it was actually decided in that meeting to bump the Blender requirements. This was not communicated as such, nor was the platform module informed about it, hence nothing has happened in this area for months.
Implications
Since the notes are a bit vague, I'm unsure what was actually decided. The Blender requirements page has been updated to mention SSE4.2 though, so I assume the new target platform is actually x86-64-v2. This will need to be confirmed by the admins.
Changes Required
CMake
The current TEST_SSE_SUPPORT macro can be updated to test for SSE4.2 rather than SSE and SSE2.
GCC/Clang
On Linux, for both GCC and Clang, we'd test for the availability of the -march=x86-64-v2 flag.
MSVC
None. MSVC will not generate SSE4.2 code on its own: if we were to use the SSE4.2 intrinsics in our code it would happily generate the opcodes for them, but the next step up it supports as a target is AVX. So the current flags of "no architecture flags" will remain. The remnants of 32-bit support in the TEST_SSE_SUPPORT macro can be cleaned up though.
Cycles
If (and that is a very much undecided if, afaik) Cycles wants to follow Blender in bumping its minimum requirements, the following changes are likely needed:
CYCLES_CPU_NO_SSE41 and CYCLES_CPU_NO_SSE2 can be removed.
End user facing changes
Previous changes in the minimum (cough GPU cough) requirements of Blender have been met with an endless stream of bug reports of Blender crashing at startup. For this change I would very much like to prevent this and have a pre-flight check in place to determine whether the current CPU is supported, and if not, politely inform the user about this and gracefully exit the process.
A naive way to implement this would be having the check on the first line of main(). This however would be a mistake, as any static initializers, and initializers from shared libraries, will run long before execution even reaches the main function, hence SSE4.2 instructions will likely be executed before our check could run.
Now on Windows, Blender has a blender-launcher binary to hide the console window from the end user, which would be a great place to put such a check. However, that assumes every user will be using the launcher. Some will, some won't, and some will still be running blender.exe from a script since that is what they have always done.
On Linux a launcher currently (afaik) does not exist, and even if we introduced one, few people would use it.
Proposed solution
Both Windows and Linux (afaik, please correct me if I'm wrong) will load/initialize shared libraries in the same order they were linked, which we can exploit. The following proof of concept code was done on Linux since it's shorter (Windows gets kinda messy with its DllMain, but the same mechanism has been proven to work there).
If we were to make a small shared library with the following code, build it without SSE4.2 flags, and have it be the first thing Blender links against, this should do the trick.
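The proof of concept itself is not reproduced in this text. As a stand-in, here is a minimal sketch of what such a check library could look like on Linux with GCC or Clang. It is an illustration under stated assumptions, not the code from the original post: the file name and the function names cpu_check_has_sse42 and cpu_check_on_load are hypothetical, and it only tests the SSE4.2 bit rather than every feature in x86-64-v2.

```c
/* cpu_check.c (hypothetical name). Build WITHOUT -march/-msse4.2 flags so
 * that this file itself never contains SSE4.2 instructions. */
#include <stdio.h>
#include <stdlib.h>

#if defined(__x86_64__) || defined(__i386__)
#  include <cpuid.h> /* GCC/Clang helper for the CPUID instruction. */
#endif

/* Returns 1 if the CPU reports SSE4.2 support, 0 otherwise. */
static int cpu_check_has_sse42(void)
{
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax, ebx, ecx, edx;
  /* CPUID leaf 1 returns feature flags in ECX/EDX. */
  if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
    return 0;
  }
  /* SSE4.2 is reported in ECX bit 20. */
  return (ecx & (1u << 20)) != 0;
#else
  /* Not an x86 CPU: the SSE4.2 requirement does not apply here. */
  return 1;
#endif
}

/* Runs when the shared library is loaded, i.e. before main(). */
__attribute__((constructor)) static void cpu_check_on_load(void)
{
  if (!cpu_check_has_sse42()) {
    fprintf(stderr, "Blender requires a CPU with SSE4.2 support.\n");
    exit(1);
  }
}
```

Built as a shared library (for example with gcc -shared -fPIC) and linked first, the constructor would run before the initializers of libraries linked after it, provided the load-order assumption above holds.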
Ideally a more visual popup would be shown rather than a printf, but I'll leave this to the imagination of the platform dev for each platform.
Things like OpenImageIO, OpenEXR and perhaps some other external libraries could get built with SSE4.2 options too, right?
Yes, but that won't be relevant until we start the 4.2 library update. @brecht suggested doing Blender itself first, hence the plan focuses on those changes first.
I was looking further into the /arch:sse42 switch being missing for MSVC. It actually looks like without any /arch flags it already generates SSE4.2 code: it does a run-time check before executing any SSE4.2 instructions, and the SSE4 code path is the default non-branched code path. There is an undocumented compiler flag, /d2archSSE42, which removes the run-time check and the generated SSE2 code path.
live example
So while that all sounds great, the odds of the compiler choosing this optimization are relatively low. In all of blenkernel I only saw it applied in BKE_scopes_update.
Not super thrilled about using an undocumented flag though, and even if we do, I doubt there will be a meaningful performance difference, so it's probably best to leave that one alone.
Now that 4.2 development has started, what are the first steps to get going? I assume adding the SSE4.2 check library that Ray suggested is the first thing?
Afterwards further work can be done on enabling SSE4 in the libraries, cleaning up Blender and Cycles code, etc. I am available to help.
I'll take an initial stab at the check, shouldn't be too hard.