Design: Changes required to target SSE42 #116592

Open
opened 2023-12-28 01:35:41 +01:00 by Ray molenkamp · 5 comments
Member

Introduction

The notes from the admin meeting of 2023-09-12 (https://devtalk.blender.org/t/2023-09-12-blender-admins-meeting/) contained the following:

Bumping minimum CPU instruction set for 4.0:

  • Suggestion is to support SSE 4.2. It is very unlikely that computers that do not support this will have a GPU that supports OpenGL4.3 (Blender 4.x requirement)
  • Raising the requirement to AVX won’t bring that many immediate benefits and better wait.
  • Raising the requirement to AVX2 is not acceptable as it leaves behind too many artists

While this sounded very indecisive, it was actually decided in that meeting to bump the Blender requirements. This was not communicated as such, nor was the platform module informed, hence nothing has happened in this area for months.

Implications

Since the notes are a bit vague, I'm unsure what was actually decided. The Blender requirements page has been updated to mention SSE4.2 though, so I assume the new target platform is actually x86-64-v2 (https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels). This will need to be confirmed by the admins.

Changes Required

CMake

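The current TEST_SSE_SUPPORT macro (https://projects.blender.org/blender/blender/src/branch/main/build_files/cmake/macros.cmake#L679) can be updated to test for SSE4.2 rather than SSE and SSE2.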

GCC/Clang

On Linux, for both GCC and Clang, we'd test for the availability of the -march=x86-64-v2 flag.
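As a rough illustration of what such a compiler test can observe: GCC and Clang predefine macros like `__SSE4_2__` once the matching instruction set is enabled, for example via `-march=x86-64-v2`. The sketch below only reports which level a translation unit was compiled for. The names `built_with_sse42`, `compiled_sse_level` and `check_flags_consistent` are made up for this example and are not part of Blender's build system.

```cpp
#include <cassert>
#include <string>

/* GCC and Clang predefine these macros when the matching instruction set is
 * enabled, for example through -march=x86-64-v2 or -msse4.2. */
#if defined(__SSE4_2__)
constexpr bool built_with_sse42 = true;
#else
constexpr bool built_with_sse42 = false;
#endif

#if defined(__SSE4_1__)
constexpr bool built_with_sse41 = true;
#else
constexpr bool built_with_sse41 = false;
#endif

/* Hypothetical helper: name the highest SSE level this file was compiled for. */
std::string compiled_sse_level()
{
  if (built_with_sse42) {
    return "sse4.2";
  }
  if (built_with_sse41) {
    return "sse4.1";
  }
  return "below sse4.1 (or not an x86 GCC/Clang build)";
}

/* On GCC/Clang, enabling SSE4.2 also enables SSE4.1, so this should hold. */
void check_flags_consistent()
{
  assert(!built_with_sse42 || built_with_sse41);
}
```

Compiling this once with and once without the flag and comparing the result of `compiled_sse_level()` is a quick way to confirm that the flag actually reached the compiler.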

MSVC

None. MSVC has no switch for targeting SSE4.2: if we use SSE4.2 intrinsics in our code it will happily generate the opcodes for them, but it will not generate SSE4.2 code on its own; the next step up it supports is AVX. So the current setup of "no architecture flags" will remain. The remnants of 32-bit support in the TEST_SSE_SUPPORT macro can be cleaned up though.

Cycles

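If (and that is a very much undecided if, afaik) Cycles wants to follow Blender in bumping its minimum requirements, the following changes are likely needed:

  • intern/cycles/kernel/device/cpu/kernel.cpp can be updated to have SSE42 as the lower bar
  • intern/cycles/kernel/device/cpu/kernel_sse2.cpp can be removed
  • intern/cycles/kernel/device/cpu/kernel_sse42.cpp can be removed
  • tests for CYCLES_CPU_NO_SSE41 and CYCLES_CPU_NO_SSE2 can be removed
  • Update the python debug UI to remove these kernels
  • Likely more, these are just the most obvious changes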

End user facing changes

Previous changes in Blender's minimum (cough GPU cough) requirements have been met with an endless stream of bug reports of Blender crashing at startup. For this change I would very much like to prevent this and have a pre-flight check in place to determine whether the current CPU is supported, and if not, politely inform the user and gracefully exit the process.

A naive way to implement this would be to put the check on the first line of main(). This however would be a mistake, as any static initializers, including those from shared libraries, will run long before execution even reaches the main function, hence SSE4.2 instructions will likely be executed before our check could run.
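The ordering problem can be seen in a few lines. This is a minimal sketch with made-up names (`startup_log`, `GlobalObject`, `cpu_check_in_main`) standing in for a global object somewhere in Blender or its libraries and for a check placed at the top of main():

```cpp
#include <cassert>
#include <string>
#include <vector>

/* Function-local static, so the log exists before any global constructor
 * tries to write to it. */
std::vector<std::string> &startup_log()
{
  static std::vector<std::string> log;
  return log;
}

/* Stand-in for any global object in the executable or a linked library. */
struct GlobalObject {
  GlobalObject()
  {
    startup_log().push_back("global constructor");
  }
};
static GlobalObject global_object;

/* Stand-in for a CPU check placed on the first line of main(). */
void cpu_check_in_main()
{
  /* By the time this runs, the global constructor has already executed. */
  assert(!startup_log().empty());
  startup_log().push_back("cpu check in main");
}
```

Calling `cpu_check_in_main()` as the first statement of main() leaves "global constructor" as the first log entry. If that constructor had been compiled to use SSE4.2 instructions, the process would already have crashed on an unsupported CPU.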

Now on Windows, Blender has a blender-launcher binary to hide the console window from the end user, which would be a great place to put such a check. However, that assumes every user will be using the launcher: some will, some won't, and some will still be running blender.exe from a script since that is what they have always done.

On Linux a launcher currently (afaik) does not exist, and even if we introduced one, few people would use it.

Proposed solution

Both Windows and Linux (afaik, please correct me if I'm wrong) will load/initialize shared libraries in the same order they were linked, which we can exploit. The following proof-of-concept code was done on Linux since it's shorter (Windows gets kinda messy with its DllMain, but the same mechanism has been proven to work there).

If we were to make a small shared library with the following code, build it without SSE4.2 flags, and have it be the first thing Blender links against, this should do the trick:

/* SPDX-FileCopyrightText: 2023 Blender Authors
 *
 * SPDX-License-Identifier: GPL-2.0-or-later */

/** \file
 * \ingroup creator
 */

#include <cstdio>
#include <cstdlib>

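/* Placeholder: a real check would query the CPU for the SSE4.2 feature bit. */
static bool check_sse42() { return false; } // TODO: Implement this

/* Runs when this shared library is loaded, before main() is reached. */
static __attribute__((constructor)) void cpu_check(void)
{
  if (check_sse42()) {
    printf("sse42 supported!\n");
  }
  else {
    /* Leave before any code built with SSE4.2 flags gets a chance to run. */
    printf("sse42 not supported, exiting...\n");
    exit(EXIT_FAILURE);
  }
}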

Ideally a more visual popup would be shown rather than a printf, but I'll leave this to the imagination of the platform dev for each platform.
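One possible way to fill in the `check_sse42()` TODO, as a sketch rather than the final implementation: CPUID leaf 1 reports SSE4.2 in bit 20 of ECX. The version below assumes `__get_cpuid` from `<cpuid.h>` on GCC/Clang and `__cpuid` from `<intrin.h>` on MSVC. The helper `has_feature_bit` and the `CPU_CHECK_*` macros are made up for this example. Note that x86-64-v2 also covers other features (POPCNT, SSSE3, ...) which are not tested here.

```cpp
#include <cassert>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#  include <intrin.h>
#  define CPU_CHECK_X86_MSVC 1
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#  include <cpuid.h>
#  define CPU_CHECK_X86_GNU 1
#endif

/* Test a single bit of a CPUID result register. */
static bool has_feature_bit(unsigned int reg, int bit)
{
  assert(bit >= 0 && bit < 32);
  return ((reg >> bit) & 1u) != 0;
}

/* CPUID leaf 1 reports SSE4.2 support in ECX bit 20. */
static bool check_sse42()
{
#if defined(CPU_CHECK_X86_MSVC)
  int regs[4] = {0, 0, 0, 0};
  __cpuid(regs, 1);
  return has_feature_bit(static_cast<unsigned int>(regs[2]), 20);
#elif defined(CPU_CHECK_X86_GNU)
  unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
  if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) == 0) {
    return false; /* CPUID leaf 1 unavailable: treat as unsupported. */
  }
  return has_feature_bit(ecx, 20);
#else
  return true; /* Not an x86 build, so the SSE4.2 requirement does not apply. */
#endif
}
```

Like the rest of the check library, this file would have to be compiled without SSE4.2 flags so that the check itself can run on the CPUs it is meant to reject.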

Ray molenkamp added this to the 4.2 LTS milestone 2023-12-28 01:35:41 +01:00
Ray molenkamp added the Type: Design label 2023-12-28 01:35:41 +01:00
Brecht Van Lommel was assigned by Ray molenkamp 2023-12-28 01:36:14 +01:00
Thomas Dinges was assigned by Ray molenkamp 2023-12-28 01:36:14 +01:00
Ray molenkamp self-assigned this 2023-12-28 01:36:15 +01:00
Campbell Barton was assigned by Ray molenkamp 2023-12-28 01:36:30 +01:00

Things like OpenImageIO, OpenEXR and perhaps some other external libraries could get built with SSE4.2 options too, right?

Author
Member

Yes, but that won't be relevant until we start the 4.2 library update. @brecht suggested doing Blender itself first, hence the plan focuses on those changes first.

Author
Member

I was looking further into the missing /arch:sse42 switch for MSVC. It actually looks like it already generates SSE4.2 code without any /arch flags: it does a run-time check before executing any SSE4.2 instructions, and the SSE4 code path is the default, non-branched code path. There is an undocumented compiler flag, /d2archSSE42, which removes the run-time check and the generated SSE2 code path.

live example: https://godbolt.org/z/hnbnrdxex

So while that all sounds great, the odds of the compiler choosing this optimization are relatively low; in all of blenkernel I only saw it applied in BKE_scopes_update.

Not super thrilled about using an undocumented flag though, and even if we do, I doubt there will be a meaningful performance difference, so it's probably best to leave that one alone.
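For readers unfamiliar with the pattern, here is a hand-written analogue of "run-time check before executing any SSE4.2 instructions". It is not what MSVC emits, just a sketch of the same idea on GCC/Clang, using `__builtin_cpu_supports` and the SSE4.2 `_mm_crc32_u8` intrinsic with a portable fallback. The function names (`crc32c`, `crc32c_portable`, `crc32c_sse42`) and the `HAVE_SSE42_PATH` macro are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#  include <nmmintrin.h>
#  define HAVE_SSE42_PATH 1
#endif

/* Portable bitwise CRC-32C (Castagnoli), usable on any CPU. */
static std::uint32_t crc32c_portable(const unsigned char *data, std::size_t len)
{
  std::uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < len; i++) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; bit++) {
      crc = (crc >> 1) ^ ((crc & 1u) ? 0x82F63B78u : 0u);
    }
  }
  return ~crc;
}

#if defined(HAVE_SSE42_PATH)
/* Same checksum using the SSE4.2 CRC32 instruction. The target attribute lets
 * this one function use SSE4.2 while the rest of the file does not. */
__attribute__((target("sse4.2")))
static std::uint32_t crc32c_sse42(const unsigned char *data, std::size_t len)
{
  std::uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < len; i++) {
    crc = _mm_crc32_u8(crc, data[i]);
  }
  return ~crc;
}
#endif

/* Run-time dispatch: only take the SSE4.2 path if the CPU reports support. */
std::uint32_t crc32c(const unsigned char *data, std::size_t len)
{
  assert(data != nullptr || len == 0);
#if defined(HAVE_SSE42_PATH)
  if (__builtin_cpu_supports("sse4.2")) {
    return crc32c_sse42(data, len);
  }
#endif
  return crc32c_portable(data, len);
}
```

Once SSE4.2 is the guaranteed minimum, the branch and the fallback become dead weight, which is roughly what /d2archSSE42 is described as removing from the compiler-generated code.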


Now that 4.2 development has started, what are the first steps to get going? I assume adding the SSE4.2 check library that Ray suggested is the first thing?

Afterwards further work can be done on enabling SSE4 in the libraries, cleaning up Blender and Cycles code etc. I am available to help.

Author
Member

I'll take an initial stab at the check, shouldn't be too hard.

Reference: blender/blender#116592