Windows ARM64 Support #119126

Open
opened 2024-03-06 16:05:48 +01:00 by Brecht Van Lommel · 22 comments

Libraries

  • [x] Create lib-windows_arm64 repository (https://projects.blender.org/blender/lib-windows_arm64)
  • [x] Populate repository
  • [x] Integrate with build system and submodules

Features

  • [x] SIMD support in BLI_simd.h once blenlib is converted to C++
  • [x] OpenPGL support

Buildbot

  • [x] Provision machines
  • [x] Integration (infrastructure/blender-devops!3)

Gitea

  • [ ] Build by default via blender-bot

Issues

  • [x] Test failures
    • [x] io_wavefront
    • [x] constraints
    • [x] Various python tests, libs appear to be missing some python dlls (see Sergey's comment below)
  • [x] TBBmalloc: skip allocation functions replacement in ucrtbase.dll: unknown prologue for function free
    • WITH_TBB_MALLOC_PROXY=OFF may work around it by turning off tbbmalloc
  • [x] The viewport flickers sometimes when zoomed in (clay view, textured view is fine) - difficult to reproduce, but happens from time to time.
  • [x] There is a shader compiler issue which will have a BSP update out in due course - the effect is noticeable on the 3.3 splash in the viewport, where meshes fail to render in some cases
Member

I had io_wavefront fail in #118468 because __SSE2__ was accidentally not defined at one point; it could be the same root cause.

@LazyDodo Do you know which assert it was failing on due to that? The line from ctest on the wavefront failure for me is:

BLI_assert failed: C:\WoA\blender-build-arm64\blender\source\blender\io\wavefront_obj\importer\importer_mesh_utils.cc:78, fixup_invalid_face(), at 'idx >= 0 && idx < face_verts.size()'
Member

I'm about 99% certain that's the same assert.

As noted by @aras_p, disabling SIMD causes at least one of these test failures (wavefront): https://projects.blender.org/blender/blender/pulls/118468#issuecomment-1140814

Given that the stl failure is also in a maths library, and the constraints failure is on precision, this could be our culprit for the ARM64 failures, as SIMD is disabled until blenlib becomes C++.

Test failures log attached here


I've added Windows Arm64 builders to buildbot. Some tests are failing, on top of the known ones: https://builder.blender.org/admin/#/builders/244/builds/1

  • io_curve_svg_* seems to be failing due to missing expat. And, in fact, I did not see expat.pyd in the Python DLLs folder (https://projects.blender.org/blender/lib-windows_arm64/src/branch/main/python/311/DLLs)
  • io_usd_* seems to be failing due to another missing module
  • scripts_* is a bit tricky to follow. One obvious related issue is numpy, which is missing numpy.core._multiarray_umath

P.S. Dropping a message here so the information is not lost until Anthony is back.
Member

numpy.core._multiarray_umath is missing its .pyd files, similar to expat. The .pdb files are there though, so it looks like the .pyd files got built but just didn't make it into git.

Thanks @Sergey and @LazyDodo - I am back now and catching up - I have kicked off a local rebuild of the libs from the tip of main, and will update the prebuilts directory when/if it successfully completes.

You're right that those files seem to be missing in the Git LFS repo - I'm not sure why that would be the case, as my "output" folder from the lib build (an old one I pulled out from a few months ago) seems to contain them...

Once the latest build is complete and functional I will diff the file lists against x64 to make sure nothing else got unintentionally missed.
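A file-list diff of this kind could be sketched roughly as below. This is only an illustration, not the script actually used: the helper names `relative_files` and `missing_from` are hypothetical, and it assumes both library trees use the same relative layout. Files whose names embed the architecture would show up as false positives and need filtering by hand.

```python
from pathlib import Path


def relative_files(root):
    """Return every file under root as a POSIX-style path relative to root."""
    root = Path(root)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def missing_from(reference_root, candidate_root):
    """List files present under reference_root but absent under candidate_root."""
    # Set difference: everything the reference tree has that the candidate lacks.
    return sorted(relative_files(reference_root) - relative_files(candidate_root))
```

Calling `missing_from(x64_lib_dir, arm64_lib_dir)` with the two checkout paths would then list candidates such as the dropped .pyd files; the reverse call shows files that exist only on the ARM64 side.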
Author
Owner

.pyd files were ignored through .gitignore.

I've removed them from there now: blender/lib-windows_arm64@6da81e67a9. We only need to ignore them on Linux and macOS.


ACK, that would make sense :)

RE the TBBmalloc thing - it's always been an issue, even when running the emulated x64 version of blender on an ARM64 device. Other than the error message being a bit spammy, it seems relatively harmless. Is the better option here to disable it completely?

Member

It's not entirely harmless; there is a distinct performance penalty. The standard MS allocator is not very good in multi-threaded environments; see D6218 (https://archive.blender.org/developer/D6218) for some benchmarks. Do note that around that time we had our own BVH builder; nowadays we use Embree, and the performance profile will likely have changed (the old builder allocated many small memory blocks from many threads; I don't think Embree does this).

Given TBB isn't able to hook the old allocator, having it enabled really adds nothing. I say we disable WITH_TBB_MALLOC_PROXY in CMake for WOA until we can make it work or move to a version that supports it on WOA.

Understood - I think this is a solved problem in newer versions of TBB (i.e., not 2020u3, which I had to port to ARM64 by hand) - do we know what timescales are looking like for TBB/USD updates?

Last I heard it was "at some point" - it would also reduce the patch size for TBB dramatically for the ARM64 stuff.
Member

Unknown. We're bound by the VFX Platform (https://vfxplatform.com/) there; they have been saying "next year" for a few years now, and it never happens due to lack of support for newer TBB in USD. The draft for 2025 should be posted in the next month or so, then we'll know more.
Author
Owner

I posted pull requests to USD (https://github.com/PixarAnimationStudios/OpenUSD/pull/2466) to support newer TBB last year. Half of them were merged last week, so hopefully the other half will not take too long and make it into the next USD release.

I have updated the files on the LFS repo, and done a build on top of main using my updated submodule - all seems well with the tests, with only constraints failing now:

99% tests passed, 1 tests failed out of 281

Total Test time (real) = 534.63 sec

The following tests FAILED:
         33 - constraints (Failed)
Errors while running CTest

I will make a PR to the main blender repo in the next hour or two updating the submodule version - things should work properly after that!


The flickering and scene failures finally have an updated GPU driver available from here: https://www.qualcomm.com/products/mobile/snapdragon/pcs-and-tablets/snapdragon-8-series-mobile-compute-platforms/snapdragon-8cx-gen-3-compute-platform#Software

The TBB stuff is resolved, I switched the malloc proxy off.

Now all that remains (AFAICT?) is OpenPGL, and the clang-cl work I have been doing in #124182.

Given brecht is away, one for @Sergey maybe - the Windows ARM64 machines you guys have will want the latest version of the driver installed from that website (29th of May).


@bartvdbraak Even though the current machines we have are used in an unintended mode for CI/CD, it might be good to update the driver, to potentially enable GPU tests later and not run into some obscure issues. Is it something you can have a look at?

New GPU driver Adreno_Driver_for_Snapdragon_8cx_Gen3_release_8.0_(05-29-2024) was installed on both workers.


Okay, now that #126640 has been merged, I think that we are ready to go!

So, we now need to install this on the workers: https://github.com/llvm/llvm-project/releases/download/llvmorg-18.1.8/LLVM-18.1.8-woa64.exe

And change the C/CXX compiler flags (and also linker maybe? Not sure how the bots are set up) on the worker to point at clang-cl on there for builds.

One for @bartvdbraak, unless @Sergey has any objections?

This compiler switch would take benchmark scores from 158.59 on MSVC to 231.42 on clang-cl, when tested on a Snapdragon X Elite device, a 46% increase by my calculations.
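The quoted percentage can be checked with a few lines of arithmetic. The sketch below uses only the two scores given above; the helper name `percent_increase` is introduced here for illustration.

```python
def percent_increase(before, after):
    """Relative change from before to after, expressed in percent."""
    return (after - before) / before * 100.0


msvc_score = 158.59      # benchmark score quoted above for MSVC
clang_cl_score = 231.42  # benchmark score quoted above for clang-cl

# (231.42 - 158.59) / 158.59 is roughly 0.459, i.e. about 46% when rounded.
print(f"{percent_increase(msvc_score, clang_cl_score):.1f}%")
```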


Going clang-cl for those workers sounds good to me.

Do you happen to have some script or command line for CMake to use clang-cl? Having it would save some time figuring out what exactly needs to change build-system-wide.
Member

I haven't tested it on WOA but make.bat clang ninja will likely be enough to get you going (assuming you have used the installer from https://github.com/llvm/llvm-project/releases/ )


What Ray said, really - I have been using the command line make.bat arm64 2022 with_tests clang ninja - looking at the command line used on the builders, it looks like you would need to set the tools (CXX, C compilers, linker, etc.) manually to that directory.
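For builders that invoke CMake directly rather than through make.bat, pointing the tools at the LLVM install might look roughly like the sketch below. It is an assumption-laden illustration, not the buildbot's actual configuration: the helper `clang_cl_cmake_args` is hypothetical, `C:\Program Files\LLVM\bin` is only the usual default location of the LLVM installer, and whether the Blender build honours `CMAKE_LINKER` set this way has not been verified here.

```python
from pathlib import PureWindowsPath


def clang_cl_cmake_args(llvm_bin_dir):
    """Build CMake cache arguments selecting clang-cl and lld-link from llvm_bin_dir."""
    bin_dir = PureWindowsPath(llvm_bin_dir)
    # Standard CMake variables mapped to the executables shipped in LLVM's bin folder.
    tools = {
        "CMAKE_C_COMPILER": "clang-cl.exe",
        "CMAKE_CXX_COMPILER": "clang-cl.exe",
        "CMAKE_LINKER": "lld-link.exe",
    }
    # as_posix() yields forward slashes, which CMake accepts on Windows.
    return [f"-D{var}={(bin_dir / exe).as_posix()}" for var, exe in tools.items()]


# Assumed default install directory; adjust to wherever LLVM was installed on the worker.
for arg in clang_cl_cmake_args(r"C:\Program Files\LLVM\bin"):
    print(arg)
```

The resulting list can be appended to an existing CMake configure command; passing it as separate list items (for example to `subprocess.run`) avoids quoting problems with the space in the path.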
Reference: blender/blender#119126