Buildbot: Install Vulkan on workers for running tests #121

Open
opened 2024-08-23 15:21:34 +02:00 by Bart van der Braak · 8 comments

@Jeroen-Bakker requested a Vulkan installation for the Buildbot workers:

  • Have a patch ready to run tests on
  • Deploy Vulkan to Linux workers on UATEST
  • Run successful tests on UATEST workers
  • Deploy Vulkan to Linux workers on Production
  • Run successful tests on Production workers

  • Discuss if this also should be done for Windows workers.
Bart van der Braak added the Service/Buildbot and Type/Deployment labels 2024-08-23 15:21:34 +02:00
Bart van der Braak self-assigned this 2024-08-23 15:21:35 +02:00
Bart van der Braak added this to the DevOps Progress Board project 2024-08-23 15:21:35 +02:00

Some side notes.

  • First, test with the blender-bot +gpu parameter. That should already run the tests on a Vulkan-capable machine.
  • Installing lavapipe on Linux seems to come down to installing a specific Mesa package; we need to find out which one.
    Building the initial patch might take some time. Code-wise it is easy; validating that it is correct might take more time...

I spent today checking out the current state of things; these are my notes on Windows:

There are three main parts where Vulkan is used:

1- Shader builder: compiles all shaders at build time and checks that they are valid
2- WITH_GPU_DRAW_TESTS: tests the draw manager?
3- WITH_GPU_RENDER_TESTS: render tests

Global setup is fairly straightforward:

1- Grab a copy of Mesa for Windows from https://github.com/pal1000/mesa-dist-win and unpack it somewhere. I used f:\tools\Mesa-24.2.4\, but feel free to be creative here.

2- Set the following environment variable:

VK_DRIVER_FILES=f:\tools\Mesa-24.2.4\x64\lvp_icd.x86_64.json

This should be enough, but if you want to be certain it is not picking up any other stray ICDs, you can set the following two environment variables:

VK_LOADER_DRIVERS_DISABLE=*
VK_LOADER_DRIVERS_SELECT=*lvp*

The following may come in handy during troubleshooting, but it should not be set in the CI environment:

VK_LOADER_DEBUG=all
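
For CI it may be cleaner to scope these variables to the test run rather than the whole worker. Below is a minimal CMake/CTest sketch, not the actual setup: it assumes the GPU suite is registered as a ctest test named gpu (as the timing output later in this issue suggests) and that Mesa lives at the path above; forward slashes are used to avoid CMake escaping issues.

  # Hypothetical sketch: point the Vulkan loader at lavapipe for the GPU test
  # itself rather than machine-wide. The ctest test name "gpu" and the Mesa
  # path are assumptions.
  if(WIN32)
    set_tests_properties(gpu PROPERTIES ENVIRONMENT
      "VK_DRIVER_FILES=f:/tools/Mesa-24.2.4/x64/lvp_icd.x86_64.json;VK_LOADER_DRIVERS_DISABLE=*;VK_LOADER_DRIVERS_SELECT=*lvp*"
    )
  endif()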

Shader builder

1- Enable the CMake option WITH_GPU_BUILDTIME_SHADER_BUILDER.

2- Rebuild Blender.
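
For reference, one way to flip the option outside the CMake GUI is an initial-cache file (or the equivalent -D flag on the configure command line); the snippet below is only a sketch and assumes nothing else about the build configuration.

  # Hypothetical sketch: an initial-cache file (used with `cmake -C <file>`)
  # that turns on the build-time shader builder; equivalent to passing
  # -DWITH_GPU_BUILDTIME_SHADER_BUILDER=ON when configuring.
  set(WITH_GPU_BUILDTIME_SHADER_BUILDER ON CACHE BOOL "" FORCE)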

Notes/Points of concern:

  • While convenient for the GPU developers, for the rest of us this should NOT be a build-time option; it is clearly a test and should have been integrated into ctest instead.

  • It is slow: the Vulkan tests take a little over 4 minutes. The OpenGL tests at least seem to cache something and are faster the second time around; Vulkan does not appear to have that benefit.

  • It is not very chatty and displays no progress whatsoever; it just sits there for the duration of the test with a blinking cursor before telling you whether it succeeded.

  • When running without the --gpu-backend switch it runs both the OpenGL and Vulkan tests, which is great, except when it has issues getting an OpenGL context; that made it bail out locally and skip the Vulkan tests. We may need to pass a backend from CMake there.

WITH_GPU_DRAW_TESTS

No changes needed. For me locally many tests are failing, but then again so do they on my NVIDIA card, so it is not doing that much worse.

It takes about 1700s to complete.

Notes/Points of concern:

1700s is not great....

WITH_GPU_RENDER_TESTS

No changes needed. For me locally all tests appear to be crashing, but then again so do they on my NVIDIA card, so it is not doing that much worse (seemingly a different crash, though).

NVIDIA crashes in Blender:

blender.exe         :0x00007FF66A763D70  BLI_strncpy K:\BlenderGit\blender\source\blender\blenlib\intern\string.c:72
blender.exe         :0x00007FF66CB81D20  blender::gpu::VKPipelinePool::read_from_disk K:\BlenderGit\blender\source\blender\gpu\vulkan\vk_pipeline_pool.cc:717
blender.exe         :0x00007FF66CBADA10  blender::gpu::VKDevice::init K:\BlenderGit\blender\source\blender\gpu\vulkan\vk_device.cc:99

With Mesa enabled, it crashes inside Mesa instead.

Overall impression:

This test suite is nowhere near ready for CI; between the performance concerns and the failing/crashing tests, it feels like this needs to mature on a developer's workstation for a bit longer.


Noteworthy mentions:

  • The option WITH_GPU_BUILDTIME_SHADER_BUILDER is not needed anymore and will be removed soon (#129014), as it is replaced by the shader compilation tests. Developers should just run the test after building.
  • WITH_GPU_DRAW_TESTS was introduced to test the draw manager (from the draw module), but was reused to enable the GPU tests (sigh). See:
  if(WITH_GPU_DRAW_TESTS)
    list(APPEND TEST_SRC
      tests/buffer_texture_test.cc
      tests/compute_test.cc
      tests/framebuffer_test.cc
      tests/immediate_test.cc
      tests/index_buffer_test.cc
      tests/push_constants_test.cc
      tests/shader_create_info_test.cc
      tests/shader_test.cc
      tests/specialization_constants_test.cc
      tests/state_blend_test.cc
      tests/storage_buffer_test.cc
      tests/texture_test.cc
      tests/vertex_buffer_test.cc
    )
  endif()
  • We should really focus on getting the shader compilation tests running first. These only require WITH_GTESTS and valid context creation. Metal already makes this exception for running them (a possible Vulkan equivalent is sketched after this list):
  # Enable shader validation on build-bot for Metal
  if(WITH_METAL_BACKEND AND NOT WITH_GPU_DRAW_TESTS)
    list(APPEND TEST_SRC
      tests/shader_create_info_test.cc
    )
  endif()
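
A possible Vulkan counterpart to that Metal exception, purely as a sketch: the WITH_VULKAN_BACKEND guard is an assumption about the existing backend option, and only the shader compilation test would be registered when the draw tests are off.

  # Hypothetical sketch, mirroring the Metal exception above for Vulkan builds:
  # enable only shader validation even when WITH_GPU_DRAW_TESTS is disabled.
  if(WITH_VULKAN_BACKEND AND NOT WITH_GPU_DRAW_TESTS)
    list(APPEND TEST_SRC
      tests/shader_create_info_test.cc
    )
  endif()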

I don't think enabling all the tests is a short-term target. If we can have the shader tests on all platforms, that would already be nice.


Proposal:

  • WITH_GPU_RENDER_TESTS: same as now. Render tests, focused on EEVEE, with long execution times. Many things can be tweaked to make it more usable, but that should be done together with the rendering module.
  • WITH_GPU_DRAW_TESTS: remove from the GPU module.
  • WITH_GPU_BACKEND_TESTS Add tests that require backends. Not
  • WITH_GTESTS: contains tests that don't require backends. Vulkan already has several tested this way.

Nice to see that the Mesa ICD can be used on Windows. I haven't tested it, but it could be a nice addition to what I expected to be done.


Alright, I ran the GPU tests on actual hardware (a GTX 1660) on Windows:

[==========] 509 tests from 8 test suites ran. (2987669 ms total)
[  PASSED  ] 400 tests.
[  SKIPPED ] 13 tests, listed below:
[  FAILED  ] 96 tests, listed below:
...
"gpu" time elapsed: 00:49:49

50 minutes is just too much. On the upside, a fair bit of that is the OpenGL shader test (GPUOpenGLTest.static_shaders (1098249 ms)), which I'd like to leave out of the testing track for now. Registering a Vulkan ICD is easy on Windows (just some env vars); installing an OpenGL ICD is a bit more work and much less flexible. It can be done, but I'd like to kick that down the road as a problem for future us.

I agree with @Jeroen-Bakker here that having a bit more control over what is enabled would be nice. I'd go a little further and add a flag for enabling backends for testing (i.e. we'd still want to build with both the OpenGL and Vulkan backends, since the build is going to end users, but maybe we just want to test the Vulkan backend), perhaps WITH_GPU_TESTS_BACKENDS=opengl;vulkan;metal (not sold on the name, as it can easily give the impression this may also influence Cycles, and people may go sticking OptiX/CUDA/HIP in there...).
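
A rough, purely illustrative sketch of how such a flag could be consumed; the variable name follows the suggestion above, and the per-backend gating is an assumption rather than existing build logic.

  # Hypothetical sketch: a semicolon-separated list selecting which of the
  # built GPU backends actually get their tests registered.
  set(WITH_GPU_TESTS_BACKENDS "opengl;vulkan" CACHE STRING
      "GPU backends to run the GPU tests against (opengl;vulkan;metal)")

  if("vulkan" IN_LIST WITH_GPU_TESTS_BACKENDS)
    # Register the Vulkan variants of the GPU tests here.
  endif()
  if("opengl" IN_LIST WITH_GPU_TESTS_BACKENDS)
    # Register the OpenGL variants of the GPU tests here.
  endif()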

I attached the test log. (The printfs during the static_shaders test are mine; I just wanted to see some progress. You can ignore them.)


Yes, we should do the backend-specific testing filter.

OpenGL testing on the buildbot is not the first step, as OpenGL might require an actual GPU and a connected monitor. So I would focus on getting Vulkan working as a first step; Metal can follow.

The issue with static shader testing is that OpenGL does both frontend and backend compilation. The NVIDIA OpenGL compiler is just slow; the second run can be faster as we currently cache the previous result, although during testing you might not want to load the compiled binaries from disk.

The target is to focus on SPIR-V frontend testing. This is fast and should narrow down the GLSL compilation issues, although we still need to find out where and how to do the linting/static analysis part to get better-quality GLSL code.


Proposal:

  • WITH_GPU_BACKEND_TESTS Add tests that require backends. Not

Did something get lost in this sentence? I can't quite understand it.


Down to 8 or so failures. It seems the WITH_GPU_DRAW_TESTS define is missing, causing a whole bunch of the test shaders not to be found. I'm somewhat confused how this could be passing on any of the other platforms?

diff --git a/source/blender/gpu/CMakeLists.txt b/source/blender/gpu/CMakeLists.txt
index a893b644d7c..f81d4e28752 100644
--- a/source/blender/gpu/CMakeLists.txt
+++ b/source/blender/gpu/CMakeLists.txt
@@ -698,6 +698,7 @@ set(MSL_SRC
 if(WITH_GTESTS)
   if(WITH_GPU_DRAW_TESTS)
     list(APPEND GLSL_SRC ${GLSL_SRC_TEST})
+       add_definitions(-DWITH_GPU_DRAW_TESTS)
   endif()
 endif()