Buildbot: Upgrade to latest HIP compiler on Windows #81

Closed
opened 2024-05-24 21:13:19 +02:00 by Brecht Van Lommel · 8 comments

Previously the HIP compiler was available as a zip file that we could simply decompress. Now it is distributed as an installer, so the provisioning scripts will need some work to install it automatically.

Brecht Van Lommel added the Service/Buildbot label 2024-05-24 21:13:28 +02:00
Bart van der Braak was assigned by Sergey Sharybin 2024-05-27 10:01:39 +02:00

I have found that running the installer creates a directory structure closely resembling the compressed zip files we used previously:

#Requires -RunAsAdministrator
# Download the 23.Q3 (HIP SDK 5.5) and 23.Q4 (HIP SDK 5.7) installers
Invoke-WebRequest `
  -Uri "https://download.amd.com/developer/eula/rocm-hub/AMD-Software-PRO-Edition-23.Q3-Win10-Win11-For-HIP.exe" `
  -OutFile "$env:USERPROFILE\Downloads\Setup-5.5.exe"
Invoke-WebRequest `
  -Uri "https://download.amd.com/developer/eula/rocm-hub/AMD-Software-PRO-Edition-23.Q4-Win10-Win11-For-HIP.exe" `
  -OutFile "$env:USERPROFILE\Downloads\Setup-5.7.exe"

# Run each installer unattended and wait for it to finish, with one log file per version
Start-Process "$env:USERPROFILE\Downloads\Setup-5.5.exe" -ArgumentList '-install','-log',"$env:USERPROFILE\installer_log_5.5.txt" -NoNewWindow -Wait
Start-Process "$env:USERPROFILE\Downloads\Setup-5.7.exe" -ArgumentList '-install','-log',"$env:USERPROFILE\installer_log_5.7.txt" -NoNewWindow -Wait

These commands created the directories C:\Program Files\AMD\ROCm\5.5 and C:\Program Files\AMD\ROCm\5.7, which have the same structure as our previous zipped packages, though with additional includes.
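The install step could also be driven from a Python provisioning helper. The sketch below is hypothetical and is not part of the existing provisioning scripts: it reuses the `-install` and `-log` installer arguments from the PowerShell commands, writes one log file per version, and checks that the expected `C:\Program Files\AMD\ROCm\<version>` directory exists afterwards. The `ROCM_ROOT` constant and all helper names are assumptions for illustration.

```python
import subprocess
from pathlib import Path, PureWindowsPath

# Install root observed after running the AMD HIP installers (assumed fixed).
ROCM_ROOT = PureWindowsPath(r"C:\Program Files\AMD\ROCm")


def installer_command(installer: PureWindowsPath, log_file: PureWindowsPath) -> list[str]:
    """Build the unattended install command, mirroring the PowerShell example."""
    return [str(installer), "-install", "-log", str(log_file)]


def rocm_dir(short_version: str) -> PureWindowsPath:
    """Directory the installer creates for a short version such as '5.7'."""
    return ROCM_ROOT / short_version


def install_hip(installer: PureWindowsPath, short_version: str, log_dir: PureWindowsPath) -> Path:
    """Run one installer (Windows only) and verify that its ROCm directory exists."""
    # One log per version so a second installer run does not overwrite the first log.
    log_file = log_dir / f"installer_log_{short_version}.txt"
    subprocess.run(installer_command(installer, log_file), check=True)
    target = Path(rocm_dir(short_version))
    if not target.is_dir():
        raise FileNotFoundError(f"expected HIP install at {target}")
    return target
```

Note that `Start-Process -Wait` also waits for child processes, whereas `subprocess.run` only waits for the installer process itself, so this may need adjusting if the installer hands its work off to a child process.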

I attempted to compile using one of these versions (for example, by moving and renaming the 5.7 directory to C:\ProgramData\AMD\HIP\hip_sdk_5.7.32000), but it appears CMake only sets HIP_ROOT_DIR when hip_sdk_5.5.30571 is available.

I am investigating whether there is a misconfiguration in the provisioning scripts.

Brecht Van Lommel (Author, Owner)

The HIP version is controlled by this file in the Blender repo:
https://projects.blender.org/blender/blender/src/branch/main/build_files/config/pipeline_config.yaml

In the past I tested this by installing HIP, then making a pull request for Blender that updates the version in this config file, so that it can be tested on the buildbot.

You will probably also need to update the following code to add the hip_sdk prefix only for older versions (e.g. 5.5 and older), or alternatively rename all the existing folders to remove the prefix:
https://projects.blender.org/infrastructure/blender-devops/src/branch/main/buildbot/worker/blender/compile.py#L224
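A minimal sketch of the first option, assuming the configured version is a dotted string such as `5.5.30571` or `5.7.32000` and that the SDKs live under `C:\ProgramData\AMD\HIP` (as in the build log in this thread). The function names and the `(5, 5)` cut-off constant are illustrative assumptions; the actual logic in `compile.py` may be structured differently.

```python
from pathlib import PureWindowsPath

# Root directory holding the unpacked HIP SDKs on the Windows workers.
HIP_ROOT = PureWindowsPath(r"C:\ProgramData\AMD\HIP")

# Hypothetical cut-off: versions up to and including 5.5 keep the legacy prefix.
LAST_PREFIXED_VERSION = (5, 5)


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '5.7.32000'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def hip_sdk_dir(version: str) -> PureWindowsPath:
    """Return the SDK directory, adding the 'hip_sdk_' prefix only for older versions."""
    if parse_major_minor(version) <= LAST_PREFIXED_VERSION:
        return HIP_ROOT / f"hip_sdk_{version}"
    return HIP_ROOT / version
```

Comparing `(major, minor)` tuples avoids string comparison pitfalls such as `"5.10" < "5.7"`. The alternative of renaming all existing folders would reduce this to `HIP_ROOT / version`.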


Thanks @brecht, that helps a lot. I created blender/blender#122393 (https://projects.blender.org/blender/blender/pulls/122393).

I was able to start a build with the new version by running:

cd C:\Users\blender\.devops\services\buildbot-worker\
pipenv run python C:\Users\blender\git\bdr-devops-core\buildbot\code.py --track-id vexp --service-end-id LOCAL --architecture amd64 --patch-id 122393 --commit-id 6672dcc1b5965d24071501ef5f5b743ebd40be1f --needs-gpu-binaries compile-gpu

However, this produced the following errors:

[37/39] cmd.exe /C "cd /D C:\Users\blender\git\blender-vexp\build_release\intern\cycles\kernel && "C:\Program Files\CMake\bin\cmake.exe" -E env HIP_PATH=C:/ProgramData/AMD/HIP/hip_sdk_5.7.32000 C:/ProgramData/AMD/HIP/hip_sdk_5.7.32000/bin/clang++.exe --offload-arch=gfx900 --offload-arch=gfx90c --offload-arch=gfx902 --offload-arch=gfx1010 --offload-arch=gfx1011 --offload-arch=gfx1012 --offload-arch=gfx1030 --offload-arch=gfx1031 --offload-arch=gfx1032 --offload-arch=gfx1034 --offload-arch=gfx1035 --offload-arch=gfx1036 --offload-arch=gfx1100 --offload-arch=gfx1101 --offload-arch=gfx1102 --offload-arch=gfx1103 -fgpu-rdc --hip-link --cuda-device-only C:/Users/blender/git/blender-vexp/build_release/intern/cycles/kernel/kernel_rt_gfx.bc C:/ProgramData/AMD/HIP/hiprtsdk-2.0.3a134c7/hiprt2.0.3a134c7/dist/bin/Release/hiprt02000_amd_lib_win.bc -o C:/Users/blender/git/blender-vexp/build_release/intern/cycles/kernel/kernel_rt_gfx.hipfb"
FAILED: intern/cycles/kernel/kernel_rt_gfx.hipfb
cmd.exe /C "cd /D C:\Users\blender\git\blender-vexp\build_release\intern\cycles\kernel && "C:\Program Files\CMake\bin\cmake.exe" -E env HIP_PATH=C:/ProgramData/AMD/HIP/hip_sdk_5.7.32000 C:/ProgramData/AMD/HIP/hip_sdk_5.7.32000/bin/clang++.exe --offload-arch=gfx900 --offload-arch=gfx90c --offload-arch=gfx902 --offload-arch=gfx1010 --offload-arch=gfx1011 --offload-arch=gfx1012 --offload-arch=gfx1030 --offload-arch=gfx1031 --offload-arch=gfx1032 --offload-arch=gfx1034 --offload-arch=gfx1035 --offload-arch=gfx1036 --offload-arch=gfx1100 --offload-arch=gfx1101 --offload-arch=gfx1102 --offload-arch=gfx1103 -fgpu-rdc --hip-link --cuda-device-only C:/Users/blender/git/blender-vexp/build_release/intern/cycles/kernel/kernel_rt_gfx.bc C:/ProgramData/AMD/HIP/hiprtsdk-2.0.3a134c7/hiprt2.0.3a134c7/dist/bin/Release/hiprt02000_amd_lib_win.bc -o C:/Users/blender/git/blender-vexp/build_release/intern/cycles/kernel/kernel_rt_gfx.hipfb"
lld: error: linking module flags 'amdgpu_code_object_version': IDs have conflicting values in 'C:\Users\blender\AppData\Local\Temp\hiprt02000_amd_lib_win-gfx1010-d8f33f.bc' and 'ld-temp.o'
clang++: error: amdgcn-link command failed with exit code 1 (use -v to see invocation)
[38/39] cmd.exe /C "cd /D C:\Users\blender\git\blender-vexp\build_release\intern\cycles\kernel && "C:\Program Files\CMake\bin\cmake.exe" -E env HIP_PATH=C:/ProgramData/AMD/HIP/hip_sdk_5.7.32000 C:/ProgramData/AMD/HIP/hip_sdk_5.7.32000/bin/hipcc.bat --offload-arch=gfx1012 --genco C:/Users/blender/git/blender-vexp/blender.git/intern/cycles/kernel/device/hip/kernel.cpp -D CCL_NAMESPACE_BEGIN= -D CCL_NAMESPACE_END= -D HIPCC -I C:/Users/blender/git/blender-vexp/blender.git/intern/cycles/kernel/.. -I C:/Users/blender/git/blender-vexp/blender.git/intern/cycles/kernel/device/hip -Wno-parentheses-equality -Wno-unused-value -ffast-math -o C:/Users/blender/git/blender-vexp/build_release/intern/cycles/kernel/kernel_gfx1012.fatbin -D WITH_NANOVDB"
[39/39] cmd.exe /C "cd /D C:\Users\blender\git\blender-vexp\build_release\intern\cycles\kernel && "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc.exe" -arch=sm_61 --cubin C:/Users/blender/git/blender-vexp/blender.git/intern/cycles/kernel/device/cuda/kernel.cu -D CCL_NAMESPACE_BEGIN= -D CCL_NAMESPACE_END= -D NVCC -m 64 -I C:/Users/blender/git/blender-vexp/blender.git/intern/cycles/kernel/.. -I C:/Users/blender/git/blender-vexp/blender.git/intern/cycles/kernel/device/cuda --use_fast_math -o C:/Users/blender/git/blender-vexp/build_release/intern/cycles/kernel/kernel_sm_61.cubin -Wno-deprecated-gpu-targets -ccbin="C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe" -D WITH_NANOVDB -std=c++17"
kernel.cu
ninja: build stopped: subcommand failed.
...
================================================================================
ERROR Summary:
================================================================================
FAILED: intern/cycles/kernel/kernel_rt_gfx.hipfb
lld: error: linking module flags 'amdgpu_code_object_version': IDs have conflicting values in 'C:\Users\blender\AppData\Local\Temp\hiprt02000_amd_lib_win-gfx1010-d8f33f.bc' and 'ld-temp.o'
clang++: error: amdgcn-link command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.

I am trying to figure out whether this is a mistake on my part (adding the new version and expecting it to work out of the box) or an issue with the package and/or the linked HIPRT version.


@brecht Do you think this error might be because HIPRT is not updated, or could there be another cause?

I noticed that the new version of HIPRT available for download (https://gpuopen.com/download/hiprt/hiprtSdk-2.2.0e68f54.zip) is packaged differently from the version we are currently using (hiprtsdk-2.0.3a134c7).

Interestingly, there is a more recent release on their GitHub page:

https://github.com/GPUOpen-LibrariesAndSDKs/HIPRT/releases/tag/2.3.7df94af
Brecht Van Lommel (Author, Owner)

Yeah, I think it's a mismatch with the HIPRT version somehow. We are in the process of switching to this new version, which is part of what prompted the upgrade.

From the infrastructure side, I think deploying this on all machines is enough. We can then take care of solving these build errors and bumping the version in the Blender repository config file.


Perfect, will do.


Summary of actions performed:

  1. Installed AMD-Software-PRO-Edition-23.Q4-Win10-Win11-For-HIP.exe on a Windows 10 VM and extracted the HIP 5.7.2 SDK.
  2. Repackaged the SDK to match our current CMake find commands and placed it on our share at /shared/devops/Software/Workers/windows/hip_sdk_5.7.32000.zip.
    • Created a draft PR on blender/blender to use as a patch on Buildbot (see https://projects.blender.org/blender/blender/pulls/122393).
    • Tested on a local VM with the provisioning changes:
      cd C:\Users\blender\.devops\services\buildbot-worker\ && `
        pipenv run python C:\Users\blender\git\bdr-devops-core\buildbot\worker\code.py `
        --track-id vdev `
        --service-env-id LOCAL `
        --architecture amd64 `
        --commit-id 6672dcc1b5965d24071501ef5f5b743ebd40be1f `
        --needs-gpu-binaries `
        compile-gpu
  3. Updated cmd/provision/common/provision-hip.ps1 to include the new version.
    • See merge request "Upgrade HIP compiler on Windows #11" (https://gitlab.com/blender/bdr-devops-core/-/merge_requests/11).
  4. Pushed to UATEST workers:
    • pwsh -c "cmd/machine/machine-push-workers.ps1" with:
      • Machine-Push-Core-ToWorkers -machinePattern "*" -devOpsEnvIds "UATEST"
      • Machine-Push-Software-ToWorkers -machinePattern "*-windows-*" -devOpsEnvIds "UATEST"
    • pwsh -c "cmd/machine/machine-invoke-workers.ps1" with:
      • Machine-Remote-Invoke -machinePattern "*-windows-*" -devOpsEnvIds "UATEST" -script $installHIP
  5. Ran the pipeline on builder-uatest.blender.org using vexp-code-patch-windows-amd64 (see build 46: https://builder-uatest.blender.org/admin/#/builders/64/builds/46).
  6. Merged develop into master on bdr-devops-core.
  7. Pushed to PROD workers:
    • pwsh -c "cmd/machine/machine-push-workers.ps1" with:
      • Machine-Push-Core-ToWorkers -machinePattern "*" -devOpsEnvIds "PROD"
      • Machine-Push-Software-ToWorkers -machinePattern "*-windows-*" -devOpsEnvIds "PROD"
    • pwsh -c "cmd/machine/machine-invoke-workers.ps1" with:
      • Machine-Remote-Invoke -machinePattern "*-windows-*" -devOpsEnvIds "PROD" -script $installHIP
  8. Ran the pipeline on builder.blender.org using vexp-code-patch-windows-amd64 (see build 5382: https://builder.blender.org/admin/#/builders/136/builds/5382).
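The repackaging in step 2 could be scripted roughly as follows. This is a sketch under stated assumptions: it assumes the archive should contain a single top-level `hip_sdk_<version>` folder, mirroring the `C:\ProgramData\AMD\HIP\hip_sdk_5.7.32000` layout used by the build, and `repackage_sdk` is a hypothetical helper name. The exact layout of the existing archives on the share is not documented in this thread.

```python
import zipfile
from pathlib import Path


def repackage_sdk(installed_dir: Path, full_version: str, out_dir: Path) -> Path:
    """Zip an installed ROCm tree as hip_sdk_<version>.zip with a matching top-level folder."""
    top = f"hip_sdk_{full_version}"
    archive = out_dir / f"{top}.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # sorted() keeps the entry order deterministic across runs.
        for path in sorted(installed_dir.rglob("*")):
            if path.is_file():
                # Store each file relative to the install root, under the new top-level name.
                arcname = Path(top) / path.relative_to(installed_dir)
                zf.write(path, arcname.as_posix())
    return archive
```

For example, `repackage_sdk(Path(r"C:\Program Files\AMD\ROCm\5.7"), "5.7.32000", Path(r"D:\staging"))` would produce `D:\staging\hip_sdk_5.7.32000.zip` (the staging path is hypothetical). Empty directories are not stored, which should not matter for an SDK tree but is worth checking against the old archives.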
Bart van der Braak added this to the DevOps Progress Board project 2024-07-16 13:00:16 +02:00
Bart van der Braak changed title from Upgrade to latest HIP compiler on Windows to Buildbot: Upgrade to latest HIP compiler on Windows 2024-07-17 15:15:28 +02:00