Regression: Cycles: Optix not able to render without cuda toolkit #109550

Closed
opened 2023-06-30 09:35:03 +02:00 by Raimund Klink · 9 comments
Contributor

System Information
Operating system: Linux
Graphics card: NVIDIA A100-SXM4-40GB

Blender Version
Broken: 3.6
Worked: 3.5.1

Short description of error
Rendering with OptiX no longer works unless the CUDA toolkit is installed.
The same machine renders without issues with Blender 3.5.1.
Is this expected?

Exact steps for others to reproduce the error

Fra:1 Mem:2762.21M (Peak 2828.77M) | Time:00:01.25 | Mem:162.15M, Peak:162.15M | Scene, ViewLayer | Updating Images | Loading Map #0
Fra:1 Mem:2640.13M (Peak 3850.55M) | Time:00:02.11 | Mem:1724.49M, Peak:1724.49M | Scene, ViewLayer | Waiting for render to start
Fra:1 Mem:2640.13M (Peak 3850.55M) | Time:00:02.11 | Mem:1724.49M, Peak:1724.49M | Scene, ViewLayer | Loading render kernels (may take a few minutes the first time)
/usr/bin/which: no nvcc in (/opt/software/packages/jdk/19.0.1/bin:/home/dattila/.local/bin:/home/dattila/bin:/opt/cray/pe/mpich/8.1.24/ofi/cray/10.0/bin:/opt/cray/pe/mpich/8.1.24/bin:/opt/cray/pe/craype/2.7.19/bin:/opt/cray/pe/cce/15.0.1/binutils/x86_64/x86_64-pc-linux-gnu/bin:/opt/cray/pe/cce/15.0.1/binutils/cross/x86_64-aarch64/aarch64-linux-gnu/../bin:/opt/cray/pe/cce/15.0.1/utils/x86_64/bin:/opt/cray/pe/cce/15.0.1/bin:/opt/cray/pe/perftools/23.02.0/bin:/opt/cray/pe/papi/7.0.0.1/bin:/opt/cray/libfabric/1.15.2.0/bin:/opt/clmgr/sbin:/opt/clmgr/bin:/opt/sgi/sbin:/opt/sgi/bin:/usr/share/Modules/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/c3/bin:/sbin:/bin:/opt/cray/pe/bin)
CUDA nvcc compiler not found. Install CUDA toolkit in default location.
Refer to the Cycles GPU rendering documentation for possible solutions:
https://docs.blender.org/manual/en/latest/render/cycles/gpu_rendering.html
Fra:1 Mem:2640.13M (Peak 3850.55M) | Time:00:02.50 | Mem:1724.49M, Peak:1724.49M | Scene, ViewLayer | CUDA nvcc compiler not found. Install CUDA toolkit in default location.
Error: CUDA nvcc compiler not found. Install CUDA toolkit in default location.
Fra:1 Mem:2640.13M (Peak 3850.55M) | Time:00:02.50 | Mem:1724.49M, Peak:1724.49M | Scene, ViewLayer | Updating Scene
Fra:1 Mem:2640.13M (Peak 3850.55M) | Time:00:02.50 | Mem:1724.49M, Peak:1724.49M | Scene, ViewLayer | Updating Shaders
Fra:1 Mem:2640.13M (Peak 3850.55M) | Time:00:02.50 | Mem:1724.49M, Peak:1724.49M | Scene, ViewLayer | Loading denoising kernels (may take a few minutes the first time)
Fra:1 Mem:2653.68M (Peak 3850.55M) | Time:00:02.51 | Mem:3020.54M, Peak:3020.54M | Scene, ViewLayer | CUDA nvcc compiler not found. Install CUDA toolkit in default location.
Blender quit
Raimund Klink added the Status: Needs Triage, Type: Report, and Priority: Normal labels 2023-06-30 09:35:03 +02:00
YimingWu added the Interest: Render & Cycles label 2023-06-30 10:00:21 +02:00
Iliya Katushenock changed title from Cycles: Optix not able to render without cuda toolkit to Regression: Cycles: Optix not able to render without cuda toolkit 2023-06-30 10:30:06 +02:00
Author
Contributor

Update: The user has now installed CUDA 12.1 (latest?)...

Fra:41 Mem:201.80M (Peak 455.10M) | Time:00:03.86 | Mem:167.39M, Peak:167.39M | Scene.001, ViewLayer | Loading render kernels (may take a few minutes the first time)
gcc: error trying to exec 'cc1plus': execvp: No such file or directory
nvcc fatal : Failed to preprocess host compiler properties.
Failed to execute compilation command, see console for details.
Refer to the Cycles GPU rendering documentation for possible solutions:
https://docs.blender.org/manual/en/latest/render/cycles/gpu_rendering.html
CUDA version 12.1 detected, build may succeed but only CUDA 10.1 to 11.4 are officially supported.
Compiling CUDA kernel ...
"nvcc" -arch=sm_80 --cubin "/scratch/tmp/slurm-1856847/sheepit/2b7ebdfaee1283dbb192f2ef50ce83cf/3.6/scripts/addons/cycles/source/kernel/device/cuda/kernel.cu" -o "/home/dattila/.cache/cycles/kernels/cycles_kernel_sm_80_9C3CB7C565EE9EA13D23910D36FB63B9.cubin" -m64 --ptxas-options="-v" --use_fast_math -DNVCC -I"/scratch/tmp/slurm-1856847/sheepit/2b7ebdfaee1283dbb192f2ef50ce83cf/3.6/scripts/addons/cycles/source" -DWITH_NANOVDB
Fra:41 Mem:201.80M (Peak 455.10M) | Time:00:04.00 | Mem:167.39M, Peak:167.39M | Scene.001, ViewLayer | Failed to execute compilation command, see console for details.
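
The `cc1plus` error in the log above usually means that gcc is present but its C++ front end is not installed, so nvcc cannot run its host compiler. A minimal preflight sketch, assuming a Linux render node where a local kernel build may be needed, could check for both before a job starts. The function names here are hypothetical and not part of Blender or the render client.

```python
import os
import shutil
import subprocess
from typing import Callable, List, Optional


def gcc_has_cxx_frontend(gcc: str = "gcc") -> bool:
    """Return True if gcc can locate cc1plus, its C++ compiler proper."""
    # `gcc -print-prog-name=cc1plus` prints an absolute path when the program
    # is installed and echoes the bare name back when it is not.
    result = subprocess.run(
        [gcc, "-print-prog-name=cc1plus"],
        capture_output=True,
        text=True,
        check=False,
    )
    return os.path.isabs(result.stdout.strip())


def missing_tools(
    which: Callable[[str], Optional[str]] = shutil.which,
    has_cxx: Callable[[], bool] = gcc_has_cxx_frontend,
) -> List[str]:
    """List the tools a local nvcc kernel build would need but cannot find."""
    missing = []
    if which("nvcc") is None:
        missing.append("nvcc")
    if which("gcc") is None:
        missing.append("gcc")
    elif not has_cxx():
        # gcc exists but its C++ front end (cc1plus) is not installed.
        missing.append("cc1plus")
    return missing


if __name__ == "__main__":
    print(missing_tools() or "nvcc and a C++ host compiler were found")
```

The lookups are passed in as parameters so the check can be exercised without touching the real `PATH`.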
Author
Contributor

Another note: we are launching Blender with --factory-startup for every job

Member

This is caused by 7fca0ee76a. The NVIDIA A100 uses compute capability 8.0, which Blender doesn't ship precompiled kernels for. It still worked previously because it fell back to PTX, since compute capability 7.5 PTX can be compiled for it, but the update to 8.9 makes that impossible. @Sergey It's probably fine to leave PTX at 7.5, since the main target-specific optimization pipeline runs during compilation from PTX to bytecode, rather than during the compilation to PTX, plus there are precompiled kernels for all the major compute capabilities (used by consumer cards) anyway.
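
The fallback described above can be modelled in a few lines. This is a simplified sketch of the selection order (exact precompiled binary, then PTX for an equal or lower compute capability, then a local nvcc build), not Blender's actual code, and the list of shipped binaries is hypothetical. It relies on the CUDA rule that PTX built for one compute capability can be JIT-compiled by the driver for devices of the same or a higher capability.

```python
from typing import Optional, Sequence, Tuple

ComputeCapability = Tuple[int, int]  # (major, minor), e.g. (8, 0) for the A100


def select_kernel(
    device_cc: ComputeCapability,
    cubin_ccs: Sequence[ComputeCapability],
    ptx_cc: Optional[ComputeCapability],
) -> str:
    """Return which kernel source a device would end up using."""
    # 1. A precompiled binary built for exactly this architecture.
    if device_cc in cubin_ccs:
        return "cubin sm_%d%d" % device_cc
    # 2. PTX is forward compatible: the driver can JIT-compile it for any
    #    device whose compute capability is >= the PTX target.
    if ptx_cc is not None and ptx_cc <= device_cc:
        return "ptx compute_%d%d" % ptx_cc
    # 3. Nothing shipped fits, so the kernel must be built locally with nvcc.
    return "local nvcc build"


# Hypothetical list of shipped binaries; note that (8, 0) is absent.
SHIPPED_CUBINS = [(7, 5), (8, 6), (8, 9)]
A100 = (8, 0)

print(select_kernel(A100, SHIPPED_CUBINS, ptx_cc=(7, 5)))  # ptx compute_75
print(select_kernel(A100, SHIPPED_CUBINS, ptx_cc=(8, 9)))  # local nvcc build
```

With PTX at 7.5 the A100 takes the second branch. With PTX at 8.9 nothing shipped matches, which is why the toolkit is suddenly required.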


> This is caused by 7fca0ee76a

I'm considering it confirmed then, but I'm unsure if the bug should be marked high priority even though it's a regression.


@pmoursnv The bump for the PTX kernel was meant to make it easier to run older Blender versions (like LTS) on modern cards. We had some freezes during PTX compilation on certain platforms (like #109002). The motivation was that providing the closest possible PTX would help in such cases.

Now, in the context of LTS, I'm not sure what the best strategy is. Perhaps the easiest is to revert the change in both the main and LTS branches, so that the known regression is resolved. I'm fine with it.

@brecht Are you fine with such a change in the 3.6 branch? Or shall we instead add 8.0 compute there?

Author
Contributor

I would be fine either way, keeping or reverting.
I just wanted to clarify whether it was an intended change or not.
But it would be nice if Blender could already report the required toolkit version (CUDA 10.1 to 11.4 instead of the latest) in the first error message.
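
As an illustration of that suggestion, here is a minimal sketch of a message builder that names the supported range up front. The 10.1 to 11.4 range is taken from the warning in the log earlier in this thread. The function name and the exact wording of the first message are hypothetical and not what Blender prints.

```python
from typing import Optional, Tuple

# Range copied from the warning in the log; treat it as specific to that build.
SUPPORTED_MIN = (10, 1)
SUPPORTED_MAX = (11, 4)


def describe_toolkit(version: Optional[Tuple[int, int]]) -> str:
    """Build a status message for a detected (or missing) CUDA toolkit."""
    supported = "%d.%d to %d.%d" % (SUPPORTED_MIN + SUPPORTED_MAX)
    if version is None:
        # Name the supported range in the very first error message.
        return (
            "CUDA nvcc compiler not found. "
            "Install CUDA toolkit %s in default location." % supported
        )
    if SUPPORTED_MIN <= version <= SUPPORTED_MAX:
        return "CUDA version %d.%d detected." % version
    return (
        "CUDA version %d.%d detected, build may succeed but only CUDA %s "
        "are officially supported." % (version + (supported,))
    )


print(describe_toolkit(None))
print(describe_toolkit((12, 1)))
```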


@Raimund58 The change to PTX was intended, breaking compatibility was not :)

I quickly talked to Brecht, and we are both fine with reverting to a known good state. I've created a PR and am running tests, etc. now. Keep in mind, it will only get into 3.6 with its next LTS update.

Blender Bot added Status: Resolved and removed Status: Confirmed labels 2023-07-03 13:56:23 +02:00

I've committed the fix to main and added the commit to the list of back-ports for 3.6: #109399

Author
Contributor

Thank you :)
