Cuda error illegal address when rendering more than 5-10 frames of an animation #49956

Closed
opened 2016-11-07 04:09:45 +01:00 by Aaron · 17 comments

This is my first bug report, but I will try my best to include all relevant information.

I'm getting a "CUDA error: illegal address in cuCtxSynchronize()" when I try to render an animation. I would like to let my computer run overnight, but I can't seem to get more than about 10 frames done before I get this error. It happens on versions 2.78 and 2.78a as well as the latest daily build, which I tried today (11.6.16). One thing to note: with the daily build I didn't actually get the CUDA error; Blender just crashed.

Version 2.77a renders the same animation with no problems.

My system info: OS X 10.10.5 and Windows 10. CPU: i7-4770K at 3.5 GHz (3.9 GHz turbo boost). Motherboard: Z87X-UD5 TH. Graphics cards: 2x GTX 780 Ti. I have the latest NVIDIA drivers: NVIDIA Web Driver 346.02.03f08 (for Mac OS X 10.10.5) and 375.70 (Windows 10). I'm using my CPU's integrated HD 4600 graphics for the OS so that my 780 Tis are dedicated to rendering.

Obviously I'm rendering on GPU to get the cuda error.

I've tried rendering a couple different animations to see if it was something specific to my scene, but I got the same results. I also have windows 10 on my machine, but I have not tried rendering on that yet. I will try that next and update.

I hope someone can fix this. The improved VRAM usage in 2.78 is amazing! I have a scene where I take advantage of that, but with this bug it's very difficult to render. And going back to 2.77 uses too much VRAM for my GPU :-( .

Thank you

- Aaron

Update: I spent some time trying it in Windows 10 and I am having the same issues, though only in 2.78 and 2.78a. 2.77a works just fine in Windows 10 on my system.

Update: I noticed what looks like a crash report in my /tmp/ folder. It's a .txt file. Here are the contents:

# Blender 2.78 (sub 0), Commit date: 2016-10-24 12:20, Hash e8299c8

Read library: '/Users/GeeztownProductionsHpro/Library/Application Support/Blender/2.78/scripts/addons/sceneterrain/lib.blend', '//../../../../../../../../../Users/GeeztownProductionsHpro/Library/Application Support/Blender/2.78/scripts/addons/sceneterrain/lib.blend', parent '<direct>' # Info

# backtrace

0 blender 0x0000000100bf076a BLI_system_backtrace + 58
1 blender 0x000000010015731a sig_handle_crash + 362
2 libsystem_platform.dylib 0x00007fff9160df1a _sigtramp + 26
3 ??? 0x000000000003a2bb 0x0 + 238267
4 blender 0x00000001010a901b _ZN3ccl14BlenderSession20builtin_image_pixelsERKSsPvPh + 283
5 blender 0x00000001010ea85c _ZN3ccl12ImageManager20file_load_byte_imageIhEEbPNS0_5ImageENS0_13ImageDataTypeERNS_13device_vectorIT_EE + 476
6 blender 0x00000001010e61c8 _ZN3ccl12ImageManager17device_load_imageEPNS_6DeviceEPNS_11DeviceSceneENS0_13ImageDataTypeEiPNS_8ProgressE + 2232
7 blender 0x000000010236d12b _ZN3ccl13TaskScheduler10thread_runEi + 75
8 blender 0x000000010236ef6c _ZN3ccl6thread3runEPv + 28
9 libsystem_pthread.dylib 0x00007fff95bef05a _pthread_body + 131
10 libsystem_pthread.dylib 0x00007fff95beefd7 _pthread_body + 0
11 libsystem_pthread.dylib 0x00007fff95bec3ed thread_start + 13
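
The mangled C++ names in frames 4 to 8 of the backtrace above are easier to read once decoded. The following is a minimal Python sketch, not a full demangler: it assumes the Itanium ABI `_ZN<length><identifier>...` nested-name form and decodes only the leading scope, ignoring template arguments and parameter types. The helper name `nested_name` is made up for illustration; a real tool such as `c++filt` handles the complete grammar.

```python
import re


def nested_name(symbol):
    """Return the '::'-joined scope of an Itanium-mangled nested name.

    Only the leading run of length-prefixed identifiers after `_ZN` is
    decoded. Template arguments and parameter types are skipped, and
    any other symbol form is returned unchanged.
    """
    if not symbol.startswith("_ZN"):
        return symbol  # plain C symbol or a form this sketch does not handle
    pos, parts = 3, []
    while pos < len(symbol) and symbol[pos].isdigit():
        digits = re.match(r"\d+", symbol[pos:]).group()
        start = pos + len(digits)
        end = start + int(digits)
        parts.append(symbol[start:end])
        pos = end
    return "::".join(parts) if parts else symbol


# Symbols copied from frames 4-8 of the crash report.
frames = [
    "_ZN3ccl14BlenderSession20builtin_image_pixelsERKSsPvPh",
    "_ZN3ccl12ImageManager20file_load_byte_imageIhEEbPNS0_5ImageENS0_13ImageDataTypeERNS_13device_vectorIT_EE",
    "_ZN3ccl12ImageManager17device_load_imageEPNS_6DeviceEPNS_11DeviceSceneENS0_13ImageDataTypeEiPNS_8ProgressE",
    "_ZN3ccl13TaskScheduler10thread_runEi",
    "_ZN3ccl6thread3runEPv",
]

for sym in frames:
    print(nested_name(sym))
```

Run as written, this prints `ccl::BlenderSession::builtin_image_pixels` for frame 4 and `ccl::thread::run` for frame 8, which places the crashing thread in the Cycles (`ccl`) image-loading path on a task-scheduler worker thread. Whether that crash shares a cause with the CUDA error is not established by the backtrace alone.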

Update: Here is a .blend file: [smoke test 4.blend](https://archive.blender.org/developer/F395348/smoke_test_4.blend)

Steps needed to reproduce: Open the file. Make sure GPU is selected for rendering. You may need to change the location where the rendered frames will be stored. Click "Animation" to start rendering. I have been getting the error within the first 10 frames or so.
Author

Changed status to: 'Open'

Author

Added subscriber: @Geeztown


Added subscriber: @Sergey


We use GPU rendering quite a lot here in the studio and do not experience this issue. It might be something specific to the settings you're using.

So please always follow the bug report guidelines and attach everything requested there (the smallest possible .blend file and the exact steps to reproduce the problem are the most crucial). This helps us eliminate variables affecting the issue and troubleshoot more efficiently.

What I'm also not sure about is your note about the latest driver being 346.02. That is not the latest one on Windows, so please check whether the issue happens with driver version 375.70.

Author

Sergey,

Thank you for your response. I apologize for not following the guidelines; for some reason I was not able to find them when I first created this bug report.

I added a .blend file with the steps needed to reproduce the error. I also updated the system and driver information listed.

NVIDIA driver 346.02 is on Mac OS X 10.10.5. On Windows 10 I am in fact using the 375.70 driver. I made sure it was up to date before I tested it.

Please let me know if there is anything else you need.

Thanks

- Aaron
Author

One other thing I just noticed: both of my 780 Ti cards are the Classified edition. I don't know if that makes a difference.

I've been rendering fine in Blender 2.77a, but recently I've been experimenting with overclocking my GPUs. The Classified edition has a switch on the card to select between two different BIOSes, so I flashed a different BIOS into the secondary slot on both cards to boost my clock speed a bit more. With that BIOS I now get the CUDA error: illegal address in 2.77a as well, and I can't even render one frame, at least not of the more complicated scene I'm working on. The Blenchmark benchmark addon renders just fine, and in 33 seconds! But now I'm getting the CUDA error when I try to render the animation I've been working on.

I also got "CUDA error: misaligned address in cuCtxSynchronize".

If I switch back to my primary stock BIOS, everything is fine again in 2.77a. And I'm still getting the same issue in 2.78a as well.

However, I am also able to render the Blenchmark benchmark test in 2.78a. Just not my more complex scenes, it seems.

I wonder if this is related to GPU clock speeds or the stability of my overclock? But if it is, why would it render fine in 2.77a but not in 2.78a?

Let me know what you make of this.

Thanks

Author

This comment was removed by @Geeztown


Added subscriber: @jaggz


Not reproduced in Win10, 2.78a, with a GeForce GTX 560 Ti.

Member

Added subscriber: @MartijnBerger

Member

cuCtxSynchronize just reports the error caused by the kernel invocation just above it.

NVIDIA describes this error as:

> While executing a kernel, the device encountered a load or store instruction on an invalid memory address. The context cannot be used, so it must be destroyed (and a new one should be created). All existing device memory allocations from this context are invalid and must be reconstructed if the program is to continue using CUDA.

Could this be caused by overclocking? Most certainly. I'll look for an sm_35 card to test this. That this works fine on non-sm_35 cards does help us, but it does not rule out a bug in the sm_35 kernel itself. Please do retest with the cards running at a more conservative speed.
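
To illustrate why the error names cuCtxSynchronize rather than the kernel that actually faulted, here is a toy model in plain Python. It makes no CUDA calls. `ToyContext`, `launch`, `synchronize`, and `bad_kernel` are hypothetical names invented for this sketch, and it assumes only the behaviour described above: launches are queued, a fault is detected when the queued work runs, and the context is unusable afterwards.

```python
class ToyContext:
    """Pure-Python stand-in for an asynchronous GPU context (no CUDA involved)."""

    def __init__(self):
        self._queue = []       # launched but not yet executed "kernels"
        self._poisoned = None  # first error seen; sticks for the context's lifetime

    def launch(self, name, kernel):
        # Launching only enqueues work, so it cannot report the kernel's own fault.
        if self._poisoned:
            raise RuntimeError(self._poisoned)
        self._queue.append((name, kernel))

    def synchronize(self):
        # The queued work runs here, so this is where a fault becomes visible.
        while self._queue and not self._poisoned:
            name, kernel = self._queue.pop(0)
            try:
                kernel()
            except IndexError:
                self._poisoned = f"illegal address (raised by kernel '{name}')"
        if self._poisoned:
            self._queue.clear()  # remaining work is lost with the context
            raise RuntimeError(self._poisoned)


def bad_kernel():
    buffer = [0.0] * 4
    buffer[10] = 1.0  # out-of-bounds store, the toy analogue of an invalid address


ctx = ToyContext()
ctx.launch("path_trace", bad_kernel)  # returns normally
try:
    ctx.synchronize()                 # the error surfaces here
except RuntimeError as err:
    print("synchronize failed:", err)
```

In this model the launch succeeds and the failure is attributed to the synchronize call, and any later `launch` on the same context fails too. That mirrors the quoted description, in which the context must be destroyed and recreated.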


Added subscriber: @mont29


@MartijnBerger any news? Otherwise would consider closing the report…

Author

Sorry, I've just been busy lately and had relatives visiting from out of town for Thanksgiving. I will test this more this weekend.

I have been letting my computer render an animation for days on end with no issues in 2.77a. The 780 Tis that I have are the EVGA "Classified" version, which has a higher stock clock speed than the standard 780 Ti. I will try flashing a BIOS with a more conservative speed to the secondary BIOS on the cards and see if that makes any difference.

Author

Update: Problem solved!

I flashed a BIOS for a standard GTX 780 Ti with a lower clock speed (875 MHz base clock, which is stock on the standard version) and everything seems to work just fine in 2.78a. I let it render an animation for a full day and it never crashed.

It does seem odd that I was only having this problem in 2.78, though. Maybe 2.78 has a lower tolerance for errors? I don't know. For now, the benefits of 2.78 outweigh the benefits of overclocking. I don't think I'll be wasting money on overclocked versions of cards in the future.

Thanks

- Aaron

Changed status from 'Open' to: 'Archived'

Bastien Montagne self-assigned this 2016-12-04 12:34:57 +01:00

So, it was indeed a hardware issue in the end. :)

Reference: blender/blender#49956