Master broken: CUDA kernel compat compile errors; OpenCL GPU just sits there on one tile, never rendering anything. #50888

Closed
opened 2017-03-08 17:42:09 +01:00 by James W E Bird · 24 comments

System Information
Win 10, AMD Fire Pro W9100

Blender Version
Master

The CUDA compat file gives errors at compile time; I commented those lines out and got master to compile. Rendering on CPU works, but OpenCL GPU just sits there on one tile and never renders. (I left it for 15 minutes on one 128x128 tile and got nothing; normally the entire thing would render in 4-5 minutes.)

Compiled as a Release build for x64 on Windows 10 through Visual Studio.

Author

Changed status to: 'Open'

Author

Added subscriber: @JamesBird

Author

Added subscriber: @Sergey


Added subscribers: @MaiLavelle, @nirved-1


First of all, please make sure you're using the latest driver from the amd.com site. Sometimes Windows will override manually installed drivers, screwing things up. We have also been testing Cycles on very recent drivers, since they might contain some crucial fixes.

As for the CUDA errors, they're not clear to me. What are the exact compat errors? Can you attach a log file? (Please attach it as a file; inlining logs into comments makes reports difficult to follow.)

Adding Mai and Hristo as subscribers. They might have clues or be able to run some tests.

Author

File Kernel_compat_cuda.h was the problem. I don't have the error log, as I just commented out the offending code and recompiled, but from memory the lines causing the issue were 57 to 105 (the #define lines are OK; you can leave them in and it still compiles).

More important for me is that rendering with OpenCL on the GPU now just sits there and doesn't render anything, even though the console shows no shader compile errors and the GPU is being used.

I'm using the very latest drivers from AMD's website; the Enterprise drivers are the most up to date.

Author

Here's a video showing an older build I've done for OpenCL GPU with SSS and volumes, denoising, and a few other things not in master. I drop the samples to 5 to make a fast example render.

[Master OpenCL Issue Video](https://www.youtube.com/watch?v=bwnBlsyebNQ&feature=youtu.be)

Then I show loading today's master and rendering the same scene with the same 5 samples. It should render in a matter of seconds, but it just sits there rendering nothing even though the GPU fan is going mental. I left the same scene rendering for 15 minutes earlier and still got nothing (but clearly I'm not going to let the video sit there watching nothing happen for 15 minutes).

I did bring up the CUDA issue with Mai on the Blender dev forum not long back. Just like everything on that dev page, it got ignored. Code review on the Blender dev page NEEDS someone dedicated to checking .diff files and patches. Good updates just sit there for months going to waste, with no chance of getting into master unless either you or Brecht takes a look.

You can't keep treating code helpers who are not core devs like this and expect people to want to help you guys out.

This NEEDS immediate discussion between you core devs and the funding managers to get it resolved.

I am not sure what the "blender dev forum" is. We can only address bugs reported here in the bug tracker. If there's something questionable or something that needs a developer's attention, the proper way is to inform us either on the mailing list or in IRC. Those are the only places we constantly look at, and we don't have the time or resources to check more places for possible user issues.

We surely do extensive tests of all patches that go to master, but we can't cover all possible compiler/OS/driver/hardware combinations. That being said, [this chart](https://wiki.blender.org/uploads/2/29/Blender_278_opencl.png) was done on hardware here in the studio just before the OpenCL optimization went to master. Additionally, we have been keeping track of performance for months now, and all the improvements are documented in [this spreadsheet](https://docs.google.com/spreadsheets/d/1YC0R06lLDn0pECDDridUTxEZDboAzzyjotZLQmOi3Og/edit#gid=0). It doesn't seem we've put anything untested into master. But surely you will always run into some unpredictable cases with various OS/driver/hardware configurations.

Just to stress: the WX7100 is similar hardware to the W9100. I just re-tested the latest build from builder.blender.org on Windows (it's Windows 7, though) using the 17.Q1 driver, and the render time of the files from our benchmark bundle is similar to what I got on Linux. So there is obviously something particular to your exact SW/HW combination.

Things to test:

  • Get the latest build from builder.blender.org and see if the issue happens with builds which are done in a controlled environment here.
  • Re-install the AMD driver, even if you think the latest driver is installed. Windows 10 is known for re-installing drivers using ones from Windows Update. This is already causing huge issues with CUDA (CUDA rendering suddenly stops working, and re-installing the driver helps), and it's not something we can possibly control from the Blender side.
  • Test if the issue happens with files from our benchmark bundle (there might be something particular to your file). You can find the bundle [here](https://code.blender.org/2016/02/new-cycles-benchmark/).

P.S. We don't have managers; we've got a handful of developers.
Author

@Sergey Sharybin (sergey), don't get me wrong, you guys are doing a great job with limited resources. I'm not having a go at you guys as such.

I understand you heavily test the code that you're adding to master. What I meant was all the .diff files and patches that are sitting here on developer.blender.org.

There are some really nice features, like new procedural noise nodes, vector displacement, a compositor SMAA node, SSS and volumes for OpenCL, and a 30-50% speed-up patch for OpenCL, but they've been sitting here for so long that, especially now the new split kernel has been added, they will be useless without a complete rewrite.

Maybe you devs who have contact with the Blender team that sponsors the code devs could have a word about next hiring someone whose main job is to test, as soon as possible, all the new .diff files and patches from non-core devs that get posted here, and to get them into a good enough working state to pass to guys like you and Brecht. That way they could be pushed to master quicker without you core devs, who have no time, getting swamped.

I don't see how reinstalling my drivers will change anything. All my older branches of Blender using OpenCL on GPU still work, and so does my custom AMD FireRays OpenCL GPU renderer. Only current master, after the split kernel change, is broken. But I'll try uninstalling the drivers completely and reinstalling just to help test.

Keep up the good work. We do appreciate how hard you guys try.


OpenCL drivers are VERY fragile, especially the ones from AMD. We've changed the way we communicate with the split kernels running on the GPU in order to reduce latency, improve occupancy and things like that. Comparing the new code to the old is not fair at all. Additionally, there was a known bug related to reading state buffers which was causing infinite loops. So, just to eliminate the possibility of buggy driver interference, reinstall it. It's really not that hard and will reduce the number of variables to be checked here.

We can only fix issues which we can reproduce, and so far a similar setup works just fine here. A freshly installed driver might make a huge difference here.

Author

@Sergey Sharybin (sergey), I uninstalled all AMD software and reinstalled the latest W9100 Enterprise drivers from AMD, and it still does the same thing: it just sits there on the first tile, never rendering anything.

Is there any debug command you want me to use?

Member

Added subscriber: @LazyDodo

Member

Does it do this on all files, or just a specific .blend? (If it is a specific file, please attach a repro case to this post.) Can you reproduce it with a nightly build from builder.blender.org?


@JamesBird Please also share the git hash (first 8 characters) of the master revision you are using.
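
For reference, in a source checkout the short hash can be read with `git rev-parse --short=8 HEAD`. A minimal Python sketch of the same idea follows. The helper names `short_hash` and `current_short_hash` are made up for illustration, and the sketch assumes `git` is on the PATH and that `repo_dir` points at the Blender source tree.

```python
import subprocess


def short_hash(full_hash: str, length: int = 8) -> str:
    """Return the first `length` characters of a full commit hash."""
    full_hash = full_hash.strip()
    if len(full_hash) < length:
        raise ValueError("hash is shorter than the requested length")
    return full_hash[:length]


def current_short_hash(repo_dir: str = ".", length: int = 8) -> str:
    """Ask git which commit is checked out in repo_dir (hypothetical helper)."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. when repo_dir is not a checkout
    )
    return short_hash(result.stdout, length)
```

Calling `current_short_hash("path/to/blender")` from a script gives a value that can be pasted straight into the report.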

Author

I've just done a git pull from today's master and I'm compiling now. When it's done I'll test again and post the info.

It does this on all scenes with yesterday's master. We'll see what happens with today's once it's compiled. Cheers for the help, guys.

Author

Just compiled today's master and it still does the same: it just hangs on the first tile.

Hash: 9de9f25

Member

Hate to sound like a broken record, but:

  1. Can you try with the latest from builder.blender.org?
  2. Is this with all files? If not, please attach the problematic file to this report.
Author

Hahaha, no worries LazyDodo, I'm here to try to help, like you're trying to help me.

I downloaded blender-2.78-9de9f25-win64.zip, the latest official build for win64, today from the builder page (not even the VS2015 experimental version), and it does exactly the same thing: it just sits there hanging on the first tile.

This is with all the scenes I've tried.

I just tried with the default scene, a plane and a cube, and it does the same: it just never renders anything.

Member

Please run Blender with the --debug-cycles option and attach the full output after the tile gets stuck. Also run clinfo and attach the output of that as well.
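
One possible way to collect both outputs as files for attaching is sketched below in Python. This is only an illustration under several assumptions: `blender` and `clinfo` are on the PATH, the hang also reproduces in a background render, and the helper names, log file names and timeout value are hypothetical. A timeout is used because the render is expected to hang.

```python
import subprocess


def build_blender_command(blender_exe: str, blend_file: str, frame: int = 1) -> list:
    """Command line that renders one frame with Cycles debug logging enabled."""
    return [
        blender_exe,
        "--debug-cycles",   # the option requested above
        "--background",     # render without opening the UI (assumption: hang still occurs)
        blend_file,
        "--render-frame",
        str(frame),
    ]


def capture_output(command: list, log_path: str, timeout_s: float = None) -> bool:
    """Run `command`, writing stdout and stderr to `log_path`.

    Returns True if the process exited by itself and False if it had to be
    stopped after `timeout_s` seconds (for example, a stuck tile).
    """
    with open(log_path, "w", encoding="utf-8", errors="replace") as log:
        try:
            subprocess.run(
                command,
                stdout=log,
                stderr=subprocess.STDOUT,
                timeout=timeout_s,
                check=False,
            )
        except subprocess.TimeoutExpired:
            return False
    return True
```

For example, `capture_output(build_blender_command("blender", "scene.blend"), "CyclesDebugInfo.txt", timeout_s=900)` followed by `capture_output(["clinfo"], "clinfo.txt")` would leave two text files that can be attached to the report.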

Author

Hi Mai, thanks for the help.

[Cliinfo.txt](https://archive.blender.org/developer/F506572/Cliinfo.txt)

[CyclesDebugInfo.txt](https://archive.blender.org/developer/F506573/CyclesDebugInfo.txt)

The debug info is everything up until the tile starts rendering. No other debug info is shown once the tile is in the locked state.

Hope this helps.

P.S. I noticed only 12 GB of my 16 GB GDDR5 is being evaluated. I tried an old method using environment variables to force 100% memory use, but it doesn't seem to have worked. Do you know the exact environment variables I need to set to force use of the full 16 GB?
Mai Lavelle self-assigned this 2017-03-10 23:44:40 +01:00
Member

@JamesBird, thanks for the logs, very interesting results. Will try to get this fixed quickly.

Also, your clinfo looks normal; there's no need to change the environment. The full memory is available to you.


This issue was referenced by blender/cycles@ea2a4e9567542cd8928f8f8da70a11334f08862e

This issue was referenced by 96868a39419f1c9a8962c56e02480fabbf1e5156
Member

Changed status from 'Open' to: 'Resolved'

Reference: blender/blender#50888