Cycles: NVidia GTX 980 Ti rendering at 1/3rd speed of NVidia GTX 980 on the same machine? #45093

Closed
opened 8 years ago by AdmiralPotato · 306 comments

System Information
Win 8.1 x64, i7 5820K, 16GB ram
3X GTX 980 Ti

Blender Version
Broken: 2.74, 2.75RC1
Worked: (not yet)

Short description of error
I recently upgraded my primary workstation from 3x GTX 980 graphics cards to 3x GTX 980 Ti cards - but the performance of the new cards is actually about 1/3rd of the performance of the original 3x 980s. What's strange though - these cards score very well on all CUDA and OpenCL benchmarks I can find, but in Blender, these 3 cards working together actually sometimes underperform my single GTX 680. What's strangest is that these cards do show a tiny performance boost on the Blender 2.7 BMW benchmark scene over the previous hardware configuration, but in every scene which I design, they're performing much, much slower. There does seem to be some correlation with which shaders are used in the scene, as some shaders seem to be affected worse than others. Ambient Occlusion, for instance, is hardly affected at all relative to the 980s.

I have tested the new cards with Blender 2.74, 2.75 RC1, and a few nightly builds since then, and this issue seems to affect all current versions.

Exact steps for others to reproduce the error
Using an NVidia GTX 980 Ti, render the following scene to see dramatically hindered performance relative to the hardware potential of the card.

materials_perf_test-blender_275rc1-gtx_980_ti_speed-comparison.gif: https://archive.blender.org/developer/F192695/materials_perf_test-blender_275rc1-gtx_980_ti_speed-comparison.gif
system-info.txt: https://archive.blender.org/developer/F192697/system-info.txt
bmps_2015_perftest.blend: https://archive.blender.org/developer/F192698/bmps_2015_perftest.blend
Poster

Changed status to: 'Open'

Poster

Added subscriber: @AdmiralPotato

Owner

#47877 was marked as duplicate of this issue

Owner

#47808 was marked as duplicate of this issue

Poster

In case the image with the performance stats doesn't load or display, I've added the same animation, which can be scrubbed through, on Gfycat:

http://gfycat.com/HeavenlySmartIberianbarbel

Added subscriber: @mib2berlin

Hi, to make sure it is a Cycles problem you can test/compare with the Octane benchmark.
Many GTX 980/Ti users share their results there.

http://render.otoy.com/octanebench/

The cards use different chips: the GTX 980 Ti is GM200, the GTX 980 is GM204.

Cheers, mib

Sergey commented 8 years ago
Owner

Added subscriber: @Sergey

Sergey commented 8 years ago
Owner

First of all, make sure you're using the latest drivers. Second of all, make sure you don't have any technology like SLI enabled (it has a negative effect on Cycles).

Other than that, does using an individual card result in bad performance? Does using a single Ti card plugged into a PCI-E slot also have bad performance? Also, are the PCI-E slots you're using the same speed?

Poster

@mib2berlin: I ran the tests you requested, and ranked within the top results on the Octane benchmark page, and while the result set is about 1/3rd as large, my results are looking pretty good on the LuxMark page as well. Unfortunately, it looks like this is just a Cycles issue at the moment. :/

Octane Result: http://render.otoy.com/octanebench/summary_detail_item.php?systemID=3x+GTX+980+Ti
Octane List: http://render.otoy.com/octanebench/results.php?sort_by=avg&filter=980%20ti

LuxMark LuxBall Result: http://luxmark.info/node/613
LuxMark LuxBall List: http://luxmark.info/top_results/LuxBall%20HDR/OpenCL/GPU

LuxMark Microphone Result: http://luxmark.info/node/614
LuxMark Microphone List: http://luxmark.info/top_results/Microphone/OpenCL/GPU

LuxMark Hotel Result: http://luxmark.info/node/615
LuxMark Hotel List: http://luxmark.info/top_results/Hotel/OpenCL/GPU

Poster

@Sergey: I appreciate you looking into this issue. These are all things which I had learned about the hard way the first time I built a multi-card system. To verify the configuration of this workstation though:

Latest drivers:
Yes. NVidia driver v 353.06.

Make sure you don't have any technology like SLI enabled:
Disabled. SLI was not enabled at any time through this testing.

Does using an individual card result in bad performance?
Yes. On the same test scene, it produces about 1/3rd the performance of a single 980 card.

Does using a single Ti card plugged into a PCI-E slot also have bad performance?
Yes.

Also, are the PCI-E slots you're using the same speed?
As far as I am aware, yes. The Motherboard is an MSI X99S SLI Plus: http://www.newegg.com/Product/Product.aspx?Item=N82E16813130796

Is there any additional testing or information gathering which I can do that can be helpful in addressing this issue?

Sergey commented 8 years ago
Owner

There now seems to be no obvious hardware issue involved here, but that makes the issue rather impossible to solve without a developer having access to such a card (which we don't have currently).

So I'll consider the issue a TODO and archive it until we have a developer with such a card. Meanwhile you might want to check if someone on the blenderartists forum had a similar problem and maybe found a workaround (or maybe there are even folks there who don't have such performance problems).

Thanks for the report anyway!

Sergey commented 8 years ago
Owner

Changed status from 'Open' to: 'Archived'

Sergey closed this issue 8 years ago
Sergey self-assigned this 8 years ago
Poster

Okay, quick status update on what I have found with these cards so far. My other cards seem to perform well with render tile sizes ranging from 240x160 to 240x240, 320x160, to even 320x240 and 320x320. The GTX 980 Ti cards perform extremely poorly at all of those sizes. However, when I set the tile size to 160X120, they behave about as expected - nice and fast. Any larger tile size, or any smaller tile size than that though, and their performance tanks again. Their performance in the interactive viewport is painfully slow as well. I wonder why shrinking the tile size to 160x120 gets me such a performance boost on the GTX 980 Ti cards? Anyone have any ideas?
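
The tile-size comparison described above can be automated. The following is a minimal sketch, not the method used in this thread: `sweep_tile_sizes` and `fastest` are hypothetical helper names, and the `render` callable is a stand-in that you would replace with something that sets the tile size and renders. Inside Blender 2.7x that would presumably mean setting `scene.render.tile_x` / `scene.render.tile_y` and calling `bpy.ops.render.render()`, which is an assumption and is not exercised here.

```python
import time
from typing import Callable, Dict, Iterable, Tuple

TileSize = Tuple[int, int]

# Tile sizes mentioned in this thread.
THREAD_TILE_SIZES = [(160, 120), (240, 160), (240, 240),
                     (320, 160), (320, 240), (320, 320)]


def sweep_tile_sizes(render: Callable[[int, int], None],
                     tile_sizes: Iterable[TileSize],
                     clock: Callable[[], float] = time.perf_counter
                     ) -> Dict[TileSize, float]:
    """Run one render per tile size and return elapsed seconds per (x, y)."""
    timings = {}
    for tile_x, tile_y in tile_sizes:
        start = clock()
        render(tile_x, tile_y)  # hypothetical: set tile size, then render
        timings[(tile_x, tile_y)] = clock() - start
    return timings


def fastest(timings: Dict[TileSize, float]) -> TileSize:
    """Return the tile size with the shortest elapsed time."""
    return min(timings, key=timings.get)


def placeholder_render(tile_x: int, tile_y: int) -> None:
    """Stand-in that does no work; replace with a real render call."""


if __name__ == "__main__":
    results = sweep_tile_sizes(placeholder_render, THREAD_TILE_SIZES)
    for (x, y), seconds in results.items():
        print(f"{x}x{y}: {seconds:.3f} s")
    print("fastest:", fastest(results))
```

Running each tile size more than once and on one card at a time would make the comparison less sensitive to noise.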

Added subscriber: @heavypoly

Just checking in to say I'm having the same experience with my MSI 980ti with 353.30 drivers.

I also have a 670 which outperforms the 980ti in scenes with SSS.

Without SSS, the 980ti will match or slightly beat the 670 at 256x256 tile size. I'm also finding that the 160x120 tile size greatly helps the speed of the 980ti.

Added subscriber: @NiallEarley

I am also encountering issues with 980ti cards. With SLI disabled and the latest drivers, rendering on 2 cards is extremely slow and causes Blender to hang between samples. With only 1 card enabled, however, performance is acceptable, which differs from Admiral Potato's experience. The 160x120 tile size helps the speed, as mentioned above.

Bombee commented 7 years ago

Added subscriber: @Bombee

Bombee commented 7 years ago

I have the same issue with an EVGA GTX 980ti.
My system is Windows 7 x64, with an AMD X4 740 quad-core 3.2 GHz and an EVGA GTX 980ti with the latest driver, 358.50 for Windows 7, from evga.com/drivers.

For me, no tile size (even 160x120) helps render speed :(

Is it a bug in Blender or something else?

Has anybody found a solution for 980ti cards?

Added subscriber: @einstein

ALL comments below are with experimental rendering enabled.

I would just like to chime in on this. I have been running 2x MSI GTX 580 Lightnings and just got my hands on an MSI 980 ti, but noticed that the SSS on my character was rendering kinda slow, so I put the older 580 back in the case alongside the 980 ti and started comparing.
Besides the tile size mentioned above helping, I found that enabling the "branched path" mode and progressive render of the whole frame at once blows the older cards off the map: it is 2x faster than the 580, and any tile size smaller than the whole frame is slower than a full-frame render. BUT for some reason it loves to crash the Nvidia driver when the sample count gets too high...
So, apparently the render speed has something to do with the way the scene is being fed to the GPU or the way the GPU is rendering the scene from the VRAM. Either way it would be great to see the crashes fixed and blazing render speeds!

Sergey commented 7 years ago
Owner

Enabling progressive refine will lead to a slowdown in any combination of settings. The scene is sent to all cards individually, so unless you've got SLI there should be no correlation between cards at all.

It's unclear which crashes you're mentioning, but I guess it's the "Driver was reset" messages. This is a limitation of cards used for both display and compute. The driver limits the execution time per kernel for such devices; additionally, Windows has some extra time limits enabled, which check how often the screen is actually refreshing. There's nothing we can fix here; just use a lower tile size or a cheaper dedicated card for display.
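
To illustrate why a lower tile size can help with the per-kernel time limit, here is a rough sketch that compares how many pixels each tile covers. It assumes that the work done in one kernel launch grows roughly with the number of pixels in a tile, which is a simplification rather than something stated in this thread. The 1920x1080 frame size and the `tile_stats` helper are hypothetical.

```python
import math


def tile_stats(width: int, height: int, tile_x: int, tile_y: int) -> dict:
    """Return the tile count for a frame and the pixel count of a full tile."""
    tiles_across = math.ceil(width / tile_x)   # partial tiles at the edge count
    tiles_down = math.ceil(height / tile_y)
    return {
        "tiles": tiles_across * tiles_down,
        "pixels_per_tile": tile_x * tile_y,
    }


if __name__ == "__main__":
    # Hypothetical 1920x1080 frame; tile sizes taken from this thread.
    for tile_x, tile_y in [(480, 480), (240, 160), (160, 120)]:
        stats = tile_stats(1920, 1080, tile_x, tile_y)
        print(f"{tile_x}x{tile_y}: {stats['tiles']} tiles, "
              f"{stats['pixels_per_tile']} pixels per tile")
```

Under that assumption, a 160x120 tile carries one twelfth of the pixels of a 480x480 tile, so each launch should finish sooner, at the cost of many more tiles per frame.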

Yes, the crashes I mentioned were simply Win7 dumping the driver after a 2 second delay - another reason I love Windows :-(

I have now done extensive testing with different files, graphics cards and settings to determine whether it is the architecture of the card's chip or simply a code or driver mismatch with the GPU architecture, and have concluded that it has something to do with the experimental feature set in the render dropdown and the "Ti" GM200 chipset.
The SSS shader itself has no direct effect - e.g. rendering the first-gen BMW bench file gives a 0:21 min render on the 980ti with four tiles, but after enabling the experimental feature set it takes 1:23 min - roughly a 4x slowdown.
However, this scene has no SSS or translucent shaders, and rendering on the older 580 in experimental mode with the exact same file and settings does not produce this slowdown.

For now though the workaround is to render with experimental on and branched path tracing on. In this setup I can achieve visually equivalent or better quality with the same or faster render times than the supported cuda feature set and also get the benefit of having sss rendered on the gpu!

Added subscriber: @vicmail12

Added subscriber: @ChristopherHammang

Added subscriber: @SteveLund

I can also confirm that after installing a new GTX 980TI 6gb, Blender Cycles actually renders slower than it was with my GTX 970. Has there been any update on this? Is this still a known issue?

Thanks,
Steve (CG Geek)

Poster

This issue seems to have been solved in the new Blender 2.77 test 1, and test 2 is out now - give it a try!

http://wiki.blender.org/index.php/Dev:Ref/Release_Notes/2.77

I have been loving how my new cards FINALLY perform like they ought to in Cycles! :D

Sergey commented 7 years ago
Owner

This is surely a known issue because it was reported to the bug tracker. However, we cannot give any update on this because we can only solve issues we can reproduce, and this issue can't be reproduced by any of the developers so far.

What you can do, however, is try the latest builds from builder.blender.org; they've been switched to a new CUDA toolkit and might have the issue solved. Other than that you're on your own in the investigation so far, unfortunately.

Running some more tests, there is a nice speed increase on both the GTX 980ti and 970 with the latest Blender 2.77 test 2 build. But the older 970 still renders about 25% faster than the new GTX 980ti on a scene with SSS, and rendering Mike Pan's Benchmark the 970 was still 40 sec faster. Also when rendering with the GTX 980ti I got this error twice: CUDA Error at cuctxcreate: launch exceeded timeout. Does this info help at all?

Poster

I suppose that I'd like to at least partially retract my previous statement that 2.77 test build 2 has resolved my render performance issues on the 980 Ti cards - last night's performance boosts seem to have been highly circumstantial.

Tonight, I tried using Pierrick Picaut's "The Bird" as a performance test between 2.76b and 2.77 test build 2.

http://www.p2design.eu/the-bird

I used exactly the render settings already set in the file, which uses tile sizes of 256x256.
On my machine which now holds 4x 980 Ti cards...

Blender 2.76b:
9:43.07

Blender 2.77 test build 2:
18:25.50 !?!?! (The render tiles seem to take the most time on the branch the bird is perched on)

Out of curiosity, I opened my video card stat monitoring tool to look at the render of the bird while it was running- and I've never seen anything so strange.

For comparison, here's a normal, good render:

gpu_activity-blender_276-good.png: https://archive.blender.org/developer/F285404/gpu_activity-blender_276-good.png

...and here's the wacky 2.77 test 2 graph. Note the particularly violent behavior on the "GPU usage %" row/column.

gpu_activity-blender_277_test_2-bad.png: https://archive.blender.org/developer/F285406/gpu_activity-blender_277_test_2-bad.png

So nothing is going to help the devs out but to get their hands on the hardware? Alright. I've just reached out to Ton to ask about how best to purchase and send a new GTX 980 Ti to a developer at the Blender Foundation. I hope we can get this issue resolved, or all of this community's hardware investments in this card series will not have been worthwhile for the tool that we purchased them to work with. :/
Sergey commented 7 years ago
Owner

There was an issue solved for GTX 9xx cards after testbuild 2 was created. That's why I asked you to test builds from builder.blender.org.

But it also seems multiple issues are being mixed here, which makes it difficult to follow. The original report states poor performance of 980 Ti cards in comparison with regular 980 cards, but now it sounds like you're talking about a performance difference between the previous release and the current testbuild on the same Ti card.

I should ask all the guys here to stick to the topic, which is the performance difference between the 980 and the 980 Ti; other speed regressions are to be reported separately. Even if we wouldn't be able to reproduce them, it'll help a lot to keep the discussion really clear. We can also move to the mailing list if that'd be more convenient for you guys.

Having access to a 980 Ti card will help a lot indeed. Or at least having some technical artist with that card who can build Blender, apply patches and so on, and who'll have time to Skype/Hangout so we can try doing a remote troubleshooting session.

Poster

@Sergey I apologize for diminishing the focus on the topic in my last post. I don't know if I'm "technical" enough an artist to build Blender myself (at least without a little help in the configuration), but I would be more than happy to volunteer as much time with remote access into my system as you like, to configure and test in any way you need. There are many hours when I am asleep or working that the system is not in use. What method of contact do you prefer to work out the details?

I'd also like to help in any way possible, although I don't have much experience building Blender. I apologize if I went off topic comparing the new GTX 980ti to the older 970 on this task; I don't have a standard 980 to compare with.

Added subscriber: @MikePan

GPU0: Nvidia GTX 980Ti 6GB
GPU1: Nvidia GTX 780 3GB
CPU: i5 2500k @ 4.4Ghz
RAM: 16GB DDR3
OS: Windows 7 SP1 64bit
Blender: Blender 2.77 Test Build 2
Config: No SLI, Nvidia Driver 358.91

bmps_2015_perftest.blend (Experimental Feature set)

Just 980Ti: 2:23 (tile size 240x160)
Just 980Ti: 1:43 (tile size 480x480)
Just 980Ti: 1:59 (tile size 160x120)

Just 780: 3:33 (tile size 240x160)
Just 780: 3:07 (tile size 480x480)
Just 780: 3:48 (tile size 160x120)

bmps_2015_perftest.blend (Supported Feature set)
Just 980Ti: 2:27 (240x160 tile)
Just 980Ti: 1:53 (480x480 tile)
Just 980Ti: 1:59 (160x120 tile)

Just 780: 3:34 (240x160 tile)
Just 780: 3:07 (480x480 tile)
Just 780: 1:59 (160x120 tile)

Both GPU: 1:35 (240x240 tile)
Both GPU: 1:42 (200x480 tile)

BMW27.blend (Supported Feature set)
Just 780: 1:27 (240x136 tile)
Just 780: 1:14 (480x270 tile)

Just 980Ti: 1:00 (240x136 tile)
Just 980Ti: 0:48 (480x270 tile)

Both GPU: 0:38 (240x136 tile)
Both GPU: 0:33 (480x270 tile)

Other Observations:
GPU usage when rendering is consistently 98% or higher when observed using Nvidia Inspector. (No sign of the issue Admiral Potato described.)
Setting the tile size to 160X120 as suggested by others did not improve performance.

Conclusion:
980Ti is faster than 780 as expected. No issues observed.
Multiple GPU rendering is faster than single GPU rendering, but some scenes scale better than others.
GPU performance is very sensitive to tile size. Each GPU has a different optimal tile size, and finding the optimum gets even trickier with multiple GPUs.
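
To make the comparison concrete, the render times above can be turned into speed ratios. This is a small sketch using only the Experimental feature set timings for bmps_2015_perftest.blend listed above; the helper names `to_seconds` and `speedup` are made up for illustration.

```python
def to_seconds(stamp: str) -> float:
    """Convert a 'minutes:seconds' render time such as '2:23' to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def speedup(slow: str, fast: str) -> float:
    """Return how many times faster the 'fast' render time is."""
    return to_seconds(slow) / to_seconds(fast)


# Experimental feature set timings copied from the list above.
TIMINGS = {
    "240x160": {"980Ti": "2:23", "780": "3:33"},
    "480x480": {"980Ti": "1:43", "780": "3:07"},
    "160x120": {"980Ti": "1:59", "780": "3:48"},
}

if __name__ == "__main__":
    for tile, times in TIMINGS.items():
        ratio = speedup(times["780"], times["980Ti"])
        print(f"{tile}: 980Ti is {ratio:.2f}x the speed of the 780")
```

By these numbers the 980Ti is roughly 1.5x to 1.9x faster than the 780 on this scene, depending on tile size.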

[cont from above]

Tested another scene that has a complex shader network including SSS:

File here: http://blog.mikepan.com/post/82803773422/day-22-of-my-default-cube-experiment-only-a-cube (reduced subdiv to work on GPU)

2.77 Testbuild 2
780: 5:04
980Ti: 3:31
780+980ti: 2:21

Buildbot Feb 22
780: 5:04
980Ti: 3:29
780+980ti: 2:24

Conclusion: MultiGPU is working as expected. The 980Ti performs as expected compared to the 780.
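To put a number on "working as expected", here is a rough sketch that compares the measured two-GPU time with an idealized one. It assumes (my assumption, not something stated in the thread) that two devices splitting the tiles perfectly would add their render rates, with no tile scheduling or transfer overhead. The helper names are made up for illustration; the times are the 2.77 Testbuild 2 numbers above.

```python
def to_seconds(stamp: str) -> int:
    """Convert an 'm:ss' render time such as '5:04' to whole seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def ideal_combined(times):
    """Best-case time when devices split the work with no overhead:
    the per-device rates (1 / time) simply add up."""
    return 1.0 / sum(1.0 / t for t in times)


# 2.77 Testbuild 2 results from the post above.
t780 = to_seconds("5:04")
t980ti = to_seconds("3:31")
both = to_seconds("2:21")

ideal = ideal_combined([t780, t980ti])
efficiency = ideal / both  # 1.0 would mean perfect scaling
print(f"ideal {ideal:.1f} s, measured {both} s, efficiency {efficiency:.0%}")
```

Under that assumption the pair lands at roughly 88% of the idealized time, which fits the conclusion that multi-GPU scaling itself is not where the slowdown comes from.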


Added subscriber: @GregZaal

Ton commented 7 years ago
Collaborator

Added subscriber: @Ton

Ton commented 7 years ago
Collaborator

Thanks Mike, that's encouraging news. But now... how did you get it to work when the others could not?
Time for Admiral and Steve to start checking their installations...


That's interesting Mike, the only differences I see are that you're using a slightly older driver and Windows 7. Here is all my information:

GPU: Nvidia GTX 980Ti 6GB
CPU: 2x E5-2670's 2.6ghz
RAM: 64GB DDR3
OS: Fresh install of Windows 10 Pro 64bit
Blender: 2.77 Buildbot Feb 22
Config: Nvidia Driver 361.91

BMW27.blend (Supported Feature set)
GTX 980Ti: 2:08 (240x136 tile)
GTX 980Ti: 2:24 (480x270 tile)

GTX 970: 1:28 (240x136 tile)
GTX 970: 1:18 (480x270 tile)

Both 970+980Ti: 0:53 (240x136 tile)
Both 970+980Ti: 1:08 (480x270 tile)

Conclusion: You can see that Mike's GTX980Ti is rendering the BMW scene 1:36 faster (0:48 vs. 2:24 with 480x270 tiles), and that my older 970 is almost twice as fast as my 980Ti.


Okay I can confirm that the driver version makes no difference between 358.91 and 361.91.


Can someone with a 980Ti monitor their GPU when rendering?
![Capture.jpg](https://archive.blender.org/developer/F285722/Capture.jpg)
I wonder if it's a power supply or thermal problem. It could also be a Windows 10 issue, but I am too scared to upgrade.

I tried resetting my Nvidia driver to all default settings and also loaded factory default settings in Blender. Still no issue with performance.

Here's my GTX 980Ti under rendering workload.

![pic2.gif](https://archive.blender.org/developer/F285726/pic2.gif)

Hmm, everything looks fine in the above screenshot. I have no other ideas to add to this conversation. I'll be happy to test builds for people, as it seems I am the only one with a working 980Ti.


System
Win 7 64 bit - 32 GB RAM
Nvidia driver 361.43
Nvidia GTX 980 ti 6GB
Blender 2.77 testbuild 2

I can confirm that the render speed between "experimental" and "supported" is now more or less the same.

On BMW27.blend here are my times:

980ti - experimental 00:53:31
980ti - supported 00:53:38

Also rendering a node heavy SSS scene gives the same results. Much faster than before on supported and virtually the same as experimental.

Sergey commented 7 years ago
Owner

@einstein, it is no longer magic if you read the release notes -- there's no difference in the CUDA kernel between the Supported and Experimental feature sets anymore.

@SteveLund, are there changes in Core and Memory clocks before/after you started the render?

I just had an idea -- there's no single GTX980Ti card, so everyone please mention the exact card you're using (MSI, EVGA, Inno3D, etc). It might be just the same memory bandwidth issue as happened with the GTX970.
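For anyone who wants to log clocks during a render rather than read them off a screenshot, here is a minimal sketch that polls `nvidia-smi`. It assumes `nvidia-smi` is on the PATH and that the driver reports these query fields for your card (some GeForce cards return N/A for some of them). The function names and the sample line are made up for illustration and are not measurements from this thread.

```python
import subprocess

QUERY_FIELDS = ["name", "pstate", "clocks.sm", "clocks.mem", "utilization.gpu"]


def parse_smi_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        rows.append(dict(zip(QUERY_FIELDS, values)))
    return rows


def query_gpus():
    """Run nvidia-smi once and return the parsed rows (needs the NVIDIA driver)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    return parse_smi_csv(subprocess.check_output(cmd, text=True))


# Made-up sample line, only to show the shape of the parsed result.
sample = "GeForce GTX 980 Ti, P2, 1190, 3304, 98\n"
parsed = parse_smi_csv(sample)
print(parsed[0]["pstate"], parsed[0]["clocks.sm"], parsed[0]["clocks.mem"])
```

Calling `query_gpus()` once before the render and a few times while it is running would show whether the core and memory clocks actually rise under load.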

Ton commented 7 years ago
Collaborator

Contrary to popular belief, computers are logical and predictable instruments. There must be a pattern to be found in why Mike has a working Titan and the others do not :)


Added subscriber: @RayMairlot


Added subscriber: @sgtobst


Hi,

I am also using a new 980 Ti, the SuperJetStream from Palit with a 384-bit memory interface.
I also had long waits for my renders.
Here is the monitoring from the actual render:
It looks like it can hold its boost clock very well!

![Screenshot 2016-02-25 13.47.00.png](https://archive.blender.org/developer/F285786/Screenshot_2016-02-25_13.47.00.png)

Just wanted to help. I will do the BMW bench now!

Single 980Ti: 1:47 min at 128x128. That's not really fast, is it?
Single 980Ti: 1:25 min at 256x256.
My GTX 770: 2:04 min.

It's an i5 2500k @ 4.5 GHz, 16 GB DDR3, Windows 10 Pro setup on Blender 2.76.

Added subscriber: @JoostBouwer


I have an Asus GTX 970 and an EVGA GTX Titan X.
The supplied blend file renders about 2.6 times faster on my GTX 970.
Samples reduced to 128.
GTX 970: 00:25,97
Titan X: 01:07,57

I used Blender 2.77 testbuild 2

Sergey commented 7 years ago
Owner

Please do not use testbuild2 for these benchmarks; there were performance fixes made after testbuild2. Use the latest builds from builder.blender.org instead.


@Sergey Yes, the core and memory speeds are low (around 200 MHz) before and after rendering. Also, I'm using the standard EVGA 980Ti 6GB.


In #45093#360011, @SteveLund wrote:
@Sergey Yes, the core and memory speeds are low (around 200 MHz) before and after rendering. Also, I'm using the standard EVGA 980Ti 6GB.

same here

bullx commented 7 years ago

Added subscriber: @bullx

bullx commented 7 years ago

My scenario:
Windows 7 64-bit
blender v2.76.11
nvidia drivers 361.91
ZOTAC gtx 780 ti AMP!Edition --> 02.17.89 | Mem: 181,55M | Peak: 183,53M
ZOTAC GTX 780 Ti AMP! Edition --> 01.46.11 | Mem: 181,55M | Peak: 183,53M
780 ti +980 ti 01.12.68 | Mem: 363,36M | Peak: 367,31M

Official Blender 2.76 crashed when rendering with the 980 Ti only.

EDITED: I've added information as requested; sorry for the previous inaccurate data.

Sergey commented 7 years ago
Owner

@bullx, please follow the request above and include the actual manufacturer of your card (MSI/Palit/EVGA/...). I'm also not sure what blender 2.76 372 is.

Everyone, please be constructive and helpful. We cannot work with just any kind of information; we need accurate information for effective troubleshooting!


Added subscriber: @derekbarker


Windows 7 x64

bmps_2015_perftest.blend

Blender 3f602ff
Titan X 5:39
GTX970 2:37

Blender 1c4f21f
Titan X 2:24
GTX970 2:54

At least the 970 got faster lol


bmps_2015_perftest.blend (samples reduced to 128)

Latest build:
blender-2.76-3f602ff-win64.zip
EVGA TITAN X 1:07,12
Asus GTX 970 0:25

blender-2.76-ba98b68-win32.zip (just to check if it would make any difference...)
EVGA Titan X: 1:10,33
Asus GTX 970: 0:26,07


Come on guys. OS, driver version... I started a spreadsheet that anyone can edit. Please fill it out to help the developers find the issue.

https://docs.google.com/spreadsheets/d/1KS4Ew6wfNmGHVQ_GPUmvdBpuvgKIzgwn_yMV6j_rzJ0/edit?usp=sharing
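As a sketch of what a complete report could look like, the snippet below collects the fields the developers have asked for in this thread (OS, driver, exact card, build, scene, tile size, time) and emits one CSV row. The class and column names are hypothetical and are not the actual columns of the spreadsheet; the example values are taken from one report earlier in this thread.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class BenchmarkReport:
    """One benchmark result; field names are illustrative only."""
    os: str
    driver: str
    card: str            # include the vendor, e.g. EVGA / MSI / Palit
    blender_build: str
    scene: str
    tile_size: str
    render_time: str     # m:ss as shown by Blender


def to_csv(reports):
    """Serialize reports to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(BenchmarkReport)])
    writer.writeheader()
    for report in reports:
        writer.writerow(asdict(report))
    return buf.getvalue()


report = BenchmarkReport(
    os="Windows 10 Pro 64bit",
    driver="361.91",
    card="EVGA GTX 980Ti 6GB",
    blender_build="2.77 Buildbot Feb 22",
    scene="BMW27.blend",
    tile_size="480x270",
    render_time="2:24",
)
csv_text = to_csv([report])
print(csv_text)
```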


Builds before 2/24/2016 are way faster on my Titan X


Added subscriber: @DuarteRamos


How do I ensure that I have the 2016-02-26 Win64 Buildbot build installed? The latest version is still 2.76b. Or does it mean I have to install one built after 24th Feb? I would like to add my data too.
Thx


Added subscriber: @tobi


@tobi Just go here and download the latest build: https://builder.blender.org/download/


Added subscriber: @(Deleted)


@Mike-102 Pan: What kind of motherboard/CPU do you use? It appears that you're one of the very few who have reasonable render times.

brecht commented 7 years ago
Owner

Added subscriber: @brecht

brecht commented 7 years ago
Owner

For those with slow performance, it may be worth checking if display performance is somehow related. Some things that could be tested:

  • Change Image Draw Method under User Preferences > System
  • Render without the image editor visible, by changing it to another editor right after rendering starts
  • Render from the command line (https://www.blender.org/manual/render/workflows/command_line.html): blender.exe -b BMW27.blend -f 1
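
The command-line test can be scripted so that every run is launched and timed the same way. This is only a sketch: the function names are made up, the executable and file paths are placeholders for your own install, and the measured wall-clock time includes scene loading, so it will read a little higher than the render time Blender itself reports.

```python
import subprocess
import time


def background_render_cmd(blender, blend_file, frame=1):
    """Build the argument list for `blender -b <file> -f <frame>`."""
    return [blender, "-b", blend_file, "-f", str(frame)]


def timed_render(blender, blend_file, frame=1):
    """Run one background render and return wall-clock seconds.
    Raises CalledProcessError if Blender exits with an error."""
    start = time.perf_counter()
    subprocess.run(background_render_cmd(blender, blend_file, frame), check=True)
    return time.perf_counter() - start


# Same command as in the list above; nothing is executed here.
cmd = background_render_cmd("blender.exe", "BMW27.blend")
print(" ".join(cmd))
```

Calling `timed_render("blender.exe", "BMW27.blend")` a few times and comparing against a render started from the UI would show whether the display path matters on a given machine.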
bullx commented 7 years ago

@JoostBouwer
Since my setup seems to work fine too, here are the CPU and motherboard details. Hope it helps:

Processor: Intel core i7 5930K 3,5Ghz 15Mb cache sk 2011-3 box

Motherboard: Msi x99s gaming7


So according to the spreadsheet, the cards with 12GB of VRAM (Titans and Quadro M6000) are the slowest when dealing with SSS?


@brecht Mike's and Rian's results in the spreadsheet should confirm that the display performance isn't the issue affecting the GPU speed.


Okay, big update:

I do not have a magical 980Ti.

I did more testing on my system (which was one of the few 'fast' 980Tis in the spreadsheet) and noticed that certain files do render a lot slower (slower than on the 780). So I think it's safe to say at this point that the performance degradation is not caused entirely by system configuration, but rather by how Cycles is using the GPU.

So the current status is: some scenes render fast, some scenes render slow, and some files that render fast on some GPUs render slow on other systems. For files that render slow, if I use the magical 160x120 tile size, they run at normal speed. Maybe it's some weird memory alignment problem? Branch predictor misses?

I have tried rendering from the command line and it has some positive impact on performance, but the performance gain is the same % for both my 780 as well as my 980Ti. (So rendering from the command line does not solve the 980Ti slowness)

Here is the file I am using to test: https://dl.dropboxusercontent.com/u/1742071/GM200Perf.blend Notice if you change the tile size, performance takes a huge hit. (twice as slow)

Here is the GPU status as it's rendering. The first 2 peaks are rendering at the magical 160x120 tile size. Once you increase the tile size to 320x240, the "Bus Interface usage" goes way up and I believe this is an indication that you'll encounter significant slowdown.

![366.png](https://archive.blender.org/developer/F286275/366.png)
brecht commented 7 years ago
Owner

In #45093#360786, @SteveLund wrote:
@brecht Mike's and Rian's results in the spreadsheet should confirm that the display performance isn't the issue affecting the GPU speed.

Not really, if it's e.g. an issue in the Windows 10 display drivers then Windows 7 benchmarks tell us nothing. And display performance issues wouldn't necessarily be related to the same GPU being used for display and rendering.

@MikePan, thanks for doing the command line render tests, it helps exclude some possible causes.

Ton commented 7 years ago
Collaborator

I thought the spreadsheet conclusion now was that drivers for Windows10 are slow? And there are no Linux reports yet!

Sergey is preparing a reference test suite with 6 .blend files that test various features and setups.
He's running it on a wide range of computers here, including CPU, CUDA and OpenCL renders.

With a bit of luck we'll post it today. Thomas Dinges volunteered to maintain the sheet. I'd suggest (at least) adding more test results there from other Cycles developers, or people who can tweak kernel compilations.

I guess we have to accept that for GPU renders, performance will always differ quite a bit among OSs, cards, driver versions, shader features and scene setups. Nevertheless, we can at least make sure our own hardware collection keeps being tested and performs satisfactorily. (In the course of March I expect we'll add a GTX Ti in the studio here.)


Removed subscriber: @DuarteRamos


Added subscriber: @blenderbender-3


Win 8.1 64

i7 4770k @ 4.4GHz
EVGA GTX 980 Ti SC ACX 2.0+ at stock factory overclock

Stock tile size was 1:35 a few days ago with the 2/15/16 drivers; 160x120 was 1:31.6.

With the 3/1/16 Nvidia drivers I got 1:06.3 with a 256x256 tile size, my best yet... I wish I had tried my 970 FTW before selling it. I'm new to this stuff.

And 160x120 was just a fraction of a second faster than with the older drivers, at 1:31.24.


No performance improvement for me after upgrading to 362.00 driver.

@Ton, the problem is definitely not limited to Windows 10, as I am seeing the same problem too on other scenes. (just not the BMW scene). I think we need to revamp the spreadsheet quite a bit, as it's obvious now the issue is not platform related, but scene dependent. See #45093#360827


Added subscriber: @grimmpersonal


Hope this helps:

I don't have a 980ti, but I do have a 980 that you might be able to benchmark off of?

My system:

MSI X99 Raider motherboard
Intel I7 5820K @ 3.8 Ghz CPU
Palit Nvidia GTX 460 2GB - This drives my monitors and is currently in a PCIe x16 slot
MSI Nvidia GTX 980 4GB - This has no monitor connected and is currently in a PCIe x8 slot
32 GB DDR4 quad channel memory
Linux Mint 17.3 OS
Nvidia 352.41 driver

Mike Pan BMW27 benchmark using Blender 2.77 RC2:
980 - 1:06.40

I have noticed that GPUs with Maxwell chips don't render as fast as the Kepler-based ones. I don't know if it's the same for Cycles as for Octane, but on Octane the Maxwell cards were slow before the kernel was optimized for them. My 980 went from being barely faster than a 680 to being faster than a Titan.


I should have added that the test was run with 480x270 tiles.


Added subscriber: @P2design


Hi,

I have both a Titan Black and a Titan X.
I can confirm that the Titan Black is, on my side, always faster than the Titan X (which is unexpected) on 2.77 RC2.

I don't know if I can be of any help here as I don't have any solid technical background, but if I can run tests for you just let me know @Sergey Sharybin (sergey)

It seems the Titan X and the 980ti both use the GM200 chip.


Ok Interesting.... Just to add to the confusion, running Windows 7 Pro on my desktop I get over twice the speed with my EVGA 980ti. Also on complex scenes my Nvidia driver would crash on Windows 10 and not render at all, but seems to work fine on Windows 7.

Here are my results on Windows 7:
GPU: Nvidia GTX 980Ti 6GB
CPU: 2x E5-2670's 2.6ghz
RAM: 64GB DDR3
OS: Fresh install of Windows 7 Pro 64bit
Blender: 2.77 RC2
Config: Nvidia Driver 362

Windows 7:
BMW27.blend (Supported Feature set)
GTX 980Ti: 1:01 (480x270 tile)

bmps_2015_perftest.blend
GTX 980Ti: 2:32

And compared to before:

Windows 10:
BMW27.blend (Supported Feature set)
GTX 980Ti: 2:24 (480x270 tile)

bmps_2015_perftest.blend
GTX 980Ti: 6:44


Added subscriber: @maris-4


Here are my test results with Ubuntu 14.04/64bit, Blender 2.76b
Intel® Core™ i7-2700K CPU @ 3.50GHz × 8, 16 GB RAM
Nvidia drivers: 352.63

BMW27 scene Tile size 480x270
Gigabyte 980ti 6gb: 0:59sec
Gigabyte 780ti 3gb: 0:58sec
together: 0:33sec

BMW27 scene Tile size 128x128
Gigabyte 980ti 6gb: 1:38sec
Gigabyte 780ti 3gb: 1:29sec

BMW27 scene Tile size 256x256
Gigabyte 980ti 6gb: 1:07sec
Gigabyte 780ti 3gb: 1:04sec

Sergey, I am available for testing specific build if required, let me know.


One thing I've noticed when comparing the Gigabyte 780ti and Gigabyte 980ti renders is that the 980ti never uses Performance Level 3 (even if I set its profile to use max power). It always stays at Level 2 with a memory transfer rate of 6608 MHz. See the attached pic, which was taken during a Cycles texture baking load with only the 980ti.

![980.png](https://archive.blender.org/developer/F290007/980.png)

And this is how it looks while rendering with only the 780ti. This card utilizes Level 3.

![780.png](https://archive.blender.org/developer/F290011/780.png)

Tested with this version: blender-2.77-989b0e4-linux-glibc211-i686.tar.bz2 built on Sun Mar 13 01:26:31 2016
Sergey commented 7 years ago
Owner

@SteveLund, who is the vendor of the card?

@SteveLund, @maris-4, why don't you put your results in the spreadsheet linked above? That would help a lot in getting the bigger picture.

As for performance level -- it's not under the application's control, so it's something specific to the driver or driver settings.


In #45093#363789, @Sergey wrote:
@SteveLund, who is the vendor of the card?

@SteveLund, @maris-4, why don't you put your results in the spreadsheet linked above? That would help a lot in getting the bigger picture.

As for performance level -- it's not under the application's control, so it's something specific to the driver or driver settings.

I have posted my results in Mike's spreadsheet above under the name Steve Lund. I'm using an EVGA 980ti 6GB.


Okay, I may have come to some conclusions for myself. I installed Windows 8.1 and my render speeds with the EVGA 980ti were fast (even a tad faster than on Windows 7). Looking into my motherboard (it's the ASUS Z9PE-D8 WS), I've found that the latest officially supported OS is Windows 8.1. Even though everything else seemed to run fine on Windows 10, I'm wondering if maybe my issue is the motherboard drivers not officially supporting it? Maybe others with slow render speeds should check their motherboard's drivers and make sure they're officially supported.


Added subscriber: @Eranekao


@SteveLund

Hi, I don't think it's linked to the motherboard.
I have had the Titan X and the Titan Black for more than a year now, with a flashed BIOS on the ASUS Rampage V Extreme, and on both Windows 7 Pro 64-bit and Windows 10 Pro 64-bit the Titan X has lower performance than the Titan Black.

I opened a task a while back about the Supported and Experimental issues, showing that the Black loses about 10% of its performance in Experimental and the X was about 5 to 10 times slower than usual.
Anyway, it was just pointing out the difference between these 2 cards, when the X should blow the Black away.

Both the 980ti and Titan X are based on the same full-fat GM200 Maxwell chip.
And even if I'm absolutely not into hardware stuff and coding (I just read notes and advice from other users and often think "the most expensive is the best", which is often wrong :) )
I believe this is the point: it's one of the latest chips, and the drivers and/or Cycles might not be optimised for it.
Because if you check the benchmarks, you'll see little variation from one OS to another or from one motherboard to another, while you'll see drastic differences when comparing GPUs.

Something similar happened a few years ago.
I think it was the GT550 or so (I remember it was a 5xx series card), which was blowing away most of the other GPUs while being a very cheap one.

Anyway, I don't know what I/we can do to help the devs here. I really hope you'll get your hands on one of these 2 GPUs ASAP, and may Nvidia give you a hand.
The day Cycles does justice to the Titan X, I wish I'll have thousands of dollars to spend on a multi Titan X config (muwawawahahahaah.... dreaming).

Collaborator

Closed as duplicate of #47697

Blendify closed this issue 7 years ago

Added subscriber: @Blendify

@Blendify Did you close this by accident? This task has nothing to do with smoke simulation.

Collaborator

Changed status from 'Duplicate' to: 'Open'

Blendify reopened this issue 7 years ago
Collaborator

Yes, sorry, I copied the wrong task.
Collaborator

Changed status from 'Open' to: 'Archived'

Blendify closed this issue 7 years ago

Changed status from 'Archived' to: 'Open'

GregZaal reopened this issue 7 years ago
Collaborator

Changed status from 'Open' to: 'Archived'

Blendify closed this issue 7 years ago
Collaborator

@GregZaal this task was closed by @Sergey a while ago.

Does this mean no one is currently trying to find a solution?

Collaborator

Quote from Sergey:

> There now seems no obvious hardware issue involved here, but that makes the issue rather impossible to solve without a developer having access to such a card (which we don't have currently).
>
> So i'll consider the issue a TODO and archive for until we've got developer with such a card. Meanwhile you might want to check if someone on blenderartists forum had similar problem and maybe found a workaround (or maybe there are even folks there who doesn't have such performance problems).
>
> Thanks for the report anyway!

So unless someone gives them access to a card, I don't think they will be able to do anything.

Well do they at least know what they did in 2.77 to make it even slower lol

mont29 commented 7 years ago
Owner

Added subscriber: @TomTuko

Added subscriber: @LMProductions-1

OK, I'm going to pipe in. I've got the same problem. I have two 2013 Mac Pros, one with a GTX 980 Ti and the other with a Titan X. Rendering smoke effects in 2.77 takes 2-4x longer for me. Not knowing of this case, I already started a similar report here: https://developer.blender.org/T47808

The .blend files and more info are on that page.

I just did another test now that 2.77 is out and still have the slowdown. See the attached images. CPU render = 44 min and GPU render = 1 hr 41 min. Ouch!
[F297684](https://archive.blender.org/developer/F297684/Screen_Shot_2016-03-21_at_1.16.44_PM.png) ![Screen Shot 2016-03-21 at 12.07.17 PM.png](https://archive.blender.org/developer/F297688/Screen_Shot_2016-03-21_at_12.07.17_PM.png)

On another related but different note: I've also noticed the light illuminated by the smoke on the GPU is a warmer color than the same smoke rendered by the CPU. So using a farm with some CPU and some GPU systems is impossible because of the mismatch, and because things render much faster on the CPU, all the money we just sank into GPU cards is completely lost. https://developer.blender.org/T47812

I have had render speed issues on a test file I am working on.

Blend file is in this bug report: https://developer.blender.org/T47877

I have tried the "magic tile size" of 160/120 and it helps a lot but is still significantly slower to render than Blender 2.76b.

Could it be because of the merging of the supported and experimental kernels, or a result of using the newer CUDA Toolkit (7.5 IIRC)?

My system is an i7 3770K with 16 GB RAM and a Gigabyte GTX 980 Ti.

I'm not much of a coder, although I can read code and make minor adjustments.

I have a Windows 10 computer set up to do Blender builds, if that would be helpful.
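
For anyone who wants to apply the 160/120 "magic tile size" from a script rather than the UI, here is a minimal sketch. It assumes Blender 2.7x, where the tile size is exposed as `scene.render.tile_x` and `scene.render.tile_y`; the helper name `set_tile_size` and the stand-in scene object are made up for illustration so the snippet also runs outside Blender.

```python
from types import SimpleNamespace


def set_tile_size(scene, tile_x, tile_y):
    """Set the render tile size (in pixels) on a Blender 2.7x scene.

    Assumes the scene exposes render.tile_x / render.tile_y, as the
    2.7x Python API does. Returns the values now stored on the scene.
    """
    scene.render.tile_x = tile_x
    scene.render.tile_y = tile_y
    return scene.render.tile_x, scene.render.tile_y


# Stand-in for bpy.context.scene so this can be tried without Blender
# (hypothetical object, only for the dry run below).
fake_scene = SimpleNamespace(render=SimpleNamespace(tile_x=240, tile_y=136))
print(set_tile_size(fake_scene, 160, 120))

# Inside Blender's Python console the call would be:
#   import bpy
#   set_tile_size(bpy.context.scene, 160, 120)
```

Keeping the scene as a parameter makes it easy to loop over several tile sizes and time each render when hunting for the fastest setting on a given card.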


One more update:

Continued testing with i7 3770K 16G ram, Gigabyte gtx 980ti on windows 10.

Testing on the blend file I submitted, I found that a big part of the slowdown was caused by the use of the transparent shader.

The leaves of the tree in my blend use the transparent shader controlled by the alpha channel of an image texture.

What I found was that if I replaced the transparent shader with a glass shader and set the IOR to 1.0, the scene renders much faster: not as fast as the modified scene renders in 2.76b, but much closer.

The newly modified scene renders in 1:05 in 2.76b with a tile size of 480/540, and in 1:21 in 2.77 with a tile size of 160/120.

Also, if I use the 160/120 tile size in 2.76b, the render time is 1:31, slightly slower than 2.77 at this tile size.

So although the tile size plays a role in the problem, as previously noted, the implementation of some of the shaders may also be adversely affecting render times in 2.77.

Sergey commented 7 years ago
Owner

Please do NOT include all sorts of GPU slowdown issues in this report! This only makes the report TOTALLY unreadable. This report is ONLY about the performance of 980 Ti cards compared to the 980 and the like. All other issues, and especially regressions, are to be reported SEPARATELY.

Added subscriber: @ArnisVaivars

Why is this issue closed when it's not fixed yet? Ton commented a while back that he expects to get a 980 Ti for testing in March.

> In #45093#360829, @Ton wrote:
> I guess we have to accept that for GPU renders, performance will always differ quite some among OSs, cards, driver versions, shader features and scene setups. Nevertheless, we can at least make sure our own hardware collection keeps being tested and performs satisfying. (**In the course of March I expect we add a GTX Ti in the studio here**).

It's still March. My 980 Ti just came in; I'll start using it either tomorrow or early next week, and I, like many others, need this issue to be open, not closed.

> In #45093#365355, @Sergey wrote:
> Please do NOT include all sort of GPU slowdown issues into this report! This only makes the report TOTALLY unreadable. This report is ONLY about issues related on the performance of 980Ti cards comparing to 980 and so. All other issues and especially regressions are to be reported SEPARATELY.

The GPU slowdowns I mentioned were specifically related to the GTX 980 Ti (and Titan X, which is the same chipset), which are the same cards mentioned in the original post. So I didn't think it was off-topic or redundant, except to confirm the problem persists.
Sergey commented 7 years ago
Owner

@ArnisVaivars, it is closed as TODO, not as Resolved. We can't fix issues we cannot reproduce, and we are currently just trying to get more information in order to deduce what's wrong.

@LMProductions-1, that was a regular slowdown caused by some new code (like, something changed between 2.76 and 2.77). This report is more about something intrinsically wrong with Cycles+980Ti, which is reported here as never working at full speed. I would prefer to handle regressions separately.
Ton commented 7 years ago
Collaborator

We were offered a Titan, I still wait for it...

Hi Ton,

That's great news to hear, and I (and a lot of other people) hope you and your team will find a solution to these performance issues.

(As it's the very first time I'm able to send you a message, I just wanted to pass on the biggest thanks ever for everything you've put into creating Blender and gathering such talented people to make it better and better. You're a part of what made my dream come true. Thank you so much, and thanks to all the devs and people who get involved in anything that makes Blender possible :) )
Collaborator

Removed subscriber: @Blendify

Added my testing results to the spreadsheet: https://docs.google.com/spreadsheets/d/1KS4Ew6wfNmGHVQ_GPUmvdBpuvgKIzgwn_yMV6j_rzJ0

My PC is all new besides the PSU and SSD/Hard Drives so the people thinking that maybe the problem could be a motherboard or something are wrong. It's also definitely not overheating, the highest GPU temp was 63°C for me.

EDIT:

Setting the tile size to 160x120 gave me a considerable speed up on the latest buildbot version of Blender:

BMW27

  • 480x270 - 02:03
  • 160x120 - 01:17

bmps_2015_perftest

  • 240x160 - 05:47
  • 160x120 - 02:06

GM200Perf

  • 480x270 - 01:14
  • 160x120 - 00:29
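
To make the effect of the smaller tiles easier to compare across the three scenes, here is a rough sketch that turns the times above into speed-up factors. The timings are copied from this comment; the helper names `to_seconds` and `speedups` are just illustrative.

```python
def to_seconds(stamp):
    """Convert an 'm:ss' or 'mm:ss' render time into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


# scene -> (time at the larger tile size, time at 160x120), as reported above
results = {
    "BMW27": ("02:03", "1:17"),
    "bmps_2015_perftest": ("05:47", "02:06"),
    "GM200Perf": ("01:14", "00:29"),
}


def speedups(table):
    """Return scene -> how many times faster the 160x120 render was."""
    return {
        scene: to_seconds(large) / to_seconds(small)
        for scene, (large, small) in table.items()
    }


for scene, factor in speedups(results).items():
    print(f"{scene}: {factor:.2f}x faster at 160x120")
```

On these numbers the gain from 160x120 tiles is roughly 1.6x, 2.75x and 2.55x respectively.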

Just to confirm this issue only affects Windows, I installed Ubuntu on a partition.

Results (my card is one of the slow ones):

  • BMW27.blend: Win 02:00, Linux 01:04
  • bmps_2015_perftest.blend: Win 05:36, Linux 02:36
  • GM200Perf.blend: Win 01:15, Linux 00:24
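
To put the Windows/Linux gap above into one number per scene, here is a small sketch that divides the Windows time by the Linux time. The timings are copied from the list above; the helper names `parse` and `windows_penalty` are illustrative only.

```python
from datetime import timedelta


def parse(stamp):
    """Convert an 'mm:ss' render time into a timedelta."""
    minutes, seconds = (int(part) for part in stamp.split(":"))
    return timedelta(minutes=minutes, seconds=seconds)


def windows_penalty(win, linux):
    """How many times longer the Windows render took than the Linux one."""
    return parse(win) / parse(linux)


# (scene, Windows time, Linux time) as reported above
timings = [
    ("BMW27.blend", "02:00", "01:04"),
    ("bmps_2015_perftest.blend", "05:36", "02:36"),
    ("GM200Perf.blend", "01:15", "00:24"),
]

for scene, win, linux in timings:
    print(f"{scene}: Windows takes {windows_penalty(win, linux):.2f}x as long")
```

That works out to roughly 1.9x, 2.2x and 3.1x longer on Windows for this card.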

I added my info to the spreadsheet as well.

Gigabyte 980ti

bmw27

blender 2.77
480x270 2:08
240x136 2:02 (this is the default tile size in the file)
160x120 1:26

blender 2.76b
480x270 1:02
240x136 1:19
160x120 1:35

Bmps_2015

Blender 2.77
240x160 6:27
160x120 2:16 (the times are the same for both experimental and supported options)

Blender 2.76b
240x160 4:11
160x120 2:33 (supported)

G200

Blender 2.77
480x270 1:18
160x120 0:32

Blender 2.76b
480x270 0:53
160x120 0:31

I am updating my build environment so I can build and test some options.

Has anyone from Blender contacted Nvidia to see if they can offer insight into the apparent slowdown on the GTX 980 Ti under Windows 10?
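
Since the fastest tile size differs between versions in the Gigabyte 980 Ti numbers above, a fair comparison is best-against-best. The sketch below picks the fastest tile size per version and prints the 2.77 / 2.76b ratio. The data is copied from this comment; the names `secs` and `best_tile` are illustrative.

```python
def secs(stamp):
    """Turn an 'm:ss' render time into seconds."""
    minutes, seconds = stamp.split(":")
    return 60 * int(minutes) + int(seconds)


# scene -> Blender version -> tile size -> reported render time
results = {
    "bmw27": {
        "2.77": {"480x270": "2:08", "240x136": "2:02", "160x120": "1:26"},
        "2.76b": {"480x270": "1:02", "240x136": "1:19", "160x120": "1:35"},
    },
    "Bmps_2015": {
        "2.77": {"240x160": "6:27", "160x120": "2:16"},
        "2.76b": {"240x160": "4:11", "160x120": "2:33"},
    },
    "G200": {
        "2.77": {"480x270": "1:18", "160x120": "0:32"},
        "2.76b": {"480x270": "0:53", "160x120": "0:31"},
    },
}


def best_tile(per_tile):
    """Return (tile size, seconds) for the fastest entry."""
    tile = min(per_tile, key=lambda size: secs(per_tile[size]))
    return tile, secs(per_tile[tile])


for scene, versions in results.items():
    tile_277, t_277 = best_tile(versions["2.77"])
    tile_276, t_276 = best_tile(versions["2.76b"])
    print(f"{scene}: 2.77 best {tile_277} ({t_277}s), "
          f"2.76b best {tile_276} ({t_276}s), ratio {t_277 / t_276:.2f}")
```

On these numbers the best 2.77 time for bmw27 is still about 1.4x the best 2.76b time, while the other two scenes come out roughly equal or slightly faster in 2.77 at 160x120.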


Added subscriber: @jugi255

Hi,

I have the same problem with my Titan X: it is 1/3 slower. I can get close to my old bmw27 time by setting the tile size to 140x140, but it is still slower :(

CPU: AMD 8350 @ 4.8
GPU: ASUS Titan X, Nvidia driver 364.51
RAM: 32 GB
OS: Windows 10 Pro

Is it possible to render from a Linux live CD to regain the lost speed? I should probably get another SSD/HDD for Linux, but switching between OSes is a huge pain.
Poster

@ArnisVaivars Huge pain indeed! I used to run Blender on Linux, but every time I ran the NVidia-provided shell script installer to update the NVidia drivers, hoping to get a performance boost for Cycles, the install would corrupt my Xorg config file about every other attempt! Then I'd have to spend about half a day just getting the thing into a state where the Xorg server would display a UI again. Sadly, this is probably exactly why you would -not- be able to test the issue with a Linux live CD: you'd need to install the proprietary NVidia drivers, which no distro would actually ship with their image due to license incompatibility issues.

Perhaps this is something to look into though - how does the Blender foundation test issues internally to determine whether the issue can be isolated to a certain OS? Do you guys keep an installed copy of Linux and Windows on each workstation? Alternately, is there a live boot distro that makes it at least easier to install the latest NVidia driver on boot that we can test with?

Ton commented 7 years ago
Collaborator

Admiral: the Blender Foundation is not doing anything else but facilitation of a public open source project on blender.org. Nobody really works for "the Foundation" (aside of development fund grants). "The Blender Foundation" has no offices either (nor pays for it). I personally volunteer for Foundation.

All of the work on Blender, including this Nvidia topic, happens by the community - by "those who want to be involved", and that is here on blender.org. Blender is being made by blender.org projects. The responsibility of a release is here on blender.org, shared by all of us who want to be involved.

Blender Institute is the company spin-off of Foundation, and that is where we do bigger projects (like open movies) and where we hire developers on long term development work (cycles etc). Blender Institute is part of the community too.
Institute has a small test lab, but I expect that most (99%) of testing and validation of hardware happens by the community.

That is how open source projects work. You are part of it. It's us, not "you guys".

Owner

Added subscriber: @ThomasDinges

Owner

@GregZaal: Thanks for testing this! Now the question is why this happens solely on Windows while Linux is fine. If that is a driver issue or an issue within the CUDA toolkit, then there is not much we can do at this point. GPU development is already a challenge, and fighting internal driver / OS issues is out of our hands.

> In #45093#366935, @ThomasDinges wrote:
> @GregZaal: Thanks for testing this! Now they question is, why this solely happens on Windows and Linux is fine. If that is a driver issue or issue within the CUDA toolkit then there is not much we can do at this point. GPU development is already a challenge, and fighting internal driver / OS issues is out of our hands.

If it's a driver/toolkit problem, how did it manage to slow down even more between 2.76 and 2.77? That is the part that confuses me the most lol
Owner

Did you take a look at the release logs? We removed the experimental kernel and added SSS and CMJ to the official one, because we fixed the increased memory usage. Adding such features can always have a performance impact on the kernel, we have observed that with every major addition. (Motion Blur, Volume...).

> In #45093#366941, @ThomasDinges wrote:
> Did you take a look at the release logs? We removed the experimental kernel and added SSS and CMJ to the official one, because we fixed the increased memory usage. Adding such features can always have a performance impact on the kernel, we have observed that with every major addition. (Motion Blur, Volume...).

The buildbot build I was using, 1c4f21f, already had a lot of those features before the slowdown happened.

To quote my render times

"Windows 7 x64

bmps_2015_perftest.blend

Blender 3f602ff
Titan X 5:39
GTX970 2:37

Blender 1c4f21f
Titan X 2:24
GTX970 2:54

At least the 970 got faster lol"


Added subscriber: @cardboard-2

Got the same results here: much slower performance vs. 2.76b (approx. 350%) on a Quadro M5000 (GM204) using official 2.77 or the latest buildbot (blender-2.77-b1f918b). The problem appears only when rendering via Cycles GPU (vs. Thea (CUDA) or Lux (OCL)). The card is not used for viewport rendering (compute only).
Win 7 SP1 Pro x64

This thread is not about speed decrease between different versions of blender. It's about the speed difference on the same version of Blender between 980 Ti and lesser cards which at the moment outperform the said 980 Ti. Stay on topic.

Apologies, I assumed the slowdowns happened on the GM204 chip too, since for GM200 this problem with Cycles is older (http://www.elysiun.com/forum/showthread.php?216113-Brecht-s-easter-egg-surprise-Modernizing-shading-and-rendering&p=2973055&viewfull=1#post2973055). Just for info, over & out.

You might be right in your assumptions that GM204 also could have this issue but it gets confusing when you involve 2 different versions of Blender with different feature sets. This issue should be contained to the latest version of Blender where 980 Ti is slower than 980 and even lesser cards like 770.

This comment was removed by @ArnisVaivars

Ton commented 7 years ago
Collaborator

Admiral sent a GTX 980 Ti to Blender Institute. Arrived this morning. It's going to be installed and tested soon. Stay tuned, and thanks a lot Admiral!

![Screen Shot 2016-03-30 at 16.34.32.png](https://archive.blender.org/developer/F299575/Screen_Shot_2016-03-30_at_16.34.32.png)

That's fantastic news and a very generous gift.
I really hope you'll find the issue and we'll be able to improve GM200 performance... It's so (so so so...) frustrating to have bought a TITAN X for performance and to get stuck with these perf issues.

BTW, do you think you could get any NVIDIA and/or ATI endorsement so that they could provide the foundation with their latest cards for testing?
(I believe this may not match open source and Blender policy, but it's just to know if this could be an option.)

Incredible news Ton and a huge thanks from me to Admiral.

Thanks Admiral,

For those who want to test on Linux/Ubuntu, I have uploaded a custom Ubuntu 14.04 image with the latest NVIDIA drivers that support the GTX 980 Ti.
Just make a live USB install and follow the instructions at the link. NO questions here about the build; ask them on the blenderartists page.

[blenderartists - page](http://blenderartists.org/forum/showthread.php?395921-Custom-Ubuntu-14-04-ISO-installer-for-NVIDIA-GTX980ti-testing)
Sergey commented 7 years ago
Owner

Just a quick update.

We ran some tests on the card yesterday. On Linux the 980 Ti is roughly 15-20% faster than the 980 (which is kind of expected, because the top of the range does not scale linearly in terms of performance per buck; it has much more RAM though :). I've put the Linux results in the [spreadsheet](https://docs.google.com/spreadsheets/d/1rybGWiISHtgaUI-E_DIOM0wf6DW5UG1-p1ooizHimUI/edit#gid=0).

We also tested the card on a Windows 10 machine. Once all the Windows glitches were solved by wipe-installing it from scratch, we set all power policies to Maximum Performance, but even then the BMW27 scene was 3 times slower.

In both cases we used the 2.77 release. From our code perspective this means that while the parameters might not be optimal, they do work quite well on Linux.

As a next step we'll try to test the card on a Windows 7 machine here. Then it will be clearer whether it's Linux vs. Windows, or Win7 vs. Win10, and so on.

P.S. While googling around for tunable performance parameters, I found quite a number of forum threads where people were complaining about dropped FPS in games after updating to Win10.

Thanks for your efforts, Sergey. I'm sure that while you were reading those gamer forums you read, as I did, that the performance problems were solved by telling the NVIDIA driver to use the GPU for PhysX instead of the CPU.

In addition, on the same system using the same drivers, performance at larger tile sizes has dramatically decreased from Blender 2.76b to Blender 2.77.

While I know this is a separate regression issue, figuring out why and correcting it would be a move in a positive direction.

It's also no surprise that Linux has better performance. We may all need to start booting from a USB drive for the larger render jobs if things continue to degrade in the Windows environment.

Again, thanks for your efforts, and I'm confident you and the team will find a solution.

mont29 commented 7 years ago
Owner

Added subscribers: @mont29, @JoelGerlach, @Blendify

Collaborator

Removed subscriber: @Blendify

I'd like to know why a comparatively small tile size (160x120) gives a huge speed boost on the 980 Ti when it's known that larger tile sizes usually work better on the GPU. Is this a per-card thing that we need to find out ourselves, or is there an issue with Blender or the NVIDIA drivers?

Hey all. Coming in here from #47808.

We have TitanX and 780ti workstations (of which the 780ti has more than triple the current performance of the TitanX in 2.77). Let me know if there are any tests or info you need from our systems over here. We're happy to help pitch in.

> In #45093#367626, @Sergey wrote:
> Just a quick update.
>
> We ran some tests on the card yesterday. On Linux the 980 Ti is roughly 15-20% faster than the 980 (which is kind of expected, because the top of the range does not scale linearly in terms of performance per buck; it has much more RAM though :). I've put the Linux results in the [spreadsheet](https://docs.google.com/spreadsheets/d/1rybGWiISHtgaUI-E_DIOM0wf6DW5UG1-p1ooizHimUI/edit#gid=0).
>
> We also tested the card on a Windows 10 machine. Once all the Windows glitches were solved by wipe-installing it from scratch, we set all power policies to Maximum Performance, but even then the BMW27 scene was 3 times slower.
>
> In both cases we used the 2.77 release. From our code perspective this means that while the parameters might not be optimal, they do work quite well on Linux.
>
> As a next step we'll try to test the card on a Windows 7 machine here. Then it will be clearer whether it's Linux vs. Windows, or Win7 vs. Win10, and so on.
>
> P.S. While googling around for tunable performance parameters, I found quite a number of forum threads where people were complaining about dropped FPS in games after updating to Win10.

@Sergey My experience was that both Windows 7 and 8.1 had decent results with the GTX 980Ti, and Windows 10 was terrible.

brecht commented 7 years ago
Owner

> In #45093#367626, @Sergey wrote:
> We ran some tests on the card yesterday. On Linux the 980 Ti is roughly 15-20% faster than the 980 (which is kind of expected, because the top of the range does not scale linearly in terms of performance per buck; it has much more RAM though :).

Comparing a 980 with 2048 cores at 1126 MHz and a 980 Ti with 2816 cores at 1000 MHz, that's an 18% difference. So 15-20% sounds like it scales pretty linearly on Linux.
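
The 18% figure can be reproduced with a quick back-of-the-envelope calculation. The sketch below assumes that theoretical throughput is simply core count times base clock (ignoring boost clocks, memory bandwidth and other architectural details), and the function name is made up for illustration. Note that 18% is the gap measured relative to the 980 Ti; measured relative to the 980, the same numbers give roughly 22%.

```python
def relative_throughput(cores: int, clock_mhz: float) -> float:
    """Crude throughput proxy (an assumption): core count times base clock."""
    return cores * clock_mhz


gtx_980 = relative_throughput(2048, 1126)
gtx_980_ti = relative_throughput(2816, 1000)

# Gap expressed relative to the faster card (the 18% quoted above).
deficit_980 = 1 - gtx_980 / gtx_980_ti
# The same gap expressed as a speedup over the 980.
speedup_980_ti = gtx_980_ti / gtx_980 - 1

print(f"980 is {deficit_980:.1%} below the 980 Ti")
print(f"980 Ti is {speedup_980_ti:.1%} above the 980")
```

Either way, the measured 15-20% on Linux is in the same range as what the raw specifications suggest.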

Sergey commented 7 years ago
Owner

@SteveLund, that's cool that our results are all aligned :)

@brecht, specification-wise -- yes. But there was some confusion that came from comparing price-wise. That's not linear in comparison. Surely it's not technical, but that's what artists might be expecting (and what they did in practice).

So as the next step we'll be trying to figure out if it's something we can solve from the Blender side. There are a couple of theories currently:

  • Badly compiled binaries (both Blender and cubins; that would be simple to eliminate by re-compiling Blender on Win10)
  • Changes in synchronization primitives in Windows which increased the latency of invoking CUDA kernels.

That's what we're going to investigate next. Could take some time though.

> In #45093#367909, @Sergey wrote:
> @SteveLund, that's cool that our results are all aligned :)
>
> @brecht, specification-wise -- yes. But there was some confusion that came from comparing price-wise. That's not linear in comparison. Surely it's not technical, but that's what artists might be expecting (and what they did in practice).
>
> So as the next step we'll be trying to figure out if it's something we can solve from the Blender side. There are a couple of theories currently:
>
> - Badly compiled binaries (both Blender and cubins; that would be simple to eliminate by re-compiling Blender on Win10)
> - Changes in synchronization primitives in Windows which increased the latency of invoking CUDA kernels.
>
> That's what we're going to investigate next. Could take some time though.

What about those of us with the same problem on OSX?

Are you saying that on Linux there's nothing more to investigate and results like these are totally fine?

BMW27 scene Tile size 480x270
Gigabyte 980ti 6gb: 0:59sec
Gigabyte 780ti 3gb: 0:58sec

Which would be equal to saying that 780ti has better performance than 980ti

Sergey commented 7 years ago
Owner

@LMProductions-1, we are solving issues one by one.

@maris-4, the GTX 780 Ti is compute capability 3.5, while the GTX 980 and 980 Ti are both compute capability 5.2, so it's not a totally fair comparison. Surely we'll try to solve the performance of sm_52 cards, but once again, we can only solve issues one by one.

So please be patient and give us time for investigation. We've got access to hardware which demonstrates the issue and we're looking into what we can do from the Blender side.

Added subscriber: @chrisoffner3d

Sergey commented 7 years ago
Owner

Changed status from 'Archived' to: 'Open'

Sergey reopened this issue 7 years ago
Sergey commented 7 years ago
Owner

Re-opening as a TODO.

We are working on getting it solved, but so far it doesn't look like it's really a Blender bug, so that's why it's a TODO.

Even if the win 10 issue is not on Blender's side 980Ti should still be 20-25% faster than a 780Ti on Linux at the very least which is not the case at this moment.

Here's Octane's benchmark: https://render.otoy.com/octanebench/results.php?sort_by=avg&filter=&singleGPU=1

Added subscriber: @MartinLindelof

Hi guys, I'm sorry I can't document this statement, but when I joined the SSS performance issue task (with @MartinLindelof) I ran many tests under Windows 7, and I've done them at least once a week since.
It was around the 18th of May 2015.

I switched to Windows 10 a month ago and I did not really experience a big performance loss on either the Titan Black or the Titan X.
But what I can affirm is that since I bought the Titan X (March 2015), on both Windows 7 and Windows 10, the Titan Black has always been faster (10 to 300%) on whatever render, in both supported and experimental mode.

I know it won't help that much as I can't provide accurate information, but if you need additional tests with references, just let me know.

B.REGARDS,
Pierrick

Hey everyone! I'm commenting from #47808, which dealt with a similar issue. While running Windows 10 I was actually noticing pretty heavy performance differences between versions of Blender and not just OS versions. I'm submitting the tests I ran for #47808 here; as you can see, there's definitely a large performance decrease on Maxwell-based GPUs (980s, TitanX) on Windows between Blender versions. Although issue #1 centered on Cycles GPU smoke, which I understand is a new feature and thus not yet optimal, it makes up only one series of tests illustrating the performance hit that newer GPUs are taking on Windows 10 and 2.77.

Benchmarks regarding issue #1: Smoke on GPU comparisons

Using Mike's smoke file (posted in #47808), 2.77 frame 600, 850 samples:
TitanX - 12m 12s
780ti - 1m 29s

NEW benchmarks regarding issue #2, Cycles performance between previous versions and 2.77 on Maxwell GPUs
BMW27 test using latest BMW scene from https://www.blender.org/download/demo-files/

Blender 2.75 gooseberry branch
TITAN X: 1:31.11
780ti: 1:06.80

Blender 2.76b
TITAN X: 1:32.96
780ti: 1:05.13

Blender 2.77 stable release
TITAN X: 2:19.70
780ti: 1:05.85

CONCLUSIONS:
As you can see from the previous smoke benchmarks and the BMW benchmarks I ran, somewhere between 2.76b and 2.77 the TitanX took a big hit on performance when compared to a 780ti, all run under Windows 10. This indicates to me that while drivers may play a part, and Windows 10 may play a part, something did happen in 2.77.
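
To put a number on that hit, the BMW27 timings above can be turned into slowdown ratios. This is only a small sketch over the figures already posted in this comment; the helper names are made up for illustration and the times are assumed to be in `M:SS.ss` format.

```python
def to_seconds(stamp: str) -> float:
    """Convert an 'M:SS.ss' render time into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


# BMW27 render times copied from the benchmarks above (Windows 10).
bmw27 = {
    "2.75 gooseberry": {"TITAN X": "1:31.11", "780ti": "1:06.80"},
    "2.76b": {"TITAN X": "1:32.96", "780ti": "1:05.13"},
    "2.77": {"TITAN X": "2:19.70", "780ti": "1:05.85"},
}


def slowdown(card: str, old: str, new: str) -> float:
    """Ratio of render time in the new version to the old one (>1 is slower)."""
    return to_seconds(bmw27[new][card]) / to_seconds(bmw27[old][card])


for card in ("TITAN X", "780ti"):
    ratio = slowdown(card, "2.76b", "2.77")
    print(f"{card}: 2.77 takes {ratio:.2f}x the 2.76b render time")
```

With these figures the TitanX needs roughly 1.5x as long in 2.77 as in 2.76b, while the 780ti stays within about 1%.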

Thanks all!

Added subscriber: @Klaus-4

For me it also looks like it's not just a Windows issue but a general problem with Maxwell GPUs. I got a new GTX Titan X, primarily for DaVinci Resolve, which replaced my ancient GTX 670. I got massive performance improvements in Resolve, as expected. In Blender (Linux) I did not get a noticeable speedup running my old projects. Unfortunately I can't do any exact measurements on the 670 because I sold it already. But rendering the BMW27 scene (no settings changed) on the Titan X, Blender 2.77, Fedora 23 takes 1:06:22, which is still slower than the 780ti posted above. Maybe this should be split into two tasks, one for the Windows issue and one for the general performance on Maxwell chips?

Please tell me if I can help with any more tests/measurements on the Titan X on Linux.

Well, NVIDIA did cut down the double precision capabilities of the Titan X. But then again this should apply to Windows as well as Linux... but it might explain the minimal difference between the 780ti and the Titan X.

Added subscriber: @KarstenBitter

...please correct me if I'm wrong, but I thought Cycles only runs in single precision?

Added subscriber: @squizzz

[Maybe that's the reason?](http://www.purepc.pl/files/Image/news/2016/04/nvidia_gp100_pascal_gpu_1.png) Check FP64 compute for Maxwell..

I just finished the three performance test renders and filled in the spreadsheet.
I just wanted to add (I know, nobody wants to hear about comparing to 2.76b, but...) that when I compare the GPU-Z reading of the "Bus Interface Load", it is at 92% in the slow render (Snailrender 2.77) and at about 1% in the regular render (2.76b).

The "Memory Controller Load" seems to be the opposite: about 1% in the slow render and 32% in the normal render...
This "thing" (GTX 980 Ti) seems to fight with itself...

Cheers,
Karsten Bitter

Added subscriber: @FlorianMosleh

Added subscriber: @NachoConesa

pXd commented 7 years ago

Added subscriber: @pXd

pXd commented 7 years ago

Has Ton made any progress with this after he received the 980ti?

I just upgraded from a gtx 760 to a gtx 980ti primarily for my Blender projects.

The scene that took 15mins to render on the 760 is estimating about 1hr 15mins on the new 980ti... Though the cycles preview seems quite snappy.

This is really not good :(

Cheers

Ton commented 7 years ago
Collaborator

Joel: Interesting tests. We only have the Ti here, not a TitanX. I would prefer to see similar tests using Windows 7 or Linux though. In every Blender release we add features or update or optimise things. Sometimes a change causes a slowdown on one GPU type in surprising ways. This is why we make test builds and "Release Candidates" before releases. People with expensive high-performance cards can help us by doing a test once in a while.

Adam (and everyone sharing info here): don't forget to mention precise specs, especially OS version and card brand.

BTW: I am not working on this, it's Sergey who checks the 980 Ti. He was busy on other tasks as well, like fixing bugs and doing 2.77a. I expect he has time for this very soon again.

pXd commented 7 years ago

> In #45093#368754, @Ton wrote:
> Joel: Interesting tests. We only have the Ti here, not a TitanX. I would prefer to see similar tests using Windows 7 or Linux though. In every Blender release we add features or update or optimise things. Sometimes a change causes a slowdown on one GPU type in surprising ways. This is why we make test builds and "Release Candidates" before releases. People with expensive high-performance cards can help us by doing a test once in a while.
>
> Adam (and everyone sharing info here): don't forget to mention precise specs, especially OS version and card brand.
>
> BTW: I am not working on this, it's Sergey who checks the 980 Ti. He was busy on other tasks as well, like fixing bugs and doing 2.77a. I expect he has time for this very soon again.

Thanks for the swift reply Ton.
I have been using Blender since the beginning, I'm very grateful for Blender and the effort you and the community have put into it, and humbled at your personal reply, so thanks!

Specs:
Blender 2.77
Win 10 x64
Previous card EVGA GTX 760 2gb (Worked fine on 2.77)
Current card EVGA GTX 980ti 6gb (Ultra slow on 2.77, though preview seems snappy)
Nvidia driver version 364.72 (fully updated)
Mobo Asus x99-a (Bios fully updated)
CPU Intel 5960X
RAM 32gb DDR 4 Ripjaws

Hope this helps,
Thanks again,

Adam

Poster

Hey @Ton,

I'm sorry it took so long to finally run the performance tests on my problem machine. I made my own spreadsheet based on the linked spreadsheet above, and added a column for multi-GPU results, and an extra column for the GPU tests at the "Magical 160x120 tile size" as mentioned in the 980 Ti bug thread. (https://developer.blender.org/T45093)

What I find most interesting is that any time there is fur or hair in the scene, the 980 and 980 Ti pretty much suck, but the 780 Ti is just stellar. I hope that these results help add some clues to the mystery. All tests were run on 2.77a. My machine specs are posted in the field that pops up over the machine name.

https://docs.google.com/spreadsheets/d/16IMEGEGDy7OwK3yL3ekmIToZKMwe9boTIBpCED_Ul0U/

Please let me know if there is anything else I can do to help on this issue.


Added subscriber: @DanNorris

Sergey commented 7 years ago
Owner

Developer note

There were several things we've been trying to do in the past week.

  • Make a native binary on Windows 10

Our official build environment is on Windows 7, and in theory it's possible that the compiler links against some OS-specific symbol (as happens with libc) that is less than optimal on Windows 10.

This did not give any measurable render time difference in comparison with official builds.

  • Test compiled CUDA kernel from Linux machine

In theory it's possible CUDA kernel is optimized differently on different platforms (who knows what's happening inside nvcc CUDA compiler).

Again, rendering was just as slow.

  • Do multiple samples per pixel in the kernel

The idea here was to test whether the slower render time is caused by the roundtrip between CPU and GPU which happens on every sample. Such a roundtrip does not come for free and in theory could be slow if some spin-lock is implemented badly in the driver or OS.

So what we did is we hacked kernel to do 2 samples per pixel without roundtrip (which would mean 2x less overhead).

But again this gave absolutely no render time difference.

  • Installed Linux on the same exact machine

Previous benchmark on Win10 and Linux was done on different machines. This is because our original Win10 machine had crappy secure UEFI which did not want to boot Linux hard disk at all. Now we've got another Win10 machine where we can run both Win10 and Linux (not at the same time tho ;).

Rendering on Linux on the same exact machine was still 2x faster than rendering on Win10.
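
The reasoning behind the "multiple samples per pixel" experiment above can be sketched as a simple cost model: each kernel launch pays a fixed CPU/GPU round-trip cost, so doing two samples per launch halves that part of the total. The function name and all numbers below are hypothetical and chosen only for illustration; they are not measurements from this thread.

```python
def render_time(samples, per_sample_compute_s, per_launch_overhead_s,
                samples_per_launch=1):
    """Estimate tile render time as GPU compute plus a fixed cost per launch.

    This is a toy model, not how Cycles schedules work internally.
    """
    # Ceiling division: a partial batch still needs its own launch.
    launches = -(-samples // samples_per_launch)
    return samples * per_sample_compute_s + launches * per_launch_overhead_s


# Hypothetical numbers: 200 samples, 10 ms of compute per sample,
# 0.1 ms of round-trip overhead per launch.
baseline = render_time(200, 0.010, 0.0001)
batched = render_time(200, 0.010, 0.0001, samples_per_launch=2)

print(f"one sample per launch:  {baseline:.3f} s")
print(f"two samples per launch: {batched:.3f} s")
```

Under this model, batching only changes the total noticeably when the per-launch overhead is comparable to the per-sample compute time. Seeing no difference at all, as reported above, would be consistent with the round trip being a small part of the total, although the model says nothing about what else could be slow.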

Steps further

Ok, next we'll do:

  • Test the GTX 980 Ti as a dedicated compute card (without a monitor attached to it).

Probably some folks around have tested that already, but we need to double-check everything.

Performance on a card which is connected to a monitor is not going to be as good as possible anyway.

  • Run CPU profiler on Blender

While that wouldn't give us any clue about what's happening on the GPU, we can detect anomalies in CPU side.

  • Bump our sampling hack all the way, so we'll sample the full tile with all the samples on the GPU without any CPU interaction.

That would only be possible if the card is not a display one.

This will eliminate as many variables as we possibly can in Blender. If it's still slower on Windows then it'll mean it's definitely something out of our control (all the settings we can control for CUDA are the same on all the platforms).

For now that's it.


@Sergey thanks for taking the time to do this writeup! Let us know if there's anything we can help with on our end. We are using Win10 and the TitanX (and 780ti of course) so if you have custom builds or would like us to tweak our settings/drivers/etc we're more than happy to participate. Thanks for all your hard work!


Added subscriber: @GottfriedHofmann


One thing: bad performance happens on Windows 7 as well when the driver version is 359.06 or 364.51. Only 361.91 seems to work fine on Windows 7. Seems like the problem is with NVIDIA?

Sergey commented 7 years ago
Owner

@GottfriedHofmann, that's something we can't confirm here in the studio. If you have a look at our spreadsheet you'll notice that the 980 Ti on Win7 is somewhat reasonable in comparison with Linux (including the regular 980 on Linux; we can't test the 980 on Win7).

We've been using 364.72 on the Win7 machine, so you might want to try that version out.

P.S. If some drivers work fast and others work slow then it's obviously not our fault ;)

pXd commented 7 years ago

Thanks for the update Sergey.

I am on Win 10 with a GTX 760 and a 980 Ti installed. I can test purely on the 980 Ti and run the monitor separately if you require.
Also, although I'm using a 5960X CPU, the actual loading of the elements onto the 980 Ti and building the BVH seems to take a long time compared to the 760. I guess this is part of the same problem, but it's something to note.

Cheers!

Sergey commented 7 years ago
Owner

@pXd, BVH building and such has no GPU-dependent code, so it should be the same for all compute devices you use.

However, copying data to the GPU surely depends on what the GPU can do. Having two logs of Blender run with `--debug-cycles` would help getting some clues (one log where you render with just the 980 Ti and another where you render with only the 760, using the exact same scene in both cases). Having the logs separately would make reading easier :)
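
One way to collect the two separate logs is to run Blender from a script and redirect its console output to a file. The following is a minimal sketch, assuming the debug output goes to the console (stdout/stderr) rather than to a file of its own. The executable path, scene name and log names are hypothetical. The script does not select the compute device, so that is assumed to be switched in the User Preferences between the two runs.

```python
import subprocess


def debug_cycles_command(blender_exe, blend_file, frame=1):
    """Build a command line that renders one frame with Cycles debug logging."""
    return [
        str(blender_exe),
        "--debug-cycles",                # extra Cycles logging
        "--background",                  # render without opening the UI
        str(blend_file),
        "--render-frame", str(frame),
    ]


def run_and_log(command, log_path):
    """Run the command and write everything it prints into log_path."""
    with open(log_path, "w") as log:
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


# Example usage (hypothetical paths); run once per device:
# cmd = debug_cycles_command("blender.exe", "test_scene.blend")
# run_and_log(cmd, "cycles_980ti.log")
# ...switch the compute device to the 760 in the preferences, then:
# run_and_log(cmd, "cycles_760.log")
```

Keeping one log file per device, as requested above, makes it easier to compare the two runs side by side.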

Sergey commented 7 years ago
Owner

Hey everyone,

We're running out of hardware here in the studio and can't test rendering on a dedicated 980 Ti.

However, here's a build with hack in it: ftp://ftp.blender.org/sergey/tmp/blender-cuda-hack.zip

The hack makes it so all samples within a tile are done in one single CUDA call, meaning there is absolutely no overhead from Blender's side and all the compute is done on the GPU, without Blender doing anything until the tile is fully done.

So if some of you have a 980 Ti as a non-display card and can test the build, I'll be very curious to know how its render time compares with builds from the buildbot.

NOTE: This build will most likely make a GPU used for display run into the time limit, so don't render on the display GPU with this build.


> In #45093#369806, @Sergey wrote:
> Hey everyone,
>
> We're running out of hardware here in the studio and can't test rendering on a dedicated 980 Ti.
>
> However, here's a build with hack in it: ftp://ftp.blender.org/sergey/tmp/blender-cuda-hack.zip
>
> The hack makes it so all samples within a tile are done in one single CUDA call, meaning there is absolutely no overhead from Blender's side and all the compute is done on the GPU, without Blender doing anything until the tile is fully done.
>
> So if some of you have a 980 Ti as a non-display card and can test the build, I'll be very curious to know how its render time compares with builds from the buildbot.
>
> NOTE: This build will most likely make a GPU used for display run into the time limit, so don't render on the display GPU with this build.

Is there a hack version I can test for OSX too?

Sergey commented 7 years ago
Owner

@LMProductions-1, not currently but i'll try to get one soon :)


@Sergey awesome, I'll download it here and see how it runs on a titanx; we've got a deadline this evening for client delivery, as soon as that passes I'll give it a go on our workstations. :)


@Sergey Maybe the CUDA Toolkit can shed some light on the issue? I played around with it today but couldn't see any obvious issue in the profiling data.

Attached is a screenshot showing a 980 Ti and a 780 rendering the same scene using the official 2.77 release. The 980 Ti is underperforming in this scene, but I don't see any obvious bottleneck (and it doesn't seem to be slowed down by memcopy either).

![profiler.PNG](https://archive.blender.org/developer/F303433/profiler.PNG)

pXd commented 7 years ago

@Sergey Sharybin (sergey)

I have just tested a standalone render with the 980 Ti (the monitor was running on the 760) on Mike's 2.77 BMW scene.
980ti: 01:54:27

and on the 760:
02:07:91

I noticed the fans don't even spin up on the 980 Ti when rendering; it stays quiet (unlike when gaming), whereas they do get noisy when rendering on the 760.
So it's definitely getting underutilised.

Scenes with particles seem to be a lot worse, as I think someone else mentioned.
I couldn't see any errors in the log (console), but couldn't find where the log is actually saved on Windows.

Cheers

Sergey commented 7 years ago
Owner

@MikePan, interesting plot. Indeed, it seems we're trying to load the GPU to its max but for some reason it just doesn't behave that great.

@pXd, is it one minute or one hour in the timing? Fan speed is out of our control. What's interesting tho, for me they start spinning after a few seconds (EVGA GTX 980 Ti), but I set all power settings to something like "No power save, give me maximum performance". Also, what's the render time on the dedicated 980 Ti using the latest builds from builder.blender.org?

pXd commented 7 years ago

@Sergey Sharybin (sergey)

That is one minute, the timing was copied from the render window.
With the fan speed, it should spin up under load as it does when gaming (EVGA gtx 980ti SC), but when rendering it doesn't spin up much beyond idle speeds if at all.
The 760 however spins up to medium-high when rendering (i.e under load). My power settings are set for performance too.

I will do a dedicated test with a builder.blender.org build when I get back in a day or two. I did do a test with the latest graphicall build but had the same performance issue. (http://www.graphicall.org/444)

Cheers


The 980 Ti fan is activated only after the card reaches a certain temperature. Even if a maximum load is put on the card, the fan will stay idle until it reaches the temperature necessary to trigger the fan.


Indeed, aftermarket 980 Ti fans do not move unless a certain temperature has been reached, which is around 50-60°C. My Gigabyte card even has LEDs that show when the fans are working and when they are not.


The fan on my Gigabyte 980 Ti also defaults to very slow settings, and even when the GPU temperature gets to 82°C it is still not running at higher speeds. That is why I use the overclocking tool to set a manual fan speed curve that is more in line with performance rather than quietness.

I set the fan speed so that when it reaches 75°C it is running at 100%. With this setting the GPU temperature rarely exceeds 62°C, and I live in the tropics without air-conditioning.

This setting alone decreases a 2 min render by 10-15 sec without overclocking.

This is of course outside the scope of this Blender/Nvidia issue but might play a role in 980 Ti performance compared to cards that have a more aggressive fan speed curve.
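
Since fan noise is only an indirect hint, it may be more reliable to log temperature, fan speed, utilization and clock while rendering. The sketch below builds an `nvidia-smi` query and parses one line of its CSV output. It assumes `nvidia-smi` is on the PATH (on Windows it may live in the driver's install folder instead). The sample line used for illustration is made up and is not a measurement from this thread.

```python
QUERY_FIELDS = ["temperature.gpu", "fan.speed", "utilization.gpu", "clocks.sm"]


def nvidia_smi_command(interval_s=1):
    """Command that prints one CSV line per GPU every interval_s seconds."""
    return [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
        "-l", str(interval_s),
    ]


def parse_sample(line):
    """Turn one CSV line into a dict; unreadable fields become None."""
    sample = {}
    for name, raw in zip(QUERY_FIELDS, line.split(",")):
        try:
            sample[name] = float(raw.strip())
        except ValueError:
            # Some cards report e.g. "[Not Supported]" for the fan speed.
            sample[name] = None
    return sample


# Made-up sample line, for illustration only.
print(parse_sample("62, 100, 99, 1190"))
```

A low SM clock at high temperature would point towards throttling, while low utilization at low temperature would point towards the GPU simply not being kept busy.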


Added subscriber: @Blendiz2


Hi guys,
I just installed a 980 Ti and got disappointed, as many have. Then I found this forum.
I would like to share what I tested regarding the famous tile size and render speed theory:

| Tile size | 980 Ti | 980 | Both (non-SLI) | Note |
|---|---|---|---|---|
| progressive | 1:52.30 | 1:02.76 | 1:52.00 | only Ti ran |
| 960x540 | 1:50.62 | 1:02.68 | 1:02.17 | only 980 ran |
| 480x270 | 1:54.61 | 1:07.52 | 0:47.99 | both 99% |
| 160x120 | 1:17.45 | 1:27.28 | **0:42.73** | both 99% |
| 120x68 | 1:56.95 | 2:05.91 | 1:02.50 | both 99% |

I tried progressive rendering and different tile sizes.
There is a difference between progressive and 1 tile render.
Combined render is best at the tile size optimal for the 980 Ti.
Tile size theory does not apply to the 980 Ti.
It applies to the 980, where 1 tile and progressive render gave me the best results.

There is something about GPU memory utilization. My 980 Ti has 6 GB but uses only 3.5 GB.
The 980 has 4 GB and uses 3.5 GB.
I have been using Nvidia Inspector 1.9 to look at GPU behavior. I do not understand much of that but it was interesting... :-)
I have a good system, scoring 162 fpm on GPU in Cinebench 15 and a rendering score of 1120 on CPU, with all possible drivers up to date.

I suppose there is something about how Blender interacts with GPU drivers?
BRG
Andy
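
To make the timings above easier to compare, they can be converted to seconds and the fastest setting picked per device. The sketch below uses only the numbers posted in this comment. The helper names are made up for illustration, and `ideal_combined` assumes two GPUs that split the work perfectly with no overhead, which real renders will not quite reach.

```python
def to_seconds(stamp):
    """Convert a 'm:ss.xx' render time string to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


DEVICES = ("980 Ti", "980", "both")

# Tile setting -> render times for (980 Ti, 980, both), as posted above.
RESULTS = {
    "progressive": ("1:52.30", "1:02.76", "1:52.00"),
    "960x540": ("1:50.62", "1:02.68", "1:02.17"),
    "480x270": ("1:54.61", "1:07.52", "0:47.99"),
    "160x120": ("1:17.45", "1:27.28", "0:42.73"),
    "120x68": ("1:56.95", "2:05.91", "1:02.50"),
}


def fastest_setting(device):
    """Return the tile setting with the lowest render time for a device."""
    column = DEVICES.index(device)
    return min(RESULTS, key=lambda tile: to_seconds(RESULTS[tile][column]))


def ideal_combined(time_a, time_b):
    """Render time if two devices shared the work perfectly (rates add up)."""
    return 1.0 / (1.0 / time_a + 1.0 / time_b)


for device in DEVICES:
    print(device, "->", fastest_setting(device))

ti, gtx980, _ = (to_seconds(t) for t in RESULTS["160x120"])
print(f"ideal combined at 160x120: {ideal_combined(ti, gtx980):.2f} s")
```

At 160x120 the ideal combined time comes out at roughly 41 s, close to the measured 0:42.73, so at that tile size the two cards seem to share the work well. The odd part is how slow the 980 Ti is on its own.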

pXd commented 7 years ago

Interesting. I'm using Precision to manage both my cards also. On default rendering in IRay the fans spin up.

Rendering in IRay with the 760 is far, far slower than in Cycles; Blender aces it. However, rendering with the 980 Ti, IRay smashes it out of the ballpark when compared to Cycles. Added to the fact that the BMW scene render times were very close, there's definitely an issue of underutilising the 980 Ti. The Cycles preview seems snappy though; it could be something to do with tiles.


my test results in better format
brg
Andy

![Capture.PNG](https://archive.blender.org/developer/F303992/Capture.PNG)


I also did a fan speed test, controlling the 980 Ti with ASUS Tweak. I do not see a performance difference when speeding up the fans of the Ti. By default it runs at a lower fan speed and a much higher temperature than the 980:
980: 35°C idle
980 Ti: approx. 50°C idle


Preview render is not snappy at all.
4 samples take 10.83 s on 980Ti and 3.85 s on 980 and 3.72 s on both.

pXd commented 7 years ago

@andy-28 (AndyZ)

Hmmm strange. I have a very complex scene and the viewport cycles preview renders very quickly on the EVGA 980ti.


What do you mean by very quickly? How many samples? Which light path tracing settings?

Owner

Removed subscriber: @ThomasDinges


Hi Again

I overclocked the 980 Ti and the 980 by 23% with ASUS Tweak II.
The same BMW27 render now takes 1:15.52 on the 980 Ti; it was 1:17.45, both using 160x120 tiles.

The overclocked 980 did not change. Fastest is still the one-tile render at 960x540: 1:02.95.

On both together (non-SLI) the best is 160x120: 0:42.18.
brg
Andy


Added subscriber: @Alicja-4


Hi,
I am a beginner in Blender and do not know what's going on. I bought new equipment and I thought that everything would be OK...

3x Gigabyte GeForce GTX 980 Ti
Intel Core i7-6700K, 4.0GHz, 8MB
RAM 64GB
Windows 10 Professional 64-bit

SLI 3x GTX 980 Ti GPU
BMW test: 39 sec

Small scenes render fast (GPU), but a large scene does not start at all…
I keep getting the same error:
CUDA error Out of Memory in cuMemAlloc( device_pointer,met.memory_size)
The CPU works, but so slowly.. :(
I need to make a big billboard (2.5 x 5 m, full fit), but I would have to wait 1217 hours (CPU).
I have to render it in large format:
![karpacz_scre.png](https://archive.blender.org/developer/F304115/karpacz_scre.png)
I don't know what I can do.. Did I set something wrong?

Thank you for your time!


@Alicja

There are opinions that SLI is not good for rendering, and that makes sense to me, but I have no direct proof. I would try to deactivate SLI.
I have 2 GPUs and the BMW scene renders in 42 s non-SLI.

First you need to reduce all light path tracing parameters to a minimum, or just use direct light in the render settings. For such a big scale you need less detail. Is this picture to be looked at from a distance?

Also make the tiles small enough for the GPUs to render. What is the size you are trying to render? A tile must be small enough to fit in the GPU's RAM.


@andy-28
Hi
Thank you for the feedback ;)

Yes, the picture will be viewed from a distance of about 5 m.
I use the "Pro Lighting Skies" lighting from Andrew Price (Blender Guru).
I learned that I need 14000x8000 at 72 dpi...
I deactivated SLI, but there is still a problem with "out of memory"; the scene does not start even at a small resolution :(
This problem appears in all my projects, even in a simple bathroom interior (2880x1800) like this:
![bathroom.png](https://archive.blender.org/developer/F304134/bathroom.png)
For example, I rendered this bathroom on a new MacBook Pro (20 hours on CPU). When I bought the new computer I thought it would be faster (because it has 3 good cards with 6 GB VRAM), but the GPU render doesn't start.. really, this is a very simple scene..
I don't know what I can do.

pXd commented 7 years ago

Alicja (Alicja)
Textures take up VRAM. Though you're running 3x 980 Ti, you only have 6 GB total (less what Windows is using).
A combination of texture size, particles and poly count adds up quickly and can easily saturate 6 GB.
If the scene is too big to fit within your 4-6 GB limit, the render will fail with the CUDA memory alloc error.
Split your scene up into layers, render with a transparent background, and composite them in the node compositor or Photoshop/GIMP if possible.
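
As a rough back-of-the-envelope check of how quickly textures add up, their uncompressed size can be estimated from resolution, channel count and bytes per channel. This is only a sketch under the assumption that textures are held uncompressed in VRAM. Geometry, BVH and render buffers come on top, and the function names are made up for illustration.

```python
def texture_bytes(width, height, channels=4, bytes_per_channel=1):
    """Uncompressed size of one texture in bytes."""
    return width * height * channels * bytes_per_channel


def fits_in_vram(textures, vram_bytes, reserved_bytes=0):
    """Check whether the textures alone fit into the VRAM left after reserved_bytes.

    textures is a list of (width, height, channels, bytes_per_channel) tuples.
    """
    total = sum(texture_bytes(*texture) for texture in textures)
    return total <= vram_bytes - reserved_bytes


GIB = 1024 ** 3
MIB = 1024 ** 2

# A 4096x4096 8-bit RGBA image, and an 8192x8192 32-bit float RGBA image.
print(texture_bytes(4096, 4096) / MIB, "MiB")
print(texture_bytes(8192, 8192, 4, 4) / GIB, "GiB")
```

A handful of large float textures (HDR sky images, for example) can therefore take up a noticeable share of a 6 GB card before any geometry is loaded.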


Added subscriber: @adam-106


@adam-106 (-pXd-)
Hi Adam
Thank you for the feedback ;)
I know that I have 6 GB total :) I thought that would be enough compared to the previous computer.
In this bathroom there are only two textures (wood and foliage), not too big.
I can't resize anything.. When I used fewer samples it was so noisy.


@Alicja
GPU rendering is definitely the future. Just have a look at what OctaneRender is doing (otoy.com).

Blender seems to have an issue with the 980 Ti. Three of them, triple the problem??? :-)

I have a 980 and a 980 Ti.
I tried the BMW scene at the resolution you need, 14000 x 8000 at 100%:
200 samples
limited global illumination
tile size (very important) 160x120; there would be 5896 of them during the render.
The system promises to be ready in 3 h 45 min. I am not going to wait.

With the resolution scaled down to 50% of 14000 x 8000, the system only wants 25 min to render 1496 tiles.

You need to play with the render settings to get the render moving, but keep the tiles at 160x120. When I tried other tile sizes my system got blocked.

I also tried the BMW scene at the resolution you need, 14000 x 8000 at 100%:
400 samples
limited global illumination
tile size (very important) 160x120; there would be 5896 of them during the render.
The system promises to be ready in 5 h 5 min. I am not going to wait.

If the BMW scene can render, your pictures should move too. You have much less specular.
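
The tile counts quoted above can be reproduced by rounding up the number of tiles needed in each direction. This is a minimal sketch; the function name is made up for illustration, and partial tiles at the image border are counted as whole tiles.

```python
import math


def tile_count(width, height, tile_w, tile_h, percentage=100):
    """Number of tiles needed to cover the image at the given resolution percentage."""
    scaled_w = width * percentage // 100
    scaled_h = height * percentage // 100
    # Tiles at the right and bottom edges may be partial but still count.
    return math.ceil(scaled_w / tile_w) * math.ceil(scaled_h / tile_h)


print(tile_count(14000, 8000, 160, 120))      # 5896
print(tile_count(14000, 8000, 160, 120, 50))  # 1496
```

Both results match the counts reported for the 100% and 50% renders.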


Removed subscriber: @ChristopherHammang

This is really, really, really off-topic. You might get help at a regular forum not the bug reports forum.

Ton commented 7 years ago
Collaborator

Everyone: stop posting here unless you think you have information that helps the developer fix the GTX Ti issue.

OK, I'm sorry... :) I have a problem with slow 3x GTX 980 Ti too, but thanks everyone for the help ;) I'll stop posting about the earlier problem now ;) Thanks, thanks, thanks!

@Ton
But the developer is silent. I, for one, have the 2 GPUs and some free time. I could do some testing if there were something I could test at user level to help...

Added subscriber: @CarloSchat

jammer commented 7 years ago

Added subscriber: @jammer

jammer commented 7 years ago

I don't know if this will be helpful or not.

I used to have an EVGA GTX580 Classified installed in my main machine. Running any version of Blender on this machine was great. Windows always remained responsive, even when doing a fairly demanding final render.

This exact same machine has been upgraded from Windows 7 to Windows 10 and also upgraded from the GTX580 to a Palit GTX980ti Super JetStream.

Doing a final render on this updated configuration (all other hardware unchanged) and trying to do anything else on the machine isn't anywhere near as smooth. It's incredibly lumpy in fact.

This leads me to think that the issue might not lie solely with Blender but could be a combination of things, in addition to any issues Blender may have with GM200-powered graphics cards.

I'm going to be watching this thread with interest since I've been incredibly disappointed with the Blender performance ever since the upgrade to the GTX980ti.

Thanks for all the hard work on this. I look forward to any results.

Moony commented 7 years ago

Added subscriber: @Moony

Moony commented 7 years ago

Not sure if this is of any help - but I have been testing tile sizes using 2.77 on a 980Ti and Windows 10. I rendered a 600x400 image at 500 samples using the normal path tracer.

I plotted the results X vs Y on a plot and there are some interesting results. I have posted the plot on this thread on the Blender artists forum (although I am in the process of expanding the data set).

http://blenderartists.org/forum/showthread.php?397423-Old-war-Windows-VS-Linux-(see-my-little-speed-comparison)-)&p=3040912&viewfull=1#post3040912

But to summarise:

  1. I get the best performance when my tile size equates to around 19,000 - 22,500 pixels per tile. The best performing tile sizes were 160x140, 110x200 and 200x110. There is a valley of good render times running through all tile sizes that equate to around 19,000 - 22,500 pixels per tile.
  2. Tiles that divide evenly into the overall image resolution (e.g. 100x200) don't necessarily give me the best performance, which is somewhat unexpected since partial tiles are supposed to carry a performance overhead. Some evenly dividing tiles (e.g. 200x200) give very poor performance; however, it's not a straightforward case of "more pixels = worse performance", since tile sizes that have more pixels and also divide evenly give better render times (e.g. 300x200).
  3. There are islands of poor performance regardless of whether they are full or partial tiles (e.g. the areas around 200x200, 290x130). Tiles all around these 'islands' render faster.

Some tile size render times of note:

160x140 = 11.34

100x100 = 16.03
150x100 = 12.65
200x100 = 11.87
300x100 = 14.49

100x200 = 12.23
150x200 = 15.64
200x200 = 17.30
300x200 = 15.20

600x400 = 14.36
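
To see how the listed timings relate to pixels per tile, here is a minimal sketch that ranks the tile sizes from the list above. The numbers are copied from this comment with units as the poster reported them; `pixels_per_tile` is just an illustrative helper.

```python
# Render times copied from the list above (units as given by the poster).
timings = {
    (160, 140): 11.34,
    (100, 100): 16.03,
    (150, 100): 12.65,
    (200, 100): 11.87,
    (300, 100): 14.49,
    (100, 200): 12.23,
    (150, 200): 15.64,
    (200, 200): 17.30,
    (300, 200): 15.20,
    (600, 400): 14.36,
}

def pixels_per_tile(tile):
    """Pixel area of a (width, height) tile."""
    w, h = tile
    return w * h

# Fastest first.
ranked = sorted(timings.items(), key=lambda item: item[1])
for tile, t in ranked:
    print(f"{tile[0]}x{tile[1]}: {pixels_per_tile(tile):>7} px/tile, time {t}")
```

The three fastest entries (160x140, 200x100, 100x200) have 20,000 to 22,400 pixels per tile, inside the 19,000 - 22,500 range described above, while 300x100 and 150x200 both have 30,000 pixels yet differ in time, so pixel count alone does not explain the results.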

Added subscriber: @forrestwalter

No activity here for a while. Any progress on this?

Added subscriber: @mattcawte

The tile size graphs you created are true for my TitanX on Windows 10 also, it prefers relatively small tiles. I also have an old Titan 6Gb in the same machine which renders much faster than the TitanX. Particles, smoke, SSS and real-time viewport rendering (not tiled) take the biggest hit on the TitanX, where it grinds to a painfully slow speed compared with the old Titan.

Hi,

Something is really wrong on Win10: the BMW test runs in 1,45 on a Titan X. I tried 5 different NVIDIA drivers, with no effect.
Then I overclocked my card to 1380 MHz (memory at 7400 MHz) and my time was 1.32, so no real effect.

I installed Linux and Blender and ran the same test: 1,05.

kOsMos commented 7 years ago

Added subscriber: @kOsMos

kOsMos commented 7 years ago

Obviously there is a problem with Cycles. GPU render performance on the GTX Titan X with v2.77a is horrible. Basically the Titan X is 127% slower than the 780 Ti when rendering Mike Pan's BMW scene. With v2.76 it is only 32% slower, but it should not be slower at all! I am no expert, but the first things that come to mind are bad code in the kernel file kernel_sm_52.cubin OR the larger VRAM on the 980 Ti and Titan X.

To show that it is in fact Cycles causing the performance degradation on the 980 Ti and Titan X: in Octane Render the Titan X is 20-30% faster than the 780 Ti, the way it should be!

*160x120 tiles seem to be the magical size! In version 2.77a the Titan X finished this scene in 28s, but in v2.76 with the same tile setting it finished in 30s. Still slower than the 780 Ti.

Windows 10 Pro
Scene: BMW1M-MikePan.blend
GPU1: Titan X vs GPU2: 780Ti
Blender Version: v2.77a vs v2.76
Tiles: 512x512

Results:

v2.77a
Titan X: 50s
780 Ti: 22s

v2.76
Titan X: 29s
780 Ti: 22s
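
The "127% slower" and "32% slower" figures follow from the 512x512 results above if "slower" is read as extra render time relative to the 780 Ti. A small sketch of that arithmetic, with `percent_slower` as a hypothetical helper:

```python
def percent_slower(slow_seconds, fast_seconds):
    """Extra time taken by the slower render, as a percentage of the faster one."""
    return (slow_seconds - fast_seconds) / fast_seconds * 100

print(round(percent_slower(50, 22)))  # v2.77a: Titan X 50s vs 780 Ti 22s -> 127
print(round(percent_slower(29, 22)))  # v2.76:  Titan X 29s vs 780 Ti 22s -> 32
```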

Sergey commented 7 years ago
Owner

@Blendiz2, developers are silent because they are kind of in the middle of something. This is NOT a forum; flooding the report will not make developers more active and will not lead to a faster bug fix.

Please don't use the bug tracker for user-to-user communication. Use the BA forum instead and only post really helpful information here.

@Alicja-4, this is out of the scope of this report.

@Moony, you can always gain some percent of speedup by fine-tuning parameters for particular hardware. That's NOT what we're troubleshooting. While you say there is a more optimal tile size, it will be more optimal on all OSes (since it depends on the hardware, basically the number of threads on the GPU). Tweaking tile size will NOT change the fact that Windows 10 renders 3 times slower than Linux or Windows 7.

@kOsMos, we don't have a Titan X here in the studio, but I cannot confirm such a slowdown with the 980 Ti we do have here (which has the same Compute Capability 5.2): both 2.76 and 2.77a are equally slow.

To conclude

A while ago we created a special build which avoids ANY CPU interaction during GPU rendering, avoiding any possible latency (each tile was fully sampled on the GPU, and only the final tile result was reported back to the CPU). This way we're sure we're loading the GPU as much as possible.

This build was tested by @pXd (only by him btw, nobody else even dared to do the tests which are needed for further investigation). It did not give any measurable difference in render time (if I read the timings correctly and compared them to the proper baseline). This means the root of the render time difference between platforms is not the way we launch CUDA kernels; the root of the issue is inside the driver or the OS itself.

We also went a couple of versions back and compared render times between Windows 10 and Linux, and Linux was consistently faster (around 3 times) on the same hardware. (And as we tested before, Windows 7 was pretty much on the same level as Linux.) So we cannot confirm any claims that sm_52-based cards were faster in previous releases.

We also tested a regular 980 card (NOT a Ti) on Windows 10 and Linux in the same machine. And sure enough, it was much slower on Windows 10 again.

All this currently leads us to the conclusion that something is fundamentally broken in either Windows 10 itself or NVidia's driver for this platform. This isn't something we can look into ourselves (well, we could, but MS is not really happy about reverse-engineering their products ;). Until some major update comes from either Microsoft or NVidia, I don't see what else we can do here currently.

P.S. Comparing Cycles to IRay is not really legit. IRay is specifically designed and optimized for the CUDA architecture. Additionally, as far as I can see nobody compared IRay on Win10 and Linux, so you can't say IRay's performance is at its maximum either; it might be the same 3 times faster on Linux.
P.P.S. Again, even a suboptimal design of the Cycles kernel does not cancel the fact that it is only slow on Windows 10 and that it's much, much faster on Win7 and Linux.

kOsMos commented 7 years ago

> In #45093#374601, @Sergey wrote:
> @Blendiz2, developers are silent because they are kind of in the middle of something. This is NOT a forum; flooding the report will not make developers more active and will not lead to a faster bug fix.
>
> Please don't use the bug tracker for user-to-user communication. Use the BA forum instead and only post really helpful information here.
>
> @Alicja-4, this is out of the scope of this report.
>
> @Moony, you can always gain some percent of speedup by fine-tuning parameters for particular hardware. That's NOT what we're troubleshooting. While you say there is a more optimal tile size, it will be more optimal on all OSes (since it depends on the hardware, basically the number of threads on the GPU). Tweaking tile size will NOT change the fact that Windows 10 renders 3 times slower than Linux or Windows 7.
>
> @kOsMos, we don't have a Titan X here in the studio, but I cannot confirm such a slowdown with the 980 Ti we do have here (which has the same Compute Capability 5.2): both 2.76 and 2.77a are equally slow.
>
> == To conclude ==
>
> A while ago we created a special build which avoids ANY CPU interaction during GPU rendering, avoiding any possible latency (each tile was fully sampled on the GPU, and only the final tile result was reported back to the CPU). This way we're sure we're loading the GPU as much as possible.
>
> This build was tested by @pXd (only by him btw, nobody else even dared to do the tests which are needed for further investigation). It did not give any measurable difference in render time (if I read the timings correctly and compared them to the proper baseline). This means the root of the render time difference between platforms is not the way we launch CUDA kernels; the root of the issue is inside the driver or the OS itself.
>
> We also went a couple of versions back and compared render times between Windows 10 and Linux, and Linux was consistently faster (around 3 times) on the same hardware. (And as we tested before, Windows 7 was pretty much on the same level as Linux.) So we cannot confirm any claims that sm_52-based cards were faster in previous releases.
>
> We also tested a regular 980 card (NOT a Ti) on Windows 10 and Linux in the same machine. And sure enough, it was much slower on Windows 10 again.
>
> All this currently leads us to the conclusion that something is fundamentally broken in either Windows 10 itself or NVidia's driver for this platform. This isn't something we can look into ourselves (well, we could, but MS is not really happy about reverse-engineering their products ;). Until some major update comes from either Microsoft or NVidia, I don't see what else we can do here currently.
>
> P.S. Comparing Cycles to IRay is not really legit. IRay is specifically designed and optimized for the CUDA architecture. Additionally, as far as I can see nobody compared IRay on Win10 and Linux, so you can't say IRay's performance is at its maximum either; it might be the same 3 times faster on Linux.
> P.P.S. Again, even a suboptimal design of the Cycles kernel does not cancel the fact that it is only slow on Windows 10 and that it's much, much faster on Win7 and Linux.

I would agree with your conclusion only if Octane were also slower, but it's 20%+ faster than the 780 Ti, which is expected. So pointing to Windows 10 or the Nvidia drivers is possible, but I doubt it. I am going to test your conclusion on a fresh install of Win7 Pro and report back soon. Thanks.

jammer commented 7 years ago

@Sergey this is very interesting news. What should we Win10 980ti/TitanX users do next? How should we highlight this to the companies that can make a difference?

Collaborator

Added subscriber: @MartijnBerger

Collaborator

Did anyone test the TDR-related settings and their impact on this?

https://msdn.microsoft.com/en-us/library/windows/hardware/ff569918%28v=vs.85%29.aspx

I would increase all timeouts by a factor of 10 at least and maybe just try to disable TDR altogether by setting the level to 0.
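
For anyone who wants to try this, here is a rough sketch that only prints the `reg add` commands and changes nothing itself. The registry key, the value names (`TdrDelay`, `TdrDdiDelay`, `TdrLevel`) and the default timeouts of 2 and 5 seconds are assumptions recalled from Microsoft's TDR documentation linked above, so check them there first; `tdr_commands` is a hypothetical helper.

```python
# ASSUMPTION: key path, value names and defaults are recalled from Microsoft's
# TDR registry documentation; verify against the linked page before use.
TDR_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
ASSUMED_DEFAULTS = {"TdrDelay": 2, "TdrDdiDelay": 5}  # seconds

def tdr_commands(factor=10, disable=False):
    """Build `reg add` command lines; nothing is executed here."""
    values = {name: secs * factor for name, secs in ASSUMED_DEFAULTS.items()}
    if disable:
        values["TdrLevel"] = 0  # assumed meaning: timeout detection off
    return [
        f'reg add "{TDR_KEY}" /v {name} /t REG_DWORD /d {data} /f'
        for name, data in values.items()
    ]

for line in tdr_commands(factor=10, disable=True):
    print(line)
```

The printed commands would typically need an elevated command prompt and a reboot to take effect. Disabling TDR entirely means a hung GPU kernel can freeze the desktop, so raising the delays alone is the more cautious variant.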

How can it be an Nvidia driver issue when previous versions of Blender worked fine with the 980Ti and Titan on the same OS and Nvidia driver? The only thing that changed was the Blender version, which tells me the issue is something in the new Blender version. Furthermore, this is a cross-platform issue. I have OSX 10.10.4 and have the same problem.

Sergey commented 7 years ago
Owner

@LMProductions-1, we can only comment on bugs we can reproduce, and as I already mentioned above: so far we can NOT confirm that previous versions of Blender are measurably faster.

You might be experiencing another issue (which you actually reported in #47808, and where a crucial piece of information was given by @JoelGerlach -- you can't compare the 2.76 and 2.77 releases with a smoke render because 2.76 did not have smoke on GPU). I'll check that report tomorrow and, if it's reproducible here, re-open it as a separate issue.

Ton commented 7 years ago
Collaborator

Roman: Octane is a CUDA-only render engine and probably uses much smaller kernels. It's a completely different architecture.

Further: I have witnessed Sergey doing tests, and he did an incredibly thorough job, spending days on it.
This is not in our hands anymore.

The error is in the Nvidia drivers for Windows 10. We cannot fix this. Tell Nvidia. Tell Microsoft. Or use Linux or Windows 7.

I'd also like to add that Windows 8.1 works too, just as well as Windows 7.

kOsMos commented 7 years ago

> In #45093#374685, @SteveLund wrote:
> I'd also like to add that Windows 8.1 works too, just as well as Windows 7.

What works? Are you telling me that you can render Mike Pan's BMW scene in under 20s with a Titan X or 980 Ti under Windows 8.1 or 7?!

> In #45093#374615, @MartijnBerger wrote:
> Did anyone test the TDR-related settings and their impact on this?
>
> https://msdn.microsoft.com/en-us/library/windows/hardware/ff569918%28v=vs.85%29.aspx
>
> I would increase all timeouts by a factor of 10 at least and maybe just try to disable TDR altogether by setting the level to 0.

I changed TdrDelay and TdrLevel (22 for each) and whilst there is no improvement in speed, as expected, there is a great improvement in stability. With the default Windows 10 values I would get TDR errors and Blender would crash when rendering complex scenes progressively. Increasing the TDR delay makes Windows give the graphics card more time to render, so it no longer produces TDR timeout errors. I'm now happy with this setting, and although the Titan X is not quite as fast as the old Titan, it is now stable and renders OK using relatively small tiles. Let's hope Nvidia and/or Microsoft improve the speed someday!

Moony commented 7 years ago

There must be something going on internal to Blender though (in addition to any Windows 10 issues).

Over the past year or so that I have had my Windows 10, 980Ti machine, I have run the BMW benchmark on a few versions of Blender as they were released. Using a tile size of 960x540 (which I had determined gave me the fastest render under 2.75a) I got the following render times:

2.75a (Sep 2015) = 1:16
2.77 (April 2016) = 2:43

OK, it could be argued that patches/drivers for Windows and Nvidia etc. may have changed between versions and could account for the massive performance hit, so I just downloaded a few versions of Blender from the repository and reran the test back to back. I simply loaded the BMW scene and hit render, so all the scene settings are as loaded from the file.

2.73 = 1:04
2.75a = 1:19
2.76 = 1:19
2.77 = 2:02

Whilst I rendered these scenes I monitored system performance using a utility called GPU-Z. Most of the figures looked the same regardless of which Blender version I was using (GPU load, memory used, power consumption etc.); however, one figure was very different: "Memory Controller Load".

For 2.73, 2.75a and 2.76 this value sat around 30-40% mark throughout the render - bouncing around a little, but consistently high. For version 2.77 however, "memory controller load" peaked at around half that figure - never getting any higher than around 15% at any point in the render.

I don't know whether higher or lower figures are better when it comes to this parameter - but it did strike me as odd that it should undergo a step change between blender versions in much the same way as render times have.
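
The back-to-back timings in this comment are easier to compare once converted to seconds. A minimal sketch, assuming the values are minutes:seconds and using 2.73 as the baseline; `to_seconds` is a hypothetical helper.

```python
def to_seconds(mm_ss):
    """Convert an 'M:SS' render time to seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)

# Back-to-back BMW render times from this comment.
back_to_back = {"2.73": "1:04", "2.75a": "1:19", "2.76": "1:19", "2.77": "2:02"}
baseline = to_seconds(back_to_back["2.73"])

for version, t in back_to_back.items():
    secs = to_seconds(t)
    print(f"{version}: {secs} s, {secs / baseline:.2f}x the 2.73 time")
```

On these numbers 2.75a and 2.76 take about 1.23 times as long as 2.73, and 2.77 about 1.91 times as long.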

@Moony I am beginning to believe there is a memory component to this as well. In all our tests here (Moony, we have the same types of results as you, showing a marked slowdown in 2.77 that is not present in 2.75 or 2.76 while using the same drivers) I've noticed that the Memory Controller Load on the TitanX is significantly reduced in 2.77, a drop that is not present in the other versions. So I can independently verify your results. Is it possible that GPUs with more vRAM (6GB and over) are experiencing some kind of memory leak or underutilization due to some kind of RAM limit? I'm a bit out of my depth on this, clearly.

@Sergey Thanks for bringing some order back to this report. :) I do want to note that I believe the Smoke on GPU issue is related to this problem, not independent from it, as in that environment the 780ti was dramatically faster than the TitanX when rendering smoke. I believe volumetric calculations exacerbate the problem we're reporting here, though the subject of the rendering is slightly different. We're still seeing the 780ti outperform the TitanX every single time. That is, whenever we can get it to render, as the 780ti's 3GB vRAM limit is rather limiting these days. :)

I did not see the earlier build you had, Sergey, for testing out the GPU calculations. If you'd like I can run some tests here on the TitanX to see if the results would be similar to what Adam yielded. Where can I find that build?

Perhaps we just need to pool together to buy the foundation a TitanX. :)

kOsMos commented 7 years ago

In #45093#374709, @JoelGerlach wrote:
@Moony I am beginning to believe there is a memory component to this as well. In all our tests here (Moony, we have the same types of results as you, showing a marked slowdown in 2.77 that is not present in 2.75 or 2.76 while using the same drivers) I've noticed that the Memory Controller Load on the TitanX is significantly reduced in 2.77, a drop that is not present in the other versions. So I can independently verify your results. Is it possible that GPUs with more vRAM (6GB and over) are experiencing some kind of memory leak or underutilization due to some kind of RAM limit? I'm a bit out of my depth on this, clearly.

@Sergey Thanks for bringing some order back to this report. :) I do want to note that I believe the Smoke on GPU issue is related to this problem, not independent from it, as in that environment the 780ti was dramatically faster than the TitanX when rendering smoke. I believe volumetric calculations exacerbate the problem we're reporting here, though the subject of the rendering is slightly different. We're still seeing the 780ti outperform the TitanX every single time. That is, whenever we can get it to render, as the 780ti's 3GB vRAM limit is rather limiting these days. :)

I did not see the earlier build you had, Sergey, for testing out the GPU calculations. If you'd like I can run some tests here on the TitanX to see if the results would be similar to what Adam yielded. Where can I find that build?

Perhaps we just need to pool together to buy the foundation a TitanX. :)

Higher vRAM was what I also suggested could be the issue.

Does it make a difference which OS a program gets compiled on?

Moony commented 7 years ago

I have just logged the results of my testing to a file so I could analyse them in Excel.

I re-ran the test again on 2.73, 2.76 and 2.77

Render time:
1:05, 1:19, 2:02

Average Memory Controller Load (%)
30.1%, 28.2%, 17.6%

Average Power Consumption (% TDP - 'Thermal Design Power'......apparently)
59.4%, 57.2%, 53.8%

I plotted render time vs power consumption and memory controller load - and there is a good correlation with a linear least squares fit (especially for render time vs Memory Controller load which has an r-squared value greater than 0.99).

It seems that the memory controller loading is much lower in 2.77 compared to 2.76 and 2.73. The power consumption of the card is also much lower (meaning it is being underutilised?)

I guess what this data does not answer is whether the lower memory controller load and power consumption is the cause of the longer render time - or a consequence of it.
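
The correlation described above can be reproduced from the three averaged data points in this comment. The following is a minimal Python sketch, not the Excel workbook actually used; the function name `r_squared` and the conversion of render times to seconds are assumptions made for illustration.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a simple linear least-squares fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # For a one-variable linear fit, r^2 equals the squared correlation coefficient.
    return (sxy * sxy) / (sxx * syy)

# Figures reported above for Blender 2.73, 2.76 and 2.77.
render_time_s = [65, 79, 122]             # 1:05, 1:19, 2:02
mem_controller_load = [30.1, 28.2, 17.6]  # average %
power_consumption = [59.4, 57.2, 53.8]    # average % TDP

r2_mem = r_squared(mem_controller_load, render_time_s)
r2_power = r_squared(power_consumption, render_time_s)
print(f"render time vs memory controller load: r^2 = {r2_mem:.3f}")
print(f"render time vs power consumption:      r^2 = {r2_power:.3f}")
```

With only three points per series, a high r-squared is weak evidence, and it says nothing about which quantity drives the other, which matches the caveat above.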

pXd commented 7 years ago

In #45093#374717, @Moony wrote:
I have just logged the results of my testing to a file so I could analyse them in Excel.

I re-ran the test again on 2.73, 2.76 and 2.77

Render time:
1:05, 1:19, 2:02

Average Memory Controller Load (%)
30.1%, 28.2%, 17.6%

Average Power Consumption (% TDP - 'Thermal Design Power'......apparently)
59.4%, 57.2%, 53.8%

I plotted render time vs power consumption and memory controller load - and there is a good correlation with a linear least squares fit (especially for render time vs Memory Controller load which has an r-squared value greater than 0.99).

It seems that the memory controller loading is much lower in 2.77 compared to 2.76 and 2.73. The power consumption of the card is also much lower (meaning it is being underutilised?)

I guess what this data does not answer is whether the lower memory controller load and power consumption is the cause of the longer render time - or a consequence of it.

I would have to agree. My power consumption is low on the 980ti, low enough that the fans stay near idle when rendering, which isn't the case on my 760 and isn't the case when rendering in iray on the 980ti, though as Sergey mentioned you can't really compare with iray. Also, I'm not sure if this is what you're referring to with the mem controller, but my scene takes a lot longer to load onto the 980ti.

kOsMos commented 7 years ago

In #45093#374684, @Ton wrote:
Roman: Octane is a CUDA-only render engine, it probably uses much smaller kernels - it's a completely different architecture.

Further: I have witnessed Sergey doing tests, and he did an incredibly thorough job, spending days on it.
This is not in our hands anymore.

**The error is in the Nvidia drivers for Windows 10. We cannot fix this. Tell Nvidia. Tell Microsoft. Or use Linux or Windows7.**

Got under 20s with Titan X in Windows 8.1. Tiles of 512x512 seem to give the best result.
Why has it not been documented in the manual yet that there is an issue with Windows 10 and that people should use Win 7 or 8.1 for optimal performance? This should be in bold letters on every page! :) This is not a small thing to ignore: going from 56s in Windows 10 to 19.55s (at stock GPU clocks) in Windows 8.1 is a huge time saving. So Linux should be even faster? Same results in v2.77 and 2.76 under Windows 8.1!

Win: 8.1 Pro
GPU: Titan X
Blender: v2.77a and v2.76
Scene: BMWM1 Mike Pan
Tiles: 512x512
Memory Controller Load: 56% (max)
Power Consumption: 75% (max)
GPU Load: 99%
Make sure you set GLSL under Images Draw Method for even faster renders; this sped up my test render by 2 seconds, basically 10%!

So what does this prove? It seems to point to the GeForce drivers for Windows 10 having a bug with Maxwell GPUs, as @Ton pointed out. I have sent a ticket to Nvidia support and will report back once I hear from them!

Here is more good stuff:

When rendering in Windows 10, the Titan X is processing 7 samples per second, but in Windows 8.1 it is processing at least 21 to 25+ !!!

GPU Memory Controller Load in Windows 10 is max 22%, whereas in Windows 8.1 it is 56%.
Something is throttling the GPU in Cycles. I also tested Octane under Win 8.1 and it performs exactly the same as in Win 10.

In Octane, GPU Memory Controller Load is at 70%!!! So why does it "cap" at 22% in Cycles? This points more to a bug in Cycles, an incompatibility with Windows 10, than to the Nvidia driver. We will find out soon.

I think we are getting very close to where this bug is! :)

@Moony thanks for pointing out the Memory Controller Load, I think this is a huge lead!

Sergey commented 7 years ago
Owner

Issues with speed regression on the same OS, same GPU and everything else belong to #47808. I'll be checking it today.

It is important to separate two completely orthogonal cases:

  1. Render time difference caused by ongoing Blender development.
    These can be detected by running different versions of Blender on the exact same machine. This seems to be in the scope of #47808, and not this report. We'll look into this issue, and the first guess is that it's caused by higher register pressure from enabling SSS by default.
  2. Render time difference of the same Blender version on different OS/hardware.
    This is what this report is about, and something we concluded is not under our control.

Replying to the power consumption theory: drawing conclusions from memory controller load or power consumption is totally misleading. A kernel can be using local registers and not accessing global memory. Even though that causes lower load on the memory controller, it is actually the ideal case: all the data is local and can be accessed without the penalties of accessing global memory.
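
The register-pressure guess mentioned above can be made concrete with some arithmetic. The sketch below is a simplified occupancy estimate in Python, not anything taken from Cycles or the CUDA toolkit. It assumes the published limits for Maxwell-class (compute capability 5.x) GPUs of 65,536 32-bit registers and 2,048 resident threads per multiprocessor, ignores register allocation granularity and shared memory limits, and uses made-up registers-per-thread counts; the real figures for the Cycles kernels are not given in this report.

```python
REGISTERS_PER_SM = 65536    # assumed: 32-bit registers per multiprocessor (compute capability 5.x)
MAX_THREADS_PER_SM = 2048   # assumed: resident thread limit per multiprocessor

def resident_threads(registers_per_thread):
    """Upper bound on resident threads per SM when registers are the only limit."""
    by_registers = REGISTERS_PER_SM // registers_per_thread
    return min(MAX_THREADS_PER_SM, by_registers)

def occupancy(registers_per_thread):
    """Fraction of the per-SM thread limit that can stay resident."""
    return resident_threads(registers_per_thread) / MAX_THREADS_PER_SM

# Hypothetical register counts, chosen only to show the trend.
for regs in (32, 64, 128, 255):
    print(f"{regs:3d} registers/thread -> {occupancy(regs):.0%} occupancy")
```

In this simplified model, a kernel that needs more registers per thread leaves fewer threads resident to hide latency, which is one way a feature enabled by default could slow rendering on unchanged hardware and drivers.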

Additionally, Blender can NOT control any power settings. Just period. It's all under the driver settings and such.

Another possibility here is that it's something to do with the switch to a newer CUDA toolkit.

But please, don't make a mess of the bug reports. Again, it's not a forum; developers need time to deal with all the piling bugs and (most importantly) need to be able to reproduce the bug. We will be looking into the reported speed regression on the same hardware configuration. Please just be patient.

pXd commented 7 years ago

@Sergey

Let me know if you want me to do any specific linux tests with my same Win 10 standalone 980ti rig, I'll just boot into a live/persistent usb.

jammer commented 7 years ago

So that we don't lose momentum on this I've created a [Blender Artists thread](http://blenderartists.org/forum/showthread.php?399045-Windows-10-amp-GTX-980TI-Titan-X-Owners-What-to-do)

And a [GeForce forum thread](https://forums.geforce.com/default/topic/936396/geforce-drivers/geforce-gm200-980ti-titanx-amp-windows-10-performance-blender-3d-etc/)

I cannot find a suitable place to report this to Microsoft. Anyone know?
brecht commented 7 years ago
Owner

I reported this bug to NVidia and they're asking if the problem still exists with the latest driver version (365.19). Can anyone confirm if that's the case?

The problem is the same with the latest driver.

In #45093#374724, @kOsMos wrote:
Got under 20s with Titan X in Windows 8.1. Tiles 512x512 seem to give best result.

Are you talking about the new BMW scene with two cars (BMW27.blend)?

There are lots of reports for this scene to take more than one minute with blender 2.77, Linux and a single Titan X. On my system I get 1:06 (which is not significantly better than the results with a 760ti). 20s would be really satisfying.

Update: I just did a test on Win 8.1 Enterprise on my Titan X (driver version 365.19): 1:08 min. There is no big difference between Linux and Win 8.1 on my computer.

Moony commented 7 years ago

In #45093#374863, @mattcawte wrote:
The problem is the same with the latest driver.

Yep I concur - I updated my drivers to 365.19 last night. Results today are the same (I have posted some figures on the Blenderartists thread).

kOsMos commented 7 years ago

In #45093#374864, @Klaus-4 wrote:

In #45093#374724, @kOsMos wrote:
Got under 20s with Titan X in Windows 8.1. Tiles 512x512 seem to give best result.

Are you talking about the new BMW scene with two cars (BMW27.blend)?

There are lots of reports for this scene to take more than one minute with blender 2.77, Linux and a single Titan X. On my system I get 1:06 (which is not significantly better than the results with a 760ti). 20s would be really satisfying.

Update: I just did a test on Win 8.1 Enterprise on my Titan X (driver version 365.19): 1:08 min. There is no big difference between Linux and Win 8.1 on my computer.

I use the old BMW1M-MikePan.blend scene.

There is no difference with the latest GeForce driver, 365.19.

Here are results with BMW27.blend file

Win 10
240x136 1:56 (Default)

Win 8.1
240x136 1:01 (Default)
512x512 0:51
480x540 0:49
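
To put the tile-size figures above on a common footing, the times can be converted to seconds and expressed relative to the Windows 10 default run. This is only an illustrative sketch; the helper name `to_seconds` and the dictionary layout are assumptions, and the numbers are copied from the list above.

```python
def to_seconds(mmss):
    """Convert an 'm:ss' render time string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

# BMW27.blend times reported above, keyed by (OS, tile size).
results = {
    ("Win 10", "240x136"): "1:56",
    ("Win 8.1", "240x136"): "1:01",
    ("Win 8.1", "512x512"): "0:51",
    ("Win 8.1", "480x540"): "0:49",
}

baseline = to_seconds(results[("Win 10", "240x136")])
for (os_name, tiles), t in results.items():
    secs = to_seconds(t)
    print(f"{os_name:8s} {tiles:8s} {secs:4d}s  {baseline / secs:.2f}x vs Win 10 default")
```

Read this way, most of the gap comes from the change of OS at the same default tile size, with tile size tuning adding a smaller improvement on top.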

No change with GeForce driver 365.19. All results are the same as I reported before.

brecht commented 7 years ago
Owner

Thanks! I've reported it back to NVidia.

jammer commented 7 years ago

Just for completeness - I tried 365.19 as well, no change in performance.

sasa42 commented 7 years ago

Added subscriber: @sasa42

sasa42 commented 7 years ago

BMW27.blend.
Hard slowdown here too.

2.73a
364.72
64GB

CPU: Intel Core(TM) i7-5960X CPU @ 3.00GHz @4.17GHz, 8 cores, 16 logical
GPU: GeForce GTX TitanX
OS: Windows 10 64bit

Time: 0 min 23 seconds (4x GeForce GTX TitanX - CUDA) 240x180 Tiles Auto
Time: 1 min 02 seconds (1x GeForce GTX TitanX - CUDA) 240x180 Tiles Auto

Time: 1 min 51 sec (CPU) 32x32 Tiles Auto

2.76
364.72
64GB

CPU: Intel Core(TM) i7-5960X CPU @ 3.00GHz @4.17GHz, 8 cores, 16 logical
GPU: GeForce GTX TitanX
OS: Windows 10 64bit

Time: 0 min 29 seconds (4x GeForce GTX TitanX - CUDA) 240x180 Tiles Auto
Time: 1 min 32 seconds (1x GeForce GTX TitanX - CUDA) 240x180 Tiles Auto

Time: 1 min 57 sec (CPU) 32x32 Tiles Auto

2.77a
364.72
64GB

CPU: Intel Core(TM) i7-5960X CPU @ 3.00GHz @4.17GHz, 8 cores, 16 logical
GPU: GeForce GTX TitanX
OS: Windows 10 64bit

Time: 0 min 46 seconds (4x GeForce GTX TitanX - CUDA) 240x180 Tiles Auto
Time: 2 min 24 seconds (1x GeForce GTX TitanX - CUDA) 240x180 Tiles Auto

Time: 1 min 56 sec (CPU) 32x32 Tiles Auto
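
The three result sets above can be compared directly by computing the slowdown between Blender versions and how well four GPUs scale over one. The following Python sketch only rearranges the times listed above; the function names `slowdown` and `scaling_efficiency` and the dictionary keys are assumptions for illustration.

```python
# Times in seconds from the figures above (Windows 10, driver 364.72, BMW27.blend).
times = {
    "2.73a": {"gpu_x4": 23, "gpu_x1": 62, "cpu": 111},
    "2.76":  {"gpu_x4": 29, "gpu_x1": 92, "cpu": 117},
    "2.77a": {"gpu_x4": 46, "gpu_x1": 144, "cpu": 116},
}

def slowdown(t_new, t_old):
    """How many times longer the newer run takes than the older one."""
    return t_new / t_old

def scaling_efficiency(t_single, t_multi, n_gpus):
    """Speedup from n_gpus divided by n_gpus; 1.0 would be perfect scaling."""
    return (t_single / t_multi) / n_gpus

old, new = times["2.73a"], times["2.77a"]
for device in ("gpu_x1", "gpu_x4", "cpu"):
    print(f"{device}: 2.77a takes {slowdown(new[device], old[device]):.2f}x the time of 2.73a")

for version, t in times.items():
    eff = scaling_efficiency(t["gpu_x1"], t["gpu_x4"], 4)
    print(f"{version}: 4-GPU scaling efficiency {eff:.0%}")
```

On these figures the CPU time is nearly unchanged between 2.73a and 2.77a while the single-GPU time more than doubles, and in 2.77a a single TitanX (2 min 24 s) is slower than the CPU (1 min 56 s).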

Removed subscriber: @NachoConesa

Added subscriber: @tyanksar

brecht commented 7 years ago
Owner

NVidia have now been able to confirm the issue, and they have assigned it to the appropriate developer team. They had to put in quite some effort before they succeeded in reproducing this issue, testing with various GPUs, so I think they're taking this seriously.

That's good news, thanks very much for referring the problem to them.

pXd commented 7 years ago

@brecht Thanks for the update, looking forward to getting this sorted!

That's very good news.
After months of testing, it's good to hear that there actually is a problem and that it was identified by the card supplier.

I really hope they'll find the solution very soon and we'll be able to enjoy our investment :)

Thanks for having pushed nvidia.

kOsMos commented 7 years ago

I've been communicating with Nvidia Support for several days now, and I also made a video for their dev team showing exactly how to reproduce the issue in Win 8.1 and Win 10. This is the last worthwhile response I received from them. Supposedly the error has something to do with WDDM 2.0, since that's what Windows 10 uses. Windows 7/8/8.1 use WDDM 1.1/1.2/1.3. yw :D

Subject
Rendering time higher in windows 10

Discussion Thread
Response Via Email (Farzana) 05/23/2016 06:07 AM
Hello,

Your case is being escalated to our Level 2 Technical Support group for further attention. The Level 2 agents will review the case notes to troubleshoot the issue and find a solution or workaround. As this process may take some time and require a good deal of testing and research, we ask that you be patient. A Level 2 tech will contact you as soon they can to assist or point you in the right direction.

Best Regards,

- Farzana
NVIDIA Customer Care

In #45093#375552, @kOsMos wrote:
I've been communicating with Nvidia Support for several days now, and I also made a video for their dev team showing exactly how to reproduce the issue in Win 8.1 and Win 10. This is the last worthwhile response I received from them. Supposedly the error has something to do with WDDM 2.0, since that's what Windows 10 uses. Windows 7/8/8.1 use WDDM 1.1/1.2/1.3. yw :D

Subject
Rendering time higher in windows 10

Discussion Thread
Response Via Email (Farzana) 05/23/2016 06:07 AM
Hello,

Your case is being escalated to our Level 2 Technical Support group for further attention. The Level 2 agents will review the case notes to troubleshoot the issue and find a solution or workaround. As this process may take some time and require a good deal of testing and research, we ask that you be patient. A Level 2 tech will contact you as soon they can to assist or point you in the right direction.

Best Regards,
-Farzana
NVIDIA Customer Care

Remember this is a Mac issue too. Not solely windows. With that perspective in mind it might help nvidia narrow down the issue.

kOsMos commented 7 years ago

Response Via Email (Troy) 05/24/2016 04:34 PM

Hello,

NVIDIA QA has reproduced the problem and Engineering is investigating it.

Best Regards,

Troy
NVIDIA Customer Care

Sergey commented 7 years ago
Owner

There is no reason to paste each occasion when NVidia support accepts your issue; we already know that they accepted it. Respect others' time, and the time of developers who'll need to scroll through all the comments here trying to extract meaningful information.

Anyway, did anyone try the 368.22 driver? From some feedback, it seems to bring performance improvements.


2X Titan X
Windows 10 Home
Blender 2.77 2016-05-24 (nightly build from buildbot)
Nvidia Driver: 368.22

I upgraded yesterday to 368.22 and noticed this morning that the viewport sampled much faster, so I ran the bmw27 benchmark scene. I went from ~1m15s with my dual Titan X setup (which, FYI, was slower than my dual GTX 580 at 1m11s) down to 36s, and now to 33s with larger tiles. (The issue before was that large tiles worked really well on Fermi cards, while on Maxwell small tiles gave a slightly better time.)

![74bd558ac09d410da652424c39025877.png](https://archive.blender.org/developer/F314773/74bd558ac09d410da652424c39025877.png)

On a side note, I used to get UI lag when using all cards for rendering, leaving none for the OS. Now it's buttery smooth, and they peak at 75C. Whatever they did, it's a move in the right direction; for me with GM200 / Titan X it's a huge boost.

kOsMos commented 7 years ago

> In #45093#375810, @Sergey wrote:
> There is no reason to paste each occasion when NVidia support accepts your issue; we already know that they accepted it. Respect others' time, and the time of developers who'll need to scroll through all the comments here trying to extract meaningful information.
>
> Anyway, did anyone try the 368.22 driver? From some feedback, it seems to bring performance improvements.

You can't be serious?
A lot of people here are curious to know that Nvidia Support was able to "reproduce" this problem, which is, I dunno man, pretty darn good information. Besides, I am not posting "each occasion", only what is important. If you really want, I can post the entire email thread here, which has over 10 replies. I also made a video outlining step by step for them how to reproduce this problem. So I dunno what you are talking about with respecting others' time here. I am the one who put in the effort of making a how-to video for them, so don't talk to me about respecting others' time.

Added subscriber: @roman


Also updated the drivers to 368.22.
Windows 10 Professional edition.

Here are the results with bmw27 - 400 samples

Titan Black, tiles: 256x256
1 minute 09 sec

TITAN X, tiles: 256x256
2 minutes 39 sec

I've tested on other scenes of my own.
TITAN X performance is still really bad on my workstation.

Looking forward to a new driver and Nvidia's updates.

@roman @Sergey, please guys, don't get pissed. I believe everyone is tired of this problem, as we have all invested in these cards and can't solve it.
We all appreciate everything you're doing to solve this.

Pierrick

pXd commented 7 years ago

@Sergey

Just tested the new drivers (368.22) on the EVGA 980ti and EVGA 760.

Mike's BMW 2.7 scene:

My previous time was 1:54 on the 980ti.
My previous time was 2:07 on the 760.

New driver time was 1:56 on the 980ti.
New driver time was 2:06 on the 760.

Though for accuracy: this time the 980ti is running the desktop.

Doesn't seem like much improvement for the 980ti.

Tile sizes were unaltered: 240 x 136

Sergey commented 7 years ago
Owner

Interesting results, and quite weird. It might not be related to the drivers, but to the work @brecht did to reduce stack memory usage on the GPU.

So perhaps one needs both the latest driver AND the latest builds from builder.blender.org?


Testing with the latest Blender build 2.77.1 (F2ba 139) - WINDOWS 10

bmw27 - 400 samples - tiles 256x256

Titan Black
1 minute 09 sec.

TITAN X (HOLD YOUR BREATH!!!!)
1 minute 06 sec.

This is the first time I have seen better performance with the TITAN X, and compared to the official release (2 minutes 39 sec) it's more than 2x faster.

Using TITAN X, tiles: 512x512
56 sec.

Using TITAN X + Titan Black, tiles: 256x256
35.88 sec.

More great news: when using both cards in viewport rendering, it works very, very fast, whereas before, using both cards was slower than using only the Titan Black (as if the X was slowing everything down).
I've tested other scenes with much more complex shaders and textures and it works very, very well.

So it seems this build works wayyyyyyy better than the current official release.
Really looking forward to your analysis!
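
For anyone who wants to sanity-check the two-card number above, here is a rough sketch in Python. It assumes (my assumption, not something measured in this thread) that an ideal multi-GPU render simply adds up each card's solo throughput; the function name is made up for illustration. The timings are the ones posted above.

```python
def ideal_combined_seconds(*solo_seconds):
    """Ideal render time if every card contributes its full solo throughput.

    Assumption: throughput (1 / time) adds up linearly across cards.
    """
    return 1.0 / sum(1.0 / s for s in solo_seconds)


# Timings reported above for bmw27, 400 samples, 256x256 tiles.
titan_black = 69.0     # 1 minute 09 sec
titan_x = 66.0         # 1 minute 06 sec
measured_both = 35.88  # both cards together

ideal = ideal_combined_seconds(titan_black, titan_x)
efficiency = ideal / measured_both

print(f"ideal: {ideal:.1f}s  measured: {measured_both}s  efficiency: {efficiency:.0%}")
```

Under that assumption the ideal time is a bit under 34 seconds, so the measured 35.88 sec is close to what the two cards could deliver together, which fits the observation that the TITAN X no longer drags the pair down.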

Sergey commented 7 years ago
Owner

Well, kudos to @brecht for that :) And big ones ;)

Now the important question: is it the same speed on Win10, Win7 and Linux? That still needs to be figured out, because who knows, maybe Linux is still 2x faster.

pXd commented 7 years ago

@Sergey

Just downloaded the latest build from builder.blender.org

Definitely faster on the 980ti:

Mike's 2.7 scene: 1:06!

That's 50 seconds faster than the official build for me.

kOsMos commented 7 years ago

> In #45093#375819, @pXd wrote:
> @Sergey
>
> Just downloaded the latest build from builder.blender.org
>
> Definitely faster on the 980ti:
>
> Mike's 2.7 scene: 1:06!
>
> That's 50 seconds faster than the official build for me.

As I suspected, a bug within Blender/Cycles.

New speed record! 18.96s on Titan X, 480x540 tiles, Windows 10 Pro with the latest Blender build. BMW1M-MikePan.blend
This is 1s faster than Windows 8.1 Pro.

So what has changed? Why is the Titan X all of a sudden even faster than the 780 Ti?
jammer commented 7 years ago

Windows 10
NVidia 362.22
Blender 2.77.0 (Nightly - Thu May 26 04:46:46 2016)

160 x 160 - 1:24
180 x 180 - 1:14
240 x 180 - 1:09 (down from 2:10)
240 x 240 - 1:09
480 x 480 - 1:03
512 x 512 - 1:04

Using Auto Tile set to Custom with a Target Size of 480

480 x 270 - 1:01

Using Auto Tile set to Custom with a Target Size of 540

480 x 540 - 1:02
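
If it helps anyone compare tile sizes, the list above can be ranked with a few lines of Python. This is only a sketch over the timings posted in this comment; the `to_seconds` helper and the dictionary layout are made up for illustration and have nothing to do with Blender's own API.

```python
def to_seconds(mmss):
    """Convert an 'm:ss' string such as '1:09' into whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# (tile_x, tile_y) -> render time, copied from the list above.
results = {
    (160, 160): "1:24",
    (180, 180): "1:14",
    (240, 180): "1:09",
    (240, 240): "1:09",
    (480, 480): "1:03",
    (512, 512): "1:04",
    (480, 270): "1:01",
    (480, 540): "1:02",
}

# Fastest tile size in this run.
best_tile = min(results, key=lambda tile: to_seconds(results[tile]))

# The 240 x 180 run was reported as "down from 2:10".
old_240x180 = to_seconds("2:10")
new_240x180 = to_seconds(results[(240, 180)])
speedup_240x180 = old_240x180 / new_240x180

for tile, time in sorted(results.items(), key=lambda kv: to_seconds(kv[1])):
    print(f"{tile[0]} x {tile[1]}: {to_seconds(time)}s")
print(f"best: {best_tile[0]} x {best_tile[1]}, 240 x 180 speedup: {speedup_240x180:.2f}x")
```

The spread between the larger tile sizes is only a few seconds, so run-to-run noise may matter as much as the exact tile size here.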

Collaborator

Added subscriber: @mano-wii


5/26 build of blender 2.77

Nvidia 368.22 drivers

52.92 seconds at 512x512 on Windows 10 Pro 64bit with EVGA 980 Ti SC ACX 2.0

52.99s 2nd run


Nothing was changed from my side,

Before and after installing driver 368.22: 1 min 29 sec
Card: nVidia GTX GeForce Titan X
Tiles: 128X128
Blender version: 2.77a

kOsMos commented 7 years ago

> In #45093#375971, @tyanksar wrote:
> Nothing was changed from my side,
>
> Before and after installing driver 368.22: 1 min 29 sec
> Card: nVidia GTX GeForce Titan X
> Tiles: 128X128
> Blender version: 2.77a

You need to download the latest build: https://builder.blender.org/download/

> In #45093#375972, @kOsMos wrote:
>> In #45093#375971, @tyanksar wrote:
>> Nothing was changed from my side,
>>
>> Before and after installing driver 368.22: 1 min 29 sec
>> Card: nVidia GTX GeForce Titan X
>> Tiles: 128X128
>> Blender version: 2.77a
>
> you need to download the latest build https://builder.blender.org/download/

Thank you Roman,
I just downloaded the latest build, but now rendering takes 8 sec longer:

1 min 37 sec

OS: Windows 10 64-bit

Just tested on Linux & Windows 10 with latest blender and latest drivers, can confirm this issue now seems to be resolved \o/

Before (Build from March):

  • Win10 (driver v361.91): 02:00
  • Win10 (driver v368.22): 01:58
  • Linux (352.63): 01:04

Now (Build from yesterday):

  • Win10 (368.22): 00:54
  • Linux (352.63): 00:53

Also tested the other 2 blends on the spreadsheet and added my results: https://docs.google.com/spreadsheets/d/1KS4Ew6wfNmGHVQ_GPUmvdBpuvgKIzgwn_yMV6j_rzJ0/edit?usp=sharing

Updating only the driver did not help, but updating to the new Blender build fixed it :)

I didn't test whether the old driver with the new Blender build works too.
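
To put numbers on the before/after comparison above, here is a small Python sketch that converts the posted times to seconds and computes the speedup per OS/driver combination. The helper name and the dictionary keys are just labels chosen for illustration.

```python
def to_seconds(mmss):
    """Convert an 'mm:ss' string such as '01:58' into whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Times posted above, keyed by OS and driver version.
before = {  # build from March
    "Win10 (361.91)": "02:00",
    "Win10 (368.22)": "01:58",
    "Linux (352.63)": "01:04",
}
after = {  # build from yesterday
    "Win10 (368.22)": "00:54",
    "Linux (352.63)": "00:53",
}

# Speedup of the new build over the old one, same OS and driver.
speedups = {
    config: to_seconds(before[config]) / to_seconds(after[config])
    for config in after
}

# Remaining gap between Win10 and Linux on the new build.
gap_seconds = to_seconds(after["Win10 (368.22)"]) - to_seconds(after["Linux (352.63)"])

for config, factor in speedups.items():
    print(f"{config}: {factor:.2f}x faster with the new build")
print(f"Win10 vs Linux gap on the new build: {gap_seconds}s")
```

On these figures Win10 gains roughly 2.2x and Linux roughly 1.2x, and the two end up about one second apart.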

Sergey commented 7 years ago
Owner

@GregZaal, thanks, very good comparison :)

So it seems the changes from @brecht not only improved memory usage and led to a speedup on Linux, but also made the Win10 drivers happy.

However, it is important to understand that the Win10 drivers are still considered broken and the NVidia guys are looking into this. The thing here is: while we managed to reduce stress on the GPU and made the drivers happy, when we increase the complexity of the kernel again (upcoming mipmaps, denoise, microdisplacement...) we pretty much risk running into the exact same speed issue.

jammer commented 7 years ago

@Sergey Agreed. As far as I can tell from what I've read, NVidia had to incorporate some form of shim into the driver due to the async DX12 problem. I'm not completely up to speed on this, but I think there are some issues with how this shim is affecting things like performance.

It's also interesting that this is a memory-related thing, since I have a blend here using a large model which refused to render on the 980ti, but in these newer builds it flies through the GPU.

brecht commented 7 years ago
Owner

Changed status from 'Open' to: 'Resolved'

brecht closed this issue 7 years ago
brecht commented 7 years ago
Owner

I think we can close this report now and consider it resolved. Certainly we'll have to keep an eye on this to ensure it doesn't break again, and hopefully NVidia can improve things on their side. But latest Blender builds should be working OK now with the 980 Ti.

Various other issues came up here and those can get their own reports if they are confirmed and worth investigating further, but not all 52 subscribers of this report need to be involved.

Thanks all for the tests & patience.


> In #45093#376390, @brecht wrote:
> I think we can close this report now and consider it resolved. Certainly we'll have to keep an eye on this to ensure it doesn't break again, and hopefully NVidia can improve things on their side. But latest Blender builds should be working OK now with the 980 Ti.
>
> Various other issues came up here and those can get their own reports if they are confirmed and worth investigating further, but not all 52 subscribers of this report need to be involved.
>
> Thanks all for the tests & patience.

I'm glad things are considered resolved, but I reported a problem on March 15th (Task # #47807 and #47808) where I had slow renders on a MacPro using the GTX 980 Ti and Titan X -- especially with Fire. I know fire rendering was just added in 2.77, but rendering fire on the GPU takes 10-20x longer than with 8-core 3GHz CPUs (literally). A few days later it was merged with this task because the issues were thought to be similar. Now this task is resolved, but I still have $2400 of new GPUs I can't use. And with the 1080 coming out soon, the money is practically lost entirely. My intention is not to blame anyone here for the loss, but the point is that this is a serious problem for us. We have this issue on 4 different machines running 2.77a and OSX. I reported it on March 15th, followed up on it and did several render tests on my end, and to date nothing has been done to address it. Can someone please acknowledge this, and can we get the ball rolling on troubleshooting it, please?
brecht commented 7 years ago
Owner

@LMProductions-1, have you tried the latest OS X builds from builder.blender.org and confirmed that your issue still exists?

If it does still exist, please comment on the other ticket with render time comparisons between the CPU and GPU for a .blend that we can test, and we can reopen it as a to do item. Please do understand though that this Windows 10 issue got a lot of developer attention because it affects many users and every type of scene. If GPU smoke rendering on OS X is slow and only confirmed by one user, it's unlikely to be prioritized by developers over the hundreds of other tasks they have on their list.


> In #45093#376556, @brecht wrote:
> @LMProductions-1, have you tried the latest OS X builds from builder.blender.org and confirmed that your issue still exists?
>
> If it does still exist, please comment on the other ticket with render time comparisons between the CPU and GPU for a .blend that we can test, and we can reopen it as a to do item. Please do understand though that this Windows 10 issue got a lot of developer attention because it affects many users and every type of scene. If GPU smoke rendering on OS X is slow and only confirmed by one user, it's unlikely to be prioritized by developers over the hundreds of other tasks they have on their list.

Render times and a .blend file were uploaded to the task in March. Another user also tested the .blend and confirmed he had the issue as well -- and again, all this was posted in the thread in March. So if you can please re-open the task, that would be a good start.
Ton commented 7 years ago
Collaborator

Mike: this is an open source project and a lot of people here spend their free time on Blender development and debugging. The goal of reporting bugs is to help make Blender better for everyone.

I would also complain to Apple, Nvidia or AMD if you have issues. After all, they took your money.


Removed subscriber: @JoostBouwer


Removed subscriber: @TomTuko


Removed subscriber: @MikePan


Removed subscriber: @MartinLindelof

pXd commented 7 years ago

Removed subscriber: @pXd


Thanks to everyone involved in this, especially Brecht, Ton, Sergey and of course Dingto:

Performance here (Win7) is (noticeably) better with the new 2.77 builds with 980Ti than in 2.76b!!!
Also there is a BIG performance increase in scene preparation time!

Thank you again!

best regards,
Karsten Bitter


Removed subscriber: @Eranekao
