Some benchmarks considering Metal device type show unrealistic values in Blender Open Data #96519

Open
opened 2022-03-16 13:28:13 +01:00 by ExLibris · 32 comments

System Information
Operating system: n/a
Graphics card: n/a

Blender Version
Broken: n/a
Worked: n/a
For a few days now it has been possible to submit benchmarks with Metal as device_type. In some cases the samples_per_minute values seem quite unrealistic.
This appears only on the Junkshop scene with the Metal device type. In these cases the samples_per_minute values are between 10,000 and 700,000, while all other values fall in the range shown in the attached picture.

ID-SUBMISSION for the top-10 scores:

| id_submission | Samples per minute |
| -- | -- |
| 90596dba-2af3-4e24-988a-140b0565c203 | 710,228.40 |
| 62dfe033-a81f-4e7c-bed3-ddc62ab165a0 | 648,278.30 |
| d2deef41-29c1-4878-b11d-153758945712 | 597,953.50 |
| a3a89bd2-beec-4ac6-a4a6-17aed94e6f03 | 571,592.70 |
| 1c34e618-59fd-4772-b364-1b2f8c870f54 | 567,603.60 |
| a6ed0d61-8d61-458e-acba-ecc5a56a7dac | 547,242.40 |
| 5ba65bc7-d84f-4370-aad0-4bb6c05fdb5f | 515,224.80 |
| f8a1c1a5-fafe-4d78-897e-6804c9f2e604 | 514,256.00 |
| 44ef2573-677a-42fb-9997-e291480e48cf | 486,760.50 |
| 03b7ab96-c1f8-41ad-9da9-25cf293ca425 | 483,787.20 |

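![Snapshot of List table - Render method 1 03-16-2022 at 01.25.41 PM.png](https://archive.blender.org/developer/F12930097/Snapshot_of_List_table_-_Render_method_1_03-16-2022_at_01.25.41_PM.png)

Another nonsensical result:
![Image PNG.jpeg](https://archive.blender.org/developer/F12931235/Image_PNG.jpeg)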

Author

Added subscriber: @ExLibris

#99134 was marked as duplicate of this issue

infrastructure/blender-open-data#96548 was marked as duplicate of this issue

Changed status from 'Needs Triage' to: 'Confirmed'

Added subscriber: @MHEonM1

Author

The unrealistic values are still being submitted. Because of the very high values, the overall scores of Apple benchmarks are not really usable:

![image.png](https://archive.blender.org/developer/F12944318/image.png)

Just to add a clarification:

On Junkshop, GPU scores are unrealistically high while CPU scores are unrealistically low.

It looks like someone tried to fix the bug by putting a divisor somewhere in the CPU benchmark results, thinking they were acting on the GPU benchmark results.

Author

I cannot see the low overall CPU scores for the Apple M1 on Junkshop (see screenshot), but I do notice quite a large difference between the lowest and highest scores. Is your score specific to your own benchmark run?

![image.png](https://archive.blender.org/developer/F12945072/image.png)

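Sorry for the late response. Here are some screenshots of what I get in Open Data:

CPU:
![Capture d’écran 2022-03-28 à 13.51.10.png](https://archive.blender.org/developer/F12952310/Capture_d_e_cran_2022-03-28_a__13.51.10.png)

GPU:
![Capture d’écran 2022-03-28 à 13.54.16.png](https://archive.blender.org/developer/F12952323/Capture_d_e_cran_2022-03-28_a__13.54.16.png)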

Author

No problem :)

Just like you, I am a user who submitted a bug report. The difference is that you saw the error in your own benchmark, while I saw a number of errors in the data. Both observations seem to have the same origin. At least, we both see unrealistic values in Apple M1 benchmarks for Metal rendering on the Junkshop scene. You also see an unrealistic value for CPU in your benchmark. I cannot see that directly in the data, although I notice a larger scatter in the values, as my graph from last Friday shows. I think we need some input from the developers of the tool here. This bug has been triaged by Blender and the current status is "Confirmed, Normal", as you can see at the top of the screen. I am not sure what that means. If Normal means "this is normal behaviour", we have a problem. But if it means "treat as a normal bug", then we can expect some help from them.

Concerning the CPU score of Junkshop:

I rendered the scene directly in Blender 3.1. You can see it took 8:44.
![Capture d’écran 2022-03-28 à 15.21.13.png](https://archive.blender.org/developer/F12952422/Capture_d_e_cran_2022-03-28_a__15.21.13.png)

On the other hand, it took 12:02 to render Classroom.
![Capture d’écran 2022-03-28 à 15.44.18.png](https://archive.blender.org/developer/F12952428/Capture_d_e_cran_2022-03-28_a__15.44.18.png)

Also, I tried to render Junkshop using the GPU only. The render stopped after 20 s, as if it had finished with no errors. This might explain why the score is so high in the GPU benchmark.

Being a developer myself, I think "Confirmed, Normal" means "bug confirmed, normal priority". We'll see!

Author

That is useful info for the Metal/Blender developers, I think, if they are going to look into this bug.

The problem with this behaviour is that it does not always occur. So far the data contains 128 submissions of Junkshop benchmarks with the M1 CPU/GPU.

In 57 cases the score is over 500 (550-700,000) and in 71 cases the values seem OK (41-51). There is not much else that I can see in the data, other than a slightly higher peak memory usage.

![image.png](https://archive.blender.org/developer/F12952447/image.png)
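For anyone who wants to repeat this split on an export of the data, here is a minimal sketch. It only assumes you have the samples-per-minute scores as a list of numbers. The function name and the example values are hypothetical; the cut-off of 500 is the one used in the comment above.

```python
def split_by_threshold(scores, threshold=500.0):
    """Partition samples-per-minute scores into (plausible, suspicious).

    The threshold is a plain cut-off, not a statistical outlier test.
    """
    plausible = [s for s in scores if s <= threshold]
    suspicious = [s for s in scores if s > threshold]
    return plausible, suspicious


# Illustrative input only: two values at the edges of the 41-51 range,
# plus the three highest scores from the top-10 list in the report.
example_scores = [41.0, 51.0, 710228.40, 648278.30, 597953.50]
plausible, suspicious = split_by_threshold(example_scores)
print(len(plausible), "plausible,", len(suspicious), "suspicious")
```

A fixed cut-off works here only because the two groups are several orders of magnitude apart. It would not be a good general-purpose outlier test.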

Author

Since April 10th I see no new unrealistic scores. I don't know if that is a coincidence.

![Screenshot 2022-04-20 112252.jpg](https://archive.blender.org/developer/F13012107/Screenshot_2022-04-20_112252.jpg)

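I am still getting nonsensical numbers (13-inch MBP M1, 8 GPU cores, 8 GB RAM), both on CPU and GPU.

![Capture d’écran 2022-04-20 à 14.27.50.png](https://archive.blender.org/developer/F13012320/Capture_d_e_cran_2022-04-20_a__14.27.50.png)

![Capture d’écran 2022-04-20 à 14.31.17.png](https://archive.blender.org/developer/F13012319/Capture_d_e_cran_2022-04-20_a__14.31.17.png)

![Capture d’écran 2022-04-20 à 14.28.37.png](https://archive.blender.org/developer/F13012318/Capture_d_e_cran_2022-04-20_a__14.28.37.png)

![Capture d’écran 2022-04-20 à 14.31.25.png](https://archive.blender.org/developer/F13012317/Capture_d_e_cran_2022-04-20_a__14.31.25.png)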

Added subscriber: @brecht

Likely there is an error while rendering, which causes the render to stop quickly, but this goes undetected and the render time is considered correct.

If this happens also with the command line benchmark, getting the full debug output would help identify the cause.

./benchmark-launcher-cli --verbosity 3

https://download.blender.org/release/BlenderBenchmark2.0/launcher/benchmark-launcher-cli-3.0.0-macos.zip

Here we go.

[blender open data CLI log.txt](https://archive.blender.org/developer/F13012522/blender_open_data_CLI_log.txt)

Still nonsensical numbers from the CPU rendering of Junkshop, as it takes 4 minutes less to compute that scene in Blender 3.1 than it takes to compute Classroom.

Added subscriber: @Michael-Jones

Thanks, there is indeed an error that doesn't seem to stop the render:

BLENDER: CommandBuffer Failed: cycles_metal_integrator_reset

@Michael-Jones I guess there are two issues to fix here. One is to stop the render on this error, and the other would be to figure out why the error happens in the first place. Would you have time to look into this?
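Until the render itself stops on this error, a result could in principle be rejected after the fact by scanning the debug log. The sketch below is only an illustration of that idea, not how Blender or the benchmark launcher handle it. The helper names are hypothetical, and the only string taken from the actual log is the `CommandBuffer Failed` message quoted above.

```python
# Marker text taken from the error line quoted in this thread.
FAILURE_MARKER = "CommandBuffer Failed"


def find_render_failures(log_text):
    """Return every log line that contains the failure marker."""
    return [line.strip() for line in log_text.splitlines()
            if FAILURE_MARKER in line]


def accept_result(samples_per_minute, log_text):
    """Return the score, or None when the log shows a failed command buffer."""
    if find_render_failures(log_text):
        return None
    return samples_per_minute


log = "BLENDER: CommandBuffer Failed: cycles_metal_integrator_reset\n"
print(accept_result(600000.0, log))  # score is rejected because the log has an error
```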

@MHEonM1 regarding CPU rendering time, which numbers are you looking at exactly?

The way the benchmark works now is to first render 1 sample to ensure any necessary kernels are compiled, and then to render a second time for 30 seconds and count the number of samples finished in that time. So how many minutes it takes to render shouldn't really be the metric.
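As a rough illustration of the metric described above (not the launcher's actual code), samples per minute follows from the number of samples finished in the timed window. The function name and the example sample count are hypothetical. The 30-second window is the one mentioned above.

```python
def samples_per_minute(samples_finished, elapsed_seconds=30.0):
    """Scale the samples finished in the timed window to a per-minute rate."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return samples_finished * 60.0 / elapsed_seconds


# Hypothetical example: 16 samples finished during the 30-second window.
print(samples_per_minute(16))
```

The warm-up render of 1 sample is deliberately left out of the calculation, since its purpose is only to get kernel compilation out of the way before timing starts.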

@brecht I'm looking at the number of samples per minute at the very end of the log.

It says 2.71 samples per minute for Junkshop and 26.11 for Classroom.

Recognizing that you are adding up all the scores to get the final score, either Junkshop has no weight in the final score (since it takes 1.5 times longer to render Classroom on the same CPU), or there is a problem with the Junkshop CPU score.

Ok, perhaps the junkshop scene is running out of memory and causing issues on both CPU and GPU. Swapping and running very slowly on the CPU, and just failing on the GPU.

I have a 16GB M1 MacBook Air here and did not see the issue, perhaps this happens specifically for 8GB models.

@brecht What is the CPU score of Junkshop on your 16 GB machine?

I monitored my memory and swap usage while running the benchmark. Nothing to report there.

Also, that would be very strange, since I have absolutely no problem rendering the scene with Blender.

CPU compute works fine, but GPU seems to halt after the first sample, with no error.

Junkshop is a rather old scene, less demanding than Classroom. How could Junkshop halt on a memory issue while Classroom renders just fine?

@brecht

Also, where can I find documentation on how the Open Data Benchmark performs its benchmark? What are the settings for the files you are rendering? Why and how were those settings chosen? How can I correlate the Open Data Benchmark results with what I can expect to see in Blender?

For a benchmark dubbed "Open Data", I find it pretty opaque...

The source code is here, I don't know if there is documentation outside of that.
https://developer.blender.org/source/blender-open-data/

Samples per minute on the CPU here are:

  • monster: 59
  • junkshop: 32
  • classroom: 23

The classroom is older than junkshop and uses less memory.

Classroom may be older but is still a 2.8.x Blender demo file.

Still, Junkshop takes less time to render than Classroom.

Your numbers are consistent with mine, though. Since the rendering times are 722 s for Classroom against 524 s for Junkshop (see screenshots above), it is no wonder that you got 32 on Junkshop and 32 * 524 / 722 ≈ 23.22 on Classroom.
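The arithmetic can be checked in a few lines. This sketch assumes, as the estimate above implicitly does, that samples per minute scale inversely with the full render time of each scene. The variable names are mine. The times are the 8:44 and 12:02 from the screenshots, and 32 is the Junkshop CPU score quoted above.

```python
# Render times from the Blender 3.1 screenshots, converted to seconds.
junkshop_seconds = 8 * 60 + 44    # 8:44
classroom_seconds = 12 * 60 + 2   # 12:02

# Junkshop CPU score quoted above, in samples per minute.
junkshop_spm = 32

# Assumption: the per-minute rate is inversely proportional to render time.
predicted_classroom_spm = junkshop_spm * junkshop_seconds / classroom_seconds
print(round(predicted_classroom_spm, 2))  # 23.22
```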

Open source code is not open data. Not everyone is a coder. Not everyone wants to dig into Blender's code to figure out how it works.

Data should be readable by everyone.

Author

A similar problem appears with (some) M2 processors, as described in issue #99134.

Added subscriber: @anthonynelzin

This issue was referenced by blender/blender@9b6e86ace139529fa18c2e73f960cfa484e199ec
Author

Since 24 June the Apple M2 has been appearing in the Open Data, with unrealistic values for the Junkshop scene.

![image.png](https://archive.blender.org/developer/F13231974/image.png)

Author

On July 12th 2022 two more unrealistic values appeared. This time it is an RTX 3080 Laptop GPU.

| id_submission | System | Render method | label_scene | Samples per minute (avg) | cpu_name | gpu_name | submission_date |
| -- | -- | -- | -- | -- | -- | -- | -- |
| e5e1814c-d233-4d6a-8501-16205b684dbc | Windows | OPTIX | junkshop | 63,623,862.7 | 11th Gen Intel Core i7-11700K @ 3.60GHz | NVIDIA GeForce RTX 3080 Laptop GPU | 12Jul2022 |
| e8405fd8-2c8c-42fd-a072-ff59c801079c | Windows | OPTIX | junkshop | 62,844,645.3 | 11th Gen Intel Core i7-11700K @ 3.60GHz | NVIDIA GeForce RTX 3080 Laptop GPU | 12Jul2022 |
Author

![Screenshot 2022-07-26 072412.jpg](https://archive.blender.org/developer/F13317501/Screenshot_2022-07-26_072412.jpg)