Some benchmarks considering Metal device type show unrealistic values in Blender Open Data #96519
Labels
Rendering & Cycles
1.0.0-beta.2
Blender Benchmark
Cycles
Priority::Normal
Status::Confirmed
5 Participants
Reference: archive/blender-benchmark-bundle#96519
System Information
Operating system: n/a
Graphics card: n/a
Blender Version
Broken: n/a
Worked: n/a
For a few days now it has been possible to submit benchmarks with Metal as device_type. In some cases the samples_per_minute value seems quite unrealistic.
It appears only on the Junkshop scene for the Metal device type. The samples_per_minute values in these cases are 10,000-700,000, while all other values are in the range shown in the attached picture.
Submission IDs for the top 10 scores:
id_submission Samples per minute
90596dba-2af3-4e24-988a-140b0565c203 710,228.40
62dfe033-a81f-4e7c-bed3-ddc62ab165a0 648,278.30
d2deef41-29c1-4878-b11d-153758945712 597,953.50
a3a89bd2-beec-4ac6-a4a6-17aed94e6f03 571,592.70
1c34e618-59fd-4772-b364-1b2f8c870f54 567,603.60
a6ed0d61-8d61-458e-acba-ecc5a56a7dac 547,242.40
5ba65bc7-d84f-4370-aad0-4bb6c05fdb5f 515,224.80
f8a1c1a5-fafe-4d78-897e-6804c9f2e604 514,256.00
44ef2573-677a-42fb-9997-e291480e48cf 486,760.50
03b7ab96-c1f8-41ad-9da9-25cf293ca425 483,787.20
Another nonsensical result:
Added subscriber: @ExLibris
#99134 was marked as duplicate of this issue
infrastructure/blender-open-data#96548 was marked as duplicate of this issue
Changed status from 'Needs Triage' to: 'Confirmed'
Added subscriber: @MHEonM1
The unrealistic values are still being submitted. Because of these very high values, the overall scores of Apple benchmarks are not really usable:
Just to add a clarification:
On Junkshop, GPU scores are unrealistically high while CPU scores are unrealistically low.
It looks like someone tried to fix the bug by applying a divisor somewhere in the CPU bench results, thinking they were acting on the GPU bench results.
I cannot see the low overall scores for Apple M1 CPU on Junkshop (see screenshot), but I do notice quite a large difference between the lowest and highest scores. Is your score specific to your benchmark?
Sorry for the late response. Here are some screenshots of what I get in OpenData :
CPU:
GPU:
No problem :)
Like you, I am a user who submitted a bug report; the difference is that you saw the error in your own benchmark, while I saw a bunch of errors in the data. Both our observations seem to have the same origin. At least we both see unrealistic values in Apple M1 benchmarks for Metal rendering on the Junkshop scene. You also see an unrealistic value for CPU in your benchmark; I cannot see that directly in the data, although I notice a bigger scatter in values, as my graph from last Friday shows. I think we need some input from the developers of the tool here. This bug has been triaged by Blender and the current status is "Confirmed, Normal", as you can see at the top of the screen. I am not sure what that means: if "Normal" means "this is normal behaviour" we have a problem, but it can also mean "treat as a normal-priority bug", in which case we can expect some help from them.
Concerning the CPU score of Junkshop:
I rendered the scene directly in Blender 3.1. You can see it took 8:44
On the other hand, it took 12:02 to render Classroom
Also, I tried to render Junkshop using GPU only. The render stopped after 20s, as if it was finished with no errors. This might explain why the score is so high on GPU benchmark.
Being a developer myself, I think "Confirmed, Normal" means "bug confirmed, normal priority". We'll see!
That is useful info for the Metal/Blender developers, I think, if they are going to look into this bug.
The problem with this behaviour is that it does not always occur. The data so far contain 128 submissions of benchmarks on Junkshop with the M1 CPU/GPU.
In 57 cases the score is over 500 (550-700,000) and in 71 cases the values seem OK (41-51). There is not much else that I can see in the data, other than the slightly higher peak memory usage.
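For anyone who wants to reproduce this split on the Open Data dump, here is a minimal sketch. It assumes the submissions have been loaded as (submission_id, samples_per_minute) pairs; the field names and the 500 threshold are taken from the observation above, everything else is hypothetical:

```python
def split_scores(rows, threshold=500.0):
    """Partition benchmark rows into plausible and suspicious scores.

    `rows` is an iterable of (submission_id, samples_per_minute) pairs.
    Junkshop M1 scores around 41-51 look plausible; anything above
    `threshold` is almost certainly a truncated or failed render.
    """
    ok, suspicious = [], []
    for submission_id, spm in rows:
        (suspicious if spm > threshold else ok).append((submission_id, spm))
    return ok, suspicious

# Example with made-up rows in the shape of the table above:
rows = [("a", 45.2), ("b", 710228.4), ("c", 48.9)]
ok, bad = split_scores(rows)
```

With the real dump this should reproduce the 71/57 split described above.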
Since April 10th I see no new unrealistic scores. I don't know if that is a coincidence.
I am still getting nonsensical numbers (13-inch MBP M1, 8 GPU cores, 8 GB RAM), both on CPU and GPU.
Added subscriber: @brecht
Likely there is an error while rendering, which causes the render to stop quickly, but this goes undetected and the render time is considered correct.
If this happens also with the command line benchmark, getting the full debug output would help identify the cause.
https://download.blender.org/release/BlenderBenchmark2.0/launcher/benchmark-launcher-cli-3.0.0-macos.zip
Here we go.
blender open data CLI log.txt
Still nonsensical numbers from the CPU rendering of Junkshop, as it takes 4 minutes less to compute that scene in Blender 3.1 than it takes to compute Classroom.
Added subscriber: @Michael-Jones
Thanks, there is indeed an error that doesn't seem to stop the render:
@Michael-Jones I guess there are two issues to fix here. One is to stop the render on this error, and the other would be to figure out why the error happens in the first place. Would you have time to look into this?
@MHEonM1 regarding CPU rendering time, which numbers are you looking at exactly?
The way the benchmark works now is to first render 1 sample to ensure any necessary kernels are compiled, and then a second time render for 30 seconds and count the number of samples finished in that time. So how many minutes it takes to render shouldn't really be the metric.
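The scoring scheme described above can be sketched roughly as follows. This is a simplified illustration, not the launcher's actual code; the function names and the callback are hypothetical:

```python
import time

def benchmark_scene(render_sample, warmup=1, window_seconds=30.0):
    """Sketch of the scheme described above: render `warmup` samples so
    any necessary kernels get compiled, then count how many samples
    finish inside a fixed time window and convert to samples/minute."""
    for _ in range(warmup):
        render_sample()  # warm-up pass: kernel compilation happens here

    samples = 0
    start = time.monotonic()
    while time.monotonic() - start < window_seconds:
        render_sample()
        samples += 1

    elapsed_minutes = (time.monotonic() - start) / 60.0
    return samples / elapsed_minutes  # score: samples per minute
```

This also makes the failure mode in this report easy to see: if a "sample" returns almost instantly because of an undetected render error, the loop racks up a huge count and the score explodes into the hundreds of thousands.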
@brecht I'm looking at the number of samples per second at the very end of the log.
It says 2.71 samples per minute for Junkshop and 26.11 for Classroom.
Given that you add up all the scores to get the final score, either Junkshop has almost no weight in the final score (since it takes 1.5 times longer to render Classroom on the same CPU), or there is a problem with the Junkshop CPU score.
Ok, perhaps the junkshop scene is running out of memory and causing issues on both CPU and GPU. Swapping and running very slowly on the CPU, and just failing on the GPU.
I have a 16GB M1 MacBook Air here and did not see the issue, perhaps this happens specifically for 8GB models.
@brecht What is the CPU score of Junkshop on your 16GB machine?
I monitored my memory and swap usage while executing the benchmark. Nothing to report here.
Also, that would be very strange, since I have absolutely no problem rendering the scene with Blender.
CPU compute works fine, but GPU seems to halt after the first sample, with no error.
Junkshop is a rather old scene, less demanding than Classroom. How could Junkshop halt on a memory issue while Classroom renders just fine?
@brecht
Also, where can I find documentation on how the Open Data Benchmark performs its benchmark? What are the settings for the files you are computing? Why and how were those settings chosen? How can I correlate the Open Data Benchmark results with what I can expect to see in Blender?
For a benchmark dubbed "Open Data", I find it pretty opaque...
The source code is here, I don't know if there is documentation outside of that.
https://developer.blender.org/source/blender-open-data/
Samples per minute on the CPU here are:
The classroom is older than junkshop and uses less memory.
Classroom may be older but is still a 2.8.x Blender demo file.
Still, Junkshop takes less time to render than Classroom.
Your numbers are consistent with mine, though. As the rendering times are 722s against 524s (see screenshots above), no wonder you got 32 on Junkshop and 32 x 524 / 722 = 23.22 on Classroom.
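That proportionality argument can be checked directly. A quick sketch, using the render times quoted in this thread (samples per minute should scale inversely with the full render time of a scene):

```python
junkshop_seconds = 524    # 8:44 full render of Junkshop in Blender 3.1
classroom_seconds = 722   # 12:02 full render of Classroom in Blender 3.1
junkshop_spm = 32         # benchmark score (samples per minute) for Junkshop

# If samples/minute is inversely proportional to render time,
# the expected Classroom score follows from a simple ratio:
classroom_spm = junkshop_spm * junkshop_seconds / classroom_seconds
print(round(classroom_spm, 2))  # prints 23.22
```

So the two CPU scores are consistent with each other; it is the 2.71 Junkshop CPU score from the 8GB machine that is the outlier.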
Open source code is not open data. Not everyone is a coder. Not everyone wants to dig into Blender's code to figure out how it works.
Data should be readable by everyone.
A similar problem appears with (some) M2 processors, as described in #99134.
Added subscriber: @anthonynelzin
This issue was referenced by blender/blender@9b6e86ace1
Since June 24th the Apple M2 appears in the open data, with unrealistic values for the Junkshop scene.
On July 12th 2022 two more unrealistic values appeared. This time it is an RTX 3080 Laptop GPU.