EEVEE-Next: Performance Analysis #117246

Open
opened 2024-01-17 18:35:09 +01:00 by Miguel Pozo · 0 comments

To compare EEVEE Legacy and EEVEE Next and look for bottlenecks, I’ve run a performance test on the Wanderer scene using #116304.
The camera is static and pointing slightly downwards so the full view is covered by geometry.

Results

EEVEE Legacy
 Group                          | GPU  | CPU  | Latency
--------------------------------|------|------|--------
 Total                          | 8.37 | 3.02 |
. Manager.end_sync              | 0.02 | 0.06 | 1.68
. EEVEE                         | 7.95 | 2.02 | 1.62
.. Probes Refresh               | 0.00 | 0.01 | 1.61
.. Shadows                      | 0.00 | 0.05 | 1.58
... Cube Shadow Maps            | 0.00 | 0.01 | 1.58
... Cascaded Shadow Maps        | 0.00 | 0.01 | 1.55
.. Prepass                      | 1.36 | 0.38 | 1.51
... psl->depth_ps               | 1.36 | 0.34 | 1.48
.. Main MinMax buffer           | 0.10 | 0.24 | 2.48
... Max buffer                  | 0.10 | 0.21 | 2.47
.... psl->maxz_copydepth_ps     | 0.03 | 0.02 | 2.45
.... psl->maxz_downlevel_ps     | 0.02 | 0.01 | 2.44
.... psl->maxz_downlevel_ps     | 0.01 | 0.01 | 2.44
.... psl->maxz_downlevel_ps     | 0.00 | 0.01 | 2.43
.... psl->maxz_downlevel_ps     | 0.00 | 0.01 | 2.41
.... psl->maxz_downlevel_ps     | 0.00 | 0.01 | 2.39
.... psl->maxz_downlevel_ps     | 0.00 | 0.01 | 2.37
.. GTAO Horizon Scan            | 0.67 | 0.04 | 2.34
... psl->ao_horizon_search      | 0.66 | 0.01 | 2.32
.. Shading                      | 4.90 | 0.43 | 2.95
... psl->background_ps          | 0.01 | 0.02 | 2.94
... psl->material_ps            | 4.88 | 0.38 | 2.93
.. SSR                          | 0.60 | 0.32 | 7.42
... psl->ssr_raytrace           | 0.10 | 0.01 | 7.40
... Downsample Radiance         | 0.12 | 0.22 | 7.48
.... psl->color_copy_ps         | 0.04 | 0.01 | 7.46
.... psl->color_downsample_ps   | 0.02 | 0.01 | 7.48
.... psl->color_downsample_ps   | 0.00 | 0.01 | 7.48
.... psl->color_downsample_ps   | 0.00 | 0.01 | 7.47
.... psl->color_downsample_ps   | 0.00 | 0.01 | 7.42
.... psl->color_downsample_ps   | 0.00 | 0.01 | 7.40
.... psl->color_downsample_ps   | 0.00 | 0.01 | 7.39
... psl->ssr_resolve            | 0.37 | 0.02 | 7.36
.. psl->probe_display           | 0.00 | 0.01 | 7.69
.. Opaque Refraction            | 0.00 | 0.01 | 7.67
.. Post FX                      | 0.21 | 0.35 | 7.65
... Bloom Blit                  | 0.04 | 0.02 | 7.63
... Bloom Downsample First      | 0.01 | 0.01 | 7.64
... Bloom Downsample            | 0.00 | 0.01 | 7.62
... Bloom Downsample            | 0.00 | 0.01 | 7.60
... Bloom Downsample            | 0.00 | 0.01 | 7.57
... Bloom Downsample            | 0.00 | 0.01 | 7.55
... Bloom Upsample              | 0.00 | 0.02 | 7.52
... Bloom Upsample              | 0.00 | 0.01 | 7.49
... Bloom Upsample              | 0.00 | 0.01 | 7.47
... Bloom Upsample              | 0.02 | 0.01 | 7.46
... Bloom Resolve               | 0.08 | 0.01 | 7.46
. Overlay                       | 0.22 | 0.66 | 7.54
.. psl->background_ps           | 0.03 | 0.01 | 7.52
.. psl->fade_ps[i]              | 0.00 | 0.01 | 7.51
.. psl->facing_ps[i]            | 0.00 | 0.01 | 7.49
.. psl->extra_blend_ps          | 0.00 | 0.01 | 7.47
.. psl->wireframe_ps            | 0.00 | 0.01 | 7.45
.. *p_armature_trans_ps         | 0.00 | 0.01 | 7.43
.. *p_armature_ps               | 0.00 | 0.02 | 7.40
.. psl->particle_ps             | 0.00 | 0.01 | 7.37
.. psl->metaball_ps[i]          | 0.00 | 0.01 | 7.35
.. *p_extra_ps                  | 0.00 | 0.08 | 7.33
.. psl->attribute_ps            | 0.00 | 0.01 | 7.25
.. psl->grid_ps                 | 0.04 | 0.01 | 7.22
.. psl->fade_ps[i]              | 0.00 | 0.01 | 7.23
.. psl->facing_ps[i]            | 0.00 | 0.01 | 7.22
.. *p_armature_trans_ps         | 0.00 | 0.01 | 7.19
.. *p_armature_ps               | 0.00 | 0.01 | 7.17
.. *p_extra_ps                  | 0.00 | 0.02 | 7.15
.. psl->metaball_ps[i]          | 0.00 | 0.01 | 7.11
.. psl->motion_paths_ps         | 0.00 | 0.01 | 7.09
.. psl->extra_grid_ps           | 0.00 | 0.01 | 7.07
.. psl->extra_centers_ps        | 0.00 | 0.01 | 7.05
.. psl->antialiasing_ps         | 0.10 | 0.01 | 7.03
. RegionInfo                    | 0.03 | 0.06 | 7.06

 Group                          | GPU  | CPU  | Latency
--------------------------------|------|------|--------
 Total                          | 12.2 | 6.86 |
. SPACE_TOPBAR                  | 0.00 | 0.02 | 0.25
. SPACE_STATUSBAR               | 0.00 | 0.00 | 0.21
. SPACE_VIEW3D                  | 11.9 | 6.67 | 0.19
.. Viewport                     | 11.9 | 6.65 | 0.19
. Window Redraw                 | 0.30 | 0.11 | 5.43

EEVEE Next
 Group                          | GPU  | CPU  | Latency
--------------------------------|------|------|--------
 Total                          | 21.9 | 5.80 |
. Manager.end_sync              | 0.06 | 0.14 | 2.32
. EEVEE                         | 21.5 | 4.58 | 2.18
.. negZ_view                    | 21.5 | 4.40 | 2.16
... LightCulling                | 0.04 | 0.10 | 2.13
.... Select                     | 0.00 | 0.01 | 2.12
.... Sort                       | 0.00 | 0.01 | 2.10
.... Zbin                       | 0.01 | 0.01 | 2.08
.... Tiles                      | 0.01 | 0.01 | 2.08
... Probe.Select                | 0.00 | 0.01 | 2.06
... World.Background            | 1.03 | 0.03 | 2.03
... Deferred.Opaque             | 18.9 | 2.93 | 3.03
.... View.compute_visibility    | 0.01 | 0.02 | 3.01
.... DrawMultiBuf.bind          | 0.01 | 0.03 | 2.99
.... Prepass                    | 2.24 | 0.36 | 2.96
..... DoubleSided.Static        | 2.21 | 0.27 | 2.95
..... SingleSided.Static        | 0.00 | 0.01 | 4.88
..... DoubleSided.Moving        | 0.02 | 0.01 | 4.86
..... SingleSided.Moving        | 0.00 | 0.00 | 4.86
.... HizUpdate                  | 0.06 | 0.03 | 4.83
.... Shadow                     | 5.55 | 0.80 | 4.86
..... TilemapSetup              | 0.02 | 0.09 | 4.85
...... ClearClipmap             | 0.00 | 0.01 | 4.84
...... DirectionalBounds        | 0.00 | 0.01 | 4.82
...... Init                     | 0.00 | 0.01 | 4.80
...... CasterUpdate             | 0.00 | 0.01 | 4.79
..... View.compute_visibility   | 0.01 | 0.01 | 4.76
..... DrawMultiBuf.bind         | 0.00 | 0.01 | 4.76
..... TagUsage                  | 0.13 | 0.06 | 4.74
...... Opaque                   | 0.13 | 0.01 | 4.73
...... Transparent              | 0.00 | 0.02 | 4.83
..... TilemapUpdate             | 0.12 | 0.15 | 4.80
...... MaskLod                  | 0.00 | 0.01 | 4.79
...... Free                     | 0.00 | 0.01 | 4.77
...... Defrag                   | 0.00 | 0.01 | 4.76
...... AllocatePages            | 0.00 | 0.01 | 4.74
...... Finalize                 | 0.01 | 0.01 | 4.73
...... RenderClear              | 0.07 | 0.01 | 4.72
..... View.compute_procedural_bounds | 0.01 | 0.01 | 4.77
..... View.compute_visibility   | 0.03 | 0.01 | 4.76
..... DrawMultiBuf.bind         | 0.01 | 0.02 | 4.78
..... Shadow.Surface            | 5.17 | 0.31 | 4.76
...... Shadow.Surface           | 5.17 | 0.30 | 4.75
....... Shadow.Surface.Double-Sided | 5.17 | 0.25 | 4.74
....... Shadow.Surface.Single-Sided | 0.00 | 0.01 | 9.64
.... View.compute_visibility    | 0.01 | 0.01 | 9.58
.... DrawMultiBuf.bind          | 0.01 | 0.02 | 9.58
.... Shading                    | 5.08 | 0.41 | 9.55
..... DoubleSided               | 0.00 | 0.00 | 9.54
..... SingleSided               | 0.00 | 0.01 | 9.53
..... DoubleSided               | 5.00 | 0.30 | 9.51
..... SingleSided               | 0.00 | 0.01 | 14.2
..... StencilClassify           | 0.07 | 0.02 | 14.1
.... Raytracing                 | 4.35 | 0.93 | 14.2
..... HorizonScan.Setup         | 0.07 | 0.01 | 14.2
..... TileClassify              | 0.08 | 0.01 | 14.2
..... Raytracing                | 2.47 | 0.23 | 14.3
...... TileCompact              | 0.01 | 0.01 | 14.2
...... RayGenerate              | 0.01 | 0.01 | 14.2
...... Trace.Screen             | 0.01 | 0.01 | 14.2
...... DenoiseSpatial           | 0.05 | 0.01 | 14.2
...... DenoiseTemporal          | 0.03 | 0.02 | 14.2
...... DenoiseBilateral         | 0.01 | 0.01 | 14.2
...... HorizonScan.Trace        | 1.63 | 0.01 | 14.2
...... HorizonScan.Denoise      | 0.69 | 0.01 | 15.8
..... Raytracing                | 1.65 | 0.38 | 16.5
...... TileCompact              | 0.01 | 0.01 | 16.5
...... RayGenerate              | 0.01 | 0.02 | 16.5
...... Trace.Screen             | 0.01 | 0.01 | 16.4
...... DenoiseSpatial           | 0.04 | 0.02 | 16.4
...... DenoiseTemporal          | 0.02 | 0.05 | 16.4
...... DenoiseBilateral         | 0.01 | 0.04 | 16.3
...... HorizonScan.Trace        | 1.06 | 0.01 | 16.3
...... HorizonScan.Denoise      | 0.46 | 0.02 | 17.3
..... Raytracing                | 0.06 | 0.22 | 17.7
...... TileCompact              | 0.01 | 0.01 | 17.7
...... RayGenerate              | 0.00 | 0.01 | 17.7
...... Trace.Screen             | 0.00 | 0.01 | 17.7
...... DenoiseSpatial           | 0.00 | 0.02 | 17.7
...... DenoiseTemporal          | 0.00 | 0.01 | 17.7
...... DenoiseBilateral         | 0.00 | 0.01 | 17.6
...... HorizonScan.Trace        | 0.00 | 0.01 | 17.6
...... HorizonScan.Denoise      | 0.00 | 0.01 | 17.6
.... EvalLights                 | 1.34 | 0.06 | 17.6
..... Eval.Light                | 1.34 | 0.04 | 17.5
.... Combine                    | 0.11 | 0.01 | 18.8
... Deferred.Refract            | 0.00 | 0.01 | 18.9
... Forward.Opaque              | 1.31 | 1.06 | 18.9
.... View.compute_visibility    | 0.01 | 0.01 | 18.9
.... DrawMultiBuf.bind          | 0.00 | 0.01 | 18.9
.... Prepass                    | 0.00 | 0.08 | 18.9
..... DoubleSided.Static        | 0.00 | 0.01 | 18.9
..... SingleSided.Static        | 0.00 | 0.01 | 18.9
..... DoubleSided.Moving        | 0.00 | 0.01 | 18.8
..... SingleSided.Moving        | 0.00 | 0.00 | 18.8
.... HizUpdate                  | 0.06 | 0.01 | 18.8
.... Shadow                     | 1.20 | 0.75 | 18.8
..... TilemapSetup              | 0.02 | 0.09 | 18.8
...... ClearClipmap             | 0.00 | 0.01 | 18.8
...... DirectionalBounds        | 0.00 | 0.01 | 18.8
...... Init                     | 0.00 | 0.01 | 18.8
...... CasterUpdate             | 0.00 | 0.01 | 18.8
..... View.compute_visibility   | 0.01 | 0.01 | 18.7
..... DrawMultiBuf.bind         | 0.00 | 0.01 | 18.7
..... TagUsage                  | 0.13 | 0.05 | 18.7
...... Opaque                   | 0.12 | 0.01 | 18.7
...... Transparent              | 0.00 | 0.01 | 18.8
..... TilemapUpdate             | 0.04 | 0.14 | 18.8
...... MaskLod                  | 0.00 | 0.01 | 18.8
...... Free                     | 0.00 | 0.01 | 18.8
...... Defrag                   | 0.00 | 0.01 | 18.7
...... AllocatePages            | 0.00 | 0.01 | 18.7
...... Finalize                 | 0.01 | 0.01 | 18.7
...... RenderClear              | 0.00 | 0.01 | 18.7
..... View.compute_procedural_bounds | 0.01 | 0.01 | 18.7
..... View.compute_visibility   | 0.03 | 0.01 | 18.7
..... DrawMultiBuf.bind         | 0.01 | 0.02 | 18.7
..... Shadow.Surface            | 0.91 | 0.28 | 18.7
...... Shadow.Surface           | 0.91 | 0.26 | 18.7
....... Shadow.Surface.Double-Sided | 0.90 | 0.22 | 18.6
....... Shadow.Surface.Single-Sided | 0.00 | 0.01 | 19.3
.... View.compute_visibility    | 0.01 | 0.01 | 19.3
.... DrawMultiBuf.bind          | 0.00 | 0.01 | 19.3
.... Shading                    | 0.00 | 0.04 | 19.2
..... SingleSided               | 0.00 | 0.00 | 19.2
..... DoubleSided               | 0.00 | 0.00 | 19.2
... View.compute_visibility     | 0.01 | 0.01 | 19.2
... DrawMultiBuf.bind           | 0.00 | 0.01 | 19.2
... Forward.Transparent         | 0.00 | 0.02 | 19.1
.... ResourceBind               | 0.00 | 0.01 | 19.1
... Film.Accumulate             | 0.18 | 0.02 | 19.1
.. Velocity Copy Pass           | 0.00 | 0.01 | 19.2
. Overlay                       | 0.23 | 0.80 | 19.1
.. psl->background_ps           | 0.04 | 0.02 | 19.1
.. psl->fade_ps[i]              | 0.00 | 0.01 | 19.1
.. psl->facing_ps[i]            | 0.00 | 0.01 | 19.1
.. psl->extra_blend_ps          | 0.00 | 0.01 | 19.0
.. psl->wireframe_ps            | 0.00 | 0.02 | 19.0
.. *p_armature_trans_ps         | 0.00 | 0.01 | 19.0
.. *p_armature_ps               | 0.00 | 0.02 | 19.0
.. psl->particle_ps             | 0.00 | 0.01 | 18.9
.. psl->metaball_ps[i]          | 0.00 | 0.01 | 18.9
.. *p_extra_ps                  | 0.00 | 0.09 | 18.9
.. psl->attribute_ps            | 0.00 | 0.01 | 18.8
.. psl->grid_ps                 | 0.04 | 0.02 | 18.8
.. psl->fade_ps[i]              | 0.00 | 0.02 | 18.8
.. psl->facing_ps[i]            | 0.00 | 0.01 | 18.7
.. *p_armature_trans_ps         | 0.00 | 0.01 | 18.7
.. *p_armature_ps               | 0.00 | 0.02 | 18.7
.. *p_extra_ps                  | 0.00 | 0.01 | 18.6
.. psl->metaball_ps[i]          | 0.00 | 0.01 | 18.6
.. psl->motion_paths_ps         | 0.00 | 0.01 | 18.6
.. psl->extra_grid_ps           | 0.00 | 0.02 | 18.6
.. psl->extra_centers_ps        | 0.00 | 0.01 | 18.5
.. psl->antialiasing_ps         | 0.10 | 0.03 | 18.5
. RegionInfo                    | 0.03 | 0.11 | 18.5

 Group                          | GPU  | CPU  | Latency
--------------------------------|------|------|--------
 Total                          | 27.8 | 25.0 |
. SPACE_TOPBAR                  | 0.00 | 0.01 | 14.6
. SPACE_STATUSBAR               | 0.00 | 0.01 | 14.6
. SPACE_VIEW3D                  | 27.5 | 24.8 | 14.5
.. Viewport                     | 27.5 | 24.8 | 14.5
. Window Redraw                 | 0.30 | 0.10 | 17.2

I’ve also tested other scenes and they show similar results.

(These results are from an RTX 3060 Ti; it would be good to have a similar comparison made on AMD and Apple hardware.)

Issues

PrePass

PrePass is slower in EEVEE Next (2.24 vs 1.36 ms).
Initially I had no idea about the reason: removing all the extra stuff from the shaders (velocity, displacement, transparency) didn't change the performance difference.

Update: There are thousands of rocks in the scene, and many of them are unique meshes.
After merging all of them into one mesh, or making them all share the same mesh (so they can be batched), the pre-pass performance of Legacy and Next is identical.
So the performance difference seems to be caused by draw call overhead, maybe due to EEVEE-Next using indirect draw calls? If that's the case, we may want to look into using regular draw calls + CPU culling for single-instance meshes.
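The "regular draw calls + CPU culling for single-instance meshes" idea can be sketched as a split of the draw list.

This is a minimal illustration, not how the EEVEE Next draw manager works. It assumes each object is summarized by a bounding sphere and that the view frustum is given as inward-facing planes. `Obj`, `sphere_visible` and `build_draw_lists` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Obj:
    name: str
    mesh_id: int
    center: tuple  # world-space bounding-sphere center (x, y, z)
    radius: float  # bounding-sphere radius

# Hypothetical plane format: (nx, ny, nz, d) with the normal pointing into the
# frustum, so a point p is inside when dot(n, p) + d >= 0.

def sphere_visible(obj, planes):
    """Conservative CPU frustum test: reject only if fully outside one plane."""
    cx, cy, cz = obj.center
    for nx, ny, nz, d in planes:
        if nx * cx + ny * cy + nz * cz + d < -obj.radius:
            return False
    return True

def build_draw_lists(objects, planes):
    """Single-instance meshes become CPU-culled regular draws; meshes shared by
    several objects stay in one indirect batch per mesh (culled on the GPU)."""
    by_mesh = defaultdict(list)
    for obj in objects:
        by_mesh[obj.mesh_id].append(obj)

    direct = []    # names of objects drawn with one regular draw call each
    indirect = {}  # mesh_id -> names of the instances in its indirect batch
    for mesh_id, instances in by_mesh.items():
        if len(instances) == 1:
            if sphere_visible(instances[0], planes):
                direct.append(instances[0].name)
        else:
            indirect[mesh_id] = [o.name for o in instances]
    return direct, indirect

# Two planes bounding -10 <= x <= 10.
planes = [(1.0, 0.0, 0.0, 10.0), (-1.0, 0.0, 0.0, 10.0)]
objs = [
    Obj("rock_a", 1, (0.0, 0.0, 0.0), 1.0),   # unique mesh, visible
    Obj("rock_b", 2, (50.0, 0.0, 0.0), 1.0),  # unique mesh, culled on the CPU
    Obj("rock_c", 3, (1.0, 0.0, 0.0), 1.0),   # shared mesh
    Obj("rock_d", 3, (2.0, 0.0, 0.0), 1.0),   # shared mesh
]
direct, indirect = build_draw_lists(objs, planes)
print(direct, indirect)
```

The design point is that the per-draw cost of the indirect path is only amortized when a mesh has many instances. For the thousands of unique rocks, each indirect draw would carry a single instance.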

Update 2: Fix: #117561

Shading

The Shading pass in EEVEE Next (which only outputs the closure data) takes a bit longer than the EEVEE Legacy pass (5.08 vs 4.90 ms), even though EEVEE Legacy also computes the lighting.
So, in reality, Shading + Lighting takes 6.53 ms in EEVEE Next (Shading + EvalLights + Combine) vs 4.90 ms in EEVEE Legacy.
I guess the main bottleneck here is fill rate, and we’ll have to eat the cost as long as we use deferred rendering.
Other types of scenes (with more light sources or very dense geometry) might compare more favorably.
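The 6.53 ms figure comes from adding the three EEVEE Next GPU rows that together do the work of the single Legacy Shading pass. A quick check, with the values copied from the tables above:

```python
# GPU times in ms, copied from the profiler tables above.
next_shading = 5.08      # Deferred.Opaque > Shading (closure data only)
next_eval_lights = 1.34  # Deferred.Opaque > EvalLights
next_combine = 0.11      # Deferred.Opaque > Combine
legacy_shading = 4.90    # EEVEE > Shading (material evaluation + lighting)

next_total = next_shading + next_eval_lights + next_combine
print(f"Next: {next_total:.2f} ms, Legacy: {legacy_shading:.2f} ms, "
      f"difference: {next_total - legacy_shading:.2f} ms")
```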

Update: A good chunk of the performance difference seems to have the same cause as the pre-pass difference. After #117561, the shading pass (without lighting) in Next is faster than in Legacy.

Shadows

EEVEE Legacy is able to skip the Shadow pass entirely, but in EEVEE Next the Deferred.Opaque shadows are always re-rendered (5.55 ms), so the update detection is not working correctly.

Update: This doesn't seem to happen consistently, even in the same scene. Sometimes the update detection seems to work fine.
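To make "update detection" concrete, this minimal sketch shows the decision it has to make: re-render nothing when neither the light nor any caster changed since the last frame.

It assumes each light and caster can be summarized by a state hash, for example of its transform and geometry. `ShadowCache` and `dirty_casters` are hypothetical names. EEVEE Next works at the tilemap level (see the TilemapSetup/CasterUpdate passes above), which this sketch does not model.

```python
from typing import Dict, Optional, Set

class ShadowCache:
    """Hypothetical per-light cache of the state the shadow was last rendered with."""

    def __init__(self):
        self.light_state: Optional[int] = None
        self.caster_state: Dict[str, int] = {}  # caster name -> state hash

    def dirty_casters(self, light_state: int, casters: Dict[str, int]) -> Set[str]:
        """Return the casters whose shadows must be re-rendered this frame.
        An empty set means the shadow pass can be skipped entirely."""
        if light_state != self.light_state:
            # The light itself changed (or first frame): everything is dirty.
            dirty = set(casters)
        else:
            # Casters that are new or whose state hash changed.
            dirty = {name for name, h in casters.items()
                     if self.caster_state.get(name) != h}
            # Casters that disappeared also invalidate what they covered.
            dirty |= set(self.caster_state) - set(casters)
        self.light_state = light_state
        self.caster_state = dict(casters)
        return dirty

cache = ShadowCache()
print(cache.dirty_casters(1, {"rock": 10, "tree": 20}))  # first frame: everything
print(cache.dirty_casters(1, {"rock": 10, "tree": 20}))  # static frame: nothing
```

The intermittent behavior described above would correspond to a static frame sometimes producing a non-empty dirty set.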

Running Shadows.set_view has a non-trivial cost (1.20 ms) even when it doesn’t render anything.
That’s up to 2.40 ms (Deferred.Refraction + Forward.Opaque) spent doing nothing.
We could avoid it by rendering all pre-passes first and running the shadow update/rendering only once per frame.
Forward.Opaque can probably be removed entirely, though?
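The proposed reordering amounts to moving the shadow update out of the per-pipeline loop.

This sketch only models pass ordering and the fixed per-call cost. It assumes every shadow update costs the 1.20 ms measured above even when it renders nothing, and that each shading pass only needs the shadow usage tagged by the pre-passes. The function names are hypothetical.

```python
SHADOW_FIXED_COST_MS = 1.20  # measured cost of an "empty" shadow update (assumed constant)

def per_pipeline_order(pipelines):
    """Current structure: every pipeline runs its own shadow update."""
    steps = []
    for p in pipelines:
        steps += [f"{p}.prepass", "shadow.update", f"{p}.shading"]
    return steps

def single_shadow_order(pipelines):
    """Proposed structure: all pre-passes first, one shadow update, then all shading."""
    steps = [f"{p}.prepass" for p in pipelines]
    steps.append("shadow.update")
    steps += [f"{p}.shading" for p in pipelines]
    return steps

def fixed_shadow_cost(steps):
    return steps.count("shadow.update") * SHADOW_FIXED_COST_MS

pipelines = ["Deferred.Opaque", "Deferred.Refract", "Forward.Opaque"]
before = per_pipeline_order(pipelines)
after = single_shadow_order(pipelines)
saved = fixed_shadow_cost(before) - fixed_shadow_cost(after)
print(f"shadow updates: {before.count('shadow.update')} -> "
      f"{after.count('shadow.update')}, saved ~{saved:.2f} ms")
```

With three pipelines, the two extra updates account for the "up to 2.40 ms" figure.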

Update: Deferred.Refraction and Forward.Opaque passes are now skipped when they're empty. The issue is still there when they're used, though.

Update 2: While tagging still has a cost, the Surface.Shading overhead is gone after #117561.

ShadowsStatistics

The large difference between EEVEE Next (21.5 ms) and the full Viewport (27.5 ms) seems to be caused almost entirely by the Shadow::statistics_buf_ read (which is done at sync time).
Removing the read also increases the latency, which is oddly low at the moment.
I’m not sure if the cause is a bug on our side (Swapchain maybe?) or if it’s simply the driver using a different strategy.
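A synchronous GPU-to-CPU read at sync time forces the CPU to wait for the GPU to finish. One common way to avoid that is to read results that are a few frames old. This is not necessarily what #117521 does.

The sketch below is a hypothetical simulation: `DelayedReadback` is an illustrative name, and "finished" is approximated by frame age. A real GPU backend would check a fence or sync object before mapping the buffer.

```python
from collections import deque

class DelayedReadback:
    """Hypothetical N-frame-delayed readback: read frame i - N's statistics
    instead of frame i's, so the CPU never waits on in-flight GPU work."""

    def __init__(self, frames_in_flight=2):
        self.frames_in_flight = frames_in_flight
        self.pending = deque()  # (frame, result) pairs still "in flight"

    def submit(self, frame, gpu_result):
        self.pending.append((frame, gpu_result))

    def read(self):
        """Return the newest result old enough to be finished, or None."""
        result = None
        while len(self.pending) > self.frames_in_flight:
            result = self.pending.popleft()
        return result

readback = DelayedReadback(frames_in_flight=2)
results = []
for frame in range(5):
    readback.submit(frame, f"stats{frame}")
    results.append(readback.read())
print(results)
```

The trade-off is that statistics arrive `frames_in_flight` frames late, which is usually acceptable for debug and statistics data.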

Update: Fix: #117521

Raytracing

Raytracing seems quite costly (initially measured at 4.35 ms; edit: it's actually more (~5 ms), there was a bug in the initial test, see #117159), and I'm not sure how much the horizon passes could be optimized.
But I found something odd with the “regular” screen tracing. If I disable the horizon method (by setting Max Roughness to 1), this is what I get:

.... Raytracing                 | 4.99 | 0.55 | 49.1
..... TileClassify              | 0.08 | 0.01 | 49.1
..... Raytracing                | 2.51 | 0.17 | 49.2
...... TileCompact              | 0.01 | 0.01 | 49.1
...... RayGenerate              | 0.06 | 0.01 | 49.1
...... Trace.Screen             | 0.44 | 0.01 | 49.2
...... DenoiseSpatial           | 0.76 | 0.01 | 49.6
...... DenoiseTemporal          | 0.40 | 0.01 | 50.3
...... DenoiseBilateral         | 0.82 | 0.01 | 50.7
..... Raytracing                | 2.34 | 0.15 | 51.5
...... TileCompact              | 0.01 | 0.01 | 51.5
...... RayGenerate              | 0.04 | 0.01 | 51.5
...... Trace.Screen             | 0.34 | 0.01 | 51.5
...... DenoiseSpatial           | 0.49 | 0.01 | 51.8
...... DenoiseTemporal          | 0.25 | 0.01 | 52.3
...... DenoiseBilateral         | 1.18 | 0.01 | 52.5
..... Raytracing                | 0.05 | 0.15 | 53.7
...... TileCompact              | 0.00 | 0.01 | 53.7
...... RayGenerate              | 0.00 | 0.01 | 53.6
...... Trace.Screen             | 0.00 | 0.01 | 53.6
...... DenoiseSpatial           | 0.00 | 0.01 | 53.6
...... DenoiseTemporal          | 0.00 | 0.01 | 53.6
...... DenoiseBilateral         | 0.00 | 0.01 | 53.6

Most of the time seems to be eaten by denoising (about 3.9 ms out of 4.99 ms, summing the DenoiseSpatial, DenoiseTemporal and DenoiseBilateral rows).
So, at the moment, it doesn’t seem like the current horizon scanning or denoising implementations are worth their cost.
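The denoising share can be recomputed directly from the profiler rows. This sketch assumes the row format used throughout this report: leading dots for tree depth, then `name | GPU | CPU | Latency`. `parse_profile` is a hypothetical helper, not part of Blender.

```python
import re

# One profiler row: leading dots (tree depth), pass name, then GPU | CPU | Latency.
ROW = re.compile(r"^(\.+)\s+(.+?)\s*\|\s*([\d.]+)\s*\|\s*([\d.]+)\s*\|\s*([\d.]*)\s*$")

# Rows copied from the Max Roughness = 1 capture above (all-zero third subtree omitted).
TABLE = """\
.... Raytracing                 | 4.99 | 0.55 | 49.1
..... TileClassify              | 0.08 | 0.01 | 49.1
..... Raytracing                | 2.51 | 0.17 | 49.2
...... TileCompact              | 0.01 | 0.01 | 49.1
...... RayGenerate              | 0.06 | 0.01 | 49.1
...... Trace.Screen             | 0.44 | 0.01 | 49.2
...... DenoiseSpatial           | 0.76 | 0.01 | 49.6
...... DenoiseTemporal          | 0.40 | 0.01 | 50.3
...... DenoiseBilateral         | 0.82 | 0.01 | 50.7
..... Raytracing                | 2.34 | 0.15 | 51.5
...... TileCompact              | 0.01 | 0.01 | 51.5
...... RayGenerate              | 0.04 | 0.01 | 51.5
...... Trace.Screen             | 0.34 | 0.01 | 51.5
...... DenoiseSpatial           | 0.49 | 0.01 | 51.8
...... DenoiseTemporal          | 0.25 | 0.01 | 52.3
...... DenoiseBilateral         | 1.18 | 0.01 | 52.5
"""

def parse_profile(text):
    """Return (depth, name, gpu_ms, cpu_ms) for every line that matches ROW."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            dots, name, gpu, cpu, _latency = m.groups()
            rows.append((len(dots), name, float(gpu), float(cpu)))
    return rows

rows = parse_profile(TABLE)
total = rows[0][2]  # GPU time of the top-level Raytracing row
denoise = sum(gpu for _, name, gpu, _ in rows if name.startswith("Denoise"))
print(f"Denoise: {denoise:.2f} of {total:.2f} ms ({denoise / total:.0%})")
```

The same parser works on the full Legacy and Next tables, which makes it easy to diff passes between runs or hardware vendors.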

Update: Heavily improved by #118924

Conclusion

By fixing these issues, I think we should get roughly to the same performance level as EEVEE Legacy.
I'll keep updating the task as I find more details.

To compare EEVEE Legacy and EEVEE Next and look for bottlenecks, I’ve made a performance test with the Wanderer scene using #116304. The camera is static and pointing slightly downwards so the full view is covered by geometry. ### Results <details> <summary>EEVEE Legacy</summary> ``` Group | GPU | CPU | Latency --------------------------------|------|------|-------- Total | 8.37 | 3.02 | . Manager.end_sync | 0.02 | 0.06 | 1.68 . EEVEE | 7.95 | 2.02 | 1.62 .. Probes Refresh | 0.00 | 0.01 | 1.61 .. Shadows | 0.00 | 0.05 | 1.58 ... Cube Shadow Maps | 0.00 | 0.01 | 1.58 ... Cascaded Shadow Maps | 0.00 | 0.01 | 1.55 .. Prepass | 1.36 | 0.38 | 1.51 ... psl->depth_ps | 1.36 | 0.34 | 1.48 .. Main MinMax buffer | 0.10 | 0.24 | 2.48 ... Max buffer | 0.10 | 0.21 | 2.47 .... psl->maxz_copydepth_ps | 0.03 | 0.02 | 2.45 .... psl->maxz_downlevel_ps | 0.02 | 0.01 | 2.44 .... psl->maxz_downlevel_ps | 0.01 | 0.01 | 2.44 .... psl->maxz_downlevel_ps | 0.00 | 0.01 | 2.43 .... psl->maxz_downlevel_ps | 0.00 | 0.01 | 2.41 .... psl->maxz_downlevel_ps | 0.00 | 0.01 | 2.39 .... psl->maxz_downlevel_ps | 0.00 | 0.01 | 2.37 .. GTAO Horizon Scan | 0.67 | 0.04 | 2.34 ... psl->ao_horizon_search | 0.66 | 0.01 | 2.32 .. Shading | 4.90 | 0.43 | 2.95 ... psl->background_ps | 0.01 | 0.02 | 2.94 ... psl->material_ps | 4.88 | 0.38 | 2.93 .. SSR | 0.60 | 0.32 | 7.42 ... psl->ssr_raytrace | 0.10 | 0.01 | 7.40 ... Downsample Radiance | 0.12 | 0.22 | 7.48 .... psl->color_copy_ps | 0.04 | 0.01 | 7.46 .... psl->color_downsample_ps | 0.02 | 0.01 | 7.48 .... psl->color_downsample_ps | 0.00 | 0.01 | 7.48 .... psl->color_downsample_ps | 0.00 | 0.01 | 7.47 .... psl->color_downsample_ps | 0.00 | 0.01 | 7.42 .... psl->color_downsample_ps | 0.00 | 0.01 | 7.40 .... psl->color_downsample_ps | 0.00 | 0.01 | 7.39 ... psl->ssr_resolve | 0.37 | 0.02 | 7.36 .. psl->probe_display | 0.00 | 0.01 | 7.69 .. Opaque Refraction | 0.00 | 0.01 | 7.67 .. Post FX | 0.21 | 0.35 | 7.65 ... Bloom Blit | 0.04 | 0.02 | 7.63 ... 
Bloom Downsample First | 0.01 | 0.01 | 7.64 ... Bloom Downsample | 0.00 | 0.01 | 7.62 ... Bloom Downsample | 0.00 | 0.01 | 7.60 ... Bloom Downsample | 0.00 | 0.01 | 7.57 ... Bloom Downsample | 0.00 | 0.01 | 7.55 ... Bloom Upsample | 0.00 | 0.02 | 7.52 ... Bloom Upsample | 0.00 | 0.01 | 7.49 ... Bloom Upsample | 0.00 | 0.01 | 7.47 ... Bloom Upsample | 0.02 | 0.01 | 7.46 ... Bloom Resolve | 0.08 | 0.01 | 7.46 . Overlay | 0.22 | 0.66 | 7.54 .. psl->background_ps | 0.03 | 0.01 | 7.52 .. psl->fade_ps[i] | 0.00 | 0.01 | 7.51 .. psl->facing_ps[i] | 0.00 | 0.01 | 7.49 .. psl->extra_blend_ps | 0.00 | 0.01 | 7.47 .. psl->wireframe_ps | 0.00 | 0.01 | 7.45 .. *p_armature_trans_ps | 0.00 | 0.01 | 7.43 .. *p_armature_ps | 0.00 | 0.02 | 7.40 .. psl->particle_ps | 0.00 | 0.01 | 7.37 .. psl->metaball_ps[i] | 0.00 | 0.01 | 7.35 .. *p_extra_ps | 0.00 | 0.08 | 7.33 .. psl->attribute_ps | 0.00 | 0.01 | 7.25 .. psl->grid_ps | 0.04 | 0.01 | 7.22 .. psl->fade_ps[i] | 0.00 | 0.01 | 7.23 .. psl->facing_ps[i] | 0.00 | 0.01 | 7.22 .. *p_armature_trans_ps | 0.00 | 0.01 | 7.19 .. *p_armature_ps | 0.00 | 0.01 | 7.17 .. *p_extra_ps | 0.00 | 0.02 | 7.15 .. psl->metaball_ps[i] | 0.00 | 0.01 | 7.11 .. psl->motion_paths_ps | 0.00 | 0.01 | 7.09 .. psl->extra_grid_ps | 0.00 | 0.01 | 7.07 .. psl->extra_centers_ps | 0.00 | 0.01 | 7.05 .. psl->antialiasing_ps | 0.10 | 0.01 | 7.03 . RegionInfo | 0.03 | 0.06 | 7.06 Group | GPU | CPU | Latency --------------------------------|------|------|-------- Total | 12.2 | 6.86 | . SPACE_TOPBAR | 0.00 | 0.02 | 0.25 . SPACE_STATUSBAR | 0.00 | 0.00 | 0.21 . SPACE_VIEW3D | 11.9 | 6.67 | 0.19 .. Viewport | 11.9 | 6.65 | 0.19 . Window Redraw | 0.30 | 0.11 | 5.43 ``` </details> <details> <summary>EEVEE Next</summary> ``` Group | GPU | CPU | Latency --------------------------------|------|------|-------- Total | 21.9 | 5.80 | . Manager.end_sync | 0.06 | 0.14 | 2.32 . EEVEE | 21.5 | 4.58 | 2.18 .. negZ_view | 21.5 | 4.40 | 2.16 ... LightCulling | 0.04 | 0.10 | 2.13 .... 
Select | 0.00 | 0.01 | 2.12 .... Sort | 0.00 | 0.01 | 2.10 .... Zbin | 0.01 | 0.01 | 2.08 .... Tiles | 0.01 | 0.01 | 2.08 ... Probe.Select | 0.00 | 0.01 | 2.06 ... World.Background | 1.03 | 0.03 | 2.03 ... Deferred.Opaque | 18.9 | 2.93 | 3.03 .... View.compute_visibility | 0.01 | 0.02 | 3.01 .... DrawMultiBuf.bind | 0.01 | 0.03 | 2.99 .... Prepass | 2.24 | 0.36 | 2.96 ..... DoubleSided.Static | 2.21 | 0.27 | 2.95 ..... SingleSided.Static | 0.00 | 0.01 | 4.88 ..... DoubleSided.Moving | 0.02 | 0.01 | 4.86 ..... SingleSided.Moving | 0.00 | 0.00 | 4.86 .... HizUpdate | 0.06 | 0.03 | 4.83 .... Shadow | 5.55 | 0.80 | 4.86 ..... TilemapSetup | 0.02 | 0.09 | 4.85 ...... ClearClipmap | 0.00 | 0.01 | 4.84 ...... DirectionalBounds | 0.00 | 0.01 | 4.82 ...... Init | 0.00 | 0.01 | 4.80 ...... CasterUpdate | 0.00 | 0.01 | 4.79 ..... View.compute_visibility | 0.01 | 0.01 | 4.76 ..... DrawMultiBuf.bind | 0.00 | 0.01 | 4.76 ..... TagUsage | 0.13 | 0.06 | 4.74 ...... Opaque | 0.13 | 0.01 | 4.73 ...... Transparent | 0.00 | 0.02 | 4.83 ..... TilemapUpdate | 0.12 | 0.15 | 4.80 ...... MaskLod | 0.00 | 0.01 | 4.79 ...... Free | 0.00 | 0.01 | 4.77 ...... Defrag | 0.00 | 0.01 | 4.76 ...... AllocatePages | 0.00 | 0.01 | 4.74 ...... Finalize | 0.01 | 0.01 | 4.73 ...... RenderClear | 0.07 | 0.01 | 4.72 ..... View.compute_procedural_bounds | 0.01 | 0.01 | 4.77 ..... View.compute_visibility | 0.03 | 0.01 | 4.76 ..... DrawMultiBuf.bind | 0.01 | 0.02 | 4.78 ..... Shadow.Surface | 5.17 | 0.31 | 4.76 ...... Shadow.Surface | 5.17 | 0.30 | 4.75 ....... Shadow.Surface.Double-Sided | 5.17 | 0.25 | 4.74 ....... Shadow.Surface.Single-Sided | 0.00 | 0.01 | 9.64 .... View.compute_visibility | 0.01 | 0.01 | 9.58 .... DrawMultiBuf.bind | 0.01 | 0.02 | 9.58 .... Shading | 5.08 | 0.41 | 9.55 ..... DoubleSided | 0.00 | 0.00 | 9.54 ..... SingleSided | 0.00 | 0.01 | 9.53 ..... DoubleSided | 5.00 | 0.30 | 9.51 ..... SingleSided | 0.00 | 0.01 | 14.2 ..... StencilClassify | 0.07 | 0.02 | 14.1 .... 
Raytracing | 4.35 | 0.93 | 14.2 ..... HorizonScan.Setup | 0.07 | 0.01 | 14.2 ..... TileClassify | 0.08 | 0.01 | 14.2 ..... Raytracing | 2.47 | 0.23 | 14.3 ...... TileCompact | 0.01 | 0.01 | 14.2 ...... RayGenerate | 0.01 | 0.01 | 14.2 ...... Trace.Screen | 0.01 | 0.01 | 14.2 ...... DenoiseSpatial | 0.05 | 0.01 | 14.2 ...... DenoiseTemporal | 0.03 | 0.02 | 14.2 ...... DenoiseBilateral | 0.01 | 0.01 | 14.2 ...... HorizonScan.Trace | 1.63 | 0.01 | 14.2 ...... HorizonScan.Denoise | 0.69 | 0.01 | 15.8 ..... Raytracing | 1.65 | 0.38 | 16.5 ...... TileCompact | 0.01 | 0.01 | 16.5 ...... RayGenerate | 0.01 | 0.02 | 16.5 ...... Trace.Screen | 0.01 | 0.01 | 16.4 ...... DenoiseSpatial | 0.04 | 0.02 | 16.4 ...... DenoiseTemporal | 0.02 | 0.05 | 16.4 ...... DenoiseBilateral | 0.01 | 0.04 | 16.3 ...... HorizonScan.Trace | 1.06 | 0.01 | 16.3 ...... HorizonScan.Denoise | 0.46 | 0.02 | 17.3 ..... Raytracing | 0.06 | 0.22 | 17.7 ...... TileCompact | 0.01 | 0.01 | 17.7 ...... RayGenerate | 0.00 | 0.01 | 17.7 ...... Trace.Screen | 0.00 | 0.01 | 17.7 ...... DenoiseSpatial | 0.00 | 0.02 | 17.7 ...... DenoiseTemporal | 0.00 | 0.01 | 17.7 ...... DenoiseBilateral | 0.00 | 0.01 | 17.6 ...... HorizonScan.Trace | 0.00 | 0.01 | 17.6 ...... HorizonScan.Denoise | 0.00 | 0.01 | 17.6 .... EvalLights | 1.34 | 0.06 | 17.6 ..... Eval.Light | 1.34 | 0.04 | 17.5 .... Combine | 0.11 | 0.01 | 18.8 ... Deferred.Refract | 0.00 | 0.01 | 18.9 ... Forward.Opaque | 1.31 | 1.06 | 18.9 .... View.compute_visibility | 0.01 | 0.01 | 18.9 .... DrawMultiBuf.bind | 0.00 | 0.01 | 18.9 .... Prepass | 0.00 | 0.08 | 18.9 ..... DoubleSided.Static | 0.00 | 0.01 | 18.9 ..... SingleSided.Static | 0.00 | 0.01 | 18.9 ..... DoubleSided.Moving | 0.00 | 0.01 | 18.8 ..... SingleSided.Moving | 0.00 | 0.00 | 18.8 .... HizUpdate | 0.06 | 0.01 | 18.8 .... Shadow | 1.20 | 0.75 | 18.8 ..... TilemapSetup | 0.02 | 0.09 | 18.8 ...... ClearClipmap | 0.00 | 0.01 | 18.8 ...... DirectionalBounds | 0.00 | 0.01 | 18.8 ...... 
Init | 0.00 | 0.01 | 18.8 ...... CasterUpdate | 0.00 | 0.01 | 18.8 ..... View.compute_visibility | 0.01 | 0.01 | 18.7 ..... DrawMultiBuf.bind | 0.00 | 0.01 | 18.7 ..... TagUsage | 0.13 | 0.05 | 18.7 ...... Opaque | 0.12 | 0.01 | 18.7 ...... Transparent | 0.00 | 0.01 | 18.8 ..... TilemapUpdate | 0.04 | 0.14 | 18.8 ...... MaskLod | 0.00 | 0.01 | 18.8 ...... Free | 0.00 | 0.01 | 18.8 ...... Defrag | 0.00 | 0.01 | 18.7 ...... AllocatePages | 0.00 | 0.01 | 18.7 ...... Finalize | 0.01 | 0.01 | 18.7 ...... RenderClear | 0.00 | 0.01 | 18.7 ..... View.compute_procedural_bounds | 0.01 | 0.01 | 18.7 ..... View.compute_visibility | 0.03 | 0.01 | 18.7 ..... DrawMultiBuf.bind | 0.01 | 0.02 | 18.7 ..... Shadow.Surface | 0.91 | 0.28 | 18.7 ...... Shadow.Surface | 0.91 | 0.26 | 18.7 ....... Shadow.Surface.Double-Sided | 0.90 | 0.22 | 18.6 ....... Shadow.Surface.Single-Sided | 0.00 | 0.01 | 19.3 .... View.compute_visibility | 0.01 | 0.01 | 19.3 .... DrawMultiBuf.bind | 0.00 | 0.01 | 19.3 .... Shading | 0.00 | 0.04 | 19.2 ..... SingleSided | 0.00 | 0.00 | 19.2 ..... DoubleSided | 0.00 | 0.00 | 19.2 ... View.compute_visibility | 0.01 | 0.01 | 19.2 ... DrawMultiBuf.bind | 0.00 | 0.01 | 19.2 ... Forward.Transparent | 0.00 | 0.02 | 19.1 .... ResourceBind | 0.00 | 0.01 | 19.1 ... Film.Accumulate | 0.18 | 0.02 | 19.1 .. Velocity Copy Pass | 0.00 | 0.01 | 19.2 . Overlay | 0.23 | 0.80 | 19.1 .. psl->background_ps | 0.04 | 0.02 | 19.1 .. psl->fade_ps[i] | 0.00 | 0.01 | 19.1 .. psl->facing_ps[i] | 0.00 | 0.01 | 19.1 .. psl->extra_blend_ps | 0.00 | 0.01 | 19.0 .. psl->wireframe_ps | 0.00 | 0.02 | 19.0 .. *p_armature_trans_ps | 0.00 | 0.01 | 19.0 .. *p_armature_ps | 0.00 | 0.02 | 19.0 .. psl->particle_ps | 0.00 | 0.01 | 18.9 .. psl->metaball_ps[i] | 0.00 | 0.01 | 18.9 .. *p_extra_ps | 0.00 | 0.09 | 18.9 .. psl->attribute_ps | 0.00 | 0.01 | 18.8 .. psl->grid_ps | 0.04 | 0.02 | 18.8 .. psl->fade_ps[i] | 0.00 | 0.02 | 18.8 .. psl->facing_ps[i] | 0.00 | 0.01 | 18.7 .. 
*p_armature_trans_ps | 0.00 | 0.01 | 18.7 .. *p_armature_ps | 0.00 | 0.02 | 18.7 .. *p_extra_ps | 0.00 | 0.01 | 18.6 .. psl->metaball_ps[i] | 0.00 | 0.01 | 18.6 .. psl->motion_paths_ps | 0.00 | 0.01 | 18.6 .. psl->extra_grid_ps | 0.00 | 0.02 | 18.6 .. psl->extra_centers_ps | 0.00 | 0.01 | 18.5 .. psl->antialiasing_ps | 0.10 | 0.03 | 18.5 . RegionInfo | 0.03 | 0.11 | 18.5 Group | GPU | CPU | Latency --------------------------------|------|------|-------- Total | 27.8 | 25.0 | . SPACE_TOPBAR | 0.00 | 0.01 | 14.6 . SPACE_STATUSBAR | 0.00 | 0.01 | 14.6 . SPACE_VIEW3D | 27.5 | 24.8 | 14.5 .. Viewport | 27.5 | 24.8 | 14.5 . Window Redraw | 0.30 | 0.10 | 17.2 ``` </details> I’ve also tested other scenes and they show similar results. *(These are from an RTX 3060 Ti, it would be good to have a similar comparison made on AMD and on Apple hardware)* ### Issues #### PrePass PrePass is slower in EEVEE Next (2.24 vs 1.36ms). ~~No idea about the reason, I've tried removing all the extra stuff from the shaders (velocity, displacement, transparency) and the performance difference didn't change.~~ *Update:* There are thousands of rocks in the scene and many of them are unique meshes. After merging all of them into one mesh or ensuring they all share the same mesh (so they can be batched), the pre-pass performance between Legacy and Next is identical. So the performance difference seems caused by the draw call overhead, maybe due to EEVEE-Next using indirect draw calls? If that's the case we may want to look into using regular draw calls + CPU culling for single instance meshes. *Update 2:* Fix: #117561 #### Shading The Shading pass in EEVEE Next (which only outputs the closure data) takes a bit longer than the EEVEE Legacy pass (5.08 vs 4.90 ms) even if EEVEE Legacy is computing the lighting too. So, in reality, Shading + Lighting is taking 6.53 ms in EEVEE Next (Shading+Eval Lights+Combine) vs 4.90 ms in EEVEE Legacy. 
I guess the main bottleneck here is fillrate and we’ll have to eat the cost as long as we use Deferred Rendering. Other types of scenes (with more light sources or very dense geometry) might compare more favorably. *Update:* A good chunk of the performance difference seems to be caused by the same reason as the pre-pass difference. The shading pass (without lighting) in Next becomes faster than Legacy after #117561. #### Shadows EEVEE Legacy is able to fully skip the Shadow pass, but EEVEE Next Deferred.Opaque Shadows are always being re-rendered (5.55 ms) so the update detection is not working correctly. *Update:* This doesn't seem to happen consistently, even in the same scene. Sometimes the update detection seems to work fine. Running `Shadows.set_view` has a non-trivial cost (1.20 ms) even when it doesn’t render anything. That’s up to 2.40 ms (Deferred.Refraction + Forward.Opaque) doing nothing. We could avoid by it by rendering all pre-passes fist and running the shadow update/rendering only once per frame. Forward.Opaque can probably be fully removed, though? *Update:* Deferred.Refraction and Forward.Opaque passes are now skipped when they're empty. The issue is still there when they're used, though. *Update 2:* While tagging still has a cost, the `Surface.Shading` overhead is gone after #117561. #### ShadowsStatistics The large difference between EEVEE Next (21.5 ms) and the full Viewport (27.5 ms) seems to be caused almost entirely by the `Shadow::statistics_buf_` read (which is done at sync time). Removing the read also increases the latency, which is oddly low at the moment. I’m not sure if the cause is a bug on our side (Swapchain maybe?) or if it’s simply the driver using a different strategy. *Update:* Fix: #117521 #### Raytracing Raytracing seems quite costly (~~4.35 ms~~ *edit: actually more (~5ms), there was a bug in the initial test, see #117159*), not sure how much the horizon passes could be optimized. 
But I found something odd with the “regular” screen tracing. If I disable the horizon method (by setting Max Roughness to 1), this is what I get:

````
.... Raytracing                 | 4.99 | 0.55 | 49.1
..... TileClassify              | 0.08 | 0.01 | 49.1
..... Raytracing                | 2.51 | 0.17 | 49.2
...... TileCompact              | 0.01 | 0.01 | 49.1
...... RayGenerate              | 0.06 | 0.01 | 49.1
...... Trace.Screen             | 0.44 | 0.01 | 49.2
...... DenoiseSpatial           | 0.76 | 0.01 | 49.6
...... DenoiseTemporal          | 0.40 | 0.01 | 50.3
...... DenoiseBilateral         | 0.82 | 0.01 | 50.7
..... Raytracing                | 2.34 | 0.15 | 51.5
...... TileCompact              | 0.01 | 0.01 | 51.5
...... RayGenerate              | 0.04 | 0.01 | 51.5
...... Trace.Screen             | 0.34 | 0.01 | 51.5
...... DenoiseSpatial           | 0.49 | 0.01 | 51.8
...... DenoiseTemporal          | 0.25 | 0.01 | 52.3
...... DenoiseBilateral         | 1.18 | 0.01 | 52.5
..... Raytracing                | 0.05 | 0.15 | 53.7
...... TileCompact              | 0.00 | 0.01 | 53.7
...... RayGenerate              | 0.00 | 0.01 | 53.6
...... Trace.Screen             | 0.00 | 0.01 | 53.6
...... DenoiseSpatial           | 0.00 | 0.01 | 53.6
...... DenoiseTemporal          | 0.00 | 0.01 | 53.6
...... DenoiseBilateral         | 0.00 | 0.01 | 53.6
````

Most of the time seems to be eaten by denoising (4.01 ms out of 4.99 ms). So, at the moment, it doesn’t seem like either the current horizon scanning or the denoising implementation is worth its cost.

*Update:* Heavily improved by #118924

### Conclusion

By fixing these issues, I think we should get to roughly the same performance level as EEVEE Legacy. I'll keep updating the task as I find more details.
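To double-check the denoising share quoted in the Raytracing section, the snippet below takes the leaf rows of the Raytracing table (Max Roughness set to 1) and sums the GPU column by pass name. Summing the `Denoise*` rows directly gives about 3.90 ms; the 4.01 ms figure appears to correspond to subtracting the non-denoise stages (TileClassify, TileCompact, RayGenerate, Trace.Screen) from the 4.99 ms total, which also attributes the small untracked gap between each group and its children (about 0.11 ms) to denoising. `gpu_time_by_pass` and `denoise_estimates` are hypothetical helpers written for this check, not part of Blender's GPU profiler.

```python
from collections import defaultdict

# Leaf rows of the Raytracing table (Max Roughness = 1), copied from the profiler
# output. The intermediate "Raytracing" group rows are left out so no time is
# counted twice, and the third group (all 0.00) is omitted since it adds nothing.
PROFILE = """\
..... TileClassify | 0.08 | 0.01 | 49.1
...... TileCompact | 0.01 | 0.01 | 49.1
...... RayGenerate | 0.06 | 0.01 | 49.1
...... Trace.Screen | 0.44 | 0.01 | 49.2
...... DenoiseSpatial | 0.76 | 0.01 | 49.6
...... DenoiseTemporal | 0.40 | 0.01 | 50.3
...... DenoiseBilateral | 0.82 | 0.01 | 50.7
...... TileCompact | 0.01 | 0.01 | 51.5
...... RayGenerate | 0.04 | 0.01 | 51.5
...... Trace.Screen | 0.34 | 0.01 | 51.5
...... DenoiseSpatial | 0.49 | 0.01 | 51.8
...... DenoiseTemporal | 0.25 | 0.01 | 52.3
...... DenoiseBilateral | 1.18 | 0.01 | 52.5
"""
TOTAL_GPU_MS = 4.99  # GPU time of the top-level ".... Raytracing" row


def gpu_time_by_pass(text):
    """Sum the GPU column of each row per pass name, ignoring the depth dots."""
    totals = defaultdict(float)
    for line in text.splitlines():
        name, gpu, _cpu, _latency = (field.strip() for field in line.split("|"))
        totals[name.lstrip(". ")] += float(gpu)
    return dict(totals)


def denoise_estimates(totals, total_ms):
    """Return (summed, by_subtraction) estimates of the denoising GPU time in ms."""
    # Direct estimate: add up every Denoise* pass.
    summed = sum(ms for name, ms in totals.items() if name.startswith("Denoise"))
    # Subtractive estimate: parent total minus the known non-denoise stages. This
    # also counts the untracked gap between each group and its children.
    others = sum(ms for name, ms in totals.items() if not name.startswith("Denoise"))
    return summed, total_ms - others


if __name__ == "__main__":
    summed, by_subtraction = denoise_estimates(gpu_time_by_pass(PROFILE), TOTAL_GPU_MS)
    print(f"summed: {summed:.2f} ms, by subtraction: {by_subtraction:.2f} ms")
    # summed: 3.90 ms, by subtraction: 4.01 ms
```

Either way, denoising accounts for roughly 80% of the 4.99 ms, so the conclusion in the Raytracing section holds regardless of which estimate is used.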
Miguel Pozo added the Interest: EEVEE, Module: EEVEE & Viewport and Type: Design labels 2024-01-17 18:35:10 +01:00
Miguel Pozo added this to the EEVEE & Viewport project 2024-01-17 18:35:12 +01:00
Reference: blender/blender#117246