BLF: optimizations and fixes to font shader #119653

Merged
Aras Pranckevicius merged 7 commits from aras_p/blender:text-shader-opt into main 2024-03-19 16:29:30 +01:00

7 Commits

Author SHA1 Message Date
Aras Pranckevicius 6629015bd3 Code style adjustments 2024-03-19 17:15:32 +02:00
Aras Pranckevicius f55fb2b4c5 Merge branch 'main' into text-shader-opt
2024-03-19 14:40:33 +02:00
Aras Pranckevicius 94c69c0589 BLF: fix sampling artifacts towards top/left edges with blur (exists on main too)
Casting a float UV coordinate to int rounds towards zero, but we need rounding
towards negative infinity, i.e. a floor.
2024-03-19 14:40:04 +02:00
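To illustrate the rounding issue from the commit above, here is a minimal sketch in Python rather than GLSL. The function names are hypothetical and not taken from the shader. Python's `int()` truncates towards zero, the same way a float-to-int conversion does in GLSL, so coordinates slightly below zero (towards the top/left edge) land on texel 0 instead of texel -1.

```python
import math


def texel_index_trunc(coord: float) -> int:
    # int() drops the fractional part, i.e. rounds towards zero.
    return int(coord)


def texel_index_floor(coord: float) -> int:
    # floor rounds towards negative infinity, which is what texel addressing needs.
    return math.floor(coord)


if __name__ == "__main__":
    for coord in (-0.25, 0.75, 1.5):
        print(coord, texel_index_trunc(coord), texel_index_floor(coord))
```

The two functions agree for non-negative coordinates and differ only for negative non-integer ones, which matches the artifacts being limited to the top/left edges.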
Aras Pranckevicius 1e4c089a75 BLF: simplify font blurring shader
Instead of doing manual bilinear (4 samples) for each tap (total
16 texture fetches for 3x3, 64 fetches for 5x5), fetch the (N+1)x(N+1)
raw texels and interpolate with a bilinearly shifted filter kernel.
For 5x5 blur, this is 36 texture samples instead of 64.
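The idea can be sketched outside the shader. The Python below is an illustration under assumptions, not the actual GLSL: the kernel weights, tap layout and all function names are made up, and the fetch counts it prints belong to this sketch's layout (4·T² versus (T+1)² for T×T taps), not to the numbers quoted above. It relies only on the fact that every tap shares the same bilinear fraction, so the per-tap bilinear weights can be folded into the kernel once.

```python
import math
from itertools import product


def make_fetch(tex):
    """Return (fetch, counter); fetch yields 0.0 outside the texture."""
    counter = [0]
    height, width = len(tex), len(tex[0])

    def fetch(x, y):
        counter[0] += 1
        return tex[y][x] if 0 <= x < width and 0 <= y < height else 0.0

    return fetch, counter


def blur_per_tap(fetch, kernel, bx, by, fx, fy):
    """Each kernel tap performs its own 4-fetch manual bilinear sample."""
    n, r = len(kernel), len(kernel) // 2
    total = 0.0
    for j, i in product(range(n), repeat=2):
        x, y = bx + i - r, by + j - r
        sample = ((1 - fx) * (1 - fy) * fetch(x, y)
                  + fx * (1 - fy) * fetch(x + 1, y)
                  + (1 - fx) * fy * fetch(x, y + 1)
                  + fx * fy * fetch(x + 1, y + 1))
        total += kernel[j][i] * sample
    return total


def blur_shifted_kernel(fetch, kernel, bx, by, fx, fy):
    """Fetch each raw texel once; fold the bilinear weights into the kernel."""
    n, r = len(kernel), len(kernel) // 2
    wx, wy = (1 - fx, fx), (1 - fy, fy)
    total = 0.0
    for v, u in product(range(n + 1), repeat=2):
        weight = 0.0
        # Texel (u, v) is a bilinear corner of up to four neighbouring taps.
        for dv, du in product((0, 1), repeat=2):
            j, i = v - dv, u - du
            if 0 <= i < n and 0 <= j < n:
                weight += kernel[j][i] * wx[du] * wy[dv]
        total += weight * fetch(bx - r + u, by - r + v)
    return total


# Hypothetical test data: an 8x8 texture and a 3x3 kernel.
TEX = [[((x * 7 + y * 13) % 10) / 10.0 for x in range(8)] for y in range(8)]
KERNEL = [[1 / 16, 2 / 16, 1 / 16], [2 / 16, 4 / 16, 2 / 16], [1 / 16, 2 / 16, 1 / 16]]

if __name__ == "__main__":
    f1, c1 = make_fetch(TEX)
    f2, c2 = make_fetch(TEX)
    a = blur_per_tap(f1, KERNEL, 3, 4, 0.3, 0.6)
    b = blur_shifted_kernel(f2, KERNEL, 3, 4, 0.3, 0.6)
    print(math.isclose(a, b, abs_tol=1e-9), c1[0], c2[0])
```

Both functions compute the same weighted sum; the second reorders it so that each texel is read once.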

Analyzing the shader with Mali Offline Compiler for Mali-G76 arch:
- Work registers: 64 -> 32
- Uniform registers: unchanged 10
- Total cycles, arithmetic: 26.2 -> 5.97
- Total cycles, load/store: 0.0 -> 4.0
- Total cycles, texture: 44.0 -> 5.0

First-time initialization of the shader (Win10, RTX 3080Ti): 51.3ms
(main branch: 274.4ms)
2024-03-19 12:15:20 +02:00
Aras Pranckevicius 06848c1f64 BLF: simplify fetching of 4 samples in texture_1D_custom_bilinear_filter
Always fetch the 4 corners for a bilinear sample, and then set their
values to zero if they are outside the glyph bounds.
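A rough Python sketch of this pattern follows. It assumes a small atlas where texels outside the glyph rectangle belong to other glyphs; the helper names and the inclusive `bounds` tuple are hypothetical and do not mirror the shader's actual interface. All four corners are always fetched (with clamped coordinates so the read itself is valid), and a 0/1 factor then discards the ones outside the glyph.

```python
def fetch_clamped(atlas, x, y):
    # Clamp so the read is always valid, even for out-of-atlas coordinates.
    height, width = len(atlas), len(atlas[0])
    return atlas[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]


def bilinear_masked(atlas, x, y, fx, fy, bounds):
    """Bilinear sample at (x + fx, y + fy); corners outside bounds count as 0."""
    x0, y0, x1, y1 = bounds  # inclusive glyph rectangle (assumed layout)
    corners = (
        (0, 0, (1 - fx) * (1 - fy)),
        (1, 0, fx * (1 - fy)),
        (0, 1, (1 - fx) * fy),
        (1, 1, fx * fy),
    )
    total = 0.0
    for dx, dy, weight in corners:
        cx, cy = x + dx, y + dy
        value = fetch_clamped(atlas, cx, cy)  # unconditional fetch
        inside = float(x0 <= cx <= x1 and y0 <= cy <= y1)  # 1.0 or 0.0
        total += weight * inside * value
    return total


# Hypothetical atlas: the glyph occupies x,y in 1..2; the 9s are other data.
ATLAS = [
    [9, 9, 9, 9],
    [9, 1, 2, 9],
    [9, 3, 4, 9],
    [9, 9, 9, 9],
]
GLYPH_BOUNDS = (1, 1, 2, 2)

if __name__ == "__main__":
    print(bilinear_masked(ATLAS, 1, 1, 0.5, 0.5, GLYPH_BOUNDS))
    print(bilinear_masked(ATLAS, 2, 2, 0.5, 0.5, GLYPH_BOUNDS))
```

The fetches no longer depend on a condition, only the final weighting does, which is one plausible reason the compiler can avoid the spilling reported below.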

Analyzing the shader with Mali Offline Compiler for Mali-G76 arch:
- Stack spilling: 644 bytes -> none!
- Work registers: unchanged 64
- Uniform registers: 10 -> 18
- Total cycles, arithmetic: 31.8 -> 26.2
- Total cycles, load/store: 114.0 -> 0.0
- Total cycles, texture: 44.0 -> 42.0
2024-03-18 12:00:10 +02:00
Aras Pranckevicius 0a56dcec6f BLF: arithmetic simplifications for text shader
Fold various divisions/multiplications etc.

Analyzing the shader with Mali Offline Compiler for Mali-G76 arch:
- Stack spilling: 692 -> 644 bytes
- Total cycles, arithmetic: 33.2 -> 31.8
- Total cycles, load/store: 119.0 -> 114.0
- Total cycles, texture: unchanged 44.0
2024-03-18 11:50:48 +02:00
Aras Pranckevicius 046b692988 BLF: simplify text shader texel_fetch
Avoid very costly integer division/modulo as well as two texelFetch
calls separated by a branch. We know that the font texture width
is a power of two, so we can replace division/modulo with a shift
and a mask, set from the calling code via the glyph_tex_size
uniform.
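As a small illustration of the shift-and-mask replacement, the Python sketch below converts a linear texel index into 2D coordinates both ways. The names `width_mask` and `width_shift` are hypothetical; how the real `glyph_tex_size` uniform packs these values is not shown here and is not assumed.

```python
def texel_coord_divmod(index: int, width: int) -> tuple[int, int]:
    # Reference version: integer modulo and division.
    return index % width, index // width


def texel_coord_shift_mask(index: int, width_mask: int, width_shift: int) -> tuple[int, int]:
    # Valid only when the width is a power of two:
    #   index % width  == index & (width - 1)
    #   index // width == index >> log2(width)
    return index & width_mask, index >> width_shift


def mask_and_shift_for(width: int) -> tuple[int, int]:
    # Computed once on the calling side, not per texel.
    if width <= 0 or width & (width - 1) != 0:
        raise ValueError("width must be a power of two")
    return width - 1, width.bit_length() - 1


if __name__ == "__main__":
    width = 1024
    mask, shift = mask_and_shift_for(width)
    for index in (0, 1023, 1024, 4099):
        print(index, texel_coord_divmod(index, width),
              texel_coord_shift_mask(index, mask, shift))
```

The equivalence holds for non-negative indices only, which is the case for texel offsets into the glyph texture.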

Analyzing the shader with Mali Offline Compiler for Mali-G76 arch:
- Stack spilling: 724 -> 692 bytes
- Total cycles, arithmetic: 119.5 -> 33.2
- Total cycles, load/store: 161.0 -> 119.0
- Total cycles, texture: 88.0 -> 44.0
2024-03-18 10:49:30 +02:00