VSE: reduce effects code duplication, making gaussian blur faster in the process #116089

Merged
Aras Pranckevicius merged 2 commits from aras_p/blender:vse-fx-cleanup into main 2023-12-14 17:31:16 +01:00

Now that the code is in C++, quite some duplication between "byte" and "float" effect code paths can be reduced (easier than it was in C times). So I did that, removing about 400 lines of code.

In that process I accidentally made Gaussian Blur faster, since while reducing the amount of code I noticed it was doing some things sub-optimally (calculated kernel tables for each job, etc.). Applying 100x100 gaussian blur on 4K UHD resolution image strip on Ryzen 5950X went 630ms -> 450ms.

Now that the code is in C++, quite some duplication between "byte" and "float" effect code paths can be reduced (easier than it was in C times). So I did that, removing about 400 lines of code. In that process I accidentally made Gaussian Blur faster, since while reducing the amount of code I noticed it was doing some things sub-optimally (calculated kernel tables for each job, etc.). Applying 100x100 gaussian blur on 4K UHD resolution image strip on Ryzen 5950X went 630ms -> 450ms.
Aras Pranckevicius added 2 commits 2023-12-12 10:49:29 +01:00
Logic between byte vs float effects was the same in many places, so now
that the source is in C++, we can reduce that duplication. Effects:
alpha over, alpha under, gamma cross, wipe, apply blend function util.
VSE: make Gaussian Blur effect both faster and with less code
All checks were successful
buildbot/vexp-code-patch-windows-amd64 Build done.
buildbot/vexp-code-patch-linux-x86_64 Build done.
buildbot/vexp-code-patch-lint Build done.
buildbot/vexp-code-patch-darwin-x86_64 Build done.
buildbot/vexp-code-patch-darwin-arm64 Build done.
buildbot/vexp-code-patch-coordinator Build done.
bac0481697
Similar to previous commit, there was a lot of code duplication
between "byte" and "float" gaussian blur code variants. Also:

- Use direct parallel_for threading instead of very roundabout
  way that was coming from C times. Way less code, and allows loading
  CPU cores better (parallel_for seems better than task pool, also
  gaussian blur is relatively expensive so use smaller grain sizes).
- Calculate gaussian kernel tables once per effect, instead of once
  per CPU job.
- Do "sample from this to that pixel" including boundary conditions
  calculation explicitly, instead of checking boundary condition
  at each iteration inside the inner loop.

Applying gaussian blur effect of 100x100 size on 4K UHD sequencer
strip input, on Windows / Ryzen 5950X: 630ms -> 450ms

And with 220 fewer lines of code :)
Author
Member

@blender-bot build

@blender-bot build
Aras Pranckevicius changed title from WIP: VSE: reduce effects code duplication, making gaussian blur faster in the process to VSE: reduce effects code duplication, making gaussian blur faster in the process 2023-12-12 11:13:01 +01:00
Aras Pranckevicius added this to the Video Sequencer project 2023-12-12 11:13:07 +01:00
Aras Pranckevicius requested review from Richard Antalik 2023-12-12 11:13:21 +01:00
Richard Antalik reviewed 2023-12-12 21:55:53 +01:00
Richard Antalik left a comment
Member

This is really nice change, I really appreciate it! I would like to discuss the inline stuff, perhaps it would be good idea to ask on chat to better understand the intentions behind it.

This is really nice change, I really appreciate it! I would like to discuss the inline stuff, perhaps it would be good idea to ask on chat to better understand the intentions behind it.
@ -946,4 +863,0 @@
using IMB_blend_func_byte = void (*)(uchar *dst, const uchar *src1, const uchar *src2);
using IMB_blend_func_float = void (*)(float *dst, const float *src1, const float *src2);
BLI_INLINE void apply_blend_function_byte(float fac,

From what I read, BLI_INLINE has an effect mostly on how the code compiles, not necessarily on performance. I have seen some benchmark linked from random stackoverflow article, but not sure if they are benchmarking code performance, or compiler performance. https://indico.cern.ch/event/386232/sessions/159923/attachments/771039/1057534/always_inline_performance.pdf

The benchmark also suggests effects on binary size, but guessing it won't have any major effect here.

So the question is, why did this code use this attribute/qualifier and whether it really needs to be there. From what I have read, I would guess it does not really need to be there, but would like to hear your opinion on this.

From what I read, `BLI_INLINE` has an effect mostly on how the code compiles, not necessarily on performance. I have seen some benchmark linked from random stackoverflow article, but not sure if they are benchmarking code performance, or compiler performance. https://indico.cern.ch/event/386232/sessions/159923/attachments/771039/1057534/always_inline_performance.pdf The benchmark also suggests effects on binary size, but guessing it won't have any major effect here. So the question is, why did this code use this attribute/qualifier and whether it really needs to be there. From what I have read, I would guess it does not really need to be there, but would like to hear your opinion on this.
Author
Member

I think in this particular case it does not matter at all.

"inline" is a hint to the compiler, saying like "oh whenever anything calls this function, please to try to just literally paste the whole function into the call site, instead of actually calling the function". It's only a hint, but generally it makes sense to do that for functions that are extremely short, think one or two lines of code with a handful of operations. This one is not, it's two nested loops, a bunch of operations inside the loop, and call into yet another blend function. The amount of overhead from "call the function" compared to the amount of work done inside the function (working on thousands or tends of thousands of pixels!) is absolutely tiny.

So I have no idea why would anyone mark it as inline. My only guess is that maybe back at some point the function used to not work on many pixels, but rather on one pixel at a time, and it was marked inline, but then it got changed to have these two loops, and the inline stayed by accident. But this theory is hard to prove since the code seems to have been (re)added by Ton 15 years ago with "blender 2.5 is back with sequencer code!", so presumably the code existed from even before that, just without a trace in git.

I think in this particular case it does not matter at all. "inline" is a hint to the compiler, saying like "oh whenever anything calls this function, please to try to just literally paste the whole function into the call site, instead of actually calling the function". It's only a hint, but generally it makes sense to do that for functions that are extremely short, think one or two lines of code with a handful of operations. This one is not, it's two nested loops, a bunch of operations inside the loop, and call into yet another blend function. The amount of overhead from "call the function" compared to the amount of work done inside the function (working on thousands or tends of thousands of pixels!) is absolutely tiny. So I have no idea why would anyone mark it as inline. My only guess is that maybe back at some point the function used to not work on many pixels, but rather on one pixel at a time, and it was marked inline, but then it got changed to have these two loops, and the inline stayed by accident. But this theory is hard to prove since the code seems to have been (re)added by Ton 15 years ago with "blender 2.5 is back with sequencer code!", so presumably the code existed from even before that, just without a trace in git.
@ -373,0 +321,4 @@
for (int y = 0; y < height; y++) {
for (int x = 0; x < width; x++) {
float4 col2 = load_premul_pixel(src2);
if (col2.w <= 0.0f) {

Later in the code you access float4 values array by index, here by it's member. Personally I like index more, since pixel[4] is more familiar than pixel.w

It's a matter of getting used to it I guess, so it's a nitpick really.

Later in the code you access `float4` values array by index, here by it's member. Personally I like index more, since `pixel[4]` is more familiar than `pixel.w` It's a matter of getting used to it I guess, so it's a nitpick really.
Author
Member

I don't have an opinion either way. .w for me personally is a bit easier to read than [3], but that's a taste/familiarity thing. And yeah I could have used .x ... .w below too. Do you want me to change it one or another way?

I don't have an opinion either way. `.w` for me personally is a bit easier to read than `[3]`, but that's a taste/familiarity thing. And yeah I could have used `.x` ... `.w` below too. Do you want me to change it one or another way?
iss marked this conversation as resolved
Richard Antalik approved these changes 2023-12-14 15:25:32 +01:00
Richard Antalik left a comment
Member

Pressed the wrong button on x/y/z/w or index discussion, I think it should stay as it is. Would be perhaps nicer to have rgba members for pixels, but that's just cosmetics really..

Pressed the wrong button on x/y/z/w or index discussion, I think it should stay as it is. Would be perhaps nicer to have rgba members for pixels, but that's just cosmetics really..
Aras Pranckevicius merged commit 5cac8e2bb4 into main 2023-12-14 17:31:16 +01:00
Aras Pranckevicius deleted branch vse-fx-cleanup 2023-12-14 17:31:19 +01:00
Sign in to join this conversation.
No reviewers
No Label
Interest
Alembic
Interest
Animation & Rigging
Interest
Asset System
Interest
Audio
Interest
Automated Testing
Interest
Blender Asset Bundle
Interest
BlendFile
Interest
Collada
Interest
Compatibility
Interest
Compositing
Interest
Core
Interest
Cycles
Interest
Dependency Graph
Interest
Development Management
Interest
EEVEE
Interest
EEVEE & Viewport
Interest
Freestyle
Interest
Geometry Nodes
Interest
Grease Pencil
Interest
ID Management
Interest
Images & Movies
Interest
Import Export
Interest
Line Art
Interest
Masking
Interest
Metal
Interest
Modeling
Interest
Modifiers
Interest
Motion Tracking
Interest
Nodes & Physics
Interest
OpenGL
Interest
Overlay
Interest
Overrides
Interest
Performance
Interest
Physics
Interest
Pipeline, Assets & IO
Interest
Platforms, Builds & Tests
Interest
Python API
Interest
Render & Cycles
Interest
Render Pipeline
Interest
Sculpt, Paint & Texture
Interest
Text Editor
Interest
Translations
Interest
Triaging
Interest
Undo
Interest
USD
Interest
User Interface
Interest
UV Editing
Interest
VFX & Video
Interest
Video Sequencer
Interest
Virtual Reality
Interest
Vulkan
Interest
Wayland
Interest
Workbench
Interest: X11
Legacy
Asset Browser Project
Legacy
Blender 2.8 Project
Legacy
Milestone 1: Basic, Local Asset Browser
Legacy
OpenGL Error
Meta
Good First Issue
Meta
Papercut
Meta
Retrospective
Meta
Security
Module
Animation & Rigging
Module
Core
Module
Development Management
Module
EEVEE & Viewport
Module
Grease Pencil
Module
Modeling
Module
Nodes & Physics
Module
Pipeline, Assets & IO
Module
Platforms, Builds & Tests
Module
Python API
Module
Render & Cycles
Module
Sculpt, Paint & Texture
Module
Triaging
Module
User Interface
Module
VFX & Video
Platform
FreeBSD
Platform
Linux
Platform
macOS
Platform
Windows
Priority
High
Priority
Low
Priority
Normal
Priority
Unbreak Now!
Status
Archived
Status
Confirmed
Status
Duplicate
Status
Needs Info from Developers
Status
Needs Information from User
Status
Needs Triage
Status
Resolved
Type
Bug
Type
Design
Type
Known Issue
Type
Patch
Type
Report
Type
To Do
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: blender/blender#116089
No description provided.