Where possible, change most of the remaining strip drawing parts to use
batched drawing. Now that this is done, there's much less need
to separate everything into "layers", and most of it can go back to the
style of "for all strips: draw these parts", since it all goes
through the same batcher. However, thumbnails and the locked
state are still treated as "separate layers" since they use a different
shader.
Similar to the waveform overlay: draw the fcurve overlay via batched quads.
On Sprite Fright Edit, fcurve overlay drawing goes
from 11.1ms down to 5.8ms, with most of the remaining cost inside
id_data_find_fcurve.
Drawing the channels list was taking about 3ms on my machine, with almost all
of that time spent querying *all* the VSE strips for each channel mute/lock
button. That happens due to some complicated code where, deep
inside the button code, ui_but_get_fcurve is called, which eventually calls
rna_SeqTimelineChannel_path, which calls rna_SeqTimelineChannel_owner_get,
which then constructs a set of all the strips.
It's a good question whether this "all the strips" set could be cached
somewhere.
But for now, just change the code to not query all the strips, but
instead query a set of all meta strips, since that's all the code is
interested in anyway. This makes draw_channels go from 3.0ms
down to 1.3ms.
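As a rough illustration of the idea (not the actual Blender code), collecting only meta strips could look like the sketch below. `StripSketch` and `collect_meta_strips` are hypothetical names, and the sketch assumes that only meta strips contain child strips.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified strip type; the real strip data looks different.
struct StripSketch {
  bool is_meta = false;
  std::vector<StripSketch> children; /* Assumed non-empty only for meta strips. */
};

// Gather pointers to all meta strips, recursing into nested metas.
// Non-meta strips are skipped without being added to any container.
void collect_meta_strips(const std::vector<StripSketch> &strips,
                         std::vector<const StripSketch *> &r_metas)
{
  for (const StripSketch &strip : strips) {
    /* Assumption of this sketch: regular strips never have children. */
    assert(strip.is_meta || strip.children.empty());
    if (strip.is_meta) {
      r_metas.push_back(&strip);
      collect_meta_strips(strip.children, r_metas);
    }
  }
}
```

On a timeline with thousands of regular strips and only a few metas, the resulting container stays tiny, which is where the saving would come from.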
Drawing strip waveform overlays is both expensive and has issues due
to how they are rendered. There is a design proposal to draw them as
square samples (#115274), and this change follows it. While at it:
- The complicated logic to switch between single line and triangle strip
was removed. Instead the squares are drawn using the same batching
mechanism added in previous commits.
- Fixes the waveform overshooting the actual data, due to
sample interpolation wrongly using a -0.5..+0.5 range instead of a 0..1
range. This sometimes also caused parts of the waveform to be displayed
as "clipped" when the actual audio is not.
- Fixes waveform sample accumulation not including all the samples
when a pixel covers multiple samples. This was caused
by the start of the range using a rounded sample number, while the end
used a truncated frame number.
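The accumulation fix can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Blender code: `SampleRange` and `accumulate_pixel_samples` are hypothetical names, positions are given in fractional sample units, and truncating both ends is just one consistent choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct SampleRange {
  float min;
  float max;
};

// Min/max of all samples covered by one pixel spanning [start_pos, end_pos).
// Both ends are converted to indices the same way (truncation), so no
// covered sample is skipped; at least one sample is always included.
SampleRange accumulate_pixel_samples(const std::vector<float> &samples,
                                     double start_pos,
                                     double end_pos)
{
  assert(!samples.empty());
  const int samples_num = int(samples.size());
  int start = int(std::floor(start_pos));
  int end = int(std::floor(end_pos));
  start = std::max(0, std::min(start, samples_num - 1));
  end = std::max(start + 1, std::min(end, samples_num));

  SampleRange result = {samples[start], samples[start]};
  for (int i = start + 1; i < end; i++) {
    result.min = std::min(result.min, samples[i]);
    result.max = std::max(result.max, samples[i]);
  }
  return result;
}
```

With a pixel covering positions 2.6 to 5.2, rounding the start would begin at sample 3 and miss sample 2, while truncating both ends covers samples 2, 3 and 4.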
In Sprite Fright Edit data set, with whole timeline visible (2702
strips), repainting the timeline UI with audio waveforms takes
(Windows, Ryzen 5950X, RTX 3080Ti): 19.5ms -> 15.1ms
(draw_seq_waveform_overlay 13.5ms -> 8.7ms). A large part of the remaining
cost is id_data_find_fcurve lookups.
Drawing of retiming keys was spending almost all the time inside
SEQ_retiming_selection_contains, with essentially squared complexity
(for each retiming key, scanning all existing keys).
Instead, build selected keys into a hashmap via
SEQ_retiming_selection_get for way faster lookup.
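The difference between the two lookup strategies can be sketched like this. It is only an illustration: `KeySketch` and the functions below are hypothetical stand-ins, and the sketch uses `std::unordered_set` rather than whatever container SEQ_retiming_selection_get actually returns.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified retiming key.
struct KeySketch {
  int frame = 0;
};

// Old pattern: scan the whole selection for every key that gets drawn,
// which is O(keys * selection) per redraw.
bool selection_contains_linear(const std::vector<const KeySketch *> &selection,
                               const KeySketch *key)
{
  for (const KeySketch *selected : selection) {
    if (selected == key) {
      return true;
    }
  }
  return false;
}

// New pattern: build a hash set once per redraw...
std::unordered_set<const KeySketch *> build_selection_set(
    const std::vector<const KeySketch *> &selection)
{
  std::unordered_set<const KeySketch *> result;
  result.reserve(selection.size());
  for (const KeySketch *key : selection) {
    assert(key != nullptr);
    result.insert(key);
  }
  return result;
}

// ...then each per-key query is an average O(1) lookup.
int count_selected_keys(const std::vector<KeySketch> &keys,
                        const std::unordered_set<const KeySketch *> &selected)
{
  int count = 0;
  for (const KeySketch &key : keys) {
    if (selected.count(&key) != 0) {
      count++;
    }
  }
  return count;
}
```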
In Sprite Fright Edit data set, with whole timeline visible (2702
strips), repainting the timeline UI with audio waveforms takes
(Windows, Ryzen 5950X, RTX 3080Ti): 23ms -> 19.5ms (retime_keys_draw
5.7ms -> 0.9ms)
The strips and retiming items were drawn one at a time, switching
between GPU shaders, geometry topologies and using tiny immediate
mode batches (often two triangles) for everything.
Optimize that by using much larger GPU batches, like:
- Draw all strip backgrounds at once,
- Draw all strip handles at once,
- Draw all strip outlines at once,
- Draw all retiming continuity sections at once,
- Draw all retiming keyframes at once.
There's no efficient way to draw separate quads in the GPU IMM API, so
instead this adds a SeqQuadsBatch utility that batches up to 1024
quads for drawing, using indexed vertices.
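The batching idea can be sketched on the CPU side as below. This is not the actual SeqQuadsBatch implementation: `QuadBatchSketch`, `QuadVertex` and the flush callback are hypothetical, and the callback stands in for the real GPU buffer upload and draw call.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

struct QuadVertex {
  float x, y;
  uint32_t color; /* Packed RGBA. */
};

// Hypothetical stand-in for a quad batcher; accumulates quads and hands
// them to `flush_fn` (one "draw call") once the batch is full.
class QuadBatchSketch {
 public:
  static constexpr int MAX_QUADS = 1024;
  using FlushFn = std::function<void(const std::vector<QuadVertex> &,
                                     const std::vector<uint16_t> &)>;

  explicit QuadBatchSketch(FlushFn flush_fn) : flush_fn_(std::move(flush_fn))
  {
    verts_.reserve(MAX_QUADS * 4);
    indices_.reserve(MAX_QUADS * 6);
  }

  void add_quad(float x1, float y1, float x2, float y2, uint32_t color)
  {
    if (quads_num_ == MAX_QUADS) {
      flush();
    }
    assert(quads_num_ < MAX_QUADS);
    /* 4 vertices per quad; at most 4096 per batch, so 16-bit indices fit. */
    const uint16_t base = uint16_t(verts_.size());
    verts_.push_back({x1, y1, color});
    verts_.push_back({x1, y2, color});
    verts_.push_back({x2, y1, color});
    verts_.push_back({x2, y2, color});
    /* Two triangles sharing an edge: (0,1,2) and (2,1,3). */
    const uint16_t quad_indices[6] = {0, 1, 2, 2, 1, 3};
    for (const uint16_t offset : quad_indices) {
      indices_.push_back(uint16_t(base + offset));
    }
    quads_num_++;
  }

  // Submit whatever is pending; must also be called once after the last quad.
  void flush()
  {
    if (quads_num_ == 0) {
      return;
    }
    flush_fn_(verts_, indices_);
    verts_.clear();
    indices_.clear();
    quads_num_ = 0;
  }

 private:
  FlushFn flush_fn_;
  std::vector<QuadVertex> verts_;
  std::vector<uint16_t> indices_;
  int quads_num_ = 0;
};
```

In this sketch, one quad per strip for 2702 strips ends up as three submissions (1024 + 1024 + 654 quads) instead of 2702 tiny ones.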
In Sprite Fright Edit data set, with whole timeline visible (2702
strips), repainting the timeline UI with audio waveforms on takes:
Windows (Ryzen 5950X, RTX 3080Ti): 46ms -> 23ms. Of the time still
left, 14ms is drawing sound waveforms. The remaining cost is other overhead
not directly related to actual rendering (some squared complexity
selection queries, f-curve existence lookups etc.).