Instead of applying the blur kernel to "left + right side, followed by
middle", do a much simpler thing and apply it uniformly to every pixel,
taking care of boundary conditions where the kernel would step outside
the image.

Also, instead of doing "add glow to original image" in a separate pass
over the whole image, just add the source pixel when writing the final
pixel.

Less code, and faster.

Time to apply glow at 4K UHD resolution, on Windows, Ryzen 5950X:
- distance 4: 122ms -> 109ms
- distance 20: 346ms -> 336ms
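
A minimal sketch of the approach described above, reduced to one
single-channel row. It is not the actual code from this change:
`blur_row` and its parameters are hypothetical names, and it assumes
that kernel taps falling outside the image are simply skipped without
renormalizing the weights, and that the source is added unclamped.

```cpp
#include <algorithm>
#include <vector>

/* Hypothetical helper: blur one row with a symmetric kernel of size
 * `2 * half_width + 1`. If `add_src` is non-null, its pixel is added
 * when the final value is written, so no separate "add" pass is needed. */
std::vector<float> blur_row(const std::vector<float> &src,
                            const std::vector<float> &kernel,
                            const std::vector<float> *add_src)
{
  const int width = int(src.size());
  const int half_width = int(kernel.size()) / 2;
  std::vector<float> dst(width);
  for (int x = 0; x < width; x++) {
    /* Clamp the kernel range to the image instead of treating the
     * left/right edges and the middle as separate cases. */
    const int x_min = std::max(x - half_width, 0);
    const int x_max = std::min(x + half_width, width - 1);
    float accum = 0.0f;
    for (int nx = x_min; nx <= x_max; nx++) {
      accum += src[nx] * kernel[nx - x + half_width];
    }
    /* Assumption: the source is added as-is, without clamping. */
    if (add_src != nullptr) {
      accum += (*add_src)[x];
    }
    dst[x] = accum;
  }
  return dst;
}
```

In a separable blur, the first (horizontal) pass would pass `nullptr`
and only the last (vertical) pass would pass the original image, which
is what removes the extra pass over the whole image.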
Instead of doing the preparation and finishing operations in separate
passes over the image, do them as one combined operation in a single
pass. This also leaves IMB_buffer_float_unpremultiply and
IMB_buffer_float_premultiply unused, so remove them.

Time to apply glow at 4K UHD resolution, on Windows, Ryzen 5950X:
- distance 4: 136ms -> 122ms
- distance 20: 365ms -> 346ms
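
A rough sketch of what combining per-pixel passes can look like. The
message does not spell out which preparation/finishing steps were
merged, so this assumes, purely for illustration, an alpha unpremultiply
before and a premultiply after some per-pixel work. `Pixel`,
`halve_color` and `process_fused` are hypothetical names, not functions
from the codebase.

```cpp
#include <vector>

struct Pixel {
  float r, g, b, a;
};

/* Assumed preparation step: convert premultiplied to straight alpha. */
static Pixel unpremultiply(Pixel p)
{
  if (p.a != 0.0f && p.a != 1.0f) {
    const float inv = 1.0f / p.a;
    p.r *= inv;
    p.g *= inv;
    p.b *= inv;
  }
  return p;
}

/* Assumed finishing step: convert straight back to premultiplied alpha. */
static Pixel premultiply(Pixel p)
{
  p.r *= p.a;
  p.g *= p.a;
  p.b *= p.a;
  return p;
}

/* Hypothetical stand-in for the actual per-pixel work. */
static Pixel halve_color(Pixel p)
{
  p.r *= 0.5f;
  p.g *= 0.5f;
  p.b *= 0.5f;
  return p;
}

/* One pass over the image instead of three: each pixel is loaded once,
 * goes through all steps, and is stored once. */
void process_fused(std::vector<Pixel> &pixels)
{
  for (Pixel &p : pixels) {
    p = premultiply(halve_color(unpremultiply(p)));
  }
}
```

The arithmetic per pixel is unchanged; the saving comes from touching
memory once instead of once per step, which matters at 4K sizes.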