Conceptually, the Subsampling filter is a box filter: it sums up N
source image pixels, computes their average, and outputs the result.
Critically, this must be done in premultiplied space, so that colors
from fully or mostly transparent regions do not "override" opaque
colors.
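
To illustrate why premultiplied space matters, here is a minimal
sketch that averages an opaque red pixel with a fully transparent
green one. The Color type and both function names are hypothetical and
exist only for this example; they are not taken from the actual code.

```cpp
#include <cassert>

/* Hypothetical straight-alpha color, all channels in 0..1. */
struct Color {
  float r, g, b, a;
};

/* Naive average of straight-alpha channels: color from transparent
 * pixels leaks into the result. */
Color average_straight(const Color &x, const Color &y)
{
  return {(x.r + y.r) * 0.5f, (x.g + y.g) * 0.5f, (x.b + y.b) * 0.5f, (x.a + y.a) * 0.5f};
}

/* Average in premultiplied space, then convert back to straight alpha.
 * Each color is weighted by its own alpha, so transparent pixels
 * contribute nothing to the resulting RGB. */
Color average_premultiplied(const Color &x, const Color &y)
{
  assert(x.a >= 0.0f && y.a >= 0.0f);
  const float r = x.r * x.a + y.r * y.a;
  const float g = x.g * x.a + y.g * y.a;
  const float b = x.b * x.a + y.b * y.a;
  const float a = x.a + y.a;
  if (a == 0.0f) {
    return {0.0f, 0.0f, 0.0f, 0.0f};
  }
  /* (sum of premultiplied RGB / 2) / (sum of alpha / 2) == sum / sum. */
  return {r / a, g / a, b / a, a * 0.5f};
}
```

With opaque red {1, 0, 0, 1} and transparent green {0, 1, 0, 0}, the
straight average has a green channel of 0.5 (a muddy yellow), while
the premultiplied average stays pure red at half alpha.
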
Previously, especially when operating on byte images, the code
achieved this by always working on byte values: it did "progressively
smaller" lerps into a byte color result, premultiplying each sample
and then storing "straight" alpha again after every sample. This
meant 3 divisions per sample! It also lost some precision, since for
all 9 samples the intermediate results were stored only at byte
precision.
Reformulate this by simply accumulating the premultiplied color as a
float color. This gets rid of all divisions except in the last step,
when the float result is written back into a byte color.
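
The accumulation approach can be sketched roughly as follows. This is
an illustration under assumptions, not the actual implementation: the
ByteColor type, the function name, and the rounding choice are all
hypothetical.

```cpp
#include <cassert>
#include <cstdint>

/* Hypothetical straight-alpha byte color. */
struct ByteColor {
  uint8_t r, g, b, a;
};

/* Box-filter `count` straight-alpha byte samples. Premultiplied color
 * is accumulated in floats; divisions happen only once, at the end. */
ByteColor box_filter_premultiplied(const ByteColor *samples, int count)
{
  assert(samples != nullptr && count > 0);
  float acc_r = 0.0f, acc_g = 0.0f, acc_b = 0.0f, acc_a = 0.0f;
  for (int i = 0; i < count; i++) {
    const float a = float(samples[i].a);
    /* Premultiply by alpha; no per-sample division or byte rounding. */
    acc_r += float(samples[i].r) * a;
    acc_g += float(samples[i].g) * a;
    acc_b += float(samples[i].b) * a;
    acc_a += a;
  }
  if (acc_a == 0.0f) {
    return {0, 0, 0, 0};
  }
  /* Final step: back to straight alpha and byte range, with rounding.
   * The sample count cancels out of the RGB terms. */
  ByteColor out;
  out.r = uint8_t(acc_r / acc_a + 0.5f);
  out.g = uint8_t(acc_g / acc_a + 0.5f);
  out.b = uint8_t(acc_b / acc_a + 0.5f);
  out.a = uint8_t(acc_a / float(count) + 0.5f);
  return out;
}
```

For a 3x3 block, the largest accumulated value is 9 * 255 * 255, which
a 32-bit float represents exactly, so the only rounding in this sketch
happens in the final division and the conversion to bytes.
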
Time to process a 4K UHD destination image with the Subsampling 3x3
filter:
- Windows/VS2022/Ryzen5950X: 52.7ms -> 28.3ms
- Mac/clang15/M1Max: 54.4ms -> 43.7ms
The unit test results differ by a tiny amount, because the new output
is more precise (as noted above, the previous code lost some
precision).