Reduce the inner loop of IMB_transform by extracting writing to an output buffer in a template. This reduces a branch in the inner loop and would allow different number of channels in the future.
Reduce the inner loop of IMB_transform by extracting writing to an output buffer in a template. This reduces a branch in the inner loop and would allow different number of channels in the future.