c54381488b
This gives a speedup of about 5% on AVX processors. The benefit of this optimization on other microarchitectures is still under investigation.