This only concerns poly normals computation. Switching the affected code from OMP to BLI_task gives the usual 10% speedup. The 'weighted accum' part (used when computing both poly and vertex normals, e.g. when using modifiers) is now parallelized as well, which makes it roughly 3.3 times faster (from 66ms to 20ms for a 500k poly monkey with a simple deform modifier, e.g.). ;)