One drawback to trying to predict the number of threads that will be used in the `task_graph` is that we are only sure of the number when the threads are running. Using `BLI_task_parallel_range` allows the driver to choose the best thread distribution through `parallel_reduce`. The benefit is most evident on hardware with fewer cores. This is the result on an 4-core laptop: ||before:|after: |---|---|---| |large_mesh_editing:|Average: 5.203638 FPS|Average: 5.398925 FPS ||rdata 15ms iter 43ms (frame 193ms)|rdata 14ms iter 36ms (frame 187ms) Differential Revision: https://developer.blender.org/D11558