Replace iteration with foreach_get/set and numpy vectorized operations
An issue introduced in Blender 3.3 makes the original code very slow:
blender/blender#105909
With that issue present, this change gives roughly a 1500x speedup at 6000 vertices and a 3500x speedup at 25000 vertices.
In Blender 3.2, which predates that issue, the new code still runs about 40-60 times faster for 1000 to 25000+ vertices.
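
For reviewers unfamiliar with the pattern, here is a minimal sketch of the general technique, not the code in this change. It does one bulk read with `foreach_get`, one vectorized NumPy operation, and one bulk write with `foreach_set`, instead of a Python loop over vertices. The helper name `scale_coords` and the `FakeVertices` stand-in are hypothetical; the stand-in only mimics the two collection methods so the snippet also runs outside Blender.

```python
import numpy as np


def scale_coords(vertices, factor):
    """Scale every vertex coordinate without a per-vertex Python loop.

    `vertices` is any collection with Blender-style foreach_get/foreach_set,
    e.g. `mesh.vertices` inside Blender. Hypothetical helper for illustration.
    """
    n = len(vertices)
    flat = np.empty(n * 3, dtype=np.float32)
    vertices.foreach_get("co", flat)   # bulk read: x0, y0, z0, x1, y1, z1, ...
    coords = flat.reshape(n, 3)        # (n, 3) view onto the same buffer
    coords *= factor                   # vectorized, modifies `flat` in place
    vertices.foreach_set("co", flat)   # bulk write back


class FakeVertices:
    """Stand-in for `mesh.vertices` (assumption, not a Blender class)."""

    def __init__(self, coords):
        self._co = np.asarray(coords, dtype=np.float32)

    def __len__(self):
        return len(self._co)

    def foreach_get(self, attr, out):
        out[:] = self._co.ravel()

    def foreach_set(self, attr, values):
        self._co[:] = np.asarray(values).reshape(-1, 3)


verts = FakeVertices([[1, 2, 3], [4, 5, 6]])
scale_coords(verts, 2.0)
```

Inside Blender the same helper would be called with a real mesh, for example `scale_coords(mesh.vertices, 2.0)` followed by `mesh.update()`. The slow equivalent it replaces is a loop of the form `for v in mesh.vertices: v.co *= factor`.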