Use buffers matching the C types of the data in foreach_get to avoid having to iterate and cast every single element in the C foreach_getset function.
Replace nors_transformed_gen mesh transform helper with numpy version nors_transformed.
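As a rough illustration of what a numpy-based normal transform can look like, here is a minimal sketch. The function name transform_normals and the sample data are hypothetical and are not taken from the patch; the real nors_transformed helper may differ in signature and details.

```python
import numpy as np

def transform_normals(normals, mat3x3):
    """Apply a 3x3 matrix to every row of an (N, 3) array in one call."""
    normals = np.asarray(normals, dtype=np.float64)
    mat = np.asarray(mat3x3, dtype=np.float64)
    # For row vectors, (M @ v) for each v is the same as v @ M.T,
    # so the whole array is transformed without a Python-level loop.
    return normals @ mat.T

# Hypothetical example: rotate two normals 90 degrees around the Z axis.
rot_z = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
normals = np.array([[1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0]])
transformed = transform_normals(normals, rot_z)
```

The point of the sketch is only that one matrix product replaces a per-vector generator, which is where a vectorized helper would be expected to save time.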
This patch can slightly change the exported normals, tangents and bitangents when geom_mat_no is set, because it skips a cast from double to single precision before the values are cast to float64 (usually double precision):
The original code would multiply mathutils.Matrix (single precision) and mathutils.Vector (single precision) together, which casts the Matrix elements to double precision, performs the multiplication, and then casts the result back to single precision. These single precision multiplied vectors would then be cast to float64 to be exported.
The new code performs the same cast of the matrix to double precision, but skips the step of casting back to single precision, instead casting directly to float64.
Even if the new code were to cast back to single precision and then to float64 like the original code, there still tends to be a small difference in the result, presumably due to precision error.
In most cases, however, it seems that there is no change because geom_mat_no is usually an identity matrix, so both the original and new code end up at the same result.
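The two cast sequences described above can be compared outside Blender with plain numpy. This is a sketch under assumptions: the matrix and vector values are made up, and np.float32 arrays stand in for the single precision storage of mathutils.Matrix and mathutils.Vector.

```python
import numpy as np

# Hypothetical values, stored in single precision like mathutils types.
mat = np.array([[0.1, 0.2, 0.3],
                [0.4, 0.5, 0.6],
                [0.7, 0.8, 0.9]], dtype=np.float32)
vec = np.array([0.3, 0.6, 0.9], dtype=np.float32)

# Both paths perform the multiplication in double precision.
product = mat.astype(np.float64) @ vec.astype(np.float64)

# Original path: round the result to single, then widen to float64 for export.
old_path = product.astype(np.float32).astype(np.float64)
# New path: keep the double precision result as-is.
new_path = product

# Largest per-component difference between the two paths.
max_diff = np.abs(old_path - new_path).max()

# With an identity matrix the product is just the input vector, which is
# already exactly representable in single precision, so both paths agree.
identity = np.eye(3, dtype=np.float32)
ident_product = identity.astype(np.float64) @ vec.astype(np.float64)
ident_old_path = ident_product.astype(np.float32).astype(np.float64)
identity_paths_match = np.array_equal(ident_old_path, ident_product)
```

Any difference between old_path and new_path is bounded by single precision rounding of the result, which is consistent with the exported values changing only slightly.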
For subdivided default cubes with 1538 to 1572864 vertices:
~5-8 times faster (geom_mat_no is None and tangents are exported)
~8-11 times faster (geom_mat_no is set and tangents are exported)
~13-16 times faster (geom_mat_no is None and no tangents are exported)
~18-21 times faster (geom_mat_no is set and no tangents are exported)