FBX Export: Base patch for numpy speedup #104447
This is a base patch that all other separate numpy patches to the FBX exporter rely on.
- Add support for writing bytes from numpy arrays, in the same way as the already supported Python arrays.
- Add numpy and helper function imports to fbx_utils.py and export_fbx_bin.py to simplify subsequent patches.
- Add an `astype_view_signedness` utility function for viewing unsigned integer data as signed only when the itemsizes match, to avoid copying arrays unnecessarily with `numpy.ndarray.astype(new_type, copy=False)`.
- Add numpy versions of the `vcos_transformed_gen` and `nors_transformed_gen` mesh transform helpers (a rough sketch of these helpers and of `astype_view_signedness` follows the timing figures below). 4d output is supported, following the comments in `nors_transformed_gen`, though it remains unused.
Given tests of 1000 to 200000 vectors:
- The most common case is where the matrix is None (when the exporter's bake_space_transform option is disabled, which is the default); this is ~44-105 times faster.
- When bake_space_transform is enabled, geom_mat_co is usually a matrix containing only scaling; this is ~14-65 times faster.
- When bake_space_transform is enabled, geom_mat_no is usually the identity matrix; this is ~18-170 times faster.
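For illustration, here is a rough sketch of what `astype_view_signedness` and a vectorized replacement for the coordinate transform generator could look like. This is only a sketch based on the description above; the actual implementations in the patch may differ, and the 4d output path of the normals helper is omitted here.

```python
import numpy as np


def astype_view_signedness(arr, new_dtype):
    """Return arr viewed as new_dtype when both dtypes are integers of the
    same itemsize that differ only in signedness; otherwise fall back to
    arr.astype(new_dtype, copy=False)."""
    old_dtype = arr.dtype
    new_dtype = np.dtype(new_dtype)
    if (old_dtype.kind in 'iu' and new_dtype.kind in 'iu'
            and old_dtype.itemsize == new_dtype.itemsize):
        # Reinterpreting the bytes is free, no copy is made.
        return arr.view(new_dtype)
    # astype may still avoid a copy when the dtypes already match.
    return arr.astype(new_dtype, copy=False)


def vcos_transformed(raw_cos, m=None, dtype=None):
    """Vectorized counterpart of the generator-based vcos_transformed_gen:
    transform an (n, 3) (or flat) array of coordinates by a mathutils
    Matrix all at once instead of one vector at a time."""
    cos = np.asarray(raw_cos).reshape(-1, 3)
    if m is not None:
        mat3 = np.array(m.to_3x3(), dtype=cos.dtype)
        loc = np.array(m.translation, dtype=cos.dtype)
        # Row-vector form of "m @ co" applied to every row at once.
        cos = cos @ mat3.T + loc
    return cos if dtype is None else cos.astype(dtype, copy=False)
```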
Add helper functions for performing faster uniqueness along the first axis when a sorted result is not needed. The sorting step of numpy.unique is often what takes the most time in the subsequent patches, and this helper can run many times faster than numpy.unique when the second axis of an array has more than one element, because it treats each row as a single element of a larger dtype and therefore only has to sort once (a sketch follows below).
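A minimal sketch of the idea, assuming the sort inside np.unique is the bottleneck; the helper name and the exact handling in the patch may differ:

```python
import numpy as np


def fast_first_axis_unique(arr):
    """Unique rows of a 2D array, without a numerically sorted result.
    Each row is viewed as a single element of a larger 'void' dtype so
    np.unique only has to sort once instead of sorting every column."""
    arr = np.ascontiguousarray(arr)
    if arr.dtype.kind == 'f':
        # -0.0 and 0.0 compare equal as floats but have different bytes,
        # so normalise them before the bytewise comparison. Different NaN
        # bit patterns intentionally remain distinct.
        arr = arr + 0.0
    # One void element per row, covering the whole row's bytes.
    row_dtype = np.dtype((np.void, arr.dtype.itemsize * arr.shape[1]))
    _, unique_indices = np.unique(arr.view(row_dtype).ravel(), return_index=True)
    # Index back into the original rows; the result follows the byte sort,
    # not numeric order, hence "when a sorted result is not needed".
    return arr[unique_indices]
```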
This patch on its own makes no change to exported files.
Roughly this time last year I was trying to export a .blend file as FBX that I didn't think was that big or complex, but it was taking 30 seconds to export, which seemed rather long. Since then I've been working on and off on speeding up the exporter. Initially I tried to optimise the existing loops, but it soon became clear that more significant speedups would be achievable by using numpy.
A lot of the performance cost of the FBX export code comes from iterating through and processing every vertex/edge/etc. individually in Python code. By rewriting the iterated code to use numpy vectorized functions where possible, the iterations can be run in optimised C code instead. To facilitate the use of numpy arrays throughout the exporter, this patch adds support for exporting bytes from numpy arrays.
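As a hedged illustration (not the patch's actual encode_bin.py code; only the `_IS_BIG_ENDIAN` name is taken from the existing module), the raw little-endian bytes the FBX writer needs could be obtained from a numpy array along these lines:

```python
import sys
import numpy as np

_IS_BIG_ENDIAN = sys.byteorder == 'big'


def _numpy_array_bytes(arr):
    """Return the raw bytes of a numpy array in little-endian order, as the
    FBX binary format expects."""
    if _IS_BIG_ENDIAN and arr.dtype.byteorder in ('=', '>'):
        # On big-endian systems, swap multi-byte element data first.
        arr = arr.byteswap()
    # tobytes() returns a contiguous, C-ordered copy of the data.
    return arr.tobytes()
```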
An additional, surprising performance cost came from the `bpy_prop_collection.foreach_get` function. When the second argument is a Python object that implements the buffer interface and the datatype of the buffer matches the C datatype of the property being accessed, a direct memcpy into the buffer is performed, which is very fast. If the datatype doesn't match, however, the data is iterated and each element is cast to the type of the buffer.

foreach_get seems to be quite slow when it has to cast the elements for some reason. Passing in a buffer matching the C datatype and then casting the entire buffer with numpy is about 20-30 times faster for large arrays. From my testing, when the C datatype doesn't match the desired array type, it's actually even slightly faster to pass a Python list into foreach_get and then pass that list into the creation of a new Python array (or a numpy array using `np.fromiter`) of the desired type than it is to pass an array of the desired type directly into foreach_get.

This patch doesn't contain any code that uses `bpy_prop_collection.foreach_get`, but it's a common feature of the subsequent patches, so I've gone into more detail here. For reference, the C function is foreach_getset in bpy_rna.c.

I couldn't find a way to inspect properties at runtime through Blender's Python API to determine their C type (the hard max/min values do not always match the min/max of the C types), so for the most part I relied on a custom Blender build I managed to cobble together that would print a warning whenever the buffer passed into foreach_get/set didn't match the type of the C data being accessed.
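For example, the fast path described above looks roughly like this. The assumption that vertex coordinates are stored as 32-bit floats in C matches current Blender sources, but it is not something the Python API promises:

```python
import bpy
import numpy as np

mesh = bpy.context.object.data

# Matching dtype: foreach_get can memcpy the C data straight into the buffer.
co = np.empty(len(mesh.vertices) * 3, dtype=np.float32)
mesh.vertices.foreach_get("co", co)

# Cast afterwards with numpy if double precision is needed; this is far
# faster than letting foreach_get cast element by element into a float64
# buffer.
co = co.astype(np.float64)
```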
The helper functions for performing fast uniqueness along the first axis could definitely do with being scrutinised, since viewing floats as types that compare by bytes is a bit hacky. I think I've covered the main issue of -0.0 and 0.0 having different byte representations, and I think it's acceptable not to care that different NaNs are not collapsed into a single value, though it wouldn't be much trouble to modify the code to collapse the different NaNs into one.
I have not been able to test the code in encode_bin.py on a big endian system; I've only simulated the behaviour on a copy of the code with `_IS_BIG_ENDIAN` replaced with True.

LGTM. I think I will first commit this, then you can rebase the `vcos` PR on main and commit that one. Then I would allow a few extra days of initial validation before merging the other PRs. Also, kudos for the detailed documentation in the code, always great to see!
Those sound like valid assumptions to me; NaN values in Blender data should be considered invalid anyway...
323182c873 to 388f48cb09