FBX IO: Speed up parsing by multithreading array decompression #104739
Reference: blender/blender-addons#104739
Because `zlib.decompress` releases the GIL, the arrays are now decompressed on separate threads. Given enough logical CPUs on the current system, decompressing arrays and parsing the rest of the file is now done simultaneously. This uses the threading utils recently added in `fbx_utils_threading`.

Aside from .fbx files without any compressed arrays, array decompression
usually takes just under 50% of the parsing duration on average, though
commonly varies between 40% to 60% depending on the contents of the file.
However, multithreading array decompression leads to the main thread reading
from the file and waiting for IO more often, so I was only able to get
an average 35% reduction in parsing duration. Because it's waiting
on IO, this is likely to vary depending on the file system being read
from, and the time spent waiting on IO is expected to be even longer in
real use cases because the file being read won't have been accessed
recently.
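The core idea can be sketched roughly as follows. This is a minimal illustration, not the actual `fbx_utils_threading` code; the helper name and pool size are assumptions made for the example. Because `zlib.decompress` releases the GIL while inflating, a worker thread can decompress array data while the main thread continues parsing the file:

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# zlib.decompress releases the GIL while inflating, so a worker thread can
# decompress while the main thread keeps reading and parsing the file.
_executor = ThreadPoolExecutor(max_workers=2)  # illustrative pool size

def schedule_array_decompression(compressed_bytes):
    """Hypothetical helper: submit decompression to a worker thread.

    Returns a Future; the caller keeps parsing the rest of the file and
    only waits on the Future when the array's contents are needed.
    """
    return _executor.submit(zlib.decompress, compressed_bytes)

# Usage: submit while parsing, resolve the result afterwards.
data = zlib.compress(b"\x00" * 1024)
future = schedule_array_decompression(data)
# ... main thread would continue reading/parsing the file here ...
assert future.result() == b"\x00" * 1024
```

In the real importer the decompressed bytes are then interpreted as a typed array according to the element type read from the file; that step is omitted here.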
For the smallest files, e.g. a single cube mesh, this can be slightly
slower, because starting a new thread costs more time than the thread
saves.
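One way to avoid that overhead on tiny files is to decompress small arrays inline and only offload larger ones to a worker thread. The sketch below is illustrative only: the helper name and the cutoff value are assumptions, not what the add-on actually uses.

```python
import os
import zlib
from concurrent.futures import ThreadPoolExecutor

# Below this compressed size, thread startup/synchronization overhead
# tends to outweigh any benefit. The exact cutoff is illustrative.
MIN_OFFLOAD_SIZE = 32 * 1024

_executor = ThreadPoolExecutor(max_workers=2)

def decompress_maybe_threaded(compressed_bytes):
    """Return a zero-argument callable that yields the decompressed bytes."""
    if len(compressed_bytes) < MIN_OFFLOAD_SIZE:
        # Small payload: decompress inline on the main thread.
        result = zlib.decompress(compressed_bytes)
        return lambda: result
    # Large payload: offload; zlib.decompress releases the GIL.
    future = _executor.submit(zlib.decompress, compressed_bytes)
    return future.result  # resolves when called

small_raw = b"a" * 100
big_raw = os.urandom(1024 * 1024)  # incompressible, so it stays large
assert decompress_maybe_threaded(zlib.compress(small_raw))() == small_raw
assert decompress_maybe_threaded(zlib.compress(big_raw))() == big_raw
```

Returning a callable in both cases lets the parsing code treat inline and threaded decompression uniformly.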
Parsing fbx files takes around 16% of the total import duration on
average, so the overall import duration would be expected to reduce by
about 5.6% on average. However, from timing imports before and after
this patch, I get an actual average reduction of 3.5%.
Because the main thread spends some time waiting on IO, even systems
with a single CPU can see a tiny speedup from this patch. I get about a
6% reduction in parsing duration in this case, which would be less than
1% of the total import duration.
This patch depends on #105017, which is not included in this PR.
Altogether, these commits cut the average duration of parsing FBX files on my system to slightly less than half. Note that this time also includes reading the file from disk, so slow IO will significantly affect the duration; this is particularly noticeable when I import an .fbx from an HDD that hasn't been read from recently, so I always do a few warmup parses/imports before profiling anything. I also disable the Image Search option in the FBX importer settings because that can be very slow.

Parsing takes up about a third of the FBX import time on my system, so these commits reduce import times by roughly 10-15%.

Edit: The original profiling I did here was done with `cProfile`, which appears to add a lot of overhead in this case, so the timings were not accurate and parsing was closer to 16% of the import duration. Additionally, since parsing now waits on IO a lot more, the 50% saving in parsing duration could not be achieved, making the actual effect closer to a 3.5% reduction on average.

This may need some more work, or more work done in a separate PR, because doing the decompression of arrays on a separate thread appears to result in the main thread waiting on IO a lot more, so the 50% parsing time save becomes more like 35%.
Additionally, it looks like `cProfile` may be introducing additional overhead within the `parse` function specifically, which causes it to report the `parse` function as taking up a larger percentage of the total import duration. I've profiled full imports, as well as just `parse`, with `timeit`/`time.perf_counter`, and it leads me to believe that `parse`'s percentage of the total import duration is closer to 16% than 33%.

16% of 35% gives an expected average import duration reduction of 5.6%, but from timing full imports before and after this patch I'm only getting about 3.5% on average.
The question would then be whether this additional code/complexity is worth a 3.5% time save.
Force-pushed from a7d3a03983 to 00c58e0d91

Changed title from "WIP: Speed up FBX parsing: Multithread array decompression" to "FBX IO: Speed up parsing by multithreading array decompression"

LGTM.