FBX IO: Speed up parsing by multithreading array decompression #104739

Merged
Thomas Barlow merged 7 commits from Mysteryem/blender-addons:fbx_parse_multithread_pr into main 2024-01-12 21:32:36 +01:00
Member

Because zlib.decompress releases the GIL, the arrays are now
decompressed on separate threads.

Given enough logical CPUs on the current system, decompressing arrays
and parsing the rest of the file is now done simultaneously.

This uses the threading utils recently added in fbx_utils_threading.
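The add-on's actual `fbx_utils_threading` helpers aren't shown in this PR description, but the core idea can be sketched with the standard library alone. This is a hedged illustration, not the importer's real code: because `zlib.decompress` is a C call that releases the GIL, submitting each compressed array payload to a worker thread lets decompression overlap with the main thread's parsing.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def decompress_arrays(compressed_payloads):
    """Decompress FBX array payloads on worker threads.

    While the workers run, the main thread is free to keep reading and
    parsing the rest of the file, since zlib.decompress releases the GIL.
    """
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(zlib.decompress, data)
                   for data in compressed_payloads]
        return [f.result() for f in futures]

# Stand-in payloads for demonstration; real ones come from the .fbx file.
payloads = [zlib.compress(bytes(range(256)) * 64) for _ in range(4)]
arrays = decompress_arrays(payloads)
```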

Aside from .fbx files without any compressed arrays, array decompression
usually takes just under 50% of the parsing duration on average, though
it commonly varies between 40% and 60% depending on the contents of the file.

However, multithreading array decompression leads to the main thread
reading from the file and waiting on IO more often, so I was only able
to get an average 35% reduction in parsing duration. Because it is
waiting on IO, this is likely to vary depending on the file system
being read from, and IO times are expected to be even longer in real
use cases because the file being read won't have been accessed recently.

For the smallest files, e.g. a single cube mesh, this can be slightly
slower, because starting a new thread costs more time than that thread
saves.

Parsing .fbx files takes around 16% of the total import duration on
average, so the overall import duration would be expected to drop by
about 5.6% on average. However, from timing imports before and after
this patch, I get an actual average reduction of 3.5%.

Because the main thread spends some time waiting on IO, even systems
with a single CPU can see a tiny speedup from this patch. I get about a
6% reduction in parsing duration in this case, which would be less than
1% of the total import duration.


This patch depends on #105017, which is not included in this PR.

Altogether, these commits slightly less than halve the average duration of parsing FBX files on my system. Note that this time also includes reading the file from disk, so slow IO will significantly affect the duration. This is particularly noticeable when I import an .fbx from an HDD that hasn't been read from recently, so I always do a few warmup parses/imports before profiling anything. I also disable the `Image Search` option in the FBX importer settings because that can be very slow.

~~Parsing takes up about a third of the FBX import time on my system, so these commits reduce import times by roughly 10-15%.~~ Edit: The original profiling here was done with `cProfile`, which appears to add a lot of overhead in this case, so the timings were not accurate and parsing was closer to 16% of the import duration. Additionally, since parsing now waits on IO a lot more, the 50% saving in parsing duration could not be achieved, making the total effect on parsing duration closer to a 3.5% reduction on average.

Author
Member

This may need some more work, or more work in a separate PR, because decompressing the arrays on a separate thread appears to result in the main thread waiting on IO a lot more, so the 50% parsing-time saving becomes more like 35%.

Additionally, it looks like `cProfile` may be introducing additional overhead within the `parse` function specifically, which causes it to report `parse` as taking a larger percentage of the total import duration than it actually does. I've profiled full imports, as well as just `parse`, with `timeit`/`time.perf_counter`, and this leads me to believe that `parse`'s share of the total import duration is closer to 16% than 33%.

16% of 35% gives an expected average import duration reduction of 5.6%, but from timing full imports before and after this patch I'm only getting about 3.5% on average.

The question, then, is whether this additional code/complexity is worth a 3.5% time saving.
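The profiling approach described above can be illustrated with a minimal sketch. This is not the importer's code; `parse` here is a placeholder workload. Wrapping the call in `time.perf_counter` measures wall time without the per-call overhead that `cProfile` can add to hot functions.

```python
import time

def parse():
    # Placeholder standing in for the FBX parse function.
    return sum(i * i for i in range(100_000))

start = time.perf_counter()
parse()
elapsed = time.perf_counter() - start
print(f"parse took {elapsed:.6f}s")
```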

Thomas Barlow force-pushed fbx_parse_multithread_pr from a7d3a03983 to 00c58e0d91 2023-08-19 03:58:04 +02:00 Compare
Thomas Barlow added 1 commit 2023-09-02 13:44:01 +02:00
Since the expected decompressed size is known ahead of time, the output
buffer can be created at the full size in advance, reducing the number
of memory allocations while decompressing.
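FBX records store the uncompressed length of each array alongside the compressed payload, and `zlib.decompress` accepts a `bufsize` argument that sets the initial size of its output buffer. A hedged sketch of the optimization this commit describes (the variable names are illustrative, not the add-on's):

```python
import zlib

raw = b"\x00\x01\x02\x03" * 2_500          # stand-in array payload
compressed = zlib.compress(raw)

# In the importer, the expected size comes from the FBX array record.
expected_size = len(raw)

# Passing bufsize allocates the output buffer at full size up front,
# avoiding repeated buffer growth during decompression.
out = zlib.decompress(compressed, bufsize=expected_size)
```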
Thomas Barlow added 1 commit 2023-09-18 01:43:01 +02:00
Thomas Barlow added 2 commits 2023-12-03 03:18:24 +01:00
Thomas Barlow added 1 commit 2023-12-18 02:44:32 +01:00
Thomas Barlow changed title from WIP: Speed up FBX parsing: Multithread array decompression to FBX IO: Speed up parsing by multithreading array decompression 2023-12-18 03:18:42 +01:00
Thomas Barlow requested review from Bastien Montagne 2023-12-22 15:28:07 +01:00
Bastien Montagne approved these changes 2024-01-10 17:54:44 +01:00
Bastien Montagne left a comment
Owner

LGTM.

Thomas Barlow added 1 commit 2024-01-12 21:30:52 +01:00
Thomas Barlow merged commit 502e097fdb into main 2024-01-12 21:32:36 +01:00
Thomas Barlow deleted branch fbx_parse_multithread_pr 2024-01-12 21:32:37 +01:00
Reference: blender/blender-addons#104739