FBX IO: Speed up parsing by multithreading array decompression #104739

Merged
Thomas Barlow merged 7 commits from Mysteryem/blender-addons:fbx_parse_multithread_pr into main 2024-01-12 21:32:36 +01:00
Member

Because zlib.decompress releases the GIL, the arrays are now
decompressed on separate threads.

Given enough logical CPUs on the current system, decompressing arrays
and parsing the rest of the file is now done simultaneously.

This uses the threading utils recently added in fbx_utils_threading.
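The add-on's actual `fbx_utils_threading` helpers aren't shown in this PR description, but the core idea can be sketched with the standard library alone. This is a hedged illustration, not the importer's real code: because `zlib.decompress` is a C call that releases the GIL, submitting each compressed array payload to a worker thread lets decompression overlap with the main thread's parsing.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def decompress_arrays(compressed_payloads):
    """Decompress FBX array payloads on worker threads.

    While the workers run, the main thread is free to keep reading and
    parsing the rest of the file, since zlib.decompress releases the GIL.
    """
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(zlib.decompress, data)
                   for data in compressed_payloads]
        return [f.result() for f in futures]

# Stand-in payloads for demonstration; real ones come from the .fbx file.
payloads = [zlib.compress(bytes(range(256)) * 64) for _ in range(4)]
arrays = decompress_arrays(payloads)
```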

Aside from .fbx files without any compressed arrays, array decompression
usually takes just under 50% of the parsing duration on average, though
it commonly varies between 40% and 60% depending on the contents of the file.

However, multithreading array decompression leads to the main thread
reading from the file and waiting on IO more often, so I was only able
to get an average 35% reduction in parsing duration. Because it is
waiting on IO, this is likely to vary depending on the file system
being read from, and IO times are expected to be even longer in real
use cases because the file being read won't have been accessed recently.

For the smallest files, e.g. a single cube mesh, this can be slightly
slower, because starting a new thread costs more time than that thread
saves.

Parsing .fbx files takes around 16% of the total import duration on
average, so the overall import duration would be expected to drop by
about 5.6% on average. However, from timing imports before and after
this patch, I get an actual average reduction of 3.5%.

Because the main thread spends some time waiting on IO, even systems
with a single CPU can see a tiny speedup from this patch. I get about a
6% reduction in parsing duration in this case, which would be less than
1% of the total import duration.


This patch depends on #105017, which is not included in this PR.

Altogether, these commits slightly less than halve the average duration of parsing FBX files on my system. Note that this time also includes reading the file from disk, so slow IO will significantly affect the duration. This is particularly noticeable when I import an .fbx from an HDD that hasn't been read from recently, so I always do a few warmup parses/imports before profiling anything. I also disable the `Image Search` option in the FBX importer settings because that can be very slow.

~~Parsing takes up about a third of the FBX import time on my system, so these commits reduce import times by roughly 10-15%.~~ Edit: The original profiling here was done with `cProfile`, which appears to add a lot of overhead in this case, so the timings were not accurate and parsing was closer to 16% of the import duration. Additionally, since parsing now waits on IO a lot more, the 50% saving in parsing duration could not be achieved, making the total effect on parsing duration closer to a 3.5% reduction on average.

Author
Member

This may need some more work, or more work in a separate PR, because decompressing the arrays on a separate thread appears to result in the main thread waiting on IO a lot more, so the 50% parsing-time saving becomes more like 35%.

Additionally, it looks like `cProfile` may be introducing additional overhead within the `parse` function specifically, which causes it to report `parse` as taking a larger percentage of the total import duration than it actually does. I've profiled full imports, as well as just `parse`, with `timeit`/`time.perf_counter`, and this leads me to believe that `parse`'s share of the total import duration is closer to 16% than 33%.

16% of 35% gives an expected average import duration reduction of 5.6%, but from timing full imports before and after this patch I'm only getting about 3.5% on average.

The question, then, is whether this additional code/complexity is worth a 3.5% time saving.
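The profiling approach described above can be illustrated with a minimal sketch. This is not the importer's code; `parse` here is a placeholder workload. Wrapping the call in `time.perf_counter` measures wall time without the per-call overhead that `cProfile` can add to hot functions.

```python
import time

def parse():
    # Placeholder standing in for the FBX parse function.
    return sum(i * i for i in range(100_000))

start = time.perf_counter()
parse()
elapsed = time.perf_counter() - start
print(f"parse took {elapsed:.6f}s")
```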

Thomas Barlow force-pushed fbx_parse_multithread_pr from a7d3a03983 to 00c58e0d91 2023-08-19 03:58:04 +02:00 Compare
Thomas Barlow added 1 commit 2023-09-02 13:44:01 +02:00
Since the expected decompressed size is known ahead of time, the output
buffer can be created at the full size in advance, reducing the number
of memory allocations while decompressing.
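FBX records store the uncompressed length of each array alongside the compressed payload, and `zlib.decompress` accepts a `bufsize` argument that sets the initial size of its output buffer. A hedged sketch of the optimization this commit describes (the variable names are illustrative, not the add-on's):

```python
import zlib

raw = b"\x00\x01\x02\x03" * 2_500          # stand-in array payload
compressed = zlib.compress(raw)

# In the importer, the expected size comes from the FBX array record.
expected_size = len(raw)

# Passing bufsize allocates the output buffer at full size up front,
# avoiding repeated buffer growth during decompression.
out = zlib.decompress(compressed, bufsize=expected_size)
```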
Thomas Barlow added 1 commit 2023-09-18 01:43:01 +02:00
Thomas Barlow added 2 commits 2023-12-03 03:18:24 +01:00
Thomas Barlow added 1 commit 2023-12-18 02:44:32 +01:00
Thomas Barlow changed title from WIP: Speed up FBX parsing: Multithread array decompression to FBX IO: Speed up parsing by multithreading array decompression 2023-12-18 03:18:42 +01:00
Thomas Barlow requested review from Bastien Montagne 2023-12-22 15:28:07 +01:00
Bastien Montagne approved these changes 2024-01-10 17:54:44 +01:00
Bastien Montagne left a comment
Owner

LGTM.

Thomas Barlow added 1 commit 2024-01-12 21:30:52 +01:00
Thomas Barlow merged commit 502e097fdb into main 2024-01-12 21:32:36 +01:00
Thomas Barlow deleted branch fbx_parse_multithread_pr 2024-01-12 21:32:37 +01:00
Reference: blender/blender-addons#104739