Since the expected decompressed size is known ahead of time, the output
buffer can be created at the full size in advance, reducing the number
of memory allocations while decompressing.
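As a minimal sketch (the real patch reads the expected size from the FBX array header; the names here are illustrative), `zlib.decompress` accepts a `bufsize` argument that sets the initial size of the output buffer:

```python
import zlib

# Hypothetical stand-in for a compressed FBX array property; in the
# real file the uncompressed byte length is stored alongside the data.
compressed = zlib.compress(b"\x00" * 4096)
expected_size = 4096  # would be read from the array header

# Passing bufsize lets zlib allocate the output buffer at the full
# size up front instead of growing it repeatedly while inflating.
decompressed = zlib.decompress(compressed, bufsize=expected_size)
```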
Because `zlib.decompress` releases the GIL, the arrays are now
decompressed on separate threads.
Given enough logical CPUs on the current system, decompressing arrays
and parsing the rest of the file is now done simultaneously.
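The idea can be illustrated with a small sketch (not the patch's actual code): because `zlib.decompress` releases the GIL while inflating, submitting each compressed array to a worker thread lets decompression proceed in parallel with whatever the main thread does next.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Illustrative data: several compressed arrays as they might appear
# in an .fbx file's binary array properties.
arrays = [zlib.compress(bytes(range(256)) * 64) for _ in range(8)]

with ThreadPoolExecutor() as executor:
    # Each decompression runs on a worker thread; zlib releases the
    # GIL during inflation, so the workers run genuinely in parallel.
    futures = [executor.submit(zlib.decompress, data) for data in arrays]
    # The main thread could keep parsing the rest of the file here.
    results = [future.result() for future in futures]
```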
All the functions for managing the multithreading have been encapsulated
in a helper class that is exposed through a context manager.

If the current platform does not support multithreading
(wasm32-emscripten/wasm32-wasi), then the code falls back to being
single-threaded.
Aside from .fbx files without any compressed arrays, array decompression
typically accounts for just under 50% of the parsing duration on average,
though it commonly varies between 40% and 60% depending on the contents
of the file.
I was only able to get an average 35% reduction in parsing duration
because the main thread ends up reading from the file more often and
appears to spend more time waiting on IO than before. This is likely
to vary depending on the file system being read from, and in real use
cases the time spent on IO is expected to be longer, because the file
being imported won't have been accessed recently and so is less likely
to be in the OS file cache.
For the smallest files, e.g. a single cube mesh, this can be slightly
slower because the overhead of starting a new thread exceeds the time
saved by decompressing on that thread.
Because the main thread spends some time waiting on IO, even systems
with a single CPU can see a small speedup from this patch. I get about a
6% reduction in parsing duration in this case.
Parsing .fbx files takes about 16% of the total import duration on
average, so the overall import duration would be expected to drop by
about 5.6% (35% of 16%) on average. However, from timing imports before
and after this patch, I measured an actual average reduction of 3.5%.