It was caused by image threading safe commit and it was noticeable
only on really multi-core CPU (like dual-socket Xeon stations), was
not visible on core i7 machine.
The reason of slowdown was spinlock around image buffer referencing,
which lead to lots of cores waiting for single core and using image
buffer after it was referenced was not so much longer than doing
reference itself.
The most clear solution here seemed to be introducing Image Pool
which will contain list of loaded and referenced image buffers, so
all threads could skip lock if the pool is used for reading only.
Lock only needed in cases when buffer for requested image user is
missing in the pool. This lock will happen only once per image so
overall amount of locks is much less that it was before.
To operate with pool:
- BKE_image_pool_new() creates new pool
- BKE_image_pool_free() destroys pool and dereferences all image
buffers which were loaded to it
- BKE_image_pool_acquire_ibuf() returns image buffer for given
image and user. Pool could be NULL and in this case fallback to
BKE_image_acquire_ibuf will happen.
This helps to avoid lots to if(poll) checks in image sampling
code.
- BKE_image_pool_release_ibuf releases image buffer. In fact, it
will only do something if pool is NULL, in all other case it'll
equal to DoNothing operation.
Documentation & Test blend files:
------------------
http://wiki.blender.org/index.php/User:MiikaH/GSoC-2012-Smoke-Simulator-Improvements
Credits:
------------------
Miika Hamalainen (MiikaH): Student / Main programmer
Daniel Genrich (Genscher): Mentor / Programmer of merged patches from Smoke2 branch
Google: For Google Summer of Code 2012
* Fluid compilation: Inverse the compile flag from DISABLE_ELBEEM to WITH_MOD_FLUID for consistency. (scons/cmake)
* Use WITH_BF_FLUID in your user config (scons)
* Add support for scons to disable build with Decimate and Boolean modifier.
(WITH_BF_DECIMATE and WITH_BF_BOOLEAN)
* where_is_object_time was called for every effector evaluation only to determine the object velocity in some rare cases.
* Calculating the effector velocity is now done in the effector precalculation stage.
* Removing this makes the code thread safe and also should give some nice performance boosts when simulating a lot of points.
* Thanks to MiikaH for noticing this problem.