This change solves a bottleneck which was caused by attempt to cache
postprocessed search areas used for tracking. It was a single cache
used by all threads, which required to have some synchronization
mechanism. This synchronization turned out to be making all threads
to idle while one thread is accessing the cache. The access was not
cheap, so the multi-threading did not provide expected speedup.
Current solution is to remove the cache of search areas. This avoids
any threading synchronization overhead because there is no need for
it anymore. The downside is that for certain configurations tracking
became slower when comparing to master branch. There is no expected
slowdown compared to 2.91 release.
The slowdown is mainly experienced when using big search area and
keyframe matching strategy. Other cases should still be within a
ballpark of performance of single-threaded code prior to this change.
The reason why is it so is because while this change makes it so the
image accessors needs to process images multiple times the complexity
of this process is almost the same as all the overhead for the cache
lookup and maintenance.
Here are Some numbers gained on different configurations.
CPU: Intel Xeom CPU E5-2699 v4
OS: Linux
Footage: Old_Factory MVI_4005.mov from the first part of Track Match
Blend training which can be found on the Blender Cloud.
Tracking 443 markers across 250 frames. The unit is seconds.
File: F9433209
2.91: 401.520874
before: 358.650055
after: 14.966302
Tracking single marker across 250 frames. The unit is seconds.
File: F9433211
2.91 before after
Big keyframe 1.307203 1.005324 1.227300
Big previous frame 1.144055 0.881139 0.944044
Small keyframe 0.434015 0.197760 0.224982
Small previous frame 0.463207 0.218058 0.234172
All at once 2.338268 1.481220 1.518060
The idea is to avoid any synchronization needed in the worker threads
and make them to operate on a local data. From implementation detail
this is achieved by keeping track of "wavefront" of markers which are
to be tracked and the tracking result. Insertion of results to the
AutoTrack context happens from main thread, which avoids need in the
lock when accessing AutoTrack.
This change makes tracking of many (300+) about 10% faster on the
Xeon) CPU E5-2699 v4. More speedup will be gained by minimizing
threading overhead in the frame cache.
Another important aspect of this change is that it fixes non-thread
safe access which was often causing crashes. Quite surprising the
crash was never reported.
This is something not-so-trivial to see from just reading code, which
shown an important of proper comments and clear naming.
Main source of confusion was that it is not immediately clear that
AutoTrack context is to see all tracks, but tracking to only operate on
selected ones.
Mainly affects for() loops.
The reason why loop parameter was declared outside of the loop roots
back to the times when not all compilers supported C99.
BF-admins agree to remove header information that isn't useful,
to reduce noise.
- BEGIN/END license blocks
Developers should add non license comments as separate comment blocks.
No need for separator text.
- Contributors
This is often invalid, outdated or misleading
especially when splitting files.
It's more useful to git-blame to find out who has developed the code.
See P901 for script to perform these edits.
Now all the fine-tuning is happening using parallel range settings structure,
which avoid passing long lists of arguments, allows extend fine-tuning further,
avoid having lots of various functions which basically does the same thing.
Pretty straightforward this time, we already have a single struct
pointer containing all needed data (or nearly).
And we gain about 10-15% speed on tracking! :)
This feature got lost with new auto-track API,
Added it back by extending frame accessor class. This isn't really
a frame thing, but we don't have other type of accessor here.
Surely, we can use old-style API here and pass mask via region
tracker options for this particular case, but then it becomes much
less obvious how real auto-tracker will access this mask with old
style API.
So seems we do need an accessor for such data, just matter of
finding better place than frame accessor.
There was some stupidness in the way how tracks are synchronized from the job
to actual DNA data leading to all sort of weird and wonderful failures again.
Writing to the tracks was already inside the lock section, but
reading was not. This could have lead to race condition leading
to all sorts of weird and wonderful artifacts.
This was never ported to a new tracking pipeline and now it's done using
FrameAccessor::Transform routines. Quite striaghtforward, but i've changed
order of grayscale conversion in blender side with call of transform callback.
This way it's much easier to perform rescaling in libmv side.
The title actually tells it all, this commit switches Blender to use the new
autotrack API from Libmv.
From the user point of view it means that prediction model is now used when
tracking which gives really nice results.
All the other changes are not really visible for users, those are just frame
accessors, caches and so for the new API.