Are there locks down the road in get_elem, perhaps, which get in the way of better parallelism?
It looks like memory access is the bottleneck. read_elem_checked is the function…
No specific reason, but it doesn't make a difference here because there is a single assert per test. I can update it in a later patch for clarity.
Some findings from profiling:
- The multithreaded SAT implementation is 30-40% faster than the single-threaded SAT. On my machine, fullframe is now 3-4x slower than GPU.
- SAT operation is the bottleneck…
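For readers following the profiling notes, here is a minimal sketch of how a summed area table (SAT) can be built in two passes that each parallelize over independent rows or columns. It is not the code from the patch: the names build_sat and box_sum, the row-major layout, and the double accumulator are all assumptions made for illustration. The OpenMP pragmas are optional, and the code gives the same result when they are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the patch): builds an inclusive SAT of a
// row-major width x height image, so that
// sat[y * width + x] == sum of input[j * width + i] for all i <= x, j <= y.
std::vector<double> build_sat(const std::vector<float> &input, int width, int height)
{
  assert(input.size() == static_cast<std::size_t>(width) * height);
  std::vector<double> sat(input.size());

  // Pass 1: running sum along each row. Rows do not depend on each other.
#pragma omp parallel for
  for (int y = 0; y < height; y++) {
    double running = 0.0;
    for (int x = 0; x < width; x++) {
      const std::size_t i = static_cast<std::size_t>(y) * width + x;
      running += input[i];
      sat[i] = running;
    }
  }

  // Pass 2: running sum down each column. Columns do not depend on each other,
  // but each step jumps a full row ahead in memory (strided access).
#pragma omp parallel for
  for (int x = 0; x < width; x++) {
    for (int y = 1; y < height; y++) {
      const std::size_t i = static_cast<std::size_t>(y) * width + x;
      sat[i] += sat[i - width];
    }
  }
  return sat;
}

// Hypothetical helper: sum of the input over the inclusive rectangle
// [x0, x1] x [y0, y1], using four SAT lookups.
double box_sum(const std::vector<double> &sat, int width, int x0, int y0, int x1, int y1)
{
  // Lookups outside the top or left edge contribute zero.
  auto at = [&](int x, int y) {
    return (x < 0 || y < 0) ? 0.0 : sat[static_cast<std::size_t>(y) * width + x];
  };
  return at(x1, y1) - at(x0 - 1, y1) - at(x1, y0 - 1) + at(x0 - 1, y0 - 1);
}
```

With a 3x2 image holding 1..6, build_sat(img, 3, 2) ends in 21 at the bottom-right corner, and box_sum(sat, 3, 1, 0, 2, 1) returns 16 (2 + 3 + 5 + 6). The strided column pass is one plausible place for memory access to dominate in a layout like this, though whether that matches the profiled code is an assumption.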
Using TBB was not much faster than OpenMP, but multithreading in general helped reduce the error (see updated images in description).
As agreed, I removed the fast
. The filter still…
As discussed, I tried this and it didn't work. The reason is that small differences of 0.2-0.3 (2x-3x the actual mean) cause very large differences in the squared SAT, making it asymmetric and…
As discussed in the meeting, my concern was using SingleThreadedOperation for a multi-threaded execution. I will upload a patch using TBB.