BLI: Improve IndexMask::complement() performance #108331

Hans Goudey · 2023-05-26T21:22:51+02:00

Hans Goudey commented

2023-05-26 21:22:51 +02:00

IndexMask::complement() is often used in geometry processing
algorithms when a selection needs to be inverted, so far mostly in
curves code.

Instead of reusing from_predicate with a lookup in the source mask,
scan the mask once, inserting segments between the original indices.

Theoretically this improves the performance from O(N*log(N)) to O(N).
Because the former has a small constant factor, the measured speedup
is generally only 3-4x. However, in some special cases the
new code takes constant time.
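
As a rough illustration of the single-scan idea, here is a minimal sketch that walks a sorted list of indices once and emits the gaps between them as ranges. It is not the code from this PR: `Range` and `complement_ranges` are hypothetical names, and the real implementation works on `IndexMaskSegment`s with 16-bit offsets rather than a flat vector.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

/* Hypothetical stand-in for a contiguous run of indices (not Blender's IndexRange). */
struct Range {
  int64_t start;
  int64_t size;
};

/* Return the gaps of `indices` inside [universe_start, universe_start + universe_size).
 * Assumes `indices` is sorted, unique, and contained in the universe. */
std::vector<Range> complement_ranges(const std::vector<int64_t> &indices,
                                     const int64_t universe_start,
                                     const int64_t universe_size)
{
  std::vector<Range> gaps;
  int64_t next_expected = universe_start;
  for (const int64_t index : indices) {
    assert(index >= next_expected); /* Sorted and unique. */
    if (index > next_expected) {
      /* Everything between the previous index and this one is missing from the mask. */
      gaps.push_back({next_expected, index - next_expected});
    }
    next_expected = index + 1;
  }
  const int64_t universe_end = universe_start + universe_size;
  if (next_expected < universe_end) {
    /* Trailing gap after the last index. */
    gaps.push_back({next_expected, universe_end - next_expected});
  }
  return gaps;
}
```

Each index is visited exactly once, which is where the O(N) bound comes from; no per-index lookup into the source mask is needed.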

![image](/attachments/d2f6b0be-f195-4206-9bf4-c0ab20041d1b)

👍 1

Hans Goudey added 1 commit 2023-05-26 21:23:01 +02:00

339cf787c2 WIP: BLI: Improve IndexMask::complement() performance

Instead of reusing `from_predicate` and lookup in the source mask,
scan the mask once, inserting segments between the original indices.

Theoretically this improves the performance from O(N*log(N)) to O(N).
But with the small constant offset of the former, the improvement won't
be that nice.

TODO:
- More performance testing. I didn't see much change in the test code runtime.

Hans Goudey added this to the Core Libraries project 2023-05-26 21:23:39 +02:00

Hans Goudey requested review from Jacques Lucke 2023-05-26 21:23:54 +02:00

Jacques Lucke requested changes 2023-05-27 08:36:22 +02:00

source/blender/blenlib/intern/index_mask.cc Outdated

						
				@ -333,0 +337,4 @@

				static void inverted_indices_to_segments(const IndexMaskSegment segment,

				                                         const int64_t range_threshold,

				                                         LinearAllocator<> &allocator,

				                                         Vector<IndexMaskSegment, 16> &segments)

Jacques Lucke commented

2023-05-27 08:26:15 +02:00

`r_segments`

HooglyBoogly marked this conversation as resolved

source/blender/blenlib/intern/index_mask.cc

						
				@ -333,0 +361,4 @@

				  Span<int16_t> indices = segment.base_span();

				  while (indices.size() > 1) {

				    const int64_t size_before_gap = unique_sorted_indices::find_size_of_next_range(indices);

Jacques Lucke commented

2023-05-27 08:28:53 +02:00

Doing this logarithmic range-size-search for potentially every index is not efficient. It may be possible to improve performance of `find_size_of_next_range` for small ranges.

Hans Goudey commented

2023-05-31 01:21:09 +02:00

I did some experimenting with this and ended up specializing it for single indices. I'm sure there are more possibilities here for the future too!
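
To make the trade-off concrete, here is a hedged sketch of a range-size search with a fast path for single indices. `size_of_leading_range` is a hypothetical helper, not `unique_sorted_indices::find_size_of_next_range` and not necessarily what the follow-up commit does; it only relies on the fact that in a sorted, unique array the first `n` values are consecutive exactly when the last of them equals the first plus `n - 1`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

/* Hypothetical helper: how many elements starting at `begin` form a run of
 * consecutive values. Assumes `indices` is sorted and unique. */
int64_t size_of_leading_range(const std::vector<int16_t> &indices, const int64_t begin)
{
  const int64_t size = int64_t(indices.size());
  assert(begin >= 0 && begin < size);
  const int64_t first = indices[begin];
  /* Fast path: the next value is missing or not adjacent, so the run is a single index. */
  if (begin + 1 == size || indices[begin + 1] != first + 1) {
    return 1;
  }
  /* Binary search for the largest run length n with indices[begin + n - 1] == first + n - 1. */
  int64_t low = 2;             /* Already known to be consecutive. */
  int64_t high = size - begin; /* The run cannot extend past the end of the array. */
  while (low < high) {
    const int64_t mid = low + (high - low + 1) / 2;
    if (indices[begin + mid - 1] - first == mid - 1) {
      low = mid;
    }
    else {
      high = mid - 1;
    }
  }
  return low;
}
```

With the fast path, masks made mostly of isolated indices pay one comparison per index instead of a full logarithmic search.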

source/blender/blenlib/intern/index_mask.cc

						
				@ -333,0 +375,4 @@

				    }

				    else {

				      for (const int64_t i : IndexRange(gap_size)) {

				        add_index(gap_first + int16_t(i));

Jacques Lucke commented

2023-05-27 08:30:20 +02:00

Add indices "at once" instead of one by one, essentially increasing `inverted_indices_count` only once.

Hans Goudey commented

2023-05-30 23:10:19 +02:00

This didn't seem to change the performance, but I did it anyway just in case; it is also a bit clearer.
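
For illustration only, one way to fill a whole gap in a single step is to grow the buffer once and write the consecutive values with `std::iota`. `append_gap` is a hypothetical helper and the `std::vector` buffer is an assumption; the PR itself writes into allocator-owned memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

/* Hypothetical helper: append gap_first, gap_first + 1, ... (gap_size values) to `buffer`.
 * The size of the buffer is updated once instead of once per index. */
void append_gap(std::vector<int16_t> &buffer, const int16_t gap_first, const int64_t gap_size)
{
  assert(gap_size >= 0);
  const std::size_t old_size = buffer.size();
  buffer.resize(old_size + std::size_t(gap_size));
  std::iota(buffer.begin() + std::ptrdiff_t(old_size), buffer.end(), gap_first);
}
```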

HooglyBoogly marked this conversation as resolved

source/blender/blenlib/intern/index_mask.cc Outdated

						
				@ -333,0 +400,4 @@

				  if (!this->to_range()) {

				    const int64_t segments_num = this->segments_num();

				    ParallelSegmentsCollector segments_collector;

				    threading::parallel_for(

Jacques Lucke commented

2023-05-27 08:31:53 +02:00

There should be a separate code path that does not use `EnumerableThreadSpecific`.

HooglyBoogly marked this conversation as resolved

source/blender/blenlib/intern/index_mask.cc Outdated

						
				@ -333,0 +401,4 @@

				    const int64_t segments_num = this->segments_num();

				    ParallelSegmentsCollector segments_collector;

				    threading::parallel_for(

				        IndexRange(segments_num).drop_back(1), 512, [&](const IndexRange range) {

Jacques Lucke commented

2023-05-27 08:35:42 +02:00

Processing 512 segments at once is likely too much in practice and causes the algorithm to be single threaded in too many cases. Generally it's hard to find a good grain size with these algorithms here, because the time per segment can vary wildly, but 512 is still too much.
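
One possible way to pick a grain size dynamically is to aim for a roughly constant amount of work per task rather than a fixed number of segments. The sketch below is purely hypothetical: the function name, the target of 16384 indices per task, and the clamp bounds are all assumptions and are not taken from the merged code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

/* Hypothetical heuristic: choose how many segments one task should process so that
 * each task handles roughly `target_indices_per_task` indices. */
int64_t segments_grain_size(const int64_t total_indices, const int64_t segments_num)
{
  assert(total_indices >= 0 && segments_num > 0);
  constexpr int64_t target_indices_per_task = 16384; /* Assumed tuning constant. */
  const int64_t average_segment_size = std::max<int64_t>(1, total_indices / segments_num);
  /* Large segments get a small grain size; never exceed the original 512. */
  return std::clamp<int64_t>(target_indices_per_task / average_segment_size, 1, 512);
}
```

A heuristic like this still cannot account for segments whose cost differs wildly from the average, which is the difficulty the review comment points out.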

HooglyBoogly marked this conversation as resolved

Hans Goudey added 11 commits 2023-05-31 01:19:23 +02:00

da708df74b Cleanup: Simplofy use of modifier eval context flags

795c002f1a Merge branch 'main' into index-mask-complement-performance

38a061824b Make performance test

f890f8e0c4 Improve normal test

2e40248801 Merge branch 'main' into index-mask-complement-performance

d2b9825f0e Make performance test slower

83713fef3d Add r_ prefix

9a6ccb3770 Add multiple indices at the same time

c2b75d0e90 Use a dynamic grain size

edbe13209e Add a separate code path for small masks

64d31f1b65 Slightly specialize get_size_before_gap

Hans Goudey requested review from Jacques Lucke 2023-05-31 01:24:21 +02:00

Hans Goudey changed title from ~~WIP: BLI: Improve IndexMask::complement() performance~~ to BLI: Improve IndexMask::complement() performance

2023-05-31 01:24:33 +02:00

Jacques Lucke commented

2023-05-31 09:09:04 +02:00

Looks good. There are a few more cases where the algorithm can become O(1) instead of O(n), mainly when the output is a single range. In this case no new memory has to be allocated either. It would be nice if you could add some tests for these special cases as well. It might also be good to put "fuzzy" into the names of tests that use random numbers, and to have some tests that don't rely on random numbers.

```cpp
if (universe.is_empty()) {
  return {};
}
const std::optional<IndexRange> this_range = this->to_range();
const bool this_is_range = this_range.has_value();
if (this_is_range) {
  const bool first_in_range = this_range->first() <= universe.first();
  const bool last_in_range = this_range->last() >= universe.last();
  if (first_in_range && last_in_range) {
    /* This mask fills the entire universe, so the complement is empty. */
    return {};
  }
  if (first_in_range) {
    /* This mask is a range that contains the start of the universe. The complement is a range
     * that contains the end of the universe. */
    const int64_t complement_start = this_range->one_after_last();
    const int64_t complement_size = universe.one_after_last() - complement_start;
    return IndexRange(complement_start, complement_size);
  }
  if (last_in_range) {
    /* This mask is a range that contains the end of the universe. The complement is a range
     * that contains the start of the universe. */
    const int64_t complement_start = universe.first();
    const int64_t complement_size = this_range->first() - complement_start;
    return IndexRange(complement_start, complement_size);
  }
}
```
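
The special cases above can be exercised without random numbers. The following is a self-contained sketch under stated assumptions: `Range` is a hypothetical stand-in for `IndexRange`, `complement_if_single_range` is an illustrative name, and the mask is assumed to be a non-empty range that overlaps the universe. It is not the test code added in this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

/* Hypothetical stand-in for Blender's IndexRange. */
struct Range {
  int64_t start = 0;
  int64_t size = 0;
  int64_t first() const { return start; }
  int64_t last() const { return start + size - 1; }
  int64_t one_after_last() const { return start + size; }
  bool is_empty() const { return size == 0; }
};

/* Returns the complement when it is a single (possibly empty) range and can be
 * computed in O(1); returns std::nullopt when the general algorithm is needed.
 * Assumes `mask` is non-empty and overlaps `universe`. */
std::optional<Range> complement_if_single_range(const Range mask, const Range universe)
{
  assert(!mask.is_empty());
  if (universe.is_empty()) {
    return Range{};
  }
  const bool first_in_range = mask.first() <= universe.first();
  const bool last_in_range = mask.last() >= universe.last();
  if (first_in_range && last_in_range) {
    return Range{}; /* The mask fills the universe. */
  }
  if (first_in_range) {
    const int64_t complement_start = mask.one_after_last();
    return Range{complement_start, universe.one_after_last() - complement_start};
  }
  if (last_in_range) {
    const int64_t complement_start = universe.first();
    return Range{complement_start, mask.first() - complement_start};
  }
  return std::nullopt; /* The mask sits in the middle, so the complement has two parts. */
}
```

Deterministic tests would then cover one mask per branch: a full mask, a prefix, a suffix, and a mask in the middle of the universe.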

Jacques Lucke approved these changes 2023-05-31 09:28:50 +02:00

Hans Goudey added 3 commits 2023-05-31 17:07:19 +02:00

fde54ae1a9 Merge branch 'main' into index-mask-complement-performance

7f1433764c Add more O(1) checks from Jacques

19428c4b6f Add some special case tests, remove performance test

Hans Goudey merged commit 49b48209e7 into main

2023-05-31 17:11:11 +02:00

Hans Goudey deleted branch index-mask-complement-performance

2023-05-31 17:11:12 +02:00

Hans Goudey referenced this issue from a commit

2023-05-31 17:11:13 +02:00

BLI: Improve IndexMask::complement() performance