BLI: refactor IndexMask for better performance and memory usage #104629

Jacques Lucke · 2023-02-11T20:06:29+01:00

Jacques Lucke commented

2023-02-11 20:06:29 +01:00

Goals of this refactor:

* Reduce memory consumption of IndexMask. The old IndexMask uses an int64_t for each index, which is more than necessary in pretty much all practical cases currently. I still wouldn't want to simply reduce the size to int32_t because that could become limiting in the future in case we use this to index e.g. byte buffers larger than a few gigabytes. I also don't want to template IndexMask, because that would cause a split in the "ecosystem", or everything would have to be implemented twice or templated.
* Allow for more multi-threading. The old IndexMask contains a single array. This is generally good but has the problem that it is hard to fill from multiple threads when the final size is not known from the beginning. This is commonly the case when e.g. converting an array of bool to an index mask. Currently, this kind of code only runs on a single thread.
* Allow for efficient set operations like join, intersect and difference. It should be possible to multi-thread those operations.
* It should be possible to iterate over an IndexMask very efficiently. The most important part of that is to avoid all memory access when iterating over contiguous ranges. For some core nodes (e.g. math nodes), we generate optimized code for the cases of irregular index masks and simple index ranges.
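
The multi-threading goal can be illustrated with a minimal sketch of a bool-to-mask conversion that works in independent chunks. All names here (`SketchSegment`, `sketch_segment_from_bools`, `sketch_from_bools`) are hypothetical and not the actual BLI API, and the loop is written serially; the point is only that each chunk writes to its own output slot, so the final size does not need to be known up front.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical segment type: a 64-bit offset plus 16-bit indices relative to it.
struct SketchSegment {
  int64_t offset;
  std::vector<int16_t> local_indices;
};

constexpr int64_t sketch_chunk_size = 16384;

// Converts a single chunk. It only reads its own part of the input and only
// writes its own result, so different chunks could run on different threads.
inline SketchSegment sketch_segment_from_bools(const std::vector<bool> &bools,
                                               const int64_t chunk_start)
{
  assert(chunk_start >= 0 && chunk_start <= static_cast<int64_t>(bools.size()));
  const int64_t chunk_end = std::min<int64_t>(chunk_start + sketch_chunk_size,
                                              static_cast<int64_t>(bools.size()));
  SketchSegment segment{chunk_start, {}};
  for (int64_t i = chunk_start; i < chunk_end; i++) {
    if (bools[static_cast<std::size_t>(i)]) {
      segment.local_indices.push_back(static_cast<int16_t>(i - chunk_start));
    }
  }
  return segment;
}

inline std::vector<SketchSegment> sketch_from_bools(const std::vector<bool> &bools)
{
  const int64_t chunks_num = (static_cast<int64_t>(bools.size()) + sketch_chunk_size - 1) /
                             sketch_chunk_size;
  std::vector<SketchSegment> segments;
  segments.resize(static_cast<std::size_t>(chunks_num));
  // Serial here; every iteration is independent, so a parallel-for could replace this loop.
  for (int64_t chunk_i = 0; chunk_i < chunks_num; chunk_i++) {
    segments[static_cast<std::size_t>(chunk_i)] = sketch_segment_from_bools(
        bools, chunk_i * sketch_chunk_size);
  }
  // Drop chunks that did not contain any true value.
  segments.erase(std::remove_if(segments.begin(),
                                segments.end(),
                                [](const SketchSegment &segment) {
                                  return segment.local_indices.empty();
                                }),
                 segments.end());
  return segments;
}
```

Only the final removal of empty chunks is a serial step in this sketch, and it touches one entry per chunk rather than one entry per index.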

To achieve these goals, a few compromises had to be made:

* Slicing of the mask (at specific indices) and random element access is O(log #indices) now, but with a low constant factor. It should be possible to split a mask into n approximately equally sized parts in O(n) though, making the time per split O(1).
* Using range-based for loops does not work well when iterating over a nested data structure like the new IndexMask. Therefore, foreach_* functions with callbacks have to be used. To avoid extra code complexity at the call site, the foreach_* methods support multi-threading out of the box.

The new data structure splits an IndexMask into an arbitrary number of ordered IndexMaskSegment. Each segment can contain at most 2^14 = 16384 indices. The indices within a segment are stored as int16_t. Each segment has an additional int64_t offset which allows storing arbitrary int64_t indices. This approach has the main benefits that segments can be processed/constructed individually on multiple threads without a serial bottleneck. Also it reduces the memory requirements significantly.
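
As a rough illustration of the layout described above, here is a minimal sketch that splits sorted 64-bit indices into segments with a 64-bit offset and 16-bit local indices. The names `SketchSegment`, `sketch_build_segments` and `sketch_global_index` are hypothetical, and using the first index of a segment as its offset is an assumption of this sketch, not necessarily what BLI_index_mask.hh does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one mask segment.
struct SketchSegment {
  int64_t offset;
  std::vector<int16_t> local_indices;  // Sorted, unique, each in [0, 16384).
};

constexpr int64_t sketch_max_segment_size = int64_t(1) << 14;  // 2^14 = 16384

// Splits sorted, unique, non-negative indices into segments whose local values fit in int16_t.
inline std::vector<SketchSegment> sketch_build_segments(const std::vector<int64_t> &indices)
{
  std::vector<SketchSegment> segments;
  for (const int64_t index : indices) {
    assert(index >= 0);
    if (segments.empty() || index - segments.back().offset >= sketch_max_segment_size) {
      // Start a new segment; in this sketch its offset is simply its first index.
      segments.push_back(SketchSegment{index, {}});
    }
    SketchSegment &segment = segments.back();
    segment.local_indices.push_back(static_cast<int16_t>(index - segment.offset));
  }
  return segments;
}

// Reconstructs the full 64-bit index stored at position `i` of a segment.
inline int64_t sketch_global_index(const SketchSegment &segment, const std::size_t i)
{
  assert(i < segment.local_indices.size());
  return segment.offset + segment.local_indices[i];
}
```

Because every local index is smaller than 2^14 and the input is unique, no segment in this sketch can hold more than 16384 indices, while the offset still allows arbitrary int64_t values.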

For more details see comments in BLI_index_mask.hh.

I did a few tests to verify that the data structure generally improves performance and does not cause regressions:

* Our field evaluation benchmarks take about as much time as before. This is to be expected because we already made sure that e.g. add node evaluation is vectorized. The important thing here is to check that changes to the way we iterate over the indices still allow for auto-vectorization.
* Memory usage by a mask is about 1/4 of what it was before in the average case. That's mainly caused by the switch from int64_t to int16_t for indices. In the worst case, the memory requirements can be larger when there are many indices that are very far away from each other. However, when they are far away from each other, that indicates that there aren't many indices in total. In common cases, memory usage can be way lower than 1/4 of before, because sub-ranges use static memory.
* I measured possible performance improvements by benchmarking IndexMask::from_bools in index_mask_from_selection on 10,000,000 elements at various probabilities for true at every index:

Probability of true   Old        New
0                      4.6 ms    0.8 ms
0.001                  5.1 ms    1.3 ms
0.2                    8.4 ms    1.8 ms
0.5                   15.3 ms    3.0 ms
0.8                   20.1 ms    3.0 ms
0.999                 25.1 ms    1.7 ms
1                     13.5 ms    1.1 ms
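
The "about 1/4" memory estimate, and the worst case where the new layout is larger, can be reproduced with a small back-of-the-envelope calculation. The function names are hypothetical, and the per-segment overhead is a free parameter: the 24 bytes used in the note below are an assumption (for example an offset, a pointer and a cumulative size of 8 bytes each), not a number taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Old layout: one int64_t per index.
constexpr int64_t sketch_old_mask_bytes(const int64_t indices_num)
{
  return indices_num * static_cast<int64_t>(sizeof(int64_t));
}

// New layout: one int16_t per index plus some fixed cost per segment.
// `per_segment_overhead_bytes` is an assumed parameter, not a measured value.
constexpr int64_t sketch_new_mask_bytes(const int64_t indices_num,
                                        const int64_t segments_num,
                                        const int64_t per_segment_overhead_bytes)
{
  return indices_num * static_cast<int64_t>(sizeof(int16_t)) +
         segments_num * per_segment_overhead_bytes;
}

// Simple sanity check of the 4:1 ratio between the two index types.
static_assert(sizeof(int64_t) == 4 * sizeof(int16_t), "int16_t is a quarter of int64_t");

inline bool sketch_new_layout_is_smaller(const int64_t indices_num,
                                         const int64_t segments_num,
                                         const int64_t per_segment_overhead_bytes)
{
  assert(indices_num >= 0 && segments_num >= 0);
  return sketch_new_mask_bytes(indices_num, segments_num, per_segment_overhead_bytes) <
         sketch_old_mask_bytes(indices_num);
}
```

Under that assumption, 1,000,000 densely packed indices need 62 segments and 2,001,488 bytes instead of 8,000,000 bytes. If 1,000 indices each end up in their own segment, the estimate is 26,000 bytes instead of 8,000 bytes, which matches the worst case described above.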

Jacques Lucke added 28 commits 2023-02-11 20:06:31 +01:00

d1f2ae5630 progress

11bea3a954 progress

c158664ccd progress

e8a99b3d83 progress

b0c5527df2 progress

98478973aa Merge branch 'main' into index-mask-refactor

169acd0fdb remove old code

38c2b67e69 cleanup

253e6bbf14 progress

14b2ffff6e progress

b257c01127 progress

b476e24d45 Merge branch 'main' into index-mask-refactor

234cddec5f progress

5b9f8cd9f5 progress

fc6a72256a progress

6a77c5b219 progress

9ef2f470c7 Merge branch 'main' into index-mask-refactor

74530c692c cleanup

35ffb58cec progress

1a2e1d1a9c progress

c9c09cb0fb progress

006dbf253f cleanup

de8e0f7032 cleanup

cbf103865e progress

69cb21962d progress

87a29e9b30 progress

d6c3b2250c progress

c43a3bb12e progress

Jacques Lucke added 1 commit 2023-02-11 20:31:41 +01:00

7cecdbdd29 Merge branch 'main' into index-mask-refactor

Jacques Lucke referenced this pull request

2023-02-12 17:30:51 +01:00

BLI: use larger integer type in BitVector #104658

Jacques Lucke referenced this issue from a commit

2023-02-12 18:01:00 +01:00

BLI: use larger integer type in BitVector

Jacques Lucke added 12 commits 2023-02-12 18:02:14 +01:00

dbb98b4a2f Merge branch 'main' into index-mask-refactor

719c43c61a improve naming

1142a996f6 cleanup naming

bf00af9e87 progress

a254fffbfb simplify naming

ebaa70ce2b improve performance

cd37dc7169 cleanup

b83833fcb0 progress

409d526617 progress

3d4ced77ed progress

24600ca65f fixes

5d624e6060 add benchmark

Jacques Lucke referenced this pull request

2023-02-12 23:33:27 +01:00

BLI: new bit span data structure #104671

Jacques Lucke added 2 commits 2023-02-17 00:35:13 +01:00

ffc934d4f5 Merge branch 'main' into index-mask-refactor

f917322fae try two pass algorithm to avoid small allocations

Jacques Lucke added 2 commits 2023-02-17 00:41:24 +01:00

f3f69f6e0e change chunk_offsets to chunk_ids

beda66bec8 rename max_chunk_size to chunk_capacity

Jacques Lucke referenced this issue from a commit

2023-02-17 00:42:55 +01:00

BLI: new bit span data structure

Jacques Lucke added 4 commits 2023-02-17 01:05:45 +01:00

ba9c58664b Merge branch 'main' into index-mask-refactor

67b52758bb rename segment_indices to indices_by_segment

6b6f0b7267 rename segment_sizes_cumulative to cumulative_segment_sizes

528b7a6625 rename chunk_sizes_cumulative to cumulative_chunk_sizes

Jacques Lucke added 4 commits 2023-02-17 01:30:03 +01:00

9633298aa1 fix

14d4da0704 cleanup naming

0d61818a56 move OffsetSpan to separate file

debd1010f8 simplify iteration api

Jacques Lucke added 12 commits 2023-02-17 12:39:57 +01:00

6ae3c2dd61 Merge branch 'main' into index-mask-refactor

427551623f add utility methods

4e3331af46 progress

1e25cbdd30 progress

525911f94a cleanup linear allocator

d9ca4b4c8d cleanup

cc6d44f982 reduce allocations for full chunks

c5a37fb850 early return

de628f0b6d initial bits to mask conversion

be6ef6500a convert index mask to bits

30557cf6f5 add aligned parallel reduce

cc71968af9 progress

Jacques Lucke added 2 commits 2023-02-26 16:34:22 +01:00

58562ba47d Merge branch 'main' into index-mask-refactor

d266e0cf32 fix

Jacques Lucke added 3 commits 2023-02-26 17:49:56 +01:00

3623143b76 Merge branch 'main' into index-mask-refactor

a75f7185d5 add foreach range function

c6eee5569d cleanup

Jacques Lucke added 2 commits 2023-02-26 19:12:47 +01:00

e5352abe63 improve api

6c816d74f4 cleanup

Jacques Lucke added 1 commit 2023-02-26 19:57:49 +01:00

eae30305d8 initial index mask expression api

Jacques Lucke added 1 commit 2023-02-26 20:24:58 +01:00

f052c8bfe3 try using memory resource instead of linear allocator

Jacques Lucke added 1 commit 2023-02-26 20:28:27 +01:00

buildbot/vexp-code-patch-coordinator Build done. Details

e734450226 remove type alias

it didn't always make things shorter but made things more obscure

Jacques Lucke commented

2023-02-26 20:29:19 +01:00

@blender-bot build

Jacques Lucke added 1 commit 2023-02-26 20:55:35 +01:00

buildbot/vexp-code-patch-coordinator Build done. Details

777f30a148 cleanup

Jacques Lucke commented

2023-02-26 20:57:32 +01:00

@blender-bot build

Jacques Lucke added 3 commits 2023-02-27 14:39:17 +01:00

38aa47a05f Merge branch 'main' into index-mask-refactor

34cc0635a4 move foreach_segment to implementation file

2b9c0db965 initial finding of chunks to process

Jacques Lucke added 1 commit 2023-02-27 14:55:00 +01:00

9b5d4a6de6 add utilities

Jacques Lucke added 1 commit 2023-02-28 19:39:39 +01:00

fb9e4d19f6 test other algorithm to determine possible chunks to check

Jacques Lucke added 3 commits 2023-03-05 11:41:06 +01:00

784b05e01f Merge branch 'main' into index-mask-refactor

204b5aee6f evaluate expression on separate chunks

0610294156 cleanup

Jacques Lucke added 1 commit 2023-03-05 11:52:17 +01:00

5edfeada7a remove dependence on memory_resource because it's not available on macos

Jacques Lucke added 1 commit 2023-03-05 12:21:02 +01:00

e13543782c cleanup

Jacques Lucke added 8 commits 2023-03-19 08:10:26 +01:00

d5885ec0ec Merge branch 'main' into index-mask-refactor

98c3b8ce24 support to range conversion

b41ab94bc0 cleanup

6f28b7b13b fix

fca893a314 allow smaller ranges

d10db7dbc0 cleanup memory usage

aa1325b1c7 Merge branch 'main' into index-mask-refactor

f7c9fe26f9 Merge branch 'main' into index-mask-refactor

Jacques Lucke added 31 commits 2023-03-19 08:44:43 +01:00

5d92520aab replace files

b9a4c30c7a add to blender namespace

9048ff9362 replace to_best_mask_type

575688cee0 progress

e9863a387c fix

2f05ca302e progress

38a39d5696 progress

47edca6978 progress

a0752026b5 progress

aa04c9ca3d progress

add5bd9b55 progress

ccba23d7bd progress

590c569c5e progress

da6da24c99 progress

aed0ec5d65 progress

9ac7e53c81 progress

aafe9de9b4 progress

263244d64d cleanup

19d0dea25d progress

2e0364b005 progress

6653a301e2 progress

1d23e61bad progress

8fa05e8c0f progress

a01d444ecd progress

709fc832cf progress

c5a95db125 progress

f8854f6038 Merge branch 'index-mask-refactor' into index-mask-refactor-replace-existing

889802ebf5 fixes

0f536ad07d fix

5ed73fb797 Merge branch 'index-mask-refactor' into index-mask-refactor-replace-existing

f3759712ad fix

Jacques Lucke added 1 commit 2023-03-19 09:05:58 +01:00

cced82dbd5 fix

Jacques Lucke added 3 commits 2023-03-20 18:24:43 +01:00

b0e4c576ca support vectorization again

53847952cb improve multi function eval performance

8a98dc4100 optimize single-value case in from_bools

Jacques Lucke added 3 commits 2023-03-20 19:49:07 +01:00

0a92c6138e implement iteration more efficiently

91bd9d0975 use optimized iteration in cpptype

1a537a578b pass IndexMask by reference instead of by value

Hans Goudey added 2 commits 2023-03-21 12:14:40 +01:00

b32bf9a19d Merge branch 'main'

2fa7c92846 Fix build errors from changed OffsetIndices

Jacques Lucke added 8 commits 2023-03-21 12:20:06 +01:00

d8f3eb4ad5 Merge branch 'main' into index-mask-refactor

6b402457e0 cleanup

137f499520 try speedup foreach_index_optimized

94a25b0a16 pass mask by reference

a8b1e07599 skip trivial construction/destruction again

4c1c1cf6e2 add optimized to ranges and spans conversion

393e7a367f try optimize fast case for converting mask to ranges and spans

f7153dd593 Merge branch 'index-mask-refactor' of projects.blender.org:JacquesLucke/blender into index-mask-refactor

Jacques Lucke added 1 commit 2023-03-21 12:51:02 +01:00

b2b441adb6 optimize IndexMask::slice_and_offset for the range case

Jacques Lucke added 1 commit 2023-03-22 11:40:16 +01:00

fd54a0990d Merge branch 'main' into index-mask-refactor

Jacques Lucke added this to the Nodes & Physics project 2023-03-22 12:35:08 +01:00

Hans Goudey added 2 commits 2023-03-22 23:10:58 +01:00

a220ed27fa Merge branch 'main' into index-mask-refactor

b1c7edd413 Fix build error

Hans Goudey added 4 commits 2023-03-25 18:31:40 +01:00

4d1f081b21 Merge branch 'main' into index-mask-refactor

1a4419264f Move extraction of indices from set to a separate function

cb6138a15b Use Array instead of Vector

a3643d5817 Extract stripping of empty chunks and mask creation to separate function

Hans Goudey added 1 commit 2023-03-29 19:55:17 +02:00

12845c9a6b Merge branch 'main' into index-mask-refactor

Hans Goudey added 1 commit 2023-03-31 23:19:59 +02:00

1c68b3d588 Merge branch 'main' into index-mask-refactor

Hans Goudey added 1 commit 2023-04-01 03:47:05 +02:00

225a85f27d Handle span virtual arrays when converting from bools

Hans Goudey added 2 commits 2023-04-25 16:39:25 +02:00

33016fe472 Merge branch 'main' into index-mask-refactor

5cb9e27bd9 Fixes for new code

Jacques Lucke added 12 commits 2023-05-12 18:06:58 +02:00

be1a5b6ed6 Merge branch 'main' into index-mask-refactor

97f0169bc9 progress

0e7b9e031a Merge branch 'main' into index-mask-refactor

62a6272d34 progress

a4fa5c78ae progress

2ecf64725d progress

2eb4751702 progress

623eff5f7c progress

36f2f73449 progress

be9bbdb13b progress

4e174f56d0 Merge branch 'main' into index-mask-refactor

f782037ee4 fix

Jacques Lucke added 1 commit 2023-05-12 19:04:41 +02:00

8d377aff1b fix

Jacques Lucke added 1 commit 2023-05-14 12:25:54 +02:00

1e0c840d9a fix

Jacques Lucke added 1 commit 2023-05-19 10:07:43 +02:00

e09c9f2cc0 Merge branch 'main' into index-mask-refactor

Jacques Lucke added 8 commits 2023-05-19 12:10:43 +02:00

Jacques Lucke added 12 commits 2023-05-19 22:29:46 +02:00

479e0f4275 simplify creating mask from indices

bfc081cd57 progress

3c31d98cb1 move most of from_predicate out of header

cc0b4b4620 improve

fb8d3f1599 fuzzy test index iterator conversion

c5f3b22132 implement find and fuzzy test

800fdf5a21 simplify code

36a9d13c95 fix

a65112a3db cleanup

e72a74a0a2 fix

cb32376810 cleanup

ff1aad316c cleanup

Jacques Lucke added 27 commits 2023-05-21 02:23:32 +02:00

358da692b2 cleanup

58b8e8daed cleanup

1f6192fc86 extract header for unique sorted indices

3cb4dc53a8 cleanup

9b2139a549 cleanup

d68a1d0574 cleanup

9505003dc1 progress

b80177aa32 simplify index mask structure

c693138f83 speedup find

29e725da3b improve from predicate

c394f5825a fix

da64f581fe initial segment consolidation

61e39f209d fix

9ba6240c9e add benchmark

5b4d3f6929 speed from indices

5e9e7087e4 add comments

191da619a1 improve docs

7b394bb3c8 cleanup

3c2fdc2e78 improve comments

1b01bf9d4a cleanup

4e8b16626d cleanup

f560bedc8d cleanup

7dc86dc93c add comment

78bdb11f0c cleanup

7652f34546 comments

3dca97a054 comments

af76a000a5 comments

Jacques Lucke added 1 commit 2023-05-21 02:36:00 +02:00

ef3576a19a Merge branch 'main' into index-mask-refactor

Jacques Lucke added 1 commit 2023-05-21 02:40:55 +02:00

65c382f7c6 fix

Jacques Lucke added 7 commits 2023-05-21 15:09:48 +02:00

Jacques Lucke added 1 commit 2023-05-21 15:24:43 +02:00

7337163c09 Merge branch 'main' into index-mask-refactor

Jacques Lucke added 1 commit 2023-05-22 09:05:26 +02:00

0ed3040837 Merge branch 'main' into index-mask-refactor

Jacques Lucke added 5 commits 2023-05-22 11:34:02 +02:00

3d37fdb267 Merge branch 'main' into index-mask-refactor

25ff160bbe fix after merge

0fb6d648cc improve ability to auto-vectorize code in foreach_index_optimized

5dea3f3faa improve vectorizability

36d6965b06 reduce code bloat a bit

Jacques Lucke changed title from ~~WIP: BLI: refactor IndexMask for better performance and memory usage~~ to BLI: refactor IndexMask for better performance and memory usage

2023-05-22 11:34:48 +02:00

Jacques Lucke added 1 commit 2023-05-22 11:44:12 +02:00

buildbot/vexp-code-patch-coordinator Build done. Details

79b7967854 Merge branch 'main' into index-mask-refactor

Jacques Lucke requested review from Hans Goudey 2023-05-22 11:44:48 +02:00

Jacques Lucke commented

2023-05-22 11:45:40 +02:00

@blender-bot build

Jacques Lucke requested review from Lukas Tönne 2023-05-22 11:48:05 +02:00

Jacques Lucke added 1 commit 2023-05-22 12:54:02 +02:00

buildbot/vexp-code-patch-coordinator Build done. Details

e0784f4fd1 fix compile error

Jacques Lucke commented

2023-05-22 12:54:12 +02:00

@blender-bot build

Hans Goudey added 3 commits 2023-05-23 22:54:38 +02:00

e501a8902a Merge branch 'main' into index-mask-refactor

f585a766c7 Fix and slightly adjust a few comments

45be0be1da Remove const for by-value arguments in declarations

Hans Goudey approved these changes 2023-05-23 22:56:26 +02:00

Hans Goudey left a comment

Looks quite good! It's satisfying to see how this has come together and been simplified. Committing it now/soon will make it easier to track future improvements like improving from_bits or complement.

I think the API documentation is missing a bit of description of how to choose between foreach_segment and foreach_index (and the corresponding optimized variants). The choice seems a bit arbitrary right now in the various users.
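
One way to think about the difference between the two callback styles, as a minimal single-threaded sketch: the segment-level function hands a whole segment to the caller, who writes the inner loop, while the index-level function is built on top of it and can skip reading stored indices for contiguous segments. `SketchSegment`, `sketch_foreach_segment` and `sketch_foreach_index` are hypothetical names; the real methods also take a grain size for multi-threading, which is left out here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical segment type: a 64-bit offset plus sorted 16-bit local indices.
struct SketchSegment {
  int64_t offset;
  std::vector<int16_t> local_indices;
};

// Segment-level iteration: the callback receives each non-empty segment once.
template<typename Fn>
void sketch_foreach_segment(const std::vector<SketchSegment> &segments, Fn &&fn)
{
  for (const SketchSegment &segment : segments) {
    if (!segment.local_indices.empty()) {
      fn(segment);
    }
  }
}

// Index-level iteration built on top of the segment-level one.
template<typename Fn>
void sketch_foreach_index(const std::vector<SketchSegment> &segments, Fn &&fn)
{
  sketch_foreach_segment(segments, [&](const SketchSegment &segment) {
    const std::vector<int16_t> &local = segment.local_indices;
    assert(local.front() >= 0);
    const int64_t size = static_cast<int64_t>(local.size());
    const int64_t first = segment.offset + local.front();
    if (local.back() - local.front() == size - 1) {
      // Sorted unique indices spanning exactly `size` values form a contiguous range,
      // so the loop does not have to read the stored indices at all.
      for (int64_t i = first; i < first + size; i++) {
        fn(i);
      }
    }
    else {
      for (const int16_t local_index : local) {
        fn(segment.offset + local_index);
      }
    }
  });
}
```

In this sketch the index-level variant is the simpler choice when the loop body only needs one index at a time, and the segment-level variant is useful when per-segment setup or a hand-written inner loop is wanted.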

source/blender/blenlib/BLI_index_mask.hh

						
				@ -31,0 +26,4 @@

				 * - The most-significant-bit is not used so that signed integers can be used which avoids common

				 *   issues when mixing signed and unsigned ints.

				 * - The second most-significant bit is not used for indices so that #max_segment_size itself can

				 *   be stored in the #int16_t.

Hans Goudey commented

2023-05-23 21:55:25 +02:00

Might be helpful to mention why it's helpful that max_segment_size fits in int16_t
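
The bit budget from the quoted comment can be checked at compile time. The sketch below uses a hypothetical constant `sketch_max_segment_size`; the stated benefit (a count or one-past-the-end value of a full segment fits in the same int16_t type as the indices) is one plausible reading and an assumption here, since the comment itself does not give the reason.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

constexpr int64_t sketch_max_segment_size = int64_t(1) << 14;

static_assert(sketch_max_segment_size == 16384, "2^14 indices per segment");
static_assert(sketch_max_segment_size - 1 == 0x3FFF,
              "the largest local index only uses the low 14 bits");
static_assert(sketch_max_segment_size <= std::numeric_limits<int16_t>::max(),
              "the size itself is representable as int16_t");

// Storing the element count of a completely full segment in an int16_t does not overflow.
inline int16_t sketch_full_segment_size()
{
  const int16_t size = static_cast<int16_t>(sketch_max_segment_size);
  assert(size == sketch_max_segment_size);
  return size;
}
```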

JacquesLucke marked this conversation as resolved

source/blender/blenlib/BLI_index_mask.hh

						
				@ -73,0 +77,4 @@

				  /**

				   * Encodes the size of each segment. The size of a specific segment can be computed by

				   * subtracting consecutive elements (also see #OffsetIndices). The size of this array is one

				   * larger than #segments_num_. Note that the first elements is _not_ necessarily zero.

Hans Goudey commented

2023-05-23 22:00:03 +02:00

Hmm, why would the first element not be 0 always?

Jacques Lucke commented

2023-05-24 09:55:44 +02:00

The first element is often not 0 when the IndexMask is a slice of another mask.
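
A minimal sketch of why a non-zero first element is harmless: sizes are always differences of neighbouring entries, and a position lookup only has to add the first entry before doing a binary search. `SketchCumulativeSizes`, `SketchPosition` and `sketch_find` are hypothetical names, and `std::upper_bound` stands in for whatever search the real code uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical cumulative size array with one more entry than there are segments.
// For a slice of a larger mask the first entry is simply not zero.
struct SketchCumulativeSizes {
  std::vector<int64_t> cumulative;

  int64_t segments_num() const
  {
    return static_cast<int64_t>(cumulative.size()) - 1;
  }
  int64_t segment_size(const int64_t segment_i) const
  {
    return cumulative[static_cast<std::size_t>(segment_i + 1)] -
           cumulative[static_cast<std::size_t>(segment_i)];
  }
  int64_t total_size() const
  {
    return cumulative.back() - cumulative.front();
  }
};

struct SketchPosition {
  int64_t segment_i;
  int64_t index_in_segment;
};

// Finds the segment that contains the i-th element of the mask in O(log #segments).
inline SketchPosition sketch_find(const SketchCumulativeSizes &sizes, const int64_t i)
{
  assert(i >= 0 && i < sizes.total_size());
  const int64_t target = sizes.cumulative.front() + i;
  // First entry that is strictly greater than the target; the segment starts one entry before.
  const auto it = std::upper_bound(sizes.cumulative.begin(), sizes.cumulative.end(), target);
  const int64_t segment_i = static_cast<int64_t>(it - sizes.cumulative.begin()) - 1;
  return {segment_i, target - sizes.cumulative[static_cast<std::size_t>(segment_i)]};
}
```

With the entries {100, 103, 110, 112} the three segments have sizes 3, 7 and 2, even though the array starts at 100.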

source/blender/blenlib/BLI_index_mask.hh Outdated

						
				@ -257,3 +294,1 @@

				   *

				   * All the indices in the sub-mask are shifted by 3 towards zero,

				   * so that the first index in the output is zero.

				   * Same as above but may generate more code at compile time because it optimizes for the case

Hans Goudey commented

2023-05-23 22:12:53 +02:00

"optimizes for" might be more helpful if it said "generates a separate case for "

JacquesLucke marked this conversation as resolved

source/blender/blenlib/BLI_index_mask.hh Outdated

						
				@ -291,0 +353,4 @@

				 * The class has to be constructed once. Afterwards, `update` has to be called to fill the mask

				 * with the provided segment.

				 */

				class IndexMaskFromSegment : NonCopyable, NonMovable {

Hans Goudey commented

2023-05-23 20:25:12 +02:00

IndexMaskFromSegment could probably get a more "private" API, with only public mask() and update() methods and everything else private. That might make it more obvious how it's supposed to be used.

JacquesLucke marked this conversation as resolved

source/blender/blenlib/BLI_offset_span.hh Outdated

						
				@ -0,0 +7,4 @@

				namespace blender {

				/**

				 * An #OffsetSpan where a constant offset is added to every value when accessed. This allows e.g.

Hans Goudey commented

2023-05-23 20:26:01 +02:00

An #OffsetSpan where a constant -> An #OffsetSpan is a #Span with a constant

JacquesLucke marked this conversation as resolved

source/blender/blenlib/BLI_unique_sorted_indices.hh

						
				@ -0,0 +66,4 @@

				 *   [3, 4, 5, 6, 8, 9, 10]

				 *                ^ Range ends here because 6 and 8 are not consecutive.

				 */

				template<typename T> inline int64_t find_size_of_next_range(const Span<T> indices)

Hans Goudey commented

2023-05-23 20:32:23 +02:00

It's pretty clear from the implementations, but it might be nice to put the average and worst case big-O runtime for find_size_until_next_range and find_size_of_next_range
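
For reference, one way such a function can reach a logarithmic worst case, as a hypothetical sketch rather than the implementation in the patch: for sorted, unique integers the prefix ending at position i is a range exactly when indices[i] - indices[0] == i, and once that stops holding it never holds again, so the end of the leading range can be found by binary search. The name `sketch_find_size_of_next_range` and the use of `std::vector` instead of `Span` are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the number of leading elements that form a consecutive range.
// Requires sorted, unique, non-empty input. Runs in O(log n) in every case.
template<typename T>
int64_t sketch_find_size_of_next_range(const std::vector<T> &indices)
{
  assert(!indices.empty());
  const int64_t first = static_cast<int64_t>(indices.front());
  // Invariant: all positions before `begin` belong to the range,
  // all positions from `end` on do not.
  int64_t begin = 1;
  int64_t end = static_cast<int64_t>(indices.size());
  while (begin < end) {
    const int64_t mid = begin + (end - begin) / 2;
    const int64_t value = static_cast<int64_t>(indices[static_cast<std::size_t>(mid)]);
    if (value - first == mid) {
      begin = mid + 1;
    }
    else {
      end = mid;
    }
  }
  return begin;
}
```

For the example from the quoted comment, [3, 4, 5, 6, 8, 9, 10], this returns 4 because 6 and 8 are not consecutive.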

JacquesLucke marked this conversation as resolved

source/blender/blenlib/intern/index_mask.cc Outdated

						
				@ -5,0 +28,4 @@

				const IndexMask &get_static_index_mask_for_min_size(const int64_t min_size)

				{

				  static constexpr int64_t size_shift = 30;

				  static constexpr int64_t max_size = (1 << size_shift);

Hans Goudey commented

2023-05-23 22:19:55 +02:00

Might as well add /* 1'073'741'824 */ in a comment after this. Maybe 2 billion or so is safer?

It's a bit confusing that min_size is passed to this function in general, but I guess it's the only good way to have that assert.

JacquesLucke marked this conversation as resolved

source/blender/blenlib/intern/index_mask.cc Outdated

						
				@ -44,3 +149,1 @@

				  }

				  if (indices_.is_empty()) {

				    return full_range;

				  /* TODO: Implement more efficient solution. */

Hans Goudey commented

2023-05-23 22:26:54 +02:00

I guess this might come if we work on the operations you wrote earlier using bit masks? Worth doing soon I guess, since it's not great to have this with a very different algorithmic complexity than the proper solution.

Jacques Lucke commented

2023-05-24 10:24:54 +02:00

It's probably faster to implement this without bit masks. One could just generate a new segment for each old non-range segment and add extra segments for the gaps between old segments.
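
The gap-based idea can be sketched on a flat list of sorted indices. This is a simplification: it ignores segments entirely and only shows how the complement falls out of the gaps between consecutive indices. `SketchRange` and `sketch_complement` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical half-open range [start, start + size).
struct SketchRange {
  int64_t start;
  int64_t size;
};

// Computes the complement of sorted, unique `indices` within [0, universe_size)
// as a list of ranges, one per gap. Runs in a single linear pass.
inline std::vector<SketchRange> sketch_complement(const std::vector<int64_t> &indices,
                                                  const int64_t universe_size)
{
  std::vector<SketchRange> gaps;
  int64_t next_expected = 0;
  for (const int64_t index : indices) {
    assert(index >= next_expected && index < universe_size);
    if (index > next_expected) {
      // Everything between the previous index and this one is part of the complement.
      gaps.push_back({next_expected, index - next_expected});
    }
    next_expected = index + 1;
  }
  if (next_expected < universe_size) {
    // Trailing gap after the last index.
    gaps.push_back({next_expected, universe_size - next_expected});
  }
  return gaps;
}
```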

source/blender/blenlib/intern/index_mask.cc

						
				@ -54,0 +168,4 @@

				  int64_t group_start_segment_i = 0;

				  int64_t group_first = segments[0][0];

				  int64_t group_last = segments[0].last();

				  bool group_as_range = unique_sorted_indices::non_empty_is_range(segments[0].base_span());

Hans Goudey commented

2023-05-23 22:28:02 +02:00

const bool?

JacquesLucke marked this conversation as resolved

source/blender/blenlib/intern/index_mask.cc Outdated

						
				@ -90,0 +257,4 @@

				                                  LinearAllocator<> &allocator,

				                                  Vector<IndexMaskSegment> &r_segments)

				{

				  Vector<std::variant<IndexRange, Span<T>>> segments;

Hans Goudey commented

2023-05-23 22:32:40 +02:00

With the 1/2^14 constant factor, a larger inline buffer here could probably eliminate most allocations. Same below with Vector<IndexMaskSegment> segments

JacquesLucke marked this conversation as resolved

source/blender/blenlib/intern/index_mask.cc

						
				@ -90,0 +277,4 @@

				          segment_indices.size());

				      while (!segment_indices.is_empty()) {

				        const int64_t offset = segment_indices[0];

				        const int64_t next_segment_size = binary_search::find_predicate_begin(

Hans Goudey commented

2023-05-23 22:38:36 +02:00

Maybe not worth it (also just want to test my understanding)-- couldn't this limit the maximum size of the span argument to find_predicate_begin with something like find_predicate_begin(segment_indices.take_front(max_segment_size + offset),...?

Jacques Lucke commented

2023-05-24 10:41:21 +02:00

Not sure why the + offset, but taking at most max_segment_size makes sense.
In practice it likely doesn't make a difference right now, because the span passed to segments_from_indices is already sliced (for multi-threading).
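
The suggestion to search at most max_segment_size elements can be sketched as follows. Since the indices are sorted and unique, the element k positions after the segment start is at least offset + k, so a segment can never contain more than max_segment_size elements and the search window can be capped accordingly. The function name `sketch_next_segment_size` is hypothetical, and `std::lower_bound` stands in for `binary_search::find_predicate_begin`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int64_t sketch_max_segment_size = int64_t(1) << 14;

// Returns how many indices, starting at position `start`, fit into one segment whose
// offset is `indices[start]`. Requires sorted, unique input.
inline int64_t sketch_next_segment_size(const std::vector<int64_t> &indices, const int64_t start)
{
  assert(start >= 0 && start < static_cast<int64_t>(indices.size()));
  const int64_t offset = indices[static_cast<std::size_t>(start)];
  const int64_t remaining = static_cast<int64_t>(indices.size()) - start;
  // Cap the search window; elements beyond it cannot belong to this segment anyway.
  const int64_t search_size = std::min(remaining, sketch_max_segment_size);
  const auto begin = indices.begin() + start;
  const auto end = begin + search_size;
  // First element that is too far away from the offset to be stored in the segment.
  const auto it = std::lower_bound(begin, end, offset + sketch_max_segment_size);
  return static_cast<int64_t>(it - begin);
}
```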

source/blender/editors/sculpt_paint/curves_sculpt_delete.cc Outdated

						
				@ -135,2 +130,2 @@

				          return curves_to_delete[curve_i];

				        });

				    IndexMaskMemory mask_memory;

				    const IndexMask &mask_to_delete = IndexMask::from_bools(curves_to_delete, mask_memory);

Hans Goudey commented

2023-05-23 20:40:02 +02:00

Probably shouldn't be a reference here

JacquesLucke marked this conversation as resolved

source/blender/editors/sculpt_paint/curves_sculpt_puff.cc

						
				@ -165,2 +164,4 @@

				      }

				    }

				    IndexMaskMemory memory;

				    const IndexMask changed_curves_mask = IndexMask::from_indices<int64_t>(changed_curves_indices,

Hans Goudey commented

2023-05-23 20:43:27 +02:00

It looks like this whole changed_curves_indices thing could be replaced with from_predicate?

It looks like this whole `changed_curves_indices` thing could be replaced with `from_predicate`?

Jacques Lucke commented

2023-05-24 10:46:43 +02:00

Might not be entirely trivial right now, but generally I agree. Will leave that for later.

source/blender/geometry/intern/resample_curves.cc

						
				@ -323,3 +321,3 @@

				        MutableSpan<T> dst = attributes.dst[i_attribute].typed<T>();

				        for (const int i_curve : sliced_selection) {

				        for (const int i_curve : selection_segment) {

Hans Goudey commented

2023-05-23 20:59:10 +02:00

This changes int to int64_t in some places but not others, probably best to be consistent.

JacquesLucke marked this conversation as resolved

source/blender/geometry/intern/set_curve_type.cc Outdated

						
				@ -527,1 +514,3 @@

				      for (const int i : selection.slice(range)) {

				    selection.foreach_segment(GrainSize(512), [&](const IndexMaskSegment segment) {

				      for (const int i : segment) {

				        nurbs_order[i] = 4;

Hans Goudey commented

2023-05-23 21:02:44 +02:00

Any particular reason not to just use masked_fill here, rather than writing a for loop? Seems like it would be simpler that way.
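
For readers unfamiliar with the helper, a masked fill boils down to the following. This is a hypothetical stand-in (`sketch_masked_fill` on `std::vector` with a flat index list), not the signature of the real `masked_fill`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Writes `value` to every position of `data` that is listed in `mask_indices`.
template<typename T>
void sketch_masked_fill(std::vector<T> &data,
                        const T &value,
                        const std::vector<int64_t> &mask_indices)
{
  for (const int64_t i : mask_indices) {
    assert(i >= 0 && i < static_cast<int64_t>(data.size()));
    data[static_cast<std::size_t>(i)] = value;
  }
}
```

With such a helper, the loop in the diff above collapses to a single call along the lines of `sketch_masked_fill(nurbs_order, 4, selection_indices)`, where `selection_indices` is again a hypothetical name.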

source/blender/geometry/intern/trim_curves.cc Outdated

						
				@ -1068,3 +1051,3 @@

				    /* Only trimmed curves are no longer cyclic. */

				    if (bke::SpanAttributeWriter cyclic = dst_attributes.lookup_for_write_span<bool>("cyclic")) {

				      cyclic.span.fill_indices(selection.indices(), false);

				      selection.foreach_index(GrainSize(4096), [&](const int64_t i) { cyclic.span[i] = false; });

Hans Goudey commented

2023-05-23 21:06:27 +02:00

Might as well use masked_fill here

Jacques Lucke commented

2023-05-24 10:50:17 +02:00

Don't mind much either way, but performance wise it's likely better the way it is now, even though it also doesn't matter too much. Might be good to have a version of masked_fill that takes IndexMaskSegment.

Hans Goudey commented

2023-05-24 14:22:04 +02:00

I think I'd rather keep the code a bit simpler here, and leave changing from fill_indices to something besides masked_fill for a separate commit, where it can be more easily evaluated separately from this larger change.

source/blender/nodes/geometry/nodes/node_geo_curve_sample.cc

						
				@ -415,2 +408,4 @@

				        });

				      });

				      IndexMaskMemory memory;

Hans Goudey commented

2023-05-23 21:11:37 +02:00

This memory will fill up while processing in the for loop, since from_indices doesn't clear the existing memory. Maybe better to declare it inside the loop? Or maybe not, hrmm...

Actually, maybe I'll just try to replace this with the same from_groups thing from elsewhere.

Jacques Lucke commented

2023-05-24 10:54:31 +02:00

Yeah, difficult, will also leave that for later for now. It's probably good to refactor this a bit more like you mentioned.

Hans Goudey requested changes 2023-05-23 22:57:58 +02:00

Hans Goudey left a comment

Oops. Meant to request changes.

Also, it would be nice to see at least a few performance measurements (or at least some memory usage example case).

Hans Goudey added 1 commit 2023-05-24 02:04:38 +02:00

8d29fd4322 Merge branch 'main' into index-mask-refactor

Jacques Lucke added 3 commits 2023-05-24 10:07:09 +02:00

22af3f2d0a Merge branch 'main' into index-mask-refactor

e93534fd60 improve comments

737226c15e improve IndexMaskFromSegment api

Jacques Lucke added 8 commits 2023-05-24 11:25:20 +02:00

1ec847f57f improve comments

7d5e6d4a39 improve max size and comments

b343191503 fix segment consolidation

9724694932 improve sizes

215eb87573 remove unnecessary references

924a8625f4 cleanup

b1a2beb784 fix compilation

4f50fc4ac9 make constructor from size explicit

Jacques Lucke added 1 commit 2023-05-24 12:02:43 +02:00

e3f0e07a28 improve documentation for foreach_* methods

Hans Goudey approved these changes 2023-05-24 14:30:55 +02:00

Hans Goudey left a comment

The new foreach documentation is great, thanks for that!

I just left one comment about the code replacing masked_fill in two places. Other than that this looks ready to me.

Jacques Lucke added 2 commits 2023-05-24 14:35:50 +02:00

781f058a2f Merge branch 'main' into index-mask-refactor

194a9a1aa8 use masked_fill

Hans Goudey added 1 commit 2023-05-24 14:36:09 +02:00