BLI: add high level documentation for core data structures #25

Merged
Jacques Lucke merged 14 commits from JacquesLucke/blender-developer-docs:core-data-structures into main 2024-02-19 19:09:34 +01:00
9 changed files with 38 additions and 34 deletions
Showing only changes of commit 0fbf737254


@ -1,10 +1,11 @@
# Any
`blender::Any` ([source](https://projects.blender.org/blender/blender/src/branch/main/source/blender/blenlib/BLI_any.hh)) is a type-safe container for single values of any copy-constructible type. It is similar to `std::any` but provides the following two additional features:
- It has adjustable inline buffer capacity and alignment. `std::any` typically has a small inline buffer but its size is not guaranteed.
- It can store additional user-defined type information without increasing the `sizeof` of the `Any` object.
If any of those features are required, it's beneficial to use `blender::Any`. Otherwise, using `std::any` is fine as well.
```cpp
/* Construct empty value. */
@ -32,9 +33,9 @@ void *value = value.get();
value.reset();
```
## Additional Type Information
One of the features of `blender::Any` is that it can store additional information for each type that is stored. This can be done with fairly low overhead, because `Any` does type erasure and has to store some type-specific information anyway.
In the example below, the `blender::Any` knows the size of the stored type.


@ -1,16 +1,16 @@
# Bits
Sometimes it can be beneficial to work with bits directly instead of boolean values because they are very compact and many bits can be processed at the same time. This document describes some of the utilities available for working with dynamically sized bitsets.
Before choosing to work with individual bits instead of bools, keep in mind that there are also downsides which may not be obvious at first.
- Writing to separate bits in the same int is not thread-safe. Therefore, an existing vector of
bool can't necessarily be replaced with a bit vector when it is written to from multiple threads.
Read-only access from multiple threads is fine though.
- Writing an individual element is more expensive when the array is in cache already, because
changing a bit is always a read-modify-write operation on the integer containing the bit.
- Reading an individual element is more expensive when the array is in cache already, because
additional bit-wise operations have to be applied after the corresponding int is read.
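To make the last two points concrete, here is a plain C++ sketch (not using any BLI types) of what accessing a single bit in an array of 64-bit integers involves:

```cpp
uint64_t data[8] = {};
const int64_t i = 100;

/* Setting bit i is a read-modify-write of the integer that contains it. */
data[i >> 6] |= uint64_t(1) << (i & 63);

/* Reading bit i needs a shift and a mask after that integer has been loaded. */
const bool is_set = (data[i >> 6] >> (i & 63)) & 1;
```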
## BitVector
@ -46,9 +46,9 @@ Those are also the types returned when accessing a specific index in `BitVector`
## BitSpan
Just like it's not possible to reference a single bit with standard C++, it's also not possible to reference a span of bits. Instead, one can use `BitSpan` and `MutableBitSpan` ([source](https://projects.blender.org/blender/blender/src/branch/main/source/blender/blenlib/BLI_bit_span.hh)).
Additionally, there are also `BoundedBitSpan` and `MutableBoundedBitSpan`. Those are like normal bit spans but enforce specific constraints on the alignment of the span. These additional constraints allow bit spans to be processed more efficiently than in the most general case. For more details on the exact constraints, check the `is_bounded_span` function.
It's generally recommended to work with bit spans that follow these constraints if possible for best performance.
@ -59,7 +59,7 @@ There are three core operations that can be performed on bit spans:
2. Check if any bit is set.
3. Iterate over all set bits.
`BLI_bit_span_ops.hh` ([source](https://projects.blender.org/blender/blender/src/branch/main/source/blender/blenlib/BLI_bit_span_ops.hh)) offers utilities for these operations.
```cpp
BitVector<> vec1(500);
@ -91,9 +91,9 @@ bits::mix_into_first_expr([](bits::BitInt result,
`BitGroupVector` ([source](https://projects.blender.org/blender/blender/src/branch/main/source/blender/blenlib/BLI_bit_group_vector.hh)) allows storing a fixed number of bits for each element. For example, this could be used to store a bit for each attribute for each vertex in a mesh.
In some sense, this data structure is also a 2D bit vector that can dynamically grow on one axis.
`BitGroupVector` is designed so that each bit group fulfills the requirements of bounded bit spans (`BoundedBitSpan`). As such, the groups can be processed efficiently.
```cpp
/* Store a bit for each attribute for each vertex. */


@ -2,9 +2,11 @@
[Container](https://en.wikipedia.org/wiki/Container_(abstract_data_type)) data structures allow storing many elements of the same type. Different structures in this category allow for different access patterns.
Many of Blender's available containers have equivalents in the standard library. In most cases it's preferred to use the Blender container instead.
## Vector
The `blender::Vector<T>` ([source](https://projects.blender.org/blender/blender/src/branch/main/source/blender/blenlib/BLI_vector.hh)) is the most important container. It stores values of the given type in a dynamically growing contiguous buffer.
```cpp
/* Create an empty vector. */
@ -19,7 +21,7 @@ int value = values[0];
## Array
`blender::Array<T>` ([source](https://projects.blender.org/blender/blender/src/branch/main/source/blender/blenlib/BLI_array.hh)) is very similar to `Vector`. The main difference is that it does not grow dynamically. Instead, its size is usually only set once and stays the same for the rest of its lifetime. It has a slightly lower memory footprint than `Vector`. Using an `Array` instead of `Vector` also indicates that the size is not expected to change.
Note that this is different from `std::array` for which the size has to be known at compile time. If the size is actually known at compile time, `std::array` should be used instead.
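For illustration, a minimal sketch of typical `Array` usage; the exact set of available constructors should be taken from the linked header:

```cpp
/* Create an array with 10 elements, all initialized to 0. */
Array<int> values(10, 0);

/* Element access works like with Vector. */
values[3] = 42;
const int64_t size = values.size();

/* The size is fixed after construction; "resizing" means constructing and
 * assigning a new Array. */
values = Array<int>(20, 0);
```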
@ -42,7 +44,7 @@ int value = stack.peek();
int value = stack.pop();
```
A `Vector` can also be used as a `Stack` using the `Vector::append`, `Vector::last` and `Vector::pop_last` methods. This is beneficial if one also needs the ability to iterate over all elements that are currently in the stack. Otherwise it's better to use `Stack` directly because of its more purpose-designed methods and because it allows pushing in O(1) time (not just amortized, since it does not require reallocating already pushed elements).
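A small sketch of the `Vector`-as-stack pattern described above:

```cpp
Vector<int> stack_vec;
stack_vec.append(3);
stack_vec.append(7);

/* Inspect and remove the top element. */
const int top = stack_vec.last();
const int popped = stack_vec.pop_last();

/* Unlike with Stack, the remaining elements can still be iterated directly. */
for (const int value : stack_vec) {
  /* ... */
}
```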
## Set
@ -72,7 +74,7 @@ bool is_contained = values.contains(3);
Using `Set` with a custom type requires an equality operator and a [hash](#hashing) function.
While a `Vector` can also be used to mimic the behavior of a `Set`, it's generally much less efficient at that task. A `Set` uses a hash table internally which allows it to check for duplicates in constant time instead of having to compare the value with every previously added element.
## Map
@ -129,7 +131,7 @@ Using `Map` with a custom type as key requires an equality operator and a [hash]
## Vector Set
A `blender::VectorSet<T>` ([source](https://projects.blender.org/blender/blender/src/branch/main/source/blender/blenlib/BLI_vector_set.hh)) is a combination of a `Vector` and a `Set`. It can't contain duplicate values like a `Set`, but the values it contains are kept in insertion order (until elements are removed). Just like in a `Vector`, the values are also stored in a contiguous array which makes it easy to pass them to other functions that expect an array.
```cpp
/* Construct empty vector-set. */
@ -162,8 +164,9 @@ These are some concepts that apply to many of the container types.
Most container types mentioned above (except `VectorSet` currently) have an inline buffer. For as long as the elements added to the container fit into the inline buffer, no additional allocation is made. This is important because allocations can be a performance bottleneck.
Inline buffers are enabled by default in supported containers. It's generally recommended to use the default value, but there are cases when the inline buffer size should be set manually.
- When building a compact type which has a container that is usually empty, the inline buffer size could be set to 0. It should also be considered to just wrap the container in a `std::unique_ptr` in this case.
- When working in hot code that requires e.g. a `Vector`, the inline buffer size can be increased to make better use of stack memory and to avoid allocations in the majority of cases.
The inline buffer size is typically the first template parameter after the element type.
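For example, with `Vector` the inline buffer capacity can be set like this (the default capacity depends on the element size):

```cpp
/* Default inline buffer size. */
Vector<int> a;

/* Up to 16 elements are stored without any heap allocation. */
Vector<int, 16> b;

/* No inline buffer, e.g. for a member that is empty most of the time. */
Vector<int, 0> c;
```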
@ -182,7 +185,7 @@ Using a larger online buffer obviously also increases the size of the type: `siz
### Hashing
Using custom types in a data structure that uses a hash table (like `Set`, `Map`, and `VectorSet`) requires the implementation of the equality operator (`operator==`) and a `hash` function.
```cpp
struct MyType {
@ -201,7 +204,7 @@ struct MyType {
A potentially more convenient way to implement the equality operator could be to use `BLI_STRUCT_EQUALITY_OPERATORS_2`.
The `hash` function has to return the same value for two instances of the type that are considered equal. In theory, even returning a constant value fulfills that requirement and would be correct. However, a hash function that always returns the same value makes using hash tables useless and degrades performance.
Simply calling `get_default_hash` on the data members that should impact the hash (which are the same ones which should impact equality) is usually good enough. When designing a custom hash function, it's recommended to put as much variation as possible into the lower bits. Those are used by the containers at first. However, if the low bits have too many collisions, the higher bits are taken into account automatically as well.
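As a sketch, a hashable key type might look as follows. This assumes the `BLI_STRUCT_EQUALITY_OPERATORS_2` macro mentioned above and only the single-value `get_default_hash` overload; if `BLI_hash.hh` provides a helper that combines several values in one call, using that is simpler.

```cpp
struct MyKey {
  int a;
  std::string b;

  /* Equality compares exactly the members that also feed into the hash. */
  BLI_STRUCT_EQUALITY_OPERATORS_2(MyKey, a, b)

  uint64_t hash() const
  {
    /* Combine the per-member hashes. */
    return get_default_hash(this->a) * 33 + get_default_hash(this->b);
  }
};

Set<MyKey> keys;
keys.add({5, "hello"});
```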
@ -213,6 +216,6 @@ Many core methods on the container data structures have multiple overloads. For
Additionally, many methods have a variant with the `_as` suffix. Such methods allow passing in the parameter with a different type than what the container actually contains. For example, `Vector<std::string>::append_as` can also be called with a `const char *` parameter, and not just `std::string`. The `std::string` is then constructed in place. This avoids the need to construct it first and then move it into the right place. This specific example is very similar to `std::vector::emplace_back`.
However, the `*_as` convention is a bit more general. For example, it allows `Set::add_as` or `Set::contains_as` to be called with a type that is not exactly the one which is stored. This avoids the construction of the stored type in many cases.
Other libraries sometimes support this as well, without the additional `_as` suffix, but that also leads to more complex error messages in the common cases.
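A short sketch of the `_as` methods mentioned above; the `Set` calls assume that the default hash for `std::string` also accepts the passed-in string types:

```cpp
Vector<std::string> names;
/* Constructs the std::string in place from the C string,
 * similar to std::vector::emplace_back. */
names.append_as("hello world");

Set<std::string> unique_names;
unique_names.add_as("hello world");
/* Checks for the value without constructing a temporary std::string first. */
const bool exists = unique_names.contains_as(StringRef("hello world"));
```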


@ -1,6 +1,6 @@
# Functions
There are many ways to store a reference to a function. This document gives an overview of them and gives recommendations for when to use each approach.
1. Pass function pointer and user data (as void *) separately:
- The only method that is compatible with C interfaces.
@ -25,7 +25,7 @@ There are many ways to store a reference to a function. This document gives an o
- Works well with all callables.
- It's a non-owning reference, so it *cannot* be stored safely in general.
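The non-owning reference in the last item is presumably `FunctionRef` ([source](https://projects.blender.org/blender/blender/src/branch/main/source/blender/blenlib/BLI_function_ref.hh)). A minimal sketch of an API accepting one:

```cpp
/* The callee accepts any callable with a matching signature without owning it. */
void foreach_selected_index(Span<bool> selection, FunctionRef<void(int64_t)> fn)
{
  for (const int64_t i : selection.index_range()) {
    if (selection[i]) {
      fn(i);
    }
  }
}

/* Callers can pass a capturing lambda or a plain function pointer. The callable
 * only has to stay alive for the duration of the call. */
void gather_selected(Span<bool> selection, Vector<int64_t> &r_indices)
{
  foreach_selected_index(selection, [&](const int64_t i) { r_indices.append(i); });
}
```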
The following diagram helps to decide which approach to use when building an API where the user has to pass a function.
```mermaid
flowchart TD


@ -1,10 +1,10 @@
# Index Mask
An `IndexMask` ([source](https://projects.blender.org/blender/blender/src/branch/main/source/blender/blenlib/BLI_index_mask.hh)) is a sequence of unique and sorted indices. It's commonly used when a subset of elements in an array have to be processed. This is sometimes called [existential processing](https://www.dataorienteddesign.com/dodbook/node4.html) and is often better than having e.g. a bool for every element that has to be checked in every inner loop to determine if it has to be processed.
Semantically, an `IndexMask` is very similar to a simple `Vector<int64_t>` with unique and sorted indices. However, due to the implementation details of `IndexMask`, it is significantly more efficient than the `Vector`.
An `IndexMask` does not own the memory it references. Typically the referenced data is either statically allocated or is owned by an `IndexMaskMemory`.
```cpp
/* Owner of some dynamically allocated memory in one or more index masks. */


@ -2,7 +2,7 @@
An `IndexRange` ([source](https://projects.blender.org/blender/blender/src/branch/main/source/blender/blenlib/BLI_index_range.hh)) represents a set of non-negative consecutive indices. It's stored as just the start index and the number of indices. Since it's small, it should generally be passed by value.
The most common usage of `IndexRange` is to loop over indices. This is better than a C-style index loop because it reduces the likelihood of mixing up variables and allows the current index to be const. It's also just more convenient.
```cpp
/* Iterate over the indices from 0 to 9. */


@ -2,6 +2,6 @@
A `blender::Span<T>` ([source](https://projects.blender.org/blender/blender/src/branch/main/source/blender/blenlib/BLI_span.hh)) references an array that is owned by someone else. It is just a pointer and a size. Since it is so small, it should generally be passed by value.
`Span` is the main way to pass multiple elements into a function and should be preferred over e.g. `const Vector &` because it gives the caller more flexibility.
The memory directly referenced by the span is considered to be `const`. This is different from `std::span` where constness is not the default. When non-constness is required, a `MutableSpan` can be used. This also makes the intention more clear.
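A small sketch of how this looks in practice:

```cpp
/* Read-only access to any contiguous buffer of ints. */
int64_t count_positive(Span<int> values)
{
  int64_t count = 0;
  for (const int value : values) {
    if (value > 0) {
      count++;
    }
  }
  return count;
}

/* Write access has to be requested explicitly. */
void fill_with_zero(MutableSpan<int> values)
{
  values.fill(0);
}

void example(const Vector<int> &vec, Array<int> &arr)
{
  /* Vector and Array convert implicitly to Span and MutableSpan. */
  if (count_positive(vec) == 0) {
    fill_with_zero(arr);
  }
}
```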


@ -1,3 +1,3 @@
# Strings
Blender usually stores strings as `std::string`. If strings are passed around without transferring ownership, `blender::StringRef` ([source](https://projects.blender.org/blender/blender/src/branch/main/source/blender/blenlib/BLI_string_ref.hh)) should be used. `StringRef` is a non-owning slice of a string. It's just a pointer and a size and should generally be passed around by value. If a string with null-termination is required, `StringRefNull` should be used instead.
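A short sketch of the intended usage (check the linked header for the exact set of methods):

```cpp
/* Accepts string literals, std::string and other StringRefs without copying. */
bool has_uppercase_prefix(StringRef str)
{
  return str.size() > 0 && str[0] >= 'A' && str[0] <= 'Z';
}

std::string name = "Cube";
const bool a = has_uppercase_prefix(name);
const bool b = has_uppercase_prefix("suzanne");

/* When null-termination is required (e.g. to call into C APIs),
 * take a StringRefNull and use c_str(). */
void print_name(StringRefNull name)
{
  printf("%s\n", name.c_str());
}
```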


@ -2,7 +2,7 @@
A virtual array ([source](https://projects.blender.org/blender/blender/src/branch/main/source/blender/blenlib/BLI_virtual_array.hh)) is a data structure that behaves similarly to an array, but its elements are accessed through virtual methods. This improves the decoupling of a function from its callers, because it does not have to know exactly how the data is laid out in memory, or if it is stored in memory at all. It could just as well be computed on the fly.
Taking a virtual array as a parameter instead of a more specific non-virtual type has some tradeoffs. Access to individual elements is slower due to function call overhead. On the other hand, potential callers don't have to convert the data into the specific format required for the function. That can be a costly conversion if only a few of the elements are accessed in the end.
Functions taking a virtual array as input can still optimize for different data layouts. For example, they can check if the array references contiguous memory internally or if it is the same value for all indices. Whether it is worth optimizing for different data layouts in a function has to be decided on a case-by-case basis. One should always do some benchmarking to see if the increased compile time and binary size are worth it.
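As a sketch of such an optimization, a function could branch on the layout like below; the method names (`is_single`, `is_span`, and so on) are the ones expected in `BLI_virtual_array.hh` and should be verified there:

```cpp
int64_t count_equal(const VArray<int> &values, const int needle)
{
  /* Fast path: all elements share a single value. */
  if (values.is_single()) {
    return values.get_internal_single() == needle ? values.size() : 0;
  }
  /* Fast path: the data is stored in contiguous memory. */
  if (values.is_span()) {
    int64_t count = 0;
    for (const int value : values.get_internal_span()) {
      count += value == needle;
    }
    return count;
  }
  /* General path: element-wise access through the virtual interface. */
  int64_t count = 0;
  for (const int64_t i : values.index_range()) {
    count += values[i] == needle;
  }
  return count;
}
```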