BLI: add high level documentation for core data structures #25

Merged
Jacques Lucke merged 14 commits from JacquesLucke/blender-developer-docs:core-data-structures into main 2024-02-19 19:09:34 +01:00
Showing only changes of commit 90a34b159b - Show all commits

View File

@ -1,3 +1,218 @@
# Containers
`Array/Vector/Stack/Set/Map/VectorSet`
[Container](https://en.wikipedia.org/wiki/Container_(abstract_data_type)) data structures allow storing many elements of the same type. Different structures in this category allow for different access patterns.
## Vector
The `blender::Vector<T>` ([source](https://projects.blender.org/blender/blender/src/branch/main/source/blender/blenlib/BLI_vector.hh)) is the most important data structure. It stores values of the given type in a dynamically growing continuous buffer.
```cpp
/* Create an empty vector. */
Vector<int> values;
/* Add an element to the end of the vector. */
values.append(5);
/* Access an element at the given index. */
int value = values[0];
```
## Array
`blender::Array<T>` ([source](https://projects.blender.org/blender/blender/src/branch/main/source/blender/blenlib/BLI_array.hh)) is very similar to `Vector`. The main difference is that it is not dynamically growing. Instead its size is usually only set once and stays the same for the rest of its life-time. It has a slightly lower memory footprint than `Vector`. Using an `Array` instead of `Vector` also indicates that the size is not expected to change.
Note that this is different from `std::array` for which the size has to be known at compile time. If the size is actually known at compile time, `std::array` should be used instead.
## Stack
`blender::Stack<T>` ([source](https://projects.blender.org/blender/blender/src/branch/main/source/blender/blenlib/BLI_stack.hh)) implements a first-in-last-out data structure. It does not allow accessing any element that is not at the top of the stack currently.
```cpp
/* Construct an empty stack. */
Stack<int> stack;
/* Add two elements to the stack. */
stack.push(5);
stack.push(3);
/* Look at the element at the top of the stack without removing it. */
int value = stack.peek();
/* Remove the top element of the stack and return it. */
int value = stack.pop();
```
A `Vector` can also be used as a `Stack` using the `Vector::append`, `Vector::last` and `Vector::pop_last` methods. This is benefitial if one also needs the ability to iterate over all elements that are currently in the stack. If that's not required, it's better to use `Stack` directly because of it's more purpose-designed methods and it allows pushing in O(1) (not just armortized as it does not require reallocating already pushed elements).
## Set
A `blender::Set<T>` ([source](https://projects.blender.org/blender/blender/src/branch/main/source/blender/blenlib/BLI_set.hh)) stores values of type `T` while making sure that there are no duplicates. The elements added to a `Set` have no particular order.
```cpp
/* Construct an empty set. */
Set<int> values;
/* Add two elements. */
values.add(5);
values.add(3);
/* Adding the same element again has no effect. */
values.add(5);
/* Add value that is known not to be in the set already. */
/* Makes intend more obvious, allows for better asserts and performance. */
values.add_new(10);
/* Remove an element. */
values.remove(5);
/* Check if an element is contained in the set. */
bool is_contained = values.contains(3);
```
Using `Set` with a custom type requires an equality operator and a [hash](#hashing) function.
While a `Vector` could also be used to mimic the behavior of a `Set`, it's generally much less efficient at that task. A `Set` uses a hash table internally which allows it to check for duplicates in constant time instead of having to compare the value with every previously added element.
## Map
A `blender::Map<Key, Value>` ([source](https://projects.blender.org/blender/blender/src/branch/main/source/blender/blenlib/BLI_map.hh)) stores values that can be looked up by a key. Every key can exist at most once. The key-value-pairs have no particular order.
```cpp
/* Construct an empty map. */
Map<int, std::string> values;
/* Add key-value-pair. */
values.add(2, "two");
/* Adding the same key again does *not* effect the map. */
values.add(2, "TWO");
/* Add a new value for the existing key. */
values.add_overwrite(2, "Two");
/* Add a value that is known not to be in the map already. */
/* Makes intend more obvious, allows for better asserts and performance. */
values.add_new(3, "three");
/* Check if a key exists in the map. */
bool exists = values.contains(2);
/* Remove a key-value-pair if it exists. */
values.remove(2);
/* Get the value for a given key. */
/* This fails if the key does not exist. */
std::string &value = values.lookup(3);
/* Get the value for a given key and return a default value if it's not in the map. */
std::string value = values.lookup_default(4, "unknown");
/* Lookup the value or add a new value if it doesn't exist yet. */
/* This can be used to implement maps with default-values. */
/* Always adds an 'a' to the string corresponding to the key 4. */
values.lookup_or_add(4, "") += "a";
/* Iterate over all keys. */
for (const int key : values.keys()) { /* ... */ }
/* Iterate over all values. */
for (const std::string &value : keys.values()) { /* ... */ }
/* Iterate over all key-value-pairs. */
for (const MapItem<int, std::string> item : keys.items()) {
do_something(item.key, item.value);
}
```
Using `Map` with a custom type as key requires an equality operator and a [hash](#hashing) function.
## Vector Set
A `blender::VectorSet<T>` ([source](https://projects.blender.org/blender/blender/src/branch/main/source/blender/blenlib/BLI_vector_set.hh)) is a combination of a `Vector` and a `Set`. It can't contain duplicate values like a `Set` but the values stored in it are ordered based on insertion order (until elements are removed). Just like in a `Vector`, the values are also stored in a continuous array which makes it easy to pass them to other functions that expect an array.
```cpp
/* Construct empty vector-set. */
VectorSet<int> values;
/* Add elements. Duplicate values are ignored. */
values.add(5);
values.add(5);
values.add(6);
/* Check if value is contained. */
bool is_contained = values.contains(5);
/* Get index of value. */
int index = values.index_of(5); /* = 0 */
int index = values.index_of(6); /* = 1 */
/* Access value by index. */
int value = values[1]; /* = 6 */
```
Using `VectorSet` with a custom type requires an equality operator and a [hash](#hashing) function.
## Common Concepts
These are some concepts that apply to many of the container types.
### Inline Buffers
Most container types mentioned above (except `VectorSet` currently) have an inline buffer. For as long as the elements added to the container fit into the inline buffer, no additional allocation is made. This is important because allocations can be a performance bottleneck.
Inline buffers are enabled by default in supported containers. It's generally recommended to use the default value, but there are cases when the inline buffer size should be set manually.
* When building a compact type which has a container that is usually empty, the inline buffer size could be set to 0. It should also be considered to just wrap the container in a `std::unique_ptr` in this case.
* When working in hot code that requires e.g. a `Vector`, the inline buffer size can be increased to make better use of stack memory and to avoid allocations in the majority of cases.
The inline buffer is is typically the first template parameter after the type.
```cpp
/* Construct vector that can hold up to 32 ints without extra memory. */
Vector<int, 32> vec;
/* Same as above, but for other container types. */
Array<int, 32> array;
Stack<int, 32> stack;
Set<int, 32> set;
Map<int, float, 32> map;
```
Using a larger online buffer obviously also increases the size of the type: `sizeof(Vector<int>) < sizeof(Vector<int, 32>)`.
### Hashing
Using custom types in a data structure that uses a hash table (like `Set`, `Map` and `VectorSet`) requires the implementation of the equality operator (`operator==`) and a `hash` function.
```cpp
struct MyType {
int x, y;
/* Return true when both values are considered equal. */
friend bool operator==(const MyType &a, const MyType &b) {
return /* ... */
}
uint64_t hash() const {
return get_default_hash(this->x, this->y);
}
};
```
A potentially more convenient way to implement the equality operator could be to use `BLI_STRUCT_EQUALITY_OPERATORS_2`.
The `hash` function has to return the same value for two instances of the type that are considered equal. In theory, even returning a constant value fullfills that requirement and would be correct. However, having a hash function that always returns the same value makes using hash tables useless and degrades performance.
Simply calling `get_default_hash` on the data members that should impact the hash (which are the same ones which should impact equality) is usually good enough. When designing a custom hash function, it's recommended to put as much variation as possible into the lower bits. Those are used by the containers at first. However, if the low bits have too many collisions, the higher bits are taken into account automatically as well.
It's also possible to implement a hash function for a type that methods can't be added to (e.g. because it comes from an external separate library). This can be done by specializing the `blender::DefaultHash<T>` struct. For more information look at the [source](https://projects.blender.org/blender/blender/src/branch/main/source/blender/blenlib/BLI_hash.hh).
### Method Overloads and `*_as` Methods
Many core methods on the container data structures have multiple overloads. For example, `Vector::append` has two versions. One that takes a `const T &` and one that takes a `T &&` parameter. Those are used to properly handle the cases when the value is copied or moved into the container.
Additionally, many methods have a variant with the `_as` suffix. Such methods allow passing in the parameter with a different type than what the container actually contains. For example, `Vector<std::string>::append_as` can also be called with a `const char *` parameter, and not just `std::string`. The `std::string` is then constructed inplace. This avoids the need to construct it first and then to move it in the right place. This specific example is very similar to `std::vector::emplace_back`.
However, the `*_as` convention is a bit more general. For example, it allows calling `Set::add_as` or `Set::contains_as` to be called with a type that is not exactly the one which is stored. This avoids the construction of stored type in many cases.
Other libraries sometimes support this as well, without the additional `_as` suffix, but that also leads to more complex error messages in the common cases.