- Reduce variable scope.
- Use snake-case for variables.
- Remove unnecessary counter when building file-list.
- Remove break after return.
- Use early return.
- Add missing braces.
- Use pascel-case type names, instead of snake-case with `_t` suffix.
- Use `GWL_` prefix (short for GhostWayLand), to distinguish these
types from ghost (`GHOST_*`) and wayland (`wl_*`) types.
- Rename `input` to `seat` (following wayland's own terminology).
- Use `wl_` prefix for wayland native variables which have locally
defined equivalents so `GWL_Output *output` isn't confused with
`struct wl_output *wl_output`. As the locally defined types are used
more often this is less verbose overall.
This commit is a big overhaul to the Mikktspace module, which is used
to compute tangents. I'm not calling it a rewrite since it's the
result of a lot of iterations on the original code, but pretty much
everything is reworked somehow.
Overall goal was to a) make it faster and b) make it maintainable.
Notable changes:
- Since the callbacks for requesting geometry data were a big
bottleneck before, I've ported it to C++ and made it header-only,
templating on the data source. That way, the compiler generates code
specific to the caller, which allows it to inline the data source and
specialize for some cases (e.g. subd vs. non-subd in Cycles).
- The one input parameter, an optional angle threshold, was not used
anywhere. Turns out that removing it allows for considerable
algorithmic simplification, removing a lot of the complexity in the
later stages. Therefore, I've just removed the option in the new code.
- The code computes several outputs, but only one (the tangent itself)
is ever used in Blender. Therefore, I've removed the others to
simplify the code. They could easily be brought back if needed, none
of the algorithmic simplifications are conflicting with them.
- The original code had fallback paths for many steps in case temporary
memory allocation fails, but that never actually gets used anyways
since malloc() doesn't really ever return NULL in practise, so I
removed them.
- In general, I've restructured A LOT of the code to make the
algorithms clearer and make use of some C++ features (vectors,
std::array, booleans, classes), though there's still some of cleanup
that could be done.
- Parallelized duplicate detection, neighbor detection, triangle
tangent computation, degenerate triangle handling and tangent space
accumulation.
- Replaced several algorithms with faster equivalents: Duplicate
detection uses a (concurrent) hash set now, neighbor detection uses
Radixsort and splits vertices by index pairs etc.
As for results, the exact speedup depends on the scene of course, but
let's consider the file from T97378:
- Blender 3.1 (before D14675): 6.07sec
- Blender 3.2 (with D14675): 4.62sec
- rBf0a36599007d (last nightly build): 4.42sec
- With this commit: 0.90sec
This speedup will mostly be noticed at the start of Cycles renders and,
even more importantly, in Eevee when doing something that changes the
geometry (e.g. animating) on a model using normal maps.
Differential Revision: https://developer.blender.org/D15589
The recent revert of Apple silicon inlining changes to avoid long compile times
worked on macOS 12, but in macOS 13 Beta it results in render errors. This may
be a compiler bug and perhaps get fixed in time, but try to be on the safe side
and ensure Blender 3.3.0 works regardless.
This brings part of the inlining back, which brings improved performance but
also longer compiler times again. Compile time is around 2min now, where the
previous full inlining was about 5-7min.
Patch by Michael Jones.
Differential Revision: https://developer.blender.org/D15897
Enables a feature flag during OpenGL device initialisation on macOS, which increases the available number of texture samplers available for use within shaders. Enabling this flag removes purple rendering artifacts present in certain EEVEE materials, when the existing limit of 16 is exceeded.
This feature flag is supported on Apple Silicon and AMD GPUs, for devices supporting macOS 11.0+. Device initialisation first tests whether GL device creation with the flag is supported, if not, we fall back to standard initialisation.
Other solutions would not be trivial or incur additional performance overhead or feature limitations. Other workarounds, such as texture atlas's, could already be created by artists.
{F13245498}
{F13245497}
Reviewed By: jbakker
Maniphest Tasks: T57759, T63935
Differential Revision: https://developer.blender.org/D15336
For copy-on-write, we want to share attribute arrays between meshes
where possible. Mutable pointers like `Mesh.mvert` make that difficult
by making ownership vague. They also make code more complex by adding
redundancy.
The simplest solution is just removing them and retrieving layers from
`CustomData` as needed. Similar changes have already been applied to
curves and point clouds (e9f82d3dc7, 410a6efb74). Removing use of
the pointers generally makes code more obvious and more reusable.
Mesh data is now accessed with a C++ API (`Mesh::edges()` or
`Mesh::edges_for_write()`), and a C API (`BKE_mesh_edges(mesh)`).
The CoW changes this commit makes possible are described in T95845
and T95842, and started in D14139 and D14140. The change also simplifies
the ongoing mesh struct-of-array refactors from T95965.
**RNA/Python Access Performance**
Theoretically, accessing mesh elements with the RNA API may become
slower, since the layer needs to be found on every random access.
However, overhead is already high enough that this doesn't make a
noticible differenc, and performance is actually improved in some
cases. Random access can be up to 10% faster, but other situations
might be a bit slower. Generally using `foreach_get/set` are the best
way to improve performance. See the differential revision for more
discussion about Python performance.
Cycles has been updated to use raw pointers and the internal Blender
mesh types, mostly because there is no sense in having this overhead
when it's already compiled with Blender. In my tests this roughly
halves the Cycles mesh creation time (0.19s to 0.10s for a 1 million
face grid).
Differential Revision: https://developer.blender.org/D15488
This uses the same sample classification approach as used for PMJ,
because it turns out to also work equally well with Sobol-Burley.
This also implements a fallback (random classification) that should
work "okay" for other samplers, though there are no other samplers
at the moment.
Differential Revision: https://developer.blender.org/D15845
The multi-dimensional Sobol pattern required us to carefully use as low
dimensions as possible, as quality goes down in higher dimensions. Now that we
have two sampling patterns that are at least as good, there is no need to keep
it around and the implementation can be simplified.
Differential Revision: https://developer.blender.org/D15788
Fix two issues in the previous implementation:
* Only power-of-two prefixes were progressively stratified, not suffixes.
This resulted in unnecessarily increased noise when using non-power-of-two
sample counts.
* In order to try to get away with just a single sample pattern, the code
used a combination of sample index shuffling and Cranley-Patterson rotation.
Index shuffling is normally fine, but due to the sample patterns themselves
not being quite right (as described above) this actually resulted in
additional increased noise. Cranley-Patterson, on the other hand, always
increases noise with randomized (t,s) nets like PMJ02, and should be avoided
with these kinds of sequences.
Addressed with the following changes:
* Replace the sample pattern generation code with a much simpler algorithm
recently published in the paper "Stochastic Generation of (t, s) Sample
Sequences". This new implementation is easier to verify, produces fully
progressively stratified PMJ02, and is *far* faster than the previous code,
being O(N) in the number of samples generated.
* It keeps the sample index shuffling, which works correctly now due to the
improved sample patterns. But it now uses a newer high-quality hash instead
of the original Laine-Karras hash.
* The scrambling distance feature cannot (to my knowledge) be implemented with
any decorrelation strategy other than Cranley-Patterson, so Cranley-Patterson
is still used when that feature is enabled. But it is now disabled otherwise,
since it increases noise.
* In place of Cranley-Patterson, multiple independent patterns are generated
and randomly chosen for different pixels and dimensions as described in the
original PMJ paper. In this patch, the pattern selection is done via
hash-based shuffling to ensure there are no repeats within a single pixel
until all patterns have been used.
The combination of these fixes brings the quality of Cycles' PMJ sampler in
line with the previously submitted Sobol-Burley sampler in D15679. They are
essentially indistinguishable in terms of quality/noise, which is expected
since they are both randomized (0,2) sequences.
Differential Revision: https://developer.blender.org/D15746
Use lowercase rgba channel names which still by-passes lossy nature
of DWA compression and which also keeps external compositing tools
happy.
Thanks Steffen Dünner for testing this patch!
Differential Revision: https://developer.blender.org/D15834
The DWA compression code in OpenEXR has hardcoded rules which decides
which channels are lossy or lossless. There is no control over these
rules via API.
This change makes it so channel names of xyzw is used for cryptomatte
passes in Cycles. This works around the hardcoded rules in the DWA code
making it so lossless compression is used. It is important to use lower
case y channel name as the upper case Y uses lossy compression.
The change in the channel naming also makes it so the write code uses
32bit for the cryptomatte even when saving half-float EXR.
Fixes T96933: Cryptomatte layers saved incorrectly with EXR DWA compression
Fixes T88049: Cryptomatte EXR Output Bit Depth should always be 32bit
Differential Revision: https://developer.blender.org/D15823
Leading to excessive memory usage compared to Blender 2.93. There's still
some avoidable memory usage remaining, due to the full float buffer in the
new image editor drawing and not loading the cached EXR from disk in tiles.
Main difficulty was handling multi-image baking and disk caches, which is
solved by associating a unique layer name with each image so it can be
matched when reading back the image from the disk.
Also some minor header changes to be able to use RE_MAXNAME in RE_bake.h.
This patch moves material indices from the mesh `MPoly` struct to a
generic integer attribute. The builtin material index was already
exposed in geometry nodes, but this makes it a "proper" attribute
accessible with Python and visible in the "Attributes" panel.
The goals of the refactor are code simplification and memory and
performance improvements, mainly because the attribute doesn't have
to be stored and processed if there are no materials. However, until
4.0, material indices will still be read and written in the old
format, meaning there may be a temporary increase in memory usage.
Further notes:
* Completely removing the `MPoly.mat_nr` after 4.0 may require
changes to DNA or introducing a new `MPoly` type.
* Geometry nodes regression tests didn't look at material indices,
so the change reveals a bug in the realize instances node that I fixed.
* Access to material indices from the RNA `MeshPolygon` type is slower
with this patch. The `material_index` attribute can be used instead.
* Cycles is changed to read from the attribute instead.
* BMesh isn't changed in this patch. Theoretically it could be though,
to save 2 bytes per face when less than two materials are used.
* Eventually we could use a 16 bit integer attribute type instead.
Ref T95967
Differential Revision: https://developer.blender.org/D15675
sycl/L0 runtime reports compute-runtime version since Intel graphics
driver 101.3268 on Windows, when querying driver version from sycl.
Prior to this driver, it was 0. Now we can bump minimum requirement to
this one and filter-out devices returning 0.
Maniphest Tasks: T100648
When 'm_render_target' was NULL, backbuffer_res would be used without
being assigned. While it seems likely this code-path is rarely used
(if at all), resolve the logical error.
Introduced in [0], checking the logic here, there seems to be no reason
a press event should ever run release logic, relocate break statement.
In practice this was unlikely to cause problems as peeking into press
events would need to fail, peeking into release would need to succeed.
Even so, better avoid accidental fall through in switch statements.
[0]: 6f158f834d
This patch is a response to T92588 and is implemented
as a Function/Shader node.
This node has support for Float, Vector and Color data types.
For Vector it supports uniform and non-uniform mixing.
For Color it now has the option to remove factor clamping.
It replaces the Mix RGB for Shader and Geometry node trees.
As discussed in T96219, this patch converts existing nodes
in .blend files. The old node is still available in the
Python API but hidden from the menus.
Reviewed By: HooglyBoogly, JacquesLucke, simonthommes, brecht
Maniphest Tasks: T92588
Differential Revision: https://developer.blender.org/D13749
Same as other build options, don't make it a hard requirement to have
Wayland libraries installed when it gets enabled by default.
Also fixes wayland-protocols not being found on the buildbot.
This allows individual users or Linux distributions to specify a directory
Cycles will automatically look for the OptiX include folder, to compile kernels
at runtime.
It is still possible to override this with the OPTIX_ROOT_DIR environment
variable at runtime.
Based on patch by Sebastian Parborg.
Ref D15792