Compare commits

..

232 Commits

Author SHA1 Message Date
a82f633ebb GPU: Cleanup GPU_batch.h documentation and some of the API for consistency
Documented all functions, adding use case and side effects.

Also replace the use of shortened argument name by more meaningful ones.

Renamed `GPU_batch_instbuf_add_ex` and `GPU_batch_vertbuf_add_ex` to remove
the `ex` suffix as they are the main version used (removed the few usage
of the other version).

Renamed `GPU_batch_draw_instanced` to `GPU_batch_draw_instance_range` and
make it consistent with `GPU_batch_draw_range`.
2023-02-11 12:52:58 -08:00
5a184f2ff0 Cleanup: Move 6 sculpt-session-related files and header to C++
To allow further mesh data structure refactoring. See #103343

Pull Request #104540
2023-02-11 12:52:55 -08:00
4a75d4b581 Fix #104383: don't update declaration for clipboard copy
When nodes are copied to the clipboard, they don't need their declaration.
For nodes with dynamic declaration that might depend on the node tree itself,
the declaration could not be build anyway, because the node-clipboard does
not have a node tree.

Pull Request #104432
2023-02-11 12:50:45 -08:00
13a5f5c926 Fix #104435: Fix rna_NlaStrip_new add strip logic to be correct boolean expression
Fixed #104435: Use correct conditional logic when testing if a new NLA strip can be added in the rna_NlaStrip_new method
2023-02-11 12:50:45 -08:00
ce2b6a5b44 Cleanup: Fix const correctness warning in recent commit 2023-02-11 12:50:45 -08:00
Kevin C. Burke
2c22895ee6 Fix: Sequencer "Pan" label using incorrect keyword 'heading_ctxt'
Oversight in db87e2a638

Reviewed By: ISS
Differential Revision: https://archive.blender.org/developer/D17213
2023-02-11 12:50:45 -08:00
4fc62cf0f1 Geometry Nodes: Edges to Face Groups Node
Add a new node that groups faces inside of boundary edge regions.
This is the opposite action as the existing "Face Group Boundaries"
node. It's also the same as some of the "Initialize Face Sets"
options in sculpt mode.

Discussion in #102962 has favored "Group" for a name for these
sockets rather than "Set", so that is used here.

Pull Request #104428
2023-02-11 12:50:44 -08:00
ed40f76fe2 Assets: Implement viewport drag and drop for geometry nodes
Currently there's no way to assign a geometry node group from the asset
browser to an object as a modifier without first appending/linking it
manually. This patch adds a drag and drop operator that adds a new
modifier and assigns the dragged tree.

Pull Request #104430
2023-02-11 12:50:44 -08:00
8f56c7835e Fix: Add missing "-" in logic to get the channel height
This was missed when doing the refactoring in #104500
It didn't seem to have any effect until I worked on clamping the view
2023-02-11 12:50:44 -08:00
3e8c82fb68 Mesh: Remove unnecessary edge draw flag
As described in #95966, replace the `ME_EDGEDRAW` flag with a bit
vector in mesh runtime data. Currently the the flag is only ever set
to false for the "optimal display" feature of the subdivision surface
modifier. When creating an "original" mesh in the main data-base,
the flag is always supposed to be true.

The bit vector is now created by the modifier only as necessary, and
is cleared for topology-changing operations. This fixes incorrect
interpolation of the flag as noted in #104376. Generally it isn't
possible to interpolate it through topology-changing operations.

After this, only the seam status needs to be removed from edges before
we can replace them with the generic `int2` type (or something similar)
and reduce memory usage by 1/3.

Related:
- 10131a6f62
- 145839aa42

In the future `BM_ELEM_DRAW` could be removed as well. Currently it is
used and aliased by other defines in some non-obvious ways though.

Pull Request #104417
2023-02-11 12:50:44 -08:00
b352d9986e Curves: Add box selection
This adds a `select_box` function for the `Curves` object. It is used in the `view3d_box_select` operator.

It also adds the basic selection tools in the toolbar of Edit Mode.

Authored-by: Falk David <falkdavid@gmx.de>
Pull Request #104411
2023-02-11 12:50:44 -08:00
8be7d35212 I18n: use format strings for Cycles version error messages
The required version numbers for various devices was hardcoded in the
UI messages. The result was that every time one of these versions was
bumped, every language team had to update the message in question.

Instead, the version numbers can be extracted, and injected into the
error messages using string formatting so that translation updates
need happen less frequently.

Pull Request #104488
2023-02-11 12:50:44 -08:00
6136c39b56 Refactor: remove yscale from bAnimContext
`bAnimContext` had a float property called `yscale_fac` that was used to define the height of the keyframe channels.

However the property was never set, only read so there really is no need to have it in the struct.

Moreover it complicated getting the channel height because `bAnimContext` had to be passed in.

Speaking of getting the channel height. This was done with macros. I ripped them all out and replaced them with function calls.

Originally it was introduced in this patch: https://developer.blender.org/rB095c8dbe6919857ea322b213a1e240161cd7c843

Co-authored-by: Christoph Lendenfeld <chris.lenden@gmail.com>
Pull Request #104500
2023-02-11 12:50:44 -08:00
760665e88b Fix freeing uninitialized pointer in GHOST/Wayland + X11 fallback
Freeing the timer manager didn't account for Wayland being partially
initialized.
2023-02-11 12:50:44 -08:00
70897a4fea Build: disable LTO for Python builds
LTO compiled libpython3.10.a failed to link with GCC 12.0,
disable since these libraries are intended for developers to link
against.
2023-02-11 12:50:44 -08:00
70d05ccd01 Build: enable Python optimizations (PGO & LTO) on Linux
This is used for most Python release builds and has been reported to
give a modest 5-10% speedup (depending on the workload).

This could be enabled on macOS too but needs to be tested.
2023-02-11 12:50:44 -08:00
c1fd0fa079 GPU: Fix assert when using light gizmo.
Blender was reporting that the GPU_TEXTURE_USAGE_HOST_READ wasn't set.
This is used to indicate that the textures needs to be read back to
CPU. Textures that don't need to be read back can be optimized by the
GPU backend.

Found during investigation of #104282.
2023-02-11 12:50:44 -08:00
6495e36779 Spelling: Assert message in GPU_texture_read. 2023-02-11 12:50:44 -08:00
66c7106672 Cleanup: solve compiler warnings.
Classes were predefined as structs.
2023-02-11 12:50:44 -08:00
8d8960001c Cycles: update Intel Graphics Compiler to 1.0.13064.7 on Linux
Linux side of 8afcecdf1f.

Reviewed by: LazyDodo, sergey, campbellbarton
Ref !104458, 16984
2023-02-11 12:50:44 -08:00
95da6fddcb Cleanup: Remove unused/redundant includes from BKE_curves.hh
Avoid including headers that are obviously redundant, and don't
include BLI_task.hh in the header file, since it isn't really related.
2023-02-11 12:50:44 -08:00
5fba2a156b Cleanup: update username in code-comments: campbellbarton -> ideasman42
Gitea migration changed my username, update code-comments.
2023-02-11 12:50:43 -08:00
c58e60af70 Cleanup: spelling in comments 2023-02-11 12:50:43 -08:00
0f3a8ff8f3 Cleanup: enum conversion compiler warnings 2023-02-11 12:50:43 -08:00
3aff175c27 PyAPI: minor change to rna_manual_reference loading
- Use bpy.utils.execfile instead of importing then deleting from
  sys.modules.
- Add a note for why keeping this cached in memory isn't necessary.

This has the advantage of not interfering with any scripts that import
`rna_manual_reference` as a module.
2023-02-11 12:50:43 -08:00
a6861475b6 EEVEE-Next: Shadows: Add global switch
This allow to bypass all cost associated with shadow mapping.

This can be useful in certain situation, such as opening a scene on a
lower end system or just to gain performance in some situation (lookdev).
2023-02-11 12:50:43 -08:00
02576a8b8e EEVEE-Next: Shadow: Fix issue with last merge
The merge with master updated the code to use the new matrix API. This
introduce some regressions.

For sunlights make sure there is enough tilemaps in orthographic mode
to cover the depth range and fix the level offset in perspective.
2023-02-11 12:50:43 -08:00
f5031dc7f4 Fix Cycles link error with debug/asan builds after recent bugfix
Pull Request #104487
2023-02-11 12:50:43 -08:00
fa55c85176 EEVEE-Next: Virtual Shadow Map initial implementation
Implements virtual shadow mapping for EEVEE-Next primary shadow solution.
This technique aims to deliver really high precision shadowing for many
lights while keeping a relatively low cost.

The technique works by splitting each shadows in tiles that are only
allocated & updated on demand by visible surfaces and volumes.
Local lights use cubemap projection with mipmap level of detail to adapt
the resolution to the receiver distance.
Sun lights use clipmap distribution or cascade distribution (depending on
which is better) for selecting the level of detail with the distance to
the camera.

Current maximum shadow precision for local light is about 1 pixel per 0.01
degrees.
For sun light, the maximum resolution is based on the camera far clip
distance which sets the most coarse clipmap.

## Limitation:
Alpha Blended surfaces might not get correct shadowing in some corner
casses. This is to be fixed in another commit.
While resolution is greatly increase, it is still finite. It is virtually
equivalent to one 8K shadow per shadow cube face and per clipmap level.
There is no filtering present for now.

## Parameters:
Shadow Pool Size: In bytes, amount of GPU memory to dedicate to the
shadow pool (is allocated per viewport).
Shadow Scaling: Scale the shadow resolution. Base resolution should
target subpixel accuracy (within the limitation of the technique).

Related to #93220
Related to #104472
2023-02-11 12:50:43 -08:00
df37a60c62 BLI: Math: Fix vector operator * with MutableMatView
This was caused by operator priority trying to use
`friend VecBase operator*(const VecBase &a, FactorT b)`.

Adding tests as these were not covered.
2023-02-11 12:50:43 -08:00
f7e4c3cb69 Fix Cycles debug build error after host falback changes
Introduced in dcfb6df9ce6.

Co-authored-by: Lucas Tadeu Teixeira <lucas@lucastadeu.com>

Pull Request #104454
2023-02-11 12:50:43 -08:00
909daaf629 Make update: Ignore submodules
The previous change in the .gitmodules made it so the `make update`
rejects to do its thing because it now sees changes in the submodules
and rejected to update, thinking there are unstaged changes.

Ignore the submodule changes, bringing the old behavior closer to
what it was.
2023-02-11 12:50:42 -08:00
28e9e40a8d Un-ignore modules in .gitmodules configuration
The meaning of the ignore option for submodules did change since our
initial Git setup was done: back then it was affecting both diff and
stage families of Git command. Unfortunately, the actual behavior did
violate what documentation was stating (the documentation was stating
that the option only affects diff family of commands). This got fixed
in Git some time after our initial setup and it was the behavior of the
commands changed, not the documentation. This lead to a situation when
we can no longer see that submodules are modified and staged, and it is
very easy to stage the submodules.

For the clarity: diff and status are both "status" family, show and
diff are "diff" family.

Hence this change: since there is no built-in zero-configuration way
of forbidding Git from staging submodules lets make it visible and
clear what the state of submodules is.

We still need to inform people to not stage submodules, for which
we can offer some configuration tips and scripts but doing so is
outside of the scope of this change at it requires some additional
research. Current goal is simple: make it visible and clear what is
going to be committed to Git.

This is a response to an increased frequency of incidents when the
submodules are getting modified and committed without authors even
noticing this (which is also a bit annoying to recover from).

Differential Revision: https://developer.blender.org/D13001
2023-02-11 12:50:42 -08:00
c5b0eba85b Fix references to the main branch in the .gitmodules 2023-02-11 12:50:42 -08:00
52a31a8ca9 Subdivision Surface: fix a serious performance hit when mixing CPU & GPU.
Subdivision surface efficiency relies on caching pre-computed topology
data for evaluation between frames. However, while eed45d2a23
introduced a second GPU subdiv evaluator type, it still only kept
one slot for caching this runtime data per mesh.

The result is that if the mesh is also needed on CPU, for instance
due to a modifier on a different object (e.g. shrinkwrap), the two
evaluators are used at the same time and fight over the single slot.
This causes the topology data to be discarded and recomputed twice
per frame.

Since avoiding duplicate evaluation is a complex task, this fix
simply adds a second separate cache slot for the GPU data, so that
the cost is simply running subdivision twice, not recomputing topology
twice.

To help diagnostics, I also add a message to show when GPU evaluation
is actually used to the modifier panel. Two frame counters are used
to suppress flicker in the UI panel.

Differential Revision: https://developer.blender.org/D17117

Pull Request #104441
2023-02-11 12:50:42 -08:00
d3f14bda16 Cleanup: use enum literals, order likely case first in polyfill_2d 2023-02-11 12:50:42 -08:00
3abceef389 Fix #103913: Triangulate sometimes creates degenerate triangles
The ear clipping method used by polyfill_2d only excluded concave ears
which meant ears exactly co-linear edges created zero area triangles
even when convex ears are available.

While polyfill_2d prioritizes performance over *pretty* results,
there is no need to pick degenerate triangles with other candidates
are available. As noted in code-comments, callers that require higher
quality tessellation should use BLI_polyfill_beautify.
2023-02-11 12:50:42 -08:00
ed334e642f Merge branch 'main' into temp-sculpt-roll-mapping 2023-02-07 18:15:15 -08:00
ef04a4262e Merge branch 'master' into temp-sculpt-roll-mapping 2023-02-07 18:15:08 -08:00
6aa1b5d031 Cleanup: format 2023-02-08 00:21:57 +01:00
5c994d7846 Fix #104297: Cycling geometry nodes viewer ignores sockets
Sockets after the geometry socket were ignored when cycling through
the node's output sockets. If there are multiple geometry sockets, the
behavior could still be refined probably, but this should at least make
basic non-geometry socket cycling work.
2023-02-07 16:01:54 -05:00
53b057aa09 Cleanup: Move 18 sculpt files to C++
To allow further mesh data structure refactoring. See #103343

Pull Request #104436
2023-02-07 21:56:45 +01:00
e817cff009 Release: support generating LTS release notes from Gitea
Now a single script to generate both links and release notes. It also includes
the issue ID for the LTS releases, so only the release version needs to be
specified.

Pull Request #104402
2023-02-07 21:23:24 +01:00
41ddd3d732 Fix: Experimental Panel links modified for Gitea
Modifies the links to point to the new developer site.

Pull Request #104425
2023-02-07 19:54:43 +01:00
f5552d759c Fix compiler error 2023-02-07 18:32:24 +01:00
f01bf82480 Curves: Add select pick operator
This adds the `select_pick` function for to `Curves` objects.
It is used in the common `view3d_select` operator.

Pull Request #104406
2023-02-07 17:50:39 +01:00
8d9d16fb53 Fix #104396: Blender crashes when moving Keyframes in Graph Editor
`t->region->gizmo_map` can be `nullptr`.

Caused by 19b63b932d
2023-02-07 12:59:35 -03:00
349350b304 Fix T104390: Regression: Object selection in viewport is not working
Caused by alignment difference between C and C++. Asan caught the issue
on startup.

Removing the unused view matrix storage copy avoids this problem.
2023-02-07 16:45:47 +01:00
cb5318b651 Docs: change Git URLs to point projects.blender.org instead of git.blender.org 2023-02-07 14:23:05 +01:00
bd6b0bac88 Update references to the new projects platform and main branch 2023-02-07 14:18:19 +01:00
3002670332 Fix T104368: Incorrect tooltip text in Blender 3.4.1's Preferences > File Paths > Scripts field.
Use backticks to cleary identify 'path' parts of this tooltip.
2023-02-07 09:50:50 +01:00
f086cf3cea Cleanup: remove redundant parenthesis 2023-02-07 17:34:20 +11:00
2609ca2b8e Cleanup: tweaks to cycles/metal preferences
- Auto-format.
- Use raw string for regex.
- Remove redundant assignment.
- Remove duplicate arm64 check.
- Break early out of loop.
2023-02-07 17:30:13 +11:00
b-init
7e8153b07d Keymap: support default shortcut to toggle overlays in all space-types
UV Editor, Image Editor & Sequencer didn't have a shortcut for toggling
overlays. Use the same shortcut as the 3D viewport.

Ref D16959
2023-02-07 16:54:06 +11:00
622cad7073 Cleanup: minor tweak to recent fix for T10438
Minor change to [0], prefer calling em_setup_viewcontext,
even though there is no functional difference at the moment,
if this function ever performs additional operations than assigning
`ViewContext.em`, it would have to be manually in-lined in
`view3d_circle_select_recalc`.

[0]: 430cc9d7bf
2023-02-07 16:18:30 +11:00
44daeaae7d Cleanup: use arg instead of param for generated sphinx docs 2023-02-07 15:14:22 +11:00
db8b5a2316 PyDoc: remove deprecated dpi argument from BLF example 2023-02-07 15:12:05 +11:00
dbca0cc9d5 Fix crash on exit under Wayland
Order of free error from [0] caused the timer manager
to be freed before the timer.

[0]: 7de1a4d1d8
2023-02-07 15:12:05 +11:00
e4f77c1a6c Cleanup: format 2023-02-07 16:57:35 +13:00
Jon Denning
e27c89c7c7 Docs: added missing documentation for WindowManager methods
Added missing documentation for `draw_cursor_add` and
`draw_cursor_remove` methods for `WindowManager`.

Differential Revision: https://developer.blender.org/D14860
2023-02-06 22:40:10 -05:00
af5706c960 Docs: improve doc-string for WM_operator_flag_only_pass_through_on_press
The doc-string didn't provide any context for how the funciton is
intended to be used.
2023-02-07 14:18:59 +11:00
a99022e22d Cleanup: spelling in comments 2023-02-07 14:17:01 +11:00
d5af895419 Fix missing matrix includes 2023-02-07 14:07:21 +11:00
Jason Fielder
8703db393b Metal: Ensure explicit return after discard to eliminate differences in behaviour between GPUs.
Discard is not always treated as an explicit return and flow control can continue for required derivative calculations. This behaviour is different in Metal vs OpenGL. Adding return after discards ensures consistency in expectation as behaviour is well-defined.

Authored by Apple: Michael Parkin-White

Ref T96261

Reviewed By: fclem

Maniphest Tasks: T96261

Differential Revision: https://developer.blender.org/D17199
2023-02-07 00:58:06 +01:00
Jason Fielder
f152159101 Metal: Guard advanced command buffer debugging behind OS version flag.
Authored by Apple: Michael Parkin-White

Ref T96261

Reviewed By: fclem

Maniphest Tasks: T96261

Differential Revision: https://developer.blender.org/D17181
2023-02-07 00:51:16 +01:00
d3500c482f Cleanup: Move DRW_pbvh.h header to C++
For continued refactoring of the Mesh data structure. See T103343.
2023-02-06 16:52:02 -05:00
3a1583972a Fix T104256: Curve to points node skips curve domain attributes
7536abbe16 forgot to port the curve domain attributes.
2023-02-06 16:30:25 -05:00
6dcfb6df9c Cycles: Abstract host memory fallback for GPU devices
Host memory fallback in CUDA and HIP devices is almost identical.
We remove duplicated code and create a shared generic version that
other devices (oneAPI) will be able to use.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D17173
2023-02-06 22:19:32 +01:00
b0b9e746fa BLI: Use BLI_math_matrix_type.hh instead of BLI_math_float4x4.hh
Straightforward port. I took the oportunity to remove some C vector
functions (ex: copy_v2_v2).

This makes some changes to DRWView to accomodate the alignement
requirements of the float4x4 type.
2023-02-06 21:25:45 +01:00
8be3fcab7e Fix tests after recent commit.
`9c14039a8f4b5f` broke blenlib tests in release builds, due to how
`EXPECT_BLI_ASSERT` works (in release builds it just calls the given
function, so if that crashes then the test fails).

For now remove that check in the test.
2023-02-06 21:14:20 +01:00
d20f992322 Mesh: Avoid copying positions in object mode modifier stack
Remove the use of a separate contiguous positions array now that
they are stored that way in the first place. This allows removing the
complexity of tracking whether it is allocated and deformed in the
mesh modifier stack.

Instead of deferring the creation of the final mesh until after the
positions have been copied and deformed, create the final mesh
first and then deform its positions.

Since vertex and face normals are calculated lazily, we can rely on
individual modifiers to calculate them as necessary and simplify
the modifier stack. This was hard to change before because of the
separate array of deformed positions.

Differential Revision: https://developer.blender.org/D16971
2023-02-06 14:34:29 -05:00
a38d99e0b2 Fix (unreported): Transform gizmo not restoring when changing mode
When activating a rotation with the Transform gizmo for example, some
gizmos are hidden but they don't reappear when changing the mode.

Make sure the gizmos corresponding to the mode always reappear.
2023-02-06 16:20:12 -03:00
75db4c082b Cleanup: Use BitVector instead of BLI_bitmap for subsurf face dot tags
For better type safety and more automatic memory management.
2023-02-06 14:13:53 -05:00
Michael Jones
2d994de77c Cycles: MetalRT optimisation for subsurface intersection queries
This patch optimises subsurface intersection queries on MetalRT. Currently intersect_local traverses from the scene root, retrospectively discarding all non-local hits. Using a lookup of bottom level acceleration structures, we can explicitly query only the relevant instance. On M1 Max, with MetalRT selected, this can give a render speedup of 15-20% for scenes like Monster which make heavy use of subsurface scattering.

Patch authored by Marco Giordano.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D17153
2023-02-06 19:12:29 +00:00
961d99d3a4 Minor comments about current usages of BKE_main_namemap_destroy. 2023-02-06 19:36:36 +01:00
430cc9d7bf Fix T104381: Assert on Circle Select end modal
`em_setup_vivewcontext` cannot be used in this function now as it
expects `obedit` to be a mesh. It also duplicated the viewcontext init.
Instead `BKE_editmesh_from_object` is called only when type is a mesh.
2023-02-06 19:32:14 +01:00
a3551ed878 BKE main namemap: Add a way to clear all namemaps of a given Main.
Existing `BKE_main_namemap_destroy` is too specific when a entire Main
needs to have its namemaps cleared, since it would not handle the
Library ones.

While in regular situation current code is fine, ID management code that
may also edit linked data needs a wider, simpler clearing tool.
2023-02-06 19:29:21 +01:00
288b13b252 BKE lib remapper: Add new util to overwrite an existing mapping.
Current `BKE_id_remapper_add` would not replace an already existing
mapping rule, now `BKE_id_remapper_add_overwrite` allows that behavior
if necessary.
2023-02-06 19:29:21 +01:00
85b2bce037 BKE libremap: do not assert on identity pairs.
If identity pairs (i.e. old ID pointer being same as new one) was
forbidden, then this should be asserted against in code defining
remapping, not in code applying it.

But it is actually sometimes usefull to allow/use identity pairs, so
simply early-return on these instead of asserting.
2023-02-06 19:29:21 +01:00
c7b601c79e Cleanup: Subdiv: Use index instead of pointer
The edge pointer was retrieved from the index and then just
used to calculate the index again. Remove the edge pointer
and only use the index.
2023-02-06 12:29:01 -05:00
81f8d74f6d EEVEE: Cleanup: light power / radiance code and document it
All thanks to @weizhen for spoting the inconsistencies.
2023-02-06 18:23:34 +01:00
9c14039a8f BLI ListBase: Add new 'SplitAfter' given link util.
Allows to easily split a ListBase by moving the part after given link
into a new ListBase.
2023-02-06 17:58:40 +01:00
39e0bbfa55 BKE libquery: Add option to also process ID.lib pointer.
This was never needed before, but upcomming work for Brush Asset project
requires it.
2023-02-06 17:31:01 +01:00
8fe4f3b756 Minor code doc improvement. 2023-02-06 17:19:02 +01:00
Iliya Katueshenock
a86f657692 Fix T104233: crash when deleting a group node that is displayed by other editor
Differential Revision: https://developer.blender.org/D17172
2023-02-06 16:18:16 +01:00
8adebaeb7c Modifiers: measure execution time and provide Python access
The goal is to give technical artists the ability to optimize modifier usage
and/or geometry node groups for performance. In the long term, it
would be useful if Blender could provide its own UI to display profiling
information to users. However, right now, there are too many open
design questions making it infeasible to tackle this in the short term.

This commit uses a simpler approach: Instead of adding new ui for
profiling data, it exposes the execution-time of modifiers in the Python
API. This allows technical artists to access the information and to build
their own UI to display the relevant information. In the long term this
will hopefully also help us to integrate a native ui for this in Blender
by observing how users use this information.

Note: The execution time of a modifier highly depends on what other
things the CPU is doing at the same time. For example, in many more
complex files, many objects and therefore modifiers are evaluated at
the same time by multiple threads which makes the measurement
much less reliable. For best results, make sure that only one object
is evaluated at a time (e.g. by changing it in isolation) and that no
other process on the system keeps the CPU busy.

As shown below, the execution time has to be accessed on the
evaluated object, not the original object.

```lang=python
import bpy
depsgraph = bpy.context.view_layer.depsgraph
ob = bpy.context.active_object
ob_eval = ob.evaluated_get(depsgraph)
modifier_eval = ob_eval.modifiers[0]
print(modifier_eval.execution_time, "s")
```

Differential Revision: https://developer.blender.org/D17185
2023-02-06 15:40:15 +01:00
773a36d2f8 Fix Cycles OneAPI build error after recent changes 2023-02-06 15:36:49 +01:00
cc623ee7b0 Transform: do not save settings when canceling the operation
If we are canceling, the settings must remain the same as before we
start the operation.
2023-02-06 11:18:52 -03:00
deaddbdcff Fix forced snap status being removed when changing transform mode
The `SNAP_FORCED` setting is set to the operation and not the snap
status.

Therefore, this option should not be cleared along with the other
statuses when resetting snapping.

Move then the location of this setting to `TransInfo::modifiers`.
2023-02-06 11:18:52 -03:00
f2538c7173 Fix T104335: MNEE + OptiX OSL results in illegal address error
The OptiX pipeline created for OSL was missing sufficient continuation
stack to handle the MNEE ray generation program.
2023-02-06 15:06:52 +01:00
4bd3b02984 Python: Suppress BGL deprecation messages after 100 times.
BGL deprecation calls used to be reported on each use. As bgl calls
are typically part of a handler that is triggered at refresh this
could lead to overflow of messages and slowing down systems when
the terminal/console had to be refreshed as well.

This patch only reports the first 100 bgl deprecation calls. This
gives enough feedback to the developer that changes needs to be made
. But still provides good responsiveness to users when they have
such add-on enabled. Only the first frames can have a slowdown.
2023-02-06 13:35:29 +01:00
7beb487e9a Fix T104353: Crash on opening sculpting template
`t->region` was `NULL`.

It can happen depending on the context.

Caused by rB19b63b932d2b.
2023-02-06 09:21:04 -03:00
9ad3a85f8b Fix Cycles GPU binaries build error after recent changes for Metal 2023-02-06 13:17:57 +01:00
Michael Jones
654e1e901b Cycles: Use local atomics for faster shader sorting (enabled on Metal)
This patch adds two new kernels: SORT_BUCKET_PASS and SORT_WRITE_PASS. These replace PREFIX_SUM and SORTED_PATHS_ARRAY on supported devices (currently implemented on Metal, but will be trivial to enable on the other backends). The new kernels exploit sort partitioning (see D15331) by sorting each partition separately using local atomics. This can give an overall render speedup of 2-3% depending on architecture. As before, we fall back to the original non-partitioned sorting when the shader count is "too high".

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16909
2023-02-06 11:18:26 +00:00
Michael Jones
46c9f7702a Cycles: Enable MetalRT opt-in for AMD/Navi2 GPUs
Reviewed By: brecht

Differential Revision: https://developer.blender.org/D17043
2023-02-06 11:14:11 +00:00
Michael Jones
be0912a402 Cycles: Prevent use of both AMD and Intel Metal devices at same time
This patch removes the option to select both AMD and Intel GPUs on system that have both. Currently both devices will be selected by default which results in crashes and other poorly understood behaviour. This patch adds precedence for using any discrete AMD GPU over an integrated Intel one. This can be overridden with CYCLES_METAL_FORCE_INTEL.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D17166
2023-02-06 11:13:33 +00:00
Michael Jones
0a3df611e7 Fix T103393: Cycles: Undefine __LIGHT_TREE__ on Metal/AMD to fix perf
This patch fixes T103393 by undefining `__LIGHT_TREE__` on Metal/AMD as it has an unexpected & major impact on performance even when light trees are not in use.

Patch authored by Prakash Kamliya.

Reviewed By: brecht

Maniphest Tasks: T103393

Differential Revision: https://developer.blender.org/D17167
2023-02-06 11:12:34 +00:00
Amelie Fondevilla
6d297c35c8 Fix T104371: GPencil merge down layer duplicates wrong frame
The merge down operator was sometimes copying the wrong frame, which altered the animation.
While merging the layers, it is sometimes needed to duplicate a keyframe,
when the lowest layer does not have a keyframe but the highest layer does.
Instead of duplicating the previous keyframe of the lowest layer, the code
was actually duplicating the active frame of the layer which was the current frame in the timeline.

This patch fixes the issue by setting the previous keyframe of the layer as its active frame before duplication.

Related issue: T104371.

Differential Revision: https://developer.blender.org/D17214
2023-02-06 10:44:17 +01:00
329eeacc66 Cleanup: Cycles: Remove isotropic microfacet closure setup functions
Turns out these are 100% redundant, so get rid of them.
2023-02-06 04:26:36 +01:00
2627635ff3 Cleanup: use nullptr in C++ 2023-02-06 12:50:34 +11:00
d6b6050e5b Cleanup: use function style casts in C++ 2023-02-06 12:35:51 +11:00
731c3efd97 Cleanup: format 2023-02-06 12:32:45 +11:00
9f5c17f4af Cleanup: comments in code 2023-02-06 12:25:04 +11:00
4fcc9f5e7e Cleanup: use back-slash doxygen commands, de-duplicate doc-string 2023-02-06 12:25:04 +11:00
7de1a4d1d8 Fix GHOST/Wayland thread-unsafe timer-manager manipulation
Mutex locks for manipulating GHOST_System::m_timerManager from
GHOST_SystemWayland relied on WAYLAND being the only user of the
timer-manager.

This isn't the case as timers are fired from
`GHOST_System::dispatchEvents`.

Resolve by using a separate timer-manager for wayland key-repeat timers.
2023-02-06 12:25:04 +11:00
d3949a4fdb Fix GHOST/Wayland thread-unsafe key-repeat timer checks
Resolve a thread safety issue reported by valgrind's helgrind checker,
although I wasn't able to redo the error in practice.

NULL check on the key-repeat timer also needs to lock, otherwise it's
possible the timer is set in another thread before the lock is acquired.

Now all key-repeat timer access which may run from a thread
locks the timer mutex before any checks or timer manipulation.
2023-02-06 12:25:04 +11:00
b642dc7bc7 Fix: Incorrect forward-compatible saving of face sets
There were two errors with the function used to convert face sets
to the legacy mesh format for keeping forward compatibility:
- It was moved before `CustomData_blend_write_prepare` so it
  operated on an empty span.
- It modified the mesh when it's only supposed to change the copy
  of the layers written to the file.

Differential Revision: https://developer.blender.org/D17210
2023-02-05 18:09:22 -05:00
501352ef05 Cleanup: Move PBVH files to C++
For continued refactoring of the Mesh data structure. See T103343.
2023-02-05 17:36:47 -05:00
3dd3a0f02e temp-sculpt-roll-mapping: Fix deadlock 2023-01-24 08:54:12 -08:00
3ca4ec3857 Merge branch 'master' into temp-sculpt-roll-mapping 2023-01-24 08:30:06 -08:00
62af8ae57b Merge branch 'master' into temp-sculpt-roll-mapping 2023-01-12 19:34:16 -08:00
bddb691dfc temp-sculpt-roll-mapping: Fix texture rotation bugs 2023-01-03 10:34:20 -08:00
a489b73a38 Merge branch 'master' into temp-sculpt-roll-mapping 2023-01-03 09:18:36 -08:00
a238533550 temp-sculpt-roll-mapping: Visualization improvements
* Only draw curve line at start of stroke
* Fixed dash
2022-12-30 15:12:22 -08:00
4faa5e30a5 Merge remote-tracking branch 'origin' into temp-sculpt-roll-mapping 2022-12-30 14:22:01 -08:00
2157115fa6 temp-sculpt-roll-mapping: Fix merge errors 2022-12-30 14:21:26 -08:00
8931df291a Merge branch 'master' into temp-sculpt-roll-mapping 2022-12-29 12:25:05 -08:00
d15324f1c3 temp-sculpt-roll-mapping: update more comments 2022-12-18 05:45:37 -08:00
90b633967b Merge remote-tracking branch 'origin' into temp-sculpt-roll-mapping 2022-12-17 20:58:31 -08:00
e9b573e11d temp-sculpt-roll-mapping: Cleanup some comments 2022-12-17 20:58:10 -08:00
ce56101897 Merge branch 'master' into temp-sculpt-roll-mapping 2022-12-15 17:56:57 -08:00
733a764b07 temp-sculpt-roll-mapping: Implement symmetry/tile/radial modes
* Note that tile + radial does not work
* Also cleaned up the code a bit.
2022-11-20 00:58:27 -08:00
999ba46d11 Merge remote-tracking branch 'origin' into temp-sculpt-roll-mapping 2022-11-20 00:01:41 -08:00
1fa19af508 Merge remote-tracking branch 'origin' into temp-sculpt-roll-mapping 2022-11-17 16:50:53 -08:00
997ff01503 Merge branch 'master' into temp-sculpt-roll-mapping 2022-11-17 01:02:26 -08:00
88f2350c34 temp-sculpt-roll-mapping: Update comment 2022-11-17 00:43:07 -08:00
33a1472d4e temp-sculpt-roll-mapping: New strategy for spline tex mapping
Decomposing into a quadratic spline did not work.  Having
to test each segment individually for the closest point test
wasn't worth it.

New strategy is to (roughly) find the inflection points on
the curve and binary search them for the closest
point test.
2022-11-17 00:06:26 -08:00
7a6e2d1e39 temp-sculpt-roll-mapping: remove debug draw points 2022-11-16 00:45:02 -08:00
f8afdc971f temp-sculpt-roll-mappings: Experimental quadratic spline code
Doesn't quite work yet.  The idea is to directly solve for
curve texture coordinates using a quadratic spline.
2022-11-16 00:40:46 -08:00
9a8c4aefb3 temp-sculpt-roll-mapping: mirror symmetry fixes 2022-11-15 10:40:09 -08:00
af4048f2b8 temp-sculpt-roll-mapping: Fix stroke points projection outside of the
mesh
2022-11-15 09:13:31 -08:00
eecd4b69f2 Sculpt: Fix mask from cavity settings issues
Mask from cavity can now pull settings from three
places: the operator properties, scene tool settings
or the brush.  This is needed to make the "create mask"
button work as expected.
2022-11-15 08:44:03 -08:00
12c5ae7e06 GPU: Update Vulkan backend with latest API changes.
UniformBuf::bind_as_ssbo has been introduced.
2022-11-15 08:44:02 -08:00
852a9ffa3d Fix T102482: Crash loading geometry nodes assets after file load
Asset library data is destructed on file load. Asset lists (weak and
hopefully temporary design) contain pointers into it that would dangle
then. Make sure the asset lists are destructed before the asset library
data.
2022-11-15 08:44:02 -08:00
893565268d Fix T102512: Snapping Option Face Nearest is not working
Caused by misuse of the `GS` macro.

Error in rB8f4e52b7e0dd
2022-11-15 08:44:02 -08:00
Philipp Oeser
00113d6888 Report messages from override ID template to the UI.
Previously these were only written to the console with no UI feedback
whatsoever. Just a bit nicer to give the user some info that something
went wrong.

See also T102495.

Differential Revision: https://developer.blender.org/D15732
2022-11-15 08:44:02 -08:00
9826204157 Fix WM_report not printing into console.
Also make it print warnings, not only errors.

Related to D15732.
2022-11-15 08:44:01 -08:00
295dc61334 Fix T102495: Fix UI-invisible warning and failing liboverride creation in IDTemplate in some cases.
Replace `RNA_warning` uage by `WM_report`.

And allow creating liboverrides of linked obdata used by purely local
objects.
2022-11-15 08:44:01 -08:00
fc4c8a3647 Cleanup: Move OptiX denoiser code from device into denoiser class
Cycles already treats denoising fairly separate in its code, with a
dedicated `Denoiser` base class used to describe denoising
behavior. That class has been fully implemented for OIDN
(`denoiser_oidn.cpp`), but for OptiX was mostly empty
(`denoiser_optix.cpp`) and denoising was instead implemented in
the OptiX device. That meant denoising code was split over various
files and directories, making it a bit awkward to work with. This
patch moves the OptiX denoising implementation into the existing
`OptiXDenoiser` class, so that everything is in one place. There are
no functional changes, code has been mostly moved as-is. To
retain support for potential other denoiser implementations based
on a GPU device in the future, the `DeviceDenoiser` base class was
kept and slightly extended (and its file renamed to
`denoiser_gpu.cpp` to follow similar naming rules as
`path_trace_work_*.cpp`).

Differential Revision: https://developer.blender.org/D16502
2022-11-15 08:44:01 -08:00
2e4ed11dd3 Fix (unreported) broken args handling in install_deps.
Issue likely introduced when openpgl lib was added a few months ago.
2022-11-15 08:44:01 -08:00
f46b932c59 GPU: UniformBuffer: Add possibility to bind as SSBO
This way UBOs can be modified directly in shader just like VBOs and IBOs.
2022-11-15 08:44:01 -08:00
d67cf5d288 GPU: State: Add GPU_BARRIER_UNIFORM
This allows to synchronise uniform buffer writes from compute shader
when an UBO is bound as SSBO.
2022-11-15 08:44:00 -08:00
0a16677741 GPU: Enabled Metal test cases.
This commit enabled the metal gpu backend test cases. These test cases
will currently fail, but are by default disabled.
2022-11-15 08:44:00 -08:00
f37cbba8f1 GPU: Improve Codegen variable names
Include the node name and parameter index in the variable name for easier debugging.
(Enabled for debug builds only)

Reviewed By: fclem, jbakker

Differential Revision: https://developer.blender.org/D16496
2022-11-15 08:44:00 -08:00
e2449adfcd Fix T101562: GPU: Update fmod to match Cycles/OSL behavior
Implementation from:
https://github.com/AcademySoftwareFoundation/OpenShadingLanguage/blob/main/src/include/OSL/dual.h#L1283
https://github.com/OpenImageIO/oiio/blob/master/src/include/OpenImageIO/fmath.h#L1605

Reviewed By: fclem

Maniphest Tasks: T101562

Differential Revision: https://developer.blender.org/D16497
2022-11-15 08:44:00 -08:00
63e630b703 install_deps: Update openpgl to 0.4.1.
Ref. T101403.
2022-11-15 08:44:00 -08:00
c2499366f2 Cleanup: fix compile error in leak detection utility 2022-11-15 08:43:59 -08:00
b30e20d0d9 Fix T101536: missing node tree updates after remapping id
The main problem is that the node tree was not properly updated
after a property in the tree changed. More specifically, the collection
pointer in the Collection Info node was cleared, but the node tree
was not updated after that (usually this is handled by rna updates).

Differential Revision: https://developer.blender.org/D16289
2022-11-15 08:43:59 -08:00
Damien Picard
51976a9e4c I18n: fix UI layout operator context extraction
Some operator layout buttons with custom text did not get the proper
context: they'd get * instead of Operator.

This was probably never noticed because the only operator that
actually had this issue was Import Images as Planes in the Add menu.

Reviewed By: mont29

Differential Revision: https://developer.blender.org/D15993
2022-11-15 08:43:59 -08:00
Damien Picard
6749cec22f I18n: fix description_from_data_path() for property assignment
The property assignment operators' tooltips were never actually
translated when they were constructed dynamically from the description
in the prop's RNA.

This was visible when using such operators in menus (example I found
was the Marker Settings, Shift + E in the Movie Clip Editor).

"%s: %s" is already extracted elsewhere, might as well use it.

Reviewed By: mont29

Differential Revision: https://developer.blender.org/D16439
2022-11-15 08:43:59 -08:00
Damien Picard
f5877da993 I18n: improve keymap translations
- The label for modal keymaps was extracted but did not use the proper
  context on translation.
- Same goes for modal keymap items.
- Extract the UI messages from rna_keymap_ui.py
- Translate global keymap names.
- Use the proper context in the status bar for the tool prompt operator

Ref T102071

Maniphest Tasks: T102071

Differential Revision: https://developer.blender.org/D16348
2022-11-15 08:43:59 -08:00
42dd5cc430 Fix T101775: grease pencil keyframe not filtered in dopesheet summary
Grease pencil data keyframes were listed twice in the summary.

First by the generic object data listing,
which did not handle properly grease pencil objects,
and did not account for the grease pencil filter.
Second by the specific grease pencil function.

Now only the second call is made,
and the filter hides keyframes in summary as well.

Reviewed By : Jeroen Bakker, Falk David

Differential Revision: https://developer.blender.org/D16369
2022-11-15 08:43:58 -08:00
e16f473523 Animation: rearrange grease pencil channels in the main dopesheet
Operations to rearrange channels in the main dopesheet
did not cover grease pencil layer channels.
Now grease pencil layer channels can be moved up and down
in the main dopesheet just like other channels.

Reviewed By: Sybren A. Stüvel

Differential Revision: https://developer.blender.org/D15542
2022-11-15 08:43:58 -08:00
a97e2fd4e6 Fix hair/curve drawing artifacts when workarounds are enabled.
Regression introduced by {rB601995c3b86986cf8f8e5b6e5a65bcfa7f8f2e32}.
Noticed by Heist project as they render final frames with workarounds enabled.

The mentioned patch introduces attaching VBO as textures. This used to be
done by the caller. The mechanism used a different order hence the VBO
could still be unbound when using. This cannot be solved inside the new
mechanism clearly so this patch will just bind when the buffer isn't bound
just before the drawing command is sent to the GPU driver.
2022-11-15 08:43:58 -08:00
b0e48cb936 Cleanup: format 2022-11-15 08:43:58 -08:00
dc14353791 Cleanup: quiet unused variable warning 2022-11-15 08:43:57 -08:00
03218f7b45 GHOST/Wayland: limit the size of the events_pending vector
Avoid keeping allocated overly large events_pending vector in case of
long delays between processing events.

While in practice this isn't likely to cause problems, it's better to
avoid keeping unnecessarily large allocations.

Also remove invalid comment.
2022-11-15 08:43:57 -08:00
472762f0dc Fix T100855: Input while Blender is unresponsive exits under Wayland
Consume events in a thread to prevent Wayland's event buffer from
overflowing Waylands internal buffer and closing the connection.
From a users perspective this seemed like a crash.

Details:

- This is a workaround for a known bug in Wayland [0].
  Threaded event handling has been if-defed so it can be removed when
  it's no longer needed.

- GTK & QT use threaded event handling to avoid this problem
  (SDL on the other hand doesn't).

- The complexity and number of locks needed to handle events in a
  separate thread is a significant down-side, but as far as I can see
  this is necessary.

- Re-connecting to the Wayland server is possible but not practical as
  the OpenGL context is lost and as far as I can tell it's not possible
  to keep it active (see: D16492).

[0]: https://gitlab.freedesktop.org/wayland/wayland/-/issues/159
2022-11-15 08:43:57 -08:00
6868d7b74a GHOST/Wayland: share logic for xdg/libdecor window configuration
Split frame & frame_pending structs, making window actions simpler
to reason about.
2022-11-15 08:43:57 -08:00
a3c4e28df9 GHOST/Wayland: only use a single swapBuffers for libdecor redrawing
Using a single draw works in my tests and I couldn't reproduce the
issue noted in the comment.

Also apply minor cleanup, assigning a variable before calling methods to
reduce diff-noise in planned changes.
2022-11-15 08:43:57 -08:00
Iliya Katueshenock
444ce7c3b7 Geometry Nodes: Image Info Node
This commit adds a new "Image Info" node to retrieve various
information from an image like its width, height, and whether
it has an alpha channel. It is also possible to retrieve the FPS
and frame count of video files.

Differential Revision: https://developer.blender.org/D15042
2022-11-15 08:43:56 -08:00
Mattias Fredriksson
24044c64ba Partial Fix T102428: Invertible Bezier curve trim within one segment
Adjusts behavior for trimming Bezier curves, specifically the outer
Bezier handles for the endpoints which do not influence the actual
curve. Handles are only adjusted if they lie within the same segment
with at most one endpoint being a control point (unless they are the
same in which handles are set to the point itself). The result yields
a curve in which the trim result can be inverted by re-setting the
cyclic property for the curve using 'Set Spline Cyclic' node
(iff both trim endpoints lie within a segment).

Differential Revision: https://developer.blender.org/D16488
2022-11-15 08:43:56 -08:00
Laurynas Duburas
9dd0899298 Fix: Versioning problem with cyclic bezier NURBS
Patch fixes versioning issue with NURBS files saved with Blender
version from commit 45d038181a to 0602852860 and opened with
Blender version from commit 0602852860. Cyclic Bezier NURBS
saved and then opened with Blender versions mentioned above
changed their shape.

Bug was reported in comments of T101160, circle problem.

Differential Revision: https://developer.blender.org/D16503
2022-11-15 08:43:56 -08:00
0783789a95 Fix T102502: Collada import sets incorrect material indices
The weird code dealing with `MeshPrimitive` didn't increment the
material indices pointer for geometry types besides triangle fans.
Also use a proper accessor to avoid adding a duplicate material
indices attribute, just in case this code is used on existing meshes.
2022-11-15 08:43:56 -08:00
7e1898cfa7 Cleanup: Resolve unused variable warning in draw module 2022-11-15 08:43:56 -08:00
a67f703ff3 Fix T100491: Mouse selection is inaccurate in NLA Editor
Selection range is +/-7 pixels to actual clicked position, but strip selection
was biased towards rightmost strip.

To make selection more intuitive, select closest strip to clicked position, and
stop iterating when strip intersects clicked pixel.

Reviewed By: sybren

Differential Revision: https://developer.blender.org/D15728
2022-11-15 08:43:56 -08:00
a504c9a69e Python API: add Image.save() optional parameters quality and filepath
To override the default quality and filepath. After changes to unify the image
operator and this method, the quality changed from 75 to 90 to make both
consistent. However there was no way to lower the quality to match the previous
behavior, this adds support for that.

Ref T102421
2022-11-15 08:43:55 -08:00
e86612d442 Fix T102421: Image.save() not respecting set image.file_format
The default file type for new ImBuf is already PNG, no need to override it
again in the save method.
2022-11-15 08:43:55 -08:00
63d4f862ce Cleanup: BVH utils: Remove print, use spans instead of pointers 2022-11-15 08:43:55 -08:00
8a19fcc500 Cycles: Enable MetalRT pointclouds & other fixes
Code authored by Marco Giordano.

This fixes pointcloud rendering on MetalRT and some other subtle MetalRT bugs:
- Incorrect kernel hashing
- Missing specialisation constants
- Incorrect visibility filtering
- Missing null pointer check

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16499
2022-11-15 08:43:55 -08:00
ed4e04a1f6 Usual minor fixes to UI messages & i18m utils module. 2022-11-15 08:43:55 -08:00
96e93cc70c Mesh: Avoid calculating normals when building BVH tree
Though they are sometimes used by users of the BVH tree, mostly
vertex normals when building the BVH tree is unnecessary. Skip it
instead and avoid storing the vertex normals in the BVH tree cache.
They are just calculated in the few places they are actually needed.
This should save at least a few percent of the runtime in some cases
where the normals weren't needed otherwise.
2022-11-15 08:43:54 -08:00
85dc681fc3 Cleanup: Reduce indentation and variable scope in BVH utils 2022-11-15 08:43:54 -08:00
e442a17587 Cleanup: Use C++ BitVector instead of BLI_bitmap for BVH utils
This gives a friendlier interface, an inline buffer, RAII, etc.
Also switch some BMesh functions that were only used by the snap
system's use of BVH utils.
2022-11-15 08:43:54 -08:00
f4e4a0b5b3 Cleanup: Allow using C++ features in BMesh header functions
Generally the `extern "C" {` brackets shouldn't be added around other
headers since it causes problems when using C++ features in them.
Follow that convention for the "bmesh.h" header.
2022-11-15 08:43:54 -08:00
dd669dd703 Cleanup: Move bmesh_iterators.c to C++ 2022-11-15 08:43:53 -08:00
adf4dd8355 Cleanup: Braces around initialization warning
There is no need to initialize the bounding box as it is fully
initialized by the BKE_boundbox_init_from_minmax().
2022-11-15 08:43:53 -08:00
Ejner Fergo
9b36b846f0 Gizmo toggle for Movie Clip Editor
This patch adds a "Show Gizmo" toggle to the Movie Clip Editor header, for consistency with other editors.

{F13892765}

Differential Revision: https://developer.blender.org/D16437
2022-11-15 08:43:53 -08:00
Mikhail Matrosov
9d74da3352 Cycles: improve adaptive sampling for overexposed scenes
Render time is reduced for overexposed scenes, by taking into account absolute
light intensity for adaptive sampling.

This can negatively affect some scenes where compositing or color management
are used to make the scene much darker or lighter. For best results adjust the
Film > Exposure setting to bring the intensity into a good range, and then do
further compositing and color management on top of that. Note that this setting
is different than color management exposure.

Previously Cycles' adaptive sampling used sqrt(I) to normalize noise level to
conform to a viewer's eye sensitivity. It is great for darker regions of the
image, but also requests too much samples in bright regions, sometimes several
times more than needed. Highlights can tolerate more noise because in most
examples it is still less noticeable then the noise in darker areas in the same
render.

Differential Revision: https://developer.blender.org/D16392
2022-11-15 08:43:52 -08:00
a72f590d51 Fix image datablock losing movie metadata on undo
The anims data is a runtime cache similar to the image buffer or GPU texture
and needs to be preserved through undo in the same way.

Found as part of D15042 development.
2022-11-15 08:43:52 -08:00
Colin Basnett
6b590a86a6 UI: disable curve map & profile zoom buttons at max/min zoom level
Disable the zoom in and out buttons on the when they would have no effect.

This also removes an incorrect comment that indicates the maximum zoom level
was 20x when in fact it was 25x.

Differential Revision: https://developer.blender.org/D16252
2022-11-15 08:43:52 -08:00
48527a5c2d DRW: Fix compilation issues in inline functions 2022-11-15 08:43:52 -08:00
Iliya Katueshenock
94e761e263 Fix: incorrect field dependencies in the raycast node
This does not change the result when evaluating the node.

Differential Revision: https://developer.blender.org/D16325
2022-11-15 08:43:51 -08:00
85cec1deff Asset system: New asset system code module (with files from BKE)
Adds a new `source/blender/asset_system` directory and moves asset
related files from BKE to it. More asset related code can follow
(e.g. asset indexing, ED_assetlist stuff) but needs further work to
untangle it. I also kept `BKE_asset.h` and `asset.cc` as is, since they
deal with asset DNA data mostly, thus make sense in BKE.

Motivation:
- Makes the asset system design more present (term wasn't even used in
  code before).
- An `asset_system` directory is quite descriptive (trivial to identify
  core asset system features) and makes it easy to find asset code.
- Asset system is mostly runtime data, with little relation to other
  `Main`/BKE/DNA types.
- There's a lot of stuff in BKE already. It shouldn't be just a dump for
  all stuff that seems core enough.
- Being its own directly helps us be more mindful about encapsulating
  the module well, and avoiding dependencies on other modules.
- We can be more free with splitting files here than in BKE.
- In future there might be an asset system BPY module, which would then
  map quite nicely to the `asset_system` directory.

Checked with some other core devs, consensus seems that this makes
sense.
2022-11-15 08:43:51 -08:00
165daf0bc6 Fix T98989: Performance regression when using multiple bump nodes
Ensure each graph material_function only evaluates the input links that are connected to it.

Differential Revision: https://developer.blender.org/D16425
2022-11-15 08:43:51 -08:00
aed5a62db4 GHOST/Wayland: fix building when GHOST_OPENGL_ALPHA is defined 2022-11-15 08:43:51 -08:00
0c1b536c57 DRW: View: Add base for multi-view support
This implements the base needed for supporting multiple view concurently
inside the same drawcall.

The view used by common macros and view related functions is indexed using
a global variable `drw_view_id` which can be set arbitrarly or read
from the `drw_ResourceID`.

This is needed for EEVEE-Next shadow but can be used for other purpose
in the future.

Note that a shader specialization is needed for it to work. `DRW_VIEW_LEN`
needs to be defined to the amount of view the shader will access.

The number of views contained in a `draw::View` is set at construction
time.

Note that the maximum number of object correctly drawn by the shaders
using multiple views will be lower than thoses who don't.
2022-11-15 08:43:51 -08:00
903c25f097 Cleanup: DRW: Remove two clang-tidy warnings 2022-11-15 08:43:50 -08:00
6e2fc39eb7 Cleanup: Move mesh_remap.c to C++
To facilitate further mesh data structure refactoring.
2022-11-15 08:43:50 -08:00
f2f1bbdea3 UV: fix crash with uv copy on empty selection
Introduced in 721fc9c1c9
2022-11-15 08:43:50 -08:00
96dfc8be8e GHOST/Wayland: call exit() when Wayland has a fatal error
Without this, a fatal error simply floods the stderr with the same
message without exiting.

Also add note on why reconnecting to the display server isn't practical.
2022-11-15 08:43:50 -08:00
4d57da0dd0 GHOST/Wayland: remove display listener (was warning on startup)
This already has a default listener which uses waylands logging.
2022-11-15 08:43:50 -08:00
9eb885d029 Cleanup: Disable mesh normal debug time printing
Left enabled mistakenly by d63ada602d.
2022-11-15 08:43:49 -08:00
e2376eceb5 Cleanup: Move lineart_cpu.c to C++
To enable further mesh data structure refactoring-- access to loose
edges in particular.
2022-11-15 08:43:49 -08:00
f9947b50c6 UV: fix compile on windows
Remove VLAs for compiling on windows.
Regression from 721fc9c1c9
2022-11-15 08:43:49 -08:00
812de962a3 Fix T95335 Bevel operator Loop Slide overshoot.
If the edge you are going to slide along is very close to in line
with the adjacent beveled edge, then there will be sharp overshoots.
There is an epsilon comparison to just abandon loop slide if this
situation is happening. That epsilon used to be 0.25 radians, but
bug T86768 complained that that value was too high, so it was changed
to .0001 radians (5 millidegrees). Now this current bug shows that
that was too aggressively small, so this change ups it by a factor
of 10, to .001 radians (5 centidegrees). All previous bug reports
remained fixed.
2022-11-15 08:43:49 -08:00
8bea9693df Fix T102187: Add knife tool in mesh panel
Add knife tool option in mesh panel

Reviewer: campbellbarton, JulienKaspar

Differential Revision: https://developer.blender.org/D16395
2022-11-15 08:43:49 -08:00
035e040ee2 DRW: Manager: Fix ClearMulti breaking compilation on Mac
The error was:
`draw_pass.hh:1055:16: error: call to implicitly-deleted default constructor of 'blender::draw::command::Undetermined [3]'
2022-11-15 08:43:48 -08:00
da33e57e39 BLI: Fix ListBaseWrapper::get wrong return type 2022-11-15 08:43:48 -08:00
6ee2f9dc15 DRW: Manager: Add bind_texture command for vertex buffer
This allows the same behavior as with `DRW_shgroup_buffer_texture`.
2022-11-15 08:43:48 -08:00
453a0c430b DRW: Manager: Add ClearMulti command
Allows to record `GPU_framebuffer_multi_clear` inside `draw::Pass`.
2022-11-15 08:43:48 -08:00
94a35be532 DRW: Manager: Finish / change implementation of framebuffer_set command
Use reference instead of direct pointer. This is because framebuffers
often use temp textures and are configured later just before submission.
2022-11-15 08:43:47 -08:00
4efef2fccd DRW: Wrappers: Allow taking reference of the framebuffer object
This is in order to make it work with the new `framebuffer_set` command
which requires a `GPUFrameBuffer **`.
2022-11-15 08:43:47 -08:00
a80e4f4bc9 DRW: Wrappers: Allow trivial types inside draw::SwapChain
This allows to use pointers and such other trivial types which cannot
implement the `swap` mehod.
2022-11-15 08:43:47 -08:00
8e40e6400a DRW: Wrappers: Avoid default vector length of 0 if sizeof(T) is large
This increases the default size to some reasonable value (>512bytes) and
allocate at least 1 element.
2022-11-15 08:43:47 -08:00
9f1560a9a4 Cleanup: fix compiler error/warnings 2022-11-15 08:43:47 -08:00
dce36e3e80 UV: implement copy and paste for uv
Implement a new topology-based copy and paste solution for UVs.

Usage notes:

* Open the UV Editor

* Use the selection tools to select a Quad joined to a Triangle joined to another Quad.
* From the menu, choose UV > UV Copy
 * The UV co-ordinates for your quad<=>tri<=>quad are now stored internally

* Use the selection tools to select a different Quad joined to a Triangle joined to a Quad.
* (Optional) From the menu, choose UV > Split > Selection

* From the menu, choose UV > UV Paste
 * The UV co-ordinates for the new selection will be moved to match the stored UVs.

Repeat selection / UV Paste steps as many times as desired.
For performance considerations, see https://en.wikipedia.org/wiki/Graph_isomorphism_problem

In theory, UV Copy and Paste should work with all UV selection modes.
Please report any problems.

A copy has been made of the Graph Isomorphism code from https://github.com/stefanoquer/graphISO
Copyright (c) 2019 Stefano Quer stefano.quer@polito.it GPL v3 or later.

Additional integration code Copyright (c) 2022 by Blender Foundation, GPL v2 or later.

Maniphest Tasks: T77911
Differential Revision: https://developer.blender.org/D16278
2022-11-15 08:43:46 -08:00
be9bac1f0d Fix: OpenSubdiv reporting version 0.0.0 in system_info.txt
OSD Lists as 0, 0, 0 this is due to opensubdiv_capi.cc not actually including
the OSD version header, so it's not getting the version define, and the code
in openSubdiv_getVersionHex is really well prepared to deal with any or no
version at all of OSD, catches the problem and returns 0, 0, 0

Given this file is only build when OSD is enabled we can just blindly include
opensubdiv/version.h here

Reviewed by: brecht
Differential Revision: https://developer.blender.org/D16398
2022-11-15 08:43:46 -08:00
0f06080bc3 Cleanup: Move cloth.c to C++
To support further mesh data structure refactoring.
2022-11-15 08:43:46 -08:00
b5d4fbb9cc BLI: improve CPPType system
* Support bidirectional type lookups. E.g. finding the base type of a
  field was supported, but not the other way around. This also removes
  the todo in `get_vector_type`. To achieve this, types have to be
  registered up-front.
* Separate `CPPType` from other "type traits". For example, previously
  `ValueOrFieldCPPType` adds additional behavior on top of `CPPType`.
  Previously, it was a subclass, now it just contains a reference to the
  `CPPType` it corresponds to. This follows the composition-over-inheritance
  idea. This makes it easier to have self-contained "type traits" without
  having to put everything into `CPPType`.

Differential Revision: https://developer.blender.org/D16479
2022-11-15 08:43:45 -08:00
431d527679 Fix: Curves sculptmode: deduplicate checks for being in mode
rBa8f7d41d3898 added a "duplicate" check for being in curves sculptmode
unnecessarily afaict (`in_sculpt_curve_mode` in addition to the
previously existing `in_curves_sculpt_mode`).
Over time, the later evolved to also take into account the output of a
viewer node, see rBc55d38f00b8c (the previously existing
`in_curves_sculpt_mode` did not receive this).

This all results in the fact that selection is not drawn with a viewer
node (can be useful though, and there are separate opacity controls for
both selection and the viewer attribute, so these can be used/blended to
everyones liking).

So now deduplicate the check.

Differential Revision: https://developer.blender.org/D16467
2022-11-15 08:43:45 -08:00
Iliya Katueshenock
d5dff81cf6 BLI: use templates for disjoint set data structure
Differential Revision: https://developer.blender.org/D16472
2022-11-15 08:43:45 -08:00
Iliya Katueshenock
c619c86662 Fix: missing tooltip for color sockets
Differential Revision: https://developer.blender.org/D16401
2022-11-15 08:43:45 -08:00
Iliya Katueshenock
dba0a2ad9a Cleanup: remove unused variable
Differential Revision: https://developer.blender.org/D16350
2022-11-15 08:43:44 -08:00
Iliya Katueshenock
2a90fc497b Fix: geometry nodes viewer shows black with dangling reroute input
Differential Revision: https://developer.blender.org/D16322
2022-11-15 08:43:44 -08:00
Iliya Katueshenock
36673016e9 Cleanup: make GArray declarations more explicit
Differential Revision: https://developer.blender.org/D16064
2022-11-15 08:43:44 -08:00
30e9c5b7f5 Cleanup: add a system reference to the wayland window
Avoid relying on GHOST_ISystem::getSystem(), store the system instead.
2022-11-15 08:43:44 -08:00
6e82f6bf2f Fix window title not redrawing with Wayland/libdecor 2022-11-15 08:43:44 -08:00
f6f05740be GHOST/Wayland: skip resizing the EGL surface unnecessarily
wl_egl_window_resize ran when the window became active/inactive for e.g.
2022-11-15 08:43:44 -08:00
a873f7299a Cleanup: move title into GWL_Window
Nearly all Wayland window data is stored in this struct,
follow this convention so GWL_Window functions can be self contained.
2022-11-15 08:43:44 -08:00
9f03d4bc2c Cleanup: move Wayland window state utilities into lower level functions
Add low level gwl_window_* functions.
2022-11-15 08:43:43 -08:00
dc2ac546cd Fix non-interactive window borders after changes to event handling
Regression in [0] causes LIBDECOR interactions not to be detected.

[0]: deb8ae6bd1
2022-11-15 08:43:43 -08:00
a3f23a321e Cleanup: use snake-case for WAYLAND utility functions
It wasn't so obvious which functions were part of the GHOST API
and which system functions were utilities.

This convention was already in place but not always followed.
2022-11-15 08:43:43 -08:00
ddecc6166b GHOST/Wayland: replace roundtrip with dispatch_pending
Add a non-blocking version wrapper for wl_display_dispatch_pending.
This uses roughly the same logic as Wayland_PumpEvents in SDL.
Noticed this when investigating T100855.

Note that performing a round-trip doesn't seem necessary from looking
into QT/GTK & SDL event handling loops.
2022-11-15 08:43:43 -08:00
9155608875 Cleanup: Simplify handling of loop to poly map in normal calculation
A Loop to poly map was passed as an optional output to the loop normal
calculation. That meant it was often recalculated more than necessary.
Instead, treat it as an optional argument. This also helps relieve
unnecessary responsibilities from the already-complicated loop normal
calculation code.
2022-11-15 08:43:43 -08:00
415ed999ab Cleanup: Use simpler timers for mesh normals debug timing 2022-11-15 08:43:43 -08:00
cfab4f432c Cleanup: Make loop normal calculation function static 2022-11-15 08:43:42 -08:00
4ac1f59a81 Cleanup: Decrease variable scope in mesh loop normal calculation 2022-11-15 08:43:42 -08:00
1ed9df3763 Cleanup: Use spans for loop normal calculation input data 2022-11-15 08:43:42 -08:00
61e591b2a5 Cleanup: Remove unnecessary struct keywords 2022-11-15 08:43:42 -08:00
c29795452c Merge branch 'master' into temp-sculpt-roll-mapping 2022-11-11 14:18:46 -08:00
9980fd0b8e temp-sculpt-roll-mapping: Port roll tex mapping code from sculpt-dev 2022-11-05 14:47:48 -07:00
524 changed files with 8515 additions and 6198 deletions

View File

@@ -1,5 +1,4 @@
This repository is only used as a mirror of git.blender.org. Blender development happens on
https://developer.blender.org.
This repository is only used as a mirror. Blender development happens on projects.blender.org.
To get started with contributing code, please see:
https://wiki.blender.org/wiki/Process/Contributing_Code

3
.github/stale.yml vendored
View File

@@ -15,8 +15,7 @@ staleLabel: stale
# Comment to post when closing a stale Issue or Pull Request.
closeComment: >
This issue has been automatically closed, because this repository is only
used as a mirror of git.blender.org. Blender development happens on
developer.blender.org.
used as a mirror. Blender development happens on projects.blender.org.
To get started contributing code, please read:
https://wiki.blender.org/wiki/Process/Contributing_Code

12
.gitmodules vendored
View File

@@ -1,20 +1,16 @@
[submodule "release/scripts/addons"]
path = release/scripts/addons
url = ../blender-addons.git
branch = master
ignore = all
branch = main
[submodule "release/scripts/addons_contrib"]
path = release/scripts/addons_contrib
url = ../blender-addons-contrib.git
branch = master
ignore = all
branch = main
[submodule "release/datafiles/locale"]
path = release/datafiles/locale
url = ../blender-translations.git
branch = master
ignore = all
branch = main
[submodule "source/tools"]
path = source/tools
url = ../blender-dev-tools.git
branch = master
ignore = all
branch = main

View File

@@ -24,7 +24,7 @@ Development
-----------
- [Build Instructions](https://wiki.blender.org/wiki/Building_Blender)
- [Code Review & Bug Tracker](https://developer.blender.org)
- [Code Review & Bug Tracker](https://projects.blender.org)
- [Developer Forum](https://devtalk.blender.org)
- [Developer Documentation](https://wiki.blender.org)

View File

@@ -40,7 +40,8 @@ ExternalProject_Add(external_igc_llvm
${PATCH_CMD} -p 1 -d ${IGC_LLVM_SOURCE_DIR} < ${IGC_OPENCL_CLANG_PATCH_DIR}/clang/0004-OpenCL-support-cl_ext_float_atomics.patch &&
${PATCH_CMD} -p 1 -d ${IGC_LLVM_SOURCE_DIR} < ${IGC_OPENCL_CLANG_PATCH_DIR}/clang/0005-OpenCL-Add-cl_khr_integer_dot_product.patch &&
${PATCH_CMD} -p 1 -d ${IGC_LLVM_SOURCE_DIR} < ${IGC_OPENCL_CLANG_PATCH_DIR}/llvm/0001-Memory-leak-fix-for-Managed-Static-Mutex.patch &&
${PATCH_CMD} -p 1 -d ${IGC_LLVM_SOURCE_DIR} < ${IGC_OPENCL_CLANG_PATCH_DIR}/llvm/0002-Remove-repo-name-in-LLVM-IR.patch
${PATCH_CMD} -p 1 -d ${IGC_LLVM_SOURCE_DIR} < ${IGC_OPENCL_CLANG_PATCH_DIR}/llvm/0002-Remove-repo-name-in-LLVM-IR.patch &&
${PATCH_CMD} -p 1 -d ${IGC_LLVM_SOURCE_DIR} < ${IGC_OPENCL_CLANG_PATCH_DIR}/llvm/0003-Add-missing-include-limit-in-benchmark.patch
)
add_dependencies(
external_igc_llvm
@@ -55,9 +56,6 @@ ExternalProject_Add(external_igc_spirv_translator
CONFIGURE_COMMAND echo .
BUILD_COMMAND echo .
INSTALL_COMMAND echo .
PATCH_COMMAND ${PATCH_CMD} -p 1 -d ${IGC_SPIRV_TRANSLATOR_SOURCE_DIR} < ${IGC_OPENCL_CLANG_PATCH_DIR}/spirv/0001-update-SPIR-V-headers-for-SPV_INTEL_split_barrier.patch &&
${PATCH_CMD} -p 1 -d ${IGC_SPIRV_TRANSLATOR_SOURCE_DIR} < ${IGC_OPENCL_CLANG_PATCH_DIR}/spirv/0002-Add-support-for-split-barriers-extension-SPV_INTEL_s.patch &&
${PATCH_CMD} -p 1 -d ${IGC_SPIRV_TRANSLATOR_SOURCE_DIR} < ${IGC_OPENCL_CLANG_PATCH_DIR}/spirv/0003-Support-cl_bf16_conversions.patch
)
add_dependencies(
external_igc_spirv_translator

View File

@@ -88,6 +88,19 @@ else()
export LDFLAGS=${PYTHON_LDFLAGS} &&
export PKG_CONFIG_PATH=${LIBDIR}/ffi/lib/pkgconfig)
# NOTE: untested on APPLE so far.
if(NOT APPLE)
set(PYTHON_CONFIGURE_EXTRA_ARGS
${PYTHON_CONFIGURE_EXTRA_ARGS}
# Used on most release Linux builds (Fedora for e.g.),
# increases build times noticeably with the benefit of a modest speedup at runtime.
--enable-optimizations
# While LTO is OK when building on the same system, it's incompatible across GCC versions,
# making it impractical for developers to build against, so keep it disabled.
# `--with-lto`
)
endif()
ExternalProject_Add(external_python
URL file://${PACKAGE_DIR}/${PYTHON_FILE}
DOWNLOAD_DIR ${DOWNLOAD_DIR}

View File

@@ -668,9 +668,9 @@ set(SPIRV_HEADERS_FILE SPIR-V-Headers-${SPIRV_HEADERS_VERSION}.tar.gz)
# compiler, the versions used are taken from the following location
# https://github.com/intel/intel-graphics-compiler/releases
set(IGC_VERSION 1.0.12149.1)
set(IGC_VERSION 1.0.13064.7)
set(IGC_URI https://github.com/intel/intel-graphics-compiler/archive/refs/tags/igc-${IGC_VERSION}.tar.gz)
set(IGC_HASH 44f67f24e3bc5130f9f062533abf8154782a9d0a992bc19b498639a8521ae836)
set(IGC_HASH a929abd4cca2b293961ec0437ee4b3b2147bd3b2c8a3c423af78c0c359b2e5ae)
set(IGC_HASH_TYPE SHA256)
set(IGC_FILE igc-${IGC_VERSION}.tar.gz)
@@ -690,15 +690,15 @@ set(IGC_LLVM_FILE ${IGC_LLVM_VERSION}.tar.gz)
#
# WARNING WARNING WARNING
set(IGC_OPENCL_CLANG_VERSION 363a5262d8c7cff3fb28f3bdb5d85c8d7e91c1bb)
set(IGC_OPENCL_CLANG_VERSION ee31812ea8b89d08c2918f045d11a19bd33525c5)
set(IGC_OPENCL_CLANG_URI https://github.com/intel/opencl-clang/archive/${IGC_OPENCL_CLANG_VERSION}.tar.gz)
set(IGC_OPENCL_CLANG_HASH aa8cf72bb239722ce8ce44f79413c6887ecc8ca18477dd520aa5c4809756da9a)
set(IGC_OPENCL_CLANG_HASH 1db6735bbcfaa31e8a9ba39f121d6bafa806ea8919e9f56782d6aaa67771ddda)
set(IGC_OPENCL_CLANG_HASH_TYPE SHA256)
set(IGC_OPENCL_CLANG_FILE opencl-clang-${IGC_OPENCL_CLANG_VERSION}.tar.gz)
set(IGC_VCINTRINSICS_VERSION v0.5.0)
set(IGC_VCINTRINSICS_VERSION v0.11.0)
set(IGC_VCINTRINSICS_URI https://github.com/intel/vc-intrinsics/archive/refs/tags/${IGC_VCINTRINSICS_VERSION}.tar.gz)
set(IGC_VCINTRINSICS_HASH 70bb47c5e32173cf61514941e83ae7c7eb4485e6d2fca60cfa1f50d4f42c41f2)
set(IGC_VCINTRINSICS_HASH e5acd5626ce7fa6d41ce154c50ac805eda734ee66af94ef28e680ac2ad81bb9f)
set(IGC_VCINTRINSICS_HASH_TYPE SHA256)
set(IGC_VCINTRINSICS_FILE vc-intrinsics-${IGC_VCINTRINSICS_VERSION}.tar.gz)
@@ -714,9 +714,9 @@ set(IGC_SPIRV_TOOLS_HASH 6e19900e948944243024aedd0a201baf3854b377b9cc7a386553bc1
set(IGC_SPIRV_TOOLS_HASH_TYPE SHA256)
set(IGC_SPIRV_TOOLS_FILE SPIR-V-Tools-${IGC_SPIRV_TOOLS_VERSION}.tar.gz)
set(IGC_SPIRV_TRANSLATOR_VERSION a31ffaeef77e23d500b3ea3d35e0c42ff5648ad9)
set(IGC_SPIRV_TRANSLATOR_VERSION d739c01d65ec00dee64dedd40deed805216a7193)
set(IGC_SPIRV_TRANSLATOR_URI https://github.com/KhronosGroup/SPIRV-LLVM-Translator/archive/${IGC_SPIRV_TRANSLATOR_VERSION}.tar.gz)
set(IGC_SPIRV_TRANSLATOR_HASH 9e26c96a45341b8f8af521bacea20e752623346340addd02af95d669f6e89252)
set(IGC_SPIRV_TRANSLATOR_HASH ddc0cc9ccbe59dadeaf291012d59de142b2e9f2b124dbb634644d39daddaa13e)
set(IGC_SPIRV_TRANSLATOR_HASH_TYPE SHA256)
set(IGC_SPIRV_TRANSLATOR_FILE SPIR-V-Translator-${IGC_SPIRV_TRANSLATOR_VERSION}.tar.gz)
@@ -724,15 +724,15 @@ set(IGC_SPIRV_TRANSLATOR_FILE SPIR-V-Translator-${IGC_SPIRV_TRANSLATOR_VERSION}.
### Intel Graphics Compiler DEPS END ###
########################################
set(GMMLIB_VERSION intel-gmmlib-22.1.8)
set(GMMLIB_VERSION intel-gmmlib-22.3.0)
set(GMMLIB_URI https://github.com/intel/gmmlib/archive/refs/tags/${GMMLIB_VERSION}.tar.gz)
set(GMMLIB_HASH bf23e9a3742b4fb98c7666c9e9b29f3219e4b2fb4d831aaf4eed71f5e2d17368)
set(GMMLIB_HASH c1f33e1519edfc527127baeb0436b783430dfd256c643130169a3a71dc86aff9)
set(GMMLIB_HASH_TYPE SHA256)
set(GMMLIB_FILE ${GMMLIB_VERSION}.tar.gz)
set(OCLOC_VERSION 22.38.24278)
set(OCLOC_VERSION 22.49.25018.21)
set(OCLOC_URI https://github.com/intel/compute-runtime/archive/refs/tags/${OCLOC_VERSION}.tar.gz)
set(OCLOC_HASH db0c542fccd651e6404b15a74d46027f1ce0eda8dc9e25a40cbb6c0faef257ee)
set(OCLOC_HASH 92362dae08b503a34e5d3820ed284198c452bcd5e7504d90eb69887b20492c06)
set(OCLOC_HASH_TYPE SHA256)
set(OCLOC_FILE ocloc-${OCLOC_VERSION}.tar.gz)

View File

@@ -1,7 +1,7 @@
diff -Naur external_igc_opencl_clang.orig/CMakeLists.txt external_igc_opencl_clang/CMakeLists.txt
--- external_igc_opencl_clang.orig/CMakeLists.txt 2022-03-16 05:51:10 -0600
+++ external_igc_opencl_clang/CMakeLists.txt 2022-05-23 10:40:09 -0600
@@ -126,22 +126,24 @@
@@ -147,22 +147,24 @@
)
endif()

View File

@@ -23,19 +23,19 @@ if(EXISTS ${SOURCE_DIR}/.git)
if(MY_WC_BRANCH STREQUAL "HEAD")
# Detached HEAD, check whether commit hash is reachable
# in the master branch
# in the main branch
execute_process(COMMAND git rev-parse --short=12 HEAD
WORKING_DIRECTORY ${SOURCE_DIR}
OUTPUT_VARIABLE MY_WC_HASH
OUTPUT_STRIP_TRAILING_WHITESPACE)
execute_process(COMMAND git branch --list master blender-v* --contains ${MY_WC_HASH}
execute_process(COMMAND git branch --list main blender-v* --contains ${MY_WC_HASH}
WORKING_DIRECTORY ${SOURCE_DIR}
OUTPUT_VARIABLE _git_contains_check
OUTPUT_STRIP_TRAILING_WHITESPACE)
if(NOT _git_contains_check STREQUAL "")
set(MY_WC_BRANCH "master")
set(MY_WC_BRANCH "main")
else()
execute_process(COMMAND git show-ref --tags -d
WORKING_DIRECTORY ${SOURCE_DIR}
@@ -48,7 +48,7 @@ if(EXISTS ${SOURCE_DIR}/.git)
OUTPUT_STRIP_TRAILING_WHITESPACE)
if(_git_tag_hashes MATCHES "${_git_head_hash}")
set(MY_WC_BRANCH "master")
set(MY_WC_BRANCH "main")
else()
execute_process(COMMAND git branch --contains ${MY_WC_HASH}
WORKING_DIRECTORY ${SOURCE_DIR}

View File

@@ -11,11 +11,11 @@
mkdir ~/blender-git
cd ~/blender-git
git clone http://git.blender.org/blender.git
git clone https://projects.blender.org/blender/blender.git
cd blender
git submodule update --init --recursive
git submodule foreach git checkout master
git submodule foreach git pull --rebase origin master
git submodule foreach git checkout main
git submodule foreach git pull --rebase origin main
# create build dir
mkdir ~/blender-git/build-cmake
@@ -35,7 +35,7 @@ ln -s ~/blender-git/build-cmake/bin/blender ~/blender-git/blender/blender.bin
echo ""
echo "* Useful Commands *"
echo " Run Blender: ~/blender-git/blender/blender.bin"
echo " Update Blender: git pull --rebase; git submodule foreach git pull --rebase origin master"
echo " Update Blender: git pull --rebase; git submodule foreach git pull --rebase origin main"
echo " Reconfigure Blender: cd ~/blender-git/build-cmake ; cmake ."
echo " Build Blender: cd ~/blender-git/build-cmake ; make"
echo ""

View File

@@ -5,16 +5,16 @@
update-code:
git:
submodules:
- branch: master
- branch: main
commit_id: HEAD
path: release/scripts/addons
- branch: master
- branch: main
commit_id: HEAD
path: release/scripts/addons_contrib
- branch: master
- branch: main
commit_id: HEAD
path: release/datafiles/locale
- branch: master
- branch: main
commit_id: HEAD
path: source/tools
svn:

View File

@@ -58,7 +58,7 @@ Each Blender release supports one Python version, and the package is only compat
## Source Code
* [Releases](https://download.blender.org/source/)
* Repository: [git.blender.org/blender.git](https://git.blender.org/gitweb/gitweb.cgi/blender.git)
* Repository: [projects.blender.org/blender/blender.git](https://projects.blender.org/blender/blender)
## Credits

View File

@@ -170,7 +170,7 @@ def git_update_skip(args: argparse.Namespace, check_remote_exists: bool = True)
return "rebase or merge in progress, complete it first"
# Abort if uncommitted changes.
changes = check_output([args.git_command, 'status', '--porcelain', '--untracked-files=no'])
changes = check_output([args.git_command, 'status', '--porcelain', '--untracked-files=no', '--ignore-submodules'])
if len(changes) != 0:
return "you have unstaged changes"
@@ -202,8 +202,8 @@ def submodules_update(
sys.exit(1)
# Update submodules to appropriate given branch,
# falling back to master if none is given and/or found in a sub-repository.
branch_fallback = "master"
# falling back to main if none is given and/or found in a sub-repository.
branch_fallback = "main"
if not branch:
branch = branch_fallback

View File

@@ -3,9 +3,9 @@ if NOT exist "%BLENDER_DIR%\source\tools\.git" (
if not "%GIT%" == "" (
"%GIT%" submodule update --init --recursive --progress
if errorlevel 1 goto FAIL
"%GIT%" submodule foreach git checkout master
"%GIT%" submodule foreach git checkout main
if errorlevel 1 goto FAIL
"%GIT%" submodule foreach git pull --rebase origin master
"%GIT%" submodule foreach git pull --rebase origin main
if errorlevel 1 goto FAIL
goto EOF
) else (

View File

@@ -37,7 +37,7 @@ def draw_callback_px(self, context):
# BLF drawing routine
font_id = font_info["font_id"]
blf.position(font_id, 2, 80, 0)
blf.size(font_id, 50, 72)
blf.size(font_id, 50)
blf.draw(font_id, "Hello World")

View File

@@ -1816,9 +1816,9 @@ def pyrna2sphinx(basepath):
# operators
def write_ops():
API_BASEURL = "https://developer.blender.org/diffusion/B/browse/master/release/scripts"
API_BASEURL_ADDON = "https://developer.blender.org/diffusion/BA"
API_BASEURL_ADDON_CONTRIB = "https://developer.blender.org/diffusion/BAC"
API_BASEURL = "https://projects.blender.org/blender/blender/src/branch/main/release/scripts"
API_BASEURL_ADDON = "https://projects.blender.org/blender/blender-addons"
API_BASEURL_ADDON_CONTRIB = "https://projects.blender.org/blender/blender-addons-contrib"
op_modules = {}
op = None

View File

@@ -156,7 +156,7 @@ var Popover = function() {
},
getNamed : function(v) {
$.each(all_versions, function(ix, title) {
if (ix === "master" || ix === "latest") {
if (ix === "master" || ix === "main" || ix === "latest") {
var m = title.match(/\d\.\d[\w\d\.]*/)[0];
if (parseFloat(m) == v) {
v = ix;

View File

@@ -1,5 +1,5 @@
Project: Blender
URL: https://git.blender.org/blender.git
URL: https://projects.blender.org/blender/blender.git
License: Apache 2.0
Upstream version: N/A
Local modifications: None

View File

@@ -12,6 +12,7 @@ from bpy.props import (
PointerProperty,
StringProperty,
)
from bpy.app.translations import pgettext_iface as iface_
from math import pi
@@ -1664,30 +1665,48 @@ class CyclesPreferences(bpy.types.AddonPreferences):
col.label(text="No compatible GPUs found for Cycles", icon='INFO')
if device_type == 'CUDA':
col.label(text="Requires NVIDIA GPU with compute capability 3.0", icon='BLANK1')
compute_capability = "3.0"
col.label(text=iface_("Requires NVIDIA GPU with compute capability %s") % compute_capability,
icon='BLANK1', translate=False)
elif device_type == 'OPTIX':
col.label(text="Requires NVIDIA GPU with compute capability 5.0", icon='BLANK1')
col.label(text="and NVIDIA driver version 470 or newer", icon='BLANK1')
compute_capability = "5.0"
driver_version = "470"
col.label(text=iface_("Requires NVIDIA GPU with compute capability %s") % compute_capability,
icon='BLANK1', translate=False)
col.label(text="and NVIDIA driver version %s or newer" % driver_version,
icon='BLANK1', translate=False)
elif device_type == 'HIP':
import sys
if sys.platform[:3] == "win":
driver_version = "21.Q4"
col.label(text="Requires AMD GPU with RDNA architecture", icon='BLANK1')
col.label(text="and AMD Radeon Pro 21.Q4 driver or newer", icon='BLANK1')
col.label(text=iface_("and AMD Radeon Pro %s driver or newer") % driver_version,
icon='BLANK1', translate=False)
elif sys.platform.startswith("linux"):
driver_version = "22.10"
col.label(text="Requires AMD GPU with RDNA architecture", icon='BLANK1')
col.label(text="and AMD driver version 22.10 or newer", icon='BLANK1')
col.label(text=iface_("and AMD driver version %s or newer") % driver_version, icon='BLANK1',
translate=False)
elif device_type == 'ONEAPI':
import sys
if sys.platform.startswith("win"):
driver_version = "101.4032"
col.label(text="Requires Intel GPU with Xe-HPG architecture", icon='BLANK1')
col.label(text="and Windows driver version 101.4032 or newer", icon='BLANK1')
col.label(text=iface_("and Windows driver version %s or newer") % driver_version,
icon='BLANK1', translate=False)
elif sys.platform.startswith("linux"):
driver_version = "1.3.24931"
col.label(text="Requires Intel GPU with Xe-HPG architecture and", icon='BLANK1')
col.label(text=" - intel-level-zero-gpu version 1.3.24931 or newer", icon='BLANK1')
col.label(text=iface_(" - intel-level-zero-gpu version %s or newer") % driver_version,
icon='BLANK1', translate=False)
col.label(text=" - oneAPI Level-Zero Loader", icon='BLANK1')
elif device_type == 'METAL':
col.label(text="Requires Apple Silicon with macOS 12.2 or newer", icon='BLANK1')
col.label(text="or AMD with macOS 12.3 or newer", icon='BLANK1')
silicon_mac_version = "12.2"
amd_mac_version = "12.3"
col.label(text=iface_("Requires Apple Silicon with macOS %s or newer") % silicon_mac_version,
icon='BLANK1', translate=False)
col.label(text=iface_("or AMD with macOS %s or newer") % amd_mac_version, icon='BLANK1',
translate=False)
return
for device in devices:
@@ -1723,12 +1742,21 @@ class CyclesPreferences(bpy.types.AddonPreferences):
if compute_device_type == 'METAL':
import platform
# MetalRT only works on Apple Silicon at present, pending argument encoding fixes on AMD
# Kernel specialization is only viable on Apple Silicon at present due to relative compilation speed
if platform.machine() == 'arm64':
import re
is_navi_2 = False
for device in devices:
if re.search(r"((RX)|(Pro)|(PRO))\s+W?6\d00X", device.name):
is_navi_2 = True
break
# MetalRT only works on Apple Silicon and Navi2.
is_arm64 = platform.machine() == 'arm64'
if is_arm64 or is_navi_2:
col = layout.column()
col.use_property_split = True
col.prop(self, "kernel_optimization_level")
# Kernel specialization is only supported on Apple Silicon
if is_arm64:
col.prop(self, "kernel_optimization_level")
col.prop(self, "use_metalrt")
def draw(self, context):

View File

@@ -53,8 +53,12 @@ void CUDADevice::set_error(const string &error)
}
CUDADevice::CUDADevice(const DeviceInfo &info, Stats &stats, Profiler &profiler)
: Device(info, stats, profiler), texture_info(this, "texture_info", MEM_GLOBAL)
: GPUDevice(info, stats, profiler)
{
/* Verify that base class types can be used with specific backend types */
static_assert(sizeof(texMemObject) == sizeof(CUtexObject));
static_assert(sizeof(arrayMemObject) == sizeof(CUarray));
first_error = true;
cuDevId = info.num;
@@ -65,12 +69,6 @@ CUDADevice::CUDADevice(const DeviceInfo &info, Stats &stats, Profiler &profiler)
need_texture_info = false;
device_texture_headroom = 0;
device_working_headroom = 0;
move_texture_to_host = false;
map_host_limit = 0;
map_host_used = 0;
can_map_host = 0;
pitch_alignment = 0;
/* Initialize CUDA. */
@@ -91,8 +89,9 @@ CUDADevice::CUDADevice(const DeviceInfo &info, Stats &stats, Profiler &profiler)
/* CU_CTX_MAP_HOST for mapping host memory when out of device memory.
* CU_CTX_LMEM_RESIZE_TO_MAX for reserving local memory ahead of render,
* so we can predict which memory to map to host. */
cuda_assert(
cuDeviceGetAttribute(&can_map_host, CU_DEVICE_ATTRIBUTE_CAN_MAP_HOST_MEMORY, cuDevice));
int value;
cuda_assert(cuDeviceGetAttribute(&value, CU_DEVICE_ATTRIBUTE_CAN_MAP_HOST_MEMORY, cuDevice));
can_map_host = value != 0;
cuda_assert(cuDeviceGetAttribute(
&pitch_alignment, CU_DEVICE_ATTRIBUTE_TEXTURE_PITCH_ALIGNMENT, cuDevice));
@@ -499,311 +498,57 @@ void CUDADevice::reserve_local_memory(const uint kernel_features)
# endif
}
void CUDADevice::init_host_memory()
{
/* Limit amount of host mapped memory, because allocating too much can
* cause system instability. Leave at least half or 4 GB of system
* memory free, whichever is smaller. */
size_t default_limit = 4 * 1024 * 1024 * 1024LL;
size_t system_ram = system_physical_ram();
if (system_ram > 0) {
if (system_ram / 2 > default_limit) {
map_host_limit = system_ram - default_limit;
}
else {
map_host_limit = system_ram / 2;
}
}
else {
VLOG_WARNING << "Mapped host memory disabled, failed to get system RAM";
map_host_limit = 0;
}
/* Amount of device memory to keep is free after texture memory
* and working memory allocations respectively. We set the working
* memory limit headroom lower so that some space is left after all
* texture memory allocations. */
device_working_headroom = 32 * 1024 * 1024LL; // 32MB
device_texture_headroom = 128 * 1024 * 1024LL; // 128MB
VLOG_INFO << "Mapped host memory limit set to " << string_human_readable_number(map_host_limit)
<< " bytes. (" << string_human_readable_size(map_host_limit) << ")";
}
void CUDADevice::load_texture_info()
{
if (need_texture_info) {
/* Unset flag before copying, so this does not loop indefinitely if the copy below calls
* into 'move_textures_to_host' (which calls 'load_texture_info' again). */
need_texture_info = false;
texture_info.copy_to_device();
}
}
void CUDADevice::move_textures_to_host(size_t size, bool for_texture)
{
/* Break out of recursive call, which can happen when moving memory on a multi device. */
static bool any_device_moving_textures_to_host = false;
if (any_device_moving_textures_to_host) {
return;
}
/* Signal to reallocate textures in host memory only. */
move_texture_to_host = true;
while (size > 0) {
/* Find suitable memory allocation to move. */
device_memory *max_mem = NULL;
size_t max_size = 0;
bool max_is_image = false;
thread_scoped_lock lock(cuda_mem_map_mutex);
foreach (CUDAMemMap::value_type &pair, cuda_mem_map) {
device_memory &mem = *pair.first;
CUDAMem *cmem = &pair.second;
/* Can only move textures allocated on this device (and not those from peer devices).
* And need to ignore memory that is already on the host. */
if (!mem.is_resident(this) || cmem->use_mapped_host) {
continue;
}
bool is_texture = (mem.type == MEM_TEXTURE || mem.type == MEM_GLOBAL) &&
(&mem != &texture_info);
bool is_image = is_texture && (mem.data_height > 1);
/* Can't move this type of memory. */
if (!is_texture || cmem->array) {
continue;
}
/* For other textures, only move image textures. */
if (for_texture && !is_image) {
continue;
}
/* Try to move largest allocation, prefer moving images. */
if (is_image > max_is_image || (is_image == max_is_image && mem.device_size > max_size)) {
max_is_image = is_image;
max_size = mem.device_size;
max_mem = &mem;
}
}
lock.unlock();
/* Move to host memory. This part is mutex protected since
* multiple CUDA devices could be moving the memory. The
* first one will do it, and the rest will adopt the pointer. */
if (max_mem) {
VLOG_WORK << "Move memory from device to host: " << max_mem->name;
static thread_mutex move_mutex;
thread_scoped_lock lock(move_mutex);
any_device_moving_textures_to_host = true;
/* Potentially need to call back into multi device, so pointer mapping
* and peer devices are updated. This is also necessary since the device
* pointer may just be a key here, so cannot be accessed and freed directly.
* Unfortunately it does mean that memory is reallocated on all other
* devices as well, which is potentially dangerous when still in use (since
* a thread rendering on another devices would only be caught in this mutex
* if it so happens to do an allocation at the same time as well. */
max_mem->device_copy_to();
size = (max_size >= size) ? 0 : size - max_size;
any_device_moving_textures_to_host = false;
}
else {
break;
}
}
/* Unset flag before texture info is reloaded, since it should stay in device memory. */
move_texture_to_host = false;
/* Update texture info array with new pointers. */
load_texture_info();
}
CUDADevice::CUDAMem *CUDADevice::generic_alloc(device_memory &mem, size_t pitch_padding)
void CUDADevice::get_device_memory_info(size_t &total, size_t &free)
{
CUDAContextScope scope(this);
CUdeviceptr device_pointer = 0;
size_t size = mem.memory_size() + pitch_padding;
CUresult mem_alloc_result = CUDA_ERROR_OUT_OF_MEMORY;
const char *status = "";
/* First try allocating in device memory, respecting headroom. We make
* an exception for texture info. It is small and frequently accessed,
* so treat it as working memory.
*
* If there is not enough room for working memory, we will try to move
* textures to host memory, assuming the performance impact would have
* been worse for working memory. */
bool is_texture = (mem.type == MEM_TEXTURE || mem.type == MEM_GLOBAL) && (&mem != &texture_info);
bool is_image = is_texture && (mem.data_height > 1);
size_t headroom = (is_texture) ? device_texture_headroom : device_working_headroom;
size_t total = 0, free = 0;
cuMemGetInfo(&free, &total);
/* Move textures to host memory if needed. */
if (!move_texture_to_host && !is_image && (size + headroom) >= free && can_map_host) {
move_textures_to_host(size + headroom - free, is_texture);
cuMemGetInfo(&free, &total);
}
/* Allocate in device memory. */
if (!move_texture_to_host && (size + headroom) < free) {
mem_alloc_result = cuMemAlloc(&device_pointer, size);
if (mem_alloc_result == CUDA_SUCCESS) {
status = " in device memory";
}
}
/* Fall back to mapped host memory if needed and possible. */
void *shared_pointer = 0;
if (mem_alloc_result != CUDA_SUCCESS && can_map_host && mem.type != MEM_DEVICE_ONLY) {
if (mem.shared_pointer) {
/* Another device already allocated host memory. */
mem_alloc_result = CUDA_SUCCESS;
shared_pointer = mem.shared_pointer;
}
else if (map_host_used + size < map_host_limit) {
/* Allocate host memory ourselves. */
mem_alloc_result = cuMemHostAlloc(
&shared_pointer, size, CU_MEMHOSTALLOC_DEVICEMAP | CU_MEMHOSTALLOC_WRITECOMBINED);
assert((mem_alloc_result == CUDA_SUCCESS && shared_pointer != 0) ||
(mem_alloc_result != CUDA_SUCCESS && shared_pointer == 0));
}
if (mem_alloc_result == CUDA_SUCCESS) {
cuda_assert(cuMemHostGetDevicePointer_v2(&device_pointer, shared_pointer, 0));
map_host_used += size;
status = " in host memory";
}
}
if (mem_alloc_result != CUDA_SUCCESS) {
if (mem.type == MEM_DEVICE_ONLY) {
status = " failed, out of device memory";
set_error("System is out of GPU memory");
}
else {
status = " failed, out of device and host memory";
set_error("System is out of GPU and shared host memory");
}
}
if (mem.name) {
VLOG_WORK << "Buffer allocate: " << mem.name << ", "
<< string_human_readable_number(mem.memory_size()) << " bytes. ("
<< string_human_readable_size(mem.memory_size()) << ")" << status;
}
mem.device_pointer = (device_ptr)device_pointer;
mem.device_size = size;
stats.mem_alloc(size);
if (!mem.device_pointer) {
return NULL;
}
/* Insert into map of allocations. */
thread_scoped_lock lock(cuda_mem_map_mutex);
CUDAMem *cmem = &cuda_mem_map[&mem];
if (shared_pointer != 0) {
/* Replace host pointer with our host allocation. Only works if
* CUDA memory layout is the same and has no pitch padding. Also
* does not work if we move textures to host during a render,
* since other devices might be using the memory. */
if (!move_texture_to_host && pitch_padding == 0 && mem.host_pointer &&
mem.host_pointer != shared_pointer) {
memcpy(shared_pointer, mem.host_pointer, size);
/* A Call to device_memory::host_free() should be preceded by
* a call to device_memory::device_free() for host memory
* allocated by a device to be handled properly. Two exceptions
* are here and a call in OptiXDevice::generic_alloc(), where
* the current host memory can be assumed to be allocated by
* device_memory::host_alloc(), not by a device */
mem.host_free();
mem.host_pointer = shared_pointer;
}
mem.shared_pointer = shared_pointer;
mem.shared_counter++;
cmem->use_mapped_host = true;
}
else {
cmem->use_mapped_host = false;
}
return cmem;
}
void CUDADevice::generic_copy_to(device_memory &mem)
bool CUDADevice::alloc_device(void *&device_pointer, size_t size)
{
if (!mem.host_pointer || !mem.device_pointer) {
return;
}
CUDAContextScope scope(this);
/* If use_mapped_host of mem is false, the current device only uses device memory allocated by
* cuMemAlloc regardless of mem.host_pointer and mem.shared_pointer, and should copy data from
* mem.host_pointer. */
thread_scoped_lock lock(cuda_mem_map_mutex);
if (!cuda_mem_map[&mem].use_mapped_host || mem.host_pointer != mem.shared_pointer) {
const CUDAContextScope scope(this);
cuda_assert(
cuMemcpyHtoD((CUdeviceptr)mem.device_pointer, mem.host_pointer, mem.memory_size()));
}
CUresult mem_alloc_result = cuMemAlloc((CUdeviceptr *)&device_pointer, size);
return mem_alloc_result == CUDA_SUCCESS;
}
void CUDADevice::generic_free(device_memory &mem)
void CUDADevice::free_device(void *device_pointer)
{
if (mem.device_pointer) {
CUDAContextScope scope(this);
thread_scoped_lock lock(cuda_mem_map_mutex);
DCHECK(cuda_mem_map.find(&mem) != cuda_mem_map.end());
const CUDAMem &cmem = cuda_mem_map[&mem];
CUDAContextScope scope(this);
/* If cmem.use_mapped_host is true, reference counting is used
* to safely free a mapped host memory. */
cuda_assert(cuMemFree((CUdeviceptr)device_pointer));
}
if (cmem.use_mapped_host) {
assert(mem.shared_pointer);
if (mem.shared_pointer) {
assert(mem.shared_counter > 0);
if (--mem.shared_counter == 0) {
if (mem.host_pointer == mem.shared_pointer) {
mem.host_pointer = 0;
}
cuMemFreeHost(mem.shared_pointer);
mem.shared_pointer = 0;
}
}
map_host_used -= mem.device_size;
}
else {
/* Free device memory. */
cuda_assert(cuMemFree(mem.device_pointer));
}
bool CUDADevice::alloc_host(void *&shared_pointer, size_t size)
{
CUDAContextScope scope(this);
stats.mem_free(mem.device_size);
mem.device_pointer = 0;
mem.device_size = 0;
CUresult mem_alloc_result = cuMemHostAlloc(
&shared_pointer, size, CU_MEMHOSTALLOC_DEVICEMAP | CU_MEMHOSTALLOC_WRITECOMBINED);
return mem_alloc_result == CUDA_SUCCESS;
}
cuda_mem_map.erase(cuda_mem_map.find(&mem));
}
void CUDADevice::free_host(void *shared_pointer)
{
CUDAContextScope scope(this);
cuMemFreeHost(shared_pointer);
}
bool CUDADevice::transform_host_pointer(void *&device_pointer, void *&shared_pointer)
{
CUDAContextScope scope(this);
cuda_assert(cuMemHostGetDevicePointer_v2((CUdeviceptr *)&device_pointer, shared_pointer, 0));
return true;
}
void CUDADevice::copy_host_to_device(void *device_pointer, void *host_pointer, size_t size)
{
const CUDAContextScope scope(this);
cuda_assert(cuMemcpyHtoD((CUdeviceptr)device_pointer, host_pointer, size));
}
void CUDADevice::mem_alloc(device_memory &mem)
@@ -868,8 +613,8 @@ void CUDADevice::mem_zero(device_memory &mem)
/* If use_mapped_host of mem is false, mem.device_pointer currently refers to device memory
* regardless of mem.host_pointer and mem.shared_pointer. */
thread_scoped_lock lock(cuda_mem_map_mutex);
if (!cuda_mem_map[&mem].use_mapped_host || mem.host_pointer != mem.shared_pointer) {
thread_scoped_lock lock(device_mem_map_mutex);
if (!device_mem_map[&mem].use_mapped_host || mem.host_pointer != mem.shared_pointer) {
const CUDAContextScope scope(this);
cuda_assert(cuMemsetD8((CUdeviceptr)mem.device_pointer, 0, mem.memory_size()));
}
@@ -994,19 +739,19 @@ void CUDADevice::tex_alloc(device_texture &mem)
return;
}
CUDAMem *cmem = NULL;
Mem *cmem = NULL;
CUarray array_3d = NULL;
size_t src_pitch = mem.data_width * dsize * mem.data_elements;
size_t dst_pitch = src_pitch;
if (!mem.is_resident(this)) {
thread_scoped_lock lock(cuda_mem_map_mutex);
cmem = &cuda_mem_map[&mem];
thread_scoped_lock lock(device_mem_map_mutex);
cmem = &device_mem_map[&mem];
cmem->texobject = 0;
if (mem.data_depth > 1) {
array_3d = (CUarray)mem.device_pointer;
cmem->array = array_3d;
cmem->array = reinterpret_cast<arrayMemObject>(array_3d);
}
else if (mem.data_height > 0) {
dst_pitch = align_up(src_pitch, pitch_alignment);
@@ -1050,10 +795,10 @@ void CUDADevice::tex_alloc(device_texture &mem)
mem.device_size = size;
stats.mem_alloc(size);
thread_scoped_lock lock(cuda_mem_map_mutex);
cmem = &cuda_mem_map[&mem];
thread_scoped_lock lock(device_mem_map_mutex);
cmem = &device_mem_map[&mem];
cmem->texobject = 0;
cmem->array = array_3d;
cmem->array = reinterpret_cast<arrayMemObject>(array_3d);
}
else if (mem.data_height > 0) {
/* 2D texture, using pitch aligned linear memory. */
@@ -1137,8 +882,8 @@ void CUDADevice::tex_alloc(device_texture &mem)
texDesc.filterMode = filter_mode;
texDesc.flags = CU_TRSF_NORMALIZED_COORDINATES;
thread_scoped_lock lock(cuda_mem_map_mutex);
cmem = &cuda_mem_map[&mem];
thread_scoped_lock lock(device_mem_map_mutex);
cmem = &device_mem_map[&mem];
cuda_assert(cuTexObjectCreate(&cmem->texobject, &resDesc, &texDesc, NULL));
@@ -1153,9 +898,9 @@ void CUDADevice::tex_free(device_texture &mem)
{
if (mem.device_pointer) {
CUDAContextScope scope(this);
thread_scoped_lock lock(cuda_mem_map_mutex);
DCHECK(cuda_mem_map.find(&mem) != cuda_mem_map.end());
const CUDAMem &cmem = cuda_mem_map[&mem];
thread_scoped_lock lock(device_mem_map_mutex);
DCHECK(device_mem_map.find(&mem) != device_mem_map.end());
const Mem &cmem = device_mem_map[&mem];
if (cmem.texobject) {
/* Free bindless texture. */
@@ -1164,16 +909,16 @@ void CUDADevice::tex_free(device_texture &mem)
if (!mem.is_resident(this)) {
/* Do not free memory here, since it was allocated on a different device. */
cuda_mem_map.erase(cuda_mem_map.find(&mem));
device_mem_map.erase(device_mem_map.find(&mem));
}
else if (cmem.array) {
/* Free array. */
cuArrayDestroy(cmem.array);
cuArrayDestroy(reinterpret_cast<CUarray>(cmem.array));
stats.mem_free(mem.device_size);
mem.device_pointer = 0;
mem.device_size = 0;
cuda_mem_map.erase(cuda_mem_map.find(&mem));
device_mem_map.erase(device_mem_map.find(&mem));
}
else {
lock.unlock();

View File

@@ -21,7 +21,7 @@ CCL_NAMESPACE_BEGIN
class DeviceQueue;
class CUDADevice : public Device {
class CUDADevice : public GPUDevice {
friend class CUDAContextScope;
@@ -29,36 +29,11 @@ class CUDADevice : public Device {
CUdevice cuDevice;
CUcontext cuContext;
CUmodule cuModule;
size_t device_texture_headroom;
size_t device_working_headroom;
bool move_texture_to_host;
size_t map_host_used;
size_t map_host_limit;
int can_map_host;
int pitch_alignment;
int cuDevId;
int cuDevArchitecture;
bool first_error;
struct CUDAMem {
CUDAMem() : texobject(0), array(0), use_mapped_host(false)
{
}
CUtexObject texobject;
CUarray array;
/* If true, a mapped host memory in shared_pointer is being used. */
bool use_mapped_host;
};
typedef map<device_memory *, CUDAMem> CUDAMemMap;
CUDAMemMap cuda_mem_map;
thread_mutex cuda_mem_map_mutex;
/* Bindless Textures */
device_vector<TextureInfo> texture_info;
bool need_texture_info;
CUDADeviceKernels kernels;
static bool have_precompiled_kernels();
@@ -88,17 +63,13 @@ class CUDADevice : public Device {
void reserve_local_memory(const uint kernel_features);
void init_host_memory();
void load_texture_info();
void move_textures_to_host(size_t size, bool for_texture);
CUDAMem *generic_alloc(device_memory &mem, size_t pitch_padding = 0);
void generic_copy_to(device_memory &mem);
void generic_free(device_memory &mem);
virtual void get_device_memory_info(size_t &total, size_t &free) override;
virtual bool alloc_device(void *&device_pointer, size_t size) override;
virtual void free_device(void *device_pointer) override;
virtual bool alloc_host(void *&shared_pointer, size_t size) override;
virtual void free_host(void *shared_pointer) override;
virtual bool transform_host_pointer(void *&device_pointer, void *&shared_pointer) override;
virtual void copy_host_to_device(void *device_pointer, void *host_pointer, size_t size) override;
void mem_alloc(device_memory &mem) override;

View File

@@ -452,6 +452,320 @@ void *Device::get_cpu_osl_memory()
return nullptr;
}
GPUDevice::~GPUDevice() noexcept(false)
{
}
bool GPUDevice::load_texture_info()
{
if (need_texture_info) {
/* Unset flag before copying, so this does not loop indefinitely if the copy below calls
* into 'move_textures_to_host' (which calls 'load_texture_info' again). */
need_texture_info = false;
texture_info.copy_to_device();
return true;
}
else {
return false;
}
}
void GPUDevice::init_host_memory(size_t preferred_texture_headroom,
size_t preferred_working_headroom)
{
/* Limit amount of host mapped memory, because allocating too much can
* cause system instability. Leave at least half or 4 GB of system
* memory free, whichever is smaller. */
size_t default_limit = 4 * 1024 * 1024 * 1024LL;
size_t system_ram = system_physical_ram();
if (system_ram > 0) {
if (system_ram / 2 > default_limit) {
map_host_limit = system_ram - default_limit;
}
else {
map_host_limit = system_ram / 2;
}
}
else {
VLOG_WARNING << "Mapped host memory disabled, failed to get system RAM";
map_host_limit = 0;
}
/* Amount of device memory to keep free after texture memory
* and working memory allocations respectively. We set the working
* memory limit headroom lower than the working one so there
* is space left for it. */
device_working_headroom = preferred_working_headroom > 0 ? preferred_working_headroom :
32 * 1024 * 1024LL; // 32MB
device_texture_headroom = preferred_texture_headroom > 0 ? preferred_texture_headroom :
128 * 1024 * 1024LL; // 128MB
VLOG_INFO << "Mapped host memory limit set to " << string_human_readable_number(map_host_limit)
<< " bytes. (" << string_human_readable_size(map_host_limit) << ")";
}
void GPUDevice::move_textures_to_host(size_t size, bool for_texture)
{
/* Break out of recursive call, which can happen when moving memory on a multi device. */
static bool any_device_moving_textures_to_host = false;
if (any_device_moving_textures_to_host) {
return;
}
/* Signal to reallocate textures in host memory only. */
move_texture_to_host = true;
while (size > 0) {
/* Find suitable memory allocation to move. */
device_memory *max_mem = NULL;
size_t max_size = 0;
bool max_is_image = false;
thread_scoped_lock lock(device_mem_map_mutex);
foreach (MemMap::value_type &pair, device_mem_map) {
device_memory &mem = *pair.first;
Mem *cmem = &pair.second;
/* Can only move textures allocated on this device (and not those from peer devices).
* And need to ignore memory that is already on the host. */
if (!mem.is_resident(this) || cmem->use_mapped_host) {
continue;
}
bool is_texture = (mem.type == MEM_TEXTURE || mem.type == MEM_GLOBAL) &&
(&mem != &texture_info);
bool is_image = is_texture && (mem.data_height > 1);
/* Can't move this type of memory. */
if (!is_texture || cmem->array) {
continue;
}
/* For other textures, only move image textures. */
if (for_texture && !is_image) {
continue;
}
/* Try to move largest allocation, prefer moving images. */
if (is_image > max_is_image || (is_image == max_is_image && mem.device_size > max_size)) {
max_is_image = is_image;
max_size = mem.device_size;
max_mem = &mem;
}
}
lock.unlock();
/* Move to host memory. This part is mutex protected since
* multiple backend devices could be moving the memory. The
* first one will do it, and the rest will adopt the pointer. */
if (max_mem) {
VLOG_WORK << "Move memory from device to host: " << max_mem->name;
static thread_mutex move_mutex;
thread_scoped_lock lock(move_mutex);
any_device_moving_textures_to_host = true;
/* Potentially need to call back into multi device, so pointer mapping
* and peer devices are updated. This is also necessary since the device
* pointer may just be a key here, so cannot be accessed and freed directly.
* Unfortunately it does mean that memory is reallocated on all other
* devices as well, which is potentially dangerous when still in use (since
* a thread rendering on another devices would only be caught in this mutex
* if it so happens to do an allocation at the same time as well. */
max_mem->device_copy_to();
size = (max_size >= size) ? 0 : size - max_size;
any_device_moving_textures_to_host = false;
}
else {
break;
}
}
/* Unset flag before texture info is reloaded, since it should stay in device memory. */
move_texture_to_host = false;
/* Update texture info array with new pointers. */
load_texture_info();
}
GPUDevice::Mem *GPUDevice::generic_alloc(device_memory &mem, size_t pitch_padding)
{
void *device_pointer = 0;
size_t size = mem.memory_size() + pitch_padding;
bool mem_alloc_result = false;
const char *status = "";
/* First try allocating in device memory, respecting headroom. We make
* an exception for texture info. It is small and frequently accessed,
* so treat it as working memory.
*
* If there is not enough room for working memory, we will try to move
* textures to host memory, assuming the performance impact would have
* been worse for working memory. */
bool is_texture = (mem.type == MEM_TEXTURE || mem.type == MEM_GLOBAL) && (&mem != &texture_info);
bool is_image = is_texture && (mem.data_height > 1);
size_t headroom = (is_texture) ? device_texture_headroom : device_working_headroom;
size_t total = 0, free = 0;
get_device_memory_info(total, free);
/* Move textures to host memory if needed. */
if (!move_texture_to_host && !is_image && (size + headroom) >= free && can_map_host) {
move_textures_to_host(size + headroom - free, is_texture);
get_device_memory_info(total, free);
}
/* Allocate in device memory. */
if (!move_texture_to_host && (size + headroom) < free) {
mem_alloc_result = alloc_device(device_pointer, size);
if (mem_alloc_result) {
device_mem_in_use += size;
status = " in device memory";
}
}
/* Fall back to mapped host memory if needed and possible. */
void *shared_pointer = 0;
if (!mem_alloc_result && can_map_host && mem.type != MEM_DEVICE_ONLY) {
if (mem.shared_pointer) {
/* Another device already allocated host memory. */
mem_alloc_result = true;
shared_pointer = mem.shared_pointer;
}
else if (map_host_used + size < map_host_limit) {
/* Allocate host memory ourselves. */
mem_alloc_result = alloc_host(shared_pointer, size);
assert((mem_alloc_result && shared_pointer != 0) ||
(!mem_alloc_result && shared_pointer == 0));
}
if (mem_alloc_result) {
assert(transform_host_pointer(device_pointer, shared_pointer));
map_host_used += size;
status = " in host memory";
}
}
if (!mem_alloc_result) {
if (mem.type == MEM_DEVICE_ONLY) {
status = " failed, out of device memory";
set_error("System is out of GPU memory");
}
else {
status = " failed, out of device and host memory";
set_error("System is out of GPU and shared host memory");
}
}
if (mem.name) {
VLOG_WORK << "Buffer allocate: " << mem.name << ", "
<< string_human_readable_number(mem.memory_size()) << " bytes. ("
<< string_human_readable_size(mem.memory_size()) << ")" << status;
}
mem.device_pointer = (device_ptr)device_pointer;
mem.device_size = size;
stats.mem_alloc(size);
if (!mem.device_pointer) {
return NULL;
}
/* Insert into map of allocations. */
thread_scoped_lock lock(device_mem_map_mutex);
Mem *cmem = &device_mem_map[&mem];
if (shared_pointer != 0) {
/* Replace host pointer with our host allocation. Only works if
* memory layout is the same and has no pitch padding. Also
* does not work if we move textures to host during a render,
* since other devices might be using the memory. */
if (!move_texture_to_host && pitch_padding == 0 && mem.host_pointer &&
mem.host_pointer != shared_pointer) {
memcpy(shared_pointer, mem.host_pointer, size);
/* A Call to device_memory::host_free() should be preceded by
* a call to device_memory::device_free() for host memory
* allocated by a device to be handled properly. Two exceptions
* are here and a call in OptiXDevice::generic_alloc(), where
* the current host memory can be assumed to be allocated by
* device_memory::host_alloc(), not by a device */
mem.host_free();
mem.host_pointer = shared_pointer;
}
mem.shared_pointer = shared_pointer;
mem.shared_counter++;
cmem->use_mapped_host = true;
}
else {
cmem->use_mapped_host = false;
}
return cmem;
}
void GPUDevice::generic_free(device_memory &mem)
{
if (mem.device_pointer) {
thread_scoped_lock lock(device_mem_map_mutex);
DCHECK(device_mem_map.find(&mem) != device_mem_map.end());
const Mem &cmem = device_mem_map[&mem];
/* If cmem.use_mapped_host is true, reference counting is used
* to safely free a mapped host memory. */
if (cmem.use_mapped_host) {
assert(mem.shared_pointer);
if (mem.shared_pointer) {
assert(mem.shared_counter > 0);
if (--mem.shared_counter == 0) {
if (mem.host_pointer == mem.shared_pointer) {
mem.host_pointer = 0;
}
free_host(mem.shared_pointer);
mem.shared_pointer = 0;
}
}
map_host_used -= mem.device_size;
}
else {
/* Free device memory. */
free_device((void *)mem.device_pointer);
device_mem_in_use -= mem.device_size;
}
stats.mem_free(mem.device_size);
mem.device_pointer = 0;
mem.device_size = 0;
device_mem_map.erase(device_mem_map.find(&mem));
}
}
void GPUDevice::generic_copy_to(device_memory &mem)
{
if (!mem.host_pointer || !mem.device_pointer) {
return;
}
/* If use_mapped_host of mem is false, the current device only uses device memory allocated by
* backend device allocation regardless of mem.host_pointer and mem.shared_pointer, and should
* copy data from mem.host_pointer. */
thread_scoped_lock lock(device_mem_map_mutex);
if (!device_mem_map[&mem].use_mapped_host || mem.host_pointer != mem.shared_pointer) {
copy_host_to_device((void *)mem.device_pointer, mem.host_pointer, mem.memory_size());
}
}
/* DeviceInfo */
CCL_NAMESPACE_END

View File

@@ -309,6 +309,93 @@ class Device {
static uint devices_initialized_mask;
};
/* Device, which is GPU, with some common functionality for GPU backends */
class GPUDevice : public Device {
protected:
GPUDevice(const DeviceInfo &info_, Stats &stats_, Profiler &profiler_)
: Device(info_, stats_, profiler_),
texture_info(this, "texture_info", MEM_GLOBAL),
need_texture_info(false),
can_map_host(false),
map_host_used(0),
map_host_limit(0),
device_texture_headroom(0),
device_working_headroom(0),
device_mem_map(),
device_mem_map_mutex(),
move_texture_to_host(false),
device_mem_in_use(0)
{
}
public:
virtual ~GPUDevice() noexcept(false);
/* For GPUs that can use bindless textures in some way or another. */
device_vector<TextureInfo> texture_info;
bool need_texture_info;
/* Returns true if the texture info was copied to the device (meaning, some more
* re-initialization might be needed). */
virtual bool load_texture_info();
protected:
/* Memory allocation, only accessed through device_memory. */
friend class device_memory;
bool can_map_host;
size_t map_host_used;
size_t map_host_limit;
size_t device_texture_headroom;
size_t device_working_headroom;
typedef unsigned long long texMemObject;
typedef unsigned long long arrayMemObject;
struct Mem {
Mem() : texobject(0), array(0), use_mapped_host(false)
{
}
texMemObject texobject;
arrayMemObject array;
/* If true, a mapped host memory in shared_pointer is being used. */
bool use_mapped_host;
};
typedef map<device_memory *, Mem> MemMap;
MemMap device_mem_map;
thread_mutex device_mem_map_mutex;
bool move_texture_to_host;
/* Simple counter which will try to track amount of used device memory */
size_t device_mem_in_use;
virtual void init_host_memory(size_t preferred_texture_headroom = 0,
size_t preferred_working_headroom = 0);
virtual void move_textures_to_host(size_t size, bool for_texture);
/* Allocation, deallocation and copy functions, with corresponding
* support of device/host allocations. */
virtual GPUDevice::Mem *generic_alloc(device_memory &mem, size_t pitch_padding = 0);
virtual void generic_free(device_memory &mem);
virtual void generic_copy_to(device_memory &mem);
/* total - amount of device memory, free - amount of available device memory */
virtual void get_device_memory_info(size_t &total, size_t &free) = 0;
virtual bool alloc_device(void *&device_pointer, size_t size) = 0;
virtual void free_device(void *device_pointer) = 0;
virtual bool alloc_host(void *&shared_pointer, size_t size) = 0;
virtual void free_host(void *shared_pointer) = 0;
/* This function should return device pointer corresponding to shared pointer, which
* is host buffer, allocated in `alloc_host`. The function should `true`, if such
* address transformation is possible and `false` otherwise. */
virtual bool transform_host_pointer(void *&device_pointer, void *&shared_pointer) = 0;
virtual void copy_host_to_device(void *device_pointer, void *host_pointer, size_t size) = 0;
};
CCL_NAMESPACE_END
#endif /* __DEVICE_H__ */

View File

@@ -53,8 +53,12 @@ void HIPDevice::set_error(const string &error)
}
HIPDevice::HIPDevice(const DeviceInfo &info, Stats &stats, Profiler &profiler)
: Device(info, stats, profiler), texture_info(this, "texture_info", MEM_GLOBAL)
: GPUDevice(info, stats, profiler)
{
/* Verify that base class types can be used with specific backend types */
static_assert(sizeof(texMemObject) == sizeof(hipTextureObject_t));
static_assert(sizeof(arrayMemObject) == sizeof(hArray));
first_error = true;
hipDevId = info.num;
@@ -65,12 +69,6 @@ HIPDevice::HIPDevice(const DeviceInfo &info, Stats &stats, Profiler &profiler)
need_texture_info = false;
device_texture_headroom = 0;
device_working_headroom = 0;
move_texture_to_host = false;
map_host_limit = 0;
map_host_used = 0;
can_map_host = 0;
pitch_alignment = 0;
/* Initialize HIP. */
@@ -91,7 +89,9 @@ HIPDevice::HIPDevice(const DeviceInfo &info, Stats &stats, Profiler &profiler)
/* hipDeviceMapHost for mapping host memory when out of device memory.
* hipDeviceLmemResizeToMax for reserving local memory ahead of render,
* so we can predict which memory to map to host. */
hip_assert(hipDeviceGetAttribute(&can_map_host, hipDeviceAttributeCanMapHostMemory, hipDevice));
int value;
hip_assert(hipDeviceGetAttribute(&value, hipDeviceAttributeCanMapHostMemory, hipDevice));
can_map_host = value != 0;
hip_assert(
hipDeviceGetAttribute(&pitch_alignment, hipDeviceAttributeTexturePitchAlignment, hipDevice));
@@ -460,305 +460,58 @@ void HIPDevice::reserve_local_memory(const uint kernel_features)
# endif
}
void HIPDevice::init_host_memory()
{
/* Limit amount of host mapped memory, because allocating too much can
* cause system instability. Leave at least half or 4 GB of system
* memory free, whichever is smaller. */
size_t default_limit = 4 * 1024 * 1024 * 1024LL;
size_t system_ram = system_physical_ram();
if (system_ram > 0) {
if (system_ram / 2 > default_limit) {
map_host_limit = system_ram - default_limit;
}
else {
map_host_limit = system_ram / 2;
}
}
else {
VLOG_WARNING << "Mapped host memory disabled, failed to get system RAM";
map_host_limit = 0;
}
/* Amount of device memory to keep is free after texture memory
* and working memory allocations respectively. We set the working
* memory limit headroom lower so that some space is left after all
* texture memory allocations. */
device_working_headroom = 32 * 1024 * 1024LL; // 32MB
device_texture_headroom = 128 * 1024 * 1024LL; // 128MB
VLOG_INFO << "Mapped host memory limit set to " << string_human_readable_number(map_host_limit)
<< " bytes. (" << string_human_readable_size(map_host_limit) << ")";
}
void HIPDevice::load_texture_info()
{
if (need_texture_info) {
/* Unset flag before copying, so this does not loop indefinitely if the copy below calls
* into 'move_textures_to_host' (which calls 'load_texture_info' again). */
need_texture_info = false;
texture_info.copy_to_device();
}
}
void HIPDevice::move_textures_to_host(size_t size, bool for_texture)
{
/* Break out of recursive call, which can happen when moving memory on a multi device. */
static bool any_device_moving_textures_to_host = false;
if (any_device_moving_textures_to_host) {
return;
}
/* Signal to reallocate textures in host memory only. */
move_texture_to_host = true;
while (size > 0) {
/* Find suitable memory allocation to move. */
device_memory *max_mem = NULL;
size_t max_size = 0;
bool max_is_image = false;
thread_scoped_lock lock(hip_mem_map_mutex);
foreach (HIPMemMap::value_type &pair, hip_mem_map) {
device_memory &mem = *pair.first;
HIPMem *cmem = &pair.second;
/* Can only move textures allocated on this device (and not those from peer devices).
* And need to ignore memory that is already on the host. */
if (!mem.is_resident(this) || cmem->use_mapped_host) {
continue;
}
bool is_texture = (mem.type == MEM_TEXTURE || mem.type == MEM_GLOBAL) &&
(&mem != &texture_info);
bool is_image = is_texture && (mem.data_height > 1);
/* Can't move this type of memory. */
if (!is_texture || cmem->array) {
continue;
}
/* For other textures, only move image textures. */
if (for_texture && !is_image) {
continue;
}
/* Try to move largest allocation, prefer moving images. */
if (is_image > max_is_image || (is_image == max_is_image && mem.device_size > max_size)) {
max_is_image = is_image;
max_size = mem.device_size;
max_mem = &mem;
}
}
lock.unlock();
/* Move to host memory. This part is mutex protected since
* multiple HIP devices could be moving the memory. The
* first one will do it, and the rest will adopt the pointer. */
if (max_mem) {
VLOG_WORK << "Move memory from device to host: " << max_mem->name;
static thread_mutex move_mutex;
thread_scoped_lock lock(move_mutex);
any_device_moving_textures_to_host = true;
/* Potentially need to call back into multi device, so pointer mapping
* and peer devices are updated. This is also necessary since the device
* pointer may just be a key here, so cannot be accessed and freed directly.
* Unfortunately it does mean that memory is reallocated on all other
* devices as well, which is potentially dangerous when still in use (since
* a thread rendering on another devices would only be caught in this mutex
* if it so happens to do an allocation at the same time as well. */
max_mem->device_copy_to();
size = (max_size >= size) ? 0 : size - max_size;
any_device_moving_textures_to_host = false;
}
else {
break;
}
}
/* Unset flag before texture info is reloaded, since it should stay in device memory. */
move_texture_to_host = false;
/* Update texture info array with new pointers. */
load_texture_info();
}
HIPDevice::HIPMem *HIPDevice::generic_alloc(device_memory &mem, size_t pitch_padding)
void HIPDevice::get_device_memory_info(size_t &total, size_t &free)
{
HIPContextScope scope(this);
hipDeviceptr_t device_pointer = 0;
size_t size = mem.memory_size() + pitch_padding;
hipError_t mem_alloc_result = hipErrorOutOfMemory;
const char *status = "";
/* First try allocating in device memory, respecting headroom. We make
* an exception for texture info. It is small and frequently accessed,
* so treat it as working memory.
*
* If there is not enough room for working memory, we will try to move
* textures to host memory, assuming the performance impact would have
* been worse for working memory. */
bool is_texture = (mem.type == MEM_TEXTURE || mem.type == MEM_GLOBAL) && (&mem != &texture_info);
bool is_image = is_texture && (mem.data_height > 1);
size_t headroom = (is_texture) ? device_texture_headroom : device_working_headroom;
size_t total = 0, free = 0;
hipMemGetInfo(&free, &total);
/* Move textures to host memory if needed. */
if (!move_texture_to_host && !is_image && (size + headroom) >= free && can_map_host) {
move_textures_to_host(size + headroom - free, is_texture);
hipMemGetInfo(&free, &total);
}
/* Allocate in device memory. */
if (!move_texture_to_host && (size + headroom) < free) {
mem_alloc_result = hipMalloc(&device_pointer, size);
if (mem_alloc_result == hipSuccess) {
status = " in device memory";
}
}
/* Fall back to mapped host memory if needed and possible. */
void *shared_pointer = 0;
if (mem_alloc_result != hipSuccess && can_map_host) {
if (mem.shared_pointer) {
/* Another device already allocated host memory. */
mem_alloc_result = hipSuccess;
shared_pointer = mem.shared_pointer;
}
else if (map_host_used + size < map_host_limit) {
/* Allocate host memory ourselves. */
mem_alloc_result = hipHostMalloc(
&shared_pointer, size, hipHostMallocMapped | hipHostMallocWriteCombined);
assert((mem_alloc_result == hipSuccess && shared_pointer != 0) ||
(mem_alloc_result != hipSuccess && shared_pointer == 0));
}
if (mem_alloc_result == hipSuccess) {
hip_assert(hipHostGetDevicePointer(&device_pointer, shared_pointer, 0));
map_host_used += size;
status = " in host memory";
}
}
if (mem_alloc_result != hipSuccess) {
status = " failed, out of device and host memory";
set_error("System is out of GPU and shared host memory");
}
if (mem.name) {
VLOG_WORK << "Buffer allocate: " << mem.name << ", "
<< string_human_readable_number(mem.memory_size()) << " bytes. ("
<< string_human_readable_size(mem.memory_size()) << ")" << status;
}
mem.device_pointer = (device_ptr)device_pointer;
mem.device_size = size;
stats.mem_alloc(size);
if (!mem.device_pointer) {
return NULL;
}
/* Insert into map of allocations. */
thread_scoped_lock lock(hip_mem_map_mutex);
HIPMem *cmem = &hip_mem_map[&mem];
if (shared_pointer != 0) {
/* Replace host pointer with our host allocation. Only works if
* HIP memory layout is the same and has no pitch padding. Also
* does not work if we move textures to host during a render,
* since other devices might be using the memory. */
if (!move_texture_to_host && pitch_padding == 0 && mem.host_pointer &&
mem.host_pointer != shared_pointer) {
memcpy(shared_pointer, mem.host_pointer, size);
/* A Call to device_memory::host_free() should be preceded by
* a call to device_memory::device_free() for host memory
* allocated by a device to be handled properly. Two exceptions
* are here and a call in OptiXDevice::generic_alloc(), where
* the current host memory can be assumed to be allocated by
* device_memory::host_alloc(), not by a device */
mem.host_free();
mem.host_pointer = shared_pointer;
}
mem.shared_pointer = shared_pointer;
mem.shared_counter++;
cmem->use_mapped_host = true;
}
else {
cmem->use_mapped_host = false;
}
return cmem;
}
void HIPDevice::generic_copy_to(device_memory &mem)
bool HIPDevice::alloc_device(void *&device_pointer, size_t size)
{
if (!mem.host_pointer || !mem.device_pointer) {
return;
}
HIPContextScope scope(this);
/* If use_mapped_host of mem is false, the current device only uses device memory allocated by
* hipMalloc regardless of mem.host_pointer and mem.shared_pointer, and should copy data from
* mem.host_pointer. */
thread_scoped_lock lock(hip_mem_map_mutex);
if (!hip_mem_map[&mem].use_mapped_host || mem.host_pointer != mem.shared_pointer) {
const HIPContextScope scope(this);
hip_assert(
hipMemcpyHtoD((hipDeviceptr_t)mem.device_pointer, mem.host_pointer, mem.memory_size()));
}
hipError_t mem_alloc_result = hipMalloc((hipDeviceptr_t *)&device_pointer, size);
return mem_alloc_result == hipSuccess;
}
void HIPDevice::generic_free(device_memory &mem)
void HIPDevice::free_device(void *device_pointer)
{
if (mem.device_pointer) {
HIPContextScope scope(this);
thread_scoped_lock lock(hip_mem_map_mutex);
DCHECK(hip_mem_map.find(&mem) != hip_mem_map.end());
const HIPMem &cmem = hip_mem_map[&mem];
HIPContextScope scope(this);
/* If cmem.use_mapped_host is true, reference counting is used
* to safely free a mapped host memory. */
hip_assert(hipFree((hipDeviceptr_t)device_pointer));
}
if (cmem.use_mapped_host) {
assert(mem.shared_pointer);
if (mem.shared_pointer) {
assert(mem.shared_counter > 0);
if (--mem.shared_counter == 0) {
if (mem.host_pointer == mem.shared_pointer) {
mem.host_pointer = 0;
}
hipHostFree(mem.shared_pointer);
mem.shared_pointer = 0;
}
}
map_host_used -= mem.device_size;
}
else {
/* Free device memory. */
hip_assert(hipFree(mem.device_pointer));
}
bool HIPDevice::alloc_host(void *&shared_pointer, size_t size)
{
HIPContextScope scope(this);
stats.mem_free(mem.device_size);
mem.device_pointer = 0;
mem.device_size = 0;
hipError_t mem_alloc_result = hipHostMalloc(
&shared_pointer, size, hipHostMallocMapped | hipHostMallocWriteCombined);
hip_mem_map.erase(hip_mem_map.find(&mem));
}
return mem_alloc_result == hipSuccess;
}
void HIPDevice::free_host(void *shared_pointer)
{
HIPContextScope scope(this);
hipHostFree(shared_pointer);
}
bool HIPDevice::transform_host_pointer(void *&device_pointer, void *&shared_pointer)
{
HIPContextScope scope(this);
hip_assert(hipHostGetDevicePointer((hipDeviceptr_t *)&device_pointer, shared_pointer, 0));
return true;
}
void HIPDevice::copy_host_to_device(void *device_pointer, void *host_pointer, size_t size)
{
const HIPContextScope scope(this);
hip_assert(hipMemcpyHtoD((hipDeviceptr_t)device_pointer, host_pointer, size));
}
void HIPDevice::mem_alloc(device_memory &mem)
@@ -823,8 +576,8 @@ void HIPDevice::mem_zero(device_memory &mem)
/* If use_mapped_host of mem is false, mem.device_pointer currently refers to device memory
* regardless of mem.host_pointer and mem.shared_pointer. */
thread_scoped_lock lock(hip_mem_map_mutex);
if (!hip_mem_map[&mem].use_mapped_host || mem.host_pointer != mem.shared_pointer) {
thread_scoped_lock lock(device_mem_map_mutex);
if (!device_mem_map[&mem].use_mapped_host || mem.host_pointer != mem.shared_pointer) {
const HIPContextScope scope(this);
hip_assert(hipMemsetD8((hipDeviceptr_t)mem.device_pointer, 0, mem.memory_size()));
}
@@ -951,19 +704,19 @@ void HIPDevice::tex_alloc(device_texture &mem)
return;
}
HIPMem *cmem = NULL;
Mem *cmem = NULL;
hArray array_3d = NULL;
size_t src_pitch = mem.data_width * dsize * mem.data_elements;
size_t dst_pitch = src_pitch;
if (!mem.is_resident(this)) {
thread_scoped_lock lock(hip_mem_map_mutex);
cmem = &hip_mem_map[&mem];
thread_scoped_lock lock(device_mem_map_mutex);
cmem = &device_mem_map[&mem];
cmem->texobject = 0;
if (mem.data_depth > 1) {
array_3d = (hArray)mem.device_pointer;
cmem->array = array_3d;
cmem->array = reinterpret_cast<arrayMemObject>(array_3d);
}
else if (mem.data_height > 0) {
dst_pitch = align_up(src_pitch, pitch_alignment);
@@ -1007,10 +760,10 @@ void HIPDevice::tex_alloc(device_texture &mem)
mem.device_size = size;
stats.mem_alloc(size);
thread_scoped_lock lock(hip_mem_map_mutex);
cmem = &hip_mem_map[&mem];
thread_scoped_lock lock(device_mem_map_mutex);
cmem = &device_mem_map[&mem];
cmem->texobject = 0;
cmem->array = array_3d;
cmem->array = reinterpret_cast<arrayMemObject>(array_3d);
}
else if (mem.data_height > 0) {
/* 2D texture, using pitch aligned linear memory. */
@@ -1095,8 +848,8 @@ void HIPDevice::tex_alloc(device_texture &mem)
texDesc.filterMode = filter_mode;
texDesc.flags = HIP_TRSF_NORMALIZED_COORDINATES;
thread_scoped_lock lock(hip_mem_map_mutex);
cmem = &hip_mem_map[&mem];
thread_scoped_lock lock(device_mem_map_mutex);
cmem = &device_mem_map[&mem];
hip_assert(hipTexObjectCreate(&cmem->texobject, &resDesc, &texDesc, NULL));
@@ -1111,9 +864,9 @@ void HIPDevice::tex_free(device_texture &mem)
{
if (mem.device_pointer) {
HIPContextScope scope(this);
thread_scoped_lock lock(hip_mem_map_mutex);
DCHECK(hip_mem_map.find(&mem) != hip_mem_map.end());
const HIPMem &cmem = hip_mem_map[&mem];
thread_scoped_lock lock(device_mem_map_mutex);
DCHECK(device_mem_map.find(&mem) != device_mem_map.end());
const Mem &cmem = device_mem_map[&mem];
if (cmem.texobject) {
/* Free bindless texture. */
@@ -1122,16 +875,16 @@ void HIPDevice::tex_free(device_texture &mem)
if (!mem.is_resident(this)) {
/* Do not free memory here, since it was allocated on a different device. */
hip_mem_map.erase(hip_mem_map.find(&mem));
device_mem_map.erase(device_mem_map.find(&mem));
}
else if (cmem.array) {
/* Free array. */
hipArrayDestroy(cmem.array);
hipArrayDestroy(reinterpret_cast<hArray>(cmem.array));
stats.mem_free(mem.device_size);
mem.device_pointer = 0;
mem.device_size = 0;
hip_mem_map.erase(hip_mem_map.find(&mem));
device_mem_map.erase(device_mem_map.find(&mem));
}
else {
lock.unlock();

View File

@@ -18,7 +18,7 @@ CCL_NAMESPACE_BEGIN
class DeviceQueue;
class HIPDevice : public Device {
class HIPDevice : public GPUDevice {
friend class HIPContextScope;
@@ -26,36 +26,11 @@ class HIPDevice : public Device {
hipDevice_t hipDevice;
hipCtx_t hipContext;
hipModule_t hipModule;
size_t device_texture_headroom;
size_t device_working_headroom;
bool move_texture_to_host;
size_t map_host_used;
size_t map_host_limit;
int can_map_host;
int pitch_alignment;
int hipDevId;
int hipDevArchitecture;
bool first_error;
struct HIPMem {
HIPMem() : texobject(0), array(0), use_mapped_host(false)
{
}
hipTextureObject_t texobject;
hArray array;
/* If true, a mapped host memory in shared_pointer is being used. */
bool use_mapped_host;
};
typedef map<device_memory *, HIPMem> HIPMemMap;
HIPMemMap hip_mem_map;
thread_mutex hip_mem_map_mutex;
/* Bindless Textures */
device_vector<TextureInfo> texture_info;
bool need_texture_info;
HIPDeviceKernels kernels;
static bool have_precompiled_kernels();
@@ -81,17 +56,13 @@ class HIPDevice : public Device {
virtual bool load_kernels(const uint kernel_features) override;
void reserve_local_memory(const uint kernel_features);
void init_host_memory();
void load_texture_info();
void move_textures_to_host(size_t size, bool for_texture);
HIPMem *generic_alloc(device_memory &mem, size_t pitch_padding = 0);
void generic_copy_to(device_memory &mem);
void generic_free(device_memory &mem);
virtual void get_device_memory_info(size_t &total, size_t &free) override;
virtual bool alloc_device(void *&device_pointer, size_t size) override;
virtual void free_device(void *device_pointer) override;
virtual bool alloc_host(void *&shared_pointer, size_t size) override;
virtual void free_host(void *shared_pointer) override;
virtual bool transform_host_pointer(void *&device_pointer, void *&shared_pointer) override;
virtual void copy_host_to_device(void *device_pointer, void *host_pointer, size_t size) override;
void mem_alloc(device_memory &mem) override;

View File

@@ -73,6 +73,10 @@ const char *device_kernel_as_string(DeviceKernel kernel)
return "integrator_terminated_paths_array";
case DEVICE_KERNEL_INTEGRATOR_SORTED_PATHS_ARRAY:
return "integrator_sorted_paths_array";
case DEVICE_KERNEL_INTEGRATOR_SORT_BUCKET_PASS:
return "integrator_sort_bucket_pass";
case DEVICE_KERNEL_INTEGRATOR_SORT_WRITE_PASS:
return "integrator_sort_write_pass";
case DEVICE_KERNEL_INTEGRATOR_COMPACT_PATHS_ARRAY:
return "integrator_compact_paths_array";
case DEVICE_KERNEL_INTEGRATOR_COMPACT_STATES:

View File

@@ -247,6 +247,8 @@ class device_memory {
bool is_resident(Device *sub_device) const;
protected:
friend class Device;
friend class GPUDevice;
friend class CUDADevice;
friend class OptiXDevice;
friend class HIPDevice;

View File

@@ -21,6 +21,7 @@ class BVHMetal : public BVH {
API_AVAILABLE(macos(11.0))
vector<id<MTLAccelerationStructure>> blas_array;
vector<uint32_t> blas_lookup;
bool motion_blur = false;

View File

@@ -816,6 +816,11 @@ bool BVHMetal::build_TLAS(Progress &progress,
uint32_t instance_index = 0;
uint32_t motion_transform_index = 0;
// allocate look up buffer for wost case scenario
uint64_t count = objects.size();
blas_lookup.resize(count);
for (Object *ob : objects) {
/* Skip non-traceable objects */
if (!ob->is_traceable())
@@ -843,12 +848,15 @@ bool BVHMetal::build_TLAS(Progress &progress,
/* Set user instance ID to object index */
int object_index = ob->get_device_index();
uint32_t user_id = uint32_t(object_index);
int currIndex = instance_index++;
assert(user_id < blas_lookup.size());
blas_lookup[user_id] = accel_struct_index;
/* Bake into the appropriate descriptor */
if (motion_blur) {
MTLAccelerationStructureMotionInstanceDescriptor *instances =
(MTLAccelerationStructureMotionInstanceDescriptor *)[instanceBuf contents];
MTLAccelerationStructureMotionInstanceDescriptor &desc = instances[instance_index++];
MTLAccelerationStructureMotionInstanceDescriptor &desc = instances[currIndex];
desc.accelerationStructureIndex = accel_struct_index;
desc.userID = user_id;
@@ -894,7 +902,7 @@ bool BVHMetal::build_TLAS(Progress &progress,
else {
MTLAccelerationStructureUserIDInstanceDescriptor *instances =
(MTLAccelerationStructureUserIDInstanceDescriptor *)[instanceBuf contents];
MTLAccelerationStructureUserIDInstanceDescriptor &desc = instances[instance_index++];
MTLAccelerationStructureUserIDInstanceDescriptor &desc = instances[currIndex];
desc.accelerationStructureIndex = accel_struct_index;
desc.userID = user_id;

View File

@@ -55,6 +55,10 @@ void device_metal_info(vector<DeviceInfo> &devices)
info.denoisers = DENOISER_NONE;
info.id = id;
if (MetalInfo::get_device_vendor(device) == METAL_GPU_AMD) {
info.has_light_tree = false;
}
devices.push_back(info);
device_index++;
}

View File

@@ -74,6 +74,11 @@ class MetalDevice : public Device {
id<MTLBuffer> texture_bindings_3d = nil;
std::vector<id<MTLTexture>> texture_slot_map;
/* BLAS encoding & lookup */
id<MTLArgumentEncoder> mtlBlasArgEncoder = nil;
id<MTLBuffer> blas_buffer = nil;
id<MTLBuffer> blas_lookup_buffer = nil;
bool use_metalrt = false;
MetalPipelineType kernel_specialization_level = PSO_GENERIC;
@@ -105,6 +110,8 @@ class MetalDevice : public Device {
bool use_adaptive_compilation();
bool use_local_atomic_sort() const;
bool make_source_and_check_if_compile_needed(MetalPipelineType pso_type);
void make_source(MetalPipelineType pso_type, const uint kernel_features);

View File

@@ -192,6 +192,10 @@ MetalDevice::MetalDevice(const DeviceInfo &info, Stats &stats, Profiler &profile
arg_desc_as.dataType = MTLDataTypeInstanceAccelerationStructure;
arg_desc_as.access = MTLArgumentAccessReadOnly;
MTLArgumentDescriptor *arg_desc_ptrs = [[MTLArgumentDescriptor alloc] init];
arg_desc_ptrs.dataType = MTLDataTypePointer;
arg_desc_ptrs.access = MTLArgumentAccessReadOnly;
MTLArgumentDescriptor *arg_desc_ift = [[MTLArgumentDescriptor alloc] init];
arg_desc_ift.dataType = MTLDataTypeIntersectionFunctionTable;
arg_desc_ift.access = MTLArgumentAccessReadOnly;
@@ -204,14 +208,28 @@ MetalDevice::MetalDevice(const DeviceInfo &info, Stats &stats, Profiler &profile
[ancillary_desc addObject:[arg_desc_ift copy]]; /* ift_shadow */
arg_desc_ift.index = index++;
[ancillary_desc addObject:[arg_desc_ift copy]]; /* ift_local */
arg_desc_ift.index = index++;
[ancillary_desc addObject:[arg_desc_ift copy]]; /* ift_local_prim */
arg_desc_ptrs.index = index++;
[ancillary_desc addObject:[arg_desc_ptrs copy]]; /* blas array */
arg_desc_ptrs.index = index++;
[ancillary_desc addObject:[arg_desc_ptrs copy]]; /* look up table for blas */
[arg_desc_ift release];
[arg_desc_as release];
[arg_desc_ptrs release];
}
}
mtlAncillaryArgEncoder = [mtlDevice newArgumentEncoderWithArguments:ancillary_desc];
// preparing the blas arg encoder
MTLArgumentDescriptor *arg_desc_blas = [[MTLArgumentDescriptor alloc] init];
arg_desc_blas.dataType = MTLDataTypeInstanceAccelerationStructure;
arg_desc_blas.access = MTLArgumentAccessReadOnly;
mtlBlasArgEncoder = [mtlDevice newArgumentEncoderWithArguments:@[ arg_desc_blas ]];
[arg_desc_blas release];
for (int i = 0; i < ancillary_desc.count; i++) {
[ancillary_desc[i] release];
}
@@ -271,6 +289,11 @@ bool MetalDevice::use_adaptive_compilation()
return DebugFlags().metal.adaptive_compile;
}
bool MetalDevice::use_local_atomic_sort() const
{
return DebugFlags().metal.use_local_atomic_sort;
}
void MetalDevice::make_source(MetalPipelineType pso_type, const uint kernel_features)
{
string global_defines;
@@ -278,6 +301,10 @@ void MetalDevice::make_source(MetalPipelineType pso_type, const uint kernel_feat
global_defines += "#define __KERNEL_FEATURES__ " + to_string(kernel_features) + "\n";
}
if (use_local_atomic_sort()) {
global_defines += "#define __KERNEL_LOCAL_ATOMIC_SORT__\n";
}
if (use_metalrt) {
global_defines += "#define __METALRT__\n";
if (motion_blur) {
@@ -1231,6 +1258,33 @@ void MetalDevice::build_bvh(BVH *bvh, Progress &progress, bool refit)
if (@available(macos 11.0, *)) {
if (bvh->params.top_level) {
bvhMetalRT = bvh_metal;
// allocate required buffers for BLAS array
uint64_t count = bvhMetalRT->blas_array.size();
uint64_t bufferSize = mtlBlasArgEncoder.encodedLength * count;
blas_buffer = [mtlDevice newBufferWithLength:bufferSize options:default_storage_mode];
stats.mem_alloc(blas_buffer.allocatedSize);
for (uint64_t i = 0; i < count; ++i) {
[mtlBlasArgEncoder setArgumentBuffer:blas_buffer
offset:i * mtlBlasArgEncoder.encodedLength];
[mtlBlasArgEncoder setAccelerationStructure:bvhMetalRT->blas_array[i] atIndex:0];
}
count = bvhMetalRT->blas_lookup.size();
bufferSize = sizeof(uint32_t) * count;
blas_lookup_buffer = [mtlDevice newBufferWithLength:bufferSize
options:default_storage_mode];
stats.mem_alloc(blas_lookup_buffer.allocatedSize);
memcpy([blas_lookup_buffer contents],
bvhMetalRT -> blas_lookup.data(),
blas_lookup_buffer.allocatedSize);
if (default_storage_mode == MTLResourceStorageModeManaged) {
[blas_buffer didModifyRange:NSMakeRange(0, blas_buffer.length)];
[blas_lookup_buffer didModifyRange:NSMakeRange(0, blas_lookup_buffer.length)];
}
}
}
}

View File

@@ -19,6 +19,8 @@ enum {
METALRT_FUNC_SHADOW_BOX,
METALRT_FUNC_LOCAL_TRI,
METALRT_FUNC_LOCAL_BOX,
METALRT_FUNC_LOCAL_TRI_PRIM,
METALRT_FUNC_LOCAL_BOX_PRIM,
METALRT_FUNC_CURVE_RIBBON,
METALRT_FUNC_CURVE_RIBBON_SHADOW,
METALRT_FUNC_CURVE_ALL,
@@ -28,7 +30,13 @@ enum {
METALRT_FUNC_NUM
};
enum { METALRT_TABLE_DEFAULT, METALRT_TABLE_SHADOW, METALRT_TABLE_LOCAL, METALRT_TABLE_NUM };
enum {
METALRT_TABLE_DEFAULT,
METALRT_TABLE_SHADOW,
METALRT_TABLE_LOCAL,
METALRT_TABLE_LOCAL_PRIM,
METALRT_TABLE_NUM
};
/* Pipeline State Object types */
enum MetalPipelineType {

View File

@@ -87,6 +87,9 @@ struct ShaderCache {
break;
}
}
occupancy_tuning[DEVICE_KERNEL_INTEGRATOR_SORT_BUCKET_PASS] = {1024, 1024};
occupancy_tuning[DEVICE_KERNEL_INTEGRATOR_SORT_WRITE_PASS] = {1024, 1024};
}
~ShaderCache();
@@ -521,6 +524,8 @@ void MetalKernelPipeline::compile()
"__anyhit__cycles_metalrt_shadow_all_hit_box",
"__anyhit__cycles_metalrt_local_hit_tri",
"__anyhit__cycles_metalrt_local_hit_box",
"__anyhit__cycles_metalrt_local_hit_tri_prim",
"__anyhit__cycles_metalrt_local_hit_box_prim",
"__intersection__curve_ribbon",
"__intersection__curve_ribbon_shadow",
"__intersection__curve_all",
@@ -611,11 +616,17 @@ void MetalKernelPipeline::compile()
rt_intersection_function[METALRT_FUNC_LOCAL_BOX],
rt_intersection_function[METALRT_FUNC_LOCAL_BOX],
nil];
table_functions[METALRT_TABLE_LOCAL_PRIM] = [NSArray
arrayWithObjects:rt_intersection_function[METALRT_FUNC_LOCAL_TRI_PRIM],
rt_intersection_function[METALRT_FUNC_LOCAL_BOX_PRIM],
rt_intersection_function[METALRT_FUNC_LOCAL_BOX_PRIM],
nil];
NSMutableSet *unique_functions = [NSMutableSet
setWithArray:table_functions[METALRT_TABLE_DEFAULT]];
[unique_functions addObjectsFromArray:table_functions[METALRT_TABLE_SHADOW]];
[unique_functions addObjectsFromArray:table_functions[METALRT_TABLE_LOCAL]];
[unique_functions addObjectsFromArray:table_functions[METALRT_TABLE_LOCAL_PRIM]];
if (kernel_has_intersection(device_kernel)) {
linked_functions = [[NSArray arrayWithArray:[unique_functions allObjects]]

View File

@@ -25,6 +25,7 @@ class MetalDeviceQueue : public DeviceQueue {
virtual int num_concurrent_states(const size_t) const override;
virtual int num_concurrent_busy_states(const size_t) const override;
virtual int num_sort_partition_elements() const override;
virtual bool supports_local_atomic_sort() const override;
virtual void init_execution() override;

View File

@@ -315,6 +315,11 @@ int MetalDeviceQueue::num_sort_partition_elements() const
return MetalInfo::optimal_sort_partition_elements(metal_device_->mtlDevice);
}
bool MetalDeviceQueue::supports_local_atomic_sort() const
{
return metal_device_->use_local_atomic_sort();
}
void MetalDeviceQueue::init_execution()
{
/* Synchronize all textures and memory copies before executing task. */
@@ -477,6 +482,12 @@ bool MetalDeviceQueue::enqueue(DeviceKernel kernel,
if (metal_device_->bvhMetalRT) {
id<MTLAccelerationStructure> accel_struct = metal_device_->bvhMetalRT->accel_struct;
[metal_device_->mtlAncillaryArgEncoder setAccelerationStructure:accel_struct atIndex:2];
[metal_device_->mtlAncillaryArgEncoder setBuffer:metal_device_->blas_buffer
offset:0
atIndex:7];
[metal_device_->mtlAncillaryArgEncoder setBuffer:metal_device_->blas_lookup_buffer
offset:0
atIndex:8];
}
for (int table = 0; table < METALRT_TABLE_NUM; table++) {
@@ -527,6 +538,10 @@ bool MetalDeviceQueue::enqueue(DeviceKernel kernel,
if (bvhMetalRT) {
/* Mark all Accelerations resources as used */
[mtlComputeCommandEncoder useResource:bvhMetalRT->accel_struct usage:MTLResourceUsageRead];
[mtlComputeCommandEncoder useResource:metal_device_->blas_buffer
usage:MTLResourceUsageRead];
[mtlComputeCommandEncoder useResource:metal_device_->blas_lookup_buffer
usage:MTLResourceUsageRead];
[mtlComputeCommandEncoder useResources:bvhMetalRT->blas_array.data()
count:bvhMetalRT->blas_array.size()
usage:MTLResourceUsageRead];
@@ -553,13 +568,24 @@ bool MetalDeviceQueue::enqueue(DeviceKernel kernel,
/* See parallel_active_index.h for why this amount of shared memory is needed.
* Rounded up to 16 bytes for Metal */
shared_mem_bytes = (int)round_up((num_threads_per_block + 1) * sizeof(int), 16);
[mtlComputeCommandEncoder setThreadgroupMemoryLength:shared_mem_bytes atIndex:0];
break;
case DEVICE_KERNEL_INTEGRATOR_SORT_BUCKET_PASS:
case DEVICE_KERNEL_INTEGRATOR_SORT_WRITE_PASS: {
int key_count = metal_device_->launch_params.data.max_shaders;
shared_mem_bytes = (int)round_up(key_count * sizeof(int), 16);
break;
}
default:
break;
}
if (shared_mem_bytes) {
assert(shared_mem_bytes <= 32 * 1024);
[mtlComputeCommandEncoder setThreadgroupMemoryLength:shared_mem_bytes atIndex:0];
}
MTLSize size_threadgroups_per_dispatch = MTLSizeMake(
divide_up(work_size, num_threads_per_block), 1, 1);
MTLSize size_threads_per_threadgroup = MTLSizeMake(num_threads_per_block, 1, 1);

View File

@@ -64,6 +64,12 @@ MetalGPUVendor MetalInfo::get_device_vendor(id<MTLDevice> device)
return METAL_GPU_INTEL;
}
else if (strstr(device_name, "AMD")) {
/* Setting this env var hides AMD devices thus exposing any integrated Intel devices. */
if (auto str = getenv("CYCLES_METAL_FORCE_INTEL")) {
if (atoi(str)) {
return METAL_GPU_UNKNOWN;
}
}
return METAL_GPU_AMD;
}
else if (strstr(device_name, "Apple")) {
@@ -96,6 +102,15 @@ vector<id<MTLDevice>> const &MetalInfo::get_usable_devices()
return usable_devices;
}
/* If the system has both an AMD GPU (discrete) and an Intel one (integrated), prefer the AMD
* one. This can be overridden with CYCLES_METAL_FORCE_INTEL. */
bool has_usable_amd_gpu = false;
if (@available(macos 12.3, *)) {
for (id<MTLDevice> device in MTLCopyAllDevices()) {
has_usable_amd_gpu |= (get_device_vendor(device) == METAL_GPU_AMD);
}
}
metal_printf("Usable Metal devices:\n");
for (id<MTLDevice> device in MTLCopyAllDevices()) {
string device_name = get_device_name(device);
@@ -111,8 +126,10 @@ vector<id<MTLDevice>> const &MetalInfo::get_usable_devices()
}
# if defined(MAC_OS_VERSION_13_0)
if (@available(macos 13.0, *)) {
usable |= (vendor == METAL_GPU_INTEL);
if (!has_usable_amd_gpu) {
if (@available(macos 13.0, *)) {
usable |= (vendor == METAL_GPU_INTEL);
}
}
# endif

View File

@@ -854,12 +854,14 @@ bool OptiXDevice::load_osl_kernels()
context, group_descs, 2, &group_options, nullptr, 0, &osl_groups[i * 2]));
}
OptixStackSizes stack_size[NUM_PROGRAM_GROUPS] = {};
vector<OptixStackSizes> osl_stack_size(osl_groups.size());
/* Update SBT with new entries. */
sbt_data.alloc(NUM_PROGRAM_GROUPS + osl_groups.size());
for (int i = 0; i < NUM_PROGRAM_GROUPS; ++i) {
optix_assert(optixSbtRecordPackHeader(groups[i], &sbt_data[i]));
optix_assert(optixProgramGroupGetStackSize(groups[i], &stack_size[i]));
}
for (size_t i = 0; i < osl_groups.size(); ++i) {
if (osl_groups[i] != NULL) {
@@ -907,13 +909,15 @@ bool OptiXDevice::load_osl_kernels()
0,
&pipelines[PIP_SHADE]));
const unsigned int css = std::max(stack_size[PG_RGEN_SHADE_SURFACE_RAYTRACE].cssRG,
stack_size[PG_RGEN_SHADE_SURFACE_MNEE].cssRG);
unsigned int dss = 0;
for (unsigned int i = 0; i < osl_stack_size.size(); ++i) {
dss = std::max(dss, osl_stack_size[i].dssDC);
}
optix_assert(optixPipelineSetStackSize(
pipelines[PIP_SHADE], 0, dss, 0, pipeline_options.usesMotionBlur ? 3 : 2));
pipelines[PIP_SHADE], 0, dss, css, pipeline_options.usesMotionBlur ? 3 : 2));
}
return !have_error();

View File

@@ -112,6 +112,13 @@ class DeviceQueue {
return 65536;
}
/* Does device support local atomic sorting kernels (INTEGRATOR_SORT_BUCKET_PASS and
* INTEGRATOR_SORT_WRITE_PASS)? */
virtual bool supports_local_atomic_sort() const
{
return false;
}
/* Initialize execution of kernels on this queue.
*
* Will, for example, load all data required by the kernels from Device to global or path state.

View File

@@ -71,6 +71,8 @@ PathTraceWorkGPU::PathTraceWorkGPU(Device *device,
device, "integrator_shader_mnee_sort_counter", MEM_READ_WRITE),
integrator_shader_sort_prefix_sum_(
device, "integrator_shader_sort_prefix_sum", MEM_READ_WRITE),
integrator_shader_sort_partition_key_offsets_(
device, "integrator_shader_sort_partition_key_offsets", MEM_READ_WRITE),
integrator_next_main_path_index_(device, "integrator_next_main_path_index", MEM_READ_WRITE),
integrator_next_shadow_path_index_(
device, "integrator_next_shadow_path_index", MEM_READ_WRITE),
@@ -207,33 +209,45 @@ void PathTraceWorkGPU::alloc_integrator_sorting()
integrator_state_gpu_.sort_partition_divisor = (int)divide_up(max_num_paths_,
num_sort_partitions_);
/* Allocate arrays for shader sorting. */
const int sort_buckets = device_scene_->data.max_shaders * num_sort_partitions_;
if (integrator_shader_sort_counter_.size() < sort_buckets) {
integrator_shader_sort_counter_.alloc(sort_buckets);
integrator_shader_sort_counter_.zero_to_device();
integrator_state_gpu_.sort_key_counter[DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE] =
(int *)integrator_shader_sort_counter_.device_pointer;
integrator_shader_sort_prefix_sum_.alloc(sort_buckets);
integrator_shader_sort_prefix_sum_.zero_to_device();
}
if (device_scene_->data.kernel_features & KERNEL_FEATURE_NODE_RAYTRACE) {
if (integrator_shader_raytrace_sort_counter_.size() < sort_buckets) {
integrator_shader_raytrace_sort_counter_.alloc(sort_buckets);
integrator_shader_raytrace_sort_counter_.zero_to_device();
integrator_state_gpu_.sort_key_counter[DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE_RAYTRACE] =
(int *)integrator_shader_raytrace_sort_counter_.device_pointer;
if (num_sort_partitions_ > 1 && queue_->supports_local_atomic_sort()) {
/* Allocate array for partitioned shader sorting using local atomics. */
const int num_offsets = (device_scene_->data.max_shaders + 1) * num_sort_partitions_;
if (integrator_shader_sort_partition_key_offsets_.size() < num_offsets) {
integrator_shader_sort_partition_key_offsets_.alloc(num_offsets);
integrator_shader_sort_partition_key_offsets_.zero_to_device();
}
integrator_state_gpu_.sort_partition_key_offsets =
(int *)integrator_shader_sort_partition_key_offsets_.device_pointer;
}
else {
/* Allocate arrays for shader sorting. */
const int sort_buckets = device_scene_->data.max_shaders * num_sort_partitions_;
if (integrator_shader_sort_counter_.size() < sort_buckets) {
integrator_shader_sort_counter_.alloc(sort_buckets);
integrator_shader_sort_counter_.zero_to_device();
integrator_state_gpu_.sort_key_counter[DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE] =
(int *)integrator_shader_sort_counter_.device_pointer;
if (device_scene_->data.kernel_features & KERNEL_FEATURE_MNEE) {
if (integrator_shader_mnee_sort_counter_.size() < sort_buckets) {
integrator_shader_mnee_sort_counter_.alloc(sort_buckets);
integrator_shader_mnee_sort_counter_.zero_to_device();
integrator_state_gpu_.sort_key_counter[DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE_MNEE] =
(int *)integrator_shader_mnee_sort_counter_.device_pointer;
integrator_shader_sort_prefix_sum_.alloc(sort_buckets);
integrator_shader_sort_prefix_sum_.zero_to_device();
}
if (device_scene_->data.kernel_features & KERNEL_FEATURE_NODE_RAYTRACE) {
if (integrator_shader_raytrace_sort_counter_.size() < sort_buckets) {
integrator_shader_raytrace_sort_counter_.alloc(sort_buckets);
integrator_shader_raytrace_sort_counter_.zero_to_device();
integrator_state_gpu_.sort_key_counter[DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE_RAYTRACE] =
(int *)integrator_shader_raytrace_sort_counter_.device_pointer;
}
}
if (device_scene_->data.kernel_features & KERNEL_FEATURE_MNEE) {
if (integrator_shader_mnee_sort_counter_.size() < sort_buckets) {
integrator_shader_mnee_sort_counter_.alloc(sort_buckets);
integrator_shader_mnee_sort_counter_.zero_to_device();
integrator_state_gpu_.sort_key_counter[DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE_MNEE] =
(int *)integrator_shader_mnee_sort_counter_.device_pointer;
}
}
}
}
@@ -451,8 +465,7 @@ void PathTraceWorkGPU::enqueue_path_iteration(DeviceKernel kernel, const int num
work_size = num_queued;
d_path_index = queued_paths_.device_pointer;
compute_sorted_queued_paths(
DEVICE_KERNEL_INTEGRATOR_SORTED_PATHS_ARRAY, kernel, num_paths_limit);
compute_sorted_queued_paths(kernel, num_paths_limit);
}
else if (num_queued < work_size) {
work_size = num_queued;
@@ -511,11 +524,26 @@ void PathTraceWorkGPU::enqueue_path_iteration(DeviceKernel kernel, const int num
}
}
void PathTraceWorkGPU::compute_sorted_queued_paths(DeviceKernel kernel,
DeviceKernel queued_kernel,
void PathTraceWorkGPU::compute_sorted_queued_paths(DeviceKernel queued_kernel,
const int num_paths_limit)
{
int d_queued_kernel = queued_kernel;
/* Launch kernel to fill the active paths arrays. */
if (num_sort_partitions_ > 1 && queue_->supports_local_atomic_sort()) {
const int work_size = kernel_max_active_main_path_index(queued_kernel);
device_ptr d_queued_paths = queued_paths_.device_pointer;
int partition_size = (int)integrator_state_gpu_.sort_partition_divisor;
DeviceKernelArguments args(
&work_size, &partition_size, &num_paths_limit, &d_queued_paths, &d_queued_kernel);
queue_->enqueue(DEVICE_KERNEL_INTEGRATOR_SORT_BUCKET_PASS, 1024 * num_sort_partitions_, args);
queue_->enqueue(DEVICE_KERNEL_INTEGRATOR_SORT_WRITE_PASS, 1024 * num_sort_partitions_, args);
return;
}
device_ptr d_counter = (device_ptr)integrator_state_gpu_.sort_key_counter[d_queued_kernel];
device_ptr d_prefix_sum = integrator_shader_sort_prefix_sum_.device_pointer;
assert(d_counter != 0 && d_prefix_sum != 0);
@@ -552,7 +580,7 @@ void PathTraceWorkGPU::compute_sorted_queued_paths(DeviceKernel kernel,
&d_prefix_sum,
&d_queued_kernel);
queue_->enqueue(kernel, work_size, args);
queue_->enqueue(DEVICE_KERNEL_INTEGRATOR_SORTED_PATHS_ARRAY, work_size, args);
}
}

View File

@@ -70,9 +70,7 @@ class PathTraceWorkGPU : public PathTraceWork {
void enqueue_path_iteration(DeviceKernel kernel, const int num_paths_limit = INT_MAX);
void compute_queued_paths(DeviceKernel kernel, DeviceKernel queued_kernel);
void compute_sorted_queued_paths(DeviceKernel kernel,
DeviceKernel queued_kernel,
const int num_paths_limit);
void compute_sorted_queued_paths(DeviceKernel queued_kernel, const int num_paths_limit);
void compact_main_paths(const int num_active_paths);
void compact_shadow_paths();
@@ -135,6 +133,7 @@ class PathTraceWorkGPU : public PathTraceWork {
device_vector<int> integrator_shader_raytrace_sort_counter_;
device_vector<int> integrator_shader_mnee_sort_counter_;
device_vector<int> integrator_shader_sort_prefix_sum_;
device_vector<int> integrator_shader_sort_partition_key_offsets_;
/* Path split. */
device_vector<int> integrator_next_main_path_index_;
device_vector<int> integrator_next_shadow_path_index_;

View File

@@ -661,7 +661,8 @@ ccl_device void bsdf_blur(KernelGlobals kg, ccl_private ShaderClosure *sc, float
#endif
}
ccl_device_inline Spectrum bsdf_albedo(ccl_private const ShaderData *sd, ccl_private const ShaderClosure *sc)
ccl_device_inline Spectrum bsdf_albedo(ccl_private const ShaderData *sd,
ccl_private const ShaderClosure *sc)
{
Spectrum albedo = sc->weight;
/* Some closures include additional components such as Fresnel terms that cause their albedo to

View File

@@ -519,14 +519,6 @@ ccl_device int bsdf_microfacet_ggx_setup(ccl_private MicrofacetBsdf *bsdf)
return SD_BSDF | SD_BSDF_HAS_EVAL;
}
/* Required to maintain OSL interface. */
ccl_device int bsdf_microfacet_ggx_isotropic_setup(ccl_private MicrofacetBsdf *bsdf)
{
bsdf->alpha_y = bsdf->alpha_x;
return bsdf_microfacet_ggx_setup(bsdf);
}
ccl_device int bsdf_microfacet_ggx_fresnel_setup(ccl_private MicrofacetBsdf *bsdf,
ccl_private const ShaderData *sd)
{
@@ -613,14 +605,6 @@ ccl_device int bsdf_microfacet_beckmann_setup(ccl_private MicrofacetBsdf *bsdf)
return SD_BSDF | SD_BSDF_HAS_EVAL;
}
/* Required to maintain OSL interface. */
ccl_device int bsdf_microfacet_beckmann_isotropic_setup(ccl_private MicrofacetBsdf *bsdf)
{
bsdf->alpha_y = bsdf->alpha_x;
return bsdf_microfacet_beckmann_setup(bsdf);
}
ccl_device int bsdf_microfacet_beckmann_refraction_setup(ccl_private MicrofacetBsdf *bsdf)
{
bsdf->alpha_x = saturatef(bsdf->alpha_x);

View File

@@ -90,8 +90,10 @@ ccl_device float schlick_fresnel(float u)
}
/* Calculate the fresnel color, which is a blend between white and the F0 color */
ccl_device_forceinline Spectrum
interpolate_fresnel_color(float3 L, float3 H, float ior, Spectrum F0)
ccl_device_forceinline Spectrum interpolate_fresnel_color(float3 L,
float3 H,
float ior,
Spectrum F0)
{
/* Compute the real Fresnel term and remap it from real_F0..1 to F0..1.
* The reason why we use this remapping instead of directly doing the

View File

@@ -401,6 +401,72 @@ ccl_gpu_kernel_threads(GPU_PARALLEL_SORTED_INDEX_DEFAULT_BLOCK_SIZE)
}
ccl_gpu_kernel_postfix
ccl_gpu_kernel_threads(GPU_PARALLEL_SORT_BLOCK_SIZE)
ccl_gpu_kernel_signature(integrator_sort_bucket_pass,
int num_states,
int partition_size,
int num_states_limit,
ccl_global int *indices,
int kernel_index)
{
#if defined(__KERNEL_LOCAL_ATOMIC_SORT__)
int max_shaders = context.launch_params_metal.data.max_shaders;
ccl_global ushort *d_queued_kernel = (ccl_global ushort *)
kernel_integrator_state.path.queued_kernel;
ccl_global uint *d_shader_sort_key = (ccl_global uint *)
kernel_integrator_state.path.shader_sort_key;
ccl_global int *key_offsets = (ccl_global int *)
kernel_integrator_state.sort_partition_key_offsets;
gpu_parallel_sort_bucket_pass(num_states,
partition_size,
max_shaders,
kernel_index,
d_queued_kernel,
d_shader_sort_key,
key_offsets,
(threadgroup int *)threadgroup_array,
metal_local_id,
metal_local_size,
metal_grid_id);
#endif
}
ccl_gpu_kernel_postfix
ccl_gpu_kernel_threads(GPU_PARALLEL_SORT_BLOCK_SIZE)
ccl_gpu_kernel_signature(integrator_sort_write_pass,
int num_states,
int partition_size,
int num_states_limit,
ccl_global int *indices,
int kernel_index)
{
#if defined(__KERNEL_LOCAL_ATOMIC_SORT__)
int max_shaders = context.launch_params_metal.data.max_shaders;
ccl_global ushort *d_queued_kernel = (ccl_global ushort *)
kernel_integrator_state.path.queued_kernel;
ccl_global uint *d_shader_sort_key = (ccl_global uint *)
kernel_integrator_state.path.shader_sort_key;
ccl_global int *key_offsets = (ccl_global int *)
kernel_integrator_state.sort_partition_key_offsets;
gpu_parallel_sort_write_pass(num_states,
partition_size,
max_shaders,
kernel_index,
num_states_limit,
indices,
d_queued_kernel,
d_shader_sort_key,
key_offsets,
(threadgroup int *)threadgroup_array,
metal_local_id,
metal_local_size,
metal_grid_id);
#endif
}
ccl_gpu_kernel_postfix
ccl_gpu_kernel_threads(GPU_PARALLEL_ACTIVE_INDEX_DEFAULT_BLOCK_SIZE)
ccl_gpu_kernel_signature(integrator_compact_paths_array,
int num_states,

View File

@@ -178,7 +178,7 @@ __device__
simd_lane_index, \
simd_group_index, \
num_simd_groups, \
simdgroup_offset)
(threadgroup int *)threadgroup_array)
#elif defined(__KERNEL_ONEAPI__)
# define gpu_parallel_active_index_array(num_states, indices, num_indices, is_active_op) \

View File

@@ -19,6 +19,115 @@ CCL_NAMESPACE_BEGIN
# define GPU_PARALLEL_SORTED_INDEX_DEFAULT_BLOCK_SIZE 512
#endif
#define GPU_PARALLEL_SORTED_INDEX_INACTIVE_KEY (~0)
#define GPU_PARALLEL_SORT_BLOCK_SIZE 1024
#if defined(__KERNEL_LOCAL_ATOMIC_SORT__)
# define atomic_store_local(p, x) \
atomic_store_explicit((threadgroup atomic_int *)p, x, memory_order_relaxed)
# define atomic_load_local(p) \
atomic_load_explicit((threadgroup atomic_int *)p, memory_order_relaxed)
ccl_device_inline void gpu_parallel_sort_bucket_pass(const uint num_states,
const uint partition_size,
const uint max_shaders,
const uint queued_kernel,
ccl_global ushort *d_queued_kernel,
ccl_global uint *d_shader_sort_key,
ccl_global int *partition_key_offsets,
ccl_gpu_shared int *buckets,
const ushort local_id,
const ushort local_size,
const ushort grid_id)
{
/* Zero the bucket sizes. */
if (local_id < max_shaders) {
atomic_store_local(&buckets[local_id], 0);
}
ccl_gpu_syncthreads();
/* Determine bucket sizes within the partitions. */
const uint partition_start = partition_size * uint(grid_id);
const uint partition_end = min(num_states, partition_start + partition_size);
for (int state_index = partition_start + uint(local_id); state_index < partition_end;
state_index += uint(local_size)) {
ushort kernel_index = d_queued_kernel[state_index];
if (kernel_index == queued_kernel) {
uint key = d_shader_sort_key[state_index] % max_shaders;
atomic_fetch_and_add_uint32(&buckets[key], 1);
}
}
ccl_gpu_syncthreads();
/* Calculate the partition's local offsets from the prefix sum of bucket sizes. */
if (local_id == 0) {
int offset = 0;
for (int i = 0; i < max_shaders; i++) {
partition_key_offsets[i + uint(grid_id) * (max_shaders + 1)] = offset;
offset = offset + atomic_load_local(&buckets[i]);
}
/* Store the number of active states in this partition. */
partition_key_offsets[max_shaders + uint(grid_id) * (max_shaders + 1)] = offset;
}
}
ccl_device_inline void gpu_parallel_sort_write_pass(const uint num_states,
const uint partition_size,
const uint max_shaders,
const uint queued_kernel,
const int num_states_limit,
ccl_global int *indices,
ccl_global ushort *d_queued_kernel,
ccl_global uint *d_shader_sort_key,
ccl_global int *partition_key_offsets,
ccl_gpu_shared int *local_offset,
const ushort local_id,
const ushort local_size,
const ushort grid_id)
{
/* Calculate each partition's global offset from the prefix sum of the active state counts per
* partition. */
if (local_id < max_shaders) {
int partition_offset = 0;
for (int i = 0; i < uint(grid_id); i++) {
int partition_key_count = partition_key_offsets[max_shaders + uint(i) * (max_shaders + 1)];
partition_offset += partition_key_count;
}
ccl_global int *key_offsets = partition_key_offsets + (uint(grid_id) * (max_shaders + 1));
atomic_store_local(&local_offset[local_id], key_offsets[local_id] + partition_offset);
}
ccl_gpu_syncthreads();
/* Write the sorted active indices. */
const uint partition_start = partition_size * uint(grid_id);
const uint partition_end = min(num_states, partition_start + partition_size);
ccl_global int *key_offsets = partition_key_offsets + (uint(grid_id) * max_shaders);
for (int state_index = partition_start + uint(local_id); state_index < partition_end;
state_index += uint(local_size)) {
ushort kernel_index = d_queued_kernel[state_index];
if (kernel_index == queued_kernel) {
uint key = d_shader_sort_key[state_index] % max_shaders;
int index = atomic_fetch_and_add_uint32(&local_offset[key], 1);
if (index < num_states_limit) {
indices[index] = state_index;
}
}
}
}
#endif /* __KERNEL_LOCAL_ATOMIC_SORT__ */
template<typename GetKeyOp>
__device__ void gpu_parallel_sorted_index_array(const uint state_index,

View File

@@ -172,17 +172,14 @@ ccl_device_intersect bool scene_intersect_local(KernelGlobals kg,
kernel_assert(!"Invalid ift_local");
return false;
}
# endif
metal::raytracing::ray r(ray->P, ray->D, ray->tmin, ray->tmax);
metalrt_intersector_type metalrt_intersect;
metalrt_intersect.force_opacity(metal::raytracing::forced_opacity::non_opaque);
bool triangle_only = !kernel_data.bvh.have_curves && !kernel_data.bvh.have_points;
if (triangle_only) {
metalrt_intersect.assume_geometry_type(metal::raytracing::geometry_type::triangle);
if (is_null_intersection_function_table(metal_ancillaries->ift_local_prim)) {
if (local_isect) {
local_isect->num_hits = 0;
}
kernel_assert(!"Invalid ift_local_prim");
return false;
}
# endif
MetalRTIntersectionLocalPayload payload;
payload.self = ray->self;
@@ -195,14 +192,48 @@ ccl_device_intersect bool scene_intersect_local(KernelGlobals kg,
}
payload.result = false;
typename metalrt_intersector_type::result_type intersection;
metal::raytracing::ray r(ray->P, ray->D, ray->tmin, ray->tmax);
# if defined(__METALRT_MOTION__)
metalrt_intersector_type metalrt_intersect;
typename metalrt_intersector_type::result_type intersection;
metalrt_intersect.force_opacity(metal::raytracing::forced_opacity::non_opaque);
bool triangle_only = !kernel_data.bvh.have_curves && !kernel_data.bvh.have_points;
if (triangle_only) {
metalrt_intersect.assume_geometry_type(metal::raytracing::geometry_type::triangle);
}
intersection = metalrt_intersect.intersect(
r, metal_ancillaries->accel_struct, 0xFF, ray->time, metal_ancillaries->ift_local, payload);
# else
metalrt_blas_intersector_type metalrt_intersect;
typename metalrt_blas_intersector_type::result_type intersection;
metalrt_intersect.force_opacity(metal::raytracing::forced_opacity::non_opaque);
bool triangle_only = !kernel_data.bvh.have_curves && !kernel_data.bvh.have_points;
if (triangle_only) {
metalrt_intersect.assume_geometry_type(metal::raytracing::geometry_type::triangle);
}
// if we know we are going to get max one hit, like for random-sss-walk we can
// optimize and accept the first hit
if (max_hits == 1) {
metalrt_intersect.accept_any_intersection(true);
}
int blas_index = metal_ancillaries->blas_userID_to_index_lookUp[local_object];
// transform the ray into object's local space
Transform itfm = kernel_data_fetch(objects, local_object).itfm;
r.origin = transform_point(&itfm, r.origin);
r.direction = transform_direction(&itfm, r.direction);
intersection = metalrt_intersect.intersect(
r, metal_ancillaries->accel_struct, 0xFF, metal_ancillaries->ift_local, payload);
r,
metal_ancillaries->blas_accel_structs[blas_index].blas,
metal_ancillaries->ift_local_prim,
payload);
# endif
if (lcg_state) {

View File

@@ -105,10 +105,11 @@ struct kernel_gpu_##name \
{ \
PARAMS_MAKER(__VA_ARGS__)(__VA_ARGS__) \
void run(thread MetalKernelContext& context, \
threadgroup int *simdgroup_offset, \
threadgroup atomic_int *threadgroup_array, \
const uint metal_global_id, \
const ushort metal_local_id, \
const ushort metal_local_size, \
const ushort metal_grid_id, \
uint simdgroup_size, \
uint simd_lane_index, \
uint simd_group_index, \
@@ -117,22 +118,24 @@ struct kernel_gpu_##name \
kernel void cycles_metal_##name(device const kernel_gpu_##name *params_struct, \
constant KernelParamsMetal &ccl_restrict _launch_params_metal, \
constant MetalAncillaries *_metal_ancillaries, \
threadgroup int *simdgroup_offset[[ threadgroup(0) ]], \
threadgroup atomic_int *threadgroup_array[[ threadgroup(0) ]], \
const uint metal_global_id [[thread_position_in_grid]], \
const ushort metal_local_id [[thread_position_in_threadgroup]], \
const ushort metal_local_size [[threads_per_threadgroup]], \
const ushort metal_grid_id [[threadgroup_position_in_grid]], \
uint simdgroup_size [[threads_per_simdgroup]], \
uint simd_lane_index [[thread_index_in_simdgroup]], \
uint simd_group_index [[simdgroup_index_in_threadgroup]], \
uint num_simd_groups [[simdgroups_per_threadgroup]]) { \
MetalKernelContext context(_launch_params_metal, _metal_ancillaries); \
params_struct->run(context, simdgroup_offset, metal_global_id, metal_local_id, metal_local_size, simdgroup_size, simd_lane_index, simd_group_index, num_simd_groups); \
params_struct->run(context, threadgroup_array, metal_global_id, metal_local_id, metal_local_size, metal_grid_id, simdgroup_size, simd_lane_index, simd_group_index, num_simd_groups); \
} \
void kernel_gpu_##name::run(thread MetalKernelContext& context, \
threadgroup int *simdgroup_offset, \
threadgroup atomic_int *threadgroup_array, \
const uint metal_global_id, \
const ushort metal_local_id, \
const ushort metal_local_size, \
const ushort metal_grid_id, \
uint simdgroup_size, \
uint simd_lane_index, \
uint simd_group_index, \
@@ -263,13 +266,25 @@ ccl_device_forceinline uchar4 make_uchar4(const uchar x,
# if defined(__METALRT_MOTION__)
# define METALRT_TAGS instancing, instance_motion, primitive_motion
# define METALRT_BLAS_TAGS , primitive_motion
# else
# define METALRT_TAGS instancing
# define METALRT_BLAS_TAGS
# endif /* __METALRT_MOTION__ */
typedef acceleration_structure<METALRT_TAGS> metalrt_as_type;
typedef intersection_function_table<triangle_data, METALRT_TAGS> metalrt_ift_type;
typedef metal::raytracing::intersector<triangle_data, METALRT_TAGS> metalrt_intersector_type;
# if defined(__METALRT_MOTION__)
typedef acceleration_structure<primitive_motion> metalrt_blas_as_type;
typedef intersection_function_table<triangle_data, primitive_motion> metalrt_blas_ift_type;
typedef metal::raytracing::intersector<triangle_data, primitive_motion>
metalrt_blas_intersector_type;
# else
typedef acceleration_structure<> metalrt_blas_as_type;
typedef intersection_function_table<triangle_data> metalrt_blas_ift_type;
typedef metal::raytracing::intersector<triangle_data> metalrt_blas_intersector_type;
# endif
#endif /* __METALRT__ */
@@ -282,6 +297,12 @@ struct Texture3DParamsMetal {
texture3d<float, access::sample> tex;
};
#ifdef __METALRT__
struct MetalRTBlasWrapper {
metalrt_blas_as_type blas;
};
#endif
struct MetalAncillaries {
device Texture2DParamsMetal *textures_2d;
device Texture3DParamsMetal *textures_3d;
@@ -291,6 +312,9 @@ struct MetalAncillaries {
metalrt_ift_type ift_default;
metalrt_ift_type ift_shadow;
metalrt_ift_type ift_local;
metalrt_blas_ift_type ift_local_prim;
constant MetalRTBlasWrapper *blas_accel_structs;
constant int *blas_userID_to_index_lookUp;
#endif
};

View File

@@ -139,6 +139,20 @@ TReturn metalrt_local_hit(constant KernelParamsMetal &launch_params_metal,
#endif
}
[[intersection(triangle, triangle_data )]] TriangleIntersectionResult
__anyhit__cycles_metalrt_local_hit_tri_prim(
constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
ray_data MetalKernelContext::MetalRTIntersectionLocalPayload &payload [[payload]],
uint primitive_id [[primitive_id]],
float2 barycentrics [[barycentric_coord]],
float ray_tmax [[distance]])
{
//instance_id, aka the user_id has been removed. If we take this function we optimized the
//SSS for starting traversal from a primitive acceleration structure instead of the root of the global AS.
//this means we will always be intersecting the correct object no need for the userid to check
return metalrt_local_hit<TriangleIntersectionResult, METALRT_HIT_TRIANGLE>(
launch_params_metal, payload, payload.local_object, primitive_id, barycentrics, ray_tmax);
}
[[intersection(triangle, triangle_data, METALRT_TAGS)]] TriangleIntersectionResult
__anyhit__cycles_metalrt_local_hit_tri(
constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
@@ -163,6 +177,17 @@ __anyhit__cycles_metalrt_local_hit_box(const float ray_tmax [[max_distance]])
return result;
}
[[intersection(bounding_box, triangle_data )]] BoundingBoxIntersectionResult
__anyhit__cycles_metalrt_local_hit_box_prim(const float ray_tmax [[max_distance]])
{
/* unused function */
BoundingBoxIntersectionResult result;
result.distance = ray_tmax;
result.accept = false;
result.continue_search = false;
return result;
}
template<uint intersection_type>
bool metalrt_shadow_all_hit(constant KernelParamsMetal &launch_params_metal,
ray_data MetalKernelContext::MetalRTIntersectionShadowPayload &payload,

View File

@@ -372,6 +372,16 @@ bool oneapi_enqueue_kernel(KernelContext *kernel_context,
kg, cgh, global_size, local_size, args, oneapi_kernel_integrator_sorted_paths_array);
break;
}
case DEVICE_KERNEL_INTEGRATOR_SORT_BUCKET_PASS: {
oneapi_call(
kg, cgh, global_size, local_size, args, oneapi_kernel_integrator_sort_bucket_pass);
break;
}
case DEVICE_KERNEL_INTEGRATOR_SORT_WRITE_PASS: {
oneapi_call(
kg, cgh, global_size, local_size, args, oneapi_kernel_integrator_sort_write_pass);
break;
}
case DEVICE_KERNEL_INTEGRATOR_COMPACT_PATHS_ARRAY: {
oneapi_call(kg,
cgh,

View File

@@ -132,6 +132,9 @@ typedef struct IntegratorStateGPU {
/* Index of main path which will be used by a next shadow catcher split. */
ccl_global int *next_main_path_index;
/* Partition/key offsets used when writing sorted active indices. */
ccl_global int *sort_partition_key_offsets;
/* Divisor used to partition active indices by locality when sorting by material. */
uint sort_partition_divisor;
} IntegratorStateGPU;

View File

@@ -115,6 +115,13 @@ ccl_device_forceinline void integrator_path_init_sorted(KernelGlobals kg,
atomic_fetch_and_add_uint32(&kernel_integrator_state.queue_counter->num_queued[next_kernel], 1);
INTEGRATOR_STATE_WRITE(state, path, queued_kernel) = next_kernel;
INTEGRATOR_STATE_WRITE(state, path, shader_sort_key) = key_;
# if defined(__KERNEL_LOCAL_ATOMIC_SORT__)
if (!kernel_integrator_state.sort_key_counter[next_kernel]) {
return;
}
# endif
atomic_fetch_and_add_uint32(&kernel_integrator_state.sort_key_counter[next_kernel][key_], 1);
}
@@ -130,6 +137,13 @@ ccl_device_forceinline void integrator_path_next_sorted(KernelGlobals kg,
atomic_fetch_and_add_uint32(&kernel_integrator_state.queue_counter->num_queued[next_kernel], 1);
INTEGRATOR_STATE_WRITE(state, path, queued_kernel) = next_kernel;
INTEGRATOR_STATE_WRITE(state, path, shader_sort_key) = key_;
# if defined(__KERNEL_LOCAL_ATOMIC_SORT__)
if (!kernel_integrator_state.sort_key_counter[next_kernel]) {
return;
}
# endif
atomic_fetch_and_add_uint32(&kernel_integrator_state.sort_key_counter[next_kernel][key_], 1);
}

View File

@@ -209,14 +209,7 @@ ccl_device void osl_closure_microfacet_setup(KernelGlobals kg,
if (closure->distribution == make_string("ggx", 11253504724482777663ull) ||
closure->distribution == make_string("default", 4430693559278735917ull)) {
if (!closure->refract) {
if (closure->alpha_x == closure->alpha_y) {
/* Isotropic */
sd->flag |= bsdf_microfacet_ggx_isotropic_setup(bsdf);
}
else {
/* Anisotropic */
sd->flag |= bsdf_microfacet_ggx_setup(bsdf);
}
sd->flag |= bsdf_microfacet_ggx_setup(bsdf);
}
else {
sd->flag |= bsdf_microfacet_ggx_refraction_setup(bsdf);
@@ -225,14 +218,7 @@ ccl_device void osl_closure_microfacet_setup(KernelGlobals kg,
/* Beckmann */
else {
if (!closure->refract) {
if (closure->alpha_x == closure->alpha_y) {
/* Isotropic */
sd->flag |= bsdf_microfacet_beckmann_isotropic_setup(bsdf);
}
else {
/* Anisotropic */
sd->flag |= bsdf_microfacet_beckmann_setup(bsdf);
}
sd->flag |= bsdf_microfacet_beckmann_setup(bsdf);
}
else {
sd->flag |= bsdf_microfacet_beckmann_refraction_setup(bsdf);
@@ -258,9 +244,9 @@ ccl_device void osl_closure_microfacet_ggx_setup(
}
bsdf->N = ensure_valid_reflection(sd->Ng, sd->wi, closure->N);
bsdf->alpha_x = closure->alpha_x;
bsdf->alpha_x = bsdf->alpha_y = closure->alpha_x;
sd->flag |= bsdf_microfacet_ggx_isotropic_setup(bsdf);
sd->flag |= bsdf_microfacet_ggx_setup(bsdf);
}
ccl_device void osl_closure_microfacet_ggx_aniso_setup(
@@ -652,9 +638,9 @@ ccl_device void osl_closure_microfacet_beckmann_setup(
}
bsdf->N = ensure_valid_reflection(sd->Ng, sd->wi, closure->N);
bsdf->alpha_x = closure->alpha_x;
bsdf->alpha_x = bsdf->alpha_y = closure->alpha_x;
sd->flag |= bsdf_microfacet_beckmann_isotropic_setup(bsdf);
sd->flag |= bsdf_microfacet_beckmann_setup(bsdf);
}
ccl_device void osl_closure_microfacet_beckmann_aniso_setup(

View File

@@ -74,7 +74,8 @@ CCL_NAMESPACE_BEGIN
#define __VOLUME__
/* TODO: solve internal compiler errors and enable light tree on HIP. */
#ifdef __KERNEL_HIP__
/* TODO: solve internal compiler perf issue and enable light tree on Metal/AMD. */
#if defined(__KERNEL_HIP__) || defined(__KERNEL_METAL_AMD__)
# undef __LIGHT_TREE__
#endif
@@ -1508,6 +1509,8 @@ typedef enum DeviceKernel : int {
DEVICE_KERNEL_INTEGRATOR_ACTIVE_PATHS_ARRAY,
DEVICE_KERNEL_INTEGRATOR_TERMINATED_PATHS_ARRAY,
DEVICE_KERNEL_INTEGRATOR_SORTED_PATHS_ARRAY,
DEVICE_KERNEL_INTEGRATOR_SORT_BUCKET_PASS,
DEVICE_KERNEL_INTEGRATOR_SORT_WRITE_PASS,
DEVICE_KERNEL_INTEGRATOR_COMPACT_PATHS_ARRAY,
DEVICE_KERNEL_INTEGRATOR_COMPACT_STATES,
DEVICE_KERNEL_INTEGRATOR_TERMINATED_SHADOW_PATHS_ARRAY,

View File

@@ -13,6 +13,7 @@
#include "scene/light.h"
#include "scene/mesh.h"
#include "scene/object.h"
#include "scene/osl.h"
#include "scene/pointcloud.h"
#include "scene/scene.h"
#include "scene/shader.h"
@@ -25,7 +26,6 @@
#ifdef WITH_OSL
# include "kernel/osl/globals.h"
# include "kernel/osl/services.h"
#endif
#include "util/foreach.h"
@@ -1717,20 +1717,7 @@ void GeometryManager::device_update_displacement_images(Device *device,
/* If any OSL node is used for displacement, it may reference a texture. But it's
* unknown which ones, so have to load them all. */
if (has_osl_node) {
set<OSLRenderServices *> services_shared;
device->foreach_device([&services_shared](Device *sub_device) {
OSLGlobals *og = (OSLGlobals *)sub_device->get_cpu_osl_memory();
services_shared.insert(og->services);
});
for (OSLRenderServices *services : services_shared) {
for (auto it = services->textures.begin(); it != services->textures.end(); ++it) {
if (it->second->handle.get_manager() == image_manager) {
const int slot = it->second->handle.svm_slot();
bump_images.insert(slot);
}
}
}
OSLShaderManager::osl_image_slots(device, image_manager, bump_images);
}
#endif

View File

@@ -665,6 +665,27 @@ OSLNode *OSLShaderManager::osl_node(ShaderGraph *graph,
return node;
}
/* Static function, so only this file needs to be compile with RTTT. */
void OSLShaderManager::osl_image_slots(Device *device,
ImageManager *image_manager,
set<int> &image_slots)
{
set<OSLRenderServices *> services_shared;
device->foreach_device([&services_shared](Device *sub_device) {
OSLGlobals *og = (OSLGlobals *)sub_device->get_cpu_osl_memory();
services_shared.insert(og->services);
});
for (OSLRenderServices *services : services_shared) {
for (auto it = services->textures.begin(); it != services->textures.end(); ++it) {
if (it->second->handle.get_manager() == image_manager) {
const int slot = it->second->handle.svm_slot();
image_slots.insert(slot);
}
}
}
}
/* Graph Compiler */
OSLCompiler::OSLCompiler(OSLShaderManager *manager, OSL::ShadingSystem *ss, Scene *scene)

View File

@@ -92,6 +92,9 @@ class OSLShaderManager : public ShaderManager {
const std::string &bytecode_hash = "",
const std::string &bytecode = "");
/* Get image slots used by OSL services on device. */
static void osl_image_slots(Device *device, ImageManager *image_manager, set<int> &image_slots);
private:
void texture_system_init();
void texture_system_free();

View File

@@ -73,16 +73,55 @@ ccl_device_inline float atomic_add_and_fetch_float(volatile ccl_global float *_s
return new_value.float_value;
}
# define atomic_fetch_and_add_uint32(p, x) \
atomic_fetch_add_explicit((device atomic_uint *)p, x, memory_order_relaxed)
# define atomic_fetch_and_sub_uint32(p, x) \
atomic_fetch_sub_explicit((device atomic_uint *)p, x, memory_order_relaxed)
# define atomic_fetch_and_inc_uint32(p) \
atomic_fetch_add_explicit((device atomic_uint *)p, 1, memory_order_relaxed)
# define atomic_fetch_and_dec_uint32(p) \
atomic_fetch_sub_explicit((device atomic_uint *)p, 1, memory_order_relaxed)
# define atomic_fetch_and_or_uint32(p, x) \
atomic_fetch_or_explicit((device atomic_uint *)p, x, memory_order_relaxed)
template<class T> ccl_device_inline uint32_t atomic_fetch_and_add_uint32(device T *p, int x)
{
return atomic_fetch_add_explicit((device atomic_uint *)p, x, memory_order_relaxed);
}
template<class T> ccl_device_inline uint32_t atomic_fetch_and_sub_uint32(device T *p, int x)
{
return atomic_fetch_sub_explicit((device atomic_uint *)p, x, memory_order_relaxed);
}
template<class T> ccl_device_inline uint32_t atomic_fetch_and_inc_uint32(device T *p)
{
return atomic_fetch_add_explicit((device atomic_uint *)p, 1, memory_order_relaxed);
}
template<class T> ccl_device_inline uint32_t atomic_fetch_and_dec_uint32(device T *p)
{
return atomic_fetch_sub_explicit((device atomic_uint *)p, 1, memory_order_relaxed);
}
template<class T> ccl_device_inline uint32_t atomic_fetch_and_or_uint32(device T *p, int x)
{
return atomic_fetch_or_explicit((device atomic_uint *)p, x, memory_order_relaxed);
}
template<class T> ccl_device_inline uint32_t atomic_fetch_and_add_uint32(threadgroup T *p, int x)
{
return atomic_fetch_add_explicit((threadgroup atomic_uint *)p, x, memory_order_relaxed);
}
template<class T> ccl_device_inline uint32_t atomic_fetch_and_sub_uint32(threadgroup T *p, int x)
{
return atomic_fetch_sub_explicit((threadgroup atomic_uint *)p, x, memory_order_relaxed);
}
template<class T> ccl_device_inline uint32_t atomic_fetch_and_inc_uint32(threadgroup T *p)
{
return atomic_fetch_add_explicit((threadgroup atomic_uint *)p, 1, memory_order_relaxed);
}
template<class T> ccl_device_inline uint32_t atomic_fetch_and_dec_uint32(threadgroup T *p)
{
return atomic_fetch_sub_explicit((threadgroup atomic_uint *)p, 1, memory_order_relaxed);
}
template<class T> ccl_device_inline uint32_t atomic_fetch_and_or_uint32(threadgroup T *p, int x)
{
return atomic_fetch_or_explicit((threadgroup atomic_uint *)p, x, memory_order_relaxed);
}
ccl_device_inline float atomic_compare_and_swap_float(volatile ccl_global float *dest,
const float old_val,

View File

@@ -69,6 +69,9 @@ void DebugFlags::Metal::reset()
{
if (getenv("CYCLES_METAL_ADAPTIVE_COMPILE") != NULL)
adaptive_compile = true;
if (auto str = getenv("CYCLES_METAL_LOCAL_ATOMIC_SORT"))
use_local_atomic_sort = (atoi(str) != 0);
}
DebugFlags::OptiX::OptiX()

View File

@@ -97,6 +97,9 @@ class DebugFlags {
/* Whether adaptive feature based runtime compile is enabled or not. */
bool adaptive_compile = false;
/* Whether local atomic sorting is enabled or not. */
bool use_local_atomic_sort = true;
};
/* Get instance of debug flags registry. */

View File

@@ -74,7 +74,7 @@ ccl_device float fast_sinf(float x)
*
* Results on: [-2pi,2pi].
*
* Examined 2173837240 values of sin: 0.00662760244 avg ulp diff, 2 max ulp,
* Examined 2173837240 values of sin: 0.00662760244 avg ULP diff, 2 max ULP,
* 1.19209e-07 max error
*/
int q = fast_rint(x * M_1_PI_F);
@@ -256,11 +256,11 @@ ccl_device float fast_acosf(float x)
/* clamp and crush denormals. */
const float m = (f < 1.0f) ? 1.0f - (1.0f - f) : 1.0f;
/* Based on http://www.pouet.net/topic.php?which=9132&page=2
* 85% accurate (ulp 0)
* 85% accurate (ULP 0)
* Examined 2130706434 values of acos:
* 15.2000597 avg ulp diff, 4492 max ulp, 4.51803e-05 max error // without "denormal crush"
* 15.2000597 avg ULP diff, 4492 max ULP, 4.51803e-05 max error // without "denormal crush"
* Examined 2130706434 values of acos:
* 15.2007108 avg ulp diff, 4492 max ulp, 4.51803e-05 max error // with "denormal crush"
* 15.2007108 avg ULP diff, 4492 max ULP, 4.51803e-05 max error // with "denormal crush"
*/
const float a = sqrtf(1.0f - m) *
(1.5707963267f + m * (-0.213300989f + m * (0.077980478f + m * -0.02164095f)));
@@ -270,9 +270,8 @@ ccl_device float fast_acosf(float x)
ccl_device float fast_asinf(float x)
{
/* Based on acosf approximation above.
* Max error is 4.51133e-05 (ulps are higher because we are consistently off
* by a little amount).
*/
* Max error is 4.51133e-05 (ULPS are higher because we are consistently off
* by a little amount). */
const float f = fabsf(x);
/* Clamp and crush denormals. */
const float m = (f < 1.0f) ? 1.0f - (1.0f - f) : 1.0f;
@@ -290,9 +289,9 @@ ccl_device float fast_atanf(float x)
const float t = s * s;
/* http://mathforum.org/library/drmath/view/62672.html
* Examined 4278190080 values of atan:
* 2.36864877 avg ulp diff, 302 max ulp, 6.55651e-06 max error // (with denormals)
* 2.36864877 avg ULP diff, 302 max ULP, 6.55651e-06 max error // (with denormals)
* Examined 4278190080 values of atan:
* 171160502 avg ulp diff, 855638016 max ulp, 6.55651e-06 max error // (crush denormals)
* 171160502 avg ULP diff, 855638016 max ULP, 6.55651e-06 max error // (crush denormals)
*/
float r = s * madd(0.43157974f, t, 1.0f) / madd(madd(0.05831938f, t, 0.76443945f), t, 1.0f);
if (a > 1.0f) {
@@ -343,8 +342,8 @@ ccl_device float fast_log2f(float x)
int exponent = (int)(bits >> 23) - 127;
float f = __uint_as_float((bits & 0x007FFFFF) | 0x3f800000) - 1.0f;
/* Examined 2130706432 values of log2 on [1.17549435e-38,3.40282347e+38]:
* 0.0797524457 avg ulp diff, 3713596 max ulp, 7.62939e-06 max error.
* ulp histogram:
* 0.0797524457 avg ULP diff, 3713596 max ULP, 7.62939e-06 max error.
* ULP histogram:
* 0 = 97.46%
* 1 = 2.29%
* 2 = 0.11%
@@ -363,7 +362,7 @@ ccl_device float fast_log2f(float x)
ccl_device_inline float fast_logf(float x)
{
/* Examined 2130706432 values of logf on [1.17549435e-38,3.40282347e+38]:
* 0.313865375 avg ulp diff, 5148137 max ulp, 7.62939e-06 max error.
* 0.313865375 avg ULP diff, 5148137 max ULP, 7.62939e-06 max error.
*/
return fast_log2f(x) * M_LN2_F;
}
@@ -371,7 +370,7 @@ ccl_device_inline float fast_logf(float x)
ccl_device_inline float fast_log10(float x)
{
/* Examined 2130706432 values of log10f on [1.17549435e-38,3.40282347e+38]:
* 0.631237033 avg ulp diff, 4471615 max ulp, 3.8147e-06 max error.
* 0.631237033 avg ULP diff, 4471615 max ULP, 3.8147e-06 max error.
*/
return fast_log2f(x) * M_LN2_F / M_LN10_F;
}
@@ -392,12 +391,12 @@ ccl_device float fast_exp2f(float x)
/* Range reduction. */
int m = (int)x;
x -= m;
x = 1.0f - (1.0f - x); /* Crush denormals (does not affect max ulps!). */
x = 1.0f - (1.0f - x); /* Crush denormals (does not affect max ULPS!). */
/* 5th degree polynomial generated with sollya
* Examined 2247622658 values of exp2 on [-126,126]: 2.75764912 avg ulp diff,
* 232 max ulp.
* Examined 2247622658 values of exp2 on [-126,126]: 2.75764912 avg ULP diff,
* 232 max ULP.
*
* ulp histogram:
* ULP histogram:
* 0 = 87.81%
* 1 = 4.18%
*/
@@ -415,7 +414,7 @@ ccl_device float fast_exp2f(float x)
ccl_device_inline float fast_expf(float x)
{
/* Examined 2237485550 values of exp on [-87.3300018,87.3300018]:
* 2.6666452 avg ulp diff, 230 max ulp.
* 2.6666452 avg ULP diff, 230 max ULP.
*/
return fast_exp2f(x / M_LN2_F);
}
@@ -454,7 +453,7 @@ ccl_device_inline float4 fast_expf4(float4 x)
ccl_device_inline float fast_exp10(float x)
{
/* Examined 2217701018 values of exp10 on [-37.9290009,37.9290009]:
* 2.71732409 avg ulp diff, 232 max ulp.
* 2.71732409 avg ULP diff, 232 max ULP.
*/
return fast_exp2f(x * M_LN10_F / M_LN2_F);
}
@@ -475,7 +474,7 @@ ccl_device float fast_sinhf(float x)
float a = fabsf(x);
if (a > 1.0f) {
/* Examined 53389559 values of sinh on [1,87.3300018]:
* 33.6886442 avg ulp diff, 178 max ulp. */
* 33.6886442 avg ULP diff, 178 max ULP. */
float e = fast_expf(a);
return copysignf(0.5f * e - 0.5f / e, x);
}
@@ -495,7 +494,7 @@ ccl_device float fast_sinhf(float x)
ccl_device_inline float fast_coshf(float x)
{
/* Examined 2237485550 values of cosh on [-87.3300018,87.3300018]:
* 1.78256726 avg ulp diff, 178 max ulp.
* 1.78256726 avg ULP diff, 178 max ULP.
*/
float e = fast_expf(fabsf(x));
return 0.5f * e + 0.5f / e;
@@ -506,7 +505,7 @@ ccl_device_inline float fast_tanhf(float x)
/* Examined 4278190080 values of tanh on [-3.40282347e+38,3.40282347e+38]:
* 3.12924e-06 max error.
*/
/* NOTE: ulp error is high because of sub-optimal handling around the origin. */
/* NOTE: ULP error is high because of sub-optimal handling around the origin. */
float e = fast_expf(2.0f * fabsf(x));
return copysignf(1.0f - 2.0f / (1.0f + e), x);
}
@@ -579,7 +578,7 @@ ccl_device_inline float fast_erfcf(float x)
{
/* Examined 2164260866 values of erfcf on [-4,4]: 1.90735e-06 max error.
*
* ulp histogram:
* ULP histogram:
*
* 0 = 80.30%
*/

View File

@@ -602,7 +602,7 @@ void GHOST_SystemSDL::processEvent(SDL_Event *sdl_event)
/* NOTE: the `sdl_sub_evt.keysym.sym` is truncated,
* for unicode support ghost has to be modified. */
/* TODO(@campbellbarton): support full unicode, SDL supports this but it needs to be
/* TODO(@ideasman42): support full unicode, SDL supports this but it needs to be
* explicitly enabled via #SDL_StartTextInput which GHOST would have to wrap. */
char utf8_buf[sizeof(GHOST_TEventKeyData::utf8_buf)] = {'\0'};
if (type == GHOST_kEventKeyDown) {

View File

@@ -82,6 +82,8 @@
#include "CLG_log.h"
#ifdef USE_EVENT_BACKGROUND_THREAD
# include "GHOST_TimerTask.h"
# include <pthread.h>
#endif
@@ -239,7 +241,7 @@ enum {
BTN_STYLUS = 0x14b,
/** Use as right-mouse. */
BTN_STYLUS2 = 0x14c,
/** NOTE(@campbellbarton): Map to an additional button (not sure which hardware uses this). */
/** NOTE(@ideasman42): Map to an additional button (not sure which hardware uses this). */
BTN_STYLUS3 = 0x149,
};
@@ -768,7 +770,12 @@ struct GWL_Seat {
int32_t rate = 0;
/** Time (milliseconds) after which to start repeating keys. */
int32_t delay = 0;
/** Timer for key repeats. */
/**
* Timer for key repeats.
*
* \note For as long as #USE_EVENT_BACKGROUND_THREAD is defined, any access to this
* (including null checks, must lock `timer_mutex` first.
*/
GHOST_ITimerTask *timer = nullptr;
} key_repeat;
@@ -832,6 +839,42 @@ static bool gwl_seat_key_depressed_suppress_warning(const GWL_Seat *seat)
return suppress_warning;
}
/**
* \note Caller must lock `timer_mutex`.
*/
static void gwl_seat_key_repeat_timer_add(GWL_Seat *seat,
GHOST_TimerProcPtr key_repeat_fn,
GHOST_TUserDataPtr payload,
const bool use_delay)
{
GHOST_SystemWayland *system = seat->system;
const uint64_t time_step = 1000 / seat->key_repeat.rate;
const uint64_t time_start = use_delay ? seat->key_repeat.delay : time_step;
#ifdef USE_EVENT_BACKGROUND_THREAD
GHOST_TimerTask *timer = new GHOST_TimerTask(
system->getMilliSeconds() + time_start, time_step, key_repeat_fn, payload);
seat->key_repeat.timer = timer;
system->ghost_timer_manager()->addTimer(timer);
#else
seat->key_repeat.timer = system->installTimer(time_start, time_step, key_repeat_fn, payload);
#endif
}
/**
* \note The caller must lock `timer_mutex`.
*/
static void gwl_seat_key_repeat_timer_remove(GWL_Seat *seat)
{
GHOST_SystemWayland *system = seat->system;
#ifdef USE_EVENT_BACKGROUND_THREAD
system->ghost_timer_manager()->removeTimer(
static_cast<GHOST_TimerTask *>(seat->key_repeat.timer));
#else
system->removeTimer(seat->key_repeat.timer);
#endif
seat->key_repeat.timer = nullptr;
}
/** \} */
/* -------------------------------------------------------------------- */
@@ -873,7 +916,7 @@ struct GWL_Display {
* The main purpose of having an active seat is an alternative from always using the first
* seat which prevents events from any other seat.
*
* NOTE(@campbellbarton): This could be extended and developed further extended to support
* NOTE(@ideasman42): This could be extended and developed further extended to support
* an active seat per window (for e.g.), basic support is sufficient for now as currently isn't
* a widely used feature.
*/
@@ -906,6 +949,16 @@ struct GWL_Display {
/** Guard against multiple threads accessing `events_pending` at once. */
std::mutex events_pending_mutex;
/**
* A separate timer queue, needed so the WAYLAND thread can lock access.
* Using the system's #GHOST_Sysem::getTimerManager is not thread safe because
* access to the timer outside of WAYLAND specific logic will not lock.
*
* Needed because #GHOST_System::dispatchEvents fires timers
* outside of WAYLAND (without locking the `timer_mutex`).
*/
GHOST_TimerManager *ghost_timer_manager = nullptr;
#endif /* USE_EVENT_BACKGROUND_THREAD */
};
@@ -959,6 +1012,13 @@ static void gwl_display_destroy(GWL_Display *display)
gwl_display_event_thread_destroy(display);
display->system->server_mutex->unlock();
}
/* Important to remove after the seats which may have key repeat timers active. */
if (display->ghost_timer_manager) {
delete display->ghost_timer_manager;
display->ghost_timer_manager = nullptr;
}
#endif /* USE_EVENT_BACKGROUND_THREAD */
if (display->wl_display) {
@@ -1179,7 +1239,7 @@ static void gwl_registry_entry_remove_all(GWL_Display *display)
{
const bool on_exit = true;
/* NOTE(@campbellbarton): Free by slot instead of simply looping over
/* NOTE(@ideasman42): Free by slot instead of simply looping over
* `display->registry_entry` so the order of freeing is always predictable.
* Otherwise global objects would be feed in the order they are registered.
* While this works in my tests, it could cause difficult to reproduce bugs
@@ -1209,7 +1269,7 @@ static void gwl_registry_entry_remove_all(GWL_Display *display)
* so there is no reason to update all other outputs that an output was removed (for e.g.).
* Pass as -1 to update all slots.
*
* NOTE(@campbellbarton): Updating all other items on a single change is typically worth avoiding.
* NOTE(@ideasman42): Updating all other items on a single change is typically worth avoiding.
* In practice this isn't a problem as so there are so few elements in `display->registry_entry`,
* so few use update functions and adding/removal at runtime is rarely called (plugging/unplugging)
* hardware for e.g. So while it's possible to store dependency links to avoid unnecessary
@@ -1258,7 +1318,7 @@ static void ghost_wl_display_report_error(struct wl_display *display)
fprintf(stderr, "The Wayland connection experienced a fatal error: %s\n", strerror(ecode));
}
/* NOTE(@campbellbarton): The application is running,
/* NOTE(@ideasman42): The application is running,
* however an error closes all windows and most importantly:
* shuts down the GPU context (loosing all GPU state - shaders, bind codes etc),
* so recovering from this effectively involves restarting.
@@ -2910,7 +2970,7 @@ static void gesture_pinch_handle_begin(void *data,
if (wl_surface *wl_surface_focus = seat->pointer.wl_surface_window) {
win = ghost_wl_surface_user_data(wl_surface_focus);
}
/* NOTE(@campbellbarton): Blender's use of track-pad coordinates is inconsistent and needs work.
/* NOTE(@ideasman42): Blender's use of track-pad coordinates is inconsistent and needs work.
* This isn't specific to WAYLAND, in practice they tend to work well enough in most cases.
* Some operators scale by the UI scale, some don't.
* Even this window scale is not correct because it doesn't account for:
@@ -2924,7 +2984,7 @@ static void gesture_pinch_handle_begin(void *data,
*/
const wl_fixed_t win_scale = win ? win->scale() : 1;
/* NOTE(@campbellbarton): Scale factors match Blender's operators & default preferences.
/* NOTE(@ideasman42): Scale factors match Blender's operators & default preferences.
* For these values to work correctly, operator logic will need to be changed not to scale input
* by the region size (as with 3D view zoom) or preference for 3D view orbit sensitivity.
*
@@ -3087,7 +3147,7 @@ static const struct zwp_pointer_gesture_swipe_v1_listener gesture_swipe_listener
/* -------------------------------------------------------------------- */
/** \name Listener (Touch Seat), #wl_touch_listener
*
* NOTE(@campbellbarton): It's not clear if this interface is used by popular compositors.
* NOTE(@ideasman42): It's not clear if this interface is used by popular compositors.
* It looks like GNOME/KDE only support `zwp_pointer_gestures_v1_interface`.
* If this isn't used anywhere, it could be removed.
* \{ */
@@ -3718,9 +3778,14 @@ static void keyboard_handle_leave(void *data,
GWL_Seat *seat = static_cast<GWL_Seat *>(data);
seat->keyboard.wl_surface_window = nullptr;
/* Losing focus must stop repeating text. */
if (seat->key_repeat.timer) {
keyboard_handle_key_repeat_cancel(seat);
{
#ifdef USE_EVENT_BACKGROUND_THREAD
std::lock_guard lock_timer_guard{*seat->system->timer_mutex};
#endif
/* Losing focus must stop repeating text. */
if (seat->key_repeat.timer) {
keyboard_handle_key_repeat_cancel(seat);
}
}
#ifdef USE_GNOME_KEYBOARD_SUPPRESS_WARNING
@@ -3743,7 +3808,7 @@ static xkb_keysym_t xkb_state_key_get_one_sym_without_modifiers(
/* Use an empty keyboard state to access key symbol without modifiers. */
xkb_keysym_t sym = xkb_state_key_get_one_sym(xkb_state_empty, key);
/* NOTE(@campbellbarton): Only perform the number-locked lookup as a fallback
/* NOTE(@ideasman42): Only perform the number-locked lookup as a fallback
* when a number-pad key has been pressed. This is important as some key-maps use number lock
* for switching other layers (in particular `de(neo_qwertz)` turns on layer-4), see: T96170.
* Alternative solutions could be to inspect the layout however this could get involved
@@ -3780,36 +3845,32 @@ static xkb_keysym_t xkb_state_key_get_one_sym_without_modifiers(
return sym;
}
/**
* \note Caller must lock `timer_mutex`.
*/
static void keyboard_handle_key_repeat_cancel(GWL_Seat *seat)
{
#ifdef USE_EVENT_BACKGROUND_THREAD
std::lock_guard lock_timer_guard{*seat->system->timer_mutex};
#endif
GHOST_ASSERT(seat->key_repeat.timer != nullptr, "Caller much check for timer");
delete static_cast<GWL_KeyRepeatPlayload *>(seat->key_repeat.timer->getUserData());
seat->system->removeTimer(seat->key_repeat.timer);
seat->key_repeat.timer = nullptr;
gwl_seat_key_repeat_timer_remove(seat);
}
/**
* Restart the key-repeat timer.
* \param use_delay: When false, use the interval
* (prevents pause when the setting changes while the key is held).
*
* \note Caller must lock `timer_mutex`.
*/
static void keyboard_handle_key_repeat_reset(GWL_Seat *seat, const bool use_delay)
{
#ifdef USE_EVENT_BACKGROUND_THREAD
std::lock_guard lock_timer_guard{*seat->system->timer_mutex};
#endif
GHOST_ASSERT(seat->key_repeat.timer != nullptr, "Caller much check for timer");
GHOST_SystemWayland *system = seat->system;
GHOST_ITimerTask *timer = seat->key_repeat.timer;
GHOST_TimerProcPtr key_repeat_fn = timer->getTimerProc();
GHOST_TimerProcPtr key_repeat_fn = seat->key_repeat.timer->getTimerProc();
GHOST_TUserDataPtr payload = seat->key_repeat.timer->getUserData();
seat->system->removeTimer(seat->key_repeat.timer);
const uint64_t time_step = 1000 / seat->key_repeat.rate;
const uint64_t time_start = use_delay ? seat->key_repeat.delay : time_step;
seat->key_repeat.timer = system->installTimer(time_start, time_step, key_repeat_fn, payload);
gwl_seat_key_repeat_timer_remove(seat);
gwl_seat_key_repeat_timer_add(seat, key_repeat_fn, payload, use_delay);
}
static void keyboard_handle_key(void *data,
@@ -3848,6 +3909,11 @@ static void keyboard_handle_key(void *data,
break;
}
#ifdef USE_EVENT_BACKGROUND_THREAD
/* Any access to `seat->key_repeat.timer` must lock. */
std::lock_guard lock_timer_guard{*seat->system->timer_mutex};
#endif
struct GWL_KeyRepeatPlayload *key_repeat_payload = nullptr;
/* Delete previous timer. */
@@ -3872,7 +3938,7 @@ static void keyboard_handle_key(void *data,
else {
/* Key-up from keys that were not repeating cause the repeat timer to pause.
*
* NOTE(@campbellbarton): This behavior isn't universal, some text input systems will
* NOTE(@ideasman42): This behavior isn't universal, some text input systems will
* stop the repeat entirely. Choose to pause repeat instead as this is what GTK/WIN32 do,
* and it fits better for keyboard input that isn't related to text entry. */
timer_action = RESET;
@@ -3886,23 +3952,14 @@ static void keyboard_handle_key(void *data,
break;
}
case RESET: {
#ifdef USE_EVENT_BACKGROUND_THREAD
std::lock_guard lock_timer_guard{*seat->system->timer_mutex};
#endif
/* The payload will be added again. */
seat->system->removeTimer(seat->key_repeat.timer);
seat->key_repeat.timer = nullptr;
gwl_seat_key_repeat_timer_remove(seat);
break;
}
case CANCEL: {
#ifdef USE_EVENT_BACKGROUND_THREAD
std::lock_guard lock_timer_guard{*seat->system->timer_mutex};
#endif
delete key_repeat_payload;
key_repeat_payload = nullptr;
seat->system->removeTimer(seat->key_repeat.timer);
seat->key_repeat.timer = nullptr;
gwl_seat_key_repeat_timer_remove(seat);
break;
}
}
@@ -3956,8 +4013,8 @@ static void keyboard_handle_key(void *data,
utf8_buf));
}
};
seat->key_repeat.timer = seat->system->installTimer(
seat->key_repeat.delay, 1000 / seat->key_repeat.rate, key_repeat_fn, key_repeat_payload);
gwl_seat_key_repeat_timer_add(seat, key_repeat_fn, key_repeat_payload, true);
}
}
@@ -3982,8 +4039,13 @@ static void keyboard_handle_modifiers(void *data,
/* A modifier changed so reset the timer,
* see comment in #keyboard_handle_key regarding this behavior. */
if (seat->key_repeat.timer) {
keyboard_handle_key_repeat_reset(seat, true);
{
#ifdef USE_EVENT_BACKGROUND_THREAD
std::lock_guard lock_timer_guard{*seat->system->timer_mutex};
#endif
if (seat->key_repeat.timer) {
keyboard_handle_key_repeat_reset(seat, true);
}
}
#ifdef USE_GNOME_KEYBOARD_SUPPRESS_WARNING
@@ -4002,9 +4064,14 @@ static void keyboard_repeat_handle_info(void *data,
seat->key_repeat.rate = rate;
seat->key_repeat.delay = delay;
/* Unlikely possible this setting changes while repeating. */
if (seat->key_repeat.timer) {
keyboard_handle_key_repeat_reset(seat, false);
{
#ifdef USE_EVENT_BACKGROUND_THREAD
std::lock_guard lock_timer_guard{*seat->system->timer_mutex};
#endif
/* Unlikely possible this setting changes while repeating. */
if (seat->key_repeat.timer) {
keyboard_handle_key_repeat_reset(seat, false);
}
}
}
@@ -4275,8 +4342,14 @@ static void gwl_seat_capability_keyboard_disable(GWL_Seat *seat)
if (!seat->wl_keyboard) {
return;
}
if (seat->key_repeat.timer) {
keyboard_handle_key_repeat_cancel(seat);
{
#ifdef USE_EVENT_BACKGROUND_THREAD
std::lock_guard lock_timer_guard{*seat->system->timer_mutex};
#endif
if (seat->key_repeat.timer) {
keyboard_handle_key_repeat_cancel(seat);
}
}
wl_keyboard_destroy(seat->wl_keyboard);
seat->wl_keyboard = nullptr;
@@ -5411,6 +5484,8 @@ GHOST_SystemWayland::GHOST_SystemWayland(bool background)
#ifdef USE_EVENT_BACKGROUND_THREAD
gwl_display_event_thread_create(display_);
display_->ghost_timer_manager = new GHOST_TimerManager();
#endif
}
@@ -5491,10 +5566,16 @@ bool GHOST_SystemWayland::processEvents(bool waitForEvent)
#endif /* USE_EVENT_BACKGROUND_THREAD */
{
const uint64_t now = getMilliSeconds();
#ifdef USE_EVENT_BACKGROUND_THREAD
std::lock_guard lock_timer_guard{*display_->system->timer_mutex};
{
std::lock_guard lock_timer_guard{*display_->system->timer_mutex};
if (ghost_timer_manager()->fireTimers(now)) {
any_processed = true;
}
}
#endif
if (getTimerManager()->fireTimers(getMilliSeconds())) {
if (getTimerManager()->fireTimers(now)) {
any_processed = true;
}
}
@@ -6717,6 +6798,13 @@ struct wl_shm *GHOST_SystemWayland::wl_shm() const
return display_->wl_shm;
}
#ifdef USE_EVENT_BACKGROUND_THREAD
GHOST_TimerManager *GHOST_SystemWayland::ghost_timer_manager()
{
return display_->ghost_timer_manager;
}
#endif
/** \} */
/* -------------------------------------------------------------------- */
@@ -6949,7 +7037,7 @@ bool GHOST_SystemWayland::window_cursor_grab_set(const GHOST_TGrabCursorMode mod
UNPACK2(xy_next));
wl_surface_commit(wl_surface);
/* NOTE(@campbellbarton): The new cursor position is a hint,
/* NOTE(@ideasman42): The new cursor position is a hint,
* it's possible the hint is ignored. It doesn't seem like there is a good way to
* know if the hint will be used or not, at least not immediately. */
xy_motion[0] = xy_next[0];
@@ -6992,7 +7080,7 @@ bool GHOST_SystemWayland::window_cursor_grab_set(const GHOST_TGrabCursorMode mod
if (mode != GHOST_kGrabDisable) {
if (grab_state_next.use_lock) {
if (!grab_state_prev.use_lock) {
/* TODO(@campbellbarton): As WAYLAND does not support warping the pointer it may not be
/* TODO(@ideasman42): As WAYLAND does not support warping the pointer it may not be
* possible to support #GHOST_kGrabWrap by pragmatically settings it's coordinates.
* An alternative could be to draw the cursor in software (and hide the real cursor),
* or just accept a locked cursor on WAYLAND. */

View File

@@ -165,6 +165,16 @@ class GHOST_SystemWayland : public GHOST_System {
bool cursor_grab_use_software_display_get(const GHOST_TGrabCursorMode mode);
#ifdef USE_EVENT_BACKGROUND_THREAD
/**
* Return a separate WAYLAND local timer manager to #GHOST_System::getTimerManager
* Manipulation & access must lock with #GHOST_WaylandSystem::server_mutex.
*
* See #GWL_Display::ghost_timer_manager doc-string for details on why this is needed.
*/
GHOST_TimerManager *ghost_timer_manager();
#endif
/* WAYLAND direct-data access. */
struct wl_display *wl_display();
@@ -233,7 +243,14 @@ class GHOST_SystemWayland : public GHOST_System {
* from running at the same time. */
std::mutex *server_mutex = nullptr;
/** Threads must lock this before manipulating timers. */
/**
* Threads must lock this before manipulating #GWL_Display::ghost_timer_manager.
*
* \note Using a separate lock to `server_mutex` is necessary because the
* server lock is already held when calling `ghost_wl_display_event_pump`.
* If manipulating the timer used the `server_mutex`, event pump can indirectly
* handle key up/down events which would lock `server_mutex` causing a dead-lock.
*/
std::mutex *timer_mutex = nullptr;
std::thread::id main_thread_id;

View File

@@ -1080,7 +1080,7 @@ GHOST_EventCursor *GHOST_SystemWin32::processCursorEvent(GHOST_WindowWin32 *wind
if (window->getCursorGrabMode() == GHOST_kGrabHide) {
window->getClientBounds(bounds);
/* WARNING(@campbellbarton): The current warping logic fails to warp on every event,
/* WARNING(@ideasman42): The current warping logic fails to warp on every event,
* so the box needs to small enough not to let the cursor escape the window but large
* enough that the cursor isn't being warped every time.
* If this was not the case it would be less trouble to simply warp the cursor to the
@@ -1179,7 +1179,7 @@ GHOST_EventKey *GHOST_SystemWin32::processKeyEvent(GHOST_WindowWin32 *window, RA
GHOST_TKey key = system->hardKey(raw, &key_down);
GHOST_EventKey *event;
/* NOTE(@campbellbarton): key repeat in WIN32 also applies to modifier-keys.
/* NOTE(@ideasman42): key repeat in WIN32 also applies to modifier-keys.
* Check for this case and filter out modifier-repeat.
* Typically keyboard events are *not* filtered as part of GHOST's event handling.
* As other GHOST back-ends don't have the behavior, it's simplest not to send them through.

View File

@@ -282,7 +282,7 @@ class GHOST_SystemWin32 : public GHOST_System {
GHOST_TSuccess exit();
/**
* Converts raw WIN32 key codes from the wndproc to GHOST keys.
* Converts raw WIN32 key codes from the `wndproc` to GHOST keys.
* \param vKey: The virtual key from #hardKey.
* \param ScanCode: The ScanCode of pressed key (similar to PS/2 Set 1).
* \param extend: Flag if key is not primly (left or right).
@@ -291,7 +291,7 @@ class GHOST_SystemWin32 : public GHOST_System {
GHOST_TKey convertKey(short vKey, short ScanCode, short extend) const;
/**
* Catches raw WIN32 key codes from WM_INPUT in the wndproc.
* Catches raw WIN32 key codes from WM_INPUT in the `wndproc`.
* \param raw: RawInput structure with detailed info about the key event.
* \param r_key_down: Set true when the key is pressed, otherwise false.
* \return The GHOST key (GHOST_kKeyUnknown if no match).
@@ -319,8 +319,8 @@ class GHOST_SystemWin32 : public GHOST_System {
* Creates tablet events from pointer events.
* \param type: The type of pointer event.
* \param window: The window receiving the event (the active window).
* \param wParam: The wParam from the wndproc.
* \param lParam: The lParam from the wndproc.
* \param wParam: The wParam from the `wndproc`.
* \param lParam: The lParam from the `wndproc`.
* \param eventhandled: True if the method handled the event.
*/
static void processPointerEvent(
@@ -337,8 +337,8 @@ class GHOST_SystemWin32 : public GHOST_System {
/**
* Handles a mouse wheel event.
* \param window: The window receiving the event (the active window).
* \param wParam: The wParam from the wndproc.
* \param lParam: The lParam from the wndproc.
* \param wParam: The wParam from the `wndproc`.
* \param lParam: The lParam from the `wndproc`.
*/
static void processWheelEvent(GHOST_WindowWin32 *window, WPARAM wParam, LPARAM lParam);

View File

@@ -278,7 +278,7 @@ uint8_t GHOST_SystemX11::getNumDisplays() const
void GHOST_SystemX11::getMainDisplayDimensions(uint32_t &width, uint32_t &height) const
{
if (m_display) {
/* NOTE(@campbellbarton): for this to work as documented,
/* NOTE(@ideasman42): for this to work as documented,
* we would need to use Xinerama check r54370 for code that did this,
* we've since removed since its not worth the extra dependency. */
getAllDisplayDimensions(width, height);
@@ -927,7 +927,7 @@ void GHOST_SystemX11::processEvent(XEvent *xe)
if (window->getCursorGrabMode() == GHOST_kGrabHide) {
window->getClientBounds(bounds);
/* TODO(@campbellbarton): warp the cursor to `window->getCursorGrabInitPos`,
/* TODO(@ideasman42): warp the cursor to `window->getCursorGrabInitPos`,
* on every motion event, see: D16557 (alternative fix for T102346). */
const int32_t subregion_div = 4; /* One quarter of the region. */
const int32_t size[2] = {bounds.getWidth(), bounds.getHeight()};
@@ -2015,8 +2015,8 @@ void GHOST_SystemX11::getClipboard_xcout(
return;
}
/* if it's not incr, and not format == 8, then there's
* nothing in the selection (that xclip understands, anyway) */
/* If it's not INCR, and not `format == 8`, then there's
* nothing in the selection (that `xclip` understands, anyway). */
if (pty_format != 8) {
*context = XCLIB_XCOUT_NONE;

View File

@@ -669,7 +669,7 @@ static void xdg_surface_handle_configure(void *data,
GHOST_SystemWayland *system = win->ghost_system;
const bool is_main_thread = system->main_thread_id == std::this_thread::get_id();
if (!is_main_thread) {
/* NOTE(@campbellbarton): this only gets one redraw,
/* NOTE(@ideasman42): this only gets one redraw,
* I could not find a case where this causes problems. */
gwl_window_pending_actions_tag(win, PENDING_FRAME_CONFIGURE);
}
@@ -774,7 +774,7 @@ GHOST_WindowWayland::GHOST_WindowWayland(GHOST_SystemWayland *system,
window_->ghost_window = this;
window_->ghost_system = system;
/* NOTE(@campbellbarton): The scale set here to avoid flickering on startup.
/* NOTE(@ideasman42): The scale set here to avoid flickering on startup.
* When all monitors use the same scale (which is quite common) there aren't any problems.
*
* When monitors have different scales there may still be a visible window resize on startup.
@@ -1078,7 +1078,7 @@ GHOST_WindowWayland::~GHOST_WindowWayland()
wl_surface_destroy(window_->wl_surface);
/* NOTE(@campbellbarton): Flushing will often run the appropriate handlers event
/* NOTE(@ideasman42): Flushing will often run the appropriate handlers event
* (#wl_surface_listener.leave in particular) to avoid attempted access to the freed surfaces.
* This is not fool-proof though, hence the call to #window_surface_unref, see: T99078. */
wl_display_flush(system_->wl_display());

View File

@@ -8,9 +8,9 @@ else
exit 1
fi
BRANCH="master"
BRANCH="main"
# repo="git://git.blender.org/libmv.git"
# repo="https://projects.blender.org/blender/libmv.git"
repo="/home/sergey/Developer/libmv"
tmp=`mktemp -d`

View File

@@ -40,11 +40,11 @@ inline float fast_acosf(float x)
/* clamp and crush denormals. */
const float m = (f < 1.0f) ? 1.0f - (1.0f - f) : 1.0f;
/* Based on http://www.pouet.net/topic.php?which=9132&page=2
* 85% accurate (ulp 0)
* 85% accurate (ULP 0)
* Examined 2130706434 values of acos:
* 15.2000597 avg ulp diff, 4492 max ulp, 4.51803e-05 max error // without "denormal crush"
* 15.2000597 avg ULP diff, 4492 max ULP, 4.51803e-05 max error // without "denormal crush"
* Examined 2130706434 values of acos:
* 15.2007108 avg ulp diff, 4492 max ulp, 4.51803e-05 max error // with "denormal crush"
* 15.2007108 avg ULP diff, 4492 max ULP, 4.51803e-05 max error // with "denormal crush"
*/
const float a = sqrtf(1.0f - m) *
(1.5707963267f + m * (-0.213300989f + m * (0.077980478f + m * -0.02164095f)));

View File

@@ -89,7 +89,7 @@ class EvalOutputAPI::EvalOutput {
// The following interfaces are dependant on the actual evaluator type (CPU, OpenGL, etc.) which
// have slightly different APIs to access patch arrays, as well as different types for their
// data structure. They need to be overridden in the specific instances of the EvalOutput derived
// classes if needed, while the interfaces above are overriden through VolatileEvalOutput.
// classes if needed, while the interfaces above are overridden through VolatileEvalOutput.
virtual void fillPatchArraysBuffer(OpenSubdiv_Buffer * /*patch_arrays_buffer*/)
{

View File

@@ -1,41 +1,18 @@
This folder contains several scripts to smoothen the Blender LTS releases.
This folder contains a script to generate release notes and download URLs
for Blender LTS releases.
create_download_urls.py
=======================
Ensure required Python modules are installed before running:
This python script is used to generate the download urls which we can
copy-paste directly into the CMS of www.blender.org.
pip3 install -r ./requirements.txt
Usage: create_download_urls.py --version 2.83.7
Then run for example:
Arguments:
--version VERSION Version string in the form of {major}.{minor}.{build}
(eg 2.83.7)
./create_release_notes.py --version 3.3.2 --format=html
The resulting html will be printed to the console.
Available arguments:
create_release_notes.py
=======================
This python script is used to generate the release notes which we can
copy-paste directly into the CMS of www.blender.org and stores.
Usage: ./create_release_notes.py --task=T77348 --version=2.83.7
Arguments:
--version VERSION Version string in the form of {major}.{minor}.{build}
(e.g. 2.83.7)
--task TASK Phabricator ticket that is contains the release notes
information (e.g. T77348)
--format FORMAT Format the result in `text`, `steam`, `wiki` or `html`
Requirements
============
* Python 3.8 or later
* Python phabricator client version 0.7.0
https://pypi.org/project/phabricator/
For convenience the python modules can be installed using pip
pip3 install -r ./requirements.txt
--version VERSION Version string in the form of {major}.{minor}.{build}
(e.g. 3.3.2)
--issue ISSUE Gitea issue that is contains the release notes
information (e.g. #77348)
--format FORMAT Format the result in `text`, `steam`, `wiki` or `html`

View File

@@ -1,169 +1,46 @@
#!/usr/bin/env python3
# SPDX-License-Identifier: GPL-2.0-or-later
#!/usr/bin/env python3
import argparse
import phabricator
import lts_issue
import lts_download
DESCRIPTION = ("This python script is used to generate the release notes "
"which we can copy-paste directly into the CMS of "
DESCRIPTION = ("This python script is used to generate the release notes and "
"download URLs which we can copy-paste directly into the CMS of "
"www.blender.org and stores.")
USAGE = "./create_release_notes.py --task=T77348 --version=2.83.7"
# Parse arguments
parser = argparse.ArgumentParser(description=DESCRIPTION)
parser.add_argument(
"--version",
required=True,
help="Version string in the form of {major}.{minor}.{patch} (e.g. 3.3.2)")
parser.add_argument(
"--issue",
help="Task that is contains the release notes information (e.g. #77348)")
parser.add_argument(
"--format",
help="Format the result in `text`, `steam`, `wiki` or `html`",
default="text")
args = parser.parse_args()
class ReleaseLogLine:
"""
Class containing the information of a single line of the release log
# Determine issue number
version = args.version
issue = args.issue
if not issue:
if version.startswith("2.83."):
issue = "#77348"
elif version.startswith("2.93."):
issue = "#88449"
elif version.startswith("3.3."):
issue = "#100749"
else:
raise ValueError("Specify --issue or update script to include issue number for this version")
Instance attributes:
# Print
if args.format == "html":
lts_download.print_urls(version=version)
print("")
* line: (str) the original line used to create this log line
* task_id: (int or None) the extracted task id associated with this log
line. Can be None if the log line isn't associated with a task.
* commit_id: (str or None) the extracted commit id associated with this log
line. Only filled when no `task_id` could be found.
* ref: (str) `task_id` or `commit_id` of this line, including `T` for tasks
or `D` for diffs.
* title: (str) title of this log line. When constructed this attribute is
an empty string. The called needs to retrieve the title from the
backend.
* url: (str) url of the ticket task or commit.
"""
def __init__(self, line: str):
self.line = line
items = line.split("|")
self.task_id = None
self.commit_id = None
try:
task_id = int(items[1].strip()[1:])
self.task_id = task_id
self.ref = f"T{self.task_id}"
except ValueError:
# no task
commit_string = items[3].strip()
commits = commit_string.split(",")
commit_id = commits[0]
commit_id = commit_id.replace("{", "").replace("}", "")
if not commit_id.startswith("rB"):
commit_id = f"rB{commit_id}"
self.commit_id = commit_id
self.ref = f"{self.commit_id}"
self.title = ""
self.url = f"https://developer.blender.org/{self.ref}"
def __format_as_html(self) -> str:
return f" <li>{self.title} [<a href=\"{self.url}\">{self.ref}</a>]</li>"
def __format_as_text(self) -> str:
return f"* {self.title} [{self.ref}]"
def __format_as_steam(self) -> str:
return f"* {self.title} ([url={self.url}]{self.ref}[/url])"
def __format_as_wiki(self) -> str:
if self.task_id:
return f"* {self.title} [{{{{BugReport|{self.task_id}}}}}]"
else:
return f"* {self.title} [{{{{GitCommit|{self.commit_id[2:]}}}}}]"
def format(self, format: str) -> str:
"""
Format this line
:attr format: the desired format. Possible values are 'text', 'steam' or 'html'
:type string:
"""
if format == 'html':
return self.__format_as_html()
elif format == 'steam':
return self.__format_as_steam()
elif format == 'wiki':
return self.__format_as_wiki()
else:
return self.__format_as_text()
def format_title(title: str) -> str:
title = title.strip()
if not title.endswith("."):
title = title + "."
return title
def extract_release_notes(version: str, task_id: int):
"""
Extract all release notes logs
# Process
1. Retrieval of description of the given `task_id`.
2. Find rows for the given `version` and convert to `ReleaseLogLine`.
3. based on the associated task or commit retrieves the title of the log
line.
"""
phab = phabricator.Phabricator()
phab.update_interfaces()
task = phab.maniphest.info(task_id=task_id)
description = task["description"]
lines = description.split("\n")
start_index = lines.index(f"## Blender {version} ##")
lines = lines[start_index + 1:]
for line in lines:
if not line.strip():
continue
if line.startswith("| **Report**"):
continue
if line.startswith("## Blender"):
break
log_line = ReleaseLogLine(line)
if log_line.task_id:
issue_task = phab.maniphest.info(task_id=log_line.task_id)
log_line.title = format_title(issue_task.title)
yield log_line
elif log_line.commit_id:
commits = phab.diffusion.commit.search(constraints={"identifiers": [log_line.commit_id]})
commit = commits.data[0]
commit_message = commit['fields']['message']
commit_title = commit_message.split("\n")[0]
log_line.title = format_title(commit_title)
yield log_line
def print_release_notes(version: str, format: str, task_id: int):
"""
Generate and print the release notes to the console.
"""
if format == 'html':
print("<ul>")
if format == 'steam':
print("[ul]")
for log_item in extract_release_notes(version=version, task_id=task_id):
print(log_item.format(format=format))
if format == 'html':
print("</ul>")
if format == 'steam':
print("[/ul]")
if __name__ == "__main__":
parser = argparse.ArgumentParser(description=DESCRIPTION, usage=USAGE)
parser.add_argument(
"--version",
required=True,
help="Version string in the form of {major}.{minor}.{build} (e.g. 2.83.7)")
parser.add_argument(
"--task",
required=True,
help="Phabricator ticket that is contains the release notes information (e.g. T77348)")
parser.add_argument(
"--format",
help="Format the result in `text`, `steam`, `wiki` or `html`",
default="text")
args = parser.parse_args()
print_release_notes(version=args.version, format=args.format, task_id=int(args.task[1:]))
lts_issue.print_notes(version=version, format=args.format, issue=issue)

View File

@@ -1,14 +1,9 @@
#!/usr/bin/env python3
# SPDX-License-Identifier: GPL-2.0-or-later
import argparse
import datetime
DESCRIPTION = ("This python script is used to generate the download urls "
"which we can copy-paste directly into the CMS of "
"www.blender.org")
USAGE = "create_download_urls --version=2.83.7"
# Used date format: "September 30, 2020"
DATE_FORMAT = "%B %d, %Y"
@@ -62,19 +57,8 @@ def generate_html(version: Version) -> str:
return "\n".join(lines)
def print_download_urls(version: Version):
def print_urls(version: str):
"""
Generate the download urls and print them to the console.
"""
print(generate_html(version))
if __name__ == "__main__":
parser = argparse.ArgumentParser(description=DESCRIPTION, usage=USAGE)
parser.add_argument("--version",
required=True,
help=("Version string in the form of {major}.{minor}."
"{build} (eg 2.83.7)"))
args = parser.parse_args()
print_download_urls(version=Version(args.version))
print(generate_html(Version(version)))

169
release/lts/lts_issue.py Normal file
View File

@@ -0,0 +1,169 @@
#!/usr/bin/env python3
# SPDX-License-Identifier: GPL-2.0-or-later
import requests
class ReleaseLogLine:
"""
Class containing the information of a single line of the release log
Instance attributes:
* line: (str) the original line used to create this log line
* issue_id: (int or None) the extracted issue id associated with this log
line. Can be None if the log line isn't associated with a issue.
* commit_id: (str or None) the extracted commit id associated with this log
line. Only filled when no `issue_id` could be found.
* ref: (str) `issue_id` or `commit_id` of this line, including `T` for issues
or `D` for diffs.
* title: (str) title of this log line. When constructed this attribute is
an empty string. The called needs to retrieve the title from the
backend.
* url: (str) url of the ticket issue or commit.
"""
def __init__(self, line: str):
self.line = line
items = line.split("|")
self.issue_id = None
self.issue_repo = None
self.commit_id = None
self.commit_repo = None
base_url = "https://projects.blender.org"
try:
issue_tokens = items[1].strip().split("#")
if len(issue_tokens[0]) > 0:
self.issue_repo = issue_tokens[0]
self.issue_id = issue_tokens[1]
else:
self.issue_repo = "blender/blender"
self.issue_id = issue_tokens[1]
self.ref = f"#{self.issue_id}"
self.url = f"{base_url}/{self.issue_repo}/issues/{self.issue_id}"
except IndexError:
# no issue
commit_string = items[3].strip()
commit_string = commit_string.split(",")[0]
commit_string = commit_string.split("]")[0]
commit_string = commit_string.replace("[", "")
commit_tokens = commit_string.split("@")
if len(commit_tokens) > 1:
self.commit_repo = commit_tokens[0]
self.commit_id = commit_tokens[1]
else:
self.commit_repo = "blender/blender"
self.commit_id = commit_tokens[0]
self.ref = f"{self.commit_id}"
self.url = f"{base_url}/{self.commit_repo}/commit/{self.commit_id}"
self.title = ""
def __format_as_html(self) -> str:
return f" <li>{self.title} [<a href=\"{self.url}\">{self.ref}</a>]</li>"
def __format_as_text(self) -> str:
return f"* {self.title} [{self.ref}]"
def __format_as_steam(self) -> str:
return f"* {self.title} ([url={self.url}]{self.ref}[/url])"
def __format_as_wiki(self) -> str:
if self.issue_id:
return f"* {self.title} [{{{{BugReport|{self.issue_id}}}}}]"
else:
return f"* {self.title} [{{{{GitCommit|{self.commit_id[2:]}}}}}]"
def format(self, format: str) -> str:
"""
Format this line
:attr format: the desired format. Possible values are 'text', 'steam' or 'html'
:type string:
"""
if format == 'html':
return self.__format_as_html()
elif format == 'steam':
return self.__format_as_steam()
elif format == 'wiki':
return self.__format_as_wiki()
else:
return self.__format_as_text()
def format_title(title: str) -> str:
title = title.strip()
if not title.endswith("."):
title = title + "."
return title
def extract_release_notes(version: str, issue: str):
"""
Extract all release notes logs
# Process
1. Retrieval of description of the given `issue_id`.
2. Find rows for the given `version` and convert to `ReleaseLogLine`.
3. based on the associated issue or commit retrieves the title of the log
line.
"""
base_url = "https://projects.blender.org/api/v1/repos"
issues_url = base_url + "/blender/blender/issues/"
headers = {'accept': 'application/json'}
response = requests.get(issues_url + issue[1:], headers=headers)
description = response.json()["body"]
lines = description.split("\n")
start_index = lines.index(f"## Blender {version}")
lines = lines[start_index + 1:]
for line in lines:
if not line.strip():
continue
if line.startswith("| **Report**"):
continue
if line.startswith("## Blender"):
break
if line.find("| -- |") != -1:
continue
log_line = ReleaseLogLine(line)
if log_line.issue_id:
issue_url = f"{base_url}/{log_line.issue_repo}/issues/{log_line.issue_id}"
response = requests.get(issue_url, headers=headers)
if response.status_code != 200:
raise ValueError("Issue not found: " + str(log_line.issue_id))
log_line.title = format_title(response.json()["title"])
yield log_line
elif log_line.commit_id:
commit_url = f"{base_url}/{log_line.commit_repo}/git/commits/{log_line.commit_id}"
response = requests.get(commit_url, headers=headers)
if response.status_code != 200:
raise ValueError("Commit not found: " + log_line.commit_id)
commit_message = response.json()['commit']['message']
commit_title = commit_message.split("\n")[0]
log_line.title = format_title(commit_title)
yield log_line
def print_notes(version: str, format: str, issue: str):
"""
Generate and print the release notes to the console.
"""
if format == 'html':
print("<ul>")
if format == 'steam':
print("[ul]")
for log_item in extract_release_notes(version=version, issue=issue):
print(log_item.format(format=format))
if format == 'html':
print("</ul>")
if format == 'steam':
print("[/ul]")

View File

@@ -1 +1 @@
phabricator==0.7.0
requests

View File

@@ -1002,11 +1002,11 @@ def unregister_tool(tool_cls):
# we start with the built-in default mapping
def _blender_default_map():
import rna_manual_reference as ref_mod
ret = (ref_mod.url_manual_prefix, ref_mod.url_manual_mapping)
# avoid storing in memory
del _sys.modules["rna_manual_reference"]
return ret
# NOTE(@ideasman42): Avoid importing this as there is no need to keep the lookup table in memory.
# As this runs when the user accesses the "Online Manual", the overhead loading the file is acceptable.
# In my tests it's under 1/100th of a second loading from a `pyc`.
ref_mod = execfile(_os.path.join(_script_base_dir, "modules", "rna_manual_reference.py"))
return (ref_mod.url_manual_prefix, ref_mod.url_manual_mapping)
# hooks for doc lookups

View File

@@ -1299,6 +1299,8 @@ def km_uv_editor(params):
{"properties": [("data_path", 'tool_settings.snap_uv_element')]}),
("wm.context_toggle", {"type": 'ACCENT_GRAVE', "value": 'PRESS', "ctrl": True},
{"properties": [("data_path", 'space_data.show_gizmo')]}),
("wm.context_toggle", {"type": 'Z', "value": 'PRESS', "alt": True, "shift": True},
{"properties": [("data_path", "space_data.overlay.show_overlays")]}),
*_template_items_context_menu("IMAGE_MT_uvs_context_menu", params.context_menu_event),
])
@@ -1968,6 +1970,8 @@ def km_image(params):
("image.clear_render_border", {"type": 'B', "value": 'PRESS', "ctrl": True, "alt": True}, None),
("wm.context_toggle", {"type": 'ACCENT_GRAVE', "value": 'PRESS', "ctrl": True},
{"properties": [("data_path", 'space_data.show_gizmo')]}),
("wm.context_toggle", {"type": 'Z', "value": 'PRESS', "alt": True, "shift": True},
{"properties": [("data_path", "space_data.overlay.show_overlays")]}),
*_template_items_context_menu("IMAGE_MT_mask_context_menu", params.context_menu_event),
])
@@ -2914,6 +2918,8 @@ def km_sequencer(params):
{"properties": [("side", 'RIGHT')]}),
("wm.context_toggle", {"type": 'TAB', "value": 'PRESS', "shift": True},
{"properties": [("data_path", 'tool_settings.use_snap_sequencer')]}),
("wm.context_toggle", {"type": 'Z', "value": 'PRESS', "alt": True, "shift": True},
{"properties": [("data_path", "space_data.show_overlays")]}),
*_template_items_context_menu("SEQUENCER_MT_context_menu", params.context_menu_event),
])
@@ -6338,6 +6344,8 @@ def km_node_link_modal_map(_params):
return keymap
# Fallback for gizmos that don't have custom a custom key-map.
def km_generic_gizmo(_params):
keymap = (
"Generic Gizmo",

View File

@@ -338,6 +338,7 @@ class NODE_MT_geometry_node_GEO_MESH_READ(Menu):
node_add_menu.add_node_type(layout, "GeometryNodeInputMeshEdgeAngle")
node_add_menu.add_node_type(layout, "GeometryNodeInputMeshEdgeNeighbors")
node_add_menu.add_node_type(layout, "GeometryNodeInputMeshEdgeVertices")
node_add_menu.add_node_type(layout, "GeometryNodeEdgesToFaceGroups")
node_add_menu.add_node_type(layout, "GeometryNodeInputMeshFaceArea")
node_add_menu.add_node_type(layout, "GeometryNodeInputMeshFaceNeighbors")
node_add_menu.add_node_type(layout, "GeometryNodeMeshFaceSetBoundaries")

View File

@@ -469,6 +469,11 @@ class RENDER_PT_eevee_next_shadows(RenderButtonsPanel, Panel):
def poll(cls, context):
return (context.engine in cls.COMPAT_ENGINES)
def draw_header(self, context):
scene = context.scene
props = scene.eevee
self.layout.prop(props, "use_shadows", text="")
def draw(self, context):
layout = self.layout
layout.use_property_split = True

View File

@@ -1996,7 +1996,7 @@ class SEQUENCER_PT_adjust_sound(SequencerButtonsPanel, Panel):
split = col.split(factor=0.4)
split.alignment = 'RIGHT'
split.label(text="Pan", heading_ctxt=i18n_contexts.id_sound)
split.label(text="Pan", text_ctxt=i18n_contexts.id_sound)
split.prop(strip, "pan", text="")
split.enabled = pan_enabled

File diff suppressed because it is too large Load Diff

View File

@@ -2267,7 +2267,7 @@ class ExperimentalPanel:
bl_region_type = 'WINDOW'
bl_context = "experimental"
url_prefix = "https://developer.blender.org/"
url_prefix = "https://projects.blender.org/"
@classmethod
def poll(cls, _context):
@@ -2308,8 +2308,8 @@ class USERPREF_PT_experimental_virtual_reality(ExperimentalPanel, Panel):
def draw(self, context):
self._draw_items(
context, (
({"property": "use_virtual_reality_scene_inspection"}, "T71347"),
({"property": "use_virtual_reality_immersive_drawing"}, "T71348"),
({"property": "use_virtual_reality_scene_inspection"}, ("blender/blender/issues/71347", "#71347")),
({"property": "use_virtual_reality_immersive_drawing"}, ("blender/blender/issues/71348", "#71348")),
),
)
"""
@@ -2319,13 +2319,18 @@ class USERPREF_PT_experimental_new_features(ExperimentalPanel, Panel):
bl_label = "New Features"
def draw(self, context):
self._draw_items(
context, (
({"property": "use_sculpt_tools_tilt"}, "T82877"),
({"property": "use_extended_asset_browser"}, ("project/view/130/", "Project Page")),
({"property": "use_override_templates"}, ("T73318", "Milestone 4")),
),
)
self._draw_items(context,
(({"property": "use_sculpt_tools_tilt"},
("blender/blender/issues/82877",
"#82877")),
({"property": "use_extended_asset_browser"},
("blender/blender/projects/10",
"Pipeline, Assets & IO Project Page")),
({"property": "use_override_templates"},
("blender/blender/issues/73318",
"Milestone 4")),
),
)
class USERPREF_PT_experimental_prototypes(ExperimentalPanel, Panel):
@@ -2334,12 +2339,12 @@ class USERPREF_PT_experimental_prototypes(ExperimentalPanel, Panel):
def draw(self, context):
self._draw_items(
context, (
({"property": "use_new_curves_tools"}, "T68981"),
({"property": "use_new_point_cloud_type"}, "T75717"),
({"property": "use_sculpt_texture_paint"}, "T96225"),
({"property": "use_full_frame_compositor"}, "T88150"),
({"property": "enable_eevee_next"}, "T93220"),
({"property": "enable_workbench_next"}, "T101619"),
({"property": "use_new_curves_tools"}, ("blender/blender/issues/68981", "#68981")),
({"property": "use_new_point_cloud_type"}, ("blender/blender/issues/75717", "#75717")),
({"property": "use_sculpt_texture_paint"}, ("blender/blender/issues/96225", "#96225")),
({"property": "use_full_frame_compositor"}, ("blender/blender/issues/88150", "#88150")),
({"property": "enable_eevee_next"}, ("blender/blender/issues/93220", "#93220")),
({"property": "enable_workbench_next"}, ("blender/blender/issues/101619", "#101619")),
),
)
@@ -2352,7 +2357,7 @@ class USERPREF_PT_experimental_tweaks(ExperimentalPanel, Panel):
def draw(self, context):
self._draw_items(
context, (
({"property": "use_select_nearest_on_first_click"}, "T96752"),
({"property": "use_select_nearest_on_first_click"}, ("blender/blender/issues/96752", "#96752")),
),
)
@@ -2371,8 +2376,8 @@ class USERPREF_PT_experimental_debugging(ExperimentalPanel, Panel):
def draw(self, context):
self._draw_items(
context, (
({"property": "use_undo_legacy"}, "T60695"),
({"property": "override_auto_resync"}, "T83811"),
({"property": "use_undo_legacy"}, ("blender/blender/issues/60695", "#60695")),
({"property": "override_auto_resync"}, ("blender/blender/issues/83811", "#83811")),
({"property": "use_cycles_debug"}, None),
({"property": "show_asset_debug_info"}, None),
({"property": "use_asset_indexing"}, None),

View File

@@ -96,8 +96,8 @@ Chat <a href="https://blender.chat/channel/today">
<p class="p5">
<span class="s3">Development <a href="https://www.blender.org/get-involved/developers/">
<span class="s4">www.blender.org/get-involved/developers/</span></a><br>
GIT and Bug Tracker <a href="https://developer.blender.org/">
<span class="s4">developer.blender.org</span></a><br>
GIT and Bug Tracker <a href="https://projects.blender.org/">
<span class="s4">projects.blender.org</span></a><br>
Chat <a href="https://blender.chat/channel/blender-coders">
<span class="s4">#blender-coders</span></a> on blender.chat</span>
</p>

View File

@@ -3,7 +3,7 @@ echo Starting blender with GPU debugging options, log files will be created
echo in your temp folder, windows explorer will open after you close blender
echo to help you find them.
echo.
echo If you report a bug on https://developer.blender.org you can attach these files
echo If you report a bug on https://projects.blender.org you can attach these files
echo by dragging them into the text area of your bug report, please include both
echo blender_debug_output.txt and blender_system_info.txt in your report.
echo.

View File

@@ -3,7 +3,7 @@ echo Starting blender with GPU debugging and glitch workaround options, log file
echo will be created in your temp folder, windows explorer will open after you
echo close blender to help you find them.
echo.
echo If you report a bug on https://developer.blender.org you can attach these files
echo If you report a bug on https://projects.blender.org you can attach these files
echo by dragging them into the text area of your bug report, please include both
echo blender_debug_output.txt and blender_system_info.txt in your report.
echo.

View File

@@ -3,7 +3,7 @@ echo Starting blender with debug logging options, log files will be created
echo in your temp folder, windows explorer will open after you close blender
echo to help you find them.
echo.
echo If you report a bug on https://developer.blender.org you can attach these files
echo If you report a bug on https://projects.blender.org you can attach these files
echo by dragging them into the text area of your bug report, please include both
echo blender_debug_output.txt and blender_system_info.txt in your report.
echo.

View File

@@ -3,7 +3,7 @@ echo Starting blender with factory settings, log files will be created
echo in your temp folder, windows explorer will open after you close blender
echo to help you find them.
echo.
echo If you report a bug on https://developer.blender.org you can attach these files
echo If you report a bug on https://projects.blender.org you can attach these files
echo by dragging them into the text area of your bug report, please include both
echo blender_debug_output.txt and blender_system_info.txt in your report.
echo.

View File

@@ -40,7 +40,7 @@ typedef int32_t ft_pix;
/* Macros copied from `include/freetype/internal/ftobjs.h`. */
/**
* FIXME(@campbellbarton): Follow rounding from Blender 3.1x and older.
* FIXME(@ideasman42): Follow rounding from Blender 3.1x and older.
* This is what users will expect and changing this creates wider spaced text.
* Use this macro to communicate that rounding should be used, using floor is to avoid
* user visible changes, which can be reviewed and handled separately.

View File

@@ -6,8 +6,7 @@
#pragma once
#include "BLI_float3x3.hh"
#include "BLI_math_vector_types.hh"
#include "BLI_math_matrix.hh"
#include "BLI_span.hh"
struct Depsgraph;
@@ -38,7 +37,7 @@ struct GeometryDeformation {
return translation;
}
const float3x3 &deform_mat = this->deform_mats[position_i];
return deform_mat.inverted() * translation;
return math::transform_point(math::invert(deform_mat), translation);
}
};

View File

@@ -2,31 +2,25 @@
#pragma once
#include "BKE_curves.h"
/** \file
* \ingroup bke
* \brief Low-level operations for curves.
*/
#include <mutex>
#include "BLI_bounds_types.hh"
#include "BLI_cache_mutex.hh"
#include "BLI_float3x3.hh"
#include "BLI_float4x4.hh"
#include "BLI_generic_virtual_array.hh"
#include "BLI_index_mask.hh"
#include "BLI_math_matrix_types.hh"
#include "BLI_math_vector_types.hh"
#include "BLI_offset_indices.hh"
#include "BLI_shared_cache.hh"
#include "BLI_span.hh"
#include "BLI_task.hh"
#include "BLI_vector.hh"
#include "BLI_virtual_array.hh"
#include "BKE_attribute.hh"
#include "BKE_attribute_math.hh"
#include "BKE_curves.h"
namespace blender::bke {

View File

@@ -2,7 +2,7 @@
#pragma once
#include "BLI_float4x4.hh"
#include "BLI_math_matrix_types.hh"
#include "BKE_geometry_set.hh"

View File

@@ -19,7 +19,7 @@
#include <mutex>
#include "BLI_float4x4.hh"
#include "BLI_math_matrix_types.hh"
#include "BLI_vector.hh"
#include "BLI_vector_set.hh"

View File

@@ -497,6 +497,7 @@ void BKE_lib_id_expand_local(struct Main *bmain, struct ID *id, int flags);
*
* Only for local IDs (linked ones already have a unique ID in their library).
*
* \param name: The new name of the given ID, if NULL the current given ID name is used instead.
* \param do_linked_data: if true, also ensure a unique name in case the given \a id is linked
* (otherwise, just ensure that it is properly sorted).
*

View File

@@ -142,6 +142,9 @@ enum {
/** Also process internal ID pointers like `ID.newid` or `ID.orig_id`.
* WARNING: Dangerous, use with caution. */
IDWALK_DO_INTERNAL_RUNTIME_POINTERS = (1 << 9),
/** Also process the ID.lib pointer. It is an option because this pointer can usually be fully
ignored. */
IDWALK_DO_LIBRARY_POINTER = (1 << 10),
};
typedef struct LibraryForeachIDData LibraryForeachIDData;

View File

@@ -222,8 +222,12 @@ void BKE_id_remapper_clear(struct IDRemapper *id_remapper);
bool BKE_id_remapper_is_empty(const struct IDRemapper *id_remapper);
/** Free the given ID Remapper. */
void BKE_id_remapper_free(struct IDRemapper *id_remapper);
/** Add a new remapping. */
/** Add a new remapping. Does not replace an existing mapping for `old_id`, if any. */
void BKE_id_remapper_add(struct IDRemapper *id_remapper, struct ID *old_id, struct ID *new_id);
/** Add a new remapping, replacing a potential already existing mapping of `old_id`. */
void BKE_id_remapper_add_overwrite(struct IDRemapper *id_remapper,
struct ID *old_id,
struct ID *new_id);
/**
* Apply a remapping.

View File

@@ -425,7 +425,7 @@ int set_listbasepointers(struct Main *main, struct ListBase *lb[]);
/**
* The size of thumbnails (optionally) stored in the `.blend` files header.
*
* NOTE(@campbellbarton): This is kept small as it's stored uncompressed in the `.blend` file,
* NOTE(@ideasman42): This is kept small as it's stored uncompressed in the `.blend` file,
* where a larger size would increase the size of every `.blend` file unreasonably.
* If we wanted to increase the size, we'd want to use compression (JPEG or similar).
*/

View File

@@ -29,6 +29,15 @@ struct UniqueName_Map;
struct UniqueName_Map *BKE_main_namemap_create(void) ATTR_WARN_UNUSED_RESULT;
void BKE_main_namemap_destroy(struct UniqueName_Map **r_name_map) ATTR_NONNULL();
/**
* Destroy all name_maps in given bmain:
* - In bmain itself for local IDs.
* - In the split bmains in the list is any (for linked IDs in some cases, e.g. if called during
* readfile code).
* - In all of the libraries IDs (for linked IDs).
*/
void BKE_main_namemap_clear(struct Main *bmain) ATTR_NONNULL();
/**
* Ensures the given name is unique within the given ID type.
*

Some files were not shown because too many files have changed in this diff Show More