Compare commits

...

251 Commits

Author SHA1 Message Date
5418953d4c Merge branch 'main' into tmp-usd-alab-v2-T100452 2023-02-15 00:15:27 -05:00
e1a29b58bb Fix: Curve resolution input node missing default
When the attribute doesn't exist, the node should give the default
of 12, as defined by the accessor method for `bke::CurvesGeometry`.

Pull Request #104674
2023-02-14 19:59:37 +01:00
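For illustration, a minimal sketch of what such a fallback looks like with the `bke::CurvesGeometry` attribute API; the wrapper function and the exact call shape are assumptions, not the patch itself:

```cpp
#include "BKE_curves.hh" /* assumed context: bke::CurvesGeometry */

/* When the "resolution" attribute doesn't exist, the lookup falls back
 * to the default of 12 defined by the CurvesGeometry accessor. */
static blender::VArray<int> curve_resolutions(const blender::bke::CurvesGeometry &curves)
{
  return curves.attributes().lookup_or_default<int>("resolution", ATTR_DOMAIN_CURVE, 12);
}
```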
87f96b7664 Fix wrong variable being used as plane in 'gizmo_3d_dial_matrixbasis_calc'
A plane must be `float[4]`.
2023-02-14 15:39:32 -03:00
a755e6e63f Revert "MSVC: lower C4100 warning level from 4 to 3"
This reverts commit db4e7616f3.

Caused many issues when compiling mantaflow.
2023-02-14 15:36:35 -03:00
f72d28d7e7 Gizmo: use the final matrix for the dial3D clip plane
Dial3D uses the `matrix_basis` for the clip plane.

This can cause inconsistencies if the gizmo has `matrix_offset` or
`matrix_space`.
2023-02-14 15:31:30 -03:00
db4e7616f3 MSVC: lower C4100 warning level from 4 to 3
This better aligns with OSX/Linux warnings.

Although `__pragma(warning(suppress:4100))` is not the same as
`__attribute__((__unused__))` in gcc (which only affects the attribute
instead of the line), it still seems to be better to use it than to
hide the warning entirely.
2023-02-14 14:38:18 -03:00
f4db58844d GPU: Fix wrong 2D shader used from 3D drawing
Regression introduced by d165d6aa2a.
2023-02-14 18:32:42 +01:00
acf7f46b77 Fix #104748: GPencil: Fill tool is not working at all
Caused by missing color uniform update.
2023-02-14 18:32:42 +01:00
4fbc9c42e5 Curves: Add transform tools to toolbar
This commit also makes `ED_transform_calc_gizmo_stats` work
for curves.

Pull Request #104750
2023-02-14 18:24:24 +01:00
5ca65001ea Geometry Nodes: Operator to wrap a modifier's node group
As described in #104171, add an operator that creates a new node group
that contains the current node group and named attribute nodes to deal
with the outputs. This saves manual work when moving a high-level
modifier to the node editor for better procedural control.

Pull Request #104546
2023-02-14 18:21:50 +01:00
c9f02569c7 Curves: cage overlay for sculpt mode
This adds a new overlay for curves sculpt mode that displays the curves that the
user currently edits. Those may be different from the evaluated/original curves
when procedural deformations or child curves are used.

The overlay can clash with the evaluated curves when they are exactly on top of
each other. There is not much we can do about that currently. The user will have
to decide whether the overlay should be shown or not on a case-by-case basis.

Pull Request #104467
2023-02-14 18:10:11 +01:00
eb9647c53d Cleanup: rename vertex buffer attribute to be more specific
This renames `data` and `color` to `selection`. This is better because
it's actually what the corresponding buffers contain. Using this
more correct name makes sharing vertex buffers between different
gpu batches for different shaders easier.
2023-02-14 17:53:20 +01:00
7ba59c8a62 Assets: show all assets by default instead of current file
The "current file" mode is only useful when creating new assets.
However, the far more common use case and the one that should require
fewer steps is to use existing assets.

There is a risk that this causes freezing if the file browser preview
caching does not work properly. So we'll have to keep an eye on the bug
tracker to see if this is an issue in practice.

Pull Request #104749
2023-02-14 17:44:41 +01:00
b3fb73f325 Assets: bundle Essentials with Blender
This patch adds an "Essentials" asset library that is bundled with Blender.
Also see #103620. At build time, the `lib/assets/publish` folder is copied
to `datafiles/assets` in the build directory.

In the UI, the "Essentials" library can be accessed like other custom asset
libraries with the exception that assets from that library cannot be linked.

The immediate impact of this is that Blender now comes with some geometry
node groups for procedural hair grooming.

Pull Request #104474
2023-02-14 17:35:29 +01:00
459c210154 Fix #104652: UV Face Dots randomized
Match the check from before 75db4c082b,
where we just checked that the face dots bitmap wasn't null.
2023-02-14 11:27:34 -05:00
715acf758c Curves: Use deformed points in edit mode overlay
Uses the deformed crazyspace to populate the position buffer for Curves edit mode overlay.

Pull Request #104705
2023-02-14 16:33:38 +01:00
cfe828b452 OBJ: Support polylines with more than 2 vertices.
The OBJ spec (page B1-17) allows "l" entries to specify
polylines with more than 2 vertices, optionally with texture
coordinates.

Previously, only the first 2 vertices of each polyline
were read and added as loose edges, failing when texture
coordinates were present.

This adds support for proper polylines, reading but ignoring
texture coordinates.

Pull Request #104503
2023-02-14 16:30:50 +01:00
4cc94679dd MSVC: suppress additional warnings out of external headers
In 161908157d we moved all warnings
coming out of the library folder to /W0 as many of them do not follow
our code-style nor can we force them to.

When I made this change, I took `/external:templates-` to mean
"and that goes for you too, templates", but it decidedly does the
opposite, leading to /W3 warnings coming out of OpenVDB.

This change removes the flag as it should never have been added
in the first place.
2023-02-14 07:56:14 -07:00
Robin Hohnsbeen
39f63c8c08 Sculpting: Vector displacement for the sculpting draw brush
Sculpt: Added vector displacement for the sculpting draw brush (area plane mapping only for now)

Vector displacement maps (VDM) provide a way to create complex displacements that can have overhangs in one brush dab.
This is unlike standard displacement with height maps that only displace in the normal direction.
Forms like ears, curled horns, etc. can be created in one click if VDMs are used.
There is a checkbox on the draw brush in the texture settings "Vector Displacement" that enables/disables this feature.

Technical description: The RGB channels of a texture in a brush stroke are read and interpreted as individual vectors that are used to offset vertices.
As of now, this is only working for the draw brush using the area plane mapping. Symmetry and radial symmetry are working.

A few things to consider when making VD-Maps:
* UVs need to stay intact for the bake mesh (e.g. voxel remeshing can't be used to create VD Meshes)
* When exporting a VD Map, it should be in the OpenEXR file format (for positive and negative floating point values).
* Export resolution can be 512x512 or lower (EXR files can get very large, but VDM brushes don't need a high resolution)

And when using them:
* Inside Blender clamping needs to be unchecked on the texture
* The brush falloff should be set to constant (or nearly constant)

This patch was inspired by this [right-click-select proposal](https://blender.community/c/rightclickselect/WqWx/) Thanks for the post!

(Moved [this patch](https://archive.blender.org/developer/D17080) to here.)

Co-authored-by: Robin Hohnsbeen <robin@hohnsbeen.de>
Pull Request #104481
2023-02-14 15:29:27 +01:00
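For illustration, a rough sketch of the technical description above; every name here (the sampler inputs, the frame vectors, the math type) is invented, not the actual sculpt code:

```cpp
struct float3 {
  float x, y, z;
};

static float3 operator*(const float3 &v, const float s)
{
  return {v.x * s, v.y * s, v.z * s};
}
static float3 operator+(const float3 &a, const float3 &b)
{
  return {a.x + b.x, a.y + b.y, a.z + b.z};
}

/* Hypothetical VDM dab: the texture's RGB sample is interpreted as a
 * vector in the brush-local frame built from the area plane (tangent,
 * bitangent, normal) and used to offset the vertex, unlike a height
 * map which only displaces along the normal. */
static float3 vdm_displace(const float3 &co,
                           const float3 &rgb, /* texture sample */
                           const float3 &tangent,
                           const float3 &bitangent,
                           const float3 &normal,
                           const float strength)
{
  const float3 offset = tangent * rgb.x + bitangent * rgb.y + normal * rgb.z;
  return co + offset * strength;
}
```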
d4f7f1dfe4 Fix memory leak in view3d_select
There was an unintentional early return in the click select
execute function that caused the operator properties not
to be freed.
2023-02-14 14:59:07 +01:00
a76b5d3a07 Curves: change collision radius for consistency with default radius
Offset distance for curve collision is now matched with the default curve
radius for better visibility.

Pull Request #104648
2023-02-14 14:15:04 +01:00
59b9a88a35 Fix: duplicate elements node does not update curve type counts
Curve type counts are updated eagerly, but this was missing in this
node, leading to a crash further down the line where the counts
were expected to be correct.
2023-02-14 13:32:00 +01:00
4126284e46 Allocator: Fail building when trying to MEM_delete a void pointer
`MEM_delete()` is designed for type safe destruction and freeing, void
pointers make that impossible.
Was reviewing a patch that was trying to free a C-style custom data
pointer this way. Apparently MSVC compiles this just fine, other
compilers error out. Make sure this is a build error on all platforms
with a useful message.
2023-02-14 12:48:26 +01:00
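A sketch of how such a build-time guard can be expressed; this is illustrative, not Blender's exact `MEM_delete()` implementation:

```cpp
#include <type_traits>

/* Type-safe destruction and freeing: a void pointer carries no type,
 * so its destructor cannot be called. Fail the build with a clear
 * message on every compiler instead of letting MSVC accept it. */
template<typename T> void mem_delete(const T *ptr)
{
  static_assert(!std::is_void_v<T>,
                "MEM_delete on a void pointer is not possible: "
                "type-safe destruction requires a complete type");
  if (ptr == nullptr) {
    return;
  }
  ptr->~T();
  ::operator delete(const_cast<T *>(ptr));
}
```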
1fd0c80725 Build: correct forwarding jobs argument to make_deps_wrapper.sh 2023-02-14 22:26:59 +11:00
5215543280 Constraints: use a dimensionless unit prop for Maintain Volume
The Volume property of the Maintain Volume constraint was marked as a
distance, which made it confusing, especially with non-metric units.

The volume can actually be understood as a factor of the initial
volume, so it should be dimensionless.

Additionally, the volume had a range of 0.001 to 100.0. This is wide
enough in most cases, but sometimes you may need to go orders of
magnitude higher or lower to consider vast or thin objects, and there
should be no drawback to extending the limits, provided they stay
positive.

Pull Request #104489
2023-02-14 12:19:53 +01:00
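A sketch of what the RNA change amounts to; the function name, the `srna` context, and the limit values are assumptions rather than the patch's exact code:

```cpp
#include "RNA_define.h" /* assumed context: an RNA definition file */

/* PROP_NONE (dimensionless) instead of PROP_DISTANCE, with hard limits
 * extended well beyond the old 0.001..100.0 range while the soft UI
 * range stays familiar. All values here are illustrative. */
static void rna_def_maintain_volume_sketch(StructRNA *srna)
{
  PropertyRNA *prop = RNA_def_property(srna, "volume", PROP_FLOAT, PROP_NONE);
  RNA_def_property_range(prop, 0.001f, 100000.0f);       /* hard limits, kept positive */
  RNA_def_property_ui_range(prop, 0.001f, 100.0f, 1, 3); /* soft UI range */
  RNA_def_property_ui_text(prop, "Volume", "Volume factor relative to the rest state");
}
```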
1e449bee21 Cycles: Temporary fix for Principled Hair albedo
The proper fix (bb9eb262d4) caused compilation problems with HIP, so we're
delaying it until 3.6.
To fix the original bug report (#104586), this is a quick workaround that'll
hopefully not upset the compiler.

Pull Request #104723
2023-02-14 12:18:10 +01:00
5040c39d1a Fix T103354: Author extents on UsdGeomMesh
A properly authored USD file will have the extent attribute authored on all prims conforming to UsdGeomBoundable.
This cached extent information is useful because it allows the 3D range of prims to be quickly understood without reading potentially large arrays of data. Note that because the shape of prims may change over time, extent attributes are always evaluated for a given timecode.

This patch introduces support for authoring extents on meshes and volumes during export to USD.

Because extents are common to multiple kinds of geometries, the main support for authoring extents has been placed in USDAbstractWriter, whose new author_extent method can operate on any prim conforming to pxr::UsdGeomBoundable. The USD library already provides us the code necessary to compute the bounds for a given prim, in pxr::UsdGeomBBoxCache::ComputeLocalBound.
Note that not all prims that are imageable are boundable, such as transforms and cameras.

For more details on extents, see https://graphics.pixar.com/usd/release/api/class_usd_geom_boundable.html#details.

Note that when new types of geometries are introduced, such as curves in https://developer.blender.org/D16545, we will need to update the USD writer for that geometry such that it calls this->author_extent.

Update on Feb 2: This patch has been updated to include a unit test to ensure authored extents are valid. This test requires new test assets that will need to be submitted via svn. The test assets are attached in the d16837_usd_test_assets.zip file. To use, unzip and merge the contents of this zip into the lib/tests/usd folder.

This unit test also addresses #104269 by validating compliance of exported USD via UsdUtils.ComplianceChecker.

Pull Request #104676
2023-02-14 12:11:53 +01:00
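A sketch of the authoring step using the USD calls named above (`UsdGeomBBoxCache::ComputeLocalBound` and `UsdGeomBoundable`); the wrapper function is invented for illustration:

```cpp
#include <pxr/base/vt/array.h>
#include <pxr/usd/usdGeom/bboxCache.h>
#include <pxr/usd/usdGeom/boundable.h>
#include <pxr/usd/usdGeom/imageable.h>

/* Compute the local bound of a boundable prim at one timecode and
 * author it as the `extent` attribute. Extents are time-sampled
 * because the shape of a prim may change over time. */
static void author_extent_sketch(const pxr::UsdPrim &prim, const pxr::UsdTimeCode time)
{
  pxr::UsdGeomBBoxCache cache(time, pxr::UsdGeomImageable::GetOrderedPurposeTokens());
  const pxr::GfBBox3d bounds = cache.ComputeLocalBound(prim);
  const pxr::GfRange3d range = bounds.ComputeAlignedRange();

  pxr::VtArray<pxr::GfVec3f> extent(2);
  extent[0] = pxr::GfVec3f(range.GetMin());
  extent[1] = pxr::GfVec3f(range.GetMax());
  pxr::UsdGeomBoundable(prim).CreateExtentAttr().Set(extent, time);
}
```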
a3e6cb5dab Gizmo: press shift key to scale cage2d gizmo uniformly
Reviewed by: Campbell Barton
Pull Request #104418
2023-02-14 11:08:54 +01:00
fe0a219b5b Revert: Formatting changes space_toolsystem_toolbar 2023-02-14 10:30:49 +01:00
09498264f6 Deps: build one at a time, each using all available cores on Linux
Use a MAKE wrapper for 'make deps' on Linux that ensures dependencies
are built one at a time. This is preferable because building many
dependencies at once made troubleshooting impractical and had the
downside that large deps such as LLVM would bottleneck on a single core.

This may be used for macOS, so far it's only tested on Linux.
2023-02-14 16:37:04 +11:00
46c34ba1f6 GNUmakefile: include install directory in 'make deps' message
Avoids uncertainty, ensuring deps are copied from the right place.
2023-02-14 13:26:30 +11:00
d02f863b18 Update RNA to User manual mappings 2023-02-14 13:19:18 +11:00
d851c91856 PyDocs: update links in generated API docs for GITEA migration
- Correct broken link for undocumented modules.
  Point to the contributing page; it seems #51062 was lost with the
  GITEA task migration.
- Correct Blender Version link to the SHA1.
2023-02-14 12:41:51 +11:00
a6dcf6e6cb Cleanup: avoid redundant sqrt when picking reroute nodes 2023-02-14 12:26:29 +11:00
818c16ef1f Cleanup: pass a const argument to CustomData_get_elem_size 2023-02-14 12:26:01 +11:00
750e278fd1 Cleanup: line wrapping in USERPREF_PT_experimental_new_features
Changing from Phabricator links caused poor formatting.
Also remove blank lines between comment and it's function.
2023-02-14 12:13:04 +11:00
85f8ba9db7 Cleanup: include missing header, sort CMake file lists 2023-02-14 11:52:14 +11:00
1ac80e8338 Cleanup: quiet unreachable-code warning, use ARRAY_SIZE macro 2023-02-14 11:50:00 +11:00
6dc88682ea License headers: add missing license identifier 2023-02-14 11:47:13 +11:00
36937e523a Fix #66863: Bisect plane rotation behavior bugged for side/cut view
The matrix of the rotation gizmo is somewhat confusing.

Attaching to the Z axis has more predictable results.
2023-02-13 21:15:38 -03:00
0fa34aa0ec Cleanup: spelling in comments, reference enum types in doc-strings
Also use doxy formatting for structs in sculpt_uv.c.
2023-02-14 10:29:48 +11:00
e928dd300b Cleanup: format 2023-02-14 10:29:46 +11:00
dc46465490 Cleanup: use typed enum (eDirEntry_SelectFlag) where appropriate 2023-02-14 10:29:44 +11:00
1d7bf4f826 BLI_utildefines: add ^= operator to ENUM_OPERATORS macro 2023-02-14 10:29:42 +11:00
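For context, a minimal sketch of what such a macro provides; this is an illustration, not the actual `ENUM_OPERATORS` definition (which also takes the maximum enum value):

```cpp
#include <type_traits>

/* Define bitwise ^ and the newly added ^= for a flag enum. */
#define SKETCH_ENUM_OPERATORS(_type) \
  inline constexpr _type operator^(_type a, _type b) \
  { \
    using U = std::underlying_type_t<_type>; \
    return static_cast<_type>(static_cast<U>(a) ^ static_cast<U>(b)); \
  } \
  inline _type &operator^=(_type &a, _type b) \
  { \
    return a = a ^ b; \
  }

enum class eTestFlag { First = 1 << 0, Second = 1 << 1 };
SKETCH_ENUM_OPERATORS(eTestFlag)

int main()
{
  eTestFlag f = eTestFlag::First;
  f ^= eTestFlag::Second; /* toggle a bit in place */
  f ^= eTestFlag::First;  /* clear one the same way */
  return f == eTestFlag::Second ? 0 : 1;
}
```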
Erik Abrahamsson
526f2273c6 Docs: improve online manual lookup time
Matching the RNA IDs to the search pattern is slow because
of the function `fnmatchcase`. This patch first checks the string
prefix without any special characters used by fnmatch;
if the `startswith` check fails, there is no need to check `fnmatchcase`.
Before the optimization, an online manual lookup took about 400ms,
which is quite noticeable; with this patch applied it's under 10ms.

Ref !104581.
2023-02-14 10:29:40 +11:00
bb9eb262d4 Revert "Cycles: Clean up the Principled Hair BSDF implementation"
This reverts commit cb77865c21.

Appears to break HIP compilation, so delay until 3.6.
2023-02-13 23:32:53 +01:00
1f615690e6 Cycles: Add support for CUDA 12
CUDA 12 no longer supports sm_3*, in particular sm_35 and sm_37,
so skip building those if only CUDA 12+ is available.

Pull Request #104713
2023-02-13 23:04:02 +01:00
6a0b1eae8c Fix #104097: re-enable Cycles AMD Vega support
The internal compiler error appears to be gone. Unclear why it appeared in the
first place and why it's gone now. Just random kernel code changes causing it.

Pull Request #104719
2023-02-13 22:53:08 +01:00
cb77865c21 Cycles: Clean up the Principled Hair BSDF implementation
- Rename roughness variables for more clarity - before, the SVM/OSL code would
  set s and v to the linear roughness values, and the setup function would
  overwrite them with the distribution parameters. This actually caused a bug in the
  albedo code, since it intended to use the linear roughness value, but ended up
  getting the remapped value.
- Deduplicate the evaluation and sample functions. Most of their code is the
  same, only the middle part is different.
- Changed albedo computation to return the sum of the intensities of the four
  BSDF lobes. Previously, the code applied the inverse of the color->sigma
  mapping from the paper - this returns the color specified in the node, but
  for very dark hair (e.g. when using the Melanin controls) the result is
  extremely low (e.g. 0.000001) despite the hair still reflecting a significant
  amount of light (since the R lobe is independent of sigma). This causes issues
  with the light component passes, so this change fixes #104586.
- There's quite a few computations at the start of the evaluation function that
  are needed for sampling, evaluation and albedo computation, but only depend on
  the view direction. Therefore, just precompute them - we still have space in
  PrincipledHairExtra after all.
- Fix a tiny bug - the direction sampling code did not account for the R lobe
  roughness modifier.

Pull Request #104669
2023-02-13 22:49:01 +01:00
7dbf2e2e95 Curves: Draw point overlay only in point selection mode
If the selection domain is `ATTR_DOMAIN_CURVE`, do not draw the points.

Pull Request #104715
2023-02-13 22:25:06 +01:00
c92c52f0c9 Cleanup: Move draw_attributes.hh to C++
In order to experiment with different storage types for `DRW_Attributes`
and for general cleanup (see #103343). Also move a curves header to C++.

Pull Request #104716
2023-02-13 20:56:24 +01:00
dfacaf4f40 Fix: Incorrect BMesh to Mesh attribute copying
The existing logic to copy `BMesh` custom data layers to `Mesh`
attribute arrays was quite complicated, and incorrect in some cases
when the source and destination didn't have the same layers.
The functions leave a lot to be desired in general, since they have
a lot of redundant complexity that ends up doing the same thing for
every element.

The problem in #104154 was that the "rest_position" attribute overwrote
the mesh positions since it has the same type and the positions weren't
copied. This same problem has shown up in boolean attribute conversion
in the past. Other changes fixed some specific cases but I think a
larger change is the only proper solution.

This patch adds preprocessing before looping over all elements to
find the basic information for copying the relevant layers, taking
layer names into account. The preprocessing makes the hot loops
simpler.

In a simple file with a 1 million vertex grid, I observed a 6%
improvement in animation playback framerate in edit mode with a simple
geometry nodes modifier, from 5 to 5.3 FPS.

Fixes #104154, #104348

Pull Request #104421
2023-02-13 20:52:02 +01:00
0dfc102531 Fix #104588: Initialize Face Sets from edit mode selection broken
Mistake in da4bd24c3e
2023-02-13 14:16:23 -05:00
1e794d9a92 BMesh: Add flexibility for future lazily created UV selection
Don't crash on nonexistent UV selection layers. Add an assert
because for now it is a bug if they don't exist. But when converting
back to Mesh it is preferable to accept this in release mode, as opposed to
crashing.

Pull Request #104600
2023-02-13 20:08:30 +01:00
684789c815 Fix #104501: Clear CD_FLAG_NOCOPY after use
When generating a Mesh from a BMesh the uv map bool layers are not
copied if all elements are false. To suppress the copying the flag
CD_FLAG_NOCOPY is set in the layer flags. However these layers *do*
need to be copied to other BMeshes (for example undo steps). So we
need to clear them afterwards.
2023-02-13 20:08:29 +01:00
72a85d976a USD Import: USD Shapes support
This commit adds the ability to import USD Shape primitives (Gprims).
They are imported as Blender Meshes using the USD API to convert, so
that they appear the same as they would in other applications. USD
Shapes are important in many workflows, particularly in gaming, where
they are used for stand-in geometry or for collision primitives.

Pull Request #104707
2023-02-13 19:49:24 +01:00
8d20db0594 Curves: Fix curve segment selection
Make sure that in curve selection mode, the segments are checked
for intersection instead of only the points.

Pull Request #104703
2023-02-13 19:05:28 +01:00
64f83f80c6 Fix #104373: Motion Tracker solve with anamorphic footage not working
The issue was caused by a rather recent refactor in 7dea18b3aa.

The root of the issue lies within the fact that the optical center was updated
on the Blender side after the solution was run. There was a mistake in the code
which double-corrected for the pixel aspect ratio.

Added a comment in the code about this, so that it does not look suspicious.

Pull Request #104711
2023-02-13 18:49:45 +01:00
9b7d71cec2 Cleanup: Remove redundant translation markers in context poll message
`CTX_wm_operator_poll_msg_set()` is covered by the translation script
and always translates these strings. Checked with Bastien, he prefers
not having the redundancy here.
2023-02-13 18:41:38 +01:00
64e4aede7f Fix bug report including rB commit hash prefix not needed for Gitea 2023-02-13 18:34:13 +01:00
675717eb79 Cleanup: fix a few typos in UI messages
Issues reported by @Joan-Pujolar in #43295.

Pull Request #104672
2023-02-13 18:24:53 +01:00
c2a5c592fe Fix #103397: Vector UI control shader UBO alignment fix.
Replace float3 with packed_float3 when using additional type padding.

Authored by Apple: Michael Parkin-White

Ref #96261
Pull Request #104699
2023-02-13 18:16:38 +01:00
0e1a6f6033 Cleanup: Add in NLAStrip / NLATrack remove / clean methods
## Cleanup: Refactor NLATrack / NLAStrip Remove

This PR adds 3 new methods:
* BKE_nlatrack_remove_strip
* BKE_nlastrip_remove
* BKE_nlastrip_remove_and_free

These named BKE methods are really just replacements for BLI_remlink, but with some added checks and enhanced readability.

Co-authored-by: Nate Rupsis <nrupsis@gmail.com>
Pull Request #104437
2023-02-13 18:10:13 +01:00
d42c803f5b Cleanup: add comment explaining #interpf is performing lerp 2023-02-13 17:32:19 +01:00
218a909dec GPencil: Fix unreported Eyedropper color difference in Materials
The selected color was wrongly converted for materials. The undo of the conversion must be done only for palettes.

Also, some code cleanup done.
2023-02-13 17:30:20 +01:00
bea1eff3a5 Fix: No update setting curve selection domain with all selected
Send a notification and tag for an update even if the selection doesn't
exist, which is still necessary for drawing that depends on the
selection domain.
2023-02-13 11:22:31 -05:00
d5c60fb685 Cleanup: add comment to cage gizmo scaling regarding the boundary 2023-02-13 17:08:23 +01:00
9fecf1f8b8 Cycles: Replace resolution divider loop with an analytical formula
As a side effect of this change, more resolution divisions are now available.
Before this patch the possible resolution divisions were all powers of two.
Now the possible resolution divisions are the multiples of pixel_size.

This increase in possible resolution divisions is the same idea proposed in https://archive.blender.org/developer/D13590.
In that patch there were concerns that this would increase the time between a user navigating
and seeing the 1:1 render. To my knowledge this is a non-issue and there should be
little to no increase in time between those two events.

Pull Request #104450
2023-02-13 13:02:47 +01:00
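The commit doesn't spell out the formula, but a divider computed analytically and snapped to multiples of `pixel_size` could look roughly like this (an assumption, not Cycles' actual code):

```cpp
#include <algorithm>

/* Pick a resolution divider analytically instead of looping over
 * powers of two: ceil-divide down to the target resolution, then snap
 * the result up to the next multiple of pixel_size. */
static int calc_resolution_divider(const int full_resolution,
                                   const int target_resolution,
                                   const int pixel_size)
{
  int divider = (full_resolution + target_resolution - 1) / target_resolution;
  divider = ((divider + pixel_size - 1) / pixel_size) * pixel_size;
  return std::max(divider, pixel_size);
}
```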
99e71ec1f2 Assets: Store pointer to owning asset library in asset representation
This is needed to be able to query asset library information from an
asset. This again is relevant especially for the "All" asset library,
where you can't just directly access the library itself, which is
different for different assets.

The current design is that an asset representation is owned by exactly
one asset library, so having this pointer is perfectly compatible with
the design.

Reviewed by: Julian Eisel
2023-02-13 12:57:03 +01:00
4f19e99cb1 Tests: add unit test for BLI_strnlen
It wasn't there. Now it is.
2023-02-13 12:54:35 +01:00
147c75b831 Cleanup: EEVEE-Next: Fix typo 2023-02-13 12:49:51 +01:00
0d9fbfe7fe GPUShader: Fix compilation caused by designated initializers in C++ 2023-02-13 12:49:22 +01:00
da0c182a61 Cycles: Fix compilation after recent GPU cleanups 2023-02-13 12:17:46 +01:00
dd171f7743 Cleanup: GPUShader: Rename GPU_shader_uniform_vector
Rename to `GPU_shader_uniform_float/int_ex` to make more sense as a
general purpose function.
2023-02-13 11:22:38 +01:00
b68bac7ced Cleanup: GPUShader: Remove GPU_shader_uniform_int/float
Simplify the API, leaving only one function to set uniform without the
uniform name.
2023-02-13 11:22:38 +01:00
173a8f4ac9 GPU: Removes GPU_shader_get_builtin_ssbo
Simplify the API. Use hardcoded ssbo location instead.
2023-02-13 11:22:38 +01:00
164f591033 Cleanup: GPU: Rename some functions for consistency 2023-02-13 11:22:38 +01:00
83a6642045 Cleanup: GPU: Move eGPUKeyframeShapes to shader shared
Removes code duplication.
2023-02-13 11:22:38 +01:00
158f87203e Cleanup: GPUShader: Reorganize GPU_shader.h to separate deprecated API
This avoids confusion about what to use nowadays.
Also improves documentation.
2023-02-13 11:22:38 +01:00
d92c28582a Cleanup: GPUShader: Split Builtins to their own header
Also improve documentation and cleanup.
2023-02-13 11:22:38 +01:00
d165d6aa2a GPU: Remove GPU_SHADER_3D_POINT_FIXED_SIZE_VARYING_COLOR
This replaces `GPU_SHADER_3D_POINT_FIXED_SIZE_VARYING_COLOR` with
`GPU_SHADER_2D_POINT_UNIFORM_SIZE_UNIFORM_COLOR_OUTLINE_AA`.

None of the usages had a reason not to use the AA shader.
Scale the point size to account for the rounded shape.
2023-02-13 11:22:38 +01:00
5f3c97b999 GPU: Move gpu_shader_builtin.c to C++
This also removes all unused parts of it now that we have converted all
shaders to use create infos.
2023-02-13 11:22:38 +01:00
afd6358fc0 Gizmo: add gizmos for spot and point light radius
Ref #104280

Pull Request #104410
2023-02-13 11:18:21 +01:00
701f90c677 Gizmo: make cage gizmo follow the cursor exactly when scaling
Ref #104280
The scaling of area light / spot light blend was wrong because it was
calculated for a pivot at the edges. The new implementation in theory
works for all `abs(pivot) <= 0.5f`, although we only have -0.5, 0, and
0.5.
- Axis constraint for the box cage was only applied when there was a
translate flag; now the same logic is applied regardless of the translate
flag. This means that when dragging an edge, the scaling in the other axis
stays the same; when dragging the corners, it applies free-form scaling.
- Due to the existence of margin, `data->orig_mouse` does not lie
exactly on the boundary. Using that value to compute the scaling causes
the error to accumulate over distance. The new implementation uses the
original dimension of the object instead, and only uses
`data->orig_mouse` to determine the side of the original cursor relative
to the pivot.
- For a circular gizmo with unsigned scaling, the gizmo only follows the
cursor exactly when the cursor stays in the original quadrant; otherwise
it's hard to handle the logic of when we should clamp the scaling.
2023-02-13 10:48:31 +01:00
9f4ee75c9b Fix #80307: Underline shortcut not working in enum context menus
The buttons of enum context menus are of type `UI_BUT_ROW`. They
are part of the set of buttons we create underline shortcuts for in
`ui_menu_block_set_keyaccels`.
But since they weren't handled in `ui_handle_button_activate_by_type`,
pressing the underline shortcuts didn't do anything in those cases.

Co-authored-by: Leon Schittek <leon.schittek@gmx.net>
Co-authored-by: Brecht Van Lommel <brecht@noreply.localhost>
Pull Request #104433
2023-02-13 10:43:36 +01:00
2e19aa1399 UI: add camera passepartout color to theme
Currently the passepartout color is hardcoded to black. While a
sensible default for cinema, it may make less sense for other media,
whether video, print, web, etc. It greatly affects viewing conditions
of the image and should be user selectable, much like painting
programs allow.

Pull Request #104486
2023-02-13 09:15:34 +01:00
918cf6cd77 Fix #104341: Handle edge case in Curve to Mesh node
Don't create caps when using cyclic profile splines with two or fewer
points.
This case wasn't handled yet, leading to invalid meshes or crashes.

Co-authored-by: Leon Schittek <leon.schittek@gmx.net>
Pull Request #104594
2023-02-13 09:03:00 +01:00
af8941e6a8 Vulkan: Use guardedalloc for driver allocations.
Vulkan has a pluggable memory allocation feature, which allows internal
driver allocations to be done by the client application provided
allocator. Vulkan uses this for more client application allocations
done inside the driver, but can also do it for more internal oriented
allocations.

VK_ALLOCATION_CALLBACKS initializes allocation callbacks for host allocations.
The macro creates a local static variable with the name vk_allocation_callbacks
that can be passed to vulkan API functions that expect
const VkAllocationCallbacks *pAllocator.

When WITH_VULKAN_GUARDEDALLOC=Off the memory allocation implemented
in the vulkan device driver is used for both internal and application
oriented memory operations.

For now this helps during the development of the Vulkan backend to
detect memory leaks that are hidden inside the driver part
of the stack. At a later stage we need to measure the overhead and
decide if this should become the default behavior.

Pull Request #104434
2023-02-13 08:37:35 +01:00
f828ecf4ba GPU: Use same read back API as SSBOs
The GPU module has 2 different styles when reading back data from
GPU buffers. The SSBOs used a memcpy to copy the data to a
pre-allocated buffer. IndexBuf/VertBuf gave back a driver/platform
controlled pointer to the memory.

Readback is done for test cases, and returning mapped pointers is not safe.
For this reason we settled on using the same approach as the SSBO:
copy the data to a caller pre-allocated buffer.

The reason this API is being changed now is that the Vulkan API is more
strict about mapping/unmapping buffers, which can lead to potential issues
down the road.

Pull Request #104571
2023-02-13 08:34:19 +01:00
86b3073c9e Cleanup: Quiet unused variable warning
Also name another argument for consistency.
2023-02-12 22:27:07 -05:00
2a7440176e Fix: Missing const specifier for curve field input
Mistake in 000e722c7d, which probably made the viewer node
auto-domain detection behave differently when the special case was used.
2023-02-12 20:17:41 -05:00
6ea3fdebc8 Fix: Workbench Next: Extruded frustum binding 2023-02-12 23:40:28 +01:00
77963ff778 Fix #104637: EEVEE Displacement regression after #104595
Keep using the 3-evaluation dF_branch method for the Displacement output.
The optimized 2-evaluation method used by node_bump is now in its own macro (dF_branch_incomplete).
displacement_bump modifies the normal that nodetree_exec uses, so even with a refactor it wouldn't be possible to re-use the computation anyway.
2023-02-12 23:06:21 +01:00
f0669ff8ba BLI: use larger integer type in BitVector
Using larger integer types allows for more efficient code, because we
can use the hardware better. Instead of working on individual bytes,
the code can now work on 8 bytes at a time. We don't really benefit
from this immediately but I'm planning to implement some more optimized
bit vector operations for #104629.

Pull Request #104658
2023-02-12 18:00:48 +01:00
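A toy sketch of why the larger integer type helps; this is not the actual `BitVector` implementation, just the word-at-a-time idea:

```cpp
#include <cstdint>
#include <vector>

/* Minimal bit vector storing bits in 64-bit words. */
class BitVectorSketch {
  std::vector<uint64_t> words_;

 public:
  explicit BitVectorSketch(const size_t bits) : words_((bits + 63) / 64, 0) {}

  void set(const size_t i)
  {
    words_[i >> 6] |= uint64_t(1) << (i & 63);
  }
  bool test(const size_t i) const
  {
    return (words_[i >> 6] >> (i & 63)) & 1;
  }

  /* Word-at-a-time union: each iteration processes 8 bytes instead of
   * one, which is where bulk bit vector operations gain their speed. */
  void union_with(const BitVectorSketch &other)
  {
    for (size_t i = 0; i < words_.size(); i++) {
      words_[i] |= other.words_[i];
    }
  }
};
```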
3f40962414 Cleanup: use sized int types for polyfill_2d
Also correct building when USE_CLIP_EVEN is disabled.
2023-02-12 16:35:24 +11:00
32149f8d7a Tests: add polyfill2d test to ensure the result has no zero area tris
Add a test to address the issue raised in #103913, where zero area
triangles could be created from polygons that have co-linear edges
but were not degenerate.
2023-02-12 16:26:34 +11:00
91346755ce Cleanup: use '#' prefix for issues instead of 'T'
Match the convention from Gitea instead of Phabricator's T for tasks.
2023-02-12 14:56:05 +11:00
a02fa6c40d Cleanup: spelling in comments 2023-02-12 14:23:16 +11:00
10354b043f Fix crash selecting faces in wire-frame mode
Regression in [0] didn't account for the mesh not having
a subdivision surface applied.

[0]: 75db4c082b
2023-02-12 14:20:52 +11:00
c7456272b1 Cleanup: EEVEE-Next: Add LIGHT_FOREACH macros to clang-format exceptions 2023-02-12 01:41:54 +01:00
77aa9e8809 Cleanup: GPU: Remove commented lines without any comments or purpose
These were added during a big refactor. They were supposed to be
uncommented at some point but the new code does not even need a default
world.
2023-02-12 01:21:53 +01:00
d33960aead Cleanup: remove whole-archive linking for USD
Since USD is no longer statically linked these linker tricks
are no longer needed.

Co-authored-by: Ray Molenkamp <github@lazydodo.com>
Pull Request #104627
2023-02-11 19:48:47 +01:00
085c854b2a Fix curves selection toggling 2023-02-11 19:23:48 +01:00
82867753cf Transform: Hide trackball gizmo while dragging
It was accidentally displayed in a38d99e0b2.
2023-02-11 15:22:57 -03:00
232e02282e Fix circular transform gizmo always displaying Global orientation
The Global orientation comes from the mode's default orientation
(without the constraints).

It's not really exposed.
2023-02-11 15:20:38 -03:00
b9fa32cccd Fix #104587: 'Extrude To Cursor' snapping ignoring 'Target Selection'
Although not a transform operator, `Extrude to Cursor` depends on some
snapping settings.

So it should use the `Target Selection` options as well.
2023-02-11 14:50:37 -03:00
197eee6e04 Fix transform gizmos not changing in Automatic Constraint mode 2023-02-11 13:56:06 -03:00
e732580fcc Nodes: change order of Hide Value and Hide in Modifier
Based on the review comment in #104517.
2023-02-11 16:14:38 +01:00
158f809dcb Geometry Nodes: Add option to hide input in modifier
When building a node group that's meant to be used directly in the
node editor as well as in the modifier, it's useful to be able to have
some inputs that are only meant for the node editor, like inputs that
only make sense when combined with other nodes.

In the future we might have the ability to only display certain assets
in the modifier and the node editor, but until then this simple solution
allows a bit more customization.

Pull Request #104517
2023-02-11 16:11:10 +01:00
19ea673260 Cleanup: Remove const keyword in declarations 2023-02-11 15:05:55 +01:00
b723a398f3 Curves: initial surface collision for curves sculpt mode
During hair grooming in curves sculpt mode, it is very useful when hair strands
are prevented from intersecting with the surface mesh. Unfortunately, it also
decreases performance significantly so we don't want it to be turned on all the time.

The surface collision is used by the Comb, Pinch and Puff brushes currently.
It can be turned on or off on a per-geometry basis.

The intersection prevention quality of this patch is not perfect yet. This can
be improved over time using a better solver. Overall, perfect collision detection
at the cost of bad performance is not necessary for interactive sculpting,
because the user can fix small mistakes very quickly. Nevertheless, the quality
can probably still be improved significantly without too big slow-downs depending
on the use case. This can be done separately from this patch.

Pull Request #104469
2023-02-11 13:46:37 +01:00
0f708fa2e3 Geometry Nodes: use smooth normals in Distribute Points on Faces node
Previously, the node used the "true" normal of every looptri. Now it uses the
"loop normals" which includes e.g. smooth faces and custom normals. The true
normal can still be used on the points by capturing it before the Distribute node.

We do intend to expose the smooth normals separately in geometry nodes as well,
but this is an important first step.

It's also necessary so that child hair generated between guide hair strands
doesn't have visible artifacts at face boundaries.

For perfect backward compatibility, the node still has a "Legacy Normal" option
in the side bar. Creating the exact same behavior with existing nodes isn't
really possible unfortunately because of the specifics of how the Distribute
node used to compute the normals using looptris.

Pull Request #104414
2023-02-11 13:25:59 +01:00
6478eb565a Cleanup: format 2023-02-11 14:26:56 +11:00
fefc6a73b3 Fix pep8 checker operating on dot-files
Temporary editor files were included which could make the checker fail.
2023-02-11 14:12:43 +11:00
9f4edf8c2a Cleanup: remove unused variables 2023-02-11 14:04:35 +11:00
ce44953933 Cleanup: various C++ cleanups 2023-02-11 14:04:35 +11:00
343bb4a5a3 Cleanup: Use const char * for layer names in collada exporter
CustomData layer names should not be written except via the CustomData
API. Therefore use const char * instead of char * when referencing the
layer name.

Pull Request #104585
2023-02-11 01:13:38 +01:00
efabe81c91 Fix #103903: Bump Node performance regression
Avoid computing the non-derivative height twice.
The height is now computed as part of the main function, while the heights at x and y offsets are still computed in a separate function.
The differentials are now computed directly in node_bump.

Co-authored-by: Miguel Pozo <pragma37@gmail.com>
Pull Request #104595
2023-02-10 21:06:53 +01:00
0e6da74e98 Fix #104282: Resolve Depth read for D24_S8 types in Metal.
Fixes incorrect spotlight gizmo orientation when moving.

Authored by Apple: Michael Parkin-White

Related to #96261

Pull Request #104537
2023-02-10 20:40:07 +01:00
8a32d56056 Tests: Fix device list of benchmark script only showing a single GPU
Pull Request #104583
2023-02-10 19:38:37 +01:00
7351f533e0 Curves: Add lasso and circle select
This adds a `select_lasso` and a `select_circle` function for the Curves object. It is used in the `view3d_lasso_select` and `view3d_circle_select` operator.

Co-authored-by: Falk David <falkdavid@gmx.de>
Pull Request #104560
2023-02-10 19:06:08 +01:00
5c4e1ed578 UI: Make text nomenclature and ordering consistent
"Center" -> "Middle" when describing vertical alignment.
"Align X" -> "Horizontal Alignment"
"Align Y" -> "Vertical Alignment"
Vertical alignment options rearranged to be consistently top-most to
bottom-most.

---

Co-authored-by: joshua-maros <60271685+joshua-maros@users.noreply.github.com>
Pull Request #104493
2023-02-10 19:05:37 +01:00
6f8c441950 Curves: Add select linked
This adds a new `select_linked` function that selects all the points
on a curve if there is at least one point already selected.
This also adds a keymap for the operator.

Co-authored-by: Falk David <falkdavid@gmx.de>
Pull Request #104569
2023-02-10 18:58:08 +01:00
d411be8a99 Cleanup: Use utility function to find groups in node tree
Add `contains_group` method in python api for `NodeTree` type, cleanup
`ntreeHasTree` function, reuse `ntreeHasTree` in more place in code.
The algorithm has been changed to not recheck trees by using set.

Performance gains from avoiding already checked node trees:
Based on tests, can say that for large files with a huge number
of trees, the response speed of opening the search menu in the
node editor increased by ~200 times (for really large projects
with 16 individual groups in 6 levels of nesting). Group insert
operations are also accelerated, but this is different in some cases.

Pull Request #104465
2023-02-10 17:30:55 +01:00
fae661a1ab Revert "Un-ignore modules in .gitmodules configuration"
This reverts commit aab707ab70.

A different solution to the submodule problem is being considered in #104573.
Revert to the previous behavior that developers are familiar with for now.
2023-02-10 17:15:28 +01:00
923152d180 Geometry Nodes: improve parallelization in Delete/Separate Geometry node
This just adds `threading::parallel_for` and `threading::parallel_invoke` in a few
places where they can be added trivially. The run time of the `separate_geometry`
function changes from 830 ms to 413 ms in my test file.

Pull Request #104563
2023-02-10 17:14:30 +01:00
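A sketch of the kind of change described, using the `threading::parallel_for` and `threading::parallel_invoke` helpers named above; the surrounding function and data are invented for illustration:

```cpp
#include "BLI_span.hh"
#include "BLI_task.hh" /* provides blender::threading */

using namespace blender;

/* Two independent pieces of work run concurrently, and a per-element
 * loop is chunked over an index range with a grain size. */
static void separate_geometry_sketch(const Span<float> values, MutableSpan<bool> selection)
{
  threading::parallel_invoke([&]() { /* e.g. process the mesh component */ },
                             [&]() { /* e.g. process the curve component */ });

  threading::parallel_for(values.index_range(), 4096, [&](const IndexRange range) {
    for (const int64_t i : range) {
      selection[i] = values[i] > 0.0f;
    }
  });
}
```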
0ea15a6fbb Fix: Inaccessible default for node group image sockets
The type was just skipped when drawing defaults for the image sockets.
2023-02-10 09:22:45 -05:00
284cdbb6cf Cleanup: Use lambdas in mesh mapping callback, remove unused arguments
Using callback functions didn't scale well as more arguments were added.
It got very confusing when to pass them, since the arguments weren't always used.
Instead use a `FunctionRef` with indices for arguments. Also remove
unused edge arguments to topology mapping functions.
2023-02-10 08:37:50 -05:00
bad2c3b9ef Geometry Nodes: Experimental option for Volumes
Adds an experimental option under "New Features" in preferences,
which enables visibility of the new Volume Nodes.
Right now this option does nothing but will be used during development.
See #103248

Pull Request #104552
2023-02-10 14:21:01 +01:00
88f9c55f7f Sculpt: Fix Dyntopo Warnings
Because of T95965, some attributes are stored as generic attributes
in Mesh but have special handling for the conversion to BMesh.

Expose a function to tell whether certain attribute names are handled
specially in the conversion, and refactor the error checking process
to use it. Also check for generic attributes on the face domain which
wasn't done before.

Author: Hans Goudey
Reviewed By: Joseph Eagar

Co-authored-by: Joseph Eagar <joeedh@gmail.com>
Pull Request #104567
2023-02-10 13:16:10 +01:00
dc9f7fe64f Fix #104514: GPencil merge down layer misses some frames
When merging two gpencil layers, if the destination layer had a keyframe
where the source layer did not, strokes of the previous keyframe
in the source layer were lost in that frame.

This happened because the merge operator was looping through
frames of the source layer and appending strokes in the
corresponding destination layer, but never completing
frames other than the ones existing in the source layer.

This patch fixes it by first adding to the source layer
all frames that are in the destination layer.

Co-authored-by: Amelie Fondevilla <amelie.fondevilla@les-fees-speciales.coop>
Pull Request #104558
2023-02-10 12:55:06 +01:00
Lucas Tadeu Teixeira
5d30c3994e Sequencer: Don't create undo step when click-select does nothing
When the sequencer is empty (i.e., there are no sequences),
we would have the deselect_all variable set to true and call
ED_sequencer_deselect_all to deselect any existing sequences.

Ref !104453
2023-02-10 21:49:42 +11:00
51ceeb506f Fix #104026: Click-Drag to select graph editor channels no longer working
Box-Selecting channels in the dope sheet with click-drag was no longer possible as of Blender 3.2

Due to the removal of tweak events the box select operator was always shadowed by the click operator.

Original Phabricator discussion here: https://archive.blender.org/developer/D17065

Use `WM_operator_flag_only_pass_through_on_press` on the click operator to fix it.

Co-authored-by: Christoph Lendenfeld <chris.lenden@gmail.com>
Pull Request #104505
2023-02-10 11:36:01 +01:00
01480229b1 Cycles: Fix MetalRT checkbox not hooked up to device on AMD
(Follow on from D17043)
On AMD Navi2 devices the MetalRT checkbox was not hooked up properly and had no effect. This patch fixes it.

Co-authored-by: Michael Jones <michael_p_jones@apple.com>
Pull Request #104520
2023-02-10 10:55:39 +01:00
b77c82e2bb Tests: minor updates to make bl_rna_manual_reference more useful
- Avoid flooding the output with every match that succeeds.
- Report patterns listed in the manual that don't match anything in
  Blender.
- Disable external URL lookups, this is too slow.
  Instead use a LOCAL_PREFIX (a local build of the manual)
  or skip the test.
2023-02-10 14:04:15 +11:00
c2c62c3618 RNA: return a dummy language value when WITH_INTERNATIONAL=OFF
Without this, every access to "language" would warn that the enum
value didn't match a value in the enum items.

This made the bl_rna_manual_reference.py test output practically
unusable.
2023-02-10 13:18:33 +11:00
a8d951abdd Docs: remove malformed patterns for RNA mapping
The generator now skips these with a warning; they will need to be
corrected in the user manual.

This caused tests/python/bl_rna_manual_reference.py to fail looking
up URLs.
2023-02-10 13:04:27 +11:00
4cbe0bff34 Cleanup: spelling in comments 2023-02-10 11:34:20 +11:00
48d9363fa7 Cleanup: quiet clang compiler warnings
- undeclared variable warning.
- unreachable-code-return warnings.
- array-parameter, mismatch bound.
- 'requires' is a keyword in C++20, (rename to requires_flag).
2023-02-10 11:27:30 +11:00
8ac3096e24 Fix add-on & manual link in Help menu ignoring the current language
Use bpy.utils.manual_language_code() to create manual URLs instead of
assuming English.
2023-02-10 11:02:45 +11:00
2ee9c12a23 PyAPI: add bpy.utils.manual_locale_code()
Move the function for getting the language code associated with the
user manual into a utility function (from the generated
rna_manual_reference.py).

This allows other parts of Blender to create a manual URL based on the
current locale preferences and environment.

Ref !104494
2023-02-10 11:01:02 +11:00
2d351e9ee3 Cleanup: format 2023-02-10 10:48:50 +11:00
7e0e07657c GPU: Cleanup GPU_batch.h documentation and some of the API for consistency
Documented all functions, adding use cases and side effects.

Also replace the use of shortened argument names with more meaningful ones.

Renamed `GPU_batch_instbuf_add_ex` and `GPU_batch_vertbuf_add_ex` to remove
the `ex` suffix as they are the main versions used (removed the few usages
of the other versions).

Renamed `GPU_batch_draw_instanced` to `GPU_batch_draw_instance_range` and
make it consistent with `GPU_batch_draw_range`.
2023-02-09 22:46:35 +01:00
5c8edbd99b Cleanup: Move 6 sculpt-session-related files and header to C++
To allow further mesh data structure refactoring. See #103343

Pull Request #104540
2023-02-09 20:35:50 +01:00
2cfc4d7644 Fix #104383: don't update declaration for clipboard copy
When nodes are copied to the clipboard, they don't need their declaration.
For nodes with dynamic declaration that might depend on the node tree itself,
the declaration could not be built anyway, because the node-clipboard does
not have a node tree.

Pull Request #104432
2023-02-09 19:20:39 +01:00
bc0d3c91b1 Fix #104435: Fix rna_NlaStrip_new add strip logic to be correct boolean expression
Fixed #104435: Use correct conditional logic when testing if a new NLA strip can be added in the rna_NlaStrip_new method
2023-02-09 19:09:27 +01:00
50918d44fb Cleanup: Fix const correctness warning in recent commit 2023-02-09 11:26:38 -05:00
Kevin C. Burke
1649921791 Fix: Sequencer "Pan" label using incorrect keyword 'heading_ctxt'
Oversight in db87e2a638

Reviewed By: ISS
Differential Revision: https://archive.blender.org/developer/D17213
2023-02-09 16:33:50 +01:00
50dfd5f501 Geometry Nodes: Edges to Face Groups Node
Add a new node that groups faces inside of boundary edge regions.
This is the opposite of the existing "Face Group Boundaries"
node. It's also the same as some of the "Initialize Face Sets"
options in sculpt mode.

Discussion in #102962 has favored "Group" for a name for these
sockets rather than "Set", so that is used here.

Pull Request #104428
2023-02-09 16:27:20 +01:00
bfa7f9db0e Assets: Implement viewport drag and drop for geometry nodes
Currently there's no way to assign a geometry node group from the asset
browser to an object as a modifier without first appending/linking it
manually. This patch adds a drag and drop operator that adds a new
modifier and assigns the dragged tree.

Pull Request #104430
2023-02-09 16:04:14 +01:00
b8e15a4a84 Fix: Add missing "-" in logic to get the channel height
This was missed when doing the refactoring in #104500.
It didn't seem to have any effect until I worked on clamping the view.
2023-02-09 15:59:57 +01:00
7ca651d182 Mesh: Remove unnecessary edge draw flag
As described in #95966, replace the `ME_EDGEDRAW` flag with a bit
vector in mesh runtime data. Currently the flag is only ever set
to false for the "optimal display" feature of the subdivision surface
modifier. When creating an "original" mesh in the main data-base,
the flag is always supposed to be true.

The bit vector is now created by the modifier only as necessary, and
is cleared for topology-changing operations. This fixes incorrect
interpolation of the flag as noted in #104376. Generally it isn't
possible to interpolate it through topology-changing operations.

After this, only the seam status needs to be removed from edges before
we can replace them with the generic `int2` type (or something similar)
and reduce memory usage by 1/3.

Related:
- 10131a6f62
- 145839aa42

In the future `BM_ELEM_DRAW` could be removed as well. Currently it is
used and aliased by other defines in some non-obvious ways though.

Pull Request #104417
2023-02-09 15:56:05 +01:00
3bed78ff59 Curves: Add box selection
This adds a `select_box` function for the `Curves` object. It is used in the `view3d_box_select` operator.

It also adds the basic selection tools in the toolbar of Edit Mode.

Authored-by: Falk David <falkdavid@gmx.de>
Pull Request #104411
2023-02-09 15:53:42 +01:00
22edf04458 I18n: use format strings for Cycles version error messages
The required version numbers for various devices were hardcoded in the
UI messages. The result was that every time one of these versions was
bumped, every language team had to update the message in question.

Instead, the version numbers can be extracted, and injected into the
error messages using string formatting so that translation updates
need to happen less frequently.

Pull Request #104488
2023-02-09 15:48:19 +01:00
666c2ea012 Refactor: remove yscale from bAnimContext
`bAnimContext` had a float property called `yscale_fac` that was used to define the height of the keyframe channels.

However the property was never set, only read, so there really is no need to have it in the struct.

Moreover it complicated getting the channel height because `bAnimContext` had to be passed in.

Speaking of getting the channel height: this was done with macros. I ripped them all out and replaced them with function calls.

Originally it was introduced in this patch: https://developer.blender.org/rB095c8dbe6919857ea322b213a1e240161cd7c843

Co-authored-by: Christoph Lendenfeld <chris.lenden@gmail.com>
Pull Request #104500
2023-02-09 14:28:04 +01:00
ca183993a5 Fix freeing uninitialized pointer in GHOST/Wayland + X11 fallback
Freeing the timer manager didn't account for Wayland being partially
initialized.
2023-02-09 23:41:03 +11:00
0e196bab76 Build: disable LTO for Python builds
LTO-compiled libpython3.10.a failed to link with GCC 12.0, so LTO is
disabled, since these libraries are intended for developers to link
against.
2023-02-09 23:00:08 +11:00
f222fe6a3a Build: enable Python optimizations (PGO & LTO) on Linux
This is used for most Python release builds and has been reported to
give a modest 5-10% speedup (depending on the workload).

This could be enabled on macOS too but needs to be tested.
2023-02-09 20:59:08 +11:00
8b35db914e GPU: Fix assert when using light gizmo.
Blender was reporting that the GPU_TEXTURE_USAGE_HOST_READ flag wasn't set.
This is used to indicate that the texture needs to be read back to the
CPU. Textures that don't need to be read back can be optimized by the
GPU backend.

Found during investigation of #104282.
2023-02-09 08:36:27 +01:00
1883e782cb Spelling: Assert message in GPU_texture_read. 2023-02-09 08:11:11 +01:00
7effc6ffc4 Cleanup: solve compiler warnings.
Classes were forward-declared as structs.
2023-02-09 08:01:33 +01:00
f3d7de709f Cycles: update Intel Graphics Compiler to 1.0.13064.7 on Linux
Linux side of 8afcecdf1f.

Reviewed by: LazyDodo, sergey, campbellbarton
Ref !104458, 16984
2023-02-09 15:43:23 +11:00
3c8f7b1a64 Cleanup: Remove unused/redundant includes from BKE_curves.hh
Avoid including headers that are obviously redundant, and don't
include BLI_task.hh in the header file, since it isn't really related.
2023-02-08 20:29:52 -05:00
0381fe7bfe Cleanup: update username in code-comments: campbellbarton -> ideasman42
Gitea migration changed my username, update code-comments.
2023-02-09 11:33:48 +11:00
5f842ef336 Cleanup: spelling in comments 2023-02-09 11:24:50 +11:00
5b110548eb Cleanup: enum conversion compiler warnings 2023-02-09 11:18:32 +11:00
9fd71d470e PyAPI: minor change to rna_manual_reference loading
- Use bpy.utils.execfile instead of importing then deleting from
  sys.modules.
- Add a note for why keeping this cached in memory isn't necessary.

This has the advantage of not interfering with any scripts that import
`rna_manual_reference` as a module.
2023-02-09 11:18:15 +11:00
94d280fc3f EEVEE-Next: Shadows: Add global switch
This allows bypassing all costs associated with shadow mapping.

This can be useful in certain situations, such as opening a scene on a
lower-end system, or just to gain performance in some situations (lookdev).
2023-02-09 00:48:33 +01:00
9103978952 EEVEE-Next: Shadow: Fix issue with last merge
The merge with master updated the code to use the new matrix API. This
introduced some regressions.

For sun lights, make sure there are enough tilemaps in orthographic mode
to cover the depth range, and fix the level offset in perspective.
2023-02-08 23:45:12 +01:00
9c03a1c92f Fix Cycles link error with debug/asan builds after recent bugfix
Pull Request #104487
2023-02-08 23:20:29 +01:00
a0f5240089 EEVEE-Next: Virtual Shadow Map initial implementation
Implements virtual shadow mapping as EEVEE-Next's primary shadow solution.
This technique aims to deliver really high precision shadowing for many
lights while keeping a relatively low cost.

The technique works by splitting each shadow into tiles that are only
allocated & updated on demand by visible surfaces and volumes.
Local lights use cubemap projection with mipmap level of detail to adapt
the resolution to the receiver distance.
Sun lights use clipmap distribution or cascade distribution (depending on
which is better) for selecting the level of detail with the distance to
the camera.

Current maximum shadow precision for local light is about 1 pixel per 0.01
degrees.
For sun light, the maximum resolution is based on the camera far clip
distance which sets the most coarse clipmap.

## Limitation:
Alpha Blended surfaces might not get correct shadowing in some corner
cases. This is to be fixed in another commit.
While resolution is greatly increased, it is still finite. It is virtually
equivalent to one 8K shadow per shadow cube face and per clipmap level.
There is no filtering present for now.

## Parameters:
Shadow Pool Size: In bytes, amount of GPU memory to dedicate to the
shadow pool (allocated per viewport).
Shadow Scaling: Scale the shadow resolution. Base resolution should
target subpixel accuracy (within the limitation of the technique).

Related to #93220
Related to #104472
2023-02-08 21:18:44 +01:00
0ab3ac7a41 BLI: Math: Fix vector operator * with MutableMatView
This was caused by operator priority trying to use
`friend VecBase operator*(const VecBase &a, FactorT b)`.

Adding tests as these were not covered.
2023-02-08 19:38:53 +01:00
a1282ab015 Fix Cycles debug build error after host fallback changes
Introduced in dcfb6df9ce6.

Co-authored-by: Lucas Tadeu Teixeira <lucas@lucastadeu.com>

Pull Request #104454
2023-02-08 19:27:40 +01:00
43f308f216 Make update: Ignore submodules
The previous change in the .gitmodules made it so `make update`
refuses to update, because it now sees changes in the submodules
and thinks there are unstaged changes.

Ignore the submodule changes, bringing the behavior closer to
what it was before.
2023-02-08 14:05:50 +01:00
aab707ab70 Un-ignore modules in .gitmodules configuration
The meaning of the ignore option for submodules did change since our
initial Git setup was done: back then it was affecting both diff and
stage families of Git command. Unfortunately, the actual behavior did
violate what documentation was stating (the documentation was stating
that the option only affects diff family of commands). This got fixed
in Git some time after our initial setup and it was the behavior of the
commands changed, not the documentation. This lead to a situation when
we can no longer see that submodules are modified and staged, and it is
very easy to stage the submodules.

For clarity: diff and status are both "status" family, show and
diff are "diff" family.

Hence this change: since there is no built-in zero-configuration way
of forbidding Git from staging submodules, let's make it visible and
clear what the state of submodules is.

We still need to inform people to not stage submodules, for which
we can offer some configuration tips and scripts but doing so is
outside of the scope of this change, as it requires some additional
research. Current goal is simple: make it visible and clear what is
going to be committed to Git.

This is a response to an increased frequency of incidents when the
submodules are getting modified and committed without authors even
noticing this (which is also a bit annoying to recover from).

Differential Revision: https://developer.blender.org/D13001
2023-02-08 11:32:32 +01:00
4ed8a360e9 Fix references to the main branch in the .gitmodules 2023-02-08 11:01:01 +01:00
4d3bfb3f41 Subdivision Surface: fix a serious performance hit when mixing CPU & GPU.
Subdivision surface efficiency relies on caching pre-computed topology
data for evaluation between frames. However, while eed45d2a23
introduced a second GPU subdiv evaluator type, it still only kept
one slot for caching this runtime data per mesh.

The result is that if the mesh is also needed on CPU, for instance
due to a modifier on a different object (e.g. shrinkwrap), the two
evaluators are used at the same time and fight over the single slot.
This causes the topology data to be discarded and recomputed twice
per frame.

Since avoiding duplicate evaluation is a complex task, this fix
simply adds a second separate cache slot for the GPU data, so that
the cost is simply running subdivision twice, not recomputing topology
twice.

To help diagnostics, I also add a message to the modifier panel showing
when GPU evaluation is actually used. Two frame counters are used
to suppress flicker in the UI panel.

Differential Revision: https://developer.blender.org/D17117

Pull Request #104441
2023-02-08 10:27:22 +01:00
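The shape of the fix, as a rough sketch (names are illustrative, not Blender's actual runtime structs): one cached topology slot per evaluator, so CPU and GPU lookups stop evicting each other's data.
```lang=cpp
struct Subdiv; /* Opaque: cached pre-computed subdivision topology. */

struct MeshSubdivRuntime {
  Subdiv *subdiv_cpu = nullptr; /* Used by the CPU modifier evaluation. */
  Subdiv *subdiv_gpu = nullptr; /* Used by the GPU draw-time evaluator. */
};

inline Subdiv *&subdiv_slot(MeshSubdivRuntime &runtime, const bool use_gpu)
{
  /* Previously a single slot lived here, so alternating CPU/GPU users
   * discarded and recomputed the topology twice per frame. */
  return use_gpu ? runtime.subdiv_gpu : runtime.subdiv_cpu;
}
```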
d781e52ee0 Cleanup: use enum literals, order likely case first in polyfill_2d 2023-02-08 17:06:54 +11:00
09eb4fe19a Fix #103913: Triangulate sometimes creates degenerate triangles
The ear-clipping method used by polyfill_2d only excluded concave ears,
which meant ears with exactly co-linear edges created zero-area
triangles even when convex ears were available.

While polyfill_2d prioritizes performance over *pretty* results,
there is no need to pick degenerate triangles when other candidates
are available. As noted in code-comments, callers that require higher
quality tessellation should use BLI_polyfill_beautify.
2023-02-08 16:59:42 +11:00
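A hypothetical sketch of the selection rule (not the actual polyfill_2d code): compute the signed ear area and fall back to a zero-area, co-linear ear only when no strictly convex candidate remains.
```lang=cpp
struct Vec2 {
  float x, y;
};

/* Twice the signed area of triangle (a, b, c): > 0 for a convex (CCW)
 * corner, == 0 when the three points are exactly co-linear. */
static float cross_tri_v2(const Vec2 &a, const Vec2 &b, const Vec2 &c)
{
  return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

/* Return the index of the ear to clip: prefer strictly convex ears and
 * keep an exactly co-linear (degenerate) ear only as a last resort. */
static int pick_ear(const Vec2 (*ears)[3], const int ears_num)
{
  int fallback = -1;
  for (int i = 0; i < ears_num; i++) {
    const float c = cross_tri_v2(ears[i][0], ears[i][1], ears[i][2]);
    if (c > 0.0f) {
      return i; /* Non-degenerate triangle: take it. */
    }
    if (c == 0.0f && fallback == -1) {
      fallback = i; /* Zero-area ear: only if nothing better exists. */
    }
  }
  return fallback;
}
```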
6aa1b5d031 Cleanup: format 2023-02-08 00:21:57 +01:00
5c994d7846 Fix #104297: Cycling geometry nodes viewer ignores sockets
Sockets after the geometry socket were ignored when cycling through
the node's output sockets. If there are multiple geometry sockets, the
behavior could probably still be refined, but this should at least make
basic non-geometry socket cycling work.
2023-02-07 16:01:54 -05:00
53b057aa09 Cleanup: Move 18 sculpt files to C++
To allow further mesh data structure refactoring. See #103343

Pull Request #104436
2023-02-07 21:56:45 +01:00
e817cff009 Release: support generating LTS release notes from Gitea
There is now a single script to generate both links and release notes.
It also includes the issue ID for the LTS releases, so only the release
version needs to be specified.

Pull Request #104402
2023-02-07 21:23:24 +01:00
41ddd3d732 Fix: Experimental Panel links modified for Gitea
Modifies the links to point to the new developer site.

Pull Request #104425
2023-02-07 19:54:43 +01:00
f5552d759c Fix compiler error 2023-02-07 18:32:24 +01:00
f01bf82480 Curves: Add select pick operator
This adds the `select_pick` function for `Curves` objects.
It is used in the common `view3d_select` operator.

Pull Request #104406
2023-02-07 17:50:39 +01:00
8d9d16fb53 Fix #104396: Blender crashes when moving Keyframes in Graph Editor
`t->region->gizmo_map` can be `nullptr`.

Caused by 19b63b932d
2023-02-07 12:59:35 -03:00
349350b304 Fix T104390: Regression: Object selection in viewport is not working
Caused by alignment difference between C and C++. Asan caught the issue
on startup.

Removing the unused view matrix storage copy avoids this problem.
2023-02-07 16:45:47 +01:00
cb5318b651 Docs: change Git URLs to point projects.blender.org instead of git.blender.org 2023-02-07 14:23:05 +01:00
bd6b0bac88 Update references to the new projects platform and main branch 2023-02-07 14:18:19 +01:00
3002670332 Fix T104368: Incorrect tooltip text in Blender 3.4.1's Preferences > File Paths > Scripts field.
Use backticks to clearly identify 'path' parts of this tooltip.
2023-02-07 09:50:50 +01:00
f086cf3cea Cleanup: remove redundant parenthesis 2023-02-07 17:34:20 +11:00
2609ca2b8e Cleanup: tweaks to cycles/metal preferences
- Auto-format.
- Use raw string for regex.
- Remove redundant assignment.
- Remove duplicate arm64 check.
- Break early out of loop.
2023-02-07 17:30:13 +11:00
b-init
7e8153b07d Keymap: support default shortcut to toggle overlays in all space-types
UV Editor, Image Editor & Sequencer didn't have a shortcut for toggling
overlays. Use the same shortcut as the 3D viewport.

Ref D16959
2023-02-07 16:54:06 +11:00
622cad7073 Cleanup: minor tweak to recent fix for T104381
Minor change to [0]: prefer calling em_setup_viewcontext.
Even though there is no functional difference at the moment,
if this function ever performs operations beyond assigning
`ViewContext.em`, it would have to be manually in-lined in
`view3d_circle_select_recalc`.

[0]: 430cc9d7bf
2023-02-07 16:18:30 +11:00
44daeaae7d Cleanup: use arg instead of param for generated sphinx docs 2023-02-07 15:14:22 +11:00
db8b5a2316 PyDoc: remove deprecated dpi argument from BLF example 2023-02-07 15:12:05 +11:00
dbca0cc9d5 Fix crash on exit under Wayland
Order of free error from [0] caused the timer manager
to be freed before the timer.

[0]: 7de1a4d1d8
2023-02-07 15:12:05 +11:00
e4f77c1a6c Cleanup: format 2023-02-07 16:57:35 +13:00
Jon Denning
e27c89c7c7 Docs: added missing documentation for WindowManager methods
Added missing documentation for `draw_cursor_add` and
`draw_cursor_remove` methods for `WindowManager`.

Differential Revision: https://developer.blender.org/D14860
2023-02-06 22:40:10 -05:00
af5706c960 Docs: improve doc-string for WM_operator_flag_only_pass_through_on_press
The doc-string didn't provide any context for how the function is
intended to be used.
2023-02-07 14:18:59 +11:00
a99022e22d Cleanup: spelling in comments 2023-02-07 14:17:01 +11:00
d5af895419 Fix missing matrix includes 2023-02-07 14:07:21 +11:00
Jason Fielder
8703db393b Metal: Ensure explicit return after discard to eliminate differences in behaviour between GPUs.
Discard is not always treated as an explicit return, and flow control can continue for required derivative calculations. This behaviour differs between Metal and OpenGL. Adding a return after discard ensures consistent, well-defined behaviour.

Authored by Apple: Michael Parkin-White

Ref T96261

Reviewed By: fclem

Maniphest Tasks: T96261

Differential Revision: https://developer.blender.org/D17199
2023-02-07 00:58:06 +01:00
Jason Fielder
f152159101 Metal: Guard advanced command buffer debugging behind OS version flag.
Authored by Apple: Michael Parkin-White

Ref T96261

Reviewed By: fclem

Maniphest Tasks: T96261

Differential Revision: https://developer.blender.org/D17181
2023-02-07 00:51:16 +01:00
d3500c482f Cleanup: Move DRW_pbvh.h header to C++
For continued refactoring of the Mesh data structure. See T103343.
2023-02-06 16:52:02 -05:00
3a1583972a Fix T104256: Curve to points node skips curve domain attributes
7536abbe16 forgot to port the curve domain attributes.
2023-02-06 16:30:25 -05:00
6dcfb6df9c Cycles: Abstract host memory fallback for GPU devices
Host memory fallback in CUDA and HIP devices is almost identical.
We remove duplicated code and create a shared generic version that
other devices (oneAPI) will be able to use.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D17173
2023-02-06 22:19:32 +01:00
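The resulting split, sketched under assumed simplified signatures that mirror the CUDA hunk further down this diff: a shared base class owns the fallback policy, and each backend supplies only the raw operations.
```lang=cpp
#include <cstddef>

class GPUDevice {
 protected:
  /* Backend primitives: implemented with cuMemAlloc/cuMemHostAlloc on
   * CUDA and the equivalent hip* calls on HIP. */
  virtual bool alloc_device(void *&device_pointer, size_t size) = 0;
  virtual void free_device(void *device_pointer) = 0;
  virtual bool alloc_host(void *&shared_pointer, size_t size) = 0;
  virtual void get_device_memory_info(size_t &total, size_t &free) = 0;

  /* Shared policy, previously duplicated per backend: try device memory
   * first (respecting headroom), then fall back to mapped host memory. */
  bool generic_alloc(void *&pointer, const size_t size, const size_t headroom)
  {
    size_t total = 0, free = 0;
    get_device_memory_info(total, free);
    if (size + headroom < free && alloc_device(pointer, size)) {
      return true;
    }
    return alloc_host(pointer, size);
  }

 public:
  virtual ~GPUDevice() = default;
};
```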
b0b9e746fa BLI: Use BLI_math_matrix_type.hh instead of BLI_math_float4x4.hh
Straightforward port. I took the opportunity to remove some C vector
functions (e.g. copy_v2_v2).

This makes some changes to DRWView to accommodate the alignment
requirements of the float4x4 type.
2023-02-06 21:25:45 +01:00
8be3fcab7e Fix tests after recent commit.
`9c14039a8f4b5f` broke blenlib tests in release builds, due to how
`EXPECT_BLI_ASSERT` works (in release builds it just calls the given
function, so if that crashes then the test fails).

For now, remove that check from the test.
2023-02-06 21:14:20 +01:00
d20f992322 Mesh: Avoid copying positions in object mode modifier stack
Remove the use of a separate contiguous positions array now that
they are stored that way in the first place. This allows removing the
complexity of tracking whether it is allocated and deformed in the
mesh modifier stack.

Instead of deferring the creation of the final mesh until after the
positions have been copied and deformed, create the final mesh
first and then deform its positions.

Since vertex and face normals are calculated lazily, we can rely on
individual modifiers to calculate them as necessary and simplify
the modifier stack. This was hard to change before because of the
separate array of deformed positions.

Differential Revision: https://developer.blender.org/D16971
2023-02-06 14:34:29 -05:00
a38d99e0b2 Fix (unreported): Transform gizmo not restoring when changing mode
When activating a rotation with the Transform gizmo for example, some
gizmos are hidden but they don't reappear when changing the mode.

Make sure the gizmos corresponding to the mode always reappear.
2023-02-06 16:20:12 -03:00
75db4c082b Cleanup: Use BitVector instead of BLI_bitmap for subsurf face dot tags
For better type safety and more automatic memory management.
2023-02-06 14:13:53 -05:00
Michael Jones
2d994de77c Cycles: MetalRT optimisation for subsurface intersection queries
This patch optimises subsurface intersection queries on MetalRT. Currently intersect_local traverses from the scene root, retrospectively discarding all non-local hits. Using a lookup of bottom level acceleration structures, we can explicitly query only the relevant instance. On M1 Max, with MetalRT selected, this can give a render speedup of 15-20% for scenes like Monster which make heavy use of subsurface scattering.

Patch authored by Marco Giordano.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D17153
2023-02-06 19:12:29 +00:00
961d99d3a4 Minor comments about current usages of BKE_main_namemap_destroy. 2023-02-06 19:36:36 +01:00
430cc9d7bf Fix T104381: Assert on Circle Select end modal
`em_setup_viewcontext` cannot be used in this function now as it
expects `obedit` to be a mesh. It also duplicated the view-context init.
Instead, `BKE_editmesh_from_object` is called only when the type is a mesh.
2023-02-06 19:32:14 +01:00
a3551ed878 BKE main namemap: Add a way to clear all namemaps of a given Main.
Existing `BKE_main_namemap_destroy` is too specific when an entire Main
needs to have its namemaps cleared, since it would not handle the
Library ones.

While in regular situations the current code is fine, ID management code
that may also edit linked data needs a wider, simpler clearing tool.
2023-02-06 19:29:21 +01:00
288b13b252 BKE lib remapper: Add new util to overwrite an existing mapping.
Current `BKE_id_remapper_add` would not replace an already existing
mapping rule, now `BKE_id_remapper_add_overwrite` allows that behavior
if necessary.
2023-02-06 19:29:21 +01:00
85b2bce037 BKE libremap: do not assert on identity pairs.
If identity pairs (i.e. the old ID pointer being the same as the new
one) were forbidden, then this should be asserted against in the code
defining remapping, not in the code applying it.

But it is actually sometimes useful to allow/use identity pairs, so
simply early-return on these instead of asserting.
2023-02-06 19:29:21 +01:00
c7b601c79e Cleanup: Subdiv: Use index instead of pointer
The edge pointer was retrieved from the index and then just
used to calculate the index again. Remove the edge pointer
and only use the index.
2023-02-06 12:29:01 -05:00
81f8d74f6d EEVEE: Cleanup: light power / radiance code and document it
All thanks to @weizhen for spotting the inconsistencies.
2023-02-06 18:23:34 +01:00
9c14039a8f BLI ListBase: Add new 'SplitAfter' given link util.
Allows easily splitting a ListBase by moving the part after the given
link into a new ListBase.
2023-02-06 17:58:40 +01:00
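Roughly, the new utility behaves like this sketch (illustrative, singly-linked simplification of Blender's doubly-linked ListBase; `link` is assumed to be a member of `lb`):
```lang=cpp
struct Link {
  Link *next;
};

struct ListBase {
  Link *first, *last;
};

/* Move everything after `link` out of `lb` and into `split`. */
static void listbase_split_after(ListBase *lb, ListBase *split, Link *link)
{
  split->first = link->next;
  split->last = link->next ? lb->last : nullptr;
  lb->last = link;
  link->next = nullptr;
}
```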
39e0bbfa55 BKE libquery: Add option to also process ID.lib pointer.
This was never needed before, but upcoming work for the Brush Asset
project requires it.
2023-02-06 17:31:01 +01:00
8fe4f3b756 Minor code doc improvement. 2023-02-06 17:19:02 +01:00
Iliya Katueshenock
a86f657692 Fix T104233: crash when deleting a group node that is displayed by another editor
Differential Revision: https://developer.blender.org/D17172
2023-02-06 16:18:16 +01:00
8adebaeb7c Modifiers: measure execution time and provide Python access
The goal is to give technical artists the ability to optimize modifier usage
and/or geometry node groups for performance. In the long term, it
would be useful if Blender could provide its own UI to display profiling
information to users. However, right now, there are too many open
design questions making it infeasible to tackle this in the short term.

This commit uses a simpler approach: instead of adding new UI for
profiling data, it exposes the execution time of modifiers in the Python
API. This allows technical artists to access the information and to build
their own UI to display the relevant information. In the long term this
will hopefully also help us to integrate a native UI for this in Blender
by observing how users use this information.

Note: The execution time of a modifier highly depends on what other
things the CPU is doing at the same time. For example, in many more
complex files, many objects and therefore modifiers are evaluated at
the same time by multiple threads which makes the measurement
much less reliable. For best results, make sure that only one object
is evaluated at a time (e.g. by changing it in isolation) and that no
other process on the system keeps the CPU busy.

As shown below, the execution time has to be accessed on the
evaluated object, not the original object.

```lang=python
import bpy
depsgraph = bpy.context.view_layer.depsgraph
ob = bpy.context.active_object
ob_eval = ob.evaluated_get(depsgraph)
modifier_eval = ob_eval.modifiers[0]
print(modifier_eval.execution_time, "s")
```

Differential Revision: https://developer.blender.org/D17185
2023-02-06 15:40:15 +01:00
773a36d2f8 Fix Cycles OneAPI build error after recent changes 2023-02-06 15:36:49 +01:00
cc623ee7b0 Transform: do not save settings when canceling the operation
If we are canceling, the settings must remain the same as before we
started the operation.
2023-02-06 11:18:52 -03:00
deaddbdcff Fix forced snap status being removed when changing transform mode
The `SNAP_FORCED` setting belongs to the operation and not to the snap
status.

Therefore, this option should not be cleared along with the other
statuses when resetting snapping.

So move this setting to `TransInfo::modifiers`.
2023-02-06 11:18:52 -03:00
f2538c7173 Fix T104335: MNEE + OptiX OSL results in illegal address error
The OptiX pipeline created for OSL was missing sufficient continuation
stack to handle the MNEE ray generation program.
2023-02-06 15:06:52 +01:00
4bd3b02984 Python: Suppress BGL deprecation messages after 100 times.
BGL deprecation calls used to be reported on each use. As bgl calls
are typically part of a handler that is triggered on every refresh,
this could lead to a flood of messages and slow down systems when
the terminal/console had to be refreshed as well.

This patch only reports the first 100 bgl deprecation calls. This
gives the developer enough feedback that changes need to be made,
but still provides good responsiveness to users when they have
such an add-on enabled. Only the first frames can have a slowdown.
2023-02-06 13:35:29 +01:00
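An illustrative sketch of the capping scheme (not the actual bgl source): a process-wide counter that goes silent after the budget is spent.
```lang=cpp
#include <cstdio>

static void bgl_deprecation_warn(const char *symbol)
{
  enum { REPORT_LIMIT = 100 };
  static int reports = 0; /* Matches the "first 100 calls" budget. */
  if (reports >= REPORT_LIMIT) {
    return; /* A per-frame draw handler can no longer flood the console. */
  }
  reports++;
  std::fprintf(stderr, "bgl.%s is deprecated, consider the gpu module\n", symbol);
  if (reports == REPORT_LIMIT) {
    std::fprintf(stderr, "suppressing further bgl deprecation warnings\n");
  }
}

int main()
{
  for (int i = 0; i < 1000; i++) {
    bgl_deprecation_warn("glBegin"); /* Only the first 100 are printed. */
  }
}
```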
7beb487e9a Fix T104353: Crash on opening sculpting template
`t->region` was `NULL`.

This can happen depending on the context.

Caused by rB19b63b932d2b.
2023-02-06 09:21:04 -03:00
9ad3a85f8b Fix Cycles GPU binaries build error after recent changes for Metal 2023-02-06 13:17:57 +01:00
Michael Jones
654e1e901b Cycles: Use local atomics for faster shader sorting (enabled on Metal)
This patch adds two new kernels: SORT_BUCKET_PASS and SORT_WRITE_PASS. These replace PREFIX_SUM and SORTED_PATHS_ARRAY on supported devices (currently implemented on Metal, but will be trivial to enable on the other backends). The new kernels exploit sort partitioning (see D15331) by sorting each partition separately using local atomics. This can give an overall render speedup of 2-3% depending on architecture. As before, we fall back to the original non-partitioned sorting when the shader count is "too high".

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16909
2023-02-06 11:18:26 +00:00
Michael Jones
46c9f7702a Cycles: Enable MetalRT opt-in for AMD/Navi2 GPUs
Reviewed By: brecht

Differential Revision: https://developer.blender.org/D17043
2023-02-06 11:14:11 +00:00
Michael Jones
be0912a402 Cycles: Prevent use of both AMD and Intel Metal devices at same time
This patch removes the option to select both AMD and Intel GPUs on systems that have both. Currently both devices will be selected by default, which results in crashes and other poorly understood behaviour. This patch adds precedence for using any discrete AMD GPU over an integrated Intel one. This can be overridden with CYCLES_METAL_FORCE_INTEL.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D17166
2023-02-06 11:13:33 +00:00
Michael Jones
0a3df611e7 Fix T103393: Cycles: Undefine __LIGHT_TREE__ on Metal/AMD to fix perf
This patch fixes T103393 by undefining `__LIGHT_TREE__` on Metal/AMD as it has an unexpected & major impact on performance even when light trees are not in use.

Patch authored by Prakash Kamliya.

Reviewed By: brecht

Maniphest Tasks: T103393

Differential Revision: https://developer.blender.org/D17167
2023-02-06 11:12:34 +00:00
Amelie Fondevilla
6d297c35c8 Fix T104371: GPencil merge down layer duplicates wrong frame
The merge down operator was sometimes copying the wrong frame, which altered the animation.
While merging the layers, it is sometimes needed to duplicate a keyframe,
when the lowest layer does not have a keyframe but the highest layer does.
Instead of duplicating the previous keyframe of the lowest layer, the code
was actually duplicating the active frame of the layer which was the current frame in the timeline.

This patch fixes the issue by setting the previous keyframe of the layer as its active frame before duplication.

Related issue: T104371.

Differential Revision: https://developer.blender.org/D17214
2023-02-06 10:44:17 +01:00
329eeacc66 Cleanup: Cycles: Remove isotropic microfacet closure setup functions
Turns out these are 100% redundant, so get rid of them.
2023-02-06 04:26:36 +01:00
2627635ff3 Cleanup: use nullptr in C++ 2023-02-06 12:50:34 +11:00
d6b6050e5b Cleanup: use function style casts in C++ 2023-02-06 12:35:51 +11:00
731c3efd97 Cleanup: format 2023-02-06 12:32:45 +11:00
9f5c17f4af Cleanup: comments in code 2023-02-06 12:25:04 +11:00
4fcc9f5e7e Cleanup: use back-slash doxygen commands, de-duplicate doc-string 2023-02-06 12:25:04 +11:00
7de1a4d1d8 Fix GHOST/Wayland thread-unsafe timer-manager manipulation
Mutex locks for manipulating GHOST_System::m_timerManager from
GHOST_SystemWayland relied on WAYLAND being the only user of the
timer-manager.

This isn't the case as timers are fired from
`GHOST_System::dispatchEvents`.

Resolve by using a separate timer-manager for wayland key-repeat timers.
2023-02-06 12:25:04 +11:00
d3949a4fdb Fix GHOST/Wayland thread-unsafe key-repeat timer checks
Resolve a thread-safety issue reported by valgrind's helgrind checker,
although I wasn't able to reproduce the error in practice.

NULL check on the key-repeat timer also needs to lock, otherwise it's
possible the timer is set in another thread before the lock is acquired.

Now all key-repeat timer access which may run from a thread
locks the timer mutex before any checks or timer manipulation.
2023-02-06 12:25:04 +11:00
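The pattern of the fix, as a minimal sketch (illustrative names, std::mutex standing in for GHOST's own mutex type): even the NULL check must run under the lock, otherwise another thread can install or free the timer between the check and the use.
```lang=cpp
#include <mutex>

struct Timer {
  /* ... */
};

static std::mutex key_repeat_mutex;
static Timer *key_repeat_timer = nullptr;

static void key_repeat_cancel()
{
  /* Lock *before* checking, not only before manipulating. */
  std::lock_guard<std::mutex> lock(key_repeat_mutex);
  if (key_repeat_timer != nullptr) {
    delete key_repeat_timer;
    key_repeat_timer = nullptr;
  }
}
```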
b642dc7bc7 Fix: Incorrect forward-compatible saving of face sets
There were two errors with the function used to convert face sets
to the legacy mesh format for keeping forward compatibility:
- It was moved before `CustomData_blend_write_prepare` so it
  operated on an empty span.
- It modified the mesh when it's only supposed to change the copy
  of the layers written to the file.

Differential Revision: https://developer.blender.org/D17210
2023-02-05 18:09:22 -05:00
501352ef05 Cleanup: Move PBVH files to C++
For continued refactoring of the Mesh data structure. See T103343.
2023-02-05 17:36:47 -05:00
b2ad97ba97 Merge branch 'master' into tmp-usd-alab-v2-T100452 2022-09-23 16:56:14 -04:00
b31807c75f Format fixes. 2022-08-16 22:25:51 -04:00
5649158db9 USD import: Alab v2 load errors.
Work in progress addressing issues loading the Alab v2 scene.

Load primvars of type Float2Array as texture coordinates.
Fixed a type mismatch loading the UsdPrimvarReader_float2 varname input.
For materials that have an authored opacity value, changed the blend
mode from Alpha Blend to Alpha Hashed, to mitigate an object sorting issue.
Added an import option to specify loading materials by purpose, to allow
explicitly loading full materials for debugging.
2022-08-16 22:15:13 -04:00
1225 changed files with 19451 additions and 10406 deletions

View File

@@ -236,6 +236,8 @@ ForEachMacros:
- LOOP_UNSELECTED_POINTS
- LOOP_VISIBLE_KEYS
- LOOP_VISIBLE_POINTS
- LIGHT_FOREACH_BEGIN_DIRECTIONAL
- LIGHT_FOREACH_BEGIN_LOCAL
- LISTBASE_CIRCULAR_BACKWARD_BEGIN
- LISTBASE_CIRCULAR_FORWARD_BEGIN
- LISTBASE_FOREACH

View File

@@ -1,5 +1,4 @@
This repository is only used as a mirror of git.blender.org. Blender development happens on
https://developer.blender.org.
This repository is only used as a mirror. Blender development happens on projects.blender.org.
To get started with contributing code, please see:
https://wiki.blender.org/wiki/Process/Contributing_Code

.github/stale.yml vendored
View File

@@ -15,8 +15,7 @@ staleLabel: stale
# Comment to post when closing a stale Issue or Pull Request.
closeComment: >
This issue has been automatically closed, because this repository is only
used as a mirror of git.blender.org. Blender development happens on
developer.blender.org.
used as a mirror. Blender development happens on projects.blender.org.
To get started contributing code, please read:
https://wiki.blender.org/wiki/Process/Contributing_Code

.gitmodules vendored
View File

@@ -1,20 +1,20 @@
[submodule "release/scripts/addons"]
path = release/scripts/addons
url = ../blender-addons.git
branch = master
branch = main
ignore = all
[submodule "release/scripts/addons_contrib"]
path = release/scripts/addons_contrib
url = ../blender-addons-contrib.git
branch = master
branch = main
ignore = all
[submodule "release/datafiles/locale"]
path = release/datafiles/locale
url = ../blender-translations.git
branch = master
branch = main
ignore = all
[submodule "source/tools"]
path = source/tools
url = ../blender-dev-tools.git
branch = master
branch = main
ignore = all

View File

@@ -524,7 +524,7 @@ endif()
if(NOT APPLE)
option(WITH_CYCLES_DEVICE_HIP "Enable Cycles AMD HIP support" ON)
option(WITH_CYCLES_HIP_BINARIES "Build Cycles AMD HIP binaries" OFF)
set(CYCLES_HIP_BINARIES_ARCH gfx1010 gfx1011 gfx1012 gfx1030 gfx1031 gfx1032 gfx1034 gfx1035 gfx1100 gfx1101 gfx1102 CACHE STRING "AMD HIP architectures to build binaries for")
set(CYCLES_HIP_BINARIES_ARCH gfx900 gfx906 gfx90c gfx902 gfx1010 gfx1011 gfx1012 gfx1030 gfx1031 gfx1032 gfx1034 gfx1035 gfx1100 gfx1101 gfx1102 CACHE STRING "AMD HIP architectures to build binaries for")
mark_as_advanced(WITH_CYCLES_DEVICE_HIP)
mark_as_advanced(CYCLES_HIP_BINARIES_ARCH)
endif()
@@ -625,8 +625,10 @@ mark_as_advanced(
# Vulkan
option(WITH_VULKAN_BACKEND "Enable Vulkan as graphics backend (only for development)" OFF)
option(WITH_VULKAN_GUARDEDALLOC "Use guardedalloc for host allocations done inside Vulkan (development option)" OFF)
mark_as_advanced(
WITH_VULKAN_BACKEND
WITH_VULKAN_GUARDEDALLOC
)
# Metal

View File

@@ -299,7 +299,11 @@ else
ifneq ("$(wildcard $(DEPS_BUILD_DIR)/build.ninja)","")
DEPS_BUILD_COMMAND:=ninja
else
DEPS_BUILD_COMMAND:=make -s
ifeq ($(OS), Darwin)
DEPS_BUILD_COMMAND:=make -s
else
DEPS_BUILD_COMMAND:="$(BLENDER_DIR)/build_files/build_environment/linux/make_deps_wrapper.sh" -s
endif
endif
endif
@@ -398,7 +402,7 @@ endif
deps: .FORCE
@echo
@echo Configuring dependencies in \"$(DEPS_BUILD_DIR)\"
@echo Configuring dependencies in \"$(DEPS_BUILD_DIR)\", install to \"$(DEPS_INSTALL_DIR)\"
@cmake -H"$(DEPS_SOURCE_DIR)" \
-B"$(DEPS_BUILD_DIR)" \

View File

@@ -24,7 +24,7 @@ Development
-----------
- [Build Instructions](https://wiki.blender.org/wiki/Building_Blender)
- [Code Review & Bug Tracker](https://developer.blender.org)
- [Code Review & Bug Tracker](https://projects.blender.org)
- [Developer Forum](https://devtalk.blender.org)
- [Developer Documentation](https://wiki.blender.org)

View File

@@ -22,7 +22,7 @@ elseif(UNIX AND NOT APPLE)
)
endif()
# Boolean crashes with Arm assembly, see T103423.
# Boolean crashes with Arm assembly, see #103423.
if(BLENDER_PLATFORM_ARM)
set(GMP_OPTIONS
${GMP_OPTIONS}

View File

@@ -40,7 +40,8 @@ ExternalProject_Add(external_igc_llvm
${PATCH_CMD} -p 1 -d ${IGC_LLVM_SOURCE_DIR} < ${IGC_OPENCL_CLANG_PATCH_DIR}/clang/0004-OpenCL-support-cl_ext_float_atomics.patch &&
${PATCH_CMD} -p 1 -d ${IGC_LLVM_SOURCE_DIR} < ${IGC_OPENCL_CLANG_PATCH_DIR}/clang/0005-OpenCL-Add-cl_khr_integer_dot_product.patch &&
${PATCH_CMD} -p 1 -d ${IGC_LLVM_SOURCE_DIR} < ${IGC_OPENCL_CLANG_PATCH_DIR}/llvm/0001-Memory-leak-fix-for-Managed-Static-Mutex.patch &&
${PATCH_CMD} -p 1 -d ${IGC_LLVM_SOURCE_DIR} < ${IGC_OPENCL_CLANG_PATCH_DIR}/llvm/0002-Remove-repo-name-in-LLVM-IR.patch
${PATCH_CMD} -p 1 -d ${IGC_LLVM_SOURCE_DIR} < ${IGC_OPENCL_CLANG_PATCH_DIR}/llvm/0002-Remove-repo-name-in-LLVM-IR.patch &&
${PATCH_CMD} -p 1 -d ${IGC_LLVM_SOURCE_DIR} < ${IGC_OPENCL_CLANG_PATCH_DIR}/llvm/0003-Add-missing-include-limit-in-benchmark.patch
)
add_dependencies(
external_igc_llvm
@@ -55,9 +56,6 @@ ExternalProject_Add(external_igc_spirv_translator
CONFIGURE_COMMAND echo .
BUILD_COMMAND echo .
INSTALL_COMMAND echo .
PATCH_COMMAND ${PATCH_CMD} -p 1 -d ${IGC_SPIRV_TRANSLATOR_SOURCE_DIR} < ${IGC_OPENCL_CLANG_PATCH_DIR}/spirv/0001-update-SPIR-V-headers-for-SPV_INTEL_split_barrier.patch &&
${PATCH_CMD} -p 1 -d ${IGC_SPIRV_TRANSLATOR_SOURCE_DIR} < ${IGC_OPENCL_CLANG_PATCH_DIR}/spirv/0002-Add-support-for-split-barriers-extension-SPV_INTEL_s.patch &&
${PATCH_CMD} -p 1 -d ${IGC_SPIRV_TRANSLATOR_SOURCE_DIR} < ${IGC_OPENCL_CLANG_PATCH_DIR}/spirv/0003-Support-cl_bf16_conversions.patch
)
add_dependencies(
external_igc_spirv_translator

View File

@@ -88,6 +88,19 @@ else()
export LDFLAGS=${PYTHON_LDFLAGS} &&
export PKG_CONFIG_PATH=${LIBDIR}/ffi/lib/pkgconfig)
# NOTE: untested on APPLE so far.
if(NOT APPLE)
set(PYTHON_CONFIGURE_EXTRA_ARGS
${PYTHON_CONFIGURE_EXTRA_ARGS}
# Used on most release Linux builds (Fedora for e.g.),
# increases build times noticeably with the benefit of a modest speedup at runtime.
--enable-optimizations
# While LTO is OK when building on the same system, it's incompatible across GCC versions,
# making it impractical for developers to build against, so keep it disabled.
# `--with-lto`
)
endif()
ExternalProject_Add(external_python
URL file://${PACKAGE_DIR}/${PYTHON_FILE}
DOWNLOAD_DIR ${DOWNLOAD_DIR}

View File

@@ -668,9 +668,9 @@ set(SPIRV_HEADERS_FILE SPIR-V-Headers-${SPIRV_HEADERS_VERSION}.tar.gz)
# compiler, the versions used are taken from the following location
# https://github.com/intel/intel-graphics-compiler/releases
set(IGC_VERSION 1.0.12149.1)
set(IGC_VERSION 1.0.13064.7)
set(IGC_URI https://github.com/intel/intel-graphics-compiler/archive/refs/tags/igc-${IGC_VERSION}.tar.gz)
set(IGC_HASH 44f67f24e3bc5130f9f062533abf8154782a9d0a992bc19b498639a8521ae836)
set(IGC_HASH a929abd4cca2b293961ec0437ee4b3b2147bd3b2c8a3c423af78c0c359b2e5ae)
set(IGC_HASH_TYPE SHA256)
set(IGC_FILE igc-${IGC_VERSION}.tar.gz)
@@ -690,15 +690,15 @@ set(IGC_LLVM_FILE ${IGC_LLVM_VERSION}.tar.gz)
#
# WARNING WARNING WARNING
set(IGC_OPENCL_CLANG_VERSION 363a5262d8c7cff3fb28f3bdb5d85c8d7e91c1bb)
set(IGC_OPENCL_CLANG_VERSION ee31812ea8b89d08c2918f045d11a19bd33525c5)
set(IGC_OPENCL_CLANG_URI https://github.com/intel/opencl-clang/archive/${IGC_OPENCL_CLANG_VERSION}.tar.gz)
set(IGC_OPENCL_CLANG_HASH aa8cf72bb239722ce8ce44f79413c6887ecc8ca18477dd520aa5c4809756da9a)
set(IGC_OPENCL_CLANG_HASH 1db6735bbcfaa31e8a9ba39f121d6bafa806ea8919e9f56782d6aaa67771ddda)
set(IGC_OPENCL_CLANG_HASH_TYPE SHA256)
set(IGC_OPENCL_CLANG_FILE opencl-clang-${IGC_OPENCL_CLANG_VERSION}.tar.gz)
set(IGC_VCINTRINSICS_VERSION v0.5.0)
set(IGC_VCINTRINSICS_VERSION v0.11.0)
set(IGC_VCINTRINSICS_URI https://github.com/intel/vc-intrinsics/archive/refs/tags/${IGC_VCINTRINSICS_VERSION}.tar.gz)
set(IGC_VCINTRINSICS_HASH 70bb47c5e32173cf61514941e83ae7c7eb4485e6d2fca60cfa1f50d4f42c41f2)
set(IGC_VCINTRINSICS_HASH e5acd5626ce7fa6d41ce154c50ac805eda734ee66af94ef28e680ac2ad81bb9f)
set(IGC_VCINTRINSICS_HASH_TYPE SHA256)
set(IGC_VCINTRINSICS_FILE vc-intrinsics-${IGC_VCINTRINSICS_VERSION}.tar.gz)
@@ -714,9 +714,9 @@ set(IGC_SPIRV_TOOLS_HASH 6e19900e948944243024aedd0a201baf3854b377b9cc7a386553bc1
set(IGC_SPIRV_TOOLS_HASH_TYPE SHA256)
set(IGC_SPIRV_TOOLS_FILE SPIR-V-Tools-${IGC_SPIRV_TOOLS_VERSION}.tar.gz)
set(IGC_SPIRV_TRANSLATOR_VERSION a31ffaeef77e23d500b3ea3d35e0c42ff5648ad9)
set(IGC_SPIRV_TRANSLATOR_VERSION d739c01d65ec00dee64dedd40deed805216a7193)
set(IGC_SPIRV_TRANSLATOR_URI https://github.com/KhronosGroup/SPIRV-LLVM-Translator/archive/${IGC_SPIRV_TRANSLATOR_VERSION}.tar.gz)
set(IGC_SPIRV_TRANSLATOR_HASH 9e26c96a45341b8f8af521bacea20e752623346340addd02af95d669f6e89252)
set(IGC_SPIRV_TRANSLATOR_HASH ddc0cc9ccbe59dadeaf291012d59de142b2e9f2b124dbb634644d39daddaa13e)
set(IGC_SPIRV_TRANSLATOR_HASH_TYPE SHA256)
set(IGC_SPIRV_TRANSLATOR_FILE SPIR-V-Translator-${IGC_SPIRV_TRANSLATOR_VERSION}.tar.gz)
@@ -724,15 +724,15 @@ set(IGC_SPIRV_TRANSLATOR_FILE SPIR-V-Translator-${IGC_SPIRV_TRANSLATOR_VERSION}.
### Intel Graphics Compiler DEPS END ###
########################################
set(GMMLIB_VERSION intel-gmmlib-22.1.8)
set(GMMLIB_VERSION intel-gmmlib-22.3.0)
set(GMMLIB_URI https://github.com/intel/gmmlib/archive/refs/tags/${GMMLIB_VERSION}.tar.gz)
set(GMMLIB_HASH bf23e9a3742b4fb98c7666c9e9b29f3219e4b2fb4d831aaf4eed71f5e2d17368)
set(GMMLIB_HASH c1f33e1519edfc527127baeb0436b783430dfd256c643130169a3a71dc86aff9)
set(GMMLIB_HASH_TYPE SHA256)
set(GMMLIB_FILE ${GMMLIB_VERSION}.tar.gz)
set(OCLOC_VERSION 22.38.24278)
set(OCLOC_VERSION 22.49.25018.21)
set(OCLOC_URI https://github.com/intel/compute-runtime/archive/refs/tags/${OCLOC_VERSION}.tar.gz)
set(OCLOC_HASH db0c542fccd651e6404b15a74d46027f1ce0eda8dc9e25a40cbb6c0faef257ee)
set(OCLOC_HASH 92362dae08b503a34e5d3820ed284198c452bcd5e7504d90eb69887b20492c06)
set(OCLOC_HASH_TYPE SHA256)
set(OCLOC_FILE ocloc-${OCLOC_VERSION}.tar.gz)

View File

@@ -0,0 +1,73 @@
#!/usr/bin/env bash
# SPDX-License-Identifier: GPL-2.0-or-later
# This script ensures:
# - One dependency is built at a time.
# - That dependency uses all available cores.
#
# Without this, simply calling `make -j$(nproc)` from the `${CMAKE_BUILD_DIR}/deps/`
# directory will build all projects at once.
#
# This is undesirable for the following reasons:
#
# - The output from many projects is mixed together,
# making it difficult to track down the cause of a build failure.
#
# - Larger dependencies such as LLVM can bottleneck the build process,
# making it necessary to cancel the build and manually run build commands in each directory.
#
# - Building many projects at once means canceling (Control-C) can lead to the build being in an undefined state.
# It's possible canceling happens as a patch is being applied or files are being copied.
# (steps that aren't part of the compilation process where it's typically safe to cancel).
if [[ -z "${MY_MAKE_CALL_LEVEL}" ]]; then
export MY_MAKE_CALL_LEVEL=0
export MY_MAKEFLAGS=$MAKEFLAGS
# Extract the jobs argument (`-jN`, `-j N`, `--jobs=N`).
add_next=0
for i in "$@"; do
case $i in
-j*)
export MY_JOBS_ARG=$i
if [ "$MY_JOBS_ARG" = "-j" ]; then
add_next=1
fi
;;
--jobs=*)
shift # past argument=value
export MY_JOBS_ARG=$i
;;
*)
if (( $add_next == 1 )); then
export MY_JOBS_ARG="$MY_JOBS_ARG $i"
add_next=0
fi
;;
esac
done
unset i add_next
if [[ -z "${MY_JOBS_ARG}" ]]; then
export MY_JOBS_ARG="-j$(nproc)"
fi
# Support user defined `MAKEFLAGS`.
export MAKEFLAGS="$MY_MAKEFLAGS -j1"
else
export MY_MAKE_CALL_LEVEL=$(( $MY_MAKE_CALL_LEVEL + 1 ))
if (( $MY_MAKE_CALL_LEVEL == 1 )); then
# Important to set jobs to 1, otherwise user defined jobs argument is used.
export MAKEFLAGS="$MY_MAKEFLAGS -j1"
elif (( $MY_MAKE_CALL_LEVEL == 2 )); then
# This is the level used by each sub-project.
export MAKEFLAGS="$MY_MAKEFLAGS $MY_JOBS_ARG"
fi
# Else leave `MY_MAKEFLAGS` flags as-is, avoids setting a high number of jobs on recursive
# calls (which may easily run out of memory). Let the job-server handle the rest.
fi
# Useful for troubleshooting the wrapper.
# echo "Call level: $MY_MAKE_CALL_LEVEL, args=$@".
# Call actual make but ensure recursive calls run via this script.
exec make MAKE="$0" "$@"

View File

@@ -1,7 +1,7 @@
diff -Naur external_igc_opencl_clang.orig/CMakeLists.txt external_igc_opencl_clang/CMakeLists.txt
--- external_igc_opencl_clang.orig/CMakeLists.txt 2022-03-16 05:51:10 -0600
+++ external_igc_opencl_clang/CMakeLists.txt 2022-05-23 10:40:09 -0600
@@ -126,22 +126,24 @@
@@ -147,22 +147,24 @@
)
endif()

View File

@@ -23,19 +23,19 @@ if(EXISTS ${SOURCE_DIR}/.git)
if(MY_WC_BRANCH STREQUAL "HEAD")
# Detached HEAD, check whether commit hash is reachable
# in the master branch
# in the main branch
execute_process(COMMAND git rev-parse --short=12 HEAD
WORKING_DIRECTORY ${SOURCE_DIR}
OUTPUT_VARIABLE MY_WC_HASH
OUTPUT_STRIP_TRAILING_WHITESPACE)
execute_process(COMMAND git branch --list master blender-v* --contains ${MY_WC_HASH}
execute_process(COMMAND git branch --list main blender-v* --contains ${MY_WC_HASH}
WORKING_DIRECTORY ${SOURCE_DIR}
OUTPUT_VARIABLE _git_contains_check
OUTPUT_STRIP_TRAILING_WHITESPACE)
if(NOT _git_contains_check STREQUAL "")
set(MY_WC_BRANCH "master")
set(MY_WC_BRANCH "main")
else()
execute_process(COMMAND git show-ref --tags -d
WORKING_DIRECTORY ${SOURCE_DIR}
@@ -48,7 +48,7 @@ if(EXISTS ${SOURCE_DIR}/.git)
OUTPUT_STRIP_TRAILING_WHITESPACE)
if(_git_tag_hashes MATCHES "${_git_head_hash}")
set(MY_WC_BRANCH "master")
set(MY_WC_BRANCH "main")
else()
execute_process(COMMAND git branch --contains ${MY_WC_HASH}
WORKING_DIRECTORY ${SOURCE_DIR}

View File

@@ -11,11 +11,11 @@
mkdir ~/blender-git
cd ~/blender-git
git clone http://git.blender.org/blender.git
git clone https://projects.blender.org/blender/blender.git
cd blender
git submodule update --init --recursive
git submodule foreach git checkout master
git submodule foreach git pull --rebase origin master
git submodule foreach git checkout main
git submodule foreach git pull --rebase origin main
# create build dir
mkdir ~/blender-git/build-cmake
@@ -35,7 +35,7 @@ ln -s ~/blender-git/build-cmake/bin/blender ~/blender-git/blender/blender.bin
echo ""
echo "* Useful Commands *"
echo " Run Blender: ~/blender-git/blender/blender.bin"
echo " Update Blender: git pull --rebase; git submodule foreach git pull --rebase origin master"
echo " Update Blender: git pull --rebase; git submodule foreach git pull --rebase origin main"
echo " Reconfigure Blender: cd ~/blender-git/build-cmake ; cmake ."
echo " Build Blender: cd ~/blender-git/build-cmake ; make"
echo ""

View File

@@ -544,7 +544,7 @@ endfunction()
function(setup_platform_linker_libs
target
)
# jemalloc must be early in the list, to be before pthread (see T57998)
# jemalloc must be early in the list, to be before pthread (see #57998).
if(WITH_MEM_JEMALLOC)
target_link_libraries(${target} ${JEMALLOC_LIBRARIES})
endif()

View File

@@ -440,7 +440,7 @@ string(APPEND PLATFORM_LINKFLAGS " -stdlib=libc++")
# Make stack size more similar to Embree, required for Embree.
string(APPEND PLATFORM_LINKFLAGS_EXECUTABLE " -Wl,-stack_size,0x100000")
# Suppress ranlib "has no symbols" warnings (workaround for T48250)
# Suppress ranlib "has no symbols" warnings (workaround for #48250).
set(CMAKE_C_ARCHIVE_CREATE "<CMAKE_AR> Scr <TARGET> <LINK_FLAGS> <OBJECTS>")
set(CMAKE_CXX_ARCHIVE_CREATE "<CMAKE_AR> Scr <TARGET> <LINK_FLAGS> <OBJECTS>")
# llvm-ranlib doesn't support this flag. Xcode's libtool does.

View File

@@ -121,7 +121,7 @@ if(WITH_WINDOWS_BUNDLE_CRT)
include(InstallRequiredSystemLibraries)
# ucrtbase(d).dll cannot be in the manifest, due to the way windows 10 handles
# redirects for this dll, for details see T88813.
# redirects for this dll, for details see #88813.
foreach(lib ${CMAKE_INSTALL_SYSTEM_RUNTIME_LIBS})
string(FIND ${lib} "ucrtbase" pos)
if(NOT pos EQUAL -1)
@@ -295,7 +295,7 @@ unset(MATERIALX_LIB_FOLDER_EXISTS)
if(NOT MSVC_CLANG AND # Available with MSVC 15.7+ but not for CLANG.
NOT WITH_WINDOWS_SCCACHE AND # And not when sccache is enabled
NOT VS_CLANG_TIDY) # Clang-tidy does not like these options
add_compile_options(/experimental:external /external:templates- /external:I "${LIBDIR}" /external:W0)
add_compile_options(/experimental:external /external:I "${LIBDIR}" /external:W0)
endif()
# Add each of our libraries to our cmake_prefix_path so find_package() could work

View File

@@ -5,16 +5,16 @@
update-code:
git:
submodules:
- branch: master
- branch: main
commit_id: HEAD
path: release/scripts/addons
- branch: master
- branch: main
commit_id: HEAD
path: release/scripts/addons_contrib
- branch: master
- branch: main
commit_id: HEAD
path: release/datafiles/locale
- branch: master
- branch: main
commit_id: HEAD
path: source/tools
svn:

View File

@@ -58,7 +58,7 @@ Each Blender release supports one Python version, and the package is only compat
## Source Code
* [Releases](https://download.blender.org/source/)
* Repository: [git.blender.org/blender.git](https://git.blender.org/gitweb/gitweb.cgi/blender.git)
* Repository: [projects.blender.org/blender/blender.git](https://projects.blender.org/blender/blender)
## Credits

View File

@@ -170,7 +170,7 @@ def git_update_skip(args: argparse.Namespace, check_remote_exists: bool = True)
return "rebase or merge in progress, complete it first"
# Abort if uncommitted changes.
changes = check_output([args.git_command, 'status', '--porcelain', '--untracked-files=no'])
changes = check_output([args.git_command, 'status', '--porcelain', '--untracked-files=no', '--ignore-submodules'])
if len(changes) != 0:
return "you have unstaged changes"
@@ -202,8 +202,8 @@ def submodules_update(
sys.exit(1)
# Update submodules to appropriate given branch,
# falling back to master if none is given and/or found in a sub-repository.
branch_fallback = "master"
# falling back to main if none is given and/or found in a sub-repository.
branch_fallback = "main"
if not branch:
branch = branch_fallback

View File

@@ -3,9 +3,9 @@ if NOT exist "%BLENDER_DIR%\source\tools\.git" (
if not "%GIT%" == "" (
"%GIT%" submodule update --init --recursive --progress
if errorlevel 1 goto FAIL
"%GIT%" submodule foreach git checkout master
"%GIT%" submodule foreach git checkout main
if errorlevel 1 goto FAIL
"%GIT%" submodule foreach git pull --rebase origin master
"%GIT%" submodule foreach git pull --rebase origin main
if errorlevel 1 goto FAIL
goto EOF
) else (

View File

@@ -37,7 +37,7 @@ def draw_callback_px(self, context):
# BLF drawing routine
font_id = font_info["font_id"]
blf.position(font_id, 2, 80, 0)
blf.size(font_id, 50, 72)
blf.size(font_id, 50)
blf.draw(font_id, "Hello World")

View File

@@ -476,7 +476,7 @@ MODULE_GROUPING = {
# -------------------------------BLENDER----------------------------------------
# converting bytes to strings, due to T30154
# Converting bytes to strings, due to #30154.
BLENDER_REVISION = str(bpy.app.build_hash, 'utf_8')
BLENDER_REVISION_TIMESTAMP = bpy.app.build_commit_timestamp
@@ -487,7 +487,7 @@ BLENDER_VERSION_DOTS = "%d.%d" % (bpy.app.version[0], bpy.app.version[1])
if BLENDER_REVISION != "Unknown":
# SHA1 Git hash
BLENDER_VERSION_HASH = BLENDER_REVISION
BLENDER_VERSION_HASH_HTML_LINK = "<a href=https://developer.blender.org/rB%s>%s</a>" % (
BLENDER_VERSION_HASH_HTML_LINK = "<a href=https://projects.blender.org/blender/blender/commit/%s>%s</a>" % (
BLENDER_VERSION_HASH, BLENDER_VERSION_HASH,
)
BLENDER_VERSION_DATE = time.strftime("%d/%m/%Y", time.localtime(BLENDER_REVISION_TIMESTAMP))
@@ -647,7 +647,7 @@ def undocumented_message(module_name, type_name, identifier):
module_name, type_name, identifier,
)
return "Undocumented, consider `contributing <https://developer.blender.org/T51061>`__."
return "Undocumented, consider `contributing <https://developer.blender.org/>`__."
def range_str(val):
@@ -1816,9 +1816,9 @@ def pyrna2sphinx(basepath):
# operators
def write_ops():
API_BASEURL = "https://developer.blender.org/diffusion/B/browse/master/release/scripts"
API_BASEURL_ADDON = "https://developer.blender.org/diffusion/BA"
API_BASEURL_ADDON_CONTRIB = "https://developer.blender.org/diffusion/BAC"
API_BASEURL = "https://projects.blender.org/blender/blender/src/branch/main/release/scripts"
API_BASEURL_ADDON = "https://projects.blender.org/blender/blender-addons"
API_BASEURL_ADDON_CONTRIB = "https://projects.blender.org/blender/blender-addons-contrib"
op_modules = {}
op = None
@@ -2200,7 +2200,7 @@ def write_rst_enum_items(basepath, key, key_no_prefix, enum_items):
Write a single page for a static enum in RST.
This helps avoiding very large lists being in-lined in many places which is an issue
especially with icons in ``bpy.types.UILayout``. See T87008.
especially with icons in ``bpy.types.UILayout``. See #87008.
"""
filepath = os.path.join(basepath, "%s.rst" % key_no_prefix)
with open(filepath, "w", encoding="utf-8") as fh:

View File

@@ -156,7 +156,7 @@ var Popover = function() {
},
getNamed : function(v) {
$.each(all_versions, function(ix, title) {
if (ix === "master" || ix === "latest") {
if (ix === "master" || ix === "main" || ix === "latest") {
var m = title.match(/\d\.\d[\w\d\.]*/)[0];
if (parseFloat(m) == v) {
v = ix;

View File

@@ -1,5 +1,5 @@
Project: Blender
URL: https://git.blender.org/blender.git
URL: https://projects.blender.org/blender/blender.git
License: Apache 2.0
Upstream version: N/A
Local modifications: None

View File

@@ -12,6 +12,7 @@ from bpy.props import (
PointerProperty,
StringProperty,
)
from bpy.app.translations import pgettext_iface as iface_
from math import pi
@@ -1664,30 +1665,48 @@ class CyclesPreferences(bpy.types.AddonPreferences):
col.label(text="No compatible GPUs found for Cycles", icon='INFO')
if device_type == 'CUDA':
col.label(text="Requires NVIDIA GPU with compute capability 3.0", icon='BLANK1')
compute_capability = "3.0"
col.label(text=iface_("Requires NVIDIA GPU with compute capability %s") % compute_capability,
icon='BLANK1', translate=False)
elif device_type == 'OPTIX':
col.label(text="Requires NVIDIA GPU with compute capability 5.0", icon='BLANK1')
col.label(text="and NVIDIA driver version 470 or newer", icon='BLANK1')
compute_capability = "5.0"
driver_version = "470"
col.label(text=iface_("Requires NVIDIA GPU with compute capability %s") % compute_capability,
icon='BLANK1', translate=False)
col.label(text="and NVIDIA driver version %s or newer" % driver_version,
icon='BLANK1', translate=False)
elif device_type == 'HIP':
import sys
if sys.platform[:3] == "win":
col.label(text="Requires AMD GPU with RDNA architecture", icon='BLANK1')
col.label(text="and AMD Radeon Pro 21.Q4 driver or newer", icon='BLANK1')
driver_version = "21.Q4"
col.label(text="Requires AMD GPU with Vega or RDNA architecture", icon='BLANK1')
col.label(text=iface_("and AMD Radeon Pro %s driver or newer") % driver_version,
icon='BLANK1', translate=False)
elif sys.platform.startswith("linux"):
col.label(text="Requires AMD GPU with RDNA architecture", icon='BLANK1')
col.label(text="and AMD driver version 22.10 or newer", icon='BLANK1')
driver_version = "22.10"
col.label(text="Requires AMD GPU with Vega or RDNA architecture", icon='BLANK1')
col.label(text=iface_("and AMD driver version %s or newer") % driver_version, icon='BLANK1',
translate=False)
elif device_type == 'ONEAPI':
import sys
if sys.platform.startswith("win"):
driver_version = "101.4032"
col.label(text="Requires Intel GPU with Xe-HPG architecture", icon='BLANK1')
col.label(text="and Windows driver version 101.4032 or newer", icon='BLANK1')
col.label(text=iface_("and Windows driver version %s or newer") % driver_version,
icon='BLANK1', translate=False)
elif sys.platform.startswith("linux"):
driver_version = "1.3.24931"
col.label(text="Requires Intel GPU with Xe-HPG architecture and", icon='BLANK1')
col.label(text=" - intel-level-zero-gpu version 1.3.24931 or newer", icon='BLANK1')
col.label(text=iface_(" - intel-level-zero-gpu version %s or newer") % driver_version,
icon='BLANK1', translate=False)
col.label(text=" - oneAPI Level-Zero Loader", icon='BLANK1')
elif device_type == 'METAL':
col.label(text="Requires Apple Silicon with macOS 12.2 or newer", icon='BLANK1')
col.label(text="or AMD with macOS 12.3 or newer", icon='BLANK1')
silicon_mac_version = "12.2"
amd_mac_version = "12.3"
col.label(text=iface_("Requires Apple Silicon with macOS %s or newer") % silicon_mac_version,
icon='BLANK1', translate=False)
col.label(text=iface_("or AMD with macOS %s or newer") % amd_mac_version, icon='BLANK1',
translate=False)
return
for device in devices:
@@ -1723,12 +1742,21 @@ class CyclesPreferences(bpy.types.AddonPreferences):
if compute_device_type == 'METAL':
import platform
# MetalRT only works on Apple Silicon at present, pending argument encoding fixes on AMD
# Kernel specialization is only viable on Apple Silicon at present due to relative compilation speed
if platform.machine() == 'arm64':
import re
is_navi_2 = False
for device in devices:
if re.search(r"((RX)|(Pro)|(PRO))\s+W?6\d00X", device.name):
is_navi_2 = True
break
# MetalRT only works on Apple Silicon and Navi2.
is_arm64 = platform.machine() == 'arm64'
if is_arm64 or is_navi_2:
col = layout.column()
col.use_property_split = True
col.prop(self, "kernel_optimization_level")
# Kernel specialization is only supported on Apple Silicon
if is_arm64:
col.prop(self, "kernel_optimization_level")
col.prop(self, "use_metalrt")
def draw(self, context):

View File

@@ -20,7 +20,7 @@ class CyclesPresetPanel(PresetPanel, Panel):
@staticmethod
def post_cb(context):
# Modify an arbitrary built-in scene property to force a depsgraph
# update, because add-on properties don't. (see T62325)
# update, because add-on properties don't. (see #62325)
render = context.scene.render
render.filter_size = render.filter_size

View File

@@ -105,11 +105,12 @@ GPUShader *BlenderFallbackDisplayShader::bind(int width, int height)
/* Bind shader now to enable uniform assignment. */
GPU_shader_bind(shader_program_);
GPU_shader_uniform_int(shader_program_, image_texture_location_, 0);
int slot = 0;
GPU_shader_uniform_int_ex(shader_program_, image_texture_location_, 1, 1, &slot);
float size[2];
size[0] = width;
size[1] = height;
GPU_shader_uniform_vector(shader_program_, fullscreen_location_, 2, 1, size);
GPU_shader_uniform_float_ex(shader_program_, fullscreen_location_, 2, 1, size);
return shader_program_;
}

View File

@@ -20,7 +20,7 @@ BlenderImageLoader::BlenderImageLoader(BL::Image b_image,
: b_image(b_image),
frame(frame),
tile_number(tile_number),
/* Don't free cache for preview render to avoid race condition from T93560, to be fixed
/* Don't free cache for preview render to avoid race condition from #93560, to be fixed
* properly later as we are close to release. */
free_cache(!is_preview_render && !b_image.has_data())
{
@@ -72,7 +72,7 @@ bool BlenderImageLoader::load_metadata(const ImageDeviceFeatures &, ImageMetaDat
metadata.colorspace = u_colorspace_raw;
}
else {
/* In some cases (e.g. T94135), the colorspace setting in Blender gets updated as part of the
/* In some cases (e.g. #94135), the colorspace setting in Blender gets updated as part of the
* metadata queries in this function, so update the colorspace setting here. */
PointerRNA colorspace_ptr = b_image.colorspace_settings().ptr;
metadata.colorspace = get_enum_identifier(colorspace_ptr, "name");

View File

@@ -24,7 +24,7 @@ void BlenderSync::sync_light(BL::Object &b_parent,
Light *light = light_map.find(key);
/* Check if the transform was modified, in case a linked collection is moved we do not get a
* specific depsgraph update (T88515). This also mimics the behavior for Objects. */
* specific depsgraph update (#88515). This also mimics the behavior for Objects. */
const bool tfm_updated = (light && light->get_tfm() != tfm);
/* Update if either object or light data changed. */

View File

@@ -404,7 +404,7 @@ void BlenderSession::render(BL::Depsgraph &b_depsgraph_)
* point we know that we've got everything to render current view layer.
*/
/* At the moment we only free if we are not doing multi-view
* (or if we are rendering the last view). See T58142/D4239 for discussion.
* (or if we are rendering the last view). See #58142/D4239 for discussion.
*/
if (view_index == num_views - 1) {
free_blender_memory_if_possible();

View File

@@ -766,7 +766,7 @@ void BlenderSync::free_data_after_sync(BL::Depsgraph &b_depsgraph)
(BlenderSession::headless || is_interface_locked) &&
/* Baking re-uses the depsgraph multiple times, clearing crashes
* reading un-evaluated mesh data which isn't aligned with the
* geometry we're baking, see T71012. */
* geometry we're baking, see #71012. */
!scene->bake_manager->get_baking() &&
/* Persistent data must main caches for performance and correctness. */
!is_persistent_data;

View File

@@ -53,8 +53,12 @@ void CUDADevice::set_error(const string &error)
}
CUDADevice::CUDADevice(const DeviceInfo &info, Stats &stats, Profiler &profiler)
: Device(info, stats, profiler), texture_info(this, "texture_info", MEM_GLOBAL)
: GPUDevice(info, stats, profiler)
{
/* Verify that base class types can be used with specific backend types */
static_assert(sizeof(texMemObject) == sizeof(CUtexObject));
static_assert(sizeof(arrayMemObject) == sizeof(CUarray));
first_error = true;
cuDevId = info.num;
@@ -65,12 +69,6 @@ CUDADevice::CUDADevice(const DeviceInfo &info, Stats &stats, Profiler &profiler)
need_texture_info = false;
device_texture_headroom = 0;
device_working_headroom = 0;
move_texture_to_host = false;
map_host_limit = 0;
map_host_used = 0;
can_map_host = 0;
pitch_alignment = 0;
/* Initialize CUDA. */
@@ -91,8 +89,9 @@ CUDADevice::CUDADevice(const DeviceInfo &info, Stats &stats, Profiler &profiler)
/* CU_CTX_MAP_HOST for mapping host memory when out of device memory.
* CU_CTX_LMEM_RESIZE_TO_MAX for reserving local memory ahead of render,
* so we can predict which memory to map to host. */
cuda_assert(
cuDeviceGetAttribute(&can_map_host, CU_DEVICE_ATTRIBUTE_CAN_MAP_HOST_MEMORY, cuDevice));
int value;
cuda_assert(cuDeviceGetAttribute(&value, CU_DEVICE_ATTRIBUTE_CAN_MAP_HOST_MEMORY, cuDevice));
can_map_host = value != 0;
cuda_assert(cuDeviceGetAttribute(
&pitch_alignment, CU_DEVICE_ATTRIBUTE_TEXTURE_PITCH_ALIGNMENT, cuDevice));
@@ -499,311 +498,57 @@ void CUDADevice::reserve_local_memory(const uint kernel_features)
# endif
}
void CUDADevice::init_host_memory()
{
/* Limit amount of host mapped memory, because allocating too much can
* cause system instability. Leave at least half or 4 GB of system
* memory free, whichever is smaller. */
size_t default_limit = 4 * 1024 * 1024 * 1024LL;
size_t system_ram = system_physical_ram();
if (system_ram > 0) {
if (system_ram / 2 > default_limit) {
map_host_limit = system_ram - default_limit;
}
else {
map_host_limit = system_ram / 2;
}
}
else {
VLOG_WARNING << "Mapped host memory disabled, failed to get system RAM";
map_host_limit = 0;
}
/* Amount of device memory to keep is free after texture memory
* and working memory allocations respectively. We set the working
* memory limit headroom lower so that some space is left after all
* texture memory allocations. */
device_working_headroom = 32 * 1024 * 1024LL; // 32MB
device_texture_headroom = 128 * 1024 * 1024LL; // 128MB
VLOG_INFO << "Mapped host memory limit set to " << string_human_readable_number(map_host_limit)
<< " bytes. (" << string_human_readable_size(map_host_limit) << ")";
}
void CUDADevice::load_texture_info()
{
if (need_texture_info) {
/* Unset flag before copying, so this does not loop indefinitely if the copy below calls
* into 'move_textures_to_host' (which calls 'load_texture_info' again). */
need_texture_info = false;
texture_info.copy_to_device();
}
}
void CUDADevice::move_textures_to_host(size_t size, bool for_texture)
{
/* Break out of recursive call, which can happen when moving memory on a multi device. */
static bool any_device_moving_textures_to_host = false;
if (any_device_moving_textures_to_host) {
return;
}
/* Signal to reallocate textures in host memory only. */
move_texture_to_host = true;
while (size > 0) {
/* Find suitable memory allocation to move. */
device_memory *max_mem = NULL;
size_t max_size = 0;
bool max_is_image = false;
thread_scoped_lock lock(cuda_mem_map_mutex);
foreach (CUDAMemMap::value_type &pair, cuda_mem_map) {
device_memory &mem = *pair.first;
CUDAMem *cmem = &pair.second;
/* Can only move textures allocated on this device (and not those from peer devices).
* And need to ignore memory that is already on the host. */
if (!mem.is_resident(this) || cmem->use_mapped_host) {
continue;
}
bool is_texture = (mem.type == MEM_TEXTURE || mem.type == MEM_GLOBAL) &&
(&mem != &texture_info);
bool is_image = is_texture && (mem.data_height > 1);
/* Can't move this type of memory. */
if (!is_texture || cmem->array) {
continue;
}
/* For other textures, only move image textures. */
if (for_texture && !is_image) {
continue;
}
/* Try to move largest allocation, prefer moving images. */
if (is_image > max_is_image || (is_image == max_is_image && mem.device_size > max_size)) {
max_is_image = is_image;
max_size = mem.device_size;
max_mem = &mem;
}
}
lock.unlock();
/* Move to host memory. This part is mutex protected since
* multiple CUDA devices could be moving the memory. The
* first one will do it, and the rest will adopt the pointer. */
if (max_mem) {
VLOG_WORK << "Move memory from device to host: " << max_mem->name;
static thread_mutex move_mutex;
thread_scoped_lock lock(move_mutex);
any_device_moving_textures_to_host = true;
/* Potentially need to call back into multi device, so pointer mapping
* and peer devices are updated. This is also necessary since the device
* pointer may just be a key here, so cannot be accessed and freed directly.
* Unfortunately it does mean that memory is reallocated on all other
* devices as well, which is potentially dangerous when still in use (since
* a thread rendering on another devices would only be caught in this mutex
* if it so happens to do an allocation at the same time as well. */
max_mem->device_copy_to();
size = (max_size >= size) ? 0 : size - max_size;
any_device_moving_textures_to_host = false;
}
else {
break;
}
}
/* Unset flag before texture info is reloaded, since it should stay in device memory. */
move_texture_to_host = false;
/* Update texture info array with new pointers. */
load_texture_info();
}
CUDADevice::CUDAMem *CUDADevice::generic_alloc(device_memory &mem, size_t pitch_padding)
void CUDADevice::get_device_memory_info(size_t &total, size_t &free)
{
CUDAContextScope scope(this);
CUdeviceptr device_pointer = 0;
size_t size = mem.memory_size() + pitch_padding;
CUresult mem_alloc_result = CUDA_ERROR_OUT_OF_MEMORY;
const char *status = "";
/* First try allocating in device memory, respecting headroom. We make
* an exception for texture info. It is small and frequently accessed,
* so treat it as working memory.
*
* If there is not enough room for working memory, we will try to move
* textures to host memory, assuming the performance impact would have
* been worse for working memory. */
bool is_texture = (mem.type == MEM_TEXTURE || mem.type == MEM_GLOBAL) && (&mem != &texture_info);
bool is_image = is_texture && (mem.data_height > 1);
size_t headroom = (is_texture) ? device_texture_headroom : device_working_headroom;
size_t total = 0, free = 0;
cuMemGetInfo(&free, &total);
/* Move textures to host memory if needed. */
if (!move_texture_to_host && !is_image && (size + headroom) >= free && can_map_host) {
move_textures_to_host(size + headroom - free, is_texture);
cuMemGetInfo(&free, &total);
}
/* Allocate in device memory. */
if (!move_texture_to_host && (size + headroom) < free) {
mem_alloc_result = cuMemAlloc(&device_pointer, size);
if (mem_alloc_result == CUDA_SUCCESS) {
status = " in device memory";
}
}
/* Fall back to mapped host memory if needed and possible. */
void *shared_pointer = 0;
if (mem_alloc_result != CUDA_SUCCESS && can_map_host && mem.type != MEM_DEVICE_ONLY) {
if (mem.shared_pointer) {
/* Another device already allocated host memory. */
mem_alloc_result = CUDA_SUCCESS;
shared_pointer = mem.shared_pointer;
}
else if (map_host_used + size < map_host_limit) {
/* Allocate host memory ourselves. */
mem_alloc_result = cuMemHostAlloc(
&shared_pointer, size, CU_MEMHOSTALLOC_DEVICEMAP | CU_MEMHOSTALLOC_WRITECOMBINED);
assert((mem_alloc_result == CUDA_SUCCESS && shared_pointer != 0) ||
(mem_alloc_result != CUDA_SUCCESS && shared_pointer == 0));
}
if (mem_alloc_result == CUDA_SUCCESS) {
cuda_assert(cuMemHostGetDevicePointer_v2(&device_pointer, shared_pointer, 0));
map_host_used += size;
status = " in host memory";
}
}
if (mem_alloc_result != CUDA_SUCCESS) {
if (mem.type == MEM_DEVICE_ONLY) {
status = " failed, out of device memory";
set_error("System is out of GPU memory");
}
else {
status = " failed, out of device and host memory";
set_error("System is out of GPU and shared host memory");
}
}
if (mem.name) {
VLOG_WORK << "Buffer allocate: " << mem.name << ", "
<< string_human_readable_number(mem.memory_size()) << " bytes. ("
<< string_human_readable_size(mem.memory_size()) << ")" << status;
}
mem.device_pointer = (device_ptr)device_pointer;
mem.device_size = size;
stats.mem_alloc(size);
if (!mem.device_pointer) {
return NULL;
}
/* Insert into map of allocations. */
thread_scoped_lock lock(cuda_mem_map_mutex);
CUDAMem *cmem = &cuda_mem_map[&mem];
if (shared_pointer != 0) {
/* Replace host pointer with our host allocation. Only works if
* CUDA memory layout is the same and has no pitch padding. Also
* does not work if we move textures to host during a render,
* since other devices might be using the memory. */
if (!move_texture_to_host && pitch_padding == 0 && mem.host_pointer &&
mem.host_pointer != shared_pointer) {
memcpy(shared_pointer, mem.host_pointer, size);
/* A call to device_memory::host_free() should be preceded by
* a call to device_memory::device_free() for host memory
* allocated by a device to be handled properly. Two exceptions
* are here and a call in OptiXDevice::generic_alloc(), where
* the current host memory can be assumed to be allocated by
* device_memory::host_alloc(), not by a device. */
mem.host_free();
mem.host_pointer = shared_pointer;
}
mem.shared_pointer = shared_pointer;
mem.shared_counter++;
cmem->use_mapped_host = true;
}
else {
cmem->use_mapped_host = false;
}
return cmem;
}
void CUDADevice::generic_copy_to(device_memory &mem)
bool CUDADevice::alloc_device(void *&device_pointer, size_t size)
{
if (!mem.host_pointer || !mem.device_pointer) {
return;
}
CUDAContextScope scope(this);
/* If use_mapped_host of mem is false, the current device only uses device memory allocated by
* cuMemAlloc regardless of mem.host_pointer and mem.shared_pointer, and should copy data from
* mem.host_pointer. */
thread_scoped_lock lock(cuda_mem_map_mutex);
if (!cuda_mem_map[&mem].use_mapped_host || mem.host_pointer != mem.shared_pointer) {
const CUDAContextScope scope(this);
cuda_assert(
cuMemcpyHtoD((CUdeviceptr)mem.device_pointer, mem.host_pointer, mem.memory_size()));
}
CUresult mem_alloc_result = cuMemAlloc((CUdeviceptr *)&device_pointer, size);
return mem_alloc_result == CUDA_SUCCESS;
}
void CUDADevice::generic_free(device_memory &mem)
void CUDADevice::free_device(void *device_pointer)
{
if (mem.device_pointer) {
CUDAContextScope scope(this);
thread_scoped_lock lock(cuda_mem_map_mutex);
DCHECK(cuda_mem_map.find(&mem) != cuda_mem_map.end());
const CUDAMem &cmem = cuda_mem_map[&mem];
CUDAContextScope scope(this);
/* If cmem.use_mapped_host is true, reference counting is used
* to safely free a mapped host memory. */
cuda_assert(cuMemFree((CUdeviceptr)device_pointer));
}
if (cmem.use_mapped_host) {
assert(mem.shared_pointer);
if (mem.shared_pointer) {
assert(mem.shared_counter > 0);
if (--mem.shared_counter == 0) {
if (mem.host_pointer == mem.shared_pointer) {
mem.host_pointer = 0;
}
cuMemFreeHost(mem.shared_pointer);
mem.shared_pointer = 0;
}
}
map_host_used -= mem.device_size;
}
else {
/* Free device memory. */
cuda_assert(cuMemFree(mem.device_pointer));
}
bool CUDADevice::alloc_host(void *&shared_pointer, size_t size)
{
CUDAContextScope scope(this);
stats.mem_free(mem.device_size);
mem.device_pointer = 0;
mem.device_size = 0;
CUresult mem_alloc_result = cuMemHostAlloc(
&shared_pointer, size, CU_MEMHOSTALLOC_DEVICEMAP | CU_MEMHOSTALLOC_WRITECOMBINED);
return mem_alloc_result == CUDA_SUCCESS;
}
cuda_mem_map.erase(cuda_mem_map.find(&mem));
}
void CUDADevice::free_host(void *shared_pointer)
{
CUDAContextScope scope(this);
cuMemFreeHost(shared_pointer);
}
bool CUDADevice::transform_host_pointer(void *&device_pointer, void *&shared_pointer)
{
CUDAContextScope scope(this);
cuda_assert(cuMemHostGetDevicePointer_v2((CUdeviceptr *)&device_pointer, shared_pointer, 0));
return true;
}
void CUDADevice::copy_host_to_device(void *device_pointer, void *host_pointer, size_t size)
{
const CUDAContextScope scope(this);
cuda_assert(cuMemcpyHtoD((CUdeviceptr)device_pointer, host_pointer, size));
}
void CUDADevice::mem_alloc(device_memory &mem)
@@ -868,8 +613,8 @@ void CUDADevice::mem_zero(device_memory &mem)
/* If use_mapped_host of mem is false, mem.device_pointer currently refers to device memory
* regardless of mem.host_pointer and mem.shared_pointer. */
thread_scoped_lock lock(cuda_mem_map_mutex);
if (!cuda_mem_map[&mem].use_mapped_host || mem.host_pointer != mem.shared_pointer) {
thread_scoped_lock lock(device_mem_map_mutex);
if (!device_mem_map[&mem].use_mapped_host || mem.host_pointer != mem.shared_pointer) {
const CUDAContextScope scope(this);
cuda_assert(cuMemsetD8((CUdeviceptr)mem.device_pointer, 0, mem.memory_size()));
}
@@ -994,19 +739,19 @@ void CUDADevice::tex_alloc(device_texture &mem)
return;
}
CUDAMem *cmem = NULL;
Mem *cmem = NULL;
CUarray array_3d = NULL;
size_t src_pitch = mem.data_width * dsize * mem.data_elements;
size_t dst_pitch = src_pitch;
if (!mem.is_resident(this)) {
thread_scoped_lock lock(cuda_mem_map_mutex);
cmem = &cuda_mem_map[&mem];
thread_scoped_lock lock(device_mem_map_mutex);
cmem = &device_mem_map[&mem];
cmem->texobject = 0;
if (mem.data_depth > 1) {
array_3d = (CUarray)mem.device_pointer;
cmem->array = array_3d;
cmem->array = reinterpret_cast<arrayMemObject>(array_3d);
}
else if (mem.data_height > 0) {
dst_pitch = align_up(src_pitch, pitch_alignment);
@@ -1050,10 +795,10 @@ void CUDADevice::tex_alloc(device_texture &mem)
mem.device_size = size;
stats.mem_alloc(size);
thread_scoped_lock lock(cuda_mem_map_mutex);
cmem = &cuda_mem_map[&mem];
thread_scoped_lock lock(device_mem_map_mutex);
cmem = &device_mem_map[&mem];
cmem->texobject = 0;
cmem->array = array_3d;
cmem->array = reinterpret_cast<arrayMemObject>(array_3d);
}
else if (mem.data_height > 0) {
/* 2D texture, using pitch aligned linear memory. */
@@ -1137,8 +882,8 @@ void CUDADevice::tex_alloc(device_texture &mem)
texDesc.filterMode = filter_mode;
texDesc.flags = CU_TRSF_NORMALIZED_COORDINATES;
thread_scoped_lock lock(cuda_mem_map_mutex);
cmem = &cuda_mem_map[&mem];
thread_scoped_lock lock(device_mem_map_mutex);
cmem = &device_mem_map[&mem];
cuda_assert(cuTexObjectCreate(&cmem->texobject, &resDesc, &texDesc, NULL));
@@ -1153,9 +898,9 @@ void CUDADevice::tex_free(device_texture &mem)
{
if (mem.device_pointer) {
CUDAContextScope scope(this);
thread_scoped_lock lock(cuda_mem_map_mutex);
DCHECK(cuda_mem_map.find(&mem) != cuda_mem_map.end());
const CUDAMem &cmem = cuda_mem_map[&mem];
thread_scoped_lock lock(device_mem_map_mutex);
DCHECK(device_mem_map.find(&mem) != device_mem_map.end());
const Mem &cmem = device_mem_map[&mem];
if (cmem.texobject) {
/* Free bindless texture. */
@@ -1164,16 +909,16 @@ void CUDADevice::tex_free(device_texture &mem)
if (!mem.is_resident(this)) {
/* Do not free memory here, since it was allocated on a different device. */
cuda_mem_map.erase(cuda_mem_map.find(&mem));
device_mem_map.erase(device_mem_map.find(&mem));
}
else if (cmem.array) {
/* Free array. */
cuArrayDestroy(cmem.array);
cuArrayDestroy(reinterpret_cast<CUarray>(cmem.array));
stats.mem_free(mem.device_size);
mem.device_pointer = 0;
mem.device_size = 0;
cuda_mem_map.erase(cuda_mem_map.find(&mem));
device_mem_map.erase(device_mem_map.find(&mem));
}
else {
lock.unlock();


@@ -21,7 +21,7 @@ CCL_NAMESPACE_BEGIN
class DeviceQueue;
class CUDADevice : public Device {
class CUDADevice : public GPUDevice {
friend class CUDAContextScope;
@@ -29,36 +29,11 @@ class CUDADevice : public Device {
CUdevice cuDevice;
CUcontext cuContext;
CUmodule cuModule;
size_t device_texture_headroom;
size_t device_working_headroom;
bool move_texture_to_host;
size_t map_host_used;
size_t map_host_limit;
int can_map_host;
int pitch_alignment;
int cuDevId;
int cuDevArchitecture;
bool first_error;
struct CUDAMem {
CUDAMem() : texobject(0), array(0), use_mapped_host(false)
{
}
CUtexObject texobject;
CUarray array;
/* If true, a mapped host memory in shared_pointer is being used. */
bool use_mapped_host;
};
typedef map<device_memory *, CUDAMem> CUDAMemMap;
CUDAMemMap cuda_mem_map;
thread_mutex cuda_mem_map_mutex;
/* Bindless Textures */
device_vector<TextureInfo> texture_info;
bool need_texture_info;
CUDADeviceKernels kernels;
static bool have_precompiled_kernels();
@@ -88,17 +63,13 @@ class CUDADevice : public Device {
void reserve_local_memory(const uint kernel_features);
void init_host_memory();
void load_texture_info();
void move_textures_to_host(size_t size, bool for_texture);
CUDAMem *generic_alloc(device_memory &mem, size_t pitch_padding = 0);
void generic_copy_to(device_memory &mem);
void generic_free(device_memory &mem);
virtual void get_device_memory_info(size_t &total, size_t &free) override;
virtual bool alloc_device(void *&device_pointer, size_t size) override;
virtual void free_device(void *device_pointer) override;
virtual bool alloc_host(void *&shared_pointer, size_t size) override;
virtual void free_host(void *shared_pointer) override;
virtual bool transform_host_pointer(void *&device_pointer, void *&shared_pointer) override;
virtual void copy_host_to_device(void *device_pointer, void *host_pointer, size_t size) override;
void mem_alloc(device_memory &mem) override;


@@ -452,6 +452,320 @@ void *Device::get_cpu_osl_memory()
return nullptr;
}
GPUDevice::~GPUDevice() noexcept(false)
{
}
bool GPUDevice::load_texture_info()
{
if (need_texture_info) {
/* Unset flag before copying, so this does not loop indefinitely if the copy below calls
* into 'move_textures_to_host' (which calls 'load_texture_info' again). */
need_texture_info = false;
texture_info.copy_to_device();
return true;
}
else {
return false;
}
}
void GPUDevice::init_host_memory(size_t preferred_texture_headroom,
size_t preferred_working_headroom)
{
/* Limit amount of host mapped memory, because allocating too much can
* cause system instability. Leave at least half or 4 GB of system
* memory free, whichever is smaller. */
size_t default_limit = 4 * 1024 * 1024 * 1024LL;
size_t system_ram = system_physical_ram();
if (system_ram > 0) {
if (system_ram / 2 > default_limit) {
map_host_limit = system_ram - default_limit;
}
else {
map_host_limit = system_ram / 2;
}
}
else {
VLOG_WARNING << "Mapped host memory disabled, failed to get system RAM";
map_host_limit = 0;
}
/* Amount of device memory to keep free after texture memory
* and working memory allocations respectively. We set the working
* memory headroom lower than the texture headroom so that texture
* allocations still leave space for working memory. */
device_working_headroom = preferred_working_headroom > 0 ? preferred_working_headroom :
32 * 1024 * 1024LL; // 32MB
device_texture_headroom = preferred_texture_headroom > 0 ? preferred_texture_headroom :
128 * 1024 * 1024LL; // 128MB
VLOG_INFO << "Mapped host memory limit set to " << string_human_readable_number(map_host_limit)
<< " bytes. (" << string_human_readable_size(map_host_limit) << ")";
}
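
For illustration, the limit heuristic above as a standalone computation. This is a minimal sketch; `compute_map_host_limit` and the sample RAM sizes are hypothetical, not part of the Cycles API:

#include <cstddef>
#include <cstdio>

/* Leave free the smaller of half the system RAM or 4 GB, i.e. the limit is
 * max(ram - 4GB, ram / 2). */
static size_t compute_map_host_limit(size_t system_ram)
{
  const size_t default_limit = 4ULL * 1024 * 1024 * 1024;
  if (system_ram == 0) {
    return 0; /* Unknown RAM size: disable mapped host memory. */
  }
  return (system_ram / 2 > default_limit) ? system_ram - default_limit : system_ram / 2;
}

int main()
{
  printf("%zu GB\n", compute_map_host_limit(64ULL << 30) >> 30); /* 60: leaves 4 GB free */
  printf("%zu GB\n", compute_map_host_limit(6ULL << 30) >> 30);  /* 3: leaves half free */
  return 0;
}
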
void GPUDevice::move_textures_to_host(size_t size, bool for_texture)
{
/* Break out of recursive call, which can happen when moving memory on a multi device. */
static bool any_device_moving_textures_to_host = false;
if (any_device_moving_textures_to_host) {
return;
}
/* Signal to reallocate textures in host memory only. */
move_texture_to_host = true;
while (size > 0) {
/* Find suitable memory allocation to move. */
device_memory *max_mem = NULL;
size_t max_size = 0;
bool max_is_image = false;
thread_scoped_lock lock(device_mem_map_mutex);
foreach (MemMap::value_type &pair, device_mem_map) {
device_memory &mem = *pair.first;
Mem *cmem = &pair.second;
/* Can only move textures allocated on this device (and not those from peer devices).
* And need to ignore memory that is already on the host. */
if (!mem.is_resident(this) || cmem->use_mapped_host) {
continue;
}
bool is_texture = (mem.type == MEM_TEXTURE || mem.type == MEM_GLOBAL) &&
(&mem != &texture_info);
bool is_image = is_texture && (mem.data_height > 1);
/* Can't move this type of memory. */
if (!is_texture || cmem->array) {
continue;
}
/* For other textures, only move image textures. */
if (for_texture && !is_image) {
continue;
}
/* Try to move largest allocation, prefer moving images. */
if (is_image > max_is_image || (is_image == max_is_image && mem.device_size > max_size)) {
max_is_image = is_image;
max_size = mem.device_size;
max_mem = &mem;
}
}
lock.unlock();
/* Move to host memory. This part is mutex protected since
* multiple backend devices could be moving the memory. The
* first one will do it, and the rest will adopt the pointer. */
if (max_mem) {
VLOG_WORK << "Move memory from device to host: " << max_mem->name;
static thread_mutex move_mutex;
thread_scoped_lock lock(move_mutex);
any_device_moving_textures_to_host = true;
/* Potentially need to call back into multi device, so pointer mapping
* and peer devices are updated. This is also necessary since the device
* pointer may just be a key here, so cannot be accessed and freed directly.
* Unfortunately it does mean that memory is reallocated on all other
* devices as well, which is potentially dangerous when still in use (since
* a thread rendering on another device would only be caught in this mutex
* if it so happens to do an allocation at the same time as well). */
max_mem->device_copy_to();
size = (max_size >= size) ? 0 : size - max_size;
any_device_moving_textures_to_host = false;
}
else {
break;
}
}
/* Unset flag before texture info is reloaded, since it should stay in device memory. */
move_texture_to_host = false;
/* Update texture info array with new pointers. */
load_texture_info();
}
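
The victim selection inside the loop above, condensed into a standalone sketch. The `Alloc` record and `pick_move_candidate` are hypothetical stand-ins for the `device_mem_map` iteration:

#include <cstddef>
#include <vector>

struct Alloc {
  bool movable;  /* Resident on this device, not host-mapped, not an array. */
  bool is_image; /* Texture with data_height > 1. */
  size_t size;
};

/* Prefer images over non-image textures; break ties by largest size. */
static const Alloc *pick_move_candidate(const std::vector<Alloc> &allocs)
{
  const Alloc *best = nullptr;
  for (const Alloc &a : allocs) {
    if (!a.movable) {
      continue;
    }
    if (!best || a.is_image > best->is_image ||
        (a.is_image == best->is_image && a.size > best->size)) {
      best = &a;
    }
  }
  return best;
}
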
GPUDevice::Mem *GPUDevice::generic_alloc(device_memory &mem, size_t pitch_padding)
{
void *device_pointer = 0;
size_t size = mem.memory_size() + pitch_padding;
bool mem_alloc_result = false;
const char *status = "";
/* First try allocating in device memory, respecting headroom. We make
* an exception for texture info. It is small and frequently accessed,
* so treat it as working memory.
*
* If there is not enough room for working memory, we will try to move
* textures to host memory, assuming the performance impact would have
* been worse for working memory. */
bool is_texture = (mem.type == MEM_TEXTURE || mem.type == MEM_GLOBAL) && (&mem != &texture_info);
bool is_image = is_texture && (mem.data_height > 1);
size_t headroom = (is_texture) ? device_texture_headroom : device_working_headroom;
size_t total = 0, free = 0;
get_device_memory_info(total, free);
/* Move textures to host memory if needed. */
if (!move_texture_to_host && !is_image && (size + headroom) >= free && can_map_host) {
move_textures_to_host(size + headroom - free, is_texture);
get_device_memory_info(total, free);
}
/* Allocate in device memory. */
if (!move_texture_to_host && (size + headroom) < free) {
mem_alloc_result = alloc_device(device_pointer, size);
if (mem_alloc_result) {
device_mem_in_use += size;
status = " in device memory";
}
}
/* Fall back to mapped host memory if needed and possible. */
void *shared_pointer = 0;
if (!mem_alloc_result && can_map_host && mem.type != MEM_DEVICE_ONLY) {
if (mem.shared_pointer) {
/* Another device already allocated host memory. */
mem_alloc_result = true;
shared_pointer = mem.shared_pointer;
}
else if (map_host_used + size < map_host_limit) {
/* Allocate host memory ourselves. */
mem_alloc_result = alloc_host(shared_pointer, size);
assert((mem_alloc_result && shared_pointer != 0) ||
(!mem_alloc_result && shared_pointer == 0));
}
if (mem_alloc_result) {
assert(transform_host_pointer(device_pointer, shared_pointer));
map_host_used += size;
status = " in host memory";
}
}
if (!mem_alloc_result) {
if (mem.type == MEM_DEVICE_ONLY) {
status = " failed, out of device memory";
set_error("System is out of GPU memory");
}
else {
status = " failed, out of device and host memory";
set_error("System is out of GPU and shared host memory");
}
}
if (mem.name) {
VLOG_WORK << "Buffer allocate: " << mem.name << ", "
<< string_human_readable_number(mem.memory_size()) << " bytes. ("
<< string_human_readable_size(mem.memory_size()) << ")" << status;
}
mem.device_pointer = (device_ptr)device_pointer;
mem.device_size = size;
stats.mem_alloc(size);
if (!mem.device_pointer) {
return NULL;
}
/* Insert into map of allocations. */
thread_scoped_lock lock(device_mem_map_mutex);
Mem *cmem = &device_mem_map[&mem];
if (shared_pointer != 0) {
/* Replace host pointer with our host allocation. Only works if
* memory layout is the same and has no pitch padding. Also
* does not work if we move textures to host during a render,
* since other devices might be using the memory. */
if (!move_texture_to_host && pitch_padding == 0 && mem.host_pointer &&
mem.host_pointer != shared_pointer) {
memcpy(shared_pointer, mem.host_pointer, size);
/* A call to device_memory::host_free() should be preceded by
* a call to device_memory::device_free() for host memory
* allocated by a device to be handled properly. Two exceptions
* are here and a call in OptiXDevice::generic_alloc(), where
* the current host memory can be assumed to be allocated by
* device_memory::host_alloc(), not by a device. */
mem.host_free();
mem.host_pointer = shared_pointer;
}
mem.shared_pointer = shared_pointer;
mem.shared_counter++;
cmem->use_mapped_host = true;
}
else {
cmem->use_mapped_host = false;
}
return cmem;
}
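
Stripped of logging and bookkeeping, `generic_alloc` is a three-stage fallback. A minimal sketch under that reading; `try_device` and `try_mapped_host` are hypothetical stubs standing in for the virtual hooks:

#include <cstddef>

enum class AllocResult { Device, MappedHost, Failed };

/* Stubs standing in for alloc_device() and for alloc_host() +
 * transform_host_pointer(). */
static bool try_device(size_t /*size*/) { return true; }
static bool try_mapped_host(size_t /*size*/) { return true; }

static AllocResult generic_alloc_order(
    size_t size, size_t headroom, size_t free_mem, bool can_map_host, bool device_only)
{
  /* 1) Device memory, if the allocation fits below the headroom line. */
  if ((size + headroom) < free_mem && try_device(size)) {
    return AllocResult::Device;
  }
  /* 2) Mapped host memory, unless the buffer must stay device-only. */
  if (can_map_host && !device_only && try_mapped_host(size)) {
    return AllocResult::MappedHost;
  }
  /* 3) Out of memory on both sides. */
  return AllocResult::Failed;
}
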
void GPUDevice::generic_free(device_memory &mem)
{
if (mem.device_pointer) {
thread_scoped_lock lock(device_mem_map_mutex);
DCHECK(device_mem_map.find(&mem) != device_mem_map.end());
const Mem &cmem = device_mem_map[&mem];
/* If cmem.use_mapped_host is true, reference counting is used
* to safely free a mapped host memory. */
if (cmem.use_mapped_host) {
assert(mem.shared_pointer);
if (mem.shared_pointer) {
assert(mem.shared_counter > 0);
if (--mem.shared_counter == 0) {
if (mem.host_pointer == mem.shared_pointer) {
mem.host_pointer = 0;
}
free_host(mem.shared_pointer);
mem.shared_pointer = 0;
}
}
map_host_used -= mem.device_size;
}
else {
/* Free device memory. */
free_device((void *)mem.device_pointer);
device_mem_in_use -= mem.device_size;
}
stats.mem_free(mem.device_size);
mem.device_pointer = 0;
mem.device_size = 0;
device_mem_map.erase(device_mem_map.find(&mem));
}
}
void GPUDevice::generic_copy_to(device_memory &mem)
{
if (!mem.host_pointer || !mem.device_pointer) {
return;
}
/* If use_mapped_host of mem is false, the current device only uses device memory allocated
* through the backend's device allocation, regardless of mem.host_pointer and
* mem.shared_pointer, and should copy data from mem.host_pointer. */
thread_scoped_lock lock(device_mem_map_mutex);
if (!device_mem_map[&mem].use_mapped_host || mem.host_pointer != mem.shared_pointer) {
copy_host_to_device((void *)mem.device_pointer, mem.host_pointer, mem.memory_size());
}
}
/* DeviceInfo */
CCL_NAMESPACE_END


@@ -309,6 +309,93 @@ class Device {
static uint devices_initialized_mask;
};
/* A GPU device, with common functionality shared between GPU backends. */
class GPUDevice : public Device {
protected:
GPUDevice(const DeviceInfo &info_, Stats &stats_, Profiler &profiler_)
: Device(info_, stats_, profiler_),
texture_info(this, "texture_info", MEM_GLOBAL),
need_texture_info(false),
can_map_host(false),
map_host_used(0),
map_host_limit(0),
device_texture_headroom(0),
device_working_headroom(0),
device_mem_map(),
device_mem_map_mutex(),
move_texture_to_host(false),
device_mem_in_use(0)
{
}
public:
virtual ~GPUDevice() noexcept(false);
/* For GPUs that can use bindless textures in some way or another. */
device_vector<TextureInfo> texture_info;
bool need_texture_info;
/* Returns true if the texture info was copied to the device (meaning, some more
* re-initialization might be needed). */
virtual bool load_texture_info();
protected:
/* Memory allocation, only accessed through device_memory. */
friend class device_memory;
bool can_map_host;
size_t map_host_used;
size_t map_host_limit;
size_t device_texture_headroom;
size_t device_working_headroom;
typedef unsigned long long texMemObject;
typedef unsigned long long arrayMemObject;
struct Mem {
Mem() : texobject(0), array(0), use_mapped_host(false)
{
}
texMemObject texobject;
arrayMemObject array;
/* If true, a mapped host memory in shared_pointer is being used. */
bool use_mapped_host;
};
typedef map<device_memory *, Mem> MemMap;
MemMap device_mem_map;
thread_mutex device_mem_map_mutex;
bool move_texture_to_host;
/* Simple counter that approximately tracks the amount of device memory in use. */
size_t device_mem_in_use;
virtual void init_host_memory(size_t preferred_texture_headroom = 0,
size_t preferred_working_headroom = 0);
virtual void move_textures_to_host(size_t size, bool for_texture);
/* Allocation, deallocation and copy functions, implemented on top of the
* backend-specific device/host hooks below. */
virtual GPUDevice::Mem *generic_alloc(device_memory &mem, size_t pitch_padding = 0);
virtual void generic_free(device_memory &mem);
virtual void generic_copy_to(device_memory &mem);
/* total - total amount of device memory; free - amount of device memory currently available. */
virtual void get_device_memory_info(size_t &total, size_t &free) = 0;
virtual bool alloc_device(void *&device_pointer, size_t size) = 0;
virtual void free_device(void *device_pointer) = 0;
virtual bool alloc_host(void *&shared_pointer, size_t size) = 0;
virtual void free_host(void *shared_pointer) = 0;
/* This function should return the device pointer corresponding to the shared
* pointer, which is a host buffer allocated in `alloc_host`. It should return
* `true` if such an address transformation is possible and `false` otherwise. */
virtual bool transform_host_pointer(void *&device_pointer, void *&shared_pointer) = 0;
virtual void copy_host_to_device(void *device_pointer, void *host_pointer, size_t size) = 0;
};
CCL_NAMESPACE_END
#endif /* __DEVICE_H__ */
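
To make the contract concrete, a minimal hypothetical backend mirroring the pure-virtual hook set above, implemented with plain `malloc` so it stays self-contained; a real backend wraps cuMemAlloc/hipMalloc and friends instead:

#include <cstdlib>
#include <cstring>

/* Hypothetical mirror of the GPUDevice hooks; illustration only. */
struct FakeGPUBackend {
  bool alloc_device(void *&device_pointer, size_t size)
  {
    device_pointer = malloc(size);
    return device_pointer != nullptr;
  }
  void free_device(void *device_pointer)
  {
    free(device_pointer);
  }
  bool alloc_host(void *&shared_pointer, size_t size)
  {
    shared_pointer = malloc(size);
    return shared_pointer != nullptr;
  }
  void free_host(void *shared_pointer)
  {
    free(shared_pointer);
  }
  bool transform_host_pointer(void *&device_pointer, void *&shared_pointer)
  {
    device_pointer = shared_pointer; /* This fake has a unified address space. */
    return true;
  }
  void copy_host_to_device(void *device_pointer, void *host_pointer, size_t size)
  {
    memcpy(device_pointer, host_pointer, size);
  }
};
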


@@ -53,8 +53,12 @@ void HIPDevice::set_error(const string &error)
}
HIPDevice::HIPDevice(const DeviceInfo &info, Stats &stats, Profiler &profiler)
: Device(info, stats, profiler), texture_info(this, "texture_info", MEM_GLOBAL)
: GPUDevice(info, stats, profiler)
{
/* Verify that base class types can be used with specific backend types */
static_assert(sizeof(texMemObject) == sizeof(hipTextureObject_t));
static_assert(sizeof(arrayMemObject) == sizeof(hArray));
first_error = true;
hipDevId = info.num;
@@ -65,12 +69,6 @@ HIPDevice::HIPDevice(const DeviceInfo &info, Stats &stats, Profiler &profiler)
need_texture_info = false;
device_texture_headroom = 0;
device_working_headroom = 0;
move_texture_to_host = false;
map_host_limit = 0;
map_host_used = 0;
can_map_host = 0;
pitch_alignment = 0;
/* Initialize HIP. */
@@ -91,7 +89,9 @@ HIPDevice::HIPDevice(const DeviceInfo &info, Stats &stats, Profiler &profiler)
/* hipDeviceMapHost for mapping host memory when out of device memory.
* hipDeviceLmemResizeToMax for reserving local memory ahead of render,
* so we can predict which memory to map to host. */
hip_assert(hipDeviceGetAttribute(&can_map_host, hipDeviceAttributeCanMapHostMemory, hipDevice));
int value;
hip_assert(hipDeviceGetAttribute(&value, hipDeviceAttributeCanMapHostMemory, hipDevice));
can_map_host = value != 0;
hip_assert(
hipDeviceGetAttribute(&pitch_alignment, hipDeviceAttributeTexturePitchAlignment, hipDevice));
@@ -460,305 +460,58 @@ void HIPDevice::reserve_local_memory(const uint kernel_features)
# endif
}
void HIPDevice::init_host_memory()
{
/* Limit amount of host mapped memory, because allocating too much can
* cause system instability. Leave at least half or 4 GB of system
* memory free, whichever is smaller. */
size_t default_limit = 4 * 1024 * 1024 * 1024LL;
size_t system_ram = system_physical_ram();
if (system_ram > 0) {
if (system_ram / 2 > default_limit) {
map_host_limit = system_ram - default_limit;
}
else {
map_host_limit = system_ram / 2;
}
}
else {
VLOG_WARNING << "Mapped host memory disabled, failed to get system RAM";
map_host_limit = 0;
}
/* Amount of device memory to keep free after texture memory
* and working memory allocations respectively. We set the working
* memory limit headroom lower so that some space is left after all
* texture memory allocations. */
device_working_headroom = 32 * 1024 * 1024LL; // 32MB
device_texture_headroom = 128 * 1024 * 1024LL; // 128MB
VLOG_INFO << "Mapped host memory limit set to " << string_human_readable_number(map_host_limit)
<< " bytes. (" << string_human_readable_size(map_host_limit) << ")";
}
void HIPDevice::load_texture_info()
{
if (need_texture_info) {
/* Unset flag before copying, so this does not loop indefinitely if the copy below calls
* into 'move_textures_to_host' (which calls 'load_texture_info' again). */
need_texture_info = false;
texture_info.copy_to_device();
}
}
void HIPDevice::move_textures_to_host(size_t size, bool for_texture)
{
/* Break out of recursive call, which can happen when moving memory on a multi device. */
static bool any_device_moving_textures_to_host = false;
if (any_device_moving_textures_to_host) {
return;
}
/* Signal to reallocate textures in host memory only. */
move_texture_to_host = true;
while (size > 0) {
/* Find suitable memory allocation to move. */
device_memory *max_mem = NULL;
size_t max_size = 0;
bool max_is_image = false;
thread_scoped_lock lock(hip_mem_map_mutex);
foreach (HIPMemMap::value_type &pair, hip_mem_map) {
device_memory &mem = *pair.first;
HIPMem *cmem = &pair.second;
/* Can only move textures allocated on this device (and not those from peer devices).
* And need to ignore memory that is already on the host. */
if (!mem.is_resident(this) || cmem->use_mapped_host) {
continue;
}
bool is_texture = (mem.type == MEM_TEXTURE || mem.type == MEM_GLOBAL) &&
(&mem != &texture_info);
bool is_image = is_texture && (mem.data_height > 1);
/* Can't move this type of memory. */
if (!is_texture || cmem->array) {
continue;
}
/* For other textures, only move image textures. */
if (for_texture && !is_image) {
continue;
}
/* Try to move largest allocation, prefer moving images. */
if (is_image > max_is_image || (is_image == max_is_image && mem.device_size > max_size)) {
max_is_image = is_image;
max_size = mem.device_size;
max_mem = &mem;
}
}
lock.unlock();
/* Move to host memory. This part is mutex protected since
* multiple HIP devices could be moving the memory. The
* first one will do it, and the rest will adopt the pointer. */
if (max_mem) {
VLOG_WORK << "Move memory from device to host: " << max_mem->name;
static thread_mutex move_mutex;
thread_scoped_lock lock(move_mutex);
any_device_moving_textures_to_host = true;
/* Potentially need to call back into multi device, so pointer mapping
* and peer devices are updated. This is also necessary since the device
* pointer may just be a key here, so cannot be accessed and freed directly.
* Unfortunately it does mean that memory is reallocated on all other
* devices as well, which is potentially dangerous when still in use (since
* a thread rendering on another device would only be caught in this mutex
* if it so happens to do an allocation at the same time as well). */
max_mem->device_copy_to();
size = (max_size >= size) ? 0 : size - max_size;
any_device_moving_textures_to_host = false;
}
else {
break;
}
}
/* Unset flag before texture info is reloaded, since it should stay in device memory. */
move_texture_to_host = false;
/* Update texture info array with new pointers. */
load_texture_info();
}
HIPDevice::HIPMem *HIPDevice::generic_alloc(device_memory &mem, size_t pitch_padding)
void HIPDevice::get_device_memory_info(size_t &total, size_t &free)
{
HIPContextScope scope(this);
hipDeviceptr_t device_pointer = 0;
size_t size = mem.memory_size() + pitch_padding;
hipError_t mem_alloc_result = hipErrorOutOfMemory;
const char *status = "";
/* First try allocating in device memory, respecting headroom. We make
* an exception for texture info. It is small and frequently accessed,
* so treat it as working memory.
*
* If there is not enough room for working memory, we will try to move
* textures to host memory, assuming the performance impact would have
* been worse for working memory. */
bool is_texture = (mem.type == MEM_TEXTURE || mem.type == MEM_GLOBAL) && (&mem != &texture_info);
bool is_image = is_texture && (mem.data_height > 1);
size_t headroom = (is_texture) ? device_texture_headroom : device_working_headroom;
size_t total = 0, free = 0;
hipMemGetInfo(&free, &total);
/* Move textures to host memory if needed. */
if (!move_texture_to_host && !is_image && (size + headroom) >= free && can_map_host) {
move_textures_to_host(size + headroom - free, is_texture);
hipMemGetInfo(&free, &total);
}
/* Allocate in device memory. */
if (!move_texture_to_host && (size + headroom) < free) {
mem_alloc_result = hipMalloc(&device_pointer, size);
if (mem_alloc_result == hipSuccess) {
status = " in device memory";
}
}
/* Fall back to mapped host memory if needed and possible. */
void *shared_pointer = 0;
if (mem_alloc_result != hipSuccess && can_map_host) {
if (mem.shared_pointer) {
/* Another device already allocated host memory. */
mem_alloc_result = hipSuccess;
shared_pointer = mem.shared_pointer;
}
else if (map_host_used + size < map_host_limit) {
/* Allocate host memory ourselves. */
mem_alloc_result = hipHostMalloc(
&shared_pointer, size, hipHostMallocMapped | hipHostMallocWriteCombined);
assert((mem_alloc_result == hipSuccess && shared_pointer != 0) ||
(mem_alloc_result != hipSuccess && shared_pointer == 0));
}
if (mem_alloc_result == hipSuccess) {
hip_assert(hipHostGetDevicePointer(&device_pointer, shared_pointer, 0));
map_host_used += size;
status = " in host memory";
}
}
if (mem_alloc_result != hipSuccess) {
status = " failed, out of device and host memory";
set_error("System is out of GPU and shared host memory");
}
if (mem.name) {
VLOG_WORK << "Buffer allocate: " << mem.name << ", "
<< string_human_readable_number(mem.memory_size()) << " bytes. ("
<< string_human_readable_size(mem.memory_size()) << ")" << status;
}
mem.device_pointer = (device_ptr)device_pointer;
mem.device_size = size;
stats.mem_alloc(size);
if (!mem.device_pointer) {
return NULL;
}
/* Insert into map of allocations. */
thread_scoped_lock lock(hip_mem_map_mutex);
HIPMem *cmem = &hip_mem_map[&mem];
if (shared_pointer != 0) {
/* Replace host pointer with our host allocation. Only works if
* HIP memory layout is the same and has no pitch padding. Also
* does not work if we move textures to host during a render,
* since other devices might be using the memory. */
if (!move_texture_to_host && pitch_padding == 0 && mem.host_pointer &&
mem.host_pointer != shared_pointer) {
memcpy(shared_pointer, mem.host_pointer, size);
/* A call to device_memory::host_free() should be preceded by
* a call to device_memory::device_free() for host memory
* allocated by a device to be handled properly. Two exceptions
* are here and a call in OptiXDevice::generic_alloc(), where
* the current host memory can be assumed to be allocated by
* device_memory::host_alloc(), not by a device. */
mem.host_free();
mem.host_pointer = shared_pointer;
}
mem.shared_pointer = shared_pointer;
mem.shared_counter++;
cmem->use_mapped_host = true;
}
else {
cmem->use_mapped_host = false;
}
return cmem;
}
void HIPDevice::generic_copy_to(device_memory &mem)
bool HIPDevice::alloc_device(void *&device_pointer, size_t size)
{
if (!mem.host_pointer || !mem.device_pointer) {
return;
}
HIPContextScope scope(this);
/* If use_mapped_host of mem is false, the current device only uses device memory allocated by
* hipMalloc regardless of mem.host_pointer and mem.shared_pointer, and should copy data from
* mem.host_pointer. */
thread_scoped_lock lock(hip_mem_map_mutex);
if (!hip_mem_map[&mem].use_mapped_host || mem.host_pointer != mem.shared_pointer) {
const HIPContextScope scope(this);
hip_assert(
hipMemcpyHtoD((hipDeviceptr_t)mem.device_pointer, mem.host_pointer, mem.memory_size()));
}
hipError_t mem_alloc_result = hipMalloc((hipDeviceptr_t *)&device_pointer, size);
return mem_alloc_result == hipSuccess;
}
void HIPDevice::generic_free(device_memory &mem)
void HIPDevice::free_device(void *device_pointer)
{
if (mem.device_pointer) {
HIPContextScope scope(this);
thread_scoped_lock lock(hip_mem_map_mutex);
DCHECK(hip_mem_map.find(&mem) != hip_mem_map.end());
const HIPMem &cmem = hip_mem_map[&mem];
HIPContextScope scope(this);
/* If cmem.use_mapped_host is true, reference counting is used
* to safely free a mapped host memory. */
hip_assert(hipFree((hipDeviceptr_t)device_pointer));
}
if (cmem.use_mapped_host) {
assert(mem.shared_pointer);
if (mem.shared_pointer) {
assert(mem.shared_counter > 0);
if (--mem.shared_counter == 0) {
if (mem.host_pointer == mem.shared_pointer) {
mem.host_pointer = 0;
}
hipHostFree(mem.shared_pointer);
mem.shared_pointer = 0;
}
}
map_host_used -= mem.device_size;
}
else {
/* Free device memory. */
hip_assert(hipFree(mem.device_pointer));
}
bool HIPDevice::alloc_host(void *&shared_pointer, size_t size)
{
HIPContextScope scope(this);
stats.mem_free(mem.device_size);
mem.device_pointer = 0;
mem.device_size = 0;
hipError_t mem_alloc_result = hipHostMalloc(
&shared_pointer, size, hipHostMallocMapped | hipHostMallocWriteCombined);
hip_mem_map.erase(hip_mem_map.find(&mem));
}
return mem_alloc_result == hipSuccess;
}
void HIPDevice::free_host(void *shared_pointer)
{
HIPContextScope scope(this);
hipHostFree(shared_pointer);
}
bool HIPDevice::transform_host_pointer(void *&device_pointer, void *&shared_pointer)
{
HIPContextScope scope(this);
hip_assert(hipHostGetDevicePointer((hipDeviceptr_t *)&device_pointer, shared_pointer, 0));
return true;
}
void HIPDevice::copy_host_to_device(void *device_pointer, void *host_pointer, size_t size)
{
const HIPContextScope scope(this);
hip_assert(hipMemcpyHtoD((hipDeviceptr_t)device_pointer, host_pointer, size));
}
void HIPDevice::mem_alloc(device_memory &mem)
@@ -823,8 +576,8 @@ void HIPDevice::mem_zero(device_memory &mem)
/* If use_mapped_host of mem is false, mem.device_pointer currently refers to device memory
* regardless of mem.host_pointer and mem.shared_pointer. */
thread_scoped_lock lock(hip_mem_map_mutex);
if (!hip_mem_map[&mem].use_mapped_host || mem.host_pointer != mem.shared_pointer) {
thread_scoped_lock lock(device_mem_map_mutex);
if (!device_mem_map[&mem].use_mapped_host || mem.host_pointer != mem.shared_pointer) {
const HIPContextScope scope(this);
hip_assert(hipMemsetD8((hipDeviceptr_t)mem.device_pointer, 0, mem.memory_size()));
}
@@ -951,19 +704,19 @@ void HIPDevice::tex_alloc(device_texture &mem)
return;
}
HIPMem *cmem = NULL;
Mem *cmem = NULL;
hArray array_3d = NULL;
size_t src_pitch = mem.data_width * dsize * mem.data_elements;
size_t dst_pitch = src_pitch;
if (!mem.is_resident(this)) {
thread_scoped_lock lock(hip_mem_map_mutex);
cmem = &hip_mem_map[&mem];
thread_scoped_lock lock(device_mem_map_mutex);
cmem = &device_mem_map[&mem];
cmem->texobject = 0;
if (mem.data_depth > 1) {
array_3d = (hArray)mem.device_pointer;
cmem->array = array_3d;
cmem->array = reinterpret_cast<arrayMemObject>(array_3d);
}
else if (mem.data_height > 0) {
dst_pitch = align_up(src_pitch, pitch_alignment);
@@ -1007,10 +760,10 @@ void HIPDevice::tex_alloc(device_texture &mem)
mem.device_size = size;
stats.mem_alloc(size);
thread_scoped_lock lock(hip_mem_map_mutex);
cmem = &hip_mem_map[&mem];
thread_scoped_lock lock(device_mem_map_mutex);
cmem = &device_mem_map[&mem];
cmem->texobject = 0;
cmem->array = array_3d;
cmem->array = reinterpret_cast<arrayMemObject>(array_3d);
}
else if (mem.data_height > 0) {
/* 2D texture, using pitch aligned linear memory. */
@@ -1095,8 +848,8 @@ void HIPDevice::tex_alloc(device_texture &mem)
texDesc.filterMode = filter_mode;
texDesc.flags = HIP_TRSF_NORMALIZED_COORDINATES;
thread_scoped_lock lock(hip_mem_map_mutex);
cmem = &hip_mem_map[&mem];
thread_scoped_lock lock(device_mem_map_mutex);
cmem = &device_mem_map[&mem];
hip_assert(hipTexObjectCreate(&cmem->texobject, &resDesc, &texDesc, NULL));
@@ -1111,9 +864,9 @@ void HIPDevice::tex_free(device_texture &mem)
{
if (mem.device_pointer) {
HIPContextScope scope(this);
thread_scoped_lock lock(hip_mem_map_mutex);
DCHECK(hip_mem_map.find(&mem) != hip_mem_map.end());
const HIPMem &cmem = hip_mem_map[&mem];
thread_scoped_lock lock(device_mem_map_mutex);
DCHECK(device_mem_map.find(&mem) != device_mem_map.end());
const Mem &cmem = device_mem_map[&mem];
if (cmem.texobject) {
/* Free bindless texture. */
@@ -1122,16 +875,16 @@ void HIPDevice::tex_free(device_texture &mem)
if (!mem.is_resident(this)) {
/* Do not free memory here, since it was allocated on a different device. */
hip_mem_map.erase(hip_mem_map.find(&mem));
device_mem_map.erase(device_mem_map.find(&mem));
}
else if (cmem.array) {
/* Free array. */
hipArrayDestroy(cmem.array);
hipArrayDestroy(reinterpret_cast<hArray>(cmem.array));
stats.mem_free(mem.device_size);
mem.device_pointer = 0;
mem.device_size = 0;
hip_mem_map.erase(hip_mem_map.find(&mem));
device_mem_map.erase(device_mem_map.find(&mem));
}
else {
lock.unlock();
@@ -1153,7 +906,7 @@ bool HIPDevice::should_use_graphics_interop()
* possible, but empirical measurements show it can be considerably slower than a
* naive pixel copy. */
/* Disable graphics interop for now, because of driver bug in 21.40. See T92972 */
/* Disable graphics interop for now, because of driver bug in 21.40. See #92972 */
# if 0
HIPContextScope scope(this);


@@ -18,7 +18,7 @@ CCL_NAMESPACE_BEGIN
class DeviceQueue;
class HIPDevice : public Device {
class HIPDevice : public GPUDevice {
friend class HIPContextScope;
@@ -26,36 +26,11 @@ class HIPDevice : public Device {
hipDevice_t hipDevice;
hipCtx_t hipContext;
hipModule_t hipModule;
size_t device_texture_headroom;
size_t device_working_headroom;
bool move_texture_to_host;
size_t map_host_used;
size_t map_host_limit;
int can_map_host;
int pitch_alignment;
int hipDevId;
int hipDevArchitecture;
bool first_error;
struct HIPMem {
HIPMem() : texobject(0), array(0), use_mapped_host(false)
{
}
hipTextureObject_t texobject;
hArray array;
/* If true, a mapped host memory in shared_pointer is being used. */
bool use_mapped_host;
};
typedef map<device_memory *, HIPMem> HIPMemMap;
HIPMemMap hip_mem_map;
thread_mutex hip_mem_map_mutex;
/* Bindless Textures */
device_vector<TextureInfo> texture_info;
bool need_texture_info;
HIPDeviceKernels kernels;
static bool have_precompiled_kernels();
@@ -81,17 +56,13 @@ class HIPDevice : public Device {
virtual bool load_kernels(const uint kernel_features) override;
void reserve_local_memory(const uint kernel_features);
void init_host_memory();
void load_texture_info();
void move_textures_to_host(size_t size, bool for_texture);
HIPMem *generic_alloc(device_memory &mem, size_t pitch_padding = 0);
void generic_copy_to(device_memory &mem);
void generic_free(device_memory &mem);
virtual void get_device_memory_info(size_t &total, size_t &free) override;
virtual bool alloc_device(void *&device_pointer, size_t size) override;
virtual void free_device(void *device_pointer) override;
virtual bool alloc_host(void *&shared_pointer, size_t size) override;
virtual void free_host(void *shared_pointer) override;
virtual bool transform_host_pointer(void *&device_pointer, void *&shared_pointer) override;
virtual void copy_host_to_device(void *device_pointer, void *host_pointer, size_t size) override;
void mem_alloc(device_memory &mem) override;


@@ -51,7 +51,7 @@ static inline bool hipSupportsDevice(const int hipDevId)
hipDeviceGetAttribute(&major, hipDeviceAttributeComputeCapabilityMajor, hipDevId);
hipDeviceGetAttribute(&minor, hipDeviceAttributeComputeCapabilityMinor, hipDevId);
return (major >= 10);
return (major >= 9);
}
CCL_NAMESPACE_END


@@ -73,6 +73,10 @@ const char *device_kernel_as_string(DeviceKernel kernel)
return "integrator_terminated_paths_array";
case DEVICE_KERNEL_INTEGRATOR_SORTED_PATHS_ARRAY:
return "integrator_sorted_paths_array";
case DEVICE_KERNEL_INTEGRATOR_SORT_BUCKET_PASS:
return "integrator_sort_bucket_pass";
case DEVICE_KERNEL_INTEGRATOR_SORT_WRITE_PASS:
return "integrator_sort_write_pass";
case DEVICE_KERNEL_INTEGRATOR_COMPACT_PATHS_ARRAY:
return "integrator_compact_paths_array";
case DEVICE_KERNEL_INTEGRATOR_COMPACT_STATES:


@@ -247,6 +247,8 @@ class device_memory {
bool is_resident(Device *sub_device) const;
protected:
friend class Device;
friend class GPUDevice;
friend class CUDADevice;
friend class OptiXDevice;
friend class HIPDevice;


@@ -21,6 +21,7 @@ class BVHMetal : public BVH {
API_AVAILABLE(macos(11.0))
vector<id<MTLAccelerationStructure>> blas_array;
vector<uint32_t> blas_lookup;
bool motion_blur = false;


@@ -816,6 +816,11 @@ bool BVHMetal::build_TLAS(Progress &progress,
uint32_t instance_index = 0;
uint32_t motion_transform_index = 0;
// Allocate lookup buffer for the worst-case scenario.
uint64_t count = objects.size();
blas_lookup.resize(count);
for (Object *ob : objects) {
/* Skip non-traceable objects */
if (!ob->is_traceable())
@@ -843,12 +848,15 @@ bool BVHMetal::build_TLAS(Progress &progress,
/* Set user instance ID to object index */
int object_index = ob->get_device_index();
uint32_t user_id = uint32_t(object_index);
int currIndex = instance_index++;
assert(user_id < blas_lookup.size());
blas_lookup[user_id] = accel_struct_index;
/* Bake into the appropriate descriptor */
if (motion_blur) {
MTLAccelerationStructureMotionInstanceDescriptor *instances =
(MTLAccelerationStructureMotionInstanceDescriptor *)[instanceBuf contents];
MTLAccelerationStructureMotionInstanceDescriptor &desc = instances[instance_index++];
MTLAccelerationStructureMotionInstanceDescriptor &desc = instances[currIndex];
desc.accelerationStructureIndex = accel_struct_index;
desc.userID = user_id;
@@ -894,7 +902,7 @@ bool BVHMetal::build_TLAS(Progress &progress,
else {
MTLAccelerationStructureUserIDInstanceDescriptor *instances =
(MTLAccelerationStructureUserIDInstanceDescriptor *)[instanceBuf contents];
MTLAccelerationStructureUserIDInstanceDescriptor &desc = instances[instance_index++];
MTLAccelerationStructureUserIDInstanceDescriptor &desc = instances[currIndex];
desc.accelerationStructureIndex = accel_struct_index;
desc.userID = user_id;


@@ -55,6 +55,10 @@ void device_metal_info(vector<DeviceInfo> &devices)
info.denoisers = DENOISER_NONE;
info.id = id;
if (MetalInfo::get_device_vendor(device) == METAL_GPU_AMD) {
info.has_light_tree = false;
}
devices.push_back(info);
device_index++;
}


@@ -74,6 +74,11 @@ class MetalDevice : public Device {
id<MTLBuffer> texture_bindings_3d = nil;
std::vector<id<MTLTexture>> texture_slot_map;
/* BLAS encoding & lookup */
id<MTLArgumentEncoder> mtlBlasArgEncoder = nil;
id<MTLBuffer> blas_buffer = nil;
id<MTLBuffer> blas_lookup_buffer = nil;
bool use_metalrt = false;
MetalPipelineType kernel_specialization_level = PSO_GENERIC;
@@ -105,6 +110,8 @@ class MetalDevice : public Device {
bool use_adaptive_compilation();
bool use_local_atomic_sort() const;
bool make_source_and_check_if_compile_needed(MetalPipelineType pso_type);
void make_source(MetalPipelineType pso_type, const uint kernel_features);


@@ -105,6 +105,7 @@ MetalDevice::MetalDevice(const DeviceInfo &info, Stats &stats, Profiler &profile
}
case METAL_GPU_AMD: {
max_threads_per_threadgroup = 128;
use_metalrt = info.use_metalrt;
break;
}
case METAL_GPU_APPLE: {
@@ -192,6 +193,10 @@ MetalDevice::MetalDevice(const DeviceInfo &info, Stats &stats, Profiler &profile
arg_desc_as.dataType = MTLDataTypeInstanceAccelerationStructure;
arg_desc_as.access = MTLArgumentAccessReadOnly;
MTLArgumentDescriptor *arg_desc_ptrs = [[MTLArgumentDescriptor alloc] init];
arg_desc_ptrs.dataType = MTLDataTypePointer;
arg_desc_ptrs.access = MTLArgumentAccessReadOnly;
MTLArgumentDescriptor *arg_desc_ift = [[MTLArgumentDescriptor alloc] init];
arg_desc_ift.dataType = MTLDataTypeIntersectionFunctionTable;
arg_desc_ift.access = MTLArgumentAccessReadOnly;
@@ -204,14 +209,28 @@ MetalDevice::MetalDevice(const DeviceInfo &info, Stats &stats, Profiler &profile
[ancillary_desc addObject:[arg_desc_ift copy]]; /* ift_shadow */
arg_desc_ift.index = index++;
[ancillary_desc addObject:[arg_desc_ift copy]]; /* ift_local */
arg_desc_ift.index = index++;
[ancillary_desc addObject:[arg_desc_ift copy]]; /* ift_local_prim */
arg_desc_ptrs.index = index++;
[ancillary_desc addObject:[arg_desc_ptrs copy]]; /* blas array */
arg_desc_ptrs.index = index++;
[ancillary_desc addObject:[arg_desc_ptrs copy]]; /* look up table for blas */
[arg_desc_ift release];
[arg_desc_as release];
[arg_desc_ptrs release];
}
}
mtlAncillaryArgEncoder = [mtlDevice newArgumentEncoderWithArguments:ancillary_desc];
// Prepare the BLAS argument encoder.
MTLArgumentDescriptor *arg_desc_blas = [[MTLArgumentDescriptor alloc] init];
arg_desc_blas.dataType = MTLDataTypeInstanceAccelerationStructure;
arg_desc_blas.access = MTLArgumentAccessReadOnly;
mtlBlasArgEncoder = [mtlDevice newArgumentEncoderWithArguments:@[ arg_desc_blas ]];
[arg_desc_blas release];
for (int i = 0; i < ancillary_desc.count; i++) {
[ancillary_desc[i] release];
}
@@ -271,6 +290,11 @@ bool MetalDevice::use_adaptive_compilation()
return DebugFlags().metal.adaptive_compile;
}
bool MetalDevice::use_local_atomic_sort() const
{
return DebugFlags().metal.use_local_atomic_sort;
}
void MetalDevice::make_source(MetalPipelineType pso_type, const uint kernel_features)
{
string global_defines;
@@ -278,6 +302,10 @@ void MetalDevice::make_source(MetalPipelineType pso_type, const uint kernel_feat
global_defines += "#define __KERNEL_FEATURES__ " + to_string(kernel_features) + "\n";
}
if (use_local_atomic_sort()) {
global_defines += "#define __KERNEL_LOCAL_ATOMIC_SORT__\n";
}
if (use_metalrt) {
global_defines += "#define __METALRT__\n";
if (motion_blur) {
@@ -558,7 +586,7 @@ void MetalDevice::erase_allocation(device_memory &mem)
if (it != metal_mem_map.end()) {
MetalMem *mmem = it->second.get();
/* blank out reference to MetalMem* in the launch params (fixes crash T94736) */
/* blank out reference to MetalMem* in the launch params (fixes crash #94736) */
if (mmem->pointer_index >= 0) {
device_ptr *pointers = (device_ptr *)&launch_params;
pointers[mmem->pointer_index] = 0;
@@ -1231,6 +1259,33 @@ void MetalDevice::build_bvh(BVH *bvh, Progress &progress, bool refit)
if (@available(macos 11.0, *)) {
if (bvh->params.top_level) {
bvhMetalRT = bvh_metal;
// Allocate required buffers for the BLAS array.
uint64_t count = bvhMetalRT->blas_array.size();
uint64_t bufferSize = mtlBlasArgEncoder.encodedLength * count;
blas_buffer = [mtlDevice newBufferWithLength:bufferSize options:default_storage_mode];
stats.mem_alloc(blas_buffer.allocatedSize);
for (uint64_t i = 0; i < count; ++i) {
[mtlBlasArgEncoder setArgumentBuffer:blas_buffer
offset:i * mtlBlasArgEncoder.encodedLength];
[mtlBlasArgEncoder setAccelerationStructure:bvhMetalRT->blas_array[i] atIndex:0];
}
count = bvhMetalRT->blas_lookup.size();
bufferSize = sizeof(uint32_t) * count;
blas_lookup_buffer = [mtlDevice newBufferWithLength:bufferSize
options:default_storage_mode];
stats.mem_alloc(blas_lookup_buffer.allocatedSize);
memcpy([blas_lookup_buffer contents],
bvhMetalRT->blas_lookup.data(),
blas_lookup_buffer.allocatedSize);
if (default_storage_mode == MTLResourceStorageModeManaged) {
[blas_buffer didModifyRange:NSMakeRange(0, blas_buffer.length)];
[blas_lookup_buffer didModifyRange:NSMakeRange(0, blas_lookup_buffer.length)];
}
}
}
}


@@ -19,6 +19,8 @@ enum {
METALRT_FUNC_SHADOW_BOX,
METALRT_FUNC_LOCAL_TRI,
METALRT_FUNC_LOCAL_BOX,
METALRT_FUNC_LOCAL_TRI_PRIM,
METALRT_FUNC_LOCAL_BOX_PRIM,
METALRT_FUNC_CURVE_RIBBON,
METALRT_FUNC_CURVE_RIBBON_SHADOW,
METALRT_FUNC_CURVE_ALL,
@@ -28,7 +30,13 @@ enum {
METALRT_FUNC_NUM
};
enum { METALRT_TABLE_DEFAULT, METALRT_TABLE_SHADOW, METALRT_TABLE_LOCAL, METALRT_TABLE_NUM };
enum {
METALRT_TABLE_DEFAULT,
METALRT_TABLE_SHADOW,
METALRT_TABLE_LOCAL,
METALRT_TABLE_LOCAL_PRIM,
METALRT_TABLE_NUM
};
/* Pipeline State Object types */
enum MetalPipelineType {


@@ -87,6 +87,9 @@ struct ShaderCache {
break;
}
}
occupancy_tuning[DEVICE_KERNEL_INTEGRATOR_SORT_BUCKET_PASS] = {1024, 1024};
occupancy_tuning[DEVICE_KERNEL_INTEGRATOR_SORT_WRITE_PASS] = {1024, 1024};
}
~ShaderCache();
@@ -521,6 +524,8 @@ void MetalKernelPipeline::compile()
"__anyhit__cycles_metalrt_shadow_all_hit_box",
"__anyhit__cycles_metalrt_local_hit_tri",
"__anyhit__cycles_metalrt_local_hit_box",
"__anyhit__cycles_metalrt_local_hit_tri_prim",
"__anyhit__cycles_metalrt_local_hit_box_prim",
"__intersection__curve_ribbon",
"__intersection__curve_ribbon_shadow",
"__intersection__curve_all",
@@ -611,11 +616,17 @@ void MetalKernelPipeline::compile()
rt_intersection_function[METALRT_FUNC_LOCAL_BOX],
rt_intersection_function[METALRT_FUNC_LOCAL_BOX],
nil];
table_functions[METALRT_TABLE_LOCAL_PRIM] = [NSArray
arrayWithObjects:rt_intersection_function[METALRT_FUNC_LOCAL_TRI_PRIM],
rt_intersection_function[METALRT_FUNC_LOCAL_BOX_PRIM],
rt_intersection_function[METALRT_FUNC_LOCAL_BOX_PRIM],
nil];
NSMutableSet *unique_functions = [NSMutableSet
setWithArray:table_functions[METALRT_TABLE_DEFAULT]];
[unique_functions addObjectsFromArray:table_functions[METALRT_TABLE_SHADOW]];
[unique_functions addObjectsFromArray:table_functions[METALRT_TABLE_LOCAL]];
[unique_functions addObjectsFromArray:table_functions[METALRT_TABLE_LOCAL_PRIM]];
if (kernel_has_intersection(device_kernel)) {
linked_functions = [[NSArray arrayWithArray:[unique_functions allObjects]]


@@ -25,6 +25,7 @@ class MetalDeviceQueue : public DeviceQueue {
virtual int num_concurrent_states(const size_t) const override;
virtual int num_concurrent_busy_states(const size_t) const override;
virtual int num_sort_partition_elements() const override;
virtual bool supports_local_atomic_sort() const override;
virtual void init_execution() override;


@@ -315,6 +315,11 @@ int MetalDeviceQueue::num_sort_partition_elements() const
return MetalInfo::optimal_sort_partition_elements(metal_device_->mtlDevice);
}
bool MetalDeviceQueue::supports_local_atomic_sort() const
{
return metal_device_->use_local_atomic_sort();
}
void MetalDeviceQueue::init_execution()
{
/* Synchronize all textures and memory copies before executing task. */
@@ -477,6 +482,12 @@ bool MetalDeviceQueue::enqueue(DeviceKernel kernel,
if (metal_device_->bvhMetalRT) {
id<MTLAccelerationStructure> accel_struct = metal_device_->bvhMetalRT->accel_struct;
[metal_device_->mtlAncillaryArgEncoder setAccelerationStructure:accel_struct atIndex:2];
[metal_device_->mtlAncillaryArgEncoder setBuffer:metal_device_->blas_buffer
offset:0
atIndex:7];
[metal_device_->mtlAncillaryArgEncoder setBuffer:metal_device_->blas_lookup_buffer
offset:0
atIndex:8];
}
for (int table = 0; table < METALRT_TABLE_NUM; table++) {
@@ -527,6 +538,10 @@ bool MetalDeviceQueue::enqueue(DeviceKernel kernel,
if (bvhMetalRT) {
/* Mark all Accelerations resources as used */
[mtlComputeCommandEncoder useResource:bvhMetalRT->accel_struct usage:MTLResourceUsageRead];
[mtlComputeCommandEncoder useResource:metal_device_->blas_buffer
usage:MTLResourceUsageRead];
[mtlComputeCommandEncoder useResource:metal_device_->blas_lookup_buffer
usage:MTLResourceUsageRead];
[mtlComputeCommandEncoder useResources:bvhMetalRT->blas_array.data()
count:bvhMetalRT->blas_array.size()
usage:MTLResourceUsageRead];
@@ -553,13 +568,24 @@ bool MetalDeviceQueue::enqueue(DeviceKernel kernel,
/* See parallel_active_index.h for why this amount of shared memory is needed.
* Rounded up to 16 bytes for Metal */
shared_mem_bytes = (int)round_up((num_threads_per_block + 1) * sizeof(int), 16);
[mtlComputeCommandEncoder setThreadgroupMemoryLength:shared_mem_bytes atIndex:0];
break;
case DEVICE_KERNEL_INTEGRATOR_SORT_BUCKET_PASS:
case DEVICE_KERNEL_INTEGRATOR_SORT_WRITE_PASS: {
int key_count = metal_device_->launch_params.data.max_shaders;
shared_mem_bytes = (int)round_up(key_count * sizeof(int), 16);
break;
}
default:
break;
}
if (shared_mem_bytes) {
assert(shared_mem_bytes <= 32 * 1024);
[mtlComputeCommandEncoder setThreadgroupMemoryLength:shared_mem_bytes atIndex:0];
}
MTLSize size_threadgroups_per_dispatch = MTLSizeMake(
divide_up(work_size, num_threads_per_block), 1, 1);
MTLSize size_threads_per_threadgroup = MTLSizeMake(num_threads_per_block, 1, 1);
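
The two threadgroup-memory sizing rules above, worked through with hypothetical values (512 threads per block, 1024 shaders), both comfortably under the asserted 32 KB cap:

#include <cstdio>

static int round_up(int x, int multiple)
{
  return ((x + multiple - 1) / multiple) * multiple;
}

int main()
{
  const int num_threads_per_block = 512;
  const int max_shaders = 1024;
  /* Active-index kernels: one int per thread plus one, rounded to 16 bytes. */
  printf("%d\n", round_up((num_threads_per_block + 1) * (int)sizeof(int), 16)); /* 2064 */
  /* Sort bucket/write passes: one int per sort key, rounded to 16 bytes. */
  printf("%d\n", round_up(max_shaders * (int)sizeof(int), 16)); /* 4096 */
  return 0;
}
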


@@ -64,6 +64,12 @@ MetalGPUVendor MetalInfo::get_device_vendor(id<MTLDevice> device)
return METAL_GPU_INTEL;
}
else if (strstr(device_name, "AMD")) {
/* Setting this env var hides AMD devices, thus exposing any integrated Intel devices. */
if (auto str = getenv("CYCLES_METAL_FORCE_INTEL")) {
if (atoi(str)) {
return METAL_GPU_UNKNOWN;
}
}
return METAL_GPU_AMD;
}
else if (strstr(device_name, "Apple")) {
@@ -96,6 +102,15 @@ vector<id<MTLDevice>> const &MetalInfo::get_usable_devices()
return usable_devices;
}
/* If the system has both an AMD GPU (discrete) and an Intel one (integrated), prefer the AMD
* one. This can be overridden with CYCLES_METAL_FORCE_INTEL. */
bool has_usable_amd_gpu = false;
if (@available(macos 12.3, *)) {
for (id<MTLDevice> device in MTLCopyAllDevices()) {
has_usable_amd_gpu |= (get_device_vendor(device) == METAL_GPU_AMD);
}
}
metal_printf("Usable Metal devices:\n");
for (id<MTLDevice> device in MTLCopyAllDevices()) {
string device_name = get_device_name(device);
@@ -111,8 +126,10 @@ vector<id<MTLDevice>> const &MetalInfo::get_usable_devices()
}
# if defined(MAC_OS_VERSION_13_0)
if (@available(macos 13.0, *)) {
usable |= (vendor == METAL_GPU_INTEL);
if (!has_usable_amd_gpu) {
if (@available(macos 13.0, *)) {
usable |= (vendor == METAL_GPU_INTEL);
}
}
# endif


@@ -854,12 +854,14 @@ bool OptiXDevice::load_osl_kernels()
context, group_descs, 2, &group_options, nullptr, 0, &osl_groups[i * 2]));
}
OptixStackSizes stack_size[NUM_PROGRAM_GROUPS] = {};
vector<OptixStackSizes> osl_stack_size(osl_groups.size());
/* Update SBT with new entries. */
sbt_data.alloc(NUM_PROGRAM_GROUPS + osl_groups.size());
for (int i = 0; i < NUM_PROGRAM_GROUPS; ++i) {
optix_assert(optixSbtRecordPackHeader(groups[i], &sbt_data[i]));
optix_assert(optixProgramGroupGetStackSize(groups[i], &stack_size[i]));
}
for (size_t i = 0; i < osl_groups.size(); ++i) {
if (osl_groups[i] != NULL) {
@@ -907,13 +909,15 @@ bool OptiXDevice::load_osl_kernels()
0,
&pipelines[PIP_SHADE]));
const unsigned int css = std::max(stack_size[PG_RGEN_SHADE_SURFACE_RAYTRACE].cssRG,
stack_size[PG_RGEN_SHADE_SURFACE_MNEE].cssRG);
unsigned int dss = 0;
for (unsigned int i = 0; i < osl_stack_size.size(); ++i) {
dss = std::max(dss, osl_stack_size[i].dssDC);
}
optix_assert(optixPipelineSetStackSize(
pipelines[PIP_SHADE], 0, dss, 0, pipeline_options.usesMotionBlur ? 3 : 2));
pipelines[PIP_SHADE], 0, dss, css, pipeline_options.usesMotionBlur ? 3 : 2));
}
return !have_error();
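
The stack-size computation above in isolation, with hypothetical per-group sizes; the fix passes `css` (the raygen continuation maximum) where a hard-coded 0 went before, while `dss` remains the maximum over the OSL groups:

#include <algorithm>
#include <cstdio>

int main()
{
  const unsigned int cssRG[] = {6000, 6500};       /* hypothetical raygen sizes */
  const unsigned int dssDC[] = {1200, 2048, 1536}; /* hypothetical OSL group sizes */
  unsigned int css = std::max(cssRG[0], cssRG[1]); /* 6500 */
  unsigned int dss = 0;
  for (unsigned int d : dssDC) {
    dss = std::max(dss, d); /* 2048 */
  }
  printf("dss=%u css=%u\n", dss, css);
  return 0;
}
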


@@ -112,6 +112,13 @@ class DeviceQueue {
return 65536;
}
/* Does device support local atomic sorting kernels (INTEGRATOR_SORT_BUCKET_PASS and
* INTEGRATOR_SORT_WRITE_PASS)? */
virtual bool supports_local_atomic_sort() const
{
return false;
}
/* Initialize execution of kernels on this queue.
*
* Will, for example, load all data required by the kernels from Device to global or path state.


@@ -71,6 +71,8 @@ PathTraceWorkGPU::PathTraceWorkGPU(Device *device,
device, "integrator_shader_mnee_sort_counter", MEM_READ_WRITE),
integrator_shader_sort_prefix_sum_(
device, "integrator_shader_sort_prefix_sum", MEM_READ_WRITE),
integrator_shader_sort_partition_key_offsets_(
device, "integrator_shader_sort_partition_key_offsets", MEM_READ_WRITE),
integrator_next_main_path_index_(device, "integrator_next_main_path_index", MEM_READ_WRITE),
integrator_next_shadow_path_index_(
device, "integrator_next_shadow_path_index", MEM_READ_WRITE),
@@ -207,33 +209,45 @@ void PathTraceWorkGPU::alloc_integrator_sorting()
integrator_state_gpu_.sort_partition_divisor = (int)divide_up(max_num_paths_,
num_sort_partitions_);
/* Allocate arrays for shader sorting. */
const int sort_buckets = device_scene_->data.max_shaders * num_sort_partitions_;
if (integrator_shader_sort_counter_.size() < sort_buckets) {
integrator_shader_sort_counter_.alloc(sort_buckets);
integrator_shader_sort_counter_.zero_to_device();
integrator_state_gpu_.sort_key_counter[DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE] =
(int *)integrator_shader_sort_counter_.device_pointer;
integrator_shader_sort_prefix_sum_.alloc(sort_buckets);
integrator_shader_sort_prefix_sum_.zero_to_device();
}
if (device_scene_->data.kernel_features & KERNEL_FEATURE_NODE_RAYTRACE) {
if (integrator_shader_raytrace_sort_counter_.size() < sort_buckets) {
integrator_shader_raytrace_sort_counter_.alloc(sort_buckets);
integrator_shader_raytrace_sort_counter_.zero_to_device();
integrator_state_gpu_.sort_key_counter[DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE_RAYTRACE] =
(int *)integrator_shader_raytrace_sort_counter_.device_pointer;
if (num_sort_partitions_ > 1 && queue_->supports_local_atomic_sort()) {
/* Allocate array for partitioned shader sorting using local atomics. */
const int num_offsets = (device_scene_->data.max_shaders + 1) * num_sort_partitions_;
if (integrator_shader_sort_partition_key_offsets_.size() < num_offsets) {
integrator_shader_sort_partition_key_offsets_.alloc(num_offsets);
integrator_shader_sort_partition_key_offsets_.zero_to_device();
}
integrator_state_gpu_.sort_partition_key_offsets =
(int *)integrator_shader_sort_partition_key_offsets_.device_pointer;
}
else {
/* Allocate arrays for shader sorting. */
const int sort_buckets = device_scene_->data.max_shaders * num_sort_partitions_;
if (integrator_shader_sort_counter_.size() < sort_buckets) {
integrator_shader_sort_counter_.alloc(sort_buckets);
integrator_shader_sort_counter_.zero_to_device();
integrator_state_gpu_.sort_key_counter[DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE] =
(int *)integrator_shader_sort_counter_.device_pointer;
if (device_scene_->data.kernel_features & KERNEL_FEATURE_MNEE) {
if (integrator_shader_mnee_sort_counter_.size() < sort_buckets) {
integrator_shader_mnee_sort_counter_.alloc(sort_buckets);
integrator_shader_mnee_sort_counter_.zero_to_device();
integrator_state_gpu_.sort_key_counter[DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE_MNEE] =
(int *)integrator_shader_mnee_sort_counter_.device_pointer;
integrator_shader_sort_prefix_sum_.alloc(sort_buckets);
integrator_shader_sort_prefix_sum_.zero_to_device();
}
if (device_scene_->data.kernel_features & KERNEL_FEATURE_NODE_RAYTRACE) {
if (integrator_shader_raytrace_sort_counter_.size() < sort_buckets) {
integrator_shader_raytrace_sort_counter_.alloc(sort_buckets);
integrator_shader_raytrace_sort_counter_.zero_to_device();
integrator_state_gpu_.sort_key_counter[DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE_RAYTRACE] =
(int *)integrator_shader_raytrace_sort_counter_.device_pointer;
}
}
if (device_scene_->data.kernel_features & KERNEL_FEATURE_MNEE) {
if (integrator_shader_mnee_sort_counter_.size() < sort_buckets) {
integrator_shader_mnee_sort_counter_.alloc(sort_buckets);
integrator_shader_mnee_sort_counter_.zero_to_device();
integrator_state_gpu_.sort_key_counter[DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE_MNEE] =
(int *)integrator_shader_mnee_sort_counter_.device_pointer;
}
}
}
}
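To make the key-offsets layout concrete, a small worked sketch (the values are assumed, not from the patch):

  const int max_shaders = 64;        /* assumed scene value */
  const int num_sort_partitions = 4; /* assumed */
  /* One row per partition: max_shaders running offsets plus one extra slot
   * holding that partition's active-state count. */
  const int num_offsets = (max_shaders + 1) * num_sort_partitions; /* 260 ints */
  /* Offset slot of shader `key` in partition `p`: */
  int offset_index(int p, int key) { return p * (max_shaders + 1) + key; }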
@@ -451,8 +465,7 @@ void PathTraceWorkGPU::enqueue_path_iteration(DeviceKernel kernel, const int num
work_size = num_queued;
d_path_index = queued_paths_.device_pointer;
compute_sorted_queued_paths(
DEVICE_KERNEL_INTEGRATOR_SORTED_PATHS_ARRAY, kernel, num_paths_limit);
compute_sorted_queued_paths(kernel, num_paths_limit);
}
else if (num_queued < work_size) {
work_size = num_queued;
@@ -511,11 +524,26 @@ void PathTraceWorkGPU::enqueue_path_iteration(DeviceKernel kernel, const int num
}
}
void PathTraceWorkGPU::compute_sorted_queued_paths(DeviceKernel kernel,
DeviceKernel queued_kernel,
void PathTraceWorkGPU::compute_sorted_queued_paths(DeviceKernel queued_kernel,
const int num_paths_limit)
{
int d_queued_kernel = queued_kernel;
/* Launch kernel to fill the active paths arrays. */
if (num_sort_partitions_ > 1 && queue_->supports_local_atomic_sort()) {
const int work_size = kernel_max_active_main_path_index(queued_kernel);
device_ptr d_queued_paths = queued_paths_.device_pointer;
int partition_size = (int)integrator_state_gpu_.sort_partition_divisor;
DeviceKernelArguments args(
&work_size, &partition_size, &num_paths_limit, &d_queued_paths, &d_queued_kernel);
queue_->enqueue(DEVICE_KERNEL_INTEGRATOR_SORT_BUCKET_PASS, 1024 * num_sort_partitions_, args);
queue_->enqueue(DEVICE_KERNEL_INTEGRATOR_SORT_WRITE_PASS, 1024 * num_sort_partitions_, args);
return;
}
device_ptr d_counter = (device_ptr)integrator_state_gpu_.sort_key_counter[d_queued_kernel];
device_ptr d_prefix_sum = integrator_shader_sort_prefix_sum_.device_pointer;
assert(d_counter != 0 && d_prefix_sum != 0);
@@ -552,7 +580,7 @@ void PathTraceWorkGPU::compute_sorted_queued_paths(DeviceKernel kernel,
&d_prefix_sum,
&d_queued_kernel);
queue_->enqueue(kernel, work_size, args);
queue_->enqueue(DEVICE_KERNEL_INTEGRATOR_SORTED_PATHS_ARRAY, work_size, args);
}
}
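Dispatch shape for the fast path above, assuming GPU_PARALLEL_SORT_BLOCK_SIZE is 1024 as defined later in this patch:

  /* With num_sort_partitions_ = 4 (assumed), each enqueue launches
   * 1024 * 4 threads, i.e. one 1024-thread threadgroup per partition;
   * inside the kernel, metal_grid_id identifies the partition. */
  queue_->enqueue(DEVICE_KERNEL_INTEGRATOR_SORT_BUCKET_PASS, 1024 * 4, args);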

View File

@@ -70,9 +70,7 @@ class PathTraceWorkGPU : public PathTraceWork {
void enqueue_path_iteration(DeviceKernel kernel, const int num_paths_limit = INT_MAX);
void compute_queued_paths(DeviceKernel kernel, DeviceKernel queued_kernel);
void compute_sorted_queued_paths(DeviceKernel kernel,
DeviceKernel queued_kernel,
const int num_paths_limit);
void compute_sorted_queued_paths(DeviceKernel queued_kernel, const int num_paths_limit);
void compact_main_paths(const int num_active_paths);
void compact_shadow_paths();
@@ -135,6 +133,7 @@ class PathTraceWorkGPU : public PathTraceWork {
device_vector<int> integrator_shader_raytrace_sort_counter_;
device_vector<int> integrator_shader_mnee_sort_counter_;
device_vector<int> integrator_shader_sort_prefix_sum_;
device_vector<int> integrator_shader_sort_partition_key_offsets_;
/* Path split. */
device_vector<int> integrator_next_main_path_index_;
device_vector<int> integrator_next_shadow_path_index_;

View File

@@ -886,7 +886,7 @@ int RenderScheduler::get_num_samples_during_navigation(int resolution_divider) c
{
/* Special trick for fast navigation: schedule multiple samples during fast navigation
* (which will prefer to use lower resolution to keep up with refresh rate). This gives more
* usable visual feedback for artists. There are a couple of tricks though. */
* usable visual feedback for artists. */
if (is_denoise_active_during_update()) {
/* When denoising is used during navigation prefer using a higher resolution with less samples
@@ -896,25 +896,12 @@ int RenderScheduler::get_num_samples_during_navigation(int resolution_divider) c
return 1;
}
if (resolution_divider <= pixel_size_) {
/* When resolution divider is at or below pixel size, schedule one sample. This doesn't affect
* the sample count at this resolution division, but instead assists in the calculation of
* the resolution divider. */
return 1;
}
if (resolution_divider == pixel_size_ * 2) {
/* When resolution divider is the previous step to the final resolution, schedule two samples.
* This is so that rendering on lower resolution does not exceed time that it takes to render
* first sample at the full resolution. */
return 2;
}
/* Always render 4 samples, even if scene is configured for less.
* The idea here is to have enough information on the screen. Resolution divider of 2 allows us
* to have 4 times extra samples, so overall worst-case timing is the same as the final resolution
* at one sample. */
return 4;
/* Schedule samples equal to the resolution divider up to a maximum of 4.
* The idea is to have enough information on the screen by increasing the sample count as the
* resolution is decreased. */
/* NOTE: Changing this formula will change the formula in
* `RenderScheduler::calculate_resolution_divider_for_time()`. */
return min(max(1, resolution_divider / pixel_size_), 4);
}
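Worked through, the replacement formula reproduces the old special cases at the extremes (assuming a pixel_size_ of 1):

  /* resolution_divider = 1 -> min(max(1, 1), 4) = 1 sample
   * resolution_divider = 2 -> min(max(1, 2), 4) = 2 samples
   * resolution_divider = 4 -> min(max(1, 4), 4) = 4 samples
   * resolution_divider = 8 -> min(max(1, 8), 4) = 4 samples (capped) */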
bool RenderScheduler::work_need_adaptive_filter() const
@@ -1100,9 +1087,10 @@ void RenderScheduler::update_start_resolution_divider()
/* TODO(sergey): Need to add hysteresis to avoid resolution divider bouncing around when actual
* render time is somewhere on a boundary between two resolutions. */
/* Never increase resolution to higher than the pixel size (which is possible if the scene is
* simple and compute device is fast). */
start_resolution_divider_ = max(resolution_divider_for_update, pixel_size_);
/* Don't let resolution drop below the desired one. It's better to be slow than provide an
* unreadable viewport render. */
start_resolution_divider_ = min(resolution_divider_for_update,
default_start_resolution_divider_);
VLOG_WORK << "Calculated resolution divider is " << start_resolution_divider_;
}
@@ -1187,24 +1175,24 @@ void RenderScheduler::check_time_limit_reached()
int RenderScheduler::calculate_resolution_divider_for_time(double desired_time, double actual_time)
{
/* TODO(sergey): There should a non-iterative analytical formula here. */
const double ratio_between_times = actual_time / desired_time;
int resolution_divider = 1;
/* We can pass `ratio_between_times` to `get_num_samples_during_navigation()` to get our
* navigation samples because the equation for calculating the resolution divider is as follows:
* `actual_time / desired_time = sqr(resolution_divider) / sample_count`.
* While `resolution_divider` is less than or equal to 4, `resolution_divider = sample_count`
* (This relationship is determined in `get_num_samples_during_navigation()`). With some
* substitution we end up with `actual_time / desired_time = resolution_divider` while the
* resolution divider is less than or equal to 4. Once the resolution divider increases above 4,
* the relationship of `actual_time / desired_time = resolution_divider` is no longer true,
* however the sample count retrieved from `get_num_samples_during_navigation()` is still
* accurate if we continue using this assumption. It should be noted that the interaction between
* `pixel_size`, sample count, and resolution divider is automatically accounted for, which is
* why `pixel_size` isn't included in any of the equations. */
const int navigation_samples = get_num_samples_during_navigation(
ceil_to_int(ratio_between_times));
/* This algorithm iterates through resolution dividers until a divider is found that achieves
* the desired render time. A limit of default_start_resolution_divider_ is put in place as the
* maximum resolution divider to avoid an unreadable viewport due to a low resolution.
* pre_resolution_division_samples and post_resolution_division_samples are used in this
* calculation to better predict the performance impact of changing resolution divisions as
* the sample count can also change between resolution divisions. */
while (actual_time > desired_time && resolution_divider < default_start_resolution_divider_) {
int pre_resolution_division_samples = get_num_samples_during_navigation(resolution_divider);
resolution_divider = resolution_divider * 2;
int post_resolution_division_samples = get_num_samples_during_navigation(resolution_divider);
actual_time /= 4.0 * pre_resolution_division_samples / post_resolution_division_samples;
}
return resolution_divider;
return ceil_to_int(sqrt(navigation_samples * ratio_between_times));
}
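The closed form replaces the old doubling loop. Rearranging the relation quoted in the comment above:

  /* actual_time / desired_time = sqr(resolution_divider) / sample_count
   * => resolution_divider = sqrt(sample_count * actual_time / desired_time).
   * Example (assumed numbers): ratio_between_times = 9
   * -> navigation_samples = min(max(1, 9), 4) = 4 (capped)
   * -> ceil_to_int(sqrt(4 * 9)) = 6. */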
int calculate_resolution_divider_for_resolution(int width, int height, int resolution)

View File

@@ -412,11 +412,12 @@ if(WITH_CYCLES_CUDA_BINARIES)
# warn for other versions
if((CUDA_VERSION STREQUAL "101") OR
(CUDA_VERSION STREQUAL "102") OR
(CUDA_VERSION_MAJOR STREQUAL "11"))
(CUDA_VERSION_MAJOR STREQUAL "11") OR
(CUDA_VERSION_MAJOR STREQUAL "12"))
else()
message(WARNING
"CUDA version ${CUDA_VERSION_MAJOR}.${CUDA_VERSION_MINOR} detected, "
"build may succeed but only CUDA 11, 10.2 and 10.1 have been tested")
"build may succeed but only CUDA 12, 11, 10.2 and 10.1 have been tested")
endif()
# build for each arch
@@ -514,6 +515,16 @@ if(WITH_CYCLES_CUDA_BINARIES)
else()
message(STATUS "CUDA binaries for ${arch} require CUDA 10 or earlier, skipped.")
endif()
elseif(${arch} MATCHES ".*_3.")
if(DEFINED CUDA11_NVCC_EXECUTABLE)
set(cuda_nvcc_executable ${CUDA11_NVCC_EXECUTABLE})
set(cuda_toolkit_root_dir ${CUDA11_TOOLKIT_ROOT_DIR})
elseif("${CUDA_VERSION}" LESS 120) # Support for sm_35, sm_37 was removed in CUDA 12
set(cuda_nvcc_executable ${CUDA_NVCC_EXECUTABLE})
set(cuda_toolkit_root_dir ${CUDA_TOOLKIT_ROOT_DIR})
else()
message(STATUS "CUDA binaries for ${arch} require CUDA 11 or earlier, skipped.")
endif()
elseif(${arch} MATCHES ".*_7." AND "${CUDA_VERSION}" LESS 100)
message(STATUS "CUDA binaries for ${arch} require CUDA 10.0+, skipped.")
elseif(${arch} MATCHES ".*_8.")

View File

@@ -661,7 +661,8 @@ ccl_device void bsdf_blur(KernelGlobals kg, ccl_private ShaderClosure *sc, float
#endif
}
ccl_device_inline Spectrum bsdf_albedo(ccl_private const ShaderData *sd, ccl_private const ShaderClosure *sc)
ccl_device_inline Spectrum bsdf_albedo(ccl_private const ShaderData *sd,
ccl_private const ShaderClosure *sc)
{
Spectrum albedo = sc->weight;
/* Some closures include additional components such as Fresnel terms that cause their albedo to
@@ -685,7 +686,7 @@ ccl_device_inline Spectrum bsdf_albedo(ccl_private const ShaderData *sd, ccl_pri
albedo *= ((ccl_private const PrincipledSheenBsdf *)sc)->avg_value;
break;
case CLOSURE_BSDF_HAIR_PRINCIPLED_ID:
albedo *= bsdf_principled_hair_albedo(sc);
albedo *= bsdf_principled_hair_albedo(sd, sc);
break;
default:
break;

View File

@@ -478,10 +478,18 @@ ccl_device_inline float bsdf_principled_hair_albedo_roughness_scale(
return (((((0.245f * x) + 5.574f) * x - 10.73f) * x + 2.532f) * x - 0.215f) * x + 5.969f;
}
ccl_device Spectrum bsdf_principled_hair_albedo(ccl_private const ShaderClosure *sc)
ccl_device Spectrum bsdf_principled_hair_albedo(ccl_private const ShaderData *sd,
ccl_private const ShaderClosure *sc)
{
ccl_private PrincipledHairBSDF *bsdf = (ccl_private PrincipledHairBSDF *)sc;
return exp(-sqrt(bsdf->sigma) * bsdf_principled_hair_albedo_roughness_scale(bsdf->v));
const float cos_theta_o = cos_from_sin(dot(sd->wi, safe_normalize(sd->dPdu)));
const float cos_gamma_o = cos_from_sin(bsdf->extra->geom.w);
const float f = fresnel_dielectric_cos(cos_theta_o * cos_gamma_o, bsdf->eta);
const float roughness_scale = bsdf_principled_hair_albedo_roughness_scale(bsdf->v);
/* TODO(lukas): Adding the Fresnel term here as a workaround until the proper refactor. */
return exp(-sqrt(bsdf->sigma) * roughness_scale) + make_spectrum(f);
}
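A quick sanity check on the added Fresnel workaround (assuming an eta of 1.55, a common hair IOR; not a value from this patch):

  /* At normal incidence, fresnel_dielectric_cos(1.0f, 1.55f) evaluates to
   * ((1.55 - 1) / (1.55 + 1))^2 ~= 0.047, so the workaround adds roughly
   * 5% flat reflectance on top of the absorption-based albedo. */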
ccl_device_inline Spectrum

View File

@@ -519,14 +519,6 @@ ccl_device int bsdf_microfacet_ggx_setup(ccl_private MicrofacetBsdf *bsdf)
return SD_BSDF | SD_BSDF_HAS_EVAL;
}
/* Required to maintain OSL interface. */
ccl_device int bsdf_microfacet_ggx_isotropic_setup(ccl_private MicrofacetBsdf *bsdf)
{
bsdf->alpha_y = bsdf->alpha_x;
return bsdf_microfacet_ggx_setup(bsdf);
}
ccl_device int bsdf_microfacet_ggx_fresnel_setup(ccl_private MicrofacetBsdf *bsdf,
ccl_private const ShaderData *sd)
{
@@ -613,14 +605,6 @@ ccl_device int bsdf_microfacet_beckmann_setup(ccl_private MicrofacetBsdf *bsdf)
return SD_BSDF | SD_BSDF_HAS_EVAL;
}
/* Required to maintain OSL interface. */
ccl_device int bsdf_microfacet_beckmann_isotropic_setup(ccl_private MicrofacetBsdf *bsdf)
{
bsdf->alpha_y = bsdf->alpha_x;
return bsdf_microfacet_beckmann_setup(bsdf);
}
ccl_device int bsdf_microfacet_beckmann_refraction_setup(ccl_private MicrofacetBsdf *bsdf)
{
bsdf->alpha_x = saturatef(bsdf->alpha_x);

View File

@@ -90,8 +90,10 @@ ccl_device float schlick_fresnel(float u)
}
/* Calculate the fresnel color, which is a blend between white and the F0 color */
ccl_device_forceinline Spectrum
interpolate_fresnel_color(float3 L, float3 H, float ior, Spectrum F0)
ccl_device_forceinline Spectrum interpolate_fresnel_color(float3 L,
float3 H,
float ior,
Spectrum F0)
{
/* Compute the real Fresnel term and remap it from real_F0..1 to F0..1.
* The reason why we use this remapping instead of directly doing the

View File

@@ -10,7 +10,7 @@
#ifndef WITH_CYCLES_OPTIMIZED_KERNEL_AVX2
# define KERNEL_STUB
#else
/* SSE optimization disabled for now on 32 bit, see bug T36316. */
/* SSE optimization disabled for now on 32 bit, see bug #36316. */
# if !(defined(__GNUC__) && (defined(i386) || defined(_M_IX86)))
# define __KERNEL_SSE__
# define __KERNEL_SSE2__

View File

@@ -10,7 +10,7 @@
#ifndef WITH_CYCLES_OPTIMIZED_KERNEL_SSE2
# define KERNEL_STUB
#else
/* SSE optimization disabled for now on 32 bit, see bug T36316. */
/* SSE optimization disabled for now on 32 bit, see bug #36316. */
# if !(defined(__GNUC__) && (defined(i386) || defined(_M_IX86)))
# define __KERNEL_SSE2__
# endif

View File

@@ -10,7 +10,7 @@
#ifndef WITH_CYCLES_OPTIMIZED_KERNEL_SSE41
# define KERNEL_STUB
#else
/* SSE optimization disabled for now on 32 bit, see bug T36316. */
/* SSE optimization disabled for now on 32 bit, see bug #36316. */
# if !(defined(__GNUC__) && (defined(i386) || defined(_M_IX86)))
# define __KERNEL_SSE2__
# define __KERNEL_SSE3__

View File

@@ -401,6 +401,72 @@ ccl_gpu_kernel_threads(GPU_PARALLEL_SORTED_INDEX_DEFAULT_BLOCK_SIZE)
}
ccl_gpu_kernel_postfix
ccl_gpu_kernel_threads(GPU_PARALLEL_SORT_BLOCK_SIZE)
ccl_gpu_kernel_signature(integrator_sort_bucket_pass,
int num_states,
int partition_size,
int num_states_limit,
ccl_global int *indices,
int kernel_index)
{
#if defined(__KERNEL_LOCAL_ATOMIC_SORT__)
int max_shaders = context.launch_params_metal.data.max_shaders;
ccl_global ushort *d_queued_kernel = (ccl_global ushort *)
kernel_integrator_state.path.queued_kernel;
ccl_global uint *d_shader_sort_key = (ccl_global uint *)
kernel_integrator_state.path.shader_sort_key;
ccl_global int *key_offsets = (ccl_global int *)
kernel_integrator_state.sort_partition_key_offsets;
gpu_parallel_sort_bucket_pass(num_states,
partition_size,
max_shaders,
kernel_index,
d_queued_kernel,
d_shader_sort_key,
key_offsets,
(threadgroup int *)threadgroup_array,
metal_local_id,
metal_local_size,
metal_grid_id);
#endif
}
ccl_gpu_kernel_postfix
ccl_gpu_kernel_threads(GPU_PARALLEL_SORT_BLOCK_SIZE)
ccl_gpu_kernel_signature(integrator_sort_write_pass,
int num_states,
int partition_size,
int num_states_limit,
ccl_global int *indices,
int kernel_index)
{
#if defined(__KERNEL_LOCAL_ATOMIC_SORT__)
int max_shaders = context.launch_params_metal.data.max_shaders;
ccl_global ushort *d_queued_kernel = (ccl_global ushort *)
kernel_integrator_state.path.queued_kernel;
ccl_global uint *d_shader_sort_key = (ccl_global uint *)
kernel_integrator_state.path.shader_sort_key;
ccl_global int *key_offsets = (ccl_global int *)
kernel_integrator_state.sort_partition_key_offsets;
gpu_parallel_sort_write_pass(num_states,
partition_size,
max_shaders,
kernel_index,
num_states_limit,
indices,
d_queued_kernel,
d_shader_sort_key,
key_offsets,
(threadgroup int *)threadgroup_array,
metal_local_id,
metal_local_size,
metal_grid_id);
#endif
}
ccl_gpu_kernel_postfix
ccl_gpu_kernel_threads(GPU_PARALLEL_ACTIVE_INDEX_DEFAULT_BLOCK_SIZE)
ccl_gpu_kernel_signature(integrator_compact_paths_array,
int num_states,
@@ -579,7 +645,7 @@ ccl_device_inline void kernel_gpu_film_convert_half_write(ccl_global uchar4 *rgb
const int y,
const half4 half_pixel)
{
/* Work around HIP issue with half float display, see T92972. */
/* Work around HIP issue with half float display, see #92972. */
#ifdef __KERNEL_HIP__
ccl_global half *out = ((ccl_global half *)rgba) + (rgba_offset + y * rgba_stride + x) * 4;
out[0] = half_pixel.x;

View File

@@ -178,7 +178,7 @@ __device__
simd_lane_index, \
simd_group_index, \
num_simd_groups, \
simdgroup_offset)
(threadgroup int *)threadgroup_array)
#elif defined(__KERNEL_ONEAPI__)
# define gpu_parallel_active_index_array(num_states, indices, num_indices, is_active_op) \

View File

@@ -19,6 +19,115 @@ CCL_NAMESPACE_BEGIN
# define GPU_PARALLEL_SORTED_INDEX_DEFAULT_BLOCK_SIZE 512
#endif
#define GPU_PARALLEL_SORTED_INDEX_INACTIVE_KEY (~0)
#define GPU_PARALLEL_SORT_BLOCK_SIZE 1024
#if defined(__KERNEL_LOCAL_ATOMIC_SORT__)
# define atomic_store_local(p, x) \
atomic_store_explicit((threadgroup atomic_int *)p, x, memory_order_relaxed)
# define atomic_load_local(p) \
atomic_load_explicit((threadgroup atomic_int *)p, memory_order_relaxed)
ccl_device_inline void gpu_parallel_sort_bucket_pass(const uint num_states,
const uint partition_size,
const uint max_shaders,
const uint queued_kernel,
ccl_global ushort *d_queued_kernel,
ccl_global uint *d_shader_sort_key,
ccl_global int *partition_key_offsets,
ccl_gpu_shared int *buckets,
const ushort local_id,
const ushort local_size,
const ushort grid_id)
{
/* Zero the bucket sizes. */
if (local_id < max_shaders) {
atomic_store_local(&buckets[local_id], 0);
}
ccl_gpu_syncthreads();
/* Determine bucket sizes within the partitions. */
const uint partition_start = partition_size * uint(grid_id);
const uint partition_end = min(num_states, partition_start + partition_size);
for (int state_index = partition_start + uint(local_id); state_index < partition_end;
state_index += uint(local_size)) {
ushort kernel_index = d_queued_kernel[state_index];
if (kernel_index == queued_kernel) {
uint key = d_shader_sort_key[state_index] % max_shaders;
atomic_fetch_and_add_uint32(&buckets[key], 1);
}
}
ccl_gpu_syncthreads();
/* Calculate the partition's local offsets from the prefix sum of bucket sizes. */
if (local_id == 0) {
int offset = 0;
for (int i = 0; i < max_shaders; i++) {
partition_key_offsets[i + uint(grid_id) * (max_shaders + 1)] = offset;
offset = offset + atomic_load_local(&buckets[i]);
}
/* Store the number of active states in this partition. */
partition_key_offsets[max_shaders + uint(grid_id) * (max_shaders + 1)] = offset;
}
}
ccl_device_inline void gpu_parallel_sort_write_pass(const uint num_states,
const uint partition_size,
const uint max_shaders,
const uint queued_kernel,
const int num_states_limit,
ccl_global int *indices,
ccl_global ushort *d_queued_kernel,
ccl_global uint *d_shader_sort_key,
ccl_global int *partition_key_offsets,
ccl_gpu_shared int *local_offset,
const ushort local_id,
const ushort local_size,
const ushort grid_id)
{
/* Calculate each partition's global offset from the prefix sum of the active state counts per
* partition. */
if (local_id < max_shaders) {
int partition_offset = 0;
for (int i = 0; i < uint(grid_id); i++) {
int partition_key_count = partition_key_offsets[max_shaders + uint(i) * (max_shaders + 1)];
partition_offset += partition_key_count;
}
ccl_global int *key_offsets = partition_key_offsets + (uint(grid_id) * (max_shaders + 1));
atomic_store_local(&local_offset[local_id], key_offsets[local_id] + partition_offset);
}
ccl_gpu_syncthreads();
/* Write the sorted active indices. */
const uint partition_start = partition_size * uint(grid_id);
const uint partition_end = min(num_states, partition_start + partition_size);
ccl_global int *key_offsets = partition_key_offsets + (uint(grid_id) * max_shaders);
for (int state_index = partition_start + uint(local_id); state_index < partition_end;
state_index += uint(local_size)) {
ushort kernel_index = d_queued_kernel[state_index];
if (kernel_index == queued_kernel) {
uint key = d_shader_sort_key[state_index] % max_shaders;
int index = atomic_fetch_and_add_uint32(&local_offset[key], 1);
if (index < num_states_limit) {
indices[index] = state_index;
}
}
}
}
#endif /* __KERNEL_LOCAL_ATOMIC_SORT__ */
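The two GPU passes amount to a per-partition counting sort. A host-side C++ sketch of the same idea on a single partition (illustrative only; the kernels do the counting with threadgroup atomics instead of a serial loop):

  #include <vector>

  std::vector<int> sort_indices(const std::vector<int> &keys, const int max_shaders)
  {
    /* Bucket pass: count states per shader key. */
    std::vector<int> buckets(max_shaders, 0);
    for (const int key : keys) {
      buckets[key]++;
    }
    /* Exclusive prefix sum gives each key's first output slot. */
    std::vector<int> offsets(max_shaders, 0);
    for (int i = 1; i < max_shaders; i++) {
      offsets[i] = offsets[i - 1] + buckets[i - 1];
    }
    /* Write pass: scatter state indices to their sorted positions. */
    std::vector<int> indices(keys.size());
    for (int state = 0; state < int(keys.size()); state++) {
      indices[offsets[keys[state]]++] = state;
    }
    return indices;
  }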
template<typename GetKeyOp>
__device__ void gpu_parallel_sorted_index_array(const uint state_index,

View File

@@ -172,17 +172,14 @@ ccl_device_intersect bool scene_intersect_local(KernelGlobals kg,
kernel_assert(!"Invalid ift_local");
return false;
}
# endif
metal::raytracing::ray r(ray->P, ray->D, ray->tmin, ray->tmax);
metalrt_intersector_type metalrt_intersect;
metalrt_intersect.force_opacity(metal::raytracing::forced_opacity::non_opaque);
bool triangle_only = !kernel_data.bvh.have_curves && !kernel_data.bvh.have_points;
if (triangle_only) {
metalrt_intersect.assume_geometry_type(metal::raytracing::geometry_type::triangle);
if (is_null_intersection_function_table(metal_ancillaries->ift_local_prim)) {
if (local_isect) {
local_isect->num_hits = 0;
}
kernel_assert(!"Invalid ift_local_prim");
return false;
}
# endif
MetalRTIntersectionLocalPayload payload;
payload.self = ray->self;
@@ -195,14 +192,48 @@ ccl_device_intersect bool scene_intersect_local(KernelGlobals kg,
}
payload.result = false;
typename metalrt_intersector_type::result_type intersection;
metal::raytracing::ray r(ray->P, ray->D, ray->tmin, ray->tmax);
# if defined(__METALRT_MOTION__)
metalrt_intersector_type metalrt_intersect;
typename metalrt_intersector_type::result_type intersection;
metalrt_intersect.force_opacity(metal::raytracing::forced_opacity::non_opaque);
bool triangle_only = !kernel_data.bvh.have_curves && !kernel_data.bvh.have_points;
if (triangle_only) {
metalrt_intersect.assume_geometry_type(metal::raytracing::geometry_type::triangle);
}
intersection = metalrt_intersect.intersect(
r, metal_ancillaries->accel_struct, 0xFF, ray->time, metal_ancillaries->ift_local, payload);
# else
metalrt_blas_intersector_type metalrt_intersect;
typename metalrt_blas_intersector_type::result_type intersection;
metalrt_intersect.force_opacity(metal::raytracing::forced_opacity::non_opaque);
bool triangle_only = !kernel_data.bvh.have_curves && !kernel_data.bvh.have_points;
if (triangle_only) {
metalrt_intersect.assume_geometry_type(metal::raytracing::geometry_type::triangle);
}
/* If we know we are going to get at most one hit, as for a random SSS walk,
 * we can optimize by accepting the first hit. */
if (max_hits == 1) {
metalrt_intersect.accept_any_intersection(true);
}
int blas_index = metal_ancillaries->blas_userID_to_index_lookUp[local_object];
/* Transform the ray into the object's local space. */
Transform itfm = kernel_data_fetch(objects, local_object).itfm;
r.origin = transform_point(&itfm, r.origin);
r.direction = transform_direction(&itfm, r.direction);
intersection = metalrt_intersect.intersect(
r, metal_ancillaries->accel_struct, 0xFF, metal_ancillaries->ift_local, payload);
r,
metal_ancillaries->blas_accel_structs[blas_index].blas,
metal_ancillaries->ift_local_prim,
payload);
# endif
if (lcg_state) {

View File

@@ -105,10 +105,11 @@ struct kernel_gpu_##name \
{ \
PARAMS_MAKER(__VA_ARGS__)(__VA_ARGS__) \
void run(thread MetalKernelContext& context, \
threadgroup int *simdgroup_offset, \
threadgroup atomic_int *threadgroup_array, \
const uint metal_global_id, \
const ushort metal_local_id, \
const ushort metal_local_size, \
const ushort metal_grid_id, \
uint simdgroup_size, \
uint simd_lane_index, \
uint simd_group_index, \
@@ -117,22 +118,24 @@ struct kernel_gpu_##name \
kernel void cycles_metal_##name(device const kernel_gpu_##name *params_struct, \
constant KernelParamsMetal &ccl_restrict _launch_params_metal, \
constant MetalAncillaries *_metal_ancillaries, \
threadgroup int *simdgroup_offset[[ threadgroup(0) ]], \
threadgroup atomic_int *threadgroup_array[[ threadgroup(0) ]], \
const uint metal_global_id [[thread_position_in_grid]], \
const ushort metal_local_id [[thread_position_in_threadgroup]], \
const ushort metal_local_size [[threads_per_threadgroup]], \
const ushort metal_grid_id [[threadgroup_position_in_grid]], \
uint simdgroup_size [[threads_per_simdgroup]], \
uint simd_lane_index [[thread_index_in_simdgroup]], \
uint simd_group_index [[simdgroup_index_in_threadgroup]], \
uint num_simd_groups [[simdgroups_per_threadgroup]]) { \
MetalKernelContext context(_launch_params_metal, _metal_ancillaries); \
params_struct->run(context, simdgroup_offset, metal_global_id, metal_local_id, metal_local_size, simdgroup_size, simd_lane_index, simd_group_index, num_simd_groups); \
params_struct->run(context, threadgroup_array, metal_global_id, metal_local_id, metal_local_size, metal_grid_id, simdgroup_size, simd_lane_index, simd_group_index, num_simd_groups); \
} \
void kernel_gpu_##name::run(thread MetalKernelContext& context, \
threadgroup int *simdgroup_offset, \
threadgroup atomic_int *threadgroup_array, \
const uint metal_global_id, \
const ushort metal_local_id, \
const ushort metal_local_size, \
const ushort metal_grid_id, \
uint simdgroup_size, \
uint simd_lane_index, \
uint simd_group_index, \
@@ -263,13 +266,25 @@ ccl_device_forceinline uchar4 make_uchar4(const uchar x,
# if defined(__METALRT_MOTION__)
# define METALRT_TAGS instancing, instance_motion, primitive_motion
# define METALRT_BLAS_TAGS , primitive_motion
# else
# define METALRT_TAGS instancing
# define METALRT_BLAS_TAGS
# endif /* __METALRT_MOTION__ */
typedef acceleration_structure<METALRT_TAGS> metalrt_as_type;
typedef intersection_function_table<triangle_data, METALRT_TAGS> metalrt_ift_type;
typedef metal::raytracing::intersector<triangle_data, METALRT_TAGS> metalrt_intersector_type;
# if defined(__METALRT_MOTION__)
typedef acceleration_structure<primitive_motion> metalrt_blas_as_type;
typedef intersection_function_table<triangle_data, primitive_motion> metalrt_blas_ift_type;
typedef metal::raytracing::intersector<triangle_data, primitive_motion>
metalrt_blas_intersector_type;
# else
typedef acceleration_structure<> metalrt_blas_as_type;
typedef intersection_function_table<triangle_data> metalrt_blas_ift_type;
typedef metal::raytracing::intersector<triangle_data> metalrt_blas_intersector_type;
# endif
#endif /* __METALRT__ */
@@ -282,6 +297,12 @@ struct Texture3DParamsMetal {
texture3d<float, access::sample> tex;
};
#ifdef __METALRT__
struct MetalRTBlasWrapper {
metalrt_blas_as_type blas;
};
#endif
struct MetalAncillaries {
device Texture2DParamsMetal *textures_2d;
device Texture3DParamsMetal *textures_3d;
@@ -291,6 +312,9 @@ struct MetalAncillaries {
metalrt_ift_type ift_default;
metalrt_ift_type ift_shadow;
metalrt_ift_type ift_local;
metalrt_blas_ift_type ift_local_prim;
constant MetalRTBlasWrapper *blas_accel_structs;
constant int *blas_userID_to_index_lookUp;
#endif
};

View File

@@ -139,6 +139,20 @@ TReturn metalrt_local_hit(constant KernelParamsMetal &launch_params_metal,
#endif
}
[[intersection(triangle, triangle_data )]] TriangleIntersectionResult
__anyhit__cycles_metalrt_local_hit_tri_prim(
constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
ray_data MetalKernelContext::MetalRTIntersectionLocalPayload &payload [[payload]],
uint primitive_id [[primitive_id]],
float2 barycentrics [[barycentric_coord]],
float ray_tmax [[distance]])
{
/* The instance_id (aka user_id) parameter has been removed: SSS traversal is
 * optimized to start from the object's primitive acceleration structure
 * instead of the root of the global AS, so the correct object is always
 * intersected and no user_id check is needed. */
return metalrt_local_hit<TriangleIntersectionResult, METALRT_HIT_TRIANGLE>(
launch_params_metal, payload, payload.local_object, primitive_id, barycentrics, ray_tmax);
}
[[intersection(triangle, triangle_data, METALRT_TAGS)]] TriangleIntersectionResult
__anyhit__cycles_metalrt_local_hit_tri(
constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
@@ -163,6 +177,17 @@ __anyhit__cycles_metalrt_local_hit_box(const float ray_tmax [[max_distance]])
return result;
}
[[intersection(bounding_box, triangle_data )]] BoundingBoxIntersectionResult
__anyhit__cycles_metalrt_local_hit_box_prim(const float ray_tmax [[max_distance]])
{
/* Unused function. */
BoundingBoxIntersectionResult result;
result.distance = ray_tmax;
result.accept = false;
result.continue_search = false;
return result;
}
template<uint intersection_type>
bool metalrt_shadow_all_hit(constant KernelParamsMetal &launch_params_metal,
ray_data MetalKernelContext::MetalRTIntersectionShadowPayload &payload,

View File

@@ -372,6 +372,16 @@ bool oneapi_enqueue_kernel(KernelContext *kernel_context,
kg, cgh, global_size, local_size, args, oneapi_kernel_integrator_sorted_paths_array);
break;
}
case DEVICE_KERNEL_INTEGRATOR_SORT_BUCKET_PASS: {
oneapi_call(
kg, cgh, global_size, local_size, args, oneapi_kernel_integrator_sort_bucket_pass);
break;
}
case DEVICE_KERNEL_INTEGRATOR_SORT_WRITE_PASS: {
oneapi_call(
kg, cgh, global_size, local_size, args, oneapi_kernel_integrator_sort_write_pass);
break;
}
case DEVICE_KERNEL_INTEGRATOR_COMPACT_PATHS_ARRAY: {
oneapi_call(kg,
cgh,

View File

@@ -132,6 +132,9 @@ typedef struct IntegratorStateGPU {
/* Index of main path which will be used by a next shadow catcher split. */
ccl_global int *next_main_path_index;
/* Partition/key offsets used when writing sorted active indices. */
ccl_global int *sort_partition_key_offsets;
/* Divisor used to partition active indices by locality when sorting by material. */
uint sort_partition_divisor;
} IntegratorStateGPU;

View File

@@ -115,6 +115,13 @@ ccl_device_forceinline void integrator_path_init_sorted(KernelGlobals kg,
atomic_fetch_and_add_uint32(&kernel_integrator_state.queue_counter->num_queued[next_kernel], 1);
INTEGRATOR_STATE_WRITE(state, path, queued_kernel) = next_kernel;
INTEGRATOR_STATE_WRITE(state, path, shader_sort_key) = key_;
# if defined(__KERNEL_LOCAL_ATOMIC_SORT__)
if (!kernel_integrator_state.sort_key_counter[next_kernel]) {
return;
}
# endif
atomic_fetch_and_add_uint32(&kernel_integrator_state.sort_key_counter[next_kernel][key_], 1);
}
@@ -130,6 +137,13 @@ ccl_device_forceinline void integrator_path_next_sorted(KernelGlobals kg,
atomic_fetch_and_add_uint32(&kernel_integrator_state.queue_counter->num_queued[next_kernel], 1);
INTEGRATOR_STATE_WRITE(state, path, queued_kernel) = next_kernel;
INTEGRATOR_STATE_WRITE(state, path, shader_sort_key) = key_;
# if defined(__KERNEL_LOCAL_ATOMIC_SORT__)
if (!kernel_integrator_state.sort_key_counter[next_kernel]) {
return;
}
# endif
atomic_fetch_and_add_uint32(&kernel_integrator_state.sort_key_counter[next_kernel][key_], 1);
}

View File

@@ -209,14 +209,7 @@ ccl_device void osl_closure_microfacet_setup(KernelGlobals kg,
if (closure->distribution == make_string("ggx", 11253504724482777663ull) ||
closure->distribution == make_string("default", 4430693559278735917ull)) {
if (!closure->refract) {
if (closure->alpha_x == closure->alpha_y) {
/* Isotropic */
sd->flag |= bsdf_microfacet_ggx_isotropic_setup(bsdf);
}
else {
/* Anisotropic */
sd->flag |= bsdf_microfacet_ggx_setup(bsdf);
}
sd->flag |= bsdf_microfacet_ggx_setup(bsdf);
}
else {
sd->flag |= bsdf_microfacet_ggx_refraction_setup(bsdf);
@@ -225,14 +218,7 @@ ccl_device void osl_closure_microfacet_setup(KernelGlobals kg,
/* Beckmann */
else {
if (!closure->refract) {
if (closure->alpha_x == closure->alpha_y) {
/* Isotropic */
sd->flag |= bsdf_microfacet_beckmann_isotropic_setup(bsdf);
}
else {
/* Anisotropic */
sd->flag |= bsdf_microfacet_beckmann_setup(bsdf);
}
sd->flag |= bsdf_microfacet_beckmann_setup(bsdf);
}
else {
sd->flag |= bsdf_microfacet_beckmann_refraction_setup(bsdf);
@@ -258,9 +244,9 @@ ccl_device void osl_closure_microfacet_ggx_setup(
}
bsdf->N = ensure_valid_reflection(sd->Ng, sd->wi, closure->N);
bsdf->alpha_x = closure->alpha_x;
bsdf->alpha_x = bsdf->alpha_y = closure->alpha_x;
sd->flag |= bsdf_microfacet_ggx_isotropic_setup(bsdf);
sd->flag |= bsdf_microfacet_ggx_setup(bsdf);
}
ccl_device void osl_closure_microfacet_ggx_aniso_setup(
@@ -652,9 +638,9 @@ ccl_device void osl_closure_microfacet_beckmann_setup(
}
bsdf->N = ensure_valid_reflection(sd->Ng, sd->wi, closure->N);
bsdf->alpha_x = closure->alpha_x;
bsdf->alpha_x = bsdf->alpha_y = closure->alpha_x;
sd->flag |= bsdf_microfacet_beckmann_isotropic_setup(bsdf);
sd->flag |= bsdf_microfacet_beckmann_setup(bsdf);
}
ccl_device void osl_closure_microfacet_beckmann_aniso_setup(

View File

@@ -74,7 +74,8 @@ CCL_NAMESPACE_BEGIN
#define __VOLUME__
/* TODO: solve internal compiler errors and enable light tree on HIP. */
#ifdef __KERNEL_HIP__
/* TODO: solve internal compiler perf issue and enable light tree on Metal/AMD. */
#if defined(__KERNEL_HIP__) || defined(__KERNEL_METAL_AMD__)
# undef __LIGHT_TREE__
#endif
@@ -1508,6 +1509,8 @@ typedef enum DeviceKernel : int {
DEVICE_KERNEL_INTEGRATOR_ACTIVE_PATHS_ARRAY,
DEVICE_KERNEL_INTEGRATOR_TERMINATED_PATHS_ARRAY,
DEVICE_KERNEL_INTEGRATOR_SORTED_PATHS_ARRAY,
DEVICE_KERNEL_INTEGRATOR_SORT_BUCKET_PASS,
DEVICE_KERNEL_INTEGRATOR_SORT_WRITE_PASS,
DEVICE_KERNEL_INTEGRATOR_COMPACT_PATHS_ARRAY,
DEVICE_KERNEL_INTEGRATOR_COMPACT_STATES,
DEVICE_KERNEL_INTEGRATOR_TERMINATED_SHADOW_PATHS_ARRAY,

View File

@@ -13,6 +13,7 @@
#include "scene/light.h"
#include "scene/mesh.h"
#include "scene/object.h"
#include "scene/osl.h"
#include "scene/pointcloud.h"
#include "scene/scene.h"
#include "scene/shader.h"
@@ -25,7 +26,6 @@
#ifdef WITH_OSL
# include "kernel/osl/globals.h"
# include "kernel/osl/services.h"
#endif
#include "util/foreach.h"
@@ -1717,20 +1717,7 @@ void GeometryManager::device_update_displacement_images(Device *device,
/* If any OSL node is used for displacement, it may reference a texture. But it's
* unknown which ones, so have to load them all. */
if (has_osl_node) {
set<OSLRenderServices *> services_shared;
device->foreach_device([&services_shared](Device *sub_device) {
OSLGlobals *og = (OSLGlobals *)sub_device->get_cpu_osl_memory();
services_shared.insert(og->services);
});
for (OSLRenderServices *services : services_shared) {
for (auto it = services->textures.begin(); it != services->textures.end(); ++it) {
if (it->second->handle.get_manager() == image_manager) {
const int slot = it->second->handle.svm_slot();
bump_images.insert(slot);
}
}
}
OSLShaderManager::osl_image_slots(device, image_manager, bump_images);
}
#endif

View File

@@ -394,7 +394,7 @@ bool OSLShaderManager::osl_compile(const string &inputfile, const string &output
/* Compile.
*
* Mutex protected because the OSL compiler does not appear to be thread safe, see T92503. */
* Mutex protected because the OSL compiler does not appear to be thread safe, see #92503. */
static thread_mutex osl_compiler_mutex;
thread_scoped_lock lock(osl_compiler_mutex);
@@ -665,6 +665,27 @@ OSLNode *OSLShaderManager::osl_node(ShaderGraph *graph,
return node;
}
/* Static function, so only this file needs to be compiled with RTTI. */
void OSLShaderManager::osl_image_slots(Device *device,
ImageManager *image_manager,
set<int> &image_slots)
{
set<OSLRenderServices *> services_shared;
device->foreach_device([&services_shared](Device *sub_device) {
OSLGlobals *og = (OSLGlobals *)sub_device->get_cpu_osl_memory();
services_shared.insert(og->services);
});
for (OSLRenderServices *services : services_shared) {
for (auto it = services->textures.begin(); it != services->textures.end(); ++it) {
if (it->second->handle.get_manager() == image_manager) {
const int slot = it->second->handle.svm_slot();
image_slots.insert(slot);
}
}
}
}
/* Graph Compiler */
OSLCompiler::OSLCompiler(OSLShaderManager *manager, OSL::ShadingSystem *ss, Scene *scene)

View File

@@ -92,6 +92,9 @@ class OSLShaderManager : public ShaderManager {
const std::string &bytecode_hash = "",
const std::string &bytecode = "");
/* Get image slots used by OSL services on device. */
static void osl_image_slots(Device *device, ImageManager *image_manager, set<int> &image_slots);
private:
void texture_system_init();
void texture_system_free();

View File

@@ -573,7 +573,7 @@ void ShaderManager::device_update_common(Device * /*device*/,
kfilm->is_rec709 = is_rec709;
}
void ShaderManager::device_free_common(Device *, DeviceScene *dscene, Scene *scene)
void ShaderManager::device_free_common(Device * /*device*/, DeviceScene *dscene, Scene * /*scene*/)
{
dscene->shaders.free();
}

View File

@@ -520,7 +520,7 @@ bool TileManager::write_tile(const RenderBuffers &tile_buffers)
/* If there is an overscan used for the tile, copy the pixels into a single continuous block of
 * memory without any "gaps".
* This is a workaround for bug in OIIO (https://github.com/OpenImageIO/oiio/pull/3176).
* Our task reference: T93008. */
* Our task reference: #93008. */
if (tile_params.window_x || tile_params.window_y ||
tile_params.window_width != tile_params.width ||
tile_params.window_height != tile_params.height) {

View File

@@ -73,16 +73,55 @@ ccl_device_inline float atomic_add_and_fetch_float(volatile ccl_global float *_s
return new_value.float_value;
}
# define atomic_fetch_and_add_uint32(p, x) \
atomic_fetch_add_explicit((device atomic_uint *)p, x, memory_order_relaxed)
# define atomic_fetch_and_sub_uint32(p, x) \
atomic_fetch_sub_explicit((device atomic_uint *)p, x, memory_order_relaxed)
# define atomic_fetch_and_inc_uint32(p) \
atomic_fetch_add_explicit((device atomic_uint *)p, 1, memory_order_relaxed)
# define atomic_fetch_and_dec_uint32(p) \
atomic_fetch_sub_explicit((device atomic_uint *)p, 1, memory_order_relaxed)
# define atomic_fetch_and_or_uint32(p, x) \
atomic_fetch_or_explicit((device atomic_uint *)p, x, memory_order_relaxed)
template<class T> ccl_device_inline uint32_t atomic_fetch_and_add_uint32(device T *p, int x)
{
return atomic_fetch_add_explicit((device atomic_uint *)p, x, memory_order_relaxed);
}
template<class T> ccl_device_inline uint32_t atomic_fetch_and_sub_uint32(device T *p, int x)
{
return atomic_fetch_sub_explicit((device atomic_uint *)p, x, memory_order_relaxed);
}
template<class T> ccl_device_inline uint32_t atomic_fetch_and_inc_uint32(device T *p)
{
return atomic_fetch_add_explicit((device atomic_uint *)p, 1, memory_order_relaxed);
}
template<class T> ccl_device_inline uint32_t atomic_fetch_and_dec_uint32(device T *p)
{
return atomic_fetch_sub_explicit((device atomic_uint *)p, 1, memory_order_relaxed);
}
template<class T> ccl_device_inline uint32_t atomic_fetch_and_or_uint32(device T *p, int x)
{
return atomic_fetch_or_explicit((device atomic_uint *)p, x, memory_order_relaxed);
}
template<class T> ccl_device_inline uint32_t atomic_fetch_and_add_uint32(threadgroup T *p, int x)
{
return atomic_fetch_add_explicit((threadgroup atomic_uint *)p, x, memory_order_relaxed);
}
template<class T> ccl_device_inline uint32_t atomic_fetch_and_sub_uint32(threadgroup T *p, int x)
{
return atomic_fetch_sub_explicit((threadgroup atomic_uint *)p, x, memory_order_relaxed);
}
template<class T> ccl_device_inline uint32_t atomic_fetch_and_inc_uint32(threadgroup T *p)
{
return atomic_fetch_add_explicit((threadgroup atomic_uint *)p, 1, memory_order_relaxed);
}
template<class T> ccl_device_inline uint32_t atomic_fetch_and_dec_uint32(threadgroup T *p)
{
return atomic_fetch_sub_explicit((threadgroup atomic_uint *)p, 1, memory_order_relaxed);
}
template<class T> ccl_device_inline uint32_t atomic_fetch_and_or_uint32(threadgroup T *p, int x)
{
return atomic_fetch_or_explicit((threadgroup atomic_uint *)p, x, memory_order_relaxed);
}
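Because overload resolution now picks the address space, call sites stay unchanged whether the counter lives in device or threadgroup memory. A sketch (the declarations are hypothetical kernel context, not from this patch):

  /* Inside a Metal kernel (hypothetical declarations): */
  threadgroup int buckets[64];
  device int *counters;
  atomic_fetch_and_add_uint32(&buckets[key], 1);  /* threadgroup overload */
  atomic_fetch_and_add_uint32(&counters[key], 1); /* device overload */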
ccl_device_inline float atomic_compare_and_swap_float(volatile ccl_global float *dest,
const float old_val,

View File

@@ -69,6 +69,9 @@ void DebugFlags::Metal::reset()
{
if (getenv("CYCLES_METAL_ADAPTIVE_COMPILE") != NULL)
adaptive_compile = true;
if (auto str = getenv("CYCLES_METAL_LOCAL_ATOMIC_SORT"))
use_local_atomic_sort = (atoi(str) != 0);
}
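This makes the sort path switchable at runtime without rebuilding; a usage sketch (assumed invocation):

  /* Disable the local-atomic sort for a single run:
   *   $ CYCLES_METAL_LOCAL_ATOMIC_SORT=0 blender
   * atoi("0") yields 0, so use_local_atomic_sort is set to false; any
   * non-zero value, or leaving the variable unset, keeps it enabled. */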
DebugFlags::OptiX::OptiX()

View File

@@ -97,6 +97,9 @@ class DebugFlags {
/* Whether adaptive feature based runtime compile is enabled or not. */
bool adaptive_compile = false;
/* Whether local atomic sorting is enabled or not. */
bool use_local_atomic_sort = true;
};
/* Get instance of debug flags registry. */

View File

@@ -74,7 +74,7 @@ ccl_device float fast_sinf(float x)
*
* Results on: [-2pi,2pi].
*
* Examined 2173837240 values of sin: 0.00662760244 avg ulp diff, 2 max ulp,
* Examined 2173837240 values of sin: 0.00662760244 avg ULP diff, 2 max ULP,
* 1.19209e-07 max error
*/
int q = fast_rint(x * M_1_PI_F);
@@ -256,11 +256,11 @@ ccl_device float fast_acosf(float x)
/* clamp and crush denormals. */
const float m = (f < 1.0f) ? 1.0f - (1.0f - f) : 1.0f;
/* Based on http://www.pouet.net/topic.php?which=9132&page=2
* 85% accurate (ulp 0)
* 85% accurate (ULP 0)
* Examined 2130706434 values of acos:
* 15.2000597 avg ulp diff, 4492 max ulp, 4.51803e-05 max error // without "denormal crush"
* 15.2000597 avg ULP diff, 4492 max ULP, 4.51803e-05 max error // without "denormal crush"
* Examined 2130706434 values of acos:
* 15.2007108 avg ulp diff, 4492 max ulp, 4.51803e-05 max error // with "denormal crush"
* 15.2007108 avg ULP diff, 4492 max ULP, 4.51803e-05 max error // with "denormal crush"
*/
const float a = sqrtf(1.0f - m) *
(1.5707963267f + m * (-0.213300989f + m * (0.077980478f + m * -0.02164095f)));
@@ -270,9 +270,8 @@ ccl_device float fast_acosf(float x)
ccl_device float fast_asinf(float x)
{
/* Based on acosf approximation above.
* Max error is 4.51133e-05 (ulps are higher because we are consistently off
* by a little amount).
*/
* Max error is 4.51133e-05 (ULPs are higher because we are consistently off
* by a little amount). */
const float f = fabsf(x);
/* Clamp and crush denormals. */
const float m = (f < 1.0f) ? 1.0f - (1.0f - f) : 1.0f;
@@ -290,9 +289,9 @@ ccl_device float fast_atanf(float x)
const float t = s * s;
/* http://mathforum.org/library/drmath/view/62672.html
* Examined 4278190080 values of atan:
* 2.36864877 avg ulp diff, 302 max ulp, 6.55651e-06 max error // (with denormals)
* 2.36864877 avg ULP diff, 302 max ULP, 6.55651e-06 max error // (with denormals)
* Examined 4278190080 values of atan:
* 171160502 avg ulp diff, 855638016 max ulp, 6.55651e-06 max error // (crush denormals)
* 171160502 avg ULP diff, 855638016 max ULP, 6.55651e-06 max error // (crush denormals)
*/
float r = s * madd(0.43157974f, t, 1.0f) / madd(madd(0.05831938f, t, 0.76443945f), t, 1.0f);
if (a > 1.0f) {
@@ -343,8 +342,8 @@ ccl_device float fast_log2f(float x)
int exponent = (int)(bits >> 23) - 127;
float f = __uint_as_float((bits & 0x007FFFFF) | 0x3f800000) - 1.0f;
/* Examined 2130706432 values of log2 on [1.17549435e-38,3.40282347e+38]:
* 0.0797524457 avg ulp diff, 3713596 max ulp, 7.62939e-06 max error.
* ulp histogram:
* 0.0797524457 avg ULP diff, 3713596 max ULP, 7.62939e-06 max error.
* ULP histogram:
* 0 = 97.46%
* 1 = 2.29%
* 2 = 0.11%
@@ -363,7 +362,7 @@ ccl_device float fast_log2f(float x)
ccl_device_inline float fast_logf(float x)
{
/* Examined 2130706432 values of logf on [1.17549435e-38,3.40282347e+38]:
* 0.313865375 avg ulp diff, 5148137 max ulp, 7.62939e-06 max error.
* 0.313865375 avg ULP diff, 5148137 max ULP, 7.62939e-06 max error.
*/
return fast_log2f(x) * M_LN2_F;
}
@@ -371,7 +370,7 @@ ccl_device_inline float fast_logf(float x)
ccl_device_inline float fast_log10(float x)
{
/* Examined 2130706432 values of log10f on [1.17549435e-38,3.40282347e+38]:
* 0.631237033 avg ulp diff, 4471615 max ulp, 3.8147e-06 max error.
* 0.631237033 avg ULP diff, 4471615 max ULP, 3.8147e-06 max error.
*/
return fast_log2f(x) * M_LN2_F / M_LN10_F;
}
@@ -392,12 +391,12 @@ ccl_device float fast_exp2f(float x)
/* Range reduction. */
int m = (int)x;
x -= m;
x = 1.0f - (1.0f - x); /* Crush denormals (does not affect max ulps!). */
x = 1.0f - (1.0f - x); /* Crush denormals (does not affect max ULPs!). */
/* 5th degree polynomial generated with sollya
* Examined 2247622658 values of exp2 on [-126,126]: 2.75764912 avg ulp diff,
* 232 max ulp.
* Examined 2247622658 values of exp2 on [-126,126]: 2.75764912 avg ULP diff,
* 232 max ULP.
*
* ulp histogram:
* ULP histogram:
* 0 = 87.81%
* 1 = 4.18%
*/
@@ -415,14 +414,14 @@ ccl_device float fast_exp2f(float x)
ccl_device_inline float fast_expf(float x)
{
/* Examined 2237485550 values of exp on [-87.3300018,87.3300018]:
* 2.6666452 avg ulp diff, 230 max ulp.
* 2.6666452 avg ULP diff, 230 max ULP.
*/
return fast_exp2f(x / M_LN2_F);
}
#if !defined(__KERNEL_GPU__) && !defined(_MSC_VER)
/* MSVC seems to have a code-gen bug here in at least SSE41/AVX, see
* T78047 and T78869 for details. Just disable for now, it only makes
* #78047 and #78869 for details. Just disable for now, it only makes
* a small difference in denoising performance. */
ccl_device float4 fast_exp2f4(float4 x)
{
@@ -454,7 +453,7 @@ ccl_device_inline float4 fast_expf4(float4 x)
ccl_device_inline float fast_exp10(float x)
{
/* Examined 2217701018 values of exp10 on [-37.9290009,37.9290009]:
* 2.71732409 avg ulp diff, 232 max ulp.
* 2.71732409 avg ULP diff, 232 max ULP.
*/
return fast_exp2f(x * M_LN10_F / M_LN2_F);
}
@@ -475,7 +474,7 @@ ccl_device float fast_sinhf(float x)
float a = fabsf(x);
if (a > 1.0f) {
/* Examined 53389559 values of sinh on [1,87.3300018]:
* 33.6886442 avg ulp diff, 178 max ulp. */
* 33.6886442 avg ULP diff, 178 max ULP. */
float e = fast_expf(a);
return copysignf(0.5f * e - 0.5f / e, x);
}
@@ -495,7 +494,7 @@ ccl_device float fast_sinhf(float x)
ccl_device_inline float fast_coshf(float x)
{
/* Examined 2237485550 values of cosh on [-87.3300018,87.3300018]:
* 1.78256726 avg ulp diff, 178 max ulp.
* 1.78256726 avg ULP diff, 178 max ULP.
*/
float e = fast_expf(fabsf(x));
return 0.5f * e + 0.5f / e;
@@ -506,7 +505,7 @@ ccl_device_inline float fast_tanhf(float x)
/* Examined 4278190080 values of tanh on [-3.40282347e+38,3.40282347e+38]:
* 3.12924e-06 max error.
*/
/* NOTE: ulp error is high because of sub-optimal handling around the origin. */
/* NOTE: ULP error is high because of sub-optimal handling around the origin. */
float e = fast_expf(2.0f * fabsf(x));
return copysignf(1.0f - 2.0f / (1.0f + e), x);
}
@@ -579,7 +578,7 @@ ccl_device_inline float fast_erfcf(float x)
{
/* Examined 2164260866 values of erfcf on [-4,4]: 1.90735e-06 max error.
*
* ulp histogram:
* ULP histogram:
*
* 0 = 80.30%
*/

View File

@@ -516,7 +516,7 @@ GHOST_TSuccess GHOST_ContextCGL::releaseNativeHandles()
/* OpenGL on Metal
*
* Use Metal layer to avoid Viewport lagging on macOS, see T60043. */
* Use Metal layer to avoid Viewport lagging on macOS, see #60043. */
static const MTLPixelFormat METAL_FRAMEBUFFERPIXEL_FORMAT = MTLPixelFormatBGRA8Unorm;
static const OSType METAL_CORE_VIDEO_PIXEL_FORMAT = kCVPixelFormatType_32BGRA;

View File

@@ -141,7 +141,7 @@ GHOST_TSuccess GHOST_ContextGLX::initializeDrawingContext()
/* -------------------------------------------------------------------- */
#else
/* Important to initialize only GLXEW (_not_ GLEW),
* since this breaks w/ Mesa's `swrast`, see: T46431. */
* since this breaks w/ Mesa's `swrast`, see: #46431. */
glxewInit();
#endif /* USE_GLXEW_INIT_WORKAROUND */

View File

@@ -302,7 +302,7 @@ bool GHOST_NDOFManager::setDevice(ushort vendor_id, ushort product_id)
switch (product_id) {
case 0xC62E: /* Plugged in. */
case 0xC62F: /* Wireless. */
case 0xC658: /* Wireless (3DConnexion Universal Wireless Receiver in WIN32), see T82412. */
case 0xC658: /* Wireless (3DConnexion Universal Wireless Receiver in WIN32), see #82412. */
{
device_type_ = NDOF_SpaceMouseWireless;
hid_map_button_num_ = 2;
@@ -341,7 +341,7 @@ bool GHOST_NDOFManager::setDevice(ushort vendor_id, ushort product_id)
hid_map_button_mask_ = int(~(UINT_MAX << hid_map_button_num_));
}
CLOG_INFO(LOG, 2, "%d buttons -> hex:%X", hid_map_button_num_, (uint)hid_map_button_mask_);
CLOG_INFO(LOG, 2, "%d buttons -> hex:%X", hid_map_button_num_, uint(hid_map_button_mask_));
return device_type_ != NDOF_UnknownDevice;
}
@@ -445,14 +445,14 @@ void GHOST_NDOFManager::updateButton(int button_number, bool press, uint64_t tim
2,
"button=%d, press=%d (out of range %d, ignoring!)",
button_number,
(int)press,
int(press),
hid_map_button_num_);
return;
}
const NDOF_ButtonT button = hid_map_[button_number];
if (button == NDOF_BUTTON_NONE) {
CLOG_INFO(
LOG, 2, "button=%d, press=%d (mapped to none, ignoring!)", button_number, (int)press);
LOG, 2, "button=%d, press=%d (mapped to none, ignoring!)", button_number, int(press));
return;
}
@@ -460,7 +460,7 @@ void GHOST_NDOFManager::updateButton(int button_number, bool press, uint64_t tim
2,
"button=%d, press=%d, name=%s",
button_number,
(int)press,
int(press),
ndof_button_names[button]);
GHOST_IWindow *window = system_.getWindowManager()->getActiveWindow();

View File

@@ -602,7 +602,7 @@ void GHOST_SystemSDL::processEvent(SDL_Event *sdl_event)
/* NOTE: the `sdl_sub_evt.keysym.sym` is truncated,
* for unicode support ghost has to be modified. */
/* TODO(@campbellbarton): support full unicode, SDL supports this but it needs to be
/* TODO(@ideasman42): support full unicode, SDL supports this but it needs to be
* explicitly enabled via #SDL_StartTextInput which GHOST would have to wrap. */
char utf8_buf[sizeof(GHOST_TEventKeyData::utf8_buf)] = {'\0'};
if (type == GHOST_kEventKeyDown) {

View File

@@ -82,6 +82,8 @@
#include "CLG_log.h"
#ifdef USE_EVENT_BACKGROUND_THREAD
# include "GHOST_TimerTask.h"
# include <pthread.h>
#endif
@@ -139,7 +141,7 @@ constexpr size_t events_pending_default_size = 4096 / sizeof(void *);
* \{ */
/**
* GNOME (mutter 42.2 had a bug with confine not respecting scale - Hi-DPI), See: T98793.
* GNOME (mutter 42.2 had a bug with confine not respecting scale - Hi-DPI), See: #98793.
* Even though this has been fixed, at time of writing it's not yet in a release.
* Workaround the problem by implementing confine with a software cursor.
* While this isn't ideal, it's not adding a lot of overhead as software
@@ -174,7 +176,7 @@ static bool use_gnome_confine_hack = false;
/**
* KDE (plasma 5.26.1) has a bug where the cursor surface needs to be committed
* (via `wl_surface_commit`) when it was hidden and is being set to visible again, see: T102048.
* (via `wl_surface_commit`) when it was hidden and is being set to visible again, see: #102048.
* See: https://bugs.kde.org/show_bug.cgi?id=461001
*/
#define USE_KDE_TABLET_HIDDEN_CURSOR_HACK
@@ -195,8 +197,8 @@ static bool use_gnome_confine_hack = false;
* \{ */
/**
* Fix short-cut part of keyboard reading code not properly handling some keys, see: T102194.
* \note This is similar to X11 workaround by the same name, see: T47228.
* Fix short-cut part of keyboard reading code not properly handling some keys, see: #102194.
* \note This is similar to X11 workaround by the same name, see: #47228.
*/
#define USE_NON_LATIN_KB_WORKAROUND
@@ -239,7 +241,7 @@ enum {
BTN_STYLUS = 0x14b,
/** Use as right-mouse. */
BTN_STYLUS2 = 0x14c,
/** NOTE(@campbellbarton): Map to an additional button (not sure which hardware uses this). */
/** NOTE(@ideasman42): Map to an additional button (not sure which hardware uses this). */
BTN_STYLUS3 = 0x149,
};
@@ -768,7 +770,12 @@ struct GWL_Seat {
int32_t rate = 0;
/** Time (milliseconds) after which to start repeating keys. */
int32_t delay = 0;
/** Timer for key repeats. */
/**
* Timer for key repeats.
*
* \note For as long as #USE_EVENT_BACKGROUND_THREAD is defined, any access to this
* (including null checks) must lock `timer_mutex` first.
*/
GHOST_ITimerTask *timer = nullptr;
} key_repeat;
@@ -832,6 +839,42 @@ static bool gwl_seat_key_depressed_suppress_warning(const GWL_Seat *seat)
return suppress_warning;
}
/**
* \note Caller must lock `timer_mutex`.
*/
static void gwl_seat_key_repeat_timer_add(GWL_Seat *seat,
GHOST_TimerProcPtr key_repeat_fn,
GHOST_TUserDataPtr payload,
const bool use_delay)
{
GHOST_SystemWayland *system = seat->system;
const uint64_t time_step = 1000 / seat->key_repeat.rate;
const uint64_t time_start = use_delay ? seat->key_repeat.delay : time_step;
#ifdef USE_EVENT_BACKGROUND_THREAD
GHOST_TimerTask *timer = new GHOST_TimerTask(
system->getMilliSeconds() + time_start, time_step, key_repeat_fn, payload);
seat->key_repeat.timer = timer;
system->ghost_timer_manager()->addTimer(timer);
#else
seat->key_repeat.timer = system->installTimer(time_start, time_step, key_repeat_fn, payload);
#endif
}
/**
* \note The caller must lock `timer_mutex`.
*/
static void gwl_seat_key_repeat_timer_remove(GWL_Seat *seat)
{
GHOST_SystemWayland *system = seat->system;
#ifdef USE_EVENT_BACKGROUND_THREAD
system->ghost_timer_manager()->removeTimer(
static_cast<GHOST_TimerTask *>(seat->key_repeat.timer));
#else
system->removeTimer(seat->key_repeat.timer);
#endif
seat->key_repeat.timer = nullptr;
}
/** \} */
/* -------------------------------------------------------------------- */
@@ -873,7 +916,7 @@ struct GWL_Display {
* The main purpose of having an active seat is an alternative from always using the first
* seat which prevents events from any other seat.
*
* NOTE(@campbellbarton): This could be extended and developed further extended to support
* NOTE(@ideasman42): This could be extended and developed further extended to support
* an active seat per window (for e.g.), basic support is sufficient for now as currently isn't
* a widely used feature.
*/
@@ -906,6 +949,16 @@ struct GWL_Display {
/** Guard against multiple threads accessing `events_pending` at once. */
std::mutex events_pending_mutex;
/**
* A separate timer queue, needed so the WAYLAND thread can lock access.
* Using the system's #GHOST_System::getTimerManager is not thread safe because
* access to the timer outside of WAYLAND specific logic will not lock.
*
* Needed because #GHOST_System::dispatchEvents fires timers
* outside of WAYLAND (without locking the `timer_mutex`).
*/
GHOST_TimerManager *ghost_timer_manager = nullptr;
#endif /* USE_EVENT_BACKGROUND_THREAD */
};
@@ -959,6 +1012,13 @@ static void gwl_display_destroy(GWL_Display *display)
gwl_display_event_thread_destroy(display);
display->system->server_mutex->unlock();
}
/* Important to remove after the seats which may have key repeat timers active. */
if (display->ghost_timer_manager) {
delete display->ghost_timer_manager;
display->ghost_timer_manager = nullptr;
}
#endif /* USE_EVENT_BACKGROUND_THREAD */
if (display->wl_display) {
@@ -1179,7 +1239,7 @@ static void gwl_registry_entry_remove_all(GWL_Display *display)
{
const bool on_exit = true;
/* NOTE(@campbellbarton): Free by slot instead of simply looping over
/* NOTE(@ideasman42): Free by slot instead of simply looping over
* `display->registry_entry` so the order of freeing is always predictable.
* Otherwise global objects would be freed in the order they are registered.
* While this works in my tests, it could cause difficult to reproduce bugs
@@ -1209,7 +1269,7 @@ static void gwl_registry_entry_remove_all(GWL_Display *display)
* so there is no reason to update all other outputs because one was removed (for e.g.).
* Pass as -1 to update all slots.
*
- * NOTE(@campbellbarton): Updating all other items on a single change is typically worth avoiding.
+ * NOTE(@ideasman42): Updating all other items on a single change is typically worth avoiding.
* In practice this isn't a problem as there are so few elements in `display->registry_entry`,
* few of them use update functions, and adding/removal at runtime is rare (plugging/unplugging
* hardware for e.g.). So while it's possible to store dependency links to avoid unnecessary
@@ -1258,7 +1318,7 @@ static void ghost_wl_display_report_error(struct wl_display *display)
fprintf(stderr, "The Wayland connection experienced a fatal error: %s\n", strerror(ecode));
}
- /* NOTE(@campbellbarton): The application is running,
+ /* NOTE(@ideasman42): The application is running,
* however an error closes all windows and most importantly:
* shuts down the GPU context (losing all GPU state - shaders, bind codes etc),
* so recovering from this effectively involves restarting.
@@ -1268,7 +1328,7 @@ static void ghost_wl_display_report_error(struct wl_display *display)
* So in practice re-connecting to the display server isn't an option.
*
* Exit since leaving the process open will simply flood the output and do nothing.
- * Although as the process is in a valid state, auto-save for e.g. is possible, see: T100855. */
+ * Although as the process is in a valid state, auto-save for e.g. is possible, see: #100855. */
::exit(-1);
}
@@ -1382,7 +1442,7 @@ static GHOST_TKey xkb_map_gkey(const xkb_keysym_t sym)
/* Additional keys for non US layouts. */
- /* Uses the same physical key as #XKB_KEY_KP_Decimal for QWERTZ layout, see: T102287. */
+ /* Uses the same physical key as #XKB_KEY_KP_Decimal for QWERTZ layout, see: #102287. */
GXMAP(gkey, XKB_KEY_KP_Separator, GHOST_kKeyNumpadPeriod);
default:
@@ -2910,7 +2970,7 @@ static void gesture_pinch_handle_begin(void *data,
if (wl_surface *wl_surface_focus = seat->pointer.wl_surface_window) {
win = ghost_wl_surface_user_data(wl_surface_focus);
}
- /* NOTE(@campbellbarton): Blender's use of track-pad coordinates is inconsistent and needs work.
+ /* NOTE(@ideasman42): Blender's use of track-pad coordinates is inconsistent and needs work.
* This isn't specific to WAYLAND, in practice they tend to work well enough in most cases.
* Some operators scale by the UI scale, some don't.
* Even this window scale is not correct because it doesn't account for:
@@ -2924,7 +2984,7 @@ static void gesture_pinch_handle_begin(void *data,
*/
const wl_fixed_t win_scale = win ? win->scale() : 1;
- /* NOTE(@campbellbarton): Scale factors match Blender's operators & default preferences.
+ /* NOTE(@ideasman42): Scale factors match Blender's operators & default preferences.
* For these values to work correctly, operator logic will need to be changed not to scale input
* by the region size (as with 3D view zoom) or preference for 3D view orbit sensitivity.
*
@@ -3087,7 +3147,7 @@ static const struct zwp_pointer_gesture_swipe_v1_listener gesture_swipe_listener
/* -------------------------------------------------------------------- */
/** \name Listener (Touch Seat), #wl_touch_listener
*
- * NOTE(@campbellbarton): It's not clear if this interface is used by popular compositors.
+ * NOTE(@ideasman42): It's not clear if this interface is used by popular compositors.
* It looks like GNOME/KDE only support `zwp_pointer_gestures_v1_interface`.
* If this isn't used anywhere, it could be removed.
* \{ */
@@ -3718,9 +3778,14 @@ static void keyboard_handle_leave(void *data,
GWL_Seat *seat = static_cast<GWL_Seat *>(data);
seat->keyboard.wl_surface_window = nullptr;
- /* Losing focus must stop repeating text. */
- if (seat->key_repeat.timer) {
-   keyboard_handle_key_repeat_cancel(seat);
- }
+ {
+ #ifdef USE_EVENT_BACKGROUND_THREAD
+   std::lock_guard lock_timer_guard{*seat->system->timer_mutex};
+ #endif
+   /* Losing focus must stop repeating text. */
+   if (seat->key_repeat.timer) {
+     keyboard_handle_key_repeat_cancel(seat);
+   }
+ }
#ifdef USE_GNOME_KEYBOARD_SUPPRESS_WARNING
@@ -3743,9 +3808,9 @@ static xkb_keysym_t xkb_state_key_get_one_sym_without_modifiers(
/* Use an empty keyboard state to access key symbol without modifiers. */
xkb_keysym_t sym = xkb_state_key_get_one_sym(xkb_state_empty, key);
- /* NOTE(@campbellbarton): Only perform the number-locked lookup as a fallback
+ /* NOTE(@ideasman42): Only perform the number-locked lookup as a fallback
* when a number-pad key has been pressed. This is important as some key-maps use number lock
- * for switching other layers (in particular `de(neo_qwertz)` turns on layer-4), see: T96170.
+ * for switching other layers (in particular `de(neo_qwertz)` turns on layer-4), see: #96170.
* Alternative solutions could be to inspect the layout, however this could get involved
* and turning on the number-lock is only needed for a limited set of keys. */
@@ -3780,36 +3845,32 @@ static xkb_keysym_t xkb_state_key_get_one_sym_without_modifiers(
return sym;
}
+ /**
+  * \note Caller must lock `timer_mutex`.
+  */
static void keyboard_handle_key_repeat_cancel(GWL_Seat *seat)
{
- #ifdef USE_EVENT_BACKGROUND_THREAD
-   std::lock_guard lock_timer_guard{*seat->system->timer_mutex};
- #endif
  GHOST_ASSERT(seat->key_repeat.timer != nullptr, "Caller must check for timer");
  delete static_cast<GWL_KeyRepeatPlayload *>(seat->key_repeat.timer->getUserData());
-   seat->system->removeTimer(seat->key_repeat.timer);
-   seat->key_repeat.timer = nullptr;
+   gwl_seat_key_repeat_timer_remove(seat);
}
/**
 * Restart the key-repeat timer.
 * \param use_delay: When false, use the interval
 * (prevents pause when the setting changes while the key is held).
+  *
+  * \note Caller must lock `timer_mutex`.
 */
static void keyboard_handle_key_repeat_reset(GWL_Seat *seat, const bool use_delay)
{
- #ifdef USE_EVENT_BACKGROUND_THREAD
-   std::lock_guard lock_timer_guard{*seat->system->timer_mutex};
- #endif
  GHOST_ASSERT(seat->key_repeat.timer != nullptr, "Caller must check for timer");
-   GHOST_SystemWayland *system = seat->system;
-   GHOST_ITimerTask *timer = seat->key_repeat.timer;
-   GHOST_TimerProcPtr key_repeat_fn = timer->getTimerProc();
+   GHOST_TimerProcPtr key_repeat_fn = seat->key_repeat.timer->getTimerProc();
  GHOST_TUserDataPtr payload = seat->key_repeat.timer->getUserData();
-   seat->system->removeTimer(seat->key_repeat.timer);
-   const uint64_t time_step = 1000 / seat->key_repeat.rate;
-   const uint64_t time_start = use_delay ? seat->key_repeat.delay : time_step;
-   seat->key_repeat.timer = system->installTimer(time_start, time_step, key_repeat_fn, payload);
+   gwl_seat_key_repeat_timer_remove(seat);
+   gwl_seat_key_repeat_timer_add(seat, key_repeat_fn, payload, use_delay);
}
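A condensed sketch of how the two functions above divide responsibilities (compare the `keyboard_handle_key` hunk below; the condition name is paraphrased, not from the source): releasing the repeating key cancels the timer and frees its payload, while other key events merely reset it.

/* Caller holds `timer_mutex`, as the doc-strings above require. */
if (released_key_is_the_repeating_key) {
  keyboard_handle_key_repeat_cancel(seat); /* Stop & free the payload. */
}
else {
  keyboard_handle_key_repeat_reset(seat, true); /* Re-install the timer. */
}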
static void keyboard_handle_key(void *data,
@@ -3848,6 +3909,11 @@ static void keyboard_handle_key(void *data,
break;
}
+ #ifdef USE_EVENT_BACKGROUND_THREAD
+   /* Any access to `seat->key_repeat.timer` must lock. */
+   std::lock_guard lock_timer_guard{*seat->system->timer_mutex};
+ #endif
struct GWL_KeyRepeatPlayload *key_repeat_payload = nullptr;
/* Delete previous timer. */
@@ -3872,7 +3938,7 @@ static void keyboard_handle_key(void *data,
else {
/* Key-up from keys that were not repeating causes the repeat timer to pause.
*
- * NOTE(@campbellbarton): This behavior isn't universal, some text input systems will
+ * NOTE(@ideasman42): This behavior isn't universal, some text input systems will
* stop the repeat entirely. Choose to pause repeat instead as this is what GTK/WIN32 do,
* and it fits better for keyboard input that isn't related to text entry. */
timer_action = RESET;
@@ -3886,23 +3952,14 @@ static void keyboard_handle_key(void *data,
break;
}
case RESET: {
- #ifdef USE_EVENT_BACKGROUND_THREAD
-   std::lock_guard lock_timer_guard{*seat->system->timer_mutex};
- #endif
  /* The payload will be added again. */
-   seat->system->removeTimer(seat->key_repeat.timer);
-   seat->key_repeat.timer = nullptr;
+   gwl_seat_key_repeat_timer_remove(seat);
  break;
}
case CANCEL: {
- #ifdef USE_EVENT_BACKGROUND_THREAD
-   std::lock_guard lock_timer_guard{*seat->system->timer_mutex};
- #endif
  delete key_repeat_payload;
  key_repeat_payload = nullptr;
-   seat->system->removeTimer(seat->key_repeat.timer);
-   seat->key_repeat.timer = nullptr;
+   gwl_seat_key_repeat_timer_remove(seat);
  break;
}
}
@@ -3956,8 +4013,8 @@ static void keyboard_handle_key(void *data,
utf8_buf));
}
};
- seat->key_repeat.timer = seat->system->installTimer(
-     seat->key_repeat.delay, 1000 / seat->key_repeat.rate, key_repeat_fn, key_repeat_payload);
+ gwl_seat_key_repeat_timer_add(seat, key_repeat_fn, key_repeat_payload, true);
}
}
@@ -3982,8 +4039,13 @@ static void keyboard_handle_modifiers(void *data,
/* A modifier changed so reset the timer,
* see comment in #keyboard_handle_key regarding this behavior. */
- if (seat->key_repeat.timer) {
-   keyboard_handle_key_repeat_reset(seat, true);
- }
+ {
+ #ifdef USE_EVENT_BACKGROUND_THREAD
+   std::lock_guard lock_timer_guard{*seat->system->timer_mutex};
+ #endif
+   if (seat->key_repeat.timer) {
+     keyboard_handle_key_repeat_reset(seat, true);
+   }
+ }
#ifdef USE_GNOME_KEYBOARD_SUPPRESS_WARNING
@@ -4002,9 +4064,14 @@ static void keyboard_repeat_handle_info(void *data,
seat->key_repeat.rate = rate;
seat->key_repeat.delay = delay;
- /* Unlikely but possible that this setting changes while repeating. */
- if (seat->key_repeat.timer) {
-   keyboard_handle_key_repeat_reset(seat, false);
- }
+ {
+ #ifdef USE_EVENT_BACKGROUND_THREAD
+   std::lock_guard lock_timer_guard{*seat->system->timer_mutex};
+ #endif
+   /* Unlikely but possible that this setting changes while repeating. */
+   if (seat->key_repeat.timer) {
+     keyboard_handle_key_repeat_reset(seat, false);
+   }
+ }
}
@@ -4275,8 +4342,14 @@ static void gwl_seat_capability_keyboard_disable(GWL_Seat *seat)
if (!seat->wl_keyboard) {
return;
}
- if (seat->key_repeat.timer) {
-   keyboard_handle_key_repeat_cancel(seat);
- }
+ {
+ #ifdef USE_EVENT_BACKGROUND_THREAD
+   std::lock_guard lock_timer_guard{*seat->system->timer_mutex};
+ #endif
+   if (seat->key_repeat.timer) {
+     keyboard_handle_key_repeat_cancel(seat);
+   }
+ }
wl_keyboard_destroy(seat->wl_keyboard);
seat->wl_keyboard = nullptr;
@@ -4394,7 +4467,7 @@ static void xdg_output_handle_logical_size(void *data,
#ifdef USE_GNOME_CONFINE_HACK
/* Use a bug in GNOME to check whether GNOME is in use. If the bug is fixed this won't cause an issue
- * as T98793 has been fixed up-stream too, but not in a release at time of writing. */
+ * as #98793 has been fixed up-stream too, but not in a release at time of writing. */
use_gnome_confine_hack = true;
#endif
@@ -5411,6 +5484,8 @@ GHOST_SystemWayland::GHOST_SystemWayland(bool background)
#ifdef USE_EVENT_BACKGROUND_THREAD
gwl_display_event_thread_create(display_);
+ display_->ghost_timer_manager = new GHOST_TimerManager();
#endif
}
@@ -5491,10 +5566,16 @@ bool GHOST_SystemWayland::processEvents(bool waitForEvent)
#endif /* USE_EVENT_BACKGROUND_THREAD */
{
+ const uint64_t now = getMilliSeconds();
#ifdef USE_EVENT_BACKGROUND_THREAD
- std::lock_guard lock_timer_guard{*display_->system->timer_mutex};
+ {
+   std::lock_guard lock_timer_guard{*display_->system->timer_mutex};
+   if (ghost_timer_manager()->fireTimers(now)) {
+     any_processed = true;
+   }
+ }
#endif
- if (getTimerManager()->fireTimers(getMilliSeconds())) {
+ if (getTimerManager()->fireTimers(now)) {
any_processed = true;
}
}
@@ -6717,6 +6798,13 @@ struct wl_shm *GHOST_SystemWayland::wl_shm() const
return display_->wl_shm;
}
+ #ifdef USE_EVENT_BACKGROUND_THREAD
+ GHOST_TimerManager *GHOST_SystemWayland::ghost_timer_manager()
+ {
+   return display_->ghost_timer_manager;
+ }
+ #endif
/** \} */
/* -------------------------------------------------------------------- */
@@ -6949,7 +7037,7 @@ bool GHOST_SystemWayland::window_cursor_grab_set(const GHOST_TGrabCursorMode mod
UNPACK2(xy_next));
wl_surface_commit(wl_surface);
- /* NOTE(@campbellbarton): The new cursor position is a hint,
+ /* NOTE(@ideasman42): The new cursor position is a hint,
* it's possible the hint is ignored. It doesn't seem like there is a good way to
* know if the hint will be used or not, at least not immediately. */
xy_motion[0] = xy_next[0];
@@ -6992,7 +7080,7 @@ bool GHOST_SystemWayland::window_cursor_grab_set(const GHOST_TGrabCursorMode mod
if (mode != GHOST_kGrabDisable) {
if (grab_state_next.use_lock) {
if (!grab_state_prev.use_lock) {
- /* TODO(@campbellbarton): As WAYLAND does not support warping the pointer it may not be
+ /* TODO(@ideasman42): As WAYLAND does not support warping the pointer it may not be
* possible to support #GHOST_kGrabWrap by programmatically setting its coordinates.
* An alternative could be to draw the cursor in software (and hide the real cursor),
* or just accept a locked cursor on WAYLAND. */


@@ -165,6 +165,16 @@ class GHOST_SystemWayland : public GHOST_System {
bool cursor_grab_use_software_display_get(const GHOST_TGrabCursorMode mode);
+ #ifdef USE_EVENT_BACKGROUND_THREAD
+   /**
+    * Return a separate WAYLAND-local timer manager, distinct from #GHOST_System::getTimerManager.
+    * Manipulation & access must lock with #GHOST_SystemWayland::timer_mutex.
+    *
+    * See #GWL_Display::ghost_timer_manager doc-string for details on why this is needed.
+    */
+   GHOST_TimerManager *ghost_timer_manager();
+ #endif
/* WAYLAND direct-data access. */
struct wl_display *wl_display();
@@ -198,7 +208,7 @@ class GHOST_SystemWayland : public GHOST_System {
* Clear all references to this output.
*
* \note The compositor should have already called the `wl_surface_listener.leave` callback,
- * however some compositors may not (see T103586).
+ * however some compositors may not (see #103586).
* So remove references to the output before it's destroyed to avoid crashing.
*
* \return true when any references were removed.
@@ -233,7 +243,14 @@ class GHOST_SystemWayland : public GHOST_System {
* from running at the same time. */
std::mutex *server_mutex = nullptr;
- /** Threads must lock this before manipulating timers. */
+ /**
+  * Threads must lock this before manipulating #GWL_Display::ghost_timer_manager.
+  *
+  * \note Using a lock separate from `server_mutex` is necessary because the
+  * server lock is already held when calling `ghost_wl_display_event_pump`.
+  * If manipulating the timer used the `server_mutex`, the event pump could indirectly
+  * handle key up/down events which would lock `server_mutex`, causing a dead-lock.
+  */
std::mutex *timer_mutex = nullptr;
std::thread::id main_thread_id;
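A minimal sketch (the function is hypothetical, names are from this header) of the lock ordering this doc-string implies: code reached from the event pump already holds `server_mutex`, so timer manipulation takes only `timer_mutex`.

/* Called with `server_mutex` already held by the event pump. */
static void example_manipulate_timer(GHOST_SystemWayland *system)
{
  /* OK: `timer_mutex` is the only lock taken here. */
  std::lock_guard lock_timer_guard{*system->timer_mutex};
  /* Locking `system->server_mutex` here instead would dead-lock,
   * since the caller already holds it. */
}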


@@ -424,10 +424,9 @@ bool GHOST_SystemWin32::processEvents(bool waitForEvent)
processTrackpad();
- /* PeekMessage above is allowed to dispatch messages to the wndproc without us
+ /* `PeekMessage` above is allowed to dispatch messages to the `wndproc` without us
* noticing, so we need to check the event manager here to see if there are
- * events waiting in the queue.
- */
+ * events waiting in the queue. */
hasEventHandled |= this->m_eventManager->getNumEvents() > 0;
} while (waitForEvent && !hasEventHandled);
@@ -566,8 +565,8 @@ GHOST_TKey GHOST_SystemWin32::hardKey(RAWINPUT const &raw, bool *r_key_down)
/**
* \note this function can be extended to include other exotic cases as they arise.
*
- * This function was added in response to bug T25715.
- * This is going to be a long list T42426.
+ * This function was added in response to bug #25715.
+ * This is going to be a long list #42426.
*/
GHOST_TKey GHOST_SystemWin32::processSpecialKey(short vKey, short scanCode) const
{
@@ -1080,11 +1079,11 @@ GHOST_EventCursor *GHOST_SystemWin32::processCursorEvent(GHOST_WindowWin32 *wind
if (window->getCursorGrabMode() == GHOST_kGrabHide) {
window->getClientBounds(bounds);
- /* WARNING(@campbellbarton): The current warping logic fails to warp on every event,
+ /* WARNING(@ideasman42): The current warping logic fails to warp on every event,
* so the box needs to be small enough not to let the cursor escape the window but large
* enough that the cursor isn't being warped every time.
* If this was not the case it would be less trouble to simply warp the cursor to the
- * center of the screen on every motion, see: D16558 (alternative fix for T102346). */
+ * center of the screen on every motion, see: D16558 (alternative fix for #102346). */
const int32_t subregion_div = 4; /* One quarter of the region. */
const int32_t size[2] = {bounds.getWidth(), bounds.getHeight()};
const int32_t center[2] = {(bounds.m_l + bounds.m_r) / 2, (bounds.m_t + bounds.m_b) / 2};
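A worked example of the region math above (illustrative numbers, not from the source): with `subregion_div = 4` and a 1920x1080 client area, `size` is {1920, 1080} and `center` sits at the middle of the window, so the cursor is confined to a box of roughly 480x270 around the center, keeping warps infrequent while still preventing the cursor from escaping the window.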
@@ -1179,7 +1178,7 @@ GHOST_EventKey *GHOST_SystemWin32::processKeyEvent(GHOST_WindowWin32 *window, RA
GHOST_TKey key = system->hardKey(raw, &key_down);
GHOST_EventKey *event;
- /* NOTE(@campbellbarton): key repeat in WIN32 also applies to modifier-keys.
+ /* NOTE(@ideasman42): key repeat in WIN32 also applies to modifier-keys.
* Check for this case and filter out modifier-repeat.
* Typically keyboard events are *not* filtered as part of GHOST's event handling.
* As other GHOST back-ends don't have the behavior, it's simplest not to send them through.
@@ -1210,7 +1209,7 @@ GHOST_EventKey *GHOST_SystemWin32::processKeyEvent(GHOST_WindowWin32 *window, RA
const bool ctrl_pressed = has_state && state[VK_CONTROL] & 0x80;
const bool alt_pressed = has_state && state[VK_MENU] & 0x80;
- /* We can be here with !key_down if processing dead keys (diacritics). See T103119. */
+ /* We can be here with !key_down if processing dead keys (diacritics). See #103119. */
/* No text with control key pressed (Alt can be used to insert special characters though!). */
if (ctrl_pressed && !alt_pressed) {


@@ -282,7 +282,7 @@ class GHOST_SystemWin32 : public GHOST_System {
GHOST_TSuccess exit();
/**
- * Converts raw WIN32 key codes from the wndproc to GHOST keys.
+ * Converts raw WIN32 key codes from the `wndproc` to GHOST keys.
* \param vKey: The virtual key from #hardKey.
* \param ScanCode: The ScanCode of the pressed key (similar to PS/2 Set 1).
* \param extend: Flag if the key is not primary (a left or right variant).
@@ -291,7 +291,7 @@ class GHOST_SystemWin32 : public GHOST_System {
GHOST_TKey convertKey(short vKey, short ScanCode, short extend) const;
/**
- * Catches raw WIN32 key codes from WM_INPUT in the wndproc.
+ * Catches raw WIN32 key codes from WM_INPUT in the `wndproc`.
* \param raw: RawInput structure with detailed info about the key event.
* \param r_key_down: Set true when the key is pressed, otherwise false.
* \return The GHOST key (GHOST_kKeyUnknown if no match).
@@ -319,8 +319,8 @@ class GHOST_SystemWin32 : public GHOST_System {
* Creates tablet events from pointer events.
* \param type: The type of pointer event.
* \param window: The window receiving the event (the active window).
- * \param wParam: The wParam from the wndproc.
- * \param lParam: The lParam from the wndproc.
+ * \param wParam: The wParam from the `wndproc`.
+ * \param lParam: The lParam from the `wndproc`.
* \param eventhandled: True if the method handled the event.
*/
static void processPointerEvent(
@@ -337,8 +337,8 @@ class GHOST_SystemWin32 : public GHOST_System {
/**
* Handles a mouse wheel event.
* \param window: The window receiving the event (the active window).
- * \param wParam: The wParam from the wndproc.
- * \param lParam: The lParam from the wndproc.
+ * \param wParam: The wParam from the `wndproc`.
+ * \param lParam: The lParam from the `wndproc`.
*/
static void processWheelEvent(GHOST_WindowWin32 *window, WPARAM wParam, LPARAM lParam);


@@ -46,7 +46,7 @@
#ifdef WITH_X11_XFIXES
# include <X11/extensions/Xfixes.h>
- /* Workaround for XWayland grab glitch: T53004. */
+ /* Workaround for XWayland grab glitch: #53004. */
# define WITH_XWAYLAND_HACK
#endif
@@ -71,11 +71,11 @@
# define USE_XINPUT_HOTPLUG
#endif
- /* see T34039 Fix Alt key glitch on Unity desktop */
+ /* see #34039 Fix Alt key glitch on Unity desktop */
#define USE_UNITY_WORKAROUND
/* Fix 'shortcut' part of keyboard reading code only ever using first defined key-map
- * instead of active one. See T47228 and D1746 */
+ * instead of active one. See #47228 and D1746 */
#define USE_NON_LATIN_KB_WORKAROUND
static uchar bit_is_on(const uchar *ptr, int bit)
@@ -278,7 +278,7 @@ uint8_t GHOST_SystemX11::getNumDisplays() const
void GHOST_SystemX11::getMainDisplayDimensions(uint32_t &width, uint32_t &height) const
{
if (m_display) {
- /* NOTE(@campbellbarton): for this to work as documented,
+ /* NOTE(@ideasman42): for this to work as documented,
* we would need to use Xinerama (check r54370 for code that did this),
* which we've since removed as it's not worth the extra dependency. */
getAllDisplayDimensions(width, height);
@@ -927,8 +927,8 @@ void GHOST_SystemX11::processEvent(XEvent *xe)
if (window->getCursorGrabMode() == GHOST_kGrabHide) {
window->getClientBounds(bounds);
- /* TODO(@campbellbarton): warp the cursor to `window->getCursorGrabInitPos`,
- * on every motion event, see: D16557 (alternative fix for T102346). */
+ /* TODO(@ideasman42): warp the cursor to `window->getCursorGrabInitPos`,
+ * on every motion event, see: D16557 (alternative fix for #102346). */
const int32_t subregion_div = 4; /* One quarter of the region. */
const int32_t size[2] = {bounds.getWidth(), bounds.getHeight()};
const int32_t center[2] = {
@@ -964,7 +964,7 @@ void GHOST_SystemX11::processEvent(XEvent *xe)
if (x_new != xme.x_root || y_new != xme.y_root) {
/* Use time of last event to avoid wrapping several times on the 'same' actual wrap.
* Note that we need to deal with X and Y separately as those might wrap at the same time
- * but still in two different events (corner case, see T74918).
+ * but still in two different events (corner case, see #74918).
* We also have to add a few extra milliseconds of 'padding', as sometimes we get two
* close events that will generate extra wrap on the same axis within those few
* milliseconds. */
@@ -1028,7 +1028,7 @@ void GHOST_SystemX11::processEvent(XEvent *xe)
/* XXX: Code below is kinda awfully convoluted... Issues are:
* - In keyboards like Latin ones, numbers need a 'Shift' to be accessed but key_sym
* is unmodified (or anyone swapping the keys with `xmodmap`).
- * - #XLookupKeysym seems to always use first defined key-map (see T47228), which generates
+ * - #XLookupKeysym seems to always use first defined key-map (see #47228), which generates
* key-codes unusable by ghost_key_from_keysym for non-Latin-compatible key-maps.
*
* To address this, we:
@@ -1715,7 +1715,7 @@ GHOST_TSuccess GHOST_SystemX11::setCursorPosition(int32_t x, int32_t y)
#if defined(WITH_X11_XINPUT) && defined(USE_X11_XINPUT_WARP)
if ((m_xinput_version.present) && (m_xinput_version.major_version >= 2)) {
- /* Needed to account for XInput "Coordinate Transformation Matrix", see T48901 */
+ /* Needed to account for XInput "Coordinate Transformation Matrix", see #48901 */
int device_id;
if (XIGetClientPointer(m_display, None, &device_id) != False) {
XIWarpPointer(m_display, device_id, None, None, 0, 0, 0, 0, relx, rely);
@@ -2015,8 +2015,8 @@ void GHOST_SystemX11::getClipboard_xcout(
return;
}
- /* if it's not incr, and not format == 8, then there's
- * nothing in the selection (that xclip understands, anyway) */
+ /* If it's not INCR, and not `format == 8`, then there's
+ * nothing in the selection (that `xclip` understands, anyway). */
if (pty_format != 8) {
*context = XCLIB_XCOUT_NONE;


@@ -20,8 +20,8 @@
/* Disable XINPUT warp, currently not implemented by Xorg for multi-head display.
* (see comment in XSERVER `Xi/xiwarppointer.c` -> `FIXME: panoramix stuff is missing` ~ v1.13.4)
- * If this is supported we can add back XINPUT for warping (fixing T48901).
- * For now disable (see T50383). */
+ * If this is supported we can add back XINPUT for warping (fixing #48901).
+ * For now disable (see #50383). */
// # define USE_X11_XINPUT_WARP
#endif


@@ -669,7 +669,7 @@ static void xdg_surface_handle_configure(void *data,
GHOST_SystemWayland *system = win->ghost_system;
const bool is_main_thread = system->main_thread_id == std::this_thread::get_id();
if (!is_main_thread) {
- /* NOTE(@campbellbarton): this only gets one redraw,
+ /* NOTE(@ideasman42): this only gets one redraw,
* I could not find a case where this causes problems. */
gwl_window_pending_actions_tag(win, PENDING_FRAME_CONFIGURE);
}
@@ -774,7 +774,7 @@ GHOST_WindowWayland::GHOST_WindowWayland(GHOST_SystemWayland *system,
window_->ghost_window = this;
window_->ghost_system = system;
- /* NOTE(@campbellbarton): The scale set here to avoid flickering on startup.
+ /* NOTE(@ideasman42): The scale set here to avoid flickering on startup.
* When all monitors use the same scale (which is quite common) there aren't any problems.
*
* When monitors have different scales there may still be a visible window resize on startup.
@@ -812,12 +812,12 @@ GHOST_WindowWayland::GHOST_WindowWayland(GHOST_SystemWayland *system,
* when the `window_->scale` changed. */
const int32_t size_min[2] = {320, 240};
- /* This value is expected to match the base name of the `.desktop` file. see T101805.
+ /* This value is expected to match the base name of the `.desktop` file. see #101805.
*
* NOTE: the XDG desktop-entry-spec defines that this should follow the "reverse DNS" convention.
* For e.g. `org.blender.Blender` - however the `.desktop` file distributed with Blender is
* simply called `blender.desktop`, so it's important to follow that name.
- * Other distributions such as SNAP & FLATPAK may need to change this value T101779.
+ * Other distributions such as SNAP & FLATPAK may need to change this value #101779.
* Currently there isn't a way to configure this; we may want to support that. */
const char *xdg_app_id = (
#ifdef WITH_GHOST_WAYLAND_APP_ID
@@ -1078,9 +1078,9 @@ GHOST_WindowWayland::~GHOST_WindowWayland()
wl_surface_destroy(window_->wl_surface);
- /* NOTE(@campbellbarton): Flushing will often run the appropriate event handlers
+ /* NOTE(@ideasman42): Flushing will often run the appropriate event handlers
* (#wl_surface_listener.leave in particular) to avoid attempted access to the freed surfaces.
- * This is not fool-proof though, hence the call to #window_surface_unref, see: T99078. */
+ * This is not fool-proof though, hence the call to #window_surface_unref, see: #99078. */
wl_display_flush(system_->wl_display());
delete window_;


@@ -15,7 +15,7 @@
#include <wayland-util.h> /* For #wl_fixed_t */
/**
- * Define to workaround for a bug/limitation in WAYLAND, see: T100855 & upstream report:
+ * Define to workaround for a bug/limitation in WAYLAND, see: #100855 & upstream report:
* https://gitlab.freedesktop.org/wayland/wayland/-/issues/159
*
* Consume events from WAYLAND in a thread, this is needed because overflowing the event queue


@@ -154,7 +154,7 @@ GHOST_WindowWin32::GHOST_WindowWin32(GHOST_SystemWin32 *system,
}
if (parentwindow) {
- /* Release any parent capture to allow immediate interaction (T90110). */
+ /* Release any parent capture to allow immediate interaction (#90110). */
::ReleaseCapture();
parentwindow->lostMouseCapture();
}


@@ -418,7 +418,7 @@ void GHOST_WindowX11::refreshXInputDevices()
for (GHOST_SystemX11::GHOST_TabletX11 &xtablet : m_system->GetXTablets()) {
/* With modern XInput (XLIB 1.6.2 at least and/or EVDEV 2.9.0) and some 'no-name' tablets
* like 'UC-LOGIC Tablet WP5540U', we also need to 'select' ButtonPress for motion event,
- * otherwise we do not get any tablet motion event once pen is pressed... See T43367.
+ * otherwise we do not get any tablet motion event once pen is pressed... See #43367.
*/
XEventClass ev;
@@ -1467,7 +1467,7 @@ GHOST_TSuccess GHOST_WindowX11::setWindowCursorGrab(GHOST_TGrabCursorMode mode)
}
}
- /* Perform this last so to workaround XWayland bug, see: T53004. */
+ /* Perform this last so to workaround XWayland bug, see: #53004. */
if (m_cursorGrab == GHOST_kGrabHide) {
setWindowCursorVisibility(true);
}


@@ -281,6 +281,8 @@ inline T *MEM_new(const char *allocation_name, Args &&...args)
*/
template<typename T> inline void MEM_delete(const T *ptr)
{
+ static_assert(!std::is_void_v<T>,
+               "MEM_delete on a void pointer not possible. Cast it to a non-void type?");
if (ptr == nullptr) {
/* Support #ptr being null, because C++ `delete` supports that as well. */
return;

Some files were not shown because too many files have changed in this diff.