Compare commits

520 Commits

Author SHA1 Message Date
f77be09171 Merge branch 'master' into temp-ui-cpp 2022-11-26 00:22:49 -06:00
1a3681b83c Cleanup: Move interface_intern.hh
The entire interface directory is now compiled as C++ files!
2022-11-26 00:21:17 -06:00
3c2073d844 Cleanup: Move interface eyedroppers directory to C++ 2022-11-26 00:01:49 -06:00
a1ea6d6496 Cleanup: Move interface_handlers.c to C++ 2022-11-25 23:48:33 -06:00
1836acbeac Cleanup: Move six more interface files to C++ 2022-11-25 23:48:18 -06:00
8d269a2488 PyDocs: Fix incorrect data type of bmesh.types.BMFaceSeq.new
The API documentation of [[ https://docs.blender.org/api/current/bmesh.types.html?highlight=faces#bmesh.types.BMFaceSeq.new | bmesh.types.BMFaceSeq.new ]] indicates that the argument is a single `BMVert`,
but the correct type is a sequence of `BMVert`.
This patch fixes this mismatch.

Contributed by @Nutti

Differential Revision: https://developer.blender.org/D15668
2022-11-25 19:55:04 -05:00
ed6e1381dc Cleanup: Avoid using macros to refer to theme global variables
Prefer slightly more verbose code to the use of macros where
they aren't really necessary and just add indirection.
2022-11-25 17:11:44 -06:00
248ee6270b Cleanup: Remove unnecessary includes 2022-11-25 17:10:28 -06:00
afd16c487d Cleanup: Move four interface files to C++ 2022-11-25 17:09:47 -06:00
4029cdee7b Merge branch 'blender-v3.4-release' 2022-11-25 15:28:48 -06:00
4a0e19e608 Cleanup: Group deprecated mesh DNA fields, improve comments 2022-11-25 12:54:22 -06:00
5ca6965273 Merge branch 'blender-v3.4-release' 2022-11-25 18:54:49 +01:00
f07b09da27 Cycles: Improve oneAPI backend support for non-Intel platforms 2022-11-25 17:46:59 +01:00
f83aa1ae08 Fix T102764: Slow change of active material slot
The issue is caused by the combination of the following factors:

- There is a driver from custom property to the subdivision surface
  modifier.
- Active material index tags the ID for the copy-on-write update.

The dependency graph currently does not fully distinguish between the
copy-on-write tag and the properties-update tag, so the copy-on-write tag
makes the dependency graph believe that a property which actually
affects evaluation has been changed.

The simple solution is to treat the active material slot index as
interface data which does not need to trigger a copy-on-write tag.

The possible downside of this solution is that if someone has a driver
on this property, the driver will stop working. Whether such a real-life
setup exists is not clear; it is not something advisable to do anyway.

A possible alternative would be to introduce more granularity into the
way property tagging is done. This would be nice to implement
eventually, but it is a much bigger refactor.

Differential Revision: https://developer.blender.org/D16613
2022-11-25 17:44:54 +01:00
d1c21d5673 Fix T102470: Make material LineArt properties animatable.
MaterialLineArt didn't have a path function; this is now corrected.
2022-11-25 23:18:41 +08:00
994e3c6ac5 Clarify depsgraph API usage in the libraries code
Basically copy the information from the commit message of 03e2f11d48
directly to the code.

This makes the information easier to find when working on the
code.
2022-11-25 15:25:55 +01:00
118afe1de3 Fix T101824: Line art flickers when light object has scaling.
Line art doesn't expect light or camera objects to have scaling.
2022-11-25 22:00:21 +08:00
ae6e35279f Clarify comment about ID_RECALC_COPY_ON_WRITE
The copy-on-write mechanism is really an implementation detail of the
dependency graph. While there are still cases where there is
no better tag to be used, ID_RECALC_COPY_ON_WRITE should
not be used in combination with a dedicated tag.

For example, if the location of an object changes, the proper tag is
`ID_RECALC_TRANSFORM`. Tagging with `ID_RECALC_TRANSFORM |
ID_RECALC_COPY_ON_WRITE` will seemingly work, but this is
not the intended usage.
2022-11-25 14:55:50 +01:00
043673ca70 Cleanup: Alembic, deduplicate CacheObjectPath creation
No functional changes.
2022-11-25 14:37:48 +01:00
3cf803cf3c Cleanup: Alembic, use MEM_cnew
Avoids an extra cast. No functional change.
2022-11-25 14:27:18 +01:00
6422f75088 Merge branch 'blender-v3.4-release' 2022-11-25 12:44:05 +01:00
6bc3311410 Fix: Missing node warning when compositor is enabled
If the compositor is enabled or disabled, the node warnings for
unsupported nodes are not updated because of a missing redraw. This patch
adds that missing redraw in order to make the change immediate.
2022-11-25 13:19:13 +02:00
60ad5f49fa Cleanup: move C++ declarations to separate .hh header 2022-11-25 12:18:10 +01:00
826535979a Nodes: add non-const utility to find socket by identifier
This does the same as the corresponding const method.
2022-11-25 12:11:13 +01:00
32690cafd1 Fix: Missing compositor warning for Render Layers
The Render Layers node didn't draw a warning in the viewport when an
unsupported pass is used. This patch adds that warning.
2022-11-25 12:59:50 +02:00
64c26d2862 Merge branch 'blender-v3.4-release' 2022-11-25 11:45:01 +01:00
b9c358392d BLI: Fix error in vector library and add more test for operators
The operator was wrongly returning a reference to a local temp variable.

Add tests for all uncovered operators.
2022-11-25 11:28:04 +01:00
0ce18561bc Fix (unreported) uv unwrap selected was splitting selection
Add support for `pin_unselected` in new UV Packing API.

Regression introduced by API change in rBe3075f3cf7ce.
2022-11-25 15:52:04 +13:00
851906744e Merge branch 'blender-v3.4-release' 2022-11-24 15:14:12 -06:00
0710ec485e Cleanup: Remove unused IMB tile cache code (part 2)
Missed in the first commit[1].

Initially it was reported that the `flags` parameter was unused on
`imb_cache_filename`, but it turns out another swath of code related
to that same function was unused as well. Clean this up now too.

[1] 38573d515e
2022-11-24 12:41:05 -08:00
49129180b2 Merge branch 'blender-v3.4-release' 2022-11-24 17:30:54 -03:00
58c8c4fde3 Animation: Improve performance of Bake Action operator
This dramatically improves baking performance by batch-adding keyframes
instead of adding them individually, reducing unnecessary overhead.

Testing indicates an approximate 4x performance uplift.
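
A minimal sketch of the batching idea, with made-up types rather than Blender's actual keyframing API:

```
/* Sketch only: `Key` and `insert_keys_batch` are illustrative, not Blender API. */
#include <algorithm>
#include <vector>

struct Key {
  float frame;
  float value;
};

/* Inserting keys one at a time into a sorted channel costs O(n) per key,
 * O(n^2) total. Appending everything and sorting once is O(n log n). */
static void insert_keys_batch(std::vector<Key> &channel, std::vector<Key> new_keys)
{
  channel.insert(channel.end(), new_keys.begin(), new_keys.end());
  std::sort(channel.begin(), channel.end(),
            [](const Key &a, const Key &b) { return a.frame < b.frame; });
}
```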

Reviewed By: sybren, RiggingDojo

Differential Revision: https://developer.blender.org/D8808
2022-11-24 11:26:17 -08:00
81754a0fc5 Cleanup: remove else after return. 2022-11-24 09:56:05 -08:00
412642865d Cleanup: Resolve a warning for the ambiguity on the parenthesis in oneAPI code
No functional changes.
2022-11-24 18:05:02 +01:00
14a0fb0cc6 Merge branch 'blender-v3.4-release' 2022-11-24 09:02:23 -08:00
de27925aea Fix (unreported) inconsistent name_map during file reading.
Swapping some ID lists between Mains must invalidate the name_map cache.

Note that in theory, at least WM type could be ignored by name_map
cache, since it is a singleton. However, don't think it's worth adding
extra complication here, for really marginal benefits. The overhead of
rebuilding the name cache here is extremly small.

For some reason, this issue did not show up so far in master, and only
appeared in some branch work on improving the (in)direct status of
linked IDs... Go figure.
2022-11-24 17:08:18 +01:00
d6d5089c65 Metal: Fix a warning and compilation errors
These were oversights when developing without testing on MacOS.
2022-11-24 16:57:46 +01:00
20c1ce3d9b BLI: Make scalar vector constructor more generic
This makes it possible to use any type of scalar to initialize a vector
and reduces code duplication.
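
A minimal sketch of what such a generic scalar constructor looks like, with a simplified vector type rather than the actual BLI one:

```
/* Illustrative only; the real BLI type is more elaborate. */
template<typename T, int Size> struct Vec {
  T values[Size];

  /* One templated constructor accepts any scalar convertible to T,
   * replacing a separate overload per scalar type. */
  template<typename U> explicit Vec(U scalar)
  {
    for (int i = 0; i < Size; i++) {
      values[i] = T(scalar);
    }
  }
};

/* Both of these now go through the same constructor. */
static Vec<float, 3> from_int(1);
static Vec<float, 3> from_double(2.5);
```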
2022-11-24 16:16:42 +01:00
f47daa7ec9 Merge branch 'blender-v3.4-release' 2022-11-24 15:33:25 +01:00
Christoph Lendenfeld
bb665ef8d2 Merge branch 'blender-v3.4-release' 2022-11-24 15:20:30 +01:00
1b7b996e16 Merge branch 'blender-v3.4-release' 2022-11-24 13:36:48 +01:00
5f626ac331 Cleanup: use more concise function names in function nodes
This is the same naming convention that's used for geometry nodes.
2022-11-24 12:49:17 +01:00
369914e7b1 Liblink: Add test over direct vs indirect link status.
Some checks are currently commented out, since Blender will never 'clear'
the directly linked status of an ID once it has been used by local data.
2022-11-24 10:52:27 +01:00
bbf09eb59c Merge branch 'blender-v3.4-release' 2022-11-24 10:15:36 +01:00
2dcdfab94c Realtime Compositor: Warn about unsupported MacOS
This patch adds a warning in the shading panel that MacOS is not
supported for the viewport compositor.

See T102353.
2022-11-24 09:25:44 +02:00
38573d515e Cleanup: Remove unused IMB tile cache code
This removes the unused code for the IMB tile cache APIs. These have
been unused for as far back as I could manage to search.

Since TIFF was used for the cached images, this removal will allow for
an easier review when it comes time to move TIFF to OIIO as part of
T101413.

Differential Revision: https://developer.blender.org/D16587
2022-11-23 19:31:10 -08:00
f4e1f62c62 Merge branch 'blender-v3.4-release' 2022-11-23 19:35:39 +01:00
584089879c BLI: Follow up and fix recent span slicing change
a5e7657cee didn't account for slices of zero size, and the asserts
were slightly incorrect otherwise. Also, the change didn't apply to
`Span`, only `MutableSpan`, which was a mistake. This also adds "safe"
methods to `IndexMask`, and switches function calls where necessary.
2022-11-23 11:36:06 -06:00
38cf48f62b Fix: Missing caches in curves bounds evaluation 2022-11-23 11:36:06 -06:00
50aad904b3 Merge branch 'blender-v3.4-release' 2022-11-23 14:15:22 -03:00
f13160d188 Cleanup: quiet deprecation warnings
This fixes these warnings: P3340.
2022-11-23 17:15:33 +01:00
583f19d692 Merge branch 'blender-v3.4-release' 2022-11-23 17:03:17 +01:00
c1eeb38f7c Cleanup: Move poly normal calculation to mesh_normals.cc 2022-11-23 09:49:18 -06:00
c3d6f5ecf3 Merge branch 'blender-v3.4-release' 2022-11-23 16:39:09 +01:00
737d363e02 Cleanup: remove unused node type
This wasn't used for backwards compatibility, because Blender does not
read from the `nodetype` anywhere. It also wasn't used for forward
compatibility, because it was not initialized for new node groups.
2022-11-23 16:15:25 +01:00
460f7ec7aa Windows: Run blender-launcher.exe instead of blender.exe
With this change, Blender delivered via the Microsoft Store will launch without the console window flashing.

Ref T88613

Differential Revision: https://developer.blender.org/D16589
2022-11-23 15:14:13 +01:00
Jeroen Bakker
a819523dff Vulkan: Add VK memory allocator 3.0.1 to extern.
Vulkan doesn't have a memory allocator builtin. The application should
provide the memory allocator at runtime. Vulkan Memory Allocator is a
widely used implementation.

Vulkan Memory Allocator is a header-only implementation, but the
consuming application must compile part of it in a C++ compilation
unit. The file `vk_mem_alloc_impl.cc` and the
`extern_vulkan_memory_allocator` library are therefore introduced.

Reviewed By: fclem

Differential Revision: https://developer.blender.org/D16572
2022-11-23 14:42:27 +01:00
68a450cbe4 Cleanup: Remove unused parameter in node draw 2022-11-23 15:06:40 +02:00
aa0c2c0f47 Cleanup: move some data from bNodeTree to run-time data
No functional changes are expected.
2022-11-23 14:05:30 +01:00
4f02817367 Nodes: remove bNodeTree->interface_type
This is not used for anything in practice currently. The original intention
was probably to generate different socket subtypes, but that is solved
differently now (e.g. using `NodeSocketFloatDistance`). It's possible
that an addon tried to use this but it's rather unlikely.

Differential Revision: https://developer.blender.org/D13188
2022-11-23 13:49:07 +01:00
247d75d2b1 Realtime Compositor: Warn about unsupported setups
This patch warns the user that the compositor setup is not fully
supported when an unsupported node is used. The warning is displayed as
an engine warning overlay and in the node header itself.

See T102353.

Differential Revision: https://developer.blender.org/D16508

Reviewed By: Clement Foucault
2022-11-23 14:35:26 +02:00
6396d29779 Merge branch 'blender-v3.4-release' 2022-11-23 13:25:12 +01:00
1c00b2ef70 Cleanup: move paint_cursor.c and paint_image_proj.c to C++
This makes it easier to use c++ when improving the internal node api.
2022-11-23 12:56:34 +01:00
106277be43 Merge branch 'blender-v3.4-release' 2022-11-23 12:54:15 +01:00
7d44676b5f Realtime Compositor: Disable on MacOS
This patch disables the realtime compositor on MacOS until Metal is
supported. This is because MacOS doesn't support the necessary GPU
features to make it work.

An engine error overlay is displayed if it is enabled and the option
itself is greyed out.

See T102353.

Differential Revision: https://developer.blender.org/D16510

Reviewed By: Clement Foucault
2022-11-23 13:34:31 +02:00
11275b7363 Realtime Compositor: Extend option to enable compositor
This patch turns the checkbox option to enable the viewport compositor
into a 3-option enum that allows:

- Disabled.
- Enabled.
- Enabled only in camera view.

See T102353.

Differential Revision: https://developer.blender.org/D16509

Reviewed By: Clement Foucault
2022-11-23 13:27:47 +02:00
80249ce6e4 Asset Browser: Allow changing active catalog from Python
The active catalog ID (UUID) was a read-only property. From a studio I
got the request to make this editable, so their pipeline tooling can
make certain assets visible.

Differential Revision: https://developer.blender.org/D16356

Reviewed by: Sybren Stüvel
2022-11-23 12:05:16 +01:00
e0c5ff87b7 Realtime Compositor: Implement Track Position node
This patch implements the Track Position node for the realtime
compositor.

Differential Revision: https://developer.blender.org/D16387

Reviewed By: Clement Foucault
2022-11-23 12:56:24 +02:00
571f373155 UI: Don't render missing linked material previews, avoids UI freezing
Opening the material selector after reloading files could cause long UI
freezes, because some linked-in materials don't have the preview stored
in the source file. So Blender would keep re-rendering it after every
file load, which may involve compiling OpenGL shaders, which typically
freezes the UI again. This was reported as quite an issue for the
Heist production by the Blender Studio.

Don't render these missing material previews from linked data-blocks
anymore.

Differential Revision: https://developer.blender.org/D16538

Reviewed by: Brecht Van Lommel, Jeroen Bakker
2022-11-23 11:39:53 +01:00
c464fd724b Fix T102697: Gpencil Subdiv modifier level increased
The old hard limit was 5, but it's now possible to set a max
value of 16. The UI limit remains 5.

This extreme value is only useful in some corner cases, but it
was requested by some artists.

Warning: Using very high values could produce long calculation times, especially in strokes with a high density of points.
2022-11-23 11:23:38 +01:00
356373ff7a Cleanup: move some data from bNodeSocket to run-time data
No functional changes are expected.
2022-11-23 10:42:17 +01:00
5938e97a24 Merge branch 'blender-v3.4-release' 2022-11-23 10:36:11 +01:00
01e479b790 Cleanup: simplify removing asset code
Differential Revision: https://developer.blender.org/D16570
2022-11-23 10:04:24 +01:00
63ae0939ed Merge branch 'blender-v3.4-release' 2022-11-23 10:22:42 +09:00
cdcbf05ea8 BLI: Make Report Missing Files display a message when no files are missing
Before this, if there were no missing files, the operator would run
successfully but there would be no user feedback at all, making the
user wonder if the operator was even run.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16585
2022-11-22 16:48:51 -08:00
2cb6b0b4eb Merge branch 'blender-v3.4-release' 2022-11-22 15:52:33 -06:00
e3567aad0a Merge branch 'blender-v3.4-release' 2022-11-22 12:49:48 -08:00
0f3b3ee679 Merge branch 'blender-v3.4-release' 2022-11-22 12:48:29 -08:00
40ac3776db Merge branch 'blender-v3.4-release' 2022-11-22 12:41:54 -08:00
a777c09d5f Sculpt: Standardize face set undo steps, optimize memory usage
Currently the face set of every single face is saved for every sculpt undo step.
When only changing the face sets of a small section of the mesh, this can be quite
wasteful. It also makes face sets a special case compared to all other sculpt undo step
types, which makes the whole system more complex and harder to improve.

Fixes T101203.

Reviewed By: Hans Goudey
Differential Revision: https://developer.blender.org/D16224
Ref D16224
2022-11-22 12:13:33 -08:00
41f58fadae Cleanup: Decrease variable scope, change names in BMesh layer handling 2022-11-22 14:03:39 -06:00
aa1a51ff9f Merge branch 'blender-v3.4-release' 2022-11-22 13:52:30 -06:00
55db44ca2c Merge branch 'blender-v3.4-release' 2022-11-22 11:17:30 -08:00
c3e919b317 Sculpt: Fix broken multires hidden undo
Wrote a new API method, BKE_pbvh_sync_visibility_from_verts,
that flushes vertex hidden flags to edges & faces.

Fixes not being able to sculpt outside a face set after
undoing the fkey hide-all-but-this operator.
2022-11-22 11:11:03 -08:00
14d0b57be7 Cleanup: Use array_utils to copy evaluated field array 2022-11-22 12:49:51 -06:00
8e535ee9b4 Cleanup: Remove unused math min max utility
This has a friendlier multithreaded implementation in BLI_bounds.hh now.
2022-11-22 12:31:31 -06:00
772696a1a5 Geometry Nodes: Parallelize bounds computation in points to volume
On my computer this saves a few milliseconds when there are
over 1 million points.
2022-11-22 12:13:52 -06:00
8990983b07 Cleanup: Use const arguments for custom normals 2022-11-22 11:48:16 -06:00
a5e7657cee BLI: Remove clamping from span slicing
Previously, slicing a span clamped the final size so that it would be
within bounds of the input. However, in the vast majority of cases
that is already the case anyway, and we can use asserts to detect
when that assumption fails.

The clamping had a performance cost. On a test interpolating a boolean
attribute from 1 million curves to 4 million points, removing the
clamping saved about 10% of the time. That's an extreme case but
this probably slightly improves performance in other cases too.
Slicing is used a lot in the new curve code.

This commit introduces `slice_safe` which still does the clamping,
and uses it in the few places that needed it or where I wasn't
sure.
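
A minimal sketch of the two behaviors, with a simplified span type rather than `blender::Span`:

```
#include <algorithm>
#include <cassert>
#include <cstdint>

template<typename T> struct SimpleSpan {
  const T *data;
  int64_t size;

  /* New behavior: the caller guarantees the slice is in bounds;
   * this is only checked by asserts in debug builds. */
  SimpleSpan slice(int64_t start, int64_t new_size) const
  {
    assert(new_size >= 0);
    assert(new_size == 0 || start + new_size <= size);
    return {data + start, new_size};
  }

  /* Old behavior, kept available as slice_safe: clamp to the bounds. */
  SimpleSpan slice_safe(int64_t start, int64_t new_size) const
  {
    const int64_t clamped = std::min(new_size, std::max<int64_t>(0, size - start));
    return {data + start, clamped};
  }
};
```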
2022-11-22 11:29:24 -06:00
822dab9a42 Cleanup: Strict compiler warnings
Mark function as local to the translation unit.
2022-11-22 17:52:44 +01:00
67194fb247 Merge branch 'blender-v3.4-release' 2022-11-22 16:35:26 +01:00
ed82bbfc2c Cleanup: use designated initializers in C 2022-11-22 10:48:16 -03:00
dd5fdb9370 GPU: Add MoltenVK as dependencies to Vulkan SDK.
MoltenVK is part of the Vulkan SDK. Blender requires the Vulkan SDK
to compile. This patch adds the MoltenVK includes and libraries to
the Vulkan includes and libraries.
2022-11-22 14:07:34 +01:00
51f56e71cb Merge branch 'blender-v3.4-release' 2022-11-22 13:41:07 +01:00
02c6136958 Fix compile error with msvc
error C2059: syntax error: '}'
2022-11-22 09:13:16 -03:00
b79e5ae4f2 GHOST: Add missing C_API function to header file.
- GHOST_GetDrawingContext was missing.
2022-11-22 12:45:49 +01:00
7dea18b3aa Tracking: Store lens principal point in normalized space
This avoids the need for special trickery to detect whether the principal
point is to be changed when reloading a movie clip. It also allows
transferring the optical center from high-res footage to a possibly
lower-resolution proxy without manual adjustment.

On a user level the difference is that the principal point is exposed in
normalized coordinates: the frame center has coordinate (0, 0), the
bottom-left corner of a frame has coordinate (-1, -1), and the top-right
corner has coordinate (1, 1).
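
As a sketch, the pixel-to-normalized mapping implied by this convention (the helper name is hypothetical, not Blender's API):

```
#include <cstdio>

/* Frame center -> (0, 0), bottom-left corner -> (-1, -1), top-right -> (1, 1). */
static void principal_point_normalized_from_pixel(
    const float point_px[2], const int size[2], float r_normalized[2])
{
  r_normalized[0] = (2.0f * point_px[0] - size[0]) / size[0];
  r_normalized[1] = (2.0f * point_px[1] - size[1]) / size[1];
}

int main()
{
  const float center_px[2] = {960.0f, 540.0f};
  const int size[2] = {1920, 1080};
  float normalized[2];
  principal_point_normalized_from_pixel(center_px, size, normalized);
  printf("%.2f %.2f\n", normalized[0], normalized[1]); /* Prints "0.00 0.00". */
  return 0;
}
```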

Another user-visible change is that there is no longer an operator for
setting the principal point to the center: pressing backspace on the
center sliders will reset the values to 0, which corresponds to the center.

The code implements versioning in both directions, so it should be
possible to open the file in older Blender versions without losing
configuration.

For the Python API there are two ways to access the property:
- `tracking.camera.principal_point` which is measured in the normalized
  space.
- `tracking.camera.principal_point_pixels` to access the pixel-space
  principal point.

Both properties are not animatable, so there will be no conflicts.

Differential Revision: https://developer.blender.org/D16573
2022-11-22 11:49:56 +01:00
04dc58df83 Tracking: Mark more deprecated RNA access as such
Also clarify a bit the new way of accessing the data.
2022-11-22 11:49:56 +01:00
294ff0de43 Cleanup: More clear function name
Make it explicit in the name that the track index is the one used for
selection (previously it read as if the track index were within its
list).
2022-11-22 11:49:56 +01:00
f383dfabf7 Cleanup: Simplify public tracking API
Remove functions which are trivial accessors.
2022-11-22 11:49:56 +01:00
d37efe332c Tracking: Mark deprecated active tracks access as such 2022-11-22 11:49:56 +01:00
ceaf4779da Fix active track always assigned for the active object
A missing part from the storage split refactor.
2022-11-22 11:49:56 +01:00
88d9ed3c1c Fix missing update on active motion track change from Python 2022-11-22 11:49:56 +01:00
0b251493c8 Tracking: Raise python exception when assigning wrong active track
Before this, an attempt to assign a track from another object was
silently setting the active track to null. Silencing errors like
this is not really a good approach.
2022-11-22 11:49:56 +01:00
ea969ccc02 Refactor: Replace marker visibility macro with function
Also optimize the sub-optimal lookup of the active object on every
call of the check.

Should be no functional changes.
2022-11-22 11:49:56 +01:00
b864397201 Refactor: Clip editor tracking selection operator
De-duplicate selection logic and threshold between various
operators (selection and sliding).

The user measurable difference is that regular selection
threshold now matches sliding threshold: it is more strict
now. The possible downside of this is that it might be more
tricky to select tracks, but this is what needs to happen
for tools support. Also, this matches object selection in
viewport.
2022-11-22 11:49:56 +01:00
4d1a116cdf Refactor: Simplify tracking active element accessor API
Use active object accessor, and then access data from the
object. There is no need to have an API call for shortcut
of all object fields.

Should be no functional change.
2022-11-22 11:49:56 +01:00
1300da6d39 Cleanup: More clear function name in tracking module
Make it more obvious in the name that the operation is not
cheap, and that the function operates on the tracks from an
object and does not need a global tracking structure.
2022-11-22 11:49:56 +01:00
016f9c2cf5 Cleanup: Variable naming in tracking files
Make it obvious that the object is the motion tracking one, and
not the ID_OB type.
2022-11-22 11:49:56 +01:00
953f719e58 Cleanup: Variable scope in tracking files 2022-11-22 11:49:56 +01:00
1168665653 Refactor: Remove trivial accessor functions from tracking 2022-11-22 11:49:56 +01:00
fe38715600 Refactor: Unify storage for motion tracking camera and objects
Historically, tracks and reconstruction for the motion tracking camera
object were stored in the motion tracking structure. This is because
the data structures pre-date object tracking support, and it was
never changed, to preserve compatibility.

Now the compatibility code supports more tricks and allows changing
the ownership without breaking any compatibility. This is what this
change does: it moves tracks from the motion tracking structure to the
motion tracking camera object, and does it in a way that no
compatibility is broken.

One of the side-effects of this change is that the active track is
now stored at the motion tracking object level, which allows changing
the active motion tracking object without losing the active track. Other
than that there are no expected user-level changes.
2022-11-22 11:49:56 +01:00
4d497721ec Refactor: Streamline tracking data copying a bit
Prepare the code to more easily support pointers remapping
for tracking objects.

Should be no functional changes.
2022-11-22 11:49:56 +01:00
3c479b9823 Movie clip: Remove special selection synchronization function
The function was already doing a lot of memory indirections and
sub-optimal lookups, so for the simplicity and robustness of the
system we might as well just do a copy-on-write update.
2022-11-22 11:49:56 +01:00
7411fa4e0d Cleanup: Better const-correctness in tracking code 2022-11-22 11:49:56 +01:00
44d7ec7e80 Clip Editor: Migrate orientation file to C++
Should be no functional changes.

Some of the code is suboptimal from a C++ syntax point of view.
It will be improved in further development.
2022-11-22 11:49:56 +01:00
Jeroen Bakker
6dac345a64 GHOST: Vulkan Backend.
This adds a Vulkan backend to GHOST. The code was extracted from the
tmp-vulkan branch. The main difference with the original code is that
GHOST isn't responsible for the fallback. For the Metal backend there is
already the idea that the GPU module is responsible for the fallback,
not the system.

For Blender we target Vulkan 1.2 at the time of this patch.
MoltenVK (needed to convert Vulkan calls to Metal) has been added as
a separate package.

This patch isn't useful for end-users: currently, starting Blender with
`--gpu-backend vulkan` would crash, as the `VKBackend` doesn't initialize
the expected global structs in the GPU module.

Validated to be working on Windows and Apple. Linux still needs to be tested.

Reviewed By: fclem

Differential Revision: https://developer.blender.org/D13155
2022-11-22 11:29:09 +01:00
efa87ad5eb Merge branch 'blender-v3.4-release' 2022-11-22 10:48:55 +01:00
89349067b6 Merge branch 'blender-v3.4-release' 2022-11-21 18:01:29 -06:00
25ddb576ff Cleanup: add ATTR_FALLTHROUGH 2022-11-21 10:43:12 -08:00
02e045ffbe Merge branch 'blender-v3.4-release' 2022-11-21 19:19:58 +01:00
be1745425c Nodes: Remove "level" building pass on update
The node level was an indication of how deep the node was in the tree.
It was only used for detecting link cycles. Now that the node topology
cache from 25e307d725 exists, this calculation can be removed
completely.

The level calculation was quadratic and very slow on larger node trees.
In the mouse house file with a few thousand nodes, it took 23ms on
every single update. Another benefit is storing slightly less runtime
data, though this was only 2 bytes per node.

Differential Revision: https://developer.blender.org/D16566
2022-11-21 11:34:22 -06:00
b391037424 Nodes: Use topology cache for node exec node list
Instead of generating a dependency sorted node list whenever evaluating
texture or EEVEE/viewport shader nodes, use the existing sorted array
from the topology cache. This may be more efficient because the
algorithm isn't quadratic. It's also the second-to-last place to
use `node.runtime->level`, which can be removed soon.

Differential Revision: https://developer.blender.org/D16565
2022-11-21 11:30:49 -06:00
b0e38f4d04 Cleanup: Strict compiler warnings
Remove `private:` from the PBVHFaceIter. This is not really a C++
class, and the C++ code generates a lot of warnings about unused
fields.

Also mark function static and run clang-format.
2022-11-21 16:57:46 +01:00
Iliya Katueshenock
2f69266c64 Cleanup: use topology cache for frame timings overlay
Differential Revision: https://developer.blender.org/D16571
2022-11-21 16:10:05 +01:00
5d08396970 Merge branch 'blender-v3.4-release' 2022-11-21 15:53:42 +01:00
Jarrett Johnson
1c0f5e79fa DRW: Fix pointcloud selection
Fixes point cloud selection by using new draw call.

Reviewed By: fclem

Maniphest Tasks: T102659

Differential Revision: https://developer.blender.org/D16501
2022-11-21 15:50:32 +01:00
2910be8f19 Cleanup: Correct semantics for .blend listing in file/asset browser
When attempting to load the contents of a .blend, the code would just
assume that if the number of added items is 0, it's not a .blend (but a
directory, although the previous commit fixed that part already).
However, there may be situations where a .blend file simply doesn't
contain anything of interest to be added (e.g. when listing assets
only), so have a proper "none" value for this.
2022-11-21 12:29:18 +01:00
c58e7da43e Asset Browser: Avoid non-existent directory prints
When loading asset libraries, there would be a bunch of "non-existent
directory" prints because we were calling a function to list directory
contents on .blend file paths. Make sure the path actually points to a
directory.
2022-11-21 12:29:18 +01:00
9f83ef2149 BLI: make different pointer types compatible in hash tables
For example, this allows doing a lookup using a raw pointer in a
hash table that uses `std::unique_ptr` as key.
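
The standard-library analogue of this is heterogeneous lookup through transparent hash and equality functors (C++20 for unordered containers); a sketch of the idea, not the BLI implementation itself:

```
#include <memory>
#include <unordered_set>

struct PtrHash {
  using is_transparent = void;
  size_t operator()(const std::unique_ptr<int> &p) const
  {
    return std::hash<const int *>{}(p.get());
  }
  size_t operator()(const int *p) const
  {
    return std::hash<const int *>{}(p);
  }
};

struct PtrEq {
  using is_transparent = void;
  template<typename A, typename B> bool operator()(const A &a, const B &b) const
  {
    return raw(a) == raw(b);
  }
  static const int *raw(const std::unique_ptr<int> &p) { return p.get(); }
  static const int *raw(const int *p) { return p; }
};

int main()
{
  std::unordered_set<std::unique_ptr<int>, PtrHash, PtrEq> set;
  auto owned = std::make_unique<int>(7);
  const int *raw = owned.get();
  set.insert(std::move(owned));
  /* Lookup by raw pointer, without constructing a unique_ptr key. */
  return set.contains(raw) ? 0 : 1;
}
```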
2022-11-21 12:02:10 +01:00
85990c877c Merge branch 'blender-v3.4-release' 2022-11-21 11:45:12 +01:00
71efb7805b Merge branch 'blender-v3.4-release' 2022-11-21 11:23:18 +01:00
7a6cdeb242 Merge remote-tracking branch 'origin/blender-v3.4-release' 2022-11-20 18:31:53 -07:00
Wannes Malfait
e83f46ea76 Geometry Nodes: Use Mesh instead of BMesh in split edges node
Rewrite the edge split code to operate directly on Mesh instead
of BMesh. This allows for the use of multi-threading and makes
the node around 2 times faster. Around 15% of the time is spent
just on the creation of the topology maps, so these being cached
on the mesh could cause an even greater speedup. The new node
gave identical results compared to the BMesh version on all the
meshes I tested it on (up to permutation of the indices).

Here are some of the results on a few simple test cases:
(Intel i7-7700HQ (8 cores) @ 2.800GHz , with 50% of edges selected)
|       | 370x370 UV Sphere | 400x400 Grid | Suzanne 4 subdiv levels |
| ----- | ----------------- | -------------- | --------------------- |
| Mesh  | 89ms              | 111ms          | 76ms                  |
| BMesh | 200ms             | 276ms          | 208ms                 |

Differential Revision: https://developer.blender.org/D16399
2022-11-20 15:42:10 -06:00
97e0cc41ca Nodes: Simplify view panning when selecting node
The "Activate Same Type Next/Prev" and "Find Node" operators pan
the view to the newly selected node if it's outside of the view. This
simplifies that check and improves it in the case where the node
is only partially visible-- now it pans in while it didn't before.
2022-11-20 14:45:58 -06:00
984edb2c4e Nodes: Replace implementation of select next/prev type operator
The previous code was quadratic; it looped over every link for every
node. For one large node tree I tested the operator took 20ms. On the
same node tree it now takes less than 1ms.

The change replaces the current building of the "dependency list"
on every call with a use of the topology cache from 25e307d725.
2022-11-20 14:45:58 -06:00
012895e8a1 Cleanup: Remove unused node operator property 2022-11-20 14:45:58 -06:00
47c92bf8de Cleanup: Remove unused boolean in node select function 2022-11-20 14:45:58 -06:00
afb7da5538 Sculpt: Add comment explaining PaintMaskFloodMode 2022-11-20 09:53:29 -08:00
b041678028 Merge branch 'blender-v3.4-release' 2022-11-20 09:44:53 -08:00
42c30f3b9c Sculpt: Fix missing const in recent commit 2022-11-20 08:17:02 -08:00
6b3cee2538 Sculpt: Face iterator API
This patch adds basic face iterators to the sculpt API.  The interface is similar to the existing vertex iterators.  It's not C++ (though it does mark private fields in PBVHFaceIter as private if compiling under C++).

Example:

```
PBVHFaceIter fd;

BKE_pbvh_face_iter_begin(pbvh, node, fd) {

  /* Face reference and face index. */
  PBVHFaceRef face = fd.face;
  int face_index = fd.index;

  /* Can read and modify the hide flag if it exists (it may not). */
  if (fd.hide) {
    *fd.hide ^= true; /* Toggle hide. */
  }

  /* Can read and modify the face set if it exists. */
  if (fd.face_set) {
    *fd.face_set = something;
  }

  /* Can read vertices. */
  for (int i = 0; i < fd.verts_num; i++) {
    float *co = SCULPT_vertex_co_get(ss, fd.verts[i]);
  }
}
BKE_pbvh_face_iter_end(fd);
```

Reviewed By: Brecht Van Lommel and Hans Goudey
Differential Revision: https://developer.blender.org/D16225
Ref D16225
2022-11-20 08:08:26 -08:00
5097105b3c Sculpt: Fix T102567: multires crash with high subdivision levels
The previous fix did not work for ngons.
The PBVH leaf limit minimum is now set
to the maximum ngon vertex count.
2022-11-20 07:32:21 -08:00
41d29a1603 Merge branch 'blender-v3.4-release' 2022-11-20 07:18:45 -08:00
d24c0011cf GPencil: Make Ignore Transparent option more consistent
The code was doing the opposite of the UI option.

Related to T102625
2022-11-20 11:22:20 +01:00
0fd94a1f5e Tests: enable element-wise multiplication for mathutils API tests 2022-11-20 10:42:17 +11:00
9100cc0f39 Merge branch 'blender-v3.4-release' 2022-11-20 10:41:19 +11:00
4d17301ba4 Merge branch 'blender-v3.4-release' 2022-11-19 20:47:16 +01:00
dfb157f9c4 Merge branch 'blender-v3.4-release' 2022-11-19 19:09:49 +01:00
2654c523c1 Cleanup: use nullptr in C++ 2022-11-19 11:51:42 +01:00
7ca9fb9865 Merge branch 'blender-v3.4-release' 2022-11-19 21:15:28 +11:00
Aaron Carlisle
bfb3d78902 Cleanup: Remove disabled edge slide keymap feature
This feature has been disabled since 2.80 but the feature description was still visible in the UI.

Addresses part of T101429

Breaking changes:

- Removes `EDGESLIDE_EDGE_NEXT`
- Removes `EDGESLIDE_PREV_NEXT`

Reviewed By: mano-wii

Maniphest Tasks: T101429

Differential Revision: https://developer.blender.org/D16430
2022-11-18 21:13:33 -05:00
511ac66dab Mesh: Use shared cache for derived triangulation
Use the shared cache system introduced in e8f4010611 for the
"looptris" triangulation cache. This avoids recalculation when meshes
are copied but the positions or topology don't change. The most obvious
improvement is for cases like a large meshes being adjusted slightly
with a simple geometry nodes modifier. In a basic test with a transform
node with a 1 million point grid I observed an improvement of 13%, from
9.75 to 11 FPS, which shows that we avoid spending 6ms recalculating
the triangulation of every update.

This also makes the thread safety for the triangulation data use a
more standard double-checked lock pattern, which is nice because we
can avoid holding a lock whenever the cached data is retrieved.
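
A generic sketch of that double-checked lock pattern, with simplified placeholder types rather than the actual mesh runtime code:

```
#include <atomic>
#include <mutex>

struct TriangulationCache {
  std::mutex mutex;
  std::atomic<bool> is_cached{false};
  int *looptris = nullptr; /* Stand-in for the derived triangulation data. */

  const int *ensure(int *(*calculate)())
  {
    /* Fast path: no lock taken when the cache is already valid. */
    if (!is_cached.load(std::memory_order_acquire)) {
      std::lock_guard<std::mutex> lock(mutex);
      /* Re-check under the lock in case another thread filled the cache. */
      if (!is_cached.load(std::memory_order_relaxed)) {
        looptris = calculate();
        is_cached.store(true, std::memory_order_release);
      }
    }
    return looptris;
  }
};
```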

Split from https://developer.blender.org/D16530
2022-11-18 17:29:24 -06:00
12d7994a48 Cleanup: Improve comment about copying mesh shared caches 2022-11-18 17:02:30 -06:00
c83e33b661 Cleanup: Sort includes in mesh header 2022-11-18 17:01:38 -06:00
05f93b58d3 Fix: Crash when writing mesh after previous commit
Runtime data was accessed after it was explicitly set to null.
2022-11-18 16:24:15 -06:00
1ea169d90e Mesh: Move loose edge flag to a separate cache
As part of T95966, this patch moves loose edge information out of the
flag on each edge and into a new lazily calculated cache in mesh
runtime data. The number of loose edges is also cached, so further
processing can be skipped completely when there are no loose edges.

Previously the `ME_LOOSEEDGE` flag was updated on a "best effort"
basis. In order to be sure that it was correct, you had to be sure
to call `BKE_mesh_calc_edges_loose` first. Now the loose edge tag
is always correct. It also doesn't have to be calculated eagerly
in various places like the screw modifier where the complexity
wasn't worth the theoretical performance benefit.

The patch also adds a function to eagerly set the number of loose
edges to zero to avoid building the cache. This is used by various
primitive nodes, with the goal of improving drawing performance.
This results in a few ms shaved off extracting draw data for some
large meshes in my tests.
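
A sketch of the lazy cache plus the eager "no loose edges" shortcut described here, using made-up names rather than the actual runtime structs:

```
#include <vector>

struct LooseEdgeCache {
  std::vector<bool> is_loose; /* Per-edge flag, replaces ME_LOOSEEDGE. */
  int count = -1;             /* -1 means "not calculated yet". */
};

struct MeshRuntime {
  LooseEdgeCache loose_edges;

  const LooseEdgeCache &loose_edges_ensure()
  {
    if (loose_edges.count == -1) {
      /* Calculate lazily on first access (real code also needs a lock). */
      /* loose_edges = calc_loose_edges(...); */
    }
    return loose_edges;
  }

  /* Eagerly record "no loose edges" so the cache never has to be built;
   * used by primitives whose topology is known to have none. */
  void loose_edges_tag_none()
  {
    loose_edges.is_loose.clear();
    loose_edges.count = 0;
  }
};
```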

In the Python API, `MeshEdge.is_loose` is no longer editable.
No built-in addons set the value anyway. The upside is that
addons can be sure the data is correct based on the mesh.

**Tests**
There is one test failure in the Python OBJ exporter: `export_obj_cube`
that happens because of existing incorrect versioning. Opening the
file in master, all the edges were set to "loose", which is fixed
by this patch.

Differential Revision: https://developer.blender.org/D16504
2022-11-18 16:05:06 -06:00
c0f33814c1 Fix T102611: Unable to change file output node sockets from python
Similar to 84c66fe9db
2022-11-18 14:01:40 -06:00
ab819517fc Fix: Crash when deleting node
Caused by b4c3ea2644 not removing the
dangling pointers to the freed internal links from the vector.
2022-11-18 13:58:36 -06:00
21adf2ec89 Cleanup: Split UV sample geometry node into two functions
This separates the UV reverse sampling and the barycentric mixing of
the mesh attribute into separate multi-functions. This separates
concerns and allows for future de-duplication of the UV sampling
function if that is implemented as an optimization pass. That would
be helpful since it's the much more expensive operation.

This was simplified by returning the triangle index in the reverse
UV sampler rather than a pointer to the triangle, which required
passing a span of triangles separately in a few places.
2022-11-18 13:38:55 -06:00
8fa69dafdd Cleanup: Remove unnecessary using keyword and namespace 2022-11-18 13:38:55 -06:00
b3b00be34e UI: Simplify description for geometry node socket
Simplify wording of "Output true" to a noun.
2022-11-18 13:38:55 -06:00
1c0cd50472 Curves: Add descriptions for normal mode RNA enum
This is only exposed in the "Set Normal Node" now but will be used in
more places in the future.
2022-11-18 13:38:55 -06:00
f5128f219f Cleanup: Use simpler check for Bezier curves 2022-11-18 13:38:55 -06:00
c6e4953719 Fix use-after-free of asset catalog data in node add menu
(Probably requires ASan for a reliable crash.)

Steps to reproduce were:
* Enter Geometry Nodes Workspace
* Press "New" button in the geometry nodes editor header
* Right-click the data-block selector -> "Mark as Asset"
* Change 3D View to Asset Browser
* Create a catalog
* Drag new Geometry Nodes asset into the catalog
* Save the file
* Press Shift+A in the geometry nodes editor

There was a general issue here with keeping catalog pointers around
during the add menu building. The way it does things, catalogs may be
reloaded in between.
Since the Current File asset library isn't loaded in a separate thread,
the use-after-free would always happen in between. For other libraries
it could still happen, but apparently didn't by chance.
2022-11-18 17:52:59 +01:00
b211266226 Merge branch 'blender-v3.4-release' 2022-11-18 16:05:10 +01:00
6c0a5461f7 Fix build error when not using unity build 2022-11-18 16:04:56 +01:00
0151d846e8 Fix MSVC warnings from recent asset system changes
* Mismatching class vs struct forward declaration (one forward
  declaration wasn't needed anymore)
* Unused member warning (`on_load_callback_store_`)
2022-11-18 15:20:16 +01:00
4e38771d5c Merge branch 'blender-v3.4-release' 2022-11-18 13:56:43 +01:00
b4c3ea2644 Cleanup: move internal links of nodes to runtime data
No functional changes are expected.
2022-11-18 13:46:35 +01:00
40b63bbf5b Merge branch 'blender-v3.4-release' 2022-11-18 12:50:21 +01:00
7b82d8f029 Nodes: move most runtime data out of bNode
* This patch just moves runtime data to the runtime struct to clean up
  the DNA struct. Arguably, some of this data should not even be there,
  because it's very use-case specific. This can be cleaned up separately.
* `miniwidth` was removed completely, because it was not used anywhere.
  The corresponding RNA property `width_hidden` is kept to avoid
  script breakage, but does not do anything (e.g. node wrangler sets it).
* Since RNA is in C, some helper functions were added to access the
  C++ runtime data from RNA.
* The size of `bNode` decreases from 432 to 368 bytes; a simplified
  sketch of the DNA/runtime split is shown below.
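
A simplified illustration of the split, with made-up field names:

```
struct bNodeRuntime {
  /* Transient data, never written to .blend files; illustrative fields only. */
  int index_in_tree;
  bool has_error;
};

struct bNode {
  /* DNA fields: saved in .blend files, so layout changes need versioning. */
  float locx, locy;
  float width, height;

  /* Everything transient lives behind one runtime pointer. */
  bNodeRuntime *runtime;
};
```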
2022-11-18 12:47:02 +01:00
754f674977 Cleanup: Missing trailing underscore in private asset system member vars
See style guide:
https://wiki.blender.org/wiki/Style_Guide/C_Cpp#Class_data_member_names
2022-11-18 12:45:56 +01:00
d5c8d3e661 Cleanup: Avoid unnecessary/annoying type alias in asset system
A `using FooPtr = std::unique_ptr<Foo>` isn't that useful usually, just
saves a few character stokes. It obfuscates the underlying type, which
is usually relevant information. Plus, `Ptr` for a unique pointer is
misleading (should be `UPtr` or similar).
2022-11-18 12:45:56 +01:00
61d0f77810 Cleanup: Better follow class layout style guide in asset headers
Move "using" declarations and member variables to the top of the class.
See https://wiki.blender.org/wiki/Style_Guide/C_Cpp#Class_Layout.

Changes access specifiers of some variables from public/protected to
private, there was no point in not having them private.
2022-11-18 12:45:56 +01:00
e31f282917 Cleanup: Minor cleanups in asset system headers
- Move main comment on class to header comment where it's more visible.
- Improve comments.
- Move stdlib includes first, like we usually do.
- Separate includes by code module.
- Remove unnecessary forward declarations.
2022-11-18 12:45:56 +01:00
7c0cecfd00 Asset system: Move catalog tree code to own files
The catalog code is already quite complex, I rather keep the tree stuff
separate in a more focused unit.
2022-11-18 12:45:56 +01:00
Iliya Katueshenock
6239e089cf Nodes: cache children of frame nodes
This allows for optimizations because one does not have to iterate
over all nodes anymore to find all nodes within a frame.

Differential Revision: https://developer.blender.org/D16106
2022-11-18 11:20:13 +01:00
dec459e424 Cleanup: move some files that use nodes to C++ 2022-11-18 11:08:52 +01:00
bc886bc8d7 Cleanup: Use int64_t for size methods.
- BKE_pbvh_pixels.hh
2022-11-18 10:45:06 +01:00
Pablo Vazquez
2c096f17a6 UI: Refactor Node Context Menu
The Node Context Menu contains options that are not always available for
the selected nodes, and misses important entries for accessibility.

This patch covers the following:
* Add operators to join and remove nodes from frames.
* Sort and group entries more logically and follow Blender conventions.
* Add `Insert into Group`
* Show group actions only on nodes that support it.
* Move all toggles to a sub-menu called `Show/Hide`.
* When nothing is selected, show Add menu, links actions, and paste.

Inspired by RightClickSelect proposals and community feedback.

See D16216 for images.

Reviewed By: HooglyBoogly

Differential Revision: https://developer.blender.org/D16216
2022-11-17 23:27:12 +01:00
8d77973dd7 Fix T99125: Curve mapping widget removes all vector points
Add a new flag value `CUMA_REMOVE` to explicitly tag duplicate points
for removal. This prevents a bug where all curve points with vector
handles were deleted when removing duplicate curve points while
updating the widget. This happened because the flag value used to tag
points for removal was the same as the value of `CUMA_HANDLE_VECTOR`,
used to store the handle type of the curve point.
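
The collision, illustrated with made-up bit values (the real values live in the DNA headers):

```
enum {
  CUMA_SELECT = (1 << 0),
  CUMA_HANDLE_VECTOR = (1 << 1),
  /* Before the fix, the removal tag reused the (1 << 1) bit, so every point
   * with a vector handle looked as if it were tagged for removal. The fix
   * gives removal its own dedicated bit: */
  CUMA_REMOVE = (1 << 2),
};
```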

Reviewed By: Hans Goudey

Differential Revision: http://developer.blender.org/D16463
2022-11-17 22:00:17 +01:00
609a681fb5 Merge branch 'blender-v3.4-release' 2022-11-17 17:51:23 +01:00
ca253df623 Cleanup: move transform_snap to C++ 2022-11-17 17:13:49 +01:00
f74234895a Merge branch 'blender-v3.4-release' 2022-11-17 17:05:03 +01:00
780b29109c Merge branch 'blender-v3.4-release' 2022-11-17 16:05:01 +01:00
6bf13d0734 Fix crash when loading different file with asset browser open
Steps to reproduce were:
- Open an asset browser
- Open an asset library with assets in it
- Load a different file (e.g. File -> New -> General)

Didn't see a nice way to fix this with the current pre-file-load handler
callback we use for freeing asset libraries. Using this would be cleaner,
but for now the relationship between UI and asset system is still too
close, so better to do explicit freeing at the right point in time.
2022-11-17 15:50:08 +01:00
c7bd508766 Cleanup outliner instancing collection code.
Remove needless call to `id_lib_extern`; this is already part of
the `id_us_plus` code.
2022-11-17 15:46:31 +01:00
53f401ea63 Merge remote-tracking branch 'origin/blender-v3.4-release' 2022-11-17 07:29:58 -07:00
576d99e59a Cleanup: move texture nodes to C++
No functional changes are expected. The goal here is to make
further refactorings to the nodes system easier.
2022-11-17 13:04:45 +01:00
dad8f4ac09 Merge branch 'blender-v3.4-release' 2022-11-17 12:21:10 +01:00
59f8061a34 Assets: Refactor asset representation storage
- Move code to manage storage to own class in own file, separates
  concerns and different levels of abstraction better.
- Store local ID assets separately in the storage class for more
  efficient lookups (e.g. for ID remapping).
- Make API function names and comments more complete.
2022-11-17 11:55:38 +01:00
67869432f2 Asset system: Remap local asset ID pointers as part of UI remapping
After checking with @mont29, this is much prefered over calling this in
BKE directly.
2022-11-17 11:55:38 +01:00
dd260d2f03 Fix T102554: Crash when Use Nodes is enabled
Blender crashes when enabling Use Nodes after the viewport compositor is
already enabled.

This happens because the active viewer key is not yet initialized for the
node tree at this point, which eventually leads to a nullptr.

This patch fixes that by returning the root context in case the active
viewer key is not yet initialized.
2022-11-17 11:12:41 +02:00
9a09adb7af Fix: Missing bounding box dirty tag when clearing mesh geometry
Similar to 801451c459.
2022-11-16 18:30:04 -06:00
bf0180d206 Sculpt: Remove some normal calculation with deformed sculpting
Remove unnecessary (and no-op) normal calculation when sculpting on top
of deformed coordinates. Examples are shape keys and deform modifiers.
On a 1 million face mesh, this saved 100ms per stroke update.
This function actually did nothing since cfa53e0fbe,
so that large improvement comes for free.

Conceptually this is correct because when sculpting on deformed
coordinates, we don't change the positions of the base mesh directly.
In the future it might be better to allocate a separate array for
normals when using deformed coordinates, but it's not clear that's
necessary yet.
2022-11-16 18:22:09 -06:00
c481549870 Cleanup: Remove unused node clipboard type handling 2022-11-16 18:01:34 -06:00
845a3573f5 Cleanup: Improve curves comments 2022-11-16 17:54:51 -06:00
Yann Doersam
b7a4f79748 Nodes: Allow pasting common nodes between editor types
Ignore the difference between source and target tree types. When copying
nodes from the clipboard to the target tree, compatibility is checked.
After pasting, only the links between nodes that exist in the
node tree are added.

See Task T95033.

Differential Revision: https://developer.blender.org/D16349
2022-11-16 17:36:30 -06:00
0d3a33e45e Geometry Nodes: Add "Exists" output to Named Attribute input node
As described in T100004, add an output socket that returns true if the
attribute accessed by the node was already present in that context.

Initial patch by Edward (@edward88).

Differential Revision: https://developer.blender.org/D16316
2022-11-16 17:27:28 -06:00
145839aa42 Fix T102365: Wireframe skips edges after recent cleanup
10131a6f62 replaced use of the `ME_EDGERENDER` flag with
`ME_EDGEDRAW`. However, left over from previous refactors, code
for leaving edit mode set that flag based on the edge angle. Edge angle
wireframe hiding is currently supposed to be adjustable with the
wireframe overlay settings. This patch restores the previous behavior
from before the cleanup commit.

Differential Revision: https://developer.blender.org/D16451
2022-11-16 16:49:02 -06:00
cacfaaa9a5 Fix T92416: First render with unknown image colorspace looks different
The issue here was that the Barbershop benchmark scene was saved with a
custom OCIO config, which leads to some textures having an unknown
colorspace when loading with a default installation.

This is automatically fixed by Blender during image loading, but since
Cycles queried the colorspace before actually loading the image, it
didn't get the updated value in the first render.

To fix this, just re-query the colorspace after the image is loaded.

Note that non-packed images still get treated as raw data if the
colorspace is unknown, but this is at least consistent and doesn't
magically change when you press F12 a second time.

Differential Revision: https://developer.blender.org/D16427
2022-11-16 23:42:23 +01:00
1677ddb7ee Sculpt: Avoid retrieving vertices attribute when flushing positions
Currently the positions are retrieved again for every vertex. This is
slow, and will get slower when positions are stored as a named
attribute. Saves around 0.5ms per stroke update when a modifier
is active in my test with a 1 million face mesh.
2022-11-16 14:54:20 -06:00
87ace7207d Revert "Sculpt/Paint: Use cached triangulation when building PBVH"
This reverts commit 676137f043.

This change worked locally with a specific test file and local changes,
but didn't work in general, since we don't reliably retrieve the new
looptris after setting them the first time. This can be improved again
in the future, but probably along with a more general look at how
ownership is handled with the PBVH.
2022-11-16 14:29:12 -06:00
676137f043 Sculpt/Paint: Use cached triangulation when building PBVH
This avoids recalculation of looptri derived triangulation whenever
switching to sculpt mode or whenever the PBVH is rebuilt, which can
happen after strokes in some situations. In my tests actually building
the PBVH is much more expensive (300ms), but this saves 6ms when
switching to sculpt mode and in other situations.

The cost is the possibility of higher memory usage because the cache
will live in the original main database mesh. However, the impact of
that will be smaller when the shared cache concept from D16204 is
applied to this data too.
2022-11-16 13:35:27 -06:00
25b3515324 Cleanup: Remove unnecessary mesh normals debugging function
This assertion function came from when derived normal data was stored
as custom data layers, which made it harder to keep track of whether
it was allocated and propagated. Nowadays it's all relatively easy to
predict, so there's no point in keeping this function around-- it only
makes code longer and more complex looking.
2022-11-16 13:07:49 -06:00
6cf4999e50 Cleanup: Slightly improve mesh normals and runtime comments
Also resolved an unused variable warning caused by an earlier cleanup.
2022-11-16 12:54:48 -06:00
89ca298210 Cleanup: Don't set mesh normals directly for metaballs
At the cost of a memory copy, this allows using a C++ type to store
normals in mesh runtime data in upcoming patches.
2022-11-16 12:37:35 -06:00
99c970a94d UI: Allow Joining of Tiny Screen Areas
Allow joining of areas that are below our minimum sizes.

See D16522 for more details.

Differential Revision: https://developer.blender.org/D16522

Reviewed by Campbell Barton
2022-11-16 09:53:06 -08:00
Rateeb Riyasat
cce4271b31 Add regression test for triangulate faces
Basic test for `quads_convert_to_tris`

As the operator name in blender is different from the bpy name, the operator
name in blender was opted in terms of the blend file collection name as well
as the test name. This was done so that new developers in the future can
easier understand which operator this corresponds to. Although it might be
better to change this to the bpy name so as to be consistent with the rest
of the codebase.

Updated blend file `lib/tests/modeling/operators.blend` has been
committed as rBL63101.

Reviewed By: zazizizou, mont29

Differential Revision: https://developer.blender.org/D16072
2022-11-16 18:34:11 +01:00
801451c459 Fix: Missing clearing of mesh triangulation data
Missed in e412fe1798
2022-11-16 11:25:54 -06:00
a81abbbb8f Fix T100772: Joins with Interfering Tiny Areas
Detect unlikely situation of an area (that is smaller than our allowed
minimums) sharing an edge that will be moved during a join.

See D16519 for more details.

Differential Revision: https://developer.blender.org/D16519

Reviewed by Campbell Barton
2022-11-16 09:08:15 -08:00
6077fe7fae Merge branch 'blender-v3.4-release' 2022-11-16 18:00:48 +01:00
f7ca0ecfff Cleanup: Cognitive complexity in mask animation filtering
No functional changes expected.
2022-11-16 15:29:14 +01:00
4f2ce8d8d3 DrawManager: Remove experimental draw lock.
The draw locking was implemented for project Heist and moved behind an
experimental feature after it became clear there were issues with it.
Nowadays it isn't used, and the idea is to replace it with a different
solution after all draw engines have been ported to the new draw manager
API. {T102180}

This patch removes the experimental feature, as it isn't used or useful.
2022-11-16 15:18:39 +01:00
5a05fa8f74 Merge branch 'blender-v3.4-release' 2022-11-16 11:01:29 -03:00
e4871b2835 EEVEE/Viewport: Make info text when compiling shaders more clear
The N in `Compiling Shaders N` in the Text Info is the number of
shaders left in the queue. It's a countdown, but this wasn't mentioned
and led to confusion.

Ideally this text would be like Cycles' "Samples 50/100", but in EEVEE it's
not easy to guess how many shaders are left (this number could even go
up mid-compilation).

In the past there used to be a progress bar but it's also confusing because
it could be 90/100 shaders done, but the remaining 10 are slow to compile.

Change the text to "Compiling Shaders (N remaining)" so it's easier to
understand what is going on. Similar to how some game engines do.
2022-11-16 14:28:21 +01:00
0ebb7ab41f Geometry Nodes: disable unreachable nodes in evaluator
Nodes that were not connected to any output could still impact performance.
While they were never executed, sometimes their inputs could keep references
to geometries that other nodes want to modify. That caused unnecessary geometry
copies, because a geometry can only be modified if it is not shared.

Now, inputs that will never be used are tagged accordingly and they will never
have references to geometries that others might want to modify.
2022-11-16 14:26:11 +01:00
edcce2c073 Cleanup: correct inverted variable name 2022-11-16 13:19:23 +01:00
1e88fc251f Cleanup: remove unused data member 2022-11-16 13:19:23 +01:00
71ce178b3e Merge branch 'blender-v3.4-release' 2022-11-16 12:55:44 +01:00
06e9d40c33 Merge branch 'blender-v3.4-release' 2022-11-16 12:32:51 +01:00
ec9acdeac2 Merge branch 'blender-v3.4-release' 2022-11-16 21:37:01 +11:00
91d3cc51c3 Merge branch 'blender-v3.4-release' 2022-11-16 21:36:55 +11:00
1aa851e939 Mesh: Don't tag normals and triangulation dirty when translating
This only applies to procedural operations rather than edit mode
operations, but it might save some recalculations of these caches
for the transform geometry node in some cases.
2022-11-15 23:58:34 -06:00
c8c14d1681 Cleanup: Remove unnecessary clearing of mesh runtime data
The calls in the remesh operator were unnecessary because the mesh is
about to be replaced anyway, and nothing invalidates the caches, and
the call in BMesh -> Mesh conversion was unnecessary because the caches
are cleared at the top of the function already.
2022-11-15 23:43:22 -06:00
90fb1cc4e6 Cleanup: Remove unnecessary dirty normal tags
These were redundant for one of a few reasons:
- A call to `BKE_mesh_tag_coords_changed` was correct instead
- A mesh has dirty normals when created from scratch anyway
- The call was redundant with `BKE_mesh_runtime_clear_geometry`
2022-11-15 20:28:39 -06:00
192cd76b7c Fix: Build error on MSVC with mismatched struct/class keywords 2022-11-15 20:26:33 -06:00
e412fe1798 Cleanup: Simplify freeing and clearing mesh runtime data
Separate freeing and clearing mesh runtime data in a more obvious way.
This makes it easier to see what data is meant to be cleared on certain
changes, rather than conflating it with freeing all of the runtime
caches.

Also comment and reduce the surface area of the "mesh runtime" API.
The redundancy in some functions made it confusing which one should
be used, resulting in subtle bugs or unnecessary boilerplate code.

Also, now bke::MeshRuntime is able to free all the data it owns by
itself, which makes this area easier to reason about. That required
changing the interface of a few functions to avoid passing Mesh when
they really just dealt with some runtime struct.

With more RAII semantics in the future, more of this manual freeing
will become unnecessary.
2022-11-15 20:26:33 -06:00
550c51b08b Merge branch 'blender-v3.4-release' 2022-11-16 12:33:59 +11:00
71067a58ec Merge branch 'blender-v3.4-release' 2022-11-16 12:33:55 +11:00
6940c4b602 Merge branch 'blender-v3.4-release' 2022-11-16 12:33:53 +11:00
0ac19425d4 Merge branch 'blender-v3.4-release' 2022-11-16 12:33:49 +11:00
e3ddfedbb6 Merge branch 'blender-v3.4-release' 2022-11-16 12:33:46 +11:00
5e203c4f4b Merge branch 'blender-v3.4-release' 2022-11-16 12:33:43 +11:00
25630ab2a1 Merge branch 'blender-v3.4-release' 2022-11-16 12:33:38 +11:00
87ba0dcaca Merge branch 'blender-v3.4-release' 2022-11-15 18:55:18 -06:00
60523ea523 Cleanup: format 2022-11-16 12:59:47 +13:00
9f2f9dbca6 Merge branch 'blender-v3.4-release' 2022-11-16 11:28:57 +13:00
da82d46a5a Cleanup: simplify asserts in uv unwrapper 2022-11-16 10:36:09 +13:00
65944e7e84 Merge branch 'blender-v3.4-release' 2022-11-15 22:13:08 +01:00
e8f4010611 Geometry: Cache bounds min and max, share between data-blocks
Bounding box calculation can be a large cost in some situations, especially
with instancing. This patch caches the min and max of the bounding box in
runtime data of meshes, point clouds, and curves, implementing part of
T96968.

Bounds are now calculated lazily, only after they are tagged dirty.
Cached bounds are also shared when copying geometry data-blocks
that have equivalent data. When bounds are calculated on an evaluated
data-block, they are also accessible on the original, and the next
evaluated ID will also share them. A geometry will stop sharing bounds
as soon as its positions (or radii) are changed.

Just caching the bounds gave a 2-3x speedup with thousands of mesh
geometry instances in the viewport. Sharing the bounds can eliminate
recalculations entirely in cases like copying meshes in geometry nodes
or the selection paint brush in curves sculpt mode, which causes a
reevaluation but doesn't change the positions.

**Implementation**
The sharing is achieved with a `shared_ptr` that points to a cache mutex
(from D16419) and the cached bounds data. When geometries are copied,
the bounds are shared by default, and only "un-shared" when the bounds
are tagged dirty.

Point clouds have a new runtime struct to store this data. Functions
for tagging the data dirty are added for point clouds
and improved for curves. A missing tag has also been fixed for mesh
sculpt mode.

**Future**
There are further improvements which can be worked on next
- Apply changes to volume objects and other types where it makes sense
- Continue cleanup changes described in T96968
- Apply shared cache design to more expensive data like triangulation
  or normals

Differential Revision: https://developer.blender.org/D16204
2022-11-15 13:48:00 -06:00
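A rough sketch of the sharing scheme described above (simplified, hypothetical names; the actual code lives in the runtime structs and builds on the cache mutex from D16419):

```
#include <memory>
#include <mutex>
#include <optional>

struct Bounds {
  float min[3], max[3];
};

/* The cache and the mutex guarding it sit behind a shared_ptr, so copied
 * geometries can point at the same cache and fill it for each other. */
struct BoundsCache {
  std::mutex mutex;
  std::optional<Bounds> bounds;
};

struct GeometryRuntime {
  std::shared_ptr<BoundsCache> bounds_cache = std::make_shared<BoundsCache>();

  /* Copying a geometry copies the shared_ptr, which shares the cache. */

  /* Tagging positions dirty "un-shares": this geometry gets a fresh, empty
   * cache while other users keep the old, still valid one. */
  void tag_positions_changed()
  {
    bounds_cache = std::make_shared<BoundsCache>();
  }

  /* Lazy calculation: bounds are only computed on first request. */
  template<typename Fn> Bounds ensure_bounds(Fn &&compute)
  {
    std::lock_guard lock{bounds_cache->mutex};
    if (!bounds_cache->bounds) {
      bounds_cache->bounds = compute();
    }
    return *bounds_cache->bounds;
  }
};
```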
b2d9716b4a Cleanup: Remove DerivedMesh use in UV unwrapping
Previously the UV unwrapping handling for subsurf modifiers used
`DerivedMesh` to implement the subdivision. Since we're trying to remove
`DerivedMesh` in general, and since this just made use of the `Mesh`
data anyway, it's relatively simple to remove it here. Combined with
D15939, this makes it possible to remove more `DerivedMesh` code.

Differential Revision: https://developer.blender.org/D16487
2022-11-15 13:48:00 -06:00
d775995dc3 DRW: Manager: Add possibility to bind UBO and VBO as SSBO through commands
This exposes `GPU_uniformbuf_bind_as_ssbo` and `GPU_vertbuf_bind_as_ssbo`
through the `draw::Pass` API.
2022-11-15 20:16:25 +01:00
a9a5f7ce17 GPU: UniformBuf: Add GPU_uniformbuf_clear_to_zero
This allows clearing the entire buffer directly on GPU.
2022-11-15 20:16:25 +01:00
b8fd474bb1 Merge branch 'blender-v3.4-release' 2022-11-15 18:41:24 +01:00
50b257715f Assets: Avoid quadratic complexity when freeing asset libraries
Using a vector to store assets means we have to look up the position of
the asset to be able to remove/free it. Use a `blender::Set` instead for
(nearly?) constant time removal.
2022-11-15 17:44:47 +01:00
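The complexity difference in a nutshell (a generic illustration, not the actual asset list code): removing from a vector needs a linear search plus shifting the tail, so freeing all n assets one by one is O(n^2), while a hash set erases in expected constant time.

```
#include <algorithm>
#include <memory>
#include <unordered_set>
#include <vector>

struct Asset {};

/* O(n) per removal: find the element, then shift everything after it. */
void remove_asset(std::vector<std::unique_ptr<Asset>> &assets, Asset *asset)
{
  /* Assumes the asset is present. */
  assets.erase(std::find_if(assets.begin(), assets.end(),
                            [&](const auto &ptr) { return ptr.get() == asset; }));
}

/* Expected O(1) per removal when keyed on the pointer. */
void remove_asset(std::unordered_set<Asset *> &assets, Asset *asset)
{
  assets.erase(asset);
}
```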
2f442234e7 Merge branch 'blender-v3.4-release' 2022-11-15 08:42:55 -08:00
2d251478bb Sculpt: Fix mask from cavity settings issues
Mask from cavity can now pull settings from three
places: the operator properties, scene tool settings
or the brush.  This is needed to make the "create mask"
button work as expected.
2022-11-15 08:39:06 -08:00
277b2fcbfa GPU: Update Vulkan backend with latest API changes.
UniformBuf::bind_as_ssbo has been introduced.
2022-11-15 16:24:30 +01:00
4fb02d7f8e Fix T102482: Crash loading geometry nodes assets after file load
Asset library data is destructed on file load. Asset lists (weak and
hopefully temporary design) contain pointers into it that would dangle
then. Make sure the asset lists are destructed before the asset library
data.
2022-11-15 16:21:06 +01:00
cff78860ac Merge branch 'blender-v3.4-release' 2022-11-15 12:08:05 -03:00
23d0b5dcd2 Merge branch 'blender-v3.4-release' 2022-11-15 16:04:35 +01:00
a859837cde Cleanup: Move OptiX denoiser code from device into denoiser class
Cycles already treats denoising fairly separate in its code, with a
dedicated `Denoiser` base class used to describe denoising
behavior. That class has been fully implemented for OIDN
(`denoiser_oidn.cpp`), but for OptiX was mostly empty
(`denoiser_optix.cpp`) and denoising was instead implemented in
the OptiX device. That meant denoising code was split over various
files and directories, making it a bit awkward to work with. This
patch moves the OptiX denoising implementation into the existing
`OptiXDenoiser` class, so that everything is in one place. There are
no functional changes, code has been mostly moved as-is. To
retain support for potential other denoiser implementations based
on a GPU device in the future, the `DeviceDenoiser` base class was
kept and slightly extended (and its file renamed to
`denoiser_gpu.cpp` to follow similar naming rules as
`path_trace_work_*.cpp`).

Differential Revision: https://developer.blender.org/D16502
2022-11-15 15:50:01 +01:00
a94c3aafe5 Merge branch 'blender-v3.4-release' 2022-11-15 14:53:37 +01:00
ff40b90f99 GPU: UniformBuffer: Add possibility to bind as SSBO
This way UBOs can be modified directly in shader just like VBOs and IBOs.
2022-11-15 14:41:38 +01:00
5db84d0ef1 GPU: State: Add GPU_BARRIER_UNIFORM
This allows synchronising uniform buffer writes from compute shaders
when a UBO is bound as an SSBO.
2022-11-15 14:41:38 +01:00
f0ce95b7b9 GPU: Enabled Metal test cases.
This commit adds the Metal GPU backend test cases to the build. These
test cases currently fail, so they are disabled by default.
2022-11-15 13:14:05 +01:00
d2728868c0 GPU: Improve Codegen variable names
Include the node name and parameter index in the variable name for easier debugging.
(Enabled for debug builds only)

Reviewed By: fclem, jbakker

Differential Revision: https://developer.blender.org/D16496
2022-11-15 13:06:58 +01:00
191a3bf2ad Merge branch 'blender-v3.4-release' 2022-11-15 13:05:49 +01:00
66939d47b1 Merge branch 'blender-v3.4-release' 2022-11-15 12:44:39 +01:00
84dce9c1fa Merge branch 'blender-v3.4-release' 2022-11-15 12:11:12 +01:00
1b5ceb9a75 Merge branch 'blender-v3.4-release' 2022-11-15 12:05:11 +01:00
d95216b94c Merge branch 'blender-v3.4-release' 2022-11-15 11:35:29 +01:00
d46317cf3c Fix T101775: grease pencil keyframe not filtered in dopesheet summary
Grease pencil data keyframes were listed twice in the summary.

First by the generic object data listing,
which did not properly handle grease pencil objects
and did not account for the grease pencil filter.
Second by the specific grease pencil function.

Now only the second call is made,
and the filter hides keyframes in summary as well.

Reviewed By : Jeroen Bakker, Falk David

Differential Revision: https://developer.blender.org/D16369
2022-11-15 09:49:39 +01:00
7f80b5e675 Animation: rearrange grease pencil channels in the main dopesheet
Operations to rearrange channels in the main dopesheet
did not cover grease pencil layer channels.
Now grease pencil layer channels can be moved up and down
in the main dopesheet just like other channels.

Reviewed By: Sybren A. Stüvel

Differential Revision: https://developer.blender.org/D15542
2022-11-15 09:09:48 +01:00
91215ace72 Merge branch 'blender-v3.4-release' 2022-11-15 08:08:17 +01:00
2a41cd46ba Cleanup: format 2022-11-15 16:43:18 +11:00
f0f97e18c1 Cleanup: quiet unused variable warning 2022-11-15 16:41:50 +11:00
f396ab236a Merge branch 'blender-v3.4-release' 2022-11-15 16:37:56 +11:00
be024ee7b7 Merge branch 'blender-v3.4-release' 2022-11-15 15:32:52 +11:00
435c824a5f Merge branch 'blender-v3.4-release' 2022-11-15 15:32:47 +11:00
2205e5f63f Merge branch 'blender-v3.4-release' 2022-11-15 15:32:34 +11:00
Iliya Katueshenock
efcd587bc2 Geometry Nodes: Image Info Node
This commit adds a new "Image Info" node to retrieve various
information from an image like its width, height, and whether
it has an alpha channel. It is also possible to retrieve the FPS
and frame count of video files.

Differential Revision: https://developer.blender.org/D15042
2022-11-14 18:55:51 -06:00
b64042b482 Merge branch 'blender-v3.4-release' 2022-11-14 18:21:35 -06:00
d158db475b Merge branch 'blender-v3.4-release' 2022-11-14 17:53:25 -06:00
e3ee913932 Cleanup: Resolve unused variable warning in draw module 2022-11-14 14:50:10 -06:00
103fe4d1d1 Merge branch 'blender-v3.4-release' 2022-11-14 20:09:02 +01:00
6873aabf93 Cleanup: BVH utils: Remove print, use spans instead of pointers 2022-11-14 11:59:40 -06:00
b0e2e45496 Cycles: Enable MetalRT pointclouds & other fixes
Code authored by Marco Giordano.

This fixes pointcloud rendering on MetalRT and some other subtle MetalRT bugs:
- Incorrect kernel hashing
- Missing specialisation constants
- Incorrect visibility filtering
- Missing null pointer check

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16499
2022-11-14 16:39:18 +00:00
85bbcb32eb Merge branch 'blender-v3.4-release' 2022-11-14 16:46:01 +01:00
276d7f7c19 Mesh: Avoid calculating normals when building BVH tree
Though they are sometimes used by users of the BVH tree, calculating
vertex normals when building the BVH tree is mostly unnecessary. Skip it
instead and avoid storing the vertex normals in the BVH tree cache.
They are just calculated in the few places they are actually needed.
This should save at least a few percent of the runtime in some cases
where the normals weren't needed otherwise.
2022-11-14 07:56:53 -06:00
cc12d2a5a0 Cleanup: Reduce indentation and variable scope in BVH utils 2022-11-14 07:56:53 -06:00
1aaf4ce0a4 Cleanup: Use C++ BitVector instead of BLI_bitmap for BVH utils
This gives a friendlier interface, an inline buffer, RAII, etc.
Also switch some BMesh functions that were only used by the snap
system's use of BVH utils.
2022-11-14 07:56:53 -06:00
6192695a94 Cleanup: Allow using C++ features in BMesh header functions
Generally the `extern "C" {` brackets shouldn't be added around other
headers since it causes problems when using C++ features in them.
Follow that convention for the "bmesh.h" header.
2022-11-14 07:56:53 -06:00
28dc3b0b84 Cleanup: Move bmesh_iterators.c to C++ 2022-11-14 07:56:53 -06:00
37f50ffdbc Cleanup: Braces around initialization warning
There is no need to initialize the bounding box as it is fully
initialized by BKE_boundbox_init_from_minmax().
2022-11-14 14:43:41 +01:00
Ejner Fergo
b557e4317d Gizmo toggle for Movie Clip Editor
This patch adds a "Show Gizmo" toggle to the Movie Clip Editor header, for consistency with other editors.

{F13892765}

Differential Revision: https://developer.blender.org/D16437
2022-11-14 14:35:32 +01:00
Mikhail Matrosov
857bb1b5ec Cycles: improve adaptive sampling for overexposed scenes
Render time is reduced for overexposed scenes, by taking into account absolute
light intensity for adaptive sampling.

This can negatively affect some scenes where compositing or color management
are used to make the scene much darker or lighter. For best results adjust the
Film > Exposure setting to bring the intensity into a good range, and then do
further compositing and color management on top of that. Note that this setting
is different than color management exposure.

Previously Cycles' adaptive sampling used sqrt(I) to normalize noise level to
conform to a viewer's eye sensitivity. It is great for darker regions of the
image, but also requests too many samples in bright regions, sometimes several
times more than needed. Highlights can tolerate more noise because in most
examples it is still less noticeable than the noise in darker areas in the same
render.

Differential Revision: https://developer.blender.org/D16392
2022-11-14 14:14:31 +01:00
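A simplified convergence test illustrating the idea (names and constants are hypothetical, not the Cycles kernel code):

```
#include <algorithm>
#include <cmath>

/* `error` is the estimated per-pixel noise, `intensity` the pixel brightness.
 * Dividing by sqrt(intensity) models eye sensitivity: darker pixels need
 * lower absolute noise. Clamping the divisor from below by a fraction of the
 * absolute intensity keeps very bright pixels from demanding many more
 * samples than their visible noise justifies. */
bool pixel_converged(float error, float intensity, float threshold)
{
  const float eye_term = std::sqrt(std::max(intensity, 1e-4f));
  const float highlight_term = 0.25f * intensity; /* Arbitrary factor. */
  return error / std::max(eye_term, highlight_term) < threshold;
}
```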
d7f1430f11 Merge branch 'blender-v3.4-release' 2022-11-14 14:04:13 +01:00
Colin Basnett
4dd19a1ad6 UI: disable curve map & profile zoom buttons at max/min zoom level
Disable the zoom in and out buttons when they would have no effect.

This also removes an incorrect comment that indicates the maximum zoom level
was 20x when in fact it was 25x.

Differential Revision: https://developer.blender.org/D16252
2022-11-14 14:03:50 +01:00
187bce103b DRW: Fix compilation issues in inline functions 2022-11-14 14:01:23 +01:00
0fe21fe7b9 Merge branch 'blender-v3.4-release' 2022-11-14 13:20:21 +01:00
ea2dda306c Asset system: New asset system code module (with files from BKE)
Adds a new `source/blender/asset_system` directory and moves asset
related files from BKE to it. More asset related code can follow
(e.g. asset indexing, ED_assetlist stuff) but needs further work to
untangle it. I also kept `BKE_asset.h` and `asset.cc` as is, since they
deal with asset DNA data mostly, thus make sense in BKE.

Motivation:
- Makes the asset system design more present (term wasn't even used in
  code before).
- An `asset_system` directory is quite descriptive (trivial to identify
  core asset system features) and makes it easy to find asset code.
- Asset system is mostly runtime data, with little relation to other
  `Main`/BKE/DNA types.
- There's a lot of stuff in BKE already. It shouldn't be just a dump for
  all stuff that seems core enough.
- Being its own directory helps us be more mindful about encapsulating
  the module well, and avoiding dependencies on other modules.
- We can be more free with splitting files here than in BKE.
- In future there might be an asset system BPY module, which would then
  map quite nicely to the `asset_system` directory.

Checked with some other core devs, consensus seems that this makes
sense.
2022-11-14 12:46:34 +01:00
6a96edce2e Merge branch 'blender-v3.4-release' 2022-11-14 12:24:45 +01:00
bfb6ea898b DRW: View: Add base for multi-view support
This implements the base needed for supporting multiple views concurrently
inside the same draw call.

The view used by common macros and view related functions is indexed using
a global variable `drw_view_id` which can be set arbitrarly or read
from the `drw_ResourceID`.

This is needed for EEVEE-Next shadow but can be used for other purpose
in the future.

Note that a shader specialization is needed for it to work. `DRW_VIEW_LEN`
needs to be defined to the number of views the shader will access.

The number of views contained in a `draw::View` is set at construction
time.

Note that the maximum number of objects correctly drawn by shaders
using multiple views will be lower than for those that don't.
2022-11-14 11:17:38 +01:00
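A C++ rendering of the shader-side indexing (the real code is GLSL; `drw_view_id`, `drw_ResourceID` and `DRW_VIEW_LEN` are the names quoted above, everything else is hypothetical):

```
constexpr int DRW_VIEW_LEN = 6; /* Specialization: views this shader accesses. */

struct float4x4 {
  float m[4][4];
};

struct ViewMatrices {
  float4x4 viewmat;
  float4x4 winmat;
};

/* One entry per view of the draw::View; all are bound at once. */
ViewMatrices drw_view_buf[DRW_VIEW_LEN];

/* Global index selecting the active view for the common macros; set
 * arbitrarily, or derived per instance from drw_ResourceID. */
int drw_view_id = 0;

inline const ViewMatrices &drw_view()
{
  return drw_view_buf[drw_view_id];
}
```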
ab3fcd62cc Cleanup: DRW: Remove two clang-tidy warnings 2022-11-14 11:17:38 +01:00
5fe146e505 Cleanup: Move mesh_remap.c to C++
To facilitate further mesh data structure refactoring.
2022-11-13 20:27:37 -06:00
909f47e0e1 UV: fix crash with uv copy on empty selection
Introduced in 721fc9c1c9
2022-11-14 13:37:12 +13:00
e0cb3e0a39 Merge branch 'blender-v3.4-release' 2022-11-14 10:38:01 +11:00
b8d1022dff Merge branch 'blender-v3.4-release' 2022-11-14 10:37:57 +11:00
dc513a0af8 Cleanup: Disable mesh normal debug time printing
Left enabled mistakenly by d63ada602d.
2022-11-13 14:19:20 -06:00
3eb2bc2c3f Cleanup: Move lineart_cpu.c to C++
To enable further mesh data structure refactoring-- access to loose
edges in particular.
2022-11-13 14:16:24 -06:00
fba7461e1a UV: fix compile on windows
Remove VLAs for compiling on windows.
Regression from 721fc9c1c9
2022-11-14 08:22:36 +13:00
d17f5bcd8f Fix T95335 Bevel operator Loop Slide overshoot.
If the edge you are going to slide along is very close to in line
with the adjacent beveled edge, then there will be sharp overshoots.
There is an epsilon comparison to just abandon loop slide if this
situation is happening. That epsilon used to be 0.25 radians, but
bug T86768 complained that that value was too high, so it was changed
to .0001 radians (5 millidegrees). Now this current bug shows that
that was too aggressively small, so this change ups it by a factor
of 10, to .001 radians (5 centidegrees). All previous bug reports
remained fixed.
2022-11-13 14:09:27 -05:00
7419e291e8 Merge branch 'blender-v3.4-release' 2022-11-13 22:54:38 +05:30
ce9fcb15a3 DRW: Manager: Fix ClearMulti breaking compilation on Mac
The error was:
`draw_pass.hh:1055:16: error: call to implicitly-deleted default constructor of 'blender::draw::command::Undetermined [3]'`
2022-11-13 18:02:17 +01:00
bd622aef3c BLI: Fix ListBaseWrapper::get wrong return type 2022-11-13 16:48:30 +01:00
c255be2d02 DRW: Manager: Add bind_texture command for vertex buffer
This allows the same behavior as with `DRW_shgroup_buffer_texture`.
2022-11-13 16:47:43 +01:00
cd64615425 DRW: Manager: Add ClearMulti command
Allows recording `GPU_framebuffer_multi_clear` inside `draw::Pass`.
2022-11-13 16:23:22 +01:00
930d14cc62 DRW: Manager: Finish / change implementation of framebuffer_set command
Use reference instead of direct pointer. This is because framebuffers
often use temp textures and are configured later just before submission.
2022-11-13 16:16:26 +01:00
f1466ce9a8 DRW: Wrappers: Allow taking reference of the framebuffer object
This is in order to make it work with the new `framebuffer_set` command
which requires a `GPUFrameBuffer **`.
2022-11-13 16:02:57 +01:00
0e4bdd428c DRW: Wrappers: Allow trivial types inside draw::SwapChain
This allows using pointers and other trivial types which cannot
implement the `swap` method.
2022-11-13 16:00:58 +01:00
67dfb61700 DRW: Wrappers: Avoid default vector length of 0 if sizeof(T) is large
This increases the default size to some reasonable value (>512 bytes) and
allocates at least one element.
2022-11-13 15:59:23 +01:00
d0f05ba915 Cleanup: fix compiler error/warnings 2022-11-13 11:29:04 +01:00
721fc9c1c9 UV: implement copy and paste for uv
Implement a new topology-based copy and paste solution for UVs.

Usage notes:

* Open the UV Editor

* Use the selection tools to select a Quad joined to a Triangle joined to another Quad.
* From the menu, choose UV > UV Copy
 * The UV co-ordinates for your quad<=>tri<=>quad are now stored internally

* Use the selection tools to select a different Quad joined to a Triangle joined to a Quad.
* (Optional) From the menu, choose UV > Split > Selection

* From the menu, choose UV > UV Paste
 * The UV co-ordinates for the new selection will be moved to match the stored UVs.

Repeat selection / UV Paste steps as many times as desired.
For performance considerations, see https://en.wikipedia.org/wiki/Graph_isomorphism_problem

In theory, UV Copy and Paste should work with all UV selection modes.
Please report any problems.

A copy has been made of the Graph Isomorphism code from https://github.com/stefanoquer/graphISO
Copyright (c) 2019 Stefano Quer stefano.quer@polito.it GPL v3 or later.

Additional integration code Copyright (c) 2022 by Blender Foundation, GPL v2 or later.

Maniphest Tasks: T77911
Differential Revision: https://developer.blender.org/D16278
2022-11-13 12:48:17 +13:00
533c396898 Merge remote-tracking branch 'origin/blender-v3.4-release' 2022-11-12 13:32:26 -07:00
115cf5ef98 Cleanup: Move cloth.c to C++
To support further mesh data structure refactoring.
2022-11-12 12:14:09 -06:00
a6c822733a BLI: improve CPPType system
* Support bidirectional type lookups. E.g. finding the base type of a
  field was supported, but not the other way around. This also removes
  the todo in `get_vector_type`. To achieve this, types have to be
  registered up-front.
* Separate `CPPType` from other "type traits". For example, previously
  `ValueOrFieldCPPType` adds additional behavior on top of `CPPType`.
  Previously, it was a subclass, now it just contains a reference to the
  `CPPType` it corresponds to. This follows the composition-over-inheritance
  idea. This makes it easier to have self-contained "type traits" without
  having to put everything into `CPPType`.

Differential Revision: https://developer.blender.org/D16479
2022-11-12 18:33:31 +01:00
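A toy version of the two points above (hypothetical names, much smaller than the real API): the trait is composed with `CPPType` rather than derived from it, and up-front registration wires up pointers in both directions so either type can find the other.

```
struct CPPType;

/* A "type trait" that references its CPPType instead of subclassing it. */
struct ValueOrFieldCPPType {
  const CPPType &self;  /* The ValueOrField<T> type itself. */
  const CPPType &value; /* The wrapped base type T. */
};

/* Runtime type descriptor; one instance per type, registered up front. */
struct CPPType {
  const char *name = "";
  /* Filled during registration, enabling lookups both ways:
   * T -> ValueOrField<T> and ValueOrField<T> -> T. */
  const ValueOrFieldCPPType *value_or_field = nullptr;
};

/* Hypothetical up-front registration wiring both directions at once. */
inline void register_value_or_field(CPPType &self, CPPType &value,
                                    const ValueOrFieldCPPType &trait)
{
  self.value_or_field = &trait;
  value.value_or_field = &trait;
}
```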
a145b96396 Merge branch 'blender-v3.4-release' 2022-11-12 16:51:53 +01:00
Iliya Katueshenock
99fe17f52d BLI: use templates for disjoint set data structure
Differential Revision: https://developer.blender.org/D16472
2022-11-12 14:26:47 +01:00
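For reference, a minimal templated disjoint set (union-find) with path compression and union by size, i.e. the generic shape of this data structure (a sketch, not the BLI code):

```
#include <numeric>
#include <utility>
#include <vector>

template<typename Index = int> class DisjointSet {
  std::vector<Index> parent_;
  std::vector<Index> size_;

 public:
  explicit DisjointSet(Index n) : parent_(n), size_(n, 1)
  {
    std::iota(parent_.begin(), parent_.end(), Index(0));
  }

  Index find_root(Index x)
  {
    while (parent_[x] != x) {
      parent_[x] = parent_[parent_[x]]; /* Path halving. */
      x = parent_[x];
    }
    return x;
  }

  /* Join the sets containing a and b, attaching the smaller under the larger. */
  void join(Index a, Index b)
  {
    a = find_root(a);
    b = find_root(b);
    if (a == b) {
      return;
    }
    if (size_[a] < size_[b]) {
      std::swap(a, b);
    }
    parent_[b] = a;
    size_[a] += size_[b];
  }
};
```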
db25e64f6a Merge branch 'blender-v3.4-release' 2022-11-12 14:23:32 +01:00
Iliya Katueshenock
b5e82ff93d Cleanup: remove unused variable
Differential Revision: https://developer.blender.org/D16350
2022-11-12 14:20:19 +01:00
5a37724455 Merge branch 'blender-v3.4-release' 2022-11-12 14:19:32 +01:00
Iliya Katueshenock
3534c2b4ad Cleanup: make GArray declarations more explicit
Differential Revision: https://developer.blender.org/D16064
2022-11-12 14:14:42 +01:00
0fc27536fb Merge branch 'blender-v3.4-release' 2022-11-12 19:52:11 +11:00
4737f9cff2 Merge branch 'blender-v3.4-release' 2022-11-12 17:10:42 +11:00
935d6a965a Merge branch 'blender-v3.4-release' 2022-11-12 17:10:39 +11:00
b973e27327 Merge branch 'blender-v3.4-release' 2022-11-12 17:10:36 +11:00
cd659f7bbf Merge branch 'blender-v3.4-release' 2022-11-12 17:10:32 +11:00
41137eb7a5 Merge branch 'blender-v3.4-release' 2022-11-12 17:10:29 +11:00
e87b99d7f3 Merge branch 'blender-v3.4-release' 2022-11-12 17:10:25 +11:00
fcfa9ac219 Merge branch 'blender-v3.4-release' 2022-11-12 17:10:21 +11:00
1a8516163f Cleanup: Simplify handling of loop to poly map in normal calculation
A Loop to poly map was passed as an optional output to the loop normal
calculation. That meant it was often recalculated more than necessary.
Instead, treat it as an optional argument. This also helps relieve
unnecessary responsibilities from the already-complicated loop normal
calculation code.
2022-11-11 23:27:36 -06:00
d63ada602d Cleanup: Use simpler timers for mesh normals debug timing 2022-11-11 23:26:56 -06:00
7c519aa5d8 Cleanup: Make loop normal calculation function static 2022-11-11 23:26:49 -06:00
78bfb74743 Cleanup: Decrease variable scope in mesh loop normal calculation 2022-11-11 21:56:17 -06:00
d9e5a3e6ad Cleanup: Use spans for loop normal calculation input data 2022-11-11 21:49:43 -06:00
d0522d4ef1 Cleanup: Remove unnecessary struct keywords 2022-11-11 19:05:22 -06:00
03ccf37162 Cleanup: Rename curves sculpt selection variable
It's a bit simpler to skip the "indices" in the name, which can be
assumed from the type.
2022-11-11 15:32:51 -06:00
5465aa63d5 Merge branch 'blender-v3.4-release' 2022-11-11 13:49:55 -07:00
9d827a1834 Fix OSL object matrix with Cycles on the GPU
The OSL GPU services implementation of "osl_get_matrix" and
"osl_get_inverse_matrix" was missing support for the "common",
"shader" and "object" matrices and thus any matrix operations in OSL
shaders using these would not work. This patch adds the proper
implementation copied from the OSL CPU services.

Maniphest Tasks: T101222
2022-11-11 20:21:08 +01:00
1fdaf748bf Add poll messages for marker operators
A number of operators were missing poll messages when disabled.

These are the following new error messages:

1. "No markers are selected"
2. "Markers are locked"

Reviewed By: sybren

Differential Revision: https://developer.blender.org/D16403
2022-11-11 11:13:24 -08:00
5671e7a92c Cleanup: Fixing anti-patterns in fcurve.c
This is a clean-up pass that eliminates a few problematic patterns:

* Eliminating redundant parentheses around simple expressions.
* Combining declaration and assignment of variables where appropriate.
* Moving variable declarations closer to their first use.
* Many variables and arguments have been marked as `const`.
* Using `LISTBASE_FOREACH_*` variants where applicable instead of
  manually managing loop control flow.

There are no functional changes.

Reviewed By: sybren

Differential Revision: https://developer.blender.org/D16459
2022-11-11 11:07:30 -08:00
abbbf9f002 Merge branch 'blender-v3.4-release' 2022-11-11 12:41:04 -06:00
Michael Jones
2c596319a4 Cycles: Cache only up to 5 kernels of each type on Metal
This patch adapts D14754 for the Metal backend. Kernels of the same type are already organised into subdirectories which simplifies type matching.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16469
2022-11-11 18:10:29 +00:00
6f6a0185f2 Merge branch 'blender-v3.4-release' 2022-11-11 19:05:44 +01:00
097a13f5be Fix broken Cycles rendering with recent OSL versions
Commit c8dd33f5a37b6a6db0b6950d24f9a7cff5ceb799 in OSL
changed behavior of shader parameters that reference each other
and are also overwritten with an instance value.
This is causing the "NormalIn" parameter of a few OSL nodes in
Cycles to be set to zero somehow, which should instead have
received the value from a "node_geometry" node Cycles generates
and connects automatically. I am not entirely sure why that is
happening, but these parameters are superfluous anyway, since
OSL already provides the necessary data in the global variable "N".
So this patch simply removes those parameters (which mimics
SVM, where these parameters do not exist either), which also
fixes the rendering artifacts that occurred with recent OSL.

Maniphest Tasks: T101222

Differential Revision: https://developer.blender.org/D16470
2022-11-11 17:10:30 +01:00
dc8a1d38b7 Merge branch 'blender-v3.4-release' 2022-11-11 09:09:28 -06:00
a304dfdb69 Cleanup: Improve curves sculpt code section names 2022-11-11 08:50:28 -06:00
111974234c Addons submodule version bump
(Previous attempt was accidentally 4 days outdated)
2022-11-11 15:10:30 +01:00
fe2be36510 Merge branch 'blender-v3.4-release' 2022-11-11 13:27:27 +01:00
5f35e7f12a Addons submodule version bump 2022-11-11 12:48:14 +01:00
a3877d8fe4 Merge branch 'blender-v3.4-release' 2022-11-11 11:49:54 +01:00
824d5984aa Merge branch 'blender-v3.4-release' 2022-11-11 10:10:15 +01:00
6f1b5e1081 Merge branch 'blender-v3.4-release' 2022-11-11 08:48:43 +01:00
c967aab4ef Cleanup: Remove unused navigation widget struct members
The `region_size[2]` was set to -1 but was never accessed.
2022-11-10 22:38:38 -05:00
ca1642cd0c Cleanup: Use string argument for attribute API function
Instead of CustomDataLayer, which exposes the internal implementation
more than necessary, and requires that the layer is always available,
which isn't always true.
2022-11-10 15:29:21 -06:00
34f4646786 Cleanup: Clarify and deduplicate attribute convert implementation
The ED level function is used for more code paths now, and it has been
cleaned up. Handling of the active attribute is slightly improved too.
2022-11-10 15:29:21 -06:00
Ramil Roosileht
9800312590 Mesh: Convert color attribute operator
Implements an operator to convert color attributes in
available domains and types, as described in T97106.

Differential Revision: https://developer.blender.org/D15596
2022-11-10 15:29:21 -06:00
f613c504c4 Merge branch 'blender-v3.4-release' 2022-11-10 11:52:04 -08:00
3c089c0a88 Sculpt: Fix T102209: Multiresolution levels greater than 6 crash
pbvh->leaf_limit needs to be at least 4 to split nodes
along original face boundaries properly.
2022-11-10 11:30:04 -08:00
Edward
59618c7646 Sculpt: Fix T101914: Wpaint gradient tool doesn't work with vertex mask
Reviewed by: Julian Kaspar & Joseph Eagar
Differential Revision: https://developer.blender.org/D16293
Ref D16293
2022-11-10 11:03:59 -08:00
366796bbbe Merge branch 'blender-v3.4-release' 2022-11-10 19:59:47 +01:00
969aa7bbfc Sculpt: Change symmetrize merge threshold and expose in workspace panel
The sculpt symmetrize operator's merge threshold now defaults
to 0.0005 instead of 0.001, which tends to be a bit too big for
metric scale.

Also changed its step and precision a bit to be more usable.
2022-11-10 10:55:09 -08:00
Julien Kaspar
659de90a32 Sculpt: Rename Show/Hide operators for consistency
This is a minor naming update to make the box hide and show operators in sculpt mode follow current naming conventions.

Reviewed by: Joseph Eagar
Differential Revision: https://developer.blender.org/D16413
Ref D16413
2022-11-10 10:47:08 -08:00
cc2b5959bb Sculpt: Fix inconsistent naming for cavity_from_mask operator
With db40b62252 there have been various UI adjustments and improved renaming.
The Mask From Cavity menu operator didn't follow this new naming yet.

Reviewed By: Joseph Eagar
Differential Revision: https://developer.blender.org/D16409
Ref D16409
2022-11-10 10:40:54 -08:00
6a8ce5ec1c Fix abort when rendering with OSL and OptiX in Cycles
LLVM could kill the process during OSL PTX code generation, due
to generated symbols containing invalid characters in their names:
Those names are generated by Cycles and were not properly filtered:

- If the locale was set to something other than the minimal locale
  (when Blender was built with WITH_INTERNATIONAL), pointers
  may be printed with grouping characters, like commas or dots,
  added to them.
- Material names from Blender may contain the full range of UTF8
  characters.

This fixes those cases by forcing the locale used in the symbol name
generation to the minimal locale and using the material name hash
instead of the actual material name string.
2022-11-10 19:31:59 +01:00
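The two fixes sketched together (a hypothetical helper, not the actual Cycles code): force the minimal "C" locale on the stream used to print the pointer, and embed a hash of the material name instead of the name itself.

```
#include <cstdint>
#include <functional>
#include <locale>
#include <sstream>
#include <string>

/* Build a PTX-safe symbol suffix: only [A-Za-z0-9_] may appear. */
std::string safe_symbol_suffix(const void *ptr, const std::string &material_name)
{
  std::ostringstream ss;
  /* The classic locale never inserts digit grouping separators, so the
   * pointer prints as plain hex digits regardless of the user's locale. */
  ss.imbue(std::locale::classic());
  ss << std::hex << reinterpret_cast<std::uintptr_t>(ptr);
  /* Arbitrary UTF-8 material names are reduced to a numeric hash. */
  ss << '_' << std::hash<std::string>{}(material_name);
  return ss.str();
}
```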
2688d7200a Sculpt: Fix T102379: Crash in dyntopo 2022-11-10 10:28:10 -08:00
a5b2a3041f Fix const-correctness for a number of F-Curve functions
Reviewed By: sybren

Differential Revision: https://developer.blender.org/D16445
2022-11-10 10:22:03 -08:00
df9ab1c922 Merge branch 'blender-v3.4-release' 2022-11-10 18:07:34 +01:00
acaa736037 Fix: GPU: Set the last enum in ENUM_OPERATORS 2022-11-10 17:43:53 +01:00
04eab3fd39 Merge branch 'blender-v3.4-release' 2022-11-10 16:18:16 +01:00
e7a3454f5f Cleanup: Fix strict compiler warning 2022-11-10 16:11:02 +01:00
b85fe57887 Merge branch 'blender-v3.4-release' 2022-11-10 15:54:16 +01:00
598bb9065c Cleanup: Move sculpt.c to C++ 2022-11-10 07:40:41 -06:00
55e86f94a0 EEVEE Next: Fix wrong DoF when a non-camera object is the active camera
Related to T101533.

Reviewed By: fclem

Differential Revision: https://developer.blender.org/D16412
2022-11-10 13:01:06 +01:00
bc672e76eb Merge branch 'blender-v3.4-release' 2022-11-10 12:41:57 +01:00
a4668ecf17 Merge branch 'blender-v3.4-release' 2022-11-10 12:39:01 +01:00
Jason Schleifer
0b4bd3ddc0 NLA: Update context menu to include meta strip operators
The meta strip operator is available in the Add menu, but not in the
context menu.

This patch adds these two operators to the context menu:
* nla.meta_add
* nla.meta_remove

Reviewed By: sybren, RiggingDojo

Differential Revision: https://developer.blender.org/D16353
2022-11-10 12:07:29 +01:00
8ef092d2d8 Fix T102151: Output nodes don't work inside node groups
Using output nodes inside node groups in compositor node trees doesn't
work for the realtime compositor.

Currently, the realtime compositor only considers top level output
nodes. That means if a user edits a node group and adds an output node
in the group, the output node outside of the node group will still be
used, which breaks the temporary viewers workflow where users debug
results inside a node group.

This patch fixes that by first considering the output nodes in the
active context, then consider the root context as a fallback. This is
mostly consistent with the CPU compositor, but the realtime compositor
allow viewing node group output nodes even if no output nodes exist at
the top level context.

Differential Revision: https://developer.blender.org/D16446

Reviewed By: Clement Foucault
2022-11-10 13:02:15 +02:00
ec1ab6310a Merge branch 'blender-v3.4-release' 2022-11-10 11:05:32 +01:00
baabac5909 Merge branch 'blender-v3.4-release' 2022-11-10 11:34:43 +11:00
7d606ad3b8 Merge branch 'blender-v3.4-release' 2022-11-10 11:22:08 +11:00
1bf3069912 Merge branch 'blender-v3.4-release' 2022-11-10 11:22:02 +11:00
9b646cfae5 Merge branch 'blender-v3.4-release' 2022-11-10 11:21:18 +11:00
2630fdb787 Cleanup: format 2022-11-10 11:17:16 +11:00
d49dec896a Attempt to fix build error on Windows
Was failing since 1efc94bb2f, probably because some include uses
`std::min()`/`std::max()` which messes with the windows min/max defines.
2022-11-09 23:09:56 +01:00
32757b2429 Fix uninitialized variable use in asset metadata test
Wasn't an issue until 1efc94bb2f added a destructor, which would
attempt to destruct variables at uninitialized memory.
2022-11-09 23:04:24 +01:00
95077549c1 Fix failure in recently added asset library tests
Mistake in 1efc94bb2f.
2022-11-09 22:51:32 +01:00
1cdc6381cf Merge branch 'blender-v3.4-release' 2022-11-09 14:40:34 -07:00
c932fd79ac Merge branch 'blender-v3.4-release' 2022-11-09 22:38:25 +01:00
b8fc7ed994 Fix incorrect forward declarations, causing warnings on Windows 2022-11-09 22:37:49 +01:00
4cd9e9991c Merge branch 'blender-v3.4-release' 2022-11-09 22:03:56 +01:00
7a2827ee99 Merge branch 'blender-v3.4-release' 2022-11-09 13:55:08 -06:00
501036faae Merge branch 'blender-v3.4-release' 2022-11-09 20:42:19 +01:00
5f169fdfdc Merge branch 'blender-v3.4-release' 2022-11-09 19:45:19 +01:00
ad227e73f3 Cleanup: Link to documentation page for asset representation type 2022-11-09 19:37:05 +01:00
5291e4c358 Cleanup: Remove unused class variable, added in previous commit 2022-11-09 19:35:47 +01:00
1efc94bb2f Asset System: New core type to represent assets (AssetRepresentation)
Introduces a new `AssetRepresentation` type, as a runtime only container
to hold asset information. It is supposed to become _the_ main way to
represent and refer to assets in the asset system, see T87235. It can
store things like the asset name, asset traits, preview and other asset
metadata.

Technical documentation:
https://wiki.blender.org/wiki/Source/Architecture/Asset_System/Back_End#Asset_Representation.

By introducing a proper asset representation type, we do an important
step away from the previous, non-optimal representation of assets as
files in the file browser backend, and towards the asset system as
backend. It should replace the temporary & hacky `AssetHandle` design in
the near future. Note that the loading of asset data still happens
through the file browser backend, check the linked to Wiki page for more
information on that.

As a side-effect, asset metadata isn't stored in file browser file
entries when browsing with link/append anymore. Don't think this was
ever used, but scripts may have accessed this. Can be brought back if
there's a need for it.
2022-11-09 19:30:47 +01:00
7395062480 Cleanup: Miscellaneous cleanups to trim curves node
- Fix braces initialization warning
- Fix missing static specifier
- Remove two unused functions
2022-11-09 12:13:59 -06:00
eea3913348 Merge branch 'blender-v3.4-release' 2022-11-09 10:56:28 -06:00
a43053a00a Improved Korean Font Sample
Small change to the text sample used for Korean font previews

See D16428 for details.

Differential Revision: https://developer.blender.org/D16428

Reviewed by Brecht Van Lommel
2022-11-09 08:51:00 -08:00
ce68367969 Fix T102140: Replacement of Noto Sans CJK Font
Replace our Noto Sans CJK with a version that has Simplified Chinese
set as the default script.

See D16426 for details and examples

Differential Revision: https://developer.blender.org/D16426

Reviewed by Brecht Van Lommel
2022-11-09 08:22:38 -08:00
Germano Cavalcante
edc00429e8 Fix T102257: Crash when making an Object as Effector set to Guide and trying to scrub the timeline
rB67e23b4b2967 revealed the bug. But the bug already existed before,
it just wasn't triggered.

Apparently the problem happens because the python code generated in
`initGuiding()` cannot be executed twice.

The second time the `initGuiding()` code is executed, the local python
variables are removed to make way for the others, but the reference to
one of the grids in a `Solver` object (name='solver_guiding2') is still
being used somewhere. So an error is raised and a crash is forced.

The solution is to prevent the python code in `initGuiding()` from being
executed twice.

When `FLUID_DOMAIN_ACTIVE_GUIDE` is in `fds->active_fields` this
indicates that the pointer in `mPhiGuideIn` has been set and the guiding
is already computed (does not need to be computed again).

Maniphest Tasks: T102257

Differential Revision: https://developer.blender.org/D16416
2022-11-09 12:11:20 -03:00
e6b38deb9d Cycles: Add basic support for using OSL with OptiX
This patch generalizes the OSL support in Cycles to include GPU
device types and adds an implementation for that in the OptiX
device. There are some caveats still, including simplified texturing
due to lack of OIIO on the GPU and a few missing OSL intrinsics.

Note that this is incomplete and missing an update to the OSL
library before being enabled! The implementation is already
committed now to simplify further development.

Maniphest Tasks: T101222

Differential Revision: https://developer.blender.org/D15902
2022-11-09 15:30:21 +01:00
efe073f57c Merge branch 'blender-v3.4-release' 2022-11-09 14:43:05 +01:00
477faffd78 Cleanup: unused argument warning 2022-11-09 21:08:31 +11:00
683b945917 Cleanup: format 2022-11-09 21:07:09 +11:00
024bec85f6 Depsgraph: simplify scheduling in depsgraph evaluator
No functional or performance changes are expected.

Differential Revision: https://developer.blender.org/D16423
2022-11-09 09:58:05 +01:00
59e69fc2bd Fix strict compiler warnings
Functions which are local to a translation unit should either be
marked as static, or be in an anonymous namespace.
2022-11-09 09:47:24 +01:00
aba0d01b78 Fix T102278: Compositor transforms apply locally
When using two transformed compositor results, the transformation of one
of them is apparently in the local space of the other, while it should
be applied in the global space instead.

In order to realize a compositor result on a certain operation domain,
the domain of the result is projected on the operation domain and later
realized. This is done by multiplying by the inverse of the operation
domain. However, the order of multiplication was inverted, so the
transformation was applied in the local space of the operation domain.

This patch fixes that by inverting the order of multiplication in domain
realization.
2022-11-09 10:35:41 +02:00
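In matrix terms, a sketch of the fix (assuming the usual column-vector convention, with $D$ the transform of the input result's domain and $O$ the transform of the operation domain):

$$P = O^{-1} D \quad \text{(inverse applied in global space, the fix)} \qquad \text{vs.} \qquad P = D \, O^{-1} \quad \text{(inverse applied in local space, the bug)}$$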
3836b6ff8c Cancel Equalize Handles & Snap Keys when no control points are selected
The Equalize Handles and Snap Keys operators would allow the user to
invoke them successfully even when they would have no effect due to
there not being any selected control points.

This patch makes it so that an error is displayed when these operators
are invoked with no control points selected.

The reason this is in the `invoke` function is because it would be too
expensive to run this check in the `poll` function since it requires a
linear search through all the keys of all the visible F-Curves.

Reviewed By: sybren

Differential Revision: https://developer.blender.org/D16390
2022-11-08 21:04:47 -08:00
800b025518 Merge branch 'blender-v3.4-release' 2022-11-09 14:34:08 +11:00
baee7ce4a5 Fix T102306: buildtime shader compilation option fails under Wayland
libdecor (for window decorations) was crashing on exit with the shader
builder; avoid the crash by calling the "background" system creation
function which doesn't initialize window management under Wayland.
2022-11-09 14:01:14 +11:00
76c308e45d Merge branch 'blender-v3.4-release' 2022-11-09 13:57:27 +11:00
801db0d429 Revert "Fix T102306: buildtime shader compilation option fails under Wayland"
This reverts commit 6fa05e2c29.
2022-11-09 13:54:19 +11:00
756538b4a1 BLI_math: remove normalize from mat3_normalized_to_quat_fast
The quaternions calculated are unit length unless the input matrix is
degenerate. Detect degenerate cases and remove the normalize_qt call.
2022-11-09 13:25:11 +11:00
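Why the normalize was redundant, sketched with one common index convention: for an orthonormal rotation matrix $R$ with $\operatorname{tr}(R) > -1$, the quaternion

$$q_w = \tfrac{1}{2}\sqrt{1 + \operatorname{tr}(R)}, \qquad q_x = \frac{R_{32} - R_{23}}{4q_w}, \qquad q_y = \frac{R_{13} - R_{31}}{4q_w}, \qquad q_z = \frac{R_{21} - R_{12}}{4q_w}$$

is already unit length, because these expressions exactly invert the quaternion-to-matrix formulas. Only as $\operatorname{tr}(R) \to -1$ (rotation angle near $\pi$) does $q_w \to 0$ and the construction degenerate, which is the case that has to be detected instead.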
c6612da1e6 Merge branch 'blender-v3.4-release' 2022-11-09 13:06:05 +11:00
2d9d08677e Cleanup: fix types from f04f9cc3d0 2022-11-09 14:54:37 +13:00
335082dcd3 Merge branch 'blender-v3.4-release' 2022-11-08 15:31:33 -08:00
f0b5f94cb5 Cleanup: format 2022-11-09 11:59:51 +13:00
f04f9cc3d0 Cleanup: add unique_index_table to UvElementMap
In anticipation of UV Copy+Paste, we need fast access to indices
of unique UvElements. Can also be used to improve performance and
simplify code for UV Sculpt tools and UV Stitch.

No user visible changes expected.

Maniphest Tasks: T77911

See also: D16278
2022-11-09 11:47:16 +13:00
75265f27da Merge branch 'blender-v3.4-release' 2022-11-08 21:21:02 +01:00
Lukas Stockner
1eca437197 Color Management: Parallelize ImBuf conversion to float
Motivated by long loading times in T101969, reduces render preparation time from 14sec to 6sec.

Another possible improvement would be to use C++ and template based on OCIO vs. sRGB,
but moving the file to C++ seems nontrivial (and opens up the question whether ocio_capi
makes any sense then or we should just use OCIO directly) so I left it at a direct 1:1
parallelization of the existing code for now.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16317
2022-11-08 20:59:41 +01:00
c6aacd718a Cleanup: Improve precision during UV packing.
Simplify API and improve accuracy of uv packing placement
by using pre-translation and double precision internally.

Will protect against future precision problems with UDIM.

No user visible changes expected.

Maniphest Tasks: T68889
Differential Revision: https://developer.blender.org/D16362
2022-11-09 08:50:53 +13:00
96d8e5e66b Merge branch 'blender-v3.4-release' 2022-11-08 13:39:45 -06:00
4b57bc4e5d Cleanup: format 2022-11-09 08:30:18 +13:00
b539d425f0 Merge branch 'blender-v3.4-release' 2022-11-08 19:47:55 +01:00
35eb37c60d Merge branch 'blender-v3.4-release' 2022-11-08 19:36:06 +01:00
ad5814a2a7 Merge branch 'blender-v3.4-release' 2022-11-08 12:33:31 -06:00
8eab23bc66 Geometry Nodes: Fix alignment of exposed properties in the modifier
The spacing and alignment of the properties in the geometry nodes
modifier could vary depending on the type of the socket or
whether the input can accept attributes.
Wrapping each property in its own `row` layout allows us to make
the spacing and alignment between them consistent.

Reviewed By: Hans Goudey

Differential Revision: http://developer.blender.org/D16417
2022-11-08 19:14:29 +01:00
95d36a31b6 Merge branch 'blender-v3.4-release' 2022-11-08 18:23:49 +01:00
4c182aef7c GPencil: Make Sculpt Auto-masking Global and not by Brush
The auto-masking was set per Brush, and this was very
inconvenient because it was necessary to set the options for
each Brush; now the options are global and can be set at once.

Also, the auto-masking now works with `and` logic
and not with `or` as before. That means that a stroke
must meet all the conditions of the masking.

Added new Layer and Material options to mask the
strokes using the same Layer/Material as the selected stroke.
Before, only the Active Layer and Active Material could be masked.

The masking options have been moved to the top-bar using
the same design as Mesh Sculpt masking.

As a result of the changes above, the following props changed:

Removed:

`brush.gpencil_settings.use_automasking_strokes`
`brush.gpencil_settings.use_automasking_layer`
`brush.gpencil_settings.use_automasking_material`

Added:

`tool_settings.gpencil_sculpt.use_automasking_stroke`
`tool_settings.gpencil_sculpt.use_automasking_layer_stroke`
`tool_settings.gpencil_sculpt.use_automasking_material_stroke`
`tool_settings.gpencil_sculpt.use_automasking_layer_active`
`tool_settings.gpencil_sculpt.use_automasking_material_active`


Reviewed by: Julien Kaspar, Matias Mendiola, Daniel Martinez Lara
2022-11-08 16:55:59 +01:00
bbb1d3e5e7 Merge branch 'blender-v3.4-release' 2022-11-08 16:28:11 +01:00
6a14ca18d0 Merge branch 'blender-v3.4-release' 2022-11-08 16:26:09 +01:00
c73ae711bf BLI: new basic CacheMutex
This patch introduces a new `CacheMutex` which makes it easy to implement
lazily computed caches in e.g. `Curves`. For more details see `BLI_cache_mutex.hh`.

Differential Revision: https://developer.blender.org/D16419
2022-11-08 15:50:49 +01:00
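The pattern `CacheMutex` encapsulates, roughly (a minimal sketch under assumptions; see `BLI_cache_mutex.hh` for the real interface): compute once under a lock, and let later readers skip the lock entirely via an atomic flag.

```
#include <atomic>
#include <functional>
#include <mutex>

class CacheMutex {
  std::mutex mutex_;
  std::atomic<bool> cache_valid_{false};

 public:
  /* Runs `compute` exactly once until the cache is tagged dirty again.
   * Readers that find the flag set pay only an atomic load. */
  void ensure(const std::function<void()> &compute)
  {
    if (cache_valid_.load(std::memory_order_acquire)) {
      return;
    }
    std::lock_guard lock{mutex_};
    /* Double-check: another thread may have computed while we waited. */
    if (cache_valid_.load(std::memory_order_relaxed)) {
      return;
    }
    compute();
    cache_valid_.store(true, std::memory_order_release);
  }

  void tag_dirty()
  {
    cache_valid_.store(false);
  }
};
```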
Edward
1d71f82033 Texture Paint: sync adding a new texture slot to the Image Editor
When changing the texture paint slot index or activating a Texture Node, the texture displayed in the Image Editor changes accordingly.
This patch syncs the Image Editor when a new texture paint slot is added, which previously was not the case.

Also deduplicates some code.
2022-11-08 14:28:44 +01:00
77c4d3154b Merge branch 'blender-v3.4-release' 2022-11-08 13:47:43 +01:00
Brecht Van Lommel
e1b3d91127 Refactor: replace Cycles sse/avx types by vectorized float4/int4/float8/int8
The distinction existed for legacy reasons, to allow easy porting of Embree
intersection code without affecting the main vector types. However we are now
using SIMD for these types as well, so no good reason to keep the distinction.

Also more consistently pass these vector types by value in inline functions.
Previously it was partially changed for functions used by Metal to avoid having
to add address space qualifiers; it is simple to do it everywhere.

Also removes function declarations for vector math headers, serves no real
purpose.

Differential Revision: https://developer.blender.org/D16146
2022-11-08 12:28:40 +01:00
32ec0521c5 Merge branch 'blender-v3.4-release' 2022-11-08 12:18:12 +01:00
c047042adf Merge branch 'blender-v3.4-release' 2022-11-08 12:03:07 +01:00
f12236d1e3 Merge branch 'blender-v3.4-release' 2022-11-08 11:34:38 +01:00
871375f222 PyAPI: add invalid objects check for RNA struct keys()/values()/items() 2022-11-08 17:17:30 +11:00
8b151982fe Merge branch 'blender-v3.4-release' 2022-11-08 17:00:46 +11:00
fddcdcc20c Merge branch 'blender-v3.4-release' 2022-11-08 12:18:52 +11:00
9d1380e0a9 Merge branch 'blender-v3.4-release' 2022-11-08 12:18:48 +11:00
452865c80d Merge branch 'blender-v3.4-release' 2022-11-08 12:18:44 +11:00
2257a9bfb1 Cleanup: correct type of RNA struct methods
Some BPy_StructRNA methods used BPy_PropertyRNA in their function
signatures; while this didn't cause any bugs, it could lead to issues
in the future.
2022-11-08 11:26:33 +11:00
4eb9322eda Cleanup: PyMethodDef formatting
Missed these changes in [0].

Also replace designated initializers in some C code, as it's not used
often and would need to be removed when converting to C++.

[0] e555ede626
2022-11-08 11:13:58 +11:00
3e71220efc Fix support for building with ffmpeg < 5.0
Seems like the new audio channel API was not as backwards compatible as we thought.
Therefore we need to reintroduce the usage of the old API so that older FFmpeg versions are still able to compile Blender.

This change is only intended to stick around for two releases or so. After that we hope that most Linux distros ship
ffmpeg >=5.0 so we can switch to it.

Reviewed By: Sergey

Differential Revision: http://developer.blender.org/D16408
2022-11-07 17:46:13 +01:00
95631c94c4 Merge branch 'blender-v3.4-release' 2022-11-07 15:30:49 +01:00
8473b5a592 Fix strict compiler warnings 2022-11-07 14:20:50 +01:00
403fc9a3f1 Merge branch 'blender-v3.4-release' 2022-11-07 08:46:15 -03:00
e555ede626 Cleanup: unify struct declaration style for Python types, update names
Use struct identifiers in comments before the value.
This has some advantages:

- The struct identifiers didn't mix well with other code-comments,
  where other comments were wrapped onto the next line.
- Minor changes could re-align all other comments in the struct.
- PyVarObject_HEAD_INIT & tp_name are no longer placed on the same line.

Remove overly verbose comments copied from PyTypeObject (Python v2.x),
these aren't especially helpful and get outdated.

Also corrected some outdated names:

- PyTypeObject.tp_print -> tp_vectorcall_offset
- PyTypeObject.tp_reserved -> tp_as_async
2022-11-07 22:38:32 +11:00
719332d120 Cleanup: remove unused variable 2022-11-07 22:38:32 +11:00
688b408bbb Fix 'ED_transform_snap_object_project_ray_all' not returning 'hit_list'
Missed in rBff4f14b21a42.
2022-11-07 08:38:09 -03:00
888fb0b395 Merge branch 'blender-v3.4-release' 2022-11-07 12:32:36 +01:00
ff4f14b21a Fix T102053: snap fails with instances of geometry nodes
As instances are often generated geometries, we cannot rely on the data
provided by `DupliObject::ob`.

Use `DupliObject::ob_data` when possible.

This required a major refactor in the code as the output variables are
now gathered in context and easier to access.
2022-11-07 08:27:54 -03:00
b2db324f60 Fix potentially uninitialized memory usage
`nearest_world_tree_co` allows null parameter, so the `index` variable
isn't really needed and doesn't even need to be initialized.
2022-11-07 08:27:54 -03:00
cad897de16 Transform: remove SnapData cache for meshes
All cache needed is already stored in `Mesh.runtime`.
2022-11-07 08:27:54 -03:00
129197f20d Merge branch 'blender-v3.4-release' 2022-11-07 21:33:24 +11:00
74140d41b1 Cycles: Apple GPU threadgroup tuning
This patch tunes maximum threads-per-threadgroup and threads-per-block for faster renders on Apple GPUs. Appropriate tuning is selected based on the GPU architecture (M1 or M2). We see a benchmark uplift of around 5-10% on M1 family chips. Similar uplift is expected on M2 with upcoming OS changes. (Ref T101931)

Reviewed By: brecht

Maniphest Tasks: T101931

Differential Revision: https://developer.blender.org/D16299
2022-11-07 10:00:46 +00:00
671c3e1fa4 Fix File Browser Move Bookmark malfunction if no item is selected
The operator was acting on non-selected items (wasn't checking SpaceFile
bookmarknr for being -1), which could even end up removing items.

Now sanitize this by introducing a proper poll (which returns false if
nothing is selected).

Fixes T102014.

Maniphest Tasks: T102014

Differential Revision: https://developer.blender.org/D16385
2022-11-07 10:28:37 +01:00
37ca6e4fd1 Merge branch 'blender-v3.4-release' 2022-11-06 15:45:45 +01:00
a8865f3402 Fix: Missing initialization curves bounds in set origin operator
It could be changed, but currently curves.bounds_min_max
relies on the initial value of its arguments. Split from D16331.
2022-11-06 10:23:14 +01:00
3852094b35 Cleanup: Nodes: Use const arguments, avoid recursive iteration
Use the node topology cache and avoid modifying the node tree
in a non-threadsafe way to improve the predictability of using
the helper function. Replaces the implementation from
e0d4047136.
2022-11-06 10:23:14 +01:00
Pablo Vazquez
28e952dacd UI: Sort items in Weight Locks menu
Reorder the items in the `Locks` menu:
* Split into three groups: Lock, Unlock, Invert.
* Use icon only in the first item of each group, following the HIG.

Reviewed By: Severin

Differential Revision: https://developer.blender.org/D16383
2022-11-06 02:28:48 +01:00
6ef6778215 Fix: Broken debug build after recent cleanup commit
5060f26f50
2022-11-05 21:37:37 +01:00
8b29d6cd75 Cleanup: Remove unused node function 2022-11-05 21:37:37 +01:00
c6725dc507 Cleanup: Use Vector in group input/output node update functions
Also reduce the scope of variables and use ListBase macros
2022-11-05 21:37:37 +01:00
db3bf36770 Sculpt: Fix T102253: Missing call to SCULPT_automasking_node_update 2022-11-05 11:57:39 -07:00
5060f26f50 Cleanup: Move function to legacy mesh conversion file 2022-11-05 18:14:26 +01:00
455d195d55 OBJ Export: Remove edge recalculation
The removed function call removes all attributes from mesh edges
and rebuilds the mesh edge topology. This isn't necessary because
meshes always have edges in the first place.

Exporting a 4 million face grid, this saved 1.5 seconds out of 4
seconds total for the whole export.

Tests files have to be updated, since the edge calculation could
potentially change the order of elements. This is also a fix, since
previously the exporter would delete all attributes on the evaluated
mesh edges.

Differential Revision: https://developer.blender.org/D16391
2022-11-05 16:28:13 +01:00
4ec5a8cbc2 Cleanup: Remove unnecessary node type registration functions
These functions provided little benefit compared to simply setting
the function pointers directly.
2022-11-05 16:10:27 +01:00
e673f3ba24 Cleanup: Remove redundant assignment of loose edge flag
This is assigned by `BKE_mesh_calc_edges_loose` a few lines below.
2022-11-05 13:26:52 +01:00
38086dcfdc Merge branch 'blender-v3.4-release' 2022-11-05 20:03:46 +11:00
412e7d3771 Merge branch 'blender-v3.4-release' 2022-11-05 17:10:23 +11:00
b3e1540c50 Cleanup: use bools and typed enums for WM_job type & flag
Also use typed enum for the event handler flag.
2022-11-05 14:14:39 +11:00
ae3073323e Cleanup: use bool instead of short for job stop & do_update arguments
Since these values are only ever 0/1, use bool type.
2022-11-05 13:47:01 +11:00
4a313b8252 Cleanup: Move legacy mesh conversions to proper file 2022-11-04 23:28:10 +01:00
23dafa4ad6 Cleanup: OBJ: Simplify access to loose edges
Implementing this with a separate function just added extra code;
there wasn't much benefit to it.
2022-11-04 21:58:15 +01:00
59af0fba9d Merge branch 'blender-v3.4-release' 2022-11-04 21:06:39 +01:00
4b200b491c Cleanup: Use mesh API functions 2022-11-04 20:37:17 +01:00
10131a6f62 Cleanup: Mesh: Remove redundant edge render flag
Currently there are both "EDGERENDER" and "EDGEDRAW" flags, which are
almost always used together. Both are runtime data and not exposed to
RNA, used to skip drawing some edges after the subdivision surface
modifier. The render flag is a relic of the Blender internal renderer.
This commit removes the render flag and replaces its uses with the
draw flag.
2022-11-04 20:19:52 +01:00
85ce488298 Realtime Compositor: Implement static cache manager
This patch introduces the concept of a Cached Resource that can be
cached across compositor evaluations as well as used by multiple
operations in the same evaluation. Additionally, this patch implements a
new structure for the realtime compositor, the Static Cache Manager,
that manages all the cached resources and deletes them when they are no
longer needed.

This improves responsiveness while adjusting compositor node trees and
also conserves memory usage.

Differential Revision: https://developer.blender.org/D16357

Reviewed By: Clement Foucault
2022-11-04 16:14:22 +02:00
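A sketch of the lifetime scheme described (hypothetical types, far simpler than the real one): each cached resource tracks whether it was needed in the current evaluation, and the manager evicts everything that went unused since the previous one.

```
#include <iterator>
#include <memory>
#include <string>
#include <unordered_map>

struct CachedResource {
  bool needed = false; /* Requested during the current evaluation? */
  virtual ~CachedResource() = default;
};

class StaticCacheManager {
  std::unordered_map<std::string, std::unique_ptr<CachedResource>> resources_;

 public:
  /* Called before every compositor evaluation. */
  void reset()
  {
    /* Evict resources no operation requested last evaluation... */
    for (auto it = resources_.begin(); it != resources_.end();) {
      it = it->second->needed ? std::next(it) : resources_.erase(it);
    }
    /* ...and clear the flags for the evaluation about to start. */
    for (auto &item : resources_) {
      item.second->needed = false;
    }
  }

  /* Operations ask for resources by key; hits are shared and marked needed. */
  CachedResource &get(const std::string &key)
  {
    std::unique_ptr<CachedResource> &resource = resources_[key];
    if (!resource) {
      resource = std::make_unique<CachedResource>();
    }
    resource->needed = true;
    return *resource;
  }
};
```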
943d574185 Merge branch 'blender-v3.4-release' 2022-11-04 21:59:38 +11:00
4b2458b457 Merge branch 'blender-v3.4-release' 2022-11-04 19:19:23 +11:00
d4f0ccb6b4 Cleanup: pass const view_area in sequencer functions 2022-11-04 18:50:31 +11:00
7a7055c186 Cleanup: use bool for render types ok/result_ok 2022-11-04 18:50:31 +11:00
d6b2f4ad8e Merge branch 'blender-v3.4-release' 2022-11-04 00:03:47 -07:00
624c11d69f BLI_path: remove use of BLI_path_normalize in BLI_path_parent_dir
Normalize is no longer necessary as BLI_path_name_at_index skips
redundant path components such as "//" and "/./".

This has the advantage that the path length isn't limited to FILE_MAX.
2022-11-04 17:22:11 +11:00
3d34b8c901 Merge branch 'blender-v3.4-release' 2022-11-04 16:51:18 +11:00
ecda9483f1 Merge branch 'blender-v3.4-release' 2022-11-03 23:41:11 -04:00
e36db7018d Merge branch 'blender-v3.4-release' 2022-11-04 14:00:14 +11:00
9dfc134c9d DRW: Fix incorrect logic in state redundancy check
Error introduced by rB3c39a3affee7.
2022-11-03 19:41:36 +01:00
c2a99cb0b6 Merge branch 'blender-v3.4-release' 2022-11-03 19:38:36 +01:00
5401e68a61 Merge branch 'blender-v3.4-release' 2022-11-03 12:20:44 -05:00
3c39a3affe DRW: Add support for clip plane count as part of the draw state.
This moves the implementation from the View to the draw manager itself.

However, this is not its final place and should be moved to the shader
create info at some point in the future.
For now it is not possible because of possible interaction with the
old draw manager codebase.
2022-11-03 17:03:22 +01:00
dcfe4a302c Merge branch 'blender-v3.4-release' 2022-11-03 16:52:39 +01:00
41c692ee2f Fix deprecation warnings in FFmpeg related code
The non-deprecated API dates back to 2017, so it should be safe
to simply migrate to it.

Fixes verbose error prints, making it easier to see actual issues.

Differential Revision: https://developer.blender.org/D16370
2022-11-03 15:18:02 +01:00
74c293863d Cycles: Remove use of sprintf() in MD5 code
The new Xcode declares the `sprintf()` function deprecated and
suggests to use `snprintf()` as a safer alternative.

This change actually moves away from any formatted printing and
uses inlined byte-to-hex-string conversion which is also safe
and is (unmeasurably) faster.

Differential Revision: https://developer.blender.org/D16378
2022-11-03 15:10:37 +01:00
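The byte-to-hex conversion described, as a small self-contained sketch (not the exact Cycles code):

```
#include <cstdint>
#include <string>

/* Convert a 16-byte MD5 digest to a 32-character hex string without any
 * formatted printing: each byte maps to two characters via a lookup table. */
std::string digest_to_hex(const std::uint8_t digest[16])
{
  static const char hex[] = "0123456789abcdef";
  std::string result(32, '0');
  for (int i = 0; i < 16; i++) {
    result[i * 2 + 0] = hex[digest[i] >> 4];
    result[i * 2 + 1] = hex[digest[i] & 0x0F];
  }
  return result;
}
```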
90805c9943 Merge branch 'blender-v3.4-release' 2022-11-03 14:32:27 +01:00
a16dd407b3 Revert "Blender 3.4 - Beta"
This reverts commit 666135c32a.
2022-11-03 10:12:46 +01:00
31c52bc34e Merge branch 'blender-v3.4-release' 2022-11-03 10:12:17 +01:00
ba8754cf11 Blender 3.5 Alpha: Start of new release cycle. 2022-11-03 10:11:13 +01:00
ae3f443220 Merge branch 'master' into temp-ui-cpp 2022-04-03 12:07:43 -05:00
b9d1b07a45 Cleanup: Move interface.c to C++
Similar to 4537eb0c3b
2022-04-03 12:07:38 -05:00
dd1be8db19 Cleanup: Move interface View2D files to C++ 2022-04-02 21:32:25 -05:00
6450f380ac Cleanup: Move more interface files to C++ 2022-04-02 15:07:36 -05:00
1051c17af3 Cleanup: Move interface_query.c to C++ 2022-04-02 14:07:43 -05:00
87628abaa1 Merge branch 'master' into temp-ui-cpp 2022-04-02 13:54:52 -05:00
462177fd62 Merge branch 'master' into temp-ui-cpp 2021-12-01 22:37:05 -05:00
5d221a2a8a Merge branch 'temp-ui-cpp' of git.blender.org:blender into temp-ui-cpp 2021-11-26 08:08:01 -05:00
fb4b7aaa8f Attempt to resolve build error 2021-11-26 08:07:50 -05:00
6183f63250 Merge branch 'master' into temp-ui-cpp 2021-11-21 11:39:51 -05:00
08228956d9 Cleanup: Move outliner_tools.c to C++ 2021-11-20 17:00:41 -05:00
956be86c09 Merge branch 'master' into temp-ui-cpp 2021-11-20 16:31:11 -05:00
e2f9602be2 Remove designated initializer 2021-11-20 15:59:05 -05:00
24ae9c52d8 Cleanup: Move search menu template to C++
This will allow using improved data structures, and
make other refactors to the search button API cleaner.
2021-11-19 23:51:48 -05:00
1287 changed files with 56289 additions and 30420 deletions


@@ -1239,12 +1239,11 @@ if(WITH_OPENGL)
add_definitions(-DWITH_OPENGL)
endif()
# -----------------------------------------------------------------------------
#-----------------------------------------------------------------------------
# Configure Vulkan.
if(WITH_VULKAN_BACKEND)
add_definitions(-DWITH_VULKAN_BACKEND)
list(APPEND BLENDER_GL_LIBRARIES ${VULKAN_LIBRARIES})
endif()
# -----------------------------------------------------------------------------


@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-or-later
## Update and uncomment this in the release branch
set(BLENDER_VERSION 3.4)
# set(BLENDER_VERSION 3.1)
function(download_source dep)
set(TARGET_FILE ${${dep}_FILE})


@@ -0,0 +1,59 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2022 Blender Foundation.
# - Find MoltenVK libraries
# Find the MoltenVK includes and libraries
# This module defines
# MOLTENVK_INCLUDE_DIRS, where to find MoltenVK headers, Set when
# MOLTENVK_INCLUDE_DIR is found.
# MOLTENVK_LIBRARIES, libraries to link against to use MoltenVK.
# MOLTENVK_ROOT_DIR, The base directory to search for MoltenVK.
# This can also be an environment variable.
# MOLTENVK_FOUND, If false, do not try to use MoltenVK.
#
# If MOLTENVK_ROOT_DIR was defined in the environment, use it.
IF(NOT MOLTENVK_ROOT_DIR AND NOT $ENV{MOLTENVK_ROOT_DIR} STREQUAL "")
SET(MOLTENVK_ROOT_DIR $ENV{MOLTENVK_ROOT_DIR})
ENDIF()
SET(_moltenvk_SEARCH_DIRS
${MOLTENVK_ROOT_DIR}
${LIBDIR}/vulkan/MoltenVK
)
FIND_PATH(MOLTENVK_INCLUDE_DIR
NAMES
MoltenVK/vk_mvk_moltenvk.h
HINTS
${_moltenvk_SEARCH_DIRS}
PATH_SUFFIXES
include
)
FIND_LIBRARY(MOLTENVK_LIBRARY
NAMES
MoltenVK
HINTS
${_moltenvk_SEARCH_DIRS}
PATH_SUFFIXES
dylib/macOS
)
# handle the QUIETLY and REQUIRED arguments and set MOLTENVK_FOUND to TRUE if
# all listed variables are TRUE
INCLUDE(FindPackageHandleStandardArgs)
FIND_PACKAGE_HANDLE_STANDARD_ARGS(MoltenVK DEFAULT_MSG MOLTENVK_LIBRARY MOLTENVK_INCLUDE_DIR)
IF(MOLTENVK_FOUND)
SET(MOLTENVK_LIBRARIES ${MOLTENVK_LIBRARY})
SET(MOLTENVK_INCLUDE_DIRS ${MOLTENVK_INCLUDE_DIR})
ENDIF()
MARK_AS_ADVANCED(
MOLTENVK_INCLUDE_DIR
MOLTENVK_LIBRARY
)
UNSET(_moltenvk_SEARCH_DIRS)


@@ -100,6 +100,23 @@ if(WITH_USD)
find_package(USD REQUIRED)
endif()
if(WITH_VULKAN_BACKEND)
find_package(MoltenVK REQUIRED)
if(EXISTS ${LIBDIR}/vulkan)
set(VULKAN_FOUND On)
set(VULKAN_ROOT_DIR ${LIBDIR}/vulkan/macOS)
set(VULKAN_INCLUDE_DIR ${VULKAN_ROOT_DIR}/include)
set(VULKAN_LIBRARY ${VULKAN_ROOT_DIR}/lib/libvulkan.1.dylib)
set(VULKAN_INCLUDE_DIRS ${VULKAN_INCLUDE_DIR} ${MOLTENVK_INCLUDE_DIRS})
set(VULKAN_LIBRARIES ${VULKAN_LIBRARY} ${MOLTENVK_LIBRARIES})
else()
message(WARNING "Vulkan SDK was not found, disabling WITH_VULKAN_BACKEND")
set(WITH_VULKAN_BACKEND OFF)
endif()
endif()
if(WITH_OPENSUBDIV)
find_package(OpenSubdiv)
endif()


@@ -108,6 +108,10 @@ find_package_wrapper(ZLIB REQUIRED)
find_package_wrapper(Zstd REQUIRED)
find_package_wrapper(Epoxy REQUIRED)
if(WITH_VULKAN_BACKEND)
find_package_wrapper(Vulkan REQUIRED)
endif()
function(check_freetype_for_brotli)
include(CheckSymbolExists)
set(CMAKE_REQUIRED_INCLUDES ${FREETYPE_INCLUDE_DIRS})
@@ -322,9 +326,10 @@ if(WITH_CYCLES AND WITH_CYCLES_DEVICE_ONEAPI)
file(GLOB _sycl_runtime_libraries
${SYCL_ROOT_DIR}/lib/libsycl.so
${SYCL_ROOT_DIR}/lib/libsycl.so.*
${SYCL_ROOT_DIR}/lib/libpi_level_zero.so
${SYCL_ROOT_DIR}/lib/libpi_*.so
)
list(FILTER _sycl_runtime_libraries EXCLUDE REGEX ".*\.py")
list(REMOVE_ITEM _sycl_runtime_libraries "${SYCL_ROOT_DIR}/lib/libpi_opencl.so")
list(APPEND PLATFORM_BUNDLED_LIBRARIES ${_sycl_runtime_libraries})
unset(_sycl_runtime_libraries)
endif()


@@ -419,7 +419,7 @@ if(WITH_IMAGE_OPENEXR)
warn_hardcoded_paths(OpenEXR)
set(OPENEXR ${LIBDIR}/openexr)
set(OPENEXR_INCLUDE_DIR ${OPENEXR}/include)
set(OPENEXR_INCLUDE_DIRS ${OPENEXR_INCLUDE_DIR} ${IMATH_INCLUDE_DIRS} ${OPENEXR}/include/OpenEXR)
set(OPENEXR_INCLUDE_DIRS ${OPENEXR_INCLUDE_DIR} ${IMATH_INCLUDE_DIRS} ${OPENEXR_INCLUDE_DIR}/OpenEXR)
set(OPENEXR_LIBPATH ${OPENEXR}/lib)
# Check if the 3.x library name exists
# if not assume this is a 2.x library folder
@@ -568,7 +568,8 @@ if(WITH_OPENIMAGEIO)
if(NOT OpenImageIO_FOUND)
set(OPENIMAGEIO ${LIBDIR}/OpenImageIO)
set(OPENIMAGEIO_LIBPATH ${OPENIMAGEIO}/lib)
set(OPENIMAGEIO_INCLUDE_DIRS ${OPENIMAGEIO}/include)
set(OPENIMAGEIO_INCLUDE_DIR ${OPENIMAGEIO}/include)
set(OPENIMAGEIO_INCLUDE_DIRS ${OPENIMAGEIO_INCLUDE_DIR})
set(OIIO_OPTIMIZED optimized ${OPENIMAGEIO_LIBPATH}/OpenImageIO.lib optimized ${OPENIMAGEIO_LIBPATH}/OpenImageIO_Util.lib)
set(OIIO_DEBUG debug ${OPENIMAGEIO_LIBPATH}/OpenImageIO_d.lib debug ${OPENIMAGEIO_LIBPATH}/OpenImageIO_Util_d.lib)
set(OPENIMAGEIO_LIBRARIES ${OIIO_OPTIMIZED} ${OIIO_DEBUG})
@@ -785,6 +786,14 @@ if(WITH_CYCLES AND WITH_CYCLES_OSL)
endif()
find_path(OSL_INCLUDE_DIR OSL/oslclosure.h PATHS ${CYCLES_OSL}/include)
find_program(OSL_COMPILER NAMES oslc PATHS ${CYCLES_OSL}/bin)
file(STRINGS "${OSL_INCLUDE_DIR}/OSL/oslversion.h" OSL_LIBRARY_VERSION_MAJOR
REGEX "^[ \t]*#define[ \t]+OSL_LIBRARY_VERSION_MAJOR[ \t]+[0-9]+.*$")
file(STRINGS "${OSL_INCLUDE_DIR}/OSL/oslversion.h" OSL_LIBRARY_VERSION_MINOR
REGEX "^[ \t]*#define[ \t]+OSL_LIBRARY_VERSION_MINOR[ \t]+[0-9]+.*$")
string(REGEX REPLACE ".*#define[ \t]+OSL_LIBRARY_VERSION_MAJOR[ \t]+([.0-9]+).*"
"\\1" OSL_LIBRARY_VERSION_MAJOR ${OSL_LIBRARY_VERSION_MAJOR})
string(REGEX REPLACE ".*#define[ \t]+OSL_LIBRARY_VERSION_MINOR[ \t]+([.0-9]+).*"
"\\1" OSL_LIBRARY_VERSION_MINOR ${OSL_LIBRARY_VERSION_MINOR})
endif()
if(WITH_CYCLES AND WITH_CYCLES_EMBREE)
@@ -917,6 +926,20 @@ if(WITH_HARU)
set(HARU_LIBRARIES ${HARU_ROOT_DIR}/lib/libhpdfs.lib)
endif()
if(WITH_VULKAN_BACKEND)
if(EXISTS ${LIBDIR}/vulkan)
set(VULKAN_FOUND On)
set(VULKAN_ROOT_DIR ${LIBDIR}/vulkan)
set(VULKAN_INCLUDE_DIR ${VULKAN_ROOT_DIR}/include)
set(VULKAN_INCLUDE_DIRS ${VULKAN_INCLUDE_DIR})
set(VULKAN_LIBRARY ${VULKAN_ROOT_DIR}/lib/vulkan-1.lib)
set(VULKAN_LIBRARIES ${VULKAN_LIBRARY})
else()
message(WARNING "Vulkan SDK was not found, disabling WITH_VULKAN_BACKEND")
set(WITH_VULKAN_BACKEND OFF)
endif()
endif()
if(WITH_CYCLES AND WITH_CYCLES_PATH_GUIDING)
find_package(openpgl QUIET)
if(openpgl_FOUND)
@@ -949,7 +972,13 @@ if(WITH_CYCLES AND WITH_CYCLES_DEVICE_ONEAPI)
endforeach()
unset(_sycl_runtime_libraries_glob)
list(APPEND _sycl_runtime_libraries ${SYCL_ROOT_DIR}/bin/pi_level_zero.dll)
file(GLOB _sycl_pi_runtime_libraries_glob
${SYCL_ROOT_DIR}/bin/pi_*.dll
)
list(REMOVE_ITEM _sycl_pi_runtime_libraries_glob "${SYCL_ROOT_DIR}/bin/pi_opencl.dll")
list (APPEND _sycl_runtime_libraries ${_sycl_pi_runtime_libraries_glob})
unset(_sycl_pi_runtime_libraries_glob)
list(APPEND PLATFORM_BUNDLED_LIBRARIES ${_sycl_runtime_libraries})
unset(_sycl_runtime_libraries)
endif()


@@ -5,38 +5,38 @@
update-code:
git:
submodules:
- branch: blender-v3.4-release
- branch: master
commit_id: HEAD
path: release/scripts/addons
- branch: blender-v3.4-release
- branch: master
commit_id: HEAD
path: release/scripts/addons_contrib
- branch: blender-v3.4-release
- branch: master
commit_id: HEAD
path: release/datafiles/locale
- branch: blender-v3.4-release
- branch: master
commit_id: HEAD
path: source/tools
svn:
libraries:
darwin-arm64:
branch: tags/blender-3.4-release
branch: trunk
commit_id: HEAD
path: lib/darwin_arm64
darwin-x86_64:
branch: tags/blender-3.4-release
branch: trunk
commit_id: HEAD
path: lib/darwin
linux-x86_64:
branch: tags/blender-3.4-release
branch: trunk
commit_id: HEAD
path: lib/linux_centos7_x86_64
windows-amd64:
branch: tags/blender-3.4-release
branch: trunk
commit_id: HEAD
path: lib/win64_vc15
tests:
branch: tags/blender-3.4-release
branch: trunk
commit_id: HEAD
path: lib/tests
benchmarks:


@@ -53,18 +53,7 @@ This package provides Blender as a Python module for use in studio pipelines, we
[System requirements](https://www.blender.org/download/requirements/) are the same as Blender.
Each Blender release supports one Python version, and the package is only compatible with that version.
## Source Code
* [Releases](https://download.blender.org/source/)
* Repository: [git.blender.org/blender.git](https://git.blender.org/gitweb/gitweb.cgi/blender.git)
## Credits
Created by the [Blender developer community](https://www.blender.org/about/credits/).
Thanks to Tyler Alden Gubala for maintaining the original version of this package."""
Each Blender release supports one Python version, and the package is only compatible with that version."""
# ------------------------------------------------------------------------------
# Generic Functions


@@ -38,7 +38,7 @@ PROJECT_NAME = Blender
# could be handy for archiving the generated documentation or if some version
# control system is used.
PROJECT_NUMBER = V3.4
PROJECT_NUMBER = V3.5
# Using the PROJECT_BRIEF tag one can provide an optional one line description
# for a project that appears at the top of each page and should give viewer a


@@ -91,3 +91,7 @@ endif()
if(WITH_COMPOSITOR_CPU)
add_subdirectory(smaa_areatex)
endif()
if(WITH_VULKAN_BACKEND)
add_subdirectory(vulkan_memory_allocator)
endif()


@@ -0,0 +1,24 @@
# SPDX-License-Identifier: GPL-2.0-or-later
# Copyright 2022 Blender Foundation. All rights reserved.
set(INC
.
)
set(INC_SYS
${VULKAN_INCLUDE_DIRS}
)
set(SRC
vk_mem_alloc_impl.cc
vk_mem_alloc.h
)
blender_add_lib(extern_vulkan_memory_allocator "${SRC}" "${INC}" "${INC_SYS}" "${LIB}")
if(CMAKE_COMPILER_IS_GNUCC OR CMAKE_C_COMPILER_ID MATCHES "Clang")
target_compile_options(extern_vulkan_memory_allocator
PRIVATE "-Wno-nullability-completeness"
)
endif()


@@ -0,0 +1,19 @@
Copyright (c) 2017-2022 Advanced Micro Devices, Inc. All rights reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.


@@ -0,0 +1,5 @@
Project: VulkanMemoryAllocator
URL: https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator
License: MIT
Upstream version: a6bfc23
Local modifications: None

extern/vulkan_memory_allocator/README.md

@@ -0,0 +1,175 @@
# Vulkan Memory Allocator
Easy to integrate Vulkan memory allocation library.
**Documentation:** Browse online: [Vulkan Memory Allocator](https://gpuopen-librariesandsdks.github.io/VulkanMemoryAllocator/html/) (generated from Doxygen-style comments in [include/vk_mem_alloc.h](include/vk_mem_alloc.h))
**License:** MIT. See [LICENSE.txt](LICENSE.txt)
**Changelog:** See [CHANGELOG.md](CHANGELOG.md)
**Product page:** [Vulkan Memory Allocator on GPUOpen](https://gpuopen.com/gaming-product/vulkan-memory-allocator/)
**Build status:**
- Windows: [![Build status](https://ci.appveyor.com/api/projects/status/4vlcrb0emkaio2pn/branch/master?svg=true)](https://ci.appveyor.com/project/adam-sawicki-amd/vulkanmemoryallocator/branch/master)
- Linux: [![Build Status](https://app.travis-ci.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator.svg?branch=master)](https://app.travis-ci.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator)
[![Average time to resolve an issue](http://isitmaintained.com/badge/resolution/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator.svg)](http://isitmaintained.com/project/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator "Average time to resolve an issue")
# Problem
Memory allocation and resource (buffer and image) creation in Vulkan is difficult (compared to older graphics APIs, like D3D11 or OpenGL) for several reasons:
- It requires a lot of boilerplate code, just like everything else in Vulkan, because it is a low-level and high-performance API.
- There is an additional level of indirection: `VkDeviceMemory` is allocated separately from creating `VkBuffer`/`VkImage` and they must be bound together.
- The driver must be queried for supported memory heaps and memory types, and different GPU vendors provide different sets of them.
- It is recommended to allocate bigger chunks of memory and assign parts of them to particular resources, as there is a limit on the maximum number of memory blocks that can be allocated.
# Features
This library can help game developers to manage memory allocations and resource creation by offering some higher-level functions:
1. Functions that help to choose correct and optimal memory type based on intended usage of the memory.
- Required or preferred traits of the memory are expressed using a higher-level description, compared to raw Vulkan flags.
2. Functions that allocate memory blocks, reserve and return parts of them (`VkDeviceMemory` + offset + size) to the user.
- Library keeps track of allocated memory blocks, used and unused ranges inside them, finds best matching unused ranges for new allocations, respects all the rules of alignment and buffer/image granularity.
3. Functions that can create an image/buffer, allocate memory for it and bind them together - all in one call.
Additional features:
- Well-documented - description of all functions and structures provided, along with chapters that contain general description and example code.
- Thread-safety: Library is designed to be used in multithreaded code. Access to a single device memory block referred by different buffers and textures (binding, mapping) is synchronized internally. Memory mapping is reference-counted.
- Configuration: Fill optional members of `VmaAllocatorCreateInfo` structure to provide custom CPU memory allocator, pointers to Vulkan functions and other parameters.
- Customization and integration with custom engines: Predefine appropriate macros to provide your own implementation of all external facilities used by the library like assert, mutex, atomic.
- Support for memory mapping, reference-counted internally. Support for persistently mapped memory: Just allocate with appropriate flag and access the pointer to already mapped memory.
- Support for non-coherent memory. Functions that flush/invalidate memory. `nonCoherentAtomSize` is respected automatically.
- Support for resource aliasing (overlap).
- Support for sparse binding and sparse residency: Convenience functions that allocate or free multiple memory pages at once.
- Custom memory pools: Create a pool with desired parameters (e.g. fixed or limited maximum size) and allocate memory out of it.
- Linear allocator: Create a pool with linear algorithm and use it for much faster allocations and deallocations in free-at-once, stack, double stack, or ring buffer fashion.
- Support for Vulkan 1.0, 1.1, 1.2, 1.3.
- Support for extensions (and equivalent functionality included in new Vulkan versions):
- VK_KHR_dedicated_allocation: Just enable it and it will be used automatically by the library.
- VK_KHR_buffer_device_address: Flag `VK_MEMORY_ALLOCATE_DEVICE_ADDRESS_BIT_KHR` is automatically added to memory allocations where needed.
- VK_EXT_memory_budget: Used internally if available to query for current usage and budget. If not available, it falls back to an estimation based on memory heap sizes.
- VK_EXT_memory_priority: Set `priority` of allocations or custom pools and it will be set automatically using this extension.
- VK_AMD_device_coherent_memory
- Defragmentation of GPU and CPU memory: Let the library move data around to free some memory blocks and make your allocations better compacted.
- Statistics: Obtain brief or detailed statistics about the amount of memory used, unused, number of allocated blocks, number of allocations etc. - globally, per memory heap, and per memory type.
- Debug annotations: Associate custom `void* pUserData` and debug `char* pName` with each allocation.
- JSON dump: Obtain a string in JSON format with detailed map of internal state, including list of allocations, their string names, and gaps between them.
- Convert this JSON dump into a picture to visualize your memory. See [tools/GpuMemDumpVis](tools/GpuMemDumpVis/README.md).
- Debugging incorrect memory usage: Enable initialization of all allocated memory with a bit pattern to detect usage of uninitialized or freed memory. Enable validation of a magic number after every allocation to detect out-of-bounds memory corruption.
- Support for interoperability with OpenGL.
- Virtual allocator: Interface for using core allocation algorithm to allocate any custom data, e.g. pieces of one large buffer.
# Prerequisites
- Self-contained C++ library in single header file. No external dependencies other than standard C and C++ library and of course Vulkan. Some features of C++14 used. STL containers, RTTI, or C++ exceptions are not used.
- Public interface in C, in same convention as Vulkan API. Implementation in C++.
- Error handling implemented by returning `VkResult` error codes - same way as in Vulkan.
- Interface documented using Doxygen-style comments.
- Platform-independent, but developed and tested on Windows using Visual Studio. Continuous integration setup for Windows and Linux. Used also on Android, MacOS, and other platforms.
# Example
Basic usage of this library is very simple. Advanced features are optional. After you created global `VmaAllocator` object, a complete code needed to create a buffer may look like this:
```cpp
VkBufferCreateInfo bufferInfo = { VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO };
bufferInfo.size = 65536;
bufferInfo.usage = VK_BUFFER_USAGE_VERTEX_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT;
VmaAllocationCreateInfo allocInfo = {};
allocInfo.usage = VMA_MEMORY_USAGE_AUTO;
VkBuffer buffer;
VmaAllocation allocation;
vmaCreateBuffer(allocator, &bufferInfo, &allocInfo, &buffer, &allocation, nullptr);
```
With this one function call:
1. `VkBuffer` is created.
2. `VkDeviceMemory` block is allocated if needed.
3. An unused region of the memory block is bound to this buffer.
`VmaAllocation` is an object that represents memory assigned to this buffer. It can be queried for parameters like `VkDeviceMemory` handle and offset.
# How to build
On Windows it is recommended to use [CMake UI](https://cmake.org/runningcmake/). Alternatively you can generate a Visual Studio project map using CMake in command line: `cmake -B./build/ -DCMAKE_BUILD_TYPE=Debug -G "Visual Studio 16 2019" -A x64 ./`
On Linux:
```
mkdir build
cd build
cmake ..
make
```
The following targets are available
| Target | Description | CMake option | Default setting |
| ------------- | ------------- | ------------- | ------------- |
| VmaSample | VMA sample application | `VMA_BUILD_SAMPLE` | `OFF` |
| VmaBuildSampleShaders | Shaders for VmaSample | `VMA_BUILD_SAMPLE_SHADERS` | `OFF` |
Please note that while VulkanMemoryAllocator library is supported on other platforms besides Windows, VmaSample is not.
These CMake options are available
| CMake option | Description | Default setting |
| ------------- | ------------- | ------------- |
| `VMA_RECORDING_ENABLED` | Enable VMA memory recording for debugging | `OFF` |
| `VMA_USE_STL_CONTAINERS` | Use C++ STL containers instead of VMA's containers | `OFF` |
| `VMA_STATIC_VULKAN_FUNCTIONS` | Link statically with Vulkan API | `OFF` |
| `VMA_DYNAMIC_VULKAN_FUNCTIONS` | Fetch pointers to Vulkan functions internally (no static linking) | `ON` |
| `VMA_DEBUG_ALWAYS_DEDICATED_MEMORY` | Every allocation will have its own memory block | `OFF` |
| `VMA_DEBUG_INITIALIZE_ALLOCATIONS` | Automatically fill new allocations and destroyed allocations with some bit pattern | `OFF` |
| `VMA_DEBUG_GLOBAL_MUTEX` | Enable single mutex protecting all entry calls to the library | `OFF` |
| `VMA_DEBUG_DONT_EXCEED_MAX_MEMORY_ALLOCATION_COUNT` | Never exceed [VkPhysicalDeviceLimits::maxMemoryAllocationCount](https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#limits-maxMemoryAllocationCount) and return error | `OFF` |
# Binaries
The release comes with precompiled binary executable for "VulkanSample" application which contains test suite. It is compiled using Visual Studio 2019, so it requires appropriate libraries to work, including "MSVCP140.dll", "VCRUNTIME140.dll", "VCRUNTIME140_1.dll". If the launch fails with error message telling about those files missing, please download and install [Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017 and 2019](https://support.microsoft.com/en-us/help/2977003/the-latest-supported-visual-c-downloads), "x64" version.
# Read more
See **[Documentation](https://gpuopen-librariesandsdks.github.io/VulkanMemoryAllocator/html/)**.
# Software using this library
- **[X-Plane](https://x-plane.com/)**
- **[Detroit: Become Human](https://gpuopen.com/learn/porting-detroit-3/)**
- **[Vulkan Samples](https://github.com/LunarG/VulkanSamples)** - official Khronos Vulkan samples. License: Apache-style.
- **[Anvil](https://github.com/GPUOpen-LibrariesAndSDKs/Anvil)** - cross-platform framework for Vulkan. License: MIT.
- **[Filament](https://github.com/google/filament)** - physically based rendering engine for Android, Windows, Linux and macOS, from Google. Apache License 2.0.
- **[Atypical Games - proprietary game engine](https://developer.samsung.com/galaxy-gamedev/gamedev-blog/infinitejet.html)**
- **[Flax Engine](https://flaxengine.com/)**
- **[Godot Engine](https://github.com/godotengine/godot/)** - multi-platform 2D and 3D game engine. License: MIT.
- **[Lightweight Java Game Library (LWJGL)](https://www.lwjgl.org/)** - includes binding of the library for Java. License: BSD.
- **[PowerVR SDK](https://github.com/powervr-graphics/Native_SDK)** - C++ cross-platform 3D graphics SDK, from Imagination. License: MIT.
- **[Skia](https://github.com/google/skia)** - complete 2D graphic library for drawing Text, Geometries, and Images, from Google.
- **[The Forge](https://github.com/ConfettiFX/The-Forge)** - cross-platform rendering framework. Apache License 2.0.
- **[VK9](https://github.com/disks86/VK9)** - Direct3D 9 compatibility layer using Vulkan. Zlib license.
- **[vkDOOM3](https://github.com/DustinHLand/vkDOOM3)** - Vulkan port of GPL DOOM 3 BFG Edition. License: GNU GPL.
- **[vkQuake2](https://github.com/kondrak/vkQuake2)** - vanilla Quake 2 with Vulkan support. License: GNU GPL.
- **[Vulkan Best Practice for Mobile Developers](https://github.com/ARM-software/vulkan_best_practice_for_mobile_developers)** from ARM. License: MIT.
- **[RPCS3](https://github.com/RPCS3/rpcs3)** - PlayStation 3 emulator/debugger. License: GNU GPLv2.
- **[PPSSPP](https://github.com/hrydgard/ppsspp)** - Playstation Portable emulator/debugger. License: GNU GPLv2+.
[Many other projects on GitHub](https://github.com/search?q=AMD_VULKAN_MEMORY_ALLOCATOR_H&type=Code) and some game development studios that use Vulkan in their games.
# See also
- **[D3D12 Memory Allocator](https://github.com/GPUOpen-LibrariesAndSDKs/D3D12MemoryAllocator)** - equivalent library for Direct3D 12. License: MIT.
- **[Awesome Vulkan](https://github.com/vinjn/awesome-vulkan)** - a curated list of awesome Vulkan libraries, debuggers and resources.
- **[VulkanMemoryAllocator-Hpp](https://github.com/malte-v/VulkanMemoryAllocator-Hpp)** - C++ binding for this library. License: CC0-1.0.
- **[PyVMA](https://github.com/realitix/pyvma)** - Python wrapper for this library. Author: Jean-Sébastien B. (@realitix). License: Apache 2.0.
- **[vk-mem](https://github.com/gwihlidal/vk-mem-rs)** - Rust binding for this library. Author: Graham Wihlidal. License: Apache 2.0 or MIT.
- **[Haskell bindings](https://hackage.haskell.org/package/VulkanMemoryAllocator)**, **[github](https://github.com/expipiplus1/vulkan/tree/master/VulkanMemoryAllocator)** - Haskell bindings for this library. Author: Ellie Hermaszewska (@expipiplus1). License BSD-3-Clause.
- **[vma_sample_sdl](https://github.com/rextimmy/vma_sample_sdl)** - SDL port of the sample app of this library (with the goal of running it on multiple platforms, including MacOS). Author: @rextimmy. License: MIT.
- **[vulkan-malloc](https://github.com/dylanede/vulkan-malloc)** - Vulkan memory allocation library for Rust. Based on version 1 of this library. Author: Dylan Ede (@dylanede). License: MIT / Apache 2.0.

File diff suppressed because it is too large


@@ -0,0 +1,12 @@
/* SPDX-License-Identifier: GPL-2.0-or-later
* Copyright 2022 Blender Foundation. All rights reserved. */
#ifdef __APPLE__
# include <MoltenVK/vk_mvk_moltenvk.h>
#else
# include <vulkan/vulkan.h>
#endif
#define VMA_IMPLEMENTATION
#include "vk_mem_alloc.h"


@@ -253,6 +253,33 @@ if(WITH_CYCLES_OSL)
)
endif()
if(WITH_CYCLES_DEVICE_CUDA OR WITH_CYCLES_DEVICE_OPTIX)
add_definitions(-DWITH_CUDA)
if(WITH_CUDA_DYNLOAD)
include_directories(
../../extern/cuew/include
)
add_definitions(-DWITH_CUDA_DYNLOAD)
else()
include_directories(
SYSTEM
${CUDA_TOOLKIT_INCLUDE}
)
endif()
endif()
if(WITH_CYCLES_DEVICE_HIP)
add_definitions(-DWITH_HIP)
if(WITH_HIP_DYNLOAD)
include_directories(
../../extern/hipew/include
)
add_definitions(-DWITH_HIP_DYNLOAD)
endif()
endif()
if(WITH_CYCLES_DEVICE_OPTIX)
find_package(OptiX 7.3.0)
@@ -261,12 +288,16 @@ if(WITH_CYCLES_DEVICE_OPTIX)
include_directories(
SYSTEM
${OPTIX_INCLUDE_DIR}
)
)
else()
set_and_warn_library_found("OptiX" OPTIX_FOUND WITH_CYCLES_DEVICE_OPTIX)
endif()
endif()
if(WITH_CYCLES_DEVICE_METAL)
add_definitions(-DWITH_METAL)
endif()
if (WITH_CYCLES_DEVICE_ONEAPI)
add_definitions(-DWITH_ONEAPI)
endif()


@@ -58,7 +58,7 @@ class CyclesRender(bpy.types.RenderEngine):
if not self.session:
if self.is_preview:
cscene = bpy.context.scene.cycles
use_osl = cscene.shading_system and cscene.device == 'CPU'
use_osl = cscene.shading_system
engine.create(self, data, preview_osl=use_osl)
else:


@@ -156,6 +156,11 @@ def with_osl():
return _cycles.with_osl
def osl_version():
import _cycles
return _cycles.osl_version
def with_path_guiding():
import _cycles
return _cycles.with_path_guiding


@@ -290,7 +290,7 @@ class CyclesRenderSettings(bpy.types.PropertyGroup):
)
shading_system: BoolProperty(
name="Open Shading Language",
description="Use Open Shading Language (CPU rendering only)",
description="Use Open Shading Language",
)
preview_pause: BoolProperty(


@@ -2307,7 +2307,10 @@ def draw_device(self, context):
col.prop(cscene, "device")
from . import engine
if engine.with_osl() and use_cpu(context):
if engine.with_osl() and (
use_cpu(context) or
(use_optix(context) and (engine.osl_version()[1] >= 13 or engine.osl_version()[0] > 1))
):
col.prop(cscene, "shading_system")


@@ -367,11 +367,13 @@ static void attr_create_generic(Scene *scene,
{
AttributeSet &attributes = (subdivision) ? mesh->subd_attributes : mesh->attributes;
static const ustring u_velocity("velocity");
const ustring default_color_name{b_mesh.attributes.default_color_name().c_str()};
int attribute_index = 0;
int render_color_index = b_mesh.attributes.render_color_index();
for (BL::Attribute &b_attribute : b_mesh.attributes) {
const ustring name{b_attribute.name().c_str()};
const bool is_render_color = name == default_color_name;
const bool is_render_color = (attribute_index++ == render_color_index);
if (need_motion && name == u_velocity) {
attr_create_motion(mesh, b_attribute, motion_scale);
@@ -1083,11 +1085,11 @@ static void create_subd_mesh(Scene *scene,
const int edges_num = b_mesh.edges.length();
if (edges_num != 0 && b_mesh.edge_creases.length() > 0) {
BL::MeshEdgeCreaseLayer creases = b_mesh.edge_creases[0];
size_t num_creases = 0;
const float *creases = static_cast<float *>(b_mesh.edge_creases[0].ptr.data);
for (int i = 0; i < edges_num; i++) {
if (creases.data[i].value() != 0.0f) {
if (creases[i] != 0.0f) {
num_creases++;
}
}
@@ -1096,18 +1098,17 @@ static void create_subd_mesh(Scene *scene,
const MEdge *edges = static_cast<MEdge *>(b_mesh.edges[0].ptr.data);
for (int i = 0; i < edges_num; i++) {
const float crease = creases.data[i].value();
if (crease != 0.0f) {
if (creases[i] != 0.0f) {
const MEdge &b_edge = edges[i];
mesh->add_edge_crease(b_edge.v1, b_edge.v2, crease);
mesh->add_edge_crease(b_edge.v1, b_edge.v2, creases[i]);
}
}
}
for (BL::MeshVertexCreaseLayer &c : b_mesh.vertex_creases) {
for (int i = 0; i < c.data.length(); ++i) {
if (c.data[i].value() != 0.0f) {
mesh->add_vertex_crease(i, c.data[i].value());
for (BL::MeshVertexCreaseLayer &c : b_mesh.vertex_creases) {
for (int i = 0; i < c.data.length(); ++i) {
if (c.data[i].value() != 0.0f) {
mesh->add_vertex_crease(i, c.data[i].value());
}
}
}
}


@@ -1,6 +1,584 @@
# SPDX-License-Identifier: Apache-2.0
# Copyright 2011-2022 Blender Foundation
###########################################################################
# Helper macros
###########################################################################
macro(_set_default variable value)
if(NOT ${variable})
set(${variable} ${value})
endif()
endmacro()
###########################################################################
# Precompiled libraries detection
#
# Use precompiled libraries from Blender repository
###########################################################################
if(CYCLES_STANDALONE_REPOSITORY)
if(APPLE)
if("${CMAKE_OSX_ARCHITECTURES}" STREQUAL "x86_64")
set(_cycles_lib_dir "${CMAKE_SOURCE_DIR}/../lib/darwin")
else()
set(_cycles_lib_dir "${CMAKE_SOURCE_DIR}/../lib/darwin_arm64")
endif()
# Always use system zlib
find_package(ZLIB REQUIRED)
elseif(WIN32)
if(CMAKE_CL_64)
set(_cycles_lib_dir "${CMAKE_SOURCE_DIR}/../lib/win64_vc15")
else()
message(FATAL_ERROR "Unsupported Visual Studio Version")
endif()
else()
# Path to locally compiled libraries.
set(LIBDIR_NAME ${CMAKE_SYSTEM_NAME}_${CMAKE_SYSTEM_PROCESSOR})
string(TOLOWER ${LIBDIR_NAME} LIBDIR_NAME)
set(LIBDIR_NATIVE_ABI ${CMAKE_SOURCE_DIR}/../lib/${LIBDIR_NAME})
# Path to precompiled libraries with known CentOS 7 ABI.
set(LIBDIR_CENTOS7_ABI ${CMAKE_SOURCE_DIR}/../lib/linux_centos7_x86_64)
# Choose the best suitable libraries.
if(EXISTS ${LIBDIR_NATIVE_ABI})
set(_cycles_lib_dir ${LIBDIR_NATIVE_ABI})
elseif(EXISTS ${LIBDIR_CENTOS7_ABI})
set(_cycles_lib_dir ${LIBDIR_CENTOS7_ABI})
set(WITH_CXX11_ABI OFF)
if(CMAKE_COMPILER_IS_GNUCC AND
CMAKE_C_COMPILER_VERSION VERSION_LESS 9.3)
message(FATAL_ERROR "GCC version must be at least 9.3 for precompiled libraries, found ${CMAKE_C_COMPILER_VERSION}")
endif()
endif()
if(DEFINED _cycles_lib_dir)
message(STATUS "Using precompiled libraries at ${_cycles_lib_dir}")
endif()
# Avoid namespace pollution.
unset(LIBDIR_NATIVE_ABI)
unset(LIBDIR_CENTOS7_ABI)
endif()
if(EXISTS ${_cycles_lib_dir})
_set_default(ALEMBIC_ROOT_DIR "${_cycles_lib_dir}/alembic")
_set_default(BOOST_ROOT "${_cycles_lib_dir}/boost")
_set_default(BLOSC_ROOT_DIR "${_cycles_lib_dir}/blosc")
_set_default(EMBREE_ROOT_DIR "${_cycles_lib_dir}/embree")
_set_default(EPOXY_ROOT_DIR "${_cycles_lib_dir}/epoxy")
_set_default(IMATH_ROOT_DIR "${_cycles_lib_dir}/imath")
_set_default(GLEW_ROOT_DIR "${_cycles_lib_dir}/glew")
_set_default(JPEG_ROOT "${_cycles_lib_dir}/jpeg")
_set_default(LLVM_ROOT_DIR "${_cycles_lib_dir}/llvm")
_set_default(CLANG_ROOT_DIR "${_cycles_lib_dir}/llvm")
_set_default(NANOVDB_ROOT_DIR "${_cycles_lib_dir}/openvdb")
_set_default(OPENCOLORIO_ROOT_DIR "${_cycles_lib_dir}/opencolorio")
_set_default(OPENEXR_ROOT_DIR "${_cycles_lib_dir}/openexr")
_set_default(OPENIMAGEDENOISE_ROOT_DIR "${_cycles_lib_dir}/openimagedenoise")
_set_default(OPENIMAGEIO_ROOT_DIR "${_cycles_lib_dir}/openimageio")
_set_default(OPENJPEG_ROOT_DIR "${_cycles_lib_dir}/openjpeg")
_set_default(OPENSUBDIV_ROOT_DIR "${_cycles_lib_dir}/opensubdiv")
_set_default(OPENVDB_ROOT_DIR "${_cycles_lib_dir}/openvdb")
_set_default(OSL_ROOT_DIR "${_cycles_lib_dir}/osl")
_set_default(PNG_ROOT "${_cycles_lib_dir}/png")
_set_default(PUGIXML_ROOT_DIR "${_cycles_lib_dir}/pugixml")
_set_default(SDL2_ROOT_DIR "${_cycles_lib_dir}/sdl")
_set_default(TBB_ROOT_DIR "${_cycles_lib_dir}/tbb")
_set_default(TIFF_ROOT "${_cycles_lib_dir}/tiff")
_set_default(USD_ROOT_DIR "${_cycles_lib_dir}/usd")
_set_default(WEBP_ROOT_DIR "${_cycles_lib_dir}/webp")
_set_default(ZLIB_ROOT "${_cycles_lib_dir}/zlib")
if(WIN32)
set(LEVEL_ZERO_ROOT_DIR ${_cycles_lib_dir}/level_zero)
else()
set(LEVEL_ZERO_ROOT_DIR ${_cycles_lib_dir}/level-zero)
endif()
_set_default(SYCL_ROOT_DIR "${_cycles_lib_dir}/dpcpp")
# Ignore system libraries
set(CMAKE_IGNORE_PATH "${CMAKE_PLATFORM_IMPLICIT_LINK_DIRECTORIES};${CMAKE_SYSTEM_INCLUDE_PATH};${CMAKE_C_IMPLICIT_INCLUDE_DIRECTORIES};${CMAKE_CXX_IMPLICIT_INCLUDE_DIRECTORIES}")
else()
unset(_cycles_lib_dir)
endif()
endif()
###########################################################################
# Zlib
###########################################################################
if(CYCLES_STANDALONE_REPOSITORY)
if(MSVC AND EXISTS ${_cycles_lib_dir})
set(ZLIB_INCLUDE_DIRS ${_cycles_lib_dir}/zlib/include)
set(ZLIB_LIBRARIES ${_cycles_lib_dir}/zlib/lib/libz_st.lib)
set(ZLIB_INCLUDE_DIR ${_cycles_lib_dir}/zlib/include)
set(ZLIB_LIBRARY ${_cycles_lib_dir}/zlib/lib/libz_st.lib)
set(ZLIB_DIR ${_cycles_lib_dir}/zlib)
set(ZLIB_FOUND ON)
elseif(NOT APPLE)
find_package(ZLIB REQUIRED)
endif()
endif()
###########################################################################
# PThreads
###########################################################################
if(CYCLES_STANDALONE_REPOSITORY)
if(MSVC AND EXISTS ${_cycles_lib_dir})
set(PTHREADS_LIBRARIES "${_cycles_lib_dir}/pthreads/lib/pthreadVC3.lib")
include_directories("${_cycles_lib_dir}/pthreads/include")
else()
set(CMAKE_THREAD_PREFER_PTHREAD TRUE)
find_package(Threads REQUIRED)
set(PTHREADS_LIBRARIES ${CMAKE_THREAD_LIBS_INIT})
endif()
endif()
###########################################################################
# OpenImageIO and image libraries
###########################################################################
if(CYCLES_STANDALONE_REPOSITORY)
if(MSVC AND EXISTS ${_cycles_lib_dir})
add_definitions(
# OIIO changed the name of this define in newer versions
# we define both, so it would work with both old and new
# versions.
-DOIIO_STATIC_BUILD
-DOIIO_STATIC_DEFINE
)
set(OPENIMAGEIO_INCLUDE_DIR ${OPENIMAGEIO_ROOT_DIR}/include)
set(OPENIMAGEIO_INCLUDE_DIRS ${OPENIMAGEIO_INCLUDE_DIR} ${OPENIMAGEIO_INCLUDE_DIR}/OpenImageIO)
# Special exceptions for libraries which need an explicit debug version
set(OPENIMAGEIO_LIBRARIES
optimized ${OPENIMAGEIO_ROOT_DIR}/lib/OpenImageIO.lib
optimized ${OPENIMAGEIO_ROOT_DIR}/lib/OpenImageIO_Util.lib
debug ${OPENIMAGEIO_ROOT_DIR}/lib/OpenImageIO_d.lib
debug ${OPENIMAGEIO_ROOT_DIR}/lib/OpenImageIO_Util_d.lib
)
set(PUGIXML_INCLUDE_DIR ${PUGIXML_ROOT_DIR}/include)
set(PUGIXML_LIBRARIES
optimized ${PUGIXML_ROOT_DIR}/lib/pugixml.lib
debug ${PUGIXML_ROOT_DIR}/lib/pugixml_d.lib
)
else()
find_package(OpenImageIO REQUIRED)
if(OPENIMAGEIO_PUGIXML_FOUND)
set(PUGIXML_INCLUDE_DIR "${OPENIMAGEIO_INCLUDE_DIR}/OpenImageIO")
set(PUGIXML_LIBRARIES "")
else()
find_package(PugiXML REQUIRED)
endif()
endif()
# Dependencies
if(MSVC AND EXISTS ${_cycles_lib_dir})
set(OPENJPEG_INCLUDE_DIR ${OPENJPEG}/include/openjpeg-2.3)
set(OPENJPEG_LIBRARIES ${_cycles_lib_dir}/openjpeg/lib/openjp2${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
find_package(OpenJPEG REQUIRED)
endif()
find_package(JPEG REQUIRED)
find_package(TIFF REQUIRED)
find_package(WebP)
if(EXISTS ${_cycles_lib_dir})
set(PNG_NAMES png16 libpng16 png libpng)
endif()
find_package(PNG REQUIRED)
endif()
###########################################################################
# OpenEXR
###########################################################################
if(CYCLES_STANDALONE_REPOSITORY)
if(MSVC AND EXISTS ${_cycles_lib_dir})
set(OPENEXR_INCLUDE_DIR ${OPENEXR_ROOT_DIR}/include)
set(OPENEXR_INCLUDE_DIRS ${OPENEXR_INCLUDE_DIR} ${OPENEXR_ROOT_DIR}/include/OpenEXR ${IMATH_ROOT_DIR}/include ${IMATH_ROOT_DIR}/include/Imath)
set(OPENEXR_LIBRARIES
optimized ${OPENEXR_ROOT_DIR}/lib/OpenEXR_s.lib
optimized ${OPENEXR_ROOT_DIR}/lib/OpenEXRCore_s.lib
optimized ${OPENEXR_ROOT_DIR}/lib/Iex_s.lib
optimized ${IMATH_ROOT_DIR}/lib/Imath_s.lib
optimized ${OPENEXR_ROOT_DIR}/lib/IlmThread_s.lib
debug ${OPENEXR_ROOT_DIR}/lib/OpenEXR_s_d.lib
debug ${OPENEXR_ROOT_DIR}/lib/OpenEXRCore_s_d.lib
debug ${OPENEXR_ROOT_DIR}/lib/Iex_s_d.lib
debug ${IMATH_ROOT_DIR}/lib/Imath_s_d.lib
debug ${OPENEXR_ROOT_DIR}/lib/IlmThread_s_d.lib
)
else()
find_package(OpenEXR REQUIRED)
endif()
endif()
###########################################################################
# OpenShadingLanguage & LLVM
###########################################################################
if(CYCLES_STANDALONE_REPOSITORY AND WITH_CYCLES_OSL)
if(EXISTS ${_cycles_lib_dir})
set(LLVM_STATIC ON)
endif()
if(MSVC AND EXISTS ${_cycles_lib_dir})
# TODO(sergey): On Windows llvm-config doesn't give proper results for the
# library names, use hardcoded libraries for now.
file(GLOB _llvm_libs_release ${LLVM_ROOT_DIR}/lib/*.lib)
file(GLOB _llvm_libs_debug ${LLVM_ROOT_DIR}/debug/lib/*.lib)
set(_llvm_libs)
foreach(_llvm_lib_path ${_llvm_libs_release})
get_filename_component(_llvm_lib_name ${_llvm_lib_path} ABSOLUTE)
list(APPEND _llvm_libs optimized ${_llvm_lib_name})
endforeach()
foreach(_llvm_lib_path ${_llvm_libs_debug})
get_filename_component(_llvm_lib_name ${_llvm_lib_path} ABSOLUTE)
list(APPEND _llvm_libs debug ${_llvm_lib_name})
endforeach()
set(LLVM_LIBRARY ${_llvm_libs})
unset(_llvm_lib_name)
unset(_llvm_lib_path)
unset(_llvm_libs)
unset(_llvm_libs_debug)
unset(_llvm_libs_release)
set(OSL_INCLUDE_DIR ${OSL_ROOT_DIR}/include)
set(OSL_LIBRARIES
optimized ${OSL_ROOT_DIR}/lib/oslcomp.lib
optimized ${OSL_ROOT_DIR}/lib/oslexec.lib
optimized ${OSL_ROOT_DIR}/lib/oslquery.lib
debug ${OSL_ROOT_DIR}/lib/oslcomp_d.lib
debug ${OSL_ROOT_DIR}/lib/oslexec_d.lib
debug ${OSL_ROOT_DIR}/lib/oslquery_d.lib
${PUGIXML_LIBRARIES}
)
find_program(OSL_COMPILER NAMES oslc PATHS ${OSL_ROOT_DIR}/bin)
else()
find_package(OSL REQUIRED)
find_package(LLVM REQUIRED)
find_package(Clang REQUIRED)
endif()
endif()
###########################################################################
# OpenPGL
###########################################################################
if(CYCLES_STANDALONE_REPOSITORY AND WITH_CYCLES_PATH_GUIDING)
if(NOT openpgl_DIR AND EXISTS ${_cycles_lib_dir})
set(openpgl_DIR ${_cycles_lib_dir}/openpgl/lib/cmake/openpgl)
endif()
find_package(openpgl QUIET)
if(openpgl_FOUND)
if(WIN32)
get_target_property(OPENPGL_LIBRARIES_RELEASE openpgl::openpgl LOCATION_RELEASE)
get_target_property(OPENPGL_LIBRARIES_DEBUG openpgl::openpgl LOCATION_DEBUG)
set(OPENPGL_LIBRARIES optimized ${OPENPGL_LIBRARIES_RELEASE} debug ${OPENPGL_LIBRARIES_DEBUG})
else()
get_target_property(OPENPGL_LIBRARIES openpgl::openpgl LOCATION)
endif()
get_target_property(OPENPGL_INCLUDE_DIR openpgl::openpgl INTERFACE_INCLUDE_DIRECTORIES)
else()
set_and_warn_library_found("OpenPGL" openpgl_FOUND WITH_CYCLES_PATH_GUIDING)
endif()
endif()
###########################################################################
# OpenColorIO
###########################################################################
if(CYCLES_STANDALONE_REPOSITORY AND WITH_CYCLES_OPENCOLORIO)
set(WITH_OPENCOLORIO ON)
if(NOT USD_OVERRIDE_OPENCOLORIO)
if(MSVC AND EXISTS ${_cycles_lib_dir})
set(OPENCOLORIO_INCLUDE_DIRS ${OPENCOLORIO_ROOT_DIR}/include)
set(OPENCOLORIO_LIBRARIES
optimized ${OPENCOLORIO_ROOT_DIR}/lib/OpenColorIO.lib
optimized ${OPENCOLORIO_ROOT_DIR}/lib/libyaml-cpp.lib
optimized ${OPENCOLORIO_ROOT_DIR}/lib/libexpatMD.lib
optimized ${OPENCOLORIO_ROOT_DIR}/lib/pystring.lib
debug ${OPENCOLORIO_ROOT_DIR}/lib/OpencolorIO_d.lib
debug ${OPENCOLORIO_ROOT_DIR}/lib/libyaml-cpp_d.lib
debug ${OPENCOLORIO_ROOT_DIR}/lib/libexpatdMD.lib
debug ${OPENCOLORIO_ROOT_DIR}/lib/pystring_d.lib
)
else()
find_package(OpenColorIO REQUIRED)
endif()
endif()
endif()
###########################################################################
# Boost
###########################################################################
if(CYCLES_STANDALONE_REPOSITORY)
if(EXISTS ${_cycles_lib_dir})
if(MSVC)
set(Boost_USE_STATIC_RUNTIME OFF)
set(Boost_USE_MULTITHREADED ON)
set(Boost_USE_STATIC_LIBS ON)
else()
set(BOOST_LIBRARYDIR ${_cycles_lib_dir}/boost/lib)
set(Boost_NO_BOOST_CMAKE ON)
set(Boost_NO_SYSTEM_PATHS ON)
endif()
endif()
if(MSVC AND EXISTS ${_cycles_lib_dir})
set(BOOST_INCLUDE_DIR ${BOOST_ROOT}/include)
set(BOOST_VERSION_HEADER ${BOOST_INCLUDE_DIR}/boost/version.hpp)
if(EXISTS ${BOOST_VERSION_HEADER})
file(STRINGS "${BOOST_VERSION_HEADER}" BOOST_LIB_VERSION REGEX "#define BOOST_LIB_VERSION ")
if(BOOST_LIB_VERSION MATCHES "#define BOOST_LIB_VERSION \"([0-9_]+)\"")
set(BOOST_VERSION "${CMAKE_MATCH_1}")
endif()
endif()
if(NOT BOOST_VERSION)
message(FATAL_ERROR "Unable to determine Boost version")
endif()
set(BOOST_POSTFIX "vc142-mt-x64-${BOOST_VERSION}.lib")
set(BOOST_DEBUG_POSTFIX "vc142-mt-gd-x64-${BOOST_VERSION}.lib")
set(BOOST_LIBRARIES
optimized ${BOOST_ROOT}/lib/libboost_date_time-${BOOST_POSTFIX}
optimized ${BOOST_ROOT}/lib/libboost_iostreams-${BOOST_POSTFIX}
optimized ${BOOST_ROOT}/lib/libboost_filesystem-${BOOST_POSTFIX}
optimized ${BOOST_ROOT}/lib/libboost_regex-${BOOST_POSTFIX}
optimized ${BOOST_ROOT}/lib/libboost_system-${BOOST_POSTFIX}
optimized ${BOOST_ROOT}/lib/libboost_thread-${BOOST_POSTFIX}
optimized ${BOOST_ROOT}/lib/libboost_chrono-${BOOST_POSTFIX}
debug ${BOOST_ROOT}/lib/libboost_date_time-${BOOST_DEBUG_POSTFIX}
debug ${BOOST_ROOT}/lib/libboost_iostreams-${BOOST_DEBUG_POSTFIX}
debug ${BOOST_ROOT}/lib/libboost_filesystem-${BOOST_DEBUG_POSTFIX}
debug ${BOOST_ROOT}/lib/libboost_regex-${BOOST_DEBUG_POSTFIX}
debug ${BOOST_ROOT}/lib/libboost_system-${BOOST_DEBUG_POSTFIX}
debug ${BOOST_ROOT}/lib/libboost_thread-${BOOST_DEBUG_POSTFIX}
debug ${BOOST_ROOT}/lib/libboost_chrono-${BOOST_DEBUG_POSTFIX}
)
if(WITH_CYCLES_OSL)
set(BOOST_LIBRARIES ${BOOST_LIBRARIES}
optimized ${BOOST_ROOT}/lib/libboost_wave-${BOOST_POSTFIX}
debug ${BOOST_ROOT}/lib/libboost_wave-${BOOST_DEBUG_POSTFIX})
endif()
else()
set(__boost_packages iostreams filesystem regex system thread date_time)
if(WITH_CYCLES_OSL)
list(APPEND __boost_packages wave)
endif()
find_package(Boost 1.48 COMPONENTS ${__boost_packages} REQUIRED)
if(NOT Boost_FOUND)
# Try to find non-multithreaded if -mt not found, this flag
# doesn't matter for us, it has nothing to do with thread
# safety, but keep it to not disturb build setups.
set(Boost_USE_MULTITHREADED OFF)
find_package(Boost 1.48 COMPONENTS ${__boost_packages})
endif()
unset(__boost_packages)
set(BOOST_INCLUDE_DIR ${Boost_INCLUDE_DIRS})
set(BOOST_LIBRARIES ${Boost_LIBRARIES})
set(BOOST_LIBPATH ${Boost_LIBRARY_DIRS})
endif()
set(BOOST_DEFINITIONS "-DBOOST_ALL_NO_LIB ${BOOST_DEFINITIONS}")
endif()
###########################################################################
# Embree
###########################################################################
if(CYCLES_STANDALONE_REPOSITORY AND WITH_CYCLES_EMBREE)
if(MSVC AND EXISTS ${_cycles_lib_dir})
set(EMBREE_INCLUDE_DIRS ${EMBREE_ROOT_DIR}/include)
set(EMBREE_LIBRARIES
optimized ${EMBREE_ROOT_DIR}/lib/embree3.lib
optimized ${EMBREE_ROOT_DIR}/lib/embree_avx2.lib
optimized ${EMBREE_ROOT_DIR}/lib/embree_avx.lib
optimized ${EMBREE_ROOT_DIR}/lib/embree_sse42.lib
optimized ${EMBREE_ROOT_DIR}/lib/lexers.lib
optimized ${EMBREE_ROOT_DIR}/lib/math.lib
optimized ${EMBREE_ROOT_DIR}/lib/simd.lib
optimized ${EMBREE_ROOT_DIR}/lib/tasking.lib
optimized ${EMBREE_ROOT_DIR}/lib/sys.lib
debug ${EMBREE_ROOT_DIR}/lib/embree3_d.lib
debug ${EMBREE_ROOT_DIR}/lib/embree_avx2_d.lib
debug ${EMBREE_ROOT_DIR}/lib/embree_avx_d.lib
debug ${EMBREE_ROOT_DIR}/lib/embree_sse42_d.lib
debug ${EMBREE_ROOT_DIR}/lib/lexers_d.lib
debug ${EMBREE_ROOT_DIR}/lib/math_d.lib
debug ${EMBREE_ROOT_DIR}/lib/simd_d.lib
debug ${EMBREE_ROOT_DIR}/lib/sys_d.lib
debug ${EMBREE_ROOT_DIR}/lib/tasking_d.lib
)
else()
find_package(Embree 3.8.0 REQUIRED)
endif()
endif()
###########################################################################
# Logging
###########################################################################
if(CYCLES_STANDALONE_REPOSITORY AND WITH_CYCLES_LOGGING)
find_package(Glog REQUIRED)
find_package(Gflags REQUIRED)
endif()
###########################################################################
# OpenSubdiv
###########################################################################
if(CYCLES_STANDALONE_REPOSITORY AND WITH_CYCLES_OPENSUBDIV)
set(WITH_OPENSUBDIV ON)
if(NOT USD_OVERRIDE_OPENSUBDIV)
if(MSVC AND EXISTS ${_cycles_lib_dir})
set(OPENSUBDIV_INCLUDE_DIRS ${OPENSUBDIV_ROOT_DIR}/include)
set(OPENSUBDIV_LIBRARIES
optimized ${OPENSUBDIV_ROOT_DIR}/lib/osdCPU.lib
optimized ${OPENSUBDIV_ROOT_DIR}/lib/osdGPU.lib
debug ${OPENSUBDIV_ROOT_DIR}/lib/osdCPU_d.lib
debug ${OPENSUBDIV_ROOT_DIR}/lib/osdGPU_d.lib
)
else()
find_package(OpenSubdiv REQUIRED)
endif()
endif()
endif()
###########################################################################
# OpenVDB
###########################################################################
if(CYCLES_STANDALONE_REPOSITORY AND WITH_CYCLES_OPENVDB)
set(WITH_OPENVDB ON)
set(OPENVDB_DEFINITIONS -DNOMINMAX -D_USE_MATH_DEFINES)
if(NOT USD_OVERRIDE_OPENVDB)
find_package(OpenVDB REQUIRED)
if(MSVC AND EXISTS ${_cycles_lib_dir})
set(BLOSC_LIBRARY
optimized ${BLOSC_ROOT_DIR}/lib/libblosc.lib
debug ${BLOSC_ROOT_DIR}/lib/libblosc_d.lib
)
else()
find_package(Blosc REQUIRED)
endif()
endif()
endif()
###########################################################################
# NanoVDB
###########################################################################
if(CYCLES_STANDALONE_REPOSITORY AND WITH_CYCLES_NANOVDB)
set(WITH_NANOVDB ON)
if(MSVC AND EXISTS ${_cycles_lib_dir})
set(NANOVDB_INCLUDE_DIR ${NANOVDB_ROOT_DIR}/include)
set(NANOVDB_INCLUDE_DIRS ${NANOVDB_INCLUDE_DIR})
else()
find_package(NanoVDB REQUIRED)
endif()
endif()
###########################################################################
# OpenImageDenoise
###########################################################################
if(CYCLES_STANDALONE_REPOSITORY AND WITH_CYCLES_OPENIMAGEDENOISE)
set(WITH_OPENIMAGEDENOISE ON)
if(MSVC AND EXISTS ${_cycles_lib_dir})
set(OPENIMAGEDENOISE_INCLUDE_DIRS ${OPENIMAGEDENOISE_ROOT_DIR}/include)
set(OPENIMAGEDENOISE_LIBRARIES
optimized ${OPENIMAGEDENOISE_ROOT_DIR}/lib/OpenImageDenoise.lib
optimized ${OPENIMAGEDENOISE_ROOT_DIR}/lib/common.lib
optimized ${OPENIMAGEDENOISE_ROOT_DIR}/lib/dnnl.lib
debug ${OPENIMAGEDENOISE_ROOT_DIR}/lib/OpenImageDenoise_d.lib
debug ${OPENIMAGEDENOISE_ROOT_DIR}/lib/common_d.lib
debug ${OPENIMAGEDENOISE_ROOT_DIR}/lib/dnnl_d.lib
)
else()
find_package(OpenImageDenoise REQUIRED)
endif()
endif()
###########################################################################
# TBB
###########################################################################
if(CYCLES_STANDALONE_REPOSITORY)
if(NOT USD_OVERRIDE_TBB)
if(MSVC AND EXISTS ${_cycles_lib_dir})
set(TBB_INCLUDE_DIRS ${TBB_ROOT_DIR}/include)
set(TBB_LIBRARIES
optimized ${TBB_ROOT_DIR}/lib/tbb.lib
debug ${TBB_ROOT_DIR}/lib/tbb_debug.lib
)
else()
find_package(TBB REQUIRED)
endif()
endif()
endif()
###########################################################################
# Epoxy
###########################################################################
if(CYCLES_STANDALONE_REPOSITORY)
if((WITH_CYCLES_STANDALONE AND WITH_CYCLES_STANDALONE_GUI) OR
WITH_CYCLES_HYDRA_RENDER_DELEGATE)
if(MSVC AND EXISTS ${_cycles_lib_dir})
set(Epoxy_LIBRARIES "${_cycles_lib_dir}/epoxy/lib/epoxy.lib")
set(Epoxy_INCLUDE_DIRS "${_cycles_lib_dir}/epoxy/include")
else()
find_package(Epoxy REQUIRED)
endif()
endif()
endif()
###########################################################################
# Alembic
###########################################################################
if(WITH_CYCLES_ALEMBIC)
if(CYCLES_STANDALONE_REPOSITORY)
if(MSVC AND EXISTS ${_cycles_lib_dir})
set(ALEMBIC_INCLUDE_DIRS ${_cycles_lib_dir}/alembic/include)
set(ALEMBIC_LIBRARIES
optimized ${_cycles_lib_dir}/alembic/lib/Alembic.lib
debug ${_cycles_lib_dir}/alembic/lib/Alembic_d.lib)
else()
find_package(Alembic REQUIRED)
endif()
set(WITH_ALEMBIC ON)
endif()
endif()
###########################################################################
# System Libraries
###########################################################################
# Detect system libraries again
if(EXISTS ${_cycles_lib_dir})
unset(CMAKE_IGNORE_PATH)
unset(_cycles_lib_dir)
endif()
###########################################################################
# SDL
###########################################################################
@@ -109,3 +687,5 @@ if(WITH_CYCLES_DEVICE_ONEAPI)
set(WITH_CYCLES_DEVICE_ONEAPI OFF)
endif()
endif()
unset(_cycles_lib_dir)


@@ -8,28 +8,13 @@ set(INC
set(INC_SYS )
if(WITH_CYCLES_DEVICE_OPTIX OR WITH_CYCLES_DEVICE_CUDA)
if(WITH_CUDA_DYNLOAD)
list(APPEND INC
../../../extern/cuew/include
)
add_definitions(-DWITH_CUDA_DYNLOAD)
else()
list(APPEND INC_SYS
${CUDA_TOOLKIT_INCLUDE}
)
if(NOT WITH_CUDA_DYNLOAD)
add_definitions(-DCYCLES_CUDA_NVCC_EXECUTABLE="${CUDA_NVCC_EXECUTABLE}")
endif()
add_definitions(-DCYCLES_RUNTIME_OPTIX_ROOT_DIR="${CYCLES_RUNTIME_OPTIX_ROOT_DIR}")
endif()
if(WITH_CYCLES_DEVICE_HIP AND WITH_HIP_DYNLOAD)
list(APPEND INC
../../../extern/hipew/include
)
add_definitions(-DWITH_HIP_DYNLOAD)
endif()
set(SRC_BASE
device.cpp
denoise.cpp
@@ -168,24 +153,15 @@ if(WITH_CYCLES_DEVICE_HIP AND WITH_HIP_DYNLOAD)
)
endif()
if(WITH_CYCLES_DEVICE_CUDA)
add_definitions(-DWITH_CUDA)
endif()
if(WITH_CYCLES_DEVICE_HIP)
add_definitions(-DWITH_HIP)
endif()
if(WITH_CYCLES_DEVICE_OPTIX)
add_definitions(-DWITH_OPTIX)
endif()
if(WITH_CYCLES_DEVICE_METAL)
list(APPEND LIB
${METAL_LIBRARY}
)
add_definitions(-DWITH_METAL)
list(APPEND SRC
${SRC_METAL}
)
endif()
if (WITH_CYCLES_DEVICE_ONEAPI)
if(WITH_CYCLES_ONEAPI_BINARIES)
set(cycles_kernel_oneapi_lib_suffix "_aot")
@@ -203,7 +179,6 @@ if (WITH_CYCLES_DEVICE_ONEAPI)
else()
list(APPEND LIB ${SYCL_LIBRARY})
endif()
add_definitions(-DWITH_ONEAPI)
list(APPEND SRC
${SRC_ONEAPI}
)


@@ -78,24 +78,4 @@ class DenoiseParams : public Node {
}
};
/* All the parameters needed to perform buffer denoising on a device.
* Is not really a task in its canonical terms (as in, is not an asynchronous running task). Is
* more like a wrapper for all the arguments and parameters needed to perform denoising. Is a
* single place where they are all listed, so that it's not required to modify all device methods
* when these parameters do change. */
class DeviceDenoiseTask {
public:
DenoiseParams params;
int num_samples;
RenderBuffers *render_buffers;
BufferParams buffer_params;
/* Allow to do in-place modification of the input passes (scaling them down i.e.). This will
* lower the memory footprint of the denoiser but will make input passes "invalid" (from path
* tracer) point of view. */
bool allow_inplace_modification;
};
CCL_NAMESPACE_END


@@ -160,6 +160,11 @@ class Device {
return true;
}
virtual bool load_osl_kernels()
{
return true;
}
/* GPU device only functions.
* These may not be used on CPU or multi-devices. */
@@ -228,21 +233,6 @@ class Device {
return nullptr;
}
/* Buffer denoising. */
/* Returns true if task is fully handled. */
virtual bool denoise_buffer(const DeviceDenoiseTask & /*task*/)
{
LOG(ERROR) << "Request buffer denoising from a device which does not support it.";
return false;
}
virtual DeviceQueue *get_denoise_queue()
{
LOG(ERROR) << "Request denoising queue from a device which does not support it.";
return nullptr;
}
/* Sub-devices */
/* Run given callback for every individual device which will be handling rendering.


@@ -7,6 +7,30 @@
CCL_NAMESPACE_BEGIN
bool device_kernel_has_shading(DeviceKernel kernel)
{
return (kernel == DEVICE_KERNEL_INTEGRATOR_SHADE_BACKGROUND ||
kernel == DEVICE_KERNEL_INTEGRATOR_SHADE_LIGHT ||
kernel == DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE ||
kernel == DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE_RAYTRACE ||
kernel == DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE_MNEE ||
kernel == DEVICE_KERNEL_INTEGRATOR_SHADE_VOLUME ||
kernel == DEVICE_KERNEL_INTEGRATOR_SHADE_SHADOW ||
kernel == DEVICE_KERNEL_SHADER_EVAL_DISPLACE ||
kernel == DEVICE_KERNEL_SHADER_EVAL_BACKGROUND ||
kernel == DEVICE_KERNEL_SHADER_EVAL_CURVE_SHADOW_TRANSPARENCY);
}
bool device_kernel_has_intersection(DeviceKernel kernel)
{
return (kernel == DEVICE_KERNEL_INTEGRATOR_INTERSECT_CLOSEST ||
kernel == DEVICE_KERNEL_INTEGRATOR_INTERSECT_SHADOW ||
kernel == DEVICE_KERNEL_INTEGRATOR_INTERSECT_SUBSURFACE ||
kernel == DEVICE_KERNEL_INTEGRATOR_INTERSECT_VOLUME_STACK ||
kernel == DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE_RAYTRACE ||
kernel == DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE_MNEE);
}
const char *device_kernel_as_string(DeviceKernel kernel)
{
switch (kernel) {


@@ -11,6 +11,9 @@
CCL_NAMESPACE_BEGIN
bool device_kernel_has_shading(DeviceKernel kernel);
bool device_kernel_has_intersection(DeviceKernel kernel);
const char *device_kernel_as_string(DeviceKernel kernel);
std::ostream &operator<<(std::ostream &os, DeviceKernel kernel);


@@ -45,6 +45,36 @@ bool kernel_has_intersection(DeviceKernel device_kernel)
struct ShaderCache {
ShaderCache(id<MTLDevice> _mtlDevice) : mtlDevice(_mtlDevice)
{
/* Initialize occupancy tuning LUT. */
if (MetalInfo::get_device_vendor(mtlDevice) == METAL_GPU_APPLE) {
switch (MetalInfo::get_apple_gpu_architecture(mtlDevice)) {
default:
case APPLE_M2:
occupancy_tuning[DEVICE_KERNEL_INTEGRATOR_COMPACT_SHADOW_STATES] = {32, 32};
occupancy_tuning[DEVICE_KERNEL_INTEGRATOR_INIT_FROM_CAMERA] = {832, 32};
occupancy_tuning[DEVICE_KERNEL_INTEGRATOR_INTERSECT_CLOSEST] = {64, 64};
occupancy_tuning[DEVICE_KERNEL_INTEGRATOR_INTERSECT_SHADOW] = {64, 64};
occupancy_tuning[DEVICE_KERNEL_INTEGRATOR_INTERSECT_SUBSURFACE] = {704, 32};
occupancy_tuning[DEVICE_KERNEL_INTEGRATOR_QUEUED_PATHS_ARRAY] = {1024, 256};
occupancy_tuning[DEVICE_KERNEL_INTEGRATOR_SHADE_BACKGROUND] = {64, 32};
occupancy_tuning[DEVICE_KERNEL_INTEGRATOR_SHADE_SHADOW] = {256, 256};
occupancy_tuning[DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE] = {448, 384};
occupancy_tuning[DEVICE_KERNEL_INTEGRATOR_SORTED_PATHS_ARRAY] = {1024, 1024};
break;
case APPLE_M1:
occupancy_tuning[DEVICE_KERNEL_INTEGRATOR_COMPACT_SHADOW_STATES] = {256, 128};
occupancy_tuning[DEVICE_KERNEL_INTEGRATOR_INIT_FROM_CAMERA] = {768, 32};
occupancy_tuning[DEVICE_KERNEL_INTEGRATOR_INTERSECT_CLOSEST] = {512, 128};
occupancy_tuning[DEVICE_KERNEL_INTEGRATOR_INTERSECT_SHADOW] = {384, 128};
occupancy_tuning[DEVICE_KERNEL_INTEGRATOR_INTERSECT_SUBSURFACE] = {512, 64};
occupancy_tuning[DEVICE_KERNEL_INTEGRATOR_QUEUED_PATHS_ARRAY] = {512, 256};
occupancy_tuning[DEVICE_KERNEL_INTEGRATOR_SHADE_BACKGROUND] = {512, 128};
occupancy_tuning[DEVICE_KERNEL_INTEGRATOR_SHADE_SHADOW] = {384, 32};
occupancy_tuning[DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE] = {576, 384};
occupancy_tuning[DEVICE_KERNEL_INTEGRATOR_SORTED_PATHS_ARRAY] = {832, 832};
break;
}
}
}
~ShaderCache();
@@ -73,6 +103,11 @@ struct ShaderCache {
std::function<void(MetalKernelPipeline *)> completionHandler;
};
struct OccupancyTuningParameters {
int threads_per_threadgroup = 0;
int num_threads_per_block = 0;
} occupancy_tuning[DEVICE_KERNEL_NUM];
std::mutex cache_mutex;
PipelineCollection pipelines[DEVICE_KERNEL_NUM];
@@ -230,6 +265,13 @@ void ShaderCache::load_kernel(DeviceKernel device_kernel,
request.pipeline->device_kernel = device_kernel;
request.pipeline->threads_per_threadgroup = device->max_threads_per_threadgroup;
if (occupancy_tuning[device_kernel].threads_per_threadgroup) {
request.pipeline->threads_per_threadgroup =
occupancy_tuning[device_kernel].threads_per_threadgroup;
request.pipeline->num_threads_per_block =
occupancy_tuning[device_kernel].num_threads_per_block;
}
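When a tuned entry exists, both values flow into kernel dispatch. A minimal sketch of the threadgroup math this enables, assuming a simple 1D dispatch (illustrative only, not the exact Metal dispatch code):
/* Hypothetical 1D dispatch: pad the grid up to whole threadgroups. */
static int num_threadgroups(int work_size, int threads_per_threadgroup)
{
  return (work_size + threads_per_threadgroup - 1) / threads_per_threadgroup;
}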
/* metalrt options */
request.pipeline->use_metalrt = device->use_metalrt;
request.pipeline->metalrt_features = device->use_metalrt ?
@@ -384,13 +426,6 @@ void MetalKernelPipeline::compile()
const std::string function_name = std::string("cycles_metal_") +
device_kernel_as_string(device_kernel);
int threads_per_threadgroup = this->threads_per_threadgroup;
if (device_kernel > DEVICE_KERNEL_INTEGRATOR_MEGAKERNEL &&
device_kernel < DEVICE_KERNEL_INTEGRATOR_RESET) {
/* Always use 512 for the sorting kernels */
threads_per_threadgroup = 512;
}
NSString *entryPoint = [@(function_name.c_str()) copy];
NSError *error = NULL;
@@ -601,7 +636,9 @@ void MetalKernelPipeline::compile()
metalbin_path = path_cache_get(path_join("kernels", metalbin_name));
path_create_directories(metalbin_path);
if (path_exists(metalbin_path) && use_binary_archive) {
/* Retrieve shader binary from disk, and update the file timestamp for LRU purging to work as
* intended. */
if (use_binary_archive && path_cache_kernel_exists_and_mark_used(metalbin_path)) {
if (@available(macOS 11.0, *)) {
MTLBinaryArchiveDescriptor *archiveDesc = [[MTLBinaryArchiveDescriptor alloc] init];
archiveDesc.url = [NSURL fileURLWithPath:@(metalbin_path.c_str())];
@@ -662,12 +699,14 @@ void MetalKernelPipeline::compile()
return;
}
int num_threads_per_block = round_down(computePipelineState.maxTotalThreadsPerThreadgroup,
computePipelineState.threadExecutionWidth);
num_threads_per_block = std::max(num_threads_per_block,
(int)computePipelineState.threadExecutionWidth);
if (!num_threads_per_block) {
num_threads_per_block = round_down(computePipelineState.maxTotalThreadsPerThreadgroup,
computePipelineState.threadExecutionWidth);
num_threads_per_block = std::max(num_threads_per_block,
(int)computePipelineState.threadExecutionWidth);
}
this->pipeline = computePipelineState;
this->num_threads_per_block = num_threads_per_block;
if (@available(macOS 11.0, *)) {
if (creating_new_archive || recreate_archive) {
@@ -676,6 +715,9 @@ void MetalKernelPipeline::compile()
metal_printf("Failed to save binary archive, error:\n%s\n",
[[error localizedDescription] UTF8String]);
}
else {
path_cache_kernel_mark_added_and_clear_old(metalbin_path);
}
}
}
};
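The binary-archive path now participates in an LRU scheme: an existing archive has its timestamp bumped on use, and a newly added one triggers purging of stale entries. A rough sketch of timestamp-based LRU purging, under the assumption that path_cache_kernel_exists_and_mark_used() bumps the file's mtime and path_cache_kernel_mark_added_and_clear_old() evicts the oldest files once the cache exceeds a budget (helper below is hypothetical):
#include <algorithm>
#include <cstdint>
#include <filesystem>
#include <vector>

static void purge_oldest(const std::filesystem::path &cache_dir, std::uintmax_t max_bytes)
{
  namespace fs = std::filesystem;
  std::vector<fs::directory_entry> entries;
  std::uintmax_t total = 0;
  for (const fs::directory_entry &e : fs::directory_iterator(cache_dir)) {
    if (e.is_regular_file()) {
      entries.push_back(e);
      total += e.file_size();
    }
  }
  /* Oldest (least recently used) files first. */
  std::sort(entries.begin(), entries.end(), [](const fs::directory_entry &a, const fs::directory_entry &b) {
    return a.last_write_time() < b.last_write_time();
  });
  for (const fs::directory_entry &e : entries) {
    if (total <= max_bytes) {
      break;
    }
    total -= e.file_size();
    fs::remove(e.path());
  }
}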


@@ -138,6 +138,15 @@ class MultiDevice : public Device {
return true;
}
bool load_osl_kernels() override
{
foreach (SubDevice &sub, devices)
if (!sub.device->load_osl_kernels())
return false;
return true;
}
void build_bvh(BVH *bvh, Progress &progress, bool refit) override
{
/* Try to build and share a single acceleration structure, if possible */
@@ -204,10 +213,12 @@ class MultiDevice : public Device {
virtual void *get_cpu_osl_memory() override
{
if (devices.size() > 1) {
/* Always return the OSL memory of the CPU device (this works since the constructor above
* guarantees that CPU devices are always added to the back). */
if (devices.size() > 1 && devices.back().device->info.type != DEVICE_CPU) {
return NULL;
}
return devices.front().device->get_cpu_osl_memory();
return devices.back().device->get_cpu_osl_memory();
}
bool is_resident(device_ptr key, Device *sub_device) override


@@ -31,6 +31,8 @@ bool device_oneapi_init()
* improves stability as of intel/LLVM SYCL-nightly/20220529.
* All these env variables can be set beforehand by end-users and
* will in that case -not- be overwritten. */
/* By default, enable only Level-Zero; if all devices are allowed, also enable CUDA and HIP.
* The OpenCL backend isn't currently well supported. */
# ifdef _WIN32
if (getenv("SYCL_CACHE_PERSISTENT") == nullptr) {
_putenv_s("SYCL_CACHE_PERSISTENT", "1");
@@ -39,7 +41,12 @@ bool device_oneapi_init()
_putenv_s("SYCL_CACHE_THRESHOLD", "0");
}
if (getenv("SYCL_DEVICE_FILTER") == nullptr) {
_putenv_s("SYCL_DEVICE_FILTER", "level_zero");
if (getenv("CYCLES_ONEAPI_ALL_DEVICES") == nullptr) {
_putenv_s("SYCL_DEVICE_FILTER", "level_zero");
}
else {
_putenv_s("SYCL_DEVICE_FILTER", "level_zero,cuda,hip");
}
}
if (getenv("SYCL_ENABLE_PCI") == nullptr) {
_putenv_s("SYCL_ENABLE_PCI", "1");
@@ -50,7 +57,12 @@ bool device_oneapi_init()
# elif __linux__
setenv("SYCL_CACHE_PERSISTENT", "1", false);
setenv("SYCL_CACHE_THRESHOLD", "0", false);
setenv("SYCL_DEVICE_FILTER", "level_zero", false);
if (getenv("CYCLES_ONEAPI_ALL_DEVICES") == nullptr) {
setenv("SYCL_DEVICE_FILTER", "level_zero", false);
}
else {
setenv("SYCL_DEVICE_FILTER", "level_zero,cuda,hip", false);
}
setenv("SYCL_ENABLE_PCI", "1", false);
setenv("SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE_FOR_IN_ORDER_QUEUE", "0", false);
# endif
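The getenv-then-set pattern above repeats for every variable on both platforms; a hypothetical helper could factor it out. This is a sketch, not code Cycles actually ships:
#include <cstdlib>

/* Set an environment variable only if the user has not already set it. */
static void set_default_env(const char *name, const char *value)
{
#ifdef _WIN32
  if (getenv(name) == nullptr) {
    _putenv_s(name, value);
  }
#else
  setenv(name, value, false); /* Final argument: do not overwrite. */
#endif
}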


@@ -430,9 +430,9 @@ void OneapiDevice::check_usm(SyclQueue *queue_, const void *usm_ptr, bool allow_
sycl::usm::alloc usm_type = get_pointer_type(usm_ptr, queue->get_context());
(void)usm_type;
assert(usm_type == sycl::usm::alloc::device ||
((device_type == sycl::info::device_type::cpu || allow_host) &&
usm_type == sycl::usm::alloc::host ||
usm_type == sycl::usm::alloc::unknown));
(usm_type == sycl::usm::alloc::host &&
(allow_host || device_type == sycl::info::device_type::cpu)) ||
usm_type == sycl::usm::alloc::unknown);
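The rewritten assertion mostly adds explicit grouping: `&&` already bound tighter than `||`, so the accepted set of allocation kinds is the same, but the parenthesized form reads unambiguously and avoids mixed-operator warnings. The equivalent predicate, as a sketch (SYCL types as used in the surrounding file):
static bool usm_type_is_valid(const sycl::usm::alloc usm_type,
                              const sycl::info::device_type device_type,
                              const bool allow_host)
{
  if (usm_type == sycl::usm::alloc::device || usm_type == sycl::usm::alloc::unknown) {
    return true;
  }
  /* Host allocations are only valid on CPU devices or when explicitly allowed. */
  return usm_type == sycl::usm::alloc::host &&
         (allow_host || device_type == sycl::info::device_type::cpu);
}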
# else
/* Silence warning about unused arguments. */
(void)queue_;


@@ -9,6 +9,10 @@
#include "util/log.h"
#ifdef WITH_OSL
# include <OSL/oslversion.h>
#endif
#ifdef WITH_OPTIX
# include <optix_function_table_definition.h>
#endif
@@ -65,6 +69,9 @@ void device_optix_info(const vector<DeviceInfo> &cuda_devices, vector<DeviceInfo
info.type = DEVICE_OPTIX;
info.id += "_OptiX";
# if defined(WITH_OSL) && (OSL_VERSION_MINOR >= 13 || OSL_VERSION_MAJOR > 1)
info.has_osl = true;
# endif
info.denoisers |= DENOISER_OPTIX;
devices.push_back(info);

File diff suppressed because it is too large.


@@ -1,16 +1,14 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2019, NVIDIA Corporation.
* Copyright 2019-2022 Blender Foundation. */
* Copyright 2019, NVIDIA Corporation
* Copyright 2019-2022 Blender Foundation */
#pragma once
#ifdef WITH_OPTIX
# include "device/cuda/device_impl.h"
# include "device/optix/queue.h"
# include "device/optix/util.h"
# include "kernel/types.h"
# include "util/unique_ptr.h"
# include "kernel/osl/globals.h"
CCL_NAMESPACE_BEGIN
@@ -23,8 +21,16 @@ enum {
PG_RGEN_INTERSECT_SHADOW,
PG_RGEN_INTERSECT_SUBSURFACE,
PG_RGEN_INTERSECT_VOLUME_STACK,
PG_RGEN_SHADE_BACKGROUND,
PG_RGEN_SHADE_LIGHT,
PG_RGEN_SHADE_SURFACE,
PG_RGEN_SHADE_SURFACE_RAYTRACE,
PG_RGEN_SHADE_SURFACE_MNEE,
PG_RGEN_SHADE_VOLUME,
PG_RGEN_SHADE_SHADOW,
PG_RGEN_EVAL_DISPLACE,
PG_RGEN_EVAL_BACKGROUND,
PG_RGEN_EVAL_CURVE_SHADOW_TRANSPARENCY,
PG_MISS,
PG_HITD, /* Default hit group. */
PG_HITS, /* __SHADOW_RECORD_ALL__ hit group. */
@@ -40,14 +46,14 @@ enum {
};
static const int MISS_PROGRAM_GROUP_OFFSET = PG_MISS;
static const int NUM_MIS_PROGRAM_GROUPS = 1;
static const int NUM_MISS_PROGRAM_GROUPS = 1;
static const int HIT_PROGAM_GROUP_OFFSET = PG_HITD;
static const int NUM_HIT_PROGRAM_GROUPS = 8;
static const int CALLABLE_PROGRAM_GROUPS_BASE = PG_CALL_SVM_AO;
static const int NUM_CALLABLE_PROGRAM_GROUPS = 2;
/* List of OptiX pipelines. */
enum { PIP_SHADE_RAYTRACE, PIP_SHADE_MNEE, PIP_INTERSECT, NUM_PIPELINES };
enum { PIP_SHADE, PIP_INTERSECT, NUM_PIPELINES };
/* A single shader binding table entry. */
struct SbtRecord {
@@ -61,52 +67,35 @@ class OptiXDevice : public CUDADevice {
OptixModule optix_module = NULL; /* All necessary OptiX kernels are in one module. */
OptixModule builtin_modules[2] = {};
OptixPipeline pipelines[NUM_PIPELINES] = {};
OptixProgramGroup groups[NUM_PROGRAM_GROUPS] = {};
OptixPipelineCompileOptions pipeline_options = {};
bool motion_blur = false;
device_vector<SbtRecord> sbt_data;
device_only_memory<KernelParamsOptiX> launch_params;
OptixTraversableHandle tlas_handle = 0;
# ifdef WITH_OSL
OSLGlobals osl_globals;
vector<OptixModule> osl_modules;
vector<OptixProgramGroup> osl_groups;
# endif
private:
OptixTraversableHandle tlas_handle = 0;
vector<unique_ptr<device_only_memory<char>>> delayed_free_bvh_memory;
thread_mutex delayed_free_bvh_mutex;
class Denoiser {
public:
explicit Denoiser(OptiXDevice *device);
OptiXDevice *device;
OptiXDeviceQueue queue;
OptixDenoiser optix_denoiser = nullptr;
/* Configuration size, as provided to `optixDenoiserSetup`.
* If `optixDenoiserSetup()` has never been called on the current `optix_denoiser`,
* `is_configured` will be false. */
bool is_configured = false;
int2 configured_size = make_int2(0, 0);
/* OptiX denoiser state and scratch buffers, stored in a single memory buffer.
* The memory layout is as follows: [denoiser state][scratch buffer]. */
device_only_memory<unsigned char> state;
OptixDenoiserSizes sizes = {};
bool use_pass_albedo = false;
bool use_pass_normal = false;
bool use_pass_flow = false;
};
Denoiser denoiser_;
public:
OptiXDevice(const DeviceInfo &info, Stats &stats, Profiler &profiler);
~OptiXDevice();
private:
BVHLayoutMask get_bvh_layout_mask() const override;
string compile_kernel_get_common_cflags(const uint kernel_features);
bool load_kernels(const uint kernel_features) override;
bool load_osl_kernels() override;
bool build_optix_bvh(BVHOptiX *bvh,
OptixBuildOperation operation,
const OptixBuildInput &build_input,
@@ -123,52 +112,7 @@ class OptiXDevice : public CUDADevice {
virtual unique_ptr<DeviceQueue> gpu_queue_create() override;
/* --------------------------------------------------------------------
* Denoising.
*/
class DenoiseContext;
class DenoisePass;
virtual bool denoise_buffer(const DeviceDenoiseTask &task) override;
virtual DeviceQueue *get_denoise_queue() override;
/* Read guiding passes from the render buffers, preprocess them in a way which is expected by
* OptiX and store in the guiding passes memory within the given context.
*
* Pre-processing of the guiding passes is only to happen once per context lifetime. Do not
* preprocess them for every pass which is being denoised. */
bool denoise_filter_guiding_preprocess(DenoiseContext &context);
/* Set fake albedo pixels in the albedo guiding pass storage.
* After this point only passes which do not need albedo for denoising can be processed. */
bool denoise_filter_guiding_set_fake_albedo(DenoiseContext &context);
void denoise_pass(DenoiseContext &context, PassType pass_type);
/* Read input color pass from the render buffer into the memory which corresponds to the noisy
* input within the given context. Pixels are scaled to the number of samples, but are not
* preprocessed yet. */
void denoise_color_read(DenoiseContext &context, const DenoisePass &pass);
/* Run corresponding filter kernels, preparing data for the denoiser or copying data from the
* denoiser result to the render buffer. */
bool denoise_filter_color_preprocess(DenoiseContext &context, const DenoisePass &pass);
bool denoise_filter_color_postprocess(DenoiseContext &context, const DenoisePass &pass);
/* Make sure the OptiX denoiser is created and configured. */
bool denoise_ensure(DenoiseContext &context);
/* Create OptiX denoiser descriptor if needed.
* Will do nothing if the current OptiX descriptor is usable for the given parameters.
* If the OptiX denoiser descriptor did re-allocate here it is left unconfigured. */
bool denoise_create_if_needed(DenoiseContext &context);
/* Configure existing OptiX denoiser descriptor for the use for the given task. */
bool denoise_configure_if_needed(DenoiseContext &context);
/* Run configured denoiser. */
bool denoise_run(DenoiseContext &context, const DenoisePass &pass);
void *get_cpu_osl_memory() override;
};
CCL_NAMESPACE_END


@@ -24,21 +24,33 @@ void OptiXDeviceQueue::init_execution()
CUDADeviceQueue::init_execution();
}
static bool is_optix_specific_kernel(DeviceKernel kernel)
static bool is_optix_specific_kernel(DeviceKernel kernel, bool use_osl)
{
return (kernel == DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE_RAYTRACE ||
kernel == DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE_MNEE ||
kernel == DEVICE_KERNEL_INTEGRATOR_INTERSECT_CLOSEST ||
kernel == DEVICE_KERNEL_INTEGRATOR_INTERSECT_SHADOW ||
kernel == DEVICE_KERNEL_INTEGRATOR_INTERSECT_SUBSURFACE ||
kernel == DEVICE_KERNEL_INTEGRATOR_INTERSECT_VOLUME_STACK);
# ifdef WITH_OSL
/* OSL uses direct callables to execute, so shading needs to be done in OptiX if OSL is used. */
if (use_osl && device_kernel_has_shading(kernel)) {
return true;
}
# else
(void)use_osl;
# endif
return device_kernel_has_intersection(kernel);
}
bool OptiXDeviceQueue::enqueue(DeviceKernel kernel,
const int work_size,
DeviceKernelArguments const &args)
{
if (!is_optix_specific_kernel(kernel)) {
OptiXDevice *const optix_device = static_cast<OptiXDevice *>(cuda_device_);
# ifdef WITH_OSL
const bool use_osl = static_cast<OSLGlobals *>(optix_device->get_cpu_osl_memory())->use;
# else
const bool use_osl = false;
# endif
if (!is_optix_specific_kernel(kernel, use_osl)) {
return CUDADeviceQueue::enqueue(kernel, work_size, args);
}
@@ -50,8 +62,6 @@ bool OptiXDeviceQueue::enqueue(DeviceKernel kernel,
const CUDAContextScope scope(cuda_device_);
OptiXDevice *const optix_device = static_cast<OptiXDevice *>(cuda_device_);
const device_ptr sbt_data_ptr = optix_device->sbt_data.device_pointer;
const device_ptr launch_params_ptr = optix_device->launch_params.device_pointer;
@@ -62,9 +72,7 @@ bool OptiXDeviceQueue::enqueue(DeviceKernel kernel,
sizeof(device_ptr),
cuda_stream_));
if (kernel == DEVICE_KERNEL_INTEGRATOR_INTERSECT_CLOSEST ||
kernel == DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE_RAYTRACE ||
kernel == DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE_MNEE) {
if (kernel == DEVICE_KERNEL_INTEGRATOR_INTERSECT_CLOSEST || device_kernel_has_shading(kernel)) {
cuda_device_assert(
cuda_device_,
cuMemcpyHtoDAsync(launch_params_ptr + offsetof(KernelParamsOptiX, render_buffer),
@@ -72,6 +80,15 @@ bool OptiXDeviceQueue::enqueue(DeviceKernel kernel,
sizeof(device_ptr),
cuda_stream_));
}
if (kernel == DEVICE_KERNEL_SHADER_EVAL_DISPLACE ||
kernel == DEVICE_KERNEL_SHADER_EVAL_BACKGROUND ||
kernel == DEVICE_KERNEL_SHADER_EVAL_CURVE_SHADOW_TRANSPARENCY) {
cuda_device_assert(cuda_device_,
cuMemcpyHtoDAsync(launch_params_ptr + offsetof(KernelParamsOptiX, offset),
args.values[2], // &d_offset
sizeof(int32_t),
cuda_stream_));
}
cuda_device_assert(cuda_device_, cuStreamSynchronize(cuda_stream_));
@@ -79,14 +96,35 @@ bool OptiXDeviceQueue::enqueue(DeviceKernel kernel,
OptixShaderBindingTable sbt_params = {};
switch (kernel) {
case DEVICE_KERNEL_INTEGRATOR_SHADE_BACKGROUND:
pipeline = optix_device->pipelines[PIP_SHADE];
sbt_params.raygenRecord = sbt_data_ptr + PG_RGEN_SHADE_BACKGROUND * sizeof(SbtRecord);
break;
case DEVICE_KERNEL_INTEGRATOR_SHADE_LIGHT:
pipeline = optix_device->pipelines[PIP_SHADE];
sbt_params.raygenRecord = sbt_data_ptr + PG_RGEN_SHADE_LIGHT * sizeof(SbtRecord);
break;
case DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE:
pipeline = optix_device->pipelines[PIP_SHADE];
sbt_params.raygenRecord = sbt_data_ptr + PG_RGEN_SHADE_SURFACE * sizeof(SbtRecord);
break;
case DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE_RAYTRACE:
pipeline = optix_device->pipelines[PIP_SHADE_RAYTRACE];
pipeline = optix_device->pipelines[PIP_SHADE];
sbt_params.raygenRecord = sbt_data_ptr + PG_RGEN_SHADE_SURFACE_RAYTRACE * sizeof(SbtRecord);
break;
case DEVICE_KERNEL_INTEGRATOR_SHADE_SURFACE_MNEE:
pipeline = optix_device->pipelines[PIP_SHADE_MNEE];
pipeline = optix_device->pipelines[PIP_SHADE];
sbt_params.raygenRecord = sbt_data_ptr + PG_RGEN_SHADE_SURFACE_MNEE * sizeof(SbtRecord);
break;
case DEVICE_KERNEL_INTEGRATOR_SHADE_VOLUME:
pipeline = optix_device->pipelines[PIP_SHADE];
sbt_params.raygenRecord = sbt_data_ptr + PG_RGEN_SHADE_VOLUME * sizeof(SbtRecord);
break;
case DEVICE_KERNEL_INTEGRATOR_SHADE_SHADOW:
pipeline = optix_device->pipelines[PIP_SHADE];
sbt_params.raygenRecord = sbt_data_ptr + PG_RGEN_SHADE_SHADOW * sizeof(SbtRecord);
break;
case DEVICE_KERNEL_INTEGRATOR_INTERSECT_CLOSEST:
pipeline = optix_device->pipelines[PIP_INTERSECT];
sbt_params.raygenRecord = sbt_data_ptr + PG_RGEN_INTERSECT_CLOSEST * sizeof(SbtRecord);
@@ -104,6 +142,20 @@ bool OptiXDeviceQueue::enqueue(DeviceKernel kernel,
sbt_params.raygenRecord = sbt_data_ptr + PG_RGEN_INTERSECT_VOLUME_STACK * sizeof(SbtRecord);
break;
case DEVICE_KERNEL_SHADER_EVAL_DISPLACE:
pipeline = optix_device->pipelines[PIP_SHADE];
sbt_params.raygenRecord = sbt_data_ptr + PG_RGEN_EVAL_DISPLACE * sizeof(SbtRecord);
break;
case DEVICE_KERNEL_SHADER_EVAL_BACKGROUND:
pipeline = optix_device->pipelines[PIP_SHADE];
sbt_params.raygenRecord = sbt_data_ptr + PG_RGEN_EVAL_BACKGROUND * sizeof(SbtRecord);
break;
case DEVICE_KERNEL_SHADER_EVAL_CURVE_SHADOW_TRANSPARENCY:
pipeline = optix_device->pipelines[PIP_SHADE];
sbt_params.raygenRecord = sbt_data_ptr +
PG_RGEN_EVAL_CURVE_SHADOW_TRANSPARENCY * sizeof(SbtRecord);
break;
default:
LOG(ERROR) << "Invalid kernel " << device_kernel_as_string(kernel)
<< " is attempted to be enqueued.";
@@ -112,7 +164,7 @@ bool OptiXDeviceQueue::enqueue(DeviceKernel kernel,
sbt_params.missRecordBase = sbt_data_ptr + MISS_PROGRAM_GROUP_OFFSET * sizeof(SbtRecord);
sbt_params.missRecordStrideInBytes = sizeof(SbtRecord);
sbt_params.missRecordCount = NUM_MIS_PROGRAM_GROUPS;
sbt_params.missRecordCount = NUM_MISS_PROGRAM_GROUPS;
sbt_params.hitgroupRecordBase = sbt_data_ptr + HIT_PROGAM_GROUP_OFFSET * sizeof(SbtRecord);
sbt_params.hitgroupRecordStrideInBytes = sizeof(SbtRecord);
sbt_params.hitgroupRecordCount = NUM_HIT_PROGRAM_GROUPS;
@@ -120,6 +172,12 @@ bool OptiXDeviceQueue::enqueue(DeviceKernel kernel,
sbt_params.callablesRecordCount = NUM_CALLABLE_PROGRAM_GROUPS;
sbt_params.callablesRecordStrideInBytes = sizeof(SbtRecord);
# ifdef WITH_OSL
if (use_osl) {
sbt_params.callablesRecordCount += static_cast<unsigned int>(optix_device->osl_groups.size());
}
# endif
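Every case in the switch above computes its raygen record with the same offset arithmetic; as a sketch (hypothetical helper), with record indices following the PG_* enum order:
/* Hypothetical: address of the raygen SBT record for a program group. */
static device_ptr raygen_record_address(device_ptr sbt_data_ptr, int pg_index)
{
  return sbt_data_ptr + pg_index * sizeof(SbtRecord);
}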
/* Launch the ray generation program. */
optix_device_assert(optix_device,
optixLaunch(pipeline,


@@ -8,7 +8,7 @@ set(INC
set(SRC
adaptive_sampling.cpp
denoiser.cpp
denoiser_device.cpp
denoiser_gpu.cpp
denoiser_oidn.cpp
denoiser_optix.cpp
path_trace.cpp
@@ -30,7 +30,7 @@ set(SRC
set(SRC_HEADERS
adaptive_sampling.h
denoiser.h
denoiser_device.h
denoiser_gpu.h
denoiser_oidn.h
denoiser_optix.h
path_trace.h


@@ -16,9 +16,11 @@ unique_ptr<Denoiser> Denoiser::create(Device *path_trace_device, const DenoisePa
{
DCHECK(params.use);
#ifdef WITH_OPTIX
if (params.type == DENOISER_OPTIX && Device::available_devices(DEVICE_MASK_OPTIX).size()) {
return make_unique<OptiXDenoiser>(path_trace_device, params);
}
#endif
/* Always fallback to OIDN. */
DenoiseParams oidn_params = params;


@@ -1,27 +0,0 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2011-2022 Blender Foundation */
#pragma once
#include "integrator/denoiser.h"
#include "util/unique_ptr.h"
CCL_NAMESPACE_BEGIN
/* Denoiser which uses a device-specific denoising implementation, such as the OptiX denoiser,
* which is implemented as part of the driver of a specific device.
*
* This implementation makes sure the to-be-denoised buffer is available on the denoising device
* and invokes the denoising kernel via the device API. */
class DeviceDenoiser : public Denoiser {
public:
DeviceDenoiser(Device *path_trace_device, const DenoiseParams &params);
~DeviceDenoiser();
virtual bool denoise_buffer(const BufferParams &buffer_params,
RenderBuffers *render_buffers,
const int num_samples,
bool allow_inplace_modification) override;
};
CCL_NAMESPACE_END


@@ -1,7 +1,7 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2011-2022 Blender Foundation */
#include "integrator/denoiser_device.h"
#include "integrator/denoiser_gpu.h"
#include "device/denoise.h"
#include "device/device.h"
@@ -13,27 +13,27 @@
CCL_NAMESPACE_BEGIN
DeviceDenoiser::DeviceDenoiser(Device *path_trace_device, const DenoiseParams &params)
DenoiserGPU::DenoiserGPU(Device *path_trace_device, const DenoiseParams &params)
: Denoiser(path_trace_device, params)
{
}
DeviceDenoiser::~DeviceDenoiser()
DenoiserGPU::~DenoiserGPU()
{
/* Explicit implementation, to allow forward declaration of Device in the header. */
}
bool DeviceDenoiser::denoise_buffer(const BufferParams &buffer_params,
RenderBuffers *render_buffers,
const int num_samples,
bool allow_inplace_modification)
bool DenoiserGPU::denoise_buffer(const BufferParams &buffer_params,
RenderBuffers *render_buffers,
const int num_samples,
bool allow_inplace_modification)
{
Device *denoiser_device = get_denoiser_device();
if (!denoiser_device) {
return false;
}
DeviceDenoiseTask task;
DenoiseTask task;
task.params = params_;
task.num_samples = num_samples;
task.buffer_params = buffer_params;
@@ -50,8 +50,6 @@ bool DeviceDenoiser::denoise_buffer(const BufferParams &buffer_params,
else {
VLOG_WORK << "Creating temporary buffer on denoiser device.";
DeviceQueue *queue = denoiser_device->get_denoise_queue();
/* Create a buffer which is available to the device used by the denoiser. */
/* TODO(sergey): Optimize data transfers. For example, only copy denoising related passes,
@@ -70,13 +68,13 @@ bool DeviceDenoiser::denoise_buffer(const BufferParams &buffer_params,
render_buffers->buffer.data(),
sizeof(float) * local_render_buffers.buffer.size());
queue->copy_to_device(local_render_buffers.buffer);
denoiser_queue_->copy_to_device(local_render_buffers.buffer);
task.render_buffers = &local_render_buffers;
task.allow_inplace_modification = true;
}
const bool denoise_result = denoiser_device->denoise_buffer(task);
const bool denoise_result = denoise_buffer(task);
if (local_buffer_used) {
local_render_buffers.copy_from_device();
@@ -90,4 +88,21 @@ bool DeviceDenoiser::denoise_buffer(const BufferParams &buffer_params,
return denoise_result;
}
Device *DenoiserGPU::ensure_denoiser_device(Progress *progress)
{
Device *denoiser_device = Denoiser::ensure_denoiser_device(progress);
if (!denoiser_device) {
return nullptr;
}
if (!denoiser_queue_) {
denoiser_queue_ = denoiser_device->gpu_queue_create();
if (!denoiser_queue_) {
return nullptr;
}
}
return denoiser_device;
}
CCL_NAMESPACE_END


@@ -0,0 +1,52 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2011-2022 Blender Foundation */
#pragma once
#include "integrator/denoiser.h"
CCL_NAMESPACE_BEGIN
/* Implementation of Denoiser which uses a device-specific denoising implementation, running on a
* GPU device queue. It makes sure the to-be-denoised buffer is available on the denoising device
* and invokes denoising kernels via the device queue API. */
class DenoiserGPU : public Denoiser {
public:
DenoiserGPU(Device *path_trace_device, const DenoiseParams &params);
~DenoiserGPU();
virtual bool denoise_buffer(const BufferParams &buffer_params,
RenderBuffers *render_buffers,
const int num_samples,
bool allow_inplace_modification) override;
protected:
/* All the parameters needed to perform buffer denoising on a device.
* This is not really a task in the canonical sense (it is not an asynchronously running task);
* it is more of a wrapper for all the arguments and parameters needed to perform denoising.
* Listing them in a single place means device methods do not all have to be modified whenever
* these parameters change. */
class DenoiseTask {
public:
DenoiseParams params;
int num_samples;
RenderBuffers *render_buffers;
BufferParams buffer_params;
/* Allow in-place modification of the input passes (e.g. scaling them down). This lowers the
* memory footprint of the denoiser but makes the input passes "invalid" from the path tracer's
* point of view. */
bool allow_inplace_modification;
};
/* Returns true if task is fully handled. */
virtual bool denoise_buffer(const DenoiseTask & /*task*/) = 0;
virtual Device *ensure_denoiser_device(Progress *progress) override;
unique_ptr<DeviceQueue> denoiser_queue_;
};
CCL_NAMESPACE_END
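A hypothetical backend would subclass DenoiserGPU and implement the task-based entry point, roughly like the sketch below. Names are illustrative; only the OptiX subclass exists in this change:
class ExampleDenoiserGPU : public DenoiserGPU {
 public:
  ExampleDenoiserGPU(Device *path_trace_device, const DenoiseParams &params)
      : DenoiserGPU(path_trace_device, params)
  {
  }

 protected:
  uint get_device_type_mask() const override
  {
    return DEVICE_MASK_CUDA; /* Whichever devices this backend denoises on. */
  }

  bool denoise_buffer(const DenoiseTask &task) override
  {
    /* Enqueue backend-specific filter kernels on denoiser_queue_ here. */
    (void)task;
    return true;
  }
};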


@@ -1,16 +1,216 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2011-2022 Blender Foundation */
#include "integrator/denoiser_optix.h"
#ifdef WITH_OPTIX
#include "device/denoise.h"
#include "device/device.h"
# include "integrator/denoiser_optix.h"
# include "integrator/pass_accessor_gpu.h"
# include "device/optix/device_impl.h"
# include "device/optix/queue.h"
# include <optix_denoiser_tiling.h>
CCL_NAMESPACE_BEGIN
OptiXDenoiser::OptiXDenoiser(Device *path_trace_device, const DenoiseParams &params)
: DeviceDenoiser(path_trace_device, params)
# if OPTIX_ABI_VERSION >= 60
using ::optixUtilDenoiserInvokeTiled;
# else
// A minimal copy of the functionality of `optix_denoiser_tiling.h` which allows fixing integer
// overflow issues without bumping the SDK or driver requirement.
//
// The original code is Copyright NVIDIA Corporation, BSD-3-Clause.
static OptixResult optixUtilDenoiserSplitImage(const OptixImage2D &input,
const OptixImage2D &output,
unsigned int overlapWindowSizeInPixels,
unsigned int tileWidth,
unsigned int tileHeight,
std::vector<OptixUtilDenoiserImageTile> &tiles)
{
if (tileWidth == 0 || tileHeight == 0)
return OPTIX_ERROR_INVALID_VALUE;
unsigned int inPixelStride = optixUtilGetPixelStride(input);
unsigned int outPixelStride = optixUtilGetPixelStride(output);
int inp_w = std::min(tileWidth + 2 * overlapWindowSizeInPixels, input.width);
int inp_h = std::min(tileHeight + 2 * overlapWindowSizeInPixels, input.height);
int inp_y = 0, copied_y = 0;
do {
int inputOffsetY = inp_y == 0 ? 0 :
std::max((int)overlapWindowSizeInPixels,
inp_h - ((int)input.height - inp_y));
int copy_y = inp_y == 0 ? std::min(input.height, tileHeight + overlapWindowSizeInPixels) :
std::min(tileHeight, input.height - copied_y);
int inp_x = 0, copied_x = 0;
do {
int inputOffsetX = inp_x == 0 ? 0 :
std::max((int)overlapWindowSizeInPixels,
inp_w - ((int)input.width - inp_x));
int copy_x = inp_x == 0 ? std::min(input.width, tileWidth + overlapWindowSizeInPixels) :
std::min(tileWidth, input.width - copied_x);
OptixUtilDenoiserImageTile tile;
tile.input.data = input.data + (size_t)(inp_y - inputOffsetY) * input.rowStrideInBytes +
(size_t)(inp_x - inputOffsetX) * inPixelStride;
tile.input.width = inp_w;
tile.input.height = inp_h;
tile.input.rowStrideInBytes = input.rowStrideInBytes;
tile.input.pixelStrideInBytes = input.pixelStrideInBytes;
tile.input.format = input.format;
tile.output.data = output.data + (size_t)inp_y * output.rowStrideInBytes +
(size_t)inp_x * outPixelStride;
tile.output.width = copy_x;
tile.output.height = copy_y;
tile.output.rowStrideInBytes = output.rowStrideInBytes;
tile.output.pixelStrideInBytes = output.pixelStrideInBytes;
tile.output.format = output.format;
tile.inputOffsetX = inputOffsetX;
tile.inputOffsetY = inputOffsetY;
tiles.push_back(tile);
inp_x += inp_x == 0 ? tileWidth + overlapWindowSizeInPixels : tileWidth;
copied_x += copy_x;
} while (inp_x < static_cast<int>(input.width));
inp_y += inp_y == 0 ? tileHeight + overlapWindowSizeInPixels : tileHeight;
copied_y += copy_y;
} while (inp_y < static_cast<int>(input.height));
return OPTIX_SUCCESS;
}
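A worked example of the split above, assuming a 4000x3000 input with tileWidth = tileHeight = 2048 and a 64-pixel overlap window:
/* inp_w = min(2048 + 2 * 64, 4000) = 2176 columns read per tile.
 * First tile writes  copy_x = min(4000, 2048 + 64) = 2112 columns.
 * Second tile writes copy_x = min(2048, 4000 - 2112) = 1888 columns.
 * Rows split the same way (2112 + 888 = 3000), so 2 * 2 = 4 tiles cover the image. */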
static OptixResult optixUtilDenoiserInvokeTiled(OptixDenoiser denoiser,
CUstream stream,
const OptixDenoiserParams *params,
CUdeviceptr denoiserState,
size_t denoiserStateSizeInBytes,
const OptixDenoiserGuideLayer *guideLayer,
const OptixDenoiserLayer *layers,
unsigned int numLayers,
CUdeviceptr scratch,
size_t scratchSizeInBytes,
unsigned int overlapWindowSizeInPixels,
unsigned int tileWidth,
unsigned int tileHeight)
{
if (!guideLayer || !layers)
return OPTIX_ERROR_INVALID_VALUE;
std::vector<std::vector<OptixUtilDenoiserImageTile>> tiles(numLayers);
std::vector<std::vector<OptixUtilDenoiserImageTile>> prevTiles(numLayers);
for (unsigned int l = 0; l < numLayers; l++) {
if (const OptixResult res = ccl::optixUtilDenoiserSplitImage(layers[l].input,
layers[l].output,
overlapWindowSizeInPixels,
tileWidth,
tileHeight,
tiles[l]))
return res;
if (layers[l].previousOutput.data) {
OptixImage2D dummyOutput = layers[l].previousOutput;
if (const OptixResult res = ccl::optixUtilDenoiserSplitImage(layers[l].previousOutput,
dummyOutput,
overlapWindowSizeInPixels,
tileWidth,
tileHeight,
prevTiles[l]))
return res;
}
}
std::vector<OptixUtilDenoiserImageTile> albedoTiles;
if (guideLayer->albedo.data) {
OptixImage2D dummyOutput = guideLayer->albedo;
if (const OptixResult res = ccl::optixUtilDenoiserSplitImage(guideLayer->albedo,
dummyOutput,
overlapWindowSizeInPixels,
tileWidth,
tileHeight,
albedoTiles))
return res;
}
std::vector<OptixUtilDenoiserImageTile> normalTiles;
if (guideLayer->normal.data) {
OptixImage2D dummyOutput = guideLayer->normal;
if (const OptixResult res = ccl::optixUtilDenoiserSplitImage(guideLayer->normal,
dummyOutput,
overlapWindowSizeInPixels,
tileWidth,
tileHeight,
normalTiles))
return res;
}
std::vector<OptixUtilDenoiserImageTile> flowTiles;
if (guideLayer->flow.data) {
OptixImage2D dummyOutput = guideLayer->flow;
if (const OptixResult res = ccl::optixUtilDenoiserSplitImage(guideLayer->flow,
dummyOutput,
overlapWindowSizeInPixels,
tileWidth,
tileHeight,
flowTiles))
return res;
}
for (size_t t = 0; t < tiles[0].size(); t++) {
std::vector<OptixDenoiserLayer> tlayers;
for (unsigned int l = 0; l < numLayers; l++) {
OptixDenoiserLayer layer = {};
layer.input = (tiles[l])[t].input;
layer.output = (tiles[l])[t].output;
if (layers[l].previousOutput.data)
layer.previousOutput = (prevTiles[l])[t].input;
tlayers.push_back(layer);
}
OptixDenoiserGuideLayer gl = {};
if (guideLayer->albedo.data)
gl.albedo = albedoTiles[t].input;
if (guideLayer->normal.data)
gl.normal = normalTiles[t].input;
if (guideLayer->flow.data)
gl.flow = flowTiles[t].input;
if (const OptixResult res = optixDenoiserInvoke(denoiser,
stream,
params,
denoiserState,
denoiserStateSizeInBytes,
&gl,
&tlayers[0],
numLayers,
(tiles[0])[t].inputOffsetX,
(tiles[0])[t].inputOffsetY,
scratch,
scratchSizeInBytes))
return res;
}
return OPTIX_SUCCESS;
}
# endif
OptiXDenoiser::OptiXDenoiser(Device *path_trace_device, const DenoiseParams &params)
: DenoiserGPU(path_trace_device, params), state_(path_trace_device, "__denoiser_state", true)
{
}
OptiXDenoiser::~OptiXDenoiser()
{
/* It is important that the OptixDenoiser handle is destroyed before the OptixDeviceContext
* handle, which is guaranteed since the local denoising device owning the OptiX device context
* is deleted as part of the Denoiser class destructor call after this. */
if (optix_denoiser_ != nullptr) {
optixDenoiserDestroy(optix_denoiser_);
}
}
uint OptiXDenoiser::get_device_type_mask() const
@@ -18,4 +218,569 @@ uint OptiXDenoiser::get_device_type_mask() const
return DEVICE_MASK_OPTIX;
}
class OptiXDenoiser::DenoiseContext {
public:
explicit DenoiseContext(OptiXDevice *device, const DenoiseTask &task)
: denoise_params(task.params),
render_buffers(task.render_buffers),
buffer_params(task.buffer_params),
guiding_buffer(device, "denoiser guiding passes buffer", true),
num_samples(task.num_samples)
{
num_input_passes = 1;
if (denoise_params.use_pass_albedo) {
num_input_passes += 1;
use_pass_albedo = true;
pass_denoising_albedo = buffer_params.get_pass_offset(PASS_DENOISING_ALBEDO);
if (denoise_params.use_pass_normal) {
num_input_passes += 1;
use_pass_normal = true;
pass_denoising_normal = buffer_params.get_pass_offset(PASS_DENOISING_NORMAL);
}
}
if (denoise_params.temporally_stable) {
prev_output.device_pointer = render_buffers->buffer.device_pointer;
prev_output.offset = buffer_params.get_pass_offset(PASS_DENOISING_PREVIOUS);
prev_output.stride = buffer_params.stride;
prev_output.pass_stride = buffer_params.pass_stride;
num_input_passes += 1;
use_pass_motion = true;
pass_motion = buffer_params.get_pass_offset(PASS_MOTION);
}
use_guiding_passes = (num_input_passes - 1) > 0;
if (use_guiding_passes) {
if (task.allow_inplace_modification) {
guiding_params.device_pointer = render_buffers->buffer.device_pointer;
guiding_params.pass_albedo = pass_denoising_albedo;
guiding_params.pass_normal = pass_denoising_normal;
guiding_params.pass_flow = pass_motion;
guiding_params.stride = buffer_params.stride;
guiding_params.pass_stride = buffer_params.pass_stride;
}
else {
guiding_params.pass_stride = 0;
if (use_pass_albedo) {
guiding_params.pass_albedo = guiding_params.pass_stride;
guiding_params.pass_stride += 3;
}
if (use_pass_normal) {
guiding_params.pass_normal = guiding_params.pass_stride;
guiding_params.pass_stride += 3;
}
if (use_pass_motion) {
guiding_params.pass_flow = guiding_params.pass_stride;
guiding_params.pass_stride += 2;
}
guiding_params.stride = buffer_params.width;
guiding_buffer.alloc_to_device(buffer_params.width * buffer_params.height *
guiding_params.pass_stride);
guiding_params.device_pointer = guiding_buffer.device_pointer;
}
}
pass_sample_count = buffer_params.get_pass_offset(PASS_SAMPLE_COUNT);
}
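A worked example of the packed guiding layout built above, assuming albedo, normal and motion are all enabled:
/* pass_albedo = 0, pass_normal = 3, pass_flow = 6,
 * pass_stride = 3 + 3 + 2 = 8 floats per pixel,
 * so guiding_buffer allocates width * height * 8 floats. */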
const DenoiseParams &denoise_params;
RenderBuffers *render_buffers = nullptr;
const BufferParams &buffer_params;
/* Previous output. */
struct {
device_ptr device_pointer = 0;
int offset = PASS_UNUSED;
int stride = -1;
int pass_stride = -1;
} prev_output;
/* Device-side storage of the guiding passes. */
device_only_memory<float> guiding_buffer;
struct {
device_ptr device_pointer = 0;
/* NOTE: Are only initialized when the corresponding guiding pass is enabled. */
int pass_albedo = PASS_UNUSED;
int pass_normal = PASS_UNUSED;
int pass_flow = PASS_UNUSED;
int stride = -1;
int pass_stride = -1;
} guiding_params;
/* Number of input passes. Including the color and extra auxiliary passes. */
int num_input_passes = 0;
bool use_guiding_passes = false;
bool use_pass_albedo = false;
bool use_pass_normal = false;
bool use_pass_motion = false;
int num_samples = 0;
int pass_sample_count = PASS_UNUSED;
/* NOTE: Are only initialized when the corresponding guiding pass is enabled. */
int pass_denoising_albedo = PASS_UNUSED;
int pass_denoising_normal = PASS_UNUSED;
int pass_motion = PASS_UNUSED;
/* For passes which don't need albedo channel for denoising we replace the actual albedo with
* the (0.5, 0.5, 0.5). This flag indicates that the real albedo pass has been replaced with
* the fake values and denoising of passes which do need albedo can no longer happen. */
bool albedo_replaced_with_fake = false;
};
class OptiXDenoiser::DenoisePass {
public:
DenoisePass(const PassType type, const BufferParams &buffer_params) : type(type)
{
noisy_offset = buffer_params.get_pass_offset(type, PassMode::NOISY);
denoised_offset = buffer_params.get_pass_offset(type, PassMode::DENOISED);
const PassInfo pass_info = Pass::get_info(type);
num_components = pass_info.num_components;
use_compositing = pass_info.use_compositing;
use_denoising_albedo = pass_info.use_denoising_albedo;
}
PassType type;
int noisy_offset;
int denoised_offset;
int num_components;
bool use_compositing;
bool use_denoising_albedo;
};
bool OptiXDenoiser::denoise_buffer(const DenoiseTask &task)
{
OptiXDevice *const optix_device = static_cast<OptiXDevice *>(denoiser_device_);
const CUDAContextScope scope(optix_device);
DenoiseContext context(optix_device, task);
if (!denoise_ensure(context)) {
return false;
}
if (!denoise_filter_guiding_preprocess(context)) {
LOG(ERROR) << "Error preprocessing guiding passes.";
return false;
}
/* Passes which will use real albedo when it is available. */
denoise_pass(context, PASS_COMBINED);
denoise_pass(context, PASS_SHADOW_CATCHER_MATTE);
/* Passes which do not need albedo and hence if real is present it needs to become fake. */
denoise_pass(context, PASS_SHADOW_CATCHER);
return true;
}
bool OptiXDenoiser::denoise_filter_guiding_preprocess(const DenoiseContext &context)
{
const BufferParams &buffer_params = context.buffer_params;
const int work_size = buffer_params.width * buffer_params.height;
DeviceKernelArguments args(&context.guiding_params.device_pointer,
&context.guiding_params.pass_stride,
&context.guiding_params.pass_albedo,
&context.guiding_params.pass_normal,
&context.guiding_params.pass_flow,
&context.render_buffers->buffer.device_pointer,
&buffer_params.offset,
&buffer_params.stride,
&buffer_params.pass_stride,
&context.pass_sample_count,
&context.pass_denoising_albedo,
&context.pass_denoising_normal,
&context.pass_motion,
&buffer_params.full_x,
&buffer_params.full_y,
&buffer_params.width,
&buffer_params.height,
&context.num_samples);
return denoiser_queue_->enqueue(DEVICE_KERNEL_FILTER_GUIDING_PREPROCESS, work_size, args);
}
bool OptiXDenoiser::denoise_filter_guiding_set_fake_albedo(const DenoiseContext &context)
{
const BufferParams &buffer_params = context.buffer_params;
const int work_size = buffer_params.width * buffer_params.height;
DeviceKernelArguments args(&context.guiding_params.device_pointer,
&context.guiding_params.pass_stride,
&context.guiding_params.pass_albedo,
&buffer_params.width,
&buffer_params.height);
return denoiser_queue_->enqueue(DEVICE_KERNEL_FILTER_GUIDING_SET_FAKE_ALBEDO, work_size, args);
}
void OptiXDenoiser::denoise_pass(DenoiseContext &context, PassType pass_type)
{
const BufferParams &buffer_params = context.buffer_params;
const DenoisePass pass(pass_type, buffer_params);
if (pass.noisy_offset == PASS_UNUSED) {
return;
}
if (pass.denoised_offset == PASS_UNUSED) {
LOG(DFATAL) << "Missing denoised pass " << pass_type_as_string(pass_type);
return;
}
if (pass.use_denoising_albedo) {
if (context.albedo_replaced_with_fake) {
LOG(ERROR) << "Pass which requires albedo is denoised after fake albedo has been set.";
return;
}
}
else if (context.use_guiding_passes && !context.albedo_replaced_with_fake) {
context.albedo_replaced_with_fake = true;
if (!denoise_filter_guiding_set_fake_albedo(context)) {
LOG(ERROR) << "Error replacing real albedo with the fake one.";
return;
}
}
/* Read and preprocess noisy color input pass. */
denoise_color_read(context, pass);
if (!denoise_filter_color_preprocess(context, pass)) {
LOG(ERROR) << "Error converting denoising passes to RGB buffer.";
return;
}
if (!denoise_run(context, pass)) {
LOG(ERROR) << "Error running OptiX denoiser.";
return;
}
/* Store result in the combined pass of the render buffer.
*
* This will scale the denoiser result up to match the number of, possibly per-pixel, samples. */
if (!denoise_filter_color_postprocess(context, pass)) {
LOG(ERROR) << "Error copying denoiser result to the denoised pass.";
return;
}
denoiser_queue_->synchronize();
}
void OptiXDenoiser::denoise_color_read(const DenoiseContext &context, const DenoisePass &pass)
{
PassAccessor::PassAccessInfo pass_access_info;
pass_access_info.type = pass.type;
pass_access_info.mode = PassMode::NOISY;
pass_access_info.offset = pass.noisy_offset;
/* The denoiser operates on the passes which are used to calculate the approximation, and is
* never run on the approximation itself. The latter is not even possible, because OptiX does
* not support denoising of semi-transparent pixels. */
pass_access_info.use_approximate_shadow_catcher = false;
pass_access_info.use_approximate_shadow_catcher_background = false;
pass_access_info.show_active_pixels = false;
/* TODO(sergey): Consider adding support of actual exposure, to avoid clamping in extreme cases.
*/
const PassAccessorGPU pass_accessor(
denoiser_queue_.get(), pass_access_info, 1.0f, context.num_samples);
PassAccessor::Destination destination(pass_access_info.type);
destination.d_pixels = context.render_buffers->buffer.device_pointer +
pass.denoised_offset * sizeof(float);
destination.num_components = 3;
destination.pixel_stride = context.buffer_params.pass_stride;
BufferParams buffer_params = context.buffer_params;
buffer_params.window_x = 0;
buffer_params.window_y = 0;
buffer_params.window_width = buffer_params.width;
buffer_params.window_height = buffer_params.height;
pass_accessor.get_render_tile_pixels(context.render_buffers, buffer_params, destination);
}
bool OptiXDenoiser::denoise_filter_color_preprocess(const DenoiseContext &context,
const DenoisePass &pass)
{
const BufferParams &buffer_params = context.buffer_params;
const int work_size = buffer_params.width * buffer_params.height;
DeviceKernelArguments args(&context.render_buffers->buffer.device_pointer,
&buffer_params.full_x,
&buffer_params.full_y,
&buffer_params.width,
&buffer_params.height,
&buffer_params.offset,
&buffer_params.stride,
&buffer_params.pass_stride,
&pass.denoised_offset);
return denoiser_queue_->enqueue(DEVICE_KERNEL_FILTER_COLOR_PREPROCESS, work_size, args);
}
bool OptiXDenoiser::denoise_filter_color_postprocess(const DenoiseContext &context,
const DenoisePass &pass)
{
const BufferParams &buffer_params = context.buffer_params;
const int work_size = buffer_params.width * buffer_params.height;
DeviceKernelArguments args(&context.render_buffers->buffer.device_pointer,
&buffer_params.full_x,
&buffer_params.full_y,
&buffer_params.width,
&buffer_params.height,
&buffer_params.offset,
&buffer_params.stride,
&buffer_params.pass_stride,
&context.num_samples,
&pass.noisy_offset,
&pass.denoised_offset,
&context.pass_sample_count,
&pass.num_components,
&pass.use_compositing);
return denoiser_queue_->enqueue(DEVICE_KERNEL_FILTER_COLOR_POSTPROCESS, work_size, args);
}
bool OptiXDenoiser::denoise_ensure(DenoiseContext &context)
{
if (!denoise_create_if_needed(context)) {
LOG(ERROR) << "OptiX denoiser creation has failed.";
return false;
}
if (!denoise_configure_if_needed(context)) {
LOG(ERROR) << "OptiX denoiser configuration has failed.";
return false;
}
return true;
}
bool OptiXDenoiser::denoise_create_if_needed(DenoiseContext &context)
{
const bool recreate_denoiser = (optix_denoiser_ == nullptr) ||
(use_pass_albedo_ != context.use_pass_albedo) ||
(use_pass_normal_ != context.use_pass_normal) ||
(use_pass_motion_ != context.use_pass_motion);
if (!recreate_denoiser) {
return true;
}
/* Destroy existing handle before creating new one. */
if (optix_denoiser_) {
optixDenoiserDestroy(optix_denoiser_);
}
/* Create OptiX denoiser handle on demand when it is first used. */
OptixDenoiserOptions denoiser_options = {};
denoiser_options.guideAlbedo = context.use_pass_albedo;
denoiser_options.guideNormal = context.use_pass_normal;
OptixDenoiserModelKind model = OPTIX_DENOISER_MODEL_KIND_HDR;
if (context.use_pass_motion) {
model = OPTIX_DENOISER_MODEL_KIND_TEMPORAL;
}
const OptixResult result = optixDenoiserCreate(
static_cast<OptiXDevice *>(denoiser_device_)->context,
model,
&denoiser_options,
&optix_denoiser_);
if (result != OPTIX_SUCCESS) {
denoiser_device_->set_error("Failed to create OptiX denoiser");
return false;
}
/* OptiX denoiser handle was created with the requested number of input passes. */
use_pass_albedo_ = context.use_pass_albedo;
use_pass_normal_ = context.use_pass_normal;
use_pass_motion_ = context.use_pass_motion;
/* OptiX denoiser has been created, but it needs configuration. */
is_configured_ = false;
return true;
}
bool OptiXDenoiser::denoise_configure_if_needed(DenoiseContext &context)
{
/* Limit maximum tile size denoiser can be invoked with. */
const int2 tile_size = make_int2(min(context.buffer_params.width, 4096),
min(context.buffer_params.height, 4096));
if (is_configured_ && (configured_size_.x == tile_size.x && configured_size_.y == tile_size.y)) {
return true;
}
optix_device_assert(
denoiser_device_,
optixDenoiserComputeMemoryResources(optix_denoiser_, tile_size.x, tile_size.y, &sizes_));
/* Allocate denoiser state if tile size has changed since last setup. */
state_.device = denoiser_device_;
state_.alloc_to_device(sizes_.stateSizeInBytes + sizes_.withOverlapScratchSizeInBytes);
/* Initialize denoiser state for the current tile size. */
const OptixResult result = optixDenoiserSetup(
optix_denoiser_,
0, /* Work around bug in r495 drivers that causes artifacts when denoiser setup is called
* on a stream that is not the default stream. */
tile_size.x + sizes_.overlapWindowSizeInPixels * 2,
tile_size.y + sizes_.overlapWindowSizeInPixels * 2,
state_.device_pointer,
sizes_.stateSizeInBytes,
state_.device_pointer + sizes_.stateSizeInBytes,
sizes_.withOverlapScratchSizeInBytes);
if (result != OPTIX_SUCCESS) {
denoiser_device_->set_error("Failed to set up OptiX denoiser");
return false;
}
cuda_device_assert(denoiser_device_, cuCtxSynchronize());
is_configured_ = true;
configured_size_ = tile_size;
return true;
}
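The single state_ allocation is carved up by plain pointer arithmetic, matching the [denoiser state][scratch buffer] layout described in the header; a sketch of the contract shared by the setup call above and the invoke call below (hypothetical helper):
/* Scratch space immediately follows the denoiser state in one allocation,
 * so only one buffer has to be resized when the tile size changes. */
static CUdeviceptr denoiser_scratch_pointer(CUdeviceptr state_base, const OptixDenoiserSizes &sizes)
{
  return state_base + sizes.stateSizeInBytes;
}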
bool OptiXDenoiser::denoise_run(const DenoiseContext &context, const DenoisePass &pass)
{
const BufferParams &buffer_params = context.buffer_params;
const int width = buffer_params.width;
const int height = buffer_params.height;
/* Set up input and output layer information. */
OptixImage2D color_layer = {0};
OptixImage2D albedo_layer = {0};
OptixImage2D normal_layer = {0};
OptixImage2D flow_layer = {0};
OptixImage2D output_layer = {0};
OptixImage2D prev_output_layer = {0};
/* Color pass. */
{
const int pass_denoised = pass.denoised_offset;
const int64_t pass_stride_in_bytes = context.buffer_params.pass_stride * sizeof(float);
color_layer.data = context.render_buffers->buffer.device_pointer +
pass_denoised * sizeof(float);
color_layer.width = width;
color_layer.height = height;
color_layer.rowStrideInBytes = pass_stride_in_bytes * context.buffer_params.stride;
color_layer.pixelStrideInBytes = pass_stride_in_bytes;
color_layer.format = OPTIX_PIXEL_FORMAT_FLOAT3;
}
/* Previous output. */
if (context.prev_output.offset != PASS_UNUSED) {
const int64_t pass_stride_in_bytes = context.prev_output.pass_stride * sizeof(float);
prev_output_layer.data = context.prev_output.device_pointer +
context.prev_output.offset * sizeof(float);
prev_output_layer.width = width;
prev_output_layer.height = height;
prev_output_layer.rowStrideInBytes = pass_stride_in_bytes * context.prev_output.stride;
prev_output_layer.pixelStrideInBytes = pass_stride_in_bytes;
prev_output_layer.format = OPTIX_PIXEL_FORMAT_FLOAT3;
}
/* Optional albedo and color passes. */
if (context.num_input_passes > 1) {
const device_ptr d_guiding_buffer = context.guiding_params.device_pointer;
const int64_t pixel_stride_in_bytes = context.guiding_params.pass_stride * sizeof(float);
const int64_t row_stride_in_bytes = context.guiding_params.stride * pixel_stride_in_bytes;
if (context.use_pass_albedo) {
albedo_layer.data = d_guiding_buffer + context.guiding_params.pass_albedo * sizeof(float);
albedo_layer.width = width;
albedo_layer.height = height;
albedo_layer.rowStrideInBytes = row_stride_in_bytes;
albedo_layer.pixelStrideInBytes = pixel_stride_in_bytes;
albedo_layer.format = OPTIX_PIXEL_FORMAT_FLOAT3;
}
if (context.use_pass_normal) {
normal_layer.data = d_guiding_buffer + context.guiding_params.pass_normal * sizeof(float);
normal_layer.width = width;
normal_layer.height = height;
normal_layer.rowStrideInBytes = row_stride_in_bytes;
normal_layer.pixelStrideInBytes = pixel_stride_in_bytes;
normal_layer.format = OPTIX_PIXEL_FORMAT_FLOAT3;
}
if (context.use_pass_motion) {
flow_layer.data = d_guiding_buffer + context.guiding_params.pass_flow * sizeof(float);
flow_layer.width = width;
flow_layer.height = height;
flow_layer.rowStrideInBytes = row_stride_in_bytes;
flow_layer.pixelStrideInBytes = pixel_stride_in_bytes;
flow_layer.format = OPTIX_PIXEL_FORMAT_FLOAT2;
}
}
/* Denoise in-place of the noisy input in the render buffers. */
output_layer = color_layer;
OptixDenoiserGuideLayer guide_layers = {};
guide_layers.albedo = albedo_layer;
guide_layers.normal = normal_layer;
guide_layers.flow = flow_layer;
OptixDenoiserLayer image_layers = {};
image_layers.input = color_layer;
image_layers.previousOutput = prev_output_layer;
image_layers.output = output_layer;
/* Finally run denoising. */
OptixDenoiserParams params = {}; /* All parameters are disabled/zero. */
optix_device_assert(denoiser_device_,
ccl::optixUtilDenoiserInvokeTiled(
optix_denoiser_,
static_cast<OptiXDeviceQueue *>(denoiser_queue_.get())->stream(),
&params,
state_.device_pointer,
sizes_.stateSizeInBytes,
&guide_layers,
&image_layers,
1,
state_.device_pointer + sizes_.stateSizeInBytes,
sizes_.withOverlapScratchSizeInBytes,
sizes_.overlapWindowSizeInPixels,
configured_size_.x,
configured_size_.y));
return true;
}
CCL_NAMESPACE_END
#endif


@@ -3,16 +3,84 @@
#pragma once
#include "integrator/denoiser_device.h"
#ifdef WITH_OPTIX
# include "integrator/denoiser_gpu.h"
# include "device/optix/util.h"
CCL_NAMESPACE_BEGIN
class OptiXDenoiser : public DeviceDenoiser {
/* Implementation of denoising API which uses the OptiX denoiser. */
class OptiXDenoiser : public DenoiserGPU {
public:
OptiXDenoiser(Device *path_trace_device, const DenoiseParams &params);
~OptiXDenoiser();
protected:
virtual uint get_device_type_mask() const override;
private:
class DenoiseContext;
class DenoisePass;
virtual bool denoise_buffer(const DenoiseTask &task) override;
/* Read guiding passes from the render buffers, preprocess them in a way which is expected by
* OptiX and store in the guiding passes memory within the given context.
*
* Pre-processing of the guiding passes is only to happen once per context lifetime. Do not
* preprocess them for every pass which is being denoised. */
bool denoise_filter_guiding_preprocess(const DenoiseContext &context);
/* Set fake albedo pixels in the albedo guiding pass storage.
* After this point only passes which do not need albedo for denoising can be processed. */
bool denoise_filter_guiding_set_fake_albedo(const DenoiseContext &context);
void denoise_pass(DenoiseContext &context, PassType pass_type);
/* Read input color pass from the render buffer into the memory which corresponds to the noisy
* input within the given context. Pixels are scaled to the number of samples, but are not
* preprocessed yet. */
void denoise_color_read(const DenoiseContext &context, const DenoisePass &pass);
/* Run corresponding filter kernels, preparing data for the denoiser or copying data from the
* denoiser result to the render buffer. */
bool denoise_filter_color_preprocess(const DenoiseContext &context, const DenoisePass &pass);
bool denoise_filter_color_postprocess(const DenoiseContext &context, const DenoisePass &pass);
/* Make sure the OptiX denoiser is created and configured. */
bool denoise_ensure(DenoiseContext &context);
/* Create OptiX denoiser descriptor if needed.
* Will do nothing if the current OptiX descriptor is usable for the given parameters.
* If the OptiX denoiser descriptor did re-allocate here it is left unconfigured. */
bool denoise_create_if_needed(DenoiseContext &context);
/* Configure existing OptiX denoiser descriptor for the use for the given task. */
bool denoise_configure_if_needed(DenoiseContext &context);
/* Run configured denoiser. */
bool denoise_run(const DenoiseContext &context, const DenoisePass &pass);
OptixDenoiser optix_denoiser_ = nullptr;
/* Configuration size, as provided to `optixDenoiserSetup`.
* If `optixDenoiserSetup()` has never been called on the current `optix_denoiser`,
* `is_configured` will be false. */
bool is_configured_ = false;
int2 configured_size_ = make_int2(0, 0);
/* OptiX denoiser state and scratch buffers, stored in a single memory buffer.
* The memory layout is as follows: [denoiser state][scratch buffer]. */
device_only_memory<unsigned char> state_;
OptixDenoiserSizes sizes_ = {};
bool use_pass_albedo_ = false;
bool use_pass_normal_ = false;
bool use_pass_motion_ = false;
};
CCL_NAMESPACE_END
#endif


@@ -37,6 +37,14 @@ set(SRC_KERNEL_DEVICE_OPTIX
device/optix/kernel_shader_raytrace.cu
)
if(WITH_CYCLES_OSL AND (OSL_LIBRARY_VERSION_MINOR GREATER_EQUAL 13 OR OSL_LIBRARY_VERSION_MAJOR GREATER 1))
set(SRC_KERNEL_DEVICE_OPTIX
${SRC_KERNEL_DEVICE_OPTIX}
osl/services_optix.cu
device/optix/kernel_osl.cu
)
endif()
set(SRC_KERNEL_DEVICE_ONEAPI
device/oneapi/kernel.cpp
)
@@ -181,6 +189,16 @@ set(SRC_KERNEL_SVM_HEADERS
svm/vertex_color.h
)
if(WITH_CYCLES_OSL)
set(SRC_KERNEL_OSL_HEADERS
osl/osl.h
osl/closures_setup.h
osl/closures_template.h
osl/services_gpu.h
osl/types.h
)
endif()
set(SRC_KERNEL_GEOM_HEADERS
geom/geom.h
geom/attribute.h
@@ -306,6 +324,7 @@ set(SRC_KERNEL_HEADERS
${SRC_KERNEL_GEOM_HEADERS}
${SRC_KERNEL_INTEGRATOR_HEADERS}
${SRC_KERNEL_LIGHT_HEADERS}
${SRC_KERNEL_OSL_HEADERS}
${SRC_KERNEL_SAMPLE_HEADERS}
${SRC_KERNEL_SVM_HEADERS}
${SRC_KERNEL_TYPES_HEADERS}
@@ -328,6 +347,7 @@ set(SRC_UTIL_HEADERS
../util/math_int2.h
../util/math_int3.h
../util/math_int4.h
../util/math_int8.h
../util/math_matrix.h
../util/projection.h
../util/rect.h
@@ -350,6 +370,8 @@ set(SRC_UTIL_HEADERS
../util/types_int3_impl.h
../util/types_int4.h
../util/types_int4_impl.h
../util/types_int8.h
../util/types_int8_impl.h
../util/types_spectrum.h
../util/types_uchar2.h
../util/types_uchar2_impl.h
@@ -660,6 +682,16 @@ if(WITH_CYCLES_DEVICE_OPTIX AND WITH_CYCLES_CUDA_BINARIES)
kernel_optix_shader_raytrace
"device/optix/kernel_shader_raytrace.cu"
"--keep-device-functions")
if(WITH_CYCLES_OSL AND (OSL_LIBRARY_VERSION_MINOR GREATER_EQUAL 13 OR OSL_LIBRARY_VERSION_MAJOR GREATER 1))
CYCLES_OPTIX_KERNEL_ADD(
kernel_optix_osl
"device/optix/kernel_osl.cu"
"--relocatable-device-code=true")
CYCLES_OPTIX_KERNEL_ADD(
kernel_optix_osl_services
"osl/services_optix.cu"
"--relocatable-device-code=true")
endif()
add_custom_target(cycles_kernel_optix ALL DEPENDS ${optix_ptx})
cycles_set_solution_folder(cycles_kernel_optix)
@@ -947,6 +979,7 @@ source_group("geom" FILES ${SRC_KERNEL_GEOM_HEADERS})
source_group("integrator" FILES ${SRC_KERNEL_INTEGRATOR_HEADERS})
source_group("kernel" FILES ${SRC_KERNEL_TYPES_HEADERS})
source_group("light" FILES ${SRC_KERNEL_LIGHT_HEADERS})
source_group("osl" FILES ${SRC_KERNEL_OSL_HEADERS})
source_group("sample" FILES ${SRC_KERNEL_SAMPLE_HEADERS})
source_group("svm" FILES ${SRC_KERNEL_SVM_HEADERS})
source_group("util" FILES ${SRC_KERNEL_UTIL_HEADERS})
@@ -983,6 +1016,7 @@ delayed_install(${CMAKE_CURRENT_SOURCE_DIR} "${SRC_KERNEL_FILM_HEADERS}" ${CYCLE
delayed_install(${CMAKE_CURRENT_SOURCE_DIR} "${SRC_KERNEL_GEOM_HEADERS}" ${CYCLES_INSTALL_PATH}/source/kernel/geom)
delayed_install(${CMAKE_CURRENT_SOURCE_DIR} "${SRC_KERNEL_INTEGRATOR_HEADERS}" ${CYCLES_INSTALL_PATH}/source/kernel/integrator)
delayed_install(${CMAKE_CURRENT_SOURCE_DIR} "${SRC_KERNEL_LIGHT_HEADERS}" ${CYCLES_INSTALL_PATH}/source/kernel/light)
delayed_install(${CMAKE_CURRENT_SOURCE_DIR} "${SRC_KERNEL_OSL_HEADERS}" ${CYCLES_INSTALL_PATH}/source/kernel/osl)
delayed_install(${CMAKE_CURRENT_SOURCE_DIR} "${SRC_KERNEL_SAMPLE_HEADERS}" ${CYCLES_INSTALL_PATH}/source/kernel/sample)
delayed_install(${CMAKE_CURRENT_SOURCE_DIR} "${SRC_KERNEL_SVM_HEADERS}" ${CYCLES_INSTALL_PATH}/source/kernel/svm)
delayed_install(${CMAKE_CURRENT_SOURCE_DIR} "${SRC_KERNEL_TYPES_HEADERS}" ${CYCLES_INSTALL_PATH}/source/kernel)


@@ -297,8 +297,10 @@ ccl_device_inline void bsdf_roughness_eta(const KernelGlobals kg,
ccl_private float2 *roughness,
ccl_private float *eta)
{
#ifdef __SVM__
bool refractive = false;
float alpha = 1.0f;
#endif
switch (sc->type) {
case CLOSURE_BSDF_DIFFUSE_ID:
*roughness = one_float2();
@@ -578,11 +580,11 @@ ccl_device_inline
case CLOSURE_BSDF_MICROFACET_GGX_FRESNEL_ID:
case CLOSURE_BSDF_MICROFACET_GGX_CLEARCOAT_ID:
case CLOSURE_BSDF_MICROFACET_GGX_REFRACTION_ID:
eval = bsdf_microfacet_ggx_eval(sc, sd->N, sd->I, omega_in, pdf);
eval = bsdf_microfacet_ggx_eval(sc, sd->I, omega_in, pdf);
break;
case CLOSURE_BSDF_MICROFACET_MULTI_GGX_ID:
case CLOSURE_BSDF_MICROFACET_MULTI_GGX_FRESNEL_ID:
eval = bsdf_microfacet_multi_ggx_eval(sc, sd->N, sd->I, omega_in, pdf, &sd->lcg_state);
eval = bsdf_microfacet_multi_ggx_eval(sc, sd->I, omega_in, pdf, &sd->lcg_state);
break;
case CLOSURE_BSDF_MICROFACET_MULTI_GGX_GLASS_ID:
case CLOSURE_BSDF_MICROFACET_MULTI_GGX_GLASS_FRESNEL_ID:
@@ -590,10 +592,10 @@ ccl_device_inline
break;
case CLOSURE_BSDF_MICROFACET_BECKMANN_ID:
case CLOSURE_BSDF_MICROFACET_BECKMANN_REFRACTION_ID:
eval = bsdf_microfacet_beckmann_eval(sc, sd->N, sd->I, omega_in, pdf);
eval = bsdf_microfacet_beckmann_eval(sc, sd->I, omega_in, pdf);
break;
case CLOSURE_BSDF_ASHIKHMIN_SHIRLEY_ID:
eval = bsdf_ashikhmin_shirley_eval(sc, sd->N, sd->I, omega_in, pdf);
eval = bsdf_ashikhmin_shirley_eval(sc, sd->I, omega_in, pdf);
break;
case CLOSURE_BSDF_ASHIKHMIN_VELVET_ID:
eval = bsdf_ashikhmin_velvet_eval(sc, sd->I, omega_in, pdf);


@@ -40,13 +40,11 @@ ccl_device_inline float bsdf_ashikhmin_shirley_roughness_to_exponent(float rough
}
ccl_device_forceinline Spectrum bsdf_ashikhmin_shirley_eval(ccl_private const ShaderClosure *sc,
const float3 Ng,
const float3 I,
const float3 omega_in,
ccl_private float *pdf)
{
ccl_private const MicrofacetBsdf *bsdf = (ccl_private const MicrofacetBsdf *)sc;
const float cosNgI = dot(Ng, omega_in);
float3 N = bsdf->N;
float NdotI = dot(N, I); /* in Cycles/OSL convention I is omega_out */
@@ -54,8 +52,7 @@ ccl_device_forceinline Spectrum bsdf_ashikhmin_shirley_eval(ccl_private const Sh
float out = 0.0f;
if ((cosNgI < 0.0f) || fmaxf(bsdf->alpha_x, bsdf->alpha_y) <= 1e-4f ||
!(NdotI > 0.0f && NdotO > 0.0f)) {
if (fmaxf(bsdf->alpha_x, bsdf->alpha_y) <= 1e-4f || !(NdotI > 0.0f && NdotO > 0.0f)) {
*pdf = 0.0f;
return zero_spectrum();
}
@@ -213,7 +210,7 @@ ccl_device int bsdf_ashikhmin_shirley_sample(ccl_private const ShaderClosure *sc
}
else {
/* leave the rest to eval */
*eval = bsdf_ashikhmin_shirley_eval(sc, N, I, *omega_in, pdf);
*eval = bsdf_ashikhmin_shirley_eval(sc, I, *omega_in, pdf);
}
return label;

View File

@@ -517,30 +517,27 @@ ccl_device Spectrum bsdf_microfacet_ggx_eval_transmit(ccl_private const Microfac
}
ccl_device Spectrum bsdf_microfacet_ggx_eval(ccl_private const ShaderClosure *sc,
const float3 Ng,
const float3 I,
const float3 omega_in,
ccl_private float *pdf)
{
ccl_private const MicrofacetBsdf *bsdf = (ccl_private const MicrofacetBsdf *)sc;
const bool m_refractive = bsdf->type == CLOSURE_BSDF_MICROFACET_GGX_REFRACTION_ID;
const float alpha_x = bsdf->alpha_x;
const float alpha_y = bsdf->alpha_y;
const float cosNgI = dot(Ng, omega_in);
if (((cosNgI < 0.0f) != m_refractive) || alpha_x * alpha_y <= 1e-7f) {
*pdf = 0.0f;
return zero_spectrum();
}
const bool m_refractive = bsdf->type == CLOSURE_BSDF_MICROFACET_GGX_REFRACTION_ID;
const float3 N = bsdf->N;
const float cosNO = dot(N, I);
const float cosNI = dot(N, omega_in);
return (cosNgI < 0.0f) ? bsdf_microfacet_ggx_eval_transmit(
bsdf, N, I, omega_in, pdf, alpha_x, alpha_y, cosNO, cosNI) :
bsdf_microfacet_ggx_eval_reflect(
bsdf, N, I, omega_in, pdf, alpha_x, alpha_y, cosNO, cosNI);
if (((cosNI < 0.0f) != m_refractive) || alpha_x * alpha_y <= 1e-7f) {
*pdf = 0.0f;
return zero_spectrum();
}
return (cosNI < 0.0f) ? bsdf_microfacet_ggx_eval_transmit(
bsdf, N, I, omega_in, pdf, alpha_x, alpha_y, cosNO, cosNI) :
bsdf_microfacet_ggx_eval_reflect(
bsdf, N, I, omega_in, pdf, alpha_x, alpha_y, cosNO, cosNI);
}
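The pattern repeated across these eval hunks: the geometric normal Ng argument is dropped, and the transmit/reflect split is decided by the sign of dot(N, omega_in) against the shading normal, with the degenerate-roughness rejection folded into the same early-out. A minimal standalone sketch of that dispatch shape (simplified types and hypothetical eval helpers, not the actual Cycles signatures):

struct Vec3 {
  float x, y, z;
};

static float dot(const Vec3 &a, const Vec3 &b)
{
  return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Hypothetical stand-ins for the real eval_reflect/eval_transmit helpers. */
static float eval_reflect(float cos_NO, float cos_NI)
{
  return cos_NO * cos_NI;
}
static float eval_transmit(float cos_NO, float cos_NI)
{
  return cos_NO * -cos_NI;
}

/* Mirrors the refactored dispatch: reject sign/roughness mismatches first,
 * then pick transmit vs. reflect from the sign of cos(N, omega_in). */
static float microfacet_eval(const Vec3 &N,
                             const Vec3 &I,
                             const Vec3 &omega_in,
                             bool refractive,
                             float alpha_x,
                             float alpha_y,
                             float *pdf)
{
  const float cos_NO = dot(N, I);
  const float cos_NI = dot(N, omega_in);
  if (((cos_NI < 0.0f) != refractive) || alpha_x * alpha_y <= 1e-7f) {
    *pdf = 0.0f;
    return 0.0f;
  }
  return (cos_NI < 0.0f) ? eval_transmit(cos_NO, cos_NI) : eval_reflect(cos_NO, cos_NI);
}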
ccl_device int bsdf_microfacet_ggx_sample(KernelGlobals kg,
@@ -945,26 +942,23 @@ ccl_device Spectrum bsdf_microfacet_beckmann_eval_transmit(ccl_private const Mic
}
ccl_device Spectrum bsdf_microfacet_beckmann_eval(ccl_private const ShaderClosure *sc,
const float3 Ng,
const float3 I,
const float3 omega_in,
ccl_private float *pdf)
{
ccl_private const MicrofacetBsdf *bsdf = (ccl_private const MicrofacetBsdf *)sc;
const bool m_refractive = bsdf->type == CLOSURE_BSDF_MICROFACET_BECKMANN_REFRACTION_ID;
const float alpha_x = bsdf->alpha_x;
const float alpha_y = bsdf->alpha_y;
const float cosNgI = dot(Ng, omega_in);
if (((cosNgI < 0.0f) != m_refractive) || alpha_x * alpha_y <= 1e-7f) {
*pdf = 0.0f;
return zero_spectrum();
}
const bool m_refractive = bsdf->type == CLOSURE_BSDF_MICROFACET_BECKMANN_REFRACTION_ID;
const float3 N = bsdf->N;
const float cosNO = dot(N, I);
const float cosNI = dot(N, omega_in);
if (((cosNI < 0.0f) != m_refractive) || alpha_x * alpha_y <= 1e-7f) {
*pdf = 0.0f;
return zero_spectrum();
}
return (cosNI < 0.0f) ? bsdf_microfacet_beckmann_eval_transmit(
bsdf, N, I, omega_in, pdf, alpha_x, alpha_y, cosNO, cosNI) :
bsdf_microfacet_beckmann_eval_reflect(

View File

@@ -416,16 +416,14 @@ ccl_device int bsdf_microfacet_multi_ggx_refraction_setup(ccl_private Microfacet
}
ccl_device Spectrum bsdf_microfacet_multi_ggx_eval(ccl_private const ShaderClosure *sc,
const float3 Ng,
const float3 I,
const float3 omega_in,
ccl_private float *pdf,
ccl_private uint *lcg_state)
{
ccl_private const MicrofacetBsdf *bsdf = (ccl_private const MicrofacetBsdf *)sc;
const float cosNgI = dot(Ng, omega_in);
if ((cosNgI < 0.0f) || bsdf->alpha_x * bsdf->alpha_y < 1e-7f) {
if (bsdf->alpha_x * bsdf->alpha_y < 1e-7f) {
*pdf = 0.0f;
return zero_spectrum();
}

View File

@@ -7,6 +7,7 @@
* one with SSE2 intrinsics.
*/
#if defined(__x86_64__) || defined(_M_X64)
# define __KERNEL_SSE__
# define __KERNEL_SSE2__
#endif
@@ -29,11 +30,15 @@
# define __KERNEL_SSE41__
# endif
# ifdef __AVX__
# define __KERNEL_SSE__
# ifndef __KERNEL_SSE__
# define __KERNEL_SSE__
# endif
# define __KERNEL_AVX__
# endif
# ifdef __AVX2__
# define __KERNEL_SSE__
# ifndef __KERNEL_SSE__
# define __KERNEL_SSE__
# endif
# define __KERNEL_AVX2__
# endif
#endif

View File

@@ -30,6 +30,7 @@ typedef unsigned long long uint64_t;
/* Qualifiers */
#define ccl_device __device__ __inline__
#define ccl_device_extern extern "C" __device__
#if __CUDA_ARCH__ < 500
# define ccl_device_inline __device__ __forceinline__
# define ccl_device_forceinline __device__ __forceinline__
@@ -109,14 +110,14 @@ ccl_device_forceinline T ccl_gpu_tex_object_read_3D(const ccl_gpu_tex_object_3D
typedef unsigned short half;
__device__ half __float2half(const float f)
ccl_device_forceinline half __float2half(const float f)
{
half val;
asm("{ cvt.rn.f16.f32 %0, %1;}\n" : "=h"(val) : "f"(f));
return val;
}
__device__ float __half2float(const half h)
ccl_device_forceinline float __half2float(const half h)
{
float val;
asm("{ cvt.f32.f16 %0, %1;}\n" : "=f"(val) : "h"(h));

View File

@@ -28,6 +28,7 @@ typedef unsigned long long uint64_t;
/* Qualifiers */
#define ccl_device __device__ __inline__
#define ccl_device_extern extern "C" __device__
#define ccl_device_inline __device__ __inline__
#define ccl_device_forceinline __device__ __forceinline__
#define ccl_device_noinline __device__ __noinline__

View File

@@ -38,6 +38,7 @@ using namespace metal::raytracing;
# define ccl_device_noinline ccl_device __attribute__((noinline))
#endif
#define ccl_device_extern extern "C"
#define ccl_device_noinline_cpu ccl_device
#define ccl_device_inline_method ccl_device
#define ccl_global device

View File

@@ -28,6 +28,7 @@
/* Qualifier wrappers for different names on different devices */
#define ccl_device
#define ccl_device_extern extern "C"
#define ccl_global
#define ccl_always_inline __attribute__((always_inline))
#define ccl_device_inline inline

View File

@@ -33,14 +33,16 @@ typedef unsigned long long uint64_t;
#endif
#define ccl_device \
__device__ __forceinline__ // Function calls are bad for OptiX performance, so inline everything
static __device__ \
__forceinline__ // Function calls are bad for OptiX performance, so inline everything
#define ccl_device_extern extern "C" __device__
#define ccl_device_inline ccl_device
#define ccl_device_forceinline ccl_device
#define ccl_device_inline_method ccl_device
#define ccl_device_noinline __device__ __noinline__
#define ccl_device_inline_method __device__ __forceinline__
#define ccl_device_noinline static __device__ __noinline__
#define ccl_device_noinline_cpu ccl_device
#define ccl_global
#define ccl_inline_constant __constant__
#define ccl_inline_constant static __constant__
#define ccl_device_constant __constant__ __device__
#define ccl_constant const
#define ccl_gpu_shared __shared__
@@ -57,23 +59,6 @@ typedef unsigned long long uint64_t;
#define kernel_assert(cond)
/* GPU thread, block, grid size and index */
#define ccl_gpu_thread_idx_x (threadIdx.x)
#define ccl_gpu_block_dim_x (blockDim.x)
#define ccl_gpu_block_idx_x (blockIdx.x)
#define ccl_gpu_grid_dim_x (gridDim.x)
#define ccl_gpu_warp_size (warpSize)
#define ccl_gpu_thread_mask(thread_warp) uint(0xFFFFFFFF >> (ccl_gpu_warp_size - thread_warp))
#define ccl_gpu_global_id_x() (ccl_gpu_block_idx_x * ccl_gpu_block_dim_x + ccl_gpu_thread_idx_x)
#define ccl_gpu_global_size_x() (ccl_gpu_grid_dim_x * ccl_gpu_block_dim_x)
/* GPU warp synchronization. */
#define ccl_gpu_syncthreads() __syncthreads()
#define ccl_gpu_ballot(predicate) __ballot_sync(0xFFFFFFFF, predicate)
/* GPU texture objects */
typedef unsigned long long CUtexObject;
@@ -101,14 +86,14 @@ ccl_device_forceinline T ccl_gpu_tex_object_read_3D(const ccl_gpu_tex_object_3D
typedef unsigned short half;
__device__ half __float2half(const float f)
ccl_device_forceinline half __float2half(const float f)
{
half val;
asm("{ cvt.rn.f16.f32 %0, %1;}\n" : "=h"(val) : "f"(f));
return val;
}
__device__ float __half2float(const half h)
ccl_device_forceinline float __half2float(const half h)
{
float val;
asm("{ cvt.f32.f16 %0, %1;}\n" : "=f"(val) : "h"(h));

View File

@@ -25,6 +25,7 @@ struct KernelParamsOptiX {
/* Kernel arguments */
const int *path_index_array;
float *render_buffer;
int offset;
/* Global scene data and textures */
KernelData data;
@@ -36,7 +37,11 @@ struct KernelParamsOptiX {
};
#ifdef __NVCC__
extern "C" static __constant__ KernelParamsOptiX kernel_params;
extern "C"
# ifndef __CUDACC_RDC__
static
# endif
__constant__ KernelParamsOptiX kernel_params;
#endif
/* Abstraction macros */

View File

@@ -0,0 +1,83 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2011-2022 Blender Foundation */
#define WITH_OSL
/* Copy of the regular OptiX kernels with additional OSL support. */
#include "kernel/device/optix/kernel_shader_raytrace.cu"
#include "kernel/bake/bake.h"
#include "kernel/integrator/shade_background.h"
#include "kernel/integrator/shade_light.h"
#include "kernel/integrator/shade_shadow.h"
#include "kernel/integrator/shade_volume.h"
extern "C" __global__ void __raygen__kernel_optix_integrator_shade_background()
{
const int global_index = optixGetLaunchIndex().x;
const int path_index = (kernel_params.path_index_array) ?
kernel_params.path_index_array[global_index] :
global_index;
integrator_shade_background(nullptr, path_index, kernel_params.render_buffer);
}
extern "C" __global__ void __raygen__kernel_optix_integrator_shade_light()
{
const int global_index = optixGetLaunchIndex().x;
const int path_index = (kernel_params.path_index_array) ?
kernel_params.path_index_array[global_index] :
global_index;
integrator_shade_light(nullptr, path_index, kernel_params.render_buffer);
}
extern "C" __global__ void __raygen__kernel_optix_integrator_shade_surface()
{
const int global_index = optixGetLaunchIndex().x;
const int path_index = (kernel_params.path_index_array) ?
kernel_params.path_index_array[global_index] :
global_index;
integrator_shade_surface(nullptr, path_index, kernel_params.render_buffer);
}
extern "C" __global__ void __raygen__kernel_optix_integrator_shade_volume()
{
const int global_index = optixGetLaunchIndex().x;
const int path_index = (kernel_params.path_index_array) ?
kernel_params.path_index_array[global_index] :
global_index;
integrator_shade_volume(nullptr, path_index, kernel_params.render_buffer);
}
extern "C" __global__ void __raygen__kernel_optix_integrator_shade_shadow()
{
const int global_index = optixGetLaunchIndex().x;
const int path_index = (kernel_params.path_index_array) ?
kernel_params.path_index_array[global_index] :
global_index;
integrator_shade_shadow(nullptr, path_index, kernel_params.render_buffer);
}
extern "C" __global__ void __raygen__kernel_optix_shader_eval_displace()
{
KernelShaderEvalInput *const input = (KernelShaderEvalInput *)kernel_params.path_index_array;
float *const output = kernel_params.render_buffer;
const int global_index = kernel_params.offset + optixGetLaunchIndex().x;
kernel_displace_evaluate(nullptr, input, output, global_index);
}
extern "C" __global__ void __raygen__kernel_optix_shader_eval_background()
{
KernelShaderEvalInput *const input = (KernelShaderEvalInput *)kernel_params.path_index_array;
float *const output = kernel_params.render_buffer;
const int global_index = kernel_params.offset + optixGetLaunchIndex().x;
kernel_background_evaluate(nullptr, input, output, global_index);
}
extern "C" __global__ void __raygen__kernel_optix_shader_eval_curve_shadow_transparency()
{
KernelShaderEvalInput *const input = (KernelShaderEvalInput *)kernel_params.path_index_array;
float *const output = kernel_params.render_buffer;
const int global_index = kernel_params.offset + optixGetLaunchIndex().x;
kernel_curve_shadow_transparency_evaluate(nullptr, input, output, global_index);
}
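Every wrapper above resolves its work index in one of two ways: the shade kernels indirect through path_index_array when the scheduler supplies a compacted list of active paths, and the shader-eval kernels offset the launch index into a larger work range. Isolated as plain C++ (helper names are illustrative):

/* Work item i maps through the array when paths are compacted; otherwise the
 * launch index is the path index itself. */
static inline int resolve_path_index(const int *path_index_array, int launch_index)
{
  return path_index_array ? path_index_array[launch_index] : launch_index;
}

/* Shader-eval kernels process a sub-range starting at kernel_params.offset. */
static inline int resolve_eval_index(int offset, int launch_index)
{
  return offset + launch_index;
}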

View File

@@ -58,13 +58,29 @@ ccl_device bool film_adaptive_sampling_convergence_check(KernelGlobals kg,
const float4 I = kernel_read_pass_float4(buffer + kernel_data.film.pass_combined);
const float sample = __float_as_uint(buffer[kernel_data.film.pass_sample_count]);
const float inv_sample = 1.0f / sample;
const float intensity_scale = kernel_data.film.exposure / sample;
/* The per pixel error as seen in section 2.1 of
* "A hierarchical automatic stopping condition for Monte Carlo global illumination" */
const float error_difference = (fabsf(I.x - A.x) + fabsf(I.y - A.y) + fabsf(I.z - A.z)) *
inv_sample;
const float error_normalize = sqrtf((I.x + I.y + I.z) * inv_sample);
intensity_scale;
const float intensity = (I.x + I.y + I.z) * intensity_scale;
/* Anything with R+G+B > 1 is highly exposed - even in sRGB that is a range
 * some displays cannot show without a significant loss of detail. Everything
 * with R+G+B > 3 is overexposed and should receive even fewer samples.
 * Filmic-like curves need the maximum sampling rate at intensities near
 * 0.1-0.2, so a threshold of 1 for R+G+B leaves an additional f-stop of
 * headroom in case it is needed for compositing.
 */
float error_normalize;
if (intensity < 1.0f) {
error_normalize = sqrtf(intensity);
}
else {
error_normalize = intensity;
}
/* A small epsilon is added to the divisor to prevent division by zero. */
const float error = error_difference / (0.0001f + error_normalize);
const bool did_converge = (error < threshold);
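Read as a formula, the revised test computes error = (|I_r - A_r| + |I_g - A_g| + |I_b - A_b|) * (exposure / N) / (1e-4 + g(b)), where b = (I_r + I_g + I_b) * exposure / N and g(b) = sqrt(b) for b < 1, else b. A self-contained sketch under those assumptions (free-function form and names are illustrative, not the kernel's API):

#include <cmath>

/* I is the combined pass and A the half-sample auxiliary pass, both summed
 * per channel; exposure, sample count and threshold come from film settings. */
static bool adaptive_sampling_converged(const float I[3],
                                        const float A[3],
                                        float exposure,
                                        float sample,
                                        float threshold)
{
  const float intensity_scale = exposure / sample;
  const float error_difference = (std::fabs(I[0] - A[0]) + std::fabs(I[1] - A[1]) +
                                  std::fabs(I[2] - A[2])) *
                                 intensity_scale;
  const float intensity = (I[0] + I[1] + I[2]) * intensity_scale;
  /* sqrt falloff below R+G+B == 1, linear above it, so bright pixels stop
   * earlier instead of soaking up samples. */
  const float error_normalize = (intensity < 1.0f) ? std::sqrt(intensity) : intensity;
  return error_difference / (0.0001f + error_normalize) < threshold;
}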

View File

@@ -157,47 +157,4 @@ ccl_device_inline void film_write_data_passes(KernelGlobals kg,
#endif
}
ccl_device_inline void film_write_data_passes_background(
KernelGlobals kg, IntegratorState state, ccl_global float *ccl_restrict render_buffer)
{
#ifdef __PASSES__
const uint32_t path_flag = INTEGRATOR_STATE(state, path, flag);
if (!(path_flag & PATH_RAY_TRANSPARENT_BACKGROUND)) {
return;
}
/* Don't write data passes for paths that were split off for shadow catchers
* to avoid double-counting. */
if (path_flag & PATH_RAY_SHADOW_CATCHER_PASS) {
return;
}
const int flag = kernel_data.film.pass_flag;
if (!(flag & PASS_ANY)) {
return;
}
if (!(path_flag & PATH_RAY_SINGLE_PASS_DONE)) {
ccl_global float *buffer = film_pass_pixel_render_buffer(kg, state, render_buffer);
if (INTEGRATOR_STATE(state, path, sample) == 0) {
if (flag & PASSMASK(DEPTH)) {
film_overwrite_pass_float(buffer + kernel_data.film.pass_depth, 0.0f);
}
if (flag & PASSMASK(OBJECT_ID)) {
film_overwrite_pass_float(buffer + kernel_data.film.pass_object_id, 0.0f);
}
if (flag & PASSMASK(MATERIAL_ID)) {
film_overwrite_pass_float(buffer + kernel_data.film.pass_material_id, 0.0f);
}
if (flag & PASSMASK(POSITION)) {
film_overwrite_pass_float3(buffer + kernel_data.film.pass_position, zero_float3());
}
}
}
#endif
}
CCL_NAMESPACE_END

View File

@@ -24,8 +24,8 @@ ccl_device void displacement_shader_eval(KernelGlobals kg,
/* this will modify sd->P */
#ifdef __OSL__
if (kg->osl) {
OSLShader::eval_displacement(kg, state, sd);
if (kernel_data.kernel_features & KERNEL_FEATURE_OSL) {
osl_eval_nodes<SHADER_TYPE_DISPLACEMENT>(kg, state, sd, 0);
}
else
#endif

View File

@@ -3,7 +3,6 @@
#pragma once
#include "kernel/film/data_passes.h"
#include "kernel/film/light_passes.h"
#include "kernel/integrator/guiding.h"
@@ -132,7 +131,6 @@ ccl_device_inline void integrate_background(KernelGlobals kg,
/* Write to render buffer. */
film_write_background(kg, state, L, transparent, is_transparent_background_ray, render_buffer);
film_write_data_passes_background(kg, state, render_buffer);
}
ccl_device_inline void integrate_distant_lights(KernelGlobals kg,

View File

@@ -827,13 +827,8 @@ ccl_device void surface_shader_eval(KernelGlobals kg,
sd->num_closure_left = max_closures;
#ifdef __OSL__
if (kg->osl) {
if (sd->object == OBJECT_NONE && sd->lamp == LAMP_NONE) {
OSLShader::eval_background(kg, state, sd, path_flag);
}
else {
OSLShader::eval_surface(kg, state, sd, path_flag);
}
if (kernel_data.kernel_features & KERNEL_FEATURE_OSL) {
osl_eval_nodes<SHADER_TYPE_SURFACE>(kg, state, sd, path_flag);
}
else
#endif

View File

@@ -491,8 +491,8 @@ ccl_device_inline void volume_shader_eval(KernelGlobals kg,
/* evaluate shader */
# ifdef __OSL__
if (kg->osl) {
OSLShader::eval_volume(kg, state, sd, path_flag);
if (kernel_data.kernel_features & KERNEL_FEATURE_OSL) {
osl_eval_nodes<SHADER_TYPE_VOLUME>(kg, state, sd, path_flag);
}
else
# endif
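All three eval paths replace the CPU-only kg->osl pointer check with a feature-flag test, which also works in GPU kernels where no such pointer exists. The test itself is plain bit masking against the flag declared in the kernel types hunk further down, roughly:

#include <cstdint>

/* Feature bits as in the KernelFeatureFlag hunk below; the call sites above
 * reduce to a mask test that behaves identically on CPU and GPU. */
enum : uint32_t {
  KERNEL_FEATURE_PATH_GUIDING = (1u << 26),
  KERNEL_FEATURE_OSL = (1u << 27),
};

static inline bool kernel_has_osl(uint32_t kernel_features)
{
  return (kernel_features & KERNEL_FEATURE_OSL) != 0;
}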

View File

@@ -25,13 +25,18 @@
#include "kernel/osl/osl.h"
#include "kernel/osl/closures_setup.h"
#define TO_VEC3(v) OSL::Vec3(v.x, v.y, v.z)
#define TO_FLOAT3(v) make_float3(v[0], v[1], v[2])
CCL_NAMESPACE_BEGIN
static_assert(sizeof(OSLClosure) == sizeof(OSL::ClosureColor) &&
sizeof(OSLClosureAdd) == sizeof(OSL::ClosureAdd) &&
sizeof(OSLClosureMul) == sizeof(OSL::ClosureMul) &&
sizeof(OSLClosureComponent) == sizeof(OSL::ClosureComponent));
static_assert(sizeof(ShaderGlobals) == sizeof(OSL::ShaderGlobals) &&
offsetof(ShaderGlobals, Ci) == offsetof(OSL::ShaderGlobals, Ci));
/* Registration */
#define OSL_CLOSURE_STRUCT_BEGIN(Upper, lower) \
@@ -60,53 +65,18 @@ void OSLRenderServices::register_closures(OSL::ShadingSystem *ss)
#include "closures_template.h"
}
/* Globals */
/* Surface & Background */
static void shaderdata_to_shaderglobals(const KernelGlobalsCPU *kg,
ShaderData *sd,
const void *state,
uint32_t path_flag,
OSLThreadData *tdata)
template<>
void osl_eval_nodes<SHADER_TYPE_SURFACE>(const KernelGlobalsCPU *kg,
const void *state,
ShaderData *sd,
uint32_t path_flag)
{
OSL::ShaderGlobals *globals = &tdata->globals;
const differential3 dP = differential_from_compact(sd->Ng, sd->dP);
const differential3 dI = differential_from_compact(sd->I, sd->dI);
/* copy from shader data to shader globals */
globals->P = TO_VEC3(sd->P);
globals->dPdx = TO_VEC3(dP.dx);
globals->dPdy = TO_VEC3(dP.dy);
globals->I = TO_VEC3(sd->I);
globals->dIdx = TO_VEC3(dI.dx);
globals->dIdy = TO_VEC3(dI.dy);
globals->N = TO_VEC3(sd->N);
globals->Ng = TO_VEC3(sd->Ng);
globals->u = sd->u;
globals->dudx = sd->du.dx;
globals->dudy = sd->du.dy;
globals->v = sd->v;
globals->dvdx = sd->dv.dx;
globals->dvdy = sd->dv.dy;
globals->dPdu = TO_VEC3(sd->dPdu);
globals->dPdv = TO_VEC3(sd->dPdv);
globals->surfacearea = 1.0f;
globals->time = sd->time;
/* booleans */
globals->raytype = path_flag;
globals->flipHandedness = 0;
globals->backfacing = (sd->flag & SD_BACKFACING);
/* shader data to be used in services callbacks */
globals->renderstate = sd;
/* hacky, we leave it to services to fetch actual object matrix */
globals->shader2common = sd;
globals->object2common = sd;
/* must be set to NULL before execute */
globals->Ci = NULL;
/* setup shader globals from shader data */
OSLThreadData *tdata = kg->osl_tdata;
shaderdata_to_shaderglobals(
kg, sd, path_flag, reinterpret_cast<ShaderGlobals *>(&tdata->globals));
/* clear trace data */
tdata->tracedata.init = false;
@@ -121,53 +91,6 @@ static void shaderdata_to_shaderglobals(const KernelGlobalsCPU *kg,
sd->osl_path_state = (const IntegratorStateCPU *)state;
sd->osl_shadow_path_state = nullptr;
}
}
static void flatten_closure_tree(const KernelGlobalsCPU *kg,
ShaderData *sd,
uint32_t path_flag,
const OSL::ClosureColor *closure,
float3 weight = make_float3(1.0f, 1.0f, 1.0f))
{
/* OSL gives us a closure tree, we flatten it into arrays per
* closure type, for evaluation, sampling, etc later on. */
switch (closure->id) {
case OSL::ClosureColor::MUL: {
OSL::ClosureMul *mul = (OSL::ClosureMul *)closure;
flatten_closure_tree(kg, sd, path_flag, mul->closure, TO_FLOAT3(mul->weight) * weight);
break;
}
case OSL::ClosureColor::ADD: {
OSL::ClosureAdd *add = (OSL::ClosureAdd *)closure;
flatten_closure_tree(kg, sd, path_flag, add->closureA, weight);
flatten_closure_tree(kg, sd, path_flag, add->closureB, weight);
break;
}
#define OSL_CLOSURE_STRUCT_BEGIN(Upper, lower) \
case OSL_CLOSURE_##Upper##_ID: { \
const OSL::ClosureComponent *comp = reinterpret_cast<const OSL::ClosureComponent *>(closure); \
weight *= TO_FLOAT3(comp->w); \
osl_closure_##lower##_setup( \
kg, sd, path_flag, weight, reinterpret_cast<const Upper##Closure *>(comp + 1)); \
break; \
}
#include "closures_template.h"
default:
break;
}
}
/* Surface */
void OSLShader::eval_surface(const KernelGlobalsCPU *kg,
const void *state,
ShaderData *sd,
uint32_t path_flag)
{
/* setup shader globals from shader data */
OSLThreadData *tdata = kg->osl_tdata;
shaderdata_to_shaderglobals(kg, sd, state, path_flag, tdata);
/* execute shader for this point */
OSL::ShadingSystem *ss = (OSL::ShadingSystem *)kg->osl_ss;
@@ -175,101 +98,99 @@ void OSLShader::eval_surface(const KernelGlobalsCPU *kg,
OSL::ShadingContext *octx = tdata->context;
int shader = sd->shader & SHADER_MASK;
/* automatic bump shader */
if (kg->osl->bump_state[shader]) {
/* save state */
const float3 P = sd->P;
const float dP = sd->dP;
const OSL::Vec3 dPdx = globals->dPdx;
const OSL::Vec3 dPdy = globals->dPdy;
if (sd->object == OBJECT_NONE && sd->lamp == LAMP_NONE) {
/* background */
if (kg->osl->background_state) {
ss->execute(octx, *(kg->osl->background_state), *globals);
}
}
else {
/* automatic bump shader */
if (kg->osl->bump_state[shader]) {
/* save state */
const float3 P = sd->P;
const float dP = sd->dP;
const OSL::Vec3 dPdx = globals->dPdx;
const OSL::Vec3 dPdy = globals->dPdy;
/* set state as if undisplaced */
if (sd->flag & SD_HAS_DISPLACEMENT) {
float data[9];
bool found = kg->osl->services->get_attribute(sd,
true,
OSLRenderServices::u_empty,
TypeDesc::TypeVector,
OSLRenderServices::u_geom_undisplaced,
data);
(void)found;
assert(found);
/* set state as if undisplaced */
if (sd->flag & SD_HAS_DISPLACEMENT) {
float data[9];
bool found = kg->osl->services->get_attribute(sd,
true,
OSLRenderServices::u_empty,
TypeDesc::TypeVector,
OSLRenderServices::u_geom_undisplaced,
data);
(void)found;
assert(found);
differential3 tmp_dP;
memcpy(&sd->P, data, sizeof(float) * 3);
memcpy(&tmp_dP.dx, data + 3, sizeof(float) * 3);
memcpy(&tmp_dP.dy, data + 6, sizeof(float) * 3);
differential3 tmp_dP;
memcpy(&sd->P, data, sizeof(float) * 3);
memcpy(&tmp_dP.dx, data + 3, sizeof(float) * 3);
memcpy(&tmp_dP.dy, data + 6, sizeof(float) * 3);
object_position_transform(kg, sd, &sd->P);
object_dir_transform(kg, sd, &tmp_dP.dx);
object_dir_transform(kg, sd, &tmp_dP.dy);
object_position_transform(kg, sd, &sd->P);
object_dir_transform(kg, sd, &tmp_dP.dx);
object_dir_transform(kg, sd, &tmp_dP.dy);
sd->dP = differential_make_compact(tmp_dP);
sd->dP = differential_make_compact(tmp_dP);
globals->P = TO_VEC3(sd->P);
globals->dPdx = TO_VEC3(tmp_dP.dx);
globals->dPdy = TO_VEC3(tmp_dP.dy);
globals->P = TO_VEC3(sd->P);
globals->dPdx = TO_VEC3(tmp_dP.dx);
globals->dPdy = TO_VEC3(tmp_dP.dy);
}
/* execute bump shader */
ss->execute(octx, *(kg->osl->bump_state[shader]), *globals);
/* reset state */
sd->P = P;
sd->dP = dP;
globals->P = TO_VEC3(P);
globals->dPdx = TO_VEC3(dPdx);
globals->dPdy = TO_VEC3(dPdy);
}
/* execute bump shader */
ss->execute(octx, *(kg->osl->bump_state[shader]), *globals);
/* reset state */
sd->P = P;
sd->dP = dP;
globals->P = TO_VEC3(P);
globals->dPdx = TO_VEC3(dPdx);
globals->dPdy = TO_VEC3(dPdy);
}
/* surface shader */
if (kg->osl->surface_state[shader]) {
ss->execute(octx, *(kg->osl->surface_state[shader]), *globals);
/* surface shader */
if (kg->osl->surface_state[shader]) {
ss->execute(octx, *(kg->osl->surface_state[shader]), *globals);
}
}
/* flatten closure tree */
if (globals->Ci) {
flatten_closure_tree(kg, sd, path_flag, globals->Ci);
}
}
/* Background */
void OSLShader::eval_background(const KernelGlobalsCPU *kg,
const void *state,
ShaderData *sd,
uint32_t path_flag)
{
/* setup shader globals from shader data */
OSLThreadData *tdata = kg->osl_tdata;
shaderdata_to_shaderglobals(kg, sd, state, path_flag, tdata);
/* execute shader for this point */
OSL::ShadingSystem *ss = (OSL::ShadingSystem *)kg->osl_ss;
OSL::ShaderGlobals *globals = &tdata->globals;
OSL::ShadingContext *octx = tdata->context;
if (kg->osl->background_state) {
ss->execute(octx, *(kg->osl->background_state), *globals);
}
/* return background color immediately */
if (globals->Ci) {
flatten_closure_tree(kg, sd, path_flag, globals->Ci);
flatten_closure_tree(kg, sd, path_flag, reinterpret_cast<OSLClosure *>(globals->Ci));
}
}
/* Volume */
void OSLShader::eval_volume(const KernelGlobalsCPU *kg,
const void *state,
ShaderData *sd,
uint32_t path_flag)
template<>
void osl_eval_nodes<SHADER_TYPE_VOLUME>(const KernelGlobalsCPU *kg,
const void *state,
ShaderData *sd,
uint32_t path_flag)
{
/* setup shader globals from shader data */
OSLThreadData *tdata = kg->osl_tdata;
shaderdata_to_shaderglobals(kg, sd, state, path_flag, tdata);
shaderdata_to_shaderglobals(
kg, sd, path_flag, reinterpret_cast<ShaderGlobals *>(&tdata->globals));
/* clear trace data */
tdata->tracedata.init = false;
/* Used by render-services. */
sd->osl_globals = kg;
if (path_flag & PATH_RAY_SHADOW) {
sd->osl_path_state = nullptr;
sd->osl_shadow_path_state = (const IntegratorShadowStateCPU *)state;
}
else {
sd->osl_path_state = (const IntegratorStateCPU *)state;
sd->osl_shadow_path_state = nullptr;
}
/* execute shader */
OSL::ShadingSystem *ss = (OSL::ShadingSystem *)kg->osl_ss;
@@ -283,17 +204,30 @@ void OSLShader::eval_volume(const KernelGlobalsCPU *kg,
/* flatten closure tree */
if (globals->Ci) {
flatten_closure_tree(kg, sd, path_flag, globals->Ci);
flatten_closure_tree(kg, sd, path_flag, reinterpret_cast<OSLClosure *>(globals->Ci));
}
}
/* Displacement */
void OSLShader::eval_displacement(const KernelGlobalsCPU *kg, const void *state, ShaderData *sd)
template<>
void osl_eval_nodes<SHADER_TYPE_DISPLACEMENT>(const KernelGlobalsCPU *kg,
const void *state,
ShaderData *sd,
uint32_t path_flag)
{
/* setup shader globals from shader data */
OSLThreadData *tdata = kg->osl_tdata;
shaderdata_to_shaderglobals(kg, sd, state, 0, tdata);
shaderdata_to_shaderglobals(
kg, sd, path_flag, reinterpret_cast<ShaderGlobals *>(&tdata->globals));
/* clear trace data */
tdata->tracedata.init = false;
/* Used by render-services. */
sd->osl_globals = kg;
sd->osl_path_state = (const IntegratorStateCPU *)state;
sd->osl_shadow_path_state = nullptr;
/* execute shader */
OSL::ShadingSystem *ss = (OSL::ShadingSystem *)kg->osl_ss;

View File

@@ -40,12 +40,7 @@ CCL_NAMESPACE_BEGIN
const char *label;
#define OSL_CLOSURE_STRUCT_END(Upper, lower) \
} \
; \
ccl_device void osl_closure_##lower##_setup(KernelGlobals kg, \
ccl_private ShaderData *sd, \
uint32_t path_flag, \
float3 weight, \
ccl_private Upper##Closure *closure);
;
#define OSL_CLOSURE_STRUCT_MEMBER(Upper, TYPE, type, name, key) type name;
#define OSL_CLOSURE_STRUCT_ARRAY_MEMBER(Upper, TYPE, type, name, key, size) type name[size];
@@ -210,11 +205,9 @@ ccl_device void osl_closure_microfacet_setup(KernelGlobals kg,
bsdf->ior = closure->ior;
bsdf->T = closure->T;
static OSL::ustring u_ggx("ggx");
static OSL::ustring u_default("default");
/* GGX */
if (closure->distribution == u_ggx || closure->distribution == u_default) {
if (closure->distribution == make_string("ggx", 11253504724482777663ull) ||
closure->distribution == make_string("default", 4430693559278735917ull)) {
if (!closure->refract) {
if (closure->alpha_x == closure->alpha_y) {
/* Isotropic */
@@ -1000,18 +993,14 @@ ccl_device void osl_closure_bssrdf_setup(KernelGlobals kg,
float3 weight,
ccl_private const BSSRDFClosure *closure)
{
static ustring u_burley("burley");
static ustring u_random_walk_fixed_radius("random_walk_fixed_radius");
static ustring u_random_walk("random_walk");
ClosureType type;
if (closure->method == u_burley) {
if (closure->method == make_string("burley", 186330084368958868ull)) {
type = CLOSURE_BSSRDF_BURLEY_ID;
}
else if (closure->method == u_random_walk_fixed_radius) {
else if (closure->method == make_string("random_walk_fixed_radius", 5695810351010063150ull)) {
type = CLOSURE_BSSRDF_RANDOM_WALK_FIXED_RADIUS_ID;
}
else if (closure->method == u_random_walk) {
else if (closure->method == make_string("random_walk", 11360609267673527222ull)) {
type = CLOSURE_BSSRDF_RANDOM_WALK_ID;
}
else {

View File

@@ -40,7 +40,7 @@ OSL_CLOSURE_STRUCT_BEGIN(Transparent, transparent)
OSL_CLOSURE_STRUCT_END(Transparent, transparent)
OSL_CLOSURE_STRUCT_BEGIN(Microfacet, microfacet)
OSL_CLOSURE_STRUCT_MEMBER(Microfacet, STRING, ustring, distribution, NULL)
OSL_CLOSURE_STRUCT_MEMBER(Microfacet, STRING, DeviceString, distribution, NULL)
OSL_CLOSURE_STRUCT_MEMBER(Microfacet, VECTOR, packed_float3, N, NULL)
OSL_CLOSURE_STRUCT_MEMBER(Microfacet, VECTOR, packed_float3, T, NULL)
OSL_CLOSURE_STRUCT_MEMBER(Microfacet, FLOAT, float, alpha_x, NULL)
@@ -210,7 +210,7 @@ OSL_CLOSURE_STRUCT_BEGIN(PhongRamp, phong_ramp)
OSL_CLOSURE_STRUCT_END(PhongRamp, phong_ramp)
OSL_CLOSURE_STRUCT_BEGIN(BSSRDF, bssrdf)
OSL_CLOSURE_STRUCT_MEMBER(BSSRDF, STRING, ustring, method, NULL)
OSL_CLOSURE_STRUCT_MEMBER(BSSRDF, STRING, DeviceString, method, NULL)
OSL_CLOSURE_STRUCT_MEMBER(BSSRDF, VECTOR, packed_float3, N, NULL)
OSL_CLOSURE_STRUCT_MEMBER(BSSRDF, VECTOR, packed_float3, radius, NULL)
OSL_CLOSURE_STRUCT_MEMBER(BSSRDF, VECTOR, packed_float3, albedo, NULL)

View File

@@ -1,38 +1,171 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2011-2022 Blender Foundation */
/* SPDX-License-Identifier: BSD-3-Clause
*
* Adapted from Open Shading Language
* Copyright (c) 2009-2010 Sony Pictures Imageworks Inc., et al.
* All Rights Reserved.
*
* Modifications Copyright 2011-2022 Blender Foundation. */
#pragma once
/* OSL Shader Engine
*
* Holds all variables to execute and use OSL shaders from the kernel. These
* are initialized externally by OSLShaderManager before rendering starts.
*
* Before/after a thread starts rendering, thread_init/thread_free must be
* called, which will store any per thread OSL state in thread local storage.
* This means no thread state must be passed along in the kernel itself.
* Holds all variables to execute and use OSL shaders from the kernel.
*/
#include "kernel/osl/types.h"
#include "kernel/osl/closures_setup.h"
CCL_NAMESPACE_BEGIN
class OSLShader {
public:
/* eval */
static void eval_surface(const KernelGlobalsCPU *kg,
const void *state,
ShaderData *sd,
uint32_t path_flag);
static void eval_background(const KernelGlobalsCPU *kg,
const void *state,
ShaderData *sd,
uint32_t path_flag);
static void eval_volume(const KernelGlobalsCPU *kg,
const void *state,
ShaderData *sd,
uint32_t path_flag);
static void eval_displacement(const KernelGlobalsCPU *kg, const void *state, ShaderData *sd);
};
ccl_device_inline void shaderdata_to_shaderglobals(KernelGlobals kg,
ccl_private ShaderData *sd,
uint32_t path_flag,
ccl_private ShaderGlobals *globals)
{
const differential3 dP = differential_from_compact(sd->Ng, sd->dP);
const differential3 dI = differential_from_compact(sd->I, sd->dI);
/* copy from shader data to shader globals */
globals->P = sd->P;
globals->dPdx = dP.dx;
globals->dPdy = dP.dy;
globals->I = sd->I;
globals->dIdx = dI.dx;
globals->dIdy = dI.dy;
globals->N = sd->N;
globals->Ng = sd->Ng;
globals->u = sd->u;
globals->dudx = sd->du.dx;
globals->dudy = sd->du.dy;
globals->v = sd->v;
globals->dvdx = sd->dv.dx;
globals->dvdy = sd->dv.dy;
globals->dPdu = sd->dPdu;
globals->dPdv = sd->dPdv;
globals->time = sd->time;
globals->dtime = 1.0f;
globals->surfacearea = 1.0f;
globals->raytype = path_flag;
globals->flipHandedness = 0;
globals->backfacing = (sd->flag & SD_BACKFACING);
/* shader data to be used in services callbacks */
globals->renderstate = sd;
/* hacky, we leave it to services to fetch actual object matrix */
globals->shader2common = sd;
globals->object2common = sd;
/* must be set to NULL before execute */
globals->Ci = nullptr;
}
ccl_device void flatten_closure_tree(KernelGlobals kg,
ccl_private ShaderData *sd,
uint32_t path_flag,
ccl_private const OSLClosure *closure)
{
int stack_size = 0;
float3 weight = one_float3();
float3 weight_stack[16];
ccl_private const OSLClosure *closure_stack[16];
while (closure) {
switch (closure->id) {
case OSL_CLOSURE_MUL_ID: {
ccl_private const OSLClosureMul *mul = static_cast<ccl_private const OSLClosureMul *>(
closure);
weight *= mul->weight;
closure = mul->closure;
continue;
}
case OSL_CLOSURE_ADD_ID: {
if (stack_size >= 16) {
kernel_assert(!"Exhausted OSL closure stack");
break;
}
ccl_private const OSLClosureAdd *add = static_cast<ccl_private const OSLClosureAdd *>(
closure);
closure = add->closureA;
weight_stack[stack_size] = weight;
closure_stack[stack_size++] = add->closureB;
continue;
}
#define OSL_CLOSURE_STRUCT_BEGIN(Upper, lower) \
case OSL_CLOSURE_##Upper##_ID: { \
ccl_private const OSLClosureComponent *comp = \
static_cast<ccl_private const OSLClosureComponent *>(closure); \
osl_closure_##lower##_setup(kg, \
sd, \
path_flag, \
weight * comp->weight, \
reinterpret_cast<ccl_private const Upper##Closure *>(comp + 1)); \
break; \
}
#include "closures_template.h"
default:
break;
}
if (stack_size > 0) {
weight = weight_stack[--stack_size];
closure = closure_stack[stack_size];
}
else {
closure = nullptr;
}
}
}
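flatten_closure_tree trades recursion for an explicit 16-entry stack so the traversal can run inside GPU kernels, where deep recursion is unavailable. A toy, host-only model of the same walk (the Node type is hypothetical, not the kernel's OSLClosure layout):

#include <cstdio>

/* MUL scales the running weight and descends, ADD pushes one branch on an
 * explicit stack, leaves emit their weighted contribution. */
struct Node {
  enum Kind { MUL, ADD, LEAF } kind;
  float weight;      /* used by MUL and LEAF */
  const Node *a, *b; /* children: MUL uses a, ADD uses both */
};

static void flatten(const Node *n)
{
  float weight = 1.0f;
  float weight_stack[16];
  const Node *node_stack[16];
  int top = 0;
  while (n) {
    switch (n->kind) {
      case Node::MUL:
        weight *= n->weight;
        n = n->a;
        continue;
      case Node::ADD:
        if (top < 16) {
          weight_stack[top] = weight; /* branch B resumes with this weight */
          node_stack[top++] = n->b;
          n = n->a;
          continue;
        }
        break; /* stack exhausted: drop branch B, like the kernel_assert path */
      case Node::LEAF:
        std::printf("leaf, weight %.2f\n", weight * n->weight);
        break;
    }
    if (top > 0) {
      weight = weight_stack[--top];
      n = node_stack[top];
    }
    else {
      n = nullptr;
    }
  }
}

int main()
{
  const Node l1 = {Node::LEAF, 0.5f, nullptr, nullptr};
  const Node l2 = {Node::LEAF, 1.0f, nullptr, nullptr};
  const Node add = {Node::ADD, 0.0f, &l1, &l2};
  const Node mul = {Node::MUL, 2.0f, &add, nullptr};
  flatten(&mul); /* prints 1.00 then 2.00 */
}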
#ifndef __KERNEL_GPU__
template<ShaderType type>
void osl_eval_nodes(const KernelGlobalsCPU *kg,
const void *state,
ShaderData *sd,
uint32_t path_flag);
#else
template<ShaderType type, typename ConstIntegratorGenericState>
ccl_device_inline void osl_eval_nodes(KernelGlobals kg,
ConstIntegratorGenericState state,
ccl_private ShaderData *sd,
uint32_t path_flag)
{
ShaderGlobals globals;
shaderdata_to_shaderglobals(kg, sd, path_flag, &globals);
const int shader = sd->shader & SHADER_MASK;
# ifdef __KERNEL_OPTIX__
uint8_t group_data[2048];
uint8_t closure_pool[1024];
sd->osl_closure_pool = closure_pool;
unsigned int optix_dc_index = 2 /* NUM_CALLABLE_PROGRAM_GROUPS */ +
(shader + type * kernel_data.max_shaders) * 2;
optixDirectCall<void>(optix_dc_index + 0,
/* shaderglobals_ptr = */ &globals,
/* groupdata_ptr = */ (void *)group_data,
/* userdata_base_ptr = */ (void *)nullptr,
/* output_base_ptr = */ (void *)nullptr,
/* shadeindex = */ 0);
optixDirectCall<void>(optix_dc_index + 1,
/* shaderglobals_ptr = */ &globals,
/* groupdata_ptr = */ (void *)group_data,
/* userdata_base_ptr = */ (void *)nullptr,
/* output_base_ptr = */ (void *)nullptr,
/* shadeindex = */ 0);
# endif
if (globals.Ci) {
flatten_closure_tree(kg, sd, path_flag, globals.Ci);
}
}
#endif
CCL_NAMESPACE_END
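In the OptiX branch above, each shader's compiled OSL code is reached through two consecutive optixDirectCall entries whose index derives from the shader slot and shader type. The arithmetic pulled out for readability (a sketch assuming the table layout implied by the hunk, with group init at +0 and the shader entry at +1):

/* Two built-in callables come first, then each (shader, type) pair owns two
 * consecutive direct-callable entries. Layout is inferred from the hunk. */
static inline int osl_direct_callable_base(int shader, int shader_type, int max_shaders)
{
  const int num_builtin_callables = 2; /* NUM_CALLABLE_PROGRAM_GROUPS above */
  return num_builtin_callables + (shader + shader_type * max_shaders) * 2;
}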

View File

@@ -119,8 +119,8 @@ ustring OSLRenderServices::u_u("u");
ustring OSLRenderServices::u_v("v");
ustring OSLRenderServices::u_empty;
OSLRenderServices::OSLRenderServices(OSL::TextureSystem *texture_system)
: OSL::RendererServices(texture_system)
OSLRenderServices::OSLRenderServices(OSL::TextureSystem *texture_system, int device_type)
: OSL::RendererServices(texture_system), device_type_(device_type)
{
}
@@ -131,6 +131,17 @@ OSLRenderServices::~OSLRenderServices()
}
}
int OSLRenderServices::supports(string_view feature) const
{
#ifdef WITH_OPTIX
if (feature == "OptiX") {
return device_type_ == DEVICE_OPTIX;
}
#endif
return false;
}
bool OSLRenderServices::get_matrix(OSL::ShaderGlobals *sg,
OSL::Matrix44 &result,
OSL::TransformationPtr xform,
@@ -1139,29 +1150,40 @@ TextureSystem::TextureHandle *OSLRenderServices::get_texture_handle(ustring file
{
OSLTextureHandleMap::iterator it = textures.find(filename);
/* For non-OIIO textures, just return a pointer to our own OSLTextureHandle. */
if (it != textures.end()) {
if (it->second->type != OSLTextureHandle::OIIO) {
return (TextureSystem::TextureHandle *)it->second.get();
if (device_type_ == DEVICE_CPU) {
/* For non-OIIO textures, just return a pointer to our own OSLTextureHandle. */
if (it != textures.end()) {
if (it->second->type != OSLTextureHandle::OIIO) {
return (TextureSystem::TextureHandle *)it->second.get();
}
}
/* Get handle from OpenImageIO. */
OSL::TextureSystem *ts = m_texturesys;
TextureSystem::TextureHandle *handle = ts->get_texture_handle(filename);
if (handle == NULL) {
return NULL;
}
/* Insert new OSLTextureHandle if needed. */
if (it == textures.end()) {
textures.insert(filename, new OSLTextureHandle(OSLTextureHandle::OIIO));
it = textures.find(filename);
}
/* Assign OIIO texture handle and return. */
it->second->oiio_handle = handle;
return (TextureSystem::TextureHandle *)it->second.get();
}
else {
if (it != textures.end() && it->second->type == OSLTextureHandle::SVM &&
it->second->svm_slots[0].w == -1) {
return reinterpret_cast<TextureSystem::TextureHandle *>(
static_cast<uintptr_t>(it->second->svm_slots[0].y + 1));
}
}
/* Get handle from OpenImageIO. */
OSL::TextureSystem *ts = m_texturesys;
TextureSystem::TextureHandle *handle = ts->get_texture_handle(filename);
if (handle == NULL) {
return NULL;
}
/* Insert new OSLTextureHandle if needed. */
if (it == textures.end()) {
textures.insert(filename, new OSLTextureHandle(OSLTextureHandle::OIIO));
it = textures.find(filename);
}
/* Assign OIIO texture handle and return. */
it->second->oiio_handle = handle;
return (TextureSystem::TextureHandle *)it->second.get();
}
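On non-CPU devices OIIO texture handles are unavailable, so the lookup above returns the SVM image slot itself disguised as an opaque handle, offset by one so that slot 0 stays distinct from the null handle. The encoding in isolation (helper names are illustrative, and the matching -1 on the decode side is an assumption):

#include <cstdint>

typedef void *OpaqueHandle;

/* A small non-negative SVM slot is smuggled through the opaque handle type. */
static inline OpaqueHandle encode_svm_slot(int slot)
{
  return reinterpret_cast<OpaqueHandle>(static_cast<uintptr_t>(slot) + 1);
}

static inline int decode_svm_slot(OpaqueHandle handle)
{
  return static_cast<int>(reinterpret_cast<uintptr_t>(handle)) - 1;
}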
bool OSLRenderServices::good(TextureSystem::TextureHandle *texture_handle)

View File

@@ -22,11 +22,8 @@ class PtexCache;
CCL_NAMESPACE_BEGIN
class Object;
class Scene;
class Shader;
struct ShaderData;
struct float3;
struct KernelGlobalsCPU;
/* OSL Texture Handle
@@ -73,11 +70,13 @@ typedef OIIO::unordered_map_concurrent<ustring, OSLTextureHandleRef, ustringHash
class OSLRenderServices : public OSL::RendererServices {
public:
OSLRenderServices(OSL::TextureSystem *texture_system);
OSLRenderServices(OSL::TextureSystem *texture_system, int device_type);
~OSLRenderServices();
static void register_closures(OSL::ShadingSystem *ss);
int supports(string_view feature) const override;
bool get_matrix(OSL::ShaderGlobals *sg,
OSL::Matrix44 &result,
OSL::TransformationPtr xform,
@@ -324,6 +323,9 @@ class OSLRenderServices : public OSL::RendererServices {
* and is required because texture handles are cached as part of the shared
* shading system. */
OSLTextureHandleMap textures;
private:
int device_type_;
};
CCL_NAMESPACE_END

File diff suppressed because it is too large

View File

@@ -0,0 +1,17 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2011-2022 Blender Foundation */
#define WITH_OSL
// clang-format off
#include "kernel/device/optix/compat.h"
#include "kernel/device/optix/globals.h"
#include "kernel/device/gpu/image.h" /* Texture lookup uses normal CUDA intrinsics. */
#include "kernel/osl/services_gpu.h"
// clang-format on
extern "C" __device__ void __direct_callable__dummy_services()
{
}

View File

@@ -5,9 +5,53 @@
CCL_NAMESPACE_BEGIN
struct DeviceString {
#if defined(__KERNEL_GPU__)
/* Strings are represented by their hashes in CUDA and OptiX. */
size_t str_;
ccl_device_inline_method uint64_t hash() const
{
return str_;
}
#elif defined(OPENIMAGEIO_USTRING_H)
ustring str_;
ccl_device_inline_method uint64_t hash() const
{
return str_.hash();
}
#else
const char *str_;
#endif
ccl_device_inline_method bool operator==(DeviceString b) const
{
return str_ == b.str_;
}
ccl_device_inline_method bool operator!=(DeviceString b) const
{
return str_ != b.str_;
}
};
ccl_device_inline DeviceString make_string(const char *str, size_t hash)
{
#if defined(__KERNEL_GPU__)
(void)str;
return {hash};
#elif defined(OPENIMAGEIO_USTRING_H)
(void)hash;
return {ustring(str)};
#else
(void)hash;
return {str};
#endif
}
/* Closure */
enum ClosureTypeOSL {
enum OSLClosureType {
OSL_CLOSURE_MUL_ID = -1,
OSL_CLOSURE_ADD_ID = -2,
@@ -17,4 +61,60 @@ enum ClosureTypeOSL {
#include "closures_template.h"
};
struct OSLClosure {
OSLClosureType id;
};
struct ccl_align(8) OSLClosureMul : public OSLClosure
{
packed_float3 weight;
ccl_private const OSLClosure *closure;
};
struct ccl_align(8) OSLClosureAdd : public OSLClosure
{
ccl_private const OSLClosure *closureA;
ccl_private const OSLClosure *closureB;
};
struct ccl_align(8) OSLClosureComponent : public OSLClosure
{
packed_float3 weight;
};
/* Globals */
struct ShaderGlobals {
packed_float3 P, dPdx, dPdy;
packed_float3 dPdz;
packed_float3 I, dIdx, dIdy;
packed_float3 N;
packed_float3 Ng;
float u, dudx, dudy;
float v, dvdx, dvdy;
packed_float3 dPdu, dPdv;
float time;
float dtime;
packed_float3 dPdtime;
packed_float3 Ps, dPsdx, dPsdy;
ccl_private void *renderstate;
ccl_private void *tracedata;
ccl_private void *objdata;
void *context;
void *renderer;
ccl_private void *object2common;
ccl_private void *shader2common;
ccl_private OSLClosure *Ci;
float surfacearea;
int raytype;
int flipHandedness;
int backfacing;
};
struct OSLNoiseOptions {
};
struct OSLTextureOptions {
};
CCL_NAMESPACE_END
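On CUDA and OptiX a DeviceString is nothing but its 64-bit hash, which is why every make_string() call in the closure hunks carries a hash literal computed ahead of time. A toy illustration of dispatching on such hashed strings (the constexpr FNV-1a here is a stand-in; the real constants are precomputed OSL ustring hashes):

#include <cstdint>
#include <cstdio>

/* Hypothetical compile-time hash, used only to show the mechanism. */
constexpr uint64_t hash_str(const char *s, uint64_t h = 1469598103934665603ull)
{
  return *s ? hash_str(s + 1, (h ^ uint64_t(uint8_t(*s))) * 1099511628211ull) : h;
}

enum DistType { DIST_GGX, DIST_BECKMANN, DIST_UNKNOWN };

/* GPU-style comparison: the character data is gone, only hashes compare. */
static DistType parse_distribution(uint64_t dist_hash)
{
  if (dist_hash == hash_str("ggx") || dist_hash == hash_str("default")) {
    return DIST_GGX;
  }
  if (dist_hash == hash_str("beckmann")) {
    return DIST_BECKMANN;
  }
  return DIST_UNKNOWN;
}

int main()
{
  std::printf("%d\n", parse_distribution(hash_str("ggx"))); /* prints 0 */
}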

View File

@@ -39,11 +39,11 @@ ccl_device_noinline_cpu float perlin_1d(float x)
}
/* 2D, 3D, and 4D noise can be accelerated using SSE, so we first check if
* SSE is supported, that is, if __KERNEL_SSE2__ is defined. If it is not
* SSE is supported, that is, if __KERNEL_SSE__ is defined. If it is not
* supported, we do a standard implementation, but if it is supported, we
* do an implementation using SSE intrinsics.
*/
#if !defined(__KERNEL_SSE2__)
#if !defined(__KERNEL_SSE__)
/* ** Standard Implementation ** */
@@ -250,18 +250,18 @@ ccl_device_noinline_cpu float perlin_4d(float x, float y, float z, float w)
/* SSE Bilinear Interpolation:
*
* The function takes two ssef inputs:
* The function takes two float4 inputs:
* - p : Contains the values at the points (v0, v1, v2, v3).
* - f : Contains the values (x, y, _, _). The third and fourth values are unused.
*
* The interpolation is done in two steps:
* 1. Interpolate (v0, v1) and (v2, v3) along the x axis to get g (g0, g1).
* (v2, v3) is generated by moving v2 and v3 to the first and second
* places of the ssef using the shuffle mask <2, 3, 2, 3>. The third and
* places of the float4 using the shuffle mask <2, 3, 2, 3>. The third and
* fourth values are unused.
* 2. Interpolate g0 and g1 along the y axis to get the final value.
* g1 is generated by populating an ssef with the second value of g.
* Only the first value is important in the final ssef.
* g1 is generated by populating a float4 with the second value of g.
* Only the first value is important in the final float4.
*
* v1 v3 g1
* @ + + + + @ @ y
@@ -272,27 +272,27 @@ ccl_device_noinline_cpu float perlin_4d(float x, float y, float z, float w)
* v0 v2 g0
*
*/
ccl_device_inline ssef bi_mix(ssef p, ssef f)
ccl_device_inline float4 bi_mix(float4 p, float4 f)
{
ssef g = mix(p, shuffle<2, 3, 2, 3>(p), shuffle<0>(f));
float4 g = mix(p, shuffle<2, 3, 2, 3>(p), shuffle<0>(f));
return mix(g, shuffle<1>(g), shuffle<1>(f));
}
ccl_device_inline ssef fade(const ssef &t)
ccl_device_inline float4 fade(const float4 t)
{
ssef a = madd(t, 6.0f, -15.0f);
ssef b = madd(t, a, 10.0f);
float4 a = madd(t, make_float4(6.0f), make_float4(-15.0f));
float4 b = madd(t, a, make_float4(10.0f));
return (t * t) * (t * b);
}
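fade() is the quintic smoothstep 6t^5 - 15t^4 + 10t^3; the rewrite only changes the lane type, keeping the factorization that costs two madd() calls per lane. A scalar check of that factorization (plain C++, no SIMD):

#include <cassert>
#include <cmath>

/* (t*t) * (t * (t * (6t - 15) + 10)) == 6t^5 - 15t^4 + 10t^3 */
static float fade_scalar(float t)
{
  const float a = t * 6.0f - 15.0f; /* madd(t, 6, -15) */
  const float b = t * a + 10.0f;    /* madd(t, a, 10) */
  return (t * t) * (t * b);
}

int main()
{
  for (float t = 0.0f; t <= 1.0f; t += 0.25f) {
    const float direct = 6.0f * std::pow(t, 5.0f) - 15.0f * std::pow(t, 4.0f) +
                         10.0f * std::pow(t, 3.0f);
    assert(std::fabs(fade_scalar(t) - direct) < 1e-5f);
  }
  return 0;
}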
/* Negate val if the nth bit of h is 1. */
# define negate_if_nth_bit(val, h, n) ((val) ^ cast(((h) & (1 << (n))) << (31 - (n))))
ccl_device_inline ssef grad(const ssei &hash, const ssef &x, const ssef &y)
ccl_device_inline float4 grad(const int4 hash, const float4 x, const float4 y)
{
ssei h = hash & 7;
ssef u = select(h < 4, x, y);
ssef v = 2.0f * select(h < 4, y, x);
int4 h = hash & 7;
float4 u = select(h < 4, x, y);
float4 v = 2.0f * select(h < 4, y, x);
return negate_if_nth_bit(u, h, 0) + negate_if_nth_bit(v, h, 1);
}
@@ -310,28 +310,28 @@ ccl_device_inline ssef grad(const ssei &hash, const ssef &x, const ssef &y)
*/
ccl_device_noinline_cpu float perlin_2d(float x, float y)
{
ssei XY;
ssef fxy = floorfrac(ssef(x, y, 0.0f, 0.0f), &XY);
ssef uv = fade(fxy);
int4 XY;
float4 fxy = floorfrac(make_float4(x, y, 0.0f, 0.0f), &XY);
float4 uv = fade(fxy);
ssei XY1 = XY + 1;
ssei X = shuffle<0, 0, 0, 0>(XY, XY1);
ssei Y = shuffle<0, 2, 0, 2>(shuffle<1, 1, 1, 1>(XY, XY1));
int4 XY1 = XY + make_int4(1);
int4 X = shuffle<0, 0, 0, 0>(XY, XY1);
int4 Y = shuffle<0, 2, 0, 2>(shuffle<1, 1, 1, 1>(XY, XY1));
ssei h = hash_ssei2(X, Y);
int4 h = hash_int4_2(X, Y);
ssef fxy1 = fxy - 1.0f;
ssef fx = shuffle<0, 0, 0, 0>(fxy, fxy1);
ssef fy = shuffle<0, 2, 0, 2>(shuffle<1, 1, 1, 1>(fxy, fxy1));
float4 fxy1 = fxy - make_float4(1.0f);
float4 fx = shuffle<0, 0, 0, 0>(fxy, fxy1);
float4 fy = shuffle<0, 2, 0, 2>(shuffle<1, 1, 1, 1>(fxy, fxy1));
ssef g = grad(h, fx, fy);
float4 g = grad(h, fx, fy);
return extract<0>(bi_mix(g, uv));
}
/* SSE Trilinear Interpolation:
*
* The function takes three ssef inputs:
* The function takes three float4 inputs:
* - p : Contains the values at the points (v0, v1, v2, v3).
* - q : Contains the values at the points (v4, v5, v6, v7).
* - f : Contains the values (x, y, z, _). The fourth value is unused.
@@ -340,11 +340,11 @@ ccl_device_noinline_cpu float perlin_2d(float x, float y)
* 1. Interpolate p and q along the x axis to get s (s0, s1, s2, s3).
* 2. Interpolate (s0, s1) and (s2, s3) along the y axis to get g (g0, g1).
* (s2, s3) is generated by moving s2 and s3 to the first and second
* places of the ssef using the shuffle mask <2, 3, 2, 3>. The third and
* places of the float4 using the shuffle mask <2, 3, 2, 3>. The third and
* fourth values are unused.
* 3. Interpolate g0 and g1 along the z axis to get the final value.
* g1 is generated by populating an ssef with the second value of g.
* Only the first value is important in the final ssef.
* g1 is generated by populating a float4 with the second value of g.
* Only the first value is important in the final float4.
*
* v3 v7
* @ + + + + + + @ s3 @
@@ -362,10 +362,10 @@ ccl_device_noinline_cpu float perlin_2d(float x, float y)
* @ + + + + + + @ @
* v0 v4 s0
*/
ccl_device_inline ssef tri_mix(ssef p, ssef q, ssef f)
ccl_device_inline float4 tri_mix(float4 p, float4 q, float4 f)
{
ssef s = mix(p, q, shuffle<0>(f));
ssef g = mix(s, shuffle<2, 3, 2, 3>(s), shuffle<1>(f));
float4 s = mix(p, q, shuffle<0>(f));
float4 g = mix(s, shuffle<2, 3, 2, 3>(s), shuffle<1>(f));
return mix(g, shuffle<1>(g), shuffle<2>(f));
}
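tri_mix applies the same two-shuffle trick as bi_mix, one axis deeper. In scalar form the data flow is three rounds of lerps (a sketch with a hypothetical helper name):

/* Interpolate p/q along x into s0..s3, then (s0,s2)/(s1,s3) along y into
 * g0/g1, then g0/g1 along z, matching the shuffle masks above. */
static inline float lerp1(float a, float b, float t)
{
  return a + t * (b - a);
}

static float tri_mix_scalar(const float p[4], const float q[4], float x, float y, float z)
{
  const float s0 = lerp1(p[0], q[0], x);
  const float s1 = lerp1(p[1], q[1], x);
  const float s2 = lerp1(p[2], q[2], x);
  const float s3 = lerp1(p[3], q[3], x);
  const float g0 = lerp1(s0, s2, y);
  const float g1 = lerp1(s1, s3, y);
  return lerp1(g0, g1, z);
}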
@@ -374,24 +374,24 @@ ccl_device_inline ssef tri_mix(ssef p, ssef q, ssef f)
* supported, we do an SSE implementation, but if it is supported,
* we do an implementation using AVX intrinsics.
*/
# if !defined(__KERNEL_AVX__)
# if !defined(__KERNEL_AVX2__)
ccl_device_inline ssef grad(const ssei &hash, const ssef &x, const ssef &y, const ssef &z)
ccl_device_inline float4 grad(const int4 hash, const float4 x, const float4 y, const float4 z)
{
ssei h = hash & 15;
ssef u = select(h < 8, x, y);
ssef vt = select((h == 12) | (h == 14), x, z);
ssef v = select(h < 4, y, vt);
int4 h = hash & 15;
float4 u = select(h < 8, x, y);
float4 vt = select((h == 12) | (h == 14), x, z);
float4 v = select(h < 4, y, vt);
return negate_if_nth_bit(u, h, 0) + negate_if_nth_bit(v, h, 1);
}
ccl_device_inline ssef
grad(const ssei &hash, const ssef &x, const ssef &y, const ssef &z, const ssef &w)
ccl_device_inline float4
grad(const int4 hash, const float4 x, const float4 y, const float4 z, const float4 w)
{
ssei h = hash & 31;
ssef u = select(h < 24, x, y);
ssef v = select(h < 16, y, z);
ssef s = select(h < 8, z, w);
int4 h = hash & 31;
float4 u = select(h < 24, x, y);
float4 v = select(h < 16, y, z);
float4 s = select(h < 8, z, w);
return negate_if_nth_bit(u, h, 0) + negate_if_nth_bit(v, h, 1) + negate_if_nth_bit(s, h, 2);
}
@@ -401,7 +401,7 @@ grad(const ssei &hash, const ssef &x, const ssef &y, const ssef &z, const ssef &
* between two trilinear interpolations.
*
*/
ccl_device_inline ssef quad_mix(ssef p, ssef q, ssef r, ssef s, ssef f)
ccl_device_inline float4 quad_mix(float4 p, float4 q, float4 r, float4 s, float4 f)
{
return mix(tri_mix(p, q, f), tri_mix(r, s, f), shuffle<3>(f));
}
@@ -427,23 +427,23 @@ ccl_device_inline ssef quad_mix(ssef p, ssef q, ssef r, ssef s, ssef f)
*/
ccl_device_noinline_cpu float perlin_3d(float x, float y, float z)
{
ssei XYZ;
ssef fxyz = floorfrac(ssef(x, y, z, 0.0f), &XYZ);
ssef uvw = fade(fxyz);
int4 XYZ;
float4 fxyz = floorfrac(make_float4(x, y, z, 0.0f), &XYZ);
float4 uvw = fade(fxyz);
ssei XYZ1 = XYZ + 1;
ssei Y = shuffle<1, 1, 1, 1>(XYZ, XYZ1);
ssei Z = shuffle<0, 2, 0, 2>(shuffle<2, 2, 2, 2>(XYZ, XYZ1));
int4 XYZ1 = XYZ + make_int4(1);
int4 Y = shuffle<1, 1, 1, 1>(XYZ, XYZ1);
int4 Z = shuffle<0, 2, 0, 2>(shuffle<2, 2, 2, 2>(XYZ, XYZ1));
ssei h1 = hash_ssei3(shuffle<0>(XYZ), Y, Z);
ssei h2 = hash_ssei3(shuffle<0>(XYZ1), Y, Z);
int4 h1 = hash_int4_3(shuffle<0>(XYZ), Y, Z);
int4 h2 = hash_int4_3(shuffle<0>(XYZ1), Y, Z);
ssef fxyz1 = fxyz - 1.0f;
ssef fy = shuffle<1, 1, 1, 1>(fxyz, fxyz1);
ssef fz = shuffle<0, 2, 0, 2>(shuffle<2, 2, 2, 2>(fxyz, fxyz1));
float4 fxyz1 = fxyz - make_float4(1.0f);
float4 fy = shuffle<1, 1, 1, 1>(fxyz, fxyz1);
float4 fz = shuffle<0, 2, 0, 2>(shuffle<2, 2, 2, 2>(fxyz, fxyz1));
ssef g1 = grad(h1, shuffle<0>(fxyz), fy, fz);
ssef g2 = grad(h2, shuffle<0>(fxyz1), fy, fz);
float4 g1 = grad(h1, shuffle<0>(fxyz), fy, fz);
float4 g2 = grad(h2, shuffle<0>(fxyz1), fy, fz);
return extract<0>(tri_mix(g1, g2, uvw));
}
@@ -481,29 +481,29 @@ ccl_device_noinline_cpu float perlin_3d(float x, float y, float z)
*/
ccl_device_noinline_cpu float perlin_4d(float x, float y, float z, float w)
{
ssei XYZW;
ssef fxyzw = floorfrac(ssef(x, y, z, w), &XYZW);
ssef uvws = fade(fxyzw);
int4 XYZW;
float4 fxyzw = floorfrac(make_float4(x, y, z, w), &XYZW);
float4 uvws = fade(fxyzw);
ssei XYZW1 = XYZW + 1;
ssei Y = shuffle<1, 1, 1, 1>(XYZW, XYZW1);
ssei Z = shuffle<0, 2, 0, 2>(shuffle<2, 2, 2, 2>(XYZW, XYZW1));
int4 XYZW1 = XYZW + make_int4(1);
int4 Y = shuffle<1, 1, 1, 1>(XYZW, XYZW1);
int4 Z = shuffle<0, 2, 0, 2>(shuffle<2, 2, 2, 2>(XYZW, XYZW1));
ssei h1 = hash_ssei4(shuffle<0>(XYZW), Y, Z, shuffle<3>(XYZW));
ssei h2 = hash_ssei4(shuffle<0>(XYZW1), Y, Z, shuffle<3>(XYZW));
int4 h1 = hash_int4_4(shuffle<0>(XYZW), Y, Z, shuffle<3>(XYZW));
int4 h2 = hash_int4_4(shuffle<0>(XYZW1), Y, Z, shuffle<3>(XYZW));
ssei h3 = hash_ssei4(shuffle<0>(XYZW), Y, Z, shuffle<3>(XYZW1));
ssei h4 = hash_ssei4(shuffle<0>(XYZW1), Y, Z, shuffle<3>(XYZW1));
int4 h3 = hash_int4_4(shuffle<0>(XYZW), Y, Z, shuffle<3>(XYZW1));
int4 h4 = hash_int4_4(shuffle<0>(XYZW1), Y, Z, shuffle<3>(XYZW1));
ssef fxyzw1 = fxyzw - 1.0f;
ssef fy = shuffle<1, 1, 1, 1>(fxyzw, fxyzw1);
ssef fz = shuffle<0, 2, 0, 2>(shuffle<2, 2, 2, 2>(fxyzw, fxyzw1));
float4 fxyzw1 = fxyzw - make_float4(1.0f);
float4 fy = shuffle<1, 1, 1, 1>(fxyzw, fxyzw1);
float4 fz = shuffle<0, 2, 0, 2>(shuffle<2, 2, 2, 2>(fxyzw, fxyzw1));
ssef g1 = grad(h1, shuffle<0>(fxyzw), fy, fz, shuffle<3>(fxyzw));
ssef g2 = grad(h2, shuffle<0>(fxyzw1), fy, fz, shuffle<3>(fxyzw));
float4 g1 = grad(h1, shuffle<0>(fxyzw), fy, fz, shuffle<3>(fxyzw));
float4 g2 = grad(h2, shuffle<0>(fxyzw1), fy, fz, shuffle<3>(fxyzw));
ssef g3 = grad(h3, shuffle<0>(fxyzw), fy, fz, shuffle<3>(fxyzw1));
ssef g4 = grad(h4, shuffle<0>(fxyzw1), fy, fz, shuffle<3>(fxyzw1));
float4 g3 = grad(h3, shuffle<0>(fxyzw), fy, fz, shuffle<3>(fxyzw1));
float4 g4 = grad(h4, shuffle<0>(fxyzw1), fy, fz, shuffle<3>(fxyzw1));
return extract<0>(quad_mix(g1, g2, g3, g4, uvws));
}
@@ -512,22 +512,22 @@ ccl_device_noinline_cpu float perlin_4d(float x, float y, float z, float w)
/* AVX Implementation */
ccl_device_inline avxf grad(const avxi &hash, const avxf &x, const avxf &y, const avxf &z)
ccl_device_inline vfloat8 grad(const vint8 hash, const vfloat8 x, const vfloat8 y, const vfloat8 z)
{
avxi h = hash & 15;
avxf u = select(h < 8, x, y);
avxf vt = select((h == 12) | (h == 14), x, z);
avxf v = select(h < 4, y, vt);
vint8 h = hash & 15;
vfloat8 u = select(h < 8, x, y);
vfloat8 vt = select((h == 12) | (h == 14), x, z);
vfloat8 v = select(h < 4, y, vt);
return negate_if_nth_bit(u, h, 0) + negate_if_nth_bit(v, h, 1);
}
ccl_device_inline avxf
grad(const avxi &hash, const avxf &x, const avxf &y, const avxf &z, const avxf &w)
ccl_device_inline vfloat8
grad(const vint8 hash, const vfloat8 x, const vfloat8 y, const vfloat8 z, const vfloat8 w)
{
avxi h = hash & 31;
avxf u = select(h < 24, x, y);
avxf v = select(h < 16, y, z);
avxf s = select(h < 8, z, w);
vint8 h = hash & 31;
vfloat8 u = select(h < 24, x, y);
vfloat8 v = select(h < 16, y, z);
vfloat8 s = select(h < 8, z, w);
return negate_if_nth_bit(u, h, 0) + negate_if_nth_bit(v, h, 1) + negate_if_nth_bit(s, h, 2);
}
@@ -537,13 +537,13 @@ grad(const avxi &hash, const avxf &x, const avxf &y, const avxf &z, const avxf &
* 1. Interpolate p and q along the w axis to get s.
* 2. Trilinearly interpolate (s0, s1, s2, s3) and (s4, s5, s6, s7) to get the final
* value. (s0, s1, s2, s3) and (s4, s5, s6, s7) are generated by extracting the
* low and high ssef from s.
* low and high float4 from s.
*
*/
ccl_device_inline ssef quad_mix(avxf p, avxf q, ssef f)
ccl_device_inline float4 quad_mix(vfloat8 p, vfloat8 q, float4 f)
{
ssef fv = shuffle<3>(f);
avxf s = mix(p, q, avxf(fv, fv));
float4 fv = shuffle<3>(f);
vfloat8 s = mix(p, q, make_vfloat8(fv, fv));
return tri_mix(low(s), high(s), f);
}
@@ -565,25 +565,25 @@ ccl_device_inline ssef quad_mix(avxf p, avxf q, ssef f)
*/
ccl_device_noinline_cpu float perlin_3d(float x, float y, float z)
{
ssei XYZ;
ssef fxyz = floorfrac(ssef(x, y, z, 0.0f), &XYZ);
ssef uvw = fade(fxyz);
int4 XYZ;
float4 fxyz = floorfrac(make_float4(x, y, z, 0.0f), &XYZ);
float4 uvw = fade(fxyz);
ssei XYZ1 = XYZ + 1;
ssei X = shuffle<0>(XYZ);
ssei X1 = shuffle<0>(XYZ1);
ssei Y = shuffle<1, 1, 1, 1>(XYZ, XYZ1);
ssei Z = shuffle<0, 2, 0, 2>(shuffle<2, 2, 2, 2>(XYZ, XYZ1));
int4 XYZ1 = XYZ + make_int4(1);
int4 X = shuffle<0>(XYZ);
int4 X1 = shuffle<0>(XYZ1);
int4 Y = shuffle<1, 1, 1, 1>(XYZ, XYZ1);
int4 Z = shuffle<0, 2, 0, 2>(shuffle<2, 2, 2, 2>(XYZ, XYZ1));
avxi h = hash_avxi3(avxi(X, X1), avxi(Y, Y), avxi(Z, Z));
vint8 h = hash_int8_3(make_vint8(X, X1), make_vint8(Y, Y), make_vint8(Z, Z));
ssef fxyz1 = fxyz - 1.0f;
ssef fx = shuffle<0>(fxyz);
ssef fx1 = shuffle<0>(fxyz1);
ssef fy = shuffle<1, 1, 1, 1>(fxyz, fxyz1);
ssef fz = shuffle<0, 2, 0, 2>(shuffle<2, 2, 2, 2>(fxyz, fxyz1));
float4 fxyz1 = fxyz - make_float4(1.0f);
float4 fx = shuffle<0>(fxyz);
float4 fx1 = shuffle<0>(fxyz1);
float4 fy = shuffle<1, 1, 1, 1>(fxyz, fxyz1);
float4 fz = shuffle<0, 2, 0, 2>(shuffle<2, 2, 2, 2>(fxyz, fxyz1));
avxf g = grad(h, avxf(fx, fx1), avxf(fy, fy), avxf(fz, fz));
vfloat8 g = grad(h, make_vfloat8(fx, fx1), make_vfloat8(fy, fy), make_vfloat8(fz, fz));
return extract<0>(tri_mix(low(g), high(g), uvw));
}
@@ -617,31 +617,37 @@ ccl_device_noinline_cpu float perlin_3d(float x, float y, float z)
*/
ccl_device_noinline_cpu float perlin_4d(float x, float y, float z, float w)
{
ssei XYZW;
ssef fxyzw = floorfrac(ssef(x, y, z, w), &XYZW);
ssef uvws = fade(fxyzw);
int4 XYZW;
float4 fxyzw = floorfrac(make_float4(x, y, z, w), &XYZW);
float4 uvws = fade(fxyzw);
ssei XYZW1 = XYZW + 1;
ssei X = shuffle<0>(XYZW);
ssei X1 = shuffle<0>(XYZW1);
ssei Y = shuffle<1, 1, 1, 1>(XYZW, XYZW1);
ssei Z = shuffle<0, 2, 0, 2>(shuffle<2, 2, 2, 2>(XYZW, XYZW1));
ssei W = shuffle<3>(XYZW);
ssei W1 = shuffle<3>(XYZW1);
int4 XYZW1 = XYZW + make_int4(1);
int4 X = shuffle<0>(XYZW);
int4 X1 = shuffle<0>(XYZW1);
int4 Y = shuffle<1, 1, 1, 1>(XYZW, XYZW1);
int4 Z = shuffle<0, 2, 0, 2>(shuffle<2, 2, 2, 2>(XYZW, XYZW1));
int4 W = shuffle<3>(XYZW);
int4 W1 = shuffle<3>(XYZW1);
avxi h1 = hash_avxi4(avxi(X, X1), avxi(Y, Y), avxi(Z, Z), avxi(W, W));
avxi h2 = hash_avxi4(avxi(X, X1), avxi(Y, Y), avxi(Z, Z), avxi(W1, W1));
vint8 h1 = hash_int8_4(make_vint8(X, X1), make_vint8(Y, Y), make_vint8(Z, Z), make_vint8(W, W));
vint8 h2 = hash_int8_4(
make_vint8(X, X1), make_vint8(Y, Y), make_vint8(Z, Z), make_vint8(W1, W1));
ssef fxyzw1 = fxyzw - 1.0f;
ssef fx = shuffle<0>(fxyzw);
ssef fx1 = shuffle<0>(fxyzw1);
ssef fy = shuffle<1, 1, 1, 1>(fxyzw, fxyzw1);
ssef fz = shuffle<0, 2, 0, 2>(shuffle<2, 2, 2, 2>(fxyzw, fxyzw1));
ssef fw = shuffle<3>(fxyzw);
ssef fw1 = shuffle<3>(fxyzw1);
float4 fxyzw1 = fxyzw - make_float4(1.0f);
float4 fx = shuffle<0>(fxyzw);
float4 fx1 = shuffle<0>(fxyzw1);
float4 fy = shuffle<1, 1, 1, 1>(fxyzw, fxyzw1);
float4 fz = shuffle<0, 2, 0, 2>(shuffle<2, 2, 2, 2>(fxyzw, fxyzw1));
float4 fw = shuffle<3>(fxyzw);
float4 fw1 = shuffle<3>(fxyzw1);
avxf g1 = grad(h1, avxf(fx, fx1), avxf(fy, fy), avxf(fz, fz), avxf(fw, fw));
avxf g2 = grad(h2, avxf(fx, fx1), avxf(fy, fy), avxf(fz, fz), avxf(fw1, fw1));
vfloat8 g1 = grad(
h1, make_vfloat8(fx, fx1), make_vfloat8(fy, fy), make_vfloat8(fz, fz), make_vfloat8(fw, fw));
vfloat8 g2 = grad(h2,
make_vfloat8(fx, fx1),
make_vfloat8(fy, fy),
make_vfloat8(fz, fz),
make_vfloat8(fw1, fw1));
return extract<0>(quad_mix(g1, g2, uvws));
}

View File

@@ -75,10 +75,14 @@ CCL_NAMESPACE_BEGIN
#define __VOLUME__
/* Device specific features */
#ifndef __KERNEL_GPU__
# ifdef WITH_OSL
# define __OSL__
#ifdef WITH_OSL
# define __OSL__
# ifdef __KERNEL_OPTIX__
/* Kernels with OSL support are built separately in OptiX and don't need SVM. */
# undef __SVM__
# endif
#endif
#ifndef __KERNEL_GPU__
# ifdef WITH_PATH_GUIDING
# define __PATH_GUIDING__
# endif
@@ -918,9 +922,13 @@ typedef struct ccl_align(16) ShaderData
float ray_dP;
#ifdef __OSL__
# ifdef __KERNEL_GPU__
ccl_private uint8_t *osl_closure_pool;
# else
const struct KernelGlobalsCPU *osl_globals;
const struct IntegratorStateCPU *osl_path_state;
const struct IntegratorShadowStateCPU *osl_shadow_path_state;
# endif
#endif
/* LCG state for closures that require additional random numbers. */
@@ -1531,6 +1539,9 @@ enum KernelFeatureFlag : uint32_t {
/* Path guiding. */
KERNEL_FEATURE_PATH_GUIDING = (1U << 26U),
/* OSL. */
KERNEL_FEATURE_OSL = (1U << 27U),
};
/* Shader node feature mask, to specialize shader evaluation for kernels. */

View File

@@ -386,46 +386,6 @@ void ConstantFolder::fold_mix_color(NodeMix type, bool clamp_factor, bool clamp)
}
}
void ConstantFolder::fold_mix_float(bool clamp_factor, bool clamp) const
{
ShaderInput *fac_in = node->input("Factor");
ShaderInput *float1_in = node->input("A");
ShaderInput *float2_in = node->input("B");
float fac = clamp_factor ? saturatef(node->get_float(fac_in->socket_type)) :
node->get_float(fac_in->socket_type);
bool fac_is_zero = !fac_in->link && fac == 0.0f;
bool fac_is_one = !fac_in->link && fac == 1.0f;
/* remove no-op node when factor is 0.0 */
if (fac_is_zero) {
if (try_bypass_or_make_constant(float1_in, clamp)) {
return;
}
}
/* remove useless mix floats nodes */
if (float1_in->link && float2_in->link) {
if (float1_in->link == float2_in->link) {
try_bypass_or_make_constant(float1_in, clamp);
return;
}
}
else if (!float1_in->link && !float2_in->link) {
float value1 = node->get_float(float1_in->socket_type);
float value2 = node->get_float(float2_in->socket_type);
if (value1 == value2) {
try_bypass_or_make_constant(float1_in, clamp);
return;
}
}
/* remove no-op mix float node when factor is 1.0 */
if (fac_is_one) {
try_bypass_or_make_constant(float2_in, clamp);
return;
}
}
void ConstantFolder::fold_math(NodeMathType type) const
{
ShaderInput *value1_in = node->input("Value1");
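
The removed fold_mix_float() implemented the usual mix() no-op identities; its call site is dropped in the shader-nodes hunk further down. A scalar sketch of the identities it checked, as standalone illustrative code:

#include <cassert>

static float mix(float a, float b, float fac)
{
  return a * (1.0f - fac) + b * fac;
}

int main()
{
  const float a = 2.0f, b = 5.0f;
  assert(mix(a, b, 0.0f) == a); /* factor 0: node folds to input A */
  assert(mix(a, b, 1.0f) == b); /* factor 1: node folds to input B */
  assert(mix(a, a, 0.3f) == a); /* identical inputs: node folds to A */
  return 0;
}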


@@ -52,7 +52,6 @@ class ConstantFolder {
/* Specific nodes. */
void fold_mix(NodeMix type, bool clamp) const;
void fold_mix_color(NodeMix type, bool clamp_factor, bool clamp) const;
void fold_mix_float(bool clamp_factor, bool clamp) const;
void fold_math(NodeMathType type) const;
void fold_vector_math(NodeVectorMathType type) const;
void fold_mapping(NodeMappingType type) const;


@@ -38,16 +38,17 @@ OSL::TextureSystem *OSLShaderManager::ts_shared = NULL;
int OSLShaderManager::ts_shared_users = 0;
thread_mutex OSLShaderManager::ts_shared_mutex;
OSL::ShadingSystem *OSLShaderManager::ss_shared = NULL;
OSLRenderServices *OSLShaderManager::services_shared = NULL;
OSL::ErrorHandler OSLShaderManager::errhandler;
map<int, OSL::ShadingSystem *> OSLShaderManager::ss_shared;
int OSLShaderManager::ss_shared_users = 0;
thread_mutex OSLShaderManager::ss_shared_mutex;
thread_mutex OSLShaderManager::ss_mutex;
int OSLCompiler::texture_shared_unique_id = 0;
/* Shader Manager */
OSLShaderManager::OSLShaderManager()
OSLShaderManager::OSLShaderManager(Device *device) : device_(device)
{
texture_system_init();
shading_system_init();
@@ -107,11 +108,12 @@ void OSLShaderManager::device_update_specific(Device *device,
device_free(device, dscene, scene);
/* set texture system */
scene->image_manager->set_osl_texture_system((void *)ts);
/* set texture system (only on CPU devices, since GPU devices cannot use OIIO) */
if (device->info.type == DEVICE_CPU) {
scene->image_manager->set_osl_texture_system((void *)ts_shared);
}
/* create shaders */
OSLGlobals *og = (OSLGlobals *)device->get_cpu_osl_memory();
Shader *background_shader = scene->background->get_shader(scene);
foreach (Shader *shader, scene->shaders) {
@@ -125,22 +127,34 @@ void OSLShaderManager::device_update_specific(Device *device,
* compile shaders alternating */
thread_scoped_lock lock(ss_mutex);
OSLCompiler compiler(this, services, ss, scene);
compiler.background = (shader == background_shader);
compiler.compile(og, shader);
device->foreach_device(
[this, scene, shader, background = (shader == background_shader)](Device *sub_device) {
OSLGlobals *og = (OSLGlobals *)sub_device->get_cpu_osl_memory();
OSL::ShadingSystem *ss = ss_shared[sub_device->info.type];
OSLCompiler compiler(this, ss, scene);
compiler.background = background;
compiler.compile(og, shader);
});
if (shader->get_use_mis() && shader->has_surface_emission)
scene->light_manager->tag_update(scene, LightManager::SHADER_COMPILED);
}
/* setup shader engine */
og->ss = ss;
og->ts = ts;
og->services = services;
int background_id = scene->shader_manager->get_shader_id(background_shader);
og->background_state = og->surface_state[background_id & SHADER_MASK];
og->use = true;
device->foreach_device([background_id](Device *sub_device) {
OSLGlobals *og = (OSLGlobals *)sub_device->get_cpu_osl_memory();
OSL::ShadingSystem *ss = ss_shared[sub_device->info.type];
og->ss = ss;
og->ts = ts_shared;
og->services = static_cast<OSLRenderServices *>(ss->renderer());
og->background_state = og->surface_state[background_id & SHADER_MASK];
og->use = true;
});
foreach (Shader *shader, scene->shaders)
shader->clear_modified();
@@ -148,8 +162,12 @@ void OSLShaderManager::device_update_specific(Device *device,
update_flags = UPDATE_NONE;
/* add special builtin texture types */
services->textures.insert(ustring("@ao"), new OSLTextureHandle(OSLTextureHandle::AO));
services->textures.insert(ustring("@bevel"), new OSLTextureHandle(OSLTextureHandle::BEVEL));
for (const auto &[device_type, ss] : ss_shared) {
OSLRenderServices *services = static_cast<OSLRenderServices *>(ss->renderer());
services->textures.insert(ustring("@ao"), new OSLTextureHandle(OSLTextureHandle::AO));
services->textures.insert(ustring("@bevel"), new OSLTextureHandle(OSLTextureHandle::BEVEL));
}
device_update_common(device, dscene, scene, progress);
@@ -166,26 +184,35 @@ void OSLShaderManager::device_update_specific(Device *device,
* is being freed after the Session is freed.
*/
thread_scoped_lock lock(ss_shared_mutex);
ss->optimize_all_groups();
for (const auto &[device_type, ss] : ss_shared) {
ss->optimize_all_groups();
}
}
/* load kernels */
if (!device->load_osl_kernels()) {
progress.set_error(device->error_message());
}
}
void OSLShaderManager::device_free(Device *device, DeviceScene *dscene, Scene *scene)
{
OSLGlobals *og = (OSLGlobals *)device->get_cpu_osl_memory();
device_free_common(device, dscene, scene);
/* clear shader engine */
og->use = false;
og->ss = NULL;
og->ts = NULL;
device->foreach_device([](Device *sub_device) {
OSLGlobals *og = (OSLGlobals *)sub_device->get_cpu_osl_memory();
og->surface_state.clear();
og->volume_state.clear();
og->displacement_state.clear();
og->bump_state.clear();
og->background_state.reset();
og->use = false;
og->ss = NULL;
og->ts = NULL;
og->surface_state.clear();
og->volume_state.clear();
og->displacement_state.clear();
og->bump_state.clear();
og->background_state.reset();
});
}
void OSLShaderManager::texture_system_init()
@@ -193,7 +220,7 @@ void OSLShaderManager::texture_system_init()
/* create texture system, shared between different renders to reduce memory usage */
thread_scoped_lock lock(ts_shared_mutex);
if (ts_shared_users == 0) {
if (ts_shared_users++ == 0) {
ts_shared = TextureSystem::create(true);
ts_shared->attribute("automip", 1);
@@ -203,24 +230,18 @@ void OSLShaderManager::texture_system_init()
/* effectively unlimited for now, until we support proper mipmap lookups */
ts_shared->attribute("max_memory_MB", 16384);
}
ts = ts_shared;
ts_shared_users++;
}
void OSLShaderManager::texture_system_free()
{
/* Shared texture system: decrease users and destroy it if no longer used. */
thread_scoped_lock lock(ts_shared_mutex);
ts_shared_users--;
if (ts_shared_users == 0) {
if (--ts_shared_users == 0) {
ts_shared->invalidate_all(true);
OSL::TextureSystem::destroy(ts_shared);
ts_shared = NULL;
}
ts = NULL;
}
void OSLShaderManager::shading_system_init()
@@ -228,101 +249,105 @@ void OSLShaderManager::shading_system_init()
/* create shading system, shared between different renders to reduce memory usage */
thread_scoped_lock lock(ss_shared_mutex);
if (ss_shared_users == 0) {
/* Must use aligned new due to concurrent hash map. */
services_shared = util_aligned_new<OSLRenderServices>(ts_shared);
device_->foreach_device([](Device *sub_device) {
const DeviceType device_type = sub_device->info.type;
string shader_path = path_get("shader");
if (ss_shared_users++ == 0 || ss_shared.find(device_type) == ss_shared.end()) {
/* Must use aligned new due to concurrent hash map. */
OSLRenderServices *services = util_aligned_new<OSLRenderServices>(ts_shared, device_type);
string shader_path = path_get("shader");
# ifdef _WIN32
/* Annoying thing: Cycles stores paths in the UTF-8 codepage so that it can
 * operate on file paths with any characters. This requires using wide-char
 * functions, but OSL uses old-fashioned ANSI functions, which means:
 *
 * - We have to convert our paths to ANSI before passing them to OSL.
 * - OSL can't be used when there's a multi-byte character in the path
 *   to the shaders folder.
 */
shader_path = string_to_ansi(shader_path);
# endif
ss_shared = new OSL::ShadingSystem(services_shared, ts_shared, &errhandler);
ss_shared->attribute("lockgeom", 1);
ss_shared->attribute("commonspace", "world");
ss_shared->attribute("searchpath:shader", shader_path);
ss_shared->attribute("greedyjit", 1);
OSL::ShadingSystem *ss = new OSL::ShadingSystem(services, ts_shared, &errhandler);
ss->attribute("lockgeom", 1);
ss->attribute("commonspace", "world");
ss->attribute("searchpath:shader", shader_path);
ss->attribute("greedyjit", 1);
VLOG_INFO << "Using shader search path: " << shader_path;
/* our own ray types */
static const char *raytypes[] = {
"camera", /* PATH_RAY_CAMERA */
"reflection", /* PATH_RAY_REFLECT */
"refraction", /* PATH_RAY_TRANSMIT */
"diffuse", /* PATH_RAY_DIFFUSE */
"glossy", /* PATH_RAY_GLOSSY */
"singular", /* PATH_RAY_SINGULAR */
"transparent", /* PATH_RAY_TRANSPARENT */
"volume_scatter", /* PATH_RAY_VOLUME_SCATTER */
"shadow", /* PATH_RAY_SHADOW_OPAQUE */
"shadow", /* PATH_RAY_SHADOW_TRANSPARENT */
"__unused__", /* PATH_RAY_NODE_UNALIGNED */
"__unused__", /* PATH_RAY_MIS_SKIP */
"diffuse_ancestor", /* PATH_RAY_DIFFUSE_ANCESTOR */
/* Remaining irrelevant bits up to 32. */
"__unused__",
"__unused__",
"__unused__",
"__unused__",
"__unused__",
"__unused__",
"__unused__",
"__unused__",
"__unused__",
"__unused__",
"__unused__",
"__unused__",
"__unused__",
"__unused__",
"__unused__",
"__unused__",
"__unused__",
"__unused__",
"__unused__",
};
const int nraytypes = sizeof(raytypes) / sizeof(raytypes[0]);
ss_shared->attribute("raytypes", TypeDesc(TypeDesc::STRING, nraytypes), raytypes);
ss->attribute("raytypes", TypeDesc(TypeDesc::STRING, nraytypes), raytypes);
OSLRenderServices::register_closures(ss_shared);
OSLRenderServices::register_closures(ss);
loaded_shaders.clear();
}
ss_shared[device_type] = ss;
}
});
ss = ss_shared;
services = services_shared;
ss_shared_users++;
loaded_shaders.clear();
}
void OSLShaderManager::shading_system_free()
{
/* Shared shading system: decrease users and destroy it if no longer used. */
thread_scoped_lock lock(ss_shared_mutex);
ss_shared_users--;
if (ss_shared_users == 0) {
delete ss_shared;
ss_shared = NULL;
device_->foreach_device([](Device * /*sub_device*/) {
if (--ss_shared_users == 0) {
for (const auto &[device_type, ss] : ss_shared) {
OSLRenderServices *services = static_cast<OSLRenderServices *>(ss->renderer());
util_aligned_delete(services_shared);
services_shared = NULL;
}
delete ss;
ss = NULL;
services = NULL;
util_aligned_delete(services);
}
ss_shared.clear();
}
});
}
bool OSLShaderManager::osl_compile(const string &inputfile, const string &outputfile)
@@ -447,7 +472,9 @@ const char *OSLShaderManager::shader_load_filepath(string filepath)
const char *OSLShaderManager::shader_load_bytecode(const string &hash, const string &bytecode)
{
ss->LoadMemoryCompiledShader(hash.c_str(), bytecode.c_str());
for (const auto &[device_type, ss] : ss_shared) {
ss->LoadMemoryCompiledShader(hash.c_str(), bytecode.c_str());
}
OSLShaderInfo info;
@@ -614,11 +641,11 @@ OSLNode *OSLShaderManager::osl_node(ShaderGraph *graph,
/* Graph Compiler */
OSLCompiler::OSLCompiler(OSLShaderManager *manager,
OSLRenderServices *services,
OSL::ShadingSystem *ss,
Scene *scene)
: scene(scene), manager(manager), services(services), ss(ss)
OSLCompiler::OSLCompiler(OSLShaderManager *manager, OSL::ShadingSystem *ss, Scene *scene)
: scene(scene),
manager(manager),
services(static_cast<OSLRenderServices *>(ss->renderer())),
ss(ss)
{
current_type = SHADER_TYPE_SURFACE;
current_shader = NULL;
@@ -629,6 +656,8 @@ string OSLCompiler::id(ShaderNode *node)
{
/* assign layer unique name based on pointer address + bump mode */
stringstream stream;
stream.imbue(std::locale("C")); /* Ensure that no grouping characters (e.g. commas with en_US
locale) are added to the pointer string */
stream << "node_" << node->type->name << "_" << node;
return stream.str();
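
The new imbue() call pins the stream to the classic "C" locale so that layer names never pick up digit-grouping characters from the user's locale. A standalone illustration of the difference; the en_US locale may not be installed everywhere, hence the try/catch:

#include <iostream>
#include <locale>
#include <sstream>

int main()
{
  std::stringstream stream;
  try {
    stream.imbue(std::locale("en_US.UTF-8")); /* may group digits: "1,234,567" */
  }
  catch (const std::runtime_error &) {
    /* Locale not installed; nothing to demonstrate on this system. */
  }
  stream.imbue(std::locale("C")); /* what the code above does: no grouping */
  stream << "node_" << 1234567;
  std::cout << stream.str() << "\n"; /* always "node_1234567" */
  return 0;
}
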
@@ -1124,7 +1153,12 @@ OSL::ShaderGroupRef OSLCompiler::compile_type(Shader *shader, ShaderGraph *graph
{
current_type = type;
OSL::ShaderGroupRef group = ss->ShaderGroupBegin(shader->name.c_str());
/* Use name hash to identify shader group to avoid issues with non-alphanumeric characters */
stringstream name;
name.imbue(std::locale("C"));
name << "shader_" << shader->name.hash();
OSL::ShaderGroupRef group = ss->ShaderGroupBegin(name.str());
ShaderNode *output = graph->output();
ShaderNodeSet dependencies;
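
Several hunks above tighten the shared-resource bookkeeping (ts_shared_users++ == 0 on creation, --ts_shared_users == 0 on destruction) and turn ss_shared into a per-device-type map. A minimal sketch of the underlying pattern with illustrative names, not the Cycles types:

#include <mutex>

struct Resource {}; /* stands in for e.g. the shared OIIO texture system */

static Resource *shared = nullptr;
static int shared_users = 0;
static std::mutex shared_mutex;

/* First user creates the shared resource. */
static void resource_acquire()
{
  std::lock_guard<std::mutex> lock(shared_mutex);
  if (shared_users++ == 0) {
    shared = new Resource();
  }
}

/* Last user destroys it. */
static void resource_release()
{
  std::lock_guard<std::mutex> lock(shared_mutex);
  if (--shared_users == 0) {
    delete shared;
    shared = nullptr;
  }
}

int main()
{
  resource_acquire(); /* creates */
  resource_acquire(); /* reuses */
  resource_release();
  resource_release(); /* destroys */
  return 0;
}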


@@ -54,7 +54,7 @@ struct OSLShaderInfo {
class OSLShaderManager : public ShaderManager {
public:
OSLShaderManager();
OSLShaderManager(Device *device);
~OSLShaderManager();
static void free_memory();
@@ -92,25 +92,22 @@ class OSLShaderManager : public ShaderManager {
const std::string &bytecode_hash = "",
const std::string &bytecode = "");
protected:
private:
void texture_system_init();
void texture_system_free();
void shading_system_init();
void shading_system_free();
OSL::ShadingSystem *ss;
OSL::TextureSystem *ts;
OSLRenderServices *services;
OSL::ErrorHandler errhandler;
Device *device_;
map<string, OSLShaderInfo> loaded_shaders;
static OSL::TextureSystem *ts_shared;
static thread_mutex ts_shared_mutex;
static int ts_shared_users;
static OSL::ShadingSystem *ss_shared;
static OSLRenderServices *services_shared;
static OSL::ErrorHandler errhandler;
static map<int, OSL::ShadingSystem *> ss_shared;
static thread_mutex ss_shared_mutex;
static thread_mutex ss_mutex;
static int ss_shared_users;
@@ -123,10 +120,7 @@ class OSLShaderManager : public ShaderManager {
class OSLCompiler {
public:
#ifdef WITH_OSL
OSLCompiler(OSLShaderManager *manager,
OSLRenderServices *services,
OSL::ShadingSystem *shadingsys,
Scene *scene);
OSLCompiler(OSLShaderManager *manager, OSL::ShadingSystem *shadingsys, Scene *scene);
#endif
void compile(OSLGlobals *og, Shader *shader);


@@ -99,11 +99,8 @@ Scene::Scene(const SceneParams &params_, Device *device)
{
memset((void *)&dscene.data, 0, sizeof(dscene.data));
/* OSL only works on the CPU */
if (device->info.has_osl)
shader_manager = ShaderManager::create(params.shadingsystem);
else
shader_manager = ShaderManager::create(SHADINGSYSTEM_SVM);
shader_manager = ShaderManager::create(
device->info.has_osl ? params.shadingsystem : SHADINGSYSTEM_SVM, device);
light_manager = new LightManager();
geometry_manager = new GeometryManager();
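
The rewritten constructor folds the old if/else into a single call: the requested shading system is only honored when the device reports OSL support, otherwise SVM is the fallback. A sketch of that selection; the enum and function are illustrative, not the Cycles API:

enum ShadingSystem { SHADINGSYSTEM_OSL, SHADINGSYSTEM_SVM };

static ShadingSystem pick_shading_system(bool device_has_osl, ShadingSystem requested)
{
  /* Fall back to SVM when the device cannot run OSL. */
  return device_has_osl ? requested : SHADINGSYSTEM_SVM;
}

int main()
{
  return pick_shading_system(false, SHADINGSYSTEM_OSL) == SHADINGSYSTEM_SVM ? 0 : 1;
}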


@@ -395,15 +395,16 @@ ShaderManager::~ShaderManager()
{
}
ShaderManager *ShaderManager::create(int shadingsystem)
ShaderManager *ShaderManager::create(int shadingsystem, Device *device)
{
ShaderManager *manager;
(void)shadingsystem; /* Ignored when built without OSL. */
(void)device;
#ifdef WITH_OSL
if (shadingsystem == SHADINGSYSTEM_OSL) {
manager = new OSLShaderManager();
manager = new OSLShaderManager(device);
}
else
#endif
@@ -722,6 +723,10 @@ uint ShaderManager::get_kernel_features(Scene *scene)
}
}
if (use_osl()) {
kernel_features |= KERNEL_FEATURE_OSL;
}
return kernel_features;
}


@@ -170,7 +170,7 @@ class ShaderManager {
UPDATE_NONE = 0u,
};
static ShaderManager *create(int shadingsystem);
static ShaderManager *create(int shadingsystem, Device *device);
virtual ~ShaderManager();
virtual void reset(Scene *scene) = 0;


@@ -5132,9 +5132,6 @@ void MixFloatNode::constant_fold(const ConstantFolder &folder)
}
folder.make_constant(a * (1 - fac) + b * fac);
}
else {
folder.fold_mix_float(use_clamp, false);
}
}
/* Mix Vector */
@@ -5188,9 +5185,6 @@ void MixVectorNode::constant_fold(const ConstantFolder &folder)
}
folder.make_constant(a * (one_float3() - fac) + b * fac);
}
else {
folder.fold_mix_color(NODE_MIX_BLEND, use_clamp, false);
}
}
/* Mix Vector Non Uniform */


@@ -1539,6 +1539,10 @@ class OSLNode final : public ShaderNode {
{
return true;
}
virtual int get_feature()
{
return ShaderNode::get_feature() | KERNEL_FEATURE_NODE_RAYTRACE;
}
virtual bool equals(const ShaderNode & /*other*/)
{


@@ -45,17 +45,24 @@ set(SRC
# Disable AVX tests on macOS. Rosetta has problems running them, and other
# platforms should be enough to verify AVX operations are implemented correctly.
if(NOT APPLE)
if(CXX_HAS_SSE)
list(APPEND SRC
util_float8_sse2_test.cpp
)
set_source_files_properties(util_float8_avx_test.cpp PROPERTIES COMPILE_FLAGS "${CYCLES_SSE2_KERNEL_FLAGS}")
endif()
if(CXX_HAS_AVX)
list(APPEND SRC
util_avxf_avx_test.cpp
util_float8_avx_test.cpp
)
set_source_files_properties(util_avxf_avx_test.cpp PROPERTIES COMPILE_FLAGS "${CYCLES_AVX_KERNEL_FLAGS}")
set_source_files_properties(util_float8_avx_test.cpp PROPERTIES COMPILE_FLAGS "${CYCLES_AVX_KERNEL_FLAGS}")
endif()
if(CXX_HAS_AVX2)
list(APPEND SRC
util_avxf_avx2_test.cpp
util_float8_avx2_test.cpp
)
set_source_files_properties(util_avxf_avx2_test.cpp PROPERTIES COMPILE_FLAGS "${CYCLES_AVX2_KERNEL_FLAGS}")
set_source_files_properties(util_float8_avx2_test.cpp PROPERTIES COMPILE_FLAGS "${CYCLES_AVX2_KERNEL_FLAGS}")
endif()
endif()


@@ -1,211 +0,0 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2011-2022 Blender Foundation */
#include "testing/testing.h"
#include "util/system.h"
#include "util/types.h"
CCL_NAMESPACE_BEGIN
static bool validate_cpu_capabilities()
{
#ifdef __KERNEL_AVX2__
return system_cpu_support_avx2();
#else
# ifdef __KERNEL_AVX__
return system_cpu_support_avx();
# endif
#endif
}
#define INIT_AVX_TEST \
if (!validate_cpu_capabilities()) \
return; \
\
const avxf avxf_a(0.1f, 0.2f, 0.3f, 0.4f, 0.5f, 0.6f, 0.7f, 0.8f); \
const avxf avxf_b(1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f); \
const avxf avxf_c(1.1f, 2.2f, 3.3f, 4.4f, 5.5f, 6.6f, 7.7f, 8.8f);
#define compare_vector_scalar(a, b) \
for (size_t index = 0; index < a.size; index++) \
EXPECT_FLOAT_EQ(a[index], b);
#define compare_vector_vector(a, b) \
for (size_t index = 0; index < a.size; index++) \
EXPECT_FLOAT_EQ(a[index], b[index]);
#define compare_vector_vector_near(a, b, abserror) \
for (size_t index = 0; index < a.size; index++) \
EXPECT_NEAR(a[index], b[index], abserror);
#define basic_test_vv(a, b, op) \
INIT_AVX_TEST \
avxf c = a op b; \
for (size_t i = 0; i < a.size; i++) \
EXPECT_FLOAT_EQ(c[i], a[i] op b[i]);
/* vector op float tests */
#define basic_test_vf(a, b, op) \
INIT_AVX_TEST \
avxf c = a op b; \
for (size_t i = 0; i < a.size; i++) \
EXPECT_FLOAT_EQ(c[i], a[i] op b);
static const float float_b = 1.5f;
TEST(TEST_CATEGORY_NAME, avxf_add_vv) { basic_test_vv(avxf_a, avxf_b, +) }
TEST(TEST_CATEGORY_NAME, avxf_sub_vv) { basic_test_vv(avxf_a, avxf_b, -) }
TEST(TEST_CATEGORY_NAME, avxf_mul_vv) { basic_test_vv(avxf_a, avxf_b, *) }
TEST(TEST_CATEGORY_NAME, avxf_div_vv) { basic_test_vv(avxf_a, avxf_b, /) }
TEST(TEST_CATEGORY_NAME, avxf_add_vf) { basic_test_vf(avxf_a, float_b, +) }
TEST(TEST_CATEGORY_NAME, avxf_sub_vf) { basic_test_vf(avxf_a, float_b, -) }
TEST(TEST_CATEGORY_NAME, avxf_mul_vf) { basic_test_vf(avxf_a, float_b, *) }
TEST(TEST_CATEGORY_NAME, avxf_div_vf) { basic_test_vf(avxf_a, float_b, /) }
TEST(TEST_CATEGORY_NAME, avxf_ctor)
{
INIT_AVX_TEST
compare_vector_scalar(avxf(7.0f, 6.0f, 5.0f, 4.0f, 3.0f, 2.0f, 1.0f, 0.0f),
static_cast<float>(index));
compare_vector_scalar(avxf(1.0f), 1.0f);
compare_vector_vector(avxf(1.0f, 2.0f), avxf(1.0f, 1.0f, 1.0f, 1.0f, 2.0f, 2.0f, 2.0f, 2.0f));
compare_vector_vector(avxf(1.0f, 2.0f, 3.0f, 4.0f),
avxf(1.0f, 2.0f, 3.0f, 4.0f, 1.0f, 2.0f, 3.0f, 4.0f));
compare_vector_vector(avxf(make_float3(1.0f, 2.0f, 3.0f)),
avxf(0.0f, 3.0f, 2.0f, 1.0f, 0.0f, 3.0f, 2.0f, 1.0f));
}
TEST(TEST_CATEGORY_NAME, avxf_sqrt)
{
INIT_AVX_TEST
compare_vector_vector(mm256_sqrt(avxf(1.0f, 4.0f, 9.0f, 16.0f, 25.0f, 36.0f, 49.0f, 64.0f)),
avxf(1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f));
}
TEST(TEST_CATEGORY_NAME, avxf_min_max)
{
INIT_AVX_TEST
compare_vector_vector(min(avxf_a, avxf_b), avxf_a);
compare_vector_vector(max(avxf_a, avxf_b), avxf_b);
}
TEST(TEST_CATEGORY_NAME, avxf_set_sign)
{
INIT_AVX_TEST
avxf res = set_sign_bit<1, 0, 0, 0, 0, 0, 0, 0>(avxf_a);
compare_vector_vector(res, avxf(0.1f, 0.2f, 0.3f, 0.4f, 0.5f, 0.6f, 0.7f, -0.8f));
}
TEST(TEST_CATEGORY_NAME, avxf_msub)
{
INIT_AVX_TEST
avxf res = msub(avxf_a, avxf_b, avxf_c);
avxf exp = avxf((avxf_a[7] * avxf_b[7]) - avxf_c[7],
(avxf_a[6] * avxf_b[6]) - avxf_c[6],
(avxf_a[5] * avxf_b[5]) - avxf_c[5],
(avxf_a[4] * avxf_b[4]) - avxf_c[4],
(avxf_a[3] * avxf_b[3]) - avxf_c[3],
(avxf_a[2] * avxf_b[2]) - avxf_c[2],
(avxf_a[1] * avxf_b[1]) - avxf_c[1],
(avxf_a[0] * avxf_b[0]) - avxf_c[0]);
compare_vector_vector(res, exp);
}
TEST(TEST_CATEGORY_NAME, avxf_madd)
{
INIT_AVX_TEST
avxf res = madd(avxf_a, avxf_b, avxf_c);
avxf exp = avxf((avxf_a[7] * avxf_b[7]) + avxf_c[7],
(avxf_a[6] * avxf_b[6]) + avxf_c[6],
(avxf_a[5] * avxf_b[5]) + avxf_c[5],
(avxf_a[4] * avxf_b[4]) + avxf_c[4],
(avxf_a[3] * avxf_b[3]) + avxf_c[3],
(avxf_a[2] * avxf_b[2]) + avxf_c[2],
(avxf_a[1] * avxf_b[1]) + avxf_c[1],
(avxf_a[0] * avxf_b[0]) + avxf_c[0]);
compare_vector_vector(res, exp);
}
TEST(TEST_CATEGORY_NAME, avxf_nmadd)
{
INIT_AVX_TEST
avxf res = nmadd(avxf_a, avxf_b, avxf_c);
avxf exp = avxf(avxf_c[7] - (avxf_a[7] * avxf_b[7]),
avxf_c[6] - (avxf_a[6] * avxf_b[6]),
avxf_c[5] - (avxf_a[5] * avxf_b[5]),
avxf_c[4] - (avxf_a[4] * avxf_b[4]),
avxf_c[3] - (avxf_a[3] * avxf_b[3]),
avxf_c[2] - (avxf_a[2] * avxf_b[2]),
avxf_c[1] - (avxf_a[1] * avxf_b[1]),
avxf_c[0] - (avxf_a[0] * avxf_b[0]));
compare_vector_vector(res, exp);
}
TEST(TEST_CATEGORY_NAME, avxf_compare)
{
INIT_AVX_TEST
avxf a(0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f);
avxf b(7.0f, 6.0f, 5.0f, 4.0f, 3.0f, 2.0f, 1.0f, 0.0f);
avxb res = a <= b;
int exp[8] = {
a[0] <= b[0] ? -1 : 0,
a[1] <= b[1] ? -1 : 0,
a[2] <= b[2] ? -1 : 0,
a[3] <= b[3] ? -1 : 0,
a[4] <= b[4] ? -1 : 0,
a[5] <= b[5] ? -1 : 0,
a[6] <= b[6] ? -1 : 0,
a[7] <= b[7] ? -1 : 0,
};
compare_vector_vector(res, exp);
}
TEST(TEST_CATEGORY_NAME, avxf_permute)
{
INIT_AVX_TEST
avxf res = permute<3, 0, 1, 7, 6, 5, 2, 4>(avxf_b);
compare_vector_vector(res, avxf(4.0f, 6.0f, 3.0f, 2.0f, 1.0f, 7.0f, 8.0f, 5.0f));
}
TEST(TEST_CATEGORY_NAME, avxf_blend)
{
INIT_AVX_TEST
avxf res = blend<0, 0, 1, 0, 1, 0, 1, 0>(avxf_a, avxf_b);
compare_vector_vector(res, avxf(0.1f, 0.2f, 3.0f, 0.4f, 5.0f, 0.6f, 7.0f, 0.8f));
}
TEST(TEST_CATEGORY_NAME, avxf_shuffle)
{
INIT_AVX_TEST
avxf res = shuffle<0, 1, 2, 3, 1, 3, 2, 0>(avxf_a);
compare_vector_vector(res, avxf(0.4f, 0.2f, 0.1f, 0.3f, 0.5f, 0.6f, 0.7f, 0.8f));
}
TEST(TEST_CATEGORY_NAME, avxf_cross)
{
INIT_AVX_TEST
avxf res = cross(avxf_b, avxf_c);
compare_vector_vector_near(res,
avxf(0.0f,
-9.5367432e-07f,
0.0f,
4.7683716e-07f,
0.0f,
-3.8146973e-06f,
3.8146973e-06f,
3.8146973e-06f),
0.000002000f);
}
TEST(TEST_CATEGORY_NAME, avxf_dot3)
{
INIT_AVX_TEST
float den, den2;
dot3(avxf_a, avxf_b, den, den2);
EXPECT_FLOAT_EQ(den, 14.9f);
EXPECT_FLOAT_EQ(den2, 2.9f);
}
CCL_NAMESPACE_END


@@ -1,11 +1,13 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2011-2022 Blender Foundation */
#define __KERNEL_SSE__
#define __KERNEL_AVX__
#define __KERNEL_AVX2__
#define TEST_CATEGORY_NAME util_avx2
#if (defined(i386) || defined(_M_IX86) || defined(__x86_64__) || defined(_M_X64)) && \
defined(__AVX2__)
# include "util_avxf_test.h"
# include "util_float8_test.h"
#endif


@@ -1,11 +1,12 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2011-2022 Blender Foundation */
#define __KERNEL_SSE__
#define __KERNEL_AVX__
#define TEST_CATEGORY_NAME util_avx
#if (defined(i386) || defined(_M_IX86) || defined(__x86_64__) || defined(_M_X64)) && \
defined(__AVX__)
# include "util_avxf_test.h"
# include "util_float8_test.h"
#endif


@@ -0,0 +1,12 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2011-2022 Blender Foundation */
#define __KERNEL_SSE__
#define __KERNEL_SSE2__
#define TEST_CATEGORY_NAME util_sse2
#if (defined(i386) || defined(_M_IX86) || defined(__x86_64__) || defined(_M_X64)) && \
defined(__SSE2__)
# include "util_float8_test.h"
#endif


@@ -0,0 +1,103 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2011-2022 Blender Foundation */
#include "testing/testing.h"
#include "util/math.h"
#include "util/system.h"
#include "util/types.h"
CCL_NAMESPACE_BEGIN
static bool validate_cpu_capabilities()
{
#if defined(__KERNEL_AVX2__)
return system_cpu_support_avx2();
#elif defined(__KERNEL_AVX__)
return system_cpu_support_avx();
#elif defined(__KERNEL_SSE2__)
return system_cpu_support_sse2();
#else
return false;
#endif
}
#define INIT_FLOAT8_TEST \
if (!validate_cpu_capabilities()) \
return; \
\
const vfloat8 float8_a = make_vfloat8(0.1f, 0.2f, 0.3f, 0.4f, 0.5f, 0.6f, 0.7f, 0.8f); \
const vfloat8 float8_b = make_vfloat8(1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f); \
const vfloat8 float8_c = make_vfloat8(1.1f, 2.2f, 3.3f, 4.4f, 5.5f, 6.6f, 7.7f, 8.8f);
#define compare_vector_scalar(a, b) \
for (size_t index = 0; index < 8; index++) \
EXPECT_FLOAT_EQ(a[index], b);
#define compare_vector_vector(a, b) \
for (size_t index = 0; index < 8; index++) \
EXPECT_FLOAT_EQ(a[index], b[index]);
#define compare_vector_vector_near(a, b, abserror) \
for (size_t index = 0; index < 8; index++) \
EXPECT_NEAR(a[index], b[index], abserror);
#define basic_test_vv(a, b, op) \
INIT_FLOAT8_TEST \
vfloat8 c = a op b; \
for (size_t i = 0; i < 8; i++) \
EXPECT_FLOAT_EQ(c[i], a[i] op b[i]);
/* vector op float tests */
#define basic_test_vf(a, b, op) \
INIT_FLOAT8_TEST \
vfloat8 c = a op b; \
for (size_t i = 0; i < 8; i++) \
EXPECT_FLOAT_EQ(c[i], a[i] op b);
static const float float_b = 1.5f;
TEST(TEST_CATEGORY_NAME, float8_add_vv) { basic_test_vv(float8_a, float8_b, +) }
TEST(TEST_CATEGORY_NAME, float8_sub_vv) { basic_test_vv(float8_a, float8_b, -) }
TEST(TEST_CATEGORY_NAME, float8_mul_vv) { basic_test_vv(float8_a, float8_b, *) }
TEST(TEST_CATEGORY_NAME, float8_div_vv) { basic_test_vv(float8_a, float8_b, /) }
TEST(TEST_CATEGORY_NAME, float8_add_vf) { basic_test_vf(float8_a, float_b, +) }
TEST(TEST_CATEGORY_NAME, float8_sub_vf) { basic_test_vf(float8_a, float_b, -) }
TEST(TEST_CATEGORY_NAME, float8_mul_vf) { basic_test_vf(float8_a, float_b, *) }
TEST(TEST_CATEGORY_NAME, float8_div_vf) { basic_test_vf(float8_a, float_b, /) }
TEST(TEST_CATEGORY_NAME, float8_ctor)
{
INIT_FLOAT8_TEST
compare_vector_scalar(make_vfloat8(0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f),
static_cast<float>(index));
compare_vector_scalar(make_vfloat8(1.0f), 1.0f);
}
TEST(TEST_CATEGORY_NAME, float8_sqrt)
{
INIT_FLOAT8_TEST
compare_vector_vector(sqrt(make_vfloat8(1.0f, 4.0f, 9.0f, 16.0f, 25.0f, 36.0f, 49.0f, 64.0f)),
make_vfloat8(1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f));
}
TEST(TEST_CATEGORY_NAME, float8_min_max)
{
INIT_FLOAT8_TEST
compare_vector_vector(min(float8_a, float8_b), float8_a);
compare_vector_vector(max(float8_a, float8_b), float8_b);
}
TEST(TEST_CATEGORY_NAME, float8_shuffle)
{
INIT_FLOAT8_TEST
vfloat8 res0 = shuffle<0, 1, 2, 3, 1, 3, 2, 0>(float8_a);
compare_vector_vector(res0, make_vfloat8(0.1f, 0.2f, 0.3f, 0.4f, 0.6f, 0.8f, 0.7f, 0.5f));
vfloat8 res1 = shuffle<3>(float8_a);
compare_vector_vector(res1, make_vfloat8(0.4f, 0.4f, 0.4f, 0.4f, 0.8f, 0.8f, 0.8f, 0.8f));
vfloat8 res2 = shuffle<3, 2, 1, 0>(float8_a, float8_b);
compare_vector_vector(res2, make_vfloat8(0.4f, 0.3f, 2.0f, 1.0f, 0.8f, 0.7f, 6.0f, 5.0f));
}
CCL_NAMESPACE_END
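
The shuffle expectations in this new test encode the 8-wide shuffle semantics: each template index selects a lane within its own 128-bit half, so the low four indices pick from lanes 0-3 and the high four from lanes 4-7. A scalar model consistent with the expectations above; illustrative, not the Cycles implementation:

#include <cstdio>

/* Scalar model of shuffle<i0..i7>(a) for one 8-lane vector. */
static void shuffle8(const float a[8], const int idx[8], float out[8])
{
  for (int lane = 0; lane < 4; lane++) {
    out[lane] = a[idx[lane]];             /* low half picks from a[0..3] */
    out[4 + lane] = a[4 + idx[4 + lane]]; /* high half picks from a[4..7] */
  }
}

int main()
{
  const float a[8] = {0.1f, 0.2f, 0.3f, 0.4f, 0.5f, 0.6f, 0.7f, 0.8f};
  const int idx[8] = {0, 1, 2, 3, 1, 3, 2, 0};
  float out[8];
  shuffle8(a, idx, out);
  for (float f : out) {
    printf("%g ", f); /* 0.1 0.2 0.3 0.4 0.6 0.8 0.7 0.5, as the test expects */
  }
  printf("\n");
  return 0;
}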


@@ -69,6 +69,7 @@ set(SRC_HEADERS
math_int2.h
math_int3.h
math_int4.h
math_int8.h
math_matrix.h
md5.h
murmurhash.h
@@ -85,13 +86,7 @@ set(SRC_HEADERS
rect.h
set.h
simd.h
avxf.h
avxb.h
avxi.h
semaphore.h
sseb.h
ssef.h
ssei.h
stack_allocator.h
static_assert.h
stats.h
@@ -118,6 +113,8 @@ set(SRC_HEADERS
types_int3_impl.h
types_int4.h
types_int4_impl.h
types_int8.h
types_int8_impl.h
types_spectrum.h
types_uchar2.h
types_uchar2_impl.h


@@ -1,230 +0,0 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2011-2013 Intel Corporation
* Modifications Copyright 2014-2022 Blender Foundation. */
#ifndef __UTIL_AVXB_H__
#define __UTIL_AVXB_H__
CCL_NAMESPACE_BEGIN
struct avxf;
/*! 8-wide AVX bool type. */
struct avxb {
typedef avxb Mask; // mask type
typedef avxf Float; // float type
enum { size = 8 }; // number of SIMD elements
union {
__m256 m256;
int32_t v[8];
}; // data
////////////////////////////////////////////////////////////////////////////////
/// Constructors, Assignment & Cast Operators
////////////////////////////////////////////////////////////////////////////////
__forceinline avxb()
{
}
__forceinline avxb(const avxb &other)
{
m256 = other.m256;
}
__forceinline avxb &operator=(const avxb &other)
{
m256 = other.m256;
return *this;
}
__forceinline avxb(const __m256 input) : m256(input)
{
}
__forceinline avxb(const __m128 &a, const __m128 &b)
: m256(_mm256_insertf128_ps(_mm256_castps128_ps256(a), b, 1))
{
}
__forceinline operator const __m256 &(void) const
{
return m256;
}
__forceinline operator const __m256i(void) const
{
return _mm256_castps_si256(m256);
}
__forceinline operator const __m256d(void) const
{
return _mm256_castps_pd(m256);
}
////////////////////////////////////////////////////////////////////////////////
/// Constants
////////////////////////////////////////////////////////////////////////////////
__forceinline avxb(FalseTy) : m256(_mm256_setzero_ps())
{
}
__forceinline avxb(TrueTy) : m256(_mm256_castsi256_ps(_mm256_set1_epi32(-1)))
{
}
////////////////////////////////////////////////////////////////////////////////
/// Array Access
////////////////////////////////////////////////////////////////////////////////
__forceinline bool operator[](const size_t i) const
{
assert(i < 8);
return (_mm256_movemask_ps(m256) >> i) & 1;
}
__forceinline int32_t &operator[](const size_t i)
{
assert(i < 8);
return v[i];
}
};
////////////////////////////////////////////////////////////////////////////////
/// Unary Operators
////////////////////////////////////////////////////////////////////////////////
__forceinline const avxb operator!(const avxb &a)
{
return _mm256_xor_ps(a, avxb(True));
}
////////////////////////////////////////////////////////////////////////////////
/// Binary Operators
////////////////////////////////////////////////////////////////////////////////
__forceinline const avxb operator&(const avxb &a, const avxb &b)
{
return _mm256_and_ps(a, b);
}
__forceinline const avxb operator|(const avxb &a, const avxb &b)
{
return _mm256_or_ps(a, b);
}
__forceinline const avxb operator^(const avxb &a, const avxb &b)
{
return _mm256_xor_ps(a, b);
}
////////////////////////////////////////////////////////////////////////////////
/// Assignment Operators
////////////////////////////////////////////////////////////////////////////////
__forceinline const avxb operator&=(avxb &a, const avxb &b)
{
return a = a & b;
}
__forceinline const avxb operator|=(avxb &a, const avxb &b)
{
return a = a | b;
}
__forceinline const avxb operator^=(avxb &a, const avxb &b)
{
return a = a ^ b;
}
////////////////////////////////////////////////////////////////////////////////
/// Comparison Operators + Select
////////////////////////////////////////////////////////////////////////////////
__forceinline const avxb operator!=(const avxb &a, const avxb &b)
{
return _mm256_xor_ps(a, b);
}
__forceinline const avxb operator==(const avxb &a, const avxb &b)
{
#ifdef __KERNEL_AVX2__
return _mm256_castsi256_ps(_mm256_cmpeq_epi32(a, b));
#else
__m128i a_lo = _mm_castps_si128(_mm256_extractf128_ps(a, 0));
__m128i a_hi = _mm_castps_si128(_mm256_extractf128_ps(a, 1));
__m128i b_lo = _mm_castps_si128(_mm256_extractf128_ps(b, 0));
__m128i b_hi = _mm_castps_si128(_mm256_extractf128_ps(b, 1));
__m128i c_lo = _mm_cmpeq_epi32(a_lo, b_lo);
__m128i c_hi = _mm_cmpeq_epi32(a_hi, b_hi);
__m256i result = _mm256_insertf128_si256(_mm256_castsi128_si256(c_lo), c_hi, 1);
return _mm256_castsi256_ps(result);
#endif
}
__forceinline const avxb select(const avxb &m, const avxb &t, const avxb &f)
{
#if defined(__KERNEL_SSE41__)
return _mm256_blendv_ps(f, t, m);
#else
return _mm256_or_ps(_mm256_and_ps(m, t), _mm256_andnot_ps(m, f));
#endif
}
////////////////////////////////////////////////////////////////////////////////
/// Movement/Shifting/Shuffling Functions
////////////////////////////////////////////////////////////////////////////////
__forceinline const avxb unpacklo(const avxb &a, const avxb &b)
{
return _mm256_unpacklo_ps(a, b);
}
__forceinline const avxb unpackhi(const avxb &a, const avxb &b)
{
return _mm256_unpackhi_ps(a, b);
}
////////////////////////////////////////////////////////////////////////////////
/// Reduction Operations
////////////////////////////////////////////////////////////////////////////////
#if defined(__KERNEL_SSE41__)
__forceinline uint32_t popcnt(const avxb &a)
{
return _mm_popcnt_u32(_mm256_movemask_ps(a));
}
#else
__forceinline uint32_t popcnt(const avxb &a)
{
return bool(a[0]) + bool(a[1]) + bool(a[2]) + bool(a[3]) + bool(a[4]) + bool(a[5]) + bool(a[6]) +
bool(a[7]);
}
#endif
__forceinline bool reduce_and(const avxb &a)
{
return _mm256_movemask_ps(a) == 0xff;
}
__forceinline bool reduce_or(const avxb &a)
{
return _mm256_movemask_ps(a) != 0x0;
}
__forceinline bool all(const avxb &b)
{
return _mm256_movemask_ps(b) == 0xff;
}
__forceinline bool any(const avxb &b)
{
return _mm256_movemask_ps(b) != 0x0;
}
__forceinline bool none(const avxb &b)
{
return _mm256_movemask_ps(b) == 0x0;
}
__forceinline uint32_t movemask(const avxb &a)
{
return _mm256_movemask_ps(a);
}
////////////////////////////////////////////////////////////////////////////////
/// Debug Functions
////////////////////////////////////////////////////////////////////////////////
ccl_device_inline void print_avxb(const char *label, const avxb &a)
{
printf("%s: %d %d %d %d %d %d %d %d\n", label, a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7]);
}
CCL_NAMESPACE_END
#endif
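
The movemask-based reductions in this deleted header collapse each lane's sign bit into one bit of an integer mask, so for eight lanes an all-true mask is 0xff and reduce_or() only needs mask != 0. A scalar model of the mask construction, as standalone illustrative code:

#include <cstdio>

/* Scalar model of _mm256_movemask_ps(): bit i holds lane i's sign bit. */
static unsigned movemask8(const bool lanes[8])
{
  unsigned mask = 0;
  for (int i = 0; i < 8; i++) {
    mask |= unsigned(lanes[i]) << i;
  }
  return mask;
}

int main()
{
  const bool all_true[8] = {true, true, true, true, true, true, true, true};
  printf("all:  %#x\n", movemask8(all_true)); /* 0xff: reduce_and()/all() */
  const bool none_true[8] = {};
  printf("none: %#x\n", movemask8(none_true)); /* 0x0: none() */
  return 0;
}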


@@ -1,379 +0,0 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2016 Intel Corporation */
#ifndef __UTIL_AVXF_H__
#define __UTIL_AVXF_H__
CCL_NAMESPACE_BEGIN
struct avxb;
struct avxf {
typedef avxf Float;
enum { size = 8 }; /* Number of SIMD elements. */
union {
__m256 m256;
float f[8];
int i[8];
};
__forceinline avxf()
{
}
__forceinline avxf(const avxf &other)
{
m256 = other.m256;
}
__forceinline avxf &operator=(const avxf &other)
{
m256 = other.m256;
return *this;
}
__forceinline avxf(const __m256 a) : m256(a)
{
}
__forceinline avxf(const __m256i a) : m256(_mm256_castsi256_ps(a))
{
}
__forceinline operator const __m256 &() const
{
return m256;
}
__forceinline operator __m256 &()
{
return m256;
}
__forceinline avxf(float a) : m256(_mm256_set1_ps(a))
{
}
__forceinline avxf(float high32x4, float low32x4)
: m256(_mm256_set_ps(
high32x4, high32x4, high32x4, high32x4, low32x4, low32x4, low32x4, low32x4))
{
}
__forceinline avxf(float a3, float a2, float a1, float a0)
: m256(_mm256_set_ps(a3, a2, a1, a0, a3, a2, a1, a0))
{
}
__forceinline avxf(
float a7, float a6, float a5, float a4, float a3, float a2, float a1, float a0)
: m256(_mm256_set_ps(a7, a6, a5, a4, a3, a2, a1, a0))
{
}
__forceinline avxf(float3 a) : m256(_mm256_set_ps(a.w, a.z, a.y, a.x, a.w, a.z, a.y, a.x))
{
}
__forceinline avxf(int a3, int a2, int a1, int a0)
{
const __m256i foo = _mm256_set_epi32(a3, a2, a1, a0, a3, a2, a1, a0);
m256 = _mm256_castsi256_ps(foo);
}
__forceinline avxf(int a7, int a6, int a5, int a4, int a3, int a2, int a1, int a0)
{
const __m256i foo = _mm256_set_epi32(a7, a6, a5, a4, a3, a2, a1, a0);
m256 = _mm256_castsi256_ps(foo);
}
__forceinline avxf(__m128 a, __m128 b)
{
const __m256 foo = _mm256_castps128_ps256(a);
m256 = _mm256_insertf128_ps(foo, b, 1);
}
__forceinline const float &operator[](const size_t i) const
{
assert(i < 8);
return f[i];
}
__forceinline float &operator[](const size_t i)
{
assert(i < 8);
return f[i];
}
};
__forceinline avxf cross(const avxf &a, const avxf &b)
{
avxf r(0.0,
a[4] * b[5] - a[5] * b[4],
a[6] * b[4] - a[4] * b[6],
a[5] * b[6] - a[6] * b[5],
0.0,
a[0] * b[1] - a[1] * b[0],
a[2] * b[0] - a[0] * b[2],
a[1] * b[2] - a[2] * b[1]);
return r;
}
__forceinline void dot3(const avxf &a, const avxf &b, float &den, float &den2)
{
const avxf t = _mm256_mul_ps(a.m256, b.m256);
den = ((float *)&t)[0] + ((float *)&t)[1] + ((float *)&t)[2];
den2 = ((float *)&t)[4] + ((float *)&t)[5] + ((float *)&t)[6];
}
////////////////////////////////////////////////////////////////////////////////
/// Unary Operators
////////////////////////////////////////////////////////////////////////////////
__forceinline const avxf cast(const __m256i &a)
{
return _mm256_castsi256_ps(a);
}
__forceinline const avxf mm256_sqrt(const avxf &a)
{
return _mm256_sqrt_ps(a.m256);
}
////////////////////////////////////////////////////////////////////////////////
/// Binary Operators
////////////////////////////////////////////////////////////////////////////////
__forceinline const avxf operator+(const avxf &a, const avxf &b)
{
return _mm256_add_ps(a.m256, b.m256);
}
__forceinline const avxf operator+(const avxf &a, const float &b)
{
return a + avxf(b);
}
__forceinline const avxf operator+(const float &a, const avxf &b)
{
return avxf(a) + b;
}
__forceinline const avxf operator-(const avxf &a, const avxf &b)
{
return _mm256_sub_ps(a.m256, b.m256);
}
__forceinline const avxf operator-(const avxf &a, const float &b)
{
return a - avxf(b);
}
__forceinline const avxf operator-(const float &a, const avxf &b)
{
return avxf(a) - b;
}
__forceinline const avxf operator*(const avxf &a, const avxf &b)
{
return _mm256_mul_ps(a.m256, b.m256);
}
__forceinline const avxf operator*(const avxf &a, const float &b)
{
return a * avxf(b);
}
__forceinline const avxf operator*(const float &a, const avxf &b)
{
return avxf(a) * b;
}
__forceinline const avxf operator/(const avxf &a, const avxf &b)
{
return _mm256_div_ps(a.m256, b.m256);
}
__forceinline const avxf operator/(const avxf &a, const float &b)
{
return a / avxf(b);
}
__forceinline const avxf operator/(const float &a, const avxf &b)
{
return avxf(a) / b;
}
__forceinline const avxf operator|(const avxf &a, const avxf &b)
{
return _mm256_or_ps(a.m256, b.m256);
}
__forceinline const avxf operator^(const avxf &a, const avxf &b)
{
return _mm256_xor_ps(a.m256, b.m256);
}
__forceinline const avxf operator&(const avxf &a, const avxf &b)
{
return _mm256_and_ps(a.m256, b.m256);
}
__forceinline const avxf max(const avxf &a, const avxf &b)
{
return _mm256_max_ps(a.m256, b.m256);
}
__forceinline const avxf min(const avxf &a, const avxf &b)
{
return _mm256_min_ps(a.m256, b.m256);
}
////////////////////////////////////////////////////////////////////////////////
/// Movement/Shifting/Shuffling Functions
////////////////////////////////////////////////////////////////////////////////
__forceinline const avxf shuffle(const avxf &a, const __m256i &shuf)
{
return _mm256_permutevar_ps(a, shuf);
}
template<int i0, int i1, int i2, int i3, int i4, int i5, int i6, int i7>
__forceinline const avxf shuffle(const avxf &a)
{
return _mm256_permutevar_ps(a, _mm256_set_epi32(i7, i6, i5, i4, i3, i2, i1, i0));
}
template<size_t i0, size_t i1, size_t i2, size_t i3>
__forceinline const avxf shuffle(const avxf &a, const avxf &b)
{
return _mm256_shuffle_ps(a, b, _MM_SHUFFLE(i3, i2, i1, i0));
}
template<size_t i0, size_t i1, size_t i2, size_t i3>
__forceinline const avxf shuffle(const avxf &a)
{
return shuffle<i0, i1, i2, i3>(a, a);
}
template<size_t i0> __forceinline const avxf shuffle(const avxf &a, const avxf &b)
{
return shuffle<i0, i0, i0, i0>(a, b);
}
template<size_t i0> __forceinline const avxf shuffle(const avxf &a)
{
return shuffle<i0>(a, a);
}
template<size_t i> __forceinline float extract(const avxf &a)
{
__m256 b = shuffle<i, i, i, i>(a).m256;
return _mm256_cvtss_f32(b);
}
template<> __forceinline float extract<0>(const avxf &a)
{
return _mm256_cvtss_f32(a.m256);
}
__forceinline ssef low(const avxf &a)
{
return _mm256_extractf128_ps(a.m256, 0);
}
__forceinline ssef high(const avxf &a)
{
return _mm256_extractf128_ps(a.m256, 1);
}
template<int i0, int i1, int i2, int i3, int i4, int i5, int i6, int i7>
__forceinline const avxf permute(const avxf &a)
{
#ifdef __KERNEL_AVX2__
return _mm256_permutevar8x32_ps(a, _mm256_set_epi32(i7, i6, i5, i4, i3, i2, i1, i0));
#else
float temp[8];
_mm256_storeu_ps((float *)&temp, a);
return avxf(temp[i7], temp[i6], temp[i5], temp[i4], temp[i3], temp[i2], temp[i1], temp[i0]);
#endif
}
template<int S0, int S1, int S2, int S3, int S4, int S5, int S6, int S7>
ccl_device_inline const avxf set_sign_bit(const avxf &a)
{
return a ^ avxf(S7 << 31, S6 << 31, S5 << 31, S4 << 31, S3 << 31, S2 << 31, S1 << 31, S0 << 31);
}
template<size_t S0, size_t S1, size_t S2, size_t S3, size_t S4, size_t S5, size_t S6, size_t S7>
ccl_device_inline const avxf blend(const avxf &a, const avxf &b)
{
return _mm256_blend_ps(
a, b, S7 << 0 | S6 << 1 | S5 << 2 | S4 << 3 | S3 << 4 | S2 << 5 | S1 << 6 | S0 << 7);
}
template<size_t S0, size_t S1, size_t S2, size_t S3>
ccl_device_inline const avxf blend(const avxf &a, const avxf &b)
{
return blend<S0, S1, S2, S3, S0, S1, S2, S3>(a, b);
}
//#if defined(__KERNEL_SSE41__)
__forceinline avxf maxi(const avxf &a, const avxf &b)
{
const avxf ci = _mm256_max_ps(a, b);
return ci;
}
__forceinline avxf mini(const avxf &a, const avxf &b)
{
const avxf ci = _mm256_min_ps(a, b);
return ci;
}
//#endif
////////////////////////////////////////////////////////////////////////////////
/// Ternary Operators
////////////////////////////////////////////////////////////////////////////////
__forceinline const avxf madd(const avxf &a, const avxf &b, const avxf &c)
{
#ifdef __KERNEL_AVX2__
return _mm256_fmadd_ps(a, b, c);
#else
return c + (a * b);
#endif
}
__forceinline const avxf nmadd(const avxf &a, const avxf &b, const avxf &c)
{
#ifdef __KERNEL_AVX2__
return _mm256_fnmadd_ps(a, b, c);
#else
return c - (a * b);
#endif
}
__forceinline const avxf msub(const avxf &a, const avxf &b, const avxf &c)
{
#ifdef __KERNEL_AVX2__
return _mm256_fmsub_ps(a, b, c);
#else
return (a * b) - c;
#endif
}
////////////////////////////////////////////////////////////////////////////////
/// Comparison Operators + Select
////////////////////////////////////////////////////////////////////////////////
__forceinline const avxb operator<=(const avxf &a, const avxf &b)
{
return _mm256_cmp_ps(a.m256, b.m256, _CMP_LE_OS);
}
__forceinline const avxf select(const avxb &m, const avxf &t, const avxf &f)
{
return _mm256_blendv_ps(f, t, m);
}
////////////////////////////////////////////////////////////////////////////////
/// Common Functions
////////////////////////////////////////////////////////////////////////////////
__forceinline avxf mix(const avxf &a, const avxf &b, const avxf &t)
{
return madd(t, b, (avxf(1.0f) - t) * a);
}
#ifndef _mm256_set_m128
# define _mm256_set_m128(/* __m128 */ hi, /* __m128 */ lo) \
_mm256_insertf128_ps(_mm256_castps128_ps256(lo), (hi), 0x1)
#endif
#define _mm256_loadu2_m128(/* float const* */ hiaddr, /* float const* */ loaddr) \
_mm256_set_m128(_mm_loadu_ps(hiaddr), _mm_loadu_ps(loaddr))
CCL_NAMESPACE_END
#endif
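
The deleted avxf header provided madd/nmadd/msub with plain-arithmetic fallbacks for builds without FMA (AVX2). The identities behind those fallbacks, as a standalone scalar sketch:

#include <cassert>

static float madd(float a, float b, float c) { return a * b + c; }  /* fused on AVX2 */
static float nmadd(float a, float b, float c) { return c - a * b; }
static float msub(float a, float b, float c) { return a * b - c; }

int main()
{
  assert(madd(2.0f, 3.0f, 4.0f) == 10.0f);
  assert(nmadd(2.0f, 3.0f, 4.0f) == -2.0f);
  assert(msub(2.0f, 3.0f, 4.0f) == 2.0f);
  return 0;
}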


@@ -1,732 +0,0 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2009-2013 Intel Corporation */
#ifndef __UTIL_AVXI_H__
#define __UTIL_AVXI_H__
CCL_NAMESPACE_BEGIN
struct avxb;
struct avxi {
typedef avxb Mask; // mask type for us
enum { size = 8 }; // number of SIMD elements
union { // data
__m256i m256;
#if !defined(__KERNEL_AVX2__)
struct {
__m128i l, h;
};
#endif
int32_t v[8];
};
////////////////////////////////////////////////////////////////////////////////
/// Constructors, Assignment & Cast Operators
////////////////////////////////////////////////////////////////////////////////
__forceinline avxi()
{
}
__forceinline avxi(const avxi &a)
{
m256 = a.m256;
}
__forceinline avxi &operator=(const avxi &a)
{
m256 = a.m256;
return *this;
}
__forceinline avxi(const __m256i a) : m256(a)
{
}
__forceinline operator const __m256i &(void) const
{
return m256;
}
__forceinline operator __m256i &(void)
{
return m256;
}
__forceinline explicit avxi(const ssei &a)
: m256(_mm256_insertf128_si256(_mm256_castsi128_si256(a), a, 1))
{
}
__forceinline avxi(const ssei &a, const ssei &b)
: m256(_mm256_insertf128_si256(_mm256_castsi128_si256(a), b, 1))
{
}
#if defined(__KERNEL_AVX2__)
__forceinline avxi(const __m128i &a, const __m128i &b)
: m256(_mm256_insertf128_si256(_mm256_castsi128_si256(a), b, 1))
{
}
#else
__forceinline avxi(const __m128i &a, const __m128i &b) : l(a), h(b)
{
}
#endif
__forceinline explicit avxi(const int32_t *const a)
: m256(_mm256_castps_si256(_mm256_loadu_ps((const float *)a)))
{
}
__forceinline avxi(int32_t a) : m256(_mm256_set1_epi32(a))
{
}
__forceinline avxi(int32_t a, int32_t b) : m256(_mm256_set_epi32(b, a, b, a, b, a, b, a))
{
}
__forceinline avxi(int32_t a, int32_t b, int32_t c, int32_t d)
: m256(_mm256_set_epi32(d, c, b, a, d, c, b, a))
{
}
__forceinline avxi(
int32_t a, int32_t b, int32_t c, int32_t d, int32_t e, int32_t f, int32_t g, int32_t h)
: m256(_mm256_set_epi32(h, g, f, e, d, c, b, a))
{
}
__forceinline explicit avxi(const __m256 a) : m256(_mm256_cvtps_epi32(a))
{
}
////////////////////////////////////////////////////////////////////////////////
/// Constants
////////////////////////////////////////////////////////////////////////////////
__forceinline avxi(ZeroTy) : m256(_mm256_setzero_si256())
{
}
#if defined(__KERNEL_AVX2__)
__forceinline avxi(OneTy) : m256(_mm256_set1_epi32(1))
{
}
__forceinline avxi(PosInfTy) : m256(_mm256_set1_epi32(pos_inf))
{
}
__forceinline avxi(NegInfTy) : m256(_mm256_set1_epi32(neg_inf))
{
}
#else
__forceinline avxi(OneTy) : m256(_mm256_set_epi32(1, 1, 1, 1, 1, 1, 1, 1))
{
}
__forceinline avxi(PosInfTy)
: m256(_mm256_set_epi32(
pos_inf, pos_inf, pos_inf, pos_inf, pos_inf, pos_inf, pos_inf, pos_inf))
{
}
__forceinline avxi(NegInfTy)
: m256(_mm256_set_epi32(
neg_inf, neg_inf, neg_inf, neg_inf, neg_inf, neg_inf, neg_inf, neg_inf))
{
}
#endif
__forceinline avxi(StepTy) : m256(_mm256_set_epi32(7, 6, 5, 4, 3, 2, 1, 0))
{
}
////////////////////////////////////////////////////////////////////////////////
/// Array Access
////////////////////////////////////////////////////////////////////////////////
__forceinline const int32_t &operator[](const size_t i) const
{
assert(i < 8);
return v[i];
}
__forceinline int32_t &operator[](const size_t i)
{
assert(i < 8);
return v[i];
}
};
////////////////////////////////////////////////////////////////////////////////
/// Unary Operators
////////////////////////////////////////////////////////////////////////////////
__forceinline const avxi cast(const __m256 &a)
{
return _mm256_castps_si256(a);
}
__forceinline const avxi operator+(const avxi &a)
{
return a;
}
#if defined(__KERNEL_AVX2__)
__forceinline const avxi operator-(const avxi &a)
{
return _mm256_sub_epi32(_mm256_setzero_si256(), a.m256);
}
__forceinline const avxi abs(const avxi &a)
{
return _mm256_abs_epi32(a.m256);
}
#else
__forceinline const avxi operator-(const avxi &a)
{
return avxi(_mm_sub_epi32(_mm_setzero_si128(), a.l), _mm_sub_epi32(_mm_setzero_si128(), a.h));
}
__forceinline const avxi abs(const avxi &a)
{
return avxi(_mm_abs_epi32(a.l), _mm_abs_epi32(a.h));
}
#endif
////////////////////////////////////////////////////////////////////////////////
/// Binary Operators
////////////////////////////////////////////////////////////////////////////////
#if defined(__KERNEL_AVX2__)
__forceinline const avxi operator+(const avxi &a, const avxi &b)
{
return _mm256_add_epi32(a.m256, b.m256);
}
#else
__forceinline const avxi operator+(const avxi &a, const avxi &b)
{
return avxi(_mm_add_epi32(a.l, b.l), _mm_add_epi32(a.h, b.h));
}
#endif
__forceinline const avxi operator+(const avxi &a, const int32_t b)
{
return a + avxi(b);
}
__forceinline const avxi operator+(const int32_t a, const avxi &b)
{
return avxi(a) + b;
}
#if defined(__KERNEL_AVX2__)
__forceinline const avxi operator-(const avxi &a, const avxi &b)
{
return _mm256_sub_epi32(a.m256, b.m256);
}
#else
__forceinline const avxi operator-(const avxi &a, const avxi &b)
{
return avxi(_mm_sub_epi32(a.l, b.l), _mm_sub_epi32(a.h, b.h));
}
#endif
__forceinline const avxi operator-(const avxi &a, const int32_t b)
{
return a - avxi(b);
}
__forceinline const avxi operator-(const int32_t a, const avxi &b)
{
return avxi(a) - b;
}
#if defined(__KERNEL_AVX2__)
__forceinline const avxi operator*(const avxi &a, const avxi &b)
{
return _mm256_mullo_epi32(a.m256, b.m256);
}
#else
__forceinline const avxi operator*(const avxi &a, const avxi &b)
{
return avxi(_mm_mullo_epi32(a.l, b.l), _mm_mullo_epi32(a.h, b.h));
}
#endif
__forceinline const avxi operator*(const avxi &a, const int32_t b)
{
return a * avxi(b);
}
__forceinline const avxi operator*(const int32_t a, const avxi &b)
{
return avxi(a) * b;
}
#if defined(__KERNEL_AVX2__)
__forceinline const avxi operator&(const avxi &a, const avxi &b)
{
return _mm256_and_si256(a.m256, b.m256);
}
#else
__forceinline const avxi operator&(const avxi &a, const avxi &b)
{
return _mm256_castps_si256(_mm256_and_ps(_mm256_castsi256_ps(a), _mm256_castsi256_ps(b)));
}
#endif
__forceinline const avxi operator&(const avxi &a, const int32_t b)
{
return a & avxi(b);
}
__forceinline const avxi operator&(const int32_t a, const avxi &b)
{
return avxi(a) & b;
}
#if defined(__KERNEL_AVX2__)
__forceinline const avxi operator|(const avxi &a, const avxi &b)
{
return _mm256_or_si256(a.m256, b.m256);
}
#else
__forceinline const avxi operator|(const avxi &a, const avxi &b)
{
return _mm256_castps_si256(_mm256_or_ps(_mm256_castsi256_ps(a), _mm256_castsi256_ps(b)));
}
#endif
__forceinline const avxi operator|(const avxi &a, const int32_t b)
{
return a | avxi(b);
}
__forceinline const avxi operator|(const int32_t a, const avxi &b)
{
return avxi(a) | b;
}
#if defined(__KERNEL_AVX2__)
__forceinline const avxi operator^(const avxi &a, const avxi &b)
{
return _mm256_xor_si256(a.m256, b.m256);
}
#else
__forceinline const avxi operator^(const avxi &a, const avxi &b)
{
return _mm256_castps_si256(_mm256_xor_ps(_mm256_castsi256_ps(a), _mm256_castsi256_ps(b)));
}
#endif
__forceinline const avxi operator^(const avxi &a, const int32_t b)
{
return a ^ avxi(b);
}
__forceinline const avxi operator^(const int32_t a, const avxi &b)
{
return avxi(a) ^ b;
}
#if defined(__KERNEL_AVX2__)
__forceinline const avxi operator<<(const avxi &a, const int32_t n)
{
return _mm256_slli_epi32(a.m256, n);
}
__forceinline const avxi operator>>(const avxi &a, const int32_t n)
{
return _mm256_srai_epi32(a.m256, n);
}
__forceinline const avxi sra(const avxi &a, const int32_t b)
{
return _mm256_srai_epi32(a.m256, b);
}
__forceinline const avxi srl(const avxi &a, const int32_t b)
{
return _mm256_srli_epi32(a.m256, b);
}
#else
__forceinline const avxi operator<<(const avxi &a, const int32_t n)
{
return avxi(_mm_slli_epi32(a.l, n), _mm_slli_epi32(a.h, n));
}
__forceinline const avxi operator>>(const avxi &a, const int32_t n)
{
return avxi(_mm_srai_epi32(a.l, n), _mm_srai_epi32(a.h, n));
}
__forceinline const avxi sra(const avxi &a, const int32_t b)
{
return avxi(_mm_srai_epi32(a.l, b), _mm_srai_epi32(a.h, b));
}
__forceinline const avxi srl(const avxi &a, const int32_t b)
{
return avxi(_mm_srli_epi32(a.l, b), _mm_srli_epi32(a.h, b));
}
#endif
#if defined(__KERNEL_AVX2__)
__forceinline const avxi min(const avxi &a, const avxi &b)
{
return _mm256_min_epi32(a.m256, b.m256);
}
#else
__forceinline const avxi min(const avxi &a, const avxi &b)
{
return avxi(_mm_min_epi32(a.l, b.l), _mm_min_epi32(a.h, b.h));
}
#endif
__forceinline const avxi min(const avxi &a, const int32_t b)
{
return min(a, avxi(b));
}
__forceinline const avxi min(const int32_t a, const avxi &b)
{
return min(avxi(a), b);
}
#if defined(__KERNEL_AVX2__)
__forceinline const avxi max(const avxi &a, const avxi &b)
{
return _mm256_max_epi32(a.m256, b.m256);
}
#else
__forceinline const avxi max(const avxi &a, const avxi &b)
{
return avxi(_mm_max_epi32(a.l, b.l), _mm_max_epi32(a.h, b.h));
}
#endif
__forceinline const avxi max(const avxi &a, const int32_t b)
{
return max(a, avxi(b));
}
__forceinline const avxi max(const int32_t a, const avxi &b)
{
return max(avxi(a), b);
}
////////////////////////////////////////////////////////////////////////////////
/// Assignment Operators
////////////////////////////////////////////////////////////////////////////////
__forceinline avxi &operator+=(avxi &a, const avxi &b)
{
return a = a + b;
}
__forceinline avxi &operator+=(avxi &a, const int32_t b)
{
return a = a + b;
}
__forceinline avxi &operator-=(avxi &a, const avxi &b)
{
return a = a - b;
}
__forceinline avxi &operator-=(avxi &a, const int32_t b)
{
return a = a - b;
}
__forceinline avxi &operator*=(avxi &a, const avxi &b)
{
return a = a * b;
}
__forceinline avxi &operator*=(avxi &a, const int32_t b)
{
return a = a * b;
}
__forceinline avxi &operator&=(avxi &a, const avxi &b)
{
return a = a & b;
}
__forceinline avxi &operator&=(avxi &a, const int32_t b)
{
return a = a & b;
}
__forceinline avxi &operator|=(avxi &a, const avxi &b)
{
return a = a | b;
}
__forceinline avxi &operator|=(avxi &a, const int32_t b)
{
return a = a | b;
}
__forceinline avxi &operator^=(avxi &a, const avxi &b)
{
return a = a ^ b;
}
__forceinline avxi &operator^=(avxi &a, const int32_t b)
{
return a = a ^ b;
}
__forceinline avxi &operator<<=(avxi &a, const int32_t b)
{
return a = a << b;
}
__forceinline avxi &operator>>=(avxi &a, const int32_t b)
{
return a = a >> b;
}
////////////////////////////////////////////////////////////////////////////////
/// Comparison Operators + Select
////////////////////////////////////////////////////////////////////////////////
#if defined(__KERNEL_AVX2__)
__forceinline const avxb operator==(const avxi &a, const avxi &b)
{
return _mm256_castsi256_ps(_mm256_cmpeq_epi32(a.m256, b.m256));
}
#else
__forceinline const avxb operator==(const avxi &a, const avxi &b)
{
return avxb(_mm_castsi128_ps(_mm_cmpeq_epi32(a.l, b.l)),
_mm_castsi128_ps(_mm_cmpeq_epi32(a.h, b.h)));
}
#endif
__forceinline const avxb operator==(const avxi &a, const int32_t b)
{
return a == avxi(b);
}
__forceinline const avxb operator==(const int32_t a, const avxi &b)
{
return avxi(a) == b;
}
__forceinline const avxb operator!=(const avxi &a, const avxi &b)
{
return !(a == b);
}
__forceinline const avxb operator!=(const avxi &a, const int32_t b)
{
return a != avxi(b);
}
__forceinline const avxb operator!=(const int32_t a, const avxi &b)
{
return avxi(a) != b;
}
#if defined(__KERNEL_AVX2__)
__forceinline const avxb operator<(const avxi &a, const avxi &b)
{
return _mm256_castsi256_ps(_mm256_cmpgt_epi32(b.m256, a.m256));
}
#else
__forceinline const avxb operator<(const avxi &a, const avxi &b)
{
return avxb(_mm_castsi128_ps(_mm_cmplt_epi32(a.l, b.l)),
_mm_castsi128_ps(_mm_cmplt_epi32(a.h, b.h)));
}
#endif
__forceinline const avxb operator<(const avxi &a, const int32_t b)
{
return a < avxi(b);
}
__forceinline const avxb operator<(const int32_t a, const avxi &b)
{
return avxi(a) < b;
}
__forceinline const avxb operator>=(const avxi &a, const avxi &b)
{
return !(a < b);
}
__forceinline const avxb operator>=(const avxi &a, const int32_t b)
{
return a >= avxi(b);
}
__forceinline const avxb operator>=(const int32_t a, const avxi &b)
{
return avxi(a) >= b;
}
#if defined(__KERNEL_AVX2__)
__forceinline const avxb operator>(const avxi &a, const avxi &b)
{
return _mm256_castsi256_ps(_mm256_cmpgt_epi32(a.m256, b.m256));
}
#else
__forceinline const avxb operator>(const avxi &a, const avxi &b)
{
return avxb(_mm_castsi128_ps(_mm_cmpgt_epi32(a.l, b.l)),
_mm_castsi128_ps(_mm_cmpgt_epi32(a.h, b.h)));
}
#endif
__forceinline const avxb operator>(const avxi &a, const int32_t b)
{
return a > avxi(b);
}
__forceinline const avxb operator>(const int32_t a, const avxi &b)
{
return avxi(a) > b;
}
__forceinline const avxb operator<=(const avxi &a, const avxi &b)
{
return !(a > b);
}
__forceinline const avxb operator<=(const avxi &a, const int32_t b)
{
return a <= avxi(b);
}
__forceinline const avxb operator<=(const int32_t a, const avxi &b)
{
return avxi(a) <= b;
}
__forceinline const avxi select(const avxb &m, const avxi &t, const avxi &f)
{
return _mm256_castps_si256(_mm256_blendv_ps(_mm256_castsi256_ps(f), _mm256_castsi256_ps(t), m));
}
////////////////////////////////////////////////////////////////////////////////
/// Movement/Shifting/Shuffling Functions
////////////////////////////////////////////////////////////////////////////////
#if defined(__KERNEL_AVX2__)
__forceinline avxi unpacklo(const avxi &a, const avxi &b)
{
return _mm256_unpacklo_epi32(a.m256, b.m256);
}
__forceinline avxi unpackhi(const avxi &a, const avxi &b)
{
return _mm256_unpackhi_epi32(a.m256, b.m256);
}
#else
__forceinline avxi unpacklo(const avxi &a, const avxi &b)
{
return _mm256_castps_si256(_mm256_unpacklo_ps(_mm256_castsi256_ps(a), _mm256_castsi256_ps(b)));
}
__forceinline avxi unpackhi(const avxi &a, const avxi &b)
{
return _mm256_castps_si256(_mm256_unpackhi_ps(_mm256_castsi256_ps(a), _mm256_castsi256_ps(b)));
}
#endif
template<size_t i> __forceinline const avxi shuffle(const avxi &a)
{
return _mm256_castps_si256(_mm256_permute_ps(_mm256_castsi256_ps(a), _MM_SHUFFLE(i, i, i, i)));
}
template<size_t i0, size_t i1> __forceinline const avxi shuffle(const avxi &a)
{
return _mm256_permute2f128_si256(a, a, (i1 << 4) | (i0 << 0));
}
template<size_t i0, size_t i1> __forceinline const avxi shuffle(const avxi &a, const avxi &b)
{
return _mm256_permute2f128_si256(a, b, (i1 << 4) | (i0 << 0));
}
template<size_t i0, size_t i1, size_t i2, size_t i3>
__forceinline const avxi shuffle(const avxi &a)
{
return _mm256_castps_si256(
_mm256_permute_ps(_mm256_castsi256_ps(a), _MM_SHUFFLE(i3, i2, i1, i0)));
}
template<size_t i0, size_t i1, size_t i2, size_t i3>
__forceinline const avxi shuffle(const avxi &a, const avxi &b)
{
return _mm256_castps_si256(_mm256_shuffle_ps(
_mm256_castsi256_ps(a), _mm256_castsi256_ps(b), _MM_SHUFFLE(i3, i2, i1, i0)));
}
template<> __forceinline const avxi shuffle<0, 0, 2, 2>(const avxi &b)
{
return _mm256_castps_si256(_mm256_moveldup_ps(_mm256_castsi256_ps(b)));
}
template<> __forceinline const avxi shuffle<1, 1, 3, 3>(const avxi &b)
{
return _mm256_castps_si256(_mm256_movehdup_ps(_mm256_castsi256_ps(b)));
}
template<> __forceinline const avxi shuffle<0, 1, 0, 1>(const avxi &b)
{
return _mm256_castps_si256(
_mm256_castpd_ps(_mm256_movedup_pd(_mm256_castps_pd(_mm256_castsi256_ps(b)))));
}
__forceinline const avxi broadcast(const int *ptr)
{
return _mm256_castps_si256(_mm256_broadcast_ss((const float *)ptr));
}
template<size_t i> __forceinline const avxi insert(const avxi &a, const ssei &b)
{
return _mm256_insertf128_si256(a, b, i);
}
template<size_t i> __forceinline const ssei extract(const avxi &a)
{
return _mm256_extractf128_si256(a, i);
}
////////////////////////////////////////////////////////////////////////////////
/// Reductions
////////////////////////////////////////////////////////////////////////////////
__forceinline const avxi vreduce_min2(const avxi &v)
{
return min(v, shuffle<1, 0, 3, 2>(v));
}
__forceinline const avxi vreduce_min4(const avxi &v)
{
avxi v1 = vreduce_min2(v);
return min(v1, shuffle<2, 3, 0, 1>(v1));
}
__forceinline const avxi vreduce_min(const avxi &v)
{
avxi v1 = vreduce_min4(v);
return min(v1, shuffle<1, 0>(v1));
}
__forceinline const avxi vreduce_max2(const avxi &v)
{
return max(v, shuffle<1, 0, 3, 2>(v));
}
__forceinline const avxi vreduce_max4(const avxi &v)
{
avxi v1 = vreduce_max2(v);
return max(v1, shuffle<2, 3, 0, 1>(v1));
}
__forceinline const avxi vreduce_max(const avxi &v)
{
avxi v1 = vreduce_max4(v);
return max(v1, shuffle<1, 0>(v1));
}
__forceinline const avxi vreduce_add2(const avxi &v)
{
return v + shuffle<1, 0, 3, 2>(v);
}
__forceinline const avxi vreduce_add4(const avxi &v)
{
avxi v1 = vreduce_add2(v);
return v1 + shuffle<2, 3, 0, 1>(v1);
}
__forceinline const avxi vreduce_add(const avxi &v)
{
avxi v1 = vreduce_add4(v);
return v1 + shuffle<1, 0>(v1);
}
__forceinline int reduce_min(const avxi &v)
{
return extract<0>(extract<0>(vreduce_min(v)));
}
__forceinline int reduce_max(const avxi &v)
{
return extract<0>(extract<0>(vreduce_max(v)));
}
__forceinline int reduce_add(const avxi &v)
{
return extract<0>(extract<0>(vreduce_add(v)));
}
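These horizontal reductions run in log2(8) = 3 shuffle-and-combine steps: within-half pairs first, then across pairs, then across the two 128-bit halves, leaving the result in every lane. As a worked trace, summing the lanes (1, 2, 3, 4, 5, 6, 7, 8): after vreduce_add2 the lanes hold (3, 3, 7, 7, 11, 11, 15, 15), after vreduce_add4 (10, 10, 10, 10, 26, 26, 26, 26), and after vreduce_add the total 36 sits in every lane, which the double extract<0> then reads out as a scalar.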
__forceinline uint32_t select_min(const avxi &v)
{
return __bsf(movemask(v == vreduce_min(v)));
}
__forceinline uint32_t select_max(const avxi &v)
{
return __bsf(movemask(v == vreduce_max(v)));
}
__forceinline uint32_t select_min(const avxb &valid, const avxi &v)
{
const avxi a = select(valid, v, avxi(pos_inf));
return __bsf(movemask(valid & (a == vreduce_min(a))));
}
__forceinline uint32_t select_max(const avxb &valid, const avxi &v)
{
const avxi a = select(valid, v, avxi(neg_inf));
return __bsf(movemask(valid & (a == vreduce_max(a))));
}
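Note how the masked variants neutralize invalid lanes before reducing: select_min substitutes pos_inf so a masked-out lane can never win the minimum (select_max symmetrically uses neg_inf), and the final movemask/__bsf pair returns the index of the first valid lane whose value equals the reduced extremum.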
////////////////////////////////////////////////////////////////////////////////
/// Output Operators
////////////////////////////////////////////////////////////////////////////////
ccl_device_inline void print_avxi(const char *label, const avxi &a)
{
printf("%s: %d %d %d %d %d %d %d %d\n", label, a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7]);
}
CCL_NAMESPACE_END
#endif

View File

@@ -228,28 +228,27 @@ ccl_device float3 xyY_to_xyz(float x, float y, float Y)
* exp = exponent, encoded as uint32_t
* e2coeff = 2^(127/exponent - 127) * bias_coeff^(1/exponent), encoded as uint32_t
*/
template<unsigned exp, unsigned e2coeff> ccl_device_inline ssef fastpow(const ssef &arg)
template<unsigned exp, unsigned e2coeff> ccl_device_inline float4 fastpow(const float4 &arg)
{
ssef ret;
ret = arg * cast(ssei(e2coeff));
ret = ssef(cast(ret));
ret = ret * cast(ssei(exp));
ret = cast(ssei(ret));
float4 ret = arg * cast(make_int4(e2coeff));
ret = make_float4(cast(ret));
ret = ret * cast(make_int4(exp));
ret = cast(make_int4(ret));
return ret;
}
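The bit trick here is the classic log-domain power approximation: for positive IEEE-754 floats the integer bit pattern is roughly an affine function of log2 of the value, so scaling the bit pattern by the exponent approximates raising to that power, and e2coeff pre-multiplies the argument so the bias correction is absorbed up front. A minimal scalar sketch of the same four steps (hypothetical helper name, not part of this patch):

#include <cstdint>
#include <cstring>

static inline float fastpow_scalar(float x, uint32_t exp_bits, uint32_t e2coeff_bits)
{
  float k, p;
  std::memcpy(&k, &e2coeff_bits, sizeof(k)); /* float whose bits are e2coeff */
  std::memcpy(&p, &exp_bits, sizeof(p));     /* float whose bits are exp, e.g. 0x3F4CCCCD == 0.8f */
  const float t = x * k;                     /* fold in the bias correction */
  uint32_t bits;
  std::memcpy(&bits, &t, sizeof(bits));      /* bit pattern ~ 2^23 * (log2(t) + 127) */
  const float scaled = (float)bits * p;      /* scale the approximate logarithm */
  const uint32_t out = (uint32_t)scaled;     /* truncate back to a bit pattern */
  float r;
  std::memcpy(&r, &out, sizeof(r));          /* reinterpret as float: roughly x^p */
  return r;
}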
/* Improve x ^ 1.0f/5.0f solution with Newton-Raphson method */
ccl_device_inline ssef improve_5throot_solution(const ssef &old_result, const ssef &x)
ccl_device_inline float4 improve_5throot_solution(const float4 &old_result, const float4 &x)
{
ssef approx2 = old_result * old_result;
ssef approx4 = approx2 * approx2;
ssef t = x / approx4;
ssef summ = madd(ssef(4.0f), old_result, t);
return summ * ssef(1.0f / 5.0f);
float4 approx2 = old_result * old_result;
float4 approx4 = approx2 * approx2;
float4 t = x / approx4;
float4 summ = madd(make_float4(4.0f), old_result, t);
return summ * make_float4(1.0f / 5.0f);
}
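For reference, this is a single Newton-Raphson step on f(r) = r^5 - x:

r_{n+1} = r_n - (r_n^5 - x) / (5 * r_n^4) = (4 * r_n + x / r_n^4) / 5

which is exactly madd(4, old_result, x / approx4) scaled by 1/5.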
/* Calculate powf(x, 2.4). Working domain: 1e-10 < x < 1e+10 */
ccl_device_inline ssef fastpow24(const ssef &arg)
ccl_device_inline float4 fastpow24(const float4 &arg)
{
/* max, avg and |avg| errors were calculated in gcc without FMA instructions
* The final precision should be better than powf in glibc */
@@ -257,9 +256,10 @@ ccl_device_inline ssef fastpow24(const ssef &arg)
/* Calculate x^4/5, coefficient 0.994 was constructed manually to minimize avg error */
/* 0x3F4CCCCD = 4/5 */
/* 0x4F55A7FB = 2^(127/(4/5) - 127) * 0.994^(1/(4/5)) */
ssef x = fastpow<0x3F4CCCCD, 0x4F55A7FB>(arg); // error max = 0.17 avg = 0.0018 |avg| = 0.05
ssef arg2 = arg * arg;
ssef arg4 = arg2 * arg2;
float4 x = fastpow<0x3F4CCCCD, 0x4F55A7FB>(
arg); // error max = 0.17 avg = 0.0018 |avg| = 0.05
float4 arg2 = arg * arg;
float4 arg4 = arg2 * arg2;
/* error max = 0.018 avg = 0.0031 |avg| = 0.0031 */
x = improve_5throot_solution(x, arg4);
@@ -271,12 +271,12 @@ ccl_device_inline ssef fastpow24(const ssef &arg)
return x * (x * x);
}
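The overall decomposition: since 2.4 = 3 * (4/5), powf(x, 2.4) = (x^(4/5))^3. fastpow supplies a rough x^(4/5), the Newton step against arg4 = x^4 refines the fifth root (x^4)^(1/5) = x^(4/5), and the final x * (x * x) cubes the result.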
ccl_device ssef color_srgb_to_linear(const ssef &c)
ccl_device float4 color_srgb_to_linear(const float4 &c)
{
sseb cmp = c < ssef(0.04045f);
ssef lt = max(c * ssef(1.0f / 12.92f), ssef(0.0f));
ssef gtebase = (c + ssef(0.055f)) * ssef(1.0f / 1.055f); /* fma */
ssef gte = fastpow24(gtebase);
int4 cmp = c < make_float4(0.04045f);
float4 lt = max(c * make_float4(1.0f / 12.92f), make_float4(0.0f));
float4 gtebase = (c + make_float4(0.055f)) * make_float4(1.0f / 1.055f); /* fma */
float4 gte = fastpow24(gtebase);
return select(cmp, lt, gte);
}
#endif /* __KERNEL_SSE2__ */
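For comparison, the scalar form of the same piecewise sRGB EOTF (a reference sketch with a hypothetical name, not part of this patch):

#include <cmath>

static inline float srgb_to_linear_scalar(float c)
{
  if (c < 0.04045f)
    return std::fmax(c * (1.0f / 12.92f), 0.0f);         /* linear toe segment */
  return std::pow((c + 0.055f) * (1.0f / 1.055f), 2.4f); /* gamma segment; the SIMD path uses fastpow24 */
}

The vector version computes both branches unconditionally and blends them with select, which is the standard way to express such a branch in SIMD.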
@@ -302,10 +302,8 @@ ccl_device float4 color_linear_to_srgb_v4(float4 c)
ccl_device float4 color_srgb_to_linear_v4(float4 c)
{
#ifdef __KERNEL_SSE2__
ssef r_ssef;
float4 &r = (float4 &)r_ssef;
r = c;
r_ssef = color_srgb_to_linear(r_ssef);
float4 r = c;
r = color_srgb_to_linear(r);
r.w = c.w;
return r;
#else

View File

@@ -23,6 +23,7 @@
/* Leave inlining decisions to compiler for these, the inline keyword here
* is not about performance but including function definitions in headers. */
# define ccl_device static inline
# define ccl_device_extern extern "C"
# define ccl_device_noinline static inline
# define ccl_device_noinline_cpu ccl_device_noinline

View File

@@ -154,17 +154,17 @@ ccl_device_inline half float_to_half_display(const float f)
ccl_device_inline half4 float4_to_half4_display(const float4 f)
{
#ifdef __KERNEL_SSE2__
#ifdef __KERNEL_SSE__
/* CPU: SSE and AVX. */
ssef x = min(max(load4f(f), 0.0f), 65504.0f);
float4 x = min(max(f, make_float4(0.0f)), make_float4(65504.0f));
# ifdef __KERNEL_AVX2__
ssei rpack = _mm_cvtps_ph(x, 0);
int4 rpack = int4(_mm_cvtps_ph(x, 0));
# else
ssei absolute = cast(x) & 0x7FFFFFFF;
ssei Z = absolute + 0xC8000000;
ssei result = andnot(absolute < 0x38800000, Z);
ssei rshift = (result >> 13) & 0x7FFF;
ssei rpack = _mm_packs_epi32(rshift, rshift);
int4 absolute = cast(x) & make_int4(0x7FFFFFFF);
int4 Z = absolute + make_int4(0xC8000000);
int4 result = andnot(absolute < make_int4(0x38800000), Z);
int4 rshift = (result >> 13) & make_int4(0x7FFF);
int4 rpack = int4(_mm_packs_epi32(rshift, rshift));
# endif
half4 h;
_mm_storel_pi((__m64 *)&h, _mm_castsi128_ps(rpack));
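The non-F16C fallback is a straight exponent re-bias: because the value was already clamped to [0, 65504], the sign, Inf and NaN cases are gone, so adding 0xC8000000 (i.e. subtracting 112 << 23) converts the float exponent bias 127 to the half bias 15, the compare against 0x38800000 (2^-14, the smallest normal half) flushes would-be subnormals to zero, and the shift by 13 drops the extra mantissa bits. A scalar sketch under the same clamping assumption (hypothetical name):

#include <cstdint>
#include <cstring>

static inline uint16_t float_to_half_display_scalar(float f)
{
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  const uint32_t absolute = bits & 0x7FFFFFFFu;              /* sign is known to be zero */
  const uint32_t Z = absolute + 0xC8000000u;                 /* re-bias exponent: 127 -> 15 */
  const uint32_t result = (absolute < 0x38800000u) ? 0u : Z; /* flush values below 2^-14 */
  return (uint16_t)((result >> 13) & 0x7FFF);                /* keep 5 exponent + 10 mantissa bits */
}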

View File

@@ -222,7 +222,7 @@ ccl_device_inline float3 hash_float4_to_float3(float4 k)
/* SSE Versions Of Jenkins Lookup3 Hash Functions */
#ifdef __KERNEL_SSE2__
#ifdef __KERNEL_SSE__
# define rot(x, k) (((x) << (k)) | (srl(x, 32 - (k))))
# define mix(a, b, c) \
@@ -265,10 +265,10 @@ ccl_device_inline float3 hash_float4_to_float3(float4 k)
c -= rot(b, 24); \
}
ccl_device_inline ssei hash_ssei(ssei kx)
ccl_device_inline int4 hash_int4(int4 kx)
{
ssei a, b, c;
a = b = c = ssei(0xdeadbeef + (1 << 2) + 13);
int4 a, b, c;
a = b = c = make_int4(0xdeadbeef + (1 << 2) + 13);
a += kx;
final(a, b, c);
@@ -276,10 +276,10 @@ ccl_device_inline ssei hash_ssei(ssei kx)
return c;
}
ccl_device_inline ssei hash_ssei2(ssei kx, ssei ky)
ccl_device_inline int4 hash_int4_2(int4 kx, int4 ky)
{
ssei a, b, c;
a = b = c = ssei(0xdeadbeef + (2 << 2) + 13);
int4 a, b, c;
a = b = c = make_int4(0xdeadbeef + (2 << 2) + 13);
b += ky;
a += kx;
@@ -288,10 +288,10 @@ ccl_device_inline ssei hash_ssei2(ssei kx, ssei ky)
return c;
}
ccl_device_inline ssei hash_ssei3(ssei kx, ssei ky, ssei kz)
ccl_device_inline int4 hash_int4_3(int4 kx, int4 ky, int4 kz)
{
ssei a, b, c;
a = b = c = ssei(0xdeadbeef + (3 << 2) + 13);
int4 a, b, c;
a = b = c = make_int4(0xdeadbeef + (3 << 2) + 13);
c += kz;
b += ky;
@@ -301,10 +301,10 @@ ccl_device_inline ssei hash_ssei3(ssei kx, ssei ky, ssei kz)
return c;
}
ccl_device_inline ssei hash_ssei4(ssei kx, ssei ky, ssei kz, ssei kw)
ccl_device_inline int4 hash_int4_4(int4 kx, int4 ky, int4 kz, int4 kw)
{
ssei a, b, c;
a = b = c = ssei(0xdeadbeef + (4 << 2) + 13);
int4 a, b, c;
a = b = c = make_int4(0xdeadbeef + (4 << 2) + 13);
a += kx;
b += ky;
@@ -317,11 +317,11 @@ ccl_device_inline ssei hash_ssei4(ssei kx, ssei ky, ssei kz, ssei kw)
return c;
}
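Each lane of these vectorized hashes is independent and equals the scalar Jenkins lookup3 hash of the corresponding lane of the inputs, so four (or, below, eight) keys are hashed per call. A hypothetical usage sketch, assuming x0 and y0 are scalar lattice coordinates:

int4 xs = make_int4(x0, x0 + 1, x0, x0 + 1);
int4 ys = make_int4(y0, y0, y0 + 1, y0 + 1);
int4 h = hash_int4_2(xs, ys); /* h[i] is the hash of the i-th corner (xs[i], ys[i]) */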
# if defined(__KERNEL_AVX__)
ccl_device_inline avxi hash_avxi(avxi kx)
# if defined(__KERNEL_AVX2__)
ccl_device_inline vint8 hash_int8(vint8 kx)
{
avxi a, b, c;
a = b = c = avxi(0xdeadbeef + (1 << 2) + 13);
vint8 a, b, c;
a = b = c = make_vint8(0xdeadbeef + (1 << 2) + 13);
a += kx;
final(a, b, c);
@@ -329,10 +329,10 @@ ccl_device_inline avxi hash_avxi(avxi kx)
return c;
}
ccl_device_inline avxi hash_avxi2(avxi kx, avxi ky)
ccl_device_inline vint8 hash_int8_2(vint8 kx, vint8 ky)
{
avxi a, b, c;
a = b = c = avxi(0xdeadbeef + (2 << 2) + 13);
vint8 a, b, c;
a = b = c = make_vint8(0xdeadbeef + (2 << 2) + 13);
b += ky;
a += kx;
@@ -341,10 +341,10 @@ ccl_device_inline avxi hash_avxi2(avxi kx, avxi ky)
return c;
}
ccl_device_inline avxi hash_avxi3(avxi kx, avxi ky, avxi kz)
ccl_device_inline vint8 hash_int8_3(vint8 kx, vint8 ky, vint8 kz)
{
avxi a, b, c;
a = b = c = avxi(0xdeadbeef + (3 << 2) + 13);
vint8 a, b, c;
a = b = c = make_vint8(0xdeadbeef + (3 << 2) + 13);
c += kz;
b += ky;
@@ -354,10 +354,10 @@ ccl_device_inline avxi hash_avxi3(avxi kx, avxi ky, avxi kz)
return c;
}
ccl_device_inline avxi hash_avxi4(avxi kx, avxi ky, avxi kz, avxi kw)
ccl_device_inline vint8 hash_int8_4(vint8 kx, vint8 ky, vint8 kz, vint8 kw)
{
avxi a, b, c;
a = b = c = avxi(0xdeadbeef + (4 << 2) + 13);
vint8 a, b, c;
a = b = c = make_vint8(0xdeadbeef + (4 << 2) + 13);
a += kx;
b += ky;

View File

@@ -532,12 +532,14 @@ CCL_NAMESPACE_END
#include "util/math_int2.h"
#include "util/math_int3.h"
#include "util/math_int4.h"
#include "util/math_int8.h"
#include "util/math_float2.h"
#include "util/math_float3.h"
#include "util/math_float4.h"
#include "util/math_float8.h"
#include "util/math_float3.h"
#include "util/rect.h"
CCL_NAMESPACE_BEGIN

View File

@@ -10,55 +10,6 @@
CCL_NAMESPACE_BEGIN
/*******************************************************************************
* Declaration.
*/
#if !defined(__KERNEL_METAL__)
ccl_device_inline float2 operator-(const float2 &a);
ccl_device_inline float2 operator*(const float2 &a, const float2 &b);
ccl_device_inline float2 operator*(const float2 &a, float f);
ccl_device_inline float2 operator*(float f, const float2 &a);
ccl_device_inline float2 operator/(float f, const float2 &a);
ccl_device_inline float2 operator/(const float2 &a, float f);
ccl_device_inline float2 operator/(const float2 &a, const float2 &b);
ccl_device_inline float2 operator+(const float2 &a, const float f);
ccl_device_inline float2 operator+(const float2 &a, const float2 &b);
ccl_device_inline float2 operator-(const float2 &a, const float f);
ccl_device_inline float2 operator-(const float2 &a, const float2 &b);
ccl_device_inline float2 operator+=(float2 &a, const float2 &b);
ccl_device_inline float2 operator*=(float2 &a, const float2 &b);
ccl_device_inline float2 operator*=(float2 &a, float f);
ccl_device_inline float2 operator/=(float2 &a, const float2 &b);
ccl_device_inline float2 operator/=(float2 &a, float f);
ccl_device_inline bool operator==(const float2 &a, const float2 &b);
ccl_device_inline bool operator!=(const float2 &a, const float2 &b);
ccl_device_inline bool is_zero(const float2 &a);
ccl_device_inline float average(const float2 &a);
ccl_device_inline float distance(const float2 &a, const float2 &b);
ccl_device_inline float dot(const float2 &a, const float2 &b);
ccl_device_inline float cross(const float2 &a, const float2 &b);
ccl_device_inline float len(const float2 a);
ccl_device_inline float2 normalize(const float2 &a);
ccl_device_inline float2 normalize_len(const float2 &a, float *t);
ccl_device_inline float2 safe_normalize(const float2 &a);
ccl_device_inline float2 min(const float2 &a, const float2 &b);
ccl_device_inline float2 max(const float2 &a, const float2 &b);
ccl_device_inline float2 clamp(const float2 &a, const float2 &mn, const float2 &mx);
ccl_device_inline float2 fabs(const float2 &a);
ccl_device_inline float2 as_float2(const float4 &a);
ccl_device_inline float2 interp(const float2 &a, const float2 &b, float t);
ccl_device_inline float2 floor(const float2 &a);
#endif /* !__KERNEL_METAL__ */
ccl_device_inline float2 safe_divide_float2_float(const float2 a, const float b);
/*******************************************************************************
* Definition.
*/
ccl_device_inline float2 zero_float2()
{
return make_float2(0.0f, 0.0f);
@@ -75,63 +26,63 @@ ccl_device_inline float2 operator-(const float2 &a)
return make_float2(-a.x, -a.y);
}
ccl_device_inline float2 operator*(const float2 &a, const float2 &b)
ccl_device_inline float2 operator*(const float2 a, const float2 b)
{
return make_float2(a.x * b.x, a.y * b.y);
}
ccl_device_inline float2 operator*(const float2 &a, float f)
ccl_device_inline float2 operator*(const float2 a, float f)
{
return make_float2(a.x * f, a.y * f);
}
ccl_device_inline float2 operator*(float f, const float2 &a)
ccl_device_inline float2 operator*(float f, const float2 a)
{
return make_float2(a.x * f, a.y * f);
}
ccl_device_inline float2 operator/(float f, const float2 &a)
ccl_device_inline float2 operator/(float f, const float2 a)
{
return make_float2(f / a.x, f / a.y);
}
ccl_device_inline float2 operator/(const float2 &a, float f)
ccl_device_inline float2 operator/(const float2 a, float f)
{
float invf = 1.0f / f;
return make_float2(a.x * invf, a.y * invf);
}
ccl_device_inline float2 operator/(const float2 &a, const float2 &b)
ccl_device_inline float2 operator/(const float2 a, const float2 b)
{
return make_float2(a.x / b.x, a.y / b.y);
}
ccl_device_inline float2 operator+(const float2 &a, const float f)
{
return a + make_float2(f, f);
}
ccl_device_inline float2 operator+(const float2 &a, const float2 &b)
ccl_device_inline float2 operator+(const float2 a, const float2 b)
{
return make_float2(a.x + b.x, a.y + b.y);
}
ccl_device_inline float2 operator-(const float2 &a, const float f)
ccl_device_inline float2 operator+(const float2 a, const float f)
{
return a - make_float2(f, f);
return a + make_float2(f, f);
}
ccl_device_inline float2 operator-(const float2 &a, const float2 &b)
ccl_device_inline float2 operator-(const float2 a, const float2 b)
{
return make_float2(a.x - b.x, a.y - b.y);
}
ccl_device_inline float2 operator+=(float2 &a, const float2 &b)
ccl_device_inline float2 operator-(const float2 a, const float f)
{
return a - make_float2(f, f);
}
ccl_device_inline float2 operator+=(float2 &a, const float2 b)
{
return a = a + b;
}
ccl_device_inline float2 operator*=(float2 &a, const float2 &b)
ccl_device_inline float2 operator*=(float2 &a, const float2 b)
{
return a = a * b;
}
@@ -141,7 +92,7 @@ ccl_device_inline float2 operator*=(float2 &a, float f)
return a = a * f;
}
ccl_device_inline float2 operator/=(float2 &a, const float2 &b)
ccl_device_inline float2 operator/=(float2 &a, const float2 b)
{
return a = a / b;
}
@@ -152,74 +103,81 @@ ccl_device_inline float2 operator/=(float2 &a, float f)
return a = a * invf;
}
ccl_device_inline bool operator==(const float2 &a, const float2 &b)
ccl_device_inline bool operator==(const float2 a, const float2 b)
{
return (a.x == b.x && a.y == b.y);
}
ccl_device_inline bool operator!=(const float2 &a, const float2 &b)
ccl_device_inline bool operator!=(const float2 a, const float2 b)
{
return !(a == b);
}
ccl_device_inline bool is_zero(const float2 &a)
ccl_device_inline bool is_zero(const float2 a)
{
return (a.x == 0.0f && a.y == 0.0f);
}
ccl_device_inline float average(const float2 &a)
ccl_device_inline float average(const float2 a)
{
return (a.x + a.y) * (1.0f / 2.0f);
}
ccl_device_inline float distance(const float2 &a, const float2 &b)
ccl_device_inline float dot(const float2 a, const float2 b)
{
return a.x * b.x + a.y * b.y;
}
#endif
ccl_device_inline float len(const float2 a)
{
return sqrtf(dot(a, a));
}
#if !defined(__KERNEL_METAL__)
ccl_device_inline float distance(const float2 a, const float2 b)
{
return len(a - b);
}
ccl_device_inline float dot(const float2 &a, const float2 &b)
{
return a.x * b.x + a.y * b.y;
}
ccl_device_inline float cross(const float2 &a, const float2 &b)
ccl_device_inline float cross(const float2 a, const float2 b)
{
return (a.x * b.y - a.y * b.x);
}
ccl_device_inline float2 normalize(const float2 &a)
ccl_device_inline float2 normalize(const float2 a)
{
return a / len(a);
}
ccl_device_inline float2 normalize_len(const float2 &a, ccl_private float *t)
ccl_device_inline float2 normalize_len(const float2 a, ccl_private float *t)
{
*t = len(a);
return a / (*t);
}
ccl_device_inline float2 safe_normalize(const float2 &a)
ccl_device_inline float2 safe_normalize(const float2 a)
{
float t = len(a);
return (t != 0.0f) ? a / t : a;
}
ccl_device_inline float2 min(const float2 &a, const float2 &b)
ccl_device_inline float2 min(const float2 a, const float2 b)
{
return make_float2(min(a.x, b.x), min(a.y, b.y));
}
ccl_device_inline float2 max(const float2 &a, const float2 &b)
ccl_device_inline float2 max(const float2 a, const float2 b)
{
return make_float2(max(a.x, b.x), max(a.y, b.y));
}
ccl_device_inline float2 clamp(const float2 &a, const float2 &mn, const float2 &mx)
ccl_device_inline float2 clamp(const float2 a, const float2 mn, const float2 mx)
{
return min(max(a, mn), mx);
}
ccl_device_inline float2 fabs(const float2 &a)
ccl_device_inline float2 fabs(const float2 a)
{
return make_float2(fabsf(a.x), fabsf(a.y));
}
@@ -229,28 +187,23 @@ ccl_device_inline float2 as_float2(const float4 &a)
return make_float2(a.x, a.y);
}
ccl_device_inline float2 interp(const float2 &a, const float2 &b, float t)
ccl_device_inline float2 interp(const float2 a, const float2 b, float t)
{
return a + t * (b - a);
}
ccl_device_inline float2 mix(const float2 &a, const float2 &b, float t)
ccl_device_inline float2 mix(const float2 a, const float2 b, float t)
{
return a + t * (b - a);
}
ccl_device_inline float2 floor(const float2 &a)
ccl_device_inline float2 floor(const float2 a)
{
return make_float2(floorf(a.x), floorf(a.y));
}
#endif /* !__KERNEL_METAL__ */
ccl_device_inline float len(const float2 a)
{
return sqrtf(dot(a, a));
}
ccl_device_inline float2 safe_divide_float2_float(const float2 a, const float b)
{
return (b != 0.0f) ? a / b : zero_float2();

View File

@@ -1,4 +1,5 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2011-2013 Intel Corporation
* Copyright 2011-2022 Blender Foundation */
#ifndef __UTIL_MATH_FLOAT3_H__
@@ -10,73 +11,6 @@
CCL_NAMESPACE_BEGIN
/*******************************************************************************
* Declaration.
*/
#if !defined(__KERNEL_METAL__)
ccl_device_inline float3 operator-(const float3 &a);
ccl_device_inline float3 operator*(const float3 &a, const float3 &b);
ccl_device_inline float3 operator*(const float3 &a, const float f);
ccl_device_inline float3 operator*(const float f, const float3 &a);
ccl_device_inline float3 operator/(const float f, const float3 &a);
ccl_device_inline float3 operator/(const float3 &a, const float f);
ccl_device_inline float3 operator/(const float3 &a, const float3 &b);
ccl_device_inline float3 operator+(const float3 &a, const float f);
ccl_device_inline float3 operator+(const float3 &a, const float3 &b);
ccl_device_inline float3 operator-(const float3 &a, const float f);
ccl_device_inline float3 operator-(const float3 &a, const float3 &b);
ccl_device_inline float3 operator+=(float3 &a, const float3 &b);
ccl_device_inline float3 operator-=(float3 &a, const float3 &b);
ccl_device_inline float3 operator*=(float3 &a, const float3 &b);
ccl_device_inline float3 operator*=(float3 &a, float f);
ccl_device_inline float3 operator/=(float3 &a, const float3 &b);
ccl_device_inline float3 operator/=(float3 &a, float f);
ccl_device_inline bool operator==(const float3 &a, const float3 &b);
ccl_device_inline bool operator!=(const float3 &a, const float3 &b);
ccl_device_inline float distance(const float3 &a, const float3 &b);
ccl_device_inline float dot(const float3 &a, const float3 &b);
ccl_device_inline float dot_xy(const float3 &a, const float3 &b);
ccl_device_inline float3 cross(const float3 &a, const float3 &b);
ccl_device_inline float3 normalize(const float3 &a);
ccl_device_inline float3 min(const float3 &a, const float3 &b);
ccl_device_inline float3 max(const float3 &a, const float3 &b);
ccl_device_inline float3 clamp(const float3 &a, const float3 &mn, const float3 &mx);
ccl_device_inline float3 fabs(const float3 &a);
ccl_device_inline float3 mix(const float3 &a, const float3 &b, float t);
ccl_device_inline float3 rcp(const float3 &a);
ccl_device_inline float3 sqrt(const float3 &a);
ccl_device_inline float3 floor(const float3 &a);
ccl_device_inline float3 ceil(const float3 &a);
ccl_device_inline float3 reflect(const float3 incident, const float3 normal);
#endif /* !defined(__KERNEL_METAL__) */
ccl_device_inline float reduce_min(float3 a);
ccl_device_inline float reduce_max(float3 a);
ccl_device_inline float len(const float3 a);
ccl_device_inline float len_squared(const float3 a);
ccl_device_inline float3 project(const float3 v, const float3 v_proj);
ccl_device_inline float3 safe_normalize(const float3 a);
ccl_device_inline float3 normalize_len(const float3 a, ccl_private float *t);
ccl_device_inline float3 safe_normalize_len(const float3 a, ccl_private float *t);
ccl_device_inline float3 safe_divide(const float3 a, const float3 b);
ccl_device_inline float3 safe_divide(const float3 a, const float b);
ccl_device_inline float3 interp(float3 a, float3 b, float t);
ccl_device_inline float3 sqr(float3 a);
ccl_device_inline bool is_zero(const float3 a);
ccl_device_inline float reduce_add(const float3 a);
ccl_device_inline float average(const float3 a);
ccl_device_inline bool isequal(const float3 a, const float3 b);
/*******************************************************************************
* Definition.
*/
ccl_device_inline float3 zero_float3()
{
#ifdef __KERNEL_SSE__
@@ -109,7 +43,7 @@ ccl_device_inline float3 operator-(const float3 &a)
# endif
}
ccl_device_inline float3 operator*(const float3 &a, const float3 &b)
ccl_device_inline float3 operator*(const float3 a, const float3 b)
{
# ifdef __KERNEL_SSE__
return float3(_mm_mul_ps(a.m128, b.m128));
@@ -118,7 +52,7 @@ ccl_device_inline float3 operator*(const float3 &a, const float3 &b)
# endif
}
ccl_device_inline float3 operator*(const float3 &a, const float f)
ccl_device_inline float3 operator*(const float3 a, const float f)
{
# ifdef __KERNEL_SSE__
return float3(_mm_mul_ps(a.m128, _mm_set1_ps(f)));
@@ -127,7 +61,7 @@ ccl_device_inline float3 operator*(const float3 &a, const float f)
# endif
}
ccl_device_inline float3 operator*(const float f, const float3 &a)
ccl_device_inline float3 operator*(const float f, const float3 a)
{
# if defined(__KERNEL_SSE__)
return float3(_mm_mul_ps(_mm_set1_ps(f), a.m128));
@@ -136,7 +70,7 @@ ccl_device_inline float3 operator*(const float f, const float3 &a)
# endif
}
ccl_device_inline float3 operator/(const float f, const float3 &a)
ccl_device_inline float3 operator/(const float f, const float3 a)
{
# if defined(__KERNEL_SSE__)
return float3(_mm_div_ps(_mm_set1_ps(f), a.m128));
@@ -145,7 +79,7 @@ ccl_device_inline float3 operator/(const float f, const float3 &a)
# endif
}
ccl_device_inline float3 operator/(const float3 &a, const float f)
ccl_device_inline float3 operator/(const float3 a, const float f)
{
# if defined(__KERNEL_SSE__)
return float3(_mm_div_ps(a.m128, _mm_set1_ps(f)));
@@ -154,7 +88,7 @@ ccl_device_inline float3 operator/(const float3 &a, const float f)
# endif
}
ccl_device_inline float3 operator/(const float3 &a, const float3 &b)
ccl_device_inline float3 operator/(const float3 a, const float3 b)
{
# if defined(__KERNEL_SSE__)
return float3(_mm_div_ps(a.m128, b.m128));
@@ -163,12 +97,7 @@ ccl_device_inline float3 operator/(const float3 &a, const float3 &b)
# endif
}
ccl_device_inline float3 operator+(const float3 &a, const float f)
{
return a + make_float3(f, f, f);
}
ccl_device_inline float3 operator+(const float3 &a, const float3 &b)
ccl_device_inline float3 operator+(const float3 a, const float3 b)
{
# ifdef __KERNEL_SSE__
return float3(_mm_add_ps(a.m128, b.m128));
@@ -177,12 +106,12 @@ ccl_device_inline float3 operator+(const float3 &a, const float3 &b)
# endif
}
ccl_device_inline float3 operator-(const float3 &a, const float f)
ccl_device_inline float3 operator+(const float3 a, const float f)
{
return a - make_float3(f, f, f);
return a + make_float3(f, f, f);
}
ccl_device_inline float3 operator-(const float3 &a, const float3 &b)
ccl_device_inline float3 operator-(const float3 a, const float3 b)
{
# ifdef __KERNEL_SSE__
return float3(_mm_sub_ps(a.m128, b.m128));
@@ -191,17 +120,22 @@ ccl_device_inline float3 operator-(const float3 &a, const float3 &b)
# endif
}
ccl_device_inline float3 operator+=(float3 &a, const float3 &b)
ccl_device_inline float3 operator-(const float3 a, const float f)
{
return a - make_float3(f, f, f);
}
ccl_device_inline float3 operator+=(float3 &a, const float3 b)
{
return a = a + b;
}
ccl_device_inline float3 operator-=(float3 &a, const float3 &b)
ccl_device_inline float3 operator-=(float3 &a, const float3 b)
{
return a = a - b;
}
ccl_device_inline float3 operator*=(float3 &a, const float3 &b)
ccl_device_inline float3 operator*=(float3 &a, const float3 b)
{
return a = a * b;
}
@@ -211,7 +145,7 @@ ccl_device_inline float3 operator*=(float3 &a, float f)
return a = a * f;
}
ccl_device_inline float3 operator/=(float3 &a, const float3 &b)
ccl_device_inline float3 operator/=(float3 &a, const float3 b)
{
return a = a / b;
}
@@ -223,7 +157,7 @@ ccl_device_inline float3 operator/=(float3 &a, float f)
}
# if !(defined(__KERNEL_METAL__) || defined(__KERNEL_CUDA__))
ccl_device_inline packed_float3 operator*=(packed_float3 &a, const float3 &b)
ccl_device_inline packed_float3 operator*=(packed_float3 &a, const float3 b)
{
a = float3(a) * b;
return a;
@@ -235,7 +169,7 @@ ccl_device_inline packed_float3 operator*=(packed_float3 &a, float f)
return a;
}
ccl_device_inline packed_float3 operator/=(packed_float3 &a, const float3 &b)
ccl_device_inline packed_float3 operator/=(packed_float3 &a, const float3 b)
{
a = float3(a) / b;
return a;
@@ -248,7 +182,7 @@ ccl_device_inline packed_float3 operator/=(packed_float3 &a, float f)
}
# endif
ccl_device_inline bool operator==(const float3 &a, const float3 &b)
ccl_device_inline bool operator==(const float3 a, const float3 b)
{
# ifdef __KERNEL_SSE__
return (_mm_movemask_ps(_mm_cmpeq_ps(a.m128, b.m128)) & 7) == 7;
@@ -257,17 +191,12 @@ ccl_device_inline bool operator==(const float3 &a, const float3 &b)
# endif
}
ccl_device_inline bool operator!=(const float3 &a, const float3 &b)
ccl_device_inline bool operator!=(const float3 a, const float3 b)
{
return !(a == b);
}
ccl_device_inline float distance(const float3 &a, const float3 &b)
{
return len(a - b);
}
ccl_device_inline float dot(const float3 &a, const float3 &b)
ccl_device_inline float dot(const float3 a, const float3 b)
{
# if defined(__KERNEL_SSE41__) && defined(__KERNEL_SSE__)
return _mm_cvtss_f32(_mm_dp_ps(a, b, 0x7F));
@@ -276,26 +205,62 @@ ccl_device_inline float dot(const float3 &a, const float3 &b)
# endif
}
ccl_device_inline float dot_xy(const float3 &a, const float3 &b)
#endif
ccl_device_inline float dot_xy(const float3 a, const float3 b)
{
# if defined(__KERNEL_SSE41__) && defined(__KERNEL_SSE__)
#if defined(__KERNEL_SSE41__) && defined(__KERNEL_SSE__)
return _mm_cvtss_f32(_mm_hadd_ps(_mm_mul_ps(a, b), b));
# else
#else
return a.x * b.x + a.y * b.y;
# endif
#endif
}
ccl_device_inline float3 cross(const float3 &a, const float3 &b)
ccl_device_inline float len(const float3 a)
{
#if defined(__KERNEL_SSE41__) && defined(__KERNEL_SSE__)
return _mm_cvtss_f32(_mm_sqrt_ss(_mm_dp_ps(a.m128, a.m128, 0x7F)));
#else
return sqrtf(dot(a, a));
#endif
}
ccl_device_inline float reduce_min(float3 a)
{
return min(min(a.x, a.y), a.z);
}
ccl_device_inline float reduce_max(float3 a)
{
return max(max(a.x, a.y), a.z);
}
ccl_device_inline float len_squared(const float3 a)
{
return dot(a, a);
}
#ifndef __KERNEL_METAL__
ccl_device_inline float distance(const float3 a, const float3 b)
{
return len(a - b);
}
ccl_device_inline float3 cross(const float3 a, const float3 b)
{
# ifdef __KERNEL_SSE__
return float3(shuffle<1, 2, 0, 3>(
msub(ssef(a), shuffle<1, 2, 0, 3>(ssef(b)), shuffle<1, 2, 0, 3>(ssef(a)) * ssef(b))));
const float4 x = float4(a.m128);
const float4 y = shuffle<1, 2, 0, 3>(float4(b.m128));
const float4 z = float4(_mm_mul_ps(shuffle<1, 2, 0, 3>(float4(a.m128)), float4(b.m128)));
return float3(shuffle<1, 2, 0, 3>(msub(x, y, z)).m128);
# else
return make_float3(a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x);
# endif
}
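The SSE branch uses the deferred-shuffle identity: with s(v) = (v.y, v.z, v.x, v.w), cross(a, b) = s(a * s(b) - s(a) * b), which costs three shuffles and one fused multiply-subtract instead of the four shuffles of the naive a.yzx * b.zxy - a.zxy * b.yzx form.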
ccl_device_inline float3 normalize(const float3 &a)
ccl_device_inline float3 normalize(const float3 a)
{
# if defined(__KERNEL_SSE41__) && defined(__KERNEL_SSE__)
__m128 norm = _mm_sqrt_ps(_mm_dp_ps(a.m128, a.m128, 0x7F));
@@ -305,7 +270,7 @@ ccl_device_inline float3 normalize(const float3 &a)
# endif
}
ccl_device_inline float3 min(const float3 &a, const float3 &b)
ccl_device_inline float3 min(const float3 a, const float3 b)
{
# ifdef __KERNEL_SSE__
return float3(_mm_min_ps(a.m128, b.m128));
@@ -314,7 +279,7 @@ ccl_device_inline float3 min(const float3 &a, const float3 &b)
# endif
}
ccl_device_inline float3 max(const float3 &a, const float3 &b)
ccl_device_inline float3 max(const float3 a, const float3 b)
{
# ifdef __KERNEL_SSE__
return float3(_mm_max_ps(a.m128, b.m128));
@@ -323,12 +288,12 @@ ccl_device_inline float3 max(const float3 &a, const float3 &b)
# endif
}
ccl_device_inline float3 clamp(const float3 &a, const float3 &mn, const float3 &mx)
ccl_device_inline float3 clamp(const float3 a, const float3 mn, const float3 mx)
{
return min(max(a, mn), mx);
}
ccl_device_inline float3 fabs(const float3 &a)
ccl_device_inline float3 fabs(const float3 a)
{
# ifdef __KERNEL_SSE__
# ifdef __KERNEL_NEON__
@@ -342,7 +307,7 @@ ccl_device_inline float3 fabs(const float3 &a)
# endif
}
ccl_device_inline float3 sqrt(const float3 &a)
ccl_device_inline float3 sqrt(const float3 a)
{
# ifdef __KERNEL_SSE__
return float3(_mm_sqrt_ps(a));
@@ -351,7 +316,7 @@ ccl_device_inline float3 sqrt(const float3 &a)
# endif
}
ccl_device_inline float3 floor(const float3 &a)
ccl_device_inline float3 floor(const float3 a)
{
# ifdef __KERNEL_SSE__
return float3(_mm_floor_ps(a));
@@ -360,7 +325,7 @@ ccl_device_inline float3 floor(const float3 &a)
# endif
}
ccl_device_inline float3 ceil(const float3 &a)
ccl_device_inline float3 ceil(const float3 a)
{
# ifdef __KERNEL_SSE__
return float3(_mm_ceil_ps(a));
@@ -369,12 +334,12 @@ ccl_device_inline float3 ceil(const float3 &a)
# endif
}
ccl_device_inline float3 mix(const float3 &a, const float3 &b, float t)
ccl_device_inline float3 mix(const float3 a, const float3 b, float t)
{
return a + t * (b - a);
}
ccl_device_inline float3 rcp(const float3 &a)
ccl_device_inline float3 rcp(const float3 a)
{
# ifdef __KERNEL_SSE__
/* Don't use _mm_rcp_ps due to poor precision. */
@@ -399,33 +364,6 @@ ccl_device_inline float3 log(float3 v)
return make_float3(logf(v.x), logf(v.y), logf(v.z));
}
#endif /* !__KERNEL_METAL__ */
ccl_device_inline float reduce_min(float3 a)
{
return min(min(a.x, a.y), a.z);
}
ccl_device_inline float reduce_max(float3 a)
{
return max(max(a.x, a.y), a.z);
}
ccl_device_inline float len(const float3 a)
{
#if defined(__KERNEL_SSE41__) && defined(__KERNEL_SSE__)
return _mm_cvtss_f32(_mm_sqrt_ss(_mm_dp_ps(a.m128, a.m128, 0x7F)));
#else
return sqrtf(dot(a, a));
#endif
}
ccl_device_inline float len_squared(const float3 a)
{
return dot(a, a);
}
#if !defined(__KERNEL_METAL__)
ccl_device_inline float3 reflect(const float3 incident, const float3 normal)
{
float3 unit_normal = normalize(normal);

View File

@@ -1,4 +1,5 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2011-2013 Intel Corporation
* Copyright 2011-2022 Blender Foundation */
#ifndef __UTIL_MATH_FLOAT4_H__
@@ -10,85 +11,6 @@
CCL_NAMESPACE_BEGIN
/*******************************************************************************
* Declaration.
*/
#if !defined(__KERNEL_METAL__)
ccl_device_inline float4 operator-(const float4 &a);
ccl_device_inline float4 operator*(const float4 &a, const float4 &b);
ccl_device_inline float4 operator*(const float4 &a, float f);
ccl_device_inline float4 operator*(float f, const float4 &a);
ccl_device_inline float4 operator/(const float4 &a, float f);
ccl_device_inline float4 operator/(const float4 &a, const float4 &b);
ccl_device_inline float4 operator+(const float4 &a, const float f);
ccl_device_inline float4 operator+(const float4 &a, const float4 &b);
ccl_device_inline float4 operator-(const float4 &a, const float f);
ccl_device_inline float4 operator-(const float4 &a, const float4 &b);
ccl_device_inline float4 operator+=(float4 &a, const float4 &b);
ccl_device_inline float4 operator*=(float4 &a, const float4 &b);
ccl_device_inline float4 operator*=(float4 &a, float f);
ccl_device_inline float4 operator/=(float4 &a, float f);
ccl_device_inline int4 operator<(const float4 &a, const float4 &b);
ccl_device_inline int4 operator>=(const float4 &a, const float4 &b);
ccl_device_inline int4 operator<=(const float4 &a, const float4 &b);
ccl_device_inline bool operator==(const float4 &a, const float4 &b);
ccl_device_inline float distance(const float4 &a, const float4 &b);
ccl_device_inline float dot(const float4 &a, const float4 &b);
ccl_device_inline float len_squared(const float4 &a);
ccl_device_inline float4 rcp(const float4 &a);
ccl_device_inline float4 sqrt(const float4 &a);
ccl_device_inline float4 sqr(const float4 &a);
ccl_device_inline float4 cross(const float4 &a, const float4 &b);
ccl_device_inline bool is_zero(const float4 &a);
ccl_device_inline float average(const float4 &a);
ccl_device_inline float len(const float4 &a);
ccl_device_inline float4 normalize(const float4 &a);
ccl_device_inline float4 safe_normalize(const float4 &a);
ccl_device_inline float4 min(const float4 &a, const float4 &b);
ccl_device_inline float4 max(const float4 &a, const float4 &b);
ccl_device_inline float4 clamp(const float4 &a, const float4 &mn, const float4 &mx);
ccl_device_inline float4 fabs(const float4 &a);
ccl_device_inline float4 floor(const float4 &a);
ccl_device_inline float4 mix(const float4 &a, const float4 &b, float t);
#endif /* !__KERNEL_METAL__*/
ccl_device_inline float4 safe_divide(const float4 a, const float4 b);
ccl_device_inline float4 safe_divide(const float4 a, const float b);
#ifdef __KERNEL_SSE__
template<size_t index_0, size_t index_1, size_t index_2, size_t index_3>
__forceinline const float4 shuffle(const float4 &b);
template<size_t index_0, size_t index_1, size_t index_2, size_t index_3>
__forceinline const float4 shuffle(const float4 &a, const float4 &b);
template<> __forceinline const float4 shuffle<0, 1, 0, 1>(const float4 &b);
template<> __forceinline const float4 shuffle<0, 1, 0, 1>(const float4 &a, const float4 &b);
template<> __forceinline const float4 shuffle<2, 3, 2, 3>(const float4 &a, const float4 &b);
# ifdef __KERNEL_SSE3__
template<> __forceinline const float4 shuffle<0, 0, 2, 2>(const float4 &b);
template<> __forceinline const float4 shuffle<1, 1, 3, 3>(const float4 &b);
# endif
#endif /* __KERNEL_SSE__ */
ccl_device_inline float reduce_min(const float4 a);
ccl_device_inline float reduce_max(const float4 a);
ccl_device_inline float reduce_add(const float4 a);
ccl_device_inline bool isequal(const float4 a, const float4 b);
#ifndef __KERNEL_GPU__
ccl_device_inline float4 select(const int4 &mask, const float4 &a, const float4 &b);
#endif /* !__KERNEL_GPU__ */
/*******************************************************************************
* Definition.
*/
ccl_device_inline float4 zero_float4()
{
#ifdef __KERNEL_SSE__
@@ -103,6 +25,16 @@ ccl_device_inline float4 one_float4()
return make_float4(1.0f, 1.0f, 1.0f, 1.0f);
}
ccl_device_inline int4 cast(const float4 a)
{
#ifdef __KERNEL_SSE__
return int4(_mm_castps_si128(a));
#else
return make_int4(
__float_as_int(a.x), __float_as_int(a.y), __float_as_int(a.z), __float_as_int(a.w));
#endif
}
#if !defined(__KERNEL_METAL__)
ccl_device_inline float4 operator-(const float4 &a)
{
@@ -114,7 +46,7 @@ ccl_device_inline float4 operator-(const float4 &a)
# endif
}
ccl_device_inline float4 operator*(const float4 &a, const float4 &b)
ccl_device_inline float4 operator*(const float4 a, const float4 b)
{
# ifdef __KERNEL_SSE__
return float4(_mm_mul_ps(a.m128, b.m128));
@@ -123,7 +55,7 @@ ccl_device_inline float4 operator*(const float4 &a, const float4 &b)
# endif
}
ccl_device_inline float4 operator*(const float4 &a, float f)
ccl_device_inline float4 operator*(const float4 a, float f)
{
# if defined(__KERNEL_SSE__)
return a * make_float4(f);
@@ -132,17 +64,17 @@ ccl_device_inline float4 operator*(const float4 &a, float f)
# endif
}
ccl_device_inline float4 operator*(float f, const float4 &a)
ccl_device_inline float4 operator*(float f, const float4 a)
{
return a * f;
}
ccl_device_inline float4 operator/(const float4 &a, float f)
ccl_device_inline float4 operator/(const float4 a, float f)
{
return a * (1.0f / f);
}
ccl_device_inline float4 operator/(const float4 &a, const float4 &b)
ccl_device_inline float4 operator/(const float4 a, const float4 b)
{
# ifdef __KERNEL_SSE__
return float4(_mm_div_ps(a.m128, b.m128));
@@ -151,12 +83,7 @@ ccl_device_inline float4 operator/(const float4 &a, const float4 &b)
# endif
}
ccl_device_inline float4 operator+(const float4 &a, const float f)
{
return a + make_float4(f, f, f, f);
}
ccl_device_inline float4 operator+(const float4 &a, const float4 &b)
ccl_device_inline float4 operator+(const float4 a, const float4 b)
{
# ifdef __KERNEL_SSE__
return float4(_mm_add_ps(a.m128, b.m128));
@@ -165,12 +92,12 @@ ccl_device_inline float4 operator+(const float4 &a, const float4 &b)
# endif
}
ccl_device_inline float4 operator-(const float4 &a, const float f)
ccl_device_inline float4 operator+(const float4 a, const float f)
{
return a - make_float4(f, f, f, f);
return a + make_float4(f);
}
ccl_device_inline float4 operator-(const float4 &a, const float4 &b)
ccl_device_inline float4 operator-(const float4 a, const float4 b)
{
# ifdef __KERNEL_SSE__
return float4(_mm_sub_ps(a.m128, b.m128));
@@ -179,17 +106,22 @@ ccl_device_inline float4 operator-(const float4 &a, const float4 &b)
# endif
}
ccl_device_inline float4 operator+=(float4 &a, const float4 &b)
ccl_device_inline float4 operator-(const float4 a, const float f)
{
return a - make_float4(f);
}
ccl_device_inline float4 operator+=(float4 &a, const float4 b)
{
return a = a + b;
}
ccl_device_inline float4 operator-=(float4 &a, const float4 &b)
ccl_device_inline float4 operator-=(float4 &a, const float4 b)
{
return a = a - b;
}
ccl_device_inline float4 operator*=(float4 &a, const float4 &b)
ccl_device_inline float4 operator*=(float4 &a, const float4 b)
{
return a = a * b;
}
@@ -204,7 +136,7 @@ ccl_device_inline float4 operator/=(float4 &a, float f)
return a = a / f;
}
ccl_device_inline int4 operator<(const float4 &a, const float4 &b)
ccl_device_inline int4 operator<(const float4 a, const float4 b)
{
# ifdef __KERNEL_SSE__
return int4(_mm_castps_si128(_mm_cmplt_ps(a.m128, b.m128)));
@@ -213,7 +145,7 @@ ccl_device_inline int4 operator<(const float4 &a, const float4 &b)
# endif
}
ccl_device_inline int4 operator>=(const float4 &a, const float4 &b)
ccl_device_inline int4 operator>=(const float4 a, const float4 b)
{
# ifdef __KERNEL_SSE__
return int4(_mm_castps_si128(_mm_cmpge_ps(a.m128, b.m128)));
@@ -222,7 +154,7 @@ ccl_device_inline int4 operator>=(const float4 &a, const float4 &b)
# endif
}
ccl_device_inline int4 operator<=(const float4 &a, const float4 &b)
ccl_device_inline int4 operator<=(const float4 a, const float4 b)
{
# ifdef __KERNEL_SSE__
return int4(_mm_castps_si128(_mm_cmple_ps(a.m128, b.m128)));
@@ -231,7 +163,7 @@ ccl_device_inline int4 operator<=(const float4 &a, const float4 &b)
# endif
}
ccl_device_inline bool operator==(const float4 &a, const float4 &b)
ccl_device_inline bool operator==(const float4 a, const float4 b)
{
# ifdef __KERNEL_SSE__
return (_mm_movemask_ps(_mm_cmpeq_ps(a.m128, b.m128)) & 15) == 15;
@@ -240,95 +172,19 @@ ccl_device_inline bool operator==(const float4 &a, const float4 &b)
# endif
}
ccl_device_inline float distance(const float4 &a, const float4 &b)
{
return len(a - b);
}
ccl_device_inline float dot(const float4 &a, const float4 &b)
{
# if defined(__KERNEL_SSE41__) && defined(__KERNEL_SSE__)
# if defined(__KERNEL_NEON__)
__m128 t = vmulq_f32(a, b);
return vaddvq_f32(t);
# else
return _mm_cvtss_f32(_mm_dp_ps(a, b, 0xFF));
# endif
# else
return (a.x * b.x + a.y * b.y) + (a.z * b.z + a.w * b.w);
# endif
}
ccl_device_inline float len_squared(const float4 &a)
{
return dot(a, a);
}
ccl_device_inline float4 rcp(const float4 &a)
ccl_device_inline const float4 operator^(const float4 a, const float4 b)
{
# ifdef __KERNEL_SSE__
/* Don't use _mm_rcp_ps due to poor precision. */
return float4(_mm_div_ps(_mm_set_ps1(1.0f), a.m128));
return float4(_mm_xor_ps(a.m128, b.m128));
# else
return make_float4(1.0f / a.x, 1.0f / a.y, 1.0f / a.z, 1.0f / a.w);
return make_float4(__uint_as_float(__float_as_uint(a.x) ^ __float_as_uint(b.x)),
__uint_as_float(__float_as_uint(a.y) ^ __float_as_uint(b.y)),
__uint_as_float(__float_as_uint(a.z) ^ __float_as_uint(b.z)),
__uint_as_float(__float_as_uint(a.w) ^ __float_as_uint(b.w)));
# endif
}
ccl_device_inline float4 sqrt(const float4 &a)
{
# ifdef __KERNEL_SSE__
return float4(_mm_sqrt_ps(a.m128));
# else
return make_float4(sqrtf(a.x), sqrtf(a.y), sqrtf(a.z), sqrtf(a.w));
# endif
}
ccl_device_inline float4 sqr(const float4 &a)
{
return a * a;
}
ccl_device_inline float4 cross(const float4 &a, const float4 &b)
{
# ifdef __KERNEL_SSE__
return (shuffle<1, 2, 0, 0>(a) * shuffle<2, 0, 1, 0>(b)) -
(shuffle<2, 0, 1, 0>(a) * shuffle<1, 2, 0, 0>(b));
# else
return make_float4(a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x, 0.0f);
# endif
}
ccl_device_inline bool is_zero(const float4 &a)
{
# ifdef __KERNEL_SSE__
return a == zero_float4();
# else
return (a.x == 0.0f && a.y == 0.0f && a.z == 0.0f && a.w == 0.0f);
# endif
}
ccl_device_inline float average(const float4 &a)
{
return reduce_add(a) * 0.25f;
}
ccl_device_inline float len(const float4 &a)
{
return sqrtf(dot(a, a));
}
ccl_device_inline float4 normalize(const float4 &a)
{
return a / len(a);
}
ccl_device_inline float4 safe_normalize(const float4 &a)
{
float t = len(a);
return (t != 0.0f) ? a / t : a;
}
ccl_device_inline float4 min(const float4 &a, const float4 &b)
ccl_device_inline float4 min(const float4 a, const float4 b)
{
# ifdef __KERNEL_SSE__
return float4(_mm_min_ps(a.m128, b.m128));
@@ -337,7 +193,7 @@ ccl_device_inline float4 min(const float4 &a, const float4 &b)
# endif
}
ccl_device_inline float4 max(const float4 &a, const float4 &b)
ccl_device_inline float4 max(const float4 a, const float4 b)
{
# ifdef __KERNEL_SSE__
return float4(_mm_max_ps(a.m128, b.m128));
@@ -346,55 +202,119 @@ ccl_device_inline float4 max(const float4 &a, const float4 &b)
# endif
}
ccl_device_inline float4 clamp(const float4 &a, const float4 &mn, const float4 &mx)
ccl_device_inline float4 clamp(const float4 a, const float4 mn, const float4 mx)
{
return min(max(a, mn), mx);
}
ccl_device_inline float4 fabs(const float4 &a)
{
# if defined(__KERNEL_SSE__)
# if defined(__KERNEL_NEON__)
return float4(vabsq_f32(a));
# else
return float4(_mm_and_ps(a.m128, _mm_castsi128_ps(_mm_set1_epi32(0x7fffffff))));
# endif
# else
return make_float4(fabsf(a.x), fabsf(a.y), fabsf(a.z), fabsf(a.w));
# endif
}
ccl_device_inline float4 floor(const float4 &a)
{
# ifdef __KERNEL_SSE__
return float4(_mm_floor_ps(a));
# else
return make_float4(floorf(a.x), floorf(a.y), floorf(a.z), floorf(a.w));
# endif
}
ccl_device_inline float4 mix(const float4 &a, const float4 &b, float t)
{
return a + t * (b - a);
}
ccl_device_inline float4 saturate(const float4 &a)
{
return make_float4(saturatef(a.x), saturatef(a.y), saturatef(a.z), saturatef(a.w));
}
ccl_device_inline float4 exp(float4 v)
{
return make_float4(expf(v.x), expf(v.y), expf(v.z), expf(v.w));
}
ccl_device_inline float4 log(float4 v)
{
return make_float4(logf(v.x), logf(v.y), logf(v.z), logf(v.w));
}
#endif /* !__KERNEL_METAL__*/
ccl_device_inline const float4 madd(const float4 a, const float4 b, const float4 c)
{
#ifdef __KERNEL_SSE__
# ifdef __KERNEL_NEON__
return float4(vfmaq_f32(c, a, b));
# elif defined(__KERNEL_AVX2__)
return float4(_mm_fmadd_ps(a, b, c));
# else
return a * b + c;
# endif
#else
return a * b + c;
#endif
}
ccl_device_inline float4 msub(const float4 a, const float4 b, const float4 c)
{
#ifdef __KERNEL_SSE__
# ifdef __KERNEL_NEON__
return float4(vfmaq_f32(vnegq_f32(c), a, b));
# elif defined(__KERNEL_AVX2__)
return float4(_mm_fmsub_ps(a, b, c));
# else
return a * b - c;
# endif
#else
return a * b - c;
#endif
}
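madd and msub compile to a single fused multiply-add where the target provides one (FMA3 via _mm_fmadd_ps/_mm_fmsub_ps, NEON via vfmaq_f32, where msub feeds the negated addend since a * b - c = a * b + (-c)). A typical use is Horner polynomial evaluation, sketched here with arbitrary example coefficients:

float4 x = make_float4(0.5f);
/* ((2*x + 3)*x + 4) per lane, two fused operations: */
float4 r = madd(madd(make_float4(2.0f), x, make_float4(3.0f)), x, make_float4(4.0f));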
#ifdef __KERNEL_SSE__
template<size_t i0, size_t i1, size_t i2, size_t i3>
__forceinline const float4 shuffle(const float4 b)
{
# ifdef __KERNEL_NEON__
return float4(shuffle_neon<float32x4_t, i0, i1, i2, i3>(b.m128));
# else
return float4(
_mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(b), _MM_SHUFFLE(i3, i2, i1, i0))));
# endif
}
template<> __forceinline const float4 shuffle<0, 1, 0, 1>(const float4 a)
{
return float4(_mm_movelh_ps(a, a));
}
template<> __forceinline const float4 shuffle<2, 3, 2, 3>(const float4 a)
{
return float4(_mm_movehl_ps(a, a));
}
# ifdef __KERNEL_SSE3__
template<> __forceinline const float4 shuffle<0, 0, 2, 2>(const float4 b)
{
return float4(_mm_moveldup_ps(b));
}
template<> __forceinline const float4 shuffle<1, 1, 3, 3>(const float4 b)
{
return float4(_mm_movehdup_ps(b));
}
# endif /* __KERNEL_SSE3__ */
template<size_t i0, size_t i1, size_t i2, size_t i3>
__forceinline const float4 shuffle(const float4 a, const float4 b)
{
# ifdef __KERNEL_NEON__
return float4(shuffle_neon<float32x4_t, i0, i1, i2, i3>(a, b));
# else
return float4(_mm_shuffle_ps(a, b, _MM_SHUFFLE(i3, i2, i1, i0)));
# endif
}
template<size_t i0> __forceinline const float4 shuffle(const float4 b)
{
return shuffle<i0, i0, i0, i0>(b);
}
template<size_t i0> __forceinline const float4 shuffle(const float4 a, const float4 b)
{
# ifdef __KERNEL_NEON__
return float4(shuffle_neon<float32x4_t, i0, i0, i0, i0>(a, b));
# else
return float4(_mm_shuffle_ps(a, b, _MM_SHUFFLE(i0, i0, i0, i0)));
# endif
}
template<> __forceinline const float4 shuffle<0, 1, 0, 1>(const float4 a, const float4 b)
{
return float4(_mm_movelh_ps(a, b));
}
template<> __forceinline const float4 shuffle<2, 3, 2, 3>(const float4 a, const float4 b)
{
return float4(_mm_movehl_ps(b, a));
}
template<size_t i> __forceinline float extract(const float4 a)
{
return _mm_cvtss_f32(shuffle<i, i, i, i>(a));
}
template<> __forceinline float extract<0>(const float4 a)
{
return _mm_cvtss_f32(a);
}
#endif
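Note the index convention: the template arguments are given in source order i0..i3 for result lanes 0..3, while _MM_SHUFFLE takes them high-to-low, hence the reversed (i3, i2, i1, i0). For instance, shuffle<1, 2, 0, 3>(v) yields (v.y, v.z, v.x, v.w).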
ccl_device_inline float reduce_add(const float4 a)
{
#if defined(__KERNEL_SSE__)
@@ -440,6 +360,166 @@ ccl_device_inline float reduce_max(const float4 a)
#endif
}
#if !defined(__KERNEL_METAL__)
ccl_device_inline float dot(const float4 a, const float4 b)
{
# if defined(__KERNEL_SSE41__) && defined(__KERNEL_SSE__)
# if defined(__KERNEL_NEON__)
__m128 t = vmulq_f32(a, b);
return vaddvq_f32(t);
# else
return _mm_cvtss_f32(_mm_dp_ps(a, b, 0xFF));
# endif
# else
return (a.x * b.x + a.y * b.y) + (a.z * b.z + a.w * b.w);
# endif
}
#endif /* !defined(__KERNEL_METAL__) */
ccl_device_inline float len(const float4 a)
{
return sqrtf(dot(a, a));
}
ccl_device_inline float len_squared(const float4 a)
{
return dot(a, a);
}
#if !defined(__KERNEL_METAL__)
ccl_device_inline float distance(const float4 a, const float4 b)
{
return len(a - b);
}
ccl_device_inline float4 rcp(const float4 a)
{
# ifdef __KERNEL_SSE__
/* Don't use _mm_rcp_ps due to poor precision. */
return float4(_mm_div_ps(_mm_set_ps1(1.0f), a.m128));
# else
return make_float4(1.0f / a.x, 1.0f / a.y, 1.0f / a.z, 1.0f / a.w);
# endif
}
ccl_device_inline float4 sqrt(const float4 a)
{
# ifdef __KERNEL_SSE__
return float4(_mm_sqrt_ps(a.m128));
# else
return make_float4(sqrtf(a.x), sqrtf(a.y), sqrtf(a.z), sqrtf(a.w));
# endif
}
ccl_device_inline float4 sqr(const float4 a)
{
return a * a;
}
ccl_device_inline float4 cross(const float4 a, const float4 b)
{
# ifdef __KERNEL_SSE__
return (shuffle<1, 2, 0, 0>(a) * shuffle<2, 0, 1, 0>(b)) -
(shuffle<2, 0, 1, 0>(a) * shuffle<1, 2, 0, 0>(b));
# else
return make_float4(a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x, 0.0f);
# endif
}
ccl_device_inline bool is_zero(const float4 a)
{
# ifdef __KERNEL_SSE__
return a == zero_float4();
# else
return (a.x == 0.0f && a.y == 0.0f && a.z == 0.0f && a.w == 0.0f);
# endif
}
ccl_device_inline float average(const float4 a)
{
return reduce_add(a) * 0.25f;
}
ccl_device_inline float4 normalize(const float4 a)
{
return a / len(a);
}
ccl_device_inline float4 safe_normalize(const float4 a)
{
float t = len(a);
return (t != 0.0f) ? a / t : a;
}
ccl_device_inline float4 fabs(const float4 a)
{
# if defined(__KERNEL_SSE__)
# if defined(__KERNEL_NEON__)
return float4(vabsq_f32(a));
# else
return float4(_mm_and_ps(a.m128, _mm_castsi128_ps(_mm_set1_epi32(0x7fffffff))));
# endif
# else
return make_float4(fabsf(a.x), fabsf(a.y), fabsf(a.z), fabsf(a.w));
# endif
}
ccl_device_inline float4 floor(const float4 a)
{
# ifdef __KERNEL_SSE__
# if defined(__KERNEL_NEON__)
return float4(vrndmq_f32(a));
# else
return float4(_mm_floor_ps(a));
# endif
# else
return make_float4(floorf(a.x), floorf(a.y), floorf(a.z), floorf(a.w));
# endif
}
ccl_device_inline float4 floorfrac(const float4 x, ccl_private int4 *i)
{
# ifdef __KERNEL_SSE__
const float4 f = floor(x);
*i = int4(_mm_cvttps_epi32(f.m128));
return x - f;
# else
float4 r;
r.x = floorfrac(x.x, &i->x);
r.y = floorfrac(x.y, &i->y);
r.z = floorfrac(x.z, &i->z);
r.w = floorfrac(x.w, &i->w);
return r;
# endif
}
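floorfrac splits each lane into its integer cell and fractional offset, the usual first step of lattice noise. A small worked example:

int4 cell;
float4 local = floorfrac(make_float4(1.75f, -0.25f, 3.0f, 0.5f), &cell);
/* cell == (1, -1, 3, 0) and local == (0.75f, 0.75f, 0.0f, 0.5f) */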
ccl_device_inline float4 mix(const float4 a, const float4 b, float t)
{
return a + t * (b - a);
}
ccl_device_inline float4 mix(const float4 a, const float4 b, const float4 t)
{
return a + t * (b - a);
}
ccl_device_inline float4 saturate(const float4 a)
{
return make_float4(saturatef(a.x), saturatef(a.y), saturatef(a.z), saturatef(a.w));
}
ccl_device_inline float4 exp(float4 v)
{
return make_float4(expf(v.x), expf(v.y), expf(v.z), expf(v.w));
}
ccl_device_inline float4 log(float4 v)
{
return make_float4(logf(v.x), logf(v.y), logf(v.z), logf(v.w));
}
#endif /* !__KERNEL_METAL__*/
ccl_device_inline bool isequal(const float4 a, const float4 b)
{
#if defined(__KERNEL_METAL__)
@@ -449,68 +529,23 @@ ccl_device_inline bool isequal(const float4 a, const float4 b)
#endif
}
#ifdef __KERNEL_SSE__
template<size_t index_0, size_t index_1, size_t index_2, size_t index_3>
__forceinline const float4 shuffle(const float4 &b)
{
# if defined(__KERNEL_NEON__)
return float4(shuffle_neon<__m128, index_0, index_1, index_2, index_3>(b.m128));
# else
return float4(_mm_castsi128_ps(
_mm_shuffle_epi32(_mm_castps_si128(b), _MM_SHUFFLE(index_3, index_2, index_1, index_0))));
# endif
}
template<size_t index_0, size_t index_1, size_t index_2, size_t index_3>
__forceinline const float4 shuffle(const float4 &a, const float4 &b)
{
# if defined(__KERNEL_NEON__)
return float4(shuffle_neon<__m128, index_0, index_1, index_2, index_3>(a.m128, b.m128));
# else
return float4(_mm_shuffle_ps(a.m128, b.m128, _MM_SHUFFLE(index_3, index_2, index_1, index_0)));
# endif
}
template<> __forceinline const float4 shuffle<0, 1, 0, 1>(const float4 &b)
{
return float4(_mm_castpd_ps(_mm_movedup_pd(_mm_castps_pd(b))));
}
template<> __forceinline const float4 shuffle<0, 1, 0, 1>(const float4 &a, const float4 &b)
{
return float4(_mm_movelh_ps(a.m128, b.m128));
}
template<> __forceinline const float4 shuffle<2, 3, 2, 3>(const float4 &a, const float4 &b)
{
return float4(_mm_movehl_ps(b.m128, a.m128));
}
# ifdef __KERNEL_SSE3__
template<> __forceinline const float4 shuffle<0, 0, 2, 2>(const float4 &b)
{
return float4(_mm_moveldup_ps(b));
}
template<> __forceinline const float4 shuffle<1, 1, 3, 3>(const float4 &b)
{
return float4(_mm_movehdup_ps(b));
}
# endif /* __KERNEL_SSE3__ */
#endif /* __KERNEL_SSE__ */
#ifndef __KERNEL_GPU__
ccl_device_inline float4 select(const int4 &mask, const float4 &a, const float4 &b)
ccl_device_inline float4 select(const int4 mask, const float4 a, const float4 b)
{
# ifdef __KERNEL_SSE__
# ifdef __KERNEL_SSE41__
return float4(_mm_blendv_ps(b.m128, a.m128, _mm_castsi128_ps(mask.m128)));
# else
return float4(
_mm_or_ps(_mm_and_ps(_mm_castsi128_ps(mask), a), _mm_andnot_ps(_mm_castsi128_ps(mask), b)));
# endif
# else
return make_float4(
(mask.x) ? a.x : b.x, (mask.y) ? a.y : b.y, (mask.z) ? a.z : b.z, (mask.w) ? a.w : b.w);
# endif
}
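select picks a where the mask lanes are set (comparisons produce all-ones lanes) and b elsewhere; pre-SSE4.1 this falls back to the classic and/andnot/or blend. A small usage sketch:

float4 v = make_float4(-1.0f, 2.0f, -3.0f, 4.0f);
float4 r = select(v < zero_float4(), zero_float4(), v); /* r == (0, 2, 0, 4) */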
ccl_device_inline float4 mask(const int4 &mask, const float4 &a)
ccl_device_inline float4 mask(const int4 mask, const float4 a)
{
/* Replace elements of x with zero where mask isn't set. */
return select(mask, a, zero_float4());

Some files were not shown because too many files have changed in this diff.