Compare commits

284 Commits

Author SHA1 Message Date
6485941d7a Merge branch 'master' into retopo_transform 2022-07-29 13:51:49 -04:00
9b9417b661 Cleanup: Replace reinterpret_cast<> with static_cast<> in UI code 2022-07-29 18:45:12 +02:00
03cd794119 Fix attempt for MSVC build error after 42ccbb7cd1 2022-07-29 18:10:26 +02:00
091156f64a Merge branch 'blender-v3.3-release' 2022-07-29 18:00:50 +02:00
Brecht Van Lommel
cfd16c04f8 Build: hide all symbols except a few required ones on Linux
Instead of specifying which symbols to hide, we hide all and make a few
visible. Some users may be relying on calling internal Blender functions,
but Windows is already hiding all of them and this is just not supported.

Fixes T99900: crash with some third-party Python libraries since OneAPI

Ref T76442

Differential Revision: https://developer.blender.org/D14971
2022-07-29 17:54:32 +02:00
42ccbb7cd1 Cleanup: Move RNA path functions into own C++ file
Adds `rna_path.cc` and `RNA_path.h`.

`rna_access.c` is quite a big file, which makes it rather hard and
inconvenient to navigate. RNA path functions form a nicely coherent unit
that can stand well on its own, so it makes sense to split them off to
mitigate the problem. Moreover, I was looking into refactoring the quite
convoluted/overloaded `rna_path_parse()`, and found that some C++
features may help greatly with that. So having that code compile in C++
would be helpful to attempt that.

Differential Revision: https://developer.blender.org/D15540

Reviewed by: Brecht Van Lommel, Campbell Barton, Bastien Montagne
2022-07-29 16:56:48 +02:00
187d90f036 Merge branch 'blender-v3.3-release' 2022-07-29 15:33:25 +02:00
1665e40e16 install_deps: Add handling of libaom, update ffmpeg build for it.
Ref T98555.
2022-07-29 15:32:02 +02:00
d3879e9aaa Merge branch 'blender-v3.3-release' 2022-07-29 15:17:40 +02:00
065dfe744c install_deps: bump IMath/OpenEXR to 3.1.5.
Ref T98555.
2022-07-29 15:17:15 +02:00
3a138a74e5 install_deps: add building of Alembic binaries.
Those are used by alembic regression tests.
2022-07-29 15:17:15 +02:00
Tianhao Chai
b862cf0b9f Fix Cycles build error with CUDA on arm64
Checking arm64 assembly support before CUDA/Metal would cause NVCC to
generate inline arm64 assembly.

Differential Revision: https://developer.blender.org/D15569
2022-07-29 14:57:09 +02:00
a679164cf6 Merge branch 'blender-v3.3-release' 2022-07-29 12:25:31 +02:00
ae0b8e904c Fix (unreported) lib-linking of ID properties not taking library parameter.
While this was not a critical issue (that lib pointer is only used for
some kind of sanity check that no linked data uses local ID pointers),
better to keep `IDP_BlendReadLib` in sync with all other lib-linking
code.
2022-07-29 12:25:15 +02:00
b639e60864 Realtime Compositor: Add needed GPU module changes
This patch implements the necessary changes to the GPU module that are
needed by the realtime compositor.

A new function GPU_material_from_callbacks was added to construct a GPU
material from a number of callbacks: a callback to construct the
material graph by adding and linking the necessary GPU material nodes,
and the existing code generator callback. This essentially allows the
construction of GPU materials independent of node trees and without the
need to do any node tree localization.

A new composite source output to the code generator was added. This
output contains the serialization of nodes that are tagged with
GPU_NODE_TAG_COMPOSITOR, which are the nodes linked to the newly added
composite output links.

Two new GPU uniform setters were added for int2 and matrix3 types.

Shader create info now supports generated compute sources.

Shaders starting with gpu_shader_compositor are now considered part of
the shader library.

Additionally, two fixes were implemented. First, GPU setter node
de-duplication now appropriately increments the reference count of the
referenced resources. Second, unlinked sockets now get their value from
their associated GPU node stack instead of the socket itself.

Differential Revision: https://developer.blender.org/D14690

Reviewed By: Clement
2022-07-29 08:47:52 +02:00
c3ca487498 Render: Propagate view updates to draw engines
Currently, draw engines are not notified of view updates if a render
engine is active and was updated. It is unclear why this is the case
currently, but this behavior was part of the initial commit.

This patch propagates view updates regardless of whether the update was handled
by an active render engine. This is needed by the realtime compositor as
it implements logic for view updates, which currently does not execute
if Cycles is rendering for instance.

Differential Revision: https://developer.blender.org/D15207

Reviewed By: Brecht
2022-07-29 08:30:51 +02:00
4815772fda Cleanup: quiet warnings in recent BLF and rna_ui changes 2022-07-29 13:48:09 +10:00
e9bd6abde3 BLF: New Font Stack for Better Language Coverage
Replace our existing two fonts with a stack of new fonts to increase
and improve language coverage and to add many new symbols and icons.
Covers glyphs of top 44 languages - 1.5 billion more potential users.

See D10887 for lots of details.

Differential Revision: https://developer.blender.org/D10887

Reviewed by Brecht Van Lommel
2022-07-28 20:09:20 -07:00
c0845abd89 BLF: Fonts with FT_Face Optional
Allow FontBLFs to exist with NULL FT_Face, added only when actually
needed. Speeds up startup and unused fonts are not loaded.

See D15258 for more details.

Differential Revision: https://developer.blender.org/D15258

Reviewed by Brecht Van Lommel
2022-07-28 17:50:34 -07:00
848dd4a40a BLF: Don't Print Empty Strings
Optimize font drawing by skipping empty strings.

See D15472 for more details.

Differential Revision: https://developer.blender.org/D15472

Reviewed by Campbell Barton
2022-07-28 17:28:05 -07:00
e261290cb6 Merge branch 'blender-v3.3-release' 2022-07-28 17:40:42 -05:00
6ca602dd9f Fix T99761: Curves sculpt mode crash with empty curves
The virtual arrays may be null if the curves are empty, so
it's simplest to just skip the domain interpolation completely.
2022-07-28 17:39:10 -05:00
a9c74a0cd0 Fix set iterator test failure on macOS
This is a quite interesting case, where two arguments to a function are
evaluated in different order on Apple Clang than on GCC and I guess
MSVC. Left a comment on that.
2022-07-28 23:53:33 +02:00
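As a rough illustration of the class of issue described above (not the actual test code from this commit), the C++ sketch below passes two expressions with side effects on the same iterator as function arguments. C++ leaves the evaluation order of function arguments unspecified, so the first form may give different results on different compilers, while the second form sequences the reads explicitly. The names `next_value`, `make_pair_unspecified` and `make_pair_sequenced` are hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: returns the current element and advances the iterator.
inline int next_value(std::vector<int>::const_iterator &it)
{
  return *it++;
}

// The order of the two next_value() calls is unspecified here: the result may
// be {first, second} with one compiler and {second, first} with another.
inline std::pair<int, int> make_pair_unspecified(std::vector<int>::const_iterator it)
{
  return std::make_pair(next_value(it), next_value(it));
}

// Evaluating the calls in separate statements makes the order well defined.
inline std::pair<int, int> make_pair_sequenced(std::vector<int>::const_iterator it)
{
  const int first = next_value(it);
  const int second = next_value(it);
  return std::make_pair(first, second);
}
```

Only the sequenced variant is safe to compare against a fixed expected pair in a test; the unspecified variant can only be checked for order-independent properties such as the sum of both values.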
3d91a853b2 Cleanup: Nodes: Store node group idname in tree type
There was already a utility to retrieve the correct node group idname
from the context, `node_group_idname`, but often it's clearer to
use lower-level arguments, or the context isn't accessible.
Storing the group idname in the tree type makes it accessible
without rewriting it elsewhere.
2022-07-28 16:34:17 -05:00
4757a5ad33 Cleanup: Make BKE_idprop.h self sufficient
It relied on uint, which is defined in a separate header.
2022-07-28 16:20:36 -05:00
eea1f9b1df Merge branch 'blender-v3.3-release' 2022-07-28 16:08:36 -05:00
1adeae56e6 Fix: Grammar mistake in info message 2022-07-28 16:08:20 -05:00
5c2fff306e Cleanup: Use LISTBASE_FOREACH macro 2022-07-28 16:02:46 -05:00
72d8a40a3d Cleanup: Use const context argument for UIList callbacks 2022-07-28 16:02:15 -05:00
cf61be6190 Cleanup: Use new IDProperty creation API for geometry nodes modifier
Use the API from 36068487d0 instead
of the uglier `IDPropertyTemplate` API.
2022-07-28 15:50:39 -05:00
543ea41569 Cleanup: Remove unused node "add and link node" operator
The link drag search from 11be151d58 implements
this now. It was added in 3ebe7d970e but never used.
2022-07-28 15:40:32 -05:00
19528cfecd Merge branch 'blender-v3.3-release' 2022-07-28 21:31:14 +02:00
79ab76e156 Cleanup: simplifications and consistency for vector types
* OneAPI: remove separate float3 definition
* OneAPI: disable operator[] to match other GPUs
* OneAPI: make int3 compact to match other GPUs
* Use #pragma once
* Add __KERNEL_NATIVE_VECTOR_TYPES__ to simplify checks
* Remove unused vector3
2022-07-28 21:27:13 +02:00
fb42c5838c Revert "Fix T98773: GPU Subdivision breaks auto selection in UV edit mode"
This reverts commit e2c02655c7. It was already
reverted in the 3.2 branch, as it caused more serious issues than it solved.

Fixes T99805, T99323, T99296.
2022-07-28 21:20:51 +02:00
d094a3722c Fix wrong post-increment operators & test for BLI containers 2022-07-28 20:45:28 +02:00
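The commit message does not say what exactly was wrong, so the following is only a sketch of the conventional C++ semantics such a fix would aim for: the post-increment operator advances the iterator but returns a copy of the state from before the increment. `IntIterator` is a hypothetical class and is not one of the BLI containers.

```cpp
#include <cassert>

// Hypothetical minimal iterator over an int array, for illustration only.
class IntIterator {
  const int *ptr_;

 public:
  explicit IntIterator(const int *ptr) : ptr_(ptr) {}

  int operator*() const { return *ptr_; }

  // Pre-increment: advance, then return a reference to the advanced iterator.
  IntIterator &operator++()
  {
    ++ptr_;
    return *this;
  }

  // Post-increment: advance, but return a copy of the state before advancing.
  IntIterator operator++(int)
  {
    IntIterator copied = *this;
    ++(*this);
    return copied;
  }

  friend bool operator==(const IntIterator &a, const IntIterator &b) { return a.ptr_ == b.ptr_; }
  friend bool operator!=(const IntIterator &a, const IntIterator &b) { return !(a == b); }
};
```

A test for this behavior would check both the returned copy and the advanced iterator, since a post-increment that returns the already advanced state compiles fine and is easy to miss.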
9c65af2df0 Merge branch 'blender-v3.3-release' 2022-07-28 21:27:14 +03:00
68db023329 ID namemap: fix missing removal of old name in do_versions_rename_id
Was causing an assert that the old name exists in the name map, but
is not present in the actual database. Reported in #blender-coders
2022-07-28 21:26:30 +03:00
ae89fcfdaf Merge branch 'blender-v3.3-release' 2022-07-28 14:36:49 -03:00
fafb901baa PyDoc: fix 2D builtin shaders documentation
2D shaders require the `vec2` attribute for "pos" (not `vec3`)
2022-07-28 14:36:07 -03:00
2b9d4af261 EEVEE-Next: UI: Make Vector pass greyed out when motion blur is enabled
Also clears the render result to 0 to avoid invalid motion vectors.
2022-07-28 17:01:05 +02:00
53fc9add51 EEVEE-Next: Cleanup: Isolate render result readback and prototype progress
Still not working, but the idea is to read the result and display the
first image sample so that the user gets better feedback on the
rendering.
2022-07-28 17:01:05 +02:00
1e0aa2612c EEVEE-Next: Motion Blur new implementation
The new implementation leverages compute shaders to reduce the
number of passes and complexity.

The max blur amount is now detected automatically, replacing the property
in the render panel by a simple checkbox.

The dilation algorithm has also been rewritten from scratch into a 1 pass
algorithm that does the dilation more efficiently and more precisely.

Some differences with the old implementation can be observed in areas with
complex motion.
2022-07-28 17:01:05 +02:00
82327ce01d DRW: TextureFromPool: Change API to use acquire / release
This removes the quirk of having to call the sync function for each new
render loop.

# Conflicts:
#	source/blender/draw/engines/eevee_next/eevee_view.cc
2022-07-28 17:00:46 +02:00
0830ff55d8 EEVEE-Next: Fix Vector render pass 2022-07-28 16:58:01 +02:00
aacdaa7b1a Merge branch 'blender-v3.3-release' 2022-07-28 16:32:27 +02:00
ea23e937ce Cleanup/refactor: Readfile: Add dedicated function to insert ID pointers in libmap.
New `oldnewmap_lib_insert` does nothing special, it just wraps around existing
`oldnewmap_insert`, but it's the logical counterpart of `oldnewmap_liblookup`.

It also helps tremendously when debugging complex ID pointer issues in
readfile.c code.
2022-07-28 16:29:57 +02:00
f3be8e66d7 Fix (studio-reported) crash in some rare cases in blendfile read code.
Crash would happen when a linked ID would become missing, that was
'pre-declared' and used only once as a 'weak link' in another library
stored before the one it came from.

In that case, the place-holder generated in read code would be freed in
`read_library_clear_weak_links`, when handling its 'owner' library, but
since all previous libraries in the list had already been 'lib_linked'
and their filedata (and related libmap) freed, the update of the libmaps
in `read_library_clear_weak_links` would not apply to data from those
previous libraries, leading to ID pointers there pointing to freed
memory.

This fix should also be backported to 2.93.
2022-07-28 16:29:57 +02:00
69bf74bd76 Merge branch 'blender-v3.3-release' 2022-07-28 16:40:37 +03:00
c49717a824 Fix T100017: OBJ: new importer does not import vertices that aren't part of any face
The Python based importer had a special case handling of "no faces in
the whole file at all", where it ended up treating the whole file
as essentially a point-cloud-like object (just loose vertices, no
faces or edges). The new importer code was missing this special case.

Fixes T100017. Added gtest coverage that was failing without the fix.
2022-07-28 16:39:42 +03:00
Iliay Katueshenock
07e201ec13 Geometry Nodes: add assert to check if node supports laziness
Only nodes supporting laziness can mark inputs as unused. For other
nodes, this is done automatically if all outputs are unused.

Differential Revision: https://developer.blender.org/D15409
2022-07-28 13:39:40 +02:00
d892f96cb1 Cleanup: Fix typo in comment 2022-07-28 12:52:20 +02:00
d034c28f51 Merge branch 'blender-v3.3-release' 2022-07-28 11:42:46 +02:00
ccb9d5d307 Curves: enable density brush when first entering curves sculpt mode
Previously, no tool was selected, which was a bug.
2022-07-28 11:41:36 +02:00
aa7d130347 Curves: improve handling of empty surface meshes 2022-07-28 11:37:35 +02:00
6ae9565d06 Cleanup: quiet GCC stringop-overflow warning 2022-07-28 16:08:59 +10:00
d41f0c7b15 Cleanup: unused header 2022-07-28 16:01:29 +10:00
a98102e32e Merge branch 'blender-v3.3-release' 2022-07-28 09:39:57 +10:00
8d4fa03e5c BLI_math: improve symmetrical values from sin_cos_from_fraction
When plotting equally spaced points around a circle, support an extra
axis of symmetry so that twice as many exact values are repeated compared
to what was originally added in [0]; see code-comments for a detailed explanation.
Tests to ensure accuracy and exact symmetry have been added too.

Follow up on fix for T87779.

[0]: 087f27a52f
2022-07-28 09:39:54 +10:00
397731d4df BLI_math: improve symmetrical values from sin_cos_from_fraction
When plotting equally spaced points around a circle, support an extra
axis of symmetry so that twice as many exact values are repeated compared
to what was originally added in [0]; see code-comments for a detailed explanation.
Tests to ensure accuracy and exact symmetry have been added too.

Follow up on fix for T87779.

[0]: 087f27a52f
2022-07-28 09:34:46 +10:00
ff048f5d27 Curves: Avoid virtual function overhead when finding selected curves
This showed up on a profile of sculpting with the comb brush.
Use a span instead of a virtual array.
2022-07-27 15:41:32 -05:00
165fa9e2a1 Merge branch 'blender-v3.3-release' 2022-07-27 21:26:18 +02:00
38af5b0501 Cycles: switch Cycles triangle barycentric convention to match Embree/OptiX
Simplifies intersection code a little and slightly improves precision regarding
self intersection.

The parametric texture coordinate in shader nodes is still the same as before
for compatibility.
2022-07-27 21:03:33 +02:00
69f2732a13 Cleanup: remove unnecessary bvh_instance_motion_pop 2022-07-27 21:02:21 +02:00
cd47d1b2ed Fix broken BVH2 on CPU after recent changes
Runtime switching between Embree and BVH2 got lost.
2022-07-27 20:58:02 +02:00
eee25a175a Merge branch 'master' into retopo_transform 2022-07-27 14:31:45 -04:00
55fb2abc81 Curves: Bring back parallel copying of curve and point attributes
This was removed in cacdea7f4a to fix a bug, but copying point
and curve attributes should be fine as long as the attribute arrays are
retrieved beforehand.

Differential Revision: https://developer.blender.org/D15541
2022-07-27 11:52:11 -05:00
0dcfd93c6e Fix: curves edit hints not propagated in Join Geometry node
Found while investigating why crazy-space editing didn't work in T100026.
2022-07-27 18:38:45 +02:00
6e5eb46d73 Fix T100026: crash with zero-sized attributes
The problem was that zero-sized and non-existent attributes were
handled the same in some parts of the attribute API, which led to
unexpected behavior.

The solution is to properly differentiate the case when an attribute
does not exist and when it is just empty (because the geometry
is empty).

Differential Revision: https://developer.blender.org/D15557
2022-07-27 18:20:22 +02:00
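The distinction drawn above can be sketched with `std::optional`: a lookup returns an empty optional when the attribute does not exist, and an engaged optional holding an empty array when the attribute exists on empty geometry. This is a generic C++17 illustration, not the attribute API changed in D15557; `AttributeMap` and `lookup_attribute` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a geometry's attribute storage.
using AttributeMap = std::map<std::string, std::vector<float>>;

// Returns std::nullopt when the attribute does not exist, and an engaged
// optional (possibly holding an empty vector) when it does exist.
inline std::optional<std::vector<float>> lookup_attribute(const AttributeMap &attributes,
                                                          const std::string &name)
{
  const auto it = attributes.find(name);
  if (it == attributes.end()) {
    return std::nullopt;
  }
  return it->second;
}
```

Callers can then test `has_value()` for existence and `empty()` for size separately, instead of treating a zero-sized result as "not found".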
d6b970dd7b Merge branch 'blender-v3.3-release' 2022-07-27 18:07:29 +02:00
84a3ff63d0 Fix: missing evaluated offsets in Resample Curve node
Differential Revision: https://developer.blender.org/D15556
2022-07-27 18:05:31 +02:00
84272ce19a Fix: add missing return
It was correct but less efficient without this early return.
2022-07-27 17:54:49 +02:00
5560da7ceb Revert "Blender 3.3 splashscreen"
This reverts commit d61ab45385.
2022-07-27 17:32:21 +02:00
37ebd66570 Revert "Blender 3.3 - Beta"
This reverts commit 32a9aac3b8.
2022-07-27 17:32:05 +02:00
3b71a62390 Merge branch 'blender-v3.3-release' 2022-07-27 17:31:49 +02:00
d61ab45385 Blender 3.3 splashscreen
Credits: Piotr Krynski
2022-07-27 17:25:56 +02:00
b2dd1f8f01 Fix build include for rna_curves.c
* Since curves are no longer experimental, this should be included at any time.
2022-07-27 17:19:15 +02:00
32a9aac3b8 Blender 3.3 - Beta
* BLENDER_VERSION_CYCLE set to beta
* Update pipeline_config.yaml to point to 3.2 branches and svn tags
* Update and uncomment BLENDER_VERSION in download.cmake
2022-07-27 17:14:21 +02:00
9015952c9c Blender 3.4 Alpha: Start of new release cycle. 2022-07-27 16:53:19 +02:00
83362f87bb Blender 3.3: Finalizing version bump. 2022-07-27 16:33:49 +02:00
415f88d8b0 Fix wrong fileversion usage in own recent rB9ac81ed6abfb. 2022-07-27 16:20:50 +02:00
ea4b1d027d Geometry Nodes: Rename "Field on Domain" to "Interpolate Domain"
This name doesn't require understanding of fields, and
is phrased as an action which is consistent with other nodes.
Discussed in the latest geometry nodes sub-module meeting.
2022-07-27 08:56:17 -05:00
Erik Abrahamsson
c8ae1fce60 Geometry Nodes: Shortest Paths nodes
This adds three new nodes:
* `Shortest Edge Paths`: Actually finds the shortest paths.
* `Edge Paths to Curves`: Converts the paths to separate curves.
  This may generate a quadratic amount of data, making it slow
  for large meshes.
* `Edge Paths to Selection`: Generates an edge selection that
  contains all edges that are part of a path. This can be used
  with the Separate Geometry node to only keep the edges that
  are part of a path. For large meshes, this approach can be
  much faster than the `Edge Paths to Curves` node, because
  less data is created.

Differential Revision: https://developer.blender.org/D15274
2022-07-27 15:38:44 +02:00
9ac81ed6ab Fix corrupted blend files after issues from new name_map code.
Add a version of #BKE_main_namemap_validate that also fixes the issues,
and call it in a do_version to fix recent .blend files saved after the
regression introduced in rB7f8d05131a77.

This is mandatory to fix some production files here at the studio, among
other things.
2022-07-27 15:33:29 +02:00
9f53272df4 Fix more issues with new name map and liboverrides.
Follow-up to rB13e17507c069, forgot to handle shapekeys...
2022-07-27 15:33:29 +02:00
58dcd20998 ID namemap: Fix more issues when changing libs.
Fix tests, and some issue when making an ID local.

There are probably a few more issues still though.
2022-07-27 15:33:29 +02:00
Amelie Fondevilla
4843b161d6 Fix T99870 : Prevents crash when rearranging channels in dopesheet
The function to rearrange channels only works for F-curve channels for now. Adding the `FCURVESONLY` filter prevents the function from being called for grease pencil channels, thereby fixing the crash.

Reviewed by : sybren
Differential Revision: http://developer.blender.org/D15504
2022-07-27 11:40:44 +02:00
7324f32a94 ID namemap tests: Use consistency check, fix an issue.
Massively use the new consistency check in namemap regression tests, and
fix an issue with library data tests revealed by those checks.
2022-07-27 11:22:48 +02:00
18dc611b40 ID namemap: Add check for consistency.
Add a util function to check that content of a given Main and the
namemaps in it are consistent.

Add some asserts calling this check after file read, and after some
override operations.
2022-07-27 11:22:48 +02:00
1e55b58e4f UI: Sort tools in curves sculpting mode
The previous order was based on the order of when the tools were
developed. Instead we now cluster them based on similar functionality:

* Selection
* Add/Remove
* Deform/Transform
* Annotation

Done in collaboration with Pablo Vazquez.
2022-07-27 11:15:48 +02:00
13e17507c0 Fix crashes due to non-uniqueness in ID names in some cases.
Liboverrides are doing some very low-level manipulation of IDs in apply
code, to reduce overhead of name and sorting handling.

This requires specific care to ensure that the new namemap runtime data
remains up-to-date and valid. Otherwise, names of existing IDs would be
missing from the map, which would later lead to having several different
IDs with the same name. Critical corruption in Blender ID management.

Reported by animators at the Blender studio.

Regression from rB7f8d05131a77.
2022-07-27 11:10:45 +02:00
4dd409a185 Fix T99976: Animated visibility not rendering properly in viewport
A mistake in 0dcee6a386, which made specific driven visibility
work but did not properly handle actual time-based visibility.

The basic idea of the change is to preserve recalculation flags of
nodes which were tagged for update but were not evaluated due to
visibility constraints. In the file from the report, this ensures that
tagging done the first time an ID is in the dependency graph
is handled when the ID actually becomes visible. This is what
solved the root of the problem from the report: there was a missing
geometry update since it was "swallowed" by the evaluation while
the object was invisible. In other configurations, this change
allows pending geometry updates due to animated modifiers to
be handled when the object becomes visible without a time change.

This change also solves a visibility issue of the synchronization
component, which also started to be handled badly since the
previous fix attempt. Basically, the needed exception in its
visibility handling did not happen and regular logic was used
for it.

Tested with files from T99733, T99976, and from the Heist
project.

Differential Revision: https://developer.blender.org/D15544
2022-07-27 10:19:42 +02:00
d706d0460c Cycles oneAPI: simplify num_concurrent_states selection
The number of Execution Units and resident "threads" (simd width * threads
per EUs) are now exposed and used to select the number of states using
a simplified heuristic.
2022-07-27 09:45:33 +02:00
38e270ae30 Cleanup: Move wm_dragdrop.c to C++ 2022-07-26 23:15:33 -05:00
e67710b908 deps/oiio: fix build issue on windows
tiff now outputs tiffd.lib for debug builds.
oiio was not informed about this and had
a build error because of it.
2022-07-26 20:59:44 -06:00
f43a8835dc deps/alembic: add missing imath dependency
if alembic builds before imath it'll cause a build error.
2022-07-26 20:54:27 -06:00
Jun Mizutani
2ca18e78f9 Sculpt: Remove debug printf
Reviewed By: Joseph Eagar
Differential Revision: D15547
Ref D15547
2022-07-26 14:25:58 -07:00
b75d0c7e7a Geometry Nodes: Implement link drag search for two nodes
It was never added for the field on domain and field at index nodes.
They need special handling because they have many sockets for what
should be a single multi-type socket declaration.
2022-07-26 16:12:42 -05:00
f14f81e5ac Nodes: Allow using escape key to exit node resizing 2022-07-26 16:03:43 -05:00
faa0c7aa6f Cleanup: Move mesh_tessellate.c to C++ 2022-07-26 14:41:34 -05:00
Falk David
88f0d483bd Python: Expose property to mute action groups
This patch adds a `mute` RNA property on `ActionGroup`s that allows them to be easily muted/unmuted from python.
This uses the existing `AGRP_MUTED` flag which was also accessible from the user interface.

Reviewed By: sybren

Differential Revision: https://developer.blender.org/D15329
2022-07-26 21:32:05 +02:00
8571093f99 GPencil: Small UI change in overlay for consistency
For consistency, it is better to add the word `Inactive` to `Fade Layers` and `Fade Objects`, matching the naming used in other areas of the overlay panel.

Reviewed by: Matias Mendiola
2022-07-26 16:39:17 +02:00
2b8e35eeb0 Fix T99984: Small GPencil overlay UI bugs in Edit Mode
This commit fixes the curve opacity issue by hiding the option.

The drawing of curve points and handles uses the same code as mesh curves, where opacity is not supported. Until this feature is added for mesh curves and gpencil, it is better to hide this option.

Reviewed: Matias Mendiola

Note: The handle problem reported in this task was fixed in a separate commit: 203e7ba332
2022-07-26 16:34:27 +02:00
4f33dcff78 Merge branch 'active_modal_operator' into retopo_transform 2022-07-26 10:00:23 -04:00
fd39da1df6 Merge branch 'master' into retopo_transform 2022-07-26 10:00:14 -04:00
dbdab681cf Added operator to determine if operator is actively modal
Differential Revision: https://developer.blender.org/D15546
2022-07-26 09:53:25 -04:00
66c6cf0d71 Added operator to determine if operator is actively modal 2022-07-26 09:50:35 -04:00
1998269b10 Refactor: Extract color attributes as generic attributes
Previously there was a special extraction process for "vertex colors"
that copied the color data to the GPU with a special format. Instead,
this patch replaces this with use of the generic attribute extraction.
This reduces the number of code paths, allowing easier optimization
in the future.

To make it possible to use the generic extraction system for attributes
but also assign aliases for use by shaders, some changes are necessary.
First, the GPU material attribute can now store whether it actually refers
to the default color attribute, rather than a specific name. This replaces
the hack to use `CD_MCOL` in the color attribute shader node. Second,
the extraction code checks the names against the default and active
names and assigns aliases if the request corresponds to a special active
attribute. Finally, support for byte color attributes was added to the
generic attribute extraction.

Differential Revision: https://developer.blender.org/D15205
2022-07-26 08:37:38 -05:00
5945a90df9 Fix T98788: bad first curve tangent when first points have same position 2022-07-26 15:25:59 +02:00
72f77598a2 Fix T98798: tag collection geometry when changing instance offset
Changing the instance offset moves the entire "collection geometry".
So other features that depend on the geometry should be reevaluated.
2022-07-26 14:59:40 +02:00
ac1554bcf6 Fix T98982: cannot change default value of some node group input types 2022-07-26 14:49:12 +02:00
5aba7f9774 Geometry Nodes: Hide value button for field at index node
Changing the value doesn't accomplish anything, since the retrieved
value would be the same for every index then. So it's best to hide it
to make the node clearer.
2022-07-26 07:42:32 -05:00
78b7140b02 deps: update TIFF and OpenEXR
* OpenEXR 3.1.4 -> 3.1.5, this fixes several issues OSS-Fuzz found.
* libtiff 4.3.0 -> 4.4.0, this fixes several CVEs.

This also converts the harvest of libtiff on Windows to a post install handler.
There are a few left, but Windows is getting close to being harvest free.

Differential Revision: https://developer.blender.org/D15478
2022-07-26 13:25:58 +02:00
aa788b759a deps: FFmpeg vpx/aom-av1 updates
This is a refresh of our current FFmpeg 5.0.0 (unchanged) version with the
following changes:

* libvpx all platforms: enable SSE3/4/AVX/AVX2 instruction sets. libvpx has a
  proper CPUID check in place and will not call the faster kernels unless it is
  sure the CPU supports it. So we can safely enable this, this partially
  resolves T95743 (completely on Linux and macOS).

* libvpx Windows - threading was disabled due to a shared dependency on
  libwinpthreads.dll which we prefer not to distribute. However when configure
  cannot find pthreads it will happily fall back on a win32 threads based
  emulation layer. This also resolves the final part of T95743.

* libaom-av1 - new dependency required for D14920, this is a somewhat odd
  dependency, it's cmake based, but still needs the perl environment setup, so
  we have to set up the env and call cmake ourselves for the configure, build
  and install commands. This dep has the same libwinpthreads issue as vpx on
  Windows, however since it's cmake based, it's easier to prevent cmake from
  detecting it.

Differential Revision: https://developer.blender.org/D15399
2022-07-26 13:25:58 +02:00
caf907626d Fix T99271: modifier errors are not cleared 2022-07-26 13:01:30 +02:00
c5712c6795 Fix T99373: add some padding in spreadsheet vector columns
This improves readability in some cases (e.g. in T99373).
2022-07-26 12:36:44 +02:00
Nate Rupsis
b08c5381ac default N-panel open for animation editors
The properties panels (N-panels) of the Graph, Driver, and Dopesheet
editors (and their sub-modes) are now open by default. This includes the
editors in the default Animation workspace.

Note that, because the Timeline is implemented as a special mode of the
Dopesheet, switching between Timeline and Dopesheet will *not* change
the visibility of the properties panel.

Maniphest Tasks: T97980

Differential Revision: https://developer.blender.org/D14910
2022-07-26 11:43:05 +02:00
e4a779264c Partially revert "Build: Fix build of library dependencies on Linux aarch64"
This reverts the Flex-related parts of commit
rBef268c78933079137288e326704431432adf9ad9, as those caused a build
error on CentOS 7 (which is used for the precompiled Linux libraries).

CentOS 7 only has Automake 1.13, whereas after this commit version 1.15
seems to be required.

Since in its patch description (D15319) it's mentioned that this
"probably doesn't warrant changing", and it's actually blocking the
build of the precompiled libraries for Blender 3.3 now, I'll revert the
Flex-related part of the commit.
2022-07-26 11:43:05 +02:00
Iliay Katueshenock
c94ca54cda BLI: add use_threading parameter to parallel_invoke
`parallel_invoke` allows executing functions on separate threads.
However, creating tasks in tbb has a measurable amount of overhead.
Therefore, it can be beneficial to disable parallelization when
the amount of work done per function is small.

See D15539 for some benchmark results.

Differential Revision: https://developer.blender.org/D15539
2022-07-26 11:10:16 +02:00
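The idea of a `use_threading` switch can be sketched as follows. This is not Blender's implementation (which is built on tbb); it is a standalone approximation using `std::async`, and the function name `parallel_invoke_sketch` and its two-callable signature are assumptions made for the example.

```cpp
#include <cassert>
#include <future>

// Hypothetical sketch: run two callables either concurrently or one after the
// other, depending on `use_threading`.
template<typename Fn1, typename Fn2>
void parallel_invoke_sketch(const bool use_threading, Fn1 &&fn1, Fn2 &&fn2)
{
  if (use_threading) {
    // Run the first callable on another thread while this thread runs the second.
    std::future<void> task = std::async(std::launch::async, [&]() { fn1(); });
    fn2();
    task.get();
  }
  else {
    // Small workloads: skip task creation and run serially on this thread.
    fn1();
    fn2();
  }
}
```

A caller would typically derive the flag from the problem size, for example passing `size > some_threshold`, where the threshold is chosen from benchmarks such as those referenced in D15539.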
203e7ba332 GPencil: Update curve handle display after change overlay option
The handles were not updated after changing the settings.

This is a partial fix of T99984
2022-07-26 11:07:28 +02:00
c597d6cb64 Fix T99979: GPencil strokes cannot be edited after set origin
The stroke points were changed but the bounding box calculation was not done and this produced a problem in any bounding box check done by different tools.
2022-07-26 10:53:08 +02:00
c869f54dcb Cleanup: Typo in comments: data-lock -> data-block. 2022-07-26 10:00:20 +02:00
bdb4ebebf1 Cleanup: quiet GCC cast-function-type warnings for gflags 2022-07-26 14:47:12 +10:00
8ab91edd91 Merge branch 'master' into retopo_transform 2022-07-26 00:25:37 -04:00
3ac5a52d6e Retopology Snapping Mode now working 2022-07-26 00:24:37 -04:00
c3bc53162a Cleanup: format 2022-07-26 13:23:45 +10:00
f1f89ca751 Cleanup: spelling in comments 2022-07-26 13:21:21 +10:00
3ae85a0d8f Fix Python SystemExit exceptions silently exiting
Any script that raised a SystemExit called by --python, --python-expr
command line args or by executing the text block would exit without
printing a message. This caused the error from T99966 to be hidden.

Add explicit handling for SystemExit to ensure the message is always
shown before exiting.

More details noted in code-comments.
2022-07-26 13:21:15 +10:00
37ad72ab23 Fix T99966: Python API docs fail to generate
The recent addition of "active_action" [0] required updating in the
API docs type information.

[0]: cd21022b78
2022-07-26 12:51:01 +10:00
4cf6524731 Fix Cycles Metal build errors after recent changes
float8 is a reserved type in Metal, but is not implemented. So rename to
float8_t for now.

Also move back intersection handlers to kernel.metal, they can't be in the
class that encapsulates the other Metal kernel functions.
2022-07-26 00:17:37 +02:00
f76a2c0d18 Fix: Fix attribute writer debug warnings in terminal
Use an imperfect solution, since this code will be replaced soon anyway.
2022-07-25 16:06:28 -05:00
462f99bf38 Sculpt: Fix T99779, pbvh gets wrong active vertex for multires
The recent multires winding fix missed a code branch.
2022-07-25 11:53:48 -07:00
fb9f12eeec UI: Nishita sky: Increase Sun Elevation UI sensitivity and remove min/max
This now uses the default angle precision, which matches the sun rotation.
It feels much more natural.
2022-07-25 19:26:12 +02:00
46dbfce7fc Cycles: Nishita Sky: Fix sun disk imprecision for large elevation
The issue was introduced by rBad5e3d30a2d2, which made it possible to use
an unbounded elevation angle.

To avoid touching the shading code, we just remap the value to the
range the shading code expects. This means that elevation angles
beyond +/-PI/2 effectively flip the sun rotation angle.
2022-07-25 19:26:12 +02:00
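One way to picture the remapping mentioned in 46dbfce7fc is the sketch below. It is an assumption about how such a remap could look, not the code from the commit: `remap_sun_angles` and `sun_direction` are hypothetical names. An elevation outside [-PI/2, PI/2] is folded back into that range and the rotation is turned by PI, which leaves the sun direction unchanged.

```python
import math


def sun_direction(elevation, rotation):
    """Unit vector for the given elevation and rotation angles (radians)."""
    return (
        math.cos(elevation) * math.cos(rotation),
        math.cos(elevation) * math.sin(rotation),
        math.sin(elevation),
    )


def remap_sun_angles(elevation, rotation):
    """Fold an unbounded elevation into [-pi/2, pi/2], flipping rotation if needed."""
    # Wrap the elevation into [-pi, pi) first.
    e = (elevation + math.pi) % (2.0 * math.pi) - math.pi
    if e > math.pi / 2.0:
        e = math.pi - e
        rotation += math.pi
    elif e < -math.pi / 2.0:
        e = -math.pi - e
        rotation += math.pi
    return e, rotation % (2.0 * math.pi)
```

The direction computed from the remapped pair matches the direction from the original pair, so code that only handles the bounded range can be left untouched.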
703dff333c Fix T99459: GPencil: Fill tool on the surface not in the correct place
There is a 1 pixel error in the size registered for the buffer
dimensions.

NOTE: This issue indicates that the texture scale is different from the
region, so the mouse-based coordinates used are actually misaligned.
This misalignment will be fixed in another commit.

Regression probably introduced in rB1d49293b8044 + rB45f167237f0c8
2022-07-25 14:13:48 -03:00
00a3533429 Curves: Unify poll functions, add message with no surface
The "snap to surface" operators now have "disabled" poll messages
when there is no surface object.

The implementation in most curves operators is also unified.
The goal is to avoid having to define and use the poll failure messages
in multiple places, to reduce the boilerplate that tends to be
necessary to add an operator, and to increase the likelihood that
operators are implemented with proper poll messages.

Differential Revision: https://developer.blender.org/D15528
2022-07-25 11:59:33 -05:00
739136caca Fix: Assert in resample curve node with single point curve 2022-07-25 11:53:22 -05:00
f26aa186b2 Cleanup: remove __KERNEL_CPU__
This was tested in some places to check if code was being compiled for the
CPU, however this is only defined in the kernel. Checking __KERNEL_GPU__
always works.
2022-07-25 17:43:35 +02:00
Andrii Symkin
793d203139 Cycles: add math functions for float8
This patch adds the math functions required for float8, making it possible
to use float8 instead of float3 for color data.

Differential Revision: https://developer.blender.org/D15525
2022-07-25 17:36:58 +02:00
60a8ade18a Merge branch 'master' into retopo_transform 2022-07-25 10:38:16 -04:00
d57ce54e30 UX-related tweaks 2022-07-25 10:35:38 -04:00
7a74d91e32 Cleanup: move device BVH code to kernel/device/*/bvh.h
Having the OptiX/MetalRT/Embree implementations all in one file with
many #ifdefs became too confusing. Instead split it up per device, and also
move it together with device specific hit/filter/intersect functions and
associated data types.
2022-07-25 16:34:22 +02:00
Arye Ramaty
c6ce70855a Geometry Nodes: Add node descriptions/tooltips
This commit adds tooltips to the geometry nodes add menu.

Differential Revision: https://developer.blender.org/D15414
2022-07-25 08:56:24 -05:00
881ef0548a Fix wrong Cycles SSS intersection distance after ray distance changes
No need anymore to have a difference between CPU/GPU, all distances
remain in world space.
2022-07-25 15:19:29 +02:00
484ad31653 Cycles: simplify handling of ray distance in GPU rendering
All our intersection functions now work with unnormalized ray direction,
which means we no longer need to transform ray distance between world and
object space, they can all remain in world space.

There doesn't seem to be any real performance difference one way or the
other, but it does simplify the code.
2022-07-25 13:27:40 +02:00
023eb2ea7c Cycles: more closely match some math and intersection operations in Embree
This helps with debugging, and gives a slightly closer match between CPU
and CUDA/HIP/Metal renders when it comes to ray tracing precision.
2022-07-25 13:27:40 +02:00
d567785658 Fix T99816: renaming attribute works incorrectly
This fixes two issues:
* There was a crash when the new attribute name was empty.
* The attribute name was incremented (e.g. "Attribute.001") when
  the old and new name were the same.
2022-07-25 13:16:59 +02:00
332d547ab7 Fix T99850: incorrect tangents on evaluated bezier curves
Cyclic curves don't need the tangent correction based on the first
and last handle position.
2022-07-25 13:10:34 +02:00
b9e66af686 Fix T99851: Subdivide Curve node does not initialize attributes of end point 2022-07-25 12:45:04 +02:00
2c81b4d4cf Fix T99880: no node timing for frames in node groups 2022-07-25 12:27:45 +02:00
5feb3541f4 Fix T99889: Fillet Curve node uses wrong radius 2022-07-25 12:20:54 +02:00
Ramil Roosileht
cf9dd3c0d8 Fix T99036: hex color in "Add Color Attribute"
Proposed solution by @scurest. The color attribute in the RNA was tagged as
COLOR_GAMMA. This change makes it a regular COLOR.

{F13217692}

Reviewed By: joeedh, jbakker

Maniphest Tasks: T99036

Differential Revision: https://developer.blender.org/D15272
2022-07-25 12:15:27 +02:00
6f1cdcba85 Fix T99929: lattice modifier looks up vertex group index in wrong place
It looked up the vertex group index based on the object instead of the
actual mesh that is currently used. Since geometry nodes, the number
and order of attributes can change in arbitrary ways during evaluation.
Therefore, this index has to be looked up on the mesh which contains
the most up-to-date information.

There are probably similar issues in other modifiers. That has to be
fixed step by step. Ideally by using the attribute api directly eventually.
2022-07-25 11:54:49 +02:00
c5afef1224 Fix missing disabled hint when dragging from Asset Browser in edit mode
When dragging assets into the 3D View while in any other mode than
object mode, dropping would be disabled and the cursor would indicate
that. However there was supposed to be an "Only supported in object
mode" message, that similar operators showed, but got forgotten when
this one was introduced.
2022-07-25 11:44:56 +02:00
1c05f30e4d Curves: add warning when invalid uv map is used when adding curves
UV maps that are used for surface attachment must not have overlapping
uv islands, because then the same uv coordinate would correspond to
multiple surface positions.

Ref T99936.
2022-07-25 11:42:27 +02:00
53113a2e57 Geometry: detect when the sample uv is in multiple triangles 2022-07-25 11:32:39 +02:00
c5394f3db8 EEVEE-Next: Fix float3 passes being incorrect 2022-07-25 11:25:24 +02:00
f814871e81 EEVEE-Next: Fix some Material compilation errors 2022-07-25 11:25:24 +02:00
47d1a7484c EEVEE-Next: Display compatible properties panels
Only a few are kept not available as their features are not yet supported.
2022-07-25 11:25:24 +02:00
cd9ebc816e Fix build error with WITH_CYCLES_KERNEL_NATIVE_ONLY on macOS Arm
-march=native is not supported for all architectures.
2022-07-25 11:23:25 +02:00
72fb92ded8 Fix T99961: crash when spreadsheet shows volume grids 2022-07-25 11:22:14 +02:00
cacdea7f4a Fix: crash when accessing attributes from multiple threads
Calling two non-const methods on a `MutableAttributeAccessor`
at the same time in multiple threads is not safe.

While I don't know what caused the crash here exactly, I do know
that it happens while looking up the attribute for writing, which
may modify the underlying geometry. I couldn't reproduce the
bug with a debug build or without threading.
2022-07-25 11:14:42 +02:00
Alex Parker
44258b5ad0 Undo: Improve image undo performance
When texture painting, a lot of time is spent in ED_image_paint_tile_find.
This fix stores the PaintTiles in a blender::Map, making ED_image_paint_tile_find an O(1) rather than O(n) operation.

When using threading the locking should happen during read as well,
still this gives a boost in performance as the read is now much faster.

Reviewed By: jbakker

Maniphest Tasks: T99546

Differential Revision: https://developer.blender.org/D15415
2022-07-25 08:14:32 +02:00
7808ee9bd7 Geometry Nodes: Improve UV Sphere primitive performance
In a test producing 10 million vertices I observed a 3.6x improvement,
from 470ms to 130ms. The largest improvement comes from calculating
each mesh array on a separate thread. Besides that, the larger changes
come from splitting the filling of corner and face arrays, and
precalculating sines and cosines for each ring.

Using `parallel_invoke` does give some overhead. On a small 32x16
input, the time went up from 51us to 74us. It could be disabled
for small outputs in the future. The reasoning for this parallelization
method instead of more standard data-size-based parallelism is that the
latter wouldn't be helpful except for very high resolution.
2022-07-24 20:03:16 -05:00
Lukas Stockner
6db059e3d7 Render: Update lightgroup membership in objects and world if lightgroup is renamed
As discussed, this only updates objects in and the world of the scene to which the view layer belongs, which also avoids the problem of not having a BMain available.

Differential Revision: https://developer.blender.org/D14740
2022-07-24 21:33:04 +02:00
f7d5aaa365 Alembic: speed up edge crease import
The Alembic importer uses a linear search over the mesh edges to find
the right edge when setting edge creases. Although the complexity is
`O(m * n)`, with `m` being the number of creased edges, and `n` being
the number of edges, this can lead to a quadratic complexity as `m`
approaches `n`.

This patch uses `EdgeHash` to store and retrieve the edges, which
should bring complexity closer to `O(n)`, provided that lookup is
`O(1)`.

See differential for some timings. In most files, this is expected
to give at least a 2-3x speedup for this operation, but it can lead to an
orders-of-magnitude speed increase for dense meshes with a significant
number of edge creases.

Differential Revision: https://developer.blender.org/D15521
2022-07-24 21:18:11 +02:00
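The complexity argument in f7d5aaa365 can be sketched in Python. This is an illustration only: the commit uses Blender's `EdgeHash`, while the sketch below substitutes a plain `dict`, and the function names are hypothetical. Both functions produce the same result; the first scans all edges for every crease, the second builds a lookup table once.

```python
def edge_key(v1, v2):
    """Order-independent key so (a, b) and (b, a) name the same edge."""
    return (v1, v2) if v1 < v2 else (v2, v1)


def apply_creases_linear(edges, creases):
    """O(m * n): search every edge for each creased edge."""
    sharpness = [0.0] * len(edges)
    for (a, b), value in creases:
        for i, (v1, v2) in enumerate(edges):
            if edge_key(v1, v2) == edge_key(a, b):
                sharpness[i] = value
                break
    return sharpness


def apply_creases_hashed(edges, creases):
    """Roughly O(n + m): build the edge lookup once, then use O(1) lookups."""
    index_of = {edge_key(v1, v2): i for i, (v1, v2) in enumerate(edges)}
    sharpness = [0.0] * len(edges)
    for (a, b), value in creases:
        i = index_of.get(edge_key(a, b))
        if i is not None:
            sharpness[i] = value
    return sharpness
```

With `m` close to `n`, the linear version does on the order of `n * n` comparisons, while the hashed version stays proportional to `n`, assuming constant-time lookups.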
d26c29d8e4 Fix T98367: Light group passes do not work when shadow catcher is used 2022-07-24 20:36:46 +02:00
31365c6b9e Attributes: Use new API for C-API functions
Use the C++ API to implement more of the existing C functions.
This corrects the cases where one tries to add a builtin attribute
with the wrong domain or type on curves, though a better warning
message would be helpful in the future, and also reduces duplication
of the internal logic. Not much more is possible without changing
the interface.
2022-07-24 12:46:28 -05:00
ad632a13d9 EEVEE-Next: Decorrelate Large filter spiral sampling
This avoids correlation artifacts with the jitter pattern itself.
Also try to reduce the visible spiral pattern.
2022-07-24 19:24:50 +02:00
b1c49b3b2a EEVEE-Next: Fix depth accumulation and stability in viewport
The display depth is used to composite Gpencil and Overlays. For it to
be stable we bias it using the dFdx gradient functions. This makes
overlays like edit mode not flicker.

The previous approach to save the 1st center sample does not work anymore
since we jitter the projection matrix in a looping pattern when scene
is updated. So the center depth is only (almost) valid 1/8th of the time.
The biasing technique, even if not perfect, does the job of being stable.

This has a few cons:
- it makes the geometry below the ground plane unlike workbench engine.
- it makes overlays render over geometry at larger depth discontinuities.
2022-07-24 19:24:50 +02:00
a5bcb4c148 EEVEE-Next: Make animated viewport non jittered when disabling denoising 2022-07-24 19:24:50 +02:00
68101fea68 EEVEE-Next: Add back background opacity toggle 2022-07-24 19:24:50 +02:00
8ac5b1fdb3 EEVEE-Next: Make Anti-Flicker stronger
This might make the image a bit blurrier but it reduces the flickering of
shiny surfaces during animation.

This uses the technique described in "High Quality Temporal Supersampling"
by Brian Karis at Siggraph 2014 (Slide 45): Reduce the exponential factor
when the history is close to the bounding box border.
2022-07-24 19:24:50 +02:00
bd9bb56f18 EEVEE-Next: Fix Alt+B render borders
A few offsets were missing.
Reminder that this does not change the actual render resolution but it
reduces the VRAM consumption of accumulation buffers.
2022-07-24 19:24:50 +02:00
364babab65 EEVEE-Next: Fix background velocity 2022-07-24 19:24:50 +02:00
0fcc04e7bf Cleanup: Fix off-by-half-errors with udim search 2022-07-24 14:48:30 +12:00
f1f2c26223 Cleanup: Simplify uv sculpt tool
No functional changes.
2022-07-24 13:48:53 +12:00
c94c0d988a Fix: Removing attributes from UI invalidates caches
Use the new attribute API to implement the attribute remove function
used by RNA, except for BMesh attributes. Currently, removing curve
attributes from the panel in the property editor does not mark the
relevant caches dirty (for example, the cache of curve type counts),
because that behavior is implemented with the new attribute API.
Also, eventually we want to merge the two APIs, and removing an
attribute is the first function that can be partially implemented
with the new API.

Differential Revision: https://developer.blender.org/D15495
2022-07-23 19:59:59 -05:00
0c3851d31f EEVEE-Next: Film: Rename filter_size for clarity and add box filter ...
... as a debug option.
2022-07-23 22:57:10 +02:00
3ea2b4ac31 EEVEE-Next: Film: Fix incorrect anti-aliasing
There was a confusion about what space the offset was in.
2022-07-23 22:57:10 +02:00
7c6d546f3a Fix an assert trip in boolean tickled by D11272 example.
The face merging code in exact boolean made an assumption that
the tessellated original face was manifold except at the boundaries.
This should be true but sometimes (e.g., if the input faces have
self-intersection, as happens in the example), it is not.
This commit makes face merging tolerant of such a situation.
It might leave some stray edges from triangulation, but it should
only happen if the input is malformed.
Note: the input may be malformed if there were previous booleans
in the stack, since snapping the exact result to float coordinates
is not guaranteed to leave the mesh without defects.

This is the second try at this commit. The previous one had a typo
in it -- luckily, the tests caught the problem.
2022-07-23 12:15:59 -04:00
d53ea1d0af Fix T99905: wrong toposort when the node tree is cyclic 2022-07-23 14:37:58 +02:00
092732d113 IO: speed up import of large amounts of objects in USD/OBJ by pre-sorting objects by name
Previously, when creating "very large" (tens-hundreds of thousands)
amounts of objects, the Blender code that was ensuring name
uniqueness was the bottleneck. That got recently addressed (D14162),
however now sorting of IDs by their names is the remaining bottleneck.

Name sorting code in Blender is optimized for the pattern where names
are inserted in already sorted order (i.e. objects expect to get added
near the end of the list). By doing this pre-sorting of objects
intended to get created by an importer (USD and OBJ, in this patch),
this sorting bottleneck can be largely removed, especially with very
high object counts.

Windows, Ryzen 5950X, import times:

- OBJ, splash screen scene (26k objects): 22.0s -> 20.7s
- USD, Disney Moana scene (250k objects): 585s -> 82.2s (10 minutes -> 1.5 minutes)

Reviewed By: Michael Kowalski, Howard Trickey
Differential Revision: https://developer.blender.org/D15506
2022-07-23 15:16:14 +03:00
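The effect described in 092732d113 can be reproduced with a toy model. The sketch assumes a sorted list where each insertion scans backwards from the end, which is an assumption made for illustration and not Blender's actual ID sorting code; the function names are hypothetical. Names that arrive already sorted cost one comparison each, while names arriving in reverse order cost a full scan each.

```python
def insert_sorted_from_end(names, name):
    """Insert `name` into the sorted list `names`, scanning from the end.

    Returns the number of comparisons performed.
    """
    i = len(names)
    comparisons = 0
    while i > 0:
        comparisons += 1
        if names[i - 1] <= name:
            break
        i -= 1
    names.insert(i, name)
    return comparisons


def import_names(incoming):
    """Insert all names one by one; return (total comparisons, sorted list)."""
    names = []
    total = sum(insert_sorted_from_end(names, name) for name in incoming)
    return total, names
```

For 200 names, pre-sorted input needs 199 comparisons in this model, while reverse-sorted input needs 19900, which mirrors why pre-sorting in the importer removes most of the sorting cost at high object counts.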
beb746135d Fix T99830: missing update after reordering node group sockets 2022-07-23 13:30:15 +02:00
5da807e00f Fix: Store Named Attribute node not working when attribute did not exist 2022-07-23 12:14:45 +02:00
fc8b9efb24 Update RNA to User manual mappings 2022-07-22 21:00:34 -04:00
82467e5dcf Cleanup: Typo with uv sphere normal creation
Regression from 087f27a52f
2022-07-23 09:12:13 +12:00
80b2fc59d1 Fix T99873: Use evaluated vertex groups in armature modifier
Geometry nodes has added the ability to modify mesh vertex groups
during evaluation (see 3b6ee8cee7). However, the armature
modifier always uses the vertex groups from the original object.
This is wrong for the modifier stack, where each modifier is meant
to use the output of the previous.

This commit makes the armature modifier use the evaluated vertex groups
if they are available. Otherwise it uses the originals like before.

Differential Revision: https://developer.blender.org/D15515
2022-07-22 15:49:53 -05:00
7d8b651268 EEVEE-Next: Add exposure awareness to denoising
This uses the exposure to get a better approximation of the perceptual
brightness of a sample before accumulating it.

Note that we do not modify exposure of the image. Only the sample weights
are computed differently.
2022-07-22 21:03:06 +02:00
676a2f690c EEVEE-Next: Fix render not working
The swaps during accumulation were ignored because of the way the
`SwapChain<>` implementation works.

Using external references and updating them fixes the issue.
2022-07-22 20:32:17 +02:00
35843ddcd8 Fix T99835: Incorrect title case for two node names 2022-07-22 11:36:55 -05:00
98395e0bdf Cleanup: Use r_ prefix for boolean return parameters
Also rearrange some lines to simplify logic.
2022-07-22 10:49:09 -05:00
a735b2c335 Merge branch 'master' into retopo_transform 2022-07-22 11:41:35 -04:00
73aa6b8185 snap menu says "Tool" when retopo tool is active 2022-07-22 11:39:19 -04:00
c40971d79a Fix T99873: Store named attribute node cannot write to vertex groups
Since fd5e5dac89, the node would remove the attribute before
adding it again, which lost the vertex group status of an attribute,
meaning they were written as arbitrary attributes.

Now, the node first tries to write to attributes with the same domain
and data-type, which covers the vertex group case. Then it falls back
to removing the attribute and adding it again. Even that can fail
though, so I added an error message to make that a bit clearer.

Differential Revision: https://developer.blender.org/D15514
2022-07-22 10:31:40 -05:00
e4eaf424b9 Fix nodes not transforming
Error in {rB98bf714b37c1}
2022-07-22 12:17:22 -03:00
6bcda04d1f Geometry Nodes: Port sample curves node to new data-block
Use the newer more generic sampling and interpolation functions
developed recently (ab444a80a2) instead of the `CurveEval` type.
Functions are split up a bit more internally, to allow a separate mode
for supplying the curve index directly in the future (T92474).

In one basic test, the performance seems mostly unchanged from 3.1.

Differential Revision: https://developer.blender.org/D14621
2022-07-22 09:59:28 -05:00
1f94b56d77 Curves: support sculpting on deformed curves
Previously, curves sculpt tools only worked on original data. This was
very limiting, because one could effectively only sculpt the curves when
all procedural effects were turned off. This patch adds support for curves
sculpting while looking at the result of procedural effects (like deformation
based on the surface mesh). This functionality is also known as "crazy space"
support in Blender.

For more details see D15407.

Differential Revision: https://developer.blender.org/D15407
2022-07-22 15:39:41 +02:00
Germano Cavalcante
98bf714b37 Refactor: arrange transform convert functions in 'TransConvertTypeInfo'
Simplify the transform code by bundling the TransData creation, Data
recalculation, and special updates into a single struct.

So similar functions and parameters can be accessed without special
type checks.

Differential Revision: https://developer.blender.org/D15494
2022-07-22 10:01:27 -03:00
185eeeaaac GHOST/Wayland: Fix mouse wheel events for Sway Compositor (use seat v5)
Bump the requested seat version to v5, use discrete scroll callback.

Tested with gnome, river & sway.
2022-07-22 22:15:00 +10:00
003dfae270 Cycles: enable oneAPI in Linux release builds
0f50ae131f didn't do it reliably
since it was deactivated explicitly a bit above.
2022-07-22 13:03:49 +02:00
e0d4aede4d BMesh: move bmesh_mesh to C++
This allows parts of the code to be threaded more easily.
2022-07-22 20:40:31 +10:00
95e60b4ffd Cleanup: move crazyspace.c to c++
Doing this in preparation for D15407.
2022-07-22 12:33:08 +02:00
087f27a52f Fix T87779: Asymmetric vertex positions in circles primitives
Add sin_cos_from_fraction which ensures each quadrant has matching
values when their sign is flipped.
2022-07-22 13:59:36 +10:00
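The symmetry idea behind 087f27a52f can be sketched as follows. This is not Blender's `sin_cos_from_fraction`; it is a hypothetical Python version (`sin_cos_from_fraction_sketch`) that assumes the approach of folding the angle into the first octant with integer arithmetic, evaluating `sin`/`cos` once there, and then restoring the quadrant by swapping and negating. Mirrored points then share bit-identical magnitudes.

```python
import math


def sin_cos_from_fraction_sketch(numerator, denominator):
    """Return (sin, cos) of 2*pi*numerator/denominator with exact mirror symmetry."""
    n = numerator % denominator
    # angle = (pi/2) * (quadrant + r / denominator), computed with integers.
    quadrant, r = divmod(4 * n, denominator)
    # Fold the in-quadrant angle into the first octant.
    swap = 2 * r > denominator
    if swap:
        r = denominator - r
    if 2 * r == denominator:
        # Exactly 45 degrees: force sin and cos to be the same value.
        s = c = math.sqrt(0.5)
    else:
        a = (math.pi / 2.0) * r / denominator
        s, c = math.sin(a), math.cos(a)
    if swap:
        s, c = c, s
    # Rotate the first-quadrant result into the final quadrant.
    if quadrant == 0:
        return s, c
    if quadrant == 1:
        return c, -s
    if quadrant == 2:
        return -s, -c
    return -c, s
```

Because every point reuses the same first-octant evaluation, the point at index `i` and the point at index `denominator - i` have exactly equal cosines and exactly opposite sines, rather than values that differ in the last bits.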
08c5d99e88 Cleanup: add BKE_image_find_nearest_tile_with_offset
Every caller of BKE_image_find_nearest_tile was calculating the tile offset
so add a version of this function that returns the offset too.
2022-07-22 13:07:24 +10:00
72e249974a Fix crash loading factory settings in image paint mode
Loading factory settings left the region NULL, causing the brushes
poll function to crash.
2022-07-22 12:25:10 +10:00
d3db38cfb1 Cleanup: quiet nonnull-compare warnings with GCC 2022-07-22 12:23:33 +10:00
7725740543 UV: Edge support for select shortest path operator
Calculating shortest path selection in UV edge mode was done using vertex
path logic. Since the UV editor now supports proper edge selection [0],
this approach can sometimes give incorrect results.

This problem is now fixed by adding separate logic to calculate the
shortest path in UV edge mode.

Resolves T99344.

[0]: ffaaa0bcbf

Reviewed By: campbellbarton

Ref D15511.
2022-07-22 11:17:16 +10:00
aa1ffc093c Fix T99884: Crash when converting to old curve type
The conversion from Curves to CurveEval used an incorrect type
for one of the builtin attributes. Also, an incorrect default was used
for reading the nurbs_weight attribute.
2022-07-21 19:44:06 -05:00
7a4a6ccad7 Cleanups: Small changes to armature deform
Use const pointers, remove unused data member for parallel callback,
use listbase macro.
2022-07-21 17:21:56 -05:00
ada6012518 Fix T99854: Crash converting legacy NURBS curves to new type
Creating the attributes was done inside a parallel loop. Also correct a
typo for the parallel grain size, which was meant to be a power of two.
2022-07-21 12:13:42 -05:00
Sebastiano Barrera
a5c2d0018c Fix T91932: number sliders wrap around when dragged for long distance on X11
The value of number sliders (e.g. the "end frame" button) wrap around to
their pre-click value when dragging them for a very long distance (e.g.
by lifting the mouse off the desk and placing it back on to keep
dragging in the same direction).

The problem is X11-specific, and due to XTranslateCoordinates using a
signed int16 behind the curtains, while its signature and the rest of
Blender uses int32. The solution is to only use XTranslateCoordinates on
(0, 0) to get the delta between the screen and client reference systems,
and applying the delta in a second step.

Differential Revision: https://developer.blender.org/D15507
2022-07-21 19:02:03 +02:00
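The wrap-around described in a5c2d0018c can be simulated without X11. In the sketch below, `translate_coordinates` is a hypothetical stand-in that truncates its results to signed 16 bits, mimicking the limitation the commit attributes to XTranslateCoordinates; it is not the Xlib call itself. The second conversion follows the fix outlined in the commit: translate only (0, 0) to obtain the delta, then add the delta using full-width integers.

```python
def to_int16(value):
    """Wrap an integer into the signed 16-bit range [-32768, 32767]."""
    return (value + 0x8000) % 0x10000 - 0x8000


def translate_coordinates(x, y, origin):
    """Stand-in for a translation whose results are limited to int16 (assumption)."""
    return to_int16(x + origin[0]), to_int16(y + origin[1])


def client_to_screen_naive(x, y, origin):
    """Translate the full coordinate: wraps for large values."""
    return translate_coordinates(x, y, origin)


def client_to_screen_delta(x, y, origin):
    """Translate (0, 0) only, then apply the delta with full-width integers."""
    dx, dy = translate_coordinates(0, 0, origin)
    return x + dx, y + dy
```

With a window origin of (100, 50), a client x of 40000 wraps to -25436 in the naive version but stays at 40100 with the delta approach.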
611be46cc9 Cleanup: compiler warning 2022-07-21 19:01:19 +02:00
a36f029459 Fix crash in some very rare case in remapping code.
Actually, 'safe' building of the base hash in view layers (as part of
`BKE_main_collection_sync_remap`) would only happen when there was
already an existing one, otherwise it was skipped, and rebuilt later
(without the support for duplicates) in collection sync code.

Very odd that this error was never spotted before, the issue in the code has
been there for a long time already. Probably only happens in rare cases
(specific conjunction of factors during remapping of old ID into itself
new id)?

Reported by @hjalti from Blender studio. Reproducing case:
`heist/pro/shots/050_alarm/050_0160/050_0160.anim.blend`, r1407
2022-07-21 18:11:13 +02:00
ef5b435e8f DRW: Volume: Fix crash in command line render caused by null textures
This was caused by the world volume shader needing placeholder textures
that were not available until cache populate begins.

Adding a check and creating on the fly fixes the issue.
2022-07-21 16:41:51 +02:00
d431b1416b EEVEE-Next: Add back option to disable TAA (Viewport Denoising) 2022-07-21 16:41:51 +02:00
b0f9639733 Fix crash due to improper handling of new library runtime name_map data on read/write.
Code handling read/write of libraries is still particular... but trying
to call `library_runtime_reset` on a random address at readtime was an
obvious mistake I should have caught during review :(

Regression from rB7f8d05131a77.
2022-07-21 16:39:07 +02:00
396b7a6ec8 Spreadsheet: Implement selection filter for curves sculpt mode
The spreadsheet can retrieve the float selection using the same
utilities as curves sculpt brushes. Theoretically this can work in
original, evaluated, and viewer node modes, at least when the
sculpt selection attributes are able to be propagated.

Differential Revision: https://developer.blender.org/D15393
2022-07-21 09:34:48 -05:00
412d93c298 GPU: Fix compilation with WITH_GPU_BUILDTIME_SHADER_BUILDER option 2022-07-21 15:50:35 +02:00
92eb59341c EEVEE-Next: Filter NaN at output to avoid propagation. 2022-07-21 15:50:35 +02:00
9f00e138ac Cleanup: DRW: common_math_geom_lib.glsl: Fix variable name style 2022-07-21 15:50:35 +02:00
e022753d7a EEVEE-Next: Add Temporal-AntiAliasing
The improvements over the old implementation are:
- Improved history reprojection filter (catmull-rom)
- Use proper velocity for history reprojection.
- History clipping is now done in YCoCg color space using better algorithm.
- Velocity is dilated to keep correct edge anti-aliasing on moving objects.

As a result, the 3x3 blocks that made the image smoother in the previous
implementation are no longer visible and are replaced by correct anti-aliasing.

This removes the velocity resolve pass in order to reduce the bandwidth
usage. The velocities are just resolved as they are loaded in the film
pass.
2022-07-21 15:50:35 +02:00
2bad3577c0 DRW: common_math_geom_lib.glsl: Add line_aabb_clipping_dist 2022-07-21 15:50:35 +02:00
4ba6bac2f1 Fix build error in tests binary after previous commit
Also remove an unused include and add a comment,
const, use the math namespace.
2022-07-21 08:30:07 -05:00
63be57307e Cleanup: Rename length parameterization interpolation function
The name makes more sense as an action, other interpolation
methods besides linear probably don't make sense here anyway.
2022-07-21 08:15:06 -05:00
95ab16004d Cleanup: Remove debug print in test 2022-07-21 08:00:30 -05:00
03338e0270 GHOST/Wayland: fix cursor glitch after grabbing while hidden
When the cursor grabbing was disabled, Blender's internal location
(wmWindow.eventstate) kept the location before un-hiding.

This caused the paint cursor to show in the wrong location after
adjusting the color wheel for e.g.
2022-07-21 21:47:41 +10:00
a06b04f92d Cleanup: Simplify relation flags assignment 2022-07-21 12:54:35 +02:00
2034e8c42d Geometry Nodes: add debug check for whether AttributeWriter.finish is called
Calling `finish` after writing to generic attributes is currently necessary for
correctness. Previously, this was easy to forget. Now there is a check for this
in debug builds.
2022-07-21 12:47:44 +02:00
538da79c6d Curves: fix applying materials when applying modifier
The issue was that geometry nodes was run on the original curves,
and set a pointer to an evaluated material id on it. The fix is to not
mix up original and evaluated data by making sure that geometry nodes
does not modify the original data.
2022-07-21 12:23:38 +02:00
d099e0d2a4 Cleanup: Make automated code check happy.
- Assert that one of the two branches in
  `id_override_library_create_hierarchy` is always processed.
- Init success value regardless.
2022-07-21 12:18:57 +02:00
f7252e9692 Cleanup: Unused forward declaration 2022-07-21 12:16:31 +02:00
10b048fd9e Fix T99885: Invalid dependency graph state when curves surface is invisible
Differential Revision: https://developer.blender.org/D15510
2022-07-21 11:26:36 +02:00
Bastien Montagne
ee3facd087 LibOverride: support 'make override' for all selected items.
This commit allows to select several data-blocks in the outliner and
create overrides from all of them, not only the active one.

It properly creates a single hierarchy when several IDs from the same
hierarchy root data are selected.

Reviewed By: Severin

Differential Revision: https://developer.blender.org/D15497
2022-07-21 10:18:43 +02:00
0dcee6a386 Fix T99733: Objects with driven visibility are evaluated when not needed
The issue was caused by the fact that objects with driven or animated
visibility were considered visible by the dependency graph evaluation.

This change makes it so the dependency graph evaluation is aware of
visibility which might be changing. This is achieved by evaluating the
path of the graph which affects objects visibility and adjusts to it
before evaluating the rest of the graph.

There is some time penalty to this, but there does not seem to be a
way to fully avoid this penalty.

With the production shot from the heist project the FPS drops by a
tenth of a frame (~9.4 vs ~9.3 fps) when adding a driver to an object
which keeps it visible. Note that this is a bit hard to measure since
the FPS fluctuates quite a bit throughout the playback. On the other
hand, having a driver on the visibility of a heavy object from a character
and setting visibility to false gives a big speedup.

Also worth noting that there is no penalty at all when there are no
animated visibilities in the scene.

Differential Revision: https://developer.blender.org/D15498
2022-07-21 09:49:16 +02:00
4089b7b80b Depsgraph: Clear operation evaluation flags early on
The goal is to make it possible to evaluate the graph in multiple
passes without evaluating the same node multiple times.

Currently there should not be any functional changes.
2022-07-21 09:48:59 +02:00
d6faee2824 Cleanup: format 2022-07-21 17:45:36 +10:00
2eeedbbca9 Cleanup: add ISMOUSE_MOTION macro
Replace verbose ELEM(..) usage, now each kind of mouse event has its
own macro.
2022-07-21 16:23:33 +10:00
7a73685460 Fix WM_event_type_mask_test ignoring wheel and gesture events
WM_event_type_mask_test checks assumed ISMOUSE macro worked for any
kind of mouse event when it only accepted buttons & motion.

Now ISMOUSE checks for any kind of mouse event,
use ISMOUSE_BUTTON/WHEEL/GESTURE for more specific checks.
2022-07-21 16:07:11 +10:00
095b8d8688 WM: replace ISMOUSE with ISMOUSE_BUTTON
The ISMOUSE macro was used in situations where only button events
needed to be checked.

The only functional difference would be MOUSEMOVE events were
previously accepted for these checks.
2022-07-21 15:54:39 +10:00
4ec0a8705b WM: categorize smart-zoom as a gesture
Event handling and the enum definition documents MOUSESMARTZOOM
as a gesture however it wasn't accepted by ISMOUSE_GESTURE,
instead it was added to the ISMOUSE macro.

Move the type check to ISMOUSE_GESTURE.
2022-07-21 15:28:01 +10:00
dd158f1cab Fix failing cycles test from previous commit
Deprecated custom data type CD_MTEXPOLY has inconsistent data usage.

Reviewed By: Campbell Barton
2022-07-21 16:28:56 +12:00
c171e8b95c Fix T90620: Ignore missing UV data caused by corrupt .blend file
Add crash protection and partial recovery for corrupt .blend files,
particularly for missing UV data.

Differential Revision: https://developer.blender.org/D15489
2022-07-21 15:24:38 +12:00
46a2592eef Cleanup: spelling in comments, typos in tool-tips 2022-07-21 13:21:53 +10:00
e75adb979b Fix T99678: Crash applying non-existent modifiers
Regression in [0] accessed the modifier type before NULL check.

[0]: 78fc5ea1c3
2022-07-21 12:52:24 +10:00
9f68369247 Fix T99687: Cloth filter crash
The code was failing to exclude the sculpt object from
the list of collision objects.
2022-07-20 15:17:07 -07:00
eb281e4b24 Fix T99878: Deleting curves or points removes anonymous attributes
Use the attribute API instead of the CustomData API, to correctly
handle anonymous attributes and simplify the code. One non-obvious
thing to note is that the type counts are recalculated by the "finish"
function of the `curve_type` attribute, so they don't need to be copied
explicitly. Also, the mutable attribute accessor cannot be a reference
if we want to give it an rvalue, which is convenient in this case.
2022-07-20 16:40:05 -05:00
698efac59e Merge branch 'master' into retopo_transform 2022-07-20 17:13:36 -04:00
afe11eff8a tweaked snap options and snap option descriptions
- retopo mode changes layout and options under snap menu
- retopo mode auto sets defaults on options not shown in menu
- reworked descriptions of snap options
2022-07-20 17:12:13 -04:00
fe108d85b4 Cleanup: Remove unused function 2022-07-20 14:30:44 -05:00
d34f8ac3d9 Cleanup: Remove unnecessary handling of normals for fluid colliders
The normals are transformed, but not used. It looks like this logic was
just copied from below where the mesh is transformed for creating
emitters, which do use vertex normals.
2022-07-20 13:18:03 -05:00
5d4574ea0e Fix T99340: Image.frame_duration returning wrong value when image not loaded
The logic here was broken in d5f1b9c, it should load the image first.
2022-07-20 18:23:03 +02:00
4ebe1c3e69 Merge branch 'master' into retopo_transform 2022-07-18 12:13:45 -04:00
887713d08d using correct argument now 2022-07-16 10:30:45 -04:00
cc761cdae6 Merge branch 'master' into retopo_transform 2022-07-16 06:53:04 -04:00
a2938c86ca using new property name 2022-07-16 06:52:19 -04:00
a66e20f984 merged in master 2022-07-16 06:46:21 -04:00
db317f070e clean up before split 2022-07-07 15:09:23 -04:00
7502bc583c Merge branch 'master' into transform_api 2022-07-07 14:51:22 -04:00
089870ab3a reorg+cleaned snap menu, removed dbg prints, reorg edge snap methods 2022-07-07 14:50:26 -04:00
298711d158 Merge branch 'master' into transform_api 2022-07-07 10:30:30 -04:00
8d284d4854 merged in master 2022-07-05 17:01:42 -04:00
9e88cfbe0c added retopo mode, updated transform ops 2022-06-28 11:46:00 -04:00
405bbb06f2 Merge branch 'D15154-gizmogroup_fallback' into transform_api 2022-06-08 10:38:29 -04:00
ed8f2bbf5c Merge branch 'D15153-cursor_relative' into transform_api 2022-06-08 10:38:14 -04:00
ddce8e9ea3 Merge branch 'master' into transform_api 2022-06-08 10:37:56 -04:00
1b9e31f004 exposed fallback option to Gizmo type
Differential Revision: https://developer.blender.org/D15154
2022-06-08 10:17:29 -04:00
8791762af0 exposed cursor_warp_relative through api
Differential Revision: https://developer.blender.org/D15153
2022-06-08 10:08:41 -04:00
8d813f2eed added snapping options to transform api 2022-06-08 09:45:03 -04:00
af29d103c6 sync icons 2022-06-08 09:42:57 -04:00
ca336c600b revert transform API changes (moved to another patch) 2022-06-07 16:25:15 -04:00
491ada0a38 revert some transform_ops.c 2022-06-07 16:13:04 -04:00
62f813754d minor change to capitalization of label 2022-06-07 16:10:26 -04:00
cbeb70bdae replaced static_cast with enum operator 2022-06-07 12:09:04 -04:00
139a651434 explicit tests against 0 rather than implicit bool conversion 2022-06-07 11:56:06 -04:00
add307d429 added comment 2022-06-07 11:46:22 -04:00
59e6dc8a93 minor variable name change 2022-06-07 11:44:56 -04:00
6410fe0492 improved comments 2022-06-07 11:22:55 -04:00
b818008ddf addressed reviewer comments, updated versioning (untested) 2022-06-07 11:12:28 -04:00
af9c969768 Merge branch 'master' into D14591-transform_snap_nearest 2022-06-07 09:32:37 -04:00
0b25d923e5 use face raycast initially with face nearest as fallback 2022-06-07 09:31:48 -04:00
a863ba191d Merge branch 'master' into D14591-transform_snap_nearest_old 2022-06-06 17:11:32 -04:00
f606393522 sync with work on laptop 2022-06-03 11:53:52 -04:00
f8b389b121 Merge branch 'master' into arcpatch-D14591 2022-06-03 11:53:34 -04:00
Jon Denning
59adee83e7 Transform Snap: added nearest face snap mode, added snapping options, lightly refactored snapping code.
This diff adds a new face nearest snapping mode, adds new snapping options, and (lightly) refactors code around snapping.

The new face nearest snapping mode will snap transformed geometry to the nearest surface in world space.  In contrast, the original face snapping mode uses projection (raycasting) to snap source to target geometry.  Face snapping therefore only works with what is visible, while nearest face snapping can snap geometry to occluded parts of the scene.  This new mode is critical for retopology work, where some of the target mesh might be occluded.

The nearest face snapping mode has two options: "Snap to Same Target" and "Face Nearest Steps".  When the Snap to Same Object option is enabled, the selected source geometry will stay near the target that it was nearest to before editing started, which prevents the source geometry from snapping to other targets.  The Face Nearest Steps option divides the overall transformation for each vertex into `n` smaller transformations, then applies those `n` transformations with surface snapping interleaved between the steps.  This steps option handles transformations that cross U-shaped objects better.

The new snapping options allow the artist to better control which target objects (objects to which the edited geometry is snapped) are considered when snapping.  In particular, the only option for filtering target objects was a "Project onto Self", which allowed the currently edited mesh to be considered as a target.  Now, the artist can choose any combination of the following to be considered as a target: the active object, any edited object that isn't active (see note below), any non-edited object.  Additionally, the artist has another snapping option to exclude objects that are not selectable as potential targets.

The Snapping Options dropdown has been lightly reorganized to allow for the additional options.

Included in this patch:

  - Refactored the snap-related `#define`s into `enum`s, and refactored enum-related `char`, `short`, and `int` to use the appropriate enum instead.
  - Snap target selection is more controllable for the artist with additional snapping options.
  - Renamed a few of the snap-related functions to better reflect what they actually do now.  For example, `applySnapping` implies that this handles the snapping, while `applyProject` implies something entirely different is done there.  However, better names would be `applySnappingAsGroup` and `applySnappingIndividual`, respectively, where `applySnappingIndividual` previously only did Face snapping.
  - Added an initial coordinate parameter to snapping functions so that the nearest target before transforming can be determined (for "Snap to Same Object"), and so the transformation can be broken into smaller steps (for "Face Nearest Steps").
  - Separated the BVH Tree getter code from mesh/edit mesh to its own function to reduce code duplication.
  - Added icon for nearest face snapping.
  - Updated `startup.blend` so face nearest steps starts at 1, and the snap target selection options have reasonable defaults (include self, include edited, include nonedited)
  - The original "Project onto Self" was actually not correct!  This option should be called "Project onto Active" instead, but that only matters when editing multiple meshes at the same time.  This patch makes this change.

Not included in this patch / future updates:

  - Snapping "Target" is confusingly named, as "Target" is used both for the transformed items (or `SCE_SNAP_TARGET_CLOSEST`, etc.) and for the objects to which the transformed items are snapped (especially Shrinkwrap modifier).  I plan to submit another patch to make this clearer after this is accepted.
  - Many of the functions do not specify in which space the point info (coordinates and normal) is defined.
  - Target selection code could be simplified by separating it from the uber `snap_flag` variable.
  - The snapping dropdown is feeling very disorganized.  Also, since enabling both the Face Projection and the Face Nearest methods does not make sense, perhaps the switch between these methods could be a checkbox (similar to snapping to relative or absolute grid).

Differential Revision: https://developer.blender.org/D14591
2022-06-03 10:48:04 -04:00
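
The "Face Nearest Steps" behaviour described in the commit message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: `nearest_on_unit_circle` is a hypothetical stand-in for the real nearest-surface query, the 2D unit circle is an assumed target surface, and `snap_with_steps` is an invented name.

```python
import math


def nearest_on_unit_circle(p):
    """Hypothetical nearest-surface query: closest point on the unit circle."""
    x, y = p
    length = math.hypot(x, y)
    if length == 0.0:
        # The centre is equidistant from every point; pick one arbitrarily.
        return (1.0, 0.0)
    return (x / length, y / length)


def snap_with_steps(start, delta, steps, nearest=nearest_on_unit_circle):
    """Apply `delta` in `steps` equal parts, snapping to the surface after each part."""
    pos = start
    for _ in range(steps):
        pos = (pos[0] + delta[0] / steps, pos[1] + delta[1] / steps)
        pos = nearest(pos)
    return pos


one_step = snap_with_steps((1.0, 0.0), (-2.0, 0.2), steps=1)
many_steps = snap_with_steps((1.0, 0.0), (-2.0, 0.2), steps=8)
```

With `steps=1` the whole offset is applied before a single snap, so the point jumps to the far side of the circle. With `steps=8` the point is pulled back to the surface after every increment and ends up somewhere else, which is the kind of difference the steps option is meant to exploit on curved or U-shaped targets.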
614 changed files with 15438 additions and 10411 deletions

View File

@@ -139,6 +139,7 @@ if(NOT WIN32 OR ENABLE_MINGW64)
include(cmake/vpx.cmake)
include(cmake/x264.cmake)
include(cmake/xvidcore.cmake)
include(cmake/aom.cmake)
include(cmake/ffmpeg.cmake)
include(cmake/fftw.cmake)
include(cmake/sndfile.cmake)

View File

@@ -42,4 +42,5 @@ endif()
add_dependencies(
external_alembic
external_openexr
external_imath
)

View File

@@ -0,0 +1,45 @@
# SPDX-License-Identifier: GPL-2.0-or-later
if(WIN32)
# The default generator on windows is msbuild, which we do not
# want to use for this dep, as it needs to build with mingw
set(AOM_GENERATOR "Ninja")
# The default flags are full of MSVC options given this will be
# building with mingw, it'll have an unhappy time with that and
# we need to clear them out.
set(AOM_CMAKE_FLAGS )
# CMake will correctly identify pthreads being available, however
# we do not want to use them, as that gains a dependency on
# libpthreadswin.dll which we do not want. When pthreads is not
# available aom will use a pthreads emulation layer using win32 threads
set(AOM_EXTRA_ARGS_WIN32 -DCMAKE_HAVE_PTHREAD_H=OFF)
else()
set(AOM_GENERATOR "Unix Makefiles")
set(AOM_CMAKE_FLAGS ${DEFAULT_CMAKE_FLAGS})
endif()
set(AOM_EXTRA_ARGS
-DENABLE_TESTDATA=OFF
-DENABLE_TESTS=OFF
-DENABLE_TOOLS=OFF
-DENABLE_EXAMPLES=OFF
${AOM_EXTRA_ARGS_WIN32}
)
# This is slightly different from all other deps in the way that
# aom uses cmake as a build system, but still needs the environment setup
# to include perl so we manually setup the environment and call
# cmake directly for the configure, build and install commands.
ExternalProject_Add(external_aom
URL file://${PACKAGE_DIR}/${AOM_FILE}
DOWNLOAD_DIR ${DOWNLOAD_DIR}
URL_HASH ${AOM_HASH_TYPE}=${AOM_HASH}
PREFIX ${BUILD_DIR}/aom
CONFIGURE_COMMAND ${CONFIGURE_ENV} &&
cd ${BUILD_DIR}/aom/src/external_aom-build/ &&
${CMAKE_COMMAND} -G "${AOM_GENERATOR}" -DCMAKE_INSTALL_PREFIX=${LIBDIR}/aom ${AOM_CMAKE_FLAGS} ${AOM_EXTRA_ARGS} ${BUILD_DIR}/aom/src/external_aom/
BUILD_COMMAND ${CMAKE_COMMAND} --build .
INSTALL_COMMAND ${CMAKE_COMMAND} --build . --target install
INSTALL_DIR ${LIBDIR}/aom
)

View File

@@ -116,3 +116,4 @@ download_source(IGC_SPIRV_TOOLS)
download_source(IGC_SPIRV_TRANSLATOR)
download_source(GMMLIB)
download_source(OCLOC)
download_source(AOM)

View File

@@ -1,9 +1,9 @@
# SPDX-License-Identifier: GPL-2.0-or-later
set(FFMPEG_CFLAGS "-I${mingw_LIBDIR}/lame/include -I${mingw_LIBDIR}/openjpeg/include/ -I${mingw_LIBDIR}/ogg/include -I${mingw_LIBDIR}/vorbis/include -I${mingw_LIBDIR}/theora/include -I${mingw_LIBDIR}/opus/include -I${mingw_LIBDIR}/vpx/include -I${mingw_LIBDIR}/x264/include -I${mingw_LIBDIR}/xvidcore/include -I${mingw_LIBDIR}/zlib/include")
set(FFMPEG_LDFLAGS "-L${mingw_LIBDIR}/lame/lib -L${mingw_LIBDIR}/openjpeg/lib -L${mingw_LIBDIR}/ogg/lib -L${mingw_LIBDIR}/vorbis/lib -L${mingw_LIBDIR}/theora/lib -L${mingw_LIBDIR}/opus/lib -L${mingw_LIBDIR}/vpx/lib -L${mingw_LIBDIR}/x264/lib -L${mingw_LIBDIR}/xvidcore/lib -L${mingw_LIBDIR}/zlib/lib")
set(FFMPEG_CFLAGS "-I${mingw_LIBDIR}/lame/include -I${mingw_LIBDIR}/openjpeg/include/ -I${mingw_LIBDIR}/ogg/include -I${mingw_LIBDIR}/vorbis/include -I${mingw_LIBDIR}/theora/include -I${mingw_LIBDIR}/opus/include -I${mingw_LIBDIR}/vpx/include -I${mingw_LIBDIR}/x264/include -I${mingw_LIBDIR}/xvidcore/include -I${mingw_LIBDIR}/zlib/include -I${mingw_LIBDIR}/aom/include")
set(FFMPEG_LDFLAGS "-L${mingw_LIBDIR}/lame/lib -L${mingw_LIBDIR}/openjpeg/lib -L${mingw_LIBDIR}/ogg/lib -L${mingw_LIBDIR}/vorbis/lib -L${mingw_LIBDIR}/theora/lib -L${mingw_LIBDIR}/opus/lib -L${mingw_LIBDIR}/vpx/lib -L${mingw_LIBDIR}/x264/lib -L${mingw_LIBDIR}/xvidcore/lib -L${mingw_LIBDIR}/zlib/lib -L${mingw_LIBDIR}/aom/lib")
set(FFMPEG_EXTRA_FLAGS --pkg-config-flags=--static --extra-cflags=${FFMPEG_CFLAGS} --extra-ldflags=${FFMPEG_LDFLAGS})
set(FFMPEG_ENV PKG_CONFIG_PATH=${mingw_LIBDIR}/openjpeg/lib/pkgconfig:${mingw_LIBDIR}/x264/lib/pkgconfig:${mingw_LIBDIR}/vorbis/lib/pkgconfig:${mingw_LIBDIR}/ogg/lib/pkgconfig:${mingw_LIBDIR}:${mingw_LIBDIR}/vpx/lib/pkgconfig:${mingw_LIBDIR}/theora/lib/pkgconfig:${mingw_LIBDIR}/openjpeg/lib/pkgconfig:${mingw_LIBDIR}/opus/lib/pkgconfig:)
set(FFMPEG_ENV PKG_CONFIG_PATH=${mingw_LIBDIR}/openjpeg/lib/pkgconfig:${mingw_LIBDIR}/x264/lib/pkgconfig:${mingw_LIBDIR}/vorbis/lib/pkgconfig:${mingw_LIBDIR}/ogg/lib/pkgconfig:${mingw_LIBDIR}:${mingw_LIBDIR}/vpx/lib/pkgconfig:${mingw_LIBDIR}/theora/lib/pkgconfig:${mingw_LIBDIR}/openjpeg/lib/pkgconfig:${mingw_LIBDIR}/opus/lib/pkgconfig:${mingw_LIBDIR}/aom/lib/pkgconfig:)
if(WIN32)
set(FFMPEG_ENV set ${FFMPEG_ENV} &&)
@@ -79,6 +79,7 @@ ExternalProject_Add(external_ffmpeg
--disable-librtmp
--enable-libx264
--enable-libxvid
--enable-libaom
--disable-libopencore-amrnb
--disable-libopencore-amrwb
--disable-libdc1394
@@ -125,6 +126,7 @@ add_dependencies(
external_vorbis
external_ogg
external_lame
external_aom
)
if(WIN32)
add_dependencies(

View File

@@ -5,8 +5,6 @@ ExternalProject_Add(external_flex
URL_HASH ${FLEX_HASH_TYPE}=${FLEX_HASH}
DOWNLOAD_DIR ${DOWNLOAD_DIR}
PREFIX ${BUILD_DIR}/flex
# This patch fixes build with some versions of glibc (https://github.com/westes/flex/commit/24fd0551333e7eded87b64dd36062da3df2f6380)
PATCH_COMMAND ${PATCH_CMD} -d ${BUILD_DIR}/flex/src/external_flex < ${PATCH_DIR}/flex.diff
CONFIGURE_COMMAND ${CONFIGURE_ENV} && cd ${BUILD_DIR}/flex/src/external_flex/ && ${CONFIGURE_COMMAND} --prefix=${LIBDIR}/flex
BUILD_COMMAND ${CONFIGURE_ENV} && cd ${BUILD_DIR}/flex/src/external_flex/ && make -j${MAKE_THREADS}
INSTALL_COMMAND ${CONFIGURE_ENV} && cd ${BUILD_DIR}/flex/src/external_flex/ && make install

View File

@@ -25,9 +25,6 @@ if(BUILD_MODE STREQUAL Release)
# glew-> opengl
${CMAKE_COMMAND} -E copy ${LIBDIR}/glew/lib/libglew32.lib ${HARVEST_TARGET}/opengl/lib/glew.lib &&
${CMAKE_COMMAND} -E copy_directory ${LIBDIR}/glew/include/ ${HARVEST_TARGET}/opengl/include/ &&
# tiff
${CMAKE_COMMAND} -E copy ${LIBDIR}/tiff/lib/tiff.lib ${HARVEST_TARGET}/tiff/lib/libtiff.lib &&
${CMAKE_COMMAND} -E copy_directory ${LIBDIR}/tiff/include/ ${HARVEST_TARGET}/tiff/include/
DEPENDS
)
endif()
@@ -177,6 +174,7 @@ harvest(opus/lib ffmpeg/lib "*.a")
harvest(vpx/lib ffmpeg/lib "*.a")
harvest(x264/lib ffmpeg/lib "*.a")
harvest(xvidcore/lib ffmpeg/lib "*.a")
harvest(aom/lib ffmpeg/lib "*.a")
harvest(webp/lib webp/lib "*.a")
harvest(webp/include webp/include "*.h")
harvest(usd/include usd/include "*.h")

View File

@@ -18,9 +18,15 @@ if(WIN32)
set(PNG_LIBNAME libpng16_static${LIBEXT})
set(OIIO_SIMD_FLAGS -DUSE_SIMD=sse2)
set(OPENJPEG_POSTFIX _msvc)
if(BUILD_MODE STREQUAL Debug)
set(TIFF_POSTFIX d)
else()
set(TIFF_POSTFIX)
endif()
else()
set(PNG_LIBNAME libpng${LIBEXT})
set(OIIO_SIMD_FLAGS)
set(TIFF_POSTFIX)
endif()
if(MSVC)
@@ -65,7 +71,7 @@ set(OPENIMAGEIO_EXTRA_ARGS
-DZLIB_INCLUDE_DIR=${LIBDIR}/zlib/include
-DPNG_LIBRARY=${LIBDIR}/png/lib/${PNG_LIBNAME}
-DPNG_PNG_INCLUDE_DIR=${LIBDIR}/png/include
-DTIFF_LIBRARY=${LIBDIR}/tiff/lib/${LIBPREFIX}tiff${LIBEXT}
-DTIFF_LIBRARY=${LIBDIR}/tiff/lib/${LIBPREFIX}tiff${TIFF_POSTFIX}${LIBEXT}
-DTIFF_INCLUDE_DIR=${LIBDIR}/tiff/include
-DJPEG_LIBRARY=${LIBDIR}/jpeg/lib/${JPEG_LIBRARY}
-DJPEG_INCLUDE_DIR=${LIBDIR}/jpeg/include

View File

@@ -3,6 +3,8 @@
set(TIFF_EXTRA_ARGS
-DZLIB_LIBRARY=${LIBDIR}/zlib/lib/${ZLIB_LIBRARY}
-DZLIB_INCLUDE_DIR=${LIBDIR}/zlib/include
-DJPEG_LIBRARY=${LIBDIR}/jpeg/lib/${JPEG_LIBRARY}
-DJPEG_INCLUDE_DIR=${LIBDIR}/jpeg/include
-DPNG_STATIC=ON
-DBUILD_SHARED_LIBS=OFF
-Dlzma=OFF
@@ -24,10 +26,12 @@ add_dependencies(
external_tiff
external_zlib
)
if(WIN32 AND BUILD_MODE STREQUAL Debug)
ExternalProject_Add_Step(external_tiff after_install
COMMAND ${CMAKE_COMMAND} -E copy ${LIBDIR}/tiff/lib/tiffd${LIBEXT} ${LIBDIR}/tiff/lib/tiff${LIBEXT}
DEPENDEES install
)
if(WIN32)
if(BUILD_MODE STREQUAL Release)
ExternalProject_Add_Step(external_tiff after_install
COMMAND ${CMAKE_COMMAND} -E copy ${LIBDIR}/tiff/lib/tiff.lib ${HARVEST_TARGET}/tiff/lib/libtiff.lib &&
${CMAKE_COMMAND} -E copy_directory ${LIBDIR}/tiff/include/ ${HARVEST_TARGET}/tiff/include/
DEPENDEES install
)
endif()
endif()

View File

@@ -45,15 +45,15 @@ set(PTHREADS_HASH f3bf81bb395840b3446197bcf4ecd653)
set(PTHREADS_HASH_TYPE MD5)
set(PTHREADS_FILE pthreads4w-code-${PTHREADS_VERSION}.zip)
set(OPENEXR_VERSION 3.1.4)
set(OPENEXR_VERSION 3.1.5)
set(OPENEXR_URI https://github.com/AcademySoftwareFoundation/openexr/archive/v${OPENEXR_VERSION}.tar.gz)
set(OPENEXR_HASH e990be1ff765797bc2d93a8060e1c1f2)
set(OPENEXR_HASH a92f38eedd43e56c0af56d4852506886)
set(OPENEXR_HASH_TYPE MD5)
set(OPENEXR_FILE openexr-${OPENEXR_VERSION}.tar.gz)
set(IMATH_VERSION 3.1.4)
set(IMATH_VERSION 3.1.5)
set(IMATH_URI https://github.com/AcademySoftwareFoundation/Imath/archive/v${OPENEXR_VERSION}.tar.gz)
set(IMATH_HASH fddf14ec73e12c34e74c3c175e311a3f)
set(IMATH_HASH dd375574276c54872b7b3d54053baff0)
set(IMATH_HASH_TYPE MD5)
set(IMATH_FILE imath-${IMATH_VERSION}.tar.gz)
@@ -163,9 +163,9 @@ set(ROBINMAP_HASH c08ec4b1bf1c85eb0d6432244a6a89862229da1cb834f3f90fba8dc35d8c8e
set(ROBINMAP_HASH_TYPE SHA256)
set(ROBINMAP_FILE robinmap-${ROBINMAP_VERSION}.tar.gz)
set(TIFF_VERSION 4.3.0)
set(TIFF_VERSION 4.4.0)
set(TIFF_URI http://download.osgeo.org/libtiff/tiff-${TIFF_VERSION}.tar.gz)
set(TIFF_HASH 0a2e4744d1426a8fc8211c0cdbc3a1b3)
set(TIFF_HASH 376f17f189e9d02280dfe709b2b2bbea)
set(TIFF_HASH_TYPE MD5)
set(TIFF_FILE tiff-${TIFF_VERSION}.tar.gz)
@@ -633,3 +633,9 @@ set(OCLOC_URI https://github.com/intel/compute-runtime/archive/refs/tags/${OCLOC
set(OCLOC_HASH ab22b8bf2560a57fdd3def0e35a62ca75991406f959c0263abb00cd6cd9ae998)
set(OCLOC_HASH_TYPE SHA256)
set(OCLOC_FILE ocloc-${OCLOC_VERSION}.tar.gz)
set(AOM_VERSION 3.4.0)
set(AOM_URI https://storage.googleapis.com/aom-releases/libaom-${AOM_VERSION}.tar.gz)
set(AOM_HASH bd754b58c3fa69f3ffd29da77de591bd9c26970e3b18537951336d6c0252e354)
set(AOM_HASH_TYPE SHA256)
set(AOM_FILE libaom-${AOM_VERSION}.tar.gz)
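
Each dependency above is pinned by a `*_HASH` / `*_HASH_TYPE` pair. As a rough sketch of what such a pin means, the snippet below checks a downloaded archive against an expected digest before it would be used. The helper names `file_digest` and `matches_pin` are hypothetical and this is not how the build scripts themselves perform the check; it only illustrates the idea with Python's standard `hashlib`.

```python
import hashlib


def file_digest(path, hash_type="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks."""
    hasher = hashlib.new(hash_type.lower())
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_pin(path, expected_hex, hash_type="sha256"):
    """True if the file's digest equals the pinned value (case-insensitive)."""
    return file_digest(path, hash_type) == expected_hex.lower()
```

A call such as `matches_pin("libaom-3.4.0.tar.gz", expected, "SHA256")` would use the `AOM_HASH` value as `expected`; the local file name here is an assumption based on `AOM_FILE`.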

View File

@@ -1,11 +1,13 @@
# SPDX-License-Identifier: GPL-2.0-or-later
if(WIN32)
if("${CMAKE_SIZEOF_VOID_P}" EQUAL "8")
set(VPX_EXTRA_FLAGS --target=x86_64-win64-gcc --disable-multithread)
else()
set(VPX_EXTRA_FLAGS --target=x86-win32-gcc --disable-multithread)
endif()
# VPX is determined to use pthreads which it will tell ffmpeg to dynamically
# link, which is not something we're super into distribution wise. However
# if it cannot find pthread.h it'll happily provide a pthread emulation
# layer using win32 threads. So all this patch does is make it not find
# pthread.h
set(VPX_PATCH ${PATCH_CMD} -p 1 -d ${BUILD_DIR}/vpx/src/external_vpx < ${PATCH_DIR}/vpx_windows.diff)
set(VPX_EXTRA_FLAGS --target=x86_64-win64-gcc )
else()
if(APPLE)
if("${CMAKE_OSX_ARCHITECTURES}" STREQUAL "arm64")
@@ -18,6 +20,16 @@ else()
endif()
endif()
if(NOT BLENDER_PLATFORM_ARM)
list(APPEND VPX_EXTRA_FLAGS
--enable-sse4_1
--enable-sse3
--enable-ssse3
--enable-avx
--enable-avx2
)
endif()
ExternalProject_Add(external_vpx
URL file://${PACKAGE_DIR}/${VPX_FILE}
DOWNLOAD_DIR ${DOWNLOAD_DIR}
@@ -30,11 +42,6 @@ ExternalProject_Add(external_vpx
--enable-static
--disable-install-bins
--disable-install-srcs
--disable-sse4_1
--disable-sse3
--disable-ssse3
--disable-avx
--disable-avx2
--disable-unit-tests
--disable-examples
--enable-vp8
@@ -42,6 +49,7 @@ ExternalProject_Add(external_vpx
${VPX_EXTRA_FLAGS}
BUILD_COMMAND ${CONFIGURE_ENV} && cd ${BUILD_DIR}/vpx/src/external_vpx/ && make -j${MAKE_THREADS}
INSTALL_COMMAND ${CONFIGURE_ENV} && cd ${BUILD_DIR}/vpx/src/external_vpx/ && make install
PATCH_COMMAND ${VPX_PATCH}
INSTALL_DIR ${LIBDIR}/vpx
)

View File

@@ -478,7 +478,7 @@ OCIO_FORCE_BUILD=false
OCIO_FORCE_REBUILD=false
OCIO_SKIP=false
IMATH_VERSION="3.1.4"
IMATH_VERSION="3.1.5"
IMATH_VERSION_SHORT="3.1"
IMATH_VERSION_MIN="3.0"
IMATH_VERSION_MEX="4.0"
@@ -487,7 +487,7 @@ IMATH_FORCE_REBUILD=false
IMATH_SKIP=false
_with_built_imath=false
OPENEXR_VERSION="3.1.4"
OPENEXR_VERSION="3.1.5"
OPENEXR_VERSION_SHORT="3.1"
OPENEXR_VERSION_MIN="3.0"
OPENEXR_VERSION_MEX="4.0"
@@ -627,6 +627,9 @@ WEBP_DEV=""
VPX_USE=false
VPX_VERSION_MIN=0.9.7
VPX_DEV=""
AOM_USE=false
AOM_VERSION_MIN=3.3.0
AOM_DEV=""
OPUS_USE=false
OPUS_VERSION_MIN=1.1.1
OPUS_DEV=""
@@ -1209,7 +1212,7 @@ You may also want to build them yourself (optional ones are [between brackets]):
** [NumPy $PYTHON_NUMPY_VERSION] (use pip).
* Boost $BOOST_VERSION (from $BOOST_SOURCE, modules: $BOOST_BUILD_MODULES).
* TBB $TBB_VERSION (from $TBB_SOURCE).
* [FFMpeg $FFMPEG_VERSION (needs libvorbis, libogg, libtheora, libx264, libmp3lame, libxvidcore, libvpx, libwebp, ...)] (from $FFMPEG_SOURCE).
* [FFMpeg $FFMPEG_VERSION (needs libvorbis, libogg, libtheora, libx264, libmp3lame, libxvidcore, libvpx, libaom, libwebp, ...)] (from $FFMPEG_SOURCE).
* [OpenColorIO $OCIO_VERSION] (from $OCIO_SOURCE).
* Imath $IMATH_VERSION (from $IMATH_SOURCE).
* OpenEXR $OPENEXR_VERSION (from $OPENEXR_SOURCE).
@@ -3000,7 +3003,7 @@ compile_ALEMBIC() {
fi
# To be changed each time we make edits that would modify the compiled result!
alembic_magic=2
alembic_magic=3
_init_alembic
# Force having own builds for the dependencies.
@@ -3048,7 +3051,7 @@ compile_ALEMBIC() {
fi
if [ "$_with_built_openexr" = true ]; then
cmake_d="$cmake_d -D USE_ARNOLD=OFF"
cmake_d="$cmake_d -D USE_BINARIES=OFF"
cmake_d="$cmake_d -D USE_BINARIES=ON" # Tests use some Alembic binaries...
cmake_d="$cmake_d -D USE_EXAMPLES=OFF"
cmake_d="$cmake_d -D USE_HDF5=OFF"
cmake_d="$cmake_d -D USE_MAYA=OFF"
@@ -3634,7 +3637,7 @@ compile_FFmpeg() {
fi
# To be changed each time we make edits that would modify the compiled result!
ffmpeg_magic=5
ffmpeg_magic=6
_init_ffmpeg
# Force having own builds for the dependencies.
@@ -3687,6 +3690,10 @@ compile_FFmpeg() {
extra="$extra --enable-libvpx"
fi
if [ "$AOM_USE" = true ]; then
extra="$extra --enable-libaom"
fi
if [ "$WEBP_USE" = true ]; then
extra="$extra --enable-libwebp"
fi
@@ -4140,30 +4147,34 @@ install_DEB() {
WEBP_USE=true
fi
if [ "$WITH_ALL" = true ]; then
XVID_DEV="libxvidcore-dev"
check_package_DEB $XVID_DEV
if [ $? -eq 0 ]; then
XVID_USE=true
fi
XVID_DEV="libxvidcore-dev"
check_package_DEB $XVID_DEV
if [ $? -eq 0 ]; then
XVID_USE=true
fi
MP3LAME_DEV="libmp3lame-dev"
check_package_DEB $MP3LAME_DEV
if [ $? -eq 0 ]; then
MP3LAME_USE=true
fi
MP3LAME_DEV="libmp3lame-dev"
check_package_DEB $MP3LAME_DEV
if [ $? -eq 0 ]; then
MP3LAME_USE=true
fi
VPX_DEV="libvpx-dev"
check_package_version_ge_DEB $VPX_DEV $VPX_VERSION_MIN
if [ $? -eq 0 ]; then
VPX_USE=true
fi
VPX_DEV="libvpx-dev"
check_package_version_ge_DEB $VPX_DEV $VPX_VERSION_MIN
if [ $? -eq 0 ]; then
VPX_USE=true
fi
OPUS_DEV="libopus-dev"
check_package_version_ge_DEB $OPUS_DEV $OPUS_VERSION_MIN
if [ $? -eq 0 ]; then
OPUS_USE=true
fi
AOM_DEV="libaom-dev"
check_package_version_ge_DEB $AOM_DEV $AOM_VERSION_MIN
if [ $? -eq 0 ]; then
AOM_USE=true
fi
OPUS_DEV="libopus-dev"
check_package_version_ge_DEB $OPUS_DEV $OPUS_VERSION_MIN
if [ $? -eq 0 ]; then
OPUS_USE=true
fi
# Check cmake version and disable features for older distros.
@@ -4546,6 +4557,9 @@ install_DEB() {
if [ "$VPX_USE" = true ]; then
_packages="$_packages $VPX_DEV"
fi
if [ "$AOM_USE" = true ]; then
_packages="$_packages $AOM_DEV"
fi
if [ "$OPUS_USE" = true ]; then
_packages="$_packages $OPUS_DEV"
fi
@@ -4846,21 +4860,27 @@ install_RPM() {
WEBP_USE=true
fi
if [ "$WITH_ALL" = true ]; then
VPX_DEV="libvpx-devel"
check_package_version_ge_RPM $VPX_DEV $VPX_VERSION_MIN
if [ $? -eq 0 ]; then
VPX_USE=true
fi
VPX_DEV="libvpx-devel"
check_package_version_ge_RPM $VPX_DEV $VPX_VERSION_MIN
if [ $? -eq 0 ]; then
VPX_USE=true
fi
AOM_DEV="libaom-devel"
check_package_version_ge_RPM $AOM_DEV $AOM_VERSION_MIN
if [ $? -eq 0 ]; then
AOM_USE=true
fi
OPUS_DEV="libopus-devel"
check_package_version_ge_RPM $OPUS_DEV $OPUS_VERSION_MIN
if [ $? -eq 0 ]; then
OPUS_USE=true
fi
if [ "$WITH_ALL" = true ]; then
PRINT ""
install_packages_RPM libspnav-devel
OPUS_DEV="libopus-devel"
check_package_version_ge_RPM $OPUS_DEV $OPUS_VERSION_MIN
if [ $? -eq 0 ]; then
OPUS_USE=true
fi
fi
PRINT ""
@@ -5245,6 +5265,9 @@ install_RPM() {
if [ "$VPX_USE" = true ]; then
_packages="$_packages $VPX_DEV"
fi
if [ "$AOM_USE" = true ]; then
_packages="$_packages $AOM_DEV"
fi
if [ "$OPUS_USE" = true ]; then
_packages="$_packages $OPUS_DEV"
fi
@@ -5434,30 +5457,34 @@ install_ARCH() {
WEBP_USE=true
fi
if [ "$WITH_ALL" = true ]; then
XVID_DEV="xvidcore"
check_package_ARCH $XVID_DEV
if [ $? -eq 0 ]; then
XVID_USE=true
fi
XVID_DEV="xvidcore"
check_package_ARCH $XVID_DEV
if [ $? -eq 0 ]; then
XVID_USE=true
fi
MP3LAME_DEV="lame"
check_package_ARCH $MP3LAME_DEV
if [ $? -eq 0 ]; then
MP3LAME_USE=true
fi
MP3LAME_DEV="lame"
check_package_ARCH $MP3LAME_DEV
if [ $? -eq 0 ]; then
MP3LAME_USE=true
fi
VPX_DEV="libvpx"
check_package_version_ge_ARCH $VPX_DEV $VPX_VERSION_MIN
if [ $? -eq 0 ]; then
VPX_USE=true
fi
VPX_DEV="libvpx"
check_package_version_ge_ARCH $VPX_DEV $VPX_VERSION_MIN
if [ $? -eq 0 ]; then
VPX_USE=true
fi
OPUS_DEV="opus"
check_package_version_ge_ARCH $OPUS_DEV $OPUS_VERSION_MIN
if [ $? -eq 0 ]; then
OPUS_USE=true
fi
AOM_DEV="libaom"
check_package_version_ge_ARCH $AOM_DEV $AOM_VERSION_MIN
if [ $? -eq 0 ]; then
AOM_USE=true
fi
OPUS_DEV="opus"
check_package_version_ge_ARCH $OPUS_DEV $OPUS_VERSION_MIN
if [ $? -eq 0 ]; then
OPUS_USE=true
fi
@@ -5835,6 +5862,9 @@ install_ARCH() {
if [ "$VPX_USE" = true ]; then
_packages="$_packages $VPX_DEV"
fi
if [ "$AOM_USE" = true ]; then
_packages="$_packages $AOM_DEV"
fi
if [ "$OPUS_USE" = true ]; then
_packages="$_packages $OPUS_DEV"
fi

View File

@@ -1,15 +0,0 @@
diff --git a/configure.ac b/configure.ac
index c6f12d644..3c977a4e3 100644
--- a/configure.ac
+++ b/configure.ac
@@ -25,8 +25,10 @@
# autoconf requirements and initialization
AC_INIT([the fast lexical analyser generator],[2.6.4],[flex-help@lists.sourceforge.net],[flex])
+AC_PREREQ([2.60])
AC_CONFIG_SRCDIR([src/scan.l])
AC_CONFIG_AUX_DIR([build-aux])
+AC_USE_SYSTEM_EXTENSIONS
LT_INIT
AM_INIT_AUTOMAKE([1.15 -Wno-portability foreign std-options dist-lzip parallel-tests subdir-objects])
AC_CONFIG_HEADER([src/config.h])

View File

@@ -0,0 +1,11 @@
diff -Naur orig/configure external_vpx/configure
--- orig/configure 2022-07-06 09:22:04 -0600
+++ external_vpx/configure 2022-07-06 09:24:12 -0600
@@ -270,7 +270,6 @@
HAVE_LIST="
${ARCH_EXT_LIST}
vpx_ports
- pthread_h
unistd_h
"
EXPERIMENT_LIST="

View File

@@ -78,11 +78,6 @@ if(UNIX AND NOT APPLE)
set(WITH_PULSEAUDIO ON CACHE BOOL "" FORCE)
set(WITH_X11_XINPUT ON CACHE BOOL "" FORCE)
set(WITH_X11_XF86VMODE ON CACHE BOOL "" FORCE)
# Disable oneAPI on Linux for the time being.
# The AoT compilation takes too long to be used officially in the buildbot CI/CD and the JIT
# compilation has ABI compatibility issues when running builds made on centOS on Ubuntu.
set(WITH_CYCLES_DEVICE_ONEAPI OFF CACHE BOOL "" FORCE)
endif()
if(NOT APPLE)
set(WITH_XR_OPENXR ON CACHE BOOL "" FORCE)
@@ -93,6 +88,6 @@ if(NOT APPLE)
set(WITH_CYCLES_HIP_BINARIES ON CACHE BOOL "" FORCE)
set(WITH_CYCLES_DEVICE_ONEAPI ON CACHE BOOL "" FORCE)
# Disable AoT kernels compilations until buildbot can deliver them in a reasonabel time.
# Disable AoT kernels compilations until buildbot can deliver them in a reasonable time.
set(WITH_CYCLES_ONEAPI_BINARIES OFF CACHE BOOL "" FORCE)
endif()

View File

@@ -162,6 +162,9 @@ if(WITH_CODEC_FFMPEG)
mp3lame ogg opus swresample swscale
theora theoradec theoraenc vorbis vorbisenc
vorbisfile vpx x264 xvidcore)
if(EXISTS ${LIBDIR}/ffmpeg/lib/libaom.a)
list(APPEND FFMPEG_FIND_COMPONENTS aom)
endif()
find_package(FFmpeg)
endif()
@@ -467,8 +470,9 @@ string(APPEND CMAKE_CXX_FLAGS " -ftemplate-depth=1024")
# Avoid conflicts with Luxrender, and other plug-ins that may use the same
# libraries as Blender with a different version or build options.
set(PLATFORM_SYMBOLS_MAP ${CMAKE_SOURCE_DIR}/source/creator/symbols_apple.map)
string(APPEND PLATFORM_LINKFLAGS
" -Wl,-unexported_symbols_list,'${CMAKE_SOURCE_DIR}/source/creator/osx_locals.map'"
" -Wl,-unexported_symbols_list,'${PLATFORM_SYMBOLS_MAP}'"
)
string(APPEND CMAKE_CXX_FLAGS " -stdlib=libc++")

View File

@@ -202,6 +202,9 @@ if(WITH_CODEC_FFMPEG)
vpx
x264
xvidcore)
if(EXISTS ${LIBDIR}/ffmpeg/lib/libaom.a)
list(APPEND FFMPEG_FIND_COMPONENTS aom)
endif()
elseif(FFMPEG)
# Old cache variable used for root dir, convert to new standard.
set(FFMPEG_ROOT_DIR ${FFMPEG})
@@ -885,8 +888,9 @@ unset(_IS_LINKER_DEFAULT)
# Avoid conflicts with Mesa llvmpipe, Luxrender, and other plug-ins that may
# use the same libraries as Blender with a different version or build options.
set(PLATFORM_SYMBOLS_MAP ${CMAKE_SOURCE_DIR}/source/creator/symbols_unix.map)
set(PLATFORM_LINKFLAGS
"${PLATFORM_LINKFLAGS} -Wl,--version-script='${CMAKE_SOURCE_DIR}/source/creator/blender.map'"
"${PLATFORM_LINKFLAGS} -Wl,--version-script='${PLATFORM_SYMBOLS_MAP}'"
)
# Don't use position independent executable for portable install since file

View File

@@ -38,7 +38,7 @@ PROJECT_NAME = Blender
# could be handy for archiving the generated documentation or if some version
# control system is used.
PROJECT_NUMBER = V3.3
PROJECT_NUMBER = V3.4
# Using the PROJECT_BRIEF tag one can provide an optional one line description
# for a project that appears at the top of each page and should give viewer a

View File

@@ -1131,6 +1131,7 @@ def pymodule2sphinx(basepath, module_name, module, title, module_all_extra):
# Changes In Blender will force errors here.
context_type_map = {
# context_member: (RNA type, is_collection)
"active_action": ("Action", False),
"active_annotation_layer": ("GPencilLayer", False),
"active_bone": ("EditBone", False),
"active_file": ("FileSelectEntry", False),

View File

@@ -1,6 +1,13 @@
# SPDX-License-Identifier: GPL-2.0-or-later
# Copyright 2016 Blender Foundation. All rights reserved.
# Too noisy for code we don't maintain.
if(CMAKE_COMPILER_IS_GNUCC)
if(NOT "${CMAKE_CXX_COMPILER_VERSION}" VERSION_LESS "8.0")
add_cxx_flag("-Wno-cast-function-type")
endif()
endif()
set(INC
src
src/gflags

View File

@@ -36,8 +36,13 @@ if(WITH_CYCLES_NATIVE_ONLY)
)
if(NOT MSVC)
string(APPEND CMAKE_CXX_FLAGS " -march=native")
set(CYCLES_KERNEL_FLAGS "-march=native")
ADD_CHECK_CXX_COMPILER_FLAG(CMAKE_CXX_FLAGS _has_march_native "-march=native")
if(_has_march_native)
set(CYCLES_KERNEL_FLAGS "-march=native")
else()
set(CYCLES_KERNEL_FLAGS "")
endif()
unset(_has_march_native)
else()
if(NOT MSVC_NATIVE_ARCH_FLAGS)
TRY_RUN(

View File

@@ -55,7 +55,7 @@ static bool ObtainCacheParticleData(
return false;
Transform tfm = get_transform(b_ob->matrix_world());
Transform itfm = transform_quick_inverse(tfm);
Transform itfm = transform_inverse(tfm);
for (BL::Modifier &b_mod : b_ob->modifiers) {
if ((b_mod.type() == b_mod.type_PARTICLE_SYSTEM) &&

View File

@@ -928,8 +928,22 @@ static ShaderNode *add_node(Scene *scene,
sky->set_sun_disc(b_sky_node.sun_disc());
sky->set_sun_size(b_sky_node.sun_size());
sky->set_sun_intensity(b_sky_node.sun_intensity());
sky->set_sun_elevation(b_sky_node.sun_elevation());
sky->set_sun_rotation(b_sky_node.sun_rotation());
/* Patch sun position to be able to animate daylight cycle while keeping the shading code
* simple. */
float sun_rotation = b_sky_node.sun_rotation();
/* Wrap into [-2PI..2PI] range. */
float sun_elevation = fmodf(b_sky_node.sun_elevation(), M_2PI_F);
/* Wrap into [-PI..PI] range. */
if (fabsf(sun_elevation) >= M_PI_F) {
sun_elevation -= copysignf(2.0f, sun_elevation) * M_PI_F;
}
/* Wrap into [-PI/2..PI/2] range while keeping the same absolute position. */
if (sun_elevation >= M_PI_2_F || sun_elevation <= -M_PI_2_F) {
sun_elevation = copysignf(M_PI_F, sun_elevation) - sun_elevation;
sun_rotation += M_PI_F;
}
sky->set_sun_elevation(sun_elevation);
sky->set_sun_rotation(sun_rotation);
sky->set_altitude(b_sky_node.altitude());
sky->set_air_density(b_sky_node.air_density());
sky->set_dust_density(b_sky_node.dust_density());
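
The wrapping logic in the hunk above can be checked with a small worked example. The sketch below mirrors the three steps in Python (`math.fmod` keeps the sign of the dividend, like C's `fmodf`). The function name `wrap_sun_position` and the `direction` helper, including its axis convention, are assumptions made for illustration and are not taken from Cycles.

```python
import math


def wrap_sun_position(elevation, rotation):
    """Fold elevation into [-pi/2, pi/2] while keeping the same sun direction."""
    # Wrap into the (-2*pi, 2*pi) range.
    elevation = math.fmod(elevation, 2.0 * math.pi)
    # Wrap into the [-pi, pi] range.
    if abs(elevation) >= math.pi:
        elevation -= math.copysign(2.0, elevation) * math.pi
    # Mirror across the zenith/nadir and turn the rotation by half a circle.
    if elevation >= math.pi / 2 or elevation <= -math.pi / 2:
        elevation = math.copysign(math.pi, elevation) - elevation
        rotation += math.pi
    return elevation, rotation


def direction(elevation, rotation):
    """Assumed convention: unit vector from elevation and rotation angles."""
    return (
        math.cos(elevation) * math.cos(rotation),
        math.cos(elevation) * math.sin(rotation),
        math.sin(elevation),
    )


# An elevation of 100 degrees becomes 80 degrees with the rotation turned by pi.
elevation, rotation = wrap_sun_position(math.radians(100.0), 0.0)
```

Under the assumed convention, `direction` gives the same vector before and after wrapping, which is why an animated elevation can sweep past the zenith without a visible jump.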

View File

@@ -7,6 +7,7 @@
#include "MEM_guardedalloc.h"
#include "RNA_access.h"
#include "RNA_blender_cpp.h"
#include "RNA_path.h"
#include "RNA_types.h"
#include "blender/id_map.h"

View File

@@ -21,13 +21,9 @@
# include "bvh/embree.h"
/* Kernel includes are necessary so that the filter function for Embree can access the packed BVH.
*/
# include "kernel/bvh/embree.h"
# include "kernel/bvh/util.h"
# include "kernel/device/cpu/bvh.h"
# include "kernel/device/cpu/compat.h"
# include "kernel/device/cpu/globals.h"
# include "kernel/sample/lcg.h"
# include "scene/hair.h"
# include "scene/mesh.h"
@@ -46,265 +42,6 @@ static_assert(Object::MAX_MOTION_STEPS <= RTC_MAX_TIME_STEP_COUNT,
static_assert(Object::MAX_MOTION_STEPS == Geometry::MAX_MOTION_STEPS,
"Object and Geometry max motion steps inconsistent");
# define IS_HAIR(x) (x & 1)
/* This gets called by Embree at every valid ray/object intersection.
* Things like recording subsurface or shadow hits for later evaluation
* as well as filtering for volume objects happen here.
* Cycles' own BVH does that directly inside the traversal calls.
*/
static void rtc_filter_intersection_func(const RTCFilterFunctionNArguments *args)
{
/* Current implementation in Cycles assumes only single-ray intersection queries. */
assert(args->N == 1);
RTCHit *hit = (RTCHit *)args->hit;
CCLIntersectContext *ctx = ((IntersectContext *)args->context)->userRayExt;
const KernelGlobalsCPU *kg = ctx->kg;
const Ray *cray = ctx->ray;
if (kernel_embree_is_self_intersection(kg, hit, cray)) {
*args->valid = 0;
}
}
/* This gets called by Embree at every valid ray/object intersection.
* Things like recording subsurface or shadow hits for later evaluation
* as well as filtering for volume objects happen here.
* Cycles' own BVH does that directly inside the traversal calls.
*/
static void rtc_filter_occluded_func(const RTCFilterFunctionNArguments *args)
{
/* Current implementation in Cycles assumes only single-ray intersection queries. */
assert(args->N == 1);
const RTCRay *ray = (RTCRay *)args->ray;
RTCHit *hit = (RTCHit *)args->hit;
CCLIntersectContext *ctx = ((IntersectContext *)args->context)->userRayExt;
const KernelGlobalsCPU *kg = ctx->kg;
const Ray *cray = ctx->ray;
switch (ctx->type) {
case CCLIntersectContext::RAY_SHADOW_ALL: {
Intersection current_isect;
kernel_embree_convert_hit(kg, ray, hit, &current_isect);
if (intersection_skip_self_shadow(cray->self, current_isect.object, current_isect.prim)) {
*args->valid = 0;
return;
}
/* If no transparent shadows or max number of hits exceeded, all light is blocked. */
const int flags = intersection_get_shader_flags(kg, current_isect.prim, current_isect.type);
if (!(flags & (SD_HAS_TRANSPARENT_SHADOW)) || ctx->num_hits >= ctx->max_hits) {
ctx->opaque_hit = true;
return;
}
++ctx->num_hits;
/* Always use baked shadow transparency for curves. */
if (current_isect.type & PRIMITIVE_CURVE) {
ctx->throughput *= intersection_curve_shadow_transparency(
kg, current_isect.object, current_isect.prim, current_isect.u);
if (ctx->throughput < CURVE_SHADOW_TRANSPARENCY_CUTOFF) {
ctx->opaque_hit = true;
return;
}
else {
*args->valid = 0;
return;
}
}
/* Test if we need to record this transparent intersection. */
const uint max_record_hits = min(ctx->max_hits, INTEGRATOR_SHADOW_ISECT_SIZE);
if (ctx->num_recorded_hits < max_record_hits || ray->tfar < ctx->max_t) {
/* If maximum number of hits was reached, replace the intersection with the
* highest distance. We want to find the N closest intersections. */
const uint num_recorded_hits = min(ctx->num_recorded_hits, max_record_hits);
uint isect_index = num_recorded_hits;
if (num_recorded_hits + 1 >= max_record_hits) {
float max_t = ctx->isect_s[0].t;
uint max_recorded_hit = 0;
for (uint i = 1; i < num_recorded_hits; ++i) {
if (ctx->isect_s[i].t > max_t) {
max_recorded_hit = i;
max_t = ctx->isect_s[i].t;
}
}
if (num_recorded_hits >= max_record_hits) {
isect_index = max_recorded_hit;
}
/* Limit the ray distance and stop counting hits beyond this.
* TODO: is there some way we can tell Embree to stop intersecting beyond
* this distance when max number of hits is reached? Or maybe it will
* become irrelevant if we make max_hits a very high number on the CPU. */
ctx->max_t = max(current_isect.t, max_t);
}
ctx->isect_s[isect_index] = current_isect;
}
/* Always increase the number of recorded hits, even beyond the maximum,
* so that we can detect this and trace another ray if needed. */
++ctx->num_recorded_hits;
/* This tells Embree to continue tracing. */
*args->valid = 0;
break;
}
case CCLIntersectContext::RAY_LOCAL:
case CCLIntersectContext::RAY_SSS: {
/* Check if it's hitting the correct object. */
Intersection current_isect;
if (ctx->type == CCLIntersectContext::RAY_SSS) {
kernel_embree_convert_sss_hit(kg, ray, hit, &current_isect, ctx->local_object_id);
}
else {
kernel_embree_convert_hit(kg, ray, hit, &current_isect);
if (ctx->local_object_id != current_isect.object) {
/* This tells Embree to continue tracing. */
*args->valid = 0;
break;
}
}
if (intersection_skip_self_local(cray->self, current_isect.prim)) {
*args->valid = 0;
return;
}
/* No intersection information requested, just return a hit. */
if (ctx->max_hits == 0) {
break;
}
/* Ignore curves. */
if (IS_HAIR(hit->geomID)) {
/* This tells Embree to continue tracing. */
*args->valid = 0;
break;
}
LocalIntersection *local_isect = ctx->local_isect;
int hit_idx = 0;
if (ctx->lcg_state) {
/* See triangle_intersect_subsurface() for the native equivalent. */
for (int i = min((int)ctx->max_hits, local_isect->num_hits) - 1; i >= 0; --i) {
if (local_isect->hits[i].t == ray->tfar) {
/* This tells Embree to continue tracing. */
*args->valid = 0;
return;
}
}
local_isect->num_hits++;
if (local_isect->num_hits <= ctx->max_hits) {
hit_idx = local_isect->num_hits - 1;
}
else {
/* reservoir sampling: if we are at the maximum number of
* hits, randomly replace element or skip it */
hit_idx = lcg_step_uint(ctx->lcg_state) % local_isect->num_hits;
if (hit_idx >= ctx->max_hits) {
/* This tells Embree to continue tracing. */
*args->valid = 0;
return;
}
}
}
else {
/* Record closest intersection only. */
if (local_isect->num_hits && current_isect.t > local_isect->hits[0].t) {
*args->valid = 0;
return;
}
local_isect->num_hits = 1;
}
/* record intersection */
local_isect->hits[hit_idx] = current_isect;
local_isect->Ng[hit_idx] = normalize(make_float3(hit->Ng_x, hit->Ng_y, hit->Ng_z));
/* This tells Embree to continue tracing. */
*args->valid = 0;
break;
}
case CCLIntersectContext::RAY_VOLUME_ALL: {
/* Append the intersection to the end of the array. */
if (ctx->num_hits < ctx->max_hits) {
Intersection current_isect;
kernel_embree_convert_hit(kg, ray, hit, &current_isect);
if (intersection_skip_self(cray->self, current_isect.object, current_isect.prim)) {
*args->valid = 0;
return;
}
Intersection *isect = &ctx->isect_s[ctx->num_hits];
++ctx->num_hits;
*isect = current_isect;
/* Only primitives from volume object. */
uint tri_object = isect->object;
int object_flag = kernel_data_fetch(object_flag, tri_object);
if ((object_flag & SD_OBJECT_HAS_VOLUME) == 0) {
--ctx->num_hits;
}
/* This tells Embree to continue tracing. */
*args->valid = 0;
}
break;
}
case CCLIntersectContext::RAY_REGULAR:
default:
if (kernel_embree_is_self_intersection(kg, hit, cray)) {
*args->valid = 0;
return;
}
break;
}
}
static void rtc_filter_func_backface_cull(const RTCFilterFunctionNArguments *args)
{
const RTCRay *ray = (RTCRay *)args->ray;
RTCHit *hit = (RTCHit *)args->hit;
/* Always ignore back-facing intersections. */
if (dot(make_float3(ray->dir_x, ray->dir_y, ray->dir_z),
make_float3(hit->Ng_x, hit->Ng_y, hit->Ng_z)) > 0.0f) {
*args->valid = 0;
return;
}
CCLIntersectContext *ctx = ((IntersectContext *)args->context)->userRayExt;
const KernelGlobalsCPU *kg = ctx->kg;
const Ray *cray = ctx->ray;
if (kernel_embree_is_self_intersection(kg, hit, cray)) {
*args->valid = 0;
}
}
static void rtc_filter_occluded_func_backface_cull(const RTCFilterFunctionNArguments *args)
{
const RTCRay *ray = (RTCRay *)args->ray;
RTCHit *hit = (RTCHit *)args->hit;
/* Always ignore back-facing intersections. */
if (dot(make_float3(ray->dir_x, ray->dir_y, ray->dir_z),
make_float3(hit->Ng_x, hit->Ng_y, hit->Ng_z)) > 0.0f) {
*args->valid = 0;
return;
}
rtc_filter_occluded_func(args);
}
static size_t unaccounted_mem = 0;
static bool rtc_memory_monitor_func(void *userPtr, const ssize_t bytes, const bool)
@@ -535,8 +272,8 @@ void BVHEmbree::add_triangles(const Object *ob, const Mesh *mesh, int i)
set_tri_vertex_buffer(geom_id, mesh, false);
rtcSetGeometryUserData(geom_id, (void *)prim_offset);
rtcSetGeometryOccludedFilterFunction(geom_id, rtc_filter_occluded_func);
rtcSetGeometryIntersectFilterFunction(geom_id, rtc_filter_intersection_func);
rtcSetGeometryOccludedFilterFunction(geom_id, kernel_embree_filter_occluded_func);
rtcSetGeometryIntersectFilterFunction(geom_id, kernel_embree_filter_intersection_func);
rtcSetGeometryMask(geom_id, ob->visibility_for_tracing());
rtcCommitGeometry(geom_id);
@@ -739,8 +476,8 @@ void BVHEmbree::add_points(const Object *ob, const PointCloud *pointcloud, int i
set_point_vertex_buffer(geom_id, pointcloud, false);
rtcSetGeometryUserData(geom_id, (void *)prim_offset);
rtcSetGeometryIntersectFilterFunction(geom_id, rtc_filter_func_backface_cull);
rtcSetGeometryOccludedFilterFunction(geom_id, rtc_filter_occluded_func_backface_cull);
rtcSetGeometryIntersectFilterFunction(geom_id, kernel_embree_filter_func_backface_cull);
rtcSetGeometryOccludedFilterFunction(geom_id, kernel_embree_filter_occluded_func_backface_cull);
rtcSetGeometryMask(geom_id, ob->visibility_for_tracing());
rtcCommitGeometry(geom_id);
@@ -799,12 +536,13 @@ void BVHEmbree::add_curves(const Object *ob, const Hair *hair, int i)
rtcSetGeometryUserData(geom_id, (void *)prim_offset);
if (hair->curve_shape == CURVE_RIBBON) {
rtcSetGeometryIntersectFilterFunction(geom_id, rtc_filter_intersection_func);
rtcSetGeometryOccludedFilterFunction(geom_id, rtc_filter_occluded_func);
rtcSetGeometryIntersectFilterFunction(geom_id, kernel_embree_filter_intersection_func);
rtcSetGeometryOccludedFilterFunction(geom_id, kernel_embree_filter_occluded_func);
}
else {
rtcSetGeometryIntersectFilterFunction(geom_id, rtc_filter_func_backface_cull);
rtcSetGeometryOccludedFilterFunction(geom_id, rtc_filter_occluded_func_backface_cull);
rtcSetGeometryIntersectFilterFunction(geom_id, kernel_embree_filter_func_backface_cull);
rtcSetGeometryOccludedFilterFunction(geom_id,
kernel_embree_filter_occluded_func_backface_cull);
}
rtcSetGeometryMask(geom_id, ob->visibility_for_tracing());
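The removed `rtc_filter_occluded_func` above keeps at most `max_hits` local intersections and, once that limit is reached, either randomly replaces a stored hit or skips the new one (the "reservoir sampling" comment). The sketch below isolates that bookkeeping under stated assumptions: `HitReservoir`, `offer`, and `lcg_next` are hypothetical names, hits are reduced to a single distance, and the LCG constants are the classic textbook ones rather than a claim about `lcg_step_uint`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical linear congruential generator, used only to make the sketch deterministic.
inline std::uint32_t lcg_next(std::uint32_t &state)
{
  state = 1103515245u * state + 12345u;
  return state;
}

// Hypothetical container: keeps a uniform random subset of at most max_hits hits.
struct HitReservoir {
  std::vector<float> hits;      // Stored hit distances, never more than max_hits.
  std::uint32_t num_hits = 0;   // Total number of hits offered so far.
  std::uint32_t max_hits;

  explicit HitReservoir(std::uint32_t max_hits_) : max_hits(max_hits_)
  {
    assert(max_hits_ > 0);
    hits.reserve(max_hits_);
  }

  // Returns true when the hit was stored, false when it was skipped.
  bool offer(float t, std::uint32_t &rng)
  {
    ++num_hits;
    if (num_hits <= max_hits) {
      // Still room: append.
      hits.push_back(t);
      return true;
    }
    // Full: pick a slot among all hits seen so far and replace it only
    // if the slot falls inside the stored range.
    const std::uint32_t idx = lcg_next(rng) % num_hits;
    if (idx >= max_hits) {
      return false;
    }
    hits[idx] = t;
    return true;
  }
};
```

As in the diff, the total count keeps increasing even when a hit is skipped, so the replacement probability shrinks as more hits are seen.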

@@ -402,6 +402,18 @@ unique_ptr<DeviceQueue> OneapiDevice::gpu_queue_create()
return make_unique<OneapiDeviceQueue>(this);
}
int OneapiDevice::get_num_multiprocessors()
{
assert(device_queue_);
return oneapi_dll_.oneapi_get_num_multiprocessors(device_queue_);
}
int OneapiDevice::get_max_num_threads_per_multiprocessor()
{
assert(device_queue_);
return oneapi_dll_.oneapi_get_max_num_threads_per_multiprocessor(device_queue_);
}
bool OneapiDevice::should_use_graphics_interop()
{
/* NOTE(@nsirgien): oneAPI doesn't yet support direct writing into graphics API objects, so

@@ -89,6 +89,9 @@ class OneapiDevice : public Device {
virtual unique_ptr<DeviceQueue> gpu_queue_create() override;
int get_num_multiprocessors();
int get_max_num_threads_per_multiprocessor();
/* NOTE(@nsirgien): Create these methods to avoid some compilation problems on Windows with host
* side compilation (MSVC). */
void *usm_aligned_alloc_host(size_t memory_size, size_t alignment);

@@ -36,34 +36,9 @@ OneapiDeviceQueue::~OneapiDeviceQueue()
int OneapiDeviceQueue::num_concurrent_states(const size_t state_size) const
{
int num_states;
/* TODO: implement and use get_num_multiprocessors and get_max_num_threads_per_multiprocessor. */
const size_t compute_units = oneapi_dll_.oneapi_get_compute_units_amount(
oneapi_device_->sycl_queue());
if (compute_units >= 128) {
/* dGPU path: it makes sense to allocate more states, because it will be dedicated GPU memory. */
int base = 1024 * 1024;
/* Linear dependency (with a coefficient less than 1) on the number of compute units. */
num_states = (base * (compute_units / 128)) * 3 / 4;
/* Limit the amount of integrator states to one quarter of device memory, because
* other allocations will need some space as well.
* TODO: base this calculation on how many states the GPU is actually capable of
* running, with some headroom to improve occupancy. If the textures don't fit, offload into
* unified memory. */
size_t states_memory_size = num_states * state_size;
size_t device_memory_amount =
(oneapi_dll_.oneapi_get_memcapacity)(oneapi_device_->sycl_queue());
if (states_memory_size >= device_memory_amount / 4) {
num_states = device_memory_amount / 4 / state_size;
}
}
else {
/* iGPU path - no real need to allocate a lot of integrator states because it is shared GPU
* memory. */
num_states = 1024 * 512;
}
const int max_num_threads = oneapi_device_->get_num_multiprocessors() *
oneapi_device_->get_max_num_threads_per_multiprocessor();
int num_states = max(8 * max_num_threads, 65536) * 16;
VLOG_DEVICE_STATS << "GPU queue concurrent states: " << num_states << ", using up to "
<< string_human_readable_size(num_states * state_size);
@@ -73,14 +48,10 @@ int OneapiDeviceQueue::num_concurrent_states(const size_t state_size) const
int OneapiDeviceQueue::num_concurrent_busy_states() const
{
const size_t compute_units = oneapi_dll_.oneapi_get_compute_units_amount(
oneapi_device_->sycl_queue());
if (compute_units >= 128) {
return 1024 * 1024;
}
else {
return 1024 * 512;
}
const int max_num_threads = oneapi_device_->get_num_multiprocessors() *
oneapi_device_->get_max_num_threads_per_multiprocessor();
return 4 * max(8 * max_num_threads, 65536);
}
void OneapiDeviceQueue::init_execution()
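The replacement heuristic in the two hunks above sizes the state pool from the device's thread capacity instead of a compute-unit threshold. Here is a small sketch of just that arithmetic, assuming the two device queries are passed in as plain integers; `concurrent_states` and `concurrent_busy_states` are hypothetical free functions, and `std::max` stands in for the `max` used in the diff.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper mirroring the new num_concurrent_states() arithmetic.
int concurrent_states(int num_multiprocessors, int max_threads_per_multiprocessor)
{
  assert(num_multiprocessors >= 0 && max_threads_per_multiprocessor >= 0);
  const int max_num_threads = num_multiprocessors * max_threads_per_multiprocessor;
  // At least 65536 base states, scaled by 16.
  return std::max(8 * max_num_threads, 65536) * 16;
}

// Hypothetical helper mirroring the new num_concurrent_busy_states() arithmetic.
int concurrent_busy_states(int num_multiprocessors, int max_threads_per_multiprocessor)
{
  assert(num_multiprocessors >= 0 && max_threads_per_multiprocessor >= 0);
  const int max_num_threads = num_multiprocessors * max_threads_per_multiprocessor;
  // Same base value, scaled by 4 instead of 16.
  return 4 * std::max(8 * max_num_threads, 65536);
}
```

With these formulas the busy-state count is always one quarter of the total state count, and small devices fall back to the 65536 floor rather than a separate iGPU branch.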

@@ -26,7 +26,6 @@
# include "util/task.h"
# include "util/time.h"
# undef __KERNEL_CPU__
# define __KERNEL_OPTIX__
# include "kernel/device/optix/globals.h"

@@ -8,7 +8,6 @@
# include "util/time.h"
# undef __KERNEL_CPU__
# define __KERNEL_OPTIX__
# include "kernel/device/optix/globals.h"

@@ -42,6 +42,7 @@ set(SRC_KERNEL_DEVICE_ONEAPI
)
set(SRC_KERNEL_DEVICE_CPU_HEADERS
device/cpu/bvh.h
device/cpu/compat.h
device/cpu/image.h
device/cpu/globals.h
@@ -71,11 +72,13 @@ set(SRC_KERNEL_DEVICE_HIP_HEADERS
)
set(SRC_KERNEL_DEVICE_OPTIX_HEADERS
device/optix/bvh.h
device/optix/compat.h
device/optix/globals.h
)
set(SRC_KERNEL_DEVICE_METAL_HEADERS
device/metal/bvh.h
device/metal/compat.h
device/metal/context_begin.h
device/metal/context_end.h
@@ -214,8 +217,6 @@ set(SRC_KERNEL_BVH_HEADERS
bvh/util.h
bvh/volume.h
bvh/volume_all.h
bvh/embree.h
bvh/metal.h
)
set(SRC_KERNEL_CAMERA_HEADERS
@@ -316,6 +317,7 @@ set(SRC_UTIL_HEADERS
../util/math_float2.h
../util/math_float3.h
../util/math_float4.h
../util/math_float8.h
../util/math_int2.h
../util/math_int3.h
../util/math_int4.h
@@ -353,8 +355,6 @@ set(SRC_UTIL_HEADERS
../util/types_uint4.h
../util/types_uint4_impl.h
../util/types_ushort4.h
../util/types_vector3.h
../util/types_vector3_impl.h
)
set(LIB

@@ -1,40 +1,47 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2011-2022 Blender Foundation */
/* BVH
*
* Bounding volume hierarchy for ray tracing. We compile different variations
* of the same BVH traversal function for faster rendering when some types of
* primitives are not needed, using #includes to work around the lack of
* C++ templates in OpenCL.
*
* Originally based on "Understanding the Efficiency of Ray Traversal on GPUs",
* the code has been extended and modified to support more primitives and work
* with CPU/CUDA/OpenCL. */
#pragma once
#ifdef __EMBREE__
# include "kernel/bvh/embree.h"
#endif
#ifdef __METALRT__
# include "kernel/bvh/metal.h"
#endif
#include "kernel/bvh/types.h"
#include "kernel/bvh/util.h"
#include "kernel/integrator/state_util.h"
/* Device specific acceleration structures for ray tracing. */
#if defined(__EMBREE__)
# include "kernel/device/cpu/bvh.h"
# define __BVH2__
#elif defined(__METALRT__)
# include "kernel/device/metal/bvh.h"
#elif defined(__KERNEL_OPTIX__)
# include "kernel/device/optix/bvh.h"
#else
# define __BVH2__
#endif
CCL_NAMESPACE_BEGIN
#if !defined(__KERNEL_GPU_RAYTRACING__)
#ifdef __BVH2__
/* Regular BVH traversal */
/* BVH2
*
* Bounding volume hierarchy for ray tracing, when no native acceleration
* structure is available for the device.
* We compile different variations of the same BVH traversal function for
* faster rendering when some types of primitives are not needed, using #includes
* to work around the lack of C++ templates in OpenCL.
*
* Originally based on "Understanding the Efficiency of Ray Traversal on GPUs",
* the code has been extended and modified to support more primitives and work
* with CPU and various GPU kernel languages. */
# include "kernel/bvh/nodes.h"
/* Regular BVH traversal */
# define BVH_FUNCTION_NAME bvh_intersect
# define BVH_FUNCTION_FEATURES BVH_POINTCLOUD
# include "kernel/bvh/traversal.h"
@@ -57,9 +64,46 @@ CCL_NAMESPACE_BEGIN
# include "kernel/bvh/traversal.h"
# endif
/* Subsurface scattering BVH traversal */
ccl_device_intersect bool scene_intersect(KernelGlobals kg,
ccl_private const Ray *ray,
const uint visibility,
ccl_private Intersection *isect)
{
if (!intersection_ray_valid(ray)) {
return false;
}
# ifdef __EMBREE__
if (kernel_data.device_bvh) {
return kernel_embree_intersect(kg, ray, visibility, isect);
}
# endif
# ifdef __OBJECT_MOTION__
if (kernel_data.bvh.have_motion) {
# ifdef __HAIR__
if (kernel_data.bvh.have_curves) {
return bvh_intersect_hair_motion(kg, ray, isect, visibility);
}
# endif /* __HAIR__ */
return bvh_intersect_motion(kg, ray, isect, visibility);
}
# endif /* __OBJECT_MOTION__ */
# ifdef __HAIR__
if (kernel_data.bvh.have_curves) {
return bvh_intersect_hair(kg, ray, isect, visibility);
}
# endif /* __HAIR__ */
return bvh_intersect(kg, ray, isect, visibility);
}
/* Single object BVH traversal, for SSS/AO/bevel. */
# ifdef __BVH_LOCAL__
# if defined(__BVH_LOCAL__)
# define BVH_FUNCTION_NAME bvh_intersect_local
# define BVH_FUNCTION_FEATURES BVH_HAIR
# include "kernel/bvh/local.h"
@@ -69,25 +113,40 @@ CCL_NAMESPACE_BEGIN
# define BVH_FUNCTION_FEATURES BVH_MOTION | BVH_HAIR
# include "kernel/bvh/local.h"
# endif
# endif /* __BVH_LOCAL__ */
/* Volume BVH traversal */
ccl_device_intersect bool scene_intersect_local(KernelGlobals kg,
ccl_private const Ray *ray,
ccl_private LocalIntersection *local_isect,
int local_object,
ccl_private uint *lcg_state,
int max_hits)
{
if (!intersection_ray_valid(ray)) {
if (local_isect) {
local_isect->num_hits = 0;
}
return false;
}
# if defined(__VOLUME__)
# define BVH_FUNCTION_NAME bvh_intersect_volume
# define BVH_FUNCTION_FEATURES BVH_HAIR
# include "kernel/bvh/volume.h"
# if defined(__OBJECT_MOTION__)
# define BVH_FUNCTION_NAME bvh_intersect_volume_motion
# define BVH_FUNCTION_FEATURES BVH_MOTION | BVH_HAIR
# include "kernel/bvh/volume.h"
# ifdef __EMBREE__
if (kernel_data.device_bvh) {
return kernel_embree_intersect_local(kg, ray, local_isect, local_object, lcg_state, max_hits);
}
# endif
# endif /* __VOLUME__ */
/* Record all intersections - Shadow BVH traversal */
# ifdef __OBJECT_MOTION__
if (kernel_data.bvh.have_motion) {
return bvh_intersect_local_motion(kg, ray, local_isect, local_object, lcg_state, max_hits);
}
# endif /* __OBJECT_MOTION__ */
return bvh_intersect_local(kg, ray, local_isect, local_object, lcg_state, max_hits);
}
# endif
/* Transparent shadow BVH traversal, recording multiple intersections. */
# ifdef __SHADOW_RECORD_ALL__
# if defined(__SHADOW_RECORD_ALL__)
# define BVH_FUNCTION_NAME bvh_intersect_shadow_all
# define BVH_FUNCTION_FEATURES BVH_POINTCLOUD
# include "kernel/bvh/shadow_all.h"
@@ -110,412 +169,6 @@ CCL_NAMESPACE_BEGIN
# include "kernel/bvh/shadow_all.h"
# endif
# endif /* __SHADOW_RECORD_ALL__ */
/* Record all intersections - Volume BVH traversal. */
# if defined(__VOLUME_RECORD_ALL__)
# define BVH_FUNCTION_NAME bvh_intersect_volume_all
# define BVH_FUNCTION_FEATURES BVH_HAIR
# include "kernel/bvh/volume_all.h"
# if defined(__OBJECT_MOTION__)
# define BVH_FUNCTION_NAME bvh_intersect_volume_all_motion
# define BVH_FUNCTION_FEATURES BVH_MOTION | BVH_HAIR
# include "kernel/bvh/volume_all.h"
# endif
# endif /* __VOLUME_RECORD_ALL__ */
# undef BVH_FEATURE
# undef BVH_NAME_JOIN
# undef BVH_NAME_EVAL
# undef BVH_FUNCTION_FULL_NAME
#endif /* !defined(__KERNEL_GPU_RAYTRACING__) */
ccl_device_inline bool scene_intersect_valid(ccl_private const Ray *ray)
{
/* NOTE: Due to some vectorization code non-finite origin point might
* cause lots of false-positive intersections which will overflow traversal
* stack.
* This code is a quick way to perform early output, to avoid crashes in
* such cases.
* From production scenes so far it seems it's enough to test first element
* only.
* Scene intersection may also be called with empty rays for conditional trace
* calls that evaluate to false, so filter those out.
*/
return isfinite_safe(ray->P.x) && isfinite_safe(ray->D.x) && len_squared(ray->D) != 0.0f;
}
ccl_device_intersect bool scene_intersect(KernelGlobals kg,
ccl_private const Ray *ray,
const uint visibility,
ccl_private Intersection *isect)
{
#ifdef __KERNEL_OPTIX__
uint p0 = 0;
uint p1 = 0;
uint p2 = 0;
uint p3 = 0;
uint p4 = visibility;
uint p5 = PRIMITIVE_NONE;
uint p6 = ((uint64_t)ray) & 0xFFFFFFFF;
uint p7 = (((uint64_t)ray) >> 32) & 0xFFFFFFFF;
uint ray_mask = visibility & 0xFF;
uint ray_flags = OPTIX_RAY_FLAG_ENFORCE_ANYHIT;
if (0 == ray_mask && (visibility & ~0xFF) != 0) {
ray_mask = 0xFF;
}
else if (visibility & PATH_RAY_SHADOW_OPAQUE) {
ray_flags |= OPTIX_RAY_FLAG_TERMINATE_ON_FIRST_HIT;
}
optixTrace(scene_intersect_valid(ray) ? kernel_data.device_bvh : 0,
ray->P,
ray->D,
ray->tmin,
ray->tmax,
ray->time,
ray_mask,
ray_flags,
0, /* SBT offset for PG_HITD */
0,
0,
p0,
p1,
p2,
p3,
p4,
p5,
p6,
p7);
isect->t = __uint_as_float(p0);
isect->u = __uint_as_float(p1);
isect->v = __uint_as_float(p2);
isect->prim = p3;
isect->object = p4;
isect->type = p5;
return p5 != PRIMITIVE_NONE;
#elif defined(__METALRT__)
if (!scene_intersect_valid(ray)) {
isect->t = ray->tmax;
isect->type = PRIMITIVE_NONE;
return false;
}
# if defined(__KERNEL_DEBUG__)
if (is_null_instance_acceleration_structure(metal_ancillaries->accel_struct)) {
isect->t = ray->tmax;
isect->type = PRIMITIVE_NONE;
kernel_assert(!"Invalid metal_ancillaries->accel_struct pointer");
return false;
}
if (is_null_intersection_function_table(metal_ancillaries->ift_default)) {
isect->t = ray->tmax;
isect->type = PRIMITIVE_NONE;
kernel_assert(!"Invalid ift_default");
return false;
}
# endif
metal::raytracing::ray r(ray->P, ray->D, ray->tmin, ray->tmax);
metalrt_intersector_type metalrt_intersect;
if (!kernel_data.bvh.have_curves) {
metalrt_intersect.assume_geometry_type(metal::raytracing::geometry_type::triangle);
}
MetalRTIntersectionPayload payload;
payload.self = ray->self;
payload.u = 0.0f;
payload.v = 0.0f;
payload.visibility = visibility;
typename metalrt_intersector_type::result_type intersection;
uint ray_mask = visibility & 0xFF;
if (0 == ray_mask && (visibility & ~0xFF) != 0) {
ray_mask = 0xFF;
/* No further intersector setup required: Default MetalRT behavior is any-hit. */
}
else if (visibility & PATH_RAY_SHADOW_OPAQUE) {
/* No further intersector setup required: Shadow ray early termination is controlled by the
* intersection handler */
}
# if defined(__METALRT_MOTION__)
payload.time = ray->time;
intersection = metalrt_intersect.intersect(r,
metal_ancillaries->accel_struct,
ray_mask,
ray->time,
metal_ancillaries->ift_default,
payload);
# else
intersection = metalrt_intersect.intersect(
r, metal_ancillaries->accel_struct, ray_mask, metal_ancillaries->ift_default, payload);
# endif
if (intersection.type == intersection_type::none) {
isect->t = ray->tmax;
isect->type = PRIMITIVE_NONE;
return false;
}
isect->t = intersection.distance;
isect->prim = payload.prim;
isect->type = payload.type;
isect->object = intersection.user_instance_id;
isect->t = intersection.distance;
if (intersection.type == intersection_type::triangle) {
isect->u = 1.0f - intersection.triangle_barycentric_coord.y -
intersection.triangle_barycentric_coord.x;
isect->v = intersection.triangle_barycentric_coord.x;
}
else {
isect->u = payload.u;
isect->v = payload.v;
}
return isect->type != PRIMITIVE_NONE;
#else
if (!scene_intersect_valid(ray)) {
return false;
}
# ifdef __EMBREE__
if (kernel_data.device_bvh) {
isect->t = ray->tmax;
CCLIntersectContext ctx(kg, CCLIntersectContext::RAY_REGULAR);
IntersectContext rtc_ctx(&ctx);
RTCRayHit ray_hit;
ctx.ray = ray;
kernel_embree_setup_rayhit(*ray, ray_hit, visibility);
rtcIntersect1(kernel_data.device_bvh, &rtc_ctx.context, &ray_hit);
if (ray_hit.hit.geomID != RTC_INVALID_GEOMETRY_ID &&
ray_hit.hit.primID != RTC_INVALID_GEOMETRY_ID) {
kernel_embree_convert_hit(kg, &ray_hit.ray, &ray_hit.hit, isect);
return true;
}
return false;
}
# endif /* __EMBREE__ */
# ifdef __OBJECT_MOTION__
if (kernel_data.bvh.have_motion) {
# ifdef __HAIR__
if (kernel_data.bvh.have_curves) {
return bvh_intersect_hair_motion(kg, ray, isect, visibility);
}
# endif /* __HAIR__ */
return bvh_intersect_motion(kg, ray, isect, visibility);
}
# endif /* __OBJECT_MOTION__ */
# ifdef __HAIR__
if (kernel_data.bvh.have_curves) {
return bvh_intersect_hair(kg, ray, isect, visibility);
}
# endif /* __HAIR__ */
return bvh_intersect(kg, ray, isect, visibility);
#endif /* __KERNEL_OPTIX__ */
}
#ifdef __BVH_LOCAL__
ccl_device_intersect bool scene_intersect_local(KernelGlobals kg,
ccl_private const Ray *ray,
ccl_private LocalIntersection *local_isect,
int local_object,
ccl_private uint *lcg_state,
int max_hits)
{
# ifdef __KERNEL_OPTIX__
uint p0 = pointer_pack_to_uint_0(lcg_state);
uint p1 = pointer_pack_to_uint_1(lcg_state);
uint p2 = pointer_pack_to_uint_0(local_isect);
uint p3 = pointer_pack_to_uint_1(local_isect);
uint p4 = local_object;
uint p6 = ((uint64_t)ray) & 0xFFFFFFFF;
uint p7 = (((uint64_t)ray) >> 32) & 0xFFFFFFFF;
/* Is set to zero on miss or if ray is aborted, so can be used as return value. */
uint p5 = max_hits;
if (local_isect) {
local_isect->num_hits = 0; /* Initialize hit count to zero. */
}
optixTrace(scene_intersect_valid(ray) ? kernel_data.device_bvh : 0,
ray->P,
ray->D,
ray->tmin,
ray->tmax,
ray->time,
0xFF,
/* Need to always call into __anyhit__kernel_optix_local_hit. */
OPTIX_RAY_FLAG_ENFORCE_ANYHIT,
2, /* SBT offset for PG_HITL */
0,
0,
p0,
p1,
p2,
p3,
p4,
p5,
p6,
p7);
return p5;
# elif defined(__METALRT__)
if (!scene_intersect_valid(ray)) {
if (local_isect) {
local_isect->num_hits = 0;
}
return false;
}
# if defined(__KERNEL_DEBUG__)
if (is_null_instance_acceleration_structure(metal_ancillaries->accel_struct)) {
if (local_isect) {
local_isect->num_hits = 0;
}
kernel_assert(!"Invalid metal_ancillaries->accel_struct pointer");
return false;
}
if (is_null_intersection_function_table(metal_ancillaries->ift_local)) {
if (local_isect) {
local_isect->num_hits = 0;
}
kernel_assert(!"Invalid ift_local");
return false;
}
# endif
metal::raytracing::ray r(ray->P, ray->D, ray->tmin, ray->tmax);
metalrt_intersector_type metalrt_intersect;
metalrt_intersect.force_opacity(metal::raytracing::forced_opacity::non_opaque);
if (!kernel_data.bvh.have_curves) {
metalrt_intersect.assume_geometry_type(metal::raytracing::geometry_type::triangle);
}
MetalRTIntersectionLocalPayload payload;
payload.self = ray->self;
payload.local_object = local_object;
payload.max_hits = max_hits;
payload.local_isect.num_hits = 0;
if (lcg_state) {
payload.has_lcg_state = true;
payload.lcg_state = *lcg_state;
}
payload.result = false;
typename metalrt_intersector_type::result_type intersection;
# if defined(__METALRT_MOTION__)
intersection = metalrt_intersect.intersect(
r, metal_ancillaries->accel_struct, 0xFF, ray->time, metal_ancillaries->ift_local, payload);
# else
intersection = metalrt_intersect.intersect(
r, metal_ancillaries->accel_struct, 0xFF, metal_ancillaries->ift_local, payload);
# endif
if (lcg_state) {
*lcg_state = payload.lcg_state;
}
*local_isect = payload.local_isect;
return payload.result;
# else
if (!scene_intersect_valid(ray)) {
if (local_isect) {
local_isect->num_hits = 0;
}
return false;
}
# ifdef __EMBREE__
if (kernel_data.device_bvh) {
const bool has_bvh = !(kernel_data_fetch(object_flag, local_object) &
SD_OBJECT_TRANSFORM_APPLIED);
CCLIntersectContext ctx(
kg, has_bvh ? CCLIntersectContext::RAY_SSS : CCLIntersectContext::RAY_LOCAL);
ctx.lcg_state = lcg_state;
ctx.max_hits = max_hits;
ctx.ray = ray;
ctx.local_isect = local_isect;
if (local_isect) {
local_isect->num_hits = 0;
}
ctx.local_object_id = local_object;
IntersectContext rtc_ctx(&ctx);
RTCRay rtc_ray;
kernel_embree_setup_ray(*ray, rtc_ray, PATH_RAY_ALL_VISIBILITY);
/* If this object has its own BVH, use it. */
if (has_bvh) {
RTCGeometry geom = rtcGetGeometry(kernel_data.device_bvh, local_object * 2);
if (geom) {
float3 P = ray->P;
float3 dir = ray->D;
float3 idir = ray->D;
Transform ob_itfm;
rtc_ray.tfar = ray->tmax *
bvh_instance_motion_push(kg, local_object, ray, &P, &dir, &idir, &ob_itfm);
/* bvh_instance_motion_push() returns the inverse transform but
* it's not needed here. */
(void)ob_itfm;
rtc_ray.org_x = P.x;
rtc_ray.org_y = P.y;
rtc_ray.org_z = P.z;
rtc_ray.dir_x = dir.x;
rtc_ray.dir_y = dir.y;
rtc_ray.dir_z = dir.z;
RTCScene scene = (RTCScene)rtcGetGeometryUserData(geom);
kernel_assert(scene);
if (scene) {
rtcOccluded1(scene, &rtc_ctx.context, &rtc_ray);
}
}
}
else {
rtcOccluded1(kernel_data.device_bvh, &rtc_ctx.context, &rtc_ray);
}
/* rtcOccluded1 sets tfar to -inf if a hit was found. */
return (local_isect && local_isect->num_hits > 0) || (rtc_ray.tfar < 0);
;
}
# endif /* __EMBREE__ */
# ifdef __OBJECT_MOTION__
if (kernel_data.bvh.have_motion) {
return bvh_intersect_local_motion(kg, ray, local_isect, local_object, lcg_state, max_hits);
}
# endif /* __OBJECT_MOTION__ */
return bvh_intersect_local(kg, ray, local_isect, local_object, lcg_state, max_hits);
# endif /* __KERNEL_OPTIX__ */
}
#endif
#ifdef __SHADOW_RECORD_ALL__
ccl_device_intersect bool scene_intersect_shadow_all(KernelGlobals kg,
IntegratorShadowState state,
ccl_private const Ray *ray,
@@ -524,109 +177,7 @@ ccl_device_intersect bool scene_intersect_shadow_all(KernelGlobals kg,
ccl_private uint *num_recorded_hits,
ccl_private float *throughput)
{
# ifdef __KERNEL_OPTIX__
uint p0 = state;
uint p1 = __float_as_uint(1.0f); /* Throughput. */
uint p2 = 0; /* Number of hits. */
uint p3 = max_hits;
uint p4 = visibility;
uint p5 = false;
uint p6 = ((uint64_t)ray) & 0xFFFFFFFF;
uint p7 = (((uint64_t)ray) >> 32) & 0xFFFFFFFF;
uint ray_mask = visibility & 0xFF;
if (0 == ray_mask && (visibility & ~0xFF) != 0) {
ray_mask = 0xFF;
}
optixTrace(scene_intersect_valid(ray) ? kernel_data.device_bvh : 0,
ray->P,
ray->D,
ray->tmin,
ray->tmax,
ray->time,
ray_mask,
/* Need to always call into __anyhit__kernel_optix_shadow_all_hit. */
OPTIX_RAY_FLAG_ENFORCE_ANYHIT,
1, /* SBT offset for PG_HITS */
0,
0,
p0,
p1,
p2,
p3,
p4,
p5,
p6,
p7);
*num_recorded_hits = uint16_unpack_from_uint_0(p2);
*throughput = __uint_as_float(p1);
return p5;
# elif defined(__METALRT__)
if (!scene_intersect_valid(ray)) {
return false;
}
# if defined(__KERNEL_DEBUG__)
if (is_null_instance_acceleration_structure(metal_ancillaries->accel_struct)) {
kernel_assert(!"Invalid metal_ancillaries->accel_struct pointer");
return false;
}
if (is_null_intersection_function_table(metal_ancillaries->ift_shadow)) {
kernel_assert(!"Invalid ift_shadow");
return false;
}
# endif
metal::raytracing::ray r(ray->P, ray->D, ray->tmin, ray->tmax);
metalrt_intersector_type metalrt_intersect;
metalrt_intersect.force_opacity(metal::raytracing::forced_opacity::non_opaque);
if (!kernel_data.bvh.have_curves) {
metalrt_intersect.assume_geometry_type(metal::raytracing::geometry_type::triangle);
}
MetalRTIntersectionShadowPayload payload;
payload.self = ray->self;
payload.visibility = visibility;
payload.max_hits = max_hits;
payload.num_hits = 0;
payload.num_recorded_hits = 0;
payload.throughput = 1.0f;
payload.result = false;
payload.state = state;
uint ray_mask = visibility & 0xFF;
if (0 == ray_mask && (visibility & ~0xFF) != 0) {
ray_mask = 0xFF;
}
typename metalrt_intersector_type::result_type intersection;
# if defined(__METALRT_MOTION__)
payload.time = ray->time;
intersection = metalrt_intersect.intersect(r,
metal_ancillaries->accel_struct,
ray_mask,
ray->time,
metal_ancillaries->ift_shadow,
payload);
# else
intersection = metalrt_intersect.intersect(
r, metal_ancillaries->accel_struct, ray_mask, metal_ancillaries->ift_shadow, payload);
# endif
*num_recorded_hits = payload.num_recorded_hits;
*throughput = payload.throughput;
return payload.result;
# else
if (!scene_intersect_valid(ray)) {
if (!intersection_ray_valid(ray)) {
*num_recorded_hits = 0;
*throughput = 1.0f;
return false;
@@ -634,21 +185,10 @@ ccl_device_intersect bool scene_intersect_shadow_all(KernelGlobals kg,
# ifdef __EMBREE__
if (kernel_data.device_bvh) {
CCLIntersectContext ctx(kg, CCLIntersectContext::RAY_SHADOW_ALL);
Intersection *isect_array = (Intersection *)state->shadow_isect;
ctx.isect_s = isect_array;
ctx.max_hits = max_hits;
ctx.ray = ray;
IntersectContext rtc_ctx(&ctx);
RTCRay rtc_ray;
kernel_embree_setup_ray(*ray, rtc_ray, visibility);
rtcOccluded1(kernel_data.device_bvh, &rtc_ctx.context, &rtc_ray);
*num_recorded_hits = ctx.num_recorded_hits;
*throughput = ctx.throughput;
return ctx.opaque_hit;
return kernel_embree_intersect_shadow_all(
kg, state, ray, visibility, max_hits, num_recorded_hits, throughput);
}
# endif /* __EMBREE__ */
# endif
# ifdef __OBJECT_MOTION__
if (kernel_data.bvh.have_motion) {
@@ -662,7 +202,7 @@ ccl_device_intersect bool scene_intersect_shadow_all(KernelGlobals kg,
return bvh_intersect_shadow_all_motion(
kg, ray, state, visibility, max_hits, num_recorded_hits, throughput);
}
# endif /* __OBJECT_MOTION__ */
# endif /* __OBJECT_MOTION__ */
# ifdef __HAIR__
if (kernel_data.bvh.have_curves) {
@@ -673,132 +213,29 @@ ccl_device_intersect bool scene_intersect_shadow_all(KernelGlobals kg,
return bvh_intersect_shadow_all(
kg, ray, state, visibility, max_hits, num_recorded_hits, throughput);
# endif /* __KERNEL_OPTIX__ */
}
#endif /* __SHADOW_RECORD_ALL__ */
# endif /* __SHADOW_RECORD_ALL__ */
/* Volume BVH traversal, for initializing or updating the volume stack. */
# if defined(__VOLUME__) && !defined(__VOLUME_RECORD_ALL__)
# define BVH_FUNCTION_NAME bvh_intersect_volume
# define BVH_FUNCTION_FEATURES BVH_HAIR
# include "kernel/bvh/volume.h"
# if defined(__OBJECT_MOTION__)
# define BVH_FUNCTION_NAME bvh_intersect_volume_motion
# define BVH_FUNCTION_FEATURES BVH_MOTION | BVH_HAIR
# include "kernel/bvh/volume.h"
# endif
#ifdef __VOLUME__
ccl_device_intersect bool scene_intersect_volume(KernelGlobals kg,
ccl_private const Ray *ray,
ccl_private Intersection *isect,
const uint visibility)
{
# ifdef __KERNEL_OPTIX__
uint p0 = 0;
uint p1 = 0;
uint p2 = 0;
uint p3 = 0;
uint p4 = visibility;
uint p5 = PRIMITIVE_NONE;
uint p6 = ((uint64_t)ray) & 0xFFFFFFFF;
uint p7 = (((uint64_t)ray) >> 32) & 0xFFFFFFFF;
uint ray_mask = visibility & 0xFF;
if (0 == ray_mask && (visibility & ~0xFF) != 0) {
ray_mask = 0xFF;
}
optixTrace(scene_intersect_valid(ray) ? kernel_data.device_bvh : 0,
ray->P,
ray->D,
ray->tmin,
ray->tmax,
ray->time,
ray_mask,
/* Need to always call into __anyhit__kernel_optix_volume_test. */
OPTIX_RAY_FLAG_ENFORCE_ANYHIT,
3, /* SBT offset for PG_HITV */
0,
0,
p0,
p1,
p2,
p3,
p4,
p5,
p6,
p7);
isect->t = __uint_as_float(p0);
isect->u = __uint_as_float(p1);
isect->v = __uint_as_float(p2);
isect->prim = p3;
isect->object = p4;
isect->type = p5;
return p5 != PRIMITIVE_NONE;
# elif defined(__METALRT__)
if (!scene_intersect_valid(ray)) {
return false;
}
# if defined(__KERNEL_DEBUG__)
if (is_null_instance_acceleration_structure(metal_ancillaries->accel_struct)) {
kernel_assert(!"Invalid metal_ancillaries->accel_struct pointer");
return false;
}
if (is_null_intersection_function_table(metal_ancillaries->ift_default)) {
kernel_assert(!"Invalid ift_default");
return false;
}
# endif
metal::raytracing::ray r(ray->P, ray->D, ray->tmin, ray->tmax);
metalrt_intersector_type metalrt_intersect;
metalrt_intersect.force_opacity(metal::raytracing::forced_opacity::non_opaque);
if (!kernel_data.bvh.have_curves) {
metalrt_intersect.assume_geometry_type(metal::raytracing::geometry_type::triangle);
}
MetalRTIntersectionPayload payload;
payload.self = ray->self;
payload.visibility = visibility;
typename metalrt_intersector_type::result_type intersection;
uint ray_mask = visibility & 0xFF;
if (0 == ray_mask && (visibility & ~0xFF) != 0) {
ray_mask = 0xFF;
}
# if defined(__METALRT_MOTION__)
payload.time = ray->time;
intersection = metalrt_intersect.intersect(r,
metal_ancillaries->accel_struct,
ray_mask,
ray->time,
metal_ancillaries->ift_default,
payload);
# else
intersection = metalrt_intersect.intersect(
r, metal_ancillaries->accel_struct, ray_mask, metal_ancillaries->ift_default, payload);
# endif
if (intersection.type == intersection_type::none) {
return false;
}
isect->prim = payload.prim;
isect->type = payload.type;
isect->object = intersection.user_instance_id;
isect->t = intersection.distance;
if (intersection.type == intersection_type::triangle) {
isect->u = 1.0f - intersection.triangle_barycentric_coord.y -
intersection.triangle_barycentric_coord.x;
isect->v = intersection.triangle_barycentric_coord.x;
}
else {
isect->u = payload.u;
isect->v = payload.v;
}
return isect->type != PRIMITIVE_NONE;
# else
if (!scene_intersect_valid(ray)) {
if (!intersection_ray_valid(ray)) {
return false;
}
@@ -809,44 +246,56 @@ ccl_device_intersect bool scene_intersect_volume(KernelGlobals kg,
# endif /* __OBJECT_MOTION__ */
return bvh_intersect_volume(kg, ray, isect, visibility);
# endif /* __KERNEL_OPTIX__ */
}
#endif /* __VOLUME__ */
# endif /* defined(__VOLUME__) && !defined(__VOLUME_RECORD_ALL__) */
#ifdef __VOLUME_RECORD_ALL__
ccl_device_intersect uint scene_intersect_volume_all(KernelGlobals kg,
ccl_private const Ray *ray,
ccl_private Intersection *isect,
const uint max_hits,
const uint visibility)
/* Volume BVH traversal, for initializing or updating the volume stack.
* Variation that records multiple intersections at once. */
# if defined(__VOLUME__) && defined(__VOLUME_RECORD_ALL__)
# define BVH_FUNCTION_NAME bvh_intersect_volume_all
# define BVH_FUNCTION_FEATURES BVH_HAIR
# include "kernel/bvh/volume_all.h"
# if defined(__OBJECT_MOTION__)
# define BVH_FUNCTION_NAME bvh_intersect_volume_all_motion
# define BVH_FUNCTION_FEATURES BVH_MOTION | BVH_HAIR
# include "kernel/bvh/volume_all.h"
# endif
ccl_device_intersect uint scene_intersect_volume(KernelGlobals kg,
ccl_private const Ray *ray,
ccl_private Intersection *isect,
const uint max_hits,
const uint visibility)
{
if (!scene_intersect_valid(ray)) {
if (!intersection_ray_valid(ray)) {
return false;
}
# ifdef __EMBREE__
# ifdef __EMBREE__
if (kernel_data.device_bvh) {
CCLIntersectContext ctx(kg, CCLIntersectContext::RAY_VOLUME_ALL);
ctx.isect_s = isect;
ctx.max_hits = max_hits;
ctx.num_hits = 0;
ctx.ray = ray;
IntersectContext rtc_ctx(&ctx);
RTCRay rtc_ray;
kernel_embree_setup_ray(*ray, rtc_ray, visibility);
rtcOccluded1(kernel_data.device_bvh, &rtc_ctx.context, &rtc_ray);
return ctx.num_hits;
return kernel_embree_intersect_volume(kg, ray, isect, max_hits, visibility);
}
# endif /* __EMBREE__ */
# endif
# ifdef __OBJECT_MOTION__
# ifdef __OBJECT_MOTION__
if (kernel_data.bvh.have_motion) {
return bvh_intersect_volume_all_motion(kg, ray, isect, max_hits, visibility);
}
# endif /* __OBJECT_MOTION__ */
# endif /* __OBJECT_MOTION__ */
return bvh_intersect_volume_all(kg, ray, isect, max_hits, visibility);
}
#endif /* __VOLUME_RECORD_ALL__ */
# endif /* defined(__VOLUME__) && defined(__VOLUME_RECORD_ALL__) */
# undef BVH_FEATURE
# undef BVH_NAME_JOIN
# undef BVH_NAME_EVAL
# undef BVH_FUNCTION_FULL_NAME
#endif /* __BVH2__ */
CCL_NAMESPACE_END
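
Note on the ray mask used in the OptiX and MetalRT branches above: both fold the full Cycles visibility flags into an 8-bit mask before tracing, and pass the unmodified `visibility` along in the payload (`p4` / `payload.visibility`). The standalone sketch below restates that folding so the edge cases are easy to check. The function name is hypothetical; the diff inlines this logic at each call site. The reading that finer filtering is then done from the payload is an assumption, not something the diff states.

```cpp
#include <cstdint>

// Hypothetical helper: the diff writes this logic inline, not as a function.
inline uint32_t fold_visibility_to_ray_mask(uint32_t visibility)
{
  // Keep only the low 8 bits of the visibility flags.
  uint32_t ray_mask = visibility & 0xFFu;

  // If the low bits are all clear but some higher bit is set, fall back to a
  // fully open mask so the ray is not rejected outright by the 8-bit mask.
  if (ray_mask == 0 && (visibility & ~0xFFu) != 0) {
    ray_mask = 0xFFu;
  }
  return ray_mask;
}
```

With this sketch, a visibility of `0x100` folds to `0xFF`, `0x103` folds to `0x03`, and `0` stays `0`.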


@@ -1,176 +0,0 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2018-2022 Blender Foundation. */
#pragma once
#include <embree3/rtcore_ray.h>
#include <embree3/rtcore_scene.h>
#include "kernel/device/cpu/compat.h"
#include "kernel/device/cpu/globals.h"
#include "kernel/bvh/util.h"
#include "util/vector.h"
CCL_NAMESPACE_BEGIN
struct CCLIntersectContext {
typedef enum {
RAY_REGULAR = 0,
RAY_SHADOW_ALL = 1,
RAY_LOCAL = 2,
RAY_SSS = 3,
RAY_VOLUME_ALL = 4,
} RayType;
KernelGlobals kg;
RayType type;
/* For avoiding self intersections */
const Ray *ray;
/* for shadow rays */
Intersection *isect_s;
uint max_hits;
uint num_hits;
uint num_recorded_hits;
float throughput;
float max_t;
bool opaque_hit;
/* for SSS Rays: */
LocalIntersection *local_isect;
int local_object_id;
uint *lcg_state;
CCLIntersectContext(KernelGlobals kg_, RayType type_)
{
kg = kg_;
type = type_;
ray = NULL;
max_hits = 1;
num_hits = 0;
num_recorded_hits = 0;
throughput = 1.0f;
max_t = FLT_MAX;
opaque_hit = false;
isect_s = NULL;
local_isect = NULL;
local_object_id = -1;
lcg_state = NULL;
}
};
class IntersectContext {
public:
IntersectContext(CCLIntersectContext *ctx)
{
rtcInitIntersectContext(&context);
userRayExt = ctx;
}
RTCIntersectContext context;
CCLIntersectContext *userRayExt;
};
ccl_device_inline void kernel_embree_setup_ray(const Ray &ray,
RTCRay &rtc_ray,
const uint visibility)
{
rtc_ray.org_x = ray.P.x;
rtc_ray.org_y = ray.P.y;
rtc_ray.org_z = ray.P.z;
rtc_ray.dir_x = ray.D.x;
rtc_ray.dir_y = ray.D.y;
rtc_ray.dir_z = ray.D.z;
rtc_ray.tnear = ray.tmin;
rtc_ray.tfar = ray.tmax;
rtc_ray.time = ray.time;
rtc_ray.mask = visibility;
}
ccl_device_inline void kernel_embree_setup_rayhit(const Ray &ray,
RTCRayHit &rayhit,
const uint visibility)
{
kernel_embree_setup_ray(ray, rayhit.ray, visibility);
rayhit.hit.geomID = RTC_INVALID_GEOMETRY_ID;
rayhit.hit.instID[0] = RTC_INVALID_GEOMETRY_ID;
}
ccl_device_inline bool kernel_embree_is_self_intersection(const KernelGlobals kg,
const RTCHit *hit,
const Ray *ray)
{
bool status = false;
if (hit->instID[0] != RTC_INVALID_GEOMETRY_ID) {
const int oID = hit->instID[0] / 2;
if ((ray->self.object == oID) || (ray->self.light_object == oID)) {
RTCScene inst_scene = (RTCScene)rtcGetGeometryUserData(
rtcGetGeometry(kernel_data.device_bvh, hit->instID[0]));
const int pID = hit->primID +
(intptr_t)rtcGetGeometryUserData(rtcGetGeometry(inst_scene, hit->geomID));
status = intersection_skip_self_shadow(ray->self, oID, pID);
}
}
else {
const int oID = hit->geomID / 2;
if ((ray->self.object == oID) || (ray->self.light_object == oID)) {
const int pID = hit->primID + (intptr_t)rtcGetGeometryUserData(
rtcGetGeometry(kernel_data.device_bvh, hit->geomID));
status = intersection_skip_self_shadow(ray->self, oID, pID);
}
}
return status;
}
ccl_device_inline void kernel_embree_convert_hit(KernelGlobals kg,
const RTCRay *ray,
const RTCHit *hit,
Intersection *isect)
{
isect->t = ray->tfar;
if (hit->instID[0] != RTC_INVALID_GEOMETRY_ID) {
RTCScene inst_scene = (RTCScene)rtcGetGeometryUserData(
rtcGetGeometry(kernel_data.device_bvh, hit->instID[0]));
isect->prim = hit->primID +
(intptr_t)rtcGetGeometryUserData(rtcGetGeometry(inst_scene, hit->geomID));
isect->object = hit->instID[0] / 2;
}
else {
isect->prim = hit->primID + (intptr_t)rtcGetGeometryUserData(
rtcGetGeometry(kernel_data.device_bvh, hit->geomID));
isect->object = hit->geomID / 2;
}
const bool is_hair = hit->geomID & 1;
if (is_hair) {
const KernelCurveSegment segment = kernel_data_fetch(curve_segments, isect->prim);
isect->type = segment.type;
isect->prim = segment.prim;
isect->u = hit->u;
isect->v = hit->v;
}
else {
isect->type = kernel_data_fetch(objects, isect->object).primitive_type;
isect->u = 1.0f - hit->v - hit->u;
isect->v = hit->u;
}
}
ccl_device_inline void kernel_embree_convert_sss_hit(
KernelGlobals kg, const RTCRay *ray, const RTCHit *hit, Intersection *isect, int object)
{
isect->u = 1.0f - hit->v - hit->u;
isect->v = hit->u;
isect->t = ray->tfar;
RTCScene inst_scene = (RTCScene)rtcGetGeometryUserData(
rtcGetGeometry(kernel_data.device_bvh, object * 2));
isect->prim = hit->primID +
(intptr_t)rtcGetGeometryUserData(rtcGetGeometry(inst_scene, hit->geomID));
isect->object = object;
isect->type = kernel_data_fetch(objects, object).primitive_type;
}
CCL_NAMESPACE_END


@@ -59,14 +59,10 @@ ccl_device_inline
const int object_flag = kernel_data_fetch(object_flag, local_object);
if (!(object_flag & SD_OBJECT_TRANSFORM_APPLIED)) {
#if BVH_FEATURE(BVH_MOTION)
Transform ob_itfm;
const float t_world_to_instance = bvh_instance_motion_push(
kg, local_object, ray, &P, &dir, &idir, &ob_itfm);
bvh_instance_motion_push(kg, local_object, ray, &P, &dir, &idir);
#else
const float t_world_to_instance = bvh_instance_push(kg, local_object, ray, &P, &dir, &idir);
bvh_instance_push(kg, local_object, ray, &P, &dir, &idir);
#endif
isect_t *= t_world_to_instance;
tmin *= t_world_to_instance;
object = local_object;
}


@@ -1,37 +0,0 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2021-2022 Blender Foundation */
struct MetalRTIntersectionPayload {
RaySelfPrimitives self;
uint visibility;
float u, v;
int prim;
int type;
#if defined(__METALRT_MOTION__)
float time;
#endif
};
struct MetalRTIntersectionLocalPayload {
RaySelfPrimitives self;
uint local_object;
uint lcg_state;
short max_hits;
bool has_lcg_state;
bool result;
LocalIntersection local_isect;
};
struct MetalRTIntersectionShadowPayload {
RaySelfPrimitives self;
uint visibility;
#if defined(__METALRT_MOTION__)
float time;
#endif
int state;
float throughput;
short max_hits;
short num_hits;
short num_recorded_hits;
bool result;
};


@@ -53,23 +53,11 @@ ccl_device_inline
int object = OBJECT_NONE;
uint num_hits = 0;
#if BVH_FEATURE(BVH_MOTION)
Transform ob_itfm;
#endif
/* Max distance in world space. May be dynamically reduced when max number of
* recorded hits is exceeded and we no longer need to find hits beyond the max
* distance found. */
float t_max_world = ray->tmax;
/* Current maximum distance to the intersection.
* Is calculated as a ray length, transformed to an object space when entering
* instance node. */
float t_max_current = ray->tmax;
/* Conversion from world to local space for the current instance if any, 1.0
* otherwise. */
float t_world_to_instance = 1.0f;
const float tmax = ray->tmax;
float tmax_hits = tmax;
*r_num_recorded_hits = 0;
*r_throughput = 1.0f;
@@ -90,7 +78,7 @@ ccl_device_inline
#endif
idir,
tmin,
t_max_current,
tmax,
node_addr,
visibility,
dist);
@@ -158,16 +146,8 @@ ccl_device_inline
switch (type & PRIMITIVE_ALL) {
case PRIMITIVE_TRIANGLE: {
hit = triangle_intersect(kg,
&isect,
P,
dir,
tmin,
t_max_current,
visibility,
prim_object,
prim,
prim_addr);
hit = triangle_intersect(
kg, &isect, P, dir, tmin, tmax, visibility, prim_object, prim, prim_addr);
break;
}
#if BVH_FEATURE(BVH_MOTION)
@@ -177,7 +157,7 @@ ccl_device_inline
P,
dir,
tmin,
t_max_current,
tmax,
ray->time,
visibility,
prim_object,
@@ -200,16 +180,8 @@ ccl_device_inline
}
const int curve_type = kernel_data_fetch(prim_type, prim_addr);
hit = curve_intersect(kg,
&isect,
P,
dir,
tmin,
t_max_current,
prim_object,
prim,
ray->time,
curve_type);
hit = curve_intersect(
kg, &isect, P, dir, tmin, tmax, prim_object, prim, ray->time, curve_type);
break;
}
@@ -226,16 +198,8 @@ ccl_device_inline
}
const int point_type = kernel_data_fetch(prim_type, prim_addr);
hit = point_intersect(kg,
&isect,
P,
dir,
tmin,
t_max_current,
prim_object,
prim,
ray->time,
point_type);
hit = point_intersect(
kg, &isect, P, dir, tmin, tmax, prim_object, prim, ray->time, point_type);
break;
}
#endif /* BVH_FEATURE(BVH_POINTCLOUD) */
@@ -247,9 +211,6 @@ ccl_device_inline
/* shadow ray early termination */
if (hit) {
/* Convert intersection distance to world space. */
isect.t /= t_world_to_instance;
/* detect if this surface has a shader with transparent shadows */
/* todo: optimize so primitive visibility flag indicates if
* the primitive has a transparent shadow shader? */
@@ -281,7 +242,7 @@ ccl_device_inline
if (record_intersection) {
/* Test if we need to record this transparent intersection. */
const uint max_record_hits = min(max_hits, INTEGRATOR_SHADOW_ISECT_SIZE);
if (*r_num_recorded_hits < max_record_hits || isect.t < t_max_world) {
if (*r_num_recorded_hits < max_record_hits || isect.t < tmax_hits) {
/* If maximum number of hits was reached, replace the intersection with the
* highest distance. We want to find the N closest intersections. */
const uint num_recorded_hits = min(*r_num_recorded_hits, max_record_hits);
@@ -303,7 +264,7 @@ ccl_device_inline
}
/* Limit the ray distance and stop counting hits beyond this. */
t_max_world = max(isect.t, max_t);
tmax_hits = max(isect.t, max_t);
}
integrator_state_write_shadow_isect(state, &isect, isect_index);
@@ -321,16 +282,11 @@ ccl_device_inline
object = kernel_data_fetch(prim_object, -prim_addr - 1);
#if BVH_FEATURE(BVH_MOTION)
t_world_to_instance = bvh_instance_motion_push(
kg, object, ray, &P, &dir, &idir, &ob_itfm);
bvh_instance_motion_push(kg, object, ray, &P, &dir, &idir);
#else
t_world_to_instance = bvh_instance_push(kg, object, ray, &P, &dir, &idir);
bvh_instance_push(kg, object, ray, &P, &dir, &idir);
#endif
/* Convert intersection to object space. */
t_max_current *= t_world_to_instance;
tmin *= t_world_to_instance;
++stack_ptr;
kernel_assert(stack_ptr < BVH_STACK_SIZE);
traversal_stack[stack_ptr] = ENTRYPOINT_SENTINEL;
@@ -344,18 +300,9 @@ ccl_device_inline
kernel_assert(object != OBJECT_NONE);
/* Instance pop. */
#if BVH_FEATURE(BVH_MOTION)
bvh_instance_motion_pop(kg, object, ray, &P, &dir, &idir, FLT_MAX, &ob_itfm);
#else
bvh_instance_pop(kg, object, ray, &P, &dir, &idir, FLT_MAX);
#endif
/* Restore world space ray length. */
tmin = ray->tmin;
t_max_current = ray->tmax;
bvh_instance_pop(ray, &P, &dir, &idir);
object = OBJECT_NONE;
t_world_to_instance = 1.0f;
node_addr = traversal_stack[stack_ptr];
--stack_ptr;
}
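
Note on the "N closest intersections" bookkeeping in the hunks above: once the record buffer is full, a new transparent hit replaces the farthest recorded one, and the distance limit is lowered so farther hits stop being recorded. The sketch below is a simplified standalone illustration of that idea, not a transcription of the kernel code. `HitRecorder`, `capacity`, `num_seen` and `t_limit` are hypothetical names standing in for the record buffer, `max_record_hits`, `*r_num_recorded_hits` and `tmax_hits`. It also tightens the limit to the farthest remaining hit, whereas the diff uses `max(isect.t, max_t)`.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Simplified illustration only; all names here are hypothetical.
struct HitRecorder {
  std::vector<float> hits; // recorded hit distances, unsorted
  uint32_t capacity;       // maximum number of hits to keep
  uint32_t num_seen = 0;   // counts every hit, even those not stored
  float t_limit;           // hits at or beyond this distance need not be kept

  HitRecorder(uint32_t capacity_, float tmax) : capacity(capacity_), t_limit(tmax) {}

  void record(float t)
  {
    if (capacity == 0) {
      ++num_seen;
      return;
    }
    if (hits.size() < capacity) {
      // Room left: append.
      hits.push_back(t);
    }
    else {
      // Buffer full: overwrite the farthest hit if the new one is closer.
      auto farthest = std::max_element(hits.begin(), hits.end());
      if (t < *farthest) {
        *farthest = t;
      }
    }
    if (hits.size() == capacity) {
      // Buffer is full, so nothing beyond the farthest kept hit is needed.
      t_limit = *std::max_element(hits.begin(), hits.end());
    }
    // Always count the hit, so the caller can detect an overflow and trace
    // another ray if needed.
    ++num_seen;
  }
};
```

Counting every hit, including those that were not stored, mirrors the comment in the diff about increasing the number of recorded hits beyond the maximum.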


@@ -43,13 +43,9 @@ ccl_device_noinline bool BVH_FUNCTION_FULL_NAME(BVH)(KernelGlobals kg,
float3 P = ray->P;
float3 dir = bvh_clamp_direction(ray->D);
float3 idir = bvh_inverse_direction(dir);
float tmin = ray->tmin;
const float tmin = ray->tmin;
int object = OBJECT_NONE;
#if BVH_FEATURE(BVH_MOTION)
Transform ob_itfm;
#endif
isect->t = ray->tmax;
isect->u = 0.0f;
isect->v = 0.0f;
@@ -223,15 +219,11 @@ ccl_device_noinline bool BVH_FUNCTION_FULL_NAME(BVH)(KernelGlobals kg,
object = kernel_data_fetch(prim_object, -prim_addr - 1);
#if BVH_FEATURE(BVH_MOTION)
const float t_world_to_instance = bvh_instance_motion_push(
kg, object, ray, &P, &dir, &idir, &ob_itfm);
bvh_instance_motion_push(kg, object, ray, &P, &dir, &idir);
#else
const float t_world_to_instance = bvh_instance_push(kg, object, ray, &P, &dir, &idir);
bvh_instance_push(kg, object, ray, &P, &dir, &idir);
#endif
isect->t *= t_world_to_instance;
tmin *= t_world_to_instance;
++stack_ptr;
kernel_assert(stack_ptr < BVH_STACK_SIZE);
traversal_stack[stack_ptr] = ENTRYPOINT_SENTINEL;
@@ -245,12 +237,7 @@ ccl_device_noinline bool BVH_FUNCTION_FULL_NAME(BVH)(KernelGlobals kg,
kernel_assert(object != OBJECT_NONE);
/* instance pop */
#if BVH_FEATURE(BVH_MOTION)
isect->t = bvh_instance_motion_pop(kg, object, ray, &P, &dir, &idir, isect->t, &ob_itfm);
#else
isect->t = bvh_instance_pop(kg, object, ray, &P, &dir, &idir, isect->t);
#endif
tmin = ray->tmin;
bvh_instance_pop(ray, &P, &dir, &idir);
object = OBJECT_NONE;
node_addr = traversal_stack[stack_ptr];


@@ -5,20 +5,35 @@
CCL_NAMESPACE_BEGIN
ccl_device_inline bool intersection_ray_valid(ccl_private const Ray *ray)
{
/* NOTE: Due to some vectorization code non-finite origin point might
* cause lots of false-positive intersections which will overflow traversal
* stack.
* This code is a quick way to perform early output, to avoid crashes in
* such cases.
* From production scenes so far it seems it's enough to test first element
* only.
* Scene intersection may also called with empty rays for conditional trace
* calls that evaluate to false, so filter those out.
*/
return isfinite_safe(ray->P.x) && isfinite_safe(ray->D.x) && len_squared(ray->D) != 0.0f;
}
/* Offset intersection distance by the smallest possible amount, to skip
* intersections at this distance. This works in cases where the ray start
* position is unchanged and only tmin is updated, since for self
* intersection we'll be comparing against the exact same distances. */
ccl_device_forceinline float intersection_t_offset(const float t)
{
/* This is a simplified version of nextafterf(t, FLT_MAX), only dealing with
/* This is a simplified version of `nextafterf(t, FLT_MAX)`, only dealing with
* non-negative and finite t. */
kernel_assert(t >= 0.0f && isfinite_safe(t));
const uint32_t bits = (t == 0.0f) ? 1 : __float_as_uint(t) + 1;
return __uint_as_float(bits);
}
#if defined(__KERNEL_CPU__)
#ifndef __KERNEL_GPU__
ccl_device int intersections_compare(const void *a, const void *b)
{
const Intersection *isect_a = (const Intersection *)a;
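
Note on `intersection_t_offset` in the hunk above: the comment describes it as a simplified `nextafterf(t, FLT_MAX)` for non-negative, finite `t`. The host-side sketch below shows the same bit trick in portable C++, using `memcpy` where the kernel uses `__float_as_uint` / `__uint_as_float`. The function name `t_offset_sketch` is hypothetical. Like the original, it is only meaningful for non-negative finite inputs, and it does not match `nextafter` at `FLT_MAX`, where incrementing the bits yields infinity.

```cpp
#include <cfloat>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical host-side restatement of the bit increment shown in the diff.
inline float t_offset_sketch(float t)
{
  // Reinterpret the float as its IEEE-754 bit pattern.
  uint32_t bits;
  std::memcpy(&bits, &t, sizeof(bits));

  // For non-negative finite floats, adding 1 to the bit pattern gives the
  // next representable value. Zero is special-cased to the smallest
  // positive subnormal, which also covers an input of -0.0f.
  bits = (t == 0.0f) ? 1u : bits + 1u;

  float result;
  std::memcpy(&result, &bits, sizeof(result));
  return result;
}
```

For example, `t_offset_sketch(1.0f)` should compare equal to `std::nextafter(1.0f, FLT_MAX)`.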


@@ -46,13 +46,9 @@ ccl_device_inline
float3 P = ray->P;
float3 dir = bvh_clamp_direction(ray->D);
float3 idir = bvh_inverse_direction(dir);
float tmin = ray->tmin;
const float tmin = ray->tmin;
int object = OBJECT_NONE;
#if BVH_FEATURE(BVH_MOTION)
Transform ob_itfm;
#endif
isect->t = ray->tmax;
isect->u = 0.0f;
isect->v = 0.0f;
@@ -189,15 +185,11 @@ ccl_device_inline
int object_flag = kernel_data_fetch(object_flag, object);
if (object_flag & SD_OBJECT_HAS_VOLUME) {
#if BVH_FEATURE(BVH_MOTION)
const float t_world_to_instance = bvh_instance_motion_push(
kg, object, ray, &P, &dir, &idir, &ob_itfm);
bvh_instance_motion_push(kg, object, ray, &P, &dir, &idir);
#else
const float t_world_to_instance = bvh_instance_push(kg, object, ray, &P, &dir, &idir);
bvh_instance_push(kg, object, ray, &P, &dir, &idir);
#endif
isect->t *= t_world_to_instance;
tmin *= t_world_to_instance;
++stack_ptr;
kernel_assert(stack_ptr < BVH_STACK_SIZE);
traversal_stack[stack_ptr] = ENTRYPOINT_SENTINEL;
@@ -218,13 +210,7 @@ ccl_device_inline
kernel_assert(object != OBJECT_NONE);
/* instance pop */
#if BVH_FEATURE(BVH_MOTION)
isect->t = bvh_instance_motion_pop(kg, object, ray, &P, &dir, &idir, isect->t, &ob_itfm);
#else
isect->t = bvh_instance_pop(kg, object, ray, &P, &dir, &idir, isect->t);
#endif
tmin = ray->tmin;
bvh_instance_pop(ray, &P, &dir, &idir);
object = OBJECT_NONE;
node_addr = traversal_stack[stack_ptr];


@@ -47,14 +47,10 @@ ccl_device_inline
float3 P = ray->P;
float3 dir = bvh_clamp_direction(ray->D);
float3 idir = bvh_inverse_direction(dir);
float tmin = ray->tmin;
const float tmin = ray->tmin;
int object = OBJECT_NONE;
float isect_t = ray->tmax;
#if BVH_FEATURE(BVH_MOTION)
Transform ob_itfm;
#endif
int num_hits_in_instance = 0;
uint num_hits = 0;
@@ -159,18 +155,6 @@ ccl_device_inline
num_hits_in_instance++;
isect_array->t = isect_t;
if (num_hits == max_hits) {
if (object != OBJECT_NONE) {
#if BVH_FEATURE(BVH_MOTION)
float t_fac = 1.0f / len(transform_direction(&ob_itfm, dir));
#else
Transform itfm = object_fetch_transform(
kg, object, OBJECT_INVERSE_TRANSFORM);
float t_fac = 1.0f / len(transform_direction(&itfm, dir));
#endif
for (int i = 0; i < num_hits_in_instance; i++) {
(isect_array - i - 1)->t *= t_fac;
}
}
return num_hits;
}
}
@@ -212,18 +196,6 @@ ccl_device_inline
num_hits_in_instance++;
isect_array->t = isect_t;
if (num_hits == max_hits) {
if (object != OBJECT_NONE) {
# if BVH_FEATURE(BVH_MOTION)
float t_fac = 1.0f / len(transform_direction(&ob_itfm, dir));
# else
Transform itfm = object_fetch_transform(
kg, object, OBJECT_INVERSE_TRANSFORM);
float t_fac = 1.0f / len(transform_direction(&itfm, dir));
# endif
for (int i = 0; i < num_hits_in_instance; i++) {
(isect_array - i - 1)->t *= t_fac;
}
}
return num_hits;
}
}
@@ -242,15 +214,11 @@ ccl_device_inline
int object_flag = kernel_data_fetch(object_flag, object);
if (object_flag & SD_OBJECT_HAS_VOLUME) {
#if BVH_FEATURE(BVH_MOTION)
const float t_world_to_instance = bvh_instance_motion_push(
kg, object, ray, &P, &dir, &idir, &ob_itfm);
bvh_instance_motion_push(kg, object, ray, &P, &dir, &idir);
#else
const float t_world_to_instance = bvh_instance_push(kg, object, ray, &P, &dir, &idir);
bvh_instance_push(kg, object, ray, &P, &dir, &idir);
#endif
isect_t *= t_world_to_instance;
tmin *= t_world_to_instance;
num_hits_in_instance = 0;
isect_array->t = isect_t;
@@ -274,29 +242,7 @@ ccl_device_inline
kernel_assert(object != OBJECT_NONE);
/* Instance pop. */
if (num_hits_in_instance) {
float t_fac;
#if BVH_FEATURE(BVH_MOTION)
bvh_instance_motion_pop_factor(kg, object, ray, &P, &dir, &idir, &t_fac, &ob_itfm);
#else
bvh_instance_pop_factor(kg, object, ray, &P, &dir, &idir, &t_fac);
#endif
/* Scale isect->t to adjust for instancing. */
for (int i = 0; i < num_hits_in_instance; i++) {
(isect_array - i - 1)->t *= t_fac;
}
}
else {
#if BVH_FEATURE(BVH_MOTION)
bvh_instance_motion_pop(kg, object, ray, &P, &dir, &idir, FLT_MAX, &ob_itfm);
#else
bvh_instance_pop(kg, object, ray, &P, &dir, &idir, FLT_MAX);
#endif
}
tmin = ray->tmin;
isect_t = ray->tmax;
isect_array->t = isect_t;
bvh_instance_pop(ray, &P, &dir, &idir);
object = OBJECT_NONE;
node_addr = traversal_stack[stack_ptr];


@@ -3,7 +3,7 @@
#pragma once
#ifdef __KERNEL_CPU__
#ifndef __KERNEL_GPU__
# include <fenv.h>
#endif


@@ -70,7 +70,7 @@ KERNEL_STRUCT_MEMBER(film, float4, rec709_to_r)
KERNEL_STRUCT_MEMBER(film, float4, rec709_to_g)
KERNEL_STRUCT_MEMBER(film, float4, rec709_to_b)
KERNEL_STRUCT_MEMBER(film, int, is_rec709)
/* Exposuse. */
/* Exposure. */
KERNEL_STRUCT_MEMBER(film, float, exposure)
/* Passed used. */
KERNEL_STRUCT_MEMBER(film, int, pass_flag)


@@ -0,0 +1,572 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2021-2022 Blender Foundation */
/* CPU Embree implementation of ray-scene intersection. */
#pragma once
#include <embree3/rtcore_ray.h>
#include <embree3/rtcore_scene.h>
#include "kernel/device/cpu/compat.h"
#include "kernel/device/cpu/globals.h"
#include "kernel/bvh/types.h"
#include "kernel/bvh/util.h"
#include "kernel/geom/object.h"
#include "kernel/integrator/state.h"
#include "kernel/sample/lcg.h"
#include "util/vector.h"
CCL_NAMESPACE_BEGIN
#define EMBREE_IS_HAIR(x) (x & 1)
/* Intersection context. */
struct CCLIntersectContext {
typedef enum {
RAY_REGULAR = 0,
RAY_SHADOW_ALL = 1,
RAY_LOCAL = 2,
RAY_SSS = 3,
RAY_VOLUME_ALL = 4,
} RayType;
KernelGlobals kg;
RayType type;
/* For avoiding self intersections */
const Ray *ray;
/* for shadow rays */
Intersection *isect_s;
uint max_hits;
uint num_hits;
uint num_recorded_hits;
float throughput;
float max_t;
bool opaque_hit;
/* for SSS Rays: */
LocalIntersection *local_isect;
int local_object_id;
uint *lcg_state;
CCLIntersectContext(KernelGlobals kg_, RayType type_)
{
kg = kg_;
type = type_;
ray = NULL;
max_hits = 1;
num_hits = 0;
num_recorded_hits = 0;
throughput = 1.0f;
max_t = FLT_MAX;
opaque_hit = false;
isect_s = NULL;
local_isect = NULL;
local_object_id = -1;
lcg_state = NULL;
}
};
class IntersectContext {
public:
IntersectContext(CCLIntersectContext *ctx)
{
rtcInitIntersectContext(&context);
userRayExt = ctx;
}
RTCIntersectContext context;
CCLIntersectContext *userRayExt;
};
/* Utilities. */
ccl_device_inline void kernel_embree_setup_ray(const Ray &ray,
RTCRay &rtc_ray,
const uint visibility)
{
rtc_ray.org_x = ray.P.x;
rtc_ray.org_y = ray.P.y;
rtc_ray.org_z = ray.P.z;
rtc_ray.dir_x = ray.D.x;
rtc_ray.dir_y = ray.D.y;
rtc_ray.dir_z = ray.D.z;
rtc_ray.tnear = ray.tmin;
rtc_ray.tfar = ray.tmax;
rtc_ray.time = ray.time;
rtc_ray.mask = visibility;
}
ccl_device_inline void kernel_embree_setup_rayhit(const Ray &ray,
RTCRayHit &rayhit,
const uint visibility)
{
kernel_embree_setup_ray(ray, rayhit.ray, visibility);
rayhit.hit.geomID = RTC_INVALID_GEOMETRY_ID;
rayhit.hit.instID[0] = RTC_INVALID_GEOMETRY_ID;
}
ccl_device_inline bool kernel_embree_is_self_intersection(const KernelGlobals kg,
const RTCHit *hit,
const Ray *ray)
{
bool status = false;
if (hit->instID[0] != RTC_INVALID_GEOMETRY_ID) {
const int oID = hit->instID[0] / 2;
if ((ray->self.object == oID) || (ray->self.light_object == oID)) {
RTCScene inst_scene = (RTCScene)rtcGetGeometryUserData(
rtcGetGeometry(kernel_data.device_bvh, hit->instID[0]));
const int pID = hit->primID +
(intptr_t)rtcGetGeometryUserData(rtcGetGeometry(inst_scene, hit->geomID));
status = intersection_skip_self_shadow(ray->self, oID, pID);
}
}
else {
const int oID = hit->geomID / 2;
if ((ray->self.object == oID) || (ray->self.light_object == oID)) {
const int pID = hit->primID + (intptr_t)rtcGetGeometryUserData(
rtcGetGeometry(kernel_data.device_bvh, hit->geomID));
status = intersection_skip_self_shadow(ray->self, oID, pID);
}
}
return status;
}
ccl_device_inline void kernel_embree_convert_hit(KernelGlobals kg,
const RTCRay *ray,
const RTCHit *hit,
Intersection *isect)
{
isect->t = ray->tfar;
if (hit->instID[0] != RTC_INVALID_GEOMETRY_ID) {
RTCScene inst_scene = (RTCScene)rtcGetGeometryUserData(
rtcGetGeometry(kernel_data.device_bvh, hit->instID[0]));
isect->prim = hit->primID +
(intptr_t)rtcGetGeometryUserData(rtcGetGeometry(inst_scene, hit->geomID));
isect->object = hit->instID[0] / 2;
}
else {
isect->prim = hit->primID + (intptr_t)rtcGetGeometryUserData(
rtcGetGeometry(kernel_data.device_bvh, hit->geomID));
isect->object = hit->geomID / 2;
}
const bool is_hair = hit->geomID & 1;
if (is_hair) {
const KernelCurveSegment segment = kernel_data_fetch(curve_segments, isect->prim);
isect->type = segment.type;
isect->prim = segment.prim;
isect->u = hit->u;
isect->v = hit->v;
}
else {
isect->type = kernel_data_fetch(objects, isect->object).primitive_type;
isect->u = hit->u;
isect->v = hit->v;
}
}
ccl_device_inline void kernel_embree_convert_sss_hit(
KernelGlobals kg, const RTCRay *ray, const RTCHit *hit, Intersection *isect, int object)
{
isect->u = hit->u;
isect->v = hit->v;
isect->t = ray->tfar;
RTCScene inst_scene = (RTCScene)rtcGetGeometryUserData(
rtcGetGeometry(kernel_data.device_bvh, object * 2));
isect->prim = hit->primID +
(intptr_t)rtcGetGeometryUserData(rtcGetGeometry(inst_scene, hit->geomID));
isect->object = object;
isect->type = kernel_data_fetch(objects, object).primitive_type;
}
/* Ray filter functions. */
/* This gets called by Embree at every valid ray/object intersection.
* Things like recording subsurface or shadow hits for later evaluation
* as well as filtering for volume objects happen here.
* Cycles' own BVH does that directly inside the traversal calls. */
ccl_device void kernel_embree_filter_intersection_func(const RTCFilterFunctionNArguments *args)
{
/* Current implementation in Cycles assumes only single-ray intersection queries. */
assert(args->N == 1);
RTCHit *hit = (RTCHit *)args->hit;
CCLIntersectContext *ctx = ((IntersectContext *)args->context)->userRayExt;
const KernelGlobalsCPU *kg = ctx->kg;
const Ray *cray = ctx->ray;
if (kernel_embree_is_self_intersection(kg, hit, cray)) {
*args->valid = 0;
}
}
/* This gets called by Embree at every valid ray/object intersection.
* Things like recording subsurface or shadow hits for later evaluation
* as well as filtering for volume objects happen here.
* Cycles' own BVH does that directly inside the traversal calls.
*/
ccl_device void kernel_embree_filter_occluded_func(const RTCFilterFunctionNArguments *args)
{
/* Current implementation in Cycles assumes only single-ray intersection queries. */
assert(args->N == 1);
const RTCRay *ray = (RTCRay *)args->ray;
RTCHit *hit = (RTCHit *)args->hit;
CCLIntersectContext *ctx = ((IntersectContext *)args->context)->userRayExt;
const KernelGlobalsCPU *kg = ctx->kg;
const Ray *cray = ctx->ray;
switch (ctx->type) {
case CCLIntersectContext::RAY_SHADOW_ALL: {
Intersection current_isect;
kernel_embree_convert_hit(kg, ray, hit, &current_isect);
if (intersection_skip_self_shadow(cray->self, current_isect.object, current_isect.prim)) {
*args->valid = 0;
return;
}
/* If no transparent shadows or max number of hits exceeded, all light is blocked. */
const int flags = intersection_get_shader_flags(kg, current_isect.prim, current_isect.type);
if (!(flags & (SD_HAS_TRANSPARENT_SHADOW)) || ctx->num_hits >= ctx->max_hits) {
ctx->opaque_hit = true;
return;
}
++ctx->num_hits;
/* Always use baked shadow transparency for curves. */
if (current_isect.type & PRIMITIVE_CURVE) {
ctx->throughput *= intersection_curve_shadow_transparency(
kg, current_isect.object, current_isect.prim, current_isect.u);
if (ctx->throughput < CURVE_SHADOW_TRANSPARENCY_CUTOFF) {
ctx->opaque_hit = true;
return;
}
else {
*args->valid = 0;
return;
}
}
/* Test if we need to record this transparent intersection. */
const uint max_record_hits = min(ctx->max_hits, INTEGRATOR_SHADOW_ISECT_SIZE);
if (ctx->num_recorded_hits < max_record_hits || ray->tfar < ctx->max_t) {
/* If maximum number of hits was reached, replace the intersection with the
* highest distance. We want to find the N closest intersections. */
const uint num_recorded_hits = min(ctx->num_recorded_hits, max_record_hits);
uint isect_index = num_recorded_hits;
if (num_recorded_hits + 1 >= max_record_hits) {
float max_t = ctx->isect_s[0].t;
uint max_recorded_hit = 0;
for (uint i = 1; i < num_recorded_hits; ++i) {
if (ctx->isect_s[i].t > max_t) {
max_recorded_hit = i;
max_t = ctx->isect_s[i].t;
}
}
if (num_recorded_hits >= max_record_hits) {
isect_index = max_recorded_hit;
}
/* Limit the ray distance and stop counting hits beyond this.
* TODO: is there some way we can tell Embree to stop intersecting beyond
* this distance when max number of hits is reached? Or maybe it will
* become irrelevant if we make max_hits a very high number on the CPU. */
ctx->max_t = max(current_isect.t, max_t);
}
ctx->isect_s[isect_index] = current_isect;
}
/* Always increase the number of recorded hits, even beyond the maximum,
* so that we can detect this and trace another ray if needed. */
++ctx->num_recorded_hits;
/* This tells Embree to continue tracing. */
*args->valid = 0;
break;
}
case CCLIntersectContext::RAY_LOCAL:
case CCLIntersectContext::RAY_SSS: {
/* Check if it's hitting the correct object. */
Intersection current_isect;
if (ctx->type == CCLIntersectContext::RAY_SSS) {
kernel_embree_convert_sss_hit(kg, ray, hit, &current_isect, ctx->local_object_id);
}
else {
kernel_embree_convert_hit(kg, ray, hit, &current_isect);
if (ctx->local_object_id != current_isect.object) {
/* This tells Embree to continue tracing. */
*args->valid = 0;
break;
}
}
if (intersection_skip_self_local(cray->self, current_isect.prim)) {
*args->valid = 0;
return;
}
/* No intersection information requested, just return a hit. */
if (ctx->max_hits == 0) {
break;
}
/* Ignore curves. */
if (EMBREE_IS_HAIR(hit->geomID)) {
/* This tells Embree to continue tracing. */
*args->valid = 0;
break;
}
LocalIntersection *local_isect = ctx->local_isect;
int hit_idx = 0;
if (ctx->lcg_state) {
/* See triangle_intersect_subsurface() for the native equivalent. */
for (int i = min((int)ctx->max_hits, local_isect->num_hits) - 1; i >= 0; --i) {
if (local_isect->hits[i].t == ray->tfar) {
/* This tells Embree to continue tracing. */
*args->valid = 0;
return;
}
}
local_isect->num_hits++;
if (local_isect->num_hits <= ctx->max_hits) {
hit_idx = local_isect->num_hits - 1;
}
else {
/* Reservoir sampling: if we are at the maximum number of
* hits, randomly replace an element or skip it. */
hit_idx = lcg_step_uint(ctx->lcg_state) % local_isect->num_hits;
if (hit_idx >= ctx->max_hits) {
/* This tells Embree to continue tracing. */
*args->valid = 0;
return;
}
}
}
else {
/* Record closest intersection only. */
if (local_isect->num_hits && current_isect.t > local_isect->hits[0].t) {
*args->valid = 0;
return;
}
local_isect->num_hits = 1;
}
/* record intersection */
local_isect->hits[hit_idx] = current_isect;
local_isect->Ng[hit_idx] = normalize(make_float3(hit->Ng_x, hit->Ng_y, hit->Ng_z));
/* This tells Embree to continue tracing. */
*args->valid = 0;
break;
}
case CCLIntersectContext::RAY_VOLUME_ALL: {
/* Append the intersection to the end of the array. */
if (ctx->num_hits < ctx->max_hits) {
Intersection current_isect;
kernel_embree_convert_hit(kg, ray, hit, &current_isect);
if (intersection_skip_self(cray->self, current_isect.object, current_isect.prim)) {
*args->valid = 0;
return;
}
Intersection *isect = &ctx->isect_s[ctx->num_hits];
++ctx->num_hits;
*isect = current_isect;
/* Only primitives from volume object. */
uint tri_object = isect->object;
int object_flag = kernel_data_fetch(object_flag, tri_object);
if ((object_flag & SD_OBJECT_HAS_VOLUME) == 0) {
--ctx->num_hits;
}
/* This tells Embree to continue tracing. */
*args->valid = 0;
}
break;
}
case CCLIntersectContext::RAY_REGULAR:
default:
if (kernel_embree_is_self_intersection(kg, hit, cray)) {
*args->valid = 0;
return;
}
break;
}
}
ccl_device void kernel_embree_filter_func_backface_cull(const RTCFilterFunctionNArguments *args)
{
const RTCRay *ray = (RTCRay *)args->ray;
RTCHit *hit = (RTCHit *)args->hit;
/* Always ignore back-facing intersections. */
if (dot(make_float3(ray->dir_x, ray->dir_y, ray->dir_z),
make_float3(hit->Ng_x, hit->Ng_y, hit->Ng_z)) > 0.0f) {
*args->valid = 0;
return;
}
CCLIntersectContext *ctx = ((IntersectContext *)args->context)->userRayExt;
const KernelGlobalsCPU *kg = ctx->kg;
const Ray *cray = ctx->ray;
if (kernel_embree_is_self_intersection(kg, hit, cray)) {
*args->valid = 0;
}
}
ccl_device void kernel_embree_filter_occluded_func_backface_cull(
const RTCFilterFunctionNArguments *args)
{
const RTCRay *ray = (RTCRay *)args->ray;
RTCHit *hit = (RTCHit *)args->hit;
/* Always ignore back-facing intersections. */
if (dot(make_float3(ray->dir_x, ray->dir_y, ray->dir_z),
make_float3(hit->Ng_x, hit->Ng_y, hit->Ng_z)) > 0.0f) {
*args->valid = 0;
return;
}
kernel_embree_filter_occluded_func(args);
}
/* Scene intersection. */
ccl_device_intersect bool kernel_embree_intersect(KernelGlobals kg,
ccl_private const Ray *ray,
const uint visibility,
ccl_private Intersection *isect)
{
isect->t = ray->tmax;
CCLIntersectContext ctx(kg, CCLIntersectContext::RAY_REGULAR);
IntersectContext rtc_ctx(&ctx);
RTCRayHit ray_hit;
ctx.ray = ray;
kernel_embree_setup_rayhit(*ray, ray_hit, visibility);
rtcIntersect1(kernel_data.device_bvh, &rtc_ctx.context, &ray_hit);
if (ray_hit.hit.geomID == RTC_INVALID_GEOMETRY_ID ||
ray_hit.hit.primID == RTC_INVALID_GEOMETRY_ID) {
return false;
}
kernel_embree_convert_hit(kg, &ray_hit.ray, &ray_hit.hit, isect);
return true;
}
#ifdef __BVH_LOCAL__
ccl_device_intersect bool kernel_embree_intersect_local(KernelGlobals kg,
ccl_private const Ray *ray,
ccl_private LocalIntersection *local_isect,
int local_object,
ccl_private uint *lcg_state,
int max_hits)
{
const bool has_bvh = !(kernel_data_fetch(object_flag, local_object) &
SD_OBJECT_TRANSFORM_APPLIED);
CCLIntersectContext ctx(kg,
has_bvh ? CCLIntersectContext::RAY_SSS : CCLIntersectContext::RAY_LOCAL);
ctx.lcg_state = lcg_state;
ctx.max_hits = max_hits;
ctx.ray = ray;
ctx.local_isect = local_isect;
if (local_isect) {
local_isect->num_hits = 0;
}
ctx.local_object_id = local_object;
IntersectContext rtc_ctx(&ctx);
RTCRay rtc_ray;
kernel_embree_setup_ray(*ray, rtc_ray, PATH_RAY_ALL_VISIBILITY);
/* If this object has its own BVH, use it. */
if (has_bvh) {
RTCGeometry geom = rtcGetGeometry(kernel_data.device_bvh, local_object * 2);
if (geom) {
float3 P = ray->P;
float3 dir = ray->D;
float3 idir = ray->D;
bvh_instance_motion_push(kg, local_object, ray, &P, &dir, &idir);
rtc_ray.org_x = P.x;
rtc_ray.org_y = P.y;
rtc_ray.org_z = P.z;
rtc_ray.dir_x = dir.x;
rtc_ray.dir_y = dir.y;
rtc_ray.dir_z = dir.z;
rtc_ray.tnear = ray->tmin;
rtc_ray.tfar = ray->tmax;
RTCScene scene = (RTCScene)rtcGetGeometryUserData(geom);
kernel_assert(scene);
if (scene) {
rtcOccluded1(scene, &rtc_ctx.context, &rtc_ray);
}
}
}
else {
rtcOccluded1(kernel_data.device_bvh, &rtc_ctx.context, &rtc_ray);
}
/* rtcOccluded1 sets tfar to -inf if a hit was found. */
return (local_isect && local_isect->num_hits > 0) || (rtc_ray.tfar < 0);
}
#endif
#ifdef __SHADOW_RECORD_ALL__
ccl_device_intersect bool kernel_embree_intersect_shadow_all(KernelGlobals kg,
IntegratorShadowStateCPU *state,
ccl_private const Ray *ray,
uint visibility,
uint max_hits,
ccl_private uint *num_recorded_hits,
ccl_private float *throughput)
{
CCLIntersectContext ctx(kg, CCLIntersectContext::RAY_SHADOW_ALL);
Intersection *isect_array = (Intersection *)state->shadow_isect;
ctx.isect_s = isect_array;
ctx.max_hits = max_hits;
ctx.ray = ray;
IntersectContext rtc_ctx(&ctx);
RTCRay rtc_ray;
kernel_embree_setup_ray(*ray, rtc_ray, visibility);
rtcOccluded1(kernel_data.device_bvh, &rtc_ctx.context, &rtc_ray);
*num_recorded_hits = ctx.num_recorded_hits;
*throughput = ctx.throughput;
return ctx.opaque_hit;
}
#endif
#ifdef __VOLUME__
ccl_device_intersect uint kernel_embree_intersect_volume(KernelGlobals kg,
ccl_private const Ray *ray,
ccl_private Intersection *isect,
const uint max_hits,
const uint visibility)
{
CCLIntersectContext ctx(kg, CCLIntersectContext::RAY_VOLUME_ALL);
ctx.isect_s = isect;
ctx.max_hits = max_hits;
ctx.num_hits = 0;
ctx.ray = ray;
IntersectContext rtc_ctx(&ctx);
RTCRay rtc_ray;
kernel_embree_setup_ray(*ray, rtc_ray, visibility);
rtcOccluded1(kernel_data.device_bvh, &rtc_ctx.context, &rtc_ray);
return ctx.num_hits;
}
#endif
CCL_NAMESPACE_END
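
The RAY_LOCAL / RAY_SSS branch of the occlusion filter above keeps a bounded set of hits by reservoir sampling. The following is a minimal standalone sketch of that idea, not the kernel code: `Reservoir`, `offer`, and `lcg_step` are hypothetical names, the LCG constants are an assumption, and the duplicate-distance check from the filter is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the kernel's lcg_step_uint(); the constants
// are an assumption made for this sketch.
inline std::uint32_t lcg_step(std::uint32_t &state)
{
  state = 1103515245u * state + 12345u;
  return state;
}

// Keeps at most `capacity` hit distances out of a stream of unknown length.
struct Reservoir {
  std::uint32_t capacity;
  std::uint32_t num_seen = 0;
  std::vector<float> hits;

  explicit Reservoir(std::uint32_t cap) : capacity(cap)
  {
    assert(cap > 0);
    hits.reserve(cap);
  }

  // Returns true if the hit was stored, false if it was skipped.
  bool offer(float t, std::uint32_t &rng)
  {
    ++num_seen;
    if (num_seen <= capacity) {
      // Still room: append.
      hits.push_back(t);
      return true;
    }
    // Full: pick a slot among all hits seen so far. Indices past the
    // capacity mean the new hit is skipped.
    const std::uint32_t idx = lcg_step(rng) % num_seen;
    if (idx >= capacity) {
      return false;
    }
    hits[idx] = t;
    return true;
  }
};
```

As in the filter, the counter keeps growing past the capacity, so a caller can tell that more hits were seen than stored.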


@@ -3,8 +3,6 @@
#pragma once
#define __KERNEL_CPU__
/* Release kernel has too many false-positive maybe-uninitialized warnings,
* which makes it possible to miss actual warnings.
*/
@@ -35,38 +33,4 @@ CCL_NAMESPACE_BEGIN
#define kernel_assert(cond) assert(cond)
/* Macros to handle different memory storage on different devices */
#ifdef __KERNEL_SSE2__
typedef vector3<sseb> sse3b;
typedef vector3<ssef> sse3f;
typedef vector3<ssei> sse3i;
ccl_device_inline void print_sse3b(const char *label, sse3b &a)
{
print_sseb(label, a.x);
print_sseb(label, a.y);
print_sseb(label, a.z);
}
ccl_device_inline void print_sse3f(const char *label, sse3f &a)
{
print_ssef(label, a.x);
print_ssef(label, a.y);
print_ssef(label, a.z);
}
ccl_device_inline void print_sse3i(const char *label, sse3i &a)
{
print_ssei(label, a.x);
print_ssei(label, a.y);
print_ssei(label, a.z);
}
# if defined(__KERNEL_AVX__) || defined(__KERNEL_AVX2__)
typedef vector3<avxf> avx3f;
# endif
#endif
CCL_NAMESPACE_END


@@ -0,0 +1,360 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2021-2022 Blender Foundation */
/* MetalRT implementation of ray-scene intersection. */
#pragma once
#include "kernel/bvh/types.h"
#include "kernel/bvh/util.h"
CCL_NAMESPACE_BEGIN
/* Payload types. */
struct MetalRTIntersectionPayload {
RaySelfPrimitives self;
uint visibility;
float u, v;
int prim;
int type;
#if defined(__METALRT_MOTION__)
float time;
#endif
};
struct MetalRTIntersectionLocalPayload {
RaySelfPrimitives self;
uint local_object;
uint lcg_state;
short max_hits;
bool has_lcg_state;
bool result;
LocalIntersection local_isect;
};
struct MetalRTIntersectionShadowPayload {
RaySelfPrimitives self;
uint visibility;
#if defined(__METALRT_MOTION__)
float time;
#endif
int state;
float throughput;
short max_hits;
short num_hits;
short num_recorded_hits;
bool result;
};
/* Scene intersection. */
ccl_device_intersect bool scene_intersect(KernelGlobals kg,
ccl_private const Ray *ray,
const uint visibility,
ccl_private Intersection *isect)
{
if (!intersection_ray_valid(ray)) {
isect->t = ray->tmax;
isect->type = PRIMITIVE_NONE;
return false;
}
#if defined(__KERNEL_DEBUG__)
if (is_null_instance_acceleration_structure(metal_ancillaries->accel_struct)) {
isect->t = ray->tmax;
isect->type = PRIMITIVE_NONE;
kernel_assert(!"Invalid metal_ancillaries->accel_struct pointer");
return false;
}
if (is_null_intersection_function_table(metal_ancillaries->ift_default)) {
isect->t = ray->tmax;
isect->type = PRIMITIVE_NONE;
kernel_assert(!"Invalid ift_default");
return false;
}
#endif
metal::raytracing::ray r(ray->P, ray->D, ray->tmin, ray->tmax);
metalrt_intersector_type metalrt_intersect;
if (!kernel_data.bvh.have_curves) {
metalrt_intersect.assume_geometry_type(metal::raytracing::geometry_type::triangle);
}
MetalRTIntersectionPayload payload;
payload.self = ray->self;
payload.u = 0.0f;
payload.v = 0.0f;
payload.visibility = visibility;
typename metalrt_intersector_type::result_type intersection;
uint ray_mask = visibility & 0xFF;
if (0 == ray_mask && (visibility & ~0xFF) != 0) {
ray_mask = 0xFF;
/* No further intersector setup required: Default MetalRT behavior is any-hit. */
}
else if (visibility & PATH_RAY_SHADOW_OPAQUE) {
/* No further intersector setup required: Shadow ray early termination is controlled by the
* intersection handler */
}
#if defined(__METALRT_MOTION__)
payload.time = ray->time;
intersection = metalrt_intersect.intersect(r,
metal_ancillaries->accel_struct,
ray_mask,
ray->time,
metal_ancillaries->ift_default,
payload);
#else
intersection = metalrt_intersect.intersect(
r, metal_ancillaries->accel_struct, ray_mask, metal_ancillaries->ift_default, payload);
#endif
if (intersection.type == intersection_type::none) {
isect->t = ray->tmax;
isect->type = PRIMITIVE_NONE;
return false;
}
isect->t = intersection.distance;
isect->prim = payload.prim;
isect->type = payload.type;
isect->object = intersection.user_instance_id;
isect->t = intersection.distance;
if (intersection.type == intersection_type::triangle) {
isect->u = intersection.triangle_barycentric_coord.x;
isect->v = intersection.triangle_barycentric_coord.y;
}
else {
isect->u = payload.u;
isect->v = payload.v;
}
return isect->type != PRIMITIVE_NONE;
}
#ifdef __BVH_LOCAL__
ccl_device_intersect bool scene_intersect_local(KernelGlobals kg,
ccl_private const Ray *ray,
ccl_private LocalIntersection *local_isect,
int local_object,
ccl_private uint *lcg_state,
int max_hits)
{
if (!intersection_ray_valid(ray)) {
if (local_isect) {
local_isect->num_hits = 0;
}
return false;
}
# if defined(__KERNEL_DEBUG__)
if (is_null_instance_acceleration_structure(metal_ancillaries->accel_struct)) {
if (local_isect) {
local_isect->num_hits = 0;
}
kernel_assert(!"Invalid metal_ancillaries->accel_struct pointer");
return false;
}
if (is_null_intersection_function_table(metal_ancillaries->ift_local)) {
if (local_isect) {
local_isect->num_hits = 0;
}
kernel_assert(!"Invalid ift_local");
return false;
}
# endif
metal::raytracing::ray r(ray->P, ray->D, ray->tmin, ray->tmax);
metalrt_intersector_type metalrt_intersect;
metalrt_intersect.force_opacity(metal::raytracing::forced_opacity::non_opaque);
if (!kernel_data.bvh.have_curves) {
metalrt_intersect.assume_geometry_type(metal::raytracing::geometry_type::triangle);
}
MetalRTIntersectionLocalPayload payload;
payload.self = ray->self;
payload.local_object = local_object;
payload.max_hits = max_hits;
payload.local_isect.num_hits = 0;
if (lcg_state) {
payload.has_lcg_state = true;
payload.lcg_state = *lcg_state;
}
payload.result = false;
typename metalrt_intersector_type::result_type intersection;
# if defined(__METALRT_MOTION__)
intersection = metalrt_intersect.intersect(
r, metal_ancillaries->accel_struct, 0xFF, ray->time, metal_ancillaries->ift_local, payload);
# else
intersection = metalrt_intersect.intersect(
r, metal_ancillaries->accel_struct, 0xFF, metal_ancillaries->ift_local, payload);
# endif
if (lcg_state) {
*lcg_state = payload.lcg_state;
}
*local_isect = payload.local_isect;
return payload.result;
}
#endif
#ifdef __SHADOW_RECORD_ALL__
ccl_device_intersect bool scene_intersect_shadow_all(KernelGlobals kg,
IntegratorShadowState state,
ccl_private const Ray *ray,
uint visibility,
uint max_hits,
ccl_private uint *num_recorded_hits,
ccl_private float *throughput)
{
if (!intersection_ray_valid(ray)) {
return false;
}
# if defined(__KERNEL_DEBUG__)
if (is_null_instance_acceleration_structure(metal_ancillaries->accel_struct)) {
kernel_assert(!"Invalid metal_ancillaries->accel_struct pointer");
return false;
}
if (is_null_intersection_function_table(metal_ancillaries->ift_shadow)) {
kernel_assert(!"Invalid ift_shadow");
return false;
}
# endif
metal::raytracing::ray r(ray->P, ray->D, ray->tmin, ray->tmax);
metalrt_intersector_type metalrt_intersect;
metalrt_intersect.force_opacity(metal::raytracing::forced_opacity::non_opaque);
if (!kernel_data.bvh.have_curves) {
metalrt_intersect.assume_geometry_type(metal::raytracing::geometry_type::triangle);
}
MetalRTIntersectionShadowPayload payload;
payload.self = ray->self;
payload.visibility = visibility;
payload.max_hits = max_hits;
payload.num_hits = 0;
payload.num_recorded_hits = 0;
payload.throughput = 1.0f;
payload.result = false;
payload.state = state;
uint ray_mask = visibility & 0xFF;
if (0 == ray_mask && (visibility & ~0xFF) != 0) {
ray_mask = 0xFF;
}
typename metalrt_intersector_type::result_type intersection;
# if defined(__METALRT_MOTION__)
payload.time = ray->time;
intersection = metalrt_intersect.intersect(r,
metal_ancillaries->accel_struct,
ray_mask,
ray->time,
metal_ancillaries->ift_shadow,
payload);
# else
intersection = metalrt_intersect.intersect(
r, metal_ancillaries->accel_struct, ray_mask, metal_ancillaries->ift_shadow, payload);
# endif
*num_recorded_hits = payload.num_recorded_hits;
*throughput = payload.throughput;
return payload.result;
}
#endif
#ifdef __VOLUME__
ccl_device_intersect bool scene_intersect_volume(KernelGlobals kg,
ccl_private const Ray *ray,
ccl_private Intersection *isect,
const uint visibility)
{
if (!intersection_ray_valid(ray)) {
return false;
}
# if defined(__KERNEL_DEBUG__)
if (is_null_instance_acceleration_structure(metal_ancillaries->accel_struct)) {
kernel_assert(!"Invalid metal_ancillaries->accel_struct pointer");
return false;
}
if (is_null_intersection_function_table(metal_ancillaries->ift_default)) {
kernel_assert(!"Invalid ift_default");
return false;
}
# endif
metal::raytracing::ray r(ray->P, ray->D, ray->tmin, ray->tmax);
metalrt_intersector_type metalrt_intersect;
metalrt_intersect.force_opacity(metal::raytracing::forced_opacity::non_opaque);
if (!kernel_data.bvh.have_curves) {
metalrt_intersect.assume_geometry_type(metal::raytracing::geometry_type::triangle);
}
MetalRTIntersectionPayload payload;
payload.self = ray->self;
payload.visibility = visibility;
typename metalrt_intersector_type::result_type intersection;
uint ray_mask = visibility & 0xFF;
if (0 == ray_mask && (visibility & ~0xFF) != 0) {
ray_mask = 0xFF;
}
# if defined(__METALRT_MOTION__)
payload.time = ray->time;
intersection = metalrt_intersect.intersect(r,
metal_ancillaries->accel_struct,
ray_mask,
ray->time,
metal_ancillaries->ift_default,
payload);
# else
intersection = metalrt_intersect.intersect(
r, metal_ancillaries->accel_struct, ray_mask, metal_ancillaries->ift_default, payload);
# endif
if (intersection.type == intersection_type::none) {
return false;
}
isect->prim = payload.prim;
isect->type = payload.type;
isect->object = intersection.user_instance_id;
isect->t = intersection.distance;
if (intersection.type == intersection_type::triangle) {
isect->u = intersection.triangle_barycentric_coord.x;
isect->v = intersection.triangle_barycentric_coord.y;
}
else {
isect->u = payload.u;
isect->v = payload.v;
}
return isect->type != PRIMITIVE_NONE;
}
#endif
CCL_NAMESPACE_END
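
The MetalRT entry points above all derive the ray mask from the visibility flags in the same way: keep the low 8 bits, and fall back to a full mask when only higher bits are set. A minimal sketch of that rule in plain C++ follows. The function name `ray_mask_from_visibility` is hypothetical and is not a helper in the source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the repeated mask logic: only the low
// 8 bits of the visibility flags are passed on as the ray mask.
inline std::uint32_t ray_mask_from_visibility(std::uint32_t visibility)
{
  std::uint32_t mask = visibility & 0xFFu;
  if (mask == 0 && (visibility & ~0xFFu) != 0) {
    // Only high bits were set: use a full mask so the ray is not
    // rejected outright; the intersection handlers in the source do the
    // precise visibility test.
    mask = 0xFFu;
  }
  // The result always fits in 8 bits.
  assert((mask & ~0xFFu) == 0);
  return mask;
}
```

A visibility of zero still yields a zero mask, which matches the source: the fallback only applies when some higher bit is set.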


@@ -260,8 +260,6 @@ void kernel_gpu_##name::run(thread MetalKernelContext& context, \
#ifdef __METALRT__
# define __KERNEL_GPU_RAYTRACING__
# if defined(__METALRT_MOTION__)
# define METALRT_TAGS instancing, instance_motion, primitive_motion
# else


@@ -1,41 +1,44 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2021-2022 Blender Foundation */
/* Metal kernel entry points */
/* Metal kernel entry points. */
#include "kernel/device/metal/compat.h"
#include "kernel/device/metal/globals.h"
#include "kernel/device/metal/function_constants.h"
#include "kernel/device/gpu/kernel.h"
/* MetalRT intersection handlers */
/* MetalRT intersection handlers. */
#ifdef __METALRT__
/* Return type for a bounding box intersection function. */
struct BoundingBoxIntersectionResult
{
/* Intersection return types. */
/* For a bounding box intersection function. */
struct BoundingBoxIntersectionResult {
bool accept [[accept_intersection]];
bool continue_search [[continue_search]];
float distance [[distance]];
};
/* Return type for a triangle intersection function. */
struct TriangleIntersectionResult
{
/* For a triangle intersection function. */
struct TriangleIntersectionResult {
bool accept [[accept_intersection]];
bool continue_search [[continue_search]];
bool continue_search [[continue_search]];
};
enum { METALRT_HIT_TRIANGLE, METALRT_HIT_BOUNDING_BOX };
ccl_device_inline bool intersection_skip_self(ray_data const RaySelfPrimitives& self,
/* Utilities. */
ccl_device_inline bool intersection_skip_self(ray_data const RaySelfPrimitives &self,
const int object,
const int prim)
{
return (self.prim == prim) && (self.object == object);
}
ccl_device_inline bool intersection_skip_self_shadow(ray_data const RaySelfPrimitives& self,
ccl_device_inline bool intersection_skip_self_shadow(ray_data const RaySelfPrimitives &self,
const int object,
const int prim)
{
@@ -43,12 +46,14 @@ ccl_device_inline bool intersection_skip_self_shadow(ray_data const RaySelfPrimi
((self.light_prim == prim) && (self.light_object == object));
}
ccl_device_inline bool intersection_skip_self_local(ray_data const RaySelfPrimitives& self,
ccl_device_inline bool intersection_skip_self_local(ray_data const RaySelfPrimitives &self,
const int prim)
{
return (self.prim == prim);
}
/* Hit functions. */
template<typename TReturn, uint intersection_type>
TReturn metalrt_local_hit(constant KernelParamsMetal &launch_params_metal,
ray_data MetalKernelContext::MetalRTIntersectionLocalPayload &payload,
@@ -58,7 +63,7 @@ TReturn metalrt_local_hit(constant KernelParamsMetal &launch_params_metal,
const float ray_tmax)
{
TReturn result;
#ifdef __BVH_LOCAL__
uint prim = primitive_id + kernel_data_fetch(object_prim_offset, object);
@@ -101,7 +106,8 @@ TReturn metalrt_local_hit(constant KernelParamsMetal &launch_params_metal,
}
else {
if (payload.local_isect.num_hits && ray_tmax > payload.local_isect.hits[0].t) {
/* Record closest intersection only. Do not terminate ray here, since there is no guarantee about distance ordering in any-hit */
/* Record closest intersection only. Do not terminate ray here, since there is no guarantee
* about distance ordering in any-hit */
result.accept = false;
result.continue_search = true;
return result;
@@ -116,8 +122,8 @@ TReturn metalrt_local_hit(constant KernelParamsMetal &launch_params_metal,
isect->object = object;
isect->type = kernel_data_fetch(objects, object).primitive_type;
isect->u = 1.0f - barycentrics.y - barycentrics.x;
isect->v = barycentrics.x;
isect->u = barycentrics.x;
isect->v = barycentrics.y;
/* Record geometric normal */
const uint tri_vindex = kernel_data_fetch(tri_vindex, isect->prim).w;
@@ -133,21 +139,20 @@ TReturn metalrt_local_hit(constant KernelParamsMetal &launch_params_metal,
#endif
}
[[intersection(triangle, triangle_data, METALRT_TAGS)]]
TriangleIntersectionResult
__anyhit__cycles_metalrt_local_hit_tri(constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
ray_data MetalKernelContext::MetalRTIntersectionLocalPayload &payload [[payload]],
uint instance_id [[user_instance_id]],
uint primitive_id [[primitive_id]],
float2 barycentrics [[barycentric_coord]],
float ray_tmax [[distance]])
[[intersection(triangle, triangle_data, METALRT_TAGS)]] TriangleIntersectionResult
__anyhit__cycles_metalrt_local_hit_tri(
constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
ray_data MetalKernelContext::MetalRTIntersectionLocalPayload &payload [[payload]],
uint instance_id [[user_instance_id]],
uint primitive_id [[primitive_id]],
float2 barycentrics [[barycentric_coord]],
float ray_tmax [[distance]])
{
return metalrt_local_hit<TriangleIntersectionResult, METALRT_HIT_TRIANGLE>(
launch_params_metal, payload, instance_id, primitive_id, barycentrics, ray_tmax);
launch_params_metal, payload, instance_id, primitive_id, barycentrics, ray_tmax);
}
[[intersection(bounding_box, triangle_data, METALRT_TAGS)]]
BoundingBoxIntersectionResult
[[intersection(bounding_box, triangle_data, METALRT_TAGS)]] BoundingBoxIntersectionResult
__anyhit__cycles_metalrt_local_hit_box(const float ray_tmax [[max_distance]])
{
/* unused function */
@@ -180,18 +185,14 @@ bool metalrt_shadow_all_hit(constant KernelParamsMetal &launch_params_metal,
return true;
}
float u = 0.0f, v = 0.0f;
const float u = barycentrics.x;
const float v = barycentrics.y;
int type = 0;
if (intersection_type == METALRT_HIT_TRIANGLE) {
u = 1.0f - barycentrics.y - barycentrics.x;
v = barycentrics.x;
type = kernel_data_fetch(objects, object).primitive_type;
}
# ifdef __HAIR__
else {
u = barycentrics.x;
v = barycentrics.y;
const KernelCurveSegment segment = kernel_data_fetch(curve_segments, prim);
type = segment.type;
prim = segment.prim;
@@ -215,7 +216,7 @@ bool metalrt_shadow_all_hit(constant KernelParamsMetal &launch_params_metal,
short num_recorded_hits = payload.num_recorded_hits;
MetalKernelContext context(launch_params_metal);
/* If no transparent shadows, all light is blocked and we can stop immediately. */
if (num_hits >= max_hits ||
!(context.intersection_get_shader_flags(NULL, prim, type) & SD_HAS_TRANSPARENT_SHADOW)) {
@@ -223,7 +224,7 @@ bool metalrt_shadow_all_hit(constant KernelParamsMetal &launch_params_metal,
/* terminate ray */
return false;
}
/* Always use baked shadow transparency for curves. */
if (type & PRIMITIVE_CURVE) {
float throughput = payload.throughput;
@@ -240,10 +241,10 @@ bool metalrt_shadow_all_hit(constant KernelParamsMetal &launch_params_metal,
return true;
}
}
payload.num_hits += 1;
payload.num_recorded_hits += 1;
uint record_index = num_recorded_hits;
const IntegratorShadowState state = payload.state;
@@ -278,7 +279,7 @@ bool metalrt_shadow_all_hit(constant KernelParamsMetal &launch_params_metal,
INTEGRATOR_STATE_ARRAY_WRITE(state, shadow_isect, record_index, prim) = prim;
INTEGRATOR_STATE_ARRAY_WRITE(state, shadow_isect, record_index, object) = object;
INTEGRATOR_STATE_ARRAY_WRITE(state, shadow_isect, record_index, type) = type;
/* Continue tracing. */
# endif /* __TRANSPARENT_SHADOWS__ */
#endif /* __SHADOW_RECORD_ALL__ */
@@ -286,26 +287,25 @@ bool metalrt_shadow_all_hit(constant KernelParamsMetal &launch_params_metal,
return true;
}
[[intersection(triangle, triangle_data, METALRT_TAGS)]]
TriangleIntersectionResult
__anyhit__cycles_metalrt_shadow_all_hit_tri(constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
ray_data MetalKernelContext::MetalRTIntersectionShadowPayload &payload [[payload]],
unsigned int object [[user_instance_id]],
unsigned int primitive_id [[primitive_id]],
float2 barycentrics [[barycentric_coord]],
float ray_tmax [[distance]])
[[intersection(triangle, triangle_data, METALRT_TAGS)]] TriangleIntersectionResult
__anyhit__cycles_metalrt_shadow_all_hit_tri(
constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
ray_data MetalKernelContext::MetalRTIntersectionShadowPayload &payload [[payload]],
unsigned int object [[user_instance_id]],
unsigned int primitive_id [[primitive_id]],
float2 barycentrics [[barycentric_coord]],
float ray_tmax [[distance]])
{
uint prim = primitive_id + kernel_data_fetch(object_prim_offset, object);
TriangleIntersectionResult result;
result.continue_search = metalrt_shadow_all_hit<METALRT_HIT_TRIANGLE>(
launch_params_metal, payload, object, prim, barycentrics, ray_tmax);
launch_params_metal, payload, object, prim, barycentrics, ray_tmax);
result.accept = !result.continue_search;
return result;
}
[[intersection(bounding_box, triangle_data, METALRT_TAGS)]]
BoundingBoxIntersectionResult
[[intersection(bounding_box, triangle_data, METALRT_TAGS)]] BoundingBoxIntersectionResult
__anyhit__cycles_metalrt_shadow_all_hit_box(const float ray_tmax [[max_distance]])
{
/* unused function */
@@ -317,15 +317,16 @@ __anyhit__cycles_metalrt_shadow_all_hit_box(const float ray_tmax [[max_distance]
}
template<typename TReturnType, uint intersection_type>
inline TReturnType metalrt_visibility_test(constant KernelParamsMetal &launch_params_metal,
ray_data MetalKernelContext::MetalRTIntersectionPayload &payload,
const uint object,
const uint prim,
const float u)
inline TReturnType metalrt_visibility_test(
constant KernelParamsMetal &launch_params_metal,
ray_data MetalKernelContext::MetalRTIntersectionPayload &payload,
const uint object,
const uint prim,
const float u)
{
TReturnType result;
# ifdef __HAIR__
#ifdef __HAIR__
if (intersection_type == METALRT_HIT_BOUNDING_BOX) {
/* Filter out curve endcaps. */
if (u == 0.0f || u == 1.0f) {
@@ -334,16 +335,16 @@ inline TReturnType metalrt_visibility_test(constant KernelParamsMetal &launch_pa
return result;
}
}
# endif
#endif
uint visibility = payload.visibility;
# ifdef __VISIBILITY_FLAG__
#ifdef __VISIBILITY_FLAG__
if ((kernel_data_fetch(objects, object).visibility & visibility) == 0) {
result.accept = false;
result.continue_search = true;
return result;
}
# endif
#endif
/* Shadow ray early termination. */
if (visibility & PATH_RAY_SHADOW_OPAQUE) {
@@ -371,16 +372,17 @@ inline TReturnType metalrt_visibility_test(constant KernelParamsMetal &launch_pa
return result;
}
[[intersection(triangle, triangle_data, METALRT_TAGS)]]
TriangleIntersectionResult
__anyhit__cycles_metalrt_visibility_test_tri(constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
ray_data MetalKernelContext::MetalRTIntersectionPayload &payload [[payload]],
unsigned int object [[user_instance_id]],
unsigned int primitive_id [[primitive_id]])
[[intersection(triangle, triangle_data, METALRT_TAGS)]] TriangleIntersectionResult
__anyhit__cycles_metalrt_visibility_test_tri(
constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
ray_data MetalKernelContext::MetalRTIntersectionPayload &payload [[payload]],
unsigned int object [[user_instance_id]],
unsigned int primitive_id [[primitive_id]])
{
uint prim = primitive_id + kernel_data_fetch(object_prim_offset, object);
TriangleIntersectionResult result = metalrt_visibility_test<TriangleIntersectionResult, METALRT_HIT_TRIANGLE>(
launch_params_metal, payload, object, prim, 0.0f);
TriangleIntersectionResult result =
metalrt_visibility_test<TriangleIntersectionResult, METALRT_HIT_TRIANGLE>(
launch_params_metal, payload, object, prim, 0.0f);
if (result.accept) {
payload.prim = prim;
payload.type = kernel_data_fetch(objects, object).primitive_type;
@@ -388,8 +390,7 @@ __anyhit__cycles_metalrt_visibility_test_tri(constant KernelParamsMetal &launch_
return result;
}
[[intersection(bounding_box, triangle_data, METALRT_TAGS)]]
BoundingBoxIntersectionResult
[[intersection(bounding_box, triangle_data, METALRT_TAGS)]] BoundingBoxIntersectionResult
__anyhit__cycles_metalrt_visibility_test_box(const float ray_tmax [[max_distance]])
{
/* Unused function */
@@ -400,19 +401,21 @@ __anyhit__cycles_metalrt_visibility_test_box(const float ray_tmax [[max_distance
return result;
}
/* Primitive intersection functions. */
#ifdef __HAIR__
ccl_device_inline
void metalrt_intersection_curve(constant KernelParamsMetal &launch_params_metal,
ray_data MetalKernelContext::MetalRTIntersectionPayload &payload,
const uint object,
const uint prim,
const uint type,
const float3 ray_origin,
const float3 ray_direction,
float time,
const float ray_tmin,
const float ray_tmax,
thread BoundingBoxIntersectionResult &result)
ccl_device_inline void metalrt_intersection_curve(
constant KernelParamsMetal &launch_params_metal,
ray_data MetalKernelContext::MetalRTIntersectionPayload &payload,
const uint object,
const uint prim,
const uint type,
const float3 ray_P,
const float3 ray_D,
float time,
const float ray_tmin,
const float ray_tmax,
thread BoundingBoxIntersectionResult &result)
{
# ifdef __VISIBILITY_FLAG__
const uint visibility = payload.visibility;
@@ -421,25 +424,16 @@ void metalrt_intersection_curve(constant KernelParamsMetal &launch_params_metal,
}
# endif
float3 P = ray_origin;
float3 dir = ray_direction;
/* The direction is not normalized by default, but the curve intersection routine expects that */
float len;
dir = normalize_len(dir, &len);
Intersection isect;
isect.t = ray_tmax;
/* Transform maximum distance into object space. */
if (isect.t != FLT_MAX)
isect.t *= len;
MetalKernelContext context(launch_params_metal);
if (context.curve_intersect(NULL, &isect, P, dir, ray_tmin, isect.t, object, prim, time, type)) {
if (context.curve_intersect(
NULL, &isect, ray_P, ray_D, ray_tmin, isect.t, object, prim, time, type)) {
result = metalrt_visibility_test<BoundingBoxIntersectionResult, METALRT_HIT_BOUNDING_BOX>(
launch_params_metal, payload, object, prim, isect.u);
launch_params_metal, payload, object, prim, isect.u);
if (result.accept) {
result.distance = isect.t / len;
result.distance = isect.t;
payload.u = isect.u;
payload.v = isect.v;
payload.prim = prim;
@@ -448,54 +442,41 @@ void metalrt_intersection_curve(constant KernelParamsMetal &launch_params_metal,
}
}
ccl_device_inline
void metalrt_intersection_curve_shadow(constant KernelParamsMetal &launch_params_metal,
ray_data MetalKernelContext::MetalRTIntersectionShadowPayload &payload,
const uint object,
const uint prim,
const uint type,
const float3 ray_origin,
const float3 ray_direction,
float time,
const float ray_tmin,
const float ray_tmax,
thread BoundingBoxIntersectionResult &result)
ccl_device_inline void metalrt_intersection_curve_shadow(
constant KernelParamsMetal &launch_params_metal,
ray_data MetalKernelContext::MetalRTIntersectionShadowPayload &payload,
const uint object,
const uint prim,
const uint type,
const float3 ray_P,
const float3 ray_D,
float time,
const float ray_tmin,
const float ray_tmax,
thread BoundingBoxIntersectionResult &result)
{
const uint visibility = payload.visibility;
float3 P = ray_origin;
float3 dir = ray_direction;
/* The direction is not normalized by default, but the curve intersection routine expects that */
float len;
dir = normalize_len(dir, &len);
Intersection isect;
isect.t = ray_tmax;
/* Transform maximum distance into object space */
if (isect.t != FLT_MAX)
isect.t *= len;
MetalKernelContext context(launch_params_metal);
if (context.curve_intersect(NULL, &isect, P, dir, ray_tmin, isect.t, object, prim, time, type)) {
if (context.curve_intersect(
NULL, &isect, ray_P, ray_D, ray_tmin, isect.t, object, prim, time, type)) {
result.continue_search = metalrt_shadow_all_hit<METALRT_HIT_BOUNDING_BOX>(
launch_params_metal, payload, object, prim, float2(isect.u, isect.v), ray_tmax);
launch_params_metal, payload, object, prim, float2(isect.u, isect.v), ray_tmax);
result.accept = !result.continue_search;
if (result.accept) {
result.distance = isect.t / len;
}
}
}
[[intersection(bounding_box, triangle_data, METALRT_TAGS)]]
BoundingBoxIntersectionResult
[[intersection(bounding_box, triangle_data, METALRT_TAGS)]] BoundingBoxIntersectionResult
__intersection__curve_ribbon(constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
ray_data MetalKernelContext::MetalRTIntersectionPayload &payload [[payload]],
ray_data MetalKernelContext::MetalRTIntersectionPayload &payload
[[payload]],
const uint object [[user_instance_id]],
const uint primitive_id [[primitive_id]],
const float3 ray_origin [[origin]],
const float3 ray_direction [[direction]],
const float3 ray_P [[origin]],
const float3 ray_D [[direction]],
const float ray_tmin [[min_distance]],
const float ray_tmax [[max_distance]])
{
@@ -508,28 +489,36 @@ __intersection__curve_ribbon(constant KernelParamsMetal &launch_params_metal [[b
result.distance = ray_tmax;
if (segment.type & PRIMITIVE_CURVE_RIBBON) {
metalrt_intersection_curve(launch_params_metal, payload, object, segment.prim, segment.type, ray_origin, ray_direction,
metalrt_intersection_curve(launch_params_metal,
payload,
object,
segment.prim,
segment.type,
ray_P,
ray_D,
# if defined(__METALRT_MOTION__)
payload.time,
# else
0.0f,
# endif
ray_tmin, ray_tmax, result);
ray_tmin,
ray_tmax,
result);
}
return result;
}
[[intersection(bounding_box, triangle_data, METALRT_TAGS)]]
BoundingBoxIntersectionResult
__intersection__curve_ribbon_shadow(constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
ray_data MetalKernelContext::MetalRTIntersectionShadowPayload &payload [[payload]],
const uint object [[user_instance_id]],
const uint primitive_id [[primitive_id]],
const float3 ray_origin [[origin]],
const float3 ray_direction [[direction]],
const float ray_tmin [[min_distance]],
const float ray_tmax [[max_distance]])
[[intersection(bounding_box, triangle_data, METALRT_TAGS)]] BoundingBoxIntersectionResult
__intersection__curve_ribbon_shadow(
constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
ray_data MetalKernelContext::MetalRTIntersectionShadowPayload &payload [[payload]],
const uint object [[user_instance_id]],
const uint primitive_id [[primitive_id]],
const float3 ray_P [[origin]],
const float3 ray_D [[direction]],
const float ray_tmin [[min_distance]],
const float ray_tmax [[max_distance]])
{
uint prim = primitive_id + kernel_data_fetch(object_prim_offset, object);
const KernelCurveSegment segment = kernel_data_fetch(curve_segments, prim);
@@ -540,57 +529,73 @@ __intersection__curve_ribbon_shadow(constant KernelParamsMetal &launch_params_me
result.distance = ray_tmax;
if (segment.type & PRIMITIVE_CURVE_RIBBON) {
metalrt_intersection_curve_shadow(launch_params_metal, payload, object, segment.prim, segment.type, ray_origin, ray_direction,
metalrt_intersection_curve_shadow(launch_params_metal,
payload,
object,
segment.prim,
segment.type,
ray_P,
ray_D,
# if defined(__METALRT_MOTION__)
payload.time,
payload.time,
# else
0.0f,
0.0f,
# endif
ray_tmin, ray_tmax, result);
ray_tmin,
ray_tmax,
result);
}
return result;
}
[[intersection(bounding_box, triangle_data, METALRT_TAGS)]]
BoundingBoxIntersectionResult
[[intersection(bounding_box, triangle_data, METALRT_TAGS)]] BoundingBoxIntersectionResult
__intersection__curve_all(constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
ray_data MetalKernelContext::MetalRTIntersectionPayload &payload [[payload]],
ray_data MetalKernelContext::MetalRTIntersectionPayload &payload
[[payload]],
const uint object [[user_instance_id]],
const uint primitive_id [[primitive_id]],
const float3 ray_origin [[origin]],
const float3 ray_direction [[direction]],
const float3 ray_P [[origin]],
const float3 ray_D [[direction]],
const float ray_tmin [[min_distance]],
const float ray_tmax [[max_distance]])
{
uint prim = primitive_id + kernel_data_fetch(object_prim_offset, object);
const KernelCurveSegment segment = kernel_data_fetch(curve_segments, prim);
BoundingBoxIntersectionResult result;
result.accept = false;
result.continue_search = true;
result.distance = ray_tmax;
metalrt_intersection_curve(launch_params_metal, payload, object, segment.prim, segment.type, ray_origin, ray_direction,
metalrt_intersection_curve(launch_params_metal,
payload,
object,
segment.prim,
segment.type,
ray_P,
ray_D,
# if defined(__METALRT_MOTION__)
payload.time,
# else
0.0f,
# endif
ray_tmin, ray_tmax, result);
ray_tmin,
ray_tmax,
result);
return result;
}
[[intersection(bounding_box, triangle_data, METALRT_TAGS)]]
BoundingBoxIntersectionResult
__intersection__curve_all_shadow(constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
ray_data MetalKernelContext::MetalRTIntersectionShadowPayload &payload [[payload]],
const uint object [[user_instance_id]],
const uint primitive_id [[primitive_id]],
const float3 ray_origin [[origin]],
const float3 ray_direction [[direction]],
const float ray_tmin [[min_distance]],
const float ray_tmax [[max_distance]])
[[intersection(bounding_box, triangle_data, METALRT_TAGS)]] BoundingBoxIntersectionResult
__intersection__curve_all_shadow(
constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
ray_data MetalKernelContext::MetalRTIntersectionShadowPayload &payload [[payload]],
const uint object [[user_instance_id]],
const uint primitive_id [[primitive_id]],
const float3 ray_P [[origin]],
const float3 ray_D [[direction]],
const float ray_tmin [[min_distance]],
const float ray_tmax [[max_distance]])
{
uint prim = primitive_id + kernel_data_fetch(object_prim_offset, object);
const KernelCurveSegment segment = kernel_data_fetch(curve_segments, prim);
@@ -600,31 +605,39 @@ __intersection__curve_all_shadow(constant KernelParamsMetal &launch_params_metal
result.continue_search = true;
result.distance = ray_tmax;
metalrt_intersection_curve_shadow(launch_params_metal, payload, object, segment.prim, segment.type, ray_origin, ray_direction,
metalrt_intersection_curve_shadow(launch_params_metal,
payload,
object,
segment.prim,
segment.type,
ray_P,
ray_D,
# if defined(__METALRT_MOTION__)
payload.time,
payload.time,
# else
0.0f,
0.0f,
# endif
ray_tmin, ray_tmax, result);
ray_tmin,
ray_tmax,
result);
return result;
}
#endif /* __HAIR__ */
#ifdef __POINTCLOUD__
ccl_device_inline
void metalrt_intersection_point(constant KernelParamsMetal &launch_params_metal,
ray_data MetalKernelContext::MetalRTIntersectionPayload &payload,
const uint object,
const uint prim,
const uint type,
const float3 ray_origin,
const float3 ray_direction,
float time,
const float ray_tmin,
const float ray_tmax,
thread BoundingBoxIntersectionResult &result)
ccl_device_inline void metalrt_intersection_point(
constant KernelParamsMetal &launch_params_metal,
ray_data MetalKernelContext::MetalRTIntersectionPayload &payload,
const uint object,
const uint prim,
const uint type,
const float3 ray_P,
const float3 ray_D,
float time,
const float ray_tmin,
const float ray_tmax,
thread BoundingBoxIntersectionResult &result)
{
# ifdef __VISIBILITY_FLAG__
const uint visibility = payload.visibility;
@@ -633,25 +646,16 @@ void metalrt_intersection_point(constant KernelParamsMetal &launch_params_metal,
}
# endif
float3 P = ray_origin;
float3 dir = ray_direction;
/* The direction is not normalized by default, but the point intersection routine expects that */
float len;
dir = normalize_len(dir, &len);
Intersection isect;
isect.t = ray_tmax;
/* Transform maximum distance into object space. */
if (isect.t != FLT_MAX)
isect.t *= len;
MetalKernelContext context(launch_params_metal);
if (context.point_intersect(NULL, &isect, P, dir, ray_tmin, isect.t, object, prim, time, type)) {
if (context.point_intersect(
NULL, &isect, ray_P, ray_D, ray_tmin, isect.t, object, prim, time, type)) {
result = metalrt_visibility_test<BoundingBoxIntersectionResult, METALRT_HIT_BOUNDING_BOX>(
launch_params_metal, payload, object, prim, isect.u);
launch_params_metal, payload, object, prim, isect.u);
if (result.accept) {
result.distance = isect.t / len;
result.distance = isect.t;
payload.u = isect.u;
payload.v = isect.v;
payload.prim = prim;
@@ -660,50 +664,78 @@ void metalrt_intersection_point(constant KernelParamsMetal &launch_params_metal,
}
}
ccl_device_inline
void metalrt_intersection_point_shadow(constant KernelParamsMetal &launch_params_metal,
ray_data MetalKernelContext::MetalRTIntersectionShadowPayload &payload,
const uint object,
const uint prim,
const uint type,
const float3 ray_origin,
const float3 ray_direction,
float time,
const float ray_tmin,
const float ray_tmax,
thread BoundingBoxIntersectionResult &result)
ccl_device_inline void metalrt_intersection_point_shadow(
constant KernelParamsMetal &launch_params_metal,
ray_data MetalKernelContext::MetalRTIntersectionShadowPayload &payload,
const uint object,
const uint prim,
const uint type,
const float3 ray_P,
const float3 ray_D,
float time,
const float ray_tmin,
const float ray_tmax,
thread BoundingBoxIntersectionResult &result)
{
const uint visibility = payload.visibility;
float3 P = ray_origin;
float3 dir = ray_direction;
/* The direction is not normalized by default, but the point intersection routine expects that */
float len;
dir = normalize_len(dir, &len);
Intersection isect;
isect.t = ray_tmax;
/* Transform maximum distance into object space */
if (isect.t != FLT_MAX)
isect.t *= len;
MetalKernelContext context(launch_params_metal);
if (context.point_intersect(NULL, &isect, P, dir, ray_tmin, isect.t, object, prim, time, type)) {
if (context.point_intersect(
NULL, &isect, ray_P, ray_D, ray_tmin, isect.t, object, prim, time, type)) {
result.continue_search = metalrt_shadow_all_hit<METALRT_HIT_BOUNDING_BOX>(
launch_params_metal, payload, object, prim, float2(isect.u, isect.v), ray_tmax);
launch_params_metal, payload, object, prim, float2(isect.u, isect.v), ray_tmax);
result.accept = !result.continue_search;
if (result.accept) {
result.distance = isect.t / len;
result.distance = isect.t;
}
}
}
[[intersection(bounding_box, triangle_data, METALRT_TAGS)]]
BoundingBoxIntersectionResult
[[intersection(bounding_box, triangle_data, METALRT_TAGS)]] BoundingBoxIntersectionResult
__intersection__point(constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
ray_data MetalKernelContext::MetalRTIntersectionPayload &payload [[payload]],
ray_data MetalKernelContext::MetalRTIntersectionPayload &payload [[payload]],
const uint object [[user_instance_id]],
const uint primitive_id [[primitive_id]],
const float3 ray_origin [[origin]],
const float3 ray_direction [[direction]],
const float ray_tmin [[min_distance]],
const float ray_tmax [[max_distance]])
{
const uint prim = primitive_id + kernel_data_fetch(object_prim_offset, object);
const int type = kernel_data_fetch(objects, object).primitive_type;
BoundingBoxIntersectionResult result;
result.accept = false;
result.continue_search = true;
result.distance = ray_tmax;
metalrt_intersection_point(launch_params_metal,
payload,
object,
prim,
type,
ray_origin,
ray_direction,
# if defined(__METALRT_MOTION__)
payload.time,
# else
0.0f,
# endif
ray_tmin,
ray_tmax,
result);
return result;
}
[[intersection(bounding_box, triangle_data, METALRT_TAGS)]] BoundingBoxIntersectionResult
__intersection__point_shadow(constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
ray_data MetalKernelContext::MetalRTIntersectionShadowPayload &payload
[[payload]],
const uint object [[user_instance_id]],
const uint primitive_id [[primitive_id]],
const float3 ray_origin [[origin]],
@@ -719,43 +751,21 @@ __intersection__point(constant KernelParamsMetal &launch_params_metal [[buffer(1
result.continue_search = true;
result.distance = ray_tmax;
metalrt_intersection_point(launch_params_metal, payload, object, prim, type, ray_origin, ray_direction,
metalrt_intersection_point_shadow(launch_params_metal,
payload,
object,
prim,
type,
ray_origin,
ray_direction,
# if defined(__METALRT_MOTION__)
payload.time,
payload.time,
# else
0.0f,
0.0f,
# endif
ray_tmin, ray_tmax, result);
return result;
}
[[intersection(bounding_box, triangle_data, METALRT_TAGS)]]
BoundingBoxIntersectionResult
__intersection__point_shadow(constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
ray_data MetalKernelContext::MetalRTIntersectionShadowPayload &payload [[payload]],
const uint object [[user_instance_id]],
const uint primitive_id [[primitive_id]],
const float3 ray_origin [[origin]],
const float3 ray_direction [[direction]],
const float ray_tmin [[min_distance]],
const float ray_tmax [[max_distance]])
{
const uint prim = primitive_id + kernel_data_fetch(object_prim_offset, object);
const int type = kernel_data_fetch(objects, object).primitive_type;
BoundingBoxIntersectionResult result;
result.accept = false;
result.continue_search = true;
result.distance = ray_tmax;
metalrt_intersection_point_shadow(launch_params_metal, payload, object, prim, type, ray_origin, ray_direction,
# if defined(__METALRT_MOTION__)
payload.time,
# else
0.0f,
# endif
ray_tmin, ray_tmax, result);
ray_tmin,
ray_tmax,
result);
return result;
}
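
Note on the hunks above: they appear to drop the normalize_len() call and the matching rescaling of ray_tmax and result.distance from the MetalRT curve and point helpers, passing ray_P and ray_D straight through instead. The following is a minimal sketch of the relationship the removed lines relied on, not code from the commit. Vec3, normalize_with_length, to_normalized_distance and from_normalized_distance are hypothetical names used only for illustration.

```cpp
#include <cmath>

// Hypothetical minimal vector type, standing in for the kernel's float3.
struct Vec3 {
  float x, y, z;
};

// Hypothetical stand-in for normalize_len(): returns the unit-length
// direction and writes the original length to *len.
inline Vec3 normalize_with_length(Vec3 d, float *len)
{
  *len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
  return {d.x / *len, d.y / *len, d.z / *len};
}

// A distance t measured along an unnormalized direction D covers t * |D|
// units along the normalized direction (the old `isect.t *= len`).
inline float to_normalized_distance(float t, float len)
{
  return t * len;
}

// The inverse conversion (the old `result.distance = isect.t / len`).
inline float from_normalized_distance(float t, float len)
{
  return t / len;
}
```

The removed code also skipped the multiplication when the maximum distance was FLT_MAX, presumably to avoid overflowing to infinity; the sketch leaves that guard out.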


@@ -149,25 +149,13 @@ void oneapi_kernel_##name(KernelGlobalsGPU *ccl_restrict kg, \
/* clang-format on */
/* Types */
/* It's not possible to use sycl types like sycl::float3, sycl::int3, etc
* because these types have different interfaces from blender version */
* because these types have different interfaces from blender version. */
using uchar = unsigned char;
using sycl::half;
struct float3 {
float x, y, z;
};
ccl_always_inline float3 make_float3(float x, float y, float z)
{
return {x, y, z};
}
ccl_always_inline float3 make_float3(float x)
{
return {x, x, x};
}
/* math functions */
#define fabsf(x) sycl::fabs((x))
#define copysignf(x, y) sycl::copysign((x), (y))


@@ -6,7 +6,8 @@ DLL_INTERFACE_CALL(oneapi_device_capabilities, char *)
DLL_INTERFACE_CALL(oneapi_free, void, void *)
DLL_INTERFACE_CALL(oneapi_get_memcapacity, size_t, SyclQueue *queue)
DLL_INTERFACE_CALL(oneapi_get_compute_units_amount, size_t, SyclQueue *queue)
DLL_INTERFACE_CALL(oneapi_get_num_multiprocessors, int, SyclQueue *queue)
DLL_INTERFACE_CALL(oneapi_get_max_num_threads_per_multiprocessor, int, SyclQueue *queue)
DLL_INTERFACE_CALL(oneapi_iterate_devices, void, OneAPIDeviceIteratorCallback cb, void *user_ptr)
DLL_INTERFACE_CALL(oneapi_set_error_cb, void, OneAPIErrorCallback, void *user_ptr)


@@ -904,11 +904,26 @@ size_t oneapi_get_memcapacity(SyclQueue *queue)
.get_info<sycl::info::device::global_mem_size>();
}
size_t oneapi_get_compute_units_amount(SyclQueue *queue)
int oneapi_get_num_multiprocessors(SyclQueue *queue)
{
return reinterpret_cast<sycl::queue *>(queue)
->get_device()
.get_info<sycl::info::device::max_compute_units>();
const sycl::device &device = reinterpret_cast<sycl::queue *>(queue)->get_device();
if (device.has(sycl::aspect::ext_intel_gpu_eu_count)) {
return device.get_info<sycl::info::device::ext_intel_gpu_eu_count>();
}
else
return 0;
}
int oneapi_get_max_num_threads_per_multiprocessor(SyclQueue *queue)
{
const sycl::device &device = reinterpret_cast<sycl::queue *>(queue)->get_device();
if (device.has(sycl::aspect::ext_intel_gpu_eu_simd_width) &&
device.has(sycl::aspect::ext_intel_gpu_hw_threads_per_eu)) {
return device.get_info<sycl::info::device::ext_intel_gpu_eu_simd_width>() *
device.get_info<sycl::info::device::ext_intel_gpu_hw_threads_per_eu>();
}
else
return 0;
}
#endif /* WITH_ONEAPI */
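
Note on the hunk above: oneapi_get_max_num_threads_per_multiprocessor() multiplies the EU SIMD width by the hardware threads per EU, and falls back to 0 when either device aspect is unavailable. A minimal sketch of that arithmetic without SYCL follows; EuInfo and max_threads_per_multiprocessor are hypothetical names, and the sample numbers in the usage note are illustrative rather than taken from any real device.

```cpp
// Hypothetical plain-data view of what the two device queries would return.
struct EuInfo {
  bool has_simd_width;        // device reports an EU SIMD width
  bool has_hw_threads_per_eu; // device reports hardware threads per EU
  int simd_width;
  int hw_threads_per_eu;
};

// Mirrors the fallback in the diff: only multiply when both values are
// available, otherwise report 0 so the caller can treat it as "unknown".
inline int max_threads_per_multiprocessor(const EuInfo &info)
{
  if (info.has_simd_width && info.has_hw_threads_per_eu) {
    return info.simd_width * info.hw_threads_per_eu;
  }
  return 0;
}
```

For example, an EuInfo of {true, true, 8, 7} yields 56, while clearing either flag yields 0.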


@@ -0,0 +1,646 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2021-2022 Blender Foundation */
/* OptiX implementation of ray-scene intersection. */
#pragma once
#include "kernel/bvh/types.h"
#include "kernel/bvh/util.h"
#define OPTIX_DEFINE_ABI_VERSION_ONLY
#include <optix_function_table.h>
CCL_NAMESPACE_BEGIN
/* Utilities. */
template<typename T> ccl_device_forceinline T *get_payload_ptr_0()
{
return pointer_unpack_from_uint<T>(optixGetPayload_0(), optixGetPayload_1());
}
template<typename T> ccl_device_forceinline T *get_payload_ptr_2()
{
return pointer_unpack_from_uint<T>(optixGetPayload_2(), optixGetPayload_3());
}
template<typename T> ccl_device_forceinline T *get_payload_ptr_6()
{
return (T *)(((uint64_t)optixGetPayload_7() << 32) | optixGetPayload_6());
}
ccl_device_forceinline int get_object_id()
{
#ifdef __OBJECT_MOTION__
/* Always get the instance ID from the TLAS.
* There might be a motion transform node between TLAS and BLAS which does not have one. */
return optixGetInstanceIdFromHandle(optixGetTransformListHandle(0));
#else
return optixGetInstanceId();
#endif
}
/* Hit/miss functions. */
extern "C" __global__ void __miss__kernel_optix_miss()
{
/* 'kernel_path_lamp_emission' checks intersection distance, so need to set it even on a miss. */
optixSetPayload_0(__float_as_uint(optixGetRayTmax()));
optixSetPayload_5(PRIMITIVE_NONE);
}
extern "C" __global__ void __anyhit__kernel_optix_local_hit()
{
#if defined(__HAIR__) || defined(__POINTCLOUD__)
if (!optixIsTriangleHit()) {
/* Ignore curves and points. */
return optixIgnoreIntersection();
}
#endif
#ifdef __BVH_LOCAL__
const int object = get_object_id();
if (object != optixGetPayload_4() /* local_object */) {
/* Only intersect with matching object. */
return optixIgnoreIntersection();
}
const int prim = optixGetPrimitiveIndex();
ccl_private Ray *const ray = get_payload_ptr_6<Ray>();
if (intersection_skip_self_local(ray->self, prim)) {
return optixIgnoreIntersection();
}
const uint max_hits = optixGetPayload_5();
if (max_hits == 0) {
/* Special case for when no hit information is requested, just report that something was hit. */
optixSetPayload_5(true);
return optixTerminateRay();
}
int hit = 0;
uint *const lcg_state = get_payload_ptr_0<uint>();
LocalIntersection *const local_isect = get_payload_ptr_2<LocalIntersection>();
if (lcg_state) {
for (int i = min(max_hits, local_isect->num_hits) - 1; i >= 0; --i) {
if (optixGetRayTmax() == local_isect->hits[i].t) {
return optixIgnoreIntersection();
}
}
hit = local_isect->num_hits++;
if (local_isect->num_hits > max_hits) {
hit = lcg_step_uint(lcg_state) % local_isect->num_hits;
if (hit >= max_hits) {
return optixIgnoreIntersection();
}
}
}
else {
if (local_isect->num_hits && optixGetRayTmax() > local_isect->hits[0].t) {
/* Record closest intersection only.
* Do not terminate ray here, since there is no guarantee about distance ordering in any-hit.
*/
return optixIgnoreIntersection();
}
local_isect->num_hits = 1;
}
Intersection *isect = &local_isect->hits[hit];
isect->t = optixGetRayTmax();
isect->prim = prim;
isect->object = get_object_id();
isect->type = kernel_data_fetch(objects, isect->object).primitive_type;
const float2 barycentrics = optixGetTriangleBarycentrics();
isect->u = barycentrics.x;
isect->v = barycentrics.y;
/* Record geometric normal. */
const uint tri_vindex = kernel_data_fetch(tri_vindex, prim).w;
const float3 tri_a = kernel_data_fetch(tri_verts, tri_vindex + 0);
const float3 tri_b = kernel_data_fetch(tri_verts, tri_vindex + 1);
const float3 tri_c = kernel_data_fetch(tri_verts, tri_vindex + 2);
local_isect->Ng[hit] = normalize(cross(tri_b - tri_a, tri_c - tri_a));
/* Continue tracing (without this the trace call would return after the first hit). */
optixIgnoreIntersection();
#endif
}
extern "C" __global__ void __anyhit__kernel_optix_shadow_all_hit()
{
#ifdef __SHADOW_RECORD_ALL__
int prim = optixGetPrimitiveIndex();
const uint object = get_object_id();
# ifdef __VISIBILITY_FLAG__
const uint visibility = optixGetPayload_4();
if ((kernel_data_fetch(objects, object).visibility & visibility) == 0) {
return optixIgnoreIntersection();
}
# endif
ccl_private Ray *const ray = get_payload_ptr_6<Ray>();
if (intersection_skip_self_shadow(ray->self, object, prim)) {
return optixIgnoreIntersection();
}
float u = 0.0f, v = 0.0f;
int type = 0;
if (optixIsTriangleHit()) {
const float2 barycentrics = optixGetTriangleBarycentrics();
u = barycentrics.x;
v = barycentrics.y;
type = kernel_data_fetch(objects, object).primitive_type;
}
# ifdef __HAIR__
else if ((optixGetHitKind() & (~PRIMITIVE_MOTION)) != PRIMITIVE_POINT) {
u = __uint_as_float(optixGetAttribute_0());
v = __uint_as_float(optixGetAttribute_1());
const KernelCurveSegment segment = kernel_data_fetch(curve_segments, prim);
type = segment.type;
prim = segment.prim;
# if OPTIX_ABI_VERSION < 55
/* Filter out curve end-caps. */
if (u == 0.0f || u == 1.0f) {
return optixIgnoreIntersection();
}
# endif
}
# endif
else {
type = kernel_data_fetch(objects, object).primitive_type;
u = 0.0f;
v = 0.0f;
}
# ifndef __TRANSPARENT_SHADOWS__
/* No transparent shadows support compiled in, make opaque. */
optixSetPayload_5(true);
return optixTerminateRay();
# else
const uint max_hits = optixGetPayload_3();
const uint num_hits_packed = optixGetPayload_2();
const uint num_recorded_hits = uint16_unpack_from_uint_0(num_hits_packed);
const uint num_hits = uint16_unpack_from_uint_1(num_hits_packed);
/* If no transparent shadows, all light is blocked and we can stop immediately. */
if (num_hits >= max_hits ||
!(intersection_get_shader_flags(NULL, prim, type) & SD_HAS_TRANSPARENT_SHADOW)) {
optixSetPayload_5(true);
return optixTerminateRay();
}
/* Always use baked shadow transparency for curves. */
if (type & PRIMITIVE_CURVE) {
float throughput = __uint_as_float(optixGetPayload_1());
throughput *= intersection_curve_shadow_transparency(nullptr, object, prim, u);
optixSetPayload_1(__float_as_uint(throughput));
optixSetPayload_2(uint16_pack_to_uint(num_recorded_hits, num_hits + 1));
if (throughput < CURVE_SHADOW_TRANSPARENCY_CUTOFF) {
optixSetPayload_5(true);
return optixTerminateRay();
}
else {
/* Continue tracing. */
optixIgnoreIntersection();
return;
}
}
/* Record transparent intersection. */
optixSetPayload_2(uint16_pack_to_uint(num_recorded_hits + 1, num_hits + 1));
uint record_index = num_recorded_hits;
const IntegratorShadowState state = optixGetPayload_0();
const uint max_record_hits = min(max_hits, INTEGRATOR_SHADOW_ISECT_SIZE);
if (record_index >= max_record_hits) {
/* If maximum number of hits reached, find a hit to replace. */
float max_recorded_t = INTEGRATOR_STATE_ARRAY(state, shadow_isect, 0, t);
uint max_recorded_hit = 0;
for (int i = 1; i < max_record_hits; i++) {
const float isect_t = INTEGRATOR_STATE_ARRAY(state, shadow_isect, i, t);
if (isect_t > max_recorded_t) {
max_recorded_t = isect_t;
max_recorded_hit = i;
}
}
if (optixGetRayTmax() >= max_recorded_t) {
/* Accept hit, so that OptiX won't consider any more hits beyond the distance of the
* current hit anymore. */
return;
}
record_index = max_recorded_hit;
}
INTEGRATOR_STATE_ARRAY_WRITE(state, shadow_isect, record_index, u) = u;
INTEGRATOR_STATE_ARRAY_WRITE(state, shadow_isect, record_index, v) = v;
INTEGRATOR_STATE_ARRAY_WRITE(state, shadow_isect, record_index, t) = optixGetRayTmax();
INTEGRATOR_STATE_ARRAY_WRITE(state, shadow_isect, record_index, prim) = prim;
INTEGRATOR_STATE_ARRAY_WRITE(state, shadow_isect, record_index, object) = object;
INTEGRATOR_STATE_ARRAY_WRITE(state, shadow_isect, record_index, type) = type;
/* Continue tracing. */
optixIgnoreIntersection();
# endif /* __TRANSPARENT_SHADOWS__ */
#endif /* __SHADOW_RECORD_ALL__ */
}
extern "C" __global__ void __anyhit__kernel_optix_volume_test()
{
#if defined(__HAIR__) || defined(__POINTCLOUD__)
if (!optixIsTriangleHit()) {
/* Ignore curves and points. */
return optixIgnoreIntersection();
}
#endif
const uint object = get_object_id();
#ifdef __VISIBILITY_FLAG__
const uint visibility = optixGetPayload_4();
if ((kernel_data_fetch(objects, object).visibility & visibility) == 0) {
return optixIgnoreIntersection();
}
#endif
if ((kernel_data_fetch(object_flag, object) & SD_OBJECT_HAS_VOLUME) == 0) {
return optixIgnoreIntersection();
}
const int prim = optixGetPrimitiveIndex();
ccl_private Ray *const ray = get_payload_ptr_6<Ray>();
if (intersection_skip_self(ray->self, object, prim)) {
return optixIgnoreIntersection();
}
}
extern "C" __global__ void __anyhit__kernel_optix_visibility_test()
{
#ifdef __HAIR__
# if OPTIX_ABI_VERSION < 55
if (optixGetPrimitiveType() == OPTIX_PRIMITIVE_TYPE_ROUND_CUBIC_BSPLINE) {
/* Filter out curve end-caps. */
const float u = __uint_as_float(optixGetAttribute_0());
if (u == 0.0f || u == 1.0f) {
return optixIgnoreIntersection();
}
}
# endif
#endif
const uint object = get_object_id();
const uint visibility = optixGetPayload_4();
#ifdef __VISIBILITY_FLAG__
if ((kernel_data_fetch(objects, object).visibility & visibility) == 0) {
return optixIgnoreIntersection();
}
#endif
const int prim = optixGetPrimitiveIndex();
ccl_private Ray *const ray = get_payload_ptr_6<Ray>();
if (visibility & PATH_RAY_SHADOW_OPAQUE) {
if (intersection_skip_self_shadow(ray->self, object, prim)) {
return optixIgnoreIntersection();
}
else {
/* Shadow ray early termination. */
return optixTerminateRay();
}
}
else {
if (intersection_skip_self(ray->self, object, prim)) {
return optixIgnoreIntersection();
}
}
}
extern "C" __global__ void __closesthit__kernel_optix_hit()
{
const int object = get_object_id();
const int prim = optixGetPrimitiveIndex();
optixSetPayload_0(__float_as_uint(optixGetRayTmax())); /* Intersection distance */
optixSetPayload_4(object);
if (optixIsTriangleHit()) {
const float2 barycentrics = optixGetTriangleBarycentrics();
optixSetPayload_1(__float_as_uint(barycentrics.x));
optixSetPayload_2(__float_as_uint(barycentrics.y));
optixSetPayload_3(prim);
optixSetPayload_5(kernel_data_fetch(objects, object).primitive_type);
}
else if ((optixGetHitKind() & (~PRIMITIVE_MOTION)) != PRIMITIVE_POINT) {
const KernelCurveSegment segment = kernel_data_fetch(curve_segments, prim);
optixSetPayload_1(optixGetAttribute_0()); /* Same as 'optixGetCurveParameter()' */
optixSetPayload_2(optixGetAttribute_1());
optixSetPayload_3(segment.prim);
optixSetPayload_5(segment.type);
}
else {
optixSetPayload_1(0);
optixSetPayload_2(0);
optixSetPayload_3(prim);
optixSetPayload_5(kernel_data_fetch(objects, object).primitive_type);
}
}
/* Custom primitive intersection functions. */
#ifdef __HAIR__
ccl_device_inline void optix_intersection_curve(const int prim, const int type)
{
const int object = get_object_id();
# ifdef __VISIBILITY_FLAG__
const uint visibility = optixGetPayload_4();
if ((kernel_data_fetch(objects, object).visibility & visibility) == 0) {
return;
}
# endif
const float3 ray_P = optixGetObjectRayOrigin();
const float3 ray_D = optixGetObjectRayDirection();
const float ray_tmin = optixGetRayTmin();
# ifdef __OBJECT_MOTION__
const float time = optixGetRayTime();
# else
const float time = 0.0f;
# endif
Intersection isect;
isect.t = optixGetRayTmax();
if (curve_intersect(NULL, &isect, ray_P, ray_D, ray_tmin, isect.t, object, prim, time, type)) {
static_assert(PRIMITIVE_ALL < 128, "Values >= 128 are reserved for OptiX internal use");
optixReportIntersection(isect.t,
type & PRIMITIVE_ALL,
__float_as_int(isect.u), /* Attribute_0 */
__float_as_int(isect.v)); /* Attribute_1 */
}
}
extern "C" __global__ void __intersection__curve_ribbon()
{
const KernelCurveSegment segment = kernel_data_fetch(curve_segments, optixGetPrimitiveIndex());
const int prim = segment.prim;
const int type = segment.type;
if (type & PRIMITIVE_CURVE_RIBBON) {
optix_intersection_curve(prim, type);
}
}
#endif
#ifdef __POINTCLOUD__
extern "C" __global__ void __intersection__point()
{
const int prim = optixGetPrimitiveIndex();
const int object = get_object_id();
const int type = kernel_data_fetch(objects, object).primitive_type;
# ifdef __VISIBILITY_FLAG__
const uint visibility = optixGetPayload_4();
if ((kernel_data_fetch(objects, object).visibility & visibility) == 0) {
return;
}
# endif
const float3 ray_P = optixGetObjectRayOrigin();
const float3 ray_D = optixGetObjectRayDirection();
const float ray_tmin = optixGetRayTmin();
# ifdef __OBJECT_MOTION__
const float time = optixGetRayTime();
# else
const float time = 0.0f;
# endif
Intersection isect;
isect.t = optixGetRayTmax();
if (point_intersect(NULL, &isect, ray_P, ray_D, ray_tmin, isect.t, object, prim, time, type)) {
static_assert(PRIMITIVE_ALL < 128, "Values >= 128 are reserved for OptiX internal use");
optixReportIntersection(isect.t, type & PRIMITIVE_ALL);
}
}
#endif
/* Scene intersection. */
ccl_device_intersect bool scene_intersect(KernelGlobals kg,
ccl_private const Ray *ray,
const uint visibility,
ccl_private Intersection *isect)
{
uint p0 = 0;
uint p1 = 0;
uint p2 = 0;
uint p3 = 0;
uint p4 = visibility;
uint p5 = PRIMITIVE_NONE;
uint p6 = ((uint64_t)ray) & 0xFFFFFFFF;
uint p7 = (((uint64_t)ray) >> 32) & 0xFFFFFFFF;
uint ray_mask = visibility & 0xFF;
uint ray_flags = OPTIX_RAY_FLAG_ENFORCE_ANYHIT;
if (0 == ray_mask && (visibility & ~0xFF) != 0) {
ray_mask = 0xFF;
}
else if (visibility & PATH_RAY_SHADOW_OPAQUE) {
ray_flags |= OPTIX_RAY_FLAG_TERMINATE_ON_FIRST_HIT;
}
optixTrace(intersection_ray_valid(ray) ? kernel_data.device_bvh : 0,
ray->P,
ray->D,
ray->tmin,
ray->tmax,
ray->time,
ray_mask,
ray_flags,
0, /* SBT offset for PG_HITD */
0,
0,
p0,
p1,
p2,
p3,
p4,
p5,
p6,
p7);
isect->t = __uint_as_float(p0);
isect->u = __uint_as_float(p1);
isect->v = __uint_as_float(p2);
isect->prim = p3;
isect->object = p4;
isect->type = p5;
return p5 != PRIMITIVE_NONE;
}
#ifdef __BVH_LOCAL__
ccl_device_intersect bool scene_intersect_local(KernelGlobals kg,
ccl_private const Ray *ray,
ccl_private LocalIntersection *local_isect,
int local_object,
ccl_private uint *lcg_state,
int max_hits)
{
uint p0 = pointer_pack_to_uint_0(lcg_state);
uint p1 = pointer_pack_to_uint_1(lcg_state);
uint p2 = pointer_pack_to_uint_0(local_isect);
uint p3 = pointer_pack_to_uint_1(local_isect);
uint p4 = local_object;
uint p6 = ((uint64_t)ray) & 0xFFFFFFFF;
uint p7 = (((uint64_t)ray) >> 32) & 0xFFFFFFFF;
/* Is set to zero on miss or if ray is aborted, so can be used as return value. */
uint p5 = max_hits;
if (local_isect) {
local_isect->num_hits = 0; /* Initialize hit count to zero. */
}
optixTrace(intersection_ray_valid(ray) ? kernel_data.device_bvh : 0,
ray->P,
ray->D,
ray->tmin,
ray->tmax,
ray->time,
0xFF,
/* Need to always call into __anyhit__kernel_optix_local_hit. */
OPTIX_RAY_FLAG_ENFORCE_ANYHIT,
2, /* SBT offset for PG_HITL */
0,
0,
p0,
p1,
p2,
p3,
p4,
p5,
p6,
p7);
return p5;
}
#endif
#ifdef __SHADOW_RECORD_ALL__
ccl_device_intersect bool scene_intersect_shadow_all(KernelGlobals kg,
IntegratorShadowState state,
ccl_private const Ray *ray,
uint visibility,
uint max_hits,
ccl_private uint *num_recorded_hits,
ccl_private float *throughput)
{
uint p0 = state;
uint p1 = __float_as_uint(1.0f); /* Throughput. */
uint p2 = 0; /* Number of hits. */
uint p3 = max_hits;
uint p4 = visibility;
uint p5 = false;
uint p6 = ((uint64_t)ray) & 0xFFFFFFFF;
uint p7 = (((uint64_t)ray) >> 32) & 0xFFFFFFFF;
uint ray_mask = visibility & 0xFF;
if (0 == ray_mask && (visibility & ~0xFF) != 0) {
ray_mask = 0xFF;
}
optixTrace(intersection_ray_valid(ray) ? kernel_data.device_bvh : 0,
ray->P,
ray->D,
ray->tmin,
ray->tmax,
ray->time,
ray_mask,
/* Need to always call into __anyhit__kernel_optix_shadow_all_hit. */
OPTIX_RAY_FLAG_ENFORCE_ANYHIT,
1, /* SBT offset for PG_HITS */
0,
0,
p0,
p1,
p2,
p3,
p4,
p5,
p6,
p7);
*num_recorded_hits = uint16_unpack_from_uint_0(p2);
*throughput = __uint_as_float(p1);
return p5;
}
#endif
#ifdef __VOLUME__
ccl_device_intersect bool scene_intersect_volume(KernelGlobals kg,
ccl_private const Ray *ray,
ccl_private Intersection *isect,
const uint visibility)
{
uint p0 = 0;
uint p1 = 0;
uint p2 = 0;
uint p3 = 0;
uint p4 = visibility;
uint p5 = PRIMITIVE_NONE;
uint p6 = ((uint64_t)ray) & 0xFFFFFFFF;
uint p7 = (((uint64_t)ray) >> 32) & 0xFFFFFFFF;
uint ray_mask = visibility & 0xFF;
if (0 == ray_mask && (visibility & ~0xFF) != 0) {
ray_mask = 0xFF;
}
optixTrace(intersection_ray_valid(ray) ? kernel_data.device_bvh : 0,
ray->P,
ray->D,
ray->tmin,
ray->tmax,
ray->time,
ray_mask,
/* Need to always call into __anyhit__kernel_optix_volume_test. */
OPTIX_RAY_FLAG_ENFORCE_ANYHIT,
3, /* SBT offset for PG_HITV */
0,
0,
p0,
p1,
p2,
p3,
p4,
p5,
p6,
p7);
isect->t = __uint_as_float(p0);
isect->u = __uint_as_float(p1);
isect->v = __uint_as_float(p2);
isect->prim = p3;
isect->object = p4;
isect->type = p5;
return p5 != PRIMITIVE_NONE;
}
#endif
CCL_NAMESPACE_END
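
The trace wrappers above pass the 64-bit `ray` pointer through two 32-bit payload registers (`p6` holds the low half, `p7` the high half), and the any-hit programs reassemble it on the device. The host-side sketch below shows the same split and join in isolation. `PackedPointer`, `pack_pointer` and `unpack_pointer` are hypothetical names used only for illustration; they are not part of Cycles or OptiX.

```cpp
#include <cstdint>

/* Two 32-bit halves of a pointer, in the same low/high order as p6 and p7.
 * Hypothetical type, for illustration only. */
struct PackedPointer {
  uint32_t lo;
  uint32_t hi;
};

/* Split a pointer into two 32-bit words using the same mask and shift as above. */
inline PackedPointer pack_pointer(const void *ptr)
{
  const uint64_t bits = reinterpret_cast<uintptr_t>(ptr);
  PackedPointer packed;
  packed.lo = static_cast<uint32_t>(bits & 0xFFFFFFFFu);
  packed.hi = static_cast<uint32_t>((bits >> 32) & 0xFFFFFFFFu);
  return packed;
}

/* Join the two words back into a typed pointer. */
template<typename T> inline T *unpack_pointer(const PackedPointer packed)
{
  const uint64_t bits = (static_cast<uint64_t>(packed.hi) << 32) | packed.lo;
  return reinterpret_cast<T *>(static_cast<uintptr_t>(bits));
}
```

Packing and then unpacking should give back the original address, which is what lets a hit program read `ray->self` from nothing but payload registers.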


@@ -8,7 +8,6 @@
#include <optix.h>
#define __KERNEL_GPU__
#define __KERNEL_GPU_RAYTRACING__
#define __KERNEL_CUDA__ /* OptiX kernels are implicitly CUDA kernels too */
#define __KERNEL_OPTIX__
#define CCL_NAMESPACE_BEGIN


@@ -20,34 +20,6 @@
#include "kernel/integrator/intersect_volume_stack.h"
// clang-format on
#define OPTIX_DEFINE_ABI_VERSION_ONLY
#include <optix_function_table.h>
template<typename T> ccl_device_forceinline T *get_payload_ptr_0()
{
return pointer_unpack_from_uint<T>(optixGetPayload_0(), optixGetPayload_1());
}
template<typename T> ccl_device_forceinline T *get_payload_ptr_2()
{
return pointer_unpack_from_uint<T>(optixGetPayload_2(), optixGetPayload_3());
}
template<typename T> ccl_device_forceinline T *get_payload_ptr_6()
{
return (T *)(((uint64_t)optixGetPayload_7() << 32) | optixGetPayload_6());
}
ccl_device_forceinline int get_object_id()
{
#ifdef __OBJECT_MOTION__
/* Always get the instance ID from the TLAS.
* There might be a motion transform node between TLAS and BLAS which does not have one. */
return optixGetInstanceIdFromHandle(optixGetTransformListHandle(0));
#else
return optixGetInstanceId();
#endif
}
extern "C" __global__ void __raygen__kernel_optix_integrator_intersect_closest()
{
const int global_index = optixGetLaunchIndex().x;
@@ -84,411 +56,3 @@ extern "C" __global__ void __raygen__kernel_optix_integrator_intersect_volume_st
integrator_intersect_volume_stack(nullptr, path_index);
}
extern "C" __global__ void __miss__kernel_optix_miss()
{
/* 'kernel_path_lamp_emission' checks intersection distance, so need to set it even on a miss. */
optixSetPayload_0(__float_as_uint(optixGetRayTmax()));
optixSetPayload_5(PRIMITIVE_NONE);
}
extern "C" __global__ void __anyhit__kernel_optix_local_hit()
{
#if defined(__HAIR__) || defined(__POINTCLOUD__)
if (!optixIsTriangleHit()) {
/* Ignore curves and points. */
return optixIgnoreIntersection();
}
#endif
#ifdef __BVH_LOCAL__
const int object = get_object_id();
if (object != optixGetPayload_4() /* local_object */) {
/* Only intersect with matching object. */
return optixIgnoreIntersection();
}
const int prim = optixGetPrimitiveIndex();
ccl_private Ray *const ray = get_payload_ptr_6<Ray>();
if (intersection_skip_self_local(ray->self, prim)) {
return optixIgnoreIntersection();
}
const uint max_hits = optixGetPayload_5();
if (max_hits == 0) {
/* Special case for when no hit information is requested: just report that something was hit. */
optixSetPayload_5(true);
return optixTerminateRay();
}
int hit = 0;
uint *const lcg_state = get_payload_ptr_0<uint>();
LocalIntersection *const local_isect = get_payload_ptr_2<LocalIntersection>();
if (lcg_state) {
for (int i = min(max_hits, local_isect->num_hits) - 1; i >= 0; --i) {
if (optixGetRayTmax() == local_isect->hits[i].t) {
return optixIgnoreIntersection();
}
}
hit = local_isect->num_hits++;
if (local_isect->num_hits > max_hits) {
hit = lcg_step_uint(lcg_state) % local_isect->num_hits;
if (hit >= max_hits) {
return optixIgnoreIntersection();
}
}
}
else {
if (local_isect->num_hits && optixGetRayTmax() > local_isect->hits[0].t) {
/* Record closest intersection only.
* Do not terminate ray here, since there is no guarantee about distance ordering in any-hit.
*/
return optixIgnoreIntersection();
}
local_isect->num_hits = 1;
}
Intersection *isect = &local_isect->hits[hit];
isect->t = optixGetRayTmax();
isect->prim = prim;
isect->object = get_object_id();
isect->type = kernel_data_fetch(objects, isect->object).primitive_type;
const float2 barycentrics = optixGetTriangleBarycentrics();
isect->u = 1.0f - barycentrics.y - barycentrics.x;
isect->v = barycentrics.x;
/* Record geometric normal. */
const uint tri_vindex = kernel_data_fetch(tri_vindex, prim).w;
const float3 tri_a = kernel_data_fetch(tri_verts, tri_vindex + 0);
const float3 tri_b = kernel_data_fetch(tri_verts, tri_vindex + 1);
const float3 tri_c = kernel_data_fetch(tri_verts, tri_vindex + 2);
local_isect->Ng[hit] = normalize(cross(tri_b - tri_a, tri_c - tri_a));
/* Continue tracing (without this the trace call would return after the first hit). */
optixIgnoreIntersection();
#endif
}
extern "C" __global__ void __anyhit__kernel_optix_shadow_all_hit()
{
#ifdef __SHADOW_RECORD_ALL__
int prim = optixGetPrimitiveIndex();
const uint object = get_object_id();
# ifdef __VISIBILITY_FLAG__
const uint visibility = optixGetPayload_4();
if ((kernel_data_fetch(objects, object).visibility & visibility) == 0) {
return optixIgnoreIntersection();
}
# endif
ccl_private Ray *const ray = get_payload_ptr_6<Ray>();
if (intersection_skip_self_shadow(ray->self, object, prim)) {
return optixIgnoreIntersection();
}
float u = 0.0f, v = 0.0f;
int type = 0;
if (optixIsTriangleHit()) {
const float2 barycentrics = optixGetTriangleBarycentrics();
u = 1.0f - barycentrics.y - barycentrics.x;
v = barycentrics.x;
type = kernel_data_fetch(objects, object).primitive_type;
}
# ifdef __HAIR__
else if ((optixGetHitKind() & (~PRIMITIVE_MOTION)) != PRIMITIVE_POINT) {
u = __uint_as_float(optixGetAttribute_0());
v = __uint_as_float(optixGetAttribute_1());
const KernelCurveSegment segment = kernel_data_fetch(curve_segments, prim);
type = segment.type;
prim = segment.prim;
# if OPTIX_ABI_VERSION < 55
/* Filter out curve endcaps. */
if (u == 0.0f || u == 1.0f) {
return optixIgnoreIntersection();
}
# endif
}
# endif
else {
type = kernel_data_fetch(objects, object).primitive_type;
u = 0.0f;
v = 0.0f;
}
# ifndef __TRANSPARENT_SHADOWS__
/* No transparent shadows support compiled in, make opaque. */
optixSetPayload_5(true);
return optixTerminateRay();
# else
const uint max_hits = optixGetPayload_3();
const uint num_hits_packed = optixGetPayload_2();
const uint num_recorded_hits = uint16_unpack_from_uint_0(num_hits_packed);
const uint num_hits = uint16_unpack_from_uint_1(num_hits_packed);
/* If no transparent shadows, all light is blocked and we can stop immediately. */
if (num_hits >= max_hits ||
!(intersection_get_shader_flags(NULL, prim, type) & SD_HAS_TRANSPARENT_SHADOW)) {
optixSetPayload_5(true);
return optixTerminateRay();
}
/* Always use baked shadow transparency for curves. */
if (type & PRIMITIVE_CURVE) {
float throughput = __uint_as_float(optixGetPayload_1());
throughput *= intersection_curve_shadow_transparency(nullptr, object, prim, u);
optixSetPayload_1(__float_as_uint(throughput));
optixSetPayload_2(uint16_pack_to_uint(num_recorded_hits, num_hits + 1));
if (throughput < CURVE_SHADOW_TRANSPARENCY_CUTOFF) {
optixSetPayload_5(true);
return optixTerminateRay();
}
else {
/* Continue tracing. */
optixIgnoreIntersection();
return;
}
}
/* Record transparent intersection. */
optixSetPayload_2(uint16_pack_to_uint(num_recorded_hits + 1, num_hits + 1));
uint record_index = num_recorded_hits;
const IntegratorShadowState state = optixGetPayload_0();
const uint max_record_hits = min(max_hits, INTEGRATOR_SHADOW_ISECT_SIZE);
if (record_index >= max_record_hits) {
/* If maximum number of hits reached, find a hit to replace. */
float max_recorded_t = INTEGRATOR_STATE_ARRAY(state, shadow_isect, 0, t);
uint max_recorded_hit = 0;
for (int i = 1; i < max_record_hits; i++) {
const float isect_t = INTEGRATOR_STATE_ARRAY(state, shadow_isect, i, t);
if (isect_t > max_recorded_t) {
max_recorded_t = isect_t;
max_recorded_hit = i;
}
}
if (optixGetRayTmax() >= max_recorded_t) {
/* Accept hit, so that OptiX won't consider any hits beyond the distance of the
* current hit. */
return;
}
record_index = max_recorded_hit;
}
INTEGRATOR_STATE_ARRAY_WRITE(state, shadow_isect, record_index, u) = u;
INTEGRATOR_STATE_ARRAY_WRITE(state, shadow_isect, record_index, v) = v;
INTEGRATOR_STATE_ARRAY_WRITE(state, shadow_isect, record_index, t) = optixGetRayTmax();
INTEGRATOR_STATE_ARRAY_WRITE(state, shadow_isect, record_index, prim) = prim;
INTEGRATOR_STATE_ARRAY_WRITE(state, shadow_isect, record_index, object) = object;
INTEGRATOR_STATE_ARRAY_WRITE(state, shadow_isect, record_index, type) = type;
/* Continue tracing. */
optixIgnoreIntersection();
# endif /* __TRANSPARENT_SHADOWS__ */
#endif /* __SHADOW_RECORD_ALL__ */
}
extern "C" __global__ void __anyhit__kernel_optix_volume_test()
{
#if defined(__HAIR__) || defined(__POINTCLOUD__)
if (!optixIsTriangleHit()) {
/* Ignore curves and points. */
return optixIgnoreIntersection();
}
#endif
const uint object = get_object_id();
#ifdef __VISIBILITY_FLAG__
const uint visibility = optixGetPayload_4();
if ((kernel_data_fetch(objects, object).visibility & visibility) == 0) {
return optixIgnoreIntersection();
}
#endif
if ((kernel_data_fetch(object_flag, object) & SD_OBJECT_HAS_VOLUME) == 0) {
return optixIgnoreIntersection();
}
const int prim = optixGetPrimitiveIndex();
ccl_private Ray *const ray = get_payload_ptr_6<Ray>();
if (intersection_skip_self(ray->self, object, prim)) {
return optixIgnoreIntersection();
}
}
extern "C" __global__ void __anyhit__kernel_optix_visibility_test()
{
#ifdef __HAIR__
# if OPTIX_ABI_VERSION < 55
if (optixGetPrimitiveType() == OPTIX_PRIMITIVE_TYPE_ROUND_CUBIC_BSPLINE) {
/* Filter out curve endcaps. */
const float u = __uint_as_float(optixGetAttribute_0());
if (u == 0.0f || u == 1.0f) {
return optixIgnoreIntersection();
}
}
# endif
#endif
const uint object = get_object_id();
const uint visibility = optixGetPayload_4();
#ifdef __VISIBILITY_FLAG__
if ((kernel_data_fetch(objects, object).visibility & visibility) == 0) {
return optixIgnoreIntersection();
}
#endif
const int prim = optixGetPrimitiveIndex();
ccl_private Ray *const ray = get_payload_ptr_6<Ray>();
if (visibility & PATH_RAY_SHADOW_OPAQUE) {
if (intersection_skip_self_shadow(ray->self, object, prim)) {
return optixIgnoreIntersection();
}
else {
/* Shadow ray early termination. */
return optixTerminateRay();
}
}
else {
if (intersection_skip_self(ray->self, object, prim)) {
return optixIgnoreIntersection();
}
}
}
extern "C" __global__ void __closesthit__kernel_optix_hit()
{
const int object = get_object_id();
const int prim = optixGetPrimitiveIndex();
optixSetPayload_0(__float_as_uint(optixGetRayTmax())); /* Intersection distance */
optixSetPayload_4(object);
if (optixIsTriangleHit()) {
const float2 barycentrics = optixGetTriangleBarycentrics();
optixSetPayload_1(__float_as_uint(1.0f - barycentrics.y - barycentrics.x));
optixSetPayload_2(__float_as_uint(barycentrics.x));
optixSetPayload_3(prim);
optixSetPayload_5(kernel_data_fetch(objects, object).primitive_type);
}
else if ((optixGetHitKind() & (~PRIMITIVE_MOTION)) != PRIMITIVE_POINT) {
const KernelCurveSegment segment = kernel_data_fetch(curve_segments, prim);
optixSetPayload_1(optixGetAttribute_0()); /* Same as 'optixGetCurveParameter()' */
optixSetPayload_2(optixGetAttribute_1());
optixSetPayload_3(segment.prim);
optixSetPayload_5(segment.type);
}
else {
optixSetPayload_1(0);
optixSetPayload_2(0);
optixSetPayload_3(prim);
optixSetPayload_5(kernel_data_fetch(objects, object).primitive_type);
}
}
#ifdef __HAIR__
ccl_device_inline void optix_intersection_curve(const int prim, const int type)
{
const int object = get_object_id();
# ifdef __VISIBILITY_FLAG__
const uint visibility = optixGetPayload_4();
if ((kernel_data_fetch(objects, object).visibility & visibility) == 0) {
return;
}
# endif
float3 P = optixGetObjectRayOrigin();
float3 dir = optixGetObjectRayDirection();
float tmin = optixGetRayTmin();
/* The direction is not normalized by default, but the curve intersection routine expects that. */
float len;
dir = normalize_len(dir, &len);
# ifdef __OBJECT_MOTION__
const float time = optixGetRayTime();
# else
const float time = 0.0f;
# endif
Intersection isect;
isect.t = optixGetRayTmax();
/* Transform maximum distance into object space. */
if (isect.t != FLT_MAX)
isect.t *= len;
if (curve_intersect(NULL, &isect, P, dir, tmin, isect.t, object, prim, time, type)) {
static_assert(PRIMITIVE_ALL < 128, "Values >= 128 are reserved for OptiX internal use");
optixReportIntersection(isect.t / len,
type & PRIMITIVE_ALL,
__float_as_int(isect.u), /* Attribute_0 */
__float_as_int(isect.v)); /* Attribute_1 */
}
}
extern "C" __global__ void __intersection__curve_ribbon()
{
const KernelCurveSegment segment = kernel_data_fetch(curve_segments, optixGetPrimitiveIndex());
const int prim = segment.prim;
const int type = segment.type;
if (type & PRIMITIVE_CURVE_RIBBON) {
optix_intersection_curve(prim, type);
}
}
#endif
#ifdef __POINTCLOUD__
extern "C" __global__ void __intersection__point()
{
const int prim = optixGetPrimitiveIndex();
const int object = get_object_id();
const int type = kernel_data_fetch(objects, object).primitive_type;
# ifdef __VISIBILITY_FLAG__
const uint visibility = optixGetPayload_4();
if ((kernel_data_fetch(objects, object).visibility & visibility) == 0) {
return;
}
# endif
float3 P = optixGetObjectRayOrigin();
float3 dir = optixGetObjectRayDirection();
float tmin = optixGetRayTmin();
/* The direction is not normalized by default, but the point intersection routine expects that. */
float len;
dir = normalize_len(dir, &len);
# ifdef __OBJECT_MOTION__
const float time = optixGetRayTime();
# else
const float time = 0.0f;
# endif
Intersection isect;
isect.t = optixGetRayTmax();
/* Transform maximum distance into object space. */
if (isect.t != FLT_MAX) {
isect.t *= len;
}
if (point_intersect(NULL, &isect, P, dir, tmin, isect.t, object, prim, time, type)) {
static_assert(PRIMITIVE_ALL < 128, "Values >= 128 are reserved for OptiX internal use");
optixReportIntersection(isect.t / len, type & PRIMITIVE_ALL);
}
}
#endif
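
The two intersection programs above normalize the object-space ray direction, multiply the maximum distance by the original length before testing, and divide the reported hit distance by that length again. The sketch below isolates that bookkeeping under the assumption that distances are measured in units of the direction vector. `Vec3`, `normalize_with_length`, `distance_to_normalized` and `distance_from_normalized` are hypothetical helpers, not Cycles functions.

```cpp
#include <cfloat>
#include <cmath>

/* Minimal stand-in for float3, for illustration only. */
struct Vec3 {
  float x, y, z;
};

/* Normalize v and return its original length through len. */
inline Vec3 normalize_with_length(const Vec3 v, float *len)
{
  *len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
  return Vec3{v.x / *len, v.y / *len, v.z / *len};
}

/* Convert a distance along the unnormalized direction into a distance along
 * the normalized one. FLT_MAX is kept as is, as in the code above. */
inline float distance_to_normalized(const float t, const float len)
{
  return (t != FLT_MAX) ? t * len : t;
}

/* Convert a hit distance back into units of the unnormalized direction. */
inline float distance_from_normalized(const float t, const float len)
{
  return t / len;
}
```

For a direction of length 2, a maximum distance of 3 becomes 6 for the intersection routine, and a hit found at 6 is reported as 3.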


@@ -72,7 +72,7 @@ ccl_device_inline float sqr_point_to_line_distance(const float3 PmQ0, const floa
ccl_device_inline bool cylinder_intersect(const float3 cylinder_start,
const float3 cylinder_end,
const float cylinder_radius,
const float3 ray_dir,
const float3 ray_D,
ccl_private float2 *t_o,
ccl_private float *u0_o,
ccl_private float3 *Ng0_o,
@@ -82,7 +82,7 @@ ccl_device_inline bool cylinder_intersect(const float3 cylinder_start,
/* Calculate quadratic equation to solve. */
const float rl = 1.0f / len(cylinder_end - cylinder_start);
const float3 P0 = cylinder_start, dP = (cylinder_end - cylinder_start) * rl;
const float3 O = -P0, dO = ray_dir;
const float3 O = -P0, dO = ray_D;
const float dOdO = dot(dO, dO);
const float OdO = dot(dO, O);
@@ -123,7 +123,7 @@ ccl_device_inline bool cylinder_intersect(const float3 cylinder_start,
/* Calculates u and Ng for near hit. */
{
*u0_o = (t0 * dOz + Oz) * rl;
const float3 Pr = t0 * ray_dir;
const float3 Pr = t0 * ray_D;
const float3 Pl = (*u0_o) * (cylinder_end - cylinder_start) + cylinder_start;
*Ng0_o = Pr - Pl;
}
@@ -131,7 +131,7 @@ ccl_device_inline bool cylinder_intersect(const float3 cylinder_start,
/* Calculates u and Ng for far hit. */
{
*u1_o = (t1 * dOz + Oz) * rl;
const float3 Pr = t1 * ray_dir;
const float3 Pr = t1 * ray_D;
const float3 Pl = (*u1_o) * (cylinder_end - cylinder_start) + cylinder_start;
*Ng1_o = Pr - Pl;
}
@@ -141,10 +141,10 @@ ccl_device_inline bool cylinder_intersect(const float3 cylinder_start,
return true;
}
ccl_device_inline float2 half_plane_intersect(const float3 P, const float3 N, const float3 ray_dir)
ccl_device_inline float2 half_plane_intersect(const float3 P, const float3 N, const float3 ray_D)
{
const float3 O = -P;
const float3 D = ray_dir;
const float3 D = ray_D;
const float ON = dot(O, N);
const float DN = dot(D, N);
const float min_rcp_input = 1e-18f;
@@ -155,7 +155,7 @@ ccl_device_inline float2 half_plane_intersect(const float3 P, const float3 N, co
return make_float2(lower, upper);
}
ccl_device bool curve_intersect_iterative(const float3 ray_dir,
ccl_device bool curve_intersect_iterative(const float3 ray_D,
const float ray_tmin,
ccl_private float *ray_tmax,
const float dt,
@@ -165,7 +165,7 @@ ccl_device bool curve_intersect_iterative(const float3 ray_dir,
const bool use_backfacing,
ccl_private Intersection *isect)
{
const float length_ray_dir = len(ray_dir);
const float length_ray_D = len(ray_D);
/* Error of curve evaluations is proportional to largest coordinate. */
const float4 box_min = min(min(curve[0], curve[1]), min(curve[2], curve[3]));
@@ -176,9 +176,9 @@ ccl_device bool curve_intersect_iterative(const float3 ray_dir,
const float radius_max = box_max.w;
for (int i = 0; i < CURVE_NUM_JACOBIAN_ITERATIONS; i++) {
const float3 Q = ray_dir * t;
const float3 dQdt = ray_dir;
const float Q_err = 16.0f * FLT_EPSILON * length_ray_dir * t;
const float3 Q = ray_D * t;
const float3 dQdt = ray_D;
const float Q_err = 16.0f * FLT_EPSILON * length_ray_D * t;
const float4 P4 = catmull_rom_basis_eval(curve, u);
const float4 dPdu4 = catmull_rom_basis_derivative(curve, u);
@@ -233,7 +233,7 @@ ccl_device bool curve_intersect_iterative(const float3 ray_dir,
const float3 U = dradiusdu * R + dPdu;
const float3 V = cross(dPdu, R);
const float3 Ng = cross(V, U);
if (!use_backfacing && dot(ray_dir, Ng) > 0.0f) {
if (!use_backfacing && dot(ray_D, Ng) > 0.0f) {
return false;
}
@@ -249,8 +249,8 @@ ccl_device bool curve_intersect_iterative(const float3 ray_dir,
return false;
}
ccl_device bool curve_intersect_recursive(const float3 ray_orig,
const float3 ray_dir,
ccl_device bool curve_intersect_recursive(const float3 ray_P,
const float3 ray_D,
const float ray_tmin,
float ray_tmax,
float4 curve[4],
@@ -258,8 +258,8 @@ ccl_device bool curve_intersect_recursive(const float3 ray_orig,
{
/* Move ray closer to make intersection stable. */
const float3 center = float4_to_float3(0.25f * (curve[0] + curve[1] + curve[2] + curve[3]));
const float dt = dot(center - ray_orig, ray_dir) / dot(ray_dir, ray_dir);
const float3 ref = ray_orig + ray_dir * dt;
const float dt = dot(center - ray_P, ray_D) / dot(ray_D, ray_D);
const float3 ref = ray_P + ray_D * dt;
const float4 ref4 = make_float4(ref.x, ref.y, ref.z, 0.0f);
curve[0] -= ref4;
curve[1] -= ref4;
@@ -322,7 +322,7 @@ ccl_device bool curve_intersect_recursive(const float3 ray_orig,
valid = cylinder_intersect(float4_to_float3(P0),
float4_to_float3(P3),
r_outer,
ray_dir,
ray_D,
&tc_outer,
&u_outer0,
&Ng_outer0,
@@ -335,11 +335,10 @@ ccl_device bool curve_intersect_recursive(const float3 ray_orig,
/* Intersect with cap-planes. */
float2 tp = make_float2(ray_tmin - dt, ray_tmax - dt);
tp = make_float2(max(tp.x, tc_outer.x), min(tp.y, tc_outer.y));
const float2 h0 = half_plane_intersect(
float4_to_float3(P0), float4_to_float3(dP0du), ray_dir);
const float2 h0 = half_plane_intersect(float4_to_float3(P0), float4_to_float3(dP0du), ray_D);
tp = make_float2(max(tp.x, h0.x), min(tp.y, h0.y));
const float2 h1 = half_plane_intersect(
float4_to_float3(P3), -float4_to_float3(dP3du), ray_dir);
float4_to_float3(P3), -float4_to_float3(dP3du), ray_D);
tp = make_float2(max(tp.x, h1.x), min(tp.y, h1.y));
valid = tp.x <= tp.y;
if (!valid) {
@@ -359,7 +358,7 @@ ccl_device bool curve_intersect_recursive(const float3 ray_orig,
const bool valid_inner = cylinder_intersect(float4_to_float3(P0),
float4_to_float3(P3),
r_inner,
ray_dir,
ray_D,
&tc_inner,
&u_inner0,
&Ng_inner0,
@@ -369,9 +368,9 @@ ccl_device bool curve_intersect_recursive(const float3 ray_orig,
/* At the unstable area we subdivide deeper. */
# if 0
const bool unstable0 = (!valid_inner) |
(fabsf(dot(normalize(ray_dir), normalize(Ng_inner0))) < 0.3f);
(fabsf(dot(normalize(ray_D), normalize(Ng_inner0))) < 0.3f);
const bool unstable1 = (!valid_inner) |
(fabsf(dot(normalize(ray_dir), normalize(Ng_inner1))) < 0.3f);
(fabsf(dot(normalize(ray_D), normalize(Ng_inner1))) < 0.3f);
# else
/* On the GPU it appears to be a little faster if always enabled. */
(void)valid_inner;
@@ -396,7 +395,7 @@ ccl_device bool curve_intersect_recursive(const float3 ray_orig,
CURVE_NUM_BEZIER_SUBDIVISIONS;
if (depth >= termDepth) {
found |= curve_intersect_iterative(
ray_dir, ray_tmin, &ray_tmax, dt, curve, u_outer0, tp0.x, use_backfacing, isect);
ray_D, ray_tmin, &ray_tmax, dt, curve, u_outer0, tp0.x, use_backfacing, isect);
}
else {
recurse = true;
@@ -409,7 +408,7 @@ ccl_device bool curve_intersect_recursive(const float3 ray_orig,
CURVE_NUM_BEZIER_SUBDIVISIONS;
if (depth >= termDepth) {
found |= curve_intersect_iterative(
ray_dir, ray_tmin, &ray_tmax, dt, curve, u_outer1, tp1.y, use_backfacing, isect);
ray_D, ray_tmin, &ray_tmax, dt, curve, u_outer1, tp1.y, use_backfacing, isect);
}
else {
recurse = true;
@@ -519,13 +518,16 @@ ccl_device_inline bool ribbon_intersect_quad(const float ray_tmin,
return true;
}
ccl_device_inline void ribbon_ray_space(const float3 ray_dir, float3 ray_space[3])
ccl_device_inline void ribbon_ray_space(const float3 ray_D,
const float ray_D_invlen,
float3 ray_space[3])
{
const float3 dx0 = make_float3(0, ray_dir.z, -ray_dir.y);
const float3 dx1 = make_float3(-ray_dir.z, 0, ray_dir.x);
const float3 D = ray_D * ray_D_invlen;
const float3 dx0 = make_float3(0, D.z, -D.y);
const float3 dx1 = make_float3(-D.z, 0, D.x);
ray_space[0] = normalize(dot(dx0, dx0) > dot(dx1, dx1) ? dx0 : dx1);
ray_space[1] = normalize(cross(ray_dir, ray_space[0]));
ray_space[2] = ray_dir;
ray_space[1] = normalize(cross(D, ray_space[0]));
ray_space[2] = D * ray_D_invlen;
}
ccl_device_inline float4 ribbon_to_ray_space(const float3 ray_space[3],
@@ -537,7 +539,7 @@ ccl_device_inline float4 ribbon_to_ray_space(const float3 ray_space[3],
}
ccl_device_inline bool ribbon_intersect(const float3 ray_org,
const float3 ray_dir,
const float3 ray_D,
const float ray_tmin,
float ray_tmax,
const int N,
@@ -545,8 +547,9 @@ ccl_device_inline bool ribbon_intersect(const float3 ray_org,
ccl_private Intersection *isect)
{
/* Transform control points into ray space. */
const float ray_D_invlen = 1.0f / len(ray_D);
float3 ray_space[3];
ribbon_ray_space(ray_dir, ray_space);
ribbon_ray_space(ray_D, ray_D_invlen, ray_space);
curve[0] = ribbon_to_ray_space(ray_space, ray_org, curve[0]);
curve[1] = ribbon_to_ray_space(ray_space, ray_org, curve[1]);
@@ -594,7 +597,7 @@ ccl_device_inline bool ribbon_intersect(const float3 ray_org,
const float avoidance_factor = 2.0f;
if (avoidance_factor != 0.0f) {
float r = mix(p0.w, p1.w, vu);
valid0 = vt > avoidance_factor * r;
valid0 = vt > avoidance_factor * r * ray_D_invlen;
}
if (valid0) {
@@ -619,8 +622,8 @@ ccl_device_inline bool ribbon_intersect(const float3 ray_org,
ccl_device_forceinline bool curve_intersect(KernelGlobals kg,
ccl_private Intersection *isect,
const float3 P,
const float3 dir,
const float3 ray_P,
const float3 ray_D,
const float tmin,
const float tmax,
int object,
@@ -651,7 +654,7 @@ ccl_device_forceinline bool curve_intersect(KernelGlobals kg,
if (type & PRIMITIVE_CURVE_RIBBON) {
/* todo: adaptive number of subdivisions could help performance here. */
const int subdivisions = kernel_data.bvh.curve_subdivisions;
if (ribbon_intersect(P, dir, tmin, tmax, subdivisions, curve, isect)) {
if (ribbon_intersect(ray_P, ray_D, tmin, tmax, subdivisions, curve, isect)) {
isect->prim = prim;
isect->object = object;
isect->type = type;
@@ -661,7 +664,7 @@ ccl_device_forceinline bool curve_intersect(KernelGlobals kg,
return false;
}
else {
if (curve_intersect_recursive(P, dir, tmin, tmax, curve, isect)) {
if (curve_intersect_recursive(ray_P, ray_D, tmin, tmax, curve, isect)) {
isect->prim = prim;
isect->object = object;
isect->type = type;


@@ -27,8 +27,8 @@ ccl_device_inline float3 motion_triangle_point_from_uv(KernelGlobals kg,
const float v,
float3 verts[3])
{
float w = 1.0f - u - v;
float3 P = u * verts[0] + v * verts[1] + w * verts[2];
/* This appears to give slightly better precision than interpolating with w = (1 - u - v). */
float3 P = verts[0] + u * (verts[1] - verts[0]) + v * (verts[2] - verts[0]);
if (!(sd->object_flag & SD_OBJECT_TRANSFORM_APPLIED)) {
const Transform tfm = object_get_transform(kg, sd);
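
The edge-based expression in the hunk above, `verts[0] + u * (verts[1] - verts[0]) + v * (verts[2] - verts[0])`, is algebraically the same point as the weighted sum `(1 - u - v) * verts[0] + u * verts[1] + v * verts[2]`; only the rounding behaviour differs. The sketch below compares the two forms. `Vec3` and both function names are hypothetical, and the operators are simplified stand-ins for the Cycles `float3` math.

```cpp
#include <cmath>

/* Minimal stand-in for float3, for illustration only. */
struct Vec3 {
  float x, y, z;
};

inline Vec3 operator+(const Vec3 a, const Vec3 b) { return Vec3{a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(const Vec3 a, const Vec3 b) { return Vec3{a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(const float s, const Vec3 a) { return Vec3{s * a.x, s * a.y, s * a.z}; }

/* Weighted-sum form with an explicit third weight w = 1 - u - v on verts[0]. */
inline Vec3 point_from_weights(const float u, const float v, const Vec3 verts[3])
{
  const float w = 1.0f - u - v;
  return w * verts[0] + u * verts[1] + v * verts[2];
}

/* Edge form: start at verts[0] and move along the two triangle edges. */
inline Vec3 point_from_edges(const float u, const float v, const Vec3 verts[3])
{
  return verts[0] + u * (verts[1] - verts[0]) + v * (verts[2] - verts[0]);
}
```

Both functions return the same point up to floating-point rounding; the source comment only claims that the edge form appears to be slightly more precise.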


@@ -86,7 +86,7 @@ ccl_device_inline Transform object_fetch_transform_motion_test(KernelGlobals kg,
Transform tfm = object_fetch_transform_motion(kg, object, time);
if (itfm)
*itfm = transform_quick_inverse(tfm);
*itfm = transform_inverse(tfm);
return tfm;
}
@@ -488,127 +488,54 @@ ccl_device_inline float3 bvh_inverse_direction(float3 dir)
/* Transform ray into object space to enter static object in BVH */
ccl_device_inline float bvh_instance_push(KernelGlobals kg,
int object,
ccl_private const Ray *ray,
ccl_private float3 *P,
ccl_private float3 *dir,
ccl_private float3 *idir)
ccl_device_inline void bvh_instance_push(KernelGlobals kg,
int object,
ccl_private const Ray *ray,
ccl_private float3 *P,
ccl_private float3 *dir,
ccl_private float3 *idir)
{
Transform tfm = object_fetch_transform(kg, object, OBJECT_INVERSE_TRANSFORM);
*P = transform_point(&tfm, ray->P);
float len;
*dir = bvh_clamp_direction(normalize_len(transform_direction(&tfm, ray->D), &len));
*idir = bvh_inverse_direction(*dir);
return len;
}
/* Transform ray to exit static object in BVH. */
ccl_device_inline float bvh_instance_pop(KernelGlobals kg,
int object,
ccl_private const Ray *ray,
ccl_private float3 *P,
ccl_private float3 *dir,
ccl_private float3 *idir,
float t)
{
if (t != FLT_MAX) {
Transform tfm = object_fetch_transform(kg, object, OBJECT_INVERSE_TRANSFORM);
t /= len(transform_direction(&tfm, ray->D));
}
*P = ray->P;
*dir = bvh_clamp_direction(ray->D);
*idir = bvh_inverse_direction(*dir);
return t;
}
/* Same as above, but returns scale factor to apply to multiple intersection distances */
ccl_device_inline void bvh_instance_pop_factor(KernelGlobals kg,
int object,
ccl_private const Ray *ray,
ccl_private float3 *P,
ccl_private float3 *dir,
ccl_private float3 *idir,
ccl_private float *t_fac)
{
Transform tfm = object_fetch_transform(kg, object, OBJECT_INVERSE_TRANSFORM);
*t_fac = 1.0f / len(transform_direction(&tfm, ray->D));
*P = ray->P;
*dir = bvh_clamp_direction(ray->D);
*dir = bvh_clamp_direction(transform_direction(&tfm, ray->D));
*idir = bvh_inverse_direction(*dir);
}
#ifdef __OBJECT_MOTION__
/* Transform ray into object space to enter motion blurred object in BVH */
ccl_device_inline float bvh_instance_motion_push(KernelGlobals kg,
int object,
ccl_private const Ray *ray,
ccl_private float3 *P,
ccl_private float3 *dir,
ccl_private float3 *idir,
ccl_private Transform *itfm)
{
object_fetch_transform_motion_test(kg, object, ray->time, itfm);
*P = transform_point(itfm, ray->P);
float len;
*dir = bvh_clamp_direction(normalize_len(transform_direction(itfm, ray->D), &len));
*idir = bvh_inverse_direction(*dir);
return len;
}
/* Transform ray to exit motion blurred object in BVH. */
ccl_device_inline float bvh_instance_motion_pop(KernelGlobals kg,
ccl_device_inline void bvh_instance_motion_push(KernelGlobals kg,
int object,
ccl_private const Ray *ray,
ccl_private float3 *P,
ccl_private float3 *dir,
ccl_private float3 *idir,
float t,
ccl_private Transform *itfm)
ccl_private float3 *idir)
{
if (t != FLT_MAX) {
t /= len(transform_direction(itfm, ray->D));
}
Transform tfm;
object_fetch_transform_motion_test(kg, object, ray->time, &tfm);
*P = ray->P;
*dir = bvh_clamp_direction(ray->D);
*idir = bvh_inverse_direction(*dir);
*P = transform_point(&tfm, ray->P);
return t;
}
/* Same as above, but returns scale factor to apply to multiple intersection distances */
ccl_device_inline void bvh_instance_motion_pop_factor(KernelGlobals kg,
int object,
ccl_private const Ray *ray,
ccl_private float3 *P,
ccl_private float3 *dir,
ccl_private float3 *idir,
ccl_private float *t_fac,
ccl_private Transform *itfm)
{
*t_fac = 1.0f / len(transform_direction(itfm, ray->D));
*P = ray->P;
*dir = bvh_clamp_direction(ray->D);
*dir = bvh_clamp_direction(transform_direction(&tfm, ray->D));
*idir = bvh_inverse_direction(*dir);
}
#endif
/* Transform ray to exit static object in BVH. */
ccl_device_inline void bvh_instance_pop(ccl_private const Ray *ray,
ccl_private float3 *P,
ccl_private float3 *dir,
ccl_private float3 *idir)
{
*P = ray->P;
*dir = bvh_clamp_direction(ray->D);
*idir = bvh_inverse_direction(*dir);
}
/* TODO: This can be removed once we know that no devices will require explicit
* address space qualifiers for this case. */
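
The `void` variants of `bvh_instance_push` in this hunk transform the ray direction without normalizing it, and the matching `bvh_instance_pop` no longer rescales `t`. A likely reason, stated here as an assumption rather than taken from the commit message, is that an affine transform maps the point at parameter `t` on the world-space ray to the point at the same `t` on the transformed ray, as long as the direction keeps its transformed length. The sketch below checks this for a uniform scale plus translation. `Vec3`, `ScaleTranslate` and the helper functions are hypothetical stand-ins, not the Cycles `Transform` API.

```cpp
/* Minimal stand-in for float3, for illustration only. */
struct Vec3 {
  float x, y, z;
};

/* Hypothetical affine transform: uniform scale followed by a translation. */
struct ScaleTranslate {
  float scale;
  Vec3 offset;
};

inline Vec3 transform_point(const ScaleTranslate tfm, const Vec3 p)
{
  return Vec3{tfm.scale * p.x + tfm.offset.x,
              tfm.scale * p.y + tfm.offset.y,
              tfm.scale * p.z + tfm.offset.z};
}

/* Directions ignore the translation and are deliberately not normalized. */
inline Vec3 transform_direction(const ScaleTranslate tfm, const Vec3 d)
{
  return Vec3{tfm.scale * d.x, tfm.scale * d.y, tfm.scale * d.z};
}

/* Point on the ray P + t * D. */
inline Vec3 ray_at(const Vec3 P, const Vec3 D, const float t)
{
  return Vec3{P.x + t * D.x, P.y + t * D.y, P.z + t * D.z};
}
```

Transforming the world-space hit point and evaluating the transformed ray at the same `t` give the same position, so no distance conversion is needed when entering or leaving the instance.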


@@ -10,20 +10,20 @@ CCL_NAMESPACE_BEGIN
#ifdef __POINTCLOUD__
ccl_device_forceinline bool point_intersect_test(const float4 point,
const float3 P,
const float3 dir,
const float tmin,
const float tmax,
const float3 ray_P,
const float3 ray_D,
const float ray_tmin,
const float ray_tmax,
ccl_private float *t)
{
const float3 center = float4_to_float3(point);
const float radius = point.w;
const float rd2 = 1.0f / dot(dir, dir);
const float rd2 = 1.0f / dot(ray_D, ray_D);
const float3 c0 = center - P;
const float projC0 = dot(c0, dir) * rd2;
const float3 perp = c0 - projC0 * dir;
const float3 c0 = center - ray_P;
const float projC0 = dot(c0, ray_D) * rd2;
const float3 perp = c0 - projC0 * ray_D;
const float l2 = dot(perp, perp);
const float r2 = radius * radius;
if (!(l2 <= r2)) {
@@ -32,12 +32,12 @@ ccl_device_forceinline bool point_intersect_test(const float4 point,
const float td = sqrt((r2 - l2) * rd2);
const float t_front = projC0 - td;
const bool valid_front = (tmin <= t_front) & (t_front <= tmax);
const bool valid_front = (ray_tmin <= t_front) & (t_front <= ray_tmax);
/* Always back-face culling for now. */
# if 0
const float t_back = projC0 + td;
const bool valid_back = (tmin <= t_back) & (t_back <= tmax);
const bool valid_back = (ray_tmin <= t_back) & (t_back <= ray_tmax);
/* check if there is a first hit */
const bool valid_first = valid_front | valid_back;
@@ -58,10 +58,10 @@ ccl_device_forceinline bool point_intersect_test(const float4 point,
ccl_device_forceinline bool point_intersect(KernelGlobals kg,
ccl_private Intersection *isect,
const float3 P,
const float3 dir,
const float tmin,
const float tmax,
const float3 ray_P,
const float3 ray_D,
const float ray_tmin,
const float ray_tmax,
const int object,
const int prim,
const float time,
@@ -70,7 +70,7 @@ ccl_device_forceinline bool point_intersect(KernelGlobals kg,
const float4 point = (type & PRIMITIVE_MOTION) ? motion_point(kg, object, prim, time) :
kernel_data_fetch(points, prim);
if (!point_intersect_test(point, P, dir, tmin, tmax, &isect->t)) {
if (!point_intersect_test(point, ray_P, ray_D, ray_tmin, ray_tmax, &isect->t)) {
return false;
}
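
`point_intersect_test` above divides by `dot(ray_D, ray_D)`, so it does not require a unit-length direction and returns `t` in units of `ray_D`. The sketch below reproduces the front-hit part of that computation in plain C++. `Vec3`, `dot` and `sphere_front_hit` are hypothetical names, and the handling after the range check is an assumption, since that part of the function is not shown in the diff.

```cpp
#include <cmath>

/* Minimal stand-in for float3, for illustration only. */
struct Vec3 {
  float x, y, z;
};

inline float dot(const Vec3 a, const Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* Front-face ray/sphere test for a direction of arbitrary non-zero length.
 * On success, *t is the hit parameter in units of ray_D. */
inline bool sphere_front_hit(const Vec3 center,
                             const float radius,
                             const Vec3 ray_P,
                             const Vec3 ray_D,
                             const float ray_tmin,
                             const float ray_tmax,
                             float *t)
{
  const float rd2 = 1.0f / dot(ray_D, ray_D);
  const Vec3 c0 = {center.x - ray_P.x, center.y - ray_P.y, center.z - ray_P.z};
  /* Parameter of the point on the ray closest to the sphere center. */
  const float projC0 = dot(c0, ray_D) * rd2;
  const Vec3 perp = {c0.x - projC0 * ray_D.x, c0.y - projC0 * ray_D.y, c0.z - projC0 * ray_D.z};
  const float l2 = dot(perp, perp);
  const float r2 = radius * radius;
  if (!(l2 <= r2)) {
    return false; /* The ray passes outside the sphere. */
  }
  /* Half chord length, converted into units of ray_D. */
  const float td = std::sqrt((r2 - l2) * rd2);
  const float t_front = projC0 - td;
  if (!(ray_tmin <= t_front && t_front <= ray_tmax)) {
    return false;
  }
  *t = t_front;
  return true;
}
```

With a unit sphere centered at `(0, 0, 10)` and a ray from the origin along `(0, 0, 2)`, the front hit lies at `z = 9`, which corresponds to `t = 4.5` because the direction has length 2.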


@@ -18,7 +18,7 @@ ccl_device void shader_setup_object_transforms(KernelGlobals kg,
{
if (sd->object_flag & SD_OBJECT_MOTION) {
sd->ob_tfm_motion = object_fetch_transform_motion(kg, sd->object, time);
sd->ob_itfm_motion = transform_quick_inverse(sd->ob_tfm_motion);
sd->ob_itfm_motion = transform_inverse(sd->ob_tfm_motion);
}
}
#endif


@@ -94,11 +94,11 @@ ccl_device_noinline float subd_triangle_attribute_float(KernelGlobals kg,
float2 uv[3];
subd_triangle_patch_uv(kg, sd, uv);
float2 dpdu = uv[0] - uv[2];
float2 dpdv = uv[1] - uv[2];
float2 dpdu = uv[1] - uv[0];
float2 dpdv = uv[2] - uv[0];
/* p is [s, t] */
float2 p = dpdu * sd->u + dpdv * sd->v + uv[2];
float2 p = dpdu * sd->u + dpdv * sd->v + uv[0];
float a, dads, dadt;
a = patch_eval_float(kg, sd, desc.offset, patch, p.x, p.y, 0, &dads, &dadt);
@@ -165,12 +165,12 @@ ccl_device_noinline float subd_triangle_attribute_float(KernelGlobals kg,
#ifdef __RAY_DIFFERENTIALS__
if (dx)
*dx = sd->du.dx * a + sd->dv.dx * b - (sd->du.dx + sd->dv.dx) * c;
*dx = sd->du.dx * b + sd->dv.dx * c - (sd->du.dx + sd->dv.dx) * a;
if (dy)
*dy = sd->du.dy * a + sd->dv.dy * b - (sd->du.dy + sd->dv.dy) * c;
*dy = sd->du.dy * b + sd->dv.dy * c - (sd->du.dy + sd->dv.dy) * a;
#endif
return sd->u * a + sd->v * b + (1.0f - sd->u - sd->v) * c;
return sd->u * b + sd->v * c + (1.0f - sd->u - sd->v) * a;
}
else if (desc.element == ATTR_ELEMENT_CORNER) {
float2 uv[3];
@@ -195,12 +195,12 @@ ccl_device_noinline float subd_triangle_attribute_float(KernelGlobals kg,
#ifdef __RAY_DIFFERENTIALS__
if (dx)
*dx = sd->du.dx * a + sd->dv.dx * b - (sd->du.dx + sd->dv.dx) * c;
*dx = sd->du.dx * b + sd->dv.dx * c - (sd->du.dx + sd->dv.dx) * a;
if (dy)
*dy = sd->du.dy * a + sd->dv.dy * b - (sd->du.dy + sd->dv.dy) * c;
*dy = sd->du.dy * b + sd->dv.dy * c - (sd->du.dy + sd->dv.dy) * a;
#endif
return sd->u * a + sd->v * b + (1.0f - sd->u - sd->v) * c;
return sd->u * b + sd->v * c + (1.0f - sd->u - sd->v) * a;
}
else if (desc.element == ATTR_ELEMENT_OBJECT || desc.element == ATTR_ELEMENT_MESH) {
if (dx)
@@ -233,11 +233,11 @@ ccl_device_noinline float2 subd_triangle_attribute_float2(KernelGlobals kg,
float2 uv[3];
subd_triangle_patch_uv(kg, sd, uv);
float2 dpdu = uv[0] - uv[2];
float2 dpdv = uv[1] - uv[2];
float2 dpdu = uv[1] - uv[0];
float2 dpdv = uv[2] - uv[0];
/* p is [s, t] */
float2 p = dpdu * sd->u + dpdv * sd->v + uv[2];
float2 p = dpdu * sd->u + dpdv * sd->v + uv[0];
float2 a, dads, dadt;
@@ -305,12 +305,12 @@ ccl_device_noinline float2 subd_triangle_attribute_float2(KernelGlobals kg,
#ifdef __RAY_DIFFERENTIALS__
if (dx)
*dx = sd->du.dx * a + sd->dv.dx * b - (sd->du.dx + sd->dv.dx) * c;
*dx = sd->du.dx * b + sd->dv.dx * c - (sd->du.dx + sd->dv.dx) * a;
if (dy)
*dy = sd->du.dy * a + sd->dv.dy * b - (sd->du.dy + sd->dv.dy) * c;
*dy = sd->du.dy * b + sd->dv.dy * c - (sd->du.dy + sd->dv.dy) * a;
#endif
return sd->u * a + sd->v * b + (1.0f - sd->u - sd->v) * c;
return sd->u * b + sd->v * c + (1.0f - sd->u - sd->v) * a;
}
else if (desc.element == ATTR_ELEMENT_CORNER) {
float2 uv[3];
@@ -337,12 +337,12 @@ ccl_device_noinline float2 subd_triangle_attribute_float2(KernelGlobals kg,
#ifdef __RAY_DIFFERENTIALS__
if (dx)
*dx = sd->du.dx * a + sd->dv.dx * b - (sd->du.dx + sd->dv.dx) * c;
*dx = sd->du.dx * b + sd->dv.dx * c - (sd->du.dx + sd->dv.dx) * a;
if (dy)
*dy = sd->du.dy * a + sd->dv.dy * b - (sd->du.dy + sd->dv.dy) * c;
*dy = sd->du.dy * b + sd->dv.dy * c - (sd->du.dy + sd->dv.dy) * a;
#endif
return sd->u * a + sd->v * b + (1.0f - sd->u - sd->v) * c;
return sd->u * b + sd->v * c + (1.0f - sd->u - sd->v) * a;
}
else if (desc.element == ATTR_ELEMENT_OBJECT || desc.element == ATTR_ELEMENT_MESH) {
if (dx)
@@ -375,11 +375,11 @@ ccl_device_noinline float3 subd_triangle_attribute_float3(KernelGlobals kg,
float2 uv[3];
subd_triangle_patch_uv(kg, sd, uv);
float2 dpdu = uv[0] - uv[2];
float2 dpdv = uv[1] - uv[2];
float2 dpdu = uv[1] - uv[0];
float2 dpdv = uv[2] - uv[0];
/* p is [s, t] */
float2 p = dpdu * sd->u + dpdv * sd->v + uv[2];
float2 p = dpdu * sd->u + dpdv * sd->v + uv[0];
float3 a, dads, dadt;
a = patch_eval_float3(kg, sd, desc.offset, patch, p.x, p.y, 0, &dads, &dadt);
@@ -446,12 +446,12 @@ ccl_device_noinline float3 subd_triangle_attribute_float3(KernelGlobals kg,
#ifdef __RAY_DIFFERENTIALS__
if (dx)
*dx = sd->du.dx * a + sd->dv.dx * b - (sd->du.dx + sd->dv.dx) * c;
*dx = sd->du.dx * b + sd->dv.dx * c - (sd->du.dx + sd->dv.dx) * a;
if (dy)
*dy = sd->du.dy * a + sd->dv.dy * b - (sd->du.dy + sd->dv.dy) * c;
*dy = sd->du.dy * b + sd->dv.dy * c - (sd->du.dy + sd->dv.dy) * a;
#endif
return sd->u * a + sd->v * b + (1.0f - sd->u - sd->v) * c;
return sd->u * b + sd->v * c + (1.0f - sd->u - sd->v) * a;
}
else if (desc.element == ATTR_ELEMENT_CORNER) {
float2 uv[3];
@@ -478,12 +478,12 @@ ccl_device_noinline float3 subd_triangle_attribute_float3(KernelGlobals kg,
#ifdef __RAY_DIFFERENTIALS__
if (dx)
*dx = sd->du.dx * a + sd->dv.dx * b - (sd->du.dx + sd->dv.dx) * c;
*dx = sd->du.dx * b + sd->dv.dx * c - (sd->du.dx + sd->dv.dx) * a;
if (dy)
*dy = sd->du.dy * a + sd->dv.dy * b - (sd->du.dy + sd->dv.dy) * c;
*dy = sd->du.dy * b + sd->dv.dy * c - (sd->du.dy + sd->dv.dy) * a;
#endif
return sd->u * a + sd->v * b + (1.0f - sd->u - sd->v) * c;
return sd->u * b + sd->v * c + (1.0f - sd->u - sd->v) * a;
}
else if (desc.element == ATTR_ELEMENT_OBJECT || desc.element == ATTR_ELEMENT_MESH) {
if (dx)
@@ -516,11 +516,11 @@ ccl_device_noinline float4 subd_triangle_attribute_float4(KernelGlobals kg,
float2 uv[3];
subd_triangle_patch_uv(kg, sd, uv);
float2 dpdu = uv[0] - uv[2];
float2 dpdv = uv[1] - uv[2];
float2 dpdu = uv[1] - uv[0];
float2 dpdv = uv[2] - uv[0];
/* p is [s, t] */
float2 p = dpdu * sd->u + dpdv * sd->v + uv[2];
float2 p = dpdu * sd->u + dpdv * sd->v + uv[0];
float4 a, dads, dadt;
if (desc.type == NODE_ATTR_RGBA) {
@@ -592,12 +592,12 @@ ccl_device_noinline float4 subd_triangle_attribute_float4(KernelGlobals kg,
#ifdef __RAY_DIFFERENTIALS__
if (dx)
*dx = sd->du.dx * a + sd->dv.dx * b - (sd->du.dx + sd->dv.dx) * c;
*dx = sd->du.dx * b + sd->dv.dx * c - (sd->du.dx + sd->dv.dx) * a;
if (dy)
*dy = sd->du.dy * a + sd->dv.dy * b - (sd->du.dy + sd->dv.dy) * c;
*dy = sd->du.dy * b + sd->dv.dy * c - (sd->du.dy + sd->dv.dy) * a;
#endif
return sd->u * a + sd->v * b + (1.0f - sd->u - sd->v) * c;
return sd->u * b + sd->v * c + (1.0f - sd->u - sd->v) * a;
}
else if (desc.element == ATTR_ELEMENT_CORNER || desc.element == ATTR_ELEMENT_CORNER_BYTE) {
float2 uv[3];
@@ -636,12 +636,12 @@ ccl_device_noinline float4 subd_triangle_attribute_float4(KernelGlobals kg,
#ifdef __RAY_DIFFERENTIALS__
if (dx)
*dx = sd->du.dx * a + sd->dv.dx * b - (sd->du.dx + sd->dv.dx) * c;
*dx = sd->du.dx * b + sd->dv.dx * c - (sd->du.dx + sd->dv.dx) * a;
if (dy)
*dy = sd->du.dy * a + sd->dv.dy * b - (sd->du.dy + sd->dv.dy) * c;
*dy = sd->du.dy * b + sd->dv.dy * c - (sd->du.dy + sd->dv.dy) * a;
#endif
return sd->u * a + sd->v * b + (1.0f - sd->u - sd->v) * c;
return sd->u * b + sd->v * c + (1.0f - sd->u - sd->v) * a;
}
else if (desc.element == ATTR_ELEMENT_OBJECT || desc.element == ATTR_ELEMENT_MESH) {
if (dx)

View File

@@ -45,8 +45,8 @@ ccl_device_inline void triangle_point_normal(KernelGlobals kg,
float3 v1 = kernel_data_fetch(tri_verts, tri_vindex.w + 1);
float3 v2 = kernel_data_fetch(tri_verts, tri_vindex.w + 2);
/* compute point */
float t = 1.0f - u - v;
*P = (u * v0 + v * v1 + t * v2);
float w = 1.0f - u - v;
*P = (w * v0 + u * v1 + v * v2);
/* get object flags */
int object_flag = kernel_data_fetch(object_flag, object);
/* compute normal */
@@ -97,7 +97,7 @@ triangle_smooth_normal(KernelGlobals kg, float3 Ng, int prim, float u, float v)
float3 n1 = kernel_data_fetch(tri_vnormal, tri_vindex.y);
float3 n2 = kernel_data_fetch(tri_vnormal, tri_vindex.z);
float3 N = safe_normalize((1.0f - u - v) * n2 + u * n0 + v * n1);
float3 N = safe_normalize((1.0f - u - v) * n0 + u * n1 + v * n2);
return is_zero(N) ? Ng : N;
}
@@ -118,7 +118,7 @@ ccl_device_inline float3 triangle_smooth_normal_unnormalized(
object_inverse_normal_transform(kg, sd, &n2);
}
float3 N = (1.0f - u - v) * n2 + u * n0 + v * n1;
float3 N = (1.0f - u - v) * n0 + u * n1 + v * n2;
return is_zero(N) ? Ng : N;
}
@@ -137,8 +137,8 @@ ccl_device_inline void triangle_dPdudv(KernelGlobals kg,
const float3 p2 = kernel_data_fetch(tri_verts, tri_vindex.w + 2);
/* compute derivatives of P w.r.t. uv */
*dPdu = (p0 - p2);
*dPdv = (p1 - p2);
*dPdu = (p1 - p0);
*dPdv = (p2 - p0);
}
/* Reading attributes on various triangle elements */
@@ -167,12 +167,12 @@ ccl_device float triangle_attribute_float(KernelGlobals kg,
#ifdef __RAY_DIFFERENTIALS__
if (dx)
*dx = sd->du.dx * f0 + sd->dv.dx * f1 - (sd->du.dx + sd->dv.dx) * f2;
*dx = sd->du.dx * f1 + sd->dv.dx * f2 - (sd->du.dx + sd->dv.dx) * f0;
if (dy)
*dy = sd->du.dy * f0 + sd->dv.dy * f1 - (sd->du.dy + sd->dv.dy) * f2;
*dy = sd->du.dy * f1 + sd->dv.dy * f2 - (sd->du.dy + sd->dv.dy) * f0;
#endif
return sd->u * f0 + sd->v * f1 + (1.0f - sd->u - sd->v) * f2;
return sd->u * f1 + sd->v * f2 + (1.0f - sd->u - sd->v) * f0;
}
else {
#ifdef __RAY_DIFFERENTIALS__
@@ -217,12 +217,12 @@ ccl_device float2 triangle_attribute_float2(KernelGlobals kg,
#ifdef __RAY_DIFFERENTIALS__
if (dx)
*dx = sd->du.dx * f0 + sd->dv.dx * f1 - (sd->du.dx + sd->dv.dx) * f2;
*dx = sd->du.dx * f1 + sd->dv.dx * f2 - (sd->du.dx + sd->dv.dx) * f0;
if (dy)
*dy = sd->du.dy * f0 + sd->dv.dy * f1 - (sd->du.dy + sd->dv.dy) * f2;
*dy = sd->du.dy * f1 + sd->dv.dy * f2 - (sd->du.dy + sd->dv.dy) * f0;
#endif
return sd->u * f0 + sd->v * f1 + (1.0f - sd->u - sd->v) * f2;
return sd->u * f1 + sd->v * f2 + (1.0f - sd->u - sd->v) * f0;
}
else {
#ifdef __RAY_DIFFERENTIALS__
@@ -267,12 +267,12 @@ ccl_device float3 triangle_attribute_float3(KernelGlobals kg,
#ifdef __RAY_DIFFERENTIALS__
if (dx)
*dx = sd->du.dx * f0 + sd->dv.dx * f1 - (sd->du.dx + sd->dv.dx) * f2;
*dx = sd->du.dx * f1 + sd->dv.dx * f2 - (sd->du.dx + sd->dv.dx) * f0;
if (dy)
*dy = sd->du.dy * f0 + sd->dv.dy * f1 - (sd->du.dy + sd->dv.dy) * f2;
*dy = sd->du.dy * f1 + sd->dv.dy * f2 - (sd->du.dy + sd->dv.dy) * f0;
#endif
return sd->u * f0 + sd->v * f1 + (1.0f - sd->u - sd->v) * f2;
return sd->u * f1 + sd->v * f2 + (1.0f - sd->u - sd->v) * f0;
}
else {
#ifdef __RAY_DIFFERENTIALS__
@@ -328,12 +328,12 @@ ccl_device float4 triangle_attribute_float4(KernelGlobals kg,
#ifdef __RAY_DIFFERENTIALS__
if (dx)
*dx = sd->du.dx * f0 + sd->dv.dx * f1 - (sd->du.dx + sd->dv.dx) * f2;
*dx = sd->du.dx * f1 + sd->dv.dx * f2 - (sd->du.dx + sd->dv.dx) * f0;
if (dy)
*dy = sd->du.dy * f0 + sd->dv.dy * f1 - (sd->du.dy + sd->dv.dy) * f2;
*dy = sd->du.dy * f1 + sd->dv.dy * f2 - (sd->du.dy + sd->dv.dy) * f0;
#endif
return sd->u * f0 + sd->v * f1 + (1.0f - sd->u - sd->v) * f2;
return sd->u * f1 + sd->v * f2 + (1.0f - sd->u - sd->v) * f0;
}
else {
#ifdef __RAY_DIFFERENTIALS__

View File

@@ -145,9 +145,9 @@ ccl_device_inline float3 triangle_point_from_uv(KernelGlobals kg,
const packed_float3 tri_a = kernel_data_fetch(tri_verts, tri_vindex + 0),
tri_b = kernel_data_fetch(tri_verts, tri_vindex + 1),
tri_c = kernel_data_fetch(tri_verts, tri_vindex + 2);
float w = 1.0f - u - v;
float3 P = u * tri_a + v * tri_b + w * tri_c;
/* This appears to give slightly better precision than interpolating with w = (1 - u - v). */
float3 P = tri_a + u * (tri_b - tri_a) + v * (tri_c - tri_a);
if (!(sd->object_flag & SD_OBJECT_TRANSFORM_APPLIED)) {
const Transform tfm = object_get_transform(kg, sd);

View File

@@ -155,6 +155,11 @@ ccl_device bool integrator_init_from_bake(KernelGlobals kg,
1.0f - u);
}
/* Convert from Blender to Cycles/Embree/OptiX barycentric convention. */
const float tmp = u;
u = v;
v = 1.0f - tmp - v;
/* Position and normal on triangle. */
const int object = kernel_data.bake.object_index;
float3 P, Ng;

View File

@@ -51,7 +51,7 @@ ccl_device_forceinline int integrate_shadow_max_transparent_hits(KernelGlobals k
}
#ifdef __TRANSPARENT_SHADOWS__
# if defined(__KERNEL_CPU__)
# ifndef __KERNEL_GPU__
ccl_device int shadow_intersections_compare(const void *a, const void *b)
{
const Intersection *isect_a = (const Intersection *)a;

View File

@@ -38,8 +38,7 @@ ccl_device void integrator_volume_stack_update_for_subsurface(KernelGlobals kg,
#ifdef __VOLUME_RECORD_ALL__
Intersection hits[2 * MAX_VOLUME_STACK_SIZE + 1];
uint num_hits = scene_intersect_volume_all(
kg, &volume_ray, hits, 2 * volume_stack_size, visibility);
uint num_hits = scene_intersect_volume(kg, &volume_ray, hits, 2 * volume_stack_size, visibility);
if (num_hits > 0) {
Intersection *isect = hits;
@@ -108,8 +107,7 @@ ccl_device void integrator_volume_stack_init(KernelGlobals kg, IntegratorState s
#ifdef __VOLUME_RECORD_ALL__
Intersection hits[2 * MAX_VOLUME_STACK_SIZE + 1];
uint num_hits = scene_intersect_volume_all(
kg, &volume_ray, hits, 2 * volume_stack_size, visibility);
uint num_hits = scene_intersect_volume(kg, &volume_ray, hits, 2 * volume_stack_size, visibility);
if (num_hits > 0) {
int enclosed_volumes[MAX_VOLUME_STACK_SIZE];
Intersection *isect = hits;

View File

@@ -186,7 +186,7 @@ ccl_device_forceinline void mnee_setup_manifold_vertex(KernelGlobals kg,
triangle_vertices_and_normals(kg, sd_vtx->prim, verts, normals);
/* Compute refined position (same code as in triangle_point_from_uv). */
sd_vtx->P = isect->u * verts[0] + isect->v * verts[1] + (1.f - isect->u - isect->v) * verts[2];
sd_vtx->P = (1.f - isect->u - isect->v) * verts[0] + isect->u * verts[1] + isect->v * verts[2];
if (!(sd_vtx->object_flag & SD_OBJECT_TRANSFORM_APPLIED)) {
const Transform tfm = object_get_transform(kg, sd_vtx);
sd_vtx->P = transform_point(&tfm, sd_vtx->P);
@@ -213,8 +213,8 @@ ccl_device_forceinline void mnee_setup_manifold_vertex(KernelGlobals kg,
}
/* Tangent space (position derivatives) WRT barycentric (u, v). */
float3 dp_du = verts[0] - verts[2];
float3 dp_dv = verts[1] - verts[2];
float3 dp_du = verts[1] - verts[0];
float3 dp_dv = verts[2] - verts[0];
/* Geometric normal. */
vtx->ng = normalize(cross(dp_du, dp_dv));
@@ -223,16 +223,16 @@ ccl_device_forceinline void mnee_setup_manifold_vertex(KernelGlobals kg,
/* Shading normals: Interpolate normals between vertices. */
float n_len;
vtx->n = normalize_len(normals[0] * sd_vtx->u + normals[1] * sd_vtx->v +
normals[2] * (1.0f - sd_vtx->u - sd_vtx->v),
vtx->n = normalize_len(normals[0] * (1.0f - sd_vtx->u - sd_vtx->v) + normals[1] * sd_vtx->u +
normals[2] * sd_vtx->v,
&n_len);
/* Shading normal derivatives WRT barycentric (u, v)
* we calculate the derivative of n = |(1-u-v)*n0 + u*n1 + v*n2| using:
* d/du [f(u)/|f(u)|] = [d/du f(u)]/|f(u)| - f(u)/|f(u)|^3 <f(u), d/du f(u)>. */
const float inv_n_len = 1.f / n_len;
float3 dn_du = inv_n_len * (normals[0] - normals[2]);
float3 dn_dv = inv_n_len * (normals[1] - normals[2]);
float3 dn_du = inv_n_len * (normals[1] - normals[0]);
float3 dn_dv = inv_n_len * (normals[2] - normals[0]);
dn_du -= vtx->n * dot(vtx->n, dn_du);
dn_dv -= vtx->n * dot(vtx->n, dn_dv);

View File

@@ -13,7 +13,7 @@ CCL_NAMESPACE_BEGIN
ccl_device_inline void path_state_init_queues(IntegratorState state)
{
INTEGRATOR_STATE_WRITE(state, path, queued_kernel) = 0;
#ifdef __KERNEL_CPU__
#ifndef __KERNEL_GPU__
INTEGRATOR_STATE_WRITE(&state->shadow, shadow_path, queued_kernel) = 0;
INTEGRATOR_STATE_WRITE(&state->ao, shadow_path, queued_kernel) = 0;
#endif

View File

@@ -140,7 +140,7 @@ typedef struct IntegratorStateGPU {
* happen from a kernel which operates on a "main" path. Attempt to use shadow catcher accessors
* from a kernel which operates on a shadow catcher state will cause bad memory access. */
#ifdef __KERNEL_CPU__
#ifndef __KERNEL_GPU__
/* Scalar access on CPU. */
@@ -159,7 +159,7 @@ typedef const IntegratorShadowStateCPU *ccl_restrict ConstIntegratorShadowState;
# define INTEGRATOR_STATE_ARRAY_WRITE(state, nested_struct, array_index, member) \
((state)->nested_struct[array_index].member)
#else /* __KERNEL_CPU__ */
#else /* !__KERNEL_GPU__ */
/* Array access on GPU with Structure-of-Arrays. */
@@ -180,6 +180,6 @@ typedef int ConstIntegratorShadowState;
# define INTEGRATOR_STATE_ARRAY_WRITE(state, nested_struct, array_index, member) \
INTEGRATOR_STATE_ARRAY(state, nested_struct, array_index, member)
#endif /* __KERNEL_CPU__ */
#endif /* !__KERNEL_GPU__ */
CCL_NAMESPACE_END

View File

@@ -338,7 +338,7 @@ ccl_device_inline IntegratorState integrator_state_shadow_catcher_split(KernelGl
return to_state;
}
#ifdef __KERNEL_CPU__
#ifndef __KERNEL_GPU__
ccl_device_inline int integrator_state_bounce(ConstIntegratorState state, const int)
{
return INTEGRATOR_STATE(state, path, bounce);

View File

@@ -126,17 +126,8 @@ ccl_device_inline bool subsurface_disk(KernelGlobals kg,
if (!(object_flag & SD_OBJECT_TRANSFORM_APPLIED)) {
/* Transform normal to world space. */
Transform itfm;
Transform tfm = object_fetch_transform_motion_test(kg, object, time, &itfm);
object_fetch_transform_motion_test(kg, object, time, &itfm);
hit_Ng = normalize(transform_direction_transposed(&itfm, hit_Ng));
/* Transform t to world space, except for OptiX and MetalRT where it already is. */
#ifdef __KERNEL_GPU_RAYTRACING__
(void)tfm;
#else
float3 D = transform_direction(&itfm, ray.D);
D = normalize(D) * ss_isect.hits[hit].t;
ss_isect.hits[hit].t = len(transform_direction(&tfm, D));
#endif
}
/* Quickly retrieve P and Ng without setting up ShaderData. */

View File

@@ -205,12 +205,6 @@ ccl_device_inline bool subsurface_random_walk(KernelGlobals kg,
ray.self.light_object = OBJECT_NONE;
ray.self.light_prim = PRIM_NONE;
#ifndef __KERNEL_GPU_RAYTRACING__
/* Compute or fetch object transforms. */
Transform ob_itfm ccl_optional_struct_init;
Transform ob_tfm = object_fetch_transform_motion_test(kg, object, time, &ob_itfm);
#endif
/* Convert subsurface to volume coefficients.
* The single-scattering albedo is named alpha to avoid confusion with the surface albedo. */
const float3 albedo = INTEGRATOR_STATE(state, subsurface, albedo);
@@ -383,15 +377,7 @@ ccl_device_inline bool subsurface_random_walk(KernelGlobals kg,
hit = (ss_isect.num_hits > 0);
if (hit) {
#ifdef __KERNEL_GPU_RAYTRACING__
/* t is always in world space with OptiX and MetalRT. */
ray.tmax = ss_isect.hits[0].t;
#else
/* Compute world space distance to surface hit. */
float3 D = transform_direction(&ob_itfm, ray.D);
D = normalize(D) * ss_isect.hits[0].t;
ray.tmax = len(transform_direction(&ob_tfm, D));
#endif
}
if (bounce == 0) {

View File

@@ -137,8 +137,9 @@ ccl_device_inline float3 shadow_ray_smooth_surface_offset(
triangle_vertices_and_normals(kg, sd->prim, V, N);
}
const float u = sd->u, v = sd->v;
const float w = 1 - u - v;
const float u = 1.0f - sd->u - sd->v;
const float v = sd->u;
const float w = sd->v;
float3 P = V[0] * u + V[1] * v + V[2] * w; /* Local space */
float3 n = N[0] * u + N[1] * v + N[2] * w; /* We get away without normalization */

View File

@@ -20,7 +20,7 @@ shader node_geometry(normal NormalIn = N,
Normal = NormalIn;
TrueNormal = Ng;
Incoming = I;
Parametric = point(u, v, 0.0);
Parametric = point(1.0 - u - v, u, 0.0);
Backfacing = backfacing();
if (bump_offset == "dx") {

View File

@@ -34,7 +34,7 @@ ccl_device_noinline void svm_node_geometry(KernelGlobals kg,
data = sd->Ng;
break;
case NODE_GEOM_uv:
data = make_float3(sd->u, sd->v, 0.0f);
data = make_float3(1.0f - sd->u - sd->v, sd->u, 0.0f);
break;
default:
data = make_float3(0.0f, 0.0f, 0.0f);
@@ -57,7 +57,7 @@ ccl_device_noinline void svm_node_geometry_bump_dx(KernelGlobals kg,
data = sd->P + sd->dP.dx;
break;
case NODE_GEOM_uv:
data = make_float3(sd->u + sd->du.dx, sd->v + sd->dv.dx, 0.0f);
data = make_float3(1.0f - sd->u - sd->du.dx - sd->v - sd->dv.dx, sd->u + sd->du.dx, 0.0f);
break;
default:
svm_node_geometry(kg, sd, stack, type, out_offset);
@@ -84,7 +84,7 @@ ccl_device_noinline void svm_node_geometry_bump_dy(KernelGlobals kg,
data = sd->P + sd->dP.dy;
break;
case NODE_GEOM_uv:
data = make_float3(sd->u + sd->du.dy, sd->v + sd->dv.dy, 0.0f);
data = make_float3(1.0f - sd->u - sd->du.dy - sd->v - sd->dv.dy, sd->u + sd->du.dy, 0.0f);
break;
default:
svm_node_geometry(kg, sd, stack, type, out_offset);

View File

@@ -19,10 +19,6 @@
#include "kernel/svm/types.h"
#ifndef __KERNEL_GPU__
# define __KERNEL_CPU__
#endif
CCL_NAMESPACE_BEGIN
/* Constants */
@@ -51,10 +47,10 @@ CCL_NAMESPACE_BEGIN
#define INTEGRATOR_SHADOW_ISECT_SIZE_CPU 1024U
#define INTEGRATOR_SHADOW_ISECT_SIZE_GPU 4U
#ifdef __KERNEL_CPU__
# define INTEGRATOR_SHADOW_ISECT_SIZE INTEGRATOR_SHADOW_ISECT_SIZE_CPU
#else
#ifdef __KERNEL_GPU__
# define INTEGRATOR_SHADOW_ISECT_SIZE INTEGRATOR_SHADOW_ISECT_SIZE_GPU
#else
# define INTEGRATOR_SHADOW_ISECT_SIZE INTEGRATOR_SHADOW_ISECT_SIZE_CPU
#endif
/* Kernel features */
@@ -83,7 +79,6 @@ CCL_NAMESPACE_BEGIN
#define __LAMP_MIS__
#define __CAMERA_MOTION__
#define __OBJECT_MOTION__
#define __BAKING__
#define __PRINCIPLED__
#define __SUBSURFACE__
#define __VOLUME__
@@ -92,16 +87,12 @@ CCL_NAMESPACE_BEGIN
#define __BRANCHED_PATH__
/* Device specific features */
#ifdef __KERNEL_CPU__
#ifndef __KERNEL_GPU__
# ifdef WITH_OSL
# define __OSL__
# endif
# define __VOLUME_RECORD_ALL__
#endif /* __KERNEL_CPU__ */
#ifdef __KERNEL_GPU_RAYTRACING__
# undef __BAKING__
#endif /* __KERNEL_GPU_RAYTRACING__ */
#endif /* !__KERNEL_GPU__ */
/* MNEE currently causes "Compute function exceeds available temporary registers"
* on Metal, disabled for now. */
@@ -129,9 +120,6 @@ CCL_NAMESPACE_BEGIN
# if !(__KERNEL_FEATURES & KERNEL_FEATURE_SUBSURFACE)
# undef __SUBSURFACE__
# endif
# if !(__KERNEL_FEATURES & KERNEL_FEATURE_BAKING)
# undef __BAKING__
# endif
# if !(__KERNEL_FEATURES & KERNEL_FEATURE_PATCH_EVALUATION)
# undef __PATCH_EVAL__
# endif
@@ -730,7 +718,7 @@ typedef struct ccl_align(16) ShaderClosure
{
SHADER_CLOSURE_BASE;
#ifdef __KERNEL_CPU__
#ifndef __KERNEL_GPU__
float pad[2];
#endif
float data[10];
@@ -1168,7 +1156,7 @@ typedef struct KernelData {
uint max_shaders;
uint volume_stack_size;
/* Always dynamic data mambers. */
/* Always dynamic data members. */
KernelCamera cam;
KernelBake bake;
KernelTables tables;
@@ -1548,15 +1536,15 @@ enum KernelFeatureFlag : uint32_t {
/* Must be constexpr on the CPU to avoid compile errors because the state types
* are different depending on the main, shadow or null path. For GPU we don't have
* C++17 everywhere so can't use it. */
#ifdef __KERNEL_CPU__
#ifdef __KERNEL_GPU__
# define IF_KERNEL_FEATURE(feature) if ((node_feature_mask & (KERNEL_FEATURE_##feature)) != 0U)
# define IF_KERNEL_NODES_FEATURE(feature) \
if ((node_feature_mask & (KERNEL_FEATURE_NODE_##feature)) != 0U)
#else
# define IF_KERNEL_FEATURE(feature) \
if constexpr ((node_feature_mask & (KERNEL_FEATURE_##feature)) != 0U)
# define IF_KERNEL_NODES_FEATURE(feature) \
if constexpr ((node_feature_mask & (KERNEL_FEATURE_NODE_##feature)) != 0U)
#else
# define IF_KERNEL_FEATURE(feature) if ((node_feature_mask & (KERNEL_FEATURE_##feature)) != 0U)
# define IF_KERNEL_NODES_FEATURE(feature) \
if ((node_feature_mask & (KERNEL_FEATURE_NODE_##feature)) != 0U)
#endif
CCL_NAMESPACE_END
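
Note on the hunk above: the comment states that the CPU path needs `if constexpr` because the state types differ between paths. A minimal sketch of why that matters, assuming a C++17 compiler. `FEATURE_BOUNCE`, `MainState`, `NullState` and `read_bounce` are hypothetical names for illustration, not Cycles identifiers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bit, standing in for a KERNEL_FEATURE_* flag.
constexpr uint32_t FEATURE_BOUNCE = 1u << 0;

// Hypothetical state types: only one of them has a `bounce` member.
struct MainState {
  int bounce;
};
struct NullState {
};

// With a plain `if`, both branches must compile for every State type, so
// instantiating with NullState would fail on `state.bounce`. With
// `if constexpr`, the branch whose condition is false is discarded and
// never instantiated for that template instantiation.
template<uint32_t feature_mask, typename State> int read_bounce(const State &state)
{
  if constexpr ((feature_mask & FEATURE_BOUNCE) != 0U) {
    return state.bounce;
  }
  else {
    (void)state; /* Unused when the feature is compiled out. */
    return -1;
  }
}
```

Replacing `if constexpr` with `if` in this sketch makes `read_bounce<0u>(NullState{})` fail to compile, which mirrors the compile errors the comment refers to.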

View File

@@ -3,13 +3,13 @@
#pragma once
#ifdef __KERNEL_CPU__
#ifndef __KERNEL_GPU__
# include "util/profiling.h"
#endif
CCL_NAMESPACE_BEGIN
#ifdef __KERNEL_CPU__
#ifndef __KERNEL_GPU__
# define PROFILING_INIT(kg, event) \
ProfilingHelper profiling_helper((ProfilingState *)&kg->profiler, event)
# define PROFILING_EVENT(event) profiling_helper.set_event(event)
@@ -22,6 +22,6 @@ CCL_NAMESPACE_BEGIN
# define PROFILING_EVENT(event)
# define PROFILING_INIT_FOR_SHADER(kg, event)
# define PROFILING_SHADER(object, shader)
#endif /* __KERNEL_CPU__ */
#endif /* !__KERNEL_GPU__ */
CCL_NAMESPACE_END

View File

@@ -73,16 +73,16 @@ static int fill_shader_input(const Scene *scene,
switch (j) {
case 0:
u = 1.0f;
u = 0.0f;
v = 0.0f;
break;
case 1:
u = 0.0f;
v = 1.0f;
u = 1.0f;
v = 0.0f;
break;
default:
u = 0.0f;
v = 0.0f;
v = 1.0f;
break;
}

View File

@@ -209,7 +209,7 @@ const BufferPass *BufferParams::get_actual_display_pass(const BufferPass *pass)
return nullptr;
}
if (pass->type == PASS_COMBINED) {
if (pass->type == PASS_COMBINED && pass->lightgroup.empty()) {
const BufferPass *shadow_catcher_matte_pass = find_pass(PASS_SHADOW_CATCHER_MATTE, pass->mode);
if (shadow_catcher_matte_pass) {
pass = shadow_catcher_matte_pass;

View File

@@ -2,7 +2,6 @@
* Copyright 2011-2022 Blender Foundation */
#define __KERNEL_AVX2__
#define __KERNEL_CPU__
#define TEST_CATEGORY_NAME util_avx2

View File

@@ -2,7 +2,6 @@
* Copyright 2011-2022 Blender Foundation */
#define __KERNEL_AVX__
#define __KERNEL_CPU__
#define TEST_CATEGORY_NAME util_avx

View File

@@ -63,6 +63,7 @@ set(SRC_HEADERS
math_float2.h
math_float3.h
math_float4.h
math_float8.h
math_int2.h
math_int3.h
math_int4.h
@@ -128,8 +129,6 @@ set(SRC_HEADERS
types_uint4.h
types_uint4_impl.h
types_ushort4.h
types_vector3.h
types_vector3_impl.h
unique_ptr.h
vector.h
version.h

View File

@@ -81,7 +81,7 @@
/* macros */
/* hints for branch prediction, only use in code that runs a _lot_ */
#if defined(__GNUC__) && defined(__KERNEL_CPU__)
#if defined(__GNUC__) && !defined(__KERNEL_GPU__)
# define LIKELY(x) __builtin_expect(!!(x), 1)
# define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else

View File

@@ -511,6 +511,11 @@ ccl_device_inline float4 float3_to_float4(const float3 a)
return make_float4(a.x, a.y, a.z, 1.0f);
}
ccl_device_inline float4 float3_to_float4(const float3 a, const float w)
{
return make_float4(a.x, a.y, a.z, w);
}
ccl_device_inline float inverse_lerp(float a, float b, float x)
{
return (x - a) / (b - a);
@@ -535,6 +540,7 @@ CCL_NAMESPACE_END
#include "util/math_float2.h"
#include "util/math_float3.h"
#include "util/math_float4.h"
#include "util/math_float8.h"
#include "util/rect.h"
@@ -947,7 +953,11 @@ ccl_device_inline uint prev_power_of_two(uint x)
ccl_device_inline uint32_t reverse_integer_bits(uint32_t x)
{
/* Use a native instruction if it exists. */
#if defined(__aarch64__) || defined(_M_ARM64)
#if defined(__KERNEL_CUDA__)
return __brev(x);
#elif defined(__KERNEL_METAL__)
return reverse_bits(x);
#elif defined(__aarch64__) || defined(_M_ARM64)
/* Assume the rbit is always available on 64bit ARM architecture. */
__asm__("rbit %w0, %w1" : "=r"(x) : "r"(x));
return x;
@@ -956,10 +966,6 @@ ccl_device_inline uint32_t reverse_integer_bits(uint32_t x)
* This 32-bit Thumb instruction is available in ARMv6T2 and above. */
__asm__("rbit %0, %1" : "=r"(x) : "r"(x));
return x;
#elif defined(__KERNEL_CUDA__)
return __brev(x);
#elif defined(__KERNEL_METAL__)
return reverse_bits(x);
#elif __has_builtin(__builtin_bitreverse32)
return __builtin_bitreverse32(x);
#else
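
Note on the hunk above: it is cut off before the body of the generic `#else` branch. For reference, one common portable fallback is the swap-based reversal below. This is a sketch under that assumption and not necessarily what the file contains; `reverse_bits_portable` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Reverses the bit order of a 32-bit word by swapping progressively
// larger groups: single bits, pairs, nibbles, bytes, then half-words.
static inline uint32_t reverse_bits_portable(uint32_t x)
{
  x = ((x & 0xaaaaaaaau) >> 1) | ((x & 0x55555555u) << 1);
  x = ((x & 0xccccccccu) >> 2) | ((x & 0x33333333u) << 2);
  x = ((x & 0xf0f0f0f0u) >> 4) | ((x & 0x0f0f0f0fu) << 4);
  x = ((x & 0xff00ff00u) >> 8) | ((x & 0x00ff00ffu) << 8);
  return (x >> 16) | (x << 16);
}
```

Whichever branch is selected, the native instructions (`__brev`, `reverse_bits`, `rbit`) are expected to agree with this reference on every input, so it can serve as a cross-check in a unit test.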

View File

@@ -420,7 +420,7 @@ ccl_device_inline float fast_expf(float x)
return fast_exp2f(x / M_LN2_F);
}
#if defined(__KERNEL_CPU__) && !defined(_MSC_VER)
#if !defined(__KERNEL_GPU__) && !defined(_MSC_VER)
/* MSVC seems to have a code-gen bug here in at least SSE41/AVX, see
* T78047 and T78869 for details. Just disable for now, it only makes
* a small difference in denoising performance. */

View File

@@ -147,8 +147,11 @@ ccl_device_inline float3 operator/(const float f, const float3 &a)
ccl_device_inline float3 operator/(const float3 &a, const float f)
{
float invf = 1.0f / f;
return a * invf;
# if defined(__KERNEL_SSE__)
return float3(_mm_div_ps(a.m128, _mm_set1_ps(f)));
# else
return make_float3(a.x / f, a.y / f, a.z / f);
# endif
}
ccl_device_inline float3 operator/(const float3 &a, const float3 &b)
@@ -284,8 +287,12 @@ ccl_device_inline float dot_xy(const float3 &a, const float3 &b)
ccl_device_inline float3 cross(const float3 &a, const float3 &b)
{
float3 r = make_float3(a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x);
return r;
# ifdef __KERNEL_SSE__
return float3(shuffle<1, 2, 0, 3>(
msub(ssef(a), shuffle<1, 2, 0, 3>(ssef(b)), shuffle<1, 2, 0, 3>(ssef(a)) * ssef(b))));
# else
return make_float3(a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x);
# endif
}
ccl_device_inline float3 normalize(const float3 &a)
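
Note on the `cross()` hunk above: the SSE branch computes the cross product with three shuffles instead of six scalar products. The sketch below emulates the lanes with a plain array to show that the shuffled form matches the scalar formula. It assumes `shuffle<1, 2, 0, 3>` reorders lanes to (y, z, x, w) and that `msub(a, b, c)` evaluates `a * b - c`. `Lanes`, `shuffle_yzxw`, `cross_shuffled` and `cross_scalar` are hypothetical names for illustration.

```cpp
#include <array>
#include <cassert>

// Four float lanes, standing in for an SSE register.
using Lanes = std::array<float, 4>;

// Emulates shuffle<1, 2, 0, 3>: output lanes are (y, z, x, w) of the input.
static Lanes shuffle_yzxw(const Lanes &v)
{
  return {v[1], v[2], v[0], v[3]};
}

// Shuffle-based cross product, mirroring the structure of the SSE branch.
static Lanes cross_shuffled(const Lanes &a, const Lanes &b)
{
  const Lanes a_yzx = shuffle_yzxw(a);
  const Lanes b_yzx = shuffle_yzxw(b);
  Lanes d;
  for (int i = 0; i < 4; i++) {
    /* Assumed msub semantics: a * b_yzx - (a_yzx * b). */
    d[i] = a[i] * b_yzx[i] - a_yzx[i] * b[i];
  }
  /* d holds (z, x, y) of the result; one more shuffle restores (x, y, z). */
  return shuffle_yzxw(d);
}

// Textbook scalar formula, as in the non-SSE branch.
static Lanes cross_scalar(const Lanes &a, const Lanes &b)
{
  return {a[1] * b[2] - a[2] * b[1],
          a[2] * b[0] - a[0] * b[2],
          a[0] * b[1] - a[1] * b[0],
          0.0f};
}
```

The fourth lane cancels to zero in the shuffled form when the inputs are finite, so the two functions agree on all four lanes for ordinary inputs.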

View File

@@ -0,0 +1,419 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2022 Blender Foundation */
#ifndef __UTIL_MATH_FLOAT8_H__
#define __UTIL_MATH_FLOAT8_H__
#ifndef __UTIL_MATH_H__
# error "Do not include this file directly, include util/math.h instead."
#endif
CCL_NAMESPACE_BEGIN
/*******************************************************************************
* Declaration.
*/
ccl_device_inline float8_t operator+(const float8_t a, const float8_t b);
ccl_device_inline float8_t operator+(const float8_t a, const float f);
ccl_device_inline float8_t operator+(const float f, const float8_t a);
ccl_device_inline float8_t operator-(const float8_t a);
ccl_device_inline float8_t operator-(const float8_t a, const float8_t b);
ccl_device_inline float8_t operator-(const float8_t a, const float f);
ccl_device_inline float8_t operator-(const float f, const float8_t a);
ccl_device_inline float8_t operator*(const float8_t a, const float8_t b);
ccl_device_inline float8_t operator*(const float8_t a, const float f);
ccl_device_inline float8_t operator*(const float f, const float8_t a);
ccl_device_inline float8_t operator/(const float8_t a, const float8_t b);
ccl_device_inline float8_t operator/(const float8_t a, float f);
ccl_device_inline float8_t operator/(const float f, const float8_t a);
ccl_device_inline float8_t operator+=(float8_t a, const float8_t b);
ccl_device_inline float8_t operator*=(float8_t a, const float8_t b);
ccl_device_inline float8_t operator*=(float8_t a, float f);
ccl_device_inline float8_t operator/=(float8_t a, float f);
ccl_device_inline bool operator==(const float8_t a, const float8_t b);
ccl_device_inline float8_t rcp(const float8_t a);
ccl_device_inline float8_t sqrt(const float8_t a);
ccl_device_inline float8_t sqr(const float8_t a);
ccl_device_inline bool is_zero(const float8_t a);
ccl_device_inline float average(const float8_t a);
ccl_device_inline float8_t min(const float8_t a, const float8_t b);
ccl_device_inline float8_t max(const float8_t a, const float8_t b);
ccl_device_inline float8_t clamp(const float8_t a, const float8_t mn, const float8_t mx);
ccl_device_inline float8_t fabs(const float8_t a);
ccl_device_inline float8_t mix(const float8_t a, const float8_t b, float t);
ccl_device_inline float8_t saturate(const float8_t a);
ccl_device_inline float8_t safe_divide(const float8_t a, const float b);
ccl_device_inline float8_t safe_divide(const float8_t a, const float8_t b);
ccl_device_inline float reduce_min(const float8_t a);
ccl_device_inline float reduce_max(const float8_t a);
ccl_device_inline float reduce_add(const float8_t a);
ccl_device_inline bool isequal(const float8_t a, const float8_t b);
/*******************************************************************************
* Definition.
*/
ccl_device_inline float8_t zero_float8_t()
{
#ifdef __KERNEL_AVX2__
return float8_t(_mm256_setzero_ps());
#else
return make_float8_t(0.0f);
#endif
}
ccl_device_inline float8_t one_float8_t()
{
return make_float8_t(1.0f);
}
ccl_device_inline float8_t operator+(const float8_t a, const float8_t b)
{
#ifdef __KERNEL_AVX2__
return float8_t(_mm256_add_ps(a.m256, b.m256));
#else
return make_float8_t(
a.a + b.a, a.b + b.b, a.c + b.c, a.d + b.d, a.e + b.e, a.f + b.f, a.g + b.g, a.h + b.h);
#endif
}
ccl_device_inline float8_t operator+(const float8_t a, const float f)
{
return a + make_float8_t(f);
}
ccl_device_inline float8_t operator+(const float f, const float8_t a)
{
return make_float8_t(f) + a;
}
ccl_device_inline float8_t operator-(const float8_t a)
{
#ifdef __KERNEL_AVX2__
__m256 mask = _mm256_castsi256_ps(_mm256_set1_epi32(0x80000000));
return float8_t(_mm256_xor_ps(a.m256, mask));
#else
return make_float8_t(-a.a, -a.b, -a.c, -a.d, -a.e, -a.f, -a.g, -a.h);
#endif
}
ccl_device_inline float8_t operator-(const float8_t a, const float8_t b)
{
#ifdef __KERNEL_AVX2__
return float8_t(_mm256_sub_ps(a.m256, b.m256));
#else
return make_float8_t(
a.a - b.a, a.b - b.b, a.c - b.c, a.d - b.d, a.e - b.e, a.f - b.f, a.g - b.g, a.h - b.h);
#endif
}
ccl_device_inline float8_t operator-(const float8_t a, const float f)
{
return a - make_float8_t(f);
}
ccl_device_inline float8_t operator-(const float f, const float8_t a)
{
return make_float8_t(f) - a;
}
ccl_device_inline float8_t operator*(const float8_t a, const float8_t b)
{
#ifdef __KERNEL_AVX2__
return float8_t(_mm256_mul_ps(a.m256, b.m256));
#else
return make_float8_t(
a.a * b.a, a.b * b.b, a.c * b.c, a.d * b.d, a.e * b.e, a.f * b.f, a.g * b.g, a.h * b.h);
#endif
}
ccl_device_inline float8_t operator*(const float8_t a, const float f)
{
return a * make_float8_t(f);
}
ccl_device_inline float8_t operator*(const float f, const float8_t a)
{
return make_float8_t(f) * a;
}
ccl_device_inline float8_t operator/(const float8_t a, const float8_t b)
{
#ifdef __KERNEL_AVX2__
return float8_t(_mm256_div_ps(a.m256, b.m256));
#else
return make_float8_t(
a.a / b.a, a.b / b.b, a.c / b.c, a.d / b.d, a.e / b.e, a.f / b.f, a.g / b.g, a.h / b.h);
#endif
}
ccl_device_inline float8_t operator/(const float8_t a, const float f)
{
return a / make_float8_t(f);
}
ccl_device_inline float8_t operator/(const float f, const float8_t a)
{
return make_float8_t(f) / a;
}
ccl_device_inline float8_t operator+=(float8_t a, const float8_t b)
{
return a = a + b;
}
ccl_device_inline float8_t operator-=(float8_t a, const float8_t b)
{
return a = a - b;
}
ccl_device_inline float8_t operator*=(float8_t a, const float8_t b)
{
return a = a * b;
}
ccl_device_inline float8_t operator*=(float8_t a, float f)
{
return a = a * f;
}
ccl_device_inline float8_t operator/=(float8_t a, float f)
{
return a = a / f;
}
ccl_device_inline bool operator==(const float8_t a, const float8_t b)
{
#ifdef __KERNEL_AVX2__
return (_mm256_movemask_ps(_mm256_castsi256_ps(
_mm256_cmpeq_epi32(_mm256_castps_si256(a.m256), _mm256_castps_si256(b.m256)))) &
0b11111111) == 0b11111111;
#else
return (a.a == b.a && a.b == b.b && a.c == b.c && a.d == b.d && a.e == b.e && a.f == b.f &&
a.g == b.g && a.h == b.h);
#endif
}
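Review note: the AVX2 branch of `operator==` compares raw bit patterns via `_mm256_cmpeq_epi32`, while the scalar branch uses floating-point `==`. The two can disagree for `-0.0f` and NaN lanes. Below is a minimal scalar sketch of that difference for a single lane. The helper names `bitwise_equal` and `value_equal` are hypothetical and not part of the patch.

```cpp
#include <cassert>
#include <cstring>

// Models one lane of the integer compare: equal only if all 32 bits match.
static inline bool bitwise_equal(float a, float b)
{
  return std::memcmp(&a, &b, sizeof(float)) == 0;
}

// Models one lane of the scalar fallback: IEEE 754 value comparison.
static inline bool value_equal(float a, float b)
{
  return a == b;
}
```

If this reading is right, `is_zero()` (which is built on `operator==`) would report a vector with a `-0.0f` lane as non-zero on the AVX2 path and as zero on the scalar path. It may be worth confirming that callers do not depend on either behavior.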
ccl_device_inline float8_t rcp(const float8_t a)
{
#ifdef __KERNEL_AVX2__
return float8_t(_mm256_rcp_ps(a.m256));
#else
return make_float8_t(1.0f / a.a,
1.0f / a.b,
1.0f / a.c,
1.0f / a.d,
1.0f / a.e,
1.0f / a.f,
1.0f / a.g,
1.0f / a.h);
#endif
}
ccl_device_inline float8_t sqrt(const float8_t a)
{
#ifdef __KERNEL_AVX2__
return float8_t(_mm256_sqrt_ps(a.m256));
#else
return make_float8_t(sqrtf(a.a),
sqrtf(a.b),
sqrtf(a.c),
sqrtf(a.d),
sqrtf(a.e),
sqrtf(a.f),
sqrtf(a.g),
sqrtf(a.h));
#endif
}
ccl_device_inline float8_t sqr(const float8_t a)
{
return a * a;
}
ccl_device_inline bool is_zero(const float8_t a)
{
return a == make_float8_t(0.0f);
}
ccl_device_inline float average(const float8_t a)
{
return reduce_add(a) / 8.0f;
}
ccl_device_inline float8_t min(const float8_t a, const float8_t b)
{
#ifdef __KERNEL_AVX2__
return float8_t(_mm256_min_ps(a.m256, b.m256));
#else
return make_float8_t(min(a.a, b.a),
min(a.b, b.b),
min(a.c, b.c),
min(a.d, b.d),
min(a.e, b.e),
min(a.f, b.f),
min(a.g, b.g),
min(a.h, b.h));
#endif
}
ccl_device_inline float8_t max(const float8_t a, const float8_t b)
{
#ifdef __KERNEL_AVX2__
return float8_t(_mm256_max_ps(a.m256, b.m256));
#else
return make_float8_t(max(a.a, b.a),
max(a.b, b.b),
max(a.c, b.c),
max(a.d, b.d),
max(a.e, b.e),
max(a.f, b.f),
max(a.g, b.g),
max(a.h, b.h));
#endif
}
ccl_device_inline float8_t clamp(const float8_t a, const float8_t mn, const float8_t mx)
{
return min(max(a, mn), mx);
}
ccl_device_inline float8_t fabs(const float8_t a)
{
#ifdef __KERNEL_AVX2__
return float8_t(_mm256_and_ps(a.m256, _mm256_castsi256_ps(_mm256_set1_epi32(0x7fffffff))));
#else
return make_float8_t(fabsf(a.a),
fabsf(a.b),
fabsf(a.c),
fabsf(a.d),
fabsf(a.e),
fabsf(a.f),
fabsf(a.g),
fabsf(a.h));
#endif
}
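Review note: the unary `operator-` and `fabs` above both act on the IEEE 754 sign bit only: XOR with `0x80000000` flips it, AND with `0x7fffffff` clears it. A scalar sketch of the same idea for a single lane, assuming a 32-bit `float`; the helper names are hypothetical and not part of the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

// Copy the bits of a float into an integer (and back) without aliasing issues.
static inline uint32_t float_to_bits(float f)
{
  uint32_t u;
  std::memcpy(&u, &f, sizeof(u));
  return u;
}

static inline float bits_to_float(uint32_t u)
{
  float f;
  std::memcpy(&f, &u, sizeof(f));
  return f;
}

// Flip the sign bit: scalar analogue of XOR with 0x80000000.
static inline float negate_by_mask(float f)
{
  return bits_to_float(float_to_bits(f) ^ 0x80000000u);
}

// Clear the sign bit: scalar analogue of AND with 0x7fffffff.
static inline float fabs_by_mask(float f)
{
  return bits_to_float(float_to_bits(f) & 0x7fffffffu);
}
```

For example, `negate_by_mask(1.5f)` yields `-1.5f` and `fabs_by_mask(-2.0f)` yields `2.0f`; negating `0.0f` produces `-0.0f`, whose bit pattern is exactly the mask.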
ccl_device_inline float8_t mix(const float8_t a, const float8_t b, float t)
{
return a + t * (b - a);
}
ccl_device_inline float8_t saturate(const float8_t a)
{
return clamp(a, make_float8_t(0.0f), make_float8_t(1.0f));
}
ccl_device_inline float8_t exp(float8_t v)
{
return make_float8_t(
expf(v.a), expf(v.b), expf(v.c), expf(v.d), expf(v.e), expf(v.f), expf(v.g), expf(v.h));
}
ccl_device_inline float8_t log(float8_t v)
{
return make_float8_t(
logf(v.a), logf(v.b), logf(v.c), logf(v.d), logf(v.e), logf(v.f), logf(v.g), logf(v.h));
}
ccl_device_inline float dot(const float8_t a, const float8_t b)
{
#ifdef __KERNEL_AVX2__
float8_t t(_mm256_dp_ps(a.m256, b.m256, 0xFF));
return t[0] + t[4];
#else
return (a.a * b.a) + (a.b * b.b) + (a.c * b.c) + (a.d * b.d) + (a.e * b.e) + (a.f * b.f) +
(a.g * b.g) + (a.h * b.h);
#endif
}
ccl_device_inline float8_t pow(float8_t v, float e)
{
return make_float8_t(powf(v.a, e),
powf(v.b, e),
powf(v.c, e),
powf(v.d, e),
powf(v.e, e),
powf(v.f, e),
powf(v.g, e),
powf(v.h, e));
}
ccl_device_inline float reduce_min(const float8_t a)
{
return min(min(min(a.a, a.b), min(a.c, a.d)), min(min(a.e, a.f), min(a.g, a.h)));
}
ccl_device_inline float reduce_max(const float8_t a)
{
return max(max(max(a.a, a.b), max(a.c, a.d)), max(max(a.e, a.f), max(a.g, a.h)));
}
ccl_device_inline float reduce_add(const float8_t a)
{
#ifdef __KERNEL_AVX2__
float8_t b(_mm256_hadd_ps(a.m256, a.m256));
float8_t h(_mm256_hadd_ps(b.m256, b.m256));
return h[0] + h[4];
#else
return a.a + a.b + a.c + a.d + a.e + a.f + a.g + a.h;
#endif
}
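Review note: the AVX2 branch of `reduce_add` returns `h[0] + h[4]` because `_mm256_hadd_ps` adds adjacent pairs separately inside each 128-bit half, so after two rounds lane 0 holds the sum of lanes 0-3 and lane 4 holds the sum of lanes 4-7. A scalar model of that behavior, using a hypothetical `Lanes8` array type in place of `__m256`:

```cpp
#include <array>
#include <cassert>

using Lanes8 = std::array<float, 8>;

// Scalar model of _mm256_hadd_ps(a, b): adjacent pairs are summed
// independently inside each 128-bit half (lanes 0-3 and lanes 4-7).
static inline Lanes8 hadd_model(const Lanes8 &a, const Lanes8 &b)
{
  Lanes8 r{};
  for (int half = 0; half < 2; half++) {
    const int o = half * 4;
    r[o + 0] = a[o + 0] + a[o + 1];
    r[o + 1] = a[o + 2] + a[o + 3];
    r[o + 2] = b[o + 0] + b[o + 1];
    r[o + 3] = b[o + 2] + b[o + 3];
  }
  return r;
}

// Mirrors the structure of the AVX2 branch: two horizontal adds, then
// combine the partial sums of the lower and upper halves.
static inline float reduce_add_model(const Lanes8 &a)
{
  const Lanes8 b = hadd_model(a, a);
  const Lanes8 h = hadd_model(b, b);
  return h[0] + h[4];
}
```

With lanes `1..8` as input, the second horizontal add leaves `10` in lane 0 and `26` in lane 4, which sum to `36`.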
ccl_device_inline bool isequal(const float8_t a, const float8_t b)
{
return a == b;
}
ccl_device_inline float8_t safe_divide(const float8_t a, const float b)
{
return (b != 0.0f) ? a / b : make_float8_t(0.0f);
}
ccl_device_inline float8_t safe_divide(const float8_t a, const float8_t b)
{
return make_float8_t((b.a != 0.0f) ? a.a / b.a : 0.0f,
(b.b != 0.0f) ? a.b / b.b : 0.0f,
(b.c != 0.0f) ? a.c / b.c : 0.0f,
(b.d != 0.0f) ? a.d / b.d : 0.0f,
(b.e != 0.0f) ? a.e / b.e : 0.0f,
(b.f != 0.0f) ? a.f / b.f : 0.0f,
(b.g != 0.0f) ? a.g / b.g : 0.0f,
(b.h != 0.0f) ? a.h / b.h : 0.0f);
}
ccl_device_inline float8_t ensure_finite(float8_t v)
{
v.a = ensure_finite(v.a);
v.b = ensure_finite(v.b);
v.c = ensure_finite(v.c);
v.d = ensure_finite(v.d);
v.e = ensure_finite(v.e);
v.f = ensure_finite(v.f);
v.g = ensure_finite(v.g);
v.h = ensure_finite(v.h);
return v;
}
ccl_device_inline bool isfinite_safe(float8_t v)
{
return isfinite_safe(v.a) && isfinite_safe(v.b) && isfinite_safe(v.c) && isfinite_safe(v.d) &&
isfinite_safe(v.e) && isfinite_safe(v.f) && isfinite_safe(v.g) && isfinite_safe(v.h);
}
CCL_NAMESPACE_END
#endif /* __UTIL_MATH_FLOAT8_H__ */


@@ -105,10 +105,10 @@ ccl_device bool ray_disk_intersect(float3 ray_P,
return false;
}
ccl_device_forceinline bool ray_triangle_intersect(float3 ray_P,
float3 ray_dir,
float ray_tmin,
float ray_tmax,
ccl_device_forceinline bool ray_triangle_intersect(const float3 ray_P,
const float3 ray_D,
const float ray_tmin,
const float ray_tmax,
const float3 tri_a,
const float3 tri_b,
const float3 tri_c,
@@ -116,14 +116,13 @@ ccl_device_forceinline bool ray_triangle_intersect(float3 ray_P,
ccl_private float *isect_v,
ccl_private float *isect_t)
{
#define dot3(a, b) dot(a, b)
const float3 P = ray_P;
const float3 dir = ray_dir;
/* This implementation matches the Plücker coordinates triangle intersection
* in Embree. */
/* Calculate vertices relative to ray origin. */
const float3 v0 = tri_c - P;
const float3 v1 = tri_a - P;
const float3 v2 = tri_b - P;
const float3 v0 = tri_a - ray_P;
const float3 v1 = tri_b - ray_P;
const float3 v2 = tri_c - ray_P;
/* Calculate triangle edges. */
const float3 e0 = v2 - v0;
@@ -131,40 +130,40 @@ ccl_device_forceinline bool ray_triangle_intersect(float3 ray_P,
const float3 e2 = v1 - v2;
/* Perform edge tests. */
const float U = dot(cross(v2 + v0, e0), ray_dir);
const float V = dot(cross(v0 + v1, e1), ray_dir);
const float W = dot(cross(v1 + v2, e2), ray_dir);
const float U = dot(cross(e0, v2 + v0), ray_D);
const float V = dot(cross(e1, v0 + v1), ray_D);
const float W = dot(cross(e2, v1 + v2), ray_D);
const float UVW = U + V + W;
const float eps = FLT_EPSILON * fabsf(UVW);
const float minUVW = min(U, min(V, W));
const float maxUVW = max(U, max(V, W));
if (minUVW < 0.0f && maxUVW > 0.0f) {
if (!(minUVW >= -eps || maxUVW <= eps)) {
return false;
}
/* Calculate geometry normal and denominator. */
const float3 Ng1 = cross(e1, e0);
// const Vec3vfM Ng1 = stable_triangle_normal(e2,e1,e0);
const float3 Ng = Ng1 + Ng1;
const float den = dot3(Ng, dir);
const float den = dot(Ng, ray_D);
/* Avoid division by 0. */
if (UNLIKELY(den == 0.0f)) {
return false;
}
/* Perform depth test. */
const float T = dot3(v0, Ng);
const float T = dot(v0, Ng);
const float t = T / den;
if (!(t >= ray_tmin && t <= ray_tmax)) {
return false;
}
*isect_u = U / den;
*isect_v = V / den;
const float rcp_UVW = (fabsf(UVW) < 1e-18f) ? 0.0f : 1.0f / UVW;
*isect_u = min(U * rcp_UVW, 1.0f);
*isect_v = min(V * rcp_UVW, 1.0f);
*isect_t = t;
return true;
#undef dot3
}
/* Tests for an intersection between a ray and a quad defined by
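Review note: to make the new edge tests and the `rcp_UVW` normalization easier to check by hand, here is a standalone scalar sketch of the same steps as the updated `ray_triangle_intersect`. `Vec3` and `ray_triangle_sketch` are hypothetical stand-ins for `float3` and the kernel function; this is not the kernel code itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cfloat>
#include <cmath>

struct Vec3 {
  float x, y, z;
};

static inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static inline Vec3 cross(Vec3 a, Vec3 b)
{
  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

static bool ray_triangle_sketch(Vec3 P, Vec3 D, float tmin, float tmax,
                                Vec3 a, Vec3 b, Vec3 c,
                                float *u, float *v, float *t_out)
{
  assert(u && v && t_out);
  /* Vertices relative to the ray origin, then edges. */
  const Vec3 v0 = a - P, v1 = b - P, v2 = c - P;
  const Vec3 e0 = v2 - v0, e1 = v0 - v1, e2 = v1 - v2;

  /* Edge tests: reject only if the signs are mixed beyond a relative epsilon. */
  const float U = dot(cross(e0, v2 + v0), D);
  const float V = dot(cross(e1, v0 + v1), D);
  const float W = dot(cross(e2, v1 + v2), D);
  const float UVW = U + V + W;
  const float eps = FLT_EPSILON * std::fabs(UVW);
  const float minUVW = std::min(U, std::min(V, W));
  const float maxUVW = std::max(U, std::max(V, W));
  if (!(minUVW >= -eps || maxUVW <= eps)) {
    return false;
  }

  /* Geometric normal (doubled) and denominator. */
  const Vec3 Ng1 = cross(e1, e0);
  const Vec3 Ng = Ng1 + Ng1;
  const float den = dot(Ng, D);
  if (den == 0.0f) {
    return false;
  }

  /* Depth test. */
  const float t = dot(v0, Ng) / den;
  if (!(t >= tmin && t <= tmax)) {
    return false;
  }

  /* Normalize by U + V + W and clamp to 1. */
  const float rcp_UVW = (std::fabs(UVW) < 1e-18f) ? 0.0f : 1.0f / UVW;
  *u = std::min(U * rcp_UVW, 1.0f);
  *v = std::min(V * rcp_UVW, 1.0f);
  *t_out = t;
  return true;
}
```

For the triangle `(0,0,1)`, `(1,0,1)`, `(0,1,1)` and a ray from `(0.25, 0.25, 0)` along `+Z`, working through the steps by hand gives `U = V = 0.5`, `W = 1`, hence `u = v = 0.25` and `t = 1`. Moving the origin to `(2, 2, 0)` gives mixed signs (`U = V = 4`, `W = -6`) and the edge test rejects the ray.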


@@ -99,15 +99,7 @@ ProjectionTransform projection_inverse(const ProjectionTransform &tfm)
memcpy(M, &tfm, sizeof(M));
if (UNLIKELY(!transform_matrix4_gj_inverse(R, M))) {
/* matrix is degenerate (e.g. 0 scale on some axis), ideally we should
* never be in this situation, but try to invert it anyway with tweak */
M[0][0] += 1e-8f;
M[1][1] += 1e-8f;
M[2][2] += 1e-8f;
if (UNLIKELY(!transform_matrix4_gj_inverse(R, M))) {
return projection_identity();
}
return projection_identity();
}
memcpy(&tfmR, R, sizeof(R));
@@ -115,16 +107,9 @@ ProjectionTransform projection_inverse(const ProjectionTransform &tfm)
return tfmR;
}
Transform transform_inverse(const Transform &tfm)
{
ProjectionTransform projection(tfm);
return projection_to_transform(projection_inverse(projection));
}
Transform transform_transposed_inverse(const Transform &tfm)
{
ProjectionTransform projection(tfm);
ProjectionTransform iprojection = projection_inverse(projection);
ProjectionTransform iprojection(transform_inverse(tfm));
return projection_to_transform(projection_transpose(iprojection));
}


@@ -63,10 +63,10 @@ ccl_device_inline float3 transform_point(ccl_private const Transform *t, const f
_MM_TRANSPOSE4_PS(x, y, z, w);
ssef tmp = shuffle<0>(aa) * x;
tmp = madd(shuffle<1>(aa), y, tmp);
ssef tmp = w;
tmp = madd(shuffle<2>(aa), z, tmp);
tmp += w;
tmp = madd(shuffle<1>(aa), y, tmp);
tmp = madd(shuffle<0>(aa), x, tmp);
return float3(tmp.m128);
#elif defined(__KERNEL_METAL__)
@@ -93,9 +93,9 @@ ccl_device_inline float3 transform_direction(ccl_private const Transform *t, con
_MM_TRANSPOSE4_PS(x, y, z, w);
ssef tmp = shuffle<0>(aa) * x;
ssef tmp = shuffle<2>(aa) * z;
tmp = madd(shuffle<1>(aa), y, tmp);
tmp = madd(shuffle<2>(aa), z, tmp);
tmp = madd(shuffle<0>(aa), x, tmp);
return float3(tmp.m128);
#elif defined(__KERNEL_METAL__)
@@ -312,7 +312,6 @@ ccl_device_inline void transform_set_column(Transform *t, int column, float3 val
t->z[column] = value.z;
}
Transform transform_inverse(const Transform &a);
Transform transform_transposed_inverse(const Transform &a);
ccl_device_inline bool transform_uniform_scale(const Transform &tfm, float &scale)
@@ -392,39 +391,47 @@ ccl_device_inline float4 quat_interpolate(float4 q1, float4 q2, float t)
#endif /* defined(__KERNEL_GPU_RAYTRACING__) */
}
ccl_device_inline Transform transform_quick_inverse(Transform M)
ccl_device_inline Transform transform_inverse(const Transform tfm)
{
/* possible optimization: can we avoid doing this altogether and construct
* the inverse matrix directly from negated translation, transposed rotation,
* scale can be inverted but what about shearing? */
Transform R;
float det = M.x.x * (M.z.z * M.y.y - M.z.y * M.y.z) - M.y.x * (M.z.z * M.x.y - M.z.y * M.x.z) +
M.z.x * (M.y.z * M.x.y - M.y.y * M.x.z);
/* This implementation matches the one in Embree exactly, to ensure consistent
* results with the ray intersection of instances. */
float3 x = make_float3(tfm.x.x, tfm.y.x, tfm.z.x);
float3 y = make_float3(tfm.x.y, tfm.y.y, tfm.z.y);
float3 z = make_float3(tfm.x.z, tfm.y.z, tfm.z.z);
float3 w = make_float3(tfm.x.w, tfm.y.w, tfm.z.w);
/* Compute determinant. */
float det = dot(x, cross(y, z));
if (det == 0.0f) {
M.x.x += 1e-8f;
M.y.y += 1e-8f;
M.z.z += 1e-8f;
det = M.x.x * (M.z.z * M.y.y - M.z.y * M.y.z) - M.y.x * (M.z.z * M.x.y - M.z.y * M.x.z) +
M.z.x * (M.y.z * M.x.y - M.y.y * M.x.z);
/* Matrix is degenerate (e.g. 0 scale on some axis), ideally we should
* never be in this situation, but try to invert it anyway with tweak.
*
* This logic does not match Embree which would just give an invalid
* matrix. A better solution would be to remove this and ensure any object
* matrix is valid. */
x.x += 1e-8f;
y.y += 1e-8f;
z.z += 1e-8f;
det = dot(x, cross(y, z));
if (det == 0.0f) {
det = FLT_MAX;
}
}
det = (det != 0.0f) ? 1.0f / det : 0.0f;
float3 Rx = det * make_float3(M.z.z * M.y.y - M.z.y * M.y.z,
M.z.y * M.x.z - M.z.z * M.x.y,
M.y.z * M.x.y - M.y.y * M.x.z);
float3 Ry = det * make_float3(M.z.x * M.y.z - M.z.z * M.y.x,
M.z.z * M.x.x - M.z.x * M.x.z,
M.y.x * M.x.z - M.y.z * M.x.x);
float3 Rz = det * make_float3(M.z.y * M.y.x - M.z.x * M.y.y,
M.z.x * M.x.y - M.z.y * M.x.x,
M.y.y * M.x.x - M.y.x * M.x.y);
float3 T = -make_float3(M.x.w, M.y.w, M.z.w);
/* Divide adjoint matrix by the determinant to compute inverse of 3x3 matrix. */
const float3 inverse_x = cross(y, z) / det;
const float3 inverse_y = cross(z, x) / det;
const float3 inverse_z = cross(x, y) / det;
R.x = make_float4(Rx.x, Rx.y, Rx.z, dot(Rx, T));
R.y = make_float4(Ry.x, Ry.y, Ry.z, dot(Ry, T));
R.z = make_float4(Rz.x, Rz.y, Rz.z, dot(Rz, T));
/* Compute translation and fill transform. */
Transform itfm;
itfm.x = float3_to_float4(inverse_x, -dot(inverse_x, w));
itfm.y = float3_to_float4(inverse_y, -dot(inverse_y, w));
itfm.z = float3_to_float4(inverse_z, -dot(inverse_z, w));
return R;
return itfm;
}
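Review note: the new `transform_inverse` builds the inverse of the 3x3 part from cross products of the matrix columns (the adjoint divided by the determinant) and then derives the translation as `-dot(inverse_row, w)`. A standalone sketch of that construction follows. The types `Vec3`, `Row4`, `Affine` and the function name are hypothetical, and, unlike the patch, the sketch simply reports failure for a zero determinant instead of applying the `1e-8f` tweak.

```cpp
#include <cassert>

struct Vec3 {
  float x, y, z;
};
struct Row4 {
  float x, y, z, w;
};
/* Three rows of a 3x4 affine matrix; the implicit fourth row is (0, 0, 0, 1). */
struct Affine {
  Row4 x, y, z;
};

static inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static inline Vec3 cross(Vec3 a, Vec3 b)
{
  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static inline Vec3 divide(Vec3 a, float s) { return {a.x / s, a.y / s, a.z / s}; }
static inline Row4 make_row(Vec3 a, float w) { return {a.x, a.y, a.z, w}; }

/* Returns false (leaving *out untouched) when the 3x3 part is singular. */
static bool affine_inverse_sketch(const Affine &tfm, Affine *out)
{
  assert(out != nullptr);
  /* Columns of the matrix. */
  const Vec3 x = {tfm.x.x, tfm.y.x, tfm.z.x};
  const Vec3 y = {tfm.x.y, tfm.y.y, tfm.z.y};
  const Vec3 z = {tfm.x.z, tfm.y.z, tfm.z.z};
  const Vec3 w = {tfm.x.w, tfm.y.w, tfm.z.w};

  const float det = dot(x, cross(y, z));
  if (det == 0.0f) {
    return false;
  }

  /* Rows of the inverse 3x3 matrix: adjoint divided by the determinant. */
  const Vec3 inverse_x = divide(cross(y, z), det);
  const Vec3 inverse_y = divide(cross(z, x), det);
  const Vec3 inverse_z = divide(cross(x, y), det);

  /* Inverse translation is the negated, inverse-rotated translation. */
  out->x = make_row(inverse_x, -dot(inverse_x, w));
  out->y = make_row(inverse_y, -dot(inverse_y, w));
  out->z = make_row(inverse_z, -dot(inverse_z, w));
  return true;
}
```

As a sanity check, a uniform scale of 2 with translation `(1, 2, 3)` inverts to a scale of 0.5 with translation `(-0.5, -1, -1.5)`.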
ccl_device_inline void transform_compose(ccl_private Transform *tfm,


@@ -12,6 +12,7 @@
#if !defined(__KERNEL_GPU__)
# include <stdint.h>
# include <stdio.h>
#endif
#include "util/defines.h"
@@ -70,6 +71,12 @@ ccl_device_inline bool is_power_of_two(size_t x)
CCL_NAMESPACE_END
/* Most GPU APIs have matching native vector types, so we only need to implement them for
* CPU and oneAPI. */
#if defined(__KERNEL_GPU__) && !defined(__KERNEL_ONEAPI__)
# define __KERNEL_NATIVE_VECTOR_TYPES__
#endif
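Review note: a small sketch of how the new `__KERNEL_NATIVE_VECTOR_TYPES__` switch gates the type headers further down in this diff. All `SKETCH_*` macros and the `sketch_float2` type are hypothetical stand-ins for `__KERNEL_GPU__`, `__KERNEL_ONEAPI__`, `__KERNEL_NATIVE_VECTOR_TYPES__` and `float2`. Since neither platform macro is defined here, the sketch behaves like a CPU build.

```cpp
#include <cassert>

/* A GPU backend other than oneAPI is assumed to supply its own vector types. */
#if defined(SKETCH_KERNEL_GPU) && !defined(SKETCH_KERNEL_ONEAPI)
#  define SKETCH_NATIVE_VECTOR_TYPES
#endif

#ifndef SKETCH_NATIVE_VECTOR_TYPES
/* CPU and oneAPI: the type is defined by hand. */
struct sketch_float2 {
  float x, y;

#  ifndef SKETCH_KERNEL_GPU
  /* Indexing is only provided for host code, as in the float2 header. */
  float operator[](int i) const
  {
    assert(i >= 0 && i < 2);
    return (i == 0) ? x : y;
  }
#  endif
};

static inline sketch_float2 make_sketch_float2(float x, float y)
{
  sketch_float2 a = {x, y};
  return a;
}
#endif /* SKETCH_NATIVE_VECTOR_TYPES */
```

The point of the single macro is that each type header can test one condition rather than repeating `!defined(__KERNEL_GPU__) || defined(__KERNEL_ONEAPI__)`.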
/* Vectorized types declaration. */
#include "util/types_uchar2.h"
#include "util/types_uchar3.h"
@@ -90,8 +97,6 @@ CCL_NAMESPACE_END
#include "util/types_float4.h"
#include "util/types_float8.h"
#include "util/types_vector3.h"
/* Vectorized types implementation. */
#include "util/types_uchar2_impl.h"
#include "util/types_uchar3_impl.h"
@@ -110,8 +115,6 @@ CCL_NAMESPACE_END
#include "util/types_float4_impl.h"
#include "util/types_float8_impl.h"
#include "util/types_vector3_impl.h"
/* SSE types. */
#ifndef __KERNEL_GPU__
# include "util/sseb.h"


@@ -1,8 +1,7 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2011-2022 Blender Foundation */
#ifndef __UTIL_TYPES_FLOAT2_H__
#define __UTIL_TYPES_FLOAT2_H__
#pragma once
#ifndef __UTIL_TYPES_H__
# error "Do not include this file directly, include util/types.h instead."
@@ -10,18 +9,18 @@
CCL_NAMESPACE_BEGIN
#if !defined(__KERNEL_GPU__) || defined(__KERNEL_ONEAPI__)
#ifndef __KERNEL_NATIVE_VECTOR_TYPES__
struct float2 {
float x, y;
# ifndef __KERNEL_GPU__
__forceinline float operator[](int i) const;
__forceinline float &operator[](int i);
# endif
};
ccl_device_inline float2 make_float2(float x, float y);
ccl_device_inline void print_float2(const char *label, const float2 &a);
#endif /* !defined(__KERNEL_GPU__) || defined(__KERNEL_ONEAPI__) */
#endif /* __KERNEL_NATIVE_VECTOR_TYPES__ */
CCL_NAMESPACE_END
#endif /* __UTIL_TYPES_FLOAT2_H__ */


@@ -1,20 +1,16 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2011-2022 Blender Foundation */
#ifndef __UTIL_TYPES_FLOAT2_IMPL_H__
#define __UTIL_TYPES_FLOAT2_IMPL_H__
#pragma once
#ifndef __UTIL_TYPES_H__
# error "Do not include this file directly, include util/types.h instead."
#endif
#ifndef __KERNEL_GPU__
# include <cstdio>
#endif
CCL_NAMESPACE_BEGIN
#if !defined(__KERNEL_GPU__) || defined(__KERNEL_ONEAPI__)
#ifndef __KERNEL_NATIVE_VECTOR_TYPES__
# ifndef __KERNEL_GPU__
__forceinline float float2::operator[](int i) const
{
util_assert(i >= 0);
@@ -28,6 +24,7 @@ __forceinline float &float2::operator[](int i)
util_assert(i < 2);
return *(&x + i);
}
# endif
ccl_device_inline float2 make_float2(float x, float y)
{
@@ -39,8 +36,6 @@ ccl_device_inline void print_float2(const char *label, const float2 &a)
{
printf("%s: %.8f %.8f\n", label, (double)a.x, (double)a.y);
}
#endif /* !defined(__KERNEL_GPU__) || defined(__KERNEL_ONEAPI__) */
#endif /* __KERNEL_NATIVE_VECTOR_TYPES__ */
CCL_NAMESPACE_END
#endif /* __UTIL_TYPES_FLOAT2_IMPL_H__ */


@@ -1,8 +1,7 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2011-2022 Blender Foundation */
#ifndef __UTIL_TYPES_FLOAT3_H__
#define __UTIL_TYPES_FLOAT3_H__
#pragma once
#ifndef __UTIL_TYPES_H__
# error "Do not include this file directly, include util/types.h instead."
@@ -10,17 +9,28 @@
CCL_NAMESPACE_BEGIN
#if !defined(__KERNEL_GPU__)
#ifndef __KERNEL_NATIVE_VECTOR_TYPES__
struct ccl_try_align(16) float3
{
# ifdef __KERNEL_SSE__
# ifdef __KERNEL_GPU__
/* Compact structure for GPU. */
float x, y, z;
# else
/* SIMD aligned structure for CPU. */
# ifdef __KERNEL_SSE__
union {
__m128 m128;
struct {
float x, y, z, w;
};
};
# else
float x, y, z, w;
# endif
# endif
# ifdef __KERNEL_SSE__
/* Convenient constructors and operators for SIMD, otherwise default is enough. */
__forceinline float3();
__forceinline float3(const float3 &a);
__forceinline explicit float3(const __m128 &a);
@@ -29,18 +39,18 @@ struct ccl_try_align(16) float3
__forceinline operator __m128 &();
__forceinline float3 &operator=(const float3 &a);
# else /* __KERNEL_SSE__ */
float x, y, z, w;
# endif /* __KERNEL_SSE__ */
# endif
# ifndef __KERNEL_GPU__
__forceinline float operator[](int i) const;
__forceinline float &operator[](int i);
# endif
};
ccl_device_inline float3 make_float3(float f);
ccl_device_inline float3 make_float3(float x, float y, float z);
ccl_device_inline void print_float3(const char *label, const float3 &a);
#endif /* !defined(__KERNEL_GPU__) */
#endif /* __KERNEL_NATIVE_VECTOR_TYPES__ */
/* Smaller float3 for storage. For math operations this must be converted to float3, so that on the
* CPU SIMD instructions can be used. */
@@ -78,5 +88,3 @@ struct packed_float3 {
static_assert(sizeof(packed_float3) == 12, "packed_float3 expected to be exactly 12 bytes");
CCL_NAMESPACE_END
#endif /* __UTIL_TYPES_FLOAT3_H__ */
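Review note: the header now distinguishes a compact `x, y, z` layout for GPU builds from the SIMD-aligned `x, y, z, w` layout on the CPU, alongside the existing 12-byte `packed_float3`. A sketch of the size trade-off on the host, using the hypothetical names `padded_float3_sketch` and `packed3_sketch` and standard `alignas` in place of `ccl_try_align`:

```cpp
#include <cassert>

/* CPU-style layout: padded to four floats and 16-byte aligned for SIMD loads. */
struct alignas(16) padded_float3_sketch {
  float x, y, z, w;
};

/* Storage-style layout: exactly three floats, no padding. */
struct packed3_sketch {
  float x, y, z;
};

static_assert(sizeof(padded_float3_sketch) == 16, "padded layout is 16 bytes");
static_assert(sizeof(packed3_sketch) == 12, "packed layout is 12 bytes");

/* Widen for math; the extra lane is zeroed, as make_float3(x, y, z) does. */
static inline padded_float3_sketch unpack_sketch(const packed3_sketch &p)
{
  padded_float3_sketch v = {p.x, p.y, p.z, 0.0f};
  return v;
}

/* Narrow for storage; the padding lane is dropped. */
static inline packed3_sketch pack_sketch(const padded_float3_sketch &v)
{
  packed3_sketch p = {v.x, v.y, v.z};
  return p;
}
```

Arrays of the padded type cost a third more memory than the packed one, which is presumably why the packed variant is kept for storage while math is done on the aligned type.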


@@ -1,20 +1,15 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2011-2022 Blender Foundation */
#ifndef __UTIL_TYPES_FLOAT3_IMPL_H__
#define __UTIL_TYPES_FLOAT3_IMPL_H__
#pragma once
#ifndef __UTIL_TYPES_H__
# error "Do not include this file directly, include util/types.h instead."
#endif
#ifndef __KERNEL_GPU__
# include <cstdio>
#endif
CCL_NAMESPACE_BEGIN
#if !defined(__KERNEL_GPU__)
#ifndef __KERNEL_NATIVE_VECTOR_TYPES__
# ifdef __KERNEL_SSE__
__forceinline float3::float3()
{
@@ -45,6 +40,7 @@ __forceinline float3 &float3::operator=(const float3 &a)
}
# endif /* __KERNEL_SSE__ */
# ifndef __KERNEL_GPU__
__forceinline float float3::operator[](int i) const
{
util_assert(i >= 0);
@@ -58,23 +54,32 @@ __forceinline float &float3::operator[](int i)
util_assert(i < 3);
return *(&x + i);
}
# endif
ccl_device_inline float3 make_float3(float f)
{
# ifdef __KERNEL_SSE__
float3 a(_mm_set1_ps(f));
# ifdef __KERNEL_GPU__
float3 a = {f, f, f};
# else
# ifdef __KERNEL_SSE__
float3 a(_mm_set1_ps(f));
# else
float3 a = {f, f, f, f};
# endif
# endif
return a;
}
ccl_device_inline float3 make_float3(float x, float y, float z)
{
# ifdef __KERNEL_SSE__
float3 a(_mm_set_ps(0.0f, z, y, x));
# ifdef __KERNEL_GPU__
float3 a = {x, y, z};
# else
# ifdef __KERNEL_SSE__
float3 a(_mm_set_ps(0.0f, z, y, x));
# else
float3 a = {x, y, z, 0.0f};
# endif
# endif
return a;
}
@@ -83,8 +88,6 @@ ccl_device_inline void print_float3(const char *label, const float3 &a)
{
printf("%s: %.8f %.8f %.8f\n", label, (double)a.x, (double)a.y, (double)a.z);
}
#endif /* !defined(__KERNEL_GPU__) */
#endif /* __KERNEL_NATIVE_VECTOR_TYPES__ */
CCL_NAMESPACE_END
#endif /* __UTIL_TYPES_FLOAT3_IMPL_H__ */


@@ -1,8 +1,7 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2011-2022 Blender Foundation */
#ifndef __UTIL_TYPES_FLOAT4_H__
#define __UTIL_TYPES_FLOAT4_H__
#pragma once
#ifndef __UTIL_TYPES_H__
# error "Do not include this file directly, include util/types.h instead."
@@ -10,7 +9,7 @@
CCL_NAMESPACE_BEGIN
#if !defined(__KERNEL_GPU__) || defined(__KERNEL_ONEAPI__)
#ifndef __KERNEL_NATIVE_VECTOR_TYPES__
struct int4;
struct ccl_try_align(16) float4
@@ -35,16 +34,16 @@ struct ccl_try_align(16) float4
float x, y, z, w;
# endif /* __KERNEL_SSE__ */
# ifndef __KERNEL_GPU__
__forceinline float operator[](int i) const;
__forceinline float &operator[](int i);
# endif
};
ccl_device_inline float4 make_float4(float f);
ccl_device_inline float4 make_float4(float x, float y, float z, float w);
ccl_device_inline float4 make_float4(const int4 &i);
ccl_device_inline void print_float4(const char *label, const float4 &a);
#endif /* !defined(__KERNEL_GPU__) || defined(__KERNEL_ONEAPI__) */
#endif /* __KERNEL_NATIVE_VECTOR_TYPES__ */
CCL_NAMESPACE_END
#endif /* __UTIL_TYPES_FLOAT4_H__ */


@@ -1,20 +1,15 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2011-2022 Blender Foundation */
#ifndef __UTIL_TYPES_FLOAT4_IMPL_H__
#define __UTIL_TYPES_FLOAT4_IMPL_H__
#pragma once
#ifndef __UTIL_TYPES_H__
# error "Do not include this file directly, include util/types.h instead."
#endif
#ifndef __KERNEL_GPU__
# include <cstdio>
#endif
CCL_NAMESPACE_BEGIN
#if !defined(__KERNEL_GPU__) || defined(__KERNEL_ONEAPI__)
#ifndef __KERNEL_NATIVE_VECTOR_TYPES__
# ifdef __KERNEL_SSE__
__forceinline float4::float4()
{
@@ -41,6 +36,7 @@ __forceinline float4 &float4::operator=(const float4 &a)
}
# endif /* __KERNEL_SSE__ */
# ifndef __KERNEL_GPU__
__forceinline float float4::operator[](int i) const
{
util_assert(i >= 0);
@@ -54,6 +50,7 @@ __forceinline float &float4::operator[](int i)
util_assert(i < 4);
return *(&x + i);
}
# endif
ccl_device_inline float4 make_float4(float f)
{
@@ -89,8 +86,6 @@ ccl_device_inline void print_float4(const char *label, const float4 &a)
{
printf("%s: %.8f %.8f %.8f %.8f\n", label, (double)a.x, (double)a.y, (double)a.z, (double)a.w);
}
#endif /* !defined(__KERNEL_GPU__) || defined(__KERNEL_ONEAPI__) */
#endif /* __KERNEL_NATIVE_VECTOR_TYPES__ */
CCL_NAMESPACE_END
#endif /* __UTIL_TYPES_FLOAT4_IMPL_H__ */


@@ -2,8 +2,7 @@
* Original code Copyright 2017, Intel Corporation
* Modifications Copyright 2018-2022 Blender Foundation. */
#ifndef __UTIL_TYPES_FLOAT8_H__
#define __UTIL_TYPES_FLOAT8_H__
#pragma once
#ifndef __UTIL_TYPES_H__
# error "Do not include this file directly, include util/types.h instead."
@@ -11,11 +10,16 @@
CCL_NAMESPACE_BEGIN
#if !defined(__KERNEL_GPU__) || defined(__KERNEL_ONEAPI__)
/* float8 is a reserved type in Metal that has not been implemented. For
* that reason this is named float8_t and not using native vector types. */
struct ccl_try_align(32) float8
#ifdef __KERNEL_GPU__
struct float8_t
#else
struct ccl_try_align(32) float8_t
#endif
{
# ifdef __KERNEL_AVX2__
#ifdef __KERNEL_AVX2__
union {
__m256 m256;
struct {
@@ -23,28 +27,27 @@ struct ccl_try_align(32) float8
};
};
__forceinline float8();
__forceinline float8(const float8 &a);
__forceinline explicit float8(const __m256 &a);
__forceinline float8_t();
__forceinline float8_t(const float8_t &a);
__forceinline explicit float8_t(const __m256 &a);
__forceinline operator const __m256 &() const;
__forceinline operator __m256 &();
__forceinline float8 &operator=(const float8 &a);
__forceinline float8_t &operator=(const float8_t &a);
# else /* __KERNEL_AVX2__ */
#else /* __KERNEL_AVX2__ */
float a, b, c, d, e, f, g, h;
# endif /* __KERNEL_AVX2__ */
#endif /* __KERNEL_AVX2__ */
#ifndef __KERNEL_GPU__
__forceinline float operator[](int i) const;
__forceinline float &operator[](int i);
#endif
};
ccl_device_inline float8 make_float8(float f);
ccl_device_inline float8
make_float8(float a, float b, float c, float d, float e, float f, float g, float h);
#endif /* !defined(__KERNEL_GPU__) || defined(__KERNEL_ONEAPI__) */
ccl_device_inline float8_t make_float8_t(float f);
ccl_device_inline float8_t
make_float8_t(float a, float b, float c, float d, float e, float f, float g, float h);
CCL_NAMESPACE_END
#endif /* __UTIL_TYPES_FLOAT8_H__ */


@@ -2,87 +2,79 @@
* Original code Copyright 2017, Intel Corporation
* Modifications Copyright 2018-2022 Blender Foundation. */
#ifndef __UTIL_TYPES_FLOAT8_IMPL_H__
#define __UTIL_TYPES_FLOAT8_IMPL_H__
#pragma once
#ifndef __UTIL_TYPES_H__
# error "Do not include this file directly, include util/types.h instead."
#endif
#ifndef __KERNEL_GPU__
# include <cstdio>
#endif
CCL_NAMESPACE_BEGIN
#if !defined(__KERNEL_GPU__) || defined(__KERNEL_ONEAPI__)
# ifdef __KERNEL_AVX2__
__forceinline float8::float8()
#ifdef __KERNEL_AVX2__
__forceinline float8_t::float8_t()
{
}
__forceinline float8::float8(const float8 &f) : m256(f.m256)
__forceinline float8_t::float8_t(const float8_t &f) : m256(f.m256)
{
}
__forceinline float8::float8(const __m256 &f) : m256(f)
__forceinline float8_t::float8_t(const __m256 &f) : m256(f)
{
}
__forceinline float8::operator const __m256 &() const
__forceinline float8_t::operator const __m256 &() const
{
return m256;
}
__forceinline float8::operator __m256 &()
__forceinline float8_t::operator __m256 &()
{
return m256;
}
__forceinline float8 &float8::operator=(const float8 &f)
__forceinline float8_t &float8_t::operator=(const float8_t &f)
{
m256 = f.m256;
return *this;
}
# endif /* __KERNEL_AVX2__ */
#endif /* __KERNEL_AVX2__ */
__forceinline float float8::operator[](int i) const
#ifndef __KERNEL_GPU__
__forceinline float float8_t::operator[](int i) const
{
util_assert(i >= 0);
util_assert(i < 8);
return *(&a + i);
}
__forceinline float &float8::operator[](int i)
__forceinline float &float8_t::operator[](int i)
{
util_assert(i >= 0);
util_assert(i < 8);
return *(&a + i);
}
#endif
ccl_device_inline float8 make_float8(float f)
ccl_device_inline float8_t make_float8_t(float f)
{
# ifdef __KERNEL_AVX2__
float8 r(_mm256_set1_ps(f));
# else
float8 r = {f, f, f, f, f, f, f, f};
# endif
#ifdef __KERNEL_AVX2__
float8_t r(_mm256_set1_ps(f));
#else
float8_t r = {f, f, f, f, f, f, f, f};
#endif
return r;
}
ccl_device_inline float8
make_float8(float a, float b, float c, float d, float e, float f, float g, float h)
ccl_device_inline float8_t
make_float8_t(float a, float b, float c, float d, float e, float f, float g, float h)
{
# ifdef __KERNEL_AVX2__
float8 r(_mm256_set_ps(a, b, c, d, e, f, g, h));
# else
float8 r = {a, b, c, d, e, f, g, h};
# endif
#ifdef __KERNEL_AVX2__
float8_t r(_mm256_setr_ps(a, b, c, d, e, f, g, h));
#else
float8_t r = {a, b, c, d, e, f, g, h};
#endif
return r;
}
#endif /* !defined(__KERNEL_GPU__) || defined(__KERNEL_ONEAPI__) */
CCL_NAMESPACE_END
#endif /* __UTIL_TYPES_FLOAT8_IMPL_H__ */
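Review note: besides the rename, the eight-argument constructor switches from `_mm256_set_ps` to `_mm256_setr_ps`. `_mm256_set_ps` places its first argument in the highest lane, so with the old call `a` ended up in lane 7 and `r[0]` would have returned `h`; `_mm256_setr_ps` takes its arguments in lane order. A scalar model of the two orderings, with a hypothetical `Lanes8` array standing in for `__m256`:

```cpp
#include <array>
#include <cassert>

using Lanes8 = std::array<float, 8>;

/* Model of _mm256_set_ps: arguments are listed from lane 7 down to lane 0. */
static inline Lanes8 set_ps_model(
    float e7, float e6, float e5, float e4, float e3, float e2, float e1, float e0)
{
  Lanes8 r = {{e0, e1, e2, e3, e4, e5, e6, e7}};
  return r;
}

/* Model of _mm256_setr_ps: arguments are listed from lane 0 up to lane 7. */
static inline Lanes8 setr_ps_model(
    float e0, float e1, float e2, float e3, float e4, float e5, float e6, float e7)
{
  Lanes8 r = {{e0, e1, e2, e3, e4, e5, e6, e7}};
  return r;
}
```

Passing `1..8` to both models gives lane 0 equal to `8` for the `set_ps` ordering and `1` for the `setr_ps` ordering, which matches the scalar fallback `float8_t r = {a, b, c, d, e, f, g, h}`.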


@@ -1,8 +1,7 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2011-2022 Blender Foundation */
#ifndef __UTIL_TYPES_INT2_H__
#define __UTIL_TYPES_INT2_H__
#pragma once
#ifndef __UTIL_TYPES_H__
# error "Do not include this file directly, include util/types.h instead."
@@ -10,17 +9,17 @@
CCL_NAMESPACE_BEGIN
#if !defined(__KERNEL_GPU__) || defined(__KERNEL_ONEAPI__)
#ifndef __KERNEL_NATIVE_VECTOR_TYPES__
struct int2 {
int x, y;
# ifndef __KERNEL_GPU__
__forceinline int operator[](int i) const;
__forceinline int &operator[](int i);
# endif
};
ccl_device_inline int2 make_int2(int x, int y);
#endif /* !defined(__KERNEL_GPU__) || defined(__KERNEL_ONEAPI__) */
#endif /* __KERNEL_NATIVE_VECTOR_TYPES__ */
CCL_NAMESPACE_END
#endif /* __UTIL_TYPES_INT2_H__ */


@@ -1,8 +1,7 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2011-2022 Blender Foundation */
#ifndef __UTIL_TYPES_INT2_IMPL_H__
#define __UTIL_TYPES_INT2_IMPL_H__
#pragma once
#ifndef __UTIL_TYPES_H__
# error "Do not include this file directly, include util/types.h instead."
@@ -10,7 +9,8 @@
CCL_NAMESPACE_BEGIN
#if !defined(__KERNEL_GPU__) || defined(__KERNEL_ONEAPI__)
#ifndef __KERNEL_NATIVE_VECTOR_TYPES__
# ifndef __KERNEL_GPU__
int int2::operator[](int i) const
{
util_assert(i >= 0);
@@ -24,14 +24,13 @@ int &int2::operator[](int i)
util_assert(i < 2);
return *(&x + i);
}
# endif
ccl_device_inline int2 make_int2(int x, int y)
{
int2 a = {x, y};
return a;
}
#endif /* !defined(__KERNEL_GPU__) || defined(__KERNEL_ONEAPI__) */
#endif /* __KERNEL_NATIVE_VECTOR_TYPES__ */
CCL_NAMESPACE_END
#endif /* __UTIL_TYPES_INT2_IMPL_H__ */


@@ -1,8 +1,7 @@
/* SPDX-License-Identifier: Apache-2.0
* Copyright 2011-2022 Blender Foundation */
#ifndef __UTIL_TYPES_INT3_H__
#define __UTIL_TYPES_INT3_H__
#pragma once
#ifndef __UTIL_TYPES_H__
# error "Do not include this file directly, include util/types.h instead."
@@ -10,10 +9,15 @@
CCL_NAMESPACE_BEGIN
#if !defined(__KERNEL_GPU__) || defined(__KERNEL_ONEAPI__)
#ifndef __KERNEL_NATIVE_VECTOR_TYPES__
struct ccl_try_align(16) int3
{
# ifdef __KERNEL_SSE__
# ifdef __KERNEL_GPU__
/* Compact structure on the GPU. */
int x, y, z;
# else
/* SIMD aligned structure for CPU. */
# ifdef __KERNEL_SSE__
union {
__m128i m128;
struct {
@@ -29,19 +33,20 @@ struct ccl_try_align(16) int3
__forceinline operator __m128i &();
__forceinline int3 &operator=(const int3 &a);
# else /* __KERNEL_SSE__ */
# else /* __KERNEL_SSE__ */
int x, y, z, w;
# endif /* __KERNEL_SSE__ */
# endif /* __KERNEL_SSE__ */
# endif
# ifndef __KERNEL_GPU__
__forceinline int operator[](int i) const;
__forceinline int &operator[](int i);
# endif
};
ccl_device_inline int3 make_int3(int i);
ccl_device_inline int3 make_int3(int x, int y, int z);
ccl_device_inline void print_int3(const char *label, const int3 &a);
#endif /* !defined(__KERNEL_GPU__) || defined(__KERNEL_ONEAPI__) */
#endif /* __KERNEL_NATIVE_VECTOR_TYPES__ */
CCL_NAMESPACE_END
#endif /* __UTIL_TYPES_INT3_H__ */

Some files were not shown because too many files have changed in this diff.