Compare commits

...

292 Commits

Author SHA1 Message Date
cb6b0d088a move node_common to c++ 2021-09-30 13:52:12 +02:00
3453b22b1e Fix: Curve to Mesh node outputs original curve
It should only output the new mesh (and potentially instances
that are processed separately).
2021-09-30 13:33:21 +02:00
9628ef4135 Fix: wrong field input deduplication with Material Selection node 2021-09-30 12:51:09 +02:00
80d7cac22d Fix T91773: improve numerical stability in Curve Spiral node 2021-09-30 12:33:34 +02:00
ae0f944a57 Fix Cycles viewport flickering
Caused by a lack of synchronization between update process which sets
clear flag and the draw code checking the flag outside of a lock.
2021-09-30 11:44:53 +02:00
5f632f9f6e Fix color width regression in 66e24ce35b
Color buttons were drawing single icon width.
2021-09-30 17:45:09 +10:00
af13168a3f Cleanup: remove unused SpaceImage.curtile 2021-09-30 17:32:48 +10:00
e9dac3eab8 Cleanup: reduce Sequence size by 8 bytes
Also use int8_t for color tag.
2021-09-30 17:32:46 +10:00
fdcae48663 Cleanup: isolate UDIM parameters into a struct
Passing multiple UDIM arguments into the packing function
is awkward especially since the caller may not be using UDIM.

Use an argument to store UDIM packing parameters which can be NULL,
which operates without any UDIM support.

Add a function that extracts these parameters out of the image space
allowing for multiple functions to take UDIM parameters in the future.
2021-09-30 17:32:45 +10:00
2c2516bfc9 Fix(unreported): LineArt curve objects garbled result.
This is caused by line art loading curve objects twice from curve and converted mesh instances (when instance option is on). Now only load mesh when instance option is on.
2021-09-30 13:30:10 +08:00
66e24ce35b Fix menu width regression in c7d94a7827
Icon only popup buttons needed to be adjusted too,
add an uiTextIconPadFactor.icon_only to support this.
2021-09-30 12:43:52 +10:00
84dcf12ceb UI: Increase Area Resize Edge Hit Size
This patch increases the size of the Area BORDERPADDING a bit to make
it easier to hit the edges when initiating area resizing.

See D11925 for details and examples.

Differential Revision: https://developer.blender.org/D11925

Reviewed by Campbell Barton
2021-09-29 15:05:58 -07:00
81f552e9ad Geometry Nodes: Expose Bezier handle positions as an attribute
This commit exposes left and right bezier handles as an attribute.
Interaction basically works like edit mode. If you move an aligned
handle, it also moves the opposite handle of the control point.
The difference is that you can't edit "Auto" or "Vector" handles,
you have to first use the "Set Handle Type" node. That gives the handle
types a bit more meaning in the node tree-- changing them in edit mod
is more like a "UI override".

The attributes are named `handle_start` and `handle_end`,
which is the same name used in the curve RNA API.

A new virtual array implementation is added which handles the case of
splines that don't have these attributes, and it also calls two new
functions on `BezierSpline` to set the handle position accounting
for aligned handles.

The virtual arrays and attribute providers will be refactored
(probably templated) in the future, as a next step after the last
built-in curve attribute provider has landed.

Differential Revision: https://developer.blender.org/D12005
2021-09-29 15:29:29 -05:00
cd03f5b6e5 Add RNA path funcs for VolumeRender & VolumeDisplay
Without proper RNA paths, Alt-click editing properties on multiple
selected objects doesn not work (as well as the 'Copy To Selected'
operator).

Fixes T91806.

Maniphest Tasks: T91806

Differential Revision: https://developer.blender.org/D12700
2021-09-29 21:39:56 +02:00
0cddbcf1d7 Fix T91803: Freestyle rendering as pass broken after recent changes 2021-09-29 21:14:37 +02:00
a2e321aa6d Cleanup: Compositor: Migrate most converter nodes to new socket builder
This migrates most nodes except for the switch view node.
This node requires dynamic sockets so its implementation will be more involved.
2021-09-29 14:48:30 -04:00
d3d021601d Nodes: Fix Split View missing identifier
Mistake from rB84251acfcc4534059d6ccd6682f8e37d529b9063
2021-09-29 14:48:30 -04:00
19785cb022 Fix Cycles CPU performance regression after recent change for intersections size
This struct is much bigger now, and does not actually need to be fully zero
initialized.
2021-09-29 20:25:16 +02:00
Himanshi Kalra
22c61e8060 Tests: Disable tests for non-compiled libraries
This diff disables tests for Boolean, subdivision surface and volume
when GMP, Opensubdiv and Openvdb are not compiled respectively.
It also changes the existing file structure and adds sub-folders for
boolean and subdivison tests. The volume folder only has one test and
is as unchanged structure-wise.

Reviewed By: JacquesLucke, LazyDodo

Differential Revision: https://developer.blender.org/D12448
2021-09-29 23:52:52 +05:30
6f23e4484d Fix non-finite curve normal causing Cycles to crash
Similar to the previous change in the area: need to avoid ray
point and direction becoming a non-finite value.

Use the view direction when the geometrical normal can not be
calculated.

Collaboration and sanity inspiration with Brecht!

Differential Revision: https://developer.blender.org/D12703
2021-09-29 19:49:59 +02:00
1d478851f8 GPencil: Avoid double depsgraph tag
The eval data is updated at the end of the function and this call just adds a calculation not used.
2021-09-29 19:21:54 +02:00
214baf5422 Assets: Enable recursive reading for the asset view template as well
This makes asset view templates, e.g. as used by the Pose Library add-on
use recursive asset loading, see

Also works around an issue that made assets not show up at all anymore
since fc7beac8d6. I'm creating a separate report for that regression,
but in the Asset Browser and Viewer it shouldn't be apparent currently.
2021-09-29 18:51:10 +02:00
45a312fd8f Fix build failure on Windows + wrong buffer size
Was using `NAME_MAX` which is defined in `limits.h`, which again may be
implicitly included by the compiler. Intend was to use the Blender
define `MAX_NAME`.
2021-09-29 17:49:42 +02:00
9d9f205dc4 Asset Browser: Initial Asset Catalog UI
The Asset Browser now displays a tree with asset catalogs in the left
sidebar.
This replaces the asset categories. It uses the new UI tree-view API
(https://wiki.blender.org/wiki/Source/Interface/Views#Tree-View).
Buttons are displayed for adding and removing of catalogs. Parent items
can be collapsed, but the collapsed/uncollapsed state is not stored in
files yet.
Note that edits to catalogs (e.g. new or removed catalogs) are only
written to the asset library's catalog definition files when saving a
.blend.

In the "Current File" asset library, we try to show asset catalogs from
a parent asset library, or if that fails, from the directory the file is
stored in. See adaf4f56e1.

There are plenty of TODOs and smaller glitches to be fixed still. Plus a
UI polishing pass should be done.

Important missing UI features:
* Dragging assets into catalogs (WIP, close to being ready).
* Renaming catalogs
* Proper handling of catalogs in the "Current File" asset library
  (currently not working well).

The "Current File" asset library is especially limited still. Since this
is the only place where you can assign assets to a catalog, this makes
the catalogs very cumbersome in general. To assign an asset to a
catalog, one has to manually copy the Catalog ID (a random hash like
number) to the asset metadata through a temporary UI in the Asset
Browser Sidebar. These limitations should be addressed over the next few
days, they are high priority.

Differential Revision: https://developer.blender.org/D12670
2021-09-29 17:15:23 +02:00
df9120b365 Fix T89864: Adding an asset referencing other objects adds it to scene but only adds data-blocks of referenced objects.
Link/append code needs proper access to scene/view3d data to handle
collections/objects instantiation.

Note that this is a temporary hack more than a proper fix, which would require
a deeper redesign of drag&drop code.

Also note that this will not handle 'properly' (i.e. as user would
expect it) cases like implicitely appended parent objects, in that only
the explicitely appended object will be dropped to the nes location, the
others will remain at their original coordinates.

Differential Revision: https://developer.blender.org/D12696
2021-09-29 17:10:42 +02:00
6aac892fad Cleanup: Enforce C linkage for internal File Browser header
This will be used by C++ code in the upcoming asset catalog UI commit.
2021-09-29 16:59:48 +02:00
367775ac6a Fix Cycles use of uninitialized value in volume stack intersection on CPU
Could cause an actual bug but probability is low in practice.
2021-09-29 16:37:32 +02:00
4d4113adc2 Cycles: record large number of transparent shadow intersections on CPU
So we can do fewer intersection calls, only on the GPU do we need to save
memory and do this in small steps.

Ref T87836
2021-09-29 16:37:32 +02:00
fe070fe33b Fix Cycles crash in certain hair configurations
The issue was caused by hair shader setup setting normal to a non
finite value, which then gets used to create a ray with non-finite
direction, making BVH traversal to run out of stack memory.

Happens with 150_0040_A.lighting.blend frame 112 of the Sprites
project.

Differential Revision: https://developer.blender.org/D12692
2021-09-29 16:14:23 +02:00
901fa96b7f Keymap: New preference to open folders on single click in file browser
Introduce a new keymap preference to navigate into folders by clicking on them once instead of twice.
Makes browsing folders faster albeit non-standard, so keeping this off by default for now.

Does not affect Industry Compatible or other keymaps.

It is still the possible to right-click to open context menu, hold Ctrl or Shift to select multiple items:

{F10651030, size=full}

----

Keymap preference:

{F10652759, size=full}

Part of T91537

Reviewed By: fsiddi, campbellbarton

Differential Revision: https://developer.blender.org/D12667
2021-09-29 15:57:57 +02:00
1f4545dc9c Cleanup: else-after-return 2021-09-29 15:48:16 +02:00
b80ed8396d Fix T89164: Sculpt "Smooth" brush crash with zero pressure
Caused by {rB3e5431fdf439}

Issue is that sculpting could start with using `SCULPT_smooth` and
(because of the Pressure sensitivity dropping to zero) code would switch
to `SCULPT_enhance_details_brush` at strength zero.
Issue with this though is that this can be in the middle or end of a
stroke and the necessary `ss->cache->detail_directions` are only
initialized for the first brush step (see
`SCULPT_stroke_is_first_brush_step` in `SCULPT_enhance_details_brush`).
With these missing, it could only go downhill from there.

Suggest to prevent the "mode-flip" from `SCULPT_smooth` to
`SCULPT_enhance_details_brush` (happening solely because of pressure
strength) by changing the condition.
Now do `SCULPT_enhance_details_brush` only if strength is **below** zero
and `SCULPT_smooth` else (flipping from enhance_details to regular smooth
is fine).

If inverting the brush in the middle of the stroke will be supported at
some point, the codepath of `smooth` would have to inform the cache that
invert changed and detail_directions would have to be initialized then
(even if not at the start of the stroke).

Maniphest Tasks: T89164

Differential Revision: https://developer.blender.org/D12676
2021-09-29 15:07:45 +02:00
c33a005297 Texts in Outliner dont activate
Texts in Outliner dont activate on selecting (Text Editor did not change
to selected text) which is a bit inconsistent to other ID types.

ref T90862

Maniphest Tasks: T90862

Differential Revision: https://developer.blender.org/D12412
2021-09-29 15:07:13 +02:00
ef29bf9023 Assets: Expose option to reuse data-block data when appending
With 794c2828af & f48a4aa0f9 it's possible to reuse possibly
expensive, nested data of a data-block when appending. E.g. the
texture of a material, or the mesh of an object. Without this it's easy
to bloat memory and the file size. Duplicated textures also cause
unnecessary shader recompilations.

The feature was intended to be the new default behavior for the Asset
Browser, but it wasn't actually added to the UI yet. This patch adds a
new import type option to the Asset Browser. So from the menu in the
header, you can now choose between:

* Link
* Append
* Append (Reuse Data)

The latter is the new default.

Maniphest Task: https://developer.blender.org/T91741

Differential Revision: https://developer.blender.org/D12647

Reviewed by: Sybren Stüvel, Bastien Montagne
2021-09-29 15:02:38 +02:00
Falk David
5cebcb415e VSE: Add color tags to strips
This patch adds color tags to VSE strips, an overlay option to toggle the colors
on and off, a section in the theme settings to define the 9 possible colors and
two ways of changing the color tag through the UI. You can change the color
through the right-click context menu, or in the strip side panel next to the
strip name.

Color tags are defined in user preferences and they can be disabled in overlay
settings.

Reviewed By: campbellbarton, ISS

Differential Revision: https://developer.blender.org/D12405
2021-09-29 14:34:01 +02:00
ffb9577ac9 Cycles: Ensure finite displacement and background evaluation
Avoids possible numerical issues in the path tracing kernel, which
is most important for displacement as non-finite values in BVH can
lead to infinite node recursion during traversal.

Differential Revision: https://developer.blender.org/D12690
2021-09-29 14:06:10 +02:00
731325a022 Cycles: Make sure GPU transfer is finished prior display update
Noticed while looking into flickering issues in viewport.

Doesn't seem to solve the flicker issue for me, but is something
what is supposed to be happening anyway.

Differential Revision: https://developer.blender.org/D12673
2021-09-29 14:05:51 +02:00
adaf4f56e1 Support loading catalogs in the Current File asset library
When the Asset Browser shows the "Current File" asset library, now it
also attempts to load an asset catalog definition file from location of
the current .blend file. This happens as follows:
* First, see if the file is inside of an asset library that is "mounted" in the
  Preferences. Load the catalogs from there if so.
* Otherwise, if the file is saved, load the catalogs from the directory the
  file is saved in.
* If the file is not saved, no catalogs will be loaded.

Unit tests are being worked on in D12689.

Creating catalogs from the "Current File" asset library still doesn't work, as
the asset catalog service doesn't construct an in-memory catalog definition
file in that case yet.

Differential Revision: https://developer.blender.org/D12675
2021-09-29 13:18:44 +02:00
Gaia Clary
78b9a8c7b9 Add an option to silence bpy.ops.anim.keyframe_delete_v3d when used in Addons
The issues:

1.) When we want to remove keyframes from a range of frames in an
action,
then we can use bpy.ops.anim.keyframe_delete_v3d to remove the keys
frame by frame. However, whenever the operator hits a frame with no
keyframes, then it generates an error. While when it hits a frame
with keyframes, then it reports the numbner of removed keys.

This creates a lot of unnecessary noise in the Blender console.

2.) Furthermore a related issue is that WM_event_add_notifier() is
called also when no frames where removed. This seems to significantly
slow down the removal of keyframes in a range of frames at least
when i use vscode for debugging.

A proposal for improvement:
This patch adds an attribute 'confirm_success' which controls
if the operator reports back what it did (or did not) while
executing. Silent mode would then be called like this:

bpy.ops.anim.keyframe_delete_v3d(confirm_success=False)

Note: confirm_success is True by default so this patchj does not
change the behavior of Blender, it only gives the option to scripts.

3.) Personal note:
I have chosen the attribute name to be equal as it is used
in other related operators. However i rather would rename the
attribute to "verbose" (preferred) or "with_confirm". But i let this
to be decided by the reviewers. Thanks for your time to review!

Reviewed By: campbellbarton

Differential Revision: https://developer.blender.org/D12629
2021-09-29 13:09:17 +02:00
6351c73b75 Fix T88954: Rearranging of modifiers for linked objects no longer works.
There would be no modifier set in context in drag and drop case, in that
case try to get active modifier from active object instead.
2021-09-29 12:53:57 +02:00
0c32e3b312 Fix T91756: String to Curve node produces NaN when size is zero 2021-09-29 12:40:36 +02:00
4cf4bb2664 UI: swap tool and regular header
Swap the tool-header and header order so the tool-header
so the header is always next to the window edge.

Note that files saved in 3.0 will have overlapping headers when opened
in any version of Blender before this commit.

Reviewed By: Severin, fsiddi

Maniphest Tasks: T91536

Ref D12631
2021-09-29 20:03:47 +10:00
eabb134840 Fix T91802: vertex color layer name collides with itself
Vertex colors are already included in `mesh.attributes`. So they don't
have to be added to the collision checks separately.
2021-09-29 11:01:17 +02:00
24a965bb16 Fix T91767: crash when instancing geometry coming from modifier
The issue was that the `GeometrySet` that comes from the modifier does
not have full ownership of the mesh for legacy reasons to avoid copies.
Calling `ensure_owns_direct_data` makes sure that the geometry set
actually owns the geometry so that it can be instanced.
2021-09-29 10:55:32 +02:00
f51bef75f4 Geometry Nodes: instance on points in instances
For consistency with other nodes, we also want to process
all instances in the Points input independently. This allows
for more efficient instancing of many objects but also leads
to nested instancing. All instances are processed in their
local space, so that instances don't have to be realized.

Differential Revision: https://developer.blender.org/D12660
2021-09-29 10:10:12 +02:00
8cbec0beb2 Fix/bypass doversion for wire new colors
Some cases were not covered by the original doversion in
4a0ddeb62b. For now we make it always change the wire color regardless
of whether it was changed before or not.

This do a subversion bump.
2021-09-29 09:58:55 +02:00
Siddhartha Jejurkar
a285299ebb UV: Pack to closest/active UDIM
Implements T78397
Extends the functionality of pack islands operator to allow packing UVs
to either the closest or active UDIM tile.
This provides 2 new options for packing UVs :

* Closest UDIM: Selected UVs will be packed to the UDIM tile they were
  placed on. If not present on a valid UDIM tile, the UVs will be packed
  to the closest UDIM in UV space
* Active UDIM: Selected UVs will be packed to the active UDIM image tile
  In case, no image is present in the UV editor, then UVs will be packed
  to the tile on the UDIM grid where the 2D cursor is located.

Reviewed By: campbellbarton

Maniphest Tasks: T78397

Ref D12680
2021-09-29 17:53:09 +10:00
Siddhartha Jejurkar
bf06f76be6 UV Editor: Grid and snapping improvements
Implements T89789, T89792, custom grid (described as dynamic grid in
T78389) and UV grid snapping (T78391)
Replaces the default UV editor grid with 2 new types of grid :

* Custom grid: Allows the user to create an NxN grid, where the value
  of N is specified by the user.
* Subdividing grid: Subdivides the UV editor grid when the user
  zooms in the viewport and vice versa when zooming out.

UV snapping improvements :
* Increment snapping: Increment values for snapping are calculated based
  on which grid type is being used in the UV editor
  (subdividing or custom). In general the increment value is equal to
  the distance between 2 visible grid lines.
* Absolute grid snap: New toggle added to increment snapping option in
  the UV editor, allows UV grid snapping during translation.

Reviewed By: campbellbarton

Ref D12684
2021-09-29 17:48:35 +10:00
008ae26712 Fix T91237: Wrong Editors could sync animation 'Visible Range'
This was reported for the Outliner.

It was possible to set 'show_locked_time' on any space (via python, not
sure if there are other ways to achieve this).
Navigating in an animation editor obviously ruined the layout in certain
Editors that are not made for this.

Now restrict syncing to editors that support it well (the ones that have
this setting exposed in their menus) and prevent setting this in RNA.

Maniphest Tasks: T91237

Differential Revision: https://developer.blender.org/D12512
2021-09-29 09:12:47 +02:00
eddc3f5bc8 Cleanup: Add constructor for AttributeFieldInput 2021-09-28 23:19:33 -05:00
756c22bb41 Cleanup: Compositor: Migrate color nodes to new socket builder 2021-09-29 00:09:41 -04:00
d1220f795f UI: Update Levels Nodes
- Instead of making the enum expanded leave it in menu form.
- Use full words instead of letters.

Differential Revision: https://developer.blender.org/D12686
2021-09-28 21:36:15 -04:00
8e40bb2dea Nodes: Fix compositor viewer nodes having wrong alpha channel
Default should be black image with an alpha value of 1.
2021-09-28 20:33:42 -04:00
84251acfcc Cleanup: Compositor: Migrate most output nodes to new socket builder
This migrates most nodes except for the file output node.
This node requires dynamic sockets so its implementation will be more involved.
2021-09-28 20:13:42 -04:00
e81533b25e Cleanup: Simplify compositor Z Combine operation conversion 2021-09-28 18:46:10 -04:00
f0f70729b1 Fix: Knife undo with no cut segments left
Now if a user presses the knife tool undo key when there are no more cut segments to undo, the operator exits. Previously, it did nothing.
2021-09-28 23:09:54 +01:00
7f5d62dfc6 Cleanup: Make format 2021-09-28 18:03:33 -04:00
4a48482247 Cleanup: Compositor: Migrate most input nodes to new socket builder
This migrates most nodes except for the image/render layer nodes.
These nodes require dynamic sockets so their implementation will be more involved.
2021-09-28 18:00:14 -04:00
960b21e1d7 Cleanup: sort cmake file lists 2021-09-29 07:30:34 +10:00
6dceaafe5a Cleanup: trailing space, newlines at EOF 2021-09-29 07:30:34 +10:00
b524153d61 Cleanup: use C comments for plain text 2021-09-29 07:30:34 +10:00
79290f5160 Cleanup: spelling in comments 2021-09-29 07:29:15 +10:00
efe3a13b55 Cleanup: Removed redundant if macro 2021-09-28 21:57:42 +01:00
9f0a3a99ab Geometry Nodes: Fields version of attribute proximity node
Add a fields-aware implementation of the attribute proximity node.
The Source position is an implicit position field, but can be
connected with a position input node with alterations before use.
The target input and mode function the same as the original node.

Patch by Johnny Matthews with edits from Hans Goudey (@HooglyBoogly).

Differential Revision: https://developer.blender.org/D12635
2021-09-28 15:21:36 -05:00
283d76a70d Compositor: Full frame Glare node
Part of T88150.
2021-09-28 22:00:17 +02:00
0830211c95 Cleanup: Remove XRange and YRange in Compositor
Mostly unused and originally meant for areas with positive values.
With canvas compositing areas position may be negative.
2021-09-28 22:00:17 +02:00
f84fb12f5d Compositor: Add support for canvas compositing
This commit adds functionality for operations that require pixel
translation or resizing on "Full Frame" mode, allowing to adjust
their canvas. It fixes most cropping issues in translate, scale,
rotate and transform nodes by adjusting their canvas to the result,
instead of the input canvas.

Operations output buffer is still always on (0,0) position for
easier image algorithm implementation, even when the
canvas is not.

Current limitations (will be addressed on bcon2):
- Displayed translation in Viewer node is limited to 6000px.
- When scaling up the canvas size is limited to the
 scene resolution size x 1.5 . From that point it crops.

If none of these limitations are hit, the Viewer node displays
the full input with any translation.

Differential Revision: https://developer.blender.org/D12466
2021-09-28 22:00:17 +02:00
76377f0176 Compositor: Replace resolution concept by canvas
This is a code refactor in preparation of supporting canvas
compositing. See {D12466}.

No functional changes, all canvases are at (0,0) position matching
tiled implementation.

Differential Revision: https://developer.blender.org/D12465
2021-09-28 22:00:16 +02:00
2ecd963d87 Fix error in previous commit
`v_other` -> `v_step`
2021-09-28 16:57:49 -03:00
5cdb2aadfc Fix flag wrongly set in 'BM_face_split_edgenet_connect_islands'
Sometimes the `use_partial_connect` option could trigger the assert:
```
BLI_assert(!BM_elem_flag_test(l_iter->v, VERT_NOT_IN_STACK));
```

This can happen when `v_delimit->e` is not part of edgenet, so `v_other` will not have the flag.
2021-09-28 16:54:23 -03:00
Aaron Carlisle
c7a7c3f5e5 Cleanup: convert compositor nodes to c++
- Many cleanups of to use list base
- Some variable changes

These change is needed to migrate to the new socket builder API

Reviewed By: manzanilla

Differential Revision: https://developer.blender.org/D12366
2021-09-28 15:30:23 -04:00
87e315c237 Cleanup: Sort node types alphabetically 2021-09-28 14:20:29 -05:00
Johnny Matthews
85aac0ef6a Geometry Nodes: Field version of curve reverse node
The updated version has a selection input as a field
and does not realize instances implicitly.

Differential Revision: https://developer.blender.org/D12506
2021-09-28 13:53:20 -05:00
cc653c9b02 Fix potential render tests error with invalid utf-8 characters
In general should not happen, but better to report the actual error instead
of the Python test code failing.
2021-09-28 20:50:38 +02:00
95fca22bfe Geometry Nodes: Remove experimental option for fields
This enables fields as the official workflow for geometry nodes.
While many features are converted to use fields rather than the old
attribute workflow, many are not yet converted. In that case, the
unconverted nodes are still accessible with an experimental option.
In the coming weeks the rest of the nodes will be converted.

Differential Revision: https://developer.blender.org/D12672
2021-09-28 13:22:52 -05:00
86ec9d79ec Fix build without Cycles HIP device 2021-09-28 20:00:55 +02:00
e45ffce5fa Geometry Nodes: Use factor slider for distribution density factor
Though the factor isn't so useful to adjust by itself, and is mostly
useful when used with a field connected, the slider from 0 to 1 can
help to make it clear that it's just used as a multiplier for the max
density after distribution.

Differential Revision: https://developer.blender.org/D12654
2021-09-28 12:44:50 -05:00
Erik Abrahamsson
faedfd5740 Fix VS2017 compile error in String to Curves node
Because of a bug in VS2017 codecvt is replaced with Blender
BLI functions to convert from UTF8 to UTF32.

Differential Revision: https://developer.blender.org/D12655
2021-09-28 12:37:56 -05:00
Brian Savery
044a77352f Cycles: add HIP device support for AMD GPUs
NOTE: this feature is not ready for user testing, and not yet enabled in daily
builds. It is being merged now for easier collaboration on development.

HIP is a heterogenous compute interface allowing C++ code to be executed on
GPUs similar to CUDA. It is intended to bring back AMD GPU rendering support
on Windows and Linux.

https://github.com/ROCm-Developer-Tools/HIP.

As of the time of writing, it should compile and run on Linux with existing
HIP compilers and driver runtimes. Publicly available compilers and drivers
for Windows will come later.

See task T91571 for more details on the current status and work remaining
to be done.

Credits:

Sayak Biswas (AMD)
Arya Rafii (AMD)
Brian Savery (AMD)

Differential Revision: https://developer.blender.org/D12578
2021-09-28 19:18:55 +02:00
262b211856 Geometry Nodes: Mesh Point Cloud Conversion Nodes
This commit adds nodes to do direct conversion between meshes and point
clouds in geometry nodes. The conversion from mesh to points is helpful
to instance once per face, or once per edge, which was previously only
possibly with ugly work-arounds. Fields can be evaluated on the mesh
to pass them to the points with the attribute capture node.

The other conversion, point cloud to mesh vertices, is a bit less
obvious, though it is still a common request from users. It's helpful
for flexibility when passing data around, better visualization in the
viewport (and in the future, cycles), and the simplicity of points.

This is a step towards T91754, where point clouds are currently
combined with meshes when outputing to the next modifier after geometry
nodes. Since we're removing the implicit behavior for realizing
instances, it feels natural to use an explicit node to convert points
to vertices too.

Differential Revision: https://developer.blender.org/D12657
2021-09-28 12:14:13 -05:00
797064544e Geometry Nodes: Only show attribute toggle for field inputs
Change the toggle to switch between an attribute and a single value to
only display for inputs that are fields, as determined statically by
the field inferencing added in rB61f3d4eb7c7db7. This means the field
inferencing must be calculated on file load, since it's used in the UI.

Differential Revision: https://developer.blender.org/D12623
2021-09-28 12:05:42 -05:00
44e4f077a9 Geometry Nodes: Run nodes once on unique instance data
As described in T91672, often it can be much more efficient to run each
node only on the unique geometry of the instances, rather than realizing
all instances and potentially processing redundant data. Sometimes the
performance difference can be completely smooth vs. completely unusable.

Geometry nodes used to hide that choice from users by always realizing
instances, but recently we have decided to expose it. So this commit
makes nodes run once per unique reference in the entire tree of nested
instances in their input geometries, continuing the work started in
rB0559971ab377 and rBf94164d89629f0d2. For the old behavior, a realize
instances node can be added before the nodes, which is done in the
versioning code.

Differential Revision: https://developer.blender.org/D12656
2021-09-28 11:36:28 -05:00
b32b38b380 Fix T89400: Possible to delete objects used by overrides of collections.
This should not be allowed in general, added some initial call to check
when user is allowed to delete a data to search for mandatory override
usages...
2021-09-28 18:30:33 +02:00
f35ea668a1 Geometry Nodes: Move more nodes to legacy
- Curve to Points: Needs output sockets
- Curve Endpoitns: Needs the same output sockets
- Edge Split: Should have a selection input instead
- Subdivision Surface: Should not use "crease" implicitly
All new versions of these nodes should also not implicitly
realize instances.
2021-09-28 11:03:12 -05:00
e694165123 VSE: fix versioning code to the new maximum zoom level (128)
Since we bumped the number of channels to 128, I forgot to doversion the
editors. So new files (new editors) would have this right, but not
existing files.

Fixup to: 8fecc2a852
2021-09-28 17:58:41 +02:00
6ee2f2da96 Cleanup: asset catalog, remove obsolete TODO
No functional changes.
2021-09-28 17:47:34 +02:00
10d926cd4a Cleanup: asset catalogs, move file header definition to constant
Define the standard catalog definition file header in a constant,
separating it from the function that writes the entire file.

No functional changes.
2021-09-28 17:47:34 +02:00
3acf3e9e2f Geometry Nodes: don't realize instances in Instance on Points node
Part of T91672.
2021-09-28 17:46:32 +02:00
330a04d7c7 Fix T91393: Duplicating an action with python crashes Blender.
Own mistake when making NLA overridable, instead of assuming things
about the ID owner of the animation data being processed, properly
return and use the one found by `ED_actedit_animdata_from_context`.
2021-09-28 17:41:40 +02:00
53fa4801a0 Fix: Fluid/Cloth/DynamicPaint: Only share pointcaches in CoW case.
Particle copying code was already properly sharing pointcache between
orig data and its copy only when `LIB_ID_COPY_SET_COPIED_ON_WRITE` is
set, do the same for the other point cache users.

Using `LIB_ID_CREATE_NO_MAIN` here is waaaaaaay to much wide scope for
such a dangerous/advanced behavior, that kind of things has to be
strictly restricted in scope.
2021-09-28 17:41:40 +02:00
34ba6968b2 Cleanup: asset catalog service, remove obsolete write_to_disk function
Remove `AssetCatalogService::write_to_disk()` function. It has been
superseded by `write_to_disk_on_blendfile_save()`; the handful of test
functions that called the old function have been adjusted to use the
new one.

No functional changes to Blender itself.
2021-09-28 17:32:58 +02:00
f17ca53cdd Fix wrong update with shadow catcher and transparent film
This change fixes an issue when scene has a shadow catcher and film is
configured to be transparent. Starting viewport render and making the
background non-transparent will cause bad memory access (wrong render
and possibly crash).

Film passes depends on transparency of background, so check for this.

Demo file: F10650585

Differential Revision: https://developer.blender.org/D12666
2021-09-28 17:06:16 +02:00
640c4ace0b Cycles: Disable tile-level denoising
Only do denoising on the full-frame result. Saves render time.

Can re-consider in the future when/if we'll want to support
denoising during rendering (similar to viewport) to allow artists
to stop rendering when they see image to be good enough. Until
there is a design for that workflow stick to a more time efficient
rendering.

Differential Revision: https://developer.blender.org/D12662
2021-09-28 17:05:47 +02:00
ff7e67afd5 Geometry Nodes: Dashed lines for function flow
Use dashes to represent the function flow (while keeping continuous
lines for the data-flow).

It is important to tell both flows apart (the data and the function
 flow). The sockets help with that, the noodles help this further.

The "data flow" is evaluated at every single node. A user can inspect
the output sockets of those nodes and have a glimpse at their values.

The "function flow" (nodes) however is only evaluated in the geometry
nodes. The noodles are not transporting data in the same sense of the
"data flow". All that can be inspected are the attributes the functions
depend on.

Having this clearly communicated should help users to inspect the
nodetrees, read and understand the different flows in the same tree.

---

Known limitations:

At the moment the dash lines are not equidistant:

* It would be nice to get the "uv.x" to be resampled for the bezier curve
so the dashes are equally distributed in the curve.

* Using distance between the P3 and P0 instead of the real bezier curve
length seems to be fine.

---

Full disclaimer:

Changes with that much of a visual impact tend to be controversial. So
far the main feedback is that dashed lines can be associated to broken
link, and that there are better ways to represent the flows (or
different information that should be visually represented).

I'm fully aware of that. However dashed lines are already used in the
viewport and outliner to indicate (hierarchical) relation. Besides,
other approaches (double-lines, having the data flow to be more
distinct, ...) didn't pan out in the end (or didn't look as good as
this).

---

Impact in other editors:

The compositor uses mostly a "data flow" nodetree, so no change is
expected there.

The shader nodetree is one that could but doesn't have to change its
visual language.

The shader nodetree uses mostly "function flow" with some "data flow" nodes.
One can argue that it should be adapted to follow the
same pattern as geometry nodes (with the new noodles and the diamond
sockets). Oh the other hand, a shader nodetree has a single context.
When a node depends on the "UV", there is only one UV at a time for the
entire nodetree. So it can also be treated as a psedo "data flow"
nodetree if we want to avoid too many changes in other parts of Blender.

Differential Revision: https://developer.blender.org/D12602
2021-09-28 17:03:03 +02:00
728ae33f37 Cycles: Improve handling of tile file error
Expose them to the interface, and stop rendering as soon as possible.

Differential Revision: https://developer.blender.org/D12617
2021-09-28 16:58:27 +02:00
52a702468a Asset Catalog Service: add function to change catalog path
Add `AssetCatalogService::update_catalog_path()` to change the catalog
path of the given catalog, and also change the path of all the catalogs
contained within the given catalog.

Rebuilds the tree structure for the UI, but does not save the new catalog
definitions to disk.

No user-facing changes, just backend preparation for UI work.
2021-09-28 16:09:12 +02:00
d2004326a1 Geometry Nodes: remove empty mesh component in distribute node output 2021-09-28 15:52:31 +02:00
3e78c9e5bb Fix T91766: NLA Editor - Segmentation fault on strip resize
NLA and Dope Sheet use a specific transform operation to scale.

Unlike the conventional resize operation, `TIME_SCALE` operates on `td->val`.

This is a bit outside the convention of transform operators.

The expected thing in this case would be to work in `td->loc` and use the conventional resize operator.

But for now, to fix the problem, use `td->loc` in the `TIME_SCALE` operation.

This commit also brings a cleanup in the style of some comments and removing unnecessary `memset`.
2021-09-28 10:24:02 -03:00
3674347849 Assets: Clear Asset operator variants for clearing/setting Fake User
The Clear Asset operator (`ASSET_OT_clear`) now clears the Fake User.
This makes it symmetrical with the Mark Asset (`ASSET_OT_mark`)
operator, which sets Fake User to ensure assets are always saved to
disk.

Clear Asset now also has a `set_fake_user` boolean option, which allows
users to Clear Asset and set Fake User in one go.

The asset browser now shows these options in the context menu:
- Clear Asset: also clears Fake User. This makes it possible to actually
  remove assets from the blend file without leaving the Asset Browser.
- Clear Asset (Set Fake User): keeps the Fake User bit set. This makes
  it possible to "hide" the asset from the asset browser, without
  loosing the actual data.

Internally, the `ED_asset_clear_id(id)` function now always clears the
Fake User bit. If it was intended that this bit was kept set, it's up to
the caller to explicitly call `id_fake_user_set(id)` afterwards.

Manifest Task: T90844

Reviewed By: Severin

Differential Revision: https://developer.blender.org/D12663
2021-09-28 15:04:55 +02:00
73b2ecb297 Cleanup: typo in comment
Noticed by @david_black in D12392, thx!
2021-09-28 14:37:06 +02:00
Vitor Boschi da Silva
9490db1ad2 Cleanup: remove unused variable
As the title says

Reviewed By: lichtwerk, campbellbarton

Differential Revision: https://developer.blender.org/D12665
2021-09-28 14:29:02 +02:00
6a745e54f6 Cleanup: remove incorrect assert
The method works perfectly fine when `resource` is empty.
2021-09-28 13:32:22 +02:00
5d160dec3b Geometry Nodes: move legacy nodes to separate folder
Previously, we were moving them one by one. It's a lot easier
to just move all files at the same time.
2021-09-28 11:31:46 +02:00
b3431a8846 Fix Drivers Editor showing playhead on the X Axis
Currently the Drivers Editor shows this (the blue thing can be dragged
to change frame):
{F10647661}

But the Drivers Editors X axis is the output of the driver [which can be
further tweaked by the curve] not time(frame).
So it seems better to not mix them here, it is just confusing to have
two different units on one axis.
Especially since what we really want to look at in X (the drivers output
value) can be in a totally unrelated range compared to frames, so e.g.
we might be interested in a drivers range from 0.0 to 1.0 and a
framerange of 100 to 200, so putting this on one axis just does not make
sense. Better to use a separate timeline for this.

Note 2.79 also did not do this.

Maniphest Tasks: T91157

Differential Revision: https://developer.blender.org/D12392
2021-09-28 11:14:44 +02:00
6f29801f1b Fix Drivers Editor not hiding vertical part of cursor
When "Show Cursor" is unchecked, the Drivers Editor would still display
the vertical line representing the cursor x.

Probably overseen in {rB65072499c65a} (historically the vertical line
could represent either the current frame of the cursor X in drawing, but
this is now much more separate).

There is no point in seeing part of the cursor in the Drivers Editor if
this is disabled.

Also correct outdated comments.

ref. T91157

Maniphest Tasks: T91157

Differential Revision: https://developer.blender.org/D12391
2021-09-28 11:14:44 +02:00
b91946780c Path util: BLI_path_contains() case-insensitive on Windows
Make `BLI_path_contains()` case-insensitive on Windows. This behaviour
is dependent on the platform Blender is running on, like the rest of
BLI_path, and not on the style of paths (Windows-style paths will be
treated case-sensitively when Blender is running on Linux/macOS).
2021-09-28 11:05:45 +02:00
741fa8180c Fix T91679: Crash when saving bordered render as multilayer exr
The related issue which is fixed by this change is the missing noisy
image pass when denoising and border render is used,

Need to allocate passes after the passes has been copied from the
original render result.
2021-09-28 10:49:01 +02:00
e5ff9f3615 Cleanup: Move VSE channels check into own util function
Differential Revision: https://developer.blender.org/D12661
2021-09-28 10:44:39 +02:00
a64782b133 VSE: Implement sanity check for files with more channels than supported
This is a follow up to 8fecc2a852.

This makes sure future .blend files that have more channels than the
limit won't break Blender.

It can be backported to LTS.

This is part of https://developer.blender.org/D12645

Differential Revision: https://developer.blender.org/D12648
2021-09-28 10:23:13 +02:00
7cd43a9d28 Fix: field inferencing fails when there are undefined nodes 2021-09-28 10:19:16 +02:00
def8fd6330 Cleanup: support moving InstanceReference 2021-09-28 10:17:49 +02:00
e7b9423623 Geometry Nodes: multi-threading when modifying multiple geometry sets
Differential Revision: https://developer.blender.org/D12652
2021-09-28 10:17:00 +02:00
c7d94a7827 UI: avoid excessive padding for labels in headers
Labels in headers reserved space for an icon even when no icon was used.

This is caused by the shared function ui_text_icon_width adding 1.5x
a buttons X-units width the the width of the string.

Menu buttons detected this and subtracted the extra padding.

Instead of adding the same workaround for labels,
add ui_text_icon_width_ex that takes a padding argument.

Add presets for 'default', 'compact' and 'none' to avoid duplicating
padding values.

This allows removal of hard-coded label scaling for the add-object tool.
2021-09-28 14:36:52 +10:00
986d60490c Asset Browser: Allow World assets to be drag/dropped onto the viewport
While World data has always been able to be marked as an asset, there
was no way to actually use them from the asset browser. This change
allows users to drag-drop world assets onto the Viewport and have them
appended/linked to their scene.

Differential Revision: https://developer.blender.org/D12566
2021-09-27 21:00:17 -07:00
c53ffda8a4 Cleanup: Fix incorrect comments 2021-09-27 13:29:53 -05:00
f94164d896 Geometry Nodes: Do not realize instances in the material assign node
Only run the node once for every unique geometry set in the input's
instance heirarchy. This can massively improve performance when
there are many instances, but it will mean that the result is the same
for every instance. For the previous behavior, a "Realize Instances"
node can be used before this one.

This node can be changed without versioning since the old material
assign node was already deprecated and replaced.
2021-09-27 13:22:44 -05:00
5d70a4d7ee Geometry Nodes: Move output attribute names to a subpanel
In a sub-panel it will be clearer that they are outputs, since they
just look like more inputs now. Unfortunately it is not possible to
make sub-panels display conditionally currently, so the output
sub-panel will always be visible whether or not it is empty.

Differential Revision: https://developer.blender.org/D12653
2021-09-27 13:04:58 -05:00
e6aabcae14 Fix part of T91516: Cycles not rendering geometry nodes instances
Part of the fix is by Jacques. This fixes the most obvious case, but it's
still not clear how to deal with non-mesh geometry instances or how to handle
motion blur for such instances.
2021-09-27 19:49:43 +02:00
50b7253257 Cleanup: fix (harmless) uninitialized variable usage 2021-09-27 19:49:43 +02:00
8da23fd5aa Constraints: change default Stretch To rotation type to Swing.
As also explained in D6134, in most case of Stretch To usage in
rigs, it is desirable to use swing rotation, either via the old
method of pairing the constraint with Damped Track, or via the
Swing rotation type introduced in 2.82. This is for instance true
for all usages of the constraint in Rigify.

The reason can be understood by realizing that unlike order-
dependent euler rotations, swing is not biased to an axis, and
isn't affected by gimbal lock effects at merely 90 degrees
of rotation (it has only one singularity at 180 degrees).

Thus it makes sense to change the default for newly created
constraints to the Swing mode. This has no backward compatibility
concerns except for old tutorials and rig generation scripts.

Differential Revision: https://developer.blender.org/D12643
2021-09-27 20:45:03 +03:00
8967bcb755 Update RNA to user manual url mappings 2021-09-27 13:42:41 -04:00
847d355cab File/Asset Browser: Don't deselect other items when dragging
Basically this enables the select-tweaking behavior as per the
guidelines:
https://wiki.blender.org/wiki/Human_Interface_Guidelines/Selection#Select-tweaking.

We use this in most other other editors that allow selecting and
dragging multiple items. But besides the consistency improvement, this
is important if we want to support dragging multiple assets (or files)
in future. We want to support this at least for dragging multiple assets
into an asset catalog for the upcoming asset catalog UI.
2021-09-27 18:52:16 +02:00
c76ccd85be Cleanup: incorrect null check in asset library
Found by clang tidy (P2439).
2021-09-27 18:42:19 +02:00
11bfbc3337 Fix: Incorrect node socket name after recent refactor
Caused by rBc99cb814520480379
2021-09-27 11:21:33 -05:00
c75c08a737 Geometry Nodes: Distribute points once per instance reference
With this commit, the distribute points on faces node runs only
once for every unique mesh in its input. That means if there are
100 instances of the same mesh, it will only run once.

This basically reverts rB84a4f2ae68d408301. The optimization there
didn't end up being worth it in the end, since it complicates code
quite a lot. It's also incompatible with this method of dealing with
instances, and it breaks field evaluation for instances, where we would
have to make sure to handle each instance transform properly otherwise,
evaluating the field separately for every instance.

Differential Revision: https://developer.blender.org/D12630
2021-09-27 11:16:25 -05:00
efa9667c09 Cleanup: Asset catalogs, fix clang-tidy warning
Change `auto &catalog` to `const auto &catalog`.

No functional changes.
2021-09-27 18:14:44 +02:00
10061ee18a Asset Catalogs: write catalogs to disk when saving the blend file
The Asset Catalog Definition File is now saved whenever the blend file
is saved. The location of the CDF depends on where the blend file is
saved, and whether previously a CDF was already loaded, according to the
following rules. The first matching rule wins:

1. Already loaded a CDF from disk? -> Always write to that file.
2. The directory containing the blend file has a
   `blender_assets.cats.txt` file? -> Merge with & write to that file.
3. The directory containing the blend file is part of an asset library,
   as per the user's preferences? -> Merge with & write to
   `${ASSET_LIBRARY_ROOT}/blender_assets.cats.txt`
4. Create a new file `blender_assets.cats.txt` next to the blend file.
2021-09-27 18:14:21 +02:00
90aa0a5256 BKE Preferences: find asset library containing a path
Add `BKE_preferences_asset_library_containing_path(&U, some_path)` that
finds the asset library that contains the given path.

This is a simple linear search, returning the first asset library that
matches. There is no smartness when it comes to nested asset libraries
(like returning the library with the best-matching path), although this
could be a useful feature to add later.
2021-09-27 18:10:47 +02:00
aafbe111fc BLI Path: add function BLI_path_contains()
Add function `BLI_path_contains(container, containee)` that returns true
if and only `container` contains `containee`.

Paths are normalised and converted to native path separators before
comparing. Relative paths are *not* made absolute, to simplify the
function call; if this is necessary the caller has to do this conversion
first.
2021-09-27 18:10:47 +02:00
6578db57cd Fix T91691: Selecting "Remove unused slots" in Materials panel removes slots that are assigned to particle systems/hair.
`BKE_object_material_slot_used` would only check obdata usages, but
particle settings can also (weirdly enough) use objects' material slots.

So now, as its name suggests, `BKE_object_material_slot_used` does take
an object as parameter, and also checks for potential slot usage from
psys in the object.
2021-09-27 18:09:20 +02:00
824733ea47 Assets: Additions/fixes to the catalog system in preparation for the UI
* Fixes missing update of the catalog tree when adding catalogs.
* Adds iterators for the catalogs, needed for UI code.
* Store catalog ID in the catalog tree items, needed for UI code.
* Other smaller API additions for the UI.
* Improve comments and smaller cleanups.

New functions are covered with unit tests.

Differential Revision: https://developer.blender.org/D12618

Reviewed by: Sybren Stüvel
2021-09-27 17:49:31 +02:00
5bea5e25d5 Fix T91728: Cycles render artifacts with motion blur and object attributes 2021-09-27 17:40:03 +02:00
0559971ab3 Geometry Nodes: add utility to process all instances separately
This adds a new `GeometrySet::modify_geometry_sets` method that can be
used to update each sub-geometry-set separately without making any
instances real.

Differential Revision: https://developer.blender.org/D12650
2021-09-27 17:35:45 +02:00
2189dfd6e2 Cycles: Rework OptiX visibility flags handling
Before the visibility test against the visibility flags was performed in an any-hit program in OptiX
(called `__anyhit__kernel_optix_visibility_test`), which was using the `__prim_visibility` array.
This is not entirely correct however, since `__prim_visibility` is filled with the merged visibility
flags of all objects that reference that primitive, so if one object uses different visibility flags
than another object, but they both are instances of the same geometry, they would appear the same
way. The reason that the any-hit program was used rather than the OptiX instance visibility mask is
that the latter is currently limited to 8 bits only, which is not sufficient to contain all Cycles
visibility flags (12 bits).

To mostly fix the problem with multiple instances and different visibility flags, I changed things to
use the OptiX instance visibility mask for a subset of the Cycles visibility flags (`PATH_RAY_CAMERA`
to `PATH_RAY_VOLUME_SCATTER`, which fit into 8 bits) and only fall back to the visibility test any-hit
program if that isn't enough (e.g. the ray visibility mask exceeds 8 bits or when using the built-in
curves from OptiX, since the any-hit program is then also used to skip the curve endcaps).

This may also improve performance in some cases, since by default OptiX can now perform the normal
scene intersection trace calls entirely on RT cores without having to jump back to the SM on every
hit to execute the any-hit program.

Fixes T89801

Differential Revision: https://developer.blender.org/D12604
2021-09-27 17:12:43 +02:00
7270ba011c Fix two issues with recent new Append code.
* ID pointer returned by `wm_file_link_append_datablock_ex` was
  improperly extracted from `WMLinkAppendDataItem` before append step.

* Code deleting linked IDs when their local matching version was re-used
  did not properly clear `LIB_TAG_DOIT` beforehand.
2021-09-27 16:36:07 +02:00
4a0ddeb62b Nodes: Change wire color to increase contrast
If the theme used by the user did not touch the wire or the wire outline
colors this will update them as well.

This was supposed to be a part of a bigger UI theme change for 3.0. But
it was expedited because of the recent change in line thickness for the
noodles (2bd0205215).

Theme change by Pablo Vazquez.

Differential Revision: https://developer.blender.org/D12649
2021-09-27 16:09:40 +02:00
3d2ce25afd Geometry Nodes: support creating new attributes in modifier
This patch allows passing a field to the modifier as output. In the
modifier, the user can choose an attribute name. The attribute
will be filled with values computed by the field. This only works
for realized mesh/curve/point data. As mentioned in T91376, the
output domain is selected in the node group itself. We might want
to add this functionality to the modifier later as well, but not now.

Differential Revision: https://developer.blender.org/D12644
2021-09-27 15:39:47 +02:00
43167a2c25 Fix T90570: Constraint validity not updated with library overrides.
For some reason was assuming setting a property in RNA would call its
update callback if any, but this actually needs to be done separately.

So add this call to `rna_property_override_operation_apply`.
2021-09-27 15:33:58 +02:00
5949d598bc RNA: Make is clear that Scene parameter of update callback may be NULL.
There are cases where there is no way to ensure we do have/know about an
active scene. Further more, this should not be required to perform
'real' updates on data, only to perform additional special handling in
current scene (mostly related to editing tools, UI, etc.).

This pointer is actually almost never used in practice, and half of its current
 usages are fairly close to abuse of the system (like calls to
`ED_gpencil_tag_scene_gpencil` or `BKE_rigidbody_cache_reset`).

This commit ensures that the few places using this 'active scene' pointer are
safely handling the `NULL` case, and clearly document the fact that a
NULL scene pointer is valid.
2021-09-27 15:33:58 +02:00
b077f0684e RNA: Fix bad usages of scene pointer in Update callbacks.
Scene passed to the update callback is the active scene it //may// not
be that actual ID owner of the affected data (although in practice it
should always be currently).
2021-09-27 15:33:58 +02:00
a6b53ef994 Cycles: print name of kernels on errors in CUDA queue, for debugging 2021-09-27 15:24:12 +02:00
2bd0205215 Geometry Nodes: make field links thinner than other links
This makes it easier to spot which links contain fields and which
contain data. Actually, the patch makes all other links a bit thicker.
However, with soon-to-be-implemented theme changes, the
perceived thickness will be the same as before.

This is part of T91563.

Differential Revision: https://developer.blender.org/D12646
2021-09-27 15:10:28 +02:00
8fecc2a852 Increase VSE strip channels limit from 32 to 128
The original limit dates back from 2002 when Blender went open source.
After that many years some productions (e.g., Sprite Fright) are already
experiencing limitations for complex edits.

The future plans is to support an initial shorter (2?) number of
channels with support to "unlimited" channels.

Finally, I'm bumping the minimum file requirement since files with more
than 32 channels won't work well in old Blender versions.

In a future commit I will implement a sanitization so that we only read (and write)
128 channels. Making sure future changes of this number won't corrupt Blender.

Differential Revision: https://developer.blender.org/D12645
2021-09-27 14:58:03 +02:00
Peter Fog
0419c74ae8 VSE: Clamp resulting frame in multiply mode
The clamp added will ensure immediate speed direction change on
changing to/from positive/negative speed factor when using the Speed
effect strip's Multiply mode.

Reviewed By: ISS, sergey

Differential Revision: https://developer.blender.org/D12462
2021-09-27 14:41:56 +02:00
4a562f5077 Fix T91714: Cycles direct/indirect clamp distinction not working correctly 2021-09-27 14:25:22 +02:00
2a0db195c9 Fix viewport roll working wrong
Mistake in own {rB69893ef27c91}.
Was mixing screen on region coordinates.
2021-09-27 14:19:38 +02:00
e87783a5ec Cleanup: spelling in comments 2021-09-27 21:04:34 +10:00
95af9317f0 Cleanup: remove unnecessary use of MEM_SAFE_FREE macro 2021-09-27 21:04:34 +10:00
2c2e1b3d61 Fix T81922: Pose bones F-Curves hidden for unselected objects
Whilst in pose-mode, the selection filter only includes other objects in
pose-mode instead of the object selection.

This makes sense as the selection of the pose bones what the user as
acting on in the 3D view.
The object selection only makes sense to use in object mode.

Reviewed By: sybren

Maniphest Tasks: T81922

Ref D12494
2021-09-27 21:04:34 +10:00
32ffb858d6 Keymap: resolve conflict with use_alt_cursor/Ctrl-LMB to add/extrude
While the option allows tools be be activated on press instead of tweak,
this meant box-select was catching Ctrl-LMB which is also used for
add/extrude in edit mode.

Resolve this by always using tweak for selection tools,
only supporting activation on press for other tools.

Note that this doesn't impact the default configuration.
2021-09-27 21:04:34 +10:00
c618075541 Cleanup: make format 2021-09-27 12:43:54 +02:00
a13b9d20b5 Cleanup: Move asset library remove function next to add function
Better to keep such related operations close together in code.
2021-09-27 12:32:21 +02:00
d90f542b04 Cleanup: Remove function declaration without definition
There is no function definition for this declaration. Instead there is
`BKE_preferences_asset_library_remove()`.
2021-09-27 12:31:06 +02:00
5d5504d8a4 3DView: ability to cancel out of viewport roll
This adds the ability to cancel out of the roll using ESC or RMB
(which is not common for viewops -- but makes sense in the case of roll
I think). This resets the view as well as potential locked cameras to
the original orientations (but does not remove potential autokeys --
which no transform does on cancel btw.)

Maniphest Tasks: T89883

Differential Revision: https://developer.blender.org/D12582
2021-09-27 11:54:25 +02:00
69893ef27c 3DView: Use "real" angle for viewport roll
Since its introduction in {rB5c569d227b64}, the view roll was based on
horizontal movement only. Using a "real" angle not only feels more
natural but also has the benefit of getting more precission the further
away from the center you are (just like regular rotation, brush/stencil
rotation etc.). A similar thing has already been implemented in the
Grease Pencil Tools Addon, now make the blender standard roll the same.

Since this is not using the transform system, we are still lacking a
line in the viewport (this could be added but since this is always based
on the center of the view we dont necessarily need this), as well as the
additional Shift-extra-precission behavior.

Fixes T89883

Maniphest Tasks: T89883

Differential Revision: https://developer.blender.org/D12582
2021-09-27 11:53:26 +02:00
8dcddbcc07 Fix T91711: Blender 3.0 - The Rain demo scene breaks (Proxy to Override auto conversion).
Proxy conversion is a fairly particular case of liboverride creation, in
which remapping all local usages of linked data probably makes more
sense, rather than only doing so whitin the overridden 'group' of IDs.
2021-09-27 11:11:43 +02:00
30ef197c7b Kernel: allow unregistering BKE callback functions
Introduce `BKE_callback_remove()`, which undoes the effect of
`BKE_callback_add()`. It also respects `funcstore->alloc` by freeing the
removed `funcstore` when needed.

This allows for shorter-lived objects in memory to unregister their
callbacks at the end of their lifespan.

`BKE_callback_global_finalize()` has been adjusted so that the
responsibility "remove a callback" is given to one function only.

Reviewed by: campbellbarton

Differential Revision: https://developer.blender.org/D12625
2021-09-27 10:57:23 +02:00
617954c143 Geometry Nodes: new Instance on Points node
This adds a new Instance on Points node that is a replacement
for the old Point Instance node. Contrary to the old node,
it does not have a mode to instance objects or collections
directly. Instead, the node has to be used with an Object/
Collection Info to achieve the same effect.

Rotation and scale of the instances can be adjusted in the node
directly or can be controlled with a field to get some variation
between instances.

The node supports placing different instances on different points.
The user has control over which instance is placed on which point
using an Instance Index input. If that functionality is used, the
Instance Geometry has to contain multiple instances that can are
instanced separately.

Differential Revision: https://developer.blender.org/D12478
2021-09-27 10:17:17 +02:00
547f7d23ca Geometry Nodes: support outputting collection children as instances
This adds two new input sockets to the Collection Info node:
* `Separate Children`: When turned off, the entire collection is output as a single
  collection instance (same behavior as before). When turned on, each child of
  the collection is output as a separate instance (children can be objects and collections).
  Toggling this input should not change the visual transforms of the output geometry.
* `Reset Children`: Only used when `Separate Children` is on. When used, the transforms
  of the instances are reset to the origin. This is useful when one wants to e.g. instance
  the collection children separately in the upcoming instancing node.

Part of D12478.
2021-09-27 10:17:17 +02:00
William Leeson
f3ace5aa80 Fixes T91632 by stopping the sample correlation between dimensions which was causing rendering artifacts on simple scenes.
Fix T91632: Stops the sample correlation between dimensions which was causing rendering artefacts on simple scenes.

This is done by increasing the amount of jitter the Cranley Patterson Rotation is allowed to add. Also, it uses the y dimension of the of the sample table for 1D sampling which causes further decorrelation between dimensions. As an additional measure the x and y dimensions are swapped randomly to provide further decorrelation.

Maniphest Tasks: T91632

Differential Revision: https://developer.blender.org/D12610
2021-09-27 09:54:37 +02:00
fe49904646 Fix: wrong socket shape in Vector input node 2021-09-27 09:34:17 +02:00
ddb0dc2527 Fix knife tool missing refresh changing the lock axes 2021-09-27 16:28:01 +10:00
10a26d583d Fix knife tool using an invalid event value check
The events value was checked without checking the expected modal state.
2021-09-27 16:28:01 +10:00
Lasse Foster
037e66999a FIX: T91697 Eevee Generated texture coordinates completly missing
This patch fixes Eevee-regression https://developer.blender.org/T91697

Reviewed By: jbakker

Differential Revision: https://developer.blender.org/D12634
2021-09-27 08:07:23 +02:00
d2dda0e8b9 Fix T91694: VSE crashes when creating new scene
Crash happened due to NULL dereference. Add NULL checks.
2021-09-27 06:58:07 +02:00
Peter Fog
ad3e5d2bf5 VSE: Expose Zoom to Fit in all display modes
Zoom to Fit is working in all display modes, but was only exposed in
Image mode. This patch exposes it in all Image modes.

Reviewed By: ISS

Differential Revision: https://developer.blender.org/D12632
2021-09-27 04:40:05 +02:00
f9e0981976 Fix T91666: Missing pivot point in VSE
Add pivot point setting for combined timeline and preview mode.
2021-09-27 04:20:37 +02:00
d046a1f2fa Functions: fail early when multi-function throws an exception
Multi-functions are not allowed to throw exceptions that are not
caught in the same multi-function. Previously, it was difficult to
backtrack a crash to a previously thrown exception.
2021-09-26 23:28:14 +02:00
1cd8a438bb Cleanup: simplify field evaluation 2021-09-26 23:28:14 +02:00
88a2b054da Fix T91732: crash in Set Position node on empty mesh 2021-09-26 23:28:14 +02:00
93b36fad68 Knife: Expose XYZ axis locking in modal keymap
A small quality of life improvement that will allow users to change the keys used for axis locking.
2021-09-26 22:18:24 +01:00
80f7bc6d8e LineArt: Smooth tolerance value for chaining.
smooth out jaggy lines with a given threshold. For each point in a stroke, the ones with in a given distance of its previous segment will be removed,
thus "zig-zag" artefacts can be cleaned up.

Reviewed By: Antonio Vazquez (antoniov)

Differential Revision: https://developer.blender.org/D12050
2021-09-26 19:43:12 +08:00
505422220d Cleanup: use override/final for derived classes.
This will help detecting missing API changes. Those keywords were added
on classes which did not already use them. Also added missing
`accepts_object_type()` on NURBS reader.
2021-09-25 09:31:00 +02:00
43394e41a8 Fix Alembic point cloud streaming.
Point clouds are not imported and read anymore. This was caused by an
API change in rB128eb6cbe928e58dfee1c64f340fd8d663134c26 which was not
applied to `AbcPointsReader`. It did not cause a compile error as the
base class as a default implementation for this method.
2021-09-25 09:31:00 +02:00
ab09844be8 Cleanup: typos in code and comments.
No functional changes.
2021-09-25 09:31:00 +02:00
a3027fb094 Fix T91662: VSE Image overlay is drawn for backdrop
Don't draw image overlay in timeline, image manipulation only works in
preview.
2021-09-24 21:18:03 +02:00
b314d3e787 Fix T91639: Cycles crash rendering high resolution images with multiple passes
We were writing large 2048x2048 tiles into EXR files, which appears to cause
integer overflow inside the OpenEXR library when there are multiple passes. Now
use smaller tiles in the image file, while still rendering large tiles.

This adds the requirement that the render tile size must be a multiple of 128
or be smaller than 128, this is adjusted automatically.
2021-09-24 21:08:14 +02:00
2dd3968335 Geometry Nodes: Add versioning and legacy warning for random float node 2021-09-24 14:06:41 -05:00
Johnny Matthews
536f9eb82e Geometry Nodes: Random Value Node
This node replaces the deprecated Attribute Randomize node, populating
a vector, float, integer or boolean field with random values. Vector,
float, and integer have min/max settings, which are also field aware.
The boolean type has a probability value for controlling what portion
of the output should be true. All four types have a field seed input
which is implicitly driven by the index, otherwise, all values would
be the same "random" value.

The Random Float node is now deprecated like other nodes, since it is
redundant with this node.

Differential Revision: https://developer.blender.org/D12603
2021-09-24 14:03:42 -05:00
c87e6b23be Assets: Read catalogs immediately when loading a library
Until now, the asset catalogs would only show up after all assets from
the library were loaded. Now the catalogs are read first, which makes
them appear pretty much immediately. This makes the UI more responsive
and feel less heavy.

I added a dedicated file-list type for asset libraries now. While not
necessarily needed, I prefer that so asset library specific stuff can be
handled in there.
2021-09-24 20:17:04 +02:00
Erik Abrahamsson
be16794ba1 Geometry Nodes: String to Curves Node
This commit adds a node that generates a text paragraph as curve
instances. The inputs on the node control the overall shape of the
paragraph, and other nodes can be used to move the individual instances
afterwards. To output more than one line, the "Special Characters" node
can be used.

The node outputs instances instead of real geometry so that it doesn't
have to duplicate work for every character afterwards. This is much
more efficient, because all of the curve evaluation and nodes like fill
curve don't have to repeat the same calculation for every instance of
the same character.

In the future, the instances component will support attributes, and the
node can output attribute fields like "Word Index" and "Line Index".

Differential Revision: https://developer.blender.org/D11522
2021-09-24 12:41:49 -05:00
Erik Abrahamsson
5c0017e85a Geometry Nodes: Special string characters node
This patch adds a new node called "Special Characters" with two string
outputs: "Line Break" and "Tab". This is necessary because the newline
character cannot be easily typed with a keyboard, but is necessary for
the string to curve node.

Differential Revision: https://developer.blender.org/D12620
2021-09-24 11:00:21 -05:00
bdb8ee9717 Fix Cycles memory leak in baking, after recent changes 2021-09-24 17:46:34 +02:00
Johnny Matthews
ef45399f3b Nodes: initial support for socket tooltips
This adds initial limited support for socket tooltips. It's limited
in a couple of ways for now:
* Only works when hovering over the socket shape, not when hovering
  over the value in the socket.
* Only works for built-in nodes that already use the new node
  declaration system. This can later be extended to support pynodes.

Those limitations are well worth it for now, given that the
implementation is quite simple and the impact on usability is quite
large. More complex updates to the layout system, that would allow
showing socket tooltips in the nodes, can be done later. With the
current implementation we can at least start writing tooltips for
geometry nodes now.

This commit already adds tooltips for the Cylinder node as an example.

Differential Revision: https://developer.blender.org/D12607
2021-09-24 17:13:40 +02:00
4a2c63f4bd Nodes: hide socket value when input is a field implicitly 2021-09-24 16:42:20 +02:00
90b410fe74 Fix: field evaluation crash when the domain size is zero 2021-09-24 16:42:20 +02:00
25d4de92fa Cleanup: UUID, add some documenting comments
No functional changes.
2021-09-24 16:34:25 +02:00
c0db8e3b41 Fix T91660: Cycles remaining render time does not take into account time limit 2021-09-24 16:34:15 +02:00
585998987a Cycles: some steps towards getting standalone compiling again
Render output and display still need to be rewritten to work with the new
system.
2021-09-24 16:34:14 +02:00
ab8f24811d Cleanup: remove unused device code and includes 2021-09-24 16:34:14 +02:00
9cf593f305 GPencil: Reorganize list of modifiers
As we have a now a new `Modify` column, we move some modifers to this column.
2021-09-24 16:27:58 +02:00
45e432b894 GPencil: Change Proximity distance properties to distance type
This keep consistency with mesh modifer.
2021-09-24 16:16:40 +02:00
ede14b3856 GPencil: Invert weight in Weight Proximity modifier
In meshes modifer the Lowest is 0 and Highest is 1.0 and this was working inverted for grease pencil. Now, it works equals to meshes modifier.

Also changed the tooltip to keep consistency with meshes modifier.
2021-09-24 16:16:40 +02:00
95ec6e4dd3 Geometry Nodes: make index field more reusable
Some inputs will be the index field implicitly, so we want this
class to be available outside of `node_geo_input_index.cc`.
2021-09-24 16:03:07 +02:00
1a1c546124 Fix T91240: Object duplication was duplicating its action twice.
Weird that this was not reported before, this was creating a lot of
extra actions everytime...
2021-09-24 15:25:49 +02:00
e1e380ba38 Asset Catalogs: write catalogs sorted by path & UUID
When writing asset catalog definition files, order the catalogs by
(path, UUID). This ensures that every write produces the same file, playing
nice with versioning / synchronisation systems.
2021-09-24 14:42:48 +02:00
ab9644382d UUID: add less-than operator
Add `operator<` to C++ class to allow lexicographic ordering of UUIDs.

This will be necessary when writing asset catalogs to disk in a predictable
(i.e. ordered) manner.
2021-09-24 14:42:48 +02:00
2b9ca0f112 Codestyle: Add brackets around body of if statements. 2021-09-24 13:38:03 +02:00
7e904139a3 Nodes: make dot in socket shape circular
Previously, it was a diamond shape when the overall
shape was a diamond.
2021-09-24 13:33:28 +02:00
2b5733ff01 Fix T91192: Context.copy() crashes on file load
The `ui_list` lookup from 87c1c8112f
didn't account for the region being unset.
2021-09-24 21:10:02 +10:00
dfe01628b0 Cleanup: old-style-declaration warning 2021-09-24 21:10:01 +10:00
e8c6e32348 Asset Catalogs: fix trailing slash test case
The "path with trailing slash" test catalog was using the same path as
another catalog, which meant it was ignored after doing the path
cleanup. It's now different in the test file in SVN, so it'll actually
show up in the test.
2021-09-24 12:53:14 +02:00
59387aabe8 LibOverride: deprecate Proxies: Remove 'Make Proxy' operator. 2021-09-24 12:40:04 +02:00
501b0190d6 LibOverride: Deprecate Proxies: Add auto-conversion on file load.
This commit also add an experimental userPreferences to prevent proxies
conversions on file load, and reporting for amount of coverted proxies
(and possible issues).

Note that potentially linked proxies from other libraries are not
hamdled here (this feature seems to be broken anyway in master
currently?).
2021-09-24 12:40:04 +02:00
cdcdd2c479 LibOverride: Add utils to convert all proxies to overrides. 2021-09-24 12:40:03 +02:00
e7ae2840a5 Geometry Nodes: new Distribute Points on Faces node
This adds a replacement for the deprecated Point Distribute node.

Arguments for the name change can be found in T91155.
Descriptions of the sockets are available in D12536.

Thanks to Jarrett Johnson for the initial patch!

Differential Revision: https://developer.blender.org/D12536
2021-09-24 11:50:02 +02:00
d8a5b768f0 BLI: expose more noise hash functions in header
This is a follow up of the previous commit.
These functions are useful for other areas of Blender as well.
2021-09-24 11:24:15 +02:00
bc27bafa54 BLI: make noise hash functions available in header 2021-09-24 10:55:23 +02:00
7ca48a3814 Fix: incorrect socket shape for noise texture input 2021-09-24 10:55:23 +02:00
f8a0e102cf Weightpaint Gradient tool: expose falloff to the UI
By default, we'll always get a falloff like this from the tool:
{F10559413}

But in the context of using vertexgroups in modifiers/modeling, a choice
on how the gradient falloff of the Weightpaint Gradient tool is shaped
would be desirable:
"real" linear:
{F10559416}
Custom:
{F10559421}
{F10559428}

The way the Weightpaint gradient tool works is a bit outside the usual
tools that use brushes [even though it creates a brush on the fly in
`WPGradient_userData`].
However, it does not have an entry in `eBrushWeightPaintTool` and adding
one there does not play nice for the same reasons (not "really" being
integrated in the brush-based tools).

So in order to expose the brush curve in the UI, we would have to do one
of the following:
- [1] try to use `VIEW3D_PT_tools_brush_falloff`, for this to work:
-- make all kinds of exception in python super classes [`FalloffPanel`,
`BrushPanel`, `UnifiedPaintPanel`, ... -- including making real entries
in `eBrushWeightPaintTool`] to get a proper tool mode and...
-- .. to also make sure Falloff Shape and Front-Face Falloff are not
available [which the tool seems to just not support in its current form]
- [2] just have a simple, contained panel for this tool alone

This patch implements [2] and adds it as part of the ToolDef (could also
be done in `VIEW3D_HT_tool_header`, but again, I think this is nice to
keep separate from the usual tools)
{F10559482}
{F10559485}

Testfile:
{F10559442}

Fixes T91636

Maniphest Tasks: T91636

Differential Revision: https://developer.blender.org/D12614
2021-09-24 10:52:12 +02:00
3ac342dc6d Cleanup: clang format 2021-09-24 10:45:36 +02:00
9a45a4c525 Cleanup: asset catalogs, fix clang-tidy warning
Remove unnecessary call to `std::string::c_str()`.

No functional changes.
2021-09-24 10:29:52 +02:00
e161f39660 Cleanup: clang-tidy 2021-09-24 10:25:16 +02:00
Jeroen Bakker
fc7beac8d6 FileBrowser: Reduce Overhead Browsing Libraries.
When Browsing libraries the asset files were opened multiple times.
once to determine the needed groups to query and once for each
group to query the items in the group. For file browsing this makes sense
but for asset browsing this can be reduced.

This patch will load the asset files recursively and only opens them once.
Another change is that only the assets are requested and not filtered out
later in the process.

This patch is needed to simplify the library indexing. Where
we need access to the full library content.

## The numbers ##

Benchmarked by adding scenes of the spring open movie to the default
asset library. Refreshing the asset library would recursively load all the files
there.

| **8bc27c508a** | Processed 317 'directories/libraries' | 7.573986s |
| **Patch** | Processed 42 'directories/libraries' | 0.821013s |

{F10442811}

Reviewed By: mont29, Severin

Maniphest Tasks: T91406

Differential Revision: https://developer.blender.org/D12499
2021-09-24 08:41:40 +02:00
Jeroen Bakker
6a88f83d67 Hair Info Length Attribute
Goal is to add the length attribute to the Hair Info node, for better control over color gradients or similar along the hair.

Reviewed By: #eevee_viewport, brecht

Differential Revision: https://developer.blender.org/D10481
2021-09-24 07:44:22 +02:00
0f764ade1a Fix T91661: Vector rotate output socket diconnects on file load
An incorrect name was used in the socket declaration refactor.
2021-09-23 23:56:15 -05:00
0e039749e3 Fix: Incorrect field visualization for some shader nodes
These need to be tagged as function nodes in their declaration.
2021-09-23 22:23:14 -05:00
bc65c7d0e5 Cleanup: spelling in comments 2021-09-24 11:31:23 +10:00
599d96e8f9 UI: keymap preference tweaks
- Expose option to use shortcuts to activate tools as an enum.
- "Emulate 3 Button Mouse" now disables preferences
  that depend on Alt-LMB.
2021-09-24 11:07:01 +10:00
bffda4185d Cleanup: group convenience assignments in the keymap 2021-09-24 11:07:01 +10:00
26141664f0 Fix: Curve fill node doesn't fill real geometry with instances
A misplaced return in the middle of the function made it so
the node didn't fill real geometry.
2021-09-23 17:45:41 -05:00
f7608276e3 Audaspace: port bugfix from upstream.
Fixes T91615: Instant crash when dragging video file into video editor sequencer
2021-09-23 23:27:49 +02:00
61f3d4eb7c Geometry Nodes: Initial socket visualization for fields.
This implements the update logic for the vizualization of which
sockets pass data or constants directly, and which pass functions.
The socket shapes may still have to be updated. That should be
done separately, because it might be a bit more involved, because
socket shapes are currently linked to keyframe shapes. Currently
the circle and diamond shapes are used with the following meanings:

 - Input Sockets:
    - Circle: Required to be a single value.
    - Diamond: This input supports fields.
 - Output Sockets:
    - Circle: This output is a single value.
    - Diamond: This output may be a field.

Connecting a field to a circle input socket is an error, since a
field cannot be converted to a single value. If the socket shape
is a diamond with a dot in the middle, it means it is currently
a single value, but could be a field.

In addition to socket shapes, the intention is to draw node links
differently based on the field status. However, the exact method for
conveying that isn't decided yet.

Differential Revision: https://developer.blender.org/D12584
2021-09-23 15:21:31 -05:00
c1b925f7ff Fix build error caused by typo 2021-09-23 15:20:57 -05:00
1bdaf0ebec Fix T91638: image editor Open Cached Render not loading some passes
Previously this was only loading built-in render passes. Now instead of trying
to load the scene render passes, load whatever passes exist in the cache file.
2021-09-23 20:45:42 +02:00
Charlie Jolly
7fb2b50e5d Fix: Build issue with MSVC
header for std::function was not included

reported/fixed by Charlie on chat
2021-09-23 12:33:27 -06:00
fc2255135e Paint: prevent RenderResults and Viewers where unappropriate
Using a RenderResult (or a Viewer) was never really working (think you
cant get a real ImBuff from these) -- cannot use it as a clone, stencil
or canvas [Single Image paint texture slot].

In the case of using it as a 2D paint clone image this would also crash
[due to the Image Editor drawing refactor in 2.91].

Now [in the spirit of T73182 / D11179], prevent using these where
unappropriate by using rna pointer polling functions.

Also add a security check for the 2D paint clone image crash in case a
stencil ImBuff cannot be provided for some reason, but generally old
files are now patched in do_versions_after_linking_300 (thx @brecht!).

Fixes T91625.

Maniphest Tasks: T91625

Differential Revision: https://developer.blender.org/D12609
2021-09-23 20:11:45 +02:00
b8a30c7664 Cleanup: Use const in previously committed function 2021-09-23 19:39:26 +02:00
323fd80aad UI: Tree-View API for easy creation of tree UIs
This follows three main targets:

* Make creation of new tree UIs easy.
* Groundwork to generalize tree UIs (so e.g. Outliner, animation
  channels, asset catalogs and spreadsheet data-sets don't have to
  re-implement basic tree UI code) or even other data-view UIs.
* Better separate data and UI state. E.g. with this, tree-item selection
  or the open/collapsed state can be stored on the UI level, rather than
  in data. (Asset Catalogs need this, storing UI state info in them is
  not an option.)

In addition, the design should be well testable and could even be
exposed to Python.

Note that things will likely change in master still. E.g. the actually
resulting UI isn't very nice visually yet.

The design is documented here:
https://wiki.blender.org/wiki/Source/Interface/Views

Differential Revision: https://developer.blender.org/D12573
2021-09-23 19:33:54 +02:00
eb0eb54d96 Fix D12533: Simplify curve object to mesh conversion
This patch simplifies the curve object to mesh conversion
used by the object convert operator and exporters.

The existing code had a convoluted model of ownership, and did quite
a bit of unnecessary work. It also assumed that curve objects always
evaluated to a mesh, which is not the case anymore.

Now the code checks if the object it receives is evaluated. If so,
it can simply return a copy of the evaluated mesh (or convert the
evaluated curve wire edges to a mesh if there was no evaluated mesh).
If the object isn't evaluated, it uses a temporary copy of the object
with modifiers removed to create the mesh in the same way.

This follows up on the recent changes to curve evaluation,
namely that the result is always either a mesh or a wire curve.

Differential Revision: https://developer.blender.org/D12533
2021-09-23 11:41:46 -05:00
ed541de29d Fix T91626: Cycles sss behind fully transparent object renders differently 2021-09-23 18:32:30 +02:00
354c3eee40 Cycles: improve Auto Tile option description
Ref T91645
2021-09-23 18:32:30 +02:00
0a8a726014 bUUID: make it explicit the default constructor produces the nil value
The implicit default constructor zeroes all plain data fields, and now
this behaviour is explicit & tested for in a unit test.
2021-09-23 18:27:19 +02:00
18a4dc869d Cleanup: UUID, fix clang-tidy warnings
Use explicit `uint32_t` instead of `uint`, add a missing end-of-namespace
comment, and change `auto` to `const auto *`.

No functional changes.
2021-09-23 18:27:19 +02:00
502543e46b Geometry Nodes: remove old method to iterate over attributes
The previous commit added a new method to the same in a better way.
2021-09-23 17:59:44 +02:00
38af29df5c Geometry Nodes: simplify looping over attributes in geometry set
This adds three new methods:
* `InstancesComponent::foreach_reference_as_geometry(...)`
* `GeometrySet::attribute_foreach(...)`
* `GeometrySet::gather_attributes_for_propagation(...)`

The goal is that these iteration primitives can be used in places
where we use more specialized iterators currently.

Differential Revision: https://developer.blender.org/D12613
2021-09-23 17:59:44 +02:00
105115da9f UUID: add != operator for comparing UUIDs
Make it possible to unit test with `EXPECT_NE(uuid1, uuid2)`.
2021-09-23 17:58:20 +02:00
bd63944a73 UUID: place C++ code in correct namespace
Put the `bUUID` class in the `blender` namespace, instead of the
`blender::bke` namespace.

As a result, some C++ code now correctly uses the C++ class, where
previously it would use the C struct and use implicit casting where
necessary. As a result, support for initializer lists had to be
explicitly coded and in another place an explicit `::bUUID` was
necessary to avoid ambiguity.
2021-09-23 17:58:20 +02:00
942fc9f467 Cleanup: bUUID, document the constructors
No functional changes.
2021-09-23 17:58:20 +02:00
93997f9d0a Cleanup: asset catalogs, correct assertion message
There is no such thing as a "relative path" when it comes to asset catalog
paths (they're always absolute).

No functional changes.
2021-09-23 17:58:20 +02:00
d7f803f522 Fix T91641: crash rendering with 16k environment map in Cycles
Protect against integer overflow.
2021-09-23 17:48:16 +02:00
6279efbb78 Fix Cycles compiler warning on GCC 11
For shadow rays there are no closures, leave out the closure merging code
there to avoid warnings about accessing closure memory that does not exist.
2021-09-23 17:48:16 +02:00
d0493796a6 Cleanup: Remove hardcoded values and rename keyframe shape shaders
No functional change.

The shader is complicated by itself, having hardcoded values makes it
even more cryptic.

I also renamed the shader because the shader is not for the keyfarme diamond only,
but for all the keyframe shapes.

Differential Revision: https://developer.blender.org/D12615
2021-09-23 17:41:11 +02:00
f11bcb5a80 Fix T91557: Texture Paint Stencil doesnt use assigned UV Layer
Choosing a UV layer would actually affect the overlay in the viewport
and also painting with the mask brush was in that UV space, but the
resulting stencil mask was always applied with the active UV (not the
explicitly selected stencil UV -- the one one is looking at in the
viewport!) to painting.

This has been like that as far as I have checked back (at least 2.79b),
I am surprised this has not come up before, but it does not seem to make
sense at all...

Now use the UV specified for the stencil layer when applying the mask for
painting, so it corresponds to the stencil mask one is looking at in the
viewport.

Maniphest Tasks: T91557

Differential Revision: https://developer.blender.org/D12583
2021-09-23 16:56:03 +02:00
490425d56e Asset Catalogs: explicit version number in catalog definition files
Declare the current format used for asset catalog definition files as
version 1, and write that to the files. Files without that version number
will be rejected.

This makes it much easier to move to different versions later, with the
opportunity to do versioning on file load.

The version is not associated with any version of Blender, but a separate
integer that's simply incremented when a non-backward-compatible change
happens.
2021-09-23 16:06:25 +02:00
aa2493e2e7 Fix unused variable warning in release builds
The variable is only used in debug builds.
2021-09-23 15:04:41 +02:00
Sybren A. Stüvel
9b12b23d0b Assets: add Asset Catalog system
Catalogs work like directories on disk (without hard-/symlinks), in that
an asset is only contained in one catalog.

See T90066 for design considerations.

#### Known Limitations

Only a single catalog definition file (CDF), is supported, at
`${ASSET_LIBRARY_ROOT}/blender_assets.cats.txt`. In the future this is
to be expanded to support arbitrary CDFs (like one per blend file, one
per subdirectory, etc.).

The current implementation is based on the asset browser, which in
practice means that the asset browser owns the `AssetCatalogService`
instance for the selected asset library. In the future these instances
will be accessible via a less UI-bound asset system.

The UI is still very rudimentary, only showing the catalog ID for the
currently selected asset. Most notably, the loaded catalogs are not
shown yet. The UI is being implemented and will be merged soon.

#### Catalog Identifiers

Catalogs are internally identified by UUID. In older designs this was a
human-readable name, which has the problem that it has to be kept in
sync with its semantics (so when renaming a catalog from X to Y, the
UUID can be kept the same).

Since UUIDs don't communicate any human-readable information, the
mapping from catalog UUID to its path (stored in the Catalog Definition
File, CDF) is critical for understanding which asset is stored in which
human-readable catalog. To make this less critical, and to allow manual
data reconstruction after a CDF is lost/corrupted, each catalog also has
a "simple name" that's stored along with the UUID. This is also stored
on each asset, next to the catalog UUID.

#### Writing to Disk

Before saving asset catalogs to disk, the to-be-overwritten file gets
inspected. Any new catalogs that are found thre are loaded to memory
before writing the catalogs back to disk:

- Changed catalog path: in-memory data wins
- Catalogs deleted on disk: they are recreated based on in-memory data
- Catalogs deleted in memory: deleted on disk as well
- New catalogs on disk: are loaded and thus survive the overwriting

#### Tree Design

This implements the initial tree structure to load catalogs into. See
T90608, and the basic design in T90066.

Reviewed By: Severin

Maniphest Tasks: T91552

Differential Revision: https://developer.blender.org/D12589
2021-09-23 15:00:45 +02:00
222fd1abf0 Asset Browser: Disable metadata editing for external asset libraries
Buttons to edit asset metadata are now disabled for assets from an
external library (i.e. assets not stored in the current .blend file).
Their tooltips explain why they are disabled.

Had to do some RNA trickery to disable the metadata properties at RNA
level, not at UI script level.
The basic idea is:
* Local data-block assets set the data-block as owning ID for the asset
  metadata RNA pointer now.
* That way we can use the owner ID to see where the metadata belongs to
  and decide if it's editable that way.
* Additionaly, some Python operators needed better polling so they show
  as grayed out, and don't just fail.

One important thing: Custom properties of the metadata can still be
edited. The edits won't be saved however. Would be nice to disable that,
but it's currently not supported on BPY/IDProperty/RNA level.

Addresses T82943.

Differential Revision: https://developer.blender.org/D12127
2021-09-23 14:46:13 +02:00
059d01d42e Fileops: call BLI_path_slash_native() in BLI_dir_create_recursive()
Make the Windows version of `BLI_dir_create_recursive()` call
`BLI_path_slash_native()` before it tries to handle the path. This
should make it possible to call it with non-native path separators.

This change was provided by our Windows platform maintainer @LazyDodo in
P2414, so I assume he agrees with this change.
2021-09-23 14:42:14 +02:00
83975965a7 Keymap: RMB select option to place cursor on Alt-LMB
This allows most tools to activate on press, removing a small but
noticeable distance between the initial press and detection of a drag
event.

Gizmo tweak no longer actives when Alt is held, to avoid conflicting
with cursor placement.

This has the advantage that the gizmo doesn't get in the way when Alt
is used for activating tools or placing the cursor.
2021-09-23 22:10:56 +10:00
88692baace WM: gestures now activate immediately on mouse press
Some gestures were activating immediately on tweak events,
extend this to mouse-press and click-drag.

Without this change, box-select for example wouldn't be automatically
activated on mouse-press.
2021-09-23 22:08:02 +10:00
b659d1a560 Cleanup: spelling in comments 2021-09-23 22:08:02 +10:00
d431b91995 Cleanup: assign variable for LMB select 2021-09-23 22:06:47 +10:00
4d2ca33a8a LibLink: Modify WM API to link/append one ID to take flag parameter.
There is no reason to lock behavior into a specific configuration in
those calls, make them properly configurable like the rest of the
link/append code.

This also enable users of those functions to activate 'ID reuse'
behavior.
2021-09-23 13:48:41 +02:00
aadb7ef071 Geometry Nodes: add utility to print an attribute id. 2021-09-23 13:46:26 +02:00
e86cf55667 Fix T91629: Crash in "Open Cached Render" function 2021-09-23 13:03:36 +02:00
f48a4aa0f9 LibLink Append: Expose 'reuse ID' through new BLO flag, and add basic tests.
Option is now available to append operator, alsthough hidden and
disabled by default.
2021-09-23 12:56:05 +02:00
cb173d05dc LibLink Append: Fix 'reused ID' case keeping linked data around.
When we re-use a local ID, we need to delete the matching linked data.
2021-09-23 12:54:23 +02:00
6f53988e7a Cleanup: proper handling of LIB_TAG_DOIT in append code. 2021-09-23 12:46:51 +02:00
ce96a75c2c Cleanup: eBLOLibLinkFlags: Add 'space' between APPEND flags and INSTANCE ones. 2021-09-23 12:45:40 +02:00
3558ae3b6c Cleanup: Silence unused var warning in release builds. 2021-09-23 12:30:07 +02:00
b63f777950 Add missing GPencil IDTYPE_FLAGS_APPEND_IS_REUSABLE flag.
Forgot that one in rB794c2828af60.

Noted by Antonio Vazquez (@antoniov), thanks.
2021-09-23 11:28:39 +02:00
b801e86f8b Append: Reuse local data: remove limitation on directly selected data.
in asset context, when user drag & drop a material several time, they
would still expect to re-use existing one instead of getting new copies
of it, even if this material is directly appended (and not an indirect
dependency of an object e.g.).
2021-09-23 11:15:58 +02:00
3042994c91 Link/Append: Refactor flags.
Flags controlling link/append code are split between two enums, one in
`DNA_space_types.h` and one in `BLO_readfile.h`.

This commit:
 - Moves flags exclusively used in WM and BLO code to `eBLOLibLinkFlags`
   in `BLO_readfile.h`. Flags in `eFileSel_Params_Flag` from
   `DNA_space_types.h` are now only the ones effectively used by the
   file browser editor code too.
 - Fixes some internal utils in `readfile.c` still taking `short` flag
   parameter instead of proper `int` one.

NOTE: there are a few other flags that could probably be moved to
`eBLOLibLinkFlags` (at the very least `FILE_LINK`, probably also
`FILE_AUTOSELECT` and `FILE_ACTIVE_COLLECTION`), since those are not
effectively used by the file browser, and control linking/appending
behavior, not filebrowser behavior.

However for now think it's safer to not touch that.
2021-09-23 11:01:00 +02:00
12924ed573 Fix last Cycles tile highlighted while file from disk is being processed 2021-09-23 10:41:06 +02:00
a351023bd5 Fix default surface resolution U/V mis-match
The resolution for surfaces was 12 for U, 4 for V,
where both should have been set to 4.

Regression in 9a076dd95a
2021-09-23 15:54:32 +10:00
754d56dcc1 Cleanup: spelling in comments 2021-09-23 15:42:44 +10:00
bf948b2cef Custom Properties: Rewrite edit operator, improve UX
This commit changes the custom property edit operator to make editing
different properties types more obvious and expose more of the data,
made more easily possible by the recent UI data refactor.

Previously, the operator guessed the type you wanted based on what you
wrote in a text box. That was problematic, you couldn't make a string
property with a value of `1234`, and you had to know about the Python
syntax for lists in order to create an array property. It was also slow
and error prone; it was too easy to make a typo.

Improvements compared to the old operator:
 - A type drop-down to choose between the property types.
 - Step and precision values are exposed.
 - Buttons that have the correct type based on the property.
 - String properties no longer display min, max, etc. buttons.
 - Generally works in more cases. The old operator tended to break.
 - Choose array length with a slider.
 - Easy to choose to use python evaluation when necessary.
 - Code is commented, split up, and much easier to understand.

The custom property's value is purposefully not exposed, since the Edit
operator is for changing the property's metadata now, rather than the
value itself. Though in the "Python" mode the value is still available.

More improvements are possible in the future, like exposing different
subtypes, and improving the UI of the custom properties panel.

Differential Revision: https://developer.blender.org/D12435
2021-09-22 21:57:33 -05:00
6e77afe6ec Applying patch D12600, GSOC Knife Tools branch
This adds constrained angle mode improvements,
snapping to global and local orientation,
visible distance and angle measurements,
undo capability,
x-ray mode,
multi-object edit mode.

See https://developer.blender.org/D12600 for more details.

Note: this project moved some of the default keymappings
around a bit, as discussed with users in the thread
https://devtalk.blender.org/t/gsoc-2021-knife-tool-improvements-feedback/19047
We'll change the manual documentation in the next couple of days.
2021-09-22 21:23:44 -04:00
a78d3c5261 Fix: T91602 ffmpeg crash
Issue caused by our patch in rB1af722b81912
we replaced an array with a memory allocation
but we forgot to update the assert which now
used an invalid method to calculate the array
size.

SVN libs will have to be updated before
T91602 will be fixed for end users.
2021-09-22 17:29:23 -06:00
79bcc19240 Cleanup: Move more shader nodes to socket declaration API
I had to add `no_muted_links` to the declaration API. The name could
change there, but I think it's more obvious than "internal"
2021-09-22 17:34:09 -05:00
6fb4c8f040 Fix: Incorrect socket names after previous commit 2021-09-22 16:58:29 -05:00
c99cb81452 Cleanup: Use new node socket declaration API for some shader nodes
This converts socket declarations for most shader nodes that had already
been converted to C++, but skips those that needed a special flag.
2021-09-22 14:39:09 -05:00
bd01f4761c GPencil: Fix compiler warnings 2021-09-22 21:17:04 +02:00
ee49991999 Cleanup: Silence missing switch case warning 2021-09-22 12:00:21 -07:00
6611f2cb74 Fix T91607: GPencil Tint modifier "apply" removes the effect 2021-09-22 20:07:35 +02:00
f893dea586 BLI: support initializing empty FunctionRef with nullptr
This may sometimes be desired because it is more explicitely
shows that the `FunctionRef` will be empty.
2021-09-22 19:55:44 +02:00
02bde2c1d5 Cleanup: add using declarations 2021-09-22 19:45:47 +02:00
a28ec92088 BLI: avoid warning when copying empty StringRef 2021-09-22 19:45:18 +02:00
188de4bc31 BLI: initialize MutableSpan in default constructor 2021-09-22 19:44:58 +02:00
bd1c4a781f Geometry Nodes: fix evaluating field to span 2021-09-22 18:45:11 +02:00
3180c6b4a7 Cleanup: Themes: Remove invalid theme option
`font_kerning_style` was removed in rBa1e91fbef3dc9a5d5c8456cd9a887aac1bdb652c
2021-09-22 12:33:12 -04:00
e1a0983b3c Fix T91608: Cycles crash with tile size 0 2021-09-22 18:12:46 +02:00
4068b6b5a7 Fix T91598: Decreasing sample count causes viewport to reset
Differential Revision: https://developer.blender.org/D12601
2021-09-22 18:09:06 +02:00
Chandrapal Singh
5033310e8a UI: Add description for Batch rename
Added description for Batch rename which pop-ups when
hovering the mouse over "Batch rename" inside the edit menu.

Fixes T91390

Reviewed By: Blendify

Maniphest Tasks: T91390

Differential Revision: https://developer.blender.org/D12594
2021-09-22 11:35:29 -04:00
ac68b08c5b Fix T91592: Negative Cycles remaining render time
For the default startup was showing -14:-08.-48 as a remaining time.

Was an integer overflow when specifying total number of pixel-samples.
2021-09-22 17:30:00 +02:00
bc1e675bb9 Fix T91603: Cycles crash when volume becomes visible
Making object which uses volume shader invisible will mark the shader
as not having a volume, forcing re-compilation of the shader to bring
it back to a consistent state.

The compilation is happening as part of scene update, which needs to
know kernel features. So there is a feedback loop.

Use more relaxed way of knowing whether there is a volume in the
shader for the kernel features, which doesn't require shader to be
compiled first.

Solves issues from the report, but potentially causes extra memory
allocated if the volume part of graph is fully optimized out. This
downside is solvable, but would need to split scene update into two
steps (the one which requires on kernel, and the one which does not).
It will be an interesting project to tackle, but for a bug fix is
better to use simpler solution.
2021-09-22 17:10:06 +02:00
1eba32c3e9 Fix T91576: Cycles attribute node with name N for smooth normal no longer working
There was an optimization to remove duplicate storage of normals as attributes
when using normal maps. However for named attributes like this we still need to
store the attribute.

Don't request normal attribute from the normal map node now, instead of skipping it
in the geometry code.
2021-09-22 17:05:31 +02:00
4762a9b09f GPencil: Fix unreported missing material panel for new Scene
When a new scene is created, the paint pointers are not available before using them, so the python panel exits because the pointer was None.

Now, the pointer is checked in order to display the materials panel as expected.
2021-09-22 17:05:21 +02:00
794c2828af Initial implementation of local ID re-use when appending.
This commit adds to ID struct a new optional 'weak reference' to a
linked ID (in the form of a blend file library path and full ID name).

This can then be used on next append to try to find a matching local ID
instead of re-making the linked data local again.

Ref. T90545

NOTE: ID re-use will be disabled for regular append for the time being
(3.0 release), and only used for assets. Therefore, this commit should
not change anything user-wise.

Differential Revision: https://developer.blender.org/D12545
2021-09-22 16:55:39 +02:00
707bcd5693 Cleanup: make format 2021-09-22 16:55:09 +02:00
dilithjay
0d350e0193 Geometry Nodes: Curve Fillet Node
This node can be used to fillet splines at control points to
create a circular arc.
The implementation roughly follows T89227's design.

The node works in two main modes: Bezier and Poly
* Bezier: Creates a circular arc at vertices by changing
handle lengths (applicable only for Bezier splines).
* Poly: Creates a circular arc by creating vertices (as
many as defined by the Count fields input) along
the arc (applicable for all spline types).

In both modes, the radius of the created arc is defined
by the Radius fields input.
The Limit Radius attribute can be enabled to prevent
overlapping when the defined radius exceeds the maximum
possible radius for a given point.

Reviewed By: Hans Goudey

Differential Revision: https://developer.blender.org/D12115
2021-09-22 20:11:12 +05:30
817 changed files with 25385 additions and 6144 deletions

View File

@@ -419,6 +419,8 @@ mark_as_advanced(WITH_CYCLES_NATIVE_ONLY)
option(WITH_CYCLES_DEVICE_CUDA "Enable Cycles CUDA compute support" ON)
option(WITH_CYCLES_DEVICE_OPTIX "Enable Cycles OptiX support" ON)
option(WITH_CYCLES_DEVICE_HIP "Enable Cycles HIP support" OFF)
mark_as_advanced(WITH_CYCLES_DEVICE_HIP)
mark_as_advanced(WITH_CYCLES_DEVICE_CUDA)
option(WITH_CUDA_DYNLOAD "Dynamically load CUDA libraries at runtime" ON)
@@ -821,6 +823,11 @@ if(NOT WITH_CUDA_DYNLOAD)
endif()
endif()
if(WITH_CYCLES_DEVICE_HIP)
# Currently HIP must be dynamically loaded, this may change in future toolkits
set(WITH_HIP_DYNLOAD ON)
endif()
#-----------------------------------------------------------------------------
# Check check if submodules are cloned
@@ -1850,6 +1857,9 @@ elseif(WITH_CYCLES_STANDALONE)
if(WITH_CUDA_DYNLOAD)
add_subdirectory(extern/cuew)
endif()
if(WITH_HIP_DYNLOAD)
add_subdirectory(extern/hipew)
endif()
if(NOT WITH_SYSTEM_GLEW)
add_subdirectory(extern/glew)
endif()

View File

@@ -70,16 +70,18 @@
}
--- a/libavcodec/rl.c
+++ b/libavcodec/rl.c
@@ -71,7 +71,7 @@ av_cold void ff_rl_init(RLTable *rl,
@@ -71,17 +71,19 @@
av_cold void ff_rl_init_vlc(RLTable *rl, unsigned static_size)
{
int i, q;
- VLC_TYPE table[1500][2] = {{0}};
+ VLC_TYPE (*table)[2] = av_calloc(sizeof(VLC_TYPE), 1500 * 2);
VLC vlc = { .table = table, .table_allocated = static_size };
av_assert0(static_size <= FF_ARRAY_ELEMS(table));
- av_assert0(static_size <= FF_ARRAY_ELEMS(table));
+ av_assert0(static_size < 1500);
init_vlc(&vlc, 9, rl->n + 1, &rl->table_vlc[0][1], 4, 2, &rl->table_vlc[0][0], 4, 2, INIT_VLC_USE_NEW_STATIC);
@@ -80,8 +80,10 @@ av_cold void ff_rl_init_vlc(RLTable *rl, unsigned static_size)
for (q = 0; q < 32; q++) {
int qmul = q * 2;
int qadd = (q - 1) | 1;
@@ -91,7 +93,7 @@
if (q == 0) {
qmul = 1;
@@ -113,4 +115,5 @@ av_cold void ff_rl_init_vlc(RLTable *rl, unsigned static_size)
@@ -113,4 +115,5 @@
rl->rl_vlc[q][i].run = run;
}
}

View File

@@ -67,9 +67,12 @@ endif()
if(WITH_CYCLES OR WITH_COMPOSITOR OR WITH_OPENSUBDIV)
add_subdirectory(clew)
if(WITH_CUDA_DYNLOAD)
if((WITH_CYCLES_DEVICE_CUDA OR WITH_CYCLES_DEVICE_OPTIX) AND WITH_CUDA_DYNLOAD)
add_subdirectory(cuew)
endif()
if(WITH_CYCLES_DEVICE_HIP AND WITH_HIP_DYNLOAD)
add_subdirectory(hipew)
endif()
endif()
if(WITH_GHOST_X11 AND WITH_GHOST_XDND)

View File

@@ -102,26 +102,30 @@ AUD_API int AUD_Sound_getFileStreams(AUD_Sound* sound, AUD_StreamInfo **stream_i
if(file)
{
auto streams = file->queryStreams();
size_t size = sizeof(AUD_StreamInfo) * streams.size();
if(!size)
try
{
auto streams = file->queryStreams();
size_t size = sizeof(AUD_StreamInfo) * streams.size();
if(!size)
{
*stream_infos = nullptr;
return 0;
}
*stream_infos = reinterpret_cast<AUD_StreamInfo*>(std::malloc(size));
std::memcpy(*stream_infos, streams.data(), size);
return streams.size();
}
catch(Exception&)
{
*stream_infos = nullptr;
return 0;
}
*stream_infos = reinterpret_cast<AUD_StreamInfo*>(std::malloc(size));
std::memcpy(*stream_infos, streams.data(), size);
return streams.size();
}
else
{
*stream_infos = nullptr;
return 0;
}
*stream_infos = nullptr;
return 0;
}
AUD_API sample_t* AUD_Sound_data(AUD_Sound* sound, int* length, AUD_Specs* specs)

39
extern/hipew/CMakeLists.txt vendored Normal file
View File

@@ -0,0 +1,39 @@
# ***** BEGIN GPL LICENSE BLOCK *****
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software Foundation,
# Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
#
# The Original Code is Copyright (C) 2021, Blender Foundation
# All rights reserved.
# ***** END GPL LICENSE BLOCK *****
set(INC
.
include
)
set(INC_SYS
)
set(SRC
src/hipew.c
include/hipew.h
)
set(LIB
)
blender_add_lib(extern_hipew "${SRC}" "${INC}" "${INC_SYS}" "${LIB}")

1207
extern/hipew/include/hipew.h vendored Normal file

File diff suppressed because it is too large Load Diff

533
extern/hipew/src/hipew.c vendored Normal file
View File

@@ -0,0 +1,533 @@
/*
* Copyright 2011-2021 Blender Foundation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License
*/
#ifdef _MSC_VER
# if _MSC_VER < 1900
# define snprintf _snprintf
# endif
# define popen _popen
# define pclose _pclose
# define _CRT_SECURE_NO_WARNINGS
#endif
#include <hipew.h>
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#ifdef _WIN32
# define WIN32_LEAN_AND_MEAN
# define VC_EXTRALEAN
# include <windows.h>
/* Utility macros. */
typedef HMODULE DynamicLibrary;
# define dynamic_library_open(path) LoadLibraryA(path)
# define dynamic_library_close(lib) FreeLibrary(lib)
# define dynamic_library_find(lib, symbol) GetProcAddress(lib, symbol)
#else
# include <dlfcn.h>
typedef void* DynamicLibrary;
# define dynamic_library_open(path) dlopen(path, RTLD_NOW)
# define dynamic_library_close(lib) dlclose(lib)
# define dynamic_library_find(lib, symbol) dlsym(lib, symbol)
#endif
#define _LIBRARY_FIND_CHECKED(lib, name) \
name = (t##name *)dynamic_library_find(lib, #name); \
assert(name);
#define _LIBRARY_FIND(lib, name) \
name = (t##name *)dynamic_library_find(lib, #name);
#define HIP_LIBRARY_FIND_CHECKED(name) \
_LIBRARY_FIND_CHECKED(hip_lib, name)
#define HIP_LIBRARY_FIND(name) _LIBRARY_FIND(hip_lib, name)
static DynamicLibrary hip_lib;
/* Function definitions. */
thipGetErrorName *hipGetErrorName;
thipInit *hipInit;
thipDriverGetVersion *hipDriverGetVersion;
thipGetDevice *hipGetDevice;
thipGetDeviceCount *hipGetDeviceCount;
thipDeviceGetName *hipDeviceGetName;
thipDeviceGetAttribute *hipDeviceGetAttribute;
thipDeviceComputeCapability *hipDeviceComputeCapability;
thipDevicePrimaryCtxRetain *hipDevicePrimaryCtxRetain;
thipDevicePrimaryCtxRelease *hipDevicePrimaryCtxRelease;
thipDevicePrimaryCtxSetFlags *hipDevicePrimaryCtxSetFlags;
thipDevicePrimaryCtxGetState *hipDevicePrimaryCtxGetState;
thipDevicePrimaryCtxReset *hipDevicePrimaryCtxReset;
thipCtxCreate *hipCtxCreate;
thipCtxDestroy *hipCtxDestroy;
thipCtxPushCurrent *hipCtxPushCurrent;
thipCtxPopCurrent *hipCtxPopCurrent;
thipCtxSetCurrent *hipCtxSetCurrent;
thipCtxGetCurrent *hipCtxGetCurrent;
thipCtxGetDevice *hipCtxGetDevice;
thipCtxGetFlags *hipCtxGetFlags;
thipCtxSynchronize *hipCtxSynchronize;
thipDeviceSynchronize *hipDeviceSynchronize;
thipCtxGetCacheConfig *hipCtxGetCacheConfig;
thipCtxSetCacheConfig *hipCtxSetCacheConfig;
thipCtxGetSharedMemConfig *hipCtxGetSharedMemConfig;
thipCtxSetSharedMemConfig *hipCtxSetSharedMemConfig;
thipCtxGetApiVersion *hipCtxGetApiVersion;
thipModuleLoad *hipModuleLoad;
thipModuleLoadData *hipModuleLoadData;
thipModuleLoadDataEx *hipModuleLoadDataEx;
thipModuleUnload *hipModuleUnload;
thipModuleGetFunction *hipModuleGetFunction;
thipModuleGetGlobal *hipModuleGetGlobal;
thipModuleGetTexRef *hipModuleGetTexRef;
thipMemGetInfo *hipMemGetInfo;
thipMalloc *hipMalloc;
thipMemAllocPitch *hipMemAllocPitch;
thipFree *hipFree;
thipMemGetAddressRange *hipMemGetAddressRange;
thipHostMalloc *hipHostMalloc;
thipHostFree *hipHostFree;
thipHostGetDevicePointer *hipHostGetDevicePointer;
thipHostGetFlags *hipHostGetFlags;
thipMallocManaged *hipMallocManaged;
thipDeviceGetByPCIBusId *hipDeviceGetByPCIBusId;
thipDeviceGetPCIBusId *hipDeviceGetPCIBusId;
thipMemcpyPeer *hipMemcpyPeer;
thipMemcpyHtoD *hipMemcpyHtoD;
thipMemcpyDtoH *hipMemcpyDtoH;
thipMemcpyDtoD *hipMemcpyDtoD;
thipDrvMemcpy2DUnaligned *hipDrvMemcpy2DUnaligned;
thipMemcpyParam2D *hipMemcpyParam2D;
thipDrvMemcpy3D *hipDrvMemcpy3D;
thipMemcpyHtoDAsync *hipMemcpyHtoDAsync;
thipMemcpyDtoHAsync *hipMemcpyDtoHAsync;
thipMemcpyParam2DAsync *hipMemcpyParam2DAsync;
thipDrvMemcpy3DAsync *hipDrvMemcpy3DAsync;
thipMemsetD8 *hipMemsetD8;
thipMemsetD16 *hipMemsetD16;
thipMemsetD32 *hipMemsetD32;
thipMemsetD8Async *hipMemsetD8Async;
thipMemsetD16Async *hipMemsetD16Async;
thipMemsetD32Async *hipMemsetD32Async;
thipArrayCreate *hipArrayCreate;
thipArrayDestroy *hipArrayDestroy;
thipArray3DCreate *hipArray3DCreate;
thipStreamCreateWithFlags *hipStreamCreateWithFlags;
thipStreamCreateWithPriority *hipStreamCreateWithPriority;
thipStreamGetPriority *hipStreamGetPriority;
thipStreamGetFlags *hipStreamGetFlags;
thipStreamWaitEvent *hipStreamWaitEvent;
thipStreamAddCallback *hipStreamAddCallback;
thipStreamQuery *hipStreamQuery;
thipStreamSynchronize *hipStreamSynchronize;
thipStreamDestroy *hipStreamDestroy;
thipEventCreateWithFlags *hipEventCreateWithFlags;
thipEventRecord *hipEventRecord;
thipEventQuery *hipEventQuery;
thipEventSynchronize *hipEventSynchronize;
thipEventDestroy *hipEventDestroy;
thipEventElapsedTime *hipEventElapsedTime;
thipFuncGetAttribute *hipFuncGetAttribute;
thipFuncSetCacheConfig *hipFuncSetCacheConfig;
thipModuleLaunchKernel *hipModuleLaunchKernel;
thipDrvOccupancyMaxActiveBlocksPerMultiprocessor *hipDrvOccupancyMaxActiveBlocksPerMultiprocessor;
thipDrvOccupancyMaxActiveBlocksPerMultiprocessorWithFlags *hipDrvOccupancyMaxActiveBlocksPerMultiprocessorWithFlags;
thipModuleOccupancyMaxPotentialBlockSize *hipModuleOccupancyMaxPotentialBlockSize;
thipTexRefSetArray *hipTexRefSetArray;
thipTexRefSetAddress *hipTexRefSetAddress;
thipTexRefSetAddress2D *hipTexRefSetAddress2D;
thipTexRefSetFormat *hipTexRefSetFormat;
thipTexRefSetAddressMode *hipTexRefSetAddressMode;
thipTexRefSetFilterMode *hipTexRefSetFilterMode;
thipTexRefSetFlags *hipTexRefSetFlags;
thipTexRefGetAddress *hipTexRefGetAddress;
thipTexRefGetArray *hipTexRefGetArray;
thipTexRefGetAddressMode *hipTexRefGetAddressMode;
thipTexObjectCreate *hipTexObjectCreate;
thipTexObjectDestroy *hipTexObjectDestroy;
thipDeviceCanAccessPeer *hipDeviceCanAccessPeer;
thipCtxEnablePeerAccess *hipCtxEnablePeerAccess;
thipCtxDisablePeerAccess *hipCtxDisablePeerAccess;
thipDeviceGetP2PAttribute *hipDeviceGetP2PAttribute;
thipGraphicsUnregisterResource *hipGraphicsUnregisterResource;
thipGraphicsMapResources *hipGraphicsMapResources;
thipGraphicsUnmapResources *hipGraphicsUnmapResources;
thipGraphicsResourceGetMappedPointer *hipGraphicsResourceGetMappedPointer;
thipGraphicsGLRegisterBuffer *hipGraphicsGLRegisterBuffer;
thipGLGetDevices *hipGLGetDevices;
static DynamicLibrary dynamic_library_open_find(const char **paths) {
int i = 0;
while (paths[i] != NULL) {
DynamicLibrary lib = dynamic_library_open(paths[i]);
if (lib != NULL) {
return lib;
}
++i;
}
return NULL;
}
/* Implementation function. */
static void hipewHipExit(void) {
if (hip_lib != NULL) {
/* Ignore errors. */
dynamic_library_close(hip_lib);
hip_lib = NULL;
}
}
static int hipewHipInit(void) {
/* Library paths. */
#ifdef _WIN32
/* Expected in c:/windows/system or similar, no path needed. */
const char *hip_paths[] = {"amdhip64.dll", NULL};
#elif defined(__APPLE__)
/* Default installation path. */
const char *hip_paths[] = {"", NULL};
#else
const char *hip_paths[] = {"/opt/rocm/hip/lib/libamdhip64.so", NULL};
#endif
static int initialized = 0;
static int result = 0;
int error, driver_version;
if (initialized) {
return result;
}
initialized = 1;
error = atexit(hipewHipExit);
if (error) {
result = HIPEW_ERROR_ATEXIT_FAILED;
return result;
}
/* Load library. */
hip_lib = dynamic_library_open_find(hip_paths);
if (hip_lib == NULL) {
result = HIPEW_ERROR_OPEN_FAILED;
return result;
}
/* Fetch all function pointers. */
HIP_LIBRARY_FIND_CHECKED(hipGetErrorName);
HIP_LIBRARY_FIND_CHECKED(hipInit);
HIP_LIBRARY_FIND_CHECKED(hipDriverGetVersion);
HIP_LIBRARY_FIND_CHECKED(hipGetDevice);
HIP_LIBRARY_FIND_CHECKED(hipGetDeviceCount);
HIP_LIBRARY_FIND_CHECKED(hipDeviceGetName);
HIP_LIBRARY_FIND_CHECKED(hipDeviceGetAttribute);
HIP_LIBRARY_FIND_CHECKED(hipDeviceComputeCapability);
HIP_LIBRARY_FIND_CHECKED(hipDevicePrimaryCtxRetain);
HIP_LIBRARY_FIND_CHECKED(hipDevicePrimaryCtxRelease);
HIP_LIBRARY_FIND_CHECKED(hipDevicePrimaryCtxSetFlags);
HIP_LIBRARY_FIND_CHECKED(hipDevicePrimaryCtxGetState);
HIP_LIBRARY_FIND_CHECKED(hipDevicePrimaryCtxReset);
HIP_LIBRARY_FIND_CHECKED(hipCtxCreate);
HIP_LIBRARY_FIND_CHECKED(hipCtxDestroy);
HIP_LIBRARY_FIND_CHECKED(hipCtxPushCurrent);
HIP_LIBRARY_FIND_CHECKED(hipCtxPopCurrent);
HIP_LIBRARY_FIND_CHECKED(hipCtxSetCurrent);
HIP_LIBRARY_FIND_CHECKED(hipCtxGetCurrent);
HIP_LIBRARY_FIND_CHECKED(hipCtxGetDevice);
HIP_LIBRARY_FIND_CHECKED(hipCtxGetFlags);
HIP_LIBRARY_FIND_CHECKED(hipCtxSynchronize);
HIP_LIBRARY_FIND_CHECKED(hipDeviceSynchronize);
HIP_LIBRARY_FIND_CHECKED(hipCtxGetCacheConfig);
HIP_LIBRARY_FIND_CHECKED(hipCtxSetCacheConfig);
HIP_LIBRARY_FIND_CHECKED(hipCtxGetSharedMemConfig);
HIP_LIBRARY_FIND_CHECKED(hipCtxSetSharedMemConfig);
HIP_LIBRARY_FIND_CHECKED(hipCtxGetApiVersion);
HIP_LIBRARY_FIND_CHECKED(hipModuleLoad);
HIP_LIBRARY_FIND_CHECKED(hipModuleLoadData);
HIP_LIBRARY_FIND_CHECKED(hipModuleLoadDataEx);
HIP_LIBRARY_FIND_CHECKED(hipModuleUnload);
HIP_LIBRARY_FIND_CHECKED(hipModuleGetFunction);
HIP_LIBRARY_FIND_CHECKED(hipModuleGetGlobal);
HIP_LIBRARY_FIND_CHECKED(hipModuleGetTexRef);
HIP_LIBRARY_FIND_CHECKED(hipMemGetInfo);
HIP_LIBRARY_FIND_CHECKED(hipMalloc);
HIP_LIBRARY_FIND_CHECKED(hipMemAllocPitch);
HIP_LIBRARY_FIND_CHECKED(hipFree);
HIP_LIBRARY_FIND_CHECKED(hipMemGetAddressRange);
HIP_LIBRARY_FIND_CHECKED(hipHostMalloc);
HIP_LIBRARY_FIND_CHECKED(hipHostFree);
HIP_LIBRARY_FIND_CHECKED(hipHostGetDevicePointer);
HIP_LIBRARY_FIND_CHECKED(hipHostGetFlags);
HIP_LIBRARY_FIND_CHECKED(hipMallocManaged);
HIP_LIBRARY_FIND_CHECKED(hipDeviceGetByPCIBusId);
HIP_LIBRARY_FIND_CHECKED(hipDeviceGetPCIBusId);
HIP_LIBRARY_FIND_CHECKED(hipMemcpyPeer);
HIP_LIBRARY_FIND_CHECKED(hipMemcpyHtoD);
HIP_LIBRARY_FIND_CHECKED(hipMemcpyDtoH);
HIP_LIBRARY_FIND_CHECKED(hipMemcpyDtoD);
HIP_LIBRARY_FIND_CHECKED(hipMemcpyParam2D);
HIP_LIBRARY_FIND_CHECKED(hipDrvMemcpy3D);
HIP_LIBRARY_FIND_CHECKED(hipMemcpyHtoDAsync);
HIP_LIBRARY_FIND_CHECKED(hipMemcpyDtoHAsync);
HIP_LIBRARY_FIND_CHECKED(hipDrvMemcpy2DUnaligned);
HIP_LIBRARY_FIND_CHECKED(hipMemcpyParam2DAsync);
HIP_LIBRARY_FIND_CHECKED(hipDrvMemcpy3DAsync);
HIP_LIBRARY_FIND_CHECKED(hipMemsetD8);
HIP_LIBRARY_FIND_CHECKED(hipMemsetD16);
HIP_LIBRARY_FIND_CHECKED(hipMemsetD32);
HIP_LIBRARY_FIND_CHECKED(hipMemsetD8Async);
HIP_LIBRARY_FIND_CHECKED(hipMemsetD16Async);
HIP_LIBRARY_FIND_CHECKED(hipMemsetD32Async);
HIP_LIBRARY_FIND_CHECKED(hipArrayCreate);
HIP_LIBRARY_FIND_CHECKED(hipArrayDestroy);
HIP_LIBRARY_FIND_CHECKED(hipArray3DCreate);
HIP_LIBRARY_FIND_CHECKED(hipStreamCreateWithFlags);
HIP_LIBRARY_FIND_CHECKED(hipStreamCreateWithPriority);
HIP_LIBRARY_FIND_CHECKED(hipStreamGetPriority);
HIP_LIBRARY_FIND_CHECKED(hipStreamGetFlags);
HIP_LIBRARY_FIND_CHECKED(hipStreamWaitEvent);
HIP_LIBRARY_FIND_CHECKED(hipStreamAddCallback);
HIP_LIBRARY_FIND_CHECKED(hipStreamQuery);
HIP_LIBRARY_FIND_CHECKED(hipStreamSynchronize);
HIP_LIBRARY_FIND_CHECKED(hipStreamDestroy);
HIP_LIBRARY_FIND_CHECKED(hipEventCreateWithFlags);
HIP_LIBRARY_FIND_CHECKED(hipEventRecord);
HIP_LIBRARY_FIND_CHECKED(hipEventQuery);
HIP_LIBRARY_FIND_CHECKED(hipEventSynchronize);
HIP_LIBRARY_FIND_CHECKED(hipEventDestroy);
HIP_LIBRARY_FIND_CHECKED(hipEventElapsedTime);
HIP_LIBRARY_FIND_CHECKED(hipFuncGetAttribute);
HIP_LIBRARY_FIND_CHECKED(hipFuncSetCacheConfig);
HIP_LIBRARY_FIND_CHECKED(hipModuleLaunchKernel);
HIP_LIBRARY_FIND_CHECKED(hipModuleOccupancyMaxPotentialBlockSize);
HIP_LIBRARY_FIND_CHECKED(hipTexRefSetArray);
HIP_LIBRARY_FIND_CHECKED(hipTexRefSetAddress);
HIP_LIBRARY_FIND_CHECKED(hipTexRefSetAddress2D);
HIP_LIBRARY_FIND_CHECKED(hipTexRefSetFormat);
HIP_LIBRARY_FIND_CHECKED(hipTexRefSetAddressMode);
HIP_LIBRARY_FIND_CHECKED(hipTexRefSetFilterMode);
HIP_LIBRARY_FIND_CHECKED(hipTexRefSetFlags);
HIP_LIBRARY_FIND_CHECKED(hipTexRefGetAddress);
HIP_LIBRARY_FIND_CHECKED(hipTexRefGetAddressMode);
HIP_LIBRARY_FIND_CHECKED(hipTexObjectCreate);
HIP_LIBRARY_FIND_CHECKED(hipTexObjectDestroy);
HIP_LIBRARY_FIND_CHECKED(hipDeviceCanAccessPeer);
HIP_LIBRARY_FIND_CHECKED(hipCtxEnablePeerAccess);
HIP_LIBRARY_FIND_CHECKED(hipCtxDisablePeerAccess);
HIP_LIBRARY_FIND_CHECKED(hipDeviceGetP2PAttribute);
#ifdef _WIN32
HIP_LIBRARY_FIND_CHECKED(hipGraphicsUnregisterResource);
HIP_LIBRARY_FIND_CHECKED(hipGraphicsMapResources);
HIP_LIBRARY_FIND_CHECKED(hipGraphicsUnmapResources);
HIP_LIBRARY_FIND_CHECKED(hipGraphicsResourceGetMappedPointer);
HIP_LIBRARY_FIND_CHECKED(hipGraphicsGLRegisterBuffer);
HIP_LIBRARY_FIND_CHECKED(hipGLGetDevices);
#endif
result = HIPEW_SUCCESS;
return result;
}
int hipewInit(hipuint32_t flags) {
int result = HIPEW_SUCCESS;
if (flags & HIPEW_INIT_HIP) {
result = hipewHipInit();
if (result != HIPEW_SUCCESS) {
return result;
}
}
return result;
}
const char *hipewErrorString(hipError_t result) {
switch (result) {
case hipSuccess: return "No errors";
case hipErrorInvalidValue: return "Invalid value";
case hipErrorOutOfMemory: return "Out of memory";
case hipErrorNotInitialized: return "Driver not initialized";
case hipErrorDeinitialized: return "Driver deinitialized";
case hipErrorProfilerDisabled: return "Profiler disabled";
case hipErrorProfilerNotInitialized: return "Profiler not initialized";
case hipErrorProfilerAlreadyStarted: return "Profiler already started";
case hipErrorProfilerAlreadyStopped: return "Profiler already stopped";
case hipErrorNoDevice: return "No HIP-capable device available";
case hipErrorInvalidDevice: return "Invalid device";
case hipErrorInvalidImage: return "Invalid kernel image";
case hipErrorInvalidContext: return "Invalid context";
case hipErrorContextAlreadyCurrent: return "Context already current";
case hipErrorMapFailed: return "Map failed";
case hipErrorUnmapFailed: return "Unmap failed";
case hipErrorArrayIsMapped: return "Array is mapped";
case hipErrorAlreadyMapped: return "Already mapped";
case hipErrorNoBinaryForGpu: return "No binary for GPU";
case hipErrorAlreadyAcquired: return "Already acquired";
case hipErrorNotMapped: return "Not mapped";
case hipErrorNotMappedAsArray: return "Mapped resource not available for access as an array";
case hipErrorNotMappedAsPointer: return "Mapped resource not available for access as a pointer";
case hipErrorECCNotCorrectable: return "Uncorrectable ECC error detected";
case hipErrorUnsupportedLimit: return "hipLimit_t not supported by device";
case hipErrorContextAlreadyInUse: return "Context already in use";
case hipErrorPeerAccessUnsupported: return "Peer access unsupported";
case hipErrorInvalidKernelFile: return "Invalid ptx";
case hipErrorInvalidGraphicsContext: return "Invalid graphics context";
case hipErrorInvalidSource: return "Invalid source";
case hipErrorFileNotFound: return "File not found";
case hipErrorSharedObjectSymbolNotFound: return "Link to a shared object failed to resolve";
case hipErrorSharedObjectInitFailed: return "Shared object initialization failed";
case hipErrorOperatingSystem: return "Operating system";
case hipErrorInvalidHandle: return "Invalid handle";
case hipErrorNotFound: return "Not found";
case hipErrorNotReady: return "HIP not ready";
case hipErrorIllegalAddress: return "Illegal address";
case hipErrorLaunchOutOfResources: return "Launch exceeded resources";
case hipErrorLaunchTimeOut: return "Launch exceeded timeout";
case hipErrorPeerAccessAlreadyEnabled: return "Peer access already enabled";
case hipErrorPeerAccessNotEnabled: return "Peer access not enabled";
case hipErrorSetOnActiveProcess: return "Primary context active";
case hipErrorAssert: return "Assert";
case hipErrorHostMemoryAlreadyRegistered: return "Host memory already registered";
case hipErrorHostMemoryNotRegistered: return "Host memory not registered";
case hipErrorLaunchFailure: return "Launch failed";
case hipErrorCooperativeLaunchTooLarge: return "Cooperative launch too large";
case hipErrorNotSupported: return "Not supported";
case hipErrorUnknown: return "Unknown error";
default: return "Unknown HIP error value";
}
}
static void path_join(const char *path1,
const char *path2,
int maxlen,
char *result) {
#if defined(WIN32) || defined(_WIN32)
const char separator = '\\';
#else
const char separator = '/';
#endif
int n = snprintf(result, maxlen, "%s%c%s", path1, separator, path2);
if (n != -1 && n < maxlen) {
result[n] = '\0';
}
else {
result[maxlen - 1] = '\0';
}
}
static int path_exists(const char *path) {
struct stat st;
if (stat(path, &st)) {
return 0;
}
return 1;
}
const char *hipewCompilerPath(void) {
#ifdef _WIN32
const char *hipPath = getenv("HIP_ROCCLR_HOME");
const char *windowsCommand = "perl ";
const char *executable = "bin/hipcc";
static char hipcc[65536];
static char finalCommand[65536];
if(hipPath) {
path_join(hipPath, executable, sizeof(hipcc), hipcc);
if(path_exists(hipcc)) {
snprintf(finalCommand, sizeof(hipcc), "%s %s", windowsCommand, hipcc);
return finalCommand;
} else {
printf("Could not find hipcc. Make sure HIP_ROCCLR_HOME points to the directory holding /bin/hipcc");
}
}
#else
const char *hipPath = "opt/rocm/hip/bin";
const char *executable = "hipcc";
static char hipcc[65536];
if(hipPath) {
path_join(hipPath, executable, sizeof(hipcc), hipcc);
if(path_exists(hipcc)){
return hipcc;
}
}
#endif
{
#ifdef _WIN32
FILE *handle = popen("where hipcc", "r");
#else
FILE *handle = popen("which hipcc", "r");
#endif
if (handle) {
char buffer[4096] = {0};
int len = fread(buffer, 1, sizeof(buffer) - 1, handle);
buffer[len] = '\0';
pclose(handle);
if (buffer[0]) {
return "hipcc";
}
}
}
return NULL;
}
int hipewCompilerVersion(void) {
const char *path = hipewCompilerPath();
const char *marker = "Hip compilation tools, release ";
FILE *pipe;
int major, minor;
char *versionstr;
char buf[128];
char output[65536] = "\0";
char command[65536] = "\0";
if (path == NULL) {
return 0;
}
/* get --version output */
strcat(command, "\"");
strncat(command, path, sizeof(command) - 1);
strncat(command, "\" --version", sizeof(command) - strlen(path) - 1);
pipe = popen(command, "r");
if (!pipe) {
fprintf(stderr, "HIP: failed to run compiler to retrieve version");
return 0;
}
while (!feof(pipe)) {
if (fgets(buf, sizeof(buf), pipe) != NULL) {
strncat(output, buf, sizeof(output) - strlen(output) - 1);
}
}
pclose(pipe);
return 40;
}

View File

@@ -2,4 +2,5 @@ Project: TinyGLTF
URL: https://github.com/syoyo/tinygltf
License: MIT
Upstream version: 2.5.0, 19a41d20ec0
Local modifications: None
Local modifications:
* Silence "enum value not handled in switch" warnings due to JSON dependency.

View File

@@ -3201,6 +3201,7 @@ static bool ParseJsonAsValue(Value *ret, const json &o) {
val = Value(o.get<double>());
break;
case json::value_t::null:
case json::value_t::binary:
case json::value_t::discarded:
// default:
break;

View File

@@ -297,6 +297,7 @@ endif()
if(WITH_CYCLES_STANDALONE)
set(WITH_CYCLES_DEVICE_CUDA TRUE)
set(WITH_CYCLES_DEVICE_HIP TRUE)
endif()
# TODO(sergey): Consider removing it, only causes confusion in interface.
set(WITH_CYCLES_DEVICE_MULTI TRUE)

View File

@@ -53,7 +53,7 @@ struct Options {
SessionParams session_params;
bool quiet;
bool show_help, interactive, pause;
string output_path;
string output_filepath;
} options;
static void session_print(const string &str)
@@ -160,7 +160,7 @@ static void session_init()
/* load scene */
scene_init();
options.session->reset(session_buffer_params(), options.session_params.samples);
options.session->reset(options.session_params, session_buffer_params());
options.session->start();
}
@@ -222,9 +222,7 @@ static void display_info(Progress &progress)
static void display()
{
static DeviceDrawParams draw_params = DeviceDrawParams();
options.session->draw(session_buffer_params(), draw_params);
options.session->draw();
display_info(options.session->progress);
}
@@ -254,7 +252,7 @@ static void motion(int x, int y, int button)
options.session->scene->camera->need_flags_update = true;
options.session->scene->camera->need_device_update = true;
options.session->reset(session_buffer_params(), options.session_params.samples);
options.session->reset(options.session_params, session_buffer_params());
}
}
@@ -271,7 +269,7 @@ static void resize(int width, int height)
options.session->scene->camera->need_flags_update = true;
options.session->scene->camera->need_device_update = true;
options.session->reset(session_buffer_params(), options.session_params.samples);
options.session->reset(options.session_params, session_buffer_params());
}
}
@@ -283,7 +281,7 @@ static void keyboard(unsigned char key)
/* Reset */
else if (key == 'r')
options.session->reset(session_buffer_params(), options.session_params.samples);
options.session->reset(options.session_params, session_buffer_params());
/* Cancel */
else if (key == 27) // escape
@@ -320,7 +318,7 @@ static void keyboard(unsigned char key)
options.session->scene->camera->need_flags_update = true;
options.session->scene->camera->need_device_update = true;
options.session->reset(session_buffer_params(), options.session_params.samples);
options.session->reset(options.session_params, session_buffer_params());
}
/* Set Max Bounces */
@@ -346,7 +344,7 @@ static void keyboard(unsigned char key)
options.session->scene->integrator->set_max_bounce(bounce);
options.session->reset(session_buffer_params(), options.session_params.samples);
options.session->reset(options.session_params, session_buffer_params());
}
}
#endif
@@ -361,11 +359,13 @@ static int files_parse(int argc, const char *argv[])
static void options_parse(int argc, const char **argv)
{
options.width = 0;
options.height = 0;
options.width = 1024;
options.height = 512;
options.filepath = "";
options.session = NULL;
options.quiet = false;
options.session_params.use_auto_tile = false;
options.session_params.tile_size = 0;
/* device names */
string device_names = "";
@@ -411,7 +411,7 @@ static void options_parse(int argc, const char **argv)
&options.session_params.samples,
"Number of samples to render",
"--output %s",
&options.output_path,
&options.output_filepath,
"File path to write output image",
"--threads %d",
&options.session_params.threads,
@@ -422,12 +422,9 @@ static void options_parse(int argc, const char **argv)
"--height %d",
&options.height,
"Window height in pixel",
"--tile-width %d",
&options.session_params.tile_size.x,
"Tile width in pixels",
"--tile-height %d",
&options.session_params.tile_size.y,
"Tile height in pixels",
"--tile-size %d",
&options.session_params.tile_size,
"Tile size in pixels",
"--list-devices",
&list,
"List information about all available devices",
@@ -489,8 +486,9 @@ static void options_parse(int argc, const char **argv)
options.session_params.background = true;
#endif
/* Use progressive rendering */
options.session_params.progressive = true;
if (options.session_params.tile_size > 0) {
options.session_params.use_auto_tile = true;
}
/* find matching device */
DeviceType device_type = Device::type_from_string(devicename.c_str());

View File

@@ -95,6 +95,9 @@ set(ADDON_FILES
add_definitions(${GL_DEFINITIONS})
if(WITH_CYCLES_DEVICE_HIP)
add_definitions(-DWITH_HIP)
endif()
if(WITH_MOD_FLUID)
add_definitions(-DWITH_FLUID)
endif()

View File

@@ -28,7 +28,7 @@ def _configure_argument_parser():
action='store_true')
parser.add_argument("--cycles-device",
help="Set the device to use for Cycles, overriding user preferences and the scene setting."
"Valid options are 'CPU', 'CUDA' or 'OPTIX'."
"Valid options are 'CPU', 'CUDA', 'OPTIX', or 'HIP'"
"Additionally, you can append '+CPU' to any GPU type for hybrid rendering.",
default=None)
return parser

View File

@@ -111,6 +111,7 @@ enum_device_type = (
('CPU', "CPU", "CPU", 0),
('CUDA', "CUDA", "CUDA", 1),
('OPTIX', "OptiX", "OptiX", 3),
("HIP", "HIP", "HIP", 4)
)
enum_texture_limit = (
@@ -739,14 +740,14 @@ class CyclesRenderSettings(bpy.types.PropertyGroup):
use_auto_tile: BoolProperty(
name="Auto Tiles",
description="Automatically split image into tiles",
description="Automatically render high resolution images in tiles to reduce memory usage, using the specified tile size. Tiles are cached to disk while rendering to save memory",
default=True,
)
tile_size: IntProperty(
name="Tile Size",
default=2048,
description="",
min=0, max=16384,
min=8, max=16384,
)
# Various fine-tuning debug flags
@@ -1266,12 +1267,16 @@ class CyclesPreferences(bpy.types.AddonPreferences):
def get_device_types(self, context):
import _cycles
has_cuda, has_optix = _cycles.get_device_types()
has_cuda, has_optix, has_hip = _cycles.get_device_types()
list = [('NONE', "None", "Don't use compute device", 0)]
if has_cuda:
list.append(('CUDA', "CUDA", "Use CUDA for GPU acceleration", 1))
if has_optix:
list.append(('OPTIX', "OptiX", "Use OptiX for GPU acceleration", 3))
if has_hip:
list.append(('HIP', "HIP", "Use HIP for GPU acceleration", 4))
return list
compute_device_type: EnumProperty(
@@ -1296,7 +1301,7 @@ class CyclesPreferences(bpy.types.AddonPreferences):
def update_device_entries(self, device_list):
for device in device_list:
if not device[1] in {'CUDA', 'OPTIX', 'CPU'}:
if not device[1] in {'CUDA', 'OPTIX', 'CPU', 'HIP'}:
continue
# Try to find existing Device entry
entry = self.find_existing_device_entry(device)
@@ -1330,7 +1335,7 @@ class CyclesPreferences(bpy.types.AddonPreferences):
elif entry.type == 'CPU':
cpu_devices.append(entry)
# Extend all GPU devices with CPU.
if compute_device_type != 'CPU':
if compute_device_type != 'CPU' and compute_device_type != 'HIP':
devices.extend(cpu_devices)
return devices
@@ -1340,7 +1345,7 @@ class CyclesPreferences(bpy.types.AddonPreferences):
import _cycles
# Ensure `self.devices` is not re-allocated when the second call to
# get_devices_for_type is made, freeing items from the first list.
for device_type in ('CUDA', 'OPTIX', 'OPENCL'):
for device_type in ('CUDA', 'OPTIX', 'HIP'):
self.update_device_entries(_cycles.available_devices(device_type))
# Deprecated: use refresh_devices instead.

View File

@@ -99,6 +99,11 @@ def use_cuda(context):
return (get_device_type(context) == 'CUDA' and cscene.device == 'GPU')
def use_hip(context):
cscene = context.scene.cycles
return (get_device_type(context) == 'HIP' and cscene.device == 'GPU')
def use_optix(context):
cscene = context.scene.cycles
@@ -613,8 +618,8 @@ class CYCLES_RENDER_PT_performance_threads(CyclesButtonsPanel, Panel):
sub.prop(rd, "threads")
class CYCLES_RENDER_PT_performance_tiles(CyclesButtonsPanel, Panel):
bl_label = "Tiles"
class CYCLES_RENDER_PT_performance_memory(CyclesButtonsPanel, Panel):
bl_label = "Memory"
bl_parent_id = "CYCLES_RENDER_PT_performance"
def draw(self, context):
@@ -2107,7 +2112,7 @@ classes = (
CYCLES_RENDER_PT_film_transparency,
CYCLES_RENDER_PT_performance,
CYCLES_RENDER_PT_performance_threads,
CYCLES_RENDER_PT_performance_tiles,
CYCLES_RENDER_PT_performance_memory,
CYCLES_RENDER_PT_performance_acceleration_structure,
CYCLES_RENDER_PT_performance_final_render,
CYCLES_RENDER_PT_performance_viewport,

View File

@@ -283,10 +283,13 @@ static void ExportCurveSegments(Scene *scene, Hair *hair, ParticleCurveData *CDa
return;
Attribute *attr_intercept = NULL;
Attribute *attr_length = NULL;
Attribute *attr_random = NULL;
if (hair->need_attribute(scene, ATTR_STD_CURVE_INTERCEPT))
attr_intercept = hair->attributes.add(ATTR_STD_CURVE_INTERCEPT);
if (hair->need_attribute(scene, ATTR_STD_CURVE_LENGTH))
attr_length = hair->attributes.add(ATTR_STD_CURVE_LENGTH);
if (hair->need_attribute(scene, ATTR_STD_CURVE_RANDOM))
attr_random = hair->attributes.add(ATTR_STD_CURVE_RANDOM);
@@ -336,6 +339,10 @@ static void ExportCurveSegments(Scene *scene, Hair *hair, ParticleCurveData *CDa
num_curve_keys++;
}
if (attr_length != NULL) {
attr_length->add(CData->curve_length[curve]);
}
if (attr_random != NULL) {
attr_random->add(hash_uint2_to_float(num_curves, 0));
}
@@ -657,11 +664,15 @@ static void export_hair_curves(Scene *scene, Hair *hair, BL::Hair b_hair)
/* Add requested attributes. */
Attribute *attr_intercept = NULL;
Attribute *attr_length = NULL;
Attribute *attr_random = NULL;
if (hair->need_attribute(scene, ATTR_STD_CURVE_INTERCEPT)) {
attr_intercept = hair->attributes.add(ATTR_STD_CURVE_INTERCEPT);
}
if (hair->need_attribute(scene, ATTR_STD_CURVE_LENGTH)) {
attr_length = hair->attributes.add(ATTR_STD_CURVE_LENGTH);
}
if (hair->need_attribute(scene, ATTR_STD_CURVE_RANDOM)) {
attr_random = hair->attributes.add(ATTR_STD_CURVE_RANDOM);
}
@@ -714,6 +725,10 @@ static void export_hair_curves(Scene *scene, Hair *hair, BL::Hair b_hair)
}
}
if (attr_length) {
attr_length->add(length);
}
/* Random number per curve. */
if (attr_random != NULL) {
attr_random->add(hash_uint2_to_float(b_curve.index(), 0));

View File

@@ -26,6 +26,7 @@ enum ComputeDevice {
COMPUTE_DEVICE_CPU = 0,
COMPUTE_DEVICE_CUDA = 1,
COMPUTE_DEVICE_OPTIX = 3,
COMPUTE_DEVICE_HIP = 4,
COMPUTE_DEVICE_NUM
};
@@ -81,6 +82,9 @@ DeviceInfo blender_device_info(BL::Preferences &b_preferences, BL::Scene &b_scen
else if (compute_device == COMPUTE_DEVICE_OPTIX) {
mask |= DEVICE_MASK_OPTIX;
}
else if (compute_device == COMPUTE_DEVICE_HIP) {
mask |= DEVICE_MASK_HIP;
}
vector<DeviceInfo> devices = Device::available_devices(mask);
/* Match device preferences and available devices. */

View File

@@ -80,8 +80,10 @@ Geometry *BlenderSync::sync_geometry(BL::Depsgraph &b_depsgraph,
{
/* Test if we can instance or if the object is modified. */
Geometry::Type geom_type = determine_geom_type(b_ob_info, use_particle_hair);
BL::ID b_key_id = (BKE_object_is_modified(b_ob_info.real_object)) ? b_ob_info.real_object :
b_ob_info.object_data;
BL::ID b_key_id = (b_ob_info.is_real_object_data() &&
BKE_object_is_modified(b_ob_info.real_object)) ?
b_ob_info.real_object :
b_ob_info.object_data;
GeometryKey key(b_key_id.ptr.data, geom_type);
/* Find shader indices. */

View File

@@ -485,12 +485,6 @@ void BlenderGPUDisplay::do_draw(const GPUDisplayParams &params)
/* See do_update_begin() for why no locking is required here. */
const bool transparent = true; // TODO(sergey): Derive this from Film.
if (texture_.need_clear) {
/* Texture is requested to be cleared and was not yet cleared.
* Do early return which should be equivalent of drawing all-zero texture. */
return;
}
if (!gl_draw_resources_ensure()) {
return;
}
@@ -499,6 +493,16 @@ void BlenderGPUDisplay::do_draw(const GPUDisplayParams &params)
gl_context_mutex_.lock();
}
if (texture_.need_clear) {
/* Texture is requested to be cleared and was not yet cleared.
*
* Do early return which should be equivalent of drawing all-zero texture.
* Watchout for the lock though so that the clear happening during update is properly
* synchronized here. */
gl_context_mutex_.unlock();
return;
}
if (gl_upload_sync_) {
glWaitSync((GLsync)gl_upload_sync_, 0, GL_TIMEOUT_IGNORED);
}
@@ -524,7 +528,7 @@ void BlenderGPUDisplay::do_draw(const GPUDisplayParams &params)
const float zoomed_width = params.size.x * zoom_.x;
const float zoomed_height = params.size.y * zoom_.y;
if (texture_.width != params.size.x || texture_.height != params.size.y) {
/* Resolution divider is different from 1, force enarest interpolation. */
/* Resolution divider is different from 1, force nearest interpolation. */
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
}
else if (zoomed_width - params.size.x > 0.5f || zoomed_height - params.size.y > 0.5f) {

View File

@@ -911,14 +911,16 @@ static PyObject *enable_print_stats_func(PyObject * /*self*/, PyObject * /*args*
static PyObject *get_device_types_func(PyObject * /*self*/, PyObject * /*args*/)
{
vector<DeviceType> device_types = Device::available_types();
bool has_cuda = false, has_optix = false;
bool has_cuda = false, has_optix = false, has_hip = false;
foreach (DeviceType device_type, device_types) {
has_cuda |= (device_type == DEVICE_CUDA);
has_optix |= (device_type == DEVICE_OPTIX);
has_hip |= (device_type == DEVICE_HIP);
}
PyObject *list = PyTuple_New(2);
PyObject *list = PyTuple_New(3);
PyTuple_SET_ITEM(list, 0, PyBool_FromLong(has_cuda));
PyTuple_SET_ITEM(list, 1, PyBool_FromLong(has_optix));
PyTuple_SET_ITEM(list, 2, PyBool_FromLong(has_hip));
return list;
}
@@ -944,6 +946,9 @@ static PyObject *set_device_override_func(PyObject * /*self*/, PyObject *arg)
else if (override == "OPTIX") {
BlenderSession::device_override = DEVICE_MASK_OPTIX;
}
else if (override == "HIP") {
BlenderSession::device_override = DEVICE_MASK_HIP;
}
else {
printf("\nError: %s is not a valid Cycles device.\n", override.c_str());
Py_RETURN_FALSE;

View File

@@ -71,7 +71,8 @@ BlenderSession::BlenderSession(BL::RenderEngine &b_engine,
width(0),
height(0),
preview_osl(preview_osl),
python_thread_state(NULL)
python_thread_state(NULL),
use_developer_ui(false)
{
/* offline render */
background = true;
@@ -311,6 +312,8 @@ void BlenderSession::read_render_tile()
for (BL::RenderPass &b_pass : b_rlay.passes) {
session->set_render_tile_pixels(b_pass.name(), b_pass.channels(), (float *)b_pass.rect());
}
b_engine.end_result(b_rr, false, false, false);
}
void BlenderSession::write_render_tile()
@@ -557,6 +560,11 @@ void BlenderSession::render(BL::Depsgraph &b_depsgraph_)
/* free result without merging */
b_engine.end_result(b_rr, true, false, false);
/* When tiled rendering is used there will be no "write" done for the tile. Forcefully clear
* highlighted tiles now, so that the highlight will be removed while processing full frame from
* file. */
b_engine.tile_highlight_clear_all();
double total_time, render_time;
session->progress.get_time(total_time, render_time);
VLOG(1) << "Total render time: " << total_time;
@@ -581,6 +589,12 @@ void BlenderSession::render_frame_finish()
for (string_view filename : full_buffer_files_) {
session->process_full_buffer_from_disk(filename);
if (check_and_report_session_error()) {
break;
}
}
for (string_view filename : full_buffer_files_) {
path_remove(filename);
}
@@ -988,8 +1002,9 @@ void BlenderSession::update_status_progress()
get_status(status, substatus);
get_progress(progress, total_time, render_time);
if (progress > 0)
remaining_time = (1.0 - (double)progress) * (render_time / (double)progress);
if (progress > 0) {
remaining_time = session->get_estimated_remaining_time();
}
if (background) {
if (scene)
@@ -1027,20 +1042,27 @@ void BlenderSession::update_status_progress()
last_progress = progress;
}
if (session->progress.get_error()) {
string error = session->progress.get_error_message();
if (error != last_error) {
/* TODO(sergey): Currently C++ RNA API doesn't let us to
* use mnemonic name for the variable. Would be nice to
* have this figured out.
*
* For until then, 1 << 5 means RPT_ERROR.
*/
b_engine.report(1 << 5, error.c_str());
b_engine.error_set(error.c_str());
last_error = error;
}
check_and_report_session_error();
}
bool BlenderSession::check_and_report_session_error()
{
if (!session->progress.get_error()) {
return false;
}
const string error = session->progress.get_error_message();
if (error != last_error) {
/* TODO(sergey): Currently C++ RNA API doesn't let us to use mnemonic name for the variable.
* Would be nice to have this figured out.
*
* For until then, 1 << 5 means RPT_ERROR. */
b_engine.report(1 << 5, error.c_str());
b_engine.error_set(error.c_str());
last_error = error;
}
return true;
}
void BlenderSession::tag_update()

View File

@@ -110,8 +110,7 @@ class BlenderSession {
BL::RenderSettings b_render;
BL::Depsgraph b_depsgraph;
/* NOTE: Blender's scene might become invalid after call
* free_blender_memory_if_possible().
*/
* #free_blender_memory_if_possible(). */
BL::Scene b_scene;
BL::SpaceView3D b_v3d;
BL::RegionView3D b_rv3d;
@@ -147,6 +146,11 @@ class BlenderSession {
protected:
void stamp_view_layer_metadata(Scene *scene, const string &view_layer_name);
/* Check whether session error happened.
* If so, it is reported to the render engine and true is returned.
* Otherwise false is returned. */
bool check_and_report_session_error();
void builtin_images_load();
/* Is used after each render layer synchronization is done with the goal

View File

@@ -855,7 +855,7 @@ SessionParams BlenderSync::get_session_params(BL::RenderEngine &b_engine,
if (background) {
params.use_auto_tile = RNA_boolean_get(&cscene, "use_auto_tile");
params.tile_size = get_int(cscene, "tile_size");
params.tile_size = max(get_int(cscene, "tile_size"), 8);
}
else {
params.use_auto_tile = false;

View File

@@ -90,26 +90,27 @@ static inline BL::Mesh object_to_mesh(BL::BlendData & /*data*/,
}
#endif
BL::Mesh mesh(PointerRNA_NULL);
if (b_ob_info.object_data.is_a(&RNA_Mesh)) {
/* TODO: calc_undeformed is not used. */
mesh = BL::Mesh(b_ob_info.object_data);
BL::Mesh mesh = (b_ob_info.object_data.is_a(&RNA_Mesh)) ? BL::Mesh(b_ob_info.object_data) :
BL::Mesh(PointerRNA_NULL);
/* Make a copy to split faces if we use autosmooth, otherwise not needed.
* Also in edit mode do we need to make a copy, to ensure data layers like
* UV are not empty. */
if (mesh.is_editmode() ||
(mesh.use_auto_smooth() && subdivision_type == Mesh::SUBDIVISION_NONE)) {
if (b_ob_info.is_real_object_data()) {
if (mesh) {
/* Make a copy to split faces if we use autosmooth, otherwise not needed.
* Also in edit mode do we need to make a copy, to ensure data layers like
* UV are not empty. */
if (mesh.is_editmode() ||
(mesh.use_auto_smooth() && subdivision_type == Mesh::SUBDIVISION_NONE)) {
BL::Depsgraph depsgraph(PointerRNA_NULL);
mesh = b_ob_info.real_object.to_mesh(false, depsgraph);
}
}
else {
BL::Depsgraph depsgraph(PointerRNA_NULL);
assert(b_ob_info.is_real_object_data());
mesh = b_ob_info.real_object.to_mesh(false, depsgraph);
}
}
else {
BL::Depsgraph depsgraph(PointerRNA_NULL);
if (b_ob_info.is_real_object_data()) {
mesh = b_ob_info.real_object.to_mesh(false, depsgraph);
}
/* TODO: what to do about non-mesh geometry instances? */
}
#if 0

View File

@@ -213,7 +213,7 @@ static void rtc_filter_occluded_func(const RTCFilterFunctionNArguments *args)
if (ctx->num_hits < ctx->max_hits) {
Intersection current_isect;
kernel_embree_convert_hit(kg, ray, hit, &current_isect);
for (size_t i = 0; i < ctx->max_hits; ++i) {
for (size_t i = 0; i < ctx->num_hits; ++i) {
if (current_isect.object == ctx->isect_s[i].object &&
current_isect.prim == ctx->isect_s[i].prim && current_isect.t == ctx->isect_s[i].t) {
/* This intersection was already recorded, skip it. */

View File

@@ -532,4 +532,13 @@ if(WITH_CYCLES_CUDA_BINARIES OR NOT WITH_CUDA_DYNLOAD)
endif()
endif()
###########################################################################
# HIP
###########################################################################
if(NOT WITH_HIP_DYNLOAD)
set(WITH_HIP_DYNLOAD ON)
endif()
unset(_cycles_lib_dir)

View File

@@ -156,10 +156,16 @@ macro(cycles_target_link_libraries target)
${PLATFORM_LINKLIBS}
)
if(WITH_CUDA_DYNLOAD)
target_link_libraries(${target} extern_cuew)
else()
target_link_libraries(${target} ${CUDA_CUDA_LIBRARY})
if(WITH_CYCLES_DEVICE_CUDA OR WITH_CYCLES_DEVICE_OPTIX)
if(WITH_CUDA_DYNLOAD)
target_link_libraries(${target} extern_cuew)
else()
target_link_libraries(${target} ${CUDA_CUDA_LIBRARY})
endif()
endif()
if(WITH_CYCLES_DEVICE_HIP AND WITH_HIP_DYNLOAD)
target_link_libraries(${target} extern_hipew)
endif()
if(CYCLES_STANDALONE_REPOSITORY)

View File

@@ -22,16 +22,25 @@ set(INC_SYS
../../../extern/clew/include
)
if(WITH_CUDA_DYNLOAD)
if(WITH_CYCLES_DEVICE_OPTIX OR WITH_CYCLES_DEVICE_CUDA)
if(WITH_CUDA_DYNLOAD)
list(APPEND INC
../../../extern/cuew/include
)
add_definitions(-DWITH_CUDA_DYNLOAD)
else()
list(APPEND INC_SYS
${CUDA_TOOLKIT_INCLUDE}
)
add_definitions(-DCYCLES_CUDA_NVCC_EXECUTABLE="${CUDA_NVCC_EXECUTABLE}")
endif()
endif()
if(WITH_CYCLES_DEVICE_HIP AND WITH_HIP_DYNLOAD)
list(APPEND INC
../../../extern/cuew/include
../../../extern/hipew/include
)
add_definitions(-DWITH_CUDA_DYNLOAD)
else()
list(APPEND INC_SYS
${CUDA_TOOLKIT_INCLUDE}
)
add_definitions(-DCYCLES_CUDA_NVCC_EXECUTABLE="${CUDA_NVCC_EXECUTABLE}")
add_definitions(-DWITH_HIP_DYNLOAD)
endif()
set(SRC
@@ -70,6 +79,21 @@ set(SRC_CUDA
cuda/util.h
)
set(SRC_HIP
hip/device.cpp
hip/device.h
hip/device_impl.cpp
hip/device_impl.h
hip/graphics_interop.cpp
hip/graphics_interop.h
hip/kernel.cpp
hip/kernel.h
hip/queue.cpp
hip/queue.h
hip/util.cpp
hip/util.h
)
set(SRC_DUMMY
dummy/device.cpp
dummy/device.h
@@ -105,13 +129,21 @@ set(LIB
${CYCLES_GL_LIBRARIES}
)
if(WITH_CUDA_DYNLOAD)
if(WITH_CYCLES_DEVICE_OPTIX OR WITH_CYCLES_DEVICE_CUDA)
if(WITH_CUDA_DYNLOAD)
list(APPEND LIB
extern_cuew
)
else()
list(APPEND LIB
${CUDA_CUDA_LIBRARY}
)
endif()
endif()
if(WITH_CYCLES_DEVICE_HIP AND WITH_HIP_DYNLOAD)
list(APPEND LIB
extern_cuew
)
else()
list(APPEND LIB
${CUDA_CUDA_LIBRARY}
extern_hipew
)
endif()
@@ -120,6 +152,9 @@ add_definitions(${GL_DEFINITIONS})
if(WITH_CYCLES_DEVICE_CUDA)
add_definitions(-DWITH_CUDA)
endif()
if(WITH_CYCLES_DEVICE_HIP)
add_definitions(-DWITH_HIP)
endif()
if(WITH_CYCLES_DEVICE_OPTIX)
add_definitions(-DWITH_OPTIX)
endif()
@@ -140,6 +175,7 @@ cycles_add_library(cycles_device "${LIB}"
${SRC}
${SRC_CPU}
${SRC_CUDA}
${SRC_HIP}
${SRC_DUMMY}
${SRC_MULTI}
${SRC_OPTIX}

View File

@@ -54,7 +54,6 @@
#include "util/util_function.h"
#include "util/util_logging.h"
#include "util/util_map.h"
#include "util/util_opengl.h"
#include "util/util_openimagedenoise.h"
#include "util/util_optimization.h"
#include "util/util_progress.h"
@@ -170,7 +169,7 @@ void CPUDevice::mem_copy_to(device_memory &mem)
}
void CPUDevice::mem_copy_from(
device_memory & /*mem*/, int /*y*/, int /*w*/, int /*h*/, int /*elem*/)
device_memory & /*mem*/, size_t /*y*/, size_t /*w*/, size_t /*h*/, size_t /*elem*/)
{
/* no-op */
}
@@ -204,7 +203,7 @@ void CPUDevice::mem_free(device_memory &mem)
}
}
device_ptr CPUDevice::mem_alloc_sub_ptr(device_memory &mem, int offset, int /*size*/)
device_ptr CPUDevice::mem_alloc_sub_ptr(device_memory &mem, size_t offset, size_t /*size*/)
{
return (device_ptr)(((char *)mem.device_pointer) + mem.memory_elements_size(offset));
}
@@ -298,154 +297,6 @@ void CPUDevice::build_bvh(BVH *bvh, Progress &progress, bool refit)
Device::build_bvh(bvh, progress, refit);
}
#if 0
void CPUDevice::render(DeviceTask &task, RenderTile &tile, KernelGlobals *kg)
{
const bool use_coverage = kernel_data.film.cryptomatte_passes & CRYPT_ACCURATE;
scoped_timer timer(&tile.buffers->render_time);
Coverage coverage(kg, tile);
if (use_coverage) {
coverage.init_path_trace();
}
float *render_buffer = (float *)tile.buffer;
int start_sample = tile.start_sample;
int end_sample = tile.start_sample + tile.num_samples;
/* Needed for Embree. */
SIMD_SET_FLUSH_TO_ZERO;
for (int sample = start_sample; sample < end_sample; sample++) {
if (task.get_cancel() || TaskPool::canceled()) {
if (task.need_finish_queue == false)
break;
}
if (tile.stealing_state == RenderTile::CAN_BE_STOLEN && task.get_tile_stolen()) {
tile.stealing_state = RenderTile::WAS_STOLEN;
break;
}
if (tile.task == RenderTile::PATH_TRACE) {
for (int y = tile.y; y < tile.y + tile.h; y++) {
for (int x = tile.x; x < tile.x + tile.w; x++) {
if (use_coverage) {
coverage.init_pixel(x, y);
}
kernels.path_trace(kg, render_buffer, sample, x, y, tile.offset, tile.stride);
}
}
}
else {
for (int y = tile.y; y < tile.y + tile.h; y++) {
for (int x = tile.x; x < tile.x + tile.w; x++) {
kernels.bake(kg, render_buffer, sample, x, y, tile.offset, tile.stride);
}
}
}
tile.sample = sample + 1;
if (task.adaptive_sampling.use && task.adaptive_sampling.need_filter(sample)) {
const bool stop = adaptive_sampling_filter(kg, tile, sample);
if (stop) {
const int num_progress_samples = end_sample - sample;
tile.sample = end_sample;
task.update_progress(&tile, tile.w * tile.h * num_progress_samples);
break;
}
}
task.update_progress(&tile, tile.w * tile.h);
}
if (use_coverage) {
coverage.finalize();
}
if (task.adaptive_sampling.use && (tile.stealing_state != RenderTile::WAS_STOLEN)) {
adaptive_sampling_post(tile, kg);
}
}
void CPUDevice::thread_render(DeviceTask &task)
{
if (TaskPool::canceled()) {
if (task.need_finish_queue == false)
return;
}
/* allocate buffer for kernel globals */
CPUKernelThreadGlobals kg(kernel_globals, get_cpu_osl_memory());
profiler.add_state(&kg.profiler);
/* NLM denoiser. */
DenoisingTask *denoising = NULL;
/* OpenImageDenoise: we can only denoise with one thread at a time, so to
* avoid waiting with mutex locks in the denoiser, we let only a single
* thread acquire denoising tiles. */
uint tile_types = task.tile_types;
bool hold_denoise_lock = false;
if ((tile_types & RenderTile::DENOISE) && task.denoising.type == DENOISER_OPENIMAGEDENOISE) {
if (!oidn_task_lock.try_lock()) {
tile_types &= ~RenderTile::DENOISE;
hold_denoise_lock = true;
}
}
RenderTile tile;
while (task.acquire_tile(this, tile, tile_types)) {
if (tile.task == RenderTile::PATH_TRACE) {
render(task, tile, &kg);
}
else if (tile.task == RenderTile::BAKE) {
render(task, tile, &kg);
}
else if (tile.task == RenderTile::DENOISE) {
denoise_openimagedenoise(task, tile);
task.update_progress(&tile, tile.w * tile.h);
}
task.release_tile(tile);
if (TaskPool::canceled()) {
if (task.need_finish_queue == false)
break;
}
}
if (hold_denoise_lock) {
oidn_task_lock.unlock();
}
profiler.remove_state(&kg.profiler);
delete denoising;
}
void CPUDevice::thread_denoise(DeviceTask &task)
{
RenderTile tile;
tile.x = task.x;
tile.y = task.y;
tile.w = task.w;
tile.h = task.h;
tile.buffer = task.buffer;
tile.sample = task.sample + task.num_samples;
tile.num_samples = task.num_samples;
tile.start_sample = task.sample;
tile.offset = task.offset;
tile.stride = task.stride;
tile.buffers = task.buffers;
denoise_openimagedenoise(task, tile);
task.update_progress(&tile, tile.w * tile.h);
}
#endif
const CPUKernels *CPUDevice::get_cpu_kernels() const
{
return &kernels;

View File

@@ -72,10 +72,13 @@ class CPUDevice : public Device {
virtual void mem_alloc(device_memory &mem) override;
virtual void mem_copy_to(device_memory &mem) override;
virtual void mem_copy_from(device_memory &mem, int y, int w, int h, int elem) override;
virtual void mem_copy_from(
device_memory &mem, size_t y, size_t w, size_t h, size_t elem) override;
virtual void mem_zero(device_memory &mem) override;
virtual void mem_free(device_memory &mem) override;
virtual device_ptr mem_alloc_sub_ptr(device_memory &mem, int offset, int /*size*/) override;
virtual device_ptr mem_alloc_sub_ptr(device_memory &mem,
size_t offset,
size_t /*size*/) override;
virtual void const_copy_to(const char *name, void *host, size_t size) override;

View File

@@ -31,7 +31,6 @@
# include "util/util_logging.h"
# include "util/util_map.h"
# include "util/util_md5.h"
# include "util/util_opengl.h"
# include "util/util_path.h"
# include "util/util_string.h"
# include "util/util_system.h"
@@ -837,7 +836,7 @@ void CUDADevice::mem_copy_to(device_memory &mem)
}
}
void CUDADevice::mem_copy_from(device_memory &mem, int y, int w, int h, int elem)
void CUDADevice::mem_copy_from(device_memory &mem, size_t y, size_t w, size_t h, size_t elem)
{
if (mem.type == MEM_TEXTURE || mem.type == MEM_GLOBAL) {
assert(!"mem_copy_from not supported for textures.");
@@ -891,7 +890,7 @@ void CUDADevice::mem_free(device_memory &mem)
}
}
device_ptr CUDADevice::mem_alloc_sub_ptr(device_memory &mem, int offset, int /*size*/)
device_ptr CUDADevice::mem_alloc_sub_ptr(device_memory &mem, size_t offset, size_t /*size*/)
{
return (device_ptr)(((char *)mem.device_pointer) + mem.memory_elements_size(offset));
}
@@ -1169,141 +1168,6 @@ void CUDADevice::tex_free(device_texture &mem)
}
}
# if 0
void CUDADevice::render(DeviceTask &task,
RenderTile &rtile,
device_vector<KernelWorkTile> &work_tiles)
{
scoped_timer timer(&rtile.buffers->render_time);
if (have_error())
return;
CUDAContextScope scope(this);
CUfunction cuRender;
/* Get kernel function. */
if (rtile.task == RenderTile::BAKE) {
cuda_assert(cuModuleGetFunction(&cuRender, cuModule, "kernel_cuda_bake"));
}
else {
cuda_assert(cuModuleGetFunction(&cuRender, cuModule, "kernel_cuda_path_trace"));
}
if (have_error()) {
return;
}
cuda_assert(cuFuncSetCacheConfig(cuRender, CU_FUNC_CACHE_PREFER_L1));
/* Allocate work tile. */
work_tiles.alloc(1);
KernelWorkTile *wtile = work_tiles.data();
wtile->x = rtile.x;
wtile->y = rtile.y;
wtile->w = rtile.w;
wtile->h = rtile.h;
wtile->offset = rtile.offset;
wtile->stride = rtile.stride;
wtile->buffer = (float *)(CUdeviceptr)rtile.buffer;
/* Prepare work size. More step samples render faster, but for now we
* remain conservative for GPUs connected to a display to avoid driver
* timeouts and display freezing. */
int min_blocks, num_threads_per_block;
cuda_assert(
cuOccupancyMaxPotentialBlockSize(&min_blocks, &num_threads_per_block, cuRender, NULL, 0, 0));
if (!info.display_device) {
min_blocks *= 8;
}
uint step_samples = divide_up(min_blocks * num_threads_per_block, wtile->w * wtile->h);
/* Render all samples. */
uint start_sample = rtile.start_sample;
uint end_sample = rtile.start_sample + rtile.num_samples;
for (int sample = start_sample; sample < end_sample;) {
/* Setup and copy work tile to device. */
wtile->start_sample = sample;
wtile->num_samples = step_samples;
if (task.adaptive_sampling.use) {
wtile->num_samples = task.adaptive_sampling.align_samples(sample, step_samples);
}
wtile->num_samples = min(wtile->num_samples, end_sample - sample);
work_tiles.copy_to_device();
CUdeviceptr d_work_tiles = (CUdeviceptr)work_tiles.device_pointer;
uint total_work_size = wtile->w * wtile->h * wtile->num_samples;
uint num_blocks = divide_up(total_work_size, num_threads_per_block);
/* Launch kernel. */
void *args[] = {&d_work_tiles, &total_work_size};
cuda_assert(
cuLaunchKernel(cuRender, num_blocks, 1, 1, num_threads_per_block, 1, 1, 0, 0, args, 0));
/* Run the adaptive sampling kernels at selected samples aligned to step samples. */
uint filter_sample = sample + wtile->num_samples - 1;
if (task.adaptive_sampling.use && task.adaptive_sampling.need_filter(filter_sample)) {
adaptive_sampling_filter(filter_sample, wtile, d_work_tiles);
}
cuda_assert(cuCtxSynchronize());
/* Update progress. */
sample += wtile->num_samples;
rtile.sample = sample;
task.update_progress(&rtile, rtile.w * rtile.h * wtile->num_samples);
if (task.get_cancel()) {
if (task.need_finish_queue == false)
break;
}
}
/* Finalize adaptive sampling. */
if (task.adaptive_sampling.use) {
CUdeviceptr d_work_tiles = (CUdeviceptr)work_tiles.device_pointer;
adaptive_sampling_post(rtile, wtile, d_work_tiles);
cuda_assert(cuCtxSynchronize());
task.update_progress(&rtile, rtile.w * rtile.h * wtile->num_samples);
}
}
void CUDADevice::thread_run(DeviceTask &task)
{
CUDAContextScope scope(this);
if (task.type == DeviceTask::RENDER) {
device_vector<KernelWorkTile> work_tiles(this, "work_tiles", MEM_READ_ONLY);
/* keep rendering tiles until done */
RenderTile tile;
DenoisingTask denoising(this, task);
while (task.acquire_tile(this, tile, task.tile_types)) {
if (tile.task == RenderTile::PATH_TRACE) {
render(task, tile, work_tiles);
}
else if (tile.task == RenderTile::BAKE) {
render(task, tile, work_tiles);
}
task.release_tile(tile);
if (task.get_cancel()) {
if (task.need_finish_queue == false)
break;
}
}
work_tiles.free();
}
}
# endif
unique_ptr<DeviceQueue> CUDADevice::gpu_queue_create()
{
return make_unique<CUDADeviceQueue>(this);

View File

@@ -26,7 +26,6 @@
# ifdef WITH_CUDA_DYNLOAD
# include "cuew.h"
# else
# include "util/util_opengl.h"
# include <cuda.h>
# include <cudaGL.h>
# endif
@@ -120,13 +119,13 @@ class CUDADevice : public Device {
void mem_copy_to(device_memory &mem) override;
void mem_copy_from(device_memory &mem, int y, int w, int h, int elem) override;
void mem_copy_from(device_memory &mem, size_t y, size_t w, size_t h, size_t elem) override;
void mem_zero(device_memory &mem) override;
void mem_free(device_memory &mem) override;
device_ptr mem_alloc_sub_ptr(device_memory &mem, int offset, int /*size*/) override;
device_ptr mem_alloc_sub_ptr(device_memory &mem, size_t offset, size_t /*size*/) override;
virtual void const_copy_to(const char *name, void *host, size_t size) override;

View File

@@ -116,18 +116,18 @@ bool CUDADeviceQueue::enqueue(DeviceKernel kernel, const int work_size, void *ar
}
/* Launch kernel. */
cuda_device_assert(cuda_device_,
cuLaunchKernel(cuda_kernel.function,
num_blocks,
1,
1,
num_threads_per_block,
1,
1,
shared_mem_bytes,
cuda_stream_,
args,
0));
assert_success(cuLaunchKernel(cuda_kernel.function,
num_blocks,
1,
1,
num_threads_per_block,
1,
1,
shared_mem_bytes,
cuda_stream_,
args,
0),
"enqueue");
return !(cuda_device_->have_error());
}
@@ -139,7 +139,8 @@ bool CUDADeviceQueue::synchronize()
}
const CUDAContextScope scope(cuda_device_);
cuda_device_assert(cuda_device_, cuStreamSynchronize(cuda_stream_));
assert_success(cuStreamSynchronize(cuda_stream_), "synchronize");
debug_synchronize();
return !(cuda_device_->have_error());
@@ -162,9 +163,9 @@ void CUDADeviceQueue::zero_to_device(device_memory &mem)
assert(mem.device_pointer != 0);
const CUDAContextScope scope(cuda_device_);
cuda_device_assert(
cuda_device_,
cuMemsetD8Async((CUdeviceptr)mem.device_pointer, 0, mem.memory_size(), cuda_stream_));
assert_success(
cuMemsetD8Async((CUdeviceptr)mem.device_pointer, 0, mem.memory_size(), cuda_stream_),
"zero_to_device");
}
void CUDADeviceQueue::copy_to_device(device_memory &mem)
@@ -185,10 +186,10 @@ void CUDADeviceQueue::copy_to_device(device_memory &mem)
/* Copy memory to device. */
const CUDAContextScope scope(cuda_device_);
cuda_device_assert(
cuda_device_,
assert_success(
cuMemcpyHtoDAsync(
(CUdeviceptr)mem.device_pointer, mem.host_pointer, mem.memory_size(), cuda_stream_));
(CUdeviceptr)mem.device_pointer, mem.host_pointer, mem.memory_size(), cuda_stream_),
"copy_to_device");
}
void CUDADeviceQueue::copy_from_device(device_memory &mem)
@@ -204,10 +205,19 @@ void CUDADeviceQueue::copy_from_device(device_memory &mem)
/* Copy memory from device. */
const CUDAContextScope scope(cuda_device_);
cuda_device_assert(
cuda_device_,
assert_success(
cuMemcpyDtoHAsync(
mem.host_pointer, (CUdeviceptr)mem.device_pointer, mem.memory_size(), cuda_stream_));
mem.host_pointer, (CUdeviceptr)mem.device_pointer, mem.memory_size(), cuda_stream_),
"copy_from_device");
}
void CUDADeviceQueue::assert_success(CUresult result, const char *operation)
{
if (result != CUDA_SUCCESS) {
const char *name = cuewErrorString(result);
cuda_device_->set_error(string_printf(
"%s in CUDA queue %s (%s)", name, operation, debug_active_kernels().c_str()));
}
}
unique_ptr<DeviceGraphicsInterop> CUDADeviceQueue::graphics_interop_create()

View File

@@ -60,6 +60,8 @@ class CUDADeviceQueue : public DeviceQueue {
protected:
CUDADevice *cuda_device_;
CUstream cuda_stream_;
void assert_success(CUresult result, const char *operation);
};
CCL_NAMESPACE_END

View File

@@ -25,6 +25,7 @@
#include "device/cpu/device.h"
#include "device/cuda/device.h"
#include "device/dummy/device.h"
#include "device/hip/device.h"
#include "device/multi/device.h"
#include "device/optix/device.h"
@@ -32,7 +33,6 @@
#include "util/util_half.h"
#include "util/util_logging.h"
#include "util/util_math.h"
#include "util/util_opengl.h"
#include "util/util_string.h"
#include "util/util_system.h"
#include "util/util_time.h"
@@ -47,6 +47,7 @@ thread_mutex Device::device_mutex;
vector<DeviceInfo> Device::cuda_devices;
vector<DeviceInfo> Device::optix_devices;
vector<DeviceInfo> Device::cpu_devices;
vector<DeviceInfo> Device::hip_devices;
uint Device::devices_initialized_mask = 0;
/* Device */
@@ -97,6 +98,14 @@ Device *Device::create(const DeviceInfo &info, Stats &stats, Profiler &profiler)
device = device_optix_create(info, stats, profiler);
break;
#endif
#ifdef WITH_HIP
case DEVICE_HIP:
if (device_hip_init())
device = device_hip_create(info, stats, profiler);
break;
#endif
default:
break;
}
@@ -118,6 +127,8 @@ DeviceType Device::type_from_string(const char *name)
return DEVICE_OPTIX;
else if (strcmp(name, "MULTI") == 0)
return DEVICE_MULTI;
else if (strcmp(name, "HIP") == 0)
return DEVICE_HIP;
return DEVICE_NONE;
}
@@ -132,6 +143,8 @@ string Device::string_from_type(DeviceType type)
return "OPTIX";
else if (type == DEVICE_MULTI)
return "MULTI";
else if (type == DEVICE_HIP)
return "HIP";
return "";
}
@@ -146,6 +159,10 @@ vector<DeviceType> Device::available_types()
#ifdef WITH_OPTIX
types.push_back(DEVICE_OPTIX);
#endif
#ifdef WITH_HIP
types.push_back(DEVICE_HIP);
#endif
return types;
}
@@ -187,6 +204,20 @@ vector<DeviceInfo> Device::available_devices(uint mask)
}
#endif
#ifdef WITH_HIP
if (mask & DEVICE_MASK_HIP) {
if (!(devices_initialized_mask & DEVICE_MASK_HIP)) {
if (device_hip_init()) {
device_hip_info(hip_devices);
}
devices_initialized_mask |= DEVICE_MASK_HIP;
}
foreach (DeviceInfo &info, hip_devices) {
devices.push_back(info);
}
}
#endif
if (mask & DEVICE_MASK_CPU) {
if (!(devices_initialized_mask & DEVICE_MASK_CPU)) {
device_cpu_info(cpu_devices);
@@ -227,6 +258,15 @@ string Device::device_capabilities(uint mask)
}
#endif
#ifdef WITH_HIP
if (mask & DEVICE_MASK_HIP) {
if (device_hip_init()) {
capabilities += "\nHIP device capabilities:\n";
capabilities += device_hip_capabilities();
}
}
#endif
return capabilities;
}
@@ -315,6 +355,7 @@ void Device::free_memory()
devices_initialized_mask = 0;
cuda_devices.free_memory();
optix_devices.free_memory();
hip_devices.free_memory();
cpu_devices.free_memory();
}

View File

@@ -51,6 +51,7 @@ enum DeviceType {
DEVICE_CUDA,
DEVICE_MULTI,
DEVICE_OPTIX,
DEVICE_HIP,
DEVICE_DUMMY,
};
@@ -58,6 +59,7 @@ enum DeviceTypeMask {
DEVICE_MASK_CPU = (1 << DEVICE_CPU),
DEVICE_MASK_CUDA = (1 << DEVICE_CUDA),
DEVICE_MASK_OPTIX = (1 << DEVICE_OPTIX),
DEVICE_MASK_HIP = (1 << DEVICE_HIP),
DEVICE_MASK_ALL = ~0
};
@@ -119,7 +121,7 @@ class Device {
string error_msg;
virtual device_ptr mem_alloc_sub_ptr(device_memory & /*mem*/, int /*offset*/, int /*size*/)
virtual device_ptr mem_alloc_sub_ptr(device_memory & /*mem*/, size_t /*offset*/, size_t /*size*/)
{
/* Only required for devices that implement denoising. */
assert(false);
@@ -273,7 +275,7 @@ class Device {
virtual void mem_alloc(device_memory &mem) = 0;
virtual void mem_copy_to(device_memory &mem) = 0;
virtual void mem_copy_from(device_memory &mem, int y, int w, int h, int elem) = 0;
virtual void mem_copy_from(device_memory &mem, size_t y, size_t w, size_t h, size_t elem) = 0;
virtual void mem_zero(device_memory &mem) = 0;
virtual void mem_free(device_memory &mem) = 0;
@@ -284,6 +286,7 @@ class Device {
static vector<DeviceInfo> cuda_devices;
static vector<DeviceInfo> optix_devices;
static vector<DeviceInfo> cpu_devices;
static vector<DeviceInfo> hip_devices;
static uint devices_initialized_mask;
};

View File

@@ -136,7 +136,7 @@ void device_memory::device_copy_to()
}
}
void device_memory::device_copy_from(int y, int w, int h, int elem)
void device_memory::device_copy_from(size_t y, size_t w, size_t h, size_t elem)
{
assert(type != MEM_TEXTURE && type != MEM_READ_ONLY && type != MEM_GLOBAL);
device->mem_copy_from(*this, y, w, h, elem);
@@ -181,7 +181,7 @@ bool device_memory::is_resident(Device *sub_device) const
/* Device Sub Ptr */
device_sub_ptr::device_sub_ptr(device_memory &mem, int offset, int size) : device(mem.device)
device_sub_ptr::device_sub_ptr(device_memory &mem, size_t offset, size_t size) : device(mem.device)
{
ptr = device->mem_alloc_sub_ptr(mem, offset, size);
}

View File

@@ -81,154 +81,154 @@ static constexpr size_t datatype_size(DataType datatype)
template<typename T> struct device_type_traits {
static const DataType data_type = TYPE_UNKNOWN;
static const int num_elements_cpu = sizeof(T);
static const int num_elements_gpu = sizeof(T);
static const size_t num_elements_cpu = sizeof(T);
static const size_t num_elements_gpu = sizeof(T);
};
template<> struct device_type_traits<uchar> {
static const DataType data_type = TYPE_UCHAR;
static const int num_elements_cpu = 1;
static const int num_elements_gpu = 1;
static const size_t num_elements_cpu = 1;
static const size_t num_elements_gpu = 1;
static_assert(sizeof(uchar) == num_elements_cpu * datatype_size(data_type));
};
template<> struct device_type_traits<uchar2> {
static const DataType data_type = TYPE_UCHAR;
static const int num_elements_cpu = 2;
static const int num_elements_gpu = 2;
static const size_t num_elements_cpu = 2;
static const size_t num_elements_gpu = 2;
static_assert(sizeof(uchar2) == num_elements_cpu * datatype_size(data_type));
};
template<> struct device_type_traits<uchar3> {
static const DataType data_type = TYPE_UCHAR;
static const int num_elements_cpu = 3;
static const int num_elements_gpu = 3;
static const size_t num_elements_cpu = 3;
static const size_t num_elements_gpu = 3;
static_assert(sizeof(uchar3) == num_elements_cpu * datatype_size(data_type));
};
template<> struct device_type_traits<uchar4> {
static const DataType data_type = TYPE_UCHAR;
static const int num_elements_cpu = 4;
static const int num_elements_gpu = 4;
static const size_t num_elements_cpu = 4;
static const size_t num_elements_gpu = 4;
static_assert(sizeof(uchar4) == num_elements_cpu * datatype_size(data_type));
};
template<> struct device_type_traits<uint> {
static const DataType data_type = TYPE_UINT;
static const int num_elements_cpu = 1;
static const int num_elements_gpu = 1;
static const size_t num_elements_cpu = 1;
static const size_t num_elements_gpu = 1;
static_assert(sizeof(uint) == num_elements_cpu * datatype_size(data_type));
};
template<> struct device_type_traits<uint2> {
static const DataType data_type = TYPE_UINT;
static const int num_elements_cpu = 2;
static const int num_elements_gpu = 2;
static const size_t num_elements_cpu = 2;
static const size_t num_elements_gpu = 2;
static_assert(sizeof(uint2) == num_elements_cpu * datatype_size(data_type));
};
template<> struct device_type_traits<uint3> {
static const DataType data_type = TYPE_UINT;
static const int num_elements_cpu = 3;
static const int num_elements_gpu = 3;
static const size_t num_elements_cpu = 3;
static const size_t num_elements_gpu = 3;
static_assert(sizeof(uint3) == num_elements_cpu * datatype_size(data_type));
};
template<> struct device_type_traits<uint4> {
static const DataType data_type = TYPE_UINT;
static const int num_elements_cpu = 4;
static const int num_elements_gpu = 4;
static const size_t num_elements_cpu = 4;
static const size_t num_elements_gpu = 4;
static_assert(sizeof(uint4) == num_elements_cpu * datatype_size(data_type));
};
template<> struct device_type_traits<int> {
static const DataType data_type = TYPE_INT;
static const int num_elements_cpu = 1;
static const int num_elements_gpu = 1;
static const size_t num_elements_cpu = 1;
static const size_t num_elements_gpu = 1;
static_assert(sizeof(int) == num_elements_cpu * datatype_size(data_type));
};
template<> struct device_type_traits<int2> {
static const DataType data_type = TYPE_INT;
static const int num_elements_cpu = 2;
static const int num_elements_gpu = 2;
static const size_t num_elements_cpu = 2;
static const size_t num_elements_gpu = 2;
static_assert(sizeof(int2) == num_elements_cpu * datatype_size(data_type));
};
template<> struct device_type_traits<int3> {
static const DataType data_type = TYPE_INT;
static const int num_elements_cpu = 4;
static const int num_elements_gpu = 3;
static const size_t num_elements_cpu = 4;
static const size_t num_elements_gpu = 3;
static_assert(sizeof(int3) == num_elements_cpu * datatype_size(data_type));
};
template<> struct device_type_traits<int4> {
static const DataType data_type = TYPE_INT;
static const int num_elements_cpu = 4;
static const int num_elements_gpu = 4;
static const size_t num_elements_cpu = 4;
static const size_t num_elements_gpu = 4;
static_assert(sizeof(int4) == num_elements_cpu * datatype_size(data_type));
};
template<> struct device_type_traits<float> {
static const DataType data_type = TYPE_FLOAT;
static const int num_elements_cpu = 1;
static const int num_elements_gpu = 1;
static const size_t num_elements_cpu = 1;
static const size_t num_elements_gpu = 1;
static_assert(sizeof(float) == num_elements_cpu * datatype_size(data_type));
};
template<> struct device_type_traits<float2> {
static const DataType data_type = TYPE_FLOAT;
static const int num_elements_cpu = 2;
static const int num_elements_gpu = 2;
static const size_t num_elements_cpu = 2;
static const size_t num_elements_gpu = 2;
static_assert(sizeof(float2) == num_elements_cpu * datatype_size(data_type));
};
template<> struct device_type_traits<float3> {
static const DataType data_type = TYPE_FLOAT;
static const int num_elements_cpu = 4;
static const int num_elements_gpu = 3;
static const size_t num_elements_cpu = 4;
static const size_t num_elements_gpu = 3;
static_assert(sizeof(float3) == num_elements_cpu * datatype_size(data_type));
};
template<> struct device_type_traits<float4> {
static const DataType data_type = TYPE_FLOAT;
static const int num_elements_cpu = 4;
static const int num_elements_gpu = 4;
static const size_t num_elements_cpu = 4;
static const size_t num_elements_gpu = 4;
static_assert(sizeof(float4) == num_elements_cpu * datatype_size(data_type));
};
template<> struct device_type_traits<half> {
static const DataType data_type = TYPE_HALF;
static const int num_elements_cpu = 1;
static const int num_elements_gpu = 1;
static const size_t num_elements_cpu = 1;
static const size_t num_elements_gpu = 1;
static_assert(sizeof(half) == num_elements_cpu * datatype_size(data_type));
};
template<> struct device_type_traits<ushort4> {
static const DataType data_type = TYPE_UINT16;
static const int num_elements_cpu = 4;
static const int num_elements_gpu = 4;
static const size_t num_elements_cpu = 4;
static const size_t num_elements_gpu = 4;
static_assert(sizeof(ushort4) == num_elements_cpu * datatype_size(data_type));
};
template<> struct device_type_traits<uint16_t> {
static const DataType data_type = TYPE_UINT16;
static const int num_elements_cpu = 1;
static const int num_elements_gpu = 1;
static const size_t num_elements_cpu = 1;
static const size_t num_elements_gpu = 1;
static_assert(sizeof(uint16_t) == num_elements_cpu * datatype_size(data_type));
};
template<> struct device_type_traits<half4> {
static const DataType data_type = TYPE_HALF;
static const int num_elements_cpu = 4;
static const int num_elements_gpu = 4;
static const size_t num_elements_cpu = 4;
static const size_t num_elements_gpu = 4;
static_assert(sizeof(half4) == num_elements_cpu * datatype_size(data_type));
};
template<> struct device_type_traits<uint64_t> {
static const DataType data_type = TYPE_UINT64;
static const int num_elements_cpu = 1;
static const int num_elements_gpu = 1;
static const size_t num_elements_cpu = 1;
static const size_t num_elements_gpu = 1;
static_assert(sizeof(uint64_t) == num_elements_cpu * datatype_size(data_type));
};
@@ -277,6 +277,7 @@ class device_memory {
protected:
friend class CUDADevice;
friend class OptiXDevice;
friend class HIPDevice;
/* Only create through subclasses. */
device_memory(Device *device, const char *name, MemoryType type);
@@ -296,7 +297,7 @@ class device_memory {
void device_alloc();
void device_free();
void device_copy_to();
void device_copy_from(int y, int w, int h, int elem);
void device_copy_from(size_t y, size_t w, size_t h, size_t elem);
void device_zero();
bool device_is_cpu();
@@ -565,7 +566,7 @@ template<typename T> class device_vector : public device_memory {
device_copy_from(0, data_width, (data_height == 0) ? 1 : data_height, sizeof(T));
}
void copy_from_device(int y, int w, int h)
void copy_from_device(size_t y, size_t w, size_t h)
{
device_copy_from(y, w, h, sizeof(T));
}
@@ -601,7 +602,7 @@ template<typename T> class device_vector : public device_memory {
class device_sub_ptr {
public:
device_sub_ptr(device_memory &mem, int offset, int size);
device_sub_ptr(device_memory &mem, size_t offset, size_t size);
~device_sub_ptr();
device_ptr operator*() const

View File

@@ -57,8 +57,9 @@ void DeviceQueue::debug_init_execution()
{
if (VLOG_IS_ON(3)) {
last_sync_time_ = time_dt();
last_kernels_enqueued_ = 0;
}
last_kernels_enqueued_ = 0;
}
void DeviceQueue::debug_enqueue(DeviceKernel kernel, const int work_size)
@@ -66,8 +67,9 @@ void DeviceQueue::debug_enqueue(DeviceKernel kernel, const int work_size)
if (VLOG_IS_ON(3)) {
VLOG(4) << "GPU queue launch " << device_kernel_as_string(kernel) << ", work_size "
<< work_size;
last_kernels_enqueued_ |= (uint64_t(1) << (uint64_t)kernel);
}
last_kernels_enqueued_ |= (uint64_t(1) << (uint64_t)kernel);
}
void DeviceQueue::debug_synchronize()
@@ -80,8 +82,14 @@ void DeviceQueue::debug_synchronize()
stats_kernel_time_[last_kernels_enqueued_] += elapsed_time;
last_sync_time_ = new_time;
last_kernels_enqueued_ = 0;
}
last_kernels_enqueued_ = 0;
}
string DeviceQueue::debug_active_kernels()
{
return device_kernel_mask_as_string(last_kernels_enqueued_);
}
CCL_NAMESPACE_END

View File

@@ -21,6 +21,7 @@
#include "device/device_graphics_interop.h"
#include "util/util_logging.h"
#include "util/util_map.h"
#include "util/util_string.h"
#include "util/util_unique_ptr.h"
CCL_NAMESPACE_BEGIN
@@ -101,6 +102,7 @@ class DeviceQueue {
void debug_init_execution();
void debug_enqueue(DeviceKernel kernel, const int work_size);
void debug_synchronize();
string debug_active_kernels();
/* Combination of kernels enqueued together sync last synchronize. */
DeviceKernelMask last_kernels_enqueued_;

View File

@@ -48,7 +48,7 @@ class DummyDevice : public Device {
{
}
virtual void mem_copy_from(device_memory &, int, int, int, int) override
virtual void mem_copy_from(device_memory &, size_t, size_t, size_t, size_t) override
{
}

View File

@@ -0,0 +1,276 @@
/*
* Copyright 2011-2021 Blender Foundation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "device/hip/device.h"
#include "util/util_logging.h"
#ifdef WITH_HIP
# include "device/device.h"
# include "device/hip/device_impl.h"
# include "util/util_string.h"
# include "util/util_windows.h"
#endif /* WITH_HIP */
CCL_NAMESPACE_BEGIN
bool device_hip_init()
{
#if !defined(WITH_HIP)
return false;
#elif defined(WITH_HIP_DYNLOAD)
static bool initialized = false;
static bool result = false;
if (initialized)
return result;
initialized = true;
int hipew_result = hipewInit(HIPEW_INIT_HIP);
if (hipew_result == HIPEW_SUCCESS) {
VLOG(1) << "HIPEW initialization succeeded";
if (HIPDevice::have_precompiled_kernels()) {
VLOG(1) << "Found precompiled kernels";
result = true;
}
else if (hipewCompilerPath() != NULL) {
VLOG(1) << "Found HIPCC " << hipewCompilerPath();
result = true;
}
else {
VLOG(1) << "Neither precompiled kernels nor HIPCC was found,"
<< " unable to use HIP";
}
}
else {
VLOG(1) << "HIPEW initialization failed: "
<< ((hipew_result == HIPEW_ERROR_ATEXIT_FAILED) ? "Error setting up atexit() handler" :
"Error opening the library");
}
return result;
#else /* WITH_HIP_DYNLOAD */
return true;
#endif /* WITH_HIP_DYNLOAD */
}
Device *device_hip_create(const DeviceInfo &info, Stats &stats, Profiler &profiler)
{
#ifdef WITH_HIP
return new HIPDevice(info, stats, profiler);
#else
(void)info;
(void)stats;
(void)profiler;
LOG(FATAL) << "Request to create HIP device without compiled-in support. Should never happen.";
return nullptr;
#endif
}
#ifdef WITH_HIP
static hipError_t device_hip_safe_init()
{
# ifdef _WIN32
__try {
return hipInit(0);
}
__except (EXCEPTION_EXECUTE_HANDLER) {
/* Ignore crashes inside the HIP driver and hope we can
* survive even with corrupted HIP installs. */
fprintf(stderr, "Cycles HIP: driver crashed, continuing without HIP.\n");
}
return hipErrorNoDevice;
# else
return hipInit(0);
# endif
}
#endif /* WITH_HIP */
void device_hip_info(vector<DeviceInfo> &devices)
{
#ifdef WITH_HIP
hipError_t result = device_hip_safe_init();
if (result != hipSuccess) {
if (result != hipErrorNoDevice)
fprintf(stderr, "HIP hipInit: %s\n", hipewErrorString(result));
return;
}
int count = 0;
result = hipGetDeviceCount(&count);
if (result != hipSuccess) {
fprintf(stderr, "HIP hipGetDeviceCount: %s\n", hipewErrorString(result));
return;
}
vector<DeviceInfo> display_devices;
for (int num = 0; num < count; num++) {
char name[256];
result = hipDeviceGetName(name, 256, num);
if (result != hipSuccess) {
fprintf(stderr, "HIP :hipDeviceGetName: %s\n", hipewErrorString(result));
continue;
}
int major;
hipDeviceGetAttribute(&major, hipDeviceAttributeComputeCapabilityMajor, num);
// TODO : (Arya) What is the last major version we are supporting?
DeviceInfo info;
info.type = DEVICE_HIP;
info.description = string(name);
info.num = num;
info.has_half_images = (major >= 3);
info.has_nanovdb = true;
info.denoisers = 0;
info.has_gpu_queue = true;
/* Check if the device has P2P access to any other device in the system. */
for (int peer_num = 0; peer_num < count && !info.has_peer_memory; peer_num++) {
if (num != peer_num) {
int can_access = 0;
hipDeviceCanAccessPeer(&can_access, num, peer_num);
info.has_peer_memory = (can_access != 0);
}
}
int pci_location[3] = {0, 0, 0};
hipDeviceGetAttribute(&pci_location[0], hipDeviceAttributePciDomainID, num);
hipDeviceGetAttribute(&pci_location[1], hipDeviceAttributePciBusId, num);
hipDeviceGetAttribute(&pci_location[2], hipDeviceAttributePciDeviceId, num);
info.id = string_printf("HIP_%s_%04x:%02x:%02x",
name,
(unsigned int)pci_location[0],
(unsigned int)pci_location[1],
(unsigned int)pci_location[2]);
/* If device has a kernel timeout and no compute preemption, we assume
* it is connected to a display and will freeze the display while doing
* computations. */
int timeout_attr = 0, preempt_attr = 0;
hipDeviceGetAttribute(&timeout_attr, hipDeviceAttributeKernelExecTimeout, num);
if (timeout_attr && !preempt_attr) {
VLOG(1) << "Device is recognized as display.";
info.description += " (Display)";
info.display_device = true;
display_devices.push_back(info);
}
else {
VLOG(1) << "Device has compute preemption or is not used for display.";
devices.push_back(info);
}
VLOG(1) << "Added device \"" << name << "\" with id \"" << info.id << "\".";
}
if (!display_devices.empty())
devices.insert(devices.end(), display_devices.begin(), display_devices.end());
#else /* WITH_HIP */
(void)devices;
#endif /* WITH_HIP */
}
string device_hip_capabilities()
{
#ifdef WITH_HIP
hipError_t result = device_hip_safe_init();
if (result != hipSuccess) {
if (result != hipErrorNoDevice) {
return string("Error initializing HIP: ") + hipewErrorString(result);
}
return "No HIP device found\n";
}
int count;
result = hipGetDeviceCount(&count);
if (result != hipSuccess) {
return string("Error getting devices: ") + hipewErrorString(result);
}
string capabilities = "";
for (int num = 0; num < count; num++) {
char name[256];
if (hipDeviceGetName(name, 256, num) != hipSuccess) {
continue;
}
capabilities += string("\t") + name + "\n";
int value;
# define GET_ATTR(attr) \
{ \
if (hipDeviceGetAttribute(&value, hipDeviceAttribute##attr, num) == hipSuccess) { \
capabilities += string_printf("\t\thipDeviceAttribute" #attr "\t\t\t%d\n", value); \
} \
} \
(void)0
/* TODO(sergey): Strip all attributes which are not useful for us
* or does not depend on the driver.
*/
GET_ATTR(MaxThreadsPerBlock);
GET_ATTR(MaxBlockDimX);
GET_ATTR(MaxBlockDimY);
GET_ATTR(MaxBlockDimZ);
GET_ATTR(MaxGridDimX);
GET_ATTR(MaxGridDimY);
GET_ATTR(MaxGridDimZ);
GET_ATTR(MaxSharedMemoryPerBlock);
GET_ATTR(TotalConstantMemory);
GET_ATTR(WarpSize);
GET_ATTR(MaxPitch);
GET_ATTR(MaxRegistersPerBlock);
GET_ATTR(ClockRate);
GET_ATTR(TextureAlignment);
GET_ATTR(MultiprocessorCount);
GET_ATTR(KernelExecTimeout);
GET_ATTR(Integrated);
GET_ATTR(CanMapHostMemory);
GET_ATTR(ComputeMode);
GET_ATTR(MaxTexture1DWidth);
GET_ATTR(MaxTexture2DWidth);
GET_ATTR(MaxTexture2DHeight);
GET_ATTR(MaxTexture3DWidth);
GET_ATTR(MaxTexture3DHeight);
GET_ATTR(MaxTexture3DDepth);
GET_ATTR(ConcurrentKernels);
GET_ATTR(EccEnabled);
GET_ATTR(MemoryClockRate);
GET_ATTR(MemoryBusWidth);
GET_ATTR(L2CacheSize);
GET_ATTR(MaxThreadsPerMultiProcessor);
GET_ATTR(ComputeCapabilityMajor);
GET_ATTR(ComputeCapabilityMinor);
GET_ATTR(MaxSharedMemoryPerMultiprocessor);
GET_ATTR(ManagedMemory);
GET_ATTR(IsMultiGpuBoard);
# undef GET_ATTR
capabilities += "\n";
}
return capabilities;
#else /* WITH_HIP */
return "";
#endif /* WITH_HIP */
}
CCL_NAMESPACE_END

View File

@@ -0,0 +1,37 @@
/*
* Copyright 2011-2021 Blender Foundation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include "util/util_string.h"
#include "util/util_vector.h"
CCL_NAMESPACE_BEGIN
class Device;
class DeviceInfo;
class Profiler;
class Stats;
bool device_hip_init();
Device *device_hip_create(const DeviceInfo &info, Stats &stats, Profiler &profiler);
void device_hip_info(vector<DeviceInfo> &devices);
string device_hip_capabilities();
CCL_NAMESPACE_END

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,153 @@
/*
* Copyright 2011-2021 Blender Foundation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifdef WITH_HIP
# include "device/device.h"
# include "device/hip/kernel.h"
# include "device/hip/queue.h"
# include "device/hip/util.h"
# include "util/util_map.h"
# ifdef WITH_HIP_DYNLOAD
# include "hipew.h"
# else
# include "util/util_opengl.h"
# endif
CCL_NAMESPACE_BEGIN
class DeviceQueue;
class HIPDevice : public Device {
friend class HIPContextScope;
public:
hipDevice_t hipDevice;
hipCtx_t hipContext;
hipModule_t hipModule;
size_t device_texture_headroom;
size_t device_working_headroom;
bool move_texture_to_host;
size_t map_host_used;
size_t map_host_limit;
int can_map_host;
int pitch_alignment;
int hipDevId;
int hipDevArchitecture;
bool first_error;
struct HIPMem {
HIPMem() : texobject(0), array(0), use_mapped_host(false)
{
}
hipTextureObject_t texobject;
hArray array;
/* If true, a mapped host memory in shared_pointer is being used. */
bool use_mapped_host;
};
typedef map<device_memory *, HIPMem> HIPMemMap;
HIPMemMap hip_mem_map;
thread_mutex hip_mem_map_mutex;
/* Bindless Textures */
device_vector<TextureInfo> texture_info;
bool need_texture_info;
HIPDeviceKernels kernels;
static bool have_precompiled_kernels();
virtual bool show_samples() const override;
virtual BVHLayoutMask get_bvh_layout_mask() const override;
void set_error(const string &error) override;
HIPDevice(const DeviceInfo &info, Stats &stats, Profiler &profiler);
virtual ~HIPDevice();
bool support_device(const uint /*kernel_features*/);
bool check_peer_access(Device *peer_device) override;
bool use_adaptive_compilation();
virtual string compile_kernel_get_common_cflags(const uint kernel_features);
string compile_kernel(const uint kernel_features,
const char *name,
const char *base = "hip",
bool force_ptx = false);
virtual bool load_kernels(const uint kernel_features) override;
void reserve_local_memory(const uint kernel_features);
void init_host_memory();
void load_texture_info();
void move_textures_to_host(size_t size, bool for_texture);
HIPMem *generic_alloc(device_memory &mem, size_t pitch_padding = 0);
void generic_copy_to(device_memory &mem);
void generic_free(device_memory &mem);
void mem_alloc(device_memory &mem) override;
void mem_copy_to(device_memory &mem) override;
void mem_copy_from(device_memory &mem, size_t y, size_t w, size_t h, size_t elem) override;
void mem_zero(device_memory &mem) override;
void mem_free(device_memory &mem) override;
device_ptr mem_alloc_sub_ptr(device_memory &mem, size_t offset, size_t /*size*/) override;
virtual void const_copy_to(const char *name, void *host, size_t size) override;
void global_alloc(device_memory &mem);
void global_free(device_memory &mem);
void tex_alloc(device_texture &mem);
void tex_free(device_texture &mem);
/* Graphics resources interoperability. */
virtual bool should_use_graphics_interop() override;
virtual unique_ptr<DeviceQueue> gpu_queue_create() override;
int get_num_multiprocessors();
int get_max_num_threads_per_multiprocessor();
protected:
bool get_device_attribute(hipDeviceAttribute_t attribute, int *value);
int get_device_default_attribute(hipDeviceAttribute_t attribute, int default_value);
};
CCL_NAMESPACE_END
#endif

View File

@@ -0,0 +1,93 @@
/*
* Copyright 2011-2021 Blender Foundation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifdef WITH_HIP
# include "device/hip/graphics_interop.h"
# include "device/hip/device_impl.h"
# include "device/hip/util.h"
CCL_NAMESPACE_BEGIN
HIPDeviceGraphicsInterop::HIPDeviceGraphicsInterop(HIPDeviceQueue *queue)
: queue_(queue), device_(static_cast<HIPDevice *>(queue->device))
{
}
HIPDeviceGraphicsInterop::~HIPDeviceGraphicsInterop()
{
HIPContextScope scope(device_);
if (hip_graphics_resource_) {
hip_device_assert(device_, hipGraphicsUnregisterResource(hip_graphics_resource_));
}
}
void HIPDeviceGraphicsInterop::set_destination(const DeviceGraphicsInteropDestination &destination)
{
const int64_t new_buffer_area = int64_t(destination.buffer_width) * destination.buffer_height;
if (opengl_pbo_id_ == destination.opengl_pbo_id && buffer_area_ == new_buffer_area) {
return;
}
HIPContextScope scope(device_);
if (hip_graphics_resource_) {
hip_device_assert(device_, hipGraphicsUnregisterResource(hip_graphics_resource_));
}
const hipError_t result = hipGraphicsGLRegisterBuffer(
&hip_graphics_resource_, destination.opengl_pbo_id, hipGraphicsRegisterFlagsNone);
if (result != hipSuccess) {
LOG(ERROR) << "Error registering OpenGL buffer: " << hipewErrorString(result);
}
opengl_pbo_id_ = destination.opengl_pbo_id;
buffer_area_ = new_buffer_area;
}
device_ptr HIPDeviceGraphicsInterop::map()
{
if (!hip_graphics_resource_) {
return 0;
}
HIPContextScope scope(device_);
hipDeviceptr_t hip_buffer;
size_t bytes;
hip_device_assert(device_,
hipGraphicsMapResources(1, &hip_graphics_resource_, queue_->stream()));
hip_device_assert(
device_, hipGraphicsResourceGetMappedPointer(&hip_buffer, &bytes, hip_graphics_resource_));
return static_cast<device_ptr>(hip_buffer);
}
void HIPDeviceGraphicsInterop::unmap()
{
HIPContextScope scope(device_);
hip_device_assert(device_,
hipGraphicsUnmapResources(1, &hip_graphics_resource_, queue_->stream()));
}
CCL_NAMESPACE_END
#endif

View File

@@ -0,0 +1,61 @@
/*
* Copyright 2011-2021 Blender Foundation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifdef WITH_HIP
# include "device/device_graphics_interop.h"
# ifdef WITH_HIP_DYNLOAD
# include "hipew.h"
# endif
CCL_NAMESPACE_BEGIN
class HIPDevice;
class HIPDeviceQueue;
class HIPDeviceGraphicsInterop : public DeviceGraphicsInterop {
public:
explicit HIPDeviceGraphicsInterop(HIPDeviceQueue *queue);
HIPDeviceGraphicsInterop(const HIPDeviceGraphicsInterop &other) = delete;
HIPDeviceGraphicsInterop(HIPDeviceGraphicsInterop &&other) noexcept = delete;
~HIPDeviceGraphicsInterop();
HIPDeviceGraphicsInterop &operator=(const HIPDeviceGraphicsInterop &other) = delete;
HIPDeviceGraphicsInterop &operator=(HIPDeviceGraphicsInterop &&other) = delete;
virtual void set_destination(const DeviceGraphicsInteropDestination &destination) override;
virtual device_ptr map() override;
virtual void unmap() override;
protected:
HIPDeviceQueue *queue_ = nullptr;
HIPDevice *device_ = nullptr;
/* OpenGL PBO which is currently registered as the destination for the CUDA buffer. */
uint opengl_pbo_id_ = 0;
/* Buffer area in pixels of the corresponding PBO. */
int64_t buffer_area_ = 0;
hipGraphicsResource hip_graphics_resource_ = nullptr;
};
CCL_NAMESPACE_END
#endif

View File

@@ -0,0 +1,69 @@
/*
* Copyright 2011-2021 Blender Foundation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifdef WITH_HIP
# include "device/hip/kernel.h"
# include "device/hip/device_impl.h"
CCL_NAMESPACE_BEGIN
void HIPDeviceKernels::load(HIPDevice *device)
{
hipModule_t hipModule = device->hipModule;
for (int i = 0; i < (int)DEVICE_KERNEL_NUM; i++) {
HIPDeviceKernel &kernel = kernels_[i];
/* No megakernel used for GPU. */
if (i == DEVICE_KERNEL_INTEGRATOR_MEGAKERNEL) {
continue;
}
const std::string function_name = std::string("kernel_gpu_") +
device_kernel_as_string((DeviceKernel)i);
hip_device_assert(device,
hipModuleGetFunction(&kernel.function, hipModule, function_name.c_str()));
if (kernel.function) {
hip_device_assert(device, hipFuncSetCacheConfig(kernel.function, hipFuncCachePreferL1));
hip_device_assert(
device,
hipModuleOccupancyMaxPotentialBlockSize(
&kernel.min_blocks, &kernel.num_threads_per_block, kernel.function, 0, 0));
}
else {
LOG(ERROR) << "Unable to load kernel " << function_name;
}
}
loaded = true;
}
const HIPDeviceKernel &HIPDeviceKernels::get(DeviceKernel kernel) const
{
return kernels_[(int)kernel];
}
bool HIPDeviceKernels::available(DeviceKernel kernel) const
{
return kernels_[(int)kernel].function != nullptr;
}
CCL_NAMESPACE_END
#endif /* WITH_HIP*/

View File

@@ -0,0 +1,54 @@
/*
* Copyright 2011-2021 Blender Foundation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#ifdef WITH_HIP
# include "device/device_kernel.h"
# ifdef WITH_HIP_DYNLOAD
# include "hipew.h"
# endif
CCL_NAMESPACE_BEGIN
class HIPDevice;
/* HIP kernel and associate occupancy information. */
class HIPDeviceKernel {
public:
hipFunction_t function = nullptr;
int num_threads_per_block = 0;
int min_blocks = 0;
};
/* Cache of HIP kernels for each DeviceKernel. */
class HIPDeviceKernels {
public:
void load(HIPDevice *device);
const HIPDeviceKernel &get(DeviceKernel kernel) const;
bool available(DeviceKernel kernel) const;
protected:
HIPDeviceKernel kernels_[DEVICE_KERNEL_NUM];
bool loaded = false;
};
CCL_NAMESPACE_END
#endif /* WITH_HIP */

View File

@@ -0,0 +1,209 @@
/*
* Copyright 2011-2021 Blender Foundation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifdef WITH_HIP
# include "device/hip/queue.h"
# include "device/hip/device_impl.h"
# include "device/hip/graphics_interop.h"
# include "device/hip/kernel.h"
CCL_NAMESPACE_BEGIN
/* HIPDeviceQueue */
HIPDeviceQueue::HIPDeviceQueue(HIPDevice *device)
: DeviceQueue(device), hip_device_(device), hip_stream_(nullptr)
{
const HIPContextScope scope(hip_device_);
hip_device_assert(hip_device_, hipStreamCreateWithFlags(&hip_stream_, hipStreamNonBlocking));
}
HIPDeviceQueue::~HIPDeviceQueue()
{
const HIPContextScope scope(hip_device_);
hipStreamDestroy(hip_stream_);
}
int HIPDeviceQueue::num_concurrent_states(const size_t /*state_size*/) const
{
/* TODO: compute automatically. */
/* TODO: must have at least num_threads_per_block. */
return 14416128;
}
int HIPDeviceQueue::num_concurrent_busy_states() const
{
const int max_num_threads = hip_device_->get_num_multiprocessors() *
hip_device_->get_max_num_threads_per_multiprocessor();
if (max_num_threads == 0) {
return 65536;
}
return 4 * max_num_threads;
}
void HIPDeviceQueue::init_execution()
{
/* Synchronize all textures and memory copies before executing task. */
HIPContextScope scope(hip_device_);
hip_device_->load_texture_info();
hip_device_assert(hip_device_, hipDeviceSynchronize());
debug_init_execution();
}
bool HIPDeviceQueue::kernel_available(DeviceKernel kernel) const
{
return hip_device_->kernels.available(kernel);
}
bool HIPDeviceQueue::enqueue(DeviceKernel kernel, const int work_size, void *args[])
{
if (hip_device_->have_error()) {
return false;
}
debug_enqueue(kernel, work_size);
const HIPContextScope scope(hip_device_);
const HIPDeviceKernel &hip_kernel = hip_device_->kernels.get(kernel);
/* Compute kernel launch parameters. */
const int num_threads_per_block = hip_kernel.num_threads_per_block;
const int num_blocks = divide_up(work_size, num_threads_per_block);
int shared_mem_bytes = 0;
switch (kernel) {
case DEVICE_KERNEL_INTEGRATOR_QUEUED_PATHS_ARRAY:
case DEVICE_KERNEL_INTEGRATOR_QUEUED_SHADOW_PATHS_ARRAY:
case DEVICE_KERNEL_INTEGRATOR_ACTIVE_PATHS_ARRAY:
case DEVICE_KERNEL_INTEGRATOR_TERMINATED_PATHS_ARRAY:
case DEVICE_KERNEL_INTEGRATOR_SORTED_PATHS_ARRAY:
case DEVICE_KERNEL_INTEGRATOR_COMPACT_PATHS_ARRAY:
/* See parall_active_index.h for why this amount of shared memory is needed. */
shared_mem_bytes = (num_threads_per_block + 1) * sizeof(int);
break;
default:
break;
}
/* Launch kernel. */
hip_device_assert(hip_device_,
hipModuleLaunchKernel(hip_kernel.function,
num_blocks,
1,
1,
num_threads_per_block,
1,
1,
shared_mem_bytes,
hip_stream_,
args,
0));
return !(hip_device_->have_error());
}
bool HIPDeviceQueue::synchronize()
{
if (hip_device_->have_error()) {
return false;
}
const HIPContextScope scope(hip_device_);
hip_device_assert(hip_device_, hipStreamSynchronize(hip_stream_));
debug_synchronize();
return !(hip_device_->have_error());
}
void HIPDeviceQueue::zero_to_device(device_memory &mem)
{
assert(mem.type != MEM_GLOBAL && mem.type != MEM_TEXTURE);
if (mem.memory_size() == 0) {
return;
}
/* Allocate on demand. */
if (mem.device_pointer == 0) {
hip_device_->mem_alloc(mem);
}
/* Zero memory on device. */
assert(mem.device_pointer != 0);
const HIPContextScope scope(hip_device_);
hip_device_assert(
hip_device_,
hipMemsetD8Async((hipDeviceptr_t)mem.device_pointer, 0, mem.memory_size(), hip_stream_));
}
void HIPDeviceQueue::copy_to_device(device_memory &mem)
{
assert(mem.type != MEM_GLOBAL && mem.type != MEM_TEXTURE);
if (mem.memory_size() == 0) {
return;
}
/* Allocate on demand. */
if (mem.device_pointer == 0) {
hip_device_->mem_alloc(mem);
}
assert(mem.device_pointer != 0);
assert(mem.host_pointer != nullptr);
/* Copy memory to device. */
const HIPContextScope scope(hip_device_);
hip_device_assert(
hip_device_,
hipMemcpyHtoDAsync(
(hipDeviceptr_t)mem.device_pointer, mem.host_pointer, mem.memory_size(), hip_stream_));
}
void HIPDeviceQueue::copy_from_device(device_memory &mem)
{
assert(mem.type != MEM_GLOBAL && mem.type != MEM_TEXTURE);
if (mem.memory_size() == 0) {
return;
}
assert(mem.device_pointer != 0);
assert(mem.host_pointer != nullptr);
/* Copy memory from device. */
const HIPContextScope scope(hip_device_);
hip_device_assert(
hip_device_,
hipMemcpyDtoHAsync(
mem.host_pointer, (hipDeviceptr_t)mem.device_pointer, mem.memory_size(), hip_stream_));
}
// TODO : (Arya) Enable this after stabilizing dev branch
unique_ptr<DeviceGraphicsInterop> HIPDeviceQueue::graphics_interop_create()
{
return make_unique<HIPDeviceGraphicsInterop>(this);
}
CCL_NAMESPACE_END
#endif /* WITH_HIP */

View File

@@ -0,0 +1,68 @@
/*
* Copyright 2011-2021 Blender Foundation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#ifdef WITH_HIP
# include "device/device_kernel.h"
# include "device/device_memory.h"
# include "device/device_queue.h"
# include "device/hip/util.h"
CCL_NAMESPACE_BEGIN
class HIPDevice;
class device_memory;
/* Base class for HIP queues. */
class HIPDeviceQueue : public DeviceQueue {
public:
HIPDeviceQueue(HIPDevice *device);
~HIPDeviceQueue();
virtual int num_concurrent_states(const size_t state_size) const override;
virtual int num_concurrent_busy_states() const override;
virtual void init_execution() override;
virtual bool kernel_available(DeviceKernel kernel) const override;
virtual bool enqueue(DeviceKernel kernel, const int work_size, void *args[]) override;
virtual bool synchronize() override;
virtual void zero_to_device(device_memory &mem) override;
virtual void copy_to_device(device_memory &mem) override;
virtual void copy_from_device(device_memory &mem) override;
virtual hipStream_t stream()
{
return hip_stream_;
}
// TODO : (Arya) Enable this after stabilizing the dev branch
virtual unique_ptr<DeviceGraphicsInterop> graphics_interop_create() override;
protected:
HIPDevice *hip_device_;
hipStream_t hip_stream_;
};
CCL_NAMESPACE_END
#endif /* WITH_HIP */

View File

@@ -0,0 +1,61 @@
/*
* Copyright 2011-2021 Blender Foundation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifdef WITH_HIP
# include "device/hip/util.h"
# include "device/hip/device_impl.h"
CCL_NAMESPACE_BEGIN
HIPContextScope::HIPContextScope(HIPDevice *device) : device(device)
{
hip_device_assert(device, hipCtxPushCurrent(device->hipContext));
}
HIPContextScope::~HIPContextScope()
{
hip_device_assert(device, hipCtxPopCurrent(NULL));
}
# ifndef WITH_HIP_DYNLOAD
const char *hipewErrorString(hipError_t result)
{
/* We can only give error code here without major code duplication, that
* should be enough since dynamic loading is only being disabled by folks
* who knows what they're doing anyway.
*
* NOTE: Avoid call from several threads.
*/
static string error;
error = string_printf("%d", result);
return error.c_str();
}
const char *hipewCompilerPath()
{
return CYCLES_HIP_HIPCC_EXECUTABLE;
}
int hipewCompilerVersion()
{
return (HIP_VERSION / 100) + (HIP_VERSION % 100 / 10);
}
# endif
CCL_NAMESPACE_END
#endif /* WITH_HIP */

View File

@@ -0,0 +1,63 @@
/*
* Copyright 2011-2021 Blender Foundation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#ifdef WITH_HIP
# ifdef WITH_HIP_DYNLOAD
# include "hipew.h"
# endif
CCL_NAMESPACE_BEGIN
class HIPDevice;
/* Utility to push/pop HIP context. */
class HIPContextScope {
public:
HIPContextScope(HIPDevice *device);
~HIPContextScope();
private:
HIPDevice *device;
};
/* Utility for checking return values of HIP function calls. */
# define hip_device_assert(hip_device, stmt) \
{ \
hipError_t result = stmt; \
if (result != hipSuccess) { \
const char *name = hipewErrorString(result); \
hip_device->set_error( \
string_printf("%s in %s (%s:%d)", name, #stmt, __FILE__, __LINE__)); \
} \
} \
(void)0
# define hip_assert(stmt) hip_device_assert(this, stmt)
# ifndef WITH_HIP_DYNLOAD
/* Transparently implement some functions, so majority of the file does not need
* to worry about difference between dynamically loaded and linked HIP at all. */
const char *hipewErrorString(hipError_t result);
const char *hipewCompilerPath();
int hipewCompilerVersion();
# endif /* WITH_HIP_DYNLOAD */
CCL_NAMESPACE_END
#endif /* WITH_HIP */

View File

@@ -315,14 +315,14 @@ class MultiDevice : public Device {
stats.mem_alloc(mem.device_size - existing_size);
}
void mem_copy_from(device_memory &mem, int y, int w, int h, int elem) override
void mem_copy_from(device_memory &mem, size_t y, size_t w, size_t h, size_t elem) override
{
device_ptr key = mem.device_pointer;
int i = 0, sub_h = h / devices.size();
size_t i = 0, sub_h = h / devices.size();
foreach (SubDevice &sub, devices) {
int sy = y + i * sub_h;
int sh = (i == (int)devices.size() - 1) ? h - sub_h * i : sub_h;
size_t sy = y + i * sub_h;
size_t sh = (i == (size_t)devices.size() - 1) ? h - sub_h * i : sub_h;
SubDevice *owner_sub = find_matching_mem_device(key, sub);
mem.device = owner_sub->device;

View File

@@ -315,6 +315,11 @@ bool OptiXDevice::load_kernels(const uint kernel_features)
group_descs[PG_HITS].kind = OPTIX_PROGRAM_GROUP_KIND_HITGROUP;
group_descs[PG_HITS].hitgroup.moduleAH = optix_module;
group_descs[PG_HITS].hitgroup.entryFunctionNameAH = "__anyhit__kernel_optix_shadow_all_hit";
group_descs[PG_HITV].kind = OPTIX_PROGRAM_GROUP_KIND_HITGROUP;
group_descs[PG_HITV].hitgroup.moduleCH = optix_module;
group_descs[PG_HITV].hitgroup.entryFunctionNameCH = "__closesthit__kernel_optix_hit";
group_descs[PG_HITV].hitgroup.moduleAH = optix_module;
group_descs[PG_HITV].hitgroup.entryFunctionNameAH = "__anyhit__kernel_optix_volume_test";
if (kernel_features & KERNEL_FEATURE_HAIR) {
if (kernel_features & KERNEL_FEATURE_HAIR_THICK) {
@@ -397,6 +402,7 @@ bool OptiXDevice::load_kernels(const uint kernel_features)
trace_css = std::max(trace_css, stack_size[PG_HITD].cssIS + stack_size[PG_HITD].cssAH);
trace_css = std::max(trace_css, stack_size[PG_HITS].cssIS + stack_size[PG_HITS].cssAH);
trace_css = std::max(trace_css, stack_size[PG_HITL].cssIS + stack_size[PG_HITL].cssAH);
trace_css = std::max(trace_css, stack_size[PG_HITV].cssIS + stack_size[PG_HITV].cssAH);
trace_css = std::max(trace_css,
stack_size[PG_HITD_MOTION].cssIS + stack_size[PG_HITD_MOTION].cssAH);
trace_css = std::max(trace_css,
@@ -421,6 +427,7 @@ bool OptiXDevice::load_kernels(const uint kernel_features)
pipeline_groups.push_back(groups[PG_HITD]);
pipeline_groups.push_back(groups[PG_HITS]);
pipeline_groups.push_back(groups[PG_HITL]);
pipeline_groups.push_back(groups[PG_HITV]);
if (motion_blur) {
pipeline_groups.push_back(groups[PG_HITD_MOTION]);
pipeline_groups.push_back(groups[PG_HITS_MOTION]);
@@ -459,6 +466,7 @@ bool OptiXDevice::load_kernels(const uint kernel_features)
pipeline_groups.push_back(groups[PG_HITD]);
pipeline_groups.push_back(groups[PG_HITS]);
pipeline_groups.push_back(groups[PG_HITL]);
pipeline_groups.push_back(groups[PG_HITV]);
if (motion_blur) {
pipeline_groups.push_back(groups[PG_HITD_MOTION]);
pipeline_groups.push_back(groups[PG_HITS_MOTION]);
@@ -1390,25 +1398,33 @@ void OptiXDevice::build_bvh(BVH *bvh, Progress &progress, bool refit)
/* Set user instance ID to object index (but leave low bit blank). */
instance.instanceId = ob->get_device_index() << 1;
/* Have to have at least one bit in the mask, or else instance would always be culled. */
instance.visibilityMask = 1;
/* Add some of the object visibility bits to the mask.
* __prim_visibility contains the combined visibility bits of all instances, so is not
* reliable if they differ between instances. But the OptiX visibility mask can only contain
* 8 bits, so have to trade-off here and select just a few important ones.
*/
instance.visibilityMask = ob->visibility_for_tracing() & 0xFF;
if (ob->get_geometry()->has_volume) {
/* Volumes have a special bit set in the visibility mask so a trace can mask only volumes.
*/
instance.visibilityMask |= 2;
/* Have to have at least one bit in the mask, or else instance would always be culled. */
if (0 == instance.visibilityMask) {
instance.visibilityMask = 0xFF;
}
if (ob->get_geometry()->geometry_type == Geometry::HAIR) {
/* Same applies to curves (so they can be skipped in local trace calls). */
instance.visibilityMask |= 4;
if (motion_blur && ob->get_geometry()->has_motion_blur() &&
static_cast<const Hair *>(ob->get_geometry())->curve_shape == CURVE_THICK) {
if (ob->get_geometry()->geometry_type == Geometry::HAIR &&
static_cast<const Hair *>(ob->get_geometry())->curve_shape == CURVE_THICK) {
if (motion_blur && ob->get_geometry()->has_motion_blur()) {
/* Select between motion blur and non-motion blur built-in intersection module. */
instance.sbtOffset = PG_HITD_MOTION - PG_HITD;
}
}
else {
/* Can disable __anyhit__kernel_optix_visibility_test by default (except for thick curves,
* since it needs to filter out endcaps there).
* It is enabled where necessary (visibility mask exceeds 8 bits or the other any-hit
* programs like __anyhit__kernel_optix_shadow_all_hit) via OPTIX_RAY_FLAG_ENFORCE_ANYHIT.
*/
instance.flags = OPTIX_INSTANCE_FLAG_DISABLE_ANYHIT;
}
/* Insert motion traversable if object has motion. */
if (motion_blur && ob->use_motion()) {
@@ -1474,7 +1490,7 @@ void OptiXDevice::build_bvh(BVH *bvh, Progress &progress, bool refit)
delete[] reinterpret_cast<uint8_t *>(&motion_transform);
/* Disable instance transform if object uses motion transform already. */
instance.flags = OPTIX_INSTANCE_FLAG_DISABLE_TRANSFORM;
instance.flags |= OPTIX_INSTANCE_FLAG_DISABLE_TRANSFORM;
/* Get traversable handle to motion transform. */
optixConvertPointerToTraversableHandle(context,
@@ -1491,7 +1507,7 @@ void OptiXDevice::build_bvh(BVH *bvh, Progress &progress, bool refit)
}
else {
/* Disable instance transform if geometry already has it applied to vertex data. */
instance.flags = OPTIX_INSTANCE_FLAG_DISABLE_TRANSFORM;
instance.flags |= OPTIX_INSTANCE_FLAG_DISABLE_TRANSFORM;
/* Non-instanced objects read ID from 'prim_object', so distinguish
* them from instanced objects with the low bit set. */
instance.instanceId |= 1;

View File

@@ -40,6 +40,7 @@ enum {
PG_HITD, /* Default hit group. */
PG_HITS, /* __SHADOW_RECORD_ALL__ hit group. */
PG_HITL, /* __BVH_LOCAL__ hit group (only used for triangles). */
PG_HITV, /* __VOLUME__ hit group. */
PG_HITD_MOTION,
PG_HITS_MOTION,
PG_CALL_SVM_AO,
@@ -51,7 +52,7 @@ enum {
static const int MISS_PROGRAM_GROUP_OFFSET = PG_MISS;
static const int NUM_MIS_PROGRAM_GROUPS = 1;
static const int HIT_PROGAM_GROUP_OFFSET = PG_HITD;
static const int NUM_HIT_PROGRAM_GROUPS = 5;
static const int NUM_HIT_PROGRAM_GROUPS = 6;
static const int CALLABLE_PROGRAM_GROUPS_BASE = PG_CALL_SVM_AO;
static const int NUM_CALLABLE_PROGRAM_GROUPS = 3;

View File

@@ -244,7 +244,7 @@ static void foreach_sliced_buffer_params(const vector<unique_ptr<PathTraceWork>>
const int slice_height = max(lround(height * weight), 1);
/* Disallow negative values to deal with situations when there are more compute devices than
* scanlines. */
* scan-lines. */
const int remaining_height = max(0, height - current_y);
BufferParams slide_params = buffer_params;
@@ -801,7 +801,7 @@ void PathTrace::tile_buffer_write_to_disk()
}
if (!tile_manager_.write_tile(*buffers)) {
LOG(ERROR) << "Error writing tile to file.";
device_->set_error("Error writing tile to file");
}
}
@@ -894,7 +894,14 @@ void PathTrace::process_full_buffer_from_disk(string_view filename)
DenoiseParams denoise_params;
if (!tile_manager_.read_full_buffer_from_disk(filename, &full_frame_buffers, &denoise_params)) {
LOG(ERROR) << "Error reading tiles from file.";
const string error_message = "Error reading tiles from file";
if (progress_) {
progress_->set_error(error_message);
progress_->set_cancel(error_message);
}
else {
LOG(ERROR) << error_message;
}
return;
}
@@ -1028,6 +1035,8 @@ static const char *device_type_for_description(const DeviceType type)
return "CUDA";
case DEVICE_OPTIX:
return "OptiX";
case DEVICE_HIP:
return "HIP";
case DEVICE_DUMMY:
return "Dummy";
case DEVICE_MULTI:

View File

@@ -286,7 +286,7 @@ class PathTrace {
/* Parameters of the big tile with the current resolution divider applied. */
BufferParams effective_big_tile_params;
/* Denosier was run and there are denoised versions of the passes in the render buffers. */
/* Denoiser was run and there are denoised versions of the passes in the render buffers. */
bool has_denoised_result = false;
/* Current tile has been written (to either disk or callback.

View File

@@ -104,7 +104,7 @@ class PathTraceWork {
* - Copies work's render buffer to its device. */
void copy_from_render_buffers(const RenderBuffers *render_buffers);
/* Special version of the `copy_from_render_buffers()` which only copies denosied passes from the
/* Special version of the `copy_from_render_buffers()` which only copies denoised passes from the
* given render buffers, leaving rest of the passes.
*
* Same notes about device copying applies to this call as well. */

View File

@@ -19,6 +19,8 @@
#include "device/cpu/kernel.h"
#include "device/device.h"
#include "kernel/kernel_path_state.h"
#include "integrator/pass_accessor_cpu.h"
#include "render/buffers.h"
@@ -116,13 +118,17 @@ void PathTraceWorkCPU::render_samples_full_pipeline(KernelGlobals *kernel_global
const KernelWorkTile &work_tile,
const int samples_num)
{
const bool has_shadow_catcher = device_scene_->data.integrator.has_shadow_catcher;
const bool has_bake = device_scene_->data.bake.use;
IntegratorStateCPU integrator_states[2] = {};
IntegratorStateCPU integrator_states[2];
IntegratorStateCPU *state = &integrator_states[0];
IntegratorStateCPU *shadow_catcher_state = &integrator_states[1];
IntegratorStateCPU *shadow_catcher_state = nullptr;
if (device_scene_->data.integrator.has_shadow_catcher) {
shadow_catcher_state = &integrator_states[1];
path_state_init_queues(kernel_globals, shadow_catcher_state);
}
KernelWorkTile sample_work_tile = work_tile;
float *render_buffer = buffers_->buffer.data();
@@ -147,7 +153,7 @@ void PathTraceWorkCPU::render_samples_full_pipeline(KernelGlobals *kernel_global
kernels_.integrator_megakernel(kernel_globals, state, render_buffer);
if (has_shadow_catcher) {
if (shadow_catcher_state) {
kernels_.integrator_megakernel(kernel_globals, shadow_catcher_state, render_buffer);
}

View File

@@ -95,8 +95,8 @@ void PathTraceWorkGPU::alloc_integrator_soa()
#define KERNEL_STRUCT_END(name) \
break; \
}
#define KERNEL_STRUCT_END_ARRAY(name, array_size) \
if (array_index == array_size - 1) { \
#define KERNEL_STRUCT_END_ARRAY(name, cpu_array_size, gpu_array_size) \
if (array_index == gpu_array_size - 1) { \
break; \
} \
}
@@ -738,7 +738,8 @@ void PathTraceWorkGPU::copy_to_gpu_display_naive(GPUDisplay *gpu_display,
get_render_tile_film_pixels(destination, pass_mode, num_samples);
gpu_display_rgba_half_.copy_from_device();
queue_->copy_from_device(gpu_display_rgba_half_);
queue_->synchronize();
gpu_display->copy_pixels_to_texture(
gpu_display_rgba_half_.data(), texture_x, texture_y, width, height);

View File

@@ -384,7 +384,7 @@ bool RenderScheduler::set_postprocess_render_work(RenderWork *render_work)
}
if (denoiser_params_.use && !state_.last_work_tile_was_denoised) {
render_work->tile.denoise = true;
render_work->tile.denoise = !tile_manager_.has_multiple_tiles();
any_scheduled = true;
}
@@ -903,6 +903,12 @@ bool RenderScheduler::work_need_denoise(bool &delayed, bool &ready_to_display)
return false;
}
/* When multiple tiles are used the full frame will be denoised.
* Avoid per-tile denoising to save up render time. */
if (tile_manager_.has_multiple_tiles()) {
return false;
}
if (done()) {
/* Always denoise at the last sample. */
return true;

View File

@@ -31,7 +31,7 @@ class RenderWork {
int resolution_divider = 1;
/* Initialize render buffers.
* Includes steps like zero-ing the buffer on the device, and optional reading of pixels from the
* Includes steps like zeroing the buffer on the device, and optional reading of pixels from the
* baking target. */
bool init_render_buffers = false;

View File

@@ -149,14 +149,14 @@ bool ShaderEval::eval_gpu(Device *device,
/* Execute work on GPU in chunk, so we can cancel.
* TODO : query appropriate size from device.*/
const int chunk_size = 65536;
const int64_t chunk_size = 65536;
const int work_size = output.size();
const int64_t work_size = output.size();
void *d_input = (void *)input.device_pointer;
void *d_output = (void *)output.device_pointer;
for (int d_offset = 0; d_offset < work_size; d_offset += chunk_size) {
int d_work_size = min(chunk_size, work_size - d_offset);
for (int64_t d_offset = 0; d_offset < work_size; d_offset += chunk_size) {
int64_t d_work_size = std::min(chunk_size, work_size - d_offset);
void *args[] = {&d_input, &d_output, &d_offset, &d_work_size};
queue->enqueue(kernel, d_work_size, args);

View File

@@ -35,6 +35,10 @@ set(SRC_DEVICE_CUDA
device/cuda/kernel.cu
)
set(SRC_DEVICE_HIP
device/hip/kernel.cpp
)
set(SRC_DEVICE_OPTIX
device/optix/kernel.cu
device/optix/kernel_shader_raytrace.cu
@@ -106,6 +110,12 @@ set(SRC_DEVICE_CUDA_HEADERS
device/cuda/globals.h
)
set(SRC_DEVICE_HIP_HEADERS
device/hip/compat.h
device/hip/config.h
device/hip/globals.h
)
set(SRC_DEVICE_OPTIX_HEADERS
device/optix/compat.h
device/optix/globals.h
@@ -458,6 +468,104 @@ if(WITH_CYCLES_CUDA_BINARIES)
cycles_set_solution_folder(cycles_kernel_cuda)
endif()
####################################################### START
# HIP module
if(WITH_CYCLES_HIP_BINARIES)
# 64 bit only
set(HIP_BITS 64)
# HIP version
execute_process(COMMAND ${HIP_HIPCC_EXECUTABLE} "--version" OUTPUT_VARIABLE HIPCC_OUT)
string(REGEX REPLACE ".*release ([0-9]+)\\.([0-9]+).*" "\\1" HIP_VERSION_MAJOR "${HIPCC_OUT}")
string(REGEX REPLACE ".*release ([0-9]+)\\.([0-9]+).*" "\\2" HIP_VERSION_MINOR "${HIPCC_OUT}")
set(HIP_VERSION "${HIP_VERSION_MAJOR}${HIP_VERSION_MINOR}")
message(WARNING
"HIP version ${HIP_VERSION_MAJOR}.${HIP_VERSION_MINOR} detected")
# build for each arch
set(hip_sources device/hip/kernel.cpp
${SRC_HEADERS}
${SRC_DEVICE_HIP_HEADERS}
${SRC_BVH_HEADERS}
${SRC_SVM_HEADERS}
${SRC_GEOM_HEADERS}
${SRC_INTEGRATOR_HEADERS}
${SRC_CLOSURE_HEADERS}
${SRC_UTIL_HEADERS}
)
set(hip_fatbins)
macro(CYCLES_HIP_KERNEL_ADD arch prev_arch name flags sources experimental)
if(${arch} MATCHES "compute_.*")
set(format "ptx")
else()
set(format "fatbin")
endif()
set(hip_file ${name}_${arch}.${format})
set(kernel_sources ${sources})
if(NOT ${prev_arch} STREQUAL "none")
if(${prev_arch} MATCHES "compute_.*")
set(kernel_sources ${kernel_sources} ${name}_${prev_arch}.ptx)
else()
set(kernel_sources ${kernel_sources} ${name}_${prev_arch}.fatbin)
endif()
endif()
set(hip_kernel_src "/device/hip/${name}.cpp")
set(hip_flags ${flags}
-D CCL_NAMESPACE_BEGIN=
-D CCL_NAMESPACE_END=
-D HIPCC
-m ${HIP_BITS}
-I ${CMAKE_CURRENT_SOURCE_DIR}/..
-I ${CMAKE_CURRENT_SOURCE_DIR}/device/hip
--use_fast_math
-o ${CMAKE_CURRENT_BINARY_DIR}/${hip_file})
if(${experimental})
set(hip_flags ${hip_flags} -D __KERNEL_EXPERIMENTAL__)
set(name ${name}_experimental)
endif()
if(WITH_CYCLES_DEBUG)
set(hip_flags ${hip_flags} -D __KERNEL_DEBUG__)
endif()
if(WITH_NANOVDB)
set(hip_flags ${hip_flags}
-D WITH_NANOVDB
-I "${NANOVDB_INCLUDE_DIR}")
endif()
endmacro()
set(prev_arch "none")
foreach(arch ${CYCLES_HIP_BINARIES_ARCH})
set(hip_hipcc_executable ${HIP_HIPCC_EXECUTABLE})
set(hip_toolkit_root_dir ${HIP_TOOLKIT_ROOT_DIR})
if(DEFINED hip_hipcc_executable AND DEFINED hip_toolkit_root_dir)
# Compile regular kernel
CYCLES_HIP_KERNEL_ADD(${arch} ${prev_arch} kernel "" "${hip_sources}" FALSE)
if(WITH_CYCLES_HIP_BUILD_SERIAL)
set(prev_arch ${arch})
endif()
unset(hip_hipcc_executable)
unset(hip_toolkit_root_dir)
endif()
endforeach()
add_custom_target(cycles_kernel_hip ALL DEPENDS ${hip_fatbins})
cycles_set_solution_folder(cycles_kernel_hip)
endif()
####################################################### END
# OptiX PTX modules
if(WITH_CYCLES_DEVICE_OPTIX AND WITH_CYCLES_CUDA_BINARIES)
@@ -602,11 +710,13 @@ endif()
cycles_add_library(cycles_kernel "${LIB}"
${SRC_DEVICE_CPU}
${SRC_DEVICE_CUDA}
${SRC_DEVICE_HIP}
${SRC_DEVICE_OPTIX}
${SRC_HEADERS}
${SRC_DEVICE_CPU_HEADERS}
${SRC_DEVICE_GPU_HEADERS}
${SRC_DEVICE_CUDA_HEADERS}
${SRC_DEVICE_HIP_HEADERS}
${SRC_DEVICE_OPTIX_HEADERS}
${SRC_BVH_HEADERS}
${SRC_CLOSURE_HEADERS}
@@ -621,6 +731,7 @@ source_group("geom" FILES ${SRC_GEOM_HEADERS})
source_group("integrator" FILES ${SRC_INTEGRATOR_HEADERS})
source_group("kernel" FILES ${SRC_HEADERS})
source_group("device\\cpu" FILES ${SRC_DEVICE_CPU} ${SRC_DEVICE_CPU_HEADERS})
source_group("device\\hip" FILES ${SRC_DEVICE_HIP} ${SRC_DEVICE_HIP_HEADERS})
source_group("device\\gpu" FILES ${SRC_DEVICE_GPU_HEADERS})
source_group("device\\cuda" FILES ${SRC_DEVICE_CUDA} ${SRC_DEVICE_CUDA_HEADERS})
source_group("device\\optix" FILES ${SRC_DEVICE_OPTIX} ${SRC_DEVICE_OPTIX_HEADERS})
@@ -632,14 +743,19 @@ endif()
if(WITH_CYCLES_DEVICE_OPTIX AND WITH_CYCLES_CUDA_BINARIES)
add_dependencies(cycles_kernel cycles_kernel_optix)
endif()
if(WITH_CYCLES_HIP)
add_dependencies(cycles_kernel cycles_kernel_hip)
endif()
# Install kernel source for runtime compilation
delayed_install(${CMAKE_CURRENT_SOURCE_DIR} "${SRC_DEVICE_CUDA}" ${CYCLES_INSTALL_PATH}/source/kernel/device/cuda)
delayed_install(${CMAKE_CURRENT_SOURCE_DIR} "${SRC_DEVICE_HIP}" ${CYCLES_INSTALL_PATH}/source/kernel/device/hip)
delayed_install(${CMAKE_CURRENT_SOURCE_DIR} "${SRC_DEVICE_OPTIX}" ${CYCLES_INSTALL_PATH}/source/kernel/device/optix)
delayed_install(${CMAKE_CURRENT_SOURCE_DIR} "${SRC_HEADERS}" ${CYCLES_INSTALL_PATH}/source/kernel)
delayed_install(${CMAKE_CURRENT_SOURCE_DIR} "${SRC_DEVICE_GPU_HEADERS}" ${CYCLES_INSTALL_PATH}/source/kernel/device/gpu)
delayed_install(${CMAKE_CURRENT_SOURCE_DIR} "${SRC_DEVICE_CUDA_HEADERS}" ${CYCLES_INSTALL_PATH}/source/kernel/device/cuda)
delayed_install(${CMAKE_CURRENT_SOURCE_DIR} "${SRC_DEVICE_HIP_HEADERS}" ${CYCLES_INSTALL_PATH}/source/kernel/device/hip)
delayed_install(${CMAKE_CURRENT_SOURCE_DIR} "${SRC_DEVICE_OPTIX_HEADERS}" ${CYCLES_INSTALL_PATH}/source/kernel/device/optix)
delayed_install(${CMAKE_CURRENT_SOURCE_DIR} "${SRC_BVH_HEADERS}" ${CYCLES_INSTALL_PATH}/source/kernel/bvh)
delayed_install(${CMAKE_CURRENT_SOURCE_DIR} "${SRC_CLOSURE_HEADERS}" ${CYCLES_INSTALL_PATH}/source/kernel/closure)

View File

@@ -167,15 +167,25 @@ ccl_device_intersect bool scene_intersect(const KernelGlobals *kg,
uint p4 = visibility;
uint p5 = PRIMITIVE_NONE;
uint ray_mask = visibility & 0xFF;
uint ray_flags = OPTIX_RAY_FLAG_NONE;
if (0 == ray_mask && (visibility & ~0xFF) != 0) {
ray_mask = 0xFF;
ray_flags = OPTIX_RAY_FLAG_ENFORCE_ANYHIT;
}
else if (visibility & PATH_RAY_SHADOW_OPAQUE) {
ray_flags = OPTIX_RAY_FLAG_TERMINATE_ON_FIRST_HIT;
}
optixTrace(scene_intersect_valid(ray) ? kernel_data.bvh.scene : 0,
ray->P,
ray->D,
0.0f,
ray->t,
ray->time,
0xF,
OPTIX_RAY_FLAG_NONE,
0, // SBT offset for PG_HITD
ray_mask,
ray_flags,
0, /* SBT offset for PG_HITD */
0,
0,
p0,
@@ -251,11 +261,11 @@ ccl_device_intersect bool scene_intersect_local(const KernelGlobals *kg,
uint p2 = ((uint64_t)local_isect) & 0xFFFFFFFF;
uint p3 = (((uint64_t)local_isect) >> 32) & 0xFFFFFFFF;
uint p4 = local_object;
// Is set to zero on miss or if ray is aborted, so can be used as return value
/* Is set to zero on miss or if ray is aborted, so can be used as return value. */
uint p5 = max_hits;
if (local_isect) {
local_isect->num_hits = 0; // Initialize hit count to zero
local_isect->num_hits = 0; /* Initialize hit count to zero. */
}
optixTrace(scene_intersect_valid(ray) ? kernel_data.bvh.scene : 0,
ray->P,
@@ -263,11 +273,10 @@ ccl_device_intersect bool scene_intersect_local(const KernelGlobals *kg,
0.0f,
ray->t,
ray->time,
// Skip curves
0x3,
// Need to always call into __anyhit__kernel_optix_local_hit
0xFF,
/* Need to always call into __anyhit__kernel_optix_local_hit. */
OPTIX_RAY_FLAG_ENFORCE_ANYHIT,
2, // SBT offset for PG_HITL
2, /* SBT offset for PG_HITL */
0,
0,
p0,
@@ -365,17 +374,22 @@ ccl_device_intersect bool scene_intersect_shadow_all(const KernelGlobals *kg,
uint p4 = visibility;
uint p5 = false;
*num_hits = 0; // Initialize hit count to zero
uint ray_mask = visibility & 0xFF;
if (0 == ray_mask && (visibility & ~0xFF) != 0) {
ray_mask = 0xFF;
}
*num_hits = 0; /* Initialize hit count to zero. */
optixTrace(scene_intersect_valid(ray) ? kernel_data.bvh.scene : 0,
ray->P,
ray->D,
0.0f,
ray->t,
ray->time,
0xF,
// Need to always call into __anyhit__kernel_optix_shadow_all_hit
ray_mask,
/* Need to always call into __anyhit__kernel_optix_shadow_all_hit. */
OPTIX_RAY_FLAG_ENFORCE_ANYHIT,
1, // SBT offset for PG_HITS
1, /* SBT offset for PG_HITS */
0,
0,
p0,
@@ -444,16 +458,21 @@ ccl_device_intersect bool scene_intersect_volume(const KernelGlobals *kg,
uint p4 = visibility;
uint p5 = PRIMITIVE_NONE;
uint ray_mask = visibility & 0xFF;
if (0 == ray_mask && (visibility & ~0xFF) != 0) {
ray_mask = 0xFF;
}
optixTrace(scene_intersect_valid(ray) ? kernel_data.bvh.scene : 0,
ray->P,
ray->D,
0.0f,
ray->t,
ray->time,
// Skip everything but volumes
0x2,
OPTIX_RAY_FLAG_NONE,
0, // SBT offset for PG_HITD
ray_mask,
/* Need to always call into __anyhit__kernel_optix_volume_test. */
OPTIX_RAY_FLAG_ENFORCE_ANYHIT,
3, /* SBT offset for PG_HITV */
0,
0,
p0,

View File

@@ -21,11 +21,15 @@ CCL_NAMESPACE_BEGIN
/* Given an array of states, build an array of indices for which the states
* are active.
*
* Shared memory requirement is sizeof(int) * (number_of_warps + 1) */
* Shared memory requirement is `sizeof(int) * (number_of_warps + 1)`. */
#include "util/util_atomic.h"
#define GPU_PARALLEL_ACTIVE_INDEX_DEFAULT_BLOCK_SIZE 512
#ifdef __HIP__
# define GPU_PARALLEL_ACTIVE_INDEX_DEFAULT_BLOCK_SIZE 1024
#else
# define GPU_PARALLEL_ACTIVE_INDEX_DEFAULT_BLOCK_SIZE 512
#endif
template<uint blocksize, typename IsActiveOp>
__device__ void gpu_parallel_active_index_array(const uint num_states,

View File

@@ -27,7 +27,11 @@ CCL_NAMESPACE_BEGIN
#include "util/util_atomic.h"
#define GPU_PARALLEL_PREFIX_SUM_DEFAULT_BLOCK_SIZE 512
#ifdef __HIP__
# define GPU_PARALLEL_PREFIX_SUM_DEFAULT_BLOCK_SIZE 1024
#else
# define GPU_PARALLEL_PREFIX_SUM_DEFAULT_BLOCK_SIZE 512
#endif
template<uint blocksize> __device__ void gpu_parallel_prefix_sum(int *values, const int num_values)
{

View File

@@ -26,7 +26,11 @@ CCL_NAMESPACE_BEGIN
* the overall cost of the algorithm while keeping the work complexity O(n) and
* the step complexity O(log n). (Brent's Theorem optimization) */
#define GPU_PARALLEL_SUM_DEFAULT_BLOCK_SIZE 512
#ifdef __HIP__
# define GPU_PARALLEL_SUM_DEFAULT_BLOCK_SIZE 1024
#else
# define GPU_PARALLEL_SUM_DEFAULT_BLOCK_SIZE 512
#endif
template<uint blocksize, typename InputT, typename OutputT, typename ConvertOp>
__device__ void gpu_parallel_sum(

View File

@@ -26,7 +26,11 @@ CCL_NAMESPACE_BEGIN
#include "util/util_atomic.h"
#define GPU_PARALLEL_SORTED_INDEX_DEFAULT_BLOCK_SIZE 512
#ifdef __HIP__
# define GPU_PARALLEL_SORTED_INDEX_DEFAULT_BLOCK_SIZE 1024
#else
# define GPU_PARALLEL_SORTED_INDEX_DEFAULT_BLOCK_SIZE 512
#endif
#define GPU_PARALLEL_SORTED_INDEX_INACTIVE_KEY (~0)
template<uint blocksize, typename GetKeyOp>

View File

@@ -0,0 +1,121 @@
/*
* Copyright 2011-2021 Blender Foundation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#define __KERNEL_GPU__
#define __KERNEL_HIP__
#define CCL_NAMESPACE_BEGIN
#define CCL_NAMESPACE_END
#ifndef ATTR_FALLTHROUGH
# define ATTR_FALLTHROUGH
#endif
#ifdef __HIPCC_RTC__
typedef unsigned int uint32_t;
typedef unsigned long long uint64_t;
#else
# include <stdint.h>
#endif
#ifdef CYCLES_HIPBIN_CC
# define FLT_MIN 1.175494350822287507969e-38f
# define FLT_MAX 340282346638528859811704183484516925440.0f
# define FLT_EPSILON 1.192092896e-07F
#endif
/* Qualifiers */
#define ccl_device __device__ __inline__
#define ccl_device_inline __device__ __inline__
#define ccl_device_forceinline __device__ __forceinline__
#define ccl_device_noinline __device__ __noinline__
#define ccl_device_noinline_cpu ccl_device
#define ccl_global
#define ccl_static_constant __constant__
#define ccl_device_constant __constant__ __device__
#define ccl_constant const
#define ccl_gpu_shared __shared__
#define ccl_private
#define ccl_may_alias
#define ccl_addr_space
#define ccl_restrict __restrict__
#define ccl_loop_no_unroll
#define ccl_align(n) __align__(n)
#define ccl_optional_struct_init
#define kernel_assert(cond)
/* Types */
#ifdef __HIP__
# include "hip/hip_fp16.h"
# include "hip/hip_runtime.h"
#endif
#ifdef _MSC_VER
# include <immintrin.h>
#endif
#define ccl_gpu_thread_idx_x (threadIdx.x)
#define ccl_gpu_block_dim_x (blockDim.x)
#define ccl_gpu_block_idx_x (blockIdx.x)
#define ccl_gpu_grid_dim_x (gridDim.x)
#define ccl_gpu_warp_size (warpSize)
#define ccl_gpu_global_id_x() (ccl_gpu_block_idx_x * ccl_gpu_block_dim_x + ccl_gpu_thread_idx_x)
#define ccl_gpu_global_size_x() (ccl_gpu_grid_dim_x * ccl_gpu_block_dim_x)
/* GPU warp synchronization */
#define ccl_gpu_syncthreads() __syncthreads()
#define ccl_gpu_ballot(predicate) __ballot(predicate)
#define ccl_gpu_shfl_down_sync(mask, var, detla) __shfl_down(var, detla)
#define ccl_gpu_popc(x) __popc(x)
/* GPU texture objects */
typedef hipTextureObject_t ccl_gpu_tex_object;
template<typename T>
ccl_device_forceinline T ccl_gpu_tex_object_read_2D(const ccl_gpu_tex_object texobj,
const float x,
const float y)
{
return tex2D<T>(texobj, x, y);
}
template<typename T>
ccl_device_forceinline T ccl_gpu_tex_object_read_3D(const ccl_gpu_tex_object texobj,
const float x,
const float y,
const float z)
{
return tex3D<T>(texobj, x, y, z);
}
/* Use fast math functions */
#define cosf(x) __cosf(((float)(x)))
#define sinf(x) __sinf(((float)(x)))
#define powf(x, y) __powf(((float)(x)), ((float)(y)))
#define tanf(x) __tanf(((float)(x)))
#define logf(x) __logf(((float)(x)))
#define expf(x) __expf(((float)(x)))
/* Types */
#include "util/util_half.h"
#include "util/util_types.h"

View File

@@ -0,0 +1,57 @@
/*
* Copyright 2011-2021 Blender Foundation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/* Device data taken from HIP occupancy calculator.
*
* Terminology
* - HIP GPUs have multiple streaming multiprocessors
* - Each multiprocessor executes multiple thread blocks
* - Each thread block contains a number of threads, also known as the block size
* - Multiprocessors have a fixed number of registers, and the amount of registers
* used by each threads limits the number of threads per block.
*/
/* Launch Bound Definitions */
#define GPU_MULTIPRESSOR_MAX_REGISTERS 65536
#define GPU_MULTIPROCESSOR_MAX_BLOCKS 64
#define GPU_BLOCK_MAX_THREADS 1024
#define GPU_THREAD_MAX_REGISTERS 255
#define GPU_KERNEL_BLOCK_NUM_THREADS 1024
#define GPU_KERNEL_MAX_REGISTERS 64
/* Compute number of threads per block and minimum blocks per multiprocessor
* given the maximum number of registers per thread. */
#define ccl_gpu_kernel(block_num_threads, thread_num_registers) \
extern "C" __global__ void __launch_bounds__(block_num_threads, \
GPU_MULTIPRESSOR_MAX_REGISTERS / \
(block_num_threads * thread_num_registers))
/* sanity checks */
#if GPU_KERNEL_BLOCK_NUM_THREADS > GPU_BLOCK_MAX_THREADS
# error "Maximum number of threads per block exceeded"
#endif
#if GPU_MULTIPRESSOR_MAX_REGISTERS / (GPU_KERNEL_BLOCK_NUM_THREADS * GPU_KERNEL_MAX_REGISTERS) > \
GPU_MULTIPROCESSOR_MAX_BLOCKS
# error "Maximum number of blocks per multiprocessor exceeded"
#endif
#if GPU_KERNEL_MAX_REGISTERS > GPU_THREAD_MAX_REGISTERS
# error "Maximum number of registers per thread exceeded"
#endif

View File

@@ -0,0 +1,49 @@
/*
* Copyright 2011-2021 Blender Foundation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/* Constant Globals */
#pragma once
#include "kernel/kernel_profiling.h"
#include "kernel/kernel_types.h"
#include "kernel/integrator/integrator_state.h"
CCL_NAMESPACE_BEGIN
/* Not actually used, just a NULL pointer that gets passed everywhere, which we
* hope gets optimized out by the compiler. */
struct KernelGlobals {
/* NOTE: Keep the size in sync with SHADOW_STACK_MAX_HITS. */
int unused[1];
};
/* Global scene data and textures */
__constant__ KernelData __data;
#define KERNEL_TEX(type, name) __attribute__((used)) const __constant__ __device__ type *name;
#include "kernel/kernel_textures.h"
/* Integrator state */
__constant__ IntegratorStateGPU __integrator_state;
/* Abstraction macros */
#define kernel_data __data
#define kernel_tex_fetch(t, index) t[(index)]
#define kernel_tex_array(t) (t)
#define kernel_integrator_state __integrator_state
CCL_NAMESPACE_END

View File

@@ -0,0 +1,28 @@
/*
* Copyright 2011-2021 Blender Foundation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/* HIP kernel entry points */
#ifdef __HIP_DEVICE_COMPILE__
# include "kernel/device/hip/compat.h"
# include "kernel/device/hip/config.h"
# include "kernel/device/hip/globals.h"
# include "kernel/device/gpu/image.h"
# include "kernel/device/gpu/kernel.h"
#endif

View File

@@ -19,7 +19,7 @@
#include "kernel/device/optix/compat.h"
#include "kernel/device/optix/globals.h"
#include "kernel/device/gpu/image.h" // Texture lookup uses normal CUDA intrinsics
#include "kernel/device/gpu/image.h" /* Texture lookup uses normal CUDA intrinsics. */
#include "kernel/integrator/integrator_state.h"
#include "kernel/integrator/integrator_state_flow.h"
@@ -44,18 +44,18 @@ template<typename T> ccl_device_forceinline T *get_payload_ptr_2()
template<bool always = false> ccl_device_forceinline uint get_object_id()
{
#ifdef __OBJECT_MOTION__
// Always get the the instance ID from the TLAS
// There might be a motion transform node between TLAS and BLAS which does not have one
/* Always get the the instance ID from the TLAS.
* There might be a motion transform node between TLAS and BLAS which does not have one. */
uint object = optixGetInstanceIdFromHandle(optixGetTransformListHandle(0));
#else
uint object = optixGetInstanceId();
#endif
// Choose between always returning object ID or only for instances
/* Choose between always returning object ID or only for instances. */
if (always || (object & 1) == 0)
// Can just remove the low bit since instance always contains object ID
/* Can just remove the low bit since instance always contains object ID. */
return object >> 1;
else
// Set to OBJECT_NONE if this is not an instanced object
/* Set to OBJECT_NONE if this is not an instanced object. */
return OBJECT_NONE;
}
@@ -93,23 +93,30 @@ extern "C" __global__ void __raygen__kernel_optix_integrator_intersect_volume_st
extern "C" __global__ void __miss__kernel_optix_miss()
{
// 'kernel_path_lamp_emission' checks intersection distance, so need to set it even on a miss
/* 'kernel_path_lamp_emission' checks intersection distance, so need to set it even on a miss. */
optixSetPayload_0(__float_as_uint(optixGetRayTmax()));
optixSetPayload_5(PRIMITIVE_NONE);
}
extern "C" __global__ void __anyhit__kernel_optix_local_hit()
{
#ifdef __HAIR__
if (!optixIsTriangleHit()) {
/* Ignore curves. */
return optixIgnoreIntersection();
}
#endif
#ifdef __BVH_LOCAL__
const uint object = get_object_id<true>();
if (object != optixGetPayload_4() /* local_object */) {
// Only intersect with matching object
/* Only intersect with matching object. */
return optixIgnoreIntersection();
}
const uint max_hits = optixGetPayload_5();
if (max_hits == 0) {
// Special case for when no hit information is requested, just report that something was hit
/* Special case for when no hit information is requested, just report that something was hit */
optixSetPayload_5(true);
return optixTerminateRay();
}
@@ -136,8 +143,9 @@ extern "C" __global__ void __anyhit__kernel_optix_local_hit()
}
else {
if (local_isect->num_hits && optixGetRayTmax() > local_isect->hits[0].t) {
// Record closest intersection only
// Do not terminate ray here, since there is no guarantee about distance ordering in any-hit
/* Record closest intersection only.
* Do not terminate ray here, since there is no guarantee about distance ordering in any-hit.
*/
return optixIgnoreIntersection();
}
@@ -154,14 +162,14 @@ extern "C" __global__ void __anyhit__kernel_optix_local_hit()
isect->u = 1.0f - barycentrics.y - barycentrics.x;
isect->v = barycentrics.x;
// Record geometric normal
/* Record geometric normal. */
const uint tri_vindex = kernel_tex_fetch(__prim_tri_index, isect->prim);
const float3 tri_a = float4_to_float3(kernel_tex_fetch(__prim_tri_verts, tri_vindex + 0));
const float3 tri_b = float4_to_float3(kernel_tex_fetch(__prim_tri_verts, tri_vindex + 1));
const float3 tri_c = float4_to_float3(kernel_tex_fetch(__prim_tri_verts, tri_vindex + 2));
local_isect->Ng[hit] = normalize(cross(tri_b - tri_a, tri_c - tri_a));
// Continue tracing (without this the trace call would return after the first hit)
/* Continue tracing (without this the trace call would return after the first hit). */
optixIgnoreIntersection();
#endif
}
@@ -190,7 +198,7 @@ extern "C" __global__ void __anyhit__kernel_optix_shadow_all_hit()
u = __uint_as_float(optixGetAttribute_0());
v = __uint_as_float(optixGetAttribute_1());
// Filter out curve endcaps
/* Filter out curve endcaps. */
if (u == 0.0f || u == 1.0f) {
ignore_intersection = true;
}
@@ -241,10 +249,10 @@ extern "C" __global__ void __anyhit__kernel_optix_shadow_all_hit()
isect->type = kernel_tex_fetch(__prim_type, prim);
# ifdef __TRANSPARENT_SHADOWS__
// Detect if this surface has a shader with transparent shadows
/* Detect if this surface has a shader with transparent shadows. */
if (!shader_transparent_shadow(NULL, isect) || max_hits == 0) {
# endif
// If no transparent shadows, all light is blocked and we can stop immediately
/* If no transparent shadows, all light is blocked and we can stop immediately. */
optixSetPayload_5(true);
return optixTerminateRay();
# ifdef __TRANSPARENT_SHADOWS__
@@ -252,24 +260,39 @@ extern "C" __global__ void __anyhit__kernel_optix_shadow_all_hit()
# endif
}
// Continue tracing
/* Continue tracing. */
optixIgnoreIntersection();
#endif
}
extern "C" __global__ void __anyhit__kernel_optix_visibility_test()
extern "C" __global__ void __anyhit__kernel_optix_volume_test()
{
uint visibility = optixGetPayload_4();
#ifdef __HAIR__
if (!optixIsTriangleHit()) {
/* Ignore curves. */
return optixIgnoreIntersection();
}
#endif
#ifdef __VISIBILITY_FLAG__
const uint prim = optixGetPrimitiveIndex();
const uint visibility = optixGetPayload_4();
if ((kernel_tex_fetch(__prim_visibility, prim) & visibility) == 0) {
return optixIgnoreIntersection();
}
#endif
const uint object = get_object_id<true>();
if ((kernel_tex_fetch(__object_flag, object) & SD_OBJECT_HAS_VOLUME) == 0) {
return optixIgnoreIntersection();
}
}
extern "C" __global__ void __anyhit__kernel_optix_visibility_test()
{
#ifdef __HAIR__
if (!optixIsTriangleHit()) {
// Filter out curve endcaps
/* Filter out curve endcaps. */
const float u = __uint_as_float(optixGetAttribute_0());
if (u == 0.0f || u == 1.0f) {
return optixIgnoreIntersection();
@@ -277,18 +300,26 @@ extern "C" __global__ void __anyhit__kernel_optix_visibility_test()
}
#endif
// Shadow ray early termination
#ifdef __VISIBILITY_FLAG__
const uint prim = optixGetPrimitiveIndex();
const uint visibility = optixGetPayload_4();
if ((kernel_tex_fetch(__prim_visibility, prim) & visibility) == 0) {
return optixIgnoreIntersection();
}
/* Shadow ray early termination. */
if (visibility & PATH_RAY_SHADOW_OPAQUE) {
return optixTerminateRay();
}
#endif
}
extern "C" __global__ void __closesthit__kernel_optix_hit()
{
optixSetPayload_0(__float_as_uint(optixGetRayTmax())); // Intersection distance
optixSetPayload_0(__float_as_uint(optixGetRayTmax())); /* Intersection distance */
optixSetPayload_3(optixGetPrimitiveIndex());
optixSetPayload_4(get_object_id());
// Can be PRIMITIVE_TRIANGLE and PRIMITIVE_MOTION_TRIANGLE or curve type and segment index
/* Can be PRIMITIVE_TRIANGLE and PRIMITIVE_MOTION_TRIANGLE or curve type and segment index. */
optixSetPayload_5(kernel_tex_fetch(__prim_type, optixGetPrimitiveIndex()));
if (optixIsTriangleHit()) {
@@ -297,7 +328,7 @@ extern "C" __global__ void __closesthit__kernel_optix_hit()
optixSetPayload_2(__float_as_uint(barycentrics.x));
}
else {
optixSetPayload_1(optixGetAttribute_0()); // Same as 'optixGetCurveParameter()'
optixSetPayload_1(optixGetAttribute_0()); /* Same as 'optixGetCurveParameter()' */
optixSetPayload_2(optixGetAttribute_1());
}
}
@@ -311,7 +342,7 @@ ccl_device_inline void optix_intersection_curve(const uint prim, const uint type
float3 P = optixGetObjectRayOrigin();
float3 dir = optixGetObjectRayDirection();
// The direction is not normalized by default, but the curve intersection routine expects that
/* The direction is not normalized by default, but the curve intersection routine expects that */
float len;
dir = normalize_len(dir, &len);
@@ -323,15 +354,15 @@ ccl_device_inline void optix_intersection_curve(const uint prim, const uint type
Intersection isect;
isect.t = optixGetRayTmax();
// Transform maximum distance into object space
/* Transform maximum distance into object space. */
if (isect.t != FLT_MAX)
isect.t *= len;
if (curve_intersect(NULL, &isect, P, dir, isect.t, visibility, object, prim, time, type)) {
optixReportIntersection(isect.t / len,
type & PRIMITIVE_ALL,
__float_as_int(isect.u), // Attribute_0
__float_as_int(isect.v)); // Attribute_1
__float_as_int(isect.u), /* Attribute_0 */
__float_as_int(isect.v)); /* Attribute_1 */
}
}

View File

@@ -713,7 +713,7 @@ ccl_device_inline void curve_shader_setup(const KernelGlobals *kg,
P = transform_point(&tfm, P);
D = transform_direction(&tfm, D * t);
D = normalize_len(D, &t);
D = safe_normalize_len(D, &t);
}
int prim = kernel_tex_fetch(__prim_index, isect_prim);
@@ -764,8 +764,10 @@ ccl_device_inline void curve_shader_setup(const KernelGlobals *kg,
/* Thick curves, compute normal using direction from inside the curve.
* This could be optimized by recording the normal in the intersection,
* however for Optix this would go beyond the size of the payload. */
/* NOTE: It is possible that P will be the same as P_inside (precision issues, or very small
* radius). In this case use the view direction to approximate the normal. */
const float3 P_inside = float4_to_float3(catmull_rom_basis_eval(P_curve, sd->u));
const float3 Ng = normalize(P - P_inside);
const float3 Ng = (!isequal_float3(P, P_inside)) ? normalize(P - P_inside) : -sd->I;
sd->N = Ng;
sd->Ng = Ng;

View File

@@ -41,7 +41,18 @@ ccl_device_inline int find_attribute_motion(const KernelGlobals *kg,
uint4 attr_map = kernel_tex_fetch(__attributes_map, attr_offset);
while (attr_map.x != id) {
attr_offset += ATTR_PRIM_TYPES;
if (UNLIKELY(attr_map.x == ATTR_STD_NONE)) {
if (UNLIKELY(attr_map.y == 0)) {
return (int)ATTR_STD_NOT_FOUND;
}
else {
/* Chain jump to a different part of the table. */
attr_offset = attr_map.z;
}
}
else {
attr_offset += ATTR_PRIM_TYPES;
}
attr_map = kernel_tex_fetch(__attributes_map, attr_offset);
}

View File

@@ -365,19 +365,16 @@ ccl_device bool integrate_surface(INTEGRATOR_STATE_ARGS,
#ifdef __VOLUME__
if (!(sd.flag & SD_HAS_ONLY_VOLUME)) {
#endif
const int path_flag = INTEGRATOR_STATE(path, flag);
{
const int path_flag = INTEGRATOR_STATE(path, flag);
#ifdef __SUBSURFACE__
/* Can skip shader evaluation for BSSRDF exit point without bump mapping. */
if (!(path_flag & PATH_RAY_SUBSURFACE) || ((sd.flag & SD_HAS_BSSRDF_BUMP)))
/* Can skip shader evaluation for BSSRDF exit point without bump mapping. */
if (!(path_flag & PATH_RAY_SUBSURFACE) || ((sd.flag & SD_HAS_BSSRDF_BUMP)))
#endif
{
/* Evaluate shader. */
PROFILING_EVENT(PROFILING_SHADE_SURFACE_EVAL);
shader_eval_surface<node_feature_mask>(
INTEGRATOR_STATE_PASS, &sd, render_buffer, path_flag);
}
{
/* Evaluate shader. */
PROFILING_EVENT(PROFILING_SHADE_SURFACE_EVAL);
shader_eval_surface<node_feature_mask>(INTEGRATOR_STATE_PASS, &sd, render_buffer, path_flag);
}
#ifdef __SUBSURFACE__
@@ -417,17 +414,20 @@ ccl_device bool integrate_surface(INTEGRATOR_STATE_ARGS,
/* Perform path termination. Most paths have already been terminated in
* the intersect_closest kernel, this is just for emission and for dividing
* throughput by the probability at the right moment. */
const int path_flag = INTEGRATOR_STATE(path, flag);
const float probability = (path_flag & PATH_RAY_TERMINATE_ON_NEXT_SURFACE) ?
0.0f :
path_state_continuation_probability(INTEGRATOR_STATE_PASS,
path_flag);
if (probability == 0.0f) {
return false;
}
else if (probability != 1.0f) {
INTEGRATOR_STATE_WRITE(path, throughput) /= probability;
* throughput by the probability at the right moment.
*
* Also ensure we don't do it twice for SSS at both the entry and exit point. */
if (!(path_flag & PATH_RAY_SUBSURFACE)) {
const float probability = (path_flag & PATH_RAY_TERMINATE_ON_NEXT_SURFACE) ?
0.0f :
path_state_continuation_probability(INTEGRATOR_STATE_PASS,
path_flag);
if (probability == 0.0f) {
return false;
}
else if (probability != 1.0f) {
INTEGRATOR_STATE_WRITE(path, throughput) /= probability;
}
}
#ifdef __DENOISING_FEATURES__

View File

@@ -74,7 +74,7 @@ ccl_device_inline bool shadow_volume_shader_sample(INTEGRATOR_STATE_ARGS,
ShaderData *ccl_restrict sd,
float3 *ccl_restrict extinction)
{
shader_eval_volume(INTEGRATOR_STATE_PASS, sd, PATH_RAY_SHADOW, [=](const int i) {
shader_eval_volume<true>(INTEGRATOR_STATE_PASS, sd, PATH_RAY_SHADOW, [=](const int i) {
return integrator_state_read_shadow_volume_stack(INTEGRATOR_STATE_PASS, i);
});
@@ -93,7 +93,7 @@ ccl_device_inline bool volume_shader_sample(INTEGRATOR_STATE_ARGS,
VolumeShaderCoefficients *coeff)
{
const int path_flag = INTEGRATOR_STATE(path, flag);
shader_eval_volume(INTEGRATOR_STATE_PASS, sd, path_flag, [=](const int i) {
shader_eval_volume<false>(INTEGRATOR_STATE_PASS, sd, path_flag, [=](const int i) {
return integrator_state_read_volume_stack(INTEGRATOR_STATE_PASS, i);
});

View File

@@ -60,7 +60,15 @@ CCL_NAMESPACE_BEGIN
* TODO: these could be made dynamic depending on the features used in the scene. */
#define INTEGRATOR_VOLUME_STACK_SIZE VOLUME_STACK_SIZE
#define INTEGRATOR_SHADOW_ISECT_SIZE 4
#define INTEGRATOR_SHADOW_ISECT_SIZE_CPU 1024
#define INTEGRATOR_SHADOW_ISECT_SIZE_GPU 4
#ifdef __KERNEL_CPU__
# define INTEGRATOR_SHADOW_ISECT_SIZE INTEGRATOR_SHADOW_ISECT_SIZE_CPU
#else
# define INTEGRATOR_SHADOW_ISECT_SIZE INTEGRATOR_SHADOW_ISECT_SIZE_GPU
#endif
/* Data structures */
@@ -74,9 +82,9 @@ typedef struct IntegratorStateCPU {
#define KERNEL_STRUCT_END(name) \
} \
name;
#define KERNEL_STRUCT_END_ARRAY(name, size) \
#define KERNEL_STRUCT_END_ARRAY(name, cpu_size, gpu_size) \
} \
name[size];
name[cpu_size];
#include "kernel/integrator/integrator_state_template.h"
#undef KERNEL_STRUCT_BEGIN
#undef KERNEL_STRUCT_MEMBER
@@ -103,9 +111,9 @@ typedef struct IntegratorStateGPU {
#define KERNEL_STRUCT_END(name) \
} \
name;
#define KERNEL_STRUCT_END_ARRAY(name, size) \
#define KERNEL_STRUCT_END_ARRAY(name, cpu_size, gpu_size) \
} \
name[size];
name[gpu_size];
#include "kernel/integrator/integrator_state_template.h"
#undef KERNEL_STRUCT_BEGIN
#undef KERNEL_STRUCT_MEMBER

View File

@@ -107,7 +107,7 @@ KERNEL_STRUCT_END(subsurface)
KERNEL_STRUCT_BEGIN(volume_stack)
KERNEL_STRUCT_ARRAY_MEMBER(volume_stack, int, object, KERNEL_FEATURE_VOLUME)
KERNEL_STRUCT_ARRAY_MEMBER(volume_stack, int, shader, KERNEL_FEATURE_VOLUME)
KERNEL_STRUCT_END_ARRAY(volume_stack, INTEGRATOR_VOLUME_STACK_SIZE)
KERNEL_STRUCT_END_ARRAY(volume_stack, INTEGRATOR_VOLUME_STACK_SIZE, INTEGRATOR_VOLUME_STACK_SIZE)
/********************************* Shadow Path State **************************/
@@ -153,11 +153,15 @@ KERNEL_STRUCT_ARRAY_MEMBER(shadow_isect, int, object, KERNEL_FEATURE_PATH_TRACIN
KERNEL_STRUCT_ARRAY_MEMBER(shadow_isect, int, type, KERNEL_FEATURE_PATH_TRACING)
/* TODO: exclude for GPU. */
KERNEL_STRUCT_ARRAY_MEMBER(shadow_isect, float3, Ng, KERNEL_FEATURE_PATH_TRACING)
KERNEL_STRUCT_END_ARRAY(shadow_isect, INTEGRATOR_SHADOW_ISECT_SIZE)
KERNEL_STRUCT_END_ARRAY(shadow_isect,
INTEGRATOR_SHADOW_ISECT_SIZE_CPU,
INTEGRATOR_SHADOW_ISECT_SIZE_GPU)
/**************************** Shadow Volume Stack *****************************/
KERNEL_STRUCT_BEGIN(shadow_volume_stack)
KERNEL_STRUCT_ARRAY_MEMBER(shadow_volume_stack, int, object, KERNEL_FEATURE_VOLUME)
KERNEL_STRUCT_ARRAY_MEMBER(shadow_volume_stack, int, shader, KERNEL_FEATURE_VOLUME)
KERNEL_STRUCT_END_ARRAY(shadow_volume_stack, INTEGRATOR_VOLUME_STACK_SIZE)
KERNEL_STRUCT_END_ARRAY(shadow_volume_stack,
INTEGRATOR_VOLUME_STACK_SIZE,
INTEGRATOR_VOLUME_STACK_SIZE)

View File

@@ -217,10 +217,10 @@ ccl_device_inline void integrator_state_copy_only(const IntegratorState to_state
while (false) \
;
# define KERNEL_STRUCT_END_ARRAY(name, array_size) \
# define KERNEL_STRUCT_END_ARRAY(name, cpu_array_size, gpu_array_size) \
++index; \
} \
while (index < array_size) \
while (index < gpu_array_size) \
;
# include "kernel/integrator/integrator_state_template.h"
@@ -264,7 +264,12 @@ ccl_device_inline void integrator_state_shadow_catcher_split(INTEGRATOR_STATE_AR
IntegratorStateCPU *ccl_restrict split_state = state + 1;
*split_state = *state;
/* Only copy the required subset, since shadow intersections are big and irrelevant here. */
split_state->path = state->path;
split_state->ray = state->ray;
split_state->isect = state->isect;
memcpy(split_state->volume_stack, state->volume_stack, sizeof(state->volume_stack));
split_state->shadow_path = state->shadow_path;
split_state->path.flag |= PATH_RAY_SHADOW_CATCHER_PASS;
#endif

View File

@@ -386,7 +386,7 @@ ccl_device_inline void kernel_accum_light(INTEGRATOR_STATE_CONST_ARGS,
{
/* The throughput for shadow paths already contains the light shader evaluation. */
float3 contribution = INTEGRATOR_STATE(shadow_path, throughput);
kernel_accum_clamp(kg, &contribution, INTEGRATOR_STATE(shadow_path, bounce) - 1);
kernel_accum_clamp(kg, &contribution, INTEGRATOR_STATE(shadow_path, bounce));
ccl_global float *buffer = kernel_accum_pixel_render_buffer(INTEGRATOR_STATE_PASS,
render_buffer);

View File

@@ -42,6 +42,16 @@ ccl_device void kernel_displace_evaluate(const KernelGlobals *kg,
object_inverse_dir_transform(kg, &sd, &D);
#ifdef __KERNEL_DEBUG_NAN__
if (!isfinite3_safe(D)) {
kernel_assert(!"Cycles displacement with non-finite value detected");
}
#endif
/* Ensure finite displacement, preventing BVH from becoming degenerate and avoiding possible
* traversal issues caused by non-finite math. */
D = ensure_finite3(D);
/* Write output. */
output[offset] += make_float4(D.x, D.y, D.z, 0.0f);
}
@@ -66,7 +76,16 @@ ccl_device void kernel_background_evaluate(const KernelGlobals *kg,
const int path_flag = PATH_RAY_EMISSION;
shader_eval_surface<KERNEL_FEATURE_NODE_MASK_SURFACE_LIGHT>(
INTEGRATOR_STATE_PASS_NULL, &sd, NULL, path_flag);
const float3 color = shader_background_eval(&sd);
float3 color = shader_background_eval(&sd);
#ifdef __KERNEL_DEBUG_NAN__
if (!isfinite3_safe(color)) {
kernel_assert(!"Cycles background with non-finite value detected");
}
#endif
/* Ensure finite color, avoiding possible numerical instabilities in the path tracing kernels. */
color = ensure_finite3(color);
/* Write output. */
output[offset] += make_float4(color.x, color.y, color.z, 0.0f);

View File

@@ -394,7 +394,7 @@ film_calculate_shadow_catcher(const KernelFilmConvert *ccl_restrict kfilm_conver
/* NOTE: It is possible that the Shadow Catcher pass is requested as an output without actual
* shadow catcher objects in the scene. In this case there will be no auxiliary passes required
* for the devision (to save up memory). So delay the asserts to this point so that the number of
* for the decision (to save up memory). So delay the asserts to this point so that the number of
* samples check handles such configuration. */
kernel_assert(kfilm_convert->pass_offset != PASS_UNUSED);
kernel_assert(kfilm_convert->pass_combined != PASS_UNUSED);

View File

@@ -74,10 +74,6 @@ ccl_device_inline float cmj_randfloat_simple(uint i, uint p)
ccl_device float pmj_sample_1D(const KernelGlobals *kg, uint sample, uint rng_hash, uint dimension)
{
/* The PMJ sample sets contain a sample with (x,y) with NUM_PMJ_SAMPLES so for 1D
* the x part is used as the sample (TODO(@leesonw): Add using both x and y parts
* independently). */
/* Perform Owen shuffle of the sample number to reorder the samples. */
#ifdef _SIMPLE_HASH_
const uint rv = cmj_hash_simple(dimension, rng_hash);
@@ -95,7 +91,10 @@ ccl_device float pmj_sample_1D(const KernelGlobals *kg, uint sample, uint rng_ha
const uint sample_set = s / NUM_PMJ_SAMPLES;
const uint d = (dimension + sample_set);
const uint dim = d % NUM_PMJ_PATTERNS;
int index = 2 * (dim * NUM_PMJ_SAMPLES + (s % NUM_PMJ_SAMPLES));
/* The PMJ sample sets contain a sample with (x,y) with NUM_PMJ_SAMPLES so for 1D
* the x part is used for even dims and the y for odd. */
int index = 2 * ((dim >> 1) * NUM_PMJ_SAMPLES + (s % NUM_PMJ_SAMPLES)) + (dim & 1);
float fx = kernel_tex_fetch(__sample_pattern_lut, index);
@@ -104,12 +103,11 @@ ccl_device float pmj_sample_1D(const KernelGlobals *kg, uint sample, uint rng_ha
# ifdef _SIMPLE_HASH_
float dx = cmj_randfloat_simple(d, rng_hash);
# else
/* Only jitter within the grid interval. */
float dx = cmj_randfloat(d, rng_hash);
# endif
fx = fx + dx * (1.0f / NUM_PMJ_SAMPLES);
/* Jitter sample locations and map back into [0 1]. */
fx = fx + dx;
fx = fx - floorf(fx);
#else
# warning "Not using Cranley-Patterson Rotation."
#endif
@@ -136,7 +134,7 @@ ccl_device void pmj_sample_2D(
/* Based on the sample number a sample pattern is selected and offset by the dimension. */
const uint sample_set = s / NUM_PMJ_SAMPLES;
const uint d = (dimension + sample_set);
const uint dim = d % NUM_PMJ_PATTERNS;
uint dim = d % NUM_PMJ_PATTERNS;
int index = 2 * (dim * NUM_PMJ_SAMPLES + (s % NUM_PMJ_SAMPLES));
float fx = kernel_tex_fetch(__sample_pattern_lut, index);
@@ -151,17 +149,17 @@ ccl_device void pmj_sample_2D(
float dx = cmj_randfloat(d, rng_hash);
float dy = cmj_randfloat(d + 1, rng_hash);
# endif
/* Only jitter within the grid cells. */
fx = fx + dx * (1.0f / NUM_PMJ_DIVISIONS);
fy = fy + dy * (1.0f / NUM_PMJ_DIVISIONS);
fx = fx - floorf(fx);
fy = fy - floorf(fy);
/* Jitter sample locations and map back to the unit square [0 1]x[0 1]. */
float sx = fx + dx;
float sy = fy + dy;
sx = sx - floorf(sx);
sy = sy - floorf(sy);
#else
# warning "Not using Cranley Patterson Rotation."
#endif
(*x) = fx;
(*y) = fy;
(*x) = sx;
(*y) = sy;
}
CCL_NAMESPACE_END

View File

@@ -750,7 +750,7 @@ ccl_device int shader_phase_sample_closure(const KernelGlobals *kg,
/* Volume Evaluation */
template<typename StackReadOp>
template<const bool shadow, typename StackReadOp>
ccl_device_inline void shader_eval_volume(INTEGRATOR_STATE_CONST_ARGS,
ShaderData *ccl_restrict sd,
const int path_flag,
@@ -815,8 +815,11 @@ ccl_device_inline void shader_eval_volume(INTEGRATOR_STATE_CONST_ARGS,
# endif
/* Merge closures to avoid exceeding number of closures limit. */
if (i > 0)
shader_merge_volume_closures(sd);
if (!shadow) {
if (i > 0) {
shader_merge_volume_closures(sd);
}
}
}
}

View File

@@ -572,6 +572,7 @@ typedef enum AttributeStandard {
ATTR_STD_MOTION_VERTEX_NORMAL,
ATTR_STD_PARTICLE,
ATTR_STD_CURVE_INTERCEPT,
ATTR_STD_CURVE_LENGTH,
ATTR_STD_CURVE_RANDOM,
ATTR_STD_PTEX_FACE_ID,
ATTR_STD_PTEX_UV,

View File

@@ -107,6 +107,7 @@ ustring OSLRenderServices::u_geom_undisplaced("geom:undisplaced");
ustring OSLRenderServices::u_is_smooth("geom:is_smooth");
ustring OSLRenderServices::u_is_curve("geom:is_curve");
ustring OSLRenderServices::u_curve_thickness("geom:curve_thickness");
ustring OSLRenderServices::u_curve_length("geom:curve_length");
ustring OSLRenderServices::u_curve_tangent_normal("geom:curve_tangent_normal");
ustring OSLRenderServices::u_curve_random("geom:curve_random");
ustring OSLRenderServices::u_path_ray_length("path:ray_length");

View File

@@ -294,6 +294,7 @@ class OSLRenderServices : public OSL::RendererServices {
static ustring u_is_smooth;
static ustring u_is_curve;
static ustring u_curve_thickness;
static ustring u_curve_length;
static ustring u_curve_tangent_normal;
static ustring u_curve_random;
static ustring u_path_ray_length;

View File

@@ -18,12 +18,14 @@
shader node_hair_info(output float IsStrand = 0.0,
output float Intercept = 0.0,
output float Length = 0.0,
output float Thickness = 0.0,
output normal TangentNormal = N,
output float Random = 0)
{
getattribute("geom:is_curve", IsStrand);
getattribute("geom:curve_intercept", Intercept);
getattribute("geom:curve_length", Length);
getattribute("geom:curve_thickness", Thickness);
getattribute("geom:curve_tangent_normal", TangentNormal);
getattribute("geom:curve_random", Random);

View File

@@ -213,6 +213,8 @@ ccl_device_noinline void svm_node_hair_info(
}
case NODE_INFO_CURVE_INTERCEPT:
break; /* handled as attribute */
case NODE_INFO_CURVE_LENGTH:
break; /* handled as attribute */
case NODE_INFO_CURVE_RANDOM:
break; /* handled as attribute */
case NODE_INFO_CURVE_THICKNESS: {

View File

@@ -173,6 +173,7 @@ typedef enum NodeParticleInfo {
typedef enum NodeHairInfo {
NODE_INFO_CURVE_IS_STRAND,
NODE_INFO_CURVE_INTERCEPT,
NODE_INFO_CURVE_LENGTH,
NODE_INFO_CURVE_THICKNESS,
/* Fade for minimum hair width transiency. */
// NODE_INFO_CURVE_FADE,

View File

@@ -342,6 +342,8 @@ const char *Attribute::standard_name(AttributeStandard std)
return "particle";
case ATTR_STD_CURVE_INTERCEPT:
return "curve_intercept";
case ATTR_STD_CURVE_LENGTH:
return "curve_length";
case ATTR_STD_CURVE_RANDOM:
return "curve_random";
case ATTR_STD_PTEX_FACE_ID:
@@ -586,6 +588,9 @@ Attribute *AttributeSet::add(AttributeStandard std, ustring name)
case ATTR_STD_CURVE_INTERCEPT:
attr = add(name, TypeDesc::TypeFloat, ATTR_ELEMENT_CURVE_KEY);
break;
case ATTR_STD_CURVE_LENGTH:
attr = add(name, TypeDesc::TypeFloat, ATTR_ELEMENT_CURVE);
break;
case ATTR_STD_CURVE_RANDOM:
attr = add(name, TypeDesc::TypeFloat, ATTR_ELEMENT_CURVE);
break;

View File

@@ -22,7 +22,6 @@
#include "util/util_foreach.h"
#include "util/util_hash.h"
#include "util/util_math.h"
#include "util/util_opengl.h"
#include "util/util_time.h"
#include "util/util_types.h"

View File

@@ -434,7 +434,8 @@ void Film::update_passes(Scene *scene, bool add_sample_count_pass)
const ObjectManager *object_manager = scene->object_manager;
Integrator *integrator = scene->integrator;
if (!is_modified() && !object_manager->need_update() && !integrator->is_modified()) {
if (!is_modified() && !object_manager->need_update() && !integrator->is_modified() &&
!background->is_modified()) {
return;
}

View File

@@ -794,11 +794,6 @@ void GeometryManager::device_update_attributes(Device *device,
foreach (AttributeRequest &req, attributes.requests) {
Attribute *attr = geom->attributes.find(req);
/* Vertex normals are stored in DeviceScene.tri_vnormal. */
if (attr && attr->std == ATTR_STD_VERTEX_NORMAL) {
continue;
}
update_attribute_element_size(geom,
attr,
ATTR_PRIM_GEOMETRY,
@@ -811,11 +806,6 @@ void GeometryManager::device_update_attributes(Device *device,
Mesh *mesh = static_cast<Mesh *>(geom);
Attribute *subd_attr = mesh->subd_attributes.find(req);
/* Vertex normals are stored in DeviceScene.tri_vnormal. */
if (subd_attr && subd_attr->std == ATTR_STD_VERTEX_NORMAL) {
continue;
}
update_attribute_element_size(mesh,
subd_attr,
ATTR_PRIM_SUBD,
@@ -870,11 +860,6 @@ void GeometryManager::device_update_attributes(Device *device,
Attribute *attr = geom->attributes.find(req);
if (attr) {
/* Vertex normals are stored in DeviceScene.tri_vnormal. */
if (attr->std == ATTR_STD_VERTEX_NORMAL) {
continue;
}
/* force a copy if we need to reallocate all the data */
attr->modified |= attributes_need_realloc[Attribute::kernel_type(*attr)];
}
@@ -898,11 +883,6 @@ void GeometryManager::device_update_attributes(Device *device,
Attribute *subd_attr = mesh->subd_attributes.find(req);
if (subd_attr) {
/* Vertex normals are stored in DeviceScene.tri_vnormal. */
if (subd_attr->std == ATTR_STD_VERTEX_NORMAL) {
continue;
}
/* force a copy if we need to reallocate all the data */
subd_attr->modified |= attributes_need_realloc[Attribute::kernel_type(*subd_attr)];
}

View File

@@ -46,7 +46,7 @@ class GPUDisplayParams {
* NOTE: Is not affected by the resolution divider. */
int2 full_size = make_int2(0, 0);
/* Effective vieport size.
/* Effective viewport size.
* In the case of border render, size of the border rectangle.
*
* NOTE: Is not affected by the resolution divider. */
@@ -163,7 +163,7 @@ class GPUDisplay {
* This call might happen in parallel with draw, but can never happen in parallel with the
* update.
*
* The actual zero-ing can be deferred to a later moment. What is important is that after clear
* The actual zeroing can be deferred to a later moment. What is important is that after clear
* and before pixels update the drawing texture will be fully empty, and that partial update
* after clear will write new pixel values for an updating area, leaving everything else zeroed.
*

View File

@@ -19,7 +19,7 @@
#include "kernel/kernel_types.h"
#include "device/device_denoise.h" /* For the paramaters and type enum. */
#include "device/device_denoise.h" /* For the parameters and type enum. */
#include "graph/node.h"
#include "integrator/adaptive_sampling.h"

View File

@@ -4368,6 +4368,7 @@ NODE_DEFINE(HairInfoNode)
SOCKET_OUT_FLOAT(is_strand, "Is Strand");
SOCKET_OUT_FLOAT(intercept, "Intercept");
SOCKET_OUT_FLOAT(size, "Length");
SOCKET_OUT_FLOAT(thickness, "Thickness");
SOCKET_OUT_NORMAL(tangent_normal, "Tangent Normal");
#if 0 /* Output for minimum hair width transparency - deactivated. */
@@ -4390,6 +4391,9 @@ void HairInfoNode::attributes(Shader *shader, AttributeRequestSet *attributes)
if (!intercept_out->links.empty())
attributes->add(ATTR_STD_CURVE_INTERCEPT);
if (!output("Length")->links.empty())
attributes->add(ATTR_STD_CURVE_LENGTH);
if (!output("Random")->links.empty())
attributes->add(ATTR_STD_CURVE_RANDOM);
}
@@ -4412,6 +4416,12 @@ void HairInfoNode::compile(SVMCompiler &compiler)
compiler.add_node(NODE_ATTR, attr, compiler.stack_assign(out), NODE_ATTR_OUTPUT_FLOAT);
}
out = output("Length");
if (!out->links.empty()) {
int attr = compiler.attribute(ATTR_STD_CURVE_LENGTH);
compiler.add_node(NODE_ATTR, attr, compiler.stack_assign(out), NODE_ATTR_OUTPUT_FLOAT);
}
out = output("Thickness");
if (!out->links.empty()) {
compiler.add_node(NODE_HAIR_INFO, NODE_INFO_CURVE_THICKNESS, compiler.stack_assign(out));
@@ -6749,8 +6759,6 @@ void NormalMapNode::attributes(Shader *shader, AttributeRequestSet *attributes)
attributes->add(ustring((string(attribute.c_str()) + ".tangent").c_str()));
attributes->add(ustring((string(attribute.c_str()) + ".tangent_sign").c_str()));
}
attributes->add(ATTR_STD_VERTEX_NORMAL);
}
ShaderNode::attributes(shader, attributes);
@@ -7026,8 +7034,6 @@ void VectorDisplacementNode::attributes(Shader *shader, AttributeRequestSet *att
attributes->add(ustring((string(attribute.c_str()) + ".tangent").c_str()));
attributes->add(ustring((string(attribute.c_str()) + ".tangent_sign").c_str()));
}
attributes->add(ATTR_STD_VERTEX_NORMAL);
}
ShaderNode::attributes(shader, attributes);

Some files were not shown because too many files have changed in this diff Show More