Add a new operator to the Graph Editor that blends selected keyframes
to their default value.
The operator can be accessed from
Key>Slider Operators>Blend To Default Value
Reviewed by: Sybren A. Stüvel
Differential Revision: https://developer.blender.org/D9376
Ref: D9367
As grease pencil simplify is a sub-option of the general simplify setting, the grease pencil simplify must also be disabled when the general switch is disabled.
This patch also disables the UI panel.
This is supposed to hold the latest improvements from the EEVEE rewrite branch.
Note that a restart is necessary for the engine to appear.
The registration code is a bit convoluted as it needs to run after WM_init.
Create a function on CurvesGeometry that can also be used for an edit
mode operator in the future. Dealing with CustomData directly means the
code is a bit more verbose than would be ideal, but this would be a
simple thing to clean up in the future if we get an attribute API here.
Also change the reverse node to first work on a read-only geometry
component, and only get write access if there is a curve selected.
Differential Revision: https://developer.blender.org/D14375
This will mostly just remove the overhead of converting
to and from the old curves type, though it also does open
some opportunities for multi-threading in the future.
A mistake in 8538c69921. The offsets include the segment at the
corresponding index, but the evaluated offset calculation was adjusting
the offset for the second to last segment.
Make the new curves' translate and transform functions also affect
the handle position attributes.
Differential Revision: https://developer.blender.org/D14372
Resizing nodes used the cursor location at the time the event was triggered
instead of the drag start. This is harmless, but it means the drag location
isn't under the cursor, especially with a high drag threshold.
Noticed when investigating other drag issues,
unrelated to recent changes to drag behavior.
`RNA_def_struct_ui_text(srna, ...)` was reused for `is_valid` and `is_muted`,
which set the struct's documentation to theirs (actually to that of the last
call).
`RNA_def_property_ui_text(prop, ...)` should be used for the properties instead.
Remove the conversion to and from `CurveEval` by supporting the
new Curves data-block in the node. This allows for some simplifications
to the code, as well as a fix for transferring curve domain attributes
when duplicating the curve domain.
The performance improvements (observed through the timings overlay)
can be relatively massive with many curves. When duplicating 10000
4-point curves to become 2 million curves, I observed an approximate
150x improvement, from about 3 seconds to about 20ms.
- Pass less redundant information in function arguments.
- Use `IndexRange` more instead of direct offset calculations.
- Use specific geometry component types for specialized functions.
- Use const arguments.
- Declare variables closer to where they are created or used.
- Remove some redundant logic.
- Simplify the description for the output geometry.
The menu for Timeline > Keying > Active Keying Set wouldn't show up.
Caused by d8e3bcf770. The function to attach search menu data to the button
is now called twice with different arguments for the same button.
That shouldn't be an issue in general, but the first call had the unexpected
side effect of disabling the button. Make sure it's re-enabled when
the second call sets the proper search data.
Caused by rB43bc494892c3; moving this 'new id' relink to the generic
remapping code added the overhead of proper, generic post-processing,
compared to the special cases the previous code was designed to handle.
Fortunately, with the recent 'multi-remapping' work we can easily rewrite
that new-id relink code to use the multi-remapping approach too.
No behavioral change is expected from this commit, besides the improved
performance (essentially restored to what it was before
rB43bc494892c3).
This reduces complexity and avoids framebuffer setup costs.
This also "removes" the prefiltering of the glossy cubemaps in favor
of simple bilinear filtering of the mipchain.
Change the sample mode to not duplicate the last vertex of the
stroke, and instead use the cyclic flag to close previously cyclic
strokes. This is necessary for the following modifiers.
Reviewed By: NicksBest
Differential Revision: http://developer.blender.org/D14359
From hair particle mode:
* Add
* Comb
* Cut
* Grow
New:
* Delete
Only comb and delete are used at the moment (by the new tools which are
under experimental).
Selecting an object that was already active & selected would de-select
it when the cursor was over the object's center.
This was caused by [0], which added a check assuming that more than one
hit from GPU_select meant there were multiple objects to select from.
This is not necessarily the case, since bones, camera tracks or the
object's own center can add additional hits.
Resolve by keeping track of the best hit with & without the
active-selected object, only using the non-active-selected hit if it's found.
[0] 1550573360
Support for differentiating the tweak tool from the 3D cursor when
select is set to RMB.
This is currently an experimental preference:
Tweak Tool: Left Mouse Select & Drag
When enabled, the tweak tool can now tweak the existing selection
without de-selecting first; a single click can be used to replace
the selection.
This matches selection in the graph & node editors.
This preference is only available with "Developer Extras" enabled.
Ref T96544.
Needed so mapping selection to click doesn't pass the click event
through to setting the 3D cursor, for example.
While this doesn't happen with the default key-map, setting selection
to LMB-click would set the 3D cursor as well (when the selection
fell through to nothing).
De-selecting objects meant that selecting a bone would de-select
all the other pose objects - making exiting & entering pose-mode
lose the current set of pose objects.
Match edit-mode behavior: avoid de-selecting objects in the current mode
(unless the action is explicitly performed in the outliner, for example).
Previously 'basact' was set to NULL, but this wasn't
so simple to use with deselect_all, which needs to check whether
anything was found at the cursor.
Add a 'handled' variable to differentiate this case; when set,
don't attempt object selection.
While basic single-track selection worked,
toggling and de-selection have been broken since at least 2.83.
Support SelectPick_Params with the exception of deselect_all
which doesn't make sense for tracks as de-selecting all objects
is expected in that case.
This was an involved operation to include inline,
making ed_object_select_pick more difficult to follow.
Prepare for track selection to properly support SelectPick_Params.
Currently this isn't used in the key-map; it will eventually
allow the 3D viewport's tweak tool to match the behavior of other
editors that support tweaking a selection without first de-selecting
all other elements.
This is only part of the experimental "Full Frame" mode (disabled
by default). See T88150.
Currently the viewer node uses buffer padding to display the image offset
in the backdrop, as a temporary solution implemented for {D12466}.
This solution is inefficient memory- and performance-wise. Another
issue is that the padding becomes part of the image when saved.
This patch instead sets the offset in the Viewer node image
as variables and makes the backdrop take it into account
when drawing the image or any related gizmo.
Reviewed By: jbakker
Differential Revision: https://developer.blender.org/D12750
The previous fix, including `<algorithm>`, was an improvement
but did not address the actual error, which appears to be that `int64_t` is
`long long int` on one platform but just `long int` on another.
The fix specifies the template argument directly.
This patch adds evaluation for NURBS, Bezier, and Catmull Rom
curves for the new `Curves` data-block. The main difference from
the code in `BKE_spline.hh` is that the functionality is not
encapsulated in classes. Instead, each function has arguments
for all of the information it needs. This makes the code more
reusable and removes a bunch of unnecessary complications
for keeping track of state.
NURBS and Bezier evaluation works the same way as existing code.
The Catmull Rom implementation is new, with the basis function
based on Cycles code. All three types have some basic tests.
For NURBS and Catmull Rom curves, evaluating positions is the
same as any generic attribute, so it's implemented by the generic
interpolation to evaluated points. Bezier curves are a bit special,
because the "handle" control points are stored in a separate attribute.
This patch doesn't include generic interpolation to evaluated points
for Bezier curves.
Ref T95942
Differential Revision: https://developer.blender.org/D14284
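For reference, a uniform Catmull-Rom segment evaluates attributes as a weighted sum of its four surrounding control points; a minimal illustrative sketch of the standard basis weights (this is a generic formulation, not the actual Blender code):

  #include <array>

  /* Weights of the four control points (p0..p3) of a uniform Catmull-Rom
   * segment for a parameter t in [0, 1]. The evaluated value is the weighted
   * sum of the control point positions (or any other attribute). */
  static std::array<float, 4> catmull_rom_weights(const float t)
  {
    const float t2 = t * t;
    const float t3 = t2 * t;
    return {-0.5f * t3 + t2 - 0.5f * t,
            1.5f * t3 - 2.5f * t2 + 1.0f,
            -1.5f * t3 + 2.0f * t2 + 0.5f * t,
            0.5f * t3 - 0.5f * t2};
  }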
With the deferred pipeline, the materials need different shading groups
depending on their matflags.
Note that this is potentially slower because execution order of shaders
may now be random. This might be fixed in a later commit.
Similar to other changes to ID remapping, gives huge speedups in some
cases, like certain types of liboverride creation.
Case from {T96092} goes from 1725 seconds (almost 30 minutes) to 45
seconds to generate the liboverride, on my machine.
Reviewed By: jbakker
Maniphest Tasks: T96092
Differential Revision: https://developer.blender.org/D14240
Ever since d5b72fb06c, shader nodes have been in the
`blender::nodes` namespace, so they don't need to use that to access
Blender's C++ types and functions.
Somehow exposed after 943b919fe8, linking could fail because
bf_nodes was not properly configured as a dependency of bf_nodes_shader.
Also add the dependency to the geometry nodes module.
To make porting to other architectures easier, clarifying that this does not
need to be supported. The unused parallel_reduce implementation assumed warp
size 32, but is easy to update if we ever need it in the future.
When using inverted filling and clicking inside a closed area (and not outside, as is expected), the algorithm that detects the contour to fill was unable to find the filling shape and tried to fill outside of the valid index.
The infinite loop was adding more memory on each iteration, and the process continued while there were system resources, finally crashing the system.
As the tool in negative mode is designed to fill all areas when you click outside of any shape, the algorithm now checks whether the outline is not working as expected and cancels the filling process.
This commit removes the implementations of legacy nodes,
their type definitions, and related code that becomes unused.
Now that we have two releases that included the legacy nodes,
there is not much reason to include them still. Removing the
code means refactoring will be easier, and old code doesn't
have to be tested and maintained.
After this commit, the legacy nodes will be undefined in the UI,
so 3.0 or 3.1 should be used to convert files to the fields system.
The net change is 12184 lines removed!
The tooltip for legacy nodes mentioned that we would remove
them before 4.0, which was purposefully a bit vague to allow
us this flexibility. A poll in a devtalk post showed that the
majority of people were okay with removing the nodes.
https://devtalk.blender.org/t/geometry-nodes-backward-compatibility-poll/20199
Differential Revision: https://developer.blender.org/D14353
Solved by introducing a variant of MEM_cnew which behaves
as a copy constructor for trivial types.
Alternative approach would be to surround DNA structs with clang/gcc
diagnostics push/modify/pop so that implicitly defined constructors
and copy operators are allowed to access deprecated fields.
The downside of the DNA approach is that it will require some way to
easily apply diagnostics modifications to many structs, which is not
possible currently.
The newly added MEM_cnew has other good use cases, so it is easiest to
use this route, at least for now.
Differential Revision: https://developer.blender.org/D14356
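The idea, in a simplified and hypothetical form (using plain malloc/memcpy rather than Blender's guarded allocator), is an allocation helper that only accepts trivially copyable types and copies the source byte-wise, so deprecated DNA fields never go through an implicitly defined copy constructor:

  #include <cstdlib>
  #include <cstring>
  #include <type_traits>

  /* Allocate a new object of a trivial type, initialized as a byte-wise copy
   * of `other`. Non-trivial types are rejected at compile time. */
  template<typename T> T *cnew_copy(const T &other)
  {
    static_assert(std::is_trivially_copyable_v<T>,
                  "Only trivially copyable types can be copied this way");
    T *ptr = static_cast<T *>(std::malloc(sizeof(T)));
    if (ptr != nullptr) {
      std::memcpy(ptr, &other, sizeof(T));
    }
    return ptr;
  }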
Resolves a fair amount of noisy warnings with default build on macOS.
Tested using render_layer render test which includes Freestyle layer.
Differential Revision: https://developer.blender.org/D14355
Meta-element selection now follows conventions for other picking
functions (e.g. EDBM_select_pick).
- Split meta-element find-nearest into a separate function.
- Cycle the meta-element starting from the active & selected
instead of comparing & setting a static variable.
- Order elements using depth (from front-to-back)
when cycling multiple elements.
This uses a StorageBuf as the source of indirect dispatch arguments.
The user needs to make sure the parameters are in the right order.
There is no support for an argument offset for the moment, as there is no
need for it. But this might be added in the future.
Note that the indirect buffer is synchronized at the backend level. This is
done for practical reasons and because this feature is almost always used
for GPU-driven pipelines.
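For context, indirect compute dispatch reads three consecutive 32-bit work-group counts from the buffer, which is the ordering the user has to respect; a hedged sketch of that expected layout (the struct name is illustrative):

  #include <cstdint>

  /* The first 12 bytes of the storage buffer are consumed as the dispatch
   * size, in this order. Swapping the fields silently dispatches the wrong
   * number of work groups. */
  struct DispatchIndirectArgs {
    uint32_t num_groups_x;
    uint32_t num_groups_y;
    uint32_t num_groups_z;
  };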
This is a faster way to clear a buffer than re-uploading new data.
It is equivalent to `memset` and runs directly on the GPU.
This is better for clearing huge buffers and avoids the sync cost of a data upload.
This was getting in the way in multiple instances. Compute shader dispatches
are still made in the presence of the last bound framebuffer even if they
do not interact with it.
Volatile fields were introduced to the RenderResult struct years ago[1].
However, volatile is most likely not doing what it was intended to do
in this instance, and is problematic when moving files to c++ (see
discussion from D13962). There are complex rules around what happens to
these fields but none of them guarantee what the above commit alluded to.
This patch drops the volatile and cleans up the APIs surrounding it.
[1] rB7930c40051ef1b1a26140629cf1299aa89eed859
Passing on all platforms:
https://builder.blender.org/admin/#/builders/18/builds/338
Differential Revision: https://developer.blender.org/D14298
- Rename 'location' to 'mval', typically used for region cursor coords.
- Rename 'retval' to 'changed', typically used for operators
when their return value depends on a change being made.
- Add SelectPick_Params struct to make picking logic more
straightforward and easier to extend (a rough sketch follows this list).
- Use `eSelectOp` instead of booleans (extend, deselect, toggle)
which were used to represent 4 states (which wasn't obvious).
- Handle deselect_all when picking instead of in view3d_select_exec;
de-duplicate de-selection, which was already needed when replacing
the selection in the picking functions.
- Handle outliner update & notifiers in the picking functions
instead of view3d_select_exec.
- Fix particle select deselect_all option which did nothing.
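A rough sketch of the shape of the picking parameters described above (field and enumerator names are illustrative, not the exact editor code):

  /* Selection operation applied to the picked element, replacing the previous
   * extend / deselect / toggle boolean triplet. */
  enum eSelectOp {
    SEL_OP_SET, /* Replace the selection with the picked element. */
    SEL_OP_ADD, /* Extend the selection. */
    SEL_OP_SUB, /* Remove from the selection. */
    SEL_OP_XOR, /* Toggle the picked element. */
  };

  struct SelectPick_Params {
    eSelectOp sel_op;
    /* De-select everything when nothing is found under the cursor. */
    bool deselect_all;
  };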
As proposed in T95802, this adds buttons to a new column on the right to modify
the override in the Library Override display mode. Some further usability
improvements are planned. E.g. this does not yet expand collections (modifiers,
constraints, etc) nicely or group modified properties of a modifier together.
Vector properties with more than 3 items or matrices aren't displayed nicely
yet, they are just squeezed into the column. If this actually becomes a problem
there are some ideas to address this.
Differential Revision: https://developer.blender.org/D14268
While the correlation may not work well with adaptive sampling, in practice
this appears to work OK in most cases.
Automatic scrambling distance uses the minimum samples from adaptive sampling,
which provides a good default estimate to avoid artifacts.
Contributed by Alaska.
Differential Revision: https://developer.blender.org/D13325
This allows users to type in values larger than 1, for use in conjunction
with automatic scrambling distance.
Contributed by Alaska.
Differential Revision: https://developer.blender.org/D13580
When the light direction is not pointing away from the geometric normal and
there is a shadow terminator offset, self intersection is supposed to occur.
Some old platforms and drivers have a limited number of SSBO bindings per
compute shader. This disables GPU subdivision if we cannot possibly
bind all required buffers within this limit.
For now the maximum number of buffers used by the GPU code is hardcoded,
but will be programmatically detected when shader creation is automated.
Ref D14337
This adds detection of the maximum number of shader storage buffer
bindings that is supported on the current platform. This can be
useful to turn off features that require compute shaders but use
more buffer bindings than available.
Differential Revision: https://developer.blender.org/D14337
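On OpenGL the relevant limit can be queried directly; a minimal sketch of the kind of check involved (the required-buffer count, function name and include are assumptions for illustration, the real detection lives in the GPU backend):

  #include <epoxy/gl.h> /* Or any GL loader providing core 4.3 headers. */

  /* Illustrative requirement; the real number of SSBOs used by the GPU
   * subdivision shaders is defined elsewhere. */
  constexpr GLint SUBDIV_REQUIRED_SSBO_BINDINGS = 12;

  static bool gpu_subdivision_ssbo_limit_ok()
  {
    GLint max_compute_ssbo = 0;
    glGetIntegerv(GL_MAX_COMPUTE_SHADER_STORAGE_BLOCKS, &max_compute_ssbo);
    return max_compute_ssbo >= SUBDIV_REQUIRED_SSBO_BINDINGS;
  }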
The performance issue was noticeable when tracking a lot of tracks
which are using keyframe pattern matching. What was happening is that
at some point the cache gets filled and the furthest-away frame gets removed
from the cache: the frame at the marker's keyframe gets removed and needs
to be re-read from disk on the next tracking step.
This change makes it so frames at markers' keyframes are not removed
from cache during tracking.
Steps to easily reproduce:
- Set cache size to 512 Mb.
- Open image sequence in clip editor
- Detect features
- Track all markers
Originally was reported by Rik, thanks!
Modified source Armature ID in the join operation was not properly
tagged as such for the depsgraph (and therefore memfile undo).
Issue caused/revealed by rBe648e388874a.
Should be backported to 3.1 should we make a corrective release.
After rB9b298cf3dbec, the `StructRNA` declarations can now be accessed via
`RNA_prototypes.h`.
Also, since all redundant declarations are now removed,
`_WM_MESSAGE_EXTERN_BEGIN` and `_WM_MESSAGE_EXTERN_END` are also no
longer needed.
Differential Revision: https://developer.blender.org/D14342
Caused by 0cb5eae9d0 which restored
support for 3D depth when selecting gizmos - making it difficult
to select single lines drawn in front of other gizmos.
Previously the first hit was always used.
Resolve by using a margin around arrow stems when selecting
which was already done for 2D arrows.
This commit adds three nodes:
- `Remove Attribute`: Removes an attribute with the given name
- `Named Attribute`: A field input node
- `Store Named Attribute`: Puts results of a field in a named attribute
They are added behind a new experimental feature flag, because further
development of attribute search and name dependency visualization will
happen as separate steps.
Ref T91742
Differential Revision: https://developer.blender.org/D12685
So far it was necessary to declare a new RNA struct in `RNA_access.h` manually.
Since 9b298cf3db we generate a `RNA_prototypes.h` for RNA property
declarations. Now this also includes the RNA struct declarations, so they don't
have to be added manually anymore.
Differential Revision: https://developer.blender.org/D13862
Reviewed by: brecht, campbellbarton
Lets `makesrna` generate a `RNA_prototypes.h` header with declarations for all
RNA properties. This can be included in regular source files when needing to
reference RNA properties statically.
This solves an issue on MSVC with adding such declarations in functions, like
we used to do. See 800fc17367. Removes any such declarations and the related
FIXME comments.
Reviewed By: campbellbarton, LazyDodo, brecht
Differential Revision: https://developer.blender.org/D13837
This patch removes all duplicate code for the same Bake modifier logic.
Some modifiers still need custom bake functions and cannot use this generic bake.
Steps to reproduce:
- Add image sequence to movie clip editor.
- Set cache limit to a low value in the user preferences.
- Playback until old frames start to be removed from the cache.
- Jump to the beginning of the image sequence.
The reason for the dead-lock comes from two factors:
- Due to the global nature of the cache limiter, calls need to be
guarded with locks.
- Image buffers stored in the cache can have their own cache
(which is used for color management).
Didn't find a better solution than to use a recursive lock.
It kind of makes sense, since the thread-guarded resource is
recursive (a moviecache can have nested moviecaches).
Differential Revision: https://developer.blender.org/D14331
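A tiny self-contained illustration of why a recursive lock resolves this (std::recursive_mutex stands in here for the actual mutex used):

  #include <mutex>

  /* The limiter can be re-entered on the same thread when an image buffer
   * stored in the cache owns a nested cache (color management): evicting a
   * frame may free that nested cache, which calls back into the limiter.
   * A recursive mutex lets the same thread re-acquire the lock instead of
   * dead-locking on itself. */
  static std::recursive_mutex limiter_mutex;

  void limiter_enforce_limits()
  {
    std::lock_guard<std::recursive_mutex> lock(limiter_mutex);
    /* ... evict the furthest-away frames; eviction may re-enter this
     * function on the same thread through nested caches ... */
  }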
This reverts commit 1558b270e9.
An earlier commit (rB101fadcf6b93c) introduced some new functionality,
which was overlooked in reviewing this commit & got broken.
Will re-commit after the issue has been fixed.
Ref: D13687
If the scale in the offset modifier was set to a value lower than -1,
the object would get mirrored. The problem was that the thickness
was set to 0 by that. This fix makes the thickness calculation only
use the absolute values.
Differential Revision: http://developer.blender.org/D14324
For now just assume that a node group without output sockets is
an output node. Ideally, we would use run-time information stored
on the node group itself to determine if the group contains a
top-level output node (e.g. Material Output). That can be
implemented separately.
In the larger scheme of things, top-level outputs within node
groups seem to break the node group abstraction and reusability
a bit.
Correction to the calculation of font size used for the tabs on the
Sidebar so that they are always the same size as other content on the
panel.
See D14322 for more details.
Differential Revision: https://developer.blender.org/D14322
Reviewed by Brecht Van Lommel
Instead of allocating a vector of the basis weights cache for
each evaluated point, allocate a single vector for all of the
weights. This should reduce memory usage by avoiding the
overhead of storing many vectors. I noticed a small performance
improvement to evaluated position calculation with an order of 5,
which is larger than `Vector`'s default inline buffer capacity.
This change is possible because of previous commits that
made the basis cache for each evaluated point always have
the same "order" size.
Currently a single buffer is used as working space for all evaluated
points. In order to make evaluations more independent, opening
options like multi-threading in the future, instead use a separate
array for each call. Using an inline buffer capacity higher than
the default allows a few percent performance improvement, and removes
allocations for every evaluated point.
The step after calculating the NURBS basis for a single evaluated
point trimmed extra zeroes from the weights. However, in practice
this rarely did anything, only for the first and last evaluated point
of certain knot configurations. Remove it in order to simplify code.
Also use a separate span for the result, to clarify its length.
Previously, the popover menu in sculpt/texture paint mode did not
take into account the `UnifiedBrushSettings` for the unit.
To fix this, the behavior of `class _draw_tool_settings_context_mode` is matched
by checking the same conditions when setting up the UI of the right-click popover menu.
Fixes T81616
Reviewed By: #sculpt_paint_texture, pablodp606
Maniphest Tasks: T81616
Differential Revision: https://developer.blender.org/D9168
Label alignment in top bar by using `ui_text_icon_width_ex` instead of `w_hint`
Old:
{F12733743}
New:
{F12733742}
Fixes T61558
Reviewed By: Severin
Maniphest Tasks: T61558
Differential Revision: https://developer.blender.org/D13552
This drastically changes the implementation to leverage arbitrary writes
in order to reduce complexity and memory usage and increase speed.
Since we are no longer dependent on the framebuffer requirement, we can
allocate a bigger texture that fits all views and avoid the extra.
Transparency, holdout and emission are no longer deferred and are now
composited using dual source blending.
The indirect lighting and raytracing are still not functional but will
also get a large refactor of their own.
This is still far from perfect but it is better than not working
correctly.
The view/casters intersection bounds are too big and rough to
compute a decent tilemap level that is near the desired shadow pixel
density.
The algorithm works relatively OK if the sun direction is almost
parallel to the ortho view direction.
This allows removing the indirection for LODs during shading, since the
tile is not the owner of the page unless it uses it.
The cache system is quite a bit more complex but makes it easier to spot
errors, since the pages are not scattered across the tile texture.
This also simplifies allocation, since the free heap is separate from the
cache.
This lib allows any shader to use `print()` like functions for
logging and debugging shaders.
Usage is described in the comment at the top of the file.
Instead of using a manual list of dependencies, the new implementation
scans all shader files beginning with `gpu_shader_material_` and extracts
all function declarations.
This way we can deduce the internal dependencies between these files.
This new implementation is merged with the manual pragma dependency system
used by other shader files. This way it is compatible with the shader
logging system and does not require any string duplication during shader
building.
This introduces `DEBUG_DEPENDENCIES` (not a cmake flag but a local define)
which, when set to 1, will list all the original files included in this
shader while omitting the generated / non-original code.
This is the layout used by the amdgpu-pro GL implementation.
This also adds some sanitizing of the parse output because the bespoke
implementation has bogus errors when it comes to compute shaders.
This displays the error source so that IDEs can identify it
as a path and let the programmer follow a direct link instead of
doing a manual string search.
This only works for shaders compiled from unaltered sources, as
it uses the source `char*` as the key for the filename search.
For some reason, on some GL implementations (amdgpu) this particular
syntax shifts the error lines.
Remove the context lines by default, as they are not useful anymore.
We now use ShaderCreateInfo as a way to set up the custom material
implementation.
This is more versatile and flexible while not requiring parsing of
code snippets.
The defrag shader makes sure the free heap is free of holes, making
the allocation more straightforward.
Since we now only reference the pages using the tiles, we introduce
a debug shader that produces an image visualizing the page data.
This replaces the debug 8 option.
This also fixes some bugs that were still present in the pipeline.
This separates the handling of directional lights (sun) into their
own loops. This will help reduce register pressure and remove some
pollution of the local light culling.
All sun lights are packed at the start of the light array.
We now scan the depth buffer after the prepass to tag the needed
shadow tiles.
This is much more precise than the bounding-box tagging, which is now
reserved for transparent objects.
This also:
- fixes the pixel radius size.
- adds a dedicated info buffer to avoid having one unused tile.
Until now the LOD selection was based on distance from camera.
Now it is based on receiver distance ratio. We compute the world
size of one view pixel along with the world size of one shadow texel.
By knowing one point distance to the light or to the view, we can
compute the pixel density ratio and deduce the corresponding LOD.
We use this to compute the min LOD during the visibility selection phase
and the "mean" LOD for usage tagging by BBoxes.
The tagging LOD is a crude approximation as it only uses the BBox
center.
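In effect, the chosen LOD boils down to the base-2 log of the ratio between the world-space footprint of a view pixel and that of a LOD-0 shadow texel, both measured at the receiver; a hedged sketch of that idea (names and clamping are illustrative):

  #include <algorithm>
  #include <cmath>

  /* Both footprints are world-space sizes measured at the receiver point.
   * A ratio of 4 means one view pixel covers 4 shadow texels, so a shadow
   * mip 2 levels coarser is enough (each LOD doubles the texel footprint). */
  static float shadow_lod_from_footprints(const float view_pixel_world_size,
                                          const float shadow_texel_world_size,
                                          const float lod_max)
  {
    const float ratio = view_pixel_world_size / std::max(shadow_texel_world_size, 1e-8f);
    const float lod = std::log2(std::max(ratio, 1.0f));
    return std::clamp(lod, 0.0f, lod_max);
  }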
This makes every shadow setup pass aware of the LOD chain of the tilemap
for each cubemap face.
In the free phase, we mask any LOD page that is completely covered by
a higher LOD. This avoids committing memory twice or more per area.
In the allocation phase, we check for the last valid LOD and set it
in the LOD 0 metadata. We also store the actual page location in LOD 0 but
do not mark it as allocated, as the LOD tile has ownership of the page.
This removes the light count limit for forward shaded objects. This
also provides a more efficient way of computing the culling directly on
the GPU. Moreover, this avoids doing multiple lighting passes for high
light counts in the deferred pipeline, improving performance.
This continues the effort to implement virtual shadow mapping.
This includes:
- Spot cone culling of tiles.
- Tile vs. view frustum tagging.
- Shadowmap page allocation / freeing.
- Rendering to the 4K buffer only the tiles that need it.
- Copying to the shadow atlas.
This debug buffer is automatically bound if a shader is including
`common_debug_lib.glsl`. One buffer is created for each shading group
using such a shader.
The shader can then use the functions from that file to draw debug
lines. There is a hardcoded limit on the number of lines one buffer can
contain. Make sure to only output lines for a few threads at most.
Under the hood this uses a vertex buffer bound as SSBO that contains
the number of verts and all the positions and colors packed into 1 vec4.
We render by just rendering the whole buffer.
All unused vertices are initialized with NaN positions and will not be
drawn.
This is a total refactor of how shadows are handled.
We use Virtual shadow maps with different Level of details to
ensure a somewhat evenly distributed precision.
The shadow test is a really crude shadow test that will be
improved in further commit.
There is a pool of 4096 Tilemaps that are distributed between
shadowed lights. These tilemaps are 16x16 each and reference
shadow map pages that are allocated in an atlas. Pages are only
allocated if needed (i.e. visible for rendering an object).
Page management is done on the GPU using compute shaders to reduce
CPU work.
On the CPU, only one draw pass per updated tilemap is issued.
This reduces the memory requirement of shadowmapping large scenes
with many lights.
Denoising makes use of more memory to store and reproject the result of
the previous frame to reduce noise. This only works in the viewport.
There is a final bilateral filter for cleaning up noise even more.
Screen space Raytracing is supported by alpha blended surfaces.
However only opaque surfaces will be visible to the rays. This means
Alpha blended surfaces cannot reflect or refract themselves.
Denoising is not possible on alpha blended surfaces. Many samples
are needed for noise free results.
Since the cost of tracing can be very high, raytracing will only be
enabled on demand, on a per-material basis.
This simply reuses the reflection raytracing pipeline but with another
ray distribution. Only direct lighting, distant lighting and emissive
light are visible to diffuse rays.
Subsurface effect is not visible but transmittance effect is visible
to diffuse rays.
Indirect diffuse light is processed by the SSS filter.
The new pipeline is now cleaner and allows for deferred refraction.
The refractions are more accurate but are not denoised for now. More
research needs to be done in this area.
There is no feedback buffer for now, so reflections of metallic surfaces
will appear black.
The same restriction on refractive materials still holds true. They will
not appear in screen space tracing of other non refractive surfaces.
However, refractive surfaces (non-blended) can now reflect themselves
and the other surfaces with screen space reflections.
Half res tracing is not implemented back yet.
This is to automate the generation of reuse sample tables and maybe more
in the future. This is not designed to make compilation way longer than
expected.
Like SSS, this has been rewritten to support varying SSS radii.
Instead of relying on a shadowmap hack to improve the transmittance
artifact (previously called translucency), we exposed a min thickness
output that will reduce the maximum amount of light bleeding that can
happen at the shading point. This is far from perfect but at least it is
tweakable.
The effect is now cheaper and the option to enable it is now gone.
It can always be artificially disabled by making the thickness bigger
than the sss radius.
The effect is always enabled for all SSS surfaces and will even be
applied on forward shaded object (alpha blend mode).
This only adds the output but the output is not yet used.
This thickness output is meant to control the aspect of subsurface,
refraction, absorption and volume shaders.
The value expected is the mean thickness inside the object at the
shading point. The source can be a vertex color or a texture map baked
from a raytracer.
This new implementation follows the technique described in
"Efficient screen space subsurface scattering Siggraph 2018".
Compared to the old implementation it fixes a lot of issues at
the cost of it being slower. This fixes:
- Light leaking between different objects.
- Light leaking between different surfaces with different depths.
- SSS radii are now "texturable" per pixel. No SSS surfaces limits.
- Noise should be lower.
- Precomputation is only done once for all SSS surfaces which lowers the
per material storage and precomputation time.
Implementation is also simpler as it is only a one pass processing.
We differ from the reference presentation by not precomputing the
RGB weights per samples. We actually compute them on the fly in order
to support varying SSS radii.
Notes:
- SSS IOR and SSS anisotropy are not supported.
- Object-level light leak prevention might not work for a high number of
objects in the scene (> 1024). In this case light leaks might occur.
Adding or deleting (hiding) objects in the scene might change which
objects can leak.
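As an illustration of evaluating per-channel weights from a per-pixel radius, the Christensen-Burley approximate reflectance profile can be used (shown here only as an example; the exact profile used by the filter may differ):

  #include <algorithm>
  #include <cmath>

  /* Approximate normalized diffusion profile
   * R(r) = (e^(-r/d) + e^(-r/(3d))) / (8 * pi * d * r),
   * with `r` the distance from the shading point and `d` a per-channel scale
   * derived from the SSS radius. Evaluated per sample, per color channel. */
  static float burley_profile(const float r, const float d)
  {
    constexpr float pi = 3.14159265358979f;
    const float rr = std::max(r, 1e-6f); /* Avoid the singularity at r = 0. */
    return (std::exp(-rr / d) + std::exp(-rr / (3.0f * d))) / (8.0f * pi * d * rr);
  }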
This was caused by the StructArrayBuffer wrapper not being tagged as NonMovable.
The UBO was in fact being freed at creation time in debug builds, but the
pointer was kept as valid in the copied wrapper.
Change the higher-level structure to not use the copy constructor to avoid this.
This is a needed change for the viewport compositor. The compositor
needs to draw to `dtxl->color` to have correct overlay / background
composition.
The solution here is to have a separate buffer that keeps the first
sample we blend from. This increases VRAM usage but it is the most
elegant option.
This integer divide by zero was evaluated to 0 on all platforms but Apple's,
where it yields 1. The world lighting would then sample the 1 sample of the
first grid instead of its own sample.
This was caused by the blend mode being used even at full opacity.
This caused issues when the viewport was resized and the color of the
framebuffer became undefined, leading to undefined values in the blend
equation.
Another fix would be to clear the viewport color on resize inside the
GPUViewport.
This is a necessary step for EEVEE's new arch. This moves more data
to the draw manager. This makes it easier to have the render or draw
engines manage their own data.
This makes more sense and cleans up what the GPUViewport holds.
Also rewrites the texture pool manager to be in C++.
This also moves the DefaultFramebuffer/TextureList and the engine related
data to a new `DRWViewData` struct. This struct manages the per view
(as in stereo view) engine data.
There is a bit of cleanup in the way the draw manager is set up.
We now use a temporary DRWData instead of creating a dummy viewport.
Differential Revision: https://developer.blender.org/D11966
This changes the gbuffer layout to use more of the hardware for converting
data back and forth. Normals are encoded as two 16-bit components and
colors in the R11G11B10F format.
This was motivated by the need for better quality normals. The issue is
that this increases the GBuffer size significantly. In order to balance
this we chose to merge the refraction and Diffuse/SSS data to use the
same buffer. This means we need to stochastically choose one of these
layers (so noise appears). Given that Glass BSDFs are rarely mixed
with Diffuse BSDFs, we think this is a good tradeoff.
The functions need to be declared before main as prototypes.
The appended libs will use the resources (textures, UBOs) defined at
global scope.
This removes a bit of code duplication and some long macros.
Instead of appending using `BLENDER_REQUIRE`, shaders can now ask for
libs to be added after the shader's `main()` by using the
`BLENDER_REQUIRE_POST` pragma.
Use viewspace instead of world space to compute pixel projection.
This fixes issues when the camera is far from the origin and float precision
would produce artifacts.
This ports the facing "flat" normal trick used by the gpencil engine
to EEVEE, as well as the thickness mode.
The object parameters are passed via the objectInfos UBO to avoid
a lot of boilerplate code. However, if this UBO grows too much we might have
to split it.
The normal trick for planar surfaces is quite simple to port to the
vertex shader, even if it is less efficient.
However, to compute it we need the object bounds. These are passed as a
scale only, through the orco factors. This will need a bit of cleaning
at some point, with the boundbox computed at the object level.
Nothing much different compared to the previous implementation.
The transparent BSDF and principled BSDF now detect when the material
is potentially transparent to select the best way to render it.
This makes it possible to have AA and correct blending of the
forward-rendered spheres.
However, to avoid distorted spheres we need to not support Lookdev
in panoramic projection mode.
Also remove support for LookDev when using render border for now.
This differs a bit from old implementation.
- Instead of manually adjusting the viewport we correctly place the
sphere in the vertex shader.
- Rendering happens after TAA accumulation: This is because we now
support panoramic cameras and TAA would distort the spheres.
This exposes the capability of having no lights and no probes (except the
world one) for specific views / code paths.
The caller just needs to pass 0 as extent to the `set_view()` function.
This is useful for lookdev.
This does not include reference spheres rendering.
The approach is a bit different than before.
Now we use a `bNodeTree` to control the rendering of lookdev. This
generates a `GPUMaterial` that is stored per `Instance`. This way
rendering lookdev is just updating the temp light cache using this
material as world material. Removing the use of custom shader.
This introduces a small hack in order to bind the studiolight hdri after
the nodetree glsl parsing.
The background display however is still using a custom shader in order
to sample the world cubemap with different roughness.
The view space option of the studiolight is now faster by using a
transform before shading instead of rebaking the lightprobe constantly.
This should not have any particular impact on render time.
When evaluating surfaces, the deferred passes need to sample the
depth buffer, but they also test against the stencil buffer.
Moreover, the sampler needs to be a 2D sampler, which is not the case
for cubemaps and texture2DArrays.
To overcome this we simply copy the gbuffer depth to another
temp texture using framebuffer blitting.
Some things differ from the old implementation.
- Object visibility is filtered correctly without using a visibility
callback (which is to be removed).
The implementation is also more high level using less low level tricks.
A dedicated LightProbeView is created for each lightprobe cubeface to
render using all pipeline (deferred and forward).
There are still a few things not working.
Only the world probe is supported for now.
The new implementation diverges from the original by randomly
selecting one lightprobe instead of sampling them all.
This speeds up rendering a bit.
This is a small convenience. This lets the render engine use this
default world if the scene has no world.
The world is black to keep the same behavior as before.
Shading groups are now created by the material_array_get functions
instead of passing a reference to be filled later. This avoids having
to wait later to maybe create a sub shading group.
This also simplifies handling of the different geometry types.
This adds a new closure selection method.
- In a first pass, weights are accumulated per output type (diffuse,
reflection, refraction).
- A random threshold is then generated before evaluating the BSDF nodes
again.
- During the evaluation pass the random threshold is decremented until
it reaches 0. At this moment the current BSDF is sampled.
For this to work, I split the evaluation and the weighting into two
functions for all BSDFs. The `*_eval` nodes are generated as dangling
nodes from the graph and only serialized after the rest of the graph.
The recalc flag on the Material ID being unavailable to the render engine, this
adds a simple way to detect material updates by detecting shader creation
or update.
This constructs a "mirror" nodetree that feeds the closure "shader"
nodes with their respective final weight.
The tree is mirrored using simple math nodes. This is quite messy but
this is the only way to proceed without introducing special nodes.
The other issue with this method is that inputs are all uniforms, even
for unplugged sockets on temporary math nodes, which adds bloat to the
shader uniform buffer structure.
Only the part relevant to the weighting is duplicated. Other connections
with the shading tree are reused.
All shader nodes are updated to receive a `Weight` hidden parameter.
The original shader mixing tree is preserved to leave the choice of using
either way to weight the output.
For now this is only done for the output nodes. This will need to be
extended to Closure to RGBA sub-tree.
This is the first step towards the new evaluation scheme of EEVEE
closures.
This commit contains:
- Removal of the GPU_SOURCE_BUILTIN type, preferring globals instead. This
avoids a lot of boilerplate code since most of the old builtins are now
data that is always present (i.e. view matrices, normals).
- Rewriting of codegen in C++ to use `std::stringstream`.
- Added a callback to let the engine decide what to do with the codegen code.
This removes a lot of the need for defines caused by code-order
dependency. The engine can insert the nodetree code in custom ways
to create advanced effects (i.e. add displacement or vertex lighting).
The engine now returns final shader strings.
- Closure node evaluation replacement is a placeholder for now.
This is a port of the old material grouping. This is a bit cleaner
as we use containers for each pass and other structures.
The nodetree is generated without major errors for simple materials, but
it is not yet used as closures are not output.
This adds the transparency and volume handling in the deferred
render pipeline.
Implementation is still unfinished.
To have a better naming convention, I renamed object shader to surface.
This introduces a fat Gbuffer layout that groups closure data in groups
of similar BSDFs.
group to avoid too much code complexity and expected worse performance.
There is a lot of room for buffer reuse to reduce memory usage but it is
not considered a priority for now.
Add a smooth transition to avoid flickering of stochastic effects such
as soft shadows.
This uses a simple blend method to progressively reveal the render
after some low sample count to avoid most of the flickering.
Parameters are hardcoded for now.
We use a new RNG to avoid correlation artifacts between Anti-Aliasing
and Shadow samples (see T68594).
The new sequence is a leap Halton sequence. This makes it good with
a low number of samples and yields fewer correlation issues.
Another change is that we directly jitter the projection matrix instead
of rotating the view matrix. This improves convergence time and
avoids passing a second matrix to the shader.
However, this leads to discontinuity artifacts at face borders.
We might want to revert to the old rotation method for this
reason even if convergence is slower.
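For reference, a Halton value is the radical inverse of the sample index in a given base, and a "leap" variant simply strides through the indices; a minimal sketch (the stride value passed in is an arbitrary example, not the one used here):

  /* Radical inverse of `index` in base `base`: mirror the digits of the index
   * around the radix point, giving a low-discrepancy value in [0, 1). */
  static float halton(int index, const int base)
  {
    float fraction = 1.0f;
    float result = 0.0f;
    while (index > 0) {
      fraction /= float(base);
      result += fraction * float(index % base);
      index /= base;
    }
    return result;
  }

  /* "Leap" variant: skip through the sequence with a fixed stride so that
   * short runs of consecutive samples are already well distributed. */
  static float leap_halton(const int sample, const int base, const int leap)
  {
    return halton(sample * leap + 1, base);
  }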
Now the shadows are linked to a `Light` object. The `Light` object is
linked to an `ObjectKey` to ensure persistence and deletion tracking.
The Uniform data are packed so that there is 1 `ShadowPunctualData`
per light in a `LightBatch`. This means there is only a shadowmap
limit to the number of `Shadow` in a scene.
Difference with previous implementation:
- Better texture space usage of cone and area light shadow.
- Shadows are packed in an atlas. Reducing requirements for future
features.
- Sampling is simpler because shadow matrix does everything.
This follows closely the implementation of 2.5D tiled light
culling described in the presentation:
"Improved Culling for Tiled and Clustered Rendering"
from Michal Drobot
http://advances.realtimerendering.com/s2017/2017_Sig_Improved_Culling_final.pdf
I chose the tile + Z-binning approach for its high depth range support
and low CPU overhead & low memory consumption compared to the cluster
based culling. The downside is that the culling is a bit less precise in
some aspects, but it is quite balanced.
The culling is done by the `Culling` object, which is templated to easily
be reused for light probe culling.
The Z-binning process is described starting from slide 20 in the
reference pdf.
I also implemented a debug pass to visualize false negatives (lights
culled when they shouldn't be) and light evaluation density.
This is useful to detect failure cases and hotspots. This could be exposed
as a developer-only render pass in the future.
Some optimization of the reference implementation requires extensions
not yet added to GPU module and will be added later.
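The core of the Z-binning step is mapping a fragment's view-space depth to a bin index, with each bin storing the min/max indices of the lights whose depth range overlaps it; a simplified sketch of the bin lookup (the exponential mapping is one common choice, not necessarily the exact one used here):

  #include <algorithm>
  #include <cmath>

  /* Map a positive view-space distance `z` in [z_near, z_far] to one of
   * `bin_count` exponentially distributed Z bins. The per-pixel light loop
   * then only considers lights whose index range overlaps this bin and whose
   * screen-space tile mask covers the pixel. */
  static int z_bin_index(const float z, const float z_near, const float z_far, const int bin_count)
  {
    const float t = std::log2(z / z_near) / std::log2(z_far / z_near);
    return std::clamp(int(t * float(bin_count)), 0, bin_count - 1);
  }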
This has the basis of clustered light culling but does not yet do
it. The lights are only culled by frustum.
It's the same as if there was only one cell for the entire viewport.
This also wraps GPUFrameBuffer & GPUTexture inside eevee::Framebuffer
and eevee::Texture to improve management.
Another cleanup was to make all members of `Instance` public to
avoid much complexity in accessing the data across module
dependencies.
Also split the velocity view-related data into `class Velocity` and
renamed the previous `Velocity` to `VelocityModule`.
Support an infinite light count by dividing rendering into chunks of
LIGHT_MAX. Forward passes are just rendered again and deferred passes
(not implemented yet) will just have to have multiple light evaluation
passes.
This is almost the same thing as the old implementation.
Differences:
- We clamp the motion vectors to their maximum when sampling the velocity buffer.
- Velocity rendering (and the data manager) is separated from motion blur. This allows
outputting the motion vector render pass and, in the future, using motion vectors to
reproject older frames.
- Vector render pass support (only if motion blur is disabled, just like Cycles).
- Velocity tiles are computed in one pass (simpler code, less CPU overhead, less
VRAM usage, maybe a bit slower but imperceptible (< 0.3ms)).
- Two velocity passes are output, one for the motion blur fx (applied per shading view)
and one for the vector pass. This could be optimized further in the future.
- No current support for deformation & hair (to come).
Bonus addition, support for shutter curve.
Compared to the old implementation, the per time step sync function
is lighter and localized. Also it does not require a full engine
"reboot" in order to work.
Also modifies camera setup to be compatible with future camera motion
blur.
Bonus addition, support for shutter curve.
Compared to the old implementation, the per time step sync function
is lighter and localized. Also it does not require a full engine
"reboot" in order to work.
Pretty much identical to the previous implementation, with the exception
of a temporary noise function and some simplification of the CoC
computation. This also fixes issues with the Ortho depth of field.
Most of the files were modified to comply with the new shader codestyle.
This also adds partial support for panoramic cameras (bokeh and
anamorphic are still buggy).
This cleans up a lot of confusion / complexity in the setup code.
The setup is closer to what Cycles does now.
It also duplicates some buggy behavior of Cycles for now, until this
is fixed.
This moves view resolution handling to the `Camera` class, which will
in the future clip and trim each view in panoramic projection.
There is a new `CameraView` that contains the `DRWView` and subview.
This way each `ShadingView` is associated with a unique `CameraView`.
`ShadingView` & `CameraView` are all allocated & defined at creation time,
but only the one activated by `Camera` will be rendered.
This option will make accumulation happen in a pre-exposed logarithmic
color space. This reduces the importance of bright pixels in the pixel
filter, which will result in less aliasing in these areas.
There are a few cases where one might want to disable this option to
match Cycles better.
Render mode is really close to what the viewport render does.
Film output is done by resolving the data to the next (double buffered)
framebuffer and reading it back.
This also includes a bit of cleanup of the naming of the init() and sync()
functions.
This commit adds the Film class that handles accumulation of color and
non-color data using arbitrary projection and filter size.
A weighted accumulation (sum) is done into a data buffer with an
additional weight buffer. The sum being per pixel, it allows the input
textures that are not aligned with the output pixel grid.
Panoramic projection works by rendering a cubemap (6 views) of the scene
at the camera position. The Film filter pass then gather the pixels
using the correct Panoramic projection ensuring correct Anti-Aliasing.
For Non-color data (depth, normals) we only keep the closest value to
the target pixel center (simulating a filter size of 0).
Color data is accumulated in a log space to improve AntiAliasing output.
This is hardcoded for now.
Larger filters have poor performance but are very fast to converge.
Code-wise: this commit renames some modules to avoid possible confusion
and have better meaning, and uses namespaces instead of prefixes.
Added a new eevee_shared.hh file to share structure and enum definitions
between GLSL and C++.
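The accumulation itself is just a per-pixel weighted sum stored next to a per-pixel weight total, with the resolve dividing the two; a minimal sketch of that idea for a single pixel (names are illustrative):

  struct FilmPixelSketch {
    float color[4] = {0.0f, 0.0f, 0.0f, 0.0f}; /* Sum of weight * sample. */
    float weight = 0.0f;                       /* Sum of filter weights. */
  };

  /* Accumulate one input sample; the weight comes from the pixel filter and
   * depends on the distance between the sample and the output pixel center. */
  static void film_accumulate(FilmPixelSketch &px, const float sample[4], const float weight)
  {
    for (int i = 0; i < 4; i++) {
      px.color[i] += sample[i] * weight;
    }
    px.weight += weight;
  }

  /* Resolve to the final value: the weighted average of all samples. */
  static void film_resolve(const FilmPixelSketch &px, float r_out[4])
  {
    const float inv = (px.weight > 0.0f) ? 1.0f / px.weight : 0.0f;
    for (int i = 0; i < 4; i++) {
      r_out[i] = px.color[i] * inv;
    }
  }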
Same idea as the previous commit. This cleans up the interface and puts all
viewport-related data inside the `DRWData` struct.
The draw manager is responsible for freeing it. That is the main point
of this all. In the future, we can have custom freeing methods for each
engine.
This also moves the DefaultFramebuffer/TextureList and the engine related
data to a new `DRWViewData` struct. This struct manages the per view
(as in stereo view) engine data.
There is a bit of cleanup in the way the draw manager is set up.
We now use a temporary DRWData instead of creating a dummy viewport.
@@ -348,8 +348,8 @@ class CyclesRenderSettings(bpy.types.PropertyGroup):
     scrambling_distance: FloatProperty(
         name="Scrambling Distance",
         default=1.0,
-        min=0.0, max=1.0,
-        description="Reduce randomization between pixels to improve GPU rendering performance, at the cost of possible rendering artifacts if set too low. Only works when not using adaptive sampling",
+        min=0.0, soft_max=1.0,
+        description="Reduce randomization between pixels to improve GPU rendering performance, at the cost of possible rendering artifacts if set too low",
     )
     preview_scrambling_distance: BoolProperty(
         name="Scrambling Distance viewport",
@@ -360,7 +360,7 @@ class CyclesRenderSettings(bpy.types.PropertyGroup):
     auto_scrambling_distance: BoolProperty(
         name="Automatic Scrambling Distance",
         default=False,
-        description="Automatically reduce the randomization between pixels to improve GPU rendering performance, at the cost of possible rendering artifacts. Only works when not using adaptive sampling",
+        description="Automatically reduce the randomization between pixels to improve GPU rendering performance, at the cost of possible rendering artifacts",