Pretty much straightforward change which gives around 30%
speedup on my laptop and around 2x speedup on desktop in
the BI (which uses gts580). Tested with huge blurs (like
10% of blur) which was rather common during Caminandes.
For now OpenCL is only limited for blur size more than
100 pixels.
This is a bit experimental still, feedback is welcome.
Reviewers: jbakker, lukastoenne
Subscribers: ton
Differential Revision: https://developer.blender.org/D576
The UV values includes the image width/height now. To restore the
previous method as close as possible (even though it is not documented
anywhere how this is supposed to work), we have to ignore this scaling.
This ensures that the beams color does not darken along borders,
by using the last valid color of the ray as the border color (extending
colors in the direction of the source point).
The sunbeams node was clamping the range of influence to start at 1
pixel distance from the source. This was a poor fix for artifacts caused
by an off set in buffer coordinates. Since the u coordinate starts at
ceil(umax) the v coordinate also has to use ceil. This also fixes some
discontinuities that became visible when the source point is close to
a sharp line in the input image.
For now it was mainly about OpenCL wrangler being duplicated
between Cycles and Compositor, but with OpenSubdiv work those
wranglers were gonna to be duplicated just once again.
This commit makes it so Cycles and Compositor uses wranglers
from this repositories:
- https://github.com/CudaWrangler/cuew
- https://github.com/OpenCLWrangler/clew
This repositories are based on the wranglers we used before
and they'll be likely continued maintaining by us plus some
more players in the market.
Pretty much straightforward change with some tricks in the
CMake/SCons to make this libs being passed to the linker
after all other libraries in order to make OpenSubdiv linked
against those wranglers in the future.
For those who're worrying about Cycles being less standalone,
it's not truth, it's rather more flexible now and in the future
different wranglers might be used in Cycles. For now it'll
just mean those libs would need to be put into Cycles repository
together with some other libs from Blender such as mikkspace.
This is mainly platform maintenance commit, should not be any
changes to the user space.
Reviewers: juicyfruit, dingto, campbellbarton
Reviewed By: juicyfruit, dingto, campbellbarton
Differential Revision: https://developer.blender.org/D707
This allows adding a "fake" sun beam effect, simulating crepuscular rays
from light being scattered in a medium like the atmosphere or deep water.
Such effects can be created also by renderers using volumetric lighting,
but the compositor feature is a lot cheaper and is independent from 3D
rendering. This makes it ideally suited for motion graphics.
The implementation uses am optimized accumulation method for gathering
color values along a line segment. The inner buffer loop uses fixed
offset increments to avoid unnecessary multiplications and avoids
variables by using compile-time specialization (see inline comments
for further details).
Proxy operations from muted nodes would still create conversion
operations where the datatypes don't match, which creates unexpected
behavior. Arguably datatype conversion could still happen even when the
main operation is muted, but this would be a design change and so is
disabled now.
This gives around 30% of speedup for gaussian blur node.
Pretty much straightforward implementation inside the node
itself, but needed to implement some additional things:
- Aligned malloc. It's needed to load data onto SSE registers
faster. based on the aligned_malloc() from Libmv with
some additional trickery going on to support arbitrary
alignment (this magic is needed because of MemHead).
In the practice only 16bit alignment is supported because
of the lack of aligned malloc with arbitrary alignment
for OSX. Not a bit deal for now because we need 16 bytes
alignment at this moment only. Could be tweaked further
later.
- Memory buffers in compositor are now aligned to 16 bytes.
Should be harmless for non-SSE cases too. just mentioning.
Reviewers: campbellbarton, lukastoenne, jbakker
Reviewed By: campbellbarton
CC: lockal
Differential Revision: https://developer.blender.org/D564
The node was using sampler from the callee node and passed
it to the input nodes. Since the fact that compositor output
node uses NEAREST interpolation (why it uses nearest is the
whole separate story) it's not possible to have subpixel
precision in such cases:
<image> -> <translate> -> <output>
For now solving by hard-coding translate node to use BILINEAR
interpolation. It can't become worse in this node anyway and
the sampling pipeline is to be re-visited from scratch.
This commit pretty much reverts all the changes related on tile-ability
of the fast gaussian blur. It's not tilable by definition and would almost
always give you seams on the tile boundaries.
Atmind already met the issue and tried to solve it by increasing some
magic constant, which is pretty much likely simply made it so compositor
switched to full-frame calculation in that particular .blend file.
Fast gaussian is really not a production thing and need to be avoided.
We're to improve speed of normal gaussian blur instead.
The formula was not consistent across Blender and behaved strangely, now it is
a simple linear blend between color1 and min(color1, color2).
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D489
This issue is because of a somewhat "special" behavior in old code, which got lost during rB09874df:
There was a variant of the `relinkConnections` function which would leave the socket completely unconnected. This is not a valid state really (given that each unconnected input must otherwise connected to a constant `Set` type node), but was used as a way to distinguish connected alpha/depth sockets in composite and viewer output nodes.
https://developer.blender.org/diffusion/B/browse/master/source/blender/compositor/intern/COM_InputSocket.cpp;28a829893c702918afc5ac1945a06eaefa611594$69
After the large cleanup patch ({D309}) every socket is now automatically connected to a constant, such that `getInputSocketReader` will never return a NULL pointer. This breaks the previous test method, which needs to be replaced by more explicit flags. Luckily this was done only for very few output nodes (Composite, Viewer, Output-File). These now use the regular SetValueOperation default in case "use alpha" is disabled, but set this to an explicit 1.0 value instead of mapping to the node socket.
Many parts of the compositor are unnecessarily complicated. This patch
aims at reducing the complexity of writing nodes and making the code
more transparent.
== Separating Nodes and Operations ==
Currently these are both mixed in the same graph, even though they have
very different purposes and are used at distinct stages in the
compositing process. The patch introduces dedicated graph classes for
nodes and for operations.
This removes the need for a lot of special case checks (isOperation etc.)
and explicit type casts. It simplifies the code since it becomes clear
at every stage what type of node we are dealing with. The compiler can
use static typing to avoid common bugs from mixing up these types and
fewer runtime sanity checks are needed.
== Simplified Node Conversion ==
Converting nodes to operations was previously based on "relinking", i.e.
nodes would start with by mirroring links in the Blender DNA node trees,
then add operations and redirect these links to them. This was very hard
to follow in many cases and required a lot of attention to avoid invalid
states.
Now there is a helper class called the NodeConverter, which is passed to
nodes and implements a much simpler API for this process. Nodes can add
operations and explicit connections as before, but defining "external"
links to the inputs/outputs of the original node now uses mapping
instead of directly modifying link data. Input data (node graph) and
result (operations graph) are cleanly separated.
== Removed Redundant Data Structures ==
A few redundant data structures have been removed, notably the
SocketConnection. These are only needed temporarily during graph
construction. For executing the compositor operations it is perfectly
sufficient to store only the direct input link pointers. A common
pointer indirection is avoided this way (which might also give a little
performance improvement).
== Avoid virtual recursive functions ==
Recursive virtual functions are evil. They are very hard to follow
during debugging. At least in the parts this patch is concerned with
these functions have been replaced by a non-virtual recursive core
function (which might then call virtual non-recursive functions if
needed). See for example NodeOperationBuilder::group_operations.
which tiles are selected for input there was always a constant for correcting
the accuracy.
It seems that the constant was not enough and has been adjusted. (2 => 3).
This is a regression caused by rB67134a7bf689279785e2e40b29cd24243813998b
The UV coordinates read from the UV input must be scaled by the Image
input size instead of the UV input size.
Also now this node uses the UV input resolution instead of the Image
resolution, since this is what determines the available resolution. The
image is EWA-sampled anyway, it's resolution does not have a direct
impact.
This was suggested by Christopher Barrett (terrachild). Corner pin is a common feature in compositing.
The corners for the plane warping can be defined by using vector node inputs to allow using perspective plane transformations without having to go via the MovieClip editor tracking data.
Uses the same math as the PlaneTrack node, but without the link to MovieClip and Object.
{F78199}
The code for PlaneTrack operations has been restructured a bit to share it with the CornerPin node.
* PlaneDistortCommonOperation.h/.cpp: Shared generic code for warping images based on 4 plane corners and a perspective matrix generated from these. Contains operation base classes for both the WarpImage and Mask operations.
* PlaneTrackOperation.h/.cpp: Current plane track node operations, based on the common code above. These add pointers to MovieClip and Object which define the track data from wich to read the corners.
* PlaneCornerPinOperation.h/.cpp: New corner pin variant, using explicit input sockets for the plane corners.
One downside of the current compositor design is that there is no concept of invariables (constants) that don't vary over the image space. This has already been an issue for Blur nodes (size input is usually constant except when "variable size" is enabled) and a few others. For the corner pin node it is necessary that the corner input sockets are also invariant. They have to be evaluated for each tile now, otherwise the data is not available. This in turn makes it necessary to make the operation "complex" and request full input buffers, which adds unnecessary overhead.
This can happen if no image buffers are used to define a sensible
resolution. Then the viewer will stiff create a float buffer in the
output imbuf, which defies the usual ibuf->rect_float check and leads
to invalid memory access. Float buffer should not be created in this
case.
nodes.
The Rotate node was calculating the center with a 1 pixel offset, which
effectively shifts the image by 1 pixel on one or both axis for
right-angle (90 degree) rotations.
Note that the wrapping feature for translate nodes can still produce
undesirable results for non-quadratic images. This is because of how
the resolution calculation works atm: the Rotate node will keep the
resolution of the input image, even if the resulting image is then
cropped or leaves empty margins. There is no easy way to fix that
without redesign.
The underlying cause for these issues is the insufficient sampling of
the bokeh image. For smaller blur radius there will be very few samples
taken, and with 1-pixel radius it boils down to just 4 samples:
2 on the left border (black), 1 in the center (black) and 1 at the top
border (blue) ...
For now have added the workarounds implemented in the CPU version of
that node, which hide these artifacts. Ultimately would be better to
have mipmap levels for the bokeh image input instead.
The blur operations were clamping the filter size to 1, which prevents
no-op blur nodes. Further any value < 1 would also be ignored and in
many combinations the filter scale setting ("Size") would only work in
integer steps.
Now most blur settings will work with smooth Size value scaling as well,
meaning you can choose a reasonably large filter size (e.g. 10) and then
use the Size factor to scale the actual blur radius smoothly.
Note that non-integer filter sizes also depend on the filter type
selected in the Blur node, e.g. "Flat" filtering will still ignore
smooth filter sizes. Gaussian filters work best for this purpose.