Until now, on each mouse/key event preview render restarted with first tile.
It now rememers where it was, and continues rendering.
Also tried to get threaded preview working, but its more work than I can
spend right now. Back to bugs :)
- Improved stats drawing while rendering, it now draws - while preparing
renderdata - each second the amount of verts/faces.
Also while rendering, the amount of finished and total parts are printed.
- Added ESC in loop that generated Group render data
- On deleting Render Layers, the nodes that use them are now checked and
corrected.
- Restored drawing all scanlines in renderwindow... this wasn't the bug!
sampling have been activated for UI. Check the pictures here:
http://www.blender.org/bf/filters/index2.html
I also did do tests with anti-aliased shadowbuffers:
http://www.blender.org/bf/filters/index3.html
But this needs more thinking over still...
- Compositor now frees memory of buffers internally used in groups
immediately. This wasn't part of the event-based cache anyway
- New option: "Free Texture Images" (in render Output panel). This
frees after each render of each scene all images and mipmaps as
used by textures. As reference it prints total amount of MB freed.
- Render stage 'creating speed vectors' had no ESC checking yet
- Made drawing scanline updates during render draw 1 scanline less...
dunno, still hunting for weird opengl crashes.
- 3D preview render didn't properly skip sequence or composit render.
I noticed still several cases where the Imbuf library was called within a
thread... and that whilst the Imbuf itself isn't threadsafe. Also the
thread lock I added in rendering for loading images actually didn't
work, because then it was still possible both threads were accessing the
MEM_malloc function at same time.
This commit nearly fully replaces ImBuf calls in compositor (giving another
nice speedup btw, the way preview images in Nodes were calculated used
clumsy imbuf scaling code).
I've also centralized the 'mutex' locking for threading, which now only
resides in BLI_threads.h. This is used to secure the last ImBuf calls
I cannot replace, which is loading images and creating mipmaps.
Really hope we get something more stable now!
1) Accumulation buffer alpha handling
Accumulating colors in an accumulation is simple; a weighting factor can
make sure colors don't over- or undersaturate.
For alpha this is a bit more complex... especially because the masks for
vectorblur are anti-aliased themselves with alpha values. Up to now I just
premultiplied the mask-alpha with the actual color alpha, which worked OK
for solid masks, but not for transparent ones. I thought that would be an
acceptable situation, since 'ztra' faces only get blurred with alpha==1.
However, it gives bad results when using 'mist' in Blender, which just
gives pixels an alpha value based on camera distance. In these cases the
alpha became oversaturated, accumulating into too high values.
The solution is to store the mask-alpha separately, only premultiply this
alpha with the weighting factor to define the accumulation amount.
This is the math:
blendfactor: the accumulation factor for a vectorblur pass
passRGBA: color and alpha value of the current to be accumulated pass
accRGBA: color and alpha value of accumulation buffer (initialized
with original picture values)
maskA: the mask's alpha itself
accRGBA = (1 - maskA*blendfactor)*accRGBA + (maskA*blendfactor)*passRGBA
This formula accumulates alpha values equally to colors, only using the
mask-alpha as 'alpha-over' operation.
It all sounds very logical, I just write this extensive log because I
couldn't find any technical doc about this case. :)
2) Creating efficient masks with camera-shake
Vector blur can only work well when there's a clear distinction between
what moves, and what doesn't move. This you can solve for example by
rendering complex scenes in multiple layers. This isn't always easy, or
just a lot of work. Especially when the camera itself moves, the mask
created by the vectorblur code becomes the entire image.
A very simple solution is to introduce a small threshold for
moving pixels, which can efficiently separate the hardly-moving pixels
from the moving ones, and thus create nice looking masks.
You can find this new option in the VectorBlur node, as 'min speed'.
This mimimum speed is in pixel units. A value of just 3 will already
clearly separate the background from foreground.
Note; to make this work OK, all vectors in an image are scaled 3 pixels
smaller, to ensure everything keeps looking coherent.
Test renders; 'Elephants Dream' scene with lotsof moving parts; rendered
without OSA, image textures, shadow or color correction.
No vectorblur:
http://www.blender.org/bf/vblur.jpg
With vectorblur, showing the alpha-saturation for mist:
http://www.blender.org/bf/vblur1.jpg
New accumulation formula:
http://www.blender.org/bf/vblur2.jpg
Same image, but now with a 3 pixel minimum speed threshold:
http://www.blender.org/bf/vblur3.jpg
Next frame, without minimum speed
http://www.blender.org/bf/vblur4.jpg
Same frame with speed threshold:
http://www.blender.org/bf/vblur5.jpg
(Only 20 steps of vectorblur were applied for clarity).
The code that generated mipmaps took a real long time to do it... on a
5k x 5k image it took here (no optim, debug compile) 32.5 sec.
Recoded the very old filtering routine, which already brought it down to
2.8 seconds. Then tested if we even need this filtering... in many cases
the images are painted or photographs, which is filtered OK already.
Without the filter, the mipmap timing went down to 0.39 second. :)
http://www.blender.org/bf/filters/index1.html
Here's an example of two 'mips' generated with or without gauss filter.
Note that aliasing in an image remains there... which can be a wanted
effect anyway.
So; added the gauss filter as option in making mipmaps. Also had to
reshuffle the buttons there in a more logical manner.
There's also disabled code in the do_versions to set 'gauss' on in older
files. Will be enabled during release time.
sockets were not used yet... now they're verified on read, and written
in socket stack data on adding new nodes.
Also the buttons in Nodes use these values now. Special request from
Nathan Vegdahl who seems to be messing around with my precious nodes! :)
get sampled on larger distance. It actually just flattens bump when the
sampled area is (much) larger than pixel size, to prevent weird things
like:
current render:
http://www.blender.org/bf/b1.jpg
distance corrected:
http://www.blender.org/bf/b2.jpg
(image based on Alexander file :)
Tested on env's dinos too... seems to work, but we'll see.
- LampHalos can be rendered separately too. Just disable 'Solid' in a
layer and keep 'Halo' option enabled.
- Note that disabling 'Solid' will still fill in Z values for the solid
faces, to provide occlusion information for the Ztransp and Halo layer
options. The latter didn't work this way until now for OSA render.
ALso note that that Ztransp+LampHalo still isn't good marriage... it
renders a bit weird, but that's an old issue. :)
- it now correctly pre-multiplies with alpha the RGB values for the
antialised mask (alpha artefacts were visible)
- The transparent layer will add speed vectors on top of the solid layer,
cancelling out cases where the solid layer was not moving (like in its
own antialising.
This works fine, for as long you don't render in a single pass trans-
parent faces that move on top of not-moving solid faces.
You now can set a Preview panel in the Image window, to define a sub-rect
of an image to be processed. Works like the preview in 3D Window. Just
press SHIFT+P to get it activated. Very nice speedup!
This is how it works:
- The compositor still uses the scene image size (including % setting) for
Viewer or Composite output size
- If a preview exists, it calculates the cropped rect from its position
in the Image window, and stores that in the Scene render data
- On composite execute, it copies only this part from the 'generator nodes',
right now Images or Render Results. That makes the entire composite tree
only using small rects, so it will execute fast.
- Also the render window will only display the cropped rect, and on F12
only the cropped part is being executed
- On rendering in background mode, the cropping is ignored though.
Usability notes:
- translating or zooming view will automatically invoke a recalculation
- if you zoom in on details, the calculated rect will even become smaller
- only one Imagewindow can have this Preview Panel, to prevent conflicts of
what the cropped area should be. Compositing is on Scene level, not local
per image window. (Note; 3D Previews are local per window!)
- Closing the preview panel will invoke a full-size recalculation
- All passes/layers from rendering are nicely cropped, including Z and
vectors.
The work to make the compositor do cropping was simple, but getting the
Image window displaying correctly and get all events OK was a lot of work...
indeed, we need to refactor Image Window usage once. Sorry for making the
mess even bigger now. :) I've tried not to interfere with UV edit or Paint
though... only when you're in compositing mode the panel will work.
BUG fix:
3D Preview render didn't work when multiple layers were set in the current
scene.
as triangles, with a tag bit to denote which triangle was which part of
the quad. That was hardcoded bit 0x800000, which allows a maximum of
about 8 million quads...
I've made this a nice #define, set to be 16 times larger. So, now the
facejunkies can go up to 128 Million faces, were it not that this will eat
up a load of memory!
I only have 1 Gig in this machine. A test with 9M vertices and 7.5M quads
eats up 912 MB of memory already. If this becomes a real issue, I know
tricks how to make the vertices 20 bytes smaller, and faces 4 bytes, which
would in the above case save about 200 MB. Not much... but probably worth
the try? A much better method is of course 'bucketing' the renderdata per
tile. It's a spec of the render recode, but not a quicky to add.
Also: bug fix in curve code. There was a short counter still, crashing on
large curves with resol set to 1024 :)
- Composit cache now gets fully freed on a render. Each output socket of a
node stores the entire image... and while render that's a waste of memory
- Sky 'paper' render was using wrong texture coordinates
- Found missing test_break() in ztransp rendering.
- ZTransp render now also delivers Z values and Speed vectors in passes
Note that speed vectors accumulate within a pixel to store the minimum,
so rendering ztransp on top of a non-moving plane won't give speed...
Best results you get is by rendering it in a separate layer.
The Z value stored is the closest visible transparent face in the pixel.
Fixes:
- Render to 'spare page' has been enabled again. Because of the strict
separation of Render and UI, but especially because a 'render result' now
can consist of unlimited images, I've not made this a Render feature.
Instead, the render-window itself stores the 'spare' image... I also
had to change the convention for it a bit.
Now, instead of having two "render buffers" (which was a render feature),
the RenderWindow will store each previous frame on a re-render. This
storing will only start after you've pressed 'Jkey' once, but then always
will happen for as long the rendered image is same size as previously.
For clarity, I've also renamed the window title, to 'previous frame'.
- RenderWindow shows alpha again on Akey
- Display of the Zvalues in ImageWindow has been tweaked. White now denotes
closest, and the color range goes from camera clip-sta to clip-end.
- Bugfix: on splitting/merging/duplicating windows, the 3D Previewrender was
not always freed correctly, potentially causing crashes or memory leaks.
vertex locations, not global coordinates. This ensures consistant
autosmoothing for each frame. Also fixes missing vectorblur for parts.
Nice task for a dev: put autosmooth code in end of modifier stack... then
it also shows in 3D window
- BUG FIX! I noticed the last tile rendered quite slow, and even did not
update scanlines. Found out that the main tiles processor didn't go
to sleep when the last tile was rendered, because it detected a free
possible thread. This caused the main thread to go into a very tight
loop, eating up a lot of cpu and blocking the other thread.
vertices differed on previous/next frame, causing speedvector calculus
to be skipped.
Now that worked OK, where it not that non-existing speed vectors were not
initialized zero while rendering...
Also another issue showed up with autosmooth. When using exact smooth
angles (like 30 degrees) on a model that has been spinned with exactly
30 degree steps, the autosmooth gave different results on each frame...
and only when compiled in O2 (probably thats doing bad float rounding).
Solved this by just adding 0.1 to the user defined smooth angle.
vectors. It's actually shutter speed, but in this case works identical to
the old motionblur 'blur fac' button.
Note; the "Max Speed" button only clips speed, use this to prevent
extreme speed values. Max speed applied before the scaling happens.
After a couple of experiments with variable blur filters, I tried
a more interesting, and who knows... original approach. :)
First watch results here:
http://www.blender.org/bf/rt0001_0030.avihttp://www.blender.org/bf/hand0001_0060.avi
These are the steps in producing such results:
- In preprocess, the speed vectors to previous and next frame are
calculated. Speed vectors are screen-aligned and in pixel size.
- while rendering, these vectors get calculated per sample, and
accumulated in the vector buffer checking for "minimum speed".
(on start the vector buffer is initialized on max speed).
- After render:
- The entire image, all pixels, then is converted to quad polygons.
- Also the z value of the pixels is assigned to the polygons
- The vertices for the quads use averaged speed vectors (of the 4
corner faces), using a 'minimum but non-zero' speed rule.
This minimal speed trick works very well to prevent 'tearing' apart
when multiple faces move in different directions in a pixel, or to
be able to separate moving pixels clearly from non-moving ones
- So, now we have a sort of 'mask' of quad polygons. The previous steps
guaranteed that this mask doesn't have antialias color info, and has
speed vectors that ensure individual parts to move nicely without
tearing effects. The Z allows multiple layers of moving masks.
- Then, in temporal buffer, faces get tagged if they move or not
- These tags then go to an anti-alias routine, which assigns alpha
values to edge faces, based on the method we used in past to antialias
bitmaps (still in our code, check the antialias.c in imbuf!)
- finally, the tag buffer is used to tag which z values of the original
image have to be included (to allow blur go behind stuff).
- OK, now we're ready for accumulating! In a loop, all faces then get
drawn (with zbuffer) with increasing influence of their speed vectors.
The resulting image then is accumulated on top of the original with a
decreasing weighting value.
It sounds all quite complex... but the speed is still encouraging. Above
images have 64 mblur steps, which takes about 1-3 seconds per frame.
Usage notes:
- Make sure the render-layer has passes 'Vector' and 'Z' on.
- add in Compositor the VectorBlur node, and connect the image, Z and
speed to the inputs.
- The node allows to set amount of steps (10 steps = 10 forward, 10 back).
and to set a maximum speed in pixels... to prevent extreme moving things
to blur too wide.
- Scene support in RenderLayers
You now can indicate in Compositor to use RenderLayer(s) from other scenes.
Use the new dropdown menu in the "Render Result" node. It will change the
title of the node to indicate that.
The other Scenes are rendered fully separate, creating own databases (and
octrees) after the current scene was finished. They use their own render
settings, with as exception the render output size (and optional border).
This makes the option an interesting memory saver and speedup.
Also note that the render-results of other scenes are kept in memory while
you work. So, after a render, you can tweak all composit effects.
- Render Stats
Added an 'info string' to stats, printed in renderwindow header. It gives
info now on steps "creating database", "shadow buffers", and "octree".
- Bug fixes
Added redraw event for Image window, when using compositor render.
Text objects were not rendered using background render (probably a bug
since depsgraph was added)
Dropdown buttons in Node editor were not refreshed after usage
Sometimes render window did not open, this due to wrong check for 'esc'.
Removed option that renders view-layers on F12, with mouse in 3d window.
Not only was it confusing, it's now more efficient with the Preview Panel,
which does this nicely.
optionally save a jpg next to it, with compression as set in buttons.
This allows quick previews or download from farms.
Button: next to the 'half' and 'zbuf' options for exr.
http://www.blender.org/bf/filters/
I found out current blur actually doesn't do gauss, but more did regular
quadratic. Now you can choose common filter types, but more specifically;
- set gamma on, to emphasize bright parts in blur more than darker parts
- use the bokeh option for (current circlular only) blur based on true
area filters (meaning, for each pixel it samples the entire surrounding).
This enables more effects, but is also much slower. Have to check on
optimization for this still... use with care!
- RenderLayers with 'view layers' set, now also take visible lights into
account. Works just like for scene layer settings.
- On ESC from render, compositing (if set) is being skipped too
- While rendering with multiple RenderLayers it will end with a display
of the current RenderLayer (as in Scene buttons)
- Enabled Groups to execute in Compositor. They were ignored still.
Note; inside of groups nothing is cached, so a change of a group input
will recalculate it fully. This is needed because groups are linked
data (instances use same internal nodes).
- Made Composit node "Viewer" display correctly input for images with
1/2/3/4 channels.
- Added pass rendering, tested now with only regular Materials. For
Material nodes this is quite more complex... since they cannot be
easily separated in passes (each Material does a full shade)
In this commit all pass render is disabled though, will continue work on
that later.
Sneak preview: http://www.blender.org/bf/rt.jpg (temporal image)
- What did remain is the 'Normal' pass output. Normal works very nice for
relighting effects. Use the "Normal Node" to define where more or less
light should be. (Use "Value Map" node to tweak influence of the
Normal node 'dot' output.)
- EVIL bug fix: I've spend almost a day finding it... when combining AO and
mirror render, the event queue was totally screwing up... two things not
related at all!
Found out error was in ray-mirror code, which was using partially
uninitialized 'ShadeInput' data to pass on to render code.
- Another fix; made sure that while thread render, the threads don't get
events, only the main program will do. Might fix issues reported by
people on linux/windows.
- Live scanline updates while rendering
Using a timer system, each second now the tiles that are being processed
are checked if they could use display.
To make this work pretty, I had to use the threaded 'tile processor' for
a single thread too, but that's now proven to be stable.
Also note that these updates draw per layer, including ztransp progress
separately from solid render.
- Recode of ztransp OSA
Until now (since blender 1.0) the ztransp part was fully rendered and
added on top of the solid part with alpha-over. This adding was done before
the solid part applied sub-pixel sample filtering, causing the ztransp
layer to be always too blurry.
Now the ztransp layer uses same sub=pixel filter, resulting in the same
AA level (and filter results) as the solid part. Quite noticable with hair
renders.
- Vector buffer support & preliminary vector-blur Node
Using the "Render Layer" panel "Vector" pass button, the motion vectors
per pixel are calculated and stored. Accessible via the Compositor.
The vector-blur node is horrible btw! It just uses the length of the
vector to apply a filter like with current (z)blur. I'm committing it anyway,
I'll experiment with it further, and who knows some surprise code shows up!
Yesterdays commit slightly extended clipping area for window, to ensure
no empty borders get rendered. Unfortunately it reveiled a case in code
that was never handled; clipping code was throwing away good faces. Old
bug... but apparently never showed up?
for compositing code.
Officially malloc/calloc/free is threadsafe, but our secure malloc system
requires all memory blocks to be stored in a single list, so when two
threads write in this list you get conflicts.
- the sub-pixel masks for applying correct filters (gauss and friends)
accidentally were y-flipped, causing bad looking results.
- zbuffer was clipping extremely narrow, causing border pixels to miss
samples, and reveiling alpha that way (was in old render a prob too)
- Compositor now is threaded
Enable it with the Scene buttons "Threads". This will handle over nodes to
individual threads to be calculated. However, if nodes depend on others
they have to wait. The current system only threads per entire node, not for
calculating results in parts.
I've reshuffled the node execution code to evaluate 'changed' events, and
prepare the entire tree to become simply parsed for open jobs with a call
to node = getExecutableNode()
By default, even without 'thread' option active, all node execution is
done within a separate thread.
Also fixed issues in yesterdays commit for 'event based' calculations, it
didn't do animated images, or execute (on rendering) the correct nodes
when you don't have Render-Result nodes included.
- Added generic Thread support in blenlib/ module
The renderer and the node system now both use same code for controlling the
threads. This has been moved to a new C file in blenlib/intern/threads.c.
Check this c file for an extensive doc and example how to use it.
The current implementation for Compositing allows unlimited amount of
threads. For rendering it is still tied to two threads, although it is
pretty easy to extend to 4 already. People with giant amounts of cpus can
poke me once for tests. :)
- Bugfix in creating group nodes
Group node definitions demand a clear separation of 'internal sockets' and
'external sockets'. The first are sockets being linked internally, the latter
are sockets exposed as sockets for the group itself.
When sockets were linked both internal and external, Blender crashed. It is
solved now by removing the external link(s).
system tracking changes in nodes, making sure only these nodes and
the ones that depend, are executed.
Further the 'time cursor' now counts down to indicate which node is being
done.
Also: you now can disable the "use nodes" button in the header, edit all
changes, and when you press that button again it nicely executes the
changes.
Still on the todo:
- make compositing threaded
- find a way to nicely exit compositing on input events... so the UI
keeps being responsive
- idea; a 'percentage' menu in header to enforce calculations on smaller
images temporally
threads didn't seem to free allocated memory... while rendering an long
sequence, the 'virtual memory' size grew with about 20 meg per frame.
This appeared to be not related to using malloc in threads (works
properly), but just because threads were not closed properly.
I assumed that the call to SDL_CreateThread() also closes the thread
when finished... but that seems to be not the case. By using a call to
SDL_WaitThread() after the thread was finished the memory heap is stable
again.
This is something I've seen not documented anywhere... the SDL man
pages are horrible sparse; Take for example the official page:
http://manuals.thexdershome.com/SDL-1.2.5/html/sdlcreatethread.html