For some reason I thought SDL thread handling would be much simpler... but
the migration to POSIX pthreads went very smoothly and painlessly. Less code
even, and I even notice a slight performance increase!
All threading code is still wrapped in blenlib/intern/threads.c
The only real change was making the callback functions return a void pointer
instead of an int.
The mutex handling is also different... there's no test anymore if a
mutex was initialized, which is a bit confusing. But it appears to run
all fine still. :)
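
A minimal sketch of the callback change described above, assuming a simple job struct. All names here (SketchJob, sketch_job_run, sketch_run_jobs) are hypothetical and are not the code in blenlib/intern/threads.c; only pthread_create() and pthread_join() are real calls.

```c
#include <pthread.h>
#include <stddef.h>

#define SKETCH_MAX_THREADS 8

/* Hypothetical job data, not a Blender struct. */
typedef struct SketchJob {
    int input;
    int result;
} SketchJob;

/* pthread start routines take and return a void pointer,
 * where the old callbacks returned an int. */
static void *sketch_job_run(void *arg)
{
    SketchJob *job = arg;
    job->result = job->input * 2;
    return NULL;
}

/* Start one thread per job and wait for all of them.
 * Returns 0 on success, -1 on a bad count, or the pthread error code. */
static int sketch_run_jobs(SketchJob *jobs, int count)
{
    pthread_t threads[SKETCH_MAX_THREADS];
    int started, i, err = 0;

    if (count < 0 || count > SKETCH_MAX_THREADS)
        return -1;

    for (started = 0; started < count; started++) {
        err = pthread_create(&threads[started], NULL, sketch_job_run, &jobs[started]);
        if (err != 0)
            break;
    }
    /* Join whatever was started, also after an error. */
    for (i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return err;
}
```

Results are passed back through the job struct here, so the void pointer return value stays unused (NULL).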
Nathan Letwory has been signalled already to provide the Windows pthread
library and make/scons linking. For MSVC we might need help from someone
else later though.
the ones that get changed within threads, to communicate with the main
thread.
(Part of the long quest to get threaded render safe, especially in Linux)
passes in single file. Code is currently disabled, commit is mainly to
have a nicer method of excluding OpenEXR dependency from render module.
This should compile with disabled WITH_OPENEXR too.
Reason why EXR is great to include by default in Blender is its feature
to store unlimited layers and channels, and write this tile based. I
need the feature for saving memory; while rendering tiles, all full-size
buffers for all layers and passes are kept in memory now, which can go
into 100s of MB easily.
The code I commit now doesn't allocate these buffers while rendering, but
saves the tiles to disk. In the end it is read back. Overhead for large
renders (like 300 meg buffers) is 10-15 seconds, not bad.
Two more interesting aspects:
- Blender can save such multi-layer files in the temp directory, storing
it with .blend file name and scene name. That way, on each restart of Blender,
or on switching scenes, these buffers can be read. So you always see what was
rendered last. Also great for compositing work.
- This can also become an output image type for rendering. There's plenty of
cases where you want specific layers or passes saved to disk for later use.
Anyhoo, finishing it means more days of work, and I got more urgent stuff
now!
out moving transparent pixels by checking for alpha>0.95, now it also
checks the solid layer (if present), and if there's no solid face in a
pixel, the speed vector also gets added and used for transparent pixels.
This solves the 'ugly' hard outlines for vectorblur of moving hair.
Before:
http://www.blender.org/bf/h1.jpg
After:
http://www.blender.org/bf/h2.jpg
without initialization.
For Brecht:
source/blender/blenkernel/intern/subsurf_ccg.c:329: warning: left-hand operand of comma expression has no effect
This line I don't understand...
This now is a post-process option only (used to be in render).
It is only handled within the Imbuf/ module, on conversions from float
to byte rect, which atm mostly happens on saving images.
- Small fix: when using Scene RenderLayer nodes, the speed vectors for
these nodes were not created when that scene had "Do Composite" off.
(NOTE: new include dependency in Render module, might need MSVC update!
It has to include the imbuf/intern/openexr/ directory in search path)
-> New Composite node: "Hue Saturation".
Works like the former 'post process' menu. There's no gamma, brightness or
multiply needed in this node, for that the Curves Node functions better.
-> Enabled Toolbox in Node editor
This now also replaces the SHIFT+A for adding nodes. The nodes are
automatically added to the menus, using the 'class' category from the
type definition.
Current classes are (compositor examples):
Inputs: RenderResult, Image
Outputs: Composite, Viewer
Color Ops: RGB Curves, Mix, Hue Saturation, AlphaOver
Vector Ops: Normal, Vector Curves, Map Value
Filters: Filter, Blur, VectorBlur
Convertors: ColorRamp, RGBtoBW, Separate RGBA, Separate HSVA, Set Alpha
Generators: RGB, Value, Time
Groups: the list of custom defined nodes
-> OpenEXR tile saving support
Created an API for saving tile-based Images with an unlimited amount
of layers/channels. I've tested it for 'render result' now, with the idea
that this can (optionally) replace the current inserting of tiles in the
main result buffers. Especially with a lot of layers, the used memory for
these buffers can easily go into the 100s of megs.
Two other advantages:
- all 'render result' layers can be saved entirely in a single file, for
later use in compositing, also for animation output.
- on each render, per scene, a unique temp file can be stored, allowing
to re-use these temp files on starting Blender or loading files, showing
the last result of a render command.
The option is currently disabled, needs more work... but I had to commit
this because of the rest of the work I did!
-> Bug fix
The Image node didn't call an execute event when browsing another image.
In Orange we've been fighting the past weeks with memory usage a lot...
at the moment incredibly huge scenes are being rendered, with multiple
layers and all compositing, stressing limits of memory a lot.
I had hoped that less frequently used blocks would be swapped away
nicely, so fragmented memory could survive. Unfortunately (in OSX) the
malloc range is limited to 2 GB only (upper half of address space).
Other OS's have a limit too, but typically larger afaik.
Now here's mmap to the rescue! It has a very nice feature to map to
a virtual (non existing) file, allowing to allocate disk-mapped memory
on the fly. For as long as there's real memory it works nearly as fast as
a regular malloc, and when you go to the swap boundary, it knows nicely
what to swap first.
The upcoming commit will use mmap for all large memory blocks, like
the composit stack, render layers, lamp buffers and images. Tested here
on my 1 GB system, and compositing huge images with a total of 2.5 gig
still works acceptably here. :)
http://www.blender.org/bf/memory.jpg
This is a silly composit test, using 64 MB images with a load of nodes.
Check the header print... the (2323.33M) is the mmap disk-cache in use.
BTW: note that it is still limited to the virtual address space of 4 GB.
The new call is:
MEM_mapalloc()
Per definition, mmap() returns zero'ed memory, so a calloc isn't required.
For Windows there's no mmap() available, but I'm pretty sure there's an
equivalent. Windows gurus here are invited to insert that here in code! At
the moment it's nicely ifdeffed, so for Windows the mmap defaults to a
regular alloc.
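
A minimal sketch of the idea, assuming the "virtual file" is an anonymous private mapping. The helper names are hypothetical and this is not the actual MEM_mapalloc() code; it also skips the bookkeeping a real allocator would need. The Windows branch just falls back to a zeroed regular alloc, as described above.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc in strict C modes */
#include <stddef.h>
#include <stdlib.h>

#ifndef _WIN32
#  include <sys/mman.h>
#  ifndef MAP_ANONYMOUS
#    define MAP_ANONYMOUS MAP_ANON
#  endif
#endif

/* Hypothetical helper: returns zeroed memory of 'len' bytes, or NULL. */
static void *sketch_map_alloc(size_t len)
{
#ifndef _WIN32
    /* Anonymous mapping: no real file behind it, pages start zeroed. */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return (mem == MAP_FAILED) ? NULL : mem;
#else
    return calloc(1, len);
#endif
}

/* Hypothetical helper: the length has to be passed again for munmap(). */
static void sketch_map_free(void *mem, size_t len)
{
    if (mem == NULL)
        return;
#ifndef _WIN32
    munmap(mem, len);
#else
    (void)len;
    free(mem);
#endif
}
```

Note that munmap() needs the block size, so a real allocator would have to store the length alongside each block.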
- Button option "Single" in render-layer panel makes Blender render only
the currently indicated render-layer. It will also skip compositing.
- Brought back the 'Local View' render. This will only render the visible
objects, but with lights from the original view-layers.
To make the option useful, it also temporarily enables 'Single', which has
the disadvantage that you need to set the correct render-layer.
It is a bit of a tricky option though... since it's quite invisible and
confusing for people who don't know the feature. This might become either
a button in 3d header, or use a popup requester to confirm, or... will
need to think over!
At least, both options display a text in the render window to denote the option.
large scenes... this because it has to make 3 entire databases to find
the vertex-speed to previous and next frame. Even though most of the
prev/next database was freed, the parts I kept were spread all over
memory.
This commit copies from the prev/next database only the two screen aligned
speed vectors and stores that in temporary per-object structs. Even whilst
it takes more memory, it then can free the entire database, making space
for the next database to be built.
Tests revealed it saves quite some... well, if you want to believe the
'virtual memory' total unix gives... :)
using 1 line per part rendered. Might go back to 1 line again, but at this
moment I need the logs for debugging.
Same prints are active now for UI rendering. Just temporary :)
keys with IKEY in buttons to not work.
- Crash in opengl while rendering was caused by the fact that scanline
updates are drawn in the main thread, whilst the actual render thread
then can already be doing different stuff.
Especially with many layers & passes it's getting confusing easily :)
Convention now is that scanline render updates only happen while the
thread is looping over scanlines. As soon as it reached the last, no
drawing happens, not even to update the last segment.
This isn't a problem, since any finished tile is drawn again entirely.
nothing radical. :)
Just remember to always try higher octree resolutions (256 or 512) or more
complex scenes. Can be 5-10 times faster.
For waiting pleasure; added a per-second header print update to tell where
octree is. Also added an ESC test in octree making.
(Commit in image.c is a faulty print for 'Not an anim').
Until now, on each mouse/key event preview render restarted with first tile.
It now remembers where it was, and continues rendering.
Also tried to get threaded preview working, but it's more work than I can
spend right now. Back to bugs :)
- Improved stats drawing while rendering, it now draws - while preparing
renderdata - each second the amount of verts/faces.
Also while rendering, the amount of finished and total parts are printed.
- Added ESC in loop that generated Group render data
- On deleting Render Layers, the nodes that use them are now checked and
corrected.
- Restored drawing all scanlines in renderwindow... this wasn't the bug!
sampling have been activated for UI. Check the pictures here:
http://www.blender.org/bf/filters/index2.html
I also did tests with anti-aliased shadowbuffers:
http://www.blender.org/bf/filters/index3.html
But this needs more thinking over still...
- Compositor now frees memory of buffers internally used in groups
immediately. This wasn't part of the event-based cache anyway
- New option: "Free Texture Images" (in render Output panel). This
frees after each render of each scene all images and mipmaps as
used by textures. As reference it prints total amount of MB freed.
- Render stage 'creating speed vectors' had no ESC checking yet
- Made drawing scanline updates during render draw 1 scanline less...
dunno, still hunting for weird opengl crashes.
- 3D preview render didn't properly skip sequence or composit render.
I noticed still several cases where the Imbuf library was called within a
thread... and that whilst the Imbuf itself isn't threadsafe. Also the
thread lock I added in rendering for loading images actually didn't
work, because then it was still possible both threads were accessing the
MEM_malloc function at the same time.
This commit nearly fully replaces ImBuf calls in compositor (giving another
nice speedup btw, the way preview images in Nodes were calculated used
clumsy imbuf scaling code).
I've also centralized the 'mutex' locking for threading, which now only
resides in BLI_threads.h. This is used to secure the last ImBuf calls
I cannot replace, which is loading images and creating mipmaps.
Really hope we get something more stable now!
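
A minimal sketch of that kind of centralized lock, assuming one global pthread mutex around a non-threadsafe call. All names are hypothetical; this is not the contents of BLI_threads.h, and the counter only stands in for the ImBuf work.

```c
#include <pthread.h>
#include <stddef.h>

/* One central lock, shared by every caller of the unsafe code. */
static pthread_mutex_t sketch_image_lock = PTHREAD_MUTEX_INITIALIZER;
static int sketch_images_loaded = 0;

/* Stand-in for a call that is not threadsafe, like loading an image. */
static void sketch_load_image_unsafe(void)
{
    sketch_images_loaded++;
}

/* Threads only go through this wrapper, never the unsafe call itself. */
static void sketch_load_image(void)
{
    pthread_mutex_lock(&sketch_image_lock);
    sketch_load_image_unsafe();
    pthread_mutex_unlock(&sketch_image_lock);
}

static void *sketch_worker(void *arg)
{
    int i;
    (void)arg;
    for (i = 0; i < 1000; i++)
        sketch_load_image();
    return NULL;
}

/* Runs two workers at once; returns the total count, or -1 on error. */
static int sketch_run_two_workers(void)
{
    pthread_t a, b;

    if (pthread_create(&a, NULL, sketch_worker, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, sketch_worker, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return sketch_images_loaded;
}
```

The lock only helps if every path into the unsafe code takes it, which is the argument for keeping it in one place.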
1) Accumulation buffer alpha handling
Accumulating colors in an accumulation buffer is simple; a weighting factor can
make sure colors don't over- or undersaturate.
For alpha this is a bit more complex... especially because the masks for
vectorblur are anti-aliased themselves with alpha values. Up to now I just
premultiplied the mask-alpha with the actual color alpha, which worked OK
for solid masks, but not for transparent ones. I thought that would be an
acceptable situation, since 'ztra' faces only get blurred with alpha==1.
However, it gives bad results when using 'mist' in Blender, which just
gives pixels an alpha value based on camera distance. In these cases the
alpha became oversaturated, accumulating into too high values.
The solution is to store the mask-alpha separately, only premultiply this
alpha with the weighting factor to define the accumulation amount.
This is the math:
blendfactor: the accumulation factor for a vectorblur pass
passRGBA: color and alpha value of the current to be accumulated pass
accRGBA: color and alpha value of accumulation buffer (initialized
with original picture values)
maskA: the mask's alpha itself
accRGBA = (1 - maskA*blendfactor)*accRGBA + (maskA*blendfactor)*passRGBA
This formula accumulates alpha values equally to colors, only using the
mask-alpha as 'alpha-over' operation.
It all sounds very logical, I just write this extensive log because I
couldn't find any technical doc about this case. :)
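
The formula above, written out per pixel as a minimal sketch. The function name and the float[4] RGBA layout are assumptions for illustration, not the vectorblur code itself.

```c
/* acc = (1 - maskA*blendfactor)*acc + (maskA*blendfactor)*pass,
 * applied the same way to R, G, B and A. */
static void sketch_accumulate_pixel(float acc[4], const float pass[4],
                                    float mask_a, float blendfactor)
{
    const float fac = mask_a * blendfactor;
    const float inv = 1.0f - fac;
    int c;

    for (c = 0; c < 4; c++)
        acc[c] = inv * acc[c] + fac * pass[c];
}
```

Because the mask alpha only enters through the factor, the alpha channel can no longer be pushed above the range spanned by the pass and accumulation values.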
2) Creating efficient masks with camera-shake
Vector blur can only work well when there's a clear distinction between
what moves, and what doesn't move. This you can solve for example by
rendering complex scenes in multiple layers. This isn't always easy, or
just a lot of work. Especially when the camera itself moves, the mask
created by the vectorblur code becomes the entire image.
A very simple solution is to introduce a small threshold for
moving pixels, which can efficiently separate the hardly-moving pixels
from the moving ones, and thus create nice looking masks.
You can find this new option in the VectorBlur node, as 'min speed'.
This minimum speed is in pixel units. A value of just 3 will already
clearly separate the background from foreground.
Note; to make this work OK, all vectors in an image are scaled 3 pixels
smaller, to ensure everything keeps looking coherent.
Test renders; 'Elephants Dream' scene with lots of moving parts; rendered
without OSA, image textures, shadow or color correction.
No vectorblur:
http://www.blender.org/bf/vblur.jpg
With vectorblur, showing the alpha-saturation for mist:
http://www.blender.org/bf/vblur1.jpg
New accumulation formula:
http://www.blender.org/bf/vblur2.jpg
Same image, but now with a 3 pixel minimum speed threshold:
http://www.blender.org/bf/vblur3.jpg
Next frame, without minimum speed
http://www.blender.org/bf/vblur4.jpg
Same frame with speed threshold:
http://www.blender.org/bf/vblur5.jpg
(Only 20 steps of vectorblur were applied for clarity).
The code that generated mipmaps took a real long time to do it... on a
5k x 5k image it took here (no optim, debug compile) 32.5 sec.
Recoded the very old filtering routine, which already brought it down to
2.8 seconds. Then tested if we even need this filtering... in many cases
the images are painted or photographs, which are filtered OK already.
Without the filter, the mipmap timing went down to 0.39 second. :)
http://www.blender.org/bf/filters/index1.html
Here's an example of two 'mips' generated with or without gauss filter.
Note that aliasing in an image remains there... which can be a wanted
effect anyway.
So; added the gauss filter as option in making mipmaps. Also had to
reshuffle the buttons there in a more logical manner.
There's also disabled code in the do_versions to set 'gauss' on in older
files. Will be enabled during release time.
sockets were not used yet... now they're verified on read, and written
in socket stack data on adding new nodes.
Also the buttons in Nodes use these values now. Special request from
Nathan Vegdahl who seems to be messing around with my precious nodes! :)
get sampled on larger distance. It actually just flattens bump when the
sampled area is (much) larger than pixel size, to prevent weird things
like:
current render:
http://www.blender.org/bf/b1.jpg
distance corrected:
http://www.blender.org/bf/b2.jpg
(image based on Alexander file :)
Tested on env's dinos too... seems to work, but we'll see.
- LampHalos can be rendered separately too. Just disable 'Solid' in a
layer and keep 'Halo' option enabled.
- Note that disabling 'Solid' will still fill in Z values for the solid
faces, to provide occlusion information for the Ztransp and Halo layer
options. The latter didn't work this way until now for OSA render.
Also note that Ztransp+LampHalo still isn't a good marriage... it
renders a bit weird, but that's an old issue. :)
- it now correctly pre-multiplies with alpha the RGB values for the
antialiased mask (alpha artefacts were visible)
- The transparent layer will add speed vectors on top of the solid layer,
cancelling out cases where the solid layer was not moving (like in its
own antialiasing).
This works fine, for as long as you don't render in a single pass
transparent faces that move on top of not-moving solid faces.
You now can set a Preview panel in the Image window, to define a sub-rect
of an image to be processed. Works like the preview in 3D Window. Just
press SHIFT+P to get it activated. Very nice speedup!
This is how it works:
- The compositor still uses the scene image size (including % setting) for
Viewer or Composite output size
- If a preview exists, it calculates the cropped rect from its position
in the Image window, and stores that in the Scene render data
- On composite execute, it copies only this part from the 'generator nodes',
right now Images or Render Results. That makes the entire composite tree
only using small rects, so it will execute fast.
- Also the render window will only display the cropped rect, and on F12
only the cropped part is being executed
- On rendering in background mode, the cropping is ignored though.
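
The copy step from the list above can be sketched like this, assuming float RGBA buffers stored row by row. The function name and parameters are hypothetical, and the caller has to make sure the rect lies inside the source image.

```c
#include <stddef.h>
#include <string.h>

/* Copy a w x h sub-rect starting at pixel (x0, y0) out of a source
 * buffer that is src_w pixels wide, 4 floats per pixel. */
static void sketch_copy_cropped(float *dst, const float *src, int src_w,
                                int x0, int y0, int w, int h)
{
    int y;

    for (y = 0; y < h; y++) {
        const float *row = src + 4 * ((size_t)(y0 + y) * (size_t)src_w + (size_t)x0);
        memcpy(dst + 4 * (size_t)y * (size_t)w, row, sizeof(float) * 4 * (size_t)w);
    }
}
```

Everything downstream of the generator nodes then only sees the w x h buffer, which is where the speedup comes from.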
Usability notes:
- translating or zooming view will automatically invoke a recalculation
- if you zoom in on details, the calculated rect will even become smaller
- only one Imagewindow can have this Preview Panel, to prevent conflicts of
what the cropped area should be. Compositing is on Scene level, not local
per image window. (Note; 3D Previews are local per window!)
- Closing the preview panel will invoke a full-size recalculation
- All passes/layers from rendering are nicely cropped, including Z and
vectors.
The work to make the compositor do cropping was simple, but getting the
Image window displaying correctly and get all events OK was a lot of work...
indeed, we need to refactor Image Window usage once. Sorry for making the
mess even bigger now. :) I've tried not to interfere with UV edit or Paint
though... only when you're in compositing mode the panel will work.
BUG fix:
3D Preview render didn't work when multiple layers were set in the current
scene.
as triangles, with a tag bit to denote which triangle was which part of
the quad. That was hardcoded bit 0x800000, which allows a maximum of
about 8 million quads...
I've made this a nice #define, set to be 16 times larger. So, now the
facejunkies can go up to 128 Million faces, were it not that this will eat
up a load of memory!
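
A minimal sketch of such a tag bit, assuming the face index and the triangle flag share one int. The define and function names are hypothetical, not the ones used in the render code; 0x8000000 is simply 0x800000 times 16.

```c
/* Hypothetical name. 0x800000 * 16 = 0x8000000 = 134217728. */
#define SKETCH_QUAD_SECOND_TRI 0x8000000

/* Pack a face index with a flag for "second triangle of the quad". */
static int sketch_face_encode(int face_index, int second_triangle)
{
    return second_triangle ? (face_index | SKETCH_QUAD_SECOND_TRI) : face_index;
}

/* The face index is everything below the tag bit. */
static int sketch_face_index(int encoded)
{
    return encoded & (SKETCH_QUAD_SECOND_TRI - 1);
}

static int sketch_face_is_second_triangle(int encoded)
{
    return (encoded & SKETCH_QUAD_SECOND_TRI) != 0;
}
```

Face indices have to stay below the tag bit, which is where the limit on the number of faces comes from.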
I only have 1 Gig in this machine. A test with 9M vertices and 7.5M quads
eats up 912 MB of memory already. If this becomes a real issue, I know
tricks how to make the vertices 20 bytes smaller, and faces 4 bytes, which
would in the above case save about 200 MB. Not much... but probably worth
the try? A much better method is of course 'bucketing' the renderdata per
tile. It's a spec of the render recode, but not a quicky to add.
Also: bug fix in curve code. There was a short counter still, crashing on
large curves with resol set to 1024 :)
- Composit cache now gets fully freed on a render. Each output socket of a
node stores the entire image... and while rendering that's a waste of memory
- Sky 'paper' render was using wrong texture coordinates
- Found missing test_break() in ztransp rendering.
- ZTransp render now also delivers Z values and Speed vectors in passes
Note that speed vectors accumulate within a pixel to store the minimum,
so rendering ztransp on top of a non-moving plane won't give speed...
Best results you get is by rendering it in a separate layer.
The Z value stored is the closest visible transparent face in the pixel.
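
A minimal sketch of the "store the minimum" rule for speed vectors, assuming a 2D vector per pixel that starts out at a large value before any sample is added. The function name is hypothetical; comparing squared lengths avoids a square root.

```c
/* Keep whichever vector is shorter: the one already stored in the
 * pixel, or the new sample. */
static void sketch_speed_accumulate(float pixel[2], const float sample[2])
{
    const float cur = pixel[0] * pixel[0] + pixel[1] * pixel[1];
    const float cand = sample[0] * sample[0] + sample[1] * sample[1];

    if (cand < cur) {
        pixel[0] = sample[0];
        pixel[1] = sample[1];
    }
}
```

With this rule a non-moving sample (0, 0) always wins, which matches the note above about ztransp on top of a non-moving plane.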
Fixes:
- Render to 'spare page' has been enabled again. Because of the strict
separation of Render and UI, but especially because a 'render result' now
can consist of unlimited images, I've not made this a Render feature.
Instead, the render-window itself stores the 'spare' image... I also
had to change the convention for it a bit.
Now, instead of having two "render buffers" (which was a render feature),
the RenderWindow will store each previous frame on a re-render. This
storing will only start after you've pressed 'Jkey' once, but then always
will happen for as long as the rendered image is the same size as previously.
For clarity, I've also renamed the window title, to 'previous frame'.
- RenderWindow shows alpha again on Akey
- Display of the Zvalues in ImageWindow has been tweaked. White now denotes
closest, and the color range goes from camera clip-sta to clip-end.
- Bugfix: on splitting/merging/duplicating windows, the 3D Previewrender was
not always freed correctly, potentially causing crashes or memory leaks.
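
The Z display tweak from the list above, as a minimal sketch. It assumes a plain linear mapping between clip start and clip end with clipend > clipsta; the actual ImageWindow code may map the values differently, and the function name is hypothetical.

```c
/* Map a Z value to a grey level: 1.0 (white) at clip start,
 * 0.0 (black) at clip end, clamped outside that range. */
static float sketch_z_to_grey(float z, float clipsta, float clipend)
{
    float t = (z - clipsta) / (clipend - clipsta);

    if (t < 0.0f)
        t = 0.0f;
    if (t > 1.0f)
        t = 1.0f;
    return 1.0f - t;
}
```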
vertex locations, not global coordinates. This ensures consistent
autosmoothing for each frame. Also fixes missing vectorblur for parts.
Nice task for a dev: put autosmooth code in end of modifier stack... then
it also shows in 3D window
- BUG FIX! I noticed the last tile rendered quite slow, and even did not
update scanlines. Found out that the main tiles processor didn't go
to sleep when the last tile was rendered, because it detected a free
possible thread. This caused the main thread to go into a very tight
loop, eating up a lot of cpu and blocking the other thread.
vertices differed on previous/next frame, causing speedvector calculus
to be skipped.
Now that worked OK, were it not that non-existing speed vectors were not
initialized zero while rendering...
Also another issue showed up with autosmooth. When using exact smooth
angles (like 30 degrees) on a model that has been spun with exactly
30 degree steps, the autosmooth gave different results on each frame...
and only when compiled in O2 (probably that's doing bad float rounding).
Solved this by just adding 0.1 to the user defined smooth angle.
vectors. It's actually shutter speed, but in this case works identical to
the old motionblur 'blur fac' button.
Note; the "Max Speed" button only clips speed, use this to prevent
extreme speed values. Max speed is applied before the scaling happens.
After a couple of experiments with variable blur filters, I tried
a more interesting, and who knows... original approach. :)
First watch results here:
http://www.blender.org/bf/rt0001_0030.avi
http://www.blender.org/bf/hand0001_0060.avi
These are the steps in producing such results:
- In preprocess, the speed vectors to previous and next frame are
calculated. Speed vectors are screen-aligned and in pixel size.
- while rendering, these vectors get calculated per sample, and
accumulated in the vector buffer checking for "minimum speed".
(on start the vector buffer is initialized on max speed).
- After render:
- The entire image, all pixels, then is converted to quad polygons.
- Also the z value of the pixels is assigned to the polygons
- The vertices for the quads use averaged speed vectors (of the 4
corner faces), using a 'minimum but non-zero' speed rule.
This minimal speed trick works very well to prevent 'tearing' apart
when multiple faces move in different directions in a pixel, or to
be able to separate moving pixels clearly from non-moving ones
- So, now we have a sort of 'mask' of quad polygons. The previous steps
guaranteed that this mask doesn't have antialias color info, and has
speed vectors that ensure individual parts to move nicely without
tearing effects. The Z allows multiple layers of moving masks.
- Then, in a temporary buffer, faces get tagged if they move or not
- These tags then go to an anti-alias routine, which assigns alpha
values to edge faces, based on the method we used in past to antialias
bitmaps (still in our code, check the antialias.c in imbuf!)
- finally, the tag buffer is used to tag which z values of the original
image have to be included (to allow blur go behind stuff).
- OK, now we're ready for accumulating! In a loop, all faces then get
drawn (with zbuffer) with increasing influence of their speed vectors.
The resulting image then is accumulated on top of the original with a
decreasing weighting value.
It sounds all quite complex... but the speed is still encouraging. Above
images have 64 mblur steps, which takes about 1-3 seconds per frame.
Usage notes:
- Make sure the render-layer has passes 'Vector' and 'Z' on.
- add in Compositor the VectorBlur node, and connect the image, Z and
speed to the inputs.
- The node allows you to set the amount of steps (10 steps = 10 forward, 10 back)
and to set a maximum speed in pixels... to prevent extreme moving things
from blurring too wide.
- Scene support in RenderLayers
You now can indicate in Compositor to use RenderLayer(s) from other scenes.
Use the new dropdown menu in the "Render Result" node. It will change the
title of the node to indicate that.
The other Scenes are rendered fully separate, creating own databases (and
octrees) after the current scene was finished. They use their own render
settings, with as exception the render output size (and optional border).
This makes the option an interesting memory saver and speedup.
Also note that the render-results of other scenes are kept in memory while
you work. So, after a render, you can tweak all composit effects.
- Render Stats
Added an 'info string' to stats, printed in renderwindow header. It gives
info now on steps "creating database", "shadow buffers", and "octree".
- Bug fixes
Added redraw event for Image window, when using compositor render.
Text objects were not rendered using background render (probably a bug
since depsgraph was added)
Dropdown buttons in Node editor were not refreshed after usage
Sometimes render window did not open, this due to wrong check for 'esc'.
Removed option that renders view-layers on F12, with mouse in 3d window.
Not only was it confusing, it's now more efficient with the Preview Panel,
which does this nicely.