- Renderwin still used a thread-unsafe malloc, in the header text print
- Setting clipping flags in vertices for parts required a mutex lock after
all... I thought it would go fine, but noticed on renders with small
amounts of faces that sometimes faces disappear from a render.
(was doing movie credits, so all faces are visible! Otherwise it would
have hardly been noticable...)
- Bug fix: the upper tile in a collumn for Panorama render didn't put the
mainthread to sleep properly. Now panorama renders 25% faster if you had
set Y-Parts to 4.
- Enabling Compositing in Scene for first time now adds a "Composite" node
too, so render output gets applied.
- An attempt to render with "Do Composite" without "Composite" node will
throw an error and stops rendering. In background mode it will just not
render at all, and print errors.
- Errors that prevent rendering now give a popup menu again.
- Having MBlur or Fields option on will now normally render, but with an
error print in console (not done yet...)
for leftover initialized max-speed values, and clears it. Also gives
a giant print then... I want to know when it happens, and howto redo!
(error print = "tsk tsk! PASS_VECTOR_MAX left in buffer...")
also could use correction for it.
The current perspective projected blur would look in 180 degree view like
this:
http://www.blender.org/bf/p2.jpg
(circle of planes rotating around camera)
After some fight with my rusty highschool gonio I got it fixed; nice
cylindrical projected speedvectors:
http://www.blender.org/bf/p1.jpg
front/back is selected), the UV coordinates for curves should also be
corrected.
This commit re-uses the same code as for Nurbs, to make sure UV coordinates
wrap around nicely.
BUT! I've noticed that Daniel's commit of august in this code actually
broke this UV correction... in his craze to cleanup old code, he missed
the actual functionality. Meaning that in 2.40 and 2.41, "UV orco" texture
coordinates wrap around ugly in Nurbs Surfaces, something that was fixed
in NaN days.
Got no time for tracker now... but I'm sure it's in there! :)
To be able to make good masks, it is important to separate the non moving
pixels from the moving ones. With fixes I did 2 weeks ago, a floating
point inaccuracy causes speed vectors to be not always zero perfectly...
and the masks to get badly shaped.
It was clearly visible on moving objects over a non-moving background.
Current commit includes minimal threshold to force speed to zero. Images
are nice and smooth again. :)
Bad:
http://www.blender.org/bf/vb1.jpg
Good again:
http://www.blender.org/bf/vb2.jpg
animation systems, all transforms of all duplicated group members have
to be set first, before drawing or converting for render. This because
then still deformation can be calculated.
The old implementation was added quite hackish (talking about 10 yr ago).
You also had to make a small image slice, which was extended Xparts in
size. That also required to adjust the camera angle. Very clumsy.
Now; when enabling the Panorama option, it will automatically apply the
panorama effect on the vertically aligned tiles. You can just enable or
disable the "Pano" button, to get a subtle lens effect like this:
(without pano)
http://www.blender.org/bf/rt.jpg
(with pano)
http://www.blender.org/bf/rt1.jpg
For Panorama render, the minimum slice size has been hardcoded to be 8
pixels. The XParts button goes up to 512 to allow that. In practice,
rendering 64 slices will already give very good images for a wide angle
lens of 90 degrees, the curvature of straight lines then is equal to
a circle of 256 points.
Rendering a full 360 degree panorama you do by creating an extreme wide
angle camera. The theory says camera-lens 5 should do 360 degrees, but
for some reason my tests reveil it's 5.1... there's a rounding error
somewhere, maybe related to the clipping plane start? Will look at that
later. :)
Also note that for each Xpart slice, the entire database needs to be
rotated around camera to correct for panorama, on huge scenes that might
give some overhead.
Threaded render goes fine for Panorama too, but it can only render the
vertically aligned parts in parallel. For the next panorama slice it has
to wait for all threads of the current slice to be ready.
On reading old files, I convert the settings to match as closely as
possible the new situation.
Since I cannot bump up the version #, the code detects for old panorama
by checking for the image size. If image width is smaller than height, it
assumes it's an old file (only if Panoroma option was set).
issues in parallel... So this commit contains: an update of
the solver (e.g. moving objects), integration of blender IPOs,
improved rendering (motion blur, smoothed normals) and a first particle
test. In more detail:
Solver update:
- Moving objects using a relatively simple model, and not yet fully optimized - ok
for box falling into water, water in a moving glass might cause trouble. Simulation
times are influenced by overall no. of triangles of the mesh, scaling meshes up a lot
might also cause slowdowns.
- Additional obstacle settings: noslip (as before), free slip (move along wall freely)
and part slip (mix of both).
- Obstacle settings also added for domain boundaries now, the six walls of the domain are
obstacles after all as well
- Got rid of templates, should make compiling for e.g. macs more convenient,
for linux there's not much difference. Finally got rid of parser (and some other code
parts), the simulation now uses the internal API to transfer data.
- Some unnecessary file were removed, the GUI now needs 3 settings buttons...
This should still be changed (maybe by adding a new panel for domain objects).
IPOs:
- Animated params: viscosity, time and gravity for domains. In contrast
to normal time IPO for Blender objects, the fluidsim one scales the time
step size - so a constant 1 has no effect, values towards 0 slow it down,
larger ones speed the simulation up (-> longer time steps, more compuations).
The viscosity IPO is also only a factor for the selected viscosity (again, 1=no effect).
- For objects that are enabled for fluidsim, a new IPO type shows up. Inflow
objects can use the velocity channels to animate the inflow. Obstacles, in/outflow
objects can be switched on (Active IPO>0) and off (<0) during the simulation.
- Movement, rotation and scaling of those 3 types is exported from the normal
Blender channels (Loc,dLoc,etc.).
Particles:
- This is still experimental, so it might be deactivated for a
release... It should at some point be used to model smaller splashes,
depending on the the realworld size and the particle generation
settings particles are generated during simulation (stored in _particles_X.gz
files).
- These are loaded by enabling the particle field for an arbitrary object,
which should be given a halo material. For each frame, similar to the mesh
loading, the particle system them loads the simulated particle positions.
- For rendering, I "abused" the part->rt field - I couldnt find any use
for it in the code and it seems to work fine. The fluidsim particles
store their size there.
Rendering:
- The fluidims particles use scaled sizes and alpha values to give a more varied
appearance. In convertblender.c fluidsim particle systems use the p->rt field
to scale up the size and down the alpha of "smaller particles". Setting the
influence fields in the fluidims settings to 0 gives equally sized particles
with same alpha everywhere. Higher values cause larger differences.
- Smoothed normals: for unmodified fluid meshes (e.g. no subdivision) the normals
computed by the solver are used. This is basically done by switching off the
normal recalculation in convertblender.c (the function calc_fluidsimnormals
handles other mesh inits instead of calc_vertexnormals).
This could also be used to e.g. modify mesh normals in a modifier...
- Another change is that fluidsim meshes load the velocities computed
during the simulation for image based motion blur. This is inited in
load_fluidsimspeedvectors for the vector pass (they're loaded during the
normal load in DerivedMesh readBobjgz). Generation and loading can be switched
off in the settings. Vector pass currently loads the fluidism meshes 3 times,
so this should still be optimized.
Examples:
- smoothed normals versus normals from subdividing once:
http://www10.informatik.uni-erlangen.de/~sinithue/temp/v060227_1smoothnorms.pnghttp://www10.informatik.uni-erlangen.de/~sinithue/temp/v060227_2subdivnorms.png
- fluidsim particles, size/alpha influence 0:
http://www10.informatik.uni-erlangen.de/~sinithue/temp/v060227_3particlesnorm.png
size influence 1:
http://www10.informatik.uni-erlangen.de/~sinithue/temp/v060227_4particlessize.png
size & alpha influence 1:
http://www10.informatik.uni-erlangen.de/~sinithue/temp/v060227_5particlesalpha.png
- the standard drop with motion blur and particles:
http://www10.informatik.uni-erlangen.de/~sinithue/temp/elbeemupdate_t2new.mpg
(here's how it looks without
http://www10.informatik.uni-erlangen.de/~sinithue/temp/elbeemupdate_t1old.mpg)
- another inflow animation (moving, switched on/off) with a moving obstacle
(and strong mblur :)
http://www10.informatik.uni-erlangen.de/~sinithue/temp/elbeemupdate_t3ipos.mpg
Things still to fix:
- rotating & scaling domains causes wrong speed vectors
- get rid of SDL code for threading, use pthreads as well?
- update wiki documentation
- cool effects for rendering would be photon maps for caustics,
and motion blur for particles :)
For some reason I thought SDL thread handling would be much simpler... but
the migration to posix pthread went very smooth and painless. Less code
even, and I even notice a slight performance increase!
All threading code is still wrapped in blenlib/intern/threads.c
Only real change was making the callback functions to return void pointer,
instead of an int.
The mutex handling is also different... there's no test anymore if a
mutex was initialized, which is a bit confusing. But it appears to run
all fine still. :)
Nathan Letwory has been signalled already to provide the Windows pthread
library and make/scons linking. For MSVC we might need help from someone
else later though.
the ones that get changed within threads, to communicate with the main
thread.
(Part of the long quest to get threaded render safe, especially in Linux)
passes in single file. Code is currently disabled, commit is mainly to
have a nicer method of excluding OpenEXR dependency from render module.
This should compile with disabled WITH_OPENEXR too.
Reason why EXR is great to include by default in Blender is its feature
to store unlimited layers and channels, and write this tile based. I
need the feature for saving memory; while rendering tiles, all full-size
buffers for all layers and passes are kept in memory now, which can go
into 100s of MB easily.
The code I commit now doesn't allocate these buffers while rendering, but
saves the tiles to disk. In the end is it read back. Overhead for large
renders (like 300 meg buffers) is 10-15 seconds, not bad.
Two more interesting aspects:
- Blender can save such multi-layer files in the temp directory, storing
it with .blend file name and scene name. That way, on each restart of Blender,
or on switching scenes, these buffers can be read. So you always see what was
rendered last. Also great for compositing work.
- This can also become an output image type for rendering. There's plenty of
cases where you want specific layers or passes saved to disk for later use.
Anyhoo, finishing it is another days of work, and I got more urgent stuff
now!
out moving transparent pixels by checking for alpha>0.95, now it also
checks the solid layer (if present), and if there's no solid face in a
pixel, the speed vector gets also added and used for transparent pixels.
This solves the 'ugly' hard outlines for vectorblur of moving hair.
Before:
http://www.blender.org/bf/h1.jpg
After:
http://www.blender.org/bf/h2.jpg
without initialization.
For Brecht:
source/blender/blenkernel/intern/subsurf_ccg.c:329: warning: left-hand operand of comma expression has no effect
This line I don't understand...
This now is a post-process option only (used to be in render).
It is only handled within the Imbuf/ module, on conversions from float
to byte rect, which atm mostly happens on saving images.
- Small fix: when using Scene RenderLayer nodes, the speed vectors for
these nodes were not created when that scene had "Do Composite" off.
(NOTE: new include dependency in Render module, might need MSVC update!
It has to include the imbuf/intern/openexr/ directory in search path)
-> New Composite node: "Hue Saturation".
Works like the former 'post process' menu. There's no gamma, brightness or
multiply needed in this node, for that the Curves Node functions better.
-> Enabled Toolbox in Node editor
This now also replaces the SHIFT+A for adding nodes. The nodes are
automatically added to the menus, using the 'class' category from the
type definition.
Current classes are (compositor examples):
Inputs: RenderResult, Image
Outputs: Composite, Viewer
Color Ops: RGB Curves, Mix, Hue Saturation, AlphaOver
Vector Ops: Normal, Vector Curves, Map Value
Filters: Filter, Blur, VectorBlur
Convertors: ColorRamp, RGBtoBW, Separate RGBA, Separate HSVA, Set Alpha
Generators: RGB, Value, Time
Groups: the list of custom defined nodes
-> OpenEXR tile saving support
Created an API for for saving tile-based Images with an unlimited amount
of layers/channels. I've tested it for 'render result' now, with the idea
that this can (optionally) replace the current inserting of tiles in the
main result buffers. Especially with a lot of layers, the used memory for
these buffers can easily go into the 100s of megs.
Two other advantages:
- all 'render result' layers can be saved entirely in a single file, for
later use in compositing, also for animation output.
- on each render, per scene, a unique temp file can be stored, allowing
to re-use these temp files on starting Blender or loading files, showing
the last result of a render command.
The option is currently disabled, needs more work... but I had to commit
this because of the rest of the work I did!
-> Bug fix
The Image node didn't call an execute event when browsing another image.
In Orange we've been fighting the past weeks with memory usage a lot...
at the moment incredible huge scenes are being rendered, with multiple
layers and all compositing, stressing limits of memory a lot.
I had hoped that less frequently used blocks would be swapped away
nicely, so fragmented memory could survive. Unfortunately (in OSX) the
malloc range is limited to 2 GB only (upped half of address space).
Other OS's have a limit too, but typically larger afaik.
Now here's mmap to the rescue! It has a very nice feature to map to
a virtual (non existing) file, allowing to allocate disk-mapped memory
on the fly. For as long there's real memory it works nearly as fast as
a regular malloc, and when you go to the swap boundary, it knows nicely
what to swap first.
The upcoming commit will use mmap for all large memory blocks, like
the composit stack, render layers, lamp buffers and images. Tested here
on my 1 GB system, and compositing huge images with a total of 2.5 gig
still works acceptable here. :)
http://www.blender.org/bf/memory.jpg
This is a silly composit test, using 64 MB images with a load of nodes.
Check the header print... the (2323.33M) is the mmap disk-cache in use.
BTW: note that is still limited to the virtual address space of 4 GB.
The new call is:
MEM_mapalloc()
Per definition, mmap() returns zero'ed memory, so a calloc isn't required.
For Windows there's no mmap() available, but I'm pretty sure there's an
equivalent. Windows gurus here are invited to insert that here in code! At
the moment it's nicely ifdeffed, so for Windows the mmap defaults to a
regular alloc.
- Button option "Single" in render-layer panel will enable to only render
the currently indicated render-layer. It will also skip compositing.
- Brought back the 'Local View' render. This will only render the visible
objects, but with lights from the original view-layers.
To make the option useful, it also temporal enables 'Single', which has
the a disadvantage that you need to set the correct render-layer.
It is a bit a tricky option though... since its quite invisble and
confusing for people who don't know the feature. This might become either
a button in 3d header, or use a popup requester to confirm, or... will
need to think over!
At least; both options display in render window a text to denote the option.
large scenes... this because it has to make 3 entire databases to find
the vertex-speed to previous and next frame. Even though most of the
prev/next database was freed, the parts I kept were spread all over
memory.
This commit copies from the prev/next database only the two screen aligned
speed vectors and stores that in temporal per-object structs. Even whilst
it takes more memory, it then can free the entire database, making space
for the next database to be built.
Tests reveiled it saves quite some... well, if you want to believe the
'virtual memory' total unix gives... :)
using 1 line per part rendered. Might go back to 1 line again, but at this
moment I need the logs for debugging.
Same prints are active now for UI rendering. Just temporal :)
keys with IKEY in buttons to not work.
- Crash in opengl while rendering was caused by the fact that scanline
updates are drawn in the main thread, whilst the actual render thread
then can already be doing different stuff.
Especially with many layers & passes it's getting confusing easily :)
Convention now is that scanline render updates only happen while the
thread is looping over scanlines. As soon as it reached the last, no
drawing happens, not even to update the last segment.
This isnt a problen, since any finished tile is drawn again entirely.
nothing radical. :)
Just remember to always try higher octree resolutions (256 or 512) or more
complex scenes. Can be 5-10 times faster.
For waiting pleasure; added a per-second header print update to tell where
octree is. Also added an ESC test in octree making.
(Commit in image.c is a faulty print for 'Not an anim').
Until now, on each mouse/key event preview render restarted with first tile.
It now rememers where it was, and continues rendering.
Also tried to get threaded preview working, but its more work than I can
spend right now. Back to bugs :)
- Improved stats drawing while rendering, it now draws - while preparing
renderdata - each second the amount of verts/faces.
Also while rendering, the amount of finished and total parts are printed.
- Added ESC in loop that generated Group render data
- On deleting Render Layers, the nodes that use them are now checked and
corrected.
- Restored drawing all scanlines in renderwindow... this wasn't the bug!
sampling have been activated for UI. Check the pictures here:
http://www.blender.org/bf/filters/index2.html
I also did do tests with anti-aliased shadowbuffers:
http://www.blender.org/bf/filters/index3.html
But this needs more thinking over still...
- Compositor now frees memory of buffers internally used in groups
immediately. This wasn't part of the event-based cache anyway
- New option: "Free Texture Images" (in render Output panel). This
frees after each render of each scene all images and mipmaps as
used by textures. As reference it prints total amount of MB freed.
- Render stage 'creating speed vectors' had no ESC checking yet
- Made drawing scanline updates during render draw 1 scanline less...
dunno, still hunting for weird opengl crashes.
- 3D preview render didn't properly skip sequence or composit render.
I noticed still several cases where the Imbuf library was called within a
thread... and that whilst the Imbuf itself isn't threadsafe. Also the
thread lock I added in rendering for loading images actually didn't
work, because then it was still possible both threads were accessing the
MEM_malloc function at same time.
This commit nearly fully replaces ImBuf calls in compositor (giving another
nice speedup btw, the way preview images in Nodes were calculated used
clumsy imbuf scaling code).
I've also centralized the 'mutex' locking for threading, which now only
resides in BLI_threads.h. This is used to secure the last ImBuf calls
I cannot replace, which is loading images and creating mipmaps.
Really hope we get something more stable now!
1) Accumulation buffer alpha handling
Accumulating colors in an accumulation is simple; a weighting factor can
make sure colors don't over- or undersaturate.
For alpha this is a bit more complex... especially because the masks for
vectorblur are anti-aliased themselves with alpha values. Up to now I just
premultiplied the mask-alpha with the actual color alpha, which worked OK
for solid masks, but not for transparent ones. I thought that would be an
acceptable situation, since 'ztra' faces only get blurred with alpha==1.
However, it gives bad results when using 'mist' in Blender, which just
gives pixels an alpha value based on camera distance. In these cases the
alpha became oversaturated, accumulating into too high values.
The solution is to store the mask-alpha separately, only premultiply this
alpha with the weighting factor to define the accumulation amount.
This is the math:
blendfactor: the accumulation factor for a vectorblur pass
passRGBA: color and alpha value of the current to be accumulated pass
accRGBA: color and alpha value of accumulation buffer (initialized
with original picture values)
maskA: the mask's alpha itself
accRGBA = (1 - maskA*blendfactor)*accRGBA + (maskA*blendfactor)*passRGBA
This formula accumulates alpha values equally to colors, only using the
mask-alpha as 'alpha-over' operation.
It all sounds very logical, I just write this extensive log because I
couldn't find any technical doc about this case. :)
2) Creating efficient masks with camera-shake
Vector blur can only work well when there's a clear distinction between
what moves, and what doesn't move. This you can solve for example by
rendering complex scenes in multiple layers. This isn't always easy, or
just a lot of work. Especially when the camera itself moves, the mask
created by the vectorblur code becomes the entire image.
A very simple solution is to introduce a small threshold for
moving pixels, which can efficiently separate the hardly-moving pixels
from the moving ones, and thus create nice looking masks.
You can find this new option in the VectorBlur node, as 'min speed'.
This mimimum speed is in pixel units. A value of just 3 will already
clearly separate the background from foreground.
Note; to make this work OK, all vectors in an image are scaled 3 pixels
smaller, to ensure everything keeps looking coherent.
Test renders; 'Elephants Dream' scene with lotsof moving parts; rendered
without OSA, image textures, shadow or color correction.
No vectorblur:
http://www.blender.org/bf/vblur.jpg
With vectorblur, showing the alpha-saturation for mist:
http://www.blender.org/bf/vblur1.jpg
New accumulation formula:
http://www.blender.org/bf/vblur2.jpg
Same image, but now with a 3 pixel minimum speed threshold:
http://www.blender.org/bf/vblur3.jpg
Next frame, without minimum speed
http://www.blender.org/bf/vblur4.jpg
Same frame with speed threshold:
http://www.blender.org/bf/vblur5.jpg
(Only 20 steps of vectorblur were applied for clarity).
The code that generated mipmaps took a real long time to do it... on a
5k x 5k image it took here (no optim, debug compile) 32.5 sec.
Recoded the very old filtering routine, which already brought it down to
2.8 seconds. Then tested if we even need this filtering... in many cases
the images are painted or photographs, which is filtered OK already.
Without the filter, the mipmap timing went down to 0.39 second. :)
http://www.blender.org/bf/filters/index1.html
Here's an example of two 'mips' generated with or without gauss filter.
Note that aliasing in an image remains there... which can be a wanted
effect anyway.
So; added the gauss filter as option in making mipmaps. Also had to
reshuffle the buttons there in a more logical manner.
There's also disabled code in the do_versions to set 'gauss' on in older
files. Will be enabled during release time.
sockets were not used yet... now they're verified on read, and written
in socket stack data on adding new nodes.
Also the buttons in Nodes use these values now. Special request from
Nathan Vegdahl who seems to be messing around with my precious nodes! :)
get sampled on larger distance. It actually just flattens bump when the
sampled area is (much) larger than pixel size, to prevent weird things
like:
current render:
http://www.blender.org/bf/b1.jpg
distance corrected:
http://www.blender.org/bf/b2.jpg
(image based on Alexander file :)
Tested on env's dinos too... seems to work, but we'll see.