I recently decided to try adding a Metropolis sampler (no BPT, just the sampler) to Cycles to get some experience with the Cycles code layout, but it turned out to work so well that I decided to post it here. Due to some problems, it's not releasable yet, but these problems (more below) should not be too hard to fix.
The code is based on the SmallLuxGPU metropolis sampler, after I first tried the PBRT code, but that one didn't seem to work nearly as well.
Basically, it works by bypassing the RNG functions: when they are called and Metropolis sampling is selected, they just return a value from a sample array that is stored in the RNG pointer. This allows the sampler to make only minimal changes to the kernel; it even uses the standard kernel_path_integrate. The sampler itself is in the CPUDevice thread.
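To make the bypass idea concrete, here is a minimal sketch. It is not the code from the patch: `SketchRNG`, `fallback_random` and `sketch_rng_1D` are hypothetical names, and the fallback generator is a plain LCG that stands in for whatever the regular sampler does.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical RNG state (assumption): mimics the idea of a sample array
// that is reachable through the RNG handle the kernel already passes around.
struct SketchRNG {
    bool use_metropolis = false;   // true when the Metropolis sampler drives the path
    std::vector<float> samples;    // values proposed by the sampler for this path
    std::uint32_t state = 1u;      // state of the stand-in fallback generator
};

// Stand-in for the regular random number path: a simple LCG, uniform in [0, 1).
inline float fallback_random(SketchRNG &rng) {
    rng.state = rng.state * 1664525u + 1013904223u;
    return (rng.state >> 8) * (1.0f / 16777216.0f);
}

// The kernel asks for "random number for dimension d" as usual. With Metropolis
// selected it gets the stored sample instead of a freshly generated number.
inline float sketch_rng_1D(SketchRNG &rng, std::size_t dimension) {
    if (rng.use_metropolis && dimension < rng.samples.size())
        return rng.samples[dimension];
    return fallback_random(rng);
}
```

The point of this layout is that the integrator code calling `sketch_rng_1D` does not need to know which sampler is active.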
My current test scene is a simple pool with a modifier-displaced water surface, an absorption volume in the water (works great, by the way), a glass pane on one side of the water and a Sun-HDR combination lighting the scene.
The upper image is the 2.69 release, while the lower one is with the metropolis patch. Both were rendered in equal time, but the patched version is a debug build.
From the Metropolis image, the biggest problem is obvious: Tiles. The current system requires one sampler per tile, so the seams are easily visible. Also, if one tile has large bright surfaces, the noise outside of those is worse than in other tiles. The solution for this would be one sampler per thread, all working on the whole image (which would require atomics for the buffer writes).
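A rough sketch of what "atomics for the buffer writes" could look like if several sampler threads splat into one shared image. `SharedBuffer` and `atomic_add_float` are hypothetical names for illustration, not part of Cycles or the patch, and the buffer here holds a single float per pixel for brevity.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Adds `value` to `target` atomically with a compare-exchange loop
// (std::atomic<float>::fetch_add only exists from C++20 on).
inline void atomic_add_float(std::atomic<float> &target, float value) {
    float expected = target.load(std::memory_order_relaxed);
    // On failure, `expected` is refreshed with the current value and we retry.
    while (!target.compare_exchange_weak(expected, expected + value,
                                         std::memory_order_relaxed)) {
    }
}

// Hypothetical whole-image buffer shared by all sampler threads.
struct SharedBuffer {
    int width;
    int height;
    std::vector<std::atomic<float>> pixels;

    SharedBuffer(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h) {
        for (auto &p : pixels)
            p.store(0.0f, std::memory_order_relaxed);
    }

    // Any thread may contribute to any pixel without holding a lock.
    void splat(int x, int y, float value) {
        atomic_add_float(pixels[static_cast<std::size_t>(y) * width + x], value);
    }
};
```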
Render passes are another problem, since they're currently written directly from the kernel. In Metropolis sampling, however, they need to be weighted, and this weight is only available after the kernel is done with the ray. A solution for this would be to return the values to be written to the buffer from the kernel, so that the Device is responsible for storing them. The depth, normal, ObjectID and Alpha passes could be done in a single pass of regular path tracing.
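One possible shape for "return the values from the kernel and let the Device store them", as a sketch only. `PassValue`, `KernelResult` and `store_weighted` are hypothetical names; the real pass layout in Cycles is more involved.

```cpp
#include <cstddef>
#include <vector>

// One value the kernel wants written: which slot of the pixel, and what value.
struct PassValue {
    std::size_t offset;  // offset of the pass inside the pixel's buffer slice
    float value;         // unweighted contribution computed by the kernel
};

// Everything the kernel produced for one path, returned instead of written.
struct KernelResult {
    std::vector<PassValue> passes;
};

// Device side: called once the sampler knows the weight of this path.
inline void store_weighted(std::vector<float> &buffer, const KernelResult &result,
                           std::size_t pixel_offset, float weight) {
    for (const PassValue &p : result.passes)
        buffer[pixel_offset + p.offset] += weight * p.value;
}
```

Because the write is deferred, the kernel itself never needs to know the weight.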
Also, there is currently a bug that causes standard path tracing to crash; I still have to find the source of this one.
However, this seems like a promising feature that might be worth the work fixing the problems above.
PS: A one-hour-render of the pool looks like this:
The patch is here: [metropolis.diff](https://archive.blender.org/developer/F75623/metropolis.diff)
Cool stuff, I really didn't expect this and it would definitely make for much easier rendering of caustics in Cycles (especially in conjunction with 'filter glossy' to help with those more difficult lightpaths).
By any chance (and this might be some crazy idea), but would it be possible to only apply the metropolis sampler for certain lightpath types (like the ones that create caustic effects), because you can already get away easily with plain pathtracing with plain diffuse bounces or other non-caustic situations (except for cases with tiny lights of course)?
Just an idea to toss around since Cycles has the functionality needed to obtain information from paths (hence the *light path* node).
I think storm_st should also look at this, perhaps he can move his bidirectional sampling efforts to work off of your code.
After patching, preview seems to be broken. Looks like only a couple of samples are updated into the viewport. F12 render works just fine.
Testing on Linux 64-bit. Very excited to play around with this! Thanks for the contribution.
The broken preview render is probably because every time the CPUDevice thread is called, a new sampler is created with the global integrator seed. This could be fixed by storing the sampler data in the tile, I'll try that once I'm at home.
I don't want to spoil the fun here (results look great), but isn't SmallLuxGPU GPL code?
I know that they relicensed the LuxRays code recently to Apache 2.0, so if your work is based on that, it's fine.
Some clarification here would be appreciated (link to the sources this patch is based on).
The code is based on https://bitbucket.org/luxrender/luxrays/src/48e44c150fd53c00a8dd8722efdcaf870a561da0/src/slg/sampler/sampler.cpp , which, according to the header, is Apache 2.0 licensed. I'll upload my test file as soon as I'm on my PC. I'm working on Linux x64, but I could also test it on Windows, although there shouldn't be any platform-dependent code in there.
Tested the patch on Mac OS now, with clang compiler. Same result.
I just open one of our test files: https://svn.blender.org/svnroot/bf-blender/trunk/lib/tests/cycles/ and switch to the MLT sampler.
Render is different, lot of errors.  (color_ramp.blend in this example).
@LukasStockner: I found the issue.
Your Metropolis code in device_cpu.cpp comes after the optimized kernels (AVX, SSE41....).
I added a quick #if 0 around those, so at runtime those are skipped and the non-optimized kernel (with your MLT code) gets used. :)
Edit: Ran some more tests now. Caustics are better with MLT, but other things (diffuse surfaces, background) are much more noisy. Good start though.
CC'ing @brecht, I guess he will find this interesting. :)
Ah, ok. I disabled them in my scons config, forgot about that -.-. Glad that it works now.
The overall noise in the image is pretty much expected, but a few tricks (clamping importance, user-provided importance map, noise-aware sampling) might help out with that. Also, Metropolis sampling is naturally better suited for long rendering times instead of quick previews.
Ok, so here's my test scene. I replaced the HDR with a sky node because it's too big to upload here.
[Testscene_Pool.blend](https://archive.blender.org/developer/F75670/Testscene_Pool.blend)
Another heads up: Using Filter Glossy seems to break rendering, not sure what's going on. I'm going to try to build on my Windows desktop now, been limited to a single core on my laptop so far.
New patch version: the SSE/AVX builds work now too, they just aren't executed if Metropolis is selected (of course, I will later add code for Metropolis to also work with those). Preview is still not fixed: the RenderTiles seem to be replaced every iteration, so I still have to find a way around this (storing the sampler in the CPUDevice doesn't work either).
The first patch is incremental from the old one, the second one is from trunk.
[metropolis_1_to_2.diff](https://archive.blender.org/developer/F75674/metropolis_1_to_2.diff)
[metropolis_2.diff](https://archive.blender.org/developer/F75675/metropolis_2.diff)
New patch version, render passes and SSE/AVX work now.
The UV, Normal, ID and Alpha passes are done in a one-sample path tracing prepass, while all other passes are done with the metropolis sampler.
SSE and AVX are now used for metropolis rendering as well.
The pure pathtracing doesn't crash anymore, apparently I fixed the bug along the way.
@MatthewHeimlich: Could you please upload a test file where Filter Glossy crashes? For me it works fine...
The next steps are now preview rendering and trying out the quasi-random extension in the Metropolis sampler of the regular LuxRender (no code copying this time, since that code is GPL).
Patch 2 to 3: [metropolis_2_to_3.diff](https://archive.blender.org/developer/F75812/metropolis_2_to_3.diff)
Trunk to Patch 3: [metropolis_3.diff](https://archive.blender.org/developer/F75813/metropolis_3.diff)
Hi Lucas, get crash with metropolis_3.diff and fed1b8b.
```
Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffc7fff700 (LWP 18135)]
0x00007ffff2850849 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff2850849 in raise () from /lib64/libc.so.6
#1  0x00007ffff2851cd8 in abort () from /lib64/libc.so.6
#2  0x00007ffff288f114 in __libc_message () from /lib64/libc.so.6
#3  0x00007ffff289496e in malloc_printerr () from /lib64/libc.so.6
#4  0x00007ffff2895647 in _int_free () from /lib64/libc.so.6
#5  0x000000000125d4f3 in ccl::RenderBuffers::device_free() ()
#6  0x000000000125d53c in ccl::RenderBuffers::~RenderBuffers() ()
#7  0x0000000001290c92 in ccl::Session::release_tile(ccl::RenderTile&) ()
#8  0x00000000012ab199 in boost::function1<void, ccl::RenderTile&>::operator()(ccl::RenderTile&) const ()
#9  0x00000000012ab9e2 in ccl::CPUDevice::thread_path_trace(ccl::DeviceTask&) ()
#10 0x000000000146aec8 in ccl::TaskScheduler::thread_run(int) ()
#11 0x0000000001294829 in ccl::thread::run(void*) ()
#12 0x00007ffff63c70db in start_thread () from /lib64/libpthread.so.0
#13 0x00007ffff290290d in clone () from /lib64/libc.so.6
(gdb)
```
[BlankComparisonScene_cycles.blend](https://archive.blender.org/developer/F75817/BlankComparisonScene_cycles.blend)
Cheers, mib.
@mib2berlin Thanks for the report. However, I can't reproduce it, even with fed1b8b and your file (by the way, two image textures were missing, but I can't imagine that they caused the bug...). Could you please build your Blender with debug info and post a backtrace again?
Either way, metropolis_4 should be ready soon, now stopping the render works again and the MCQMC extension works quite well, too.
Thanks. It does not crash with a debug build; I get a crash again with the release build, but with a different backtrace.
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffbffff700 (LWP 32593)]
0x00007ffff2894a82 in malloc_consolidate () from /lib64/libc.so.6
(gdb) bt
- 0 0x00007ffff2894a82 in malloc_consolidate () from /lib64/libc.so.6
- 1 0x00007ffff2895cd8 in _int_malloc () from /lib64/libc.so.6
- 2 0x00007ffff2898993 in calloc () from /lib64/libc.so.6
- 3 0x0000000001013a8f in MEM_lockfree_callocN ()
- 4 0x0000000000baf1a5 in render_result_new ()
- 5 0x0000000000b89f3a in RE_engine_begin_result ()
- 6 0x000000000122f034 in ccl::BlenderSession::do_write_update_render_tile(ccl::RenderTile&, bool) ()
- 7 0x0000000001290e81 in ccl::Session::update_tile_sample(ccl::RenderTile&) ()
- 8 0x00000000012bf88d in ccl::DeviceTask::update_progress(ccl::RenderTile&) ()
- 9 0x00000000012ab952 in ccl::CPUDevice::thread_path_trace(ccl::DeviceTask&) ()
- 10 0x000000000146afa8 in ccl::TaskScheduler::thread_run(int) ()
- 11 0x00000000012947c9 in ccl::thread::run(void*) ()
- 12 0x00007ffff63c70db in start_thread () from /lib64/libpthread.so.0
- 13 0x00007ffff290290d in clone () from /lib64/libc.so.6
(gdb)
Opensuse Linux 13.1/64
Intel i5 3770K
GTX 760
GTX 560Ti 448 Cores
Driver 331.20
Build 482823a
Maybe it is Linux only.
Cheers, mib.
Okay, this definitely looks like an out-of-bounds memory access, I'll look at it in Valgrind.
I'm on Linux too (Mint 16 64bit), so this shouldn't be a problem.
New patch version, MCQMC (quasi-random numbers in the Metropolis sampler) works now. Due to the lack of reference images I'm not exactly sure whether it's an improvement or not, but it's definitely not worse. Also, the render is stoppable again.
Regarding the memory bug: I fixed an out-of-bounds error (it didn't generate enough random numbers), however, I'm not sure whether this was the bug that caused the crash. Debugging this in Valgrind is notoriously difficult since it only appears sometimes and Blender in Valgrind is sloooow (30min and upwards for one iteration!).
I tried to bypass the Tile system when using metropolis, but it's such a central part of the Cycles code that this didn't work out at all. Instead, preview rendering now uses standard Path Tracing and F12 rendering still uses Tiles. This won't be as straightforward as I thought...
[metropolis_4.diff](https://archive.blender.org/developer/F75883/metropolis_4.diff)
Hi Lucas, metropolis_4.diff gives the same error as in my last post.
It crashes on every scene I test.
If I switch to Progressive Refine, it works.
With your patch it is not possible to render on the GPU; I get a CUDA error:
```
Compiling CUDA kernel ...
"/usr/local/cuda-5.0/bin/nvcc" -arch=sm_20 -m64 --cubin "/daten/blender-git/build/bin/2.69/scripts/addons/cycles/kernel/kernel.cu" -o "/home/pepo/.config/blender/2.69/cache/cycles_kernel_sm20_8022DE0DC7069375EE41B6065DB5DB0F.cubin" --ptxas-options="-v" --maxrregcount=32 --use_fast_math -I"/daten/blender-git/build/bin/2.69/scripts/addons/cycles/kernel" -DNVCC -D__KERNEL_CUDA_VERSION__=50
/daten/blender-git/build/bin/2.69/scripts/addons/cycles/kernel/kernel_passes.h(50): error: too few arguments in function call
/daten/blender-git/build/bin/2.69/scripts/addons/cycles/kernel/kernel_passes.h(52): error: too few arguments in function call
/daten/blender-git/build/bin/2.69/scripts/addons/cycles/kernel/kernel_passes.h(56): error: too few arguments in function call
.
.
.
```
But that is not important.
Thank you for your work, mib.
@LukasStockner, you should try [GCC Address Sanitizer](http://wiki.blender.org/index.php/Dev:Doc/Tools/Debugging/GCC_Address_Sanitizer), it is very fast, one can use it even with RelWithDebInfo or inside gdb. Here is the crashlog:
[P11](https://archive.blender.org/developer/P11.txt)
So it calls `float4 operator+(const float4& a, const float4& b)` inside `kernel_write_pass_float4` with a null reference.
New patch version, no new features this time, instead I fixed another bug and added a check for a possible error source.
The GCC Address Sanitizer showed only the bug I fixed, now all scenes I tested (including BlankComparisonScene_cycles.blend) run error- and crash-free, even with the Address Sanitizer. If Blender still crashes for someone, please try compiling with the Address Sanitizer (if possible, with a debug build) and post the output here.
The GPU building bug was fixed, but there may be other issues remaining. Sadly, I can't test it since my GPU is still Compute Capability 1.3. However, the metropolis sampler is currently CPU-only, and this will probably not change so soon, since Metropolis sampling, in contrast to Sobol sampling, is inherently a sequential algorithm. The only way to work around this is using many sampling chains in parallel (basically, this is ERPT), but this isn't nearly as efficient as using only a small number of chains on the CPU.
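To illustrate why a single chain is sequential, here is a textbook one-dimensional Metropolis chain. This is a generic sketch, not the sampler from the patch; `SketchLcg`, `run_chain` and the mutation size of 0.1 are assumptions chosen for illustration. Each iteration needs the state accepted in the previous one, so the loop cannot be split across threads without starting additional, independent chains.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Tiny deterministic generator so the sketch has no external dependencies.
struct SketchLcg {
    std::uint32_t state;
    float next() {
        state = state * 1664525u + 1013904223u;
        return (state >> 8) * (1.0f / 16777216.0f);  // uniform in [0, 1)
    }
};

// Runs one chain over [0, 1) and counts how often each bin is visited.
// Bins with higher importance should end up with more visits.
inline std::vector<int> run_chain(float (*importance)(float), int steps, int bins,
                                  std::uint32_t seed) {
    SketchLcg rng{seed};
    std::vector<int> histogram(bins, 0);
    float x = rng.next();
    float fx = importance(x);
    for (int i = 0; i < steps; ++i) {
        // Small symmetric mutation of the current state, wrapped into [0, 1).
        float y = x + 0.1f * (rng.next() - 0.5f);
        y -= std::floor(y);
        if (y >= 1.0f)
            y = 0.0f;  // guard against rounding up to exactly 1
        float fy = importance(y);
        // Metropolis acceptance: always move uphill, sometimes downhill.
        float accept = (fx > 0.0f) ? std::min(1.0f, fy / fx) : 1.0f;
        if (rng.next() < accept) {
            x = y;
            fx = fy;
        }
        // The next iteration depends on the x chosen here.
        int bin = std::min(bins - 1, static_cast<int>(x * bins));
        ++histogram[bin];
    }
    return histogram;
}
```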
Apart from bugfixes, the next steps will be the Tile system and better importance functions (some Noise-detecting algorithm could be useful, like the one described in http://graphics.cs.illinois.edu/papers/importance). Also, user-defined importance maps seem quite useful, but these would need a good UI. By the way, this should also work for non-metropolis sampling.
[metropolis_5.diff](https://archive.blender.org/developer/F76009/metropolis_5.diff)
Hi Lucas, no more crashes with the last patch with all my test files.
CUDA is working with my CC 2.0 and 3.0 cards.
Nice progress, thanks.
I got this message in the terminal during render:
```
Pixel sampling error, expect crashing!
```
But it renders fine.
Cheers, mib.
Whoa, sorry about that :/ But this at least shows where the error lies: somehow, the sampler goes outside of the image, I'll look at it. By the way: is it possible that you use an image mutation range > 1?
Ok, so I found another bug, this time it was related to floating-point math. In the Mutate function, there is code like
```
if (x < 0.0f) x += 1.0f;
if (x >= 1.0f) x -= 1.0f;
```
, so I thought it would always be 0 <= x < 1. However, apparently x is sometimes so slightly below 0 that the if is executed, but x + 1 gets rounded up to 1, which causes code like (int) (x * width) to sometimes evaluate to width, which then causes an out-of-bounds access.
To fix it, just add
```
if (s >= 1.f) s = 1.f-FLT_EPSILON;
if (s < 0.f) s = 0.f;
```
after line 301 in intern/cycles/device/device_cpu.cpp (This is just too small to make a metropolis_6.diff for it)
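The rounding effect can be reproduced in isolation. The following sketch uses hypothetical helper names (`shift_into_range`, `pixel_index`, `clamp_sample`) and is not the code from device_cpu.cpp; it only shows that a tiny negative float plus 1 rounds to exactly 1, and that the clamp quoted above brings the pixel index back into range.

```cpp
#include <cfloat>

// Shifts a slightly negative sample up by one, as a wrap-around would.
// For a tiny negative x, the exact sum is not representable as a float
// and rounds to exactly 1.0f.
inline float shift_into_range(float x) {
    return (x < 0.0f) ? x + 1.0f : x;
}

// Maps a sample to a pixel column; returns `width` (out of bounds) for 1.0f.
inline int pixel_index(float s, int width) {
    return static_cast<int>(s * width);
}

// The clamp suggested in the thread: keep the sample strictly below 1.
inline float clamp_sample(float s) {
    if (s >= 1.0f) s = 1.0f - FLT_EPSILON;
    if (s < 0.0f) s = 0.0f;
    return s;
}
```

With `width = 640`, `shift_into_range(-1e-9f)` yields `1.0f`, so the unclamped index is 640, while the clamped one is 639.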
Regarding the error message: I failed massively here. I forgot the negation in the check, so the line got printed when everything worked fine -.-
To remove it, just delete
```
if (m->samples[0] >= 0 && m->samples[1] >= 0 && m->samples[0] < 1 && m->samples[1] < 1)
printf("Pixel sampling error, expect crashing!\n");
```
, also in device_cpu.cpp
New patch version, this time I included the bugfixes described above and fixed another quite serious bug in the sample contribution, now the rendered images are way smoother and diffuse areas converge faster.
Also, as a test for noise-based Importance Sampling, I added a perceptual noise pass as described in the paper I posted recently (in fact, currently I replaced the Mist pass as my new pass didn't work, but that's just a temporary solution), both to the Metropolis sampler and to the standard path tracer. Especially for the PT, the results are really great. As an example, I rendered the Lego Bulldozer from http://www.blendswap.com/blends/view/72124 and these are the results:
To test it, just activate the Mist render pass and multiply it by some small value in the compositor since the output is usually > 1.
The patch is here, but as there seem to be conflicts with the current trunk, I recommend applying it to 3d8c106, since my local repo uses that one as origin/master.
[metropolis_6.diff](https://archive.blender.org/developer/F76506/metropolis_6.diff)
Here's a test of the noise map generator with some pokemon models I have lying around:
Seems to be working pretty well. Highlighting shadows, edges, furry spots, etc.
Don't know what I did wrong, but with metropolis_6.diff and 3d8c106 all shadows and reflections are wrong for me. But previous versions of this patch gave plausible results (except for dark rectangles).
I now rebased to the current trunk, everything seems to work fine. @Lockal, please try this one out as well, for me reflection/refraction works just fine (I really need a second test system, somehow bugs never show up on my system...)
Also, this version includes [D301](https://archive.blender.org/developer/D301) as a first step in noise-adaptive sampling.
[metropolis_7.diff](https://archive.blender.org/developer/F76596/metropolis_7.diff)
reflection/refraction working fine here on both patches. Something I should've pointed out before: In intern/cycles/util/util_color.h line 240, Clang/OS X doesn't like
```
return exp10(log_i);
```
Jens Verwiebe pointed out to me in IRC it works with:
```
return pow(10, log_i);
```
So I had it built with that.
Also, on adaptive sampling: While I won't claim to know how they work behind the scenes, the adaptive samplers in Vray and Mental Ray seem to work fine running on each tile individually. Might it be possible to try this with Cycles? (maybe only updating the noise map every *n* number of samples?)
Thanks for the exp10 thing, of course you're right (I think exp(log_i * log(10)), as described in the GNU libc documentation, should be even faster since log(10) is constant).
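A minimal sketch of that replacement, assuming nothing about the surrounding code in util_color.h. The helper name `portable_exp10` and the hard-coded constant are illustrative only and not taken from the patch:

```cpp
#include <cmath>

// Hypothetical helper: computes 10^x without the glibc-specific exp10().
// Uses the identity 10^x = e^(x * ln 10), with ln 10 as a constant,
// so only a single exp() call is needed at runtime.
inline float portable_exp10(float x)
{
    const float ln10 = 2.302585093f;  // natural logarithm of 10
    return std::exp(x * ln10);
}
```

Whether this is actually faster than `pow(10, log_i)` depends on the compiler and math library, so it would need to be measured.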
Regarding adaptive sampling: Basically, adaptive sampling in the tile works as well. However, consider this: The left half of the image is nearly noise-free, while the right half is very noisy. Now, with adaptive sampling, you want the right side to receive more samples. However, when the right side is rendered, the left side is possibly not rendered yet, so there is no way to do that. So, if you implement it that way, the samples can only be adaptively distributed *inside* the tile, but not *between* tiles.
Indeed, your last sentence is quite what I also intend to do, this is the reason for [D301](https://archive.blender.org/developer/D301): Rendering, for example, 10 samples with a tiled approach on the whole image and then creating a noise map that is used for inter- and intra-tile sample distribution which is then used for the next 10 samples.
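The inter-tile part of that idea could be sketched roughly as follows: after a uniform batch, each tile's mean noise decides its share of the next batch. The function name, the `min_samples` floor and the simple proportional rule are assumptions for illustration, not what D301 implements:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: split `budget` samples among tiles in proportion to
// each tile's mean noise estimate. Every tile gets at least `min_samples`.
// Fractions are truncated, so a few samples of the budget may stay unassigned.
std::vector<int> distribute_samples(const std::vector<float>& tile_noise,
                                    int budget, int min_samples)
{
    std::vector<int> result(tile_noise.size(), min_samples);

    float total = 0.0f;
    for (float n : tile_noise)
        total += n;

    const int remaining = budget - min_samples * static_cast<int>(tile_noise.size());
    if (total <= 0.0f || remaining <= 0)
        return result;  // nothing to distribute adaptively

    for (std::size_t i = 0; i < tile_noise.size(); i++)
        result[i] += static_cast<int>(remaining * (tile_noise[i] / total));

    return result;
}
```

With two tiles whose noise estimates are 1 and 3, a budget of 50 and a floor of 5, the noisy tile receives 35 samples and the clean one 15.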
Guys, you are doing such a good job! If you need some test machines, I have an i5 laptop and 2 MacBook Pro laptops. They are totally free, so I can set up some long renders; just give me the link to download and say what to do :>
How did you plan the adaptive sampling to work? Adaptively distributing a fixed number of samples across the image? Or setting a range of possible AA samples and letting each tile cut off where needed in that range? Because Vray/MR do the latter, and that avoids the problem of some tiles needing more samples than others. For example:
You set a min and max number of AA samples, and some target noise threshold. Once a tile reaches the min value, it checks noise level every *n* samples, and stops upon it either falling below the threshold, or hitting the max AA samples value. This way you can keep the coherency of individual tiles, but still let some tiles have far more samples than others.
Oops, with F12 render everything is ok (except for tiles, of course). But "rendered" viewport mode has very obvious problems. I did a full recompilation with metropolis_7 patch (gcc 4.8, linux x86-64), but the problem is still there.
@JasonClarke The latter one is actually quite a great idea; it would even add a stopping criterion that could be useful for render farms, animation rendering etc. The only question is whether the average noise of the tile or the maximum noise in the tile is considered. Alternatively, we could go the LuxRender way and add a user-provided percentage of pixels that has to pass the test.
My original idea was to use code like in the EnvMap importance sampling to map the uniform sample values to noise-accordingly distributed ones. This might be added as an option.
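As a rough illustration of that mapping: a cumulative distribution is built over the importance values, and a uniform random number is then inverted through it, so pixels are picked with probability proportional to their importance. This is a simplified 1D sketch with made-up function names; the environment map code works on a 2D distribution and is not reproduced here:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Build a normalized cumulative distribution over per-pixel importance.
// Assumes a non-empty input whose values are non-negative and sum to > 0.
std::vector<float> build_cdf(const std::vector<float>& importance)
{
    std::vector<float> cdf(importance.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < importance.size(); i++) {
        sum += importance[i];
        cdf[i] = sum;
    }
    for (float& c : cdf)
        c /= sum;
    return cdf;
}

// Map a uniform random number u in [0, 1) to a pixel index, so that a pixel
// is chosen with probability proportional to its importance.
std::size_t sample_pixel(const std::vector<float>& cdf, float u)
{
    auto it = std::upper_bound(cdf.begin(), cdf.end(), u);
    return std::min<std::size_t>(it - cdf.begin(), cdf.size() - 1);
}
```

A pixel with zero importance is never selected, which is why some uniform warmup samples are still needed before such a map can be trusted.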
@MaciejJutrzenka Currently, you have to build it from source (see http://wiki.blender.org/index.php/Dev:Doc/Building_Blender), but between downloading the source and building you have to apply the metropolis_7 patch from above.
Do both! Use the importance sampling within the tile to hit the threshold faster. As for avg noise vs max noise vs changed pixels vs pixels below threshold, I don't really know. Might be best to just give several options so people can test. After running it through some scenes, you could hide/disable modes that prove unreliable.
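To make the options concrete, three of the criteria mentioned in this discussion (average noise, maximum noise, fraction of pixels below the threshold) could sit behind one switch, roughly like this. All names and the 95% default are hypothetical:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

enum class StopMode { Average, Maximum, PixelFraction };

// Hypothetical convergence test for one tile, given a noise value per pixel.
// required_fraction is only used by StopMode::PixelFraction.
bool tile_converged(const std::vector<float>& pixel_noise, float threshold,
                    StopMode mode, float required_fraction = 0.95f)
{
    if (pixel_noise.empty())
        return true;

    switch (mode) {
    case StopMode::Average: {
        float sum = 0.0f;
        for (float n : pixel_noise)
            sum += n;
        return sum / pixel_noise.size() < threshold;
    }
    case StopMode::Maximum:
        return *std::max_element(pixel_noise.begin(), pixel_noise.end()) < threshold;
    case StopMode::PixelFraction: {
        const auto passed = std::count_if(pixel_noise.begin(), pixel_noise.end(),
                                          [threshold](float n) { return n < threshold; });
        return passed >= required_fraction * pixel_noise.size();
    }
    }
    return false;
}
```

A single noisy pixel in an otherwise clean tile shows the difference: the average test passes, the maximum test fails, and the fraction test depends on the chosen percentage.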
lukasstockner97,
Do I read this right that there might soon be a "noise-based Importance Sampling", so Cycles spends more time where the actual noise is and not where the result is already clean?
That would be terrific!
@ClaasKuhnen Yes, that's currently the plan...
@MaciejJutrzenka Sadly, not yet.
@JasonClarke The only thing missing for this is an actual noise estimate, since the current output is in fact a variance estimate (more precisely, a visually weighted RMS estimate), not a noise estimate. Basically, its value is proportional to the difficulty, not to the remaining noise. On the one hand, this is great, since we can sample directly from it, but on the other hand, to determine the noise left in a tile we'd need a separate estimate, although easier methods are available for this.
I'm currently working on adaptive sampling, I'll post a new patch once it works well enough.
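The distinction between difficulty and remaining noise can be illustrated with plain statistics: the per-sample variance of a pixel stays roughly constant as samples accumulate, while the error of the averaged pixel value shrinks with the sample count. The struct below is a generic sketch using Welford's online algorithm, not the visually weighted estimate from the patch:

```cpp
#include <cmath>

// Running statistics for one pixel (Welford's online algorithm).
struct PixelStats {
    int n = 0;          // number of samples seen
    double mean = 0.0;  // running mean of the sample values
    double m2 = 0.0;    // sum of squared deviations from the mean

    void add(double x)
    {
        n++;
        const double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    // "Difficulty": per-sample variance, roughly independent of n.
    double variance() const { return n > 1 ? m2 / (n - 1) : 0.0; }

    // "Remaining noise": standard error of the mean, shrinks like 1/sqrt(n).
    double standard_error() const { return n > 0 ? std::sqrt(variance() / n) : 0.0; }
};
```

A stopping criterion would look at something like `standard_error()`, while sample distribution can use `variance()` directly.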
@lukasstockner97
Regarding difficulty vs remaining noise, what about the idea shown here? (http://blenderartists.org/forum/showthread.php?236453-Measuring-Noise-in-Cycles-Renders&p=1985872&viewfull=1#post1985872)
New patch version: adaptive sampling now works in PT mode, and the Noise-Aware Metropolis shouldn't be too hard now either.
I haven't added a stopping criterion yet. This will require a redesign of the Tile Manager, since currently all of Cycles is based on the assumption that the number of samples per pixel is the same for every pixel (at least in a tile).
The current patch works around this by using a Sample number pass that is used in render/buffers.cpp (you can see the values of the pass in the shadow pass, which I currently use until I figure out how to output a new pass).
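In principle, such a sample number pass means the accumulated value of each pixel is divided by that pixel's own count instead of one global sample number. The following is only a sketch of that idea with a made-up function name; the actual buffer layout in render/buffers.cpp is not reproduced here:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: turn accumulated radiance into final pixel values by
// dividing each pixel by its own sample count. Both buffers must have the
// same size; pixels without samples stay at zero.
std::vector<float> normalize_by_sample_count(const std::vector<float>& accumulated,
                                             const std::vector<float>& sample_count)
{
    std::vector<float> result(accumulated.size(), 0.0f);
    for (std::size_t i = 0; i < accumulated.size(); i++) {
        if (sample_count[i] > 0.0f)
            result[i] = accumulated[i] / sample_count[i];
    }
    return result;
}
```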
Also, at the moment adaptive sampling only works inside individual tiles; every tile gets the same amount of total work (this will change in the future, I plan to distribute samples to the tiles according to their mean importance). So, if you have one tile with high variance throughout and one with basically zero variance, they will still be sampled equally.
Under the performance options, you can check adaptive sampling. There are two options. The first is Adaptive Warmup, which sets the number of uniform samples taken per pixel to estimate importance before the adaptive sampling starts. Don't set this too low (under ~10) or it might miss difficult regions. The second one is the importance map interval, which is the number of samples taken until a new importance map is calculated. Values that are too low might give a significant performance hit.
Speaking of performance: the code is *not* optimized yet. In particular, the 4-pixel Gaussian blur on the importance map is probably a big performance hit (maybe a simple box filter would suffice?). Also, when using progressive mode, a new map is computed every sample, so it's probably quite slow.
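For comparison, a simple box filter over the importance map could look like the sketch below. The function name and the row-major float buffer are assumptions, and this naive version still visits every neighbour of every pixel; a separable or running-sum variant would be the real speedup:

```cpp
#include <vector>

// Naive box blur over a row-major w*h float image. Each output pixel is the
// average of all input pixels within `radius` that lie inside the image,
// so borders are averaged over fewer pixels.
std::vector<float> box_blur(const std::vector<float>& img, int w, int h, int radius)
{
    std::vector<float> out(img.size());
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            float sum = 0.0f;
            int count = 0;
            for (int dy = -radius; dy <= radius; dy++) {
                for (int dx = -radius; dx <= radius; dx++) {
                    const int sx = x + dx, sy = y + dy;
                    if (sx < 0 || sx >= w || sy < 0 || sy >= h)
                        continue;  // skip samples outside the image
                    sum += img[sy * w + sx];
                    count++;
                }
            }
            out[y * w + x] = sum / count;
        }
    }
    return out;
}
```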
GPU is not included yet, but, in contrast to Metropolis, adaptive sampling should be possible there too.
@lsscpp This might work. Using a more complex visual difference predictor would probably be even better, but they tend to be much slower than the approach you posted. An alternative is just a fixed number of samples per unit of importance; I think I'll add an option to choose between stopping criteria.
[metropolis_8.diff](https://archive.blender.org/developer/F78116/metropolis_8.diff)
Wow! Can't wait to test it (whenever a build appears on graphicall). About more advanced difference predictors, I remember I posted somewhere a link to a paper named something like "Entropy variance". That could be something worth looking at.
Quick test on the Mike Pan BMW scene:
No adaptive sampling (2:39): 
Adaptive sampling on (warmup=25, map update=25, 3:03):
@JasonClarke OK, this isn't much improvement. What rendering settings did you use (sample number, tile size)?
@lsscpp Once the adaptive sampling works in Metropolis mode, I'll post Windows and Linux (both x64) builds on graphicall; for other platforms (x86 and Mac), somebody using them would have to post a build. Maybe once these are online we should put a link on blenderartists to get some beta testers. Concerning the entropy paper: it certainly looks interesting, do you have any information regarding realtime performance?
The rest of the settings on that test were the defaults for the BMW scene, so 128x64 tiles, 200 progressive samples. I'm not sure it's realistic to expect much better when we still have the same number of AA samples on all tiles. Large tile sizes have a performance hit of their own in CPU mode, so just making them bigger isn't realistic either. I think we really need to wait until there's an option to stop some tiles before max samples (that way you can just pad out the max AA value and only use it on the tricky tiles).
@LukasStockner no, unfortunately i have no clue about performance
@JasonClarke can you please make another BMW test with no adaptive sampling, letting Cycles run for 3:03 as well, so we can see how much better the algorithm distributed the samples in the same amount of time?
New patch version, this time with the focus again on Metropolis. The last patch broke it due to the samples pass, now I fixed the bug. Also, there was another pretty serious one where samples were written to the wrong pixels, this is fixed now too. Preview rendering should work now as well.
Noise-adaptive sampling in Metropolis doesn't work yet, but if you check adaptive sampling, it uses another trick from the importance sampling paper I posted a while ago: the importance function is divided by the current brightness of the pixel, so that the samples are distributed more evenly (by default, the Metropolis sampler samples according to brightness). By doing so, it focuses more on good light paths instead of just bright regions. You can see this in the Shadow (number of samples) and Emission (importance) channels (by the way, sorry for overriding all the default channels...). I haven't run extensive tests with this one yet, but it seems to give a nice enhancement.
Regarding tests: I rendered the Sintel hair scene from http://www.blenderguru.com/videos/how-to-render-hair-with-cycles/ in two instances of Blender, one with Metropolis and one without. Both had 2 threads and two tiles, one covering the top half and one the bottom half. After 2 hours, these were the results:
Metropolis:

Pathtracing:

As you can see, the Metropolis version has remarkably less noise, especially around the neck and the lower part of the hair. Also, this shows that Metropolis converges to the correct solution.
Another test is the pool scene, also 2 threads and 12 hours (probably less would have been enough):

The next patch might take 1-3 weeks since I have to work on another project for school (it's also rendering/CGI, so I won't get out of practise), but then the Tiling issues should be gone.
[metropolis_9.diff](https://archive.blender.org/developer/F78599/metropolis_9.diff)
It turned out I forgot to save some code before creating the diff, so here is the correct version:
[metropolis_10.diff](https://archive.blender.org/developer/F78621/metropolis_10.diff)
Originally I wanted to release a new patch once all the tiling stuff is done, but I randomly found that a small change in the sample number pass causes the noise on diffuse areas/background to be completely **gone**. For example, a plane under a sky background is now nearly noise-free in a *single pass*.
Also, Metropolis sampling is now tile-free, every thread works on the whole image. This means that they can now also share the mean importance calculation which caused the brightness differences between tiles.
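When several threads splat into the same framebuffer, concurrent writes to one pixel have to be made safe somehow. One common approach is an atomic float add built from a compare-and-swap loop, sketched below. Whether the patch does it this way is not stated here, and the function name is made up:

```cpp
#include <atomic>

// Atomically add `value` to `target` using a compare-and-swap loop.
// If another thread changes the pixel between the load and the exchange,
// compare_exchange_weak refreshes `expected` and the loop retries.
inline void atomic_add_float(std::atomic<float>& target, float value)
{
    float expected = target.load(std::memory_order_relaxed);
    while (!target.compare_exchange_weak(expected, expected + value,
                                         std::memory_order_relaxed)) {
        // `expected` now holds the latest value; try again
    }
}
```

Relaxed ordering is enough as long as the buffer is only read after all render threads have been joined.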
Somehow, screen-wide sampling seems to break automatic EXR writing, for example, when rendering the Sintel hair scene, after finishing/cancelling the render, Blender freezes with an OpenEXR error on the console, I'll look into this further.
Adaptive sampling is still only per tile, also the Division-By-Mean-Brightness was disabled for now. Both will be fixed in a later patch version, of course.
Just to give an impression of what the patch is capable of now, consider this scene:

That's the Villa scene from PBRT, imported into Blender with a quick and hacky pbrt-to-obj converter and rendered overnight. The materials are redone with Cycles. Nearly every surface in this scene has a glossy component, even the wood and the walls/ceilings. The light comes from small emitters completely behind glass, two point light sources inside the spherical things on the ceiling and, although only a very small amount, from the TV. For pure path tracing, this scene is basically the worst case, while with Metropolis even the area to the left, illuminated through three glass panes, is nearly noise-free. The noise remaining in the back section of the room should get better once the Division-By-Mean-Brightness (this name is horrible...) works again. The dark artifacts around the light sources come from tonemapping.

This is the classic pool scene, rendered in just 8min.
[metropolis_11.diff](https://archive.blender.org/developer/F79601/metropolis_11.diff)
Hi, I would like to test the new development but got a build error.
Linking CXX static library ../../../lib/libbf_intern_cycles.a
[ 84%] Built target bf_intern_cycles
[ 84%] Built target cycles_bvh
[ 84%] Building CXX object intern/cycles/device/CMakeFiles/cycles_device.dir/device_cpu.cpp.o
In file included from /daten/blender-git/blender/intern/cycles/device/device_cpu.cpp:45:0:
/daten/blender-git/blender/intern/cycles/device/../util/util_metropolis.h:21:33: fatal error: kernel/kernel_types.h: No such file or directory
#include <kernel/kernel_types.h>
@mib2berlin
Sorry, my fault, it seems like you have a remarkably strict compiler^^
[metropolis_12.diff](https://archive.blender.org/developer/F79651/metropolis_12.diff), that should fix it.
@MaciejJutrzenka
8 minutes on an FX8350, I'm glad you guys like it. Regarding the builds, for this it has to be accepted into trunk, but IMO it's not ready for code review yet: the features aren't complete and the code style is rather messy. But once I get the Windows build system running, I'll post Windows/Linux builds on GraphicAll. Sadly, I can't build for Mac.
Nope, it is really strict. ^^
gcc (SUSE Linux) 4.8.1
```
Linking CXX static library ../../../lib/libbf_intern_cycles.a
[ 84%] Built target bf_intern_cycles
[ 84%] Built target cycles_bvh
[ 84%] Building CXX object intern/cycles/device/CMakeFiles/cycles_device.dir/device_cpu.cpp.o
In file included from /daten/blender-git/blender/intern/cycles/device/device_cpu.cpp:45:0:
/daten/blender-git/blender/intern/cycles/device/../util/util_metropolis.h:21:33: fatal error: kernel/kernel_types.h: No such file or directory
#include "kernel/kernel_types.h"
                                ^
compilation terminated.
```
Thanks for the fast reply, mib.
OS X/Clang is complaining about something else:
```
Compiling ==> 'device_cpu.cpp'
In file included from intern/cycles/device/device_cpu.cpp:45:
intern/cycles/util/util_metropolis.h:80:60: error: use of undeclared identifier 'ulong'; did you mean 'long'?
rng = lcg_init(hash_int_2d(kg->__data.integrator.seed, ((ulong) this & 0xffffffff) ...
                                                         ^~~~~
                                                         long
intern/cycles/util/util_metropolis.h:80:90: error: use of undeclared identifier 'ulong'; did you mean 'long'?
...= lcg_init(hash_int_2d(kg->__data.integrator.seed, ((ulong) this & 0xffffffff) ^ ((ulong) this & (0...
                                                                                      ^~~~~
                                                                                      long
2 errors generated.
scons: *** [/Volumes/Home/Jason/Developer/Blender/build/darwin/intern/cycles/device/device_cpu.o] Error 1
scons: building terminated because of errors.
```
@mib2berlin
The include thing is really strange, try if this one works better. If not, I'm out of ideas.
@JasonClarke
This part was probably overkill, now it should only give a warning about precision loss, which is no problem.
OK, I really need to install a second compiler :D
[metropolis_13.diff](https://archive.blender.org/developer/F79652/metropolis_13.diff)
Lukas, maybe with the new error you know more:
```
Linking CXX static library ../../../lib/libbf_intern_cycles.a
[ 84%] Built target bf_intern_cycles
[ 84%] Built target cycles_bvh
[ 84%] Building CXX object intern/cycles/device/CMakeFiles/cycles_device.dir/device_cpu.cpp.o
In file included from /daten/blender-git/blender/intern/cycles/device/device_cpu.cpp:44:0:
/daten/blender-git/blender/intern/cycles/device/../util/util_importance.h: In function ‘float* ccl::variance_to_importance(float*, ccl::KernelFilm*, int, int, int, int, int, int, int)’:
/daten/blender-git/blender/intern/cycles/device/../util/util_importance.h:28:8: warning: no previous declaration for ‘float* ccl::variance_to_importance(float*, ccl::KernelFilm*, int, int, int, int, int, int, int)’ [-Wmissing-declarations]
float* variance_to_importance(float *buffer, KernelFilm* film, int stride, int pass_stride, int offset, int x_ofs, int y_ofs, int width, int height) {
       ^
In file included from /daten/blender-git/blender/intern/cycles/device/device_cpu.cpp:45:0:
/daten/blender-git/blender/intern/cycles/device/../util/util_metropolis.h: In constructor ‘ccl::Metropolis::Metropolis(ccl::KernelGlobals*, double*, double*)’:
/daten/blender-git/blender/intern/cycles/device/../util/util_metropolis.h:81:65: error: cast from ‘ccl::Metropolis*’ to ‘ccl::uint {aka unsigned int}’ loses precision [-fpermissive]
rng = lcg_init(hash_int_2d(kg->__data.integrator.seed, (uint) this));
                                                              ^
make- [x]: *** [intern/cycles/device/CMakeFiles/cycles_device.dir/device_cpu.cpp.o] Error 1
make- [x]: *** [intern/cycles/device/CMakeFiles/cycles_device.dir/all] Error 2
make: *** [all] Error 2
```
Sorry for so much trouble, mib.
Clang barfs there too:
```
In file included from intern/cycles/device/device_cpu.cpp:45:
intern/cycles/util/util_metropolis.h:81:58: error: cast from pointer to smaller type 'uint' (aka 'unsigned int') loses
information
rng = lcg_init(hash_int_2d(kg->__data.integrator.seed, (uint) this));
^~~~~~~~~~~
1 error generated.
scons: *** [/Volumes/Home/Jason/Developer/Blender/build/darwin/intern/cycles/device/device_cpu.o] Error 1
scons: building terminated because of errors.
```
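Both GCC and Clang reject `(uint) this` here because on a 64-bit build the pointer is wider than the 32-bit `uint`. One possible way around it, shown only as a sketch, is to go through `std::uintptr_t` and fold the upper half into the lower half. The helper name `pointer_to_seed` is hypothetical and not part of the patch; how the later patch actually resolved the error is not stated in this thread.

```cpp
#include <cstdint>

// Hypothetical helper (not from the patch): turn an object address into a
// 32-bit value that can be fed to a hash or RNG seed function.
inline std::uint32_t pointer_to_seed(const void *ptr)
{
    // uintptr_t is wide enough to hold a pointer on the target platform.
    const std::uint64_t bits =
        static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(ptr));
    // XOR the upper 32 bits into the lower 32 bits so that both halves of a
    // 64-bit address contribute to the seed.
    return static_cast<std::uint32_t>(bits ^ (bits >> 32));
}

// Assumed usage at the failing line in util_metropolis.h:
//   rng = lcg_init(hash_int_2d(kg->__data.integrator.seed, pointer_to_seed(this)));
```

The explicit narrowing cast at the end makes the truncation intentional, so the compilers no longer treat it as an accidental loss of precision.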
@mib2berlin, @JasonClarke Sorry for all these errors, I hope this patch finally works.
@MatthewHeimlich Not really since I never used TortoiseGit, but just try this one.
[metropolis_14.diff](https://archive.blender.org/developer/F79662/metropolis_14.diff)
Ok, patch 14 did the trick! Here's two test renders. The thing that hit me immediately was how slow MLT mode is now. It took 90 seconds to do 10 samples. Progressive can do a 120-sample render in the same amount of time. (here's the outputs of those two)
Progressive:

MLT:

(also, something seems funny with the volume sphere in MLT)
Here's a 10 sample render with the BMW, took about 8mins.
Progressive on that same scene is 2:40 for 200 samples.
@JasonClarke Yes, the speed is indeed extremely low. At the moment, I can think of three reasons for this:
First, sampling overhead. A high max bounce value in particular can slow it down significantly; lazy sample generation could help there. However, the impact of this should shrink as scene complexity increases.
Second, caching. In classical PT, most of the geometry should still be in cache from the previous ray, while in Metropolis, at least for large mutations, this is usually not the case. This issue probably grows with scene complexity.
Third, BVH traversal. I haven't looked at the Cycles BVH yet, but considering it was originally targeted at GPUs, there is a good chance that it is designed for high ray coherence, which, as said above, Metropolis does not provide.
Running a full Valgrind profiling session would probably give some hints as to which of these is responsible.
@ThomasDinges: Thanks, I'll include it in the next patch.
Hm, maybe a Windows problem.
I rendered the test file here; it is slow, but with path tracing it would be impossible in any reasonable time. :)

It needs 16 minutes on an i5.
By the way, the default cube needs 20 seconds here.
Cheers, mib.
Lucas, I started a thread on BA about your work.
Maybe you can jump in once you have finished the Win/Linux builds for users.
http://www.blenderartists.org/forum/showthread.php?329089-Cycles-MLT-patch
Cheers, mib.
@ThomasDinges Awesome, my build system on Windows somehow refuses to work, so thanks!
@mib2berlin Thanks, certainly a good idea.
Unfortunately, my BA account was blocked by the spam filter, so I currently have to wait for manual activation...
Also, I only have access to a slow laptop and an even slower WiFi until Friday, so, while I will continue to work on the patch, I probably won't make much progress.
Over SSH, though, I will try to run a profiler session on my PC at home; maybe this will give some more information on where the bottleneck is.
Once my BA account works, I'll post a somewhat more detailed description there. The next steps IMO are a working screen-wide sample distribution (probably the best approach is to let every tile run ~25 samples, then calculate a sample distribution; any tile that has received more samples than its allocated sample budget is stopped) and speed improvement (the main question here is whether BVH and cache or the sampler itself are the bottleneck).
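The per-tile budget idea from the comment above could look roughly like the following sketch. Everything in it is an assumption made for illustration: the `TileState` struct, the use of a single "importance" number per tile measured during the ~25 warm-up samples, and the proportional split are not taken from the patch.

```cpp
#include <vector>

// Hypothetical per-tile bookkeeping; names are illustrative only.
struct TileState {
    double importance;   // e.g. mean brightness measured during the warm-up samples
    long samples_done;   // samples rendered in this tile so far
    long budget;         // total samples this tile is allowed to render
};

// Split `total_samples` across all tiles in proportion to their importance.
void assign_budgets(std::vector<TileState> &tiles, long total_samples)
{
    double sum = 0.0;
    for (const TileState &t : tiles)
        sum += t.importance;

    for (TileState &t : tiles) {
        // Fall back to an even split if no tile reported any importance.
        const double share = (sum > 0.0) ? t.importance / sum
                                         : 1.0 / static_cast<double>(tiles.size());
        t.budget = static_cast<long>(share * static_cast<double>(total_samples));
    }
}

// A tile is stopped once it has used up its budget.
bool tile_should_stop(const TileState &t)
{
    return t.samples_done >= t.budget;
}
```

With two tiles of importance 3 and 1 and 100 samples in total, the first tile would be allowed 75 samples and the second 25, so a second tile that already ran its 25 warm-up samples would stop immediately.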
New patch version, this time with an optimization! The bottleneck turned out to be the sampling, and the fix was quite easy...
Basically, if you have 8 max. bounces and 8 max. transparent bounces, it generated 16*12 = 192 sample values, while often only 1-2 bounces were needed. Now, the samples are generated on demand.
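On-demand generation as described above can be sketched like this. The `LazySamples` class is hypothetical and only illustrates the idea; the real sampler in the patch also has to handle mutation and acceptance, which is left out here.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for the sampler's per-path sample array.
class LazySamples {
public:
    LazySamples(std::size_t max_dims, unsigned seed)
        : values_(max_dims, 0.0f), ready_(max_dims, false), rng_(seed) {}

    // Return the value for dimension `dim`, drawing it on first use only.
    float get(std::size_t dim)
    {
        if (!ready_[dim]) {
            values_[dim] = dist_(rng_);
            ready_[dim] = true;
            ++generated_;
        }
        return values_[dim];
    }

    // Number of values that were actually drawn so far.
    std::size_t generated() const { return generated_; }

private:
    std::vector<float> values_;
    std::vector<bool> ready_;
    std::mt19937 rng_;
    std::uniform_real_distribution<float> dist_{0.0f, 1.0f};
    std::size_t generated_ = 0;
};
```

For the 192-dimension case mentioned above, a path that terminates after one or two bounces only pays for the handful of dimensions it actually reads, instead of all 192 up front.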
For testing I used 2 scenes: One, the default cube at 3 samples, and two, a level-2-subdivided glass Suzanne lying on a plane at 40 samples. Both used one thread.
Results:

| Patch | Scene | PT | Metropolis |
| -- | -- | -- | -- |
| Metro 14 | Glass | 3.4 s | 12 s |
| Metro 14 | Cube | 0.7 s | 9.6 s |
| Metro 15 | Glass | 3.5 s | 4 s |
| Metro 15 | Cube | 0.7 s | 1.7 s |
By the way, and highly off-topic: I just found Embree, an optimized BVH library for CPUs, written by Intel. Considering we currently use NVIDIA code on both GPU and CPU, Intel code might be better optimized for CPUs. The API looks quite nice; I'm currently trying it out in a private VCM implementation of mine.
[metropolis_15.diff](https://archive.blender.org/developer/F79930/metropolis_15.diff)
Nice speedups!
Doesn't link on Windows unfortunately: (also the log(10) issue in util_color is still present)
```
bf_intern_cycles.lib(device_cpu.obj) : error LNK2019: unresolved external symbol "public: __cdecl ccl::Metropolis::Metropolis(struct ccl::KernelGlobals *,double *,double *,class ccl::RenderTile *)" (??0Metropolis@ccl@@QEAA@PEAUKernelGlobals@1@PEAN1PEAVRenderTile@1@@Z) referenced in function "public: void __cdecl ccl::CPUDevice::thread_path_trace(class ccl::DeviceTask &)" (?thread_path_trace@CPUDevice@ccl@@QEAAXAEAVDeviceTask@2@@Z).
```
Patch 15 fails to link for me (on linux):
```
../../lib/libcycles_device.a(device_cpu.cpp.o): In function `ccl::CPUDevice::thread_path_trace(ccl::DeviceTask&)':
device_cpu.cpp:(.text._ZN3ccl9CPUDevice17thread_path_traceERNS_10DeviceTaskE[_ZN3ccl9CPUDevice17thread_path_traceERNS_10DeviceTaskE]+0x114): undefined reference to `ccl::Metropolis::Metropolis(ccl::KernelGlobals*, double*, double*, ccl::RenderTile*)'
device_cpu.cpp:(.text._ZN3ccl9CPUDevice17thread_path_traceERNS_10DeviceTaskE[_ZN3ccl9CPUDevice17thread_path_traceERNS_10DeviceTaskE]+0x2ab): undefined reference to `ccl::Metropolis::consider_sample(float, ccl::float4, float*, float*, ccl::PassData&)'
device_cpu.cpp:(.text._ZN3ccl9CPUDevice17thread_path_traceERNS_10DeviceTaskE[_ZN3ccl9CPUDevice17thread_path_traceERNS_10DeviceTaskE]+0x2c2): undefined reference to `ccl::Metropolis::end_sample(int)'
device_cpu.cpp:(.text._ZN3ccl9CPUDevice17thread_path_traceERNS_10DeviceTaskE[_ZN3ccl9CPUDevice17thread_path_traceERNS_10DeviceTaskE]+0x2e5): undefined reference to `ccl::Metropolis::next_sample()'
../../lib/libcycles_kernel.a(kernel.cpp.o): In function `ccl::kernel_volume_integrate_heterogeneous(ccl::KernelGlobals*, ccl::PathState*, ccl::Ray*, ccl::ShaderData*, ccl::PathRadiance*, ccl::float3*, unsigned int*)':
kernel.cpp:(.text+0x39942): undefined reference to `ccl::metro_get_sample(ccl::Metropolis*, int)'
kernel.cpp:(.text+0x39a49): undefined reference to `ccl::metro_get_sample(ccl::Metropolis*, int)'
../../lib/libcycles_kernel.a(kernel.cpp.o): In function `ccl::kernel_volume_integrate_homogeneous(ccl::KernelGlobals*, ccl::PathState*, ccl::Ray*, ccl::ShaderData*, ccl::PathRadiance*, ccl::float3*, unsigned int*)':
kernel.cpp:(.text+0x3e47c): undefined reference to `ccl::metro_get_sample(ccl::Metropolis*, int)'
kernel.cpp:(.text+0x3e4dd): undefined reference to `ccl::metro_get_sample(ccl::Metropolis*, int)'
kernel.cpp:(.text+0x4062d): undefined reference to `ccl::metro_get_sample(ccl::Metropolis*, int)'
../../lib/libcycles_kernel.a(kernel.cpp.o):kernel.cpp:(.text+0x40713): more undefined references to `ccl::metro_get_sample(ccl::Metropolis*, int)' follow
collect2: error: ld returned 1 exit status
source/creator/CMakeFiles/blender.dir/build.make:281: recipe for target 'bin/blender' failed
```
*We use some code from Embree 1.x, in BVH Traversal and BVH build, yes*
Hmm, since Embree is at 2.2 now,
I wonder whether the Embree BVH has been updated much since 1.x.
*i wonder if the embree bvh has been updated much since 1.x*
According to [this PDF](http://embree.github.io/data/embree-siggraph-2013-final.pdf) (their SIGGRAPH presentation for Embree 2.0), there are significant speedups, and that covers only the code up to 2.0, not the further optimizations in 2.1 and 2.2.
, const btVector3&, btScalar, int)' defined but not used [-Wunused-function]
[ 95%] C:\Blendersvn\blender\extern\bullet2\src\BulletSoftBody\btSoftBodyInterna
ls.h:507:17: warning: 'void EvaluateMedium(const btSoftBodyWorldInfo*, const btV
ector3&, btSoftBody::sMedium&)' defined but not used [-Wunused-function]
Building C object source/blender/makesrna/intern/CMakeFiles/makesrna.dir/rna_tex
t.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/makesrna.dir/rna_tim
eline.c.obj
[ 95%] [ 95%] [ 95%] Building C object source/blender/makesrna/intern/CMakeFiles
/makesrna.dir/rna_tracking.c.obj
[ 95%] [ 95%] Building CXX object extern/bullet2/CMakeFiles/extern_bullet.dir/sr
c/BulletSoftBody/btSoftSoftCollisionAlgorithm.cpp.obj
[ 95%] Building C object source/blender/makesrna/intern/CMakeFiles/makesrna.dir/
rna_ui.c.obj
Building CXX object extern/bullet2/CMakeFiles/extern_bullet.dir/src/BulletSoftBo
dy/btDefaultSoftBodySolver.cpp.obj
Building CXX object extern/bullet2/CMakeFiles/extern_bullet.dir/src/LinearMath/b
tAlignedAllocator.cpp.obj
Building C object source/blender/makesrna/intern/CMakeFiles/makesrna.dir/rna_use
rdef.c.obj
[ 95%] [ 95%] Building CXX object extern/bullet2/CMakeFiles/extern_bullet.dir/sr
c/LinearMath/btConvexHull.cpp.obj
Building CXX object extern/bullet2/CMakeFiles/extern_bullet.dir/src/LinearMath/b
tConvexHullComputer.cpp.obj
[ 95%] Building C object source/blender/makesrna/intern/CMakeFiles/makesrna.dir/
rna_vfont.c.obj
[ 95%] [ 95%] Building C object source/blender/makesrna/intern/CMakeFiles/makesr
na.dir/rna_wm.c.obj
[ 96%] Building C object source/blender/makesrna/intern/CMakeFiles/makesrna.dir/
rna_world.c.obj
Building CXX object extern/bullet2/CMakeFiles/extern_bullet.dir/src/LinearMath/b
tGeometryUtil.cpp.obj
[ 96%] Building C object source/blender/makesrna/intern/CMakeFiles/makesrna.dir/
rna_action_api.c.obj
[ 96%] [ 96%] [ 96%] Building C object source/blender/makesrna/intern/CMakeFiles
/makesrna.dir/rna_actuator_api.c.obj
[ 96%] Building CXX object extern/bullet2/CMakeFiles/extern_bullet.dir/src/Linea
rMath/btSerializer.cpp.obj
[ 96%] Building CXX object extern/bullet2/CMakeFiles/extern_bullet.dir/src/Linea
rMath/btVector3.cpp.obj
Building C object source/blender/makesrna/intern/CMakeFiles/makesrna.dir/rna_ani
mation_api.c.obj
[ 97%] Building CXX object extern/bullet2/CMakeFiles/extern_bullet.dir/src/Linea
rMath/btQuickprof.cpp.obj
Building CXX object extern/bullet2/CMakeFiles/extern_bullet.dir/src/LinearMath/b
tPolarDecomposition.cpp.obj
[ 97%] Building C object source/blender/makesrna/intern/CMakeFiles/makesrna.dir/
rna_armature_api.c.obj
[ 97%] Building C object source/blender/makesrna/intern/CMakeFiles/makesrna.dir/
rna_camera_api.c.obj
[ 97%] [ 97%] Building C object source/blender/makesrna/intern/CMakeFiles/makesr
na.dir/rna_curve_api.c.obj
[ 97%] [ 97%] Building C object source/blender/makesrna/intern/CMakeFiles/makesr
na.dir/rna_controller_api.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/makesrna.dir/rna_fcu
rve_api.c.obj
[ 97%] Building C object source/blender/makesrna/intern/CMakeFiles/makesrna.dir/
rna_lattice_api.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/makesrna.dir/rna_ima
ge_api.c.obj
Linking CXX static library ..\..\lib\libextern_bullet.a
[ 97%] Building C object source/blender/makesrna/intern/CMakeFiles/makesrna.dir/
rna_main_api.c.obj
[ 97%] [ 97%] Building C object source/blender/makesrna/intern/CMakeFiles/makesr
na.dir/rna_material_api.c.obj
[ 97%] Building C object source/blender/makesrna/intern/CMakeFiles/makesrna.dir/
rna_mesh_api.c.obj
[ 97%] Building C object source/blender/makesrna/intern/CMakeFiles/makesrna.dir/
rna_meta_api.c.obj
[ 97%] Building C object source/blender/makesrna/intern/CMakeFiles/makesrna.dir/
rna_object_api.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/makesrna.dir/rna_tex
ture_api.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/makesrna.dir/rna_pos
e_api.c.obj
[ 97%] [ 97%] [ 97%] Building C object source/blender/makesrna/intern/CMakeFiles
/makesrna.dir/rna_scene_api.c.obj
[ 97%] [ 97%] Building C object source/blender/makesrna/intern/CMakeFiles/makesr
na.dir/rna_sensor_api.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/makesrna.dir/rna_seq
uencer_api.c.obj
[ 97%] Built target extern_bullet
[ 97%] Building C object source/blender/makesrna/intern/CMakeFiles/makesrna.dir/
rna_space_api.c.obj
[ 97%] Building C object source/blender/makesrna/intern/CMakeFiles/makesrna.dir/
rna_text_api.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/makesrna.dir/rna_ui_
api.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/makesrna.dir/rna_wm_
api.c.obj
[ 97%] [ 97%] [ 97%] [ 97%] Building C object source/blender/makesrna/intern/CMa
keFiles/makesrna.dir/__/__/__/__/intern/guardedalloc/intern/mallocn.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/makesrna.dir/__/__/_
_/__/intern/guardedalloc/intern/mallocn_guarded_impl.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/makesrna.dir/__/__/_
_/__/intern/guardedalloc/intern/mallocn_lockfree_impl.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/makesrna.dir/__/__/_
_/__/intern/guardedalloc/intern/mmap_win.c.obj
Linking C executable ..\..\..\..\bin\makesrna.exe
[ 97%] Built target makesrna
[ 97%] Generating rna_ID_gen.c, rna_action_gen.c, rna_actuator_gen.c, rna_animat
ion_gen.c, rna_animviz_gen.c, rna_armature_gen.c, rna_boid_gen.c, rna_brush_gen.
c, rna_camera_gen.c, rna_cloth_gen.c, rna_color_gen.c, rna_constraint_gen.c, rna
_context_gen.c, rna_controller_gen.c, rna_curve_gen.c, rna_dynamicpaint_gen.c, r
na_fcurve_gen.c, rna_fluidsim_gen.c, rna_gpencil_gen.c, rna_group_gen.c, rna_ima
ge_gen.c, rna_key_gen.c, rna_lamp_gen.c, rna_lattice_gen.c, rna_linestyle_gen.c,
rna_main_gen.c, rna_mask_gen.c, rna_material_gen.c, rna_mesh_gen.c, rna_meta_ge
n.c, rna_modifier_gen.c, rna_movieclip_gen.c, rna_nla_gen.c, rna_nodetree_gen.c,
rna_object_gen.c, rna_object_force_gen.c, rna_packedfile_gen.c, rna_particle_ge
n.c, rna_pose_gen.c, rna_property_gen.c, rna_render_gen.c, rna_rigidbody_gen.c,
rna_rna_gen.c, rna_scene_gen.c, rna_screen_gen.c, rna_sculpt_paint_gen.c, rna_se
nsor_gen.c, rna_sequencer_gen.c, rna_smoke_gen.c, rna_sound_gen.c, rna_space_gen
.c, rna_speaker_gen.c, rna_test_gen.c, rna_text_gen.c, rna_texture_gen.c, rna_ti
meline_gen.c, rna_tracking_gen.c, rna_ui_gen.c, rna_userdef_gen.c, rna_vfont_gen
.c, rna_wm_gen.c, rna_world_gen.c
Running makesrna
Scanning dependencies of target bf_rna
[ 97%] [ 98%] [ 98%] [ 98%] [ 98%] [ 98%] [ 98%] Building C object source/blende
r/makesrna/intern/CMakeFiles/bf_rna.dir/rna_access.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_ID_ge
n.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_actua
tor_gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_actio
n_gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_anima
tion_gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_animv
iz_gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_armat
ure_gen.c.obj
[ 98%] [ 98%] [ 98%] Building C object source/blender/makesrna/intern/CMakeFiles
/bf_rna.dir/rna_boid_gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_camer
a_gen.c.obj
[ 98%] C:\Blendersvn\blender\source\blender\makesrna\intern\rna_access.c: In fun
ction 'RNA_enum_is_equal':
C:\Blendersvn\blender\source\blender\makesrna\intern\rna_access.c:4757:8: warnin
g: 'cmp' may be used uninitialized in this function [-Wmaybe-uninitialized]
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_brush
_gen.c.obj
[ 98%] [ 98%] Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna
.dir/rna_cloth_gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_color
_gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_const
raint_gen.c.obj
[ 98%] [ 98%] Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna
.dir/rna_context_gen.c.obj
[ 98%] Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rn
a_controller_gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_curve
_gen.c.obj
[ 98%] [ 98%] Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna
.dir/rna_dynamicpaint_gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_fcurv
e_gen.c.obj
[ 98%] [ 98%] Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna
.dir/rna_fluidsim_gen.c.obj
[ 98%] Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rn
a_gpencil_gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_group
_gen.c.obj
[ 98%] Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rn
a_image_gen.c.obj
[ 98%] [ 98%] [ 98%] Building C object source/blender/makesrna/intern/CMakeFiles
/bf_rna.dir/rna_lamp_gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_key_g
en.c.obj
[ 98%] [ 99%] [ 99%] Building C object source/blender/makesrna/intern/CMakeFiles
/bf_rna.dir/rna_lattice_gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_lines
tyle_gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_mask_
gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_main_
gen.c.obj
[ 99%] [ 99%] [ 99%] Building C object source/blender/makesrna/intern/CMakeFiles
/bf_rna.dir/rna_material_gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_mesh_
gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_meta_
gen.c.obj
[ 99%] [ 99%] Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna
.dir/rna_modifier_gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_movie
clip_gen.c.obj
[ 99%] [ 99%] Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna
.dir/rna_nla_gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_nodet
ree_gen.c.obj
[ 99%] [ 99%] [ 99%] Building C object source/blender/makesrna/intern/CMakeFiles
/bf_rna.dir/rna_object_gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_objec
t_force_gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_packe
dfile_gen.c.obj
[ 99%] Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rn
a_particle_gen.c.obj
[ 99%] [ 99%] Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna
.dir/rna_pose_gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_prope
rty_gen.c.obj
[ 99%] Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rn
a_render_gen.c.obj
[ 99%] Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rn
a_rigidbody_gen.c.obj
[ 99%] [ 99%] Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna
.dir/rna_rna_gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_scene
_gen.c.obj
[ 99%] [ 99%] Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna
.dir/rna_screen_gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_sculp
t_paint_gen.c.obj
[ 99%] [ 99%] Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna
.dir/rna_sensor_gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_seque
ncer_gen.c.obj
[ 99%] [ 99%] Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna
.dir/rna_smoke_gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_sound
_gen.c.obj
[100%] Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rn
a_space_gen.c.obj
[100%] Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rn
a_speaker_gen.c.obj
[100%] [100%] Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna
.dir/rna_test_gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_text_
gen.c.obj
[100%] [100%] Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna
.dir/rna_texture_gen.c.obj
[100%] [100%] [100%] Building C object source/blender/makesrna/intern/CMakeFiles
/bf_rna.dir/rna_timeline_gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_track
ing_gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_ui_ge
n.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_userd
ef_gen.c.obj
[100%] [100%] Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna
.dir/rna_vfont_gen.c.obj
[100%] Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rn
a_wm_gen.c.obj
Building C object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_world
_gen.c.obj
Linking C static library ..\..\..\..\lib\libbf_rna.a
[100%] Built target bf_rna
Scanning dependencies of target bf_intern_cycles
[100%] [100%] [100%] [100%] [100%] [100%] [100%] Building CXX object intern/cycl
es/blender/CMakeFiles/bf_intern_cycles.dir/blender_camera.cpp.obj
Building CXX object intern/cycles/blender/CMakeFiles/bf_intern_cycles.dir/blende
r_mesh.cpp.obj
Building CXX object intern/cycles/blender/CMakeFiles/bf_intern_cycles.dir/blende
r_object.cpp.obj
Building CXX object intern/cycles/blender/CMakeFiles/bf_intern_cycles.dir/blende
r_curves.cpp.obj
Building CXX object intern/cycles/blender/CMakeFiles/bf_intern_cycles.dir/blende
r_particles.cpp.obj
Building CXX object intern/cycles/blender/CMakeFiles/bf_intern_cycles.dir/blende
r_python.cpp.obj
Building CXX object intern/cycles/blender/CMakeFiles/bf_intern_cycles.dir/blende
r_session.cpp.obj
[100%] [100%] Building CXX object intern/cycles/blender/CMakeFiles/bf_intern_cyc
les.dir/blender_shader.cpp.obj
Building CXX object intern/cycles/blender/CMakeFiles/bf_intern_cycles.dir/blende
r_sync.cpp.obj
Linking CXX static library ..\..\..\lib\libbf_intern_cycles.a
[100%] Built target bf_intern_cycles
Scanning dependencies of target blender
[100%] [100%] [100%] Building C object source/creator/CMakeFiles/blender.dir/cre
ator.c.obj
Building RC object source/creator/CMakeFiles/blender.dir/__/icons/winblender.rc.
obj
Building C object source/creator/CMakeFiles/blender.dir/buildinfo.c.obj
Linking CXX executable ..\..\bin\blender.exe
..\..\lib\libcycles_device.a(device_cpu.cpp.obj):device_cpu.cpp:(.text$_ZN3ccl9C
PUDevice17thread_path_traceERNS_10DeviceTaskE[_ZN3ccl9CPUDevice17thread_path_tra
ceERNS_10DeviceTaskE]+0x1c4): undefined reference to `ccl::Metropolis::Metropoli
s(ccl::KernelGlobals*, double*, double*, ccl::RenderTile*)'
..\..\lib\libcycles_device.a(device_cpu.cpp.obj):device_cpu.cpp:(.text$_ZN3ccl9C
PUDevice17thread_path_traceERNS_10DeviceTaskE[_ZN3ccl9CPUDevice17thread_path_tra
ceERNS_10DeviceTaskE]+0x3b3): undefined reference to `ccl::Metropolis::consider_
sample(float, ccl::float4, float*, float*, ccl::PassData&)'
c:/mingw/bin/../lib/gcc/x86_64-w64-mingw32/4.7.3/../../../../x86_64-w64-mingw32/
bin/ld.exe: ..\..\lib\libcycles_device.a(device_cpu.cpp.obj): bad reloc address
0x3b3 in section `.text$_ZN3ccl9CPUDevice17thread_path_traceERNS_10DeviceTaskE[_
ZN3ccl9CPUDevice17thread_path_traceERNS_10DeviceTaskE]'
collect2.exe: error: ld returned 1 exit status
source\creator\CMakeFiles\blender.dir\build.make:247: recipe for target `bin/ble
nder.exe' failed
mingw32-make- [x]: *** [bin/blender.exe] Error 1
CMakeFiles\Makefile2:6670: recipe for target `source/creator/CMakeFiles/blender.
dir/all' failed
mingw32-make- [x]: *** [source/creator/CMakeFiles/blender.dir/all] Error 2
makefile:135: recipe for target `all' failed
mingw32-make: *** [all] Error 2
c:\Blendersvn\build>
Linking error Patch 15
Okay, my bad. I forgot to add util_importance.h/cpp to the CMakeLists in the util folder, so it doesn't work for CMake users. Just add them and it should work.
hmmm.. i added it but now i get an error when cmake generating
```
Compiling for 64 bit with MinGW-w64.
Found unordered_map/set in std::tr1 namespace.
Blender Skipping: (bf_collada;bf_intern_ctr;bf_intern_ghostndof3dconnexion;extern_redcode)
- Found Git: C:/Program Files (x86)/Git/cmd/git.exe
Configuring done
CMake Error at intern/cycles/util/CMakeLists.txt:74 (add_library):
  Cannot find source file:

    util_importance.ccp

  Tried extensions .c .C .c++ .cc .cpp .cxx .m .M .mm .h .hh .h++ .hm .hpp
  .hxx .in .txx
```
umm you're correct i spelled it wrong..
however i don't have a util_importance.cpp file in that folder..?
can i compile anyway (leave the .cpp out and just use the .h)?
CMake compiles after adding this extra patch: [cmake_fix_for_metropolis_15.patch](https://archive.blender.org/developer/F80030/cmake_fix_for_metropolis_15.patch)
I'm getting an error from cmake telling me there's no util_metropolis.cpp file. Indeed, upon searching, there isn't. Diff applied against master on Win7 64.
lockal, thank you that did it!
compiled successfully with mingw64 cmake..
metro_15 is considerably faster than metro_14
--------------
m9105826
make sure you applied the metropolis 15 diff first, then apply cmake_fix_for_metropolis_15.patch
try it on a fresh clean blender folder
thanks lucasstockner97 keep up the good work. :)
here is the metro_15 mingw64 cmake build of blender 2.70 with the new splash. :)
http://www.mediafire.com/download/o1azyqyqo3noq4u/Blender_mingw64_metropolis_patch_15.7z
Tried Holyenigma's build, and I have a bug report for you.
I noticed that Metropolis sampling will crash Blender whenever there's a volume material in the scene, even if the only object is the default cube.
This doesn't affect adaptive sampling, though it seems MinGW builds become unstable if you zoom and pan a bit while the scene is rendering.
More samples give more and more single-dot noise around objects.
In path tracing this noise is uniform and low-contrast; in MLT it is not.
Lopataasdf; Having tried Luxrender before, this is an expected side effect due to how Metropolis sampling works.
You see, Metropolis is finding lightpaths that would otherwise not be found with normal pathtracing, so the dots represent those paths and they should converge to a result that contains lighting effects that are hard to get in Cycles without the patch.
There appears to be something weird going on in the MinGW build in the viewport. Looks to be some kind of issue with the Alpha value? Using just path tracing, transparency is turned off. Sorry if this is a known issue.
F12 rendering doesn't produce the same issue.
Building fails for me (cmake, archlinux) with metro_15 and the cmake fix:
```
In file included from /home/gandalf3/.Blenderversions/gitbuild/blender/intern/cycles/kernel/kernel_random.h:18:0,
from /home/gandalf3/.Blenderversions/gitbuild/blender/intern/cycles/kernel/kernel_path.h:29,
from /home/gandalf3/.Blenderversions/gitbuild/blender/intern/cycles/kernel/kernel_avx.cpp:39:
/home/gandalf3/.Blenderversions/gitbuild/blender/intern/cycles/kernel/../util/util_metropolis.h: At global scope:
/home/gandalf3/.Blenderversions/gitbuild/blender/intern/cycles/kernel/../util/util_metropolis.h:31:14: warning: 'float ccl::Mutate(float, float)' declared 'static' but never defined [-Wunused-function]
static float Mutate(const float x, const float randomValue);
^
/home/gandalf3/.Blenderversions/gitbuild/blender/intern/cycles/kernel/../util/util_metropolis.h:32:14: warning: 'float ccl::MutateScaled(float, float, float)' declared 'static' but never defined [-Wunused-function]
static float MutateScaled(const float x, const float range, const float randomValue);
^
Linking CXX static library ../../../lib/libcycles_kernel.a
[ 74%] Built target cycles_kernel
Linking CXX static library ../../../lib/libbf_freestyle.a
[ 74%] Built target bf_freestyle
Makefile:146: recipe for target 'all' failed
make[1]: *** [all] Error 2
```
Holyenigma; do I really need to post a .blend, here are a few steps to see for yourself (it's real easy).
1). Open Blender
2). Give the default cube a transparent material with a scattering volume
3). Switch to Metropolis sampling and leave every other setting alone
4). Watch Blender crash when Cycles gets ready to update the render window.
And that's the thing, the crash happens not right when I hit F12, but when Cycles is about to show the initial results on the screen.
ace_dragon, thanks i see it now.
i also noticed the mingw build won't open some of my old blend files (dingto's build opens them fine)
i opened the old blend with dingto's build and saved it, then it loads into the mingw build
i figured out what was causing the .blends on mingw to crash on open..
if the .blend is saved with transparent checked under Film, uncheck it and save the .blend
Okay, thanks for the reports.
- Transparent viewport rendering: Thanks, I see it. By the way: Metropolis preview seems to work, but not nearly as well as it should.
- The Volume bug seems to get worse every patch, I guess it's because another commit in trunk conflicts with the patch, since before I pulled in the latest changes, it worked fine. However, if it crashes, it's maybe even better since crashes give a location where the problem is.
- The single-pixel noise is quite expected. Also consider that, from what I can see in the screenshots, the PT one has 1000 samples while the Metro one only has ~100. If you want to get rid of the noise, try lowering the max. consecutive rejects setting, but this will introduce bias.
- I can load files with transparent film activated just fine, could you maybe post one that doesn't open?
Anyways, I'm going to work on this. Once I have something new, I'll post it here.
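To illustrate why lowering the max. consecutive rejects setting trades noise for bias, here is a minimal sketch of a Metropolis acceptance step with a forced-accept limit. It assumes the setting behaves as in LuxRender-style samplers (a proposal is accepted unconditionally once the chain has rejected that many times in a row); the patch's actual code is not shown in this thread, and `ChainState` and `consider_proposal` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical chain state: importance (e.g. luminance) of the current
// sample and the number of proposals rejected in a row.
struct ChainState {
    float importance = 0.0f;
    int consecutive_rejects = 0;
};

// One acceptance step. `u` is a uniform random number in [0, 1).
// Returns true if the proposal replaces the current sample.
inline bool consider_proposal(ChainState &chain, float proposed_importance,
                              float u, int max_consecutive_rejects)
{
    assert(proposed_importance >= 0.0f && max_consecutive_rejects > 0);

    // Standard Metropolis acceptance probability min(1, I_new / I_old).
    const float accept_prob = chain.importance > 0.0f
        ? std::min(1.0f, proposed_importance / chain.importance)
        : 1.0f;

    // Assumption: once the limit is hit, the next proposal is accepted no
    // matter what. This breaks up bright stuck pixels but is biased.
    const bool forced = chain.consecutive_rejects >= max_consecutive_rejects;

    if (u < accept_prob || forced) {
        chain.importance = proposed_importance;
        chain.consecutive_rejects = 0;
        return true;
    }
    chain.consecutive_rejects++;
    return false;
}
```

With a low limit the chain leaves a very bright path sooner than the acceptance probability alone would allow, which is where the bias comes from.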
I had another really strange issue that I'm having trouble recreating at the moment. It was in the same session where I did the transparent screen grab. After working for a few minutes and doing a couple of f12 renders, I turned viewport rendering back on and got extremely blown out lights and strange colors not previously present in the scene. Switching back to path tracing caused different, but similar results.
Additionally, path tracing and metropolis have been converging to different results. I'll post an example .blend when I'm back on my PC.
Hi,
I've been looking through this thread to find a compiled version for Windows 7/32.
I am aware patches are released as code, not compiled, so I may miss the latest. I am not ready to experiment with compiling..
So compiled version?
Thanks,
Christos
New patch version, cmake should now work out-of-the-box and the volume bug is fixed.
It turned out that the fix for #38710 caused it; now, when using Metro, that bugfix is simply not applied. This doesn't mean that #38710 now appears in Metro, however, because it is related to the sampler and Metro overrides that one anyway.
This is my current issue-list in random order:
- Builds for Windows 32bit (OK, this isn't such a big priority).
- Viewport rendering is still broken, I still haven't found out why. Clearly needs to be fixed.
- I noticed that in the BMW scene, sometimes it just renders the background with Metro on. When disabling DOF, the issue seems to disappear, but this needs further investigation.
- @MatthewHeimlich Regarding the different results, I haven't been able to reproduce this with any scene tried, a example file would indeed be useful.
- @holyenigma Crashing with transparent enabled: For me, the fire extinguisher works just fine. I don't have access to a x64 Windows, so I can't try the build, but I'm currently building a x86 one to test. It'd be really useful if you could upload a crashlog.
- CMake building should be fixed now, at least it works on my Linux and (at least until now) on XP.
- Importance equalization (my new name for Division-By-Mean-Brightness): A feature, not a bug, but definitely useful.
Have I forgotten anything? If yes, please tell me.
[metropolis_16.diff](https://archive.blender.org/developer/F80461/metropolis_16.diff)
(Bump) Hi,
I've been looking through this thread to find a compiled version for Windows 7/32.
I am aware patches are released as code, not compiled, so I may miss the latest. I am not ready to experiment with compiling..
So compiled version?
Thanks,
Christos
Just to let you know, I'm still working on the patch and making progress, the transparent viewport bug is fixed, Importance Equalization works and most of the image-wide adaptive sampling is done as well. Once everything is stable enough, I'll post it here.
@MaciejJutrzenka OK, now that's a remarkable coincidence o.O
In fact, I've been working with VCM for over a year now for another project of mine (lightpath reuse in animated scenes, saves up to 50% of the calculations) and only one month ago I had basically the same idea (merging with vertices in the volume by adjusting the geometric coupling term). It's really awesome to see that this indeed works and has already been implemented :D
Regarding Cycles: Adding it would probably be possible, but it's not comparable to the Metro patch. You would first have to add Bidir, then add Photon-Tracing support like in PPM, then add the VCM merging code (which is particularly hard to implement correctly due to the various weighting terms) and finally add volume photon storage and beam queries. This is a HUGE amount of work and, considering that the whole code structure of Cycles is made for Path Tracing, it would probably be hacky and messy.
By the way, the current patch takes so long because of the VCM project described above, because in 1 week there is a competition for which I still want to code a GPU implementation of my animation-VCM code, so currently it requires most of my time. Sorry for that, development will be faster afterwards.
Interesting piece about adaptive sampling that popped up on BA today. Dade claims it's incredibly easy to implement over an existing tiled renderer.
http://www.luxrender.net/forum/viewtopic.php?f=8&t=10955&sid=8774701c5a8832e330716af5a8f74863
@MatthewHeimlich Wow, that looks really useful. I haven't read the paper yet, but it definitely sounds useful and usable for the patch. Really incredible work by the LuxRender guys there, always implementing the latest stuff.
A video of the new adaptive sampling at work. I'd need to see some more complex scenes to judge definitively, but so far this is the first adaptive sampling method I've ever seen that appears to "just work".
https://www.youtube.com/watch?feature=player_embedded&v=P_QmdpnKTW4
@MatthewHeimlich
Excellent video: now I get it... how it works.
The threshold based on noise rather than uniform for all tiles.
Please i ask again: Does this exist in compiled "ready-to-use" form or only as a patch requiring me to compile - something I don't know how to do. I use Blender 2.69 and 2.70 under Windows 7/32
@xs If you do not know how to patch properly, i probably would suggest just sticking to the main blender... this is a work in progress and is not ready for mainstream use.
To be clear, what's shown in that video isn't available in Blender at all. It's a demonstration of the new method available in LuxRender that I linked to above. Should be relatively painless to integrate into Cycles, though.
I'm still convinced this will be the BIGGEST performance optimization
And it's what I've been waiting for a looong time... :)
http://blenderartists.org/forum/showthread.php?236453-Measuring-Noise-in-Cycles-Renders&p=2001913&viewfull=1#post2001913
http://blenderartists.org/forum/showthread.php?255683-Cycles-status-%28as-of-May-14th%29&highlight=cycles+status
http://blenderartists.org/forum/showthread.php?236453-Measuring-Noise-in-Cycles-Renders&p=2370199&viewfull=1#post2370199
http://blenderartists.org/forum/showthread.php?216113-Brecht-s-easter-egg-surprise-Modernizing-shading-and-rendering&p=2395532&viewfull=1#post2395532
Developer-question here: How do you think this should be implemented? My current plan is to treat the Samples setting as a maximum value when using the stopping criterion and add a maximum tolerated error setting. As soon as one of them is reached, the tile stops. This, however, is incompatible with progressive rendering and Metropolis, but I think we can live with that.
In fact, most of the code required for this is already implemented for the current adaptive sampling (most importantly, a per-pixel sample-number buffer), so the stopping criterion should be running quite soon. In-Tile adaptive sampling can be kept the way it is now, only using the new variance estimate (contrast to even-sample buffer).
I'm really looking forward to seeing how well it works in Cycles; for SLG the speedup seems quite impressive.
BTW: The viewport bug is fixed and another one that broke DoF and Motion Blur is fixed, too. Importance equalisation works as well, once the stopping criterion works, I'll publish a new patch version.
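The plan above (treat Samples as an upper bound, add a maximum tolerated error, and stop a tile as soon as either is reached) can be sketched roughly as follows. This is only an illustration of the rule as described in the post; `StoppingSettings` and `tile_should_stop` are hypothetical names, and how the tile error is estimated is left out here.

```cpp
#include <cassert>

// Hypothetical settings; the names are illustrative, not taken from the patch.
struct StoppingSettings {
    int max_samples;   // the existing "Samples" value, treated as an upper bound
    float max_error;   // maximum tolerated error estimate for a tile
};

// Returns true when the tile should stop rendering: either the sample
// budget is used up or the estimated error is already within tolerance.
inline bool tile_should_stop(const StoppingSettings &settings,
                             int samples_done, float tile_error)
{
    assert(settings.max_samples > 0 && settings.max_error >= 0.0f);
    return samples_done >= settings.max_samples
        || tile_error <= settings.max_error;
}
```

Because the decision is made per tile, an easy tile can finish long before the sample limit while a difficult one runs up to it.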
@JasonClarke
I actually don't know ;)
@Lukas-132
cool, can't wait...
what do you mean when you say that the "max_sample_limit + maximum_tolerated_error" are incompatible with progressive rendering?
Lukas: Sounds like the best way to do things. Halt the render if either criterion is reached. For certain scenes (one that comes to mind is a half-lit face, where almost half of the samples are wasted on the side that is almost completely black) this would be an ENORMOUS time saver.
I dare.
There is also one more opportunity: time halt condition. To make this possible the engine should know how long each tile takes to render, so a first pass (with a minimum sample number to be decided) should be done for all the tiles. This would give two benefits: a proper render-time estimation, and a first rough look at the whole render output.
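A rough sketch of the time-based idea above: time a short pilot pass over all tiles, then extrapolate. It assumes render cost grows linearly with the sample count and simply sums per-tile times (ignoring multithreading); both function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// pilot_seconds[i] is how long tile i took to render `pilot_samples` samples.
// Extrapolates the time needed for `target_samples`, assuming linear cost.
inline double estimate_total_seconds(const std::vector<double> &pilot_seconds,
                                     int pilot_samples, int target_samples)
{
    assert(pilot_samples > 0 && target_samples >= 0);
    double total = 0.0;
    for (double t : pilot_seconds)
        total += t;
    return total * target_samples / pilot_samples;
}

// How many samples per pixel fit into a given time budget for the whole image.
inline int samples_within_budget(const std::vector<double> &pilot_seconds,
                                 int pilot_samples, double budget_seconds)
{
    assert(budget_seconds >= 0.0);
    const double per_sample =
        estimate_total_seconds(pilot_seconds, pilot_samples, 1);
    if (per_sample <= 0.0)
        return 0;  // no timing data, nothing sensible to predict
    return static_cast<int>(budget_seconds / per_sample);
}
```

The pilot pass doubles as the rough first look at the whole image mentioned above.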
Here is some more discussion from Dade and others on halt conditions/adaptive sampling. Lots of nice papers linked, lots of good discussion of pros and cons from people who have implemented some of them.
lsscpp, I completely agree. It would be very nice to have a rendering algorithm that ran a few samples at lower sampling rates across the entire image to show a very rough preview of what's coming. Others do things this way and it makes a huge difference in catching issues early on in the render.
http://ompf2.com/viewtopic.php?f=3&t=1933
The basic implementation works now, but there is a problem: correlation. Taking only even samples of a 1000-sample render gives a slightly different picture than using all samples of a 500-sample render, at least for the BMW scene. With CMJ-samples, the problem is gone. Apart from that, it seems to work quite OK, in 1-3 days the patch should be ready for a new upload.
By the way, another open question: Should the stopping use average error or maximum error inside of a tile? Currently, the code works like "average below threshold and maximum below 2*threshold".
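The rule quoted above ("average below threshold and maximum below 2*threshold") could look roughly like this. It is a sketch of the stated condition only; `tile_converged` is a hypothetical name and the per-pixel error values are assumed to come from whatever noise estimate the patch uses.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// pixel_errors holds one error estimate per pixel of the tile.
// The tile counts as converged when the average error is below the
// threshold AND no single pixel exceeds twice the threshold.
inline bool tile_converged(const std::vector<float> &pixel_errors,
                           float threshold)
{
    assert(!pixel_errors.empty());
    float sum = 0.0f;
    float max_error = 0.0f;
    for (float e : pixel_errors) {
        sum += e;
        max_error = std::max(max_error, e);
    }
    const float average = sum / static_cast<float>(pixel_errors.size());
    return average < threshold && max_error < 2.0f * threshold;
}
```

The maximum term keeps a tile from stopping while it still has a few very noisy pixels that the average alone would hide.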
IIRC LuxCore implementation uses max error inside each tile, and it sounds reasonable too. This way you always know how much the error *might* be.
Can't wait to test this out. Can you (or anyone) please upload the build once it is ready?
Basically both, but I think Metro works quite fine already. Once the Importance Equalisation is stable and the adaptive stuff is done, in my opinion the patch is ready for extensive testing and code cleanup.
First of all, sorry for the permanent delays. I underestimated the amount of work this would take and a ton of other stuff needed attention.
However, by now I solved the Sobol problem by adding an RNG dimension that decides whether the sample goes to the "even"-buffer instead of really taking only even samples, and adaptive sampling inside the tile is now also done according to the error in the pixel, so it's consistent. All in all, it's starting to work really well now; once I have removed some debugging code and rebased to the current trunk, metropolis_17 should be ready quite soon.
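A minimal sketch of the even-buffer idea described above, assuming one extra sampler dimension supplies a value in [0, 1) per sample and that the error estimate is the difference between the mean of all samples and the mean of the selected half. The names `PixelAccum`, `add_sample` and `error_estimate` are hypothetical, and the patch additionally weights this difference by a perceptual function that is not reproduced here.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Per-pixel accumulators: one for all samples and one for the subset
// that the extra sampler dimension routes to the "even" buffer.
struct PixelAccum {
    float sum_all = 0.0f;
    int count_all = 0;
    float sum_even = 0.0f;
    int count_even = 0;
};

// `selector` is a value in [0, 1) taken from an additional RNG dimension,
// so the split does not depend on the sample index being even or odd.
inline void add_sample(PixelAccum &pixel, float value, float selector)
{
    assert(selector >= 0.0f && selector < 1.0f);
    pixel.sum_all += value;
    pixel.count_all++;
    if (selector < 0.5f) {
        pixel.sum_even += value;
        pixel.count_even++;
    }
}

// Error estimate: how far the half-buffer mean is from the full mean.
inline float error_estimate(const PixelAccum &pixel)
{
    if (pixel.count_all == 0 || pixel.count_even == 0)
        return std::numeric_limits<float>::infinity();  // no estimate yet
    const float mean_all = pixel.sum_all / pixel.count_all;
    const float mean_even = pixel.sum_even / pixel.count_even;
    return std::fabs(mean_all - mean_even);
}
```

Splitting by a random dimension rather than by sample index avoids relying on the structure of the Sobol sequence, which is the correlation problem mentioned earlier in the thread.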
Some images here to show you how well it works, all of them are pure PT and rendered for 2:30 min:
No adaptive stuff at all:
Adaptive stopping, no adaptive sampling:
Both adaptive stopping and sampling:
As you can see, the noise is distributed more uniformly in the images with adaptive stopping and especially the headlights and their reflection look better. The doors, on the other hand, are a bit more noisy. This is because the adaptive stopping doesn't work any faster, it just distributes samples differently. If the headlights are to receive more samples, some areas have to receive less.
This is the same scene (obviously), this time 8:30 min:
No adaptive stuff:
Both adaptive stopping and sampling:
Great work!
For adaptive sampling, does it measure the noise in linear or in display space? If the former, I wonder if the result could be improved by doing it in display space, as linear space would underestimate the noise in dark areas.
Thanks!
Currently it works in linear space (I'm assuming the RenderBuffers are in linear space), but the difference between regular and even pass is weighted by a Threshold-Versus-Intensity function.
Still, additionally weighting by the tonemapping function is probably better (as far as I remember, the paper that used the even-buffer trick even did this). If I remember correctly, the linear-to-sRGB conversion is mainly a pow() operation, so the speed impact shouldn't be that high (also, the noise estimation is currently only done every 25 samples).
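As a rough sketch of what a display-space comparison could look like, the following uses the standard sRGB transfer function. The helper `display_space_difference` is a hypothetical name, and this does not reproduce the patch's Threshold-Versus-Intensity weighting.

```cpp
#include <cmath>

// Standard sRGB encoding of a linear value (piecewise: linear toe, then pow).
inline float linear_to_srgb(float c)
{
    if (c <= 0.0031308f)
        return 12.92f * c;
    return 1.055f * std::pow(c, 1.0f / 2.4f) - 0.055f;
}

// Hypothetical helper: difference between two linear estimates, measured
// after encoding both to display space.
inline float display_space_difference(float linear_a, float linear_b)
{
    return std::fabs(linear_to_srgb(linear_a) - linear_to_srgb(linear_b));
}
```

With this, the same linear difference of 0.01 counts for much more around 0.01 than around 0.8, which matches the concern that linear space underestimates noise in dark areas.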
I'd be interested to see this on a scene without lots of glossy noise, something where shadows or indirect noise take up a good chunk of the render time with master Cycles. Something like a half lit face should benefit quite a bit from adaptive sampling like this.
As m9105826 says, the test could be done on a more difficult scene such as this one: http://www.blenderartists.org/forum/showthread.php?331149-The-new-Cyles-GPU-2-70-Benchmark
New patch version, the adaptive stuff got a massive update, importance equalisation now works again and CUDA/OpenCL should build again (without Metro support). Adaptive stopping works on CPU and GPU, although it might slow down GPU a bit since every test has to fetch the buffers from device memory. I didn't test it, but in ~1 week I'll get access to a CUDA 2.0-capable system so I can fix the CUDA support. Adaptive sampling will get GPU support, but it's not finished yet.
For adaptive stopping, set the Stopping Threshold value under the Performance tab to a value > 0. Adaptive stopping works in parallel with the classic sample-count stopping, so if a tile hits the specified sample number, it's stopped regardless of whether its error is below the threshold or not. In my experience, something around 1 gives a rough-preview-quality result, while values <= 0.25 appear noise-free. The test for whether the tile is done starts after some warmup samples have passed and is performed at a specific interval; the settings for these are below the threshold. The checkbox activates adaptive sampling inside the tiles. By the way, adaptive stopping also works for Metro.
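The interplay of threshold, warmup, interval and the classic sample limit can be sketched roughly as follows. The struct and function names are hypothetical, and how the tile error itself is computed is left out.

```cpp
// Hypothetical settings struct mirroring the UI options described above.
struct StoppingSettings {
    float threshold;  // "Stopping Threshold"; 0 disables adaptive stopping
    int warmup;       // samples rendered before the first convergence test
    int interval;     // samples between two convergence tests
    int max_samples;  // classic sample count, always a hard limit
};

// True if a convergence test is due after `sample` samples have been rendered.
inline bool is_test_sample(const StoppingSettings &s, int sample)
{
    return s.threshold > 0.0f && s.interval > 0 && sample >= s.warmup &&
           (sample - s.warmup) % s.interval == 0;
}

// A tile is done when it hits the sample limit, or when a due test finds
// its error at or below the threshold.
inline bool tile_done(const StoppingSettings &s, int sample, float tile_error)
{
    if (sample >= s.max_samples)
        return true;
    return is_test_sample(s, sample) && tile_error <= s.threshold;
}
```

Note that in this sketch a tile can never stop before the warmup has passed, no matter how low its measured error is.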
In Metro mode, the adaptive sampling checkbox activates Importance Equalisation. This setting basically causes the Metro sampler to distribute the samples more uniformly instead of sampling according to brightness (technically, it samples according to path brightness divided by average pixel brightness, therefore still favoring high-energy paths). Sometimes this helps a lot; in other scenes it's pretty useless.
@lsscpp
Adaptive stopping looks at the tile and decides whether it's done or not, while adaptive sampling distributes the samples inside the tile to areas of high error.
This is useful, for example, if the tile is at the boundary of Background/Object, since without adaptive sampling, all of the tile will be sampled until the error is low enough. With adaptive sampling, however, the Object will receive more samples than the background.
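One common way to distribute samples inside a tile according to error is to pick pixels from a cumulative distribution over the per-pixel errors. The following is only a sketch of that general technique with hypothetical names; the thread does not say how the patch selects pixels.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Running sum of the (non-negative) per-pixel errors of one tile.
inline std::vector<float> build_error_cdf(const std::vector<float> &pixel_error)
{
    std::vector<float> cdf(pixel_error.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < pixel_error.size(); i++) {
        sum += std::max(pixel_error[i], 0.0f);
        cdf[i] = sum;
    }
    return cdf;
}

// Picks a pixel index with probability proportional to its error, given a
// uniform random number u in [0, 1). Expects a non-empty cdf. Falls back to
// uniform selection if every pixel reports zero error.
inline std::size_t pick_pixel(const std::vector<float> &cdf, float u)
{
    const std::size_t last = cdf.size() - 1;
    const float total = cdf.back();
    if (total <= 0.0f)
        return std::min(static_cast<std::size_t>(u * cdf.size()), last);
    const std::size_t i =
        std::upper_bound(cdf.begin(), cdf.end(), u * total) - cdf.begin();
    return std::min(i, last);
}
```

In the Background/Object example, background pixels with near-zero error would almost never be picked, so nearly all further samples go to the object.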
Basically, adaptive sampling is what the patch already did, while adaptive stopping is the new feature.
By the way, in the 2.70 benchmark, there is nearly no improvement since the noise is already distributed quite evenly.
[metropolis_17.diff](https://archive.blender.org/developer/F85547/metropolis_17.diff)
Hi Lucas, the patch does not apply on 1e6fa59 from today, and I get a build error:
Hunk #1 FAILED at 64.
1 out of 6 hunks FAILED -- saving rejects to file intern/cycles/kernel/kernel_types.h.rej
This was with `patch -p1 < metropolis_17.diff`.
```
[ 83%] Built target cycles_device
[ 83%] Building CXX object intern/cycles/kernel/CMakeFiles/cycles_kernel.dir/kernel.cpp.o
In file included from /daten/blender-git/blender/intern/cycles/kernel/geom/geom.h:40:0,
from /daten/blender-git/blender/intern/cycles/kernel/kernel_path.h:27,
from /daten/blender-git/blender/intern/cycles/kernel/kernel.cpp:25:
/daten/blender-git/blender/intern/cycles/kernel/geom/geom_curve.h: In function ‘bool ccl::bvh_curve_intersect(ccl::KernelGlobals*, ccl::Intersection*, ccl::float3, ccl::float3, ccl::uint, int, int, float, int, ccl::uint*, float, float)’:
/daten/blender-git/blender/intern/cycles/kernel/geom/geom_curve.h:785:8: warning: variable ‘backface’ set but not used [-Wunused-but-set-variable]
bool backface = false;
^
/daten/blender-git/blender/intern/cycles/kernel/kernel.cpp: In function ‘float ccl::kernel_cpu_metropolis_first_pass(ccl::KernelGlobals*, ccl::uint*, int, int, ccl::PassData*, int, int, int)’:
/daten/blender-git/blender/intern/cycles/kernel/kernel.cpp:114:85: error: ‘kernel_metropolis_first_pass’ was not declared in this scope
return kernel_metropolis_first_pass(kg, rng_state, x, y, pd, offset, stride, sample);
^
/daten/blender-git/blender/intern/cycles/kernel/kernel.cpp: In function ‘ccl::float4 ccl::kernel_cpu_metropolis_path_trace(ccl::KernelGlobals*, float*, float, float, ccl::PassData*)’:
/daten/blender-git/blender/intern/cycles/kernel/kernel.cpp:119:65: error: ‘kernel_metropolis_path_trace’ was not declared in this scope
return kernel_metropolis_path_trace(kg, randomSamples, x, y, pd);
^
/daten/blender-git/blender/intern/cycles/kernel/kernel.cpp: In function ‘float ccl::kernel_cpu_metropolis_first_pass(ccl::KernelGlobals*, ccl::uint*, int, int, ccl::PassData*, int, int, int)’:
/daten/blender-git/blender/intern/cycles/kernel/kernel.cpp:115:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
/daten/blender-git/blender/intern/cycles/kernel/kernel.cpp: In function ‘ccl::float4 ccl::kernel_cpu_metropolis_path_trace(ccl::KernelGlobals*, float*, float, float, ccl::PassData*)’:
/daten/blender-git/blender/intern/cycles/kernel/kernel.cpp:120:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
make[2]: *** [intern/cycles/kernel/CMakeFiles/cycles_kernel.dir/kernel.cpp.o] Fehler 1
make[1]: *** [intern/cycles/kernel/CMakeFiles/cycles_kernel.dir/all] Fehler 2
make: *** [all] Fehler 2
```
Thanks, mib
For Metro, the Octane Render team made a custom implementation. It's most likely patented, but maybe we can find a similar hybrid solution.
What do you think?
They use ERPT with Population Monte Carlo I think. It's not clear to me that this would be better than MLT combined with the adaptive sampling.
I suggest to look at the "Multiplexed Metropolis Light Transport" paper when it comes out, I saw some impressive results from that (sorry, I have no link) and it should fit well here.
Well, I and many people out there want GPU support, so whatever works for that will be good.
Also, Octane is getting out-of-core textures (no more GPU memory limit). Any chance of getting something like that for Cycles?
OK, here is a corrected version. One line in kernel_types.h was different between trunk and patch.
Regarding the Octane method: Running a large number of concurrent samplers on GPUs would be possible, it's quite similar to ERPT and LuxRender already uses it. However, I'm not too sure whether this is any better than PT since fewer, longer MCMC chains should explore the sample space more evenly.
The Multiplexed Metropolis paper definitely sounds interesting; I'll look at it once it's released.
[metropolis_17_fixed.diff](https://archive.blender.org/developer/F85824/metropolis_17_fixed.diff)
@MatthewHeimlich I've been trying to set up a VC2012 environment to reproduce your errors (also W7 64), but can't get it to work properly. The instructions in the Wiki still use the SVN trunk, and it works fine with that one. However, once I use the Git trunk, CMake fails to find the libs. My folder layout is "E:/Blender/blender" for the trunk, "E:/Blender/build" for CMake and "E:/Blender/libs/win64_vc11" for the precompiled libs (still from SVN). CMake doesn't find Boost, OpenJPEG and Python 3.4 (the precompiled libs still have Python 3.3) and stops afterwards. Could you tell me where you got your libs from, or did you compile them yourself?
Regarding the error: I suppose it complains about linear_rgb_to_gray? If yes, a workaround would be to rename the new float4 version to linear_rgb_to_gray4 and change the occurrences in device/device_cpu.cpp (after the Metropolis kernel call) and in render/buffers.cpp (in build_importance_map()). Of course, this still needs to be fixed properly, but GCC doesn't seem to care about it at all...
@ThomasDinges Thanks, works great (apart from an error complaining that lib\win64_vc12\release\python34_numpy_1.8.tar.gz is missing, but it doesn't seem to matter)
@MatthewHeimlich OK, I fixed the error. It's not the linear_rgb_to_gray(), but the logarithm in "return exp(log_i * log(10))", in the linear_gray_to_tvi() function. Changing "10" to "10.0f" fixes it.
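For reference, a sketch of the fixed expression in isolation. The helper name `pow10_from_log` is hypothetical; only the expression itself comes from linear_gray_to_tvi(). Presumably the integer literal made MSVC choose between the float, double and long double overloads of log(), which the float literal avoids.

```cpp
#include <cmath>

// Computes 10^log_i using single-precision calls only. Writing 10.0f instead
// of 10 selects the float overload of std::log unambiguously.
inline float pow10_from_log(float log_i)
{
    return std::exp(log_i * std::log(10.0f));
}
```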
Oh, and another thing: "struct RenderTile;" in util_metropolis.h needs to be changed to "class RenderTile;".
Another, probably unrelated thing: In a test scene, when I enable Cycles and go to the Material tab, I get an "Assertion failed" in the python34_d.dll ( ../Python/ceval.c, different lines, but always "!PyErr_Occurred()" ), even on trunk. Is this normal?
Since I just set up my Windows toolchain, I thought I'd post my build here. It's based on today's trunk, built with VC2013 for Windows x64 (x32 will follow soon). It lacks some features (mainly GPU and OSL), but the standard stuff should work.
I hope I included every needed file; if something is missing, please tell me and I'll post it. By the way, the patch doesn't affect performance in classical PT without the new options in any way. In fact, on my machine this build renders faster than the 2.70a release.
http://www.mediafire.com/download/1lkr2mqdd521atu/Blender_270_Metro_VC2013_x64.zip
WOW. I tested the scene I had spoken about previously. What a savings! Image quality difference is negligible, but a bit over 75% faster render times. Excellent work!


That was with only 6 of my 8 cores, btw. Tolerated error set to 3, update rate at 5, warmup at 10. Let me know if you think there are even better settings.
Rendering froze when I used a low sample count and Correlated Multi-Jitter.

[BMW1M_adptiv_sampl.blend](https://archive.blender.org/developer/F86781/BMW1M_adptiv_sampl.blend)
@MatthewHeimlich Well, that's quite an improvement. Scenes like this are of course best-case for adaptive sampling, but especially big background areas are quite common.
Regarding the settings: The tolerated error sounds quite high to me, but indeed the rendered image looks just fine, so I guess it's okay.
The other two settings seem reasonable. Basically, they depend on how expensive tracing a single sample is (when the tile is rendered at 100 samples/sec, checking for convergence every 5 samples might give a significant slowdown, while for something like 5 samples/sec it's just fine).
This gives me another idea: Maybe a time-based update rate would be better than a sample-based one?
@lopataasdf Yes, this scene freezes for me too. Actually (or at least it seems like this to me), it's not freezing completely, but somehow blocking the GUI while rendering, since after ~1 min it briefly became responsive for me again. I'll try to reproduce this on Linux since I have no experience with MSVC debugging...
Could someone try to reproduce this with another Windows build to see whether it's related to this specific build, the compiler, the OS or the whole patch?
Time based could be a problem when the scene is rendered on another computer with much different performance (say, an old workstation that retired to the render farm).
What about setting warmup/update as a percentage of the total number of AA samples? Like, every 10% of completion, do the update?
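The percentage idea boils down to something like the following sketch; the function name and the clamp to at least one sample are assumptions.

```cpp
#include <algorithm>

// Converts a percentage of the total AA sample count into a sample interval.
// Clamped to at least 1 so that very low sample counts still get tested.
inline int interval_from_percent(int total_samples, float percent)
{
    const int interval = static_cast<int>(total_samples * percent / 100.0f);
    return std::max(interval, 1);
}
```

For example, 200 AA samples at 10% give a convergence test every 20 samples, independent of how fast the machine renders.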
Your build
http://www.mediafire.com/download/1lkr2mqdd521atu/Blender_270_Metro_VC2013_x64.zip
won't load on my Win 7 PC, it crashes before ever loading.
I know you talked about "vc2013 makes Cycles 15-20% faster for free". Could it be because the included "MSVCR120.dll" should be "MSVCR130.dll", or is this unrelated to your usage of "vc2013"?
Problem signature:
```
Problem Event Name: APPCRASH
Application Name: blender.exe
Application Version: 2.7.0.0
Application Timestamp: 5363f045
Fault Module Name: MSVCR120.dll
Fault Module Version: 12.0.21005.1
Fault Module Timestamp: 524f83ff
```
My system only has a RADEON 6450 GPU, must this build run on a system with a CUDA GPU?
My system is a stock Dell XPS8300 with Windows 7 Home premium 64bit, i7 CPU, 12 GB ram
Thank You
Just replaced the RADEON 6450 with an NVIDIA 550 Ti, and this build still won't load on my particular system.
Also, the Windows 7 Home Premium 64bit is at Service Pack 1 - I am leery of MS service packs.. :)
Thank You
Found out why it wouldn't load: there needs to be a folder named "2.70" containing only the folders you already included (datafiles, python, scripts). With that in place, it opened!
Looking forward to trying it out
Thank You
Rendering of "BMW1M_adptiv_sampl.blend" freezes after about 27/150 tiles (64x64) and continues to consume 98% of CPU power. I have to kill Blender (End Task) using Task Manager; it takes about a minute to get to Task Manager because the PC responds so slowly, and about 5 seconds to kill Blender after clicking the button. After stopping Blender, the PC runs fine and I can start another instance without any noticeable problems.
thanks
craig
Tried it with 96x96 tiles; it still hangs on render (at 13/60 tiles) and keeps the CPU at almost full load until I force Blender to stop via Task Manager.
Thanks again

different noise level

[BlenderGuru_InteriorRenderingStarter.blend](https://archive.blender.org/developer/F86888/BlenderGuru_InteriorRenderingStarter.blend)
@craigar I'll look into the freezing, it also happens in my Linux build.
@lopataasdf The different noise level is to be expected, since the error of the tile is currently based on an average over the tile and the max error in the tile. In your example, the right-side part of the low-noise tile probably needed more samples, so the left part also got sampled more. In the high-noise tile, the error is more uniform, so the rendering stops earlier. This is the reason why I also added the in-tile adaptive sampling (it looks to me as if you didn't use it; if you did, please tell me), which would give the right side of the low-noise tile more priority, making the tile pass the convergence test in fewer samples.
By the way: The smaller tiles are, the more fine-grained the adaptive stopping works. I usually use 8x8 or 16x16 in my tests, which usually (at least on CPU) is faster than large tiles.
@mib2berlin Yes, the bake merge broke the metro_17, I'm currently rebasing the patch. Once I'm done, I'll publish metro_18 (also with the MSVC fixes).
Regarding splitting the patch: There are some features used by both Metro and Adaptive (most importantly the Samples pass), so a clean split is quite difficult. My suggestion would be to set up an "advanced sampling" branch that could be merged to trunk feature by feature (Basic Adaptive Stopping, Ada-Sampling, Metro) once it's tested enough and the code style is fine.
I had a look at updating the patch (since some changes I made might have caused conflicts).
However, it looks like `ScenePassType` now has no bits left, so perhaps it has to be extended to `int64_t`.
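A sketch of what 64-bit pass flags could look like. The names below are placeholders rather than the real `ScenePassType` values, and an unsigned type is used here to keep the shifts well-defined.

```cpp
#include <cstdint>

// Hypothetical 64-bit flag type; the actual pass enum is not reproduced here.
typedef std::uint64_t PassFlags;

// Placeholder flags: one in the low 32 bits, one that only fits in 64 bits.
const PassFlags PASS_EXAMPLE_LOW = 1ull << 0;
const PassFlags PASS_EXAMPLE_HIGH = 1ull << 40;

inline bool has_pass(PassFlags flags, PassFlags pass)
{
    return (flags & pass) != 0;
}
```

Any place that stores the flags in a 32-bit variable would silently drop the high bits, so all users of the type would need to be checked when widening it.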
This patch may be almost ready to simply commit to trunk as an experimental feature. I'm currently testing adaptive Metropolis sampling with adaptive stopping on a very difficult scene, and several areas have already converged far faster than with generic path tracing (at least 20x faster).
If it's that intertwined, simply committing it with the current feature set might actually be less work.
The code needs to go through extensive code review first via our Differential system here: https://developer.blender.org/differential/query/open/
For 2.71 it's also too late.
Before submitting for code review, there are still some issues I'd like to fix:
- Adaptive Sampling for GPU: Currently, Adaptive stopping should already work on GPU (could someone test this?), but the sampling doesn't. The sampling of the pixels could be done on the CPU and be transferred to the Kernel or also done on the GPU, but this would require a copy of the noise map.
- Progressive + Adaptive Stopping: Currently, these don't work together. My solution would be that when both are activated, the tolerated error is lowered every iteration.
- Importance Equalisation requires Progressive rendering to work; this should be enabled automatically.
- Metro still ignores Noise, this should also be added.
- Error estimation for Metro seems to be broken.
- The code style and source comments need to be improved.
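For the Progressive + Adaptive Stopping item, lowering the tolerated error every iteration could look like this sketch. The geometric schedule, the `decay` parameter and the function name are assumptions; the thread does not specify how the error would be lowered.

```cpp
#include <cmath>

// Hypothetical schedule: the tolerated error shrinks geometrically with each
// progressive iteration (decay in (0, 1), iteration counted from 0).
inline float threshold_for_iteration(float start_threshold, float decay,
                                     int iteration)
{
    return start_threshold * std::pow(decay, static_cast<float>(iteration));
}
```

With a start threshold of 4 and a decay of 0.5, the first three iterations would use 4, 2 and 1, so each pass refines only the tiles that are still above the tightened tolerance.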
@00Ghz Whether it will be in 2.72 depends on how fast it passes code review. There's a chance, but it's not certain. Regarding GPU Metro: As I said before, it would be possible, but I think it would only be slightly better than PT and not worth the effort. Maybe I'm wrong about this, but GPU Metro will definitely never reach CPU Metro in terms of quality per sample, since it requires more and shorter Markov chains. So: maybe later, if it actually gives a benefit, but it's currently not a priority for me.
@lsscpp It's not hard at all, actually. This would require per-pixel rendering as in the old Blender Internal times, which might even improve cache coherency (and therefore speed). The pixels could be walked along a Hilbert curve for even better cache performance, with one thread rendering one pixel.
The main problem would be pixels falsely being considered converged. With 64 pixels this is less likely, since the errors in the estimation cancel each other out quite well. This might be solved by rendering each pixel ~10 more samples after it has converged and then testing convergence again.
So, to conclude, it would be definitely possible and is an interesting option for future development.
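To illustrate the Hilbert curve idea, here is a minimal Python sketch (not Cycles code; `hilbert_d2xy`, `hilbert_walk` and the square power-of-two tile are assumptions made purely for illustration). It maps a running index to pixel coordinates so that consecutive pixels are always direct neighbours, which is where the hoped-for cache benefit would come from:

```python
def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve to (x, y) on a 2**order x 2**order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the quadrant so the sub-curves join up.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_walk(order):
    """Yield every pixel of a 2**order x 2**order tile in Hilbert order."""
    for d in range((1 << order) ** 2):
        yield hilbert_d2xy(order, d)


# Example: visiting order for a hypothetical 4x4 tile.
for pixel in hilbert_walk(2):
    print(pixel)
```

A real image is rarely a power-of-two square, so an actual implementation would have to skip coordinates that fall outside the frame.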
Another couple of test renders, this time at 720p resolution. Still only using 6 of 8 cores.


Lukas: Okay, thanks for the full rundown of the remaining issues and points that need to be resolved before it gets submitted for review. I assume you know a lot better what needs to be done than what testers are finding.
I can only imagine how much better it will be by then once those things are resolved :)
Also, I think I have found that Metro actually seems to work with adaptive stopping when you're not using the 'progressive refine' mode, at least I had a very quick test render stop on its own after a minute with a high error tolerance.
Ok... I've been following closely and testing every build that has come out here...
And I have to say that this last build has given my CPU ~4 times faster renders than my GPU, whereas in the past my GPU setup gave me 8x faster renders... My CPU is a rather old Intel Q9550 overclocked to 3 GHz, and my GPUs are **two** Nvidia GeForce 550 Ti.
This whole development is incredible; not only the Metro part, but the whole Adaptive Sampling and Importance Equalization give real-life production renders an incredible boost in convergence efficiency!
Huge kudos to you, Lukas! Please, keep up the great work!
Do I read that right: your CPU rendering is now 4 times faster than GPU... instead of 8 times slower??
This means 32 times faster in a CPU-to-CPU comparison!! In other words, a speed-up of about 3100%, or rendering in ~3% of the time!
Ok, let's say that I misunderstand something here...
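The arithmetic behind that estimate can be checked with a few lines of Python. This is only a sketch: the function name is made up, the 8x and 4x figures are simply the ones quoted above, and it assumes the GPU render time itself did not change between builds.

```python
def cpu_speedup(old_gpu_advantage, new_cpu_advantage):
    """Speed-up of the CPU relative to its old self, using the GPU as a fixed reference.

    old_gpu_advantage: how many times faster the GPU used to be than the CPU.
    new_cpu_advantage: how many times faster the CPU now is than the GPU.
    """
    return old_gpu_advantage * new_cpu_advantage


speedup = cpu_speedup(8.0, 4.0)             # CPU vs. old CPU
time_fraction = 1.0 / speedup               # fraction of the old render time
percent_increase = (speedup - 1.0) * 100.0  # speed increase in percent

print(speedup, time_fraction, percent_increase)
```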
Yes indeed, my tiny test showed way lower speed gains. Indeed almost no speed gain for slightly noisy renders. But this is probably my fault.
BTW I read this - http://lists.blender.org/pipermail/bf-cycles/2014-May/001921.html - on the mailing list and wondered if it could be used for some further balancing in the (adaptive) sampling process. Reading Brecht's answer, it sounds like I totally misunderstood the paper though...
Yes, I'm saying that now my "CPU rendering is 4 times faster than GPU... instead of 8 times slower"...
I'm very aware that the scene I rendered was something of a "best case" scenario and that the speed improvement was not that good in all cases, but it was better by 2x at the very least!
Here's a couple of renders that took around 8h (each) on CPU, while they would take around 30-36h (each) on GPU (original resolution 1920x1080).


Using the build lukasstockner97 posted Fri, May 2, 10:11 PM
http://www.mediafire.com/download/1lkr2mqdd521atu/Blender_270_Metro_VC2013_x64.zip
it is only about 30% quicker on my i7 than plain 2.70 Cycles in a simple scene that has a complex "diamond shader" from
http://www.blendswap.com/blends/view/39307
Where can I get newer builds of this?
Thank you

@craigar Currently nowhere, since metro_17 was (up to now) the newest patch. I'll upload an 18 build shortly (most likely tomorrow).
OK, metropolis_18 is finished now. I included the MSVC fixes, so it should work with MSVC as well (unless the new code broke it). CUDA works now; I tested it and everything compiles and runs (no, still no Metro on GPU). The adaptive sampling code now works on CUDA as well, but it gives results that are inferior to CPU and I currently have no idea why (it's still better than no adaptive, though). However, the performance of GPU rendering drops when using adaptive sampling, so in some cases running without it might even give better results in the same time.
All in all, I changed quite a lot in both adaptive features, for example, now the adaptive stopping uses a power mean with p=4 instead of a regular quadratic mean, this gives more influence to the higher errors in the tile. Basically, the higher p is, the more the extremes are pronounced (1 gives an average, 2 the "regular" quadratic mean, and in theory infinity would give a simple max operation). Some other things were also changed, so you probably have to re-tune your parameters.
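As a rough illustration of the power mean described above, here is a small Python sketch. It is not the patch's actual code: `power_mean` and the example error values are made up. It shows how a single noisy pixel dominates the tile error more and more as p grows, approaching the plain maximum.

```python
def power_mean(errors, p):
    """Power (generalised) mean M_p = ((1/n) * sum(e_i ** p)) ** (1/p) of non-negative errors."""
    if not errors:
        raise ValueError("need at least one error value")
    return (sum(e ** p for e in errors) / len(errors)) ** (1.0 / p)


# Hypothetical per-pixel errors of one tile: mostly converged, one noisy outlier.
tile_errors = [0.01, 0.01, 0.01, 0.2]

for p in (1, 2, 4):
    print("p =", p, "->", power_mean(tile_errors, p))
print("max ->", max(tile_errors))
```

With p=1 the outlier is averaged away, while with p=4 the result moves noticeably towards the maximum, so a tile with a few badly converged pixels keeps being sampled for longer.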
The progressive + adaptive stopping combination doesn't work yet, but the changed code layout makes it way easier now.
Importance equalisation now automatically switches on progressive rendering (for it to work well, set a low adaptive interval like 5).
I've worked a bit on code style, but it's still not quite good yet.
Another thing: I removed the adaptive warmup, it now just uses one map interval as the warmup, but I'll probably re-add it.
Metro_19 might take a while since I currently have quite a lot of other things to do, but I'll still follow this thread. Real development will continue around the beginning of June.
[metropolis_18.diff](https://archive.blender.org/developer/F88507/metropolis_18.diff)
I just read the Multiplexed Metropolis paper (from http://cs.au.dk/~toshiya/) and sadly, it's of no use for Cycles at all (currently), since it focuses on choosing the right (s, t) pair for bidirectional path tracing. That's what the authors mean by "Combination of MIS and MCMC", since MIS was originally developed for choosing this pair.
I wouldn't worry about that right now Lukas, I and a lot of others would just be happy to see the adaptive metropolis sampling feature developed as far as you can get it and committed to Master. If you're thinking about adding a bidirectional sampler, then it would be advisable to hold that off until after this patch is completed, one thing at a time :).
Hi Lukas, I am sorry, but _18.diff fails on Linux during patching.
I didn't try to compile.
```
patching file intern/cycles/kernel/kernel_types.h
Hunk #2 FAILED at 200.
Hunk #3 succeeded at 312 (offset 2 lines).
Hunk #4 succeeded at 355 (offset 2 lines).
Hunk #5 succeeded at 852 (offset 2 lines).
Hunk #6 succeeded at 917 (offset 2 lines).
1 out of 6 hunks FAILED -- saving rejects to file intern/cycles/kernel/kernel_types.h.rej
```
Thank you, mib
@mib2berlin Are you sure you're using the current master? I just (5 min ago) pulled the newest changes and metropolis_18.diff applies for me (also on Linux). The error in kernel_types sounds as if you haven't pulled in the baking commit yet, since then one Pass would be missing. If that's not it, could you post the intern/cycles/kernel/kernel_types.h.rej?
I'll also include a new patch, this time in a better format (true patch instead of diff); maybe this one works better (although apart from 2 functions now being inlined, nothing has changed).
By the way: I ran some benchmarks a few days ago, and it really seems that most speed issues are fixed. First of all, when compared with an unpatched master (built with the same compiler etc.), there was no speed difference in classic PT (with all new options off) apart from the usual ±1-2% due to background processes. The Metro sampler that was so slow in the beginning has also caught up quite well: the same scene (an indoor archviz) took 8:31 with PT (no Adaptive) and 8:57 with Metro at the same sample count (200). The problem with 10-second tests is that Metro has to do a "first pass" for the UV and ID channels, which of course influences results when rendering with 5 regular samples (the first pass doesn't count, so if you set 5 samples, it will do first pass + 5 samples). Maybe a check for only doing this when UV or ID passes are activated would be a good idea.
By the way, something I forgot to say until now: Due to the way the multi-threading for Metro currently works, if you specify 10 samples and 4 threads, every thread will run 10 samples on the whole image, resulting in 40 samples/pixel. Therefore, for a fair comparison, you'd have to set 40 samples when rendering PT. I'll change this, but currently that's how it works.
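For anyone setting up a comparison, the bookkeeping described above can be written down as a small Python sketch. The helper names are hypothetical and not part of the patch; the numbers are the ones quoted in this post.

```python
def metro_samples_per_pixel(samples, threads):
    """Samples per pixel Metro currently renders: every thread runs `samples` on the whole image."""
    return samples * threads


def metro_passes(samples):
    """Total passes Metro renders, including the uncounted first pass for the UV and ID channels."""
    return samples + 1


def relative_overhead(reference_seconds, other_seconds):
    """How much slower `other` is than `reference`, as a fraction of the reference time."""
    return (other_seconds - reference_seconds) / reference_seconds


# 10 samples on 4 threads: PT needs this many samples for a fair comparison.
print(metro_samples_per_pixel(10, 4))

# Setting 5 samples actually renders this many passes.
print(metro_passes(5))

# The archviz benchmark: 8:31 with PT vs. 8:57 with Metro, roughly 5% slower.
pt_seconds = 8 * 60 + 31
metro_seconds = 8 * 60 + 57
print(relative_overhead(pt_seconds, metro_seconds))
```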
@Ace_Dragon Well, I do have some ideas what to do after this patch, but BPT is definitely not among them (see http://lists.blender.org/pipermail/bf-cycles/2014-May/001929.html for my reasons) and of course I'll finish this patch first :D
[metropolis_18_n.patch](https://archive.blender.org/developer/F88861/metropolis_18_n.patch)
Build fails for me with metropolis_18_n.patch and latest master:
```
In file included from /home/gandalf3/.Blenderversions/gitbuild/blender/intern/cycles/kernel/kernel_path.h:29:0,
from /home/gandalf3/.Blenderversions/gitbuild/blender/intern/cycles/kernel/kernel_sse3.cpp:38:
/home/gandalf3/.Blenderversions/gitbuild/blender/intern/cycles/kernel/kernel_accumulate.h:79:2: warning: 'bsdf_eval.ccl::BsdfEval::use_light_pass' may be used uninitialized in this function [-Wmaybe-uninitialized]
if(eval->use_light_pass) {
^
In file included from /home/gandalf3/.Blenderversions/gitbuild/blender/intern/cycles/kernel/kernel_sse3.cpp:38:0:
/home/gandalf3/.Blenderversions/gitbuild/blender/intern/cycles/kernel/kernel_path.h:864:13: note: 'bsdf_eval.ccl::BsdfEval::use_light_pass' was declared here
BsdfEval bsdf_eval;
^
cc1plus: some warnings being treated as errors
intern/cycles/kernel/CMakeFiles/cycles_kernel.dir/build.make:126: recipe for target 'intern/cycles/kernel/CMakeFiles/cycles_kernel.dir/kernel_sse41.cpp.o' failed
make[2]: *** [intern/cycles/kernel/CMakeFiles/cycles_kernel.dir/kernel_sse41.cpp.o] Error 1
cc1plus: some warnings being treated as errors
intern/cycles/kernel/CMakeFiles/cycles_kernel.dir/build.make:103: recipe for target 'intern/cycles/kernel/CMakeFiles/cycles_kernel.dir/kernel_sse3.cpp.o' failed
make[2]: *** [intern/cycles/kernel/CMakeFiles/cycles_kernel.dir/kernel_sse3.cpp.o] Error 1
CMakeFiles/Makefile2:5498: recipe for target 'intern/cycles/kernel/CMakeFiles/cycles_kernel.dir/all' failed
make[1]: *** [intern/cycles/kernel/CMakeFiles/cycles_kernel.dir/all] Error 2
Makefile:146: recipe for target 'all' failed
make: *** [all] Error 2
```
@gandalf3 Well, this is a warning that also appears for me when building; however, I haven't modified kernel_accumulate, so it is most likely from master. The line "cc1plus: some warnings being treated as errors" sounds like you have enabled -Werror or something similar. Or are there any errors reported further up? (When you build multithreaded, the actual error might be further up.)
Unfortunately I don't really know what I'm doing here..
I tried configuring with `-Wno-error` and building single threaded, but there was a different error (or maybe the same one, and I just didn't see it the first time if it was further up):
```
In file included from /home/gandalf3/.Blenderversions/gitbuild/blender/intern/cycles/kernel/kernel_math.h:20:0,
from /home/gandalf3/.Blenderversions/gitbuild/blender/intern/cycles/kernel/kernel_types.h:20,
from /home/gandalf3/.Blenderversions/gitbuild/blender/intern/cycles/render/camera.h:20,
from /home/gandalf3/.Blenderversions/gitbuild/blender/intern/cycles/blender/blender_camera.cpp:17:
/home/gandalf3/.Blenderversions/gitbuild/blender/intern/cycles/util/util_color.h: In function 'float ccl::linear_gray_to_tvi(float)':
/home/gandalf3/.Blenderversions/gitbuild/blender/intern/cycles/util/util_color.h:227:23: error: conversion to 'float' from 'double' may alter its value [-Werror=float-conversion]
float log_v = log10(v);
^
/home/gandalf3/.Blenderversions/gitbuild/blender/intern/cycles/util/util_color.h:232:9: error: conversion to 'float' from 'double' may alter its value [-Werror=float-conversion]
log_i = pow(0.405f*log_v + 1.6f, 2.18f) - 2.86f;
^
/home/gandalf3/.Blenderversions/gitbuild/blender/intern/cycles/util/util_color.h:236:9: error: conversion to 'float' from 'double' may alter its value [-Werror=float-conversion]
log_i = pow(0.249f*log_v + 0.65f, 2.7f) - 0.72f;
^
/home/gandalf3/.Blenderversions/gitbuild/blender/intern/cycles/util/util_color.h:239:31: error: conversion to 'float' from 'double' may alter its value [-Werror=float-conversion]
return exp(log_i * log(10.0f));
^
cc1plus: some warnings being treated as errors
intern/cycles/blender/CMakeFiles/bf_intern_cycles.dir/build.make:57: recipe for target 'intern/cycles/blender/CMakeFiles/bf_intern_cycles.dir/blender_camera.cpp.o' failed
make[2]: *** [intern/cycles/blender/CMakeFiles/bf_intern_cycles.dir/blender_camera.cpp.o] Error 1
CMakeFiles/Makefile2:5318: recipe for target 'intern/cycles/blender/CMakeFiles/bf_intern_cycles.dir/all' failed
make[1]: *** [intern/cycles/blender/CMakeFiles/bf_intern_cycles.dir/all] Error 2
Makefile:146: recipe for target 'all' failed
make: *** [all] Error 2
```
It seems you may be right about `-Werror`, but I didn't set it..
@LukasStockner I see you uploaded metropolis_18_n.patch.
I don't have a compiler (nor do I know how to use a C++ compiler, or any modern code compiler) so I need a build for Windows 7 64-bit, or newer files that I can copy into the folders I made for the May 2 build
http://www.mediafire.com/download/1lkr2mqdd521atu/Blender_270_Metro_VC2013_x64.zip
Thank You
Hi again.
I'm also still waiting for the Win64 build with the latest patch version and I'm starting to wonder if plans have switched to wait until iteration 19 is done before another build is made.
Thanks.
Do we really need to be filling up task/tracker pages with requests for builds? There's plenty of that at BA as is. There are no pages here, so it makes it much harder to follow progress when there are tons of "compile for me plz" posts. BA's forum setup makes it a lot easier to work with general discussion, plus there are a lot more people there who can help with building anyway.
Not to mention, compiling Blender REALLY isn't all that hard. There are good instructions in the wiki. If you're good enough with computers to make sense of this thread, and have permissions on your machine to install test builds, you really shouldn't have any issue rolling your own anyhow.
May I suggest that, "as an exception", this gets pushed to the Blender build bot,
bypassing code review? It doesn't break the other Cycles render methods or other stuff.
And we all know those builds are experimental anyway.
As a result, we wouldn't disturb the programmer any more with build questions or failing compilations.
Sure, it might not have been peer reviewed, but is that so important?
It's experimental, and it's the biggest Blender change since, well, ehmm... the introduction of Cycles.
Everyone loves a faster renderer, even if it's still under development.
Also, real problems / suggestions / ideas could then be discussed on the BA forum, and perhaps more people on other forums understand this math as well.
This is not how our development process works. If you want to use it now, you can do it by building Blender on your own.
Also, can we please stop with these "Please I want this" comments? They don't really help here and won't make things happen faster. Thanks!
I compiled new master with metropolis_18_n.patch using mingw64 and cmake.
(I had to disable a few things to get cmake to work correctly. This isn't exclusive to this patch;
I think it's something weird with cmake. Anyhow, I disabled these items, so they are not available in this build:)
- libmv
- bullet
- freestyle
- osl
- CUDA

If you would like to try it:
http://www.mediafire.com/download/4wjp48eo7le802j/Blender_2.7.5_Mingw64_cmake_mertopolis18.7z
I think this conversation should be moved over to the Blender Artists forums. This thread is really for development purposes, and while it is great to see some of the images being produced, the "can someone send me a build", "i need help compiling" and "omg wowz" posts are clogging up the page.
Please continue to post on this thread: http://blenderartists.org/forum/showthread.php?329089-Cycles-MLT-patch
and leave this thread to developers.
*I'm not a moderator or anything on here, I just think developer.blender should be just that... for devs.*
I thought I'd share it here.
Sorry if it's not related, but this Progressive Importance Sampling sounds very promising:
https://corona-renderer.com/blog/research-corner-progressive-importance-sampling/
@LukasStockner,
I ran into some problems merging the diff. Could you diff against master when you get some time? Or provide a git hash that it merges with without difficulties, so I can check that out? Many thanks!
Well, time to get back to the Blender development :D
First of all, sorry for not uploading Windows builds and not answering on the thread anymore, my CUDA toolchain just won't work on Windows and I just had so much other stuff to do...
But, I think I have news that can make up for this: Since, for said other project, I got a GTX780 from NVIDIA, I'm currently working on GPU Metropolis! There basically are two reasons for this: First of all, while reading "Physically based Rendering" and Veach's thesis, I noticed that they both recommend using more, shorter chains, so it seems that I was wrong in assuming that more chains (on GPU) mean less quality. Second, due to my current re-structuring of the patch (moving stuff from util/ into the kernel), it's not much work to add it anymore, so why not just do it?
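To make the "more, shorter chains" trade-off a bit more concrete, here is a toy Python sketch. It is not the Cycles or SmallLuxGPU sampler: the 1D target function, the step sizes and all names are assumptions for illustration only. It runs several independent Metropolis chains with a fixed total sample budget, so the number of chains and their length can be traded against each other.

```python
import random


def run_chain(f, length, rng, large_step_prob=0.3, small_step=0.05):
    """Run one Metropolis chain on the unit interval and return the visited states.

    f must be non-negative; the chain samples proportionally to f.
    """
    x = rng.random()  # uniform start state (a real sampler would seed more carefully)
    fx = f(x)
    states = []
    for _ in range(length):
        if rng.random() < large_step_prob:
            y = rng.random()  # large step: independent uniform proposal
        else:
            # small step: symmetric perturbation, wrapped to stay in the unit interval
            y = (x + rng.uniform(-small_step, small_step)) % 1.0
        fy = f(y)
        # Both proposals are symmetric, so accept with probability min(1, f(y) / f(x)).
        if fx <= 0.0 or rng.random() * fx < fy:
            x, fx = y, fy
        states.append(x)
    return states


def run_chains(f, num_chains, length, seed=0):
    """Run independent chains one after another and pool their states."""
    rng = random.Random(seed)
    states = []
    for _ in range(num_chains):
        states.extend(run_chain(f, length, rng))
    return states


# Same total budget, split into few long chains or many short ones.
few_long = run_chains(lambda x: x, num_chains=4, length=8000, seed=1)
many_short = run_chains(lambda x: x, num_chains=64, length=500, seed=1)
print(sum(few_long) / len(few_long), sum(many_short) / len(many_short))
```

Both runs estimate the mean of a density proportional to x, which is 2/3. On a GPU each chain would map to its own thread, which is why many short chains are the natural fit there.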
So, once it works reasonably, I'll post a patch, hopefully it won't be long...
Regarding the Progressive Importance Sampling: Wow, that's really awesome! I read the paper and it seems quite reasonable and impementable. It's basically a Photon mapping prepass that is not used to generate the image itself, but rather to "train" the path tracer so it knows from where the light comes from and is able to find more important paths. The implementation details are a mix of PT, Photon mapping, irradiance caching and GMM machine learning.
Since it still uses PT for the image generation, it would definitely be useful for Cycles. In fact, since it already uses Lightpath-Generation etc., but no explicit path connection, it's quite a step on the way to BPT without being as hard to incorporate into Cycles. By the way, PIS should work just as well on the GPU as it does on the CPU.
So, to answer @derekbarker, until the PIS paper was posted here, the next feature on my plan have been Lightgroups. Probably I'll still do them next, but PIS now definitely also is on the plan. Lightgroups won't be as involved as Metro (I imagined representing them like scene layers in the GUI, with every activated layer adding one output to the compositor), but the PIS would be quite more work to do. Still, challenges are fun :D
@tychota Yes, the current patch won't merge anymore. 83cdd5 should work, since it's two days older than metro_18_n. Metro_19 will of course be rebased again.
Well, time to get back to the Blender development :D
First of all, sorry for not uploading Windows builds and not answering on the thread anymore, my CUDA toolchain just won't work on Windows and I just had so much other stuff to do...
But, I think I have news that can make up for this: Since, for said other project, I got a GTX780 from NVIDIA, I'm currently working on GPU Metropolis! There basically are two reasons for this: First of all, while reading "Physically based Rendering" and Veach's thesis, I noticed that they both recommend using more, shorter chains, so it seems that I was wrong in assuming that more chains (on GPU) mean less quality. Second, due to my current re-structuring of the patch (moving stuff from util/ into the kernel), it's not much work to add it anymore, so why not just do it?
So, once it works reasonably, I'll post a patch, hopefully it won't be long...
Regarding the Progressive Importance Sampling: Wow, that's really awesome! I read the paper and it seems quite reasonable and implementable. It's basically a photon mapping prepass that is not used to generate the image itself, but rather to "train" the path tracer so it knows where the light comes from and is able to find more important paths. The implementation details are a mix of PT, photon mapping, irradiance caching and GMM machine learning.
Since it still uses PT for the image generation, it would definitely be useful for Cycles. In fact, since it already uses Lightpath-Generation etc., but no explicit path connection, it's quite a step on the way to BPT without being as hard to incorporate into Cycles. By the way, PIS should work just as well on the GPU as it does on the CPU.
So, to answer @derekbarker, until the PIS paper was posted here, the next feature on my plan had been Lightgroups. I'll probably still do them next, but PIS is now definitely on the plan as well. Lightgroups won't be as involved as Metro (I imagined representing them like scene layers in the GUI, with every activated layer adding one output to the compositor), but PIS would be quite a bit more work. Still, challenges are fun :D
@tychota Yes, the current patch won't merge anymore. 83cdd5 should work, since it's two days older than metro_18_n. Metro_19 will of course be rebased again.
Maybe a silly question, but I think it would be perfect to get a branched path version of this. Sometimes (most of the time) the translucent or glossy shaders or SSS are producing the noise, but diffuse isn't. So is it possible for the noise-aware sampling to adjust the branched samples, so it won't spend processor time tracing diffuse rays or anything else that has already converged?
> @tychota Yes, the current patch won't merge anymore. 83cdd5 should work, since it's two days older than metro_18_n. Metro_19 will of course be rebased again.
No, it doesn't.
I did
```
git checkout -b 83cdd5
patch -p1 < pathToMetropolis1_n.patch
```
but I still get plenty of rejections
```
.../device/device_cpu.cpp
.../device/device_cuda.cpp
.../kernel/kernel_path.h
etc
```
Would it be different if i use arc instead of patch -p1 ?
@ThomasDinges Please accept my sincere apologies. From now on I will keep my input here limited to "beta testing/suggestions" for the code in development, if that's appropriate? I won't waste your time via your "automatic email" from this (or any developer) thread.
I am very grateful for all you do for the 3D community, and don't want to be disrespectful or wasteful.
I became a bit "over excited" when I read about Lukas' enthusiasm towards moving forward on implementing GPU, MLT and PIS and should have thought first to send my "cheering" to him from an appropriate forum/channel.
Thank You
About my earlier question whether a per-pixel stop condition is possible, I thought of a workflow like this:
- Each pixel gets evaluated as usual, but now every x samples* a variable gets stored (yes, for each pixel in the tile!). Let's call it *Y*.
- *Y* is the absolute difference between the previously computed pixel value and the new one. (Since every sampling iteration is averaged into the pixel, *Y* will keep decreasing.)
- Here we could set the threshold: if *Y* < threshold, then stop sampling this pixel.
(* this could be every single pass or a user-defined number of passes. Not sure what would be better.)
Not sure if I was clear. English is not my first language, of course.
And I'm not sure if this is a totally absurd coding concept or, as I suspect, a total memory killer.
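A minimal C++ sketch of the per-pixel stop check described above. This is not code from the patch: `PixelState`, `add_sample`, `check_interval` and `threshold` are hypothetical names, and a single colour channel is assumed to keep it short.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical per-pixel bookkeeping; none of these names come from Cycles.
struct PixelState {
    double sum = 0.0;        // accumulated sample values (one channel)
    std::size_t count = 0;   // samples taken so far
    double last_mean = 0.0;  // running mean stored at the previous check
    bool stopped = false;    // true once the pixel counts as converged
};

// Adds one sample. Every `check_interval` samples, Y = |new mean - stored mean|
// is compared against `threshold`; the pixel stops once Y drops below it.
inline void add_sample(PixelState& px, double value,
                       std::size_t check_interval, double threshold)
{
    // Stopped pixels ignore new samples; an interval of 0 is rejected
    // to avoid a modulo by zero below.
    if (px.stopped || check_interval == 0) {
        return;
    }
    px.sum += value;
    px.count += 1;
    if (px.count % check_interval == 0) {
        const double mean = px.sum / static_cast<double>(px.count);
        const double y = std::fabs(mean - px.last_mean);
        // Skip the first check: there is no earlier mean to compare against.
        if (px.count > check_interval && y < threshold) {
            px.stopped = true;
        }
        px.last_mean = mean;
    }
}
```

In this sketch the extra state is a few dozen bytes per pixel and channel, so memory is probably not the main concern. The weaker point is that two successive means can be close by chance while the pixel is still noisy, so a real implementation would likely want a more robust criterion than a single difference.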
Max Consecutive Rejections - 256
Mutation distance - 0.40
Large Mutation chance - 0.40
Num. samples - 1024 using Correlated multi-jitter
Sampling - Equi-Importance
Patch revision - 18

The build I'm using is almost a month old (A MingW build from HolyEnigma), but the results you get from this patch are absolutely stunning with the right settings.
It goes to show that even the worst case scenarios can be rendered now in around 3 days max (with simpler cases surely rendering quite a lot faster than that).
Good work once again, I stand by my previous token award :)
Metropolis Sampling
Max Consecutive Rejections - 256
Mutation distance - 0.25
Large Mutation chance - 0.20
Num. samples - 1532(total until stopping) using Sobol
Sampling - Importance Equalization
Time - 29 hours
Patch revision - 18

Only very minor touch-up was needed. I would really like to see Lukas think about getting this into master, at least as an experimental feature, as soon as he can; the potential improvements here are just too great to see go to waste.
I also noticed that Sobol clearly beat Multi-Jitter in this case. With Sobol, lowering the max rejections doesn't cause a rapid rise in bias, and 256 gives you the best of what both higher and lower values offer.
I can't wait to see the next update and the next build containing it, keep up the good work. :)
Metropolis Sampling
Max Consecutive Rejections - 256
Mutation distance - 0.20
Large Mutation chance - 0.25
Num. samples - approx. 710(total until stopping) using Sobol
Sampling - Importance Equalization
Time - 70 hours
Patch revision - 18

I know, same image as the first one I posted, but it looks to me like the very nature of Sobol eliminates pretty much every issue with bias and incomplete convergence that the Multi-Jitter version had. The time is the same, but the convergence is quite a bit better.
Now I guess it's back to vanilla builds for the time being since the MingW one is getting old, which in turn leaves me hoping we get a nice big update soon, like better adaptive sampling for the Metro integrator. This pretty much solves the issue of trying to render scenes with complex indirect lighting in Cycles.
@Ace_Dragon, for reference, what is your PC hardware (processor speed & cores / GPU / memory)?
It seems to be a complex scene. How many faces is it made of, and how many lights?
And how do these render times compare to the other Cycles render methods?
Regarding noise levels between stopped tiles in adaptive:
Could the sample-count be interpolated inside the tile? This means rendering a different sample count on each side of the tile - when one of the neighbouring tiles is stopped, the surrounding tiles don't render more samples on the connecting side, and the sample count increases towards the other non-stopped tiles.
It could also be done the other way - also tiles that are below the stopping condition render some extra samples towards the borders of non-stopped tiles.
Another idea is to render these extra samples after all tiles have stopped.
this would enable consistent grain/noise levels even at low samples...
Just to show a sign of life here - I'm still working on the patch, but the full Metro rewrite necessary for the GPU is quite some work. It's working by now, but now regular PT crashes :/
Also, from what I can currently see, GPU Metro doesn't significantly outperform the CPU version. Considering I use a GTX780, that's quite disappointing, but I'll try to improve the performance.
On the positive side, the new code is already quite a bit cleaner than the previous version, since now the whole Metro code is in the kernel instead of the device.
Another thing I'll try is to add "regular" pixel filtering for Metro mode. With PT, the current approach (shifting the camera point) works just as well, but I can imagine that for Metro this might be different. Also, this would allow filter functions with negative lobes (Sinc-Lanczos, Mitchell-Netravali etc.).
Lukas; if you're still following up on messages here...
Do you have any updates on this patch? If the GPU portion is looking to be too difficult right now, perhaps you can at least get this done as a CPU-only feature for the initial release and work on the GPU stuff later?
If you can give an answer, that would be great.
Well, yes, I'm still working on it and Metro19 is nearly finished. The only problem remaining is a CUDA alignment problem: it seems that nvcc doesn't manage to align the struct itself. Once this is done (I already tracked it down to a variable, so most likely today), I'll upload it here (rebase to master etc. is already done).
Finally, Metro19 is finished!
So, what has changed?
- GPU Metropolis! After some changes, it's considerably faster than CPU Metro (at least on my system) and has all features that CPU also has.
- To do this, I rewrote (or rather re-organized) the whole sampler, moving it from the device code to the kernel. This is IMO a lot cleaner.
- Importance Equalisation is currently broken, but of course I'll re-add it.
- There is now an option to choose the Chain Number (number of independent samplers). On GPUs, this should be really high (I currently use 16384 or 32768; however, this results in 75MB / 150MB of GPU memory usage for the samplers); on CPUs it should be a multiple of the thread count (1-2 chains per thread are just fine). Basically, for GPU performance, this is the equivalent of tile size.
- I fixed a pretty massive bug in the cooldown phase, but I might have only added it while rewriting, so it probably wasn't present in metro_18
- Error estimation was also rewritten, now again using variance instead of the even-samples pass. Reasons for this are: It's got a solid theoretical basis, there are no correlation issues with Sobol, it works just as well with Metro and it seems more solid. If anyone is interested in the derivation of the actual formulas used, I put in some comments, I hope it's clear enough.
- Basically, it uses the perceptually weighted standard deviation (I call it PWSD), divided by sqrt(N), with N being the number of samples. The reason for this is that the error of MC methods like PT converges as O(1/sqrt(N)), so increasing the sample count by a factor of N lowers the error by a factor of sqrt(N). This correction is necessary so that the error goes to 0 as N goes to infinity.
- You can see the estimated error in the Diffuse direct pass
- There is a "Power mean exponent" setting which can be used to balance between using the average error of a tile and the maximum error. Basically, the higher this value is, the more important the maximum values get. 2 gives a regular average, while in theory infinity gives a pure maximum. However, due to numerical precision issues, going higher than 10 is probably a bad idea. I always use 4, it seems quite balanced.
- The new system seems pretty stable to varying tile sizes, so it works just as well for GPU.
- To see the system at work, just render an image without any option, one just with stopping and one with both options. You'll see that the error distribution gets flatter from image to image.
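To illustrate how a "Power mean exponent" can slide between an average and a maximum, here is a small C++ sketch of the standard generalized (power) mean. The actual formula in the patch is not shown in this thread, so `power_mean` and its arguments are assumptions for illustration only.

```cpp
#include <cmath>
#include <vector>

// Power mean M_p = ((1/n) * sum(e_i^p))^(1/p) of non-negative error values.
// Returns 0 for an empty tile. `p` must be positive.
inline double power_mean(const std::vector<double>& errors, double p)
{
    if (errors.empty()) {
        return 0.0;
    }
    double acc = 0.0;
    for (double e : errors) {
        acc += std::pow(e, p);
    }
    return std::pow(acc / static_cast<double>(errors.size()), 1.0 / p);
}
```

With p = 2 this is the root mean square, i.e. a plain average of the squared errors (the variances, if the inputs are standard deviations). As p grows, the largest error dominates the sum and the result approaches the maximum. Raising values to a large power also overflows or underflows more easily, which fits the advice above not to go much higher than 10.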
Some things to note:
- Since the pass system now uses atomic writes for Metro and atomic float writes are compiler-specific, I can't guarantee that building will work on all compilers. If building fails on one, just tell me and I'll include a #define for it. Currently, I have GCC, ICC and MSVC code, the MSVC however is untested.
- If you have an Intel CPU, consider using ICC since it currently optimizes the Metro code way better (I'll optimize the code itself, but I wanted to get this patch version out). On my laptop (i7-4500M) Cycles runs 2.5 times as fast with it (compared to GCC 4.8, same options). On AMD CPUs, however, this is a bad idea: due to [biased dispatching](http://www.agner.org/optimize/blog/read.php?i=121#49), it's (for me) actually 60% *slower* than GCC.
- Metro seems to sample less densely in the top 10% of the image. Currently, I have no idea why this is the case; I'll look into it.
- The image filtering didn't actually help anything for quality, so I removed it.
- Sorry for the tons of whitespace errors in the diff, I'll clean it up.
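For readers wondering what an atomic float write looks like: floats usually have no native atomic add on the CPU, so the common approach is a compare-and-swap retry loop. The sketch below uses C++11 `std::atomic<float>` as a portable stand-in. It is not the patch's code, which judging by the `AtomicCASF`/`AtomicCASD` macros discussed in this thread relies on compiler-specific intrinsics; `atomic_add_float` is a hypothetical name.

```cpp
#include <atomic>

// Adds `value` to `target` atomically using a compare-and-swap retry loop.
// Returns the value that was stored before the addition.
inline float atomic_add_float(std::atomic<float>& target, float value)
{
    float expected = target.load(std::memory_order_relaxed);
    // On failure compare_exchange_weak reloads `expected` with the current
    // value, so the loop simply retries with the fresh number.
    while (!target.compare_exchange_weak(expected, expected + value,
                                         std::memory_order_relaxed)) {
    }
    return expected;
}
```

Relaxed ordering is assumed to be enough here because only the accumulated sum matters, not the ordering relative to other memory accesses.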
Things still to do:
- Adaptive Metro: It's probably ~20 lines of code, but it's linked to importance equalisation, which as said above is currently broken.
- Progressive stopping mode: My idea for this is to render every tile for one mapping interval, then move them to a priority queue sorted by error and always render the one with the highest error for one mapping interval. Probably also not too much work.
- Considering that the Metro code still was seriously buggy when I decided that tile-wise Metro wouldn't work, it might be interesting to give it another chance.
- For CPU Metro, the status line is freezing.
- For some .blends, Metro rendering freezes when you stop it, with an OpenEXR error on the console.
- Optimization. I did some profiling and there is a lot of speed to gain.
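The progressive stopping idea from the list above can be sketched with a standard max-heap. This is only an illustration under assumptions: `Tile`, `TileQueue`, `schedule` and the `render_interval` callback are hypothetical, and the queue is assumed to be filled after every tile has already been rendered for one mapping interval.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical tile record; `error` is whatever the error estimator reports.
struct Tile {
    int id;
    double error;
};

// Orders tiles so that the one with the highest error is on top of the queue.
struct LowerError {
    bool operator()(const Tile& a, const Tile& b) const { return a.error < b.error; }
};

using TileQueue = std::priority_queue<Tile, std::vector<Tile>, LowerError>;

// Runs up to `steps` scheduling rounds: pop the worst tile, render it for one
// interval via `render_interval` (which returns the tile's new error), and
// push it back. Returns the tile ids in the order they were rendered.
inline std::vector<int> schedule(TileQueue& queue, std::size_t steps,
                                 const std::function<double(const Tile&)>& render_interval)
{
    std::vector<int> order;
    for (std::size_t i = 0; i < steps && !queue.empty(); ++i) {
        Tile t = queue.top();
        queue.pop();
        t.error = render_interval(t);
        order.push_back(t.id);
        queue.push(t);
    }
    return order;
}
```

A multi-threaded renderer would additionally need a lock around the queue, and a tile that is currently being rendered would have to stay out of the queue until its interval finishes.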
The patch is based on commit 49c73f.
[metropolis_19.patch](https://archive.blender.org/developer/F97398/metropolis_19.patch)
Hi Lukas, _19 compiles fine on my system and CPU is working, but I get an error during GPU kernel compilation:
```
/daten/blender-git/build/bin/2.71/scripts/addons/cycles/kernel/kernel_metropolis.h:20:23: fatal error: util_hash.h: Datei oder Verzeichnis nicht gefunden
#include "util_hash.h"
```
(The German message means "No such file or directory".)
Opensuse 13.1/64
Intel i5 3770K
GTX 760 4 GB (Display)
GTX 560Ti 1.28 GB 448 Cores
Driver 331.67
Thank you, mib
@mib2berlin That's really weird since ../util is in both CMakeLists and SConscript. Are you building with CMake or SCons? Could you try a new build from scratch?
Testing this out on MSVC 2013, I made a few changes to get it to compile:

intern/cycles/device/device_cuda.cpp:
```
- offset = align_up(offset, __alignof(rtile.w));
+ offset = align_up(offset, __alignof(int));
```
intern/cycles/kernel/kernel_compat_cpu.h:
```
#elif defined(_WIN32)
- #define AtomicCASD(x, y, z) InterlockedCompareExchange64(x, z, y)
- #define AtomicCASF(x, y, z) InterlockedCompareExchange(x, z, y)
+ #define AtomicCASD(x, y, z) _InterlockedCompareExchange64(x, z, y)
+ #define AtomicCASF(x, y, z) _InterlockedCompareExchange(x, z, y)
```
```
- old = AtomicCASD(address_as_ull, assumed, newV);
+ old = AtomicCASD((volatile long long int *)address_as_ull, assumed, newV);
```
```
- old = AtomicCASF(address_as_ull, assumed, newV);
+ old = AtomicCASF((volatile long *)address_as_ull, assumed, newV);
```
I also had to disable OSL, there was a difficult to trace undefined reference bug.
Very nice work !
Hi lukasstockner97, I tried with a clean build directory but got the same error.
After adding the complete path to util_hash.h, the CUDA kernel starts to build but stops with:
```
Compiling CUDA kernel ...
"/usr/local/cuda/bin/nvcc" -arch=sm_20 -m64 --cubin "/daten/blender-git/build/bin/2.71/scripts/addons/cycles/kernel/kernel.cu" -o "/home/pepo/.config/blender/2.71/cache/cycles_kernel_sm20_D233D0AAE2C08F54D31BCF819203CB63.cubin" --ptxas-options="-v" -I"/daten/blender-git/build/bin/2.71/scripts/addons/cycles/kernel" -DNVCC -D__KERNEL_CUDA_VERSION__=60
ptxas fatal : More than 128 textures used in entry function 'kernel_cuda_metropolis_first_pass'
CUDA kernel compilation failed, see console for details.
Refer to the Cycles GPU rendering documentation for possible solutions:
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/GPU_Rendering
```
Thank you for the fast help, mib
EDIT: The kernel for my GTX 760 sm_30 is compiling and working.
Error is for GTX 560Ti 448 sm_20.
The memory consumption on the GPU is huge, 1.1 GB for BMW!
THX
@hdunderscore Thanks, I'll add it in the next version!
@mib2berlin This error comes from the 128-texture limit on devices with sm20 or lower: since MCQMC uses another texture, one of the image textures has to be moved to the sm30+ code. I forgot to do so since I only use sm30 and sm35, sorry for that. For a quick fix, move the "__tex_image_098" line in intern/cycles/kernel/svm/svm_image.h 2 lines down into the "#if defined ..." block.
Regarding memory usage: Well, the Metro currently requires lots of memory, I am aware of this and will try to reduce it (this falls under optimization). One solution would be to disable lazy sample generation, another one to always re-mutate from the last large-step. This would give performance problems, however, when there is a large chain of rejected large-steps, and especially on GPUs, where every warp has to wait for all threads to finish the mutations, this would probably be a quite bad trade-off.
A quick fix for this is to reduce the bounce count, it nothing else helps and you're getting OOMs, reduce the chain count.
Thank you, with the changes in svm_image.h the GTX 560Ti 448 starts rendering.
I can't render with both cards, and it seems that sometimes the 560 renders and sometimes the 760.
Above 6000 M. Chains the GTX 560 stops working without an error.
The render results are also very different on the GTX 560.
http://www.pasteall.org/pic/73837
http://www.pasteall.org/pic/73838
http://www.pasteall.org/pic/73839
http://www.pasteall.org/pic/73840
Very interesting, cheers, mib
someone requested a build..
here mingw64-cmake 7-12-14 metropolis19
with player, no CUDA or OSL
http://www.mediafire.com/download/84vzfy99a5h6n41/Blender_2.71_mingw64_cmake_7-12-14_metropolis19.7z
Initial thoughts....
- Complex scenes seem to crash the Metropolis sampler now (those that need around a gig of RAM or more); I can't really determine why it crashes. Sometimes it will work with very small image sizes if the image buffer has an image being overwritten instead of being empty, but avoiding the crash is about impossible with larger sizes (tested with the MingW and VC2013 builds). I will also note that all crashes occur when the samples are about to be drawn to the image buffer after the first pass is completed.
- Saving a scene with the Metropolis sampler selected, then exiting and restarting Blender, seems to permanently turn on the *progressive refine* option for that specific scene; at least for me it's stuck in the 'on' position for that scene even when the box is unchecked.
- I don't exactly see why someone would really need the *sampler count* option; running more than one sampler just increases noise and can even lead to progress quickly stopping altogether (check the default cube scene).
So the Metropolis functionality seems to have seen a few regressions as of rev. 19, but the adaptive sampling seems intact after all of the changes (EDIT: Mostly; I just found that a high min-bounce number will also crash Blender on the same scenes that crash Metro).
Hope you can fix these.
EDIT: Updated information after more testing
(hopefully) built a version of R19 with CUDA (and GPU SSS and Volumetrics) compiled with VS2013/cmake. Doesn't have OSL or Player.
It works on my computer but as it is my first build, not sure whether I've zipped everything up that is needed.
http://www.mediafire.com/download/bq551h24xxdym9e/Release.zip
While playing around with settings I noticed something strange using Jemonn's R19 build (thanks) on my i7 octocore (no GPU).
I had a simple blend with 3 lights and some musical notes as objects > https://dl.dropboxusercontent.com/u/54767531/music.blend
It does render a first preview, but later it doesn't update anymore (although the rendering system stays busy).
The settings I used are probably not good for a nice render, but it's strange that the system stayed busy while no improvement was made.
With other settings it worked just fine, so I'm just posting this in case something is going wrong here with Metropolis or the adaptive sampling.
Hey Lukas; I don't know how much it would help, but I found a potentially useful resource that has free .blend file scenes (to verify that the adaptive and metropolis sampling code works with as many cases as possible).
http://www.emirage.org/category/free-stuff/
For example, if any one of these scenes crashes with the metropolis sampler being used, you know that there's a regression or two since Patch 18.
Hi Lukas.
First of all, thanks for all that work, that's cool!
A more constructive remark: I tested @holyenigma's build, and it crashes when I render a scene with Metro and render border enabled.
@jemonn: your build crashes on my computer, a file is missing: "VCOMP120.DLL"
@Lapineige For some reason, it didn't copy it to my build folder, but it still works for me. Anyway, I've uploaded it again, copying the contents of my build into a copy of the 2.71 folder, which should include all the files, but as it works for me, I have no idea whether that fixed everything.
Using my build, it doesn't crash when using metro and a render border is used.
(Hopefully) fixed v19 link: http://www.mediafire.com/download/0jkmzgnk2qim00q/Blender.7z
I've been leaving all the settings at their defaults, are there any suggestions to improve performance? Is leaving sampler count at 0 (auto) okay, or should I manually change it?
Some unfortunate news about the adaptive sampling in rev. 19.
Never mind about it remaining intact from rev. 18, just found that adaptive sampling on a complex scene will also crash Blender when the adaptive stopping kicks in for the first time (providing it takes a while to get there). I can also get Blender to crash early on if I use a high number for the *min bounces* setting (10 or more).
The crashing issues mean the patch is unusable once a scene gets to a certain degree of complexity, so it's back to rendering it in a vanilla Master build for now, I guess. Don't forget that those eMirage scenes can be used for testing your patch and seeing if things crash for you as well.
Sorry about all the crashes...
Metro20 is making progress, in 2-3 days it should be done. The Metro crashes should be gone with it, the adaptive crashes are quite strange. Could you post a crashlog (the ones from the Temp folder), and, if possible, use a debug build so the place where it crashes can be seen better.
For me, Metro19 didn't crash once, even with scenes like the Lego bulldozer or the Pavilion from eMirage. I'll run some Valgrind tests today to search for wrong memory accesses. By the way, is it expected that cuda-memcheck crashes with Blender?
Regarding the GPU Metro problems: With Metro20, Multi-GPU will work, but the error on the GTX 560 is strange. Well, we'll see if it works with Metro20.
The min. bounces is strange, I'll have a look.
Lukas, testing the VC2013 build I have with your patch, there's a chance it may just be MingW being unstable compared to the official platforms (MingW being the only platform Holy Enigma will build for).
I couldn't get the crashes I got early on with the other build in this case, so it may be MingW's fault and not the fault of your code.
If I do indeed find it a false alarm, then I'm sorry for that.
-------
EDIT: Okay, even the use of 64 threads doesn't crash adaptive sampling with the VC2013 build, so it really is just MingW doing its thing as the least stable platform you can possibly build with. Sorry for that.
Another question.
Do you by any chance know what the unit is for the adaptive map update value? I ask because I couldn't really obtain a difference say, between a value of 5 and one of 50, but then I found that much larger values seem to be necessary if I wanted to see a difference when using a lower error tolerance along with the adaptive map.
Perhaps if one could make sure that the value was easy for the user to understand it would be different, say, update the adaptive map every N number of passes instead if it's not being done like that already.
MLT 19 (holyenigma MingW build) gives really good, fast CPU renders (once I messed with it a little and followed Lukas' suggestions) on a "physically" simple scene with very complex "real" spectral/caustic lighting.
A usable "draft" in around 10 minutes: 1920x1080, 50 passes, adaptive sampling at 2.5.
Before using MLT 19 I could NOT get a reasonably noise-free render of this scene any other way; I tried many combinations with both PT and BPT, at least 200 times. I tried MLT 18 and it crashed a lot on other scenes, maybe I'll try it on this scene just to see ....
Image and .blend file here
[BA Thread: Cycles MLT patch](http://www.blenderartists.org/forum/showthread.php?329089-Cycles-MLT-patch/page12)
i7 2600 3.4ghz, Windows 7 64-bit
Rendered the scene above at 4 times the samples. It was at 50, but once it gets to about 70 passes it starts showing "black fireflies". I even tried it on "Suzanne" with the same spectral node AND lots of ambient light: same problem?
Also tried @jemonn's build of 19 with GPU (GTX580 3 GB), and GPU mode is about 30% slower on this file than using the CPU (i7 2600 3.4ghz, Windows 7 64-bit, 12 GB).
Just an FYI: out of the box patch from master to metropolis_19 yields "patch does not apply" and "trailing white space" errors and when compiling:
```
blender\intern\cycles\kernel\kernel_compat_cpu.h(260): error C3861: 'InterlockedCompareExchange64': identifier not found
blender\intern\cycles\kernel\kernel_compat_cpu.h(272): error C3861: 'InterlockedCompareExchange': identifier not found
blender\intern\cycles\device\..\kernel\kernel_compat_cpu.h(260): error C3861: 'InterlockedCompareExchange64': identifier not found (blender\intern\cycles\device\device_cpu.cpp)
blender\intern\cycles\device\..\kernel\kernel_compat_cpu.h(272): error C3861: 'InterlockedCompareExchange': identifier not found (blender\intern\cycles\device\device_cpu.cpp)
blender\intern\cycles\device\device_cuda.cpp(716): error C2143: syntax error : missing ')' before '.'
blender\intern\cycles\device\device_cuda.cpp(716): error C2228: left of '.w' must have class/struct/union type is 'int'
blender\intern\cycles\device\device_cuda.cpp(716): error C2059: syntax error : ')'
```
My machine (using the term lightly):
Windows 7
VS 2013 Win32 (builds master just fine)
CMake
portablegit (Github) with tortoisesvn (idk if it makes a difference)
Hope this helps
Also GPU rendering is really slow for me, might just be my build as @craigar said it was slower for GPU, but there usually isn't this much difference in the speeds.
I've just tried the test file @craigar linked to above and something is really wrong with importance equalisation (but we all knew that anyway).
However, I have slightly different results with my CPU and GPU?
---

i7 4500U - 400 samples, importance equalisation off, 4 chains. (04:53:63)
---
I recently decided to try adding a Metropolis sampler (no BPT, just the sampler) to Cycles to get some experience with the Cycles code layout, but it turned out working so well that I decided to post it here. Due to some problems, it's not releasable yet, but these problems (more below) should not be too hard to fix.

The code is based on the SmallLuxGPU metropolis sampler, after I first tried the PBRT code, but that one didn't seem to work nearly as well.
Basically, it works by bypassing the RNG functions: When they are called and metropolis sampling is selected, they just return a value from a sample array that is stored in the RNG pointer. This allows the sampler to make only minimal changes to the kernel, it even uses the standard kernel_path_integrate. The sampler itself is in the CPUDevice thread.
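To illustrate the idea of bypassing the RNG functions, here is a minimal sketch. It is not the patch's code: `SamplerState` and `next_random` are hypothetical names, and the real Cycles RNG interface looks different. It only shows how a "random number" call can hand out pre-proposed values when Metropolis sampling is selected, so that the path integration code itself does not need to change.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for whatever state sits behind the RNG pointer.
struct SamplerState {
    bool use_metropolis = false;   // true when the Metropolis sampler is selected
    std::vector<float> samples;    // values proposed by the Metropolis sampler
    std::size_t next = 0;          // index of the next sample to hand out
    std::mt19937 fallback{1234u};  // ordinary generator for the non-Metropolis case
};

// Hypothetical replacement for a kernel-side "give me a random number" call.
inline float next_random(SamplerState &state)
{
    if (state.use_metropolis) {
        // Return the next pre-generated value instead of drawing a new one.
        assert(state.next < state.samples.size());
        return state.samples[state.next++];
    }
    // Regular path: draw a fresh pseudo-random number.
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    return dist(state.fallback);
}
```

With this shape, the sampler only has to fill `samples` before each path and read back the result afterwards.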
My current test scene is a simple pool with a modifier-displaced water surface, an absorption volume in the water (works great, by the way), a glass pane on one side of the water and a Sun-HDR-combination lighting the scene.
The upper image is the 2.69 release, while the lower one is with the metropolis patch. Both were rendered in equal time, but the patched version is a debug build.
From the Metropolis image, the biggest problem is obvious: Tiles. The current system requires one sampler per tile, so the seams are easily visible. Also, if one tile has large bright surfaces, the noise outside of those is worse than in other tiles. The solution for this would be one sampler per thread, all working on the whole image (which would require atomics for the buffer writes).
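If several samplers write into one shared image, each buffer write has to be atomic. A minimal sketch of such a write, assuming a buffer of `std::atomic<float>` values (the function names and buffer layout are hypothetical, not taken from the patch), could use a compare-exchange loop:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Add `value` to a shared float using a compare-exchange loop.
inline void atomic_add_float(std::atomic<float> &target, float value)
{
    float expected = target.load(std::memory_order_relaxed);
    // On failure, `expected` is refreshed with the current value and we retry.
    while (!target.compare_exchange_weak(expected, expected + value,
                                         std::memory_order_relaxed)) {
    }
}

// Hypothetical helper: accumulate a contribution into one slot of a shared buffer.
inline void splat(std::atomic<float> *buffer, std::size_t size,
                  std::size_t index, float value)
{
    assert(index < size);
    atomic_add_float(buffer[index], value);
}
```

The loop retries only when another thread changed the value in between, so uncontended writes cost little more than a plain add.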
Another problem is the passes, since they're currently written directly from the kernel. However, in Metropolis sampling, they need to be weighted, but this weight is only available after the kernel is done with the ray. A solution for this would be to return the values to be written to the buffer from the kernel, so that the Device is responsible for storing them. The depth, normal, ObjectID and Alpha passes could be done in a single-pass regular pathtrace.
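Why the weight is only known after the path has been traced can be seen from one common weighting scheme for Metropolis samplers (expected-value weighting). This is a sketch under that assumption; the patch may weight its samples differently, and `SplatWeights` and `metropolis_weights` are hypothetical names. Both weights depend on the importance of the proposed path, which requires evaluating it first.

```cpp
#include <algorithm>
#include <cassert>

// Weights with which the proposed and the current sample are written to the buffer.
struct SplatWeights {
    float accept_prob;  // probability of accepting the proposed sample
    float proposed;     // weight applied to the proposed sample's contribution
    float current;      // weight applied to the current sample's contribution
};

// Both importance values are scalar measures of the path contributions
// (for example luminance); the current one must be positive.
inline SplatWeights metropolis_weights(float current_importance, float proposed_importance)
{
    assert(current_importance > 0.0f && proposed_importance >= 0.0f);
    SplatWeights w;
    w.accept_prob = std::min(1.0f, proposed_importance / current_importance);
    // Each sample is weighted by its acceptance share divided by its importance.
    w.proposed = proposed_importance > 0.0f ? w.accept_prob / proposed_importance : 0.0f;
    w.current = (1.0f - w.accept_prob) / current_importance;
    return w;
}
```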
Also, there currently is a bug that causes standard pathtracing to crash, I still have to find the source of this one.
However, this seems like a promising feature that might be worth the work fixing the problems above.
PS: A one-hour-render of the pool looks like this:
The patch is here: metropolis.diff
Changed status to: 'Open'
Added subscriber: @LukasStockner
Added subscriber: @Ace_Dragon
Cool stuff, I really didn't expect this and it would definitely make for much easier rendering of caustics in Cycles (especially in conjunction with 'filter glossy' to help with those more difficult lightpaths).
By any chance (and this might be some crazy idea), but would it be possible to only apply the metropolis sampler for certain lightpath types (like the ones that create caustic effects), because you can already get away easily with plain pathtracing with plain diffuse bounces or other non-caustic situations (except for cases with tiny lights of course)?
Just an idea to toss around, since Cycles has the functionality needed to obtain information from paths (hence the Light Path node).
I think storm_st should also look at this, perhaps he can move his bidirectional sampling efforts to work off of your code.
Added subscriber: @MatthewHeimlich
Added subscriber: @mib2berlin
Hi, testing the patch now.
I got only black for Glass BSDF, could you add your test file, please?
Looks promising. :)
Cheers, mib.
After patching, preview seems to be broken. Looks like only a couple of samples are updated into the viewport. F12 render works just fine.
Testing on Linux 64-bit. Very excited to play around with this! Thanks for the contribution.
Added subscriber: @MaciejJutrzenka
Would be awesome to have Metropolis in cycles. I even can donate some coder to make it happen :)
Added subscriber: @tuqueque
The broken preview render is probably because every time the CPUDevice thread is called, a new sampler is created with the global integrator seed. This could be fixed by storing the sampler data in the tile, I'll try that once I'm at home.
Added subscriber: @tychota
Removed subscriber: @tychota
Added subscriber: @tychota
Edit: Sorry for the doublepost.
Added subscriber: @ThomasDinges
I don't want to spoil the fun here (results look great), but isn't SmallLuxGPU GPL code?
I know that they relicensed the LuxRays code recently to Apache 2.0, so if your work is based on that, it's fine.
Some clarification here would be appreciated (link to the sources this patch is based on).
Added subscriber: @lsscpp
Tested on vc2008 (Windows) now, and when I just switch to the Metropolis sampler, I get a mostly black/wrong image. Both Preview and F12.
Added subscriber: @sanne
The code is based on 48e44c150f/src/slg/sampler/sampler.cpp, which, according to the header, is Apache 2.0 licensed. I'll upload my test file as soon as I'm on my PC. I'm working on Linux x64, but I could also test it on Windows, although there shouldn't be any platform-dependent code in there.
Thanks for the clarification, good to hear. :)
Tested the patch on Mac OS now, with clang compiler. Same result.
I just open one of our test files: https://svn.blender.org/svnroot/bf-blender/trunk/lib/tests/cycles/ and switch to the MLT sampler.
Render is different, lot of errors.
(color_ramp.blend in this example).
Added subscriber: @marcog
@LukasStockner: I found the issue.
Your Metropolis code in device_cpu.cpp comes after the optimized kernels (AVX, SSE41....).
I added a quick #if 0 around those, so at runtime those are skipped and the non-optimized kernel (with your MLT code) gets used. :)
Edit: Ran some more tests now. Caustics are better with MLT, but other things (diffuse surfaces, background) are much more noisy. Good start though.
CC'ing @brecht, I guess he will find this interesting. :)
Ah, ok. I disabled them in my scons config, forgot about that -.-. Glad that it works now.
The overall noise in the image is pretty much expected, but a few tricks (clamping importance, user-provided importance map, noise-aware sampling) might help out with that. Also, Metropolis sampling is naturally better suited for long rendering times instead of quick previews.
Added subscriber: @FlorianRichter
Ok, so here's my test scene. I replaced the HDR with a sky node because it's too big to upload here.
Testscene_Pool.blend
Another heads up: Using Filter Glossy seems to break rendering, not sure what's going on. I'm going to try to build on my Windows desktop now, been limited to a single core on my laptop so far.
New patch version, the SSE/AVX builds work now too, they just aren't executed if Metropolis is selected (of course, I will later add code for Metropolis to also work with those). Preview is still not fixed; the RenderTiles seem to be replaced every iteration, and I still have to find a way around this (storing the sampler in the CPUDevice doesn't work either).
The first patch is incremental from the old one, the second one is from trunk.
metropolis_1_to_2.diff
metropolis_2.diff
Added subscriber: @Lockal
Added subscriber: @JasonClarke
New patch version, render passes and SSE/AVX works now.
The UV, Normal, ID and Alpha passes are done in a one-sample path tracing prepass, while all other passes are done with the metropolis sampler.
SSE and AVX is now used for metropolis rendering as well.
The pure pathtracing doesn't crash anymore, apparently I fixed the bug along the way.
@MatthewHeimlich: Could you please upload a test file where Filter Glossy crashes, for me it works fine...
The next steps are now Preview rendering and trying out the Quasi-random extension in the Metropolis sampler of the regular LuxRender (no code copying this time, since this code is GPL)
Patch 2 to 3: metropolis_2_to_3.diff
Trunk to Patch 3: metropolis_3.diff
Hi Lucas, I get a crash with metropolis_3.diff and fed1b8b:
0x00007ffff2850849 in raise () from /lib64/libc.so.6
BlankComparisonScene_cycles.blend
Cheers, mib.
Open file, change to Metropolis, F12.
Mib
@mib2berlin Thanks for the report, however, I can't reproduce it, even with fed1b8b and your file (by the way, two image textures were missing, but I can't imagine that they caused the bug...). Could you please build your Blender with debug info and post a backtrace again?
Either way, metropolis_4 should be ready soon; stopping the render now works again and the MCQMC extension works quite well, too.
Thanks, it does not crash with a debug build; I get a crash again with a release build, but with a different BT.
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffbffff700 (LWP 32593)]
0x00007ffff2894a82 in malloc_consolidate () from /lib64/libc.so.6
(gdb) bt
(gdb)
Opensuse Linux 13.1/64
Intel i5 3770K
GTX 760
GTX 560Ti 448 Cores
Driver 331.20
Build
482823a
Maybe it is Linux only.
Cheers, mib.
Okay, this definitely looks like an out-of-bounds memory access, I'll look at it in Valgrind.
I'm on Linux too (Mint 16 64bit), so this shouldn't be a problem.
New patch version, MCQMC (Quasi-random numbers in the Metropolis sampler) works now. Due to the lack of reference images I'm not exactly sure whether it's an improvement or not, but it's definitely not worse. Also, the render is stoppable again.
Regarding the memory bug: I fixed an out-of-bounds error (it didn't generate enough random numbers), however, I'm not sure whether this was the bug that caused the crash. Debugging this in Valgrind is notoriously difficult since it only appears sometimes and Blender in Valgrind is sloooow (30min and upwards for one iteration!).
I tried to bypass the Tile system when using metropolis, but it's such a central part of the Cycles code that this didn't work out at all. Instead, preview rendering now uses standard Path Tracing and F12 rendering still uses Tiles, this won't be as straightforward as I have thought...
metropolis_4.diff
Hi Lucas, 4.diff gives the same error as in my last post.
It crashes on every scene I test.
If I switch to Progressive Refine, it works.
With your patch it is not possible to render with the GPU; it gives a CUDA error.
But that is not important.
Thank you for your work, mib.
@LukasStockner, you should try GCC Address Sanitizer, it is very fast, one can use it even with RelWithDebInfo or inside gdb. Here is the crashlog:
{P11, lines=15}
So it calls float4 operator+(const float4& a, const float4& b) inside kernel_write_pass_float4 with a null reference.
Does someone maybe have a link to a build for Windows 7 64-bit? I would like to run some tests.
New patch version, no new features this time, instead I fixed another bug and added a check for a possible error source.
The GCC Address Sanitizer showed only the bug I fixed, now all scenes I tested (including BlankComparisonScene_cycles.blend) run error- and crash-free, even with the Address Sanitizer. If Blender still crashes for someone, please try compiling with the Address Sanitizer (if possible, with a debug build) and post the output here.
The GPU building bug was fixed, but there may be other issues remaining. Sadly, I can't test it since my GPU is still Compute Capability 1.3. However, the metropolis sampler is currently CPU-only, and this will probably not change so soon, since Metropolis sampling, in contrast to Sobol sampling, is inherently a sequential algorithm. The only way to work around this is using many sampling chains in parallel (basically, this is ERPT), but this isn't nearly as efficient as using only a small number of chains on the CPU.
Apart from bugfixes, the next steps will be the Tile system and better importance functions (some Noise-detecting algorithm could be useful, like the one described in http://graphics.cs.illinois.edu/papers/importance). Also, user-defined importance maps seem quite useful, but these would need a good UI. By the way, this should also work for non-metropolis sampling.
metropolis_5.diff
Hi Lucas, no more crashes with the last patch with all my test files.
CUDA is working with my CC 2.0 and 3.0 cards.
Nice progress, thanks.
I got a message in the terminal during render:
But render fine.
Cheers, mib.
Hi, this error message wrote 19 GB into my .xsession-errors-:0! :)
The hard disk (SSD) was full after some test renders.
Cheers, mib.
Whoa, sorry about that :/ But this at least shows where the error lies: Somehow, the sampler goes outside of the image, I'll look at it. By the way: Is it possible that you use an image mutation range > 1 ?
Hi Lucas, I use Metropolis default settings in all tests.
I now got a crash in a debug build:
I tried with Address Sanitizer but got scrambled output because of the error message.
Hope this helps to catch the bug.
Cheers, mib.
Ok, so I found another bug, this time it was related to floating-point math. In the Mutate function, there is code like
, so I thought it would always be 0 <= x < 1. However, apparently x is sometimes so slightly under 0 that the if is executed, but x + 1 gets rounded up to 1, which causes code like (int) (x * width) to sometimes round up to width, which then causes an out-of-bounds access.
To fix it, just add
after line 301 in intern/cycles/device/device_cpu.cpp (This is just too small to make a metropolis_6.diff for it)
Regarding the error message: I have failed massively, I forgot the Negation in there, so the line got printed out when everything worked fine -.-
To remove it, just delete
, also in device_cpu.cpp
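The rounding problem described above can be reproduced in isolation. The following sketch is not the code from the patch (the actual lines are not quoted in this thread); `wrap_naive`, `wrap_guarded` and `to_pixel` are hypothetical names, and the guard shown is just one possible fix. In single precision, adding 1 to a tiny negative value such as -1e-9 rounds to exactly 1.0, so the pixel index becomes `width`.

```cpp
#include <cassert>

// Naive wrap: negative values are shifted up by one.
// For tiny negative x, x + 1.0f rounds to exactly 1.0f.
inline float wrap_naive(float x)
{
    if (x < 0.0f)
        x += 1.0f;
    return x;
}

// Guarded wrap (one possible fix): if rounding produced 1.0f or more,
// wrap around to 0.0f so that the result stays inside [0, 1).
inline float wrap_guarded(float x)
{
    if (x < 0.0f)
        x += 1.0f;
    if (x >= 1.0f)
        x = 0.0f;
    return x;
}

// Map a coordinate in [0, 1) to a pixel column in [0, width).
inline int to_pixel(float x, int width)
{
    assert(width > 0);
    return static_cast<int>(x * width);
}
```

With the naive version, `to_pixel(wrap_naive(-1e-9f), 640)` yields 640, one past the last valid column, while the guarded version stays in range.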
New patch version, this time I included the bugfixes described above and fixed another quite serious bug in the sample contribution, now the rendered images are way smoother and diffuse areas converge faster.
Also, as a test for noise-based Importance Sampling, I added a perceptual noise pass as described in the paper I posted recently (In fact, currently I replaced the Mist pass as my new pass didn't work, but that's just a temporary solution), both to the Metropolis sampler and to the standard path tracer. Especially for the PT, the results are really great. As an example, I rendered the Lego Bulldozer from http://www.blendswap.com/blends/view/72124 and these are the results:
To test it, just activate the Mist render pass and multiply it by some small value in the compositor since the output is usually > 1.
The patch is here, but as there seem to be conflicts with the current trunk, I recommend applying it to 3d8c106, since my local repo uses that one as origin/master.
metropolis_6.diff
Here's a test of the noise map generator with some pokemon models I have lying around:
Seems to be working pretty well. Highlighting shadows, edges, furry spots, etc.
Don't know what I did wrong, but with metropolis_6.diff and 3d8c106 all shadows and reflections are wrong for me. But previous versions of this patch gave plausible results (except for dark rectangles).
I now rebased to the current trunk, everything seems to work fine. @Lockal, please try this one out as well, for me reflection/refraction works just fine (I really need a second test system, somehow bugs never show up on my system...)
Also, this version includes D301 as a first step in noise-adaptive sampling.
metropolis_7.diff
reflection/refraction working fine here on both patches. Something I should've pointed out before: In intern/cycles/util/util_color.h line 240, Clang/OS X doesn't like
Jens Verwiebe pointed out to me in IRC it works with:
So I had it built with that.
Also, on adaptive sampling: While I won't claim to know how they work behind the scenes, the adaptive samplers in Vray and Mental Ray seem to work fine running on each tile individually. Might it be possible to try this with Cycles? (maybe only updating the noise map every n number of samples?)
Thanks for the exp10 thing, of course you're right (I think exp(log_i * log(10)), as described in the GNU libc documentation, should be even faster since log(10) is constant).
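A minimal sketch of the portable replacement mentioned above, assuming the only goal is to avoid the non-standard `exp10` call; the function name `exp10_portable` is hypothetical and the actual line in util_color.h is not reproduced here.

```cpp
#include <cmath>

// Hypothetical portable stand-in for exp10(x): 10^x == exp(x * ln(10)).
// ln(10) is a compile-time constant, so only one exp() call remains.
inline float exp10_portable(float x)
{
    const float LN10 = 2.302585092994046f;
    return std::exp(x * LN10);
}
```

The result can differ from a native `exp10` in the last few bits, which should be harmless for a perceptual weighting term.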
Regarding adaptive sampling: Basically, adaptive sampling in the tile works as well. However, consider this: The left half of the image is nearly noise-free, while the right half is very noisy. Now, with adaptive sampling, you want the right side to receive more samples. However, when the right side is rendered, the left side is possibly not rendered yet, so there is no way to do that. So, if you implement it that way, the samples can only be adaptively distributed inside the tile, but not between tiles.
Indeed, your last sentence is quite what I also intend to do, this is the reason for D301: Rendering, for example, 10 samples with a tiled approach on the whole image and then creating a noise map that is used for inter- and intra-tile sample distribution which is then used for the next 10 samples.
Guys, you are doing such a good job! If you need some test machines, I have an i5 laptop and 2 MacBook Pro laptops. They are totally free, so I can set up some long renders; just give me the link to download and say what to do :>
How did you plan the adaptive sampling to work? Adaptively distributing a fixed number of samples across the image? Or setting a range of possible AA samples and letting each tile cut off where needed in that range? Because Vray/MR do the latter, and that avoids the problem of some tiles needing more samples than others. For example:
You set a min and max number of AA samples, and some target noise threshold. Once a tile reaches the min value, it checks noise level every n samples, and stops upon it either falling below the threshold, or hitting the max AA samples value. This way you can keep the coherency of individual tiles, but still let some tiles have far more samples than others.
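The stopping rule described above could be sketched roughly like this. The struct and function names are hypothetical, and how the per-tile noise value is measured is left open; this only encodes the min/max/interval/threshold logic.

```cpp
// Hypothetical per-tile settings for the min/max AA scheme described above.
struct AdaptiveSettings {
    int min_samples;        // never stop before this many samples
    int max_samples;        // always stop at this many samples
    int check_interval;     // test the noise level every n samples after min
    float noise_threshold;  // target noise level
};

// Returns true when the tile may stop sampling.
inline bool tile_should_stop(const AdaptiveSettings &s, int samples_done, float tile_noise)
{
    if (samples_done >= s.max_samples)
        return true;   // hard upper limit reached
    if (samples_done < s.min_samples)
        return false;  // still in the mandatory phase
    if ((samples_done - s.min_samples) % s.check_interval != 0)
        return false;  // not a check point, keep sampling
    return tile_noise < s.noise_threshold;
}
```

Because each tile decides on its own, tile coherency is kept while difficult tiles can still run up to `max_samples`.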
Oops, with F12 render everything is ok (except for tiles, of course). But "rendered" viewport mode has very obvious problems. I did a full recompilation with metropolis_7 patch (gcc 4.8, linux x86-64), but the problem is still there.
@MaciejJutrzenka Currently, you have to build it from source (see http://wiki.blender.org/index.php/Dev:Doc/Building_Blender), but between downloading the source and building you have to apply the metropolis_7 patch from above.
@JasonClarke The latter one is actually a quite great idea, this would even add a stopping criterion that could be useful for renderfarms, animation rendering etc. The only question is whether the average noise of the tile or the maximum noise in the tile is considered. Alternatively, we could go the LuxRender way and add a user-provided percentage of pixels that has to pass the test.
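To make the three candidate criteria concrete (mean noise, maximum noise, or a user-provided fraction of passing pixels), here is a small sketch. The enum, the function name and the default fraction are assumptions for illustration and are not part of the patch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical selector for the per-tile convergence test.
enum class TileCriterion { MeanNoise, MaxNoise, FractionBelow };

// pixel_noise holds one noise estimate per pixel of the tile.
inline bool tile_converged(const std::vector<float> &pixel_noise,
                           float threshold,
                           TileCriterion criterion,
                           float required_fraction = 0.95f)
{
    if (pixel_noise.empty())
        return true;

    float sum = 0.0f, max_noise = 0.0f;
    std::size_t passed = 0;
    for (float n : pixel_noise) {
        sum += n;
        max_noise = std::max(max_noise, n);
        if (n < threshold)
            ++passed;
    }

    const float count = static_cast<float>(pixel_noise.size());
    switch (criterion) {
        case TileCriterion::MeanNoise:     return sum / count < threshold;
        case TileCriterion::MaxNoise:      return max_noise < threshold;
        case TileCriterion::FractionBelow: return static_cast<float>(passed) >= required_fraction * count;
    }
    return false;
}
```

The mean is forgiving towards a few noisy pixels, the maximum is the strictest, and the fraction test sits in between.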
My original idea was to use code like in the EnvMap importance sampling to map the uniform sample values to noise-accordingly distributed ones. This might be added as an option.
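The mapping from uniform sample values to importance-distributed ones can be sketched with a 1D CDF and a binary search, similar in spirit to how environment maps are usually importance sampled. This is a simplified 1D illustration with hypothetical names, not the Cycles EnvMap code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Build a normalized CDF from non-negative per-pixel importance values.
// Falls back to a uniform distribution if all importance values are zero.
inline std::vector<float> build_cdf(const std::vector<float> &importance)
{
    std::vector<float> cdf(importance.size());
    float total = 0.0f;
    for (std::size_t i = 0; i < importance.size(); i++) {
        total += importance[i];
        cdf[i] = total;
    }
    for (std::size_t i = 0; i < cdf.size(); i++)
        cdf[i] = (total > 0.0f) ? cdf[i] / total
                                : static_cast<float>(i + 1) / static_cast<float>(cdf.size());
    return cdf;
}

// Map a uniform u in [0, 1) to a pixel index distributed according to the CDF.
inline std::size_t sample_cdf(const std::vector<float> &cdf, float u)
{
    assert(!cdf.empty());
    // First entry strictly greater than u; zero-importance pixels are skipped.
    std::size_t idx = std::upper_bound(cdf.begin(), cdf.end(), u) - cdf.begin();
    return std::min(idx, cdf.size() - 1);
}
```

For importance values {1, 0, 3}, a quarter of the uniform range lands on the first pixel, none on the second and the rest on the third.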
Do both! Use the importance sampling within the tile to hit the threshold faster. As far as a avg noise vs max noise vs changed pixels vs pixels below threshold, I don't really know. Might be best to just give several options so people can test. After running it through some scenes, you could hide/disable modes that prove unreliable.
@LukasStockner Seems a bit complicated. Is there no ready-made build with this for Windows 64-bit?
Added subscriber: @ClaasKuhnen
lukasstockner97,
Do I read this right that there might soon be a "noise-based Importance Sampling" so Cycles spends more time where the actual noise is and not where the result is already clean?
That would be terrific!
@ClaasKuhnen Yes, that's currently the plan...
@MaciejJutrzenka Sadly, not yet.
@JasonClarke The only thing missing for this is an actual noise estimate since the current output is in fact a variance estimate (more precisely, a visually weighted RMS estimate), not a noise estimate. Basically, its value is proportional to the difficulty, not to the remaining noise. On the one hand, this is great, since we can sample directly from it, but on the other hand to determine the noise left in a tile we'd need a separate estimate, although for this easier methods are available.
I'm currently working on adaptive sampling, I'll post a new patch once it works well enough.
Added subscriber: @FilipPolbratt
Added subscriber: @lopataasdf
@ lukasstockner97
Regarding difficulty vs remaining noise, what about the idea shown here? (http://blenderartists.org/forum/showthread.php?236453-Measuring-Noise-in-Cycles-Renders&p=1985872&viewfull=1#post1985872)
Added subscriber: @bned
Added subscriber: @gandalf3
New patch version, adaptive sampling works now in PT mode, the Noise-Aware Metropolis isn't too hard now as well.
I haven't added a stopping criterion yet, this will require a redesign of the Tile Manager since currently whole Cycles is based on the assumption that the samples per pixel is the same for every pixel (at least in a tile).
The current patch works around this by using a Sample number pass that is used in render/buffers.cpp (you can see the values of the pass in the shadow pass, which I currently use until I figure out how to output a new pass).
Also, at the moment adaptive sampling is only inside of individual tiles, every tile gets the same amount of total work (this will change in the future, I plan to distribute samples to the tiles according to their mean importance). So, if you have one tile with high variance in the whole tile and one with basically zero variance, they will still be sampled equally.
Under the performance options, you can check adaptive sampling. There are two options: Adaptive Warmup, which sets the number of uniform samples taken per pixel to estimate importance before the adaptive sampling starts. Don't set this too low (under ~10) or it might miss difficult regions. The second one is the importance map interval, which is the number of samples taken until a new importance map is calculated. Too low values might give a significant performance hit.
Speaking about performance: The code is not optimized yet, in particular, the 4-pixel gaussian blur on the importance map is probably a big performance hit (maybe a simple box filter would suffice?). Also, when using progressive mode, a new map is computed every sample, so it's probably quite slow.
GPU is not included yet, but, in contrast to Metropolis, adaptive sampling should be possible there too.
@lsscpp This might work, using a more complex visual difference predictor would probably be even better, but they tend to be much slower than the approach you posted. An alternative is just a fixed samples/unit of importance value, I think I'll add an option to choose between stopping criteria.
metropolis_8.diff
Wow! Can't wait to test it (whenever a build appears on graphicall). About more advanced difference predictors, I remember I posted a link somewhere to a paper named something like "Entropy variance". That could be something worth looking at.
I actually did a search... Google this: "entropy based adaptive sampling paper"
Quick test on the Mike Pan BMW scene:
No adaptive sampling (2:39):
Adaptive sampling on (warmup=25, map update=25; 3:03):
@JasonClarke OK, this isn't much improvement. What rendering settings did you use (sample number, tile size)?
@lsscpp Once the adaptive sampling works in Metropolis mode, I'll post Windows and Linux (both x64) builds on graphicall, for other platforms (x86 and Mac) somebody using them would have to post a build. Maybe once these are online we should put a link into blenderartists to get some beta testers. Concerning the entropy paper: It certainly looks interesting, do you have any information regarding realtime performance?
The rest of the settings on that test were the defaults for the BMW scene, so 128x64 tiles, 200 progressive samples. I'm not sure it's realistic to expect much better when we still have the same number of AA samples on all tiles. Large tile sizes have a performance hit of their own in CPU mode, so just making them bigger isn't realistic either. I think we really need to wait until there's an option to stop some tiles before max samples (that way you can just pad out the max AA value and only use it on the tricky tiles).
@LukasStockner no, unfortunately i have no clue about performance
@JasonClarke can you please make another BMW test with no adaptive sampling, letting Cycles run for 3:03 as well, so we can see how much better the algorithm distributed the samples in the same amount of time?
New patch version, this time with the focus again on Metropolis. The last patch broke it due to the samples pass, now I fixed the bug. Also, there was another pretty serious one where samples were written to the wrong pixels, this is fixed now too. Preview rendering should work now as well.
Noise-adaptive sampling in Metropolis doesn't work yet, but if you check adaptive sampling, it uses another trick from the Importance-Sampling paper I posted a while ago: The importance function is divided by the current brightness of the pixel, so that the samples are distributed more evenly (by default, the Metropolis sampler samples according to brightness). By doing so, it focuses more on good lightpaths instead of just bright regions, you can see this in the Shadow (amount of samples) and Emission (Importance) channels (by the way, sorry for overriding all the default channels...). I haven't run extensive tests with this one yet, but it seems as if it gives a nice enhancement.
Regarding tests: I rendered the Sintel hair scene from http://www.blenderguru.com/videos/how-to-render-hair-with-cycles/ in two instances of blender, one with Metropolis and one without. Both had 2 threads and two tiles, one at the top half and one on the bottom half. After 2 hours, these were the results:
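As a rough illustration of dividing the importance by the current pixel brightness, consider the sketch below. The function name and the `epsilon` floor are assumptions added here to avoid a division by zero on black pixels; the patch may handle that case differently.

```cpp
#include <algorithm>

// Hypothetical equalized importance for a Metropolis sample.
// path_luminance:       brightness of the proposed light path
// pixel_mean_luminance: current estimate of the pixel it lands on
// epsilon:              assumed lower bound so dark pixels do not blow up
inline float equalized_importance(float path_luminance,
                                  float pixel_mean_luminance,
                                  float epsilon = 1e-3f)
{
    return path_luminance / std::max(pixel_mean_luminance, epsilon);
}
```

A path of luminance 2 landing on a pixel with mean luminance 4 gets importance 0.5, so bright regions no longer attract samples just for being bright. In a Metropolis sampler the per-sample contribution would have to be weighted by the inverse of whatever importance is used, otherwise the image would be biased.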
Metropolis:

Pathtracing:

As you can see, the Metropolis version has remarkably less noise, especially around the neck and the lower part of the hair. Also, this shows that Metropolis converges to the correct solution.

Another test is the pool scene, also 2 threads and 12 hours (probably less would have been enough):
The next patch might take 1-3 weeks since I have to work on another project for school (it's also rendering/CGI, so I won't get out of practise), but then the Tiling issues should be gone.
metropolis_9.diff
It turned out I forgot to save some code before creating the diff, so here is the correct version:
metropolis_10.diff
Originally I wanted to release a new patch once all the tiling stuff is done, but I randomly found that a small change in the sample number pass causes the noise on diffuse areas/background to be completely gone. For example, a plane under a sky background is now nearly noise-free in a single pass.
Also, Metropolis sampling is now tile-free, every thread works on the whole image. This means that they can now also share the mean importance calculation which caused the brightness differences between tiles.
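When every thread can write to any pixel, the buffer accumulation needs to be atomic. A minimal sketch of an atomic float add using a compare-and-swap loop follows; it assumes the buffer element is a `std::atomic<float>`, which is a simplification for illustration and not how the Cycles render buffers are declared.

```cpp
#include <atomic>

// Atomically add value to target using a compare-exchange loop.
// On failure, compare_exchange_weak reloads the current value into expected,
// so the sum is recomputed from the fresh value on the next iteration.
inline void atomic_add_float(std::atomic<float> &target, float value)
{
    float expected = target.load(std::memory_order_relaxed);
    while (!target.compare_exchange_weak(expected, expected + value,
                                         std::memory_order_relaxed)) {
        // retry with the updated expected value
    }
}
```

Relaxed ordering is enough here under the assumption that the buffer is only read after all worker threads have been joined or otherwise synchronized.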
Somehow, screen-wide sampling seems to break automatic EXR writing, for example, when rendering the Sintel hair scene, after finishing/cancelling the render, Blender freezes with an OpenEXR error on the console, I'll look into this further.
Adaptive sampling is still only per tile, also the Division-By-Mean-Brightness was disabled for now. Both will be fixed in a later patch version, of course.
Just to give an impression of what the Patch is capable of now, consider this scene:
That's the Villa scene from PBRT, imported into Blender with a quick and hacky pbrt-to-obj converter and rendered over the night. The materials are re-done with Cycles. Nearly every surface in this scene has a glossy component, even the wood and the walls/ceilings. The light comes from small emitters completely behind glass, two point light sources inside of the spherical things on the ceiling and, although only a very small amount, from the TV. For pure Pathtracing, this scene is basically worst-case, while with Metropolis even the area to the left, illuminated through three glass panes, is nearly noise-free. The noise remaining in the back section of the room should get better once the Division-By-Mean-Brightness (this name is horrible...) works again. The dark artifacts around the light sources come from tonemapping.
This is the classic pool scene, rendered in just 8min.
metropolis_11.diff
Hi, I would like to test the new development but got build error.
Linking CXX static library ../../../lib/libbf_intern_cycles.a
[ 84%] Built target bf_intern_cycles
[ 84%] Built target cycles_bvh
[ 84%] Building CXX object intern/cycles/device/CMakeFiles/cycles_device.dir/device_cpu.cpp.o
In file included from /daten/blender-git/blender/intern/cycles/device/device_cpu.cpp:45:0:
/daten/blender-git/blender/intern/cycles/device/../util/util_metropolis.h:21:33: fatal error: kernel/kernel_types.h: Datei oder Verzeichnis nicht gefunden
#include <kernel/kernel_types.h>
compilation terminated.
make- [x]: *** [intern/cycles/device/CMakeFiles/cycles_device.dir/device_cpu.cpp.o] Fehler 1
make- [x]: *** [intern/cycles/device/CMakeFiles/cycles_device.dir/all] Fehler 2
make: *** [all] Fehler 2
Opensuse 13.1/64
Intel i5 3770K
GTX 760
GTX 560Ti 448 Cores
Driver 331.49
Blender
3c3c2cd
Thank you, mib.
@LukasStockner
8 minutes on CPU / GPU? That is amazing! How long will it take to make it into, let's say, the daily builds?
@mib2berlin
Sorry, my fault, it seems like you have a remarkably strict compiler^^
metropolis_12.diff, that should fix it.
@MaciejJutrzenka
8 Minutes on a FX8350, I'm glad you guys like it. Regarding the builds, for this it has to be accepted into trunk, but IMO it's not ready for code review yet, the features aren't complete yet and the codestyle is rather messy. But once I get the Windows build system running, I'll post Windows/Linux builds on graphicall. Sadly, I can't build for Mac.
Nope, it is really strict. ^^
gcc (SUSE Linux) 4.8.1
Linking CXX static library ../../../lib/libbf_intern_cycles.a
[ 84%] Built target bf_intern_cycles
[ 84%] Built target cycles_bvh
[ 84%] Building CXX object intern/cycles/device/CMakeFiles/cycles_device.dir/device_cpu.cpp.o
In file included from /daten/blender-git/blender/intern/cycles/device/device_cpu.cpp:45:0:
/daten/blender-git/blender/intern/cycles/device/../util/util_metropolis.h:21:33: fatal error: kernel/kernel_types.h: Datei oder Verzeichnis nicht gefunden
#include "kernel/kernel_types.h"
compilation terminated.
Thanks for fast reply, mib.
OS X/Clang is complaining about something else:
Compiling ==> 'device_cpu.cpp'
In file included from intern/cycles/device/device_cpu.cpp:45:
intern/cycles/util/util_metropolis.h:80:60: error: use of undeclared identifier 'ulong'; did you mean
intern/cycles/util/util_metropolis.h:80:90: error: use of undeclared identifier 'ulong'; did you mean
2 errors generated.
scons: *** [/Volumes/Home/Jason/Developer/Blender/build/darwin/intern/cycles/device/device_cpu.o] Error 1
scons: building terminated because of errors.
OK, I really need to install a second compiler :D
@mib2berlin
The include thing is really strange, try if this one works better. If not, I'm out of ideas.
@JasonClarke
This part was probably overkill, now it should only give a warning about precision loss, which is no problem.
metropolis_13.diff
Lucas, maybe with the new error you'll know more:
Linking CXX static library ../../../lib/libbf_intern_cycles.a
[ 84%] Built target bf_intern_cycles
[ 84%] Built target cycles_bvh
[ 84%] Building CXX object intern/cycles/device/CMakeFiles/cycles_device.dir/device_cpu.cpp.o
In file included from /daten/blender-git/blender/intern/cycles/device/device_cpu.cpp:44:0:
/daten/blender-git/blender/intern/cycles/device/../util/util_importance.h: In function ‘float* ccl::variance_to_importance(float*, ccl::KernelFilm*, int, int, int, int, int, int, int)’:
/daten/blender-git/blender/intern/cycles/device/../util/util_importance.h:28:8: warning: no previous declaration for ‘float* ccl::variance_to_importance(float*, ccl::KernelFilm*, int, int, int, int, int, int, int)’ [-Wmissing-declarations]
float* variance_to_importance(float buffer, KernelFilm film, int stride, int pass_stride, int offset, int x_ofs, int y_ofs, int width, int height) {
In file included from /daten/blender-git/blender/intern/cycles/device/device_cpu.cpp:45:0:
/daten/blender-git/blender/intern/cycles/device/../util/util_metropolis.h: In constructor ‘ccl::Metropolis::Metropolis(ccl::KernelGlobals*, double*, double*)’:
/daten/blender-git/blender/intern/cycles/device/../util/util_metropolis.h:81:65: error: cast from ‘ccl::Metropolis*’ to ‘ccl::uint {aka unsigned int}’ loses precision [-fpermissive]
make- [x]: *** [intern/cycles/device/CMakeFiles/cycles_device.dir/device_cpu.cpp.o] Fehler 1
make- [x]: *** [intern/cycles/device/CMakeFiles/cycles_device.dir/all] Fehler 2
make: *** [all] Fehler 2
Sorry for so much trouble, mib.
Clang barfs there too:
1 error generated.
In TortoiseGit I'm getting "Path Format Detection: Fail" and it refuses to patch. Any ideas?
@mib2berlin, @JasonClarke Sorry for all these errors, I hope this patch finally works.
@MatthewHeimlich Not really since I never used TortoiseGit, but just try this one.
metropolis_14.diff
Anyone using TortoiseGit will need to use the git Apply command. The GUI apply only works for diffs made with TG.
Lucas, thanks, working fine now.
Cheers, mib.
Ok, patch 14 did the trick! Here's two test renders. The thing that hit me immediately was how slow MLT mode is now. It took 90 seconds to do 10 samples. Progressive can do a 120-sample render in the same amount of time. (here's the outputs of those two)
Progressive:

MLT:

(also, something seems funny with the volume sphere in MLT)
Here's a 10 sample render with the BMW, took about 8mins.
Progressive on that same scene is 2:40 for 200 samples.
@LukasStockner: I had to change line 239 in util_color.h to the following, msvc2008 complained otherwise:
But yeah i can observe the same, 10 Samples to render the default cube takes over a minute here, with regular Path Trace just about ~1s.
@JasonClarke Yes, the speed is indeed extremely low. At the moment, I can think of three reasons for this:
First, sampling overhead. Especially a high max bounce value can slow it down significantly, lazy sample generation could help there. However, with increasing scene complexity, the impact of this should reduce.
Second, caching. In the classical PT, most of the geometry should be still in cache from the previous Ray, while in Metropolis, at least for large mutations, this is usually not the case. This issue probably gets bigger with scene complexity.
Third, BVH traversal. I haven't looked at the Cycles BVH yet, but considering it is originally targeted at GPUs, there is quite a chance that it is designed for high ray coherence, which, as said above, is not given in Metropolis.
Probably, running a full Valgrind profiling session might give some hints which one of these is responsible.
@ThomasDinges: Thanks, I'll include it in the next patch.
Hm, maybe a windows problem.
Rendered a test file here; it is slow, but it would be impossible with path tracing in any reasonable time. :)
It needs 16 minutes on an i5.
Btw., the default cube needs 20 seconds here.
Cheers, mib.
Added subscriber: @DavidSisco
Lucas, I started a thread on BA about your work.
Maybe you can jump in once you have finished the win/lin builds for users.
http://www.blenderartists.org/forum/showthread.php?329089-Cycles-MLT-patch
Cheers, mib.
Windows x64 Build (MSVC 2008): http://blender.dingto.org/blender_win64_mlt.zip
Based on Patch Nr. 14.
@ThomasDinges Awesome, my building system on Windows somehow refuses to work, so thanks!
@mib2berlin Thanks, certainly a good idea.
Unfortunately, my BA account was blocked by the Spamfilter, so I currently have to await manual activation...
Also, I only have access to a slow laptop and an even slower WiFi until Friday, so, while I will continue to work on the patch, I probably won't be making big progress.
Over SSH, though, I will try to run a profiler session on my PC at home, maybe this will give some more information on where the bottleneck is.
Once my BA account works, I'll post a somewhat more detailed description there. The next steps IMO are working screen-wide sample distribution (Probably the best approach is to let every tile run ~25 samples, then calculate a sample distribution. If any tile has received more samples than its allocated sample budget, it is stopped) and speed improvement (The main question here is whether BVH and cache or the sampler itself are the bottleneck).
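The screen-wide distribution step could look roughly like the following sketch: after the warmup round, each tile gets a share of the total sample budget proportional to its mean importance, and a tile stops once it has reached its share. All names are hypothetical and the rounding policy is a simplification.

```cpp
#include <cstddef>
#include <vector>

// Split total_samples between tiles in proportion to their mean importance.
// Shares are truncated to whole samples, so a few samples may stay unassigned.
inline std::vector<long> tile_sample_budgets(const std::vector<float> &tile_importance,
                                             long total_samples)
{
    std::vector<long> budgets(tile_importance.size(), 0);
    double sum = 0.0;
    for (float imp : tile_importance)
        sum += imp;

    for (std::size_t i = 0; i < tile_importance.size(); i++) {
        // Fall back to an even split if no tile reports any importance.
        double share = (sum > 0.0) ? tile_importance[i] / sum
                                   : 1.0 / static_cast<double>(tile_importance.size());
        budgets[i] = static_cast<long>(share * static_cast<double>(total_samples));
    }
    return budgets;
}

// A tile is stopped once it has received at least its allocated budget.
inline bool tile_over_budget(long samples_received, long budget)
{
    return samples_received >= budget;
}
```

Recomputing the budgets after every round (for example every ~25 samples, as suggested above) lets the distribution follow the importance map as it becomes more accurate.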
In my tests MLT is very slow and gives a lot of noise in dark areas.
If you don't have the opportunity to code, write documentation and help about the settings, what the options do, etc., as a manual.
Added subscriber: @holyenigma
m9105826, can you explain how to use the Git add command for windows.
give an example path etc.. Git is such a PITA
thanks
Added subscriber: @SebastianRothlisberger
New patch version, this time optimization! It turned out to be the sampling, the fix was quite easy...
Basically, if you have 8 max. bounces and 8 max. transparent bounces, it generated 16*12 = 192 sample values, while often only 1-2 bounces were needed. Now, the samples are generated on demand.
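On-demand generation can be sketched as a sample vector that only grows when a dimension is actually requested. This is a simplified stand-in with hypothetical names, using `std::mt19937` instead of the sampler from the patch and leaving out the mutation logic.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical lazily filled sample vector: values are created the first time
// a dimension is requested and stay fixed afterwards, so a path that ends
// after one or two bounces never pays for the unused dimensions.
class LazySamples {
public:
    explicit LazySamples(unsigned seed) : rng_(seed), dist_(0.0f, 1.0f) {}

    float get(std::size_t dimension)
    {
        while (values_.size() <= dimension)
            values_.push_back(dist_(rng_));
        return values_[dimension];
    }

    // Number of sample values generated so far.
    std::size_t generated() const { return values_.size(); }

private:
    std::mt19937 rng_;
    std::uniform_real_distribution<float> dist_;
    std::vector<float> values_;
};
```

A real Metropolis sampler would additionally need to track when each value was last mutated, so that lazily created dimensions can catch up with the mutations they missed.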
For testing I used 2 scenes: One, the default cube at 3 samples, and two, a level-2-subdivided glass Suzanne lying on a plane at 40 samples. Both used one thread.
Results Metro 14:
Glass PT: 3.4sec Metro: 12sec
Cube PT: 0.7sec Metro: 9.6sec
Metro 15:
Glass PT: 3.5sec Metro: 4sec
Cube PT: 0.7sec Metro: 1.7sec
By the way and highly off-topic: I just found Embree, an optimized BVH library for CPUs, written by Intel. Considering we use NVIDIA code currently on GPU and CPU, maybe Intel code would be optimized better on CPUs. The API looks quite nice, I'm currently trying it out in a private VCM implementation of mine.
metropolis_15.diff
Really great development, thank you Lucas.
I'll try to compile it on osx. (Is there already a "public" one? I can post mine if not)
Nice speedups!
Doesn't link on Windows unfortunately: (also the log(10) issue in util_color is still present)
lucas i think blender already uses embree code for bvh.. ?
We use some code from Embree 1.x, in BVH Traversal and BVH build, yes.
Patch 15 fails to link for me (on linux):
We use some code from Embree 1.x, in BVH Traversal and BVH build, yes
Hmm. since embree is at 2.2 now,
i wonder if the embree bvh has been updated much since 1.x
Added subscriber: @candreacchio
i wonder if the embree bvh has been updated much since 1.x
According to this PDF here (their SIGGRAPH presentation for Embree 2.0), there are significant speedups... and this only covers the code up to 2.0, not to mention further optimizations in 2.1 and 2.2.
c:\Blendersvn\build> |
Linking error Patch 15
omg sorry.. can someone shorted that?
i cant edit it on here.
Okay, my bad. I forgot to add util_importance.h/cpp to CMakeLists in the utils folder, so it doesn't work for CMake users. Just add them and it should work.
hmmm.. i added it but now i get an error when cmake generating
Configuring done
CMake Error at intern/cycles/util/CMakeLists.txt:74 (add_library):
You wrote .ccp instead of .cpp as the file extension.
umm you're correct i spelled it wrong..
however i dont have a util_importance.cpp file in that folder.. ?
can i compile with it(leave the .cpp out an just use the .h?
OK, I failed -.- I meant util_metropolis, not util_importance.
now i get error about compositor?
c:\Blendersvn\build> |
Ehm, this looks like an error in your Makefile. Try a full clean and recompiling, but since I use SCons, I don't really know what to do in CMake.
CMake compiles after adding this extra patch: cmake_fix_for_metropolis_15.patch
I'm getting an error from cmake telling me there's no util_metropolis.cpp file. Indeed, upon searching, there isn't. Diff applied against master on Win7 64.
lockal, thank you that did it!
compiled successfully mingw64 cmake..
metro_15 is considerably faster than metro_14
m9105826
make sure you applied metropolis_15.diff first, then apply cmake_fix_for_metropolis_15.patch
try it on a fresh clean blender folder
thanks lucasstockner97 keep up the good work. :)
Indeed, no more cmake issues, but I get the same exact linker error as Thomas with VS2013, 64-bit.
here is the metro_15 mingw64 cmake build of blender 2.70 with the new splash. :)
http://www.mediafire.com/download/o1azyqyqo3noq4u/Blender_mingw64_metropolis_patch_15.7z
Tried Holyenigma's build, and I have a bug report for you.
I noticed that Metropolis sampling will crash Blender whenever there's a volume material in the scene, even if the only object is the default cube.
This doesn't affect the adaptive sampling though, though it seems MingW builds will become unstable if you zoom and pan a bit while the scene is rendering.
More samples give more and more single-dot noise around objects.
In path tracing this noise is uniform and low-contrast; in MLT it is not.
Lopataasdf; Having tried Luxrender before, this is an expected side effect due to how Metropolis sampling works.
You see, Metropolis is finding lightpaths that would otherwise not be found with normal pathtracing, so the dots represent those paths and they should converge to a result that contains lighting effects that are hard to get in Cycles without the patch.
There appears to be something weird going on in the MinGW build in the viewport. Looks to be some kind of issue with the Alpha value? Using just path tracing, transparency is turned off. Sorry if this is a known issue.
F12 rendering doesn't produce the same issue.
ace_dragon, can you post a .blend of the volume material that crashes?
m9105826, the dingto build does the same thing.
so it must be something else.
Building fails for me (cmake, archlinux) with metro_15 and the cmake fix:
Holyenigma; do I really need to post a .blend, here are a few steps to see for yourself (it's real easy).
1). Open Blender
2). Give the default cube a transparent material with a scattering volume
3). Switch to Metropolis sampling and leave every other setting alone
4). Watch Blender crash when Cycles gets ready to update the render window.
And that's the thing, the crash happens not right when I hit F12, but when Cycles is about to show the initial results on the screen.
ace_dragon, thanks i see it now.
i also noticed mingw build wont open some of my old blend files(dingto build opens then fine)
i opened the old blend with dingto build and saved it, then it loads into mingw build
i figured out what was causing the .blends on mingw to crash on open..
if the .blend is saved with transparent checked under Film, uncheck it and save the .blend
Okay, thanks for the reports.
Anyways, I'm going to work on this. Once I have something new, I'll post it here.
lucas, try opening this with mingw build,
works fine with vc build.
http://www.mediafire.com/download/iiboskfrnrrz6ea/fire_extingusher.7z
if the background is pink just point it to your own hdr,
or use sky background in world setting.
possible new build with the newest changes.?
Windows 7 64
ahh sorry there is one. i missed
I had another really strange issue that I'm having trouble recreating at the moment. It was in the same session where I did the transparent screen grab. After working for a few minutes and doing a couple of f12 renders, I turned viewport rendering back on and got extremely blown out lights and strange colors not previously present in the scene. Switching back to path tracing caused different, but similar results.
Additionally, path tracing and metropolis have been converging to different results. I'll post an example .blend when I'm back on my PC.
Added subscriber: @xs
Hi,
I've been looking through this thread to find a compiled version for Windows 7/32.
I am aware patches are released as code, not compiled, so I may miss the latest. I am not ready to experiment with compiling..
So compiled version?
Thanks,
Christos
New patch version, cmake should now work out-of-the-box and the volume bug is fixed.
It turned out that the fix for #38710 caused it, now when using Metro, the bugfix is simply not applied. This doesn't mean that #38710 now appears in Metro however, because it is related to the sampler and Metro overrides that one anyways.
This is my current issue-list in random order:
Have I forgotten anything? If yes, please tell me.
metropolis_16.diff
Added subscriber: @GoldenCrescent
(Bump) Hi,
I've been looking through this thread to find a compiled version for Windows 7/32.
I am aware patches are released as code, not compiled, so I may miss the latest. I am not ready to experiment with compiling..
So compiled version?
Thanks,
Christos
Just to let you know, I'm still working on the patch and making progress, the transparent viewport bug is fixed, Importance Equalization works and most of the image-wide adaptive sampling is done as well. Once everything is stable enough, I'll post it here.
Perfect :D
@LukasStockner any comment about this?
http://www.luxrender.net/forum/viewtopic.php?f=8&t=10947
@MaciejJutrzenka OK, now that's a remarkable coincidence o.O
In fact, I've been working with VCM for over a year now for another project of mine (lightpath reuse in animated scenes, saves up to 50% of the calculations) and only one month ago I had basically the same idea (merging with vertices in the volume by adjusting the geometric coupling term). It's really awesome to see that this indeed works and has already been implemented :D
Regarding Cycles: Adding it would probably be possible, but it's not comparable to the Metro patch. You would first have to add Bidir, then add Photon-Tracing support like in PPM, then add the VCM merging code (which is particularly hard to implement correctly due to the various weighting terms) and finally add volume photon storage and beam queries. This is a HUGE amount of work and, considering that the whole code structure of Cycles is made for Path Tracing, it would be probably hacky and messy.
By the way, the current patch takes so long because of the VCM project described above, because in 1 week there is a competition for which I still want to code a GPU implementation of my animation-VCM code, so currently it requires most of my time. Sorry for that, development will be faster afterwards.
Interesting piece about adaptive sampling that popped up on BA today. Dade claims it's incredibly easy to implement over an existing tiled renderer.
http://www.luxrender.net/forum/viewtopic.php?f=8&t=10955&sid=8774701c5a8832e330716af5a8f74863
@MatthewHeimlich Wow, that looks really useful. I haven't read the paper yet, but it definitely sounds useful and usable for the patch. Really incredible work by the LuxRender guys there, always implementing the latest stuff.
Added subscriber: @Lapineige
A video of the new adaptive sampling at work. I'd need to see some more complex scenes to judge definitively, but so far this is the first adaptive sampling method I've ever seen that appears to "just work".
https://www.youtube.com/watch?feature=player_embedded&v=P_QmdpnKTW4
@MatthewHeimlich
Excellent video: now I get it... how it works.
The threshold based on noise rather than uniform for all tiles.
Please, I ask again: Does this exist in compiled "ready-to-use" form, or only as a patch requiring me to compile, something I don't know how to do? I use Blender 2.69 and 2.70 under Windows 7/32.
@xs If you do not know how to patch properly, I would probably suggest just sticking to the main Blender... this is a work in progress and is not ready for mainstream use.
To be clear, what's shown in that video isn't available in Blender at all. It's a demonstration of the new method available in LuxRender that I linked to above. Should be relatively painless to integrate into Cycles, though.
And it's what I've been waiting for a looong time... :)
http://blenderartists.org/forum/showthread.php?236453-Measuring-Noise-in-Cycles-Renders&p=2001913&viewfull=1#post2001913
http://blenderartists.org/forum/showthread.php?255683-Cycles-status-%28as-of-May-14th%29&highlight=cycles+status
http://blenderartists.org/forum/showthread.php?236453-Measuring-Noise-in-Cycles-Renders&p=2370199&viewfull=1#post2370199
http://blenderartists.org/forum/showthread.php?216113-Brecht-s-easter-egg-surprise-Modernizing-shading-and-rendering&p=2395532&viewfull=1#post2395532
I'm still convinced this will be the BIGGEST performance optimization
BTW, Yafaray has something similar since ages. Wondering what is the noise-aware implementation there...
Isn't Yafaray's just a simple contrast-based sampler? Just hunts edges, not high-noise areas?
Developer-question here: How do you think this should be implemented? My current plan is to treat the Samples setting as a maximum value when using the stopping criterion and add a maximum tolerated error setting. As soon as one of them is reached, the tile stops. This, however, is incompatible with progressive rendering and Metropolis, but I think we can live with that.
In fact, most of the code required for this is already implemented for the current adaptive sampling (most importantly, a per-pixel sample-number buffer), so the stopping criterion should be running quite soon. In-Tile adaptive sampling can be kept the way it is now, only using the new variance estimate (contrast to even-sample buffer).
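The "stop when either limit is reached" plan above can be sketched roughly as follows. This is only an illustration in Python, not code from the patch: `render_tile`, `sample_once` and `estimate_error` are hypothetical names, and the real patch only tests for convergence at intervals rather than after every sample.

```python
def render_tile(sample_once, estimate_error, max_samples, max_error):
    """Sample a tile until either limit is reached; return the samples taken.

    sample_once():    hypothetical callback that adds one sample to the tile.
    estimate_error(): hypothetical callback returning the current tile error.
    max_samples:      the Samples setting, treated as an upper bound.
    max_error:        the maximum tolerated error.
    """
    samples = 0
    while samples < max_samples:
        sample_once()
        samples += 1
        # Stop early as soon as the error estimate is low enough.
        if estimate_error() <= max_error:
            break
    return samples
```

With a tolerated error of 0 the loop never stops early and simply renders the full sample count, which matches the classic behaviour.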
I'm really looking forward to seeing how well it works in Cycles; for SLG the speedup seems quite impressive.
BTW: The viewport bug is fixed and another one that broke DoF and Motion Blur is fixed, too. Importance equalisation works as well, once the stopping criterion works, I'll publish a new patch version.
Added subscriber: @Lukas-132
@JasonClarke
I actually don't know ;)
@Lukas-132
cool, can't wait...
what do you mean when you say that the "max_sample_limit + maximum_tolerated_error" are incompatible with progressive rendering?
Lukas: Sounds like the best way to do things. Halt render if either criterion is reached. For certain scenes (one that comes to mind has a half-lit face where almost a full half of the samples are wasted on the side that is almost completely black) this would be an ENORMOUS time saver.
I dare.
There is also one more opportunity: time halt condition. To make this possible the engine should know how long each tile takes to render, so a first pass (with a minimum sample number to be decided) should be done for all the tiles. This would give two benefits: a proper render-time estimation, and a first rough look at the whole render output.
http://ompf2.com/viewtopic.php?f=3&t=1933
Here is some more discussion from Dade and others on halt conditions/adaptive sampling. Lots of nice papers linked, lots of good discussion of pros and cons from people who have implemented some of them.
lsscpp, I completely agree. It would be very nice to have a rendering algorithm that ran a few samples at lower sampling rates across the entire image to show a very rough preview of what's coming. Others do things this way and it makes a huge difference in catching issues early on in the render.
The basic implementation works now, but there is a problem: correlation. Taking only even samples of a 1000-sample render gives a slightly different picture than using all samples of a 500-sample render, at least for the BMW scene. With CMJ-samples, the problem is gone. Apart from that, it seems to work quite OK, in 1-3 days the patch should be ready for a new upload.
By the way, another open question: Should the stopping use average error or maximum error inside of a tile? Currently, the code works like "average below threshold and maximum below 2*threshold".
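For illustration, the combined rule quoted above ("average below threshold and maximum below 2*threshold") could look like the sketch below. The function name and the flat list of per-pixel errors are assumptions; how the patch actually stores and weights the errors is not shown in this thread.

```python
def tile_converged(pixel_errors, threshold):
    """Return True if the tile passes the combined average/maximum test.

    pixel_errors: non-empty list of per-pixel error estimates (assumed layout).
    threshold:    the stopping threshold set by the user.
    """
    average = sum(pixel_errors) / len(pixel_errors)
    worst = max(pixel_errors)
    # Both conditions must hold: a single very noisy pixel keeps the tile alive.
    return average < threshold and worst < 2.0 * threshold
```

Using only the maximum, as suggested for LuxCore in the reply, would correspond to dropping the average term.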
IIRC LuxCore implementation uses max error inside each tile, and it sounds reasonable too. This way you always know how much the error might be.
Can't wait to test this out. Can you (or anyone) please upload the build once it is ready?
I'm not sure I'm following entirely. @LukasStockner, are you still refining Metropolis or trying the new adaptive method from Lux?
Basically both, but I think Metro works quite fine already. Once the Importance Equalisation is stable and the adaptive stuff is done, in my opinion the patch is ready for extensive testing and code cleanup.
First of all, sorry for the permanent delays. I underestimated the amount of work this would take and a ton of other stuff needed attention.
However, by now I have solved the Sobol problem by adding an RNG dimension that decides whether the sample goes to the "even" buffer instead of really taking only even samples, and adaptive sampling inside the tile is now also done according to the error in the pixel, so it's consistent. All in all, it is starting to work really well now. Once I have removed some debugging code and rebased to the current trunk, metropolis_17 should be ready quite soon.
Some images here to show you how well it works, all of them are pure PT and rendered for 2:30 min:
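A rough sketch of the even-buffer idea for a single pixel, written in Python for illustration only. The class name, the 0.5 split and the plain absolute difference are assumptions; the patch additionally weights the difference by a Threshold-Versus-Intensity function, which is left out here.

```python
class PixelAccumulator:
    """Accumulates samples for one pixel, plus a second "even" half buffer."""

    def __init__(self):
        self.total = 0.0
        self.count = 0
        self.even_total = 0.0
        self.even_count = 0

    def add(self, value, selector):
        """Add one sample; selector is an extra random number in [0, 1).

        Using a separate random dimension (instead of the sample index) to
        pick the half buffer avoids taking literally every second sample of
        a correlated sequence such as Sobol.
        """
        self.total += value
        self.count += 1
        if selector < 0.5:  # assumed: roughly half of the samples
            self.even_total += value
            self.even_count += 1

    def error(self):
        """Difference between the full estimate and the half-buffer estimate."""
        if self.count == 0 or self.even_count == 0:
            return float("inf")  # no estimate possible yet
        full_mean = self.total / self.count
        half_mean = self.even_total / self.even_count
        return abs(full_mean - half_mean)
```

When both buffers agree, the pixel is likely close to converged; a large difference means the estimate still changes a lot depending on which samples are used.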
No adaptive stuff at all:

Adaptive stopping, no adaptive sampling:

Both adaptive stopping and sampling:

As you can see, the noise is distributed more uniformly in the images with adaptive stopping and especially the headlights and their reflection look better. The doors, on the other hand, are a bit more noisy. This is because the adaptive stopping doesn't work any faster, it just distributes samples differently. If the headlights are to receive more samples, some areas have to receive less.
This is the same scene (obviously), this time 8:30 min:
No adaptive stuff:

Both adaptive stopping and sampling:

Added subscriber: @brecht
Great work!
For adaptive sampling, does it measure the noise in linear or in display space? If the former, I wonder if the result could be improved by doing it in display space, as linear space would underestimate the noise in dark areas.
Thanks!
Currently it works in linear space (I'm assuming the RenderBuffers are in linear space), but the difference between regular and even pass is weighted by a Threshold-Versus-Intensity function.
Still, additionally weighting by the tonemapping function is probably better (as far as I remember, the paper that used the even-buffer trick even did this). If I remember correctly, the linear-to-sRGB is mainly a pow() operation, the speed impact shouldn't be that high (also, the noise estimation is currently only done every 25 samples).
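For reference, the standard linear-to-sRGB transfer function is a short linear segment near black followed by a power curve, so it is indeed mostly a pow() call. The Python sketch below uses the standard sRGB constants; the function name is chosen here for illustration and is not necessarily what Cycles calls it.

```python
def linear_to_srgb(c):
    """Convert one linear colour channel value to sRGB display space."""
    if c <= 0.0031308:
        # Linear segment close to black.
        return 12.92 * c
    # Power segment (exponent 1/2.4) for everything else.
    return 1.055 * c ** (1.0 / 2.4) - 0.055


# The same linear difference is much larger in display space for dark values,
# which is why measuring noise in linear space underestimates it there.
dark_diff = linear_to_srgb(0.02) - linear_to_srgb(0.01)
bright_diff = linear_to_srgb(0.81) - linear_to_srgb(0.80)
```

Here `dark_diff` comes out several times larger than the linear difference of 0.01, while `bright_diff` is smaller than 0.01.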
I'd be interested to see this on a scene without lots of glossy noise, something where shadows or indirect noise take up a good chunk of the render time with master Cycles. Something like a half lit face should benefit quite a bit from adaptive sampling like this.
Juicy results!
Lukas, what is actually the difference between "adaptive stopping" and "adaptive stopping+sampling" ?
as m9105826 says. test could be done on a more difficult scene as this: http://www.blenderartists.org/forum/showthread.php?331149-The-new-Cyles-GPU-2-70-Benchmark
Added subscriber: @nudelZ
New patch version, the adaptive stuff got a massive update, importance equalisation now works again and CUDA/OpenCL should build again (without Metro support). Adaptive stopping works on CPU and GPU, although it might slow down GPU a bit since every test has to fetch the buffers from device memory. I didn't test it, but in ~1 week I'll get access to a CUDA 2.0-capable system so I can fix the CUDA support. Adaptive sampling will get GPU support, but it's not finished yet.
For adaptive stopping, set the Stopping Threshold value under the Performance tab to a value > 0. The adaptive stopping works parallel to the classic samples-value stopping, so if a tile hits the specified sample number, it's stopped regardless of whether its error is below threshold or not. In my experience, something around 1 gives a rough-preview-quality result while values <= 0.25 appear noise-free. The test whether the tile is done starts after some warmup samples have passed and is performed in a specific interval; the settings for these are located below the threshold setting. The checkbox will activate adaptive sampling inside the tiles. By the way, adaptive stopping also works for Metro.
In Metro mode, the adaptive sampling checkbox activates Importance equalisation. This setting basically causes the Metro sampler to distribute the samples more uniformly instead of sampling according to brightness (technically, it samples according to path brightness divided by average pixel brightness, therefore still favoring high-energy paths). Sometimes this helps really a lot, in other scenes it's pretty useless.
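How the warmup and interval settings might interact can be sketched as below. This is a guess at the semantics (first test exactly at the end of the warmup, then one test per interval); the function name is hypothetical and the patch may count differently.

```python
def should_check(sample, warmup, interval):
    """Return True if a convergence test is due after `sample` samples.

    warmup:   number of samples rendered before the first test (assumed).
    interval: number of samples between two tests, must be > 0 (assumed).
    """
    if sample < warmup:
        return False
    return (sample - warmup) % interval == 0


# With warmup 10 and interval 5, tests happen after 10, 15, 20, ... samples.
check_points = [s for s in range(1, 31) if should_check(s, 10, 5)]
```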
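The "path brightness divided by average pixel brightness" rule can be written down as a small sketch. The function and parameter names are hypothetical, and the `eps` guard against division by zero is an addition made here, not something described in the thread.

```python
def path_importance(path_luminance, avg_pixel_luminance, equalise, eps=1e-6):
    """Scalar importance the Metropolis sampler would use for a path.

    Without equalisation the importance is just the path brightness, so
    bright image regions attract most of the samples. With equalisation the
    brightness is divided by the current average brightness of the pixel the
    path lands in, which evens out the sample density across the image.
    """
    if not equalise:
        return path_luminance
    return path_luminance / max(avg_pixel_luminance, eps)
```

A typical path in a bright pixel (10.0 over an average of 10.0) and a typical path in a dark pixel (0.1 over 0.1) both get importance 1.0, while a path that is brighter than its pixel's average is still favoured.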
@lsscpp
Adaptive stopping looks at the tile and decides whether it's done or not, while adaptive sampling distributes the samples inside the tile to areas of high error.
This is useful, for example, if the tile is at the boundary of Background/Object, since without adaptive sampling, all of the tile will be sampled until the error is low enough. With adaptive sampling, however, the Object will receive more samples than the background.
Basically, adaptive sampling is what the patch already did, while adaptive stopping is the new feature.
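As a sketch of what "distributing the samples inside the tile to areas of high error" can mean, the snippet below splits a sample budget proportionally to the per-pixel error. The proportional rule and the function name are assumptions for illustration; the thread does not spell out the distribution used by the patch.

```python
def allocate_samples(pixel_errors, budget):
    """Split `budget` samples over the pixels proportionally to their error.

    pixel_errors: non-empty list of non-negative per-pixel error estimates.
    budget:       number of samples to hand out in this round.
    Rounding down may leave a few samples of the budget unused.
    """
    total = sum(pixel_errors)
    if total <= 0.0:
        # No pixel stands out: fall back to a uniform split.
        return [budget // len(pixel_errors)] * len(pixel_errors)
    return [int(budget * e / total) for e in pixel_errors]
```

For a tile on a Background/Object boundary, `allocate_samples([0.0, 0.0, 3.0, 1.0], 100)` gives the two converged background pixels nothing and the object pixels 75 and 25 samples.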
By the way, in the 2.70 benchmark, there is nearly no improvement since the noise is already distributed quite evenly.
metropolis_17.diff
Added subscriber: @plasmasolutions
The patch doesn't seem to apply for me with latest master..
Hi Lukas, the patch does not apply on 1e6fa59 from today; I get this error:
Hunk #1 FAILED at 64.
1 out of 6 hunks FAILED -- saving rejects to file intern/cycles/kernel/kernel_types.h.rej
With patch -p1 < metropolis_17.diff
Thanks, mib
Added subscriber: @00Ghz
For Metro, the Octane Render team made a custom implementation. Most likely it's patented, but maybe we can find a similar hybrid solution.
What do you think?
They use ERPT with Population Monte Carlo I think. It's not clear to me that this would be better than MLT combined with the adaptive sampling.
I suggest to look at the "Multiplexed Metropolis Light Transport" paper when it comes out, I saw some impressive results from that (sorry, I have no link) and it should fit well here.
Well I and many people out there want GPU support, so whatever works for that will be good.
Also, Octane is getting out-of-core textures (no more GPU memory limit). Any chance of getting something like that for Cycles?
Added subscriber: @NahuelBelich
OK, here is a corrected version. One line in kernel_types.h was different between trunk and patch.
Regarding the Octane method: Running a large number of concurrent samplers on GPUs would be possible, it's quite similar to ERPT and LuxRender already uses it. However, I'm not too sure whether this is any better than PT since fewer, longer MCMC chains should explore the sample space more evenly.
The Multiplexed Metropolis paper sounds definitely interesting, I'll look at it once it's released.
metropolis_17_fixed.diff
http://cs.au.dk/~toshiya/
here is the guy making it.
And apparently there is a paper on some SSS method as well.
It's a dipole based method, useless for us. This isn't the place to discuss that anyway.
No luck compiling against master here. Lots of errors thrown from util_color.h about ambiguous calls to overloaded functions. VC2012 on Win7 64-bit.
Added subscriber: @akishin
@MatthewHeimlich I've been trying to setup a VC2012 environment to reproduce your errors (also W7 64), but can't get it to work properly. The instructions in the Wiki still use the SVN trunk, it works fine with this one. However, once I use the Git trunk, CMake fails to find the libs. My folder layout is "E:/Blender/blender" for the trunk, "E:/Blender/build" for CMake and "E:/Blender/libs/win64_vc11" for the precompiled libs (still from SVN). CMake won't find Boost, OpenJPEG and Python 3.4 (the precompiled libs still have Python 3.3) and stops afterwards. Could you tell me where you got your libs from or did you compile them yourself?
Regarding the error: I suppose it complains about linear_rgb_to_gray? If yes, a workaround would be to rename the new float4 version to linear_rgb_to_gray4 and change the occurrences in device/device_cpu.cpp (after the Metropolis kernel call) and in render/buffers.cpp (in build_importance_map()). Of course, this still needs to be fixed properly, but GCC doesn't seem to care about it at all...
VC2012 is not supported anymore, we should remove the leftovers. Use VC2008 or VC2013.
Such confusing naming for VS/VC. I am indeed using 2013.
@ThomasDinges Thanks, works great (apart from an error complaining that lib\win64_vc12\release\python34_numpy_1.8.tar.gz is missing, but it doesn't seem to matter)
@MatthewHeimlich OK, I fixed the error. It's not the linear_rgb_to_gray(), but the logarithm in "return exp(log_i * log(10))", in the linear_gray_to_tvi() function. Changing "10" to "10.0f" fixes it.
Oh, and another thing, "struct RenderTile;" in util_metropolis.h needs to be changed to "class RenderTile;"
Another, probably unrelated thing: In a test scene, when I enable Cycles and go to the Material tab, I get an "Assertion failed" in the python34_d.dll ( ../Python/ceval.c, different lines, but always "!PyErr_Occured()" ), even on trunk. Is this normal?
Since I just setup my Windows toolchain, I thought I'll just post my build here. It's based on today's trunk, built with VC2013 for Windows x64 (x32 will follow soon). It lacks some features (mainly GPU and OSL), but the standard stuff should work.
I hope I included every needed file, if something is missing, please tell me and I'll post it. By the way, the patch doesn't affect performance in classical PT without the new options in any way, in fact, on my machine this build renders faster than the 2.70a release.
http://www.mediafire.com/download/1lkr2mqdd521atu/Blender_270_Metro_VC2013_x64.zip
That's because 2.70(a) has been built with vc2008, vc2013 makes Cycles 15-20% faster for free.
WOW. I tested the scene I had spoken about previously. What a savings! Image quality difference is negligible, but a bit over 75% faster render times. Excellent work!
That was with only 6 of my 8 cores, btw. Tolerated error set to 3, update rate at 5, warmup at 10. Let me know if you think there are even better settings.
Rendering freezes when I use a small number of render samples and Correlated Multi-Jitter.
BMW1M_adptiv_sampl.blend
@MatthewHeimlich Well, that's quite an improvement. Scenes like this are of course best-case for adaptive sampling, but especially big background areas are quite common.
Regarding the settings: The tolerated error sounds quite high to me, but indeed the rendered image looks just fine, so I guess it's okay.
The other two settings seem reasonable; basically they depend on how expensive tracing a single sample is (when the tile is rendered at 100 samples/sec, checking for convergence every 5 samples might give a significant slowdown, while for something like 5 samples/sec it's just fine).
This gives me another idea: Maybe a time-based update rate would be better than a sample-based one?
@lopataasdf Yes, this scene freezes for me too. Actually (or at least it seems like this to me), it's not freezing completely, but somehow blocking the GUI while rendering, since after ~1min it shortly got responsive for me again. I'll try to reproduce this on Linux since I have no idea of MSVC debugging...
Could someone try to reproduce this with another Windows build to see whether it's related to this specific build, the compiler, the OS or the whole patch?
Time based could be a problem when the scene is rendered on another computer with much different performance (say, an old workstation that retired to the render farm).
What about setting warmup/update as a percentage of the total number of AA samples? Like, every 10% of completion, do the update?
Added subscriber: @craigar
Your build
http://www.mediafire.com/download/1lkr2mqdd521atu/Blender_270_Metro_VC2013_x64.zip
won't load on my Win 7 PC, it crashes before ever loading.
I know you talked about "vc2013 makes Cycles 15-20% faster for free". Could it be that the included "MSVCR120.dll" should be "MSVCR130.dll", or is this unrelated to your usage of "vc2013"?
Problem signature:
My system only has a RADEON 6450 GPU, must this build run on a system with a CUDA GPU?
My system is a stock Dell XPS8300 with Windows 7 Home premium 64bit, i7 CPU, 12 GB ram
Thank You
Just replaced the RADEON 6450 with an NVIDIA 550ti, and this build still won't load on my particular system.
Also the Windows 7 Home premium 64bit is service pack 1 - I am leery of MS service packs.. :)
Thank You
but vc 12.0 == vc 2013
found out why it wouldn't load - there needs to be a folder named " 2.70" , with only the folders you already included ( datafiles, python, scripts) inside of it , then it opened!
Looking forward to trying it out
Thank You
Rendering of "BMW1M_adptiv_sampl.blend" freezes after about 27/150 tiles (64x64) and continues to consume 98% of CPU power. I have to kill (end task) Blender using the task manager; it takes about 5 seconds to kill Blender after clicking the button, and a minute to get to the task manager, because the PC is responding so slowly. After stopping Blender, the PC runs fine and I can try another instance again without any noticeable problems.
thanks
craig
tried it with 96x96 tiles, still crashes on render (13/60) and hangs the CPU drawing almost full CPU power until I force task manager to stop blender.
Thanks again
Maybe make adaptive sampling as separate patch and include to blender ? It is very useful feature.
different noise level
BlenderGuru_InteriorRenderingStarter.blend
@lopataasdf: cranking tolerated error to 0.25 seems to get rid of it. I only see that error at preview-quality settings.
Hi Lukas, I can't apply the patch anymore with latest master:
1 out of 19 hunks FAILED -- saving rejects to file intern/cycles/kernel/kernel_path.h.rej
1 out of 6 hunks FAILED -- saving rejects to file intern/cycles/kernel/kernel_types.h.rej
Thank you, mib.
@craigar I'll look into the freezing, it also happens in my Linux build.
@lopataasdf The different noise level is quite expected, since the error of the tile is currently based on an average over the tile and the max error in the tile. In your example, the right-side part of the low-noise tile probably needed more samples, so the left part also got sampled more. In the high-noise tile, the error is more uniform so the rendering stops earlier. This is the reason why I also added the in-tile adaptive sampling (it looks to me as if you didn't use it; if you did, please tell me), which would give the right side of the low-noise tile more priority, making the tile pass the convergence test in fewer samples.
By the way: The smaller tiles are, the more fine-grained the adaptive stopping works. I usually use 8x8 or 16x16 in my tests, which usually (at least on CPU) is faster than large tiles.
@mib2berlin Yes, the bake merge broke the metro_17, I'm currently rebasing the patch. Once I'm done, I'll publish metro_18 (also with the MSVC fixes).
Regarding splitting the patch: There are some features used by both Metro and Adaptive (most importantly the Samples pass), so a clean split is quite difficult. My suggestion would be to set up an "advanced sampling" branch that could be merged to trunk feature by feature (Basic Adaptive Stopping, Ada-Sampling, Metro) once it's tested enough and the codestyle is fine.
Added subscriber: @ideasman42
Had a look at updating the patch (since some changes I made might have caused conflicts),
However it looks like ScenePassType now has no bits left (so perhaps it has to be extended to int64_t).
This patch may be almost ready to simply commit to trunk as an experimental feature, I'm currently testing adaptive metropolis sampling with adaptive stopping on a very difficult scene and several areas have already shown to converge an exponential amount faster than with generic pathtracing (more than 20x faster at the least).
If it's that intertwined, simply committing it with the current feature set might actually be less work.
The code needs to go through extensive code review first via our Differential system here: https://developer.blender.org/differential/query/open/
For 2.71 it's also too late.
Before submitting for code review, there are still some issues I'd like to fix:
So that means?
Will be in 2.72?
Going to get Metro on GPU as well?
How hard - if ever possible - is it to implement per-pixel stop condition?
@00Ghz Whether it will be in 2.72 depends on how fast it passes code review. There's a chance, but it's not sure. Regarding GPU Metro: As I said before, it would be possible, but I think it would only be slightly better than PT and not worth the effort. Maybe I'm wrong with this, but definitely, GPU-Metro will never reach CPU-Metro (since it requires more and shorter Markov-Chains) in Terms of Quality per Sample. So: Maybe later, if it actually gives a benefit, but it's currently no priority for me.
@lsscpp It's not hard at all, actually. This would require a per-pixel rendering as in the old Blender-Internal times, which might even improve Cache Coherency (and therefore speed). The Pixels could be walked with a Hilbert curve for even more Cache performance, with one thread rendering one pixel.
The main problem would be pixels falsely being considered converged. For 64 pixels this is more unlikely since the errors in the estimation cancel each other out quite well. This might be solved by rendering each pixel ~10 samples more after it converged and then testing convergence again.
So, to conclude, it would be definitely possible and is an interesting option for future development.
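Walking the pixels of a tile along a Hilbert curve, as mentioned above, could be done with the classic distance-to-coordinate mapping. The sketch below is the well-known iterative algorithm in Python; it is not taken from the patch, and it assumes a square tile whose side length is a power of two.

```python
def hilbert_d2xy(n, d):
    """Map distance d along a Hilbert curve to (x, y) in an n x n grid.

    n must be a power of two and 0 <= d < n * n.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the current quadrant so the sub-curves connect.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


# Pixel visiting order for an 8x8 tile: consecutive pixels are always
# direct neighbours, which is what helps cache coherency.
order = [hilbert_d2xy(8, d) for d in range(64)]
```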
Well in Octane Metro(well whatever hybrid they did) gives much better results for caustics. There might be some other things but don't know them.
Another couple of test renders, this time at 720p resolution. Still only using 6 of 8 cores.
Lukas; Okay, thanks for the full rundown on the remaining issues and points that need to be resolved before it gets submitted to review, I assume that you know a lot better of what needs to be done than what testers are finding.
I can only imagine how much better it will be by then once those things are resolved :)
Also, I think I have found that Metro actually seems to work with adaptive stopping when you're not using the 'progressive refine' mode, at least I had a very quick test render stop on its own after a minute with a high error tolerance.
Ok... I've been following closely and testing every build that has come out here...
And I have to say that this last build has given my CPU ~4 times faster renders than my GPU, when in the past my GPU setup gave me 8x faster renders... My CPU is a rather old Intel Q9550 O.C. to 3 GHz... and my GPUs are two Nvidia GeForce 550ti.
This whole development is incredible, not only the Metro part, but the whole Adaptive Sampling and Importance Equalization give real-live-production renders an incredible boost in convergence efficiency!
Huge kudos to you, Lukas! Please, keep up the great work!
Do I read well: now your CPU rendering is 4 times faster than GPU... instead of 8 times slower??
This means 32 times faster in CPU/CPU comparison!! Or in other words, 3200% speed boost, or also rendering in ~3% of the time!
Ok, let's say that I misunderstand something here...
It does seem fishy I guess. It does seem way too fast.
Yes indeed, my tiny test showed way lower speed gains. Indeed almost no speed gain for bit-noisy renders. But this is probably my fault.
BTW I read this - http://lists.blender.org/pipermail/bf-cycles/2014-May/001921.html - on the mailing list and wondered if it could be used for some further balancing in the (adaptive) sampling process. Reading Brecht's answer, it sounds like I totally misunderstood the paper though...
Yes, I'm saying that now my " CPU rendering is 4 times faster than GPU... instead of 8 times slower"...
I'm very aware that the scene I rendered was something of a "best case" scenario and not in all cases the speed improvement was that good, but it was better by 2x at the very least!
Here's a couple of renders that took around 8h (each) in CPU while they would take around 30-36h (each) in GPU. (original 1920x1080).
using the build lukasstockner97 posted Fri, May 2, 10:11 PM
http://www.mediafire.com/download/1lkr2mqdd521atu/Blender_270_Metro_VC2013_x64.zip
it is only about 30% quicker on my i7 than just 2.7 cycles in a simple scene, that has a complex "diamond shader" from
http://www.blendswap.com/blends/view/39307
Where can I get newer builds of this?
Thank you

@craigar Currently nowhere since metro_17 was (up to now) the newest patch, I'll upload a 18-build shortly (most likely tomorrow).
OK, metropolis_18 is finished now. I included the MSVC fixes, so it should work for it as well (unless the new code broke it). CUDA works now, I tested it and everything compiles and runs (No, still no Metro on GPU). Adaptive sampling code now works on CUDA as well, but it gives results that are inferior to CPU and I currently have no idea why (it's still better than no adaptive, though). However, the performance of GPU rendering drops when using adaptive sampling, so that in some cases running without it might even give better results in the same time.
All in all, I changed quite a lot in both adaptive features, for example, now the adaptive stopping uses a power mean with p=4 instead of a regular quadratic mean, this gives more influence to the higher errors in the tile. Basically, the higher p is, the more the extremes are pronounced (1 gives an average, 2 the "regular" quadratic mean, and in theory infinity would give a simple max operation). Some other things were also changed, so you probably have to re-tune your parameters.
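The power mean described above is easy to write down. The following Python sketch shows how the exponent p moves the result from the plain average towards the maximum; the function name is chosen here for illustration, and the errors are assumed to be non-negative.

```python
def power_mean(values, p):
    """Power mean of non-negative values: (mean of v**p) ** (1/p)."""
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)


errors = [1.0, 3.0]
mean_p1 = power_mean(errors, 1)    # plain average
mean_p2 = power_mean(errors, 2)    # quadratic mean
mean_p4 = power_mean(errors, 4)    # value of p mentioned for metropolis_18
mean_p64 = power_mean(errors, 64)  # large p, close to max(errors)
```

For these two values the results increase with p, from the average 2.0 towards the maximum 3.0, so a higher p lets the noisiest pixels dominate the tile error.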
The progressive + adaptive stopping combination doesn't work yet, but the changed code layout makes it way easier now.
Importance equalisation now automatically switches on progressive rendering (for it to work well, set a low adaptive interval like 5).
I've worked a bit on codestyle, but it's still not quite good yet.
Another thing: I removed the adaptive warmup, it now just uses one map interval as the warmup, but I'll probably re-add it.
Metro_19 might take a while since I currently have quite a lot of other things to do, but I'll still follow this thread. Real development will continue around the beginning of June.
metropolis_18.diff
I just read the Multiplexed Metropolis paper (from http://cs.au.dk/~toshiya/) and sadly, it's no use for Cycles at all (currently) since it focuses on choosing the right (s, t) pair for bidirectional path tracing. That's what the authors mean with "Combination of MIS and MCMC", since MIS was developed for choosing this pair at first.
I wouldn't worry about that right now Lukas, I and a lot of others would just be happy to see the adaptive metropolis sampling feature developed as far as you can get it and committed to Master. If you're thinking about adding a bidirectional sampler, then it would be advisable to hold that off until after this patch is completed, one thing at a time :).
Hi Lukas, I am sorry, but _18.diff fails on Linux during patching.
I didn't try to compile.
1 out of 6 hunks FAILED -- saving rejects to file intern/cycles/kernel/kernel_types.h.rej
Thank you, mib
@Ace_Dragon Well, I do have some ideas what to do after this patch, but BPT is definitely not on it (see http://lists.blender.org/pipermail/bf-cycles/2014-May/001929.html for my reasons) and of course I'll finish this patch first :D
@mib2berlin Are you sure you use the current master? I just (5min ago) pulled the newest changes and metropolis_18.diff applies for me (also on Linux). The error in kernel_types sounds as if you haven't pulled in the baking commit yet, since then one Pass would be missing. If not, could you post the intern/cycles/kernel/kernel_types.h.rej?
I'll also include a new patch, this time in a better format (true patch instead of diff), maybe this one works better (although apart from 2 functions now being inlined, there is nothing changed).
By the way: I ran some benchmarks a few days ago, and it really seems that most speed issues are fixed. First of all, when compared with an unpatched master (built with the same compiler etc.), there was no speed difference in classic PT (with all new options off) apart from the usual +- 1-2% due to background processes. The Metro sampler that was so slow in the beginning also catches up quite well: the same scene (an indoor Archviz) took 8:31 with PT (no Adaptive) and 8:57 with Metro at the same sample number (200). The problem with 10-second-tests is that Metro has to do a "first pass" for the UV and ID channels that of course influences results when rendering with 5 regular samples (the first pass doesn't count, so if you set 5 samples, it will do first pass + 5 samples). Maybe a check for only doing this when UV or ID passes are activated would be a good idea.
By the way, something I forgot to say until now: Due to the way the multi-threading for Metro currently works, if you specify 10 samples and 4 threads, every thread will run 10 samples on the whole image, resulting in 40 samples/pixel. Therefore, for a fair comparison, you'd have to set 40 samples when rendering PT. I'll change this, but currently that's how it works.
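The sample-count arithmetic for a fair comparison, as a tiny sketch (the function name is hypothetical, and this reflects the current multi-threading behaviour described above, which is expected to change):

```python
def metro_effective_samples(samples_setting, threads):
    """Samples per pixel that Metro currently renders in total.

    Every thread runs `samples_setting` samples on the whole image, so the
    totals add up across threads.
    """
    return samples_setting * threads


# 10 samples on 4 threads: set 40 samples in PT for a fair comparison.
pt_samples_for_comparison = metro_effective_samples(10, 4)
```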
metropolis_18_n.patch
Build fails for me with metropolis_18_n.patch and latest master:
@gandalf3 Well, this is a warning that also appears for me when building; however, I haven't modified kernel_accumulate, so it is most likely from master. The line "cc1plus: some warnings being treated as errors" sounds like you have enabled -Werror or something similar. Or are there any errors reported further up? (When you build multithreaded, the actual error might be further up.)
Unfortunately I don't really know what I'm doing here..
I tried configuring with -Wno-error and building single threaded, but there was a different error (or maybe the same one, and I just didn't see it the first time if it was further up):
It seems you may be right about -Werror, but I didn't set it..
You are right, was not on master.
The patch works and compiles on my system.
Opensuse 13.1/64
i5 3770K
Thanks and cheers, mib
@LukasStockner
I see you uploaded
metropolis_18_n.patch
I don't have a compiler (nor do I know how to use a C++ compiler, or any modern code compiler) so I need a build for Windows 7 64-bit, or newer files that I can copy into the folders I made for the May 2 build
http://www.mediafire.com/download/1lkr2mqdd521atu/Blender_270_Metro_VC2013_x64.zip
Thank You
Hi again.
I'm also still waiting for the Win64 build with the latest patch version and I'm starting to wonder if plans have switched to wait until iteration 19 is done before another build is made.
Thanks.
Do we really need to be filling up task/tracker pages with requests for builds? There's plenty of that at BA as is. There's no pages here, so it makes it much harder to follow progress when there's tons of "compile for me plz". BA's forum setup makes it a lot easier to work with general discussion, plus there's a lot more people there who can help with building anyway.
Not to mention, compiling Blender REALLY isn't all that hard. There's good instructions in the wiki. If you're good enough with computers to make sense of this thread, and have permissions on your machine to install test builds, you really shouldn't have any issue rolling your own anyhow.
Added subscriber: @PeterBoos
May I suggest that, "as an exception", this gets pushed to the Blender build bot?
That would bypass code review, but this doesn't break the other Cycles render methods or other stuff.
And we all know those builds are experimental anyway.
As a result we wouldn't disturb the programmer any more with build questions or failing compilations.
Sure, it might not have been peer reviewed, but is that so important?
It's experimental, and it's the biggest Blender change since, well, ehmm... the Cycles introduction.
Everyone loves a faster renderer, even if it is still under development.
Also, real problems / suggestions / ideas could then be discussed on the BA forum, and perhaps more people on other forums understand this math as well.
This is not how our development process works. If you want to use it now, you can do it by building Blender on your own.
Also, can we please stop with these "Please I want this" comments? They don't really help here and won't make things happen faster. Thanks!
It would be easier if we all had patience. :)
http://gpupathtracer.blogspot.dk/ anyone checked this one?
they got the code on github and using MLT on GPU
I compiled new master with metropolis_18_n.patch using mingw64 and cmake
(I had to disable a few things to get cmake to work correctly; this isn't exclusive to this patch,
I think it's something weird with cmake). Anyhow, I disabled these items, so they are not available in this build:
libmv
bullet
freestyle
osl
CUDA
If you would like to try it:
http://www.mediafire.com/download/4wjp48eo7le802j/Blender_2.7.5_Mingw64_cmake_mertopolis18.7z
Added subscriber: @SterlingRoth
@00Ghz, That is not an implementation of MLT, it's just standard path tracing.
I read about MLT somewhere. Guess I was mistaken
Added subscriber: @LevonHudson
I think this conversation should be moved over to blender artists forums. this thread is really for development purposes. and while it is great to see some of the images being produced, the "can someone send me a build", "i need help compiling" and "omg wowz" posts are clogging up the page.
Please continue to post on this thread http://blenderartists.org/forum/showthread.php?329089-Cycles-MLT-patch
and leave this thread to developers.
im not a moderator or anything on here, i just think developer.blender should be just that... for devs
Added subscriber: @kwk-2
Added subscriber: @VictorMukayev
i thought i'd share it here
sorry if it's not related, but this Progressive Importance Sampling sounds very promising
https://corona-renderer.com/blog/research-corner-progressive-importance-sampling/
Now that's awesome indeed. If only it would work on the GPU, imagine the possibilities ;)
@LukasStockner,
I just got some problems merging the diff. Could you diff from master if you get some time? Or provide a git hash that merges without difficulties, so I can check it out? Many thanks!
Added subscriber: @derekbarker
@LukasStockner what are your plans after the full implementation of mlt
Well, time to get back to the Blender development :D
First of all, sorry for not uploading Windows builds and not answering on the thread anymore, my CUDA toolchain just won't work on Windows and I just had so much other stuff to do...
But, I think I have news that can make up for this: Since, for said other project, I got a GTX780 from NVIDIA, I'm currently working on GPU Metropolis! There basically are two reasons for this: First of all, while reading "Physically based Rendering" and Veach's thesis, I noticed that they both recommend using more, shorter chains, so it seems that I was wrong in assuming that more chains (on GPU) mean less quality. Second, due to my current re-structuring of the patch (moving stuff from util/ into the kernel), it's not much work to add it anymore, so why not just do it?
So, once it works reasonably, I'll post a patch, hopefully it won't be long...
Regarding the Progressive Importance Sampling: Wow, that's really awesome! I read the paper and it seems quite reasonable and implementable. It's basically a Photon mapping prepass that is not used to generate the image itself, but rather to "train" the path tracer so it knows where the light comes from and is able to find more important paths. The implementation details are a mix of PT, Photon mapping, irradiance caching and GMM machine learning.
Since it still uses PT for the image generation, it would definitely be useful for Cycles. In fact, since it already uses Lightpath-Generation etc., but no explicit path connection, it's quite a step on the way to BPT without being as hard to incorporate into Cycles. By the way, PIS should work just as well on the GPU as it does on the CPU.
So, to answer @derekbarker: until the PIS paper was posted here, the next feature on my plan had been Lightgroups. I'll probably still do them next, but PIS is now definitely also on the plan. Lightgroups won't be as involved as Metro (I imagined representing them like scene layers in the GUI, with every activated layer adding one output to the compositor), but PIS would be quite a bit more work. Still, challenges are fun :D
@tychota Yes, the current patch won't merge anymore. 83cdd5 should work, since it's two days older than metro_18_n. Metro_19 will of course be rebased again.
@LukasStockner Dude.......... I love you! <3
Drum rollll !!!! xxxoooxxxxooo :) :) :) * * * Go Lucas! Go * * * Lookin forward -Thanks in advance ;)
This is not a Blenderartists thread here, what is the sense of all these off-topic "Drum roll" posts? I get an e-mail about every new post.
Removed subscriber: @ThomasDinges
Maybe a silly question, but I think it would be perfect to get a branched path version of this. Sometimes (most of the time) the translucent or glossy shaders or SSS are producing the noise, but diffuse isn't. So is it possible for the noise-aware sampling to adjust the branched samples, so it won't spend processor time on tracing diffuse rays or anything else that has already converged?
No, it doesn't.
I did,
but I still get plenty of rejections.
Would it be different if i use arc instead of patch -p1 ?
Added subscriber: @ThomasDinges
@ThomasDinges Please accept my sincere apologies, from now on I will keep my input here only relative to "beta testing/suggestions" of the code in development - if thats appropriate? I won't waste your time via your "automatic email" from this (or any developer) thread.
I am very grateful for all you do for the 3D community, and don't want to be disrespectful or wasteful.
I became a bit "over excited" when I read about Lucas' enthusiasm towards moving forward on implementing GPU, MLT and PIS and should have thought first to send my "cheering" to him from an appropriate forum/channel.
Thank You
@LukasStockner would it be possible to turn 1 gpu in to multiple small tiles so it can be more efficient with tolerated error ?
About me asking earlier if it was possible to have per-pixel stop-condition, I thought of a workflow like this:
(* this could be every single pass or user-defined number of passes. Not sure what would be better.)
Not sure if i was clear. English is not my language of course.
And not sure if this is a totally absurd coding concept, or, I suspect a total memory-killer approach.
Max Consecutive Rejections - 256

Mutation distance - 0.40
Large Mutation chance - 0.40
Num. samples - 1024 using Correlated multi-jitter
Sampling - Equi-Importance
Patch revision - 18
The build I'm using is almost a month old (A MingW build from HolyEnigma), but the results you get from this patch are absolutely stunning with the right settings.
It goes to show that even the worst case scenarios can be rendered now in around 3 days max (with simpler cases surely rendering quite a lot faster than that).
Good work once again, I stand by my previous token award :)
lukas, waiting for 19 so i make a new build. :D
Metropolis Sampling

Max Consecutive Rejections - 256
Mutation distance - 0.25
Large Mutation chance - 0.20
Num. samples - 1532(total until stopping) using Sobol
Sampling - Importance Equalization
Time - 29 hours
Patch revision - 18
Only very minor touch-up was needed. I would really like to see Lukas think about getting this into master, at least as an experimental feature, as soon as he can; the potential improvements here are just too great to let go to waste.
I have also noted that Multi-Jitter was crushed by Sobol in this case; lowering the max rejects doesn't cause any rapid rise in bias with that method, and 256 then gives the best of what the higher and lower values will give you.
I can't wait to see the next update and the next build containing it, keep up the good work. :)
Metropolis Sampling

Max Consecutive Rejections - 256
Mutation distance - 0.20
Large Mutation chance - 0.25
Num. samples - approx. 710(total until stopping) using Sobol
Sampling - Importance Equalization
Time - 70 hours
Patch revision - 18
I know, same image as the first one I posted, but it looks to me like the very nature of Sobol eliminates pretty much every issue with bias and incomplete convergence that the Multi-Jitter version had. The time is the same, but the convergence is quite a bit better.
Now I guess it's back to vanilla builds for now since the MingW one is getting old, which in turn results in me hoping we get a nice big update soon, like better adaptive sampling for the Metro integrator and the like. This pretty much solves the issue of trying to render scenes with complex indirect lighting in Cycles.
Added subscriber: @PRosendahl
Added subscriber: @nAssembly
@Ace_Dragon, for reference, what is your PC hardware (processor speed & cores / GPU / memory)?
It seems a complex scene; how many faces is it made of, and how many lights?
And how do these render times compare to the other Cycles render methods?
Added subscriber: @VilemDuha
Regarding noise levels between stopped tiles in adaptive:
Could the sample-count be interpolated inside the tile? This means rendering a different sample count on each side of the tile - when one of the neighbouring tiles is stopped, the surrounding tiles don't render more samples on the connecting side, and the sample count increases towards the other non-stopped tiles.
It could also be done the other way - also tiles that are below the stopping condition render some extra samples towards the borders of non-stopped tiles.
Another idea is to render these extra samples after all tiles have stopped.
this would enable consistent grain/noise levels even at low samples...
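One way to read the interpolation idea above is as a bilinear blend of per-corner target sample counts across a tile. This is only a sketch of that reading; the function and its parameters are hypothetical, and nothing like it is confirmed to exist in the patch.

```python
def interpolated_sample_count(u: float, v: float,
                              c00: float, c10: float,
                              c01: float, c11: float) -> int:
    """Target sample count for a pixel inside a tile.

    u, v   : pixel position inside the tile, each in [0, 1]
    c00..  : target sample counts at the tile corners (hypothetical inputs,
             e.g. derived from whether the neighbouring tiles have stopped)
             c00 = (u=0, v=0), c10 = (u=1, v=0),
             c01 = (u=0, v=1), c11 = (u=1, v=1)
    """
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        raise ValueError("u and v must lie in [0, 1]")
    bottom = c00 * (1.0 - u) + c10 * u   # blend along the v=0 edge
    top = c01 * (1.0 - u) + c11 * u      # blend along the v=1 edge
    return round(bottom * (1.0 - v) + top * v)


# Left neighbour stopped at 100 samples, right neighbour still going at 200:
for u in (0.0, 0.5, 1.0):
    print(u, interpolated_sample_count(u, 0.5, 100, 200, 100, 200))
```

With matching counts on shared edges, neighbouring tiles would meet at the same sample count, which is the point of the suggestion.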
Just to show a sign of life here - I'm still working on the patch, but the full Metro rewrite necessary for the GPU is quite some work. It's working by now, but now regular PT crashes :/
Also, from what I can currently see, GPU Metro doesn't significantly outperform the CPU version. Considering I use a GTX780, that's quite disappointing, but I'll try to improve the performance.
On the positive side, the new code is already quite cleaner than the previous version, since now the whole Metro code is in the kernel instead of the device.
Another thing I'll try is to add "regular" pixel filtering for Metro mode. With PT, the current approach (shifting the camera point) works just as well, but I can imagine that for Metro this might be different. Also, this would allow filter functions with negative lobes (Sinc-Lanczos, Mitchell-Netravali etc.).
Well, the custom MLT that Octane has is also slower than normal PT, by around 40%. Not sure what kind of slowdown you get, so I can't say for sure.
Lukas; if you're still following up on messages here...
Do you have any updates on this patch? If the GPU portion is looking to be too difficult right now, perhaps you can at least get this done as a CPU-only feature for the initial release and work on the GPU stuff later?
If you can give an answer, that would be great.
Removed subscriber: @VilemDuha
Removed subscriber: @ThomasDinges
Well, yes, I'm still working on it and Metro19 is nearly finished. The only problem remaining is a CUDA alignment problem: it seems that nvcc doesn't manage to align the struct itself. Once this is done (I already tracked it down to a variable, so most likely today), I'll upload it here (rebase to master etc. is already done).
So how is the CUDA performance?
Finally, Metro19 is finished!
So, what has changed?
GPU Metropolis! After some changes, it's considerably faster than CPU Metro (at least on my system) and has all features that CPU also has.
To do this, I rewrote (or rather re-organized) the whole sampler, moving it from the device code to the kernel. This is IMO a lot cleaner.
Importance Equalisation is currently broken, but of course I'll re-add it.
There now is an option to choose the Chain Number (number of independent samplers). On GPUs, this should be really high (I currently use 16384 or 32768; however, this results in 75MB / 150MB memory usage on the GPU for the samplers), on CPUs it should be a multiple of the thread number (1-2 chains per thread are just fine). Basically, for GPU performance, this is the performance equivalent of tile size.
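The memory figures quoted for the chain counts imply a per-chain cost that can be used to estimate other settings. A small sketch, assuming binary megabytes (1 MB = 1024 * 1024 bytes) and a fixed bounce count; both are assumptions, and the helper names are made up.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: the quoted "MB" are binary megabytes


def bytes_per_chain(total_mb: float, chains: int) -> float:
    """Per-chain sampler memory implied by a quoted total."""
    if chains < 1:
        raise ValueError("chains must be >= 1")
    return total_mb * BYTES_PER_MB / chains


def sampler_memory_mb(chains: int, per_chain_bytes: float) -> float:
    """Estimated sampler memory for a given chain count."""
    return chains * per_chain_bytes / BYTES_PER_MB


per_chain = bytes_per_chain(75, 16384)   # from the 16384-chain figure
print(per_chain)                          # bytes per chain
print(sampler_memory_mb(32768, per_chain))
print(sampler_memory_mb(8, per_chain))    # e.g. 8 chains on a CPU
```

Both quoted data points give the same per-chain figure, so under these assumptions the memory use scales linearly with the chain count.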
I fixed a pretty massive bug in the cooldown phase, but I might have only added it while rewriting, so it probably wasn't present in metro_18
Error estimation was also rewritten, now again using variance instead of the even-samples pass. Reasons for this are: It's got a solid theoretical basis, there are no correlation issues with Sobol, it works just as well with Metro and it seems more solid. If anyone is interested in the derivation of the actual formulas used, I put in some comments, I hope it's clear enough.
Basically, it uses perceptually weighted standard deviation (I call it PWSD), multiplied by sqrt(N) with N being the number of samples. The reason for this is that the standard deviation convergence of MC methods like PT is O(1/sqrt(N)), so increasing the sample count by a factor of N lowers the error by a factor of sqrt(N). This correction is necessary so that the error goes to 0 as N goes to infinity.
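The O(1/sqrt(N)) behaviour referred to here can be illustrated with the generic standard error of a Monte Carlo mean. This is a textbook sketch, not the PWSD formula from the patch (the perceptual weighting and the patch's exact scaling are not reproduced), and the function name is made up.

```python
import math


def standard_error(sample_std: float, n: int) -> float:
    """Standard error of the mean of n independent samples whose
    per-sample standard deviation is sample_std."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sample_std / math.sqrt(n)


# Quadrupling the sample count halves the expected error.
for n in (100, 400, 1600):
    print(n, standard_error(1.0, n))
```

This is also why halving the noise costs four times the render time in plain path tracing.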
You can see the estimated error in the Diffuse direct pass
There is a "Power mean exponent" setting which can be used to balance between using the average error of a tile and the maximum error. Basically, the higher this value is, the more important the maximum values get. 2 gives a regular average, while in theory infinity gives a pure maximum. However, due to numerical precision issues, going higher than 10 is probably a bad idea. I always use 4, it seems quite balanced.
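For illustration, this is how a power mean moves from an average towards the maximum as the exponent grows. The sketch uses the textbook definition, in which an exponent of 1 is the arithmetic mean and 2 is the root mean square; since the post describes 2 as a regular average, the patch may apply the exponent to a differently scaled quantity, so the exponents here need not map one-to-one onto the setting. The tile errors are made-up numbers.

```python
def power_mean(values, p: float) -> float:
    """Textbook power mean: (mean(v**p))**(1/p) for non-negative values."""
    if not values:
        raise ValueError("values must not be empty")
    if p <= 0:
        raise ValueError("p must be positive in this sketch")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)


# Hypothetical per-pixel errors in one tile: mostly clean, one noisy pixel.
tile_errors = [0.1, 0.1, 0.1, 1.0]
for p in (1, 2, 4, 10):
    print(p, power_mean(tile_errors, p))
print("max", max(tile_errors))
```

The result grows with the exponent and never exceeds the maximum, so a higher exponent makes a single noisy pixel count for more when deciding whether a tile may stop.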
The new system seems pretty stable to varying tile sizes, so it works just as well for GPU.
To see the system at work, just render an image without any option, one just with stopping and one with both options. You'll see that the error distribution gets flatter from image to image.
Some things to note:
Things still to do:
The patch is based on commit 49c73f.
metropolis_19.patch
Hi Lukas, _19 compiles fine on my system and CPU is working, but I get an error during GPU kernel compilation.
#include "util_hash.h"
Opensuse 13.1/64
Intel i5 3770K
GTX 760 4 GB (Display)
GTX 560Ti 1.28 GB 448 Cores
Driver 331.67
Thank you, mib
@mib2berlin That's really weird since ../util is in both CMakeLists and SConscript. Are you building with CMake or SCons? Could you try a new build from scratch?
Added subscriber: @hdunderscore
Testing this out on msvc 2013, I made a few changes to get it to compile:
#elif defined(_WIN32)
I also had to disable OSL, there was a difficult to trace undefined reference bug.
Very nice work !
Hi lukasstockner97, I tried with a clean build directory but got the same error.
After adding the complete path to util_hash.h, the CUDA kernel starts to build but stops with:
Thank you for the fast help, mib
EDIT: The kernel for my GTX 760 sm_30 is compiling and working.
Error is for GTX 560Ti 448 sm_20.
The memory consumption on the GPU is huge, 1.1 GB for BMW!
THX
@hdunderscore Thanks, I'll add it in the next version!
@mib2berlin This error comes from the 128-textures limit on devices with sm20 or lower, since MCQMC uses another texture, one of the image textures has to be moved to the sm30+ code. I forgot to do so since I only use sm30 and sm35, sorry for that. For a quick fix, move the "__tex_image_098"-line in intern/cycles/kernel/svm/svm_image.h 2 lines down into the "#if defined ..."-block.
Regarding memory usage: Well, Metro currently requires lots of memory. I am aware of this and will try to reduce it (this falls under optimization). One solution would be to disable lazy sample generation, another one to always re-mutate from the last large step. This would give performance problems, however, when there is a long chain of rejected large steps, and especially on GPUs, where every warp has to wait for all threads to finish the mutations, this would probably be quite a bad trade-off.
A quick fix for this is to reduce the bounce count; if nothing else helps and you're getting OOMs, reduce the chain count.
Thank you, with the changes in svm_image.h the GTX 560Ti 448 starts rendering.
I can't render with both cards, and it seems that sometimes the 560 renders and sometimes the 760.
With more than 6000 Metropolis chains, the GTX 560 stops working without an error.
The render results are also very different on the GTX 560.
http://www.pasteall.org/pic/73837
http://www.pasteall.org/pic/73838
http://www.pasteall.org/pic/73839
http://www.pasteall.org/pic/73840
Very interesting, cheers, mib
someone requested a build..
here mingw64-cmake 7-12-14 metropolis19
with player, no CUDA or OSL
http://www.mediafire.com/download/84vzfy99a5h6n41/Blender_2.71_mingw64_cmake_7-12-14_metropolis19.7z
Added subscriber: @jemonn
Initial thoughts....
So the Metropolis functionality seems to have seen a few regressions as of rev. 19, but the adaptive sampling seems intact after all of the changes (EDIT: Mostly, just found that a high min-bounce number will also crash Blender on the same scenes that crash Metro).
Hope you can fix these.
EDIT: Updated information after more testing
(hopefully) built a version of R19 with CUDA (and GPU SSS and Volumetrics) compiled with VS2013/cmake. Doesn't have OSL or Player.
It works on my computer but as it is my first build, not sure whether I've zipped everything up that is needed.
http://www.mediafire.com/download/bq551h24xxdym9e/Release.zip
While playing around with settings i noticed something strange using R19 Jemonn's build (thanks), on my i7 octocore (no GPU).
I had a simple blend with 3 lights and some musical notes as objects > https://dl.dropboxusercontent.com/u/54767531/music.blend
It does render a first preview, but then later it doesn't update (although the rendering system stays busy).
Well the setting i used are probably not good for a nice render, but its strange that the system kept busy while no improvement was made.
With other settings it worked just fine, so I just post this as maybe there is something going wrong here with Metropolis or the adaptive sampling.
Hey Lukas; I don't know how much it would help, but I found a potentially useful resource that has free .blend file scenes (to verify that the adaptive and metropolis sampling code works with as many cases as possible).
http://www.emirage.org/category/free-stuff/
For example, if any one of these scenes crash with the metropolis sampler being used, you know that there's a regression or two since Patch 18.
Hi Lukas.
First of all, thanks for all that work, that's cool!
More constructive remark: I tested @holyenigma's build, and it crashes when I render a scene with Metro and render border enabled.
@jemonn: your build crashes on my computer; a file is missing: "VCOMP120.DLL"
@Lapineige for some reason, it didn't copy it to my build folder, but it still works for me? anyway, I've uploaded it again, copying the contents of my build to a copy of the 2.71 folder, which should include all the files but as it works for me, I've got no idea whether it's fixed everything.
Using my build, it doesn't crash when using metro and a render border is used.
(Hopefully) fixed v19 link: http://www.mediafire.com/download/0jkmzgnk2qim00q/Blender.7z
I've been leaving all the settings at their defaults, are there any suggestions to improve performance? Is leaving sampler count at 0 (auto) okay, or should I manually change it?
Some unfortunate news about the adaptive sampling in rev. 19.
Never mind about it remaining intact from rev. 18, just found that adaptive sampling on a complex scene will also crash Blender when the adaptive stopping kicks in for the first time (providing it takes a while to get there). I can also get Blender to crash early on if I use a high number for the min bounces setting (10 or more).
The crashing issues mean the patch is unusable once a scene gets to a certain degree of complexity, so it's back to rendering in a vanilla Master build for now, I guess. Don't forget that those eMirage scenes can be used for testing your patch and seeing if things crash for you as well.
Sorry about all the crashes...
Metro20 is making progress, in 2-3 days it should be done. The Metro crashes should be gone with it, the adaptive crashes are quite strange. Could you post a crashlog (the ones from the Temp folder), and, if possible, use a debug build so the place where it crashes can be seen better.
For me, Metro19 didn't crash once, even with scenes like the Lego bulldozer or the Pavilion from eMirage. I'll run some Valgrind tests today to search for wrong memory accesses. By the way, is it expected that cuda-memcheck crashes with Blender?
Regarding the GPU Metro problems: With Metro20, Multi-GPU will work, but the error on the GTX 560 is strange. Well, we'll see if it works with Metro20.
The min. bounces is strange, I'll have a look.
@jemonn: ok the new build works ! Thanks !
But this time, rendering with MLT and render border enabled does not crash Blender, but it still fails.
Lukas, testing the VC2013 build I have with your patch, there's a chance it may just be MingW being unstable compared to the official platforms (MingW being the only platform Holy Enigma will build for).
I couldn't get the crashes I got early on with the other build in this case, so it may be MingW's fault and not the fault of your code.
If I do indeed find it a false alarm, then I'm sorry for that.
EDIT: Okay, even the use of 64 threads doesn't crash adaptive sampling with the VC2013 build, so it really is just MingW doing its thing as the least stable platform you can possibly build with. Sorry for that.
Another question.
Do you by any chance know what the unit is for the adaptive map update value? I ask because I couldn't really obtain a difference say, between a value of 5 and one of 50, but then I found that much larger values seem to be necessary if I wanted to see a difference when using a lower error tolerance along with the adaptive map.
Perhaps if one could make sure that the value was easy for the user to understand it would be different, say, update the adaptive map every N number of passes instead if it's not being done like that already.
MLT 19 (holyenigma MingW build) gives really good, fast CPU renders (once I messed with it a little and followed Lukas' suggestions) on a "physically" simple scene with very complex "real" spectral/caustic lighting.
A usable "draft" in around 10 minutes: 1920x1080, 50 passes, adaptive sampling at 2.5.
Before using this MLT 19 I could NOT get a good, reasonably noise-free render any other way on this scene; I tried many combinations on both PT and BPT, at least 200 times. I tried MLT 18 and it crashed a lot on other scenes, maybe I'll try it on this scene just to see ....
Image and .blend file here
BA Thread: Cycles MLT patch
i7 2600 3.4ghz, Windows 7 64-bit
rendered scene above at 4 times the samples, was at 50 but once it gets to about 70 passes it starts showing "black fireflies" - even tried it on "Suzanne" with the same spectral node AND lots of ambient light - same problem?
also tried @jemonn build of 19 w GPU (GTX580 3 GB), and the GPU mode is about 30% slower on this file, than using CPU = i7 2600 3.4ghz, Windows 7 64-bit 12 GB
Added subscriber: @blakenator
Just an FYI: out of the box patch from master to metropolis_19 yields "patch does not apply" and "trailing white space" errors and when compiling:
My machine (using the term lightly):
Windows 7
VS 2013 Win32 (builds master just fine)
CMake
portablegit (Github) with tortoisesvn (idk if it makes a difference)
Hope this helps
I've just tried the test file @craigar linked to above and something is really wrong with importance equalisation (but we all knew that anyway).
However, I have slightly different results with my CPU and GPU?
i7 4500U - 400 samples, importance equalisation off, 4 chains. (04:53:63)
Geforce GT 750M - 400 samples, importance equalisation off, 16000 chains. (08:53:28)
Also GPU rendering is really slow for me, might just be my build as @craigar said it was slower for GPU, but there usually isn't this much difference in the speeds.