archive/blender-archive - blender-archive - Blender Projects

archive/blender-archive

Archived

Author	SHA1	Message	Date
Brecht Van Lommel	e3e16cecc4	Code refactor: remove rng_state buffer and compute hash on the fly. A little faster on some benchmark scenes, a little slower on others, seems about performance neutral on average and saves a little memory.	2017-10-04 21:11:14 +02:00
Brecht Van Lommel	45dcd20ca9	Cycles: CUDA split performance tweaks, still far from megakernel. On Pabellon, 25.8s mega, 35.4s split before, 32.7s split after.	2017-08-05 14:32:59 +02:00
Mai Lavelle	ea846a4dfc	Cycles: Add kernel to enqueue inactive rays The queue will be used to make reuse of inactive threads to keep the GPU more busy.	2017-06-10 03:51:18 -04:00
Hristo Gueorguiev	6bf4115c13	Cycles: Split kernel - sort shaders Reduce thread divergence in kernel_shader_eval. Rays are sorted in blocks of 2048 according to shader->id. On R9 290 Classroom is ~30% faster, and Pabellon Barcelone is ~8% faster. No sorting for CUDA split kernel. Reviewers: sergey, maiself Reviewed By: maiself Differential Revision: https://developer.blender.org/D2598	2017-05-03 15:30:45 +02:00
Mai Lavelle	915766f42d	Cycles: Branched path tracing for the split kernel This implements branched path tracing for the split kernel. General approach is to store the ray state at a branch point, trace the branched ray as normal, then restore the state as necessary before iterating to the next part of the path. A state machine is used to advance the indirect loop state, which avoids the need to add any new kernels. Each iteration the state machine recreates as much state as possible from the stored ray to keep overall storage down. Its kind of hard to keep all the different integration loops in sync, so this needs lots of testing to make sure everything is working correctly. We should probably start trying to deduplicate the integration loops more now. Nonbranched BMW is ~2% slower, while classroom is ~2% faster, other scenes could use more testing still. Reviewers: sergey, nirved Reviewed By: nirved Subscribers: Blendify, bliblubli Differential Revision: https://developer.blender.org/D2611	2017-05-02 14:26:46 -04:00
Sergey Sharybin	0579eaae1f	Cycles: Make all #include statements relative to cycles source directory The idea is to make include statements more explicit and obvious where the file is coming from, additionally reducing chance of wrong header being picked up. For example, it was not obvious whether bvh.h was refferring to builder or traversal, whenter node.h is a generic graph node or a shader node and cases like that. Surely this might look obvious for the active developers, but after some time of not touching the code it becomes less obvious where file is coming from. This was briefly mentioned in T50824 and seems @brecht is fine with such explicitness, but need to agree with all active developers before committing this. Please note that this patch is lacking changes related on GPU/OpenCL support. This will be solved if/when we all agree this is a good idea to move forward. Reviewers: brecht, lukasstockner97, maiself, nirved, dingto, juicyfruit, swerner Reviewed By: lukasstockner97, maiself, nirved, dingto Subscribers: brecht Differential Revision: https://developer.blender.org/D2586	2017-03-29 13:41:11 +02:00
Sergey Sharybin	1cad64900e	Cycles: Define ccl_local variables in kernel functions Declaring ccl_local in a device function is not supported by certain compilers.	2017-03-16 11:27:17 +01:00
Mai Lavelle	96868a3941	Fix T50888: Numeric overflow in split kernel state buffer size calculation Overflow led to the state buffer being too small and the split kernel to get stuck doing nothing forever.	2017-03-11 05:39:28 -05:00
Mai Lavelle	4a2cde3f0e	Cycles: Enable SSS and volumes for CUDA and Nvidia OpenCL split kernel	2017-03-10 02:09:41 -05:00
Mai Lavelle	306034790f	Cycles: Calculate size of split state buffer kernel side By calculating the size of the state buffer in the kernel rather than the host less code is needed and the size actually reflects the requested features. Will also be a little faster in some cases because of larger global work size.	2017-03-08 01:31:30 -05:00
Mai Lavelle	cd7d5669d1	Cycles: Remove sum_all_radiance kernel This was only needed for the previous implementation of parallel samples. As we don't have that any more it can be removed. Real reason for removal tho is this: `per_sample_output_buffers` was being calculated too small and artifacts resulted. The tile buffer is already the correct size and calculating the size for `per_sample_output_buffers` is a bit difficult with the current layout of the code. As `per_sample_output_buffers` was only needed for `sum_all_radiance`, removing that kernel and writing output to the tile buffer directly fixes the artifacts.	2017-03-08 01:31:07 -05:00
Mai Lavelle	4cf501b835	Cycles: Split path initialization into own kernel This makes it easier to initialize things correctly in the data_init kernel before they are needed by path tracing.	2017-03-08 01:30:43 -05:00
Mai Lavelle	817873cc83	Cycles: CUDA implementation of split kernel	2017-03-08 01:24:53 -05:00