archive/blender-archive - blender-archive - Blender Projects

archive/blender-archive

Archived

Author	SHA1	Message	Date
Mai Lavelle	087331c495	Cycles: Replace __MAX_CLOSURE__ build option with runtime integrator variable Goal is to reduce OpenCL kernel recompilations. Currently viewport renders are still set to use 64 closures as this seems to be faster and we don't want to cause a performance regression there. Needs to be investigated. Reviewed By: brecht Differential Revision: https://developer.blender.org/D2775	2017-11-09 01:04:06 -05:00
Brecht Van Lommel	8a72be7697	Cycles: reduce closure memory usage for emission/shadow shader data. With a Titan Xp, reduces path trace local memory from 1092MB to 840MB. Benchmark performance was within 1% with both RX 480 and Titan Xp. Original patch was implemented by Sergey. Differential Revision: https://developer.blender.org/D2249	2017-11-05 20:48:33 +01:00
Brecht Van Lommel	400e6f37b8	Cycles: reduce subsurface stack memory usage. This is done by storing only a subset of PathRadiance, and by storing direct light immediately in the main PathRadiance. Saves about 10% of CUDA stack memory, and simplifies subsurface indirect ray code.	2017-09-28 15:18:43 +02:00
Brecht Van Lommel	32449e1b21	Code cleanup: store branch factor in PathState.	2017-09-13 15:24:14 +02:00
Brecht Van Lommel	cfa8b762e2	Code cleanup: move rng into path state. Also pass by value and don't write back now that it is just a hash for seeding and no longer an LCG state. Together this makes CUDA a tiny bit faster in my tests, but mainly simplifies code.	2017-08-19 18:14:16 +02:00
Mai Lavelle	6238214159	Cycles: Faster split branched path tracing by sharing samples with inactive threads Unlike regular path tracing, branched path tracing is usually used with lower sample counts, at least for primary rays. This means that are less samples for the GPU to work on in parallel and rendering is slower. As there is less work overall there is also more inactive threads during rendering with BPT. This patch makes use of those inactive rays to render branched samples in parallel with other samples. Each thread that is preparing for a branched sample will attempt to find an inactive thread and if one is found the state for the sample is copied to that thread. Potentially, if there are enough inactive threads, 100s of branched samples could be generated from the same originating thread and ran in parallel giving large speed ups. Gives 70% faster render for pavillion midday scene. 20-60% faster on BMW with car paint replaced with SSS/volumes.	2017-06-10 04:08:49 -04:00
Sergey Sharybin	2eb906e1b4	Cycles: Fix access array index of -1 in SSS and volume split kernels	2017-05-05 17:54:03 +02:00
Mai Lavelle	915766f42d	Cycles: Branched path tracing for the split kernel This implements branched path tracing for the split kernel. General approach is to store the ray state at a branch point, trace the branched ray as normal, then restore the state as necessary before iterating to the next part of the path. A state machine is used to advance the indirect loop state, which avoids the need to add any new kernels. Each iteration the state machine recreates as much state as possible from the stored ray to keep overall storage down. Its kind of hard to keep all the different integration loops in sync, so this needs lots of testing to make sure everything is working correctly. We should probably start trying to deduplicate the integration loops more now. Nonbranched BMW is ~2% slower, while classroom is ~2% faster, other scenes could use more testing still. Reviewers: sergey, nirved Reviewed By: nirved Subscribers: Blendify, bliblubli Differential Revision: https://developer.blender.org/D2611	2017-05-02 14:26:46 -04:00
Hristo Gueorguiev	8ada7f7397	Cycles: Remove ccl_addr_space from RNG passed to functions Simplifies code quite a bit, making it shorter and easier to extend. Currently no functional changes for users, but is required for the upcoming work of shadow catcher support with OpenCL.	2017-03-27 10:46:28 +02:00
Sergey Sharybin	26620f3f87	Cycles: Avoid some ccl_local in various kernels	2017-03-16 11:27:17 +01:00
Hristo Gueorguiev	57e26627c4	Cycles: SSS and Volume rendering in split kernel Decoupled ray marching is not supported yet. Transparent shadows are always enabled for volume rendering. Changes in kernel/bvh and kernel/geom are from Sergey. This simiplifies code significantly, and prepares it for record-all transparent shadow function in split kernel.	2017-03-09 17:09:37 +01:00