blender-archive

Archived

Author	SHA1	Message	Date
Sergey Sharybin	a280697e77	Cycles: Support "precompiled" headers in include expansion algorithm The idea here is that it is possible to mark certain include statements as "precompiled" which means all subsequent includes of that file will be replaced with an empty string. This is a way to deal with tricky include pattern happening in single program OpenCL split kernel which was including bunch of headers about 10 times. This brings preprocessing time from ~1sec to ~0.1sec on my laptop.	2017-08-02 20:59:19 +02:00
Mai Lavelle	eb293f59f2	Cycles: Pass all buffers to each kernel call for OpenCL Technically not passing all buffers used by a kernel is undefined behavior. We haven't had any issues with this so far on AMD or Nvidia, but it's known to be a problem with Intel and we received a report from AMD that this is a problem on newer hardware, so we need to make this change at some point. Unfortunately there a cost to being correct, about 5% for the benchmark scenes. For low sample counts it's even worse, I've seen up to 50% slowdown. For the latter case I think adjusting tile updating logic can help, but not sure what that would look like yet (it would be just a few lines change however).	2017-06-10 04:08:49 -04:00
Mai Lavelle	ea846a4dfc	Cycles: Add kernel to enqueue inactive rays The queue will be used to make reuse of inactive threads to keep the GPU more busy.	2017-06-10 03:51:18 -04:00
Lukas Stockner	705c43be0b	Cycles Denoising: Merge outlier heuristic and confidence interval test The previous outlier heuristic only checked whether the pixel is more than twice as bright compared to the 75% quantile of the 5x5 neighborhood. While this detected fireflies robustly, it also incorrectly marked a lot of legitimate small highlights as outliers and filtered them away. This commit adds an additional condition for marking a pixel as a firefly: In addition to being above the reference brightness, the lower end of the 3-sigma confidence interval has to be below it. Since the lower end approximates how low the true value of the pixel might be, this test separates pixels that are supposed to be very bright from pixels that are very bright due to random fireflies. Also, since there is now a reliable outlier filter as a preprocessing step, the additional confidence interval test in the reconstruction kernel is no longer needed.	2017-06-09 03:46:11 +02:00
Sergey Sharybin	90a62404cb	Cycles: Cleanup, variable names Don't use camel case for variable names. Leave that for the structures.	2017-05-19 12:52:12 +02:00
Sergey Sharybin	de86da521c	Cycles: Cleanup, braces after function definition I wouldn't mind switching fully to Google style, but i am against of mixing two different styles in same project. So just stick to brace at the new line after function definition.	2017-05-19 12:43:26 +02:00
Sergey Sharybin	803337f3f6	\0;115;0cCycles: Cleanup, use ccl_restrict instead of ccl_restrict_ptr There were following issues with ccl_restrict_ptr: - We already had ccl_restrict for all platforms. - It was secretly adding `const` qualifier to the declaration, which is quite weird since non-const pointer can also be declared as restricted. - We never in Blender are using foo_ptr or FooPtr type definitions, so not sure why we should introduce such a thing here. - It is absolutely wrong from semantic point of view to put pointer into the restrict macro -- const is a part of type, not part of hint for compiler that some pointer is never aliased.	2017-05-19 12:41:03 +02:00
Lukas Stockner	740cd28748	Cycles Denoising: Add more robust outlier heuristic to avoid artifacts Extremely bright pixels in the rendered image cause the denoising algorithm to produce extremely noticable artifacts. Therefore, a heuristic is needed to exclude these pixels from the filtering process. The new approach calculates the 75% percentile of the 5x5 neighborhood of each pixel and flags the pixel if it is more than twice as bright. During the reconstruction process, flagged pixels are skipped. Therefore, they don't cause any problems for neighboring pixels, and the outlier pixels themselves are replaced by a prediction of their actual value based on their feature pass values and the neighboring pixels. Therefore, the denoiser now also works as a smarter despeckling filter that uses a more accurate prediction of the pixel instead of a simple average. This can be used even if denoising isn't wanted by setting the denoising radius to 1.	2017-05-18 21:55:56 +02:00
Mai Lavelle	966a2681f9	Cycles: Fix building with native only option Approach suggested by Lukas S.	2017-05-16 16:05:04 -04:00
Lukas Stockner	43b374e8c5	Cycles: Implement denoising option for reducing noise in the rendered image This commit contains the first part of the new Cycles denoising option, which filters the resulting image using information gathered during rendering to get rid of noise while preserving visual features as well as possible. To use the option, enable it in the render layer options. The default settings fit a wide range of scenes, but the user can tweak individual settings to control the tradeoff between a noise-free image, image details, and calculation time. Note that the denoiser may still change in the future and that some features are not implemented yet. The most important missing feature is animation denoising, which uses information from multiple frames at once to produce a flicker-free and smoother result. These features will be added in the future. Finally, thanks to all the people who supported this project: - Google (through the GSoC) and Theory Studios for sponsoring the development - The authors of the papers I used for implementing the denoiser (more details on them will be included in the technical docs) - The other Cycles devs for feedback on the code, especially Sergey for mentoring the GSoC project and Brecht for the code review! - And of course the users who helped with testing, reported bugs and things that could and/or should work better!	2017-05-07 14:40:58 +02:00
Hristo Gueorguiev	6bf4115c13	Cycles: Split kernel - sort shaders Reduce thread divergence in kernel_shader_eval. Rays are sorted in blocks of 2048 according to shader->id. On R9 290 Classroom is ~30% faster, and Pabellon Barcelone is ~8% faster. No sorting for CUDA split kernel. Reviewers: sergey, maiself Reviewed By: maiself Differential Revision: https://developer.blender.org/D2598	2017-05-03 15:30:45 +02:00
Mai Lavelle	915766f42d	Cycles: Branched path tracing for the split kernel This implements branched path tracing for the split kernel. General approach is to store the ray state at a branch point, trace the branched ray as normal, then restore the state as necessary before iterating to the next part of the path. A state machine is used to advance the indirect loop state, which avoids the need to add any new kernels. Each iteration the state machine recreates as much state as possible from the stored ray to keep overall storage down. Its kind of hard to keep all the different integration loops in sync, so this needs lots of testing to make sure everything is working correctly. We should probably start trying to deduplicate the integration loops more now. Nonbranched BMW is ~2% slower, while classroom is ~2% faster, other scenes could use more testing still. Reviewers: sergey, nirved Reviewed By: nirved Subscribers: Blendify, bliblubli Differential Revision: https://developer.blender.org/D2611	2017-05-02 14:26:46 -04:00
Sergey Sharybin	4245ed360e	Cycles: Cleanup, indentaiton and trailing whitespace and wrapping	2017-04-28 13:21:17 +02:00
Stefan Werner	ec25060a05	Unlimited number of textures for Cycles This patch allows for an unlimited number of textures in Cycles where the hardware allows. It replaces a number static arrays with dynamic arrays and changes the way the flat_slot indices are calculated. Eventually, I'd like to get to a point where there are only flat slots left and textures off all kinds are stored in a single array. Note that the arrays in DeviceScene are changed from containing device_vector<T> objects to device_vector<T>* pointers. Ideally, I'd like to store objects, but dynamic resizing of a std:vector in pre-C++11 calls the copy constructor, which for a good reason is not implemented for device_vector. Once we require C++11 for Cycles builds, we can implement a move constructor for device_vector and store objects again. The limits for CUDA Fermi hardware still apply. Reviewers: tod_baudais, InsigMathK, dingto, #cycles Reviewed By: dingto, #cycles Subscribers: dingto, smellslikedonkey Differential Revision: https://developer.blender.org/D2650	2017-04-27 09:35:22 +02:00
Sergey Sharybin	0579eaae1f	Cycles: Make all #include statements relative to cycles source directory The idea is to make include statements more explicit and obvious where the file is coming from, additionally reducing chance of wrong header being picked up. For example, it was not obvious whether bvh.h was refferring to builder or traversal, whenter node.h is a generic graph node or a shader node and cases like that. Surely this might look obvious for the active developers, but after some time of not touching the code it becomes less obvious where file is coming from. This was briefly mentioned in T50824 and seems @brecht is fine with such explicitness, but need to agree with all active developers before committing this. Please note that this patch is lacking changes related on GPU/OpenCL support. This will be solved if/when we all agree this is a good idea to move forward. Reviewers: brecht, lukasstockner97, maiself, nirved, dingto, juicyfruit, swerner Reviewed By: lukasstockner97, maiself, nirved, dingto Subscribers: brecht Differential Revision: https://developer.blender.org/D2586	2017-03-29 13:41:11 +02:00
Sergey Sharybin	1cad64900e	Cycles: Define ccl_local variables in kernel functions Declaring ccl_local in a device function is not supported by certain compilers.	2017-03-16 11:27:17 +01:00
Sergey Sharybin	1ff753baa4	Cycles: Workaround for compilation error caused by passing KernelGlobals Pass globals as a bare pointer, same as it sued to be prior to split kernel rework. AMD CPU platform and Intel OpenCL were complaining about this. Perhaps we shouldn't pass globals as pointer at all, this isn't something what is really portable and can cause issues on 32 bit perhaps.	2017-03-16 11:27:17 +01:00
Mai Lavelle	96868a3941	Fix T50888: Numeric overflow in split kernel state buffer size calculation Overflow led to the state buffer being too small and the split kernel to get stuck doing nothing forever.	2017-03-11 05:39:28 -05:00
Mai Lavelle	4a2cde3f0e	Cycles: Enable SSS and volumes for CUDA and Nvidia OpenCL split kernel	2017-03-10 02:09:41 -05:00
Hristo Gueorguiev	06c051363b	Cycles: split kernel_shadow_blocked to AO & DL parts Reduces memory allocation for split kernel. This allows for faster rendering due to bigger global size, specially when GPU memory is limited. Perfromance results: R9 290 total render time Before After Change BMW 4:37 4:34 -1.1 % Classroom 14:43 14:30 -1.5 % Fishy Cat 11:20 11:04 -2.4 % Koro 12:11 12:04 -1.0 % Pabellon Barcelona 22:01 20:44 -5.8 % Pabellon Barcelona() 15:32 15:09 -2.5 % () without glossy connected to volume	2017-03-09 17:09:37 +01:00
Hristo Gueorguiev	57e26627c4	Cycles: SSS and Volume rendering in split kernel Decoupled ray marching is not supported yet. Transparent shadows are always enabled for volume rendering. Changes in kernel/bvh and kernel/geom are from Sergey. This simiplifies code significantly, and prepares it for record-all transparent shadow function in split kernel.	2017-03-09 17:09:37 +01:00
Mai Lavelle	306034790f	Cycles: Calculate size of split state buffer kernel side By calculating the size of the state buffer in the kernel rather than the host less code is needed and the size actually reflects the requested features. Will also be a little faster in some cases because of larger global work size.	2017-03-08 01:31:30 -05:00
Mai Lavelle	cd7d5669d1	Cycles: Remove sum_all_radiance kernel This was only needed for the previous implementation of parallel samples. As we don't have that any more it can be removed. Real reason for removal tho is this: `per_sample_output_buffers` was being calculated too small and artifacts resulted. The tile buffer is already the correct size and calculating the size for `per_sample_output_buffers` is a bit difficult with the current layout of the code. As `per_sample_output_buffers` was only needed for `sum_all_radiance`, removing that kernel and writing output to the tile buffer directly fixes the artifacts.	2017-03-08 01:31:07 -05:00
Mai Lavelle	4cf501b835	Cycles: Split path initialization into own kernel This makes it easier to initialize things correctly in the data_init kernel before they are needed by path tracing.	2017-03-08 01:30:43 -05:00
Mai Lavelle	817873cc83	Cycles: CUDA implementation of split kernel	2017-03-08 01:24:53 -05:00
Mai Lavelle	0892352bfe	Cycles: CPU implementation of split kernel	2017-03-08 00:52:41 -05:00
Mai Lavelle	230c00d872	Cycles: OpenCL split kernel refactor This does a few things at once: - Refactors host side split kernel logic into a new device agnostic class `DeviceSplitKernel`. - Removes tile splitting, a new work pool implementation takes its place and allows as many threads as will fit in memory regardless of tile size, which can give performance gains. - Refactors split state buffers into one buffer, as well as reduces the number of arguments passed to kernels. Means there's less code to deal with overall. - Moves kernel logic out of OpenCL kernel files so they can later be used by other device types. - Replaced OpenCL specific APIs with new generic versions - Tiles can now be seen updating during rendering	2017-03-08 00:52:41 -05:00
Mai Lavelle	520b53364c	Cycles: Add OpenCL kernel for zeroing memory buffers Transferring memory to the device was very slow and there's really no need when only zeroing a buffer.	2017-03-08 00:52:41 -05:00
Sergey Sharybin	dde40989f3	Cycles: Store shadow intersections in the kernel globals Seems CUDA failed to de-duplicate the array across multiple inlined versions of the shadow_blocked(). Helped it a bit with that now. Gives about 100MB memory improvement on a scenes after previous commit and brings up memory "regression" to only 100MB comparing to the master branch now.	2017-02-08 14:00:48 +01:00
Sergey Sharybin	c54381488b	Cycles: Enable SSE math optimization for AVX kernels This gives about 5% speedup for AVX processors. Benefit of such optimization on other microarchitectures is still under investigation.	2016-10-25 16:10:47 +02:00
Hristo Gueorguiev	8905c5c874	Cycles: OpenCL 3d textures support. Note that volume rendering is not supported yet, this is a step towards that. Reviewed By: brecht Differential Revision: https://developer.blender.org/D2299	2016-10-22 23:49:29 +02:00
Brecht Van Lommel	21e65d7457	Fix build error with WITH_CYCLES_NATIVE_ONLY and recent AVX2 changes.	2016-10-12 17:35:03 +02:00
Sergey Sharybin	fa62a989b4	Cycles: Enable SSE options of math module for AVX2 kernels Currently this does not give measurable difference, but is required ground work for some upcoming further optimization of AVX2 kernels.	2016-10-12 12:54:31 +02:00
Thomas Dinges	5ac7ef873b	Cycles: Change code order for Image Data Types. Now we have the 4 component ones first (float4, byte4, half4) followed by the 1 component ones (float, byte, half). Makes code a bit more consistent and also reduces code a bit when enabling half support on GPU in next commit. This also exposed a typo in half CPU images for 3D textures, which wasn't used yet, but good to have that one fixed anyway.	2016-08-11 22:30:03 +02:00
Brecht Van Lommel	20ec6bc166	Fix Cycles kernel build without render passes support.	2016-07-18 22:40:08 +02:00
Sergey Sharybin	4355603790	Cycles: Move BVK kernel files to own directory BVH traversal is not really that much a geometry and we've got quite some traversals now. Makes sense to keep them separate in the name of source structure clarity.	2016-07-11 13:58:47 +02:00
Brecht Van Lommel	e26eb9c93b	Cycles: reduce CUDA stack memory access for Maxwell and up, increasing max registers. For non-branched path tracing with a GTX 960 and CUDA 7.5, this gives a small reduction in stack usage but mainly: 8% faster render on BMW, 5% on pabellon, 13% on classroom.	2016-06-19 20:17:26 +02:00
Thomas Dinges	6311a9ff23	Cycles: Support half and half4 textures. This is an initial commit for half texture support in Cycles. It adds the basic infrastructure inside of the ImageManager and support for these textures on CPU. Supported: * Half Float OpenEXR images (can be used for e.g HDRs or Normalmaps) now use 1/2 the memory, when loaded via disk (OIIO). ToDo: Various things like support for inbuilt half textures, GPU... will come later, step by step. Part of my GSoC 2016.	2016-06-19 17:31:16 +02:00
Thomas Dinges	2ee063868d	Cleanup: Shorten texture variables, tex and image was kinda redundant. Also make prefix consistent, so it starts with either TEX_NUM or TEX_START, followed by texture type and architecture.	2016-05-27 22:58:33 +02:00
Thomas Dinges	a5a05fc291	Cycles: Fix long compile time with MSVC. Compile time per kernel increased alot after recent image commits, re-shuffle some code to fix this. Patch by "LazyDodo". Differential Revision: https://developer.blender.org/D2012	2016-05-20 16:50:29 +02:00
Thomas Dinges	3c85e1ca1a	Cycles: Add support for single channel byte textures. This way, we also save 3/4th of memory for single channel byte textures (e.g. Bump Maps). Note: In order for this to work, the texture must have 1 channel only. In Gimp you can e.g. do that via the menu: Image -> Mode -> Grayscale	2016-05-12 14:51:42 +02:00
Thomas Dinges	4a4f043bc4	Cycles: Add support for single channel float textures on CPU. Until now, single channel textures were packed into a float4, wasting 3 floats per pixel. Memory usage of such textures is now reduced by 3/4. Voxel Attributes such as density, flame and heat benefit from this, but also Bumpmaps with one channel. This commit also includes some cleanup and code deduplication for image loading. Example Smoke render from Cosmos Laundromat: http://www.pasteall.org/pic/show.php?id=102972 Memory here went down from ~600MB to ~300MB. Reviewers: #cycles, brecht Differential Revision: https://developer.blender.org/D1981	2016-05-11 21:58:34 +02:00
Thomas Dinges	d6555d936c	Cleanup: Avoid duplicative defines for CPU textures, use the ones from util_texture.h Also includes some further byte -> byte4 renaming, missed that in last commit.	2016-05-09 09:16:41 +02:00
Thomas Dinges	9a1e11260c	Cleanup: More byte -> byte4 renaming for consistency.	2016-05-09 02:22:01 +02:00
Thomas Dinges	4422b3f919	Some fixes for CUDA runtime compile: * When Baking wasn't used we got an error. * On top of Volume Nodes (NODES_FEATURE_VOLUME), we now also check if we need volume sampling code, so we can disable that as well and save some further compilation time.	2016-05-06 23:13:33 +02:00
Thomas Dinges	3807bcb3a8	Cleanup: Rename texture slots to float4 and byte, to distinguish from future float (single channel) and half_float slots. Should be no functional changes, tested CPU and CUDA.	2016-05-06 14:37:35 +02:00
Sergey Sharybin	e4a265f058	Cycles: Add an option to build single kernel only which fits current CPU This seems quite useful for the development, so you don't need to wait all the kernels to be re-compiled when working on a new feature, which speeds up re-iteration. Marked as an advanced option, so if it doesn't work so well in practice it's safe to revert anyway.	2016-03-25 16:09:05 +01:00
Sergey Sharybin	700722f686	Cycles: Cleanup, indent nested preprocessor directives Quite straightforward, main trick is happening in path_source_replace_includes(). Reviewers: brecht, dingto, lukasstockner97, juicyfruit Differential Revision: https://developer.blender.org/D1794	2016-03-25 13:55:42 +01:00
Sergey Sharybin	87c8ff0164	Cycles: Fix compilation error of certain OpenCL split kernels	2016-02-28 16:53:38 +01:00
Brecht Van Lommel	0ccae52394	Fix OpenCL kernel build errors after recent 3D texture changes.	2016-02-17 01:38:55 +01:00

1 2

Download

What's New

Blender Studio

Manual

Developers Blog

Documentation

Benchmark

Blender Conference

Development Fund

One-time Donations

73 Commits