blender-archive

Archived

Author	SHA1	Message	Date
Campbell Barton	1143bf281a	Cleanup: spelling in comments, comment block formatting	2021-11-13 13:07:13 +11:00
Campbell Barton	acc800d24d	Cleanup: clang-format	2021-11-13 12:47:18 +11:00
Sergey Sharybin	ce395c84a3	Merge branch 'blender-v3.0-release'	2021-11-11 15:29:35 +01:00
Sergey Sharybin	d26d3cfe19	Fix T92868: Cycles catcher with transparency crashes The issue was caused by splitting happening twice. Fixed by checking for split flag which is assigned to the both states during split. The tricky part was to write catcher data at the moment of split: the transparency and shadow catcher sample count is to be accumulated at that point. Now it is happening in the `intersect_closest` kernel. The downside is that render buffer is to be passed to the kernel, but the benefit is that extra split bounce check is not needed now. Had to move the passes write to shadow catcher header, since include of `film/passes.h` causes all the fun of requirement to have BSDF data structures available. Differential Revision: https://developer.blender.org/D13177	2021-11-11 15:21:35 +01:00
Brecht Van Lommel	3fa86f4b28	Merge branch 'blender-v3.0-release'	2021-11-10 20:19:09 +01:00
Brecht Van Lommel	6b0008129e	Fix T92972: Cycles HIP wrong render display after a recent refactor It's unclear why this fails. Maybe the size of half4 is not the expected 8 bytes and adjacent pixels are overwritten. Or there is some bug in the HIP compiler writing a struct into global memory, which we probably don't do elsewhere in the kernel. Thanks to Thomas, William and Jeroen for helping investigate this.	2021-11-10 20:03:07 +01:00
Patrick Mours	f565620435	Fix T92985: CUDA errors with Cycles film convert kernels rB3a4c8f406a3a3bf0627477c6183a594fa707a6e2 changed the macros that create the film convert kernel entry points, but in the process accidentally changed the parameter definition to one of those (which caused CUDA launch and misaligned address errors) and changed the implementation as well. This restores the correct implementation from before. In addition, the `ccl_gpu_kernel_threads` macro did not work as intended and caused the generated launch bounds to end up with an incorrect input for the second parameter (it was set to "thread_num_registers", rather than the result of the block number calculation). I'm not entirely sure why, as the macro definition looked sound to me. Decided to simply go with two separate macros instead, to simplify and solve this. Also changed how state is captured with the `ccl_gpu_kernel_lambda` macro slightly, to avoid a compiler warning (expression has no effect) that otherwise occurred. Maniphest Tasks: T92985 Differential Revision: https://developer.blender.org/D13175	2021-11-10 15:49:50 +01:00
Michael Jones	3a4c8f406a	Cycles: Adapt shared kernel/device/gpu layer for MSL This patch adapts the shared kernel entrypoints so that they can be compiled as MSL (Metal Shading Language). Where possible, the adaptations avoid changes in common code. In MSL, kernel function inputs are explicitly bound to resources. In the case of argument buffers, we declare a struct containing the kernel arguments, accessible via device pointer. This differs from CUDA and HIP where kernel function arguments are declared as traditional C-style function parameters. This patch adapts the entrypoints declared in kernel.h so that they can be translated via a new `ccl_gpu_kernel_signature` macro into the required parameter struct + kernel entrypoint pairing for MSL. MSL buffer attribution must be applied to function parameters or non-static class data members. To allow universal access to the integrator state, kernel data, and texture fetch adapters, we wrap all of the shared kernel code in a `MetalKernelContext` class. This is achieved by bracketing the appropriate kernel headers with "context_begin.h" and "context_end.h" on Metal. When calling deeper into the kernel code, we must reference the context class (e.g. `context.integrator_init_from_camera`). This extra prefixing is performed by a set of defines in "context_end.h". These will require explicit maintenance if entrypoints change. We invite discussion on more maintainable ways to enforce correctness. Lambda expressions are not supported on MSL, so a new `ccl_gpu_kernel_lambda` macro generates an inline function object and optionally capturing any required state. This yields the same behaviour. This approach is applied to all parallel_... implementations which are templated by operation. The lambda expressions in the film_convert... kernels don't adapt cleanly to use function objects. However, these entrypoints can be macro-generated more concisely to avoid lambda expressions entirely, instead relying on constant folding to handle the pixel/channel conversions. A separate implementation of `gpu_parallel_active_index_array` is provided for Metal to workaround some subtle differences in SIMD width, and also to encapsulate some required thread parameters which must be declared as explicit entrypoint function parameters. Ref T92212 Reviewed By: brecht Maniphest Tasks: T92212 Differential Revision: https://developer.blender.org/D13109	2021-11-09 21:43:10 +00:00
Patrick Mours	440a3475b8	Cycles: Improve OptiX denoising with dark images and fix crash when denoiser is destroyed Adds a pass before denoising that calculates the intensity of the image, which can be passed into the OptiX denoiser for more optimal results for very dark or very bright images. In addition this also fixes a crash that sometimes occurred on exit. The OptiX denoiser object has to be destroyed before the OptiX device context object (since it references that). But in C++ the destructor function of a class is called before its fields are destructed, so "~OptiXDevice" was always called before "OptiXDevice::~Denoiser" and therefore "optixDeviceContextDestroy" was called before "optixDenoiserDestroy", hence the crash. Differential Revision: https://developer.blender.org/D13160	2021-11-09 14:49:00 +01:00
Brecht Van Lommel	97ff37bf54	Cycles: perform CPU film reading in the kernel, to use AVX2 half conversion Adds a bunch of CPU kernel function to process on row of pixels, and use those instead of calling unoptimized implementations. Fixes T92598	2021-11-05 22:04:36 +01:00
Brecht Van Lommel	fd25e883e2	Cycles: remove prefix from source code file names Remove prefix of filenames that is the same as the folder name. This used to help when #includes were using individual files, but now they are always relative to the cycles root directory and so the prefixes are redundant. For patches and branches, git merge and rebase should be able to detect the renames and move over code to the right file.	2021-10-26 15:37:04 +02:00
Brecht Van Lommel	d7d40745fa	Cycles: changes to source code folders structure * Split render/ into scene/ and session/. The scene/ folder now contains the scene and its nodes. The session/ folder contains the render session and associated data structures like drivers and render buffers. * Move top level kernel headers into new folders kernel/camera/, kernel/film/, kernel/light/, kernel/sample/, kernel/util/ * Move integrator related kernel headers into kernel/integrator/ * Move OSL shaders from kernel/shaders/ to kernel/osl/shaders/ For patches and branches, git merge and rebase should be able to detect the renames and move over code to the right file.	2021-10-26 15:36:39 +02:00
Brecht Van Lommel	282516e53e	Cleanup: refactor float/half conversions for clarity	2021-10-22 13:03:03 +02:00
Sayak Biswas	d092933abb	Cycles: various fixes for HIP and compilation of HIP binaries * Additional structs added to the hipew loader for device props * Adds hipRTC functions to the loader for future usage * Enables CPU+GPU usage for HIP * Cleanup to the adaptive kernel compilation process * Fix for kernel compilation failures with HIP with latest master Ref T92393, D12958	2021-10-22 12:15:29 +02:00
Brecht Van Lommel	df00463764	Cycles: add shadow path compaction for GPU rendering Similar to main path compaction that happens before adding work tiles, this compacts shadow paths before launching kernels that may add shadow paths. Only do it when more than 50% of space is wasted. It's not a clear win in all scenes, some are up to 1.5% slower. Likely caused by different order of scheduling kernels having an unpredictable performance impact. Still feels like compaction is just the right thing to avoid cases where a few shadow paths can hold up a lot of main paths. Differential Revision: https://developer.blender.org/D12944	2021-10-21 15:38:03 +02:00
Brecht Van Lommel	7d111f4ac2	Cleanup: remove unused code	2021-10-20 18:15:21 +02:00
Brecht Van Lommel	cccfa597ba	Cycles: make ambient occlusion pass take into account transparency again Taking advantage of the new decoupled main and shadow paths. For CPU we just store two nested structs in the integrator state, one for direct light shadows and one for AO. For the GPU we restrict the number of shade surface states to be executed based on available space in the shadow paths queue. This also helps improve performance in benchmark scenes with an AO pass, since it is no longer needed to use the shader raytracing kernel there, which has worse performance. Differential Revision: https://developer.blender.org/D12900	2021-10-20 17:50:31 +02:00
Brecht Van Lommel	fd77a28031	Cycles: bake transparent shadows for hair These transparent shadows can be expansive to evaluate. Especially on the GPU they can lead to poor occupancy when only some pixels require many kernel launches to trace and evaluate many layers of transparency. Baked transparency allows tracing a single ray in many cases by accumulating the throughput directly in the intersection program without recording hits or evaluating shaders. Transparency is baked at curve vertices and interpolated, for most shaders this will look practically the same as actual shader evaluation. Fixes T91428, performance regression with spring demo file due to transparent hair, and makes it render significantly faster than Blender 2.93. Differential Revision: https://developer.blender.org/D12880	2021-10-19 15:11:09 +02:00
Brecht Van Lommel	d06828f0b8	Cycles: avoid intermediate stack array for writing shadow intersections Helps save one OptiX payload and is a bit more efficient. Differential Revision: https://developer.blender.org/D12909	2021-10-19 15:10:55 +02:00
Brecht Van Lommel	943e73b07e	Cycles: decouple shadow paths from main path on GPU The motivation for this is twofold. It improves performance (5-10% on most benchmark scenes), and will help to bring back transparency support for the ambient occlusion pass. * Duplicate some members from the main path state in the shadow path state. * Add shadow paths incrementally to the array similar to what we do for the shadow catchers. * For the scheduling, allow running shade surface and shade volume kernels as long as there is enough space in the shadow paths array. If not, execute shadow kernels until it is empty. * Add IntegratorShadowState and ConstIntegratorShadowState typedefs that can be different between CPU and GPU. For GPU both main and shadow paths juse have an integer for SoA access. Bt with CPU it's a different pointer type so we get type safety checks in code shared between CPU and GPU. * For CPU, add a separate IntegratorShadowStateCPU struct embedded in IntegratorShadowState. * Update various functions to take the shadow state, and make SVM take either type of state using templates. Differential Revision: https://developer.blender.org/D12889	2021-10-19 15:09:29 +02:00
Brecht Van Lommel	3065d26097	Cycles: optimize volume stack copying for shadow catcher/compaction Only copy the number of items used instead of the max items. Ref D12889	2021-10-18 19:02:10 +02:00
Brecht Van Lommel	1df3b51988	Cycles: replace integrator state argument macros * Rename struct KernelGlobals to struct KernelGlobalsCPU * Add KernelGlobals, IntegratorState and ConstIntegratorState typedefs that every device can define in its own way. * Remove INTEGRATOR_STATE_ARGS and INTEGRATOR_STATE_PASS macros and replace with these new typedefs. * Add explicit state argument to INTEGRATOR_STATE and similar macros In preparation for decoupling main and shadow paths. Differential Revision: https://developer.blender.org/D12888	2021-10-18 19:02:10 +02:00
Brecht Van Lommel	5d565062ed	Cleanup: refactor OptiX shadow intersection for upcoming changes	2021-10-15 15:42:44 +02:00
Brecht Van Lommel	eb71157e2a	Cleanup: add utility functions for packing integers	2021-10-15 15:42:44 +02:00
Brecht Van Lommel	2ba7c3aa65	Cleanup: refactor to make number of channels for shader evaluation variable	2021-10-15 15:42:44 +02:00
Michael Jones (Apple)	a0f269f682	Cycles: Kernel address space changes for MSL This is the first of a sequence of changes to support compiling Cycles kernels as MSL (Metal Shading Language) in preparation for a Metal GPU device implementation. MSL requires that all pointer types be declared with explicit address space attributes (device, thread, etc...). There is already precedent for this with Cycles' address space macros (ccl_global, ccl_private, etc...), therefore the first step of MSL-enablement is to apply these consistently. Line-for-line this represents the largest change required to enable MSL. Applying this change first will simplify future patches as well as offering the emergent benefit of enhanced descriptiveness. The vast majority of deltas in this patch fall into one of two cases: - Ensuring ccl_private is specified for thread-local pointer types - Ensuring ccl_global is specified for device-wide pointer types Additionally, the ccl_addr_space qualifier can be removed. Prior to Cycles X, ccl_addr_space was used as a context-dependent address space qualifier, but now it is either redundant (e.g. in struct typedefs), or can be replaced by ccl_global in the case of pointer types. Associated function variants (e.g. lcg_step_float_addrspace) are also redundant. In cases where address space qualifiers are chained with "const", this patch places the address space qualifier first. The rationale for this is that the choice of address space is likely to have the greater impact on runtime performance and overall architecture. The final part of this patch is the addition of a metal/compat.h header. This is partially complete and will be extended in future patches, paving the way for the full Metal implementation. Ref T92212 Reviewed By: brecht Maniphest Tasks: T92212 Differential Revision: https://developer.blender.org/D12864	2021-10-14 16:14:43 +01:00
Brecht Van Lommel	04857cc8ef	Cycles: fully decouple triangle and curve primitive storage from BVH2 Previously the storage here was optimized to avoid indirections in BVH2 traversal. This helps improve performance a bit, but makes performance and memory usage of Embree and OptiX BVHs a bit worse also. It also adds code complexity in other parts of the code. Now decouple triangle and curve primitive storage from BVH2. * Reduced peak memory usage on all devices * Bit better performance for OptiX and Embree * Bit worse performance for CUDA * Simplified code: Intersection.prim/object now matches ShaderData.prim/object No more offset manipulation for mesh displacement before a BVH is built Remove primitive packing code and flags for Embree and OptiX Curve segments are now stored in a KernelCurve struct * Also happens to fix a bug in baking with incorrect prim/object Fixes T91968, T91770, T91902 Differential Revision: https://developer.blender.org/D12766	2021-10-06 17:52:04 +02:00
Sergey Sharybin	6e268a749f	Fix adaptive sampling artifacts on tile boundaries Implement an overscan support for tiles, so that adaptive sampling can rely on the pixels neighbourhood. Differential Revision: https://developer.blender.org/D12599	2021-10-05 16:19:14 +02:00
Campbell Barton	79290f5160	Cleanup: spelling in comments	2021-09-29 07:29:15 +10:00
Brian Savery	044a77352f	Cycles: add HIP device support for AMD GPUs NOTE: this feature is not ready for user testing, and not yet enabled in daily builds. It is being merged now for easier collaboration on development. HIP is a heterogenous compute interface allowing C++ code to be executed on GPUs similar to CUDA. It is intended to bring back AMD GPU rendering support on Windows and Linux. https://github.com/ROCm-Developer-Tools/HIP. As of the time of writing, it should compile and run on Linux with existing HIP compilers and driver runtimes. Publicly available compilers and drivers for Windows will come later. See task T91571 for more details on the current status and work remaining to be done. Credits: Sayak Biswas (AMD) Arya Rafii (AMD) Brian Savery (AMD) Differential Revision: https://developer.blender.org/D12578	2021-09-28 19:18:55 +02:00
Patrick Mours	2189dfd6e2	Cycles: Rework OptiX visibility flags handling Before the visibility test against the visibility flags was performed in an any-hit program in OptiX (called `__anyhit__kernel_optix_visibility_test`), which was using the `__prim_visibility` array. This is not entirely correct however, since `__prim_visibility` is filled with the merged visibility flags of all objects that reference that primitive, so if one object uses different visibility flags than another object, but they both are instances of the same geometry, they would appear the same way. The reason that the any-hit program was used rather than the OptiX instance visibility mask is that the latter is currently limited to 8 bits only, which is not sufficient to contain all Cycles visibility flags (12 bits). To mostly fix the problem with multiple instances and different visibility flags, I changed things to use the OptiX instance visibility mask for a subset of the Cycles visibility flags (`PATH_RAY_CAMERA` to `PATH_RAY_VOLUME_SCATTER`, which fit into 8 bits) and only fall back to the visibility test any-hit program if that isn't enough (e.g. the ray visibility mask exceeds 8 bits or when using the built-in curves from OptiX, since the any-hit program is then also used to skip the curve endcaps). This may also improve performance in some cases, since by default OptiX can now perform the normal scene intersection trace calls entirely on RT cores without having to jump back to the SM on every hit to execute the any-hit program. Fixes T89801 Differential Revision: https://developer.blender.org/D12604	2021-09-27 17:12:43 +02:00
Campbell Barton	b659d1a560	Cleanup: spelling in comments	2021-09-23 22:08:02 +10:00
Campbell Barton	4d66cbd140	Cleanup: spelling in comments	2021-09-22 14:54:01 +10:00
Brecht Van Lommel	0803119725	Cycles: merge of cycles-x branch, a major update to the renderer This includes much improved GPU rendering performance, viewport interactivity, new shadow catcher, revamped sampling settings, subsurface scattering anisotropy, new GPU volume sampling, improved PMJ sampling pattern, and more. Some features have also been removed or changed, breaking backwards compatibility. Including the removal of the OpenCL backend, for which alternatives are under development. Release notes and code docs: https://wiki.blender.org/wiki/Reference/Release_Notes/3.0/Cycles https://wiki.blender.org/wiki/Source/Render/Cycles Credits: * Sergey Sharybin * Brecht Van Lommel * Patrick Mours (OptiX backend) * Christophe Hery (subsurface scattering anisotropy) * William Leeson (PMJ sampling pattern) * Alaska (various fixes and tweaks) * Thomas Dinges (various fixes) For the full commit history, see the cycles-x branch. This squashes together all the changes since intermediate changes would often fail building or tests. Ref T87839, T87837, T87836 Fixes T90734, T89353, T80267, T80267, T77185, T69800	2021-09-21 14:55:54 +02:00

Download

What's New

Roadmap

Documentation

Blender Studio

Manual

Benchmark

Blender Conference

Development Fund

One-time Donations

34 Commits