blender-archive

Archived

Author	SHA1	Message	Date
Lukas Stockner	fccf506ed7	Cycles: animation denoising support in the kernel. This is the internal implementation, not available from the API or interface yet. The algorithm takes into account past and future frames, both to get more coherent animation and reduce noise. Ref D3889.	2019-02-06 15:18:42 +01:00
Lukas Stockner	405cacd4cd	Cycles: prefilter feature passes separate from denoising. Prefiltering of feature passes will happen during rendering, which can then be used for denoising immediately or written as a render pass for later (animation) denoising. The number of denoising data passes written is reduced because of this, leaving out the feature variance passes. The passes are now Normal, Albedo, Depth, Shadowing, Variance and Intensity. Ref D3889.	2019-02-06 15:18:29 +01:00
Brecht Van Lommel	a8b8da5567	Fix T58183: crash with CPU + GPU rendering after profiling changes. Multi-device was not passing along profiler to the CPU.	2018-11-29 23:43:27 +01:00
Sergey Sharybin	203de0bbf0	Cycles: Cleanup, space after (void) It was used in like 95% of places.	2018-11-09 12:08:51 +01:00
Sergey Sharybin	2330cadb0f	Cycles: Cleanup, don't use strict C prototypes Those are more like a legacy of language, which is not needed in C++.	2018-11-09 12:04:41 +01:00
Sergey Sharybin	cb4b5e12ab	Cycles: Cleanup, spacing after preprocessor It is supposed to be two spaces before comment stating which if else/endif statements corresponds to. Was mainly violated in the header guards.	2018-11-09 11:34:54 +01:00
Sergey Sharybin	e0cc3e9809	Cycles: Fix wrong BVH used when disabling AVX2 in debug settings Mainly useful for debugging. Previously, when AVX2 was disabled in the debug panel but BVH layout was kept on BVH8 nothing was rendered. Needed to make it so supported BVH layout mask for devices is queried in "dynamic", so it is possible to use DebugFlags there.	2018-10-31 11:46:52 +01:00
Lukas Stockner	15e9d80375	Cycles: Use existing shared temporary memory in reconstruction step of the denoiser Previously the code allocated its own temporary memory, but it's possible to just use the existing shared one instead.	2018-10-08 22:13:40 +02:00
Sergey Sharybin	a5101e4da8	Cycles: Cleanup, double semicolon	2018-09-19 18:41:43 +02:00
Lukas Stockner	94efc651d4	Cycles Denoiser: Allocate a single temporary buffer for the entire denoising process With small tiles, the repeated allocations on GPUs can actually slow down the denoising quite a lot. Allocating the buffer just once reduces rendertime for the default cube with 16x16 tiles and denoising on a mobile 1050 from 22.7sec to 14.0sec.	2018-08-25 12:23:52 -07:00
Stefan Werner	4d00e95ee3	Cycles: Adding native support for UINT16 textures. Textures in 16 bit integer format are sometimes used for displacement, bump and normal maps and can be exported by tools like Substance Painter. Without this patch, Cycles would promote those textures to single precision floating point, causing them to take up twice as much memory as needed. Reviewers: #cycles, brecht, sergey Reviewed By: #cycles, brecht, sergey Subscribers: sergey, dingto, #cycles Tags: #cycles Differential Revision: https://developer.blender.org/D3523	2018-07-05 13:53:34 +02:00
Lukas Stockner	c960804747	Cycles Denoising: Pass tile buffers to every OpenCL kernel to conform to standard and get rid of set_tile_info	2018-07-04 14:38:03 +02:00
Lukas Stockner	9db8bdbc65	Cycles Denoising: Cleanup: Rename tiles to tile_info	2018-07-04 14:37:24 +02:00
Lukas Stockner	97a0d6fcc7	Cycles Denoising: Refactor denoiser tile handling This deduplicates the calls for tile (un)mapping and allows to have a target buffer that is different from the source buffer (needed for baking and animation denoising).	2018-07-04 14:36:01 +02:00
Lukas Stockner	b10c64bd2f	Cycles Denoising: Split main function into logical steps	2018-07-04 14:35:05 +02:00
Brecht Van Lommel	a283333cd8	Fix Cycles CUDA render errors with CUDA 9.2. Work around what might be a compiler bug.	2018-06-21 12:32:32 +02:00
Lukas Stockner	7bf4023689	Fix T55448: Typo in Cycles CUDA debug output Reviewers: sergey, lukasstockner97 Reviewed By: lukasstockner97 Tags: #cycles, #bf_blender Differential Revision: https://developer.blender.org/D3472	2018-06-12 10:45:32 +02:00
Lukas Stockner	16c05161e7	Cycles: Cleanup: Remove double semicolons	2018-04-29 09:28:41 +02:00
Brecht Van Lommel	fee4b646c4	Cycles: tweak CUDA messages and avoid build errors with existing sm_2x configs.	2018-02-18 00:53:25 +01:00
Brecht Van Lommel	1dcd7db73d	Code cleanup: remove some more unused code after recent CUDA changes.	2018-02-18 00:53:03 +01:00
Thomas Dinges	9e717c0495	Cycles: Remove Fermi texture code. This should be the last Fermi removal commit, unless I missed something. It's been a pleasure Fermi!	2018-02-17 22:56:58 +01:00
Thomas Dinges	2eaf90b305	Cycles: Remove Fermi support from CMake and update runtime checks in device_cuda.cpp. Fermi code in Cycles kernel and texture system are coming next.	2018-02-17 16:15:07 +01:00
Brecht Van Lommel	1dafe759ed	Update CUEW to latest version This brings separate initialization for libcuda and libnvrtc, which fixes Cycles nvrtc compilation not working on build machines without CUDA hardware available. Differential Revision: https://developer.blender.org/D3045	2018-02-07 11:53:01 +01:00
Ray molenkamp	a5052770b8	cycles: Add an nvrtc based cubin cli compiler. nvcc is very picky regarding compiler versions, severely limiting the compiler we can use, this commit adds a nvrtc based compiler that'll allow us to build the cubins even if the host compiler is unsupported. for details see D2913. Differential Revision: http://developer.blender.org/D2913	2018-02-03 10:59:09 -07:00
Sergey Sharybin	2f79d1c058	Cycles: Replace use_qbvh boolean flag with an enum-based property This way we can introduce other types of BVH, for example, wider ones, without causing too much mess around boolean flags. Thoughts: - Ideally device info should probably return bitflag of what BVH types it supports. It is possible to implement based on simple logic in device/ and mesh.cpp, rest of the changes will stay the same. - Not happy with workarounds in util_debug and duplicated enum in kernel. Maybe enum should be stored in kernel, but then it's kind of weird to include kernel types from utils. Sounds like some cyclic dependency. Reviewers: brecht, maxim_d33 Reviewed By: brecht Differential Revision: https://developer.blender.org/D3011	2018-01-22 17:19:20 +01:00
Brecht Van Lommel	d0892a6648	Fix issue with moving CUDA memory to host and multiple devices. This is not expected to fix all issues. Also adds some more details to error reporting to investigate failures.	2018-01-11 00:00:48 +01:00
Brecht Van Lommel	c621832d3d	Cycles: CUDA support for rendering scenes that don't fit on GPU. In that case it can now fall back to CPU memory, at the cost of reduced performance. For scenes that fit in GPU memory, this commit should not cause any noticeable slowdowns. We don't use all physical system RAM, since that can cause OS instability. We leave at least half of system RAM or 4GB to other software, whichever is smaller. For image textures in host memory, performance was maybe 20-30% slower in our tests (although this is highly hardware and scene dependent). Once other type of data doesn't fit on the GPU, performance can be e.g. 10x slower, and at that point it's probably better to just render on the CPU. Differential Revision: https://developer.blender.org/D2056	2018-01-02 23:50:18 +01:00
Brecht Van Lommel	6699454fb6	Cycles: make CUDA code a bit more robust to host/device alloc failures. Fixes a few corner cases found while stress testing host mapped memory.	2018-01-02 23:46:19 +01:00
Sergey Sharybin	5650fe77e4	Cycles: Cleanup, indentation	2017-12-20 17:42:50 +01:00
Lukas Stockner	fa3d50af95	Cycles: Improve denoising speed on GPUs with small tile sizes Previously, the NLM kernels would be launched once per offset with one thread per pixel. However, with the smaller tile sizes that are now feasible, there wasn't enough work to fully occupy GPUs which results in a significant slowdown. Therefore, the kernels are now launched in a single call that handles all offsets at once. This has two downsides: Memory accesses to accumulating buffers are now atomic, and more importantly, the temporary memory now has to be allocated for every shift at once, increasing the required memory. On the other hand, of course, the smaller tiles significantly reduce the size of the memory. The main bottleneck right now is the construction of the transformation - there is nothing to be parallelized there, one thread per pixel is the maximum. I tried to parallelize the SVD implementation by storing the matrix in shared memory and launching one block per pixel, but that wasn't really going anywhere. To make the new code somewhat readable, the handling of rectangular regions was cleaned up a bit and commented, it should be easier to understand what's going on now. Also, some variables have been renamed to make the difference between buffer width and stride more apparent, in addition to some general style cleanup.	2017-11-30 07:37:08 +01:00
Lukas Stockner	40f528a7da	Cycles: Add per-tile render time debug pass Reviewers: sergey, brecht Differential Revision: https://developer.blender.org/D2920	2017-11-17 16:40:24 +01:00
Brecht Van Lommel	e568c1a975	Fix T53289: CUDA missing textures not showing pink, after recent changes.	2017-11-12 20:45:47 +01:00
Brecht Van Lommel	bd4bea3e98	Cycles: avoid reallocating tile denoising memory many times during render.	2017-11-09 20:28:00 +01:00
Mai Lavelle	087331c495	Cycles: Replace __MAX_CLOSURE__ build option with runtime integrator variable Goal is to reduce OpenCL kernel recompilations. Currently viewport renders are still set to use 64 closures as this seems to be faster and we don't want to cause a performance regression there. Needs to be investigated. Reviewed By: brecht Differential Revision: https://developer.blender.org/D2775	2017-11-09 01:04:06 -05:00
Brecht Van Lommel	ff34e48911	Cycles: add an extra CUDA synchronize before rendering. It should not be needed as far as I know, but just in case it fixes any of the recent issues like T52572.	2017-11-07 22:35:12 +01:00
Brecht Van Lommel	5801ef71e4	Code refactor: device memory cleanups, preparing for mapped host memory.	2017-11-05 15:22:04 +01:00
Brecht Van Lommel	5475314f49	Cycles: reserve CUDA local memory ahead of time. This way we can log the amount of memory used, and it will be important for host mapped memory support.	2017-11-05 15:22:04 +01:00
Brecht Van Lommel	33b5e8daff	Code refactor: replace CUDA array with linear memory for 1D and 2D textures. This is a prerequisite for getting host memory allocation to work. There appears to be no support for 3D textures using host memory. The original version of this code was written by Stefan Werner for D2056.	2017-11-04 02:23:00 +01:00
Brecht Van Lommel	6ec599c682	Fix T53247: mixed CPU + GPU render wrong texture limits.	2017-11-03 20:32:29 +01:00
Brecht Van Lommel	070a668d04	Code refactor: move more memory allocation logic into device API. * Remove tex_* and pixels_* functions, replace by mem_. Add MEM_TEXTURE and MEM_PIXELS as memory types recognized by devices. * No longer create device_memory and call mem_* directly, always go through device_only_memory, device_vector and device_pixels.	2017-10-24 01:25:19 +02:00
Brecht Van Lommel	aa8b4c5d81	Code refactor: use device_only_memory and device_vector in more places.	2017-10-24 01:25:13 +02:00
Brecht Van Lommel	7ad9333fad	Code refactor: store device/interp/extension/type in each device_memory.	2017-10-24 01:03:59 +02:00
Brecht Van Lommel	57a0cb797d	Code refactor: avoid some unnecessary device memory copying.	2017-10-21 20:58:28 +02:00
Sergey Sharybin	910dd7fb1b	Cycles: Add extra logging in CUDA device detection code	2017-10-19 11:26:10 +02:00
Brecht Van Lommel	e360d003ea	Cycles: schedule more work for non-display and compute preemption CUDA cards. This change affects CUDA GPUs not connected to a display or connected to a display but supporting compute preemption so that the display does not freeze. I couldn't find an official list, but compute preemption seems to be only supported with GTX 1070+ and Linux (not GTX 1060- or Windows). This helps improve small tile rendering performance further if there are sufficient samples x number of pixels in a single tile to keep the GPU busy.	2017-10-08 21:12:16 +02:00
Brecht Van Lommel	cdb0b3b1dc	Code refactor: use DeviceInfo to enable QBVH and decoupled volume shading.	2017-10-08 13:17:33 +02:00
Brecht Van Lommel	23098cda99	Code refactor: make texture code more consistent between devices. * Use common TextureInfo struct for all devices, except CUDA fermi. * Move image sampling code to kernels/*/kernel_*_image.h files. * Use arrays for data textures on Fermi too, so device_vector<Struct> works.	2017-10-07 14:53:14 +02:00
Brecht Van Lommel	fb99ea79f8	Code refactor: split displace/background into separate kernels, remove luma.	2017-10-05 17:57:58 +02:00
Brecht Van Lommel	49199963bf	Fix incorrect CUDA remaining time estimate after previous commit.	2017-10-04 23:25:51 +02:00
Brecht Van Lommel	6da6f8d33f	Cycles: CUDA faster rendering of small tiles, using multiple samples like OpenCL. The work size is still very conservative, and this doesn't help for progressive refine. For that we will need to render multiple tiles at the same time. But this should already help for denoising renders that require too much memory with big tiles, and just generally soften the performance dropoff with small tiles. Differential Revision: https://developer.blender.org/D2856	2017-10-04 21:58:47 +02:00
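
The host-memory headroom rule stated in commit c621832d3d ("We leave at least half of system RAM or 4GB to other software, whichever is smaller") can be sketched as below. This is only an illustration of the arithmetic in the commit message, not the actual Cycles code: the function name `host_memory_limit` and its parameter are hypothetical, and 4GB is assumed to mean 4 GiB.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not taken from device_cuda.cpp): given the total
// physical system RAM in bytes, return how much of it a renderer could map
// as host memory while leaving headroom for other software.
inline uint64_t host_memory_limit(uint64_t system_ram_bytes)
{
    // Assumption: "4GB" in the commit message is read as 4 GiB.
    const uint64_t four_gb = 4ULL * 1024 * 1024 * 1024;

    // Reserve half of system RAM or 4 GiB, whichever is smaller.
    const uint64_t reserved = std::min(system_ram_bytes / 2, four_gb);

    return system_ram_bytes - reserved;
}
```

Under these assumptions, a machine with 16 GiB of RAM would reserve 4 GiB and allow up to 12 GiB of mapped host memory, while a machine with 6 GiB would reserve 3 GiB and allow 3 GiB.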

241 Commits