blender-archive

Archived

Author	SHA1	Message	Date
Sergey Sharybin	10a25b655a	Cycles: Add AVX2 path to subsurface triangle intersection Similar to regular triangle intersection case. Gives about 3% speedup rendering SSS object on my desktop. Question: how to avoid such a code duplication in a nice way without speed loss?	2016-10-24 16:56:41 +02:00
Sergey Sharybin	42aeb608e7	Cycles: Implement AVX2 version of triangle_intersect This commit basically vectorizes existing code using AVX2 instructions (without modifying algorithm itself). This gives quite nice speedups: BMW: -8% Classroom: -5% Cat: -5% Koro: +1% Barcelona: -8% That's on Linux machine, reported performance improvement on Windows goes up to 20%. Not currently sure why Koro is somewhat slower because it mainly uses curve intersection tests, could be a time noise? Or something with the cache utilization perhaps? In any case speedup in other scenes makes me think that current state is acceptable for initial implementation. This is again inspired by Maxym Dmytrychenko.	2016-10-12 14:11:55 +02:00
Campbell Barton	710ab5be36	Cleanup: spelling, style	2016-07-31 17:41:05 +10:00
Lukas Stockner	d9cc3ea2c6	Cycles: Fix rays parallel to the surface in the triangle refine and MultiGGX code In the triangle intersection refinement code, rays that are parallel to the triangle caused a divide by zero. These rays might initially hit the triangle due to the watertight intersection test, but are very rare - therefore, just skipping the refinement for them works fine. Also, a few remaining issues in the MultiGGX code are fixed that were caused by rays parallel to the surface (which happened more often there due to smooth shading).	2016-07-25 16:14:25 +02:00
Brecht Van Lommel	f23fecf306	Fix use of uninitialized variable in recent SSS fix.	2016-07-24 16:40:30 +02:00
Sergey Sharybin	9946cca146	Fix T48860: Cycles SSS artifacts with spatially split BVH The issue was caused by SSS intersection code gathering all intersections without check for duplicated ones. This caused situations when same intersection will be recorded twice in the case if triangle is shared by several BVH nodes. Usually this is handled by checking intersection distance after sorting intersections (in shadow_blocked for example) but for SSS we don't do such sorting and using number of intersections to calculate various things. Didn't find anything smarter than to check intersection distance in triangle_intersect_subsurface(). This solves render artifacts in the cost of 1.5% slowdown of extreme case rendering (SSS object filling in whole FullHD screen). Reviewers: brecht Reviewed By: brecht Differential Revision: https://developer.blender.org/D2105	2016-07-18 10:04:20 +02:00
Sergey Sharybin	17e7454263	Cycles: Reduce memory usage by de-duplicating triangle storage There are several internal changes for this: First idea is to make __tri_verts to behave similar to __tri_storage, meaning, __tri_verts array now contains all vertices of all triangles instead of just mesh vertices. This saves some lookup when reading triangle coordinates in functions like triangle_normal(). In order to make it efficient needed to store global triangle offset somewhere. So now __tri_vindex.w contains a global triangle index which can be used to read triangle vertices. Additionally, the order of vertices in that array is aligned with primitives from BVH. This is needed to keep cache as much coherent as possible for BVH traversal. This causes some extra tricks needed to fill the array in and deal with True Displacement but that trickery is fully required to prevent noticeable slowdown. Next idea was to use this __tri_verts instead of __tri_storage in intersection code. Unfortunately, this is quite tricky to do without noticeable speed loss. Mainly this loss is caused by extra lookup happening to access vertex coordinate. Fortunately, tricks here and there (i.e. some types changes to avoid casts which are not really coming for free) reduces those losses to an acceptable level. So now they are within couple of percent only. On a positive side we've achieved: - Few percent of memory save with triangle-only scenes. Actual save in this case is close to size of all vertices. On a more fine-subdivided scenes this benefit might become more obvious. - Huge memory save of hairy scenes. For example, on koro.blend there is about 20% memory save. Similar figure for bunny.blend. This memory save was the main goal of this commit to move forward with Hair BVH which required more memory per BVH node. So while this sounds exciting, this memory optimization will become invisible by upcoming Hair BVH work. But again on a positive side, we can add an option to NOT use Hair BVH and then we'll have same-ish render times as we've got currently but will have this 20% memory benefit on hairy scenes.	2016-07-07 17:25:48 +02:00
Sergey Sharybin	b595a692c8	Cycles: Limit degenerated triangle check to CUDA only OpenCL seems to work fine here, and for some reason that comparison was giving compilation error on OpenCL here. Better to compile OpenCL kernel than to be fully robust to weird corner cases.	2016-06-07 15:48:56 +02:00
Sergey Sharybin	f54a98a1c5	Cycles: Simplify check for degenerated faces on GPU Still not sure how to properly solve the issue, needs some trickery to get actual optimized values from intersection function (using printf() avoids some optimization and makes stuff render correct). For the time being let's just simplify check.	2016-06-03 10:36:04 +02:00
Sergey Sharybin	6cd13a221f	Cycles: Rename tri_woop to tri_storage It's no longer a pre-computed data and just a storage of triangle coordinates which are faster to access to.	2016-04-11 17:18:14 +02:00
Sergey Sharybin	700722f686	Cycles: Cleanup, indent nested preprocessor directives Quite straightforward, main trick is happening in path_source_replace_includes(). Reviewers: brecht, dingto, lukasstockner97, juicyfruit Differential Revision: https://developer.blender.org/D1794	2016-03-25 13:55:42 +01:00
Sergey Sharybin	c93069083e	Cycles: Tweaks for 32bit CUDA binaries Tweak some inline policies. Not totally crazy yet, and in fact we now have one less ifdef statement now.	2016-02-15 19:11:02 +01:00
Sergey Sharybin	72e31d6a72	Cycles: Always inline triangle precalc for CUDA devices Since the SSS changes compiling Experimental sm_52 kernel seems to work just fine.	2016-01-11 21:41:00 +05:00
Sergey Sharybin	a6bbf05ba6	Cycles: Fix wrong SSS intersection refinement when this option is disabled The code is disabled by default, but we'd better keep it all correct.	2015-12-02 03:14:54 +05:00
Sergey Sharybin	8bca34fe32	Cycles: Avoid having ShaderData on the stack This commit introduces a SSS-oriented intersection structure which is replacing old logic of having separate arrays for just intersections and shader data and encapsulates all the data needed for SSS evaluation. This gives a huge stack memory saving on GPU. In own experiments it gave 25% memory usage reduction on GTX560Ti (722MB vs. 946MB). Unfortunately, this gave some performance loss of 20% which only happens on GPU. This is perhaps due to different memory access pattern. Will be solved in the future, hopefully. Famous saying: won in memory - lost in time (which is also valid in other way around).	2015-11-25 13:01:22 +05:00
Sergey Sharybin	47b1279762	Cycles: Watertight fix for SSS intersection Same as previous commit, just was missing in there.	2015-10-22 22:10:40 +05:00
Sergey Sharybin	f84cbae43e	Cycles: Fix for watertight intersection It was possible to miss some intersection caused by wrong barycentric coordinates sign. Cases when one of the coordinate is zero and other are negative was not handled correct.	2015-10-22 22:07:28 +05:00
Sergey Sharybin	3cee28ebf3	Fix T46143: Faces missing with GPU render Epsilon was quite arbitrary for GPU, replaced with checking for zero-sized faces. It should solve both original report and the new one. After the release we can check why GPU doesn't produce accurate math here and go to the root of the issue.	2015-09-17 17:21:17 +05:00
Sergey Sharybin	1a04179802	Cycles: Cleanup, typo Spotted by Campbell, thanks!	2015-09-09 14:25:43 +05:00
Sergey Sharybin	d13a0e8f4a	Cycles: Limit triangle magnitude check for only GPU Found a way to make AVX2 CPUs happy by reshuffling instructions a bit, so now there's no weird precision errors happening in there. This solves some render speed regressions on CPU, but unfortunately this doesn't help for GPU rendering.	2015-09-09 13:39:36 +05:00
Sergey Sharybin	46d2abf78f	Cycles: Only use ascii in comments	2015-09-09 13:39:36 +05:00
Sergey Sharybin	1a7eca3c54	Fix T46034: OpenCL kernel compilation error in latest buildbot Simply expanded expression, so no float4->float3 conversion happens.	2015-09-07 15:02:44 +05:00
Sergey Sharybin	713ce037ab	Cycles: Fix wrong check for zero-sized triangles Initial idea was to optimize calculation a bit by skipping calculation of actual triangle edges and use vector from ray origin to triangles. In practice this optimization didn't quite work in cases when origin point is too close to the triangle. Let's do 2.76 with a bit more complicated calculation, still looking into exact reasons why watertight intersections fails in certain cases, but actual fix might not be ready so soon. This fixes wrong eyes on the lady from T46013.	2015-09-04 20:06:31 +05:00
Sergey Sharybin	7dc75ea8f4	Fix T45904: Cycles bug after recent triangle intersect changes Calculated cross product from wrong vectors by accident.	2015-08-25 18:32:11 +02:00
Sergey Sharybin	2fb639deed	Fix T45778: Objects scaled to 0 cause black artifacts with Static BVH The issue was caused by some numeric instability in triangle intersection which was visible on avx2 CPUs and GPUs (at least sm_20 here) but maybe some others too. Committing rather a workaround for now to be safe for the release, still need some investigation. From tests with grass field from Gooseberry project didn't see measurable slowdown.	2015-08-24 21:23:49 +02:00
Sergey Sharybin	a95b0e0e9d	Cycles: Another fix for OSX, sm_50 experimental actually also fails to compile Didn't notice it originally because compilation was threaded.	2015-06-20 19:40:23 +02:00
Sergey Sharybin	5e2835037a	Cycles: Tweak to previous commit, experimental sm_52 works on Linux but not OSX	2015-06-20 19:01:24 +02:00
Sergey Sharybin	34d665a4a4	Cycles: Un-inline triangle_intersect_precalc() on Apple OpenCL This gives quite the same problems as experimental CUDA kernels and for until it's found a root cause of the problem we'd just explicitly uninline the function.	2015-06-20 18:00:30 +02:00
Sergey Sharybin	845854959f	Cycles: Cleanup, make it more obvious which platform requires workaround for triangle intersection Should be no functional changes.	2015-06-20 17:01:21 +02:00
Sergey Sharybin	2ab909a88c	Cycles: Make experimental kernel build option more generic Previously it was explicitly mentioning it's NVidia kernel related option, but in fact it's also handy for the OpenCL kernel.	2015-05-15 13:22:47 +05:00
Sergey Sharybin	3d3d805b64	Cycles: Prepare code for OpenCL camera/motion blur The kernels are now compiling just fine, but there're some issues during rendering. This is still to be investigated.	2015-05-14 18:48:56 +05:00
Sergey Sharybin	bf11e362c5	Fix T44046: Cycles speed regression in 2.74 (CPU only) Issue was caused by MSVC not being able to optimize some code out in the same way as GCC/Clang does, so now that parts of code are explicitly unfolded in order to help compilers out. This makes speed loss much less drastic on my laptop. That's probably as good as we can do with MSVC without investing infinite amount of time looking trying to workaround the optimizer.	2015-04-08 18:47:25 +05:00
Sergey Sharybin	09a746b857	Cycles: Cleanup, typos	2015-04-08 01:15:38 +05:00
Sergey Sharybin	858f54f16e	Cycles: Cleanup, indentation	2015-04-07 22:41:08 +05:00
Sergey Sharybin	f1494edf78	Cycles: Make SSS intersection closer to regular triangle intersection	2015-04-01 21:20:04 +05:00
Sergey Sharybin	394b947a50	Cycles: Remove unused direction from triangle intersection functions This argument was unused and got nicely optimized out. But once it starts to be using registers are getting stressed really crazy, causing slow down of render.	2015-04-01 21:08:12 +05:00
Sergey Sharybin	5ff132182d	Cycles: Code cleanup, spaces around keywords This inconsistency drove me totally crazy, it's really confusing when it's inconsistent especially when you work on both Cycles and Blender sides. Shouldn't cause merge PITA, it's whitespace changes only, Git should be able to merge it nicely.	2015-03-28 00:15:15 +05:00
Sergey Sharybin	dce16d57dc	Revert "Fix T43865: Cycles: Watertight rendering produces artifacts on a huge plane" The fix was really flaky, in terms during speed benchmarks i had abort() in the fallback block to be sure it never runs in production scenes, but that affected on the optimization as well. Without this abort there's quite bad slowdown of 5-7% on the renders even tho the Plücker fallback was never run. This is all weird and for now reverting the change which affects on all the production scenes and will look into alternative fixes for the original issue with precision loss on huge planes. This reverts commit `9489205c5c`.	2015-03-12 18:24:53 +05:00
Sergey Sharybin	9489205c5c	Fix T43865: Cycles: Watertight rendering produces artifacts on a huge plane The issue was caused by numerical instability when having ray origin close to a huge triangle, which could have caused bad ray distance check. Watertight Woop intersection isn't really addressing such cases, it's dealing with small triangles far away from the ray origin instead, so it's a bit tricky to make it working reliably. While we're quite close to the release it's safer to do check in Plücker coordinates if ray close to a huge triangle. Likely this additional check combined with some other tweaks to the code doesn't cause measurable slowdown in the scenes tested here. After the release we can play a bit more with this code in order to make it more stable without Plücker fallback.	2015-03-05 18:55:30 +05:00
Sergey Sharybin	d544bc5cd5	Cycles: Fix embarrassing type remained after getting rid of utility SWAP()	2015-03-04 00:16:21 +05:00
Sergey Sharybin	edb7195f27	Cycles: Bring back distance check in re-intersection From more investigation of the numeric failures in the kernel it appears the check was rather correct. But in theory it's also needed for the motion triangles.	2015-02-10 19:07:55 +05:00
Sergey Sharybin	298d8681a0	Fix T43596: Refraction BSDF crashes blender on pre-sse4 CPU This is the same issue T43475: SSE4 code is more robust to non-finite values in the ray origin/direction. So for now added a check before doing BVH traversal for pre-SSE4 CPUs. For sure actual root of the issue is a bit different and much more tricky to solve, especially without disturbing render results too much. Still looking into this. In any case, it's kinda fine to have such a check, we might later make it to be a kernel_assert() instead of just a return.	2015-02-10 17:36:05 +05:00
Sergey Sharybin	b83d851901	Cycles: Another attempt to solve 32bit CUDA kernel Previous fix didn't quite work well. For some reason everything worked fine when using native nvcc in 32bit environment, but cross-compiling from 64bit platform it was still running out of memory. For now just made it so all the kernels are slower on 32bit CUDA as a temporary solution. Either it'll be solved in next CUDA releases (by dropped 32bit? =\) or we'll find better workaround.	2015-02-09 16:14:44 +05:00
Sergey Sharybin	da06dab4e5	Cycles: Use pre-aligned triangle vertex coordinates for subsurface intersection This gives small speedup (around 2% in quick tests) for ray scattering.	2015-02-04 14:49:19 +05:00
Sergey Sharybin	432e478f43	Cycles: Further tweaks to T43511 to solve compilation error on 32bit platforms	2015-02-02 22:09:02 +05:00
Sergey Sharybin	31263192bb	Fix T43511: Major slow down with many instanced objects in cycles GPU Slowdown was caused by watertight intersection commit and follow-up workaround for compiler crash which uninlined utility function which rotates the ray. Now it's only uninlined for sm_50 and sm_52 experimental kernels which are the only ones which failed to compile. Rendering still might be a bit slower but at least shouldn't be that dramatic.	2015-02-02 17:35:57 +05:00
Sergey Sharybin	3f5771475d	Cycles: Don't perform re-intersection if ray distance is zero It is possible that ray distance will be zero which would make intersection refinement return NaN as the refined position which would later lead to all sort of mathematical issues. Don't think there are ways to improve intersection accuracy for such rays so just return original intersection coordinate. This should fix T43475. TODO: Need to look into possible issues in Ashikhmin BSDF which might return zero-length reflected/transmitted ray?	2015-01-31 01:49:48 +05:00
Sergey Sharybin	2a8a56929b	Cycles: Fix unneeded int/float conversion happened in previous commit	2015-01-02 17:21:24 +05:00
Sergey Sharybin	4f2583ee13	Fix T43027: OpenCL kernel compilation broken after QBVH OpenCL apparently does not support templates, so the idea of generic function for swapping is a bit of a failure. Now it is either inlined into the code (in triangle intersection) or has specific implementation for QBVH. This is probably even better, because we can't create QBVH-specific function in util_math anyway.	2015-01-02 14:58:01 +05:00
Sergey Sharybin	fe06ec82a9	Cycles: Workaround CUDA 6.5.16 error after watertight commit This issue doesn't happen with 6.5.12 and there's slight piece of hope it'll be fixed in next toolkit releases.. For now we're forcing CUDA to not inline ray precalculation. This could lead to some speed regression, but wouldn't expect it to be huge -- this code does not run that often comparing to actual triangle intersection.	2014-12-25 14:15:37 +05:00

54 Commits