Commit Graph

4909 Commits

Author SHA1 Message Date
cc03689962 Fix embarrassing typo... 2014-10-13 15:26:00 +02:00
858bf1adea Cycles: Add CUDA support for sm_32 (Tegra K1, Jetson TK1).
Fix T42174.
2014-10-12 18:17:00 +02:00
744aaa955f Cleanup: Typo fix for Blackbody variable, had different naming in the comments and also in OSL. 2014-10-12 14:18:30 +02:00
abd38c00f1 Cycles: set hit values in-order 2014-10-11 11:17:08 +02:00
815919b1fb fixed printf format warning that occurred with 64-bit targets 2014-10-10 23:21:44 -05:00
dd897de061 check for missing Windows error code headers (was missing from Mingw64) 2014-10-10 23:21:44 -05:00
0f1c3958da Fix typo breaking compilation with rather strict flags (does not like implicit double to float conversion). 2014-10-10 15:11:23 +02:00
5711025765 Cycles: Use a bit better approach for erfinv()
Also reduce number of branching and multiplications a bit by inlining the branches.

This gives an unmeasurable speedup, which is in case of BMW is about 2% here.
2014-10-10 13:40:09 +02:00
fd6537a53a OSX: as an prerequisite to make Dalai's upcoming "area_fullsceen" work,
make sure the window states are correct in the lion_fs animation phase.
This also assures the CTX_wm_window(C) is okay.
2014-10-10 12:58:52 +02:00
f2280661cb Enable atomic peak memory detection
This gives more precise information about memory usage which might be real handy
when doing memory optimization.

It works good here for as long as i can tell but if for some reason you'll be
experiencing some weird slowdown please let me know.
2014-10-10 01:55:57 +06:00
45ce901079 Cycles: Remove redundant float4->float3 conversion
Not as if it gives noticeable changes render-time, but it's just weird to
convert float4 to float 3 to just access individual x/y/z components.

Plus some compilers might be more stupid than GCC and don't optimize this
out well.
2014-10-09 11:48:47 +02:00
e1ef451996 Fix OpenGL error on cycles rendered viewport.
We queried the wrong value when looking for the bound 2D texture. This
is not totally robust because currently bound texture may not be a 2D
one, but this should work for now.
2014-10-08 12:19:06 +02:00
efee3be1d3 Cycles: enable double promotion warning /w gcc 2014-10-08 10:58:40 +02:00
47b8bf591c Fix more issues after recent context commit 2014-10-08 04:15:51 +06:00
be3a6d78e8 Cycles: reduce float/double conversions 2014-10-08 00:13:26 +02:00
e2522b4a29 Cycles: correct math wrappers
include the parens around value before cast,
in some cases was causing double/float promotion by only casting the left value.
2014-10-08 00:13:26 +02:00
8d084e8c8f Ghost Context Refactor
https://developer.blender.org/D643
Separates graphics context creation from window code in Ghost so that they can vary separately.
2014-10-07 15:47:32 -05:00
409b3c9c9c Fix T42106: Box image mapping shows black triangles if they point to a corner and blend is 0
After discussion with cambo here we decided it's better to choose arbitrary side of the box
(in this case it's X-axis) and use image from it. That's better than doing a blackness.

P.S. This is literally a corner case anyway.
2014-10-07 15:48:39 +02:00
4c60aae66c Cleanup: warnings 2014-10-06 23:19:07 +02:00
939fa6759c Cycles: Fix for camera-in-volume detection
Ray actually should have infinite length, so we can detect camera in a volume
which is bigger that the far clipping of the camera.

This might also give some speedup (wouldn't expect much tho) because we don't
need to re-calculate ray direction and length after every bounce now.
2014-10-06 12:36:46 +02:00
233de800e2 Cycles: Optimize of volume stack update when sampling SSS
basically we skip all non-volume objects now in the volume stack function.
Depending on the show it might give some percent of speedup.

Most of the speedup would be gained in the scenes when having SSS object
intersecting the volume and taking a reasonable amount of frame space.
2014-10-06 12:36:46 +02:00
b36eb51d37 Cycles: Fix for viewport rendering with debug enabled 2014-10-06 12:36:46 +02:00
cd6129d1ff Cycles: Workaround dead-slow expf() on 64bit linux
Single precision exponent on 64bit linux tends to be order of magnitude slower
than double precision version even with single<->double precision conversion.

Some feedback in the mailing lists also suggests that logf() is also slow, but
this i didn't confirm here in the studio yet.

Depending on the shader setup it gives ~3% with the secret agent shot and up to
around 15% with the bmw scene here.
2014-10-06 12:36:46 +02:00
1f1dcdfd76 Cycles: Move system headers include to the top of the files
This is a good practice to do anyway, plus it'll help with the upcoming change.
2014-10-06 12:36:46 +02:00
6feac1e940 Cycles: Center Tile order had a slight offset to the left.
Signed-off-by: Thomas Dinges
2014-10-05 18:35:49 +02:00
a1b27d6424 Fix T42081, OpenCL supports float3 since the 1.1 specification, not sure why we needed this. 2014-10-05 18:10:42 +02:00
d3a7f3fa29 Cycles: Forgot to set WITH_CYCLES_DEBUG for OSL kernel 2014-10-05 17:43:54 +02:00
e4b910a0aa Cycles: __KERNEL_DEBUG__ wasn't set for compile-time kernels 2014-10-05 21:42:53 +06:00
9241f12e10 OSX/Ghost: little code cleanup 2014-10-05 12:45:14 +02:00
68f2066602 Cycles: Make OpenCL folks happy to use __KERNEL_DEBUG__
Quite straightforward change, the only annoying thing is that we can't use
indentation for include directive just because of the way headers inlineing
works for OpenCL.

Might do smarter job in path_source_replace_includes() but don't want to
spend time on this yet.
2014-10-05 16:00:23 +06:00
0106b94f9d Cycles: Fix for debug kernel not working with CUDA 2014-10-05 15:31:48 +06:00
a613290775 Cycles / CUDA: Workaround to make sm_52 (Maxwell) cards work.
* sm_52 can run a sm_50 kernel, so tell runtime detection to use that until we build a dedicated sm_52 kernel.
2014-10-05 04:13:40 +02:00
dde740bcd7 Cycles / CUDA: Change inline rules for BVH intersection functions.
* On sm_30 and above there is no change (was not inlined already before), this just fixes a speed regression from yesterday. 6359c36ba4
* On sm_2x (tested with sm_21), I get a nice 8% speedup in the bmw scene with this. As a bonus, cubin compilation time and memory usage is significantly reduced. Regular cubin size went from 2.5MB to 2.0MB, Experimental one from 3.8MB to 2.5MB.
2014-10-05 03:53:51 +02:00
15969e8a30 Cycles: Fix wrong ifdef check around shadows record all 2014-10-04 16:21:05 +02:00
27d660ad20 Cycles: Add support for debug passes
Currently only summed number of traversal steps and intersections used by the
camera ray intersection pass is implemented, but in the future we will support
more debug passes which would help checking what things makes the scene slow.
Example of such extra passes could be number of bounces, time spent on the
shader tree evaluation and so.

Implementation from the Cycles side is pretty much straightforward, could only
mention here that it's a build-time option disabled by default.

From the blender side it's implemented as a PASS_DEBUG with several subtypes
possible. This way we don't need to create an extra DNA pass type for each of
the debug passes, saving us a bits.

Reviewers: campbellbarton

Reviewed By: campbellbarton

Differential Revision: https://developer.blender.org/D813
2014-10-04 19:00:26 +06:00
6359c36ba4 Cycles: Remove a workaround for Titan GPUs, not needed anymore with the latest CUDA compiler. 2014-10-04 01:29:08 +02:00
cdbac018a2 Cycles, some tweaks to scene_intersect_shadow_all()
* Function returns a bool, not an uint.
* Remove GPU ifdefs, this is CPU only due to malloc / qsort.
2014-10-03 20:41:38 +02:00
02ffed4052 Cleanup: Remove some unused / unreferenced functions for perdiodic perlin noise. 2014-10-03 18:00:45 +02:00
3aa65574f5 Cycles / OSL: Make the signed/unsigned Perlin parameter more self explaining. 2014-10-03 17:51:21 +02:00
dc1ca0c94f Cycles: Fix OpenCL compile after new Volume BVH introduction and add some comments. 2014-10-03 17:23:45 +02:00
5e10392e9f Cycles: Missing volume traversal header in cmake for GPU compilation. 2014-10-03 17:11:00 +02:00
4b2fadeaba Cycles: Remove Westin closure.
Was hooked up last year for testing purposes, as we already had some code for it, but the closure itself is not really good nor really useful, so let's remove it.
2014-10-03 16:03:49 +02:00
02f58ac623 Cleanup: Spelling. 2014-10-03 15:28:52 +02:00
1e4d99368b Cycles: Use more accurate implementation of erf() and erfinv()
This functions are  orders of magnitude more accurate than the old ones,
and they're around the same complexity to compute.
2014-10-03 18:28:44 +06:00
0fa7e4c853 Cycles: Decouple object flags update to a separate update step
This way there's much less cross-references between objects and meshes
device update functions.

The only thing remained s the object bounds calculation which is needed
by bvh update. This could also be decoupled, but it's not that crucial
yet because its's how it used to be for ages now.
2014-10-03 12:13:41 +02:00
502f6d538d Fix T41920: Changing Use Alpha settings doesn't refresh viewport properly 2014-10-03 11:27:05 +02:00
a654512356 Cycles: Implement preliminary test for volume stack update from SSS
This adds an AABB collision check for objects with volumes and if there's a
collision detected then the object will have SD_OBJECT_INTERSECTS_VOLUME flag.

This solves a speed regression introduced by the fix for T39823 by skipping
volume stack update in cases no volumes intersects the current SSS object.
2014-10-03 10:52:04 +02:00
b86f199a98 Cycles: Fix for non-initialized variable 2014-10-03 10:44:24 +02:00
527d049c5c Cycles: Make camera-in-volume an official feature
This means it's no longer needed to enable experimental feature set in order to
have proper camera in volume support. And this also means if there's something
wrong going on, or if there's speed regression for cases when camera is obviously
not in the volume -- this issues are to be reported and handled in the regular
matter.

Happy blending!
2014-10-03 12:55:31 +06:00
7dabfb2048 Cycles: Speedup of kernel side camera-in-volume detection
The idea is to only count intersections with objects which has volumetric shader
and ignore all other objects.

This is probably as fast as we can go without involving some forth level magic.
2014-10-03 12:55:31 +06:00