* malloc() is used now, which is supported since sm_20: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#dynamic-global-memory-allocation-and-operations The performance of this needs to be tested on various cards still. * This also works for Heterogeneous Decoupled Ray Marching, but in this case I get sporadic "Illegal Address" errors on my Geforce 540, therefore I did not remove the GPU check in kernel_volume_use_decoupled() yet. I would appreciate some tests from people who compile themselves, enable Volumetrics in kernel_types.h.