The issue was caused by a linear float buffer being created for every worker thread. This buffer duplicated the original buffer, which doubled the amount of memory required.

We cannot avoid this duplication, because OCIO needs to work on a float buffer and modifies that buffer in place.

The alternative for now is to not allocate a linear buffer for the whole chunk a thread has to handle, and instead to cut the chunk further inside the thread itself. Every thread now handles its chunk in blocks of 64 scanlines, so only one block's worth of linear float data is allocated per thread at a time.

This reduces memory overhead significantly, with no speed loss in my own tests.

Ideally, IMB_processor_apply_threaded should be switched to the generic task scheduler, so that the function generates tasks with a reasonable number of scanlines each. That requires much more testing to be sure there are no conflicts with object update and the like. Such a change to IMB_processor_apply_threaded would not be noticed by users, so it is not considered crucial to do right now.
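A minimal sketch of the block-wise processing described above, not the actual Blender code. The function names, the byte-image layout, and the halving "transform" (a stand-in for the in-place OCIO call) are all assumptions made for illustration; only the block height of 64 scanlines comes from this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_SCANLINES 64 /* block height named in the commit message */

/* Hypothetical stand-in for the color transform, which modifies the
 * float buffer in place (OCIO does this in the real code). */
static void transform_in_place(float *linear, size_t num_floats)
{
  for (size_t i = 0; i < num_floats; i++) {
    linear[i] *= 0.5f;
  }
}

/* Process scanlines [start_line, start_line + num_lines) of a byte image.
 * The float buffer covers at most BLOCK_SCANLINES lines instead of the
 * whole chunk. Returns the number of floats allocated, or 0 if nothing
 * was allocated (empty chunk or allocation failure). */
static size_t process_chunk_in_blocks(
    unsigned char *pixels, int width, int channels, int start_line, int num_lines)
{
  assert(width > 0 && channels > 0 && start_line >= 0 && num_lines >= 0);

  const size_t line_floats = (size_t)width * (size_t)channels;
  const int max_block = num_lines < BLOCK_SCANLINES ? num_lines : BLOCK_SCANLINES;
  if (max_block == 0) {
    return 0;
  }

  /* One allocation per thread, reused for every block of the chunk. */
  float *linear = malloc(line_floats * (size_t)max_block * sizeof(float));
  if (linear == NULL) {
    return 0;
  }

  for (int done = 0; done < num_lines; done += BLOCK_SCANLINES) {
    int lines = num_lines - done;
    if (lines > BLOCK_SCANLINES) {
      lines = BLOCK_SCANLINES;
    }
    const size_t n = line_floats * (size_t)lines;
    unsigned char *block = pixels + (size_t)(start_line + done) * line_floats;

    /* Byte -> float for this block only. */
    for (size_t i = 0; i < n; i++) {
      linear[i] = (float)block[i] / 255.0f;
    }
    transform_in_place(linear, n);
    /* Float -> byte, rounding to nearest. */
    for (size_t i = 0; i < n; i++) {
      block[i] = (unsigned char)(linear[i] * 255.0f + 0.5f);
    }
  }

  free(linear);
  return line_floats * (size_t)max_block;
}
```

With this layout the per-thread float allocation is bounded by 64 scanlines regardless of how many scanlines the thread's chunk contains, which is where the memory saving comes from.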