Vulkan: Low Precision Float Conversion #108168

Merged
Jeroen Bakker merged 3 commits from Jeroen-Bakker/blender:vulkan-low-precision-float-conversion into main 2023-06-07 07:50:12 +02:00
Member

This PR adds a conversion template to convert between low-precision
float formats. These include binary32 floats and smaller formats. It
also adds support for converting between unsigned and signed float
formats and between float formats with different mantissa and exponent
sizes.

Additionally, overflows (values that don't fit in the target float
format) are clamped to the maximum value.
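
As an illustration of what "the maximum value" means per format, here is a minimal sketch that derives the largest finite bit pattern from the sign, mantissa and exponent sizes. The `FloatFormat` descriptor, its member names and `max_finite_value` are hypothetical and only loosely modeled on the template parameters in this PR; they are not the actual Blender code. The 11-bit and 10-bit layouts (5 exponent bits with 6 or 5 mantissa bits, no sign) are the ones used by packed unsigned float texture formats.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical format descriptor; names are illustrative, not Blender's API.
template<bool HasSignBit, uint8_t MantissaBitLen, uint8_t ExponentBitLen>
struct FloatFormat {
  static constexpr bool has_sign = HasSignBit;
  static constexpr uint32_t mantissa_bits = MantissaBitLen;
  static constexpr uint32_t exponent_bits = ExponentBitLen;
  static constexpr uint32_t mantissa_mask = (1u << MantissaBitLen) - 1u;
  static constexpr uint32_t exponent_mask = (1u << ExponentBitLen) - 1u;
  static constexpr int32_t bias = (1 << (ExponentBitLen - 1)) - 1;
  // Largest finite value: exponent field one below all-ones, mantissa all ones.
  static constexpr uint32_t max_finite_bits = ((exponent_mask - 1u) << MantissaBitLen) |
                                              mantissa_mask;
};

using F32 = FloatFormat<true, 23, 8>;
using F16 = FloatFormat<true, 10, 5>;
using F11 = FloatFormat<false, 6, 5>;
using F10 = FloatFormat<false, 5, 5>;

static_assert(F32::max_finite_bits == 0x7F7FFFFFu, "binary32 maximum");
static_assert(F16::max_finite_bits == 0x7BFFu, "binary16 maximum");
static_assert(F11::max_finite_bits == 0x7BFu, "11-bit unsigned maximum");
static_assert(F10::max_finite_bits == 0x3DFu, "10-bit unsigned maximum");

// Decode the largest finite bit pattern of a format into a double.
template<typename Format> double max_finite_value()
{
  const double mantissa = 1.0 + double(Format::mantissa_mask) /
                                    double(1u << Format::mantissa_bits);
  const int exponent = int(Format::exponent_mask - 1u) - Format::bias;
  return std::ldexp(mantissa, exponent);
}

inline void example_usage()
{
  // A pixel brighter than this would be clamped to it when stored as F11.
  assert(max_finite_value<F11>() == 65024.0);
}
```

Under these assumptions, any value above 65024 would be stored as 65024 in the 11-bit format and any value above 64512 as 64512 in the 10-bit format.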

Reasoning:
Up to now the Vulkan backend only supported float and half float
formats, but to support Workbench, 11-bit and 10-bit unsigned floats
have to be supported as well. The available libraries that support
those float formats target scientific applications, and the resulting
code could not be optimized well by the compiler.

Data conversion for color pixels has different requirements for
clamping and sign, which could eliminate some clamping code in other
areas of Blender as well. It could also fix undesired overflow when
pixels with high intensity don't fit in the texture format, which
leads to artifacts in Eevee and slowdowns in the image editor.

Future:
In the future we might want to move this to the public part of the GPU
module so it can be used in other areas as well (Metal backend, ImBuf
clamping). See 3c658d2c2e for a commit that uses this and improves the
image editor massively, as it no longer needs to iterate over the image
buffer a second time to clamp the values into a known range.

Jeroen Bakker added 1 commit 2023-05-23 10:01:26 +02:00
418cb6f797 Vulkan: Low Precision Float Conversion
Jeroen Bakker requested review from Clément Foucault 2023-05-23 10:01:49 +02:00
Jeroen Bakker requested review from Bastien Montagne 2023-05-23 10:01:58 +02:00
Jeroen Bakker added the Interest: Vulkan label 2023-05-23 10:02:04 +02:00
Jeroen Bakker added this to the 4.0 milestone 2023-05-23 10:02:08 +02:00
Jeroen Bakker added this to the EEVEE & Viewport project 2023-05-23 10:02:12 +02:00
Jeroen Bakker self-assigned this 2023-05-23 10:02:22 +02:00
Author
Member

The code that clang generates includes branches, but since most of them are predictable and don't access different memory, this seems to be OK. Note that vectorization could be added later as an optimization. The idea would be to generate the shifts and masks between two float formats and use them inside the conversion routine.

```asm
convert_f32_to_f16(unsigned int):                # @convert_f32_to_f16(unsigned int)
        mov     ecx, edi
        and     ecx, 8388607
        mov     eax, edi
        shr     eax, 23
        movzx   edx, al
        mov     eax, edx
        or      eax, ecx
        je      .LBB0_1
        xor     esi, esi
        cmp     edx, 255
        mov     r8d, 1023
        mov     eax, 1023
        cmove   eax, esi
        test    ecx, ecx
        cmovne  eax, r8d
        mov     r8d, 31744
        cmp     edx, 255
        je      .LBB0_7
        mov     eax, 8388607
        cmp     edx, 142
        ja      .LBB0_6
        xor     eax, eax
        cmp     edx, 112
        jb      .LBB0_8
        add     edx, -127
        mov     esi, edx
        mov     eax, ecx
.LBB0_6:
        shr     eax, 13
        shl     esi, 10
        add     esi, 15360
        mov     r8d, esi
.LBB0_7:
        shr     edi, 16
        and     edi, 32768
        or      edi, eax
        or      edi, r8d
        mov     eax, edi
.LBB0_8:
        ret
.LBB0_1:
        xor     eax, eax
        ret
```

```asm
convert_f32_to_f11(unsigned int):                # @convert_f32_to_f11(unsigned int)
        test    edi, edi
        setns   sil
        mov     ecx, edi
        and     ecx, 8388607
        shr     edi, 23
        movzx   edx, dil
        cmp     edx, 255
        sete    dil
        test    ecx, ecx
        setne   r8b
        xor     eax, eax
        mov     r9d, edx
        or      r9d, ecx
        je      .LBB0_8
        and     r8b, dil
        or      sil, r8b
        je      .LBB0_8
        xor     edi, edi
        cmp     edx, 255
        mov     eax, 63
        mov     esi, 63
        cmove   esi, edi
        test    ecx, ecx
        cmovne  esi, eax
        mov     eax, 1984
        cmp     edx, 255
        je      .LBB0_7
        mov     esi, 8388607
        cmp     edx, 142
        ja      .LBB0_6
        xor     eax, eax
        cmp     edx, 112
        jb      .LBB0_8
        add     edx, -127
        mov     edi, edx
        mov     esi, ecx
.LBB0_6:
        shr     esi, 17
        shl     edi, 6
        add     edi, 960
        mov     eax, edi
.LBB0_7:
        or      eax, esi
.LBB0_8:
        ret
```
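
To make the "shifts and masks between two float formats" idea concrete, here is a rough sketch of a narrowing conversion in which every shift and mask is a compile-time constant derived from the two formats. It is not the code from this PR: the `Format` descriptor, `convert_bits` and `float_to_bits` are hypothetical names, and the policy choices are assumptions of the sketch (mantissa truncated rather than rounded, values too small for a normal number flushed to zero, negative values clamped to zero for unsigned targets, overflow clamped to the largest finite value, infinity and NaN passed through).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical format descriptor; names are illustrative, not Blender's API.
template<bool HasSignBit, uint8_t MantissaBitLen, uint8_t ExponentBitLen>
struct Format {
  static constexpr bool has_sign = HasSignBit;
  static constexpr uint32_t mantissa_bits = MantissaBitLen;
  static constexpr uint32_t exponent_bits = ExponentBitLen;
  static constexpr uint32_t mantissa_mask = (1u << MantissaBitLen) - 1u;
  static constexpr uint32_t exponent_mask = (1u << ExponentBitLen) - 1u;
  static constexpr int32_t bias = (1 << (ExponentBitLen - 1)) - 1;
};
using F32 = Format<true, 23, 8>;
using F16 = Format<true, 10, 5>;
using F11 = Format<false, 6, 5>;

// Narrowing conversion from Src to Dst, operating on raw bit patterns.
template<typename Src, typename Dst> uint32_t convert_bits(uint32_t src)
{
  static_assert(Src::mantissa_bits >= Dst::mantissa_bits, "sketch only narrows");
  static_assert(Src::exponent_bits >= Dst::exponent_bits, "sketch only narrows");
  // All of these are known at compile time for a given pair of formats.
  constexpr uint32_t mantissa_shift = Src::mantissa_bits - Dst::mantissa_bits;
  constexpr int32_t rebias = Dst::bias - Src::bias;
  constexpr uint32_t src_sign_shift = Src::mantissa_bits + Src::exponent_bits;
  constexpr uint32_t dst_sign_shift = Dst::mantissa_bits + Dst::exponent_bits;
  constexpr uint32_t dst_exponent_all_ones = Dst::exponent_mask << Dst::mantissa_bits;

  const uint32_t mantissa = src & Src::mantissa_mask;
  const uint32_t exponent = (src >> Src::mantissa_bits) & Src::exponent_mask;
  const bool negative = Src::has_sign && ((src >> src_sign_shift) & 1u) != 0;
  const bool is_nan = exponent == Src::exponent_mask && mantissa != 0;

  // Unsigned target: negative numbers (but not NaN) clamp to zero.
  if (negative && !Dst::has_sign && !is_nan) {
    return 0;
  }
  const uint32_t sign = (negative && Dst::has_sign) ? (1u << dst_sign_shift) : 0u;

  // Infinity and NaN keep an all-ones exponent in the target format.
  if (exponent == Src::exponent_mask) {
    return sign | dst_exponent_all_ones | (is_nan ? Dst::mantissa_mask : 0u);
  }
  const int32_t dst_exponent = int32_t(exponent) + rebias;
  if (dst_exponent >= int32_t(Dst::exponent_mask)) {
    // Overflow: clamp to the largest finite value of the target format.
    return sign | ((Dst::exponent_mask - 1u) << Dst::mantissa_bits) | Dst::mantissa_mask;
  }
  if (dst_exponent <= 0) {
    // Too small for a normal number: this sketch flushes to zero.
    return sign;
  }
  return sign | (uint32_t(dst_exponent) << Dst::mantissa_bits) | (mantissa >> mantissa_shift);
}

// Reinterpret a float as its binary32 bit pattern.
inline uint32_t float_to_bits(float value)
{
  uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}

inline void example_usage()
{
  // 1.0 has a biased exponent of 15 in the 11-bit format: 15 << 6.
  assert((convert_bits<F32, F11>(float_to_bits(1.0f)) == 0x3C0u));
  // A very bright pixel is clamped to the largest finite 11-bit value.
  assert((convert_bits<F32, F11>(float_to_bits(1.0e6f)) == 0x7BFu));
}
```

Because the shifts and masks are plain constants, the same routine could later be applied lane-wise with SIMD instructions, which is the direction the comment above hints at.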
Jeroen Bakker reviewed 2023-05-23 11:02:43 +02:00
@ -99,0 +106,4 @@
template<bool HasSignBit, uint8_t MantissaBitLen, uint8_t ExponentBitLen>
class FloatingPointFormat {
public:
static constexpr bool HasSign = HasSignBit;
Author
Member

Codestyle
Jeroen-Bakker marked this conversation as resolved
Jeroen Bakker added 1 commit 2023-05-23 14:36:44 +02:00
Jeroen Bakker added 1 commit 2023-06-01 13:40:22 +02:00
Clément Foucault approved these changes 2023-06-06 17:57:55 +02:00
Jeroen Bakker merged commit b7963d247c into main 2023-06-07 07:50:12 +02:00
Jeroen Bakker deleted branch vulkan-low-precision-float-conversion 2023-06-07 07:50:13 +02:00
Reference: blender/blender#108168