PBVH image texture painting technical design #96223

Open
opened 2022-03-07 18:04:13 +01:00 by Brecht Van Lommel · 14 comments

We want to support painting 16K textures on high resolution meshes, 3D rather than just projection painting, and more features matching sculpt mode. For this we are re-implementing texture painting similar to #73935 (Texture Displacement Sculpting design).

That task sets out a rough direction; this is a more detailed technical design.

Current Implementation

  • Before painting starts:
    • Divide the screen into buckets (tiles)
    • Loop over triangles and add them to the buckets they fit in
    • Detect if each triangle edge is a seam
  • For a paint operation, multithreaded over buckets (see the sketch after this list):
    • Init the bucket if not done yet, caching a list of pixels with world/image space coordinate, accumulated mask, color pointer, etc.
    • Rasterize each triangle in the bucket into pixels
    • Add additional pixels around the triangle for seam bleeding
    • Loop over cached pixels to perform the paint operation
  • Blur and clone:
    • To look up a color at an arbitrary position, intersect the mesh and read the color from the image at the corresponding UV coordinate
    • Intersection is accelerated by using the triangles cached in buckets and working in screen space
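
As a point of reference, here is a minimal C++ sketch of how this bucket-based paint loop fits together. All type and function names, and the linear falloff, are hypothetical simplifications for illustration, not Blender's actual code:

```cpp
#include <cmath>
#include <vector>

struct PixelSample {
  float world_co[3];  /* World-space position of the pixel. */
  float mask;         /* Accumulated brush mask. */
  float *color;       /* Pointer into the image buffer (RGBA). */
};

struct Bucket {
  bool initialized = false;
  std::vector<int> triangles;       /* Triangles overlapping this bucket. */
  std::vector<PixelSample> pixels;  /* Cached per-pixel data. */
};

/* Called per bucket, multithreaded over buckets. */
void paint_bucket(Bucket &bucket,
                  const float brush_co[3],
                  const float brush_color[3],
                  float radius,
                  float strength)
{
  if (!bucket.initialized) {
    /* Rasterize each triangle in the bucket into pixels here, adding
     * extra pixels around triangles for seam bleeding (omitted). */
    bucket.initialized = true;
  }
  for (PixelSample &pixel : bucket.pixels) {
    float dist_sq = 0.0f;
    for (int i = 0; i < 3; i++) {
      const float d = pixel.world_co[i] - brush_co[i];
      dist_sq += d * d;
    }
    if (dist_sq > radius * radius) {
      continue;  /* Outside the brush radius. */
    }
    /* Linear falloff for simplicity; real brushes use a falloff curve. */
    const float f = strength * pixel.mask * (1.0f - std::sqrt(dist_sq) / radius);
    for (int i = 0; i < 3; i++) {
      pixel.color[i] += f * (brush_color[i] - pixel.color[i]);
    }
  }
}
```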

Seam Bleeding
The current algorithm works by effectively extending the UV coordinates of a triangle and rasterizing additional pixels. The algorithm has some logic to avoid overlaps with other triangles; however, in general there are still some limitations:

  • UV islands extended this way may end up overlapping with other islands, and incorrectly bleed into them or not evenly fill in the space between islands.
  • Extending UV coordinates like this does not necessarily give the least discontinuities across seams, since what is baked across the seam may be different due to mapping distortion or different resolution.

GPU Texture Updates
Currently there is a mechanism for tagging tiles of an image as modified, and only uploading those to the GPU. However, some profiling revealed this to be a bottleneck, so potentially it is not fully working correctly.

With an incoherent UV map, for example a light map packing individual triangles, this system may also not help since many tiles might be affected even with a small brush radius.

Another performance issue may be the updating of mipmaps. In old Blender versions mipmap updates were disabled in texture paint mode, but it’s not clear this is still working.

Proposed Implementation

In order to support 3D brushes and integration with sculpt features, it seems logical to go with the design proposed in #73935 (Texture Displacement Sculpting design). The basic idea is to reuse the PBVH and iterators over it, so that you can iterate over pixels the same way you iterate over vertices.

One important consideration is memory usage. With a 16K texture, just storing 1 byte per pixel is already 256 MB. And multiple 16K textures might be painted on at the same time (either multiple channels or UDIMs). So any additional precomputed data, masks and other utility data structures should not be too large.
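
To make that budget concrete, a quick back-of-the-envelope calculation of what different per-pixel payloads cost for a single 16K (16384×16384) texture:

```cpp
#include <cstdio>

int main()
{
  const long long pixels = 16384LL * 16384LL;  /* One 16K texture: ~268 million pixels. */
  std::printf("1 byte/pixel:   %lld MB\n", pixels / (1024 * 1024));       /* 256 MB */
  std::printf("4 bytes/pixel:  %lld MB\n", pixels * 4 / (1024 * 1024));   /* 1 GB  */
  std::printf("16 bytes/pixel: %lld MB\n", pixels * 16 / (1024 * 1024));  /* 4 GB  */
  return 0;
}
```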

Basic Algorithm

  • Compute a PBVH over the mesh, organizing triangles into spatially coherent nodes
  • For brush operations, find overlapping nodes (see the sketch after this list):
    • If needed, lazily init an array of cached pixels by rasterizing the node’s triangles in image texture space, storing a pixel index + barycentric coordinate
    • Loop over cached pixels to perform paint operations
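
A minimal sketch of what such a per-node pixel cache could look like, with hypothetical type and field names (this is not Blender's actual PBVH API):

```cpp
#include <array>
#include <cstdint>
#include <vector>

struct NodePixel {
  uint16_t x, y;  /* Pixel index within the image. */
  float u, v;     /* Barycentric coordinate (the third weight is 1 - u - v). */
  int triangle;   /* Triangle the pixel belongs to. */
};

struct PixelNode {
  std::vector<int> triangles;     /* Triangles in this PBVH node. */
  std::vector<NodePixel> pixels;  /* Lazily initialized on first brush hit. */
};

/* Reconstruct a cached pixel's 3D position from its triangle's vertex
 * coordinates and the stored barycentric coordinate. */
std::array<float, 3> pixel_position(const NodePixel &p,
                                    const std::array<std::array<float, 3>, 3> &tri_co)
{
  const float w = 1.0f - p.u - p.v;
  std::array<float, 3> co;
  for (int i = 0; i < 3; i++) {
    co[i] = p.u * tri_co[0][i] + p.v * tri_co[1][i] + w * tri_co[2][i];
  }
  return co;
}
```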

In terms of what is actually cached, there are multiple options and the right performance/memory trade-off would have to be found empirically.

  • Don’t store any cached pixels, re-rasterize every time
  • Store only a pixel index, and compute barycentric and then 3D coordinates from that (32 bit)
  • Store pixel index and barycentric coordinate (64 bit, 2x 16 bit for pixel index and 2x 16 bit float for barycentric)
  • Store pixel index and 3D coordinate (128 bit)

There can also be some gains from storing pixels of the same triangle consecutively in the array, so that memory access is more coherent and certain per-triangle computations can be done once for all pixels in the triangle, as sketched below.
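
A sketch of the 64-bit option combined with per-triangle grouping. To keep the example dependency-free it encodes the barycentric as unsigned-normalized 16-bit integers rather than half floats; the memory layout is the same 8 bytes per pixel. All names are illustrative:

```cpp
#include <cstdint>
#include <vector>

struct PackedPixel {  /* 8 bytes total. */
  uint16_t x, y;      /* Pixel index within the image. */
  uint16_t u, v;      /* Barycentric coordinate, unorm16-encoded. */
};
static_assert(sizeof(PackedPixel) == 8, "64 bits per cached pixel");

/* Grouping pixels of the same triangle consecutively lets per-triangle
 * work (fetching vertex coordinates, precomputing deltas) happen once. */
struct TriangleRun {
  int triangle;      /* Triangle all pixels in this run belong to. */
  int start, count;  /* Range into the packed pixel array. */
};

struct NodePixelCache {
  std::vector<PackedPixel> pixels;
  std::vector<TriangleRun> runs;
};
```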

Seam Bleeding
A possible solution for better seam bleeding would be to work more in texture pixel space, similar to how baking works. The new texture margin algorithm in baking extends island pixels outward with a Dijkstra shortest-distance algorithm. Then, to fill in a margin pixel, it reads pixels from other islands across the seam. This addresses the two limitations of the existing algorithm mentioned above.

An interesting aspect of such a method is that it can run as a post-processing step, potentially after finishing a stroke or at a lower frequency, though ideally it would be fast enough to run every step. It can also be re-run with a larger margin as post-processing, or it could run on the GPU even if painting happens on the CPU.

However there is the question of how to make this efficient in both performance and memory usage. Some information could be precomputed per texture: for example, the image could be split into tiles, storing a map of which pixels to copy to which other pixels (sketched below). Tiles could be lazily initialized in some way, which may require building a 2D BVH of triangles in UV space or some other type of efficient culling.
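
A minimal sketch of applying such a precomputed copy map as a post-process, assuming the map is a flat list of (destination, source) pixel pairs; the real algorithm would build this list via the Dijkstra margin and might blend several source pixels instead of copying one:

```cpp
#include <cstdint>
#include <vector>

struct SeamCopy {
  uint32_t dst;  /* Flat index of the margin pixel to fill. */
  uint32_t src;  /* Flat index of the pixel across the seam to copy from. */
};

void apply_seam_bleed(std::vector<float> &rgba, /* 4 floats per pixel. */
                      const std::vector<SeamCopy> &copies)
{
  for (const SeamCopy &c : copies) {
    for (int i = 0; i < 4; i++) {
      rgba[c.dst * 4 + i] = rgba[c.src * 4 + i];
    }
  }
}
```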

One other potential issue is when islands are so small they don’t even cover any pixel. In such cases extending UVs would have at least written something, whereas this post-processing method might leave gaps.

GPU Texture Updates
The implementation should ensure only changed tiles are updated, and probably (temporarily) disable mipmaps for images while they are being painted on, or at least delay mipmap updates until the end of the stroke. These optimizations existed in 2.7x, but seem to be broken now and should be restored.

Two potential optimizations that could be done in addition to that:

  • When the number of changed pixels is a small subset of the entire texture, an array of changed pixel indices + values could be constructed and sent to the GPU. A compute shader could then be used to update the actual texture (see the sketch after this list).
  • Mipmaps could be partially recomputed somehow based on the changed tiles. Currently we use the native GPU implementation; we’d have to write a custom compute shader that can do this efficiently.
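
A sketch of the CPU side of the first optimization, with hypothetical types: gather only the changed pixels into a compact array that would then be uploaded as a storage buffer for a compute shader to scatter into the texture:

```cpp
#include <cstdint>
#include <vector>

struct ChangedPixel {
  uint32_t index;  /* Flat pixel index into the texture. */
  float rgba[4];   /* New pixel value. */
};

std::vector<ChangedPixel> gather_changed_pixels(const std::vector<float> &rgba,
                                                const std::vector<uint32_t> &dirty)
{
  std::vector<ChangedPixel> updates;
  updates.reserve(dirty.size());
  for (const uint32_t index : dirty) {
    ChangedPixel p;
    p.index = index;
    for (int i = 0; i < 4; i++) {
      p.rgba[i] = rgba[index * 4 + i];
    }
    updates.push_back(p);
  }
  /* 'updates' would be uploaded as a storage buffer; a compute shader
   * then writes each entry at (index % width, index / width). */
  return updates;
}
```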

Clone and Blur/Filter Tools
The best approach here is unclear, but some options are:

  • Perform ray intersection with the mesh to find neighboring points, accelerated with a BVH suitable for efficient ray-tracing. This is quite expensive.
  • Rasterize the mesh from the normal direction into an image, and then sample from that. Quality/performance depends on the image resolution, and some heuristic would need to be found to pick a good resolution.
  • Use mesh adjacency information to find neighboring pixels on the surface. Extend the UV coordinate, and if it stays within the same triangle just use that location. If not, jump to the triangle across the crossed edge, which may or may not be a seam (sketched after this list).
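
A sketch of the core test for the adjacency-walk option: check via barycentric signs whether an offset UV still lies inside the triangle, and otherwise report which edge was crossed so the walk can continue in the neighboring triangle. It assumes counter-clockwise UV winding, and all names are illustrative:

```cpp
#include <array>

using float2 = std::array<float, 2>;

/* Returns -1 if 'p' is inside triangle (a, b, c), else the index of the
 * crossed edge (the edge opposite the most negative barycentric weight). */
int crossed_edge(const float2 &p, const float2 &a, const float2 &b, const float2 &c)
{
  /* Twice the signed area of triangle (o, e, q). */
  auto edge_sign = [](const float2 &o, const float2 &e, const float2 &q) {
    return (e[0] - o[0]) * (q[1] - o[1]) - (e[1] - o[1]) * (q[0] - o[0]);
  };
  const float w0 = edge_sign(b, c, p);  /* Weight of vertex a. */
  const float w1 = edge_sign(c, a, p);  /* Weight of vertex b. */
  const float w2 = edge_sign(a, b, p);  /* Weight of vertex c. */
  if (w0 >= 0.0f && w1 >= 0.0f && w2 >= 0.0f) {
    return -1;  /* Inside: sample directly at this UV. */
  }
  /* Cross the edge opposite the most negative weight; if that edge is a
   * seam, the UV must first be remapped into the neighbor's island. */
  if (w0 <= w1 && w0 <= w2) return 0;  /* Edge b-c. */
  if (w1 <= w2) return 1;              /* Edge c-a. */
  return 2;                            /* Edge a-b. */
}
```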

For the clone tool, the normal that we are cloning from could be considered fixed. For blurring, the normal direction could also potentially be ignored entirely, always blurring across the surface shape or in 3D space. So there is potential here for caching neighboring pixel locations, though this is memory intensive.

GPU Implementation
Implementing algorithms on the GPU in compute shaders would be another potential speedup. This would significantly increase code complexity and in practical terms is more like a future project than something we expect to start with. However some notes regarding this:

  • Instead of storing a pixel list, rasterization could be redone every time. Note however that we can’t simply rasterize to a 16K framebuffer, both because hardware has limits here and because it may be inefficient. There may be other ways to do this in compute shaders that let us touch just the pixels we need.
  • For textured brushes we’d need texture nodes evaluation on the GPU, which could become possible if we unify texture nodes so brushes and Eevee share the same nodes.
  • Not needing to copy pixels to the GPU would avoid some overhead. We’d need to copy the pixels back from the GPU to the CPU at the end of a stroke.
  • PBVH already has GPU buffers per node with vertex coordinates, so these could be reused.

Sculpt Unification
This design naturally helps integration with sculpt tools, in that the same PBVH and even some tools can be reused.

For masking, an easy first step would be to reuse the masking tools from sculpt that work at the vertex level, and simply interpolate the mask for pixels (see the sketch below). However, support for masks at the image resolution (or a lower resolution) is likely also useful.
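
A minimal sketch of that first step, interpolating a per-vertex sculpt mask to a pixel using its cached barycentric coordinate:

```cpp
/* Interpolate the sculpt mask stored at the pixel's triangle corners
 * to the pixel itself, using its barycentric coordinate (u, v). */
float pixel_mask_from_vertices(const float vertex_mask[3], float u, float v)
{
  const float w = 1.0f - u - v;
  return u * vertex_mask[0] + v * vertex_mask[1] + w * vertex_mask[2];
}
```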

The PBVH iterators should be refactored so it becomes easier to add the pixel case, likely by porting to C++ rather than using a long macro.

Author
Owner

Changed status from 'Needs Triage' to: 'Confirmed'

Author
Owner

Added subscriber: @brecht

Author
Owner

Added subscribers: @JosephEagar, @Jeroen-Bakker

Member

Added subscriber: @lichtwerk

Author
Owner

There is overlap here with the planned system for layered textures and baking, and I'm wondering to what extent it makes sense to share data structures as users would likely be using both at the same time. For an interactive texture baking system, you could imagine doing this:

  • Generate an image texture with a barycentric coordinate + triangle index from rasterizing all triangles in the mesh (fits in a 32-bit texture; see the packing sketch after this list).
  • Generate a seam bleeding map based on this, where the same image texture stores which other pixel to copy from for seam pixels.
  • Then during texture evaluation, use this to look up mesh attributes like 3D position, normals, texture coordinates and vertex colors, as well as do seam bleeding at the end.
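
A sketch of one possible 32-bit packing for such a texture: 16 bits of triangle index plus two 8-bit quantized barycentric weights. The exact split (more triangles vs. more barycentric precision) is a trade-off and purely illustrative here; inputs are assumed to already be in [0, 1]:

```cpp
#include <cstdint>

uint32_t pack_tri_bary(uint16_t triangle, float u, float v)
{
  const uint32_t qu = (uint32_t)(u * 255.0f + 0.5f);  /* Quantize to 8 bits. */
  const uint32_t qv = (uint32_t)(v * 255.0f + 0.5f);
  return ((uint32_t)triangle << 16) | (qu << 8) | qv;
}

void unpack_tri_bary(uint32_t packed, uint16_t &triangle, float &u, float &v)
{
  triangle = (uint16_t)(packed >> 16);
  u = ((packed >> 8) & 0xFF) / 255.0f;
  v = (packed & 0xFF) / 255.0f;
}
```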

If texture painting used the same cached image texture, the PBVH nodes could then do one of these:

  • Store a list of pixel indices
  • Rasterize on demand
  • Compute a triangle bounding box in image space and iterate over its pixels to find matching triangle indices (sketched below)
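
A sketch of that last option, computing the triangle's bounding box in image space and iterating over the pixels inside it; the inside-test is left schematic:

```cpp
#include <algorithm>
#include <cmath>

/* Iterate over the image-space bounding box of a triangle's UVs and
 * test each pixel center against the triangle. Illustrative only. */
void for_triangle_pixels(const float uv[3][2], int width, int height)
{
  const float min_u = std::min({uv[0][0], uv[1][0], uv[2][0]});
  const float max_u = std::max({uv[0][0], uv[1][0], uv[2][0]});
  const float min_v = std::min({uv[0][1], uv[1][1], uv[2][1]});
  const float max_v = std::max({uv[0][1], uv[1][1], uv[2][1]});

  const int x0 = std::max(0, (int)std::floor(min_u * width));
  const int x1 = std::min(width - 1, (int)std::ceil(max_u * width));
  const int y0 = std::max(0, (int)std::floor(min_v * height));
  const int y1 = std::min(height - 1, (int)std::ceil(max_v * height));

  for (int y = y0; y <= y1; y++) {
    for (int x = x0; x <= x1; x++) {
      /* Test the pixel center ((x + 0.5) / width, (y + 0.5) / height)
       * against the triangle, e.g. with a barycentric sign test, and
       * record (pixel, triangle) on a hit. */
    }
  }
}
```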

There are memory/performance trade-offs here of course:

  • If the image texture is not densely packed or you're painting on a model that maps to only part of an image, there is memory overhead storing data for pixels that don't affect anything.
  • If it's all stored in PBVH nodes the triangle index can be left out (though computing the seam bleeding map may need it temporarily).
  • For realtime baking, it is more efficient to store positions, normals, texture coordinates in their own texture, though memory usage will be high. A hybrid system is also possible.
  • Creating such a barycentric coordinate + triangle index texture is fairly straightforward to GPU accelerate, though copying back to the CPU is not free.

I think this is worth considering, though I do worry about memory usage. Even without this, every 32-bit 16K texture takes up 1 GB. Imagine you have multiple texture channels, UDIMs, intermediate buffers for baking... It’s easy to exceed GPU memory really quickly.

Member

We should also consider painting on simple meshes. When painting on a cube with a high-res texture, there might be a small number of PBVHNodes with a large collection of pixels per node, which makes the current design not usable. Perhaps we should consider splitting up leaves when there is an imbalance between surface faces and pixels.

Member

Note that mipmap generation is still being disabled during painting of a stroke. For lightmaps, GPU updates could be more effective when done via a compute shader. Updates of ‘regular’ maps are still working as expected, but can be improved.

Contributor

Added subscriber: @Raimund58

Author
Owner

In #96223#1318817, @Jeroen-Bakker wrote:
We should also consider painting on simple meshes. When painting on a cube with a high-res texture, there might be a small number of PBVHNodes with a large collection of pixels per node, which makes the current design not usable. Perhaps we should consider splitting up leaves when there is an imbalance between surface faces and pixels.

An alternative to splitting the actual PBVH nodes would be to modify multithreaded scheduling over pixels, so PBVH nodes with many pixels are split into multiple tasks.

A material may use multiple images at different resolutions, and we want to be able to switch between them quickly or even paint on multiple at once. For this it may help to not tie the PBVH tree layout to any particular image resolution.

Member

In #96223#1319400, @brecht wrote:

In #96223#1318817, @Jeroen-Bakker wrote:
We should also consider painting on simple meshes. When painting on a cube with a high-res texture, there might be a small number of PBVHNodes with a large collection of pixels per node, which makes the current design not usable. Perhaps we should consider splitting up leaves when there is an imbalance between surface faces and pixels.

An alternative to splitting the actual PBVH nodes would be to modify multithreaded scheduling over pixels, so PBVH nodes with many pixels are split into multiple tasks.

A material may use multiple images at different resolutions, and we want to be able to switch between them quickly or even paint on multiple at once. For this it may help to not tie the PBVH tree layout to any particular image resolution.

I had a fair amount of success in my experiments with splitting PBVH nodes. It does mean having two different types of leaves; nodes flagged with PBVH_Leaf store triangles, and then I added a PBVH_TexLeaf that stores texels. PBVH_TexLeafs always descend from PBVH_Leaf nodes of course.


Added subscriber: @pauanyu_blender


Added subscriber: @Sergi-Alberca-Santamaria


Added subscriber: @GeorgiaPacific


Added subscriber: @Emi_Martinez

Julien Kaspar added this to the Sculpt, Paint & Texture project 2023-02-08 10:20:48 +01:00
Philipp Oeser removed the Interest: Sculpt, Paint & Texture label 2023-02-10 09:11:31 +01:00