WIP: UI: Bidirectional Text and Complex Shaping #104662

Draft
Harley Acheson wants to merge 6 commits from Harley/blender:ComplexShaping into main


Adds an Experimental feature to support the output of UI text for
scripts that are right-to-left and require complex shaping.


This is progress toward supporting RTL languages and complex text shaping, as needed for Arabic, Aramaic, Azeri, Dhivehi/Maldivian, Hebrew, Kurdish (Sorani), Persian/Farsi, Urdu, Thai, and others.

Specifically, this means allowing the entry, editing, and display of these languages in the UI: text edit boxes, labels, hints, and so on. This does not attempt to add such support to the Text Editor, nor to Text Objects or the String to Curve node. Text Objects and Curves would be a logical next step, but the Text Editor is currently not even considered (mostly because of how syntax highlighting is done).

Current status: It is demonstrable, looks cool, and doesn't crash. Mostly works.

Once compiled, you need to enable it by going to Edit / Preferences / Interface, turning on "Developer Extras", then enabling "Complex Text Shaping" in the "Experimental" section. Once enabled it will use the complex shaping code for ALL text. The code includes a means to do so only for runs of text that require it, but for testing everything goes through it. This means that regular Latin text will track slightly differently, as this properly handles GPOS kerning.

complex2.png

The impact on performance is minimal. Using this code for all text in the default scene takes 11.7 milliseconds, versus 9.6 milliseconds using the current code.

It handles text caret insertion properly. This is complex because of bidirectionality: you are selecting something in visual order but need an answer in logical order (and these might be of differing lengths). It is made more complex because the length of a subset of a complex string is not related to the length of the full string, as glyphs change as the content changes.
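To make the visual-versus-logical problem concrete, here is a minimal sketch of a caret hit test. It is not the code in this patch: `PlacedGlyph` and `caret_from_x` are hypothetical names, and it assumes exactly one glyph per character (no ligatures or reordering within a cluster), which real shaped text does not guarantee.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical glyph record, listed in visual (left to right) order.
struct PlacedGlyph {
  float advance;        // Horizontal advance in pixels.
  std::size_t logical;  // Index of the source character in logical order.
  bool rtl;             // True if the glyph belongs to a right-to-left run.
};

// Return the logical caret index for a horizontal click position `x`.
// Assumes one glyph per character, which is a simplification.
std::size_t caret_from_x(const std::vector<PlacedGlyph> &glyphs, float x)
{
  if (glyphs.empty()) {
    return 0;
  }
  float pen = 0.0f;
  for (const PlacedGlyph &glyph : glyphs) {
    if (x < pen + glyph.advance) {
      const bool left_half = x < pen + glyph.advance * 0.5f;
      // The left edge is "before" an LTR character but "after" an RTL one.
      const bool before = (left_half != glyph.rtl);
      return before ? glyph.logical : glyph.logical + 1;
    }
    pen += glyph.advance;
  }
  // Click past the right edge of the last glyph in visual order.
  const PlacedGlyph &last = glyphs.back();
  return last.rtl ? last.logical : last.logical + 1;
}
```

Note that a click on the left half of the leftmost glyph of an RTL run returns the logically highest caret index of that run, which is why visual position alone cannot be used as the insertion offset.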

It handles text selection somewhat adequately. Selection with mixed directions gets complicated, but the current code does so almost correctly and allows editing and deletion properly. The following might look odd, but is exactly how it should work for mixed-direction selection.

complex4.gif

Implementation Details

Our regular text functions handle each character one-by-one as the string is traversed. With complex shaping, however, we need to process chunks of text at a time since this requires greater context. For RTL languages, for example, we need at least entire words at a time in order to select the correct glyphs for each position. Ligatures need to know of multiple characters in order to do substitutions. Even kerning using the GPOS table uses rules that require more context.

In a nutshell, this digests an entire string at a time, processing separate segments that have the same language and direction. Strings can contain an unlimited number of segments of differing language or direction. So you can combine English, Arabic, Hebrew, and English again. In that example the string will be processed using specific fonts for Arabic and Hebrew from the font stack. This code selects the best font based on the font's language coverage.
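As a rough illustration of what splitting into segments means, the sketch below divides a string into runs by script. It is deliberately crude and is not how this patch does it: `script_of`, `Run`, and `split_runs` are hypothetical names, only the basic Hebrew and Arabic Unicode blocks are recognized, and spaces or punctuation are simply attached to the preceding run rather than resolved by the Unicode bidirectional algorithm.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class Script { Common, Latin, Hebrew, Arabic };

// Very rough classification by Unicode block (an assumption for this sketch).
Script script_of(char32_t c)
{
  if (c >= 0x0590 && c <= 0x05FF) {
    return Script::Hebrew;
  }
  if (c >= 0x0600 && c <= 0x06FF) {
    return Script::Arabic;
  }
  if ((c >= U'A' && c <= U'Z') || (c >= U'a' && c <= U'z')) {
    return Script::Latin;
  }
  return Script::Common;
}

struct Run {
  std::size_t start;   // First character of the run, in logical order.
  std::size_t length;  // Number of characters in the run.
  Script script;
};

// Split text into runs of one script. Common characters (spaces, digits,
// punctuation) join the run that precedes them.
std::vector<Run> split_runs(const std::u32string &text)
{
  std::vector<Run> runs;
  for (std::size_t i = 0; i < text.size(); i++) {
    const Script script = script_of(text[i]);
    if (!runs.empty() && (script == Script::Common || script == runs.back().script)) {
      runs.back().length++;
      continue;
    }
    if (!runs.empty() && runs.back().script == Script::Common) {
      // A leading run of Common characters adopts the first real script.
      runs.back().script = script;
      runs.back().length++;
      continue;
    }
    runs.push_back({i, 1, script});
  }
  return runs;
}
```

For a string that combines English, Arabic, Hebrew, and English again, this yields four runs, and each run could then be handed to a font that covers its script.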

A string is consumed at once in the constructor of a ShapingData object:

ShapingData text(str, str_len);

At this point we have versions of the string in u32string in the order we received it (logical order), a copy transformed by FriBiDi into visual order, along with arrays that index between the two.
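The relationship between the two orders and their index arrays can be sketched as follows. This is not the actual `ShapingData` layout: `BidiMaps` and `build_maps` are hypothetical, and the visual-to-logical map is passed in by hand here instead of being produced by FriBiDi.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container, only meant to show how the pieces relate.
struct BidiMaps {
  std::u32string visual;                       // Characters in display order.
  std::vector<std::size_t> visual_to_logical;  // Visual index to logical index.
  std::vector<std::size_t> logical_to_visual;  // Logical index to visual index.
};

// Build the visual string and the inverse map from a visual-to-logical map.
// In real code the map would come from a bidi library; here it is an input.
BidiMaps build_maps(const std::u32string &logical,
                    const std::vector<std::size_t> &visual_to_logical)
{
  assert(visual_to_logical.size() == logical.size());
  BidiMaps maps;
  maps.visual_to_logical = visual_to_logical;
  maps.logical_to_visual.resize(logical.size());
  maps.visual.resize(logical.size());
  for (std::size_t v = 0; v < visual_to_logical.size(); v++) {
    const std::size_t l = visual_to_logical[v];
    maps.visual[v] = logical[l];
    maps.logical_to_visual[l] = v;
  }
  return maps;
}
```

With a logical string `abXYZ`, where `XYZ` stands in for a right-to-left run, the map `{0, 1, 4, 3, 2}` gives the visual string `abZYX`, and either array can then translate an index from one order to the other.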

We need to process each segment separately, so this is done like this:

  while (text.process(font, gc)) {
    // do stuff with text.segment
  }

The above runs each segment through HarfBuzz to transform it into its final form with appropriate glyphs, spacing, and positioning. The segment member has a glyph_count (which may differ from the character count), and other information about this portion of the string, including the language. Most importantly it has a Vector of glyphs, a Vector of bounds, and other position and size information. Note that in the above, the font and gc arguments are just the defaults and will change as needed.

Therefore outputting the text only takes a simple loop to place each glyph. Measuring a string is just summing the bounds extents. "For each" processing works as it does now, but keep in mind that the callbacks are called in visual order, so don't assume that the offset will always be ascending.
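A minimal sketch of what "place each glyph" and "sum the extents" look like, assuming a hypothetical `ShapedSegment` that stores one horizontal advance per glyph. The real segment stores bounds and other data, so the field names and the use of plain advances here are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one shaped segment; field names are assumptions.
struct ShapedSegment {
  std::vector<unsigned int> glyph_ids;  // Glyph ids in visual order.
  std::vector<float> advances;          // Horizontal advance of each glyph.
};

// Width of a string: the sum of all glyph advances in all segments.
float measure_width(const std::vector<ShapedSegment> &segments)
{
  float width = 0.0f;
  for (const ShapedSegment &segment : segments) {
    for (const float advance : segment.advances) {
      width += advance;
    }
  }
  return width;
}

// Pen x position of every glyph, left to right, starting at `x`.
// A renderer would draw glyph i of the flattened list at positions[i].
std::vector<float> glyph_positions(const std::vector<ShapedSegment> &segments, float x)
{
  std::vector<float> positions;
  for (const ShapedSegment &segment : segments) {
    for (const float advance : segment.advances) {
      positions.push_back(x);
      x += advance;
    }
  }
  return positions;
}
```

Because the glyphs are already in visual order, neither function needs to know the direction of any segment.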

One big change outside of shaping has to do with glyph caching. We currently locate cached glyphs by codepoint, and then we find the glyph id stored there. But with complex shaping we are given arrays of glyph ids. Therefore the hash index is changed to glyph id. Searches for glyphs by codepoint (for the simple path) are still fast because the FreeType caching system caches these lookups.
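The idea of keying the cache by glyph id can be sketched like this. It is an illustration only, not the cache in this patch: `GlyphCache`, `CachedGlyph`, and the `CharToGlyph` callback are hypothetical, and the callback stands in for whatever codepoint-to-glyph lookup the font library provides.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical cached entry; a real one would also hold the bitmap and metrics.
struct CachedGlyph {
  std::uint32_t glyph_id;
};

class GlyphCache {
 public:
  // Stand-in for the font library's codepoint to glyph id lookup.
  using CharToGlyph = std::function<std::uint32_t(char32_t)>;

  explicit GlyphCache(CharToGlyph lookup) : lookup_(std::move(lookup)) {}

  // Complex path: the shaper already hands us glyph ids.
  const CachedGlyph &by_glyph_id(std::uint32_t glyph_id)
  {
    auto it = glyphs_.find(glyph_id);
    if (it == glyphs_.end()) {
      it = glyphs_.emplace(glyph_id, CachedGlyph{glyph_id}).first;
      rendered_++;  // Counts cache misses, where a glyph would be rendered.
    }
    return it->second;
  }

  // Simple path: resolve the codepoint first, then use the same table.
  const CachedGlyph &by_codepoint(char32_t codepoint)
  {
    return by_glyph_id(lookup_(codepoint));
  }

  std::size_t rendered() const
  {
    return rendered_;
  }

 private:
  CharToGlyph lookup_;
  std::unordered_map<std::uint32_t, CachedGlyph> glyphs_;
  std::size_t rendered_ = 0;
};
```

Both paths end up in the same table, so a glyph reached first by codepoint and later by glyph id is only stored once.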

Harley Acheson force-pushed ComplexShaping from 3f9984cfa1 to 25826f4ee1 2023-02-14 22:08:04 +01:00 Compare
Harley Acheson force-pushed ComplexShaping from 25826f4ee1 to 8c469259bb 2023-02-15 21:39:48 +01:00 Compare
Author
Member

@blender-bot build
Harley Acheson force-pushed ComplexShaping from 4bab57090c to f601d54d7d 2023-02-23 20:05:48 +01:00 Compare
Author
Member

@blender-bot build
Harley Acheson force-pushed ComplexShaping from f601d54d7d to cba6e30990 2023-07-18 19:43:22 +02:00 Compare
Harley Acheson force-pushed ComplexShaping from 11e7e6c967 to d594ff98c4 2024-03-07 01:15:27 +01:00 Compare
Harley Acheson force-pushed ComplexShaping from cedf41ecb6 to d6be4d9c1e 2024-03-26 20:52:37 +01:00 Compare
Harley Acheson force-pushed ComplexShaping from d6be4d9c1e to 664dc4ed7b 2024-04-09 00:46:29 +02:00 Compare
Harley Acheson added 1 commit 2024-04-09 16:47:21 +02:00
Harley Acheson added 1 commit 2024-04-09 21:36:47 +02:00
Harley Acheson force-pushed ComplexShaping from 6f72fe6660 to 6ea3cf6a17 2024-04-15 23:49:50 +02:00 Compare
Harley Acheson added 1 commit 2024-04-15 23:57:49 +02:00
Harley Acheson added 2 commits 2024-04-17 17:29:17 +02:00
Harley Acheson added 2 commits 2024-04-28 20:05:49 +02:00
This pull request has changes conflicting with the target branch.
  • source/blender/blenfont/intern/blf_font.cc

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u ComplexShaping:Harley-ComplexShaping
git checkout Harley-ComplexShaping
Reference: blender/blender#104662