Refactor: UTF-8 Character Defines #109163
Reference: blender/blender#109163
Use defined UTF-8 Universal character names in place of byte escape
sequences and literals.
We have good support for displaying non-ascii Unicode characters in the interface, and we have been increasingly doing so. Current pending examples include #106388 and #108210.
However our uses of these involve different styles and duplication. We use some with literals, like "↓", with escape sequences like "\xe2\x96\xb8" and {0xe2, 0x87, 0xa7, 0x0}, and with universal characters like \u2715.
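The styles listed above all produce the same bytes, which a small check can make concrete. This is a minimal sketch, assuming a compiler whose narrow execution character set is UTF-8 (the GCC and Clang default; MSVC needs /utf-8). The EXAMPLE_ define names are hypothetical and are not the names introduced by this PR.

```cpp
#include <cassert>
#include <cstring>

/* Hypothetical names, for illustration only; not the defines from this PR. */
#define EXAMPLE_UTF8_SMALL_RIGHT_TRIANGLE "\u25b8" /* U+25B8 */
#define EXAMPLE_UTF8_UPWARDS_WHITE_ARROW "\u21e7"  /* U+21E7 */

/* True when both spellings compile to the same UTF-8 byte sequence. */
static bool same_utf8(const char *a, const char *b)
{
  return std::strcmp(a, b) == 0;
}

/* Universal character name versus byte escape sequence. */
bool triangle_matches_byte_escapes()
{
  return same_utf8(EXAMPLE_UTF8_SMALL_RIGHT_TRIANGLE, "\xe2\x96\xb8");
}

/* Universal character name versus explicit byte array. */
bool arrow_matches_byte_array()
{
  const char bytes[] = {char(0xe2), char(0x87), char(0xa7), 0x0};
  return same_utf8(EXAMPLE_UTF8_UPWARDS_WHITE_ARROW, bytes);
}

/* Both comparisons are expected to hold under the UTF-8 assumption. */
void check_spellings()
{
  assert(triangle_matches_byte_escapes());
  assert(arrow_matches_byte_array());
}
```

The universal character name carries the code point directly (25B8, 21E7), while the byte forms have to be decoded by hand to find out which character they represent.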
This PR defines the characters we use in a single place in a consistent way, in a new header BLI_string_utf8_symbols.h. This does seem to make it all much easier to follow and better to extend.
@blender-bot build
In theory this sounds like a nice cleanup, however currently it breaks the translations which use macros (N_(), IFACE_(), TIP_()) to extract the messages for translation.
This is because the extraction for these messages uses regexes to parse source files, and there is no preprocessing to evaluate macros. Messages that contain one of the new defines are simply no longer extracted.
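The failure mode can be reproduced with a toy extractor. The real extraction is done by Blender's Python tooling; the sketch below only mimics the idea in C++ with std::regex. The pattern, the sample messages and the EXAMPLE_BULLET name are all assumptions made for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

/* Hypothetical source lines; neither the messages nor EXAMPLE_BULLET come from Blender. */
const std::string inline_line = R"src(TIP_("\u2022 Click to edit"))src";
const std::string define_line = R"src(TIP_(EXAMPLE_BULLET " Click to edit"))src";

/* Return the string literal that directly follows `TIP_(`, or "" if there is none.
 * Like a regex-based extractor, this never expands preprocessor defines. */
std::string extract_tip_message(const std::string &line)
{
  static const std::regex pattern(R"re(TIP_\("((?:[^"\\]|\\.)*)"\))re");
  std::smatch match;
  return std::regex_search(line, match, pattern) ? match[1].str() : std::string();
}

/* The inline spelling is found, the define-based spelling is silently skipped. */
void check_extraction()
{
  assert(!extract_tip_message(inline_line).empty());
  assert(extract_tip_message(define_line).empty());
}
```

Because the pattern expects a quote immediately after the opening parenthesis, a message that starts with a define name never matches, and nothing reports the miss.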
I don’t know how to fix this translation issue, short of using a C/C++ preprocessor from Python to identify and replace defines in those strings.
I believe settling on a single way to express Unicode characters and using that consistently would be a better solution.
BTW here are a few other characters that could be changed if you think it makes sense:
Good point. Glad I asked you to check.
I only looked quickly, but most of these look fixable. Most are printfs so we could move the bullet from the format string to an argument instead. That doesn't fix "✕ (Ax + B)" though, but this might be worth it even if that one remains an inline universal character?
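Moving the bullet out of the format string could look roughly like this. It is only a sketch: EXAMPLE_TIP_ is a hypothetical stand-in for the real translation macro (here it just returns its argument), and EXAMPLE_BULLET and the message text are made up. It also assumes a UTF-8 narrow execution character set.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

/* Hypothetical stand-ins, for illustration only. */
#define EXAMPLE_BULLET "\u2022"
#define EXAMPLE_TIP_(msg) (msg) /* The real macro would return a translated string. */

/* The translatable format string contains only plain text and placeholders,
 * so a regex-based extractor sees one complete literal. The bullet is passed
 * as an ordinary "%s" argument. */
std::string frame_count_line(int frames)
{
  char buf[128];
  std::snprintf(buf, sizeof(buf), EXAMPLE_TIP_("%s Frames: %d"), EXAMPLE_BULLET, frames);
  return buf;
}

void check_frame_count_line()
{
  assert(frame_count_line(3) == "\xe2\x80\xa2 Frames: 3");
}
```

With this layout the message stored for translators becomes "%s Frames: %d", so the bullet itself is no longer part of the translatable text.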
Might just wait for Campbell to wade in. But yes, I would be okay to just use universal characters everywhere instead. I was surprised how hard it is to tell which characters are which from the encoded byte values. The universal names are so much nicer since they match the unicode 32-bit codepoint value.
I’d be a bit wary of that, nothing guarantees that the bullet point is the proper character to use in all languages. Looking at the Japanese translation, they use "・" instead of "•". (Also in French, the character traditionally used for list items is the em dash, though the bullet point is increasingly used because it’s the default option of word processors, so really it’s acceptable.)
Yes, in many instances it may still be worth it, but I fear new translation issues could be introduced later if this define system becomes part of the style guidelines.
e0bdfee1e7 to 409601e0e2
Yes, I should have thought of that.
You are probably right. But... is that a different problem though? I mean this use of Unicode bullet is happening in just one file. And that file has five usages that are translated and three that are not, so would be a mishmash if it does differ by language.
Maybe we need a #define UI_BULLET_CHAR TIP_("\u2022") somewhere?
In general this seems fine, although a table of named UTF-8 defines doesn't have so much in common with a UTF-8 API.
This could be a separate header, e.g. BLI_string_utf8_symbols.h.
Requesting a separate header, otherwise LGTM.
Ooh, good catch! Yes, this should indeed be fixed, I’ll add it to my list for later.
If the only uses for the bullet point are in this file, it could work, but to me it doesn’t seem great. Firstly there is no clear benefit for translators because upon extraction the escaped character is converted to Unicode, so we see the actual bullet point instead of "\u2022". Secondly, this character is part of the message so it is useful for us to see it in the .po file in its entirety, as it gives context.
In addition, at one point I wanted to translate a single character, but for @mont29 it was a bad idea for multiple reasons, including performance. This might be another such situation.
Are you sure? I thought that only applied to N_(), which does nothing and just acts as a translation marker. The others, like IFACE_(), TIP_() look to return a translated string when called using BLT_pgettext. So those should work with the bullets in the strings as I had it earlier?
I am definitely sure: these both do the translation and extract the message to the .po files. Take a look at bl_i18n_utils/settings.py for more detail on which patterns from the source code get extracted.
I tested this PR yesterday and unless I did something wrong, the messages I mentioned were all of those that disappeared [EDIT: disappeared from the .po files] after applying the patch and updating the .po files using the UI translation add-on.
Yes, I was not thinking about that part of it. Thanks for being patient with my lack of knowledge there.
Yes, those look like a great idea. But would have to be a separate change.
So... how to proceed?
I think we are all in agreement that using universal character format is nicer. And Campbell is okay with it and the naming, but wants this in a new header.
What do you think of me going forward but without making any changes to node_draw.cc? That one is the only one with the mentioned translation issues and it is using universal characters anyway. This PR would still be cleaning up a lot of (my) mess and encourages future uses to be a bit cleaner.
Others are so patient with mine 😅
Sounds good to me!
409601e0e2 to ed08af5532
Requesting removal of ASCII UTF32 code-points.
@ -0,0 +42,4 @@
/* Unicode characters as UTF-32 codepoints. Last portion should include the official assigned name.
* Please do not add defines here that are not actually in use. */
#define BLI_STR_UTF32_SPACE U'\u0020' /* */
There doesn't seem to be much overall benefit in including ASCII UTF-32 code-points (which can be written as plain text).
If a developer needs to use 10+ more characters, would they be expected to add every one as a define here... come up with unambiguous names for each. If the define becomes unused ... we have to remember to remove it. It seems like unnecessary busywork & added ambiguity since, e.g., it's not so obvious which direction a SLASH is & so it may be with other ASCII characters.
Prefer to keep inline uint32_t(' '), uint32_t('/') as-is.
Done. Yes, they weren't doing much.
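The inline casts preferred in the review above give the same values a named define would, which is why the ASCII entries add little. A minimal sketch, assuming an ASCII-compatible execution character set; the helper name is hypothetical.

```cpp
#include <cassert>
#include <stdint.h>

/* For ASCII characters the plain character literal already is the code point. */
static_assert(uint32_t(' ') == 0x20u, "space is U+0020");
static_assert(uint32_t('/') == 0x2Fu, "solidus is U+002F");

/* Non-ASCII symbols are where a universal character name helps readability. */
static_assert(uint32_t(U'\u2715') == 0x2715u, "multiplication X is U+2715");

/* Hypothetical helper: compares a UTF-32 code point against an inline ASCII literal. */
bool is_forward_slash(uint32_t codepoint)
{
  return codepoint == uint32_t('/');
}

void check_is_forward_slash()
{
  assert(is_forward_slash(0x2Fu));
  assert(!is_forward_slash(0x5Cu)); /* Backslash. */
}
```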
@blender-bot build