Fix search string causing an endless loop #129167

Closed
dereck-4 wants to merge 2 commits from dereck-4/blender:fix_search_problem into main

Contributor

When the interface language is set to Simplified Chinese and you press F3 to search for operators in Blender, typing "camera" makes Blender hang in an infinite loop.


Notes to help an automated script:
Broken: 4.2

dereck-4 added 1 commit 2024-10-17 17:17:05 +02:00
When the language is Simplified Chinese and you use F3 to search operators in Blender, typing "camera" makes Blender fall into an infinite loop
Iliya Katushenock added this to the Core project 2024-10-17 21:27:29 +02:00
Iliya Katushenock requested review from Jacques Lucke 2024-10-17 21:28:58 +02:00
Jacques Lucke requested changes 2024-10-18 00:14:26 +02:00
Jacques Lucke left a comment
Member

Thanks for the fix. Do you think it may be possible for me to reproduce this by just pasting some text into the searchbox or is it more complicated?

@@ -150,6 +150,8 @@ int get_fuzzy_match_errors(StringRef query, StringRef full)
for (int i = 0; i < window_offset && window_end < full_end; i++) {
window_begin += BLI_str_utf8_size_safe(window_begin);
window_end += BLI_str_utf8_size_safe(window_end);
if (window_end > full_end)
Member

Please use Blender's code style, which means using braces { } here. It would also be good to add a comment explaining how this case can happen (it's not super obvious right now).

dereck-4 marked this conversation as resolved
Author
Contributor

Thanks for the fix. Do you think it may be possible for me to reproduce this by just pasting some text into the searchbox or is it more complicated?

I found that it happens because I use a plugin named geo-scatter5, which registers a name that is not valid UTF-8.
The name is 鐒跺悗鍦ㄦā鎬佹ā寮忎笅鐩存帴鏇存敼?
Its bytes are:
231 132 182 229 144 142 229 156 168 230 168 161 230 128 129 230 168 161 229 188 143 228 184 139 231 155 180 230 142 165 230 155 180 230 148 185 228

so Blender enters an infinite loop.
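
To make the reported bytes easier to reason about, here is a minimal sketch of why they are a problem for code that advances by the size announced in each lead byte. The helper `utf8_size_from_lead` is a hypothetical stand-in written for this example, not Blender's `BLI_str_utf8_size_safe`, and it assumes the real function likewise trusts the lead byte without checking that the following bytes exist. The 37 bytes are twelve complete 3-byte sequences followed by a lone 228 (0xE4), which is a 3-byte lead with no continuation bytes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a lead-byte-based size lookup (an assumption, not
// Blender's implementation): returns how many bytes the lead byte announces,
// or 1 for a byte that cannot start a sequence.
inline std::size_t utf8_size_from_lead(unsigned char lead)
{
  if (lead < 0x80) {
    return 1; /* ASCII. */
  }
  if ((lead & 0xE0) == 0xC0) {
    return 2; /* 110xxxxx. */
  }
  if ((lead & 0xF0) == 0xE0) {
    return 3; /* 1110xxxx. */
  }
  if ((lead & 0xF8) == 0xF0) {
    return 4; /* 11110xxx. */
  }
  return 1; /* Continuation byte or invalid lead. */
}

// Steps through the buffer one announced code point at a time and returns the
// offset where the walk stops. For well-formed UTF-8 this equals bytes.size().
inline std::size_t walk_end_offset(const std::vector<unsigned char> &bytes)
{
  std::size_t offset = 0;
  while (offset < bytes.size()) {
    offset += utf8_size_from_lead(bytes[offset]);
  }
  return offset;
}

// The bytes listed in the comment above.
inline std::vector<unsigned char> reported_bytes()
{
  return {231, 132, 182, 229, 144, 142, 229, 156, 168, 230, 168, 161, 230,
          128, 129, 230, 168, 161, 229, 188, 143, 228, 184, 139, 231, 155,
          180, 230, 142, 165, 230, 155, 180, 230, 148, 185, 228};
}
```

With these bytes the walk stops two bytes past the end of the buffer, so a loop that only terminates when its end position is exactly equal to the end of the string would never see that condition.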

Author
Contributor

Use this plugin and switch to Simplified Chinese, then press F3 and search for "camera"; the problem shows up.

Member

Thanks, I was able to reproduce the issue now. I created an alternative fix in #129209 which prevents invalid utf-8 in the case presented here.

Bastien Montagne requested changes 2024-10-18 15:07:24 +02:00
Bastien Montagne left a comment
Owner

I still think that making this search code safe would also be worth it.

I would move the check outside of the for loop though, something like this:

diff --git a/source/blender/blenlib/intern/string_search.cc b/source/blender/blenlib/intern/string_search.cc
index f740f52f627..d8a6c2bda57 100644
--- a/source/blender/blenlib/intern/string_search.cc
+++ b/source/blender/blenlib/intern/string_search.cc
@@ -151,6 +151,12 @@ int get_fuzzy_match_errors(StringRef query, StringRef full)
       window_begin += BLI_str_utf8_size_safe(window_begin);
       window_end += BLI_str_utf8_size_safe(window_end);
     }
+    if (window_end > full_end) {
+      /* Can happen in case the `full` string has invalid utf8 bytes. While typically this should
+       * not occur, better be safe and handle the case gracefully. See also PR !129167, where a
+       * breaking case was reported. */
+      window_end = full_end;
+    }
   }
 }
 

We could also add a CLOG warning, though I think it's not clear currently whether there can be valid cases of handling invalid UTF-8 strings here.
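
The clamp suggested above can be illustrated with a small standalone model. This is a sketch under stated assumptions, not the code in `string_search.cc`: it uses byte offsets instead of pointers, `Window` and `slide_window` are names invented for the example, and `utf8_size_from_lead` is a hypothetical stand-in for `BLI_str_utf8_size_safe` that trusts the lead byte.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for BLI_str_utf8_size_safe (an assumption): returns the
// size announced by the lead byte, or 1 for a byte that cannot start a sequence.
inline std::size_t utf8_size_from_lead(unsigned char lead)
{
  if (lead < 0x80) {
    return 1;
  }
  if ((lead & 0xE0) == 0xC0) {
    return 2;
  }
  if ((lead & 0xF0) == 0xE0) {
    return 3;
  }
  if ((lead & 0xF8) == 0xF0) {
    return 4;
  }
  return 1;
}

// Byte offsets of the current window into the full string.
struct Window {
  std::size_t begin;
  std::size_t end;
};

// Slides the window forward by up to `steps` code points, then clamps `end` so
// it never lies past the end of `full`. Both offsets are expected to have been
// produced by stepping from offset 0 with the same size function, which keeps
// `begin <= end` during the loop.
inline Window slide_window(std::string_view full, Window window, int steps)
{
  assert(window.begin <= window.end);
  for (int i = 0; i < steps && window.end < full.size(); i++) {
    window.begin += utf8_size_from_lead(static_cast<unsigned char>(full[window.begin]));
    window.end += utf8_size_from_lead(static_cast<unsigned char>(full[window.end]));
  }
  /* A truncated trailing sequence makes the last step overshoot; clamp once
   * after the loop instead of checking on every iteration. */
  window.end = std::min(window.end, full.size());
  return window;
}
```

For the three-byte input `"ab\xE4"` and a starting window of `{0, 1}`, two steps would leave `end` at 5 without the clamp; with it, `end` is 3, so an equality test against the string length can succeed again.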

Author
Contributor

I changed my code as you suggested, and I understand that your version is more performance-friendly. I originally put the if inside the for loop because once window_end is greater than full_end, the next iteration may dereference an unexpected address, which may crash the program.

dereck-4 added 1 commit 2024-10-18 16:02:03 +02:00
dereck-4 requested review from Bastien Montagne 2024-10-18 16:10:30 +02:00

While this change seems OK in isolation, it raises questions about how UTF8 should be handled in string_search.cc.

If every call that decodes a UTF8 code point has to handle invalid data, then it looks like there are other bugs in the code, which doesn't account for incomplete UTF8 sequences.


Instead we could require all strings in string_search.cc be valid UTF8. We could have assertions to ensure this is followed by all users.
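
As an illustration of what such an assertion could look like, here is a minimal sketch. Both function names are hypothetical and written for this example; they are not existing Blender API. The check is structural only: it verifies that each lead byte is followed by the number of continuation bytes it announces, and it deliberately does not reject overlong encodings or surrogate code points.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Structural UTF-8 check (hypothetical helper): every lead byte must be
// followed by as many continuation bytes (10xxxxxx) as it announces.
inline bool is_structurally_valid_utf8(std::string_view text)
{
  std::size_t i = 0;
  while (i < text.size()) {
    const unsigned char lead = static_cast<unsigned char>(text[i]);
    std::size_t size = 0;
    if (lead < 0x80) {
      size = 1;
    }
    else if ((lead & 0xE0) == 0xC0) {
      size = 2;
    }
    else if ((lead & 0xF0) == 0xE0) {
      size = 3;
    }
    else if ((lead & 0xF8) == 0xF0) {
      size = 4;
    }
    else {
      return false; /* Stray continuation byte or invalid lead byte. */
    }
    if (size > text.size() - i) {
      return false; /* Sequence is truncated at the end of the string. */
    }
    for (std::size_t k = 1; k < size; k++) {
      if ((static_cast<unsigned char>(text[i + k]) & 0xC0) != 0x80) {
        return false; /* Expected a continuation byte. */
      }
    }
    i += size;
  }
  return true;
}

// Hypothetical precondition check a search entry point could call in debug
// builds before doing any per-code-point stepping.
inline void assert_search_string_is_utf8(std::string_view text)
{
  assert(is_structurally_valid_utf8(text));
  (void)text; /* Avoid an unused-parameter warning when asserts are disabled. */
}
```

With this approach the per-step decoding code can stay simple, and a string ending in a lone lead byte such as 0xE4 is caught at the boundary in debug builds rather than inside the window loop.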

Member

Will close this because it has been solved separately.

In the last core module meeting (https://devtalk.blender.org/t/2024-10-31-core-meeting/37140) we briefly talked about adding an assert in string-search code to check that everything is valid utf8. Everyone seemed to be fine with that.
Jacques Lucke closed this pull request 2024-11-04 14:53:50 +01:00
Alaska added the Module: User Interface label 2024-11-13 03:18:41 +01:00

Pull request closed
