OBJ importer: Inconsistent handling of decoding errors between Windows and Linux #86783

Closed
opened 2021-03-21 07:43:41 +01:00 by June · 13 comments

System Information
Operating system: Windows-10-10.0.18362 64 Bits
Graphics card: Quadro M1200/PCIe/SSE2 NVIDIA Corporation 4.5.0 NVIDIA 452.41

Blender Version
Broken: version: 2.92 & 2.80/2.83/2.93

Short description of error
The OBJ importer does not handle decoding errors the same way on Linux and Windows. If an *.obj is imported that is not encoded as UTF-8, the name of the referenced *.mtl can't be properly decoded. On Windows this immediately raises a UnicodeDecodeError and the import fails in its entirety, because os.fsdecode applies errors='strict' on Windows. On Linux the importer decodes the filename without throwing an error; it then skips loading the *.mtl later on because it cannot find a file with the garbled filename. The object itself is still imported.

Original description:
The OBJ importer in Blender 2.80-2.93 on my machine (Win10) seems unable to import a non-UTF-8 encoded .obj; it gives UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa7 in position 3: invalid start byte.

However, Blender 2.79 on my machine (Win10) is able to import it with no problem.

Attached a test example: [obj_name.7z](https://archive.blender.org/developer/F9902049/obj_name.7z)

Exact steps for others to reproduce the error
Open Blender 2.80+, import the attached .obj, get the error.
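The difference described above can be reproduced outside Blender. This is a minimal sketch, not the importer's code: `raw_name` is the MTL reference from the attached file written out as a byte literal (an assumption based on the error message and the logs in this thread), and the two helper functions are hypothetical names that only wrap `bytes.decode` with the two error handlers in question.

```python
# Bytes of the MTL reference as they appear in the OBJ file:
# "51_диван.mtl" encoded as GB2312 (assumed from the logs in this report).
raw_name = b"51_\xa7\xd5\xa7\xda\xa7\xd3\xa7\xd1\xa7\xdf.mtl"


def decode_strict(data: bytes) -> str:
    """Decode as UTF-8 and raise on the first invalid byte."""
    return data.decode("utf-8", "strict")


def decode_surrogateescape(data: bytes) -> str:
    """Decode as UTF-8, mapping undecodable bytes to lone surrogates."""
    return data.decode("utf-8", "surrogateescape")


strict_error = None
try:
    decode_strict(raw_name)
except UnicodeDecodeError as exc:
    # This is the failure mode reported on Windows.
    strict_error = exc
    print(exc)

# This is the behavior reported on Linux: no exception, but a mangled
# name that will not match any file on disk.
mangled = decode_surrogateescape(raw_name)
print(ascii(mangled))
```

With the strict handler the import aborts at byte 3; with `surrogateescape` decoding succeeds but yields a name that does not exist, which is why the material is silently skipped.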

Author

Added subscriber: @tiancaipipi110

Author

Here are the log files from the failed imports in Blender 2.92 and 2.80rc3: [bug_tracker.7z](https://archive.blender.org/developer/F9902052/bug_tracker.7z)

Added subscriber: @deadpin


Notes to other triagers:

  • 2.79 loads the geometry; however, it fails to load the material [1]. It doesn't seem like the file was written correctly; perhaps the software used was also not Unicode-friendly.

  • 2.93 does not load anything: it fails on the material like 2.79 and then stops processing [2].

  • [1] Here's the console output from 2.79:

(  0.0050 sec |   0.0050 sec) Importing OBJ 'E:\\51_╨┤╨╕╨▓╨░╨╜.obj'...
  (  0.0050 sec |   0.0000 sec) Parsing OBJ file...
    (  0.7121 sec |   0.7071 sec) Done, loading materials and images...
        Material not found MTL: 'E:\\51_§Õ§Ú§Ó§Ñ§ß.mtl'         <--- HERE
WARNING, currently unsupported glass material, skipped.
WARNING, currently unsupported translucency option, skipped.
WARNING, currently unsupported ambient texture, skipped.
WARNING, currently unsupported glass material, skipped.
WARNING, currently unsupported translucency option, skipped.
WARNING, currently unsupported glass material, skipped.
WARNING, currently unsupported translucency option, skipped.
    (  0.7326 sec |   0.7276 sec) Done, building geometries (verts:41836 faces:49412 materials: 3 smoothgroups:3) ...
    (  1.5498 sec |   1.5448 sec) Done.
  (  1.5498 sec |   1.5448 sec) Finished importing: 'E:\\51_╨┤╨╕╨▓╨░╨╜.obj'
Progress: 100.00%
  • [2] From 2.93:
(  0.0000 sec |   0.0000 sec) Importing OBJ 'E:\\51_диван.obj'...
  (  0.0151 sec |   0.0010 sec) Parsing OBJ file...
Progress:   0.00%

Python: Traceback (most recent call last):
  File "D:\blender-2.93.0-de06cb85593b-windows64\2.93\scripts\addons\io_scene_obj\__init__.py", line 146, in execute
    return import_obj.load(context, **keywords)
  File "D:\blender-2.93.0-de06cb85593b-windows64\2.93\scripts\addons\io_scene_obj\import_obj.py", line 1186, in load
    material_libs |= {os.fsdecode(f) for f in filenames_group_by_ext(line.lstrip()[7:].strip(), b'.mtl')
  File "D:\blender-2.93.0-de06cb85593b-windows64\2.93\scripts\addons\io_scene_obj\import_obj.py", line 1186, in <setcomp>
    material_libs |= {os.fsdecode(f) for f in filenames_group_by_ext(line.lstrip()[7:].strip(), b'.mtl')
  File "D:\blender-2.93.0-de06cb85593b-windows64\2.93\python\lib\os.py", line 824, in fsdecode
    return filename.decode(encoding, errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa7 in position 3: invalid start byte
Author

I didn't include textures because I didn't think they mattered. Here are the updated files with textures; Blender 2.79 should import everything correctly. [obj_name_tex.7z](https://archive.blender.org/developer/F9902218/obj_name_tex.7z)

Actually I do get them to load. The output on the command line was weird.

Author

Well, the .mtl is in the .7z. This is what I see: ![screenshot.jpg](https://archive.blender.org/developer/F9902225/screenshot.jpg)

Added subscriber: @rjg


The .obj appears to be encoded as GB2312 and it fails to import properly on both Windows and Linux.

On Linux it imports the model, because it fails gracefully. It attempts to decode the filename of the *.mtl and doesn't throw an error in this step. However, since the attempt to decode the GB2312 encoded name as UTF-8 results in a mangled string, it doesn't load the material definition since it can't find a file with that name.

Material not found MTL: '/home/dev/01-data/03-bug-tracker/obj_name_tex/51_\udca7էڧӧѧ\udcdf.mtl'

On Windows it already fails in the decoding step, as os.fsdecode (https://docs.python.org/3/library/os.html#os.fsdecode) uses 'strict' as the default error handler on this platform. This raises the UnicodeDecodeError shown in the log posted by @deadpin.

Python: Traceback (most recent call last):
  File "Z:\01_git\01_contribution\blender-git\build_windows_Full_x64_vc16_Debug\bin\Debug\2.93\scripts\addons\io_scene_obj\__init__.py", line 146, in execute
    return import_obj.load(context, **keywords)
  File "Z:\01_git\01_contribution\blender-git\build_windows_Full_x64_vc16_Debug\bin\Debug\2.93\scripts\addons\io_scene_obj\import_obj.py", line 1186, in load
    material_libs |= {os.fsdecode(f) for f in filenames_group_by_ext(line.lstrip()[7:].strip(), b'.mtl')
  File "Z:\01_git\01_contribution\blender-git\build_windows_Full_x64_vc16_Debug\bin\Debug\2.93\scripts\addons\io_scene_obj\import_obj.py", line 1186, in <setcomp>
    material_libs |= {os.fsdecode(f) for f in filenames_group_by_ext(line.lstrip()[7:].strip(), b'.mtl')
  File "Z:\01_git\01_contribution\blender-git\build_windows_Full_x64_vc16_Debug\bin\Debug\2.93\python\lib\os.py", line 824, in fsdecode
    return filename.decode(encoding, errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa7 in position 3: invalid start byte
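To see which codec and error handler `os.fsdecode` applies on a given machine, the interpreter can be asked directly. This is a small sketch using only the standard `sys` module; `can_decode` is a hypothetical helper added for illustration. One hedged note: on recent Python versions for Windows the reported handler is typically `'surrogatepass'` rather than the literal string `'strict'`, but for a stray byte such as 0xa7 the effect is the same, since both raise.

```python
import sys
from typing import Tuple


def filesystem_decoding() -> Tuple[str, str]:
    """Return the (encoding, error handler) pair that os.fsdecode relies on."""
    return sys.getfilesystemencoding(), sys.getfilesystemencodeerrors()


def can_decode(data: bytes, encoding: str, errors: str) -> bool:
    """Report whether data decodes without raising for the given handler."""
    try:
        data.decode(encoding, errors)
    except UnicodeDecodeError:
        return False
    return True


encoding, errors = filesystem_decoding()
print(f"os.fsdecode uses encoding={encoding!r}, errors={errors!r}")

# A lone 0xa7 byte is invalid UTF-8; only surrogateescape lets it through.
for handler in ("strict", "surrogatepass", "surrogateescape"):
    print(handler, can_decode(b"\xa7", "utf-8", handler))
```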

Since there is no code that attempts to detect the encoding of the text file and use it to interpret the contents, anything other than UTF-8 can't be properly imported. At best it fails gracefully, ignoring the parts it can't decode, as it does on Linux.

The importer can be improved on Windows to fail as gracefully as it does on Linux by replacing

material_libs |= {os.fsdecode(f) for f in filenames_group_by_ext(line.lstrip()[7:].strip(), b'.mtl')}

with

material_libs |= {f.decode('utf-8', "replace") for f in filenames_group_by_ext(line.lstrip()[7:].strip(), b'.mtl')}

All other usages of os.fsdecode() would have to be replaced as well.
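For reference, here is roughly what the suggested `"replace"` handler does with the name from this report. It is a standalone sketch: `decode_lib_name` is a hypothetical helper, and the `names` list stands in for whatever `filenames_group_by_ext` would yield, since that function is internal to the add-on and is not called here.

```python
def decode_lib_name(raw: bytes) -> str:
    """Decode an MTL file name as UTF-8, never raising on invalid bytes.

    Undecodable bytes become U+FFFD (the replacement character), so the
    result may not match any file on disk, but decoding always succeeds.
    """
    return raw.decode("utf-8", "replace")


# Stand-in for the names parsed from a 'mtllib' line (illustrative values).
names = [
    b"materials.mtl",
    b"51_\xa7\xd5\xa7\xda\xa7\xd3\xa7\xd1\xa7\xdf.mtl",
]

material_libs = {decode_lib_name(name) for name in names}
for lib in sorted(material_libs):
    print(ascii(lib))
```

Note that the result differs slightly from what Linux produces: `"replace"` inserts U+FFFD where `surrogateescape` would insert lone surrogates. In both cases the lookup of the *.mtl fails later instead of the whole import aborting.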

In conclusion:

  1. The importer can be improved to handle encoding issues more gracefully on Windows, e.g. ignoring the *.mtl file instead of throwing a UnicodeError.
  2. The import only supports UTF-8 (or ASCII) encoded files.
  3. The .obj/.mtl file in this report uses an unsupported and invalid encoding. It needs to be saved with UTF-8 encoding.
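Regarding point 2, this is a rough idea of what a fallback could look like if the importer ever tried more than one encoding. It is purely a sketch and not something the add-on does: `decode_with_fallback` and its default candidate list are assumptions made up for this example, and guessing encodings this way can silently pick the wrong one for other files.

```python
from typing import Sequence, Tuple


def decode_with_fallback(
    raw: bytes, encodings: Sequence[str] = ("utf-8", "gb2312")
) -> Tuple[str, str]:
    """Try each candidate encoding in order and return (text, encoding used).

    If none of the candidates decodes cleanly, fall back to UTF-8 with
    replacement characters so the caller never sees an exception.
    """
    for candidate in encodings:
        try:
            return raw.decode(candidate), candidate
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", "replace"), "utf-8"


text, used = decode_with_fallback(b"51_\xa7\xd5\xa7\xda\xa7\xd3\xa7\xd1\xa7\xdf.mtl")
print(text, used)
```

Re-saving the files as UTF-8, as stated in point 3, remains the reliable fix; a fallback like this only helps when the candidate list happens to contain the right encoding.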

Changed status from 'Needs Triage' to: 'Confirmed'


I'm setting the status of the ticket to confirmed with the goal to adjust the error handling behavior on Windows to match Linux and macOS. The fact that files that aren't UTF-8 encoded can't be properly imported is not a bug though.

Robert Guetzkow changed title from UnicodeDecodeError: ‘utf-8’ codec can’t decode byte 0xa7 in position 3: invalid start byte to OBJ importer: Inconsistent handling of decoding errors between Windows and Linux 2021-03-22 08:46:05 +01:00
Aras Pranckevicius self-assigned this 2022-04-24 12:07:16 +02:00

Changed status from 'Confirmed' to: 'Archived'


The new OBJ importer (experimental in 3.2, default since 3.3) imports this better on Windows. It still can't find the MTL file of course, since the OBJ file is not UTF-8 encoded, but happens to load some of the textures as a side effect of "if we don't have a valid MTL file, look for the one named exactly like the OBJ file" logic.

I'm archiving this since the Python-based importer is unlikely to get any further fixes; if anything, it is likely to get removed soon-ish.
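The "look for the one named exactly like the OBJ file" fallback mentioned above can be illustrated as follows. This is a Python sketch of the idea only, not the new importer's actual code; `resolve_mtl` and its parameters are hypothetical names chosen for the example.

```python
from pathlib import Path
from typing import Iterable, Optional


def resolve_mtl(obj_path: Path, referenced: Iterable[str]) -> Optional[Path]:
    """Pick the MTL file to load for an OBJ file.

    First try every name referenced by the OBJ file, relative to the OBJ's
    directory. If none of them exists (for example because the name was
    mangled during decoding), fall back to '<obj stem>.mtl' next to the OBJ.
    """
    for name in referenced:
        candidate = obj_path.parent / name
        if candidate.is_file():
            return candidate
    fallback = obj_path.with_suffix(".mtl")
    return fallback if fallback.is_file() else None
```

For the file in this report, the decoded reference does not exist on disk, so the lookup would fall through to the fallback and use the *.mtl that shares the OBJ file's name, if there is one.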

Reference: blender/blender-addons#86783