Build Bot: MacOS X test fails #81077
Reference: blender/blender#81077
Since we released Blender 2.90.0, the build bot tests have been failing on macOS.
stdio
The failure occurs most of the time and only on the macOS build bot. It is always the same test, but it is not tied to a specific commit.
Blender 2.90.0 was released with passing tests, but the day after that the test started to fail. Strangely enough, the test did pass once, 2 days ago, after we added all the fixes for 2.90.1.
This needs investigation. I set it to "Unbreak Now!" as this halts the release of 2.90.1. Is there anything I can do?
Changed status from 'Needs Triage' to: 'Confirmed'
Added subscribers: @Jeroen-Bakker, @mont29, @Sergey
Added subscriber: @sebbas
I am not sure why I'm in the subscribers. This is not specific to the buildbot setup; it will happen on any macOS build. Compiling with ASAN will make it easier to catch the issue.
I do not think so. Someone on a Mac should dig into it and see whether something is wrong in the test itself or in the code.
Not sure why this should be done on the day of release. This is not a newly introduced issue. For the release it is safer NOT to change code at this point.
That doesn't mean we should not fix the issue; it is just that, to me, this is not a stopper for 2.90.1.
Lowering the priority, as after testing with the dmg we decided to continue with the release as is.
I got these tests failing on my macOS machine with today's master (a6b16cfd80):
Needs further investigation.
Regarding id_management, did someone check that it was not a mere 'out of RAM' issue? Those tests are run in parallel now, iirc this can consume quite a lot of memory… Would also explain why it passes sometimes, and sometimes not?
Not sure how valid this remark is though, don't know the specs of our buildbots.
Where is this information coming from?
The issue can be easily reproduced on macOS by:
ctest -R id_management
Please look into actual problems rather than speculating that something is wrong on the buildbot.
I am not speculating, I am asking a question, after facing the same out-of-memory issue here. And I would like to know how I am supposed to investigate an issue that only shows on an OS I have absolutely no access to.
@mont29, I'm not sure why you would be the one to look into this issue: as I mentioned above, someone on macOS should look into it, it is easy to reproduce, and it is not specific to the buildbot.
At this time I don't think you should be looking into this issue. Give some time for the mac people to dig deeper, and, maybe, eventually assist with addressing the root cause (after it is identified).
Added subscriber: @ankitm
Removed. I couldn't redo the original crash and thought what I fixed was happening on buildbot.
Please ignore the previous comment, it is a separate issue.
Debug build didn't crash at all, so I built Release with ASan, and got a heap-use-after-free due to the experimental method batch_remove(..): P1659
Added subscriber: @dfelinto
@ankitm do we have any updates on that?
Added subscriber: @JulianEisel
The day started with P1726#8937 showing that id->us is not 0 and MECube was being freed when there was still a user. Output was:
Later, @JulianEisel shared a patch that had fixed it for him, but not for me. https://pasteall.org/4OjK/slim
Later, after a lot of debug statements and misguided breakpoints, I found that the code in the for-loop
for (id = last_remapped_id->next; id; id = id->next) {
is not even being executed. So while trying to debug why that is, surprisingly P1726#8932 fixed the test, and also changed id->us from 1 to 0.
Ray suggested P1726#8936 and that is also a fix.
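To make the observation about the loop concrete, here is a minimal C sketch of that loop shape. The `Node` type and both functions are hypothetical stand-ins, not Blender's `ID` struct or the actual `batch_remove` code; the sketch only illustrates that a loop starting at `last_remapped->next` is expected to visit every node linked after that marker.

```c
#include <stddef.h>

/* Hypothetical stand-in for a linked datablock; not Blender's ID struct. */
typedef struct Node {
  struct Node *next;
  int users; /* loosely mirrors the idea of id->us */
} Node;

/* Count the nodes that follow `last_remapped`, using the same loop shape
 * as the line quoted in the comment above. */
int count_after(const Node *last_remapped)
{
  int count = 0;
  for (const Node *n = last_remapped->next; n; n = n->next) {
    count++;
  }
  return count;
}

/* Build the chain a -> b -> c and count what follows `a`. */
int demo_count_after(void)
{
  Node c = {NULL, 0};
  Node b = {&c, 0};
  Node a = {&b, 1};
  return count_after(&a);
}
```

If the loop body never runs even though nodes exist after the marker, as reported here, the count would come out as 0 instead of the number of trailing nodes.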
The crash/test failure happens only in Release and RelWithDebInfo builds, not Debug ones. (ASAN doesn't affect that.)
From my uneducated point of view, this sounds like Clang optimizer being over aggressive here, to say the least...
Those patches are nice for investigating, but none are acceptable fixes of course; they are all ways to 'hide' the problem with extra processing that somehow forces the compiler to generate correct code again (or that disables any optimization).
I will try with clang on linux tomorrow out of curiosity (what is the version on OSX btw?), but did you try a full explicit init of tagged_deleted_ids, with two NULL pointers? That's the only obvious thing I can see from quickly checking the code again.
And obviously, big thanks to everybody for investigating this hairy issue!
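For reference, a minimal C sketch of what a full explicit init with two NULL pointers looks like. `TwoPtrList` is a hypothetical two-pointer list head; the thread does not show the real type of `tagged_deleted_ids`, so its layout here is an assumption. In standard C, members omitted from a brace initializer are zero-initialized, so the partial and the full form should produce the same empty list.

```c
#include <stddef.h>

/* Hypothetical two-pointer list head (assumed layout, for illustration). */
typedef struct TwoPtrList {
  void *first;
  void *last;
} TwoPtrList;

/* Returns 1 when both initializer forms yield two NULL pointers. */
int both_forms_are_empty(void)
{
  TwoPtrList partial = {NULL};    /* `last` is implicitly zero-initialized */
  TwoPtrList full = {NULL, NULL}; /* full explicit init */
  return partial.first == NULL && partial.last == NULL &&
         full.first == NULL && full.last == NULL;
}
```

Because the two forms are equivalent under the language rules, spelling out both members would not normally be expected to change the generated code's behavior.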
buildbot is using "AppleClang 12.0.0.12000032"
Julian is using "Apple clang version 12.0.0 (clang-1200.0.32.21)"
I'm using LLVM "clang version 12.0.0 (https://github.com/llvm/llvm-project.git e139450166a7c23ad42f839eddb1e34553967d78)"
I also tested "AppleClang 10.0.1.10010046".
Same results in all four.
This one, P1726#8940? It crashes with this patch applied.
Yes, it was the only potentially fuzzy thing I could spot (though I would not have expected it to be an issue)...
No issues here with clang 9, trying with clang 11 now...
And Clang 11 also passes fine here :(
https://godbolt.org/z/nKzaqE comparison of clang and gcc.
code of interest:
Julian found this fix P1726#8941 (making last_remapped_id volatile)
This issue was referenced by
2ddecfffc3
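The volatile workaround mentioned above can be sketched roughly as follows. This is a hedged illustration, not the actual patch: `Node` and `count_new_nodes` are hypothetical, and where exactly the real change placed the qualifier is an assumption. What the sketch does show is the C syntax for making the pointer itself volatile, which obliges the compiler to perform each read of it from memory rather than reuse a value it cached or assumed earlier.

```c
#include <stddef.h>

/* Hypothetical list node; not Blender's ID struct. */
typedef struct Node {
  struct Node *next;
} Node;

/* Link two nodes after a marker, then walk from the marker onward. */
int count_new_nodes(void)
{
  Node a = {NULL}, b = {NULL}, c = {NULL};

  /* The qualifier sits after the `*`, so the pointer variable itself is
   * volatile (not the Node it points to). */
  Node *volatile last_remapped = &a;

  /* Nodes appended after the marker was recorded. */
  a.next = &b;
  b.next = &c;

  int count = 0;
  for (Node *n = last_remapped->next; n; n = n->next) {
    count++;
  }
  return count;
}
```

A qualifier like this constrains the optimizer rather than addressing whatever led it to drop the loop, which fits the earlier remark that such patches hide the problem instead of fixing it.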
Changed status from 'Confirmed' to: 'Resolved'
This issue was referenced by
30ec0753c7