Unit tests sometimes fail due to failing the allowed sleep difference - /code/internal/worker/command_misc_test.go #104288

Closed
opened 2024-03-01 15:41:29 +01:00 by MichaelC · 3 comments
Contributor

System Information
Operating System(s): Linux

Flamenco Version
Is Broken: Somewhere in 3.4 and latest main
Worked OK: <=3.3

Short description of error

The unit tests fail roughly 50% of the time on the same platform, across different build machines.

2024-03-01T00:52:32.3530536Z #16 46.76         	Error Trace:	/code/internal/worker/command_misc_test.go:75
2024-03-01T00:52:32.3530722Z #16 46.76         	Error:      	Max difference between 2006-01-02 15:04:52 +0000 UTC and 2006-01-02 15:04:54 +0000 UTC allowed is 1s, but difference was -2s
2024-03-01T00:52:32.3530934Z #16 46.76         	Test:       	TestCommandSleep

Exact steps for others to reproduce the error

The Flamenco executables are built inside Docker buildx on an amd64 host, targeting either amd64 or arm64. Either target can fail, but neither fails every time.

The executable builds successfully. The tests are invoked as documented, but inside a Docker buildx container:

make with-deps
make test

The executables run fine, so this looks like a timing issue in how the unit test calculates time, given that the difference is negative. It is usually -2 seconds in the failed builds.


This is very strange. I've never seen a negative time pop up, and I can't reproduce this issue. It might have something to do with the goroutine scheduler on specific hardware/platforms.

Could you try this patch to see if it fixes / changes anything?

diff --git a/internal/worker/command_misc_test.go b/internal/worker/command_misc_test.go
index 3958b621..292e81df 100644
--- a/internal/worker/command_misc_test.go
+++ b/internal/worker/command_misc_test.go
@@ -64,7 +64,7 @@ loop:
                select {
                case <-runDone:
                        break loop
-               default:
+               case <-time.After(1 * time.Millisecond):
                        mocks.clock.Add(timeStepSize)
                }
        }
Sybren A. Stüvel added the Status: Needs Information from User label 2024-03-01 16:33:37 +01:00
Author
Contributor

As far as I understand, the buildx cross-platform builds use QEMU to emulate the target hardware, so that could be another variable on top of the different host machines we are using.

I have just tried that patch for the unit test and built the arm64 and amd64 executables four times in a row each, without seeing a failure.


Thanks for testing :)

Reference: studio/flamenco#104288