Unit tests sometimes fail due to failing the allowed sleep difference - /code/internal/worker/command_misc_test.go #104288

MichaelC · 2024-03-01T15:41:29+01:00

MichaelC commented

2024-03-01 15:41:29 +01:00

System Information
Operating System(s): Linux

Flamenco Version
Is Broken: Somewhere in 3.4 and latest main
Worked OK: <=3.3

Short description of error

The unit tests fail roughly 50% of the time on the same platform, across different build machines.

2024-03-01T00:52:32.3530536Z #16 46.76         	Error Trace:	/code/internal/worker/command_misc_test.go:75
2024-03-01T00:52:32.3530722Z #16 46.76         	Error:      	Max difference between 2006-01-02 15:04:52 +0000 UTC and 2006-01-02 15:04:54 +0000 UTC allowed is 1s, but difference was -2s
2024-03-01T00:52:32.3530934Z #16 46.76         	Test:       	TestCommandSleep

Exact steps for others to reproduce the error

The Flamenco executables are built with Docker buildx on an amd64 host, targeting either amd64 or arm64. Either target can fail, but neither fails every time.

The executable builds successfully. The tests are invoked as per the documentation, but inside a Docker buildx container:

make with-deps
make test

The executables run fine, so this looks like a timing issue in how the unit test calculates time, given that the difference is negative. It is usually -2 seconds in the failed builds.

Attachment: build-flamenco-executables-latest-build-and-deploy (linux_arm64, nightly)-836.log (1.8 MiB)

Sybren A. Stüvel commented

2024-03-01 16:33:32 +01:00

This is very strange. I've never seen a negative time pop up, and I can't reproduce this issue. It might have something to do with the goroutine scheduler on specific hardware/platforms.

Could you try this patch to see if it fixes / changes anything?

diff --git a/internal/worker/command_misc_test.go b/internal/worker/command_misc_test.go
index 3958b621..292e81df 100644
--- a/internal/worker/command_misc_test.go
+++ b/internal/worker/command_misc_test.go
@@ -64,7 +64,7 @@ loop:
                select {
                case <-runDone:
                        break loop
-               default:
+               case <-time.After(1 * time.Millisecond):
                        mocks.clock.Add(timeStepSize)
                }
        }

Sybren A. Stüvel added the "Status: Needs Information from User" label

2024-03-01 16:33:37 +01:00

MichaelC commented

2024-03-01 17:10:17 +01:00

From what I understand, the buildx cross-platform builds use QEMU to emulate hardware, so this could be another variable on top of the different host machines we are using.

I have just tried that patch for the unit test and built the arm64 and amd64 executables four times in a row each, without seeing a failure.

Sybren A. Stüvel referenced this issue from a commit

2024-03-04 14:18:30 +01:00

Worker: fix Go scheduling issue in `sleep` command test

Sybren A. Stüvel closed this issue

2024-03-04 14:18:30 +01:00

Sybren A. Stüvel commented

2024-03-04 14:18:41 +01:00

Thanks for testing :)

MichaelC referenced this issue

2024-03-07 15:58:57 +01:00

Unit Tests - Docker/arm64 - FAIL projects.blender.org/studio/flamenco/internal/worker - context deadline exceeded #104290
