Buildbot: when triggered from Gitea, sometimes builds are missing #57

Closed
opened 2023-03-20 14:22:47 +01:00 by Sybren A. Stüvel · 21 comments

Sometimes when I trigger the buildbot with `@blender-bot build` or `@blender-bot package` on a PR, it seems to skip some platforms.

Here's a screenshot of the situation:

image

As you can see, only one builder (vexp-code-patch-darwin-x86_64) actually performed a build, and 3 more are "pending".

However, there are no further builders listed. Since 100% of the listed builders actually succeeded, the entire build is marked as successful. It is this 'success' status that is communicated back to Gitea.
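To illustrate the reporting problem described above, here is a minimal sketch of a status check that compares results against the builders that were expected, rather than only those that are listed. It is not how Buildbot or the Gitea integration computes status; the function name, the `Status` enum and all builder names except `vexp-code-patch-darwin-x86_64` are assumptions made for illustration.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    INCOMPLETE = "incomplete"


def overall_status(expected_builders, results):
    """Return (status, missing_builders).

    `results` maps a builder name to True (passed) or False (failed).
    A builder that was expected but never reported makes the build
    INCOMPLETE instead of silently counting as a success.
    """
    missing = sorted(set(expected_builders) - set(results))
    if missing:
        return Status.INCOMPLETE, missing
    if all(results.values()):
        return Status.SUCCESS, []
    return Status.FAILURE, []


# Hypothetical builder names, apart from the darwin-x86_64 one from the report.
EXPECTED = [
    "vexp-code-patch-darwin-x86_64",
    "example-darwin-arm64",
    "example-linux-x86_64",
    "example-windows-amd64",
]

status, missing = overall_status(EXPECTED, {"vexp-code-patch-darwin-x86_64": True})
```

With only one of four expected builders reporting, `status` is `Status.INCOMPLETE` and `missing` names the three absent builders, so a bare "success" would not be sent back.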

This issue happened to me twice before. In all those cases, it was the Windows and Linux builders that were missing. I think once or twice it also missed one of the macOS builders, but I'm not 100% sure about that.

When triggering a rebuild from the buildbot web interface, two entries hang (for at least 5 minutes) on "loading buildrequest details...". Refreshing the page doesn't help.

image

Not sure what it means, but maybe it helps with finding the root cause.

Links:

  • this build: https://builder.blender.org/admin/#/builders/136/builds/794
  • triggered from https://projects.blender.org/blender/blender/pulls/105604#issuecomment-904231

Brecht Van Lommel added the Service / Buildbot label 2023-03-20 14:34:47 +01:00

This might be a bug in buildbot that would be fixed by upgrading to the latest version.
https://github.com/buildbot/buildbot/pull/6152

@Arnd I guess we should upgrade buildbot at some point regardless.

Brecht Van Lommel added the Type / Deployment label 2023-03-24 13:44:07 +01:00

For reference, I created this report some weeks ago, which seems to be the same issue: https://gitlab.com/blender/bdr-devops-core/-/issues/1


For completeness' sake: builds do not seem to go missing only when triggered from Gitea, but those cases were perhaps the most visible to users.

This issue has likely been addressed by the upgrade of buildbot from 3.2.0 to 3.3.0.
The issue has not shown up in preliminary tests so far.
Closing ticket. Please re-open if it happens again.


It appears this is still happening after the upgrade.


The latest buildbot version is 3.8.0 but we only upgraded to 3.3.0. I think we should upgrade to the latest.


As far as I can find, the last version in the 3.* series is 3.6.0.
I have updated the master of the UATEST cluster to 3.6.0. According to the documentation the clients should stay compatible, but I will upgrade those at some point too.

So far no weird things.
I'm planning to upgrade PROD to 3.6.0 on Jul 26 (tomorrow) unless I find something weird, in which case I will report it here.

The latest is 3.8.0? https://github.com/buildbot/buildbot/releases https://pypi.org/project/buildbot/#history

Totally weird. The release-notes history on their website only goes up to 3.6.0:
http://docs.buildbot.net/current/relnotes/index.html

I will investigate whether anything big changed in the releases after that and roll them out on UATEST as soon as possible, so I can hopefully still roll out 3.8.0 on PROD tomorrow.


Upgraded buildbot-master, worker and www + components to 3.8.0 on UATEST and PROD.
Initial results seem to indicate that we're likely still experiencing the same issue.

I did a little sleuthing to see if anything obvious could be found.
Using the JSON API that the web interface uses, and sqlite on the database, the following seems to happen:

  • A build request for a buildset comes in.
  • It creates sub-buildrequests in the DB.
  • It creates buildrequest_claims for each of them.

...but only a few were actually properly claimed and finalized.
Not all build requests/buildrequest_claims have a build associated with them; the reason is not clear, but it is possibly a race condition somewhere or improper locking in general.

This result was seen while the workers were still running 3.2.0, however. I upgraded them to 3.8.0 just now to see if that fixes the issue.
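The kind of database check described above could look roughly like the sketch below: find build requests in a buildset that have a claim but no associated build. The three tables are simplified stand-ins with only the columns the query needs (the real Buildbot schema has more), the sample rows are made up, and builder IDs other than 136 are hypothetical.

```python
import sqlite3

# Simplified stand-ins for the Buildbot tables involved (assumption: reduced columns).
SCHEMA = """
CREATE TABLE buildrequests (
    id INTEGER PRIMARY KEY, buildsetid INTEGER, builderid INTEGER, complete INTEGER);
CREATE TABLE buildrequest_claims (
    brid INTEGER PRIMARY KEY, masterid INTEGER, claimed_at INTEGER);
CREATE TABLE builds (
    id INTEGER PRIMARY KEY, buildrequestid INTEGER, builderid INTEGER);
"""

# Claimed build requests of one buildset that never got a build.
CLAIMED_WITHOUT_BUILD = """
SELECT br.id, br.builderid
FROM buildrequests AS br
JOIN buildrequest_claims AS c ON c.brid = br.id
LEFT JOIN builds AS b ON b.buildrequestid = br.id
WHERE br.buildsetid = ? AND b.id IS NULL
ORDER BY br.id
"""


def claimed_without_build(conn, buildsetid):
    """Return (buildrequest id, builder id) pairs that are claimed but have no build."""
    return conn.execute(CLAIMED_WITHOUT_BUILD, (buildsetid,)).fetchall()


def demo():
    """Build a tiny in-memory database that mimics the observed situation."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO buildrequests VALUES (?, ?, ?, ?)",
        [(1, 10, 136, 1), (2, 10, 137, 0), (3, 10, 138, 0)],
    )
    conn.executemany(
        "INSERT INTO buildrequest_claims VALUES (?, ?, ?)",
        [(1, 1, 0), (2, 1, 0), (3, 1, 0)],
    )
    # Only the first request ever turned into a build.
    conn.execute("INSERT INTO builds VALUES (1, 1, 136)")
    return claimed_without_build(conn, 10)


orphans = demo()
```

In this made-up data set, requests 2 and 3 are claimed but have no build, which matches the pattern seen in the sleuthing above.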


Issue still present, sadly.
In preparation for submitting the issue upstream, I'm first working on migrating buildbot away from SQLite (as the deployment guide recommends for non-small deployments).


The SQLite->Postgres migration has been performed on both UATEST and PROD. Tidying up leftovers now (bdr-devops-core configs, etc.).
The next step is to wait for the problem to re-occur with Postgres in place and create a report out of the occurrence to send upstream.
If using Postgres fixed the issue, even better.
To be continued.
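For readers unfamiliar with how the database backend is selected, the sketch below shows one way the `db_url` setting could be chosen in the master configuration. The `c["db"]["db_url"]` key and the SQLite default are Buildbot's documented configuration; the `BUILDBOT_DB_URL` environment variable and the helper function are assumptions for illustration and not taken from bdr-devops-core.

```python
import os

# Buildbot's default when no database is configured.
DEFAULT_DB_URL = "sqlite:///state.sqlite"


def db_url_from_env(env=os.environ):
    """Pick the database URL from a (hypothetical) environment variable.

    A Postgres URL has the form postgresql://user:password@host/dbname.
    """
    return env.get("BUILDBOT_DB_URL", DEFAULT_DB_URL)


c = BuildmasterConfig = {}
c["db"] = {"db_url": db_url_from_env()}
```

After changing the URL, the schema in the new database still has to be created with `buildbot upgrade-master`.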


If this build is anything to go by, it seems we're still seeing the same issue. This was a re-trigger of a previous build that was cancelled before the upgrade, but apart from that it shouldn't make a difference.

https://builder.blender.org/admin/#/builders/36/builds/11800


I've created an issue in the GitHub issue tracker for buildbot:
https://github.com/buildbot/buildbot/issues/7091


Currently trying to resolve this issue. One of the maintainers mentioned in the GitHub issue that it might have to do with collapsing build requests.


After disabling request collapsing globally in the Buildbot configuration and still seeing these pending builds disappear, I want to see if updating our Buildbot to the latest version will fix the issue at hand.

Bart van der Braak added this to the DevOps Progress Board project 2024-07-16 12:59:32 +02:00
Bart van der Braak self-assigned this 2024-07-16 13:38:20 +02:00

When trying to upgrade Buildbot to 4.0.1, I get the following error:

```
pipenv run buildbot checkconfig /home/blender/.devops/services/buildbot-master/master.cfg
error while parsing config file:
Traceback (most recent call last):
  File "/home/blender/.local/share/virtualenvs/buildbot-master-r-Y2g80k/lib/python3.9/site-packages/twisted/internet/defer.py", line 212, in maybeDeferred
    result = f(*args, **kwargs)
  File "/home/blender/.local/share/virtualenvs/buildbot-master-r-Y2g80k/lib/python3.9/site-packages/buildbot/scripts/checkconfig.py", line 58, in checkconfig
    return _loadConfig(basedir=basedir, configFile=configFile, quiet=quiet)
  File "/home/blender/.local/share/virtualenvs/buildbot-master-r-Y2g80k/lib/python3.9/site-packages/buildbot/scripts/checkconfig.py", line 28, in _loadConfig
    FileLoader(basedir, configFile).loadConfig()
  File "/home/blender/.local/share/virtualenvs/buildbot-master-r-Y2g80k/lib/python3.9/site-packages/buildbot/config/master.py", line 123, in loadConfig
    filename, config_dict = loadConfigDict(self.basedir, self.configFileName)
--- <exception caught here> ---
  File "/home/blender/.local/share/virtualenvs/buildbot-master-r-Y2g80k/lib/python3.9/site-packages/buildbot/config/master.py", line 85, in loadConfigDict
    execfile(filename, localDict)
  File "/home/blender/.local/share/virtualenvs/buildbot-master-r-Y2g80k/lib/python3.9/site-packages/twisted/python/compat.py", line 214, in execfile
    exec(code, globals, locals)
  File "/home/blender/.devops/services/buildbot-master/master.cfg", line 11, in <module>
    BuildmasterConfig = setup.setup()
  File "/home/blender/git/bdr-devops-core/buildbot/setup.py", line 43, in setup
    c["change_source"] = pipeline.change_sources()
  File "/home/blender/git/bdr-devops-core/buildbot/pipeline/__init__.py", line 58, in change_sources
    plugins_changes.GitPoller(
  File "/home/blender/.local/share/virtualenvs/buildbot-master-r-Y2g80k/lib/python3.9/site-packages/buildbot/changes/gitpoller.py", line 75, in __init__
    super().__init__(repourl, **kwargs)
  File "/home/blender/.local/share/virtualenvs/buildbot-master-r-Y2g80k/lib/python3.9/site-packages/buildbot/util/service.py", line 291, in __init__
    super().__init__(*args, **kwargs)
  File "/home/blender/.local/share/virtualenvs/buildbot-master-r-Y2g80k/lib/python3.9/site-packages/buildbot/util/service.py", line 186, in __init__
    self.checkConfig(*args, **kwargs)
builtins.TypeError: checkConfig() got an unexpected keyword argument 'pollinterval'
System.Management.Automation.RemoteException
Configuration Errors:
  error while parsing config file: checkConfig() got an unexpected keyword argument 'pollinterval' (traceback in logfile)
Command [pipenv] with arguments [run buildbot checkconfig /home/blender/.devops/services/buildbot-master/master.cfg] failed with exit code [1]
Caught error during invoke step [start]
Command [pipenv] with arguments [run buildbot checkconfig /home/blender/.devops/services/buildbot-master/master.cfg] failed with exit code [1]

------------------------------------------------------------
Error summary - [start]
------------------------------------------------------------
Traceback (most recent call last):
  error while parsing config file: checkConfig() got an unexpected keyword argument 'pollinterval' (traceback in logfile)

ERROR: Step [start] failed, halting pipeline !
<ScriptBlock>:/home/blender/git/bdr-devops-core/modules/powershell/core-shell.ps1:158
Core-Shell-Retry-Command<Process>:/home/blender/git/bdr-devops-core/modules/powershell/core-shell.ps1:52
Core-Shell-Invoke-Command:/home/blender/git/bdr-devops-core/modules/powershell/core-shell.ps1:121
Core-Shell-Invoke-Pipenv:/home/blender/git/bdr-devops-core/modules/powershell/core-shell.ps1:684
Buildbot-Check-MasterConfig:/home/blender/git/bdr-devops-core/cmd/buildbot/buildbot-master.ps1:115
Buildbot-Start-Master:/home/blender/git/bdr-devops-core/cmd/buildbot/buildbot-master.ps1:85
<ScriptBlock>:/home/blender/git/bdr-devops-core/cmd/buildbot/buildbot-master.ps1:342
Core-Shell-Invoke-Step:/home/blender/git/bdr-devops-core/modules/powershell/core-shell.ps1:741
Core-Shell-Invoke-Pipeline:/home/blender/git/bdr-devops-core/modules/powershell/core-shell.ps1:830
Main:/home/blender/git/bdr-devops-core/cmd/buildbot/buildbot-master.ps1:380
<ScriptBlock>:/home/blender/git/bdr-devops-core/cmd/buildbot/buildbot-master.ps1:419
<ScriptBlock>:<No file>:1
Pipeline failed, exiting !
```

I've upgraded Buildbot to 3.11.6 on UATEST for temporary testing, which needed the following changes:

  • `~/.devops/services/buildbot-master/Pipfile`

```
[[source]]
url = "https://pypi.python.org/simple"
verify_ssl = true
name = "pypi"

[packages]
pyyaml = "==6.0.1"
buildbot-worker = "==3.11.6"
pylint = "*"
requests = "*"
treq = "==22.2.0"
buildbot = "==3.11.6"
buildbot-www = "==3.11.6"
buildbot-console-view = "==3.11.6"
buildbot-grid-view = "==3.11.6"
buildbot-waterfall-view = "==3.11.6"

[dev-packages]

[requires]
python_version = "3.9"
```

  • `~/git/bdr-devops-core/buildbot/pipeline/__init__.py`

```
:%s/pollinterval/pollInterval/g
```

  • `cd /home/blender/git/bdr-devops-core && /snap/bin/pwsh -c "./cmd/buildbot/buildbot-master.ps1 -steps db-upgrade -serviceEnvId UATEST -serviceHostId pvep-lvm-buildbot-master-01"`

@bartvdbraak The `pollinterval` was a deprecated field, which was removed for 4.0: https://github.com/buildbot/buildbot/commit/9c459aad3765596a91b2545727cbb77dd259dd8c

The proper spelling is `pollInterval`, so it should be easy to update our code: https://docs.buildbot.net/latest/manual/configuration/changesources.html#gitpoller
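As a small illustration of the rename, the sketch below normalizes a dictionary of keyword arguments before they would be passed to `GitPoller`. The helper and the sample values are hypothetical and not part of bdr-devops-core or Buildbot; in practice, renaming the keyword directly at the call site is the simpler fix.

```python
def rename_deprecated_poll_kwarg(kwargs):
    """Return a copy of `kwargs` with `pollinterval` renamed to `pollInterval`.

    Raises ValueError if both spellings are present, since it would be
    unclear which value should win.
    """
    updated = dict(kwargs)
    if "pollinterval" in updated:
        if "pollInterval" in updated:
            raise ValueError("both 'pollinterval' and 'pollInterval' were given")
        updated["pollInterval"] = updated.pop("pollinterval")
    return updated


# Hypothetical arguments as they might be written against an older Buildbot.
old_style = {"repourl": "https://example.org/repo.git", "pollinterval": 300}
new_style = rename_deprecated_poll_kwarg(old_style)
```

The resulting dictionary could then be passed on as `GitPoller(**new_style)`; the original dictionary is left unchanged.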

> @bartvdbraak The `pollinterval` was a deprecated field, which was removed for 4.0: https://github.com/buildbot/buildbot/commit/9c459aad3765596a91b2545727cbb77dd259dd8c
>
> The proper spelling is `pollInterval`, so should be easy to update our code: https://docs.buildbot.net/latest/manual/configuration/changesources.html#gitpoller

I was already aware of this change:

  • https://gitlab.com/blender/bdr-devops-core/-/commit/d0717598c86f076d3b796f6e93df2bdfd14f9086

@bartvdbraak Ah, great! I didn't see that commit at the time I saw the comment here about issues with upgrade to 4.0.1.


After upgrading Buildbot to version 3.11.6 and adding the c["collapseRequests"] = False configuration, this issue no longer appears. I will close this issue for now, but I will reopen it if it happens again.
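For reference, a minimal sketch of where that setting sits in a master configuration. Only the `collapseRequests` line comes from this thread; the bare dictionary is a stand-in, since the real configuration here is built by `setup.setup()` in bdr-devops-core.

```python
# Stand-in for the full master configuration (assumption: everything else omitted).
c = BuildmasterConfig = {}

# Globally disable collapsing of compatible pending build requests,
# so each request results in its own build.
c["collapseRequests"] = False
```

Buildbot also accepts a `collapseRequests` argument on individual `BuilderConfig` entries, which overrides the global value for that builder.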
