Merging conflict checking takes a long time #43

Closed
opened 2023-02-19 14:30:00 +01:00 by Brecht Van Lommel · 10 comments

This can take for example 10 minutes, and causes developers to be blocked from merging their pull requests in the meantime.

When a commit is pushed to the main branch, all pull requests targeting that branch will begin conflict checking (patch_checker). This involves operations like making a temporary shared, bare clone of the repository. Even if this would take just a couple of seconds, multiply that by 125 open pull requests and it's going to take a while to get through everything.

A few ideas:

  • Prioritize the pull requests checking by recent activity, as these are most likely the ones developers are actively working on and trying to get merged.
  • Find a way to configure workers so that push_update and patch_checker can run in parallel instead of just 1 worker doing it. Also the number of workers never seems to go beyond 1.
  • Investigate performance of cloning on Ceph. On my laptop with SSD it's only a few seconds but it's likely worse. Perhaps the work path can be configured to be on a ramdisk or some other faster drive, assuming this either does not involve hard links or they can be preserved across filesystems.
  • Find some way to speed up cloning and potentially other operations.

There is also the following display bug. On pushing changes to a PR, it can show this error, which gets resolved once push_update completes.

This pull request is broken due to missing fork information.

Workers seem to be configured in such a way that first all the conflict checking has to be completed before it handles more push updates.

This can take for example 10 minutes, and causes developers to be blocked from merging their pull requests in the meantime. When a commit is pushed to the main branch, all pull requests targeting that branch will begin conflict checking (`patch_checker`). This involves operations like making a temporary shared, bare clone of the repository. Even if this would take just a couple of seconds, multiply that by 125 open pull requests and it's going to take a while to get through everything. A few ideas: * Prioritize the pull requests checking by recent activity, as these are most likely the ones developers are actively working on and trying to get merged. * Find a way to configure workers so that `push_update` and `patch_checker` can run in parallel instead of just 1 worker doing it. Also the number of workers never seems to go beyond 1. * Investigate performance of cloning on Ceph. On my laptop with SSD it's only a few seconds but it's likely worse. Perhaps the work path can be configured to be on a ramdisk or some other faster drive, assuming this either does not involve hard links or they can be preserved across filesystems. * Find some way to speed up cloning and potentially other operations. There is also the following display bug. On pushing changes to a PR, it can show this error, which gets resolved once `push_update` completes. ``` This pull request is broken due to missing fork information. ``` Workers seem to be configured in such a way that first all the conflict checking has to be completed before it handles more push updates.
Brecht Van Lommel added the
Type
Bug
label 2023-02-19 14:30:00 +01:00
Author
Owner

The error happens before push_update has run for the new commit, once it gets queued for pr_patch_checker it's gone.

The SHA is different in the following code in routers/web/repo/pull.go.

if pull.HeadRepo == nil || !headBranchExist || headBranchSha != sha {
    ctx.Data["IsPullRequestBroken"] = true

I imagine dedicating/prioritizing a worker to push_update would mostly hide this, since pushes are not that frequent and it's probably waiting for pr_patch_checker tasks to complete.

The error happens before `push_update` has run for the new commit, once it gets queued for `pr_patch_checker` it's gone. The SHA is different in the following code in `routers/web/repo/pull.go`. ``` if pull.HeadRepo == nil || !headBranchExist || headBranchSha != sha { ctx.Data["IsPullRequestBroken"] = true ``` I imagine dedicating/prioritizing a worker to `push_update` would mostly hide this, since pushes are not that frequent and it's probably waiting for `pr_patch_checker` tasks to complete.
Author
Owner

Here's an example that seems to have been stuck on checking for days with nothing pending to update it:
blender/blender#104677

Not sure how that happens, if for example restarting the server gets things stuck, or if it crashed while checking.

Here's an example that seems to have been stuck on checking for days with nothing pending to update it: https://projects.blender.org/blender/blender/pulls/104677 Not sure how that happens, if for example restarting the server gets things stuck, or if it crashed while checking.
Author
Owner

It appears the order of conflict checking is from oldest to newest PR, which is almost the opposite of what we want. I made a simple change in our branch to prioritize most recently updated, though this is not global. We can see if this helps.
cb362b7e28

I also confirmed that it is possible for push_update and patch_checker works to run in parallel, and I think we should tweak our configuration to ensure one is not blocked by the other.

It appears the order of conflict checking is from oldest to newest PR, which is almost the opposite of what we want. I made a simple change in our branch to prioritize most recently updated, though this is not global. We can see if this helps. https://github.com/blender/gitea/commit/cb362b7e2820bccf881c919389dd62b052b4e721 I also confirmed that it is possible for `push_update` and `patch_checker` works to run in parallel, and I think we should tweak our configuration to ensure one is not blocked by the other.
Author
Owner

Regarding stuck pull requests. It appears that on 2023/02/17 14:35:36 there were a ton of different errors, various different operations failing at the same time or in quick succession.

This includes conflict checking for a few dozens PRs that (from a quick sample) all seems to be stuck waiting.

@Arnd not sure if you remember anything particular happening at that time, or can correlate it with something.

Regarding stuck pull requests. It appears that on `2023/02/17 14:35:36` there were a ton of different errors, various different operations failing at the same time or in quick succession. This includes conflict checking for a few dozens PRs that (from a quick sample) all seems to be stuck waiting. @Arnd not sure if you remember anything particular happening at that time, or can correlate it with something.
Author
Owner

This configuration may help speeding things up:

[queue.push_update]
WORKERS = 1
[queue.pr_patch_checker]
WORKERS = 2

It's also configurable through the web UI (and lost on restart). So I can experiment with it a bit there to see if it works well.

This configuration may help speeding things up: ``` [queue.push_update] WORKERS = 1 [queue.pr_patch_checker] WORKERS = 2 ``` It's also configurable through the web UI (and lost on restart). So I can experiment with it a bit there to see if it works well.
Author
Owner

There's a workaround in the branch now for the stuck conflict checking:
2093ca5175

Conflict checking still seems slow for new PRs even with the ordering and workers. Maybe because the ordering is not global.

There's a workaround in the branch now for the stuck conflict checking: https://github.com/blender/gitea/commit/2093ca517539f57e5d04528f98c1f2bda93cbb6b Conflict checking still seems slow for new PRs even with the ordering and workers. Maybe because the ordering is not global.
Author
Owner

Patch to speed things up submitted upstream:
https://github.com/go-gitea/gitea/pull/23219

Patch to speed things up submitted upstream: https://github.com/go-gitea/gitea/pull/23219
Author
Owner

The various fixes were deployed, I will consider this resolved.

The various fixes were deployed, I will consider this resolved.
Author
Owner

This still seems to take longer than I expect, I noticed it taking more than 10 minutes after a commit.

Why is not clear, checking the running processes I didn't see the previously slow git commands. There were git cat-file processes running, but those seem to be used for many purposes so it's not clear what they are for and if they are holding up conflict checking. Maybe language stats? Or just viewing of big commits or diffs.

Ideally we could replicate this on the test instance with trace logging enabled, though also not sure it's worth spending much time on this now.

This still seems to take longer than I expect, I noticed it taking more than 10 minutes after a commit. Why is not clear, checking the running processes I didn't see the previously slow git commands. There were `git cat-file` processes running, but those seem to be used for many purposes so it's not clear what they are for and if they are holding up conflict checking. Maybe language stats? Or just viewing of big commits or diffs. Ideally we could replicate this on the test instance with trace logging enabled, though also not sure it's worth spending much time on this now.
Brecht Van Lommel changed title from Merging conflict checking takes a long time and shows as broken in the meantime to Merging conflict checking takes a long time 2023-04-20 20:39:39 +02:00
Author
Owner

Workaround for this is now:

  • Only immediately check PRs updated in the latest 24 hours.
  • For other PRs, start checking when loading the page. That means usually after 10 seconds or less, reloading the page should show the results.
Workaround for this is now: * Only immediately check PRs updated in the latest 24 hours. * For other PRs, start checking when loading the page. That means usually after 10 seconds or less, reloading the page should show the results.
Sign in to join this conversation.
No Milestone
No project
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: infrastructure/blender-projects-platform#43
No description provided.