Gitea: Guard against unknown submodule hashes #125

Open
opened 2024-09-27 09:58:38 +02:00 by Sergey Sharybin · 6 comments

Sometimes incidents happen: changes to `blender.git` land that point to non-existent hashes in a submodule repository.

Example: https://builder.blender.org/admin/#/builders/30/builds/18549

For developers and artists recovery is quite easy: run `make update` a second time. Still annoying, and it could be confusing.
For the buildbot it is much more annoying to recover, as `git pull` needs to happen twice on all the workers, for all the tracks.

Ideally we'd have a receive hook which verifies that the submodule hash exists. It might be tricky to detect the change to the hash. More importantly, the hash check should be cheap and should not require a checkout. Not sure there is a way to query a repository by URL to check for a commit. We could also just hardcode some paths on the local drive to the libs/tests folders.
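On detecting the change cheaply: a submodule hash bump shows up as a changed gitlink, which is a tree entry with mode `160000`. `git diff-tree` reports gitlinks without any checkout, so this also works in a bare repository. Below is a minimal sketch that assumes SHA-1 object names and paths without spaces. The function name `changed_gitlinks` is hypothetical.

```shell
#!/bin/sh
# List submodule pointers (gitlinks, tree mode 160000) added or changed between two
# revisions, one "<sha> <path>" per line. Only reads trees, so no checkout is needed.
changed_gitlinks() {  # $1 = old revision, $2 = new revision
  old="$1"
  # A new branch arrives with an all-zero old revision (SHA-1 repositories):
  # diff against the empty tree so every gitlink counts as new.
  if [ "$old" = 0000000000000000000000000000000000000000 ]; then
    old=$(git hash-object -t tree /dev/null)
  fi
  # Raw diff-tree lines look like ":<oldmode> <newmode> <oldsha> <newsha> <status>\t<path>"
  git diff-tree -r "$old" "$2" |
    awk -F '\t' '{ split($1, m, " "); if (m[2] == "160000") print m[4], $2 }'
}
```

Deleted submodules, whose new mode is `000000`, are skipped because there is nothing to validate.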

If an automated check is not possible for whatever reason, it would be good to automate recovery from this situation on the buildbot. Maybe parse the `make update` output for `not our ref` and run it again?
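Here is a rough sketch of that retry on the buildbot side. It assumes the failure output contains the text `not our ref`. `retry_on_not_our_ref` is a hypothetical helper, not something the buildbot already provides.

```shell
#!/bin/sh
# Run a command (e.g. `make update`); if its combined output mentions
# "not our ref", run it exactly once more and return that run's status.
retry_on_not_our_ref() {
  out=$("$@" 2>&1)
  status=$?
  printf '%s\n' "$out"
  case "$out" in
    *"not our ref"*)
      echo "Detected 'not our ref', running again: $*" >&2
      "$@"
      return $?
      ;;
  esac
  # No match: keep the original exit status
  return "$status"
}
```

Usage would be `retry_on_not_our_ref make update`. The first run's output is printed only after it finishes, because it is captured to search for the message. Only one retry is attempted, so a hash that is genuinely missing still fails the build.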

Sergey Sharybin added the Service Gitea label 2024-09-27 09:58:38 +02:00

> Not sure there is a way to query repository by URL to check commit.

There is; something like this would work:

```
curl "https://projects.blender.org/api/v1/repos/blender/blender-test-data/commits?limit=1&sha=0ff59eafc7658787b55db167b4157e7eed40bc6c" | grep sha | wc -l
```

It will return 0 for a non-existing hash in the repo and 1 for an existing one.
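The same check can be wrapped in functions so that a hook can feed it the `url` values from `.gitmodules`. This sketch makes two assumptions. First, clone URLs are https Gitea URLs of the form `https://<host>/<owner>/<repo>.git`. Second, an existing commit's JSON contains a `"sha"` key. Both function names are hypothetical.

```shell
#!/bin/sh
# Turn a submodule clone URL into the Gitea API query used above.
gitea_api_commits_url() {  # $1 = clone URL, $2 = commit SHA
  rest="${1#*://}"   # e.g. projects.blender.org/blender/blender-test-data.git
  repo="${rest#*/}"  # e.g. blender/blender-test-data.git
  printf '%s://%s/api/v1/repos/%s/commits?limit=1&sha=%s\n' \
    "${1%%://*}" "${rest%%/*}" "${repo%.git}" "$2"
}

# Exit 0 when the API answers with a commit object containing "sha".
# curl -f makes HTTP error responses (e.g. 404) count as "not found".
submodule_commit_exists() {  # $1 = clone URL, $2 = commit SHA
  curl -sf "$(gitea_api_commits_url "$1" "$2")" | grep -q '"sha"'
}
```

For example: `submodule_commit_exists https://projects.blender.org/blender/blender-test-data.git 0ff59eafc7658787b55db167b4157e7eed40bc6c && echo found`.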


I took the time to cook up a rough pre-receive Git hook script:

```
#!/bin/bash

ERROR=0
ZERO=0000000000000000000000000000000000000000

# check if a ref exists in the remote submodule repository
check_submodule_ref() {
  local url="$1"
  local sha="$2"
  local exists

  # query API and count lines mentioning "sha" (0 means the commit is unknown)
  exists=$(curl -s "$url" | grep -c sha)

  # when the ref does not exist, set the ERROR flag and notify
  if [ "$exists" -eq 0 ]; then
    echo "Error: Invalid ref $sha for submodule $url"
    ERROR=1
  fi
}

# extract the submodule ref, path, and repo URL
check_submodules() {
  local line sha path url rest repo api_url
  while read -r line; do
    [ -z "$line" ] && continue
    sha=$(echo "$line" | awk '{print $1}')
    path=$(echo "$line" | awk '{print $2}')

    # strip the status prefix (-, + or U) that `git submodule status` may put before the SHA
    sha="${sha#[-+U]}"

    # get corresponding URL for the submodule in the .gitmodules file
    url=$(git config --file .gitmodules --get-regexp "submodule\.$path\.url" | awk '{print $2}')

    # turn an https clone URL such as https://projects.blender.org/blender/lib-linux_x64.git
    # into the API endpoint https://projects.blender.org/api/v1/repos/blender/lib-linux_x64/commits
    rest="${url#*://}"
    repo="${rest#*/}"
    api_url="${url%%://*}://${rest%%/*}/api/v1/repos/${repo%.git}/commits?limit=1&sha=${sha}"

    # check the ref in the remote repo
    check_submodule_ref "$api_url" "$sha"
  done <<< "$(git submodule status)"
}

# loop incoming references to check for submodule changes
while read -r oldrev newrev refname; do
  # nothing to validate when a branch is deleted
  [ "$newrev" = "$ZERO" ] && continue
  # a new branch has no old revision: compare against the empty tree
  [ "$oldrev" = "$ZERO" ] && oldrev=$(git hash-object -t tree /dev/null)

  # a submodule hash bump changes a gitlink (mode 160000), not necessarily .gitmodules
  if git diff-tree -r "$oldrev" "$newrev" | grep -qE '^:[0-7]+ 160000 |\.gitmodules$'; then
    echo "Submodule changes detected, validating refs..."

    # run the submodules check
    check_submodules

    # reject the push when an error is found
    if [ "$ERROR" -ne 0 ]; then
      echo "Push rejected due to invalid submodule references."
      exit 1
    fi
  fi
done

exit 0
```

It's quite a bit of logic, and it will need to be tested as well as reviewed by others.

Author
Owner

> `url=$(git config --file .gitmodules --get-regexp "submodule\.$path\.url" | awk '{print $2}')`

I don't think this will work on the server side: there the repository is stored as "bare", so it does not contain a `.gitmodules` file.

There are some ways described on the Internet to get the contents of a file in a bare repository. Perhaps one of them would work well enough.
However, I think the current script will error out if a new submodule is added. Imagine when we add the linux_arm64 libraries: ideally the sanitization script would let us do that.

Another aspect of checking the commit: would the fetch fail for a commit which is not on any branch but is not yet garbage-collected? Imagine a situation where you force-push to a library repository branch locally, and then push blender.git.
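Suppose the hook has local clones of the library repositories to consult, as in the hardcoded-paths option from the issue description. Then the two cases can be told apart: a commit object that still exists, and a commit that is reachable from a branch. Whether a client can fetch an unreachable commit depends on server settings such as `uploadpack.allowAnySHA1InWant`. Requiring reachability is therefore the conservative choice. The repository path and both function names below are hypothetical.

```shell
#!/bin/sh
# $1 is the path of a local clone or mirror of the library repository (hypothetical).

# Exit 0 if the commit object exists at all, including commits no branch points to
# any more (e.g. after a force-push) that have not been garbage-collected yet.
commit_exists() {  # $1 = repository path, $2 = commit SHA
  git -C "$1" cat-file -e "$2^{commit}" 2>/dev/null
}

# Exit 0 only if the commit is reachable from at least one branch.
commit_on_branch() {  # $1 = repository path, $2 = commit SHA
  [ -n "$(git -C "$1" for-each-ref --contains "$2" refs/heads 2>/dev/null)" ]
}
```

In the force-push scenario above, `commit_exists` would still succeed for the dropped commit while `commit_on_branch` would fail.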


> On the server side the repository is stored as "bare", so it would not contain `.gitmodules` file.

Can we use `git show "$newrev:.gitmodules"` instead?

```
git@gitea-linux-uatest:/gitea/work_dir/data/gitea-repositories/blender/blender.git$ git show "428e67d7f9a99e2a91aa49a6fea0f4792242b164:.gitmodules"
[submodule "lib/linux_x64"]
        update = none
        path = lib/linux_x64
        url = https://projects.blender.org/blender/lib-linux_x64.git
        branch = main
[submodule "lib/macos_arm64"]
        update = none
        path = lib/macos_arm64
        url = https://projects.blender.org/blender/lib-macos_arm64.git
        branch = main
[submodule "lib/macos_x64"]
        update = none
        path = lib/macos_x64
        url = https://projects.blender.org/blender/lib-macos_x64.git
        branch = main
[submodule "lib/windows_x64"]
        update = none
        path = lib/windows_x64
        url = https://projects.blender.org/blender/lib-windows_x64.git
        branch = main
[submodule "release/datafiles/assets"]
        path = release/datafiles/assets
        url = https://projects.blender.org/blender/blender-assets.git
        branch = main
[submodule "tests/data"]
        update = none
        path = tests/data
        url = https://projects.blender.org/blender/blender-test-data.git
        branch = main
```
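Beyond `git show`, `git config --blob` can parse that `.gitmodules` directly from a revision, and `git ls-tree` gives each submodule's recorded commit. No working tree is needed for either. Both are read from the same revision, so a submodule added in the same push (such as future linux_arm64 libraries) is listed too. This is a sketch, and `submodule_pointers` is a hypothetical name.

```shell
#!/bin/sh
# Print "<path> <sha> <url>" for every submodule recorded at a revision, reading
# .gitmodules and the gitlinks straight from the object database.
submodule_pointers() {  # $1 = revision, e.g. $newrev in a pre-receive hook
  rev="$1"
  git config --blob "$rev:.gitmodules" --get-regexp '^submodule\..*\.path$' |
    while read -r key path; do
      # key looks like "submodule.lib/linux_x64.path"
      name="${key#submodule.}"
      name="${name%.path}"
      url=$(git config --blob "$rev:.gitmodules" --get "submodule.$name.url")
      # ls-tree prints "<mode> <type> <sha>\t<path>"; a submodule entry has type "commit"
      sha=$(git ls-tree "$rev" -- "$path" | awk '$2 == "commit" { print $3 }')
      if [ -n "$sha" ]; then
        echo "$path $sha $url"
      fi
    done
}
```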
Author
Owner

Intuitively, yes. I am not sure how `git show` handles the new hash from within a receive hook. It is one of those things we'd need to try and see :)
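For reference, since Git 2.11 the objects of an incoming push are kept in a quarantine area that the pre-receive hook can read. They are moved into the repository only if the hook exits 0. So `git show`, `git diff-tree` and `git config --blob` on `$newrev` should work there. Putting the pieces of this thread together, a bare-repository hook could look roughly like the sketch below. Its assumptions are listed in the header comment, and all function names are hypothetical.

```shell
#!/bin/sh
# Sketch of a pre-receive hook for the bare blender.git repository. Assumptions:
# submodule names equal their paths (as in the .gitmodules shown above), URLs are
# https Gitea clone URLs, the Gitea /commits API returns JSON with "sha" for
# existing commits, and object names are SHA-1 (for the all-zero revision).
zero=0000000000000000000000000000000000000000

remote_commit_exists() {  # $1 = clone URL, $2 = commit SHA
  rest="${1#*://}"
  repo="${rest#*/}"
  curl -sf "${1%%://*}://${rest%%/*}/api/v1/repos/${repo%.git}/commits?limit=1&sha=$2" |
    grep -q '"sha"'
}

# Print "<path> -> <sha>" for each added/changed submodule pointer whose commit is unknown.
unknown_submodule_commits() {  # $1 = old revision, $2 = new revision
  old="$1"
  [ "$old" = "$zero" ] && old=$(git hash-object -t tree /dev/null)
  git diff-tree -r "$old" "$2" |
    awk -F '\t' '{ split($1, m, " "); if (m[2] == "160000") print m[4], $2 }' |
    while read -r sha path; do
      url=$(git config --blob "$2:.gitmodules" --get "submodule.$path.url" 2>/dev/null)
      if [ -z "$url" ] || ! remote_commit_exists "$url" "$sha"; then
        echo "$path -> $sha"
      fi
    done
}

pre_receive_main() {
  status=0
  while read -r oldrev newrev refname; do
    [ "$newrev" = "$zero" ] && continue   # branch deletion: nothing to check
    bad=$(unknown_submodule_commits "$oldrev" "$newrev")
    if [ -n "$bad" ]; then
      echo "Push to $refname rejected, unknown submodule commits:" >&2
      echo "$bad" >&2
      status=1
    fi
  done
  return "$status"
}

# Run only when this file is installed as the hook itself.
if [ "$(basename "$0")" = pre-receive ]; then
  pre_receive_main
  exit $?
fi
```

Keeping the network call in `remote_commit_exists` means it can be replaced by a stub when trying the hook locally. A submodule without a `.gitmodules` URL is reported rather than silently accepted.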


Some googling brought up https://github.com/rodrigo-lima/submodule_checker

Plus: this looks pretty close to what we are looking for.
Minus: the last commit was 10 years ago.

Sooo... it may need a little work.

Reference: infrastructure/blender-projects-platform#125