
Add GH Action workflow to update downstream repos#73

Open
joverlee521 wants to merge 3 commits into main from update-vendored


joverlee521 (Contributor) commented Apr 13, 2026

Description of proposed changes

Searches the Nextstrain GitHub org to find repos that have a .gitrepo file with the nextstrain/shared or nextstrain/ingest remote, to create a matrix of repos to potentially update. Installs and uses git subrepo to pull in the latest changes with the --force flag to avoid merge conflicts due to rebasing in the downstream repo. If changes are pulled down, git subrepo creates a single commit. If there is a single commit, the workflow pushes the changes to a branch and creates or updates the PR in the downstream repo. Nothing happens if there are no changes, and the workflow exits with an error if it encounters more than one commit.
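
The commit-count rule above (no commit: do nothing; one commit: push and open or update the PR; more: fail) can be sketched as a small bash function. This is a hypothetical illustration, not the workflow's actual code: `decide_update` is an assumed name, and the commented lines show one assumed way to obtain the count with `git rev-list --count`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: map the number of commits created by
# `git subrepo pull` to the action described in the PR.
set -euo pipefail

decide_update() {
  local changes="$1"
  case "$changes" in
    0) echo "skip" ;;              # nothing new upstream; do nothing
    1) echo "push" ;;              # the single commit git subrepo creates
    *) echo "error"; return 1 ;;   # anything else is unexpected; fail the job
  esac
}

# Assumed usage inside the workflow step (not executed here):
#   base=$(git rev-parse HEAD)
#   git subrepo pull "$VENDORED_PATH"
#   changes=$(git rev-list --count "$base..HEAD")
#   action=$(decide_update "$changes")
```

Failing on the "more than one commit" case, rather than pushing anyway, keeps unexpected history out of the downstream PRs.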

Related issue(s)

Prompted by nextstrain/public#39 (comment)

Checklist

  • Checks pass
  • If adding a script, add an entry for it in the README.
  • Test run creates/updates PRs in vendored repos
  • latest test run

git config user.email "${{ vars.GIT_USER_EMAIL_NEXTSTRAIN_BOT }}"

git switch -c "$branch"
git subrepo pull "$VENDORED_PATH" --force
Member

(it's been (thankfully) a while since I interacted with subrepo) if there were local changes, in what situation(s) would we want to override them?

Contributor Author

In practice, I have not seen any of the vendored repos use long-term local changes. I've only seen temporary tests of local changes that are then ported to the central nextstrain/shared repo. I've mostly had to use the --force flag to work around failures due to rebasing in the vendored repo, as described in the README:

shared/README.md

Lines 38 to 44 in c29898f

> **Warning**
> Beware of rebasing/dropping the parent commit of a `git subrepo` update
`git subrepo` relies on metadata in the `shared/vendored/.gitrepo` file,
which includes the hash for the parent commit in the pathogen repos.
If this hash no longer exists in the commit history, there will be errors when
running future `git subrepo pull` commands.
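
Since `.gitrepo` uses git-config syntax, one could check up front whether the recorded parent commit is still in the downstream repo's history before attempting a pull. A minimal sketch, assuming git subrepo stores that commit under the `parent` key of the `[subrepo]` section; `gitrepo_parent_exists` is a hypothetical helper name. It needs a full-history checkout (e.g. `fetch-depth: 0`), otherwise the parent may simply not have been fetched.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the parent commit recorded in
# <path>/.gitrepo is an ancestor of HEAD in the current repository.
set -euo pipefail

gitrepo_parent_exists() {
  local path="$1" parent
  # Assumption: .gitrepo is git-config format with a [subrepo] section.
  parent=$(git config --file "$path/.gitrepo" subrepo.parent) || return 1
  # Exits 0 only if $parent is reachable from HEAD.
  git merge-base --is-ancestor "$parent" HEAD 2>/dev/null
}

# Assumed usage:
#   gitrepo_parent_exists shared/vendored \
#     || echo "parent commit missing; git subrepo pull will likely fail"
```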

Member

--force is only necessary if someone had done a manual rebase of the git subrepo-generated commit, right? That shouldn't be an issue going forwards with these automated PRs.

I'd vote to drop --force for visibility into any local changes, instead of silently discarding them.
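
Dropping `--force`, as suggested here, means a pull that cannot be applied cleanly fails instead of silently discarding local changes. A minimal sketch of surfacing that failure in the Actions log, assuming the step runs bash; `pull_vendored` is a hypothetical name, while `::error::` is the standard GitHub Actions workflow command for an error annotation.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: pull without --force and fail loudly if git subrepo
# cannot apply the update (e.g. local edits or a missing parent commit).
set -euo pipefail

pull_vendored() {
  local path="$1"
  if ! git subrepo pull "$path"; then
    # Emit an error annotation so the failure shows up on the run summary.
    echo "::error::git subrepo pull failed for ${path}; manual update needed"
    return 1
  fi
}

# Assumed usage in the workflow step:
#   pull_vendored "$VENDORED_PATH"
```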

Comment thread .github/workflows/update-vendored.yaml Outdated
Comment on lines +30 to +31
matrix=$(gh api -X GET search/code \
-f q='org:nextstrain filename:.gitrepo "remote = https://github.com/nextstrain/shared"' \
Member

This should also search for references to the old repo name nextstrain/ingest. See code search

Contributor Author

Ah, good catch, totally forgot about the old name.

Contributor Author

Added in ada3160.

This actually flagged an issue that I didn't consider before: avian-flu has both shared/vendored and ingest/vendored, which causes one of the pushes to fail because of mismatched ref. avian-flu should be cleaned up to only have a single copy of the vendored repo, but this workflow should probably also flag when there are multiple vendored paths for the same repo.

Contributor Author

avian-flu/ingest/vendored will need to be removed as part of nextstrain/avian-flu#67.

Contributor Author

Added ab5399f to deduplicate by repo and only update nextstrain/shared if both are in the repo. I decided not to add another step for flagging multiple paths since that feels out of the purview of this repo...

Member

> This actually flagged an issue that I didn't consider before: avian-flu has both shared/vendored and ingest/vendored, which causes one of the pushes to fail because of mismatched ref.

The alternative is to push to different branches and open separate PRs for each. But that requires additional changes and is probably not worth it if avian-flu is the only exception, especially since its duplicate copy is slated for removal anyway.

joverlee521 changed the title from "Add GH Action workflow to update vendored repos" to "Add GH Action workflow to update downstream repos" on Apr 14, 2026
Searches the Nextstrain GitHub org to find repos that have the `.gitrepo`
file with the nextstrain/shared remote to create a matrix of repos to
potentially update. Installs and uses `git subrepo` to pull in the latest
changes with the `--force` flag to avoid merge conflicts due to
rebasing in the downstream repo. If there are changes pulled down, then
`git subrepo` will create a single commit. If there is a single commit,
then push up the changes to a branch and create or update the PR in the
downstream repo. Nothing happens if there were no changes, and the workflow
exits with an error if it encounters more than one commit.
joverlee521 force-pushed the update-vendored branch 3 times, most recently from ca62f4a to e3df43a on April 14, 2026 22:12
I was unable to get the search/code API to work with the 'OR' syntax
so just added a separate query for nextstrain/ingest and concatenated
the two arrays. Deduplicated the final array to guard against potential
overlap.
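
A per-remote query loop like the one this commit message describes could be sketched as follows. This is a hypothetical sketch: the function names and the `repo path remote` line format are assumptions, and the jq expression for the directory mirrors the one suggested later in this thread.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: run one code-search query per remote instead of a
# single query with OR, emitting "repo path remote" lines.
set -euo pipefail

gitrepo_query() {
  printf 'org:nextstrain filename:.gitrepo "remote = %s"' "$1"
}

search_remote() {
  local remote="$1" repo path
  gh api --paginate -X GET search/code -f q="$(gitrepo_query "$remote")" \
    --jq '.items[] | "\(.repository.full_name) \(.path | split("/")[0:-1] | join("/"))"' \
  | while read -r repo path; do
      printf '%s %s %s\n' "$repo" "$path" "$remote"
    done
}

# Assumed usage (requires an authenticated gh CLI; not executed here):
#   for remote in https://github.com/nextstrain/shared https://github.com/nextstrain/ingest; do
#     search_remote "$remote"
#   done | sort -u
```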
joverlee521 (Contributor Author)

nextstrain/rubella#53 shows an example of an automated update causing a workflow error flagged by CI. I created a separate branch/PR to fix the error in nextstrain/rubella#54 so that future updates from this GHA do not overwrite the manual fixes.

We should only be keeping a single copy of the vendored repo in each
downstream repo, so deduplicate the matrix by repo. In cases where there
are multiple copies, we are prioritizing the `nextstrain/shared` remote
since that is the newer version.

This is prompted by the error in the workflow when avian-flu had two
paths to update.

<#73 (comment)>
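
The deduplication rule in this commit message (one entry per repo, preferring the `nextstrain/shared` remote) is illustrated below as a pure-bash sketch over hypothetical `repo path remote` lines. The workflow itself builds a JSON matrix, so this shows the rule rather than the PR's implementation. It assumes bash 4.4 or newer for associative arrays under `set -u`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: keep one vendored path per repo, preferring the
# nextstrain/shared remote over the older nextstrain/ingest name.
# Reads "repo path remote" lines on stdin.
set -euo pipefail

SHARED_REMOTE="https://github.com/nextstrain/shared"

dedupe_by_repo() {
  local -A chosen=()
  local repo path remote
  while read -r repo path remote; do
    [[ -n "$repo" ]] || continue
    if [[ "$remote" == "$SHARED_REMOTE" ]]; then
      chosen[$repo]="$path"          # shared always wins
    elif [[ -z "${chosen[$repo]:-}" ]]; then
      chosen[$repo]="$path"          # ingest only if nothing chosen yet
    fi
  done
  for repo in "${!chosen[@]}"; do
    printf '%s %s\n' "$repo" "${chosen[$repo]}"
  done | sort
}
```

Input order does not matter: a `nextstrain/shared` entry replaces an earlier `nextstrain/ingest` entry for the same repo, and a later `nextstrain/ingest` entry is ignored.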
Comment on lines +30 to +31
shared_matrix=$(gh api -X GET search/code \
-f q='org:nextstrain filename:.gitrepo "remote = https://github.com/nextstrain/shared"' \
Member

GitHub API calls return only 30 results per page by default. --paginate + --slurp should work well to get everything. Something like this (copied from LLM output, untested):

shared_matrix=$(gh api --paginate --slurp -X GET search/code \
  -f q='org:nextstrain filename:.gitrepo "remote = https://github.com/nextstrain/shared"' \
  | jq -c '
      [.[].items[] | {
        "repo": .repository.full_name,
        "path": (.path | split("/")[0:-1] | join("/"))
      }]
    ')

Same goes for the other API call.

Comment on lines +111 to +112
if [[ "$changes" == "1" ]]; then
git push --force origin HEAD
Member

How about --force-with-lease? This would allow us to make edits directly to the PR safely, without risk of them being overwritten by a bot force-push, as an alternative to creating a new PR entirely (e.g. nextstrain/rubella#54).

Member

👍 I exclusively use --force-with-lease now for this reason
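
A caveat worth checking when adopting this: if the job's checkout has already fetched the bot branch, plain `--force-with-lease` only compares against that fetch, so commits a person pushed before the run started could still be overwritten. The hypothetical sketch below pins the lease to an explicit SHA and skips the push when the branch tip was not committed by the bot. `push_bot_branch`, `lease_arg`, and the committer-email check are assumptions for illustration, not part of this PR. The empty-expected-value behavior follows the git-push documentation.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: force-push the bot branch only if its current tip was
# committed by the bot, pinning the lease to the SHA that was checked.
set -euo pipefail

# An empty expected value means "the remote ref must not exist yet".
lease_arg() {
  printf -- '--force-with-lease=refs/heads/%s:%s' "$1" "$2"
}

push_bot_branch() {
  local branch="$1" bot_email="$2" remote_sha="" tip_email
  # Fetch the existing bot branch (if any) into FETCH_HEAD.
  if git fetch --quiet origin "refs/heads/${branch}" 2>/dev/null; then
    remote_sha=$(git rev-parse FETCH_HEAD)
    tip_email=$(git log -1 --format=%ce FETCH_HEAD)
    if [[ "$tip_email" != "$bot_email" ]]; then
      echo "::warning::${branch} has commits by ${tip_email}; not overwriting"
      return 0
    fi
  fi
  # The remote rejects this if the branch moved since the check above.
  git push "$(lease_arg "$branch" "$remote_sha")" origin "HEAD:refs/heads/${branch}"
}
```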

repository: ${{ matrix.repo }}
token: ${{ secrets.GH_TOKEN_NEXTSTRAIN_BOT_REPO }}
# Checkout git-subrepo _after_ the downstream repo to ensure that we
# keep it in a path within the downstream repo that does not interefere
Member

Suggested change
# keep it in a path within the downstream repo that does not interefere
# keep it in a path within the downstream repo that does not interfere

GH_TOKEN: ${{ secrets.GH_TOKEN_NEXTSTRAIN_BOT_REPO }}
title: '[bot] Update ${{ env.VENDORED_PATH }}'
body: |
This PR was automaticaly created by http://github.com/nextstrain/shared/actions/runs/${{ github.run_id }}
Member

Suggested change
This PR was automaticaly created by http://github.com/nextstrain/shared/actions/runs/${{ github.run_id }}
This PR was automatically created by https://github.com/nextstrain/shared/actions/runs/${{ github.run_id }}


Labels

None yet

Projects

None yet


3 participants