Add GH Action workflow to update downstream repos #73
joverlee521 wants to merge 3 commits into main
| git config user.email "${{ vars.GIT_USER_EMAIL_NEXTSTRAIN_BOT }}" | ||
| git switch -c "$branch" | ||
| git subrepo pull "$VENDORED_PATH" --force |
(it's been (thankfully) a while since I interacted with subrepo) if there were local changes, in what situation(s) would we want to override them?
In practice, I have not seen any of the vendored repos use long-term local changes. I've only seen temporary tests of local changes that are then ported to the central nextstrain/shared repo. I've mostly had to use the --force flag to work around failures due to rebasing in the vendored repo, as described in the README:
Lines 38 to 44 in c29898f
--force is only necessary if someone had done a manual rebase of the git subrepo-generated commit, right? That shouldn't be an issue going forwards with these automated PRs.
I'd vote to drop --force for visibility into any local changes, instead of silently discarding them.
| matrix=$(gh api -X GET search/code \ | ||
| -f q='org:nextstrain filename:.gitrepo "remote = https://github.com/nextstrain/shared"' \ |
This should also search for references to the old repo name nextstrain/ingest. See code search
Ah, good catch, totally forgot about the old name.
Added in ada3160.
This actually flagged an issue that I didn't consider before: avian-flu has both shared/vendored and ingest/vendored, which causes one of the pushes to fail because of mismatched ref. avian-flu should be cleaned up to only have a single copy of the vendored repo, but this workflow should probably also flag when there are multiple vendored paths for the same repo.
avian-flu/ingest/vendored will need to be removed as part of nextstrain/avian-flu#67.
Added ab5399f to deduplicate by repo and only update nextstrain/shared if both are in the repo. I decided not to add another step for flagging multiple paths since that feels out of the purview of this repo...
> This actually flagged an issue that I didn't consider before: avian-flu has both `shared/vendored` and `ingest/vendored`, which causes one of the pushes to fail because of mismatched ref.
The alternative is to push to different branches and open separate PRs for each. But that requires additional changes and is probably not worth it if avian-flu is the only exception, which is slated for removal anyway.
Searches the Nextstrain GitHub org to find repos that have the `.gitrepo` file with the nextstrain/shared remote to create a matrix of repos to potentially update. Installs and uses `git subrepo` to pull in the latest changes with the `--force` flag to avoid merge conflicts due to rebasing in the downstream repo. If there are changes pulled down, then `git subrepo` will create a single commit. If there is a single commit, then push up the changes to a branch and create or update the PR in the downstream repo. Nothing happens if there were no changes, and the workflow exits with an error if it encounters more than one commit.
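The three outcomes in this commit message (no commits, exactly one, more than one) can be sketched as a small shell check. This is only an illustration under assumptions: `decide_action`, `count_new`, and the use of `git rev-list --count` against a recorded base commit are hypothetical names and choices, not necessarily how the workflow computes its `changes` value.

```sh
#!/bin/sh
set -eu

# Map the number of new commits to the action described in the commit message.
# (Hypothetical helper, not taken from the workflow.)
decide_action() {
  case "$1" in
    0) echo "skip" ;;   # nothing was pulled, so do nothing
    1) echo "push" ;;   # the single commit created by `git subrepo pull`
    *) echo "error" ;;  # more than one commit is unexpected
  esac
}

# Throwaway repository standing in for a downstream checkout.
repo=$(mktemp -d)
git init --quiet "$repo"

commit() {
  git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit --quiet --allow-empty -m "$1"
}

# Count the commits added on top of the recorded base commit.
count_new() {
  git -C "$repo" rev-list --count "$base"..HEAD
}

commit "base"
base=$(git -C "$repo" rev-parse HEAD)

none=$(decide_action "$(count_new)")
commit "stand-in for the subrepo pull commit"
one=$(decide_action "$(count_new)")
commit "unexpected extra commit"
many=$(decide_action "$(count_new)")

echo "$none $one $many"
```

In a real run the `error` branch would presumably exit non-zero so the job fails visibly.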
Force-pushed from ca62f4a to e3df43a.
I was unable to get the search/code API to work with the 'OR' syntax so just added a separate query for nextstrain/ingest and concatenated the two arrays. Deduplicated the final array to guard against potential overlap.
Force-pushed from e3df43a to ada3160.
nextstrain/rubella#53 shows an example of an automated update causing a workflow error flagged by CI. I created a separate branch/PR to fix the error in nextstrain/rubella#54 so that future updates from this GHA do not overwrite the manual fixes.
We should only be keeping a single copy of the vendored repo in each downstream repo, so deduplicate the matrix by repo. In cases where there are multiple copies, we are prioritizing the `nextstrain/shared` remote since that is the newer version. This is prompted by the error in the workflow when avian-flu had two paths to update. <#73 (comment)>
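A minimal sketch of the "deduplicate by repo, prefer `nextstrain/shared`" idea, assuming the search hits are flattened to `repo path` lines. The actual workflow builds a JSON matrix, so the line format, the variable names, and the `nextstrain/repo-b` and `nextstrain/repo-c` entries are all made up for illustration. Listing the shared hits first and keeping the first line seen per repo gives the priority described above.

```sh
#!/bin/sh
set -eu

# Hypothetical search results, one "repo path" pair per line.
shared_hits='nextstrain/avian-flu shared/vendored
nextstrain/repo-b shared/vendored'
ingest_hits='nextstrain/avian-flu ingest/vendored
nextstrain/repo-c ingest/vendored'

# Concatenate with the shared hits first, then keep only the first line
# seen for each repo (field 1). awk's `!seen[$1]++` is true exactly once
# per distinct key, so later duplicates are dropped.
matrix=$(printf '%s\n%s\n' "$shared_hits" "$ingest_hits" | awk '!seen[$1]++')

printf '%s\n' "$matrix"
```

With this input, avian-flu keeps only its `shared/vendored` path, while repos that appear in just one list pass through unchanged.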
| shared_matrix=$(gh api -X GET search/code \ | ||
| -f q='org:nextstrain filename:.gitrepo "remote = https://github.com/nextstrain/shared"' \ |
GitHub API calls are limited to 30 results by default. --paginate + --slurp should work well to get everything. Something like this (copied from LLM output, untested):
shared_matrix=$(gh api --paginate --slurp -X GET search/code \
-f q='org:nextstrain filename:.gitrepo "remote = https://github.com/nextstrain/shared"' \
| jq -c '
[.[].items[] | {
"repo": .repository.full_name,
"path": (.path | split("/")[0:-1] | join("/"))
}]
')
Same goes for the other API call.
| if [[ "$changes" == "1" ]]; then | ||
| git push --force origin HEAD |
How about --force-with-lease? This would allow us to make edits directly to the PR safely without risk of being overwritten by bot force-push - an alternative to the flow of creating a new PR entirely e.g. nextstrain/rubella#54
👍 I exclusively use --force-with-lease now for this reason
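To make the difference concrete, here is a small local reproduction using throwaway repositories. The `bot` and `human` clones, the `update` branch name, and the file names are hypothetical stand-ins for the workflow checkout and a maintainer editing the PR branch. It is a sketch of the general Git behaviour, not of the workflow itself.

```sh
#!/bin/sh
set -eu

work=$(mktemp -d)
remote="$work/remote.git"
git init --quiet --bare "$remote"

# Run git in directory $1 with a throwaway identity.
g() {
  dir=$1
  shift
  git -C "$dir" -c user.name=demo -c user.email=demo@example.com "$@"
}

# 1. The bot pushes its first update to the PR branch.
git clone --quiet "$remote" "$work/bot" 2>/dev/null
echo "v1" > "$work/bot/vendored.txt"
g "$work/bot" add vendored.txt
g "$work/bot" commit --quiet -m "bot: update v1"
g "$work/bot" push --quiet origin HEAD:refs/heads/update

# 2. A maintainer adds a manual fix directly on the PR branch.
git clone --quiet --branch update "$remote" "$work/human"
echo "fix" > "$work/human/fix.txt"
g "$work/human" add fix.txt
g "$work/human" commit --quiet -m "human: manual fix"
g "$work/human" push --quiet origin HEAD:refs/heads/update

# 3. The bot regenerates its commit without having seen the manual fix.
echo "v2" > "$work/bot/vendored.txt"
g "$work/bot" commit --quiet --all --amend -m "bot: update v2"

# --force-with-lease refuses, because the remote branch no longer matches
# the bot's remote-tracking ref (origin/update).
if g "$work/bot" push --quiet --force-with-lease origin HEAD:refs/heads/update 2>/dev/null; then
  lease_result="pushed"
else
  lease_result="rejected"
fi
after_lease=$(git -C "$remote" log -1 --format=%s refs/heads/update)

# Plain --force overwrites the manual fix.
g "$work/bot" push --quiet --force origin HEAD:refs/heads/update
after_force=$(git -C "$remote" log -1 --format=%s refs/heads/update)

echo "lease: $lease_result; after lease: $after_lease; after force: $after_force"
```

The lease is checked against the local remote-tracking ref, so how the workflow fetches the PR branch before pushing would determine how much protection it actually gets; that part is not shown here.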
| repository: ${{ matrix.repo }} | ||
| token: ${{ secrets.GH_TOKEN_NEXTSTRAIN_BOT_REPO }} | ||
| # Checkout git-subrepo _after_ the downstream repo to ensure that we | ||
| # keep it in a path within the downstream repo that does not interefere |
- # keep it in a path within the downstream repo that does not interefere
+ # keep it in a path within the downstream repo that does not interfere
| GH_TOKEN: ${{ secrets.GH_TOKEN_NEXTSTRAIN_BOT_REPO }} | ||
| title: '[bot] Update ${{ env.VENDORED_PATH }}' | ||
| body: | | ||
| This PR was automaticaly created by http://github.com/nextstrain/shared/actions/runs/${{ github.run_id }} |
- This PR was automaticaly created by http://github.com/nextstrain/shared/actions/runs/${{ github.run_id }}
+ This PR was automatically created by https://github.com/nextstrain/shared/actions/runs/${{ github.run_id }}
Description of proposed changes
Searches the Nextstrain GitHub org to find repos that have the `.gitrepo` file with the nextstrain/shared or nextstrain/ingest remote to create a matrix of repos to potentially update. Installs and uses `git subrepo` to pull in the latest changes with the `--force` flag to avoid merge conflicts due to rebasing in the downstream repo. If there are changes pulled down, then `git subrepo` will create a single commit. If there is a single commit, then push up the changes to a branch and create or update the PR in the downstream repo. Nothing happens if there were no changes, and the workflow exits with an error if it encounters more than one commit.
Related issue(s)
Prompted by nextstrain/public#39 (comment)
Checklist
If adding a script, add an entry for it in the README.