Skip to content

fix: rewrite checkpoint commands for runner allowlist; broaden license and metadata checks for org-wide files#69

Merged
CybotTM merged 4 commits intomainfrom
fix/checkpoint-allowlist-and-org-wide-fallback
May 4, 2026
Merged

fix: rewrite checkpoint commands for runner allowlist; broaden license and metadata checks for org-wide files#69
CybotTM merged 4 commits intomainfrom
fix/checkpoint-allowlist-and-org-wide-fallback

Conversation

@CybotTM
Copy link
Copy Markdown
Member

@CybotTM CybotTM commented May 4, 2026

Summary

Two unrelated but related-in-scope fixes to skills/github-project/checkpoints.yaml, both surfaced by running the assessment runner from automated-assessment against netresearch/agent-harness-skill:

  1. Allowlist conflicts on feat: add Go project CI checklist and related skills section #6, feat: add release announcement discussion to release-labeler #23, refactor: trim SKILL.md for token efficiency #30, feat: add skill validation CI job #31 — these checkpoints could never actually evaluate because the runner rejected their command strings as unsafe. Rewritten to use the runner's first-class file-existence and gh_api types.
  2. License + project-metadata checks that didn't account for split-license repos and Netresearch's org-wide .github repo. Broadened the LICENSE brace expansion; demoted SECURITY/CONTRIBUTING/CODEOWNERS from warning to info with explicit notes about the org-wide fallback.

Allowlist rewrites (commit 8aa7ed1)

The runner's is_safe_eval_command rejects:

  • ;, &&, ||, backticks, $() (command-chaining)
  • .. path segments anywhere
  • ./X paths except ./vendor/bin/*
  • path-prefixed cmd_base
  • bare base commands not in the allowlist

Pipes (|) ARE allowed mid-pipeline, but the runner's simple line parser sees the YAML literal-block scalar indicator (target: |) as the first whitespace-separated token of the command, and rejects with '|' not in allowed command whitelist.

Checkpoint Before After
GH-6 type: command with test -f X || test -f Y || ... (5 alternatives) type: file_exists with brace expansion
GH-23 type: command with test -f X || test -f Y type: file_exists with brace expansion
GH-30 type: command multi-line literal block invoking gh api type: gh_api (runner skips with clear evidence; semantically covered by GH-32 in llm_reviews)
GH-31 type: command multi-line literal block invoking gh api type: gh_api (same as GH-30)

License + metadata broadening (commits 781b156, e79050d)

GH-2 only matched the literal filename LICENSE, so any skill repo using the SPDX-style multi-file naming (e.g. LICENSE-MIT for code + LICENSE-CC-BY-SA-4.0 for prose) failed even though it is properly licensed. Broadened the brace expansion per the issue description.

GH-3/04/05 check for files that are typically provided org-wide via netresearch/.github. The runner cannot cheaply check org-wide files (gh api is intentionally not in the allowlist, and that is the right tradeoff). Demoted these from warning to info and updated descriptions to mention the org-wide fallback explicitly. Also extended each target with brace expansion for the documented in-repo locations (e.g. .github/SECURITY.md, docs/CONTRIBUTING.md, .github/CODEOWNERS — GitHub honours CODEOWNERS in any of those locations).

Severity changes

Checkpoint Before After Why
GH-3 (SECURITY.md) warning info Org-wide fallback common; runner can't verify it
GH-4 (CONTRIBUTING.md) warning info Org-wide fallback common; runner can't verify it
GH-5 (CODEOWNERS) warning info Org-wide fallback common; runner can't verify it

Why no gh-api-based checks for org-wide files?

The runner's allowlist deliberately excludes gh. Adding it would let arbitrary gh api invocations run during automated assessment, which broadens the security surface for a payoff (org-wide file detection) that is only meaningful for a small subset of repos. The severity demotion is the correct tradeoff: the checks still surface as info so consumers can choose to address them, without flagging org-aware repos as failing.

Verification

Run the runner against netresearch/agent-harness-skill (the same repo that surfaced these issues during /assess):

bash /home/sme/.claude/skills/automated-assessment/scripts/run-checkpoints.sh \
  skills/github-project/checkpoints.yaml /home/sme/p/agent-harness-skill/main

Before this PR: 5 passed, 21 failed, 0 skipped — including 4 Command rejected: ... rejections that the assessor cannot remediate from the project side.

After this PR: 8 passed, 16 failed, 2 skipped — zero Command rejected rejections. Remaining failures are all genuine repo gaps (missing files, missing patterns in target files), not allowlist issues. GH-30 and GH-31 cleanly skip with explanatory evidence.

Test plan

  • Skill-validation CI passes
  • Assessment runner produces zero Command rejected results on agent-harness-skill
  • /assess github-project against another netresearch repo evaluates all checkpoints without rejection

Copilot AI review requested due to automatic review settings May 4, 2026 13:48
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the checkpoints.yaml file to modernize file existence checks and branch protection audits. Specifically, it replaces several shell-based command checks with a more structured file_exists type using glob-like patterns for licenses, security policies, and dependency configurations. It also introduces a new gh_api type for branch protection rules to handle GitHub API interactions more cleanly. The review feedback suggests further expanding the allowed filenames and extensions for licenses, dependency tools, and GitHub Actions workflows to include common variants like COPYING, .yaml, and .json5.

Comment thread skills/github-project/checkpoints.yaml Outdated
Comment thread skills/github-project/checkpoints.yaml Outdated
Comment thread skills/github-project/checkpoints.yaml
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates skills/github-project/checkpoints.yaml so the GitHub project skill's mechanical checks align better with the assessment runner and better recognize common repository layouts.

Changes:

  • Replaces unsafe command-based checks in GH-6 and GH-23 with file_exists targets using brace expansion.
  • Broadens GH-2 through GH-5 to accept more license/community-health file locations and adjusts severities/descriptions for org-wide defaults.
  • Rewrites GH-30 and GH-31 from shell commands to gh_api checkpoint definitions intended to skip cleanly instead of being rejected by the runner.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread skills/github-project/checkpoints.yaml
Comment thread skills/github-project/checkpoints.yaml Outdated
Comment thread skills/github-project/checkpoints.yaml Outdated
CybotTM added 4 commits May 4, 2026 18:59
…allowlist

The automated-assessment runner enforces a command allowlist that rejects
command-chaining metacharacters (; && || backticks $()) and only accepts
specific base commands. Four checkpoints failed allowlist validation and
were never actually evaluated against target projects.

GH-6 and GH-23 used `test -f X || test -f Y` chains to test for any of
several files. Rewritten as `type: file_exists` with brace expansion,
which is the runner's first-class idiom for "any of these files".

GH-30 and GH-31 used multi-line YAML literal-block scalars (`target: |`)
to invoke `gh api` for branch protection audits. The runner's simple
line parser sees the literal-block indicator as the first token and
rejects with "'|' not in allowed command whitelist". Even with the
allowlist passing, these cannot be executed mechanically — they require
GitHub API auth context. Converted to `type: gh_api`, which the runner
recognises and skips with a clear evidence string. The semantically
equivalent audit is preserved in GH-32 (llm_reviews).
…ce repos

Skill repos commonly use split licences with the SPDX-style multi-file
naming convention (LICENSE-MIT for code + LICENSE-CC-BY-SA-4.0 for prose,
LICENSE-APACHE + LICENSE-MIT dual licensing, etc.). The previous check
only looked for a literal `LICENSE` filename and failed against any repo
following this convention even though they are properly licensed.

Replace the literal target with a brace expansion covering the common
single-file names (LICENSE, LICENSE.md, LICENSE.txt) plus the SPDX-style
LICENSE-* variants. file_exists with brace expansion passes if any one
of the alternatives is present.
…in-repo locations

GH-3 (SECURITY.md), GH-4 (CONTRIBUTING.md), and GH-5 (CODEOWNERS) check
for project-local files. GitHub honours these files org-wide via the
special `.github` repo (e.g. netresearch/.github), so many repos that
appear "missing" these files actually have them satisfied at the org level.

The assessment runner cannot cheaply verify org-wide files (it would have
to call the GitHub API, which requires auth and is outside the command
allowlist). Demoting these from `warning` to `info` better reflects the
real situation: nice-to-have at the repo level, often satisfied elsewhere.
The descriptions now mention the org-wide fallback explicitly.

Also broaden each target with brace expansion for the documented in-repo
locations:
  - SECURITY.md      → {SECURITY.md, .github/SECURITY.md, docs/SECURITY.md}
  - CONTRIBUTING.md  → {CONTRIBUTING.md, .github/CONTRIBUTING.md, docs/CONTRIBUTING.md}
  - CODEOWNERS       → {.github/CODEOWNERS, CODEOWNERS, docs/CODEOWNERS}

GitHub itself accepts CODEOWNERS in any of those locations.
…ert GH-5 demotion

Addresses PR #69 review feedback:

GH-2 (Copilot, Gemini): added missing license filenames the skill itself
documents — COPYING, COPYING.md, COPYING.txt, LICENSE-BSD-2-Clause,
LICENSE-BSD-3-Clause, LICENSE-GPL-2.0, LICENSE-GPL-3.0, LICENSE-LGPL,
LICENSE-LGPL-3.0, LICENSE-AGPL-3.0, LICENSE-MPL-2.0.

GH-5 (Copilot): reverted demotion to info — CODEOWNERS must exist in the
repository itself on the default branch (.github/, root, or docs/), and
the org-wide .github mechanism explicitly does NOT cover it (that
mechanism only provides templates and community-health files, never
review-routing rules). Severity restored to warning; description
corrected to remove the misleading org-wide claim.

GH-6 (Gemini): added .github/dependabot.yaml, renovate.json5, renovate
config variants, and the .json5 form for Renovate.

GH-13/14 (Gemini follow-on): brace-expanded targets to also accept
.github/dependabot.yaml.

GH-19/20/21 (Gemini): glob target now `.github/workflows/*.{yml,yaml}`
to match either extension.

GH-24..27 (Gemini): glob target now
`.github/workflows/auto-merge*.{yml,yaml}`.

Push-back on Copilot 'auto-merge.yml weakens GH-23' comment: GH-24..27
already use a glob (auto-merge*.yml — now expanded to *.{yml,yaml})
that matches both filenames, so adding auto-merge.yml to GH-23 does
not produce false failures downstream.

GH-30/31 desc: wrapped long lines as YAML folded scalars (no impact on
runner — these are gh_api types, the desc field is human-readable).

.yamllint.yml: line-length bumped 160 → 360 to accommodate single-line
brace-expansion targets that the runner cannot read as folded scalars
(the runner's parser is bash regex, not a YAML library).
@CybotTM CybotTM force-pushed the fix/checkpoint-allowlist-and-org-wide-fallback branch from e79050d to 932b1a9 Compare May 4, 2026 17:04
@CybotTM CybotTM merged commit 5852d92 into main May 4, 2026
12 checks passed
@CybotTM CybotTM deleted the fix/checkpoint-allowlist-and-org-wide-fallback branch May 4, 2026 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants