chore(ai): Add check-code-attribution skill (JAVA-499)#5449
chore(ai): Add check-code-attribution skill (JAVA-499)#54490xadam-brown wants to merge 1 commit into
Conversation
📲 Install BuildsAndroid
|
Adds a check-code-attribution skill that validates license headers + THIRD_PARTY_NOTICES.md entries for code copied or adapted from third parties. Also verifies license compatiblity against Sentry's licensing policy. Focus is limited to the branch diff. Reports any issues found via PR comments (when run on CI) or to the terminal (when run locally). To run it in Claude Code: ``` /check-code-attribution ``` Runs on CI automatically via [Warden](https://warden.sentry.dev/). - Purely advisory / does not block merge. - Generates PR comments with code suggestions for all discovered issues. - Automatically manages removing stale comments as PRs are updated. Current Warden configs: ┌─────────────────┬─────────────────────────────┬───────────────────────────────────────────────────┐ │ Setting │ Value │ Effect │ ├─────────────────┼─────────────────────────────┼───────────────────────────────────────────────────┤ │ model │ anthropic/claude-sonnet-4-6 │ Model used for analysis │ ├─────────────────┼─────────────────────────────┼───────────────────────────────────────────────────┤ │ maxTurns │ 30 │ Max tool calls per chunk │ ├─────────────────┼─────────────────────────────┼───────────────────────────────────────────────────┤ │ skill │ check-code-attribution │ Per-file vendored code attribution check │ ├─────────────────┼─────────────────────────────┼───────────────────────────────────────────────────┤ │ failOn │ off │ Do not fail workflow if attribution issues found │ ├─────────────────┼─────────────────────────────┼───────────────────────────────────────────────────┤ │ reportOn │ medium │ Show findings at >= medium severity via PR comment│ ├─────────────────┼─────────────────────────────┼───────────────────────────────────────────────────┤ │ requestChanges │ false │ Never post REQUEST_CHANGES comments on PRs │ ├─────────────────┼─────────────────────────────┼───────────────────────────────────────────────────┤ │ failCheck │ false │ No red X on workflow in GitHub UI if it fails │ ├─────────────────┼─────────────────────────────┼───────────────────────────────────────────────────┤ │ triggers │ pull_request + local │ Runs on PR open/sync and local warden invocations │ ├─────────────────┼─────────────────────────────┼───────────────────────────────────────────────────┤ │ reportOnSuccess │ false (default) │ No comment when everything is clean │ └─────────────────┴─────────────────────────────┴───────────────────────────────────────────────────┘ Going forward, we can consider blocking PRs once we've had a chance to vet behavior in the wild.
9ced7c7 to
93b92d7
Compare
| } else { | ||
| const reason = s.expectFinding | ||
| ? 'expected finding (>= medium), got none' | ||
| : `expected no finding (>= medium), got ${count}`; | ||
| console.log(`${RED}FAIL${RESET} ${s.id} (${reason})`); | ||
|
|
||
| failures.push({ |
There was a problem hiding this comment.
Bug: The validateExpected function only validates .java fixture files, ignoring other file types like .md. This can lead to silently passing tests if non-Java fixtures are missing.
Severity: MEDIUM
Suggested Fix
Modify the validateExpected function to check all files in the scenarios directory against the EXPECTED.json manifest, not just .java files. This ensures all test fixtures are accounted for during validation.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location:
.claude/skills/check-code-attribution/validation-tests/assert-scenarios.mjs#L299-L305
Potential issue: The validation logic in the `validateExpected` function is incomplete.
It filters for and validates only `.java` files found on disk against the
`EXPECTED.json` manifest. Consequently, other required test fixture files, such as
`THIRD_PARTY_NOTICES.mismatch-snippet.md`, are not validated. If such a file were
accidentally deleted or renamed, the validation step would pass silently. The test would
only fail later when a shell script attempts to access the non-existent file, making the
root cause harder to identify.
Did we get this right? 👍 / 👎 to inform future reviews.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 93b92d7. Configure here.
|
|
||
| ## Example — HeaderCompleteAndNoticePresent (Apache 2.0) | ||
|
|
||
| **Source:** https://github.com/example/complete-with-notices<br> |
There was a problem hiding this comment.
Catalog source URL mismatches scenario header URLs
Medium Severity
The header-complete-and-notice-present scenario expects no findings, but the catalog Source URL (https://github.com/example/complete-with-notices) doesn't match either URL in the Java file header (https://github.com/example, https://github.com/example/something). Since source URL is a required field and SKILL.md flags header-vs-NOTICES inconsistencies on required fields, the AI can intermittently report a finding here, making this negative test case flaky.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 93b92d7. Configure here.
| @@ -0,0 +1,21 @@ | |||
| /* | |||
There was a problem hiding this comment.
Most of this PR's diff is test fixtures for the automated tests (runnable via check-code-attribution-tests.sh).
The interesting (ie, non-test) parts live in SKILL.md and warden.toml.


📜 Description
Adds a
check-code-attributionskill that validates license headers andTHIRD_PARTY_NOTICES.mdentries for code copied or adapted from third parties. Also verifies license compatibility against Sentry's Licensing Policy.The skill focuses on the branch diff only. It's a pure-LLM approach, in contrast to the part-deterministic, part-LLM approach we decided against from #5401.
Reports findings via PR comments when run on CI, or to the terminal when run locally.
Local
To run it from Claude Code:
CI
Warden configuration ensures the skill runs automatically on all PRs:
💡 Motivation and Context
Third-party code attribution is a legal and compliance requirement. Currently, attribution correctness is only caught during manual code review. This skill automates detection of vendored code in branch diffs and can help us flag missing or incomplete attributions before a PR is merged.
Background: Click to expand
Sentry SDKs and third-party code
3 possible ways third-party code enters Sentry’s SDKs (including sentry-java):
1. Plain vanilla dependencies
2. Shaded code
3. Vendored code
All third-party code must be properly attributed, and licenses must be compatible with Sentry’s licensing policies.
Plain deps + shaded code: We run an
enforce-license-complianceGitHub workflow that applies a FOSSA check to all plain vanilla dependencies and our few shaded dependencies, which ensures their licenses are properly attributed and are compatible with Sentry’s licensing policies.Vendored code: Relies on a manual process where developers add attributions to files containing vendored code + include a corresponding entry is included in the THIRD_PARTY_NOTICES.md file that ships with the SDK. Developers are also responsible for ensuring license compatibility.
The criteria for what counts as a proper attribution of vendored code lives in the AGENTS.md file under the heading “Third-Party Code Attribution”.
Goal of this PR: Create a skill that helps us properly attribute vendored code
Types of vendored code:
The skill introduced in this PR protects (1) from regression and identifies instances of (2). (Addressing (3) is out of scope – and is obviously non-trivial.)
Skill does not mandate that license headers exactly match the template from
AGENTS.md(link) so long as all template fields are present.That^^ lets us maintain our current, diverse header formats and remain relatively unopinionated going forward.
Example output
Local runs
PR comments
💚 How did you test it?
[1] Automated validation tests (
check-code-attribution-tests.sh) with scenario files covering:Note: the tests are not run on CI / are only run manually atm (see the
check-code-attribution-tests.shscript).[2] Ran the skill on branches with known attribution issues to verify correct detection and reporting.
Manual tests + output: Click to expand
Note the skill's output format has changed since these tests were run, but the behavior remains the same.
Diff 1: Remove entire license header
Output 1
Required attribution field(s) removed:
Diff 2: Modify existing license header, but retain all required fields
Output 2
Vendored code detected (Apache Commons Collections) – verify that
THIRD_PARTY_NOTICES.mdreflects your updates.Diff 3: Modify existing license header by removing one or more required fields
Output 3
Required attribution field(s) removed:
Copyright (C) 2016 Matej Tymeswas removed from the license header. Please restore it.Diff 4: Leave existing license header unchanged, but make an inconsistent modification to THIRD_PARTY_NOTICES.md entry
Output 4
Entry metadata inconsistent with source file headers:
THIRD_PARTY_NOTICES.md, but the source files (QueueFile.java,FileObjectQueue.java,ObjectQueue.java) all still say "Copyright (C) 2010Square, Inc."
Diff 5: Leave existing license header unchanged, but remove THIRD_PARTY_NOTICES.md entry
Output 5
Source file(s) still reference this library:
-
io.sentry.android.core.SentryShakeDetectorstill contains attribution header for Square's Seismic. Either restore theTHIRD_PARTY_NOTICES.mdentry or remove the vendored code.Diff 6: Add newly-vendored code with valid license header and THIRD_PARTY_NOTICES.md entry
Output 6
Vendored code detected (Dropwizard Metrics SlidingWindowReservoir) – verify that
THIRD_PARTY_NOTICES.mdreflects your updates.Diff 7: Add newly-vendored code with valid license header but no THIRD_PARTY_NOTICES.md entry
Output 7
Vendored code detected (Caffeine Cache) — attribution header is complete.
THIRD_PARTY_NOTICES.md. An entry needs to be added.Diff 8: Add newly-vendored code with an invalid license header and existing THIRD_PARTY_NOTICES.md entry
Output 8
Vendored code detected (Resilience4j RateLimiter) — missing required fields:
Copyright 2019 Robert Winkler and Bohdan StorozhukandLicensed under the Apache License, Version 2.0(per the existing
THIRD_PARTY_NOTICES.mdentry).Diff 9: Add newly-vendored code with an invalid license header and no THIRD_PARTY_NOTICES.md entry
Output 9
Vendored code detected (Guava RateLimiter) — missing required fields:
THIRD_PARTY_NOTICES.md. An entry needs to be added for Guava RateLimiter.Diff 10: Add newly-vendored code with an invalid license header, no THIRD_PARTY_NOTICES.md entry, and a new license type
Output 10
Vendored code detected (compact-writer) — missing required fields:
THIRD_PARTY_NOTICES.md. An entry needs to be added.THIRD_PARTY_NOTICES.md. Please verify it is compatible with Sentry's licensing policies:https://open.sentry.io/licensing/.
Diff 11: False positive
Output 11 (numbered as “1.” because it’s the first entry in the False Positives section)
[3] Local Warden runs.
Running Warden locally: Click to expand
Added to my
.*profile:[4] Pushed up diffs with attribution violations in a draft PR to vet the UX (see #5444).
📝 Checklist
sendDefaultPIIis enabled.🔮 Next steps
failOn/failCheckonce we've vetted behavior in the wild#skip-changelog