35 changes: 35 additions & 0 deletions review-calibration-bench/README.md
@@ -0,0 +1,35 @@
# Review Calibration Bench

Dependency-free peer-review calibration and coaching signals for the Community & Reputation System bounty.

This module focuses on the quality gate before peer-review activity increases a researcher's public reputation. It compares structured review rubric scores against consensus panels, identifies leniency/severity drift, flags reproducibility blind spots, and emits transparent trust-adjustment and coaching actions.
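The consensus comparison described above can be sketched as a weighted deviation between a reviewer's rubric scores and the consensus panel. This is an illustrative sketch, not the module's actual API; the function name is hypothetical, but the weights and scores below come from `data/sample-calibration.json`:

```javascript
// Hypothetical sketch: weighted agreement between one review and consensus.
// Scores are on a 1-5 scale; weights sum to 1 (as in data/sample-calibration.json).
function calibrationScore(scores, consensus, weights) {
  const maxGap = 4; // largest possible gap on a 1-5 scale
  let deviation = 0;
  for (const [criterion, weight] of Object.entries(weights)) {
    deviation += weight * (Math.abs(scores[criterion] - consensus[criterion]) / maxGap);
  }
  return 1 - deviation; // 1 = perfect agreement, 0 = maximal disagreement
}

// Example: ada's organoid-benchmark review vs the consensus panel.
const weights = { clarity: 0.2, rigor: 0.3, novelty: 0.2, reproducibility: 0.3 };
const consensus = { clarity: 4, rigor: 4, novelty: 3, reproducibility: 2 };
const review = { clarity: 4, rigor: 4, novelty: 3, reproducibility: 3 };
console.log(calibrationScore(review, consensus, weights)); // only reproducibility differs
```

Here only reproducibility deviates (by 1 point), so the score stays close to 1; a reviewer far from consensus across all criteria would fall toward 0.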

## Run

```bash
npm run check
npm test
npm run demo
```

## Demo Output

```text
Program: community-review-q2-2026
Status: coaching-needed
Reviewers calibrated: 3
Coaching actions: 7
Quarantined reviewers: 1
Top reviewer: ada (trusted-reviewer)
Top action: Review practice set: rigor scores are higher than consensus.
```

## Files

- `src/review-calibration.js` builds calibration reports, leaderboard scores, coaching actions, trust adjustments, dashboard summary, audit trail, and stable digest.
- `data/sample-calibration.json` contains synthetic structured reviews, consensus panels, reviewer modes, and contributor signals.
- `test/review-calibration.test.js` verifies calibration scores, drift classification, blind-spot detection, quarantine behavior, leaderboard ordering, and digest stability.
- `docs/requirement-map.md` maps this slice to issue #15.
- `docs/demo.svg` and `docs/demo.mp4` provide a short visual artifact for review.

No real private review content, identity secrets, or external service credentials are used.
171 changes: 171 additions & 0 deletions review-calibration-bench/data/sample-calibration.json
@@ -0,0 +1,171 @@
{
"programId": "community-review-q2-2026",
"asOf": "2026-05-15T00:00:00Z",
"rubric": {
"criteria": [
"clarity",
"rigor",
"novelty",
"reproducibility"
],
"weights": {
"clarity": 0.2,
"rigor": 0.3,
"novelty": 0.2,
"reproducibility": 0.3
},
"calibrationThreshold": 0.7
},
"projects": [
{
"id": "project-organoid-benchmark",
"domain": "biology",
"consensus": {
"clarity": 4,
"rigor": 4,
"novelty": 3,
"reproducibility": 2
}
},
{
"id": "project-graph-protocol",
"domain": "computational-science",
"consensus": {
"clarity": 3,
"rigor": 5,
"novelty": 4,
"reproducibility": 4
}
},
{
"id": "project-open-dataset-release",
"domain": "data-science",
"consensus": {
"clarity": 5,
"rigor": 3,
"novelty": 2,
"reproducibility": 5
}
}
],
"reviews": [
{
"id": "review-ada-1",
"reviewerId": "ada",
"projectId": "project-organoid-benchmark",
"mode": "public",
"scores": {
"clarity": 4,
"rigor": 4,
"novelty": 3,
"reproducibility": 3
},
"comments": [
"Strong protocol trace, but execution container still needs one replication note."
]
},
{
"id": "review-ada-2",
"reviewerId": "ada",
"projectId": "project-graph-protocol",
"mode": "public",
"scores": {
"clarity": 3,
"rigor": 4,
"novelty": 4,
"reproducibility": 4
},
"comments": [
"Good evidence paths and adequate reproducibility metadata."
]
},
{
"id": "review-bohr-1",
"reviewerId": "bohr",
"projectId": "project-organoid-benchmark",
"mode": "anonymous",
"scores": {
"clarity": 5,
"rigor": 5,
"novelty": 5,
"reproducibility": 5
},
"comments": [
"Excellent across all dimensions."
]
},
{
"id": "review-bohr-2",
"reviewerId": "bohr",
"projectId": "project-open-dataset-release",
"mode": "anonymous",
"scores": {
"clarity": 5,
"rigor": 5,
"novelty": 4,
"reproducibility": 5
},
"comments": [
"Dataset appears ready for reuse."
]
},
{
"id": "review-curie-1",
"reviewerId": "curie",
"projectId": "project-graph-protocol",
"mode": "double-blind",
"scores": {
"clarity": 2,
"rigor": 4,
"novelty": 3,
"reproducibility": 2
},
"comments": [
"Promising, but missing a complete notebook execution trail."
]
},
{
"id": "review-curie-2",
"reviewerId": "curie",
"projectId": "project-open-dataset-release",
"mode": "double-blind",
"scores": {
"clarity": 4,
"rigor": 3,
"novelty": 2,
"reproducibility": 3
},
"comments": [
"Good data dictionary, but independent rerun evidence is incomplete."
]
}
],
"contributors": [
{
"reviewerId": "ada",
"roles": [
"review",
"validation"
],
"completedBounties": 2,
"endorsements": 5
},
{
"reviewerId": "bohr",
"roles": [
"review"
],
"completedBounties": 4,
"endorsements": 8
},
{
"reviewerId": "curie",
"roles": [
"review",
"reproducibility"
],
"completedBounties": 1,
"endorsements": 3
}
]
}
Binary file added review-calibration-bench/docs/demo.mp4
29 changes: 29 additions & 0 deletions review-calibration-bench/docs/demo.svg
30 changes: 30 additions & 0 deletions review-calibration-bench/docs/requirement-map.md
@@ -0,0 +1,30 @@
# Requirement Map

This module contributes a focused structured-review quality layer for issue #15, "Community & Reputation System."

| Issue area | Covered by this module |
| --- | --- |
| Peer reviews and comments | Scores structured reviews against discipline-neutral rubric criteria: clarity, rigor, novelty, reproducibility |
| Optional scoring quality | Compares reviewer scores against consensus panels and identifies systematic leniency, severity, and inconsistency |
| Review history on profiles | Builds reviewer calibration reports covering review modes used, review counts, and audit-trail events |
| Contributor credits | Includes CRediT-style reviewer roles and completed bounty / endorsement contribution signals |
| Reputation scoring | Emits transparent trust adjustments, tiers, and quarantine decisions when calibration is weak |
| Incentive tiers | Produces trusted reviewer, calibrated reviewer, coaching-needed, and mentor-required tiers |
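The tier mapping in the last row can be sketched as a simple threshold ladder over a reviewer's calibration score. Only the 0.7 cutoff is taken from `calibrationThreshold` in `data/sample-calibration.json`; the other cutoffs and the function name are illustrative assumptions, not the module's actual values:

```javascript
// Hypothetical tier assignment. The 0.7 threshold mirrors calibrationThreshold
// in data/sample-calibration.json; the 0.9 and 0.5 cutoffs are illustrative.
function assignTier(calibrationScore, threshold = 0.7) {
  if (calibrationScore >= 0.9) return "trusted-reviewer";
  if (calibrationScore >= threshold) return "calibrated-reviewer";
  if (calibrationScore >= 0.5) return "coaching-needed";
  return "mentor-required";
}

console.log(assignTier(0.93)); // "trusted-reviewer"
console.log(assignTier(0.55)); // "coaching-needed"
```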

## Distinctness

Existing #15 submissions cover broad community reputation ledgers, CRediT graphs, badges, leaderboards, abuse detection, and appeals. This module focuses on the quality gate before peer-review activity increases reputation:

- Does the reviewer score close to consensus?
- Is a reviewer systematically too lenient or too severe?
- Does a reviewer overlook reproducibility?
- Should the review count toward reputation immediately, or enter a coaching queue first?
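The leniency/severity question above reduces to the sign of a reviewer's mean deviation from consensus: consistently scoring above consensus suggests leniency, below it severity. A hedged sketch under that assumption (function name and tolerance are illustrative); the sample scores below are bohr's two anonymous reviews and the corresponding consensus panels from `data/sample-calibration.json`:

```javascript
// Hypothetical drift classification: positive mean deviation from consensus
// reads as leniency, negative as severity. The 0.5 tolerance is illustrative.
function classifyDrift(reviews, tolerance = 0.5) {
  let total = 0;
  let count = 0;
  for (const { scores, consensus } of reviews) {
    for (const criterion of Object.keys(consensus)) {
      total += scores[criterion] - consensus[criterion];
      count += 1;
    }
  }
  const meanDeviation = total / count;
  if (meanDeviation > tolerance) return "lenient";
  if (meanDeviation < -tolerance) return "severe";
  return "calibrated";
}

// bohr's reviews vs the sample consensus panels: uniformly high scores.
const bohr = [
  { scores: { clarity: 5, rigor: 5, novelty: 5, reproducibility: 5 },
    consensus: { clarity: 4, rigor: 4, novelty: 3, reproducibility: 2 } },
  { scores: { clarity: 5, rigor: 5, novelty: 4, reproducibility: 5 },
    consensus: { clarity: 5, rigor: 3, novelty: 2, reproducibility: 5 } },
];
console.log(classifyDrift(bohr)); // "lenient"
```

In the sample data this pattern is what routes bohr's reviews into a coaching queue rather than directly into reputation.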

## Verification

```bash
cd review-calibration-bench
npm run check
npm test
npm run demo
```
18 changes: 18 additions & 0 deletions review-calibration-bench/package.json
@@ -0,0 +1,18 @@
{
"name": "review-calibration-bench",
"version": "1.0.0",
"private": true,
"description": "Dependency-free peer-review calibration and coaching signals for scientific reputation systems.",
"scripts": {
"check": "node --check src/review-calibration.js && node --check scripts/demo.js && node --check test/review-calibration.test.js",
"demo": "node scripts/demo.js",
"test": "node test/review-calibration.test.js"
},
"keywords": [
"peer-review",
"reputation",
"calibration",
"review-quality"
],
"license": "MIT"
}
16 changes: 16 additions & 0 deletions review-calibration-bench/scripts/demo.js
@@ -0,0 +1,16 @@
const fs = require("node:fs");
const path = require("node:path");
const { buildCalibrationBench } = require("../src/review-calibration");

const samplePath = path.join(__dirname, "..", "data", "sample-calibration.json");
const input = JSON.parse(fs.readFileSync(samplePath, "utf8"));
const report = buildCalibrationBench(input);

console.log(`Program: ${report.programId}`);
console.log(`Status: ${report.dashboard.status}`);
console.log(`Reviewers calibrated: ${report.dashboard.reviewerCount}`);
console.log(`Coaching actions: ${report.dashboard.coachingActionCount}`);
console.log(`Quarantined reviewers: ${report.dashboard.quarantinedReviewerCount}`);
console.log(`Top reviewer: ${report.leaderboard[0].reviewerId} (${report.leaderboard[0].tier})`);
console.log(`Top action: ${report.dashboard.topAction}`);
console.log(`Digest: ${report.digest}`);