64 changes: 64 additions & 0 deletions scientific-artifact-provenance-chain/README.md
@@ -0,0 +1,64 @@
# Scientific Artifact Provenance Chain

This module adds a dependency-free provenance receipt builder for scientific data and code hosting. It gives reviewers a deterministic way to verify which datasets, code files, environments, and outputs belong to a reproducible research package.

## What It Covers

- Content-hashed artifact manifests for datasets, code, figures, models, supplements, and runtime environments.
- Metadata-aware previews for tabular data and source files.
- Dataset/code version diffs with hash, size, line, and metadata change signals.
- PROV-like activity chains that connect raw inputs, analysis commands, environments, and generated outputs.
- FAIR scoring and validation warnings for missing license, metadata, runtime, or provenance references.
- JSON-LD and DataCite-style exports for machine discovery and DOI registration workflows.
- Rerun plans that tell reviewers which commands and environments are ready to execute.
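
As a rough illustration of the content-hashed manifest and version-diff ideas above, the sketch below hashes caller-provided content with Node's built-in `crypto` module and compares two versions of one file. The helper names `sketchArtifactRecord` and `sketchCompareVersions`, the record fields, and the second CSV row are assumptions made for this example; they are not the module's `makeArtifactRecord()` or `compareArtifactVersions()`, whose actual output shapes may differ.

```js
"use strict";
const { createHash } = require("node:crypto");

// Count lines without treating a trailing newline as an extra empty line.
function countLines(text) {
  if (text === "") return 0;
  const newlines = text.split("\n").length - 1;
  return text.endsWith("\n") ? newlines : newlines + 1;
}

// Hypothetical helper: builds a minimal record from in-memory content only.
// It never touches the file system, so the same input always yields the same record.
function sketchArtifactRecord({ path, content }) {
  const bytes = Buffer.from(content, "utf8");
  return {
    path,
    size: bytes.length,
    lines: countLines(content),
    sha256: createHash("sha256").update(bytes).digest("hex"),
  };
}

// Hypothetical helper: reports hash, size, and line change signals between two records.
function sketchCompareVersions(before, after) {
  return {
    path: after.path,
    hashChanged: before.sha256 !== after.sha256,
    sizeDelta: after.size - before.size,
    lineDelta: after.lines - before.lines,
  };
}

const v1 = sketchArtifactRecord({
  path: "data/raw/assay.csv",
  content: "sample,rate\nA,1.2\n",
});
const v2 = sketchArtifactRecord({
  path: "data/raw/assay.csv",
  content: "sample,rate\nA,1.2\nB,0.9\n", // illustrative extra row
});
const diff = sketchCompareVersions(v1, v2);
console.log(diff);
```

Because the record depends only on the content string, two reviewers hashing the same bytes get the same digest, which is what makes the diff deterministic.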

## Usage

```bash
node scientific-artifact-provenance-chain/test.js
node scientific-artifact-provenance-chain/demo.js
```

```js
const {
  buildProvenanceReceipt,
  validateProvenanceReceipt,
  exportJsonLd,
} = require("./index");

const receipt = buildProvenanceReceipt({
  projectId: "scibase:project:enzyme-kinetics",
  title: "Enzyme kinetics reproducibility bundle",
  doi: "10.5555/scibase.enzyme.2026",
  license: "CC-BY-4.0",
  creators: ["Researcher"],
  artifacts: [
    {
      path: "data/raw/assay.csv",
      content: "sample,rate\nA,1.2\n",
      license: "CC-BY-4.0",
      metadata: { variables: ["sample", "rate"] },
    },
  ],
  activities: [],
});

console.log(validateProvenanceReceipt(receipt));
console.log(exportJsonLd(receipt));
```

## Requirement Mapping

| Issue #14 requirement | Implementation |
| --- | --- |
| Store datasets, code, models, figures, and supplementary files | `makeArtifactRecord()` classifies supported artifact types and records path, role, size, hash, license, and metadata. |
| Folder organization and versioning | Artifact paths are normalized as project-relative paths; `compareArtifactVersions()` produces deterministic version diffs. |
| Metadata-aware previews | CSV/TSV previews expose columns and line counts; code previews expose language and line counts. |
| JSON-LD, DataCite, schema.org | `exportJsonLd()` and `exportDataCite()` produce machine-readable metadata payloads. |
| FAIR compliance | `scoreFair()` and `validateProvenanceReceipt()` check findability, accessibility, interoperability, and reusability signals. |
| Executable environments and reruns | Activities include commands and environment references; `buildRerunPlan()` reports reviewer-ready execution steps. |
| Reproducibility review | Provenance validation detects missing references, unlicensed artifacts, missing runtime metadata, and derived artifacts without generation activity. |
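
The reproducibility-review row can be pictured with a small standalone check. This is a sketch under stated assumptions, not the logic inside `validateProvenanceReceipt()`: the function name `findProvenanceGaps`, the warning strings, and the `notebooks/missing.ipynb` path are hypothetical. The input reuses the `path`, `license`, `derivedFrom`, and `generated` fields that appear in the demo. The real module may also treat environment files differently from this sketch, which flags any artifact without a license.

```js
"use strict";

// Hypothetical check: collects warnings for unlicensed artifacts, dangling
// derivedFrom references, and derived artifacts that no activity generated.
function findProvenanceGaps({ artifacts, activities }) {
  const knownPaths = new Set(artifacts.map((artifact) => artifact.path));
  const generatedPaths = new Set(
    activities.flatMap((activity) => activity.generated || [])
  );
  const warnings = [];

  for (const artifact of artifacts) {
    const sources = artifact.derivedFrom || [];
    if (!artifact.license) {
      warnings.push(`unlicensed artifact: ${artifact.path}`);
    }
    for (const source of sources) {
      if (!knownPaths.has(source)) {
        warnings.push(`missing reference: ${artifact.path} -> ${source}`);
      }
    }
    if (sources.length > 0 && !generatedPaths.has(artifact.path)) {
      warnings.push(`derived artifact without generating activity: ${artifact.path}`);
    }
  }
  return warnings;
}

const gaps = findProvenanceGaps({
  artifacts: [
    { path: "data/raw/soil_samples.csv", license: "CC-BY-4.0" },
    {
      path: "outputs/soil_summary.csv",
      license: "CC-BY-4.0",
      // The second source is deliberately absent from the artifact list.
      derivedFrom: ["data/raw/soil_samples.csv", "notebooks/missing.ipynb"],
    },
    { path: "env/environment.yml" }, // no license in this sketch
  ],
  activities: [], // nothing claims to have generated the summary
});
console.log(gaps);
```

Adding an activity whose `generated` list includes `outputs/soil_summary.csv` would clear the last kind of warning in this sketch, mirroring how the demo links its output to the notebook replay activity.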

## Design Notes

The module keeps only content hashes and caller-provided content in memory, so tests stay deterministic. It does not read arbitrary local files, execute commands, call external services, or require credentials, which makes it safe to run in CI and suitable as a foundation for a future upload/API layer.
61 changes: 61 additions & 0 deletions scientific-artifact-provenance-chain/demo.js
@@ -0,0 +1,61 @@
"use strict";

const {
  buildProvenanceReceipt,
  buildRerunPlan,
  exportDataCite,
  exportJsonLd,
  validateProvenanceReceipt,
} = require("./index");

const receipt = buildProvenanceReceipt({
  projectId: "scibase:demo:soil-carbon",
  title: "Soil carbon notebook replay package",
  doi: "10.5555/scibase.soil-carbon.demo",
  license: "CC-BY-4.0",
  creators: ["SCIBASE demo researcher"],
  artifacts: [
    {
      path: "data/raw/soil_samples.csv",
      content: "plot,carbon_pct\nnorth,2.1\nsouth,1.8\n",
      license: "CC-BY-4.0",
      metadata: { variables: ["plot", "carbon_pct"], instrument: "field-sampling" },
    },
    {
      path: "notebooks/soil_carbon.ipynb",
      content: "{\"cells\":[{\"cell_type\":\"code\",\"source\":\"print('soil carbon')\"}]}",
      license: "MIT",
      metadata: { runtime: "python>=3.11", kernel: "python3" },
    },
    {
      path: "outputs/soil_summary.csv",
      content: "metric,value\nmean_carbon_pct,1.95\n",
      license: "CC-BY-4.0",
      derivedFrom: ["data/raw/soil_samples.csv", "notebooks/soil_carbon.ipynb"],
      metadata: { variables: ["metric", "value"] },
    },
    {
      path: "env/environment.yml",
      content: "name: soil-carbon\ndependencies:\n - python=3.11\n - pandas\n",
      kind: "environment",
    },
  ],
  activities: [
    {
      id: "activity:notebook-run",
      type: "notebook-replay",
      actor: "reviewer-workflow",
      command: "jupyter nbconvert --execute notebooks/soil_carbon.ipynb",
      environment: "env/environment.yml",
      used: ["data/raw/soil_samples.csv", "notebooks/soil_carbon.ipynb"],
      generated: ["outputs/soil_summary.csv"],
    },
  ],
});

console.log(JSON.stringify({
  validation: validateProvenanceReceipt(receipt),
  rerunPlan: buildRerunPlan(receipt),
  jsonLd: exportJsonLd(receipt),
  datacite: exportDataCite(receipt),
}, null, 2));