30 changes: 30 additions & 0 deletions knowledge-graph-author-affiliation-disambiguation/README.md
@@ -0,0 +1,30 @@
# Knowledge Graph Author Affiliation Disambiguation

Self-contained module for issue `#17` Scientific Knowledge Graph Integration.

It adds a deterministic trust layer for author and affiliation entities before they are used in entity pages, collaboration maps, semantic search, or AI recommendations.

## What It Does

- Normalizes author names, initials, ORCID values, email domains, affiliations, and scientific concepts.
- Merges author mentions when the combined evidence from ORCID, affiliation, email domain, and topic overlap is strong.
- Sends homonyms and low-confidence merges to a curator queue instead of polluting the graph.
- Builds collaboration edges from shared document evidence.
- Produces recommendation guards so uncertain identities do not drive cross-lab suggestions.
- Exports schema.org-compatible creator metadata.
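
The normalization step in the first bullet could look roughly like the sketch below. This is an illustration only: the helper names (`normalizeName`, `normalizeOrcid`, `emailDomain`, `affiliationTokens`), the honorific list, and the abbreviation table are assumptions and are not taken from `src/author-affiliation-disambiguation.js`.

```js
// Hypothetical helpers for illustration; names and rules are assumptions,
// not the API of src/author-affiliation-disambiguation.js.
const HONORIFICS = /^(dr|prof|mr|ms|mrs)\.?\s+/i;
const ABBREVIATIONS = new Map([["dept", "department"], ["univ", "university"]]);
const STOPWORDS = new Set(["of", "the", "and"]);

// "Dr. Maya A. Chen" and "M. Chen" both reduce to the candidate key "chen|m".
function normalizeName(raw) {
  const cleaned = raw.trim().replace(HONORIFICS, "").replace(/\./g, " ").toLowerCase();
  const parts = cleaned.split(/\s+/).filter(Boolean);
  const family = parts[parts.length - 1] ?? "";
  const firstInitial = parts.length > 1 ? parts[0][0] : "";
  return { family, firstInitial, key: `${family}|${firstInitial}` };
}

// Accepts a bare ORCID iD or an orcid.org URL; checks the format only,
// not the checksum digit.
function normalizeOrcid(raw) {
  if (!raw) return null;
  const id = raw.trim().replace(/^https?:\/\/orcid\.org\//i, "").toUpperCase();
  return /^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$/.test(id) ? id : null;
}

// Lowercased part after the last "@", or null when there is no usable address.
function emailDomain(email) {
  if (!email || !email.includes("@")) return null;
  return email.slice(email.lastIndexOf("@") + 1).trim().toLowerCase() || null;
}

// Order-insensitive token list with common abbreviations expanded.
// ASCII-only splitting is a simplification of this sketch.
function affiliationTokens(raw) {
  if (!raw) return [];
  const tokens = raw
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(Boolean)
    .map((token) => ABBREVIATIONS.get(token) ?? token)
    .filter((token) => !STOPWORDS.has(token));
  return [...new Set(tokens)].sort();
}

console.log(normalizeName("Dr. Maya A. Chen").key);
console.log(affiliationTokens("Dept. of Bioengineering, Stanford Univ."));
```

A shared name key only marks two mentions as merge candidates. It does not establish that they are the same person, which is why the other evidence types still have to agree.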

## Demo

```bash
npm run check
npm test
npm run demo
```

The sample dataset intentionally contains two different `Maya Chen` authors. The Stanford CRISPR author is merged across paper, dataset, and protocol mentions, while the MIT materials-science homonym is routed to curator review.
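
The merge-versus-review split described above can be sketched as a pairwise evidence score. The point values, the threshold, and the `scorePair` function are assumptions made for this example; the module's actual scoring is not shown in this README.

```js
// Illustrative point values and threshold (assumptions, not the module's real scoring).
const POINTS = { orcid: 60, emailDomain: 30, affiliation: 30, concepts: 20 };
const MERGE_THRESHOLD = 50;
const GENERIC = new Set(["of", "and", "the", "dept", "department", "univ", "university"]);

const domainOf = (email) => (email ? email.split("@").pop().toLowerCase() : null);
const tokenSet = (text) =>
  new Set((text || "").toLowerCase().split(/[^a-z0-9]+/).filter((t) => t && !GENERIC.has(t)));
const conceptSet = (list) => new Set((list || []).map((c) => c.toLowerCase()));
const overlaps = (a, b) => [...a].some((x) => b.has(x));

// Compares two mentions that already share a name key. Returns a score,
// a decision, and the reasons a curator would see.
function scorePair(a, b) {
  // Two different ORCID iDs are treated as a hard conflict.
  if (a.orcid && b.orcid && a.orcid !== b.orcid) {
    return { score: 0, decision: "review", reasons: ["conflicting ORCID"] };
  }
  const reasons = [];
  let score = 0;
  const add = (key, reason) => {
    score += POINTS[key];
    reasons.push(reason);
  };

  if (a.orcid && a.orcid === b.orcid) add("orcid", "same ORCID");
  const domainA = domainOf(a.email);
  if (domainA && domainA === domainOf(b.email)) add("emailDomain", "same email domain");
  if (overlaps(tokenSet(a.affiliation), tokenSet(b.affiliation))) add("affiliation", "affiliation overlap");
  if (overlaps(conceptSet(a.concepts), conceptSet(b.concepts))) add("concepts", "shared concepts");

  return { score, decision: score >= MERGE_THRESHOLD ? "merge" : "review", reasons };
}

// Three mentions copied from the sample dataset (m1, m3, m4).
const stanfordPaper = {
  orcid: "0000-0002-1825-0097",
  email: "maya.chen@stanford.edu",
  affiliation: "Stanford University, Department of Bioengineering",
  concepts: ["CRISPR", "single-cell RNA-seq"]
};
const stanfordProtocol = { affiliation: "Stanford Bioengineering", concepts: ["CRISPR", "protocol"] };
const mitPaper = {
  email: "maya.chen@mit.edu",
  affiliation: "MIT Materials Science and Engineering",
  concepts: ["perovskite", "thin films"]
};

console.log(scorePair(stanfordPaper, stanfordProtocol)); // decision: "merge"
console.log(scorePair(stanfordPaper, mitPaper)); // decision: "review"
```

Integer points keep the threshold comparison free of floating-point surprises. Treating any shared affiliation token as overlap is deliberately crude; a real implementation would likely want a stricter similarity measure.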

## Why This Belongs in the Knowledge Graph

The issue calls out authors, affiliations, entity pages, lab-to-lab collaboration maps, graph navigation, and personalized recommendations. Those features depend on author identity quality. A graph that merges homonyms or splits the same author across affiliations will produce misleading entity pages, false collaboration paths, and bad recommendation digests.

This slice complements broad extractors and navigators by adding identity-quality controls before graph edges are trusted.
@@ -0,0 +1,56 @@
[
  {
    "mentionId": "m1",
    "documentId": "paper-crispr-01",
    "name": "Dr. Maya A. Chen",
    "orcid": "0000-0002-1825-0097",
    "email": "maya.chen@stanford.edu",
    "affiliation": "Stanford University, Department of Bioengineering",
    "concepts": ["CRISPR", "single-cell RNA-seq"],
    "doi": "10.5555/crispr.01"
  },
  {
    "mentionId": "m2",
    "documentId": "dataset-crispr-01",
    "name": "Maya Chen",
    "email": "mchen@stanford.edu",
    "affiliation": "Dept. of Bioengineering, Stanford Univ.",
    "concepts": ["CRISPR", "perturb-seq"],
    "doi": "10.5555/data.01"
  },
  {
    "mentionId": "m3",
    "documentId": "protocol-crispr-02",
    "name": "M. Chen",
    "affiliation": "Stanford Bioengineering",
    "concepts": ["CRISPR", "protocol"],
    "doi": "10.5555/protocol.02"
  },
  {
    "mentionId": "m4",
    "documentId": "paper-materials-07",
    "name": "Maya Chen",
    "email": "maya.chen@mit.edu",
    "affiliation": "MIT Materials Science and Engineering",
    "concepts": ["perovskite", "thin films"],
    "doi": "10.5555/materials.07"
  },
  {
    "mentionId": "m5",
    "documentId": "paper-crispr-01",
    "name": "Luis Ortega",
    "orcid": "0000-0003-1111-2222",
    "email": "lortega@ucsf.edu",
    "affiliation": "UCSF Computational Biology",
    "concepts": ["single-cell RNA-seq", "trajectory inference"],
    "doi": "10.5555/crispr.01"
  },
  {
    "mentionId": "m6",
    "documentId": "dataset-crispr-01",
    "name": "L. Ortega",
    "affiliation": "University of California San Francisco",
    "concepts": ["trajectory inference", "perturb-seq"],
    "doi": "10.5555/data.01"
  }
]
Binary file not shown.
20 changes: 20 additions & 0 deletions knowledge-graph-author-affiliation-disambiguation/docs/demo.svg
@@ -0,0 +1,16 @@
# Requirement Map

This module targets issue `#17` Scientific Knowledge Graph Integration.

| Issue capability | Implementation |
| --- | --- |
| Parse authors and affiliations from uploaded content | `createAuthorGraph()` accepts author mentions from papers, datasets, notebooks, or protocols and normalizes names, affiliations, ORCID values, email domains, concepts, and DOI evidence. |
| Build author graphs and collaboration maps | `buildCollaborationEdges()` creates weighted collaboration edges from shared document evidence and shared concept context. |
| Aggregate usage contexts for entity pages | Author nodes include mention IDs, source documents, concepts, affiliations, and merge evidence for entity-page rendering. |
| Support semantic graph recommendations | `recommendationGuards` suppresses recommendations when identity confidence is insufficient, preventing unsafe cross-lab suggestions. |
| Output linked data / schema.org metadata | `toSchemaOrg()` exports creator metadata with schema.org `Person` and `Organization` structures. |
| Human review for ambiguous graph edges | `curatorQueue` captures homonyms and low-confidence merge candidates with transparent scoring reasons. |
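
As a rough illustration of the collaboration-map row, the sketch below derives weighted edges from shared documents. It assumes author nodes expose `id` and `documents`, as the demo script suggests; the edge shape (`source`, `target`, `weight`, `sharedDocuments`), the function name `sketchCollaborationEdges`, and the node IDs are hypothetical and may differ from what `buildCollaborationEdges()` returns. Shared concept context is left out for brevity.

```js
// Hypothetical sketch; the edge shape and node IDs are assumptions.
function sketchCollaborationEdges(authorNodes) {
  const edges = [];
  for (let i = 0; i < authorNodes.length; i += 1) {
    for (let j = i + 1; j < authorNodes.length; j += 1) {
      const a = authorNodes[i];
      const b = authorNodes[j];
      const documentsOfB = new Set(b.documents);
      const sharedDocuments = a.documents.filter((doc) => documentsOfB.has(doc));
      // Only authors with at least one shared document get an edge;
      // the weight is the number of shared documents.
      if (sharedDocuments.length > 0) {
        edges.push({ source: a.id, target: b.id, weight: sharedDocuments.length, sharedDocuments });
      }
    }
  }
  return edges;
}

// Nodes shaped after the sample dataset, with made-up IDs.
const nodes = [
  { id: "author-maya-chen-stanford", documents: ["paper-crispr-01", "dataset-crispr-01", "protocol-crispr-02"] },
  { id: "author-maya-chen-mit", documents: ["paper-materials-07"] },
  { id: "author-luis-ortega", documents: ["paper-crispr-01", "dataset-crispr-01"] }
];

console.table(sketchCollaborationEdges(nodes));
```

With these nodes the only edge links the Stanford author to Luis Ortega with weight 2, and the MIT homonym stays disconnected. That is the outcome the disambiguation step is meant to protect.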

## Distinctness

This slice is intentionally narrower than broad knowledge graph extractors and navigators. It focuses on author identity, affiliation normalization, homonym safety, and collaboration-edge trust, which are prerequisites for accurate author graphs, lab-to-lab maps, and personalized recommendations.
11 changes: 11 additions & 0 deletions knowledge-graph-author-affiliation-disambiguation/package.json
@@ -0,0 +1,11 @@
{
  "name": "knowledge-graph-author-affiliation-disambiguation",
  "version": "1.0.0",
  "private": true,
  "type": "commonjs",
  "scripts": {
    "check": "node --check src/author-affiliation-disambiguation.js && node --check test/author-affiliation-disambiguation.test.js && node --check scripts/demo.js",
    "test": "node --test test/author-affiliation-disambiguation.test.js",
    "demo": "node scripts/demo.js"
  }
}
25 changes: 25 additions & 0 deletions knowledge-graph-author-affiliation-disambiguation/scripts/demo.js
@@ -0,0 +1,25 @@
const mentions = require("../data/sample-mentions.json");
const {createAuthorGraph, toSchemaOrg} = require("../src/author-affiliation-disambiguation");

const graph = createAuthorGraph(mentions);

console.log("Author nodes");
console.table(graph.authorNodes.map((node) => ({
  id: node.id,
  name: node.name,
  mentions: node.mentionIds.length,
  documents: node.documents.length,
  concepts: node.concepts.length
})));

console.log("\nCollaboration edges");
console.table(graph.collaborationEdges);

console.log("\nCurator queue");
console.table(graph.curatorQueue);

console.log("\nRecommendation guards");
console.log(JSON.stringify(graph.recommendationGuards, null, 2));

console.log("\nSchema.org export");
console.log(JSON.stringify(toSchemaOrg(graph), null, 2));