5 changes: 3 additions & 2 deletions mkdocs.yaml
@@ -10,8 +10,6 @@ nav:
- explanation/index.md
- Overview:
- Data Pipelines: explanation/data-pipelines.md
- What's New in 2.0: explanation/whats-new-2.md
- What's New in 2.2: explanation/whats-new-22.md
- FAQ: explanation/faq.md
- Data Model:
- Relational Workflow Model: explanation/relational-workflow-model.md
@@ -127,6 +125,9 @@ nav:
- API: api/ # Auto-generated via gen-files + literate-nav
- About:
- about/index.md
- What's New in 2.2: about/whats-new-22.md
- What's New in 2.1: about/whats-new-21.md
- What's New in 2.0: about/whats-new-2.md
- History: about/history.md
- Documentation Versioning: about/versioning.md
- Platform: https://www.datajoint.com/sign-up
2 changes: 1 addition & 1 deletion src/.overrides/partials/announce.html
@@ -1,5 +1,5 @@
{% if config.extra.datajoint_version %}
<a href="{{ 'explanation/whats-new-2/' | url }}">
<a href="{{ 'about/whats-new-2/' | url }}">
Documentation for DataJoint {{ config.extra.datajoint_version }}
</a>
{% endif %}
2 changes: 1 addition & 1 deletion src/about/versioning.md
@@ -87,7 +87,7 @@ print(dj.__version__)

If you're upgrading from legacy DataJoint (pre-2.0):

1. **Review** the [What's New in 2.0](../explanation/whats-new-2.md) page to understand major changes
1. **Review** the [What's New in 2.0](whats-new-2.md) page to understand major changes
2. **Follow** the [Migration Guide](../how-to/migrate-to-v20.md) for step-by-step upgrade instructions
3. **Reference** this documentation for updated syntax and APIs

26 changes: 9 additions & 17 deletions src/explanation/whats-new-2.md → src/about/whats-new-2.md
@@ -274,20 +274,12 @@ Most users complete Phases 1-2 in a single session. Phases 3-4 only apply if you

## See Also

### Migration
- **[Migration Guide](../how-to/migrate-to-v20.md/)** — Complete upgrade instructions
- [Configuration](../how-to/configure-database.md/) — Setup new configuration system

### Core Concepts
- [Type System](type-system.md) — Understand the three-tier type architecture
- [Computation Model](computation-model.md) — Jobs 2.0 and AutoPopulate
- [Query Algebra](query-algebra.md) — Semantic matching and operators

### Getting Started
- [Installation](../how-to/installation.md/) — Install DataJoint 2.0
- [Tutorials](../tutorials/index.md/) — Learn by example

### Reference
- [Type System Specification](../reference/specs/type-system.md/) — Complete type system details
- [Codec API](../reference/specs/codec-api.md/) — Build custom codecs
- [AutoPopulate Specification](../reference/specs/autopopulate.md/) — Jobs 2.0 reference
- [What's New in 2.1](whats-new-21.md) — Next release
- [Release Notes (v2.0.0)](https://github.com/datajoint/datajoint-python/releases/tag/v2.0.0) — GitHub changelog
- **[Migration Guide](../how-to/migrate-to-v20.md)** — Complete upgrade instructions
- [Configuration](../how-to/configure-database.md) — Setup new configuration system
- [Type System](../explanation/type-system.md) — Understand the three-tier type architecture
- [Computation Model](../explanation/computation-model.md) — Jobs 2.0 and AutoPopulate
- [Query Algebra](../explanation/query-algebra.md) — Semantic matching and operators
- [Installation](../how-to/installation.md) — Install DataJoint 2.0
- [Tutorials](../tutorials/index.md) — Learn by example
125 changes: 125 additions & 0 deletions src/about/whats-new-21.md
@@ -0,0 +1,125 @@
# What's New in DataJoint 2.1

DataJoint 2.1 adds **PostgreSQL as a production backend**, **enhanced diagram visualization**, and **singleton tables**.

> **Upgrading from 2.0?** No breaking changes. All existing code continues to work. New features are purely additive.

> **Citation:** Yatsenko D, Nguyen TT. *DataJoint 2.0: A Computational Substrate for Agentic Scientific Workflows.* arXiv:2602.16585. 2026. [doi:10.48550/arXiv.2602.16585](https://doi.org/10.48550/arXiv.2602.16585)

## PostgreSQL Backend

DataJoint now supports PostgreSQL 15+ as a production database backend alongside MySQL 8+. The adapter architecture generates backend-specific SQL while maintaining a consistent API — the same table definitions, queries, and pipeline logic work on both backends.

```bash
export DJ_BACKEND=postgresql
export DJ_HOST=localhost
export DJ_PORT=5432
```

Or configure programmatically:

```python
dj.config['database.backend'] = 'postgresql'
```

All core types (`int32`, `float64`, `varchar`, `uuid`, `json`), codec types (`<blob>`, `<attach>`, `<object@>`), query operations, foreign keys, indexes, and auto-populate work identically across backends. Backend-specific differences are handled internally by the adapter layer.
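
The adapter idea can be illustrated with a minimal sketch. The names below (`MySQLAdapter`, `PostgreSQLAdapter`, `quote_identifier`, `full_table_name`) are hypothetical and are not DataJoint's actual adapter API. The only fact relied on is that MySQL quotes identifiers with backticks while PostgreSQL uses double quotes.

```python
class MySQLAdapter:
    """Hypothetical adapter: MySQL quotes identifiers with backticks."""

    def quote_identifier(self, name: str) -> str:
        return "`" + name.replace("`", "``") + "`"


class PostgreSQLAdapter:
    """Hypothetical adapter: PostgreSQL quotes identifiers with double quotes."""

    def quote_identifier(self, name: str) -> str:
        return '"' + name.replace('"', '""') + '"'


def full_table_name(adapter, schema: str, table: str) -> str:
    # Backend-neutral caller: the adapter alone decides the quoting style.
    return f"{adapter.quote_identifier(schema)}.{adapter.quote_identifier(table)}"


mysql_name = full_table_name(MySQLAdapter(), "lab", "session")
postgres_name = full_table_name(PostgreSQLAdapter(), "lab", "session")
print(mysql_name)
print(postgres_name)
```

The calling code never branches on the backend, which is the property the paragraph above describes: the same pipeline logic runs on either database.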

See [Database Backends](../reference/specs/database-backends.md) for the full specification.

## Diagram Enhancements

`dj.Diagram` gains several visualization features for working with complex, multi-schema pipelines.

### Layout Direction

Control the flow direction of diagrams:

```python
# Horizontal layout
dj.config.display.diagram_direction = "LR"

# Or temporarily
with dj.config.override(display__diagram_direction="LR"):
    dj.Diagram(schema).draw()
```

| Value | Description |
|-------|-------------|
| `"TB"` | Top to bottom (default) |
| `"LR"` | Left to right |

### Mermaid Output

Generate [Mermaid](https://mermaid.js.org/) syntax for embedding diagrams in Markdown, GitHub, or web documentation:

```python
print(dj.Diagram(schema).make_mermaid())
```

Save directly to `.mmd` or `.mermaid` files:

```python
dj.Diagram(schema).save("pipeline.mmd")
```
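
For readers unfamiliar with Mermaid, the sketch below shows roughly what flowchart text looks like for a small dependency graph, and how a `"TB"` or `"LR"` direction maps onto the `flowchart` header. `to_mermaid` is a hypothetical helper written for illustration; the exact text produced by `make_mermaid()` is not shown in this page and may differ.

```python
def to_mermaid(edges, direction="TB"):
    """Render (parent, child) pairs as Mermaid flowchart text (hypothetical helper)."""
    if direction not in ("TB", "LR"):
        raise ValueError("direction must be 'TB' or 'LR'")
    lines = [f"flowchart {direction}"]
    for parent, child in edges:
        # Each foreign-key dependency becomes one arrow from parent to child.
        lines.append(f"    {parent} --> {child}")
    return "\n".join(lines)


text = to_mermaid([("Subject", "Session"), ("Session", "Trial")], direction="LR")
print(text)
```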

### Schema Grouping

Multi-schema diagrams automatically group tables into visual clusters by database schema. The cluster label shows the Python module name when available, following the DataJoint convention of one module per schema.

```python
combined = dj.Diagram(schema1) + dj.Diagram(schema2)
combined.draw() # tables grouped by schema
```

### Collapsing Schemas

For high-level pipeline views, collapse entire schemas into single nodes:

```python
# Show schema1 expanded, schema2 as a single node with table count
dj.Diagram(schema1) + dj.Diagram(schema2).collapse()
```

The **"expanded wins" rule** applies: if a table appears in both a collapsed and non-collapsed diagram, it stays expanded. This allows showing specific tables while collapsing the rest:

```python
# Subject is expanded, rest of analysis schema is collapsed
dj.Diagram(Subject) + dj.Diagram(analysis).collapse()
```
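
The "expanded wins" rule amounts to a set difference. The following is a minimal sketch under that reading; `resolve_collapse` and the table names are hypothetical and do not reflect how `dj.Diagram` stores its state.

```python
def resolve_collapse(expanded: set, collapsed: set) -> set:
    """Return the tables that end up hidden inside collapsed schema nodes."""
    # "Expanded wins": a table present in any expanded diagram stays visible.
    return collapsed - expanded


expanded = {"Subject"}
collapsed = {"Subject", "Analysis1", "Analysis2"}  # hypothetical schema contents
hidden = resolve_collapse(expanded, collapsed)
print(sorted(hidden))
```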

See [Diagram Specification](../reference/specs/diagram.md) for the full reference.

## Singleton Tables

A **singleton table** holds at most one row. Declare it with no attributes in the primary key section:

```python
@schema
class Config(dj.Lookup):
    definition = """
    # Global configuration
    ---
    setting1 : varchar(100)
    setting2 : int32
    """
```

| Operation | Result |
|-----------|--------|
| Insert | Works without specifying a key |
| Second insert | Raises `DuplicateError` |
| `fetch1()` | Returns the single row |

Singleton tables are useful for global configuration, pipeline parameters, and summary statistics.
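
One way to see why an empty primary key implies at most one row: every row maps to the same (empty) key, so a second insert collides with the first. The in-memory model below is only an illustration of that reasoning. `SingletonModel` and the local `DuplicateError` class are stand-ins defined here, not DataJoint classes.

```python
class DuplicateError(Exception):
    """Local stand-in for the error raised on a duplicate primary key."""


class SingletonModel:
    """In-memory model of a table whose primary key has no attributes."""

    def __init__(self):
        self._rows = {}  # maps primary-key tuple -> row

    def insert1(self, row: dict) -> None:
        key = ()  # empty primary key: every row has the same key
        if key in self._rows:
            raise DuplicateError("singleton table already holds a row")
        self._rows[key] = dict(row)

    def fetch1(self) -> dict:
        return dict(self._rows[()])


config = SingletonModel()
config.insert1({"setting1": "default", "setting2": 42})
try:
    config.insert1({"setting1": "other", "setting2": 7})
    second_insert_failed = False
except DuplicateError:
    second_insert_failed = True
print(config.fetch1(), second_insert_failed)
```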

See [Table Declaration](../reference/specs/table-declaration.md#25-singleton-tables-empty-primary-keys) for details.

## See Also

- [Database Backends](../reference/specs/database-backends.md) — Full backend specification
- [Diagram Specification](../reference/specs/diagram.md) — Diagram reference
- [Table Declaration](../reference/specs/table-declaration.md) — Singleton tables
- [Configure Database](../how-to/configure-database.md) — Connection setup for both backends
- [What's New in 2.0](whats-new-2.md) — Previous release
- [What's New in 2.2](whats-new-22.md) — Next release
- [Release Notes (v2.1.0)](https://github.com/datajoint/datajoint-python/releases/tag/v2.1.0) — GitHub changelog
64 changes: 57 additions & 7 deletions src/explanation/whats-new-22.md → src/about/whats-new-22.md
@@ -213,7 +213,7 @@ In prior versions, `dj.Diagram` existed solely for visualization — drawing the
- **PostgreSQL** aborts the entire transaction on any error, requiring `SAVEPOINT` / `ROLLBACK TO SAVEPOINT` round-trips for each failed delete attempt.
- **Fragile error parsing** across MySQL versions and privilege levels, where different configurations produce different error message formats.

In 2.2, `Table.delete()` and `Table.drop()` use `dj.Diagram` internally to compute the dependency graph and walk it in reverse topological order — deleting leaves first, with no trial-and-error needed. The user-facing behavior of `Table.delete()` is unchanged. The Diagram's `cascade()` and `preview()` methods are available as a public inspection API for understanding cascade impact before executing.
In 2.2, `Table.delete()` and `Table.drop()` use `dj.Diagram` internally to compute the dependency graph and walk it in reverse topological order — deleting leaves first, with no trial-and-error needed. The user-facing behavior of `Table.delete()` is unchanged. The Diagram's `cascade()` and `counts()` methods are available as a public inspection API for understanding cascade impact before executing.
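
The ordering idea can be sketched with Python's standard-library `graphlib`. This is not DataJoint's implementation; the table names are hypothetical, and the dictionary simply maps each table to the tables it references.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each table maps to the set of tables it references (its parents).
parents = {
    "subject": set(),
    "session": {"subject"},
    "trial": {"session"},
    "processed_data": {"trial"},
}

# static_order() yields parents before children.
forward = list(TopologicalSorter(parents).static_order())

# Reversing gives leaves first, so no row is deleted while a child still references it.
delete_order = list(reversed(forward))
print(delete_order)
```

Because the order is computed from the graph up front, no delete has to fail in order to discover a dependency.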

### The Preview-Then-Execute Pattern

@@ -225,7 +225,7 @@ diag = dj.Diagram(schema)
restricted = diag.cascade(Session & {'subject_id': 'M001'})

# Inspect: what tables and how many rows would be affected?
counts = restricted.preview()
counts = restricted.counts()
# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45}

# Execute via Table.delete() after reviewing the blast radius
@@ -238,11 +238,11 @@ This is valuable when working with unfamiliar pipelines, large datasets, or mult

The diagram supports two restriction propagation modes designed for fundamentally different tasks.

**`cascade()` prepares a delete.** It takes a single restricted table expression, propagates the restriction downstream through all descendants, and **trims the diagram** to the resulting subgraph — ancestors and unrelated tables are removed entirely. Convergence uses OR: a descendant row is marked for deletion if *any* ancestor path reaches it, because if any reason exists to remove a row, it should be removed. `cascade()` is one-shot and is always followed by `preview()` or `delete()`.
**`cascade()` prepares a delete.** It takes a single restricted table expression, propagates the restriction downstream through all descendants, and **trims the diagram** to the resulting subgraph — ancestors and unrelated tables are removed entirely. Convergence uses OR: a descendant row is marked for deletion if *any* ancestor path reaches it, because if any reason exists to remove a row, it should be removed. `cascade()` is one-shot and is always followed by `counts()` or `delete()`.

When the cascade encounters a part table whose master is not yet included in the cascade, the behavior depends on the `part_integrity` setting. With `"enforce"` (the default), `delete()` raises an error if part rows would be deleted without their master — preventing orphaned master rows. With `"cascade"`, the restriction propagates *upward* from the part to its master: the restricted part rows identify which master rows are affected, those masters receive a restriction, and that restriction then propagates back downstream to all sibling parts — deleting the entire compositional unit, not just the originally matched part rows.

**`restrict()` selects a data subset.** It propagates a restriction downstream but **preserves the full diagram**, allowing `restrict()` to be called again from a different seed table. This makes it possible to build up multi-condition subsets incrementally — for example, restricting by species from one table and by date from another. Convergence uses AND: a descendant row is included only if *all* restricted ancestors match, because an export should contain only rows satisfying every condition. After chaining restrictions, use `prune()` to remove empty tables and `preview()` to inspect the result.
**`restrict()` selects a data subset.** It propagates a restriction downstream but **preserves the full diagram**, allowing `restrict()` to be called again from a different seed table. This makes it possible to build up multi-condition subsets incrementally — for example, restricting by species from one table and by date from another. Convergence uses AND: a descendant row is included only if *all* restricted ancestors match, because an export should contain only rows satisfying every condition. After chaining restrictions, use `prune()` to remove empty tables and `counts()` to inspect the result.

The two modes are mutually exclusive on the same diagram — DataJoint raises an error if you attempt to mix `cascade()` and `restrict()`, or if you call `cascade()` more than once. This prevents accidental mixing of incompatible semantics: a delete diagram should never be reused for subsetting, and vice versa.
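
The difference between OR and AND convergence can be shown with plain sets. The row ids below are made up for illustration: they stand for the rows of one descendant table that are reached through two different ancestor paths.

```python
# Hypothetical row ids of one descendant table, reached through two ancestor paths.
reached_via_path_a = {1, 2, 3}
reached_via_path_b = {3, 4}

# cascade(): OR convergence, a row is affected if any path reaches it.
cascade_rows = reached_via_path_a | reached_via_path_b

# restrict(): AND convergence, a row is kept only if every restricted path matches.
restrict_rows = reached_via_path_a & reached_via_path_b

print(sorted(cascade_rows))
print(sorted(restrict_rows))
```

The cascade set is never smaller than the restrict set, which fits the intent of each mode: deletes must be exhaustive, while exports should be as narrow as the stated conditions.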

@@ -256,15 +256,63 @@ export = (dj.Diagram(schema)
.restrict(Session & 'session_date > "2024-01-01"')
.prune())

export.preview() # only tables with matching rows
export.counts() # only tables with matching rows
export # visualize the export subgraph
```

Without prior restrictions, `prune()` removes physically empty tables. This is useful for understanding which parts of a pipeline are populated.

### Restriction Propagation Rules

When `cascade()` or `restrict()` propagates a restriction from a parent to a child, one of three rules applies depending on the foreign key relationship:

| Rule | Condition | Child restriction |
|------|-----------|-------------------|
| **Direct copy** | Non-aliased FK, restriction attributes are a subset of child's primary key | Restriction copied directly |
| **Aliased projection** | FK uses attribute renaming (e.g., `subject_id` → `animal_id`) | Parent projected with attribute mapping |
| **Full projection** | Non-aliased FK, restriction uses attributes not in child's primary key | Parent projected (all attributes) as restriction |

When a child has multiple restricted ancestors, convergence depends on the mode: `cascade()` uses OR (any path marks a row for deletion), `restrict()` uses AND (all conditions must match).

When a child references the same parent through multiple foreign keys (e.g., `source_mouse` and `target_mouse` both referencing `Mouse`), these paths always combine with OR regardless of the mode — each FK path is an independent reason for the child row to be affected.
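
The rule table above can be read as a small decision function. The sketch below encodes only what the table states; `propagation_rule` and its parameters are hypothetical names, and the real implementation may inspect foreign keys differently.

```python
def propagation_rule(fk_is_aliased: bool, restriction_attrs: set, child_primary_key: set) -> str:
    """Pick the propagation rule for one parent-to-child foreign key (illustrative only)."""
    if fk_is_aliased:
        # The FK renames attributes, so the parent is projected with the attribute mapping.
        return "aliased projection"
    if restriction_attrs <= child_primary_key:
        # Every restricted attribute exists in the child's primary key.
        return "direct copy"
    # Non-aliased FK, but the restriction uses attributes the child key lacks.
    return "full projection"


print(propagation_rule(False, {"subject_id"}, {"subject_id", "session_id"}))
print(propagation_rule(True, {"subject_id"}, {"animal_id"}))
print(propagation_rule(False, {"species"}, {"subject_id", "session_id"}))
```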

### Dry Run

`Table.delete()` and `Table.drop()` accept a `dry_run` parameter that returns affected row counts without modifying data:

```python
# Preview what would be deleted
(Session & {'subject_id': 'M001'}).delete(dry_run=True)
# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45}

# Preview what would be dropped
Session.drop(dry_run=True)
# {'`lab`.`session`': 100, '`lab`.`trial`': 5000}
```

### Unloaded Schema Detection

If a descendant table lives in a schema that hasn't been activated, the graph-driven delete won't know about it. When the final `DELETE` fails with a foreign key error, DataJoint catches it and produces an actionable error message identifying which schema needs to be activated — rather than the opaque crash of the prior implementation.

### Iteration API

Diagrams support Python's iteration protocol, yielding `FreeTable` objects in topological order:

```python
# Forward iteration (parents first) — useful for export/inspection
for ft in diagram:
    print(ft.full_table_name, len(ft))

# Reverse iteration (leaves first) — used by delete and drop
for ft in reversed(diagram):
    ft.delete_quick()
```

Each yielded `FreeTable` carries any cascade or restrict conditions that have been applied. `Table.delete()` and `Table.drop()` use `reversed(diagram)` internally, replacing the manual `topo_sort()` loops from prior implementations.

### Architecture

`Table.delete()` constructs a `Diagram` internally, calls `cascade()` to compute the affected subgraph, then executes the delete itself in reverse topological order. The Diagram is purely a graph computation and inspection tool — it computes the cascade and provides `preview()`, but all mutation logic (transactions, SQL execution, prompts) lives in `Table.delete()` and `Table.drop()`.
`Table.delete()` constructs a `Diagram` internally, calls `cascade()` to compute the affected subgraph, then iterates `reversed(diagram)` to delete leaves first. The Diagram is purely a graph computation and inspection tool — it computes the cascade and provides `counts()` and iteration, but all mutation logic (transactions, SQL execution, prompts) lives in `Table.delete()` and `Table.drop()`.

### Advantages over Error-Driven Cascade

@@ -278,10 +326,12 @@ The graph-driven approach resolves every known limitation of the prior error-dri
| Part integrity enforcement | Post-hoc check after delete | Data-driven post-check (no false positives) |
| Unloaded schemas | Crash with opaque error | Clear error: "activate schema X" |
| Reusability | Delete-only | Delete, drop, export, prune |
| Inspectability | Opaque recursive cascade | `preview()` / `dry_run` before executing |
| Inspectability | Opaque recursive cascade | `counts()` / `dry_run` before executing |

## See Also

- [What's New in 2.1](whats-new-21.md) — Previous release
- [Release Notes (v2.2.0)](https://github.com/datajoint/datajoint-python/releases) — GitHub changelog
- [Use Isolated Instances](../how-to/use-instances.md) — Task-oriented guide
- [Working with Instances](../tutorials/advanced/instances.ipynb) — Step-by-step tutorial
- [Configuration Reference](../reference/configuration.md) — Thread-safe mode settings
2 changes: 1 addition & 1 deletion src/explanation/index.md
@@ -53,7 +53,7 @@ and scalable.

How DataJoint ensures safe joins through attribute lineage tracking.

- :material-new-box: **[What's New in 2.0](whats-new-2.md)**
- :material-new-box: **[What's New in 2.0](../about/whats-new-2.md)**

Major changes, new features, and migration guidance for DataJoint 2.0.
