Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
285 changes: 174 additions & 111 deletions PYDANTIC_GUIDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -542,11 +542,119 @@ Inheriting from `str, Enum` makes enum values work as both enums and strings, wh

### Relationship Patterns

**What are relationships?** Relationships (or "associations") represent connections between different features or models. Think of them like links that connect related pieces of information - for example, a city center that belongs to an administrative area, or a transportation segment that connects to specific intersections.
#### What are relationships?

Pydantic provides several ways to express these relationships, each suited to different use cases and complexity levels.
Relationships (or "associations") represent connections between different features or models. Think of them like links that connect related pieces of information — for example, a building part that is structurally part of a building, or a division area that is administratively nested under a division.

#### 1. Direct References (Foreign Keys)
Pydantic provides several ways to express these relationships, each suited to different use cases and complexity levels. Before choosing a pattern, it's important to understand the **semantic type** of the relationship you're modeling.

---

#### Semantic Relationship Types

Every relationship between two features carries a semantic meaning about coupling strength, lifecycle dependency, and ownership. The schema defines four relationship types, ordered from strongest to weakest coupling.

The types describe the *nature* of the link, not which feature is "parent" or "child." Direction is implicit: the feature holding the reference is the source, and the type it references is the destination.

```python
class Relationship(str, Enum):
"""Semantic relationship types, ordered by coupling strength."""

# Composition: structural whole-part relationship.
# Lifecycle dependent — the part cannot exist without the whole.
# Also covers geometric roles like boundary definitions.
COMPOSITION = "composition"

# Aggregation: grouping relationship without lifecycle dependency.
# The member is independently viable and survives reassignment or orphaning.
AGGREGATION = "aggregation"

# Hierarchy: organizational or classificatory nesting.
# Not structural assembly — administrative parentage, taxonomy, categorization.
HIERARCHY = "hierarchy"

# Association: peer-level reference with no ownership or nesting.
# Neither feature depends on or contains the other.
ASSOCIATION = "association"
```

##### `COMPOSITION` — Structural Whole-Part

A structural whole-part relationship with lifecycle dependency. The part has no independent meaning outside the whole. Deleting the whole invalidates the part.

**Test question:** *"If I delete the whole, does keeping the part orphaned make any sense at all?"* If the answer is no, it's `COMPOSITION`.

**Examples:**

- `Building` → `BuildingPart` — building *has parts*
- `BuildingPart` → `Building` — part *is part of* building
- `BoundaryLine` → `Area` — line *defines the boundary of* area

##### `AGGREGATION` — Grouping Without Lifecycle Dependency

A grouping or collection relationship where the member is independently viable. No lifecycle dependency — the member survives reassignment to another group or orphaning.

**Test question:** *"Can this thing belong to something else or nothing and still be a valid map feature?"* If yes, it's `AGGREGATION`.

**Examples:**

- `Route` → `Segment` — route *groups* segments
- `TrailSegment` → `NationalPark` — segment *is grouped by* park

##### `HIERARCHY` — Organizational Nesting

An organizational or classificatory nesting relationship. This is not about structural assembly — it's about administrative parentage, taxonomy, or categorization, with property inheritance down the tree.

**Test question:** *"Is this about organizational belonging rather than structural assembly?"* If yes, it's `HIERARCHY`.

**Examples:**

- `Division` → `DivisionArea` — division *has child* areas
- `DivisionArea` → `Division` — area *is child of* division
- `Place` → `AdminArea` — place nested under admin area
- `POICategory` → `POI` — category *has child* POIs

##### `ASSOCIATION` — Peer-Level Reference

A peer-level reference with no ownership, containment, or nesting. Neither feature depends on or contains the other. This is the fallback when none of the stronger types apply.

**Test question:** *"Are these just peers that know about each other?"* If yes, it's `ASSOCIATION`.

**Examples:**

- `Building` → `Address` — a building references its address, neither owns the other
- `Bridge` → `River` — the bridge crosses the river, no ownership implied
- `Segment` → `Connector` — segment references its start/end connector

---

#### Selection Priority

When a relationship could fit multiple types, the choice is not a strict linear ranking but a **diamond decision**: start at the top, fork in the middle based on the *kind* of coupling, and fall through to the bottom only when no stronger type applies.

```text
COMPOSITION
/ \
AGGREGATION HIERARCHY
\ /
ASSOCIATION
```

The rationale is **information preservation**: each tier above association adds a guarantee (lifecycle, structural grouping, or organizational nesting). Drop down only when the stronger commitment would be a false promise.

| If the relationship implies... | Use |
| ------------------------------------------------------- | ---------------- |
| Structural whole-part with lifecycle dependency | `COMPOSITION` |
| Geometric boundary definition (lifecycle dependent) | `COMPOSITION` |
| Grouping/collection without lifecycle dependency | `AGGREGATION` |
| Organizational nesting or classification tree | `HIERARCHY` |
| Peer-level reference, no ownership or nesting | `ASSOCIATION` |

---

#### Implementation Patterns

##### 1. Direct References (Foreign Keys)

The fundamental pattern is a direct reference where one feature "points to" another using an ID field with type safety and semantic information. This creates a one-way relationship - like a building knowing which neighborhood it belongs to, but the neighborhood doesn't automatically know about all its buildings.

Expand All @@ -556,165 +664,120 @@ from pydantic import Field
from overture.schema.core import OvertureFeature
from overture.schema.system.ref import Id, Reference, Relationship

class DivisionArea(OvertureFeature[Literal["divisions"], Literal["division_area"]]):
"""Area polygon that belongs to a division."""
# COMPOSITION — building lists its parts ("has part")
class Building(OvertureFeature[Literal["buildings"], Literal["building"]]):
"""A building composed of structural parts."""

part_ids: Annotated[
list[Id] | None,
Reference(Relationship.COMPOSITION, BuildingPart),
Field(description="Building parts that compose this building"),
] = None

# COMPOSITION — part points to its whole ("part of")
class BuildingPart(OvertureFeature[Literal["buildings"], Literal["building_part"]]):
"""A structural part of a building."""

# Required reference - every division area must belong to a division
division_id: Annotated[
building_id: Annotated[
Id,
Reference(Relationship.BELONGS_TO, Division),
Field(description="Division this area belongs to")
Reference(Relationship.COMPOSITION, Building),
Field(description="Building this part structurally belongs to")
]

# Optional reference - may or may not be associated with a place
place_id: Annotated[
# ASSOCIATION — peer reference, no ownership
class Building(OvertureFeature[Literal["buildings"], Literal["building"]]):
"""A building feature."""

address_id: Annotated[
Id | None,
Reference(Relationship.CONNECTS_TO, Place),
Field(description="Place this area is associated with")
Reference(Relationship.ASSOCIATION, Address),
Field(description="Address associated with this building")
] = None
```

**Available relationship types (see [Relationship](packages/overture-schema-system/src/overture/schema/system/ref/ref.py)):**
##### 2. Association as a Separate Feature (Complex Relationships)

- **`BELONGS_TO`**: The referencing feature belongs to the referenced feature (division area belongs to division)
- **`CONNECTS_TO`**: The referencing feature connects to the referenced feature (segment connects to connector)
- **`BOUNDARY_OF`**: The referencing feature forms a boundary of the referenced feature (boundary line defines area)

#### 2. Association as a Separate Feature (Complex Relationships)

When the relationship itself needs to store information, create a dedicated feature to represent that relationship. This is like creating a "relationship record" that describes how two things are connected.
When the relationship itself needs to store information, create a dedicated feature to represent it. This applies regardless of the semantic type — any of the four types can carry metadata.

**Simple relationship (use Pattern 1):**

- "Building A belongs to Neighborhood B" - just needs an ID reference
- "Building Part A is part of Building B" just needs an ID reference.

**Complex relationship (use Pattern 2):**

- "Admin Area X has City Center Y as its primary center since 2010 with 85% confidence" - the relationship has properties (`type=primary`, `date=2010`, `confidence=85%`)
- "Admin Area X has City Center Y as its primary center since 2010 with 85% confidence" the relationship has properties.

```python
class AdminCityCenterAssociation(OvertureFeature[Literal["associations"], Literal["admin_city_center"]]):
class AdminCityCenterAssociation(
OvertureFeature[Literal["associations"], Literal["admin_city_center"]]
):
"""Describes how an administrative area relates to a city center."""

# The two things being connected
admin_area_id: Annotated[Id, Reference(Relationship.CONNECTS_TO, AdminArea)]
city_center_id: Annotated[Id, Reference(Relationship.CONNECTS_TO, CityCenter)]
admin_area_id: Annotated[
Id,
Reference(Relationship.ASSOCIATION, AdminArea)
]
city_center_id: Annotated[
Id,
Reference(Relationship.ASSOCIATION, CityCenter)
]

# Information about the relationship itself
relationship_type: Literal["primary_center", "secondary_center"] = "primary_center"
established_date: str | None = None
confidence_score: Annotated[float64, Field(ge=0.0, le=1.0)] | None = None
```

When to use separate association features:
**When to use separate association features:**

- The relationship has properties: confidence scores, dates, types, notes
- Many-to-many connections: one admin area can have multiple city centers, one city center can serve multiple admin areas
- You need to query the relationships: "show me all primary city center relationships established after 2015"
- The relationship has properties (confidence scores, dates, types, notes).
- Many-to-many connections exist (one admin area can have multiple city centers, one city center can serve multiple admin areas).
- You need to query the relationships independently ("show me all primary city center relationships established after 2015").

This focuses on the core concept: when relationships carry data, they become features themselves.
##### 3. Collection References

#### 3. Collection References

When a feature needs to reference multiple other features, use a list of references:
When a feature needs to reference multiple other features, use a list of references. The semantic type still matters — a route aggregating segments is `AGGREGATION`, not `ASSOCIATION`.

```python
# AGGREGATION — segments are independently viable features
class Route(OvertureFeature[Literal["transportation"], Literal["route"]]):
"""A transportation route that passes through multiple segments."""

segment_ids: Annotated[
list[Id],
Field(min_length=1, description="Ordered list of segments in this route"),
UniqueItemsConstraint(), # No duplicate segments
Reference(Relationship.CONNECTS_TO, TransportationSegment) # All IDs reference segments
Reference(Relationship.AGGREGATION, TransportationSegment),
Field(min_length=1, description="Ordered segments aggregated into this route"),
UniqueItemsConstraint(),
]

class Building(OvertureFeature[Literal["buildings"], Literal["building"]]):
"""A building that may contain multiple building parts."""

part_ids: Annotated[
list[Id] | None,
Field(description="Building parts that belong to this building"),
Reference(Relationship.BOUNDARY_OF, BuildingPart) # All IDs reference building parts
] = None
```

---

#### Best Practices

##### Always Use Reference Annotations

Include `Reference` annotations for semantic clarity and documentation:

```python
# Good - complete relationship information
# Good complete relationship information with semantic type
division_id: Annotated[
Id,
Reference(Relationship.BELONGS_TO, Division),
Field(description="Division this area belongs to")
Reference(Relationship.HIERARCHY, Division),
Field(description="Division this area is administratively nested under")
]

# Avoid - missing semantic information
# Avoid missing semantic information
division_id: Id
```

##### Choose the Right Pattern

- **Simple relationships** → Direct references (foreign keys)
- **Relationships with metadata** → Separate association features

```python
# Simple: just a link
admin_area_id: Annotated[Id, Reference(Relationship.BELONGS_TO, AdminArea)]
##### Choose the Right Semantic Type First, Then the Right Pattern

# Complex: relationship has properties
class AdminCityCenterAssociation(Feature[...]):
admin_area_id: Annotated[Id, Reference(Relationship.CONNECTS_TO, AdminArea)]
city_center_id: Annotated[Id, Reference(Relationship.CONNECTS_TO, CityCenter)]
relationship_type: Literal["primary", "secondary"] = "primary"
```

#### Association Pattern Examples

**Transportation Network:**

```python
# Segments connect to connectors (intersection points)
class Segment(OvertureFeature[Literal["transportation"], Literal["segment"]]):
from_connector_id: Annotated[Id, Reference(Relationship.CONNECTS_TO, Connector)]
to_connector_id: Annotated[Id, Reference(Relationship.CONNECTS_TO, Connector)]

# Routes contain multiple segments in order
class Route(Feature[Literal["transportation"], Literal["route"]]):
segment_ids: Annotated[
list[Id],
Reference(Relationship.CONNECTS_TO, TransportationSegment),
Field(description="Ordered list of segments in this route")
]
```

**Administrative Hierarchy:**

```python
# Division areas belong to divisions
class DivisionArea(OvertureFeature[Literal["divisions"], Literal["division_area"]]):
division_id: Annotated[Id, Reference(Relationship.BELONGS_TO, Division)]

# Places belong to administrative areas
class Place(OvertureFeature[Literal["places"], Literal["place"]]):
admin_area_id: Annotated[Id | None, Reference(Relationship.BELONGS_TO, AdminArea)] = None
```

**Building Relationships:**

```python
# Building parts belong to buildings
class BuildingPart(OvertureFeature[Literal["buildings"], Literal["building_part"]]):
building_id: Annotated[Id, Reference(Relationship.BELONGS_TO, Building)]

# Buildings can reference their address
class Building(OvertureFeature[Literal["buildings"], Literal["building"]]):
address_id: Annotated[Id | None, Reference(Relationship.CONNECTS_TO, Address)] = None
```
1. **Determine the semantic type** using the selection priority and test questions above.
2. **Then choose the implementation pattern:**
- Simple relationships → Direct references (Pattern 1)
- Relationships with metadata → Separate association features (Pattern 2)
- One-to-many references → Collection references (Pattern 3)

### Discriminated Unions

Expand Down Expand Up @@ -1223,13 +1286,13 @@ class MyAssociation(OvertureFeature[Literal["associations"], Literal["my_associa
# References to the associated features
feature_a_id: Annotated[
Id,
Reference(Relationship.CONNECTS_TO, FeatureA),
Reference(Relationship.ASSOCIATION, FeatureA),
Field(description="First feature in the relationship")
]

feature_b_id: Annotated[
Id,
Reference(Relationship.CONNECTS_TO, FeatureB),
Reference(Relationship.ASSOCIATION, FeatureB),
Field(description="Second feature in the relationship")
]

Expand Down Expand Up @@ -1257,7 +1320,7 @@ height: Annotated[float64 | None, Field(ge=0, description="Height in meters")] =
tags: Annotated[list[str] | None, Field(min_length=1), UniqueItemsConstraint()] = None

# Association patterns
parent_id: Annotated[Id | None, Reference(Relationship.BELONGS_TO, ParentModel)] = None
parent_id: Annotated[Id | None, Reference(Relationship.HIERARCHY, ParentModel)] = None
connector_ids: list[Id] # References to multiple related features
```

Expand Down
Loading