[FEATURE] Extensible tag classification model discovery through Entry Points #463

Open
Roel Bollens (RoelBollens-TomTom) wants to merge 23 commits into dev from discovery-rework-with-tags

@RoelBollens-TomTom Roel Bollens (RoelBollens-TomTom) commented Mar 10, 2026

Extensible tag classification model discovery through Entry Points

This replaces the hardcoded model classification system with tag-based classification model discovery through Entry Points. It is based on #440 by Seth and on several (ad-hoc) schema coding sessions in which Seth, Vic, Dana, Tristan and Roel participated.

Model discovery has moved into system, eliminating assumptions about Overture in the process. The hardcoded namespace concept ("overture", "annex") and the ModelKind classifier are replaced with tags -- string labels derived by tag providers. Tags become the filtering, grouping, and classification mechanism for model discovery, driven by introspection and package metadata rather than central coordination.

system provides generic tag-based grouping without understanding what any particular tag means. Any package can register tag providers that classify models without special support in the discovery layer.

Purpose

Tags serve three roles:

  • CLI filtering: select subsets of models for output and codegen (--tag system:feature, --tag draft)
  • Classification and endorsement: distinguish features from extensions, mark models as vetted or approved by an authority
  • Marketplace taxonomy: browse and classify models and extensions in a future extension catalog

These roles overlap -- a tag like overture:theme=buildings serves both filtering and taxonomy. The design accommodates this overlap through structured tags that encode both ownership and dimension.

Tag Format

Tags are strings following the pattern [prefix:]key[=value]:

  • Plain: overture, draft, feature
  • Prefixed: system:extension -- : separates ownership
  • Prefixed k/v: overture:theme=buildings

: signals ownership and enables prefix reservation (see Privileged Packages and Tag Reservation). = signals a dimension with a value (groupable via --group-by). One level of each -- no nested colons or multiple = signs.
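The structure described above can be sketched as a small parser. This is a minimal illustration under stated assumptions: `ParsedTag` and `parse_tag` are hypothetical names, and the rejection rules cover only what this section states (at most one `:` and one `=`, with `:` preceding `=`); character-level validation is left out.

```python
from typing import NamedTuple, Optional


class ParsedTag(NamedTuple):
    prefix: Optional[str]  # ownership part before ':' (None for plain tags)
    key: str               # the tag name or dimension
    value: Optional[str]   # part after '=' (None when no value is given)


def parse_tag(tag: str) -> ParsedTag:
    """Split a tag of the form [prefix:]key[=value] into its parts."""
    if tag.count(":") > 1 or tag.count("=") > 1:
        raise ValueError(f"nested ':' or multiple '=' in tag: {tag!r}")
    head, has_value, value = tag.partition("=")
    prefix, has_prefix, key = head.rpartition(":")
    if ":" in value:
        raise ValueError(f"':' must come before '=' in tag: {tag!r}")
    if not key or (has_prefix and not prefix) or (has_value and not value):
        raise ValueError(f"empty component in tag: {tag!r}")
    return ParsedTag(
        prefix if has_prefix else None,
        key,
        value if has_value else None,
    )
```

For example, `parse_tag("overture:theme=buildings")` yields the prefix `overture`, the key `theme`, and the value `buildings`, while `parse_tag("draft")` yields only a key.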

Minimal launch set

| Tag | Meaning |
| --- | --- |
| feature (was: system:feature) | This model is a feature type (has geometry, inherits from Feature) |
| overture:theme=<theme> | Which Overture theme this belongs to (e.g., buildings, transportation) |
| overture (was: overture:official) | Placeholder for a lifecycle/endorsement tag; exact name deferred pending Dana and Tristan's work on extension lifecycle |

Reserved tags

Tags can be reserved either as simple tags or by namespaces. These are the tags and namespaces that are currently reserved:

| Tag | Reserved for use by |
| --- | --- |
| feature | Tag providers from overture-schema-system |
| overture | Tag providers from overture-schema-core |
| overture:* | Tag providers from overture-schema-core |
| system:* | Tag providers from overture-schema-system |
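A reservation check over the table above could look roughly like the following. This is a sketch, not the code in this PR: the function name `may_emit`, the two lookup dictionaries, and the idea of identifying a provider by its distribution package name are all assumptions made for illustration.

```python
# Reserved plain tags and reserved prefixes, taken from the table above.
RESERVED_TAGS = {
    "feature": "overture-schema-system",
    "overture": "overture-schema-core",
}
RESERVED_PREFIXES = {
    "overture": "overture-schema-core",
    "system": "overture-schema-system",
}


def may_emit(tag: str, package: str) -> bool:
    """Return True if a provider from `package` may emit `tag`.

    Hypothetical rule: a prefixed tag is checked against the prefix
    reservation, a plain tag against the exact-tag reservation, and
    anything unreserved is open to every package.
    """
    if ":" in tag:
        owner = RESERVED_PREFIXES.get(tag.split(":", 1)[0])
    else:
        owner = RESERVED_TAGS.get(tag)
    return owner is None or owner == package
```

Under these assumptions a third-party package can freely emit `draft` or `acme:tier=gold`, but not `feature` or anything under `overture:`.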

Extensions

Additional extensions and their accompanying tags will be introduced in a future PR. Extensions augment existing types with new fields (columns).

| Tag | Meaning |
| --- | --- |
| system:extension | This model is an extension (adds columns/fields to an existing type) |

CLI

The list-types command now supports filtering and grouping by tags. For now, it no longer displays the description or the fully qualified class name. The json-schema and validate commands of the overture-schema CLI and the generate command of the overture-codegen CLI now filter on tags instead of on theme and type. Further changes can be introduced in a future update.

Examples

% overture-schema list-types
address            feature  overture  overture:theme=addresses
bathymetry         feature  overture  overture:theme=base
building           feature  overture  overture:theme=buildings
building_part      feature  overture  overture:theme=buildings
connector          feature  overture  overture:theme=transportation
division           feature  overture  overture:theme=divisions
division_area      feature  overture  overture:theme=divisions
division_boundary  feature  overture  overture:theme=divisions
infrastructure     feature  overture  overture:theme=base
land               feature  overture  overture:theme=base
land_cover         feature  overture  overture:theme=base
land_use           feature  overture  overture:theme=base
place              feature  overture  overture:theme=places
segment            feature  overture  overture:theme=transportation
sources            overture
water              feature  overture  overture:theme=base
% overture-schema list-types --group-by overture:theme
overture:theme=addresses (1)
→ address            feature  overture  overture:theme=addresses

overture:theme=base (6)
→ bathymetry         feature  overture  overture:theme=base
→ infrastructure     feature  overture  overture:theme=base
→ land               feature  overture  overture:theme=base
→ land_cover         feature  overture  overture:theme=base
→ land_use           feature  overture  overture:theme=base
→ water              feature  overture  overture:theme=base

overture:theme=buildings (2)
→ building           feature  overture  overture:theme=buildings
→ building_part      feature  overture  overture:theme=buildings

overture:theme=divisions (3)
→ division           feature  overture  overture:theme=divisions
→ division_area      feature  overture  overture:theme=divisions
→ division_boundary  feature  overture  overture:theme=divisions

overture:theme=places (1)
→ place              feature  overture  overture:theme=places

overture:theme=transportation (2)
→ connector          feature  overture  overture:theme=transportation
→ segment            feature  overture  overture:theme=transportation
% overture-schema list-types --tag overture --exclude-tag overture:theme=base 
address            feature  overture  overture:theme=addresses
building           feature  overture  overture:theme=buildings
building_part      feature  overture  overture:theme=buildings
connector          feature  overture  overture:theme=transportation
division           feature  overture  overture:theme=divisions
division_area      feature  overture  overture:theme=divisions
division_boundary  feature  overture  overture:theme=divisions
place              feature  overture  overture:theme=places
segment            feature  overture  overture:theme=transportation
sources            overture
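The grouping shown in the `--group-by overture:theme` output above can be approximated with a few lines of Python. This is a sketch under assumptions: the input shape (type name mapped to a set of tags) and the function name `group_by_dimension` are hypothetical, and types without the requested dimension are simply left out, which matches `sources` being absent from the grouped output above.

```python
from collections import defaultdict


def group_by_dimension(
    models: dict[str, set[str]], dimension: str
) -> dict[str, list[str]]:
    """Group type names by the value of a key/value tag dimension."""
    prefix = dimension + "="
    groups: dict[str, list[str]] = defaultdict(list)
    for name in sorted(models):
        for tag in sorted(models[name]):
            if tag.startswith(prefix):
                groups[tag].append(name)
    # Sort group headings so the output order is stable.
    return dict(sorted(groups.items()))


example = {
    "building": {"feature", "overture", "overture:theme=buildings"},
    "building_part": {"feature", "overture", "overture:theme=buildings"},
    "address": {"feature", "overture", "overture:theme=addresses"},
    "sources": {"overture"},
}
grouped = group_by_dimension(example, "overture:theme")
```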

Deviations

  • Tag providers are additive only and can't remove existing tags.
  • The execution order of tag providers is non-deterministic.
  • There is currently no warning when the number of tags exceeds a limit.
  • The agreed minimal tag set deviates from the coding session outcome: the namespace was dropped to make the tags less clunky.

Closes #512

@RoelBollens-TomTom Roel Bollens (RoelBollens-TomTom) changed the title [WIP] Extensible tag classification model discovery through Entry Points Extensible tag classification model discovery through Entry Points Mar 25, 2026
@RoelBollens-TomTom Roel Bollens (RoelBollens-TomTom) changed the title Extensible tag classification model discovery through Entry Points [FEATURE] Extensible tag classification model discovery through Entry Points Mar 25, 2026

Left some comments, but I'm generally aligned and would merge once Roel Bollens (@RoelBollens-TomTom) and Seth Fitzsimmons (@mojodna) are jointly aligned on merging.

Left some thoughts on the AND/OR issue in the CLI, probably above there somewhere. 👆


github-actions Bot commented Apr 14, 2026

🗺️ Schema reference docs preview is live!

🌍 Preview https://staging.overturemaps.org/schema/pr/463/schema/index.html
🕐 Updated May 06, 2026 19:03 UTC
📝 Commit e11a1c4
🔧 env SCHEMA_PREVIEW true

Note

♻️ This preview updates automatically with each push to this PR.

Signed-off-by: Roel <75250264+RoelBollens-TomTom@users.noreply.github.com>
Co-authored-by: Seth Fitzsimmons <sethfitz@amazon.com>
Signed-off-by: Roel <75250264+RoelBollens-TomTom@users.noreply.github.com>
Co-authored-by: Seth Fitzsimmons <sethfitz@amazon.com>
Signed-off-by: Roel <75250264+RoelBollens-TomTom@users.noreply.github.com>
Signed-off-by: Roel <75250264+RoelBollens-TomTom@users.noreply.github.com>
… filtering logic

- Removes overture tag provider (was deferred)
- Simplified tags
- Reserved tags instead of reserved namespaces
- Fixes small issue introduced in earlier commit

Signed-off-by: Roel <75250264+RoelBollens-TomTom@users.noreply.github.com>
Signed-off-by: Roel <75250264+RoelBollens-TomTom@users.noreply.github.com>
… CLI commands

Signed-off-by: Roel <75250264+RoelBollens-TomTom@users.noreply.github.com>
Signed-off-by: Roel <75250264+RoelBollens-TomTom@users.noreply.github.com>
Signed-off-by: Roel <75250264+RoelBollens-TomTom@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Seth Fitzsimmons <seth@mojodna.net>
`filter_models` selects feature types from the registry through three
combinators applied to the same tag grammar (plain `feature`,
namespaced `system:extension`, or compound `overture:theme=buildings`):

  --tag     OR      defines scope (any-of)
  --filter  AND     narrows scope (all-of)
  --exclude OR-NOT  subtracts (none-of)
  --type    OR      closed-list match on ModelKey.name (orthogonal)

  T = ⋃ tag predicates       (absent → U)
  F = ⋂ filter predicates    (absent → U)
  E = ⋃ exclude predicates   (absent → ∅)
  result = ((T ∩ F) \ E) restricted to type_names if non-empty

The mental model is procedural: --tag widens, --filter narrows,
--exclude subtracts. Without --tag the scope is every registered
model. An empty selector imposes no filtering.

A `TagSelector` value object carries the three tag predicates:

  class TagSelector:
      include_any: tuple[str, ...] = ()
      require_all: tuple[str, ...] = ()
      exclude_any: tuple[str, ...] = ()

Field names encode the combinator (any-of / all-of / none-of),
deliberately distinct from CLI flag names. Flags are user-facing
affordances; field names are implementation-facing and self-document
at the call site.

`type_names` lives on `filter_models` as a keyword, not on
`TagSelector`. It's a closed-list match on `ModelKey.name`, orthogonal
to the tag predicate algebra. Isolating it makes `TagSelector`'s
purpose statable in one sentence and confines a future fold-in of
`--type` to a kwarg deletion that doesn't disturb `TagSelector`.
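
The algebra above can be sketched in plain Python. The field names and the
`type_names` keyword follow the description above; everything else is an
assumption for illustration: the registry is modeled as a mapping from type
name to a set of tags, tags are compared by exact string equality, and the
helper `matches` is hypothetical.

```python
from collections.abc import Iterable, Mapping, Set
from dataclasses import dataclass


@dataclass(frozen=True)
class TagSelector:
    include_any: tuple[str, ...] = ()  # --tag: any-of, defines scope
    require_all: tuple[str, ...] = ()  # --filter: all-of, narrows scope
    exclude_any: tuple[str, ...] = ()  # --exclude: none-of, subtracts


def matches(selector: TagSelector, tags: Set[str]) -> bool:
    """Evaluate (T ∩ F) \\ E for one model's tag set."""
    in_scope = not selector.include_any or any(
        t in tags for t in selector.include_any
    )
    narrowed = all(t in tags for t in selector.require_all)
    excluded = any(t in tags for t in selector.exclude_any)
    return in_scope and narrowed and not excluded


def filter_models(
    models: Mapping[str, Set[str]],
    selector: TagSelector = TagSelector(),
    *,
    type_names: Iterable[str] = (),
) -> list[str]:
    """Return the names of models selected by tags and, if given, by name."""
    wanted = set(type_names)
    return [
        name
        for name, tags in models.items()
        if matches(selector, tags) and (not wanted or name in wanted)
    ]


registry = {
    "building": {"feature", "overture", "overture:theme=buildings"},
    "water": {"feature", "overture", "overture:theme=base"},
    "sources": {"overture"},
}
```

With this registry, an empty selector returns every model, and
`TagSelector(include_any=("overture",), exclude_any=("overture:theme=base",))`
drops only `water`.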

User-facing help text frames flags as acting on feature types
("Include feature types with these tags — defines scope (OR;
repeatable)"). Internal API docstrings keep "models" since they
describe the Python class layer; "feature types" is the user-facing
vocabulary for entry-point-registered top-level types, distinct from
the Pydantic models used for nested fields.

Signed-off-by: Seth Fitzsimmons <seth@mojodna.net>
Use provider_key.name (always a string) instead of provider.__name__,
which raises AttributeError when a provider is a callable instance
without __name__ — masking the original error inside the except block.
Add exc_info=True to preserve the traceback in the warning.
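
A minimal sketch of the failure mode and the fix, assuming a broad
`except Exception` around the provider call (the actual exception type caught
is not shown here). `call_provider` is a hypothetical helper and its `name`
parameter stands in for `provider_key.name`. A `functools.partial` object is
one example of a callable that has no `__name__`.

```python
import logging
from functools import partial

logger = logging.getLogger("discovery_sketch")


def call_provider(name: str, provider, models) -> list[str]:
    """Invoke one provider; on failure log a warning and return no tags."""
    try:
        return list(provider(models))
    except Exception:
        # Using the registered name avoids touching provider.__name__,
        # which may not exist; exc_info=True keeps the traceback.
        logger.warning("Tag provider %s failed", name, exc_info=True)
        return []


def _boom(models):
    raise RuntimeError("provider bug")


broken = partial(_boom)  # callable, but has no __name__ attribute
```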

Signed-off-by: Seth Fitzsimmons <seth@mojodna.net>
Replace unittest.TestCase classes with module-level pytest functions
parametrized over the tag lists. Per-tag parametrization isolates
failures to the offending input instead of stopping at the first
assertion in a loop.

Signed-off-by: Seth Fitzsimmons <seth@mojodna.net>
Fixes D100 reported by pydocstyle / make docformat.

Signed-off-by: Seth Fitzsimmons <seth@mojodna.net>
Plain tags, namespaces, and predicates now share a single TAG_PART
pattern: lowercase alphanumeric start followed by alphanumeric, hyphen,
underscore, or dot. Values remain case-permissive. Drops the prior
asymmetry where namespaces and predicates allowed dots but plain tags
did not.
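
One way to express the shared pattern as a regular expression is sketched
below. The exact expressions are assumptions: the part pattern is read here as
lowercase-only, and the value pattern (same shape, but case-permissive) is a
guess, since only "case-permissive" is stated above. `is_valid_tag` is a
hypothetical helper.

```python
import re

# Lowercase alphanumeric start, then alphanumeric, hyphen, underscore, or dot.
TAG_PART = r"[a-z0-9][a-z0-9._-]*"
# Assumed value pattern: same shape, but upper case is allowed.
TAG_VALUE = r"[A-Za-z0-9][A-Za-z0-9._-]*"

TAG_RE = re.compile(
    rf"(?:(?P<namespace>{TAG_PART}):)?"
    rf"(?P<predicate>{TAG_PART})"
    rf"(?:=(?P<value>{TAG_VALUE}))?"
)


def is_valid_tag(tag: str) -> bool:
    """Return True if the whole string matches [namespace:]predicate[=value]."""
    return TAG_RE.fullmatch(tag) is not None
```

Under this reading a plain tag such as `my.plain-tag_1` is accepted just like
a namespace or predicate containing dots.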

Make generate_tags private (its sole caller is discover_models) and
broaden TagProvider's return type to Iterable[str] so providers can
yield, return lists, or return sets.

Signed-off-by: Seth Fitzsimmons <seth@mojodna.net>
The provider's first argument is the value loaded from an
`overture.models` entry point. For discriminated-union features (e.g.
`Segment`) that's `Annotated[Union[...], Field(...)]`, not
`type[BaseModel]` — the prior signature was a lie. Widen `TagProvider`
and the in-tree providers to accept `Any` and document the boundary in
`discovery/types.py`.

Strip `typing_util.collect_types` to the cases discovery actually meets
today: `Annotated`, `Union`/`X | Y`, plain class. Drop the unreached
`NewType` and `Literal` branches. Point at `overture-schema-codegen`'s
`extraction/type_analyzer.py:analyze_type` as the more capable
implementation, with consolidation across system, core, and cli flagged
as future work.

`theme_provider` extracts the theme via `_theme_literal`, which asserts
that `theme` is a single-value `str` `Literal[...]` and raises
`TypeError` otherwise. `_generate_tags` catches and logs at WARNING, so
third-party model-definition bugs surface visibly without crashing
discovery.
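
The check described above can be sketched as follows. This is an assumption-
laden illustration: it reads the annotation with `typing.get_type_hints` on
plain classes, whereas the real `_theme_literal` presumably inspects Pydantic
field metadata; the function name `theme_literal_sketch` and the example
classes are hypothetical.

```python
from typing import Literal, get_args, get_origin, get_type_hints


def theme_literal_sketch(model: type) -> str:
    """Return the single string value of the `theme` Literal annotation."""
    annotation = get_type_hints(model).get("theme")
    if get_origin(annotation) is not Literal:
        raise TypeError(f"{model.__name__}.theme is not a Literal")
    values = get_args(annotation)
    if len(values) != 1 or not isinstance(values[0], str):
        raise TypeError(
            f"{model.__name__}.theme must be a single-value str Literal"
        )
    return values[0]


class BuildingLike:
    theme: Literal["buildings"]


class BadTheme:
    theme: str  # not a Literal: rejected with TypeError
```

A caller that wraps this in a try/except and logs at WARNING gets the
behavior described above: the faulty model is reported, and discovery carries
on.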

Promote tag-rejection logging from DEBUG to WARNING so authorization
failures (invalid tags, reserved tags, reserved namespaces) don't
disappear silently in normal operation.

Convert filter tests from direct `_filter_tags` calls to a fake
`TagProvider` driven through `_generate_tags`. Tests now exercise
provider invocation and merge wiring, not just the filter, and decouple
from the private filter name. Provider-behavior tests still call the
providers directly. Add discriminated-union coverage for both
`feature_provider` and `theme_provider`, plus a `TypeError` case for a
non-Literal `theme`.

Signed-off-by: Seth Fitzsimmons <seth@mojodna.net>
Add Discovery and Tagging sections to system's README, covering the
overture.models / overture.tag_providers entry point groups, the tag
format, provider contract, namespace and tag reservation, the built-in
providers, and TagSelector-based filtering.

Update core's README: replace the stale Discovery bullet (discovery has
moved to system) with one describing the authority and theme tag
providers core contributes.

Signed-off-by: Seth Fitzsimmons <seth@mojodna.net>
Tag providers now receive the concrete BaseModel subclasses for the
entry point instead of the raw entry-point value. _generate_tags walks
the model once via collect_types and passes the result to every
provider, so providers can't forget to handle discriminated unions
and the walk happens once per model rather than once per provider.

The TagProvider type alias drops Any in favor of
Iterable[type[BaseModel]], honestly typing what providers receive.
The first arg of _generate_tags is annotated Any to match the
entry-point loader, which yields union expressions that aren't
type[BaseModel].

All three registered providers (feature_provider, authority_provider,
theme_provider) update to the new signature; unit tests pass concrete
classes directly while union-handling tests move to the
_generate_tags integration boundary, where the walk now lives.
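
A compact sketch of the walk-once flow, under assumptions: providers are
modeled as callables taking only the list of concrete classes (the real
providers may take further arguments), plain classes stand in for BaseModel
subclasses, and `generate_tags_sketch`, `walk`, `feature_like`, and
`theme_like` are hypothetical names.

```python
from typing import Any, Callable, Iterable, Union, get_args

Provider = Callable[[list], Iterable[str]]


def generate_tags_sketch(
    entry_value: Any,
    providers: Iterable[Provider],
    walk: Callable[[Any], list],
) -> set[str]:
    """Walk the entry point value once, then merge every provider's tags."""
    models = walk(entry_value)  # one walk per model, shared by all providers
    tags: set[str] = set()
    for provider in providers:
        tags.update(provider(models))  # additive: providers only add tags
    return tags


class RoadSegment:
    theme = "transportation"


class RailSegment:
    theme = "transportation"


SegmentLike = Union[RoadSegment, RailSegment]


def walk(value: Any) -> list:
    # Simplified walker: union members if any, otherwise the value itself.
    return list(get_args(value)) or [value]


def feature_like(models: list) -> Iterable[str]:
    yield "feature"  # providers may yield ...


def theme_like(models: list) -> Iterable[str]:
    return {f"overture:theme={m.theme}" for m in models}  # ... or return a set
```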

Signed-off-by: Seth Fitzsimmons <seth@mojodna.net>
Per discussion in the coding sesh.

Signed-off-by: Victor Schappert <schapper@amazon.com>
The module defines ModelKey and TagProviderKey -- key types, not domain
models. Rename clarifies intent and avoids confusion with theme model
modules elsewhere in the codebase.

Signed-off-by: Seth Fitzsimmons <seth@mojodna.net>
Signed-off-by: Roel <75250264+RoelBollens-TomTom@users.noreply.github.com>
Click 8.1 introduced typed decorator returns that preserve the
TypeVar in `tag_selection_options`, so the lowest-direct mypy job
no longer reports `Callable[..., Any]` reassignments. The 8.0
floor predated this and only affected lowest-direct.

Signed-off-by: Seth Fitzsimmons <seth@mojodna.net>
…lain tag that were missed in the authority_provider removal

Signed-off-by: Roel <75250264+RoelBollens-TomTom@users.noreply.github.com>
Successfully merging this pull request may close these issues.

Extensible tag-based model classification and discovery
