Skip to content

Added GenericAliasHandler class [ENG-226]#76

Open
brian-arnold wants to merge 1 commit intowalkerlab:devfrom
brian-arnold:claude/generic-alias-hasher
Open

Added GenericAliasHandler class [ENG-226]#76
brian-arnold wants to merge 1 commit intowalkerlab:devfrom
brian-arnold:claude/generic-alias-hasher

Conversation

@brian-arnold
Copy link
Collaborator

The semantic hasher had no registered handler for types.GenericAlias — the type Python uses to represent parameterized generics like dict[int, str] or list[int]. Plain types like dict or list worked fine, but as soon as key/value types were specified, the hasher raised a TypeError in strict mode.

This affected both user-written type annotations on @function_pod functions and orcapod's own internal schema inference, which produces parameterized generics like dict[key_type, value_type].

The fix adds a GenericAliasHandler that produces a stable serializable representation by recursively hashing origin and args, and registers it for both types.GenericAlias and typing._GenericAlias.

Original error message:

Traceback (most recent call last): File "/home/barnold/personal_repos/orcapod-testing/02_simple_ephys_examples/ephys_pipeline.py", line 25, in <module> @function_pod(output_keys=["cluster_spike_trains"]) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/barnold/personal_repos/orcapod-python/src/orcapod/core/function_pod.py", line 633, in decorator packet_function = PythonPacketFunction( ^^^^^^^^^^^^^^^^^^^^^ File "/home/barnold/personal_repos/orcapod-python/src/orcapod/core/packet_function.py", line 359, in init self._output_schema_hash = semantic_hasher.hash_object( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/barnold/personal_repos/orcapod-python/src/orcapod/hashing/semantic_hashing/semantic_hasher.py", line 167, in hash_object expanded = self._expand_structure( ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/barnold/personal_repos/orcapod-python/src/orcapod/hashing/semantic_hashing/semantic_hasher.py", line 253, in _expand_structure return self._expand_mapping(obj, _visited, resolver=resolver) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/barnold/personal_repos/orcapod-python/src/orcapod/hashing/semantic_hashing/semantic_hasher.py", line 313, in _expand_mapping items[str_key] = self._expand_element(v, _visited, resolver=resolver) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/barnold/personal_repos/orcapod-python/src/orcapod/hashing/semantic_hashing/semantic_hasher.py", line 301, in _expand_element return self.hash_object(obj, resolver=resolver).to_string() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/barnold/personal_repos/orcapod-python/src/orcapod/hashing/semantic_hashing/semantic_hasher.py", line 198, in hash_object fallback = self._handle_unknown(obj) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/barnold/personal_repos/orcapod-python/src/orcapod/hashing/semantic_hashing/semantic_hasher.py", line 380, in _handle_unknown raise TypeError( TypeError: BaseSemanticHasher (strict): no TypeHandlerProtocol registered for type 'types.GenericAlias' and it does not implement ContentIdentifiableProtocol. Register a TypeHandlerProtocol via the TypeHandlerRegistry or implement identity_structure() on the class. 

@brian-arnold brian-arnold requested a review from eywalker March 6, 2026 01:30
@codecov-commenter
Copy link

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 53.33333% with 7 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...capod/hashing/semantic_hashing/builtin_handlers.py 53.33% 7 Missing ⚠️

📢 Thoughts on this report? Let us know!

@brian-arnold brian-arnold changed the title Added GenericAliasHandler class Added GenericAliasHandler class [ENG-226] Mar 6, 2026
Copy link
Contributor

@eywalker eywalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding support to this! Could you add new tests that cover the GenericAliasHandler usage?

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds support for hashing parameterized generic type annotations (e.g. dict[int, str]) by introducing and registering a new semantic hashing handler, and fixes a truthiness check in Python→Arrow conversion to avoid failures on containers with ambiguous boolean evaluation.

Changes:

  • Introduce GenericAliasHandler and register it for types.GenericAlias and (programmatically) typing._GenericAlias.
  • Update the default data context (v0.1.json) to register the new handler for types.GenericAlias.
  • Adjust Python→Arrow list/map converters to use value is not None rather than if value.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
src/orcapod/semantic_types/universal_converter.py Avoids truthiness checks when converting list/map values to Arrow-friendly structures.
src/orcapod/hashing/semantic_hashing/builtin_handlers.py Adds and registers GenericAliasHandler for parameterized generic annotations.
src/orcapod/contexts/data/v0.1.json Registers GenericAliasHandler in the versioned default context’s TypeHandlerRegistry.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +175 to +194
class GenericAliasHandler:
"""
Handler for generic alias type annotations such as ``dict[int, list[int]]``
(``types.GenericAlias``) and ``typing`` generics (``typing._GenericAlias``).

Produces a stable dict containing the origin type and a list of hashed
argument types so that structurally identical generic annotations always
yield the same hash, and structurally different ones yield different hashes.
"""

def handle(self, obj: Any, hasher: "SemanticHasherProtocol") -> Any:
origin = getattr(obj, "__origin__", None)
args = getattr(obj, "__args__", None) or ()
if origin is None:
return f"generic_alias:{obj!r}"
return {
"__type__": "generic_alias",
"origin": hasher.hash_object(origin).to_string(),
"args": [hasher.hash_object(arg).to_string() for arg in args],
}
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are extensive semantic hasher tests, but none appear to cover parameterized generics (e.g. dict[int, str], list[int], typing.Dict[int, str]). Adding a regression test that hashes these types in strict mode (and asserts stable equality/inequality across different args) would prevent this handler from silently breaking in future refactors.

Copilot uses AI. Check for mistakes.
Comment on lines 631 to 648
elif origin is list:
element_converter = self.get_python_to_arrow_converter(args[0])
return (
lambda value: [element_converter(item) for item in value]
if value
if value is not None
else []
)

elif origin is dict:
key_converter = self.get_python_to_arrow_converter(args[0])
value_converter = self.get_python_to_arrow_converter(args[1])
return (
lambda value: [
{"key": key_converter(k), "value": value_converter(v)}
for k, v in value.items()
]
if value
if value is not None
else []
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change fixes truthiness checks for list/dict values, but there’s no regression test ensuring python_to_arrow conversion works for iterable containers with an ambiguous/invalid __bool__ (e.g. numpy arrays raise on if value). Consider adding a test using a small custom iterable whose __bool__ raises, and verifying conversion still succeeds when the value is empty/non-empty.

Copilot uses AI. Check for mistakes.
Comment on lines 60 to 65
[{"_type": "types.BuiltinFunctionType"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.FunctionHandler", "_config": {"function_info_extractor": {"_ref": "function_info_extractor"}}}],
[{"_type": "types.MethodType"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.FunctionHandler", "_config": {"function_info_extractor": {"_ref": "function_info_extractor"}}}],
[{"_type": "builtins.type"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.TypeObjectHandler", "_config": {}}],
[{"_type": "types.GenericAlias"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.GenericAliasHandler", "_config": {}}],
[{"_type": "pyarrow.Table"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.ArrowTableHandler", "_config": {"arrow_hasher": {"_ref": "arrow_hasher"}}}],
[{"_type": "pyarrow.RecordBatch"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.ArrowTableHandler", "_config": {"arrow_hasher": {"_ref": "arrow_hasher"}}}]
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type_handler_registry in this context only registers types.GenericAlias, but the PR description (and register_builtin_handlers) also registers typing._GenericAlias. Because the default semantic_hasher here uses this JSON-constructed registry, typing generics like typing.Dict[int, str] / typing.Optional[int] can still raise in strict mode. Consider adding a typing._GenericAlias entry (or switching this context to a registry implementation that conditionally registers it).

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brian-arnold this is a good point. Shall we register _GenericAlias as well?

registry.register(type, TypeObjectHandler())

# generic alias type annotations: dict[int, str], list[str], etc.
import types as _types
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

register_builtin_handlers imports types as _types twice (once for functions, again for GenericAlias). This is redundant and makes the section a bit harder to read; you can reuse the existing _types import from the functions block instead of re-importing it.

Suggested change
import types as _types

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants