Added GenericAliasHandler class [ENG-226]#76
Added GenericAliasHandler class [ENG-226]#76brian-arnold wants to merge 1 commit intowalkerlab:devfrom
Conversation
|
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
eywalker
left a comment
There was a problem hiding this comment.
Thanks for adding support to this! Could you add new tests that cover the GenericAliasHandler usage?
There was a problem hiding this comment.
Pull request overview
Adds support for hashing parameterized generic type annotations (e.g. dict[int, str]) by introducing and registering a new semantic hashing handler, and fixes a truthiness check in Python→Arrow conversion to avoid failures on containers with ambiguous boolean evaluation.
Changes:
- Introduce
GenericAliasHandlerand register it fortypes.GenericAliasand (programmatically)typing._GenericAlias. - Update the default data context (
v0.1.json) to register the new handler fortypes.GenericAlias. - Adjust Python→Arrow list/map converters to use
value is not Nonerather thanif value.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
src/orcapod/semantic_types/universal_converter.py |
Avoids truthiness checks when converting list/map values to Arrow-friendly structures. |
src/orcapod/hashing/semantic_hashing/builtin_handlers.py |
Adds and registers GenericAliasHandler for parameterized generic annotations. |
src/orcapod/contexts/data/v0.1.json |
Registers GenericAliasHandler in the versioned default context’s TypeHandlerRegistry. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| class GenericAliasHandler: | ||
| """ | ||
| Handler for generic alias type annotations such as ``dict[int, list[int]]`` | ||
| (``types.GenericAlias``) and ``typing`` generics (``typing._GenericAlias``). | ||
|
|
||
| Produces a stable dict containing the origin type and a list of hashed | ||
| argument types so that structurally identical generic annotations always | ||
| yield the same hash, and structurally different ones yield different hashes. | ||
| """ | ||
|
|
||
| def handle(self, obj: Any, hasher: "SemanticHasherProtocol") -> Any: | ||
| origin = getattr(obj, "__origin__", None) | ||
| args = getattr(obj, "__args__", None) or () | ||
| if origin is None: | ||
| return f"generic_alias:{obj!r}" | ||
| return { | ||
| "__type__": "generic_alias", | ||
| "origin": hasher.hash_object(origin).to_string(), | ||
| "args": [hasher.hash_object(arg).to_string() for arg in args], | ||
| } |
There was a problem hiding this comment.
There are extensive semantic hasher tests, but none appear to cover parameterized generics (e.g. dict[int, str], list[int], typing.Dict[int, str]). Adding a regression test that hashes these types in strict mode (and asserts stable equality/inequality across different args) would prevent this handler from silently breaking in future refactors.
| elif origin is list: | ||
| element_converter = self.get_python_to_arrow_converter(args[0]) | ||
| return ( | ||
| lambda value: [element_converter(item) for item in value] | ||
| if value | ||
| if value is not None | ||
| else [] | ||
| ) | ||
|
|
||
| elif origin is dict: | ||
| key_converter = self.get_python_to_arrow_converter(args[0]) | ||
| value_converter = self.get_python_to_arrow_converter(args[1]) | ||
| return ( | ||
| lambda value: [ | ||
| {"key": key_converter(k), "value": value_converter(v)} | ||
| for k, v in value.items() | ||
| ] | ||
| if value | ||
| if value is not None | ||
| else [] |
There was a problem hiding this comment.
This change fixes truthiness checks for list/dict values, but there’s no regression test ensuring python_to_arrow conversion works for iterable containers with an ambiguous/invalid __bool__ (e.g. numpy arrays raise on if value). Consider adding a test using a small custom iterable whose __bool__ raises, and verifying conversion still succeeds when the value is empty/non-empty.
| [{"_type": "types.BuiltinFunctionType"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.FunctionHandler", "_config": {"function_info_extractor": {"_ref": "function_info_extractor"}}}], | ||
| [{"_type": "types.MethodType"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.FunctionHandler", "_config": {"function_info_extractor": {"_ref": "function_info_extractor"}}}], | ||
| [{"_type": "builtins.type"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.TypeObjectHandler", "_config": {}}], | ||
| [{"_type": "types.GenericAlias"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.GenericAliasHandler", "_config": {}}], | ||
| [{"_type": "pyarrow.Table"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.ArrowTableHandler", "_config": {"arrow_hasher": {"_ref": "arrow_hasher"}}}], | ||
| [{"_type": "pyarrow.RecordBatch"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.ArrowTableHandler", "_config": {"arrow_hasher": {"_ref": "arrow_hasher"}}}] |
There was a problem hiding this comment.
type_handler_registry in this context only registers types.GenericAlias, but the PR description (and register_builtin_handlers) also registers typing._GenericAlias. Because the default semantic_hasher here uses this JSON-constructed registry, typing generics like typing.Dict[int, str] / typing.Optional[int] can still raise in strict mode. Consider adding a typing._GenericAlias entry (or switching this context to a registry implementation that conditionally registers it).
There was a problem hiding this comment.
@brian-arnold this is a good point. Shall we register _GenericAlias as well?
| registry.register(type, TypeObjectHandler()) | ||
|
|
||
| # generic alias type annotations: dict[int, str], list[str], etc. | ||
| import types as _types |
There was a problem hiding this comment.
register_builtin_handlers imports types as _types twice (once for functions, again for GenericAlias). This is redundant and makes the section a bit harder to read; you can reuse the existing _types import from the functions block instead of re-importing it.
| import types as _types |
The semantic hasher had no registered handler for types.GenericAlias — the type Python uses to represent parameterized generics like dict[int, str] or list[int]. Plain types like dict or list worked fine, but as soon as key/value types were specified, the hasher raised a TypeError in strict mode.
This affected both user-written type annotations on @function_pod functions and orcapod's own internal schema inference, which produces parameterized generics like dict[key_type, value_type].
The fix adds a GenericAliasHandler that produces a stable serializable representation by recursively hashing origin and args, and registers it for both types.GenericAlias and typing._GenericAlias.
Original error message: