GH-49103: [Python] Add internal type system stubs (_types, error, _stubs_typing)#48622
rok wants to merge 13 commits into apache:main
Conversation
force-pushed from d3c5740 to 27d1c65
I've rebased this on the annotation infra check PR (#48618) to make sure we're on the right track.
force-pushed from 3f9ed3b to 0ac95b0
force-pushed from 7873930 to 43e7cc6
force-pushed from 51a7a4e to 7bc0a98
force-pushed from 0d15871 to 8f3796d
force-pushed from 8f3796d to 72571d2
```diff
diff --git c/python/CMakeLists.txt i/python/CMakeLists.txt
index 6395b3e..f71a495 100644
--- c/python/CMakeLists.txt
+++ i/python/CMakeLists.txt
@@ -1042,9 +1042,9 @@ if(EXISTS "${PYARROW_STUBS_SOURCE_DIR}")
   install(CODE "
     execute_process(
       COMMAND \"${Python3_EXECUTABLE}\"
-              \"${CMAKE_CURRENT_SOURCE_DIR}/scripts/update_stub_docstrings.py\"
+              \"${CMAKE_SOURCE_DIR}/scripts/update_stub_docstrings.py\"
               \"${CMAKE_INSTALL_PREFIX}\"
-              \"${CMAKE_CURRENT_SOURCE_DIR}\"
+              \"${CMAKE_SOURCE_DIR}\"
       RESULT_VARIABLE _pyarrow_stub_docstrings_result
     )
     if(NOT _pyarrow_stub_docstrings_result EQUAL 0)
```
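For context on the variable swap above, a minimal illustration (not from the Arrow build): `CMAKE_SOURCE_DIR` always points at the top-level source directory, while `CMAKE_CURRENT_SOURCE_DIR` follows the `CMakeLists.txt` currently being processed, so the two diverge inside any directory added via `add_subdirectory()`.

```cmake
# Hypothetical layout: top/CMakeLists.txt calls add_subdirectory(python).
# Inside top/python/CMakeLists.txt:
#   CMAKE_SOURCE_DIR         -> <...>/top         (fixed, top of the tree)
#   CMAKE_CURRENT_SOURCE_DIR -> <...>/top/python  (follows the current file)
message(STATUS "top-level: ${CMAKE_SOURCE_DIR}")
message(STATUS "current:   ${CMAKE_CURRENT_SOURCE_DIR}")
```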
Co-authored-by: Dan Redding <125183946+dangotbanned@users.noreply.github.com>
force-pushed from 85f625c to 2cddfd2
@dangotbanned I'm about to have limited connectivity for 2 weeks. If you think this is ready, we can ask @raulcd to do a last pass and merge so we get it into the release.

@raulcd I found that I had to do df7dce5 to get a working editable install. It seems to be due to filename collisions. Thoughts?
force-pushed from 2cddfd2 to 104369a
The error I was getting without the CMake change was:

```
>>> import pyarrow
Traceback (most recent call last):
  File "<python-input-0>", line 1, in <module>
    import pyarrow
  File "/Users/rok/Documents/repos/arrow/python/pyarrow/__init__.py", line 59, in <module>
    from pyarrow.lib import (BuildInfo, CppBuildInfo, RuntimeInfo, set_timezone_db_path,
    ...<3 lines>...
                             io_thread_count, is_opentelemetry_enabled, set_io_thread_count)
ModuleNotFoundError: No module named 'pyarrow.lib'
```

With the CMake change it works fine. In both cases I'm doing:

```shell
export ARROW_HOME=$(pwd)/dist
export CMAKE_PREFIX_PATH=$ARROW_HOME:$CMAKE_PREFIX_PATH
export DYLD_LIBRARY_PATH=$ARROW_HOME/lib:$DYLD_LIBRARY_PATH
uv pip install --no-build-isolation --editable .
```
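As a quick way to check which file (if any) `pyarrow.lib` would load from in a given environment, one can ask `importlib` without importing the package. A small sketch; the helper name is mine, not pyarrow's:

```python
import importlib.util
from typing import Optional

def module_origin(name: str) -> Optional[str]:
    """Return the file a module would be loaded from, or None, without importing it."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. pyarrow itself) is missing entirely.
        return None
    return spec.origin if spec else None

print(module_origin("json"))         # a stdlib module always resolves to a real path
print(module_origin("pyarrow.lib"))  # None matches the ModuleNotFoundError above
                                     # (or pyarrow not being installed at all)
```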
Sorry for the delay @rok! Just starting to dig in again now.
dangotbanned left a comment:
Thanks @rok, nothing blocking for me.
A lot of these are just filling in (#48622 (comment)) - so don't be alarmed at the size 😂
```python
IntegerType: TypeAlias = (
    lib.Int8Type
    | lib.Int16Type
    | lib.Int32Type
    | lib.Int64Type
    | lib.UInt8Type
    | lib.UInt16Type
    | lib.UInt32Type
    | lib.UInt64Type
)
```

```python
Mask: TypeAlias = (
    Sequence[bool | None]
    | NDArray[np.bool_]
    | lib.Array[lib.Scalar[lib.BoolType]]
    | ChunkedArray[Any]
)
Indices: TypeAlias = (
    Sequence[int | None]
    | NDArray[np.integer[Any]]
    | lib.Array[lib.Scalar[IntegerType]]
    | ChunkedArray[Any]
)
```

```python
PyScalar: TypeAlias = (
    bool
    | int
    | float
    | Decimal
    | str
    | bytes
    | dt.date
    | dt.datetime
    | dt.time
    | dt.timedelta
)
```
Not sure how many other cases you might have for this IntoArray.
But the building blocks to get there should be helpful anyway 🙂
Suggested change:

```python
IntegerType: TypeAlias = (
    lib.Int8Type
    | lib.Int16Type
    | lib.Int32Type
    | lib.Int64Type
    | lib.UInt8Type
    | lib.UInt16Type
    | lib.UInt32Type
    | lib.UInt64Type
)
PyScalar: TypeAlias = (
    bool
    | int
    | float
    | Decimal
    | str
    | bytes
    | dt.date
    | dt.datetime
    | dt.time
    | dt.timedelta
)
NumpyScalar: TypeAlias = "np.generic[Any]"
PyScalarT_co = TypeVar("PyScalarT_co", bound=PyScalar, covariant=True)
NumpyScalarT_co = TypeVar("NumpyScalarT_co", bound=NumpyScalar, covariant=True)
DataTypeT_co = TypeVar("DataTypeT_co", bound=lib.DataType, covariant=True)
IntoArray: TypeAlias = (
    Sequence[PyScalarT_co | None]
    | NDArray[NumpyScalarT_co]
    | lib.Array[lib.Scalar[DataTypeT_co]]
    | ChunkedArray[Any]
)
Mask: TypeAlias = IntoArray[bool, np.bool_, lib.BoolType]
Indices: TypeAlias = IntoArray[int, np.integer[Any], IntegerType]
```

(This replaces the separate `Mask` and `Indices` definitions above with subscriptions of the generic `IntoArray` alias.)
```python
    def _import_from_c(cls, in_ptr: int) -> Self: ...
    def __arrow_c_schema__(self) -> Any: ...
    @classmethod
    def _import_from_c_capsule(cls, schema) -> Self: ...
```
Suggested change:

```python
    def _import_from_c_capsule(cls, schema: Any) -> Self: ...
```
```python
    UInt64Type,
    Int64Type,
)
_BasicValueT = TypeVar("_BasicValueT", bound=_BasicDataType, default=_BasicDataType)
```
Suggested change:

```python
_BasicValueT = TypeVar(
    "_BasicValueT", bound=_BasicDataType[Any], default=_BasicDataType[Any]
)
```
```python
class StructType(DataType):
    def get_field_index(self, name: str) -> int: ...
    def field(self, i: int | str) -> Field: ...
```
Suggested change:

```python
    def field(self, i: int | str) -> Field[Any]: ...
```
```python
    def field(self, i: int | str) -> Field: ...
    def get_all_field_indices(self, name: str) -> list[int]: ...
    def __len__(self) -> int: ...
    def __iter__(self) -> Iterator[Field]: ...
```
Suggested change:

```python
    def __iter__(self) -> Iterator[Field[Any]]: ...
```
```python
def large_list(
    value_type: _DataTypeT | Field[_DataTypeT] | None = None,
) -> LargeListType[_DataTypeT]: ...
def list_view(
    value_type: _DataTypeT | Field[_DataTypeT] | None = None,
) -> ListViewType[_DataTypeT]: ...
def large_list_view(
    value_type: _DataTypeT | Field[_DataTypeT] | None = None,
) -> LargeListViewType[_DataTypeT]: ...
```
You might need to do another pass on where a default None appears:
Suggested change:

```python
def large_list(
    value_type: _DataTypeT | Field[_DataTypeT],
) -> LargeListType[_DataTypeT]: ...
def list_view(
    value_type: _DataTypeT | Field[_DataTypeT],
) -> ListViewType[_DataTypeT]: ...
def large_list_view(
    value_type: _DataTypeT | Field[_DataTypeT],
) -> LargeListViewType[_DataTypeT]: ...
```
Using None or no parameters will raise:

```
>>> pa.list_()
TypeError: list_() takes at least 1 positional argument (0 given)
>>> pa.large_list(None)
TypeError: List requires DataType or Field
```

```python
def unregister_extension_type(type_name: str) -> None: ...

_StrOrBytes: TypeAlias = str | bytes
_MetadataMapping: TypeAlias = Mapping[_StrOrBytes, _StrOrBytes]
```
Multi-part suggestion (1/3)
Suggested change:

```python
_MetadataMapping: TypeAlias = (
    Mapping[bytes, bytes] | Mapping[str, str] | Mapping[bytes, str] | Mapping[str, bytes]
)
```
```python
_SchemaMetadataInput: TypeAlias = (
    Mapping[bytes, bytes]
    | Mapping[str, str]
    | Mapping[bytes, str]
    | Mapping[str, bytes]
)
```
Multi-part suggestion (2/3)
Suggested change (delete `_SchemaMetadataInput`):

```diff
-_SchemaMetadataInput: TypeAlias = (
-    Mapping[bytes, bytes]
-    | Mapping[str, str]
-    | Mapping[bytes, str]
-    | Mapping[str, bytes]
-)
```
```python
        | Iterable[tuple[str, _FieldTypeInput]]
        | Mapping[Any, _FieldTypeInput]
    ),
    metadata: _SchemaMetadataInput | None = None,
```
Multi-part suggestion (3/3)
Suggested change:

```python
    metadata: _MetadataMapping | None = None,
```
```python
class Buffer(Protocol): ...
class SupportPyBuffer(Protocol): ...
```
Yes, these are to be expanded in subsequent PRs.
Would you suggest a change at this point?
No need, was just curious 😄
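For reference, the placeholder pattern under discussion: a `Protocol` with no members matches any object structurally, and members added in later PRs progressively narrow it. A sketch reusing one name from the snippet above (the `runtime_checkable` decorator and the isinstance demo are mine, not from the PR):

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class SupportPyBuffer(Protocol):
    # Intentionally empty for now: with no required members, every object
    # satisfies the protocol; later PRs can add members to narrow it.
    ...

print(isinstance(object(), SupportPyBuffer))
```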
Rationale for this change
This is the second in a series of PRs adding type annotations to pyarrow and resolving #32609. It builds on top of, and should be merged after, #48618.
What changes are included in this PR?
This adds:

- `_types.pyi` - Core type definitions
- `_stubs_typing.pyi` - Internal typing protocols and helpers used across stub files
- `error.pyi` - Exception classes (`ArrowException`, `ArrowInvalid`, `ArrowIOError`, etc.)
- `lib.pyi`, `io.pyi`, `scalar.pyi` - using `__getattr__` to allow imports to resolve while deferring to subsequent PRs

Are these changes tested?
Via CI type checks established in #48618.
Are there any user-facing changes?
Users will start seeing some minimal annotated types.
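The `__getattr__` deferral mentioned for `lib.pyi`, `io.pyi`, and `scalar.pyi` leans on the module-level `__getattr__` hook (PEP 562). A self-contained runtime sketch of that mechanism; the module and attribute names here are made up, and in the actual stubs the hook is a bare signature such as `def __getattr__(name: str) -> Any: ...`:

```python
import sys
import types
from typing import Any

# Build a throwaway module whose unknown attributes resolve dynamically,
# mirroring how a stub's module-level __getattr__ lets imports resolve
# before every symbol is individually annotated.
mod = types.ModuleType("fake_lib")

def _deferred(name: str) -> Any:
    return f"<deferred {name}>"

mod.__getattr__ = _deferred  # PEP 562: consulted when normal lookup fails
sys.modules["fake_lib"] = mod

import fake_lib  # noqa: E402

print(fake_lib.Int8Type)  # falls through to __getattr__
```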