GH-49103: [Python] Add internal type system stubs (_types, error, _stubs_typing)#48622
rok wants to merge 13 commits into apache:main
Conversation
force-pushed from d3c5740 to 27d1c65
I've rebased this on the annotation infra check PR (#48618) to make sure we're on the right track.
force-pushed from 3f9ed3b to 0ac95b0
force-pushed from 7873930 to 43e7cc6
force-pushed from 51a7a4e to 7bc0a98
force-pushed from 0d15871 to 8f3796d
force-pushed from 8f3796d to 72571d2
```diff
diff --git c/python/CMakeLists.txt i/python/CMakeLists.txt
index 6395b3e..f71a495 100644
--- c/python/CMakeLists.txt
+++ i/python/CMakeLists.txt
@@ -1042,9 +1042,9 @@ if(EXISTS "${PYARROW_STUBS_SOURCE_DIR}")
   install(CODE "
     execute_process(
       COMMAND \"${Python3_EXECUTABLE}\"
-              \"${CMAKE_CURRENT_SOURCE_DIR}/scripts/update_stub_docstrings.py\"
+              \"${CMAKE_SOURCE_DIR}/scripts/update_stub_docstrings.py\"
               \"${CMAKE_INSTALL_PREFIX}\"
-              \"${CMAKE_CURRENT_SOURCE_DIR}\"
+              \"${CMAKE_SOURCE_DIR}\"
       RESULT_VARIABLE _pyarrow_stub_docstrings_result
     )
     if(NOT _pyarrow_stub_docstrings_result EQUAL 0)
```
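For context on the variable swap above, a minimal illustration (not from the Arrow build): `CMAKE_SOURCE_DIR` always points at the top-level source directory, while `CMAKE_CURRENT_SOURCE_DIR` follows the `CMakeLists.txt` currently being processed, so the two diverge inside any directory added via `add_subdirectory()`.

```cmake
# Hypothetical layout: top/CMakeLists.txt calls add_subdirectory(python).
# Inside top/python/CMakeLists.txt:
#   CMAKE_SOURCE_DIR         -> <...>/top         (fixed, top of the tree)
#   CMAKE_CURRENT_SOURCE_DIR -> <...>/top/python  (follows the current file)
message(STATUS "top-level: ${CMAKE_SOURCE_DIR}")
message(STATUS "current:   ${CMAKE_CURRENT_SOURCE_DIR}")
```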
Co-authored-by: Dan Redding <125183946+dangotbanned@users.noreply.github.com>
force-pushed from 85f625c to 2cddfd2
@dangotbanned I'm about to have limited connectivity for 2 weeks. If you think this is ready, we can ask @raulcd to do a last pass and merge so we get it into the release.

@raulcd I found that I had to do df7dce5 to get a working editable install. It seems to be due to filename collisions. Thoughts?
force-pushed from 2cddfd2 to 104369a
The error I was getting without the CMake change was:

```
>>> import pyarrow
Traceback (most recent call last):
  File "<python-input-0>", line 1, in <module>
    import pyarrow
  File "/Users/rok/Documents/repos/arrow/python/pyarrow/__init__.py", line 59, in <module>
    from pyarrow.lib import (BuildInfo, CppBuildInfo, RuntimeInfo, set_timezone_db_path,
    ...<3 lines>...
                             io_thread_count, is_opentelemetry_enabled, set_io_thread_count)
ModuleNotFoundError: No module named 'pyarrow.lib'
```

With the CMake change it works fine. In both cases I'm doing:

```shell
export ARROW_HOME=$(pwd)/dist
export CMAKE_PREFIX_PATH=$ARROW_HOME:$CMAKE_PREFIX_PATH
export DYLD_LIBRARY_PATH=$ARROW_HOME/lib:$DYLD_LIBRARY_PATH
uv pip install --no-build-isolation --editable .
```
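As a quick way to check which file (if any) `pyarrow.lib` would load from in a given environment, one can ask `importlib` without importing the package. A small sketch; the helper name is mine, not pyarrow's:

```python
import importlib.util
from typing import Optional

def module_origin(name: str) -> Optional[str]:
    """Return the file a module would be loaded from, or None, without importing it."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. pyarrow itself) is missing entirely.
        return None
    return spec.origin if spec else None

print(module_origin("json"))         # a stdlib module always resolves to a real path
print(module_origin("pyarrow.lib"))  # None matches the ModuleNotFoundError above
                                     # (or pyarrow not being installed at all)
```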
Sorry for the delay @rok! Just starting to dig in again now.
dangotbanned left a comment:
Thanks @rok, nothing blocking for me.
A lot of these are just filling in (#48622 (comment)) - so don't be alarmed at the size 😂
```python
IntegerType: TypeAlias = (
    lib.Int8Type
    | lib.Int16Type
    | lib.Int32Type
    | lib.Int64Type
    | lib.UInt8Type
    | lib.UInt16Type
    | lib.UInt32Type
    | lib.UInt64Type
)
```

```python
Mask: TypeAlias = (
    Sequence[bool | None]
    | NDArray[np.bool_]
    | lib.Array[lib.Scalar[lib.BoolType]]
    | ChunkedArray[Any]
)
Indices: TypeAlias = (
    Sequence[int | None]
    | NDArray[np.integer[Any]]
    | lib.Array[lib.Scalar[IntegerType]]
    | ChunkedArray[Any]
)
```

```python
PyScalar: TypeAlias = (
    bool
    | int
    | float
    | Decimal
    | str
    | bytes
    | dt.date
    | dt.datetime
    | dt.time
    | dt.timedelta
)
```
Not sure how many other cases you might have for this IntoArray.
But the building blocks to get there should be helpful anyway 🙂
Suggested change:

```python
IntegerType: TypeAlias = (
    lib.Int8Type
    | lib.Int16Type
    | lib.Int32Type
    | lib.Int64Type
    | lib.UInt8Type
    | lib.UInt16Type
    | lib.UInt32Type
    | lib.UInt64Type
)
PyScalar: TypeAlias = (
    bool
    | int
    | float
    | Decimal
    | str
    | bytes
    | dt.date
    | dt.datetime
    | dt.time
    | dt.timedelta
)
NumpyScalar: TypeAlias = "np.generic[Any]"
PyScalarT_co = TypeVar("PyScalarT_co", bound=PyScalar, covariant=True)
NumpyScalarT_co = TypeVar("NumpyScalarT_co", bound=NumpyScalar, covariant=True)
DataTypeT_co = TypeVar("DataTypeT_co", bound=lib.DataType, covariant=True)
IntoArray: TypeAlias = (
    Sequence[PyScalarT_co | None]
    | NDArray[NumpyScalarT_co]
    | lib.Array[lib.Scalar[DataTypeT_co]]
    | ChunkedArray[Any]
)
Mask: TypeAlias = IntoArray[bool, np.bool_, lib.BoolType]
Indices: TypeAlias = IntoArray[int, np.integer[Any], IntegerType]
```

(This replaces the separate `Mask` and `Indices` definitions above with subscriptions of the generic `IntoArray` alias.)
```python
    def _import_from_c(cls, in_ptr: int) -> Self: ...
    def __arrow_c_schema__(self) -> Any: ...
    @classmethod
    def _import_from_c_capsule(cls, schema) -> Self: ...
```
Suggested change:

```python
    def _import_from_c_capsule(cls, schema: Any) -> Self: ...
```
```python
    UInt64Type,
    Int64Type,
)
_BasicValueT = TypeVar("_BasicValueT", bound=_BasicDataType, default=_BasicDataType)
```
Suggested change:

```python
_BasicValueT = TypeVar(
    "_BasicValueT", bound=_BasicDataType[Any], default=_BasicDataType[Any]
)
```
```python
class StructType(DataType):
    def get_field_index(self, name: str) -> int: ...
    def field(self, i: int | str) -> Field: ...
```
Suggested change:

```python
    def field(self, i: int | str) -> Field[Any]: ...
```
```python
    def field(self, i: int | str) -> Field: ...
    def get_all_field_indices(self, name: str) -> list[int]: ...
    def __len__(self) -> int: ...
    def __iter__(self) -> Iterator[Field]: ...
```
Suggested change:

```python
    def __iter__(self) -> Iterator[Field[Any]]: ...
```
```python
def large_list(
    value_type: _DataTypeT | Field[_DataTypeT] | None = None,
) -> LargeListType[_DataTypeT]: ...
def list_view(
    value_type: _DataTypeT | Field[_DataTypeT] | None = None,
) -> ListViewType[_DataTypeT]: ...
def large_list_view(
    value_type: _DataTypeT | Field[_DataTypeT] | None = None,
) -> LargeListViewType[_DataTypeT]: ...
```
You might need to do another pass on where a default None appears:
Suggested change:

```python
def large_list(
    value_type: _DataTypeT | Field[_DataTypeT],
) -> LargeListType[_DataTypeT]: ...
def list_view(
    value_type: _DataTypeT | Field[_DataTypeT],
) -> ListViewType[_DataTypeT]: ...
def large_list_view(
    value_type: _DataTypeT | Field[_DataTypeT],
) -> LargeListViewType[_DataTypeT]: ...
```
Using None or no parameters will raise:

```
>>> pa.list_()
TypeError: list_() takes at least 1 positional argument (0 given)
>>> pa.large_list(None)
TypeError: List requires DataType or Field
```

```python
def unregister_extension_type(type_name: str) -> None: ...

_StrOrBytes: TypeAlias = str | bytes
_MetadataMapping: TypeAlias = Mapping[_StrOrBytes, _StrOrBytes]
```
Multi-part suggestion (1/3)
Suggested change:

```python
_MetadataMapping: TypeAlias = (
    Mapping[bytes, bytes] | Mapping[str, str] | Mapping[bytes, str] | Mapping[str, bytes]
)
```
```python
_SchemaMetadataInput: TypeAlias = (
    Mapping[bytes, bytes]
    | Mapping[str, str]
    | Mapping[bytes, str]
    | Mapping[str, bytes]
)
```
Multi-part suggestion (2/3)
Suggested change (delete `_SchemaMetadataInput`):

```diff
-_SchemaMetadataInput: TypeAlias = (
-    Mapping[bytes, bytes]
-    | Mapping[str, str]
-    | Mapping[bytes, str]
-    | Mapping[str, bytes]
-)
```
```python
        | Iterable[tuple[str, _FieldTypeInput]]
        | Mapping[Any, _FieldTypeInput]
    ),
    metadata: _SchemaMetadataInput | None = None,
```
Multi-part suggestion (3/3)
Suggested change:

```python
    metadata: _MetadataMapping | None = None,
```
```python
class Buffer(Protocol): ...
class SupportPyBuffer(Protocol): ...
```
Yes, these are to be expanded in subsequent PRs.
Would you suggest a change at this point?
No need, was just curious 😄
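For reference, the placeholder pattern under discussion: a `Protocol` with no members matches any object structurally, and members added in later PRs progressively narrow it. A sketch reusing one name from the snippet above (the `runtime_checkable` decorator and the isinstance demo are mine, not from the PR):

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class SupportPyBuffer(Protocol):
    # Intentionally empty for now: with no required members, every object
    # satisfies the protocol; later PRs can add members to narrow it.
    ...

print(isinstance(object(), SupportPyBuffer))
```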
Rationale for this change
This is the second in a series of PRs adding type annotations to pyarrow and resolving #32609. It builds on top of, and should be merged after, #48618.
What changes are included in this PR?
This adds:

- `_types.pyi` - Core type definitions
- `_stubs_typing.pyi` - Internal typing protocols and helpers used across stub files
- `error.pyi` - Exception classes (`ArrowException`, `ArrowInvalid`, `ArrowIOError`, etc.)
- `lib.pyi`, `io.pyi`, `scalar.pyi` - using `__getattr__` to allow imports to resolve while deferring to subsequent PRs

Are these changes tested?
Via CI type checks established in #48618.
Are there any user-facing changes?
Users will start seeing some minimal annotated types.
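The `__getattr__` deferral mentioned for `lib.pyi`, `io.pyi`, and `scalar.pyi` leans on the module-level `__getattr__` hook (PEP 562). A self-contained runtime sketch of that mechanism; the module and attribute names here are made up, and in the actual stubs the hook is a bare signature such as `def __getattr__(name: str) -> Any: ...`:

```python
import sys
import types
from typing import Any

# Build a throwaway module whose unknown attributes resolve dynamically,
# mirroring how a stub's module-level __getattr__ lets imports resolve
# before every symbol is individually annotated.
mod = types.ModuleType("fake_lib")

def _deferred(name: str) -> Any:
    return f"<deferred {name}>"

mod.__getattr__ = _deferred  # PEP 562: consulted when normal lookup fails
sys.modules["fake_lib"] = mod

import fake_lib  # noqa: E402

print(fake_lib.Int8Type)  # falls through to __getattr__
```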