Description
Author of Proposal: @brendan
Reason or Problem
_follow traces every polygon boundary twice -- once to count vertices, then again to fill the coordinate array. This doubles boundary-tracing cost, which is already the dominant phase for rasters with many small regions.
The dask chunk-merge path runs _point_in_ring, _simplify_ring, and _signed_ring_area as plain Python loops. On chunked rasters with many boundary-crossing polygons, these show up in profiles.
_to_geopandas builds Polygon objects one at a time. Shapely 2.0 has a batch polygons() function (compiled C) that does this faster.
Proposal
Design:
No API changes -- these are all internal.
Rewrite _follow to trace in a single pass using a dynamically grown numpy buffer. Start at 64 points, double the capacity when full, trim after the loop. Doubling means only O(log n) reallocations and O(n) total copying -- cheap next to a full second traversal of the boundary.
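As a rough sketch of the grow-and-trim pattern (append_point and trace are hypothetical stand-ins, not datashader's actual _follow, which walks raster cells rather than consuming an iterable):

```python
import numpy as np

def append_point(buf, n, x, y):
    """Write (x, y) at row n, doubling the buffer first when it is full."""
    if n == len(buf):
        grown = np.empty((2 * len(buf), 2), dtype=buf.dtype)
        grown[:n] = buf  # one O(n) copy per doubling
        buf = grown
    buf[n, 0] = x
    buf[n, 1] = y
    return buf, n + 1

def trace(points):
    """Single-pass collection: start at 64 rows, double on demand, trim at the end."""
    buf = np.empty((64, 2), dtype=np.float64)
    n = 0
    for x, y in points:
        buf, n = append_point(buf, n, x, y)
    return buf[:n].copy()  # trim to the vertices actually written
```

The trailing `.copy()` releases the oversized backing array once tracing is done.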
Add @ngjit to _point_in_ring, _simplify_ring, and _signed_ring_area. Numeric loops over numpy arrays, no Python objects -- numba handles them fine.
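To illustrate why these helpers JIT cleanly, here is a sketch of two of them as tight numeric loops -- the names mirror the helpers but are simplified stand-ins, and the njit fallback keeps the sketch runnable even without numba installed:

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fall back to plain Python if numba is absent
    def njit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]
        return lambda func: func

@njit
def point_in_ring(x, y, xs, ys):
    """Ray-casting point-in-polygon test over ring vertex arrays."""
    inside = False
    j = len(xs) - 1
    for i in range(len(xs)):
        if (ys[i] > y) != (ys[j] > y):
            x_cross = (xs[j] - xs[i]) * (y - ys[i]) / (ys[j] - ys[i]) + xs[i]
            if x < x_cross:
                inside = not inside
        j = i
    return inside

@njit
def signed_ring_area(xs, ys):
    """Shoelace formula; positive for counter-clockwise rings."""
    area = 0.0
    j = len(xs) - 1
    for i in range(len(xs)):
        area += xs[j] * ys[i] - xs[i] * ys[j]
        j = i
    return 0.5 * area
```

Both functions touch only scalars and contiguous float arrays, which is exactly the shape of code numba compiles without object-mode fallbacks.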
In _to_geopandas, batch-construct hole-free polygons with shapely.polygons(). Polygons with holes keep using the scalar constructor. Shapely < 2.0 gets the current code path via a version check.
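The version gate can be a module-level flag checked once at import -- HAS_BATCH_POLYGONS and build_exteriors below are hypothetical names for a sketch of the dispatch, not the actual _to_geopandas code:

```python
def _shapely_major_version():
    """Return shapely's major version, or 0 if shapely is not installed."""
    try:
        import shapely
    except ImportError:
        return 0
    return int(shapely.__version__.split(".")[0])

HAS_BATCH_POLYGONS = _shapely_major_version() >= 2

def build_exteriors(rings):
    """Construct hole-free Polygons from a list of (n, 2) coordinate arrays."""
    if HAS_BATCH_POLYGONS:
        import shapely
        # batch construction through shapely 2.0's compiled creation functions
        return list(shapely.polygons([shapely.linearrings(r) for r in rings]))
    # scalar fallback for shapely < 2.0
    from shapely.geometry import Polygon
    return [Polygon(r) for r in rings]
```

A further refinement, not shown here, would pass flat coordinate and index arrays to shapely.linearrings so ring creation is also a single call.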
Usage:
No user-facing changes. Existing polygonize() calls get faster.
Value:
The tracing rewrite speeds up all four backends. JIT-compiling the merge helpers mainly benefits dask, where boundary merging can consume a substantial fraction of total runtime. Batched shapely construction speeds up geopandas output.
Stakeholders and Impacts
Internal only. The existing benchmark suite (benchmarks/benchmarks/polygonize.py) covers before/after measurement.
Drawbacks
The single-pass buffer growth in _follow is harder to read than the current two-pass logic. The shapely batch path needs shapely >= 2.0 (Dec 2022), gated by a runtime check.
Alternatives
Vertex counts could be pre-computed from region perimeters, but that is itself another pass over the data. Numba typed lists would avoid manual buffer management, yet their per-append overhead is worse than doubling a flat numpy array.
Unresolved Questions
None.