Issue #717 tracks a bug where ORCA falls back to the Postgres optimizer when column attributes use COLLATE "C". This feature request proposes actively using locale C / collation C across the database (or enabling it in ORCA plans) to gain significant performance improvements, especially for TPC-DS workloads.
libc/C (collation C) proved fastest in nearly every benchmark
Newer builtin providers (e.g., ICU) did not match libc/C performance
Even sequential scans with builtin providers were >20% slower than C collation
Why This Matters for TPC-DS
TPC-DS contains many string columns (item descriptions, store names, date strings, etc.) that are compared with =, <, >, ORDER BY, and GROUP BY. When ORCA falls back to the Postgres planner for these operations due to collation issues:
ORCA's superior plan shapes (e.g., parallel aggregates, better join ordering) are lost
String comparisons use locale-aware collation, incurring significant overhead
Current Behavior
As shown in #717, ORCA currently falls back when it encounters COLLATE "C" columns:
-- ORCA works fine for default collation
EXPLAIN SELECT * FROM tbl ORDER BY v;
-- Optimizer: Pivotal Optimizer (GPORCA) ✓

-- ORCA falls back for collate C columns
EXPLAIN SELECT * FROM tbl_collate_c ORDER BY v;
-- Optimizer: Postgres query optimizer ✗ (fallback)
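For anyone who wants to reproduce the comparison, a minimal setup might look like the sketch below. Only the table names tbl and tbl_collate_c and the column v come from the queries above; the column type and the sample rows are assumptions, not the schema used in #717.

```sql
-- Assumed schema: the report only shows the table names and the column v.
CREATE TABLE tbl (v text);
CREATE TABLE tbl_collate_c (v text COLLATE "C");

-- Illustrative sample rows (not from the original report).
INSERT INTO tbl VALUES ('banana'), ('Apple'), ('cherry');
INSERT INTO tbl_collate_c SELECT v FROM tbl;

-- Make sure ORCA is enabled for the session before comparing plans.
SET optimizer = on;

-- Compare the "Optimizer:" line at the bottom of each plan.
EXPLAIN SELECT * FROM tbl ORDER BY v;
EXPLAIN SELECT * FROM tbl_collate_c ORDER BY v;
```

The last line of each EXPLAIN output names the optimizer that produced the plan, which is the quickest way to confirm whether the fallback occurred.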
Proposed Solution
Fix ORCA to support COLLATE "C" (prerequisite: [Bug] ORCA fallbacks for collate "C" #717) — allow ORCA to generate and execute plans for tables/columns with C collation
Enable C locale support in ORCA's sort/comparison operators — ensure ORCA can correctly classify C collation as a supported collation for sort keys, merge keys, and equality comparisons
TPC-DS test setup: consider recommending LC_COLLATE=C or COLLATE "C" for TPC-DS benchmark tables to unlock ORCA's full optimization potential
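The C collation can be applied at several levels. The sketch below shows three standard PostgreSQL options; the database name tpcds_c and the table item_demo are hypothetical, and the two columns are only loosely modeled on the TPC-DS item table. Whether each option avoids the fallback in a given Cloudberry build is something to verify with EXPLAIN rather than assume.

```sql
-- Option 1: a database whose default collation is C.
-- template0 is needed when the locale differs from the template database's.
CREATE DATABASE tpcds_c TEMPLATE template0 LC_COLLATE 'C' LC_CTYPE 'C';

-- Option 2: per-column collation inside an existing database.
-- Columns declared this way are the case that currently triggers the
-- fallback described in #717.
CREATE TABLE item_demo (
    i_item_sk   integer,
    i_item_desc varchar(200) COLLATE "C"
);

-- Option 3: per-expression collation in a single query.
SELECT i_item_desc
FROM item_demo
ORDER BY i_item_desc COLLATE "C";

-- Check which collation each database uses by default.
SELECT datname, datcollate, datctype FROM pg_database;
```

Note that the C collation orders strings by byte value, so uppercase ASCII letters sort before lowercase ones. Benchmark queries whose expected output depends on linguistic ordering may return rows in a different order.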
Expected Benefit
Full ORCA plan coverage for TPC-DS string columns without fallback
50–100%+ speedup on string-heavy sort/scan/join operations
More accurate and competitive TPC-DS benchmark results for Cloudberry
Performance Impact
The key findings listed near the top of this issue come from a detailed benchmark by depesz, "How much speed you're leaving at the table if you use default locale", which demonstrates that the libc/C collation consistently outperforms locale-specific collations.