feat: add clickhouse-bench with auto-downloaded ClickHouse binary #6736
fastio wants to merge 8 commits into vortex-data:develop
Introduce a new clickhouse-bench benchmark crate that runs ClickBench queries against Parquet data via clickhouse-local, providing a baseline for comparing Vortex performance against ClickHouse.
Key design decisions:
- build.rs auto-downloads the full ClickHouse binary (with Parquet support) into target/clickhouse-local/, similar to how vortex-duckdb downloads the DuckDB library. This eliminates manual install steps and avoids issues with slim/homebrew builds lacking Parquet support.
- The binary path is baked in via CLICKHOUSE_BINARY env at compile time; CLICKHOUSE_LOCAL env var allows runtime override.
- ClickHouse-dialect SQL queries are maintained in a separate clickbench_clickhouse_queries.sql file (43 queries).
- CI workflows updated to include clickhouse:parquet target in ClickBench benchmarks and conditionally build clickhouse-bench.
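For readers unfamiliar with clickhouse-local, here is a minimal sketch of how one timed query against a Parquet file might be issued from Rust. It is not the crate's actual code: the helper names `local_args`, `count_rows_sql`, and `time_query` are hypothetical, and the example assumes the standard `clickhouse local --query` invocation together with ClickHouse's `file` table function.

```rust
use std::io;
use std::path::Path;
use std::process::Command;
use std::time::{Duration, Instant};

/// Build the argument list for one `clickhouse local` invocation.
/// (Hypothetical helper, not taken from the PR.)
fn local_args(sql: &str) -> Vec<String> {
    vec!["local".to_string(), "--query".to_string(), sql.to_string()]
}

/// Example SQL that counts rows by reading a Parquet file directly
/// through ClickHouse's `file` table function.
fn count_rows_sql(parquet: &str) -> String {
    format!("SELECT count() FROM file('{parquet}', 'Parquet')")
}

/// Run one query and return its wall-clock duration.
/// A non-zero exit status is reported as an error carrying stderr.
fn time_query(binary: &Path, sql: &str) -> io::Result<Duration> {
    let start = Instant::now();
    let output = Command::new(binary).args(local_args(sql)).output()?;
    let elapsed = start.elapsed();
    if !output.status.success() {
        let stderr = String::from_utf8_lossy(&output.stderr).into_owned();
        return Err(io::Error::new(io::ErrorKind::Other, stderr));
    }
    Ok(elapsed)
}
```

Note that this measures process start-up along with query execution, so a real benchmark harness would likely need to account for that overhead.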
Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
Why do we need this file? Is it different from the one already included?
Good catch! I have removed the duplicate clickbench_clickhouse_queries.sql and validated with cargo check -p vortex-bench.
…ithub.com/fastio/vortex into integration-clickhouse-benchmark-baseline
myrrc left a comment
I don't think downloading untrusted binaries from the internet via a build script is a good idea. We want first-class integration with DuckDB, so we need to download its sources (although I wouldn't do that in a build script either), but we don't need such integration with ClickHouse yet.
My idea is to use the clickhouse binary in CI (as it runs on Linux only) and require users to download it by hand if they want a local run. Benchmarking on macOS doesn't make much sense anyway, as the vector instruction set is different.
Agreed. I removed the binary download from build.rs entirely. The clickhouse binary is now resolved at runtime, via the CLICKHOUSE_BINARY env var or from $PATH. CI installs it via the official installer before building. Local users need to install it manually. No more untrusted binary downloads in the build script.
…use from PATH
- Remove reqwest-based binary download from build.rs
- Resolve clickhouse binary via CLICKHOUSE_BINARY env var or $PATH at runtime
- Add CI step to install clickhouse before building when needed
- Fail with clear error message if binary is not found locally
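The runtime lookup described in this commit could look roughly like the sketch below. It is an illustration, not the code in the PR: the function names are hypothetical, the error wording is invented, and it assumes the executable is named `clickhouse` on `$PATH`. The environment values are passed in as parameters so that the lookup can be tested without mutating the process environment.

```rust
use std::env;
use std::ffi::OsString;
use std::path::PathBuf;

/// Resolve the ClickHouse binary: an explicit override wins, otherwise
/// search each `$PATH` entry for a file named `clickhouse`.
/// `override_path` stands in for the CLICKHOUSE_BINARY env var and
/// `path_var` for $PATH.
fn resolve_clickhouse(
    override_path: Option<OsString>,
    path_var: Option<OsString>,
) -> Result<PathBuf, String> {
    // An empty override is treated the same as an unset one.
    if let Some(p) = override_path.filter(|p| !p.is_empty()) {
        return Ok(PathBuf::from(p));
    }
    if let Some(paths) = path_var {
        for dir in env::split_paths(&paths) {
            let candidate = dir.join("clickhouse");
            if candidate.is_file() {
                return Ok(candidate);
            }
        }
    }
    // Fail with an actionable message rather than a bare "not found".
    Err("clickhouse binary not found: set CLICKHOUSE_BINARY or add `clickhouse` to $PATH"
        .to_string())
}

/// Convenience wrapper that reads the real process environment.
fn resolve_from_env() -> Result<PathBuf, String> {
    resolve_clickhouse(env::var_os("CLICKHOUSE_BINARY"), env::var_os("PATH"))
}
```

This sketch does not check that the override path exists or is executable; that check can be deferred to the first attempt to run the binary.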
.github/workflows/sql-benchmarks.yml
Outdated
    - name: Install ClickHouse
      if: contains(matrix.targets, 'clickhouse:')
      run: |
        curl https://clickhouse.com/ | sh
Why not download the latest release file for our architecture from GitHub Releases? Then we wouldn't need any installation step, or curl in general.
Good call. I updated CI to download the static binary directly from GitHub Releases, pinned to the ClickHouse LTS release v25.8.18.1, so no curl | sh or installation step is needed.
- Pass subcommand arg to clickhouse-bench in run-sql-bench.sh for consistency
- Use BenchmarkArg + create_benchmark() in main.rs like other engines
- Replace `which` with `clickhouse local --version` for binary verification
- Pin ClickHouse to LTS release v25.8.18.1 from GitHub Releases
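As an illustration of the `clickhouse local --version` check mentioned above, a verification helper might be sketched as follows. The function name `clickhouse_version` and the error wording are hypothetical and not taken from the PR. One plausible advantage over `which` is that this confirms the binary actually executes, rather than only that a file with that name exists on `$PATH`.

```rust
use std::path::Path;
use std::process::Command;

/// Verify a ClickHouse binary by running `<binary> local --version`
/// and returning the trimmed version banner on success.
fn clickhouse_version(binary: &Path) -> Result<String, String> {
    let output = Command::new(binary)
        .args(["local", "--version"])
        .output()
        .map_err(|e| format!("failed to run {}: {e}", binary.display()))?;
    if !output.status.success() {
        return Err(format!(
            "{} local --version exited with {}",
            binary.display(),
            output.status
        ));
    }
    Ok(String::from_utf8_lossy(&output.stdout).trim().to_string())
}
```

A caller could log the returned banner at start-up so that benchmark results record which ClickHouse build produced them.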
…ithub.com/fastio/vortex into integration-clickhouse-benchmark-baseline
#6425