[perf_tool] Add perf_clickhouse GitHub action to test pxl clickhouse read/export by ddelnano · Pull Request #40 · k8sstormcenter/pixie

ddelnano · 2026-05-16T22:37:28Z

Summary: [perf_tool] Add perf_clickhouse GitHub action to test pxl clickhouse read/export

Relevant Issues: N/A

Type of change: /kind perf

Test Plan: Testing on #31. I will also run the perf test an additional time to verify it works.

…read/export functionality Signed-off-by: Dom Del Nano <ddelnano@gmail.com>

… and soc suite Signed-off-by: Dom Del Nano <ddelnano@gmail.com>

Signed-off-by: Dom Del Nano <ddelnano@gmail.com>

coderabbitai · 2026-05-16T22:37:45Z

📝 Walkthrough

Walkthrough

This PR introduces a comprehensive metrics export abstraction for the perf tool, adding pluggable Parquet GCS and BigQuery exporters; implements a ClickHouse export metrics system with embedded PxL scripts and dedicated experiments; builds a full-featured browser-based dashboard with DuckDB and Plotly; improves Kubernetes resource handling with scope-aware operations and aggregated API filtering; and updates ClickHouse sink/UDTF for cluster mode and better data serialization.

Changes

Perf Tool Export Abstraction and ClickHouse Integration

Layer / File(s)	Summary
Exporter interface and contract `src/e2e_test/perf_tool/pkg/exporter/exporter.go`	Defines the public `Exporter` interface with `ExportResults`, `ExportSpec`, and `Close` methods that all export backends implement.
BigQuery exporter implementation `src/e2e_test/perf_tool/pkg/exporter/bq_exporter.go`	Implements BigQuery exporter wrapping result/spec tables, converting metrics rows, batch-inserting results via goroutine, and writing specs with context timeout.
Parquet GCS exporter with tag schema handling `src/e2e_test/perf_tool/pkg/exporter/parquet_exporter.go`, `parquet_exporter_test.go`	Parquet GCS export with dynamic schema generation from tag keys, batched file writing, GCS upload, and comprehensive tests covering tag ordering, schema validation, single/multiple batches, and empty channels.
Metrics recorder initialization with kubeconfig overrides `src/e2e_test/perf_tool/pkg/metrics/recorder.go`, `prometheus_recorder.go`, `src/e2e_test/perf_tool/pkg/cluster/context.go`	Supports dynamic kubeconfig path and context overrides in metrics recorder creation, tracks cluster context ownership, returns errors on context creation failure, and adds `NewContextFromOptions` constructor.
Perf command-line interface with exporter backends `src/e2e_test/perf_tool/cmd/run.go`	Adds CLI flags for backend selection (BigQuery vs Parquet GCS), GCS/Parquet config, Prometheus overrides, and keep-on-failure mode; implements exporter creation, override parsing/application, and runner wiring.
Runner refactoring for exporter-based output `src/e2e_test/perf_tool/pkg/run/run.go`	Refactors `Runner` to accept `Exporter` instead of BigQuery tables, adds `SetKeepOnFailure`, rewrites `RunExperiment` orchestration with goroutine-driven exporter call, implements conditional cleanup deferral.
ClickHouse export metrics and experiment definitions `src/e2e_test/perf_tool/pkg/suites/metrics.go`, `experiments.go`, `suites.go`, `workloads.go`	Adds ClickHouse export load metric (embedded PxL), ClickHouse operator Prometheus metric, `ClickHouseExportExperiment`, `VizierReleaseWorkload`, and `clickhouse-exec` suite with hard-coded export parameters.
ClickHouse export PxL script `src/e2e_test/perf_tool/pkg/suites/scripts/clickhouse_export.pxl`	Reads event data from parameterized source, exports rows to ClickHouse destination via `px.otel.ClickHouseRows`, records per-run export statistics (row count, node name, timestamp).
Web-based perf metrics dashboard `src/e2e_test/perf_tool/ui/index.html`	Browser UI with DuckDB WASM, Parquet loading (local/GCS), results/specs table creation, suite-view and experiment-view SQL generators, interactive Plotly charts with filtering, aggregation, quantiles, timeline actions, and full dashboard orchestration.
Build configuration updates for exporter `src/e2e_test/perf_tool/cmd/BUILD.bazel`, `pkg/run/BUILD.bazel`, `pkg/suites/BUILD.bazel`, `pkg/exporter/BUILD.bazel`	Adds exporter package dependencies to cmd and run packages, embeds clickhouse_export.pxl in suites, creates new exporter library and test targets.

Kubernetes Resource Handling Enhancements

Layer / File(s)	Summary
REST mapper-based cluster scope detection `src/utils/shared/k8s/apply.go`	Uses REST mapper `mapping.Scope` instead of hardcoded kind list to determine cluster-wide vs namespace-scoped resource application.
Aggregated API service detection in cluster deletion `src/utils/shared/k8s/delete.go`	Detects APIService resources and their aggregated group/versions, excludes matching resources from cluster-wide deletion, ensures dynamic client initialization before filtering.
Pod readiness check for succeeded pods `src/e2e_test/perf_tool/pkg/deploy/checks/k8s_healthcheck.go`, `BUILD.bazel`	Skips readiness checks for pods in `Succeeded` phase (CronJob pods) by importing Kubernetes core v1 and adding early-continue logic.
Optional namespace deletion in prerendered deployments `src/e2e_test/perf_tool/experimentpb/experiment.proto`, `src/e2e_test/perf_tool/pkg/deploy/steps/prerendered.go`	Adds `skip_namespace_delete` field to `PrerenderedDeploy` protobuf and implements early return in `Deploy` when namespace cleanup should be skipped.

ClickHouse Data Sink and UDTF Updates

Layer / File(s)	Summary
UINT128 and event_time handling in ClickHouse export sink `src/carnot/exec/clickhouse_export_sink_node.cc`	Changes UINT128 serialization to `"high:low"` string format, implements automatic `event_time` derivation by converting `TIME64NS` to ClickHouse `DateTime64(3)`.
ClickHouse cluster mode support in schema UDTF `src/vizier/funcs/md_udtfs/md_udtfs_impl.h`	Extends `CreateClickHouseSchemas` UDTF with optional `cluster_name` parameter, conditionally generates `ON CLUSTER` clauses, switches table engine from `MergeTree` to `ReplicatedMergeTree` for cluster mode.

Dependency and Build Configuration Updates

Layer / File(s)	Summary
Go module dependencies for Parquet and compression `go.mod`	Adds `parquet-go v0.25.1` as direct dependency, updates compression/serialization libraries (brotli, lz4, uuid), pins `google.golang.org/protobuf` to `v1.29.1` via replace directive.
Bazel dependency pin synchronization `go_deps.bzl`	Updates version hashes/sums for brotli, uuid, compress; adds repository entries for parquet-go and lz4; adds replace directive for protobuf in Bazel.
Skaffold and Bazel build argument updates `skaffold/skaffold_vizier.yaml`, `src/e2e_test/protocol_loadtest/skaffold_*.yaml`	Adds `--config=x86_64_sysroot` build arguments to Skaffold vizier and protocol loadtest configurations.
CI workflow and Copybara configuration `.github/workflows/perf_clickhouse.yaml`, `tools/private/copybara/copy.bara.sky`	Builds px CLI via Bazel and passes `--cloud_addr` argument in workflow; excludes perf tool and Go dependency files from Copybara upstream sync.

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 55.81% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title directly and clearly describes the main change: adding a GitHub action for a ClickHouse perf test focused on pxl read/export functionality.
Description check	✅ Passed	The description is related to the changeset, mentioning the perf_clickhouse GitHub action and ClickHouse read/export testing, which aligns with the PR objectives and file changes.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch ddelnano/clickhouse-perf-test

⚔️ Resolve merge conflicts

Resolve merge conflict in branch ddelnano/clickhouse-perf-test

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Signed-off-by: Dom Del Nano <ddelnano@gmail.com>

coderabbitai

Actionable comments posted: 19

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/perf_clickhouse.yaml:
- Line 133: The workflow currently interpolates `${{ vars.PERF_CLOUD_ADDR }}`
directly into the `--cloud_addr` flag which can expand to an empty value; add an
explicit pre-check step before the job/step that uses `--cloud_addr` to
hard-fail if `vars.PERF_CLOUD_ADDR` is empty (for example, a short run step that
tests `echo -n "${{ vars.PERF_CLOUD_ADDR }}"` and exits non-zero with a clear
error message when empty), or guard the step that passes `--cloud_addr` with an
`if:` condition that ensures `vars.PERF_CLOUD_ADDR != ''`, so the `--cloud_addr`
flag is never invoked with an empty value.

In `@go.mod`:
- Line 324: The go.mod replace directive forcing google.golang.org/protobuf to
v1.29.1 is a vulnerable pin; remove this replace or bump it to a patched release
(e.g., a non-vulnerable v1.30.x or later) and ensure the same protobuf version
is used consistently in any other dependency pins (e.g., Bazel/WORKSPACE or
other module overrides referencing google.golang.org/protobuf) so there are no
conflicting overrides; update the replace line (or delete it) and align all
protobuf pins to the chosen patched version.

In `@src/carnot/exec/clickhouse_export_sink_node.cc`:
- Around line 38-40: The code should defend against missing/unmapped columns by
validating column names against the target schema during initialization: in
ClickhouseExportSinkNode (e.g., implement or update a
mapColumns/InitColumnMapping routine) check each requested column with a
columnExists(schema, name) helper, skip and log a warning for any missing
columns, and ensure downstream functions that build queries or write rows (e.g.,
WriteRow/processRow/exportColumns) only access columns present in the validated
mapping; remove any direct assumptions that all input columns exist so accesses
use the validated mapping (or conditional checks) to avoid crashes.
- Around line 170-196: The code auto-deriving event_time assumes the source
column is TIME64NS but never checks its actual Arrow/PIXIE type; update the
logic around plan_node_->column_mappings() and the block that reads
rb.ColumnAt(time_col_index) to verify the column's type before calling
types::GetValueFromArrowArray<types::TIME64NS>. Specifically, inspect the Arrow
array/type (from rb.ColumnAt(...) or the mapping metadata) to confirm it is
TIME64NS (or the equivalent Arrow type id) and only perform the ns->ms
conversion and call GetValueFromArrowArray when that check passes; if the type
mismatches, handle it explicitly (log/error/skip/convert appropriately) rather
than blindly calling GetValueFromArrowArray, keeping the rest of the flow that
builds clickhouse::ColumnDateTime64 and block.AppendColumn unchanged.

In `@src/e2e_test/perf_tool/cmd/run.go`:
- Around line 359-374: The createExporter(ctx context.Context) parameter is
unused for the bq path; remove the unused ctx parameter from createExporter's
signature and all its call sites, and instead create a local context inside the
parquet-gcs branch to pass into exporter.NewParquetGCSExporter; leave
createBQExporter as-is (it continues not to need a context). Update references
to createExporter and ensure exporter.NewParquetGCSExporter is still called with
a valid local context while createBQExporter remains unchanged.

In `@src/e2e_test/perf_tool/pkg/cluster/context.go`:
- Around line 75-77: The code sets kubeconfigPath to
clientcmd.RecommendedHomeFile when empty, which ignores the actual file resolved
by loadingRules; change the logic so that when kubeconfigPath is empty you
assign loadingRules.GetDefaultFilename() instead of RecommendedHomeFile,
ensuring the resolved path used by loadingRules is what you later export (e.g.,
via AddEnv) so child commands use the correct kubeconfig; update references to
kubeconfigPath and ensure loadingRules is available in the scope where you call
GetDefaultFilename().

In `@src/e2e_test/perf_tool/pkg/exporter/bq_exporter.go`:
- Around line 87-107: ExportResults currently launches inserter.Run(bqCh)
asynchronously and returns without waiting or surfacing insertion failures;
change ExportResults to close bqCh and then wait for the inserter to finish and
return any insertion error(s). Specifically: have bq.BatchInserter expose a
completion mechanism (e.g., an ErrChan or make Run return an error/accept a
context and provide a Done/Err channel) and in BQExporter.ExportResults call
inserter.Run in a goroutine that sends its error to that channel, close(bqCh)
after the input loop, then read from the inserter's completion/error channel and
return a non-nil error if any uploadBatch failures occurred. Reference symbols:
BQExporter.ExportResults, inserter.Run, bq.BatchInserter, bqCh, uploadBatch.

In `@src/e2e_test/perf_tool/pkg/exporter/parquet_exporter.go`:
- Around line 237-267: The per-row sort in buildResultRow is causing unnecessary
overhead; since tagKeys is already sorted, remove the sort.Slice call and
instead assemble entries in alphabetical column order by merging the sorted tag
entries with the fixed columns—i.e., compute the static columns
("experiment_id","name","timestamp","value") and the tag_* columns (iterating
tagKeys and reading row.Tags) and merge them into entries in the correct
alphabetical order (no per-row sort) so the produced entries match the parquet
schema; update buildResultRow to produce this merged order and drop the
sort.Slice usage.

In `@src/e2e_test/perf_tool/pkg/metrics/recorder.go`:
- Line 69: The function handling MetricSpec currently returns nil, nil for
unsupported metric types (the return at the end of recorder.go), which hides
errors; change it to return a non-nil error (e.g., fmt.Errorf("unsupported
metric type: %q", spec.Type) or similar) instead of nil error, and update
imports if needed; reference the function that inspects MetricSpec (the
factory/constructor that returns (..., error) in recorder.go) and ensure callers
handle the error.

In `@src/e2e_test/perf_tool/pkg/run/run.go`:
- Around line 93-98: The change made getCluster and prepareWorkloads run
serially; restore their previous concurrent execution by running
r.getCluster(ctx, spec.ClusterSpec) and r.prepareWorkloads(ctx, spec) as
separate goroutines under an errgroup (e.g., errgroup.WithContext), capture
errors from each goroutine, and return g.Wait() so both run in parallel and
errors are propagated; reference r.getCluster and r.prepareWorkloads to locate
the calls and ensure you pass the same ctx/spec values into the goroutines and
handle their returned errors via the errgroup.

In `@src/e2e_test/perf_tool/pkg/suites/experiments.go`:
- Line 39: The Vizier workload usage is inconsistent: HTTPLoadTestExperiment
uses VizierReleaseWorkload() while ClickHouseExportExperiment uses
VizierWorkload(); decide whether ClickHouseExportExperiment should also use the
pre-built images or intentionally build from source and make that explicit.
Either change ClickHouseExportExperiment to call VizierReleaseWorkload() to
align usage, or add a clear inline comment next to ClickHouseExportExperiment
(or above VizierWorkload/VizierReleaseWorkload usage) explaining why it must
build from source (e.g., requires latest dev build), referencing the experiment
names ClickHouseExportExperiment and HTTPLoadTestExperiment and the functions
VizierWorkload() and VizierReleaseWorkload() so reviewers can easily locate and
verify the intent.

In `@src/e2e_test/perf_tool/pkg/suites/suites.go`:
- Line 42: The duration for nightlyExperimentSuite was changed by assigning dur
:= 5 * time.Minute (previously 40 * time.Minute); confirm whether this 8x
reduction was intentional—if not, revert dur to 40 * time.Minute in the
nightlyExperimentSuite initialization (or restore the prior constant/variable
that set 40 minutes) and add a brief comment or PR note explaining the final
chosen duration to avoid accidental debugging-shortened runs.
- Around line 86-115: The clickhouseExecSuite function currently hardcodes
ClickHouse credentials in the clickhouseDSN variable and contains commented-out
duration alternatives; replace the literal DSN with a configurable source (e.g.,
read from an environment variable or CLI flag) and fall back to a safe default
or return an error if missing, update any references to
clickhouseDSN/clickhouseTable accordingly, add a short comment near
clickhouseDSN clarifying that the value must be for a public/test instance (if
that’s intended), and remove the unused commented-out duration assignments (the
old preDur/dur lines) to eliminate dead code.

In `@src/e2e_test/perf_tool/ui/index.html`:
- Line 107: Buttons like the one with id="gcs-load-btn" (and other <button
class="btn"> elements flagged at the other locations) are missing an explicit
type and will default to type="submit"; update each of these button elements
(e.g., id="gcs-load-btn" and the other .btn buttons mentioned) to include
type="button" to prevent accidental form submission.
- Around line 392-420: The SQL strings in buildSuiteQuery and
buildExperimentQuery interpolate user-provided values (e.g., suite, workload,
experiment IDs) directly, which risks injection from malicious parquet data;
update these functions to stop direct interpolation by either (1) switching to
DuckDB parameterized/prepared queries if available, or (2) implementing a strict
sanitizer that escapes single quotes and rejects unexpected characters
(allow-list names/IDs to [A-Za-z0-9._-] or numeric IDs) before inserting into
the template; ensure the code paths that build metricSourceCTE and the other
WHERE clauses use the sanitized/parameterized values and add unit tests that
pass malicious strings to verify escaping/validation works.

In `@src/utils/shared/k8s/apply.go`:
- Around line 33-34: Reorder the import block so that
"k8s.io/apimachinery/pkg/api/meta" is grouped with the other
"k8s.io/apimachinery" imports (alongside k8serrors) to satisfy the gci linter;
update the import grouping in src/utils/shared/k8s/apply.go so both meta and
k8serrors appear together in the same k8s.io/apimachinery group.

In `@src/utils/shared/k8s/delete.go`:
- Around line 126-127: The function getAggregatedGroupVersions uses the
deprecated sets.String; change its return type and local variable to the generic
type sets.Set[string] and instantiate with sets.New[string]() (instead of
sets.NewString()). Update any uses of out.Insert / out.Has etc. as needed (they
work on sets.Set[string]) so ObjectDeleter.getAggregatedGroupVersions and its
callers compile against k8s.io/apimachinery v0.26.x.

In `@src/vizier/funcs/md_udtfs/md_udtfs_impl.h`:
- Around line 1234-1237: The SQL construction for DROP TABLE uses cluster_name_
without validation (see drop_cluster_clause and clickhouse_client_->Execute),
which allows quotes/injection; update the Init method to validate cluster_name_
(reject empty or names containing single quotes, whitespace, semicolons or other
unsafe characters, e.g., only allow alphanumeric, hyphen, underscore) and return
an error if invalid, and/or escape/quote it safely before constructing
drop_cluster_clause; ensure the same validation/escape is applied where
cluster_name_ is used elsewhere (e.g., the similar occurrence at the line
referenced around 1316).
- Around line 1315-1329: Update the explanatory comment around
on_cluster_clause/engine/create_sql to correct the ClickHouse requirement and
note the server config dependency: change the line that says "requires
ClickHouse >= 22.x" to say the auto-generation of ZooKeeper paths for
ReplicatedMergeTree() was introduced in ClickHouse 20.9 (with improvements in
21.1), and add that this auto-generation also requires the server to have
default_replica_path and default_replica_name configured; keep the rest of the
logic (setting engine to "ReplicatedMergeTree()" when cluster_name is non-empty)
unchanged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 19e35db8-043f-4270-9d9b-dcbc49e772b2

📥 Commits

Reviewing files that changed from the base of the PR and between 5e04aef and 88763e0.

⛔ Files ignored due to path filters (2)

go.sum is excluded by !**/*.sum
src/e2e_test/perf_tool/experimentpb/experiment.pb.go is excluded by !**/*.pb.go

📒 Files selected for processing (34)

.github/workflows/perf_clickhouse.yaml
go.mod
go_deps.bzl
skaffold/skaffold_vizier.yaml
src/carnot/exec/clickhouse_export_sink_node.cc
src/e2e_test/perf_tool/cmd/BUILD.bazel
src/e2e_test/perf_tool/cmd/run.go
src/e2e_test/perf_tool/experimentpb/experiment.proto
src/e2e_test/perf_tool/pkg/cluster/context.go
src/e2e_test/perf_tool/pkg/deploy/checks/BUILD.bazel
src/e2e_test/perf_tool/pkg/deploy/checks/k8s_healthcheck.go
src/e2e_test/perf_tool/pkg/deploy/steps/prerendered.go
src/e2e_test/perf_tool/pkg/exporter/BUILD.bazel
src/e2e_test/perf_tool/pkg/exporter/bq_exporter.go
src/e2e_test/perf_tool/pkg/exporter/exporter.go
src/e2e_test/perf_tool/pkg/exporter/parquet_exporter.go
src/e2e_test/perf_tool/pkg/exporter/parquet_exporter_test.go
src/e2e_test/perf_tool/pkg/metrics/prometheus_recorder.go
src/e2e_test/perf_tool/pkg/metrics/recorder.go
src/e2e_test/perf_tool/pkg/run/BUILD.bazel
src/e2e_test/perf_tool/pkg/run/run.go
src/e2e_test/perf_tool/pkg/suites/BUILD.bazel
src/e2e_test/perf_tool/pkg/suites/experiments.go
src/e2e_test/perf_tool/pkg/suites/metrics.go
src/e2e_test/perf_tool/pkg/suites/scripts/clickhouse_export.pxl
src/e2e_test/perf_tool/pkg/suites/suites.go
src/e2e_test/perf_tool/pkg/suites/workloads.go
src/e2e_test/perf_tool/ui/index.html
src/e2e_test/protocol_loadtest/skaffold_client.yaml
src/e2e_test/protocol_loadtest/skaffold_loadtest.yaml
src/utils/shared/k8s/apply.go
src/utils/shared/k8s/delete.go
src/vizier/funcs/md_udtfs/md_udtfs_impl.h
tools/private/copybara/copy.bara.sky

coderabbitai · 2026-05-16T22:45:51Z

      run: |
        bazel run //src/e2e_test/perf_tool:perf_tool -- run \
          --api_key="${PX_API_KEY}" \
+          --cloud_addr=${{ vars.PERF_CLOUD_ADDR }}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast when PERF_CLOUD_ADDR is unset.

Line 133 uses ${{ vars.PERF_CLOUD_ADDR }} directly; when unset, this becomes an empty flag value and can silently run against an unintended default. Add an explicit precheck and hard-fail.

Suggested fix

- name: Run clickhouse-export perf env: PX_API_KEY: ${{ secrets.PX_API_KEY }} + PERF_CLOUD_ADDR: ${{ vars.PERF_CLOUD_ADDR }} GOOGLE_APPLICATION_CREDENTIALS: ${{ steps.gcloud-creds.outputs.gcloud-creds }} KUBECONFIG: ${{ runner.temp }}/kubeconfig run: | + test -n "${PERF_CLOUD_ADDR}" || { echo "PERF_CLOUD_ADDR is not set"; exit 1; } bazel run //src/e2e_test/perf_tool:perf_tool -- run \ --api_key="${PX_API_KEY}" \ - --cloud_addr=${{ vars.PERF_CLOUD_ADDR }} + --cloud_addr="${PERF_CLOUD_ADDR}" \

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/perf_clickhouse.yaml at line 133, The workflow currently interpolates `${{ vars.PERF_CLOUD_ADDR }}` directly into the `--cloud_addr` flag which can expand to an empty value; add an explicit pre-check step before the job/step that uses `--cloud_addr` to hard-fail if `vars.PERF_CLOUD_ADDR` is empty (for example, a short run step that tests `echo -n "${{ vars.PERF_CLOUD_ADDR }}"` and exits non-zero with a clear error message when empty), or guard the step that passes `--cloud_addr` with an `if:` condition that ensures `vars.PERF_CLOUD_ADDR != ''`, so the `--cloud_addr` flag is never invoked with an empty value.

coderabbitai · 2026-05-16T22:45:51Z

GHSA-8r3f-844c-mc37

 	gopkg.in/yaml.v2 => gopkg.in/yaml.v2 v2.4.0
 )
+
+replace google.golang.org/protobuf => google.golang.org/protobuf v1.29.1


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "== Confirm protobuf pins in go.mod ==" rg -n 'google\.golang\.org/protobuf' go.mod echo echo "== Query OSV for protobuf v1.29.1 advisories ==" curl -s https://api.osv.dev/v1/query \ -H 'Content-Type: application/json' \ -d '{"package":{"name":"google.golang.org/protobuf","ecosystem":"Go"},"version":"v1.29.1"}' \ | jq '.vulns[] | {id: .id, summary: .summary, severity: .database_specific.severity}'

Repository: k8sstormcenter/pixie

Length of output: 582

Remove or upgrade forced protobuf version; current pin is vulnerable.

Line 324 forces google.golang.org/protobuf to v1.29.1, which contains a moderate-severity vulnerability (GHSA-8r3f-844c-mc37: infinite loop in JSON unmarshaling). This override defeats the safer indirect version at Line 282 and should be removed or bumped to a patched release consistently across Go/Bazel dependency pins.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go.mod` at line 324, The go.mod replace directive forcing google.golang.org/protobuf to v1.29.1 is a vulnerable pin; remove this replace or bump it to a patched release (e.g., a non-vulnerable v1.30.x or later) and ensure the same protobuf version is used consistently in any other dependency pins (e.g., Bazel/WORKSPACE or other module overrides referencing google.golang.org/protobuf) so there are no conflicting overrides; update the replace line (or delete it) and align all protobuf pins to the chosen patched version.

coderabbitai · 2026-05-16T22:45:51Z

+// TODO(ddelnano): Defend against columns that don't exist. These should be
+// ignored by the Node.
+


🧹 Nitpick | 🔵 Trivial

Address TODO: Defending against missing columns.

This TODO notes a valid defensive measure. If unmapped columns are passed, the code should handle them gracefully rather than potentially failing.

Would you like me to help implement this defensive check, or should I open an issue to track this task?

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/carnot/exec/clickhouse_export_sink_node.cc` around lines 38 - 40, The code should defend against missing/unmapped columns by validating column names against the target schema during initialization: in ClickhouseExportSinkNode (e.g., implement or update a mapColumns/InitColumnMapping routine) check each requested column with a columnExists(schema, name) helper, skip and log a warning for any missing columns, and ensure downstream functions that build queries or write rows (e.g., WriteRow/processRow/exportColumns) only access columns present in the validated mapping; remove any direct assumptions that all input columns exist so accesses use the validated mapping (or conditional checks) to avoid crashes.

coderabbitai · 2026-05-16T22:45:51Z

+  // Auto-derive event_time from time_ if time_ is present but event_time is not.
+  // The ClickHouse table schema uses event_time (DateTime64(3), milliseconds) for
+  // partitioning and ordering, but the Pixie table has time_ (TIME64NS, nanoseconds).
+  bool has_time_ = false;
+  bool has_event_time = false;
+  int time_col_index = -1;
+  for (const auto& mapping : plan_node_->column_mappings()) {
+    if (mapping.clickhouse_column_name() == "time_") {
+      has_time_ = true;
+      time_col_index = mapping.input_column_index();
+    }
+    if (mapping.clickhouse_column_name() == "event_time") {
+      has_event_time = true;
+    }
+  }
+
+  if (has_time_ && !has_event_time && time_col_index >= 0) {
+    auto arrow_col = rb.ColumnAt(time_col_index);
+    int64_t num_rows = arrow_col->length();
+    auto event_time_col = std::make_shared<clickhouse::ColumnDateTime64>(3);
+    for (int64_t i = 0; i < num_rows; ++i) {
+      int64_t ns_val = types::GetValueFromArrowArray<types::TIME64NS>(arrow_col.get(), i);
+      // Convert nanoseconds to milliseconds for DateTime64(3)
+      event_time_col->Append(ns_val / 1000000LL);
+    }
+    block.AppendColumn("event_time", event_time_col);
+  }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing type verification before casting to TIME64NS.

The event_time derivation assumes the column mapped to "time_" is TIME64NS, but Line 177-179 only checks the ClickHouse column name, not the actual column type. If a non-TIME64NS column is mapped to "time_", GetValueFromArrowArray<types::TIME64NS> at Line 191 would produce incorrect values or potentially crash.

🛡️ Proposed fix to verify column type

bool has_time_ = false; bool has_event_time = false; int time_col_index = -1; + types::DataType time_col_type = types::DATA_TYPE_UNKNOWN; for (const auto& mapping : plan_node_->column_mappings()) { if (mapping.clickhouse_column_name() == "time_") { has_time_ = true; time_col_index = mapping.input_column_index(); + time_col_type = mapping.column_type(); } if (mapping.clickhouse_column_name() == "event_time") { has_event_time = true; } } - if (has_time_ && !has_event_time && time_col_index >= 0) { + if (has_time_ && !has_event_time && time_col_index >= 0 && + time_col_type == types::TIME64NS) { auto arrow_col = rb.ColumnAt(time_col_index);

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/carnot/exec/clickhouse_export_sink_node.cc` around lines 170 - 196, The code auto-deriving event_time assumes the source column is TIME64NS but never checks its actual Arrow/PIXIE type; update the logic around plan_node_->column_mappings() and the block that reads rb.ColumnAt(time_col_index) to verify the column's type before calling types::GetValueFromArrowArray<types::TIME64NS>. Specifically, inspect the Arrow array/type (from rb.ColumnAt(...) or the mapping metadata) to confirm it is TIME64NS (or the equivalent Arrow type id) and only perform the ns->ms conversion and call GetValueFromArrowArray when that check passes; if the type mismatches, handle it explicitly (log/error/skip/convert appropriately) rather than blindly calling GetValueFromArrowArray, keeping the rest of the flow that builds clickhouse::ColumnDateTime64 and block.AppendColumn unchanged.

coderabbitai · 2026-05-16T22:45:51Z

+func createExporter(ctx context.Context) (exporter.Exporter, error) {
+	switch viper.GetString("export_backend") {
+	case "bq":
+		return createBQExporter()
+	case "parquet-gcs":
+		bucket := viper.GetString("gcs_bucket")
+		if bucket == "" {
+			return nil, errors.New("--gcs_bucket is required when using parquet-gcs backend")
+		}
+		prefix := viper.GetString("gcs_prefix")
+		batchSize := viper.GetInt("parquet_batch_size")
+		return exporter.NewParquetGCSExporter(ctx, bucket, prefix, batchSize)
+	default:
+		return nil, fmt.Errorf("unknown export backend: %s", viper.GetString("export_backend"))
+	}
+}


🧹 Nitpick | 🔵 Trivial | 💤 Low value

Context parameter is unused in createExporter.

The ctx parameter is passed but only used for parquet-gcs. For bq, the context is not passed to createBQExporter. This is fine for now since BigQuery table creation doesn't require context, but consider removing the parameter or documenting why it's needed only for certain backends.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/e2e_test/perf_tool/cmd/run.go` around lines 359 - 374, The createExporter(ctx context.Context) parameter is unused for the bq path; remove the unused ctx parameter from createExporter's signature and all its call sites, and instead create a local context inside the parquet-gcs branch to pass into exporter.NewParquetGCSExporter; leave createBQExporter as-is (it continues not to need a context). Update references to createExporter and ensure exporter.NewParquetGCSExporter is still called with a valid local context while createBQExporter remains unchanged.

coderabbitai · 2026-05-16T22:45:52Z

+    function buildSuiteQuery(metric, filters) {
+      const { suite, workload, numCommits, everyN } = filters;
+
+      // Build metric source CTE based on whether it's computed
+      let metricSourceCTE;
+      if (metric.computed && metric.metricName === 'heap_ex_table') {
+        metricSourceCTE = `
+          metric_source AS (
+            SELECT
+              h.experiment_id,
+              h.timestamp,
+              h.value - COALESCE(t.value, 0) as metric_value,
+              h.tag_pod
+            FROM results h
+            LEFT JOIN results t
+              ON h.experiment_id = t.experiment_id
+              AND h.timestamp = t.timestamp
+              AND COALESCE(h.tag_pod, '') = COALESCE(t.tag_pod, '')
+              AND t.name = 'table_size'
+            WHERE h.name = 'heap_size_bytes'
+          )`;
+      } else {
+        metricSourceCTE = `
+          metric_source AS (
+            SELECT experiment_id, timestamp, value as metric_value, tag_pod
+            FROM results
+            WHERE name = '${metric.metricName}'
+          )`;
+      }


🧹 Nitpick | 🔵 Trivial | ⚖️ Poor tradeoff

SQL queries use template strings with user-provided filter values.

The buildSuiteQuery and buildExperimentQuery functions interpolate filter values (suite, workload names, experiment IDs) directly into SQL strings. While DuckDB runs client-side in WASM (reducing server-side injection risk), malicious parquet data could contain crafted strings that break queries or cause unexpected behavior.

Consider using parameterized queries if DuckDB WASM supports them, or sanitizing/escaping values.

Also applies to: 568-602

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/e2e_test/perf_tool/ui/index.html` around lines 392 - 420, The SQL strings in buildSuiteQuery and buildExperimentQuery interpolate user-provided values (e.g., suite, workload, experiment IDs) directly, which risks injection from malicious parquet data; update these functions to stop direct interpolation by either (1) switching to DuckDB parameterized/prepared queries if available, or (2) implementing a strict sanitizer that escapes single quotes and rejects unexpected characters (allow-list names/IDs to [A-Za-z0-9._-] or numeric IDs) before inserting into the template; ensure the code paths that build metricSourceCTE and the other WHERE clauses use the sanitized/parameterized values and add unit tests that pass malicious strings to verify escaping/validation works.

coderabbitai · 2026-05-16T22:45:52Z

+	"k8s.io/apimachinery/pkg/api/meta"
 	k8serrors "k8s.io/apimachinery/pkg/api/errors"


🧹 Nitpick | 🔵 Trivial | 💤 Low value

Import ordering flagged by linter.

The gci linter reports that imports are not properly formatted. The meta import should be grouped with other k8s.io/apimachinery imports.

🔧 Suggested fix

- "k8s.io/apimachinery/pkg/api/meta" k8serrors "k8s.io/apimachinery/pkg/api/errors" + "k8s.io/apimachinery/pkg/api/meta" metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

"k8s.io/apimachinery/pkg/api/meta"

k8serrors "k8s.io/apimachinery/pkg/api/errors"

k8serrors "k8s.io/apimachinery/pkg/api/errors"

"k8s.io/apimachinery/pkg/api/meta"

metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

🧰 Tools

🪛 golangci-lint (2.12.2)

[error] 33-33: File is not properly formatted

(gci)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/shared/k8s/apply.go` around lines 33 - 34, Reorder the import block so that "k8s.io/apimachinery/pkg/api/meta" is grouped with the other "k8s.io/apimachinery" imports (alongside k8serrors) to satisfy the gci linter; update the import grouping in src/utils/shared/k8s/apply.go so both meta and k8serrors appear together in the same k8s.io/apimachinery group.

coderabbitai · 2026-05-16T22:45:52Z

+func (o *ObjectDeleter) getAggregatedGroupVersions() (sets.String, error) {
+	out := sets.NewString()


🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

What version of k8s.io/apimachinery introduced sets.Set generic type?

💡 Result:

The generic sets.Set[T] type in k8s.io/apimachinery/pkg/util/sets was introduced in version v0.26.0 [1]. This generic implementation is intended to replace the older, type-specific set implementations (such as sets.String or sets.Int), which are now deprecated [1][2].

Citations:

1: https://pkg.go.dev/k8s.io/apimachinery/pkg/util/sets

2: https://github.com/kubernetes/apimachinery/blob/master/pkg/util/sets/string.go

🏁 Script executed:

fd -t f go.mod | head -5 && cat go.mod | grep -E 'k8s.io/apimachinery|go\s+'

Repository: k8sstormcenter/pixie

Length of output: 849

🏁 Script executed:

cat -n src/utils/shared/k8s/delete.go | sed -n '120,135p'

Repository: k8sstormcenter/pixie

Length of output: 895

🏁 Script executed:

rg 'sets\.String|sets\.New' src/utils/shared/k8s/delete.go -n

Repository: k8sstormcenter/pixie

Length of output: 237

Replace deprecated sets.String with generic sets.Set[string].

The sets.String type is deprecated and the project uses k8s.io/apimachinery v0.26.2 which supports the generic sets.Set[T] type (available since v0.26.0).

♻️ Proposed fix

-func (o *ObjectDeleter) getAggregatedGroupVersions() (sets.String, error) { - out := sets.NewString() +func (o *ObjectDeleter) getAggregatedGroupVersions() (sets.Set[string], error) { + out := sets.New[string]()

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

func (o *ObjectDeleter) getAggregatedGroupVersions() (sets.String, error) {

out := sets.NewString()

func (o *ObjectDeleter) getAggregatedGroupVersions() (sets.Set[string], error) {

out := sets.New[string]()

🧰 Tools

🪛 golangci-lint (2.12.2)

[error] 126-126: SA1019: sets.String is deprecated: use generic Set instead. new ways: s1 := Set[string]{} s2 := Newstring

(staticcheck)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/shared/k8s/delete.go` around lines 126 - 127, The function getAggregatedGroupVersions uses the deprecated sets.String; change its return type and local variable to the generic type sets.Set[string] and instantiate with sets.New[string]() (instead of sets.NewString()). Update any uses of out.Insert / out.Has etc. as needed (they work on sets.Set[string]) so ObjectDeleter.getAggregatedGroupVersions and its callers compile against k8s.io/apimachinery v0.26.x.

coderabbitai · 2026-05-16T22:45:52Z

+          std::string drop_cluster_clause =
+              cluster_name_.empty() ? "" : absl::Substitute(" ON CLUSTER '$0'", cluster_name_);
+          clickhouse_client_->Execute(
+              absl::Substitute("DROP TABLE IF EXISTS $0$1", table_name, drop_cluster_clause));


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

SQL injection risk: cluster_name is not sanitized.

If cluster_name contains a single quote (e.g., test'cluster), the SQL will break or could be exploited. The same pattern appears at Line 1316.

🔒 Proposed fix to validate cluster_name

Add validation in Init to reject invalid cluster names:

cluster_name_ = std::string(cluster_name); + + // Validate cluster_name to prevent SQL injection + if (!cluster_name_.empty()) { + for (char c : cluster_name_) { + if (!std::isalnum(c) && c != '_' && c != '-') { + return error::InvalidArgument( + "cluster_name contains invalid characters: only alphanumeric, underscore, and hyphen allowed"); + } + } + }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vizier/funcs/md_udtfs/md_udtfs_impl.h` around lines 1234 - 1237, The SQL construction for DROP TABLE uses cluster_name_ without validation (see drop_cluster_clause and clickhouse_client_->Execute), which allows quotes/injection; update the Init method to validate cluster_name_ (reject empty or names containing single quotes, whitespace, semicolons or other unsafe characters, e.g., only allow alphanumeric, hyphen, underscore) and return an error if invalid, and/or escape/quote it safely before constructing drop_cluster_clause; ensure the same validation/escape is applied where cluster_name_ is used elsewhere (e.g., the similar occurrence at the line referenced around 1316).

coderabbitai · 2026-05-16T22:45:52Z

+    std::string on_cluster_clause =
+        cluster_name.empty() ? "" : absl::Substitute(" ON CLUSTER '$0'", cluster_name);
+    // Use ReplicatedMergeTree when cluster mode is enabled so that data is
+    // replicated across nodes. ClickHouse auto-generates the ZooKeeper paths
+    // when no explicit arguments are provided (requires ClickHouse >= 22.x).
+    std::string engine = cluster_name.empty() ? "MergeTree()" : "ReplicatedMergeTree()";
    std::string create_sql = absl::Substitute(R"(
-      CREATE TABLE $0$1 (
-        $2
-      ) ENGINE = MergeTree()
+      CREATE TABLE $0$1$2 (
+        $3
+      ) ENGINE = $4
      PARTITION BY toYYYYMM(event_time)
      ORDER BY (hostname, event_time)
    )",
-                                              if_not_exists_clause, table_name, columns_str);
+                                              if_not_exists_clause, table_name, on_cluster_clause,
+                                              columns_str, engine);


⚠️ Potential issue | 🟠 Major | 💤 Low value

🧩 Analysis chain

🌐 Web query:

What ClickHouse version introduced ReplicatedMergeTree without explicit ZooKeeper path arguments?

💡 Result:

The ability to omit explicit ZooKeeper path and replica name arguments when creating a ReplicatedMergeTree table was introduced in ClickHouse version 20.9 [1]. This feature was implemented via pull request #14791, which allowed users to define default values for these parameters in the server configuration file using the default_replica_path and default_replica_name settings [1]. When these settings are configured, the ReplicatedMergeTree engine can be instantiated without providing the zoo_path and replica_name arguments in the CREATE TABLE statement [2][1]. A subsequent improvement was made in version 21.1 (via pull request #21060) to ensure these default settings were correctly applied even when the engine required other parameters (such as in ReplicatedReplacingMergeTree) [3][4].

Citations:

1: Support default arguments for Replicated engine ClickHouse/ClickHouse#14791

2: https://github.com/ClickHouse/ClickHouse/blob/master/docs/en/engines/table-engines/mergetree-family/replication.md

3: fix default_replica_path and default_replica_name values are useless … ClickHouse/ClickHouse#21060

4: fix default_replica_path and default_replica_name values are useless … ClickHouse/ClickHouse#21060

Correct the ClickHouse version requirement and document server configuration dependency.

The code comment incorrectly states ReplicatedMergeTree() without explicit ZooKeeper paths requires ClickHouse >= 22.x. Verification shows this feature was actually introduced in ClickHouse 20.9 (with improvements in 21.1). Additionally, the comment omits that this feature requires server configuration settings default_replica_path and default_replica_name to be defined. Update the comment to accurately reflect the version requirement and highlight the configuration dependency.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vizier/funcs/md_udtfs/md_udtfs_impl.h` around lines 1315 - 1329, Update the explanatory comment around on_cluster_clause/engine/create_sql to correct the ClickHouse requirement and note the server config dependency: change the line that says "requires ClickHouse >= 22.x" to say the auto-generation of ZooKeeper paths for ReplicatedMergeTree() was introduced in ClickHouse 20.9 (with improvements in 21.1), and add that this auto-generation also requires the server to have default_replica_path and default_replica_name configured; keep the rest of the logic (setting engine to "ReplicatedMergeTree()" when cluster_name is non-empty) unchanged.

coderabbitai

♻️ Duplicate comments (1)

.github/workflows/perf_clickhouse.yaml (1)

133-133: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast when PERF_CLOUD_ADDR is unset.

This flag still uses ${{ vars.PERF_CLOUD_ADDR }} directly; when unset, it expands to --cloud_addr= and may silently proceed with an unintended default or cause runtime errors. Add an explicit validation step that hard-fails if the variable is empty.

🛡️ Recommended fix

     - name: Run clickhouse-export perf
       env:
         PX_API_KEY: ${{ secrets.PX_API_KEY }}
+        PERF_CLOUD_ADDR: ${{ vars.PERF_CLOUD_ADDR }}
         GOOGLE_APPLICATION_CREDENTIALS: ${{ steps.gcloud-creds.outputs.gcloud-creds }}
         KUBECONFIG: ${{ runner.temp }}/kubeconfig
       run: |
+        test -n "${PERF_CLOUD_ADDR}" || { echo "PERF_CLOUD_ADDR is not set"; exit 1; }
         bazel run //src/e2e_test/perf_tool:perf_tool -- run \
           --api_key="${PX_API_KEY}" \
-          --cloud_addr=${{ vars.PERF_CLOUD_ADDR }}
+          --cloud_addr="${PERF_CLOUD_ADDR}" \

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/perf_clickhouse.yaml at line 133, Add a pre-flight
validation step before the command that uses the --cloud_addr flag to fail fast
when PERF_CLOUD_ADDR is empty: check the GitHub Actions variable PERF_CLOUD_ADDR
(e.g., run: if [ -z "${{ vars.PERF_CLOUD_ADDR }}" ]; then echo "ERROR:
PERF_CLOUD_ADDR is not set" >&2; exit 1; fi) and only proceed to invoke the
command with --cloud_addr=${{ vars.PERF_CLOUD_ADDR }} if the check passes;
ensure the error message clearly names PERF_CLOUD_ADDR and exits non‑zero so the
workflow fails immediately.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In @.github/workflows/perf_clickhouse.yaml:
- Line 133: Add a pre-flight validation step before the command that uses the
--cloud_addr flag to fail fast when PERF_CLOUD_ADDR is empty: check the GitHub
Actions variable PERF_CLOUD_ADDR (e.g., run: if [ -z "${{ vars.PERF_CLOUD_ADDR
}}" ]; then echo "ERROR: PERF_CLOUD_ADDR is not set" >&2; exit 1; fi) and only
proceed to invoke the command with --cloud_addr=${{ vars.PERF_CLOUD_ADDR }} if
the check passes; ensure the error message clearly names PERF_CLOUD_ADDR and
exits non‑zero so the workflow fails immediately.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 8d86c57f-4342-4bd8-a987-ddb126b8dda8

📥 Commits

Reviewing files that changed from the base of the PR and between 88763e0 and fc172f1.

📒 Files selected for processing (1)

.github/workflows/perf_clickhouse.yaml

Signed-off-by: Dom Del Nano <ddelnano@gmail.com>

coderabbitai

♻️ Duplicate comments (1)

.github/workflows/perf_clickhouse.yaml (1)
133-133: ⚠️ Potential issue | 🟠 Major

Fail fast when PERF_CLOUD_ADDR is unset.

This issue was already flagged in a previous review. The --cloud_addr flag uses ${{ vars.PERF_CLOUD_ADDR }} directly without validation or quoting. If the variable is unset, this becomes an empty value and may silently connect to an unintended default endpoint.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/perf_clickhouse.yaml at line 133, The workflow currently
passes --cloud_addr=${{ vars.PERF_CLOUD_ADDR }} without validating
PERF_CLOUD_ADDR; add a pre-flight validation step that checks PERF_CLOUD_ADDR is
non-empty and fails the job (exit non‑zero) if unset, and quote the flag when
used (e.g. --cloud_addr="${{ vars.PERF_CLOUD_ADDR }}") to avoid silent defaults;
target the step that emits the --cloud_addr flag and add the validation before
it so the job fails fast when PERF_CLOUD_ADDR is not provided.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In @.github/workflows/perf_clickhouse.yaml:
- Line 133: The workflow currently passes --cloud_addr=${{ vars.PERF_CLOUD_ADDR
}} without validating PERF_CLOUD_ADDR; add a pre-flight validation step that
checks PERF_CLOUD_ADDR is non-empty and fails the job (exit non‑zero) if unset,
and quote the flag when used (e.g. --cloud_addr="${{ vars.PERF_CLOUD_ADDR }}")
to avoid silent defaults; target the step that emits the --cloud_addr flag and
add the validation before it so the job fails fast when PERF_CLOUD_ADDR is not
provided.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: d14f16f8-cbda-4200-a5dc-59bf7f65cc70

📥 Commits

Reviewing files that changed from the base of the PR and between fc172f1 and 381e958.

📒 Files selected for processing (1)

.github/workflows/perf_clickhouse.yaml

ddelnano added 4 commits May 16, 2026 15:04

[perf_tool] Add perf_clickhouse GitHub action to test pxl clickhouse …

c608b62

…read/export functionality Signed-off-by: Dom Del Nano <ddelnano@gmail.com>

Remove skaffold error log support, unnecessary _pem_hostname function…

7ded5bb

… and soc suite Signed-off-by: Dom Del Nano <ddelnano@gmail.com>

Update copybara accordingly

46c7386

Signed-off-by: Dom Del Nano <ddelnano@gmail.com>

Use variable to avoid hardcoding value

88763e0

Signed-off-by: Dom Del Nano <ddelnano@gmail.com>

ddelnano temporarily deployed to pr-actions-approval May 16, 2026 22:37 — with GitHub Actions Inactive

Use correct bazel config for px binary

fc172f1

Signed-off-by: Dom Del Nano <ddelnano@gmail.com>

coderabbitai Bot requested changes May 16, 2026

View reviewed changes

ddelnano temporarily deployed to pr-actions-approval May 16, 2026 22:45 — with GitHub Actions Inactive

coderabbitai Bot reviewed May 16, 2026

View reviewed changes

Fix malformed bash command

381e958

Signed-off-by: Dom Del Nano <ddelnano@gmail.com>

ddelnano temporarily deployed to pr-actions-approval May 16, 2026 22:59 — with GitHub Actions Inactive

coderabbitai Bot reviewed May 16, 2026

View reviewed changes

		// TODO(ddelnano): Defend against columns that don't exist. These should be
		// ignored by the Node.

		"k8s.io/apimachinery/pkg/api/meta"
		k8serrors "k8s.io/apimachinery/pkg/api/errors"

		func (o *ObjectDeleter) getAggregatedGroupVersions() (sets.String, error) {
		out := sets.NewString()

Conversation

ddelnano commented May 16, 2026

Uh oh!

coderabbitai Bot commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented May 16, 2026 •

edited

Loading