[perf_tool] Add perf_clickhouse GitHub action to test pxl clickhouse read/export#40
[perf_tool] Add perf_clickhouse GitHub action to test pxl clickhouse read/export#40ddelnano wants to merge 6 commits into
Conversation
…read/export functionality Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
… and soc suite Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
📝 WalkthroughWalkthroughThis PR introduces a comprehensive metrics export abstraction for the perf tool, adding pluggable Parquet GCS and BigQuery exporters; implements a ClickHouse export metrics system with embedded PxL scripts and dedicated experiments; builds a full-featured browser-based dashboard with DuckDB and Plotly; improves Kubernetes resource handling with scope-aware operations and aggregated API filtering; and updates ClickHouse sink/UDTF for cluster mode and better data serialization. ChangesPerf Tool Export Abstraction and ClickHouse Integration
Kubernetes Resource Handling Enhancements
ClickHouse Data Sink and UDTF Updates
Dependency and Build Configuration Updates
🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
Comment |
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 19
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/perf_clickhouse.yaml:
- Line 133: The workflow currently interpolates `${{ vars.PERF_CLOUD_ADDR }}`
directly into the `--cloud_addr` flag which can expand to an empty value; add an
explicit pre-check step before the job/step that uses `--cloud_addr` to
hard-fail if `vars.PERF_CLOUD_ADDR` is empty (for example, a short run step that
tests `echo -n "${{ vars.PERF_CLOUD_ADDR }}"` and exits non-zero with a clear
error message when empty), or guard the step that passes `--cloud_addr` with an
`if:` condition that ensures `vars.PERF_CLOUD_ADDR != ''`, so the `--cloud_addr`
flag is never invoked with an empty value.
In `@go.mod`:
- Line 324: The go.mod replace directive forcing google.golang.org/protobuf to
v1.29.1 is a vulnerable pin; remove this replace or bump it to a patched release
(e.g., a non-vulnerable v1.30.x or later) and ensure the same protobuf version
is used consistently in any other dependency pins (e.g., Bazel/WORKSPACE or
other module overrides referencing google.golang.org/protobuf) so there are no
conflicting overrides; update the replace line (or delete it) and align all
protobuf pins to the chosen patched version.
In `@src/carnot/exec/clickhouse_export_sink_node.cc`:
- Around line 38-40: The code should defend against missing/unmapped columns by
validating column names against the target schema during initialization: in
ClickhouseExportSinkNode (e.g., implement or update a
mapColumns/InitColumnMapping routine) check each requested column with a
columnExists(schema, name) helper, skip and log a warning for any missing
columns, and ensure downstream functions that build queries or write rows (e.g.,
WriteRow/processRow/exportColumns) only access columns present in the validated
mapping; remove any direct assumptions that all input columns exist so accesses
use the validated mapping (or conditional checks) to avoid crashes.
- Around line 170-196: The code auto-deriving event_time assumes the source
column is TIME64NS but never checks its actual Arrow/PIXIE type; update the
logic around plan_node_->column_mappings() and the block that reads
rb.ColumnAt(time_col_index) to verify the column's type before calling
types::GetValueFromArrowArray<types::TIME64NS>. Specifically, inspect the Arrow
array/type (from rb.ColumnAt(...) or the mapping metadata) to confirm it is
TIME64NS (or the equivalent Arrow type id) and only perform the ns->ms
conversion and call GetValueFromArrowArray when that check passes; if the type
mismatches, handle it explicitly (log/error/skip/convert appropriately) rather
than blindly calling GetValueFromArrowArray, keeping the rest of the flow that
builds clickhouse::ColumnDateTime64 and block.AppendColumn unchanged.
In `@src/e2e_test/perf_tool/cmd/run.go`:
- Around line 359-374: The createExporter(ctx context.Context) parameter is
unused for the bq path; remove the unused ctx parameter from createExporter's
signature and all its call sites, and instead create a local context inside the
parquet-gcs branch to pass into exporter.NewParquetGCSExporter; leave
createBQExporter as-is (it continues not to need a context). Update references
to createExporter and ensure exporter.NewParquetGCSExporter is still called with
a valid local context while createBQExporter remains unchanged.
In `@src/e2e_test/perf_tool/pkg/cluster/context.go`:
- Around line 75-77: The code sets kubeconfigPath to
clientcmd.RecommendedHomeFile when empty, which ignores the actual file resolved
by loadingRules; change the logic so that when kubeconfigPath is empty you
assign loadingRules.GetDefaultFilename() instead of RecommendedHomeFile,
ensuring the resolved path used by loadingRules is what you later export (e.g.,
via AddEnv) so child commands use the correct kubeconfig; update references to
kubeconfigPath and ensure loadingRules is available in the scope where you call
GetDefaultFilename().
In `@src/e2e_test/perf_tool/pkg/exporter/bq_exporter.go`:
- Around line 87-107: ExportResults currently launches inserter.Run(bqCh)
asynchronously and returns without waiting or surfacing insertion failures;
change ExportResults to close bqCh and then wait for the inserter to finish and
return any insertion error(s). Specifically: have bq.BatchInserter expose a
completion mechanism (e.g., an ErrChan or make Run return an error/accept a
context and provide a Done/Err channel) and in BQExporter.ExportResults call
inserter.Run in a goroutine that sends its error to that channel, close(bqCh)
after the input loop, then read from the inserter's completion/error channel and
return a non-nil error if any uploadBatch failures occurred. Reference symbols:
BQExporter.ExportResults, inserter.Run, bq.BatchInserter, bqCh, uploadBatch.
In `@src/e2e_test/perf_tool/pkg/exporter/parquet_exporter.go`:
- Around line 237-267: The per-row sort in buildResultRow is causing unnecessary
overhead; since tagKeys is already sorted, remove the sort.Slice call and
instead assemble entries in alphabetical column order by merging the sorted tag
entries with the fixed columns—i.e., compute the static columns
("experiment_id","name","timestamp","value") and the tag_* columns (iterating
tagKeys and reading row.Tags) and merge them into entries in the correct
alphabetical order (no per-row sort) so the produced entries match the parquet
schema; update buildResultRow to produce this merged order and drop the
sort.Slice usage.
In `@src/e2e_test/perf_tool/pkg/metrics/recorder.go`:
- Line 69: The function handling MetricSpec currently returns nil, nil for
unsupported metric types (the return at the end of recorder.go), which hides
errors; change it to return a non-nil error (e.g., fmt.Errorf("unsupported
metric type: %q", spec.Type) or similar) instead of nil error, and update
imports if needed; reference the function that inspects MetricSpec (the
factory/constructor that returns (..., error) in recorder.go) and ensure callers
handle the error.
In `@src/e2e_test/perf_tool/pkg/run/run.go`:
- Around line 93-98: The change made getCluster and prepareWorkloads run
serially; restore their previous concurrent execution by running
r.getCluster(ctx, spec.ClusterSpec) and r.prepareWorkloads(ctx, spec) as
separate goroutines under an errgroup (e.g., errgroup.WithContext), capture
errors from each goroutine, and return g.Wait() so both run in parallel and
errors are propagated; reference r.getCluster and r.prepareWorkloads to locate
the calls and ensure you pass the same ctx/spec values into the goroutines and
handle their returned errors via the errgroup.
In `@src/e2e_test/perf_tool/pkg/suites/experiments.go`:
- Line 39: The Vizier workload usage is inconsistent: HTTPLoadTestExperiment
uses VizierReleaseWorkload() while ClickHouseExportExperiment uses
VizierWorkload(); decide whether ClickHouseExportExperiment should also use the
pre-built images or intentionally build from source and make that explicit.
Either change ClickHouseExportExperiment to call VizierReleaseWorkload() to
align usage, or add a clear inline comment next to ClickHouseExportExperiment
(or above VizierWorkload/VizierReleaseWorkload usage) explaining why it must
build from source (e.g., requires latest dev build), referencing the experiment
names ClickHouseExportExperiment and HTTPLoadTestExperiment and the functions
VizierWorkload() and VizierReleaseWorkload() so reviewers can easily locate and
verify the intent.
In `@src/e2e_test/perf_tool/pkg/suites/suites.go`:
- Line 42: The duration for nightlyExperimentSuite was changed by assigning dur
:= 5 * time.Minute (previously 40 * time.Minute); confirm whether this 8x
reduction was intentional—if not, revert dur to 40 * time.Minute in the
nightlyExperimentSuite initialization (or restore the prior constant/variable
that set 40 minutes) and add a brief comment or PR note explaining the final
chosen duration to avoid accidental debugging-shortened runs.
- Around line 86-115: The clickhouseExecSuite function currently hardcodes
ClickHouse credentials in the clickhouseDSN variable and contains commented-out
duration alternatives; replace the literal DSN with a configurable source (e.g.,
read from an environment variable or CLI flag) and fall back to a safe default
or return an error if missing, update any references to
clickhouseDSN/clickhouseTable accordingly, add a short comment near
clickhouseDSN clarifying that the value must be for a public/test instance (if
that’s intended), and remove the unused commented-out duration assignments (the
old preDur/dur lines) to eliminate dead code.
In `@src/e2e_test/perf_tool/ui/index.html`:
- Line 107: Buttons like the one with id="gcs-load-btn" (and other <button
class="btn"> elements flagged at the other locations) are missing an explicit
type and will default to type="submit"; update each of these button elements
(e.g., id="gcs-load-btn" and the other .btn buttons mentioned) to include
type="button" to prevent accidental form submission.
- Around line 392-420: The SQL strings in buildSuiteQuery and
buildExperimentQuery interpolate user-provided values (e.g., suite, workload,
experiment IDs) directly, which risks injection from malicious parquet data;
update these functions to stop direct interpolation by either (1) switching to
DuckDB parameterized/prepared queries if available, or (2) implementing a strict
sanitizer that escapes single quotes and rejects unexpected characters
(allow-list names/IDs to [A-Za-z0-9._-] or numeric IDs) before inserting into
the template; ensure the code paths that build metricSourceCTE and the other
WHERE clauses use the sanitized/parameterized values and add unit tests that
pass malicious strings to verify escaping/validation works.
In `@src/utils/shared/k8s/apply.go`:
- Around line 33-34: Reorder the import block so that
"k8s.io/apimachinery/pkg/api/meta" is grouped with the other
"k8s.io/apimachinery" imports (alongside k8serrors) to satisfy the gci linter;
update the import grouping in src/utils/shared/k8s/apply.go so both meta and
k8serrors appear together in the same k8s.io/apimachinery group.
In `@src/utils/shared/k8s/delete.go`:
- Around line 126-127: The function getAggregatedGroupVersions uses the
deprecated sets.String; change its return type and local variable to the generic
type sets.Set[string] and instantiate with sets.New[string]() (instead of
sets.NewString()). Update any uses of out.Insert / out.Has etc. as needed (they
work on sets.Set[string]) so ObjectDeleter.getAggregatedGroupVersions and its
callers compile against k8s.io/apimachinery v0.26.x.
In `@src/vizier/funcs/md_udtfs/md_udtfs_impl.h`:
- Around line 1234-1237: The SQL construction for DROP TABLE uses cluster_name_
without validation (see drop_cluster_clause and clickhouse_client_->Execute),
which allows quotes/injection; update the Init method to validate cluster_name_
(reject empty or names containing single quotes, whitespace, semicolons or other
unsafe characters, e.g., only allow alphanumeric, hyphen, underscore) and return
an error if invalid, and/or escape/quote it safely before constructing
drop_cluster_clause; ensure the same validation/escape is applied where
cluster_name_ is used elsewhere (e.g., the similar occurrence at the line
referenced around 1316).
- Around line 1315-1329: Update the explanatory comment around
on_cluster_clause/engine/create_sql to correct the ClickHouse requirement and
note the server config dependency: change the line that says "requires
ClickHouse >= 22.x" to say the auto-generation of ZooKeeper paths for
ReplicatedMergeTree() was introduced in ClickHouse 20.9 (with improvements in
21.1), and add that this auto-generation also requires the server to have
default_replica_path and default_replica_name configured; keep the rest of the
logic (setting engine to "ReplicatedMergeTree()" when cluster_name is non-empty)
unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 19e35db8-043f-4270-9d9b-dcbc49e772b2
⛔ Files ignored due to path filters (2)
go.sumis excluded by!**/*.sumsrc/e2e_test/perf_tool/experimentpb/experiment.pb.gois excluded by!**/*.pb.go
📒 Files selected for processing (34)
.github/workflows/perf_clickhouse.yamlgo.modgo_deps.bzlskaffold/skaffold_vizier.yamlsrc/carnot/exec/clickhouse_export_sink_node.ccsrc/e2e_test/perf_tool/cmd/BUILD.bazelsrc/e2e_test/perf_tool/cmd/run.gosrc/e2e_test/perf_tool/experimentpb/experiment.protosrc/e2e_test/perf_tool/pkg/cluster/context.gosrc/e2e_test/perf_tool/pkg/deploy/checks/BUILD.bazelsrc/e2e_test/perf_tool/pkg/deploy/checks/k8s_healthcheck.gosrc/e2e_test/perf_tool/pkg/deploy/steps/prerendered.gosrc/e2e_test/perf_tool/pkg/exporter/BUILD.bazelsrc/e2e_test/perf_tool/pkg/exporter/bq_exporter.gosrc/e2e_test/perf_tool/pkg/exporter/exporter.gosrc/e2e_test/perf_tool/pkg/exporter/parquet_exporter.gosrc/e2e_test/perf_tool/pkg/exporter/parquet_exporter_test.gosrc/e2e_test/perf_tool/pkg/metrics/prometheus_recorder.gosrc/e2e_test/perf_tool/pkg/metrics/recorder.gosrc/e2e_test/perf_tool/pkg/run/BUILD.bazelsrc/e2e_test/perf_tool/pkg/run/run.gosrc/e2e_test/perf_tool/pkg/suites/BUILD.bazelsrc/e2e_test/perf_tool/pkg/suites/experiments.gosrc/e2e_test/perf_tool/pkg/suites/metrics.gosrc/e2e_test/perf_tool/pkg/suites/scripts/clickhouse_export.pxlsrc/e2e_test/perf_tool/pkg/suites/suites.gosrc/e2e_test/perf_tool/pkg/suites/workloads.gosrc/e2e_test/perf_tool/ui/index.htmlsrc/e2e_test/protocol_loadtest/skaffold_client.yamlsrc/e2e_test/protocol_loadtest/skaffold_loadtest.yamlsrc/utils/shared/k8s/apply.gosrc/utils/shared/k8s/delete.gosrc/vizier/funcs/md_udtfs/md_udtfs_impl.htools/private/copybara/copy.bara.sky
| run: | | ||
| bazel run //src/e2e_test/perf_tool:perf_tool -- run \ | ||
| --api_key="${PX_API_KEY}" \ | ||
| --cloud_addr=${{ vars.PERF_CLOUD_ADDR }} |
There was a problem hiding this comment.
Fail fast when PERF_CLOUD_ADDR is unset.
Line 133 uses ${{ vars.PERF_CLOUD_ADDR }} directly; when unset, this becomes an empty flag value and can silently run against an unintended default. Add an explicit precheck and hard-fail.
Suggested fix
- name: Run clickhouse-export perf
env:
PX_API_KEY: ${{ secrets.PX_API_KEY }}
+ PERF_CLOUD_ADDR: ${{ vars.PERF_CLOUD_ADDR }}
GOOGLE_APPLICATION_CREDENTIALS: ${{ steps.gcloud-creds.outputs.gcloud-creds }}
KUBECONFIG: ${{ runner.temp }}/kubeconfig
run: |
+ test -n "${PERF_CLOUD_ADDR}" || { echo "PERF_CLOUD_ADDR is not set"; exit 1; }
bazel run //src/e2e_test/perf_tool:perf_tool -- run \
--api_key="${PX_API_KEY}" \
- --cloud_addr=${{ vars.PERF_CLOUD_ADDR }}
+ --cloud_addr="${PERF_CLOUD_ADDR}" \🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/perf_clickhouse.yaml at line 133, The workflow currently
interpolates `${{ vars.PERF_CLOUD_ADDR }}` directly into the `--cloud_addr` flag
which can expand to an empty value; add an explicit pre-check step before the
job/step that uses `--cloud_addr` to hard-fail if `vars.PERF_CLOUD_ADDR` is
empty (for example, a short run step that tests `echo -n "${{
vars.PERF_CLOUD_ADDR }}"` and exits non-zero with a clear error message when
empty), or guard the step that passes `--cloud_addr` with an `if:` condition
that ensures `vars.PERF_CLOUD_ADDR != ''`, so the `--cloud_addr` flag is never
invoked with an empty value.
| gopkg.in/yaml.v2 => gopkg.in/yaml.v2 v2.4.0 | ||
| ) | ||
|
|
||
| replace google.golang.org/protobuf => google.golang.org/protobuf v1.29.1 |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Confirm protobuf pins in go.mod =="
rg -n 'google\.golang\.org/protobuf' go.mod
echo
echo "== Query OSV for protobuf v1.29.1 advisories =="
curl -s https://api.osv.dev/v1/query \
-H 'Content-Type: application/json' \
-d '{"package":{"name":"google.golang.org/protobuf","ecosystem":"Go"},"version":"v1.29.1"}' \
| jq '.vulns[] | {id: .id, summary: .summary, severity: .database_specific.severity}'Repository: k8sstormcenter/pixie
Length of output: 582
Remove or upgrade forced protobuf version; current pin is vulnerable.
Line 324 forces google.golang.org/protobuf to v1.29.1, which contains a moderate-severity vulnerability (GHSA-8r3f-844c-mc37: infinite loop in JSON unmarshaling). This override defeats the safer indirect version at Line 282 and should be removed or bumped to a patched release consistently across Go/Bazel dependency pins.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@go.mod` at line 324, The go.mod replace directive forcing
google.golang.org/protobuf to v1.29.1 is a vulnerable pin; remove this replace
or bump it to a patched release (e.g., a non-vulnerable v1.30.x or later) and
ensure the same protobuf version is used consistently in any other dependency
pins (e.g., Bazel/WORKSPACE or other module overrides referencing
google.golang.org/protobuf) so there are no conflicting overrides; update the
replace line (or delete it) and align all protobuf pins to the chosen patched
version.
| // TODO(ddelnano): Defend against columns that don't exist. These should be | ||
| // ignored by the Node. | ||
|
|
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Address TODO: Defending against missing columns.
This TODO notes a valid defensive measure. If unmapped columns are passed, the code should handle them gracefully rather than potentially failing.
Would you like me to help implement this defensive check, or should I open an issue to track this task?
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/carnot/exec/clickhouse_export_sink_node.cc` around lines 38 - 40, The
code should defend against missing/unmapped columns by validating column names
against the target schema during initialization: in ClickhouseExportSinkNode
(e.g., implement or update a mapColumns/InitColumnMapping routine) check each
requested column with a columnExists(schema, name) helper, skip and log a
warning for any missing columns, and ensure downstream functions that build
queries or write rows (e.g., WriteRow/processRow/exportColumns) only access
columns present in the validated mapping; remove any direct assumptions that all
input columns exist so accesses use the validated mapping (or conditional
checks) to avoid crashes.
| // Auto-derive event_time from time_ if time_ is present but event_time is not. | ||
| // The ClickHouse table schema uses event_time (DateTime64(3), milliseconds) for | ||
| // partitioning and ordering, but the Pixie table has time_ (TIME64NS, nanoseconds). | ||
| bool has_time_ = false; | ||
| bool has_event_time = false; | ||
| int time_col_index = -1; | ||
| for (const auto& mapping : plan_node_->column_mappings()) { | ||
| if (mapping.clickhouse_column_name() == "time_") { | ||
| has_time_ = true; | ||
| time_col_index = mapping.input_column_index(); | ||
| } | ||
| if (mapping.clickhouse_column_name() == "event_time") { | ||
| has_event_time = true; | ||
| } | ||
| } | ||
|
|
||
| if (has_time_ && !has_event_time && time_col_index >= 0) { | ||
| auto arrow_col = rb.ColumnAt(time_col_index); | ||
| int64_t num_rows = arrow_col->length(); | ||
| auto event_time_col = std::make_shared<clickhouse::ColumnDateTime64>(3); | ||
| for (int64_t i = 0; i < num_rows; ++i) { | ||
| int64_t ns_val = types::GetValueFromArrowArray<types::TIME64NS>(arrow_col.get(), i); | ||
| // Convert nanoseconds to milliseconds for DateTime64(3) | ||
| event_time_col->Append(ns_val / 1000000LL); | ||
| } | ||
| block.AppendColumn("event_time", event_time_col); | ||
| } |
There was a problem hiding this comment.
Missing type verification before casting to TIME64NS.
The event_time derivation assumes the column mapped to "time_" is TIME64NS, but Line 177-179 only checks the ClickHouse column name, not the actual column type. If a non-TIME64NS column is mapped to "time_", GetValueFromArrowArray<types::TIME64NS> at Line 191 would produce incorrect values or potentially crash.
🛡️ Proposed fix to verify column type
bool has_time_ = false;
bool has_event_time = false;
int time_col_index = -1;
+ types::DataType time_col_type = types::DATA_TYPE_UNKNOWN;
for (const auto& mapping : plan_node_->column_mappings()) {
if (mapping.clickhouse_column_name() == "time_") {
has_time_ = true;
time_col_index = mapping.input_column_index();
+ time_col_type = mapping.column_type();
}
if (mapping.clickhouse_column_name() == "event_time") {
has_event_time = true;
}
}
- if (has_time_ && !has_event_time && time_col_index >= 0) {
+ if (has_time_ && !has_event_time && time_col_index >= 0 &&
+ time_col_type == types::TIME64NS) {
auto arrow_col = rb.ColumnAt(time_col_index);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/carnot/exec/clickhouse_export_sink_node.cc` around lines 170 - 196, The
code auto-deriving event_time assumes the source column is TIME64NS but never
checks its actual Arrow/PIXIE type; update the logic around
plan_node_->column_mappings() and the block that reads
rb.ColumnAt(time_col_index) to verify the column's type before calling
types::GetValueFromArrowArray<types::TIME64NS>. Specifically, inspect the Arrow
array/type (from rb.ColumnAt(...) or the mapping metadata) to confirm it is
TIME64NS (or the equivalent Arrow type id) and only perform the ns->ms
conversion and call GetValueFromArrowArray when that check passes; if the type
mismatches, handle it explicitly (log/error/skip/convert appropriately) rather
than blindly calling GetValueFromArrowArray, keeping the rest of the flow that
builds clickhouse::ColumnDateTime64 and block.AppendColumn unchanged.
| func createExporter(ctx context.Context) (exporter.Exporter, error) { | ||
| switch viper.GetString("export_backend") { | ||
| case "bq": | ||
| return createBQExporter() | ||
| case "parquet-gcs": | ||
| bucket := viper.GetString("gcs_bucket") | ||
| if bucket == "" { | ||
| return nil, errors.New("--gcs_bucket is required when using parquet-gcs backend") | ||
| } | ||
| prefix := viper.GetString("gcs_prefix") | ||
| batchSize := viper.GetInt("parquet_batch_size") | ||
| return exporter.NewParquetGCSExporter(ctx, bucket, prefix, batchSize) | ||
| default: | ||
| return nil, fmt.Errorf("unknown export backend: %s", viper.GetString("export_backend")) | ||
| } | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | 💤 Low value
Context parameter is unused in createExporter.
The ctx parameter is passed but only used for parquet-gcs. For bq, the context is not passed to createBQExporter. This is fine for now since BigQuery table creation doesn't require context, but consider removing the parameter or documenting why it's needed only for certain backends.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/e2e_test/perf_tool/cmd/run.go` around lines 359 - 374, The
createExporter(ctx context.Context) parameter is unused for the bq path; remove
the unused ctx parameter from createExporter's signature and all its call sites,
and instead create a local context inside the parquet-gcs branch to pass into
exporter.NewParquetGCSExporter; leave createBQExporter as-is (it continues not
to need a context). Update references to createExporter and ensure
exporter.NewParquetGCSExporter is still called with a valid local context while
createBQExporter remains unchanged.
| function buildSuiteQuery(metric, filters) { | ||
| const { suite, workload, numCommits, everyN } = filters; | ||
|
|
||
| // Build metric source CTE based on whether it's computed | ||
| let metricSourceCTE; | ||
| if (metric.computed && metric.metricName === 'heap_ex_table') { | ||
| metricSourceCTE = ` | ||
| metric_source AS ( | ||
| SELECT | ||
| h.experiment_id, | ||
| h.timestamp, | ||
| h.value - COALESCE(t.value, 0) as metric_value, | ||
| h.tag_pod | ||
| FROM results h | ||
| LEFT JOIN results t | ||
| ON h.experiment_id = t.experiment_id | ||
| AND h.timestamp = t.timestamp | ||
| AND COALESCE(h.tag_pod, '') = COALESCE(t.tag_pod, '') | ||
| AND t.name = 'table_size' | ||
| WHERE h.name = 'heap_size_bytes' | ||
| )`; | ||
| } else { | ||
| metricSourceCTE = ` | ||
| metric_source AS ( | ||
| SELECT experiment_id, timestamp, value as metric_value, tag_pod | ||
| FROM results | ||
| WHERE name = '${metric.metricName}' | ||
| )`; | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | ⚖️ Poor tradeoff
SQL queries use template strings with user-provided filter values.
The buildSuiteQuery and buildExperimentQuery functions interpolate filter values (suite, workload names, experiment IDs) directly into SQL strings. While DuckDB runs client-side in WASM (reducing server-side injection risk), malicious parquet data could contain crafted strings that break queries or cause unexpected behavior.
Consider using parameterized queries if DuckDB WASM supports them, or sanitizing/escaping values.
Also applies to: 568-602
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/e2e_test/perf_tool/ui/index.html` around lines 392 - 420, The SQL strings
in buildSuiteQuery and buildExperimentQuery interpolate user-provided values
(e.g., suite, workload, experiment IDs) directly, which risks injection from
malicious parquet data; update these functions to stop direct interpolation by
either (1) switching to DuckDB parameterized/prepared queries if available, or
(2) implementing a strict sanitizer that escapes single quotes and rejects
unexpected characters (allow-list names/IDs to [A-Za-z0-9._-] or numeric IDs)
before inserting into the template; ensure the code paths that build
metricSourceCTE and the other WHERE clauses use the sanitized/parameterized
values and add unit tests that pass malicious strings to verify
escaping/validation works.
| "k8s.io/apimachinery/pkg/api/meta" | ||
| k8serrors "k8s.io/apimachinery/pkg/api/errors" |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | 💤 Low value
Import ordering flagged by linter.
The gci linter reports that imports are not properly formatted. The meta import should be grouped with other k8s.io/apimachinery imports.
🔧 Suggested fix
- "k8s.io/apimachinery/pkg/api/meta"
k8serrors "k8s.io/apimachinery/pkg/api/errors"
+ "k8s.io/apimachinery/pkg/api/meta"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "k8s.io/apimachinery/pkg/api/meta" | |
| k8serrors "k8s.io/apimachinery/pkg/api/errors" | |
| k8serrors "k8s.io/apimachinery/pkg/api/errors" | |
| "k8s.io/apimachinery/pkg/api/meta" | |
| metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" |
🧰 Tools
🪛 golangci-lint (2.12.2)
[error] 33-33: File is not properly formatted
(gci)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/utils/shared/k8s/apply.go` around lines 33 - 34, Reorder the import block
so that "k8s.io/apimachinery/pkg/api/meta" is grouped with the other
"k8s.io/apimachinery" imports (alongside k8serrors) to satisfy the gci linter;
update the import grouping in src/utils/shared/k8s/apply.go so both meta and
k8serrors appear together in the same k8s.io/apimachinery group.
| func (o *ObjectDeleter) getAggregatedGroupVersions() (sets.String, error) { | ||
| out := sets.NewString() |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
🧩 Analysis chain
🌐 Web query:
What version of k8s.io/apimachinery introduced sets.Set generic type?
💡 Result:
The generic sets.Set[T] type in k8s.io/apimachinery/pkg/util/sets was introduced in version v0.26.0 [1]. This generic implementation is intended to replace the older, type-specific set implementations (such as sets.String or sets.Int), which are now deprecated [1][2].
Citations:
- 1: https://pkg.go.dev/k8s.io/apimachinery/pkg/util/sets
- 2: https://github.com/kubernetes/apimachinery/blob/master/pkg/util/sets/string.go
🏁 Script executed:
fd -t f go.mod | head -5 && cat go.mod | grep -E 'k8s.io/apimachinery|go\s+'Repository: k8sstormcenter/pixie
Length of output: 849
🏁 Script executed:
cat -n src/utils/shared/k8s/delete.go | sed -n '120,135p'Repository: k8sstormcenter/pixie
Length of output: 895
🏁 Script executed:
rg 'sets\.String|sets\.New' src/utils/shared/k8s/delete.go -nRepository: k8sstormcenter/pixie
Length of output: 237
Replace deprecated sets.String with generic sets.Set[string].
The sets.String type is deprecated and the project uses k8s.io/apimachinery v0.26.2 which supports the generic sets.Set[T] type (available since v0.26.0).
♻️ Proposed fix
-func (o *ObjectDeleter) getAggregatedGroupVersions() (sets.String, error) {
- out := sets.NewString()
+func (o *ObjectDeleter) getAggregatedGroupVersions() (sets.Set[string], error) {
+ out := sets.New[string]()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func (o *ObjectDeleter) getAggregatedGroupVersions() (sets.String, error) { | |
| out := sets.NewString() | |
| func (o *ObjectDeleter) getAggregatedGroupVersions() (sets.Set[string], error) { | |
| out := sets.New[string]() |
🧰 Tools
🪛 golangci-lint (2.12.2)
[error] 126-126: SA1019: sets.String is deprecated: use generic Set instead. new ways: s1 := Set[string]{} s2 := Newstring
(staticcheck)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/utils/shared/k8s/delete.go` around lines 126 - 127, The function
getAggregatedGroupVersions uses the deprecated sets.String; change its return
type and local variable to the generic type sets.Set[string] and instantiate
with sets.New[string]() (instead of sets.NewString()). Update any uses of
out.Insert / out.Has etc. as needed (they work on sets.Set[string]) so
ObjectDeleter.getAggregatedGroupVersions and its callers compile against
k8s.io/apimachinery v0.26.x.
| std::string drop_cluster_clause = | ||
| cluster_name_.empty() ? "" : absl::Substitute(" ON CLUSTER '$0'", cluster_name_); | ||
| clickhouse_client_->Execute( | ||
| absl::Substitute("DROP TABLE IF EXISTS $0$1", table_name, drop_cluster_clause)); |
There was a problem hiding this comment.
SQL injection risk: cluster_name is not sanitized.
If cluster_name contains a single quote (e.g., test'cluster), the SQL will break or could be exploited. The same pattern appears at Line 1316.
🔒 Proposed fix to validate cluster_name
Add validation in Init to reject invalid cluster names:
cluster_name_ = std::string(cluster_name);
+
+ // Validate cluster_name to prevent SQL injection
+ if (!cluster_name_.empty()) {
+ for (char c : cluster_name_) {
+ if (!std::isalnum(c) && c != '_' && c != '-') {
+ return error::InvalidArgument(
+ "cluster_name contains invalid characters: only alphanumeric, underscore, and hyphen allowed");
+ }
+ }
+ }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/vizier/funcs/md_udtfs/md_udtfs_impl.h` around lines 1234 - 1237, The SQL
construction for DROP TABLE uses cluster_name_ without validation (see
drop_cluster_clause and clickhouse_client_->Execute), which allows
quotes/injection; update the Init method to validate cluster_name_ (reject empty
or names containing single quotes, whitespace, semicolons or other unsafe
characters, e.g., only allow alphanumeric, hyphen, underscore) and return an
error if invalid, and/or escape/quote it safely before constructing
drop_cluster_clause; ensure the same validation/escape is applied where
cluster_name_ is used elsewhere (e.g., the similar occurrence at the line
referenced around 1316).
| std::string on_cluster_clause = | ||
| cluster_name.empty() ? "" : absl::Substitute(" ON CLUSTER '$0'", cluster_name); | ||
| // Use ReplicatedMergeTree when cluster mode is enabled so that data is | ||
| // replicated across nodes. ClickHouse auto-generates the ZooKeeper paths | ||
| // when no explicit arguments are provided (requires ClickHouse >= 22.x). | ||
| std::string engine = cluster_name.empty() ? "MergeTree()" : "ReplicatedMergeTree()"; | ||
| std::string create_sql = absl::Substitute(R"( | ||
| CREATE TABLE $0$1 ( | ||
| $2 | ||
| ) ENGINE = MergeTree() | ||
| CREATE TABLE $0$1$2 ( | ||
| $3 | ||
| ) ENGINE = $4 | ||
| PARTITION BY toYYYYMM(event_time) | ||
| ORDER BY (hostname, event_time) | ||
| )", | ||
| if_not_exists_clause, table_name, columns_str); | ||
| if_not_exists_clause, table_name, on_cluster_clause, | ||
| columns_str, engine); |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
What ClickHouse version introduced ReplicatedMergeTree without explicit ZooKeeper path arguments?
💡 Result:
The ability to omit explicit ZooKeeper path and replica name arguments when creating a ReplicatedMergeTree table was introduced in ClickHouse version 20.9 [1]. This feature was implemented via pull request #14791, which allowed users to define default values for these parameters in the server configuration file using the default_replica_path and default_replica_name settings [1]. When these settings are configured, the ReplicatedMergeTree engine can be instantiated without providing the zoo_path and replica_name arguments in the CREATE TABLE statement [2][1]. A subsequent improvement was made in version 21.1 (via pull request #21060) to ensure these default settings were correctly applied even when the engine required other parameters (such as in ReplicatedReplacingMergeTree) [3][4].
Citations:
- 1: Support default arguments for Replicated engine ClickHouse/ClickHouse#14791
- 2: https://github.com/ClickHouse/ClickHouse/blob/master/docs/en/engines/table-engines/mergetree-family/replication.md
- 3: fix default_replica_path and default_replica_name values are useless … ClickHouse/ClickHouse#21060
- 4: fix default_replica_path and default_replica_name values are useless … ClickHouse/ClickHouse#21060
Correct the ClickHouse version requirement and document server configuration dependency.
The code comment incorrectly states ReplicatedMergeTree() without explicit ZooKeeper paths requires ClickHouse >= 22.x. Verification shows this feature was actually introduced in ClickHouse 20.9 (with improvements in 21.1). Additionally, the comment omits that this feature requires server configuration settings default_replica_path and default_replica_name to be defined. Update the comment to accurately reflect the version requirement and highlight the configuration dependency.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/vizier/funcs/md_udtfs/md_udtfs_impl.h` around lines 1315 - 1329, Update
the explanatory comment around on_cluster_clause/engine/create_sql to correct
the ClickHouse requirement and note the server config dependency: change the
line that says "requires ClickHouse >= 22.x" to say the auto-generation of
ZooKeeper paths for ReplicatedMergeTree() was introduced in ClickHouse 20.9
(with improvements in 21.1), and add that this auto-generation also requires the
server to have default_replica_path and default_replica_name configured; keep
the rest of the logic (setting engine to "ReplicatedMergeTree()" when
cluster_name is non-empty) unchanged.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
.github/workflows/perf_clickhouse.yaml (1)
133-133:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFail fast when
PERF_CLOUD_ADDRis unset.This flag still uses
${{ vars.PERF_CLOUD_ADDR }}directly; when unset, it expands to--cloud_addr=and may silently proceed with an unintended default or cause runtime errors. Add an explicit validation step that hard-fails if the variable is empty.🛡️ Recommended fix
- name: Run clickhouse-export perf env: PX_API_KEY: ${{ secrets.PX_API_KEY }} + PERF_CLOUD_ADDR: ${{ vars.PERF_CLOUD_ADDR }} GOOGLE_APPLICATION_CREDENTIALS: ${{ steps.gcloud-creds.outputs.gcloud-creds }} KUBECONFIG: ${{ runner.temp }}/kubeconfig run: | + test -n "${PERF_CLOUD_ADDR}" || { echo "PERF_CLOUD_ADDR is not set"; exit 1; } bazel run //src/e2e_test/perf_tool:perf_tool -- run \ --api_key="${PX_API_KEY}" \ - --cloud_addr=${{ vars.PERF_CLOUD_ADDR }} + --cloud_addr="${PERF_CLOUD_ADDR}" \🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/perf_clickhouse.yaml at line 133, Add a pre-flight validation step before the command that uses the --cloud_addr flag to fail fast when PERF_CLOUD_ADDR is empty: check the GitHub Actions variable PERF_CLOUD_ADDR (e.g., run: if [ -z "${{ vars.PERF_CLOUD_ADDR }}" ]; then echo "ERROR: PERF_CLOUD_ADDR is not set" >&2; exit 1; fi) and only proceed to invoke the command with --cloud_addr=${{ vars.PERF_CLOUD_ADDR }} if the check passes; ensure the error message clearly names PERF_CLOUD_ADDR and exits non‑zero so the workflow fails immediately.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In @.github/workflows/perf_clickhouse.yaml:
- Line 133: Add a pre-flight validation step before the command that uses the
--cloud_addr flag to fail fast when PERF_CLOUD_ADDR is empty: check the GitHub
Actions variable PERF_CLOUD_ADDR (e.g., run: if [ -z "${{ vars.PERF_CLOUD_ADDR
}}" ]; then echo "ERROR: PERF_CLOUD_ADDR is not set" >&2; exit 1; fi) and only
proceed to invoke the command with --cloud_addr=${{ vars.PERF_CLOUD_ADDR }} if
the check passes; ensure the error message clearly names PERF_CLOUD_ADDR and
exits non‑zero so the workflow fails immediately.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 8d86c57f-4342-4bd8-a987-ddb126b8dda8
📒 Files selected for processing (1)
.github/workflows/perf_clickhouse.yaml
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
There was a problem hiding this comment.
♻️ Duplicate comments (1)
.github/workflows/perf_clickhouse.yaml (1)
133-133:⚠️ Potential issue | 🟠 MajorFail fast when
PERF_CLOUD_ADDRis unset.This issue was already flagged in a previous review. The
--cloud_addrflag uses${{ vars.PERF_CLOUD_ADDR }}directly without validation or quoting. If the variable is unset, this becomes an empty value and may silently connect to an unintended default endpoint.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/perf_clickhouse.yaml at line 133, The workflow currently passes --cloud_addr=${{ vars.PERF_CLOUD_ADDR }} without validating PERF_CLOUD_ADDR; add a pre-flight validation step that checks PERF_CLOUD_ADDR is non-empty and fails the job (exit non‑zero) if unset, and quote the flag when used (e.g. --cloud_addr="${{ vars.PERF_CLOUD_ADDR }}") to avoid silent defaults; target the step that emits the --cloud_addr flag and add the validation before it so the job fails fast when PERF_CLOUD_ADDR is not provided.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In @.github/workflows/perf_clickhouse.yaml:
- Line 133: The workflow currently passes --cloud_addr=${{ vars.PERF_CLOUD_ADDR
}} without validating PERF_CLOUD_ADDR; add a pre-flight validation step that
checks PERF_CLOUD_ADDR is non-empty and fails the job (exit non‑zero) if unset,
and quote the flag when used (e.g. --cloud_addr="${{ vars.PERF_CLOUD_ADDR }}")
to avoid silent defaults; target the step that emits the --cloud_addr flag and
add the validation before it so the job fails fast when PERF_CLOUD_ADDR is not
provided.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: d14f16f8-cbda-4200-a5dc-59bf7f65cc70
📒 Files selected for processing (1)
.github/workflows/perf_clickhouse.yaml
Summary: [perf_tool] Add perf_clickhouse GitHub action to test pxl clickhouse read/export
Relevant Issues: N/A
Type of change: /kind perf
Test Plan: Testing on #31. I will also run the perf test an additional time to verify it works.