Add Pixie cloud IaC terraform and skaffold/kustomize manifests to deploy cockpit Cloud#41
Add Pixie cloud IaC terraform and skaffold/kustomize manifests to deploy cockpit Cloud#41ddelnano wants to merge 12 commits into
Conversation
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
📝 WalkthroughWalkthroughThis PR introduces a complete cloud infrastructure provisioning stack combining Terraform (Auth0, certificate management, Kubernetes secrets), Kubernetes manifests (service routing, private cockpit deployment), GitHub Actions runner infrastructure via Helm, and build tooling via Skaffold. Infrastructure is encrypted via SOPS and gitignored appropriately. ChangesCloud Infrastructure and Deployment
🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 25
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.gitignore:
- Around line 84-85: The .gitignore entries "terraform/**/.terraform.tfstate"
and "terraform/**/.terraform.tfstate*" are incorrect because local state files
are named "terraform.tfstate" (no leading dot); update the ignore patterns that
reference these symbols (the two terraform/**/.terraform.tfstate* lines) to
ignore "terraform.tfstate" and any suffixes (e.g., "terraform.tfstate*") under
terraform/** and at repo root so local state files (terraform.tfstate,
terraform.tfstate.backup, etc.) are not accidentally committed; find and replace
the incorrect ".terraform.tfstate" patterns with "terraform.tfstate" equivalents
and add a root-level "terraform.tfstate*" pattern if missing.
In `@k8s/cloud/overlays/exposed_services_traefik/api_service.yaml`:
- Line 7: The traefik ServersTransport annotation currently points to the wrong
resource name; update the annotation key
traefik.ingress.kubernetes.io/service.serverstransport to reference
cloud-backend-transport@kubernetescrd (the actual ServersTransport resource name
in the plc namespace) instead of plc-cloud-backend-transport@kubernetescrd so
Traefik can locate the transport configuration for the service.
In `@k8s/cloud/overlays/exposed_services_traefik/proxy_service.yaml`:
- Line 19: Update the port name value that contains the typo: find the YAML
resource where the name field is set to "tcp-grcp-web" and change it to
"tcp-grpc-web" so the port name correctly reads "tcp-grpc-web".
- Around line 12-23: Add an inline comment in the proxy service port section
explaining why port 443 and port 5555 both map to targetPort 56000 (the HTTP/2
backend) so intent is clear; locate the port entries with name: tcp-https (port:
443, targetPort: 56000) and name: tcp-http2 (port: 5555, targetPort: 56000) and
add a brief note describing whether this is for protocol multiplexing, client
compatibility/fallback, or separate routing semantics so future readers know the
design decision.
In `@k8s/cloud/overlays/exposed_services_traefik/servers_transport.yaml`:
- Line 8: The servers transport configuration currently has insecureSkipVerify:
true which disables TLS verification; change this to secure validation by
removing or setting insecureSkipVerify to false and configure Traefik to trust
your CA instead: add a rootCAs entry pointing to a Kubernetes secret or CA
bundle (use the serversTransport block and its rootCAs option) so Traefik
verifies backend certificates, or install proper CA-signed certs for the
backends and stop using insecureSkipVerify entirely.
- Line 5: The ServersTransport resource currently named
"cloud-backend-transport" does not match the service annotations which reference
"plc-cloud-backend-transport@kubernetescrd"; update the ServersTransport name to
"plc-cloud-backend-transport" (or alternatively update all service annotations
to remove the "plc-" prefix) so the Traefik annotation and the ServersTransport
resource name match, ensuring Traefik will apply the transport config.
In `@k8s/devinfra/arc-controller/helm/README.md`:
- Around line 1-3: Change the top-level heading to an H1 on the first line
(e.g., "# install instructions") and add fenced code block languages for the
code blocks starting on lines referenced (for example add "```bash" instead of
"```") so both the block at Line 3 and the block around Line 23 include a
language specifier to satisfy markdownlint; update the README.md heading and the
two fenced code blocks accordingly.
- Around line 34-45: The README currently embeds the StorageClass YAML for
"runner-hostpath" but does not instruct users to apply it before installing the
Helm chart, causing PVCs that rely on values.yaml's storageClassName:
runner-hostpath to remain Pending; update the README to add an explicit
pre-install step that runs kubectl apply of the StorageClass manifest (for the
storage class named runner-hostpath with BasePath /mnt/runner-storage) before
running helm install so the storage class exists and volumeBindingMode:
WaitForFirstConsumer can bind PVCs.
In `@k8s/devinfra/arc-controller/helm/values.yaml`:
- Line 5: Replace floating image tags with pinned version tags or SHA-256
digests: locate the image references "ghcr.io/actions/actions-runner:latest" and
"docker:dind" in the helm values (the three image entries referenced in the
diff) and update them to explicit versions or digests (e.g.,
ghcr.io/actions/actions-runner:2.322.0 or its SHA-256 digest, and
docker:27.5.1-dind or its digest); ensure all three occurrences are updated
consistently and use immutable tags to prevent non-deterministic updates.
- Line 21: Add documentation and a version constraint explaining that using
initContainers with restartPolicy: Always (the sidecar behavior set in the
values.yaml entry `restartPolicy: Always`) requires Kubernetes 1.28+ (alpha) or
1.29+ (stable); update the Helm chart docs (and Chart.yaml `kubeVersion` if
appropriate) to state this minimum Kubernetes version and warn that clusters
<1.28 will reject the Pod spec so operators know the prerequisite.
In `@private/cockpit/artifact_config.yaml`:
- Around line 8-9: Add a short comment in this YAML stating that
PL_ARTIFACT_BUCKET and PL_SA_KEY_PATH are environment-specific placeholders that
must be populated for real deployments (e.g., by overlay or k8s/base values),
and also update or add a one-line note in the repo README or deployment docs
referring to these keys; reference the exact variables PL_ARTIFACT_BUCKET and
PL_SA_KEY_PATH so maintainers know which values to set and where to look for
concrete examples (see k8s/cloud/base/artifact_config.yaml).
In `@private/cockpit/auth_deployment.yaml`:
- Around line 14-24: PL_AUTH0_CLIENT_ID and PL_AUTH0_CLIENT_SECRET have
inconsistent optional settings which makes one credential potentially optional
while the other is required; make them consistent by either adding optional:
true to the PL_AUTH0_CLIENT_SECRET secretKeyRef (secret name
cloud-auth0-secrets, key auth0-client-secret) if Auth0 is optional, or remove
optional: true from PL_AUTH0_CLIENT_ID so both are required if Auth0 is
mandatory—update the corresponding secretKeyRef entries for PL_AUTH0_CLIENT_ID
and PL_AUTH0_CLIENT_SECRET accordingly.
In `@private/cockpit/plugin_db_updater_job.yaml`:
- Around line 22-26: The Postgres readiness loop currently uses an unbounded
`until pg_isready -h ${PL_POSTGRES_HOSTNAME} -p ${PL_POSTGRES_PORT}` which can
hang forever; change the job command to bound the wait by introducing a max-wait
(e.g. PL_POSTGRES_WAIT_SECONDS or PL_POSTGRES_MAX_RETRIES) and implement a loop
that increments a counter (or tracks elapsed time) and exits with a non-zero
status when the timeout is reached; update the command string that contains
`pg_isready` so it retries until success or until the counter/timeout is
exceeded and then prints an error and `exit 1` to fail the job instead of
hanging.
- Line 32: Replace the mutable image tag "image: cloud-plugin-load_db:latest"
with an immutable digest reference (e.g.
"cloud-plugin-load_db@sha256:<actual-digest>") so the updater job always pulls
the exact image built for the migration; obtain the correct sha256 digest from
your image registry for the intended build and update the image value
accordingly, and ensure imagePullPolicy is set appropriately (e.g., IfNotPresent
or Always per your CI/CD policy) to avoid pulling a different artifact at
runtime.
In `@private/cockpit/proxy_envoy.yaml`:
- Around line 87-96: The TLS upstream config (transport_socket / typed_config
using
type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
with common_tls_context and tls_certificates) lacks a
validation_context/trusted_ca, so Envoy will skip verifying the upstream server
certificate; add a validation_context (or common_tls_context/validation_context)
with trusted_ca pointing to the CA bundle (filename to the CA file) or inline
certificate to enable upstream server certificate validation and optional fields
like verify_subject_alt_name as needed.
In `@private/skaffold_cloud.yaml`:
- Around line 70-71: The skaffold tagPolicy currently uses dateTime which
produces non-reproducible, timestamp-based image tags; change tagPolicy in the
skaffold config from dateTime to a reproducible policy (e.g., gitCommit or
sha256) by replacing the dateTime block with a gitCommit or sha256 block, or add
logic to conditionally select dateTime only for ephemeral/private deployments;
update the tagPolicy key (tagPolicy: dateTime -> tagPolicy: gitCommit or
tagPolicy: sha256) and ensure any CI/build tooling that consumes image tags is
adjusted to use the new commit/sha identifier.
- Around line 2-5: The YAML anchor `.common_bazel_args` is defined but not
referenced in this file; remove the unused anchor definition (the block starting
with `.common_bazel_args: &common_bazel_args` and its associated list items) so
the config no longer contains an unused anchor; if other environments need these
args, instead reference the existing anchor where required (ensure callers use
`*common_bazel_args`) or keep the canonical definition only in the file that
actually references it.
In `@terraform/kubernetes/auth0/generated.tf`:
- Around line 1749-1756: The refresh_token block currently uses unsafe settings
(expiration_type = "non-expiring" and rotation_type = "non-rotating"); update
the refresh_token configuration to use expiring and rotating tokens by changing
expiration_type to "expiring" and rotation_type to "rotating", and add/adjust
absolute_lifetime and idle_lifetime (and any related fields such as
token_lifetime or idle_token_lifetime) to enforce appropriate expiration and
rotation windows per OAuth 2.0 BCP; ensure the changes are made in the
refresh_token block so the module will create rotating, time-limited refresh
tokens.
- Around line 33-42: The Username-Password-Authentication connection resource
(auth0_connection.username_password_authentication) currently allows
password-only logins because Guardian is set to "never" and all MFA factors are
off; update the Auth0 configuration so that either this connection is disabled
if password DB auth is not wanted, or better, require MFA for it by changing the
Guardian policy from "never" to an enforcing policy (e.g., "always" or your
org's minimum) and enable at least one strong factor (set
email/otp/recovery_code/duo/phone/push to true as appropriate) in the Auth0
Guardian settings that apply to this connection to prevent password-only
authentication.
- Around line 467-476: The auth0_client_grant resource
auth0_client_grant.akm0sumjg95tcrlp390pxzjozkkyndt_https_k8sstormcenter_eu_auth0_com_api_v2
is granting far more Management API scopes than needed; edit the scopes list on
that resource to remove all unused permissions and limit it to the
least-privilege scope required (keep "read:users" only), removing the listed
create/update/delete scopes for users, users_app_metadata, user_custom_blocks,
user_tickets, user_attribute_profiles and the role/permission-related scopes so
the grant only contains the minimal "read:users" permission.
In `@terraform/kubernetes/auth0/variables.tf`:
- Line 1: The variable "cloud_domain" currently has no type or validation;
update the variable declaration for cloud_domain to declare type = string and
add a validation block that enforces a sane domain format (e.g., no scheme, no
trailing slash, and matches a domain/hostname regex or at least contains a dot
and allowed characters) with a clear error message; locate the variable
"cloud_domain" in variables.tf and add the type and validation blocks to fail
fast when an invalid domain is provided.
In `@terraform/kubernetes/cloud_deps/base_cloud_deps.tf`:
- Around line 2-3: Replace cwd-relative path strings in the kustomization_build
blocks with module-relative interpolation using ${path.module}; specifically, in
base_cloud_deps.tf and env_cloud_deps.tf update assignments like path =
"../../../k8s/cloud_deps/base/elastic/operator" to use path =
"${path.module}/k8s/cloud_deps/base/elastic/operator" (and the corresponding env
path) so Terraform resolves paths relative to the module rather than the current
working directory; locate the path = ... entries in the kustomization_build
blocks to make the change.
In `@terraform/kubernetes/cloud_deps/env_cloud_deps.tf`:
- Around line 2-3: Replace the cwd-relative kustomization path in the "path"
attribute inside the kustomization_build blocks of env_cloud_deps.tf and
base_cloud_deps.tf with an anchored path using Terraform's path.module;
specifically, update the current hardcoded "../../../k8s/cloud_deps/public"
value so it is constructed by joining path.module with the same relative segment
(i.e., prepend path.module to the existing relative tail) so CI/automation runs
resolve the kustomization locations reliably.
In `@terraform/kubernetes/cloud_deps/provider.tf`:
- Around line 18-26: The Kubernetes and Kustomization provider blocks currently
hardcode config_path and context values (provider "kubernetes": config_path =
"~/.kube/cockpick-config", config_context = "default"; provider "kustomization":
kubeconfig_path/kubeconfig context), which breaks CI and other dev environments;
introduce two variables (e.g., variable "kubeconfig_path" and variable
"kubeconfig_context" in variables.tf) and replace the hardcoded values in the
provider "kubernetes" and provider "kustomization" blocks to reference
var.kubeconfig_path and var.kubeconfig_context (with sensible defaults like
"~/.kube/config" and "default") so pipelines and teammates can override them via
CLI or environment.
In `@terraform/kubernetes/cloud_deps/variables.tf`:
- Around line 4-20: Remove production defaults from the sensitive Terraform
input variables (project_environment, cluster_env, cluster_name, namespace,
cloud_domain) so users must supply explicit values via tfvars or workspace
variables; specifically, delete the default = "..." lines for those variable
blocks (or change to no default) and optionally add a simple validation block in
each variable (or a single shared validation) to ensure the value is non-empty
and not "prod"/the production domain to prevent accidental production applies,
referencing the variable names project_environment, cluster_env, cluster_name,
namespace, and cloud_domain when making the changes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: c3880a8a-6551-437f-b3d7-a319941b696f
⛔ Files ignored due to path filters (1)
terraform/kubernetes/cloud_deps/terraform.tfstateis excluded by!**/*.tfstate
📒 Files selected for processing (35)
.gitignore.sops.yamlk8s/cloud/overlays/exposed_services_traefik/api_service.yamlk8s/cloud/overlays/exposed_services_traefik/cloud_ingress.yamlk8s/cloud/overlays/exposed_services_traefik/kustomization.yamlk8s/cloud/overlays/exposed_services_traefik/proxy_service.yamlk8s/cloud/overlays/exposed_services_traefik/servers_transport.yamlk8s/cloud/overlays/exposed_services_traefik/vzconn_service.yamlk8s/devinfra/arc-controller/helm/README.mdk8s/devinfra/arc-controller/helm/values.yamlprivate/cockpit/artifact_config.yamlprivate/cockpit/auth_deployment.yamlprivate/cockpit/cloud_ingress.yamlprivate/cockpit/domain_config.yamlprivate/cockpit/kustomization.yamlprivate/cockpit/plugin_db_updater_job.yamlprivate/cockpit/proxy_envoy.yamlprivate/cockpit/script_bundles_config.yamlprivate/cockpit/servers_transport.yamlprivate/skaffold_cloud.yamlskaffold/skaffold_cloud.yamlterraform/credentials/cockpit/auth0_config.yamlterraform/kubernetes/auth0/.terraform.lock.hclterraform/kubernetes/auth0/auth0_import.tfterraform/kubernetes/auth0/auth0_main.tfterraform/kubernetes/auth0/generated.tfterraform/kubernetes/auth0/variables.tfterraform/kubernetes/cloud_deps/.terraform.lock.hclterraform/kubernetes/cloud_deps/base_cloud_deps.tfterraform/kubernetes/cloud_deps/cert_manager.tfterraform/kubernetes/cloud_deps/core_resource_deps.tfterraform/kubernetes/cloud_deps/env_cloud_deps.tfterraform/kubernetes/cloud_deps/provider.tfterraform/kubernetes/cloud_deps/variables.tfterraform/kubernetes/scripts/create_random_bytes.sh
| terraform/**/.terraform.tfstate | ||
| terraform/**/.terraform.tfstate* |
There was a problem hiding this comment.
Fix Terraform state ignore patterns (current ones miss actual state filenames).
Lines 84-85 ignore .terraform.tfstate*, but local state files are terraform.tfstate* (no leading dot). This can leak state containing secrets.
Proposed fix
terraform/**/.terraform
-terraform/**/.terraform.tfstate
-terraform/**/.terraform.tfstate*
+terraform/**/terraform.tfstate
+terraform/**/terraform.tfstate.*
+terraform/**/terraform.tfstate.d/📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| terraform/**/.terraform.tfstate | |
| terraform/**/.terraform.tfstate* | |
| terraform/**/.terraform | |
| terraform/**/terraform.tfstate | |
| terraform/**/terraform.tfstate.* | |
| terraform/**/terraform.tfstate.d/ |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.gitignore around lines 84 - 85, The .gitignore entries
"terraform/**/.terraform.tfstate" and "terraform/**/.terraform.tfstate*" are
incorrect because local state files are named "terraform.tfstate" (no leading
dot); update the ignore patterns that reference these symbols (the two
terraform/**/.terraform.tfstate* lines) to ignore "terraform.tfstate" and any
suffixes (e.g., "terraform.tfstate*") under terraform/** and at repo root so
local state files (terraform.tfstate, terraform.tfstate.backup, etc.) are not
accidentally committed; find and replace the incorrect ".terraform.tfstate"
patterns with "terraform.tfstate" equivalents and add a root-level
"terraform.tfstate*" pattern if missing.
| metadata: | ||
| name: api-service | ||
| annotations: | ||
| traefik.ingress.kubernetes.io/service.serverstransport: plc-cloud-backend-transport@kubernetescrd |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Check ServersTransport resource names and annotation references
# Find all ServersTransport resources
echo "=== ServersTransport resources ==="
rg -n "kind: ServersTransport" -A 5 --type yaml
# Find all serverstransport annotation references
echo -e "\n=== ServersTransport annotation references ==="
rg -n "service.serverstransport:" --type yamlRepository: k8sstormcenter/pixie
Length of output: 1719
🏁 Script executed:
# Get full content of servers_transport.yaml in the overlay
cat k8s/cloud/overlays/exposed_services_traefik/servers_transport.yaml
# Check the kustomization.yaml for namespace context
cat k8s/cloud/overlays/exposed_services_traefik/kustomization.yamlRepository: k8sstormcenter/pixie
Length of output: 479
🏁 Script executed:
# Check for namespace declarations in the overlay directory
echo "=== Checking for namespace context in overlay ==="
find k8s/cloud/overlays/exposed_services_traefik -type f -name "*.yaml" | xargs grep -l "namespace:" || echo "No namespace found"
# Check parent kustomization files for namespace patches
echo -e "\n=== Checking kustomization.yaml in parent directory ==="
cat k8s/cloud/overlays/kustomization.yaml 2>/dev/null || cat k8s/cloud/kustomization.yaml 2>/dev/null || echo "Parent kustomization not found"
# Check if there's a namespace patch or declaration
echo -e "\n=== Searching for namespace in exposed_services_traefik directory ==="
rg "namespace:" k8s/cloud/overlays/exposed_services_traefik/ --type yamlRepository: k8sstormcenter/pixie
Length of output: 346
🏁 Script executed:
# Check if there's a namespace declaration in the overlay's kustomization that specifies namespace
echo "=== Full kustomization.yaml content ==="
cat k8s/cloud/overlays/exposed_services_traefik/kustomization.yaml
# Check for any namespace field or commonLabels/commonAnnotations
echo -e "\n=== Searching for namespace declarations across overlay hierarchy ==="
rg "^namespace:" k8s/cloud/overlays/ --type yaml
# Check if overlay is applied to a specific namespace via parent kustomization
echo -e "\n=== Checking parent kustomizations for namespace context ==="
find k8s/cloud -name "kustomization.yaml" -type f | head -5 | xargs grep -l "namespace" || echo "No namespace declarations found"Repository: k8sstormcenter/pixie
Length of output: 596
🏁 Script executed:
# Check the parent kustomizations that have namespace declarations
echo "=== k8s/cloud/dev/kustomization.yaml ==="
cat k8s/cloud/dev/kustomization.yaml
echo -e "\n=== k8s/cloud/base/kustomization.yaml ==="
cat k8s/cloud/base/kustomization.yamlRepository: k8sstormcenter/pixie
Length of output: 2081
🏁 Script executed:
# Check which kustomization files include the exposed_services_traefik overlay
echo "=== Searching for exposed_services_traefik references ==="
rg "exposed_services_traefik" k8s/cloud/ --type yaml
# Check k8s/cloud/base/kustomization.yaml for any traefik references
echo -e "\n=== Checking for traefik in all kustomizations ==="
find k8s/cloud -name "kustomization.yaml" -type f | xargs grep -l "traefik\|exposed_services"
# Get the full content of any kustomization that includes exposed_services_traefik
echo -e "\n=== Checking production/staging kustomizations ==="
find k8s/cloud -name "kustomization.yaml" -type f -exec grep -l "exposed_services" {} \;Repository: k8sstormcenter/pixie
Length of output: 590
🏁 Script executed:
# Get the content of prod and staging kustomizations to see how traefik overlay is used
echo "=== k8s/cloud/prod/kustomization.yaml ==="
cat k8s/cloud/prod/kustomization.yaml
echo -e "\n=== k8s/cloud/staging/kustomization.yaml ==="
cat k8s/cloud/staging/kustomization.yaml
echo -e "\n=== k8s/cloud/testing/kustomization.yaml ==="
cat k8s/cloud/testing/kustomization.yamlRepository: k8sstormcenter/pixie
Length of output: 3551
🏁 Script executed:
# Search for where exposed_services_traefik overlay is actually used
echo "=== Searching for any references to exposed_services_traefik ==="
find k8s -type f -name "*.yaml" | xargs grep -l "exposed_services_traefik"
# Check if it's used in a different context or documentation
echo -e "\n=== Searching all yaml files for exposed_services_traefik ==="
rg "exposed_services_traefik" . --type yaml
# Check for any README or documentation about this overlay
echo -e "\n=== Checking for documentation ==="
find k8s/cloud/overlays/exposed_services_traefik -type f | head -20Repository: k8sstormcenter/pixie
Length of output: 698
🏁 Script executed:
# Get the cockpit kustomization to understand the namespace context for exposed_services_traefik
echo "=== private/cockpit/kustomization.yaml ==="
cat private/cockpit/kustomization.yamlRepository: k8sstormcenter/pixie
Length of output: 1002
Fix ServersTransport reference in the annotation.
The annotation references plc-cloud-backend-transport@kubernetescrd, but the ServersTransport resource is named cloud-backend-transport in the plc namespace. Since both the service and ServersTransport are in the same namespace, the correct reference should be cloud-backend-transport@kubernetescrd. This mismatch will prevent Traefik from locating the transport configuration.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@k8s/cloud/overlays/exposed_services_traefik/api_service.yaml` at line 7, The
traefik ServersTransport annotation currently points to the wrong resource name;
update the annotation key traefik.ingress.kubernetes.io/service.serverstransport
to reference cloud-backend-transport@kubernetescrd (the actual ServersTransport
resource name in the plc namespace) instead of
plc-cloud-backend-transport@kubernetescrd so Traefik can locate the transport
configuration for the service.
| - port: 443 | ||
| protocol: TCP | ||
| targetPort: 56000 | ||
| name: tcp-https | ||
| - port: 4444 | ||
| protocol: TCP | ||
| targetPort: 56004 | ||
| name: tcp-grcp-web | ||
| - port: 5555 | ||
| protocol: TCP | ||
| targetPort: 56000 | ||
| name: tcp-http2 |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | 💤 Low value
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Verify cloud-proxy-server listens on both 56000 and 56004
# Search for cloud-proxy-server pod/deployment definitions
echo "=== Searching for cloud-proxy-server definitions ==="
rg -n "name: cloud-proxy-server" -B 5 -A 20 --type yaml
# Search for port 56000 and 56004 references
echo -e "\n=== Port 56000 references ==="
rg -n "56000" --type yaml
echo -e "\n=== Port 56004 references ==="
rg -n "56004" --type yamlRepository: k8sstormcenter/pixie
Length of output: 6448
🏁 Script executed:
cat -n k8s/cloud/overlays/exposed_services_traefik/proxy_service.yamlRepository: k8sstormcenter/pixie
Length of output: 791
Consider documenting the dual-port design.
Ports 443 and 5555 both map to targetPort: 56000 (the HTTP/2 backend). Add a comment explaining this intentional design—for example, whether this supports protocol multiplexing, provides fallback routing, or serves different client requirements.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@k8s/cloud/overlays/exposed_services_traefik/proxy_service.yaml` around lines
12 - 23, Add an inline comment in the proxy service port section explaining why
port 443 and port 5555 both map to targetPort 56000 (the HTTP/2 backend) so
intent is clear; locate the port entries with name: tcp-https (port: 443,
targetPort: 56000) and name: tcp-http2 (port: 5555, targetPort: 56000) and add a
brief note describing whether this is for protocol multiplexing, client
compatibility/fallback, or separate routing semantics so future readers know the
design decision.
| - port: 4444 | ||
| protocol: TCP | ||
| targetPort: 56004 | ||
| name: tcp-grcp-web |
There was a problem hiding this comment.
Fix typo in port name.
The port name appears to have a typo: tcp-grcp-web should likely be tcp-grpc-web.
📝 Proposed fix
- name: tcp-grcp-web
+ name: tcp-grpc-web📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| name: tcp-grcp-web | |
| name: tcp-grpc-web |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@k8s/cloud/overlays/exposed_services_traefik/proxy_service.yaml` at line 19,
Update the port name value that contains the typo: find the YAML resource where
the name field is set to "tcp-grcp-web" and change it to "tcp-grpc-web" so the
port name correctly reads "tcp-grpc-web".
| apiVersion: traefik.io/v1alpha1 | ||
| kind: ServersTransport | ||
| metadata: | ||
| name: cloud-backend-transport |
There was a problem hiding this comment.
ServersTransport name does not match service annotation references.
The ServersTransport is named cloud-backend-transport, but all service annotations reference plc-cloud-backend-transport@kubernetescrd (with plc- prefix). This mismatch will prevent Traefik from applying the transport configuration to backend connections.
🔧 Proposed fix
- name: cloud-backend-transport
+ name: plc-cloud-backend-transport📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| name: cloud-backend-transport | |
| name: plc-cloud-backend-transport |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@k8s/cloud/overlays/exposed_services_traefik/servers_transport.yaml` at line
5, The ServersTransport resource currently named "cloud-backend-transport" does
not match the service annotations which reference
"plc-cloud-backend-transport@kubernetescrd"; update the ServersTransport name to
"plc-cloud-backend-transport" (or alternatively update all service annotations
to remove the "plc-" prefix) so the Traefik annotation and the ServersTransport
resource name match, ensuring Traefik will apply the transport config.
| @@ -0,0 +1 @@ | |||
| variable "cloud_domain" {} | |||
There was a problem hiding this comment.
Add type and validation for cloud_domain.
This currently accepts any value and errors only downstream when URLs are rendered.
Suggested fix
-variable "cloud_domain" {}
+variable "cloud_domain" {
+ type = string
+ description = "Base domain for Cockpit/Auth0 callback URLs."
+ nullable = false
+
+ validation {
+ condition = can(regex("^[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$", var.cloud_domain))
+ error_message = "cloud_domain must be a valid DNS domain."
+ }
+}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@terraform/kubernetes/auth0/variables.tf` at line 1, The variable
"cloud_domain" currently has no type or validation; update the variable
declaration for cloud_domain to declare type = string and add a validation block
that enforces a sane domain format (e.g., no scheme, no trailing slash, and
matches a domain/hostname regex or at least contains a dot and allowed
characters) with a clear error message; locate the variable "cloud_domain" in
variables.tf and add the type and validation blocks to fail fast when an invalid
domain is provided.
| # TODO(ddelnano): This will need to be updated for the terraform Azure pipeline | ||
| path = "../../../k8s/cloud_deps/base/elastic/operator" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n --type=tf '^\s*path\s*=\s*"\.\./' terraform/kubernetes/cloud_depsRepository: k8sstormcenter/pixie
Length of output: 268
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find all kustomization_build blocks in terraform files to see full scope
echo "=== All kustomization_build blocks ==="
rg -A 2 'kustomization_build' terraform/kubernetes/cloud_deps --type=tf
echo ""
echo "=== All path assignments in cloud_deps ==="
rg -n 'path\s*=' terraform/kubernetes/cloud_deps --type=tfRepository: k8sstormcenter/pixie
Length of output: 4652
Fix cwd-relative paths in kustomization_build blocks to use ${path.module}.
Both base_cloud_deps.tf and env_cloud_deps.tf use relative paths that will fail when Terraform runs from different working directories. The TODO comments in each file already flag the Azure pipeline risk.
Changes needed in two files
base_cloud_deps.tf line 3:
# TODO(ddelnano): This will need to be updated for the terraform Azure pipeline
- path = "../../../k8s/cloud_deps/base/elastic/operator"
+ path = "${path.module}/../../../k8s/cloud_deps/base/elastic/operator"env_cloud_deps.tf line 3:
# TODO(ddelnano): This will need to be updated for the terraform Azure pipeline
- path = "../../../k8s/cloud_deps/public"
+ path = "${path.module}/../../../k8s/cloud_deps/public"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@terraform/kubernetes/cloud_deps/base_cloud_deps.tf` around lines 2 - 3,
Replace cwd-relative path strings in the kustomization_build blocks with
module-relative interpolation using ${path.module}; specifically, in
base_cloud_deps.tf and env_cloud_deps.tf update assignments like path =
"../../../k8s/cloud_deps/base/elastic/operator" to use path =
"${path.module}/k8s/cloud_deps/base/elastic/operator" (and the corresponding env
path) so Terraform resolves paths relative to the module rather than the current
working directory; locate the path = ... entries in the kustomization_build
blocks to make the change.
| # TODO(ddelnano): This will need to be updated for the terraform Azure pipeline | ||
| path = "../../../k8s/cloud_deps/public" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n --type=tf '^\s*path\s*=\s*"\.\./' terraform/kubernetes/cloud_depsRepository: k8sstormcenter/pixie
Length of output: 268
🏁 Script executed:
rg -n 'kustomization_build' terraform/ -A 3Repository: k8sstormcenter/pixie
Length of output: 19238
🏁 Script executed:
rg -n 'path\s*=\s*"\.\./' terraform/ --type=tfRepository: k8sstormcenter/pixie
Length of output: 268
Use path.module to anchor both kustomization_build paths.
Both env_cloud_deps.tf and base_cloud_deps.tf use cwd-relative paths that can break in CI/automation contexts. Both require the same fix:
Proposed fixes
--- a/terraform/kubernetes/cloud_deps/env_cloud_deps.tf
+++ b/terraform/kubernetes/cloud_deps/env_cloud_deps.tf
@@ -1,5 +1,5 @@
data "kustomization_build" "env_deps" {
# TODO(ddelnano): This will need to be updated for the terraform Azure pipeline
- path = "../../../k8s/cloud_deps/public"
+ path = "${path.module}/../../../k8s/cloud_deps/public"
}
--- a/terraform/kubernetes/cloud_deps/base_cloud_deps.tf
+++ b/terraform/kubernetes/cloud_deps/base_cloud_deps.tf
@@ -1,5 +1,5 @@
data "kustomization_build" "elastic" {
# TODO(ddelnano): This will need to be updated for the terraform Azure pipeline
- path = "../../../k8s/cloud_deps/base/elastic/operator"
+ path = "${path.module}/../../../k8s/cloud_deps/base/elastic/operator"
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@terraform/kubernetes/cloud_deps/env_cloud_deps.tf` around lines 2 - 3,
Replace the cwd-relative kustomization path in the "path" attribute inside the
kustomization_build blocks of env_cloud_deps.tf and base_cloud_deps.tf with an
anchored path using Terraform's path.module; specifically, update the current
hardcoded "../../../k8s/cloud_deps/public" value so it is constructed by joining
path.module with the same relative segment (i.e., prepend path.module to the
existing relative tail) so CI/automation runs resolve the kustomization
locations reliably.
| provider "kubernetes" { | ||
| config_path = "~/.kube/cockpick-config" | ||
| config_context = "default" | ||
| } | ||
|
|
||
| provider "kustomization" { | ||
| context = "default" | ||
| kubeconfig_path = "~/.kube/cockpick-config" | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify all hardcoded kubeconfig references and potential typo.
rg -n --type=tf -C2 'cockpick-config|kubeconfig_path|config_path|config_context|context\s*=' terraform/kubernetesRepository: k8sstormcenter/pixie
Length of output: 1676
🏁 Script executed:
#!/bin/bash
# Check for variables.tf in cloud_deps directory
find terraform/kubernetes/cloud_deps -name "*.tf" -type f | head -20Repository: k8sstormcenter/pixie
Length of output: 356
🏁 Script executed:
#!/bin/bash
# Search for "cockpick" references throughout the codebase
rg -n "cockpick" . --type=tf --type=json --type=yaml --type=toml 2>/dev/null | head -30Repository: k8sstormcenter/pixie
Length of output: 253
🏁 Script executed:
#!/bin/bash
# Check what other provider configurations look like in the repository
rg -n -B2 -A4 'provider\s*"kubernetes"' terraform/ --type=tf | head -50Repository: k8sstormcenter/pixie
Length of output: 497
🏁 Script executed:
#!/bin/bash
# Read the variables.tf file to see existing variable definitions
cat -n terraform/kubernetes/cloud_deps/variables.tfRepository: k8sstormcenter/pixie
Length of output: 1086
🏁 Script executed:
#!/bin/bash
# Also check the full provider.tf file to understand complete context
cat -n terraform/kubernetes/cloud_deps/provider.tfRepository: k8sstormcenter/pixie
Length of output: 740
Parameterize kubeconfig path and context instead of hardcoding local values.
These hardcoded configurations (config_path = "~/.kube/cockpick-config", config_context = "default") break CI pipelines and teammate workflows that use different kubeconfig paths and contexts. Additionally, cockpick-config appears nowhere else in the codebase, suggesting it may be a typo or leftover from local development.
Define kubeconfig_path and kubeconfig_context variables in variables.tf and reference them in the provider blocks.
Suggested fix
provider "kubernetes" {
- config_path = "~/.kube/cockpick-config"
- config_context = "default"
+ config_path = var.kubeconfig_path
+ config_context = var.kubeconfig_context
}
provider "kustomization" {
- context = "default"
- kubeconfig_path = "~/.kube/cockpick-config"
+ context = var.kubeconfig_context
+ kubeconfig_path = var.kubeconfig_path
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@terraform/kubernetes/cloud_deps/provider.tf` around lines 18 - 26, The
Kubernetes and Kustomization provider blocks currently hardcode config_path and
context values (provider "kubernetes": config_path = "~/.kube/cockpick-config",
config_context = "default"; provider "kustomization": kubeconfig_path/kubeconfig
context), which breaks CI and other dev environments; introduce two variables
(e.g., variable "kubeconfig_path" and variable "kubeconfig_context" in
variables.tf) and replace the hardcoded values in the provider "kubernetes" and
provider "kustomization" blocks to reference var.kubeconfig_path and
var.kubeconfig_context (with sensible defaults like "~/.kube/config" and
"default") so pipelines and teammates can override them via CLI or environment.
| variable "project_environment" { | ||
| default = "prod" | ||
| } | ||
| variable "cluster_env" { | ||
| default = "prod" | ||
| } | ||
|
|
||
| variable "cluster_name" { | ||
| default = "prod-pixie-cloud" | ||
| } | ||
| variable "namespace" { | ||
| default = "plc" | ||
| } | ||
|
|
||
| variable "cloud_domain" { | ||
| default = "pixie.austrianopencloudcommunity.org" | ||
| } |
There was a problem hiding this comment.
Avoid production defaults for environment-sensitive variables.
Defaulting to prod/production domain makes accidental production-targeted applies too easy. Require explicit values via tfvars or workspace variables.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@terraform/kubernetes/cloud_deps/variables.tf` around lines 4 - 20, Remove
production defaults from the sensitive Terraform input variables
(project_environment, cluster_env, cluster_name, namespace, cloud_domain) so
users must supply explicit values via tfvars or workspace variables;
specifically, delete the default = "..." lines for those variable blocks (or
change to no default) and optionally add a simple validation block in each
variable (or a single shared validation) to ensure the value is non-empty and
not "prod"/the production domain to prevent accidental production applies,
referencing the variable names project_environment, cluster_env, cluster_name,
namespace, and cloud_domain when making the changes.
No description provided.