validatedpatterns · butler54 · Mar 10, 2026
diff --git a/docs/nfd-matchall-bug.md b/docs/nfd-matchall-bug.md
@@ -0,0 +1,122 @@
+# Bug Report: NFD `matchAll` Field Silently Dropped, Causing False TEE Labels
+
+## Summary
+
+`matchAll` is not a valid field in the NFD `NodeFeatureRule.spec.rules[]` schema. The correct field for AND-logic matching is `matchFeatures` (top-level on each rule). When `matchAll` is used, the OpenShift NFD operator silently drops it during the `nfd.openshift.io/v1alpha1` to `nfd.k8s-sigs.io/v1alpha1` conversion, leaving rules with no match predicates. These empty rules match every node unconditionally, applying all TEE labels regardless of hardware.
+
+## Impact
+
+- Every node receives ALL TEE labels: `intel.feature.node.kubernetes.io/tdx`, `amd.feature.node.kubernetes.io/snp`, `ibm.feature.node.kubernetes.io/se`, and `intel.feature.node.kubernetes.io/sgx`
+- The OpenShift sandboxed containers operator fails with: **`"multiple TEE platforms detected; only one per cluster supported"`**
+- KataConfig cannot reconcile, so no kata runtime handler is installed
+- All confidential container pods fail with: `failed to find runtime handler kata-snp from runtime list`
+
+## Root Cause
+
+The NFD `Rule` schema supports two match fields, `matchFeatures` and `matchAny`; `matchAll` is not among them:
+
+| Field | Behavior | Valid? |
+|-------|----------|--------|
+| `matchFeatures` | Top-level list of feature matchers; ALL must match (AND) | Yes |
+| `matchAny` | List of match groups; ANY must match (OR) | Yes |
+| `matchAll` | **Does not exist in the NFD API** | No |
+
+When the chart template uses `matchAll`:
+
+```yaml
+# BROKEN - matchAll is not a valid field
+- name: "amd.sev-snp"
+  labels:
+    amd.feature.node.kubernetes.io/snp: "true"
+  matchAll:
+    - matchFeatures:
+        - feature: cpu.security
+          matchExpressions:
+            sev.snp.enabled: { op: Exists }
+```
+
+The OpenShift NFD operator creates a shadow resource under `nfd.k8s-sigs.io/v1alpha1`. During this conversion, `matchAll` is an unrecognized field and is silently stripped. The resulting live resource has:
+
+```yaml
+# RESULT - no match conditions, matches every node
+- name: "amd.sev-snp"
+  labels:
+    amd.feature.node.kubernetes.io/snp: "true"
+  labelsTemplate: ""
+  varsTemplate: ""
+```
+
+## Evidence
+
+**Node:** `master-03` (Intel Xeon, model family 6, ID 207, vendor: Intel)
+
+**NFD-reported hardware features (`cpu.security`):**
+
+```yaml
+sgx.enabled: "true"
+sgx.epc: "4257210368"
+```
+
+Note: `sev.snp.enabled`, `tdx.enabled`, and `se.enabled` are **not present** in the node's feature data.
+
+**Labels applied to the node (all false positives except `sgx` and `runtime.kata`):**
+
+```text
+amd.feature.node.kubernetes.io/snp=true    # FALSE - Intel CPU, no SEV-SNP
+intel.feature.node.kubernetes.io/tdx=true  # FALSE - no tdx.enabled in cpu.security
+ibm.feature.node.kubernetes.io/se=true     # FALSE - Intel CPU, no SE
+intel.feature.node.kubernetes.io/sgx=true  # CORRECT - sgx.enabled is true
+feature.node.kubernetes.io/runtime.kata=true  # CORRECT - matchAny works (valid field)
+```
+
+**Sandbox operator log:**
+
+```text
+INFO  controllers.KataConfig  failed to detect TEE platform
+      {"err": "multiple TEE platforms detected; only one per cluster supported"}
+```
+
+## Fix
+
+Replace `matchAll` with `matchFeatures` in each rule. The `matchFeatures` list at the rule level uses AND logic (all entries must match), which is the intended behavior.
+
+Additionally, add vendor-discriminating CPUID checks to prevent cross-platform false positives:
+
+```yaml
+# FIXED - uses matchFeatures (valid field) with vendor guard
+- name: "amd.sev-snp"
+  labels:
+    amd.feature.node.kubernetes.io/snp: "true"
+  matchFeatures:
+    - feature: cpu.cpuid
+      matchExpressions:
+        SVM: { op: Exists }           # AMD-only CPUID flag
+    - feature: cpu.security
+      matchExpressions:
+        sev.snp.enabled: { op: Exists }
+```
+
+| Rule | `matchAll` (broken) | `matchFeatures` (fixed) | Vendor guard added |
+|------|---------------------|-------------------------|--------------------|
+| `amd.sev-snp` | Dropped silently | AND: SVM + sev.snp.enabled | `SVM` (AMD) |
+| `intel.sgx` | Dropped silently | AND: SGX + SGXLC + sgx.enabled + X86_SGX | `SGX`, `SGXLC` (Intel) |
+| `intel.tdx` | Dropped silently | AND: VMX + tdx.enabled | `VMX` (Intel) |
+| `ibm.se.enabled` | Dropped silently | AND: se.enabled | None (s390x only) |
+
+## Affected Versions
+
+Any deployment using the consolidated `NodeFeatureRule` (`consolidated-hardware-features`) introduced in commit `57ec5f4`. The original separate NFD rule files (`amd-nfd-rules.yaml`, `intel-nfd-rules.yaml`) used `matchFeatures` correctly, but the consolidation mistakenly introduced `matchAll`.
+
+## Remediation
+
+After deploying the corrected chart:
+
+```bash
+# Remove false labels so NFD can re-evaluate (on nodes without SGX, also remove intel.feature.node.kubernetes.io/sgx-)
+oc label node <node> amd.feature.node.kubernetes.io/snp- \
+  intel.feature.node.kubernetes.io/tdx- \
+  ibm.feature.node.kubernetes.io/se-
+
+# Restart sandbox operator to re-evaluate KataConfig
+oc delete pod -n openshift-sandboxed-containers-operator -l app=controller-manager
+```
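
The failure mode described in `docs/nfd-matchall-bug.md` could be caught before a chart is deployed. The Python sketch below is not part of this change. It assumes the rules have already been parsed into plain dicts (for example, from rendered chart output). The helper name `audit_rules` and the `KNOWN_RULE_KEYS` set are illustrative assumptions that cover only the fields shown in the report, not the complete NFD `Rule` schema.

```python
# Hypothetical lint for NodeFeatureRule rules; not part of the chart.
# KNOWN_RULE_KEYS is an assumption covering only the fields that appear
# in the bug report, not the complete NFD Rule schema.
KNOWN_RULE_KEYS = {
    "name", "labels", "labelsTemplate", "varsTemplate",
    "matchFeatures", "matchAny",
}
MATCH_KEYS = ("matchFeatures", "matchAny")


def audit_rules(rules):
    """Return a list of (rule_name, problem) tuples for suspicious rules."""
    problems = []
    for rule in rules:
        name = rule.get("name", "<unnamed>")
        # Fields outside the known set may be dropped during conversion.
        for key in sorted(set(rule) - KNOWN_RULE_KEYS):
            problems.append((name, f"unknown field '{key}'"))
        # A rule with no non-empty match field matches every node.
        if not any(rule.get(key) for key in MATCH_KEYS):
            problems.append((name, "no match predicates; matches every node"))
    return problems


broken = {
    "name": "amd.sev-snp",
    "labels": {"amd.feature.node.kubernetes.io/snp": "true"},
    "matchAll": [{"matchFeatures": [{"feature": "cpu.security"}]}],
}
fixed = {
    "name": "amd.sev-snp",
    "labels": {"amd.feature.node.kubernetes.io/snp": "true"},
    "matchFeatures": [
        {"feature": "cpu.cpuid",
         "matchExpressions": {"SVM": {"op": "Exists"}}},
        {"feature": "cpu.security",
         "matchExpressions": {"sev.snp.enabled": {"op": "Exists"}}},
    ],
}

for rule_name, problem in audit_rules([broken, fixed]):
    print(f"{rule_name}: {problem}")
```

For the two sample rules, only the `matchAll` variant is reported: once for the unknown field and once for the missing match predicates.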
diff --git a/docs/pcr-reference-values-bare-metal.md b/docs/pcr-reference-values-bare-metal.md
@@ -0,0 +1,198 @@
+# Collecting PCR Reference Values on Bare Metal for RVPS Attestation
+
+## Overview
+
+The Trustee attestation service uses the Reference Value Provider Service (RVPS) to verify that a confidential VM is running the expected software stack. RVPS compares measurements from the TEE attestation quote against pre-registered reference values. On Azure, these values are populated automatically. On bare metal, they must be collected manually and pushed to Vault.
+
+This document covers the collection of vTPM PCR values for bare-metal TDX deployments, **excluding PCR8** (init data), which is computed separately by the `init-data-gzipper.yaml` imperative playbook.
+
+## What Gets Measured
+
+### PCR Registers Used by RVPS
+
+The trustee chart's RVPS policy (`rvps-values-policies.yaml`) consumes these PCR values:
+
+| PCR | Contents | Source |
+|-----|----------|--------|
+| PCR03 | UEFI driver and application configuration and data | `pcr-stash` secret |
+| PCR08 | Init data (CoCo-specific) | `initdata` ConfigMap (computed by `init-data-gzipper.yaml`) |
+| PCR09 | Boot loader code | `pcr-stash` secret |
+| PCR11 | OS-defined (unified kernel image measurements on Linux with `systemd-stub`; BitLocker access control on Windows) | `pcr-stash` secret |
+| PCR12 | Boot events | `pcr-stash` secret |
+
+PCR08 is excluded from this procedure because it represents the hash of the CoCo init data TOML, which changes with each deployment configuration. It is handled by the `ansible/init-data-gzipper.yaml` playbook, which renders the init data template, computes `sha256(zero_pcr || sha256(initdata.toml))`, and stores the result in the `imperative/initdata` ConfigMap as `PCR8_HASH`.
+
+### TDX RTMR to PCR Mapping
+
+For TDX on bare metal, the vTPM inside the confidential VM maps measurements to PCR registers. The underlying TDX hardware uses Runtime Measurement Registers (RTMRs):
+
+| RTMR | Contents | Corresponding PCRs |
+|------|----------|---------------------|
+| MRTD | TD build-time measurement (firmware) | PCR 0 (reported as a separate field in the quote) |
+| RTMR[0] | TDVF configuration, ACPI tables | PCR 1, 7 |
+| RTMR[1] | OS kernel, boot parameters, initrd | PCR 2-6 |
+| RTMR[2] | OS applications, IMA measurements | PCR 8-15 |
+| RTMR[3] | Reserved | N/A |
+
+## Collection Methods
+
+### Method 1: From a Running Confidential VM (Recommended)
+
+Launch a confidential container pod, exec into it, and read the vTPM PCR values. This gives you the actual measurements for your specific firmware + kernel + configuration.
+
+**Prerequisites:**
+
+- A working CoCo deployment with kata runtime installed
+- `tpm2-tools` available in the guest (or use a debug pod)
+
+**Steps:**
+
+1. Launch a confidential pod:
+
+   ```bash
+   oc run pcr-collector --image=registry.access.redhat.com/ubi9/ubi:latest \
+     --restart=Never --overrides='{"spec":{"runtimeClassName":"kata-cc"}}' \
+     -- sleep 3600
+   ```
+
+1. Install tpm2-tools inside the pod:
+
+   ```bash
+   oc exec pcr-collector -- dnf install -y tpm2-tools
+   ```
+
+1. Read PCR values (SHA-256 bank):
+
+   ```bash
+   oc exec pcr-collector -- tpm2_pcrread sha256:3,9,11,12
+   ```
+
+   Example output:
+
+   ```text
+   sha256:
+     3 : 0x3D458CFE55CC03EA1F443F1562BEE8DF30100AB2E1C4B6E5FE4568E7B0E6745A
+     9 : 0x96A18E5C5E3E9AEC7FE5B8A1C6A02E8D6A4E8C6B3E9A7F5B2D4C8E1A3F6B9D2
+     11: 0x0000000000000000000000000000000000000000000000000000000000000000
+     12: 0x0000000000000000000000000000000000000000000000000000000000000000
+   ```
+
+1. Clean up:
+
+   ```bash
+   oc delete pod pcr-collector
+   ```
+
+### Method 2: Pre-Calculation with tdx-measure
+
+The [virtee/tdx-measure](https://github.com/virtee/tdx-measure) tool computes expected TDX measurement registers offline from firmware and kernel binaries, without requiring a running TD.
+
+**Install:**
+
+```bash
+cargo install tdx-measure
+```
+
+**Usage:**
+
+Create a metadata JSON file describing your boot components:
+
+```json
+{
+  "firmware": "/path/to/OVMF.fd",
+  "kernel": "/path/to/vmlinuz",
+  "initrd": "/path/to/initrd.img",
+  "cmdline": "console=ttyS0",
+  "memory_mb": 4096,
+  "vcpus": 2
+}
+```
+
+Compute all measurements:
+
+```bash
+tdx-measure metadata.json --direct-boot true --json
+```
+
+Compute platform-only (MRTD + RTMR0, excludes kernel/initrd):
+
+```bash
+tdx-measure metadata.json --platform-only --json
+```
+
+Compute runtime-only (RTMR1 + RTMR2, kernel + initrd):
+
+```bash
+tdx-measure metadata.json --runtime-only --json
+```
+
+**Note:** You need to extract the firmware (OVMF/TDVF), kernel, and initrd images from your OpenShift node. These can be found in the kata containers payload.
+
+### Method 3: From the Attestation Quote
+
+If you have a running deployment with attestation disabled (`global.coco.secured: false`), you can capture the raw attestation quote and extract measurements:
+
+1. Deploy with `bypassAttestation: true` in the trustee configuration
+2. Launch a confidential pod
+3. The attestation agent logs the quote contents; extract the PCR values from the quote structure
+4. Use these values as your reference baseline
+
+## Pushing Values to Vault
+
+Once you have collected PCR values, push them to Vault in the format expected by the `pcrs-eso` ExternalSecret:
+
+```bash
+# Format: JSON object with measurements.sha256.pcrNN keys
+oc exec -n vault vault-0 -- vault kv put secret/hub/pcrStash \
+  json='{
+    "measurements": {
+      "sha256": {
+        "pcr03": "<PCR03_HEX_VALUE>",
+        "pcr09": "<PCR09_HEX_VALUE>",
+        "pcr11": "<PCR11_HEX_VALUE>",
+        "pcr12": "<PCR12_HEX_VALUE>"
+      }
+    }
+  }'
+```
+
+The `pcrs-eso` ExternalSecret will then sync this into a `pcr-stash` Kubernetes secret in the `trustee-operator-system` namespace. The RVPS policy (`rvps-values-policies.yaml`) reads from this secret to populate the `rvps-reference-values` ConfigMap.
+
+## Pipeline Summary
+
+```text
+tpm2_pcrread (inside CoCo pod)
+  |
+  v
+Vault: secret/data/hub/pcrStash    (pcr03, pcr09, pcr11, pcr12)
+  |
+  v (ExternalSecret: pcrs-eso)
+K8s Secret: pcr-stash               (trustee-operator-system)
+  |
+  v (ACM Policy: rvps-policy)
+ConfigMap: rvps-reference-values     (trustee-operator-system)
+  |
+  v
+Attestation Service RVPS            (compares against TD quote)
+```
+
+PCR8 follows a separate path:
+
+```text
+ansible/init-data-gzipper.yaml
+  |
+  v
+ConfigMap: initdata                  (imperative namespace)
+  |                                  contains: INITDATA, PCR8_HASH
+  v (ACM Policy: rvps-policy)
+ConfigMap: rvps-reference-values     (merged with pcr-stash values)
+```
+
+## References
+
+- [Red Hat: How to deploy confidential containers on bare metal](https://developers.redhat.com/articles/2025/02/19/how-deploy-confidential-containers-bare-metal)
+- [Red Hat: Introducing Confidential Containers Trustee](https://www.redhat.com/en/blog/introducing-confidential-containers-trustee-attestation-services-solution-overview-and-use-cases)
+- [Intel: Runtime Integrity Measurement and Attestation in a Trust Domain](https://www.intel.com/content/www/us/en/developer/articles/community/runtime-integrity-measure-and-attest-trust-domain.html)
+- [virtee/tdx-measure](https://github.com/virtee/tdx-measure) - Pre-calculate TDX measurements offline
+- [CoCo Attestation Service RVPS docs](https://github.com/confidential-containers/attestation-service/blob/main/docs/rvps.md)
+- [CoCo Attestation Policies](https://confidentialcontainers.org/docs/attestation/policies/)
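
The PCR8 computation described in `docs/pcr-reference-values-bare-metal.md`, `sha256(zero_pcr || sha256(initdata.toml))`, is a single TPM-style extend of an all-zero register. The following is a minimal Python sketch, assuming a 32-byte zero initial PCR and a digest taken over the raw bytes of the rendered TOML. The function name `pcr8_from_initdata` and the sample TOML content are illustrative, and the `init-data-gzipper.yaml` playbook may differ in detail.

```python
import hashlib

SHA256_LEN = 32  # bytes in a SHA-256 PCR bank entry


def extend(pcr: bytes, digest: bytes) -> bytes:
    """TPM-style extend: new_pcr = SHA256(old_pcr || digest)."""
    return hashlib.sha256(pcr + digest).digest()


def pcr8_from_initdata(initdata: bytes) -> str:
    """Hex PCR value after extending a zeroed PCR with sha256(initdata)."""
    zero_pcr = bytes(SHA256_LEN)
    return extend(zero_pcr, hashlib.sha256(initdata).digest()).hex()


# Placeholder content; a real run would read the rendered initdata.toml.
sample = b'algorithm = "sha256"\nversion = "0.1.0"\n'
print(pcr8_from_initdata(sample))
```

Because the register starts at zero and is extended exactly once, any change to the init data bytes yields a different `PCR8_HASH`, which is why this value is recomputed per deployment rather than collected once.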