[WIP] test by JaySon-Huang · Pull Request #10854 · pingcap/tiflash

JaySon-Huang · 2026-05-21T09:55:29Z

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:

What is changed and how it works?

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

None

Summary by CodeRabbit

New Features
- Added test execution timeout support with configurable limits
- Enhanced documentation with detailed columnar cluster setup guide
Bug Fixes
- Fixed header alignment for partition scan operations
Improvements
- Enhanced error messages for table scan validation with more detailed diagnostic information
Chores
- Updated test configurations to enable columnar storage support
- Improved test infrastructure and script reliability

Signed-off-by: JaySon-Huang <tshent@qq.com>

Add an optional EXPOSE_TIDB_PORT override so local debugging can reach tidb0 from the host without changing the base cluster yaml.

Enable use_columnar on tiflash-cn0 and set pids_limit to -1 so fullstack tests can run without TiFlash write node or hitting container PID limits.

Signed-off-by: JaySon-Huang <tshent@qq.com>

ti-chi-bot · 2026-05-21T09:55:33Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hehechen for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

pantheon-ai · 2026-05-21T09:55:35Z

@JaySon-Huang I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

_{ℹ️ Learn more details on Pantheon AI.}

coderabbitai · 2026-05-21T09:56:04Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 5278585f-5470-40d3-b683-52ea4d5ae403

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

This PR establishes a comprehensive testing infrastructure for TiFlash's next-generation columnar storage. It fixes core header alignment issues in the columnar read path, configures a fully disaggregated test cluster, introduces Docker orchestration utilities, enhances test execution with timeouts and error tracking, and provides complete documentation and verification workflows.

Changes

TiFlash Columnar Next-Gen Infrastructure

Layer / File(s)	Summary
Columnar header alignment and validation fixes `dbms/src/Core/NamesAndTypes.h`, `dbms/src/Flash/Planner/Plans/PhysicalTableScan.cpp`, `dbms/src/Storages/StorageDisaggregatedColumnar.h`, `contrib/tiflash-proxy-columnar`	Fixes schema/header validation in columnar reads by updating RNProxyInputStream and RNProxySourceOp header initialization to use `AddExtraTableIDColumnTransformAction::getHeader()`, ensuring alignment with `genNamesAndTypesForTableScan` when `_tidb_tid` is involved. Improves exception messages for schema size and column type mismatches with structured JSON output. Bumps tiflash-proxy-columnar submodule.
Next-gen cluster configuration `tests/docker/next-gen-config/tidb.toml`, `tests/docker/next-gen-config/tiflash_cn.toml`, `tests/docker/next-gen-config/tikv.toml`, `tests/docker/next-gen-yaml/cluster.yaml`, `tests/docker/next-gen-yaml/disagg_tiflash.yaml`	Defines TiDB, TiFlash CN, and TiKV configurations for columnar disaggregated testing: enables columnar storage type, configures service scope, adds KV-engine columnar and FTS flags, and sets container resource limits (removes version, adjusts pids_limit to -1, removes tiflash-wn0).
Docker Compose local binary override files `tests/docker/next-gen-yaml/override/expose_tidb.yaml`, `tests/docker/next-gen-yaml/override/local_pd.yaml`, `tests/docker/next-gen-yaml/override/local_tidb.yaml`, `tests/docker/next-gen-yaml/override/local_tikv.yaml`	Adds reusable override YAML files to mount locally built binaries (PD, TiDB, TiKV) and expose TiDB port, enabling rapid iteration without container rebuilds.
Test infrastructure helpers and environment `tests/docker/util.sh`, `tests/fullstack-test-next-gen/_env.sh`	Introduces shell helper functions (`prepare_next_gen_data_dirs`, `validate_local_binary`, `append_local_binary_overrides`, `setup_next_gen_compose_files`) and environment setup (_env.sh) that exports Docker image URLs, branch normalization logic, and local binary directory overrides.
Docker Compose orchestrator wrapper `tests/fullstack-test-next-gen/compose.sh`	Implements a command-line wrapper around Docker Compose that sources environment, conditionally disables binary validation for certain subcommands, assembles compose files via helper functions, and prepares data directories before cluster operations.
Test execution framework improvements `tests/run-test.py`, `tests/run-test.sh`, `tests/fullstack-test-next-gen/run.sh`, `tests/temp_run_tests.sh`	Adds per-test execution timeouts via SIGALRM in Python, improves shell test runner with failure aggregation and reliable recursive discovery using process substitution, refactors run.sh to use centralized helpers, and introduces temp_run_tests.sh for suite-level execution.
Documentation and verification guides `tests/AGENTS.md`, `tests/.gitignore`, `verify_loop.md`	Replaces AGENTS.md with comprehensive next-gen setup guide covering TiFlash build, cluster prerequisites, `_env.sh` configuration, compose.sh commands, troubleshooting, and test execution. Adds *.log to gitignore. Introduces verify_loop.md with complete TiFlash build/cluster startup instructions, MySQL connection setup, and three concrete verification cases (enum PK columnar, partitioned table MPP, BIGINT PK boundary cases) with one-key reproduction scripts.

Sequence Diagram(s)

sequenceDiagram
  participant TiDB
  participant PhysicalTableScan
  participant StorageDisaggregatedColumnar
  participant RNProxyInputStream
  TiDB->>PhysicalTableScan: validate schema against storage header
  PhysicalTableScan->>PhysicalTableScan: check column count match
  alt schema size mismatch
    PhysicalTableScan->>PhysicalTableScan: throw with dumpJsonStructure
  end
  alt column type mismatch
    PhysicalTableScan->>PhysicalTableScan: throw with table_id and column details
  end
  StorageDisaggregatedColumnar->>RNProxyInputStream: initialize header via action.getHeader()
  RNProxyInputStream->>RNProxyInputStream: align with _tidb_tid from genNamesAndTypesForTableScan

sequenceDiagram
  participant run-test.sh
  participant run-test.py
  participant SIGALRM
  participant temp_run_tests.sh
  temp_run_tests.sh->>run-test.sh: iterate test suites
  run-test.sh->>run-test.sh: initialize FAILED_TESTS=0
  run-test.sh->>run-test.sh: run_file, run_dir with recursion
  run-test.sh->>run-test.py: execute with timeout
  run-test.py->>run-test.py: set alarm for TEST_TIMEOUT
  run-test.py->>SIGALRM: await interrupt on timeout
  alt timeout occurs
    SIGALRM->>run-test.py: raise TestTimeoutError
    run-test.py->>run-test.py: print file path and timeout
  end
  run-test.py-->>run-test.sh: return status
  run-test.sh->>run-test.sh: increment FAILED_TESTS on error
  run-test.sh->>run-test.sh: check FAILED_TESTS after all tests
  alt failures detected
    run-test.sh->>run-test.sh: print Total failed and exit 1
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Next-gen columnar: BIGINT clustered PK at INT64_MAX missing on TiFlash read #10852: Updates the tiflash-proxy-columnar submodule and fixes RNProxyInputStream/RNProxySourceOp header initialization in StorageDisaggregatedColumnar, directly addressing the schema/header mismatch issues referenced in the bug report.

Possibly related PRs

pingcap/tiflash#10675: Both PRs modify tests/AGENTS.md to improve next-gen test setup documentation.
pingcap/tiflash#10850: Related through StorageDisaggregatedColumnar adjustments affecting disaggregated columnar read paths.
pingcap/tiflash#10842: Related through tiflash-proxy-columnar submodule updates and RNProxyInputStream/RNProxySourceOp columnar data source wiring.

Suggested labels

size/XXL, type/feature, component/disaggregated, component/test-infra, area/columnar

Suggested reviewers

JinheLin
yongman
CalvinNeo

🐰 A columnar dream comes true today,
With headers aligned in every way,
The proxy now dances with perfect grace,
While test clusters hum at a steady pace,
From verification scripts to compose's embrace!

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (3 warnings)

Check name	Status	Explanation	Resolution
Title check	⚠️ Warning	The title '[WIP] test' is vague and generic, providing no meaningful information about the actual changeset.	Replace with a descriptive title summarizing the main changes, such as 'Add next-gen columnar cluster test infrastructure and fixes' or similar based on the actual primary objectives.
Description check	⚠️ Warning	The description is essentially a blank template with no substantive content filled in; problem statement, changes, and issue references are missing.	Fill in the template sections with actual problem description, detailed changes summary, checked test items, and affected issue number.
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (2)

verify_loop.md (2)
151-156: 💤 Low value

Specify language for fenced code block.

The fenced code block should specify a language identifier for better clarity and to satisfy markdown linting rules.
📝 Suggested fix
-```
+```text
 pk: tidb, tiflash
 pk+0: tidb/1, tiflash/4
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @verify_loop.md around lines 151 - 156, The fenced code block containing the
lines "pk: tidb, tiflash" and "pk+0: tidb/1, tiflash/4" lacks a language
identifier and the second line ordering should match the suggested fix; update
the block to use a language tag (e.g., ```text) and replace the block contents
with the corrected lines exactly as shown: "pk: tidb, tiflash" and "pk+0:
tidb/1, tiflash/4" so linting and clarity are satisfied.
</details>

---

`254-258`: _💤 Low value_

**Specify language for fenced code block.**

The fenced code block should specify a language identifier for better clarity and to satisfy markdown linting rules.




<details>
<summary>📝 Suggested fix</summary>

```diff
-```
+```text
 | a                   | col  |
 |                   0 |    3 |
 | 9223372036854775807 |    2 |
 ```
```

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @verify_loop.md around lines 254 - 258, The fenced code block in
verify_loop.md lacks a language identifier; update the triple-backtick fence
that precedes the table to include a language (e.g., text or text-table)
so the block is explicitly marked (modify the existing fenced block around the
table content shown in the diff).
</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @tests/AGENTS.md:

Line 47: The README command uses an undefined variable ${PROJECT_DIR};
update the instruction so it works out-of-the-box by either replacing
${PROJECT_DIR} with a relative path (./tests/.build/tiflash) in the shown
command or add a one-line note above the command stating that ${PROJECT_DIR}
should be set to the repository root path (e.g., export PROJECT_DIR="$(pwd)"
from repo root) so readers know how to define it before running the cmake
--install command.

Line 118: Replace the ungrammatical phrase "logging are available under
tests/fullstack-test-next-gen/log" in the tests/AGENTS.md content with a
correct form; update the sentence to either "logs are available under
tests/fullstack-test-next-gen/log" or "logging is available under
tests/fullstack-test-next-gen/log" (prefer "logs are available..." for plural
noun consistency).

In @tests/docker/util.sh:

Around line 179-199: The append_local_binary_overrides block unconditionally
runs realpath into LOCAL_PD_BIN_DIR/LOCAL_TIKV_BIN_DIR/LOCAL_TIDB_BIN_DIR which
can produce empty values when realpath fails; change the logic to only call
realpath (or at least validate its result) when validate_binaries is true, or
check the return of realpath and bail/log a clear error before exporting the
variable. Locate append_local_binary_overrides and update the sections that set
LOCAL_PD_BIN_DIR, LOCAL_TIKV_BIN_DIR, and LOCAL_TIDB_BIN_DIR so they either (a)
run realpath inside the if [[ "${validate_binaries}" == "true" ]] block before
calling validate_local_binary, or (b) test the realpath output for
non-empty/valid path and handle failure (error and exit) before adding
compose_files and printing "Using local ..." messages; reference
validate_local_binary, compose_files, and the LOCAL_*_BIN_DIR variables when
making the change.

In @tests/fullstack-test-next-gen/run.sh:

Around line 32-65: The script currently tears down Docker and cleans data only
on the success path, so failing commands leak resources; introduce a guaranteed
teardown by adding a cleanup function (invoking ${COMPOSE} "${COMPOSE_FILES[@]}"
down and clean_data_log) and register it with trap to run on EXIT/ERR before
running prepare_next_gen_data_dirs/wait_next_gen_env and the test invocations
(references: COMPOSE, COMPOSE_FILES, prepare_next_gen_data_dirs,
wait_next_gen_env, ENV_ARGS, clean_data_log); ensure the trap is installed early
(after set -e) so the cleanup runs regardless of test failures and remove or
avoid duplicate teardown at the end if the trap already handles it.

In @tests/temp_run_tests.sh:

Line 4: The script's cd command (cd "$(dirname "$0")") is unguarded and can
silently fail; update tests/temp_run_tests.sh to check the cd return and fail
fast (e.g., test the exit status and exit with a non-zero code or set -e before
the cd) so the script doesn't continue in the wrong working directory; locate
the cd "$(dirname "$0")" line and add a guard such as exiting on failure (or
wrapping with a conditional that prints an error and exits) to ensure the script
aborts if the chdir fails.

Nitpick comments:
In @verify_loop.md:

Around line 151-156: The fenced code block containing the lines "pk: tidb,
tiflash" and "pk+0: tidb/1, tiflash/4" lacks a language identifier and the
second line ordering should match the suggested fix; update the block to use a
language tag (e.g., ```text) and replace the block contents with the corrected
lines exactly as shown: "pk: tidb, tiflash" and "pk+0: tidb/1, tiflash/4" so
linting and clarity are satisfied.

Around line 254-258: The fenced code block in verify_loop.md lacks a language
identifier; update the triple-backtick fence that precedes the table to include
a language (e.g., text or text-table) so the block is explicitly marked
(modify the existing fenced block around the table content shown in the diff).
</details>

<details>
<summary>🪄 Autofix (Beta)</summary>

Fix all unresolved CodeRabbit comments on this PR:

- [ ]  Push a commit to this branch (recommended)
- [ ]  Create a new PR with the fixes

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Repository UI

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `89c11643-e222-4bd6-adde-7a1053902b4f`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 7a5550a1b586249b519e633ffa2673107457946c and 533f2170575de8dda70a4ccc0813de81a0f3ba4c.

</details>

<details>
<summary>📒 Files selected for processing (23)</summary>

* `contrib/tiflash-proxy-columnar`
* `dbms/src/Core/NamesAndTypes.h`
* `dbms/src/Flash/Planner/Plans/PhysicalTableScan.cpp`
* `dbms/src/Storages/StorageDisaggregatedColumnar.h`
* `tests/.gitignore`
* `tests/AGENTS.md`
* `tests/docker/next-gen-config/tidb.toml`
* `tests/docker/next-gen-config/tiflash_cn.toml`
* `tests/docker/next-gen-config/tikv.toml`
* `tests/docker/next-gen-yaml/cluster.yaml`
* `tests/docker/next-gen-yaml/disagg_tiflash.yaml`
* `tests/docker/next-gen-yaml/override/expose_tidb.yaml`
* `tests/docker/next-gen-yaml/override/local_pd.yaml`
* `tests/docker/next-gen-yaml/override/local_tidb.yaml`
* `tests/docker/next-gen-yaml/override/local_tikv.yaml`
* `tests/docker/util.sh`
* `tests/fullstack-test-next-gen/_env.sh`
* `tests/fullstack-test-next-gen/compose.sh`
* `tests/fullstack-test-next-gen/run.sh`
* `tests/run-test.py`
* `tests/run-test.sh`
* `tests/temp_run_tests.sh`
* `verify_loop.md`

</details>

<details>
<summary>💤 Files with no reviewable changes (1)</summary>

* dbms/src/Core/NamesAndTypes.h

</details>

</details>

coderabbitai · 2026-05-21T10:03:20Z

+# From repository root
+cmake --preset dev -DENABLE_NEXT_GEN=ON -DENABLE_NEXT_GEN_COLUMNAR=ON
+cmake --build --preset dev --target tiflash
+cmake --install cmake-build-debug --component=tiflash-release --prefix "${PROJECT_DIR}/tests/.build/tiflash"


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clarify the ${PROJECT_DIR} variable.

The ${PROJECT_DIR} variable is used without being defined. Since the preceding comment says "From repository root", consider either:

Replacing it with ./tests/.build/tiflash to make it relative to the current directory

Or adding a note explaining that ${PROJECT_DIR} should be the repository root path

📝 Suggested fix

-cmake --install cmake-build-debug --component=tiflash-release --prefix "${PROJECT_DIR}/tests/.build/tiflash" +cmake --install cmake-build-debug --component=tiflash-release --prefix ./tests/.build/tiflash

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

cmake --install cmake-build-debug --component=tiflash-release --prefix "${PROJECT_DIR}/tests/.build/tiflash"

cmake --install cmake-build-debug --component=tiflash-release --prefix ./tests/.build/tiflash

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/AGENTS.md` at line 47, The README command uses an undefined variable `${PROJECT_DIR}`; update the instruction so it works out-of-the-box by either replacing `${PROJECT_DIR}` with a relative path (`./tests/.build/tiflash`) in the shown command or add a one-line note above the command stating that `${PROJECT_DIR}` should be set to the repository root path (e.g., export PROJECT_DIR="$(pwd)" from repo root) so readers know how to define it before running the cmake --install command.

coderabbitai · 2026-05-21T10:03:20Z

+#### Troubleshooting
+- **`tests/.build/tiflash` missing:** run the build/install step above before `./compose.sh up`.
+- **Local binary not picked up:** confirm `LOCAL_*_BIN_DIR` is exported in `_env.sh` or on the command line before `./compose.sh up`.
+- logging are available under `tests/fullstack-test-next-gen/log`


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix grammar issue.

"logging are available" should be either "logs are available" or "logging is available".

📝 Suggested fix

-- logging are available under `tests/fullstack-test-next-gen/log` +- Logs are available under `tests/fullstack-test-next-gen/log`

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- logging are available under `tests/fullstack-test-next-gen/log`

- Logs are available under `tests/fullstack-test-next-gen/log`

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/AGENTS.md` at line 118, Replace the ungrammatical phrase "logging are available under `tests/fullstack-test-next-gen/log`" in the tests/AGENTS.md content with a correct form; update the sentence to either "logs are available under `tests/fullstack-test-next-gen/log`" or "logging is available under `tests/fullstack-test-next-gen/log`" (prefer "logs are available..." for plural noun consistency).

coderabbitai · 2026-05-21T10:03:20Z

+    export LOCAL_PD_BIN_DIR="$(realpath "${LOCAL_PD_BIN_DIR}")"
+    if [[ "${validate_binaries}" == "true" ]]; then
+      validate_local_binary "${LOCAL_PD_BIN_DIR}/pd-server" "pd-server"
+    fi
+    compose_files+=(-f "${override_dir}/local_pd.yaml")
+    echo "Using local PD binary: ${LOCAL_PD_BIN_DIR}/pd-server"
+  fi
+  if [[ -n "${LOCAL_TIKV_BIN_DIR:-}" ]]; then
+    export LOCAL_TIKV_BIN_DIR="$(realpath "${LOCAL_TIKV_BIN_DIR}")"
+    if [[ "${validate_binaries}" == "true" ]]; then
+      validate_local_binary "${LOCAL_TIKV_BIN_DIR}/tikv-server" "tikv-server"
+      validate_local_binary "${LOCAL_TIKV_BIN_DIR}/tikv-worker" "tikv-worker"
+    fi
+    compose_files+=(-f "${override_dir}/local_tikv.yaml")
+    echo "Using local TiKV binary: ${LOCAL_TIKV_BIN_DIR}/tikv-server"
+    echo "Using local TiKV worker binary: ${LOCAL_TIKV_BIN_DIR}/tikv-worker"
+  fi
+  if [[ -n "${LOCAL_TIDB_BIN_DIR:-}" ]]; then
+    export LOCAL_TIDB_BIN_DIR="$(realpath "${LOCAL_TIDB_BIN_DIR}")"
+    if [[ "${validate_binaries}" == "true" ]]; then
+      validate_local_binary "${LOCAL_TIDB_BIN_DIR}/tidb-server" "tidb-server"


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "Repro: inline assignment exits before custom validation" bash -c 'set -e; export X="$(realpath /definitely/not/exist)"; echo "unreachable"' || echo "exited early as expected" echo "Repro: separated assignment allows explicit error handling" bash -c 'set -e; x="$(realpath /definitely/not/exist)" || { echo "custom error path reached"; exit 1; }'

Repository: pingcap/tiflash

Length of output: 314

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "== Bash set -e behavior with failing command substitution ==" echo "-- Case A: export VAR=\"\$(false)\" (expect: may or may not exit depending on bash semantics)" bash -c 'set -e; export X="$(false)"; echo "A: reached after export"' echo "-- Case A should have aborted; if it printed above, then behavior differs." echo "-- Case B: assignment with explicit || handler" bash -c 'set -e; x="$(false)" || { echo "B: custom error path reached"; exit 1; }; echo "B: reached after handler"' echo "-- Case C: export with substitution guarded by || (syntax variant)" bash -c 'set -e; export X="$(false)" || { echo "C: custom error path reached"; exit 1; }' echo "== Inspect tests/docker/util.sh around cited lines ==" # Print a small window around the mentioned lines nl -ba tests/docker/util.sh | sed -n '150,220p'

Repository: pingcap/tiflash

Length of output: 386

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "== util.sh: lines 160-220 ==" nl -ba tests/docker/util.sh | sed -n '160,220p' echo echo "== Find validate_local_binary definition ==" rg -n --hidden --no-ignore-vcs "validate_local_binary" tests/docker/util.sh # show surrounding function body python3 - <<'PY' import re, itertools, sys, pathlib p=pathlib.Path("tests/docker/util.sh") lines=p.read_text().splitlines() idxs=[i for i,l in enumerate(lines) if "validate_local_binary" in l and re.search(r'\bvalidate_local_binary\b', l)] print("Occurrences:", idxs[:10]) PY echo echo "== util.sh: show function validate_local_binary region (best-effort) ==" # naive: print from first match line-5 to line+60 start=$(rg -n "^[[:space:]]*validate_local_binary\\b" tests/docker/util.sh | head -n1 | cut -d: -f1) if [ -n "$start" ]; then s=$((start-5)) if [ $s -lt 1 ]; then s=1; fi e=$((start+120)) nl -ba tests/docker/util.sh | sed -n "${s},${e}p" else echo "Could not locate function definition with expected pattern." fi echo echo "== util.sh: locate any set -e usage ==" rg -n "set -e" tests/docker/util.sh echo echo "== util.sh: locate realpath usage ==" rg -n "realpath" tests/docker/util.sh

Repository: pingcap/tiflash

Length of output: 149

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "== util.sh: lines 160-220 ==" awk 'NR>=160 && NR<=220 {printf "%6d\t%s\n", NR, $0}' tests/docker/util.sh echo echo "== validate_local_binary occurrences (and show nearby) ==" rg -n "validate_local_binary" tests/docker/util.sh echo rg -n "validate_local_binary" tests/docker/util.sh -n # Show function definition region around the first match that looks like a function header start=$(rg -n "^[[:space:]]*(function[[:space:]]+)?validate_local_binary\\b" tests/docker/util.sh | head -n1 | cut -d: -f1 || true) if [ -n "${start}" ]; then s=$((start-10)); if [ $s -lt 1 ]; then s=1; fi e=$((start+140)) echo echo "== validate_local_binary region (around header line $start) ==" awk -v a="$s" -v b="$e" 'NR>=a && NR<=b {printf "%6d\t%s\n", NR, $0}' tests/docker/util.sh else echo echo "Could not find a function header for validate_local_binary." fi echo echo "== set -e usage in util.sh ==" rg -n "set -e" tests/docker/util.sh || true echo echo "== realpath usage in util.sh ==" rg -n "realpath" tests/docker/util.sh || true echo echo "== show exact realpath assignment lines around 179-199 ==" awk 'NR>=170 && NR<=210 {printf "%6d\t%s\n", NR, $0}' tests/docker/util.sh

Repository: pingcap/tiflash

Length of output: 11332

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n "validate_binaries" -S tests/docker || true rg -n "setup_next_gen_compose_files" -S tests/docker || true rg -n "append_local_binary_overrides" -S tests/docker || true

Repository: pingcap/tiflash

Length of output: 960

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n "setup_next_gen_compose_files\\b" -S . || true rg -n "append_local_binary_overrides\\b" -S . || true rg -n "validate_binaries" -S tests/docker || true

Repository: pingcap/tiflash

Length of output: 1167

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "== compose.sh: lines 1-120 (look for validate_binaries) ==" awk 'NR>=1 && NR<=120 {printf "%6d\t%s\n", NR, $0}' tests/fullstack-test-next-gen/compose.sh echo echo "== run.sh: lines 1-80 ==" awk 'NR>=1 && NR<=80 {printf "%6d\t%s\n", NR, $0}' tests/fullstack-test-next-gen/run.sh echo echo "== validate_binaries occurrences in fullstack-test-next-gen ==" rg -n "validate_binaries" tests/fullstack-test-next-gen || true

Repository: pingcap/tiflash

Length of output: 6743

Guard realpath handling when setting LOCAL_*_BIN_DIR in append_local_binary_overrides.

tests/docker/util.sh lines 179/187/197 unconditionally do export LOCAL_*_BIN_DIR="$(realpath ...)", but validate_local_binary only runs when ${validate_binaries}=="true"; tests/fullstack-test-next-gen/compose.sh sets validate_binaries=false for stop/down/ps/logs/exec/.... If realpath fails, the LOCAL_*_BIN_DIR can become empty, yielding override paths like /pd-server and confusing output instead of a clear/consistent failure.

🧰 Tools

🪛 Shellcheck (0.11.0)

[warning] 179-179: Declare and assign separately to avoid masking return values.

(SC2155)

[warning] 187-187: Declare and assign separately to avoid masking return values.

(SC2155)

[warning] 197-197: Declare and assign separately to avoid masking return values.

(SC2155)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/docker/util.sh` around lines 179 - 199, The append_local_binary_overrides block unconditionally runs realpath into LOCAL_PD_BIN_DIR/LOCAL_TIKV_BIN_DIR/LOCAL_TIDB_BIN_DIR which can produce empty values when realpath fails; change the logic to only call realpath (or at least validate its result) when validate_binaries is true, or check the return of realpath and bail/log a clear error before exporting the variable. Locate append_local_binary_overrides and update the sections that set LOCAL_PD_BIN_DIR, LOCAL_TIKV_BIN_DIR, and LOCAL_TIDB_BIN_DIR so they either (a) run realpath inside the if [[ "${validate_binaries}" == "true" ]] block before calling validate_local_binary, or (b) test the realpath output for non-empty/valid path and handle failure (error and exit) before adding compose_files and printing "Using local ..." messages; reference validate_local_binary, compose_files, and the LOCAL_*_BIN_DIR variables when making the change.

coderabbitai · 2026-05-21T10:03:20Z

 if [[ -n "$ENABLE_NEXT_GEN" && "$ENABLE_NEXT_GEN" != "false" && "$ENABLE_NEXT_GEN" != "0" ]]; then
    echo "Running fullstack test on next-gen TiFlash"

-    # set images for next-gen TiFlash cluster
-    HUB_ADDR="${HUB_ADDR:-us-docker.pkg.dev/pingcap-testing-account/tidbx}"
-    if [[ -z "${PD_BRANCH}" || "${PD_BRANCH}" == "master" ]]; then
-        PD_BRANCH="master-next-gen"
-    fi
-    if [[ -z "${TIKV_BRANCH}" || "${TIKV_BRANCH}" == "cloud-engine" ]]; then
-        TIKV_BRANCH="cloud-engine-next-gen"
-    fi
-    if [[ -z "${TIDB_BRANCH}" || "${TIDB_BRANCH}" == "master" ]]; then
-        TIDB_BRANCH="master-next-gen"
-    fi
-    export PD_IMAGE="${PD_IMAGE:-${HUB_ADDR}/tikv/pd/image:${PD_BRANCH}}"
-    export TIKV_IMAGE="${TIKV_IMAGE:-${HUB_ADDR}/tikv/tikv/image:${TIKV_BRANCH}}"
-    export TIDB_IMAGE="${TIDB_IMAGE:-${HUB_ADDR}/pingcap/tidb/images/tidb-server:${TIDB_BRANCH}}"
-
    # clean up previous docker instances, data and log
-    ${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" down
+    ${COMPOSE} "${COMPOSE_FILES[@]}" down
    clean_data_log
+    prepare_next_gen_data_dirs

-    ${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" up -d
+    ${COMPOSE} "${COMPOSE_FILES[@]}" up -d
    echo "PD version:"
-    ${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T pd0 bash -c '/pd-server -V'
+    ${COMPOSE} "${COMPOSE_FILES[@]}" exec -T pd0 bash -c '/pd-server -V'
    echo "TiDB version:"
-    ${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tidb0 bash -c '/tidb-server -V'
+    ${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tidb0 bash -c '/tidb-server -V'
    echo "TiKV version:"
-    ${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tikv0 bash -c '/tikv-server -V'
+    ${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tikv0 bash -c '/tikv-server -V'
+    echo "TiKV worker version:"
+    ${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tikv-worker0 bash -c '/tikv-worker -V'

    # run fullstack-tests
    wait_next_gen_env
    ENV_ARGS="ENABLE_NEXT_GEN=true verbose=${verbose} "
    # most failpoints are expected to be set on the compute layer, use tiflash-cn0 to run tests
-    ${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/sample.test"
-    ${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test-index"
-    ${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test-next-gen/placement"
-    ${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/clustered_index"
-    ${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/dml"
-    ${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/variables"
-    ${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/mpp"
+    ${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/sample.test"
+    ${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test-index"
+    ${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test-next-gen/placement"
+    ${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/clustered_index"
+    ${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/dml"
+    ${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/variables"
+    ${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/mpp"
    # maybe we need to split them into parallel pipelines because they take too long to run.
-    ${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/expr"
-    ${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/mpp"
-    ${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" down
+    ${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/expr"
+    ${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/mpp"
+    ${COMPOSE} "${COMPOSE_FILES[@]}" down
    clean_data_log


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Ensure teardown runs on failure, not only on success path.

With set -e, any failing test command exits before Line 64-65, so compose resources may leak and poison follow-up runs.

Suggested fix

if [[ -n "$ENABLE_NEXT_GEN" && "$ENABLE_NEXT_GEN" != "false" && "$ENABLE_NEXT_GEN" != "0" ]]; then echo "Running fullstack test on next-gen TiFlash" + cleanup() { + ${COMPOSE} "${COMPOSE_FILES[@]}" down || true + clean_data_log || true + } + trap cleanup EXIT # clean up previous docker instances, data and log ${COMPOSE} "${COMPOSE_FILES[@]}" down clean_data_log @@ - ${COMPOSE} "${COMPOSE_FILES[@]}" down - clean_data_log + trap - EXIT + cleanup exit 0 fi

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if [[ -n "$ENABLE_NEXT_GEN" && "$ENABLE_NEXT_GEN" != "false" && "$ENABLE_NEXT_GEN" != "0" ]]; then

echo "Running fullstack test on next-gen TiFlash"

# set images for next-gen TiFlash cluster

HUB_ADDR="${HUB_ADDR:-us-docker.pkg.dev/pingcap-testing-account/tidbx}"

if [[ -z "${PD_BRANCH}" || "${PD_BRANCH}" == "master" ]]; then

PD_BRANCH="master-next-gen"

fi

if [[ -z "${TIKV_BRANCH}" || "${TIKV_BRANCH}" == "cloud-engine" ]]; then

TIKV_BRANCH="cloud-engine-next-gen"

fi

if [[ -z "${TIDB_BRANCH}" || "${TIDB_BRANCH}" == "master" ]]; then

TIDB_BRANCH="master-next-gen"

fi

export PD_IMAGE="${PD_IMAGE:-${HUB_ADDR}/tikv/pd/image:${PD_BRANCH}}"

export TIKV_IMAGE="${TIKV_IMAGE:-${HUB_ADDR}/tikv/tikv/image:${TIKV_BRANCH}}"

export TIDB_IMAGE="${TIDB_IMAGE:-${HUB_ADDR}/pingcap/tidb/images/tidb-server:${TIDB_BRANCH}}"

# clean up previous docker instances, data and log

${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" down

${COMPOSE} "${COMPOSE_FILES[@]}" down

clean_data_log

prepare_next_gen_data_dirs

${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" up -d

${COMPOSE} "${COMPOSE_FILES[@]}" up -d

echo "PD version:"

${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T pd0 bash -c '/pd-server -V'

${COMPOSE} "${COMPOSE_FILES[@]}" exec -T pd0 bash -c '/pd-server -V'

echo "TiDB version:"

${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tidb0 bash -c '/tidb-server -V'

${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tidb0 bash -c '/tidb-server -V'

echo "TiKV version:"

${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tikv0 bash -c '/tikv-server -V'

${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tikv0 bash -c '/tikv-server -V'

echo "TiKV worker version:"

${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tikv-worker0 bash -c '/tikv-worker -V'

# run fullstack-tests

wait_next_gen_env

ENV_ARGS="ENABLE_NEXT_GEN=true verbose=${verbose} "

# most failpoints are expected to be set on the compute layer, use tiflash-cn0 to run tests

${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/sample.test"

${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test-index"

${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test-next-gen/placement"

${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/clustered_index"

${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/dml"

${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/variables"

${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/mpp"

${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/sample.test"

${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test-index"

${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test-next-gen/placement"

${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/clustered_index"

${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/dml"

${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/variables"

${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/mpp"

# maybe we need to split them into parallel pipelines because they take too long to run.

${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/expr"

${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/mpp"

${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" down

${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/expr"

${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/mpp"

${COMPOSE} "${COMPOSE_FILES[@]}" down

clean_data_log

if [[ -n "$ENABLE_NEXT_GEN" && "$ENABLE_NEXT_GEN" != "false" && "$ENABLE_NEXT_GEN" != "0" ]]; then

echo "Running fullstack test on next-gen TiFlash"

cleanup() {

${COMPOSE} "${COMPOSE_FILES[@]}" down || true

clean_data_log || true

}

trap cleanup EXIT

# clean up previous docker instances, data and log

${COMPOSE} "${COMPOSE_FILES[@]}" down

clean_data_log

prepare_next_gen_data_dirs

${COMPOSE} "${COMPOSE_FILES[@]}" up -d

echo "PD version:"

${COMPOSE} "${COMPOSE_FILES[@]}" exec -T pd0 bash -c '/pd-server -V'

echo "TiDB version:"

${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tidb0 bash -c '/tidb-server -V'

echo "TiKV version:"

${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tikv0 bash -c '/tikv-server -V'

echo "TiKV worker version:"

${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tikv-worker0 bash -c '/tikv-worker -V'

# run fullstack-tests

wait_next_gen_env

ENV_ARGS="ENABLE_NEXT_GEN=true verbose=${verbose} "

# most failpoints are expected to be set on the compute layer, use tiflash-cn0 to run tests

${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/sample.test"

${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test-index"

${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test-next-gen/placement"

${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/clustered_index"

${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/dml"

${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/variables"

${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/mpp"

# maybe we need to split them into parallel pipelines because they take too long to run.

${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/expr"

${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/mpp"

trap - EXIT

cleanup

exit 0

fi

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/fullstack-test-next-gen/run.sh` around lines 32 - 65, The script currently tears down Docker and cleans data only on the success path, so failing commands leak resources; introduce a guaranteed teardown by adding a cleanup function (invoking ${COMPOSE} "${COMPOSE_FILES[@]}" down and clean_data_log) and register it with trap to run on EXIT/ERR before running prepare_next_gen_data_dirs/wait_next_gen_env and the test invocations (references: COMPOSE, COMPOSE_FILES, prepare_next_gen_data_dirs, wait_next_gen_env, ENV_ARGS, clean_data_log); ensure the trap is installed early (after set -e) so the cleanup runs regardless of test failures and remove or avoid duplicate teardown at the end if the trap already handles it.

coderabbitai · 2026-05-21T10:03:20Z

+#!/bin/bash
+set -x
+
+cd "$(dirname "$0")"


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -u # Demonstrates why explicit guard matters without `set -e`. bash -c 'cd /definitely/not/exist; echo "still running in wrong dir"' bash -c 'cd /definitely/not/exist || exit 1; echo "unreachable"'

Repository: pingcap/tiflash

Length of output: 238

🏁 Script executed:

#!/bin/bash set -euo pipefail # Locate and show the file with line numbers ls -la tests/temp_run_tests.sh sed -n '1,120p' tests/temp_run_tests.sh | nl -ba # Check if any other scripts use a similar guard pattern rg -n --hidden --no-ignore-vcs 'cd "\$$dirname "\$0"$"' tests || true rg -n --hidden --no-ignore-vcs 'cd "\$$dirname "\$0"$" \|\| exit' -S . || true rg -n --hidden --no-ignore-vcs 'set -e|set -u|set -eu|pipefail' tests/temp_run_tests.sh || true # Check shellcheck availability and run it on the file if present if command -v shellcheck >/dev/null 2>&1; then shellcheck -x tests/temp_run_tests.sh || true else echo "shellcheck not available" fi

Repository: pingcap/tiflash

Length of output: 192

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "=== File: tests/temp_run_tests.sh (numbered) ===" cat -n tests/temp_run_tests.sh echo echo "=== Search for existing cd pattern in tests/ ===" rg -n --hidden --no-ignore-vcs 'cd "\$$dirname "\$0"$"' tests || true echo echo "=== Search for cd-with-guard usage in scripts ===" rg -n --hidden --no-ignore-vcs 'cd "\$$dirname "\$0"$"\s*\|\|\s*exit' tests || true rg -n --hidden --no-ignore-vcs 'cd "\$$dirname "\$0"$"\s*\|\|\s*(exit|return)' tests || true echo echo "=== Check shell options used in this file ===" rg -n --hidden --no-ignore-vcs 'set\s+-[euoEPaix]*|#!/bin/(ba)?sh' tests/temp_run_tests.sh || true echo echo "=== Attempt shellcheck if available ===" if command -v shellcheck >/dev/null 2>&1; then shellcheck -x tests/temp_run_tests.sh || true else echo "shellcheck not available" fi

Repository: pingcap/tiflash

Length of output: 1768

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "=== tests/run-gtest.sh around cd ===" sed -n '1,80p' tests/run-gtest.sh | cat -n echo echo "=== tests/fullstack-test-next-gen/compose.sh around cd ===" sed -n '1,120p' tests/fullstack-test-next-gen/compose.sh | cat -n

Repository: pingcap/tiflash

Length of output: 5661

Guard cd in tests/temp_run_tests.sh to fail fast.

tests/temp_run_tests.sh only sets set -x, so if cd "$(dirname "$0")" fails it keeps running from the wrong working directory (shellcheck SC2164).

Suggested fix

-cd "$(dirname "$0")" +cd "$(dirname "$0")" || exit 1

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

cd "$(dirname "$0")"

cd "$(dirname "$0")" || exit 1

🧰 Tools

🪛 Shellcheck (0.11.0)

[warning] 4-4: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/temp_run_tests.sh` at line 4, The script's cd command (cd "$(dirname "$0")") is unguarded and can silently fail; update tests/temp_run_tests.sh to check the cd return and fail fast (e.g., test the exit status and exit with a non-zero code or set -e before the cd) so the script doesn't continue in the wrong working directory; locate the cd "$(dirname "$0")" line and add a guard such as exiting on failure (or wrapping with a conditional that prints an error and exits) to ensure the script aborts if the chdir fails.

ti-chi-bot · 2026-05-21T11:00:06Z

@JaySon-Huang: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
pull-integration-next-gen	`533f217`	link	true	`/test pull-integration-next-gen`
pull-integration-test	`533f217`	link	true	`/test pull-integration-test`
pull-unit-next-gen	`533f217`	link	true	`/test pull-unit-next-gen`
pull-sanitizer-tsan	`533f217`	link	false	`/test pull-sanitizer-tsan`
pull-sanitizer-asan	`533f217`	link	false	`/test pull-sanitizer-asan`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Add [columnar_trace] logs across the columnar disaggregated read pipeline, bad_alloc guards in buildPushDownFilter/appendExtraCastsAfterTS, and RUNTIME_CHECK to confirm column_defines index misuse with generated columns. Update tiflash-proxy-columnar with matching filter debug logs.

…loop. genColumnDefinesForDisaggregatedRead already excludes generated columns from column_defines. The second-pass loop incorrectly indexed column_defines with table_scan indices, causing misaligned columns and std::bad_alloc. Fixes pingcap#10856

Signed-off-by: JaySon-Huang <tshent@qq.com>

Columnar nodes do not rely on kvstore, but restore still fetched legacy proxy config JSON and logged parse warnings against Columnar Hub output.

Signed-off-by: JaySon-Huang <tshent@qq.com>

ti-chi-bot · 2026-05-22T09:57:55Z

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

_{📖 For more info, you can check the "Contribute Code" section in the development guide.}

JaySon-Huang added 9 commits May 20, 2026 22:23

Enhance test scripts

ff1ba09

Signed-off-by: JaySon-Huang <tshent@qq.com>

fix enum encoding

f9054d1

Signed-off-by: JaySon-Huang <tshent@qq.com>

Fix partition table reading

a306c62

Signed-off-by: JaySon-Huang <tshent@qq.com>

add verify loop

bb3aeb1

Signed-off-by: JaySon-Huang <tshent@qq.com>

update new problem

6d09964

Signed-off-by: JaySon-Huang <tshent@qq.com>

prepare the scripts for setup a next-gen columnar cluster

9468516

Signed-off-by: JaySon-Huang <tshent@qq.com>

tests: expose tidb0 MySQL port on host for compose cluster

dae21a5

Add an optional EXPOSE_TIDB_PORT override so local debugging can reach tidb0 from the host without changing the base cluster yaml.

tests: fix next-gen compose cluster for columnar TiFlash CN

60e5575

Enable use_columnar on tiflash-cn0 and set pids_limit to -1 so fullstack tests can run without TiFlash write node or hitting container PID limits.

update AGENTS.md

533f217

Signed-off-by: JaySon-Huang <tshent@qq.com>

ti-chi-bot Bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels May 21, 2026

ti-chi-bot Bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 21, 2026

coderabbitai Bot reviewed May 21, 2026

View reviewed changes

JaySon-Huang marked this pull request as draft May 21, 2026 13:32

JaySon-Huang added 6 commits May 22, 2026 09:43

deprecated warnings

5881948

Signed-off-by: JaySon-Huang <tshent@qq.com>

Merge remote-tracking branch 'upstream/master' into jayson/temp_fix

4f11187

Skip kvstore restore for columnar compute during TMTContext init.

1398f07

Columnar nodes do not rely on kvstore, but restore still fetched legacy proxy config JSON and logged parse warnings against Columnar Hub output.

Fix config file

8ec0250

Signed-off-by: JaySon-Huang <tshent@qq.com>

	cmake --install cmake-build-debug --component=tiflash-release --prefix "${PROJECT_DIR}/tests/.build/tiflash"
	cmake --install cmake-build-debug --component=tiflash-release --prefix ./tests/.build/tiflash

	- logging are available under `tests/fullstack-test-next-gen/log`
	- Logs are available under `tests/fullstack-test-next-gen/log`

Conversation

JaySon-Huang commented May 21, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What problem does this PR solve?

What is changed and how it works?

Check List

Release note

Summary by CodeRabbit

Uh oh!

ti-chi-bot Bot commented May 21, 2026

Uh oh!

pantheon-ai Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

Suggested reviewers

❌ Failed checks (3 warnings)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

ti-chi-bot Bot commented May 21, 2026

Uh oh!

ti-chi-bot Bot commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

JaySon-Huang commented May 21, 2026 •

edited by coderabbitai Bot

Loading

pantheon-ai Bot commented May 21, 2026 •

edited

Loading

coderabbitai Bot commented May 21, 2026 •

edited

Loading