Skip to content

[WIP] test#10854

Draft
JaySon-Huang wants to merge 15 commits into
pingcap:masterfrom
JaySon-Huang:jayson/temp_fix
Draft

[WIP] test#10854
JaySon-Huang wants to merge 15 commits into
pingcap:masterfrom
JaySon-Huang:jayson/temp_fix

Conversation

@JaySon-Huang
Copy link
Copy Markdown
Contributor

@JaySon-Huang JaySon-Huang commented May 21, 2026

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:

What is changed and how it works?


Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Summary by CodeRabbit

  • New Features

    • Added test execution timeout support with configurable limits
    • Enhanced documentation with detailed columnar cluster setup guide
  • Bug Fixes

    • Fixed header alignment for partition scan operations
  • Improvements

    • Enhanced error messages for table scan validation with more detailed diagnostic information
  • Chores

    • Updated test configurations to enable columnar storage support
    • Improved test infrastructure and script reliability

Review Change Stack

Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Add an optional EXPOSE_TIDB_PORT override so local debugging can reach
tidb0 from the host without changing the base cluster yaml.
Enable use_columnar on tiflash-cn0 and set pids_limit to -1 so fullstack
tests can run without TiFlash write node or hitting container PID limits.
Signed-off-by: JaySon-Huang <tshent@qq.com>
@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels May 21, 2026
@ti-chi-bot
Copy link
Copy Markdown
Contributor

ti-chi-bot Bot commented May 21, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hehechen for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pantheon-ai
Copy link
Copy Markdown

pantheon-ai Bot commented May 21, 2026

@JaySon-Huang I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

ℹ️ Learn more details on Pantheon AI.

@ti-chi-bot ti-chi-bot Bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 21, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 21, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 5278585f-5470-40d3-b683-52ea4d5ae403

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR establishes a comprehensive testing infrastructure for TiFlash's next-generation columnar storage. It fixes core header alignment issues in the columnar read path, configures a fully disaggregated test cluster, introduces Docker orchestration utilities, enhances test execution with timeouts and error tracking, and provides complete documentation and verification workflows.

Changes

TiFlash Columnar Next-Gen Infrastructure

Layer / File(s) Summary
Columnar header alignment and validation fixes
dbms/src/Core/NamesAndTypes.h, dbms/src/Flash/Planner/Plans/PhysicalTableScan.cpp, dbms/src/Storages/StorageDisaggregatedColumnar.h, contrib/tiflash-proxy-columnar
Fixes schema/header validation in columnar reads by updating RNProxyInputStream and RNProxySourceOp header initialization to use AddExtraTableIDColumnTransformAction::getHeader(), ensuring alignment with genNamesAndTypesForTableScan when _tidb_tid is involved. Improves exception messages for schema size and column type mismatches with structured JSON output. Bumps tiflash-proxy-columnar submodule.
Next-gen cluster configuration
tests/docker/next-gen-config/tidb.toml, tests/docker/next-gen-config/tiflash_cn.toml, tests/docker/next-gen-config/tikv.toml, tests/docker/next-gen-yaml/cluster.yaml, tests/docker/next-gen-yaml/disagg_tiflash.yaml
Defines TiDB, TiFlash CN, and TiKV configurations for columnar disaggregated testing: enables columnar storage type, configures service scope, adds KV-engine columnar and FTS flags, and sets container resource limits (removes version, adjusts pids_limit to -1, removes tiflash-wn0).
Docker Compose local binary override files
tests/docker/next-gen-yaml/override/expose_tidb.yaml, tests/docker/next-gen-yaml/override/local_pd.yaml, tests/docker/next-gen-yaml/override/local_tidb.yaml, tests/docker/next-gen-yaml/override/local_tikv.yaml
Adds reusable override YAML files to mount locally built binaries (PD, TiDB, TiKV) and expose TiDB port, enabling rapid iteration without container rebuilds.
Test infrastructure helpers and environment
tests/docker/util.sh, tests/fullstack-test-next-gen/_env.sh
Introduces shell helper functions (prepare_next_gen_data_dirs, validate_local_binary, append_local_binary_overrides, setup_next_gen_compose_files) and environment setup (_env.sh) that exports Docker image URLs, branch normalization logic, and local binary directory overrides.
Docker Compose orchestrator wrapper
tests/fullstack-test-next-gen/compose.sh
Implements a command-line wrapper around Docker Compose that sources environment, conditionally disables binary validation for certain subcommands, assembles compose files via helper functions, and prepares data directories before cluster operations.
Test execution framework improvements
tests/run-test.py, tests/run-test.sh, tests/fullstack-test-next-gen/run.sh, tests/temp_run_tests.sh
Adds per-test execution timeouts via SIGALRM in Python, improves shell test runner with failure aggregation and reliable recursive discovery using process substitution, refactors run.sh to use centralized helpers, and introduces temp_run_tests.sh for suite-level execution.
Documentation and verification guides
tests/AGENTS.md, tests/.gitignore, verify_loop.md
Replaces AGENTS.md with comprehensive next-gen setup guide covering TiFlash build, cluster prerequisites, _env.sh configuration, compose.sh commands, troubleshooting, and test execution. Adds *.log to gitignore. Introduces verify_loop.md with complete TiFlash build/cluster startup instructions, MySQL connection setup, and three concrete verification cases (enum PK columnar, partitioned table MPP, BIGINT PK boundary cases) with one-key reproduction scripts.

Sequence Diagram(s)

sequenceDiagram
  participant TiDB
  participant PhysicalTableScan
  participant StorageDisaggregatedColumnar
  participant RNProxyInputStream
  TiDB->>PhysicalTableScan: validate schema against storage header
  PhysicalTableScan->>PhysicalTableScan: check column count match
  alt schema size mismatch
    PhysicalTableScan->>PhysicalTableScan: throw with dumpJsonStructure
  end
  alt column type mismatch
    PhysicalTableScan->>PhysicalTableScan: throw with table_id and column details
  end
  StorageDisaggregatedColumnar->>RNProxyInputStream: initialize header via action.getHeader()
  RNProxyInputStream->>RNProxyInputStream: align with _tidb_tid from genNamesAndTypesForTableScan
Loading
sequenceDiagram
  participant run-test.sh
  participant run-test.py
  participant SIGALRM
  participant temp_run_tests.sh
  temp_run_tests.sh->>run-test.sh: iterate test suites
  run-test.sh->>run-test.sh: initialize FAILED_TESTS=0
  run-test.sh->>run-test.sh: run_file, run_dir with recursion
  run-test.sh->>run-test.py: execute with timeout
  run-test.py->>run-test.py: set alarm for TEST_TIMEOUT
  run-test.py->>SIGALRM: await interrupt on timeout
  alt timeout occurs
    SIGALRM->>run-test.py: raise TestTimeoutError
    run-test.py->>run-test.py: print file path and timeout
  end
  run-test.py-->>run-test.sh: return status
  run-test.sh->>run-test.sh: increment FAILED_TESTS on error
  run-test.sh->>run-test.sh: check FAILED_TESTS after all tests
  alt failures detected
    run-test.sh->>run-test.sh: print Total failed and exit 1
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Possibly related PRs

  • pingcap/tiflash#10675: Both PRs modify tests/AGENTS.md to improve next-gen test setup documentation.
  • pingcap/tiflash#10850: Related through StorageDisaggregatedColumnar adjustments affecting disaggregated columnar read paths.
  • pingcap/tiflash#10842: Related through tiflash-proxy-columnar submodule updates and RNProxyInputStream/RNProxySourceOp columnar data source wiring.

Suggested labels

size/XXL, type/feature, component/disaggregated, component/test-infra, area/columnar

Suggested reviewers

  • JinheLin
  • yongman
  • CalvinNeo

🐰 A columnar dream comes true today,
With headers aligned in every way,
The proxy now dances with perfect grace,
While test clusters hum at a steady pace,
From verification scripts to compose's embrace!

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (3 warnings)

Check name Status Explanation Resolution
Title check ⚠️ Warning The title '[WIP] test' is vague and generic, providing no meaningful information about the actual changeset. Replace with a descriptive title summarizing the main changes, such as 'Add next-gen columnar cluster test infrastructure and fixes' or similar based on the actual primary objectives.
Description check ⚠️ Warning The description is essentially a blank template with no substantive content filled in; problem statement, changes, and issue references are missing. Fill in the template sections with actual problem description, detailed changes summary, checked test items, and affected issue number.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (2)
verify_loop.md (2)

151-156: 💤 Low value

Specify language for fenced code block.

The fenced code block should specify a language identifier for better clarity and to satisfy markdown linting rules.

📝 Suggested fix
-```
+```text
 pk: tidb, tiflash
 pk+0: tidb/1, tiflash/4

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @verify_loop.md around lines 151 - 156, The fenced code block containing the
lines "pk: tidb, tiflash" and "pk+0: tidb/1, tiflash/4" lacks a language
identifier and the second line ordering should match the suggested fix; update
the block to use a language tag (e.g., ```text) and replace the block contents
with the corrected lines exactly as shown: "pk: tidb, tiflash" and "pk+0:
tidb/1, tiflash/4" so linting and clarity are satisfied.


</details>

---

`254-258`: _💤 Low value_

**Specify language for fenced code block.**

The fenced code block should specify a language identifier for better clarity and to satisfy markdown linting rules.




<details>
<summary>📝 Suggested fix</summary>

```diff
-```
+```text
 | a                   | col  |
 |                   0 |    3 |
 | 9223372036854775807 |    2 |
 ```
```

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @verify_loop.md around lines 254 - 258, The fenced code block in
verify_loop.md lacks a language identifier; update the triple-backtick fence
that precedes the table to include a language (e.g., text or text-table)
so the block is explicitly marked (modify the existing fenced block around the
table content shown in the diff).


</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @tests/AGENTS.md:

  • Line 47: The README command uses an undefined variable ${PROJECT_DIR};
    update the instruction so it works out-of-the-box by either replacing
    ${PROJECT_DIR} with a relative path (./tests/.build/tiflash) in the shown
    command or add a one-line note above the command stating that ${PROJECT_DIR}
    should be set to the repository root path (e.g., export PROJECT_DIR="$(pwd)"
    from repo root) so readers know how to define it before running the cmake
    --install command.
  • Line 118: Replace the ungrammatical phrase "logging are available under
    tests/fullstack-test-next-gen/log" in the tests/AGENTS.md content with a
    correct form; update the sentence to either "logs are available under
    tests/fullstack-test-next-gen/log" or "logging is available under
    tests/fullstack-test-next-gen/log" (prefer "logs are available..." for plural
    noun consistency).

In @tests/docker/util.sh:

  • Around line 179-199: The append_local_binary_overrides block unconditionally
    runs realpath into LOCAL_PD_BIN_DIR/LOCAL_TIKV_BIN_DIR/LOCAL_TIDB_BIN_DIR which
    can produce empty values when realpath fails; change the logic to only call
    realpath (or at least validate its result) when validate_binaries is true, or
    check the return of realpath and bail/log a clear error before exporting the
    variable. Locate append_local_binary_overrides and update the sections that set
    LOCAL_PD_BIN_DIR, LOCAL_TIKV_BIN_DIR, and LOCAL_TIDB_BIN_DIR so they either (a)
    run realpath inside the if [[ "${validate_binaries}" == "true" ]] block before
    calling validate_local_binary, or (b) test the realpath output for
    non-empty/valid path and handle failure (error and exit) before adding
    compose_files and printing "Using local ..." messages; reference
    validate_local_binary, compose_files, and the LOCAL_*_BIN_DIR variables when
    making the change.

In @tests/fullstack-test-next-gen/run.sh:

  • Around line 32-65: The script currently tears down Docker and cleans data only
    on the success path, so failing commands leak resources; introduce a guaranteed
    teardown by adding a cleanup function (invoking ${COMPOSE} "${COMPOSE_FILES[@]}"
    down and clean_data_log) and register it with trap to run on EXIT/ERR before
    running prepare_next_gen_data_dirs/wait_next_gen_env and the test invocations
    (references: COMPOSE, COMPOSE_FILES, prepare_next_gen_data_dirs,
    wait_next_gen_env, ENV_ARGS, clean_data_log); ensure the trap is installed early
    (after set -e) so the cleanup runs regardless of test failures and remove or
    avoid duplicate teardown at the end if the trap already handles it.

In @tests/temp_run_tests.sh:

  • Line 4: The script's cd command (cd "$(dirname "$0")") is unguarded and can
    silently fail; update tests/temp_run_tests.sh to check the cd return and fail
    fast (e.g., test the exit status and exit with a non-zero code or set -e before
    the cd) so the script doesn't continue in the wrong working directory; locate
    the cd "$(dirname "$0")" line and add a guard such as exiting on failure (or
    wrapping with a conditional that prints an error and exits) to ensure the script
    aborts if the chdir fails.

Nitpick comments:
In @verify_loop.md:

  • Around line 151-156: The fenced code block containing the lines "pk: tidb,
    tiflash" and "pk+0: tidb/1, tiflash/4" lacks a language identifier and the
    second line ordering should match the suggested fix; update the block to use a
    language tag (e.g., ```text) and replace the block contents with the corrected
    lines exactly as shown: "pk: tidb, tiflash" and "pk+0: tidb/1, tiflash/4" so
    linting and clarity are satisfied.
  • Around line 254-258: The fenced code block in verify_loop.md lacks a language
    identifier; update the triple-backtick fence that precedes the table to include
    a language (e.g., text or text-table) so the block is explicitly marked
    (modify the existing fenced block around the table content shown in the diff).

</details>

<details>
<summary>🪄 Autofix (Beta)</summary>

Fix all unresolved CodeRabbit comments on this PR:

- [ ] <!-- {"checkboxId": "4b0d0e0a-96d7-4f10-b296-3a18ea78f0b9"} --> Push a commit to this branch (recommended)
- [ ] <!-- {"checkboxId": "ff5b1114-7d8c-49e6-8ac1-43f82af23a33"} --> Create a new PR with the fixes

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Repository UI

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `89c11643-e222-4bd6-adde-7a1053902b4f`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 7a5550a1b586249b519e633ffa2673107457946c and 533f2170575de8dda70a4ccc0813de81a0f3ba4c.

</details>

<details>
<summary>📒 Files selected for processing (23)</summary>

* `contrib/tiflash-proxy-columnar`
* `dbms/src/Core/NamesAndTypes.h`
* `dbms/src/Flash/Planner/Plans/PhysicalTableScan.cpp`
* `dbms/src/Storages/StorageDisaggregatedColumnar.h`
* `tests/.gitignore`
* `tests/AGENTS.md`
* `tests/docker/next-gen-config/tidb.toml`
* `tests/docker/next-gen-config/tiflash_cn.toml`
* `tests/docker/next-gen-config/tikv.toml`
* `tests/docker/next-gen-yaml/cluster.yaml`
* `tests/docker/next-gen-yaml/disagg_tiflash.yaml`
* `tests/docker/next-gen-yaml/override/expose_tidb.yaml`
* `tests/docker/next-gen-yaml/override/local_pd.yaml`
* `tests/docker/next-gen-yaml/override/local_tidb.yaml`
* `tests/docker/next-gen-yaml/override/local_tikv.yaml`
* `tests/docker/util.sh`
* `tests/fullstack-test-next-gen/_env.sh`
* `tests/fullstack-test-next-gen/compose.sh`
* `tests/fullstack-test-next-gen/run.sh`
* `tests/run-test.py`
* `tests/run-test.sh`
* `tests/temp_run_tests.sh`
* `verify_loop.md`

</details>

<details>
<summary>💤 Files with no reviewable changes (1)</summary>

* dbms/src/Core/NamesAndTypes.h

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

Comment thread tests/AGENTS.md
# From repository root
cmake --preset dev -DENABLE_NEXT_GEN=ON -DENABLE_NEXT_GEN_COLUMNAR=ON
cmake --build --preset dev --target tiflash
cmake --install cmake-build-debug --component=tiflash-release --prefix "${PROJECT_DIR}/tests/.build/tiflash"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clarify the ${PROJECT_DIR} variable.

The ${PROJECT_DIR} variable is used without being defined. Since the preceding comment says "From repository root", consider either:

  • Replacing it with ./tests/.build/tiflash to make it relative to the current directory
  • Or adding a note explaining that ${PROJECT_DIR} should be the repository root path
📝 Suggested fix
-cmake --install cmake-build-debug --component=tiflash-release --prefix "${PROJECT_DIR}/tests/.build/tiflash"
+cmake --install cmake-build-debug --component=tiflash-release --prefix ./tests/.build/tiflash
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cmake --install cmake-build-debug --component=tiflash-release --prefix "${PROJECT_DIR}/tests/.build/tiflash"
cmake --install cmake-build-debug --component=tiflash-release --prefix ./tests/.build/tiflash
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/AGENTS.md` at line 47, The README command uses an undefined variable
`${PROJECT_DIR}`; update the instruction so it works out-of-the-box by either
replacing `${PROJECT_DIR}` with a relative path (`./tests/.build/tiflash`) in
the shown command or add a one-line note above the command stating that
`${PROJECT_DIR}` should be set to the repository root path (e.g., export
PROJECT_DIR="$(pwd)" from repo root) so readers know how to define it before
running the cmake --install command.

Comment thread tests/AGENTS.md
#### Troubleshooting
- **`tests/.build/tiflash` missing:** run the build/install step above before `./compose.sh up`.
- **Local binary not picked up:** confirm `LOCAL_*_BIN_DIR` is exported in `_env.sh` or on the command line before `./compose.sh up`.
- logging are available under `tests/fullstack-test-next-gen/log`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix grammar issue.

"logging are available" should be either "logs are available" or "logging is available".

📝 Suggested fix
-- logging are available under `tests/fullstack-test-next-gen/log`
+- Logs are available under `tests/fullstack-test-next-gen/log`
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- logging are available under `tests/fullstack-test-next-gen/log`
- Logs are available under `tests/fullstack-test-next-gen/log`
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/AGENTS.md` at line 118, Replace the ungrammatical phrase "logging are
available under `tests/fullstack-test-next-gen/log`" in the tests/AGENTS.md
content with a correct form; update the sentence to either "logs are available
under `tests/fullstack-test-next-gen/log`" or "logging is available under
`tests/fullstack-test-next-gen/log`" (prefer "logs are available..." for plural
noun consistency).

Comment thread tests/docker/util.sh
Comment on lines +179 to +199
export LOCAL_PD_BIN_DIR="$(realpath "${LOCAL_PD_BIN_DIR}")"
if [[ "${validate_binaries}" == "true" ]]; then
validate_local_binary "${LOCAL_PD_BIN_DIR}/pd-server" "pd-server"
fi
compose_files+=(-f "${override_dir}/local_pd.yaml")
echo "Using local PD binary: ${LOCAL_PD_BIN_DIR}/pd-server"
fi
if [[ -n "${LOCAL_TIKV_BIN_DIR:-}" ]]; then
export LOCAL_TIKV_BIN_DIR="$(realpath "${LOCAL_TIKV_BIN_DIR}")"
if [[ "${validate_binaries}" == "true" ]]; then
validate_local_binary "${LOCAL_TIKV_BIN_DIR}/tikv-server" "tikv-server"
validate_local_binary "${LOCAL_TIKV_BIN_DIR}/tikv-worker" "tikv-worker"
fi
compose_files+=(-f "${override_dir}/local_tikv.yaml")
echo "Using local TiKV binary: ${LOCAL_TIKV_BIN_DIR}/tikv-server"
echo "Using local TiKV worker binary: ${LOCAL_TIKV_BIN_DIR}/tikv-worker"
fi
if [[ -n "${LOCAL_TIDB_BIN_DIR:-}" ]]; then
export LOCAL_TIDB_BIN_DIR="$(realpath "${LOCAL_TIDB_BIN_DIR}")"
if [[ "${validate_binaries}" == "true" ]]; then
validate_local_binary "${LOCAL_TIDB_BIN_DIR}/tidb-server" "tidb-server"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Repro: inline assignment exits before custom validation"
bash -c 'set -e; export X="$(realpath /definitely/not/exist)"; echo "unreachable"' || echo "exited early as expected"

echo "Repro: separated assignment allows explicit error handling"
bash -c 'set -e; x="$(realpath /definitely/not/exist)" || { echo "custom error path reached"; exit 1; }'

Repository: pingcap/tiflash

Length of output: 314


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Bash set -e behavior with failing command substitution =="
echo "-- Case A: export VAR=\"\$(false)\" (expect: may or may not exit depending on bash semantics)"
bash -c 'set -e; export X="$(false)"; echo "A: reached after export"'
echo "-- Case A should have aborted; if it printed above, then behavior differs."

echo "-- Case B: assignment with explicit || handler"
bash -c 'set -e; x="$(false)" || { echo "B: custom error path reached"; exit 1; }; echo "B: reached after handler"'

echo "-- Case C: export with substitution guarded by || (syntax variant)"
bash -c 'set -e; export X="$(false)" || { echo "C: custom error path reached"; exit 1; }'

echo "== Inspect tests/docker/util.sh around cited lines =="
# Print a small window around the mentioned lines
nl -ba tests/docker/util.sh | sed -n '150,220p'

Repository: pingcap/tiflash

Length of output: 386


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== util.sh: lines 160-220 =="
nl -ba tests/docker/util.sh | sed -n '160,220p'

echo
echo "== Find validate_local_binary definition =="
rg -n --hidden --no-ignore-vcs "validate_local_binary" tests/docker/util.sh
# show surrounding function body
python3 - <<'PY'
import re, itertools, sys, pathlib
p=pathlib.Path("tests/docker/util.sh")
lines=p.read_text().splitlines()
idxs=[i for i,l in enumerate(lines) if "validate_local_binary" in l and re.search(r'\bvalidate_local_binary\b', l)]
print("Occurrences:", idxs[:10])
PY

echo
echo "== util.sh: show function validate_local_binary region (best-effort) =="
# naive: print from first match line-5 to line+60
start=$(rg -n "^[[:space:]]*validate_local_binary\\b" tests/docker/util.sh | head -n1 | cut -d: -f1)
if [ -n "$start" ]; then
  s=$((start-5))
  if [ $s -lt 1 ]; then s=1; fi
  e=$((start+120))
  nl -ba tests/docker/util.sh | sed -n "${s},${e}p"
else
  echo "Could not locate function definition with expected pattern."
fi

echo
echo "== util.sh: locate any set -e usage =="
rg -n "set -e" tests/docker/util.sh

echo
echo "== util.sh: locate realpath usage =="
rg -n "realpath" tests/docker/util.sh

Repository: pingcap/tiflash

Length of output: 149


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== util.sh: lines 160-220 =="
awk 'NR>=160 && NR<=220 {printf "%6d\t%s\n", NR, $0}' tests/docker/util.sh

echo
echo "== validate_local_binary occurrences (and show nearby) =="
rg -n "validate_local_binary" tests/docker/util.sh
echo
rg -n "validate_local_binary" tests/docker/util.sh -n

# Show function definition region around the first match that looks like a function header
start=$(rg -n "^[[:space:]]*(function[[:space:]]+)?validate_local_binary\\b" tests/docker/util.sh | head -n1 | cut -d: -f1 || true)
if [ -n "${start}" ]; then
  s=$((start-10)); if [ $s -lt 1 ]; then s=1; fi
  e=$((start+140))
  echo
  echo "== validate_local_binary region (around header line $start) =="
  awk -v a="$s" -v b="$e" 'NR>=a && NR<=b {printf "%6d\t%s\n", NR, $0}' tests/docker/util.sh
else
  echo
  echo "Could not find a function header for validate_local_binary."
fi

echo
echo "== set -e usage in util.sh =="
rg -n "set -e" tests/docker/util.sh || true

echo
echo "== realpath usage in util.sh =="
rg -n "realpath" tests/docker/util.sh || true

echo
echo "== show exact realpath assignment lines around 179-199 =="
awk 'NR>=170 && NR<=210 {printf "%6d\t%s\n", NR, $0}' tests/docker/util.sh

Repository: pingcap/tiflash

Length of output: 11332


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "validate_binaries" -S tests/docker || true
rg -n "setup_next_gen_compose_files" -S tests/docker || true
rg -n "append_local_binary_overrides" -S tests/docker || true

Repository: pingcap/tiflash

Length of output: 960


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "setup_next_gen_compose_files\\b" -S . || true
rg -n "append_local_binary_overrides\\b" -S . || true
rg -n "validate_binaries" -S tests/docker || true

Repository: pingcap/tiflash

Length of output: 1167


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== compose.sh: lines 1-120 (look for validate_binaries) =="
awk 'NR>=1 && NR<=120 {printf "%6d\t%s\n", NR, $0}' tests/fullstack-test-next-gen/compose.sh

echo
echo "== run.sh: lines 1-80 =="
awk 'NR>=1 && NR<=80 {printf "%6d\t%s\n", NR, $0}' tests/fullstack-test-next-gen/run.sh

echo
echo "== validate_binaries occurrences in fullstack-test-next-gen =="
rg -n "validate_binaries" tests/fullstack-test-next-gen || true

Repository: pingcap/tiflash

Length of output: 6743


Guard realpath handling when setting LOCAL_*_BIN_DIR in append_local_binary_overrides.

tests/docker/util.sh lines 179/187/197 unconditionally do export LOCAL_*_BIN_DIR="$(realpath ...)", but validate_local_binary only runs when ${validate_binaries}=="true"; tests/fullstack-test-next-gen/compose.sh sets validate_binaries=false for stop/down/ps/logs/exec/.... If realpath fails, the LOCAL_*_BIN_DIR can become empty, yielding override paths like /pd-server and confusing output instead of a clear/consistent failure.

🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 179-179: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 187-187: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 197-197: Declare and assign separately to avoid masking return values.

(SC2155)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/docker/util.sh` around lines 179 - 199, The
append_local_binary_overrides block unconditionally runs realpath into
LOCAL_PD_BIN_DIR/LOCAL_TIKV_BIN_DIR/LOCAL_TIDB_BIN_DIR which can produce empty
values when realpath fails; change the logic to only call realpath (or at least
validate its result) when validate_binaries is true, or check the return of
realpath and bail/log a clear error before exporting the variable. Locate
append_local_binary_overrides and update the sections that set LOCAL_PD_BIN_DIR,
LOCAL_TIKV_BIN_DIR, and LOCAL_TIDB_BIN_DIR so they either (a) run realpath
inside the if [[ "${validate_binaries}" == "true" ]] block before calling
validate_local_binary, or (b) test the realpath output for non-empty/valid path
and handle failure (error and exit) before adding compose_files and printing
"Using local ..." messages; reference validate_local_binary, compose_files, and
the LOCAL_*_BIN_DIR variables when making the change.

Comment on lines 32 to 65
if [[ -n "$ENABLE_NEXT_GEN" && "$ENABLE_NEXT_GEN" != "false" && "$ENABLE_NEXT_GEN" != "0" ]]; then
echo "Running fullstack test on next-gen TiFlash"

# set images for next-gen TiFlash cluster
HUB_ADDR="${HUB_ADDR:-us-docker.pkg.dev/pingcap-testing-account/tidbx}"
if [[ -z "${PD_BRANCH}" || "${PD_BRANCH}" == "master" ]]; then
PD_BRANCH="master-next-gen"
fi
if [[ -z "${TIKV_BRANCH}" || "${TIKV_BRANCH}" == "cloud-engine" ]]; then
TIKV_BRANCH="cloud-engine-next-gen"
fi
if [[ -z "${TIDB_BRANCH}" || "${TIDB_BRANCH}" == "master" ]]; then
TIDB_BRANCH="master-next-gen"
fi
export PD_IMAGE="${PD_IMAGE:-${HUB_ADDR}/tikv/pd/image:${PD_BRANCH}}"
export TIKV_IMAGE="${TIKV_IMAGE:-${HUB_ADDR}/tikv/tikv/image:${TIKV_BRANCH}}"
export TIDB_IMAGE="${TIDB_IMAGE:-${HUB_ADDR}/pingcap/tidb/images/tidb-server:${TIDB_BRANCH}}"

# clean up previous docker instances, data and log
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" down
${COMPOSE} "${COMPOSE_FILES[@]}" down
clean_data_log
prepare_next_gen_data_dirs

${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" up -d
${COMPOSE} "${COMPOSE_FILES[@]}" up -d
echo "PD version:"
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T pd0 bash -c '/pd-server -V'
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T pd0 bash -c '/pd-server -V'
echo "TiDB version:"
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tidb0 bash -c '/tidb-server -V'
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tidb0 bash -c '/tidb-server -V'
echo "TiKV version:"
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tikv0 bash -c '/tikv-server -V'
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tikv0 bash -c '/tikv-server -V'
echo "TiKV worker version:"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tikv-worker0 bash -c '/tikv-worker -V'

# run fullstack-tests
wait_next_gen_env
ENV_ARGS="ENABLE_NEXT_GEN=true verbose=${verbose} "
# most failpoints are expected to be set on the compute layer, use tiflash-cn0 to run tests
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/sample.test"
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test-index"
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test-next-gen/placement"
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/clustered_index"
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/dml"
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/variables"
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/mpp"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/sample.test"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test-index"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test-next-gen/placement"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/clustered_index"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/dml"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/variables"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/mpp"
# maybe we need to split them into parallel pipelines because they take too long to run.
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/expr"
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/mpp"
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" down
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/expr"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/mpp"
${COMPOSE} "${COMPOSE_FILES[@]}" down
clean_data_log
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Ensure teardown runs on failure, not only on success path.

With set -e, any failing test command exits before Line 64-65, so compose resources may leak and poison follow-up runs.

Suggested fix
 if [[ -n "$ENABLE_NEXT_GEN" && "$ENABLE_NEXT_GEN" != "false" && "$ENABLE_NEXT_GEN" != "0" ]]; then
     echo "Running fullstack test on next-gen TiFlash"
+    cleanup() {
+        ${COMPOSE} "${COMPOSE_FILES[@]}" down || true
+        clean_data_log || true
+    }
+    trap cleanup EXIT
 
     # clean up previous docker instances, data and log
     ${COMPOSE} "${COMPOSE_FILES[@]}" down
     clean_data_log
@@
-    ${COMPOSE} "${COMPOSE_FILES[@]}" down
-    clean_data_log
+    trap - EXIT
+    cleanup
 
     exit 0
 fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if [[ -n "$ENABLE_NEXT_GEN" && "$ENABLE_NEXT_GEN" != "false" && "$ENABLE_NEXT_GEN" != "0" ]]; then
echo "Running fullstack test on next-gen TiFlash"
# set images for next-gen TiFlash cluster
HUB_ADDR="${HUB_ADDR:-us-docker.pkg.dev/pingcap-testing-account/tidbx}"
if [[ -z "${PD_BRANCH}" || "${PD_BRANCH}" == "master" ]]; then
PD_BRANCH="master-next-gen"
fi
if [[ -z "${TIKV_BRANCH}" || "${TIKV_BRANCH}" == "cloud-engine" ]]; then
TIKV_BRANCH="cloud-engine-next-gen"
fi
if [[ -z "${TIDB_BRANCH}" || "${TIDB_BRANCH}" == "master" ]]; then
TIDB_BRANCH="master-next-gen"
fi
export PD_IMAGE="${PD_IMAGE:-${HUB_ADDR}/tikv/pd/image:${PD_BRANCH}}"
export TIKV_IMAGE="${TIKV_IMAGE:-${HUB_ADDR}/tikv/tikv/image:${TIKV_BRANCH}}"
export TIDB_IMAGE="${TIDB_IMAGE:-${HUB_ADDR}/pingcap/tidb/images/tidb-server:${TIDB_BRANCH}}"
# clean up previous docker instances, data and log
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" down
${COMPOSE} "${COMPOSE_FILES[@]}" down
clean_data_log
prepare_next_gen_data_dirs
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" up -d
${COMPOSE} "${COMPOSE_FILES[@]}" up -d
echo "PD version:"
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T pd0 bash -c '/pd-server -V'
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T pd0 bash -c '/pd-server -V'
echo "TiDB version:"
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tidb0 bash -c '/tidb-server -V'
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tidb0 bash -c '/tidb-server -V'
echo "TiKV version:"
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tikv0 bash -c '/tikv-server -V'
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tikv0 bash -c '/tikv-server -V'
echo "TiKV worker version:"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tikv-worker0 bash -c '/tikv-worker -V'
# run fullstack-tests
wait_next_gen_env
ENV_ARGS="ENABLE_NEXT_GEN=true verbose=${verbose} "
# most failpoints are expected to be set on the compute layer, use tiflash-cn0 to run tests
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/sample.test"
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test-index"
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test-next-gen/placement"
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/clustered_index"
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/dml"
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/variables"
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/mpp"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/sample.test"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test-index"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test-next-gen/placement"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/clustered_index"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/dml"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/variables"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/mpp"
# maybe we need to split them into parallel pipelines because they take too long to run.
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/expr"
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/mpp"
${COMPOSE} -f next-gen-cluster.yaml -f "${DISAGG_TIFLASH_YAML}" down
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/expr"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/mpp"
${COMPOSE} "${COMPOSE_FILES[@]}" down
clean_data_log
if [[ -n "$ENABLE_NEXT_GEN" && "$ENABLE_NEXT_GEN" != "false" && "$ENABLE_NEXT_GEN" != "0" ]]; then
echo "Running fullstack test on next-gen TiFlash"
cleanup() {
${COMPOSE} "${COMPOSE_FILES[@]}" down || true
clean_data_log || true
}
trap cleanup EXIT
# clean up previous docker instances, data and log
${COMPOSE} "${COMPOSE_FILES[@]}" down
clean_data_log
prepare_next_gen_data_dirs
${COMPOSE} "${COMPOSE_FILES[@]}" up -d
echo "PD version:"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T pd0 bash -c '/pd-server -V'
echo "TiDB version:"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tidb0 bash -c '/tidb-server -V'
echo "TiKV version:"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tikv0 bash -c '/tikv-server -V'
echo "TiKV worker version:"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tikv-worker0 bash -c '/tikv-worker -V'
# run fullstack-tests
wait_next_gen_env
ENV_ARGS="ENABLE_NEXT_GEN=true verbose=${verbose} "
# most failpoints are expected to be set on the compute layer, use tiflash-cn0 to run tests
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/sample.test"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test-index"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test-next-gen/placement"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/clustered_index"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/dml"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/variables"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test2/mpp"
# maybe we need to split them into parallel pipelines because they take too long to run.
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/expr"
${COMPOSE} "${COMPOSE_FILES[@]}" exec -T tiflash-cn0 bash -c "cd /tests ; ${ENV_ARGS} ./run-test.sh fullstack-test/mpp"
trap - EXIT
cleanup
exit 0
fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/fullstack-test-next-gen/run.sh` around lines 32 - 65, The script
currently tears down Docker and cleans data only on the success path, so failing
commands leak resources; introduce a guaranteed teardown by adding a cleanup
function (invoking ${COMPOSE} "${COMPOSE_FILES[@]}" down and clean_data_log) and
register it with trap to run on EXIT/ERR before running
prepare_next_gen_data_dirs/wait_next_gen_env and the test invocations
(references: COMPOSE, COMPOSE_FILES, prepare_next_gen_data_dirs,
wait_next_gen_env, ENV_ARGS, clean_data_log); ensure the trap is installed early
(after set -e) so the cleanup runs regardless of test failures and remove or
avoid duplicate teardown at the end if the trap already handles it.

Comment thread tests/temp_run_tests.sh
#!/bin/bash
set -x

cd "$(dirname "$0")"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -u
# Demonstrates why explicit guard matters without `set -e`.
bash -c 'cd /definitely/not/exist; echo "still running in wrong dir"'
bash -c 'cd /definitely/not/exist || exit 1; echo "unreachable"'

Repository: pingcap/tiflash

Length of output: 238


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate and show the file with line numbers
ls -la tests/temp_run_tests.sh
sed -n '1,120p' tests/temp_run_tests.sh | nl -ba

# Check if any other scripts use a similar guard pattern
rg -n --hidden --no-ignore-vcs 'cd "\$\(dirname "\$0"\)"' tests || true
rg -n --hidden --no-ignore-vcs 'cd "\$\(dirname "\$0"\)" \|\| exit' -S . || true
rg -n --hidden --no-ignore-vcs 'set -e|set -u|set -eu|pipefail' tests/temp_run_tests.sh || true

# Check shellcheck availability and run it on the file if present
if command -v shellcheck >/dev/null 2>&1; then
  shellcheck -x tests/temp_run_tests.sh || true
else
  echo "shellcheck not available"
fi

Repository: pingcap/tiflash

Length of output: 192


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== File: tests/temp_run_tests.sh (numbered) ==="
cat -n tests/temp_run_tests.sh

echo
echo "=== Search for existing cd pattern in tests/ ==="
rg -n --hidden --no-ignore-vcs 'cd "\$\(dirname "\$0"\)"' tests || true

echo
echo "=== Search for cd-with-guard usage in scripts ==="
rg -n --hidden --no-ignore-vcs 'cd "\$\(dirname "\$0"\)"\s*\|\|\s*exit' tests || true
rg -n --hidden --no-ignore-vcs 'cd "\$\(dirname "\$0"\)"\s*\|\|\s*(exit|return)' tests || true

echo
echo "=== Check shell options used in this file ==="
rg -n --hidden --no-ignore-vcs 'set\s+-[euoEPaix]*|#!/bin/(ba)?sh' tests/temp_run_tests.sh || true

echo
echo "=== Attempt shellcheck if available ==="
if command -v shellcheck >/dev/null 2>&1; then
  shellcheck -x tests/temp_run_tests.sh || true
else
  echo "shellcheck not available"
fi

Repository: pingcap/tiflash

Length of output: 1768


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== tests/run-gtest.sh around cd ==="
sed -n '1,80p' tests/run-gtest.sh | cat -n

echo
echo "=== tests/fullstack-test-next-gen/compose.sh around cd ==="
sed -n '1,120p' tests/fullstack-test-next-gen/compose.sh | cat -n

Repository: pingcap/tiflash

Length of output: 5661


Guard cd in tests/temp_run_tests.sh to fail fast.

tests/temp_run_tests.sh only sets set -x, so if cd "$(dirname "$0")" fails it keeps running from the wrong working directory (shellcheck SC2164).

Suggested fix
-cd "$(dirname "$0")"
+cd "$(dirname "$0")" || exit 1
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cd "$(dirname "$0")"
cd "$(dirname "$0")" || exit 1
🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 4-4: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/temp_run_tests.sh` at line 4, The script's cd command (cd "$(dirname
"$0")") is unguarded and can silently fail; update tests/temp_run_tests.sh to
check the cd return and fail fast (e.g., test the exit status and exit with a
non-zero code or set -e before the cd) so the script doesn't continue in the
wrong working directory; locate the cd "$(dirname "$0")" line and add a guard
such as exiting on failure (or wrapping with a conditional that prints an error
and exits) to ensure the script aborts if the chdir fails.

@ti-chi-bot
Copy link
Copy Markdown
Contributor

ti-chi-bot Bot commented May 21, 2026

@JaySon-Huang: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-integration-next-gen 533f217 link true /test pull-integration-next-gen
pull-integration-test 533f217 link true /test pull-integration-test
pull-unit-next-gen 533f217 link true /test pull-unit-next-gen
pull-sanitizer-tsan 533f217 link false /test pull-sanitizer-tsan
pull-sanitizer-asan 533f217 link false /test pull-sanitizer-asan

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@JaySon-Huang JaySon-Huang marked this pull request as draft May 21, 2026 13:32
Add [columnar_trace] logs across the columnar disaggregated read pipeline,
bad_alloc guards in buildPushDownFilter/appendExtraCastsAfterTS, and
RUNTIME_CHECK to confirm column_defines index misuse with generated columns.
Update tiflash-proxy-columnar with matching filter debug logs.
…loop.

genColumnDefinesForDisaggregatedRead already excludes generated columns from
column_defines. The second-pass loop incorrectly indexed column_defines with
table_scan indices, causing misaligned columns and std::bad_alloc.

Fixes pingcap#10856
Signed-off-by: JaySon-Huang <tshent@qq.com>
Columnar nodes do not rely on kvstore, but restore still fetched legacy
proxy config JSON and logged parse warnings against Columnar Hub output.
Signed-off-by: JaySon-Huang <tshent@qq.com>
@ti-chi-bot
Copy link
Copy Markdown
Contributor

ti-chi-bot Bot commented May 22, 2026

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

📖 For more info, you can check the "Contribute Code" section in the development guide.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/needs-linked-issue do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant