Skip to content

[PECOBLR-2461] Add comprehensive MST transaction E2E tests#775

Merged
vikrantpuppala merged 3 commits into
mainfrom
add-mst-transaction-tests-v2
Apr 21, 2026
Merged

[PECOBLR-2461] Add comprehensive MST transaction E2E tests#775
vikrantpuppala merged 3 commits into
mainfrom
add-mst-transaction-tests-v2

Conversation

@vikrantpuppala
Copy link
Copy Markdown
Contributor

@vikrantpuppala vikrantpuppala commented Apr 21, 2026

Summary

  • Rewrites tests/e2e/test_transactions.py with 42 MST tests across 5 categories
  • Tests are parallelization-friendly (each test uses its own unique Delta table)
  • Full suite runs in ~2 minutes on 4 workers (pytest -n 4 --dist=loadgroup)
  • Covers gaps identified in the MST + xDBC Metadata RPCs audit

Test categories

  • TestMstCorrectness (18) — commit/rollback/isolation/multi-table atomicity/repeatable reads/write conflict/parameterized DML/auto-start after commit/rollback/close-connection-implicit-rollback
  • TestMstApi (6) — DB-API-specific: autocommit getter/setter, isolation level (supported + unsupported), commit without active txn, setAutoCommit during active txn
  • TestMstMetadata (6)cursor.columns/tables/schemas/catalogs inside a txn + two freshness tests asserting Thrift metadata RPCs are non-transactional (they see concurrent DDL that the txn should not see)
  • TestMstBlockedSql (9) — MSTCheckRule enforcement. The server allowlists only SELECT / INSERT / MERGE / UPDATE / DELETE / DESCRIBE TABLE inside MST. 8 statements throw + abort txn (SHOW COLUMNS/TABLES/SCHEMAS/CATALOGS/FUNCTIONS, DESCRIBE QUERY, DESCRIBE TABLE EXTENDED, information_schema). 1 statement succeeds (DESCRIBE TABLE basic — explicitly allowed by server).
  • TestMstExecuteVariants (2) — executemany commit + rollback

Parallelisation strategy

  • Each test derives a unique Delta table name from its node id so 4 parallel workers don't collide.
  • Tests that spawn concurrent connections to the same table (repeatable reads, write conflict, freshness) use xdist_group so the concurrent connections within a single test don't conflict with other tests on different workers.

Test plan

Related

NO_CHANGELOG=true

This pull request was AI-assisted by Isaac.

Replaces the prior speculative test skeleton with 42 tests across 5
categories:

- TestMstCorrectness (18): commit/rollback/isolation/multi-table
  atomicity/repeatable reads/write conflict/parameterized DML/etc.
- TestMstApi (6): DB-API-specific — autocommit, isolation level,
  error handling.
- TestMstMetadata (6): cursor.columns/tables/schemas/catalogs inside
  a transaction, plus two freshness tests asserting Thrift metadata
  RPCs are non-transactional (they see concurrent DDL that the txn
  should not see).
- TestMstBlockedSql (9): MSTCheckRule enforcement. Some SHOW/DESCRIBE
  commands throw + abort txn, others succeed silently on Python/Thrift
  (diverges from JDBC). Both behaviors are explicitly tested so
  regressions in either direction are caught.
- TestMstExecuteVariants (2): executemany commit/rollback.

Parallelisation:
- Each test uses a unique Delta table derived from its test name so
  pytest-xdist workers don't collide on shared state.
- Tests that spawn concurrent connections to the same table
  (repeatable reads, write conflict, freshness) use xdist_group so
  the concurrent connections within a single test don't conflict with
  other tests on different workers.

Runtime: ~2 minutes on 4 workers (pytest -n 4 --dist=loadgroup),
well within the existing e2e budget.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
CI caught that the initial "not blocked" assertions were wrong — the
server returns TRANSACTION_NOT_SUPPORTED.COMMAND for SHOW COLUMNS
(ShowDeltaTableColumnsCommand) and DESCRIBE QUERY (DescribeQueryCommand)
inside an active transaction.

The server's error message explicitly lists the allowed commands:
"Only SELECT / INSERT / MERGE / UPDATE / DELETE / DESCRIBE TABLE are
supported." DESCRIBE TABLE (basic) remains the only DESCRIBE variant
that is allowed.

Earlier dogfood runs showed SHOW COLUMNS / DESCRIBE QUERY succeeding —
likely because the dogfood warehouse DBR is older than CI. Aligning
tests with the current/CI server behavior.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
Comment thread tests/e2e/test_transactions.py Outdated
Comment thread tests/e2e/test_transactions.py Outdated
Comment thread tests/e2e/test_transactions.py Outdated
Copy link
Copy Markdown

@gopalldb gopalldb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some minor comments, please address those

- test_auto_start_after_commit: assert the rolled-back id=2 is NOT
  present (use _get_ids set equality instead of just row count).
- test_auto_start_after_rollback: same pattern — assert the
  rolled-back id=1 is NOT present.
- test_commit_without_active_txn_throws: match specific
  NO_ACTIVE_TRANSACTION server error code to ensure we're catching
  the right exception, not an unrelated one.

Add _get_ids() helper for checking the exact set of persisted ids.

Verified 42/42 pass against pecotesting in ~1:36 (4 workers).

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
@vikrantpuppala vikrantpuppala changed the title [PECOBLR-2086] Add comprehensive MST transaction E2E tests [PECOBLR-2461] Add comprehensive MST transaction E2E tests Apr 21, 2026
@vikrantpuppala vikrantpuppala enabled auto-merge (squash) April 21, 2026 09:40
@vikrantpuppala vikrantpuppala merged commit d872075 into main Apr 21, 2026
34 checks passed
vikrantpuppala added a commit that referenced this pull request May 26, 2026
…urrent CI jobs

`_unique_table_name` derived the table name purely from the test node id,
so two CI runs racing on the same warehouse + catalog (e.g. PR + push to
main landing within seconds, as happened in run 26410038645) would both
target `peco.default.mst_pysql_<test_name>` and step on each other's
CREATE / DROP / DML. This caused the three "flaky" failures we kept
seeing in `test_transactions.py`:

  - test_multi_table_commit       → TRANSACTION_ROLLBACK_REQUIRED_AFTER_ABORT
  - test_executemany_rollback_in_txn → TABLE_OR_VIEW_NOT_FOUND
  - test_write_conflict_single_table → TABLE_OR_VIEW_ALREADY_EXISTS

The companion helper `_unique_table_name_raw` already appended a uuid4
suffix for exactly this reason; the fixture helper was an oversight from
#775. Add the same suffix here, reserving room so the result still fits
in the 80-char cap.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
vikrantpuppala added a commit that referenced this pull request May 26, 2026
…s concurrent CI jobs (#805)

* [PECOBLR-2461] fix(tests/e2e): de-collide MST table names across concurrent CI jobs

`_unique_table_name` derived the table name purely from the test node id,
so two CI runs racing on the same warehouse + catalog (e.g. PR + push to
main landing within seconds, as happened in run 26410038645) would both
target `peco.default.mst_pysql_<test_name>` and step on each other's
CREATE / DROP / DML. This caused the three "flaky" failures we kept
seeing in `test_transactions.py`:

  - test_multi_table_commit       → TRANSACTION_ROLLBACK_REQUIRED_AFTER_ABORT
  - test_executemany_rollback_in_txn → TABLE_OR_VIEW_NOT_FOUND
  - test_write_conflict_single_table → TABLE_OR_VIEW_ALREADY_EXISTS

The companion helper `_unique_table_name_raw` already appended a uuid4
suffix for exactly this reason; the fixture helper was an oversight from
#775. Add the same suffix here, reserving room so the result still fits
in the 80-char cap.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>

* also de-collide UC Volume tests and add PR-branch concurrency cancellation

While verifying the transactions fix, a second flake surfaced — the same
cross-CI race pattern, this time on a UC Volume file path:

  tests/e2e/test_driver.py::TestPySQLCoreSuite::
    test_uc_volume_put_fails_if_file_exists_and_overwrite_not_set
  → Failed: DID NOT RAISE <class 'databricks.sql.exc.ServerOperationError'>

Two tests in `tests/e2e/common/uc_volume_tests.py` PUT/GET/REMOVE against
the hardcoded path `/Volumes/{catalog}/{schema}/e2etests/file1.csv`. When
two CI runs overlap (as happened on this PR after the force-push for DCO
sign-off), one run's REMOVE deletes the other run's file between its two
PUTs, so the expected `FILE_IN_STAGING_PATH_ALREADY_EXISTS` never fires.

Suffix the volume path with `uuid4().hex[:8]` in the two affected tests
(test_uc_volume_life_cycle and test_uc_volume_put_fails_if_file_exists_
and_overwrite_not_set). The other UC Volume tests that reference the same
path are exercising client-side / server-parse failure paths that never
touch the file — leaving those alone keeps the diff focused.

Also add a `concurrency` block to the E2E workflow:

  - On pull_request, cancel-in-progress: a new push / force-push on a PR
    cancels the previous run on that ref. Eliminates the entire class of
    "force-pushed during CI → two runs racing on shared warehouse state"
    failures.
  - On push to main, runs are NOT cancelled — each merge commit keeps its
    own clean CI signal so a regression on commit N can't be hidden by
    commit N+1 landing seconds later. Concurrent main runs can still
    collide on shared state, but the uuid-suffix conventions in the e2e
    tests are what keep that isolated.

Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>

---------

Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants