Skip to content

Commit 60060c9

Browse files
ci(integration-tests): use github.token for check-run posters (#801)
* ci(integration-tests): use github.token for check-run posters Follow-up to #799. The dispatch failure handlers and auto-pass steps were posting check-runs with `steps.public-token.outputs.token`, which is itself an App-token-generating step. That created a silent-failure trap: if the App secrets are missing or rotated, the App-token step fails, then the failure handler also fails (no token to authenticate with), and the gate sits green from the earlier `skip-integration-tests-pr` job's synthetic-success check — the exact silent-pass anti-pattern the failure handler exists to prevent. Discovered by exercising the dispatch end-to-end on a draft PR before the App secrets were installed (#800 closed). The canonical adbc-drivers/databricks workflow has the same latent bug — fix not yet upstreamed there. The fix is to use the default workflow `${{ github.token }}` for all check-posting steps. The default token already has `checks: write` because each job declares the permission. `steps.public-token` is no longer referenced anywhere; the generation step is removed to keep the workflow tidy. The App token is still used (correctly) for the actual dispatch call into databricks-driver-test, where cross-repo write access is required. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * test(retry): isolate retry tests from shared telemetry executor The e2e retry tests in retry_test_mixins.py assert exact urllib3 call counts (e.g. `mock_validate_conn.call_count == 6`) while mocking urllib3 globally via `_validate_conn` / `_get_conn`. The shared `TelemetryClientFactory` executor — populated by *prior* tests in the same worker — keeps firing background telemetry pushes that land on the same mocked urllib3 layer, inflating the counts and tripping the assertions. Drain the factory and route any new `initialize_telemetry_client` call to `NoopTelemetryClient` for the duration of the three tests that assert exact counts: - test_oserror_retries - test_retry_max_count_not_exceeded - test_retry_exponential_backoff Factory state is fully restored on exit so subsequent tests see the real telemetry pipeline. Co-authored-by: Isaac --------- Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
1 parent fb55001 commit 60060c9

2 files changed

Lines changed: 77 additions & 25 deletions

File tree

.github/workflows/trigger-integration-tests.yml

Lines changed: 20 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -194,15 +194,6 @@ jobs:
194194
owner: databricks
195195
repositories: databricks-driver-test
196196

197-
- name: Generate GitHub App Token (public repo)
198-
id: public-token
199-
uses: actions/create-github-app-token@f8d387b68d61c58ab83c6c016672934102569859 # v3.0.0
200-
with:
201-
app-id: ${{ secrets.INTEGRATION_TEST_APP_ID }}
202-
private-key: ${{ secrets.INTEGRATION_TEST_PRIVATE_KEY }}
203-
owner: databricks
204-
repositories: databricks-sql-python
205-
206197
- name: Sanitize PR title
207198
id: sanitize
208199
uses: actions/github-script@f28e40c7f34bde8b3046d885e986cb6290c5673b # v7.1.0
@@ -235,7 +226,11 @@ jobs:
235226
if: steps.changed.outputs.python != 'true'
236227
uses: actions/github-script@f28e40c7f34bde8b3046d885e986cb6290c5673b # v7.1.0
237228
with:
238-
github-token: ${{ steps.public-token.outputs.token }}
229+
# Default workflow token, not the App token — same rationale
230+
# as the failure handler below. We don't want a missing-secret
231+
# state to silently swallow the green check for path-filtered
232+
# no-op runs.
233+
github-token: ${{ github.token }}
239234
script: |
240235
await github.rest.checks.create({
241236
owner: context.repo.owner,
@@ -255,7 +250,15 @@ jobs:
255250
if: failure() && steps.changed.outputs.python == 'true'
256251
uses: actions/github-script@f28e40c7f34bde8b3046d885e986cb6290c5673b # v7.1.0
257252
with:
258-
github-token: ${{ steps.public-token.outputs.token }}
253+
# Use the default workflow token, not the App token. The
254+
# App-token-generating step is the *most likely* thing to
255+
# fail (missing/rotated secrets, App uninstalled), and using
256+
# it here means a token-generation failure also kills this
257+
# handler — leaving the gate silently green on the stale
258+
# synthetic-success from skip-integration-tests-pr. The
259+
# default token has checks:write (declared on this job)
260+
# which is all we need.
261+
github-token: ${{ github.token }}
259262
script: |
260263
await github.rest.checks.create({
261264
owner: context.repo.owner,
@@ -316,20 +319,13 @@ jobs:
316319
echo "No driver files changed — will auto-pass"
317320
fi
318321
319-
- name: Generate GitHub App Token (public repo)
320-
id: public-token
321-
uses: actions/create-github-app-token@f8d387b68d61c58ab83c6c016672934102569859 # v3.0.0
322-
with:
323-
app-id: ${{ secrets.INTEGRATION_TEST_APP_ID }}
324-
private-key: ${{ secrets.INTEGRATION_TEST_PRIVATE_KEY }}
325-
owner: databricks
326-
repositories: databricks-sql-python
327-
328322
- name: Auto-pass (no driver changes)
329323
if: steps.changed.outputs.changed != 'true'
330324
uses: actions/github-script@f28e40c7f34bde8b3046d885e986cb6290c5673b # v7.1.0
331325
with:
332-
github-token: ${{ steps.public-token.outputs.token }}
326+
# Default workflow token — see the trigger-tests-pr job's
327+
# equivalent step above for the rationale.
328+
github-token: ${{ github.token }}
333329
script: |
334330
await github.rest.checks.create({
335331
owner: context.repo.owner,
@@ -392,7 +388,9 @@ jobs:
392388
if: failure() && steps.changed.outputs.changed == 'true'
393389
uses: actions/github-script@f28e40c7f34bde8b3046d885e986cb6290c5673b # v7.1.0
394390
with:
395-
github-token: ${{ steps.public-token.outputs.token }}
391+
# Use the default workflow token, not the App token — see
392+
# the rationale in the trigger-tests-pr job above.
393+
github-token: ${{ github.token }}
396394
script: |
397395
await github.rest.checks.create({
398396
owner: context.repo.owner,

tests/e2e/common/retry_test_mixins.py

Lines changed: 57 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,58 @@
1515
SessionAlreadyClosedError,
1616
UnsafeToRetryError,
1717
)
18+
from databricks.sql.telemetry.telemetry_client import (
19+
NoopTelemetryClient,
20+
TelemetryClientFactory,
21+
_TelemetryClientHolder,
22+
)
23+
24+
25+
@contextmanager
26+
def _isolated_from_telemetry():
27+
# Tests that mock urllib3 globally (via _get_conn / _validate_conn) also
28+
# intercept background telemetry pushes from the shared
29+
# TelemetryClientFactory executor — inflating mock.call_count and breaking
30+
# assertions like `call_count == 6`. Drain the factory and force any new
31+
# connection to use NoopTelemetryClient for the duration of the test.
32+
with TelemetryClientFactory._lock:
33+
saved_clients = TelemetryClientFactory._clients
34+
saved_executor = TelemetryClientFactory._executor
35+
saved_initialized = TelemetryClientFactory._initialized
36+
for holder in list(saved_clients.values()):
37+
try:
38+
holder.client.close()
39+
except Exception:
40+
pass
41+
if saved_executor is not None:
42+
try:
43+
TelemetryClientFactory._stop_flush_thread()
44+
saved_executor.shutdown(wait=True)
45+
except Exception:
46+
pass
47+
TelemetryClientFactory._clients = {}
48+
TelemetryClientFactory._executor = None
49+
TelemetryClientFactory._initialized = False
50+
51+
def _install_noop(*args, host_url=None, **kwargs):
52+
host_url = TelemetryClientFactory.getHostUrlSafely(host_url)
53+
with TelemetryClientFactory._lock:
54+
TelemetryClientFactory._clients[host_url] = _TelemetryClientHolder(
55+
NoopTelemetryClient()
56+
)
57+
58+
try:
59+
with patch.object(
60+
TelemetryClientFactory,
61+
"initialize_telemetry_client",
62+
staticmethod(_install_noop),
63+
):
64+
yield
65+
finally:
66+
with TelemetryClientFactory._lock:
67+
TelemetryClientFactory._clients = saved_clients
68+
TelemetryClientFactory._executor = saved_executor
69+
TelemetryClientFactory._initialized = saved_initialized
1870

1971

2072
class Client429ResponseMixin:
@@ -253,7 +305,7 @@ def test_retry_urllib3_settings_are_honored(
253305
@patch("databricks.sql.telemetry.telemetry_client.TelemetryClient._send_telemetry")
254306
def test_oserror_retries(self, mock_send_telemetry, extra_params):
255307
"""If a network error occurs during make_request, the request is retried according to policy"""
256-
with patch(
308+
with _isolated_from_telemetry(), patch(
257309
"urllib3.connectionpool.HTTPSConnectionPool._validate_conn",
258310
) as mock_validate_conn:
259311
mock_validate_conn.side_effect = OSError("Some arbitrary network error")
@@ -278,7 +330,9 @@ def test_retry_max_count_not_exceeded(self, mock_send_telemetry, extra_params):
278330
THEN the connector issues six request (original plus five retries)
279331
before raising an exception
280332
"""
281-
with mocked_server_response(status=429, headers={"Retry-After": "0"}) as mock_obj:
333+
with _isolated_from_telemetry(), mocked_server_response(
334+
status=429, headers={"Retry-After": "0"}
335+
) as mock_obj:
282336
with pytest.raises(MaxRetryError) as cm:
283337
extra_params = {**extra_params, **self._retry_policy}
284338
with self.connection(extra_params=extra_params) as conn:
@@ -302,7 +356,7 @@ def test_retry_exponential_backoff(self, mock_send_telemetry, extra_params):
302356
retry_policy["_retry_delay_min"] = 1
303357

304358
time_start = time.time()
305-
with mocked_server_response(
359+
with _isolated_from_telemetry(), mocked_server_response(
306360
status=429, headers={"Retry-After": "8"}
307361
) as mock_obj:
308362
with pytest.raises(RequestError) as cm:

0 commit comments

Comments
 (0)