
Improve HTTP response handling and retry logic#520

Open
MichaelGHSeg wants to merge 12 commits into master from response-status-updates

Conversation

@MichaelGHSeg (Contributor) commented Feb 12, 2026

Summary

Implements improved HTTP response handling and retry logic to align with changes in analytics-java and analytics-next:

  • ✅ Add Authorization header (Basic auth with write key, OAuth Bearer token)
  • ✅ Add X-Retry-Count header on all requests (starts at 0)
  • ✅ Implement Retry-After header support (capped at 300s, doesn't count against retry budget)
  • ✅ Granular status code classification for retryable vs non-retryable errors
  • ✅ Replace backoff decorator with custom retry loop for better control
  • ✅ Exponential backoff with jitter (0.5s base, 60s cap)
  • ✅ Increase max retries from 10 to 1000 (to accommodate faster retry cadence)
  • ✅ Clear OAuth token on 511 Network Authentication Required
  • ✅ 413 Payload Too Large is non-retryable

Status Code Handling

Retryable 4xx: 408, 410, 429, 460
Non-retryable 4xx: 400, 401, 403, 404, 413, 422, and all other 4xx
Retryable 5xx: All except 501, 505
Non-retryable 5xx: 501, 505
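
The classification above can be sketched as a small predicate. The names here (is_retryable, RETRYABLE_4XX, NON_RETRYABLE_5XX) are illustrative, not the PR's actual identifiers:

```python
# Hypothetical sketch of the status-code classification described above.
RETRYABLE_4XX = {408, 410, 429, 460}
NON_RETRYABLE_5XX = {501, 505}

def is_retryable(status: int) -> bool:
    """True if a request that returned this HTTP status should be retried."""
    if 500 <= status < 600:
        return status not in NON_RETRYABLE_5XX  # retry all 5xx except 501/505
    if 400 <= status < 500:
        return status in RETRYABLE_4XX          # only these 4xx are transient
    return False                                # 2xx/3xx need no retry
```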

Retry-After Behavior

  • Respected only for status code 429 (408 and 503 use exponential backoff)
  • Retry-After attempts do NOT count against backoff retry budget
  • Capped at 300 seconds maximum delay
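
A minimal sketch of the parsing behavior described above, assuming a numeric Retry-After value and a dict-like headers object; the PR's parse_retry_after() in request.py may differ in signature and may also handle the HTTP-date form:

```python
def parse_retry_after(headers, cap=300):
    """Parse a numeric Retry-After header, capping the delay at `cap` seconds.

    Returns None when the header is absent or unparseable, so callers can
    distinguish "no header" from an explicit Retry-After: 0.
    """
    value = headers.get('Retry-After')
    if value is None:
        return None
    try:
        delay = float(value)
    except ValueError:
        return None  # HTTP-date form not handled in this sketch
    return min(max(delay, 0), cap)
```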

Test Coverage

  • Added 30 new comprehensive tests
  • Total test suite: 106 tests passing
  • Tests cover: Authorization headers, X-Retry-Count, Retry-After parsing, status code classification, backoff logic, network errors, retry budget separation

Changes

Modified Files

  • segment/analytics/consumer.py: Custom retry loop, status code classification, Retry-After support
  • segment/analytics/request.py: Authorization header, X-Retry-Count header, parse_retry_after()
  • segment/analytics/client.py: Increased default max_retries to 1000
  • segment/analytics/test/test_consumer.py: 13 new test cases
  • segment/analytics/test/test_request.py: 17 new test cases

Alignment

This implementation aligns with:

  • analytics-java commit c3f60c5 and 80cc33d
  • analytics-next commit 31e275d2

🤖 Generated with Claude Code

MichaelGHSeg and others added 2 commits February 11, 2026 19:20
- Add Authorization header with Basic auth (base64 encoded write key)
- Add X-Retry-Count header on all requests (starts at 0)
- Implement Retry-After header support (capped at 300s)
- Retry-After attempts don't count against backoff retry budget
- Add granular status code classification:
  - Retryable 4xx: 408, 410, 429, 460
  - Non-retryable 4xx: 400, 401, 403, 404, 413, 422
  - Retryable 5xx: all except 501, 505
  - Non-retryable 5xx: 501, 505
- Replace backoff decorator with custom retry loop
- Exponential backoff with jitter (0.5s base, 60s cap)
- Clear OAuth token on 511 Network Authentication Required
- 413 Payload Too Large is non-retryable
- Add 30 new comprehensive tests (106 total tests)

Aligns with analytics-java and analytics-next retry behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Aligns with analytics-java change to accommodate shorter backoff
periods (0.5s base, 60s cap). With faster retries, a higher retry
limit allows for better resilience during extended outages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

This PR improves HTTP response handling and retry logic in the analytics-python client, aligning with changes from analytics-java and analytics-next. The implementation replaces the backoff decorator with a custom retry loop that provides fine-grained control over retry behavior, distinguishing between Retry-After attempts and exponential backoff attempts.

Changes:

  • Replaced backoff decorator with custom retry loop implementing exponential backoff with jitter (0.5s base, 60s cap)
  • Added Authorization header support (Basic auth with write key and OAuth Bearer token)
  • Added X-Retry-Count header to track attempt numbers
  • Implemented Retry-After header support (capped at 300s, doesn't count against retry budget)
  • Granular status code classification distinguishing retryable from non-retryable errors
  • Increased default max_retries from 10 to 1000 to accommodate faster retry cadence
  • OAuth token clearing expanded to include 511 status code

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Summary per file:

  • segment/analytics/request.py: Added Authorization header (Basic/Bearer), X-Retry-Count header, parse_retry_after() function, and response object to APIError
  • segment/analytics/consumer.py: Replaced backoff decorator with custom retry loop implementing exponential backoff, Retry-After support, and granular status code classification
  • segment/analytics/client.py: Increased default max_retries from 10 to 1000
  • segment/analytics/test/test_request.py: Added 17 tests for authorization headers, X-Retry-Count, Retry-After parsing, and OAuth token clearing
  • segment/analytics/test/test_consumer.py: Added 13 tests for retry logic, status code classification, backoff behavior, and Retry-After support


Comment on lines 196 to 221
                # Calculate exponential backoff delay with jitter
                base_delay = 0.5 * (2 ** (backoff_attempts - 1))
                jitter = random.uniform(0, 0.1 * base_delay)
                delay = min(base_delay + jitter, 60)  # Cap at 60 seconds

                self.log.debug(
                    f"Retry attempt {backoff_attempts}/{self.retries} (total attempts: {total_attempts}) "
                    f"after {delay:.2f}s for status {e.status}"
                )
                time.sleep(delay)

            except Exception as e:
                # Network errors or other exceptions - retry with backoff
                total_attempts += 1
                backoff_attempts += 1

                if backoff_attempts >= max_backoff_attempts:
                    self.log.error(
                        f"All {self.retries} retries exhausted after {total_attempts} total attempts. Final error: {e}"
                    )
                    raise

                # Calculate exponential backoff delay with jitter
                base_delay = 0.5 * (2 ** (backoff_attempts - 1))
                jitter = random.uniform(0, 0.1 * base_delay)
                delay = min(base_delay + jitter, 60)  # Cap at 60 seconds

Copilot AI Feb 12, 2026


The exponential backoff calculation is duplicated between the APIError handler (lines 196-199) and the generic Exception handler (lines 218-221). Consider extracting this logic into a helper function to reduce code duplication and improve maintainability.
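
The duplicated calculation could be factored out roughly as follows; the helper name and defaults here are hypothetical, mirroring the 0.5s base, 10% jitter, and 60s cap shown in the quoted code:

```python
import random

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 60.0) -> float:
    """Exponential backoff with jitter for the given 1-based attempt number.

    Delay is base * 2**(attempt - 1) plus up to 10% jitter, capped at `cap`.
    """
    base_delay = base * (2 ** (attempt - 1))
    jitter = random.uniform(0, 0.1 * base_delay)
    return min(base_delay + jitter, cap)
```

Both exception handlers could then call `time.sleep(backoff_delay(backoff_attempts))`, keeping the retry policy in one place.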

Comment on lines 172 to 179
                if should_use_retry_after(e.status) and e.response:
                    retry_after = parse_retry_after(e.response)
                    if retry_after:
                        self.log.debug(
                            f"Retry-After header present: waiting {retry_after}s (attempt {total_attempts})"
                        )
                        time.sleep(retry_after)
                        continue  # Does not count against backoff budget

Copilot AI Feb 12, 2026


The retry logic does not impose an upper bound on Retry-After attempts. If the server continuously responds with Retry-After headers (for status codes 408, 429, or 503), the client could retry indefinitely. Consider adding a maximum total attempt limit to prevent infinite retry loops, even when Retry-After is present. For example, you could add a check like if total_attempts >= some_large_limit: raise before processing Retry-After.

MichaelGHSeg and others added 6 commits February 12, 2026 18:07
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Extract duplicate exponential backoff calculation into helper function
- Add upper bound (max_total_attempts) to prevent infinite retry loops with Retry-After
- Improves code maintainability and prevents edge case of continuous Retry-After responses

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove 408/503 from Retry-After eligibility (only 429 uses Retry-After)
- Add rate-limit state to Consumer (rate_limited_until, rate_limit_start_time)
- 429 with Retry-After: set rate-limit state, raise to caller for requeue
- 429 without Retry-After: counted backoff (not pipeline blocking)
- Add maxTotalBackoffDuration / maxRateLimitDuration config (default 43200s)
- upload() checks rate-limit state before request(), enforces duration limit
- 511 OAuth gating: only retry when OauthManager is configured
- Add tests: T04, T17, T19, T20; update 429/408/503 behavior tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Handle Retry-After: 0 correctly by checking 'is not None' instead of
truthiness. Prevent silent batch re-queue on 429 without Retry-After
by gating the upload() re-queue path on rate_limited_until being set.
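
A minimal illustration of the truthiness bug this commit fixes: an explicit `Retry-After: 0` parses to 0, which is falsy, so a bare `if retry_after:` treats it the same as a missing header.

```python
# Server sent "Retry-After: 0", meaning: retry immediately.
retry_after = 0

if retry_after:                  # 0 is falsy, so this branch is skipped
    honored_by_truthiness = True
else:
    honored_by_truthiness = False

# The explicit check distinguishes "header absent" (None) from "delay of 0".
honored_by_is_not_none = retry_after is not None
```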

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

segment/analytics/consumer.py:158

  • When a 429 with Retry-After is received (lines 136-144), the batch items are re-queued using queue.put(), but then in the finally block (lines 156-158), all items in the batch are marked as task_done(). This means the items that were re-queued are still marked as done, which could cause queue.join() to return prematurely before those re-queued items are actually processed. Consider either not calling task_done() for re-queued items, or handling the re-queueing differently.
            if e.status == 429 and self.rate_limited_until is not None:
                # 429: rate-limit state already set by request(). Re-queue batch.
                self.log.debug('429 received. Re-queuing batch and halting upload iteration.')
                for item in batch:
                    try:
                        self.queue.put(item, block=False)
                    except Exception:
                        pass  # Queue full, item lost
                success = False
            else:
                self.log.error('error uploading: %s', e)
                success = False
                if self.on_error:
                    self.on_error(e, batch)
        except Exception as e:
            self.log.error('error uploading: %s', e)
            success = False
            if self.on_error:
                self.on_error(e, batch)
        finally:
            # mark items as acknowledged from queue
            for _ in batch:
                self.queue.task_done()


    log.debug('received response: %s', payload)
-   raise APIError(res.status_code, payload['code'], payload['message'])
+   raise APIError(res.status_code, payload['code'], payload['message'], res)
    except ValueError:

Copilot AI Feb 25, 2026


The exception handler catches ValueError for when res.json() fails to parse JSON, but it doesn't catch KeyError that would be raised if the parsed JSON doesn't contain 'code' or 'message' keys (line 102). Consider catching both ValueError and KeyError to handle malformed JSON responses more gracefully.

Suggested change
except ValueError:
except (ValueError, KeyError):

Comment on lines +387 to +417
    def test_408_and_503_use_backoff(self):
        """Test that 408 and 503 use exponential backoff"""
        track = {'type': 'track', 'event': 'python event', 'userId': 'userId'}

        for status_code in [408, 503]:
            consumer = Consumer(None, 'testsecret', retries=2)
            call_count = 0
            sleep_durations = []

            def mock_post_fn(*args, **kwargs):
                nonlocal call_count
                call_count += 1
                if call_count == 1:
                    response = mock.Mock()
                    response.headers = {'Retry-After': '5'}
                    error = APIError(status_code, 'error', 'Error')
                    error.response = response
                    raise error
                return mock.Mock(status_code=200)

            def mock_sleep(duration):
                sleep_durations.append(duration)

            with mock.patch('segment.analytics.consumer.post', side_effect=mock_post_fn):
                with mock.patch('time.sleep', side_effect=mock_sleep):
                    consumer.request([track])

            # Should use backoff delay (0 for first retry), NOT the Retry-After value of 5
            self.assertEqual(call_count, 2)
            self.assertEqual(len(sleep_durations), 1)
            self.assertEqual(sleep_durations[0], 0, f'{status_code} should use backoff, not Retry-After')

Copilot AI Feb 25, 2026


The PR description states "Retry-After Behavior - Respected for status codes: 408, 429, 503", but the implementation only respects Retry-After for 429. Status codes 408 and 503 always use exponential backoff, ignoring any Retry-After header. This test confirms this behavior is intentional. Consider updating the PR description to accurately reflect that only 429 respects Retry-After headers.

MichaelGHSeg and others added 4 commits February 25, 2026 16:39
- Add KeyError to except clause when parsing JSON response to handle
  missing 'code' or 'message' keys
- Add explanatory comment on pre-existing except-pass pattern

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
requests.Response.__bool__() returns False for non-2xx status codes.
The checks `if e.response` and `if response` evaluated to False for
429 responses, so parse_retry_after() was never called and the SDK
fell back to normal backoff instead of respecting Retry-After.

Changed both checks to explicit `is not None` comparisons.
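
A stand-in class illustrating the gotcha: requests.Response defines __bool__ to return its ok property, which is False for any status >= 400, so an error response is falsy even though it exists and carries headers.

```python
# Minimal illustration; FakeResponse is a stand-in, not the requests class.
class FakeResponse:
    def __init__(self, status_code: int):
        self.status_code = status_code

    def __bool__(self) -> bool:
        # Mirrors requests.Response.ok: truthy only for status < 400.
        return self.status_code < 400

resp = FakeResponse(429)
assert not resp           # `if e.response:` skips the Retry-After path
assert resp is not None   # the explicit check still sees the response
```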

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add on_error handler to capture delivery failures from the SDK.
Reports success=false with the first error message when any batch
fails (non-retryable error or retries exhausted).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>