Improve HTTP response handling and retry logic#520
Improve HTTP response handling and retry logic#520MichaelGHSeg wants to merge 12 commits intomasterfrom
Conversation
- Add Authorization header with Basic auth (base64 encoded write key) - Add X-Retry-Count header on all requests (starts at 0) - Implement Retry-After header support (capped at 300s) - Retry-After attempts don't count against backoff retry budget - Add granular status code classification: - Retryable 4xx: 408, 410, 429, 460 - Non-retryable 4xx: 400, 401, 403, 404, 413, 422 - Retryable 5xx: all except 501, 505 - Non-retryable 5xx: 501, 505 - Replace backoff decorator with custom retry loop - Exponential backoff with jitter (0.5s base, 60s cap) - Clear OAuth token on 511 Network Authentication Required - 413 Payload Too Large is non-retryable - Add 30 new comprehensive tests (106 total tests) Aligns with analytics-java and analytics-next retry behavior. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Aligns with analytics-java change to accommodate shorter backoff periods (0.5s base, 60s cap). With faster retries, a higher retry limit allows for better resilience during extended outages. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR improves HTTP response handling and retry logic in the analytics-python client, aligning with changes from analytics-java and analytics-next. The implementation replaces the backoff decorator with a custom retry loop that provides fine-grained control over retry behavior, distinguishing between Retry-After attempts and exponential backoff attempts.
Changes:
- Replaced backoff decorator with custom retry loop implementing exponential backoff with jitter (0.5s base, 60s cap)
- Added Authorization header support (Basic auth with write key and OAuth Bearer token)
- Added X-Retry-Count header to track attempt numbers
- Implemented Retry-After header support (capped at 300s, doesn't count against retry budget)
- Granular status code classification distinguishing retryable from non-retryable errors
- Increased default max_retries from 10 to 1000 to accommodate faster retry cadence
- OAuth token clearing expanded to include 511 status code
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| segment/analytics/request.py | Added Authorization header (Basic/Bearer), X-Retry-Count header, parse_retry_after() function, and response object to APIError |
| segment/analytics/consumer.py | Replaced backoff decorator with custom retry loop implementing exponential backoff, Retry-After support, and granular status code classification |
| segment/analytics/client.py | Increased default max_retries from 10 to 1000 |
| segment/analytics/test/test_request.py | Added 17 comprehensive tests for authorization headers, X-Retry-Count, Retry-After parsing, and OAuth token clearing |
| segment/analytics/test/test_consumer.py | Added 13 comprehensive tests for retry logic, status code classification, backoff behavior, and Retry-After support |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
segment/analytics/consumer.py
Outdated
| # Calculate exponential backoff delay with jitter | ||
| base_delay = 0.5 * (2 ** (backoff_attempts - 1)) | ||
| jitter = random.uniform(0, 0.1 * base_delay) | ||
| delay = min(base_delay + jitter, 60) # Cap at 60 seconds | ||
|
|
||
| self.log.debug( | ||
| f"Retry attempt {backoff_attempts}/{self.retries} (total attempts: {total_attempts}) " | ||
| f"after {delay:.2f}s for status {e.status}" | ||
| ) | ||
| time.sleep(delay) | ||
|
|
||
| except Exception as e: | ||
| # Network errors or other exceptions - retry with backoff | ||
| total_attempts += 1 | ||
| backoff_attempts += 1 | ||
|
|
||
| if backoff_attempts >= max_backoff_attempts: | ||
| self.log.error( | ||
| f"All {self.retries} retries exhausted after {total_attempts} total attempts. Final error: {e}" | ||
| ) | ||
| raise | ||
|
|
||
| # Calculate exponential backoff delay with jitter | ||
| base_delay = 0.5 * (2 ** (backoff_attempts - 1)) | ||
| jitter = random.uniform(0, 0.1 * base_delay) | ||
| delay = min(base_delay + jitter, 60) # Cap at 60 seconds |
There was a problem hiding this comment.
The exponential backoff calculation is duplicated between the APIError handler (lines 196-199) and the generic Exception handler (lines 218-221). Consider extracting this logic into a helper function to reduce code duplication and improve maintainability.
segment/analytics/consumer.py
Outdated
| if should_use_retry_after(e.status) and e.response: | ||
| retry_after = parse_retry_after(e.response) | ||
| if retry_after: | ||
| self.log.debug( | ||
| f"Retry-After header present: waiting {retry_after}s (attempt {total_attempts})" | ||
| ) | ||
| time.sleep(retry_after) | ||
| continue # Does not count against backoff budget |
There was a problem hiding this comment.
The retry logic does not impose an upper bound on Retry-After attempts. If the server continuously responds with Retry-After headers (for status codes 408, 429, or 503), the client could retry indefinitely. Consider adding a maximum total attempt limit to prevent infinite retry loops, even when Retry-After is present. For example, you could add a check like if total_attempts >= some_large_limit: raise before processing Retry-After.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Extract duplicate exponential backoff calculation into helper function - Add upper bound (max_total_attempts) to prevent infinite retry loops with Retry-After - Improves code maintainability and prevents edge case of continuous Retry-After responses Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove 408/503 from Retry-After eligibility (only 429 uses Retry-After) - Add rate-limit state to Consumer (rate_limited_until, rate_limit_start_time) - 429 with Retry-After: set rate-limit state, raise to caller for requeue - 429 without Retry-After: counted backoff (not pipeline blocking) - Add maxTotalBackoffDuration / maxRateLimitDuration config (default 43200s) - upload() checks rate-limit state before request(), enforces duration limit - 511 OAuth gating: only retry when OauthManager is configured - Add tests: T04, T17, T19, T20; update 429/408/503 behavior tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Handle Retry-After: 0 correctly by checking 'is not None' instead of truthiness. Prevent silent batch re-queue on 429 without Retry-After by gating the upload() re-queue path on rate_limited_until being set. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
segment/analytics/consumer.py:158
- When a 429 with Retry-After is received (lines 136-144), the batch items are re-queued using
queue.put(), but then in the finally block (lines 156-158), all items in the batch are marked astask_done(). This means the items that were re-queued are still marked as done, which could causequeue.join()to return prematurely before those re-queued items are actually processed. Consider either not callingtask_done()for re-queued items, or handling the re-queueing differently.
if e.status == 429 and self.rate_limited_until is not None:
# 429: rate-limit state already set by request(). Re-queue batch.
self.log.debug('429 received. Re-queuing batch and halting upload iteration.')
for item in batch:
try:
self.queue.put(item, block=False)
except Exception:
pass # Queue full, item lost
success = False
else:
self.log.error('error uploading: %s', e)
success = False
if self.on_error:
self.on_error(e, batch)
except Exception as e:
self.log.error('error uploading: %s', e)
success = False
if self.on_error:
self.on_error(e, batch)
finally:
# mark items as acknowledged from queue
for _ in batch:
self.queue.task_done()
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
segment/analytics/request.py
Outdated
| log.debug('received response: %s', payload) | ||
| raise APIError(res.status_code, payload['code'], payload['message']) | ||
| raise APIError(res.status_code, payload['code'], payload['message'], res) | ||
| except ValueError: |
There was a problem hiding this comment.
The exception handler catches ValueError for when res.json() fails to parse JSON, but it doesn't catch KeyError that would be raised if the parsed JSON doesn't contain 'code' or 'message' keys (line 102). Consider catching both ValueError and KeyError to handle malformed JSON responses more gracefully.
| except ValueError: | |
| except (ValueError, KeyError): |
| def test_408_and_503_use_backoff(self): | ||
| """Test that 408 and 503 use exponential backoff""" | ||
| track = {'type': 'track', 'event': 'python event', 'userId': 'userId'} | ||
|
|
||
| for status_code in [408, 503]: | ||
| consumer = Consumer(None, 'testsecret', retries=2) | ||
| call_count = 0 | ||
| sleep_durations = [] | ||
|
|
||
| def mock_post_fn(*args, **kwargs): | ||
| nonlocal call_count | ||
| call_count += 1 | ||
| if call_count == 1: | ||
| response = mock.Mock() | ||
| response.headers = {'Retry-After': '5'} | ||
| error = APIError(status_code, 'error', 'Error') | ||
| error.response = response | ||
| raise error | ||
| return mock.Mock(status_code=200) | ||
|
|
||
| def mock_sleep(duration): | ||
| sleep_durations.append(duration) | ||
|
|
||
| with mock.patch('segment.analytics.consumer.post', side_effect=mock_post_fn): | ||
| with mock.patch('time.sleep', side_effect=mock_sleep): | ||
| consumer.request([track]) | ||
|
|
||
| # Should use backoff delay (0 for first retry), NOT the Retry-After value of 5 | ||
| self.assertEqual(call_count, 2) | ||
| self.assertEqual(len(sleep_durations), 1) | ||
| self.assertEqual(sleep_durations[0], 0, f'{status_code} should use backoff, not Retry-After') |
There was a problem hiding this comment.
The PR description states "Retry-After Behavior - Respected for status codes: 408, 429, 503", but the implementation only respects Retry-After for 429. Status codes 408 and 503 always use exponential backoff, ignoring any Retry-After header. This test confirms this behavior is intentional. Consider updating the PR description to accurately reflect that only 429 respects Retry-After headers.
- Add KeyError to except clause when parsing JSON response to handle missing 'code' or 'message' keys - Add explanatory comment on pre-existing except-pass pattern Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
requests.Response.__bool__() returns False for non-2xx status codes. The checks `if e.response` and `if response` evaluated to False for 429 responses, so parse_retry_after() was never called and the SDK fell back to normal backoff instead of respecting Retry-After. Changed both checks to explicit `is not None` comparisons. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add on_error handler to capture delivery failures from the SDK. Reports success=false with the first error message when any batch fails (non-retryable error or retries exhausted). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Implements improved HTTP response handling and retry logic to align with changes in analytics-java and analytics-next:
Authorizationheader (Basic auth with write key, OAuth Bearer token)X-Retry-Countheader on all requests (starts at 0)Retry-Afterheader support (capped at 300s, doesn't count against retry budget)backoffdecorator with custom retry loop for better controlStatus Code Handling
Retryable 4xx: 408, 410, 429, 460
Non-retryable 4xx: 400, 401, 403, 404, 413, 422, and all other 4xx
Retryable 5xx: All except 501, 505
Non-retryable 5xx: 501, 505
Retry-After Behavior
Test Coverage
Changes
Modified Files
segment/analytics/consumer.py: Custom retry loop, status code classification, Retry-After supportsegment/analytics/request.py: Authorization header, X-Retry-Count header, parse_retry_after()segment/analytics/client.py: Increased default max_retries to 1000segment/analytics/test/test_consumer.py: 13 new test casessegment/analytics/test/test_request.py: 17 new test casesAlignment
This implementation aligns with:
🤖 Generated with Claude Code