fix(exporter/otlp/http): log error details from response on export failure#5155
fix(exporter/otlp/http): log error details from response on export failure#5155grvmishra788 wants to merge 17 commits into
Conversation
| return resp.reason | ||
|
|
||
| content_type = ( | ||
| resp.headers.get("Content-Type", "").split(";", 1)[0].strip().lower() |
There was a problem hiding this comment.
why do we split on semi-colon, shouldn't we do it on comma ? https://requests.readthedocs.io/en/latest/user/quickstart/#response-headers -- mentions comma
There was a problem hiding this comment.
The semicolon split is intentional here. Per https://www.rfc-editor.org/rfc/rfc9110#section-8.3.1, the Content-Type header uses semicolons to separate the media type from parameters like charset:
Content-Type: application/x-protobuf; charset=utf-8
I believe the comma behavior you linked is about how requests combines multiple values of the same header into a single string - that's a different concept. Since Content-Type only has one value with optional ;-delimited parameters, splitting on ; is correct to extract just the media type portion.
|
@DylanRussell One of the earlier CI checks failed |
Now, a different check failed: |
|
@grvmishra788 I've merged main into your branch, hopefully we'll see a clean run this time. |
|
Thanks for the PR! Just a heads-up: we no longer update Please add the appropriate changelog fragment for this change instead of editing |
|
The CI failure is a transient infrastructure issue unrelated to this PR - the job fails at the Python 3.14 setup step itself. Every other 3.14 Windows job in the same run passed with identical configuration. This PR doesn't touch Could an admin re-run the failed job and see what's going wrong? This PR has been in review for a while and already has approvals from @amir-h-rassafi and @DylanRussell - would love to get it merged. Thanks! |
Description
Fixes #4526.
Before: When an OTLP HTTP export fails (non-2xx response), all three exporters (trace, metrics, logs) log only
resp.reason— the HTTP reason phrase (e.g."Bad Request"). The response body is never read, so server-provided error details (e.g. a protobuf-encoded error fromtelemetry.googleapis.com) are silently discarded.After: On a non-2xx response, exporters parse the body using the
Content-Typeheader before logging:application/x-protobufgoogle.rpc.Status; extractmessagefieldapplication/jsonpartialSuccess.errorMessageormessage(google.rpc.Status)resp.text.strip()resp.reason(previous behaviour, unchanged)Type of change
How Has This Been Tested?
The fix was verified end-to-end by constructing a
400response with a protobuf-encodedExportTraceServiceResponsebody (containingpartial_success.error_message = "Request contains too many spans (limit: 10000)") and a mocked OTLP endpoint (via theresponseslibrary). Before the fix, the logged reason was"Bad Request"; after, it is the server-provided message. The same was confirmed for a JSONgoogle.rpc.Statusbody and for an empty body (fallback toresp.reason, no regression).Run from the repo root:
test_response_body_parsing.py— cover every branch of_parse_response_body: protobuf with/without error message, JSONpartialSuccessandmessagefields,; charset=utf-8content-type variants, unknown content-type, empty body, malformed protobuf, malformed JSON.test_proto_span_exporter.pyandtest_proto_log_exporter.py— verify the parsed message appears in the error log on a real export call.Does This PR Require a Contrib Repo Change?
Checklist: