bug: PHP-FPM worker crashes with heap corruption in _dd_curl_reset_headers (1.18.0) #3917

@masak1yu

Bug report

Summary

PHP-FPM worker processes crash with SIGABRT due to heap corruption in the background writer thread (`_dd_writer_loop`) with dd-trace-php 1.18.0. Each crash causes in-flight HTTP requests to be dropped with a 504 Gateway Timeout from the upstream reverse proxy.

Environment

  • dd-trace-php version: 1.18.0
  • PHP: 7.4 NTS, PHP-FPM (standard process manager, not FrankenPHP / not ZTS)
  • OS: Linux x86_64 (glibc)
  • libcurl: system libcurl (`/lib64/libcurl.so.4`)
  • Rollback version that resolved the issue: 1.0.0

Symptom

PHP-FPM worker processes terminate with `SIGABRT` under production load. The crash is triggered by glibc detecting heap corruption (`malloc_printerr`) when the background writer thread calls `curl_slist_free_all` on the stored HTTP header list (`writer->headers`).

The reverse proxy in front of PHP-FPM (nginx) reports each crashed worker as a `504 Gateway Timeout` (`upstream timed out, 110: Connection timed out`), so the request that worker was handling is interrupted mid-flight.

Coredump stack trace

#0  0x00007f59ee354690 in raise () from /lib64/libpthread.so.0
#1  0x00007f59dfa617b0 in libdd_crashtracker::collector::signal_handler_manager::chain_signal_handler ()
    at libdatadog/libdd-crashtracker/src/collector/signal_handler_manager.rs:125
#2  libdd_crashtracker::collector::crash_handler::handle_posix_sigaction () at libdatadog/libdd-crashtracker/src/collector/crash_handler.rs:209
#3  <signal handler called>
#4  0x00007f59ee594ae0 in raise () from /lib64/libc.so.6
#5  0x00007f59ee595f88 in abort () from /lib64/libc.so.6
#6  0x00007f59ee5d4b94 in __libc_message () from /lib64/libc.so.6
#7  0x00007f59ee5da80a in malloc_printerr () from /lib64/libc.so.6
#8  0x00007f59ee5dad16 in munmap_chunk () from /lib64/libc.so.6
#9  0x00007f59eb30021d in curl_slist_free_all () from /lib64/libcurl.so.4
#10 0x00007f59df6512d8 in _dd_curl_reset_headers (writer=<optimized out>)
    at /go/src/github.com/DataDog/apm-reliability/dd-trace-php/tmp/build_extension/ext/coms.c:928
#11 0x00007f59df6521db in _dd_curl_send_stack (metrics=0x7f59d4f1e990, stack=<optimized out>, writer=0x7f59e0215960)
    at /go/src/github.com/DataDog/apm-reliability/dd-trace-php/tmp/build_extension/ext/coms.c:1057
#12 _dd_writer_loop (_=<optimized out>) at /go/src/github.com/DataDog/apm-reliability/dd-trace-php/tmp/build_extension/ext/coms.c:1268
#13 0x00007f59ee34a40b in start_thread () from /lib64/libpthread.so.0
#14 0x00007f59ee64de7f in clone () from /lib64/libc.so.6

Impact

Crashes occurred intermittently over multiple days. Each crash dropped an in-flight request, and dozens of core dump files accumulated across multiple web hosts.

Workaround

Rolling back to 1.0.0 resolved the issue immediately.

Notes

  • `DD_TRACE_CLI_ENABLED=false` and `DD_TRACE_SIDECAR_TRACE_SENDER=false` were already set in the environment at the time the crash occurred. These flags are documented to suppress the background writer thread for CLI, but the crash happened in PHP-FPM (web) regardless. The reason this configuration did not prevent the crash is unknown.
  • Code inspection of `coms.c` shows that `writer->headers` is declared as `_Atomic(struct curl_slist *)` (line 412) but is written with a plain non-atomic assignment in `_dd_curl_set_headers` (line 955).
  • `dd_agent_curl_headers` (the global slist used to seed per-request headers) is read from the writer thread without a lock, while it can be freed or reassigned from the main PHP thread (e.g., `ddtrace_coms_curl_shutdown`). A race between these two accesses is a candidate root cause for the heap corruption.
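
To make the suspected race concrete, here is a minimal sketch of one way the two accesses could be serialized. It is not the actual `coms.c` code: `struct slist` is a hypothetical stand-in for `struct curl_slist` so the example builds without libcurl, and `writer_reset_headers`, `writer_set_headers`, `seed_shutdown`, `seed_lock` and `seed_headers` are illustrative names only. One caveat on the note above: in C11, a plain assignment to an `_Atomic` object is still a sequentially consistent atomic store, so the assignment by itself may not be the defect. The more plausible gap is that reading the old pointer and freeing it are separate steps, and that the global seed list is walked without any lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct curl_slist (assumption, not libcurl). */
struct slist {
    char *data;
    struct slist *next;
};

/* Counts freed nodes, only so the behaviour of the sketch can be observed. */
static atomic_int slist_nodes_freed;

/* Append a copy of s; returns the list head, or NULL on allocation failure. */
static struct slist *slist_append(struct slist *list, const char *s) {
    assert(s != NULL);
    struct slist *node = malloc(sizeof *node);
    if (!node) return NULL;
    size_t len = strlen(s) + 1;
    node->data = malloc(len);
    if (!node->data) { free(node); return NULL; }
    memcpy(node->data, s, len);
    node->next = NULL;
    if (!list) return node;
    struct slist *tail = list;
    while (tail->next) tail = tail->next;
    tail->next = node;
    return list;
}

/* Free every node; a NULL list is a no-op. */
static void slist_free_all(struct slist *list) {
    while (list) {
        struct slist *next = list->next;
        free(list->data);
        free(list);
        atomic_fetch_add(&slist_nodes_freed, 1);
        list = next;
    }
}

/* Hypothetical writer state, loosely modelled on `writer->headers`. */
struct writer {
    _Atomic(struct slist *) headers;
};

/* Global seed list; every access in this sketch happens under seed_lock. */
static pthread_mutex_t seed_lock = PTHREAD_MUTEX_INITIALIZER;
static struct slist *seed_headers;

/* Detach the list with one atomic exchange, then free it. Only the caller
 * that receives the non-NULL pointer frees it, so a double free is ruled out. */
static void writer_reset_headers(struct writer *w) {
    struct slist *old = atomic_exchange(&w->headers, (struct slist *)NULL);
    slist_free_all(old);
}

/* Copy the seed list under the lock, then publish the private copy. */
static void writer_set_headers(struct writer *w) {
    struct slist *copy = NULL;
    pthread_mutex_lock(&seed_lock);
    for (struct slist *n = seed_headers; n; n = n->next) {
        struct slist *tmp = slist_append(copy, n->data);
        if (!tmp) break; /* allocation failed: keep what was copied so far */
        copy = tmp;
    }
    pthread_mutex_unlock(&seed_lock);
    slist_free_all(atomic_exchange(&w->headers, copy));
}

/* Shutdown path: detach the seed list under the same lock, free it outside. */
static void seed_shutdown(void) {
    pthread_mutex_lock(&seed_lock);
    struct slist *old = seed_headers;
    seed_headers = NULL;
    pthread_mutex_unlock(&seed_lock);
    slist_free_all(old);
}
```

With this shape, the writer thread only ever frees a list it privately owns, and the shutdown path can free the seed list without a reader walking it at the same time. Whether the real extension can take a lock on this path, or needs a different ownership scheme, is for the maintainers to judge.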
