storage: fix stale chunks handling on limits#11632
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds helpers and projected write-size estimation, implements a two-pass local space-release strategy for input chunks, updates release/placement APIs to use projected sizes, and adds a test verifying deletable files are preferred when storage limits are hit. ChangesChunk space management
Tests — deletion preference
Sequence Diagram(s)sequenceDiagram
participant Input as flb_input_chunk
participant Router as Router/Outputs
participant FS as chunkio(cio_file)
Input->>Input: compute projected write size
Input->>Router: inspect routes_mask / other routes
Router-->>Input: route membership info
Input->>FS: request/release required_space
alt local scope, pass 0
Input->>Input: evaluate prefers physical delete?
opt skip physical deletion
Input-->>FS: defer unlink (no physical delete)
end
else pass 1 or global
Input-->>FS: unlink/delete candidate chunk(s)
end
FS-->>Input: released_space
Input->>FS: place new chunk using projected placement_size
FS-->>Input: confirm write/placement
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/flb_input_chunk.c (1)
694-733: Document the magic number 24 for chunk header overhead.The value 24 corresponds to the chunk file header structure (2 header bytes + 4 CRC32 + 16 padding + 2 metadata length bytes), as documented in
flb_input_chunk_get_real_size(lines 522-526). Consider either referencing that constant or adding a similar comment here for maintainability.meta_size = (size_t) cio_meta_size(ic->chunk); /* See https://github.com/edsiper/chunkio#file-layout for header overhead: * 2 (header) + 4 (CRC32) + 16 (padding) + 2 (metadata length) = 24 bytes */ logical_size = (size_t) flb_input_chunk_get_size(ic) + meta_size + 24;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/flb_input_chunk.c` around lines 694 - 733, The code in flb_input_chunk_get_projected_write_size uses the magic number 24 for chunk header overhead without explanation; update the function to document or reference that constant by adding a brief comment explaining the 24 bytes breakdown (2 header bytes + 4 CRC32 + 16 padding + 2 metadata length bytes) or use the same named constant used by flb_input_chunk_get_real_size if one exists; reference flb_input_chunk_get_projected_write_size and flb_input_chunk_get_real_size so reviewers can locate and align the header-overhead handling and ensure maintainability.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/flb_input_chunk.c`:
- Around line 694-733: The code in flb_input_chunk_get_projected_write_size uses
the magic number 24 for chunk header overhead without explanation; update the
function to document or reference that constant by adding a brief comment
explaining the 24 bytes breakdown (2 header bytes + 4 CRC32 + 16 padding + 2
metadata length bytes) or use the same named constant used by
flb_input_chunk_get_real_size if one exists; reference
flb_input_chunk_get_projected_write_size and flb_input_chunk_get_real_size so
reviewers can locate and align the header-overhead handling and ensure
maintainability.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c3e81c91-1a73-4a0f-8e0b-9685008fa15a
📒 Files selected for processing (2)
src/flb_input_chunk.ctests/internal/input_chunk.c
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0723c308b7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/flb_input_chunk.c (1)
75-91: Consider directly including<chunkio/cio_file.h>instead of mirroring the struct.The struct
flb_input_chunk_cio_file_viewmirrors fields fromcio_file, which is defined in the public headerlib/chunkio/include/chunkio/cio_file.h. This header is not private—it's part of chunkio's public API and is directly included in chunkio's own tests and source files (e.g.,lib/chunkio/tests/fs.c,lib/chunkio/tests/fs_windows.c). Fluent-bit could simplify this by including<chunkio/cio_file.h>and usingstruct cio_filedirectly, as chunkio itself does, rather than maintaining a duplicate struct definition. If there is a specific reason to avoid this include (e.g., build isolation), document it in the comment.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/flb_input_chunk.c` around lines 75 - 91, Replace the mirrored struct definition flb_input_chunk_cio_file_view with a direct use of the public struct cio_file by including the public header <chunkio/cio_file.h> and update any casts/usages that referenced flb_input_chunk_cio_file_view to use struct cio_file instead; if including the header is intentionally avoided for build/isolation reasons, add a clear comment above the mirrored struct explaining why and referencing that decision so the duplicate layout is documented.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/flb_input_chunk.c`:
- Around line 756-764: The loop that increments projected_size by realloc_size
can overflow for very large append_size; replace the iterative loop in
flb_input_chunk.c with a safe calculation: first check for overflow when
computing target = logical_size + append_size, then compute needed = (target >
projected_size) ? target - projected_size : 0; compute increments = (needed +
realloc_size - 1) / realloc_size and add projected_size += increments *
realloc_size while guarding the multiplication/addition for overflow; finally
round projected_size up to page_size with an overflow check — reference
variables projected_size, alloc_size, realloc_size, logical_size, append_size,
and page_size to locate and update the logic.
---
Nitpick comments:
In `@src/flb_input_chunk.c`:
- Around line 75-91: Replace the mirrored struct definition
flb_input_chunk_cio_file_view with a direct use of the public struct cio_file by
including the public header <chunkio/cio_file.h> and update any casts/usages
that referenced flb_input_chunk_cio_file_view to use struct cio_file instead; if
including the header is intentionally avoided for build/isolation reasons, add a
clear comment above the mirrored struct explaining why and referencing that
decision so the duplicate layout is documented.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/flb_input_chunk.c (1)
75-94: Fragile struct mirroring pattern - ensure synchronization with chunkio.The mirror struct must exactly match the leading layout of
struct cio_fileinlib/chunkio/include/chunkio/cio_file.h. Currently, the field order appears correct based on the chunkio header. However, this pattern is inherently fragile:
- Any reordering or insertion of fields in the original struct will silently corrupt reads
- Platform-specific padding differences could cause misalignment
Consider adding a static assertion or compile-time check to validate the struct size/offsets match expectations, or document a version dependency on chunkio.
#!/bin/bash # Verify the mirror struct matches the original cio_file struct layout # by checking the field order in the chunkio header echo "=== Checking cio_file struct definition in chunkio ===" fd -t f 'cio_file.h' --exec cat {} \; 2>/dev/null | head -60 echo "" echo "=== Checking if there are any other references to cio_file fields ===" rg -n 'struct cio_file' --type c -A 20 | head -80🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/flb_input_chunk.c` around lines 75 - 94, The mirrored struct flb_input_chunk_cio_file_view must be validated against chunkio's real struct cio_file to avoid silent corruption; add compile-time assertions that sizeof(fl b_input_chunk_cio_file_view) equals sizeof(struct cio_file) and that offsetof for key fields (fd, flags, synced, allocate_strategy, fs_size, data_size, page_size, alloc_size, realloc_size) match the corresponding offsetof(struct cio_file, ...); implement these checks using static_assert/_Static_assert or equivalent macros (and guard them with `#ifdef` that includes chunkio/cio_file.h when available), or document a strict chunkio version requirement if the header cannot be included. Ensure the checks are named/placed with references to flb_input_chunk_cio_file_view and cio_file so failures point to the mismatch.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/flb_input_chunk.c`:
- Around line 798-805: flb_input_chunk_get_real_size can return -1 on error but
its result is cast to size_t into current_size causing unsigned wrap; in the
block that sets current_size when ic->fs_counted == FLB_TRUE, call
flb_input_chunk_get_real_size(ic) into a signed variable (e.g. ssize_t real =
flb_input_chunk_get_real_size(ic)), check for real == -1 and handle the error
path (either treat current_size as 0, log/propagate the error, or return an
error code) before casting to size_t; update the calculation using that
validated value so projected_size - current_size cannot underflow unexpectedly.
- Around line 389-400: flb_input_chunk_get_real_size() can return -1 on error;
before using its result in the FLB_INPUT_CHUNK_RELEASE_SCOPE_LOCAL branch,
validate chunk_size != -1 and handle the error path: when chunk_size == -1, log
or handle the failure and skip subtracting from output_plugin->fs_chunks_size
and skip adding to released_space (or treat chunk_size as 0) to avoid
underflow/incorrect accounting. Update the block that references chunk_size,
old_input_chunk, output_plugin->fs_chunks_size, and released_space to perform
this check and safe handling.
---
Nitpick comments:
In `@src/flb_input_chunk.c`:
- Around line 75-94: The mirrored struct flb_input_chunk_cio_file_view must be
validated against chunkio's real struct cio_file to avoid silent corruption; add
compile-time assertions that sizeof(fl b_input_chunk_cio_file_view) equals
sizeof(struct cio_file) and that offsetof for key fields (fd, flags, synced,
allocate_strategy, fs_size, data_size, page_size, alloc_size, realloc_size)
match the corresponding offsetof(struct cio_file, ...); implement these checks
using static_assert/_Static_assert or equivalent macros (and guard them with
`#ifdef` that includes chunkio/cio_file.h when available), or document a strict
chunkio version requirement if the header cannot be included. Ensure the checks
are named/placed with references to flb_input_chunk_cio_file_view and cio_file
so failures point to the mismatch.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
|
The issue is still present, the storage keeps growing and other error message is in the picture now: [2026/04/17 10:10:34.204] [error] [storage] cannot mmap/read chunk './tmp/fluentbit_storage/dummy.0/15031-1776435034.203997000.flb'This is reproducible with the same config previously shared. |
|
When a chunk is successfully delivered, the routes_mask bit is not being removed, hence I've added some debug options on top of this branch. If you run the shared config, you'll see this message among others: In this case, the data was already delivered to Patch with annotationsdiff --git a/include/fluent-bit/flb_task.h b/include/fluent-bit/flb_task.h
index de7720d19..5ce332eff 100644
--- a/include/fluent-bit/flb_task.h
+++ b/include/fluent-bit/flb_task.h
@@ -169,6 +169,13 @@ static inline void flb_task_users_release(struct flb_task *task)
if (task->users == 0 && mk_list_size(&task->retries) == 0) {
flb_task_destroy(task, FLB_TRUE);
}
+ else {
+ flb_debug("[task] task id=%i NOT destroyed: users=%i, pending "
+ "retries=%i — chunk file will stay on disk until both "
+ "reach zero",
+ task->id, task->users,
+ (int) mk_list_size(&task->retries));
+ }
}
/* Increase the counter for users */
diff --git a/src/flb_engine.c b/src/flb_engine.c
index c2786a1a2..15562b925 100644
--- a/src/flb_engine.c
+++ b/src/flb_engine.c
@@ -439,6 +439,16 @@ static inline int handle_output_event(uint64_t ts,
flb_task_retry_clean(task, ins);
flb_task_users_dec(task, FLB_TRUE);
+
+ flb_debug("[engine] output '%s' OK for chunk '%s' (task_id=%i): "
+ "task now has %i user(s) and %i retry/retries pending — "
+ "routes_mask bits for ALL outputs remain set until task "
+ "is fully destroyed (users==0 && retries==0)",
+ out_name,
+ flb_input_chunk_get_name(task->ic),
+ task_id,
+ task->users,
+ mk_list_size(&task->retries));
}
else if (ret == FLB_RETRY) {
if (ins->retry_limit == FLB_OUT_RETRY_NONE) {
diff --git a/src/flb_input_chunk.c b/src/flb_input_chunk.c
index a704c29bb..1945cbc51 100644
--- a/src/flb_input_chunk.c
+++ b/src/flb_input_chunk.c
@@ -354,7 +354,18 @@ static int flb_input_chunk_release_space(
pass_limit = 2;
}
+ flb_debug("[input chunk] release_space: output='%s' scope=%s "
+ "required=%zd bytes",
+ flb_output_name(output_plugin),
+ (release_scope == FLB_INPUT_CHUNK_RELEASE_SCOPE_LOCAL)
+ ? "LOCAL" : "GLOBAL",
+ *required_space);
+
for (pass = 0; pass < pass_limit; pass++) {
+ flb_debug("[input chunk] release_space: starting pass %d/%d for "
+ "output='%s'", pass, pass_limit - 1,
+ flb_output_name(output_plugin));
+
mk_list_foreach_safe(input_chunk_iterator, input_chunk_iterator_tmp,
&input_plugin->chunks) {
old_input_chunk = mk_list_entry(input_chunk_iterator,
@@ -376,6 +387,11 @@ static int flb_input_chunk_release_space(
flb_input_chunk_prefers_physical_delete(old_input_chunk,
output_plugin,
release_scope) == FLB_FALSE) {
+ flb_debug("[input chunk] release_space pass 0: skipping "
+ "chunk '%s' for output='%s' — has other routes or "
+ "task still active; will retry on pass 1",
+ flb_input_chunk_get_name(old_input_chunk),
+ flb_output_name(output_plugin));
continue;
}
@@ -402,6 +418,14 @@ static int flb_input_chunk_release_space(
old_input_chunk->routes_mask,
input_plugin->config->router);
+ flb_debug("[input chunk] release_space LOCAL: cleared "
+ "routes_mask bit for output='%s' on chunk '%s'; "
+ "routes_mask now %s — physical file will %s",
+ flb_output_name(output_plugin),
+ flb_input_chunk_get_name(old_input_chunk),
+ chunk_destroy_flag ? "empty" : "NOT empty (other outputs still hold bits)",
+ chunk_destroy_flag ? "be deleted" : "remain on disk");
+
chunk_released = FLB_TRUE;
}
else if (release_scope == FLB_INPUT_CHUNK_RELEASE_SCOPE_GLOBAL) {
@@ -465,6 +489,15 @@ static int flb_input_chunk_release_space(
chunk_released = FLB_TRUE;
}
+ else {
+ flb_debug("[input chunk] chunk '%s' routes_mask is "
+ "empty but task id=%i still has %i user(s) "
+ "— cannot destroy task/chunk yet (a "
+ "coroutine is in-flight for this chunk)",
+ flb_input_chunk_get_name(old_input_chunk),
+ old_input_chunk->task->id,
+ old_input_chunk->task->users);
+ }
}
else {
flb_debug("[input chunk] drop chunk %s with no output route from input plugin %s",
@@ -636,6 +669,9 @@ static int flb_input_chunk_is_task_safe_delete(struct flb_task *task)
}
if (task->users != 0) {
+ flb_debug("[input chunk] task id=%i cannot be safely deleted: "
+ "%i user(s) still active (coroutines in-flight or retries "
+ "re-acquiring the task)", task->id, task->users);
return FLB_FALSE;
}
@@ -681,6 +717,14 @@ static int flb_input_chunk_has_other_routes(struct flb_input_chunk *ic,
if (flb_routes_mask_get_bit(ic->routes_mask,
candidate->id,
ic->in->config->router) != 0) {
+ flb_debug("[input chunk] chunk '%s': output '%s' (id=%i) still "
+ "holds a routes_mask bit — physical delete blocked for "
+ "output '%s' (note: this bit is set at chunk creation "
+ "and cleared only when all outputs finish or on eviction, "
+ "NOT on successful delivery)",
+ flb_input_chunk_get_name(ic),
+ flb_output_name(candidate), candidate->id,
+ flb_output_name(o_ins));
return FLB_TRUE;
}
} |
…ction Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
…egressions Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
…iction Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
5058a48 to
6f20d54
Compare
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
|
@lecaros branch updated |
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
Bug Fixes
Tests