Merged
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,5 +1,7 @@
node_modules/
dist/
.compute-build/
.compute-demo-build/
ds-data/
ds-data-loadtest/
ds-data-loadtest-*/
10 changes: 10 additions & 0 deletions docs/aggregation-rollups.md
@@ -199,6 +199,16 @@ Current response shape:

Current coverage fields:

- `mode`
- `complete`
- `stream_head_offset`
- `visible_through_offset`
- `visible_through_primary_timestamp_max`
- `oldest_omitted_append_at`
- `possible_missing_events_upper_bound`
- `possible_missing_uploaded_segments`
- `possible_missing_sealed_rows`
- `possible_missing_wal_rows`
- `used_rollups`
- `indexed_segments`
- `scanned_segments`
10 changes: 5 additions & 5 deletions docs/alternative-metrics-approach.md
@@ -17,10 +17,10 @@ Primary Axiom sources used here:

Repository sources used for the current Prisma Streams behavior:

- [docs/metrics.md](/Users/sorenschmidt/code/streams/docs/metrics.md)
- [docs/aggregation-rollups.md](/Users/sorenschmidt/code/streams/docs/aggregation-rollups.md)
- [src/metrics.ts](/Users/sorenschmidt/code/streams/src/metrics.ts)
- [src/metrics_emitter.ts](/Users/sorenschmidt/code/streams/src/metrics_emitter.ts)
- [metrics.md](./metrics.md)
- [aggregation-rollups.md](./aggregation-rollups.md)
- [src/metrics.ts](../src/metrics.ts)
- [src/metrics_emitter.ts](../src/metrics_emitter.ts)

## Summary

@@ -156,7 +156,7 @@ model:
Prisma Streams does **not** bill by active time series, but the current internal
metrics path still has active-series pressure in the runtime.

The sharpest example is [src/metrics.ts](/Users/sorenschmidt/code/streams/src/metrics.ts):
The sharpest example is [src/metrics.ts](../src/metrics.ts):

- every distinct metric + tag set becomes a `MetricSeries` in memory
- that map lives for the whole flush interval
76 changes: 15 additions & 61 deletions docs/better-result-adoption.md
@@ -212,69 +212,23 @@ Exit criteria:
- Test suite reflects the Result-first standard.
- Policy checks are enforced by default in CI.

## Full Repository Scope (Current Throw/Catch Inventory)
## Current Guardrail

The following files currently contain `throw new Error` and/or `catch (...)` and are in-scope for migration:
The current enforced inventory is generated by the repository check rather than
kept as a static list in this document:

- `src/app.ts`
- `experiments/bench/routing_key_perf.ts`
- `experiments/bench/segment_cache_perf.ts`
- `experiments/bench/synth.ts`
- `src/bootstrap.ts`
- `src/config.ts`
- `src/db/db.ts`
- `src/db/schema.ts`
- `experiments/demo/common.ts`
- `experiments/demo/live_fields_app.ts`
- `experiments/demo/wal_demo_ingest.ts`
- `experiments/demo/wal_demo_subscribe.ts`
- `src/index/binary_fuse.ts`
- `src/index/indexer.ts`
- `src/index/run_format.ts`
- `src/ingest.ts`
- `src/lens/lens.ts`
- `experiments/loadtests/live/common.ts`
- `experiments/loadtests/live/read_path.ts`
- `experiments/loadtests/live/selective_shedding.ts`
- `experiments/loadtests/live/write_path.ts`
- `src/local/cli.ts`
- `src/local/daemon.ts`
- `src/local/http.ts`
- `src/local/paths.ts`
- `src/local/server.ts`
- `src/local/state.ts`
- `src/memory.ts`
- `src/objectstore/mock_r2.ts`
- `src/objectstore/null.ts`
- `src/objectstore/r2.ts`
- `src/offset.ts`
- `src/reader.ts`
- `src/runtime/hash.ts`
- `src/schema/proof.ts`
- `src/schema/registry.ts`
- `src/segment/format.ts`
- `src/segment/segmenter.ts`
- `src/sqlite/adapter.ts`
- `src/touch/processor_worker.ts`
- `src/touch/live_metrics.ts`
- `src/touch/manager.ts`
- `src/touch/spec.ts`
- `src/touch/worker_pool.ts`
- `src/uploader.ts`
- `src/util/base32_crockford.ts`
- `src/util/bloom256.ts`
- `src/util/duration.ts`
- `src/util/json_pointer.ts`
- `src/util/lru.ts`
- `src/util/retry.ts`
- `src/util/siphash.ts`
- `src/util/time.ts`
- `test/chaos_restart_bootstrap.test.ts`
- `test/ingest_queue_drain.test.ts`
- `test/segmenter_throughput.test.ts`
- `test/touch_processor.test.ts`
- `test/touch_memory_journal.test.ts`
- `test/touch_wait_timeout_reliability.test.ts`
```bash
bun run check:result-policy
```

That check scans `src/**/*.ts` and fails on:

- `throw new Error(...)`
- `.unwrap(...)`

Use the command output as the source of truth for current policy violations.
Tests, scripts, demos, and load-test utilities may still use ordinary thrown
test/process failures where they are not part of runtime error handling.
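
As a rough illustration of the kind of scan described above, the sketch below
checks one file's source text for the two listed patterns. It is not the
repository's `check:result-policy` implementation: `findPolicyViolations` and
`PolicyViolation` are hypothetical names, and a real check would also need to
walk `src/**/*.ts` on disk.

```typescript
// Hypothetical sketch of a Result-policy scan over one file's source text.
// The real `check:result-policy` script may work differently.
interface PolicyViolation {
  file: string;
  line: number; // 1-based line number
  pattern: "throw new Error" | ".unwrap";
}

const POLICY_PATTERNS: ReadonlyArray<{ name: PolicyViolation["pattern"]; regex: RegExp }> = [
  { name: "throw new Error", regex: /\bthrow\s+new\s+Error\s*\(/ },
  { name: ".unwrap", regex: /\.unwrap\s*\(/ },
];

function findPolicyViolations(file: string, source: string): PolicyViolation[] {
  const violations: PolicyViolation[] = [];
  const lines = source.split(/\r?\n/);
  lines.forEach((text, index) => {
    for (const { name, regex } of POLICY_PATTERNS) {
      // The regexes have no `g` flag, so `test` is stateless across lines.
      if (regex.test(text)) violations.push({ file, line: index + 1, pattern: name });
    }
  });
  return violations;
}
```

A line-based text match like this also flags occurrences inside comments and
string literals, which is one reason to treat the command output, not a sketch
like this, as the source of truth.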

## Operational Constraints During Migration

155 changes: 155 additions & 0 deletions docs/bun-memory-risk.md
@@ -0,0 +1,155 @@
# Bun Memory Risk Policy

This document is repository policy for Bun APIs that have shown native memory
retention under sustained Streams workloads. Treat it as authoritative when
changing object-store, fetch-body, file-body, ingest, or background indexing
code.

## Rule

Do not introduce high-volume use of Bun APIs that materialize native-backed
`Blob`, `File`, `S3File`, `ArrayBuffer`, or request/response bodies without a
specific memory investigation.

Prefer explicit streaming APIs and bounded byte budgets. If a code path must
materialize bytes, it must be protected by all of the following:

- a documented size bound
- bounded concurrency
- memory-sampler coverage for RSS, anon RSS, heap, external, and arrayBuffers
- a local 1 GiB Linux-container stress test when the path can run in production

Forced `Bun.gc(true)` is not an acceptable mitigation by itself. The observed
failure mode is RSS, especially anon RSS, staying high while JS heap and
`arrayBuffers` are much lower.
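
To make that gap visible, a sampler has to record process-level RSS next to
the JS-level counters. The following is a minimal sketch, assuming
`process.memoryUsage()` is available in the runtime and that anon RSS is read
from the `RssAnon` line of `/proc/self/status` on Linux. `MemorySample`,
`parseRssAnonBytes`, and `sampleMemory` are hypothetical names, not the
Streams memory sampler.

```typescript
import { readFileSync } from "node:fs";

// Hypothetical sample shape; field names are illustrative only.
interface MemorySample {
  rssBytes: number;
  anonRssBytes: number | null; // null when /proc is unavailable (non-Linux)
  heapUsedBytes: number;
  externalBytes: number;
  arrayBuffersBytes: number;
}

// Parse the "RssAnon:   123456 kB" line from /proc/self/status text.
function parseRssAnonBytes(statusText: string): number | null {
  const match = /^RssAnon:\s+(\d+)\s+kB$/m.exec(statusText);
  return match ? Number(match[1]) * 1024 : null;
}

function sampleMemory(): MemorySample {
  const usage = process.memoryUsage();
  let anonRssBytes: number | null = null;
  try {
    anonRssBytes = parseRssAnonBytes(readFileSync("/proc/self/status", "utf8"));
  } catch {
    // Not on Linux, or /proc is not mounted: leave anon RSS unknown.
  }
  return {
    rssBytes: usage.rss,
    anonRssBytes,
    heapUsedBytes: usage.heapUsed,
    externalBytes: usage.external,
    arrayBuffersBytes: usage.arrayBuffers,
  };
}
```

Comparing `anonRssBytes` against `heapUsedBytes + externalBytes` over time is
one way to spot the retention shape described above.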

## Risky APIs

Avoid these APIs in long-lived, high-volume production paths:

- `Bun.S3Client`
- `Bun.S3File`
- `Bun.S3File.arrayBuffer()`
- `Response.arrayBuffer()` over repeated remote downloads
- `Response.blob()` over repeated remote downloads
- `Bun.file(path).arrayBuffer()`
- `Bun.file(path).bytes()`
- `Bun.file(path).text()` for repeated large files
- `Bun.file(path)` as a fetch upload body in sustained object-store upload paths

Small tests, CLI utilities, and bounded local-only helpers may still use these
APIs, but production server paths must use extra care. If in doubt, treat the
API as unsafe until a memory sampler run proves otherwise.

## Preferred Patterns

For object-store uploads:

- use signed `fetch()` requests instead of `Bun.S3Client`
- stream file uploads with `node:fs.createReadStream()` converted through
`node:stream.Readable.toWeb()`
- keep upload concurrency bounded by the memory preset
- avoid hidden follow-up reads or stats unless required for correctness
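
The upload pattern above can be sketched roughly as follows. This is an
illustration under stated assumptions, not the Streams uploader:
`openUploadStream` and `putFileStreaming` are hypothetical names, `signedUrl`
and `signedHeaders` are assumed to come from a request-signing step that is
not shown, and bounding concurrency is left to the caller.

```typescript
import { createReadStream } from "node:fs";
import { Readable } from "node:stream";

// Open a file as a web ReadableStream so it can be used as a fetch() body
// without first materializing the whole file as a Blob or ArrayBuffer.
function openUploadStream(path: string): ReadableStream<Uint8Array> {
  // The cast bridges the node:stream/web and global ReadableStream typings.
  return Readable.toWeb(createReadStream(path)) as unknown as ReadableStream<Uint8Array>;
}

// Hypothetical upload helper. `signedUrl` and `signedHeaders` are assumed to
// be produced elsewhere by the signing code.
async function putFileStreaming(
  signedUrl: string,
  signedHeaders: Record<string, string>,
  path: string,
): Promise<Response> {
  const init = {
    method: "PUT",
    headers: signedHeaders,
    body: openUploadStream(path),
    duplex: "half", // Node's fetch requires this for streaming request bodies
  };
  return fetch(signedUrl, init as RequestInit);
}
```

The helper performs no extra `stat` or read of the file, in line with the
last bullet above.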

For object-store reads:

- prefer ranged reads or streaming reads
- use `Response.body.getReader()` rather than `Response.arrayBuffer()`
- if the object-store interface must return `Uint8Array`, collect chunks from
the stream reader under a known object/range size limit
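
A minimal sketch of the chunk-collection approach, assuming the caller knows
the maximum object or range size up front. `readBodyBounded` and the
`BoundedRead` result shape are hypothetical names chosen for illustration.

```typescript
// Result-style return so callers handle the oversize case explicitly.
type BoundedRead =
  | { ok: true; bytes: Uint8Array }
  | { ok: false; error: "too_large" };

// Collect a fetch Response body into a Uint8Array through the stream reader,
// without calling Response.arrayBuffer(), and stop once maxBytes is exceeded.
async function readBodyBounded(res: Response, maxBytes: number): Promise<BoundedRead> {
  if (res.body === null) return { ok: true, bytes: new Uint8Array(0) };
  const reader = res.body.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    total += value.byteLength;
    if (total > maxBytes) {
      // Cancel the download instead of buffering past the limit.
      await reader.cancel();
      return { ok: false, error: "too_large" };
    }
    chunks.push(value);
  }
  // Copy the chunks into one contiguous buffer of the exact final size.
  const bytes = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    bytes.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return { ok: true, bytes };
}
```

Peak buffering here is bounded by `maxBytes` plus one in-flight chunk, and the
final copy briefly holds both the chunks and the output buffer.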

For local files:

- prefer `node:fs` streams for large or repeated reads
- use `Bun.mmap()` only for immutable cache files whose pinned mapping is
intentionally tracked as a cache/leak-candidate budget
- avoid repeated `Bun.file().arrayBuffer()`, `bytes()`, or `text()` loops over
large files in the server

For HTTP request bodies:

- keep append bodies capped by `DS_APPEND_MAX_BODY_BYTES`
- keep ingest concurrency and queue bytes bounded
- on low-memory presets, close append keep-alive connections and keep the
post-append GC path throttled and observable
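
One way to picture the body cap and queue-byte bound together is a small
admission check in front of the ingest queue. The sketch below is
illustrative only: `IngestByteBudget` is a hypothetical class, the per-body
cap stands in for a value parsed from `DS_APPEND_MAX_BODY_BYTES`, and the
queue budget is an assumed parameter rather than a documented setting.

```typescript
// Hypothetical admission control for append bodies: reject a request when it
// exceeds the per-body cap or when queued bytes would exceed the queue budget.
class IngestByteBudget {
  private readonly maxBodyBytes: number;
  private readonly maxQueuedBytes: number;
  private queuedBytes = 0;

  constructor(maxBodyBytes: number, maxQueuedBytes: number) {
    this.maxBodyBytes = maxBodyBytes;
    this.maxQueuedBytes = maxQueuedBytes;
  }

  // Returns true and reserves the bytes if the body fits both limits.
  tryAdmit(bodyBytes: number): boolean {
    if (bodyBytes > this.maxBodyBytes) return false;
    if (this.queuedBytes + bodyBytes > this.maxQueuedBytes) return false;
    this.queuedBytes += bodyBytes;
    return true;
  }

  // Call once the append has been drained from the queue.
  release(bodyBytes: number): void {
    this.queuedBytes = Math.max(0, this.queuedBytes - bodyBytes);
  }

  get queued(): number {
    return this.queuedBytes;
  }
}
```

Rejecting at admission time keeps the worst-case buffered bytes equal to the
queue budget, independent of how many connections are open.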

## Streams Evidence

The production symptom was repeated Compute OOM kills during external event
ingestion. The generator was not colocated with the Streams server, so the
memory pressure belonged to the Streams server process and its background
segment/upload/index work.

Production failures had this shape:

- process killed around `809 MiB` anon RSS plus about `51 MiB` shmem RSS
- JS heap, external memory, and tracked application counters did not explain
the RSS high water
- the host clamped a nominal `1024 MB` preset to about `684.9 MiB` of internal
pressure headroom before the kernel killed `bun`

Local reproduction and fixes:

- MockR2 with `300ms` operation latency alone did not reproduce the OOM shape.
A 500k-event run peaked around `340 MB` RSS and `290 MB` anon RSS.
- The R2-compatible path using Bun's native S3 implementation against MinIO did
reproduce production-shaped pressure. A 900k-event, 1 GiB Linux-container run
reached about `850.9 MB` RSS and `834.4 MB` anon RSS during background
companion catch-up.
- Replacing `Bun.S3Client` / `S3File` with signed `fetch()` R2 requests dropped
the same class of run to about `533.8 MB` RSS and `506.1 MB` anon RSS.
- Removing the remaining `Bun.file(path)` upload body and
`Response.arrayBuffer()` R2 reads reduced the streamed R2 path further. A
900k-event, 1 GiB Linux-container run peaked at about `474.8 MB` RSS and
`461.2 MB` anon RSS, with cgroup `memory.peak` about `637.1 MB`, and settled
near `204.7 MB` RSS and `188.9 MB` anon RSS.

Interpretation: R2 latency can increase overlap between upload and background
work, but the decisive local reproduction came from Bun native S3/body
materialization. Avoiding the Bun S3 API and avoiding remaining Blob/File
`arrayBuffer` paths materially reduced anon RSS.

## Linked Bun Issues

These issues were open or still relevant when this document was written on
2026-04-25. Re-check status before removing any guardrail.

- [oven-sh/bun#29083](https://github.com/oven-sh/bun/issues/29083):
`Bun.S3File.arrayBuffer()` retains RSS and reaches OOM in a 1 GiB Linux
container despite forced GC. This is the closest public repro to the Streams
R2 failure.
- [oven-sh/bun#28741](https://github.com/oven-sh/bun/issues/28741):
fetch `Blob` / `ArrayBuffer` memory is not reclaimed after references are
cleared and GC is forced.
- [oven-sh/bun#20487](https://github.com/oven-sh/bun/issues/20487):
large file downloads through `@google-cloud/storage` and Bun S3 keep RSS high
after GC; the reporter observed Node returning closer to baseline while Bun
accumulated RSS.
- [oven-sh/bun#28427](https://github.com/oven-sh/bun/issues/28427):
a report of simple repeated `fetch` polling, labeled as a Bun memory leak and
awaiting triage.
- [oven-sh/bun#15020](https://github.com/oven-sh/bun/issues/15020):
repeated file reads with `node:fs` and `Bun.File` reported as memory not being
freed.
- [oven-sh/bun#12941](https://github.com/oven-sh/bun/issues/12941):
earlier Blob/ArrayBuffer GC-retention report. This one was closed as not
planned, but it is relevant history because later open reports describe the
same retention class.

## Review Checklist

Before merging a change that touches body, file, fetch, or object-store code,
check:

- Does the change add `Bun.S3Client`, `Bun.S3File`, `Bun.file()`, `.blob()`, or
`.arrayBuffer()` to a hot path?
- If bytes are materialized, what is the maximum size and concurrency?
- Is the memory visible in `GET /v1/server/_mem` or
`DS_MEMORY_SAMPLER_PATH` output?
- Has the path been tested in a memory-limited Linux container when it can run
on Compute?
- If RSS/anon RSS remains high after work completes, did heap, external,
`arrayBuffers`, SQLite stats, active jobs, ingest queue bytes, and index or
companion phases explain it?

If the answer is unclear, use the streaming alternative first.
11 changes: 9 additions & 2 deletions docs/bundled-companion-and-backfill.md
@@ -23,13 +23,20 @@ For a sealed uploaded segment, the steady-state published objects are:

The `.cix` may contain any subset of:

- `exact`
- `col`
- `fts`
- `agg`
- `mblk`

The exact secondary index family remains separate because it is a compacted
cross-segment accelerator, not a per-segment section family.
The exact secondary index family remains separate from `.exact`: secondary
exact runs are compacted cross-segment accelerators, while `.exact` is the
per-segment doc-level postings section.

Decoded section views are cached in memory by companion object key, plan
generation, and section kind. The cache is bounded by
`DS_SEARCH_COMPANION_SECTION_CACHE_BYTES`; raw immutable `.cix` objects remain
managed by the local companion file cache.
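
A byte-bounded cache with this key shape could look roughly like the sketch
below. It is an assumption-heavy illustration: `SectionCache` is a
hypothetical class, the byte limit stands in for
`DS_SEARCH_COMPANION_SECTION_CACHE_BYTES`, and least-recently-used eviction
is assumed, since the text above only states that the cache is bounded.

```typescript
// Section kinds as listed for the `.cix` object above.
type SectionKind = "exact" | "col" | "fts" | "agg" | "mblk";

interface CachedSection {
  view: unknown; // decoded section view
  bytes: number; // estimated in-memory size
}

// Hypothetical byte-bounded cache; LRU eviction is an assumption.
class SectionCache {
  private readonly entries = new Map<string, CachedSection>();
  private readonly maxBytes: number;
  private usedBytes = 0;

  constructor(maxBytes: number) {
    this.maxBytes = maxBytes;
  }

  // Compose the key from companion object key, plan generation, and kind.
  static key(objectKey: string, planGeneration: number, kind: SectionKind): string {
    return `${objectKey}\u0000${planGeneration}\u0000${kind}`;
  }

  get(key: string): unknown {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    // Re-insert to mark as most recently used (Map keeps insertion order).
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.view;
  }

  set(key: string, view: unknown, bytes: number): void {
    const previous = this.entries.get(key);
    if (previous !== undefined) {
      this.usedBytes -= previous.bytes;
      this.entries.delete(key);
    }
    if (bytes > this.maxBytes) return; // too large to cache at all
    this.entries.set(key, { view, bytes });
    this.usedBytes += bytes;
    // Evict least recently used entries until back under the budget.
    while (this.usedBytes > this.maxBytes) {
      const oldest = this.entries.keys().next();
      if (oldest.done) break;
      const evicted = this.entries.get(oldest.value);
      this.entries.delete(oldest.value);
      if (evicted !== undefined) this.usedBytes -= evicted.bytes;
    }
  }

  get size(): number {
    return this.usedBytes;
  }
}
```

Including the plan generation in the key means a new plan never reads a view
decoded under an older one; stale entries simply age out under the byte
budget.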

## Why Bundle Companions
