
feat: scheduled database snapshots #37

Merged

tac0turtle merged 11 commits into main from issue-23 on Apr 8, 2026

Conversation


@pthmas pthmas commented Mar 19, 2026

Summary

  • Adds opt-in daily pg_dump snapshots as a background tokio task, following the same spawn pattern as the indexer and metadata fetcher
  • Configurable via SNAPSHOT_ENABLED, SNAPSHOT_TIME (HH:MM UTC), SNAPSHOT_RETENTION (max files kept), and SNAPSHOT_DIR
  • Automatic cleanup of old snapshots beyond the retention count
  • Docker image updated with postgresql16-client for pg_dump; compose file wired with env vars and volume mount

Closes #23
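
The opt-in, env-driven configuration described above comes down to validating a couple of strings. A minimal sketch of that parsing logic; the function names below are illustrative stand-ins, not the PR's actual API:

```rust
use std::env;

/// Parse an "HH:MM" string (interpreted as UTC) into (hour, minute).
/// Hypothetical stand-in for the PR's SNAPSHOT_TIME validation.
fn parse_snapshot_time(s: &str) -> Option<(u8, u8)> {
    let (h, m) = s.split_once(':')?;
    let (h, m): (u8, u8) = (h.parse().ok()?, m.parse().ok()?);
    (h < 24 && m < 60).then_some((h, m))
}

/// Opt-in flag: snapshots stay disabled unless SNAPSHOT_ENABLED=true.
fn snapshot_enabled() -> bool {
    env::var("SNAPSHOT_ENABLED").map(|v| v == "true").unwrap_or(false)
}
```

Rejecting malformed values at startup (rather than inside the scheduler loop) is what lets the feature fail fast when misconfigured.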

Summary by CodeRabbit

  • New Features

    • Optional daily PostgreSQL snapshot scheduler with UTC scheduling, retention, and configurable storage when enabled.
  • Documentation

    • Added environment variables to configure snapshots (enable, time, retention, container/host paths, UID/GID).
  • Chores

    • Container image and compose defaults updated for snapshot storage and non-root mount permissions; local snapshot dir added to .gitignore. Dev tooling updated.
  • Tests

    • Added unit and integration tests for config, scheduler, retention, and a dump/restore round-trip (ignored by default).
  • Other

    • CLI faucet cooldown now accepts blank/string-style input and is validated.
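
The blank-tolerant cooldown parsing mentioned above can be sketched as follows; `parse_cooldown` is a hypothetical name, not the PR's actual function:

```rust
/// Treat an unset or blank cooldown value as "not configured";
/// anything else must parse as whole seconds. Sketch only.
fn parse_cooldown(raw: Option<&str>) -> Result<Option<u64>, String> {
    match raw.map(str::trim) {
        None | Some("") => Ok(None),
        Some(s) => s
            .parse::<u64>()
            .map(Some)
            .map_err(|e| format!("invalid cooldown {s:?}: {e}")),
    }
}
```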


coderabbitai bot commented Mar 19, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a configurable, time-based PostgreSQL snapshot scheduler with retention cleanup, CLI/config parsing updates, Docker and docker-compose support for snapshot storage, unit and integration tests (including an ignored dump/restore round-trip), and minor build/test tooling changes.

Changes

  • Env example (.env.example): Added commented snapshot env entries: SNAPSHOT_ENABLED, SNAPSHOT_TIME, SNAPSHOT_RETENTION, SNAPSHOT_DIR, SNAPSHOT_HOST_DIR, and optional UID/GID for bind mounts.
  • Docker & compose (backend/Dockerfile, docker-compose.yml): Install postgresql16-client; create /snapshots owned by atlas (UID/GID 1000); add snapshot-related env vars and host mount; run the container under ${UID}:${GID}.
  • Server config & CLI (backend/crates/atlas-server/src/config.rs, backend/crates/atlas-server/src/cli.rs): Added SnapshotConfig::from_env(...) with parsing/validation and redacted Debug; changed faucet cooldown parsing to treat blank as optional (string-typed); added unit tests for both.
  • Snapshot runtime (backend/crates/atlas-server/src/snapshot.rs, backend/crates/atlas-server/src/main.rs): New scheduler module implementing UTC scheduling, pg_dump to a temp file plus atomic rename, retry backoff, and retention cleanup; main now conditionally spawns the snapshot loop and exposes PostgresConnectionConfig and async/portable pg command builders (pub(crate)).
  • Tests & dev deps (backend/crates/atlas-server/Cargo.toml, backend/crates/atlas-server/tests/integration/*): Added tempfile dev-dependency; added tests/integration/snapshots.rs (ignored, gated on pg_dump/pg_restore), exposed the database_url() helper, and registered the snapshots test module.
  • Misc (.gitignore): Added snapshots/ to ignore local snapshot artifacts.
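
The retention cleanup summarized above amounts to "keep the newest N files". A minimal sketch over plain filename strings; the real code walks the snapshot directory, so `files_to_prune` is a hypothetical helper:

```rust
/// Given snapshot filenames whose timestamp prefix sorts lexicographically,
/// return the ones to delete, keeping the newest `retention`. Sketch only.
fn files_to_prune(mut files: Vec<String>, retention: usize) -> Vec<String> {
    // Skip in-progress `.tmp` files, as the cleanup step does.
    files.retain(|f| !f.ends_with(".tmp"));
    // Newest first: timestamps in names make lexicographic order chronological.
    files.sort_by(|a, b| b.cmp(a));
    // Everything past the retention count is pruned.
    files.split_off(retention.min(files.len()))
}
```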

Sequence Diagram

sequenceDiagram
    participant Scheduler as Scheduler (tokio sleep)
    participant SnapLoop as run_snapshot_loop
    participant PGDump as pg_dump process
    participant FS as File System
    participant Cleanup as cleanup_old_snapshots
    participant Backoff as Retry Backoff

    loop daily schedule
        Scheduler->>SnapLoop: Wake at scheduled UTC time
        SnapLoop->>FS: Ensure snapshot dir exists
        SnapLoop->>PGDump: Spawn pg_dump -> write to `.tmp`
        PGDump->>FS: Write dump data to temp file
        alt pg_dump succeeds
            FS->>FS: Rename `.tmp` -> `.dump` (atomic)
            SnapLoop->>Cleanup: Prune old dumps (keep retention, ignore `.tmp`)
            Cleanup->>FS: List/sort/remove old files
            SnapLoop->>Backoff: Reset retry state
        else pg_dump fails
            SnapLoop->>FS: Remove `.tmp` if present
            SnapLoop->>Backoff: Wait backoff interval (5s,10s,20s,30s,60s)
            Backoff->>SnapLoop: Retry
        end
    end
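
The daily wake-up and the backoff steps in the diagram reduce to two small pure functions. A sketch with hypothetical names, working in seconds-since-midnight rather than the chrono types the PR likely uses:

```rust
/// Seconds to sleep until the next occurrence of `target` (seconds past
/// midnight UTC), given `now` in the same units. If the target has already
/// passed today, it is scheduled for tomorrow.
fn secs_until_next(now: u32, target: u32) -> u32 {
    const DAY: u32 = 24 * 60 * 60;
    (target + DAY - now) % DAY // 0 means "right now"
}

/// The capped retry backoff from the diagram: 5s, 10s, 20s, 30s, then 60s.
fn backoff_secs(attempt: usize) -> u64 {
    const STEPS: [u64; 5] = [5, 10, 20, 30, 60];
    STEPS[attempt.min(STEPS.len() - 1)]
}
```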

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • tac0turtle

Poem

🐇 I wake at UTC three to hum,
I call pg_dump and softly drum.
Temp files hop, then leap to store,
Old crumbs I count and sweep the floor.
Hooray — safe snapshots, snug once more!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 58.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The PR title 'feat: scheduled database snapshots' accurately summarizes the main change: adding automated daily database snapshots via pg_dump as a background service.
  • Linked Issues check: ✅ Passed. The PR fully implements the snapshot feature requirements from #23: automated snapshots (scheduled via SNAPSHOT_TIME) and manual dump/restore capabilities, with configurable retention and storage paths.
  • Out of Scope Changes check: ✅ Passed. All changes are directly related to snapshot functionality. Minor refactorings (PostgresConnectionConfig visibility, cooldown parsing) are necessary to support the snapshot implementation and are appropriately scoped.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@pthmas pthmas marked this pull request as ready for review March 19, 2026 16:35

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (5)
backend/crates/atlas-server/Cargo.toml (1)

48-48: Use workspace-managed versioning for tempfile.

Please avoid pinning tempfile in the crate manifest and reference it via workspace instead, so dependency versions stay centralized.

Suggested change
 [dev-dependencies]
 testcontainers = { workspace = true }
 testcontainers-modules = { workspace = true }
 tokio = { workspace = true }
 tower = { workspace = true, features = ["util"] }
 serde_json = { workspace = true }
 sqlx = { workspace = true }
-tempfile = "3"
+tempfile = { workspace = true }

Based on learnings: Keep all Rust workspace dependency versions in the root Cargo.toml.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/Cargo.toml` at line 48, Replace the hard-pinned
tempfile = "3" entry in this crate's Cargo.toml with a workspace reference by
removing the version and using the workspace-managed dependency (i.e., change
the tempfile entry to reference the workspace dependency without a version
string), so the crate relies on the version declared in the workspace root; look
for the tempfile entry in backend/crates/atlas-server/Cargo.toml and remove the
pinned version to defer to workspace-managed versions.
backend/crates/atlas-server/tests/integration/snapshots.rs (2)

57-57: Fragile database URL substitution.

The replace("/postgres", "/test_restore") pattern assumes the source database is always named postgres and that /postgres doesn't appear elsewhere in the URL (e.g., in a password or hostname). Consider parsing the URL properly or using a more targeted replacement at the path end.

♻️ Suggested fix using URL parsing
-        let restore_url = db_url.replace("/postgres", "/test_restore");
+        let mut parsed = url::Url::parse(db_url).expect("parse DATABASE_URL");
+        parsed.set_path("/test_restore");
+        let restore_url = parsed.to_string();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/tests/integration/snapshots.rs` at line 57, The
db URL substitution using db_url.replace("/postgres", "/test_restore") is
fragile; instead parse db_url (e.g., with the Url type from the url crate or a
PostgreSQL connection string parser) and replace only the path component to
"test_restore" before rebuilding the string so you don't accidentally modify
passwords/hosts; update the code that constructs restore_url (reference variable
restore_url and source db_url) to parse, set url.path_segments_mut() or
equivalent to the single segment "test_restore", and then serialize the URL back
to a string for use in the test.

88-93: Cleanup may be skipped if assertions fail.

If any assertion fails before line 90, the test_restore database won't be dropped, potentially leaving artifacts that could cause subsequent test runs to fail (e.g., CREATE DATABASE test_restore would error). Consider wrapping cleanup in a guard or using a drop-based cleanup pattern.

♻️ Consider a cleanup guard or defer-style pattern

One approach is to use defer semantics via scopeguard or a custom struct that drops the database in Drop. Alternatively, add a DROP DATABASE IF EXISTS test_restore at the start of the test to handle leftover state from previous failed runs:

+        // Clean up any leftover database from previous failed runs
+        let _ = sqlx::query("DROP DATABASE IF EXISTS test_restore")
+            .execute(pool)
+            .await;
+
         // Create a separate database for restore
         sqlx::query("CREATE DATABASE test_restore")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/tests/integration/snapshots.rs` around lines 88 -
93, The test currently drops the test database with sqlx::query("DROP DATABASE
test_restore").execute(pool).await after calling restore_pool.close().await, but
this cleanup is skipped if earlier assertions panic; refactor to guarantee
cleanup by introducing a guard (e.g., a small struct with Drop impl or using
scopeguard) that holds the pool/connection and runs sqlx::query("DROP DATABASE
IF EXISTS test_restore").execute(...) in Drop, or alternatively run
sqlx::query("DROP DATABASE IF EXISTS test_restore").execute(pool).await at the
start of the test to ensure idempotent cleanup; update references to
restore_pool.close().await and the existing sqlx::query(...) invocation to be
performed via that guard so the database is removed even on test failure.
backend/crates/atlas-server/src/snapshot.rs (2)

127-129: Use idiomatic descending sort.

Per coding guidelines, prefer idiomatic Rust patterns. The two-step sort() then reverse() can be combined into a single descending sort.

♻️ Suggested fix
-    files.sort();
-    files.reverse();
+    files.sort_by(|a, b| b.cmp(a));

As per coding guidelines: "Use idiomatic Rust patterns: prefer .min(), .max(), |=, += over manual if/assign statements"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/src/snapshot.rs` around lines 127 - 129, Replace
the two-step ascending-then-reverse sequence operating on the `files` collection
(calls to `files.sort()` followed by `files.reverse()`) with a single idiomatic
descending sort; update the code to call a single sort variant that orders
descending (e.g., use `sort_by`/`sort_unstable_by` with `b.cmp(a)` or an
appropriate key-based descending comparator) so the collection is sorted
newest-first in one pass.
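
For these timestamped filename strings the two forms produce the same order; a quick side-by-side of the current code and the suggested rewrite:

```rust
/// Two-step form currently in the code.
fn sort_desc_two_step(mut files: Vec<String>) -> Vec<String> {
    files.sort();
    files.reverse();
    files
}

/// Single-pass descending form suggested above.
fn sort_desc_single(mut files: Vec<String>) -> Vec<String> {
    files.sort_by(|a, b| b.cmp(a));
    files
}
```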

47-54: Consider adding a timeout to pg_dump execution.

If pg_dump hangs (e.g., database unresponsive, network issues), the snapshot loop will block indefinitely, preventing future snapshots. Consider wrapping the command with tokio::time::timeout.

♻️ Suggested fix
+    let pg_dump_timeout = Duration::from_secs(3600); // 1 hour max
+    let status = tokio::time::timeout(
+        pg_dump_timeout,
+        tokio::process::Command::new("pg_dump")
             .arg("--dbname")
             .arg(&config.database_url)
             .arg("-Fc")
             .arg("-f")
             .arg(&tmp_path)
             .status()
-            .await?;
+    )
+    .await
+    .map_err(|_| anyhow::anyhow!("pg_dump timed out after {} seconds", pg_dump_timeout.as_secs()))??;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/src/snapshot.rs` around lines 47 - 54, Wrap the
blocking pg_dump invocation with tokio::time::timeout to avoid hanging the
snapshot loop: instead of awaiting .status() directly on
tokio::process::Command, spawn the child with Command::spawn() (reference
Command::spawn and the tmp_path/status variables), then await child.wait()
inside tokio::time::timeout(Duration::from_secs(...)). If the timeout returns
Err(elapsed) call child.kill().await (or child.kill() and then
child.wait().await) and return or log a timeout error; on Ok, inspect the
ExitStatus as before. Make the timeout duration configurable or a constant and
handle both timeout and command errors consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/crates/atlas-server/src/config.rs`:
- Around line 255-262: SnapshotConfig currently derives Debug and may log
sensitive DB credentials via the database_url field; remove #[derive(Debug)] (or
stop deriving Debug) for SnapshotConfig and implement a manual Debug impl for
the SnapshotConfig struct that formats all fields except database_url (or prints
database_url as a redacted placeholder like "<redacted>") so logs cannot leak
secrets; update any uses that relied on the derived Debug (e.g., fmt::Debug
formatting or debug! calls referencing SnapshotConfig) to continue working with
the new manual impl.
- Around line 293-294: The current env::var call that sets let dir =
env::var("SNAPSHOT_DIR").unwrap_or_else(|_| "/snapshots".to_string()) can
produce an empty string; update the code that reads SNAPSHOT_DIR (the let dir
variable in config.rs) to validate that the resulting string is non-empty when
snapshots are enabled: after obtaining the value (either from env::var or the
default), check if dir.trim().is_empty() and return/configuration-error (or fall
back to "/snapshots") so the configuration fails fast instead of allowing an
empty SNAPSHOT_DIR to propagate into the scheduler loop; ensure the check is
applied in the same function that constructs the snapshot configuration so
callers that enable snapshots receive an explicit error or valid path.
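
The redaction the first comment asks for is a manual Debug impl. A sketch under the assumption of a simplified field set (the PR's actual SnapshotConfig will differ):

```rust
use std::fmt;

// Simplified stand-in for the PR's SnapshotConfig; fields are assumptions.
struct SnapshotConfig {
    database_url: String,
    dir: String,
    retention: usize,
}

impl fmt::Debug for SnapshotConfig {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("SnapshotConfig")
            // Never print the connection string: it may embed credentials.
            .field("database_url", &"<redacted>")
            .field("dir", &self.dir)
            .field("retention", &self.retention)
            .finish()
    }
}
```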

In `@docker-compose.yml`:
- Around line 53-54: The snapshot bind mount using the volume entry "-
${SNAPSHOT_HOST_DIR:-./snapshots}:${SNAPSHOT_DIR:-/snapshots}" can make
/snapshots unwritable for the non-root atlas user because host directory
permissions override container ownership; to fix, either switch to a named
Docker volume (remove SNAPSHOT_HOST_DIR bind and use a named volume) or update
deployment docs and startup to ensure SNAPSHOT_HOST_DIR is pre-created with
correct owner/permissions (matching the atlas UID/GID), or add an init step that
chowns the mounted path to the atlas user before snapshot jobs run; reference
the volume line using SNAPSHOT_HOST_DIR and SNAPSHOT_DIR and update
docker-compose and docs accordingly.

---

Nitpick comments:
In `@backend/crates/atlas-server/Cargo.toml`:
- Line 48: Replace the hard-pinned tempfile = "3" entry in this crate's
Cargo.toml with a workspace reference by removing the version and using the
workspace-managed dependency (i.e., change the tempfile entry to reference the
workspace dependency without a version string), so the crate relies on the
version declared in the workspace root; look for the tempfile entry in
backend/crates/atlas-server/Cargo.toml and remove the pinned version to defer to
workspace-managed versions.

In `@backend/crates/atlas-server/src/snapshot.rs`:
- Around line 127-129: Replace the two-step ascending-then-reverse sequence
operating on the `files` collection (calls to `files.sort()` followed by
`files.reverse()`) with a single idiomatic descending sort; update the code to
call a single sort variant that orders descending (e.g., use
`sort_by`/`sort_unstable_by` with `b.cmp(a)` or an appropriate key-based
descending comparator) so the collection is sorted newest-first in one pass.
- Around line 47-54: Wrap the blocking pg_dump invocation with
tokio::time::timeout to avoid hanging the snapshot loop: instead of awaiting
.status() directly on tokio::process::Command, spawn the child with
Command::spawn() (reference Command::spawn and the tmp_path/status variables),
then await child.wait() inside tokio::time::timeout(Duration::from_secs(...)).
If the timeout returns Err(elapsed) call child.kill().await (or child.kill() and
then child.wait().await) and return or log a timeout error; on Ok, inspect the
ExitStatus as before. Make the timeout duration configurable or a constant and
handle both timeout and command errors consistently.

In `@backend/crates/atlas-server/tests/integration/snapshots.rs`:
- Line 57: The db URL substitution using db_url.replace("/postgres",
"/test_restore") is fragile; instead parse db_url (e.g., with the Url type from
the url crate or a PostgreSQL connection string parser) and replace only the
path component to "test_restore" before rebuilding the string so you don't
accidentally modify passwords/hosts; update the code that constructs restore_url
(reference variable restore_url and source db_url) to parse, set
url.path_segments_mut() or equivalent to the single segment "test_restore", and
then serialize the URL back to a string for use in the test.
- Around line 88-93: The test currently drops the test database with
sqlx::query("DROP DATABASE test_restore").execute(pool).await after calling
restore_pool.close().await, but this cleanup is skipped if earlier assertions
panic; refactor to guarantee cleanup by introducing a guard (e.g., a small
struct with Drop impl or using scopeguard) that holds the pool/connection and
runs sqlx::query("DROP DATABASE IF EXISTS test_restore").execute(...) in Drop,
or alternatively run sqlx::query("DROP DATABASE IF EXISTS
test_restore").execute(pool).await at the start of the test to ensure idempotent
cleanup; update references to restore_pool.close().await and the existing
sqlx::query(...) invocation to be performed via that guard so the database is
removed even on test failure.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f400324c-4e91-4be0-b747-26e65caf32eb

📥 Commits

Reviewing files that changed from the base of the PR and between 2fb2426 and 5255005.

📒 Files selected for processing (10)
  • .env.example
  • backend/Dockerfile
  • backend/crates/atlas-server/Cargo.toml
  • backend/crates/atlas-server/src/config.rs
  • backend/crates/atlas-server/src/main.rs
  • backend/crates/atlas-server/src/snapshot.rs
  • backend/crates/atlas-server/tests/integration/common.rs
  • backend/crates/atlas-server/tests/integration/main.rs
  • backend/crates/atlas-server/tests/integration/snapshots.rs
  • docker-compose.yml

@pthmas pthmas self-assigned this Mar 20, 2026

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
backend/crates/atlas-server/src/config.rs (1)

293-295: Consider documenting that SNAPSHOT_TIME is interpreted as UTC.

Based on context from snapshot.rs, the NaiveTime value is interpreted as UTC (via and_utc()). Operators in non-UTC timezones may expect local time. A brief comment here or in .env.example would clarify.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/src/config.rs` around lines 293 - 295, Document
that SNAPSHOT_TIME is interpreted as UTC: update the comment near the NaiveTime
parsing in config.rs (around the time_str/NaiveTime::parse_from_str code) to
state explicitly that the parsed NaiveTime will be treated as UTC (snapshot.rs
uses .and_utc()), and add a matching note in .env.example or README to warn
operators in non-UTC timezones that times are UTC. Ensure the comment references
SNAPSHOT_TIME, NaiveTime::parse_from_str, and snapshot.rs's .and_utc() behavior
so readers can find the related code paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@backend/crates/atlas-server/src/config.rs`:
- Around line 293-295: Document that SNAPSHOT_TIME is interpreted as UTC: update
the comment near the NaiveTime parsing in config.rs (around the
time_str/NaiveTime::parse_from_str code) to state explicitly that the parsed
NaiveTime will be treated as UTC (snapshot.rs uses .and_utc()), and add a
matching note in .env.example or README to warn operators in non-UTC timezones
that times are UTC. Ensure the comment references SNAPSHOT_TIME,
NaiveTime::parse_from_str, and snapshot.rs's .and_utc() behavior so readers can
find the related code paths.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ae8d11ee-2bc2-47c8-9c61-44b90aed9b3f

📥 Commits

Reviewing files that changed from the base of the PR and between 5255005 and 79529b4.

📒 Files selected for processing (3)
  • .env.example
  • backend/crates/atlas-server/src/config.rs
  • docker-compose.yml
✅ Files skipped from review due to trivial changes (1)
  • .env.example
🚧 Files skipped from review as they are similar to previous changes (1)
  • docker-compose.yml

@pthmas pthmas requested a review from tac0turtle March 23, 2026 15:34
pthmas added 5 commits March 31, 2026 11:45
Add opt-in daily pg_dump snapshots to protect against data loss.
Runs as a background tokio task with configurable time, retention,
and output directory. Includes integration test for dump/restore
round-trip.

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
backend/crates/atlas-server/src/snapshot.rs (2)

127-129: Consider using sort_by for descending sort.

Per coding guidelines for idiomatic Rust, a single sort_by call with reverse comparison is slightly more efficient and clearer than sort() followed by reverse().

♻️ Proposed change
     // Sort descending (newest first) — timestamp in filename gives lexicographic order
-    files.sort();
-    files.reverse();
+    files.sort_by(|a, b| b.cmp(a));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/src/snapshot.rs` around lines 127 - 129, The code
currently sorts the vector of filenames with files.sort() and then
files.reverse(), which is less idiomatic and slightly less efficient; replace
that pair with a single descending sort call (e.g., use files.sort_by or
files.sort_unstable_by with a reversed comparison like comparing b to a) so the
filenames are sorted newest-first in one step, updating the sorting of the
variable files where files.sort() and files.reverse() are used.

47-64: Consider capturing pg_dump stderr for better error diagnostics.

When pg_dump fails, only the exit status is logged. Capturing stderr would provide more actionable error messages (e.g., connection refused, permission denied, disk full).

♻️ Proposed enhancement to capture stderr
-    let status = tokio::process::Command::new("pg_dump")
+    let output = tokio::process::Command::new("pg_dump")
         .arg("--dbname")
         .arg(&config.database_url)
         .arg("-Fc")
         .arg("-f")
         .arg(&tmp_path)
-        .status()
+        .output()
         .await?;
 
-    if status.success() {
+    if output.status.success() {
         tokio::fs::rename(&tmp_path, &final_path).await?;
         tracing::info!(%filename, "Snapshot complete");
         cleanup_old_snapshots(&config.dir, config.retention).await;
         Ok(())
     } else {
         let _ = tokio::fs::remove_file(&tmp_path).await;
-        bail!("pg_dump failed with status: {status}");
+        let stderr = String::from_utf8_lossy(&output.stderr);
+        bail!("pg_dump failed with status {}: {}", output.status, stderr.trim());
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/src/snapshot.rs` around lines 47 - 64, The
pg_dump invocation currently only checks exit status, losing stderr details;
change from tokio::process::Command::status() to .output().await to capture
Output { stdout, stderr }, then on success proceed with
tokio::fs::rename(&tmp_path, &final_path) and tracing::info!(%filename,
"Snapshot complete") and cleanup_old_snapshots(&config.dir,
config.retention).await; on failure remove the tmp file as before but include
the captured stderr (convert bytes to string safely) in the error/bail message
(and log it via tracing::error or include in bail!("pg_dump failed: {}: {}",
status, stderr_text)), referencing the existing variables tmp_path, final_path,
status/output, and cleanup_old_snapshots to locate where to modify behavior.
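
Both comments above touch the temp-file-then-rename step. The pattern can be sketched with std::fs alone (the PR uses tokio::fs; `write_atomically` is a hypothetical helper and assumes both paths live on the same filesystem):

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Write `data` to `<final>.tmp`, then atomically rename into place so
/// readers never observe a half-written snapshot. Sketch of the pattern,
/// not the PR's code.
fn write_atomically(final_path: &Path, data: &[u8]) -> io::Result<()> {
    let tmp_path = final_path.with_extension("tmp");
    fs::write(&tmp_path, data)?;
    // On the same filesystem, rename replaces the target atomically.
    fs::rename(&tmp_path, final_path)
}
```

On failure the caller removes the leftover `.tmp` file, which is also why the retention cleanup ignores the `.tmp` extension.
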
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@backend/crates/atlas-server/src/snapshot.rs`:
- Around line 127-129: The code currently sorts the vector of filenames with
files.sort() and then files.reverse(), which is less idiomatic and slightly less
efficient; replace that pair with a single descending sort call (e.g., use
files.sort_by or files.sort_unstable_by with a reversed comparison like
comparing b to a) so the filenames are sorted newest-first in one step, updating
the sorting of the variable files where files.sort() and files.reverse() are
used.
- Around line 47-64: The pg_dump invocation currently only checks exit status,
losing stderr details; change from tokio::process::Command::status() to
.output().await to capture Output { stdout, stderr }, then on success proceed
with tokio::fs::rename(&tmp_path, &final_path) and tracing::info!(%filename,
"Snapshot complete") and cleanup_old_snapshots(&config.dir,
config.retention).await; on failure remove the tmp file as before but include
the captured stderr (convert bytes to string safely) in the error/bail message
(and log it via tracing::error or include in bail!("pg_dump failed: {}: {}",
status, stderr_text)), referencing the existing variables tmp_path, final_path,
status/output, and cleanup_old_snapshots to locate where to modify behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 40200aef-ea90-4e71-aa1d-ba07c7bba016

📥 Commits

Reviewing files that changed from the base of the PR and between dd3c981 and 1aec619.

📒 Files selected for processing (11)
  • .env.example
  • .gitignore
  • backend/Dockerfile
  • backend/crates/atlas-server/Cargo.toml
  • backend/crates/atlas-server/src/config.rs
  • backend/crates/atlas-server/src/main.rs
  • backend/crates/atlas-server/src/snapshot.rs
  • backend/crates/atlas-server/tests/integration/common.rs
  • backend/crates/atlas-server/tests/integration/main.rs
  • backend/crates/atlas-server/tests/integration/snapshots.rs
  • docker-compose.yml
✅ Files skipped from review due to trivial changes (5)
  • .gitignore
  • backend/crates/atlas-server/tests/integration/main.rs
  • .env.example
  • backend/crates/atlas-server/src/main.rs
  • backend/crates/atlas-server/Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (3)
  • docker-compose.yml
  • backend/Dockerfile
  • backend/crates/atlas-server/tests/integration/snapshots.rs


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/crates/atlas-server/src/snapshot.rs (1)

124-126: Consider using sort_by for descending order in a single call.

This is a minor style suggestion. The current approach works correctly.

♻️ Optional refactor
-    // Sort descending (newest first) — timestamp in filename gives lexicographic order
-    files.sort();
-    files.reverse();
+    // Sort descending (newest first) — timestamp in filename gives lexicographic order
+    files.sort_by(|a, b| b.cmp(a));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/src/snapshot.rs` around lines 124 - 126, Replace
the two-step descending sort (calling files.sort() then files.reverse()) with a
single descending sort call on the files collection; use files.sort_by with a
comparator that compares b to a (or files.sort_unstable_by for a faster unstable
sort) to achieve descending order in one call, updating the snapshot.rs location
where the files variable is sorted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/crates/atlas-server/src/main.rs`:
- Around line 359-371: The CI formatting failed due to an extra blank line after
the snapshot scheduler block; edit the block that uses snapshot_config and
tokio::spawn with run_with_retry calling snapshot::run_snapshot_loop and remove
the stray blank line (ensure the closing brace of the if block is directly
followed by the next code or EOF), then re-run rustfmt/cargo fmt to verify
formatting passes.

---

Nitpick comments:
In `@backend/crates/atlas-server/src/snapshot.rs`:
- Around line 124-126: Replace the two-step descending sort (calling
files.sort() then files.reverse()) with a single descending sort call on the
files collection; use files.sort_by with a comparator that compares b to a (or
files.sort_unstable_by for a faster unstable sort) to achieve descending order
in one call, updating the snapshot.rs location where the files variable is
sorted.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5c114d8d-c004-4b6e-8d3a-4b6b1d4e004b

📥 Commits

Reviewing files that changed from the base of the PR and between 1aec619 and 93dc63c.

📒 Files selected for processing (2)
  • backend/crates/atlas-server/src/main.rs
  • backend/crates/atlas-server/src/snapshot.rs


pthmas commented Apr 8, 2026

@coderabbitai review

@coderabbitai

coderabbitai bot commented Apr 8, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/crates/atlas-server/src/cli.rs`:
- Around line 403-423: The test
empty_faucet_cooldown_env_is_ignored_when_faucet_disabled mutates process-wide
env vars unsafely; wrap the env set/remove operations with the existing ENV_LOCK
to prevent races by acquiring the lock at the start of the test (e.g., let _lock
= ENV_LOCK.lock().unwrap();) before any std::env::set_var/std::env::remove_var
calls and keep the lock held for the duration of the test so all env mutations
are synchronized with other tests that use ENV_LOCK from config.rs.

In `@backend/crates/atlas-server/src/config.rs`:
- Around line 679-697: This test mutates FAUCET_PRIVATE_KEY without acquiring
the shared ENV_LOCK used by other environment-variable tests; protect the
mutation by acquiring the same ENV_LOCK before setting/removing the env var (or
move the test into the tests module that already defines ENV_LOCK) so the
setup/teardown around FaucetConfig::from_faucet_args is synchronized—e.g.,
obtain ENV_LOCK (from the tests module) at the top of
faucet_blank_cooldown_is_treated_as_missing, then set FAUCET_PRIVATE_KEY, call
FaucetConfig::from_faucet_args, and remove the var before dropping the lock.

In `@backend/crates/atlas-server/src/main.rs`:
- Line 287: The check() command currently doesn't validate SNAPSHOT_* variables;
call config::SnapshotConfig::from_env(&config.database_url) (same parsing used
in run()) inside check() and propagate or return the error if parsing/validation
fails so check() fails for invalid SNAPSHOT_ENABLED/time/retention/dir settings;
ensure you reference SnapshotConfig::from_env and mirror the same error handling
used in run() so invalid snapshot env vars are reported by atlas-server check.

In `@backend/crates/atlas-server/src/snapshot.rs`:
- Around line 12-19: The scheduling function duration_until_next uses a strict
greater-than so when now equals the target time it incorrectly schedules for
tomorrow; change the comparison in duration_until_next to use >= (i.e., treat
today_target >= now as “next” is today) so the scheduler runs immediately at the
configured time, and update the unit test exactly_now_wraps_to_tomorrow to
expect immediate (today) scheduling rather than wrapping to tomorrow.

In `@backend/crates/atlas-server/tests/integration/snapshots.rs`:
- Around line 52-57: The code currently derives restore_url by string-replacing
"/postgres" on db_url (variable restore_url and db_url), which fails for
non-standard paths and can collide; instead parse db_url with a URL parser
(e.g., url::Url), generate a unique database name (use UUID or timestamp) for
the restore target (instead of hardcoded "test_restore"), set the parsed URL's
path/last path segment to that unique name (using path_segments_mut or
equivalent), and then reconstruct restore_url from the parsed URL before running
sqlx::query to create and use the new database; ensure the same generated name
is used both for CREATE DATABASE and for building restore_url.
- Around line 34-40: The test currently calls pg_dump/pg_restore directly with
--dbname and a brittle URL mutation; update it to parse the URL via
postgres_connection_config() and build commands using
portable_pg_dump_command_async() for the dump and the production restore path
using PORTABLE_PG_RESTORE_FLAGS for restore so credentials are passed via PG_*
env vars and portable flags (--no-owner, --no-acl, --exit-on-error) are applied;
replace the manual .replace("/postgres", "/test_restore") logic with
constructing a proper ConnectionConfig via postgres_connection_config() and feed
that into the portable command builders to mirror production behavior and avoid
exposing credentials in the process list.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0e491a96-4d68-42ba-b222-8680de2ea40e

📥 Commits

Reviewing files that changed from the base of the PR and between 1aec619 and e027a4d.

📒 Files selected for processing (8)
  • .env.example
  • backend/crates/atlas-server/Cargo.toml
  • backend/crates/atlas-server/src/cli.rs
  • backend/crates/atlas-server/src/config.rs
  • backend/crates/atlas-server/src/main.rs
  • backend/crates/atlas-server/src/snapshot.rs
  • backend/crates/atlas-server/tests/integration/common.rs
  • backend/crates/atlas-server/tests/integration/snapshots.rs
✅ Files skipped from review due to trivial changes (2)
  • backend/crates/atlas-server/Cargo.toml
  • .env.example

Comment on lines +403 to +423
#[test]
fn empty_faucet_cooldown_env_is_ignored_when_faucet_disabled() {
    unsafe {
        std::env::set_var("DATABASE_URL", "postgres://atlas:atlas@localhost/atlas");
        std::env::set_var("RPC_URL", "http://localhost:8545");
        std::env::set_var("FAUCET_ENABLED", "false");
        std::env::set_var("FAUCET_COOLDOWN_MINUTES", "");
    }

    let cli = Cli::try_parse_from(["atlas-server", "run"]).expect("parse cli");
    match cli.command {
        Command::Run(args) => assert_eq!(args.faucet.cooldown_minutes, Some(String::new())),
        _ => panic!("expected run command"),
    }

    unsafe {
        std::env::remove_var("DATABASE_URL");
        std::env::remove_var("RPC_URL");
        std::env::remove_var("FAUCET_ENABLED");
        std::env::remove_var("FAUCET_COOLDOWN_MINUTES");
    }

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's read the exact test code at lines 403-423
head -425 backend/crates/atlas-server/src/cli.rs | tail -23 | cat -n

Repository: evstack/atlas

Length of output: 1114


🏁 Script executed:

# Search for other tests in this file that mutate env vars
rg "set_var|remove_var" backend/crates/atlas-server/src/cli.rs -B 3 -A 1

Repository: evstack/atlas

Length of output: 689


🏁 Script executed:

# Look for mutex/lock patterns used for env synchronization elsewhere in the crate
rg "Mutex|RwLock|lazy_static|once_cell" backend/crates/atlas-server/ -t rs --max-count 20

Repository: evstack/atlas

Length of output: 83


🏁 Script executed:

# Check if there's a scoped env helper in this crate
fd "env" backend/crates/atlas-server/src/ -t rs | head -20

Repository: evstack/atlas

Length of output: 226


🏁 Script executed:

# Search for "scoped" or "helper" patterns related to env in the broader crate
rg "scoped|helper.*env" backend/crates/ -t rs -i --max-count 15

Repository: evstack/atlas

Length of output: 83


🏁 Script executed:

# Search for Mutex, lazy_static, once_cell without specifying file type
rg "Mutex|lazy_static|once_cell" backend/crates/atlas-server/ -B 2 -A 1 | head -50

Repository: evstack/atlas

Length of output: 1268


🏁 Script executed:

# Look for other #[test] functions to see how they handle state
rg "#\[test\]" backend/crates/atlas-server/src/cli.rs -A 20 | head -80

Repository: evstack/atlas

Length of output: 945


🏁 Script executed:

# Search broader backend crates for env-related test patterns
rg "set_var|remove_var" backend/crates/ -B 5 -A 5 | head -60

Repository: evstack/atlas

Length of output: 4355


🏁 Script executed:

# Check if there's any synchronization mechanism defined in lib.rs or mod.rs files
find backend/crates/atlas-server/src -name "lib.rs" -o -name "mod.rs" | xargs cat 2>/dev/null | head -100

Repository: evstack/atlas

Length of output: 3082


🏁 Script executed:

# Search for where ENV_LOCK is actually used/locked in tests
rg "ENV_LOCK" backend/crates/atlas-server/src/config.rs -B 2 -A 5

Repository: evstack/atlas

Length of output: 6343


🏁 Script executed:

# Get the full test function that uses env vars in config.rs to see if it acquires the lock
rg "#\[test\]" backend/crates/atlas-server/src/config.rs -A 30 | head -80

Repository: evstack/atlas

Length of output: 2747


Guard env mutations with the established ENV_LOCK pattern.

This test mutates process-global env vars without synchronization. The crate already defines static ENV_LOCK: Mutex<()> in config.rs and uses it consistently across all env-mutating tests (e.g., let _lock = ENV_LOCK.lock().unwrap();). Apply the same pattern here to prevent races with parallel test execution.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/src/cli.rs` around lines 403 - 423, The test
empty_faucet_cooldown_env_is_ignored_when_faucet_disabled mutates process-wide
env vars unsafely; wrap the env set/remove operations with the existing ENV_LOCK
to prevent races by acquiring the lock at the start of the test (e.g., let _lock
= ENV_LOCK.lock().unwrap();) before any std::env::set_var/std::env::remove_var
calls and keep the lock held for the duration of the test so all env mutations
are synchronized with other tests that use ENV_LOCK from config.rs.

Comment on lines +679 to +697
#[test]
fn faucet_blank_cooldown_is_treated_as_missing() {
    let mut args = minimal_run_args();
    args.faucet.enabled = true;
    args.faucet.amount = Some("0.1".to_string());
    args.faucet.cooldown_minutes = Some(" ".to_string());

    unsafe {
        env::set_var(
            "FAUCET_PRIVATE_KEY",
            "0x59c6995e998f97a5a0044966f0945382dbd8c5df5440d8d6d0d0f66f6d7d6a0d",
        );
    }
    let err = FaucetConfig::from_faucet_args(&args.faucet).unwrap_err();
    assert!(err.to_string().contains("cooldown-minutes"));
    unsafe {
        env::remove_var("FAUCET_PRIVATE_KEY");
    }
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, locate the file and check its size
wc -l backend/crates/atlas-server/src/config.rs

Repository: evstack/atlas

Length of output: 103


🏁 Script executed:

# Search for ENV_LOCK in the file
rg "ENV_LOCK" backend/crates/atlas-server/src/config.rs -B 2 -A 2

Repository: evstack/atlas

Length of output: 3562


🏁 Script executed:

# Look at the tests module structure and how env is managed
rg "mod tests" backend/crates/atlas-server/src/config.rs -A 50 | head -80

Repository: evstack/atlas

Length of output: 2620


🏁 Script executed:

# Check the specific lines 679-697 and surrounding context to see if ENV_LOCK is used
sed -n '670,710p' backend/crates/atlas-server/src/config.rs

Repository: evstack/atlas

Length of output: 1255


Protect this env-driven test with the shared env lock.

This test writes FAUCET_PRIVATE_KEY directly without acquiring ENV_LOCK. Since ENV_LOCK synchronizes all other environment-variable tests in the tests module below, this unprotected mutation can race and cause nondeterministic failures. Either move ENV_LOCK to a shared location accessible from both tests_from_run_args and tests, or relocate this test to the tests module where the lock is defined.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/src/config.rs` around lines 679 - 697, This test
mutates FAUCET_PRIVATE_KEY without acquiring the shared ENV_LOCK used by other
environment-variable tests; protect the mutation by acquiring the same ENV_LOCK
before setting/removing the env var (or move the test into the tests module that
already defines ENV_LOCK) so the setup/teardown around
FaucetConfig::from_faucet_args is synchronized—e.g., obtain ENV_LOCK (from the
tests module) at the top of faucet_blank_cooldown_is_treated_as_missing, then
set FAUCET_PRIVATE_KEY, call FaucetConfig::from_faucet_args, and remove the var
before dropping the lock.


let config = config::Config::from_run_args(args.clone())?;
let faucet_config = config::FaucetConfig::from_faucet_args(&args.faucet)?;
let snapshot_config = config::SnapshotConfig::from_env(&config.database_url)?;

⚠️ Potential issue | 🟠 Major

Validate SNAPSHOT_* in check() too.

The new snapshot config only gets parsed in run(). atlas-server check can still report success with SNAPSHOT_ENABLED=true and an invalid time/retention/dir, which defeats the purpose of the validation command.

Suggested fix
 async fn check(args: cli::RunArgs) -> Result<()> {
     init_tracing(&args.log.level, &args.log.format);

     let config = config::Config::from_run_args(args.clone())?;
     config::FaucetConfig::from_faucet_args(&args.faucet)?;
+    config::SnapshotConfig::from_env(&config.database_url)?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/src/main.rs` at line 287, The check() command
currently doesn't validate SNAPSHOT_* variables; call
config::SnapshotConfig::from_env(&config.database_url) (same parsing used in
run()) inside check() and propagate or return the error if parsing/validation
fails so check() fails for invalid SNAPSHOT_ENABLED/time/retention/dir settings;
ensure you reference SnapshotConfig::from_env and mirror the same error handling
used in run() so invalid snapshot env vars are reported by atlas-server check.

Comment on lines +12 to +19
fn duration_until_next(target: NaiveTime, now: DateTime<Utc>) -> Duration {
    let today_target = now.date_naive().and_time(target).and_utc();
    let next = if today_target > now {
        today_target
    } else {
        today_target + chrono::Duration::days(1)
    };
    (next - now).to_std().expect("positive duration")

⚠️ Potential issue | 🟡 Minor

Run immediately when the scheduler lands exactly on the configured time.

Line 14 uses > rather than >=, so hitting SNAPSHOT_TIME exactly sends the first run to tomorrow and skips a full day's snapshot. The exactly_now_wraps_to_tomorrow test should be updated to expect immediate scheduling once this is fixed.

Suggested fix
-    let next = if today_target > now {
+    let next = if today_target >= now {
         today_target
     } else {
         today_target + chrono::Duration::days(1)
     };
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 fn duration_until_next(target: NaiveTime, now: DateTime<Utc>) -> Duration {
     let today_target = now.date_naive().and_time(target).and_utc();
-    let next = if today_target > now {
+    let next = if today_target >= now {
         today_target
     } else {
         today_target + chrono::Duration::days(1)
     };
     (next - now).to_std().expect("positive duration")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/src/snapshot.rs` around lines 12 - 19, The
scheduling function duration_until_next uses a strict greater-than so when now
equals the target time it incorrectly schedules for tomorrow; change the
comparison in duration_until_next to use >= (i.e., treat today_target >= now as
“next” is today) so the scheduler runs immediately at the configured time, and
update the unit test exactly_now_wraps_to_tomorrow to expect immediate (today)
scheduling rather than wrapping to tomorrow.

Comment on lines +34 to +40
let dump_status = tokio::process::Command::new("pg_dump")
    .arg("--dbname")
    .arg(db_url)
    .arg("-Fc")
    .arg("-f")
    .arg(&dump_path)
    .status()

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's check if the test file exists and view the relevant lines
fd -t f "snapshots.rs" | head -5

Repository: evstack/atlas

Length of output: 115


🏁 Script executed:

# Check the structure around the test file
git ls-files backend/crates/atlas-server/tests/ | head -20

Repository: evstack/atlas

Length of output: 680


🏁 Script executed:

# Read the test file to see lines 34-40, 52-57, 60-64
wc -l backend/crates/atlas-server/tests/integration/snapshots.rs

Repository: evstack/atlas

Length of output: 118


🏁 Script executed:

# Read the test file content with line numbers
cat -n backend/crates/atlas-server/tests/integration/snapshots.rs

Repository: evstack/atlas

Length of output: 3686


🏁 Script executed:

# Search for portable_pg_dump_command_async in the codebase
rg "portable_pg_dump_command_async" --type rust

Repository: evstack/atlas

Length of output: 264


🏁 Script executed:

# Also search for pg_dump usage in production code
rg "pg_dump|pg_restore" backend/crates/atlas-server/src --type rust -B 2 -A 2

Repository: evstack/atlas

Length of output: 4668


🏁 Script executed:

# Check what's in main.rs for snapshot/dump related code
rg -i "snapshot|dump|restore" backend/crates/atlas-server/src/main.rs --type rust -B 2 -A 2 | head -80

Repository: evstack/atlas

Length of output: 2612


🏁 Script executed:

# Let's find the postgres_command and postgres_command_async implementations
rg "fn postgres_command" backend/crates/atlas-server/src --type rust -A 10 | head -80

Repository: evstack/atlas

Length of output: 1611


🏁 Script executed:

# Also check for postgres_connection_config to understand how env vars are set
rg "postgres_connection_config" backend/crates/atlas-server/src --type rust -A 15 | head -100

Repository: evstack/atlas

Length of output: 7762


🏁 Script executed:

# Check if there's a portable_pg_restore_command function
rg "portable_pg_restore|PORTABLE_PG_RESTORE" backend/crates/atlas-server/src --type rust -B 2 -A 5

Repository: evstack/atlas

Length of output: 1903


🏁 Script executed:

# Also check what PG_ENV_VARS contains
rg "PG_ENV_VARS" backend/crates/atlas-server/src --type rust -B 2 -A 10

Repository: evstack/atlas

Length of output: 2519


🏁 Script executed:

# Quick check: search for how cmd_db_restore actually uses the config
rg "cmd_db_restore|fn cmd_db_restore" backend/crates/atlas-server/src/main.rs -A 30 | head -50

Repository: evstack/atlas

Length of output: 2013


Use the production command builders for dump and restore to match actual behavior.

The test bypasses the production safeguards. Lines 34–40 and 60–64 invoke raw pg_dump/pg_restore with --dbname <URL>, exposing credentials in the process command line. Production parses the URL into PG_* environment variables (PGPASSWORD, PGUSER, etc.), which are not visible in process listings.

Additionally, the test omits the portable flags required in production: --no-owner, --no-acl for dump, and --exit-on-error for restore (critical for fail-fast on error). This means the test can pass while the actual snapshot/restore flow still breaks in another environment.

Use postgres_connection_config() to parse the database URL, then route through portable_pg_dump_command_async() and the production restore path with PORTABLE_PG_RESTORE_FLAGS. This also fixes the brittle restore URL derivation at line 57 (the .replace("/postgres", "/test_restore") approach).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/tests/integration/snapshots.rs` around lines 34 -
40, The test currently calls pg_dump/pg_restore directly with --dbname and a
brittle URL mutation; update it to parse the URL via
postgres_connection_config() and build commands using
portable_pg_dump_command_async() for the dump and the production restore path
using PORTABLE_PG_RESTORE_FLAGS for restore so credentials are passed via PG_*
env vars and portable flags (--no-owner, --no-acl, --exit-on-error) are applied;
replace the manual .replace("/postgres", "/test_restore") logic with
constructing a proper ConnectionConfig via postgres_connection_config() and feed
that into the portable command builders to mirror production behavior and avoid
exposing credentials in the process list.

Comment on lines +52 to +57
sqlx::query("CREATE DATABASE test_restore")
    .execute(&pool)
    .await
    .expect("create test_restore database");

let restore_url = db_url.replace("/postgres", "/test_restore");

⚠️ Potential issue | 🟡 Minor

Don't derive the restore target with a string replace.

Line 57 only works when the source URL path is literally /postgres, but common::database_url() can come from ATLAS_TEST_DATABASE_URL. It also hardcodes test_restore, so reruns or parallel executions can collide. Parse the URL, set a unique database name, and build the restore URL from that instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/tests/integration/snapshots.rs` around lines 52 -
57, The code currently derives restore_url by string-replacing "/postgres" on
db_url (variable restore_url and db_url), which fails for non-standard paths and
can collide; instead parse db_url with a URL parser (e.g., url::Url), generate a
unique database name (use UUID or timestamp) for the restore target (instead of
hardcoded "test_restore"), set the parsed URL's path/last path segment to that
unique name (using path_segments_mut or equivalent), and then reconstruct
restore_url from the parsed URL before running sqlx::query to create and use the
new database; ensure the same generated name is used both for CREATE DATABASE
and for building restore_url.

@tac0turtle tac0turtle merged commit 8653e81 into main Apr 8, 2026
10 checks passed
@tac0turtle tac0turtle deleted the issue-23 branch April 8, 2026 09:02

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature: snapshots

2 participants