Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. This behavior is configurable, and the review can be resumed on demand with a `@coderabbitai` command.
📝 Walkthrough

Adds a configurable, time-based PostgreSQL snapshot scheduler with retention cleanup, CLI/config parsing updates, Docker and docker-compose support for snapshot storage, unit and integration tests (including an ignored dump/restore round-trip), and minor build/test tooling changes.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Scheduler as Scheduler (tokio sleep)
    participant SnapLoop as run_snapshot_loop
    participant PGDump as pg_dump process
    participant FS as File System
    participant Cleanup as cleanup_old_snapshots
    participant Backoff as Retry Backoff
    loop daily schedule
        Scheduler->>SnapLoop: Wake at scheduled UTC time
        SnapLoop->>FS: Ensure snapshot dir exists
        SnapLoop->>PGDump: Spawn pg_dump, write to .tmp
        PGDump->>FS: Write dump data to temp file
        alt pg_dump succeeds
            FS->>FS: Rename .tmp to .dump (atomic)
            SnapLoop->>Cleanup: Prune old dumps (keep retention, ignore .tmp)
            Cleanup->>FS: List/sort/remove old files
            SnapLoop->>Backoff: Reset retry state
        else pg_dump fails
            SnapLoop->>FS: Remove .tmp if present
            SnapLoop->>Backoff: Wait backoff interval (5s,10s,20s,30s,60s)
            Backoff->>SnapLoop: Retry
        end
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 3
🧹 Nitpick comments (5)
backend/crates/atlas-server/Cargo.toml (1)
48-48: Use workspace-managed versioning for `tempfile`.

Please avoid pinning `tempfile` in the crate manifest and reference it via the workspace instead, so dependency versions stay centralized.

Suggested change:

```diff
 [dev-dependencies]
 testcontainers = { workspace = true }
 testcontainers-modules = { workspace = true }
 tokio = { workspace = true }
 tower = { workspace = true, features = ["util"] }
 serde_json = { workspace = true }
 sqlx = { workspace = true }
-tempfile = "3"
+tempfile = { workspace = true }
```

Based on learnings: keep all Rust workspace dependency versions in the root Cargo.toml.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/crates/atlas-server/Cargo.toml` at line 48: replace the hard-pinned `tempfile = "3"` entry with a workspace reference by removing the version string, so the crate relies on the version declared in the workspace root and defers to workspace-managed versions.

backend/crates/atlas-server/tests/integration/snapshots.rs (2)
57-57: Fragile database URL substitution.

The `replace("/postgres", "/test_restore")` pattern assumes the source database is always named `postgres` and that `/postgres` doesn't appear elsewhere in the URL (e.g., in a password or hostname). Consider parsing the URL properly or using a more targeted replacement at the path end.

♻️ Suggested fix using URL parsing

```diff
-    let restore_url = db_url.replace("/postgres", "/test_restore");
+    let mut parsed = url::Url::parse(db_url).expect("parse DATABASE_URL");
+    parsed.set_path("/test_restore");
+    let restore_url = parsed.to_string();
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/crates/atlas-server/tests/integration/snapshots.rs` at line 57, The db URL substitution using db_url.replace("/postgres", "/test_restore") is fragile; instead parse db_url (e.g., with the Url type from the url crate or a PostgreSQL connection string parser) and replace only the path component to "test_restore" before rebuilding the string so you don't accidentally modify passwords/hosts; update the code that constructs restore_url (reference variable restore_url and source db_url) to parse, set url.path_segments_mut() or equivalent to the single segment "test_restore", and then serialize the URL back to a string for use in the test.
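The `url` crate approach above is the robust fix. Where adding a dependency isn't desired, the same idea can be sketched with std only; `with_db_name` is a hypothetical helper, not something in the codebase, and it assumes credentials are percent-encoded as URLs require:

```rust
/// Replace only the final path segment (the database name) of a URL.
/// Std-only sketch; skips past "scheme://" so the "//" is never
/// mistaken for the start of the path.
fn with_db_name(url: &str, db: &str) -> String {
    let scheme_end = url.find("://").map_or(0, |i| i + 3);
    match url[scheme_end..].rfind('/') {
        // Cut at the last '/' after the authority and append the new name.
        Some(i) => format!("{}/{}", &url[..scheme_end + i], db),
        // No path yet: just append one.
        None => format!("{}/{}", url, db),
    }
}

fn main() {
    let db_url = "postgres://atlas:postgres@localhost:5432/postgres";
    // Only the trailing database name changes, not the password.
    println!("{}", with_db_name(db_url, "test_restore"));
}
```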
88-93: Cleanup may be skipped if assertions fail.

If any assertion fails before line 90, the `test_restore` database won't be dropped, potentially leaving artifacts that could cause subsequent test runs to fail (e.g., `CREATE DATABASE test_restore` would error). Consider wrapping cleanup in a guard or using a drop-based cleanup pattern.

♻️ Consider a cleanup guard or defer-style pattern

One approach is to use defer semantics via `scopeguard` or a custom struct that drops the database in `Drop`. Alternatively, add a `DROP DATABASE IF EXISTS test_restore` at the start of the test to handle leftover state from previous failed runs:

```diff
+    // Clean up any leftover database from previous failed runs
+    let _ = sqlx::query("DROP DATABASE IF EXISTS test_restore")
+        .execute(pool)
+        .await;
+
     // Create a separate database for restore
     sqlx::query("CREATE DATABASE test_restore")
```
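The guard idea above can be sketched in a few lines of std-only Rust. `Defer` here is a hypothetical stand-in for `scopeguard`, and the `println!` stands in for the real `DROP DATABASE IF EXISTS` query:

```rust
// A closure that runs when the guard goes out of scope, including
// during panic unwinding from a failed assertion.
struct Defer<F: FnMut()>(F);

impl<F: FnMut()> Drop for Defer<F> {
    fn drop(&mut self) {
        (self.0)();
    }
}

fn main() {
    let result = std::panic::catch_unwind(|| {
        let _cleanup = Defer(|| println!("DROP DATABASE IF EXISTS test_restore"));
        assert_eq!(1 + 1, 3, "simulated failing assertion");
    });
    // The cleanup closure already ran, despite the panic above.
    assert!(result.is_err());
}
```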
Verify each finding against the current code and only fix it if needed. In `@backend/crates/atlas-server/tests/integration/snapshots.rs` around lines 88-93: the test drops the test database with `sqlx::query("DROP DATABASE test_restore").execute(pool).await` after `restore_pool.close().await`, but this cleanup is skipped if an earlier assertion panics. Guarantee cleanup with a guard (a small struct with a `Drop` impl, or `scopeguard`) that runs `DROP DATABASE IF EXISTS test_restore` in `Drop`, or run that statement at the start of the test for idempotent cleanup, so the database is removed even on test failure.

backend/crates/atlas-server/src/snapshot.rs (2)
127-129: Use idiomatic descending sort.

Per coding guidelines, prefer idiomatic Rust patterns. The two-step `sort()` then `reverse()` can be combined into a single descending sort.

♻️ Suggested fix

```diff
-    files.sort();
-    files.reverse();
+    files.sort_by(|a, b| b.cmp(a));
```

As per coding guidelines: "Use idiomatic Rust patterns: prefer .min(), .max(), |=, += over manual if/assign statements"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/crates/atlas-server/src/snapshot.rs` around lines 127 - 129, Replace the two-step ascending-then-reverse sequence operating on the `files` collection (calls to `files.sort()` followed by `files.reverse()`) with a single idiomatic descending sort; update the code to call a single sort variant that orders descending (e.g., use `sort_by`/`sort_unstable_by` with `b.cmp(a)` or an appropriate key-based descending comparator) so the collection is sorted newest-first in one pass.
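For illustration, the one-pass descending sort looks like this in isolation; `newest_first` is an illustrative helper, assuming (as the review notes) that the timestamped filenames sort lexicographically:

```rust
// Sort newest-first in a single pass instead of sort() + reverse().
fn newest_first(mut files: Vec<String>) -> Vec<String> {
    files.sort_unstable_by(|a, b| b.cmp(a));
    files
}

fn main() {
    let files = vec![
        "atlas-2024-01-01.dump".to_string(),
        "atlas-2024-03-01.dump".to_string(),
        "atlas-2024-02-01.dump".to_string(),
    ];
    println!("{:?}", newest_first(files));
}
```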
47-54: Consider adding a timeout to pg_dump execution.

If `pg_dump` hangs (e.g., database unresponsive, network issues), the snapshot loop will block indefinitely, preventing future snapshots. Consider wrapping the command with `tokio::time::timeout`.

♻️ Suggested fix

```diff
-    let status = tokio::process::Command::new("pg_dump")
+    let pg_dump_timeout = Duration::from_secs(3600); // 1 hour max
+    let status = tokio::time::timeout(
+        pg_dump_timeout,
+        tokio::process::Command::new("pg_dump")
             .arg("--dbname")
             .arg(&config.database_url)
             .arg("-Fc")
             .arg("-f")
             .arg(&tmp_path)
-        .status()
-        .await?;
+            .status(),
+    )
+    .await
+    .map_err(|_| anyhow::anyhow!("pg_dump timed out after {} seconds", pg_dump_timeout.as_secs()))??;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/crates/atlas-server/src/snapshot.rs` around lines 47 - 54, Wrap the blocking pg_dump invocation with tokio::time::timeout to avoid hanging the snapshot loop: instead of awaiting .status() directly on tokio::process::Command, spawn the child with Command::spawn() (reference Command::spawn and the tmp_path/status variables), then await child.wait() inside tokio::time::timeout(Duration::from_secs(...)). If the timeout returns Err(elapsed) call child.kill().await (or child.kill() and then child.wait().await) and return or log a timeout error; on Ok, inspect the ExitStatus as before. Make the timeout duration configurable or a constant and handle both timeout and command errors consistently.
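The kill-on-timeout idea from the prompt can be sketched without tokio using std's non-blocking `try_wait`. This is a poll-based stand-in for `tokio::time::timeout` around `child.wait()`, and it assumes a Unix-like environment where a `sleep` binary exists (it stands in for a hung pg_dump):

```rust
use std::process::{Child, Command, ExitStatus};
use std::time::{Duration, Instant};

// Returns Ok(Some(status)) if the child exits in time, or Ok(None) after
// killing and reaping a child that exceeded the deadline.
fn wait_with_timeout(child: &mut Child, timeout: Duration) -> std::io::Result<Option<ExitStatus>> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status)); // finished in time
        }
        if Instant::now() >= deadline {
            child.kill()?; // give up: kill, then reap the zombie
            child.wait()?;
            return Ok(None);
        }
        std::thread::sleep(Duration::from_millis(25));
    }
}

fn main() -> std::io::Result<()> {
    let mut hung = Command::new("sleep").arg("10").spawn()?;
    let outcome = wait_with_timeout(&mut hung, Duration::from_millis(200))?;
    assert!(outcome.is_none(), "expected the slow child to be killed");
    println!("timed out and killed as expected");
    Ok(())
}
```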
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/crates/atlas-server/src/config.rs`:
- Around line 255-262: SnapshotConfig currently derives Debug and may log
sensitive DB credentials via the database_url field; remove #[derive(Debug)] (or
stop deriving Debug) for SnapshotConfig and implement a manual Debug impl for
the SnapshotConfig struct that formats all fields except database_url (or prints
database_url as a redacted placeholder like "<redacted>") so logs cannot leak
secrets; update any uses that relied on the derived Debug (e.g., fmt::Debug
formatting or debug! calls referencing SnapshotConfig) to continue working with
the new manual impl.
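A minimal sketch of the redacting `Debug` impl the comment asks for; the `SnapshotConfig` fields here are assumed from the review text, not copied from the codebase:

```rust
use std::fmt;

struct SnapshotConfig {
    database_url: String,
    dir: String,
    retention: usize,
}

// Manual Debug: every field except database_url is printed as usual;
// the URL (which embeds credentials) is replaced with a placeholder.
impl fmt::Debug for SnapshotConfig {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        f.debug_struct("SnapshotConfig")
            .field("database_url", &"<redacted>")
            .field("dir", &self.dir)
            .field("retention", &self.retention)
            .finish()
    }
}

fn main() {
    let cfg = SnapshotConfig {
        database_url: "postgres://atlas:s3cret@db/atlas".to_string(),
        dir: "/snapshots".to_string(),
        retention: 7,
    };
    println!("{:?}", cfg);
}
```

Existing `debug!`/`{:?}` call sites keep working unchanged, since they only require a `Debug` impl, not the derived one.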
- Around line 293-294: The current env::var call that sets let dir =
env::var("SNAPSHOT_DIR").unwrap_or_else(|_| "/snapshots".to_string()) can
produce an empty string; update the code that reads SNAPSHOT_DIR (the let dir
variable in config.rs) to validate that the resulting string is non-empty when
snapshots are enabled: after obtaining the value (either from env::var or the
default), check if dir.trim().is_empty() and return/configuration-error (or fall
back to "/snapshots") so the configuration fails fast instead of allowing an
empty SNAPSHOT_DIR to propagate into the scheduler loop; ensure the check is
applied in the same function that constructs the snapshot configuration so
callers that enable snapshots receive an explicit error or valid path.
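The fail-fast validation described above can be sketched as a small std-only function; `snapshot_dir` is an illustrative name, and the error wording is assumed:

```rust
use std::env;

// Fall back to "/snapshots" when SNAPSHOT_DIR is unset, but reject an
// explicitly empty (or whitespace-only) value instead of letting it
// propagate into the scheduler loop.
fn snapshot_dir() -> Result<String, String> {
    let dir = env::var("SNAPSHOT_DIR").unwrap_or_else(|_| "/snapshots".to_string());
    if dir.trim().is_empty() {
        return Err("SNAPSHOT_DIR is set but empty; unset it or provide a path".to_string());
    }
    Ok(dir)
}

fn main() {
    env::remove_var("SNAPSHOT_DIR");
    assert_eq!(snapshot_dir().as_deref(), Ok("/snapshots"));

    env::set_var("SNAPSHOT_DIR", "  ");
    assert!(snapshot_dir().is_err(), "empty dir should fail fast");
    println!("validation behaves as expected");
}
```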
In `@docker-compose.yml`:
- Around line 53-54: The snapshot bind mount using the volume entry "-
${SNAPSHOT_HOST_DIR:-./snapshots}:${SNAPSHOT_DIR:-/snapshots}" can make
/snapshots unwritable for the non-root atlas user because host directory
permissions override container ownership; to fix, either switch to a named
Docker volume (remove SNAPSHOT_HOST_DIR bind and use a named volume) or update
deployment docs and startup to ensure SNAPSHOT_HOST_DIR is pre-created with
correct owner/permissions (matching the atlas UID/GID), or add an init step that
chowns the mounted path to the atlas user before snapshot jobs run; reference
the volume line using SNAPSHOT_HOST_DIR and SNAPSHOT_DIR and update
docker-compose and docs accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f400324c-4e91-4be0-b747-26e65caf32eb
📒 Files selected for processing (10)
- .env.example
- backend/Dockerfile
- backend/crates/atlas-server/Cargo.toml
- backend/crates/atlas-server/src/config.rs
- backend/crates/atlas-server/src/main.rs
- backend/crates/atlas-server/src/snapshot.rs
- backend/crates/atlas-server/tests/integration/common.rs
- backend/crates/atlas-server/tests/integration/main.rs
- backend/crates/atlas-server/tests/integration/snapshots.rs
- docker-compose.yml
🧹 Nitpick comments (1)
backend/crates/atlas-server/src/config.rs (1)
293-295: Consider documenting that `SNAPSHOT_TIME` is interpreted as UTC.

Based on context from snapshot.rs, the `NaiveTime` value is interpreted as UTC (via `and_utc()`). Operators in non-UTC timezones may expect local time. A brief comment here or in `.env.example` would clarify.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/crates/atlas-server/src/config.rs` around lines 293 - 295, Document that SNAPSHOT_TIME is interpreted as UTC: update the comment near the NaiveTime parsing in config.rs (around the time_str/NaiveTime::parse_from_str code) to state explicitly that the parsed NaiveTime will be treated as UTC (snapshot.rs uses .and_utc()), and add a matching note in .env.example or README to warn operators in non-UTC timezones that times are UTC. Ensure the comment references SNAPSHOT_TIME, NaiveTime::parse_from_str, and snapshot.rs's .and_utc() behavior so readers can find the related code paths.
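A sketch of what the suggested `.env.example` note could look like; the comment wording and the example time value are assumptions, not taken from the repository:

```
# Daily snapshot schedule, interpreted as UTC (not server-local time):
# the parsed NaiveTime is converted with .and_utc() in snapshot.rs.
SNAPSHOT_TIME=03:30
```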
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ae8d11ee-2bc2-47c8-9c61-44b90aed9b3f
📒 Files selected for processing (3)
- .env.example
- backend/crates/atlas-server/src/config.rs
- docker-compose.yml
✅ Files skipped from review due to trivial changes (1)
- .env.example
🚧 Files skipped from review as they are similar to previous changes (1)
- docker-compose.yml
Add opt-in daily pg_dump snapshots to protect against data loss. Runs as a background tokio task with configurable time, retention, and output directory. Includes integration test for dump/restore round-trip.
🧹 Nitpick comments (2)
backend/crates/atlas-server/src/snapshot.rs (2)
127-129: Consider using `sort_by` for descending sort.

Per coding guidelines for idiomatic Rust, a single `sort_by` call with a reversed comparison is slightly more efficient and clearer than `sort()` followed by `reverse()`.

♻️ Proposed change

```diff
     // Sort descending (newest first) — timestamp in filename gives lexicographic order
-    files.sort();
-    files.reverse();
+    files.sort_by(|a, b| b.cmp(a));
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/crates/atlas-server/src/snapshot.rs` around lines 127 - 129, The code currently sorts the vector of filenames with files.sort() and then files.reverse(), which is less idiomatic and slightly less efficient; replace that pair with a single descending sort call (e.g., use files.sort_by or files.sort_unstable_by with a reversed comparison like comparing b to a) so the filenames are sorted newest-first in one step, updating the sorting of the variable files where files.sort() and files.reverse() are used.
47-64: Consider capturing `pg_dump` stderr for better error diagnostics.

When `pg_dump` fails, only the exit status is logged. Capturing stderr would provide more actionable error messages (e.g., connection refused, permission denied, disk full).

♻️ Proposed enhancement to capture stderr

```diff
-    let status = tokio::process::Command::new("pg_dump")
+    let output = tokio::process::Command::new("pg_dump")
         .arg("--dbname")
         .arg(&config.database_url)
         .arg("-Fc")
         .arg("-f")
         .arg(&tmp_path)
-        .status()
+        .output()
         .await?;

-    if status.success() {
+    if output.status.success() {
         tokio::fs::rename(&tmp_path, &final_path).await?;
         tracing::info!(%filename, "Snapshot complete");
         cleanup_old_snapshots(&config.dir, config.retention).await;
         Ok(())
     } else {
         let _ = tokio::fs::remove_file(&tmp_path).await;
-        bail!("pg_dump failed with status: {status}");
+        let stderr = String::from_utf8_lossy(&output.stderr);
+        bail!("pg_dump failed with status {}: {}", output.status, stderr.trim());
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/crates/atlas-server/src/snapshot.rs` around lines 47 - 64, The pg_dump invocation currently only checks exit status, losing stderr details; change from tokio::process::Command::status() to .output().await to capture Output { stdout, stderr }, then on success proceed with tokio::fs::rename(&tmp_path, &final_path) and tracing::info!(%filename, "Snapshot complete") and cleanup_old_snapshots(&config.dir, config.retention).await; on failure remove the tmp file as before but include the captured stderr (convert bytes to string safely) in the error/bail message (and log it via tracing::error or include in bail!("pg_dump failed: {}: {}", status, stderr_text)), referencing the existing variables tmp_path, final_path, status/output, and cleanup_old_snapshots to locate where to modify behavior.
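The `.output()` pattern can be demonstrated with std's synchronous `Command`, which has the same `status`/`output` split as tokio's. This sketch assumes a Unix-like environment with `sh`, which stands in for a failing pg_dump:

```rust
use std::process::Command;

fn main() {
    // With .output() the stderr text survives into the error report
    // instead of just the exit code.
    let output = Command::new("sh")
        .arg("-c")
        .arg("echo 'connection refused' >&2; exit 1")
        .output()
        .expect("spawn stand-in process");

    assert!(!output.status.success());
    // stderr is raw bytes; convert lossily so invalid UTF-8 can't panic.
    let stderr = String::from_utf8_lossy(&output.stderr);
    println!("pg_dump failed with status {}: {}", output.status, stderr.trim());
}
```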
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 40200aef-ea90-4e71-aa1d-ba07c7bba016
📒 Files selected for processing (11)
- .env.example
- .gitignore
- backend/Dockerfile
- backend/crates/atlas-server/Cargo.toml
- backend/crates/atlas-server/src/config.rs
- backend/crates/atlas-server/src/main.rs
- backend/crates/atlas-server/src/snapshot.rs
- backend/crates/atlas-server/tests/integration/common.rs
- backend/crates/atlas-server/tests/integration/main.rs
- backend/crates/atlas-server/tests/integration/snapshots.rs
- docker-compose.yml
✅ Files skipped from review due to trivial changes (5)
- .gitignore
- backend/crates/atlas-server/tests/integration/main.rs
- .env.example
- backend/crates/atlas-server/src/main.rs
- backend/crates/atlas-server/Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (3)
- docker-compose.yml
- backend/Dockerfile
- backend/crates/atlas-server/tests/integration/snapshots.rs
Actionable comments posted: 1
🧹 Nitpick comments (1)
backend/crates/atlas-server/src/snapshot.rs (1)
124-126: Consider using `sort_by` for descending order in a single call.

This is a minor style suggestion. The current approach works correctly.

♻️ Optional refactor

```diff
     // Sort descending (newest first) — timestamp in filename gives lexicographic order
-    files.sort();
-    files.reverse();
+    files.sort_by(|a, b| b.cmp(a));
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/crates/atlas-server/src/snapshot.rs` around lines 124 - 126, Replace the two-step descending sort (calling files.sort() then files.reverse()) with a single descending sort call on the files collection; use files.sort_by with a comparator that compares b to a (or files.sort_unstable_by for a faster unstable sort) to achieve descending order in one call, updating the snapshot.rs location where the files variable is sorted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/crates/atlas-server/src/main.rs`:
- Around line 359-371: The CI formatting failed due to an extra blank line after
the snapshot scheduler block; edit the block that uses snapshot_config and
tokio::spawn with run_with_retry calling snapshot::run_snapshot_loop and remove
the stray blank line (ensure the closing brace of the if block is directly
followed by the next code or EOF), then re-run rustfmt/cargo fmt to verify
formatting passes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5c114d8d-c004-4b6e-8d3a-4b6b1d4e004b
📒 Files selected for processing (2)
- backend/crates/atlas-server/src/main.rs
- backend/crates/atlas-server/src/snapshot.rs
@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/crates/atlas-server/src/cli.rs`:
- Around line 403-423: The test
empty_faucet_cooldown_env_is_ignored_when_faucet_disabled mutates process-wide
env vars unsafely; wrap the env set/remove operations with the existing ENV_LOCK
to prevent races by acquiring the lock at the start of the test (e.g., let _lock
= ENV_LOCK.lock().unwrap();) before any std::env::set_var/std::env::remove_var
calls and keep the lock held for the duration of the test so all env mutations
are synchronized with other tests that use ENV_LOCK from config.rs.
In `@backend/crates/atlas-server/src/config.rs`:
- Around line 679-697: This test mutates FAUCET_PRIVATE_KEY without acquiring
the shared ENV_LOCK used by other environment-variable tests; protect the
mutation by acquiring the same ENV_LOCK before setting/removing the env var (or
move the test into the tests module that already defines ENV_LOCK) so the
setup/teardown around FaucetConfig::from_faucet_args is synchronized—e.g.,
obtain ENV_LOCK (from the tests module) at the top of
faucet_blank_cooldown_is_treated_as_missing, then set FAUCET_PRIVATE_KEY, call
FaucetConfig::from_faucet_args, and remove the var before dropping the lock.
In `@backend/crates/atlas-server/src/main.rs`:
- Line 287: The check() command currently doesn't validate SNAPSHOT_* variables;
call config::SnapshotConfig::from_env(&config.database_url) (same parsing used
in run()) inside check() and propagate or return the error if parsing/validation
fails so check() fails for invalid SNAPSHOT_ENABLED/time/retention/dir settings;
ensure you reference SnapshotConfig::from_env and mirror the same error handling
used in run() so invalid snapshot env vars are reported by atlas-server check.
In `@backend/crates/atlas-server/src/snapshot.rs`:
- Around line 12-19: The scheduling function duration_until_next uses a strict
greater-than so when now equals the target time it incorrectly schedules for
tomorrow; change the comparison in duration_until_next to use >= (i.e., treat
today_target >= now as “next” is today) so the scheduler runs immediately at the
configured time, and update the unit test exactly_now_wraps_to_tomorrow to
expect immediate (today) scheduling rather than wrapping to tomorrow.
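The boundary fix can be illustrated numerically without chrono. This is a std-only sketch where times are seconds since UTC midnight; `secs_until_next` is an illustrative name, not the codebase's `duration_until_next`:

```rust
// With `>=`, a wake exactly at the target time runs today instead of
// wrapping a full day to tomorrow.
fn secs_until_next(now: u32, target: u32) -> u32 {
    const DAY: u32 = 24 * 60 * 60;
    if target >= now {
        target - now // today, including the exact-match case
    } else {
        DAY - now + target // already passed: tomorrow
    }
}

fn main() {
    assert_eq!(secs_until_next(3_600, 3_600), 0); // exactly now: run immediately
    assert_eq!(secs_until_next(3_599, 3_600), 1);
    assert_eq!(secs_until_next(3_601, 3_600), 86_399); // just missed: wait ~24h
    println!("boundary handled");
}
```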
In `@backend/crates/atlas-server/tests/integration/snapshots.rs`:
- Around line 52-57: The code currently derives restore_url by string-replacing
"/postgres" on db_url (variable restore_url and db_url), which fails for
non-standard paths and can collide; instead parse db_url with a URL parser
(e.g., url::Url), generate a unique database name (use UUID or timestamp) for
the restore target (instead of hardcoded "test_restore"), set the parsed URL's
path/last path segment to that unique name (using path_segments_mut or
equivalent), and then reconstruct restore_url from the parsed URL before running
sqlx::query to create and use the new database; ensure the same generated name
is used both for CREATE DATABASE and for building restore_url.
- Around line 34-40: The test currently calls pg_dump/pg_restore directly with
--dbname and a brittle URL mutation; update it to parse the URL via
postgres_connection_config() and build commands using
portable_pg_dump_command_async() for the dump and the production restore path
using PORTABLE_PG_RESTORE_FLAGS for restore so credentials are passed via PG_*
env vars and portable flags (--no-owner, --no-acl, --exit-on-error) are applied;
replace the manual .replace("/postgres", "/test_restore") logic with
constructing a proper ConnectionConfig via postgres_connection_config() and feed
that into the portable command builders to mirror production behavior and avoid
exposing credentials in the process list.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0e491a96-4d68-42ba-b222-8680de2ea40e
📒 Files selected for processing (8)
- .env.example
- backend/crates/atlas-server/Cargo.toml
- backend/crates/atlas-server/src/cli.rs
- backend/crates/atlas-server/src/config.rs
- backend/crates/atlas-server/src/main.rs
- backend/crates/atlas-server/src/snapshot.rs
- backend/crates/atlas-server/tests/integration/common.rs
- backend/crates/atlas-server/tests/integration/snapshots.rs
✅ Files skipped from review due to trivial changes (2)
- backend/crates/atlas-server/Cargo.toml
- .env.example
```rust
#[test]
fn empty_faucet_cooldown_env_is_ignored_when_faucet_disabled() {
    unsafe {
        std::env::set_var("DATABASE_URL", "postgres://atlas:atlas@localhost/atlas");
        std::env::set_var("RPC_URL", "http://localhost:8545");
        std::env::set_var("FAUCET_ENABLED", "false");
        std::env::set_var("FAUCET_COOLDOWN_MINUTES", "");
    }

    let cli = Cli::try_parse_from(["atlas-server", "run"]).expect("parse cli");
    match cli.command {
        Command::Run(args) => assert_eq!(args.faucet.cooldown_minutes, Some(String::new())),
        _ => panic!("expected run command"),
    }

    unsafe {
        std::env::remove_var("DATABASE_URL");
        std::env::remove_var("RPC_URL");
        std::env::remove_var("FAUCET_ENABLED");
        std::env::remove_var("FAUCET_COOLDOWN_MINUTES");
    }
}
```
🧩 Analysis chain
🏁 Script executed:
# First, let's read the exact test code at lines 403-423
head -425 backend/crates/atlas-server/src/cli.rs | tail -23 | cat -nRepository: evstack/atlas
Length of output: 1114
🏁 Script executed:
# Search for other tests in this file that mutate env vars
rg "set_var|remove_var" backend/crates/atlas-server/src/cli.rs -B 3 -A 1Repository: evstack/atlas
Length of output: 689
🏁 Script executed:
# Look for mutex/lock patterns used for env synchronization elsewhere in the crate
rg "Mutex|RwLock|lazy_static|once_cell" backend/crates/atlas-server/ -t rs --max-count 20Repository: evstack/atlas
Length of output: 83
🏁 Script executed:
# Check if there's a scoped env helper in this crate
fd "env" backend/crates/atlas-server/src/ -t rs | head -20Repository: evstack/atlas
Length of output: 226
🏁 Script executed:
# Search for "scoped" or "helper" patterns related to env in the broader crate
rg "scoped|helper.*env" backend/crates/ -t rs -i --max-count 15Repository: evstack/atlas
Length of output: 83
🏁 Script executed:
# Search for Mutex, lazy_static, once_cell without specifying file type
rg "Mutex|lazy_static|once_cell" backend/crates/atlas-server/ -B 2 -A 1 | head -50Repository: evstack/atlas
Length of output: 1268
🏁 Script executed:
# Look for other #[test] functions to see how they handle state
rg "#\[test\]" backend/crates/atlas-server/src/cli.rs -A 20 | head -80
Length of output: 945
🏁 Script executed:
# Search broader backend crates for env-related test patterns
rg "set_var|remove_var" backend/crates/ -B 5 -A 5 | head -60
Length of output: 4355
🏁 Script executed:
# Check if there's any synchronization mechanism defined in lib.rs or mod.rs files
find backend/crates/atlas-server/src -name "lib.rs" -o -name "mod.rs" | xargs cat 2>/dev/null | head -100
Length of output: 3082
🏁 Script executed:
# Search for where ENV_LOCK is actually used/locked in tests
rg "ENV_LOCK" backend/crates/atlas-server/src/config.rs -B 2 -A 5
Length of output: 6343
🏁 Script executed:
# Get the full test function that uses env vars in config.rs to see if it acquires the lock
rg "#\[test\]" backend/crates/atlas-server/src/config.rs -A 30 | head -80
Length of output: 2747
Guard env mutations with the established ENV_LOCK pattern.
This test mutates process-global env vars without synchronization. The crate already defines static ENV_LOCK: Mutex<()> in config.rs and uses it consistently across all env-mutating tests (e.g., let _lock = ENV_LOCK.lock().unwrap();). Apply the same pattern here to prevent races with parallel test execution.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/crates/atlas-server/src/cli.rs` around lines 403 - 423, The test
empty_faucet_cooldown_env_is_ignored_when_faucet_disabled mutates process-wide
env vars unsafely; wrap the env set/remove operations with the existing ENV_LOCK
to prevent races by acquiring the lock at the start of the test (e.g., let _lock
= ENV_LOCK.lock().unwrap();) before any std::env::set_var/std::env::remove_var
calls and keep the lock held for the duration of the test so all env mutations
are synchronized with other tests that use ENV_LOCK from config.rs.
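The locking discipline the review asks for can be sketched as follows. This is a minimal illustration of the pattern, assuming the same `static ENV_LOCK: Mutex<()>` shape described above; `with_env_var` is a hypothetical helper, not part of the crate.

```rust
use std::sync::Mutex;

// Process-global lock serializing tests that mutate environment variables.
// Mirrors the ENV_LOCK pattern the review references; the real definition
// lives in config.rs.
static ENV_LOCK: Mutex<()> = Mutex::new(());

// Illustrative helper: hold the lock for the whole set/run/remove sequence
// so parallel tests never observe each other's env mutations.
fn with_env_var<F: FnOnce()>(key: &str, value: &str, body: F) {
    let _guard = ENV_LOCK.lock().unwrap();
    std::env::set_var(key, value);
    body();
    std::env::remove_var(key);
}
```

Keeping the guard alive for the full test body is the important part: dropping it after `set_var` would reopen the race the lock exists to prevent.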
#[test]
fn faucet_blank_cooldown_is_treated_as_missing() {
    let mut args = minimal_run_args();
    args.faucet.enabled = true;
    args.faucet.amount = Some("0.1".to_string());
    args.faucet.cooldown_minutes = Some(" ".to_string());

    unsafe {
        env::set_var(
            "FAUCET_PRIVATE_KEY",
            "0x59c6995e998f97a5a0044966f0945382dbd8c5df5440d8d6d0d0f66f6d7d6a0d",
        );
    }
    let err = FaucetConfig::from_faucet_args(&args.faucet).unwrap_err();
    assert!(err.to_string().contains("cooldown-minutes"));
    unsafe {
        env::remove_var("FAUCET_PRIVATE_KEY");
    }
}
🧩 Analysis chain
🏁 Script executed:
# First, locate the file and check its size
wc -l backend/crates/atlas-server/src/config.rs
Length of output: 103
🏁 Script executed:
# Search for ENV_LOCK in the file
rg "ENV_LOCK" backend/crates/atlas-server/src/config.rs -B 2 -A 2
Length of output: 3562
🏁 Script executed:
# Look at the tests module structure and how env is managed
rg "mod tests" backend/crates/atlas-server/src/config.rs -A 50 | head -80
Length of output: 2620
🏁 Script executed:
# Check the specific lines 679-697 and surrounding context to see if ENV_LOCK is used
sed -n '670,710p' backend/crates/atlas-server/src/config.rs
Length of output: 1255
Protect this env-driven test with the shared env lock.
This test writes FAUCET_PRIVATE_KEY directly without acquiring ENV_LOCK. Since ENV_LOCK synchronizes all other environment-variable tests in the tests module below, this unprotected mutation can race and cause nondeterministic failures. Either move ENV_LOCK to a shared location accessible from both tests_from_run_args and tests, or relocate this test to the tests module where the lock is defined.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/crates/atlas-server/src/config.rs` around lines 679 - 697, This test
mutates FAUCET_PRIVATE_KEY without acquiring the shared ENV_LOCK used by other
environment-variable tests; protect the mutation by acquiring the same ENV_LOCK
before setting/removing the env var (or move the test into the tests module that
already defines ENV_LOCK) so the setup/teardown around
FaucetConfig::from_faucet_args is synchronized—e.g., obtain ENV_LOCK (from the
tests module) at the top of faucet_blank_cooldown_is_treated_as_missing, then
set FAUCET_PRIVATE_KEY, call FaucetConfig::from_faucet_args, and remove the var
before dropping the lock.
let config = config::Config::from_run_args(args.clone())?;
let faucet_config = config::FaucetConfig::from_faucet_args(&args.faucet)?;
let snapshot_config = config::SnapshotConfig::from_env(&config.database_url)?;
Validate SNAPSHOT_* in check() too.
The new snapshot config only gets parsed in run(). atlas-server check can still report success with SNAPSHOT_ENABLED=true and an invalid time/retention/dir, which defeats the purpose of the validation command.
Suggested fix
async fn check(args: cli::RunArgs) -> Result<()> {
init_tracing(&args.log.level, &args.log.format);
let config = config::Config::from_run_args(args.clone())?;
config::FaucetConfig::from_faucet_args(&args.faucet)?;
+    config::SnapshotConfig::from_env(&config.database_url)?;

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/crates/atlas-server/src/main.rs` at line 287, The check() command
currently doesn't validate SNAPSHOT_* variables; call
config::SnapshotConfig::from_env(&config.database_url) (same parsing used in
run()) inside check() and propagate or return the error if parsing/validation
fails so check() fails for invalid SNAPSHOT_ENABLED/time/retention/dir settings;
ensure you reference SnapshotConfig::from_env and mirror the same error handling
used in run() so invalid snapshot env vars are reported by atlas-server check.
fn duration_until_next(target: NaiveTime, now: DateTime<Utc>) -> Duration {
    let today_target = now.date_naive().and_time(target).and_utc();
    let next = if today_target > now {
        today_target
    } else {
        today_target + chrono::Duration::days(1)
    };
    (next - now).to_std().expect("positive duration")
Run immediately when the scheduler lands exactly on the configured time.
Line 14 uses > rather than >=, so hitting SNAPSHOT_TIME exactly sends the first run to tomorrow and skips a full day’s snapshot. The exactly_now_wraps_to_tomorrow test should flip with the fix.
Suggested fix
- let next = if today_target > now {
+ let next = if today_target >= now {
today_target
} else {
today_target + chrono::Duration::days(1)
};

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Before:
fn duration_until_next(target: NaiveTime, now: DateTime<Utc>) -> Duration {
    let today_target = now.date_naive().and_time(target).and_utc();
    let next = if today_target > now {
        today_target
    } else {
        today_target + chrono::Duration::days(1)
    };
    (next - now).to_std().expect("positive duration")

After:
fn duration_until_next(target: NaiveTime, now: DateTime<Utc>) -> Duration {
    let today_target = now.date_naive().and_time(target).and_utc();
    let next = if today_target >= now {
        today_target
    } else {
        today_target + chrono::Duration::days(1)
    };
    (next - now).to_std().expect("positive duration")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/crates/atlas-server/src/snapshot.rs` around lines 12 - 19, The
scheduling function duration_until_next uses a strict greater-than so when now
equals the target time it incorrectly schedules for tomorrow; change the
comparison in duration_until_next to use >= (i.e., treat today_target >= now as
“next” is today) so the scheduler runs immediately at the configured time, and
update the unit test exactly_now_wraps_to_tomorrow to expect immediate (today)
scheduling rather than wrapping to tomorrow.
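The boundary bug is easiest to see with plain seconds-since-midnight arithmetic. This is a dependency-free sketch of the same comparison; the real code uses chrono types, and `secs_until_next` is illustrative only.

```rust
const DAY_SECS: u64 = 86_400;

// Sketch of duration_until_next's comparison, with times expressed as
// seconds past UTC midnight. With `>=`, landing on the configured time
// exactly yields a delay of 0 (run now) instead of deferring a full day.
fn secs_until_next(target_secs: u64, now_secs: u64) -> u64 {
    if target_secs >= now_secs {
        target_secs - now_secs // target is still today (or exactly now)
    } else {
        target_secs + DAY_SECS - now_secs // target already passed: tomorrow
    }
}
```

With the original strict `>`, the first branch would be skipped when `target_secs == now_secs`, pushing the run a full 86,400 seconds out.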
let dump_status = tokio::process::Command::new("pg_dump")
    .arg("--dbname")
    .arg(db_url)
    .arg("-Fc")
    .arg("-f")
    .arg(&dump_path)
    .status()
🧩 Analysis chain
🏁 Script executed:
# First, let's check if the test file exists and view the relevant lines
fd -t f "snapshots.rs" | head -5
Length of output: 115
🏁 Script executed:
# Check the structure around the test file
git ls-files backend/crates/atlas-server/tests/ | head -20
Length of output: 680
🏁 Script executed:
# Read the test file to see lines 34-40, 52-57, 60-64
wc -l backend/crates/atlas-server/tests/integration/snapshots.rs
Length of output: 118
🏁 Script executed:
# Read the test file content with line numbers
cat -n backend/crates/atlas-server/tests/integration/snapshots.rs
Length of output: 3686
🏁 Script executed:
# Search for portable_pg_dump_command_async in the codebase
rg "portable_pg_dump_command_async" --type rust
Length of output: 264
🏁 Script executed:
# Also search for pg_dump usage in production code
rg "pg_dump|pg_restore" backend/crates/atlas-server/src --type rust -B 2 -A 2
Length of output: 4668
🏁 Script executed:
# Check what's in main.rs for snapshot/dump related code
rg -i "snapshot|dump|restore" backend/crates/atlas-server/src/main.rs --type rust -B 2 -A 2 | head -80
Length of output: 2612
🏁 Script executed:
# Let's find the postgres_command and postgres_command_async implementations
rg "fn postgres_command" backend/crates/atlas-server/src --type rust -A 10 | head -80
Length of output: 1611
🏁 Script executed:
# Also check for postgres_connection_config to understand how env vars are set
rg "postgres_connection_config" backend/crates/atlas-server/src --type rust -A 15 | head -100
Length of output: 7762
🏁 Script executed:
# Check if there's a portable_pg_restore_command function
rg "portable_pg_restore|PORTABLE_PG_RESTORE" backend/crates/atlas-server/src --type rust -B 2 -A 5
Length of output: 1903
🏁 Script executed:
# Also check what PG_ENV_VARS contains
rg "PG_ENV_VARS" backend/crates/atlas-server/src --type rust -B 2 -A 10
Length of output: 2519
🏁 Script executed:
# Quick check: search for how cmd_db_restore actually uses the config
rg "cmd_db_restore|fn cmd_db_restore" backend/crates/atlas-server/src/main.rs -A 30 | head -50
Length of output: 2013
Use the production command builders for dump and restore to match actual behavior.
The test bypasses the production safeguards. Lines 34–40 and 60–64 invoke raw pg_dump/pg_restore with --dbname <URL>, exposing credentials in the process command line. Production parses the URL into PG_* environment variables (PGPASSWORD, PGUSER, etc.), which are not visible in process listings.
Additionally, the test omits the portable flags required in production: --no-owner, --no-acl for dump, and --exit-on-error for restore (critical for fail-fast on error). This means the test can pass while the actual snapshot/restore flow still breaks in another environment.
Use postgres_connection_config() to parse the database URL, then route through portable_pg_dump_command_async() and the production restore path with PORTABLE_PG_RESTORE_FLAGS. This also fixes the brittle restore URL derivation at line 57 (the .replace("/postgres", "/test_restore") approach).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/crates/atlas-server/tests/integration/snapshots.rs` around lines 34 -
40, The test currently calls pg_dump/pg_restore directly with --dbname and a
brittle URL mutation; update it to parse the URL via
postgres_connection_config() and build commands using
portable_pg_dump_command_async() for the dump and the production restore path
using PORTABLE_PG_RESTORE_FLAGS for restore so credentials are passed via PG_*
env vars and portable flags (--no-owner, --no-acl, --exit-on-error) are applied;
replace the manual .replace("/postgres", "/test_restore") logic with
constructing a proper ConnectionConfig via postgres_connection_config() and feed
that into the portable command builders to mirror production behavior and avoid
exposing credentials in the process list.
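Keeping credentials off the command line boils down to splitting the URL into libpq environment variables (PGUSER, PGPASSWORD, etc.) before spawning pg_dump. A rough sketch under that assumption — `pg_env_from_url` is illustrative, the crate's real parser is `postgres_connection_config()`:

```rust
// Split a postgres URL into libpq env vars so the password never appears
// in `ps` output. Handles only the common
// postgres://user:pass@host:port/db shape; a real parser covers more cases.
fn pg_env_from_url(url: &str) -> Option<Vec<(&'static str, String)>> {
    let rest = url
        .strip_prefix("postgres://")
        .or_else(|| url.strip_prefix("postgresql://"))?;
    let (creds, host_db) = rest.split_once('@')?;
    let (user, pass) = creds.split_once(':').unwrap_or((creds, ""));
    let (host_port, db) = host_db.split_once('/')?;
    let (host, port) = host_port.split_once(':').unwrap_or((host_port, "5432"));
    Some(vec![
        ("PGUSER", user.to_string()),
        ("PGPASSWORD", pass.to_string()),
        ("PGHOST", host.to_string()),
        ("PGPORT", port.to_string()),
        ("PGDATABASE", db.to_string()),
    ])
}
```

The resulting pairs would be fed to the child process via `Command::envs`, so pg_dump can be invoked without `--dbname <URL>` on its argument list.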
sqlx::query("CREATE DATABASE test_restore")
    .execute(&pool)
    .await
    .expect("create test_restore database");

let restore_url = db_url.replace("/postgres", "/test_restore");
Don't derive the restore target with a string replace.
Line 57 only works when the source URL path is literally /postgres, but common::database_url() can come from ATLAS_TEST_DATABASE_URL. It also hardcodes test_restore, so reruns or parallel executions can collide. Parse the URL, set a unique database name, and build the restore URL from that instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/crates/atlas-server/tests/integration/snapshots.rs` around lines 52 -
57, The code currently derives restore_url by string-replacing "/postgres" on
db_url (variable restore_url and db_url), which fails for non-standard paths and
can collide; instead parse db_url with a URL parser (e.g., url::Url), generate a
unique database name (use UUID or timestamp) for the restore target (instead of
hardcoded "test_restore"), set the parsed URL's path/last path segment to that
unique name (using path_segments_mut or equivalent), and then reconstruct
restore_url from the parsed URL before running sqlx::query to create and use the
new database; ensure the same generated name is used both for CREATE DATABASE
and for building restore_url.
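One way to cover both problems is to rewrite only the path segment and suffix a unique token. A sketch under stated assumptions — `unique_restore_url` is a hypothetical helper, and a millisecond timestamp stands in for a proper UUID:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Build a restore-target URL by swapping out the database path segment for
// a uniquely named database, instead of string-replacing "/postgres".
// Assumes the URL ends in a database path (postgres://...@host[:port]/db).
fn unique_restore_url(db_url: &str) -> Option<String> {
    // The last '/' separates host[:port] from the database name.
    let idx = db_url.rfind('/')?;
    let millis = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .ok()?
        .as_millis();
    Some(format!("{}/test_restore_{}", &db_url[..idx], millis))
}
```

The same generated name would then be used both for the CREATE DATABASE statement and for the restore connection string, so parallel test runs cannot collide.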
Summary
- pg_dump snapshots run as a background tokio task, following the same spawn pattern as the indexer and metadata fetcher
- Configured via SNAPSHOT_ENABLED, SNAPSHOT_TIME (HH:MM UTC), SNAPSHOT_RETENTION (max files kept), and SNAPSHOT_DIR
- Docker image installs postgresql16-client for pg_dump; compose file wired with env vars and volume mount
- Closes #23
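For reference, the environment knobs named in the summary could be set roughly like this. The variable names come from the PR; the values are examples only, not defaults from the code.

```shell
# Illustrative snapshot configuration (values are examples, not defaults).
export SNAPSHOT_ENABLED=true
export SNAPSHOT_TIME=03:30                   # HH:MM, interpreted as UTC
export SNAPSHOT_RETENTION=7                  # keep the newest N dump files
export SNAPSHOT_DIR=/var/lib/atlas/snapshots # must be writable by the server
```

In the compose setup these would be passed through the service's environment block, with SNAPSHOT_DIR backed by the mounted volume.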
Summary by CodeRabbit
New Features
Documentation
Chores
Tests
Other