⚡ Bolt: [performance improvement] Pre-allocate string instead of vector join in D1 UPSERTs#222
Conversation
…or join in D1 UPSERTs Optimizes D1 UPSERT SQL generation by replacing multiple intermediate String/Vec allocations and string joins with a single pre-allocated String buffer using `std::fmt::Write`. This prevents O(N) heap allocations for hot network-bound codepaths writing to Cloudflare D1. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
Reviewer's GuideRefactors D1 UPSERT SQL generation to build the statement directly into a single pre-allocated String buffer and pre-allocates the parameters vector, eliminating intermediate vectors and join operations, and documents the performance learning in the Bolt notes. File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The
write!calls currently use.unwrap(), which will panic on formatting errors; consider either ignoring thefmt::Result(since writing to aStringcan't fail) or usingexpectwith a clear message to avoid hidden panics. - You iterate over
values.fieldstwice (once to build columns/params and once for the update clauses); consider building theON CONFLICT DO UPDATEfragments in the same loop that processesvalues.fieldsto avoid the second pass.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `write!` calls currently use `.unwrap()`, which will panic on formatting errors; consider either ignoring the `fmt::Result` (since writing to a `String` can't fail) or using `expect` with a clear message to avoid hidden panics.
- You iterate over `values.fields` twice (once to build columns/params and once for the update clauses); consider building the `ON CONFLICT DO UPDATE` fragments in the same loop that processes `values.fields` to avoid the second pass.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Pull request overview
This PR optimizes D1 UPSERT SQL generation in thread-flow by replacing vector/string assembly + joins with direct writes into a single pre-allocated String, reducing allocations on a hot path.
Changes:
- Refactor
D1ExportContext::build_upsert_stmtto build SQL viastd::fmt::Writeinto a pre-sizedStringand pre-size the params vector. - Record the performance optimization learning in
.jules/bolt.md.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| crates/flow/src/targets/d1.rs | Refactors UPSERT SQL construction to reduce intermediate allocations by writing directly into a pre-allocated String. |
| .jules/bolt.md | Adds a short performance note describing the SQL generation optimization approach. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| sql.push_str(") ON CONFLICT DO UPDATE SET "); | ||
| let mut first = true; | ||
| for (idx, _value) in values.fields.iter().enumerate() { | ||
| if let Some(value_field) = self.value_fields_schema.get(idx) { | ||
| if !first { sql.push_str(", "); } |
💡 What
Optimized the
build_upsert_stmtfunction incrates/flow/src/targets/d1.rsby refactoring it to directly write to a single pre-allocatedStringbuffer usingstd::fmt::Write. This replaces the previous implementation that allocated multiple intermediate vectors (columns,placeholders,update_clauses) and created temporary Strings before joining them all at the end.🎯 Why
Generating D1 queries happens on a very hot codepath when writing data back to Cloudflare D1 endpoints. The existing approach created multiple intermediate string allocations and vector resize churn per statement. For large tables or frequent updates, these unnecessary heap allocations create measurable memory overhead and GC pressure (if tracking). Writing directly to a buffer with pre-calculated capacity prevents this bottleneck.
📊 Impact
Reduces memory allocation overhead for UPSERT generation from O(N) intermediate structures (where N is the number of keys + values) down to O(1) buffer allocations per query.
🔬 Measurement
Verified functionality using
cargo test -p thread-flowto ensure the dynamically generated SQL exactly matches the expected format, and verified performance improvement theoretically by preventing at least 4 vectors and ~3N string allocations per call. Recorded performance learnings in.jules/bolt.md.PR created automatically by Jules for task 342063582730952543 started by @bashandbone
Summary by Sourcery
Optimize D1 UPSERT SQL generation to reduce string allocations and improve performance on a hot path.
Enhancements:
Documentation: