
Added the DynamoDB integration #120

Merged
juanpardo merged 7 commits into main from jp/dynamodb-integration
Mar 20, 2026

@juanpardo juanpardo commented Mar 16, 2026

What's Inside

Documentation for the Cloud DynamoDB integration

Preview

https://crate-cloud--120.org.readthedocs.build/en/120/cluster/integrations/dynamo-cdc.html

Highlights

Checklist

Summary by CodeRabbit

  • Documentation
    • Added comprehensive DynamoDB CDC integration docs with end-to-end setup, two-stage sync (initial full load + continuous CDC), consistency and retention guidance, and column name restrictions.
    • Updated integrations index to include DynamoDB CDC and promoted MongoDB CDC from Preview to GA.
    • Added DynamoDB to supported import formats and clarified three integration types (Full load only, CDC only, Full load + CDC) plus field name/type handling.

@juanpardo juanpardo requested a review from plaharanne March 16, 2026 10:33

coderabbitai bot commented Mar 16, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds a new DynamoDB CDC integration guide describing a two-stage sync (initial full table scan + continuous sync via Kinesis), updates the integrations index to include DynamoDB CDC, renames MongoDB CDC (Preview) to MongoDB CDC, and documents integration modes and automatic field/type inference.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| DynamoDB CDC Documentation: `docs/cluster/integrations/dynamo-cdc.md` | New guide describing two-stage synchronization (initial full table scan then continuous sync via Kinesis Data Stream), AWS Kinesis setup and retention, CrateDB Cloud setup (secrets, integration modes, source/target mapping), column name restrictions, and retention considerations. |
| Integration Index & Types: `docs/cluster/integrations/index.md` | Added DynamoDB CDC to integrations list, renamed MongoDB CDC (Preview) to MongoDB CDC, added documentation for three integration modes (Full load only, CDC only, Full load and CDC), and clarified field-name compatibility and inferred typing/casting behavior. |
| Import Formats: `docs/cluster/import.md` | Added DynamoDB to the list of supported import data formats. |
| Mongo CDC minor edits: `docs/cluster/integrations/mongo-cdc.md` | Minor capitalization adjustments in integration mode headings (cosmetic text changes). |

Sequence Diagram(s)

sequenceDiagram
  participant DynamoDB as DynamoDB
  participant Kinesis as Kinesis Data Stream
  participant Crate as CrateDB Cloud

  DynamoDB->>Kinesis: Initial full table scan (bulk records)
  Kinesis->>Crate: Deliver bulk records -> Crate inserts/upserts
  DynamoDB->>Kinesis: Continuous change events (streamed)
  Kinesis->>Crate: Continuous events -> Crate applies full inserts/upserts

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • Update mongo-cdc.md #119 — Related edits to integrations docs and column-name handling overlapping with MongoDB CDC/integrations content.

Suggested reviewers

  • bmunkholm

Poem

🐰 I hopped through docs with joyful cheer,

Streams and snapshots now appear,
Kinesis hums and records race,
I munch on dots and bracketed space,
Hooray — DynamoCDC is here! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Added the DynamoDB integration' is clear and directly related to the main changeset, which adds comprehensive documentation and integration support for DynamoDB CDC functionality across multiple documentation files. |
| Description check | ✅ Passed | The description covers the required template sections: What's Inside (documentation content), Preview (with URL), Highlights (empty but present), and Checklist (with issue link and CLA reminder), though some sections could be more detailed. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (5)
docs/cluster/integrations/index.md (2)

72-81: Consider clarity improvements for integration type descriptions.

The descriptions are clear but could be improved for consistency and conciseness.

♻️ Suggested improvements
 There are 3 different integration types:
-- **Full load only**: Imports all your data and immediately ends after.
-- **CDC only**: It indefinitely listens to CDC (Change Data Capture) events on the 
-  source and applies them into your CrateDB table. Once it reaches the last CDC event 
-  it waits for new events to come.
+- **Full load only**: Imports all your data and stops immediately after completion.
+- **CDC only**: Listens indefinitely to CDC (Change Data Capture) events from the 
+  source and applies them to your CrateDB table. After reaching the last CDC event, 
+  it waits for new events.
 - **Full load and CDC**
-  It imports all the data like the type __full load only__, but once that phase finishes
-  it starts processing CDC events. If the source supports it, it will try to read CDC
-  events starting from right when the import phase started. This way any data alteration
-  during the import phase will be picked up and processed.
+  Imports all data like **full load only**, then processes CDC events after that phase finishes.
+  If the source supports it, the integration reads CDC
+  events from the moment the import phase started, ensuring the system captures and processes
+  any data changes that occurred during import.

Based on learnings: In the cloud-docs repository, ensure Markdown documentation uses a concise, neutral guide tone and reduce unnecessary phrasing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/cluster/integrations/index.md` around lines 72 - 81, Revise the three
integration type bullet descriptions for consistent, concise, neutral tone:
shorten "Full load only" to "Full load only: Import all source data and then
exit." shorten "CDC only" to "CDC only: Continuously consume and apply Change
Data Capture (CDC) events, waiting for new events when caught up." and reword
"Full load and CDC" to "Full load and CDC: Perform a full import, then switch to
CDC processing; where supported, start CDC from the import start position so
changes during import are captured." Keep the same bullet labels ("Full load
only", "CDC only", "Full load and CDC") so existing references remain valid.
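
To illustrate why the "Full load and CDC" wording about starting from the import start position matters, here is a minimal Python sketch. It is a toy simulation, not the integration's implementation: `ChangeEvent`, `run_integration`, the integer stream positions, and the sample data are all hypothetical names and values chosen for the example. It assumes events are applied as full upserts, so replaying an event already reflected in the snapshot is harmless.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    position: int         # hypothetical, ever-increasing position in the change stream
    key: str
    value: Optional[str]  # None marks a delete


def run_integration(snapshot: Dict[str, str],
                    events: List[ChangeEvent],
                    cdc_start: int) -> Dict[str, str]:
    """Load the snapshot, then replay every event at or after cdc_start."""
    target = dict(snapshot)  # stage 1: full load
    for event in sorted(events, key=lambda e: e.position):  # stage 2: CDC
        if event.position < cdc_start:
            continue
        if event.value is None:
            target.pop(event.key, None)
        else:
            target[event.key] = event.value  # full upsert, safe to replay
    return target


# Hypothetical scenario: the import runs from position 5 to position 7
# and reads key "a" before it changes.
snapshot = {"a": "1", "b": "2"}
events = [
    ChangeEvent(5, "a", "1b"),  # update during the import
    ChangeEvent(6, "c", "3"),   # insert during the import
    ChangeEvent(8, "b", None),  # delete after the import
]

from_import_start = run_integration(snapshot, events, cdc_start=5)
from_import_end = run_integration(snapshot, events, cdc_start=7)
print(from_import_start)
print(from_import_end)
```

In this toy scenario, starting CDC at the end of the import loses the update to `a` and the insert of `c`, while starting at the import start position picks up both.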

51-56: Consider active voice for system behavior description.

This section uses passive constructions throughout. Rephrasing to active voice would improve clarity and align with documentation guidelines.

♻️ Suggested rephrase
-The data field names will be inferred and made compatible with CrateDB naming 
-restrictions automatically.
-
-The data field types will be inferred as new fields are discovered. If you require data
-to be cast to a specific type, simply create the CrateDB table with the columns and 
-types you want and the system will try to cast it to the types at the destination.
+CrateDB automatically infers data field names and makes them compatible with CrateDB naming 
+restrictions.
+
+CrateDB infers data field types as it discovers new fields. To cast data
+to a specific type, create the CrateDB table with the columns and 
+types you want, and the system attempts to cast values to the destination types.

Based on learnings: In the cloud-docs repository, for Markdown files under docs/, prefer active constructions (subject-verb-object) and clearly state who performs actions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/cluster/integrations/index.md` around lines 51 - 56, Reword the passive
sentences to active voice: replace "The data field names will be inferred..."
with something like "The system infers data field names and makes them
compatible with CrateDB naming restrictions," change "The data field types will
be inferred..." to "The system infers data field types as new fields are
discovered," and rephrase the cast guidance to an active form such as "If you
need a specific type, create the CrateDB table with the desired columns and
types and the system will try to cast incoming data to those destination types,"
keeping tone consistent with docs/ guidelines.
docs/cluster/integrations/dynamo-cdc.md (3)

22-22: Consider active voice for setup requirement.

The phrase "needs to be set up" uses passive construction. Consider rephrasing to active voice for consistency with the documentation guidelines.

♻️ Suggested rephrase
-   Please note the Kinesis Data Stream needs to be set up before you can use this stage.
+   You must set up the Kinesis Data Stream before using this stage.

Based on learnings: In the cloud-docs repository, for Markdown files under docs/, prefer active constructions (subject-verb-object) and clearly state who performs actions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/cluster/integrations/dynamo-cdc.md` at line 22, Replace the passive
sentence "Please note the Kinesis Data Stream needs to be set up before you can
use this stage." with an active-voice instruction such as "Set up the Kinesis
Data Stream before using this stage." Locate that exact sentence in the
docs/cluster/integrations/dynamo-cdc.md content and update it to the active
construction to match documentation guidelines.

116-118: Consider active voice for clarity.

The constraint description uses passive constructions. Consider rephrasing to active voice.

♻️ Suggested rephrase
-Column or property names containing square brackets `[]` are not supported and
-are replaced with `_obkt_` and `_cbkt_` respectively. Likewise, column
-names containing dots `.` are not supported and are replaced with `_dot_`.
+CrateDB does not support column or property names containing square brackets `[]` and
+replaces them with `_obkt_` and `_cbkt_` respectively. Likewise, CrateDB
+replaces dots `.` in column names with `_dot_`.

Based on learnings: In the cloud-docs repository, for Markdown files under docs/, prefer active constructions (subject-verb-object) and clearly state who performs actions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/cluster/integrations/dynamo-cdc.md` around lines 116 - 118, Reword the
passive sentence "Column or property names containing square brackets `[]` are
not supported and are replaced with `_obkt_` and `_cbkt_` respectively.
Likewise, column names containing dots `.` are not supported and are replaced
with `_dot_`." into active voice and state the actor (the system): e.g. "The
connector rejects column or property names containing square brackets `[]` and
replaces `[` with `_obkt_` and `]` with `_cbkt_`. It also replaces dots `.` in
column names with `_dot_`." Keep the replacement tokens `_obkt_`, `_cbkt_`, and
`_dot_` unchanged and ensure the rephrasing appears in the same paragraph
describing naming constraints.
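
For reviewers who want to check the replacement rule against a sample name, the mapping described above can be sketched in Python. This is only an illustration of the documented token substitutions; `sanitize_column_name` is a hypothetical helper, and the actual integration code is not shown in this PR.

```python
def sanitize_column_name(name: str) -> str:
    """Apply the documented substitutions: '[' -> _obkt_, ']' -> _cbkt_, '.' -> _dot_."""
    return (
        name.replace("[", "_obkt_")
        .replace("]", "_cbkt_")
        .replace(".", "_dot_")
    )


# A made-up attribute name that contains all three characters.
print(sanitize_column_name("items[0].price"))
```

The order of the three replacements does not matter here, because none of the replacement tokens contains a bracket or a dot.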

59-59: Consider active voice for clarity.

The passive construction "has now been created" could be rephrased actively.

♻️ Suggested rephrase
-10. The Kinesis data stream has now been created. If you want to change its retention policy (recommended), click on the tab **Configuration**, then click on **Edit** under the **Data retention** section.
+10. AWS creates the Kinesis data stream. To change its retention policy (recommended), click the **Configuration** tab, then click **Edit** under the **Data retention** section.

Based on learnings: In the cloud-docs repository, for Markdown files under docs/, prefer active constructions (subject-verb-object) and clearly state who performs actions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/cluster/integrations/dynamo-cdc.md` at line 59, Replace the passive
sentence "The Kinesis data stream has now been created." with an active-voice
construction that names the actor and uses subject‑verb‑object (for example,
"AWS creates the Kinesis data stream" or "The console creates the Kinesis data
stream"), and update the surrounding sentence so the step reads actively: locate
the sentence "The Kinesis data stream has now been created. If you want to
change its retention policy (recommended), click on the tab **Configuration**,
then click on **Edit** under the **Data retention** section." and rephrase the
first clause to an active form while keeping the guidance about changing
retention policy intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/cluster/integrations/dynamo-cdc.md`:
- Line 57: The markdown line has a typo in the bold formatting for the field
label "Maximum record size" (it uses one closing asterisk instead of two); edit
the line containing "Choose a **Maximum record size* that can hold more than two
times your biggest DynamoDB table items." to use proper bold markup (i.e.,
surround "Maximum record size" with double asterisks on both sides) so it reads
"Choose a **Maximum record size** that can hold more than two times your biggest
DynamoDB table items."
- Around line 108-111: In the "Create the Integration" section replace the
incorrect reference "MongoDB data" with "DynamoDB data": locate the sentence
that reads 'CrateDB Cloud will now sync your MongoDB data based on the selected
settings.' and update it to 'CrateDB Cloud will now sync your DynamoDB data
based on the selected settings.' so the wording correctly reflects the DynamoDB
integration.
- Line 67: The heading sentence currently says "MongoDB CDC integration" but
should reference DynamoDB; update the text fragment "MongoDB CDC integration" in
the docs/cluster/integrations/dynamo-cdc.md file to "DynamoDB CDC integration"
so the line reads "Follow these steps in the CrateDB Cloud Console to set up the
DynamoDB CDC integration:"; ensure any other occurrences of "MongoDB CDC
integration" in this document are similarly corrected to "DynamoDB CDC
integration".
- Line 55: The markdown bold formatting for the field label "Data stream name"
is missing a closing asterisk; update the sentence so the label is properly
wrapped in double asterisks (i.e., change " **Data stream name* " to " **Data
stream name** ") wherever the phrase appears (specifically the step text that
currently reads 'Fill in the **Data stream name*. Please note...') to restore
correct bold rendering.
- Line 1: The document label at the top incorrectly uses
"integrations-mongo-cdc"; change that anchor/label to "integrations-dynamo-cdc"
so the toctree reference to integrations-dynamo-cdc resolves correctly. Locate
the label string "integrations-mongo-cdc" in the file and replace it with
"integrations-dynamo-cdc", ensuring any internal references or cross-references
in the same document are updated to match the new label.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d44da165-a406-4ee8-b9b5-8344b5728dce

📥 Commits

Reviewing files that changed from the base of the PR and between c46fc7d and f6494c2.

📒 Files selected for processing (2)
  • docs/cluster/integrations/dynamo-cdc.md
  • docs/cluster/integrations/index.md

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
docs/cluster/integrations/dynamo-cdc.md (2)

55-55: ⚠️ Potential issue | 🟡 Minor

Fix broken bold markup for “Data stream name”.

The emphasis is malformed (**Data stream name*) and renders incorrectly.

🔧 Proposed fix
-7. Fill in the **Data stream name*. Please note you will need this name later when setting up the integration in CrateDB Cloud. 
+7. Fill in the **Data stream name**. Please note you will need this name later when setting up the integration in CrateDB Cloud.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/cluster/integrations/dynamo-cdc.md` at line 55, Fix the broken Markdown
bolding for the phrase currently written as "**Data stream name*" by replacing
it with properly closed bold markup "**Data stream name**" in the docs text
(look for the string "**Data stream name*") so the emphasis renders correctly.

61-61: ⚠️ Potential issue | 🟡 Minor

Fix broken bold markup in the DynamoDB page title phrase.

**Stream to an Amazon Kinesis data stream* DynamoDB page is missing a closing *.

🔧 Proposed fix
-12. Back to the **Stream to an Amazon Kinesis data stream* DynamoDB page, select the newly created Kinesis Stream from the list.
+12. Back to the **Stream to an Amazon Kinesis data stream** DynamoDB page, select the newly created Kinesis Stream from the list.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/cluster/integrations/dynamo-cdc.md` at line 61, Fix the broken Markdown
bolding by adding the missing asterisk to the phrase "**Stream to an Amazon
Kinesis data stream* DynamoDB page": change it to "**Stream to an Amazon Kinesis
data stream** DynamoDB page" so the entire phrase "Stream to an Amazon Kinesis
data stream" is correctly bolded; update the text occurrence of that phrase in
the document (look for the exact string shown) and verify the Markdown renders
properly.
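
Broken bold markup like the two cases above could be caught before review with a small check. The following Python sketch is a rough heuristic, not a Markdown parser: it flags any line with an odd number of `**` markers, and the function name `lines_with_unbalanced_bold` is made up for this example. It does not skip fenced code blocks and may misjudge lines that mix `*` and `**` emphasis.

```python
from typing import List


def lines_with_unbalanced_bold(markdown: str) -> List[int]:
    """Return 1-based numbers of lines with an odd count of '**' markers."""
    flagged = []
    for line_number, line in enumerate(markdown.splitlines(), start=1):
        if line.count("**") % 2 == 1:
            flagged.append(line_number)
    return flagged


sample = "\n".join([
    "7. Fill in the **Data stream name*. Please note you will need this name later.",
    "8. Choose a **Maximum record size** that can hold your items.",
    "12. Back to the **Stream to an Amazon Kinesis data stream* DynamoDB page.",
])
print(lines_with_unbalanced_bold(sample))
```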
🧹 Nitpick comments (2)
docs/cluster/integrations/dynamo-cdc.md (2)

56-62: Use consistent ordered-list numbering in the step list.

Step 8 is duplicated, which hurts readability/maintainability in source.

♻️ Proposed fix
-8. Choose a **Maximum record size** that can hold more than two times your biggest DynamoDB table items.
-9. Click the button **Create data stream**.
-10. The Kinesis data stream has now been created. If you want to change its retention policy (recommended), click on the tab **Configuration**, then click on **Edit** under the **Data retention** section.
-11. Select the data retention that suits you. At least 7 days is recommended. Please note the data retention period will have an impact on the cost of the Kinesis Data Stream.
-12. Back to the **Stream to an Amazon Kinesis data stream** DynamoDB page, select the newly created Kinesis Stream from the list.
-13. Click on **Turn on stream**.
+9. Choose a **Maximum record size** that can hold more than two times your biggest DynamoDB table items.
+10. Click the button **Create data stream**.
+11. The Kinesis data stream has now been created. If you want to change its retention policy (recommended), click on the tab **Configuration**, then click on **Edit** under the **Data retention** section.
+12. Select the data retention that suits you. At least 7 days is recommended. Please note the data retention period will have an impact on the cost of the Kinesis Data Stream.
+13. Back to the **Stream to an Amazon Kinesis data stream** DynamoDB page, select the newly created Kinesis Stream from the list.
+14. Click on **Turn on stream**.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/cluster/integrations/dynamo-cdc.md` around lines 56 - 62, The ordered
list in docs/cluster/integrations/dynamo-cdc.md contains a duplicated step
number "8." which breaks numbering; update the list so numbering is consecutive
(change the second "8." to "9." and increment the following steps accordingly),
ensuring steps that reference actions like "Create data stream", "Configuration
> Edit > Data retention", and "Turn on stream" retain their relative order and
wording while their numeric prefixes are corrected.
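
A duplicated step number like this one can be detected mechanically. The Python sketch below is a minimal check under a simplifying assumption: it treats the whole input as one ordered list and only looks at lines that start with `<number>.`. The name `find_numbering_breaks` and the sample steps are hypothetical.

```python
import re
from typing import List, Tuple

STEP = re.compile(r"^(\d+)\.\s")


def find_numbering_breaks(markdown: str) -> List[Tuple[int, int, int]]:
    """Return (line_number, found, expected) for each step that breaks the sequence."""
    breaks = []
    expected = None
    for line_number, line in enumerate(markdown.splitlines(), start=1):
        match = STEP.match(line)
        if match is None:
            continue  # not a numbered step
        found = int(match.group(1))
        if expected is not None and found != expected:
            breaks.append((line_number, found, expected))
        expected = found + 1  # continue from the number actually found
    return breaks


# Placeholder step text; only the numeric prefixes matter for the check.
steps = "\n".join([
    "7. First placeholder step",
    "8. Second placeholder step",
    "8. Third placeholder step",
    "9. Fourth placeholder step",
])
print(find_numbering_breaks(steps))
```

Because the expected value continues from the number actually found, a single duplicate is reported once instead of flagging every following step.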

116-118: Prefer active voice for the column-name restriction rule.

This section is passive and can be tightened for guide tone consistency.

✍️ Proposed rewrite
-Column or property names containing square brackets `[]` are not supported and
-are replaced with `_obkt_` and `_cbkt_` respectively. Likewise, column
-names containing dots `.` are not supported and are replaced with `_dot_`.
+CrateDB Cloud does not support column or property names with square brackets `[]`;
+it replaces `[` with `_obkt_` and `]` with `_cbkt_`. It also replaces dots `.`
+in names with `_dot_`.

Based on learnings: “audit content for passive voice … prefer active constructions” and “use a concise, neutral guide tone across Markdown files under docs/”.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/cluster/integrations/dynamo-cdc.md` around lines 116 - 118, Rewrite the
passive sentence about unsupported column/property names into active voice and a
concise guide tone: change "Column or property names containing square brackets
`[]` are not supported and are replaced with `_obkt_` and `_cbkt_` respectively.
Likewise, column names containing dots `.` are not supported and are replaced
with `_dot_`." to an active construction that directly instructs readers (e.g.,
"We replace square brackets `[]` in column/property names with `_obkt_` and
`_cbkt_`, and we replace dots `.` with `_dot_`."), keeping the same tokens
`_obkt_`, `_cbkt_`, and `_dot_` and preserving the exact replacement rules and
examples.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/cluster/integrations/dynamo-cdc.md`:
- Around line 109-112: The trailing horizontal rule ('---') after the "Click
**Create Integration**..." paragraph is being parsed as a setext-style heading
and causing MD003 failures; fix it by either removing the horizontal rule
entirely or separating it from the paragraph with an extra blank line or by
replacing it with an explicit ATX heading or a different separator so the block
boundary no longer looks like a setext underline (locate the paragraph starting
"Click **Create Integration**..." and the subsequent '---' and update
accordingly).
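
The setext issue can be reproduced with a simple heuristic: in CommonMark, a line of dashes directly below paragraph text turns that text into a level-2 heading, while the same line after a blank line is a thematic break. The Python sketch below flags the first case. It is an approximation rather than a full parser (for example, it would also flag the closing line of YAML front matter), and `find_accidental_setext_headings` and the sample paragraph text are invented for illustration.

```python
import re
from typing import List

# Up to three spaces of indent, then only '-' or only '=' characters.
UNDERLINE = re.compile(r"^ {0,3}(-+|=+)\s*$")


def find_accidental_setext_headings(markdown: str) -> List[int]:
    """Return 1-based numbers of underline-like lines that directly follow text."""
    lines = markdown.splitlines()
    hits = []
    for index in range(1, len(lines)):
        previous_has_text = bool(lines[index - 1].strip())
        if UNDERLINE.match(lines[index]) and previous_has_text:
            hits.append(index + 1)
    return hits


without_blank_line = "Some paragraph text.\n---\n"
with_blank_line = "Some paragraph text.\n\n---\n"
print(find_accidental_setext_headings(without_blank_line))
print(find_accidental_setext_headings(with_blank_line))
```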

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 29ee7d22-549a-45c1-a894-4eba30a6ece3

📥 Commits

Reviewing files that changed from the base of the PR and between f6494c2 and 5b72317.

📒 Files selected for processing (1)
  • docs/cluster/integrations/dynamo-cdc.md

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
docs/cluster/integrations/index.md (1)

72-81: Consider improving clarity and consistency in integration type descriptions.

The descriptions could be more consistent and clear. Consider these refinements:

✍️ Suggested improvements
 There are 3 different integration types:
 - **Full load only**: Imports all your data and immediately ends after.
-- **CDC only**: It indefinitely listens to CDC (Change Data Capture) events on the 
-  source and applies them into your CrateDB table. Once it reaches the last CDC event 
-  it waits for new events to come.
+- **CDC only**: Continuously listens for CDC (Change Data Capture) events on the 
+  source and applies them to your CrateDB table. After processing the last available 
+  event, waits for new events.
 - **Full load and CDC**
-  It imports all the data like the type **full load only**, but once that phase finishes
-  it starts processing CDC events. If the source supports it, it will try to read CDC
-  events starting from right when the import phase started. This way any data alteration
-  during the import phase will be picked up and processed.
+  Imports all data like **full load only**, then begins processing CDC events. When 
+  the source supports it, reads CDC events from the moment the import phase started, 
+  ensuring data alterations during import are captured and processed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/cluster/integrations/index.md` around lines 72 - 81, Rewrite the three
integration type descriptions ("Full load only", "CDC only", "Full load and
CDC") to use consistent sentence structure and punctuation: start each with a
short bolded type name followed by a colon, then a single concise sentence
describing behavior; for "CDC only" clarify it continuously listens for and
applies change events and waits for new events when caught up; for "Full load
and CDC" state that it first imports all data like "Full load only" and then
begins processing CDC events, and explicitly note that if supported the CDC
reader will attempt to start from the import start time so changes during import
are captured. Ensure consistent capitalization (sentence case) and parallel
phrasing across all three entries.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/cluster/integrations/index.md`:
- Line 78: Replace the underscore-based bold syntax "__full load only__" with
asterisk-based strong emphasis "**full load only**" in the Markdown content
(look for the exact string "__full load only__" in the integrations index
document) so the bold formatting follows the repository's Markdown style guide.
- Around line 51-57: Rewrite the two passive sentences to active voice and a
neutral guide tone: replace "The data field names will be inferred and made
compatible with CrateDB naming restrictions automatically." with an active
sentence such as "The system infers data field names and adjusts them to meet
CrateDB naming restrictions." Replace "The data field types will be inferred as
new fields are discovered. If you require data to be cast to a specific type,
simply create the CrateDB table with the columns and types you want and the
system will try to cast it to the types at the destination." with an active,
concise version like "The system infers data field types as new fields appear;
to enforce specific types, create the CrateDB table with the desired columns and
types and the system will attempt to cast incoming data to those destination
types." Locate and update the sentences that start with "The data field names
will be inferred..." and "The data field types will be inferred..." to implement
these changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d036a837-4680-4ce4-a4f6-4577d9032456

📥 Commits

Reviewing files that changed from the base of the PR and between 5b72317 and 147c299.

📒 Files selected for processing (1)
  • docs/cluster/integrations/index.md

Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
docs/cluster/integrations/dynamo-cdc.md (2)

107-112: ⚠️ Potential issue | 🟡 Minor

Remove the trailing horizontal rule after the stepper block.

The separator at Line 111 is likely being parsed as a setext heading underline in this block context (MD003 warning). Removing it resolves the lint issue cleanly.

🔧 Proposed fix
 #### Create the Integration
 Click **Start import** to finalize the setup. CrateDB Cloud will now sync
 your DynamoDB data based on the selected settings.
 :::::
----
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/cluster/integrations/dynamo-cdc.md` around lines 107 - 112, Remove the
trailing horizontal rule ("---") that follows the stepper block under the
"Create the Integration" section; specifically delete the lone line containing
the three dashes after the closing stepper block so it no longer parses as a
setext heading underline (this addresses the MD003 lint warning).

54-54: ⚠️ Potential issue | 🟡 Minor

Fix broken bold formatting in step text.

Line 54 has malformed bold markup (**Data stream name*), which breaks rendering.

🔧 Proposed fix
-7. Fill in the **Data stream name*. Please note you will need this name later when setting up the integration in CrateDB Cloud. 
+7. Fill in the **Data stream name**. Please note you will need this name later when setting up the integration in CrateDB Cloud.
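An unclosed bold marker like the one fixed above can be caught before review with a small check. The following is a heuristic sketch, not a Markdown parser and not part of the repository's tooling: the hypothetical helpers `has_unbalanced_bold` and `find_unbalanced_bold` flag any line that contains an odd number of `**` tokens. They do not handle bold spans that cross line breaks or `**` inside code spans.

```python
def has_unbalanced_bold(line: str) -> bool:
    """Return True when the line holds an odd number of '**' tokens."""
    return line.count("**") % 2 == 1


def find_unbalanced_bold(text: str) -> list[int]:
    """Return the 1-based numbers of lines whose bold markers look unbalanced."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if has_unbalanced_bold(line)
    ]


sample = "\n".join(
    [
        "6. Click **Create data stream**.",
        "7. Fill in the **Data stream name*. Please note you will need this name later.",
        "8. Fill in the **Data stream name**. Please note you will need this name later.",
    ]
)
flagged = find_unbalanced_bold(sample)
```

In this sample only the second line is flagged, because its closing marker is a single `*`.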
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/cluster/integrations/dynamo-cdc.md` at line 54, Fix the malformed bold
markup in the step text by changing the broken `**Data stream name*` markup to a
properly closed bold token so the phrase "Data stream name" renders as bold
(e.g., use `**Data stream name**`); locate the step containing the phrase "Data
stream name" and update the markdown there to close the bold formatting
correctly.
🧹 Nitpick comments (2)
docs/cluster/integrations/dynamo-cdc.md (2)

85-87: Align integration mode labels with the canonical naming used in the integrations index.

Lines 85-87 currently use Full Load Only, but the canonical labels in docs/cluster/integrations/index.md:69-81 are Full load only, CDC only, and Full load and CDC. Keeping exact naming consistent across pages avoids user confusion.

♻️ Proposed fix
-   - **Full Load Only**: Imports the data once but doesn’t sync changes.
-   - **Full Load and CDC**: Imports the data and syncs changes in real-time.
-   - **CDC Only**: Syncs only new changes in real-time without importing existing data.
+   - **Full load only**: Imports the data once but doesn’t sync changes.
+   - **Full load and CDC**: Imports the data and syncs changes in real-time.
+   - **CDC only**: Syncs only new changes in real-time without importing existing data.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/cluster/integrations/dynamo-cdc.md` around lines 85 - 87, Update the
integration mode labels to exactly match the canonical names used in the
integrations index: change "Full Load Only" to "Full load only", "Full Load and
CDC" to "Full load and CDC", and "CDC Only" to "CDC only" in the dynamo-cdc.md
list so the three entries read "Full load only", "Full load and CDC", and "CDC
only"; ensure the text elsewhere in this file uses these exact capitalizations
to maintain consistency with docs/cluster/integrations/index.md.

4-11: Prefer active voice and tighter guide tone in the intro.

This block can be shorter and more direct in active voice to match docs style consistency.

✍️ Suggested rewrite
-CrateDB Cloud enables continuous data ingestion from DynamoDB using Change Data
-Capture (CDC), providing real-time synchronization of your data.
+Use CrateDB Cloud to continuously ingest DynamoDB Change Data Capture (CDC) events
+and keep your data synchronized in real time.
@@
-The DynamoDB CDC integration in CrateDB Cloud allows you to keep your data
-synchronized between a DynamoDB table and your CrateDB Cloud cluster
-in real-time.
+The DynamoDB CDC integration keeps data synchronized between a DynamoDB table
+and your CrateDB Cloud cluster in real time.

Based on learnings: prefer active voice and a concise, neutral guide tone for Markdown files under docs/.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/cluster/integrations/dynamo-cdc.md` around lines 4 - 11, The intro uses
passive phrasing and is wordy; rewrite the opening paragraph and the first
sentence under "## Key Concepts" to use active voice and a tighter, neutral
guide tone (e.g., "CrateDB Cloud ingests changes from DynamoDB via CDC to
synchronize data in real time.") — update the lines that currently describe
"DynamoDB CDC integration" and the initial paragraph so they are shorter,
direct, and match docs style consistency.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a3a6b754-fddd-4ff1-a120-868ed743e0d1

📥 Commits

Reviewing files that changed from the base of the PR and between 147c299 and 9a3244a.

📒 Files selected for processing (1)
  • docs/cluster/integrations/dynamo-cdc.md

@amotl
Member

amotl commented Mar 18, 2026

Thank you! Maybe also mention DynamoDB in the topmost section on the Import page?

@juanpardo
Contributor Author

> Thank you! Maybe also mention DynamoDB in the topmost section on the Import page?

Good point, thanks!

Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/cluster/import.md (1)

4-9: ⚠️ Potential issue | 🟡 Minor

Add DynamoDB to the sources list.

Based on the PR discussion, amotl suggested mentioning DynamoDB in the topmost section of the Import page, and you acknowledged this feedback. However, DynamoDB has not been added to the sources list here (lines 4-9), only to the formats list at line 16. For consistency with MongoDB, which appears as both "MongoDB database" (line 9, sources) and "MongoDB collection" (line 15, formats), consider adding "DynamoDB database" or similar to the sources list.

📝 Proposed addition to sources list
 You can import data into your CrateDB directly from various sources, including:
 - Local files
 - URLs
 - AWS S3 buckets
 - Azure storage
 - MongoDB database
+- DynamoDB database
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/cluster/import.md` around lines 4 - 9, Add "DynamoDB" to the import
sources list to mirror the formats section: update the top sources bullet list
(the block that currently contains "Local files", "URLs", "AWS S3 buckets",
"Azure storage", "MongoDB database") to include "DynamoDB database" (or similar
phrasing) so DynamoDB appears both in sources and in the formats section
alongside "MongoDB database"/"MongoDB collection".
🧹 Nitpick comments (1)
docs/cluster/import.md (1)

16-16: Consider adding specificity for consistency.

Line 15 uses "MongoDB collection" rather than just "MongoDB". For consistency and clarity, consider using "DynamoDB table" instead of just "DynamoDB" to indicate the specific entity being imported.

📝 Suggested refinement
 - MongoDB collection
-- DynamoDB
+- DynamoDB table
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/cluster/import.md` at line 16, Update the wording so it matches the
specificity used for MongoDB by changing the lone "DynamoDB" token to "DynamoDB
table" wherever the short form appears (e.g., the line currently containing
"DynamoDB"); ensure the phrasing mirrors "MongoDB collection" for consistency in
the import documentation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 4ab25ca9-686e-4423-bd60-f9112dc45931

📥 Commits

Reviewing files that changed from the base of the PR and between 9a3244a and 78c5589.

📒 Files selected for processing (4)
  • docs/cluster/import.md
  • docs/cluster/integrations/dynamo-cdc.md
  • docs/cluster/integrations/index.md
  • docs/cluster/integrations/mongo-cdc.md
✅ Files skipped from review due to trivial changes (1)
  • docs/cluster/integrations/mongo-cdc.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • docs/cluster/integrations/index.md
  • docs/cluster/integrations/dynamo-cdc.md

@juanpardo juanpardo requested a review from plaharanne March 18, 2026 10:03
@juanpardo juanpardo merged commit 2e25b71 into main Mar 20, 2026
3 of 7 checks passed
@juanpardo juanpardo deleted the jp/dynamodb-integration branch March 20, 2026 08:28