Skip to content

ByteBufBsonDocument and RawBsonDocument simplifications#1902

Open
rozza wants to merge 1 commit intomongodb:ByteBuffrom
rozza:ByteBufBsonDocumentClone
Open

ByteBufBsonDocument and RawBsonDocument simplifications#1902
rozza wants to merge 1 commit intomongodb:ByteBuffrom
rozza:ByteBufBsonDocumentClone

Conversation

@rozza
Copy link
Member

@rozza rozza commented Feb 26, 2026

Rationale

ByteBufBsonDocument#clone used to return a RawBsonDocument. The recent changes returned a normal BsonDocument, which is potentially expensive depending on its usage.

The ByteBufBsonDocument changes also added complex iterator logic, when RawBsonDocument deferred to BsonDocument iterators. As iteration is essentially a hydrating mechanism, there is opportunity for improvements for both implementations. By changing the RawBsonDocument iterators to be more efficient, ByteBufBsonDocument can now utilize these efficiency gains by proxy, relying on the cachedDocument iterators.

This change both reduces the complexity of ByteBufBsonDocument and relies on an improved RawBsonDocument implementation.

Summary of changes

  • ByteBufBsonDocument:

    • Simplify by returning RawBsonDocument from toBsonDocument, avoiding full BSON deserialization. When there are no sequence fields, the body bytes are cloned directly. When sequence fields exist, BsonBinaryWriter.pipe() merges the body with sequence arrays efficiently.
    • Use toBsonDocument for iterators. This eliminates the need for custom iterators (IteratorMode, CombinedIterator, createBodyIterator, and sequence iterators) since entrySet/values/keySet now delegate to the cached RawBsonDocument.
  • RawBsonDocument:

    • Renamed toBaseBsonDocument to override the default toBsonDocument implementation.
    • Implemented the iterators so that they don't need to fully convert the document to a BsonDocument.
  • Tests:

    • Updated ByteBufBsonDocumentTest iteration tests.
    • Updated ByteBufBsonArrayTest#fromValues as entrySet now returns RawBsonDocument instances.

JAVA-6010

### Rationale

`ByteBufBsonDocument#clone` used to return a `RawBsonDocument`.
The recent changes returned a normal `BsonDocument`, which is potentially
expensive depending on its usage.

The `ByteBufBsonDocument` changes also added complex iterator logic, when
`RawBsonDocument` deferred to `BsonDocument` iterators. As iteration is
essentially a hydrating mechanism, there is opportunity for improvements
for both implementations. By changing the `RawBsonDocument` iterators to
be more efficient, `ByteBufBsonDocument` can now utilize these efficiency
gains by proxy, relying on the `cachedDocument` iterators.

This change both reduces the complexity of `ByteBufBsonDocument` and relies
on an improved `RawBsonDocument` implementation.

### Summary of changes

* **`ByteBufBsonDocument`**:
  * Simplify by returning `RawBsonDocument` from `toBsonDocument`, avoiding
    full BSON deserialization. When there are no sequence fields, the body
    bytes are cloned directly. When sequence fields exist,
    `BsonBinaryWriter.pipe()` merges the body with sequence arrays efficiently.
  * Use `toBsonDocument` for iterators. This eliminates the need for custom
    iterators (`IteratorMode`, `CombinedIterator`, `createBodyIterator`, and
    sequence iterators) since `entrySet`/`values`/`keySet` now delegate to the
    cached `RawBsonDocument`.

* **`RawBsonDocument`**:
  * Renamed `toBaseBsonDocument` to override the default `toBsonDocument`
    implementation.
  * Implemented the iterators so that they don't need to fully convert the
    document to a `BsonDocument`.

* **Tests**:
  * Updated `ByteBufBsonDocumentTest` iteration tests.
  * Updated `ByteBufBsonArrayTest#fromValues` as `entrySet` now returns
    `RawBsonDocument` instances.

JAVA-6010

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@rozza rozza requested a review from strogiyotec February 26, 2026 13:49
@rozza rozza requested a review from a team as a code owner February 26, 2026 13:49
@Override
public Collection<BsonValue> values() {
return toBaseBsonDocument().values();
List<BsonValue> values = new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand the login , the entrySet returns a SET though if I check previous implementation without a change it can still return duplicates
Here is the test case using latest main without this change

    @Test
    void testLatestPR() {
        BsonDocument doc = new BsonDocument()
                .append("a", new BsonInt32(1))
                .append("b", new BsonInt32(1))
                .append("c", new BsonInt32(1));
        RawBsonDocument rawDoc = new RawBsonDocument(doc, new BsonDocumentCodec());

        // BsonDocument preserves all 3 values
        assertEquals(3, doc.values().size());

        // RawBsonDocument should too — but LinkedHashSet deduplicates them to 1
        assertEquals(3, rawDoc.values().size());
    }

This test case is passing but fails in this branch because LinkedHashSet removes duplicate values, I believe we should preserve an already broken behavior and return duplicates

@Override
public Set<Entry<String, BsonValue>> entrySet() {
return toBaseBsonDocument().entrySet();
List<Entry<String, BsonValue>> entries = new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the javadoc for RawBsonDocument say it's immutable, does it make sense to store entrySet/values in memory instead of calculating them all from scratch ? Something like what java.lang.String does with a hashcode, keep hashcode as class field but calculates it only once during the first hashCode call

Not mandatory

cachedDocument = new RawBsonDocument(clonedBytes);
} else {
// With sequence fields: pipe body + extra elements
BasicOutputBuffer buffer = new BasicOutputBuffer();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this buffer is never closed, can we wrap it with try with resources

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked , the close just sets buffer to null, so not mandatory but still good to have in case we later change the class and close() will be more complicated than that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants