Haystack integration by staru09 · Pull Request #89 · usemoss/moss

staru09 · 2026-03-29T17:01:02Z

Pull Request Checklist

Please ensure that your PR meets the following requirements:

I have read the CONTRIBUTING guide.
I have updated the documentation (if applicable).
My code follows the style guidelines of this project.
I have performed a self-review of my own code.
I have added tests that prove my fix is effective or that my feature works.
New and existing unit tests pass locally with my changes.

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context.

Fixes #79

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

vercel · 2026-03-29T17:01:07Z

@staru09 is attempting to deploy a commit to the Moss Team Team on Vercel.

A member of the Team first needs to authorize it.

CLAassistant · 2026-03-29T17:01:10Z

All committers have signed the CLA.

yatharthk2 · 2026-03-30T16:48:04Z

@HarshaNalluru, would you please be able to review?

Copilot

Pull request overview

Adds a Haystack cookbook integration example so Moss can be used as a lightweight document store + retriever in Haystack RAG-style pipelines.

Changes:

Introduces a MossDocumentStore + MossRetriever helper implementation for Haystack usage.
Adds runnable example and a live-platform test script for end-to-end verification.
Documents installation, setup, and usage in a new cookbook README.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.

File	Description
examples/cookbook/haystack/moss_haystack.py	Implements Haystack-facing document store/retriever wrappers plus metadata + filter conversion helpers
examples/cookbook/haystack/example_usage.py	Runnable demo that creates an index, writes sample docs from `faqs.json`, queries, and deletes the index
examples/cookbook/haystack/test_live.py	Live test script that exercises write/count/upsert/filter/query/delete/cleanup against the Moss platform
examples/cookbook/haystack/README.md	Setup and usage documentation for the new cookbook example

Copilot · 2026-04-01T15:32:09Z

+    """Moss document store for Haystack.
+
+    Provides write_documents, count_documents, delete_documents,
+    and filter_documents for use in Haystack pipelines.


MossDocumentStore is presented as a Haystack document store, but it does not implement/extend Haystack’s DocumentStore type (it’s a plain class). This makes it incompatible with any Haystack components/utilities that expect a DocumentStore instance and undermines the “document store integration” deliverable. Consider subclassing/implementing haystack.document_stores.types.DocumentStore (and required methods) or clearly renaming/documenting this as a minimal helper not meant to satisfy the DocumentStore contract.

Suggested change

-    """Moss document store for Haystack.
-    Provides write_documents, count_documents, delete_documents,
-    and filter_documents for use in Haystack pipelines.
+    """Minimal Moss-backed helper for Haystack-style document operations.
+    This is **not** a full implementation of ``haystack.document_stores.types.DocumentStore``;
+    it only provides a small subset of common methods (such as ``write_documents``,
+    ``count_documents``, ``delete_documents``, and ``filter_documents``) for use in
+    custom Haystack components or pipelines that interact with Moss directly.

Fixed this one

Copilot · 2026-04-01T15:32:10Z

+def _serialize_metadata(meta: Optional[dict]) -> Optional[dict]:
+    """Convert arbitrary-typed metadata to Moss string-only metadata."""
+    if not meta:
+        return None
+    result = {}
+    needs_json = False
+    for k, v in meta.items():
+        if isinstance(v, str):
+            result[k] = v
+        else:
+            result[k] = json.dumps(v)
+            needs_json = True
+    if needs_json:
+        result["__moss_json_meta__"] = "true"
+    return result


_serialize_metadata injects a reserved key __moss_json_meta__ into user metadata when any value is JSON-encoded. If the input Document.meta already contains this key, the value will be silently overwritten, losing user data and potentially breaking deserialization. Use a collision-safe approach (e.g., namespace under a single reserved sub-dict, or detect/escape pre-existing keys) to avoid overwriting user-provided metadata.

It would be an edge case but covered this one
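
For reference, one possible shape of a collision guard is sketched below. It assumes the reserved key is `__moss_json_meta__`, as in the reviewed snippet, and rejects user metadata that already contains it instead of overwriting it. The function name, the `RESERVED_KEY` constant, and the choice of `ValueError` are illustrative and are not taken from the PR's actual fix.

```python
import json
from typing import Optional

RESERVED_KEY = "__moss_json_meta__"  # marker key used by the reviewed snippet


def serialize_metadata(meta: Optional[dict]) -> Optional[dict]:
    """Convert metadata to string-only values, refusing reserved-key collisions."""
    if not meta:
        return None
    if RESERVED_KEY in meta:
        # Hypothetical guard: fail loudly rather than silently overwrite user data.
        raise ValueError(f"metadata key {RESERVED_KEY!r} is reserved")
    result = {}
    needs_json = False
    for key, value in meta.items():
        if isinstance(value, str):
            result[key] = value
        else:
            # Non-string values are JSON-encoded so they fit string-only metadata.
            result[key] = json.dumps(value)
            needs_json = True
    if needs_json:
        result[RESERVED_KEY] = "true"
    return result
```

Raising is the simplest option; escaping or namespacing the user's key would preserve the data instead, at the cost of a more involved deserializer.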

Copilot · 2026-04-01T15:32:10Z

+        if policy == DuplicatePolicy.SKIP:
+            existing = _run_async(self.client.get_docs(self.index_name))
+            existing_ids = {doc.id for doc in existing}
+            docs = [d for d in docs if d.id not in existing_ids]


write_documents(..., policy=DuplicatePolicy.SKIP) calls get_docs() with no options and builds a full existing_ids set. This forces a full index scan/download of all documents just to dedupe IDs, which can be very expensive for non-trivial indexes. If Moss supports fetching by IDs or listing only IDs, use that; otherwise consider documenting that SKIP is O(N) or dropping/limiting SKIP support in this example.

fixed this one

Copilot · 2026-04-01T15:32:10Z

+    def filter_documents(self, filters: Optional[dict] = None) -> list[Document]:
+        """Retrieve documents, optionally filtered.
+
+        Without filters: fetches all documents.
+        With filters: loads index and runs a broad query with filters.
+        """
+        if filters is None:
+            docs = _run_async(self.client.get_docs(self.index_name))
+            return [_moss_doc_to_haystack(doc) for doc in docs]
+
+        self._ensure_loaded()
+        moss_filters = _convert_haystack_filters(filters)
+        results = _run_async(
+            self.client.query(
+                self.index_name,
+                "",
+                QueryOptions(top_k=1000, alpha=self.alpha, filter=moss_filters),
+            )
+        )
+        return [_moss_doc_to_haystack(doc, score=doc.score) for doc in results.docs]


filter_documents(filters=...) uses query() with an empty query string and a hard-coded top_k=1000. This cannot guarantee returning all documents that match the filter (anything beyond the top_k cutoff is dropped, and empty-query semantics may be undefined), so callers may get incomplete results. Consider implementing filtered retrieval by fetching all docs and filtering client-side, adding an explicit top_k/limit parameter, or raising NotImplementedError for filtered listing until a scan-by-filter API exists.

changed to top_k
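
The "fetch all docs and filter client-side" alternative from the comment could look roughly like the sketch below. It assumes filters follow the Haystack 2.x `field` / `operator` / `value` dictionary shape and handles only `==` and `AND`; `Doc` is a stand-in dataclass, not the Haystack or Moss document class, and the list of documents is assumed to have been fetched already.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in document type for this sketch only."""
    id: str
    content: str
    meta: dict = field(default_factory=dict)


def matches(doc: Doc, filters: dict) -> bool:
    """Evaluate a small subset of Haystack-style filters: '==' and 'AND'."""
    op = filters["operator"]
    if op == "AND":
        return all(matches(doc, cond) for cond in filters["conditions"])
    if op == "==":
        # Fields are addressed as "meta.<key>" in this sketch.
        key = filters["field"].removeprefix("meta.")
        return doc.meta.get(key) == filters["value"]
    raise NotImplementedError(f"unsupported operator: {op}")


def filter_client_side(all_docs: list[Doc], filters: dict) -> list[Doc]:
    """Return every matching document, with no top_k cutoff."""
    return [d for d in all_docs if matches(d, filters)]
```

This trades completeness for cost: every call still downloads the full index, so it only suits small example indexes.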

Copilot · 2026-04-01T15:32:11Z

+
+| File | Description |
+|------|-------------|
+| `moss_haystack.py` | MossDocumentStore + MossRetriever — copy this into your project |
+| `example_usage.py` | Runnable demo with FAQ data |


The issue/PR description indicates a Haystack integration under integrations/haystack/ (and a MossDocumentStore/MossRetriever intended for reuse), but this PR only adds a cookbook example that instructs users to copy moss_haystack.py into their project. Either move the integration to the expected integrations/haystack/ location (and package it), or update the PR/issue linkage/docs so it’s clear this is an example-only snippet rather than a shipped integration.

updated with verbose description

Copilot · 2026-04-01T15:32:11Z

+def report(name, success, detail=""):
+    global passed, failed
+    status = "PASS" if success else "FAIL"
+    if success:
+        passed += 1
+    else:
+        failed += 1
+    msg = f"  [{status}] {name}"
+    if detail:
+        msg += f" -- {detail}"
+    print(msg)
+
+
+def run_tests():
+    if not PROJECT_ID or not PROJECT_KEY:
+        print("ERROR: MOSS_PROJECT_ID and MOSS_PROJECT_KEY must be set.")
+        sys.exit(1)


test_live.py is a standalone script with manual PASS/FAIL reporting and hard sys.exit usage, so it won’t be discovered/executed by the repo’s existing Python test runners (CI currently runs unittest for the langchain cookbook). If this PR is meant to satisfy the checklist item “added tests”, consider converting these checks into unittest/pytest tests that can be run in CI and auto-skip when MOSS_PROJECT_ID/KEY are not set.
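
A minimal sketch of the auto-skipping `unittest` layout suggested here, assuming the same two environment variables as the script. The class name and the placeholder test body are hypothetical; real tests would exercise the write/count/query steps from `test_live.py`.

```python
import os
import unittest

PROJECT_ID = os.getenv("MOSS_PROJECT_ID")
PROJECT_KEY = os.getenv("MOSS_PROJECT_KEY")


@unittest.skipUnless(
    PROJECT_ID and PROJECT_KEY,
    "MOSS_PROJECT_ID and MOSS_PROJECT_KEY must be set for live tests",
)
class LiveHaystackTest(unittest.TestCase):
    """Live checks that are skipped, not failed, when credentials are missing."""

    def test_credentials_present(self):
        # Placeholder assertion; the live write/count/query checks would go here.
        self.assertTrue(PROJECT_ID)
        self.assertTrue(PROJECT_KEY)
```

Run with `python -m unittest` from the example directory; without credentials the runner reports the test as skipped instead of exiting with an error.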

Copilot · 2026-04-01T15:32:11Z

+PROJECT_ID = os.getenv("MOSS_PROJECT_ID")
+PROJECT_KEY = os.getenv("MOSS_PROJECT_KEY")
+TEST_INDEX = "haystack-live-test"
+


TEST_INDEX is a fixed name and the script deletes the index during cleanup. If a user already has an index with this name (or multiple developers run the script concurrently), this can overwrite/delete real data. Prefer generating a unique index name per run (timestamp/UUID) and printing it for debugging, similar to examples/python/comprehensive_sample.py’s timestamped index naming.
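
A per-run name could be generated along these lines. This is a sketch: the helper name is hypothetical, and it assumes Moss accepts index names made of lowercase letters, digits, and hyphens at this length, which the fixed name above suggests but does not confirm.

```python
import uuid
from datetime import datetime, timezone


def make_test_index_name(prefix: str = "haystack-live-test") -> str:
    """Build a per-run index name from a UTC timestamp and a short random suffix."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    suffix = uuid.uuid4().hex[:8]
    return f"{prefix}-{stamp}-{suffix}"


TEST_INDEX = make_test_index_name()
# Printing the name makes it possible to find and clean up a leftover index.
print(f"Using test index: {TEST_INDEX}")
```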

Copilot · 2026-04-01T15:32:12Z

+INDEX_NAME = "haystack-faq-demo"
+


INDEX_NAME is hard-coded and the script unconditionally deletes that index at the end. If the user already has an index with the same name, running this example will delete it. Consider generating a unique index name per run (timestamp/UUID) or adding a clear guard/confirmation around deletion.

Can we ignore this since it's just an example index name?

yes please ignore

yatharthk2 · 2026-04-01T23:11:23Z

Is this ready for review? Were the Copilot comments resolved?

staru09 · 2026-04-02T07:50:55Z

Is this ready for review? Were the Copilot comments resolved?

yes please review now

yatharthk2 · 2026-04-06T19:17:03Z

+    # Load FAQ data
+    faqs_path = os.path.join(
+        os.path.dirname(__file__), "..", "..", "python", "faqs.json"
+    )


This path won't work.

It seems to work on my local machine; is it throwing an error on yours?

yatharthk2 · 2026-04-06T22:18:23Z

+    store.load_index()
+
+    # Create retriever and search
+    retriever = MossRetriever(document_store=store, top_k=3)


This looks complicated. Can we do MossRetriever.load_index and MossRetriever.run?

Keep the implementation as simple as possible.

yatharthk2 · 2026-04-06T22:29:42Z

+    print("\n--- Retrieval: 'payment methods' ---")
+    result = retriever.run(query="What payment methods do you accept?")
+    for doc in result["documents"]:
+        print(f"  [{doc.score:.2f}] {doc.content[:100]}...")


What is the role of Haystack here? How is it adding value?

yatharthk2 · 2026-04-06T23:04:50Z

+# --- Filter Conversion (for MossRetriever server-side queries) ---
+
+
+def _convert_haystack_filters(filters: dict) -> dict:


Let's remove the filters for now; they may be too complicated at this stage.

devin-ai-integration

Devin Review found 2 new potential issues.

devin-ai-integration · 2026-04-08T08:49:23Z

+        options = (
+            MutationOptions(upsert=True)
+            if policy == DuplicatePolicy.OVERWRITE
+            else None
+        )
+        _run_async(self.client.add_docs(self.index_name, docs, options))


🚩 DuplicatePolicy.NONE behavior depends on Moss backend

When policy is DuplicatePolicy.NONE (which per Haystack's contract should raise on duplicate document IDs), the code at lines 184-189 sets options = None (not OVERWRITE, not SKIP) and calls add_docs without upsert. Whether this correctly raises an error on duplicates depends entirely on the Moss backend's default behavior for add_docs without mutation options. If the backend silently ignores or overwrites duplicates, this would violate the Haystack DocumentStore protocol. Not flagged as a bug because the Moss SDK's default behavior is not determinable from this codebase alone.

devin-ai-integration

Devin Review found 3 new potential issues.

devin-ai-integration · 2026-04-08T09:05:29Z

🚩 Live tests all skip without credentials — no mock-based unit tests

Unlike the langchain cookbook which has mock-based unit tests in examples/cookbook/langchain/test_integration.py that run without credentials, the haystack cookbook only has test_live.py with live integration tests that are skipped when MOSS_PROJECT_ID/MOSS_PROJECT_KEY are not set. This means CI without credentials would run zero tests for this code. The metadata serialization round-trip logic in particular would benefit from unit tests with mocks, especially given the edge cases in sentinel handling. CONTRIBUTING.md requires 'add tests' for new code — live tests satisfy the letter of that rule, but mock tests would provide better CI coverage.

devin-ai-integration

Devin Review found 1 new potential issue.

devin-ai-integration · 2026-04-08T15:03:43Z

+        if isinstance(v, str):
+            result[k] = v


🟡 Metadata round-trip corrupts string values that start with the typed prefix

_serialize_metadata stores plain strings as-is (line 47), but _deserialize_metadata (line 64) treats any string starting with __moss_typed__: as a JSON-encoded non-string value. If a user stores a string metadata value like "__moss_typed__:42", on serialization it's stored unchanged, but on deserialization the prefix is stripped and json.loads("42") returns the integer 42 — silently corrupting the value from str to int. The fix is to also encode string values that happen to start with the prefix, e.g., wrapping them with json.dumps so they survive the round-trip.

Suggested change

-        if isinstance(v, str):
-            result[k] = v
+        if isinstance(v, str) and not v.startswith(_MOSS_TYPED_PREFIX):
+            result[k] = v

it's an edge case, can ignore this IMO
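
To make the edge case concrete, here is a small per-value sketch of the round trip with the suggested condition applied. It assumes the prefix is the literal string `__moss_typed__:` named in the comment; `serialize_value` and `deserialize_value` are hypothetical helpers, not the functions in `moss_haystack.py`.

```python
import json

_MOSS_TYPED_PREFIX = "__moss_typed__:"  # assumed value, taken from the review comment


def serialize_value(value) -> str:
    """Store plain strings as-is; JSON-encode everything else behind the prefix."""
    if isinstance(value, str) and not value.startswith(_MOSS_TYPED_PREFIX):
        return value
    # Non-strings, and strings that collide with the prefix, are both encoded.
    return _MOSS_TYPED_PREFIX + json.dumps(value)


def deserialize_value(stored: str):
    """Invert serialize_value."""
    if stored.startswith(_MOSS_TYPED_PREFIX):
        return json.loads(stored[len(_MOSS_TYPED_PREFIX):])
    return stored
```

With this condition, the string `"__moss_typed__:42"` is stored JSON-quoted behind the prefix and comes back as the same string, while the integer `42` still comes back as an integer.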

yatharthk2 · 2026-04-13T06:30:39Z

+version = 1
+revision = 1


Why was uv.lock changed?

I think it was pushed by mistake
removed it

yatharthk2 · 2026-04-13T06:32:07Z

@@ -0,0 +1,140 @@
+# Haystack + Moss Cookbook Example
+
+Use [Moss](https://moss.dev) as a document store and retriever in [Haystack](https://haystack.deepset.ai/) RAG pipelines. Moss provides sub-10ms semantic search, Haystack orchestrates the retrieval-to-generation pipeline.


Can you please say "Use Moss as realtime semantic search in ...."?

yatharthk2 · 2026-04-13T06:33:05Z

+## Installation
+
+```bash
+pip install haystack-ai moss python-dotenv


Can you please add a pyproject.toml?

minor fix

devin-ai-integration · 2026-04-16T21:00:32Z

+        self._index_loaded = False
+
+    def filter_documents(self, filters: Optional[dict] = None) -> list[Document]:
+        raise NotImplementedError("filter_documents is not supported.")


MossDocumentStore subclasses haystack.document_stores.types.DocumentStore, which requires filter_documents as part of the protocol. Raising NotImplementedError here breaks that contract — any Haystack component or utility that probes the store (e.g. DocumentWriter with duplicate detection, evaluation helpers, writers that call filter_documents(filters=None) to enumerate existing docs) will crash.

Pick one of:

Implement it — at minimum, pass through to client.get_docs(...) when filters is None, and raise a clearer error only when non-trivial filters are passed. Supporting Moss metadata filters ($eq, $and, $in) via QueryOptions.filter is a natural follow-up.

Stop subclassing DocumentStore — downgrade to a plain helper class (as the earlier Copilot review comment originally suggested) and update the README/docstring to say this is not a full DocumentStore implementation.

Having it both ways — declaring the interface but refusing to honor it — is the worst outcome because it fails only at pipeline-run time with a confusing stack trace.
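
The first option could be sketched as follows. `FakeClient` and `StoreSketch` are stand-ins invented for illustration: the real Moss client is asynchronous and is called through `_run_async` in the PR, and the real store would convert results to Haystack `Document` objects.

```python
from typing import Optional


class FakeClient:
    """Stand-in for the Moss client; only illustrates the call shape."""

    def __init__(self, docs: list):
        self._docs = docs

    def get_docs(self, index_name: str) -> list:
        return list(self._docs)


class StoreSketch:
    """Hypothetical store showing the filters=None pass-through."""

    def __init__(self, client, index_name: str):
        self.client = client
        self.index_name = index_name

    def filter_documents(self, filters: Optional[dict] = None) -> list:
        if filters:
            # Only non-trivial filters are rejected, with an explicit message.
            raise NotImplementedError(
                "metadata filters are not supported yet; "
                "pass filters=None to list all documents"
            )
        return self.client.get_docs(self.index_name)
```

Callers that only enumerate documents with `filters=None` keep working, and unsupported filters fail with a message that says what is missing.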

staru09 requested review from CoderOMaster, HarshaNalluru, r4ghu and yatharthk2 as code owners March 29, 2026 17:01

yatharthk2 requested a review from Copilot April 1, 2026 15:26

Copilot started reviewing on behalf of yatharthk2 April 1, 2026 15:26

Copilot AI reviewed Apr 1, 2026

staru09 force-pushed the haystack_integration branch from 628291f to cbcef90 Compare April 2, 2026 07:50

yatharthk2 requested changes Apr 6, 2026

staru09 force-pushed the haystack_integration branch from cbcef90 to 64de14e Compare April 7, 2026 10:15

staru09 force-pushed the haystack_integration branch from 64de14e to 5b8ce80 Compare April 8, 2026 08:40

devin-ai-integration bot reviewed Apr 8, 2026

staru09 force-pushed the haystack_integration branch from 362920b to 2abff20 Compare April 8, 2026 14:59

devin-ai-integration bot reviewed Apr 8, 2026

staru09 force-pushed the haystack_integration branch from a69858b to 036fb11 Compare April 8, 2026 16:03

yatharthk2 requested changes Apr 13, 2026

staru09 added 4 commits April 13, 2026 16:06

added haystack_integration

913216b

minor fix

better example added

034df7a

minor fix

aaf7b2a

update readme

bad8119

staru09 force-pushed the haystack_integration branch from a5b176b to bad8119 Compare April 13, 2026 10:42

devin-ai-integration bot reviewed Apr 16, 2026
