Skip to content

feat(migration): migrate lb4-llm-chat-extension from LangGraph to Mastra [WIP]#20

Draft
rohit-sourcefuse wants to merge 5 commits intomainfrom
feat/mastra
Draft

feat(migration): migrate lb4-llm-chat-extension from LangGraph to Mastra [WIP]#20
rohit-sourcefuse wants to merge 5 commits intomainfrom
feat/mastra

Conversation

@rohit-sourcefuse
Copy link
Copy Markdown

@rohit-sourcefuse rohit-sourcefuse commented May 4, 2026

Summary

Migration of loopback4-llm-chat-extension from LangGraph v2 (LangChain-based) to Mastra v3 (Vercel AI SDK-based).

This replaces the LangGraph execution engine and all LangChain provider adapters with Mastra workflows/steps and the Vercel AI SDK, while preserving the existing public API surface (POST /reply, SSE event protocol, LB4 bindings).

TDD: Technical Design Document — Mastra Migration


Type of change

  • Breaking change — underlying AI runtime replaced (LangGraph → Mastra)
  • Intermediate change — work in progress, not ready for merge

What's in this PR

Dependencies (package.json)

  • Removed: entire @langchain/* family — @langchain/core, @langchain/community, @langchain/langgraph, langchain, @langchain/anthropic, @langchain/aws, @langchain/cerebras, @langchain/google-genai, @langchain/groq, @langchain/ollama, @langchain/openai, @langfuse/langchain
  • Added: ai v6 (Vercel AI SDK core runtime) + @ai-sdk/amazon-bedrock, @ai-sdk/anthropic, @ai-sdk/cerebras, @ai-sdk/google, @ai-sdk/groq, @ai-sdk/openai, ollama-ai-provider
  • Version bumped: 2.0.02.1.0

Test layer — migrated

  • Deleted: 12 LangGraph node test files in src/__tests__/db-query/unit/nodes/ (~3,000 lines) — CheckCacheNode, CheckPermissionsNode, ClassifyChangeNode, FailedNode, FixQueryNode, GetColumnsNode, GetTablesNode, IsImprovementNode, SaveDataSetNode, SemanticValidatorNode, SqlGenerationNode, SyntacticValidatorNode
  • Deleted: 2 acceptance tests for DbQueryGraph and GetTablesNode (LangGraph compile/invoke pattern)
  • Added: 10 Mastra step unit tests in src/__tests__/db-query/unit/mastra-steps/ — one file per step with pure-function (state, context, deps) signatures
  • Added: db-query.workflow.integration.ts — 443-line end-to-end integration test covering 7 scenarios: happy path, retry loop, cache hit, template hit, max-attempts guard, condition routing
  • Added: src/__tests__/fixtures/fake-ai-models.ts — shared AI SDK fake models (createFakeLanguageModel, createFakeStreamingLanguageModel, createFakeEmbeddingModel) implementing LanguageModelV3/EmbeddingModelV2 interfaces, replacing sinon stubs

Source implementation — in progress

Migration of production source files (src/graphs/, src/components/db-query/nodes/, src/services/, src/transports/) is ongoing. Current phases:

  • ✅ Phase 0–1: Project scaffolding, dependency swap, test infrastructure
  • ✅ Phase 2: ChatGraph migration
  • 🔄 Phase 3: DbQuery workflow — in progress

What's NOT done yet

  • DbQuery Mastra workflow source implementation (src/components/db-query/)
  • Resolve merge conflicts with main (recent changes: visualizer bar prompt fix, CVE patch)
  • SSE transport compatibility with Mastra stream output
  • LangFuse observability adapter for Mastra (replaces @langfuse/langchain)
  • npm test passing end-to-end
  • Decision on Option A vs Option B for generate-query sub-graph (see TDD §14.1)

Architecture decisions (see TDD for full context)

Decision Approach
Tool execution Mastra Agent with LB4-registered IMastraTool implementations
Session/context MastraMemory replaces LangGraph thread state
Token counting stream.usage from Vercel AI SDK (post-stream)
Intermediate events onStepFinish callback → shared eventQueue drained by SSE generator
Permission gates Per-tool execute() checks (no mid-flow orchestration hook)
Extension pattern Subclass MastraChatOrchestrator OR register additional IMastraTool

How to test (once complete)

npm install
npm run build
npm test

Regression baseline: run lb4-llm-sandbox with lb4-llm-chat-component swapped to this branch — all 7 tabs must behave identically to the LangGraph v2 build.


Notes for reviewers

  • Conflicts: branch diverged from main during active development — needs rebase before merge
  • Commit messages: migration commits will be squashed and rewritten before final merge
  • Breaking surface: public API unchanged — POST /reply, SSE event types, all AiIntegrationBindings.* keys preserved

@rohit-sourcefuse rohit-sourcefuse requested a review from Copilot May 4, 2026 09:59
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

@rohit-sourcefuse rohit-sourcefuse changed the title Feat/mastra feat(migration): migrate lb4-llm-chat-extension from LangGraph to Mastra [WIP] May 4, 2026
@rohit-sourcefuse
Copy link
Copy Markdown
Author

Review — WIP Status Assessment (2026-05-04)

What's confirmed working in this diff

Dependency swap is complete and correct.
`@langchain/*` family fully removed, Vercel AI SDK in. The `package.json` change is clean — no orphaned LangChain peer deps visible. Version bump to `2.1.0` is appropriate for a breaking runtime change.

Test infrastructure is solid.
The new `fake-ai-models.ts` fixture properly implements `LanguageModelV3` / `EmbeddingModelV2` interfaces from the AI SDK rather than sinon-stubbing LangChain internals — this is the right approach and will make tests stable across provider changes.

Step unit tests follow the correct pattern.
Each `mastra-steps/*.unit.ts` tests the step function in isolation with `(state, context, deps)` signature. The 443-line integration test in `db-query.workflow.integration.ts` covers 7 scenarios including the retry loop and max-attempts guard — good coverage of the hard edge cases.


Issues to address before merge

1. Merge conflicts — must resolve before review can proceed

Branch has diverged from `main`. Recent main changes that will conflict:

Action required: `git rebase origin/main` on `feat/mastra`, resolve conflicts, force-push.

2. Commit messages need rewrite

The 3 migration commits have vague, non-conventional messages:

  • `feat: phase 0,1 complete`
  • `feat: phase 2complete chatGraph migration added` ← typo
  • `feat: migrated all graphs and sdks`

Before merge, squash into a single conventional commit or rewrite each with proper scope. Example:
```
feat(migration): replace LangGraph with Mastra and Vercel AI SDK

  • Remove @langchain/* dependencies, add ai v6 + @ai-sdk/* providers
  • Migrate DbQuery LangGraph nodes to Mastra workflow steps
  • Migrate ChatGraph to Mastra Agent with onStepFinish event bridging
  • Replace LangChain sinon stubs with AI SDK LanguageModelV3 fake implementations

BREAKING CHANGE: internal execution engine replaced; public API surface preserved
```

3. Source implementation files not yet in this PR

Current diff is test-layer + dependencies only. The production source migration (`src/graphs/`, `src/components/db-query/nodes/`, `src/services/generation.service.ts`, `src/transports/sse.transport.ts`) must be added before this can be reviewed for correctness. The integration test will fail until the source is wired.

4. Open architectural decisions (per TDD Rev 4)

These need sign-off before implementation is finalised — see discussion doc on Confluence:

  • §14.1 — `generate-query` sub-graph: keep as LangGraph sub-graph (Option A) or replace with Mastra Workflow (Option B)?
  • §10.1.1 — per-tool permission gate pattern acceptable for v3?
  • §10.7 — Pattern 2 (subclass `MastraChatOrchestrator`) sufficient, formal lifecycle hooks deferred to v3.1?
  • §10.2 — Confirm: `onStepFinish` post-completion only; intermediate events from `execute()` directly

Not blocking but worth noting

  • `db-query.workflow.integration.ts` is named `.integration.ts` but lives in the `unit/` directory — move to `acceptance/` or rename to avoid CI split confusion.
  • The removed `fix-query.node.unit.ts` (379 lines) had complex retry-loop coverage — confirm the equivalent is captured in the new integration test's max-attempts scenario.
  • No LangFuse adapter shown yet — the `@langfuse/langchain` removal is tracked but the Mastra OTEL replacement is not in this diff.

@rohit-sourcefuse
Copy link
Copy Markdown
Author

rohit-sourcefuse commented May 5, 2026

Review update — commit 43bf4bb (2026-05-05)

Commit: feat:refactor the code and removed dead code and added unit tests


✅ What's improved since yesterday

  • Test count: 10 step unit tests → 14 step unit tests (added check-templates, fix-query, get-columns, get-tables, save-dataset, semantic-validator, verify-checklist)
  • Integration test expanded: 443 → 460 lines, 7 scenarios
  • All old LangGraph node tests fully removed (no traces left)
  • db-knowledge-graph.service.unit.ts properly refactored to Vercel AI SDK fakes

🔴 Must fix — vacuous tests (no assertions)

1. get-tables.step.unit.ts — last test has no assertion

it('filters tables by permissions when permissionHelper provided', async () => {
  // no expect() here
}).timeout(5000);

Needs an actual assertion — e.g. expect(result.status).to.equal(ToolStatus.Failed) or await expect(...).to.be.rejectedWith(...).

2. check-templates.step.unit.ts — assertion passes regardless of outcome

// Either resolves template or returns {} — both are valid paths
sinon.assert.calledOnce(onUsageSpy);

This passes whether the template matched or not. Split into two separate tests: one asserting result.sql is set (match), one asserting result is {} (no match).

3. get-tables.step.unit.ts — function identity check instead of call check

expect((smartModel as never as {doGenerate: sinon.SinonSpy}).doGenerate).to.be.a.Function();

Should be sinon.assert.calledOnce(smartModel.doGenerate) to prove the smart model was actually invoked.


🟡 Should fix before merge

4. Commit message format
feat:refactor the code and removed dead code and added unit tests — missing space after colon, mixes three concerns. Before final merge, split or rewrite as conventional commits.

5. Import path change
generation.controllers.acceptance.ts imports LLMStreamEvent from ../../types/events — verify this path exists in the source tree.

6. check-cache fixture field name
Test uses actions: [{type: DatasetActionType.Liked}] — confirm the step implementation reads item.type not item.action.


Checklist before requesting re-review

  • Fix 3 vacuous/weak tests
  • Fix check-cache fixture field name
  • Verify src/types/events export path exists
  • Confirm source files are on branch (npm run build passes)
  • Confirm @mastra/core status (see separate comment)
  • Rebase on main to resolve conflicts
  • Rewrite vague migration commit messages before merge

@rohit-sourcefuse
Copy link
Copy Markdown
Author

rohit-sourcefuse commented May 5, 2026

Open question — @mastra/core not in package.json

Checked the full package.json on feat/mastra branch. Only these AI-related packages are present:

"dependencies": {
  "ai": "^6.0.168"
},
"devDependencies": {
  "@ai-sdk/openai": "^3.0.54",
  "@ai-sdk/anthropic": ...,
  "@ai-sdk/bedrock": ...
}

@mastra/core is absent.

The test files import from local paths:

import {checkCacheStep} from '../../../../mastra/db-query/workflow/steps/check-cache.step';
import {MastraDbQueryContext} from '../../../../mastra/db-query/types/db-query.types';

This means the codebase has a custom src/mastra/ folder with step functions written as plain TypeScript — not using @mastra/core npm classes.

This needs clarification before proceeding:

Option A — @mastra/core should be added (per TDD)

The TDD references @mastra/core/agent, createStep, createWorkflow. If the implementation uses these, the package must be in package.json:

npm install @mastra/core

Option B — Custom lightweight Mastra-compatible layer

If the step functions are deliberately plain TypeScript (not using @mastra/core classes), this is a different architecture than what the TDD describes and needs explicit sign-off.

@rohit-sourcefuse
Copy link
Copy Markdown
Author

rohit-sourcefuse commented May 5, 2026

Architecture clarification — plain functions vs @mastra/core classes

After deeper review of the test code, the step functions follow this signature:

async function checkCacheStep(
  state: DbQueryState,
  context: MastraDbQueryContext,
  deps: { llm: LLMProvider; datasetSearch: ...; }
): Promise<Partial<DbQueryState>>

These are plain TypeScript functions — not @mastra/core step objects. The src/mastra/ folder implements Mastra's architectural pattern (steps, conditions, workflow orchestrator) without necessarily depending on the @mastra/core npm package.

This is a valid approach — pure functions are easier to test and have no framework coupling.


Two questions that need answering before this can be fully reviewed

1. Does MastraDbQueryWorkflow use @mastra/core's createWorkflow/createStep internally, or is it a plain class that calls step functions directly?

2. Does the chat graph migration use @mastra/core's Agent class (new Agent({model: openai(...)}))?

These answers confirm whether @mastra/core needs to be in package.json. The step logic, test coverage, and AI SDK dependency swap all look correct — this is the only remaining architectural question.

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented May 6, 2026

SonarQube reviewer guide

Quality Gate Failed Quality Gate failed

Failed conditions
1 Security Hotspot
15 New Major Issues (required ≤ 5)
41 New Critical Issues (required ≤ 0)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

@piyushsinghgaur1 piyushsinghgaur1 requested a review from Copilot May 6, 2026 15:53
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@piyushsinghgaur1 piyushsinghgaur1 requested a review from Copilot May 6, 2026 15:55
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants