
feat(web): add MCP and API key usage tracking to analytics #950

Open
msukkari wants to merge 8 commits into main from msukkari/mcp-analytics-SOU-579

Conversation

msukkari (Contributor) commented Feb 27, 2026

Summary

Analytics: MCP & API usage tracking

  • Move audit event creation from client-side search bar to service functions (search, streamSearch, listRepos, getFileSource, getTree)
  • Add source metadata to audit events to distinguish MCP requests from other API calls
  • Extend analytics SQL query to include new audit actions and display MCP request and API request counts
  • Add two new charts to analytics dashboard: MCP Requests and API Requests

Audit log retention policy

  • Add SOURCEBOT_EE_AUDIT_RETENTION_DAYS environment variable (default: 180) to configure how long audit logs are kept
  • Add AuditLogPruner background job that runs daily and prunes audit records older than the configured retention period, deleting in batches of 10K to avoid long-running transactions
  • Set to 0 to disable pruning and retain logs indefinitely
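The batched pruning loop described above can be sketched as follows. This is a minimal illustration, not the actual `AuditLogPruner` implementation: the `pruneCutoff`/`pruneAuditLogs` names and the `AuditStore` interface are hypothetical; only the daily cadence, the 10K batch size, the 180-day default, and the 0-disables behavior come from the PR.

```typescript
// Hypothetical sketch of the daily pruning pass; names and the store
// interface are illustrative, not the actual AuditLogPruner API.
const BATCH_SIZE = 10_000;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Records older than this cutoff are eligible for deletion; a retention
// of 0 (or less) disables pruning entirely.
function pruneCutoff(retentionDays: number, now: Date): Date | null {
    if (retentionDays <= 0) return null;
    return new Date(now.getTime() - retentionDays * MS_PER_DAY);
}

// Minimal storage interface: delete up to `limit` rows older than `cutoff`
// and report how many rows were actually removed.
interface AuditStore {
    deleteBatchBefore(cutoff: Date, limit: number): Promise<number>;
}

// Delete in fixed-size batches so no single delete becomes a
// long-running transaction; stop once a batch comes back short.
async function pruneAuditLogs(store: AuditStore, retentionDays: number, now = new Date()): Promise<number> {
    const cutoff = pruneCutoff(retentionDays, now);
    if (cutoff === null) return 0;

    let total = 0;
    for (;;) {
        const deleted = await store.deleteBatchBefore(cutoff, BATCH_SIZE);
        total += deleted;
        if (deleted < BATCH_SIZE) break;
    }
    return total;
}
```

Keeping each batch in its own short statement is what lets the job run daily against large tables without holding long locks.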

Analytics UI: retention info

  • Surface the configured retention period and oldest audit record date in the analytics page header
  • Wrap the analytics response in an object ({ rows, retentionDays, oldestRecordDate }) instead of returning a flat array
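A sketch of the new response envelope is below. Only the `{ rows, retentionDays, oldestRecordDate }` shape comes from the PR; the fields inside `AnalyticsRow` are assumptions for illustration.

```typescript
// Illustrative sketch of the wrapped analytics response.
// AnalyticsRow's fields here are assumptions; only the envelope
// ({ rows, retentionDays, oldestRecordDate }) is described by the PR.
interface AnalyticsRow {
    period: string;  // bucket granularity, e.g. 'day' (assumed)
    bucket: Date;    // start of the time bucket (assumed)
    value: number;   // metric count for the bucket (assumed)
}

interface AnalyticsResponse {
    rows: AnalyticsRow[];
    retentionDays: number;          // from SOURCEBOT_EE_AUDIT_RETENTION_DAYS
    oldestRecordDate: Date | null;  // null when no audit records exist yet
}

// Callers now unwrap `rows` instead of consuming a flat array:
const empty: AnalyticsResponse = { rows: [], retentionDays: 180, oldestRecordDate: null };
```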

Documentation updates

  • Audit logs docs: add retention policy section, update audit action types table (removed 4 stale actions, added 11 missing actions), add chat to targetType enum, add source to metadata schema
  • Environment variables docs: add SOURCEBOT_EE_AUDIT_RETENTION_DAYS entry
  • Sizing guide: add audit log storage section with enterprise callout, per-event storage estimate (~350 bytes), projected storage table for various team sizes, and retention policy guidance
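Using the ~350-byte per-event figure, the storage projection reduces to simple arithmetic. The user count and event rate below are hypothetical examples, not numbers from the sizing guide:

```typescript
// Back-of-the-envelope audit storage estimate using the ~350 B/event
// figure from the sizing guide; user count and event rate are hypothetical.
const BYTES_PER_EVENT = 350;

function estimateAuditStorageBytes(users: number, eventsPerUserPerDay: number, retentionDays: number): number {
    return users * eventsPerUserPerDay * retentionDays * BYTES_PER_EVENT;
}

// e.g. 100 users at ~100 audited actions/day over the default 180-day retention:
const bytes = estimateAuditStorageBytes(100, 100, 180);
// 100 * 100 * 180 * 350 = 630,000,000 bytes, i.e. roughly 0.6 GiB
console.log((bytes / 1024 ** 3).toFixed(2) + ' GiB');
```

The retention cap is what keeps this bounded: shortening `retentionDays` scales the total linearly.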

Mock data script

  • Update inject-audit-data.ts with 5 mixed-usage user profiles (web-only, web+MCP hybrid, MCP-heavy, API-only, web+API hybrid) with randomized weights
  • Add all new audit actions and source metadata for MCP/API users

Screenshots

[Screenshot: analytics dashboard showing the new MCP Requests and API Requests charts]

Retention policy and oldest audit log date displayed in analytics header:

Retention info in analytics header

Test plan

  • Verify web UI searches still fire audit events and appear in analytics
  • Verify MCP tool invocations fire audit events with source='mcp' and appear in MCP Requests chart
  • Verify non-MCP API key usage fires audit events and appears in API Requests chart
  • Verify API users count as active users in analytics
  • Verify retention period and "data since" date display correctly in analytics header
  • Verify audit log pruner runs daily and correctly deletes records older than the configured retention period
  • Verify setting SOURCEBOT_EE_AUDIT_RETENTION_DAYS=0 disables pruning
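To exercise the API rows of the test plan manually, a request can set the client-source header the audit layer reads. Only the `X-Sourcebot-Client-Source` header name and the `mcp` source value come from this PR; the endpoint path, body shape, and bearer-token auth in this sketch are assumptions.

```typescript
// Illustrative request builder for manual testing. Only the
// X-Sourcebot-Client-Source header name and the 'mcp' value come from
// this PR; the path, body, and auth scheme are assumptions.
type RequestOptions = { method: string; headers: Record<string, string>; body: string };

function buildSearchRequest(apiKey: string, query: string, clientSource?: string): RequestOptions {
    return {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
            'Authorization': `Bearer ${apiKey}`,
            // Requests without a recognized sourcebot-* value should land in
            // the "API Requests" bucket; 'mcp' lands in "MCP Requests".
            ...(clientSource ? { 'X-Sourcebot-Client-Source': clientSource } : {}),
        },
        body: JSON.stringify({ query }),
    };
}

// Usage (hypothetical endpoint):
// fetch(`${baseUrl}/api/search`, buildSearchRequest(apiKey, 'repo:foo bar', 'mcp'));
```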

Linear: SOU-579

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features

  • Audit log retention policy: Automatically prune audit logs older than 180 days (configurable via environment settings; default 180 days, set to 0 to disable).
  • Enhanced analytics: View metrics breakdown by source (web app, MCP, API) for better visibility into usage patterns.
  • Improved audit tracking: Source information now captured for all user actions to distinguish between web UI, API, and other client sources.

Documentation

  • Added audit log retention configuration guide and sizing recommendations for audit log storage.

msukkari and others added 4 commits February 26, 2026 15:06
Move audit event creation from client-side to service functions for search, repos, file source, and file tree endpoints. Add source metadata to distinguish MCP requests from other API calls. Extend analytics SQL to include new actions and display MCP request and API request counts on the analytics dashboard.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Add SOURCEBOT_EE_AUDIT_RETENTION_DAYS env var (default 180) and AuditLogPruner background job that prunes old audit records daily in batches
- Surface retention period and oldest record date in analytics page header
- Update audit action types table in docs (remove 4 stale, add 11 missing)
- Add audit log storage section to sizing guide with enterprise callout and storage estimates
- Update mock data script with mixed-usage user profiles and new audit actions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai bot commented Feb 27, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

This PR adds audit log retention policy with automatic daily pruning (default 180 days), introduces source metadata tracking across user actions to differentiate web client, MCP, and API traffic, enhances analytics to provide source-based metrics aggregation, and includes data migration and injection tooling for audit records.

Changes

Cohort / File(s) / Summary
Documentation
CHANGELOG.md, docs/docs/configuration/audit-logs.mdx, docs/docs/configuration/environment-variables.mdx, docs/docs/deployment/sizing-guide.mdx
Added audit log retention policy documentation, configuration of SOURCEBOT_EE_AUDIT_RETENTION_DAYS env variable (default 180 days), expanded audit action types table with new actions, and sizing guide for audit log storage requirements.
Audit Pruning Infrastructure
packages/backend/src/ee/auditLogPruner.ts, packages/backend/src/index.ts
Introduced AuditLogPruner class that manages scheduled daily pruning of audit logs older than retention threshold via batch deletion; integrated into backend startup/shutdown lifecycle.
Environment & Configuration
packages/shared/src/env.server.ts
Added SOURCEBOT_EE_AUDIT_RETENTION_DAYS server environment variable with default value of 180.
Audit Metadata Schema
packages/web/src/ee/features/audit/types.ts
Extended auditMetadataSchema with optional source field to track request source from X-Sourcebot-Client-Source header.
Analytics Backend
packages/web/src/ee/features/analytics/actions.ts, packages/web/src/ee/features/analytics/types.ts
Expanded analytics query to include metadata and actorId, added new actions (user.listed_repos, user.fetched_file_source, user.fetched_file_tree), introduced source-based metrics aggregation (web, MCP, API), and updated return type to include retentionDays and oldestRecordDate.
Analytics Frontend
packages/web/src/ee/features/analytics/analyticsContent.tsx
Refactored analytics UI to display web/MCP/API metrics separately, added retention and oldest record date display, introduced ChartDefinition-driven rendering, and updated data structure from flat AnalyticsResponse to rows-based format.
API Auditing
packages/web/src/app/api/(server)/repos/listReposApi.ts, packages/web/src/features/git/getFileSourceApi.ts, packages/web/src/features/git/getTreeApi.ts, packages/web/src/features/search/searchApi.ts
Added audit logging and source metadata tracking to repo listing, file source fetching, tree fetching, and code search endpoints; extended signatures with optional sourceOverride parameter.
Chat Features
packages/web/src/app/api/(server)/chat/blocking/route.ts, packages/web/src/features/chat/actions.ts, packages/web/src/features/chat/useCreateNewChatThread.ts, packages/web/src/features/chat/agent.ts, packages/web/src/features/chat/tools.ts
Added audit logging for chat creation (user.created_ask_chat) and source parameter propagation; updated agent and tools to pass 'sourcebot-ask-agent' as source override.
Code Navigation
packages/web/src/ee/features/codeNav/components/symbolHoverPopup/index.tsx, packages/web/src/features/codeNav/api.ts
Added source metadata ('sourcebot-ui-codenav') to goto definition and find references audit events; updated search calls with sourceOverride parameter.
Search UI
packages/web/src/app/[domain]/components/searchBar/searchBar.tsx
Removed user.performed_code_search audit emission from search bar onSubmit handler.
Database & Tooling
packages/db/prisma/migrations/20260226000000_backfill_audit_source_metadata/migration.sql, packages/db/tools/scriptRunner.ts, packages/db/tools/scripts/inject-audit-data-v2.ts
Added migration to backfill source metadata for historical web and UI navigation audit events; introduced script registry entry and new data injection script for generating synthetic audit data across web, MCP, and API sources.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested labels

sourcebot-team

Suggested reviewers

  • brendan-kellam
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 0.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed: the PR title accurately describes a core feature of the changeset: adding MCP and API tracking to analytics. While the PR also introduces audit log retention/pruning, the title correctly emphasizes the main user-facing feature.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


coderabbitai bot left a comment

🧹 Nitpick comments (4)
packages/db/tools/scripts/inject-audit-data.ts (1)

146-170: Consider batching audit record creation for performance.

The createAudits function creates records one at a time with individual prisma.audit.create() calls. For the scale of data being generated (50 users × 90 days × multiple actions), this could be slow.

Consider using prisma.audit.createMany() for batch inserts:

♻️ Optional: Batch inserts for better performance
 async function createAudits(
     userId: string,
     action: string,
     count: number,
     currentDate: Date,
     isWeekend: boolean,
     targetType: string,
     metadata?: Prisma.InputJsonValue,
 ) {
+    const records = [];
     for (let i = 0; i < count; i++) {
-        await prisma.audit.create({
-            data: {
-                timestamp: randomTimestamp(currentDate, isWeekend),
-                action,
-                actorId: userId,
-                actorType: 'user',
-                targetId: `${targetType}_${Math.floor(Math.random() * 1000)}`,
-                targetType,
-                sourcebotVersion: '1.0.0',
-                orgId,
-                ...(metadata ? { metadata } : {}),
-            }
-        });
+        records.push({
+            timestamp: randomTimestamp(currentDate, isWeekend),
+            action,
+            actorId: userId,
+            actorType: 'user',
+            targetId: `${targetType}_${Math.floor(Math.random() * 1000)}`,
+            targetType,
+            sourcebotVersion: '1.0.0',
+            orgId,
+            ...(metadata ? { metadata } : {}),
+        });
+    }
+    if (records.length > 0) {
+        await prisma.audit.createMany({ data: records });
     }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/db/tools/scripts/inject-audit-data.ts` around lines 146 - 170, The
createAudits function currently does single-row inserts via prisma.audit.create
in a loop which will be slow; change it to accumulate records into an array and
call prisma.audit.createMany once per call (or in configurable batch sizes) to
insert many rows at once, preserving fields set in the loop (timestamp from
randomTimestamp(currentDate, isWeekend), action, actorId, actorType 'user',
targetId, targetType, sourcebotVersion, orgId and optional metadata) and ensure
metadata is included when present; update createAudits to build the batch
payload and handle createMany response/errors instead of awaiting each
prisma.audit.create.
packages/web/src/features/search/searchApi.ts (2)

31-40: Silent error swallowing hides audit failures.

The .catch(() => {}) pattern silently ignores all audit errors, making it difficult to detect and debug audit system issues in production. Consider logging at a debug/warn level instead.

♻️ Proposed fix to add minimal error logging
             getAuditService().createAudit({
                 action: 'user.performed_code_search',
                 actor: { id: user.id, type: 'user' },
                 target: { id: org.id.toString(), type: 'org' },
                 orgId: org.id,
                 metadata: { source },
-            }).catch(() => {});
+            }).catch((e) => console.debug('Audit creation failed:', e));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/web/src/features/search/searchApi.ts` around lines 31 - 40, The
audit call to getAuditService().createAudit currently swallows all errors via
.catch(() => {}); change this to surface failures by catching the error and
logging it (including the error object/message and relevant context such as
action 'user.performed_code_search', actor id, org id and source) at an
appropriate level (warn or debug) instead of a no-op; update the promise chain
around createAudit(...) in searchApi.ts so errors are logged with clear context
to aid debugging.

30-40: Duplicated audit logic between search and streamSearch.

Both functions have identical audit creation code. Consider extracting a helper to reduce duplication.

♻️ Example helper extraction
const auditCodeSearch = async (user: UserWithAccounts, org: { id: number }) => {
    const source = (await headers()).get('X-Sourcebot-Client-Source') ?? undefined;
    getAuditService().createAudit({
        action: 'user.performed_code_search',
        actor: { id: user.id, type: 'user' },
        target: { id: org.id.toString(), type: 'org' },
        orgId: org.id,
        metadata: { source },
    }).catch((e) => console.debug('Audit creation failed:', e));
};

Also applies to: 61-71

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/web/src/features/search/searchApi.ts` around lines 30 - 40,
Duplicate audit creation logic in the search and streamSearch handlers: extract
a helper (e.g., auditCodeSearch) that accepts the user (UserWithAccounts) and
org ({ id: number }), calls headers() to read 'X-Sourcebot-Client-Source', and
invokes getAuditService().createAudit(...) with the same payload used in both
handlers; then replace the inline blocks in search and streamSearch with a
fire-and-forget call to auditCodeSearch(...). Ensure the helper catches errors
(e.g., .catch(...) or try/catch) so the handlers don’t fail on audit errors and
reference the original symbols headers(), getAuditService().createAudit, search,
and streamSearch when making the change.
packages/web/src/ee/features/analytics/actions.ts (1)

112-122: Consider caching or combining queries for oldest record.

The additional query for oldestRecordDate adds a separate database round-trip on every analytics fetch. For high-traffic deployments, consider either:

  1. Caching this value (it changes slowly)
  2. Incorporating it into the main CTE query

This is a minor optimization opportunity and not blocking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/web/src/ee/features/analytics/actions.ts` around lines 112 - 122,
The current analytics action issues a separate prisma.audit.findFirst call to
populate oldestRecordDate (and returns env.SOURCEBOT_EE_AUDIT_RETENTION_DAYS),
which adds an extra DB round-trip; either cache this slowly-changing value with
a short TTL (e.g., in-memory or Redis) and return cached oldestRecordDate from
the action, or modify the main CTE/rows query to also SELECT MIN(timestamp) AS
oldestTimestamp (so you can set oldestRecordDate from that result) and remove
the prisma.audit.findFirst call; update references to oldestRecordDate and
remove the extra query (look for prisma.audit.findFirst, oldestRecordDate, rows
and the main CTE in this file).

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e369199 and ad35631.

📒 Files selected for processing (17)
  • CHANGELOG.md
  • docs/docs/configuration/audit-logs.mdx
  • docs/docs/configuration/environment-variables.mdx
  • docs/docs/deployment/sizing-guide.mdx
  • packages/backend/src/ee/auditLogPruner.ts
  • packages/backend/src/index.ts
  • packages/db/tools/scripts/inject-audit-data.ts
  • packages/shared/src/env.server.ts
  • packages/web/src/app/[domain]/components/searchBar/searchBar.tsx
  • packages/web/src/app/api/(server)/repos/listReposApi.ts
  • packages/web/src/ee/features/analytics/actions.ts
  • packages/web/src/ee/features/analytics/analyticsContent.tsx
  • packages/web/src/ee/features/analytics/types.ts
  • packages/web/src/ee/features/audit/types.ts
  • packages/web/src/features/git/getFileSourceApi.ts
  • packages/web/src/features/git/getTreeApi.ts
  • packages/web/src/features/search/searchApi.ts
💤 Files with no reviewable changes (1)
  • packages/web/src/app/[domain]/components/searchBar/searchBar.tsx

Extend the sourceOverride pattern to getFileSource and getTree so
internal callers (chat AI agent) can tag audit events with the correct
source instead of relying on the HTTP header.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
coderabbitai bot left a comment

🧹 Nitpick comments (1)
packages/web/src/features/git/getFileSourceApi.ts (1)

35-45: Variable shadowing: source declared twice with different meanings.

The variable source is declared on line 37 for the audit metadata (client source string), but a different source variable is declared on line 61 for the file content. This shadowing can lead to confusion and maintenance issues.

Consider renaming the audit-related variable to something more descriptive like clientSource or auditSource.

Suggested rename
 if (user) {
-    const source = sourceOverride ?? (await headers()).get('X-Sourcebot-Client-Source') ?? undefined;
+    const clientSource = sourceOverride ?? (await headers()).get('X-Sourcebot-Client-Source') ?? undefined;
     getAuditService().createAudit({
         action: 'user.fetched_file_source',
         actor: { id: user.id, type: 'user' },
         target: { id: org.id.toString(), type: 'org' },
         orgId: org.id,
-        metadata: { source },
+        metadata: { source: clientSource },
     }).catch(() => {});
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/web/src/features/git/getFileSourceApi.ts` around lines 35 - 45, The
audit code in getFileSource declares a variable named source for the
client/audit metadata which is later shadowed by another variable named source
used for file content; rename the audit variable to a clearer name (e.g.,
clientSource or auditSource) inside the withOptionalAuthV2 block where
getAuditService().createAudit is called and update its usage in metadata: {
source } to metadata: { clientSource } (or chosen name) so it no longer
conflicts with the file content variable later in getFileSource.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ad35631 and 794a05c.

📒 Files selected for processing (4)
  • packages/web/src/features/chat/agent.ts
  • packages/web/src/features/chat/tools.ts
  • packages/web/src/features/git/getFileSourceApi.ts
  • packages/web/src/features/git/getTreeApi.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/web/src/features/git/getTreeApi.ts

msukkari and others added 2 commits February 26, 2026 17:57
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Tag all audit events with source metadata (sourcebot-web-client,
  sourcebot-ask-agent, sourcebot-ui-codenav, mcp) via sourceOverride
- Restructure analytics SQL to segment by Web App (sourcebot-*),
  MCP, and API (everything else)
- Add global active users chart at top of analytics page
- Add info hover tooltips explaining each chart
- Prefix chart names with their section (Web/MCP/API) for clarity
- Update inject-audit-data script to use correct source values

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
coderabbitai bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/web/src/features/chat/actions.ts (1)

133-164: ⚠️ Potential issue | 🟠 Major

Constrain source at the server-action boundary before audit persistence.

On Lines 133 and 163, source?: string is client-supplied and written as-is into audit metadata. Please normalize or validate here too, so audit dimensions remain trustworthy and bounded.

Suggested hardening
 export const createChat = async ({ source }: { source?: string } = {}) => sew(() =>
     withOptionalAuthV2(async ({ org, user, prisma }) => {
+        const normalizedSource =
+            source && /^[a-z0-9._-]{1,64}$/i.test(source)
+                ? source
+                : undefined;
         const isGuestUser = user === undefined;
@@
                 orgId: org.id,
-                metadata: { source },
+                metadata: { source: normalizedSource },
             });
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/web/src/features/chat/actions.ts` around lines 133 - 164, The
createChat server action accepts a client-supplied source string and writes it
directly into audit metadata; sanitize and constrain it before persisting by
validating the source parameter in createChat (e.g., enforce allowed values or a
whitelist, trim and limit length, and replace invalid/too-long values with a
normalized token like "unknown" or "untrusted_client") and then pass that
cleaned value into auditService.createAudit's metadata. Locate createChat and
the call to auditService.createAudit and add a small normalization step for the
source argument (validate characters/length or map to known enums) so the stored
metadata is bounded and trustworthy.
🧹 Nitpick comments (3)
packages/web/src/ee/features/analytics/actions.ts (1)

23-23: Run independent DB reads in parallel to reduce request latency.

The analytics row query and oldest-record lookup are independent; executing them sequentially adds avoidable round-trip time.

⚡ Proposed refactor
-      const rows = await prisma.$queryRaw<AnalyticsRow[]>`
+      const rowsPromise = prisma.$queryRaw<AnalyticsRow[]>`
       WITH core AS (
         ...
       ORDER BY b.period, b.bucket;
     `;
-
-      const oldestRecord = await prisma.audit.findFirst({
+      const oldestRecordPromise = prisma.audit.findFirst({
         where: { orgId: org.id },
         orderBy: { timestamp: 'asc' },
         select: { timestamp: true },
       });
+
+      const [rows, oldestRecord] = await Promise.all([rowsPromise, oldestRecordPromise]);

Also applies to: 148-152

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/web/src/ee/features/analytics/actions.ts` at line 23, The two
independent DB reads (the analytics rows query currently awaited into rows via
prisma.$queryRaw<AnalyticsRow[]> and the separate oldest-record lookup that
assigns oldestRecord) should be executed in parallel to reduce latency: start
both queries without awaiting (e.g., rowsPromise =
prisma.$queryRaw<AnalyticsRow[]>`...` and oldestRecordPromise =
prisma.$queryRaw`...` or the existing oldestRecord query call), then await both
with const [rows, oldestRecord] = await Promise.all([rowsPromise,
oldestRecordPromise]); replace the sequential awaits so both round-trips run
concurrently.
packages/db/tools/scripts/inject-audit-data.ts (1)

146-170: Consider batching audit inserts to reduce database round-trips.

The createAudits function performs one individual await prisma.audit.create() call per audit record (line 156), and the function itself is called repeatedly across nested loops iterating over users, days, and action types. With counts ranging from 1–30+ per action, this can result in thousands of sequential database calls.

Suggested refactor (batched inserts)
         async function createAudits(
             userId: string,
             action: string,
             count: number,
             currentDate: Date,
             isWeekend: boolean,
             targetType: string,
             metadata?: Prisma.InputJsonValue,
         ) {
-            for (let i = 0; i < count; i++) {
-                await prisma.audit.create({
-                    data: {
-                        timestamp: randomTimestamp(currentDate, isWeekend),
-                        action,
-                        actorId: userId,
-                        actorType: 'user',
-                        targetId: `${targetType}_${Math.floor(Math.random() * 1000)}`,
-                        targetType,
-                        sourcebotVersion: '1.0.0',
-                        orgId,
-                        ...(metadata ? { metadata } : {}),
-                    }
-                });
-            }
+            if (count <= 0) return;
+
+            const rows = Array.from({ length: count }, () => ({
+                timestamp: randomTimestamp(currentDate, isWeekend),
+                action,
+                actorId: userId,
+                actorType: 'user',
+                targetId: `${targetType}_${Math.floor(Math.random() * 1000)}`,
+                targetType,
+                sourcebotVersion: '1.0.0',
+                orgId,
+                ...(metadata ? { metadata } : {}),
+            }));
+
+            const BATCH_SIZE = 1000;
+            for (let i = 0; i < rows.length; i += BATCH_SIZE) {
+                await prisma.audit.createMany({ data: rows.slice(i, i + BATCH_SIZE) });
+            }
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/db/tools/scripts/inject-audit-data.ts` around lines 146 - 170, The
createAudits helper currently issues one prisma.audit.create per record causing
many round-trips; change createAudits to accumulate records into an array (using
the same fields: timestamp from randomTimestamp(currentDate, isWeekend), action,
actorId=userId, actorType='user', targetId, targetType, sourcebotVersion, orgId,
and optional metadata) and then use prisma.audit.createMany to insert the batch;
if concerned about very large counts, flush in chunks (e.g., 500–1000) to avoid
huge payloads and preserve randomTimestamp generation per record before calling
prisma.audit.createMany.
packages/web/src/features/chat/useCreateNewChatThread.ts (1)

40-40: Prefer a shared constant for source literals.

Line 40 hardcodes 'sourcebot-web-client'. Consider centralizing source identifiers in a shared constant/enum to prevent drift and typos across clients/routes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/web/src/features/chat/useCreateNewChatThread.ts` at line 40, The
call to createChat in useCreateNewChatThread.ts uses the hardcoded source string
'sourcebot-web-client'; replace this literal with a shared constant (e.g.,
SOURCEBOT_WEB_CLIENT or SourceBotSource enum) exported from a common constants
module and import it into useCreateNewChatThread, then call createChat({ source:
SOURCEBOT_WEB_CLIENT }) to centralize source identifiers and avoid drift/typos
across callers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/db/tools/scripts/inject-audit-data.ts`:
- Around line 199-202: The audit entry for the 'user.created_ask_chat' action is
using the wrong targetType ('org'); update the createAudits invocation that logs
this event (the call to createAudits(user.id, 'user.created_ask_chat',
scaledCount(...), currentDate, isWeekend, 'org', webMeta)) to use targetType
'chat' instead of 'org' so the audit correctly records a chat targetType.
- Around line 115-123: The cohort counters (webOnlyCount, hybridWebMcpCount,
mcpHeavyCount, apiOnlyCount, hybridWebApiCount) overlap because each uses
independent filters; convert them to mutually exclusive buckets by classifying
each user once (e.g., in a single users.reduce loop or with prioritized if/else)
and incrementing only one counter per user: keep the existing names
(webOnlyCount, hybridWebMcpCount, mcpHeavyCount, apiOnlyCount,
hybridWebApiCount) but compute them via a single pass that applies exclusive
logic (examples: webOnly: webWeight>0 && mcpWeight===0 && apiWeight===0;
hybridWebMcp: webWeight>0 && mcpWeight>0 && apiWeight===0; hybridWebApi:
webWeight>0 && apiWeight>0 && mcpWeight===0; apiOnly: apiWeight>0 &&
webWeight===0 && mcpWeight===0; mcpHeavy: mcpWeight>0 && webWeight<0.3 &&
apiWeight===0), or implement a prioritized if/else order to ensure no user is
counted in more than one bucket before the console.log uses those counts.
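
The mutually exclusive bucketing described above can be sketched as a single-pass classifier. This is a hypothetical helper, not code from `inject-audit-data.ts`; the weight fields mirror the names used in the comment, and `mcpHeavy` is checked before `hybridWebMcp` so the two predicates cannot overlap:

```typescript
// Prioritized if/else guarantees each user lands in exactly one bucket.
// Field names mirror the review comment; the classifier itself is illustrative.
type Cohort = "webOnly" | "mcpHeavy" | "hybridWebMcp" | "hybridWebApi" | "apiOnly" | "other";

interface UserWeights {
    webWeight: number;
    mcpWeight: number;
    apiWeight: number;
}

function classifyCohort(u: UserWeights): Cohort {
    if (u.webWeight > 0 && u.mcpWeight === 0 && u.apiWeight === 0) return "webOnly";
    // Checked before hybridWebMcp so low-web MCP users are not double-counted.
    if (u.mcpWeight > 0 && u.webWeight < 0.3 && u.apiWeight === 0) return "mcpHeavy";
    if (u.webWeight > 0 && u.mcpWeight > 0 && u.apiWeight === 0) return "hybridWebMcp";
    if (u.webWeight > 0 && u.apiWeight > 0 && u.mcpWeight === 0) return "hybridWebApi";
    if (u.apiWeight > 0 && u.webWeight === 0 && u.mcpWeight === 0) return "apiOnly";
    return "other";
}
```

Each counter can then be incremented from the single `classifyCohort` result per user, so no user lands in two buckets before the `console.log`.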

In `@packages/web/src/app/api/(server)/chat/blocking/route.ts`:
- Around line 127-133: The current getAuditService().createAudit(...).catch(()
=> {}) swallows errors; make the call non-blocking but log failures with context
instead of ignoring them: replace the empty catch with a handler that logs the
error and relevant audit context (action, actor.id/orgId/source) using the
existing logger (e.g., processLogger or logger) or console.error if none is
available, so getAuditService().createAudit(...) remains non-blocking but
audit-write failures are recorded for debugging/alerting.
- Around line 126-133: The request header value assigned to source is recorded
verbatim into audit metadata; validate and normalize it before calling
getAuditService().createAudit to prevent unbounded/unsafe values. In the route
handler, read the header into a temp variable, trim and strip control
characters, enforce a max length (e.g. 64 chars), then either map it against a
small allowlist of known client source identifiers or collapse unknown values to
a single token like "unknown" or "other"; finally pass the sanitized value (not
the raw header) into the metadata for createAudit. Ensure the change touches the
same variable used in getAuditService().createAudit so the stored metadata is
the normalized value.

In `@packages/web/src/ee/features/analytics/analyticsContent.tsx`:
- Around line 409-411: The "Data since" date uses new
Date(analyticsResponse.oldestRecordDate).toLocaleDateString(...) which can shift
by a day in some client timezones; update that call to produce a timezone-stable
string by specifying UTC in the formatting options (add timeZone: "UTC") or by
formatting using getUTCFullYear()/getUTCMonth()/getUTCDate() from the Date
constructed with analyticsResponse.oldestRecordDate so the JSX p element always
shows the UTC date without client-local day shifts.

---

Outside diff comments:
In `@packages/web/src/features/chat/actions.ts`:
- Around line 133-164: The createChat server action accepts a client-supplied
source string and writes it directly into audit metadata; sanitize and constrain
it before persisting by validating the source parameter in createChat (e.g.,
enforce allowed values or a whitelist, trim and limit length, and replace
invalid/too-long values with a normalized token like "unknown" or
"untrusted_client") and then pass that cleaned value into
auditService.createAudit's metadata. Locate createChat and the call to
auditService.createAudit and add a small normalization step for the source
argument (validate characters/length or map to known enums) so the stored
metadata is bounded and trustworthy.

---

Nitpick comments:
In `@packages/db/tools/scripts/inject-audit-data.ts`:
- Around line 146-170: The createAudits helper currently issues one
prisma.audit.create per record causing many round-trips; change createAudits to
accumulate records into an array (using the same fields: timestamp from
randomTimestamp(currentDate, isWeekend), action, actorId=userId,
actorType='user', targetId, targetType, sourcebotVersion, orgId, and optional
metadata) and then use prisma.audit.createMany to insert the batch; if concerned
about very large counts, flush in chunks (e.g., 500–1000) to avoid huge payloads
and preserve randomTimestamp generation per record before calling
prisma.audit.createMany.

In `@packages/web/src/ee/features/analytics/actions.ts`:
- Line 23: The two independent DB reads (the analytics rows query currently
awaited into rows via prisma.$queryRaw<AnalyticsRow[]> and the separate
oldest-record lookup that assigns oldestRecord) should be executed in parallel
to reduce latency: start both queries without awaiting (e.g., rowsPromise =
prisma.$queryRaw<AnalyticsRow[]>`...` and oldestRecordPromise =
prisma.$queryRaw`...` or the existing oldestRecord query call), then await both
with const [rows, oldestRecord] = await Promise.all([rowsPromise,
oldestRecordPromise]); replace the sequential awaits so both round-trips run
concurrently.
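
The parallelization the comment asks for is the standard two-promise pattern. A minimal sketch, with hypothetical `queryRows`/`queryOldest` stand-ins for the two `prisma.$queryRaw` calls:

```typescript
// Start both reads before awaiting either, so the two DB round-trips
// overlap instead of running back-to-back.
async function loadAnalytics<R, O>(
    queryRows: () => Promise<R>,
    queryOldest: () => Promise<O>,
): Promise<{ rows: R; oldestRecord: O }> {
    const rowsPromise = queryRows();     // first query fired, not awaited
    const oldestPromise = queryOldest(); // second query fired immediately after
    const [rows, oldestRecord] = await Promise.all([rowsPromise, oldestPromise]);
    return { rows, oldestRecord };
}
```

The total latency then approaches the slower of the two queries rather than their sum.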

In `@packages/web/src/features/chat/useCreateNewChatThread.ts`:
- Line 40: The call to createChat in useCreateNewChatThread.ts uses the
hardcoded source string 'sourcebot-web-client'; replace this literal with a
shared constant (e.g., SOURCEBOT_WEB_CLIENT or SourceBotSource enum) exported
from a common constants module and import it into useCreateNewChatThread, then
call createChat({ source: SOURCEBOT_WEB_CLIENT }) to centralize source
identifiers and avoid drift/typos across callers.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 56c7f40 and f9dea29.

📒 Files selected for processing (11)
  • packages/db/tools/scripts/inject-audit-data.ts
  • packages/web/src/app/api/(server)/chat/blocking/route.ts
  • packages/web/src/app/api/(server)/repos/listReposApi.ts
  • packages/web/src/ee/features/analytics/actions.ts
  • packages/web/src/ee/features/analytics/analyticsContent.tsx
  • packages/web/src/ee/features/analytics/types.ts
  • packages/web/src/ee/features/codeNav/components/symbolHoverPopup/index.tsx
  • packages/web/src/features/chat/actions.ts
  • packages/web/src/features/chat/useCreateNewChatThread.ts
  • packages/web/src/features/codeNav/api.ts
  • packages/web/src/features/search/searchApi.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/web/src/features/search/searchApi.ts
  • packages/web/src/app/api/(server)/repos/listReposApi.ts

Comment on lines 199 to 202
// Ask chats (0-2 base) - web only
await createAudits(user.id, 'user.created_ask_chat',
scaledCount(0, 2, user.webWeight, isWeekend), currentDate, isWeekend, 'org', webMeta);

⚠️ Potential issue | 🟡 Minor

user.created_ask_chat should use targetType: 'chat'.

Line 201 currently tags chat creation as org, which breaks target-type semantics for this action.

Suggested fix
-                    await createAudits(user.id, 'user.created_ask_chat',
-                        scaledCount(0, 2, user.webWeight, isWeekend), currentDate, isWeekend, 'org', webMeta);
+                    await createAudits(user.id, 'user.created_ask_chat',
+                        scaledCount(0, 2, user.webWeight, isWeekend), currentDate, isWeekend, 'chat', webMeta);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/db/tools/scripts/inject-audit-data.ts` around lines 199 - 202, The
audit entry for the 'user.created_ask_chat' action is using the wrong targetType
('org'); update the createAudits invocation that logs this event (the call to
createAudits(user.id, 'user.created_ask_chat', scaledCount(...), currentDate,
isWeekend, 'org', webMeta)) to use targetType 'chat' instead of 'org' so the
audit correctly records a chat targetType.

Comment on lines +126 to +133
const source = request.headers.get('X-Sourcebot-Client-Source') ?? undefined;
getAuditService().createAudit({
action: 'user.created_ask_chat',
actor: { id: user.id, type: 'user' },
target: { id: org.id.toString(), type: 'org' },
orgId: org.id,
metadata: { source },
}).catch(() => {});
⚠️ Potential issue | 🟠 Major

Validate and normalize X-Sourcebot-Client-Source before storing it.

On Line 126, source is taken directly from a request header and persisted to audit metadata. This permits arbitrary/unbounded values, which can skew analytics dimensions and risk logging unexpected content.

Suggested hardening
 if (user) {
-    const source = request.headers.get('X-Sourcebot-Client-Source') ?? undefined;
+    const rawSource = request.headers.get('X-Sourcebot-Client-Source');
+    const source =
+        rawSource && /^[a-z0-9._-]{1,64}$/i.test(rawSource)
+            ? rawSource
+            : undefined;

     getAuditService().createAudit({
         action: 'user.created_ask_chat',
         actor: { id: user.id, type: 'user' },
         target: { id: org.id.toString(), type: 'org' },
         orgId: org.id,
         metadata: { source },
     }).catch(() => {});
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/web/src/app/api/(server)/chat/blocking/route.ts` around lines 126 -
133, The request header value assigned to source is recorded verbatim into audit
metadata; validate and normalize it before calling getAuditService().createAudit
to prevent unbounded/unsafe values. In the route handler, read the header into a
temp variable, trim and strip control characters, enforce a max length (e.g. 64
chars), then either map it against a small allowlist of known client source
identifiers or collapse unknown values to a single token like "unknown" or
"other"; finally pass the sanitized value (not the raw header) into the metadata
for createAudit. Ensure the change touches the same variable used in
getAuditService().createAudit so the stored metadata is the normalized value.
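
A minimal normalization helper along the lines the comment suggests. The allowlist values here are assumptions for illustration, not Sourcebot's actual source identifiers:

```typescript
// Hypothetical sketch: trim, strip control characters, cap length, then
// collapse anything outside a small allowlist to a single "other" token.
const KNOWN_SOURCES = new Set(["sourcebot-web-client", "sourcebot-ui-codenav", "mcp"]);

function normalizeClientSource(raw: string | null): string | undefined {
    if (raw === null) return undefined;
    const cleaned = raw
        .trim()
        .replace(/[\x00-\x1f\x7f]/g, "") // drop control characters
        .slice(0, 64);                   // enforce a max length
    if (cleaned.length === 0) return undefined;
    return KNOWN_SOURCES.has(cleaned) ? cleaned : "other";
}
```

Passing the normalized value (never the raw header) into `createAudit`'s metadata keeps the stored `source` dimension bounded.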

Comment on lines +127 to +133
getAuditService().createAudit({
action: 'user.created_ask_chat',
actor: { id: user.id, type: 'user' },
target: { id: org.id.toString(), type: 'org' },
orgId: org.id,
metadata: { source },
}).catch(() => {});
⚠️ Potential issue | 🟠 Major

Don’t fully swallow audit-write failures.

On Line 133, .catch(() => {}) hides audit data-loss incidents. Keep it non-blocking, but log failure context.

Suggested fix
-                }).catch(() => {});
+                }).catch((error) => {
+                    logger.warn('Failed to write audit event user.created_ask_chat', {
+                        orgId: org.id,
+                        userId: user.id,
+                        error: error instanceof Error ? error.message : String(error),
+                    });
+                });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/web/src/app/api/(server)/chat/blocking/route.ts` around lines 127 -
133, The current getAuditService().createAudit(...).catch(() => {}) swallows
errors; make the call non-blocking but log failures with context instead of
ignoring them: replace the empty catch with a handler that logs the error and
relevant audit context (action, actor.id/orgId/source) using the existing logger
(e.g., processLogger or logger) or console.error if none is available, so
getAuditService().createAudit(...) remains non-blocking but audit-write failures
are recorded for debugging/alerting.

Comment on lines +409 to +411
<p className="text-xs text-muted-foreground/70">
Data since: {new Date(analyticsResponse.oldestRecordDate).toLocaleDateString("en-US", { month: "short", day: "numeric", year: "numeric" })}
</p>
⚠️ Potential issue | 🟡 Minor

Pin the "Data since" display to a fixed timezone to avoid a day-shift in some locales.

Line 410 can render an off-by-one date depending on client timezone. Use a fixed timezone (UTC) for stable display.

🗓️ Proposed fix
-                                Data since: {new Date(analyticsResponse.oldestRecordDate).toLocaleDateString("en-US", { month: "short", day: "numeric", year: "numeric" })}
+                                Data since: {new Date(analyticsResponse.oldestRecordDate).toLocaleDateString("en-US", {
+                                    month: "short",
+                                    day: "numeric",
+                                    year: "numeric",
+                                    timeZone: "UTC",
+                                })}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/web/src/ee/features/analytics/analyticsContent.tsx` around lines 409
- 411, The "Data since" date uses new
Date(analyticsResponse.oldestRecordDate).toLocaleDateString(...) which can shift
by a day in some client timezones; update that call to produce a timezone-stable
string by specifying UTC in the formatting options (add timeZone: "UTC") or by
formatting using getUTCFullYear()/getUTCMonth()/getUTCDate() from the Date
constructed with analyticsResponse.oldestRecordDate so the JSX p element always
shows the UTC date without client-local day shifts.
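
The UTC-pinned formatting is easy to check in isolation. A date-only ISO string like `"2026-02-26"` parses as UTC midnight, so a client behind UTC would render the previous day under local formatting; pinning `timeZone: "UTC"` makes the output stable everywhere (the wrapper function is illustrative, not the component's actual code):

```typescript
// Formats an oldest-record date for display without client-local day shifts.
function formatDataSince(oldestRecordDate: string): string {
    return new Date(oldestRecordDate).toLocaleDateString("en-US", {
        month: "short",
        day: "numeric",
        year: "numeric",
        timeZone: "UTC",
    });
}
```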

Add a migration that backfills the 'source' field in audit metadata for
historical events created before source tracking was introduced. All
old events were web-only, so code searches and chats get
'sourcebot-web-client' and navigations get 'sourcebot-ui-codenav'.

Also restore the original inject-audit-data script and add
inject-audit-data-v2 with source-aware mock data generation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (3)
packages/db/prisma/migrations/20260226000000_backfill_audit_source_metadata/migration.sql (1)

1-19: LGTM – Migration is correct and idempotent.

The jsonb_set with COALESCE pattern correctly handles both NULL metadata and existing metadata objects. The WHERE clause ensures the migration is idempotent and won't overwrite future records that already have a source value.

One operational consideration: if the Audit table is very large (millions of rows), these unbatched UPDATE statements could cause prolonged row locks and transaction log growth during deployment. For production tables with high volume, consider chunked updates if rollout time becomes an issue.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/db/prisma/migrations/20260226000000_backfill_audit_source_metadata/migration.sql`
around lines 1 - 19, Migration is correct but may lock millions of rows; change
the two full-table UPDATEs on "Audit" (the UPDATE "Audit" ... SET metadata =
jsonb_set(... ) WHERE action IN (...) AND (metadata IS NULL OR
metadata->>'source' IS NULL)) into batched updates: iterate updating a
fixed-sized chunk (e.g., by primary key/id range or LIMIT with ORDER BY id) and
repeat until no rows affected, preserving the exact WHERE predicates and
jsonb_set expression so idempotency is unchanged; implement the batching logic
in the migration runner or as wrapped SQL loops to avoid long transactions and
large WAL growth.
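
The chunked-update control flow suggested above can be sketched with the database round-trip stubbed out. `updateChunk` is a hypothetical stand-in for one `UPDATE ... WHERE id IN (...)` statement issued by the migration runner; only the batching shape is shown:

```typescript
// Process ids in fixed-size chunks so no single transaction touches the
// whole table. Returns the number of batches issued.
function backfillInChunks<T>(
    ids: readonly T[],
    chunkSize: number,
    updateChunk: (chunk: T[]) => void,
): number {
    let batches = 0;
    for (let i = 0; i < ids.length; i += chunkSize) {
        updateChunk(ids.slice(i, i + chunkSize)); // one bounded UPDATE per chunk
        batches++;
    }
    return batches;
}
```

Keeping the same WHERE predicates inside each chunked UPDATE preserves the migration's idempotency while bounding transaction length and WAL growth.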
packages/db/tools/scripts/inject-audit-data-v2.ts (2)

146-169: Per-row inserts in nested loops will be very slow at this volume.

await prisma.audit.create(...) per record inside deep loops causes excessive round-trips. Batch inserts with createMany will make this script dramatically faster.

Proposed refactor
 async function createAudits(
   userId: string,
   action: string,
   count: number,
   currentDate: Date,
   isWeekend: boolean,
   targetType: string,
   metadata?: Prisma.InputJsonValue,
 ) {
-  for (let i = 0; i < count; i++) {
-    await prisma.audit.create({
-      data: {
-        timestamp: randomTimestamp(currentDate, isWeekend),
-        action,
-        actorId: userId,
-        actorType: 'user',
-        targetId: `${targetType}_${Math.floor(Math.random() * 1000)}`,
-        targetType,
-        sourcebotVersion: '1.0.0',
-        orgId,
-        ...(metadata ? { metadata } : {}),
-      }
-    });
-  }
+  if (count <= 0) return;
+  const data = Array.from({ length: count }, () => ({
+    timestamp: randomTimestamp(currentDate, isWeekend),
+    action,
+    actorId: userId,
+    actorType: 'user',
+    targetId: `${targetType}_${Math.floor(Math.random() * 1000)}`,
+    targetType,
+    sourcebotVersion: '1.0.0',
+    orgId,
+    ...(metadata ? { metadata } : {}),
+  }));
+  await prisma.audit.createMany({ data });
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/db/tools/scripts/inject-audit-data-v2.ts` around lines 146 - 169,
The current createAudits function is doing per-row writes using
prisma.audit.create inside a loop (randomTimestamp, action, actorId, targetId,
etc.), which will be slow; change it to accumulate audit objects into an array
and call prisma.audit.createMany with those records (chunking into reasonable
sizes if count is large) instead of awaiting each create; ensure you preserve
fields used now (timestamp from randomTimestamp(currentDate, isWeekend), action,
actorId=userId, actorType='user', targetId, targetType, sourcebotVersion, orgId
and optional metadata) and handle any JSON/Prisma typing for metadata and
conflicts/options needed by createMany.

3-3: Avoid duplicate confirmation prompts.

Line [124] prompts again even though packages/db/tools/scriptRunner.ts already confirms before execution. This adds friction for manual runs and tooling automation.

Proposed simplification
-import { confirmAction } from "../utils";
@@
-        confirmAction();

Also applies to: 124-124

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/db/tools/scripts/inject-audit-data-v2.ts` at line 3, The duplicate
interactive check comes from calling confirmAction inside
packages/db/tools/scripts/inject-audit-data-v2.ts (around the main execution
block), but scriptRunner already performs confirmation; remove the confirmAction
call and its surrounding prompt/early-exit logic from inject-audit-data-v2.ts
and delete the now-unused import of confirmAction so the script relies on
packages/db/tools/scriptRunner.ts for confirmation instead.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/db/tools/scripts/inject-audit-data-v2.ts`:
- Around line 110-114: The 90-day window is off-by-one because startDate is set
with startDate.setDate(startDate.getDate() - 90) while the subsequent inclusive
date loop (iterating from startDate through endDate) counts both endpoints;
update the logic so exactly 90 days are generated—either change startDate to
startDate.setDate(startDate.getDate() - 89) or keep startDate as-is and make the
loop exclusive of endDate (use < endDate rather than <=). Reference the
variables endDate, startDate and the inclusive date iteration loop to apply the
fix.
- Around line 277-292: The current summary pulls all audits via
prisma.audit.findMany into allAudits and treats unknown/null metadata as "api",
which is expensive and misclassifies data; instead replace the full fetch and
in-memory loop (allAudits, webCount, mcpCount, apiCount, metadata, source) with
targeted DB counts: run separate prisma.count queries (or a grouped aggregation)
that filter on orgId and the JSON metadata->source conditions (e.g. source
startsWith 'sourcebot-' for web, source == 'mcp' for mcp) and explicitly
exclude/null-check metadata for the API bucket (or add an "unknown" bucket) so
null/undefined metadata are not counted as API; ensure the code updates the
variables webCount, mcpCount and apiCount (or adds unknownCount) from these
query results rather than iterating allAudits in memory.
- Around line 1-2: Replace the current value imports with type-only imports:
change the imports for Script, PrismaClient, and Prisma to use "import type" and
update the source for PrismaClient/Prisma to "@sourcebot/db" (matching the
pattern in scriptRunner.ts) so the file imports only types at runtime;
specifically update the import of Script from "../scriptRunner" to "import type
{ Script } from '../scriptRunner'" and update the Prisma imports to "import type
{ PrismaClient, Prisma } from '@sourcebot/db'".

---

Nitpick comments:
In
`@packages/db/prisma/migrations/20260226000000_backfill_audit_source_metadata/migration.sql`:
- Around line 1-19: Migration is correct but may lock millions of rows; change
the two full-table UPDATEs on "Audit" (the UPDATE "Audit" ... SET metadata =
jsonb_set(... ) WHERE action IN (...) AND (metadata IS NULL OR
metadata->>'source' IS NULL)) into batched updates: iterate updating a
fixed-sized chunk (e.g., by primary key/id range or LIMIT with ORDER BY id) and
repeat until no rows affected, preserving the exact WHERE predicates and
jsonb_set expression so idempotency is unchanged; implement the batching logic
in the migration runner or as wrapped SQL loops to avoid long transactions and
large WAL growth.

In `@packages/db/tools/scripts/inject-audit-data-v2.ts`:
- Around line 146-169: The current createAudits function is doing per-row writes
using prisma.audit.create inside a loop (randomTimestamp, action, actorId,
targetId, etc.), which will be slow; change it to accumulate audit objects into
an array and call prisma.audit.createMany with those records (chunking into
reasonable sizes if count is large) instead of awaiting each create; ensure you
preserve fields used now (timestamp from randomTimestamp(currentDate,
isWeekend), action, actorId=userId, actorType='user', targetId, targetType,
sourcebotVersion, orgId and optional metadata) and handle any JSON/Prisma typing
for metadata and conflicts/options needed by createMany.
- Line 3: The duplicate interactive check comes from calling confirmAction
inside packages/db/tools/scripts/inject-audit-data-v2.ts (around the main
execution block), but scriptRunner already performs confirmation; remove the
confirmAction call and its surrounding prompt/early-exit logic from
inject-audit-data-v2.ts and delete the now-unused import of confirmAction so the
script relies on packages/db/tools/scriptRunner.ts for confirmation instead.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f9dea29 and ab8b86f.

📒 Files selected for processing (3)
  • packages/db/prisma/migrations/20260226000000_backfill_audit_source_metadata/migration.sql
  • packages/db/tools/scriptRunner.ts
  • packages/db/tools/scripts/inject-audit-data-v2.ts

Comment on lines +1 to +2
import { Script } from "../scriptRunner";
import { PrismaClient, Prisma } from "../../dist";
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# 1. First, let's see the full content of the file to understand how Script, PrismaClient, and Prisma are used
cat -n packages/db/tools/scripts/inject-audit-data-v2.ts | head -50

Repository: sourcebot-dev/sourcebot

Length of output: 2297


🏁 Script executed:

# 2. Check TypeScript configuration that affects import elision
find . -name 'tsconfig*.json' -type f | head -10 | xargs -I {} sh -c 'echo "=== {} ===" && cat {}'

Repository: sourcebot-dev/sourcebot

Length of output: 5268


🏁 Script executed:

# 3. Check if `@sourcebot/db` package exists and what it exports
find . -name 'package.json' -type f -path '*/@sourcebot/db*' -o -name 'package.json' -type f -path '*/packages/db/*' | head -5 | xargs -I {} sh -c 'echo "=== {} ===" && cat {} | jq -r ".name, .exports" 2>/dev/null || cat {}'

Repository: sourcebot-dev/sourcebot

Length of output: 120


🏁 Script executed:

# 4. Check current import patterns in the file and surrounding context
rg -n 'import.*Script.*scriptRunner|import.*PrismaClient|import.*Prisma' packages/db/tools/scripts/inject-audit-data-v2.ts -A 2

Repository: sourcebot-dev/sourcebot

Length of output: 210


🏁 Script executed:

# 1. Check if Prisma type is used in the entire file
cat packages/db/tools/scripts/inject-audit-data-v2.ts | grep -n "Prisma" | head -20

Repository: sourcebot-dev/sourcebot

Length of output: 584


🏁 Script executed:

# 2. Check the scriptRunner to see if it has side effects
cat packages/db/tools/scriptRunner.ts | head -100

Repository: sourcebot-dev/sourcebot

Length of output: 1738


🏁 Script executed:

# 3. Check packages/db package.json for exports configuration
cat packages/db/package.json | jq '.exports, .main, .types' 2>/dev/null || cat packages/db/package.json

Repository: sourcebot-dev/sourcebot

Length of output: 92


🏁 Script executed:

# 4. Verify the dist directory structure from tools/scripts perspective
ls -la packages/db/dist/ 2>/dev/null | head -20

Repository: sourcebot-dev/sourcebot

Length of output: 49


🏁 Script executed:

# 5. Check if there's a tsconfig for tools/scripts or if it inherits from packages/db
find packages/db/tools -name 'tsconfig.json' -type f

Repository: sourcebot-dev/sourcebot

Length of output: 49


🏁 Script executed:

# Check what packages/db exports - look for the main entry point
find packages/db/src -name 'index.ts' -o -name 'index.d.ts' | xargs cat 2>/dev/null | head -50

Repository: sourcebot-dev/sourcebot

Length of output: 214


🏁 Script executed:

# Also check the .d.ts files to see exported types
find packages/db -name '*.d.ts' -type f | head -5 | xargs -I {} sh -c 'echo "=== {} ===" && head -30 {}'

Repository: sourcebot-dev/sourcebot

Length of output: 49


Use import type for type-only imports.

At lines 1-2, all three symbols (Script, PrismaClient, Prisma) are used only in type positions. Import them using import type to prevent unnecessary runtime coupling. Additionally, import from @sourcebot/db instead of ../../dist — this aligns with the pattern in scriptRunner.ts and is more resilient across build environments.

Proposed fix
-import { Script } from "../scriptRunner";
-import { PrismaClient, Prisma } from "../../dist";
+import type { Script } from "../scriptRunner";
+import type { PrismaClient, Prisma } from "@sourcebot/db";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/db/tools/scripts/inject-audit-data-v2.ts` around lines 1 - 2,
Replace the current value imports with type-only imports: change the imports for
Script, PrismaClient, and Prisma to use "import type" and update the source for
PrismaClient/Prisma to "@sourcebot/db" (matching the pattern in scriptRunner.ts)
so the file imports only types at runtime; specifically update the import of
Script from "../scriptRunner" to "import type { Script } from '../scriptRunner'"
and update the Prisma imports to "import type { PrismaClient, Prisma } from
'@sourcebot/db'".

Comment on lines +110 to +114
// Generate data for the last 90 days
const endDate = new Date();
const startDate = new Date();
startDate.setDate(startDate.getDate() - 90);

⚠️ Potential issue | 🟡 Minor

“Last 90 days” window is off by one day.

At Line [113] with an inclusive loop at Line [173], this generates 91 days.

Proposed fix
 const endDate = new Date();
-const startDate = new Date();
-startDate.setDate(startDate.getDate() - 90);
+const startDate = new Date(endDate);
+startDate.setDate(startDate.getDate() - 89);

Also applies to: 173-173

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/db/tools/scripts/inject-audit-data-v2.ts` around lines 110 - 114,
The 90-day window is off-by-one because startDate is set with
startDate.setDate(startDate.getDate() - 90) while the subsequent inclusive date
loop (iterating from startDate through endDate) counts both endpoints; update
the logic so exactly 90 days are generated—either change startDate to
startDate.setDate(startDate.getDate() - 89) or keep startDate as-is and make the
loop exclusive of endDate (use < endDate rather than <=). Reference the
variables endDate, startDate and the inclusive date iteration loop to apply the
fix.
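
The off-by-one is easy to verify in isolation. This sketch (a hypothetical helper, not code from the script) counts the iterations of the same inclusive day-by-day loop pattern:

```typescript
// Counts how many iterations an inclusive day-by-day loop performs when the
// start date is set `daysBack` days before the end date, mirroring the
// pattern in inject-audit-data-v2.ts.
function countLoopDays(daysBack: number): number {
    const endDate = new Date("2026-02-27T00:00:00Z");
    const startDate = new Date(endDate);
    startDate.setDate(startDate.getDate() - daysBack);
    let iterations = 0;
    // Inclusive loop (`d <= endDate`) counts both endpoints.
    for (const d = new Date(startDate); d <= endDate; d.setDate(d.getDate() + 1)) {
        iterations++;
    }
    return iterations;
}

console.log(countLoopDays(90)); // 91 — one more than intended
console.log(countLoopDays(89)); // 90 — the intended window
```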

Comment on lines +277 to +292
const allAudits = await prisma.audit.findMany({
where: { orgId },
select: { metadata: true }
});

let webCount = 0, mcpCount = 0, apiCount = 0;
for (const audit of allAudits) {
const meta = audit.metadata as Record<string, unknown> | null;
const source = meta?.source as string | undefined;
if (source && typeof source === 'string' && source.startsWith('sourcebot-')) {
webCount++;
} else if (source === 'mcp') {
mcpCount++;
} else {
apiCount++;
}

⚠️ Potential issue | 🟠 Major

Source summary logic can over-read and misclassify records.

Lines [277]-[280] load every audit record for the org just to compute a summary, which can be expensive, and in the fallback at Lines [290]-[292] any record with unknown or null metadata is counted as an API request, which skews the output.

Proposed fix
 const allAudits = await prisma.audit.findMany({
-  where: { orgId },
+  where: {
+    orgId,
+    timestamp: { gte: startDate, lte: endDate },
+  },
   select: { metadata: true }
 });
 
-let webCount = 0, mcpCount = 0, apiCount = 0;
+let webCount = 0, mcpCount = 0, apiCount = 0, unknownCount = 0;
 for (const audit of allAudits) {
   const meta = audit.metadata as Record<string, unknown> | null;
   const source = meta?.source as string | undefined;
   if (source && typeof source === 'string' && source.startsWith('sourcebot-')) {
     webCount++;
   } else if (source === 'mcp') {
     mcpCount++;
+  } else if (source === 'cli' || source === 'sdk' || source === 'custom-app') {
+    apiCount++;
   } else {
-    apiCount++;
+    unknownCount++;
   }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/db/tools/scripts/inject-audit-data-v2.ts` around lines 277 - 292,
The current summary pulls all audits via prisma.audit.findMany into allAudits
and treats unknown/null metadata as "api", which is expensive and misclassifies
data; instead replace the full fetch and in-memory loop (allAudits, webCount,
mcpCount, apiCount, metadata, source) with targeted DB counts: run separate
prisma.count queries (or a grouped aggregation) that filter on orgId and the
JSON metadata->source conditions (e.g. source startsWith 'sourcebot-' for web,
source == 'mcp' for mcp) and explicitly exclude/null-check metadata for the API
bucket (or add an "unknown" bucket) so null/undefined metadata are not counted
as API; ensure the code updates the variables webCount, mcpCount and apiCount
(or adds unknownCount) from these query results rather than iterating allAudits
in memory.
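
Even if the counts do stay in memory, the bucketing can be pulled into a small pure function with an explicit unknown bucket. A sketch assuming the source values from the suggested diff ('cli', 'sdk', and 'custom-app' are illustrative, not confirmed by the PR):

```typescript
type SourceBucket = "web" | "mcp" | "api" | "unknown";

// Buckets an audit record by its metadata.source value. Records with
// missing or non-string metadata.source fall into "unknown" rather than
// being silently counted as API traffic.
function classifySource(metadata: Record<string, unknown> | null): SourceBucket {
    const source = metadata?.source;
    if (typeof source !== "string") return "unknown";
    if (source.startsWith("sourcebot-")) return "web";
    if (source === "mcp") return "mcp";
    // Known non-MCP API sources are assumptions for illustration.
    if (source === "cli" || source === "sdk" || source === "custom-app") return "api";
    return "unknown";
}

console.log(classifySource({ source: "sourcebot-web" })); // "web"
console.log(classifySource(null));                        // "unknown"
```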
