3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added
- Added audit log retention policy with `SOURCEBOT_EE_AUDIT_RETENTION_DAYS` environment variable (default 180 days). A daily background job prunes audit records older than the retention period. [#950](https://github.com/sourcebot-dev/sourcebot/pull/950)

### Fixed
- Fixed search query parser rejecting parenthesized regex alternation in filter values (e.g. `file:(test|spec)`, `-file:(test|spec)`). [#946](https://github.com/sourcebot-dev/sourcebot/pull/946)
- Fixed `content:` filter ignoring the regex toggle. [#947](https://github.com/sourcebot-dev/sourcebot/pull/947)
49 changes: 30 additions & 19 deletions docs/docs/configuration/audit-logs.mdx
@@ -15,6 +15,9 @@ This feature gives security and compliance teams the necessary information to en
## Enabling/Disabling Audit Logs
Audit logs are enabled by default and can be controlled with the `SOURCEBOT_EE_AUDIT_LOGGING_ENABLED` [environment variable](/docs/configuration/environment-variables).

## Retention Policy
By default, audit logs older than 180 days are automatically pruned daily. You can configure the retention period using the `SOURCEBOT_EE_AUDIT_RETENTION_DAYS` [environment variable](/docs/configuration/environment-variables). Set it to `0` to disable automatic pruning and retain logs indefinitely.
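
As a rough illustration of the retention rule described above, the sketch below shows how a retention period in days could map to a cutoff timestamp, with `0` meaning that nothing is pruned. The helper name `retentionCutoff` is hypothetical and is not part of Sourcebot; it only mirrors the documented behavior.

```typescript
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper for illustration only (not a Sourcebot API).
// Returns the timestamp before which audit records become eligible for pruning,
// or null when retentionDays is 0 or less, i.e. pruning is disabled.
function retentionCutoff(retentionDays: number, now: Date): Date | null {
    if (retentionDays <= 0) {
        return null;
    }
    return new Date(now.getTime() - retentionDays * ONE_DAY_MS);
}

const now = new Date("2025-07-01T00:00:00.000Z");
console.log(retentionCutoff(180, now)?.toISOString()); // 2025-01-02T00:00:00.000Z
console.log(retentionCutoff(0, now));                  // null
```

With the default of 180 days, a record is pruned only once its timestamp falls before the computed cutoff.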

## Fetching Audit Logs
Audit logs are stored in the [postgres database](/docs/overview#architecture) connected to Sourcebot. To fetch all of the audit logs, you can use the following API:

@@ -110,30 +113,37 @@ curl --request GET '$SOURCEBOT_URL/api/ee/audit' \

| Action | Actor Type | Target Type |
| :------- | :------ | :------|
| `api_key.creation_failed` | `user` | `org` |
| `api_key.created` | `user` | `api_key` |
| `api_key.deletion_failed` | `user` | `org` |
| `api_key.creation_failed` | `user` | `org` |
| `api_key.deleted` | `user` | `api_key` |
| `api_key.deletion_failed` | `user` | `org` |
| `audit.fetch` | `user` | `org` |
| `chat.deleted` | `user` | `chat` |
| `chat.shared_with_users` | `user` | `chat` |
| `chat.unshared_with_user` | `user` | `chat` |
| `chat.visibility_updated` | `user` | `chat` |
| `org.ownership_transfer_failed` | `user` | `org` |
| `org.ownership_transferred` | `user` | `org` |
| `user.created_ask_chat` | `user` | `org` |
| `user.creation_failed` | `user` | `user` |
| `user.owner_created` | `user` | `org` |
| `user.performed_code_search` | `user` | `org` |
| `user.performed_find_references` | `user` | `org` |
| `user.performed_goto_definition` | `user` | `org` |
| `user.created_ask_chat` | `user` | `org` |
| `user.jit_provisioning_failed` | `user` | `org` |
| `user.jit_provisioned` | `user` | `org` |
| `user.join_request_creation_failed` | `user` | `org` |
| `user.join_requested` | `user` | `org` |
| `user.join_request_approve_failed` | `user` | `account_join_request` |
| `user.join_request_approved` | `user` | `account_join_request` |
| `user.invite_failed` | `user` | `org` |
| `user.invites_created` | `user` | `org` |
| `user.delete` | `user` | `user` |
| `user.fetched_file_source` | `user` | `org` |
| `user.fetched_file_tree` | `user` | `org` |
| `user.invite_accept_failed` | `user` | `invite` |
| `user.invite_accepted` | `user` | `invite` |
| `user.invite_failed` | `user` | `org` |
| `user.invites_created` | `user` | `org` |
| `user.join_request_approve_failed` | `user` | `account_join_request` |
| `user.join_request_approved` | `user` | `account_join_request` |
| `user.list` | `user` | `org` |
| `user.listed_repos` | `user` | `org` |
| `user.owner_created` | `user` | `org` |
| `user.performed_code_search` | `user` | `org` |
| `user.performed_find_references` | `user` | `org` |
| `user.performed_goto_definition` | `user` | `org` |
| `user.read` | `user` | `user` |
| `user.signed_in` | `user` | `user` |
| `user.signed_out` | `user` | `user` |
| `org.ownership_transfer_failed` | `user` | `org` |
| `org.ownership_transferred` | `user` | `org` |


## Response schema
@@ -180,7 +190,7 @@ curl --request GET '$SOURCEBOT_URL/api/ee/audit' \
},
"targetType": {
"type": "string",
"enum": ["user", "org", "file", "api_key", "account_join_request", "invite"]
"enum": ["user", "org", "file", "api_key", "account_join_request", "invite", "chat"]
},
"sourcebotVersion": {
"type": "string"
@@ -192,7 +202,8 @@ curl --request GET '$SOURCEBOT_URL/api/ee/audit' \
"properties": {
"message": { "type": "string" },
"api_key": { "type": "string" },
"emails": { "type": "string" }
"emails": { "type": "string" },
"source": { "type": "string" }
},
"additionalProperties": false
},
1 change: 1 addition & 0 deletions docs/docs/configuration/environment-variables.mdx
@@ -42,6 +42,7 @@ The following environment variables allow you to configure your Sourcebot deploy
| `HTTPS_PROXY` | - | <p>HTTPS proxy URL for routing SSL requests through a proxy server (e.g., `http://proxy.company.com:8080`). Requires `NODE_USE_ENV_PROXY=1`.</p> |
| `NO_PROXY` | - | <p>Comma-separated list of hostnames or domains that should bypass the proxy (e.g., `localhost,127.0.0.1,.internal.domain`). Requires `NODE_USE_ENV_PROXY=1`.</p> |
| `SOURCEBOT_EE_AUDIT_LOGGING_ENABLED` | `true` | <p>Enables/disables audit logging</p> |
| `SOURCEBOT_EE_AUDIT_RETENTION_DAYS` | `180` | <p>The number of days to retain audit logs. Audit log records older than this will be automatically pruned daily. Set to `0` to disable pruning and retain logs indefinitely.</p> |
| `AUTH_EE_GCP_IAP_ENABLED` | `false` | <p>When enabled, allows Sourcebot to automatically register/login from a successful GCP IAP redirect</p> |
| `AUTH_EE_GCP_IAP_AUDIENCE` | - | <p>The GCP IAP audience to use when verifying JWT tokens. Must be set to enable GCP IAP JIT provisioning</p> |
| `EXPERIMENT_EE_PERMISSION_SYNC_ENABLED` | `false` | <p>Enables [permission syncing](/docs/features/permission-syncing).</p> |
28 changes: 28 additions & 0 deletions docs/docs/deployment/sizing-guide.mdx
@@ -45,6 +45,34 @@ If your instance is resource-constrained, you can reduce the concurrency of back

Lowering these values reduces peak resource usage at the cost of slower initial indexing.

## Audit log storage

<Info>
Audit logging is an enterprise feature and is only available with an [enterprise license](/docs/overview#license-key). If you are not on an enterprise plan, audit logs are not stored and this section does not apply.
</Info>

[Audit logs](/docs/configuration/audit-logs) are stored in the Postgres database connected to your Sourcebot deployment. Each audit record captures the action performed, the actor, the target, a timestamp, and optional metadata (e.g., request source). There are three database indexes on the audit table to support analytics and lookup queries.

**Estimated storage per audit event: ~350 bytes** (including row data and indexes).

<Info>
The table below assumes 50 events per user per day. The actual number depends on usage patterns — each user action (code search, file view, navigation, Ask chat, etc.) creates one audit event. Users who interact via [MCP](/docs/features/mcp-server) or the [API](/docs/api-reference/search) tend to generate significantly more events than web-only users, so your real usage may vary.
</Info>

| Team size | Avg events / user / day | Daily events | Monthly storage | 6-month storage |
|---|---|---|---|---|
| 10 users | 50 | 500 | ~5 MB | ~30 MB |
| 50 users | 50 | 2,500 | ~25 MB | ~150 MB |
| 100 users | 50 | 5,000 | ~50 MB | ~300 MB |
| 500 users | 50 | 25,000 | ~250 MB | ~1.5 GB |
| 1,000 users | 50 | 50,000 | ~500 MB | ~3 GB |

### Retention policy

By default, audit logs older than **180 days** are automatically pruned daily by a background job. You can adjust this with the `SOURCEBOT_EE_AUDIT_RETENTION_DAYS` [environment variable](/docs/configuration/environment-variables). Set it to `0` to disable pruning and retain logs indefinitely.

For most deployments, the default 180-day retention keeps database size manageable. If you have a large team with heavy MCP/API usage and need longer retention, plan your Postgres disk allocation accordingly using the estimates above.
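
To adapt the estimates above to your own numbers, the arithmetic can be sketched as follows. This is a back-of-the-envelope calculation, assuming the ~350 bytes per event figure from this section; the function name `estimateAuditStorageBytes` is hypothetical and not part of Sourcebot.

```typescript
// Assumption taken from the estimate in this section: ~350 bytes per audit event,
// including row data and indexes.
const BYTES_PER_EVENT = 350;

// Hypothetical helper: total bytes stored for a team over a retention window.
function estimateAuditStorageBytes(
    users: number,
    eventsPerUserPerDay: number,
    days: number,
    bytesPerEvent: number = BYTES_PER_EVENT,
): number {
    return users * eventsPerUserPerDay * days * bytesPerEvent;
}

// 10 users, 50 events/user/day, 30 days (the table rounds this to ~5 MB)
console.log(estimateAuditStorageBytes(10, 50, 30));     // 5250000
// 1,000 users, 50 events/user/day, 180 days (the table rounds this to ~3 GB)
console.log(estimateAuditStorageBytes(1000, 50, 180));  // 3150000000
```

If your users generate more than 50 events per day (for example through heavy MCP or API usage), pass the higher rate as `eventsPerUserPerDay` to see how the total scales.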

## Monitoring

We recommend monitoring the following metrics after deployment to validate your sizing:
71 changes: 71 additions & 0 deletions packages/backend/src/ee/auditLogPruner.ts
@@ -0,0 +1,71 @@
import { PrismaClient } from "@sourcebot/db";
import { createLogger, env } from "@sourcebot/shared";
import { setIntervalAsync } from "../utils.js";

const BATCH_SIZE = 10_000;
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

const logger = createLogger('audit-log-pruner');

export class AuditLogPruner {
    private interval?: NodeJS.Timeout;

    constructor(private db: PrismaClient) {}

    startScheduler() {
        if (env.SOURCEBOT_EE_AUDIT_LOGGING_ENABLED !== 'true') {
            logger.info('Audit logging is disabled, skipping audit log pruner.');
            return;
        }

        if (env.SOURCEBOT_EE_AUDIT_RETENTION_DAYS <= 0) {
            logger.info('SOURCEBOT_EE_AUDIT_RETENTION_DAYS is 0 or less, audit log pruning is disabled.');
            return;
        }

        logger.info(`Audit log pruner started. Retaining logs for ${env.SOURCEBOT_EE_AUDIT_RETENTION_DAYS} days.`);

        // Run immediately on startup (logging any failure instead of leaving the promise unhandled), then every 24 hours
        this.pruneOldAuditLogs().catch((error) => logger.error(`Initial audit log prune failed: ${error}`));
        this.interval = setIntervalAsync(() => this.pruneOldAuditLogs(), ONE_DAY_MS);
    }

    async dispose() {
        if (this.interval) {
            clearInterval(this.interval);
            this.interval = undefined;
        }
    }

    private async pruneOldAuditLogs() {
        const cutoff = new Date(Date.now() - env.SOURCEBOT_EE_AUDIT_RETENTION_DAYS * ONE_DAY_MS);
        let totalDeleted = 0;

        logger.info(`Pruning audit logs older than ${cutoff.toISOString()}...`);

        // Delete in batches to avoid long-running transactions
        while (true) {
            const batch = await this.db.audit.findMany({
                where: { timestamp: { lt: cutoff } },
                select: { id: true },
                take: BATCH_SIZE,
            });

            if (batch.length === 0) break;

            const result = await this.db.audit.deleteMany({
                where: { id: { in: batch.map(r => r.id) } },
            });

            totalDeleted += result.count;

            if (batch.length < BATCH_SIZE) break;
        }

        if (totalDeleted > 0) {
            logger.info(`Pruned ${totalDeleted} audit log records.`);
        } else {
            logger.info('No audit log records to prune.');
        }
    }
}
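
The batching loop in `pruneOldAuditLogs` can be exercised without a database. The sketch below is an illustration only: `AuditStore`, `InMemoryAuditStore`, and `pruneInBatches` are hypothetical names standing in for the Prisma calls, and the in-memory store exists purely to show the loop's termination conditions.

```typescript
interface AuditRecord {
    id: string;
    timestamp: Date;
}

// Hypothetical storage interface standing in for the Prisma client.
interface AuditStore {
    findIdsOlderThan(cutoff: Date, take: number): Promise<string[]>;
    deleteByIds(ids: string[]): Promise<number>;
}

// In-memory stand-in used only for this illustration.
class InMemoryAuditStore implements AuditStore {
    records: AuditRecord[];

    constructor(records: AuditRecord[]) {
        this.records = records;
    }

    async findIdsOlderThan(cutoff: Date, take: number): Promise<string[]> {
        return this.records
            .filter((r) => r.timestamp.getTime() < cutoff.getTime())
            .slice(0, take)
            .map((r) => r.id);
    }

    async deleteByIds(ids: string[]): Promise<number> {
        const toDelete = new Set(ids);
        const before = this.records.length;
        this.records = this.records.filter((r) => !toDelete.has(r.id));
        return before - this.records.length;
    }
}

// Same shape as the loop above: select a bounded batch of ids, delete them,
// and stop once a batch comes back empty or short.
async function pruneInBatches(store: AuditStore, cutoff: Date, batchSize: number): Promise<number> {
    let totalDeleted = 0;
    while (true) {
        const ids = await store.findIdsOlderThan(cutoff, batchSize);
        if (ids.length === 0) break;
        totalDeleted += await store.deleteByIds(ids);
        // A short batch means no more records older than the cutoff remain.
        if (ids.length < batchSize) break;
    }
    return totalDeleted;
}
```

Selecting ids first and deleting by id keeps each delete bounded to at most `batchSize` rows, which is the stated goal of avoiding long-running transactions.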
4 changes: 4 additions & 0 deletions packages/backend/src/index.ts
@@ -12,6 +12,7 @@ import { ConfigManager } from "./configManager.js";
import { ConnectionManager } from './connectionManager.js';
import { INDEX_CACHE_DIR, REPOS_CACHE_DIR, SHUTDOWN_SIGNALS } from './constants.js';
import { AccountPermissionSyncer } from "./ee/accountPermissionSyncer.js";
import { AuditLogPruner } from "./ee/auditLogPruner.js";
import { GithubAppManager } from "./ee/githubAppManager.js";
import { RepoPermissionSyncer } from './ee/repoPermissionSyncer.js';
import { shutdownPosthog } from "./posthog.js";
@@ -64,9 +65,11 @@ const repoPermissionSyncer = new RepoPermissionSyncer(prisma, settings, redis);
const accountPermissionSyncer = new AccountPermissionSyncer(prisma, settings, redis);
const repoIndexManager = new RepoIndexManager(prisma, settings, redis, promClient);
const configManager = new ConfigManager(prisma, connectionManager, env.CONFIG_PATH);
const auditLogPruner = new AuditLogPruner(prisma);

connectionManager.startScheduler();
repoIndexManager.startScheduler();
auditLogPruner.startScheduler();

if (env.EXPERIMENT_EE_PERMISSION_SYNC_ENABLED === 'true' && !hasEntitlement('permission-syncing')) {
logger.error('Permission syncing is not supported in current plan. Please contact team@sourcebot.dev for assistance.');
@@ -105,6 +108,7 @@ const listenToShutdownSignals = () => {
await connectionManager.dispose()
await repoPermissionSyncer.dispose()
await accountPermissionSyncer.dispose()
await auditLogPruner.dispose()
await configManager.dispose()

await prisma.$disconnect();
@@ -0,0 +1,19 @@
-- Backfill source metadata for historical audit events.
--
-- Before this change, all audit events were created from the web UI without
-- a 'source' field in metadata. The new analytics dashboard segments events
-- by source (sourcebot-*, mcp, or null/other for API). Without this backfill,
-- historical web UI events would be misclassified as API traffic.

-- Code searches and chat creation were web-only (no server-side audit existed)
UPDATE "Audit"
SET metadata = jsonb_set(COALESCE(metadata, '{}')::jsonb, '{source}', '"sourcebot-web-client"')
WHERE action IN ('user.performed_code_search', 'user.created_ask_chat')
AND (metadata IS NULL OR metadata->>'source' IS NULL);

-- Navigation events (find references, goto definition) were web-only
-- (created from the symbolHoverPopup client component)
UPDATE "Audit"
SET metadata = jsonb_set(COALESCE(metadata, '{}')::jsonb, '{source}', '"sourcebot-ui-codenav"')
WHERE action IN ('user.performed_find_references', 'user.performed_goto_definition')
AND (metadata IS NULL OR metadata->>'source' IS NULL);
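
For readers less familiar with `jsonb_set`, the effect of the two `UPDATE` statements can be sketched in TypeScript on plain objects. This is only an illustration of the rule (set `source` for the listed actions when it is missing, leave everything else untouched); `AuditRow` and `backfillSource` are hypothetical names and not part of the migration.

```typescript
type AuditMetadata = { [key: string]: unknown } | null;

interface AuditRow {
    action: string;
    metadata: AuditMetadata;
}

// Action-to-source mapping taken from the two UPDATE statements above.
const BACKFILL_SOURCES = new Map<string, string>([
    ["user.performed_code_search", "sourcebot-web-client"],
    ["user.created_ask_chat", "sourcebot-web-client"],
    ["user.performed_find_references", "sourcebot-ui-codenav"],
    ["user.performed_goto_definition", "sourcebot-ui-codenav"],
]);

// Hypothetical helper mirroring the WHERE clause: only rows with a listed action
// and no existing (non-null) source receive a backfilled value.
function backfillSource(row: AuditRow): AuditRow {
    const source = BACKFILL_SOURCES.get(row.action);
    if (source === undefined) {
        return row;
    }
    const existing = row.metadata?.["source"];
    if (existing !== undefined && existing !== null) {
        return row;
    }
    // Equivalent in spirit to jsonb_set(COALESCE(metadata, '{}'), '{source}', ...)
    return { ...row, metadata: { ...(row.metadata ?? {}), source } };
}

console.log(backfillSource({ action: "user.performed_code_search", metadata: null }).metadata);
// { source: 'sourcebot-web-client' }
```

Rows that already carry a `source` (for example `mcp`) are returned unchanged, which matches the guard in the SQL.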
2 changes: 2 additions & 0 deletions packages/db/tools/scriptRunner.ts
@@ -2,6 +2,7 @@ import { PrismaClient } from "@sourcebot/db";
import { ArgumentParser } from "argparse";
import { migrateDuplicateConnections } from "./scripts/migrate-duplicate-connections";
import { injectAuditData } from "./scripts/inject-audit-data";
import { injectAuditDataV2 } from "./scripts/inject-audit-data-v2";
import { injectUserData } from "./scripts/inject-user-data";
import { confirmAction } from "./utils";
import { injectRepoData } from "./scripts/inject-repo-data";
@@ -14,6 +15,7 @@ export interface Script {
export const scripts: Record<string, Script> = {
"migrate-duplicate-connections": migrateDuplicateConnections,
"inject-audit-data": injectAuditData,
"inject-audit-data-v2": injectAuditDataV2,
"inject-user-data": injectUserData,
"inject-repo-data": injectRepoData,
"test-repo-query-perf": testRepoQueryPerf,