feat(search): implement FTS5 w/ sqlite for faster and better searching#6839
feat(search): implement FTS5 w/ sqlite for faster and better searching#6839perfectra1n wants to merge 48 commits intomainfrom
Conversation
feat(search): don't limit the number of blobs to put in virtual tables fix(search): improve FTS triggers to handle all SQL operations correctly The root cause of FTS index issues during import was that database triggers weren't properly handling all SQL operations, particularly upsert operations (INSERT ... ON CONFLICT ... DO UPDATE) that are commonly used during imports. Key improvements: - Fixed INSERT trigger to handle INSERT OR REPLACE operations - Updated UPDATE trigger to fire on ANY change (not just specific columns) - Improved blob triggers to use INSERT OR REPLACE for atomic updates - Added proper handling for notes created before their blobs (import scenario) - Added triggers for protection state changes - All triggers now use LEFT JOIN to handle missing blobs gracefully This ensures the FTS index stays synchronized even when: - Entity events are disabled during import - Notes are re-imported (upsert operations) - Blobs are deduplicated across notes - Notes are created before their content blobs The solution works entirely at the database level through triggers, removing the need for application-level workarounds. fix(search): consolidate FTS trigger fixes into migration 234 - Merged improved trigger logic from migration 235 into 234 - Deleted unnecessary migration 235 since DB version is still 234 - Ensures triggers handle all SQL operations (INSERT OR REPLACE, upserts) - Fixes FTS indexing for imported notes by handling missing blobs - Schema.sql and migration 234 now have identical trigger implementations
apps/server/src/assets/db/schema.sql
Outdated
| CREATE INDEX IDX_entity_changes_component | ||
| ON entity_changes (componentId, utcDateChanged DESC); |
There was a problem hiding this comment.
Can you double-check if a component index is really needed? The component ID is used on the client side to distinguish which UI element made the change to avoid accidentally updating the very same editor that the user is using.
On the server side I don't think the components are really necessary.
| // Verify SQLite version supports trigram tokenizer (requires 3.34.0+) | ||
| const sqliteVersion = sql.getValue<string>(`SELECT sqlite_version()`); | ||
| const [major, minor, patch] = sqliteVersion.split('.').map(Number); | ||
| const versionNumber = major * 10000 + minor * 100 + (patch || 0); | ||
| const requiredVersion = 3 * 10000 + 34 * 100 + 0; // 3.34.0 | ||
|
|
||
| if (versionNumber < requiredVersion) { | ||
| log.error(`SQLite version ${sqliteVersion} does not support trigram tokenizer (requires 3.34.0+)`); | ||
| log.info("Skipping FTS5 trigram migration - will use fallback search implementation"); | ||
| return; // Skip FTS5 setup, rely on fallback search | ||
| } |
There was a problem hiding this comment.
Is this really necessary at runtime? Our server uses pinned versions so as long as the version is correct, there's no need for runtime check.
| -- Drop existing FTS table if it exists (for re-running migration in dev) | ||
| DROP TABLE IF EXISTS notes_fts; |
There was a problem hiding this comment.
We don't re-run migrations in dev, so that would be unnecessary.
| -- Create FTS5 virtual table with trigram tokenizer | ||
| -- Trigram tokenizer provides language-agnostic substring matching: | ||
| -- 1. Fast substring matching (50-100x speedup for LIKE queries without wildcards) | ||
| -- 2. Case-insensitive search without custom collation | ||
| -- 3. No language-specific stemming assumptions (works for all languages) | ||
| -- 4. Boolean operators (AND, OR, NOT) and phrase matching with quotes | ||
| -- | ||
| -- IMPORTANT: Trigram requires minimum 3-character tokens for matching | ||
| -- detail='none' reduces index size by ~50% while maintaining MATCH/rank performance | ||
| -- (loses position info for highlight() function, but snippet() still works) |
There was a problem hiding this comment.
Comments could be moved out of the SQL statement and into the code to avoid embedding them at build time.
There was a problem hiding this comment.
Since migration 235 doesn't exist anymore, why not merge it with 0234 into a single migration?
| // Additional validation: ensure token doesn't contain SQL injection attempts | ||
| if (sanitized.includes(';') || sanitized.includes('--')) { | ||
| log.error(`Potential SQL injection attempt detected in token: "${token}"`); | ||
| return "__invalid_token__"; | ||
| } |
There was a problem hiding this comment.
Shouldn't we simply escape the characters instead of dismissing the search entirely? Some people might complain that their search doesn't work properly.
There was a problem hiding this comment.
The file is too big, consider splitting it into... something.
There was a problem hiding this comment.
Do we really need this performance monitoring mechanism?
apps/server/src/services/app_info.ts
Outdated
| import { AppInfo } from "@triliumnext/commons"; | ||
|
|
||
| const APP_DB_VERSION = 233; | ||
| const APP_DB_VERSION = 236; |
There was a problem hiding this comment.
Don't forget to revert the version to 234 if you join the migrations together.
apps/server/src/services/sql.ts
Outdated
| function getDbConnection(): DatabaseType { | ||
| return dbConnection; | ||
| } |
| * @returns String with LIKE wildcards escaped | ||
| */ | ||
| export function escapeLikeWildcards(str: string): string { | ||
| return str.replace(/[%_]/g, '\\$&'); |
Check failure
Code scanning / CodeQL
Incomplete string escaping or encoding High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
To fix the problem, we must ensure that any backslashes in the input string are themselves escaped (turned into \\) before any other escaping occurs, as per established best practices for escaping SQL LIKE wildcards. The correct sequence is:
- Escape backslashes (
\becomes\\). - Escape
%and_(%becomes\%,_becomes\_).
This can be accomplished by first replacing all backslashes with double backslashes, then replacing % and _ with their backslash-prefixed equivalents. This should be done in the escapeLikeWildcards function, which is at lines 88–90 in apps/server/src/services/search/fts/query_builder.ts.
No additional imports are needed, as standard JS replace suffices.
| @@ -86,7 +86,8 @@ | ||
| * @returns String with LIKE wildcards escaped | ||
| */ | ||
| export function escapeLikeWildcards(str: string): string { | ||
| return str.replace(/[%_]/g, '\\$&'); | ||
| // Escape backslash, then escape % and _ | ||
| return str.replace(/\\/g, '\\\\').replace(/[%_]/g, '\\$&'); | ||
| } | ||
|
|
||
| /** |
|
Should be ready if you want to do a run through @eliandoran :) |
|
I’m currently testing and running it; the initial indexing takes quite a while on my collection. It isn’t finished yet, but I’ll report back later with timing results and how it impacts my speed if everything runs smoothly. Optimizing future migration time is better handled as follow-up work rather than part of this PR. Users with small databases will be minimally affected, while users with large collections can investigate the causes of longer migration times if needed. Once it gets through review, it might be useful to include in a future release note a warning that migrations can take a significant amount of time for larger collections. |
|
0234_add_fts5_search.ts must be optimized and cannot be deferred to later PR. It is too slow. On my collection running the migration for a few hours didn't get me to step 2 of the 7 step migration process. Likely cause: the DB queries are not optimized for large collections yet. Also the code in 0234_add_fts5_search.ts is one large function I would prefer if it was modularized more, by breaking it down into smaller functions. Having everything in one big function makes the code human unfriendly. |
Report for commit 9246a69Outcome OutputMeasured Factors
Bugs with Justifications
Footnotes
|
|
Thanks for the great report @werererer - can you include more details on what you used to run the server? |
|
Trilium Server Version: v0.101.3 OS / Kernel: Hardware: RAM This VM has 70 services running, that might influence metrics |
This reverts commit 41f6fed.
Report Appendix: Search in Depth9246a69 (this PR) vs 9142f2d (main) I tried to get data on search, so that we get more accurate data, on how the performance changes without using unreliable reference like "perceived performance" JumpTo:
Output 1this PRmainTyping into Reference: Output 2This PRmainWhy did Search not improve? Also when I tested my rust module i analyzed the code with console.time. I identified that more than 90% of search time is actually invested here: "~/git/Trilium/apps/server/src/services/search/services/search.ts" this line: More specifically most time is spend inside the execute functions of: I measured with console.time, and smth like this was the result. (Note: speed is incredibly slow because of the many labels, however if you only put a console.time around expression, then you will get similar results): Conclusion I don't see this PR improving where 90% of the performance is lost. So search speed is only improved insignificantly, however other factors around search have been speed up (please compare output 2). Therefore, moving to FTS5 is the correct step. Furthermore I believe the search can in the future be reworked with more direct database calls reaping the benefits. |
|
Yeah, I've been noticing the same things. FTS5 is worth investing in, just not immediately for the traditional "search" capabilities. I agree that the majority of performance is lost around that line in |
0234__add_fts5_search.tsReplaces the JavaScript-based fulltext search with native SQLite FTS5 (Full-Text Search 5) using trigram tokenization. This enables:
Why It Was Implemented
The existing JavaScript-based search had significant performance issues:
Risk Assessment
Priority Review Files
apps/server/src/migrations/0234__add_fts5_search.ts- Migration logicapps/server/src/services/search/fts/search_service.ts- Core FTS operationsapps/server/src/services/search/expressions/note_content_fulltext.ts- Integration pointapps/server/src/services/search/sqlite_functions.ts- Custom SQL functionsBreaking Changes
Database Migration Requirements
notes_ftsandattributes_ftsnote_search_contentandnote_tokenstablesAPI Changes
None - All changes are internal implementation details. The search API remains unchanged.
Backwards Compatibility
=,*=*,~=, etc.) retain their behaviorProtected Notes Handling
Protected notes are explicitly excluded from the FTS index:
Protected notes are searched separately using decryption when a protected session is active. This ensures:
Input Validation
Performance Implications
Trigram Tokenization Benefits
FTS5 with trigram tokenization provides:
New Indexes Created (18 Total)
Notes Table:
IDX_notes_search_composite- Common search filtersIDX_notes_metadata_covering- Note metadata queriesIDX_notes_protected_deleted- Protected notes filteringBranches Table:
IDX_branches_tree_traversal- Hierarchy traversalIDX_branches_covering- Branch queriesIDX_branches_note_parents- Reverse lookupAttributes Table:
IDX_attributes_search_composite- Attribute searchesIDX_attributes_covering- Attribute queriesIDX_attributes_inheritable- Inherited attributesIDX_attributes_labels- Label-specificIDX_attributes_relations- Relation-specificOther Tables:
IDX_blobs_content_size- Blob filteringIDX_attachments_composite- Attachment queriesIDX_revisions_note_date- Revision queriesIDX_entity_changes_sync- Sync operationsIDX_recent_notes_date- Recent notesMemory Considerations
detail='full'increases index size by ~50% but enables phrase queriesANALYZEruns on all tables to optimize query plannerMigration Details
Migration File
apps/server/src/migrations/0234__add_fts5_search.tsMigration Steps
Create FTS5 Virtual Tables
CREATE VIRTUAL TABLE IF NOT EXISTS notes_fts USING fts5( noteId UNINDEXED, title, content, tokenize = 'trigram', detail = 'full' );Populate FTS Tables
isProtected = 0)isDeleted = 0)text,code,mermaid,canvas,mindMapCreate Synchronization Triggers (12 total)
notes_fts_insertnotes_fts_updatenotes_fts_deletenotes_fts_soft_deletenotes_fts_protectnotes_fts_unprotectnotes_fts_blob_insertnotes_fts_blob_updateattributes_fts_insertattributes_fts_updateattributes_fts_deleteattributes_fts_soft_deleteCreate Performance Indexes
Run ANALYZE
Cleanup Legacy Tables
note_search_contentandnote_tokensRollback Strategy
If migration fails:
DROP INDEX IF EXISTSTo manually rollback after successful migration:
New Test Files (16 spec files, ~9,000 lines)
fts5_integration.spec.tsfts_search.spec.tsoperators.spec.tsfuzzy_search.spec.tsattribute_search.spec.tshierarchy_search.spec.tslogical_operators.spec.tsedge_cases.spec.tssearch_results.spec.tsspecial_features.spec.tsprogressive_search.spec.tscontent_search.spec.tsproperty_search.spec.tssqlite_functions.spec.tsetapi/search.spec.tsTest Helpers and Fixtures
test/search_test_helpers.tstest/search_fixtures.tstest/search_assertion_helpers.tsStress Testing
New stress test script:
scripts/stress-test-populate.ts(512 lines)Running Tests
Architecture Overview
New FTS Module Structure
Integration Points
Search Expressions (
note_content_fulltext.ts)Search Service (
services/search.ts)SQL Module (
sql.ts)Custom SQLite Functions
edit_distance(a, b)WHERE edit_distance(title, ?) <= 2regex_match(pattern, text)WHERE regex_match(?, content)Search Operators Supported
=note.content = "hello world"!=note.content != "test"*=*note.content *=* "partial"*=note.content *= "hello"=*note.content =* "world"~=note.content ~= "aproximate"~*note.content ~* "helo"%=note.content %= "^test.*$"Files Changed Summary
Core FTS Implementation (6 files)
fts/index.tsfts/types.tsfts/errors.tsfts/search_service.tsfts/query_builder.tsfts/index_manager.tsIntegration Files (4 files)
expressions/note_content_fulltext.tsservices/search.tsservices/build_comparator.tssqlite_functions.tsMigration (1 file)
migrations/0234__add_fts5_search.tsTests (17 files, ~9,000 lines)
See Testing Requirements section above.