Full-Text Search
Tamsaek goes beyond file name search by indexing the actual content of your documents.
Supported File Types
| Category | Extensions |
|---|---|
| Documents | .pdf, .docx, .doc, .odt, .rtf, .txt |
| Spreadsheets | .xlsx, .xls, .csv, .ods |
| Presentations | .pptx, .ppt, .odp |
| E-books | .epub, .mobi |
| Code | .js, .ts, .py, .rs, .go, .java, .c, .cpp, .md, .json, .yaml, .xml, .html, .css |
| Email | .eml, .msg |
How It Works
- Text Extraction: Tamsaek extracts text from supported file types
- Tokenization: Text is broken into searchable terms
- Indexing: Terms are added to a high-performance Tantivy index
- Search: Queries match against the full-text index
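The steps above can be illustrated with a minimal inverted-index sketch. This is not Tamsaek's implementation and does not use Tantivy; the function names, the sample documents, and the rule that every query term must match are all assumptions made for illustration. Text extraction is skipped by starting from already-extracted strings.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document paths that contain it."""
    index = defaultdict(set)
    for path, text in documents.items():
        for term in tokenize(text):
            index[term].add(path)
    return index


def search(index, query):
    """Return the paths that contain every term in the query (assumed AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical, already-extracted text; real extraction from PDF, DOCX, etc. is out of scope.
documents = {
    "reports/q3.txt": "Quarterly revenue grew in Q3.",
    "notes/todo.txt": "Review the quarterly budget draft.",
}
index = build_index(documents)
print(sorted(search(index, "quarterly revenue")))  # ['reports/q3.txt']
```

A production index such as Tantivy also stores term positions and statistics for ranking, which this sketch leaves out.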
Search Capabilities
Keyword Search
Find files containing specific words:
quarterly revenue
Phrase Search
Use quotes for exact phrases:
"annual report"
Boolean Operators
Combine terms:
budget AND 2024
budget OR expenses
budget NOT draft
Wildcards
Partial matching:
mark*
Matches “marketing”, “markdown”, “marker”, etc.
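To make the difference between these query types concrete, here is a small sketch that evaluates each one against the tokens of a single document. The helper names (`has_term`, `has_phrase`, `has_prefix`) and the sample sentence are hypothetical; Tamsaek's actual query parsing is handled by its search engine and may behave differently, for example in how it treats case or punctuation.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def has_term(tokens, term):
    """Keyword match: the term appears as a whole token."""
    return term.lower() in tokens


def has_phrase(tokens, phrase):
    """Phrase match: the phrase's tokens appear consecutively and in order."""
    words = tokenize(phrase)
    n = len(words)
    return n > 0 and any(
        tokens[i:i + n] == words for i in range(len(tokens) - n + 1)
    )


def has_prefix(tokens, pattern):
    """Wildcard match for a trailing '*': some token starts with the prefix."""
    prefix = pattern.rstrip("*").lower()
    return any(token.startswith(prefix) for token in tokens)


tokens = tokenize("The annual report covers the marketing budget for 2024.")

print(has_phrase(tokens, "annual report"))                           # True
print(has_phrase(tokens, "report annual"))                           # False: wrong order
print(has_prefix(tokens, "mark*"))                                   # True: "marketing"
print(has_term(tokens, "budget") and has_term(tokens, "2024"))       # budget AND 2024
print(has_term(tokens, "budget") or has_term(tokens, "expenses"))    # budget OR expenses
print(has_term(tokens, "budget") and not has_term(tokens, "draft"))  # budget NOT draft
```

The phrase check is the only one that depends on word order, which is why a full-text index has to record term positions to support quoted queries.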
Content vs. Name Search
By default, Tamsaek searches both file names and content. To search only content:
content:quarterly report
To search only names:
name:report.pdf
Performance
- Tantivy engine: Built on Tantivy, a full-text search library written in Rust and inspired by Apache Lucene
- Incremental updates: Only new/changed files are re-indexed
- Compressed index: Efficient storage for large collections
- Sub-second results: Even across millions of indexed terms
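As an illustration of the incremental-update idea, the sketch below compares content fingerprints from a previous run with the current files and returns only the paths that need work. How Tamsaek actually detects changes is not described here; hashing file contents is an assumption (modification times or file system events are other common choices), and `fingerprint` and `plan_update` are hypothetical names.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a stable digest of the file content."""
    return hashlib.sha256(content).hexdigest()


def plan_update(manifest, files):
    """Decide which paths to (re-)index and which to drop from the index.

    manifest: dict of path -> fingerprint recorded on the previous run
    files:    dict of path -> current content as bytes
    """
    to_index = [
        path for path, data in files.items()
        if manifest.get(path) != fingerprint(data)  # new or changed
    ]
    to_remove = [path for path in manifest if path not in files]  # deleted
    return sorted(to_index), sorted(to_remove)


# Hypothetical state: a.txt changed, b.txt is unchanged, c.txt is new, gone.txt was deleted.
previous = {
    "a.txt": fingerprint(b"old"),
    "b.txt": fingerprint(b"same"),
    "gone.txt": fingerprint(b"x"),
}
current = {"a.txt": b"new", "b.txt": b"same", "c.txt": b"fresh"}

print(plan_update(previous, current))  # (['a.txt', 'c.txt'], ['gone.txt'])
```

Unchanged files such as `b.txt` never reach the extraction and tokenization steps, which is where most of the indexing cost lies.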