Embeddings & FTS5: Auto Mode and Repair

Overview

  • YAMS can automatically generate embeddings and index document content on add.
  • Embeddings are stored in the vector database; text content is indexed in SQLite FTS5.
  • Both paths are best-effort and non-blocking to keep ingestion fast.

Auto Embeddings on Add

Toggle via config in ~/.config/yams/config.toml:

[embeddings]
auto_on_add = true           # Queue background embedding on add
preferred_model = "all-MiniLM-L6-v2"  # Model name
embedding_dim = 384          # Must match model output dimensions

Behavior:

  • yams add and daemon directory indexing queue embedding generation asynchronously when enabled.
  • Extraction and FTS5 indexing still run immediately during add (for supported formats).
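The split between immediate text indexing and queued embedding can be pictured with a small sketch. This is not YAMS code: the Ingestor class, its embed_fn parameter, and the in-memory dictionaries are hypothetical stand-ins, assuming a single background worker and a "log and skip" policy for failed embeddings.

```python
import logging
import queue
import threading

log = logging.getLogger("ingest-sketch")


class Ingestor:
    """Hypothetical illustration: index text now, embed later in the background."""

    def __init__(self, embed_fn, auto_on_add=True):
        self.embed_fn = embed_fn          # stand-in for the embedding model
        self.auto_on_add = auto_on_add    # mirrors embeddings.auto_on_add
        self.text_index = {}              # stand-in for the FTS5 index
        self.vectors = {}                 # stand-in for the vector database
        self._jobs = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def add(self, doc_id, text):
        # Text indexing happens on the caller's thread, before add() returns.
        self.text_index[doc_id] = text
        # Embedding is only queued, so add() never waits on the model.
        if self.auto_on_add:
            self._jobs.put((doc_id, text))

    def _run(self):
        while True:
            doc_id, text = self._jobs.get()
            try:
                self.vectors[doc_id] = self.embed_fn(text)
            except Exception:
                # Best-effort: log the failure and move on to the next job.
                log.warning("embedding failed for %s; skipping", doc_id)
            finally:
                self._jobs.task_done()

    def wait(self):
        """Block until every queued embedding job has been attempted."""
        self._jobs.join()
```

A document whose embedding fails still ends up in the text index, which is the situation that yams repair --embeddings is meant to clean up afterwards.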

FTS5 Indexing

  • When available in the SQLite build, YAMS indexes extracted text into documents_fts.
  • DocumentService indexes FTS5 during add; directory add uses the same storage path.
  • Searching uses hybrid keyword (FTS5) + semantic (vector) ranking when configured.
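Whether FTS5 is compiled in depends on the SQLite build, so it can help to probe for it before relying on keyword search. The sketch below uses Python's standard sqlite3 module against a scratch in-memory database. The table name documents_fts comes from this page, but the title and content columns are assumptions made for illustration and may not match the real YAMS schema.

```python
import sqlite3


def fts5_available() -> bool:
    """Return True if this SQLite build can create an FTS5 virtual table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE VIRTUAL TABLE probe USING fts5(content)")
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        conn.close()


def build_demo_index(docs):
    """Create a scratch FTS5 table (hypothetical columns) and load (title, content) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE documents_fts USING fts5(title, content)")
    conn.executemany("INSERT INTO documents_fts VALUES (?, ?)", docs)
    return conn


def keyword_search(conn, query):
    """Return matching titles, best match first, using FTS5's built-in rank."""
    rows = conn.execute(
        "SELECT title FROM documents_fts WHERE documents_fts MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()
    return [title for (title,) in rows]
```

This covers only the keyword half of search; how YAMS combines keyword and vector scores is not described here, so no fusion step is shown.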

Repair and Rebuild

CLI supports targeted repair flows:

# Generate missing embeddings for stored documents
yams repair --embeddings

# Rebuild FTS5 entries (delete/insert) using robust extraction
yams repair --fts5

# Build/repair knowledge graph from tags/metadata
yams doctor repair --graph

# Run all repair operations
yams repair --all

The daemon's RepairCoordinator also performs a best-effort FTS5 reindex of each document whose embeddings it repairs.

Model Management

# List available models
yams model list

# Download a model
yams model download all-MiniLM-L6-v2

# Check ONNX runtime and plugin status
yams model check

# Set preferred model in config
yams config set embeddings.preferred_model all-MiniLM-L6-v2

Model resolution order:

  1. embeddings.model_path in config
  2. ~/.yams/models/<name>/model.onnx
  3. models/ in current directory
  4. /usr/local/share/yams/models
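The resolution order can be read as a first-match search over candidate paths. The following is a minimal sketch of that idea, not the actual YAMS lookup. The function name resolve_model_path and its parameters are hypothetical. It also makes two assumptions beyond what this page states: that the local and system directories use the same <name>/model.onnx layout as the per-user directory, and that a configured model_path that does not exist falls through to the next candidate.

```python
from pathlib import Path
from typing import Optional


def resolve_model_path(
    name: str,
    config_model_path: Optional[str] = None,
    home: Optional[Path] = None,
    cwd: Optional[Path] = None,
    system_dir: Path = Path("/usr/local/share/yams/models"),
) -> Optional[Path]:
    """Return the first existing candidate in the documented order, or None."""
    home = home or Path.home()
    cwd = cwd or Path.cwd()

    candidates = []
    # 1. Explicit embeddings.model_path from config, if set.
    if config_model_path:
        candidates.append(Path(config_model_path).expanduser())
    # 2. Per-user model directory.
    candidates.append(home / ".yams" / "models" / name / "model.onnx")
    # 3. models/ in the current directory (layout below is an assumption).
    candidates.append(cwd / "models" / name / "model.onnx")
    # 4. System-wide directory (layout below is an assumption).
    candidates.append(system_dir / name / "model.onnx")

    for candidate in candidates:
        if candidate.exists():
            return candidate
    return None
```

For example, resolve_model_path("all-MiniLM-L6-v2") returns None when no candidate exists, which would be the point to run yams model download.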

Embedding Dimension Source of Truth

Single key: Set embeddings.embedding_dim in ~/.config/yams/config.toml.

Runtime precedence: config > env (YAMS_EMBED_DIM) > generator > heuristic.

The daemon derives vector DB schema and in-memory index dimensions from this single value to prevent drift.
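The precedence chain above amounts to "take the first source that yields a usable value". A rough sketch of that rule follows. The function resolve_embedding_dim and its parameters are hypothetical, and treating a non-numeric or non-positive value as "not set" is an assumption, since this page does not say how YAMS handles malformed input.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_embedding_dim(
    config_dim: Optional[int],
    generator_dim: Optional[int],
    heuristic_dim: Optional[int],
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[int, str]:
    """Pick a dimension using config > env > generator > heuristic.

    Returns (dimension, source_name) so callers can report where it came from.
    """
    env = os.environ if env is None else env

    # Assumption: a YAMS_EMBED_DIM value that is not a plain integer is ignored.
    raw = env.get("YAMS_EMBED_DIM", "").strip()
    env_dim = int(raw) if raw.isdigit() else None

    ordered = (
        ("config", config_dim),        # embeddings.embedding_dim
        ("env", env_dim),              # YAMS_EMBED_DIM
        ("generator", generator_dim),  # dimension reported by the loaded model
        ("heuristic", heuristic_dim),  # last-resort guess
    )
    for source, value in ordered:
        if value is not None and value > 0:
            return value, source
    raise ValueError("no usable embedding dimension from any source")
```

Returning the source alongside the value makes a mismatch easier to diagnose, for example when the environment variable silently supplies a dimension because the config key is missing.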

Notes

  • Embedding and FTS5 operations degrade gracefully: failures are logged and skipped.
  • Batch sizes and retries are conservative to avoid blocking foreground operations.

Troubleshooting

Plugin not loaded:

yams plugin list                              # Check loaded plugins
yams plugin trust add ~/.local/lib/yams/plugins  # Trust plugin directory
yams doctor plugin onnx                       # Diagnose ONNX plugin

Dimension mismatch:

yams doctor                                   # Shows vector DB dim vs model target
yams doctor --recreate-vectors --dim 384      # Recreate with correct dimension

Missing embeddings:

yams repair --embeddings                      # Generate missing embeddings
yams stats -v                                 # Check embedding coverage