Changelog Archive: v0.7.x Series¶

Archived Changelogs¶

v0.7.x archive: docs/changelogs/v0.7.md
v0.6.x archive: docs/changelogs/v0.6.md
v0.5.x archive: docs/changelogs/v0.5.md
v0.4.x archive: docs/changelogs/v0.4.md
v0.3.x archive: docs/changelogs/v0.3.md
v0.2.x archive: docs/changelogs/v0.2.md
v0.1.x archive: docs/changelogs/v0.1.md

[v0.7.10] - 2026-12-20¶

Added¶

Graph command --list-types flag (yams-66h): Node type discovery for knowledge graph
New --list-types flag shows all distinct node types with counts
Table output with TYPE and COUNT columns, ordered by count descending
JSON output with nodeTypes array containing type and count fields
Added getNodeTypeCounts() method to KnowledgeGraphStore interface
Extended GraphQueryRequest IPC protocol with listTypes mode
Usage hint when no nodes found: suggests yams add <path>
Location: src/cli/commands/graph_command.cpp, include/yams/metadata/knowledge_graph_store.h
KnowledgeGraphStore query tests (yams-cqp): Unit tests for graph query methods
findNodesByType pagination tests: limit, offset, combined pagination, empty results
findIsolatedNodes tests: nodes with no incoming edges, different relation types
getNodeTypeCounts tests: type counts, ordering, empty graph
4 test cases with 246 assertions
Location: tests/unit/daemon/graph_component_catch2_test.cpp
P4 language support for symbol extraction: Network data plane language (P4_16)
Node types: headerTypeDeclaration, structTypeDeclaration, controlDeclaration, parserDeclaration, actionDeclaration, tableDeclaration
Query patterns for actions, functions, headers, structs, controls, parsers, tables, typedefs
Aliases: p4, p4_16, p4lang
Grammar auto-download from prona-p4-learning-platform/tree-sitter-p4
Vector diagnostics in DaemonMetrics: Moved collect_vector_diag to background polling
Added vectorEmbeddingsAvailable, vectorScoringEnabled, searchEngineBuildReason to MetricsSnapshot
Status requests now read from cached snapshot (non-blocking)
Resolves status command hangs when vector services are slow
Entity extraction metrics in status output: Added entity queue/inflight counters
New metrics: entityQueued, entityDropped, entityConsumed, entityInFlight
Exposed via yams status and yams status -v output
JSON output includes entity_queued, entity_consumed, entity_dropped, entity_inflight
Location: include/yams/daemon/components/DaemonMetrics.h, src/cli/commands/status_command.cpp
Gitignore support for directory ingestion: Skip files matching .gitignore patterns
New --no-gitignore flag for yams add command to disable gitignore filtering
Default behavior respects .gitignore patterns in the root directory
Supports standard gitignore patterns: wildcards, directory patterns, anchored paths
Location: src/cli/commands/add_command.cpp, src/app/services/indexing_service.cpp

Changed¶

Constexpr language configuration for symbol extraction: Centralized compile-time configuration
17 languages with constexpr node types and query patterns: C, C++, Python, Rust, Go, Java, JavaScript, TypeScript, C#, PHP, Kotlin, Perl, R, SQL, Solidity, Dart, P4
LanguageConfig struct with class_types, field_types, function_types, import_types, identifier_types
Query patterns: function_queries, class_queries, import_queries, call_queries
Language alias support (e.g., “cpp” → “c++”, “cxx”, “cc”)
getLanguageConfig() constexpr lookup function
Location: plugins/symbol_extractor_treesitter/symbol_extractor.cpp
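A minimal sketch of how a compile-time table with alias lookup might look. The struct here is a trimmed-down stand-in, not the plugin's actual LanguageConfig; the table contents and the choice of "c++" as the canonical name are assumptions for illustration.

```cpp
#include <array>
#include <string_view>

// Hypothetical, reduced stand-in for the LanguageConfig described above.
struct LanguageConfig {
    std::string_view name;                        // canonical language name
    std::array<std::string_view, 4> aliases;      // unused slots stay empty
    std::array<std::string_view, 2> class_types;  // tree-sitter node types
};

// Illustrative table only; the real one covers 17 languages and more fields.
inline constexpr std::array<LanguageConfig, 2> kLanguages{{
    {"c++", {"cpp", "cxx", "cc", ""}, {"class_specifier", "struct_specifier"}},
    {"p4", {"p4_16", "p4lang", "", ""}, {"headerTypeDeclaration", "structTypeDeclaration"}},
}};

// Resolves a canonical name or alias; returns nullptr for unknown languages.
constexpr const LanguageConfig* getLanguageConfig(std::string_view lang) {
    for (const auto& cfg : kLanguages) {
        if (cfg.name == lang) return &cfg;
        for (std::string_view alias : cfg.aliases) {
            if (!alias.empty() && alias == lang) return &cfg;
        }
    }
    return nullptr;
}

// The lookup is usable in constant expressions.
static_assert(getLanguageConfig("cxx") != nullptr);
static_assert(getLanguageConfig("cxx")->name == "c++");
```

Because the table and lookup are constexpr, an unknown alias in a hard-coded call site can be caught with a static_assert rather than at runtime.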
Field extraction: New extractFields() method extracts class member variables
Uses node type traversal with language-specific field types
Creates field kind symbols with proper byte ranges
Member containment relations: New extractMemberRelations() method
Creates contains edges from classes to their methods/fields
Uses byte range containment to determine class membership
Improves knowledge graph structure for code navigation
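Byte range containment can be sketched as follows. The Symbol struct and containsEdges() are hypothetical names, and the half-open range convention is an assumption; the extractor's own types may differ.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical symbol record with a half-open byte range [startByte, endByte).
struct Symbol {
    std::string name;
    std::string kind;  // "class", "method", "field", ...
    std::size_t startByte;
    std::size_t endByte;
};

// Returns (class index, member index) pairs, one per "contains" edge.
std::vector<std::pair<std::size_t, std::size_t>>
containsEdges(const std::vector<Symbol>& symbols) {
    std::vector<std::pair<std::size_t, std::size_t>> edges;
    for (std::size_t c = 0; c < symbols.size(); ++c) {
        if (symbols[c].kind != "class") continue;
        for (std::size_t m = 0; m < symbols.size(); ++m) {
            const Symbol& s = symbols[m];
            if (s.kind != "method" && s.kind != "field") continue;
            // A member belongs to the class when its range lies fully inside.
            if (s.startByte >= symbols[c].startByte && s.endByte <= symbols[c].endByte) {
                edges.emplace_back(c, m);
            }
        }
    }
    return edges;
}
```

This quadratic scan is fine for a single file's symbols; nested classes would attach a member to every enclosing class unless the innermost range is selected.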
PostIngestQueue per-stage metrics: Exposed extraction/KG/symbol stage inflight counts
New getters: extractionInFlight(), kgInFlight(), symbolInFlight(), totalInFlight()
Static constexpr limits: maxExtractionConcurrent(), maxKgConcurrent(), maxSymbolConcurrent()
Exposed via daemon status: extraction_inflight, kg_inflight, symbol_inflight
yams status shows POST line when there’s active work
yams status -v shows per-stage breakdown
yams daemon status -d shows full Post-Ingest Pipeline section
JSON output includes stages object with per-stage counts
Location: include/yams/daemon/components/PostIngestQueue.h, src/cli/commands/status_command.cpp, src/cli/commands/daemon_command.cpp
PostIngestQueue dynamic concurrency scaling (PBI-05a): Auto-scale based on queue depth
New TuneAdvisor tunables: postExtractionConcurrent(), postKgConcurrent(), postSymbolConcurrent(), postEntityConcurrent()
Dynamic limits replace static constexpr values in PostIngestQueue pollers
TuningManager scales concurrency based on queue depth thresholds:
- >1000 queued: extraction=hwThreads/2, kg=hwThreads/2
- >500 queued: extraction=hwThreads/4, kg=32
- >100 queued: extraction=hwThreads/8+4, kg=16
- >10 queued: extraction=8
- idle: extraction=4 (default)
Status output shows limits: stages: extract=4/4, kg(q=0/i=0/8), symbol=0/4
JSON includes extraction_limit, kg_limit, symbol_limit, entity_limit
Location: include/yams/daemon/components/TuneAdvisor.h, src/daemon/components/TuningManager.cpp, src/daemon/components/DaemonMetrics.cpp
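The thresholds above can be read as a simple step function. This is a sketch, not the TuningManager code: limitsForQueueDepth() and StageLimits are hypothetical names, and the KG limit is left empty for the tiers where the list gives no value.

```cpp
#include <cstddef>
#include <optional>

struct StageLimits {
    unsigned extraction;
    std::optional<unsigned> kg;  // empty where the changelog lists no KG value
};

// Maps queue depth to concurrency limits using the thresholds listed above.
StageLimits limitsForQueueDepth(std::size_t queued, unsigned hwThreads) {
    if (queued > 1000) return {hwThreads / 2, hwThreads / 2};
    if (queued > 500) return {hwThreads / 4, 32u};
    if (queued > 100) return {hwThreads / 8 + 4, 16u};
    if (queued > 10) return {8u, std::nullopt};
    return {4u, std::nullopt};  // idle default
}
```

With 16 hardware threads, a backlog of 2000 documents yields extraction=8 and kg=8, while an idle queue falls back to extraction=4.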
Knowledge Graph cleanup on document deletion: Deleting documents now cascades to KG
deleteNodesForDocumentHash(): Removes doc:<hash> nodes and symbol nodes with matching document_hash
Integrated into document deletion flow for automatic cleanup
Location: include/yams/metadata/knowledge_graph_store.h, src/app/services/document_service.cpp
Stale edge cleanup on re-indexing: Symbol extraction now cleans up old relationships
deleteEdgesForSourceFile(): Removes edges where properties.source_file matches path
Called automatically before re-extraction to prevent stale relationship accumulation
Location: src/daemon/components/EntityGraphService.cpp
Optimized isolated node query: yams graph --isolated now uses single SQL query
findIsolatedNodes(): Efficient NOT EXISTS subquery instead of N+1 pattern
New IPC fields: isolatedMode, isolatedRelation in GraphQueryRequest
Significant performance improvement for large graphs
Location: src/cli/commands/graph_command.cpp, src/daemon/components/dispatcher/request_dispatcher_graph.cpp
Daemon log command: Added yams daemon log
ExternalPluginHost: New plugin host for Python/process-based plugins (RFC-EPH-001)
Implements IPluginHost interface for external plugins running as separate processes
JSON-RPC 2.0 communication over stdio using existing PluginProcess and JsonRpcClient
Supported plugin types: Python (.py), Node.js (.js), any executable with JSON-RPC support
Process lifecycle management: spawn, monitor, health checks, graceful shutdown
Automatic crash recovery with configurable restart policy (max retries, backoff)
Trust-based security model with persistent trust file
RPC gateway for calling arbitrary plugin methods (callRpc)
Plugin statistics tracking (uptime, restart count, health status)
State change callbacks for monitoring plugin lifecycle events
Location: include/yams/daemon/resource/external_plugin_host.h, src/daemon/resource/external_plugin_host.cpp
Auto-init mode: New yams init --auto flag for containerized/headless environments
Enables vector database with default model (all-MiniLM-L6-v2)
Enables plugins directory setup
Generates authentication keys
Skips S3 configuration (uses local storage)
Non-interactive: no prompts, uses sensible defaults
Tree-sitter grammar download: yams init now offers to download tree-sitter grammars
Interactive menu: recommended (C, C++, Python, JS, TS, Rust, Go), all, or custom selection
Auto-downloads and builds grammars from official GitHub repos
Supports 14 languages: C, C++, Python, JavaScript, TypeScript, Rust, Go, Java, C#, PHP, Kotlin, Dart, SQL, Solidity
Cross-platform: MSVC, MinGW, GCC, Clang compilation support
Grammar prompt also available when YAMS is already initialized
Grammars installed to XDG_DATA_HOME/yams/grammars (Unix) or %LOCALAPPDATA%\yams\grammars (Windows)
New embedding model option: Added multi-qa-MiniLM-L6-cos-v1 as second model choice
Trained on 215M question-answer pairs for semantic search optimization
Same dimensions (384) as default model for compatibility
Replaces all-mpnet-base-v2 (768 dim) in model selection
Git-based version detection: Build system now auto-detects version from git tags
Uses most recent semver tag (v*) as effective version
Falls back to project version only if no tags exist
Command-line override (-Dyams-version=X.Y.Z) takes highest priority
Commit hash in version output: yams --version now shows short commit hash
Format: 0.7.9 (commit: c16939f) built:2025-11-29T17:30:15Z
Helps identify exact build for bug reports and debugging
Init command tests: New test suite for init command model download functionality
Tests for valid HuggingFace URLs, model dimensions, naming conventions
CLI flag acceptance tests (--auto, --non-interactive, --force)
Content-type-aware search profiles: New CorpusProfile enum and auto-detection
CODE: Boosts symbol/path search for source code repositories (60%+ code files)
PROSE: Boosts FTS5/vector search for text-heavy corpora (60%+ docs)
DOCS: Balanced weights for mixed code/documentation
MIXED: Default balanced weights for heterogeneous corpora
SearchEngineConfig::detectProfile(): Auto-detects from file extension distribution
SearchEngineConfig::forProfile(): Returns preset weights for a profile
Session-isolated memory: Documents can now be isolated to working sessions
New CLI commands: yams session create, open, close, status, merge, discard
Documents added during an active session are tagged with session_id metadata
Session documents are invisible to global searches (use --global to bypass)
merge: Removes session tag to promote documents to global index
discard: Permanently deletes all session documents
Supports multiple concurrent sessions with automatic isolation
Database migration adds session tracking to metadata repository
Windows Job Object for plugin processes: External plugin child process cleanup
Plugin processes are now assigned to Windows Job Objects
All child processes are automatically terminated when plugin unloads
Prevents orphaned processes from holding file locks (e.g., PID files)
Uses JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE for reliable cleanup
Location: src/extraction/plugin_process.cpp
Plugin health command: New yams plugin health [name] subcommand for plugin diagnostics
Shows plugin status, interfaces, models loaded, and error state
Displays model provider FSM state (Idle, Loading, Ready, Degraded, Failed)
Lists all loaded models when provider is ready
JSON output support with --json flag
Location: src/cli/commands/plugin_command.cpp
Plugin info improvements: Enhanced yams plugin info output
Now uses StatusResponse.providers for accurate plugin status
Shows plugin type (native/external), interfaces, and path
Properly handles both ABI and external plugin hosts

Changed¶

Embedding model list: Both recommended models now have 384 dimensions
all-MiniLM-L6-v2: Lightweight general-purpose semantic search (default)
multi-qa-MiniLM-L6-cos-v1: Optimized for question-answer semantic search
ServiceManager Decomposition: Extracted focused components from monolithic ServiceManager
New ConfigResolver: Static config/env resolution utilities (248 lines)
New VectorSystemManager: Vector DB and index lifecycle (397 lines)
New DatabaseManager: Metadata DB, connection pool, KG store lifecycle (254 lines)
New PluginManager: Plugin host, loader, and interface adoption (515 lines)
ServiceManager accessors now delegate to extracted managers
Configurable Vector DB Capacity: Vector index max_elements now configurable
Environment variable: YAMS_VECTOR_MAX_ELEMENTS
Config file: [vector_database] max_elements
Default: 100,000 (range: 1,000 - 10,000,000)
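A sketch of resolving the capacity from the environment. Only the variable name, default, and range come from the entry above; clamping out-of-range values and falling back to the default on unparsable input are assumptions, as are the function names.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string_view>

constexpr std::size_t kDefaultMaxElements = 100000;
constexpr std::size_t kMinMaxElements = 1000;
constexpr std::size_t kMaxMaxElements = 10000000;

// raw is the value of YAMS_VECTOR_MAX_ELEMENTS, or nullptr when unset.
std::size_t parseMaxElements(const char* raw) {
    if (raw == nullptr) return kDefaultMaxElements;
    const std::string_view s(raw);
    const bool digitsOnly =
        !s.empty() && std::all_of(s.begin(), s.end(), [](unsigned char c) {
            return std::isdigit(c) != 0;
        });
    if (!digitsOnly) return kDefaultMaxElements;  // assumption: ignore bad input
    const unsigned long long v = std::strtoull(raw, nullptr, 10);
    // Assumption: out-of-range values are clamped rather than rejected.
    return static_cast<std::size_t>(
        std::clamp<unsigned long long>(v, kMinMaxElements, kMaxMaxElements));
}

std::size_t resolveMaxElements() {
    return parseMaxElements(std::getenv("YAMS_VECTOR_MAX_ELEMENTS"));
}
```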
FTS5 index hygiene (migration v18): Removed unused content_type column from FTS5 index
content_type was indexed but never queried via FTS MATCH
Content type filtering uses JOIN on documents.mime_type instead
Reduces FTS5 index size and improves indexing performance
Automatic migration rebuilds index on first database open
Daemon socket logging noise reduction: Request/mux/enqueue/drain logs now emit at debug level
Default info-level daemon logs no longer show per-request socket traffic
Enable debug logging to inspect connection-level request handling details
SearchEngine Consolidation: Unified search architecture by removing legacy HybridSearchEngine
SearchEngine is now the sole search engine, consolidating multi-component search (FTS5, PathTree, Symbol, KG, Vector, Tag, Metadata)
Removed ~2000 lines of legacy code: hybrid_search_engine.cpp, hybrid_search_factory.cpp, and associated headers
Parallel Execution: SearchEngine now uses std::async to execute all 7 component queries simultaneously
- Configurable via SearchEngineConfig::enableParallelExecution (default: true)
- Per-component timeout via SearchEngineConfig::componentTimeout (default: 100ms)
- Graceful degradation: timed-out components are skipped, others continue
Updated Interfaces: AppContext.searchEngine replaces AppContext.hybridEngine across CLI, daemon, and services
SearchEngineBuilder: Simplified to create SearchEngine directly (removed MetadataKeywordAdapter and KG scorer wiring)
Removed unused benchmark executables: engine_comparison_bench, hybrid_search_bench
Location: src/search/, include/yams/search/, src/app/services/, src/cli/
HotzoneManager Persistence: Added save/load functionality for hotzone state
HotzoneManager::save(path): Serializes hotzone entries to JSON with atomic write (temp + rename)
HotzoneManager::load(path): Restores persisted hotzone state on startup
Stores version, half-life config, and timestamped entry scores
Location: src/search/hotzone_manager.cpp, include/yams/search/hotzone_manager.h
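The temp + rename pattern used by save() can be sketched as below. atomicWrite() and the ".tmp" suffix are assumptions; this sketch also omits an fsync, which a durable implementation may want.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

// Writes payload to a sibling temp file, then renames it over the target so
// readers never observe a partially written file.
bool atomicWrite(const std::filesystem::path& target, const std::string& payload) {
    std::filesystem::path tmp = target;
    tmp += ".tmp";  // assumed temp-file naming
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        if (!out) return false;
        out << payload;
        out.flush();
        if (!out) return false;
    }  // stream is closed here, before the rename
    std::error_code ec;
    std::filesystem::rename(tmp, target, ec);
    if (ec) {
        std::filesystem::remove(tmp, ec);  // best-effort cleanup
        return false;
    }
    return true;
}
```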
CheckpointManager Component: New daemon component for periodic state persistence
Manages vector index and hotzone checkpoint scheduling
Configurable interval, threshold-based vector index saves, optional hotzone persistence
Async timer-based loop with graceful shutdown support
Post-ingest pipeline parallelization: PostIngestQueue and EntityGraphService now use WorkCoordinator
Removed serial strand-based processing bottleneck in PostIngestQueue
EntityGraphService now posts extraction jobs to shared WorkCoordinator thread pool
Removed unused PoolManager “post_ingest” pool and associated TuningManager tuning logic
Documents process in parallel across all worker threads with work stealing
Graph BFS traversal optimization: Reduced N+1 query patterns in graph traversal
New getEdgesBidirectional() API: returns incoming + outgoing edges in single query (UNION)
New getNodesByIds() API: batch node retrieval for hydration
Edge cache in BFS: edges fetched during neighbor collection reused for connecting edges
Reduces per-node queries from 4 (2×getEdgesFrom + 2×getEdgesTo) to 1
Location: src/app/services/graph_query_service.cpp, src/metadata/knowledge_graph_store_sqlite.cpp
Graph command cleanup: Removed unused --reverse flag
Bidirectional traversal is now the default behavior
Flag was redundant since BFS optimization returns all connected edges
Location: src/cli/commands/graph_command.cpp

Fixed¶

JavaScript/TypeScript symbol extraction: Audited and fixed against Tree-sitter grammars
JavaScript: Added function_expression, generator_function, generator_function_declaration, namespace_import, export_statement, export_specifier
TypeScript: Added abstract_class_declaration, abstract_method_signature, function_expression, generator_function, import_alias
Added queries for function expressions, generators, abstract methods, export statements
Graph --name query now shows symbol relationships: Fixed yams graph --name <file> showing “Graph data unavailable”
Now resolves filename to file node key and uses KG query path
Shows connected symbols, includes, and document nodes
Falls back to document-based lookup if file node not found
Location: src/cli/commands/graph_command.cpp
KG queue metric now shows pending count: Fixed kg(q=N) showing cumulative total instead of pending items
Now calculates: pending = queued - consumed - inflight
Affects yams status -v and yams daemon status -d displays
Location: src/cli/commands/status_command.cpp, src/cli/commands/daemon_command.cpp
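The pending calculation is a small subtraction. In this sketch the result saturates at zero, which is an assumption to guard against counters sampled at slightly different times; pendingCount() is a hypothetical name.

```cpp
#include <cstdint>

// pending = queued - consumed - inflight, clamped at zero (assumption).
constexpr std::uint64_t pendingCount(std::uint64_t queued, std::uint64_t consumed,
                                     std::uint64_t inflight) {
    const std::uint64_t done = consumed + inflight;
    return queued > done ? queued - done : 0;
}

static_assert(pendingCount(120, 100, 8) == 12);
```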
Symbol extraction extension mapping: Fixed extension lookup not matching due to leading dot mismatch
Database stores extensions with dots (.cpp), map keys without (cpp)
PostIngestQueue now strips leading dot before lookup
Location: src/daemon/components/PostIngestQueue.cpp
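A sketch of the normalization. The map contents and function names are hypothetical; only the strip-the-leading-dot step comes from the fix described above.

```cpp
#include <string>
#include <string_view>
#include <unordered_map>

// Drops one leading dot so ".cpp" and "cpp" resolve to the same key.
std::string_view normalizeExtension(std::string_view ext) {
    if (!ext.empty() && ext.front() == '.') ext.remove_prefix(1);
    return ext;
}

// Hypothetical extension-to-language map keyed without dots.
std::string languageForExtension(std::string_view storedExt) {
    static const std::unordered_map<std::string_view, std::string_view> kMap{
        {"cpp", "c++"}, {"py", "python"}, {"rs", "rust"}};
    const auto it = kMap.find(normalizeExtension(storedExt));
    return it == kMap.end() ? std::string{} : std::string(it->second);
}
```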
Graph query bidirectional traversal: Fixed graph queries showing 0 connections for blob nodes
BFS traversal now follows both incoming and outgoing edges by default
Blob nodes (which only have incoming has_version edges from path nodes) now return connected nodes
Refactored dispatcher to delegate to GraphQueryService (single responsibility)
Repair tracking (migration v21): Added repair status tracking to prevent duplicate work
New repair_status column (pending, processing, completed, failed, skipped)
repair_attempted_at timestamp and repair_attempts counter
RepairCoordinator filters by status to avoid re-queuing processed documents
Plugin interface parsing: Fixed object-format interfaces not parsing correctly
Plugin host sharing: Fixed model provider adoption failure after component extraction
VectorIndexManager initialization: Fixed “VectorIndexManager not provided” search engine build failure
Model download mapping: Added multi-qa-MiniLM-L6-cos-v1 to HuggingFace repo mapping
Version display: Fixed yams --version showing fallback values
Socket crash on shutdown: Fixed EXC_BAD_ACCESS in kqueue_reactor during program exit
Windows daemon status metrics: CPU and memory now report accurate values
--name flag for yams add: Fixed custom document naming for single-file adds
External plugin extractors: Fixed content extractors from external plugins not being used
Trust file persistence: Fixed plugin trust file being deleted on daemon restart
Trust file comment parsing: Fixed daemon crash when loading trust file with comments
Plugin trust initialization order: Fixed plugins not loading despite being trusted
Post-ingestion pipeline reliability: Improved async processing consistency
Graph IPC serialization: Added missing ProtoBinding specializations for GraphQueryRequest/Response
Status command document count: Fixed yams status showing docs=0 after daemon restart
Short status now uses documents_total (from metadata DB, initialized on startup)
Previously used storage_documents (CAS object count, which was 0 on fresh start)
Detailed status was unaffected as it already used the correct field
Location: src/cli/commands/status_command.cpp

CLI Improvements¶

PowerShell completion: Added yams completion powershell for PowerShell auto-complete
Consistent --json output: Extended JSON output support across commands
Actionable error hints: Centralized error hint system with pattern-based hints
Daemon error messages: Enhanced daemon start/stop failure messages with recovery hints

Removed¶

HybridSearchEngine: Legacy search engine removed in favor of unified SearchEngine
Deleted: src/search/hybrid_search_engine.cpp (~1844 lines)
Deleted: src/search/hybrid_search_factory.cpp (~168 lines)
Deleted: include/yams/search/hybrid_search_engine.h
Deleted: include/yams/search/hybrid_search_factory.h
HybridSearchEngine Tests: Removed obsolete test files
tests/unit/search/hybrid_search_engine_test.cpp
tests/unit/search/hybrid_grouping_smoke_test.cpp
tests/unit/search/learned_fusion_smoke_test.cpp
tests/unit/search/hierarchical_search_test.cpp
tests/unit/metadata/search_metadata_interface_test.cpp
Legacy Adapters: Removed MetadataKeywordAdapter (was bridge for HybridSearchEngine)
CLI Adapter Rename: HybridSearchResultAdapter → SearchResultItemAdapter in result_renderer.h

[v0.7.8] - 2025-11-14¶

Added¶

Thread Pool Consolidation
WorkCoordinator Component: New centralized thread pool manager with Boost.Asio io_context
- Replaces 3 separate thread pools (IngestService, PostIngestQueue, EmbeddingService)
- Provides strand allocation for per-service ordering guarantees
- Hardware-aware thread count (8-32 threads based on CPU cores)
Search Service Parallel Post-Processing
- New ParallelPostProcessor class for concurrent search result processing
- Parallelizes filtering, facet generation, and highlighting when result count ≥ 100
- Uses std::async to run independent operations concurrently
- Threshold-based activation (PARALLEL_THRESHOLD = 100) avoids overhead on small result sets
- Performance Measured (100 iterations):
- 100 results: 0.06ms (~1.66M ops/sec) - sequential path
- 500 results: 0.23ms (~2.21M ops/sec) - parallel path
- 1000 results: 0.43ms (~2.32M ops/sec) - parallel path
- Speedup: ~1.4x at 1000 results versus linear scaling of the 100-result time (0.60ms projected vs 0.43ms measured)
- Location: include/yams/search/parallel_post_processor.hpp, src/search/parallel_post_processor.cpp
- Integration: search_executor.cpp now uses ParallelPostProcessor instead of sequential processing
- Benchmarks: tests/benchmarks/search_benchmarks.cpp

Changed¶

Search Service: --fuzzy searches now merge BM25 keyword matches with fuzzy results so enabling typo tolerance never suppresses literal hits. (src/app/services/search_service.cpp)
Metadata Repository: Removed the default 50K fuzzy-index cap. The index now covers the full corpus by default and only enforces limits when YAMS_FUZZY_INDEX_LIMIT is set, adding a small safety buffer and explicit guard logging. (src/metadata/metadata_repository.cpp, include/yams/metadata/fuzzy_index_builder.h)
Service Architecture Refactor
IngestService: Converted from manual thread pool to strand-based channel polling
- Removed kSyncThreshold heuristics and compat::jthread pool
- New channelPoller() awaitable for document processing
PostIngestQueue: Converted from worker threads to strand-based pipeline
- Removed Worker struct, thread pool, and token bucket scheduler (~200 lines)
- Implemented awaitable pipeline: processMetadataStage → (processKnowledgeGraphStage || processEmbeddingStage)
- Parallel KG and Embedding stages using make_parallel_group
EmbeddingService: Converted from worker threads to strand-based channel polling
- Removed worker thread pool (~70 lines)
- New channelPoller() awaitable with async timer
TuningManager: Converted from manual thread to strand-based periodic execution
- Removed compat::jthread with stop_token
- New tuningLoop() awaitable with boost::asio::steady_timer
- Uses WorkCoordinator strand for pool size adjustments
- Maintains TuneAdvisor::statusTickMs() polling interval
DaemonMetrics: Converted from manual thread to strand-based polling loop
- Removed std::thread for CPU/memory metrics collection
- New pollingLoop() awaitable with 250ms timer interval
- Uses WorkCoordinator strand for metric updates
- Thread-safe snapshot access via shared_mutex
BackgroundTaskManager: Migrated from GlobalIOContext to WorkCoordinator
- Removed fallback to GlobalIOContext (proper architectural separation)
- Now uses WorkCoordinator executor for all background tasks
- Integrated with unified work-stealing thread pool
- Fts5Job consumer polling delay reduced: 200ms → 10ms (20x throughput improvement)
- Fixed orphan scan queue overflow (was causing hundreds of dropped batches)
ServiceManager: Refactored async operations
- Eliminated all 5 uses of std::future/std::async
- Converted database operations to use make_parallel_group with timeouts
SearchPool Removal
Deleted the unused SearchPool component and associated meson/build wiring
ServiceManager no longer constructs dead search infrastructure; HybridSearchEngine remains the sole search path
TuneAdvisor/TuningManager now derive concurrency targets directly from SearchExecutor load metrics instead of phantom pool sizes
Ingestion Pipeline Cleanup
Removed deferExtraction Technical Debt: Eliminated bypass mechanism that skipped full production pipeline
- Removed deferExtraction field from StoreDocumentRequest and AddDirectoryRequest structs
- Removed conditional logic in DocumentService that skipped FTS5 extraction
- All document ingestion now uses full pipeline: metadata storage → FTS5 extraction → PostIngestQueue → (KG extraction || Embedding generation)
- Updated IngestService to always enqueue to PostIngestQueue (removed lines setting deferExtraction=true)
- Updated CLI add_command fallback paths (3 locations) to use full pipeline
- Updated mobile bindings to remove sync_now-based deferral
- Removed --defer-extraction and --no-defer-extraction flags from ingestion_throughput_bench
- Updated test helpers (tests/common/capability.h, integration test) to use full pipeline
Grep Output Update
New default output format

Example output:

=== Results for "TaskManager" in 3 files (5 regex, 2 semantic) ===

File: src/core/TaskManager.cpp (cpp)
   Matches: 3 (3 regex)

   Line   45: [Regex] class TaskManager {
   Line  102: [Regex] TaskManager::TaskManager() : initialized_(false) {
   Line  237: [Regex] void TaskManager::shutdown() {

[Total: 7 matches across 3 files]

Location: src/cli/commands/grep_command.cpp:531-645
Grep Service Optimizations
Literal Extraction from Regex Patterns
- New LiteralExtractor utility extracts literal substrings from regex patterns
- Enables two-phase matching: fast literal pre-filter → full regex only on candidates
- Based on ripgrep’s literal extraction strategy
Boyer-Moore-Horspool (BMH) String Search
- Replaces std::string::find() with BMH algorithm for patterns ≥ 3 characters
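For reference, a textbook Boyer-Moore-Horspool search with the length cutoff mentioned above. This is a generic sketch of the algorithm, not the grep service's code; bmhFind() and findLiteral() are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Returns the index of the first occurrence of needle, or npos.
std::size_t bmhFind(std::string_view haystack, std::string_view needle) {
    const std::size_t n = haystack.size();
    const std::size_t m = needle.size();
    if (m == 0) return 0;
    if (m > n) return std::string_view::npos;

    // Bad-character table: distance from a byte's last occurrence in the
    // needle (excluding the final position) to the needle's end.
    std::array<std::size_t, 256> shift;
    shift.fill(m);
    for (std::size_t i = 0; i + 1 < m; ++i) {
        shift[static_cast<unsigned char>(needle[i])] = m - 1 - i;
    }

    std::size_t pos = 0;
    while (pos <= n - m) {
        std::size_t j = m;
        while (j > 0 && haystack[pos + j - 1] == needle[j - 1]) --j;  // compare right to left
        if (j == 0) return pos;
        pos += shift[static_cast<unsigned char>(haystack[pos + m - 1])];
    }
    return std::string_view::npos;
}

// Short patterns gain little from the skip table, so fall back to find().
std::size_t findLiteral(std::string_view haystack, std::string_view needle) {
    return needle.size() >= 3 ? bmhFind(haystack, needle) : haystack.find(needle);
}
```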
SIMD Vectorized Newline Scanning
- Platform-specific implementations: AVX2 (32 bytes), SSE2 (16 bytes), NEON (16 bytes)
- Scalar fallback using optimized memchr for portability
- Replaces byte-by-byte scanning in line boundary detection
- Performance: 4-8x speedup on large files
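The scalar fallback path can be sketched with memchr, which most C libraries already vectorize internally. newlineOffsets() is a hypothetical name; the SIMD variants are not shown.

```cpp
#include <cstddef>
#include <cstring>
#include <string_view>
#include <vector>

// Collects the byte offset of every '\n' in buf using memchr.
std::vector<std::size_t> newlineOffsets(std::string_view buf) {
    std::vector<std::size_t> offsets;
    const char* p = buf.data();
    const char* const end = p + buf.size();
    while (p < end) {
        const void* hit = std::memchr(p, '\n', static_cast<std::size_t>(end - p));
        if (hit == nullptr) break;
        const char* nl = static_cast<const char*>(hit);
        offsets.push_back(static_cast<std::size_t>(nl - buf.data()));
        p = nl + 1;  // resume just past the newline
    }
    return offsets;
}
```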
Parallel Candidate Filtering
- Pre-filters unsuitable files before worker distribution using std::async
- Integrates magic_numbers.hpp for accurate binary detection (86 compile-time patterns)
- Filters build artifacts (.o, .class, .pyc), libraries (.a, .so, .dll), executables, packages
- Chunk-based parallel processing for large candidate sets (>100 files)
- Performance: 2-4x speedup on large corpora

Fixed¶

Content-backed Fuzzy Hits: Content-derived fuzzy matches (_content entries) now map back to their owning documents, ensuring CLI searches show the expected files. (src/metadata/metadata_repository.cpp, tests/unit/metadata/metadata_repository_test.cpp)
Cold Start Vector Index Loading: Fixed issue where search and grep commands returned no results after daemon cold start despite having indexed documents.
Search Async Path: Fixed SearchCommand::executeAsync() not populating pathPatterns field in daemon request, causing server-side multi-pattern filtering to fail. The async code path (default execution) now correctly sends all include patterns to the daemon, matching the behavior of the sync path. (src/cli/commands/search_command.cpp:1360-1365)
Database Schema Compatibility: Fixed “constraint failed” errors during document insertion on databases with migration v12 (pre-path-indexing schema). The insertDocument() function now conditionally builds INSERT statements based on the hasPathIndexing_ flag, supporting both legacy (13-column) and modern (17-column with path indexing) schemas. This allows YAMS to work correctly regardless of whether migration v13 has been applied. (src/metadata/metadata_repository.cpp:318-380)
MCP Protocol Version Negotiation: Fixed “Unsupported protocol version requested by client” error (code -32901) by making protocol version negotiation permissive by default (strictProtocol_ = false). The server now gracefully accepts any protocol version requested by clients, falling back to the latest supported version (2025-03-26) if the requested version is not in the supported list. Also added intermediate MCP protocol versions (2024-12-05, 2025-01-15) to the supported list. This ensures maximum compatibility with MCP clients regardless of which spec version they implement. (src/mcp/mcp_server.cpp:560,1254-1260)
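The permissive negotiation can be sketched as follows. The version list contains only the versions named in this entry and may not be the server's complete list; negotiate() is a hypothetical name.

```cpp
#include <algorithm>
#include <array>
#include <optional>
#include <string_view>

// Versions named in the entry above, oldest first (possibly incomplete).
inline constexpr std::array<std::string_view, 3> kSupported{
    "2024-12-05", "2025-01-15", "2025-03-26"};

// Returns the version to use, or nullopt when strict mode rejects the request.
std::optional<std::string_view> negotiate(std::string_view requested,
                                          bool strictProtocol = false) {
    if (std::find(kSupported.begin(), kSupported.end(), requested) != kSupported.end()) {
        return requested;
    }
    if (strictProtocol) return std::nullopt;  // caller would report -32901
    return kSupported.back();                 // fall back to the latest version
}
```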
MCP Large Response Buffering: Fixed “Error: MCP -32602: Error: End of file” errors when the MCP server sends large responses (list, search, grep with many results). Implemented chunked buffered output in StdioTransport::sendFramedSerialized() that breaks payloads >512KB into 64KB chunks with explicit flushes between chunks. This prevents stdout buffer overflow and ensures reliable delivery of large JSON-RPC responses over stdio transport. Also added threshold-based routing in MCPServer::sendResponse() to use buffered sending for payloads >256KB. (src/mcp/mcp_server.cpp:69-95,169-203)

[v0.7.7] - 2025-11-07¶

Added¶

Hierarchical Embedding Architecture & Two-Stage Hybrid Search
Data model extensions for hierarchical embeddings
- Added EmbeddingLevel enum (CHUNK, DOCUMENT) to distinguish embedding granularity
- Extended VectorRecord with level, source_chunk_ids, parent_document_hash, child_document_hashes fields
- Modified embed_and_insert_document to generate document-level embeddings (normalized mean of chunk vectors)
- Document-level embeddings stored alongside chunk-level for two-stage search readiness
- Added twoStageVectorSearch method that retrieves broader candidate set and applies hierarchical boosting
- Configuration fields: enable_two_stage, doc_stage_limit, chunk_stage_limit, hierarchy_boost
- Groups results by document and boosts scores based on document-level similarity
- Wired into both parallel and sequential search paths for transparent operation
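The document-level embedding (normalized mean of chunk vectors) might be computed along these lines. L2 normalization is an assumption, and documentEmbedding() is a hypothetical name rather than the function used inside embed_and_insert_document.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Averages chunk vectors component-wise, then scales the mean to unit length.
// Returns an empty vector for empty input or mismatched dimensions.
std::vector<float> documentEmbedding(const std::vector<std::vector<float>>& chunks) {
    if (chunks.empty()) return {};
    const std::size_t dim = chunks.front().size();
    std::vector<float> mean(dim, 0.0f);
    for (const auto& c : chunks) {
        if (c.size() != dim) return {};
        for (std::size_t i = 0; i < dim; ++i) mean[i] += c[i];
    }
    float normSq = 0.0f;
    for (float& v : mean) {
        v /= static_cast<float>(chunks.size());
        normSq += v * v;
    }
    const float norm = std::sqrt(normSq);
    if (norm > 0.0f) {
        for (float& v : mean) v /= norm;  // assumed L2 normalization
    }
    return mean;
}
```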
Profiling build support for performance analysis
- New build type: ./setup.sh Profiling enables instrumentation for Tracy, Valgrind, Perf
- Builds to build/profiling directory with debug symbols + profiling hooks
- Fuzzing build stub: ./setup.sh Fuzzing reserved for future AFL++/libFuzzer integration
- See docs/developer/profiling.md for comprehensive profiling guide
EmbeddingService Architecture
Problem: PostIngestQueue workers were blocking on slow embedding generation, causing:
- Documents not searchable until embeddings complete
- Add commands hanging/timing out
- Ingest pipeline stalled waiting for embedding models
Solution: Separated embedding generation into dedicated EmbeddingService that consumes from InternalBus
- PostIngestQueue now 2-stage pipeline (Metadata + KnowledgeGraph) - embeddings removed
- Documents searchable immediately after FTS5 indexing (~milliseconds)
- Embeddings generated asynchronously in background by EmbeddingService workers
- Better resource isolation: ingest and embedding workers independently tunable
- No more blocking: add commands return immediately, documents queryable right away
ServiceManager & Daemon Lifecycle Improvements
Structured Concurrency: Replaced manual backpressure logic with std::counting_semaphore for natural bounded concurrency
SocketServer Improvements:
- Converted async_accept to as_tuple pattern, eliminating exception overhead during shutdown
- Connection future tracking for graceful shutdown with 2s timeout verification
Modern Error Handling: Consistent use of boost::asio::as_tuple(use_awaitable) for error codes instead of exceptions
Future Tracking: Replaced detached spawns with use_future for verifiable connection lifecycle management
Doctor Prune Command: Intelligent cleanup of build artifacts, logs, cache, and temporary files
Support for 9 build systems (CMake, Ninja, Meson, Make, Gradle, Maven, NPM/Yarn, Cargo, Go)
Detection across 10+ programming languages (C/C++, Java, Python, JavaScript, Rust, Go, OCaml, Haskell, Erlang, etc.)
Hierarchical category system: build-artifacts, build-system, logs, cache, temp, coverage, IDE
Extended package manager support: Added 9 new categories for package dependencies and caches
- IDE-specific: ide-vscode, ide-intellij, ide-eclipse for workspace caches
- Dependencies: package-node-modules (npm/yarn/pnpm), package-composer-vendor (PHP), package-cargo-target (Rust)
- Caches: package-python-cache (__pycache__/), package-maven-repo, package-gradle-cache, package-go-cache, package-gem-cache, package-nuget-cache
- Composite groups: package-deps, package-cache, packages (all), ide-all
- Path-based detection for directories: node_modules/, __pycache__/, .vscode/, target/, vendor/, etc.
Dry-run by default with --apply flag for execution
Usage: yams doctor prune --category build-artifacts --older-than 30d --apply
Usage: yams doctor prune --category packages --apply (clean all package artifacts)
Began expanding C++23 compatibility support
Migrated vectordb to https://github.com/trvon/sqlite-vec-cpp
Tree-sitter Symbol Extraction Plugin: Enhanced multi-language symbol extraction with Solidity support
Solidity Support: Added complete Solidity language support with 4 query patterns (functions, constructors, modifiers, fallback/receive)
Enhanced C++ Patterns: 16 function patterns + 6 class patterns including templates, constructors, destructors, operator overloads, method declarations inside class bodies
Multi-Language Improvements: Enhanced patterns for Python (decorated functions), Rust (impl/trait methods), JavaScript/TypeScript (arrow functions, generators, async), Kotlin (property declarations) across all 15 supported languages
Critical Bug Fix: Fixed query execution early-return bug that caused pattern short-circuiting - now executes all patterns resulting in 2.2x recall improvement (20.6% → 45.1%)
Benchmark Infrastructure: Catch2-based benchmark suite with quality metrics (Recall/Precision/F1), performance metrics (Throughput/Latency), and JSON output for CI integration
GTest Suite: 7 Solidity tests covering ERC20 tokens, inheritance, interfaces, events, and modifiers (372 lines, all passing)
Plugin auto-downloads tree-sitter grammars on first use (configurable via plugins.symbol_extraction.auto_download_grammars)
CLI commands: yams config grammar list/download/path/auto-enable/auto-disable
Supports tree-sitter v13-15 grammar versions
Entity Graph Service: Background service for extracting and materializing code symbols into Knowledge Graph
Wired into IndexingPipeline and RepairCoordinator for automatic symbol extraction
Supports plugin-based language-specific symbol extraction
Foundation for symbol-aware search and code intelligence features
Database Schema v16: Added symbol_metadata table for rich symbol information storage
Stores symbol definitions, references, and metadata from code analysis plugins
Indexed by document hash and symbol name for efficient lookups
Integrated with Knowledge Graph for entity relationship tracking
Migration includes tests for both schema changes and symbol metadata storage
Symbol-Aware Search Infrastructure: Enhanced search with symbol/entity detection and enrichment
SymbolEnricher class extracts rich metadata from Knowledge Graph (definitions, references, call graphs)
Symbol context includes type, scope, caller/callee counts, and related symbols
Hybrid Search Symbol Integration: Symbol metadata now actively boosts search ranking
- Added symbol_weight configuration field (default: 0.15 = 15% multiplicative boost)
- HybridSearchEngine::setSymbolEnricher() method wires SymbolEnricher into search pipeline
- Symbol matches receive score boost when isSymbolQuery && symbolScore > 0.3
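
The boost condition above can be sketched as follows. This is a minimal illustration, not the HybridSearchEngine code: SymbolBoostConfig and applySymbolBoost are hypothetical names, and reading "multiplicative boost" as score * (1 + symbol_weight) is an assumption. Only the 0.15 default and the isSymbolQuery && symbolScore > 0.3 condition come from the entries above.

```cpp
#include <cassert>

// Hypothetical config holder; only the 0.15 default and the 0.3 threshold
// are taken from the changelog.
struct SymbolBoostConfig {
    double symbolWeight = 0.15;   // 15% multiplicative boost
    double minSymbolScore = 0.3;  // boost applies only above this score
};

// Returns the boosted score when the query is a symbol query and the symbol
// match is strong enough; otherwise returns the base score unchanged.
// Assumption: the boost is applied as baseScore * (1 + symbolWeight).
inline double applySymbolBoost(double baseScore, bool isSymbolQuery, double symbolScore,
                               const SymbolBoostConfig& cfg = SymbolBoostConfig{}) {
    assert(cfg.symbolWeight >= 0.0);
    if (isSymbolQuery && symbolScore > cfg.minSymbolScore) {
        return baseScore * (1.0 + cfg.symbolWeight);
    }
    return baseScore;
}
```

Under these assumptions, a base score of 2.0 on a symbol query with symbolScore 0.5 becomes roughly 2.3, while a non-symbol query keeps 2.0.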

Fixed¶

Grep Command Duplicate Output: Fixed yams grep printing results twice when stderr is redirected
Migration System Crash (macOS): Fixed SIGSEGV crash in MigrationManager::recordMigration() during daemon startup
Root Cause: ServiceManager::co_migrateDatabase() called mm.initialize() but ignored its return value. If initialization failed to create the migration_history table, migrations would continue and crash when attempting to INSERT into the non-existent table.
Fix: Added error checking for mm.initialize() with early return and proper error logging in ServiceManager.cpp
Embedding System Architecture Simplification: Simplified FSM readiness logic to check provider availability directly instead of waiting for model load events
IModelProvider checks isAvailable() immediately after plugin adoption
Eliminates unnecessary ModelLoading state transition
Fixes “Embedding Ready: Waiting” status showing incorrectly when embeddings were actually available
Model dimension retrieved via getEmbeddingDim() at adoption time
Database Schema Recovery: Manual creation of missing kg_doc_entities table from migration 7
Table includes 8 columns with foreign keys to documents and kg_nodes
Created indexes: idx_kg_doc_entities_document, idx_kg_doc_entities_node
Fixes search query errors: “no such table: kg_doc_entities”
Worker Thread Premature Exit: Fixed io_context workers exiting immediately on startup by adding executor_work_guard to keep the context alive until explicit shutdown.
SocketServer Backpressure: Replaced manual backpressure polling with std::counting_semaphore, eliminating 5-20ms delay loops and providing natural bounded concurrency.
Embedding Consumer Deadlock: Fixed race condition causing embedding job consumer to stall
Added defensive retry mechanism with exponential backoff for queue state recovery
Impact: Embedding background processing now reliable under high load
FSM Cleanup & Degradation Tracking: Standardized FSM usage across ServiceManager
Added DaemonLifecycleFsm reference to ServiceManager for centralized subsystem degradation tracking
ServiceManager FSM Architecture: Centralized state management and eliminated duplication
Added DaemonLifecycleFsm& lifecycleFsm_ reference to ServiceManager for daemon-level degradation tracking
Removed scattered manual FSM state checks in favor of FSM query methods (isReady(), isLoadingOrReady())
Text Extraction for Source Code: Fixed critical issue where JavaScript/TypeScript/Solidity/config files failed FTS5 extraction
- src/extraction/extraction_util.cpp: Replaced hardcoded is_text_like() with FileTypeDetector::isTextMimeType() which uses comprehensive magic_numbers.hpp database; added extension normalization (handles both .js and js formats)
- src/extraction/plain_text_extractor.cpp:
- Removed hardcoded 50+ extension list, delegating to FileTypeDetector for dynamic detection
- Enhanced isBinaryFile() with UTF-8 BOM support and reduced false positives
- Added isParseableText() with proper UTF-8 validation (validates multi-byte sequences, continuation bytes)
- Baseline registration now includes common config/markup extensions (.toml, .ini, .yml, .md, .rst)
- src/app/services/search_service.cpp: Updated lightweight indexing to use FileTypeDetector::isTextMimeType()
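
The UTF-8 validation mentioned for isParseableText() (multi-byte sequences, continuation bytes) could look roughly like the sketch below. The function name isValidUtf8 and the exact checks are assumptions for illustration; the actual plain_text_extractor.cpp logic is not shown in this changelog.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Returns true when `data` is well-formed UTF-8. A leading BOM (EF BB BF)
// is itself valid UTF-8, so it needs no special handling here.
inline bool isValidUtf8(std::string_view data) {
    const std::size_t n = data.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(data[i]);
        if (lead < 0x80) {  // single-byte ASCII
            ++i;
            continue;
        }
        std::size_t extra = 0;    // continuation bytes expected
        unsigned long minCp = 0;  // smallest code point allowed for this length
        unsigned long cp = 0;
        if ((lead & 0xE0) == 0xC0) {
            extra = 1; minCp = 0x80; cp = lead & 0x1F;
        } else if ((lead & 0xF0) == 0xE0) {
            extra = 2; minCp = 0x800; cp = lead & 0x0F;
        } else if ((lead & 0xF8) == 0xF0) {
            extra = 3; minCp = 0x10000; cp = lead & 0x07;
        } else {
            return false;  // stray continuation byte or invalid lead byte
        }
        if (n - i - 1 < extra) return false;  // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(data[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < minCp) return false;                     // overlong encoding
        if (cp > 0x10FFFF) return false;                  // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // surrogate half
        i += extra + 1;
    }
    return true;
}
```

A check of this kind rejects lone continuation bytes and truncated sequences, which is one plausible way to separate parseable text from binary content.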

Changed¶

Embedding Provider Lifecycle: Transitioned from event-driven model loading to direct availability checking
Provider adoption now immediately dispatches ModelLoadedEvent if isAvailable() returns true
Simplified from 4-state FSM (Unavailable → ProviderAdopted → ModelLoading → ModelReady) to immediate ready transition
Aligns FSM with IModelProvider on-demand model loading architecture

Removed¶

Fuzzy Index Memory Optimization: Enhanced BK-tree index building with intelligent document prioritization
Uses metadata and Knowledge Graph to rank documents by relevance (tagged > KG-connected > recent > code files)
Limits index to 50,000 documents by default (configurable via YAMS_FUZZY_INDEX_LIMIT environment variable)
Graceful degradation with std::bad_alloc handling prevents daemon crashes on large repositories
Known Limitation: Fuzzy search on very large repositories (>100k documents) may experience memory pressure. Consider using metadata/KG filters or grep with exact patterns for better performance
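
Reading the fuzzy index limit from the environment might be sketched as below. Only the variable name YAMS_FUZZY_INDEX_LIMIT and the 50,000 default come from this entry; the helper names and the fallback rules (unset, empty, non-numeric, or zero values fall back to the default) are assumptions.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cstddef>
#include <cstdlib>

constexpr std::size_t kDefaultFuzzyIndexLimit = 50000;

// Parses a raw environment value. Assumption: anything that is not a plain
// positive decimal integer falls back to the default.
inline std::size_t parseFuzzyIndexLimit(const char* raw,
                                        std::size_t fallback = kDefaultFuzzyIndexLimit) {
    if (raw == nullptr || !std::isdigit(static_cast<unsigned char>(*raw))) {
        return fallback;  // unset, empty, signed, or non-numeric
    }
    char* end = nullptr;
    errno = 0;
    const unsigned long long value = std::strtoull(raw, &end, 10);
    if (errno != 0 || *end != '\0' || value == 0) {
        return fallback;  // overflow, trailing junk, or zero
    }
    return static_cast<std::size_t>(value);
}

// Convenience wrapper that reads the documented environment variable.
inline std::size_t fuzzyIndexLimitFromEnv() {
    return parseFuzzyIndexLimit(std::getenv("YAMS_FUZZY_INDEX_LIMIT"));
}
```
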
ONNX Plugin Model Path Resolution: Enhanced model path search to support XDG Base Directory specification
Platform-Aware Plugin Installation: Build system now auto-detects Homebrew prefix on macOS
/opt/homebrew on Apple Silicon, /usr/local on Intel Macs and Linux
System plugin directory automatically trusted by daemon at runtime
Override via YAMS_INSTALL_PREFIX environment variable
Model loading timeouts hardened: adapter and ONNX plugin now use std::async with bounded wait; removed detached threads causing UAF/segfaults (AsioConnectionPool guarded)
Vector DB dim resolution no longer hardcodes 384; resolves from DB/config/env/provider preferred model, else warns and defers embeddings
ONNX plugin: removed implicit 384 defaults, derives embeddingDim dynamically from model/config; added env override YAMS_ONNX_PRECREATE_RESOURCES
Improved load diagnostics: detailed logs for ABI table pointers, phases, and timeout causes
Search Service Path Heuristic: Tightened path-first detection to only trigger for single-token or quoted path-like queries (slashes, wildcards, or extensions). Multi-word queries now proceed to hybrid/metadata search, restoring results for phrases such as "docs/delivery backlog prd tasks PBI" while preserving fast path lookups for actual paths.
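
The tightened heuristic can be illustrated with a small sketch. The function names and the exact rules below are assumptions based on the description above (single-token or quoted queries that contain slashes, wildcards, or an extension); the real search service may differ in detail.

```cpp
#include <cassert>
#include <cctype>
#include <string_view>

// True when the token ends in ".<alnum...>", for example "README.md".
inline bool hasExtension(std::string_view token) {
    const auto dot = token.rfind('.');
    if (dot == std::string_view::npos || dot == 0 || dot + 1 >= token.size()) {
        return false;
    }
    for (const char c : token.substr(dot + 1)) {
        if (!std::isalnum(static_cast<unsigned char>(c))) return false;
    }
    return true;
}

// Hypothetical path-first check: multi-word queries qualify only when the
// whole query is wrapped in double quotes.
inline bool looksLikePathQuery(std::string_view query) {
    const bool quoted = query.size() >= 2 && query.front() == '"' && query.back() == '"';
    const std::string_view body = quoted ? query.substr(1, query.size() - 2) : query;
    if (body.empty()) return false;
    const bool multiWord = body.find_first_of(" \t") != std::string_view::npos;
    if (multiWord && !quoted) return false;  // phrases go to hybrid/metadata search
    const bool hasSlash = body.find('/') != std::string_view::npos;
    const bool hasWildcard = body.find_first_of("*?") != std::string_view::npos;
    return hasSlash || hasWildcard || hasExtension(body);
}
```

With this sketch, the phrase docs/delivery backlog prd tasks PBI is not treated as a path because it is multi-word and unquoted, while src/main.cpp still takes the fast path.
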
Daemon stop reliability: yams daemon stop now only reports success after the process actually exits and will fall back to PID-based termination (and orphan cleanup) when the socket path is unresponsive.
Prompt termination on signals: the daemon now handles SIGTERM/SIGINT to exit promptly when graceful shutdown isn’t possible, addressing lingering yams-daemon processes after stop.
Hybrid Search Simplification: Removed complexity and environment variable overrides
Removed 6 environment variables: YAMS_DISABLE_KEYWORD, YAMS_DISABLE_ONNX, YAMS_DISABLE_KG, YAMS_ADAPTIVE_TUNING, YAMS_FUSION_WEIGHTS, YAMS_OVERRELIANCE_PENALTY
Kept YAMS_DISABLE_VECTOR for CI compatibility
Removed adaptive weight tuning logic (~30 LOC)
Removed over-reliance penalty mechanism
Keyword search now always executes (controlled by config.keyword_weight)
Fixed fusion weights for LEARNED_FUSION strategy: {-2.0f, 3.0f, 2.0f, 1.5f, 1.0f}

Removed¶

Removed WASM and the legacy plugin system from the codebase and ServiceManager

[v0.7.6] - 2025-10-13¶

Added¶

CLI Pattern Ergonomics: Added --pattern/-p flag to list command as an alias for --name, improving consistency with other commands. The flag supports glob wildcards (*, ?, **) and auto-normalizes relative paths to absolute when no wildcards are present. (src/cli/commands/list_command.cpp)
Grep Literal Text Hints: Added smart error detection and helpful hints when grep patterns contain regex special characters. When a pattern fails regex compilation or returns no results, grep now suggests using the -F flag with the exact command to run. Added -Q as a short alias for -F/--fixed-strings/--literal-text to match git grep convention. (src/cli/commands/grep_command.cpp)
Search Literal Text Aliases: Added -F/-Q/--fixed-strings aliases to search command for consistency with grep. These short flags make it easier to search for literal text containing special characters like ()[]{}.*+?. Updated help text with concrete examples. (src/cli/commands/search_command.cpp)
Grep: enabling [search.path_tree] now lets explicit path filters reuse the metadata-backed path-tree engine, and tag-only invocations default the pattern to .*, removing the need for placeholder expressions. (src/app/services/grep_service.cpp:240, src/cli/commands/grep_command.cpp:305)
Tree-based List with Filters: Extended tree-based path queries to support tag, MIME type, and extension filtering. The list command now uses the tree index even when filters are applied, improving performance for pattern+filter queries (e.g., yams list --name "docs/**" --tags "test").
Benchmark: Tree List Filters: New benchmark suite (tree_list_filter_bench) measures query performance with various filter combinations. Results show 100-160μs query times with up to 10k queries/sec throughput. Filter queries often outperform path-only queries due to reduced result set sizes.
Grep SQL-Level Pattern Filtering: Added queryDocumentsByGlobPatterns() function that converts glob patterns (e.g., tests/**/*.cpp) to SQL LIKE patterns and queries the database directly, eliminating the need to load all documents into memory before filtering. Grep performance with --include patterns improved dramatically on large repositories.
Search Multi-Pattern Support: Added pathPatterns vector field to SearchRequest IPC protocol (field 34) enabling server-side filtering of multiple include patterns. Search command now sends all patterns to daemon instead of filtering results client-side, eliminating timeouts and OOM errors in sandboxed environments.
MCP Search Multi-Pattern Support: Added include_patterns array parameter to MCP search tool, enabling clients to specify multiple path patterns with OR logic. The MCP server now populates pathPatterns in daemon requests, matching CLI behavior. (include/yams/mcp/tool_registry.h, src/mcp/mcp_server.cpp)

Changed¶

MCP stdio transport: stdout buffering now adapts to interactive vs non-interactive streams, stderr is forced unbuffered, and JSON-RPC batch arrays over stdio are parsed in-line to match the Model Context Protocol 2025-03-26 transport requirements. Additional unit coverage exercises batch handling and error budgets for framed headers.
MCP Server: cat and get tools now resolve relative paths using weakly_canonical, improving document lookup for non-absolute paths.
Path Canonicalization: Document paths are now canonicalized using weakly_canonical() at ingestion time to ensure consistent path matching across symlinked directories (e.g., /var → /private/var on macOS). This fixes pattern-based queries that previously failed due to path mismatch between indexed and query paths. (src/metadata/path_utils.cpp)
Integration Test Stability: Improved TreeBasedListE2E test reliability by replacing fixed sleep with polling-based wait for document indexing completion. Test pass rate improved from ~60% to 100%.
Grep Performance: Grep service now uses SQL-level pattern filtering when --include patterns are provided, fetching only matching documents from the database instead of loading all documents and filtering in memory. Converts glob patterns to SQL LIKE patterns (e.g., *.cpp → %.cpp, tests/**/*.h → tests/%.h). This eliminates hangs on large repositories (10K+ documents).
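
A glob-to-LIKE conversion that reproduces the two examples above might look like this. It is a sketch under stated assumptions, not the queryDocumentsByGlobPatterns() implementation: globToLike is a hypothetical name, `*` and `**/` both map to `%`, `?` maps to `_`, adjacent `%` are collapsed, and literal `%` or `_` in the glob are not escaped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Converts a glob pattern into an SQL LIKE pattern (see assumptions above).
inline std::string globToLike(std::string_view glob) {
    std::string out;
    out.reserve(glob.size());
    for (std::size_t i = 0; i < glob.size(); ++i) {
        const char c = glob[i];
        if (c == '*') {
            // Consume the whole run of '*'; for "**/" also consume the slash.
            std::size_t j = i;
            while (j + 1 < glob.size() && glob[j + 1] == '*') ++j;
            const bool doubleStar = j > i;
            if (doubleStar && j + 1 < glob.size() && glob[j + 1] == '/') ++j;
            // Emit a single '%' and collapse it with a preceding one.
            if (out.empty() || out.back() != '%') out.push_back('%');
            i = j;
        } else if (c == '?') {
            out.push_back('_');  // single-character wildcard
        } else {
            out.push_back(c);
        }
    }
    return out;
}
```

Because `%` in LIKE also matches `/`, the SQL pattern can select more rows than the glob strictly allows, so a final glob match on the returned candidates may still be needed.
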
Search Service: Updated to handle multiple path patterns via pathPatterns vector field, iterating through all patterns with OR logic for server-side filtering. Removed client-side filtering that previously caused timeouts with multiple --include patterns.
Build System: Fixed VS Code task definitions with correct Conan 2.x output paths. Meson native file paths updated from builddir/conan_meson_native.ini to builddir/build-debug/conan/conan_meson_native.ini and build/release/build-release/conan/conan_meson_native.ini to match actual Conan 2 directory structure. (.vscode/tasks.json)

Fixed¶

MCP Search/Grep Hang: Fixed MCP server’s search and grep tools hanging indefinitely by ensuring all async components use the same io_context. The root cause was a multi-layered async execution mismatch: (1) The DaemonClient’s config didn’t specify an executor, causing it to create its own internal io_context. (2) callTool() was spawning work on GlobalIOContext and blocking a worker thread waiting for results. (3) The two separate io_contexts couldn’t communicate, causing deadlock. Fixed by: (a) Configuring daemon_client to use GlobalIOContext executor in MCPServer constructor. (b) Removing the nested local io_context from callTool() - now correctly spawns on GlobalIOContext (which has background worker threads) and waits for results. (c) Removing nested io_context from handleSearchDocuments() to use co_await directly. (src/mcp/mcp_server.cpp:403,2029-2039,2225-2226)
Document Service: Improved resolveNameToHash to correctly handle filename-only lookups by searching for paths ending with the given name, ensuring that commands like cat with a simple filename succeed.
Tree Query Pattern Matching: Fixed wildcard pattern parsing to correctly handle recursive patterns (/**) by stripping all trailing wildcards iteratively instead of checking for a single wildcard character. (src/app/services/document_service.cpp)
Grep Hang: Fixed grep command hanging indefinitely when using --include patterns on large repositories. The service was fetching all documents before filtering; now uses SQL-level pattern matching to fetch only relevant documents.
Search Timeout: Fixed search command timeouts/OOM when using multiple --include patterns. Previously only the first pattern was sent to daemon with remaining patterns filtered client-side after retrieving ALL results. Now all patterns are sent to daemon for server-side filtering.
Search Pattern Matching: Fixed glob pattern normalization to correctly match filename patterns like *.md and *.cpp anywhere in the path tree. Patterns starting with a single * (e.g., *.ext) are now automatically prefixed with **/ to match paths at any depth. This ensures patterns like

[v0.7.4] - 2025-10-10¶

Changed¶

MetadataRepository: Added atomic counters (cachedDocumentCount_, cachedIndexedCount_, cachedExtractedCount_) updated on every insert/delete/update operation. Eliminated 3 COUNT(*) queries from hot path (220-400ms → <1μs)
VectorDatabase: Added cachedVectorCount_ atomic counter updated on insert/delete operations. Eliminated COUNT(*) query from getVectorCount()
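
The cached-counter approach in the two entries above can be sketched as follows. DocumentCountCache and its methods are hypothetical; the sketch assumes the counter is seeded once (for example from a single COUNT(*) at startup) and then maintained on every insert and delete.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for members such as cachedDocumentCount_.
class DocumentCountCache {
public:
    // Seed from a one-time count so later reads never touch the database.
    void seed(std::uint64_t initial) noexcept {
        count_.store(initial, std::memory_order_relaxed);
    }
    void onInsert() noexcept { count_.fetch_add(1, std::memory_order_relaxed); }
    // Decrement with a CAS loop so the counter never wraps below zero.
    void onDelete() noexcept {
        std::uint64_t cur = count_.load(std::memory_order_relaxed);
        while (cur > 0 &&
               !count_.compare_exchange_weak(cur, cur - 1, std::memory_order_relaxed)) {
        }
    }
    // Lock-free read used in place of a COUNT(*) query.
    std::uint64_t count() const noexcept { return count_.load(std::memory_order_relaxed); }

private:
    std::atomic<std::uint64_t> count_{0};
};
```
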
ServiceManager Concurrency: Converted searchEngineMutex_ from std::mutex to std::shared_mutex enabling N concurrent readers with single exclusive writer. Allows parallel status requests without serialization bottleneck
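
The reader/writer pattern described above can be shown with a minimal sketch. SearchEngine and SearchEngineHolder are hypothetical stand-ins, not ServiceManager types; the point is only that readers take a std::shared_lock while the rare writer takes a std::unique_lock.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>

// Hypothetical placeholder for the real search engine type.
struct SearchEngine {
    std::string buildReason;
};

class SearchEngineHolder {
public:
    // Many readers (for example status requests) may hold the lock at once.
    std::shared_ptr<const SearchEngine> get() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return engine_;
    }
    // A rebuild swaps the engine under an exclusive lock.
    void set(std::shared_ptr<const SearchEngine> engine) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        engine_ = std::move(engine);
    }

private:
    mutable std::shared_mutex mutex_;
    std::shared_ptr<const SearchEngine> engine_;
};
```
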
Status Request Optimization: Removed blocking VectorDatabase initialization from hot path. Status handler now reports readiness accurately without attempting to “fix” uninitialized state, eliminating 1-5s blocking operations
Performance: Sequential request throughput improved to ~1960 req/s with sub-millisecond latency (avg: 0.02ms, max: 1ms). First connection latency: 2ms. Daemon readiness validation added to prevent test methodology races with initialization
Document Retrieval Optimization: Replaced O(n) full table scans with O(log n) indexed lookups in cat/get operations. Changed from queryDocumentsByPattern('%') → getDocumentByHash(hash) eliminating 120K+ document scans per retrieval (lines 850, 934 in document_service.cpp)
Name Resolution Fix: Fixed pattern generation for basename-only queries. Now generates '%/basename' pattern FIRST to use containsFragment query instead of failing exactPath (path_hash) match. yams get --name and yams cat <name> now work correctly
Grep FTS-First: Optimized grep to START with FTS5 index search for literal patterns before falling back to full document scan. Regex patterns still use full scan. Significantly improves grep performance on large repositories
ONNX Plugin: Upgraded the ONNX plugin to conform to the modern model_provider_v1 (v1.2) interface specification.
Enhanced ui_helpers.hpp with 30+ new utilities: value formatters (format_bytes, format_number, format_duration, format_percentage), status indicators (status_ok, status_warning, status_error), table rendering (Table, render_table), progress bars, text utilities (word wrap, centering, indentation)
Improved yams status with color-coded severity indicators, human-readable formatting, and sectioned layout
Enhanced yams daemon status with humanized counter names (CAS, IPC, EMA, DB acronyms preserved), smart byte/number formatting
Added yams daemon status -d detailed view with storage overhead breakdown showing disk usage by component (CAS blocks, ref counter DB, metadata DB, vector DB, vector index) with overhead percentage relative to content

Deprecated¶

MCP get_by_name tool: Use get tool with name parameter instead. The get tool now smartly handles both hash and name lookups with optimized pattern matching

Fixed¶

Streaming Protocol Bug: Fixed critical bug where GetResponse/CatResponse sent header-only frame (empty content) followed by data frame, causing CLI to process first frame and fail. Added force_unary_response check in request_handler.cpp to disable streaming for these response types, forcing single complete frame transmission
Protobuf Schema: Added missing bool has_content = 6 field to GetResponse message in ipc_envelope.proto. Updated serialization to explicitly set/read flag instead of recalculating, preventing desync between daemon and CLI
Daemon: Fixed a regression in the plugin loader that prevented legacy model provider plugins (like the ONNX provider) from being correctly detected and adopted. The loader now includes a fallback to detect and register providers using the legacy getProviderName/createProvider symbols, restoring embedding generation functionality.
Grep Service: Fixed critical bug where --paths-only mode returned all candidate documents without checking pattern matches, causing incorrect “(no results)” responses. Removed premature fast-exit optimization; grep now properly runs pattern matching and returns only files that match. (Issue: 135K docs indexed but grep returned empty, audit revealed fast-exit bypassed validation)
Grep CLI: Fixed session pattern handling bug where session include patterns were incorrectly used as document selectors instead of result filters. Session patterns now properly merged into includePatterns for filtering, not paths for selection. This prevented grep from finding any results when a session was active.

[v0.7.3] - 2025-10-08¶

Added¶

Bench: minimal daemon warm-start latency check moved to an opt-in bench target and suite.
New standalone binary tests/yams_bench_daemon_warm executes a bounded start/sleep/stop cycle with vectors disabled and tight init timeouts; asserts <5s end-to-end.
Meson test registered as bench_daemon_warm_latency in the yams:bench suite.
Disabled by default in CI; enable by setting RUN_DAEMON_WARM_BENCH=true (workflow env) and YAMS_ENABLE_DAEMON_BENCH=1 (step env) to run only this bench.
Tree-Diff Metadata & Retrieval Modernization🎉
Tree-based snapshot comparison: Implemented Merkle tree-based diff algorithm for efficient snapshot comparison with O(log n) subtree hash optimization for unchanged directories.
Rename detection: Hash-based rename/move detection with ≥99% accuracy, enabled by default in yams diff command.
Knowledge Graph integration: Path and blob nodes with version edges and rename tracking via fetchPathHistory() API.
Enhanced graph command: yams graph now queries KG store for same-content relationships and rename chains.
Tree diff as default: yams diff uses tree-based comparison by default; --flat-diff flag available for legacy behavior.
RPC/IPC exposure: Added ListTreeDiff method to daemon protocol (protobuf + binary serialization).

Changed¶

Daemon Async Architecture: Unified on modern Boost.Asio 1.82+ patterns with C++20 coroutines (asio::awaitable)
Single io_context with work guard for all async operations
Strands for logical separation (init, plugin, model domains)
RAII cleanup guards for automatic resource management
Error codes via as_tuple instead of exceptions for hot paths
Semaphore-based bounded concurrency instead of manual atomic flags
Compression-first retrieval: DocumentService, CLI, and daemon IPC now default to returning compressed payloads with full metadata (algorithm, CRC32s, sizes)
Path query pipeline: Replaced the legacy findDocumentsByPath helper with the normalized queryDocuments API and the shared queryDocumentsByPattern utility. All services (daemon, CLI, MCP, mobile bindings, repair tooling, vector ingestion) now issue structured queries that leverage the path_prefix, reverse_path, and path_hash indexes plus FTS5 for suffix matches, eliminating full-table LIKE scans.
Schema migration: Migration v13 (Add path indexing schema) continues to govern the derived columns/indices; applying this release replays the up hook in place (normalizing existing rows and rebuilding the FTS table), so existing deployments automatically benefit from the optimized lookups after the usual migration step.
CLI Retrieval (get/cat): partial-hash resolution now routes through RetrievalService using the daemon’s streaming search and the metadata-layer hash-prefix index.
yams get and yams cat accept hexadecimal hash prefixes of 6–64 characters; ambiguity can be resolved via --latest/--oldest. No more local metadata table scans; latency improves, especially on large catalogs.
Internals: RetrievalService::resolveHashPrefix consumes SearchService hash results and applies newest/oldest selection hints; GetCommand validates and normalizes hash input before issuing a daemon Get.
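
The hash-input validation mentioned above might be sketched like this. normalizeHashPrefix is a hypothetical helper; the 6–64 hex-character range comes from the entry above, while lowercasing as the normalization step is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <string_view>

// Returns the lowercase prefix when `input` is 6 to 64 hex characters,
// or std::nullopt when it cannot be a valid hash prefix.
inline std::optional<std::string> normalizeHashPrefix(std::string_view input) {
    if (input.size() < 6 || input.size() > 64) return std::nullopt;
    std::string out;
    out.reserve(input.size());
    for (const char c : input) {
        const auto uc = static_cast<unsigned char>(c);
        if (!std::isxdigit(uc)) return std::nullopt;
        out.push_back(static_cast<char>(std::tolower(uc)));
    }
    return out;
}
```

Rejecting malformed input on the client side avoids a daemon round trip for queries that can never match a stored hash.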

Fixed¶

Daemon IPC: Fixed a regression in the grep IPC protocol where GrepRequest and GrepResponse messages were not fully serialized, causing data loss. The protocol definitions and serializers have been updated to correctly handle all fields, including show_diff in requests and detailed statistics in responses.
Indexing: Fixed an issue where updated files were not being re-indexed. The change detection logic now correctly considers file modification time and size, in addition to content hash, to reliably identify changes.
Indexing: Corrected the document update process to prevent duplicate records for the same file path when a file is updated. The indexer now properly distinguishes between new documents and updates to existing ones.
Daemon IPC: Fixed an issue where search and grep commands could time out without producing output by improving the efficiency of the daemon’s streaming response mechanism.
Daemon IPC: Optimized non-multiplexed communication paths to prevent performance issues and potential timeouts with large responses from commands like get and cat.

[v0.7.2] - 2025-10-03¶

Added¶

Automatic directory snapshot generation with ISO 8601 timestamp IDs and git metadata detection (commit, branch, remote). Every yams add <directory> now creates a timestamped snapshot stored in the tree_snapshots table.
Snapshot Listing: New yams list --snapshots command displays all available snapshots with table and JSON output formats, showing snapshot IDs, directory paths, labels, git commits, and file counts.
Implemented yams diff <snapshotA> <snapshotB> command with tree, flat, and JSON output formats for comparing directory snapshots.
TreeDiffer automatically detects renamed/moved files via SHA-256 hash equivalence matching, enabled by default.
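
Rename detection by hash equivalence can be sketched as pairing removed and added entries that share a content hash. The types and the detectRenames function below are hypothetical and much simpler than TreeDiffer; the pairing order for duplicate hashes is an arbitrary choice made for this illustration.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct FileEntry {
    std::string path;
    std::string hash;  // content hash, e.g. SHA-256 hex
};

struct Rename {
    std::string from;
    std::string to;
};

// Pairs each added entry with an unmatched removed entry of the same hash.
// Entries left unpaired remain plain additions or deletions.
inline std::vector<Rename> detectRenames(const std::vector<FileEntry>& removed,
                                         const std::vector<FileEntry>& added) {
    std::unordered_map<std::string, std::vector<std::string>> removedByHash;
    for (const auto& entry : removed) {
        removedByHash[entry.hash].push_back(entry.path);
    }
    std::vector<Rename> renames;
    for (const auto& entry : added) {
        auto it = removedByHash.find(entry.hash);
        if (it == removedByHash.end() || it->second.empty()) continue;
        renames.push_back({it->second.back(), entry.path});
        it->second.pop_back();  // each removed path is used at most once
    }
    return renames;
}
```
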

Changed¶

Snapshot Labels: yams add now accepts optional --label flag for human-readable snapshot names.
Indexing Service: Enhanced to persist snapshot metadata (snapshot_id, directory_path, git metadata, file count) to database after directory ingestion.
Metadata Repository: Added upsertTreeSnapshot(), listTreeSnapshots(), and tree diff persistence methods for snapshot and change history management.
Search: Parallelized keyword search scoring loop to significantly improve performance on multi-core systems.
Search: Search thread pools are now configured by the central TuningManager to adapt to system load and tuning profiles.
Search: Implemented structural scoring to boost relevance of results that are co-located in the same directory.
Added FTS5 readiness fast-path check in getByNameSmart() to prevent 3-second blocking timeouts when search indexes are updating.
Added post_ingest_queue_depth field to status response, enabling clients to check if FTS5 indexes are ready before attempting expensive search operations.
TUI browse command now resolves listings and fuzzy search through the shared AppContext service bundle (TUIServices + IDocumentService/ISearchService), with graceful fallback to metadata/content-store paths when the daemon is degraded.
CLI Browse: Shift+R reindex dialog now performs a full extraction + index refresh through TUIServices::reindexDocument, providing inline success/error feedback instead of the previous placeholder flow.

Fixed¶

Daemon IPC: SocketServer now shares a live writer-budget reference with every connection and the tuning manager pushes updates through it. Multiplexed streams adjust bandwidth limits immediately when profiles or runtime heuristics change.
Search: Corrected an issue where yams search --include was not being applied for hybrid searches. The include pattern is now passed to the daemon and correctly filters results.
Fixed protobuf UTF-8 validation errors when grepping binary files or non-UTF-8 text. Changed GrepMatch.line, context_before, and context_after fields from string to bytes type in protobuf definition. This allows grep to handle arbitrary byte sequences including binary content, Latin-1, Windows-1252, and other legacy encodings without validation failures. (PBI-001, task 001-33)
Daemon IPC: replaced the io_context.run_for polling loop with dedicated run_one workers so async accept completions are no longer starved during streaming requests. Added optional diagnostic thread (YAMS_SOCKET_RUN_DIAG) for debugging.
CLI Browse: refuse to launch the FTXUI browser when the terminal is non-interactive, lacks TERM capabilities, or is smaller than 60x18; emit a clear resize guidance message instead of hanging or crashing.
CLI Search: release pooled daemon clients before process teardown to prevent the std::system_error: mutex lock failed abort when yams search exits after hitting the daemon path.

[v0.7.1] - 2025-09-29¶

Changed¶

GrepService: expanded candidate discovery to preselect from req.paths using SQL LIKE prefix scans, aligning service behavior with CLI expectations for directory patterns.
RepairCoordinator refocus: on live DocumentAdded events, skip queuing when the post‑ingest
Post‑ingest pipeline: improvements
ServiceManager enqueue path: simplified enqueuePostIngest to a direct blocking enqueue. This improves predictability and throughput under high load.
CLI Download UX: yams download now clearly displays the ingested content hash

Fixed¶

GrepService streaming: flushes the final partial line when scanning cold CAS streams so single-line files are matched reliably (e.g., hello.txt).
Reduced GrepService log verbosity to debug for internal counters and match traces.
Fixed IPC protocol regression where grep and list commands failed to properly communicate with the daemon after migration, causing incomplete results or timeouts in multi-service environments.
This issue also affected the result output of other tools.
Guarded compression monitor global statistics with a dedicated mutex to stop concurrent tracker updates from crashing unit_shard5 (validated via meson test -C build/debug unit_shard5 --print-errorlogs).
Repaired the document_service metadata pipeline regression so fixture-driven search tests no longer observe missing extracted content.
MCP stdio transport: replaced unused static output mutex with an instance mutex to satisfy ODR/build on certain platforms.

[v0.7.0] - 2025-09-25¶

Highlights¶

These changes reduce CPU spikes observed in profiles for large greps and remove blocking storage scans from interactive status paths. Post-ingest work is intentionally bounded; processing may take longer, but overall system responsiveness improves.
Stability: resolved connection timeouts under multi-agent load by removing the hard 100-connection cap and deriving a dynamic accept limit. Defaults honor YAMS_MAX_ACTIVE_CONN or compute a safe cap from CPU cores and IO concurrency.
Throughput: added tuning profiles (efficient | balanced | aggressive). Profiles modulate pool growth, IO thresholds, and post-ingest workers. Default is balanced.
Indexing UX: Add/ingest returns fast; post‑ingest queue handles FTS/embeddings/KG in the background. Path‑series versioning (Phase 1) is on by default behind an env flag.

Added¶

Tuning profiles selectable via config or env:
Config: yams config set tuning.profile <efficient|balanced|aggressive>
Env: YAMS_TUNING_PROFILE=<profile>
Config defaults now include [tuning] profile = "balanced".
Docs: docs/admin/tuning_profiles.md covering profiles, envs, and observability.
Versioning (Phase 1): path‑series lineage with VersionOf edges and metadata flags version, is_latest, series_key. Duplicate (same hash) re‑ingest does not create a new version; alternate locations and timestamps are updated.
CLI Search: grouped multi‑version presentation (default on) with new controls.
Groups results by canonical path when multiple versions of the same file are returned.
New flags:
- --no-group-versions — disable grouping and show the flat list.
- --versions <latest|all> — choose best only (default: latest) or list versions per path.
- --versions-topk <N> — cap versions shown per path when --versions=all (default: 3).
- --versions-sort <score|path|title> — sort versions within a group (default: score).
- --no-tools — hide per‑version tool hints.
- --json-grouped — emit grouped JSON; plain --json remains flat and backward compatible.
Tool hints shown per version (when grouped): yams get --hash <hash> | yams cat --hash <hash> | yams restore --hash <hash>; if a local file path is resolved, a yams diff --hash <hash> <local-path> hint is added.
Environment toggles: YAMS_NO_GROUP_VERSIONS=1 and YAMS_NO_GROUP_TOOLS=1 to flip defaults.
Note: This is a presentation‑layer change; service/daemon APIs are unchanged.
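
The grouped presentation can be sketched roughly as follows. `SearchHit`, `VersionGroup`, and `groupVersions` are hypothetical names, not the CLI's actual types. The sketch assumes each hit already carries its canonical path, sorts versions by score only (the `--versions-sort` default), and orders groups by their best score, which is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result row: canonical path, content hash, relevance score.
struct SearchHit {
    std::string path;
    std::string hash;
    double score;
};

struct VersionGroup {
    std::string path;
    std::vector<SearchHit> versions;  // best score first
};

// showAll=false mimics --versions latest (best only);
// showAll=true mimics --versions all, capped by topK (--versions-topk).
inline std::vector<VersionGroup> groupVersions(const std::vector<SearchHit>& hits,
                                               bool showAll, std::size_t topK = 3) {
    std::map<std::string, std::vector<SearchHit>> byPath;
    for (const auto& hit : hits) byPath[hit.path].push_back(hit);

    std::vector<VersionGroup> groups;
    for (auto& [path, versions] : byPath) {
        std::stable_sort(versions.begin(), versions.end(),
                         [](const SearchHit& a, const SearchHit& b) { return a.score > b.score; });
        // Always keep at least one version so every group has a best hit.
        const std::size_t keep = showAll ? std::max<std::size_t>(topK, 1) : 1;
        if (versions.size() > keep) versions.erase(versions.begin() + keep, versions.end());
        groups.push_back({path, std::move(versions)});
    }
    // Assumption: groups are presented by their best-scoring version.
    std::stable_sort(groups.begin(), groups.end(),
                     [](const VersionGroup& a, const VersionGroup& b) {
                         return a.versions.front().score > b.versions.front().score;
                     });
    return groups;
}
```

Because grouping happens after the hits are returned, a sketch like this needs no change to the service or daemon APIs, consistent with the note above.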

Changed¶

Build System
The primary build system has been migrated from CMake to Meson. All build, test, and packaging scripts have been updated to use the new Meson-based workflow.
Status/Stats (CLI): use daemon metrics by default and never trigger local storage scans.
yams status and yams stats -v now render from the same non-detailed daemon snapshot; removed the “scanning storage…” spinner and filesystem walks.
Verbose output formats the JSON fields instead of performing extra scans.
Tools/Stats (yams-tools): tools/yams-tools/src/commands/stats_command.cpp refactored to prefer daemon-first metrics, with a legacy local fallback only if the daemon is unavailable.
MCP add_directory: switched to daemon-first ingestion with a brief readiness wait to avoid “Content store not available” races. Removes local store preflight; maps NotInitialized to a clear, retryable message from the daemon.
MCP search: path normalization + optional diff parity with CLI.
New request field include_diff adds a structured diff block to results when the path_pattern points to a local file; mirrors yams search diff behavior.
MCPSearch DTOs extended to round-trip include_diff, diff, and local_input_file.
Daemon accept scaling: removed fixed cap; now dynamically computes maxConnections from recommendedThreads * ioConnPerThread * 4 (min 256) unless YAMS_MAX_ACTIVE_CONN is set.
Backpressure: increased default read pause to 10ms to smooth heavy load.
Post‑ingest: preserves bounded capacity; de‑dupes inflight, indexes FTS, updates fuzzy index, and emits KG nodes/edges best‑effort.
Status/Stats: JSON correctness improvements; omit misleading savings when the physical size is unknown; surface post‑ingest bus usage and document counters.
CLI Search: grouping of multiple versions per path is enabled by default; paths‑only output and flat JSON remain unchanged unless --json-grouped is specified.
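
The dynamic accept limit described above reduces to a small calculation. The sketch below uses the formula from this changelog (recommendedThreads * ioConnPerThread * 4, minimum 256). The function names are hypothetical, and ignoring a missing, non-numeric, or zero `YAMS_MAX_ACTIVE_CONN` is an assumption.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <optional>

// Parses a positive decimal integer; returns nullopt for null, empty,
// non-numeric, or zero input (treating those as "not set" is an assumption).
inline std::optional<std::size_t> parsePositive(const char* raw) {
    if (raw == nullptr || !std::isdigit(static_cast<unsigned char>(*raw))) return std::nullopt;
    char* end = nullptr;
    const unsigned long long value = std::strtoull(raw, &end, 10);
    if (*end != '\0' || value == 0) return std::nullopt;
    return static_cast<std::size_t>(value);
}

// An explicit override wins; otherwise derive the cap and enforce the floor of 256.
inline std::size_t computeMaxConnections(std::size_t recommendedThreads,
                                         std::size_t ioConnPerThread,
                                         const std::optional<std::size_t>& envOverride) {
    if (envOverride) return *envOverride;
    const std::size_t computed = recommendedThreads * ioConnPerThread * 4;
    return std::max<std::size_t>(computed, 256);
}
```

For example, 8 recommended threads with 16 connections per thread gives 8 * 16 * 4 = 512, while 2 threads with 8 connections per thread gives 64 and is raised to the floor of 256. A caller would pass `parsePositive(std::getenv("YAMS_MAX_ACTIVE_CONN"))` as the override.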

Fixed¶

Regression in metadata extraction and storage used by the search and grep tools: the async post-ingest pipeline never persisted extracted text into the metadata store. As a result, document_content stayed empty, so search, repairs, and semantic pipelines saw “Document content not found” despite vector insert logs.
Tuning optimizations for daemon usage:
Grep pipeline: staged KG → metadata → content with caps and budget.
Prefers “hot” text (metadata-extracted) and caps cold CAS reads; early path/include filters.
Added a global time budget (internal) to stop long content scans gracefully.
Capped grep worker threads to a small, background-friendly number by default (≤4).
Grep streaming optimization: replaced per-character streambuf overflow with bulk line splitting (memchr-based) to eliminate the per-byte hotspot in profiles during CAS streaming.
Post-ingest queue: bounded by configuration, not CPU heuristics.
Default worker threads set conservatively to 1 unless configured in [tuning] as post_ingest_threads. Queue capacity now honored from post_ingest_queue_max.
Added a tiny yield between tasks to reduce contention and smooth CPU.
Addressed intermittent CLI timeouts and “Broken pipe” logs observed when many agents connected concurrently. Accept loop backoff now respects the higher connection cap and IO pool growth from the tuning manager.
Minor unit test fixes (Result value handling) to unblock CI.
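
The bulk line splitting mentioned in the grep streaming item can be sketched as below. This is an illustrative reconstruction of the technique, not the GrepService code. `LineSplitter`, `feed`, `finish`, and `splitChunks` are hypothetical names. Each chunk is scanned with `std::memchr` instead of per-character streambuf callbacks, and `finish` flushes a trailing line that has no newline, which is the case single-line files depend on.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

class LineSplitter {
public:
    // Consumes one chunk; every completed line is appended to out.
    void feed(std::string_view chunk, std::vector<std::string>& out) {
        const char* cur = chunk.data();
        const char* end = cur + chunk.size();
        while (cur < end) {
            const std::size_t remaining = static_cast<std::size_t>(end - cur);
            const void* hit = std::memchr(cur, '\n', remaining);
            if (hit == nullptr) {
                pending_.append(cur, remaining);  // partial line, wait for more data
                break;
            }
            const char* newline = static_cast<const char*>(hit);
            pending_.append(cur, static_cast<std::size_t>(newline - cur));
            out.push_back(std::move(pending_));
            pending_.clear();  // reset the moved-from buffer
            cur = newline + 1;
        }
    }

    // Call at end of stream so a final line without '\n' is not lost.
    void finish(std::vector<std::string>& out) {
        if (!pending_.empty()) {
            out.push_back(std::move(pending_));
            pending_.clear();
        }
    }

private:
    std::string pending_;  // carries a partial line across chunk boundaries
};

// Convenience wrapper: split a sequence of chunks into lines.
inline std::vector<std::string> splitChunks(const std::vector<std::string>& chunks) {
    LineSplitter splitter;
    std::vector<std::string> lines;
    for (const auto& chunk : chunks) splitter.feed(chunk, lines);
    splitter.finish(lines);
    return lines;
}
```

In this sketch, a stream delivering the single chunk `hello` with no trailing newline still produces one line, and a line split across two chunks is joined before it is emitted.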