Changelog Archive: v0.7.x Series

Archived Changelogs

  • v0.7.x archive: docs/changelogs/v0.7.md
  • v0.6.x archive: docs/changelogs/v0.6.md
  • v0.5.x archive: docs/changelogs/v0.5.md
  • v0.4.x archive: docs/changelogs/v0.4.md
  • v0.3.x archive: docs/changelogs/v0.3.md
  • v0.2.x archive: docs/changelogs/v0.2.md
  • v0.1.x archive: docs/changelogs/v0.1.md

[v0.7.10] - 2025-12-20

Added

  • Graph command --list-types flag (yams-66h): Node type discovery for knowledge graph
  • New --list-types flag shows all distinct node types with counts
  • Table output with TYPE and COUNT columns, ordered by count descending
  • JSON output with nodeTypes array containing type and count fields
  • Added getNodeTypeCounts() method to KnowledgeGraphStore interface
  • Extended GraphQueryRequest IPC protocol with listTypes mode
  • Usage hint when no nodes found: suggests yams add <path>
  • Location: src/cli/commands/graph_command.cpp, include/yams/metadata/knowledge_graph_store.h
  • KnowledgeGraphStore query tests (yams-cqp): Unit tests for graph query methods
  • findNodesByType pagination tests: limit, offset, combined pagination, empty results
  • findIsolatedNodes tests: nodes with no incoming edges, different relation types
  • getNodeTypeCounts tests: type counts, ordering, empty graph
  • 4 test cases with 246 assertions
  • Location: tests/unit/daemon/graph_component_catch2_test.cpp
  • P4 language support for symbol extraction: Network data plane language (P4_16)
  • Node types: headerTypeDeclaration, structTypeDeclaration, controlDeclaration, parserDeclaration, actionDeclaration, tableDeclaration
  • Query patterns for actions, functions, headers, structs, controls, parsers, tables, typedefs
  • Aliases: p4, p4_16, p4lang
  • Grammar auto-download from prona-p4-learning-platform/tree-sitter-p4
  • Vector diagnostics in DaemonMetrics: Moved collect_vector_diag to background polling
  • Added vectorEmbeddingsAvailable, vectorScoringEnabled, searchEngineBuildReason to MetricsSnapshot
  • Status requests now read from cached snapshot (non-blocking)
  • Resolves status command hangs when vector services are slow
  • Entity extraction metrics in status output: Added entity queue/inflight counters
  • New metrics: entityQueued, entityDropped, entityConsumed, entityInFlight
  • Exposed via yams status and yams status -v output
  • JSON output includes entity_queued, entity_consumed, entity_dropped, entity_inflight
  • Location: include/yams/daemon/components/DaemonMetrics.h, src/cli/commands/status_command.cpp
  • Gitignore support for directory ingestion: Skip files matching .gitignore patterns
  • New --no-gitignore flag for yams add command to disable gitignore filtering
  • Default behavior respects .gitignore patterns in the root directory
  • Supports standard gitignore patterns: wildcards, directory patterns, anchored paths
  • Location: src/cli/commands/add_command.cpp, src/app/services/indexing_service.cpp

Changed

  • Constexpr language configuration for symbol extraction: Centralized compile-time configuration
  • 17 languages with constexpr node types and query patterns: C, C++, Python, Rust, Go, Java, JavaScript, TypeScript, C#, PHP, Kotlin, Perl, R, SQL, Solidity, Dart, P4
  • LanguageConfig struct with class_types, field_types, function_types, import_types, identifier_types
  • Query patterns: function_queries, class_queries, import_queries, call_queries
  • Language alias support (e.g., “cpp” → “c++”, “cxx”, “cc”)
  • getLanguageConfig() constexpr lookup function
  • Location: plugins/symbol_extractor_treesitter/symbol_extractor.cpp
  • Field extraction: New extractFields() method extracts class member variables
  • Uses node type traversal with language-specific field types
  • Creates field kind symbols with proper byte ranges
  • Member containment relations: New extractMemberRelations() method
  • Creates contains edges from classes to their methods/fields
  • Uses byte range containment to determine class membership
  • Improves knowledge graph structure for code navigation
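The byte-range containment rule can be sketched as follows. This is a minimal illustration, not the plugin's code: the Symbol struct, containsRange, and buildContainsEdges are hypothetical names, and the sketch assumes half-open byte ranges and treats only "method" and "field" kinds as members.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical symbol record; the real extractor's types may differ.
struct Symbol {
    std::string name;
    std::string kind;       // "class", "method", "field", ...
    std::size_t startByte;  // inclusive
    std::size_t endByte;    // exclusive
};

// True when `inner` lies entirely within `outer`'s byte range.
inline bool containsRange(const Symbol& outer, const Symbol& inner) {
    return outer.startByte <= inner.startByte && inner.endByte <= outer.endByte;
}

// Returns (class name, member name) pairs, one per "contains" edge.
inline std::vector<std::pair<std::string, std::string>>
buildContainsEdges(const std::vector<Symbol>& symbols) {
    std::vector<std::pair<std::string, std::string>> edges;
    for (const auto& cls : symbols) {
        if (cls.kind != "class") continue;
        for (const auto& member : symbols) {
            const bool isMember = member.kind == "method" || member.kind == "field";
            if (isMember && containsRange(cls, member)) {
                edges.emplace_back(cls.name, member.name);
            }
        }
    }
    return edges;
}
```

The quadratic scan keeps the sketch short; sorting symbols by start offset would avoid it on large files.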
  • PostIngestQueue per-stage metrics: Exposed extraction/KG/symbol stage inflight counts
  • New getters: extractionInFlight(), kgInFlight(), symbolInFlight(), totalInFlight()
  • Static constexpr limits: maxExtractionConcurrent(), maxKgConcurrent(), maxSymbolConcurrent()
  • Exposed via daemon status: extraction_inflight, kg_inflight, symbol_inflight
  • yams status shows POST line when there’s active work
  • yams status -v shows per-stage breakdown
  • yams daemon status -d shows full Post-Ingest Pipeline section
  • JSON output includes stages object with per-stage counts
  • Location: include/yams/daemon/components/PostIngestQueue.h, src/cli/commands/status_command.cpp, src/cli/commands/daemon_command.cpp
  • PostIngestQueue dynamic concurrency scaling (PBI-05a): Auto-scale based on queue depth
  • New TuneAdvisor tunables: postExtractionConcurrent(), postKgConcurrent(), postSymbolConcurrent(), postEntityConcurrent()
  • Dynamic limits replace static constexpr values in PostIngestQueue pollers
  • TuningManager scales concurrency based on queue depth thresholds:
    • >1000 queued: extraction=hwThreads/2, kg=hwThreads/2
    • >500 queued: extraction=hwThreads/4, kg=32
    • >100 queued: extraction=hwThreads/8+4, kg=16
    • >10 queued: extraction=8
    • idle: extraction=4 (default)
  • Status output shows limits: stages: extract=4/4, kg(q=0/i=0/8), symbol=0/4
  • JSON includes extraction_limit, kg_limit, symbol_limit, entity_limit
  • Location: include/yams/daemon/components/TuneAdvisor.h, src/daemon/components/TuningManager.cpp, src/daemon/components/DaemonMetrics.cpp
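The extraction thresholds listed above can be read as a simple step function. The sketch below is an assumption-laden illustration: extractionLimitFor is a hypothetical name, only the extraction limit is modeled (the changelog does not give KG limits for every tier), and the thresholds are applied literally with integer division.

```cpp
#include <cassert>
#include <cstddef>

// Extraction concurrency for a given queue depth, following the tiers listed
// in this entry. Integer division truncates.
inline unsigned extractionLimitFor(std::size_t queued, unsigned hwThreads) {
    if (queued > 1000) return hwThreads / 2;
    if (queued > 500)  return hwThreads / 4;
    if (queued > 100)  return hwThreads / 8 + 4;
    if (queued > 10)   return 8;
    return 4;  // idle default
}
```

With 16 hardware threads this yields 8, 4, 6, 8, and 4 for the five tiers. Whether the real TuningManager clamps these values to a floor or ceiling is not stated here.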
  • Knowledge Graph cleanup on document deletion: Deleting documents now cascades to KG
  • deleteNodesForDocumentHash(): Removes doc:<hash> nodes and symbol nodes with matching document_hash
  • Integrated into document deletion flow for automatic cleanup
  • Location: include/yams/metadata/knowledge_graph_store.h, src/app/services/document_service.cpp
  • Stale edge cleanup on re-indexing: Symbol extraction now cleans up old relationships
  • deleteEdgesForSourceFile(): Removes edges where properties.source_file matches path
  • Called automatically before re-extraction to prevent stale relationship accumulation
  • Location: src/daemon/components/EntityGraphService.cpp
  • Optimized isolated node query: yams graph --isolated now uses single SQL query
  • findIsolatedNodes(): Efficient NOT EXISTS subquery instead of N+1 pattern
  • New IPC fields: isolatedMode, isolatedRelation in GraphQueryRequest
  • Significant performance improvement for large graphs
  • Location: src/cli/commands/graph_command.cpp, src/daemon/components/dispatcher/request_dispatcher_graph.cpp
  • Daemon log command: Added yams daemon log
  • ExternalPluginHost: New plugin host for Python/process-based plugins (RFC-EPH-001)
  • Implements IPluginHost interface for external plugins running as separate processes
  • JSON-RPC 2.0 communication over stdio using existing PluginProcess and JsonRpcClient
  • Supported plugin types: Python (.py), Node.js (.js), any executable with JSON-RPC support
  • Process lifecycle management: spawn, monitor, health checks, graceful shutdown
  • Automatic crash recovery with configurable restart policy (max retries, backoff)
  • Trust-based security model with persistent trust file
  • RPC gateway for calling arbitrary plugin methods (callRpc)
  • Plugin statistics tracking (uptime, restart count, health status)
  • State change callbacks for monitoring plugin lifecycle events
  • Location: include/yams/daemon/resource/external_plugin_host.h, src/daemon/resource/external_plugin_host.cpp
  • Auto-init mode: New yams init --auto flag for containerized/headless environments
  • Enables vector database with default model (all-MiniLM-L6-v2)
  • Enables plugins directory setup
  • Generates authentication keys
  • Skips S3 configuration (uses local storage)
  • Non-interactive: no prompts, uses sensible defaults
  • Tree-sitter grammar download: yams init now offers to download tree-sitter grammars
  • Interactive menu: recommended (C, C++, Python, JS, TS, Rust, Go), all, or custom selection
  • Auto-downloads and builds grammars from official GitHub repos
  • Supports 14 languages: C, C++, Python, JavaScript, TypeScript, Rust, Go, Java, C#, PHP, Kotlin, Dart, SQL, Solidity
  • Cross-platform: MSVC, MinGW, GCC, Clang compilation support
  • Grammar prompt also available when YAMS is already initialized
  • Grammars installed to XDG_DATA_HOME/yams/grammars (Unix) or %LOCALAPPDATA%\yams\grammars (Windows)
  • New embedding model option: Added multi-qa-MiniLM-L6-cos-v1 as second model choice
  • Trained on 215M question-answer pairs for semantic search optimization
  • Same dimensions (384) as default model for compatibility
  • Replaces all-mpnet-base-v2 (768 dim) in model selection
  • Git-based version detection: Build system now auto-detects version from git tags
  • Uses most recent semver tag (v*) as effective version
  • Falls back to project version only if no tags exist
  • Command-line override (-Dyams-version=X.Y.Z) takes highest priority
  • Commit hash in version output: yams --version now shows short commit hash
  • Format: 0.7.9 (commit: c16939f) built:2025-11-29T17:30:15Z
  • Helps identify exact build for bug reports and debugging
  • Init command tests: New test suite for init command model download functionality
  • Tests for valid HuggingFace URLs, model dimensions, naming conventions
  • CLI flag acceptance tests (--auto, --non-interactive, --force)
  • Content-type-aware search profiles: New CorpusProfile enum and auto-detection
  • CODE: Boosts symbol/path search for source code repositories (60%+ code files)
  • PROSE: Boosts FTS5/vector search for text-heavy corpora (60%+ docs)
  • DOCS: Balanced weights for mixed code/documentation
  • MIXED: Default balanced weights for heterogeneous corpora
  • SearchEngineConfig::detectProfile(): Auto-detects from file extension distribution
  • SearchEngineConfig::forProfile(): Returns preset weights for a profile
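As a rough sketch of how the 60% thresholds might drive profile selection: the function below is hypothetical (it is not SearchEngineConfig::detectProfile()), it takes pre-counted file totals rather than inspecting extensions, and it never returns DOCS because this changelog does not say how that profile is chosen.

```cpp
#include <cassert>
#include <cstddef>

enum class CorpusProfile { CODE, PROSE, DOCS, MIXED };

// Picks a profile from file counts using the 60% thresholds named above.
// Integer arithmetic avoids floating-point edge cases at exactly 60%.
inline CorpusProfile detectProfileSketch(std::size_t codeFiles,
                                         std::size_t proseFiles,
                                         std::size_t totalFiles) {
    if (totalFiles == 0) return CorpusProfile::MIXED;
    if (codeFiles * 100 >= totalFiles * 60) return CorpusProfile::CODE;
    if (proseFiles * 100 >= totalFiles * 60) return CorpusProfile::PROSE;
    return CorpusProfile::MIXED;  // default for heterogeneous corpora
}
```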
  • Session-isolated memory: Documents can now be isolated to working sessions
  • New CLI commands: yams session create, open, close, status, merge, discard
  • Documents added during an active session are tagged with session_id metadata
  • Session documents are invisible to global searches (use --global to bypass)
  • merge: Removes session tag to promote documents to global index
  • discard: Permanently deletes all session documents
  • Supports multiple concurrent sessions with automatic isolation
  • Database migration adds session tracking to metadata repository
  • Windows Job Object for plugin processes: External plugin child process cleanup
  • Plugin processes are now assigned to Windows Job Objects
  • All child processes are automatically terminated when plugin unloads
  • Prevents orphaned processes from holding file locks (e.g., PID files)
  • Uses JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE for reliable cleanup
  • Location: src/extraction/plugin_process.cpp
  • Plugin health command: New yams plugin health [name] subcommand for plugin diagnostics
  • Shows plugin status, interfaces, models loaded, and error state
  • Displays model provider FSM state (Idle, Loading, Ready, Degraded, Failed)
  • Lists all loaded models when provider is ready
  • JSON output support with --json flag
  • Location: src/cli/commands/plugin_command.cpp
  • Plugin info improvements: Enhanced yams plugin info output
  • Now uses StatusResponse.providers for accurate plugin status
  • Shows plugin type (native/external), interfaces, and path
  • Properly handles both ABI and external plugin hosts

Changed

  • Embedding model list: Both recommended models now have 384 dimensions
  • all-MiniLM-L6-v2: Lightweight general-purpose semantic search (default)
  • multi-qa-MiniLM-L6-cos-v1: Optimized for question-answer semantic search
  • ServiceManager Decomposition: Extracted focused components from monolithic ServiceManager
  • New ConfigResolver: Static config/env resolution utilities (248 lines)
  • New VectorSystemManager: Vector DB and index lifecycle (397 lines)
  • New DatabaseManager: Metadata DB, connection pool, KG store lifecycle (254 lines)
  • New PluginManager: Plugin host, loader, and interface adoption (515 lines)
  • ServiceManager accessors now delegate to extracted managers
  • Configurable Vector DB Capacity: Vector index max_elements now configurable
  • Environment variable: YAMS_VECTOR_MAX_ELEMENTS
  • Config file: [vector_database] max_elements
  • Default: 100,000 (range: 1,000 - 10,000,000)
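A minimal sketch of resolving this setting from a raw string such as the value of YAMS_VECTOR_MAX_ELEMENTS. The function and constant names are hypothetical, and two behaviors are assumptions rather than documented facts: unparsable input falls back to the default, and out-of-range values are clamped (the daemon may reject them instead).

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdlib>

constexpr std::size_t kDefaultMaxElements = 100000;
constexpr std::size_t kMinMaxElements = 1000;
constexpr std::size_t kMaxMaxElements = 10000000;

// Parses a raw setting; pass std::getenv("YAMS_VECTOR_MAX_ELEMENTS") directly.
inline std::size_t resolveMaxElements(const char* raw) {
    if (raw == nullptr || *raw == '\0' || *raw == '-') return kDefaultMaxElements;
    errno = 0;
    char* end = nullptr;
    const unsigned long long parsed = std::strtoull(raw, &end, 10);
    // Reject overflow, empty parses, and trailing garbage.
    if (errno != 0 || end == raw || *end != '\0') return kDefaultMaxElements;
    if (parsed < kMinMaxElements) return kMinMaxElements;  // assumed clamp
    if (parsed > kMaxMaxElements) return kMaxMaxElements;  // assumed clamp
    return static_cast<std::size_t>(parsed);
}
```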
  • FTS5 index hygiene (migration v18): Removed unused content_type column from FTS5 index
  • content_type was indexed but never queried via FTS MATCH
  • Content type filtering uses JOIN on documents.mime_type instead
  • Reduces FTS5 index size and improves indexing performance
  • Automatic migration rebuilds index on first database open
  • Daemon socket logging noise reduction: Request/mux/enqueue/drain logs now emit at debug level
  • Default info-level daemon logs no longer show per-request socket traffic
  • Enable debug logging to inspect connection-level request handling details
  • SearchEngine Consolidation: Unified search architecture by removing legacy HybridSearchEngine
  • SearchEngine is now the sole search engine, consolidating multi-component search (FTS5, PathTree, Symbol, KG, Vector, Tag, Metadata)
  • Removed ~2000 lines of legacy code: hybrid_search_engine.cpp, hybrid_search_factory.cpp, and associated headers
  • Parallel Execution: SearchEngine now uses std::async to execute all 7 component queries simultaneously
    • Configurable via SearchEngineConfig::enableParallelExecution (default: true)
    • Per-component timeout via SearchEngineConfig::componentTimeout (default: 100ms)
    • Graceful degradation: timed-out components are skipped, others continue
  • Updated Interfaces: AppContext.searchEngine replaces AppContext.hybridEngine across CLI, daemon, and services
  • SearchEngineBuilder: Simplified to create SearchEngine directly (removed MetadataKeywordAdapter and KG scorer wiring)
  • Removed unused benchmark executables: engine_comparison_bench, hybrid_search_bench
  • Location: src/search/, include/yams/search/, src/app/services/, src/cli/
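The parallel execution with per-component timeout and graceful degradation can be pictured roughly as below. This is a sketch under assumptions, not SearchEngine's code: ComponentQuery and runComponents are hypothetical names, results are plain strings, and a single shared deadline stands in for the per-component timeout.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <string>
#include <thread>
#include <vector>

using ComponentQuery = std::function<std::vector<std::string>()>;

// Launches every component on its own thread and keeps only the results
// that arrive before the deadline; components that are late are skipped.
inline std::vector<std::string>
runComponents(const std::vector<ComponentQuery>& components,
              std::chrono::milliseconds timeout) {
    std::vector<std::future<std::vector<std::string>>> futures;
    futures.reserve(components.size());
    for (const auto& query : components) {
        futures.push_back(std::async(std::launch::async, query));
    }

    const auto deadline = std::chrono::steady_clock::now() + timeout;
    std::vector<std::string> merged;
    for (auto& fut : futures) {
        if (fut.wait_until(deadline) == std::future_status::ready) {
            auto part = fut.get();
            merged.insert(merged.end(), part.begin(), part.end());
        }
    }
    // Futures from std::async block in their destructor until the task ends,
    // so this function still waits for slow components before returning.
    return merged;
}
```

The destructor behavior noted in the final comment means a sketch like this bounds which results are used, not how long the call takes; a production implementation would need a different mechanism to abandon slow work.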
  • HotzoneManager Persistence: Added save/load functionality for hotzone state
  • HotzoneManager::save(path): Serializes hotzone entries to JSON with atomic write (temp + rename)
  • HotzoneManager::load(path): Restores persisted hotzone state on startup
  • Stores version, half-life config, and timestamped entry scores
  • Location: src/search/hotzone_manager.cpp, include/yams/search/hotzone_manager.h
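The "temp + rename" atomic write pattern mentioned for HotzoneManager::save() generally looks like the sketch below. The atomicWrite helper and the ".tmp" suffix are hypothetical choices for illustration, and the sketch does not fsync, so it protects against partial files but not against power loss.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Writes `contents` to a sibling temp file, then renames it over `target`,
// so readers see either the old file or the complete new one.
inline bool atomicWrite(const fs::path& target, const std::string& contents) {
    fs::path tmp = target;
    tmp += ".tmp";
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        if (!out) return false;
        out << contents;
        out.flush();
        if (!out) return false;
    }  // stream is closed here, before the rename
    std::error_code ec;
    fs::rename(tmp, target, ec);
    if (ec) {
        fs::remove(tmp, ec);  // best-effort cleanup of the temp file
        return false;
    }
    return true;
}
```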
  • CheckpointManager Component: New daemon component for periodic state persistence
  • Manages vector index and hotzone checkpoint scheduling
  • Configurable interval, threshold-based vector index saves, optional hotzone persistence
  • Async timer-based loop with graceful shutdown support
  • Post-ingest pipeline parallelization: PostIngestQueue and EntityGraphService now use WorkCoordinator
  • Removed serial strand-based processing bottleneck in PostIngestQueue
  • EntityGraphService now posts extraction jobs to shared WorkCoordinator thread pool
  • Removed unused PoolManager “post_ingest” pool and associated TuningManager tuning logic
  • Documents process in parallel across all worker threads with work stealing
  • Graph BFS traversal optimization: Reduced N+1 query patterns in graph traversal
  • New getEdgesBidirectional() API: returns incoming + outgoing edges in single query (UNION)
  • New getNodesByIds() API: batch node retrieval for hydration
  • Edge cache in BFS: edges fetched during neighbor collection reused for connecting edges
  • Reduces per-node queries from 4 (2×getEdgesFrom + 2×getEdgesTo) to 1
  • Location: src/app/services/graph_query_service.cpp, src/metadata/knowledge_graph_store_sqlite.cpp
  • Graph command cleanup: Removed unused --reverse flag
  • Bidirectional traversal is now the default behavior
  • Flag was redundant since BFS optimization returns all connected edges
  • Location: src/cli/commands/graph_command.cpp

Fixed

  • JavaScript/TypeScript symbol extraction: Audited and fixed against Tree-sitter grammars
  • JavaScript: Added function_expression, generator_function, generator_function_declaration, namespace_import, export_statement, export_specifier
  • TypeScript: Added abstract_class_declaration, abstract_method_signature, function_expression, generator_function, import_alias
  • Added queries for function expressions, generators, abstract methods, export statements
  • Graph --name query now shows symbol relationships: Fixed yams graph --name <file> showing “Graph data unavailable”
  • Now resolves filename to file node key and uses KG query path
  • Shows connected symbols, includes, and document nodes
  • Falls back to document-based lookup if file node not found
  • Location: src/cli/commands/graph_command.cpp
  • KG queue metric now shows pending count: Fixed kg(q=N) showing cumulative total instead of pending items
  • Now calculates: pending = queued - consumed - inflight
  • Affects yams status -v and yams daemon status -d displays
  • Location: src/cli/commands/status_command.cpp, src/cli/commands/daemon_command.cpp
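The pending calculation above is straightforward, but with unsigned counters sampled at slightly different times the subtraction can underflow. A minimal sketch (pendingCount is a hypothetical helper, and clamping to zero is an assumption about how such races would be handled):

```cpp
#include <cassert>
#include <cstdint>

// pending = queued - consumed - inflight, clamped at zero so that counters
// read at slightly different moments cannot wrap around.
inline std::uint64_t pendingCount(std::uint64_t queued,
                                  std::uint64_t consumed,
                                  std::uint64_t inflight) {
    const std::uint64_t done = consumed + inflight;
    return queued > done ? queued - done : 0;
}
```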
  • Symbol extraction extension mapping: Fixed extension lookup not matching due to leading dot mismatch
  • Database stores extensions with dots (.cpp), map keys without (cpp)
  • PostIngestQueue now strips leading dot before lookup
  • Location: src/daemon/components/PostIngestQueue.cpp
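The mismatch is easy to reproduce: a map keyed by "cpp" never matches a stored ".cpp". The sketch below shows the normalization step; stripLeadingDot, languageFor, and the three-entry map are hypothetical stand-ins for the real lookup table.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <unordered_map>

// Drops a single leading '.' so ".cpp" and "cpp" resolve to the same key.
inline std::string_view stripLeadingDot(std::string_view ext) {
    if (!ext.empty() && ext.front() == '.') ext.remove_prefix(1);
    return ext;
}

// Hypothetical extension-to-language map, keyed without dots.
inline std::string languageFor(std::string_view storedExt) {
    static const std::unordered_map<std::string, std::string> kLanguages = {
        {"cpp", "c++"}, {"py", "python"}, {"rs", "rust"}};
    const auto it = kLanguages.find(std::string(stripLeadingDot(storedExt)));
    return it == kLanguages.end() ? std::string{} : it->second;
}
```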
  • Graph query bidirectional traversal: Fixed graph queries showing 0 connections for blob nodes
  • BFS traversal now follows both incoming and outgoing edges by default
  • Blob nodes (which only have incoming has_version edges from path nodes) now return connected nodes
  • Refactored dispatcher to delegate to GraphQueryService (single responsibility)
  • Repair tracking (migration v21): Added repair status tracking to prevent duplicate work
  • New repair_status column (pending, processing, completed, failed, skipped)
  • repair_attempted_at timestamp and repair_attempts counter
  • RepairCoordinator filters by status to avoid re-queuing processed documents
  • Plugin interface parsing: Fixed object-format interfaces not parsing correctly
  • Plugin host sharing: Fixed model provider adoption failure after component extraction
  • VectorIndexManager initialization: Fixed “VectorIndexManager not provided” search engine build failure
  • Model download mapping: Added multi-qa-MiniLM-L6-cos-v1 to HuggingFace repo mapping
  • Version display: Fixed yams --version showing fallback values
  • Socket crash on shutdown: Fixed EXC_BAD_ACCESS in kqueue_reactor during program exit
  • Windows daemon status metrics: CPU and memory now report accurate values
  • --name flag for yams add: Fixed custom document naming for single-file adds
  • External plugin extractors: Fixed content extractors from external plugins not being used
  • Trust file persistence: Fixed plugin trust file being deleted on daemon restart
  • Trust file comment parsing: Fixed daemon crash when loading trust file with comments
  • Plugin trust initialization order: Fixed plugins not loading despite being trusted
  • Post-ingestion pipeline reliability: Improved async processing consistency
  • Graph IPC serialization: Added missing ProtoBinding specializations for GraphQueryRequest/Response
  • Status command document count: Fixed yams status showing docs=0 after daemon restart
  • Short status now uses documents_total (from metadata DB, initialized on startup)
  • Previously used storage_documents (CAS object count, which was 0 on fresh start)
  • Detailed status was unaffected as it already used the correct field
  • Location: src/cli/commands/status_command.cpp

CLI Improvements

  • PowerShell completion: Added yams completion powershell for PowerShell auto-complete
  • Consistent --json output: Extended JSON output support across commands
  • Actionable error hints: Centralized error hint system with pattern-based hints
  • Daemon error messages: Enhanced daemon start/stop failure messages with recovery hints

Removed

  • HybridSearchEngine: Legacy search engine removed in favor of unified SearchEngine
  • Deleted: src/search/hybrid_search_engine.cpp (~1844 lines)
  • Deleted: src/search/hybrid_search_factory.cpp (~168 lines)
  • Deleted: include/yams/search/hybrid_search_engine.h
  • Deleted: include/yams/search/hybrid_search_factory.h
  • HybridSearchEngine Tests: Removed obsolete test files
  • tests/unit/search/hybrid_search_engine_test.cpp
  • tests/unit/search/hybrid_grouping_smoke_test.cpp
  • tests/unit/search/learned_fusion_smoke_test.cpp
  • tests/unit/search/hierarchical_search_test.cpp
  • tests/unit/metadata/search_metadata_interface_test.cpp
  • Legacy Adapters: Removed MetadataKeywordAdapter (was bridge for HybridSearchEngine)
  • CLI Adapter Rename: HybridSearchResultAdapter → SearchResultItemAdapter in result_renderer.h

[v0.7.8] - 2025-11-14

Added

  • Thread Pool Consolidation
  • WorkCoordinator Component: New centralized thread pool manager with Boost.Asio io_context
    • Replaces 3 separate thread pools (IngestService, PostIngestQueue, EmbeddingService)
    • Provides strand allocation for per-service ordering guarantees
    • Hardware-aware thread count (8-32 threads based on CPU cores)
  • Search Service Parallel Post-Processing
    • New ParallelPostProcessor class for concurrent search result processing
    • Parallelizes filtering, facet generation, and highlighting when result count ≥ 100
    • Uses std::async to run independent operations concurrently
    • Threshold-based activation (PARALLEL_THRESHOLD = 100) avoids overhead on small result sets
    • Performance Measured (100 iterations):
    • 100 results: 0.06ms (~1.66M ops/sec) - sequential path
    • 500 results: 0.23ms (~2.21M ops/sec) - parallel path
    • 1000 results: 0.43ms (~2.32M ops/sec) - parallel path
    • Speedup: ~3.4x faster at 1000 results vs linear scaling
    • Location: include/yams/search/parallel_post_processor.hpp, src/search/parallel_post_processor.cpp
    • Integration: search_executor.cpp now uses ParallelPostProcessor instead of sequential processing
    • Benchmarks: tests/benchmarks/search_benchmarks.cpp

Changed

  • Search Service: --fuzzy searches now merge BM25 keyword matches with fuzzy results so enabling typo tolerance never suppresses literal hits. (src/app/services/search_service.cpp)
  • Metadata Repository: Removed the default 50K fuzzy-index cap. The index now covers the full corpus by default and only enforces limits when YAMS_FUZZY_INDEX_LIMIT is set, adding a small safety buffer and explicit guard logging. (src/metadata/metadata_repository.cpp, include/yams/metadata/fuzzy_index_builder.h)
  • Service Architecture Refactor
  • IngestService: Converted from manual thread pool to strand-based channel polling
    • Removed kSyncThreshold heuristics and compat::jthread pool
    • New channelPoller() awaitable for document processing
  • PostIngestQueue: Converted from worker threads to strand-based pipeline
    • Removed Worker struct, thread pool, and token bucket scheduler (~200 lines)
    • Implemented awaitable pipeline: processMetadataStage → (processKnowledgeGraphStage || processEmbeddingStage)
    • Parallel KG and Embedding stages using make_parallel_group
  • EmbeddingService: Converted from worker threads to strand-based channel polling
    • Removed worker thread pool (~70 lines)
    • New channelPoller() awaitable with async timer
  • TuningManager: Converted from manual thread to strand-based periodic execution
    • Removed compat::jthread with stop_token
    • New tuningLoop() awaitable with boost::asio::steady_timer
    • Uses WorkCoordinator strand for pool size adjustments
    • Maintains TuneAdvisor::statusTickMs() polling interval
  • DaemonMetrics: Converted from manual thread to strand-based polling loop
    • Removed std::thread for CPU/memory metrics collection
    • New pollingLoop() awaitable with 250ms timer interval
    • Uses WorkCoordinator strand for metric updates
    • Thread-safe snapshot access via shared_mutex
  • BackgroundTaskManager: Migrated from GlobalIOContext to WorkCoordinator
    • Removed fallback to GlobalIOContext (proper architectural separation)
    • Now uses WorkCoordinator executor for all background tasks
    • Integrated with unified work-stealing thread pool
    • Fts5Job consumer polling delay reduced: 200ms → 10ms (20x throughput improvement)
    • Fixed orphan scan queue overflow (was causing hundreds of dropped batches)
  • ServiceManager: Refactored async operations
    • Eliminated all 5 uses of std::future/std::async
    • Converted database operations to use make_parallel_group with timeouts
  • SearchPool Removal
  • Deleted the unused SearchPool component and associated meson/build wiring
  • ServiceManager no longer constructs dead search infrastructure; HybridSearchEngine remains the sole search path
  • TuneAdvisor/TuningManager now derive concurrency targets directly from SearchExecutor load metrics instead of phantom pool sizes
  • Ingestion Pipeline Cleanup
  • Removed deferExtraction Technical Debt: Eliminated bypass mechanism that skipped full production pipeline
    • Removed deferExtraction field from StoreDocumentRequest and AddDirectoryRequest structs
    • Removed conditional logic in DocumentService that skipped FTS5 extraction
    • All document ingestion now uses full pipeline: metadata storage → FTS5 extraction → PostIngestQueue → (KG extraction || Embedding generation)
    • Updated IngestService to always enqueue to PostIngestQueue (removed lines setting deferExtraction=true)
    • Updated CLI add_command fallback paths (3 locations) to use full pipeline
    • Updated mobile bindings to remove sync_now-based deferral
    • Removed --defer-extraction and --no-defer-extraction flags from ingestion_throughput_bench
    • Updated test helpers (tests/common/capability.h, integration test) to use full pipeline
  • Grep Output Update
  • New default output format
  • Example output:
    === Results for "TaskManager" in 3 files (5 regex, 2 semantic) ===
    
    File: src/core/TaskManager.cpp (cpp)
       Matches: 3 (3 regex)
    
       Line   45: [Regex] class TaskManager {
       Line  102: [Regex] TaskManager::TaskManager() : initialized_(false) {
       Line  237: [Regex] void TaskManager::shutdown() {
    
    [Total: 7 matches across 3 files]
    
  • Location: src/cli/commands/grep_command.cpp:531-645
  • Grep Service Optimizations
  • Literal Extraction from Regex Patterns
    • New LiteralExtractor utility extracts literal substrings from regex patterns
    • Enables two-phase matching: fast literal pre-filter → full regex only on candidates
    • Based on ripgrep’s literal extraction strategy
  • Boyer-Moore-Horspool (BMH) String Search
    • Replaces std::string::find() with BMH algorithm for patterns ≥ 3 characters
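For readers unfamiliar with Boyer-Moore-Horspool, a compact textbook version is sketched below. It is not the grep service's implementation: bmhFind is a hypothetical name, and the sketch accepts needles of any length rather than applying the 3-character cutoff described above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Boyer-Moore-Horspool search; returns the first match offset or npos.
inline std::size_t bmhFind(std::string_view haystack, std::string_view needle) {
    const std::size_t n = haystack.size();
    const std::size_t m = needle.size();
    if (m == 0) return 0;
    if (m > n) return std::string_view::npos;

    // Shift table: for each byte, the distance from its last occurrence in
    // the needle (excluding the final byte) to the end of the needle.
    std::array<std::size_t, 256> shift;
    shift.fill(m);
    for (std::size_t i = 0; i + 1 < m; ++i) {
        shift[static_cast<unsigned char>(needle[i])] = m - 1 - i;
    }

    std::size_t pos = 0;
    while (pos + m <= n) {
        if (haystack.compare(pos, m, needle) == 0) return pos;
        // Skip ahead using the haystack byte aligned with the needle's end.
        pos += shift[static_cast<unsigned char>(haystack[pos + m - 1])];
    }
    return std::string_view::npos;
}
```

The standard library also ships std::boyer_moore_horspool_searcher (C++17, in <functional>) for use with std::search.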
  • SIMD Vectorized Newline Scanning
    • Platform-specific implementations: AVX2 (32 bytes), SSE2 (16 bytes), NEON (16 bytes)
    • Scalar fallback using optimized memchr for portability
    • Replaces byte-by-byte scanning in line boundary detection
    • Performance: 4-8x speedup on large files
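The scalar fallback path can be sketched with std::memchr, which most C libraries already vectorize internally. The helper name newlineOffsets is hypothetical, and the SIMD variants (AVX2, SSE2, NEON) are not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>
#include <vector>

// Collects the byte offset of every '\n' in `text` using memchr instead of
// a byte-by-byte loop.
inline std::vector<std::size_t> newlineOffsets(std::string_view text) {
    std::vector<std::size_t> offsets;
    const char* const begin = text.data();
    const char* const end = begin + text.size();
    const char* cursor = begin;
    while (cursor < end) {
        const void* hit =
            std::memchr(cursor, '\n', static_cast<std::size_t>(end - cursor));
        if (hit == nullptr) break;
        const char* newline = static_cast<const char*>(hit);
        offsets.push_back(static_cast<std::size_t>(newline - begin));
        cursor = newline + 1;  // resume just past the newline
    }
    return offsets;
}
```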
  • Parallel Candidate Filtering
    • Pre-filters unsuitable files before worker distribution using std::async
    • Integrates magic_numbers.hpp for accurate binary detection (86 compile-time patterns)
    • Filters build artifacts (.o, .class, .pyc), libraries (.a, .so, .dll), executables, packages
    • Chunk-based parallel processing for large candidate sets (>100 files)
    • Performance: 2-4x speedup on large corpora

Fixed

  • Content-backed Fuzzy Hits: Content-derived fuzzy matches (_content entries) now map back to their owning documents, ensuring CLI searches show the expected files. (src/metadata/metadata_repository.cpp, tests/unit/metadata/metadata_repository_test.cpp)
  • Cold Start Vector Index Loading: Fixed issue where search and grep commands returned no results after daemon cold start despite having indexed documents.
  • Search Async Path: Fixed SearchCommand::executeAsync() not populating pathPatterns field in daemon request, causing server-side multi-pattern filtering to fail. The async code path (default execution) now correctly sends all include patterns to the daemon, matching the behavior of the sync path. (src/cli/commands/search_command.cpp:1360-1365)
  • Database Schema Compatibility: Fixed “constraint failed” errors during document insertion on databases with migration v12 (pre-path-indexing schema). The insertDocument() function now conditionally builds INSERT statements based on the hasPathIndexing_ flag, supporting both legacy (13-column) and modern (17-column with path indexing) schemas. This allows YAMS to work correctly regardless of whether migration v13 has been applied. (src/metadata/metadata_repository.cpp:318-380)
  • MCP Protocol Version Negotiation: Fixed “Unsupported protocol version requested by client” error (code -32901) by making protocol version negotiation permissive by default (strictProtocol_ = false). The server now gracefully accepts any protocol version requested by clients, falling back to the latest supported version (2025-03-26) if the requested version is not in the supported list. Also added intermediate MCP protocol versions (2024-12-05, 2025-01-15) to the supported list. This ensures maximum compatibility with MCP clients regardless of which spec version they implement. (src/mcp/mcp_server.cpp:560,1254-1260)
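The permissive negotiation described above amounts to "use the requested version if supported, otherwise fall back to the latest". A minimal sketch, where negotiateVersion is a hypothetical function and the supported list is assumed to be ordered oldest to newest:

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Returns the protocol version to use, or nullopt when strict mode rejects
// an unsupported request (the case that previously produced the error).
inline std::optional<std::string>
negotiateVersion(const std::string& requested,
                 const std::vector<std::string>& supported, bool strict) {
    if (supported.empty()) return std::nullopt;
    const bool known =
        std::find(supported.begin(), supported.end(), requested) != supported.end();
    if (known) return requested;
    if (strict) return std::nullopt;
    return supported.back();  // assumes the newest version is listed last
}
```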
  • MCP Large Response Buffering: Fixed “Error: MPC -32602: Error: End of file” errors when MCP server sends large responses (list, search, grep with many results). Implemented chunked buffered output in StdioTransport::sendFramedSerialized() that breaks payloads >512KB into 64KB chunks with explicit flushes between chunks. This prevents stdout buffer overflow and ensures reliable delivery of large JSON-RPC responses over stdio transport. Also added threshold-based routing in MCPServer::sendResponse() to use buffered sending for payloads >256KB. (src/mcp/mcp_server.cpp:69-95,169-203)
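The chunked output strategy can be sketched as follows. This is an illustration only: writeChunked and the constant names are hypothetical, message framing is omitted, and the thresholds are taken from the description above (payloads over 512KB are written in 64KB chunks with a flush after each).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <string_view>

constexpr std::size_t kChunkThreshold = 512 * 1024;  // chunk payloads above this
constexpr std::size_t kChunkSize = 64 * 1024;

// Writes `payload` to `out`, flushing after every chunk for large payloads.
// Returns the number of write calls issued.
inline std::size_t writeChunked(std::ostream& out, std::string_view payload) {
    if (payload.size() <= kChunkThreshold) {
        out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
        out.flush();
        return 1;
    }
    std::size_t chunks = 0;
    for (std::size_t offset = 0; offset < payload.size(); offset += kChunkSize) {
        const std::size_t len = std::min(kChunkSize, payload.size() - offset);
        out.write(payload.data() + offset, static_cast<std::streamsize>(len));
        out.flush();  // explicit flush between chunks
        ++chunks;
    }
    return chunks;
}
```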

[v0.7.7] - 2025-11-07

Added

  • Hierarchical Embedding Architecture & Two-Stage Hybrid Search
  • Data model extensions for hierarchical embeddings
    • Added EmbeddingLevel enum (CHUNK, DOCUMENT) to distinguish embedding granularity
    • Extended VectorRecord with level, source_chunk_ids, parent_document_hash, child_document_hashes fields
    • Modified embed_and_insert_document to generate document-level embeddings (normalized mean of chunk vectors)
    • Document-level embeddings stored alongside chunk-level for two-stage search readiness
    • Added twoStageVectorSearch method that retrieves broader candidate set and applies hierarchical boosting
    • Configuration fields: enable_two_stage, doc_stage_limit, chunk_stage_limit, hierarchy_boost
    • Groups results by document and boosts scores based on document-level similarity
    • Wired into both parallel and sequential search paths for transparent operation
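The "normalized mean of chunk vectors" used for document-level embeddings can be sketched as below. The documentEmbedding helper is hypothetical, and L2 normalization is an assumption, since the entry above does not say which norm is applied.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Averages chunk vectors component-wise, then scales the result to unit
// L2 length. Returns an empty vector for empty or mismatched input.
inline std::vector<float>
documentEmbedding(const std::vector<std::vector<float>>& chunks) {
    if (chunks.empty()) return {};
    const std::size_t dim = chunks.front().size();
    std::vector<float> mean(dim, 0.0f);
    for (const auto& chunk : chunks) {
        if (chunk.size() != dim) return {};  // mismatched dimensions
        for (std::size_t i = 0; i < dim; ++i) mean[i] += chunk[i];
    }
    double normSq = 0.0;
    for (float& v : mean) {
        v /= static_cast<float>(chunks.size());
        normSq += static_cast<double>(v) * v;
    }
    const double norm = std::sqrt(normSq);
    if (norm > 0.0) {
        for (float& v : mean) v = static_cast<float>(v / norm);
    }
    return mean;
}
```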
  • Profiling build support for performance analysis
    • New build type: ./setup.sh Profiling enables instrumentation for Tracy, Valgrind, Perf
    • Builds to build/profiling directory with debug symbols + profiling hooks
    • Fuzzing build stub: ./setup.sh Fuzzing reserved for future AFL++/libFuzzer integration
    • See docs/developer/profiling.md for comprehensive profiling guide
  • EmbeddingService Architecture
  • Problem: PostIngestQueue workers were blocking on slow embedding generation, causing:
    • Documents not searchable until embeddings complete
    • Add commands hanging/timing out
    • Ingest pipeline stalled waiting for embedding models
  • Solution: Separated embedding generation into dedicated EmbeddingService that consumes from InternalBus
    • PostIngestQueue now 2-stage pipeline (Metadata + KnowledgeGraph) - embeddings removed
    • Documents searchable immediately after FTS5 indexing (~milliseconds)
    • Embeddings generated asynchronously in background by EmbeddingService workers
    • Better resource isolation: ingest and embedding workers independently tunable
    • No more blocking: add commands return immediately, documents queryable right away
  • ServiceManager & Daemon Lifecycle Improvements
  • Structured Concurrency: Replaced manual backpressure logic with std::counting_semaphore for natural bounded concurrency
  • SocketServer Improvements:
    • Converted async_accept to as_tuple pattern, eliminating exception overhead during shutdown
    • Connection future tracking for graceful shutdown with 2s timeout verification
  • Modern Error Handling: Consistent use of boost::asio::as_tuple(use_awaitable) for error codes instead of exceptions
  • Future Tracking: Replaced detached spawns with use_future for verifiable connection lifecycle management
  • Doctor Prune Command: Intelligent cleanup of build artifacts, logs, cache, and temporary files
  • Support for 9 build systems (CMake, Ninja, Meson, Make, Gradle, Maven, NPM/Yarn, Cargo, Go)
  • Detection across 10+ programming languages (C/C++, Java, Python, JavaScript, Rust, Go, OCaml, Haskell, Erlang, etc.)
  • Hierarchical category system: build-artifacts, build-system, logs, cache, temp, coverage, IDE
  • Extended package manager support: Added 9 new categories for package dependencies and caches
    • IDE-specific: ide-vscode, ide-intellij, ide-eclipse for workspace caches
    • Dependencies: package-node-modules (npm/yarn/pnpm), package-composer-vendor (PHP), package-cargo-target (Rust)
    • Caches: package-python-cache (__pycache__/), package-maven-repo, package-gradle-cache, package-go-cache, package-gem-cache, package-nuget-cache
    • Composite groups: package-deps, package-cache, packages (all), ide-all
    • Path-based detection for directories: node_modules/, __pycache__/, .vscode/, target/, vendor/, etc.
  • Dry-run by default with --apply flag for execution
  • Usage: yams doctor prune --category build-artifacts --older-than 30d --apply
  • Usage: yams doctor prune --category packages --apply (clean all package artifacts)
  • Started C++23 Compatibility support expansion
  • Migrated vectordb to https://github.com/trvon/sqlite-vec-cpp
  • Tree-sitter Symbol Extraction Plugin: Enhanced multi-language symbol extraction with Solidity support
  • Solidity Support: Added complete Solidity language support with 4 query patterns (functions, constructors, modifiers, fallback/receive)
  • Enhanced C++ Patterns: 16 function patterns + 6 class patterns including templates, constructors, destructors, operator overloads, method declarations inside class bodies
  • Multi-Language Improvements: Enhanced patterns for Python (decorated functions), Rust (impl/trait methods), JavaScript/TypeScript (arrow functions, generators, async), Kotlin (property declarations) across all 15 supported languages
  • Critical Bug Fix: Fixed query execution early-return bug that caused pattern short-circuiting - now executes all patterns resulting in 2.2x recall improvement (20.6% → 45.1%)
  • Benchmark Infrastructure: Catch2-based benchmark suite with quality metrics (Recall/Precision/F1), performance metrics (Throughput/Latency), and JSON output for CI integration
  • GTest Suite: 7 Solidity tests covering ERC20 tokens, inheritance, interfaces, events, and modifiers (372 lines, all passing)
  • Plugin auto-downloads tree-sitter grammars on first use (configurable via plugins.symbol_extraction.auto_download_grammars)
  • CLI commands: yams config grammar list/download/path/auto-enable/auto-disable
  • Supports tree-sitter v13-15 grammar versions
  • Entity Graph Service: Background service for extracting and materializing code symbols into Knowledge Graph
  • Wired into IndexingPipeline and RepairCoordinator for automatic symbol extraction
  • Supports plugin-based language-specific symbol extraction
  • Foundation for symbol-aware search and code intelligence features
  • Database Schema v16: Added symbol_metadata table for rich symbol information storage
  • Stores symbol definitions, references, and metadata from code analysis plugins
  • Indexed by document hash and symbol name for efficient lookups
  • Integrated with Knowledge Graph for entity relationship tracking
  • Migration includes tests for both schema changes and symbol metadata storage
  • Symbol-Aware Search Infrastructure: Enhanced search with symbol/entity detection and enrichment
  • SymbolEnricher class extracts rich metadata from Knowledge Graph (definitions, references, call graphs)
  • Symbol context includes type, scope, caller/callee counts, and related symbols
  • Hybrid Search Symbol Integration: Symbol metadata now actively boosts search ranking
    • Added symbol_weight configuration field (default: 0.15 = 15% multiplicative boost)
    • HybridSearchEngine::setSymbolEnricher() method wires SymbolEnricher into search pipeline
    • Symbol matches receive score boost when isSymbolQuery && symbolScore > 0.3
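
The hierarchical embedding entry above describes document-level embeddings as the normalized mean of chunk vectors. A minimal Python sketch of that computation follows (the project itself is C++). The function name `document_embedding` is hypothetical, and L2 normalization is an assumption, since the entry only says "normalized".

```python
import math


def document_embedding(chunk_vectors):
    """Return the L2-normalized mean of equally sized chunk vectors."""
    if not chunk_vectors:
        raise ValueError("at least one chunk vector is required")
    dim = len(chunk_vectors[0])
    if any(len(v) != dim for v in chunk_vectors):
        raise ValueError("all chunk vectors must share one dimension")
    count = len(chunk_vectors)
    mean = [sum(v[i] for v in chunk_vectors) / count for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    if norm == 0.0:
        return mean  # a zero vector cannot be normalized; return it unchanged
    return [x / norm for x in mean]
```

Normalizing the mean keeps document-level and chunk-level vectors comparable under cosine similarity.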

Fixed

  • Grep Command Duplicate Output: Fixed yams grep printing results twice when stderr is redirected
  • Migration System Crash (macOS): Fixed SIGSEGV crash in MigrationManager::recordMigration() during daemon startup
  • Root Cause: ServiceManager::co_migrateDatabase() called mm.initialize() but ignored its return value. If initialization failed to create the migration_history table, migrations would continue and crash when attempting to INSERT into the non-existent table.
  • Fix: Added error checking for mm.initialize() with early return and proper error logging in ServiceManager.cpp
  • Embedding System Architecture Simplification: Simplified FSM readiness logic to check provider availability directly instead of waiting for model load events
  • IModelProvider checks isAvailable() immediately after plugin adoption
  • Eliminates unnecessary ModelLoading state transition
  • Fixes “Embedding Ready: Waiting” status showing incorrectly when embeddings were actually available
  • Model dimension retrieved via getEmbeddingDim() at adoption time
  • Database Schema Recovery: Manual creation of missing kg_doc_entities table from migration 7
  • Table includes 8 columns with foreign keys to documents and kg_nodes
  • Created indexes: idx_kg_doc_entities_document, idx_kg_doc_entities_node
  • Fixes search query errors: “no such table: kg_doc_entities”
  • Worker Thread Premature Exit: Fixed io_context workers exiting immediately on startup by adding executor_work_guard to keep the context alive until explicit shutdown.
  • SocketServer Backpressure: Replaced manual backpressure polling with std::counting_semaphore, eliminating 5-20ms delay loops and providing natural bounded concurrency.
  • Embedding Consumer Deadlock: Fixed race condition causing embedding job consumer to stall
  • Added defensive retry mechanism with exponential backoff for queue state recovery
  • Impact: Embedding background processing now reliable under high load
  • FSM Cleanup & Degradation Tracking: Standardized FSM usage across ServiceManager
  • Added DaemonLifecycleFsm reference to ServiceManager for centralized subsystem degradation tracking
  • ServiceManager FSM Architecture: Centralized state management and eliminated duplication
  • Added DaemonLifecycleFsm& lifecycleFsm_ reference to ServiceManager for daemon-level degradation tracking
  • Removed scattered manual FSM state checks in favor of FSM query methods (isReady(), isLoadingOrReady())
  • Text Extraction for Source Code: Fixed critical issue where JavaScript/TypeScript/Solidity/config files failed FTS5 extraction
    • src/extraction/extraction_util.cpp: Replaced hardcoded is_text_like() with FileTypeDetector::isTextMimeType() which uses comprehensive magic_numbers.hpp database; added extension normalization (handles both .js and js formats)
    • src/extraction/plain_text_extractor.cpp:
    • Removed hardcoded 50+ extension list, delegating to FileTypeDetector for dynamic detection
    • Enhanced isBinaryFile() with UTF-8 BOM support and reduced false positives
    • Added isParseableText() with proper UTF-8 validation (validates multi-byte sequences, continuation bytes)
    • Baseline registration now includes common config/markup extensions (.toml, .ini, .yml, .md, .rst)
    • src/app/services/search_service.cpp: Updated lightweight indexing to use FileTypeDetector::isTextMimeType()
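
The text extraction entry above mentions UTF-8 BOM support and validation of multi-byte sequences and continuation bytes. The Python sketch below shows one way such a structural check could look; it is not the project's C++ code. It reuses the name `is_parseable_text` only for readability, and it deliberately does not reject overlong encodings or surrogate code points.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def is_parseable_text(data: bytes) -> bool:
    """Structural UTF-8 check: lead bytes followed by valid continuation bytes."""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]  # a leading BOM is allowed and skipped
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:
            need = 0  # ASCII
        elif 0xC2 <= lead <= 0xDF:
            need = 1  # 2-byte sequence
        elif 0xE0 <= lead <= 0xEF:
            need = 2  # 3-byte sequence
        elif 0xF0 <= lead <= 0xF4:
            need = 3  # 4-byte sequence
        else:
            return False  # stray continuation byte or invalid lead byte
        tail = data[i + 1:i + 1 + need]
        if len(tail) != need or any((b & 0xC0) != 0x80 for b in tail):
            return False  # truncated sequence or bad continuation byte
        i += 1 + need
    return True
```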

Changed

  • Embedding Provider Lifecycle: Transitioned from event-driven model loading to direct availability checking
  • Provider adoption now immediately dispatches ModelLoadedEvent if isAvailable() returns true
  • Simplified from 4-state FSM (Unavailable → ProviderAdopted → ModelLoading → ModelReady) to immediate ready transition
  • Aligns FSM with IModelProvider on-demand model loading architecture

  • Fuzzy Index Memory Optimization: Enhanced BK-tree index building with intelligent document prioritization
  • Uses metadata and Knowledge Graph to rank documents by relevance (tagged > KG-connected > recent > code files)
  • Limits index to 50,000 documents by default (configurable via YAMS_FUZZY_INDEX_LIMIT environment variable)
  • Graceful degradation with std::bad_alloc handling prevents daemon crashes on large repositories
  • Known Limitation: Fuzzy search on very large repositories (>100k documents) may experience memory pressure. Consider using metadata/KG filters or grep with exact patterns for better performance
  • ONNX Plugin Model Path Resolution: Enhanced model path search to support XDG Base Directory specification
  • Platform-Aware Plugin Installation: Build system now auto-detects Homebrew prefix on macOS
  • /opt/homebrew on Apple Silicon, /usr/local on Intel Macs and Linux
  • System plugin directory automatically trusted by daemon at runtime
  • Override via YAMS_INSTALL_PREFIX environment variable
  • Model loading timeouts hardened: adapter and ONNX plugin now use std::async with bounded wait; removed detached threads causing UAF/segfaults (AsioConnectionPool guarded)
  • Vector DB dim resolution no longer hardcodes 384; resolves from DB/config/env/provider preferred model, else warns and defers embeddings
  • ONNX plugin: removed implicit 384 defaults, derives embeddingDim dynamically from model/config; added env override YAMS_ONNX_PRECREATE_RESOURCES
  • Improved load diagnostics: detailed logs for ABI table pointers, phases, and timeout causes
  • Search Service Path Heuristic: Tightened path-first detection to only trigger for single-token or quoted path-like queries (slashes, wildcards, or extensions). Multi-word queries now proceed to hybrid/metadata search, restoring results for phrases such as "docs/delivery backlog prd tasks PBI" while preserving fast path lookups for actual paths.
  • Daemon stop reliability: yams daemon stop now only reports success after the process actually exits and will fall back to PID-based termination (and orphan cleanup) when the socket path is unresponsive.
  • Prompt termination on signals: the daemon now handles SIGTERM/SIGINT to exit promptly when graceful shutdown isn’t possible, addressing lingering yams-daemon processes after stop.
  • Hybrid Search Simplification: Removed complexity and environment variable overrides
  • Removed 6 environment variables: YAMS_DISABLE_KEYWORD, YAMS_DISABLE_ONNX, YAMS_DISABLE_KG, YAMS_ADAPTIVE_TUNING, YAMS_FUSION_WEIGHTS, YAMS_OVERRELIANCE_PENALTY
  • Kept YAMS_DISABLE_VECTOR for CI compatibility
  • Removed adaptive weight tuning logic (~30 LOC)
  • Removed over-reliance penalty mechanism
  • Keyword search now always executes (controlled by config.keyword_weight)
  • Fixed fusion weights for LEARNED_FUSION strategy: {-2.0f, 3.0f, 2.0f, 1.5f, 1.0f}
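
The Search Service Path Heuristic entry above limits path-first detection to single-token or quoted queries that look like paths. The Python sketch below is one possible reading of that rule, not the project's C++ code. The function name and the extension pattern are assumptions; the three signals (slashes, wildcards, extensions) come from the entry.

```python
import re

# Assumption: an "extension" is a dot followed by 1-8 alphanumerics at the end.
_EXTENSION = re.compile(r"\.[A-Za-z0-9]{1,8}$")


def looks_like_path_query(query: str) -> bool:
    """Return True when a query should take the path-first lookup."""
    q = query.strip()
    quoted = len(q) >= 2 and q[0] == q[-1] and q[0] in "\"'"
    if quoted:
        q = q[1:-1]  # quoted queries may contain spaces
    elif len(q.split()) != 1:
        return False  # unquoted multi-word queries go to hybrid search
    if not q:
        return False
    has_slash = "/" in q
    has_wildcard = any(c in q for c in "*?")
    has_extension = bool(_EXTENSION.search(q))
    return has_slash or has_wildcard or has_extension
```

Under this reading, the phrase quoted in the entry is no longer treated as a path even though it contains a slash, because it is unquoted and has several words.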

Removed

  • Removed WASM and the legacy plugin system from the codebase and ServiceManager

[v0.7.6] - 2025-10-13

Added

  • CLI Pattern Ergonomics: Added --pattern/-p flag to list command as an alias for --name, improving consistency with other commands. The flag supports glob wildcards (*, ?, **) and auto-normalizes relative paths to absolute when no wildcards are present. (src/cli/commands/list_command.cpp)
  • Grep Literal Text Hints: Added smart error detection and helpful hints when grep patterns contain regex special characters. When a pattern fails regex compilation or returns no results, grep now suggests using the -F flag with the exact command to run. Added -Q as a short alias for -F/--fixed-strings/--literal-text to match git grep convention. (src/cli/commands/grep_command.cpp)
  • Search Literal Text Aliases: Added -F/-Q/--fixed-strings aliases to search command for consistency with grep. These short flags make it easier to search for literal text containing special characters like ()[]{}.*+?. Updated help text with concrete examples. (src/cli/commands/search_command.cpp)
  • Grep: enabling [search.path_tree] now lets explicit path filters reuse the metadata-backed path-tree engine, and tag-only invocations default the pattern to .*, removing the need for placeholder expressions. (src/app/services/grep_service.cpp:240, src/cli/commands/grep_command.cpp:305)
  • Tree-based List with Filters: Extended tree-based path queries to support tag, MIME type, and extension filtering. The list command now uses the tree index even when filters are applied, improving performance for pattern+filter queries (e.g., yams list --name "docs/**" --tags "test").
  • Benchmark: Tree List Filters: New benchmark suite (tree_list_filter_bench) measures query performance with various filter combinations. Results show 100-160μs query times with up to 10k queries/sec throughput. Filter queries often outperform path-only queries due to reduced result set sizes.
  • Grep SQL-Level Pattern Filtering: Added queryDocumentsByGlobPatterns() function that converts glob patterns (e.g., tests/**/*.cpp) to SQL LIKE patterns and queries the database directly, eliminating the need to load all documents into memory before filtering. Grep performance with --include patterns improved dramatically on large repositories.
  • Search Multi-Pattern Support: Added pathPatterns vector field to SearchRequest IPC protocol (field 34) enabling server-side filtering of multiple include patterns. Search command now sends all patterns to daemon instead of filtering results client-side, eliminating timeouts and OOM errors in sandboxed environments.
  • MCP Search Multi-Pattern Support: Added include_patterns array parameter to MCP search tool, enabling clients to specify multiple path patterns with OR logic. The MCP server now populates pathPatterns in daemon requests, matching CLI behavior. (include/yams/mcp/tool_registry.h, src/mcp/mcp_server.cpp)
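
The CLI Pattern Ergonomics entry above says relative paths are normalized to absolute only when the pattern has no wildcards. A minimal Python sketch of that rule follows; the project's C++ code may differ. The function name is hypothetical, and POSIX path semantics are assumed for clarity.

```python
import posixpath

WILDCARDS = "*?"  # "**" is covered because it contains "*"


def normalize_list_pattern(pattern: str, cwd: str) -> str:
    """Make wildcard-free relative patterns absolute; leave globs untouched."""
    if any(c in pattern for c in WILDCARDS):
        return pattern
    if posixpath.isabs(pattern):
        return posixpath.normpath(pattern)
    return posixpath.normpath(posixpath.join(cwd, pattern))
```

For example, `normalize_list_pattern("docs/a.md", "/home/u")` yields `/home/u/docs/a.md`, while `docs/**` is passed through unchanged.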

Changed

  • MCP stdio transport: stdout buffering now adapts to interactive vs non-interactive streams, stderr is forced unbuffered, and JSON-RPC batch arrays over stdio are parsed in-line to match the Model Context Protocol 2025-03-26 transport requirements. Additional unit coverage exercises batch handling and error budgets for framed headers.
  • MCP Server: cat and get tools now resolve relative paths using weakly_canonical, improving document lookup for non-absolute paths.
  • Path Canonicalization: Document paths are now canonicalized using weakly_canonical() at ingestion time to ensure consistent path matching across symlinked directories (e.g., /var → /private/var on macOS). This fixes pattern-based queries that previously failed due to path mismatch between indexed and query paths. (src/metadata/path_utils.cpp)
  • Integration Test Stability: Improved TreeBasedListE2E test reliability by replacing fixed sleep with polling-based wait for document indexing completion. Test pass rate improved from ~60% to 100%.
  • Grep Performance: Grep service now uses SQL-level pattern filtering when --include patterns are provided, fetching only matching documents from the database instead of loading all documents and filtering in memory. Converts glob patterns to SQL LIKE patterns (e.g., *.cpp → %.cpp, tests/**/*.h → tests/%.h). This eliminates hangs on large repositories (10K+ documents).
  • Search Service: Updated to handle multiple path patterns via pathPatterns vector field, iterating through all patterns with OR logic for server-side filtering. Removed client-side filtering that previously caused timeouts with multiple --include patterns.
  • Build System: Fixed VS Code task definitions with correct Conan 2.x output paths. Meson native file paths updated from builddir/conan_meson_native.ini to builddir/build-debug/conan/conan_meson_native.ini and build/release/build-release/conan/conan_meson_native.ini to match actual Conan 2 directory structure. (.vscode/tasks.json)
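
The glob-to-LIKE conversion in the Grep Performance entry above can be sketched as follows, in Python with an in-memory SQLite database (the project itself is C++). `glob_to_like`, `query_paths`, and the `documents(path)` table are hypothetical. Escaping literal `%` and `_` is an added assumption. The two example conversions in the entry are reproduced.

```python
import sqlite3


def glob_to_like(pattern: str) -> str:
    """Translate a glob pattern into a SQL LIKE pattern (escape char: backslash)."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            token, step = "%", 3  # any number of directories
        elif pattern.startswith("**", i):
            token, step = "%", 2
        elif pattern[i] == "*":
            token, step = "%", 1
        elif pattern[i] == "?":
            token, step = "_", 1
        elif pattern[i] in "%_\\":
            token, step = "\\" + pattern[i], 1  # escape LIKE metacharacters
        else:
            token, step = pattern[i], 1
        # Collapse runs of "%" so "**/*" becomes a single wildcard.
        if not (token == "%" and out and out[-1] == "%"):
            out.append(token)
        i += step
    return "".join(out)


def query_paths(conn, pattern: str):
    """Fetch only the matching paths instead of loading every document."""
    rows = conn.execute(
        "SELECT path FROM documents WHERE path LIKE ? ESCAPE '\\' ORDER BY path",
        (glob_to_like(pattern),),
    )
    return [row[0] for row in rows]
```

Because `%` also matches `/`, the LIKE pattern can match more paths than the glob would, so a precise glob match over the (much smaller) candidate set may still be needed afterwards.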

Fixed

  • MCP Search/Grep Hang: Fixed MCP server’s search and grep tools hanging indefinitely by ensuring all async components use the same io_context. The root cause was a multi-layered async execution mismatch: (1) The DaemonClient’s config didn’t specify an executor, causing it to create its own internal io_context. (2) callTool() was spawning work on GlobalIOContext and blocking a worker thread waiting for results. (3) The two separate io_contexts couldn’t communicate, causing deadlock. Fixed by: (a) Configuring daemon_client to use GlobalIOContext executor in MCPServer constructor. (b) Removing the nested local io_context from callTool() - now correctly spawns on GlobalIOContext (which has background worker threads) and waits for results. (c) Removing nested io_context from handleSearchDocuments() to use co_await directly. (src/mcp/mcp_server.cpp:403,2029-2039,2225-2226)
  • Document Service: Improved resolveNameToHash to correctly handle filename-only lookups by searching for paths ending with the given name, ensuring that commands like cat with a simple filename succeed.
  • Tree Query Pattern Matching: Fixed wildcard pattern parsing to correctly handle recursive patterns (/**) by stripping all trailing wildcards iteratively instead of checking for a single wildcard character. (src/app/services/document_service.cpp)
  • Grep Hang: Fixed grep command hanging indefinitely when using --include patterns on large repositories. The service was fetching all documents before filtering; now uses SQL-level pattern matching to fetch only relevant documents.
  • Search Timeout: Fixed search command timeouts/OOM when using multiple --include patterns. Previously only the first pattern was sent to daemon with remaining patterns filtered client-side after retrieving ALL results. Now all patterns are sent to daemon for server-side filtering.
  • Search Pattern Matching: Fixed glob pattern normalization to correctly match filename patterns like *.md and *.cpp anywhere in the path tree. Patterns starting with a single * (e.g., *.ext) are now automatically prefixed with **/ to match paths at any depth. This ensures patterns like

[v0.7.4] - 2025-10-10

Changed

  • MetadataRepository: Added atomic counters (cachedDocumentCount_, cachedIndexedCount_, cachedExtractedCount_) updated on every insert/delete/update operation. Eliminated 3 COUNT(*) queries from hot path (220-400ms → <1μs)
  • VectorDatabase: Added cachedVectorCount_ atomic counter updated on insert/delete operations. Eliminated COUNT(*) query from getVectorCount()
  • ServiceManager Concurrency: Converted searchEngineMutex_ from std::mutex to std::shared_mutex enabling N concurrent readers with single exclusive writer. Allows parallel status requests without serialization bottleneck
  • Status Request Optimization: Removed blocking VectorDatabase initialization from hot path. Status handler now reports readiness accurately without attempting to “fix” uninitialized state, eliminating 1-5s blocking operations
  • Performance: Sequential request throughput improved to ~1960 req/s with sub-millisecond latency (avg: 0.02ms, max: 1ms). First connection latency: 2ms. Daemon readiness validation added to prevent test methodology races with initialization
  • Document Retrieval Optimization: Replaced O(n) full table scans with O(log n) indexed lookups in cat/get operations. Changed from queryDocumentsByPattern('%') to getDocumentByHash(hash), eliminating 120K+ document scans per retrieval (lines 850, 934 in document_service.cpp)
  • Name Resolution Fix: Fixed pattern generation for basename-only queries. Now generates '%/basename' pattern FIRST to use containsFragment query instead of failing exactPath (path_hash) match. yams get --name and yams cat <name> now work correctly
  • Grep FTS-First: Optimized grep to START with FTS5 index search for literal patterns before falling back to full document scan. Regex patterns still use full scan. Significantly improves grep performance on large repositories
  • ONNX Plugin: Upgraded the ONNX plugin to conform to the modern model_provider_v1 (v1.2) interface specification.
  • Enhanced ui_helpers.hpp with 30+ new utilities: value formatters (format_bytes, format_number, format_duration, format_percentage), status indicators (status_ok, status_warning, status_error), table rendering (Table, render_table), progress bars, text utilities (word wrap, centering, indentation)
  • Improved yams status with color-coded severity indicators, human-readable formatting, and sectioned layout
  • Enhanced yams daemon status with humanized counter names (CAS, IPC, EMA, DB acronyms preserved), smart byte/number formatting
  • Added yams daemon status -d detailed view with storage overhead breakdown showing disk usage by component (CAS blocks, ref counter DB, metadata DB, vector DB, vector index) with overhead percentage relative to content
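
The MetadataRepository and VectorDatabase entries above replace COUNT(*) queries with counters maintained on every write. The Python sketch below illustrates the idea. The class, the table, and the lock are hypothetical; the project uses C++ atomic counters.

```python
import sqlite3
import threading


class CountedStore:
    """Keep a cached row count in step with inserts and deletes."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:", check_same_thread=False)
        self._db.execute("CREATE TABLE documents (hash TEXT PRIMARY KEY)")
        self._lock = threading.Lock()
        self._cached_count = 0

    def insert(self, doc_hash: str) -> None:
        with self._lock:
            self._db.execute("INSERT INTO documents VALUES (?)", (doc_hash,))
            self._cached_count += 1

    def delete(self, doc_hash: str) -> None:
        with self._lock:
            cur = self._db.execute(
                "DELETE FROM documents WHERE hash = ?", (doc_hash,)
            )
            self._cached_count -= cur.rowcount  # 0 when nothing was deleted

    def count(self) -> int:
        return self._cached_count  # hot path: no query

    def count_slow(self) -> int:
        return self._db.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
```

The cached value stays correct only if every write goes through the methods that update it, which is why the entries say the counters are updated on every insert, delete, and update.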

Deprecated

  • MCP get_by_name tool: Use get tool with name parameter instead. The get tool now smartly handles both hash and name lookups with optimized pattern matching

Fixed

  • Streaming Protocol Bug: Fixed critical bug where GetResponse/CatResponse sent header-only frame (empty content) followed by data frame, causing CLI to process first frame and fail. Added force_unary_response check in request_handler.cpp to disable streaming for these response types, forcing single complete frame transmission
  • Protobuf Schema: Added missing bool has_content = 6 field to GetResponse message in ipc_envelope.proto. Updated serialization to explicitly set/read flag instead of recalculating, preventing desync between daemon and CLI
  • Daemon: Fixed a regression in the plugin loader that prevented legacy model provider plugins (like the ONNX provider) from being correctly detected and adopted. The loader now includes a fallback to detect and register providers using the legacy getProviderName/createProvider symbols, restoring embedding generation functionality.
  • Grep Service: Fixed critical bug where --paths-only mode returned all candidate documents without checking pattern matches, causing incorrect “(no results)” responses. Removed premature fast-exit optimization; grep now properly runs pattern matching and returns only files that match. (Issue: 135K docs indexed but grep returned empty, audit revealed fast-exit bypassed validation)
  • Grep CLI: Fixed session pattern handling bug where session include patterns were incorrectly used as document selectors instead of result filters. Session patterns now properly merged into includePatterns for filtering, not paths for selection. This prevented grep from finding any results when a session was active.

[v0.7.3] - 2025-10-08

Added

  • Bench: minimal daemon warm-start latency check moved to an opt-in bench target and suite.
  • New standalone binary tests/yams_bench_daemon_warm executes a bounded start/sleep/stop cycle with vectors disabled and tight init timeouts; asserts <5s end-to-end.
  • Meson test registered as bench_daemon_warm_latency in the yams:bench suite.
  • Disabled by default in CI; enable by setting RUN_DAEMON_WARM_BENCH=true (workflow env) and YAMS_ENABLE_DAEMON_BENCH=1 (step env) to run only this bench.
  • Tree-Diff Metadata & Retrieval Modernization🎉
  • Tree-based snapshot comparison: Implemented Merkle tree-based diff algorithm for efficient snapshot comparison with O(log n) subtree hash optimization for unchanged directories.
  • Rename detection: Hash-based rename/move detection with ≥99% accuracy, enabled by default in yams diff command.
  • Knowledge Graph integration: Path and blob nodes with version edges and rename tracking via fetchPathHistory() API.
  • Enhanced graph command: yams graph now queries KG store for same-content relationships and rename chains.
  • Tree diff as default: yams diff uses tree-based comparison by default; --flat-diff flag available for legacy behavior.
  • RPC/IPC exposure: Added ListTreeDiff method to daemon protocol (protobuf + binary serialization).
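
The tree-based snapshot comparison above skips unchanged directories by comparing subtree hashes. The Python sketch below shows the general Merkle-tree technique, not the project's C++ implementation. Trees are modeled as nested dicts (directories) with bytes leaves (file contents), and the hash layout is an assumption.

```python
import hashlib


def tree_hash(node) -> str:
    """Hash a file (bytes) or a directory (dict of name -> node)."""
    if isinstance(node, bytes):
        return hashlib.sha256(b"blob\0" + node).hexdigest()
    h = hashlib.sha256(b"tree\0")
    for name in sorted(node):
        h.update(name.encode() + b"\0" + tree_hash(node[name]).encode() + b"\n")
    return h.hexdigest()


def diff_trees(old: dict, new: dict, prefix: str = ""):
    """Return (change, path) pairs, descending only into changed subtrees."""
    changes = []
    for name in sorted(set(old) | set(new)):
        path = prefix + name
        a, b = old.get(name), new.get(name)
        if a is None:
            changes.append(("added", path))
        elif b is None:
            changes.append(("deleted", path))
        elif tree_hash(a) == tree_hash(b):
            continue  # identical subtree: skipped without visiting children
        elif isinstance(a, dict) and isinstance(b, dict):
            changes.extend(diff_trees(a, b, path + "/"))
        else:
            changes.append(("modified", path))
    return changes
```

This sketch recomputes hashes on every comparison for simplicity. A real implementation would store each subtree hash once, which is what makes skipping unchanged directories cheap.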

Changed

  • Daemon Async Architecture: Unified on modern Boost.Asio 1.82+ patterns with C++20 coroutines (asio::awaitable)
  • Single io_context with work guard for all async operations
  • Strands for logical separation (init, plugin, model domains)
  • RAII cleanup guards for automatic resource management
  • Error codes via as_tuple instead of exceptions for hot paths
  • Semaphore-based bounded concurrency instead of manual atomic flags
  • Compression-first retrieval: DocumentService, CLI, and daemon IPC now default to returning compressed payloads with full metadata (algorithm, CRC32s, sizes)
  • Path query pipeline: Replaced the legacy findDocumentsByPath helper with the normalized queryDocuments API and the shared queryDocumentsByPattern utility. All services (daemon, CLI, MCP, mobile bindings, repair tooling, vector ingestion) now issue structured queries that leverage the path_prefix, reverse_path, and path_hash indexes plus FTS5 for suffix matches, eliminating full-table LIKE scans.
  • Schema migration: Migration v13 (Add path indexing schema) continues to govern the derived columns/indices; applying this release replays the up hook in place (normalizing existing rows and rebuilding the FTS table), so existing deployments automatically benefit from the optimized lookups after the usual migration step.
  • CLI Retrieval (get/cat): partial-hash resolution now routes through RetrievalService using the daemon’s streaming search and the metadata-layer hash-prefix index.
  • yams get and yams cat accept 6–64 hex prefixes; ambiguity can be resolved via --latest/--oldest. No more local metadata table scans; latency improves especially on large catalogs.
  • Internals: RetrievalService::resolveHashPrefix consumes SearchService hash results and applies newest/oldest selection hints; GetCommand validates and normalizes hash input before issuing a daemon Get.
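
The partial-hash rules above (6 to 64 hex characters, with --latest/--oldest resolving ambiguity) can be sketched as follows in Python; the project itself is C++. Both function names, the `(hash, indexed_at)` tuples, and the `pick` parameter are hypothetical stand-ins for the real services and flags.

```python
import re

_HEX_PREFIX = re.compile(r"[0-9a-f]{6,64}")


def normalize_hash_prefix(raw: str) -> str:
    """Validate and lowercase a user-supplied hash prefix."""
    candidate = raw.strip().lower()
    if not _HEX_PREFIX.fullmatch(candidate):
        raise ValueError("expected 6-64 hexadecimal characters")
    return candidate


def resolve_hash_prefix(prefix: str, documents, pick=None) -> str:
    """Resolve a prefix against (full_hash, indexed_at) pairs."""
    matches = [d for d in documents if d[0].startswith(prefix)]
    if not matches:
        raise LookupError("no document matches the prefix")
    if len(matches) == 1:
        return matches[0][0]
    if pick == "latest":
        return max(matches, key=lambda d: d[1])[0]
    if pick == "oldest":
        return min(matches, key=lambda d: d[1])[0]
    raise LookupError("ambiguous prefix; choose 'latest' or 'oldest'")
```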

Fixed

  • Daemon IPC: Fixed a regression in the grep IPC protocol where GrepRequest and GrepResponse messages were not fully serialized, causing data loss. The protocol definitions and serializers have been updated to correctly handle all fields, including show_diff in requests and detailed statistics in responses.
  • Indexing: Fixed an issue where updated files were not being re-indexed. The change detection logic now correctly considers file modification time and size, in addition to content hash, to reliably identify changes.
  • Indexing: Corrected the document update process to prevent duplicate records for the same file path when a file is updated. The indexer now properly distinguishes between new documents and updates to existing ones.
  • Daemon IPC: Fixed an issue where search and grep commands could time out without producing output by improving the efficiency of the daemon’s streaming response mechanism.
  • Daemon IPC: Optimized non-multiplexed communication paths to prevent performance issues and potential timeouts with large responses from commands like get and cat.

[v0.7.2] - 2025-10-03

Added

  • Automatic directory snapshot generation with ISO 8601 timestamp IDs and git metadata detection (commit, branch, remote). Every yams add <directory> now creates a timestamped snapshot stored in the tree_snapshots table.
  • Snapshot Listing: New yams list --snapshots command displays all available snapshots with table and JSON output formats, showing snapshot IDs, directory paths, labels, git commits, and file counts.
  • Implemented yams diff <snapshotA> <snapshotB> command with tree, flat, and JSON output formats for comparing directory snapshots.
  • TreeDiffer automatically detects renamed/moved files via SHA-256 hash equivalence matching, enabled by default.
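
Rename detection by hash equivalence, as described for TreeDiffer above, can be sketched in Python as follows (the project itself is C++). Snapshots are modeled as hypothetical `{path: content_hash}` dicts. A removed path and an added path with the same hash are paired as a rename.

```python
def detect_renames(old: dict, new: dict):
    """Return (renames, added, deleted) between two path -> hash snapshots."""
    removed = {p: h for p, h in old.items() if p not in new}
    appeared = {p: h for p, h in new.items() if p not in old}

    # Index removed paths by content hash so added paths can claim them.
    by_hash = {}
    for path, digest in sorted(removed.items()):
        by_hash.setdefault(digest, []).append(path)

    renames, added = [], []
    for path, digest in sorted(appeared.items()):
        candidates = by_hash.get(digest)
        if candidates:
            renames.append((candidates.pop(0), path))  # same content, new path
        else:
            added.append(path)

    deleted = sorted(p for paths in by_hash.values() for p in paths)
    return renames, added, deleted
```

Paths present in both snapshots with different hashes are modifications and are outside the scope of this sketch.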

Changed

  • Snapshot Labels: yams add now accepts optional --label flag for human-readable snapshot names.
  • Indexing Service: Enhanced to persist snapshot metadata (snapshot_id, directory_path, git metadata, file count) to database after directory ingestion.
  • Metadata Repository: Added upsertTreeSnapshot(), listTreeSnapshots(), and tree diff persistence methods for snapshot and change history management.
  • Search: Parallelized keyword search scoring loop to significantly improve performance on multi-core systems.
  • Search: Search thread pools are now configured by the central TuningManager to adapt to system load and tuning profiles.
  • Search: Implemented structural scoring to boost relevance of results that are co-located in the same directory.
  • Added FTS5 readiness fast-path check in getByNameSmart() to prevent 3-second blocking timeouts when search indexes are updating.
  • Added post_ingest_queue_depth field to status response, enabling clients to check if FTS5 indexes are ready before attempting expensive search operations.
  • TUI browse command now resolves listings and fuzzy search through the shared AppContext service bundle (TUIServices + IDocumentService/ISearchService), with graceful fallback to metadata/content-store paths when the daemon is degraded.
  • CLI Browse: Shift+R reindex dialog now performs a full extraction + index refresh through TUIServices::reindexDocument, providing inline success/error feedback instead of the previous placeholder flow.

Fixed

  • Daemon IPC: SocketServer now shares a live writer-budget reference with every connection and the tuning manager pushes updates through it. Multiplexed streams adjust bandwidth limits immediately when profiles or runtime heuristics change.
  • Search: Corrected an issue where yams search --include was not being applied for hybrid searches. The include pattern is now passed to the daemon and correctly filters results.
  • Fixed protobuf UTF-8 validation errors when grepping binary files or non-UTF-8 text. Changed GrepMatch.line, context_before, and context_after fields from string to bytes type in protobuf definition. This allows grep to handle arbitrary byte sequences including binary content, Latin-1, Windows-1252, and other legacy encodings without validation failures. (PBI-001, task 001-33)
  • Daemon IPC: replaced the io_context.run_for polling loop with dedicated run_one workers so async accept completions are no longer starved during streaming requests. Added optional diagnostic thread (YAMS_SOCKET_RUN_DIAG) for debugging.
  • CLI Browse: refuse to launch the FTXUI browser when the terminal is non-interactive, lacks TERM capabilities, or is smaller than 60x18; emit a clear resize guidance message instead of hanging or crashing.
  • CLI Search: release pooled daemon clients before process teardown to prevent the std::system_error: mutex lock failed abort when yams search exits after hitting the daemon path.

[v0.7.1] - 2025-09-29

Changed

  • GrepService: expanded candidate discovery to preselect from req.paths using SQL LIKE prefix scans, aligning service behavior with CLI expectations for directory patterns.
  • RepairCoordinator refocus: on live DocumentAdded events, skip queuing when the post‑ingest
  • Post‑ingest pipeline: improvements
  • ServiceManager enqueue path: simplified enqueuePostIngest to a direct blocking enqueue. This improves predictability and throughput under high load.
  • CLI Download UX: yams download now clearly displays the ingested content hash.

Fixed

  • GrepService streaming: flushes the final partial line when scanning cold CAS streams so single-line files are matched reliably (e.g., hello.txt).
  • Reduced GrepService log verbosity to debug for internal counters and match traces.
  • Fixed IPC protocol regression where grep and list commands failed to properly communicate with the daemon after migration, causing incomplete results or timeouts in multi-service environments.
  • The same regression also affected the result output of other tools.
  • Guarded compression monitor global statistics with a dedicated mutex to stop concurrent tracker updates from crashing unit_shard5 (validated via meson test -C build/debug unit_shard5 --print-errorlogs).
  • Repaired the document_service metadata pipeline regression so fixture-driven search tests no longer observe missing extracted content.
  • MCP stdio transport: replaced unused static output mutex with an instance mutex to satisfy ODR/build on certain platforms.
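
The GrepService streaming fix above concerns a classic edge case: when content arrives in chunks, the last line may never be followed by a newline, so it has to be flushed explicitly at end of stream. The following is a minimal sketch of that pattern, not the GrepService code; `LineScanner`, `feed`, and `finish` are illustrative names.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical streaming line scanner (illustrative, not a YAMS API).
class LineScanner {
public:
    using LineCallback = std::function<void(std::string_view)>;

    explicit LineScanner(LineCallback onLine) : onLine_(std::move(onLine)) {}

    // Feed one chunk: emits every complete line and keeps the
    // unterminated tail buffered until more data (or finish()) arrives.
    void feed(std::string_view chunk) {
        pending_.append(chunk);
        std::size_t start = 0;
        for (;;) {
            const std::size_t nl = pending_.find('\n', start);
            if (nl == std::string::npos) break;
            onLine_(std::string_view(pending_).substr(start, nl - start));
            start = nl + 1;
        }
        pending_.erase(0, start);  // drop what was already emitted
    }

    // Call once at end of stream: emits the final line if it had no
    // trailing newline. Skipping this step loses single-line files.
    void finish() {
        if (!pending_.empty()) {
            onLine_(pending_);
            pending_.clear();
        }
    }

private:
    LineCallback onLine_;
    std::string pending_;
};
```

With this shape, a file containing only `hello` (no newline) produces no callback during `feed` and exactly one during `finish`, which is the case the fix addresses.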

[v0.7.0] - 2025-09-25

Highlights

  • Responsiveness: this release reduces CPU spikes observed in profiles for large greps and removes blocking storage scans from interactive status paths. Post-ingest work is intentionally bounded; processing may take longer, but overall system responsiveness improves.
  • Stability: resolved connection timeouts under multi-agent load by removing the hard 100-connection cap and deriving a dynamic accept limit. Defaults honor YAMS_MAX_ACTIVE_CONN or compute a safe cap from CPU cores and IO concurrency.
  • Throughput: added tuning profiles (efficient | balanced | aggressive). Profiles modulate pool growth, IO thresholds, and post-ingest workers. Default is balanced.
  • Indexing UX: Add/ingest returns fast; post‑ingest queue handles FTS/embeddings/KG in the background. Path‑series versioning (Phase 1) is on by default behind an env flag.

Added

  • Tuning profiles selectable via config or env:
  • Config: yams config set tuning.profile <efficient|balanced|aggressive>
  • Env: YAMS_TUNING_PROFILE=<profile>
  • Config defaults now include [tuning] profile = "balanced".
  • Docs: docs/admin/tuning_profiles.md covering profiles, envs, and observability.
  • Versioning (Phase 1): path‑series lineage with VersionOf edges and metadata flags version, is_latest, series_key. Duplicate (same hash) re‑ingest does not create a new version; alternate locations and timestamps are updated.

  • CLI Search: grouped multi‑version presentation (default on) with new controls.

  • Groups results by canonical path when multiple versions of the same file are returned.
  • New flags:
    • --no-group-versions — disable grouping and show the flat list.
    • --versions <latest|all> — choose best only (default: latest) or list versions per path.
    • --versions-topk <N> — cap versions shown per path when --versions=all (default: 3).
    • --versions-sort <score|path|title> — sort versions within a group (default: score).
    • --no-tools — hide per‑version tool hints.
    • --json-grouped — emit grouped JSON; plain --json remains flat and backward compatible.
  • Tool hints shown per version (when grouped): yams get --hash <hash> | yams cat --hash <hash> | yams restore --hash <hash>; if a local file path is resolved, a yams diff --hash <hash> <local-path> hint is added.
  • Environment toggles: YAMS_NO_GROUP_VERSIONS=1 and YAMS_NO_GROUP_TOOLS=1 to flip defaults.
  • Note: This is a presentation‑layer change; service/daemon APIs are unchanged.
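
The grouped presentation described above can be pictured as a small post-processing step over a flat result list. The sketch below is an assumption about the general shape, not the CLI's implementation: `Hit`, its fields, and `groupVersions` are hypothetical, and it models only score-based ordering with a per-path cap (in the spirit of --versions-topk), not how the latest version is determined.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical result record; field names are illustrative only.
struct Hit {
    std::string path;
    std::string hash;
    double score;
};

// Groups hits by path, orders each group by descending score, and keeps
// at most `topK` versions per path. Groups come back ordered by path
// because std::map is used; a real presenter might rank groups differently.
std::map<std::string, std::vector<Hit>> groupVersions(const std::vector<Hit>& hits,
                                                      std::size_t topK) {
    std::map<std::string, std::vector<Hit>> groups;
    for (const Hit& h : hits) {
        groups[h.path].push_back(h);
    }
    for (auto& entry : groups) {
        std::vector<Hit>& versions = entry.second;
        // stable_sort keeps the original order among equal scores.
        std::stable_sort(versions.begin(), versions.end(),
                         [](const Hit& a, const Hit& b) { return a.score > b.score; });
        if (versions.size() > topK) {
            versions.resize(topK);
        }
    }
    return groups;
}
```

Because the grouping happens after results are returned, a step like this fits the note above that service and daemon APIs are unchanged.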

Changed

  • Build System
  • The primary build system has been migrated from CMake to Meson. All build, test, and packaging scripts have been updated to use the new Meson-based workflow.
  • Status/Stats (CLI): use daemon metrics by default and never trigger local storage scans.
  • yams status and yams stats -v now render from the same non-detailed daemon snapshot; removed the “scanning storage…” spinner and filesystem walks.
  • Verbose output formats the JSON fields instead of performing extra scans.
  • Tools/Stats (yams-tools): tools/yams-tools/src/commands/stats_command.cpp refactored to prefer daemon-first metrics with a legacy local fallback only if daemon is unavailable.
  • MCP add_directory: switched to daemon-first ingestion with a brief readiness wait to avoid “Content store not available” races. Removes local store preflight; maps NotInitialized to a clear, retryable message from the daemon.
  • MCP search: path normalization + optional diff parity with CLI.
  • New request field include_diff adds a structured diff block to results when the path_pattern points to a local file; mirrors yams search diff behavior.
  • MCPSearch DTOs extended to round-trip include_diff, diff, and local_input_file.
  • Daemon accept scaling: removed fixed cap; now dynamically computes maxConnections from recommendedThreads * ioConnPerThread * 4 (min 256) unless YAMS_MAX_ACTIVE_CONN is set.
  • Backpressure: increased default read pause to 10ms to smooth heavy load.
  • Post‑ingest: preserves bounded capacity; de‑dupes inflight, indexes FTS, updates fuzzy index, and emits KG nodes/edges best‑effort.
  • Status/Stats: JSON correctness improvements; omit misleading savings when physical size unknown; surface post‑ingest bus usage and document counters.
  • CLI Search: grouping of multiple versions per path is enabled by default; paths‑only output and flat JSON remain unchanged unless --json-grouped is specified.
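
The accept-scaling rule above (recommendedThreads * ioConnPerThread * 4, with a floor of 256, unless YAMS_MAX_ACTIVE_CONN is set) can be written as a small pure function. This is a sketch of the stated formula only: the function names are hypothetical, and the handling of a malformed or zero override (ignored here) is an assumption, not documented behavior.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper. `envOverride` is the raw value of
// YAMS_MAX_ACTIVE_CONN, or nullptr when the variable is unset.
std::size_t computeMaxConnections(std::size_t recommendedThreads,
                                  std::size_t ioConnPerThread,
                                  const char* envOverride) {
    if (envOverride != nullptr &&
        std::isdigit(static_cast<unsigned char>(*envOverride))) {
        char* end = nullptr;
        const unsigned long long v = std::strtoull(envOverride, &end, 10);
        // Assumption: only a fully numeric, non-zero value counts as an override.
        if (*end == '\0' && v > 0) {
            return static_cast<std::size_t>(v);
        }
    }
    const std::size_t derived = recommendedThreads * ioConnPerThread * 4;
    return std::max<std::size_t>(derived, 256);  // floor from the changelog
}

// Convenience wrapper that reads the environment variable named in the changelog.
std::size_t maxConnectionsFromEnv(std::size_t recommendedThreads,
                                  std::size_t ioConnPerThread) {
    return computeMaxConnections(recommendedThreads, ioConnPerThread,
                                 std::getenv("YAMS_MAX_ACTIVE_CONN"));
}
```

For example, 8 recommended threads with 16 connections per thread gives 8 * 16 * 4 = 512, while 2 threads with 8 connections per thread gives 64, which is raised to the 256 floor.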

Fixed

  • Regression in metadata extraction and storage used by the search and grep tools: the async post-ingest pipeline never persisted extracted text into the metadata store. As a result, document_content stayed empty, so search, repairs, and semantic pipelines saw “Document content not found” despite vector insert logs.
  • Tuning optimizations for daemon usage:
  • Grep pipeline: staged KG → metadata → content with caps and budget.
  • Prefers “hot” text (metadata-extracted) and caps cold CAS reads; early path/include filters.
  • Added a global time budget (internal) to stop long content scans gracefully.
  • Capped grep worker threads to a small, background-friendly number by default (≤4).
  • Grep streaming optimization: replaced per-character streambuf overflow with bulk line splitting (memchr-based) to eliminate the per-byte hotspot in profiles during CAS streaming.
  • Post-ingest queue: bounded by configuration, not CPU heuristics.
  • Default worker threads set conservatively to 1 unless configured in [tuning] as post_ingest_threads. Queue capacity now honored from post_ingest_queue_max.
  • Added a tiny yield between tasks to reduce contention and smooth CPU.
  • Addressed intermittent CLI timeouts and “Broken pipe” logs observed when many agents connected concurrently. Accept loop backoff now respects the higher connection cap and IO pool growth from the tuning manager.
  • Minor unit test fixes (Result value handling) to unblock CI.
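
The grep streaming optimization above replaces per-character handling with bulk line splitting based on memchr. A minimal sketch of that technique over a single in-memory buffer follows; `splitLines` is a hypothetical helper, not the function used in YAMS, and it leaves out the chunk-to-chunk carry-over a real streaming path would need.

```cpp
#include <cstddef>
#include <cstring>
#include <string_view>
#include <vector>

// Hypothetical helper: splits `buf` into lines by letting memchr locate
// each newline, instead of examining bytes one at a time in a loop.
// The returned views point into `buf`, so `buf` must outlive them.
std::vector<std::string_view> splitLines(std::string_view buf) {
    std::vector<std::string_view> lines;
    const char* cur = buf.data();
    const char* const end = cur + buf.size();
    while (cur < end) {
        const std::size_t remaining = static_cast<std::size_t>(end - cur);
        const void* hit = std::memchr(cur, '\n', remaining);
        if (hit == nullptr) {
            lines.emplace_back(cur, remaining);  // trailing line without newline
            break;
        }
        const char* nl = static_cast<const char*>(hit);
        lines.emplace_back(cur, static_cast<std::size_t>(nl - cur));
        cur = nl + 1;  // continue after the newline
    }
    return lines;
}
```

The work per line is one memchr call plus one view construction, which is why this style tends to remove a per-byte hotspot; standard library memchr implementations are typically optimized to scan many bytes at a time.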