Changelog Archive: v0.7.x Series
Archived Changelogs
- v0.7.x archive: docs/changelogs/v0.7.md
- v0.6.x archive: docs/changelogs/v0.6.md
- v0.5.x archive: docs/changelogs/v0.5.md
- v0.4.x archive: docs/changelogs/v0.4.md
- v0.3.x archive: docs/changelogs/v0.3.md
- v0.2.x archive: docs/changelogs/v0.2.md
- v0.1.x archive: docs/changelogs/v0.1.md
[v0.7.10] - 2026-12-20
Added
- Graph command `--list-types` flag (yams-66h): Node type discovery for knowledge graph
  - New `--list-types` flag shows all distinct node types with counts
  - Table output with TYPE and COUNT columns, ordered by count descending
  - JSON output with `nodeTypes` array containing `type` and `count` fields
  - Added `getNodeTypeCounts()` method to `KnowledgeGraphStore` interface
  - Extended `GraphQueryRequest` IPC protocol with `listTypes` mode
  - Usage hint when no nodes found: suggests `yams add <path>`
  - Location: `src/cli/commands/graph_command.cpp`, `include/yams/metadata/knowledge_graph_store.h`
- KnowledgeGraphStore query tests (yams-cqp): Unit tests for graph query methods
  - `findNodesByType` pagination tests: limit, offset, combined pagination, empty results
  - `findIsolatedNodes` tests: nodes with no incoming edges, different relation types
  - `getNodeTypeCounts` tests: type counts, ordering, empty graph
  - 4 test cases with 246 assertions
  - Location: `tests/unit/daemon/graph_component_catch2_test.cpp`
- P4 language support for symbol extraction: Network data plane language (P4_16)
  - Node types: `headerTypeDeclaration`, `structTypeDeclaration`, `controlDeclaration`, `parserDeclaration`, `actionDeclaration`, `tableDeclaration`
  - Query patterns for actions, functions, headers, structs, controls, parsers, tables, typedefs
  - Aliases: `p4`, `p4_16`, `p4lang`
  - Grammar auto-download from `prona-p4-learning-platform/tree-sitter-p4`
- Vector diagnostics in DaemonMetrics: Moved `collect_vector_diag` to background polling
  - Added `vectorEmbeddingsAvailable`, `vectorScoringEnabled`, `searchEngineBuildReason` to `MetricsSnapshot`
  - Status requests now read from cached snapshot (non-blocking)
  - Resolves status command hangs when vector services are slow
- Entity extraction metrics in status output: Added entity queue/inflight counters
  - New metrics: `entityQueued`, `entityDropped`, `entityConsumed`, `entityInFlight`
  - Exposed via `yams status` and `yams status -v` output
  - JSON output includes `entity_queued`, `entity_consumed`, `entity_dropped`, `entity_inflight`
  - Location: `include/yams/daemon/components/DaemonMetrics.h`, `src/cli/commands/status_command.cpp`
- Gitignore support for directory ingestion: Skip files matching `.gitignore` patterns
  - New `--no-gitignore` flag for `yams add` command to disable gitignore filtering
  - Default behavior respects `.gitignore` patterns in the root directory
  - Supports standard gitignore patterns: wildcards, directory patterns, anchored paths
  - Location: `src/cli/commands/add_command.cpp`, `src/app/services/indexing_service.cpp`
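As a rough illustration of the `--list-types` table described above (distinct node types with counts, ordered by count descending), the tallying step could be sketched as follows. `countNodeTypes` is a hypothetical helper written for this example; it is not the `getNodeTypeCounts()` implementation, which presumably runs against the knowledge graph store rather than an in-memory list. The alphabetical tie-break is also an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: tally node types and order the rows the way the
// --list-types table is described (highest count first).
std::vector<std::pair<std::string, std::size_t>>
countNodeTypes(const std::vector<std::string>& nodeTypes) {
    std::map<std::string, std::size_t> tally;
    for (const auto& type : nodeTypes) {
        ++tally[type];
    }

    std::vector<std::pair<std::string, std::size_t>> rows(tally.begin(), tally.end());
    // Sort by count descending. Ties are broken by type name (an assumption)
    // so that the output order is deterministic.
    std::sort(rows.begin(), rows.end(), [](const auto& a, const auto& b) {
        if (a.second != b.second) {
            return a.second > b.second;
        }
        return a.first < b.first;
    });
    return rows;
}
```

Each returned pair maps onto one TYPE/COUNT row of the table, or one `type`/`count` object in the JSON `nodeTypes` array.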
Changed
- Constexpr language configuration for symbol extraction: Centralized compile-time configuration
  - 17 languages with constexpr node types and query patterns: C, C++, Python, Rust, Go, Java, JavaScript, TypeScript, C#, PHP, Kotlin, Perl, R, SQL, Solidity, Dart, P4
  - `LanguageConfig` struct with `class_types`, `field_types`, `function_types`, `import_types`, `identifier_types`
  - Query patterns: `function_queries`, `class_queries`, `import_queries`, `call_queries`
  - Language alias support (e.g., “cpp” → “c++”, “cxx”, “cc”)
  - `getLanguageConfig()` constexpr lookup function
  - Location: `plugins/symbol_extractor_treesitter/symbol_extractor.cpp`
- Field extraction: New `extractFields()` method extracts class member variables
  - Uses node type traversal with language-specific field types
  - Creates `field` kind symbols with proper byte ranges
- Member containment relations: New `extractMemberRelations()` method
  - Creates `contains` edges from classes to their methods/fields
  - Uses byte range containment to determine class membership
  - Improves knowledge graph structure for code navigation
- PostIngestQueue per-stage metrics: Exposed extraction/KG/symbol stage inflight counts
  - New getters: `extractionInFlight()`, `kgInFlight()`, `symbolInFlight()`, `totalInFlight()`
  - Static constexpr limits: `maxExtractionConcurrent()`, `maxKgConcurrent()`, `maxSymbolConcurrent()`
  - Exposed via daemon status: `extraction_inflight`, `kg_inflight`, `symbol_inflight`
  - `yams status` shows POST line when there’s active work
  - `yams status -v` shows per-stage breakdown
  - `yams daemon status -d` shows full Post-Ingest Pipeline section
  - JSON output includes `stages` object with per-stage counts
  - Location: `include/yams/daemon/components/PostIngestQueue.h`, `src/cli/commands/status_command.cpp`, `src/cli/commands/daemon_command.cpp`
- PostIngestQueue dynamic concurrency scaling (PBI-05a): Auto-scale based on queue depth
  - New TuneAdvisor tunables: `postExtractionConcurrent()`, `postKgConcurrent()`, `postSymbolConcurrent()`, `postEntityConcurrent()`
  - Dynamic limits replace static constexpr values in PostIngestQueue pollers
  - TuningManager scales concurrency based on queue depth thresholds:
    - >1000 queued: extraction=hwThreads/2, kg=hwThreads/2
    - >500 queued: extraction=hwThreads/4, kg=32
    - >100 queued: extraction=hwThreads/8+4, kg=16
    - >10 queued: extraction=8
    - idle: extraction=4 (default)
  - Status output shows limits: `stages: extract=4/4, kg(q=0/i=0/8), symbol=0/4`
  - JSON includes `extraction_limit`, `kg_limit`, `symbol_limit`, `entity_limit`
  - Location: `include/yams/daemon/components/TuneAdvisor.h`, `src/daemon/components/TuningManager.cpp`, `src/daemon/components/DaemonMetrics.cpp`
- Knowledge Graph cleanup on document deletion: Deleting documents now cascades to KG
  - `deleteNodesForDocumentHash()`: Removes `doc:<hash>` nodes and symbol nodes with matching document_hash
  - Integrated into document deletion flow for automatic cleanup
  - Location: `include/yams/metadata/knowledge_graph_store.h`, `src/app/services/document_service.cpp`
- Stale edge cleanup on re-indexing: Symbol extraction now cleans up old relationships
  - `deleteEdgesForSourceFile()`: Removes edges where `properties.source_file` matches path
  - Called automatically before re-extraction to prevent stale relationship accumulation
  - Location: `src/daemon/components/EntityGraphService.cpp`
- Optimized isolated node query: `yams graph --isolated` now uses single SQL query
  - `findIsolatedNodes()`: Efficient `NOT EXISTS` subquery instead of N+1 pattern
  - New IPC fields: `isolatedMode`, `isolatedRelation` in `GraphQueryRequest`
  - Significant performance improvement for large graphs
  - Location: `src/cli/commands/graph_command.cpp`, `src/daemon/components/dispatcher/request_dispatcher_graph.cpp`
- Daemon log command: Added `yams daemon log`
- ExternalPluginHost: New plugin host for Python/process-based plugins (RFC-EPH-001)
  - Implements `IPluginHost` interface for external plugins running as separate processes
  - JSON-RPC 2.0 communication over stdio using existing `PluginProcess` and `JsonRpcClient`
  - Supported plugin types: Python (`.py`), Node.js (`.js`), any executable with JSON-RPC support
  - Process lifecycle management: spawn, monitor, health checks, graceful shutdown
  - Automatic crash recovery with configurable restart policy (max retries, backoff)
  - Trust-based security model with persistent trust file
  - RPC gateway for calling arbitrary plugin methods (`callRpc`)
  - Plugin statistics tracking (uptime, restart count, health status)
  - State change callbacks for monitoring plugin lifecycle events
  - Location: `include/yams/daemon/resource/external_plugin_host.h`, `src/daemon/resource/external_plugin_host.cpp`
- Auto-init mode: New `yams init --auto` flag for containerized/headless environments
  - Enables vector database with default model (`all-MiniLM-L6-v2`)
  - Enables plugins directory setup
  - Generates authentication keys
  - Skips S3 configuration (uses local storage)
  - Non-interactive: no prompts, uses sensible defaults
- Tree-sitter grammar download: `yams init` now offers to download tree-sitter grammars
  - Interactive menu: recommended (C, C++, Python, JS, TS, Rust, Go), all, or custom selection
  - Auto-downloads and builds grammars from official GitHub repos
  - Supports 14 languages: C, C++, Python, JavaScript, TypeScript, Rust, Go, Java, C#, PHP, Kotlin, Dart, SQL, Solidity
  - Cross-platform: MSVC, MinGW, GCC, Clang compilation support
  - Grammar prompt also available when YAMS is already initialized
  - Grammars installed to XDG_DATA_HOME/yams/grammars (Unix) or %LOCALAPPDATA%\yams\grammars (Windows)
- New embedding model option: Added `multi-qa-MiniLM-L6-cos-v1` as second model choice
  - Trained on 215M question-answer pairs for semantic search optimization
  - Same dimensions (384) as default model for compatibility
  - Replaces `all-mpnet-base-v2` (768 dim) in model selection
- Git-based version detection: Build system now auto-detects version from git tags
  - Uses most recent semver tag (`v*`) as effective version
  - Falls back to project version only if no tags exist
  - Command-line override (`-Dyams-version=X.Y.Z`) takes highest priority
- Commit hash in version output: `yams --version` now shows short commit hash
  - Format: `0.7.9 (commit: c16939f) built:2025-11-29T17:30:15Z`
  - Helps identify exact build for bug reports and debugging
- Init command tests: New test suite for init command model download functionality
  - Tests for valid HuggingFace URLs, model dimensions, naming conventions
  - CLI flag acceptance tests (`--auto`, `--non-interactive`, `--force`)
- Content-type-aware search profiles: New `CorpusProfile` enum and auto-detection
  - `CODE`: Boosts symbol/path search for source code repositories (60%+ code files)
  - `PROSE`: Boosts FTS5/vector search for text-heavy corpora (60%+ docs)
  - `DOCS`: Balanced weights for mixed code/documentation
  - `MIXED`: Default balanced weights for heterogeneous corpora
  - `SearchEngineConfig::detectProfile()`: Auto-detects from file extension distribution
  - `SearchEngineConfig::forProfile()`: Returns preset weights for a profile
- Session-isolated memory: Documents can now be isolated to working sessions
  - New CLI commands: `yams session create`, `open`, `close`, `status`, `merge`, `discard`
  - Documents added during an active session are tagged with `session_id` metadata
  - Session documents are invisible to global searches (use `--global` to bypass)
  - `merge`: Removes session tag to promote documents to global index
  - `discard`: Permanently deletes all session documents
  - Supports multiple concurrent sessions with automatic isolation
  - Database migration adds session tracking to metadata repository
- Windows Job Object for plugin processes: External plugin child process cleanup
  - Plugin processes are now assigned to Windows Job Objects
  - All child processes are automatically terminated when plugin unloads
  - Prevents orphaned processes from holding file locks (e.g., PID files)
  - Uses `JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE` for reliable cleanup
  - Location: `src/extraction/plugin_process.cpp`
- Plugin health command: New `yams plugin health [name]` subcommand for plugin diagnostics
  - Shows plugin status, interfaces, models loaded, and error state
  - Displays model provider FSM state (Idle, Loading, Ready, Degraded, Failed)
  - Lists all loaded models when provider is ready
  - JSON output support with `--json` flag
  - Location: `src/cli/commands/plugin_command.cpp`
- Plugin info improvements: Enhanced `yams plugin info` output
  - Now uses `StatusResponse.providers` for accurate plugin status
  - Shows plugin type (native/external), interfaces, and path
  - Properly handles both ABI and external plugin hosts
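The queue-depth thresholds listed under "PostIngestQueue dynamic concurrency scaling" can be read as a simple step function. The sketch below covers only the extraction limit, because the entry does not give KG values for every tier. `extractionLimitFor` is a hypothetical name, `hwThreads/8+4` is assumed to mean `(hwThreads / 8) + 4` with integer division, and the floor of 1 is an added assumption so small machines never get a zero limit. The real logic lives in TuningManager and may differ.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical mapping from post-ingest queue depth to the extraction
// concurrency limit, following the thresholds quoted in the changelog entry.
std::size_t extractionLimitFor(std::size_t queued, std::size_t hwThreads) {
    std::size_t limit = 4;  // idle default from the entry
    if (queued > 1000) {
        limit = hwThreads / 2;
    } else if (queued > 500) {
        limit = hwThreads / 4;
    } else if (queued > 100) {
        // Assumption: "hwThreads/8+4" is (hwThreads / 8) + 4.
        limit = hwThreads / 8 + 4;
    } else if (queued > 10) {
        limit = 8;
    }
    // Assumption: never return 0, even when hwThreads is very small.
    return std::max<std::size_t>(1, limit);
}
```

On a 16-thread machine this yields 8, 4, 6, 8, and 4 for the five tiers, which shows that the tiers as listed are not monotonic in queue depth.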
Changed
- Embedding model list: Both recommended models now have 384 dimensions
  - `all-MiniLM-L6-v2`: Lightweight general-purpose semantic search (default)
  - `multi-qa-MiniLM-L6-cos-v1`: Optimized for question-answer semantic search
- ServiceManager Decomposition: Extracted focused components from monolithic ServiceManager
  - New `ConfigResolver`: Static config/env resolution utilities (248 lines)
  - New `VectorSystemManager`: Vector DB and index lifecycle (397 lines)
  - New `DatabaseManager`: Metadata DB, connection pool, KG store lifecycle (254 lines)
  - New `PluginManager`: Plugin host, loader, and interface adoption (515 lines)
  - ServiceManager accessors now delegate to extracted managers
- Configurable Vector DB Capacity: Vector index `max_elements` now configurable
  - Environment variable: `YAMS_VECTOR_MAX_ELEMENTS`
  - Config file: `[vector_database] max_elements`
  - Default: 100,000 (range: 1,000 - 10,000,000)
- FTS5 index hygiene (migration v18): Removed unused `content_type` column from FTS5 index
  - `content_type` was indexed but never queried via FTS MATCH
  - Content type filtering uses JOIN on `documents.mime_type` instead
  - Reduces FTS5 index size and improves indexing performance
  - Automatic migration rebuilds index on first database open
- Daemon socket logging noise reduction: Request/mux/enqueue/drain logs now emit at debug level
  - Default info-level daemon logs no longer show per-request socket traffic
  - Enable debug logging to inspect connection-level request handling details
- SearchEngine Consolidation: Unified search architecture by removing legacy HybridSearchEngine
  - SearchEngine is now the sole search engine, consolidating multi-component search (FTS5, PathTree, Symbol, KG, Vector, Tag, Metadata)
  - Removed ~2000 lines of legacy code: `hybrid_search_engine.cpp`, `hybrid_search_factory.cpp`, and associated headers
  - Parallel Execution: SearchEngine now uses `std::async` to execute all 7 component queries simultaneously
    - Configurable via `SearchEngineConfig::enableParallelExecution` (default: true)
    - Per-component timeout via `SearchEngineConfig::componentTimeout` (default: 100ms)
    - Graceful degradation: timed-out components are skipped, others continue
  - Updated Interfaces: `AppContext.searchEngine` replaces `AppContext.hybridEngine` across CLI, daemon, and services
  - SearchEngineBuilder: Simplified to create `SearchEngine` directly (removed `MetadataKeywordAdapter` and KG scorer wiring)
  - Removed unused benchmark executables: `engine_comparison_bench`, `hybrid_search_bench`
  - Location: `src/search/`, `include/yams/search/`, `src/app/services/`, `src/cli/`
- HotzoneManager Persistence: Added save/load functionality for hotzone state
  - `HotzoneManager::save(path)`: Serializes hotzone entries to JSON with atomic write (temp + rename)
  - `HotzoneManager::load(path)`: Restores persisted hotzone state on startup
  - Stores version, half-life config, and timestamped entry scores
  - Location: `src/search/hotzone_manager.cpp`, `include/yams/search/hotzone_manager.h`
- CheckpointManager Component: New daemon component for periodic state persistence
  - Manages vector index and hotzone checkpoint scheduling
  - Configurable interval, threshold-based vector index saves, optional hotzone persistence
  - Async timer-based loop with graceful shutdown support
- Post-ingest pipeline parallelization: PostIngestQueue and EntityGraphService now use WorkCoordinator
  - Removed serial strand-based processing bottleneck in PostIngestQueue
  - EntityGraphService now posts extraction jobs to shared WorkCoordinator thread pool
  - Removed unused PoolManager “post_ingest” pool and associated TuningManager tuning logic
  - Documents process in parallel across all worker threads with work stealing
- Graph BFS traversal optimization: Reduced N+1 query patterns in graph traversal
  - New `getEdgesBidirectional()` API: returns incoming + outgoing edges in single query (UNION)
  - New `getNodesByIds()` API: batch node retrieval for hydration
  - Edge cache in BFS: edges fetched during neighbor collection reused for connecting edges
  - Reduces per-node queries from 4 (2×getEdgesFrom + 2×getEdgesTo) to 1
  - Location: `src/app/services/graph_query_service.cpp`, `src/metadata/knowledge_graph_store_sqlite.cpp`
- Graph command cleanup: Removed unused `--reverse` flag
  - Bidirectional traversal is now the default behavior
  - Flag was redundant since BFS optimization returns all connected edges
  - Location: `src/cli/commands/graph_command.cpp`
Fixed
- JavaScript/TypeScript symbol extraction: Audited and fixed against Tree-sitter grammars
  - JavaScript: Added `function_expression`, `generator_function`, `generator_function_declaration`, `namespace_import`, `export_statement`, `export_specifier`
  - TypeScript: Added `abstract_class_declaration`, `abstract_method_signature`, `function_expression`, `generator_function`, `import_alias`
  - Added queries for function expressions, generators, abstract methods, export statements
- Graph `--name` query now shows symbol relationships: Fixed `yams graph --name <file>` showing “Graph data unavailable”
  - Now resolves filename to file node key and uses KG query path
  - Shows connected symbols, includes, and document nodes
  - Falls back to document-based lookup if file node not found
  - Location: `src/cli/commands/graph_command.cpp`
- KG queue metric now shows pending count: Fixed `kg(q=N)` showing cumulative total instead of pending items
  - Now calculates: `pending = queued - consumed - inflight`
  - Affects `yams status -v` and `yams daemon status -d` displays
  - Location: `src/cli/commands/status_command.cpp`, `src/cli/commands/daemon_command.cpp`
- Symbol extraction extension mapping: Fixed extension lookup not matching due to leading dot mismatch
  - Database stores extensions with dots (`.cpp`), map keys without (`cpp`)
  - PostIngestQueue now strips leading dot before lookup
  - Location: `src/daemon/components/PostIngestQueue.cpp`
- Graph query bidirectional traversal: Fixed graph queries showing 0 connections for blob nodes
  - BFS traversal now follows both incoming and outgoing edges by default
  - Blob nodes (which only have incoming `has_version` edges from path nodes) now return connected nodes
  - Refactored dispatcher to delegate to GraphQueryService (single responsibility)
- Repair tracking (migration v21): Added repair status tracking to prevent duplicate work
  - New `repair_status` column (pending, processing, completed, failed, skipped)
  - `repair_attempted_at` timestamp and `repair_attempts` counter
  - RepairCoordinator filters by status to avoid re-queuing processed documents
- Plugin interface parsing: Fixed object-format interfaces not parsing correctly
- Plugin host sharing: Fixed model provider adoption failure after component extraction
- VectorIndexManager initialization: Fixed “VectorIndexManager not provided” search engine build failure
- Model download mapping: Added `multi-qa-MiniLM-L6-cos-v1` to HuggingFace repo mapping
- Version display: Fixed `yams --version` showing fallback values
- Socket crash on shutdown: Fixed `EXC_BAD_ACCESS` in kqueue_reactor during program exit
- Windows daemon status metrics: CPU and memory now report accurate values
- `--name` flag for `yams add`: Fixed custom document naming for single-file adds
- External plugin extractors: Fixed content extractors from external plugins not being used
- Trust file persistence: Fixed plugin trust file being deleted on daemon restart
- Trust file comment parsing: Fixed daemon crash when loading trust file with comments
- Plugin trust initialization order: Fixed plugins not loading despite being trusted
- Post-ingestion pipeline reliability: Improved async processing consistency
- Graph IPC serialization: Added missing ProtoBinding specializations for GraphQueryRequest/Response
- Status command document count: Fixed `yams status` showing `docs=0` after daemon restart
  - Short status now uses `documents_total` (from metadata DB, initialized on startup)
  - Previously used `storage_documents` (CAS object count, which was 0 on fresh start)
  - Detailed status was unaffected as it already used the correct field
  - Location: `src/cli/commands/status_command.cpp`
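Two of the fixes in this section reduce to small pure functions: the KG pending count (`pending = queued - consumed - inflight`) and stripping the leading dot from a stored extension before the map lookup. The helpers below are illustrative sketches with hypothetical names. Clamping the pending count at zero is an assumption added here to avoid unsigned underflow when counters are sampled at slightly different moments; the changelog does not say how the real code handles that case.

```cpp
#include <cstdint>
#include <string_view>

// pending = queued - consumed - inflight, as stated in the fix.
// Assumption: clamp at zero so racing counters never wrap around.
std::uint64_t pendingCount(std::uint64_t queued, std::uint64_t consumed,
                           std::uint64_t inflight) {
    const std::uint64_t accounted = consumed + inflight;
    return queued > accounted ? queued - accounted : 0;
}

// Normalize ".cpp" to "cpp" so it matches map keys stored without the dot.
std::string_view stripLeadingDot(std::string_view extension) {
    if (!extension.empty() && extension.front() == '.') {
        extension.remove_prefix(1);
    }
    return extension;
}
```

For example, `pendingCount(10, 4, 3)` gives 3, and `stripLeadingDot(".cpp")` gives `cpp` while leaving an already-bare `cpp` untouched.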
CLI Improvements
- PowerShell completion: Added `yams completion powershell` for PowerShell auto-complete
- Consistent `--json` output: Extended JSON output support across commands
- Actionable error hints: Centralized error hint system with pattern-based hints
- Daemon error messages: Enhanced daemon start/stop failure messages with recovery hints
Removed
- HybridSearchEngine: Legacy search engine removed in favor of unified SearchEngine
  - Deleted: `src/search/hybrid_search_engine.cpp` (~1844 lines)
  - Deleted: `src/search/hybrid_search_factory.cpp` (~168 lines)
  - Deleted: `include/yams/search/hybrid_search_engine.h`
  - Deleted: `include/yams/search/hybrid_search_factory.h`
- HybridSearchEngine Tests: Removed obsolete test files
  - `tests/unit/search/hybrid_search_engine_test.cpp`
  - `tests/unit/search/hybrid_grouping_smoke_test.cpp`
  - `tests/unit/search/learned_fusion_smoke_test.cpp`
  - `tests/unit/search/hierarchical_search_test.cpp`
  - `tests/unit/metadata/search_metadata_interface_test.cpp`
- Legacy Adapters: Removed `MetadataKeywordAdapter` (was bridge for HybridSearchEngine)
- CLI Adapter Rename: `HybridSearchResultAdapter` → `SearchResultItemAdapter` in result_renderer.h
[v0.7.8] - 2025-11-14
Added
- Thread Pool Consolidation
  - WorkCoordinator Component: New centralized thread pool manager with Boost.Asio io_context
    - Replaces 3 separate thread pools (IngestService, PostIngestQueue, EmbeddingService)
    - Provides strand allocation for per-service ordering guarantees
    - Hardware-aware thread count (8-32 threads based on CPU cores)
- Search Service Parallel Post-Processing
  - New `ParallelPostProcessor` class for concurrent search result processing
  - Parallelizes filtering, facet generation, and highlighting when result count ≥ 100
  - Uses `std::async` to run independent operations concurrently
  - Threshold-based activation (PARALLEL_THRESHOLD = 100) avoids overhead on small result sets
  - Performance Measured (100 iterations):
    - 100 results: 0.06ms (~1.66M ops/sec) - sequential path
    - 500 results: 0.23ms (~2.21M ops/sec) - parallel path
    - 1000 results: 0.43ms (~2.32M ops/sec) - parallel path
    - Speedup: ~3.4x faster at 1000 results vs linear scaling
  - Location: `include/yams/search/parallel_post_processor.hpp`, `src/search/parallel_post_processor.cpp`
  - Integration: `search_executor.cpp` now uses ParallelPostProcessor instead of sequential processing
  - Benchmarks: `tests/benchmarks/search_benchmarks.cpp`
Changed
- Search Service: `--fuzzy` searches now merge BM25 keyword matches with fuzzy results so enabling typo tolerance never suppresses literal hits. (`src/app/services/search_service.cpp`)
- Metadata Repository: Removed the default 50K fuzzy-index cap. The index now covers the full corpus by default and only enforces limits when `YAMS_FUZZY_INDEX_LIMIT` is set, adding a small safety buffer and explicit guard logging. (`src/metadata/metadata_repository.cpp`, `include/yams/metadata/fuzzy_index_builder.h`)
- Service Architecture Refactor
  - IngestService: Converted from manual thread pool to strand-based channel polling
    - Removed `kSyncThreshold` heuristics and `compat::jthread` pool
    - New `channelPoller()` awaitable for document processing
  - PostIngestQueue: Converted from worker threads to strand-based pipeline
    - Removed Worker struct, thread pool, and token bucket scheduler (~200 lines)
    - Implemented awaitable pipeline: `processMetadataStage → (processKnowledgeGraphStage || processEmbeddingStage)`
    - Parallel KG and Embedding stages using `make_parallel_group`
  - EmbeddingService: Converted from worker threads to strand-based channel polling
    - Removed worker thread pool (~70 lines)
    - New `channelPoller()` awaitable with async timer
  - TuningManager: Converted from manual thread to strand-based periodic execution
    - Removed `compat::jthread` with stop_token
    - New `tuningLoop()` awaitable with `boost::asio::steady_timer`
    - Uses WorkCoordinator strand for pool size adjustments
    - Maintains `TuneAdvisor::statusTickMs()` polling interval
  - DaemonMetrics: Converted from manual thread to strand-based polling loop
    - Removed `std::thread` for CPU/memory metrics collection
    - New `pollingLoop()` awaitable with 250ms timer interval
    - Uses WorkCoordinator strand for metric updates
    - Thread-safe snapshot access via `shared_mutex`
  - BackgroundTaskManager: Migrated from GlobalIOContext to WorkCoordinator
    - Removed fallback to GlobalIOContext (proper architectural separation)
    - Now uses WorkCoordinator executor for all background tasks
    - Integrated with unified work-stealing thread pool
    - Fts5Job consumer polling delay reduced: 200ms → 10ms (20x throughput improvement)
    - Fixed orphan scan queue overflow (was causing hundreds of dropped batches)
  - ServiceManager: Refactored async operations
    - Eliminated all 5 uses of `std::future`/`std::async`
    - Converted database operations to use `make_parallel_group` with timeouts
- SearchPool Removal
  - Deleted the unused `SearchPool` component and associated meson/build wiring
  - `ServiceManager` no longer constructs dead search infrastructure; `HybridSearchEngine` remains the sole search path
  - `TuneAdvisor`/`TuningManager` now derive concurrency targets directly from `SearchExecutor` load metrics instead of phantom pool sizes
- Ingestion Pipeline Cleanup
  - Removed `deferExtraction` Technical Debt: Eliminated bypass mechanism that skipped full production pipeline
    - Removed `deferExtraction` field from `StoreDocumentRequest` and `AddDirectoryRequest` structs
    - Removed conditional logic in DocumentService that skipped FTS5 extraction
    - All document ingestion now uses full pipeline: metadata storage → FTS5 extraction → PostIngestQueue → (KG extraction || Embedding generation)
    - Updated IngestService to always enqueue to PostIngestQueue (removed lines setting `deferExtraction=true`)
    - Updated CLI add_command fallback paths (3 locations) to use full pipeline
    - Updated mobile bindings to remove `sync_now`-based deferral
    - Removed `--defer-extraction` and `--no-defer-extraction` flags from ingestion_throughput_bench
    - Updated test helpers (tests/common/capability.h, integration test) to use full pipeline
- Grep Output Update
  - New default output format
  - Example output:
        === Results for "TaskManager" in 3 files (5 regex, 2 semantic) ===
        File: src/core/TaskManager.cpp (cpp)
        Matches: 3 (3 regex)
        Line 45: [Regex] class TaskManager {
        Line 102: [Regex] TaskManager::TaskManager() : initialized_(false) {
        Line 237: [Regex] void TaskManager::shutdown() {
        [Total: 7 matches across 3 files]
  - Location: `src/cli/commands/grep_command.cpp:531-645`
- Grep Service Optimizations
  - Literal Extraction from Regex Patterns
    - New `LiteralExtractor` utility extracts literal substrings from regex patterns
    - Enables two-phase matching: fast literal pre-filter → full regex only on candidates
    - Based on ripgrep’s literal extraction strategy
  - Boyer-Moore-Horspool (BMH) String Search
    - Replaces `std::string::find()` with BMH algorithm for patterns ≥ 3 characters
  - SIMD Vectorized Newline Scanning
    - Platform-specific implementations: AVX2 (32 bytes), SSE2 (16 bytes), NEON (16 bytes)
    - Scalar fallback using optimized memchr for portability
    - Replaces byte-by-byte scanning in line boundary detection
    - Performance: 4-8x speedup on large files
  - Parallel Candidate Filtering
    - Pre-filters unsuitable files before worker distribution using `std::async`
    - Integrates `magic_numbers.hpp` for accurate binary detection (86 compile-time patterns)
    - Filters build artifacts (.o, .class, .pyc), libraries (.a, .so, .dll), executables, packages
    - Chunk-based parallel processing for large candidate sets (>100 files)
    - Performance: 2-4x speedup on large corpora
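For readers unfamiliar with the Boyer-Moore-Horspool search mentioned under Grep Service Optimizations, here is a compact sketch of the textbook algorithm with the same "patterns of 3 or more characters" cutoff. `bmhFind` is a hypothetical name and this is not the YAMS implementation. It falls back to the standard library's `find` for shorter patterns, mirroring the threshold in the entry.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Returns the index of the first occurrence of needle in haystack,
// or std::string_view::npos if there is none.
std::size_t bmhFind(std::string_view haystack, std::string_view needle) {
    const std::size_t n = haystack.size();
    const std::size_t m = needle.size();

    // Short patterns gain little from a shift table; use the standard search.
    if (m < 3) {
        return haystack.find(needle);
    }
    if (m > n) {
        return std::string_view::npos;
    }

    // Bad-character table: how far the window may slide when its last byte
    // is c. Bytes absent from needle[0..m-2] allow a full shift of m.
    std::array<std::size_t, 256> shift;
    shift.fill(m);
    for (std::size_t i = 0; i + 1 < m; ++i) {
        shift[static_cast<unsigned char>(needle[i])] = m - 1 - i;
    }

    std::size_t pos = 0;
    while (pos + m <= n) {
        if (haystack.compare(pos, m, needle) == 0) {
            return pos;
        }
        // Slide by the shift for the byte under the window's last position.
        pos += shift[static_cast<unsigned char>(haystack[pos + m - 1])];
    }
    return std::string_view::npos;
}
```

The speedup comes from the skip table: when the byte under the end of the window does not occur in the pattern, the search jumps ahead by the full pattern length instead of one byte.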
Fixed
- Content-backed Fuzzy Hits: Content-derived fuzzy matches (`_content` entries) now map back to their owning documents, ensuring CLI searches show the expected files. (`src/metadata/metadata_repository.cpp`, `tests/unit/metadata/metadata_repository_test.cpp`)
- Cold Start Vector Index Loading: Fixed issue where search and grep commands returned no results after daemon cold start despite having indexed documents.
- Search Async Path: Fixed `SearchCommand::executeAsync()` not populating `pathPatterns` field in daemon request, causing server-side multi-pattern filtering to fail. The async code path (default execution) now correctly sends all include patterns to the daemon, matching the behavior of the sync path. (`src/cli/commands/search_command.cpp:1360-1365`)
- Database Schema Compatibility: Fixed “constraint failed” errors during document insertion on databases with migration v12 (pre-path-indexing schema). The `insertDocument()` function now conditionally builds INSERT statements based on the `hasPathIndexing_` flag, supporting both legacy (13-column) and modern (17-column with path indexing) schemas. This allows YAMS to work correctly regardless of whether migration v13 has been applied. (`src/metadata/metadata_repository.cpp:318-380`)
- MCP Protocol Version Negotiation: Fixed “Unsupported protocol version requested by client” error (code -32901) by making protocol version negotiation permissive by default (`strictProtocol_ = false`). The server now gracefully accepts any protocol version requested by clients, falling back to the latest supported version (2025-03-26) if the requested version is not in the supported list. Also added intermediate MCP protocol versions (`2024-12-05`, `2025-01-15`) to the supported list. This ensures maximum compatibility with MCP clients regardless of which spec version they implement. (`src/mcp/mcp_server.cpp:560,1254-1260`)
- MCP Large Response Buffering: Fixed “Error: MPC -32602: Error: End of file” errors when MCP server sends large responses (list, search, grep with many results). Implemented chunked buffered output in `StdioTransport::sendFramedSerialized()` that breaks payloads >512KB into 64KB chunks with explicit flushes between chunks. This prevents stdout buffer overflow and ensures reliable delivery of large JSON-RPC responses over stdio transport. Also added threshold-based routing in `MCPServer::sendResponse()` to use buffered sending for payloads >256KB. (`src/mcp/mcp_server.cpp:69-95,169-203`)
[v0.7.7] - 2025-11-07
Added
- Hierarchical Embedding Architecture & Two-Stage Hybrid Search
  - Data model extensions for hierarchical embeddings
    - Added `EmbeddingLevel` enum (CHUNK, DOCUMENT) to distinguish embedding granularity
    - Extended `VectorRecord` with `level`, `source_chunk_ids`, `parent_document_hash`, `child_document_hashes` fields
    - Modified `embed_and_insert_document` to generate document-level embeddings (normalized mean of chunk vectors)
    - Document-level embeddings stored alongside chunk-level for two-stage search readiness
    - Added `twoStageVectorSearch` method that retrieves broader candidate set and applies hierarchical boosting
    - Configuration fields: `enable_two_stage`, `doc_stage_limit`, `chunk_stage_limit`, `hierarchy_boost`
    - Groups results by document and boosts scores based on document-level similarity
    - Wired into both parallel and sequential search paths for transparent operation
- Profiling build support for performance analysis
  - New build type: `./setup.sh Profiling` enables instrumentation for Tracy, Valgrind, Perf
  - Builds to `build/profiling` directory with debug symbols + profiling hooks
  - Fuzzing build stub: `./setup.sh Fuzzing` reserved for future AFL++/libFuzzer integration
  - See `docs/developer/profiling.md` for comprehensive profiling guide
- EmbeddingService Architecture
  - Problem: PostIngestQueue workers were blocking on slow embedding generation, causing:
    - Documents not searchable until embeddings complete
    - Add commands hanging/timing out
    - Ingest pipeline stalled waiting for embedding models
  - Solution: Separated embedding generation into dedicated `EmbeddingService` that consumes from `InternalBus`
    - PostIngestQueue now 2-stage pipeline (Metadata + KnowledgeGraph) - embeddings removed
    - Documents searchable immediately after FTS5 indexing (~milliseconds)
    - Embeddings generated asynchronously in background by EmbeddingService workers
    - Better resource isolation: ingest and embedding workers independently tunable
    - No more blocking: add commands return immediately, documents queryable right away
- ServiceManager & Daemon Lifecycle Improvements
- Structured Concurrency: Replaced manual backpressure logic with
std::counting_semaphorefor natural bounded concurrency - SocketServer Improvements:
- Converted async_accept to
as_tuplepattern, eliminating exception overhead during shutdown - Connection future tracking for graceful shutdown with 2s timeout verification
- Converted async_accept to
- Modern Error Handling: Consistent use of
boost::asio::as_tuple(use_awaitable)for error codes instead of exceptions - Future Tracking: Replaced detached spawns with
use_futurefor verifiable connection lifecycle management - Doctor Prune Command: Intelligent cleanup of build artifacts, logs, cache, and temporary files
- Support for 9 build systems (CMake, Ninja, Meson, Make, Gradle, Maven, NPM/Yarn, Cargo, Go)
- Detection across 10+ programming languages (C/C++, Java, Python, JavaScript, Rust, Go, OCaml, Haskell, Erlang, etc.)
- Hierarchical category system: build-artifacts, build-system, logs, cache, temp, coverage, IDE
- Extended package manager support: Added 9 new categories for package dependencies and caches
- IDE-specific: `ide-vscode`, `ide-intellij`, `ide-eclipse` for workspace caches
- Dependencies: `package-node-modules` (npm/yarn/pnpm), `package-composer-vendor` (PHP), `package-cargo-target` (Rust)
- Caches: `package-python-cache` (`__pycache__/`), `package-maven-repo`, `package-gradle-cache`, `package-go-cache`, `package-gem-cache`, `package-nuget-cache`
- Composite groups: `package-deps`, `package-cache`, `packages` (all), `ide-all`
- Path-based detection for directories: `node_modules/`, `__pycache__/`, `.vscode/`, `target/`, `vendor/`, etc.
- Dry-run by default with `--apply` flag for execution
- Usage: `yams doctor prune --category build-artifacts --older-than 30d --apply`
- Usage: `yams doctor prune --category packages --apply` (clean all package artifacts)
- Started C++23 Compatibility support expansion
- Migrated vectordb to https://github.com/trvon/sqlite-vec-cpp
- Tree-sitter Symbol Extraction Plugin: Enhanced multi-language symbol extraction with Solidity support
- Solidity Support: Added complete Solidity language support with 4 query patterns (functions, constructors, modifiers, fallback/receive)
- Enhanced C++ Patterns: 16 function patterns + 6 class patterns including templates, constructors, destructors, operator overloads, method declarations inside class bodies
- Multi-Language Improvements: Enhanced patterns for Python (decorated functions), Rust (impl/trait methods), JavaScript/TypeScript (arrow functions, generators, async), Kotlin (property declarations) across all 15 supported languages
- Critical Bug Fix: Fixed query execution early-return bug that caused pattern short-circuiting - now executes all patterns resulting in 2.2x recall improvement (20.6% → 45.1%)
- Benchmark Infrastructure: Catch2-based benchmark suite with quality metrics (Recall/Precision/F1), performance metrics (Throughput/Latency), and JSON output for CI integration
- GTest Suite: 7 Solidity tests covering ERC20 tokens, inheritance, interfaces, events, and modifiers (372 lines, all passing)
- Plugin auto-downloads tree-sitter grammars on first use (configurable via `plugins.symbol_extraction.auto_download_grammars`)
- CLI commands: `yams config grammar list/download/path/auto-enable/auto-disable`
- Supports tree-sitter v13-15 grammar versions
- Entity Graph Service: Background service for extracting and materializing code symbols into Knowledge Graph
- Wired into IndexingPipeline and RepairCoordinator for automatic symbol extraction
- Supports plugin-based language-specific symbol extraction
- Foundation for symbol-aware search and code intelligence features
- Database Schema v16: Added `symbol_metadata` table for rich symbol information storage
- Stores symbol definitions, references, and metadata from code analysis plugins
- Indexed by document hash and symbol name for efficient lookups
- Integrated with Knowledge Graph for entity relationship tracking
- Migration includes tests for both schema changes and symbol metadata storage
- Symbol-Aware Search Infrastructure: Enhanced search with symbol/entity detection and enrichment
- `SymbolEnricher` class extracts rich metadata from Knowledge Graph (definitions, references, call graphs)
- Symbol context includes type, scope, caller/callee counts, and related symbols
- Hybrid Search Symbol Integration: Symbol metadata now actively boosts search ranking
- Added `symbol_weight` configuration field (default: 0.15 = 15% multiplicative boost)
- `HybridSearchEngine::setSymbolEnricher()` method wires SymbolEnricher into search pipeline
- Symbol matches receive score boost when `isSymbolQuery && symbolScore > 0.3`
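As a rough illustration of the symbol boost described in the Hybrid Search Symbol Integration entry, the sketch below applies `symbol_weight` as a multiplicative factor when the gating condition holds. The helper name `applySymbolBoost` and the formula `score * (1 + symbol_weight)` are assumptions made for illustration; the changelog only states the default weight and the `isSymbolQuery && symbolScore > 0.3` condition.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not the HybridSearchEngine API): boosts a fused score
// when the query looks like a symbol lookup and the symbol match is strong.
// The exact formula is an assumption based on "15% multiplicative boost".
double applySymbolBoost(double score, bool isSymbolQuery, double symbolScore,
                        double symbolWeight = 0.15) {
    if (isSymbolQuery && symbolScore > 0.3) {
        return score * (1.0 + symbolWeight);  // 0.15 means a 15% increase
    }
    return score;  // weak or non-symbol matches are left untouched
}
```

Under these assumptions a result scored 0.8 would rise to roughly 0.92 for a strong symbol match and stay at 0.8 otherwise.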
Fixed¶
- Grep Command Duplicate Output: Fixed `yams grep` printing results twice when stderr is redirected
- Migration System Crash (macOS): Fixed SIGSEGV crash in `MigrationManager::recordMigration()` during daemon startup
- Root Cause: `ServiceManager::co_migrateDatabase()` called `mm.initialize()` but ignored its return value. If initialization failed to create the `migration_history` table, migrations would continue and crash when attempting to INSERT into the non-existent table.
- Fix: Added error checking for `mm.initialize()` with early return and proper error logging in `ServiceManager.cpp`
- Embedding System Architecture Simplification: Simplified FSM readiness logic to check provider availability directly instead of waiting for model load events
- IModelProvider checks `isAvailable()` immediately after plugin adoption
- Eliminates unnecessary ModelLoading state transition
- Fixes “Embedding Ready: Waiting” status showing incorrectly when embeddings were actually available
- Model dimension retrieved via `getEmbeddingDim()` at adoption time
- Database Schema Recovery: Manual creation of missing `kg_doc_entities` table from migration 7
- Table includes 8 columns with foreign keys to documents and kg_nodes
- Created indexes: `idx_kg_doc_entities_document`, `idx_kg_doc_entities_node`
- Fixes search query errors: “no such table: kg_doc_entities”
- Worker Thread Premature Exit: Fixed io_context workers exiting immediately on startup by adding `executor_work_guard` to keep the context alive until explicit shutdown.
- SocketServer Backpressure: Replaced manual backpressure polling with `std::counting_semaphore`, eliminating 5-20ms delay loops and providing natural bounded concurrency.
- Embedding Consumer Deadlock: Fixed race condition causing embedding job consumer to stall
- Added defensive retry mechanism with exponential backoff for queue state recovery
- Impact: Embedding background processing now reliable under high load
- FSM Cleanup & Degradation Tracking: Standardized FSM usage across ServiceManager
- Added `DaemonLifecycleFsm` reference to ServiceManager for centralized subsystem degradation tracking
- ServiceManager FSM Architecture: Centralized state management and eliminated duplication
- Added `DaemonLifecycleFsm& lifecycleFsm_` reference to ServiceManager for daemon-level degradation tracking
- Removed scattered manual FSM state checks in favor of FSM query methods (`isReady()`, `isLoadingOrReady()`)
- Text Extraction for Source Code: Fixed critical issue where JavaScript/TypeScript/Solidity/config files failed FTS5 extraction
- `src/extraction/extraction_util.cpp`: Replaced hardcoded `is_text_like()` with `FileTypeDetector::isTextMimeType()` which uses comprehensive `magic_numbers.hpp` database; added extension normalization (handles both `.js` and `js` formats)
- `src/extraction/plain_text_extractor.cpp`:
- Removed hardcoded 50+ extension list, delegating to `FileTypeDetector` for dynamic detection
- Enhanced `isBinaryFile()` with UTF-8 BOM support and reduced false positives
- Added `isParseableText()` with proper UTF-8 validation (validates multi-byte sequences, continuation bytes)
- Baseline registration now includes common config/markup extensions (`.toml`, `.ini`, `.yml`, `.md`, `.rst`)
- `src/app/services/search_service.cpp`: Updated lightweight indexing to use `FileTypeDetector::isTextMimeType()`
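The text-extraction fix mentions UTF-8 validation of multi-byte sequences and continuation bytes. A minimal sketch of such a check follows; `isValidUtf8` is a hypothetical stand-in and is not claimed to match the project's `isParseableText()`, which may apply additional heuristics.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical validator: accepts only well-formed UTF-8. Rejects stray
// continuation bytes, truncated sequences, overlong encodings, surrogates,
// and code points above U+10FFFF.
bool isValidUtf8(std::string_view text) {
    // Smallest code point that legitimately needs a sequence of each length.
    static constexpr std::uint32_t kMinCodePoint[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < text.size()) {
        const auto lead = static_cast<unsigned char>(text[i]);
        if (lead < 0x80) {  // single-byte ASCII
            ++i;
            continue;
        }
        std::size_t len = 0;
        std::uint32_t cp = 0;
        if ((lead & 0xE0) == 0xC0) {
            len = 2;
            cp = lead & 0x1Fu;
        } else if ((lead & 0xF0) == 0xE0) {
            len = 3;
            cp = lead & 0x0Fu;
        } else if ((lead & 0xF8) == 0xF0) {
            len = 4;
            cp = lead & 0x07u;
        } else {
            return false;  // continuation byte in lead position, or invalid lead
        }
        if (i + len > text.size()) return false;  // sequence cut off at end of input
        for (std::size_t k = 1; k < len; ++k) {
            const auto next = static_cast<unsigned char>(text[i + k]);
            if ((next & 0xC0) != 0x80) return false;  // must look like 10xxxxxx
            cp = (cp << 6) | (next & 0x3Fu);
        }
        const bool overlong = cp < kMinCodePoint[len];
        const bool surrogate = cp >= 0xD800 && cp <= 0xDFFF;
        if (overlong || surrogate || cp > 0x10FFFF) return false;
        i += len;
    }
    return true;
}
```

A caller could strip a leading UTF-8 BOM (`EF BB BF`) before validation; the BOM itself is a valid three-byte sequence, so this sketch accepts it either way.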
Changed¶
- Embedding Provider Lifecycle: Transitioned from event-driven model loading to direct availability checking
- Provider adoption now immediately dispatches `ModelLoadedEvent` if `isAvailable()` returns true
- Simplified from 4-state FSM (Unavailable → ProviderAdopted → ModelLoading → ModelReady) to immediate ready transition
- Aligns FSM with IModelProvider on-demand model loading architecture
Removed¶
- Fuzzy Index Memory Optimization: Enhanced BK-tree index building with intelligent document prioritization
- Uses metadata and Knowledge Graph to rank documents by relevance (tagged > KG-connected > recent > code files)
- Limits index to 50,000 documents by default (configurable via `YAMS_FUZZY_INDEX_LIMIT` environment variable)
- Graceful degradation with `std::bad_alloc` handling prevents daemon crashes on large repositories
- Known Limitation: Fuzzy search on very large repositories (>100k documents) may experience memory pressure. Consider using metadata/KG filters or grep with exact patterns for better performance
- ONNX Plugin Model Path Resolution: Enhanced model path search to support XDG Base Directory specification
- Platform-Aware Plugin Installation: Build system now auto-detects Homebrew prefix on macOS
- `/opt/homebrew` on Apple Silicon, `/usr/local` on Intel Macs and Linux
- System plugin directory automatically trusted by daemon at runtime
- Override via `YAMS_INSTALL_PREFIX` environment variable
- Model loading timeouts hardened: adapter and ONNX plugin now use `std::async` with bounded wait; removed detached threads causing UAF/segfaults (AsioConnectionPool guarded)
- Vector DB dim resolution no longer hardcodes 384; resolves from DB/config/env/provider preferred model, else warns and defers embeddings
- ONNX plugin: removed implicit 384 defaults, derives embeddingDim dynamically from model/config; added env override YAMS_ONNX_PRECREATE_RESOURCES
- Improved load diagnostics: detailed logs for ABI table pointers, phases, and timeout causes
- Search Service Path Heuristic: Tightened path-first detection to only trigger for single-token or quoted path-like queries (slashes, wildcards, or extensions). Multi-word queries now proceed to hybrid/metadata search, restoring results for phrases such as `"docs/delivery backlog prd tasks PBI"` while preserving fast path lookups for actual paths.
- Daemon stop reliability: `yams daemon stop` now only reports success after the process actually exits and will fall back to PID-based termination (and orphan cleanup) when the socket path is unresponsive.
- Prompt termination on signals: the daemon now handles SIGTERM/SIGINT to exit promptly when graceful shutdown isn’t possible, addressing lingering yams-daemon processes after stop.
- Hybrid Search Simplification: Removed complexity and environment variable overrides
- Removed 6 environment variables: `YAMS_DISABLE_KEYWORD`, `YAMS_DISABLE_ONNX`, `YAMS_DISABLE_KG`, `YAMS_ADAPTIVE_TUNING`, `YAMS_FUSION_WEIGHTS`, `YAMS_OVERRELIANCE_PENALTY`
- Kept `YAMS_DISABLE_VECTOR` for CI compatibility
- Removed adaptive weight tuning logic (~30 LOC)
- Removed over-reliance penalty mechanism
- Keyword search now always executes (controlled by `config.keyword_weight`)
- Fixed fusion weights for `LEARNED_FUSION` strategy: `{-2.0f, 3.0f, 2.0f, 1.5f, 1.0f}`
Removed¶
- Removed WASM and the legacy plugin system from the codebase and ServiceManager
[v0.7.6] - 2025-10-13¶
Added¶
- CLI Pattern Ergonomics: Added `--pattern`/`-p` flag to `list` command as an alias for `--name`, improving consistency with other commands. The flag supports glob wildcards (`*`, `?`, `**`) and auto-normalizes relative paths to absolute when no wildcards are present. (`src/cli/commands/list_command.cpp`)
- Grep Literal Text Hints: Added smart error detection and helpful hints when grep patterns contain regex special characters. When a pattern fails regex compilation or returns no results, grep now suggests using the `-F` flag with the exact command to run. Added `-Q` as a short alias for `-F`/`--fixed-strings`/`--literal-text` to match git grep convention. (`src/cli/commands/grep_command.cpp`)
- Search Literal Text Aliases: Added `-F`/`-Q`/`--fixed-strings` aliases to `search` command for consistency with `grep`. These short flags make it easier to search for literal text containing special characters like `()[]{}.*+?`. Updated help text with concrete examples. (`src/cli/commands/search_command.cpp`)
- Grep: enabling `[search.path_tree]` now lets explicit path filters reuse the metadata-backed path-tree engine, and tag-only invocations default the pattern to `.*`, removing the need for placeholder expressions. (`src/app/services/grep_service.cpp:240`, `src/cli/commands/grep_command.cpp:305`)
- Tree-based List with Filters: Extended tree-based path queries to support tag, MIME type, and extension filtering. The `list` command now uses the tree index even when filters are applied, improving performance for pattern+filter queries (e.g., `yams list --name "docs/**" --tags "test"`).
- Benchmark: Tree List Filters: New benchmark suite (`tree_list_filter_bench`) measures query performance with various filter combinations. Results show 100-160μs query times with up to 10k queries/sec throughput. Filter queries often outperform path-only queries due to reduced result set sizes.
- Grep SQL-Level Pattern Filtering: Added `queryDocumentsByGlobPatterns()` function that converts glob patterns (e.g., `tests/**/*.cpp`) to SQL LIKE patterns and queries the database directly, eliminating the need to load all documents into memory before filtering. Grep performance with `--include` patterns improved dramatically on large repositories.
- Search Multi-Pattern Support: Added `pathPatterns` vector field to `SearchRequest` IPC protocol (field 34) enabling server-side filtering of multiple include patterns. Search command now sends all patterns to daemon instead of filtering results client-side, eliminating timeouts and OOM errors in sandboxed environments.
- MCP Search Multi-Pattern Support: Added `include_patterns` array parameter to MCP search tool, enabling clients to specify multiple path patterns with OR logic. The MCP server now populates `pathPatterns` in daemon requests, matching CLI behavior. (`include/yams/mcp/tool_registry.h`, `src/mcp/mcp_server.cpp`)
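The glob-to-LIKE conversion mentioned for `queryDocumentsByGlobPatterns()` can be sketched roughly as follows. This is an illustrative reimplementation, not the project's code: the function name `globToLike` is hypothetical, and escaping literal `%`, `_`, and `\` with a backslash is an assumption that would require `ESCAPE '\'` in the SQL statement.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical converter from a glob pattern to a SQL LIKE pattern.
// '*' and '**' become '%', '?' becomes '_', and a "**/" directory wildcard
// followed by '*' collapses into a single '%'.
std::string globToLike(std::string_view glob) {
    std::string out;
    std::size_t lastWild = std::string::npos;  // index in `out` of the latest '%' wildcard
    for (const char c : glob) {
        if (c == '*') {
            const bool afterWild = lastWild != std::string::npos && lastWild + 1 == out.size();
            if (afterWild) continue;  // "**" adds nothing beyond the first '%'
            const bool afterWildSlash = lastWild != std::string::npos &&
                                        lastWild + 2 == out.size() && out.back() == '/';
            if (afterWildSlash) {
                out.pop_back();  // "%/" followed by '*' collapses to "%"
                continue;
            }
            lastWild = out.size();
            out.push_back('%');
        } else if (c == '?') {
            out.push_back('_');
        } else {
            // Escape characters that LIKE would otherwise treat as wildcards.
            if (c == '%' || c == '_' || c == '\\') out.push_back('\\');
            out.push_back(c);
        }
    }
    return out;
}
```

With this sketch, `tests/**/*.cpp` becomes `tests/%.cpp`. Because `%` also matches `/`, a LIKE pattern can match more paths than the original glob, so a caller would likely still re-check candidates against the glob after the SQL query narrows the set.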
Changed¶
- MCP stdio transport: stdout buffering now adapts to interactive vs non-interactive streams, stderr is forced unbuffered, and JSON-RPC batch arrays over stdio are parsed in-line to match the Model Context Protocol 2025-03-26 transport requirements. Additional unit coverage exercises batch handling and error budgets for framed headers.
- MCP Server: `cat` and `get` tools now resolve relative paths using `weakly_canonical`, improving document lookup for non-absolute paths.
- Path Canonicalization: Document paths are now canonicalized using `weakly_canonical()` at ingestion time to ensure consistent path matching across symlinked directories (e.g., `/var` → `/private/var` on macOS). This fixes pattern-based queries that previously failed due to path mismatch between indexed and query paths. (`src/metadata/path_utils.cpp`)
- Integration Test Stability: Improved `TreeBasedListE2E` test reliability by replacing fixed sleep with polling-based wait for document indexing completion. Test pass rate improved from ~60% to 100%.
- Grep Performance: Grep service now uses SQL-level pattern filtering when `--include` patterns are provided, fetching only matching documents from the database instead of loading all documents and filtering in memory. Converts glob patterns to SQL LIKE patterns (e.g., `*.cpp` → `%.cpp`, `tests/**/*.h` → `tests/%.h`). This eliminates hangs on large repositories (10K+ documents).
- Search Service: Updated to handle multiple path patterns via `pathPatterns` vector field, iterating through all patterns with OR logic for server-side filtering. Removed client-side filtering that previously caused timeouts with multiple `--include` patterns.
- Build System: Fixed VS Code task definitions with correct Conan 2.x output paths. Meson native file paths updated from `builddir/conan_meson_native.ini` to `builddir/build-debug/conan/conan_meson_native.ini` and `build/release/build-release/conan/conan_meson_native.ini` to match actual Conan 2 directory structure. (`.vscode/tasks.json`)
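For the path canonicalization entry above, a minimal sketch of applying `std::filesystem::weakly_canonical` consistently to both indexed and queried paths might look like this. The helper name `canonicalizeForIndex` and its fallback behavior are assumptions, not the contents of `src/metadata/path_utils.cpp`.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: produces one canonical spelling of a path so that the
// same file is stored and queried under the same key. Symlinks are resolved
// for the part of the path that exists; the remainder is normalized lexically.
std::string canonicalizeForIndex(const fs::path& input) {
    std::error_code ec;
    const fs::path absolute = fs::absolute(input, ec);
    if (ec) return input.lexically_normal().generic_string();
    const fs::path canonical = fs::weakly_canonical(absolute, ec);
    if (ec) return absolute.lexically_normal().generic_string();  // assumed fallback
    return canonical.generic_string();
}
```

Unlike `std::filesystem::canonical`, `weakly_canonical` does not require the full path to exist, which suits lookups for documents whose files have since moved or been deleted.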
Fixed¶
- MCP Search/Grep Hang: Fixed MCP server’s `search` and `grep` tools hanging indefinitely by ensuring all async components use the same io_context. The root cause was a multi-layered async execution mismatch: (1) The DaemonClient’s config didn’t specify an executor, causing it to create its own internal io_context. (2) `callTool()` was spawning work on GlobalIOContext and blocking a worker thread waiting for results. (3) The two separate io_contexts couldn’t communicate, causing deadlock. Fixed by: (a) Configuring daemon_client to use GlobalIOContext executor in MCPServer constructor. (b) Removing the nested local io_context from `callTool()`, which now correctly spawns on GlobalIOContext (which has background worker threads) and waits for results. (c) Removing nested io_context from `handleSearchDocuments()` to use `co_await` directly. (`src/mcp/mcp_server.cpp:403,2029-2039,2225-2226`)
- Document Service: Improved `resolveNameToHash` to correctly handle filename-only lookups by searching for paths ending with the given name, ensuring that commands like `cat` with a simple filename succeed.
- Tree Query Pattern Matching: Fixed wildcard pattern parsing to correctly handle recursive patterns (`/**`) by stripping all trailing wildcards iteratively instead of checking for a single wildcard character. (`src/app/services/document_service.cpp`)
- Grep Hang: Fixed grep command hanging indefinitely when using `--include` patterns on large repositories. The service was fetching all documents before filtering; now uses SQL-level pattern matching to fetch only relevant documents.
- Search Timeout: Fixed search command timeouts/OOM when using multiple `--include` patterns. Previously only the first pattern was sent to daemon with remaining patterns filtered client-side after retrieving ALL results. Now all patterns are sent to daemon for server-side filtering.
- Search Pattern Matching: Fixed glob pattern normalization to correctly match filename patterns like `*.md` and `*.cpp` anywhere in the path tree. Patterns starting with a single `*` (e.g., `*.ext`) are now automatically prefixed with `**/` to match paths at any depth. This ensures patterns like
[v0.7.4] - 2025-10-10¶
Changed¶
- MetadataRepository: Added atomic counters (`cachedDocumentCount_`, `cachedIndexedCount_`, `cachedExtractedCount_`) updated on every insert/delete/update operation. Eliminated 3 `COUNT(*)` queries from hot path (220-400ms → <1μs)
- VectorDatabase: Added `cachedVectorCount_` atomic counter updated on insert/delete operations. Eliminated `COUNT(*)` query from `getVectorCount()`
- ServiceManager Concurrency: Converted `searchEngineMutex_` from `std::mutex` to `std::shared_mutex` enabling N concurrent readers with single exclusive writer. Allows parallel status requests without serialization bottleneck
- Status Request Optimization: Removed blocking VectorDatabase initialization from hot path. Status handler now reports readiness accurately without attempting to “fix” uninitialized state, eliminating 1-5s blocking operations
- Performance: Sequential request throughput improved to ~1960 req/s with sub-millisecond latency (avg: 0.02ms, max: 1ms). First connection latency: 2ms. Daemon readiness validation added to prevent test methodology races with initialization
- Document Retrieval Optimization: Replaced O(n) full table scans with O(log n) indexed lookups in `cat`/`get` operations. Changed from `queryDocumentsByPattern('%')` to `getDocumentByHash(hash)`, eliminating 120K+ document scans per retrieval (lines 850, 934 in document_service.cpp)
- Name Resolution Fix: Fixed pattern generation for basename-only queries. Now generates `'%/basename'` pattern FIRST to use `containsFragment` query instead of failing `exactPath` (path_hash) match. `yams get --name` and `yams cat <name>` now work correctly
- Grep FTS-First: Optimized grep to START with FTS5 index search for literal patterns before falling back to full document scan. Regex patterns still use full scan. Significantly improves grep performance on large repositories
- ONNX Plugin: Upgraded the ONNX plugin to conform to the modern `model_provider_v1` (v1.2) interface specification.
- Enhanced `ui_helpers.hpp` with 30+ new utilities: value formatters (`format_bytes`, `format_number`, `format_duration`, `format_percentage`), status indicators (`status_ok`, `status_warning`, `status_error`), table rendering (`Table`, `render_table`), progress bars, text utilities (word wrap, centering, indentation)
- Improved `yams status` with color-coded severity indicators, human-readable formatting, and sectioned layout
- Enhanced `yams daemon status` with humanized counter names (CAS, IPC, EMA, DB acronyms preserved), smart byte/number formatting
- Added `yams daemon status -d` detailed view with storage overhead breakdown showing disk usage by component (CAS blocks, ref counter DB, metadata DB, vector DB, vector index) with overhead percentage relative to content
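The cached-counter and `std::shared_mutex` changes in this list follow a common pattern, sketched below with a toy in-memory store. `DocumentStore` and its members are hypothetical names for illustration; the real `MetadataRepository` and `ServiceManager` operate on a database and a search engine rather than a hash set.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_set>

// Toy store: writers take an exclusive lock, readers share the lock, and the
// document count is served from an atomic counter instead of a full scan.
class DocumentStore {
public:
    void insert(const std::string& hash) {
        std::unique_lock lock(mutex_);  // single exclusive writer
        if (docs_.insert(hash).second) {
            cachedCount_.fetch_add(1, std::memory_order_relaxed);
        }
    }

    void remove(const std::string& hash) {
        std::unique_lock lock(mutex_);
        if (docs_.erase(hash) > 0) {
            cachedCount_.fetch_sub(1, std::memory_order_relaxed);
        }
    }

    bool contains(const std::string& hash) const {
        std::shared_lock lock(mutex_);  // many readers may hold this at once
        return docs_.count(hash) > 0;
    }

    // Hot-path status query: no lock and no scan, analogous to avoiding COUNT(*).
    std::uint64_t cachedCount() const noexcept {
        return cachedCount_.load(std::memory_order_relaxed);
    }

private:
    mutable std::shared_mutex mutex_;
    std::unordered_set<std::string> docs_;
    std::atomic<std::uint64_t> cachedCount_{0};
};
```

The counter is only adjusted when the underlying set actually changes, so duplicate inserts and removals of missing entries leave it consistent with the stored data.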
Deprecated¶
- MCP `get_by_name` tool: Use `get` tool with `name` parameter instead. The `get` tool now smartly handles both hash and name lookups with optimized pattern matching
- Streaming Protocol Bug: Fixed critical bug where `GetResponse`/`CatResponse` sent header-only frame (empty content) followed by data frame, causing CLI to process first frame and fail. Added `force_unary_response` check in request_handler.cpp to disable streaming for these response types, forcing single complete frame transmission
- Protobuf Schema: Added missing `bool has_content = 6` field to `GetResponse` message in ipc_envelope.proto. Updated serialization to explicitly set/read flag instead of recalculating, preventing desync between daemon and CLI
- Daemon: Fixed a regression in the plugin loader that prevented legacy model provider plugins (like the ONNX provider) from being correctly detected and adopted. The loader now includes a fallback to detect and register providers using the legacy `getProviderName`/`createProvider` symbols, restoring embedding generation functionality.
- Grep Service: Fixed critical bug where `--paths-only` mode returned all candidate documents without checking pattern matches, causing incorrect “(no results)” responses. Removed premature fast-exit optimization; grep now properly runs pattern matching and returns only files that match. (Issue: 135K docs indexed but grep returned empty, audit revealed fast-exit bypassed validation)
- Grep CLI: Fixed session pattern handling bug where session include patterns were incorrectly used as document selectors instead of result filters. Session patterns now properly merged into `includePatterns` for filtering, not `paths` for selection. This prevented grep from finding any results when a session was active.
[v0.7.3] - 2025-10-08¶
Added¶
- Bench: minimal daemon warm-start latency check moved to an opt-in bench target and suite.
- New standalone binary `tests/yams_bench_daemon_warm` executes a bounded start/sleep/stop cycle with vectors disabled and tight init timeouts; asserts <5s end-to-end.
- Meson test registered as `bench_daemon_warm_latency` in the `yams:bench` suite.
- Disabled by default in CI; enable by setting `RUN_DAEMON_WARM_BENCH=true` (workflow env) and `YAMS_ENABLE_DAEMON_BENCH=1` (step env) to run only this bench.
- Tree-Diff Metadata & Retrieval Modernization 🎉
- Tree-based snapshot comparison: Implemented Merkle tree-based diff algorithm for efficient snapshot comparison with O(log n) subtree hash optimization for unchanged directories.
- Rename detection: Hash-based rename/move detection with ≥99% accuracy, enabled by default in `yams diff` command.
- Knowledge Graph integration: Path and blob nodes with version edges and rename tracking via `fetchPathHistory()` API.
- Enhanced graph command: `yams graph` now queries KG store for same-content relationships and rename chains.
- Tree diff as default: `yams diff` uses tree-based comparison by default; `--flat-diff` flag available for legacy behavior.
- RPC/IPC exposure: Added `ListTreeDiff` method to daemon protocol (protobuf + binary serialization).
Changed¶
- Daemon Async Architecture: Unified on modern Boost.Asio 1.82+ patterns with C++20 coroutines (`asio::awaitable`)
- Single io_context with work guard for all async operations
- Strands for logical separation (init, plugin, model domains)
- RAII cleanup guards for automatic resource management
- Error codes via `as_tuple` instead of exceptions for hot paths
- Semaphore-based bounded concurrency instead of manual atomic flags
- Compression-first retrieval: DocumentService, CLI, and daemon IPC now default to returning compressed payloads with full metadata (algorithm, CRC32s, sizes)
- Path query pipeline: Replaced the legacy `findDocumentsByPath` helper with the normalized `queryDocuments` API and the shared `queryDocumentsByPattern` utility. All services (daemon, CLI, MCP, mobile bindings, repair tooling, vector ingestion) now issue structured queries that leverage the `path_prefix`, `reverse_path`, and `path_hash` indexes plus FTS5 for suffix matches, eliminating full-table LIKE scans.
- Schema migration: Migration v13 (`Add path indexing schema`) continues to govern the derived columns/indices; applying this release replays the up hook in place (normalizing existing rows and rebuilding the FTS table), so existing deployments automatically benefit from the optimized lookups after the usual migration step.
- CLI Retrieval (get/cat): partial-hash resolution now routes through `RetrievalService` using the daemon’s streaming search and the metadata-layer hash-prefix index.
- `yams get` and `yams cat` accept 6–64 hex prefixes; ambiguity can be resolved via `--latest`/`--oldest`. No more local metadata table scans; latency improves especially on large catalogs.
- Internals: `RetrievalService::resolveHashPrefix` consumes `SearchService` hash results and applies newest/oldest selection hints; `GetCommand` validates and normalizes hash input before issuing a daemon `Get`.
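The partial-hash behavior described above (6–64 hex characters, with newest/oldest hints to break ties) can be sketched as below. All names here (`Candidate`, `Pick`, `normalizeHashPrefix`, and this free-function `resolveHashPrefix`) are illustrative assumptions; the actual `RetrievalService::resolveHashPrefix` consumes daemon search results rather than an in-memory vector, and treating an ambiguous prefix without a hint as "no result" is also an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

struct Candidate {
    std::string hash;        // full lowercase hex hash
    std::int64_t indexedAt;  // hypothetical timestamp used for newest/oldest
};

enum class Pick { None, Latest, Oldest };

// Validates length (6 to 64) and hex-only content, then lowercases the prefix.
std::optional<std::string> normalizeHashPrefix(std::string_view input) {
    if (input.size() < 6 || input.size() > 64) return std::nullopt;
    std::string out;
    out.reserve(input.size());
    for (const unsigned char c : input) {
        if (!std::isxdigit(c)) return std::nullopt;
        out.push_back(static_cast<char>(std::tolower(c)));
    }
    return out;
}

// Returns the unique match, or the newest/oldest match when a hint is given.
std::optional<std::string> resolveHashPrefix(const std::vector<Candidate>& docs,
                                             std::string_view prefix, Pick pick) {
    const auto normalized = normalizeHashPrefix(prefix);
    if (!normalized) return std::nullopt;
    const Candidate* best = nullptr;
    std::size_t matches = 0;
    for (const auto& doc : docs) {
        if (doc.hash.compare(0, normalized->size(), *normalized) != 0) continue;
        ++matches;
        const bool better = best == nullptr ||
                            (pick == Pick::Latest && doc.indexedAt > best->indexedAt) ||
                            (pick == Pick::Oldest && doc.indexedAt < best->indexedAt);
        if (better) best = &doc;
    }
    if (matches == 0) return std::nullopt;
    if (matches > 1 && pick == Pick::None) return std::nullopt;  // ambiguous prefix
    return best->hash;
}
```

In a real catalog the linear scan would be replaced by the hash-prefix index the entry mentions; the sketch only shows the validation and tie-breaking logic.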
Fixed¶
- Daemon IPC: Fixed a regression in the `grep` IPC protocol where `GrepRequest` and `GrepResponse` messages were not fully serialized, causing data loss. The protocol definitions and serializers have been updated to correctly handle all fields, including `show_diff` in requests and detailed statistics in responses.
- Indexing: Fixed an issue where updated files were not being re-indexed. The change detection logic now correctly considers file modification time and size, in addition to content hash, to reliably identify changes.
- Indexing: Corrected the document update process to prevent duplicate records for the same file path when a file is updated. The indexer now properly distinguishes between new documents and updates to existing ones.
- Daemon IPC: Fixed an issue where `search` and `grep` commands could time out without producing output by improving the efficiency of the daemon’s streaming response mechanism.
- Daemon IPC: Optimized non-multiplexed communication paths to prevent performance issues and potential timeouts with large responses from commands like `get` and `cat`.
[v0.7.2] - 2025-10-03¶
Added¶
- Automatic directory snapshot generation with ISO 8601 timestamp IDs and git metadata detection (commit, branch, remote). Every `yams add <directory>` now creates a timestamped snapshot stored in the `tree_snapshots` table.
- Snapshot Listing: New `yams list --snapshots` command displays all available snapshots with table and JSON output formats, showing snapshot IDs, directory paths, labels, git commits, and file counts.
- Implemented `yams diff <snapshotA> <snapshotB>` command with tree, flat, and JSON output formats for comparing directory snapshots.
- TreeDiffer automatically detects renamed/moved files via SHA-256 hash equivalence matching, enabled by default.
Changed¶
- Snapshot Labels: `yams add` now accepts optional `--label` flag for human-readable snapshot names.
- Indexing Service: Enhanced to persist snapshot metadata (snapshot_id, directory_path, git metadata, file count) to database after directory ingestion.
- Metadata Repository: Added `upsertTreeSnapshot()`, `listTreeSnapshots()`, and tree diff persistence methods for snapshot and change history management.
- Search: Parallelized keyword search scoring loop to significantly improve performance on multi-core systems.
- Search: Search thread pools are now configured by the central `TuningManager` to adapt to system load and tuning profiles.
- Search: Implemented structural scoring to boost relevance of results that are co-located in the same directory.
- Added FTS5 readiness fast-path check in `getByNameSmart()` to prevent 3-second blocking timeouts when search indexes are updating.
- Added `post_ingest_queue_depth` field to status response, enabling clients to check if FTS5 indexes are ready before attempting expensive search operations.
- TUI browse command now resolves listings and fuzzy search through the shared AppContext service bundle (`TUIServices` + `IDocumentService`/`ISearchService`), with graceful fallback to metadata/content-store paths when the daemon is degraded.
- CLI Browse: Shift+R reindex dialog now performs a full extraction + index refresh through `TUIServices::reindexDocument`, providing inline success/error feedback instead of the previous placeholder flow.
Fixed¶
- Daemon IPC: SocketServer now shares a live writer-budget reference with every connection and the tuning manager pushes updates through it. Multiplexed streams adjust bandwidth limits immediately when profiles or runtime heuristics change.
- Search: Corrected an issue where `yams search --include` was not being applied for hybrid searches. The include pattern is now passed to the daemon and correctly filters results.
- Fixed protobuf UTF-8 validation errors when grepping binary files or non-UTF-8 text. Changed `GrepMatch.line`, `context_before`, and `context_after` fields from `string` to `bytes` type in protobuf definition. This allows grep to handle arbitrary byte sequences including binary content, Latin-1, Windows-1252, and other legacy encodings without validation failures. (PBI-001, task 001-33)
- Daemon IPC: replaced the `io_context.run_for` polling loop with dedicated `run_one` workers so async accept completions are no longer starved during streaming requests. Added optional diagnostic thread (`YAMS_SOCKET_RUN_DIAG`) for debugging.
- CLI Browse: refuse to launch the FTXUI browser when the terminal is non-interactive, lacks TERM capabilities, or is smaller than 60x18; emit a clear resize guidance message instead of hanging or crashing.
- CLI Search: release pooled daemon clients before process teardown to prevent the `std::system_error: mutex lock failed` abort when `yams search` exits after hitting the daemon path.
[v0.7.1] - 2025-09-29¶
Changed¶
- GrepService: expanded candidate discovery to preselect from `req.paths` using SQL LIKE prefix scans, aligning service behavior with CLI expectations for directory patterns.
- RepairCoordinator refocus: on live `DocumentAdded` events, skip queuing when the post‑ingest
- Post‑ingest pipeline: improvements
- ServiceManager enqueue path: simplified `enqueuePostIngest` to a direct blocking enqueue. This improves predictability and throughput under high load.
- CLI Download UX: `yams download` now clearly displays the ingested content hash
Fixed¶
- GrepService streaming: flushes the final partial line when scanning cold CAS streams so single-line files are matched reliably (e.g., `hello.txt`).
- Reduced GrepService log verbosity to `debug` for internal counters and match traces.
- Fixed IPC protocol regression where grep and list commands failed to properly communicate with the daemon after migration, causing incomplete results or timeouts in multi-service environments.
- This issue also affected the result output of other tools.
- Guarded compression monitor global statistics with a dedicated mutex to stop concurrent tracker updates from crashing `unit_shard5` (validated via `meson test -C build/debug unit_shard5 --print-errorlogs`).
- Repaired the `document_service` metadata pipeline regression so fixture-driven search tests no longer observe missing extracted content.
- MCP stdio transport: replaced unused static output mutex with an instance mutex to satisfy ODR/build on certain platforms.
[v0.7.0] - 2025-09-25¶
Highlights¶
- These changes reduce CPU spikes observed in profiles for large greps and remove blocking storage scans from interactive status paths. Post-ingest work is intentionally bounded; processing may take longer, but overall system responsiveness improves.
- Stability: resolved connection timeouts under multi-agent load by removing the hard 100-connection cap and deriving a dynamic accept limit. Defaults honor `YAMS_MAX_ACTIVE_CONN` or compute a safe cap from CPU cores and IO concurrency.
- Throughput: added tuning profiles (efficient | balanced | aggressive). Profiles modulate pool growth, IO thresholds, and post-ingest workers. Default is `balanced`.
- Indexing UX: Add/ingest returns fast; post‑ingest queue handles FTS/embeddings/KG in the background. Path‑series versioning (Phase 1) is on by default behind an env flag.
Added¶
- Tuning profiles selectable via config or env:
- Config: yams config set tuning.profile <efficient|balanced|aggressive>
- Env: YAMS_TUNING_PROFILE=<profile>
- Config defaults now include [tuning] profile = "balanced".
- Docs: docs/admin/tuning_profiles.md covering profiles, envs, and observability.
- Versioning (Phase 1): path‑series lineage with VersionOf edges and metadata flags version, is_latest, series_key. Duplicate (same hash) re‑ingest does not create a new version; alternate locations and timestamps are updated.
- CLI Search: grouped multi‑version presentation (default on) with new controls.
- Groups results by canonical path when multiple versions of the same file are returned.
- New flags:
- --no-group-versions: disable grouping and show the flat list.
- --versions <latest|all>: choose best only (default: latest) or list versions per path.
- --versions-topk <N>: cap versions shown per path when --versions=all (default: 3).
- --versions-sort <score|path|title>: sort versions within a group (default: score).
- --no-tools: hide per‑version tool hints.
- --json-grouped: emit grouped JSON; plain --json remains flat and backward compatible.
- Tool hints shown per version (when grouped): yams get --hash <hash> | yams cat --hash <hash> | yams restore --hash <hash>; if a local file path is resolved, a yams diff --hash <hash> <local-path> hint is added.
- Environment toggles: YAMS_NO_GROUP_VERSIONS=1 and YAMS_NO_GROUP_TOOLS=1 to flip defaults.
- Note: This is a presentation‑layer change; service/daemon APIs are unchanged.
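The grouped presentation described in the CLI Search item above can be sketched roughly as follows. This is an illustration of the idea only, assuming the default score sort and a per-path cap like --versions-topk; `SearchHit`, `VersionGroup`, and `groupVersions` are hypothetical names and do not reflect the CLI's real types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result record; field names are illustrative only.
struct SearchHit {
    std::string path;
    std::string hash;
    double score;
};

struct VersionGroup {
    std::string path;
    std::vector<SearchHit> versions;  // best score first
};

// Groups hits by path, sorts each group by descending score, and keeps at
// most topK versions per path. Groups are ordered by their best score.
std::vector<VersionGroup> groupVersions(const std::vector<SearchHit>& hits, std::size_t topK) {
    std::map<std::string, std::vector<SearchHit>> byPath;
    for (const auto& hit : hits) byPath[hit.path].push_back(hit);

    std::vector<VersionGroup> groups;
    for (auto& [path, versions] : byPath) {
        assert(!versions.empty());  // every map entry was created by a push_back
        std::stable_sort(versions.begin(), versions.end(),
                         [](const SearchHit& a, const SearchHit& b) { return a.score > b.score; });
        if (versions.size() > topK) versions.resize(topK);
        if (!versions.empty()) groups.push_back({path, std::move(versions)});
    }
    std::stable_sort(groups.begin(), groups.end(),
                     [](const VersionGroup& a, const VersionGroup& b) {
                         return a.versions.front().score > b.versions.front().score;
                     });
    return groups;
}
```

Because the grouping happens after results are returned, a sketch like this fits the note that the change is confined to the presentation layer.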
Changed¶
- Build System
- The primary build system has been migrated from CMake to Meson. All build, test, and packaging scripts have been updated to use the new Meson-based workflow.
- Status/Stats (CLI): use daemon metrics by default and never trigger local storage scans.
- yams status and yams stats -v now render from the same non-detailed daemon snapshot; removed the “scanning storage…” spinner and filesystem walks.
- Verbose output formats the JSON fields instead of performing extra scans.
- Tools/Stats (yams-tools): tools/yams-tools/src/commands/stats_command.cpp refactored to prefer daemon-first metrics with a legacy local fallback only if daemon is unavailable.
- MCP add_directory: switched to daemon-first ingestion with a brief readiness wait to avoid “Content store not available” races. Removes local store preflight; maps NotInitialized to a clear, retryable message from the daemon.
- MCP search: path normalization + optional diff parity with CLI.
- New request field include_diff adds a structured diff block to results when the path_pattern points to a local file; mirrors yams search diff behavior.
- MCPSearch DTOs extended to round-trip include_diff, diff, and local_input_file.
- Daemon accept scaling: removed fixed cap; now dynamically computes maxConnections from recommendedThreads * ioConnPerThread * 4 (min 256) unless YAMS_MAX_ACTIVE_CONN is set.
- Backpressure: increased default read pause to 10ms to smooth heavy load.
- Post‑ingest: preserves bounded capacity; de‑dupes inflight, indexes FTS, updates fuzzy index, and emits KG nodes/edges best‑effort.
- Status/Stats: JSON correctness improvements; omit misleading savings when physical size unknown; surface post‑ingest bus usage and document counters.
- CLI Search: grouping of multiple versions per path is enabled by default; paths‑only output and flat JSON remain unchanged unless --json-grouped is specified.
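The accept-limit rule from the daemon accept scaling item above works out as in the sketch below. Only the formula, the 256 floor, and the YAMS_MAX_ACTIVE_CONN override come from the changelog; the function names and the handling of invalid override values (ignored here) are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <optional>

// Parses a positive integer override such as the value of YAMS_MAX_ACTIVE_CONN.
// Returns nullopt for null, empty, non-numeric, or non-positive input
// (assumed behavior; the daemon may treat bad values differently).
std::optional<std::size_t> parseOverride(const char* raw) {
    if (raw == nullptr || *raw == '\0') return std::nullopt;
    char* end = nullptr;
    const long long value = std::strtoll(raw, &end, 10);
    assert(end != nullptr);  // strtoll always sets the end pointer
    if (*end != '\0' || value <= 0) return std::nullopt;
    return static_cast<std::size_t>(value);
}

// maxConnections = recommendedThreads * ioConnPerThread * 4, floored at 256,
// unless a valid override is supplied.
std::size_t computeMaxConnections(std::size_t recommendedThreads,
                                  std::size_t ioConnPerThread,
                                  const char* overrideRaw) {
    if (auto forced = parseOverride(overrideRaw)) return *forced;
    constexpr std::size_t kMinConnections = 256;
    return std::max(kMinConnections, recommendedThreads * ioConnPerThread * 4);
}
```

A caller would typically pass `std::getenv("YAMS_MAX_ACTIVE_CONN")` as the last argument. With 8 threads and 4 connections per thread the product is 128, so the floor of 256 applies; with 16 and 8 the computed cap is 512.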
Fixed¶
- Regression in metadata extraction and storage used by the search and grep tools: the async post-ingest pipeline never persisted extracted text into the metadata store. As a result, document_content stayed empty, so search, repairs, and semantic pipelines saw “Document content not found” even though vector inserts were logged.
- Many tuning optimizations for daemon usage
- Grep pipeline: staged KG → metadata → content with caps and budget.
- Prefers “hot” text (metadata-extracted) and caps cold CAS reads; early path/include filters.
- Added a global time budget (internal) to stop long content scans gracefully.
- Capped grep worker threads to a small, background-friendly number by default (≤4).
- Grep streaming optimization: replaced per-character streambuf overflow with bulk line splitting (memchr-based) to eliminate the per-byte hotspot in profiles during CAS streaming.
- Post-ingest queue: bounded by configuration, not CPU heuristics.
- Default worker threads set conservatively to 1 unless configured in [tuning] as post_ingest_threads. Queue capacity now honored from post_ingest_queue_max.
- Added a tiny yield between tasks to reduce contention and smooth CPU.
- Addressed intermittent CLI timeouts and “Broken pipe” logs observed when many agents connected concurrently. Accept loop backoff now respects the higher connection cap and IO pool growth from the tuning manager.
- Minor unit test fixes (Result value handling) to unblock CI.
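The post-ingest queue behavior noted in this release (capacity bounded by configuration, inflight work de-duplicated) can be pictured with a small sketch. `BoundedDedupQueue` and its methods are hypothetical and use a non-blocking try-enqueue for brevity; the real queue's interface and blocking behavior are not shown in this changelog. The capacity argument stands in for a value such as post_ingest_queue_max.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical stand-in for a post-ingest queue: capacity comes from
// configuration, and a hash that is queued or being processed is not
// accepted a second time.
class BoundedDedupQueue {
public:
    explicit BoundedDedupQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
    }

    // Returns false when the queue is full or the hash is already inflight.
    bool tryEnqueue(const std::string& hash) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.size() >= capacity_) return false;
        if (!inflight_.insert(hash).second) return false;
        queue_.push_back(hash);
        return true;
    }

    // Hands the next hash to a worker; it stays inflight until markDone().
    std::optional<std::string> tryDequeue() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty()) return std::nullopt;
        std::string hash = std::move(queue_.front());
        queue_.pop_front();
        return hash;
    }

    // Called by the worker once processing for the hash has finished.
    void markDone(const std::string& hash) {
        std::lock_guard<std::mutex> lock(mutex_);
        inflight_.erase(hash);
    }

private:
    std::size_t capacity_;
    std::mutex mutex_;
    std::deque<std::string> queue_;
    std::unordered_set<std::string> inflight_;
};
```

Keeping a hash in the inflight set until `markDone()` means a document re-added while its first pass is still running does not get processed twice.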