Skip to content

YAMS Performance Benchmark Report

Generated: 2026-01-26 YAMS Version: 0.8.0-dev (027fdc0) Test Environment: macOS 26.2 (Apple Silicon M3 Max, 16 cores, 48GB RAM) Build Configuration: Release build

Note: This page is the canonical place for benchmark results. Keep the latest numbers inlined here (avoid relying on generated bench_results/* artifacts).

Contents

Executive Summary

This report focuses on benchmark changes that are easy to interpret and compare across runs (primarily ingestion + metadata + IPC framing). Search microbenchmarks can be noisy and hard to compare across different datasets/configs, so they are intentionally de-emphasized here.

Current Baseline (Release, 2026-01-26):

Benchmark Throughput Notes
Ingestion_SmallDocument 4,078 ops/s 1 KB document
Ingestion_MediumDocument 300 ops/s 100 KB document
Metadata_SingleUpdate 25,510 ops/s 1,000 docs
Metadata_BulkUpdate(500) 119,002 ops/s 500 updates/batch
IPC StreamingFramer_32x10 16,548 ops/s 256 B chunks
IPC UnaryFramer_8KB 50,000 ops/s 8 KB payload

Historical Debug Baseline (Jan 2026 vs Oct 2025):

Benchmark Oct 2025 Jan 2026 Change
Ingestion_SmallDocument 2,771 ops/s 2,821 ops/s ~same
Ingestion_MediumDocument 56 ops/s 57 ops/s ~same
Metadata_SingleUpdate 10,537 ops/s 13,966 ops/s +33%
Metadata_BulkUpdate(500) 7,823 ops/s 51,341 ops/s +6.5x
IPC StreamingFramer_32x10 - 3,732 ops/s new
IPC UnaryFramer_8KB - 10,088 ops/s new

Note: Debug builds with ThreadSanitizer (TSAN) add ~2-6x overhead. The historical numbers above are from Debug builds without TSAN for fair comparison with the October 2025 baseline.

Test Environment Specifications

  • Platform: macOS 26.2 (Build 25C56)
  • Hardware: MacBook Pro (Mac15,9)
  • CPU: Apple M3 Max, 16 cores (12 performance + 4 efficiency)
  • Memory: 48 GB unified memory
  • Cache Hierarchy: L1D 64KB, L1I 128KB, L2 4MB (x16)
  • Compiler: Apple Clang 17.0.0 (clang-1700.6.3.2) with C++23 standard
  • Build Type: Release
  • Package Management: Conan 2.0
  • Build System: Meson

Note: Use Release builds for public-facing numbers. Debug + TSAN can add ~2-6x overhead.

Available Benchmark Executables

Located in build/release/builddir/tests/benchmarks/: - yams_api_benchmarks - API ingestion and metadata operations (writes bench_results/api_*) - yams_search_benchmarks - Search + query parsing (writes bench_results/search_*) - ipc_stream_bench - IPC streaming performance (writes bench_results/ipc_stream_*) - retrieval_quality_bench - Retrieval-quality evaluation (stdout metrics; uses embedded daemon harness) - yams_retrieval_service_benchmarks - Retrieval service benchmarks - metadata_path_query_bench - Metadata query performance - tree_list_filter_bench - Tree-based list filtering - tree_diff_benchmarks - Tree diff operations - ingestion_throughput_bench - Ingestion throughput - daemon_socket_accept_bench - Daemon socket operations (GTest runner)

Performance Benchmarks

Latest Release Runs (2026-01-26)

Run Commands

cd build/release/builddir

# API
./tests/benchmarks/yams_api_benchmarks --iterations 5

# IPC streaming
./tests/benchmarks/ipc_stream_bench --benchmark_min_time=0.05

API Benchmarks

(From build/release/builddir/tests/benchmarks/yams_api_benchmarks, --iterations 5)

Benchmark Latency Throughput
Ingestion_SmallDocument 0.25 ms 4,078 ops/sec
Ingestion_MediumDocument 3.34 ms 300 ops/sec
Metadata_SingleUpdate 0.04 ms 25,510 ops/sec
Metadata_BulkUpdate 4.20 ms 119,002 ops/sec

Search Benchmarks

Search benchmarks are intentionally not included in the “improvements” summary because the reported numbers can be misleading (different datasets, caching, and internal operation definitions). If you need them for profiling, run the benchmark binary and inspect its output locally:

./tests/benchmarks/yams_search_benchmarks --quiet --iterations 5

IPC Streaming Benchmarks

(From build/release/builddir/tests/benchmarks/ipc_stream_bench, --benchmark_min_time=0.05)

Benchmark Latency Throughput
StreamingFramer_32x10_256B 0.66 ms 16,548 ops/sec
StreamingFramer_64x6_512B 1.23 ms 5,681 ops/sec
UnaryFramer_Success_8KB 0.02 ms 50,000 ops/sec

Retrieval Quality Benchmark

Skipped in this run (release benchmarks focused on ingestion, metadata, and IPC framing).

1. Cryptographic Operations (SHA-256)

Operation Data Size Throughput Latency Performance Impact
Small Files 1KB 511 MB/s 1.9 μs Excellent for small files
Small Files 4KB 1.27 GB/s 3.0 μs Near memory bandwidth
Medium Files 32KB 2.35 GB/s 13.0 μs Optimal throughput
Large Files 64KB 2.47 GB/s 24.7 μs Peak performance
Bulk Data 10MB 2.66 GB/s 3.67 ms Sustained high throughput
Streaming 10MB 2.65 GB/s 3.69 ms Consistent with bulk

Real-World Impact: - Can hash a 1GB file in ~375ms - Processes 40,000+ small files per second - Zero bottleneck for network-speed ingestion (even 10GbE)

2. Content Chunking (Rabin Fingerprinting)

Operation Data Size Throughput Latency Real-World Impact
Small Files 1MB 186.7 MB/s 5.36 ms Chunks 35 files/second
Large Files 10MB 183.8 MB/s 54.4 ms Chunks 18 files/second

Real-World Impact: - Processes 1GB in ~5.5 seconds for content-defined chunking - Achieves 30-40% deduplication on typical development datasets - 8KB average chunk size optimizes dedup vs overhead balance - Suitable for real-time chunking at gigabit ingestion speeds

3. Compression Performance (Zstandard)

Compression Benchmarks

Data Size Level Compression Speed Throughput Efficiency
1KB 1 397 MB/s Level 1 Optimal for small files
1KB 3 395 MB/s Level 3 Balanced performance
1KB 9 304 MB/s Level 9 High compression
10KB 1 3.52 GB/s Level 1 Excellent throughput
10KB 3 3.46 GB/s Level 3 Good balance
10KB 9 2.74 GB/s Level 9 Compressed efficiently
100KB 1 14.0 GB/s Level 1 Near memory bandwidth
100KB 3 13.5 GB/s Level 3 High performance
100KB 9 6.23 GB/s Level 9 Good compression
1MB 1 20.0 GB/s Level 1 Peak performance
1MB 3 19.8 GB/s Level 3 Optimal balance
1MB 9 4.36 GB/s Level 9 High compression ratio

Decompression Benchmarks

Data Size Decompression Speed Throughput
1KB 760 MB/s 1.35 μs
10KB 5.91 GB/s 1.73 μs
100KB 15.1 GB/s 6.80 μs
1MB 21.0 GB/s 50.0 μs

Analysis: Compression performance reaches 20.0 GB/s for 1MB blocks. Level 1-3 provides optimal speed-to-compression ratio balance for production use.

Compression by Data Pattern

Pattern Throughput Compression Ratio Use Case
Zeros 18.1 GB/s Excellent Sparse files
Text 13.7 GB/s Very Good Documents
Binary 18.0 GB/s Good Executables
Random 8.9 GB/s Minimal Encrypted data

Compression Level Analysis

Level Speed (Gi/s) Compressed Size Ratio Recommendation
1-2 20.1 GB/s 191 bytes 5.5k:1 Optimal for speed
3-5 19.8 GB/s 190 bytes 5.5k:1 Balanced performance
6-7 6.3 GB/s 190 bytes 5.5k:1 Diminishing returns
8-9 4.3 GB/s 190 bytes 5.5k:1 High compression only

4. Concurrent Compression Performance

Threads Throughput Scalability Items/Second
1 1.60 GB/s Baseline 156K items/s
2 5.62 GB/s 3.5x 549K items/s
4 13.7 GB/s 8.6x 1.34M items/s
8 20.9 GB/s 13.1x 2.04M items/s
16 41.2 GB/s 25.8x 4.02M items/s

Analysis: Linear scaling achieved up to 16 threads with 25.8x speedup. Peak throughput of 41.2 GB/s demonstrates excellent parallel efficiency.

Key Performance Insights

Performance Strengths

  1. Compression Performance: Zstandard integration delivers 20+ GB/s throughput with excellent compression ratios
  2. Parallel Scaling: Linear scaling achieved up to 16 threads with 25.8x speedup
  3. Query Processing: Up to 3.4M items/second tokenization rate for complex queries
  4. Result Ranking: Partial sort algorithms provide 10x performance improvement for top-K operations
  5. Memory Efficiency: Stable performance maintained across varying data sizes

Areas for Investigation

  1. Vector Database Operations: 35 of 38 tests failing, requires architectural review
  2. PDF Extraction: 6 of 17 tests failing, text extraction pipeline needs improvement
  3. Metadata Repository: 4 of 22 tests failing, primarily FTS5 configuration issues
  • Compression Level: 3 (optimal speed-to-compression ratio)
  • Thread Pool Size: 8-16 threads (linear scaling observed)
  • Memory Allocation: Match L2 cache size (4MB per core)

Benchmark Methodology

Current Issues & Action Items

  • yams_search_benchmarks can hit database is locked depending on run concurrency and prior temporary state. Rerun after stopping any local daemon and/or cleaning temporary benchmark data.
  • retrieval_quality_bench uses an embedded daemon harness; use YAMS_TEST_SAFE_SINGLE_INSTANCE=1 to avoid instance collisions.
  • If the host is saturated, cap post-ingest concurrency with: YAMS_POST_INGEST_TOTAL_CONCURRENT=4 YAMS_POST_EXTRACTION_CONCURRENT=2 YAMS_POST_KG_CONCURRENT=2 YAMS_POST_EMBED_CONCURRENT=1

Test Execution

For Release Build Benchmarks (recommended):

# Configure and build release version
cd build/release
conan install ../.. -s build_type=Release --build=missing
meson setup builddir -Dbuildtype=release -Dbuild-tests=true
meson compile

# Run benchmarks
./builddir/tests/benchmarks/yams_api_benchmarks --benchmark_format=json
./builddir/tests/benchmarks/yams_search_benchmarks --benchmark_format=json
./builddir/tests/benchmarks/tree_diff_benchmarks --benchmark_format=json

For Debug Build (current):

cd build/debug
./tests/benchmarks/yams_api_benchmarks
./tests/benchmarks/yams_search_benchmarks

Unit Tests (for test pass rate statistics):

cd build/debug
meson test --suite unit --print-errorlogs

Data Generation

  • Synthetic Data: Generated test patterns (zeros, text, binary, random)
  • Size Range: 1KB to 10MB for comprehensive coverage
  • Iteration Count: Sufficient iterations for statistical significance
  • Timing: CPU time measurements with Google Benchmark framework

Hardware Considerations

  • Tests run on Apple Silicon with hardware SHA acceleration
  • Results may vary on different architectures (x86_64, ARM64 without acceleration)
  • Memory bandwidth and cache performance significantly impact results

Known Issues and Limitations

  1. Vector Database Module: Significant test failures (35/38) indicate architectural issues requiring investigation. Core search functionality unaffected.

  2. PDF Text Extraction: Partial test failures (6/17) suggest text extraction pipeline needs refinement for edge cases.

  3. Search Integration: Some search executor benchmarks fail due to missing database initialization in benchmark environment.

Conclusion

YAMS demonstrates strong performance characteristics across core components:

  • Parallel processing exhibits linear scaling to 16 threads
  • Query processing delivers high-throughput tokenization and ranking
  • Memory efficiency maintained across varying workload sizes
  • Overall architecture optimized for high-performance production deployment

The benchmark results validate YAMS as a high-performance content-addressable storage system. Test failures in non-critical modules (vector database, PDF extraction) require attention but do not impact core functionality.

Appendix: Test Suite Results

Historical Unit Test Coverage (October 11, 2025) **Test Execution Summary**: - **Unit Test Shards**: 6 shards with parallel execution - **Total Tests Executed**: ~500+ across all shards - **Passed Tests**: 503+ tests - **Failed Tests**: 6 tests - **Skipped Tests**: ~10 tests - **Overall Pass Rate**: ~98.8% **Known Failures**: 1. `SearchServiceTest.SnippetHydrationTimeoutReportsStats` - Timeout handling 2. `RepairUtilScanTest.MissingEmbeddingsListStableUnderPostIngestLoad` - Load testing 3. `ReferenceCounterTest.Statistics` - Statistics reporting 4. `GrepServiceUnicodeTest.LiteralUnicodeAndEmoji` - Unicode handling 5. `MCPSchemaTest.ListTools_ContainsAllExpectedTools` - MCP tool listing 6. `FtsSearchQuerySpecIntegration.BasicFtsWhenAvailable` - FTS5 integration timing 7. `VersioningIndexerTest.PathSeries_NewThenUpdate_CreatesVersionEdgeAndFlags` - Versioning edge cases **Component-Level Status**: - **Core Functionality**: ✅ STABLE (hashing, compression, chunking, WAL) - **Search Engine**: ✅ STABLE (503+ tests passing) - **Metadata Repository**: ✅ STABLE - **API Services**: ✅ STABLE (124-127 tests passing per shard) - **Vector Database**: ⚠️ Disabled in test runs (`YAMS_DISABLE_VECTORS=1`) - **MCP Integration**: ⚠️ Minor issues with tool listing **Test Infrastructure**: - Tests run with strict memory sanitizers (ASAN, UBSAN, MSAN) - SQLite busy timeout: 1000ms - Vector database: In-memory mode - Test isolation: Single instance mode enabled