## Benchmarks
All benchmarks run on a single node (m6i.2xlarge, 8 vCPU, 32 GB RAM, gp3 EBS SSD) unless otherwise noted. Competitors are run with their recommended single-node production settings.
### Hybrid search quality (MS MARCO)
MS MARCO Passage Ranking (dev set, 6,980 queries, 8.8M passages). Evaluated at nDCG@10.
| Engine | Mode | nDCG@10 | Notes |
|---|---|---|---|
| BM25-only (baseline) | Sparse only | 0.184 | Standard BM25, no reranking |
| Purple8 Graph | Vector only | 0.341 | HNSW, all-MiniLM-L6-v2 |
| Purple8 Graph | BM25 + Vector | 0.389 | RRF merge, α=0.5 |
| Purple8 Graph | BM25 + Vector + Graph | 0.412 | Graph context reranks top-20 |
| Neo4j Vector | Vector only | 0.337 | text-embedding-ada-002 |
| FalkorDB | Vector only | 0.318 | all-MiniLM-L6-v2 |
Adding the graph traversal step to BM25+Vector improves nDCG@10 by +0.023 (+5.9%) over the two-modality baseline.
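The BM25 + Vector row merges the two ranked lists with reciprocal-rank fusion. A minimal sketch of weighted RRF, assuming the conventional smoothing constant k=60 (the function name and k value are illustrative, not the Purple8 API):

```python
def rrf_merge(bm25_ranking, vector_ranking, alpha=0.5, k=60):
    """Weighted reciprocal-rank fusion of two ranked doc-id lists.

    alpha weights the BM25 list against the vector list; k=60 is
    the customary RRF constant (an assumption, not stated above).
    """
    scores = {}
    for rank, doc_id in enumerate(bm25_ranking, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha / (k + rank)
    for rank, doc_id in enumerate(vector_ranking, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - alpha) / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Documents appearing high in both lists accumulate score from both terms, which is why `b` outranks `a` when merging `["a", "b", "c"]` with `["b", "c", "d"]`.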
### Entity disambiguation (HotpotQA subset)
Multi-hop reasoning queries from HotpotQA (2,000 queries). Each query requires connecting ≥2 entities before a correct answer can be extracted. Evaluated on exact match (EM) score.
| Engine | EM Score | Avg. hops resolved | Notes |
|---|---|---|---|
| Vector-only baseline | 0.41 | 1.0 | No graph traversal |
| Purple8 Graph (vector+graph) | 0.67 | 2.3 | HNSW seeding + BFS traversal |
| Neo4j (LangChain graph-rag) | 0.58 | 1.8 | Separate vector + graph steps |
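The vector+graph pipeline seeds traversal with ANN hits, then expands over the knowledge graph. A minimal sketch of the seeding-plus-BFS pattern on a plain dict adjacency list (the data structure and function are illustrative, not the Purple8 internals):

```python
from collections import deque

def seeded_bfs(adjacency, seed_nodes, max_hops):
    """Expand outward from ANN seed nodes up to max_hops,
    recording each reachable node's hop distance from the
    nearest seed."""
    visited = {node: 0 for node in seed_nodes}
    queue = deque(seed_nodes)
    while queue:
        node = queue.popleft()
        depth = visited[node]
        if depth == max_hops:
            continue  # hop budget exhausted on this branch
        for neighbor in adjacency.get(node, ()):
            if neighbor not in visited:
                visited[neighbor] = depth + 1
                queue.append(neighbor)
    return visited
```

With `max_hops=2` the traversal reaches second-degree neighbors of the seeds but stops there, which is how multi-hop queries connect ≥2 entities without flooding the graph.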
### Throughput — entity disambiguation (Suite A)
500-node knowledge graph, 3-hop queries, 4 concurrent clients, 60-second warm-up.
| Engine | QPS | P50 (ms) | P99 (ms) | Memory |
|---|---|---|---|---|
| Purple8 Graph (HNSW) | 1,847 | 2.1 | 8.7 | 1.2 GB |
| Purple8 Graph (DiskANN) | 1,203 | 3.1 | 11.2 | 0.4 GB |
| Neo4j + vector index | 892 | 4.8 | 19.3 | 3.1 GB |
| Kùzu + manual embedding | 741 | 5.9 | 24.1 | 2.4 GB |
| FalkorDB | 634 | 7.2 | 31.5 | 1.8 GB |
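P50 and P99 in these tables are latency percentiles computed over per-query timings collected after the warm-up window. A minimal sketch of the nearest-rank percentile method (not the exact harness code):

```python
import math

def percentile(latencies_ms, p):
    """Nearest-rank percentile: the smallest sample such that at
    least p% of all observations are <= it."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

# For 100 samples 1..100: percentile(..., 50) -> 50, percentile(..., 99) -> 99
```

Tail percentiles like P99 are sensitive to sample count: with only 100 post-warm-up queries, P99 is a single observation, which is one reason the suites run long enough to collect thousands.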
### Throughput — pure graph traversal
100K-node graph, 3-hop MATCH with 4 predicates, 8 concurrent clients.
| Engine | QPS | P50 (ms) | P99 (ms) |
|---|---|---|---|
| Purple8 Graph | 4,120 | 1.8 | 5.9 |
| Neo4j 5.x | 3,210 | 2.4 | 9.1 |
| Kùzu | 3,890 | 1.9 | 6.4 |
| FalkorDB | 2,740 | 3.1 | 11.7 |
### Ingestion throughput
Bulk-ingest 1M nodes + 4M edges from a flat file. Single-threaded write loop.
| Engine | Nodes+Edges/sec | Time (1M+4M) | Peak RAM |
|---|---|---|---|
| Purple8 Graph | 48,200 | 104 s | 2.1 GB |
| Neo4j (batch import) | 31,500 | 159 s | 4.8 GB |
| Kùzu | 52,100 | 96 s | 3.9 GB |
| FalkorDB | 27,800 | 180 s | 2.6 GB |
Kùzu has faster bulk ingest because it is a columnar engine optimized for append-only workloads. Purple8 maintains real-time HNSW index updates during ingestion; for bulk-only scenarios, disable them with `P8G_INDEX_DEFERRED=true`.
### Memory footprint (1M nodes, 768-dim vectors)
| Config | RAM |
|---|---|
| Purple8 Graph, HNSW, no compression | 5.8 GB |
| Purple8 Graph, HNSW, int8 quantization | 2.1 GB |
| Purple8 Graph, HNSW, binary quantization | 0.9 GB |
| Purple8 Graph, DiskANN (on-disk index) | 0.4 GB |
| Neo4j Vector index | 6.4 GB |
| FalkorDB | 5.1 GB |
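The raw-vector portion of these footprints follows directly from dimensionality and element width; the remainder is HNSW graph links and node payloads. A quick back-of-the-envelope check (vector storage only):

```python
def vector_bytes(n_vectors, dim, bytes_per_element):
    """Storage for the raw vectors alone, excluding index overhead."""
    return n_vectors * dim * bytes_per_element

n, dim = 1_000_000, 768
print(vector_bytes(n, dim, 4) / 2**30)      # float32: ~2.86 GiB
print(vector_bytes(n, dim, 1) / 2**30)      # int8:    ~0.72 GiB
print(vector_bytes(n, dim, 1 / 8) / 2**30)  # binary:  ~0.09 GiB
```

The ratios between rows in the table roughly track these element widths; the absolute numbers are larger because each configuration also stores graph structure and, for HNSW, neighbor lists per vector.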
### Benchmark reproducibility
All benchmark scripts are available in `benchmarks/` at the root of this repository:

```bash
# Install benchmark dependencies
pip install "purple8-graph[bench]"

# Run the entity disambiguation suite
python benchmarks/suite_a_entity_disambiguation.py --n-nodes 500 --hops 3

# Run MS MARCO nDCG evaluation
python benchmarks/eval_ms_marco.py --split dev --limit 6980
```

### Reproduce on your own hardware
We encourage you to run these benchmarks on your own infrastructure with your own data. The numbers above reflect our test hardware and workload profile — your results will vary based on vector dimensionality, graph topology, and query patterns.