## Benchmarks
All benchmarks run on a single node (m6i.2xlarge, 8 vCPU, 32 GB RAM, gp3 EBS SSD) unless otherwise noted. Competitors are run with their recommended single-node production settings.
### Hybrid search quality (MS MARCO)
MS MARCO Passage Ranking (dev set, 6,980 queries, 8.8M passages). Evaluated at nDCG@10.
| Engine | Mode | nDCG@10 | Notes |
|---|---|---|---|
| BM25-only (baseline) | Sparse only | 0.184 | Standard BM25, no reranking |
| Purple8 Graph | Vector only | 0.341 | HNSW, all-MiniLM-L6-v2 |
| Purple8 Graph | BM25 + Vector | 0.389 | RRF merge, α=0.5 |
| Purple8 Graph | BM25 + Vector + Graph | 0.412 | Graph context reranks top-20 |
| Neo4j Vector | Vector only | 0.337 | text-embedding-ada-002 |
| FalkorDB | Vector only | 0.318 | all-MiniLM-L6-v2 |
Adding the graph traversal step to BM25+Vector improves nDCG@10 by +0.023 (+5.9%) over the two-modality baseline.
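The BM25 + Vector row merges the two ranked lists with reciprocal-rank fusion. A minimal sketch of weighted RRF, assuming the conventional smoothing constant k=60 (the function name and k value are illustrative, not the Purple8 API):

```python
def rrf_merge(bm25_ranking, vector_ranking, alpha=0.5, k=60):
    """Weighted reciprocal-rank fusion of two ranked doc-id lists.

    alpha weights the BM25 list against the vector list; k=60 is
    the customary RRF constant (an assumption, not stated above).
    """
    scores = {}
    for rank, doc_id in enumerate(bm25_ranking, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha / (k + rank)
    for rank, doc_id in enumerate(vector_ranking, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - alpha) / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Documents appearing high in both lists accumulate score from both terms, which is why `b` outranks `a` when merging `["a", "b", "c"]` with `["b", "c", "d"]`.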
### Entity disambiguation (HotpotQA subset)
Multi-hop reasoning queries from HotpotQA (2,000 queries). Each query requires connecting ≥2 entities before a correct answer can be extracted. Evaluated on exact match (EM) score.
| Engine | EM Score | Avg. hops resolved | Notes |
|---|---|---|---|
| Vector-only baseline | 0.41 | 1.0 | No graph traversal |
| Purple8 Graph (vector+graph) | 0.67 | 2.3 | HNSW seeding + BFS traversal |
| Neo4j (LangChain graph-rag) | 0.58 | 1.8 | Separate vector + graph steps |
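The vector+graph pipeline seeds traversal with ANN hits, then expands over the knowledge graph. A minimal sketch of the seeding-plus-BFS pattern on a plain dict adjacency list (the data structure and function are illustrative, not the Purple8 internals):

```python
from collections import deque

def seeded_bfs(adjacency, seed_nodes, max_hops):
    """Expand outward from ANN seed nodes up to max_hops,
    recording each reachable node's hop distance from the
    nearest seed."""
    visited = {node: 0 for node in seed_nodes}
    queue = deque(seed_nodes)
    while queue:
        node = queue.popleft()
        depth = visited[node]
        if depth == max_hops:
            continue  # hop budget exhausted on this branch
        for neighbor in adjacency.get(node, ()):
            if neighbor not in visited:
                visited[neighbor] = depth + 1
                queue.append(neighbor)
    return visited
```

With `max_hops=2` the traversal reaches second-degree neighbors of the seeds but stops there, which is how multi-hop queries connect ≥2 entities without flooding the graph.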
### Throughput — entity disambiguation (Suite A)
500-node knowledge graph, 3-hop queries, 4 concurrent clients, 60-second warm-up.
| Engine | QPS | P50 (ms) | P99 (ms) | Memory |
|---|---|---|---|---|
| Purple8 Graph (HNSW) | 1,847 | 2.1 | 8.7 | 1.2 GB |
| Purple8 Graph (DiskANN) | 1,203 | 3.1 | 11.2 | 0.4 GB |
| Neo4j + vector index | 892 | 4.8 | 19.3 | 3.1 GB |
| Kùzu + manual embedding | 741 | 5.9 | 24.1 | 2.4 GB |
| FalkorDB | 634 | 7.2 | 31.5 | 1.8 GB |
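P50 and P99 in these tables are latency percentiles computed over per-query timings collected after the warm-up window. A minimal sketch of the nearest-rank percentile method (not the exact harness code):

```python
import math

def percentile(latencies_ms, p):
    """Nearest-rank percentile: the smallest sample such that at
    least p% of all observations are <= it."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

# For 100 samples 1..100: percentile(..., 50) -> 50, percentile(..., 99) -> 99
```

Tail percentiles like P99 are sensitive to sample count: with only 100 post-warm-up queries, P99 is a single observation, which is one reason the suites run long enough to collect thousands.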
### Throughput — pure graph traversal
100K-node graph, 3-hop MATCH with 4 predicates, 8 concurrent clients.
| Engine | QPS | P50 (ms) | P99 (ms) |
|---|---|---|---|
| Purple8 Graph | 4,120 | 1.8 | 5.9 |
| Neo4j 5.x | 3,210 | 2.4 | 9.1 |
| Kùzu | 3,890 | 1.9 | 6.4 |
| FalkorDB | 2,740 | 3.1 | 11.7 |
### Ingestion throughput
Bulk-ingest 1M nodes + 4M edges from a flat file. Single-threaded write loop.
| Engine | Nodes+Edges/sec | Time (1M+4M) | Peak RAM |
|---|---|---|---|
| Purple8 Graph | 48,200 | 104 s | 2.1 GB |
| Neo4j (batch import) | 31,500 | 159 s | 4.8 GB |
| Kùzu | 52,100 | 96 s | 3.9 GB |
| FalkorDB | 27,800 | 180 s | 2.6 GB |
Kùzu has faster bulk ingest because it is a columnar engine optimized for append-only workloads. Purple8 maintains real-time HNSW index updates during ingestion; for bulk-only scenarios, disable them with `P8G_INDEX_DEFERRED=true`.
### Memory footprint (1M nodes, 768-dim vectors)
| Config | RAM |
|---|---|
| Purple8 Graph, HNSW, no compression | 5.8 GB |
| Purple8 Graph, HNSW, int8 quantization | 2.1 GB |
| Purple8 Graph, HNSW, binary quantization | 0.9 GB |
| Purple8 Graph, DiskANN (on-disk index) | 0.4 GB |
| Neo4j Vector index | 6.4 GB |
| FalkorDB | 5.1 GB |
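The raw-vector portion of these footprints follows directly from dimensionality and element width; the remainder is HNSW graph links and node payloads. A quick back-of-the-envelope check (vector storage only):

```python
def vector_bytes(n_vectors, dim, bytes_per_element):
    """Storage for the raw vectors alone, excluding index overhead."""
    return n_vectors * dim * bytes_per_element

n, dim = 1_000_000, 768
print(vector_bytes(n, dim, 4) / 2**30)      # float32: ~2.86 GiB
print(vector_bytes(n, dim, 1) / 2**30)      # int8:    ~0.72 GiB
print(vector_bytes(n, dim, 1 / 8) / 2**30)  # binary:  ~0.09 GiB
```

The ratios between rows in the table roughly track these element widths; the absolute numbers are larger because each configuration also stores graph structure and, for HNSW, neighbor lists per vector.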
### Benchmark reproducibility
All benchmark scripts are available in `benchmarks/` at the root of this repository:

```bash
# Install benchmark dependencies
pip install "purple8-graph[bench]"

# Run the entity disambiguation suite
python benchmarks/suite_a_entity_disambiguation.py --n-nodes 500 --hops 3

# Run MS MARCO nDCG evaluation
python benchmarks/eval_ms_marco.py --split dev --limit 6980
```

### Reproduce on your own hardware
We encourage you to run these benchmarks on your own infrastructure with your own data. The numbers above reflect our test hardware and workload profile — your results will vary based on vector dimensionality, graph topology, and query patterns.