
Benchmarks

All benchmarks run on a single node (m6i.2xlarge, 8 vCPU, 32 GB RAM, gp3 NVMe SSD) unless otherwise noted. Competitors are run with their recommended single-node production settings.

Hybrid search quality (MS MARCO)

MS MARCO Passage Ranking (dev set, 6,980 queries, 8.8M passages). Evaluated at nDCG@10.

| Engine | Mode | nDCG@10 | Notes |
|---|---|---|---|
| BM25-only (baseline) | Sparse only | 0.184 | Standard BM25, no reranking |
| Purple8 Graph | Vector only | 0.341 | HNSW, all-MiniLM-L6-v2 |
| Purple8 Graph | BM25 + Vector | 0.389 | RRF merge, α=0.5 |
| Purple8 Graph | BM25 + Vector + Graph | 0.412 | Graph context reranks top-20 |
| Neo4j Vector | Vector only | 0.337 | text-embedding-ada-002 |
| FalkorDB | Vector only | 0.318 | all-MiniLM-L6-v2 |
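The BM25 + Vector row merges two rankings with reciprocal rank fusion (RRF). The sketch below shows one way such a merge could work. It is not the Purple8 Graph implementation: it assumes α is a weight between the two modalities, it uses k=60 (the constant commonly used for RRF, not a documented Purple8 setting), and `rrf_merge` is a hypothetical helper name.

```python
def rrf_merge(bm25_ranking, vector_ranking, alpha=0.5, k=60):
    """Weighted reciprocal rank fusion of two ranked lists of document IDs.

    Assumptions (not confirmed by the benchmark description):
    alpha weights the BM25 list, (1 - alpha) weights the vector list,
    and k is the usual RRF smoothing constant.
    """
    scores = {}
    for rank, doc_id in enumerate(bm25_ranking, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha / (k + rank)
    for rank, doc_id in enumerate(vector_ranking, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1.0 - alpha) / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


merged = rrf_merge(["a", "b", "c"], ["b", "c", "d"])
print(merged)  # ['b', 'c', 'a', 'd']
```

Documents that appear in both lists accumulate score from both, which is why `b` and `c` rank ahead of `a` even though `a` is first in the BM25 list.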

Adding the graph traversal step to BM25+Vector improves nDCG@10 from 0.389 to 0.412, a gain of +0.023 (+5.9%) over the two-modality baseline.
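For readers who want to check the metric on their own runs, here is a minimal sketch of nDCG@10 for a single query. It assumes linear gain (which matches exponential gain for the binary relevance labels in MS MARCO). The names `dcg` and `ndcg_at_k` are illustrative and are not taken from the benchmark scripts.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: each gain is divided by log2(position + 1)."""
    return sum(gain / math.log2(pos + 2) for pos, gain in enumerate(gains))


def ndcg_at_k(ranked_ids, qrels, k=10):
    """nDCG@k for one query.

    ranked_ids: document IDs in the order the engine returned them.
    qrels: dict mapping judged document ID -> relevance grade (0 if absent).
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal_gains = sorted(qrels.values(), reverse=True)[:k]
    ideal = dcg(ideal_gains)
    return dcg(gains) / ideal if ideal > 0 else 0.0


# One relevant passage returned at rank 2 instead of rank 1.
print(round(ndcg_at_k(["p7", "p3"], {"p3": 1}), 4))  # 0.6309
```

The reported figure would then be the mean of this value across all queries in the evaluation set.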

Entity disambiguation (HotpotQA subset)

Multi-hop reasoning queries from HotpotQA (2,000 queries). Each query requires connecting ≥2 entities before a correct answer can be extracted. Evaluated on exact match (EM) score.
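Exact match is usually computed after light answer normalization. The sketch below follows the normalization widely used for SQuAD-style evaluation (lowercase, strip punctuation, drop English articles, collapse whitespace). We assume the benchmark uses something similar. The function names are illustrative.

```python
import re
import string


def normalize_answer(text):
    """Lowercase, remove punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """True when prediction and gold answer are identical after normalization."""
    return normalize_answer(prediction) == normalize_answer(gold)


def em_score(predictions, golds):
    """Fraction of predictions that exactly match their gold answer."""
    if not golds:
        return 0.0
    matches = sum(exact_match(p, g) for p, g in zip(predictions, golds))
    return matches / len(golds)


print(em_score(["The Eiffel Tower.", "London"], ["eiffel tower", "Berlin"]))  # 0.5
```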

| Engine | EM Score | Avg. hops resolved | Notes |
|---|---|---|---|
| Vector-only baseline | 0.41 | 1.0 | No graph traversal |
| Purple8 Graph (vector+graph) | 0.67 | 2.3 | HNSW seeding + BFS traversal |
| Neo4j (LangChain graph-rag) | 0.58 | 1.8 | Separate vector + graph steps |
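To make "HNSW seeding + BFS traversal" concrete, the sketch below expands a set of seed nodes by breadth-first search up to a hop limit. It is a generic illustration, not Purple8 Graph code. It assumes the seeds come from a vector search step, and a plain adjacency dictionary stands in for the graph. `expand_hops` is a hypothetical helper name.

```python
from collections import deque


def expand_hops(adjacency, seeds, max_hops=3):
    """Return {node: hop_distance} for every node within max_hops of any seed.

    adjacency: dict mapping node -> iterable of neighbor nodes.
    seeds: nodes returned by the (assumed) vector search step.
    """
    distance = {seed: 0 for seed in seeds}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


graph = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"]}
print(expand_hops(graph, ["a"], max_hops=3))  # {'a': 0, 'b': 1, 'c': 2, 'd': 3}
```

Running both steps in one pass avoids a round trip between a vector store and a graph store. The last row of the table describes that two-store setup as "Separate vector + graph steps".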

Throughput — entity disambiguation (Suite A)

500-node knowledge graph, 3-hop queries, 4 concurrent clients, 60-second warm-up.

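| Engine | QPS | P50 (ms) | P99 (ms) | Memory |
|---|---|---|---|---|
| Purple8 Graph (HNSW) | 1,847 | 2.1 | 8.7 | 1.2 GB |
| Purple8 Graph (DiskANN) | 1,203 | 3.1 | 11.2 | 0.4 GB |
| Neo4j + vector index | 892 | 4.8 | 19.3 | 3.1 GB |
| Kùzu + manual embedding | 741 | 5.9 | 24.1 | 2.4 GB |
| FalkorDB | 634 | 7.2 | 31.5 | 1.8 GB |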
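If you want to collect QPS and latency percentiles for your own workload, a closed-loop load generator along these lines is one option. This is a sketch under assumptions, not the harness used for the table: `query_fn` is a placeholder for whatever issues one query against your engine, percentiles use the nearest-rank method, and no warm-up phase is included.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list of values."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def run_load(query_fn, n_clients=4, duration_s=5.0):
    """Run query_fn in n_clients threads for duration_s seconds; report QPS, P50, P99."""
    started = time.perf_counter()
    deadline = started + duration_s

    def client():
        latencies_ms = []
        while time.perf_counter() < deadline:
            t0 = time.perf_counter()
            query_fn()
            latencies_ms.append((time.perf_counter() - t0) * 1000.0)
        return latencies_ms

    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        futures = [pool.submit(client) for _ in range(n_clients)]
        latencies = sorted(ms for f in futures for ms in f.result())
    elapsed = time.perf_counter() - started
    return {
        "qps": len(latencies) / elapsed,
        "p50_ms": percentile(latencies, 50),
        "p99_ms": percentile(latencies, 99),
    }
```

A call such as `run_load(my_query, n_clients=4, duration_s=60)` would mirror the client count used in Suite A. Discard an initial warm-up run before recording numbers.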

Throughput — pure graph traversal

100K-node graph, 3-hop MATCH with 4 predicates, 8 concurrent clients.

| Engine | QPS | P50 (ms) | P99 (ms) |
|---|---|---|---|
| Purple8 Graph | 4,120 | 1.8 | 5.9 |
| Neo4j 5.x | 3,210 | 2.4 | 9.1 |
| Kùzu | 3,890 | 1.9 | 6.4 |
| FalkorDB | 2,740 | 3.1 | 11.7 |

Ingestion throughput

Bulk-ingest 1M nodes + 4M edges from a flat file. Single-threaded write loop.

| Engine | Nodes+Edges/sec | Time (1M+4M) | Peak RAM |
|---|---|---|---|
| Purple8 Graph | 48,200 | 104 s | 2.1 GB |
| Neo4j (batch import) | 31,500 | 159 s | 4.8 GB |
| Kùzu | 52,100 | 96 s | 3.9 GB |
| FalkorDB | 27,800 | 180 s | 2.6 GB |

Kùzu ingests faster in bulk because it is a columnar engine optimized for append-only workloads. Purple8 Graph keeps its HNSW index updated in real time during ingestion; set P8G_INDEX_DEFERRED=true to turn this off for bulk-only scenarios.

Memory footprint (1M nodes, 768-dim vectors)

| Config | RAM |
|---|---|
| Purple8 Graph, HNSW, no compression | 5.8 GB |
| Purple8 Graph, HNSW, int8 quantization | 2.1 GB |
| Purple8 Graph, HNSW, binary quantization | 0.9 GB |
| Purple8 Graph, DiskANN (on-disk index) | 0.4 GB |
| Neo4j Vector index | 6.4 GB |
| FalkorDB | 5.1 GB |
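As a sanity check on the table above, the raw vector payload for 1M 768-dimensional vectors can be estimated directly. The sketch assumes uncompressed vectors are stored as float32, which the table does not state. The figures cover the vectors only, so the measured totals are higher because they also include index links, graph data, and runtime overhead.

```python
def raw_vector_bytes(n_vectors, dim, bits_per_component):
    """Bytes needed to store the vector components alone (no index overhead)."""
    return n_vectors * dim * bits_per_component // 8


for name, bits in [("float32", 32), ("int8", 8), ("binary", 1)]:
    gb = raw_vector_bytes(1_000_000, 768, bits) / 1e9
    print(f"{name}: {gb:.3f} GB")
# float32: 3.072 GB
# int8: 0.768 GB
# binary: 0.096 GB
```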

Benchmark reproducibility

All benchmark scripts are available in benchmarks/ at the root of this repository:

```bash
# Install benchmark dependencies (quoted so shells such as zsh do not expand the brackets)
pip install "purple8-graph[bench]"

# Run the entity disambiguation suite
python benchmarks/suite_a_entity_disambiguation.py --n-nodes 500 --hops 3

# Run MS MARCO nDCG evaluation
python benchmarks/eval_ms_marco.py --split dev --limit 6980
```

Reproduce on your own hardware

We encourage you to run these benchmarks on your own infrastructure with your own data. The numbers above reflect our test hardware and workload profile — your results will vary based on vector dimensionality, graph topology, and query patterns.

Purple8 Graph is proprietary software. All rights reserved.