What is Purple8 Graph?
The honest answer: it is a new category of thing.
It started as a knowledge graph. It grew a vector index because every production AI system eventually needs one. Then it grew a workflow engine because every AI workflow eventually needs to track state, enforce SLAs, involve a human, and prove — with an immutable audit trail — exactly what the AI decided, when, and why.
The result is not a knowledge graph. It is not a vector database. It is not a workflow engine. It is all three, sharing the same storage layer, the same query engine, and the same process — and it is the sharing that matters.
What it actually contains
A property graph engine with Cypher
Nodes, edges, labels, properties — standard property graph model. A custom Cypher implementation with 161 passing test cases covering MATCH, WHERE, WITH, UNWIND, MERGE, CREATE, DELETE, aggregations, path patterns, and subqueries.
Backed by RocksDB with a write-ahead log. ACID transactions. No JVM. No separate server. pip install and you have the full engine in-process.
HNSW + DiskANN vector search, built into the query planner
Not a sidecar. Not a separate index you query separately and join in application code. The vector index lives inside the same RocksDB instance. The Cypher engine calls into it mid-query:
```cypher
CALL db.vector.search('Document', $queryVec, 10) YIELD node, score
WHERE node.region = 'APAC' AND score > 0.85
MATCH (node)-[:AUTHORED_BY]->(author:Person)
RETURN node.title, author.name, score
ORDER BY score DESC
```

One query. One round-trip. 3.5 ms median at 100k documents.
In-memory HNSW for speed. On-disk DiskANN for datasets that don't fit in RAM (pip install "purple8-graph[diskann]"). BM25 full-text included for hybrid retrieval.
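Hybrid retrieval fuses a vector-similarity score with a lexical score before ranking. The pure-Python sketch below illustrates that fusion on toy data; the function names, the term-overlap stand-in for BM25, and the `alpha` weighting are illustrative assumptions, not the Purple8 API.

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def keyword_score(query_terms, text):
    # Crude lexical score: fraction of query terms present (a stand-in for BM25).
    words = set(text.lower().split())
    return sum(1 for t in query_terms if t in words) / len(query_terms)

def hybrid_rank(query_vec, query_terms, docs, alpha=0.7):
    # Blend vector and lexical scores; alpha weights the vector side.
    scored = []
    for doc in docs:
        s = (alpha * cosine(query_vec, doc["vec"])
             + (1 - alpha) * keyword_score(query_terms, doc["text"]))
        scored.append((s, doc["id"]))
    return [doc_id for s, doc_id in sorted(scored, reverse=True)]

docs = [
    {"id": "a", "vec": [1.0, 0.0], "text": "apac quarterly report"},
    {"id": "b", "vec": [0.0, 1.0], "text": "emea sales summary"},
]
ranking = hybrid_rank([1.0, 0.1], ["apac", "report"], docs)
```

The point of doing this inside the engine rather than in application code is that the lexical and vector indexes share one storage layer, so the fusion happens before results cross a process boundary.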
A Journey Engine that uses the graph as its state store
The part that makes Purple8 genuinely different from "Neo4j + Pinecone."
The Journey Engine tracks any real-world entity — a customer, a loan application, a support ticket, a legal matter — as it moves through a defined sequence of stages across multiple systems. Each stage transition is written as a graph edge (ADVANCED_TO). SLA breaches are written as graph edges (SLA_BREACHED). AI decisions are written as graph edges (AI_ADVISED).
The entire operational history of every entity is the graph. You query it with Cypher. You traverse it. You vector-search against it. There is no separate workflow database to sync with.
```python
je = JourneyEngine(engine)
je.define_journey("loan_application", stages=[
    StageSpec("submitted"),
    StageSpec("kyc_verified", sla=SLAPolicy(breach_after_seconds=7200)),
    StageSpec("credit_assessed", sla=SLAPolicy(breach_after_seconds=14400)),
    StageSpec("approved"),
])

instance = je.start("loan_application", entity_id="customer_123")
je.advance(instance.instance_id, to_stage="kyc_verified", actor="SystemB")
```

Every advance() call writes a graph edge, fires the AI advisor, checks SLAs, and publishes a CDC event — all in the same transaction.
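The SLA check that runs on each advance can be sketched conceptually: a stage records when it was entered, and a breach is due once the configured window elapses. The StageSpec and SLAPolicy names follow the snippet above; the check logic itself is an assumption about how such an engine works, not the shipped implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SLAPolicy:
    breach_after_seconds: int

@dataclass
class StageSpec:
    name: str
    sla: Optional[SLAPolicy] = None

def sla_breached(stage: StageSpec, entered_at: float, now: float) -> bool:
    # A stage with no SLA policy can never breach.
    if stage.sla is None:
        return False
    return (now - entered_at) > stage.sla.breach_after_seconds

kyc = StageSpec("kyc_verified", sla=SLAPolicy(breach_after_seconds=7200))
entered = 1_000_000.0
within_window = sla_breached(kyc, entered, entered + 3600)  # 1h into a 2h window
past_window = sla_breached(kyc, entered, entered + 8000)    # past the 2h window
```

In the real engine a positive check is not a return value but a persisted SLA_BREACHED edge, which is what makes breaches queryable with Cypher after the fact.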
A JourneyAIAdvisor that reads the graph to advise
On every stage transition, JourneyAIAdvisor is called with the full journey definition, the current instance state, the complete transition history from the graph, and any few_shot_patterns extracted from past journeys. It returns a structured recommendation written back to the graph. The AI never sees raw data it shouldn't — the advisor only sees what the graph exposes.
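The advisor contract described above can be sketched as structured input from the graph in, structured recommendation out. The Recommendation shape and the escalation heuristic below are hypothetical illustrations, not the shipped JourneyAIAdvisor.

```python
from dataclasses import dataclass

@dataclass
class Recommendation:
    next_stage: str
    confidence: float
    rationale: str

def advise(current_stage: str, history: list, stage_order: list) -> Recommendation:
    # The advisor only sees graph-exposed data: the defined stage sequence
    # and the transition history (including any recorded SLA breaches).
    breaches = sum(1 for h in history if h.get("sla_breached"))
    nxt = stage_order[stage_order.index(current_stage) + 1]
    if breaches:
        return Recommendation(nxt, 0.5,
                              f"{breaches} SLA breach(es) in history; review before advancing")
    return Recommendation(nxt, 0.9, "clean history; advance")

stages = ["submitted", "kyc_verified", "credit_assessed", "approved"]
rec = advise("kyc_verified", [{"stage": "submitted", "sla_breached": False}], stages)
```

Because the recommendation is written back as an AI_ADVISED edge, the advice itself becomes part of the same auditable history it was derived from.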
Human-in-the-loop, built in
Stages marked requires_human=True create a HITLTask node. Humans claim, approve, reject, or escalate via REST endpoints secured by the same JWT RBAC layer. The decision is written to the graph. The audit trail is complete without any extra tooling.
Change Data Capture with WebSocket streaming
Every graph mutation — node write, edge write, journey advance, SLA breach — publishes a ChangeEvent to an EventBus. Downstream systems subscribe in real-time via WebSocket (/ws/changes). Events persist to a RocksDB column family for replay. No Kafka required for most workloads.
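The pub/sub core of that pipeline reduces to an in-process bus that fans each ChangeEvent out to subscribers and keeps a log for replay. This is a conceptual stand-in, not the shipped EventBus, which persists to a RocksDB column family and streams over /ws/changes.

```python
from dataclasses import dataclass

@dataclass
class ChangeEvent:
    kind: str      # e.g. "node_write", "edge_write", "journey_advance", "sla_breach"
    payload: dict

class EventBus:
    def __init__(self):
        self._subscribers = []
        self._log = []   # stands in for the persisted column family used for replay

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, event: ChangeEvent):
        self._log.append(event)          # persist first, then fan out
        for handler in self._subscribers:
            handler(event)

    def replay(self, handler):
        # Re-deliver every persisted event to a late subscriber.
        for event in self._log:
            handler(event)

bus = EventBus()
seen = []
bus.subscribe(lambda e: seen.append(e.kind))
bus.publish(ChangeEvent("journey_advance", {"to": "kyc_verified"}))
bus.publish(ChangeEvent("sla_breach", {"stage": "kyc_verified"}))
```

The persist-then-fan-out ordering is what makes replay possible: a consumer that connects late can catch up from the log before switching to live events.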
Envelope encryption with 5 KMS providers
Fields marked sensitive are encrypted at rest with AES-256-GCM. Key wrapping is handled by any of: local key file, HashiCorp Vault, AWS KMS, GCP Cloud KMS, or Azure Key Vault — configurable at startup, no code changes.
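Envelope encryption means each sensitive field is encrypted with its own fresh data-encryption key (DEK), and only the DEK, wrapped by the KMS-held key-encryption key (KEK), is stored next to the ciphertext. The sketch below shows that key hierarchy only: the toy XOR stands in for both AES-256-GCM and the real KMS wrap call, and is emphatically not secure.

```python
import os

def xor(data: bytes, key: bytes) -> bytes:
    # Toy cipher for illustration ONLY; real code uses AES-256-GCM.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def encrypt_field(plaintext: bytes, kek: bytes) -> dict:
    dek = os.urandom(32)                 # fresh data-encryption key per field
    return {
        "ciphertext": xor(plaintext, dek),
        "wrapped_dek": xor(dek, kek),    # in production: a KMS wrap call (Vault, AWS KMS, ...)
    }

def decrypt_field(record: dict, kek: bytes) -> bytes:
    dek = xor(record["wrapped_dek"], kek)   # in production: a KMS unwrap call
    return xor(record["ciphertext"], dek)

kek = os.urandom(32)
record = encrypt_field(b"ssn:123-45-6789", kek)
```

The payoff of the hierarchy is that rotating or revoking keys at the KMS only touches wrapped DEKs, never the field ciphertexts themselves, which is why the provider can be swapped at startup with no code changes.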
REST + GraphQL + MCP — all included
A FastAPI server (purple8-graph serve) exposes the full engine over REST. A Strawberry GraphQL layer is available via pip install "purple8-graph[graphql]". A first-party MCP server (pip install "purple8-graph[mcp]") exposes all 13 tools to Claude, Cursor, or any MCP-compatible agent.
How it compares
| Capability | Neo4j | Vector DB (Pinecone / Weaviate) | Purple8 Graph |
|---|---|---|---|
| Property graph + Cypher | ✅ | ❌ | ✅ |
| Vector search | ❌ plugin | ✅ | ✅ native |
| Hybrid vector + graph in one query | ❌ client-side join | ❌ | ✅ |
| Workflow / journey tracking | ❌ | ❌ | ✅ |
| SLA enforcement | ❌ | ❌ | ✅ |
| AI decision audit trail | ❌ | ❌ | ✅ |
| Human-in-the-loop | ❌ | ❌ | ✅ |
| Real-time CDC / event streaming | plugin | ❌ | ✅ |
| Envelope encryption (5 KMS providers) | Enterprise add-on | ❌ | ✅ |
| MCP server | ❌ | ❌ | ✅ |
| In-process, no server required | ❌ | ❌ | ✅ |
| pip install → full engine | ❌ | SDK only | ✅ |
What it can and cannot do compared to Neo4j / TigerGraph
Purple8 ships PageRank, Louvain community detection, Dijkstra shortest path, and betweenness centrality. It has a full Cypher engine, horizontal sharding (ShardedGraphEngine), federated queries across shards, and Raft replication. It can do fraud detection. It can do recommendation graphs. It can do supply-chain tracing at moderate scale.
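The four built-in algorithms are the classics. PageRank, for instance, is a short iteration; the pure-Python sketch below runs it on a toy fraud-style graph and is a conceptual illustration, not the Purple8 implementation that runs against the RocksDB-backed store.

```python
def pagerank(edges, nodes, damping=0.85, iters=50):
    # edges: list of (src, dst) pairs; nodes: iterable of node ids.
    nodes = list(nodes)
    out_degree = {n: 0 for n in nodes}
    incoming = {n: [] for n in nodes}
    for src, dst in edges:
        out_degree[src] += 1
        incoming[dst].append(src)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        # Each node keeps a baseline share plus damped contributions
        # from the ranks of its in-neighbours.
        rank = {
            n: (1 - damping) / len(nodes)
               + damping * sum(rank[m] / out_degree[m] for m in incoming[n])
            for n in nodes
        }
    return rank

# Toy fraud-style graph: three accounts all pay into the mule account "m".
ranks = pagerank([("a", "m"), ("b", "m"), ("c", "m"), ("m", "a")], "abcm")
```

On this graph the mule account ends up with the highest rank, which is exactly the signal fraud-detection pipelines use PageRank for.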
Where the ceiling actually is:
| Dimension | Purple8 | Neo4j | TigerGraph |
|---|---|---|---|
| Node/edge scale | Hundreds of millions | Billions+ | Billions+ |
| Deep traversal (depth 10+) | Good | Excellent (native graph format) | Excellent |
| Graph algorithm library | 4 built-in | 65+ (GDS) | 50+ |
| Bulk iterative analytics (GSQL-style) | No | Partial (GDS) | Yes |
| In-process, no server | ✅ | ❌ | ❌ |
| Vector search native | ✅ | Plugin | ❌ |
| AI workflow + audit trail | ✅ | ❌ | ❌ |
| pip install | ✅ | ❌ | ❌ |
The honest framing: if your workload is purely large-scale graph analytics — 50 billion nodes, depth-15 traversals, 50 graph algorithms, no AI, no vectors, no workflows — Neo4j or TigerGraph will outperform Purple8 at that specific thing.
But if you are building an AI system that also needs a graph — or a graph system that also needs AI — Purple8 is the only option that does not require you to run, sync, and pay for three separate services.
Next steps
- Quickstart (pip) — running in 5 minutes
- Docker Quickstart — one docker run
- Hybrid Search guide — the core query pattern
- Journey Engine guide — tracking real-world workflows in the graph
- Graph as Memory — how AI decisions accumulate into knowledge
- MCP Integration — expose everything to Claude or Cursor