What is Purple8 Graph?
The honest answer: it is a new category of thing.
It started as a knowledge graph. It grew a vector index because every production AI system eventually needs one. Then it grew a workflow engine because every AI workflow eventually needs to track state, enforce SLAs, involve a human, and prove — with an immutable audit trail — exactly what the AI decided, when, and why.
The result is not a knowledge graph. It is not a vector database. It is not a workflow engine. It is all three, sharing the same storage layer, the same query engine, and the same process — and it is the sharing that matters.
What it actually contains
A property graph engine with Cypher
Nodes, edges, labels, properties — standard property graph model. A custom Cypher implementation with 161 passing test cases covering MATCH, WHERE, WITH, UNWIND, MERGE, CREATE, DELETE, aggregations, path patterns, and subqueries.
Backed by RocksDB with a write-ahead log. ACID transactions. No JVM. No separate server. pip install and you have the full engine in-process.
HNSW + DiskANN vector search, built into the query planner
Not a sidecar. Not a separate index you query separately and join in application code. The vector index lives inside the same RocksDB instance. The Cypher engine calls into it mid-query:
```cypher
CALL db.vector.search('Document', $queryVec, 10) YIELD node, score
WHERE node.region = 'APAC' AND score > 0.85
MATCH (node)-[:AUTHORED_BY]->(author:Person)
RETURN node.title, author.name, score
ORDER BY score DESC
```

One query. One round-trip. 3.5 ms median at 100k documents.
In-memory HNSW for speed. On-disk DiskANN for datasets that don't fit in RAM (pip install "purple8-graph[diskann]"). BM25 full-text included for hybrid retrieval.
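Hybrid retrieval fuses a vector-similarity score with a lexical score before ranking. The pure-Python sketch below illustrates that fusion on toy data; the function names, the term-overlap stand-in for BM25, and the `alpha` weighting are illustrative assumptions, not the Purple8 API.

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def keyword_score(query_terms, text):
    # Crude lexical score: fraction of query terms present (a stand-in for BM25).
    words = set(text.lower().split())
    return sum(1 for t in query_terms if t in words) / len(query_terms)

def hybrid_rank(query_vec, query_terms, docs, alpha=0.7):
    # Blend vector and lexical scores; alpha weights the vector side.
    scored = []
    for doc in docs:
        s = (alpha * cosine(query_vec, doc["vec"])
             + (1 - alpha) * keyword_score(query_terms, doc["text"]))
        scored.append((s, doc["id"]))
    return [doc_id for s, doc_id in sorted(scored, reverse=True)]

docs = [
    {"id": "a", "vec": [1.0, 0.0], "text": "apac quarterly report"},
    {"id": "b", "vec": [0.0, 1.0], "text": "emea sales summary"},
]
ranking = hybrid_rank([1.0, 0.1], ["apac", "report"], docs)
```

The point of doing this inside the engine rather than in application code is that the lexical and vector indexes share one storage layer, so the fusion happens before results cross a process boundary.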
A Journey Engine that uses the graph as its state store
The part that makes Purple8 genuinely different from "Neo4j + Pinecone."
The Journey Engine tracks any real-world entity — a customer, a loan application, a support ticket, a legal matter — as it moves through a defined sequence of stages across multiple systems. Each stage transition is written as a graph edge (ADVANCED_TO). SLA breaches are written as graph edges (SLA_BREACHED). AI decisions are written as graph edges (AI_ADVISED).
The entire operational history of every entity is the graph. You query it with Cypher. You traverse it. You vector-search against it. There is no separate workflow database to sync with.
```python
je = JourneyEngine(engine)
je.define_journey("loan_application", stages=[
    StageSpec("submitted"),
    StageSpec("kyc_verified", sla=SLAPolicy(breach_after_seconds=7200)),
    StageSpec("credit_assessed", sla=SLAPolicy(breach_after_seconds=14400)),
    StageSpec("approved"),
])

instance = je.start("loan_application", entity_id="customer_123")
je.advance(instance.instance_id, to_stage="kyc_verified", actor="SystemB")
```

Every advance() call writes a graph edge, fires the AI advisor, checks SLAs, and publishes a CDC event — all in the same transaction.
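The SLA check that runs on each advance can be sketched conceptually: a stage records when it was entered, and a breach is due once the configured window elapses. The StageSpec and SLAPolicy names follow the snippet above; the check logic itself is an assumption about how such an engine works, not the shipped implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SLAPolicy:
    breach_after_seconds: int

@dataclass
class StageSpec:
    name: str
    sla: Optional[SLAPolicy] = None

def sla_breached(stage: StageSpec, entered_at: float, now: float) -> bool:
    # A stage with no SLA policy can never breach.
    if stage.sla is None:
        return False
    return (now - entered_at) > stage.sla.breach_after_seconds

kyc = StageSpec("kyc_verified", sla=SLAPolicy(breach_after_seconds=7200))
entered = 1_000_000.0
within_window = sla_breached(kyc, entered, entered + 3600)  # 1h into a 2h window
past_window = sla_breached(kyc, entered, entered + 8000)    # past the 2h window
```

In the real engine a positive check is not a return value but a persisted SLA_BREACHED edge, which is what makes breaches queryable with Cypher after the fact.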
A JourneyAIAdvisor that reads the graph to advise
On every stage transition, JourneyAIAdvisor is called with the full journey definition, the current instance state, the complete transition history from the graph, and any few_shot_patterns extracted from past journeys. It returns a structured recommendation written back to the graph. The AI never sees raw data it shouldn't — the advisor only sees what the graph exposes.
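The advisor contract described above can be sketched as structured input from the graph in, structured recommendation out. The Recommendation shape and the escalation heuristic below are hypothetical illustrations, not the shipped JourneyAIAdvisor.

```python
from dataclasses import dataclass

@dataclass
class Recommendation:
    next_stage: str
    confidence: float
    rationale: str

def advise(current_stage: str, history: list, stage_order: list) -> Recommendation:
    # The advisor only sees graph-exposed data: the defined stage sequence
    # and the transition history (including any recorded SLA breaches).
    breaches = sum(1 for h in history if h.get("sla_breached"))
    nxt = stage_order[stage_order.index(current_stage) + 1]
    if breaches:
        return Recommendation(nxt, 0.5,
                              f"{breaches} SLA breach(es) in history; review before advancing")
    return Recommendation(nxt, 0.9, "clean history; advance")

stages = ["submitted", "kyc_verified", "credit_assessed", "approved"]
rec = advise("kyc_verified", [{"stage": "submitted", "sla_breached": False}], stages)
```

Because the recommendation is written back as an AI_ADVISED edge, the advice itself becomes part of the same auditable history it was derived from.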
Human-in-the-loop, built in
Stages marked requires_human=True create a HITLTask node. Humans claim, approve, reject, or escalate via REST endpoints secured by the same JWT RBAC layer. The decision is written to the graph. The audit trail is complete without any extra tooling.
Change Data Capture with WebSocket streaming
Every graph mutation — node write, edge write, journey advance, SLA breach — publishes a ChangeEvent to an EventBus. Downstream systems subscribe in real-time via WebSocket (/ws/changes). Events persist to a RocksDB column family for replay. No Kafka required for most workloads.
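The pub/sub core of that pipeline reduces to an in-process bus that fans each ChangeEvent out to subscribers and keeps a log for replay. This is a conceptual stand-in, not the shipped EventBus, which persists to a RocksDB column family and streams over /ws/changes.

```python
from dataclasses import dataclass

@dataclass
class ChangeEvent:
    kind: str      # e.g. "node_write", "edge_write", "journey_advance", "sla_breach"
    payload: dict

class EventBus:
    def __init__(self):
        self._subscribers = []
        self._log = []   # stands in for the persisted column family used for replay

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, event: ChangeEvent):
        self._log.append(event)          # persist first, then fan out
        for handler in self._subscribers:
            handler(event)

    def replay(self, handler):
        # Re-deliver every persisted event to a late subscriber.
        for event in self._log:
            handler(event)

bus = EventBus()
seen = []
bus.subscribe(lambda e: seen.append(e.kind))
bus.publish(ChangeEvent("journey_advance", {"to": "kyc_verified"}))
bus.publish(ChangeEvent("sla_breach", {"stage": "kyc_verified"}))
```

The persist-then-fan-out ordering is what makes replay possible: a consumer that connects late can catch up from the log before switching to live events.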
Envelope encryption with 5 KMS providers
Fields marked sensitive are encrypted at rest with AES-256-GCM. Key wrapping is handled by any of: local key file, HashiCorp Vault, AWS KMS, GCP Cloud KMS, or Azure Key Vault — configurable at startup, no code changes.
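Envelope encryption means each sensitive field is encrypted with its own fresh data-encryption key (DEK), and only the DEK, wrapped by the KMS-held key-encryption key (KEK), is stored next to the ciphertext. The sketch below shows that key hierarchy only: the toy XOR stands in for both AES-256-GCM and the real KMS wrap call, and is emphatically not secure.

```python
import os

def xor(data: bytes, key: bytes) -> bytes:
    # Toy cipher for illustration ONLY; real code uses AES-256-GCM.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def encrypt_field(plaintext: bytes, kek: bytes) -> dict:
    dek = os.urandom(32)                 # fresh data-encryption key per field
    return {
        "ciphertext": xor(plaintext, dek),
        "wrapped_dek": xor(dek, kek),    # in production: a KMS wrap call (Vault, AWS KMS, ...)
    }

def decrypt_field(record: dict, kek: bytes) -> bytes:
    dek = xor(record["wrapped_dek"], kek)   # in production: a KMS unwrap call
    return xor(record["ciphertext"], dek)

kek = os.urandom(32)
record = encrypt_field(b"ssn:123-45-6789", kek)
```

The payoff of the hierarchy is that rotating or revoking keys at the KMS only touches wrapped DEKs, never the field ciphertexts themselves, which is why the provider can be swapped at startup with no code changes.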
REST + GraphQL + MCP — all included
A FastAPI server (purple8-graph serve) exposes the full engine over REST. A Strawberry GraphQL layer is available via pip install "purple8-graph[graphql]". A first-party MCP server (pip install "purple8-graph[mcp]") exposes all 13 tools to Claude, Cursor, or any MCP-compatible agent.
How it compares
| Capability | Neo4j | Vector DB (Pinecone / Weaviate) | Purple8 Graph |
|---|---|---|---|
| Property graph + Cypher | ✅ | ❌ | ✅ |
| Vector search | ❌ plugin | ✅ | ✅ native |
| Hybrid vector + graph in one query | ❌ client-side join | ❌ | ✅ |
| Workflow / journey tracking | ❌ | ❌ | ✅ |
| SLA enforcement | ❌ | ❌ | ✅ |
| AI decision audit trail | ❌ | ❌ | ✅ |
| Human-in-the-loop | ❌ | ❌ | ✅ |
| Real-time CDC / event streaming | plugin | ❌ | ✅ |
| Envelope encryption (5 KMS providers) | Enterprise add-on | ❌ | ✅ |
| MCP server | ❌ | ❌ | ✅ |
| In-process, no server required | ❌ | ❌ | ✅ |
| pip install → full engine | ❌ | SDK only | ✅ |
What it can and cannot do compared to Neo4j / TigerGraph
Purple8 ships PageRank, Louvain community detection, Dijkstra shortest path, and betweenness centrality. It has a full Cypher engine, horizontal sharding (ShardedGraphEngine), federated queries across shards, and Raft replication. It can do fraud detection. It can do recommendation graphs. It can do supply-chain tracing at moderate scale.
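The four built-in algorithms are the classics. PageRank, for instance, is a short iteration; the pure-Python sketch below runs it on a toy fraud-style graph and is a conceptual illustration, not the Purple8 implementation that runs against the RocksDB-backed store.

```python
def pagerank(edges, nodes, damping=0.85, iters=50):
    # edges: list of (src, dst) pairs; nodes: iterable of node ids.
    nodes = list(nodes)
    out_degree = {n: 0 for n in nodes}
    incoming = {n: [] for n in nodes}
    for src, dst in edges:
        out_degree[src] += 1
        incoming[dst].append(src)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        # Each node keeps a baseline share plus damped contributions
        # from the ranks of its in-neighbours.
        rank = {
            n: (1 - damping) / len(nodes)
               + damping * sum(rank[m] / out_degree[m] for m in incoming[n])
            for n in nodes
        }
    return rank

# Toy fraud-style graph: three accounts all pay into the mule account "m".
ranks = pagerank([("a", "m"), ("b", "m"), ("c", "m"), ("m", "a")], "abcm")
```

On this graph the mule account ends up with the highest rank, which is exactly the signal fraud-detection pipelines use PageRank for.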
Where the ceiling actually is:
| Dimension | Purple8 | Neo4j | TigerGraph |
|---|---|---|---|
| Node/edge scale | Hundreds of millions | Billions+ | Billions+ |
| Deep traversal (depth 10+) | Good | Excellent (native graph format) | Excellent |
| Graph algorithm library | 4 built-in | 65+ (GDS) | 50+ |
| Bulk iterative analytics (GSQL-style) | No | Partial (GDS) | Yes |
| In-process, no server | ✅ | ❌ | ❌ |
| Vector search native | ✅ | Plugin | ❌ |
| AI workflow + audit trail | ✅ | ❌ | ❌ |
| pip install | ✅ | ❌ | ❌ |
The honest framing: if your workload is purely large-scale graph analytics — 50 billion nodes, depth-15 traversals, 50 graph algorithms, no AI, no vectors, no workflows — Neo4j or TigerGraph will outperform Purple8 at that specific thing.
But if you are building an AI system that also needs a graph — or a graph system that also needs AI — Purple8 is the only option that does not require you to run, sync, and pay for three separate services.
Next steps
- Quickstart (pip) — running in 5 minutes
- Docker Quickstart — one docker run
- Hybrid Search guide — the core query pattern
- Journey Engine guide — tracking real-world workflows in the graph
- Graph as Memory — how AI decisions accumulate into knowledge
- MCP Integration — expose everything to Claude or Cursor