
Your First Full Query

A worked example using real sentence embeddings and a realistic knowledge graph.

What we'll build

We'll build a small research knowledge graph of documents, authors, and topics, then run an entity-disambiguation query: one of the hardest cases for RAG, and the one where Purple8's graph context matters most.

The problem: you search for "self-attention transformer" and two relevant documents come back, both by an author named "Alice Chen". Which Alice Chen? A graph database can tell them apart by their neighbourhood: the documents, topics, and co-authors each one is connected to.
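To make "tell them apart by their neighbourhood" concrete, here is a minimal, library-free sketch. It models the example graph below as plain Python dictionaries (a hypothetical stand-in, not the Purple8 API) and summarises each same-named person by the documents, topics, and co-authors they are connected to.

```python
from collections import defaultdict

# Hypothetical in-memory copy of the example graph (not the Purple8 API).
people = {
    "alice_ml": "Alice Chen",
    "alice_bio": "Alice Chen",
    "bob": "Bob Smith",
}
authored_by = [  # (document, person) pairs, i.e. AUTHORED_BY edges
    ("d1", "alice_ml"), ("d2", "alice_ml"), ("d2", "bob"),
    ("d3", "alice_bio"), ("d4", "alice_bio"),
]
belongs_to = {"d1": "Machine Learning", "d2": "Machine Learning",
              "d3": "Biology", "d4": "Biology"}  # BELONGS_TO edges


def neighbourhood(person_id: str) -> dict:
    """Summarise a person by their documents, topics, and co-authors."""
    docs = sorted(d for d, p in authored_by if p == person_id)
    topics = sorted({belongs_to[d] for d in docs})
    coauthors = sorted({p for d, p in authored_by if d in docs and p != person_id})
    return {"docs": docs, "topics": topics, "coauthors": coauthors}


# Group node IDs by display name, then describe each candidate with that name.
by_name = defaultdict(list)
for pid, name in people.items():
    by_name[name].append(pid)

for pid in by_name["Alice Chen"]:
    print(pid, neighbourhood(pid))
```

The two candidates share a display name but have disjoint documents, topics, and co-authors, which is exactly the signal a graph query can exploit.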

Setup

bash
pip install purple8-graph sentence-transformers

Build the graph

python
from purple8_graph import GraphEngine
from sentence_transformers import SentenceTransformer

engine = GraphEngine("./research_graph")
model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim

# ── Documents ──────────────────────────────────────────────────────────────
docs = [
    ("d1", "Attention mechanisms in neural machine translation", "ML"),
    ("d2", "Self-attention for document classification",         "ML"),
    ("d3", "Protein folding with graph neural networks",         "Biology"),
    ("d4", "AlphaFold and structure prediction",                 "Biology"),
]

for doc_id, title, topic in docs:
    embedding = model.encode(title).tolist()
    engine.add_node(doc_id, labels=["Document"], properties={
        "title": title,
        "topic": topic,
        "embedding": embedding,
    })

# ── Authors — two people named "Alice Chen" ────────────────────────────────
engine.add_node("alice_ml",  labels=["Person"], properties={"name": "Alice Chen", "field": "ML"})
engine.add_node("alice_bio", labels=["Person"], properties={"name": "Alice Chen", "field": "Biology"})
engine.add_node("bob",       labels=["Person"], properties={"name": "Bob Smith",  "field": "ML"})

# ── Topics ─────────────────────────────────────────────────────────────────
engine.add_node("ml_topic",  labels=["Topic"], properties={"name": "Machine Learning"})
engine.add_node("bio_topic", labels=["Topic"], properties={"name": "Biology"})

# ── Edges ──────────────────────────────────────────────────────────────────
engine.add_edge("d1", "alice_ml",  "AUTHORED_BY")
engine.add_edge("d2", "alice_ml",  "AUTHORED_BY")
engine.add_edge("d2", "bob",       "AUTHORED_BY")
engine.add_edge("d3", "alice_bio", "AUTHORED_BY")
engine.add_edge("d4", "alice_bio", "AUTHORED_BY")

engine.add_edge("d1", "ml_topic",  "BELONGS_TO")
engine.add_edge("d2", "ml_topic",  "BELONGS_TO")
engine.add_edge("d3", "bio_topic", "BELONGS_TO")
engine.add_edge("d4", "bio_topic", "BELONGS_TO")

# ── Vector index ───────────────────────────────────────────────────────────
engine.create_vector_index("Document", "embedding", dim=384)
print("Graph built.")
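The vector index is what lets the next query find documents by meaning rather than by keyword. As a rough mental model only, a vector search returns the k stored embeddings most similar to the query vector. The sketch below does that by brute force with cosine similarity; it assumes cosine scoring, which this page does not confirm for Purple8, and it uses tiny hand-made 3-dimensional vectors (hypothetical) instead of the real 384-dimensional embeddings from `model.encode`.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def vector_search(index: dict[str, list[float]], query: list[float], k: int) -> list[tuple[str, float]]:
    """Brute-force top-k: score every stored vector, keep the k best."""
    scored = [(node_id, cosine(vec, query)) for node_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-d "embeddings" (hypothetical; real ones come from model.encode).
index = {
    "d1": [0.9, 0.1, 0.0],
    "d2": [1.0, 0.0, 0.1],
    "d3": [0.0, 1.0, 0.2],
    "d4": [0.1, 0.9, 0.3],
}
query = [1.0, 0.05, 0.0]

for node_id, score in vector_search(index, query, k=2):
    print(f"{node_id}: {score:.3f}")
```

A linear scan like this is fine for four documents; a dedicated index exists so that search stays fast as the number of stored vectors grows.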

Entity disambiguation query

The key query: find documents about "attention" — then use graph context to identify which Alice Chen authored each match.

python
query = "self-attention transformer"
query_vec = model.encode(query).tolist()

results = engine.execute_cypher("""
    CALL db.vector.search('Document', $vec, 5) YIELD node, score
    WHERE score > 0.5
    MATCH (node)-[:AUTHORED_BY]->(author:Person {name: 'Alice Chen'})
    MATCH (node)-[:BELONGS_TO]->(topic:Topic)
    RETURN
        node.title      AS title,
        author.name     AS author,
        author.field    AS field,
        topic.name      AS topic,
        score
    ORDER BY score DESC
""", {"vec": query_vec})

for r in results:
    print(f"  [{r['score']:.3f}]  {r['title']}")
    print(f"           by {r['author']} ({r['field']}) — {r['topic']}")

Output:

  [0.921]  Self-attention for document classification
           by Alice Chen (ML) — Machine Learning
  [0.884]  Attention mechanisms in neural machine translation
           by Alice Chen (ML) — Machine Learning

Both returned documents link to alice_ml, not alice_bio. The graph did the disambiguation inside the single query: no second query, no client-side logic, no post-processing.
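To see what that single query does, here is the same pipeline spelled out in plain Python: take the vector hits, drop those at or below the 0.5 threshold, expand each remaining hit along its AUTHORED_BY and BELONGS_TO edges, then sort by score. This is an illustrative model of the Cypher semantics, not how Purple8 executes the query. The graph is a hypothetical in-memory copy of the example data, the hit scores are hypothetical stand-ins, and `author_name` is an optional filter added for illustration.

```python
# Hypothetical in-memory copy of the example graph (not the Purple8 API).
titles = {
    "d1": "Attention mechanisms in neural machine translation",
    "d2": "Self-attention for document classification",
    "d3": "Protein folding with graph neural networks",
    "d4": "AlphaFold and structure prediction",
}
people = {
    "alice_ml": ("Alice Chen", "ML"),
    "alice_bio": ("Alice Chen", "Biology"),
    "bob": ("Bob Smith", "ML"),
}
authored_by = [("d1", "alice_ml"), ("d2", "alice_ml"), ("d2", "bob"),
               ("d3", "alice_bio"), ("d4", "alice_bio")]
belongs_to = {"d1": "Machine Learning", "d2": "Machine Learning",
              "d3": "Biology", "d4": "Biology"}


def hybrid_query(hits, threshold=0.5, author_name=None):
    """Filter vector hits by score, then expand each hit along graph edges."""
    rows = []
    for doc, score in hits:
        if score <= threshold:  # WHERE score > 0.5
            continue
        for d, person in authored_by:  # MATCH (node)-[:AUTHORED_BY]->(author)
            if d != doc:
                continue
            name, field = people[person]
            if author_name is not None and name != author_name:
                continue
            rows.append({
                "title": titles[doc], "author": name, "author_id": person,
                "field": field,
                "topic": belongs_to[doc],  # MATCH (node)-[:BELONGS_TO]->(topic); one topic per doc here
                "score": score,
            })
    return sorted(rows, key=lambda r: r["score"], reverse=True)  # ORDER BY score DESC


# Hypothetical vector hits: (document ID, similarity score).
hits = [("d2", 0.92), ("d1", 0.88), ("d3", 0.21), ("d4", 0.18)]

for row in hybrid_query(hits):
    print(f"[{row['score']:.2f}] {row['title']} by {row['author']} ({row['author_id']})")
```

With `author_name=None`, the co-authored d2 yields two rows, one per author, because each MATCH produces a row for every matching neighbour; passing `author_name="Alice Chen"` leaves only the two alice_ml rows.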

What a vector-only system returns

A pure vector search (no graph) returns the same two documents, each labelled only "Alice Chen", with no way to know whether they're by the same person or which Alice Chen you meant. Disambiguation works here because each author is a distinct node, and that graph context is traversed in the same query.
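The difference comes down to what you group or join on. A minimal sketch, assuming a hypothetical vector-only store that keeps the author as a plain name string in each document's metadata: if results span both people, say for a broader query that matches all four documents, grouping by name merges them into one "Alice Chen" bucket, while grouping by graph node ID keeps them apart. Only Alice Chen authorship is modelled; d2's co-author Bob is omitted for brevity.

```python
from collections import defaultdict

# Vector-only store: author kept as a name string in metadata (hypothetical layout).
flat_metadata = {
    "d1": {"author": "Alice Chen"},
    "d2": {"author": "Alice Chen"},
    "d3": {"author": "Alice Chen"},
    "d4": {"author": "Alice Chen"},
}
# Graph store: each document's Alice Chen is a distinct Person node ID (AUTHORED_BY).
graph_author = {"d1": "alice_ml", "d2": "alice_ml", "d3": "alice_bio", "d4": "alice_bio"}


def group(doc_ids, key_of):
    """Group document IDs by whatever key identifies their author."""
    groups = defaultdict(list)
    for doc in doc_ids:
        groups[key_of(doc)].append(doc)
    return dict(groups)


all_docs = ["d1", "d2", "d3", "d4"]
by_name = group(all_docs, lambda d: flat_metadata[d]["author"])  # one merged bucket
by_node = group(all_docs, lambda d: graph_author[d])             # two distinct people
print(by_name)
print(by_node)
```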

Next steps

Purple8 Graph is proprietary software. All rights reserved.