Schema & Data Model

Purple8 Graph operates on a data-first model — you can write data immediately without defining a schema. The schema is optional, and when you do define one, it can be discovered from existing data rather than declared upfront.

The spectrum

Purple8 supports three points on a spectrum that runs from fully schemaless to strictly enforced:

schemaless          warnings only         strictly enforced
    │                    │                      │
GraphEngine()       SchemaValidator(        SchemaValidator(
                      strict_mode=False)      strict_mode=True)

Pole 1 — Fully schemaless (default)

python
from purple8_graph import GraphEngine

engine = GraphEngine("./data")

# Any label, any property, any shape — accepted without restriction
engine.add_node("n1", labels=["Person"],   properties={"name": "Alice", "age": 30})
engine.add_node("n2", labels=["Document"], properties={"title": "Q4 Report", "pages": 42})
engine.add_node("n3", labels=["Whatever"], properties={"foo": "bar", "nested": {"a": 1}})

No schema file. No DDL. No migration. Just write.

Pole 2 — Warnings only

python
from purple8_graph.validation import (
    GraphSchema,
    NodeSchema,
    PropertySchema,
    PropertyType,
    SchemaValidator,
    ValidatingGraphEngine,
)

schema = GraphSchema(name="my_schema")
schema.add_node_schema(NodeSchema(
    label="Person",
    properties=[
        PropertySchema(name="name", type=PropertyType.STRING, required=True),
        PropertySchema(name="age",  type=PropertyType.INTEGER),
    ],
))

validator = SchemaValidator(strict_mode=False, allow_extra_properties=True)
validated_engine = ValidatingGraphEngine(engine, validator, schema)

# Validates but doesn't reject — logs warnings for unknown labels or missing required fields
validated_engine.add_node("n4", labels=["UnknownLabel"], properties={"x": 1})

Pole 3 — Strict enforcement

python
# Reuses `engine` and `schema` from the previous examples
validator_strict = SchemaValidator(strict_mode=True, allow_extra_properties=False)
validated_engine = ValidatingGraphEngine(engine, validator_strict, schema)

# This raises ValidationError — "UnknownLabel" is not in the schema
validated_engine.add_node("n5", labels=["UnknownLabel"], properties={"x": 1})

# This raises ValidationError — "age" is the wrong type
validated_engine.add_node("n6", labels=["Person"], properties={"name": "Bob", "age": "thirty"})
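
The difference between the warnings-only and strict poles can be illustrated without Purple8 at all. The sketch below is a minimal, standalone approximation: `SCHEMA`, `REQUIRED`, `find_problems`, `check_node`, and this `ValidationError` class are hypothetical names invented for illustration, not part of the Purple8 API, and the real validator may report problems differently.

```python
import warnings

# Hypothetical stand-in for a declared schema: label -> {property: expected type}
SCHEMA = {"Person": {"name": str, "age": int}}
REQUIRED = {"Person": {"name"}}


class ValidationError(Exception):
    """Raised in strict mode when a node violates the schema (illustrative only)."""


def find_problems(labels, properties):
    """Return a list of human-readable schema violations for one node."""
    problems = []
    for label in labels:
        if label not in SCHEMA:
            problems.append(f"unknown label {label!r}")
            continue
        # Required properties that the node does not carry
        for name in sorted(REQUIRED.get(label, set()) - set(properties)):
            problems.append(f"{label}: missing required property {name!r}")
        # Declared properties whose value has the wrong Python type
        for name, value in properties.items():
            expected = SCHEMA[label].get(name)
            if expected is not None and not isinstance(value, expected):
                problems.append(
                    f"{label}.{name}: expected {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
    return problems


def check_node(labels, properties, strict_mode=False):
    """Warn about violations, or raise ValidationError when strict_mode is True."""
    problems = find_problems(labels, properties)
    if problems and strict_mode:
        raise ValidationError("; ".join(problems))
    for problem in problems:
        warnings.warn(problem)
    return problems
```

With `strict_mode=False` the write would proceed and the caller only sees warnings. With `strict_mode=True` the same violations stop the write before it reaches the store.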

Discovering the schema from existing data

The most important feature of Purple8's schema model: schema can be an output of discovery, not an input.

python
from purple8_graph import GraphEngine
from purple8_graph.validation import (
    SchemaValidator,
    ValidatingGraphEngine,
    create_schema_from_graph,
)

# Build a knowledge graph from 10,000 documents — no schema upfront
engine = GraphEngine("./data")

# ... add thousands of nodes and edges from LLM extraction ...

# NOW infer the schema from what was actually written
schema = create_schema_from_graph(engine)

print(schema.node_schemas)
# → [Person, Organization, Location, Event, Document, ...]

print(schema.edge_schemas)
# → [WORKS_FOR, LOCATED_IN, ATTENDED, AUTHORED_BY, ...]

# Now enforce it going forward
validator = SchemaValidator(strict_mode=True)
validated_engine = ValidatingGraphEngine(engine, validator, schema)

create_schema_from_graph() scans all nodes and edges, infers property types from observed values, and returns a GraphSchema you can inspect, modify, and enforce.
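
To make the idea of inferring types from observed values concrete, here is a minimal standalone sketch in plain Python. It is not Purple8's implementation: `infer_node_schema`, the `(labels, properties)` input shape, and the rules for "mixed" types and "required" properties are all assumptions made for illustration.

```python
from collections import defaultdict


def infer_node_schema(nodes):
    """Infer {label: {property: {"type", "required"}}} from (labels, properties) pairs."""
    type_names = defaultdict(lambda: defaultdict(set))  # label -> prop -> {type names}
    seen_on = defaultdict(lambda: defaultdict(int))     # label -> prop -> node count
    label_counts = defaultdict(int)                     # label -> node count

    for labels, properties in nodes:
        for label in labels:
            label_counts[label] += 1
            for name, value in properties.items():
                type_names[label][name].add(type(value).__name__)
                seen_on[label][name] += 1

    schema = {}
    for label, total in label_counts.items():
        schema[label] = {
            name: {
                # A property observed with several types is reported as "mixed"
                "type": next(iter(types)) if len(types) == 1 else "mixed",
                # Assumption: a property is "required" if every node with this label has it
                "required": seen_on[label][name] == total,
            }
            for name, types in type_names[label].items()
        }
    return schema


nodes = [
    (["Person"], {"name": "Alice", "age": 30}),
    (["Person"], {"name": "Bob"}),
    (["Document"], {"title": "Q4 Report", "pages": 42}),
]
inferred = infer_node_schema(nodes)
```

In this sample, `age` appears on only one of the two `Person` nodes, so the sketch marks it optional while `name` comes out as required. An inferred schema like this is a starting point to review and adjust before enforcing it.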

LLM-inferred schema

For teams who want to define a schema from sample documents before any data exists:

python
from purple8_graph.genai import SchemaDetector, OpenAIProvider

provider = OpenAIProvider(api_key="...")
detector = SchemaDetector(provider)

# Feed sample documents — LLM infers entities and relationships
# (`sample_documents` is a sequence of documents you have already loaded)
schema = detector.detect_schema(sample_documents[:50])

# Returns GraphSchema with NodeSchema + EdgeSchema inferred by the LLM
print(schema.node_schemas)   # → [Person, Organization, Document, ...]

The schema is an output of AI inference — not a prerequisite.

Property types

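| PropertyType | Python equivalent | Example |
| --- | --- | --- |
| STRING | str | "Alice" |
| INTEGER | int | 42 |
| FLOAT | float | 3.14 |
| BOOLEAN | bool | True |
| DATETIME | datetime | datetime(2026, 3, 25) |
| LIST | list | [1, 2, 3] |
| DICT | dict | {"key": "value"} |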
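
One Python detail matters when mapping values onto these types: `bool` is a subclass of `int`, so a naive `isinstance(value, int)` check classifies `True` as an integer. The sketch below shows one way to classify values against the type names in the table. `property_type_name` is a hypothetical helper, not a Purple8 function, and how Purple8 itself treats booleans passed to an INTEGER property is not stated here.

```python
from datetime import datetime

# Order matters: bool is a subclass of int, so BOOLEAN must be checked before INTEGER.
_TYPE_ORDER = [
    (bool, "BOOLEAN"),
    (int, "INTEGER"),
    (float, "FLOAT"),
    (str, "STRING"),
    (datetime, "DATETIME"),
    (list, "LIST"),
    (dict, "DICT"),
]


def property_type_name(value):
    """Return the matching type name from the table, or None if nothing matches."""
    for python_type, name in _TYPE_ORDER:
        if isinstance(value, python_type):
            return name
    return None
```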

Comparison with schema-first systems

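| System | Schema model | First write requires schema? | Retroactive schema inference? |
| --- | --- | --- | --- |
| Purple8 | Data-first, optional strict mode | ❌ No | ✅ Yes, via create_schema_from_graph() |
| Neo4j | Schema-optional (explicit constraints) | ❌ No | ❌ No |
| Kùzu | Schema-first (node and relationship tables must be created first) | ✅ Yes | ❌ No |
| Spanner Graph | Schema-first (DDL required) | ✅ Yes | ❌ No |
| TigerGraph | Schema-first (DDL required) | ✅ Yes | ❌ No |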

Why this matters for AI workloads

When you build a knowledge graph from LLM-extracted entities and relationships, the schema is emergent — it's a property of your data that you discover, not something you can define in advance. Spanner Graph and TigerGraph require DDL before the first byte of data. Purple8 lets you write first, understand your data, then optionally enforce a schema.

Purple8 Graph is proprietary software. All rights reserved.