Schema & Data Model
Purple8 Graph operates on a data-first model — you can write data immediately without defining a schema. The schema is optional, and when you do define one, it can be discovered from existing data rather than declared upfront.
The spectrum
Purple8 operates across a full spectrum from completely schemaless to strictly enforced:
schemaless            warnings only             strictly enforced
    │                       │                           │
GraphEngine()       SchemaValidator(            SchemaValidator(
                      strict_mode=False)          strict_mode=True)
Pole 1 — Fully schemaless (default)
from purple8_graph import GraphEngine
engine = GraphEngine("./data")
# Any label, any property, any shape — accepted without restriction
engine.add_node("n1", labels=["Person"], properties={"name": "Alice", "age": 30})
engine.add_node("n2", labels=["Document"], properties={"title": "Q4 Report", "pages": 42})
engine.add_node("n3", labels=["Whatever"], properties={"foo": "bar", "nested": {"a": 1}})
No schema file. No DDL. No migration. Just write.
Pole 2 — Warnings only
from purple8_graph.validation import (
    GraphSchema,
    NodeSchema,
    PropertySchema,
    PropertyType,
    SchemaValidator,
    ValidatingGraphEngine,
)
schema = GraphSchema(name="my_schema")
schema.add_node_schema(NodeSchema(
label="Person",
properties=[
PropertySchema(name="name", type=PropertyType.STRING, required=True),
PropertySchema(name="age", type=PropertyType.INTEGER),
],
))
validator = SchemaValidator(strict_mode=False, allow_extra_properties=True)
validated_engine = ValidatingGraphEngine(engine, validator, schema)
# Validates but doesn't reject — logs warnings for unknown labels or missing required fields
validated_engine.add_node("n4", labels=["UnknownLabel"], properties={"x": 1})
Pole 3 — Strict enforcement
validator_strict = SchemaValidator(strict_mode=True, allow_extra_properties=False)
validated_engine = ValidatingGraphEngine(engine, validator_strict, schema)
# This raises ValidationError — "UnknownLabel" is not in the schema
validated_engine.add_node("n5", labels=["UnknownLabel"], properties={"x": 1})
# This raises ValidationError — "age" is the wrong type
validated_engine.add_node("n6", labels=["Person"], properties={"name": "Bob", "age": "thirty"})
Discovering the schema from existing data
The most important feature of Purple8's schema model: schema can be an output of discovery, not an input.
from purple8_graph import GraphEngine
from purple8_graph.validation import SchemaValidator, ValidatingGraphEngine, create_schema_from_graph
# Build a knowledge graph from 10,000 documents — no schema upfront
engine = GraphEngine("./data")
# ... add thousands of nodes and edges from LLM extraction ...
# NOW infer the schema from what was actually written
schema = create_schema_from_graph(engine)
print(schema.node_schemas)
# → [Person, Organization, Location, Event, Document, ...]
print(schema.edge_schemas)
# → [WORKS_FOR, LOCATED_IN, ATTENDED, AUTHORED_BY, ...]
# Now enforce it going forward
validator = SchemaValidator(strict_mode=True)
validated_engine = ValidatingGraphEngine(engine, validator, schema)
create_schema_from_graph() scans all nodes and edges, infers property types from observed values, and returns a GraphSchema you can inspect, modify, and enforce.
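To make "infers property types from observed values" concrete, here is a minimal, self-contained sketch of the idea. It is not Purple8's implementation: the helper names (`type_name`, `infer_node_schemas`), the `(labels, properties)` input shape, and the rule that a property counts as required only when every node with that label carries it are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical mapping from Python types to type names.
# Order matters: bool is a subclass of int, so it must be tested first.
_TYPE_NAMES = [
    (bool, "BOOLEAN"),
    (int, "INTEGER"),
    (float, "FLOAT"),
    (str, "STRING"),
    (datetime, "DATETIME"),
    (list, "LIST"),
    (dict, "DICT"),
]


def type_name(value):
    """Return the type name for an observed value, or "UNKNOWN"."""
    for py_type, name in _TYPE_NAMES:
        if isinstance(value, py_type):
            return name
    return "UNKNOWN"


def infer_node_schemas(nodes):
    """Infer per-label property types from (labels, properties) pairs.

    Returns {label: {property: {"types": [...], "required": bool}}}.
    """
    node_counts = defaultdict(int)
    seen = defaultdict(lambda: defaultdict(lambda: {"types": set(), "count": 0}))

    for labels, properties in nodes:
        for label in labels:
            node_counts[label] += 1
            for key, value in properties.items():
                entry = seen[label][key]
                entry["types"].add(type_name(value))
                entry["count"] += 1

    schema = {}
    for label, total in node_counts.items():
        schema[label] = {
            key: {
                "types": sorted(entry["types"]),
                # Assumption: required means present on every node of this label.
                "required": entry["count"] == total,
            }
            for key, entry in seen[label].items()
        }
    return schema


sample = [
    (["Person"], {"name": "Alice", "age": 30}),
    (["Person"], {"name": "Bob"}),
    (["Document"], {"title": "Q4 Report", "pages": 42}),
]
inferred = infer_node_schemas(sample)
print(inferred["Person"])
```

A property observed with more than one type ends up with several entries in `types`, which is the kind of conflict you would want to inspect and resolve by hand before switching on strict enforcement.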
LLM-inferred schema
For teams who want to define a schema from sample documents before any data exists:
from purple8_graph.genai import SchemaDetector, OpenAIProvider
provider = OpenAIProvider(api_key="...")
detector = SchemaDetector(provider)
# Feed sample documents (sample_documents is your own collection, defined elsewhere);
# the LLM infers entities and relationships
schema = detector.detect_schema(sample_documents[:50])
# Returns GraphSchema with NodeSchema + EdgeSchema inferred by the LLM
print(schema.node_schemas) # → [Person, Organization, Document, ...]
The schema is an output of AI inference, not a prerequisite.
Property types
| PropertyType | Python equivalent | Example |
|---|---|---|
| STRING | str | "Alice" |
| INTEGER | int | 42 |
| FLOAT | float | 3.14 |
| BOOLEAN | bool | True |
| DATETIME | datetime | datetime(2026, 3, 25) |
| LIST | list | [1, 2, 3] |
| DICT | dict | {"key": "value"} |
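The table above can be read as a lookup from declared type to Python type. The sketch below shows one way such a check could work, including the difference between warning and strict behavior. It is a stand-in written for illustration, not the library's validator: `PYTHON_TYPES`, `matches`, `check_properties`, and `SketchValidationError` are hypothetical names, and treating `True`/`False` as invalid for INTEGER and whole numbers as invalid for FLOAT are choices of this sketch that Purple8 may not share.

```python
import warnings
from datetime import datetime

# Hypothetical lookup mirroring the table: type name -> Python type.
PYTHON_TYPES = {
    "STRING": str,
    "INTEGER": int,
    "FLOAT": float,
    "BOOLEAN": bool,
    "DATETIME": datetime,
    "LIST": list,
    "DICT": dict,
}


class SketchValidationError(ValueError):
    """Stand-in exception for this sketch only."""


def matches(value, declared_type):
    """Check a value against a declared type name."""
    # bool is a subclass of int in Python, so handle it explicitly;
    # otherwise True would pass as an INTEGER.
    if isinstance(value, bool):
        return declared_type == "BOOLEAN"
    return isinstance(value, PYTHON_TYPES[declared_type])


def check_properties(properties, declared, strict):
    """Return a list of type problems; raise instead when strict is True."""
    problems = [
        f"{key!r} should be {declared[key]}, got {type(value).__name__}"
        for key, value in properties.items()
        if key in declared and not matches(value, declared[key])
    ]
    if problems and strict:
        raise SketchValidationError("; ".join(problems))
    for problem in problems:
        warnings.warn(problem)
    return problems


person = {"name": "STRING", "age": "INTEGER"}

# Warning mode: the mismatch is reported but the call returns normally.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    reported = check_properties({"name": "Bob", "age": "thirty"}, person, strict=False)

# Strict mode: the same input raises.
try:
    check_properties({"name": "Bob", "age": "thirty"}, person, strict=True)
    strict_raised = False
except SketchValidationError:
    strict_raised = True
```

Keeping the check itself separate from the decision to warn or raise means both modes report the same problems, so a dataset that is clean in warning mode should not produce surprises once strict mode is enabled.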
Comparison with schema-first systems
| System | Schema model | First write requires schema? | Retroactive schema inference? |
|---|---|---|---|
| Purple8 | Data-first, optional strict mode | ❌ No | ✅ create_schema_from_graph() |
| Neo4j | Schema-optional (explicit constraints) | ❌ No | ❌ No |
| Kùzu | Schema-first (DDL required) | ✅ Yes | ❌ No |
| Spanner Graph | Schema-first (DDL required) | ✅ Yes | ❌ No |
| TigerGraph | Schema-first (DDL required) | ✅ Yes | ❌ No |
Why this matters for AI workloads
When you build a knowledge graph from LLM-extracted entities and relationships, the schema is emergent — it's a property of your data that you discover, not something you can define in advance. Spanner Graph and TigerGraph require DDL before the first byte of data. Purple8 lets you write first, understand your data, then optionally enforce a schema.