Schema & Data Model

Purple8 Graph operates on a data-first model — you can write data immediately without defining a schema. The schema is optional, and when you do define one, it can be discovered from existing data rather than declared upfront.

The spectrum

Schema enforcement is configurable along a spectrum from completely schemaless to strictly enforced:

schemaless          warnings only         strictly enforced
    │                    │                      │
GraphEngine()       SchemaValidator(        SchemaValidator(
                      strict_mode=False)      strict_mode=True)

Pole 1 — Fully schemaless (default)

python

from purple8_graph import GraphEngine

engine = GraphEngine("./data")

# Any label, any property, any shape — accepted without restriction
engine.add_node("n1", labels=["Person"],   properties={"name": "Alice", "age": 30})
engine.add_node("n2", labels=["Document"], properties={"title": "Q4 Report", "pages": 42})
engine.add_node("n3", labels=["Whatever"], properties={"foo": "bar", "nested": {"a": 1}})

No schema file. No DDL. No migration. Just write.

Pole 2 — Warnings only

python

from purple8_graph.validation import (
    GraphSchema,
    NodeSchema,
    PropertySchema,
    PropertyType,
    SchemaValidator,
    ValidatingGraphEngine,
)

schema = GraphSchema(name="my_schema")
schema.add_node_schema(NodeSchema(
    label="Person",
    properties=[
        PropertySchema(name="name", type=PropertyType.STRING, required=True),
        PropertySchema(name="age",  type=PropertyType.INTEGER),
    ],
))

validator = SchemaValidator(strict_mode=False, allow_extra_properties=True)
validated_engine = ValidatingGraphEngine(engine, validator, schema)

# Validates but doesn't reject — logs warnings for unknown labels or missing required fields
validated_engine.add_node("n4", labels=["UnknownLabel"], properties={"x": 1})

Pole 3 — Strict enforcement

python

# Reuses `engine` and `schema` from the examples above
validator_strict = SchemaValidator(strict_mode=True, allow_extra_properties=False)
validated_engine = ValidatingGraphEngine(engine, validator_strict, schema)

# Each call below raises ValidationError on its own; run them separately,
# because the first exception stops the script.

# "UnknownLabel" is not in the schema
validated_engine.add_node("n5", labels=["UnknownLabel"], properties={"x": 1})

# "age" has the wrong type (str instead of INTEGER)
validated_engine.add_node("n6", labels=["Person"], properties={"name": "Bob", "age": "thirty"})
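The difference between the warnings-only and strict poles comes down to what happens to the list of validation problems. The sketch below is a standalone toy, not Purple8's implementation: `check_node`, `TOY_SCHEMA`, and `ToyValidationError` are hypothetical names, and only the `strict_mode` and `allow_extra_properties` flags mirror the options shown in the examples.

```python
import warnings


class ToyValidationError(Exception):
    """Hypothetical stand-in for a validation failure."""


# Hypothetical schema: label -> {property name: (expected type, required)}
TOY_SCHEMA = {
    "Person": {"name": (str, True), "age": (int, False)},
}


def check_node(labels, properties, strict_mode=False, allow_extra_properties=True):
    """Collect problems, then warn or raise depending on strict_mode."""
    problems = []
    for label in labels:
        spec = TOY_SCHEMA.get(label)
        if spec is None:
            problems.append(f"unknown label {label!r}")
            continue
        for name, (expected, required) in spec.items():
            if name not in properties:
                if required:
                    problems.append(f"{label}.{name} is required")
            elif not isinstance(properties[name], expected):
                problems.append(f"{label}.{name} should be {expected.__name__}")
        if not allow_extra_properties:
            for name in properties:
                if name not in spec:
                    problems.append(f"{label}.{name} is not in the schema")

    # Strict mode rejects the write; otherwise each problem becomes a warning.
    if problems and strict_mode:
        raise ToyValidationError("; ".join(problems))
    for problem in problems:
        warnings.warn(problem)
    return problems
```

With `strict_mode=False` the call returns the problems it found and the write would proceed. With `strict_mode=True` the same problems abort the write.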

Discovering the schema from existing data

The most important feature of Purple8's schema model: schema can be an output of discovery, not an input.

python

from purple8_graph import GraphEngine
from purple8_graph.validation import (
    SchemaValidator,
    ValidatingGraphEngine,
    create_schema_from_graph,
)

# Build a knowledge graph from 10,000 documents — no schema upfront
engine = GraphEngine("./data")

# ... add thousands of nodes and edges from LLM extraction ...

# NOW infer the schema from what was actually written
schema = create_schema_from_graph(engine)

print(schema.node_schemas)
# → [Person, Organization, Location, Event, Document, ...]

print(schema.edge_schemas)
# → [WORKS_FOR, LOCATED_IN, ATTENDED, AUTHORED_BY, ...]

# Now enforce it going forward
validator = SchemaValidator(strict_mode=True)
validated_engine = ValidatingGraphEngine(engine, validator, schema)

create_schema_from_graph() scans all nodes and edges, infers property types from observed values, and returns a GraphSchema you can inspect, modify, and enforce.
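To make the idea of inferring a schema from observed values concrete, here is a minimal standalone sketch. It is an illustration under stated assumptions, not how `create_schema_from_graph()` is implemented: `infer_node_schema` is a hypothetical helper, nodes are modeled as plain `(labels, properties)` pairs, and a property is treated as required only if every node with that label carries it.

```python
from collections import defaultdict


def infer_node_schema(nodes):
    """Infer, per label, each property's observed types and whether it is always present.

    `nodes` is an iterable of (labels, properties) pairs. This input shape is an
    assumption made for the sketch, not Purple8's storage format.
    """
    counts = defaultdict(int)                        # label -> nodes seen
    seen = defaultdict(lambda: defaultdict(set))     # label -> prop -> type names
    present = defaultdict(lambda: defaultdict(int))  # label -> prop -> occurrences

    for labels, properties in nodes:
        for label in labels:
            counts[label] += 1
            for name, value in properties.items():
                seen[label][name].add(type(value).__name__)
                present[label][name] += 1

    return {
        label: {
            name: {
                "types": sorted(types),
                "required": present[label][name] == counts[label],
            }
            for name, types in seen[label].items()
        }
        for label in counts
    }


nodes = [
    (["Person"], {"name": "Alice", "age": 30}),
    (["Person"], {"name": "Bob"}),
    (["Document"], {"title": "Q4 Report", "pages": 42}),
]
inferred = infer_node_schema(nodes)
```

Here `name` is inferred as required for `Person` because both nodes have it, while `age` is optional because only one does. A property that shows up with more than one type would list all of them, which is a useful signal to review before turning on strict enforcement.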

LLM-inferred schema

For teams who want to define a schema from sample documents before any data exists:

python

from purple8_graph.genai import SchemaDetector, OpenAIProvider

provider = OpenAIProvider(api_key="...")
detector = SchemaDetector(provider)

# `sample_documents` is your own collection of sample documents, loaded elsewhere.
# Feed in the first 50; the LLM infers entities and relationships.
schema = detector.detect_schema(sample_documents[:50])

# Returns GraphSchema with NodeSchema + EdgeSchema inferred by the LLM
print(schema.node_schemas)   # → [Person, Organization, Document, ...]

The schema is an output of AI inference — not a prerequisite.

Property types

`PropertyType`	Python equivalent	Example
`STRING`	`str`	`"Alice"`
`INTEGER`	`int`	`42`
`FLOAT`	`float`	`3.14`
`BOOLEAN`	`bool`	`True`
`DATETIME`	`datetime`	`datetime(2026, 3, 25)`
`LIST`	`list`	`[1, 2, 3]`
`DICT`	`dict`	`{"key": "value"}`
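One Python detail matters when mapping values to these types: `bool` is a subclass of `int`, so a naive `isinstance(value, int)` check would file `True` under `INTEGER`. The sketch below shows one way to order the checks. `classify` is a hypothetical helper that returns the type names from the table as strings; whether Purple8 orders its own checks this way is an assumption.

```python
from datetime import datetime


def classify(value):
    """Return the name of the table row a Python value would fall under."""
    # bool must be tested before int: isinstance(True, int) is True in Python.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    if isinstance(value, str):
        return "STRING"
    if isinstance(value, datetime):
        return "DATETIME"
    if isinstance(value, list):
        return "LIST"
    if isinstance(value, dict):
        return "DICT"
    return None  # no matching row, e.g. None, tuple, set
```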

Comparison with schema-first systems

System	Schema model	First write requires schema?	Retroactive schema inference?
Purple8	Data-first, optional strict mode	❌ No	✅ `create_schema_from_graph()`
Neo4j	Schema-optional (explicit constraints)	❌ No	❌ No
Kùzu	Schema-first (DDL required)	✅ Yes	❌ No
Spanner Graph	Schema-first (DDL required)	✅ Yes	❌ No
TigerGraph	Schema-first (DDL required)	✅ Yes	❌ No

Why this matters for AI workloads

When you build a knowledge graph from LLM-extracted entities and relationships, the schema is emergent — it's a property of your data that you discover, not something you can define in advance. Spanner Graph and TigerGraph require DDL before the first byte of data. Purple8 lets you write first, understand your data, then optionally enforce a schema.
