Graph Analytics

Purple8 Graph ships two complementary analytics layers that run directly on the live graph — no separate data warehouse, no ETL pipeline, no external ML cluster.

Layer	What it does	API
Graph Algorithms	Authority scoring, path analysis, cluster discovery	`GET /algorithms/*`
OLAP Analytics Engine	Structural aggregations, distributions, projections	`AnalyticsEngine` (Python SDK)

Both layers are available from Desktop Pro upward.

Graph Algorithms

PageRank — influence & authority scoring

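PageRank estimates how likely a random walk over the graph is to be at each node in the long run. At every step the walk follows an outgoing edge with probability equal to the damping factor, and otherwise jumps to a random node. High-scoring nodes are the authorities of your graph: the most-cited documents, most-trusted knowledge sources, or most-linked-to entities.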

Algorithm: Power iteration with dangling-node redistribution (Google's original method). Converges when the L1 delta between iterations falls below tolerance.

python

result = engine.pagerank(
    damping=0.85,          # probability of following an edge (default: 0.85)
    max_iterations=100,    # cap on iterations before giving up
    tolerance=1e-6,        # convergence threshold
    top_k=20,              # number of top nodes to surface
)

print(result.converged)    # True/False
print(result.iterations)   # how many rounds it took
for node in result.top_nodes:
    print(node.node_id, node.score)
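
The power iteration described above fits in a few lines of plain Python. The helper below is a hypothetical standalone `pagerank` function that works on a list of `(source, target)` pairs. It is not the Purple8 implementation. It shows how dangling-node mass is redistributed and how the L1 convergence test works, and it uses the same parameter names as the SDK call.

```python
def pagerank(
    edges: list[tuple[str, str]],
    damping: float = 0.85,
    max_iterations: int = 100,
    tolerance: float = 1e-6,
) -> tuple[dict[str, float], bool, int]:
    """Power-iteration PageRank (hypothetical sketch). Returns (scores, converged, iterations)."""
    out_links: dict[str, list[str]] = {}
    for src, dst in edges:
        out_links.setdefault(src, []).append(dst)
        out_links.setdefault(dst, [])
    nodes = list(out_links)
    n = len(nodes)
    if n == 0:
        return {}, True, 0
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution

    for iteration in range(1, max_iterations + 1):
        # Rank held by dangling nodes (no out-edges) is spread evenly over all nodes.
        dangling_mass = sum(rank[v] for v in nodes if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling_mass / n
        new_rank = {v: base for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
        # Converged when the L1 distance between successive iterates is below tolerance.
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tolerance:
            return rank, True, iteration
    return rank, False, max_iterations
```

Scores always sum to 1. Dangling-node redistribution is what keeps them summing to 1: without it, rank held by nodes with no out-edges would leak out on every iteration.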

REST API:

bash

GET /algorithms/pagerank?damping=0.85&top_k=20

Use cases:

Rank documents by citation authority in a knowledge graph
Score team members by collaboration centrality
Find the most-referenced entities in an AI agent's memory graph
Prioritise nodes for manual review or enrichment

Community Detection — cluster discovery

Community detection partitions the graph into groups of nodes that are more densely connected to each other than to the rest of the graph. Purple8 uses synchronous label propagation — fast (O(E) per pass), parameter-free, and reproducible with a seed.

Returns:

Community ID per node
Community size distribution
Total number of communities
Newman-Girvan modularity Q, a score between -0.5 and 1 where values above about 0.3 usually indicate well-defined clusters
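
For reference, the textbook Newman-Girvan definition for an undirected graph with m edges is shown below. In it, A_ij is the adjacency matrix, k_i is the degree of node i, c_i is its community and δ is the Kronecker delta. The second form sums over communities c, where L_c is the number of edges inside c and d_c is the total degree of its nodes. This page does not say whether Purple8 treats directed edges as undirected when it computes Q, so that part is an assumption.

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
  = \sum_{c} \left[ \frac{L_c}{m} - \left( \frac{d_c}{2m} \right)^2 \right]
```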

python

result = engine.detect_communities(
    max_iterations=50,     # propagation rounds before stopping
    seed=42,               # optional — makes results reproducible
)

print(result.num_communities)
print(f"Modularity Q: {result.modularity:.4f}")

# Which community does a node belong to?
community_id = result.communities["node_abc"]

# Top communities by size
for community_id, size in sorted(result.community_sizes.items(), key=lambda x: x[1], reverse=True)[:5]:
    print(f"Community {community_id}: {size} nodes")
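
Below is a minimal sketch of synchronous label propagation with seeded tie-breaking, together with the modularity score defined above. Both `label_propagation` and `modularity` are hypothetical standalone functions, not the Purple8 implementation. They treat edges as undirected, which is also an assumption. Every node starts with its own label. In each round, every node adopts the most frequent label among its neighbours, counted from the previous round's labels. Synchronous updates can oscillate on some structures, such as bipartite subgraphs, and that is why the iteration cap matters.

```python
import random
from collections import Counter
from typing import Optional


def label_propagation(
    edges: list[tuple[str, str]],
    max_iterations: int = 50,
    seed: Optional[int] = None,
) -> dict[str, str]:
    """Synchronous label propagation over the undirected view of `edges` (sketch)."""
    rng = random.Random(seed)
    neighbours: dict[str, list[str]] = {}
    for u, v in edges:
        neighbours.setdefault(u, []).append(v)
        neighbours.setdefault(v, []).append(u)
    nodes = sorted(neighbours)  # fixed order, so the seed alone determines the result
    labels = {v: v for v in nodes}  # every node starts in its own community
    for _ in range(max_iterations):
        new_labels = {}
        for v in nodes:
            # Synchronous update: count neighbour labels from the previous round.
            counts = Counter(labels[u] for u in neighbours[v])
            best = max(counts.values())
            tied = sorted(label for label, c in counts.items() if c == best)
            new_labels[v] = rng.choice(tied)  # seeded tie-break
        if new_labels == labels:  # no node changed label: converged
            break
        labels = new_labels
    return labels


def modularity(edges: list[tuple[str, str]], labels: dict[str, str]) -> float:
    """Newman-Girvan modularity Q of a partition, treating edges as undirected."""
    m = len(edges)
    if m == 0:
        return 0.0
    internal: Counter = Counter()    # edges with both ends in community c (L_c)
    degree_sum: Counter = Counter()  # total degree of the nodes in c (d_c)
    for u, v in edges:
        degree_sum[labels[u]] += 1
        degree_sum[labels[v]] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

Labels can only spread along edges, so a community never spans two disconnected components. You can get community sizes from `Counter(labels.values())`.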

REST API:

bash

GET /algorithms/communities?max_iterations=50&seed=42&top_k=10

Use cases:

Discover topic clusters in a document or knowledge graph
Map organisational clusters from collaboration or reporting relationships
Identify customer segments from interaction graphs
Detect isolated or anomalous nodes (communities of size 1)
Measure knowledge graph quality — low modularity means weak structure

Betweenness Centrality — critical connector identification

Betweenness centrality measures how often a node lies on the shortest paths between other pairs of nodes. High-centrality nodes act as bridges: removing them lengthens many paths and can split the graph into disconnected pieces.

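Algorithm: Brandes' exact algorithm. Time complexity is O(V·E) on unweighted graphs, which makes it the most expensive algorithm on this page; see Performance guidelines for practical size limits. The top_k parameter only limits how many nodes are returned; it does not reduce the computation.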

python

result = engine.betweenness_centrality(
    normalise=True,   # divide by (n-1)(n-2) so scores are in [0, 1]
    top_k=20,
)

for node in result.top_nodes:
    print(node.node_id, f"{node.score:.6f}")
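
To show where the O(V·E) cost comes from, here is a minimal sketch of Brandes' algorithm for an unweighted directed graph. It runs one breadth-first search per source node, then a backward pass that accumulates pair dependencies. The `betweenness_centrality` function and its adjacency-dict input are hypothetical. The normalisation uses the same (n-1)(n-2) divisor described in the SDK comment above.

```python
from collections import deque


def betweenness_centrality(
    adjacency: dict[str, list[str]], normalise: bool = True
) -> dict[str, float]:
    """Brandes' algorithm (sketch). `adjacency` maps every node to its out-neighbours."""
    nodes = list(adjacency)
    bc = {v: 0.0 for v in nodes}
    for s in nodes:
        # Phase 1: BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack: list[str] = []
        preds: dict[str, list[str]] = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = dict.fromkeys(nodes, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: pop nodes farthest-first and accumulate dependencies on s.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(nodes)
    if normalise and n > 2:
        scale = 1.0 / ((n - 1) * (n - 2))  # max possible score for a directed graph
        bc = {v: score * scale for v, score in bc.items()}
    return bc
```

Take a two-way path a <-> b <-> c. Node b lies on both shortest paths between a and c, so its normalised score is 1.0, while the two endpoints score 0.0.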

REST API:

bash

GET /algorithms/centrality?normalise=true&top_k=20

Use cases:

Find bottleneck nodes in a process or workflow graph
Identify critical connectors in an access management graph (nodes that grant access to many paths)
Surface bridge entities between otherwise separate knowledge clusters
Detect single points of failure in infrastructure dependency graphs

Shortest Path — weighted path queries

Finds the weighted shortest path between two nodes using Dijkstra's algorithm, with optional edge-type filtering and a custom weight property. Dijkstra's algorithm assumes non-negative edge weights.

python

result = engine.shortest_path(
    source_id="node_a",
    target_id="node_z",
    weight_property="latency_ms",   # optional — uniform weight if omitted
    edge_type="DEPENDS_ON",         # optional — restricts traversal to this edge type
)

print(result.found)        # True/False
print(result.hops)         # number of edges in the path
print(result.total_cost)   # sum of weight_property along the path
print(result.path)         # list of node IDs from source to target
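
Below is a minimal Dijkstra sketch that supports the same two options. It models edges as plain dicts with hypothetical `source`, `target` and `type` keys plus numeric properties. `PathResult` mirrors the `found`, `hops`, `total_cost` and `path` fields shown above, but it is not the SDK class.

```python
import heapq
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PathResult:
    found: bool
    path: list[str] = field(default_factory=list)
    total_cost: float = float("inf")

    @property
    def hops(self) -> int:
        return max(len(self.path) - 1, 0)


def shortest_path(
    edges: list[dict],
    source_id: str,
    target_id: str,
    weight_property: Optional[str] = None,
    edge_type: Optional[str] = None,
) -> PathResult:
    """Dijkstra over directed edges (sketch); weights must be non-negative."""
    adjacency: dict[str, list[tuple[str, float]]] = {}
    for e in edges:
        if edge_type is not None and e["type"] != edge_type:
            continue  # edge-type filter
        weight = e[weight_property] if weight_property else 1.0  # uniform if omitted
        adjacency.setdefault(e["source"], []).append((e["target"], float(weight)))

    dist = {source_id: 0.0}
    prev: dict[str, str] = {}
    heap = [(0.0, source_id)]
    while heap:
        d, v = heapq.heappop(heap)
        if v == target_id:
            break  # with non-negative weights, the target's distance is now final
        if d > dist[v]:
            continue  # stale heap entry
        for w, weight in adjacency.get(v, []):
            nd = d + weight
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                prev[w] = v
                heapq.heappush(heap, (nd, w))

    if target_id not in dist:
        return PathResult(found=False)
    path = [target_id]
    while path[-1] != source_id:
        path.append(prev[path[-1]])  # walk predecessors back to the source
    path.reverse()
    return PathResult(found=True, path=path, total_cost=dist[target_id])
```

Stopping as soon as the target leaves the heap avoids exploring the rest of the graph. This matters for point-to-point queries on large graphs.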

REST API:

bash

GET /nodes/{source_id}/shortest-path?target=node_z&weight_property=latency_ms

Use cases:

Find the lowest-latency path through an infrastructure graph
Determine the chain of approvals between a user and a resource (access management)
Shortest dependency resolution path in a software component graph
Minimum-cost routing in a logistics or supply chain graph

OLAP Analytics Engine

The AnalyticsEngine runs read-only OLAP-style computations over a live GraphEngine. All methods iterate over the storage layer — no data export required.

python

from purple8_graph import GraphEngine, AnalyticsEngine

engine = GraphEngine(storage)   # storage: your configured storage backend
ax = AnalyticsEngine()          # each method takes the engine as its first argument

Node & edge aggregations

python

# How many nodes of each label exist?
ax.label_counts(engine)
# → {"Document": 12400, "Person": 830, "Concept": 4200}

# How many edges of each type?
ax.edge_type_counts(engine)
# → {"AUTHORED_BY": 8900, "REFERENCES": 31000, "TAGGED_WITH": 19200}

# Out-degree frequency histogram with min / max / mean
ax.degree_distribution(engine)
# → {"data": {0: 120, 1: 430, 2: 890, ...}, "min": 0, "max": 147, "mean": 4.3}

# Top-K nodes by degree (out / in / both)
ax.top_k_nodes_by_degree(engine, k=10, direction="out")
# → [("node_abc", 147), ("node_xyz", 134), ...]

# Average out-degree
ax.avg_degree(engine)
# → 4.3

# Graph density — ratio of actual to maximum possible directed edges
ax.density(engine)
# → 0.00031
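
These aggregations reduce to simple counting. The sketch below assumes a simple directed graph (no self-loops or parallel edges) held as a list of node IDs plus `(source, target)` pairs. A hypothetical `degree_stats` helper, which is not part of `AnalyticsEngine`, computes the out-degree histogram, the average out-degree E / V and the density E / (V·(V-1)):

```python
from collections import Counter


def degree_stats(node_ids: list[str], edges: list[tuple[str, str]]) -> dict:
    """Out-degree histogram, average out-degree and directed density (sketch)."""
    out_degree = Counter(src for src, _ in edges)
    degrees = [out_degree.get(v, 0) for v in node_ids]  # nodes with no out-edges count as 0
    n, m = len(node_ids), len(edges)
    return {
        "histogram": dict(sorted(Counter(degrees).items())),  # degree -> number of nodes
        "min": min(degrees, default=0),
        "max": max(degrees, default=0),
        "mean": m / n if n else 0.0,  # average out-degree = E / V
        # Density: actual edges divided by the V(V-1) possible directed edges.
        "density": m / (n * (n - 1)) if n > 1 else 0.0,
    }
```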

Structural analytics

python

# Weakly-connected components — find isolated subgraphs
result = ax.connected_components(engine)
# result.data → {"num_components": 3, "component_sizes": [12400, 8, 1], ...}
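
Weakly-connected components ignore edge direction. One standard way to find them in a single pass over the edges is union-find (disjoint sets). The sketch below is illustrative only. Its function name is hypothetical, and it assumes every edge endpoint appears in `node_ids`.

```python
from collections import Counter


def weakly_connected_components(node_ids: list[str], edges: list[tuple[str, str]]) -> dict:
    """Component count and sizes via union-find (sketch)."""
    parent = {v: v for v in node_ids}

    def find(v: str) -> str:
        # Path halving keeps the union-find trees shallow.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:  # direction is ignored, which is what makes components "weak"
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    sizes = Counter(find(v) for v in node_ids)
    return {
        "num_components": len(sizes),
        "component_sizes": sorted(sizes.values(), reverse=True),
    }
```

Isolated nodes show up as components of size 1, like the last entry in the example output above.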

Temporal analytics

python

# Group nodes (or edges) by a date property, bucketed by day / month / year
result = ax.temporal_grouping(
    engine,
    property_name="created_at",
    interval="month",
    entity="node",
)
# result.data → {"2026-01": 340, "2026-02": 890, "2026-03": 1200}
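
The bucketing itself is straightforward with the standard library. This sketch assumes date properties are stored as ISO 8601 strings (or `date`/`datetime` objects), and that bucket keys use the formats from the example output above. The `temporal_grouping` function here is a hypothetical standalone function over property dicts, not `AnalyticsEngine.temporal_grouping`. On Python versions before 3.11, `datetime.fromisoformat` rejects some ISO 8601 variants, such as a trailing `Z`.

```python
from collections import Counter
from datetime import datetime

# Bucket-key format for each supported interval.
BUCKET_FORMATS = {"day": "%Y-%m-%d", "month": "%Y-%m", "year": "%Y"}


def temporal_grouping(records: list[dict], property_name: str, interval: str = "month") -> dict[str, int]:
    """Count records per day / month / year bucket of a date property (sketch)."""
    fmt = BUCKET_FORMATS[interval]  # raises KeyError for an unsupported interval
    buckets: Counter = Counter()
    for props in records:
        raw = props.get(property_name)
        if raw is None:
            continue  # record has no value for this property
        when = datetime.fromisoformat(raw) if isinstance(raw, str) else raw
        buckets[when.strftime(fmt)] += 1
    return dict(sorted(buckets.items()))  # chronological order
```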

Property analytics

python

# Numeric aggregation over any node or edge property
result = ax.property_stats(engine, property_name="score", entity="node")
# result.data → {"min": 0.01, "max": 0.99, "mean": 0.54, "count": 12400}
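
Here is a sketch of the same aggregation over a list of property dicts. It assumes that entities missing the property, or holding a non-numeric value, are skipped; this page does not document how Purple8 handles such values. Booleans are excluded explicitly because Python treats `bool` as a subclass of `int`.

```python
import math
from numbers import Real


def property_stats(records: list[dict], property_name: str) -> dict:
    """min / max / mean / count of a numeric property across records (sketch)."""
    values = []
    for props in records:
        value = props.get(property_name)
        # Skip missing and non-numeric values; bool is an int subclass, so exclude it.
        if isinstance(value, Real) and not isinstance(value, bool):
            values.append(float(value))
    if not values:
        return {"min": None, "max": None, "mean": None, "count": 0}
    return {
        "min": min(values),
        "max": max(values),
        "mean": math.fsum(values) / len(values),  # fsum limits floating-point rounding error
        "count": len(values),
    }
```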

Graph projection (for ML / BI pipelines)

GraphProjection extracts structured representations from the live graph for downstream use — pandas, networkx, PyTorch Geometric, or any BI tool.

python

import pandas as pd
from purple8_graph import GraphProjection

proj = GraphProjection()

# List of dicts for all "Document" nodes — ready for pandas
docs = proj.project_label(engine, label="Document")
df = pd.DataFrame(docs)

# Adjacency list: {node_id: [neighbour_ids]}
adj = proj.to_adjacency_list(engine)

# Edge list: [(source_id, target_id, edge_type)]
edges = proj.to_edge_list(engine)

# Node feature matrix: {node_id: {property: value, ...}}
features = proj.to_node_feature_matrix(engine, label="Document")
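
Downstream ML libraries usually need integer node indices rather than string IDs. This hypothetical helper takes the `(source_id, target_id, edge_type)` tuples described for `to_edge_list` above and builds the two-row COO edge index that PyTorch Geometric expects:

```python
def to_edge_index(
    edges: list[tuple[str, str, str]],
) -> tuple[dict[str, int], list[list[int]], list[str]]:
    """Map string node IDs to contiguous integers and build a COO edge index (sketch)."""
    index: dict[str, int] = {}
    for src, dst, _ in edges:
        for node_id in (src, dst):
            if node_id not in index:
                index[node_id] = len(index)  # assign IDs in first-seen order
    edge_index = [
        [index[src] for src, _, _ in edges],  # row 0: source indices
        [index[dst] for _, dst, _ in edges],  # row 1: target indices
    ]
    edge_types = [edge_type for _, _, edge_type in edges]
    return index, edge_index, edge_types
```

`torch.tensor(edge_index, dtype=torch.long)` then has shape `[2, num_edges]` and can be passed as `edge_index` to `torch_geometric.data.Data`. Nodes with no edges do not appear in `index`, so add them from `project_label` if your model needs them.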

Performance guidelines

Algorithm	Complexity	Practical limit
PageRank	O(V + E) per iteration	10M+ nodes in seconds
Community detection	O(E) per pass	10M+ nodes in seconds
Betweenness centrality	O(V·E)	~500K nodes comfortably; sample for larger
Shortest path (Dijkstra)	O((V + E) log V)	Real-time for most graphs
OLAP aggregations	O(V) or O(E)	Linear — handles full graph size

Running analytics in production

For Cloud tiers, all algorithms run as background jobs on dedicated compute — they do not share resources with your query traffic. For Self-Hosted deployments, schedule heavy analytics (betweenness centrality on large graphs) during off-peak hours.

Web UI

The Analytics tab in the Purple8 web dashboard (Pro Cloud and above) provides a no-code interface for three of the graph algorithms:

PageRank — configurable damping factor and top-K, bar chart of top nodes
Communities — run detection, see modularity score, browse community sizes
Centrality — normalised betweenness scores, ranked node list

REST API access to all algorithms is available on all tiers from Desktop Pro upward.
