Graph Analytics
Purple8 Graph ships two complementary analytics layers that run directly on the live graph — no separate data warehouse, no ETL pipeline, no external ML cluster.
| Layer | What it does | API |
|---|---|---|
| Graph Algorithms | Authority scoring, path analysis, cluster discovery | GET /algorithms/* |
| OLAP Analytics Engine | Structural aggregations, distributions, projections | AnalyticsEngine (Python SDK) |
Both layers are available from Desktop Pro upward.
Graph Algorithms
PageRank — influence & authority scoring
PageRank estimates how often a random walk over the graph would visit each node. High-scoring nodes are influential connectors: the most-cited documents, most-trusted knowledge sources, or most-connected entities in your graph.
Algorithm: Power iteration with dangling-node redistribution (Google's original method). The run counts as converged once the L1 difference between the score vectors of successive iterations falls below tolerance.
result = engine.pagerank(
    damping=0.85,        # probability of following an edge (default: 0.85)
    max_iterations=100,  # cap on iterations before giving up
    tolerance=1e-6,      # convergence threshold
    top_k=20,            # number of top nodes to surface
)
print(result.converged)   # True/False
print(result.iterations)  # how many rounds it took
for node in result.top_nodes:
    print(node.node_id, node.score)
REST API:
GET /algorithms/pagerank?damping=0.85&top_k=20
Use cases:
- Rank documents by citation authority in a knowledge graph
- Score team members by collaboration centrality
- Find the most-referenced entities in an AI agent's memory graph
- Prioritise nodes for manual review or enrichment
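The power-iteration scheme described for PageRank can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Purple8's implementation: the `pagerank_sketch` function and the dict-of-lists graph format (every node ID mapped to its out-neighbours) are hypothetical and exist only for this example.

```python
def pagerank_sketch(adj, damping=0.85, max_iterations=100, tolerance=1e-6):
    """adj maps every node ID to a list of its out-neighbours (hypothetical format)."""
    nodes = list(adj)
    n = len(nodes)
    if n == 0:
        return {}, 0, True
    rank = {v: 1.0 / n for v in nodes}  # start from a uniform distribution
    for iteration in range(1, max_iterations + 1):
        # Score held by dangling nodes (no out-edges) is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not adj[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for v in nodes:
            if adj[v]:
                share = damping * rank[v] / len(adj[v])
                for w in adj[v]:
                    new_rank[w] += share
        # L1 delta between successive score vectors decides convergence.
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tolerance:
            return rank, iteration, True
    return rank, max_iterations, False


graph = {"a": ["c"], "b": ["c"], "c": []}  # "c" is a dangling node
scores, iterations, converged = pagerank_sketch(graph)
top = sorted(scores.items(), key=lambda item: -item[1])
print(converged, iterations, top[0][0])
```

Because dangling score is redistributed rather than dropped, the scores keep summing to 1 on every iteration, which makes them comparable across runs.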
Community Detection — cluster discovery
Community detection partitions the graph into groups of nodes that are more densely connected to each other than to the rest of the graph. Purple8 uses synchronous label propagation — fast (O(E) per pass), parameter-free, and reproducible with a seed.
Returns:
- Community ID per node
- Community size distribution
- Total number of communities
- Newman-Girvan modularity Q: a score in [-1, 1], where values above 0.3 indicate well-defined clusters
result = engine.detect_communities(
    max_iterations=50,  # propagation rounds before stopping
    seed=42,            # optional; makes results reproducible
)
print(result.num_communities)
print(f"Modularity Q: {result.modularity:.4f}")
# Which community does a node belong to?
community_id = result.communities["node_abc"]
# Top communities by size
for cid, size in sorted(result.community_sizes.items(), key=lambda x: -x[1])[:5]:
    print(f"Community {cid}: {size} nodes")
REST API:
GET /algorithms/communities?max_iterations=50&seed=42&top_k=10
Use cases:
- Discover topic clusters in a document or knowledge graph
- Map organisational clusters from collaboration or reporting relationships
- Identify customer segments from interaction graphs
- Detect anomalous isolated subgraphs (communities of size 1)
- Measure knowledge graph quality — low modularity means weak structure
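To make the two ideas in this section concrete, the sketch below shows one way synchronous label propagation and Newman-Girvan modularity can be computed. It is an illustration only: the function names, the `(u, v)` edge-tuple format, the tie-breaking rule, and the choice to treat edges as undirected (with no self-loops or duplicate edges) are all assumptions, not a description of Purple8's internals.

```python
import random
from collections import Counter, defaultdict


def label_propagation_sketch(edges, max_iterations=50, seed=None):
    """Synchronous label propagation over undirected (u, v) edge tuples."""
    rng = random.Random(seed)  # a seeded RNG makes tie-breaking reproducible
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    labels = {node: node for node in neighbours}  # every node starts alone
    for _ in range(max_iterations):
        new_labels = {}
        for node, nbrs in neighbours.items():
            # Synchronous update: only labels from the previous round are read.
            counts = Counter(labels[n] for n in nbrs)
            best = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == best)
            # Keep the current label when it ties for best; otherwise pick randomly.
            if labels[node] in candidates:
                new_labels[node] = labels[node]
            else:
                new_labels[node] = rng.choice(candidates)
        if new_labels == labels:
            break  # no label changed, so the partition is stable
        labels = new_labels
    return labels


def modularity(edges, labels):
    """Q = sum over communities of (internal_edges / m) - (degree_sum / 2m)^2."""
    m = len(edges)
    internal = Counter()
    degree_sum = Counter()
    for u, v in edges:
        degree_sum[labels[u]] += 1
        degree_sum[labels[v]] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two disconnected triangles: the natural partition is one community per triangle.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("x", "y"), ("y", "z"), ("x", "z")]
ideal = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 1, "z": 1}
found = label_propagation_sketch(edges, seed=42)
print(modularity(edges, ideal), modularity(edges, found))
```

Putting every node in a single community gives Q = 0 under this formula, which is why a low score signals weak cluster structure.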
Betweenness Centrality — critical connector identification
Betweenness centrality measures how often a node sits on the shortest path between other pairs of nodes. High-centrality nodes are bridges — remove them and the graph fragments.
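Algorithm: Brandes' exact algorithm. Time complexity is O(V·E), which is workable for graphs up to ~1M nodes, although the performance guidelines below put the comfortable limit closer to ~500K. The top_k parameter only limits how many nodes are returned; it does not reduce the amount of computation.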
result = engine.betweenness_centrality(
    normalise=True,  # divide by (n-1)(n-2) so scores are in [0, 1]
    top_k=20,
)
for node in result.top_nodes:
    print(node.node_id, f"{node.score:.6f}")
REST API:
GET /algorithms/centrality?normalise=true&top_k=20
Use cases:
- Find bottleneck nodes in a process or workflow graph
- Identify critical connectors in an access management graph (nodes that grant access to many paths)
- Surface bridge entities between otherwise separate knowledge clusters
- Detect single points of failure in infrastructure dependency graphs
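For readers who want to see where the O(V·E) cost comes from, here is a compact sketch of Brandes' algorithm for an unweighted directed graph: one breadth-first search per source node, followed by a reverse pass that accumulates path dependencies. The function name and the dict-of-lists input (every node present as a key) are assumptions for the example; it is not the SDK's code.

```python
from collections import deque


def betweenness_sketch(adj, normalise=True):
    """adj maps every node ID to a list of out-neighbours (hypothetical format)."""
    nodes = list(adj)
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        # Phase 1: BFS from s, counting shortest paths (sigma) and predecessors.
        order = []
        preds = {v: [] for v in nodes}
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in nodes}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(nodes)
    if normalise and n > 2:
        scale = 1.0 / ((n - 1) * (n - 2))  # number of ordered pairs excluding the node
        score = {v: c * scale for v, c in score.items()}
    return score


# A three-node chain with edges in both directions: "b" bridges "a" and "c".
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(betweenness_sketch(chain))
```

Each source costs one O(V + E) traversal, so the total grows with V·E regardless of how many results are requested afterwards.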
Shortest Path — weighted path queries
Computes the weighted shortest path between any two nodes using Dijkstra's algorithm, which assumes non-negative edge weights. Supports optional edge type filtering and a custom weight property.
result = engine.shortest_path(
    source_id="node_a",
    target_id="node_z",
    weight_property="latency_ms",  # optional; uniform weight if omitted
    edge_type="DEPENDS_ON",        # optional; restricts traversal to this edge type
)
print(result.found)       # True/False
print(result.hops)        # number of edges in the path
print(result.total_cost)  # sum of weight_property along the path
print(result.path)        # list of node IDs from source to target
REST API:
GET /nodes/{source_id}/shortest-path?target=node_z&weight_property=latency_ms
Use cases:
- Find the lowest-latency path through an infrastructure graph
- Determine the chain of approvals between a user and a resource (access management)
- Shortest dependency resolution path in a software component graph
- Minimum-cost routing in a logistics or supply chain graph
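The interaction between the edge type filter and the weight property is easiest to see in a small standalone sketch of Dijkstra's algorithm. Everything below is illustrative: the edge-record layout (`source`, `target`, `type`, `properties`) and the function name are assumptions, and the code assumes non-negative weights.

```python
import heapq


def shortest_path_sketch(edges, source_id, target_id, weight_property=None, edge_type=None):
    """Dijkstra over hypothetical edge dicts; returns None when no path exists."""
    adjacency = {}
    for edge in edges:
        if edge_type is not None and edge["type"] != edge_type:
            continue  # the filter removes edges before the search starts
        weight = edge["properties"][weight_property] if weight_property else 1.0
        adjacency.setdefault(edge["source"], []).append((edge["target"], weight))

    best = {source_id: 0.0}
    previous = {}
    heap = [(0.0, source_id)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target_id:
            break  # the first time the target is popped, its cost is final
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour, weight in adjacency.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(heap, (new_cost, neighbour))

    if target_id not in best:
        return None
    path = [target_id]
    while path[-1] != source_id:
        path.append(previous[path[-1]])
    path.reverse()
    return {"path": path, "total_cost": best[target_id], "hops": len(path) - 1}


edges = [
    {"source": "a", "target": "b", "type": "DEPENDS_ON", "properties": {"latency_ms": 5}},
    {"source": "b", "target": "z", "type": "DEPENDS_ON", "properties": {"latency_ms": 5}},
    {"source": "a", "target": "z", "type": "DEPENDS_ON", "properties": {"latency_ms": 20}},
]
weighted = shortest_path_sketch(edges, "a", "z", weight_property="latency_ms")
uniform = shortest_path_sketch(edges, "a", "z")
print(weighted, uniform)
```

With weights, the two-hop route wins because its total latency is lower; with uniform weights, the direct edge wins because it has fewer hops.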
OLAP Analytics Engine
The AnalyticsEngine runs read-only OLAP-style computations over a live GraphEngine. All methods iterate over the storage layer — no data export required.
from purple8_graph import GraphEngine, AnalyticsEngine
engine = GraphEngine(storage)
ax = AnalyticsEngine()
Node & edge aggregations
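As a rough illustration of what the degree and density aggregations in this subsection compute, the sketch below derives them from a plain edge list. The helper names and the `(source, target)` tuple format are assumptions for the example, not part of the SDK. The density formula assumes a directed graph with no self-loops and at most one edge per ordered pair.

```python
from collections import Counter


def out_degree_distribution(node_ids, edges):
    """Histogram of out-degrees plus min / max / mean, from (source, target) tuples."""
    out_degree = {node: 0 for node in node_ids}  # nodes without out-edges count as 0
    for source, _target in edges:
        out_degree[source] = out_degree.get(source, 0) + 1
    degrees = list(out_degree.values())
    histogram = Counter(degrees)
    return {
        "data": dict(sorted(histogram.items())),
        "min": min(degrees),
        "max": max(degrees),
        "mean": sum(degrees) / len(degrees),
    }


def density(node_ids, edges):
    """Actual edges divided by the n * (n - 1) possible directed edges."""
    n = len(node_ids)
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0


node_ids = ["a", "b", "c"]
edges = [("a", "b"), ("a", "c"), ("b", "c")]
distribution = out_degree_distribution(node_ids, edges)
print(distribution, density(node_ids, edges))
```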
# How many nodes of each label exist?
ax.label_counts(engine)
# → {"Document": 12400, "Person": 830, "Concept": 4200}
# How many edges of each type?
ax.edge_type_counts(engine)
# → {"AUTHORED_BY": 8900, "REFERENCES": 31000, "TAGGED_WITH": 19200}
# Out-degree frequency histogram with min / max / mean
ax.degree_distribution(engine)
# → {"data": {0: 120, 1: 430, 2: 890, ...}, "min": 0, "max": 147, "mean": 4.3}
# Top-K nodes by degree (out / in / both)
ax.top_k_nodes_by_degree(engine, k=10, direction="out")
# → [("node_abc", 147), ("node_xyz", 134), ...]
# Average out-degree
ax.avg_degree(engine)
# → 4.3
# Graph density — ratio of actual to maximum possible directed edges
ax.density(engine)
# → 0.00031
Structural analytics
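Weakly-connected components ignore edge direction: two nodes share a component if any chain of edges links them, whichever way the edges point. A minimal sketch of that computation follows, using a hypothetical node list and `(source, target)` edge tuples rather than the SDK's storage layer.

```python
from collections import defaultdict, deque


def weakly_connected_components(node_ids, edges):
    """Component count and sizes, treating every directed edge as undirected."""
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)

    seen = set()
    sizes = []
    for start in node_ids:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from this start node.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in neighbours[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        sizes.append(size)
    sizes.sort(reverse=True)
    return {"num_components": len(sizes), "component_sizes": sizes}


node_ids = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("c", "b")]  # "d" and "e" are isolated
components = weakly_connected_components(node_ids, edges)
print(components)
```

Components of size 1 are isolated nodes, which is often the first thing to check when a graph import looks incomplete.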
# Weakly-connected components — find isolated subgraphs
result = ax.connected_components(engine)
# result.data → {"num_components": 3, "component_sizes": [12400, 8, 1], ...}
Temporal analytics
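Conceptually, temporal grouping truncates each timestamp to the chosen interval and counts entities per bucket. The sketch below assumes ISO 8601 date strings stored in a list of property dicts. Both are illustrative assumptions, as is the function name; how the SDK parses dates is not specified here.

```python
from collections import Counter
from datetime import datetime

# Bucket key format per interval (assumed to mirror day / month / year grouping).
BUCKET_FORMATS = {"day": "%Y-%m-%d", "month": "%Y-%m", "year": "%Y"}


def temporal_grouping_sketch(records, property_name, interval="month"):
    """Count records per time bucket; records missing the property are skipped."""
    bucket_format = BUCKET_FORMATS[interval]
    buckets = Counter()
    for record in records:
        raw = record.get(property_name)
        if raw is None:
            continue
        timestamp = datetime.fromisoformat(raw)
        buckets[timestamp.strftime(bucket_format)] += 1
    return dict(sorted(buckets.items()))


records = [
    {"created_at": "2026-01-15"},
    {"created_at": "2026-01-20T08:00:00"},
    {"created_at": "2026-02-01"},
    {"title": "no timestamp"},
]
by_month = temporal_grouping_sketch(records, "created_at", interval="month")
by_year = temporal_grouping_sketch(records, "created_at", interval="year")
print(by_month, by_year)
```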
# Group nodes (or edges) by a date property, bucketed by day / month / year
result = ax.temporal_grouping(
    engine,
    property_name="created_at",
    interval="month",
    entity="node",
)
# result.data → {"2026-01": 340, "2026-02": 890, "2026-03": 1200}
Property analytics
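The numeric aggregation amounts to a single pass over the property values. A minimal sketch, assuming records are plain property dicts and that missing or non-numeric values are skipped (that skipping rule is an assumption for the example, not documented SDK behaviour):

```python
def property_stats_sketch(records, property_name):
    """min / max / mean / count over numeric values of one property."""
    values = []
    for record in records:
        value = record.get(property_name)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            values.append(value)
    if not values:
        return {"count": 0}
    return {
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
        "count": len(values),
    }


records = [{"score": 0.2}, {"score": 0.8}, {"score": "n/a"}, {}]
stats = property_stats_sketch(records, "score")
print(stats)
```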
# Numeric aggregation over any node or edge property
result = ax.property_stats(engine, property_name="score", entity="node")
# result.data → {"min": 0.01, "max": 0.99, "mean": 0.54, "count": 12400}
Graph projection (for ML / BI pipelines)
GraphProjection extracts structured representations from the live graph for downstream use — pandas, networkx, PyTorch Geometric, or any BI tool.
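The edge-list and adjacency-list shapes used by the projection calls in this section are plain Python structures, so converting between them downstream needs no graph library. The sketch below shows one such conversion on hand-written sample data; the helper name and the sample IDs are hypothetical.

```python
from collections import defaultdict


def edge_list_to_adjacency(edges):
    """Turn [(source_id, target_id, edge_type)] into {node_id: [neighbour_ids]}."""
    adjacency = defaultdict(list)
    for source_id, target_id, _edge_type in edges:
        adjacency[source_id].append(target_id)
        adjacency.setdefault(target_id, [])  # keep nodes that have no out-edges
    return dict(adjacency)


sample_edges = [
    ("doc_1", "person_1", "AUTHORED_BY"),
    ("doc_1", "doc_2", "REFERENCES"),
]
adjacency = edge_list_to_adjacency(sample_edges)
print(adjacency)
```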
import pandas as pd
from purple8_graph import GraphProjection
proj = GraphProjection()
# List of dicts for all "Document" nodes — ready for pandas
docs = proj.project_label(engine, label="Document")
df = pd.DataFrame(docs)
# Adjacency list: {node_id: [neighbour_ids]}
adj = proj.to_adjacency_list(engine)
# Edge list: [(source_id, target_id, edge_type)]
edges = proj.to_edge_list(engine)
# Node feature matrix: {node_id: {property: value, ...}}
features = proj.to_node_feature_matrix(engine, label="Document")
Performance guidelines
| Algorithm | Complexity | Practical limit |
|---|---|---|
| PageRank | O(V + E) per iteration | 10M+ nodes in seconds |
| Community detection | O(E) per pass | 10M+ nodes in seconds |
| Betweenness centrality | O(V·E) | ~500K nodes comfortably; sample for larger |
| Shortest path (Dijkstra) | O((V + E) log V) | Real-time for most graphs |
| OLAP aggregations | O(V) or O(E) | Linear — handles full graph size |
Running analytics in production
For Cloud tiers, all algorithms run as background jobs on dedicated compute, so they do not share resources with your query traffic. For Self-Hosted deployments, schedule heavy analytics (for example, betweenness centrality on large graphs) during off-peak hours.
Web UI
The Analytics tab in the Purple8 web dashboard (Pro Cloud and above) provides a no-code interface for three of the graph algorithms:
- PageRank — configurable damping factor and top-K, bar chart of top nodes
- Communities — run detection, see modularity score, browse community sizes
- Centrality — normalised betweenness scores, ranked node list
REST API access to all algorithms is available on all tiers from Desktop Pro upward.