Graph Analytics

Purple8 Graph ships two complementary analytics layers that run directly on the live graph — no separate data warehouse, no ETL pipeline, no external ML cluster.

| Layer | What it does | API |
| --- | --- | --- |
| Graph Algorithms | Authority scoring, path analysis, cluster discovery | GET /algorithms/* |
| OLAP Analytics Engine | Structural aggregations, distributions, projections | AnalyticsEngine (Python SDK) |

Both layers are available from Desktop Pro upward.


Graph Algorithms

PageRank — influence & authority scoring

PageRank estimates how often a random walk that follows edges through the graph would visit each node. High-scoring nodes are influential connectors: the most-cited documents, most-trusted knowledge sources, or most-connected entities in your graph.

Algorithm: Power iteration with dangling-node redistribution (Google's original method). A run counts as converged when the L1 delta between successive iterations falls below tolerance; otherwise it stops at max_iterations.

python
result = engine.pagerank(
    damping=0.85,          # probability of following an edge (default: 0.85)
    max_iterations=100,    # cap on iterations before giving up
    tolerance=1e-6,        # convergence threshold
    top_k=20,              # number of top nodes to surface
)

print(result.converged)    # True/False
print(result.iterations)   # how many rounds it took
for node in result.top_nodes:
    print(node.node_id, node.score)

REST API:

bash
GET /algorithms/pagerank?damping=0.85&top_k=20
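
To make the parameters above concrete, here is a minimal pure-Python sketch of power iteration with dangling-node redistribution. It is an illustration only, not Purple8's implementation: the `pagerank` function, the `{node: [successors]}` input shape, and `toy_graph` are assumptions made for this example.

```python
def pagerank(out_edges, damping=0.85, max_iterations=100, tolerance=1e-6):
    """Power-iteration PageRank over a {node: [successor, ...]} mapping.

    Hypothetical helper for illustration. Returns (scores, iterations, converged).
    """
    nodes = set(out_edges)
    for targets in out_edges.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    if n == 0:
        return {}, 0, True
    scores = {v: 1.0 / n for v in nodes}

    for iteration in range(1, max_iterations + 1):
        # Rank held by nodes without out-edges is spread evenly over all nodes.
        dangling = sum(scores[v] for v in nodes if not out_edges.get(v))
        base = (1.0 - damping) / n + damping * dangling / n
        new_scores = {v: base for v in nodes}
        for v in nodes:
            targets = out_edges.get(v, [])
            if targets:
                share = damping * scores[v] / len(targets)
                for t in targets:
                    new_scores[t] += share
        # L1 delta between this iteration and the previous one.
        delta = sum(abs(new_scores[v] - scores[v]) for v in nodes)
        scores = new_scores
        if delta < tolerance:
            return scores, iteration, True
    return scores, max_iterations, False


# "d" has no out-edges, so it exercises the dangling-node branch.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": []}
scores, iterations, converged = pagerank(toy_graph)
top = sorted(scores.items(), key=lambda kv: -kv[1])
```

The scores always sum to 1, so they can be read as the share of time a random walker spends on each node.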

Use cases:

  • Rank documents by citation authority in a knowledge graph
  • Score team members by collaboration centrality
  • Find the most-referenced entities in an AI agent's memory graph
  • Prioritise nodes for manual review or enrichment

Community Detection — cluster discovery

Community detection partitions the graph into groups of nodes that are more densely connected to each other than to the rest of the graph. Purple8 uses synchronous label propagation — fast (O(E) per pass), parameter-free, and reproducible with a seed.

Returns:

  • Community ID per node
  • Community size distribution
  • Total number of communities
  • Newman-Girvan modularity Q: a score in [-0.5, 1], where values above 0.3 indicate well-defined clusters

python
result = engine.detect_communities(
    max_iterations=50,     # propagation rounds before stopping
    seed=42,               # optional — makes results reproducible
)

print(result.num_communities)
print(f"Modularity Q: {result.modularity:.4f}")

# Which community does a node belong to?
community_id = result.communities["node_abc"]

# Top communities by size
for c in sorted(result.community_sizes.items(), key=lambda x: -x[1])[:5]:
    print(f"Community {c[0]}: {c[1]} nodes")

REST API:

bash
GET /algorithms/communities?max_iterations=50&seed=42&top_k=10
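
The two ideas in this section, synchronous label propagation and modularity Q, can be sketched in plain Python as follows. This is a simplified illustration on an undirected edge list, not Purple8's implementation: the function names and the tie-breaking rule (keep the current label if it is tied for first, otherwise pick with the seeded generator) are assumptions for this example.

```python
import random
from collections import Counter, defaultdict


def label_propagation(edges, max_iterations=50, seed=None):
    """Synchronous label propagation on an undirected edge list (illustrative)."""
    rng = random.Random(seed)
    neighbours = defaultdict(list)
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    labels = {node: node for node in neighbours}  # every node starts alone

    for _ in range(max_iterations):
        new_labels = {}
        for node in sorted(neighbours):
            # Synchronous: votes are read from the previous round's labels only.
            counts = Counter(labels[n] for n in neighbours[node])
            best = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == best)
            if labels[node] in candidates:
                new_labels[node] = labels[node]
            else:
                new_labels[node] = rng.choice(candidates)
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def modularity(edges, labels):
    """Q = sum over communities of (internal_edges / m) - (degree_sum / 2m)^2."""
    m = len(edges)
    if m == 0:
        return 0.0
    internal, degree = Counter(), Counter()
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (2, 3).
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
ideal = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q_ideal = modularity(two_triangles, ideal)            # 5/14, about 0.357
found = label_propagation(two_triangles, seed=42)
q_found = modularity(two_triangles, found)
```

Splitting the toy graph into its two triangles gives Q = 5/14, just above the 0.3 threshold, while putting every node in one community gives Q = 0.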

Use cases:

  • Discover topic clusters in a document or knowledge graph
  • Map organisational clusters from collaboration or reporting relationships
  • Identify customer segments from interaction graphs
  • Detect anomalous isolated subgraphs (communities of size 1)
  • Measure knowledge graph quality — low modularity means weak structure

Betweenness Centrality — critical connector identification

Betweenness centrality measures how often a node sits on the shortest path between other pairs of nodes. High-centrality nodes are bridges — remove them and the graph fragments.

Algorithm: Brandes' exact algorithm. Time complexity is O(V·E), which is practical up to ~1M nodes and comfortable up to ~500K (see Performance guidelines). The top_k parameter only limits how many nodes are returned; it does not reduce the computation cost.

python
result = engine.betweenness_centrality(
    normalise=True,   # divide by (n-1)(n-2) so scores are in [0, 1]
    top_k=20,
)

for node in result.top_nodes:
    print(node.node_id, f"{node.score:.6f}")

REST API:

bash
GET /algorithms/centrality?normalise=true&top_k=20
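
For readers who want to see where the O(V·E) cost comes from, the following is a compact sketch of Brandes' algorithm for an unweighted, directed graph: one breadth-first search per source node, followed by a backward pass that accumulates path dependencies. The function name and the `{node: [successors]}` input shape are assumptions for this example, not the Purple8 API.

```python
from collections import deque


def betweenness_centrality(out_edges, normalise=True):
    """Brandes' exact betweenness for an unweighted directed graph (illustrative)."""
    nodes = set(out_edges)
    for targets in out_edges.values():
        nodes.update(targets)
    score = {v: 0.0 for v in nodes}

    for source in nodes:
        # Forward pass: BFS that counts shortest paths (sigma) from source.
        order = []
        predecessors = {v: [] for v in nodes}
        sigma = {v: 0 for v in nodes}
        distance = {v: -1 for v in nodes}
        sigma[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in out_edges.get(v, []):
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)
        # Backward pass: accumulate dependencies from the farthest nodes inwards.
        dependency = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += sigma[v] / sigma[w] * (1.0 + dependency[w])
            if w != source:
                score[w] += dependency[w]

    n = len(nodes)
    if normalise and n > 2:
        scale = 1.0 / ((n - 1) * (n - 2))  # directed-graph normalisation
        score = {v: s * scale for v, s in score.items()}
    return score


# A simple chain a -> b -> c -> d: only the interior nodes lie between others.
chain = {"a": ["b"], "b": ["c"], "c": ["d"]}
raw = betweenness_centrality(chain, normalise=False)
normalised = betweenness_centrality(chain, normalise=True)
```

In the chain, `b` and `c` each sit on two shortest paths, so their raw score is 2 and their normalised score is 2 / ((4-1)(4-2)) = 1/3.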

Use cases:

  • Find bottleneck nodes in a process or workflow graph
  • Identify critical connectors in an access management graph (nodes that grant access to many paths)
  • Surface bridge entities between otherwise separate knowledge clusters
  • Detect single points of failure in infrastructure dependency graphs

Shortest Path — weighted path queries

Computes the weighted shortest path between any two nodes using Dijkstra's algorithm, which requires non-negative edge weights. Supports optional edge type filtering and a custom weight property.

python
result = engine.shortest_path(
    source_id="node_a",
    target_id="node_z",
    weight_property="latency_ms",   # optional — uniform weight if omitted
    edge_type="DEPENDS_ON",         # optional — restricts traversal to this edge type
)

print(result.found)        # True/False
print(result.hops)         # number of edges in the path
print(result.total_cost)   # sum of weight_property along the path
print(result.path)         # list of node IDs from source to target

REST API:

bash
GET /nodes/{source_id}/shortest-path?target=node_z&weight_property=latency_ms
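
The interaction between weight_property and edge_type is easiest to see in a small standalone sketch. The version below runs Dijkstra with a binary heap over a list of edge dictionaries; the dictionary keys (`source`, `target`, `type`), the returned dict shape, and the choice to raise on negative weights are all assumptions for illustration rather than the SDK's behaviour.

```python
import heapq


def shortest_path(edges, source_id, target_id, weight_property=None, edge_type=None):
    """Dijkstra over directed edge dicts (illustrative, not the Purple8 SDK)."""
    adjacency = {}
    for edge in edges:
        if edge_type is not None and edge["type"] != edge_type:
            continue  # restrict traversal to the requested edge type
        weight = 1.0 if weight_property is None else float(edge[weight_property])
        if weight < 0:
            raise ValueError("Dijkstra requires non-negative weights")
        adjacency.setdefault(edge["source"], []).append((edge["target"], weight))

    best = {source_id: 0.0}
    previous = {}
    heap = [(0.0, source_id)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target_id:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry, a cheaper route was already found
        for neighbour, weight in adjacency.get(node, []):
            candidate = cost + weight
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(heap, (candidate, neighbour))

    if target_id not in best:
        return {"found": False, "hops": 0, "total_cost": None, "path": []}
    path = [target_id]
    while path[-1] != source_id:
        path.append(previous[path[-1]])
    path.reverse()
    return {"found": True, "hops": len(path) - 1,
            "total_cost": best[target_id], "path": path}


edges = [
    {"source": "node_a", "target": "node_b", "type": "DEPENDS_ON", "latency_ms": 5},
    {"source": "node_b", "target": "node_z", "type": "DEPENDS_ON", "latency_ms": 5},
    {"source": "node_a", "target": "node_z", "type": "DEPENDS_ON", "latency_ms": 20},
    {"source": "node_a", "target": "node_z", "type": "CALLS", "latency_ms": 1},
]
weighted = shortest_path(edges, "node_a", "node_z",
                         weight_property="latency_ms", edge_type="DEPENDS_ON")
any_type = shortest_path(edges, "node_a", "node_z", weight_property="latency_ms")
uniform = shortest_path(edges, "node_a", "node_z", edge_type="DEPENDS_ON")
missing = shortest_path(edges, "node_z", "node_a")
```

With the DEPENDS_ON filter the cheapest route goes through `node_b` (cost 10, two hops); without the filter the direct CALLS edge wins (cost 1); with uniform weights the direct DEPENDS_ON edge wins because it has the fewest hops.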

Use cases:

  • Find the lowest-latency path through an infrastructure graph
  • Determine the chain of approvals between a user and a resource (access management)
  • Resolve the shortest dependency path in a software component graph
  • Minimum-cost routing in a logistics or supply chain graph

OLAP Analytics Engine

The AnalyticsEngine runs read-only OLAP-style computations over a live GraphEngine. All methods iterate over the storage layer — no data export required.

python
from purple8_graph import GraphEngine, AnalyticsEngine

engine = GraphEngine(storage)  # storage: your already-configured storage backend (not shown)
ax = AnalyticsEngine()

Node & edge aggregations

python
# How many nodes of each label exist?
ax.label_counts(engine)
# → {"Document": 12400, "Person": 830, "Concept": 4200}

# How many edges of each type?
ax.edge_type_counts(engine)
# → {"AUTHORED_BY": 8900, "REFERENCES": 31000, "TAGGED_WITH": 19200}

# Out-degree frequency histogram with min / max / mean
ax.degree_distribution(engine)
# → {"data": {0: 120, 1: 430, 2: 890, ...}, "min": 0, "max": 147, "mean": 4.3}

# Top-K nodes by degree (out / in / both)
ax.top_k_nodes_by_degree(engine, k=10, direction="out")
# → [("node_abc", 147), ("node_xyz", 134), ...]

# Average out-degree
ax.avg_degree(engine)
# → 4.3

# Graph density — ratio of actual to maximum possible directed edges
ax.density(engine)
# → 0.00031
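
As a reference for how the degree and density figures relate to each other, here is a small sketch that derives them from a plain node list and directed edge list. The `degree_summary` helper and its input shapes are hypothetical; the output keys loosely mirror the shapes shown above.

```python
from collections import Counter


def degree_summary(node_ids, edges):
    """Out-degree histogram, min/max/mean, and directed density (illustrative)."""
    if not node_ids:
        raise ValueError("graph has no nodes")
    out_degree = {node: 0 for node in node_ids}  # include nodes with no out-edges
    for source, _target in edges:
        out_degree[source] += 1
    histogram = Counter(out_degree.values())
    n, e = len(node_ids), len(edges)
    return {
        "data": dict(sorted(histogram.items())),   # degree -> number of nodes
        "min": min(out_degree.values()),
        "max": max(out_degree.values()),
        "mean": e / n,                              # average out-degree
        # A directed graph without self-loops has at most n * (n - 1) edges.
        "density": e / (n * (n - 1)) if n > 1 else 0.0,
    }


summary = degree_summary(
    ["a", "b", "c", "d"],
    [("a", "b"), ("a", "c"), ("b", "c")],
)
```

Here three edges over four nodes give a mean out-degree of 0.75 and a density of 3 / 12 = 0.25.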

Structural analytics

python
# Weakly-connected components — find isolated subgraphs
result = ax.connected_components(engine)
# result.data → {"num_components": 3, "component_sizes": [12400, 8, 1], ...}
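
Weakly-connected components ignore edge direction: two nodes share a component if any chain of edges links them. One common way to compute this is union-find, sketched below; this is an assumed technique for illustration (the engine's actual method is not documented here), and the function name and inputs are hypothetical.

```python
from collections import Counter


def connected_components(node_ids, edges):
    """Weakly-connected components via union-find (illustrative)."""
    parent = {node: node for node in node_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for source, target in edges:
        root_s, root_t = find(source), find(target)
        if root_s != root_t:
            parent[root_s] = root_t  # direction is ignored: just merge the sets

    sizes = Counter(find(node) for node in node_ids)
    return {
        "num_components": len(sizes),
        "component_sizes": sorted(sizes.values(), reverse=True),
    }


components = connected_components(
    ["a", "b", "c", "d", "e", "f"],
    [("a", "b"), ("c", "b"), ("d", "e")],
)
```

Node `f` has no edges, so it forms a component of size 1, which is the kind of isolated subgraph this analysis is meant to surface.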

Temporal analytics

python
# Group nodes (or edges) by a date property, bucketed by day / month / year
result = ax.temporal_grouping(
    engine,
    property_name="created_at",
    interval="month",
    entity="node",
)
# result.data → {"2026-01": 340, "2026-02": 890, "2026-03": 1200}
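
The bucketing step can be pictured as truncating each timestamp to the chosen interval and counting. The sketch below assumes the date property holds ISO 8601 strings and that entities without the property are skipped; both are assumptions for this example, as are the helper name and the record shape.

```python
from collections import Counter
from datetime import datetime

# Bucket label format per interval (illustrative choice).
FORMATS = {"day": "%Y-%m-%d", "month": "%Y-%m", "year": "%Y"}


def temporal_grouping(records, property_name, interval="month"):
    """Count records per day / month / year bucket of an ISO date property."""
    fmt = FORMATS[interval]
    counts = Counter()
    for record in records:
        value = record.get(property_name)
        if value is None:
            continue  # assumption: entities without the property are ignored
        counts[datetime.fromisoformat(value).strftime(fmt)] += 1
    return dict(sorted(counts.items()))


records = [
    {"id": "n1", "created_at": "2026-01-15T10:00:00"},
    {"id": "n2", "created_at": "2026-01-20T08:30:00"},
    {"id": "n3", "created_at": "2026-02-01T00:00:00"},
    {"id": "n4"},  # no created_at property
]
by_month = temporal_grouping(records, "created_at", interval="month")
by_year = temporal_grouping(records, "created_at", interval="year")
```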

Property analytics

python
# Numeric aggregation over any node or edge property
result = ax.property_stats(engine, property_name="score", entity="node")
# result.data → {"min": 0.01, "max": 0.99, "mean": 0.54, "count": 12400}

Graph projection (for ML / BI pipelines)

GraphProjection extracts structured representations from the live graph for downstream use — pandas, networkx, PyTorch Geometric, or any BI tool.

python
import pandas as pd

from purple8_graph import GraphProjection

proj = GraphProjection()

# List of dicts for all "Document" nodes — ready for pandas
docs = proj.project_label(engine, label="Document")
df = pd.DataFrame(docs)

# Adjacency list: {node_id: [neighbour_ids]}
adj = proj.to_adjacency_list(engine)

# Edge list: [(source_id, target_id, edge_type)]
edges = proj.to_edge_list(engine)

# Node feature matrix: {node_id: {property: value, ...}}
features = proj.to_node_feature_matrix(engine, label="Document")
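
Most ML libraries expect integer node indices rather than string IDs. As one possible next step, the sketch below converts an edge list in the `(source_id, target_id, edge_type)` shape shown above into an ID-to-index map plus a two-row source/target layout. The `to_edge_index` helper is hypothetical and not part of GraphProjection.

```python
def to_edge_index(edge_list):
    """Map node IDs to contiguous ints and return [sources, targets] rows."""
    index_of = {}
    sources, targets = [], []
    for source_id, target_id, _edge_type in edge_list:
        for node_id in (source_id, target_id):
            # Assign the next free index the first time an ID is seen.
            index_of.setdefault(node_id, len(index_of))
        sources.append(index_of[source_id])
        targets.append(index_of[target_id])
    return index_of, [sources, targets]


edge_list = [
    ("d1", "d2", "REFERENCES"),
    ("d2", "d3", "REFERENCES"),
    ("d1", "d3", "REFERENCES"),
]
index_of, edge_index = to_edge_index(edge_list)
```

The two-row layout has shape [2, num_edges], which matches the `edge_index` convention used by PyTorch Geometric once wrapped in a long tensor.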

Performance guidelines

| Algorithm | Complexity | Practical limit |
| --- | --- | --- |
| PageRank | O(V + E) per iteration | 10M+ nodes in seconds |
| Community detection | O(E) per pass | 10M+ nodes in seconds |
| Betweenness centrality | O(V·E) | ~500K nodes comfortably; sample for larger |
| Shortest path (Dijkstra) | O((V + E) log V) | Real-time for most graphs |
| OLAP aggregations | O(V) or O(E) | Linear; handles full graph size |

Running analytics in production

For Cloud tiers, all algorithms run as background jobs on dedicated compute — they do not share resources with your query traffic. For Self-Hosted deployments, schedule heavy analytics (betweenness centrality on large graphs) during off-peak hours.


Web UI

The Analytics tab in the Purple8 web dashboard (Pro Cloud and above) provides a no-code interface for three of the graph algorithms:

  • PageRank — configurable damping factor and top-K, bar chart of top nodes
  • Communities — run detection, see modularity score, browse community sizes
  • Centrality — normalised betweenness scores, ranked node list

REST API access to all algorithms is available on all tiers from Desktop Pro upward.

Purple8 Graph is proprietary software. All rights reserved.