Graph Construction
Note
If you use the HyCut class, graph construction is
handled automatically. This page is for advanced users who want to build
graphs manually or customize the pipeline.
All PGCuts algorithms operate on a weighted graph where nodes are data points
and edge weights encode similarity. The pgcuts.graph module provides the
graph construction pipeline.
Standard pipeline
The recommended function is build_rbf_knn_graph(), which
performs the full pipeline:
KNN distances – compute k-nearest neighbor distances
Symmetrize –
(W + W^T) / 2Row-normalize –
W / W.sum(axis=1)Scale by degree – multiply by the number of neighbors per node
Gaussian RBF –
exp(-d^2 / 2)Final symmetrization
from pgcuts.graph import build_rbf_knn_graph
# X: numpy array, shape (n, d)
W = build_rbf_knn_graph(X, n_neighbors=100)
# Returns: scipy sparse CSR matrix, shape (n, n)
The result is a sparse symmetric similarity matrix suitable for all PGCuts losses.
Choosing n_neighbors
Typical values: 20–200
Larger values produce denser graphs with smoother cuts
Smaller values are more sensitive to local structure
A good starting point is
n_neighbors=100
Step-by-step construction
For more control, use the individual functions:
from pgcuts.graph import knn_graph, gaussian_rbf_kernel
# Step 1: build symmetric KNN distance graph
W_dist = knn_graph(X, n_neighbors=50, mode="distance")
# Step 2: convert distances to RBF similarities
W_sim = gaussian_rbf_kernel(W_dist, sigma=None) # auto bandwidth
The sigma parameter controls the RBF bandwidth. When None, it uses the
median of per-row maximum distances.
Edge pairs for mini-batch training
Training requires sampling edges from the graph. Extract the edge list and
use ShuffledRangeDataset:
import numpy as np
from torch.utils.data import DataLoader
from pgcuts.utils.data import ShuffledRangeDataset
pairs = np.array(W.nonzero()).T # shape (E, 2)
dataset = ShuffledRangeDataset(n=pairs.shape[0], k=8192)
loader = DataLoader(dataset, batch_size=1, num_workers=4)
for batch_idx in loader:
batch_pairs = pairs[batch_idx[0]] # (k, 2)
# ... training step ...
Extracting edge weights and node maps
For each batch of edge pairs, use get_pairs_unique_map() to
deduplicate nodes (since many edges share endpoints):
from pgcuts.utils import get_pairs_unique_map
unique_idx, left_idx, right_idx = get_pairs_unique_map(batch_pairs)
# unique_idx: indices of unique nodes in X
# left_idx: indices into unique_idx for left endpoints
# right_idx: indices into unique_idx for right endpoints
x_batch = X_tensor[unique_idx] # only forward unique nodes
p = softmax(network(x_batch))
p_left = p[left_idx]
p_right = p[right_idx]
Vertex degrees
For NCut, you need per-vertex degrees:
import numpy as np
degrees = np.array(W.sum(axis=1)).flatten() # shape (n,)