Graph Construction

Note

If you use the HyCut class, graph construction is handled automatically. This page is for advanced users who want to build graphs manually or customize the pipeline.

All PGCuts algorithms operate on a weighted graph where nodes are data points and edge weights encode similarity. The pgcuts.graph module provides the graph construction pipeline.

Standard pipeline

The recommended function is build_rbf_knn_graph(), which performs the full pipeline:

  1. KNN distances – compute k-nearest neighbor distances

  2. Symmetrize(W + W^T) / 2

  3. Row-normalizeW / W.sum(axis=1)

  4. Scale by degree – multiply by the number of neighbors per node

  5. Gaussian RBFexp(-d^2 / 2)

  6. Final symmetrization

from pgcuts.graph import build_rbf_knn_graph

# X: numpy array, shape (n, d)
W = build_rbf_knn_graph(X, n_neighbors=100)
# Returns: scipy sparse CSR matrix, shape (n, n)

The result is a sparse symmetric similarity matrix suitable for all PGCuts losses.

Choosing n_neighbors

  • Typical values: 20–200

  • Larger values produce denser graphs with smoother cuts

  • Smaller values are more sensitive to local structure

  • A good starting point is n_neighbors=100

Step-by-step construction

For more control, use the individual functions:

from pgcuts.graph import knn_graph, gaussian_rbf_kernel

# Step 1: build symmetric KNN distance graph
W_dist = knn_graph(X, n_neighbors=50, mode="distance")

# Step 2: convert distances to RBF similarities
W_sim = gaussian_rbf_kernel(W_dist, sigma=None)  # auto bandwidth

The sigma parameter controls the RBF bandwidth. When None, it uses the median of per-row maximum distances.

Edge pairs for mini-batch training

Training requires sampling edges from the graph. Extract the edge list and use ShuffledRangeDataset:

import numpy as np
from torch.utils.data import DataLoader
from pgcuts.utils.data import ShuffledRangeDataset

pairs = np.array(W.nonzero()).T  # shape (E, 2)
dataset = ShuffledRangeDataset(n=pairs.shape[0], k=8192)
loader = DataLoader(dataset, batch_size=1, num_workers=4)

for batch_idx in loader:
    batch_pairs = pairs[batch_idx[0]]  # (k, 2)
    # ... training step ...

Extracting edge weights and node maps

For each batch of edge pairs, use get_pairs_unique_map() to deduplicate nodes (since many edges share endpoints):

from pgcuts.utils import get_pairs_unique_map

unique_idx, left_idx, right_idx = get_pairs_unique_map(batch_pairs)
# unique_idx: indices of unique nodes in X
# left_idx: indices into unique_idx for left endpoints
# right_idx: indices into unique_idx for right endpoints

x_batch = X_tensor[unique_idx]   # only forward unique nodes
p = softmax(network(x_batch))
p_left = p[left_idx]
p_right = p[right_idx]

Vertex degrees

For NCut, you need per-vertex degrees:

import numpy as np

degrees = np.array(W.sum(axis=1)).flatten()  # shape (n,)