Graph Construction ================== .. note:: If you use the :class:`~pgcuts.HyCut` class, graph construction is handled automatically. This page is for advanced users who want to build graphs manually or customize the pipeline. All PGCuts algorithms operate on a weighted graph where nodes are data points and edge weights encode similarity. The ``pgcuts.graph`` module provides the graph construction pipeline. Standard pipeline ----------------- The recommended function is :func:`~pgcuts.graph.build_rbf_knn_graph`, which performs the full pipeline: 1. **KNN distances** -- compute k-nearest neighbor distances 2. **Symmetrize** -- ``(W + W^T) / 2`` 3. **Row-normalize** -- ``W / W.sum(axis=1)`` 4. **Scale by degree** -- multiply by the number of neighbors per node 5. **Gaussian RBF** -- ``exp(-d^2 / 2)`` 6. **Final symmetrization** .. code-block:: python from pgcuts.graph import build_rbf_knn_graph # X: numpy array, shape (n, d) W = build_rbf_knn_graph(X, n_neighbors=100) # Returns: scipy sparse CSR matrix, shape (n, n) The result is a sparse symmetric similarity matrix suitable for all PGCuts losses. Choosing ``n_neighbors`` ~~~~~~~~~~~~~~~~~~~~~~~~ - Typical values: 20--200 - Larger values produce denser graphs with smoother cuts - Smaller values are more sensitive to local structure - A good starting point is ``n_neighbors=100`` Step-by-step construction -------------------------- For more control, use the individual functions: .. code-block:: python from pgcuts.graph import knn_graph, gaussian_rbf_kernel # Step 1: build symmetric KNN distance graph W_dist = knn_graph(X, n_neighbors=50, mode="distance") # Step 2: convert distances to RBF similarities W_sim = gaussian_rbf_kernel(W_dist, sigma=None) # auto bandwidth The ``sigma`` parameter controls the RBF bandwidth. When ``None``, it uses the median of per-row maximum distances. Edge pairs for mini-batch training ----------------------------------- Training requires sampling edges from the graph. Extract the edge list and use :class:`~pgcuts.utils.data.ShuffledRangeDataset`: .. code-block:: python import numpy as np from torch.utils.data import DataLoader from pgcuts.utils.data import ShuffledRangeDataset pairs = np.array(W.nonzero()).T # shape (E, 2) dataset = ShuffledRangeDataset(n=pairs.shape[0], k=8192) loader = DataLoader(dataset, batch_size=1, num_workers=4) for batch_idx in loader: batch_pairs = pairs[batch_idx[0]] # (k, 2) # ... training step ... Extracting edge weights and node maps ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ For each batch of edge pairs, use :func:`~pgcuts.utils.get_pairs_unique_map` to deduplicate nodes (since many edges share endpoints): .. code-block:: python from pgcuts.utils import get_pairs_unique_map unique_idx, left_idx, right_idx = get_pairs_unique_map(batch_pairs) # unique_idx: indices of unique nodes in X # left_idx: indices into unique_idx for left endpoints # right_idx: indices into unique_idx for right endpoints x_batch = X_tensor[unique_idx] # only forward unique nodes p = softmax(network(x_batch)) p_left = p[left_idx] p_right = p[right_idx] Vertex degrees -------------- For NCut, you need per-vertex degrees: .. code-block:: python import numpy as np degrees = np.array(W.sum(axis=1)).flatten() # shape (n,)