Quickstart

PGCuts provides an sklearn-compatible API. If you’ve used KMeans or SpectralClustering, you already know how to use PGCuts.

Basic usage

from pgcuts import HyCut

# X is an (N, D) numpy array of features (e.g., from DINOv2)
labels = HyCut(n_clusters=10).fit_predict(X)

That’s it. Under the hood, H-Cut:

  1. Builds a KNN similarity graph on X

  2. Trains a linear model to minimize the hypergeometric NCut bound

  3. Returns hard cluster assignments via argmax

Comparison with sklearn

PGCuts is a drop-in alternative to sklearn’s clustering:

from sklearn.cluster import KMeans, SpectralClustering
from pgcuts import HyCut

X = ...  # (N, D) feature matrix

# K-Means
labels_km = KMeans(n_clusters=10).fit_predict(X)

# Spectral Clustering
labels_sc = SpectralClustering(n_clusters=10).fit_predict(X)

# PGCuts (Hypergeometric NCut)
labels_hycut = HyCut(n_clusters=10).fit_predict(X)

Choosing the objective

PGCuts supports three graph cut objectives:

# Hypergeometric NCut (default) — best for most cases
HyCut(n_clusters=K, objective="hyp_ncut")

# Hypergeometric RatioCut — simpler, no degree binning
HyCut(n_clusters=K, objective="hyp_rcut")

# Probabilistic RatioCut — PRCut baseline
HyCut(n_clusters=K, objective="prcut")

When to use which:

  • hyp_ncut: Default. Best when cluster sizes are unbalanced or the graph has heterogeneous degree distribution.

  • hyp_rcut: Simpler variant. Works well when clusters are roughly equal-sized.

  • prcut: Original PRCut objective. Useful as a baseline.

Common options

model = HyCut(
    n_clusters=10,
    objective="hyp_ncut",    # "hyp_ncut", "hyp_rcut", or "prcut"
    n_neighbors=50,          # KNN graph neighbors (default: 50)
    steps=3000,              # optimization steps (default: 3000)
    lr=1e-3,                 # learning rate (default: 1e-3)
    m=512,                   # polynomial degree for 2F1 bound
    device="cuda",           # "cuda" or "cpu"
)
labels = model.fit_predict(X)

# Access cut values after fitting
print(f"NCut: {model.ncut_:.4f}, RCut: {model.rcut_:.4f}")

Evaluation

from pgcuts import evaluate_clustering

results = evaluate_clustering(y_true, labels, n_clusters=10)
print(f"ACC: {results['accuracy']:.4f}")
print(f"NMI: {results['nmi']:.4f}")

Working with embeddings

PGCuts works best with pre-extracted embeddings from foundation models (DINOv2, CLIP, etc.):

import numpy as np
from pgcuts import HyCut

# Load pre-extracted features
X = np.load("features.npy")       # (N, D) float32
y = np.load("labels.npy")         # (N,) int, for evaluation only

model = HyCut(n_clusters=len(np.unique(y)))
labels = model.fit_predict(X)

Graph quality

Before clustering, check if the KNN graph captures class structure:

from pgcuts.graph import build_rbf_knn_graph
from pgcuts.metrics import compute_rcut_ncut
import numpy as np

W = build_rbf_knn_graph(X, n_neighbors=50)

# Graph quality Q: 1.0 = perfect, 0.0 = random
T = W.toarray() / W.sum(axis=1)
q = np.mean([T[i] @ (y == y[i]) for i in range(len(y))])
q_chance = np.sum((np.bincount(y) / len(y)) ** 2)
Q = (q - q_chance) / (1 - q_chance)
print(f"Graph quality Q = {Q:.3f}")

If Q < 0.3, the embeddings don’t separate classes well enough for any graph-cut method to work. Try a better embedding model.