Quickstart
PGCuts provides an sklearn-compatible API. If you’ve used
KMeans or SpectralClustering, you already know how to use PGCuts.
Basic usage
from pgcuts import HyCut
# X is an (N, D) numpy array of features (e.g., from DINOv2)
labels = HyCut(n_clusters=10).fit_predict(X)
That’s it. Under the hood, H-Cut:
Builds a KNN similarity graph on
XTrains a linear model to minimize the hypergeometric NCut bound
Returns hard cluster assignments via argmax
Comparison with sklearn
PGCuts is a drop-in alternative to sklearn’s clustering:
from sklearn.cluster import KMeans, SpectralClustering
from pgcuts import HyCut
X = ... # (N, D) feature matrix
# K-Means
labels_km = KMeans(n_clusters=10).fit_predict(X)
# Spectral Clustering
labels_sc = SpectralClustering(n_clusters=10).fit_predict(X)
# PGCuts (Hypergeometric NCut)
labels_hycut = HyCut(n_clusters=10).fit_predict(X)
Choosing the objective
PGCuts supports three graph cut objectives:
# Hypergeometric NCut (default) — best for most cases
HyCut(n_clusters=K, objective="hyp_ncut")
# Hypergeometric RatioCut — simpler, no degree binning
HyCut(n_clusters=K, objective="hyp_rcut")
# Probabilistic RatioCut — PRCut baseline
HyCut(n_clusters=K, objective="prcut")
When to use which:
hyp_ncut: Default. Best when cluster sizes are unbalanced or the graph has heterogeneous degree distribution.hyp_rcut: Simpler variant. Works well when clusters are roughly equal-sized.prcut: Original PRCut objective. Useful as a baseline.
Common options
model = HyCut(
n_clusters=10,
objective="hyp_ncut", # "hyp_ncut", "hyp_rcut", or "prcut"
n_neighbors=50, # KNN graph neighbors (default: 50)
steps=3000, # optimization steps (default: 3000)
lr=1e-3, # learning rate (default: 1e-3)
m=512, # polynomial degree for 2F1 bound
device="cuda", # "cuda" or "cpu"
)
labels = model.fit_predict(X)
# Access cut values after fitting
print(f"NCut: {model.ncut_:.4f}, RCut: {model.rcut_:.4f}")
Evaluation
from pgcuts import evaluate_clustering
results = evaluate_clustering(y_true, labels, n_clusters=10)
print(f"ACC: {results['accuracy']:.4f}")
print(f"NMI: {results['nmi']:.4f}")
Working with embeddings
PGCuts works best with pre-extracted embeddings from foundation models (DINOv2, CLIP, etc.):
import numpy as np
from pgcuts import HyCut
# Load pre-extracted features
X = np.load("features.npy") # (N, D) float32
y = np.load("labels.npy") # (N,) int, for evaluation only
model = HyCut(n_clusters=len(np.unique(y)))
labels = model.fit_predict(X)
Graph quality
Before clustering, check if the KNN graph captures class structure:
from pgcuts.graph import build_rbf_knn_graph
from pgcuts.metrics import compute_rcut_ncut
import numpy as np
W = build_rbf_knn_graph(X, n_neighbors=50)
# Graph quality Q: 1.0 = perfect, 0.0 = random
T = W.toarray() / W.sum(axis=1)
q = np.mean([T[i] @ (y == y[i]) for i in range(len(y))])
q_chance = np.sum((np.bincount(y) / len(y)) ** 2)
Q = (q - q_chance) / (1 - q_chance)
print(f"Graph quality Q = {Q:.3f}")
If Q < 0.3, the embeddings don’t separate classes well enough
for any graph-cut method to work. Try a better embedding model.