Portfolio Optimization, Part 5: Hierarchical and Clustering Methods

The matrix-inversion problem

Classical mean-variance optimization has a quiet structural flaw: it inverts the full covariance matrix. That matrix is estimated from noisy data, it is often nearly singular when assets are correlated, and inverting it amplifies the noise. The result is the infamous instability of mean-variance weights — tiny changes in inputs produce wildly different, often absurdly concentrated portfolios. It is the "error maximizer" we met in Part 1, seen from the linear-algebra side.
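To make the instability concrete, here is a minimal sketch using the unconstrained minimum-variance portfolio, whose weights are proportional to the inverse covariance matrix applied to a vector of ones. The two-asset numbers and the helper names (min_variance_weights, two_asset_cov) are illustrative assumptions, not something taken from the gallery.

```python
import numpy as np


def min_variance_weights(cov):
    """Unconstrained minimum-variance weights, proportional to inv(cov) @ 1."""
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)  # solves cov @ raw = 1, i.e. applies the inverse
    return raw / raw.sum()


def two_asset_cov(vol_a, vol_b, rho):
    """Covariance matrix for two assets with the given volatilities and correlation."""
    cross = rho * vol_a * vol_b
    return np.array([[vol_a**2, cross], [cross, vol_b**2]])


# Illustrative numbers only: two assets with a 0.98 correlation.
base = two_asset_cov(0.20, 0.21, 0.98)
bumped = two_asset_cov(0.20, 0.20, 0.98)  # second volatility re-estimated one point lower

w_base = min_variance_weights(base)      # roughly [1.65, -0.65]: a levered long/short
w_bumped = min_variance_weights(bumped)  # [0.5, 0.5] by symmetry

print("condition number:", np.linalg.cond(base))
print("weights before:  ", w_base)
print("weights after:   ", w_bumped)
```

A one-point change in a single volatility estimate moves the first weight from about 165% to 50%. The condition number of the covariance matrix is a quick way to see how much the inversion will magnify estimation error.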

Hierarchical methods, pioneered by Marcos López de Prado, sidestep the full inversion. Instead of treating all assets as one undifferentiated block, they first discover the structure of the market (which assets cluster together) and then allocate down through that structure. The resulting portfolios are more stable and more intuitive, and they tend to hold up better out of sample. As ever, output is reviewed research behind the gallery's gates.

HRP: hierarchical risk parity

Hierarchical Risk Parity (HRP) is the anchor of the family and proceeds in three moves:

Cluster. Build a tree (dendrogram) that groups assets by how they co-move, using the correlation structure as a distance.
Quasi-diagonalize. Reorder the covariance matrix so similar assets sit next to each other, revealing the block structure.
Recursive bisection. Split the reordered assets top-down into pairs of branches, giving each branch weight in inverse proportion to its variance, all the way down to individual assets.

No matrix inversion, no expected-return forecast. HRP tends to spread risk sensibly across genuinely distinct groups instead of piling into whatever pair of assets happened to look most efficient in a noisy sample.
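The three moves can be sketched in a few dozen lines with NumPy and SciPy. This is a simplified illustration, not the gallery's implementation: it clusters directly on the correlation distance sqrt((1 - rho) / 2), defaults to single linkage, and bisects the ordered leaf list in halves. The function names and the four-asset example are assumptions made for the sketch.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform


def inverse_variance_weights(cov):
    """Weights proportional to 1 / variance (uses only the diagonal)."""
    ivp = 1.0 / np.diag(cov)
    return ivp / ivp.sum()


def cluster_variance(cov, members):
    """Variance of a branch when its assets are inverse-variance weighted."""
    sub = cov[np.ix_(members, members)]
    w = inverse_variance_weights(sub)
    return float(w @ sub @ w)


def hrp_weights(cov, method="single"):
    cov = np.asarray(cov, dtype=float)
    vol = np.sqrt(np.diag(cov))
    corr = np.clip(cov / np.outer(vol, vol), -1.0, 1.0)

    # 1. Cluster: turn correlations into distances and build the tree.
    dist = np.sqrt(0.5 * (1.0 - corr))
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method=method)

    # 2. Quasi-diagonalize: leaf order puts similar assets next to each other.
    order = [int(i) for i in leaves_list(tree)]

    # 3. Recursive bisection: split each branch in two and allocate
    #    in inverse proportion to the variance of each half.
    weights = np.ones(len(order))
    branches = [order]
    while branches:
        next_branches = []
        for branch in branches:
            if len(branch) < 2:
                continue
            half = len(branch) // 2
            left, right = branch[:half], branch[half:]
            var_left = cluster_variance(cov, left)
            var_right = cluster_variance(cov, right)
            alpha = 1.0 - var_left / (var_left + var_right)
            weights[left] *= alpha
            weights[right] *= 1.0 - alpha
            next_branches += [left, right]
        branches = next_branches
    return weights


# Illustrative universe: two low-vol assets and two high-vol assets,
# highly correlated within each pair and weakly correlated across pairs.
vols = np.array([0.10, 0.10, 0.20, 0.20])
corr = np.array([[1.0, 0.8, 0.1, 0.1],
                 [0.8, 1.0, 0.1, 0.1],
                 [0.1, 0.1, 1.0, 0.8],
                 [0.1, 0.1, 0.8, 1.0]])
example_cov = corr * np.outer(vols, vols)
print(hrp_weights(example_cov))
```

The only inverse taken is of the diagonal variances, which is why the allocation never blows up the way the full inversion can.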

HERC, Nested Clusters, and Schur

The family extends HRP in three useful directions:

HERC (Hierarchical Equal Risk Contribution) marries the clustering tree with equal-risk-contribution allocation: it equalizes risk across clusters and within them, combining the stability of hierarchy with the discipline of risk parity. It also lets you swap in tail or drawdown risk measures (HERC on CVaR, on CDaR) rather than only variance — the same idea we met when budgeting on CVaR in Part 4.
Nested Clusters Optimization (NCO) optimizes within each cluster and then across clusters as a second stage. By solving small, well-conditioned sub-problems instead of one giant ill-conditioned one, NCO contains the instability and lets you use a conventional optimizer safely inside each cluster.
Schur Complementary Allocation is the newest idea: it uses the Schur complement of the covariance matrix to separate common from residual risk structure, retaining covariance information that HRP discards while keeping the stability benefits. Think of it as a bridge between hierarchical robustness and full-covariance optimality.
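Of the three, NCO is the easiest to sketch in code. The version below assumes the cluster labels are already known and uses an unconstrained minimum-variance optimizer at both stages; both are simplifying assumptions, and nco_weights and min_variance_weights are hypothetical names rather than a library API.

```python
import numpy as np


def min_variance_weights(cov):
    """Unconstrained minimum-variance weights, proportional to inv(cov) @ 1."""
    raw = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return raw / raw.sum()


def nco_weights(cov, labels, optimizer=min_variance_weights):
    """Two-stage allocation: optimize within each cluster, then across clusters."""
    cov = np.asarray(cov, dtype=float)
    labels = np.asarray(labels)
    cluster_ids = np.unique(labels)

    # Stage 1: intra[i, k] is the weight of asset i inside cluster k.
    intra = np.zeros((cov.shape[0], len(cluster_ids)))
    for k, cluster_id in enumerate(cluster_ids):
        members = np.flatnonzero(labels == cluster_id)
        intra[members, k] = optimizer(cov[np.ix_(members, members)])

    # Stage 2: treat each cluster portfolio as one asset and optimize across them.
    reduced_cov = intra.T @ cov @ intra
    inter = optimizer(reduced_cov)

    # Final weight = weight inside the cluster times the cluster's weight.
    return intra @ inter


# Illustrative universe: two clusters of two assets each.
vols = np.array([0.10, 0.10, 0.20, 0.20])
corr = np.array([[1.0, 0.8, 0.1, 0.1],
                 [0.8, 1.0, 0.1, 0.1],
                 [0.1, 0.1, 1.0, 0.8],
                 [0.1, 0.1, 0.8, 1.0]])
example_cov = corr * np.outer(vols, vols)
example_labels = [0, 0, 1, 1]
print(nco_weights(example_cov, example_labels))
```

Each call to the optimizer sees only a small matrix: one per cluster, plus one with as many rows as there are clusters. That is where the containment comes from. Any conventional optimizer with the same signature could be passed in place of the minimum-variance one.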

Reading the dendrogram

The signature artifact of this family is the dendrogram — the cluster tree — and reviewing it is how you sanity-check the whole result. Look at:

The cluster structure. Do the groupings make economic sense? If two assets you know are closely related sit in distant branches, the correlation estimate may be off.
The linkage method. How clusters are merged (single, complete, average, Ward linkage) changes the tree shape and therefore the weights; the gallery exposes it as a parameter.
Cluster-level concentration. Hierarchy spreads risk across clusters, but a single cluster can still dominate if it contains many assets — check the cluster-level risk contributions, not just the asset-level ones.
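The cluster-level check can be sketched as follows, assuming you already have the weights, the covariance matrix, and a cluster label per asset. An asset's risk contribution is its weight times its marginal risk, divided by portfolio variance; summing by label gives the cluster view. The function names and the five-asset example are illustrative assumptions.

```python
import numpy as np


def risk_contributions(weights, cov):
    """Fraction of portfolio variance attributable to each asset (sums to 1)."""
    weights = np.asarray(weights, dtype=float)
    marginal = np.asarray(cov, dtype=float) @ weights
    return weights * marginal / (weights @ marginal)


def cluster_risk_contributions(weights, cov, labels):
    """Sum asset-level risk contributions within each cluster label."""
    rc = risk_contributions(weights, cov)
    labels = np.asarray(labels)
    return {c.item(): float(rc[labels == c].sum()) for c in np.unique(labels)}


# Illustrative universe: four tightly correlated assets plus one outsider,
# all with 20% volatility, held at equal weight.
n = 5
corr = np.full((n, n), 0.1)
corr[:4, :4] = 0.7
np.fill_diagonal(corr, 1.0)
example_cov = corr * 0.20**2
example_weights = np.full(n, 1.0 / n)
example_labels = [0, 0, 0, 0, 1]

by_cluster = cluster_risk_contributions(example_weights, example_cov, example_labels)
print(by_cluster)  # cluster 0 holds 80% of the capital but about 90% of the risk
```

At the asset level this portfolio looks evenly spread. The cluster view shows that one group carries nearly all the risk.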

When hierarchy wins

Hierarchical methods are at their best with large, correlated universes where classical optimization is most fragile — broad equity books, multi-asset sleeves, anything where the covariance matrix is big and ill-conditioned. They are also a strong default when you have no return views and want a robust, explainable-by-structure allocation. They are less compelling for tiny universes (where there's little structure to discover) or when you have high-conviction views to express — that's the Bayesian family's job, coming up next.

The takeaway

Hierarchical methods replace fragile matrix inversion with structure discovery: HRP clusters and allocates through the tree, HERC adds equal-risk-contribution discipline, NCO contains instability by optimizing within clusters, and Schur allocation recovers more covariance information while staying robust. Read the dendrogram to trust the result. Next in the series: views and Bayesian methods — Black-Litterman, entropy pooling, and opinion pooling, for expressing genuine conviction without blowing up the portfolio.
