Paper Review

mHC: How DeepSeek Scaled Residual Connections Without Breaking Training

The success of modern deep learning rests on a deceptively simple idea: the residual connection. Yet after a decade of stacking layers ever deeper, researchers at DeepSeek asked a different question: what if we could widen the residual stream instead? Their answer, Manifold-Constrained Hyper-Connections (mHC), addresses a fundamental training instability that has blocked this path for years.
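As a refresher before the walkthrough, a residual connection simply adds a block's input back to its output, so the network learns a correction on top of the identity. The sketch below is a minimal plain-PyTorch illustration of that standard pattern (the class name and the choice of an MLP sublayer are illustrative, not from the paper); it is this single-stream design that Hyper-Connections and mHC generalize.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Standard residual connection: output = x + f(x)."""

    def __init__(self, dim: int):
        super().__init__()
        # f can be any sublayer; a small MLP is used here for illustration.
        self.f = nn.Sequential(
            nn.Linear(dim, dim),
            nn.GELU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The identity path (the bare `x`) lets gradients flow unchanged
        # through arbitrarily many stacked blocks.
        return x + self.f(x)

x = torch.randn(2, 16)
print(ResidualBlock(16)(x).shape)  # torch.Size([2, 16])
```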

This article breaks down the evolution from basic residuals to mHC, explaining why each step was necessary and what makes DeepSeek's solution work at scale.