2026

mHC: How DeepSeek Scaled Residual Connections Without Breaking Training

Modern deep learning rests on the residual connection. After a decade of stacking layers deeper, researchers at DeepSeek asked a different question: what if we scaled width instead? Their answer, Manifold-Constrained Hyper-Connections (mHC), fixes a long-standing stability problem with width scaling.

In this post, I'll walk through the evolution from basic residuals to mHC, explaining why each step was necessary and how DeepSeek's solution actually works at scale.
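To make the depth-vs-width contrast concrete before the full walkthrough, here's a toy PyTorch sketch: a standard single-stream residual block next to a naive width-scaled variant that keeps several parallel residual streams and mixes them with a learned matrix. This is my own illustrative approximation of the Hyper-Connections idea, not DeepSeek's code, and it deliberately omits the manifold constraint that mHC adds to keep the mixing stable.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Standard single-stream residual: y = x + f(norm(x))."""

    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.f = nn.Linear(dim, dim)  # stand-in for an attention/MLP sublayer

    def forward(self, x):  # x: (batch, dim)
        return x + self.f(self.norm(x))


class WidthScaledResidual(nn.Module):
    """Toy width-scaled residual: n parallel streams mixed by a learned
    n x n matrix. Unconstrained mixing like this is the kind of thing
    that destabilizes training at scale; mHC's fix is to constrain the
    mixing to a manifold (omitted in this sketch)."""

    def __init__(self, dim, n_streams=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.f = nn.Linear(dim, dim)
        self.mix = nn.Parameter(torch.eye(n_streams))  # stream-to-stream mixing
        self.spread = nn.Parameter(torch.full((n_streams,), 1.0 / n_streams))

    def forward(self, xs):  # xs: (n_streams, batch, dim)
        mixed = torch.einsum("ij,jbd->ibd", self.mix, xs)  # mix the streams
        h = self.f(self.norm(mixed.sum(dim=0)))            # layer sees pooled streams
        return mixed + self.spread[:, None, None] * h      # write back to each stream
```

Roughly speaking, the learned `mix` is what can drift away from an identity-like map as depth grows; constraining it is the "manifold" in mHC's name.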

Enterprise RAG Challenge 3: Winning Approaches for Autonomous AI Agents

The Enterprise RAG Challenge 3 (ERC3) just wrapped up. 524 teams, more than 341,000 agent runs, and only 0.4% of teams hit a perfect score. With the leaderboard and write-ups now public, I went through the winning solutions to figure out what the top teams did differently.

This post covers what ERC3 is, what the tasks looked like, and the patterns I kept seeing in the architectures that worked.

The Complete Guide to LLM Fine-Tuning in 2025: From Theory to Production

Most fine-tuning projects I've seen fail not in training, but in the steps before and after it: bad data, the wrong base model, no real eval. The actual training is the easy part. This guide is what I wish I'd had before my first serious fine-tune: when to do it, when not to, the methods that work today (LoRA, QLoRA, DoRA, ORPO), and how to get a model from a notebook to a serving endpoint.
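As a taste of the tooling the guide covers, here's a minimal LoRA setup using the Hugging Face peft library. The base model and every hyperparameter here are placeholder choices for illustration, not recommendations from the guide.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Placeholder base model; picking the right one is half the battle.
base = "meta-llama/Llama-3.1-8B"

model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

lora = LoraConfig(
    r=16,                                 # rank of the low-rank adapter matrices
    lora_alpha=32,                        # scaling applied to the adapter update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of the base weights
```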