
AI Search | The Insane Engineering of Deepseek V4

AI Search explores the technical architecture behind DeepSeek V4, detailing how a compact team achieved massive scale despite limited computational resources.

The analysis breaks down innovations in hybrid attention, manifold-constrained hyper-connections (mHC), and optimized training pipelines that let this 1.6-trillion-parameter model handle a 1-million-token context window efficiently.


00:00 - DeepSeek V4 intro

01:00 - DeepSeek V4 specs

02:06 - The challenge of 1M context

04:16 - Hybrid attention

05:11 - CSA & sparse selection

06:50 - HCA

08:22 - Sliding window attention

10:44 - Insane efficiency gains

12:02 - Signal explosion

13:00 - Residual connections

13:52 - mHC

14:17 - ChatLLM

15:24 - mHC continued

17:54 - Muon

19:26 - Infra challenges

22:31 - Training challenges

24:09 - Anticipatory routing

25:24 - SOTA results

