
DeepSeek and the Effects of GPU Export Controls

Last week, DeepSeek unveiled their V3 model, trained on just 2,048 H800 GPUs – a fraction of the hardware used by OpenAI or Meta. DeepSeek claims the model matches or exceeds GPT-4 and Claude on many benchmarks.

What’s interesting is not just the results, but how they got there.

The Numbers Game

Let’s look at the raw numbers:

  • Training cost: $5.5M (vs an estimated $40M for GPT-4; quick sanity check below)
  • GPU count: 2,048 H800s (vs an estimated 20,000+ H100s at the large labs)
  • Parameters: 671B (Mixture-of-Experts, roughly 37B activated per token)
  • Training compute: 2.788M H800 GPU hours
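
As a quick sanity check on the headline number: DeepSeek's own estimate prices the run at roughly $2 per H800 GPU-hour, and that assumption times the reported GPU-hours lands almost exactly on the $5.5M figure. A minimal back-of-the-envelope version:

```python
# Back-of-the-envelope check: reported GPU-hours times DeepSeek's assumed
# rental rate of ~$2 per H800 GPU-hour reproduces the headline cost figure.
gpu_hours = 2.788e6       # H800 GPU-hours reported for the V3 training run
usd_per_gpu_hour = 2.0    # rental price DeepSeek assumes in its own estimate

cost = gpu_hours * usd_per_gpu_hour
print(f"Estimated training cost: ${cost / 1e6:.2f}M")  # -> $5.58M
```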

Recent research estimates that frontier model training costs have grown about 2.4x per year since 2016. The common assumption is that you need ever-larger GPU clusters to compete at the frontier. DeepSeek suggests otherwise.
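
To get a feel for what 2.4x per year means, here is a small compounding sketch, taking that growth figure at face value (the multipliers are illustrative, not a forecast):

```python
# Compounding the ~2.4x/year cost-growth figure cited above (taken at face
# value): at that rate, frontier training costs grow by roughly an order of
# magnitude every two to three years.
growth = 2.4  # cost multiplier per year

for years in (1, 2, 3, 5, 8):
    print(f"{years} year(s): ~{growth ** years:,.0f}x the starting cost")
# Eight years at this rate (e.g. 2016 to 2024) works out to roughly 1,100x.
```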

Export Controls: Task Failed?

The US has banned exports of high-end GPUs to China to slow its AI development. As a result, DeepSeek has to work with H800s – deliberately cut-down versions of the H100 with roughly half the interconnect bandwidth. But this constraint may have accidentally spurred innovation.

Instead of throwing more compute at the problem, DeepSeek focuses on architectural efficiency (a rough memory sketch follows the list):

  • FP8 mixed-precision training
  • Algorithm and infrastructure co-optimization
  • Custom training frameworks
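
To make the efficiency point concrete, here is a rough sketch of what precision alone does to raw weight storage, using the parameter count from the list above and standard byte sizes per format. It ignores activations, gradients, and optimizer state, so treat it as an illustration of the ratios, not a full accounting:

```python
# Rough memory math: why lower precision matters on bandwidth-limited GPUs.
# Only raw weight storage is counted here; the point is that each halving of
# precision roughly halves the bytes that must be stored and moved.
PARAMS = 671e9  # DeepSeek-V3 total parameter count, from the post

for fmt, bytes_per_param in (("FP32", 4), ("BF16", 2), ("FP8", 1)):
    print(f"{fmt}: ~{PARAMS * bytes_per_param / 1e9:,.0f} GB of weights")
# FP32: ~2,684 GB   BF16: ~1,342 GB   FP8: ~671 GB
```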

They can’t access unlimited hardware, so they have to make the hardware they do have work smarter. The constraint forces them to solve a different, and possibly more valuable, problem.

The High-Flyer Factor

Context is important, though. DeepSeek is no ordinary startup – it is backed by High-Flyer, a quantitative hedge fund managing roughly $8B in assets. Their CEO Liang Wenfeng built High-Flyer from scratch and seems focused on fundamental research over quick profits:

“If the goal is to create applications, using the Llama structure for quick product deployment makes sense. But our destination is AGI, which means we need to research new model structures to realize stronger model capabilities with limited resources.”

Beyond the Hype

We should be careful not to over-interpret these results. Yes, DeepSeek achieves impressive efficiency. No, this does not mean that export controls have “backfired” or that DeepSeek has cracked some magic formula.

What this shows is that the road to better AI isn’t just about throwing more GPUs at the problem. There is still huge room for fundamental improvements in how we train these models.

For developers, this is actually exciting news. It suggests that you don’t need a hyperscaler budget to do meaningful work at the frontier. True innovation can come from the resource-constrained, not just the resource-rich.

The Road Ahead

The DeepSeek paper mentions that they are working on “breaking the architectural limitations of transformers.” Given their track record with efficiency improvements, it’s worth a look.


