On January 24, Perplexity released an assistant for Android phones. On January 23, OpenAI previewed the Operator AI agent that can “go to the web to perform tasks for you”. On January 24, Meta said its AI ambitions include a massive data centre. The same day, Google said Gemini can now control a smart home.
Individually, each is a significant leap in AI; together, they amount to even more.
Except that something else, which happened on January 20, overshadowed all of these. China’s relatively unknown DeepSeek launched a new generation of AI models that compete with the ones developed by US Big Tech, but at a fraction of the cost.
Suddenly, everyone is talking only about DeepSeek, whose launches also highlight that US sanctions meant to slow China’s AI progress haven’t really worked. It took a week, but the attention on DeepSeek made its AI assistant the top-rated free application on Apple’s App Store in the United States. The app has also clocked more than a million downloads on Google’s Play Store for Android phones.
Moreover, enthusiasm around DeepSeek sparked a rout in US markets on Monday, pummelling US AI companies that have soared over the past 18 months.
The Nasdaq plunged more than 3% in early trade as chip giant Nvidia, a US pacesetter in the race towards AI, fell 13%, a hit of $465 billion in market value, the biggest in US market history.
Worse still, DeepSeek, which outdoes other AI models on almost all the metrics that matter — the cost of training, access to hardware, capability and availability — isn’t alone. Another Chinese firm, Moonshot AI, has released a chatbot called Kimi Chat, which supposedly has the same capabilities as OpenAI’s latest-generation o1 large language model (LLM).
DeepSeek claims to have spent around $5.5 million to train its V3 model, a remarkably frugal route to results that took the likes of Google, OpenAI and Meta hundreds of millions of dollars in investment to achieve.
According to research by Epoch.AI, Google and OpenAI spent roughly between $70 million and $100 million in 2023 to train the Gemini 1.0 Ultra and GPT-4 frontier models respectively.
What stands out from the information released by DeepSeek is the frugality of its hardware, too.
“I was trained on a combination of Nvidia A100 and H100 GPUs,” the DeepSeek chatbot tells us. It doesn’t share an exact number, and this is specific to the R1 model.
DeepSeek CEO Liang Wenfeng is a billionaire who runs the hedge fund funding DeepSeek, which has reportedly hired top talent from other Chinese tech companies including ByteDance and Tencent.
To be sure, DeepSeek is clearly careful about its responses on China.
For instance, in response to a question from this writer on a list of challenges, including human rights ones, facing China, DeepSeek momentarily listed several, including internet censorship, the urban-rural divide, housing market complexities and the treatment of Uyghur Muslims in Xinjiang, before the answer was erased and replaced with a simple “Sorry, that’s beyond my current scope. Let’s talk about something else.”
It was a lot more forthcoming on economic challenges facing China, and also on the economic and social challenges faced by India and the US.
DeepSeek, it emerges, has been at it for a while now; it’s just that no one was really looking. The DeepSeek Coder was released in late 2023, and through 2024 it was followed by the 67-billion-parameter DeepSeek LLM, DeepSeek V2, a more advanced DeepSeek Coder V2 with 236 billion parameters, the 671-billion-parameter DeepSeek V3, as well as the 32-billion and 70-billion-parameter versions of the DeepSeek R1.
“A joke of a budget” is how Andrej Karpathy, founder of EurekaLabsAI, describes the company’s achievement of doing all this with its stated training spend. He isn’t the only one.
“DeepSeek is now number 1 on the App Store, surpassing ChatGPT—no NVIDIA supercomputers or $100M needed. The real treasure of AI isn’t the UI or the model—they’ve become commodities. The true value lies in data and metadata, the oxygen fuelling AI’s potential,” wrote Marc Benioff, CEO of Salesforce, in a post on X.
Analysts are already calling this the tipping point of AI economics. It’s easy to see why: DeepSeek R1’s API costs just $0.55 per million input tokens and $2.19 per million output tokens. In comparison, OpenAI’s API usually costs around $15 per million input and $60 per million output tokens.
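To put those prices in perspective, here is a back-of-the-envelope comparison in Python, using only the per-million-token rates quoted above; the workload size is an arbitrary illustration, not a real usage figure.

```python
# Back-of-the-envelope cost comparison using only the per-million-token
# rates quoted above. The workload size is an arbitrary illustration.
deepseek_r1 = {"input": 0.55, "output": 2.19}   # $ per million tokens
openai_o1 = {"input": 15.00, "output": 60.00}   # $ per million tokens

def workload_cost(prices, input_millions=100, output_millions=50):
    """Dollar cost for a workload of the given millions of tokens."""
    return (prices["input"] * input_millions
            + prices["output"] * output_millions)

print(f"DeepSeek R1: ${workload_cost(deepseek_r1):,.2f}")  # $164.50
print(f"OpenAI o1:   ${workload_cost(openai_o1):,.2f}")    # $4,500.00
```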
Much like OpenAI’s o1 model, the R1 uses reinforcement learning, or RL. This means models learn through trial and error and self-improve through algorithmic rewards, which develops reasoning capabilities. Models learn by receiving feedback based on their interactions.
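For readers curious about the mechanics, here is a deliberately simplified Python sketch of that trial-and-error loop; the “verifier” and the update rule are hypothetical stand-ins, not DeepSeek’s actual training pipeline.

```python
import random

# A simplified sketch of reinforcement learning: the model tries answers,
# a reward signal scores them, and preferences shift toward higher-reward
# behaviour. This is illustrative, not DeepSeek's actual pipeline.
actions = ["answer_a", "answer_b"]
preferences = {a: 0.0 for a in actions}  # the "policy" being learned

def reward(action):
    # Hypothetical verifier: pretend answer_b is the correct reasoning path.
    return 1.0 if action == "answer_b" else 0.0

LEARNING_RATE = 0.1
for _ in range(1000):
    # Trial: sample an action, favouring those with higher preference.
    action = random.choices(actions,
                            weights=[preferences[a] + 1 for a in actions])[0]
    # Error feedback: nudge the preference by the reward received.
    preferences[action] += LEARNING_RATE * reward(action)

print(preferences)  # answer_b ends up strongly preferred
```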
With R1, DeepSeek realigned the traditional approach to AI models. Traditional generative and contextual AI uses 32-bit floating points (a floating point is a way to encode large and small numbers). DeepSeek’s approach uses an 8-bit floating point, without compromising accuracy. In fact, it is better than GPT-4 and Claude in many tasks. The result: as much as 75% less memory needed to run the AI.
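The memory claim is easy to sanity-check. The sketch below does the raw arithmetic for storing 671 billion parameters at 32 bits versus 8 bits each; real mixed-precision systems keep some tensors at higher precision, so treat this as an upper-bound illustration.

```python
# Raw memory arithmetic behind the "75% less memory" claim: 8 bits per
# parameter instead of 32 cuts the footprint to a quarter.
PARAMS = 671e9  # DeepSeek V3's stated total parameter count

fp32_bytes = PARAMS * 4  # 32-bit floats: 4 bytes per parameter
fp8_bytes = PARAMS * 1   # 8-bit floats: 1 byte per parameter

print(f"FP32: {fp32_bytes / 1e12:.1f} TB")          # ~2.7 TB
print(f"FP8:  {fp8_bytes / 1e12:.1f} TB")           # ~0.7 TB
print(f"Saving: {1 - fp8_bytes / fp32_bytes:.0%}")  # 75%
```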
Then there is the multi-token system, which reads entire phrases and sets of words at once instead of one token at a time, in sequence. That means the AI can respond roughly twice as fast.
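A rough sketch of why the speed-up follows: if a model emits two tokens per step instead of one, it needs half as many forward passes for the same reply. The numbers below are hypothetical, chosen only to illustrate the arithmetic.

```python
# Why emitting several tokens per step speeds up generation: the number
# of forward passes falls in proportion. Hypothetical numbers.
def passes_needed(output_tokens: int, tokens_per_pass: int) -> int:
    return -(-output_tokens // tokens_per_pass)  # ceiling division

RESPONSE_TOKENS = 500  # length of a typical reply, for illustration

print(passes_needed(RESPONSE_TOKENS, 1))  # 500 passes, one token at a time
print(passes_needed(RESPONSE_TOKENS, 2))  # 250 passes: roughly 2x faster
```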
DeepSeek’s Mixture-of-Experts (MoE) language model is an evolution too. DeepSeek V3, for instance, has 671 billion parameters in total but activates only 37 billion for each token; the key is that these are the parameters most relevant to that specific token.
“Instead of one massive AI trying to know everything (like having one person be a doctor, lawyer, and engineer), they have specialised experts that only wake up when needed,” explains Morgan Brown, VP of Product & Growth — AI, at Dropbox. Traditional models tend to keep all parameters active for each token and query.
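Here is a toy sketch of that routing idea, assuming a simple top-k router; the expert count and random scoring are made up for readability, and DeepSeek V3’s actual router is far larger and learned from data.

```python
import random

# A toy sketch of Mixture-of-Experts routing with a simple top-k router:
# each token is scored against every expert and only the best-scoring
# few "wake up". Illustrative only; not DeepSeek's actual router.
NUM_EXPERTS = 8       # the real model has many more experts
ACTIVE_PER_TOKEN = 2  # only a small fraction run for any one token

def route(token: str) -> list[int]:
    # Hypothetical router: score each expert's relevance to this token.
    scores = {e: random.random() for e in range(NUM_EXPERTS)}
    # Activate only the top-k most relevant experts; the rest stay idle.
    return sorted(scores, key=scores.get, reverse=True)[:ACTIVE_PER_TOKEN]

for token in ["The", "patient", "needs", "surgery"]:
    print(f"{token!r} -> experts {route(token)} (others stay idle)")
```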
There is, of course, the apprehension associated with DeepSeek, Moonshot AI and all other tech companies from China. Questions about any Chinese tech company’s proximity (known or otherwise) to the government will always be in the spotlight when it comes to sharing data.
There is also a lack of clarity about Chinese tech’s access to the latest generation of GPUs and AI chips in general. SemiAnalysis’ Dylan Patel estimates DeepSeek has 50,000 Nvidia GPUs, and not 10,000 as some online chatter seems to suggest.
The Nvidia A100 (around $16,000 each; launched in 2020) and H100 (a $30,000 chip launched in 2022) aren’t cutting-edge chips compared with what Silicon Valley has access to, but it isn’t clear how a Chinese tech company got its hands on them.
The company hasn’t officially detailed these specifics. It is unlikely the world will ever know all the hardware that was in play, or how it was sourced. That, though, could reveal the true cost of making R1, and the models that preceded it.