
How DeepSeek Changed the Race for AGI Forever: OpenAI, Meta, Google and Nvidia Must Rethink Their AI Development Strategies
The AGI race is on, and it has become a two-way contest between the USA and China. What seemed squarely a race between American tech titans has now seen the entrance of a serious Chinese competitor called DeepSeek, founded by Liang Wenfeng, a 40-year-old quant and electrical engineer from Zhanjiang, Guangdong, China.
Until now, the race has run on money as its fuel. Not anymore. DeepSeek has found a way to train LLMs cheaply through hardware and algorithmic optimizations.
At last check, OpenAI, the maker of ChatGPT, had raised $17.9 billion, including $6.6 billion in its latest funding round in October.
It is public knowledge that frontier LLMs demand heavy capital expenditure, because compute power costs a great deal of money: according to a TechCrunch report, ChatGPT was at one point burning $700,000 daily. The demand for compute has led tech giants such as OpenAI to aim at investing $500 billion in energy infrastructure and data centers, while Microsoft has budgeted $80 billion for 2025 for the same purpose. Google just invested another $1 billion in Anthropic to deliver new products, and Meta is looking at nuclear reactors to power its computing demands. But where will the money come from?
Stargate to The Rescue
Recently, a consortium of companies, in the presence of US President Donald Trump, announced a company called Stargate. Its sole objective is to act as a vehicle to fund the growing need for larger computing infrastructure.
Stargate looks to invest $500 billion, starting with $100 billion. The key actors behind Stargate are Sam Altman (OpenAI CEO), Larry Ellison (Oracle chairman) and Masayoshi Son (SoftBank CEO). Elon Musk claims the consortium is barely funded, but the objective is clear: to ensure the USA wins the AGI race.
Then, a few days ago, the strangest thing happened. A company in China called DeepSeek released a model, DeepSeek-R1, that outperformed frontier LLMs including OpenAI's o1, Meta's Llama 3.1 and Anthropic's Claude 3.5 Sonnet.
Enter DeepSeek R1: Turnaround for the Global AI Ecosystem
DeepSeek was founded by High Flyer, a hedge fund based out of Hangzhou, Zhejiang, China. Both companies are the work of Liang Wenfeng: High Flyer was founded in 2015, DeepSeek in May 2023.
DeepSeek-R1 is open-weight, meaning its trained parameters are published and can be studied and built upon; OpenAI's o1, by comparison, is a black box. R1 is not a fully open-source model, because its training data has not been released, but its weights are free under an MIT license, so other applications can be built on top of it.
How DeepSeek Performs
The advantages stated above are not even the greatest upset. DeepSeek-R1 is far cheaper than other models, costing roughly one-thirtieth of what it costs to run OpenAI's o1: an experiment that takes $300 to run on o1 costs less than $10 on R1. This has sent shock waves through the Western artificial intelligence complex, especially the funding structures that power AI development in the West.
R1 does comparably well to OpenAI's o1 in chemistry, mathematics and coding, but with fewer resources. It beats other open models such as Llama 3.1 at various tasks, and it matches or beats o1 on many, especially mathematics and Chinese language; Anthropic's Claude proves a closer match. DeepSeek reportedly spent about $6 million on hardware to develop R1, while, according to TechTarget, it cost about $640 million to develop Meta's Llama 3.1, more than 100 times as much. Yet DeepSeek's R1 has about 671 billion parameters to Llama 3.1's 405 billion.
Not Exactly a New Kid on The Block: How Exactly Did It Emerge?
This is not DeepSeek's first outing in the Artificial General Intelligence race. Just last month it released V3, an LLM, alongside a chatbot; both are free. The release of R1 on January 20th revealed a frontier model now ranked among the top 10 LLMs, as reported by the Wall Street Journal.
DeepSeek stands out on capability, cost-effectiveness and openness. That it was developed on a shoestring budget of $6 million is a source of worry for Western companies whose investment plans and valuations have been built around intensive use of computing hardware and energy resources.
This feat of developing a cost-effective LLM was born of necessity. According to Live Science: "U.S. export controls, which limit Chinese companies' access to the best AI computing chips, forced R1's developers to build smarter, more energy-efficient algorithms to compensate for their lack of computing power."
Despite having only around 10,000 Nvidia H100 GPUs, the DeepSeek team "has been able to match the performance of American models by devising strategies such as custom communication schemes between chips, reducing the size of fields to save memory, and innovative use of the mix-of-models approach," says Wendy Chang, a software engineer turned policy analyst at the Mercator Institute for China Studies, as reported by Wired. Methodologies such as Multi-head Latent Attention (MLA) and Mixture-of-Experts have also helped DeepSeek's models become cost-effective by requiring fewer computing resources.
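To see why a Mixture-of-Experts design needs fewer computing resources, consider this minimal sketch in Python: a router sends each token through only the top-k of N expert layers, so most parameters sit idle on any given forward pass. The dimensions, expert count and routing scheme below are illustrative assumptions, not DeepSeek's actual architecture.

```python
import math
import random

random.seed(0)

D, N_EXPERTS, TOP_K = 8, 4, 2  # hidden size, expert count, experts active per token

def rand_matrix(rows, cols):
    """Small random dense layer (Gaussian init, scaled by fan-in)."""
    return [[random.gauss(0, 1 / math.sqrt(cols)) for _ in range(cols)]
            for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

# Each expert is a small dense layer; the router scores experts per token.
experts = [rand_matrix(D, D) for _ in range(N_EXPERTS)]
router = rand_matrix(N_EXPERTS, D)

def moe_forward(x):
    """Route one token vector through only the TOP_K highest-scoring experts."""
    scores = matvec(router, x)                        # one score per expert
    top = sorted(range(N_EXPERTS), key=lambda i: scores[i])[-TOP_K:]
    exps = [math.exp(scores[i]) for i in top]
    weights = [e / sum(exps) for e in exps]           # softmax over chosen experts
    out = [0.0] * D
    for w, i in zip(weights, top):                    # only TOP_K experts do any work
        for j, y in enumerate(matvec(experts[i], x)):
            out[j] += w * y
    return out

token = [random.gauss(0, 1) for _ in range(D)]
result = moe_forward(token)
print(len(result))  # prints 8; the token touched 2 of 4 experts' parameters
```

With TOP_K = 2 of N_EXPERTS = 4, only half the expert parameters do any work per token; scaled up, this is how a model can hold hundreds of billions of parameters while activating only a small fraction of them on each forward pass.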
The Role of Talent in DeepSeek’s Emergence
DeepSeek emerged as a top player largely due to the recruitment strategy and culture instilled by Liang Wenfeng. The company recruited top scientists and PhD students who had graduated within the last two years or were a year or so from graduating, researchers whose work had won awards at conferences. They were simply given the freedom to tackle the hard questions in AI development.
The culture at DeepSeek is the opposite of that at other Chinese companies, where most teams have to fight for resources, which reduces the cooperation an emerging field like artificial intelligence demands.
What is Next for America?
It is now obvious that the chokepoints created by export controls on high-end chips have failed to stop China's AGI push. What is next for America?
Firstly, the Chinese don't exactly throw money at problems the way American capitalists do, and the AGI race so far reflects this. China had intended to invest more this time around, but the GPU constraints America imposed in 2022 intervened. According to Science Business, "China has spent $90 billion, while the US has spent north of $328 billion, followed by Europe at $45 billion."
However, Yahoo Finance reports that China could see up to $1.4 trillion in AI investment in the next 5 years, based on a speech by Chen Liang, chairman of the state-backed investment vehicle China International Capital Corporation (CICC). Meanwhile, Business Insider estimates an investment of $1 trillion over the next 5 years by America's tech giants.
The DeepSeek development suggests that everyone may need to scale down their capex.
Secondly and finally, there is the long-decried practice whereby China surreptitiously funds companies of Chinese origin through back channels, building enough capacity to make their products cheap; the USA in particular has protested it. This, coupled with DeepSeek's unique deployment of research talent, requires a measured and effective response.
The AGI race in the USA has had a celebrity flair to it, to the point that scientists have felt alienated in places like OpenAI, where some have resigned. It is time for the US government to invest heavily rather than leave the AGI race to private entities alone.
Marc Andreessen called this a Sputnik moment, and I agree. If the US wants to land a man on the moon (win the AGI race), it has to commit the way it did for the Apollo program and the Manhattan Project. Tech giants also have to treat this race as what it is, "the race for scientific breakthrough," and not an ego contest, as we have seen with Sam Altman and Elon Musk. Get the scientists and technology experts to work.