$5.576 million. That is what DeepSeek V3's creators claim the complete training of this surprising AI model cost. They make it clear in their official technical report, which explains that they used 2.788 million GPU hours (on Nvidia H800 chips) to complete it.
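The headline figure follows from simple arithmetic on those GPU hours. A minimal sketch, assuming a rental rate of about $2 per H800 GPU hour (the rate the technical report reportedly uses; treat it as an assumption here):

```python
# Back-of-the-envelope check of DeepSeek's headline training cost.
gpu_hours = 2.788e6       # H800 GPU hours claimed for the full training run
rate_usd_per_hour = 2.0   # assumed H800 rental price per GPU hour

total_cost_usd = gpu_hours * rate_usd_per_hour
print(f"Implied training cost: ${total_cost_usd / 1e6:.3f} million")
# With these inputs the result is $5.576 million, matching the claimed figure.
```

The point is only that the two numbers in the report are internally consistent, not that the underlying claim is verified.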
Do we believe it? The numbers add up for them, of course, but what about the rest of the world? That is the debate that has been open since this weekend, when the news of the launch of DeepSeek-R1 exploded. This reasoning model, derived from DeepSeek V3, competes head-to-head with OpenAI's o1, and achieving so much for so little made companies such as Nvidia lose $400 billion in a single day on the stock market.
Difficult to believe. The question is whether we can trust the figures that DeepSeek offers. Ben Thompson tried to shed some light on the matter and recalled that in the DeepSeek V3 technical report the team breaks down the training cost, but clarifies something important: the $5.576 million does not include "costs associated with prior research and ablation experiments on architectures, algorithms, or data".
The analysts' opinion. Ben Thompson notes in Stratechery that this makes it clear that one cannot simply take $5.6 million and replicate what DeepSeek has done, but also that "I still don't believe that number." Other analysts, like Nathan Lambert in his newsletter Interconnects, have also debated this subject.
There were (of course) prior expenses. As the Financial Times explains, other analysts also doubt DeepSeek's internal figures. Dylan Patel, of SemiAnalysis, argues, according to quotes in the financial newspaper, that DeepSeek has had access to "tens of thousands" of Nvidia GPUs that were used to train the models that preceded R1.
But the same happens with other models. "DeepSeek has clearly spent more than $500 million on GPUs over its history," said Patel, "and although its training was very efficient, it required a lot of experimentation and testing to work." It is an interesting point, though it is also true that many other companies spend hundreds of millions and even billions of dollars on infrastructure to train their models before offering them to users.
And not-so-transparent costs. Those $5.6 million also do not reflect additional expenses, such as those that surely had to be assumed when adapting DeepSeek V3 (the base model) into DeepSeek-R1. There is no mention of salaries, of the data-annotation work needed to produce quality training data, or of training runs that were left incomplete or that for some reason were interrupted and failed.
Comparing with Llama 3. A researcher named Praneet Rathi (@pseuddd) published a few hours ago a very extensive and detailed analysis of the cost of training DeepSeek V3 (671B parameters, of which only 37B are active, which reduces training requirements) and compared it with that of Llama 3. He indicates that Llama 3 405B needed some 30 million GPU hours, versus the 2.8 million that DeepSeek cites.
It may add up. He estimated that one hour of an H800 (more limited than the H100) used for DeepSeek V3 was equivalent to about 0.75 hours of the H100s used for Llama 3, and after providing many more data points (such as the use of reinforcement learning and FP8 precision, which saves a lot of resources), his figures seemed to support the thesis that DeepSeek's cost is what it claims to be. Other comments with similar arguments on Reddit also seem to lend credibility to the numbers published by the Chinese startup. Of course, it is impossible to know for sure, and other users on Threads comment that the comparison "is too good to be true."
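The comparison described above can be sketched numerically. A minimal calculation, assuming the ~30 million H100 GPU hours cited for Llama 3 405B and the researcher's estimated 0.75 H800-to-H100 conversion factor (both are figures from the discussion, not verified data):

```python
# Rough GPU-hour comparison between Llama 3 405B and DeepSeek V3.
llama3_h100_hours = 30e6        # cited figure for Llama 3 405B training
deepseek_h800_hours = 2.788e6   # DeepSeek V3's claimed H800 hours
h800_to_h100 = 0.75             # estimated relative throughput of the limited H800

# Convert DeepSeek's hours into H100-equivalent hours for an apples-to-apples view.
deepseek_h100_equiv = deepseek_h800_hours * h800_to_h100
ratio = llama3_h100_hours / deepseek_h100_equiv

print(f"DeepSeek V3: ~{deepseek_h100_equiv / 1e6:.2f}M H100-equivalent hours")
print(f"Llama 3 405B used roughly {ratio:.0f}x more compute by this metric")
```

Under these assumptions DeepSeek's run comes out at roughly an order of magnitude less compute, which is why the claim has drawn both credence and skepticism.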
But training models is getting cheaper. The truth is that infrastructure is increasingly powerful and processes increasingly efficient, not only for DeepSeek but for everyone. A recent analysis estimated that training GPT-4 in early 2023 cost about $63 million. By the third quarter of 2023 that cost would have fallen to $20 million, and it is reasonable to think that today the process would be even cheaper. It would be interesting to know what OpenAI would say about that estimate.
Is it possible to replicate DeepSeek-R1? The fact that DeepSeek-R1 is an open source model, and that its creators have shared so much information about how they developed it, opens the door for others to pick up the baton, build similar models, and then improve on them. That is precisely the aim of the Open-R1 project, whose participants note that pieces of the puzzle are still missing, such as what data was used for training or which "hyperparameters" the model was trained with.
Image | Taylor Vick
In Xataka | The next phase of AI is not to see who invests more but who invests less