The cost of training the most advanced artificial intelligence (AI) models is in the spotlight, and understandably so. The arrival of the model from the Chinese company DeepSeek, which was presumably trained at a moderate cost, has called into question the strategy and the investments made so far by OpenAI, Google and Microsoft, among other companies.
A quick recap before moving forward: DeepSeek's team claims that the infrastructure used to train its model comprises 2,048 NVIDIA H800 chips, and that training this 671-billion-parameter model cost 5.6 million dollars. However, some analysts argue that these figures do not reflect reality.
The report prepared by SemiAnalysis maintains that, in reality, the infrastructure DeepSeek used to train its AI model comprises approximately 50,000 NVIDIA GPUs with the Hopper microarchitecture. According to Dylan Patel, AJ Kourabi, Doug O'Laughlin and Reyk Knuhtsen, at least 10,000 of these chips are NVIDIA H100 GPUs, and at least another 10,000 are H800 GPUs. The remaining chips, according to these analysts, are the cut-down H20s.
The ‘S1’ model adds fuel to the fire
On January 31, a group of researchers from Stanford University and the University of Washington, both in the US, published a paper on arXiv, the open-access repository of scientific articles, in which they claim to have managed to train an AI model with reasoning capabilities and performance comparable to those of reasoning models from OpenAI or DeepSeek, such as o1, for an investment of just under 50 dollars.
At first glance it seems impossible. With that money it is, a priori, absolutely unfeasible to train an artificial intelligence model, let alone an advanced one capable of competing on equal terms with those of OpenAI or DeepSeek. And yet it is true. To understand how they achieved it, we need to look at the strategy they devised. On the one hand, those 50 dollars represent the cost of renting the cloud computing infrastructure they used to carry out the training. That makes sense if the time invested is very short.
‘S1’ was built on the free Qwen2.5-32B model developed by the Chinese lab Qwen
But there is something else, something very important. Their reasoning model, which they have called S1, was built on the free Qwen2.5-32B artificial intelligence model developed by Qwen, Alibaba's Chinese lab. And its reasoning process takes its cue from Google's Gemini 2.0 Flash Thinking model. They did not start from scratch at all. An interesting note: the S1 model is available on GitHub, together with the data and code these researchers used to train it.
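Starting from an existing open model rather than training from scratch is precisely what makes this viable. As a rough illustration (this is not the researchers' actual script, which is published on GitHub; the checkpoint name and loading options here are assumptions), the starting point might look like this with Hugging Face Transformers:

```python
# Minimal sketch: loading an open Qwen2.5-32B checkpoint as the base for a
# supervised fine-tune. The repo id and loading options are illustrative
# assumptions, not the exact configuration used by the S1 authors.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-32B-Instruct"  # assumed checkpoint; the published code pins the exact one
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # spread the 32B parameters across available GPUs
)

# Fine-tuning would then proceed on a small curated set of reasoning traces,
# rather than training a model from zero.
```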
On the other hand, the training process took less than 30 minutes using only 16 NVIDIA H100 chips from the cloud computing service these researchers relied on. That is where the figure of just under 50 dollars comes from. However, there is another piece of information worth not overlooking: the S1 reasoning model was produced by distilling the experimental Gemini 2.0 Flash Thinking model.
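Before getting to distillation, a quick back-of-the-envelope check shows the 50-dollar figure is plausible. The hourly rental rate below is an assumption for illustration only, since cloud H100 prices vary considerably by provider:

```python
# Back-of-the-envelope check of the ~$50 figure.
gpus = 16                 # NVIDIA H100 chips reported by the researchers
hours = 0.5               # training reportedly took less than 30 minutes
price_per_gpu_hour = 6.0  # assumed $/GPU-hour; actual rates depend on the provider

cost = gpus * hours * price_per_gpu_hour
print(f"Estimated rental cost: ${cost:.2f}")  # -> Estimated rental cost: $48.00
```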
Distillation is, broadly speaking, a machine learning technique that allows knowledge to be transferred from a large, advanced model to a much smaller and more efficient one. This strategy saves a great deal of resources, although it cannot be used to create models from scratch. Beyond the much-touted 50 dollars of cost, what really matters is that, as we have just seen, it is possible to get very competitive AI models up and running with a far more modest investment than the ones the big technology companies have made so far.
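For readers curious about what distillation looks like in code, here is a generic textbook sketch. It is not the exact recipe behind S1, which fine-tunes on reasoning traces generated by the larger model; this classic variant instead trains a small "student" to match the softened output distribution of a large "teacher":

```python
# Generic knowledge-distillation loss (textbook version, not the S1 recipe):
# the student is trained to match the teacher's softened probabilities.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with a temperature, then minimise their KL divergence.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2

# Toy usage: a batch of 4 examples over a 10-class vocabulary.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)  # in practice, produced by the frozen teacher model
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```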
Image | Luis Gomes
More information | arXiv | GitHub
In Xataka | Samsung is getting ready to hit TSMC where it hurts most: the manufacturing of chips for AI