The best free tools to install models of AI as Deepseek, call, mistern, gemma and more

We bring you a list with the best free tools for Install artificial intelligence models locallyand thus create your own chatgpt with models such as Deepseek, Callsand more. These are open source models, which means that you can install and use them for free on your computer. Installing an AI locally has disadvantages such as using less powerful versions of these models. However, it has important advantages such as that all data are left on your computer and are not compiled by any company, and that you can use it for free. In this article we have tried to focus only on the eight best programs to do this. However, if you think we have left any list that you consider important, we suggest that Tell us in the comments so that the rest of the users can benefit from your knowledge. Ollama Ollama It is an open source application without graphic environment that you can install both in Windows and Macos and GNU/Linux. What it offers is the possibility of install and use AI models from the terminal of your computer, without complications and without having to open extra apps. This program will allow you to install a large number of models, from the flame to Depseek, Phi, Nomic, Qwen and many more. Each model has different versions, both complete and distilled, and you have the possibility of lowering them with different parameter sizes. LM Studio An open source application that serves to lower LLM models of artificial intelligence on your computer. Offers a unified graphic interface, since you can search and lower AI models within the program with a search engine and in a simple way, and then lower them and throw them in it. This program has versions for Windows, Macos and GNU/Linux, and allows you to use the models in your IU or a local server compatible with OpenAi. You can also use local documents with AI, you will use the models without connection, and you can download them from Hugging Face repositories. You can use models as a flame, Mistral, Phi, Gemma, Qwen or Deepseek Anythingllm A program all in one to be able to use artificial intelligence models on your computer, locally and offline. It is open source, and allows you to chat with documents, execute AI agents and manage various tasks. In addition, if your computer is not very powerful, it has subscriptions to use it from the cloud. It has a very flexible architecture, with three components working together, and in addition to being able to use AI models with open source connect locally to privatesuch as Openai, Azure and others services. It focuses mainly on privacy and customization, having many available controls. GPT4ALL Another open source project to install LLM models on your computer, being able to work on the CPU or the GPU. It has the capacity to install up to 1,000 open source languor models, such as Deepseek R1, Llama, Mistral, Nous-Hermes and many more. It is a payment application, although with a ratuita version with limited tokens. But for daily use it should be enough. It has programs for Windows, Windows ARM, Macos and Ubuntu. Jan An open open source program that allows you to install open source models locally, such as Call, Gemma or Mistral. It also allows you to connect to cloud services such as OpenAi or Anthropic when you need it. All data is stored locally. It has versions for Windows, Macos and GNU/Linux, being compatible with the GPUS NVIDIA (CUDA), AMD (Vulkan) or Intel Arc. Has an extensions system That will allow you to customize it and configure it to your liking. The interface is light and beautiful. Flame.cpp An open source program created to use locally Any flame -based model of finish. This program can work both in the CPU and in the GPU of your computer, which allows it to be better in domestic equipment, although it is a bit more complex to use. NextChat NextChat allows you to use the chatgpt characteristics in an open source package that is under your control. It is a web and desktop application that connects directly to external AI services, such as Google, Openai or Claude, but storing the data locally in the browser. This program also allows users to create “masks”, something similar to GPT with which to create IA tools with specific contexts and configurations. It can work in Windows, Macos and GNU (Linux. Flamefile A program that converts AI models into executable filesso that you can use them independently. It is a Mozilla Builders project, which combines flame.cpp with Cosmopolitan Libc. It is compatible with Windows, GNU/Linux, Macos and BSD. In Xataka Basics | Prompts pages: 16 free websites and communities to find ideas for your prompts and find advice to improve them

Some researchers claim to have created an AI as good as those of Openai and Deepseek for $ 50. And the data is real

The cost of training of models of artificial intelligence (IA) More advanced is in the spotlight. And it is understandable that it is so. The irruption of the Chinese company model Deepseekwhich presumably has A moderate training costhas questioned the strategy and investments deployed so far by OpenAi, Google or Microsoft, among other companies. A brief review before moving forward: those responsible for Deepseek argue that the infrastructure they have used to train their agglutin model 2,048 chips H800 of Nvidia. And also that this process with 671,000 million parameters has cost 5.6 million dollars. However, some analysts defend that these figures do not reflect reality. The report prepared by SEMIANALYSIS He maintains that, in reality, the infrastructure used by Deepseek to train his AI model approximately 50,000 NVIDIA GPU with Hopper MicroAritecture. According to Dylan Patel, AJ Kourabi, Doug O’Laughlin and Reyk Knuhttsen, at least 10,000 of these chips are GPU H100 of Nvidia, and at least another 10,000 are GPU H800. The remaining chips, according to these analysts, are the cuts cut H20. The ‘S1’ model takes more firewood On January 31, a group of researchers from Stanford University and the University of Washington, both in the US, published in the repository of open access scientific articles Arxiv A text in which it claims to have managed to train an AI model with reasoning capacity and benefits comparable to those of OPENAI or DEPEEEK O1 models facing an investment of just under $ 50. A boat soon seems impossible. With that money a priori it is absolutely unfeasible to train an artificial intelligence model. And less an advanced and capable of competing from you to you with those of OpenAi or Deepseek. However, it is true. To understand how they have achieved it We need to investigate the strategy they have devised. On the one hand, those 50 dollars represent the cost of renting the cloud computing infrastructure to which they have resorted to carry out the training. It makes sense if the time invested is very moderate. ‘S1’ has been elaborated from the free qwen2.5-32b model developed by the Chinese laboratory Qwen But there is something else. Something very important. His reasoning model, which they have called S1, has been elaborated from the free artificial intelligence model QWEN2.5-32B developed by the Chinese Laboratory Qwen, alibaba. And its reasoning process is inspired by the GEMINI 2.0 Flash Thinking Google model. They have not left zero at all. An interesting note: the S1 model is available in GITHUB together with the data and code used by these scientists to train it. On the other hand, the training process lasted less than 30 minutes using only 16 NVIDIA H100 chips belonging to the cloud computing network used by these researchers. From here comes the cost of Somewhat less than 50 dollars. However, there is another data that is worth not overlooked: the S1 Reasoning Model has been generated by distillation of the Gemini 2.0 Flash Thinking experimental model. Distillation is, in broad strokes, an automatic learning technique that allows the knowledge base to be transferred from a large and advanced model to a much smaller and efficient. This strategy saves many resources, although it does not serve to create models from scratch. Beyond the caraded 50 dollars of cost, the really important thing is that, as we have just verified, it is possible to put to tuning models of very competitive facing a much more restrained investment than those made by the large technology companies so far. Image | Luis Gomes More information | Arxiv | GITHUB In Xataka | Samsung is preparing to give TSMC a bars where it hurts most: the manufacture of the chips for ia

Mistral AI is the French startup that opted for efficiency before Deepseek. His future is uncertain

Mistral ai is the French technological jewel. The AI ​​startup has become practically the only European representative that competes with large companies and technological startups in the US or China. He also does it with an absolute focus on efficiency, which is just what is now valued as Deepseek. However, its future is complex. A promising European startup. When we talk about her for the first time in 2023, he surprised that without having any product He managed to raise 105 million euros investment Soon, yes, the fruits of such commitment would begin to appear, which also generated that the firm raised More financing rounds. Flag efficiency. One of the defining characteristics of Mistral AI was that his work was always oriented to do more with less. To seek efficiency. The startup, knowing not being able to compete with the Bigt Tech in resources, has always sought to create more compact models but with great performance. He succeeded with Large 2 123b, which was three times smaller that calls 3.1 405b but matched that model and others like Claude 3.5 or GPT-4o in some metrics. They return to the load with Small 3 24b. The startup He has just announced The availability of Small 3 24b, its new “small” model. Actually it is not so much: those with size between 1b and 14. However, it is certainly an interesting llm for one thing: it competes from you to you with flame 3.3 70b – the last of the finish line, almost three sometimes bigger – and also does it with a latency (time it takes to appear every token) much smaller. The performance is fantastic. Latency, if we pay attention to the internal tests of Mistral, fantastic. Answer three times faster than the goal model. Its performance is comparable to GPT-4o Mini, and also exceeds QWEN-2.5 32B latency, which is something better in some benchmarks. This model, yes, has just had successor with QWen2.5-Max. Open Source, and European data. Another advantages of Mistral’s AI is that it is an Open Source AI –“Open Weights”, rather-, as he calls, Deepseek or Qwen. Unlike them, Mistral is a startup that raises data governance in the EU, something that can be inert for government agencies and European companies. It has already been seen how there is suspicion about where the data ends From our chats with Deepseek –Italy is investigating In this regard – and this is an undoubtedly striking option. Interesting uses. Mistral developers explain that their model is perfect for conversational assistants, because in them precisely matters that they respond quickly. They also highlight the ability to customize/polish the model to specialize in certain tasks such as legal council, medical diagnosis or technical support. The model is available on platforms such as Hugging Facebut it can also be executed at home in quantized seeing if you have at least 4090 or for example a MacBook with 32 GB of memory. At the moment it does not seem available In le chatthe web service that has confessed to us to be based on Mistral Large 2.1. Reasoning model in sight. But while their competitors are launched to the race for the reasoning models, Mistral has left something behind in this area. In a message in x They clarify that Small 3 does not use synthetic data “which makes it a great base for anything in reason.” In their official announcement they go further and point out that “among many other things, in the coming weeks, large and small Mistral models are expected with greater reasoning capacity.” Deepseek advancing on the right. The Chinese fashion startup has become the star of the moment with its models, and especially with its reasoning model, Deepseek R1. He has also achieved it using the same weapon as Mistral: efficiency. Deepseek’s success validates Mistral’s strategy, of course, but the question is whether Mistral will continue to compete or be overwhelmed by Chinese and US startups if they also partially or totally adopt that same approach. Mistral’s market share is modest, but is in danger of reducing with the strong competition of US and China companies. Source: FT. Doubts about the future. In media as Financial Times The concern about the future of Mistral and the European startups is discussed. There is no other to work in LLM –The German Aleph Alpha He left that race In September 2024-, and that compromises the future of European efforts to compete and not depend completely on the models of China and the US no matter how they can be. In Spain projects such as Alia are an interesting first step, but for now they are far from the LLMs and models mentioned. Operational limbo. This economic newspaper also indicates that with an assessment of 6,000 million euros Mistral is in a kind of limbo. He has raised too much money to gradually disappear, but not enough to compete with US mega -companies, for example. Acquihire on the horizon? Sean Maher, from the Entext consultancy, believes that Mistral can Follow the steps of inflection aiwhich was acquired by Microsoft – Mustafa Suleyman, current head of Ia in Redmond, was his co -founder. Thus, a potential acquisition of its resources and especially of their talent (which is usually called ‘acquihire’) is not ruled out. Image | Mistral ai

Of algorithmic trader to revolutionize AI. This is the story of Liang Wenfeng, the founder of Deepseek

In just two weeks, an unknown 40 -year -old Chinese engineer has shaken Silicon Valley and has monopolized the world conversation about AI. It has even caused a global stock earthquake. Liang Wenfeng has achieved something that seemed impossible: Develop an AI model that rivals Openai to a fraction of its cost. If we move the focus, what has actually done is to launch an order to the American domain in Ia. Liang is not a technological entrepreneur to use. Born on the southern coast of China, in Zhanjiang, he began his career studying electronic engineering at the University of Zheijang. With Bright notes. In 2015 co -founded High-Flyera quantitative investment fund that managed more than $ 13,000 million using algorithms of Machine Learning To operate in the stock market. What distinguishes Liang is His unusual way of guiding his career. While most Chinese companies of AI focused on marketing products, He opted for research Pure and hard. “In the last thirty years, (the Chinese technology industry) has only emphasized to make money and ignored innovation,” he told China Waves as collected Reuters. “Innovation is not driven only by the business, it also needs curiosity and desire to create.” This vision materialized in 2021, when it began to accumulate thousands of Nvidia chips for a project without name, just before the United States restricted its sale to China. Two years later he founded Deepseek with just over one million dollars of initial capital. Today, They say local mediaDeepseek has only 140 employees. That is 10% of the OpenAI size, for example. Deepseek’s success He has driven Liang in his country. On January 20, he was in a closed door with Li Qiang, the prime minister. Liang was the youngest in the room. His meteoric ascent, going from a very limited fame to his field to being the epicenter of the global technological conversation in a few days, he has also put him in the trigger for those who question that Deepseek has been able to develop V3 and R1 only with the declared infrastructure. This is what Alexandr Wang, CEO of Scale AI, suggested in statements to CNBCwhen he assumed that his access to Chips had been much older but could not admit it for commercial restrictions. Dario Amodei, CEO of Anthropic, It was more comprehensive and even magnanimousbut not condescending. For Liang, The goal goes beyond competing with Silicon Valley. As explained to 36krseeks that China “gradually transit” to be a beneficiary to a taxpayer in the AI ​​industry. “What we see is that China cannot always be in a follower,” he said. “We often say that there is a gap of one or two years between China and American AI, but the true gap is between originality and imitation.” With a discreet profile and a disagreement image That has nothing to do with Altman, Zuckerberg and company, some liang companions They have described it as a pragmatic leader most motivated by his curiosity than for wealth or fame. It fits with seen so far. His commitment to research rather than by commercial applications reveals A certain background of your personality: It puts curiosity for long -term knowledge to immediate benefits. And perhaps that is what has changed the role of China in the Global Race of AI. Outstanding image | X, Xataka, Deepseek In Xataka | I have tried Deepseek on the web and in my Mac. Chatgpt, Claude and Gemini have a problem

What is High-Flyer, the Chinese fund that drives Deepseek and has been using AI for years to make investment decisions

Deepseek is the fashionable artificial intelligence (AI) company. Your most recent language models They have challenged Openai’s leadership and have caused a real earthquake in the technology industry. These days we have known that It was founded in May 2023 and that has developed its products with a fraction of the computing capacity of some of its main western rivals. But what else is known? Let’s see it. The promising present of Deepseek is the result of years of investigation that began long before its official constitution. Its origin is found in High-Flyer, a quantitative investment fund created in 2015 by the Electronic Engineering student Liang Wenfeng with two classmates. As they count on their websitethe idea was that the algorithms became the heart of their business by allowing real -time operations. A company focused on the Chinese stock market High-Flyer completed its first stock market assisted by AI in October 2016, a movement that triggered an unstoppable effort to continue working in that regard. The company formed software and hardware research and development teams. And apparently it was the appropriate decision. In 2017 I already applied AI In almost all its strategies of quantitative investment, but to continue advancing I needed to break some barriers. They discovered that complex models training tasks required a huge calculation power. This did not discourage them and in 2019 they launched a dedicated division called High-Flyer ai to address the challenge. The group built started working with 500 GPU, then built a 1,100 GPU supercomputer A100 of NVIDIA And in 2022 he spent 140 million dollars to raise the number up to 10,000 GPU, before the entry into force of the export controls of the United States. High-Flyer was completely focused on developing its algorithmic trading business. He had his own deep learning training platform and a Outstanding computer infrastructure. Meanwhile, in the United States there was a company called Openai that bet on the generative AI and that He had surprised many with the benefits of his GPT-3 language model. As China Talk collectsLiang wanted to go beyond finance. For a long time he had been convinced that AI would change the world, and had found the opportunity to bring his effort to the next level. In 2023, High-Flyer announced that it would lay the foundations of a new organization to advance the development of general artificial intelligence (AGI). Thus Deepseek was born, with an injection of capital of high-flyer. Deepseek is a product of High-Flyer work and has obviously drunk this company. Both signatures share offices in the same building, although they seem to use different computing resources. The AI ​​startup says it has H20 chips, that are sold as donuts in Chinaand NVIDIA H800, and that has used only 2,048 GPU of this latest model to train its most recent models, an affirmation that some have questioned. Images | High-flyer | Deepseek In Xataka | “They are brilliant researchers under the control of an authoritarian government.” Anthropic’s CEO has spoken about Depseek

Spain was going to invest a fortune in data centers. And then Deepseek arrived

Data centers They looked like the new gold fever in Spain. Recent data revealed how investments of Various Big Tech They promised to significant this market. Artificial intelligence promoted all those efforts, but these days some companies are rethinking what to do. The reason is, of course, Deepseek. Deepseek. The arrival of the models Deepseek v3 in November and Deepseek-R1 Just a few days ago it has made all these investments now questioned for a single reason. It may not be necessary to spend so much money. Chinese models of this startup seem to show that the same can be achieved (or more) With much less. “Unrealistic”. As revealed expandingSpain could attract more than 43.7 billion investment until 2030. However, sources in the sector have indicated in that economic newspaper that some millmillonarium projects to build large data centers in Spain were “unrealistic” and there is talk of figures that were not even guaranteed by the funds. Market adjustment. The Search for efficiency You can make these analyzes a certain adjustment in the market. Both investment funds and risk capital companies can now show more prudence when investing. Spaindc provided for the arrival of 58,000 million euros to the data centers sector until 2030. But there will be (a lot of investment). Although it seems clear that there will be a review of the budgets in various projects, as long as the need for the creation of data centers will continue to exist. The demand, due to the rise of AI and cloud services, will be remarkable. Great plans. In our country ACS has for example plans to invest globally 60,000 million euros (12,000 million in Spain). Merlin, another of the leading companies in this sector, announced the promotion of two megacampus in Extremadura with 1 GW capacity in each. Repsol, also cited expanding, It has plans to invest 4,000 million in Spain in this area. Long -term optimism, but short caution. The movements that have occurred these days after the impact caused by Depseek seems to have made many companies be recalibrating their short, medium and long term plans. However, it is impossible to know what the impact of medium -term Deepseek will be. Some experts pointed out In the Independent How they were not entirely convinced of the real efficiency of Deepseek, but they admit that that will certainly force a potential rethinking of the data centers. Image | Amazon In Xataka | We have calculated how much money the Big Tech are being spent on data centers. The numbers are dizzy

How to use Deepseek to search on the Internet and see the sources used in the answer

Let’s explain How to look for things online using Deepseekthe popular Chinese artificial intelligence. In this case we will use its official website, since if you opt for Install Depseek on your computer with Ollamathen you will not be able to use this function. What we are going to teach you is to ask for this thing, and what Look for results on the Internet. Deepseek will generate an answer from them, but you will also be able to look at the sources that you have used and enter the articles. Search the Internet with Depseek The first thing you have to do is Enter the Chat Website with Deepseekwhose address is chat.deepseek.com. This will take you to the screen where you can start a new conversation with AI. Once on this screen, search in the button Search That appears under the writing field. You can combine this button with Deceptk R1, depending on whether you want the AI ​​reason or not about the questions. Once you have marked the option Searchlook for what you want to find. When you ask Depseek, it will take a few seconds in search for sources to extract informationand then it will compose an answer. When you do, in each paragraph you will see a series of numbers, and above all you will have a message Found XX Results which indicates how many pages he has used as a source. The numbers at the end of each paragraph indicate which sources have been used for that text fragment. If you pass the mouse on one of those numbersthen a window will be displayed with the source, and you can click on it to enter the article. If you click on the message Found XX Resultsthen a column will open to the left where you are going to show A list of all used online items To compose the answer. And so, you can click on each of them to review the information or expand it. In Xataka Basics | Deepseek history: how to see or erase everything you have asked artificial intelligence

There will be a before and after Deepseek. We already know why this is so efficient

The publication of the V3 model of the artificial intelligence (AI) Deepseek as an open source is a blessing. And it is because little by little we are knowing in detail the strategy of the engineers of this Chinese company to Take a model of so efficient. Before moving forward with this article, it is important that we keep in mind that Depseek says that has trained his model using only 2,048 chips H800 of Nvidia. Some analysts defend that, in reality, its infrastructure brings together 50,000 GPU H100 Buy through intermediaries, but for the moment it is just a conjecture. This chip is more powerful than the H800, but it is perfectly credible that Depseek has been forced to settle for the latter because The sanctions of the US government They have prevented Chinese companies from access to the H100 GPU. In fact, since November 2023 Nvidia cannot deliver To your Chinese clients your H800 chip. One of Depseek’s keys is called PTX In the recipe for the thrilling growth that Nvidia has experienced during the last five years, its GPU does not intervene; CUDA technology (Compute Unified Device Architecture) also has An essential role in your business. Most of the AI ​​projects that are currently being developed are implemented on CUDA. This technology brings together the compiler and development tools used by programmers to develop their software for NVIDIA GPUs, and replace it with another option in the projects that are already underway it is a problem. Huawei, who aspires to an important portion From this market in China, it has Cann (Compute Architecture for Neural Networks), which is its alternative to CUDA, but for the moment CUDA dominates the market. In addition, this Nvidia tool puts in the hands of programmers High level language that allows them to access the GPU hardware in an affordable way. Even so, and we reach the heart of this article, Deepseek engineers have not used Cuda to develop their AI: They have used PTX (PARALLEL THREAD EXECUTION). Deepseek engineers have decided to use PTX to get the most out of the H800 GPUs as possible This language is similar to the assembly. In fact, it is somehow the assembly that proposes the developers who use their GPUs and need to implement low level optimizations in their code. Programming with PTX is more difficult and laborious than doing it with CUDA, but it entails the advantage that it allows developers to write a more efficient code, and, therefore, capable of taking better advantage of the resources offered by the Hardware of the GPU. Presumably the Deepseek engineers have decided to use PTX to get the most out of the H800 GPUs they had in their possession. One of the stratagems they have devised has consisted of assigning only 20 SM (Multiprocessors streaming) From each GPU to the communication between the servers, which has allowed them Dedicate the remaining 112 From each chip to calculation processes. In essence, Deepseek has been built since zero by resorting to this type of optimizations, which largely explains why this AI model is so efficient. The programmers of this Chinese company have objectively materialized an achievement in the field of engineering that will in all likelihood will have a deep impact on the way in which AI models developers will face their projects in the future. This is the palpable evidence that China is successfully adapting to the shortage of GPUs that have triggered US sanctions in their companies. Image | Nvidia More information | Mirae Asset Securities Korea In Xataka | We can forget an AI without hallucinations for now. The general director of Nvidia explains why

Deepseek does the same as Openai’s most advanced models with much less resources. The key: “Reinforcement Learning”

The entire world is wondering how it is possible that the models of AI of Deepseek They have become overnight the great protagonists of today in the field of artificial intelligence. The answer is relatively simple. These models have managed to demonstrate that You can do more with much less. Both Deepseek V3 and Deepseek-R1 are comparable to GPT-4 or O1 OPENAI respectively, but it is estimated that their training has been much less expensive and its inference, of course, is: the prices of the Deepseek API are up to 35 sometimes lower than those of OpenAi, but that makes one wonder how it is possible. The answer is clear, and it is because we have at our disposal the technical reports of these AI models. Precisely his study has allowed us to clarify What are the techniques that this Chinese R&D laboratory has used to develop these models so efficient and capable. Many techniques, a single objective: efficiency There are several differences that make Deepseek’s new model especially efficient. Its creators explain in detail in the detailed Technical Report that is publicly available. Here are the most relevant: Deepseekmoe (“Mixture of experts”): In models such as GPT-3.5 the entire model was activated in both training and inference (when we use it). However, not all model components are necessary for our requests. The MOE technique – already introving with Deepseek V2 – precisely divides the model into multiple “experts” and only activates those that are necessary according to the request. GPT-4 is already a MOE model. But as we said, Depseekmoe even went further and differentiated between even more specialized experts, in addition to using some somewhat more generalist experts that could contribute value in certain requests. Managing all those specialized or generalist experts not only benefits inference, but also the training phase, making it more efficient. This technique is similar to the so -called “Time Scaling test” that also adjusts the size or complexity of a model during efficiency. Deepseekmla (Multi-Head Latent attention): It is another substantial improvement-even more than the previous one, and also introduced with Deepseek V2-that affects the way in which memory is managed in these models. Normally it is necessary to load both the model and the entire context window – the one that allows us to write prompts and include long texts, for example. Context windows are especially expensive because each token requires both a key and their corresponding value. With the improvement introduced with this technique, what was made possible was to compress that warehouse of keys and values, dramatically reducing memory use during inference. Auxiliary -los-Free Load Balancing: If we imagine a model like a great orchestra, each musician is an “expert” within the model. To play a complex piece, not all musicians are necessary all the time. Traditionally the so -called “auxiliary losses” were used to make sure that all musicians played enough, but these losses could interfere with that interpretation of the musical piece (model training), which could degrade general performance. With Deepseek V3 the model is able to balance the work of each expert dynamically. That does the simplest, direct and efficient training by eliminating “auxiliary losses.” In addition, the elimination of interference allows the model to learn better and with less resources … and get better results. Multi-Token Prediction Training Objective: Often predicting the following word depends on several previous words or context. With this technique instead of predicting only the following word, the model learns to predict several words at the same time. That makes more natural and understandable and less ambiguous texts generate, but also accelerates training by reducing the number of steps necessary to generate the complete text sequence. FP8 Mixed Precision Training: The use of Numbers FP8 allows significantly reducing memory consumption and accelerates calculations. Some critical parts of the model continue to use FP32 training to guarantee precision, but there is another additional benefit of FP8: the size of the models is reduced. Other models use techniques such as quantization or parameter pruning. Although Openai does not give data on GPT-4 in this section, the assumption is that it works with BF16, more expensive in terms of memory. Although FP8 theoretically leads to less precise models, other complementary techniques such as fine-grained quantization are used to reduce the negative impact of values ​​that come out of the common, which makes a stable training possible. Cross-Node All-to-Lall Communication: During training it is necessary to constantly exchange information between all nodes (computers) connected in training data centers. That can become a bottleneck, but these new Deepseek V3 techniques include efficient communication protocols, data traffic reduction and efficient synchronization to accelerate training and, once again, reduce the costs of that process. Reinforcement and “distillation” learning as keys But in addition to all these techniques, those responsible for Deepseek V3 explain how they pressed it with 14.8 billion tokens, a process to which a supervised adjustment followed (Superved Fine-Tuning, SFT) and several stages of Reinforcement Learning (Reinforcement Learning, RL). The SFT phase-which is mentioned in the Deepseek V3 report-was completely omitted in the case of Deepseek-R1. However, learning by reinforcement is an absolute protagonist in the development of both models, especially in R1. The technique is well known in the field of artificial intelligence, and it is as if we trained a dog with prizes and punishments. The model learns to respond better by giving rewards if you do well. Over time, the model learns to take actions that maximize long -term reward. In Deepseek, learning for reinforcement is used to break down complex problems in smaller steps. In it Deepseek R1 technical report It also indicates how this model makes use of RL techniques directly on the base model, without the need for supervised training. That saves computing resources. The call also comes into play here Thought chain (chain-of-though)also mentioned in the technical report. This refers to the ability of a language model to show the intermediate steps of its reasoning. The model not only … Read more

Deepseek has had to pull pure ingenuity, breaking the “more = better” paradigm

Satya Nadella, the general director of Microsoft, It is very clear: “Deepseek’s new model is really impressive both for how they have effectively develop a model of artificial intelligence (AI) open source which performs calculations in time of inference as for its incredible computational efficiency. We must take the developments from China very, very seriously (…) As IA becomes more efficient and accessible we will see that its use triggers, becoming a merchandise from which we cannot do without. “ In this statement to FortuneNadella gives credit to the technological triumph that the Chinese company Deepseek has reached. And he honors him that he recognizes him without ambiguity, especially if we are in mind that Microsoft is one of the competitors of the AI ​​industry that has just a few hours ago witnessed how Its value in the bag has fallen in an abrupt way after the emergence of Deepseek R1. Anyway, we can be sure that to a large extent this AI model is the result of the pressure that US sanctions are exerting on Chinese companies. Jensen Huang, the founder and general director of Nvidia, He anticipated it in one of the statements he made at the end of May 2023 in Computex: “China is dedicating mass resources to the implementation of emerging companies specialized in the development of GPU. Do not underestimate them.” This warning was aimed at the US government in a clear attempt to prevent you about the consequences that They will have the sanctions that seek to stop the technological development of China. Huang talks about GPU Chinese designers, but his statement can be extrapolated to Chinese companies that develop AI models. After all, in this area, the GPUs and the great language models go hand in hand. USA will continue to lead in AI A good part of the sanctions approved by the administration led by Joe Biden as of October 7, 2022 seeks to slow down the development of the Chinese semiconductor industry, and also its AI technology. In fact, as we have just seen, the integrated circuits and the AI ​​go hand in hand. These prohibitions prevent NVIDIA, AMD or Intel, among other chips manufacturers for AI applications, sell their most advanced GPU to their Chinese clients. This is presumably the germ of Deepseek’s greatest achievement. According to Depseek the infrastructure used to train its AI model 2,048 NVIDIA H800 chips If we stick to the information that this Chinese company has made the infrastructure used to train Depseek R1 agglutina 2,048 chips H800 of Nvidia. And training with 671,000 million parameters has cost 5.6 million dollars. This is precisely what Satya Nadella speaks in the statements that we have reviewed a few lines above. These figures are extremely restrained. Some analysts defend that, in reality, its infrastructure brings together 50,000 GPU H100 Buy through intermediaries, but for the moment it is just a conjecture. If we give the statements made by the Deepseek spokesmen to good Financial Timesand for the moment it is reasonable to do so, the reason why their engineers have mounted their training infrastructure on NVIDIA H800 GPUs is that US sanctions have prevented them from accessing the H100 chips, which are more powerful. The prohibitions of November 16, 2023 They prevent Nvidia Delivering to their Chinese clients the H800 GPUs, but presumably at that time Depseek already had its infrastructure assembled. In any case, at this situation the meritorious is that with a relatively modest chip this Chinese company has materialized a remarkable achievement. Depseek’s undisputed success is a victory for China, but it is a partial victory. This technological war at the moment is winning the US. Its advantage lies in an unappealable reality: the country led by Donald Trump controls so much Most GPU manufacturers Like many of the companies that are dedicated to developing AI models. And the latter have access without restrictions on the most advanced GPUs produced by NVIDIA and other companies. China has the Huawei GPU, which They seem to be very competitive In inference processes, and also with those of companies such as Moore Threads, Metax, Biren Technology, Innosilicon, Zhaoxin, Iluvatar Corex, Denglinai or Vast Ai Tech, among others. But, for the moment, it is in a position of clear disadvantage. Even so, this confrontation goes for long, so any conclusion that we reach about which country will finally impose itself in the AI ​​domain, if any, it would be premature. Image | Nvidia More information | Fortune | Financial Times In Xataka | China is closely monitoring the United States movement with Stargate. And your answer has already prepared

Log In

Forgot password?

Forgot password?

Enter your account data and we will send you a link to reset your password.

Your password reset link appears to be invalid or expired.

Log in

Privacy Policy

Add to Collection

No Collections

Here you'll find all collections you've created before.