We will try to explain in a simple and understandable way what distilled models are in artificial intelligence. When we told you how to install DeepSeek on your computer, we mentioned that there were distilled versions, and other AIs are also being created as distilled versions of other specific models.
We also usually refer to this as LLM distillation, to specify that we are talking about Large Language Models, which are the models capable of processing text, understanding what we write, and responding in text. In other words, models like ChatGPT, DeepSeek, Copilot, Gemini, or Grok.
What is LLM distillation
Distillation of artificial intelligence models is a technique to reduce the size of models while replicating the results and performance you can get from them.
Although we are used to using them through apps and web pages, LLM models consume a lot of space and resources. We don't usually notice, because when you use an AI from a website or app, you connect to the servers of large companies where the model is running. But if you wanted to install a complete model on your own computer, you would need a very powerful processor and a lot of storage.
The solution to this problem is to create a distilled model, a model trained to take up less space. This model can replicate most of the performance, but it will be smaller and faster, and it will need fewer resources to work.
The way this is done is similar to a teacher and a student. The complete model acts as a teacher that shares its experience and knowledge with a student, transmitting complex concepts and knowledge. Meanwhile, the student model learns to imitate what it is being taught in a simpler and more efficient way.
This results in lighter models. Their results will never be quite as good as the teacher's, but the main characteristics and most of the performance will remain. In short, it's like a Lite version: smaller, but light and versatile.


There are different techniques to create distilled models, such as knowledge distillation over final outputs, so the student model learns the teacher's decision-making process, or using the teacher to generate additional training data. There is also intermediate-layer distillation, which transfers not only final results but also representations from intermediate layers, or using several teacher models to train a single student.
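To make the first of those techniques more concrete, here is a minimal pure-Python sketch of the classic knowledge-distillation loss: the student is trained against a blend of the hard labels and the teacher's softened output distribution. All function names are illustrative; real training would use a framework like PyTorch rather than plain lists.

```python
# Pure-Python sketch of the knowledge-distillation loss (no frameworks),
# just to show the idea; real LLM training uses a library like PyTorch.
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability distribution, optionally
    softened by a temperature > 1 so small differences matter more."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_index,
                      temperature=2.0, alpha=0.5):
    """Blend cross-entropy on the hard label with a KL term that pushes
    the student toward the teacher's softened output distribution."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student), scaled by T^2 as in the original KD recipe
    kd = sum(p * math.log(p / q)
             for p, q in zip(p_teacher, p_student)) * temperature ** 2
    # Standard cross-entropy against the ground-truth class
    ce = -math.log(softmax(student_logits)[true_index])
    return alpha * kd + (1 - alpha) * ce

# A student whose logits track the teacher's gets a small loss;
# one that disagrees with the teacher gets a larger one.
close = distillation_loss([2.0, 0.5, -1.0], [2.2, 0.4, -1.1], true_index=0)
far = distillation_loss([0.0, 2.0, 0.5], [2.2, 0.4, -1.1], true_index=0)
```

The `alpha` knob controls how much the student listens to the teacher versus the original labels, and the temperature controls how much of the teacher's "dark knowledge" (its second and third choices) is exposed.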
In general, the private companies that create artificial intelligence models are also the ones that create the distilled versions. Typically, a specific name is added to the distilled version, such as the "Flash" in Google Gemini or the "mini" in OpenAI's models.
In other cases, especially in open source models, the distilled version may keep the name of the teacher model, adding as a surname the model that was used as the student. For example, you can take a smaller model like Qwen and use it to create a distilled version of DeepSeek called DeepSeek Qwen, or DeepSeek Distill Qwen, to indicate that it is distilled.
Pros and cons of distilled models
A complete artificial intelligence model has billions of parameters, and the amount of space and computing power needed to run it is enormous. On a home computer you would need cutting-edge hardware, in addition to a lot of storage; and at the level of companies such as OpenAI or Google, which offer their AI via web or app, many resources are needed on their servers.
Therefore, creating distilled models helps reduce their size so they take up less space. It also allows them to work faster and with lower computational costs. That is why Google or OpenAI offer free "small" versions of their main models, leaving the most complete ones for paying users, since keeping the complete models running requires money and investment.
And if we are talking about an open source model, having distilled versions allows you and me to install and use them on our computers without having to spend thousands of euros on a new processor, graphics cards, or internal storage.
These techniques can also be used to create artificial intelligence models at a lower cost than full training would involve. To do so, you take already created models and train a new one from their data and knowledge, without having to perform the whole process from scratch.
However, distilled models do not have the same amount of data and parameters; they are more limited, and more failures and hallucinations may arise.
Let me give you an example. If you follow our guide to install DeepSeek on your computer, you will see that at a certain point you have several versions: 8B versions, 14B versions, or the full 671B version. This number refers to the number of parameters in billions, and the lower it is, the fewer resources you need, but the more distilled and smaller the model will be.
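A quick back-of-the-envelope calculation shows why those numbers matter. The sketch below estimates how much memory the model weights alone would occupy at different sizes; the bytes-per-parameter figure depends on the precision used (about 2 bytes for FP16, around 0.5 for 4-bit quantization), and these are illustrative approximations, not official hardware requirements.

```python
# Rough estimate of the memory needed just to hold a model's weights.
# bytes_per_param is an assumption: ~2.0 for FP16, ~0.5 for 4-bit quantization.
def weights_size_gb(params_in_billions, bytes_per_param=2.0):
    """Approximate size of the model weights in gigabytes (GiB)."""
    return params_in_billions * 1e9 * bytes_per_param / 1024**3

# The DeepSeek sizes mentioned above, assuming FP16 weights
for size in (8, 14, 671):
    print(f"{size}B model, FP16: ~{weights_size_gb(size):.0f} GB")
```

On these assumptions, the 8B model fits on a well-equipped home PC, the 14B version needs a high-end graphics card or plenty of RAM, and the full 671B model is simply out of reach for domestic hardware, which is exactly why the distilled versions exist.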
Therefore, in this example, if you install a DeepSeek 8B and a 14B, you will see that the smaller model has more hallucinations and gives you less precise answers. So the better the answers you need, the bigger the model will have to be, and the less distilled.
The same goes for commercial models. If you are using Gemini 2.0 Flash, the results will be worse than with the full Gemini 2.0, and the same goes for OpenAI's o3 and o3-mini. However, the Flash or mini version is the one offered to all free users, while the complete one is reserved for paying users, in order to cover the cost of keeping these models running.
In Xataka Basics | Prompt pages: 16 free websites and communities to find ideas for your prompts and advice to improve them