The US does not want Huawei to sell its most advanced AI chips outside China

The US Department of Commerce has taken a major step forward in its offensive against Huawei's business outside China. In recent weeks the Chinese company has presented two artificial intelligence (AI) chips, the Ascend 910D and the Ascend 920, with which it aims to fill the gaps that Nvidia will presumably leave in the Chinese market as a result of the latest US sanctions on China. The American company led by Jensen Huang can no longer deliver its H20 GPU to its Chinese customers, and Huawei aspires to win precisely that share of the market with its new Ascend 920. The other GPU, the Ascend 910D, presumably delivers performance comparable to that of Nvidia's H100, so it aims to establish itself as a solid alternative to it. The US cannot control Huawei's presence in China, but it has taken a major step to cut off its presence outside its home country.

The US is using the most powerful tool it has: its patents

Readers often ask us why the US has the power to prevent ASML, a Dutch company, from selling its most advanced lithography equipment to its Chinese customers. That power rests on a fundamental principle: ASML's most advanced machines, such as its extreme ultraviolet (EUV) and deep ultraviolet (DUV) photolithography equipment, use US technologies. One of the most important is the innovation that allows these machines to generate ultraviolet radiation with the right wavelength.

This is essentially the same principle that the US Department of Commerce has invoked to approve a resolution under which no country on the planet may buy Huawei's Ascend AI GPUs. According to this US institution, the Chinese company has produced these chips illegally using US technologies, so exporting them beyond the borders of the country governed by Xi Jinping violates the Department of Commerce's export controls.

In practice, the US will cut off the commercial flow of Huawei's AI GPUs out of China, especially when these semiconductors are bound for that country's allies. Its strategy for putting pressure on countries interested in obtaining Huawei's chips is to announce fines, the possible revocation of export rights, and even criminal consequences. However, the Department of Commerce has not banned the circulation outside China of all of Huawei's AI GPUs. The Ascend 910B, 910C and the imminent 910D are prohibited, but the Ascend 910 that TSMC legally produced for Huawei in 2019 and 2020 can circulate freely around the world.

More information | US Department of Commerce

No one has overtaken NASA in the exploration of other planets since the USSR. China plans to do so, all the way to Neptune

While NASA applies the cuts of the Department of Government Efficiency led by Elon Musk, China has presented the most ambitious space exploration program in its history: a series of scientific missions, directed by the newly created Deep Space Exploration Laboratory, that would not only overtake the US on several milestones but would also reach places in the Solar System where no one has been before.

Kamo'oalewa (2025). With the launch of the Tianwen-2 probe, scheduled for May of this year, China aspires to become the third country to collect samples from a near-Earth asteroid. Only Japan (twice) and the United States (with the recent OSIRIS-REx mission) have done so. 469219 Kamo'oalewa is a "mini-moon", a quasi-satellite of Earth between 40 and 90 meters in diameter that China aspires to "touch" with Tianwen-2, a mission that will also study the comet-asteroid Elst-Pizarro and will serve as a learning exercise for the future Mars sample return mission.

Mars (2028). Tianwen-3 is the Mars mission that could mark the Chinese space agency's symbolic sorpasso of NASA, since the American Mars sample return mission (in which the European Space Agency also participates) is currently on hold. Tianwen-3 is simpler than Mars Sample Return. Whereas NASA wants to retrieve the rocks that the Perseverance rover has carefully selected, China is content to reach the red planet, drill into the ground, collect at least 500 grams of samples and lift off with a small rocket so that a return probe can bring them to Earth.

2015 XF261 (2028). In the same year that it plans to launch Tianwen-3, China would also launch its first major planetary defense mission. Two Chinese probes will follow in the footsteps of NASA's DART and ESA's Hera missions. One will impact the near-Earth asteroid 2015 XF261 and the other will observe the impact to confirm that humanity has deflected a second asteroid.

Callisto and Uranus (2029). The Tianwen-4 mission, whose launch is scheduled for 2029, has a dual objective. It would arrive at Jupiter in 2035. A first probe would orbit Callisto, one of the gas giant's moons, at the same time that ESA's JUICE orbits Ganymede, another of its moons. A second probe, equipped with radioisotope thermoelectric generators, would use a gravity assist from Jupiter to reach Uranus in 2045. It would be one of the first scientific missions to Uranus, which was visited by the Voyager 2 probe in 1986.

Venus (2033). Another sample return mission, but this time an atmospheric one. A Chinese probe would travel to neighboring Venus and collect its "air" to analyze whether it contains microorganisms or possible biological traces, as some recent studies suggest.

Neptune (2033). A newly proposed mission to the outermost planet of the Solar System would orbit the ice giant and deploy an atmospheric probe equipped with a balloon, in addition to performing flybys of its moon Triton. With a service life of up to 20 years thanks to radioisotope thermoelectric generators, it would be the first probe sent exclusively to Neptune. Its launch would use the future CZ-9 rocket, the "Chinese Starship".

Mars (2038). In addition to a crewed station on the Moon, China plans to deploy an autonomous robotic station on Mars to investigate techniques for using Martian resources that could serve future crewed missions. China plans to set foot on Mars in the 2040s.

Triton (2039). The second Chinese mission to Neptune and its moon Triton is the most ambitious of the program because it would use a nuclear fission reactor to power the spacecraft's electric thrusters. All that energy would allow it not only to orbit Neptune, but also to penetrate the ice of Triton's surface to explore the hypothetical ocean hidden below in search of life.

Inside the most advanced chipmaking machines there is something incredible: tiny supernovae

Spotting a supernova is something astronomers usually celebrate with enthusiasm, and no wonder: supernovae are among the most violent events we can come across in the cosmos. Knowing them better matters because it helps us understand more precisely the final stages in the life of massive stars, and also the mechanisms by which the material produced by stellar nucleosynthesis can give rise to new star systems.

The mathematical tools that astrophysicists use today describe the nuclear fusion processes that take place in the core of massive stars. During the stage known as the main sequence, stars obtain their energy from the fusion of hydrogen nuclei. As this element is consumed, the star produces helium nuclei and its composition begins to evolve. This process releases a huge amount of energy, and the star is forced to readjust continuously to maintain hydrostatic equilibrium, the result of two opposing forces that balance each other. One is gravitational contraction, which relentlessly compresses the star's matter. The other is the pressure of radiation and gas, which results from the ignition of the nuclear furnace and tries to expand the star.

The tiny supernovae of extreme ultraviolet lithography equipment

As the headline anticipates, this article is not only about supernovae; semiconductors also play a leading role. At first glance these cosmic events and integrated circuits seem to have nothing to do with each other but, curiously, they do have something in common. That is why it seemed a good idea to start this text by reviewing what a supernova is and why it occurs. Otherwise we could not fully grasp the idea we are about to explore.

In the extreme ultraviolet (EUV) lithography equipment made by the Dutch company ASML, high-power lasers heat tens of thousands of tiny tin droplets every second to a temperature of half a million degrees Celsius. This interaction produces an extremely hot plasma that emits ultraviolet light with a wavelength of 13.5 nm. This light must then be carried to the wafer by a very precise optical system in order to print the patterns that define the integrated circuits onto a layer of photoresist. Broadly speaking, this is the strategy used by the most advanced semiconductor manufacturing machines in existence today. And, as we have just seen, high-power lasers play an unquestionably leading role in it.

As Jays Stewart, Chief of Research at ASML, explains in the very interesting article he has published in IEEE Spectrum, the process that EUV lithography equipment uses to generate ultraviolet radiation for cutting-edge chips is very similar to what happens during a supernova. When a massive star exhausts its fuel and nuclear fusion stops, the pressure of radiation and gas can no longer counteract gravitational contraction. The star's iron core then contracts suddenly under the enormous pressure of all the layers of material above it. The star has lost its hydrostatic equilibrium. At that moment all this matter loses the support that the core provided, and it falls at enormous speed onto a core that is now much more compact. When all that stellar material hits the surface of the core, it rebounds and is hurled with huge energy into the interstellar medium, where it disperses. A supernova has just occurred. Some are so energetic that for a few seconds they emit more light than the entire galaxy that contains them.

The tiny explosions that take place inside EUV lithography equipment when a laser hits a tin droplet produce a shock wave similar to the one that originates in the stellar medium, although on a much smaller scale. Surprisingly, the mathematical equations that describe the evolution of these two types of explosion are the same. ASML engineers use them to calculate very precisely how the shock wave triggered by the plasma balls will evolve inside the EUV equipment. And astrophysicists use them to describe supernova remnants and deduce the properties of the stellar explosion that produced them. A supernova has 10⁴⁵ times more energy than a tin explosion, but thanks to this parallel ASML engineers have been able to solve the complex problem posed by tin debris inside their most advanced lithography equipment.

More information | IEEE Spectrum
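The shared mathematics mentioned above can be illustrated with a short sketch. It assumes that the equations in question are the classic Sedov-Taylor solution for a strong point explosion, which the article does not name. The function name `blast_radius`, the constant `XI` and the sample values are illustrative choices, not figures from ASML or IEEE Spectrum.

```python
# Sedov-Taylor scaling for a strong point explosion (assumed model, see the note above):
#   R(t) = XI * (E * t**2 / rho) ** (1/5)
# E: released energy [J], rho: density of the surrounding medium [kg/m^3], t: time [s].

XI = 1.0  # dimensionless constant of order one; its exact value depends on the gas (assumption)


def blast_radius(energy_j: float, density_kg_m3: float, time_s: float) -> float:
    """Radius in metres of the shock front at time_s after the explosion."""
    return XI * (energy_j * time_s**2 / density_kg_m3) ** 0.2


# Same formula, two very different energy scales (hypothetical numbers).
small = blast_radius(energy_j=1.0, density_kg_m3=1.0, time_s=1.0)
large = blast_radius(energy_j=1.0e45, density_kg_m3=1.0, time_s=1.0)

# With density and time held fixed, the radius grows only as E**(1/5).
radius_ratio = large / small
print(f"radius ratio for a 1e45 energy ratio: {radius_ratio:.3e}")
```

Because the radius depends on the one-fifth power of the energy, a single expression can cover both regimes: under this model, an energy ratio of 10⁴⁵ changes the shock radius at a given time and density by roughly nine orders of magnitude.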

How China has overtaken the US with its presidential plane

Every time the president of China, one of the world's main economic and military powers, travels abroad on a state visit, a complex security operation is set in motion. From transport to personal protection, everything is designed to minimize risk and ensure that the diplomatic agenda runs without a hitch. On the most complex trips, China deploys a fleet of Boeing 747s. According to Simple Flying, while one aircraft is reserved for Xi Jinping, another carries his accompanying staff and a third handles the cargo, including the presidential limousine. It is a logistics scheme reminiscent of the one the United States uses for this type of trip.

China and the United States choose Boeing

Boeing is going through difficult times, but that does not change the fact that it has built some of the most iconic aircraft in the world. A clear example is the Boeing 747, in service since 1970 and still considered a benchmark in aviation. Both the United States and China have chosen it to transport their leaders, making it a key piece of their international trips.

President Donald Trump has two Boeing 747-200B (VC-25) aircraft, in service since the 1990s. These aircraft have been heavily modified to work as true airborne command centers, with private suites, offices, meeting rooms and advanced communication and defense systems that guarantee VVIP-level security.

Xi Jinping travels in a Boeing 747-8, a more modern version than the one used by the president of the United States. This aircraft was delivered in December 2014, but it did not receive its VVIP configuration, with specific modifications for presidential transport, until 2016. Information on this point is, of course, extremely scarce. What is known is that, although it carries the Air China logo, its use is reserved exclusively for Xi Jinping and it acts as his airborne operations center.

As we indicated above, the Chinese government also asks the state airline to provide commercial aircraft for official duties when necessary. China uses a Boeing 747-8 to transport personnel and a Boeing 747-400 freighter for logistics support. The latter carries the Hongqi N701, Xi Jinping's armored limousine, considered the Chinese version of the American "The Beast", which in the case of the US travels in a C-17 Globemaster III, a military cargo plane built by Boeing.

Although there is no confirmed replacement plan, China could replace its American-made Boeing presidential plane with a domestically built alternative. The country has already successfully developed the COMAC C919, a narrow-body aircraft, and is working on the C929, a wide-body model that could become a viable candidate.

The United States has been trying to replace its presidential Boeing 747-200Bs with Boeing 747-8s, but the process has been plagued by delays. In this context, the CEO of Boom Supersonic, Blake Scholl, has suggested that the White House drop the 747 and bet on a special version of the Overture, its supersonic aircraft under development.

OpenAI steps on the accelerator in the AI race with its new and advanced language model

The race to develop artificial intelligence keeps accelerating. In November 2022 we witnessed the launch of GPT-3.5, the model that laid the foundations of ChatGPT and marked a turning point in conversational AI. More than two years later, we are facing the arrival of GPT-4.5, a new evolution that aims to keep pushing the limits of this promising technology.

This is a transition model between GPT-4o and GPT-5, but one with a key role in OpenAI's strategy. The company has decided that GPT-4.5 will be its last model without "chain of thought", giving way to a new generation with built-in reasoning capabilities. The change seeks to bring clarity to its product family, which has become complex over time.

A more advanced model, with greater emotional intelligence

AI is reaching a point where we are no longer satisfied with models that can solve complex problems. It is all very well for them to improve at programming, science, engineering and mathematics, but we are looking for models that can interact with humans more spontaneously. That is what OpenAI suggests with GPT-4.5.

Its new language model is presented as the largest and most advanced the startup has published to date. The researchers say that it is not only more natural, but that its knowledge base has been expanded to tackle complex challenges in various scientific disciplines, while its ability to solve logical problems has improved.

GPT-4.5 wants to stand out for its emotional intelligence, and in a recent test it proved to be up to the task. A member of the OpenAI team tested its abilities by sending it an emotionally charged message. In a natural tone and without hiding their anger, they explained that a friend had canceled on them at the last minute and asked for help writing a message that reflected their annoyance.

ChatGPT with GPT-4.5 detects that the user is frustrated and, instead of fueling the anger, suggests a calmer, more diplomatic message that expresses the annoyance without damaging the friendship: "Hello, the truth is that it has annoyed me a lot that the plans have been canceled again, I really wanted to see you. Can we talk about what is happening?"

To achieve this change, OpenAI has modified its approach to building models. Not radically, but by incorporating new supervision techniques alongside traditional methods such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), which were already fundamental in the development of GPT-4.

China prepares to lead the manufacture of chips for advanced weapons

China monopolizes gallium production. In fact, as of 2022 it accounted for 98% of the world's raw gallium, and this figure has presumably barely changed since then. For the country led by Xi Jinping this metal has a strategic value comparable to the one it has for the US because of its potential in military applications. In addition, controlling gallium exports allows China to respond to the sanctions that the US and its allies are imposing on it in the semiconductor industry.

Gallium is a very special metal. Its physicochemical properties make it suitable for combining with other elements to manufacture a special type of integrated circuit known as wide-bandgap semiconductors. These chips have three properties that make them very valuable in the manufacture of advanced military equipment: they withstand higher voltages, temperatures and frequencies than conventional silicon integrated circuits.

During the 1970s the US Defense Advanced Research Projects Agency (DARPA) devoted many resources to developing gallium-based semiconductors because of their potential in military technology projects. Gallium arsenide (GaAs) played a fundamental role in the development of the Global Positioning System (GPS), and also in the development of radars and precision weapons.

China has taken a very important step forward

Gallium nitride (GaN), which also has DARPA's backing, is currently being used to build state-of-the-art radars capable of accurately identifying smaller, faster and more numerous objects at greater distances. Each of these radars incorporates several thousand gallium-based chips. Everything we have just seen leads to an obvious conclusion: gallium is an essential metal for the US. But this country is not the only military superpower on the planet.

The Chinese army and research institutions have been working with gallium for many years and developing technologies that allow it to be used in advanced third-generation semiconductors. As we have seen, China has gallium in abundance, but producing integrated circuits with this metal is not easy, either for the country led by Xi Jinping or for the one currently governed by Donald Trump.

Silicon and sapphire substrates are often used to manufacture gallium nitride, but the effectiveness of the processes used so far has been moderate because the hexagonal atomic structure of GaN gives rise to a defect known as "climb". Broadly speaking, this curious phenomenon triggers the displacement of groups of atoms in a certain region of the crystal, which alters its structure and degrades its properties. In fact, GaN manufacturing defects cause current leakage, reduce thermal stability and lower the performance of the semiconductors.

Until now, researchers working with GaN had difficulty understanding why these defects appear in the crystal structure of this material, and also dealing with them. But the team of scientists led by Professor Huang Bing at the University of Beijing (China) has identified the cause that triggers the defects during the growth of GaN crystals. What these scientists have discovered is that by modifying the energy levels of the electrons it is possible to control the climb accurately and reduce defects.

"Traditional strategies to avoid defects include the use of different substrates and the adjustment of crystallization temperatures, but these approaches only address symptoms, not the cause," Professor Huang Bing explained. If China manages to take this research out of the laboratory and bring the technology to production lines, it will be able to manufacture GaN semiconductors that are cheaper, of higher quality and produced on a much larger scale. At that point it would not be unreasonable to expect it to take the lead in applying third-generation semiconductors in the military field and in 5G technologies.

More information | SCMP

Meta's next big bet is advanced humanoid robots, according to Bloomberg

After shedding 5% of its workforce at the beginning of the year, Meta has created a team to pursue its next great ambition: humanoid robots driven by artificial intelligence (AI). The news comes from Bloomberg, which states that the company led by Mark Zuckerberg will make a "significant investment" in this new project.

The sources point out that the androids will be able to interact with users and help them with everyday tasks. Meta wants its technology to help overcome the current barriers that limit robots when it comes to carrying a glass of water without spilling it, placing dishes in a rack to be washed, or folding clothes.

A team within Reality Labs

Instead of creating a new division, the social media giant has decided to place the team within Reality Labs, the unit responsible for the metaverse, the Quest 2, Quest 3 and Quest Pro headsets, and the Ray-Ban Meta smart glasses.

The initiative will be led by Marc Whitten, who until recently served as CEO of Cruise, the autonomous vehicle company owned by General Motors. He previously led Amazon's entertainment devices division and worked for 17 years at Microsoft, where he was an engineer on the original Xbox team.

According to Bloomberg, Meta wants to focus on the robots' underlying sensors and software, including an artificial intelligence system. It is an attempt to reuse part of the technology developed for its virtual and extended reality products. The headsets, for instance, have multiple sensors for tasks such as eye tracking.

[Image: Neo Beta, a 1X robot]

Reusing technologies that were born in other products is nothing new in the technology industry. Tesla's Optimus, for example, builds on a variety of advances derived from the autonomous driving technology in the firm's cars. This applies not only to software, but also to processing chips, batteries and sensors.

Rather than manufacturing humanoid robots, Meta seeks to establish itself as the reference technology ecosystem in this area by providing key components. Production and marketing will remain in the hands of other companies, and to that end it is already in talks with China's Unitree Robotics and the American company Figure AI.

We will have to wait to see how this new initiative evolves, but we can already say that Meta will not be alone. Many other companies are currently betting on humanoid robots. In addition to those already mentioned, they include 1X, Fourier Intelligence and Boston Dynamics, among others.

This Chinese fantasy film is breaking records at home and has already overtaken Marvel on more than one milestone

‘Ne Zha 2’ is sweeping all before it. You may not have heard of this Chinese animated production (the first installment was titled ‘Nezha: the rebirth of a god’ and you can watch it on Netflix), but it arrived in Chinese cinemas during the Chinese New Year celebrations, the most profitable and coveted date for premieres, and in five days it amassed 435 million dollars. To get an idea of how much that is, ‘Avengers: Endgame’ took slightly less, 427 million, in its first five days in US cinemas.

If it stays on this path (and it is doing so: in its first week it has reached 671 million), it has an even more spectacular milestone within reach: becoming the first feature film to gross more than 1,000 million dollars in a single country. Here too the mark to beat belongs to Disney: ‘Star Wars: The Force Awakens’ grossed 936 million in the United States, the highest figure so far.

‘Ne Zha 2’ has another record to beat, and everything indicates that it will: becoming the first Chinese film to exceed one billion dollars at the box office. ‘The Battle at Lake Changjin’ fell just short, with 913 million. One last milestone: ‘Ne Zha 2’ is about to overtake the first ‘Nezha’, which became the first animated film to gross more than 700 million in a single territory.

Many records and many milestones, then, for a film that continues the plot of the first part and still stars Nezha, a protective deity who appears in Buddhist, Confucian and Taoist folklore. The plot of these animated films is based on a classic 16th-century Chinese novel, which in turn has inspired other films in the country such as ‘New Gods: Nezha Reborn’.

Spain has just launched one of the most advanced military satellites in Europe

Spain now has one of the most advanced communications satellites in the world. Launched from the United States on a SpaceX rocket, the Spainsat NG 1 positions Spain as one of the NATO members with the greatest operational capacity in sovereign communications. The secret is in its antennas.

The Spainsat NG 1 communications satellite. The most advanced in Spain and the first in Europe with next-generation antennas, the Spainsat Next Generation 1 was successfully deployed this morning to provide secure communications for government and military missions of the European Union and its partners in the Atlantic Alliance. The satellite meets NATO's requirements for missions and deployments, and places Spain among the most advanced countries in sovereign communications. Adaptable to critical operations and emergency response, it is expected to remain in service until 2037, covering North and South America, Africa, the Middle East and part of Asia as far as Singapore.

Launch by SpaceX. It is striking that a European military satellite should be put into orbit by an American company. But it is not surprising given the multiple delays of Ariane 6, the only European rocket that could have launched the six-ton Spainsat NG 1. Instead, a SpaceX Falcon 9 did it from Cape Canaveral. The rocket, loaded with 450 tons of propellant, lifted off from Launch Complex 39A at the Kennedy Space Center at 1:37 UTC and used every last drop of fuel to take the satellite to a geostationary transfer orbit. Its first stage was expended without landing after 21 successful launches, including the Japanese lunar mission Hakuto-R and the deployment of 400 Starlink satellites.

Deployment in geostationary orbit. After separating from the second stage of the Falcon 9, the satellite set off alone toward its final position in geostationary orbit, 35,786 km above the Earth, an altitude of almost three times the diameter of the Earth itself. The Spainsat NG 1 will undergo a series of commissioning tests before entering operation. The Spanish company Hisdesat will be in charge of its operations for the next 15 years, with the support of ESA's Connectivity and Secure Communications department.

What makes it so special. The 6.1-ton satellite, the size of a minibus, carries X-band antennas with beam hopping technology, which means that they can redirect the communications beam to different regions or users electronically, without moving. The antennas were developed by Airbus Defence and Space in Barajas, but a consortium of Spanish companies took part in the design and development of the satellite: Sener, Indra, Tecnobit, Arquimea, GMV and Hisdesat itself. This industrial and institutional collaboration falls under the Pacis 3 project, which is in turn part of Hisdesat's Spainsat Next Generation program, with a planned fleet of two satellites based on Airbus's Eurostar Neo technology.
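The geostationary altitude quoted above can be checked with a short calculation. This is a minimal sketch that assumes a circular equatorial orbit and applies Kepler's third law; the constants are standard reference values for the Earth rather than figures from the article, and the function name is illustrative.

```python
import math

# Standard reference values (assumptions, not taken from the article).
GM_EARTH_KM3_S2 = 398600.4418         # Earth's gravitational parameter
SIDEREAL_DAY_S = 86164.0905           # one rotation of the Earth relative to the stars
EARTH_EQUATORIAL_RADIUS_KM = 6378.137


def geostationary_altitude_km() -> float:
    """Altitude above the equator of a circular orbit whose period is one sidereal day."""
    # Kepler's third law: T**2 = 4 * pi**2 * r**3 / GM, solved for the orbit radius r.
    orbit_radius = (GM_EARTH_KM3_S2 * SIDEREAL_DAY_S**2 / (4 * math.pi**2)) ** (1 / 3)
    return orbit_radius - EARTH_EQUATORIAL_RADIUS_KM


altitude = geostationary_altitude_km()
diameters = altitude / (2 * EARTH_EQUATORIAL_RADIUS_KM)
print(f"altitude: {altitude:,.0f} km, or {diameters:.2f} Earth diameters")
```

The result should land very close to the 35,786 km figure, and the ratio to the Earth's diameter comes out a little under three, consistent with the comparison in the text.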

DeepSeek does the same as OpenAI's most advanced models with far fewer resources. The key: "Reinforcement Learning"

The entire world is wondering how it is possible that the models of AI of Deepseek They have become overnight the great protagonists of today in the field of artificial intelligence. The answer is relatively simple. These models have managed to demonstrate that You can do more with much less. Both Deepseek V3 and Deepseek-R1 are comparable to GPT-4 or O1 OPENAI respectively, but it is estimated that their training has been much less expensive and its inference, of course, is: the prices of the Deepseek API are up to 35 sometimes lower than those of OpenAi, but that makes one wonder how it is possible. The answer is clear, and it is because we have at our disposal the technical reports of these AI models. Precisely his study has allowed us to clarify What are the techniques that this Chinese R&D laboratory has used to develop these models so efficient and capable. Many techniques, a single objective: efficiency There are several differences that make Deepseek’s new model especially efficient. Its creators explain in detail in the detailed Technical Report that is publicly available. Here are the most relevant: Deepseekmoe (“Mixture of experts”): In models such as GPT-3.5 the entire model was activated in both training and inference (when we use it). However, not all model components are necessary for our requests. The MOE technique – already introving with Deepseek V2 – precisely divides the model into multiple “experts” and only activates those that are necessary according to the request. GPT-4 is already a MOE model. But as we said, Depseekmoe even went further and differentiated between even more specialized experts, in addition to using some somewhat more generalist experts that could contribute value in certain requests. Managing all those specialized or generalist experts not only benefits inference, but also the training phase, making it more efficient. 
The MoE technique is similar in spirit to so-called “test-time scaling”, which also adjusts how much work a model does during inference.

DeepSeekMLA (Multi-Head Latent Attention): This is another substantial improvement, even more so than the previous one, and it was also introduced with DeepSeek V2. It affects how memory is managed in these models. Normally, both the model and the entire context window (the one that lets us write prompts and include long texts, for example) must be loaded into memory. Context windows are especially expensive because each token requires both a key and its corresponding value. This technique made it possible to compress that store of keys and values, dramatically reducing memory use during inference.

Auxiliary-Loss-Free Load Balancing: If we imagine a model as a great orchestra, each musician is an “expert” within the model. To play a complex piece, not all musicians are needed all the time. Traditionally, so-called “auxiliary losses” were used to make sure every musician played enough, but these losses could interfere with the performance of the piece (the model's training) and degrade overall performance. With DeepSeek V3, the model balances the work of each expert dynamically. Eliminating the “auxiliary losses” makes training simpler, more direct and more efficient. In addition, removing that interference allows the model to learn better with fewer resources and achieve better results.

Multi-Token Prediction Training Objective: Predicting the next word often depends on several previous words or on context. With this technique, instead of predicting only the next word, the model learns to predict several words at once. That produces texts that are more natural, easier to understand and less ambiguous, and it also accelerates training by reducing the number of steps needed to generate the complete text sequence.
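The difference between predicting one word and predicting several can be shown with a small Python sketch of how the training targets might be laid out. This only illustrates the data side of the idea; the function name `multi_token_targets`, the `depth` parameter and the sample sentence are assumptions for this example and say nothing about how DeepSeek V3 builds its prediction modules.

```python
def multi_token_targets(tokens, depth):
    """For each position, pair the context so far with the next `depth` tokens.

    Positions too close to the end of the sequence to have `depth`
    future tokens are skipped.
    """
    examples = []
    for i in range(len(tokens) - depth):
        context = tokens[: i + 1]
        targets = tokens[i + 1 : i + 1 + depth]
        examples.append((context, targets))
    return examples


tokens = ["the", "cat", "sat", "on", "the", "mat"]

# Classic next-token prediction: one target per position.
next_token = multi_token_targets(tokens, depth=1)

# Multi-token prediction: two targets per position.
two_tokens = multi_token_targets(tokens, depth=2)

for context, targets in two_tokens:
    print(" ".join(context), "->", targets)
```

Each position now supplies two targets instead of one, so the same text yields a denser training signal.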
FP8 Mixed Precision Training: Using FP8 numbers significantly reduces memory consumption and speeds up calculations. Some critical parts of the model still use FP32 to guarantee precision, but FP8 has an additional benefit: the size of the models is reduced. Other models use techniques such as quantization or parameter pruning. Although OpenAI has not published data on GPT-4 in this respect, the assumption is that it works with BF16, which is more expensive in terms of memory. Although FP8 theoretically leads to less precise models, complementary techniques such as fine-grained quantization are used to reduce the negative impact of outlier values, which makes stable training possible.

Cross-Node All-to-All Communication: During training, information must be constantly exchanged between all the nodes (computers) connected in the training data centers. That can become a bottleneck, but DeepSeek V3's new techniques include efficient communication protocols, reduced data traffic and efficient synchronization to accelerate training and, once again, reduce the cost of that process.

Reinforcement learning and “distillation” as keys

In addition to all these techniques, those responsible for DeepSeek V3 explain how they pre-trained it with 14.8 trillion tokens, a process that was followed by supervised fine-tuning (SFT) and several stages of reinforcement learning (RL). The SFT phase, which is mentioned in the DeepSeek V3 report, was completely omitted in the case of DeepSeek-R1. However, reinforcement learning plays a leading role in the development of both models, especially R1. The technique is well known in the field of artificial intelligence, and it is like training a dog with prizes and punishments. The model learns to respond better because it is rewarded when it does well. Over time, the model learns to take actions that maximize long-term reward.
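The prizes-and-punishments loop can be sketched as a toy in Python. This is a deliberately tiny example, not DeepSeek's training procedure: the three actions, their fixed rewards, the learning rate and the name `train_policy` are all assumptions. To keep the run deterministic it applies the exact expected policy-gradient step instead of sampling actions, and each reward is immediate rather than long-term.

```python
import math


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(rewards, steps=200, lr=0.5):
    """Shift probability toward well-rewarded actions, one gradient step at a time."""
    prefs = [0.0] * len(rewards)  # start with no preference for any action
    for _ in range(steps):
        probs = softmax(prefs)
        expected = sum(p * r for p, r in zip(probs, rewards))
        # Gradient of the expected reward with respect to each preference:
        # p_a * (r_a - expected). Better-than-average actions are reinforced.
        prefs = [
            h + lr * p * (r - expected)
            for h, p, r in zip(prefs, probs, rewards)
        ]
    return softmax(prefs)


# Hypothetical actions and rewards: a prize for sitting, a punishment otherwise.
actions = ["sit", "bark", "run away"]
rewards = [1.0, -1.0, -1.0]

policy = train_policy(rewards)
best = actions[policy.index(max(policy))]
print(best, [round(p, 3) for p in policy])
```

The policy starts out choosing each action with equal probability and ends up choosing the rewarded one almost every time, without ever being shown a labeled example of the correct answer.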
At DeepSeek, reinforcement learning is used to break complex problems down into smaller steps. The DeepSeek R1 technical report also indicates how this model applies RL techniques directly to the base model, without the need for supervised training. That saves computing resources. The so-called chain of thought (chain-of-thought), also mentioned in the technical report, comes into play here as well. This refers to the ability of a language model to show the intermediate steps of its reasoning. The model not only …
