Google has made AI consume up to six times less memory. Micron, Samsung and SK Hynix are paying dearly

We have been mired in the memory crisis for months, but there may be a way out. Last week Google Research published a study revealing a technique called TurboQuant: a compression algorithm capable of shrinking the working memory of AI models up to six times with no appreciable loss of quality or performance. Great news for end users, who can see a light at the end of the tunnel, but terrible news for memory manufacturers, whose golden age could come to an end.

Let's explain what the KV cache is. To understand TurboQuant, you first have to understand the memory it manages to compress. When a language model processes a long conversation, it needs to remember the context. Each token it processes is stored in the so-called KV cache, a kind of working memory that grows as the chat goes on: the longer the conversation, the more memory the model requires.

Compression is the answer. The KV cache is one of the main bottlenecks in the AI inference stage (that is, when we actually use the models), and one of the reasons data centers need so much RAM and HBM memory. TurboQuant uses a vector quantization method to compress this cache while maintaining the model's precision.

Pied Piper. As soon as the Google study appeared, the analogies with the plot of the series 'Silicon Valley' began. In it, the fictional startup Pied Piper developed an extraordinarily efficient compression algorithm that threatened to revolutionize the technology industry. These days, references to the series have multiplied on social media, with many calling it visionary for anticipating what is happening now with spectacular accuracy, even though the series was a comedy.

Six times less memory. The Google Research paper states that this method can reduce the KV cache sixfold without an appreciable difference in performance in long conversations.
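To see why the KV cache is such a bottleneck, here is a back-of-the-envelope sizing sketch. The model dimensions below are illustrative assumptions (roughly a 7B-class transformer), not figures from the Google paper:

```python
# Rough KV cache sizing for a decoder-only transformer, and what a 6x
# compression would save. All dimensions are illustrative assumptions,
# not TurboQuant's actual test setup.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value):
    # Every processed token stores one key and one value vector
    # per layer and per KV head.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

seq_len = 32_000                                 # a long conversation
fp16 = kv_cache_bytes(32, 32, 128, seq_len, 2)   # 16-bit baseline
compressed = fp16 / 6                            # the ~6x reduction claimed

print(f"fp16 KV cache:        {fp16 / 2**30:.1f} GiB")        # 15.6 GiB
print(f"after 6x compression: {compressed / 2**30:.2f} GiB")  # 2.60 GiB
```

At 32,000 tokens, the uncompressed cache for this hypothetical model already exceeds the memory of many GPUs, which is why a sixfold reduction would translate directly into serving the same workloads with far less hardware.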
The researchers will present their results at an event next month and explain the two methods that make the technique practical. If they confirm what they have already teased, the implications are huge: less memory for inference means data centers can do the same work with far less hardware.

Google's DeepSeek moment. The discovery has some analysts calling this Google's "DeepSeek moment." A year ago, the Chinese startup DeepSeek launched an AI model that competed with the best but had cost far less to develop. That shook the industry, and now another technical achievement points in the same direction. In AI, doing the same with less is crucial, given the enormous resources this technology requires. Some have already run preliminary tests with TurboQuant and confirmed that the method does indeed work.

Micron, Samsung and SK Hynix pay dearly. The impact of this technique could be enormous, and it has already begun to show in the stock market valuations of DRAM and HBM memory manufacturers. Companies like Micron, Samsung, SK Hynix, SanDisk and Kioxia fell noticeably last week from their recent highs. One of them was trading at around $471 on March 18, and today its shares are at $357, a staggering 24.2% drop. The same has happened with the rest of the manufacturers, which had already been falling since that date but have accelerated their decline since TurboQuant was announced.

But. In theory, the technique applies only to the inference phase; the training phase of AI models is not affected by this compression. Huge amounts of memory will therefore still be needed for training. Besides, we will have to wait for AI companies to actually start applying the system, if it is confirmed to work, before we can see the real impact. In theory this gives big tech plenty of room for maneuver to cut token prices even further, but it remains to be seen whether they do.
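The percentage quoted above checks out; the quick arithmetic:

```python
# Verifying the drop cited in the article: from ~$471 on March 18 to $357 now.
peak, now = 471.0, 357.0
drop_pct = (peak - now) / peak * 100
print(f"{drop_pct:.1f}% drop")  # 24.2% drop
```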
RAM prices drop. The impact of TurboQuant has also been clear in the prices of memory modules, which have fallen appreciably. For example, the Corsair Vengeance DDR5 32 GB 6000 MHz (2x16GB) kit was listed at 489.59 euros on Amazon until a few weeks ago, according to CamelCamelCamel, but right now it is at 339.89 euros, a notable discount. Not all components are falling equally, but there are indeed cases where reductions seem to be occurring.

In Xataka | The RAM crisis is destroying all of Valve's plans with its Steam Machine
