In 2017, Google's paper “Attention Is All You Need” changed the technical foundation of language generation: Transformers made it possible to process long sequences in parallel and to scale models to sizes that were previously unfeasible. That scaling path has driven architectures such as GPT and BERT and has turned self-attention into the central piece of contemporary generative AI.
But this approach comes with memory and energy costs that grow as the context gets longer, a limitation that has pushed researchers to look for alternatives. SpikingBrain-1.0 aims to break the mold.
From “Attention Is All You Need” to the brain: a new bet on breaking context limits
A team from the Institute of Automation of the Chinese Academy of Sciences has just presented SpikingBrain-1.0, a family of spiking models aimed at reducing the data and computation needed for tasks with very long contexts. The researchers propose two variants: SpikingBrain-7B, with a linear architecture focused on efficiency, and SpikingBrain-76B, which combines linear attention with Mixture of Experts (MoE) mechanisms for greater capacity.
The authors explain that much of the development and testing was carried out on clusters of MetaX C550 GPUs, with libraries and operators designed specifically for that platform. That makes the project not only a promising advance at the software level, but also a demonstration of domestic hardware capabilities, an especially relevant point given China's effort to reduce its dependence on Nvidia, a strategy we already saw reflected with DeepSeek 3.1.


SpikingBrain-1.0 is directly inspired by how our brain works. Instead of neurons that are always “on”, constantly crunching numbers, it uses spiking neurons: units that accumulate incoming signals until they exceed a threshold and fire a spike. Between spikes they do nothing, which saves operations and, in theory, energy. The key is that it is not only how many spikes occur that matters, but when: the exact timing and order of the spikes carry information, as in the brain.
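To make the idea concrete, here is a minimal sketch of a leaky integrate-and-fire neuron in Python. It is illustrative only, not the paper's implementation: the threshold, leak and reset values are assumptions chosen for the example.

```python
# Minimal leaky integrate-and-fire (LIF) neuron sketch.
# Illustrative only: threshold, leak and reset behaviour are assumed,
# not taken from the SpikingBrain-1.0 technical report.

def lif_neuron(inputs, threshold=1.0, leak=0.9):
    """Accumulate input over time; emit a spike (1) when the membrane
    potential crosses the threshold, then reset to zero."""
    potential = 0.0
    spikes = []
    for x in inputs:
        potential = leak * potential + x   # integrate with leak
        if potential >= threshold:
            spikes.append(1)               # fire a spike
            potential = 0.0                # reset after firing
        else:
            spikes.append(0)               # stay silent: no downstream work
    return spikes

# Example: a weak, steady input only fires occasionally.
print(lif_neuron([0.3, 0.4, 0.5, 0.1, 0.6, 0.7]))  # -> [0, 0, 1, 0, 0, 1]
```

The point of the sketch is the silence between spikes: most timesteps produce no output, which is where the claimed savings in operations and energy come from.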
For this design to work with the current ecosystem, the team developed methods to convert traditional self-attention blocks into linear versions that are easier to integrate into their spiking system, and created a kind of “virtual time” that simulates temporal dynamics without hurting throughput on GPUs. In addition, the SpikingBrain-76B version includes Mixture of Experts (MoE), a scheme that “wakes up” only certain sub-models when they are needed, something we have also seen attributed to models like GPT-4o and GPT-5 (see the sketch below).
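As a rough illustration of what “linear attention” means in practice, here is a hedged sketch comparing standard softmax attention (whose cost grows quadratically with sequence length) with a generic kernelized linear variant that summarizes keys and values into a small state. This is a textbook-style formulation with an assumed feature map `phi`, not the specific conversion described in the SpikingBrain report.

```python
import numpy as np

# Generic kernelized linear attention sketch (not SpikingBrain's exact method).
# Standard attention builds an n x n score matrix; the linear variant keeps
# a (d x d) summary, so cost grows linearly with sequence length n.

def softmax_attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(Q.shape[-1])              # (n, n): quadratic in n
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1.0):
    Qf, Kf = phi(Q), phi(K)                              # positive feature maps (assumed)
    state = Kf.T @ V                                     # (d, d) summary of keys/values
    norm = Kf.sum(axis=0)                                # (d,) normalizer
    return (Qf @ state) / (Qf @ norm)[:, None]           # linear in sequence length

n, d = 6, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, n, d))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```

In an autoregressive setting, the `(d, d)` state can be accumulated token by token instead of recomputed, which is what keeps memory flat as contexts stretch into millions of tokens; how SpikingBrain combines this with its spiking scheme is detailed only in the team's report.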


The authors suggest applications where context length is decisive: analysis of large legal files, complete medical records, DNA sequences and massive experimental datasets in high-energy physics, among others. That fit is argued in the paper: if the architecture maintains its efficiency at contexts of millions of tokens, it would cut costs and open up possibilities in domains currently limited by access to very expensive computing infrastructure. But validation in real environments, outside the laboratory, is still pending.
The team has released the code of the 7-billion-parameter model on GitHub alongside a detailed technical report. It also offers a ChatGPT-like web interface to interact with the model, which according to the authors is deployed entirely on domestic hardware. Access, however, is limited to Chinese, which complicates its use outside that ecosystem. The proposal is ambitious, but its true scope will depend on the community reproducing the results and running comparisons in homogeneous settings that evaluate accuracy, latency and energy consumption under real conditions.
Images | Xataka with Gemini 2.5 | ABODI VESAKARAN
In Xataka | OpenAI believes it has discovered why AIs hallucinate: they don't know how to say “I don't know”
