In AI, teraflops came first, then parameters. Now what matters are the ‘bragawatts’

The technology conversation runs on trends, and nothing is as fashionable right now as artificial intelligence. Every country that wants to be part of the conversation is developing its own models and tools, and it is striking how geopolitics permeates everything: the US seeks sovereignty while China wants to monetize now. But just as interesting as the capabilities of any one model are two closely linked questions: the data centers that feed the enormous amount of compute needed to train AI and, of course, where all that energy comes from. Out of that conversation a fascinating term has been born: the 'bragawatt'.

The 'bragawatts', or AI's way of bragging

When companies like OpenAI or Google announce new AI-focused data centers, they usually lead with a bombastic figure for how much energy the facility will consume. OpenAI recently announced a new campus in Michigan that, together with six other recently revealed sites, will need more than 8 GW to operate. They also talk money: a plan launched in January of this year worth 500 billion dollars, with 10 GW of planned capacity. According to the company, it is "the infrastructure necessary to advance AI and reindustrialize the country." The Financial Times has done the math: counting the Michigan project, the company has 46 GW of computing capacity planned.

As with deals like Microsoft's 75-billion-dollar purchase of Activision Blizzard, numbers this large need context. If 1 GW is enough to power 800,000 homes in the United States (air conditioning included, at any time of year), these OpenAI data centers would consume as much energy as more than 44 million homes. More context from the Financial Times: that is almost three times all the homes in California.
The fact that companies hand out these power figures so freely has led some to coin the term 'bragawatt'. The neologism is a sarcastic blend of 'brag' and 'watt', the unit of power. There is no neat Spanish equivalent, but it is essentially a boast: some companies publicly inflate the planned energy consumption of their infrastructure. There are several reasons for doing so, but as with any announcement from publicly traded companies, the goal is to attract the attention of the press, the technology sector and, above all, investors. In financial circles it is noted that these bombastic figures are not always met, but beyond the marketing bluster there is substance here.

OpenAI asked the US government to secure 100 GW annually to fuel the country's AI developments, and NVIDIA has explained rather well why estimating the demand of these centers is a problem. In a recent report, the company made a very interesting point: unlike a traditional data center, which runs thousands of unrelated tasks, an AI 'factory' operates as a single system. When training a large language model (LLM), thousands of GPUs perform intensive compute cycles followed by periods of data exchange, all in such tight synchrony that the result is an energy profile characterized by massive, rapid load swings. A rack's electrical draw can go from an 'idle' state at around 30% utilization to 100% and back in a matter of milliseconds. This forces engineers to oversize components to support the peak current rather than the average, which raises costs and space requirements.
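The peak-versus-average point can be illustrated with a toy calculation. The 100 kW rack and the even split between burst and lull below are invented assumptions for the sketch; only the 30%-to-100% utilization swing comes from the report quoted above:

```python
# Toy illustration of why AI racks are provisioned for peak load.
# Assumption: a hypothetical 100 kW rack that swings between ~30% ("idle")
# and 100% utilization during synchronized training steps. The wattage and
# the 50/50 duty cycle are invented for illustration.
RACK_PEAK_KW = 100.0

# A training step alternates compute bursts (100%) with data-exchange
# lulls (~30%); assume the rack spends half its time in each state.
utilization_profile = [1.0, 0.3] * 10
avg_utilization = sum(utilization_profile) / len(utilization_profile)

average_draw_kw = RACK_PEAK_KW * avg_utilization  # what the rack uses on average
provisioned_kw = RACK_PEAK_KW * 1.0               # what the electrical gear must support

oversize = provisioned_kw / average_draw_kw
print(f"average draw : {average_draw_kw:.0f} kW")  # 65 kW
print(f"provisioned  : {provisioned_kw:.0f} kW")   # 100 kW
print(f"oversizing factor: {oversize:.2f}x")       # 1.54x
```

Even in this tidy scenario the electrical gear is oversized by roughly half relative to the average draw, and real training bursts are far less even than a 50/50 split.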
When these oscillations add up across an entire data hall, which can mean hundreds of megawatts rising and falling in seconds, they pose a significant threat to the stability of the electrical grid, making grid interconnection a key bottleneck for the expansion of AI. So beyond the aforementioned boasting, there is some substance to those enormous figures. And what NVIDIA says is backed by what companies are doing: the big US technology firms are locking up nuclear electricity production and signing contracts with oil and gas companies. Coal is re-emerging in the middle of decarbonization to feed these 'gluttonous' data centers, and the focus on LLMs is leading large oil companies to rethink their plans to adopt renewables. AI needs fast, dependable energy capable of supporting those performance peaks, and renewables do not seem to be the way forward for now.

Since we are dealing with grandiose figures: estimates suggest that, between now and 2029, the world will spend about 3 trillion dollars on data centers. For more context, that is what France's entire economy was worth in 2024. Whether we are talking about a bubble is another topic; some find this fanfare very hard to believe, while others argue that AI will have more impact than any technology so far, the Internet included, so we may well need all that energy. Only time will tell.

Image | İsmail Enes Ayhan

In Xataka | While Silicon Valley seeks electricity, China subsidizes it: this is how it wants to win the AI war

Alibaba has presented its largest AI model, with a trillion parameters. The question is whether, at this point, that means anything

The Chinese giant Alibaba has announced a new language model, the largest it has released to date. It is called Qwen3-Max and boasts more than 1 trillion parameters.

The biggest. It is the latest model in the Qwen3 series, launched in May of this year, and as the 'Max' in its name indicates, it is the largest so far. Its size comes from its parameters, 1 trillion to be exact, while the previous models in the series topped out at 235 billion. According to the South China Morning Post (which Alibaba owns), the model stands out in language understanding, reasoning and text generation.

Benchmarks. Benchmark results place Qwen3-Max ahead of competitors such as Claude Opus 4, DeepSeek V3.1 and Kimi K2. If Gemini 2.5 Pro or GPT-5 do not appear, it is because they are reasoning models and only fast-response models were compared. As noted on Dev.to, both Gemini 2.5 Pro and GPT-5 obtain higher scores in mathematics and code, so reasoning models still hold the advantage in those areas. Qwen3-Max-Preview can already be tested free of charge. Benchmarks shared by Alibaba.

Parameters. Parameters are all the internal variables a model learns during training. In other words, they encode the knowledge the model has extracted from its training data and allow it to interpret our requests and generate answers. In theory, the more parameters, the more and better capabilities a model will have; it also means more computational power is needed both to train and to run it.

More does not mean better. The parameter race recalls the megapixel race of the first digital cameras. A 100-megapixel sensor will take larger photos than a 10-megapixel one, but other crucial factors affect image quality, such as sensor size or lens brightness.

Quality data.
More parameters can translate into more learning capacity and better handling of complex tasks, as long as quality training data has been used. It is obvious: a language model trained on redundant, incorrect or biased data will learn those errors and keep reproducing them.

There is more. In 2022, DeepMind, Google's lab, discovered that many models were oversized in parameters but undertrained on data. To demonstrate it, they created the Chinchilla model with "only" 70 billion parameters but four times more training data. The result: it beat Gopher, a model with four times more parameters.

Architecture. The model's architecture is another decisive factor in achieving an efficient model. A standard architecture, which forces the model to use its entire neural network, is not the same as one like Mixture of Experts, which consists of many smaller networks. It is something like having a committee of experts, each with a specialty: the model can pick the right expert for each query instead of activating the whole network. With this technique, Mistral manages to use only a fraction of its parameters, which makes it faster and cheaper to run.

Image | Markus Winkler, via Pexels

In Xataka | The ASML-Mistral alliance reveals the European plan B: if we cannot manufacture chips, we will at least control how they are manufactured
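The expert-committee idea can be sketched in a few lines of Python. This is a minimal, hypothetical top-1 router with made-up sizes, not Mistral's actual implementation:

```python
import random

random.seed(0)

# Minimal Mixture-of-Experts sketch: a router scores each expert for the
# input and only the winning expert's weights are used, so a fraction of
# the total parameters runs per query. All sizes are invented.
N_EXPERTS, DIM = 4, 8

def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

experts = [rand_matrix(DIM, DIM) for _ in range(N_EXPERTS)]  # the "committee"
router = rand_matrix(DIM, N_EXPERTS)                          # routing weights

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def moe_forward(x):
    """Route the input to the single highest-scoring expert (top-1)."""
    scores = [sum(x[i] * router[i][e] for i in range(DIM)) for e in range(N_EXPERTS)]
    chosen = scores.index(max(scores))
    return matvec(experts[chosen], x), chosen

x = [random.gauss(0, 1) for _ in range(DIM)]
y, expert_id = moe_forward(x)
print(f"routed to expert {expert_id}: 1 of {N_EXPERTS} experts ran")
```

With four experts and top-1 routing, each query touches roughly a quarter of the expert parameters; production MoE models typically route each token to the top two experts out of eight or more.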
