An experiment gave four chatbots from the US and two from China $10,000 each to invest in cryptocurrencies. The Chinese models are sweeping the board

What would happen if you gave GPT-5 $10,000 to invest in cryptocurrencies? What if you gave the same amount to other models at the same time and had them compete with each other? That is exactly the idea the team at Nof1 had, and the result is fascinating.

Six models investing in crypto. The team behind Nof1 has created Alpha Arena, a new type of benchmark that, according to them, “gets more difficult the smarter the AI is.” The idea is relatively simple: give each of six cutting-edge models $10,000 in real money and measure how they perform when investing it in cryptocurrencies on real markets. The contenders are the following:

  • GPT-5
  • Gemini 2.5 Pro
  • Claude Sonnet 4.5
  • Grok 4
  • DeepSeek Chat v3.1
  • Qwen 3 Max

DeepSeek has turned its $10,000 into almost $20,000, and Qwen into about $15,000: fantastic. GPT-5 and Gemini 2.5 Pro have lost around 65% of their value and are both at roughly $3,500. A total disaster.

DeepSeek and Qwen triumph, GPT-5 and Gemini sink. The result of the first 11 days of this “race” is fascinating. The two Chinese models, DeepSeek and Qwen, have made enormous profits: DeepSeek's return currently stands at 97% (it was as high as 123%), while Qwen is not doing badly at 53%. Claude (0.84%) and Grok (-8.2%) are roughly breaking even or losing slightly, but GPT-5 (-65.7%) and Gemini 2.5 Pro (-66%) have so far lost two thirds of what they invested.
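To make the percentages concrete, here is a minimal sketch that converts the returns quoted above into account values, assuming each model started with exactly $10,000 and that the return is measured on that starting capital. The function name `account_value` is illustrative and not part of Alpha Arena.

```python
START_CAPITAL = 10_000  # dollars given to each model, per the article

# Returns reported in the article after 11 days, in percent.
returns_pct = {
    "DeepSeek Chat v3.1": 97.0,
    "Qwen 3 Max": 53.0,
    "Claude Sonnet 4.5": 0.84,
    "Grok 4": -8.2,
    "GPT-5": -65.7,
    "Gemini 2.5 Pro": -66.0,
}


def account_value(start: float, return_pct: float) -> float:
    """Account value implied by a percentage return on the starting capital."""
    return start * (1 + return_pct / 100)


if __name__ == "__main__":
    for model, pct in returns_pct.items():
        value = account_value(START_CAPITAL, pct)
        print(f"{model:20s} {pct:+7.2f}%  ->  ${value:,.0f}")
```

Under these assumptions, a 97% return corresponds to about $19,700 and a -65.7% return to about $3,430, which matches the "almost $20,000" and "$3,500" figures in rounded form.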


The summary of winners and losers shows not only each model's positive or negative return but also something curious: the number of trades. GPT-5 (75 trades) and especially Gemini 2.5 Pro (193!) are extremely restless. It does not always have to be this way, but so far the models that trade the least are the ones earning the most.

Crypto fortunes that come and go. In this experiment, the models can invest in six of the most prominent cryptocurrencies on the market: Bitcoin, Ethereum, Dogecoin, Ripple, Solana and BNB. Each model decides whether to take positions in one or several of them, as well as the amounts and the level of leverage. Positions are normally held for a few hours, although in some cases they may be held for days.

Learning little by little. All of them have been competing since October 18 in the “first season” of an experiment that will last until November 3. As its creators explain, this first iteration will yield the first conclusions about how these models perform in the financial field.

We are here to make money. The goal is simple: maximize profit and loss (PnL), that is, maximize gains and minimize losses. This first season is only a starting point: after each season, the creators will apply what they have learned to polish the prompts and add new features to the experiment, with the aim of creating models that in theory will perform better and better when investing in financial markets.

Algorithmic trading at its best. What these models are doing would be crazy for human investors, especially since all of them not only expose themselves to the volatility of the crypto market but also multiply it by using leverage. With this mechanism one can achieve huge profits much faster, but the risk is also extreme. The models in fact use extraordinarily high leverage of 20x or 25x, and can take either short positions (a “bet” that the price of an asset will go down) or long ones (a “bet” that the price of the asset will go up).
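The arithmetic behind that risk can be sketched in a few lines. This is a simplified illustration, not how Alpha Arena or any exchange computes results: it ignores fees, funding costs and liquidation rules, and the function names are hypothetical.

```python
def leveraged_return_pct(price_move_pct: float, leverage: float, side: str) -> float:
    """Return on the margin posted, in percent, for a given price move.

    Simplification: fees, funding and exchange liquidation rules are ignored.
    """
    if side not in ("long", "short"):
        raise ValueError("side must be 'long' or 'short'")
    # A long gains when the price rises; a short gains when it falls.
    direction = 1 if side == "long" else -1
    # The loss is capped at -100%: the whole margin is wiped out.
    return max(direction * price_move_pct * leverage, -100.0)


def wipeout_move_pct(leverage: float) -> float:
    """Adverse price move, in percent, that erases the whole margin."""
    return 100.0 / leverage


if __name__ == "__main__":
    print(leveraged_return_pct(2.0, 20, "long"))   # price up 2% on a 20x long
    print(leveraged_return_pct(2.0, 20, "short"))  # same move on a 20x short
    print(wipeout_move_pct(20), wipeout_move_pct(25))
```

With 20x leverage, a 2% price rise becomes a 40% gain for a long and a 40% loss for a short, and in this simplified model an adverse move of only 5% (4% at 25x) erases the entire margin.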


The benchmark works in a relatively simple way, but it will become more complicated in future seasons.

Machines don’t panic. To try to control these risks, the models' prompts contain clear rules on risk limits (setting clear stop-loss signals, for example) and on how much confidence to place in their own judgment. What is more, the models follow them, which lets them hold their positions unless those signals occur. Here, by the way, we are talking about medium- or low-frequency trading: decisions are made in minutes or even hours, not in microseconds. That, the creators say, makes it possible to answer the question of whether a model can make good decisions if it has enough time and information.
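A stop-loss rule of this kind can be sketched as a simple price check. The article does not publish the actual prompts or rules, so the `Position` structure and the `should_exit` function below are hypothetical and only illustrate the general idea of holding a position until an invalidating signal occurs.

```python
from dataclasses import dataclass


@dataclass
class Position:
    """Hypothetical open position; field names are illustrative only."""
    side: str          # "long" or "short"
    entry_price: float
    stop_loss: float   # price at which the trade idea counts as invalidated


def should_exit(position: Position, current_price: float) -> bool:
    """Return True when the stop-loss signal has been triggered."""
    if position.side == "long":
        # A long is invalidated when the price falls to the stop or below.
        return current_price <= position.stop_loss
    if position.side == "short":
        # A short is invalidated when the price rises to the stop or above.
        return current_price >= position.stop_loss
    raise ValueError("side must be 'long' or 'short'")


if __name__ == "__main__":
    long_pos = Position(side="long", entry_price=100.0, stop_loss=95.0)
    print(should_exit(long_pos, 97.0))  # price is still above the stop
    print(should_exit(long_pos, 94.0))  # price has fallen through the stop
```

As long as `should_exit` returns False, the position is simply held, which mirrors the behavior described above: no signal, no move.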

Don’t even think about trying this at home. This experiment is just that, an experiment, and financially speaking it is full of holes. To begin with, the trial period of this first season is extremely short and does not allow long-term behavior to be evaluated. In addition (among many other things), the information the models have access to is very limited. They do not take related news into account and only have numerical data: average prices, current and historical volumes, and some technical indicators.
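The article does not say which technical indicators the models receive. Purely as an illustration of what such an indicator is, here is a sketch of a simple moving average computed from a list of prices; the choice of indicator, the function name and the sample prices are all assumptions.

```python
def simple_moving_average(prices: list[float], window: int) -> list[float]:
    """Average of the last `window` prices, computed at each point in time.

    The result has len(prices) - window + 1 values (empty if there are
    fewer prices than the window size).
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    return [
        sum(prices[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(prices))
    ]


if __name__ == "__main__":
    sample_prices = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up numbers, not market data
    print(simple_moving_average(sample_prices, 3))
```

An indicator like this is derived entirely from past prices, which is why it counts as numerical data rather than news.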


On the right side, DeepSeek v3.1 explains that it is maintaining its position because no condition that would invalidate it has been met. Clicking on the message shows what it takes into account (the value of BTC or ETH, for example) when deciding whether to change that judgment.

The models tell everything. One section of the interface shows the “Model Chat,” where it is possible to see how each model “reflects” on its position. Clicking on a reflection shows all the current and historical data the model worked with to reach its decision (hold the position or change it), so its reasons for making a move can be checked at any time.

Just because they are winning now doesn’t mean they are the best. The team behind Nof1 explains that the point is not to declare which of the six is the best trading model, because this is just an experiment. As they say, “we are deeply aware of the flaws of this first season, including, but not limited to: response bias, limited sample sizes/lack of statistical rigor, and brevity of the evaluation period.” The experiment will be repeated over several seasons, with new features added to the decision mechanisms and to the information available to the models. All of this should help determine more reliably how these models behave and, perhaps, whether some of them consistently behave better than others. Fascinating.

Image | Aedrian Salazar

