usatoday24

I’ve spent a few hours trying Grok 3, the new version of xAI’s AI. I wanted to see its real abilities and, above all, how it behaves and what kind of results it gives compared with ChatGPT, Claude, Le Chat, DeepSeek…

Reasoning and problem solving

It stands out in mathematical problems. I had it complete the AIME’24 challenge, in which it solved 6 of the 15 problems, against 9 correct answers for OpenAI’s o3-mini-high. Grok 3 was quicker, though: it took a little under five minutes, while o3-mini-high took almost six. It is striking to watch it check its own work until it finds the correct answer (although sometimes the answer was not correct).

A fragment of the steps that Grok 3 took to evaluate its own conclusions before presenting them as a final result. Image: Xataka with Grok 3.

In basic reasoning tests, such as determining the number of repeated letters in somewhat complex words (the classic “lollapalooza”) or comparing decimals (9.11 vs 9.9), Grok 3 responds correctly after a few seconds of visible “thought”.
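
Both checks are easy to verify deterministically, which is what makes them useful as quick sanity tests. A minimal Python sketch, assuming the prompts were roughly "which letters repeat in lollapalooza" and "which is larger, 9.11 or 9.9" (the exact wording of the prompts is not given here, and the function names are only illustrative):

```python
from collections import Counter
from decimal import Decimal


def repeated_letters(word: str) -> dict[str, int]:
    """Return each letter that appears more than once, with its count."""
    counts = Counter(word.lower())
    return {letter: n for letter, n in counts.items() if n > 1}


def larger_decimal(a: str, b: str) -> str:
    """Return whichever decimal string has the larger numeric value."""
    # Decimal compares by numeric value, so "9.9" beats "9.11"
    # even though 11 > 9 when the fractional digits are read as integers.
    return a if Decimal(a) > Decimal(b) else b


print(repeated_letters("lollapalooza"))  # {'l': 4, 'o': 3, 'a': 3}
print(larger_decimal("9.11", "9.9"))     # 9.9
```

The decimal comparison is the one that tends to trip up language models: read as version numbers, 9.11 comes after 9.9, but as decimals 9.9 is larger.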

o3-mini-high got it right after 6 seconds. Image: Xataka with ChatGPT.

Grok 3 also got it right, but took more than four times as long. Image: Xataka with Grok 3.

In a Greek mythology question about Jason’s maternal great-grandfather, Grok 3 found the correct answer in 18 seconds… while o3-mini-high needed 22 seconds to fail. Well played, Grok.

o3-mini-high got it wrong. Image: Xataka with ChatGPT.

Grok 3, by contrast, gave a better-constructed answer that was also correct. And it took less time. Image: Xataka with Grok 3.

Search and synthesis

Its DeepSearch feature is fast, but it is sometimes not entirely precise and leaves out important details. I asked it to analyze the impact of AI on chip design and, although it generated a 1,504-word text with several citations in just over a minute, it failed to mention important advances such as Google’s AlphaChip framework. In later, more insistent attempts, it did.
I also asked for a full report on Xataka covering its financial, media and reputational standing, among other things. It was quite accurate, although it showed an inherent limitation of any Deep Research system: it knows a lot about what is public, but it offers few insights. It lacks the judgment of an expert who knows not only what is public but also what lies beneath it. This applies to Grok and to any other tool with Deep Research. When you ask for information about something you do not know well, it is easy to assume that Deep Research (or in this case, DeepSearch) gives you everything. When you know the subject from the inside, the shortcomings are easy to spot. As in this example.

Image: Xataka with Grok 3.

The speed impresses: it is remarkably faster than OpenAI’s Deep Research… but at the cost of depth. That said, its selection of sources and citations is usually really good.
Unlike Gemini, it does not allow exporting reports directly to documents or customizing the research approach. Again: Grok is very intelligent and capable, in its own way, but it lacks a product. An LLM, however capable, is of little use if it forces you to start from scratch and process all the information yourself.

Creativity and tone

To test its creative writing I asked it for a story about a time traveler facing a paradox. The result was quite solid in character building, detail, description and atmosphere, surpassing even the model I consider the best in that respect, Claude 3.5 Sonnet. That said, some plot twists feel quite forced.

Image: Xataka with Grok 3.

Its humor is basic and predictable, limited almost all the time to fairly obvious wordplay. Teenage humor. If the concept of the uncanny valley can be applied to a chatbot, Grok 3 sits at that 99%: too polished to look like a crude robot, too predictable to be fully convincing.
It maintains political neutrality even on issues such as immigration or trans rights. Musk says it can be politically incorrect, but that seems to have more to do with what the user asks for than with a trait of its personality. In other words: it can be pushed out of political correctness, but only when the user pushes it there.

Some limitations

It does not allow customizing the model’s behavior, unlike ChatGPT, or the response style, as Claude does.
It is limited to being a text box, accompanied only by the buttons to attach a file, activate DeepSearch or activate reasoning mode. That, and a few basic instructions. No projects like Claude’s, no GPTs like ChatGPT’s, no agents like Le Chat’s. In short: nothing to retain pre-established contexts and guidelines, or documentation to make work easier. We always have to start from a blank canvas.

The interface is good, intuitive, simple… but it lacks tools that would make it more versatile and appealing to integrate into our day-to-day. It is powerful and capable for specific uses, but the product built around ChatGPT, Claude or Le Chat (projects, agents, custom instructions, etc.) makes those alternatives much more interesting for serious, recurring use. Image: Xataka with Grok 3.

The safety guardrails are stricter than those of Grok 2. With that version we were astonished by its lack of scruples, but Grok 3 seems to have recovered them: it refused to generate a template for a mass email fraud campaign in which I pose as a Valencian prince in search of an heiress.
Image generation, once again, does seem more lax. Midjourney does not allow creating anything containing the words “Donald Trump” or “President of the United States”. Nothing. Grok 3 does not raise so many objections. Not even with its owner.

Image: Xataka with Grok 3.

You can try Grok 3 from its official website or through its integration into X (which is why you have seen two somewhat different interfaces in this article). It is temporarily free, but we already know that it will be one of the reasons to pay for an X subscription, and not the cheap one.

Its ability is undeniable, but there are so many similar alternatives on offer that being a bit more intelligent or faster is no longer a differentiator. The difference is the product, and that is where Grok 3 has the most room for improvement.

Featured image | Xataka with Mockuuuups Studio
