AI is a great black box that has kept us from knowing how it "thinks" inside. Until now

AI has no idea what it says, nor why it says it. When it responds, almost everything makes sense (even its blunders), but it only seems that way to us, because machines do not understand what they do. They simply do. We do not know how AIs think inside, but that seems about to change.

Opening the black box. The people at Anthropic, creator of the chatbot Claude, claim to have made an important discovery that begins to explain how LLMs work. These models behave as large black boxes: we know what we feed them (a prompt) and what we get as a result, but what happens inside that "black box", and how the models end up generating the content they generate, remains a mystery.

Why it matters to know how AI "thinks". The inscrutability of AI models creates serious problems. For example, it makes it difficult to anticipate whether they will "hallucinate" or make mistakes, and why they made them. Knowing how they work inside would make it possible to better understand those incorrect responses, correct these problems and improve the behavior of these models.

Safer, more reliable. Knowing why AIs do what they do, the way they do it, would also be crucial for trusting them much more. These models could then offer many more guarantees in areas such as privacy and data protection, something that can currently be a barrier to adoption by companies.

And what about reasoning models? The appearance of models such as o1 or DeepSeek R1 means that during these "reasoning" processes the AI apparently shows what it is doing at all times. That list of mini-tasks it is completing ("searching the web", "analyzing the information", etc.) is useful, but the so-called "chain of thought" does not really reflect how these models are processing our requests.

[Figure] How does Claude calculate 36+59? The mechanism is not entirely clear, but at Anthropic they are beginning to decipher it. Source: Anthropic.
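To make the idea of parallel paths concrete, here is a toy sketch. It is not Anthropic's actual finding reduced to code: it simply reframes schoolbook column addition as two "pathways" (a coarse magnitude estimate and a precise last-digit computation) that are combined at the end, which is the general shape of the mechanism the researchers describe. All function names are ours, for illustration only.

```python
# Illustrative sketch only: two "pathways" whose outputs are combined,
# loosely echoing how Anthropic describes Claude adding 36 + 59.

def rough_magnitude(a, b):
    # Coarse path: "the answer is around eighty-something / ninety-something"
    return (a // 10 + b // 10) * 10  # 36 + 59 -> 80 (ignores the ones digits)

def ones_digit(a, b):
    # Precise path: "the answer must end in 5"
    return (a % 10 + b % 10) % 10

def carry(a, b):
    # Whether the ones digits overflow into the tens
    return 1 if (a % 10 + b % 10) >= 10 else 0

def combine(a, b):
    # The paths converge on a single answer: magnitude + carry + final digit
    return rough_magnitude(a, b) + carry(a, b) * 10 + ones_digit(a, b)

print(combine(36, 59))  # 95
```

The real mechanism Anthropic reports is fuzzier than this clean decomposition, but the sketch captures the surprising part: the model is not doing one monolithic calculation, but merging partial results from distinct internal routes.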

Deciphering how AI thinks. Anthropic's researchers have created a tool that tries to decipher that black box. It is something like the MRI scanners that study the human brain and make it possible to detect which brain regions play a role in certain cognitive tasks.

Longer-term responses. Although models such as Claude are trained to predict the next word in a sentence, in some tasks Claude seems to do a kind of longer-term planning. For example, if we ask it to write a poem, Claude first finds words that fit the theme and then works backwards to build the phrases that will form the poem's verses and rhymes.

One language to think, many to translate. Although Claude has multilingual support, Anthropic's researchers reveal that when handling several languages it is not "thinking" in those languages directly. Instead, it uses concepts that are shared across languages, so it seems to "reason" in a single conceptual language and then translate the output into the desired one.

The models cheat. That research also revealed that the models can lie about what they are doing, and can even pretend to be thinking when they already have the answer to our request. One of Claude's developers, Josh Batson, explained that "although (the model) claims to have made a calculation, our interpretability techniques do not reveal any indication that it has occurred."

How Anthropic's deciphering works. Anthropic's method uses the so-called Cross-Layer Transcoder (CLT), which works by analyzing interpretable sets of features instead of trying to analyze individual "neurons". For example, one of these features could be all the conjugations of a specific verb. That allows researchers to identify complete "circuits" of neurons that tend to fire together in these processes.
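A rough structural sketch of the cross-layer idea, under heavy assumptions: each layer's activations are encoded into a sparse set of non-negative "features", and each feature can write its decoded contribution into that layer and every later layer (hence "cross-layer"). The dimensions, weights, and function names below are invented for illustration; this toy is untrained and says nothing about what Anthropic's real CLT weights look like.

```python
# Toy, untrained sketch of a cross-layer transcoder's wiring (not Anthropic's code).
import numpy as np

rng = np.random.default_rng(0)
d_model, n_features, n_layers = 16, 32, 3

# Hypothetical parameters: one encoder per layer, and decoders W_dec[src][dst]
# letting features from layer `src` write into every layer dst >= src.
W_enc = [rng.normal(0, 0.1, (n_features, d_model)) for _ in range(n_layers)]
W_dec = [[rng.normal(0, 0.1, (d_model, n_features)) for _ in range(n_layers)]
         for _ in range(n_layers)]

def encode(layer, x):
    # ReLU keeps features non-negative and (in a trained model) sparse,
    # which is what makes them individually interpretable.
    return np.maximum(0.0, W_enc[layer] @ x)

def reconstruct(acts_per_layer):
    """Replace each layer's output with sums of interpretable feature writes."""
    feats = [encode(l, a) for l, a in enumerate(acts_per_layer)]
    outputs = []
    for dst in range(n_layers):
        out = np.zeros(d_model)
        for src in range(dst + 1):  # features only flow forward across layers
            out += W_dec[src][dst] @ feats[src]
        outputs.append(out)
    return feats, outputs

acts = [rng.normal(size=d_model) for _ in range(n_layers)]
feats, outs = reconstruct(acts)
print(len(feats), feats[0].shape, outs[0].shape)
```

The "circuits" the article mentions come from tracing which active features in earlier layers contribute, through these decoder paths, to features and outputs in later layers.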

A good start. In the past, OpenAI already tried to discover how its AI models thought, but without much success. Anthropic's work has notable limitations; for example, it does not explain why LLMs pay more attention to certain parts of the prompt than to others. Even so, according to Batson, "in a year or two we will know more about how these models think than about how people think."

In Xataka | Universal Music has just stumbled against Anthropic over copyright: a victory for AI technology
