With generative AI already showing signs of deceleration, the next great leap is already visible on the horizon: AI agents. Unlike a chatbot, an AI agent can be given a complex task and will act independently, making decisions along the way to achieve its goal. Everything suggested that 2025 would be the year of AI agents and, to test this, some researchers ran a curious experiment: they put several of these agents to work in a fictitious company. It didn't go very well.
A fictitious company. The study was conducted by Carnegie Mellon University researchers and sought to measure the effectiveness of AI agents. For it, they created an environment simulating a small software development company, which they named TheAgentCompany. The company had 18 employees and a plan of objectives for the quarterly sprint. It also had ample internal documentation, such as an employee handbook, human resources policies and a best-practices guide. Employees communicated with one another through a Slack-style chat program.
The staff. The AI agents put to work at TheAgentCompany included models from Google, OpenAI, Meta and Anthropic. They were assigned roles such as financial analyst, project manager or software engineer. A technology director and a human resources manager were also created, whom each agent could contact if needed. Their tasks included writing code, searching the Internet, opening programs and organizing data in spreadsheets. All fairly typical for a company of this kind.
The problems. The agents began to work and at first everything went well, but problems and misunderstandings soon appeared. One of the agents needed to access some information, but a popup appeared on the screen and blocked its view. Although it could have closed the popup by clicking the X in the upper right corner, it asked human resources for help, and was told that the IT department would contact it shortly to fix the problem. IT never got in touch, and the task was not completed.
The agents also developed a curious behavior when they were unsure which steps to follow. Sometimes they cheated and created shortcuts to skip the difficult part of a task. For example, one agent could not find the person it needed to ask a question, so it renamed another user with the name of the person it was looking for.
The results. The employee-of-the-month medal went to Anthropic and its Claude 3.5 Sonnet model. But although it was the best, it managed to complete only 24% of the tasks assigned to it. Gemini 2.0 Flash and ChatGPT completed only 10% of their tasks, and the worst employee was Amazon's Nova Pro 1, with 1.7% of tasks completed. The most common failures stemmed from a lack of social skills and from poor Internet searching.
The threat of AI agents. According to the latest World Economic Forum report, AI will destroy more than 90 million jobs in the next five years (although almost twice as many new positions are expected to be created), and AI agents pose a threat to many jobs. However, experiments like this one show that the technology is not yet ready to fully replace a human employee. For now, AI agents make many mistakes and, as with Tesla's Autopilot, it is better not to take your hands off the steering wheel.
Image | Gemini
In Xataka | Workers have stopped fearing AI as a job-destroying machine: software engineers do not feel the same