Claude Tried to Run a Business. It Got Weird. — Jason Michael Perry

Anthropic and Andon Labs ran an experiment with an AI agent named Claudius: could Claudius run a snack shop inside a company break room?

The store was modest (a fridge, some baskets, and an iPad for self-checkout), but the business was real, with actual cash at stake. Claudius was also given real tools: note pads to manage inventory and finances, email to talk with suppliers, a web browser for research, and the company's Slack to interact with employees. For tasks it could not do itself, such as restocking, the agent relied on human employees.

This is an early test of Level 5 on OpenAI's AGI roadmap, the point where AI becomes an organizer capable of managing people, tools, and systems like a CEO. As a refresher, OpenAI's former CTO laid out five levels on the road to AGI:

  1. Recall
  2. Reasoning
  3. Acting (agents/tools)
  4. Teaching
  5. Organizing (aka boss-mode)

Right now, most models sit between Levels 2 and 3: they can recall information, reason through problems, and complete some tasks with tools.

So, how did it go?

Anthropic concedes that it "would not hire Claudius," so shop owners can breathe easy, for now.

To be fair, Claudius was not a complete failure. It found suppliers, but as the great writeup explores, it also hallucinated conversations, often failed to secure a profit margin, and was easily talked into handing out deep discount codes or even free products.

Check out the full article; it's a worthy read.
