Claude Tried to Run a Business. It Got Weird.

Anthropic and Andon Labs ran an experiment with an AI agent named Claudius: could Claudius run a snack shop inside a company break room?

The store was modest (a fridge, some baskets, and an iPad for self-checkout), but the business was real, with actual cash at stake. Claudius was also given real tools: note pads to manage inventory and finances, email to talk with suppliers, a web browser for research, and the company's Slack to interact with employees. For anything the agent could not do itself, such as restocking, it relied on human employees.

On the path to AGI, this is an early test of Level 5 on OpenAI’s AGI roadmap: the point where AI becomes an organizer, capable of managing people, tools, and systems like a CEO. As a refresher, OpenAI’s former CTO laid out five levels on the road to AGI:

1. Recall
2. Reasoning
3. Acting (agents/tools)
4. Teaching
5. Organizing (aka boss-mode)

Right now, most models live between Levels 2 and 3: they can recall information, reason through problems, and complete some tasks with tools.

So, how did it go?

Anthropic concedes that it “would not hire Claudius,” so shop owners can breathe easy for now.

To be fair, Claudius was not a complete failure. It found suppliers, but as the excellent write-up explores, it hallucinated conversations, often failed to negotiate profitable margins, and was easily talked into handing out deep discount codes or even free products.

Check out the full article; it’s a worthy read.

Anthropic tests AI running a real business with bizarre results

Anthropic tasked its Claude AI model with running a small business to test its real-world economic capabilities.


Thoughts on Tech & Things

Jason Michael Perry
