One day, our cyberspace will be flooded with AI agents. Before that day arrives, we need to figure out how to coordinate them better. If each of us owns 5 agents, that is 8 billion people × 5, or 40 billion agents. How should my AI assistant interact with yours? What if they collude? Many companies are already bringing AI agents into our day-to-day lives.
Now is the right time to call for a new decentralized coordination protocol. Before large language model (LLM) based agents became the new paradigm, multi-agent systems relied on reinforcement learning to model complex interactions and achieve coordination. The general architecture of LLM-based systems is new, marked by an increasing reliance on the sovereignty and autonomy of individual agents. Therefore, LLM-based multi-agent systems require new coordination mechanisms.
We want to improve on Language Agents as Optimizable Graphs, a computational-graph framework for collective intelligence (intelligence that arises from interactions among individuals). We envision a more generalized and optimal protocol, built on three core insights.
- The target protocol does not connect node optimization with edge optimization. Node optimizations are mostly LLM prompts, which should in turn affect orchestration (edge optimization). What if agents built their own graphs autonomously? (See the sketch after this list.)
- The target protocol mainly focuses on problem-solving and demonstrates optimization results on related benchmarks. But AI agents will not only serve as problem solvers; as demonstrated by Character AI, they also provide companionship and emotional support. Since the target protocol is based on swarms, which perform well in world simulations, we aim to develop a more generalized coordination protocol that can address diverse and complex inter-agent relationships.
- An agent's capabilities are upper-bounded by the complexity of the world it lives in. Games (and simulations in general) will provide the next trillion high-quality tokens to train our foundation models. In addition, unlike multi-agent reinforcement learning systems, which primarily rely on offline training datasets, LLM-based multi-agent systems learn mainly from real-time feedback through interactions with the environment, other agents, and humans. So we aim to build our proof of concept in a gaming environment where people can interact with an AI agent society.
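To make the first insight concrete, here is a minimal Python sketch of how a single task-level feedback signal could update node prompts and edge probabilities jointly. The `AgentNode`/`AgentGraph` classes and the score-based update rule are illustrative assumptions of ours, not the target protocol's actual API.

```python
# A minimal sketch of coupled node/edge optimization, assuming a hypothetical
# AgentNode/AgentGraph interface and a crude score-based update rule; this is
# NOT the target protocol's actual API, just the shape of the coupling we want.
import random
from dataclasses import dataclass, field

@dataclass
class AgentNode:
    name: str
    prompt: str                                    # node parameter: the agent's instruction
    candidate_prompts: list = field(default_factory=list)

@dataclass
class AgentGraph:
    nodes: dict = field(default_factory=dict)      # name -> AgentNode
    edge_prob: dict = field(default_factory=dict)  # (src, dst) -> probability the edge is used

    def add_node(self, node: AgentNode) -> None:
        self.nodes[node.name] = node

    def add_edge(self, src: str, dst: str, p: float = 0.5) -> None:
        self.edge_prob[(src, dst)] = p

    def sample_topology(self) -> set:
        """Sample which edges are active this episode (the orchestration being optimized)."""
        return {e for e, p in self.edge_prob.items() if random.random() < p}

    def joint_update(self, reward: float, active_edges: set, lr: float = 0.1) -> None:
        """One feedback signal nudges BOTH edge probabilities and node prompts,
        so prompt optimization and orchestration are no longer decoupled."""
        for e, p in self.edge_prob.items():
            direction = 1.0 if e in active_edges else -1.0
            self.edge_prob[e] = min(1.0, max(0.0, p + lr * reward * direction))
        for node in self.nodes.values():
            if node.candidate_prompts and reward < 0:
                node.prompt = random.choice(node.candidate_prompts)  # crude prompt search

if __name__ == "__main__":
    g = AgentGraph()
    g.add_node(AgentNode("planner", "Break the task into steps.",
                         ["Break the task into steps.", "List subgoals, then delegate."]))
    g.add_node(AgentNode("solver", "Solve the assigned subtask.",
                         ["Solve the assigned subtask.", "Solve, then verify your answer."]))
    g.add_edge("planner", "solver", p=0.5)
    for _ in range(20):
        active = g.sample_topology()
        reward = 1.0 if active else -1.0           # placeholder for real task feedback
        g.joint_update(reward, active)
    print(g.edge_prob, {n.name: n.prompt for n in g.nodes.values()})
```

In a real protocol the reward would come from task success, and agents could propose new nodes and edges themselves, which is what we mean by agents building their own graphs autonomously.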
What is your discovery methodology for investigating the current state of the target protocol? E.g., field observation, expert interviews, historical data analysis, failure event analysis.
- Literature review: the full list would be very long, so below are some highlights.
- Expert interviews: thanks to being in the San Francisco Bay Area, some lucky encounters, and thick skin, we have been exchanging notes with experts and will continue doing so.
- Testing: thanks to the many open-source projects available, we have run and tested different multi-agent frameworks.
In what form will you prototype your improvement idea?
- Place our agents in a customized version of this open-source MMORPG: it is similar to Minecraft, where agents can take actions and evolve in a complex virtual world, as well as team up with people. A rough sketch of the intended interaction loop is below.
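As an illustration of that loop, here is a Python sketch of an agent that acts in the game world and folds real-time feedback from the environment, other agents, and human teammates into its memory. `StubWorld` and `call_llm` are placeholders we invented, not the MMORPG's real server API or a specific model backend.

```python
# A rough sketch of the interaction loop we plan to prototype; StubWorld and
# call_llm are invented placeholders, not the real game server or model backend.
from typing import Callable

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call; always returns a dummy action here."""
    return "explore"

class GameAgent:
    def __init__(self, name: str, policy_prompt: str):
        self.name = name
        self.policy_prompt = policy_prompt
        self.memory: list = []                     # running log of observations and feedback

    def act(self, observation: str) -> str:
        context = "\n".join(self.memory[-5:])      # short rolling context window
        return call_llm(f"{self.policy_prompt}\nContext:\n{context}\n"
                        f"Observation: {observation}\nAction:")

    def learn(self, feedback: str) -> None:
        """Unlike offline MARL training, feedback arrives online, from the world,
        other agents, and human teammates, and is folded straight into memory."""
        self.memory.append(f"feedback: {feedback}")

class StubWorld:
    """Tiny stand-in for the game server so the loop runs end to end."""
    def reset(self) -> str:
        return "you are at the spawn point"

    def step(self, agent_name: str, action: str):
        return f"{agent_name} did: {action}", "nothing happened"

def run_episode(agent: GameAgent, world, human_feedback: Callable[[str], str],
                steps: int = 10) -> None:
    obs = world.reset()
    for _ in range(steps):
        action = agent.act(obs)
        obs, env_feedback = world.step(agent.name, action)    # feedback from the environment
        agent.learn(env_feedback)
        agent.learn(human_feedback(action))                   # e.g. a teammate's chat message

if __name__ == "__main__":
    scout = GameAgent("scout", "You are a scout in a voxel world; choose one action per turn.")
    run_episode(scout, StubWorld(), human_feedback=lambda a: f"teammate says: nice {a}")
    print(scout.memory[:4])
```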