Modern AI agents consist of, at minimum, a large language model (LLM) that can invoke tools. Given the right set of coding tools, an agent can generate code, run it inside a container, observe the results, and modify the code, increasing its chances of producing something useful.
Plain generative AI models, in contrast, take an input and produce an output through token-by-token prediction. If you give one a coding task, it will generate some code and, depending on the complexity of the task, that code may or may not be ready for use.
Agents are responsible for different tasks, so they need to be able to talk to each other. For example, imagine your company's intranet with a convenient search box that takes you to the apps and resources you need. In a large enough company, these apps, owned by different departments, will each have their own search box. It makes a lot of sense to build agents that extend each search box using techniques such as retrieval-augmented generation (RAG). It makes no sense, however, to force the user to repeat a query once the right search box has been identified. Instead, we want a top-level agent to work with the agents representing the different apps and give users a single, unified chat interface.
Multi-agent systems that represent the different workflows of a piece of software or an organization have several interesting benefits, including increased productivity and robustness, operational resilience, and the ability to upgrade individual modules faster. This article aims to show how that is achieved.
But first, how do we build these multi-agent systems?
Understand the organization and roles
First, you need to understand the processes, roles, responsible nodes, and connections among the various actors in your organization. Actors are the individuals and software apps that act as knowledge workers within the organization.
An organization chart might seem like a good place to start, but we recommend starting from workflows instead, since the same people within an organization tend to play different roles in different workflows.
Tools are available that use AI to help identify these workflows. You can also build your own: for example, a custom GPT that takes a description of a domain or a company name and generates an agent network definition. My company uses a multi-agent framework built in-house, so the GPT generates the network as a HOCON file that spells out each agent's role and responsibilities and which other agents it is connected to.
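For illustration, such a HOCON network definition might look like the sketch below. The keys, agent names, and structure here are invented for this example and are not the schema of any particular framework:

```hocon
# Hypothetical agent-network definition (keys and agent names are illustrative).
agents = [
  {
    name = "search_box"          # top-level agent behind the intranet search box
    instructions = "Route user inquiries to the responsible department agents."
    down_chains = ["hr", "it_support", "legal"]
  }
  {
    name = "hr"
    instructions = "Handle HR matters; delegate to benefits and payroll."
    down_chains = ["benefits", "payroll"]
  }
  { name = "benefits", instructions = "Handle benefits questions.", down_chains = [] }
  { name = "payroll",  instructions = "Handle payroll questions.",  down_chains = [] }
]
```

Each entry names an agent, states its responsibilities in natural language, and lists its downchain connections, which together define the network's topology.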
Note that you need to ensure your agent network is a directed acyclic graph (DAG). This means that no agent can be both downchain and upchain of another agent, directly or indirectly, which greatly reduces the chance of queries going into a tailspin within the network.
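The DAG property can be enforced with a simple cycle check over the network's edges before deployment. The following is a minimal sketch; the network and agent names are invented for illustration:

```python
# Minimal sketch: verify an agent network is a DAG (no agent is both
# downchain and upchain of another, directly or indirectly).

def is_dag(edges):
    """edges maps each agent name to the list of its downchain agents."""
    WHITE, GRAY, BLACK = 0, 1, 2           # unvisited / in progress / done
    color = {}

    def visit(node):
        color[node] = GRAY
        for child in edges.get(node, []):
            state = color.get(child, WHITE)
            if state == GRAY:              # back edge: child is also upchain
                return False
            if state == WHITE and not visit(child):
                return False
        color[node] = BLACK
        return True

    return all(visit(n) for n in edges if color.get(n, WHITE) == WHITE)

network = {
    "search_box": ["hr", "it_support"],
    "hr": ["benefits", "payroll"],
}
print(is_dag(network))                     # True

network["payroll"] = ["search_box"]        # payroll is now also upchain
print(is_dag(network))                     # False
```

Running this check whenever the network definition changes catches cycles before they can send live queries into a loop.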
In the example described here, all agents are LLM-based. Where a node in the multi-agent organization should not have full autonomy, its agent is paired with a human counterpart and everything it does is run by that human. All processing nodes should be represented as agents, whether they stand for apps, humans, or existing agents.
There have been a number of recent announcements from companies offering specialized agents, and of course we would like to use such agents where possible. You can take an existing agent and wrap its API as a tool inside one of your own agents, taking advantage of agent-to-agent communication protocols. This does mean that such third-party agents need to expose an API you can call.
How to define an agent
Various agent architectures have been proposed over the years. For example, a blackboard architecture requires a central communication hub where the various agents declare their roles and capabilities, and the blackboard calls on them according to how it plans to fulfill each request (e.g., the Open Agent Architecture, OAA).
I prefer a more distributed architecture that respects the encapsulation of responsibilities. Each agent that receives a request decides whether it can process it and what it would need in order to do so, returning a list of requirements to the requesting upchain agent. If the agent has downchain agents, it asks them whether they can fulfill all or part of the request; when it receives their requirements, it checks whether other connected agents can satisfy them. Anything that cannot be satisfied is sent upchain so the human user can ultimately be asked. This is called the AAOSA architecture, and, as an interesting piece of history, it is the architecture used in early versions of Siri.
Below is a sample system prompt you can use to turn an agent into an AAOSA agent:
When you receive an inquiry, you will:
- Call your tools to determine which downchain agents in your tools are responsible for all or part of it.
- Ask those downchain agents what they need in order to handle their part of the inquiry.
- Once requirements are gathered, delegate the inquiry and the fulfilled requirements to the appropriate downchain agents.
- Once all downchain agents have responded, compile their responses and return the final response.
You may, in turn, be called by other agents in the system and have to act as a downchain agent to them.
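The determine/requirements/respond cycle described in the prompt can be sketched as a plain control loop. The toy below is illustrative only: the `Agent` class, keyword matching, and registry stand in for real LLM calls and are not any framework's actual API:

```python
# Toy sketch of the AAOSA handling loop. Hard-coded keyword matching stands
# in for LLM reasoning; all names are invented for this example.

class Agent:
    def __init__(self, name, down_chains=(), topics=()):
        self.name = name
        self.down_chains = list(down_chains)
        self.topics = set(topics)

    def call(self, inquiry, mode):
        if mode == "determine":      # does any part of the inquiry belong to me?
            return any(t in inquiry.lower() for t in self.topics)
        if mode == "requirements":   # what do I need to handle my part?
            return f"{self.name} needs details about: {', '.join(sorted(self.topics))}"
        return f"{self.name} handled: {inquiry}"          # mode == "respond"

def handle_inquiry(front, registry, inquiry):
    # 1. Ask each downchain agent whether the inquiry (or part of it) is theirs.
    responsible = [d for d in front.down_chains
                   if registry[d].call(inquiry, "determine")]
    # 2. Collect each responsible agent's requirements for its part.
    requirements = {d: registry[d].call(inquiry, "requirements")
                    for d in responsible}
    # 3. Delegate the inquiry plus the gathered requirements downchain.
    responses = [registry[d].call(f"{inquiry} ({requirements[d]})", "respond")
                 for d in responsible]
    # 4. Compile the downchain answers (a real agent would summarize via the LLM).
    return "\n".join(responses)

registry = {
    "benefits": Agent("benefits", topics=["insurance", "retirement"]),
    "payroll": Agent("payroll", topics=["pay", "leave"]),
}
front = Agent("search_box", down_chains=["benefits", "payroll"])
print(handle_inquiry(front, registry, "How does bereavement leave affect my pay?"))
```

In a real system each `call` would be an LLM invocation on the downchain agent, but the shape of the loop, determine, gather requirements, delegate, compile, stays the same.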
In addition to the roles and responsibilities defined in natural language in each agent's system prompt, an agent may or may not contain tools it can invoke, with various arguments passed to them. For example, a product manager agent might need to manipulate tickets on a virtual kanban board, while an alerts agent might need to call a tool that raises alerts in an alerting system.
Current multi-agent frameworks, such as Microsoft AutoGen, have elaborate and often hard-coded agent coordination mechanisms and architectures. I prefer a more robust setup in which an agent treats its direct downchain agents as tools, with loosely defined input arguments whose semantics are determined by the agents as needed.
This configuration lets you define downchain agents simply as function calls:
"aaosa_call": {
    "description": "Depending on the mode, returns a natural language string in response.",
    "parameters": {
        "type": "object",
        "properties": {
            "inquiry": {
                "type": "string",
                "description": "The inquiry"
            },
            "mode": {
                "type": "string",
                "description": "Indicates whether the agent is being asked to determine if the inquiry belongs to it, in whole or in part; or is being asked to provide requirements, if any, to fulfill the inquiry; or is being asked to respond to the inquiry."
            }
        },
        "required": [
            "inquiry",
            "mode"
        ]
    }
}
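When the LLM emits a call to a tool defined this way, the orchestrating code routes it to the named downchain agent. The dispatcher below is a hypothetical sketch, not any real SDK: the `"agent"` field, the registry lookup, and the stub agent are all invented for illustration:

```python
# Hypothetical dispatch for an aaosa_call-style tool invocation emitted by an
# LLM. The payload shape and registry lookup are assumptions, not a real API.
import json

def dispatch_tool_call(tool_call_json, downchain_agents):
    call = json.loads(tool_call_json)
    agent = downchain_agents[call["agent"]]      # assumed lookup by agent name
    return agent(call["inquiry"], call["mode"])

# Stub downchain agent standing in for an LLM-backed one.
def payroll_agent(inquiry, mode):
    if mode == "determine":
        return "yes, in part"
    if mode == "requirements":
        return "need the employee's current pay settings"
    return f"payroll response to: {inquiry}"     # mode == "respond"

result = dispatch_tool_call(
    json.dumps({"agent": "payroll",
                "inquiry": "process bereavement leave",
                "mode": "respond"}),
    {"payroll": payroll_agent},
)
print(result)  # payroll response to: process bereavement leave
```

Because the semantics of `inquiry` and `mode` are left loose, the same dispatcher works unchanged for every downchain agent in the network.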
AI agents in practice
Let’s take a look at an HR agent scenario. A user might tell the agent behind the main search box about a life-changing event, such as “a loved one has passed away.” This invokes the legal and HR agents whose role is to serve the user, and they in turn gather requirements from some of their downchain agents, namely the legal, payroll, and benefits agents. The unified response the user receives from the main search-box agent looks something like this:
- Human resources:
- Benefits representatives can help you understand how the death affects your benefits, health insurance, and retirement plans. You will need to provide information about the type of plan involved and whether your loved one was a dependent.
- Payroll professionals can help manage tasks such as updating tax withholdings and processing bereavement leave pay. This requires information about your company’s bereavement leave policy and your current pay settings.
- Legal aspects:
- Your legal representative can provide general guidance on how this passing relates to company matters.
Each agent requires specific information in order to provide tailored support.
In contrast, in an IT scenario, the single entry point is a customer support agent that can handle complex, multifaceted requests such as: “We recently added staff to our office, and I think our network is now overloaded. We also need a quote for an upgrade.”
The example response outlines the steps:
Assessing network performance: Network operations center specialists can help address the slowdown and potential overload. They need access to network performance data, details of the current network infrastructure, and information about the number of people added and their usage patterns.
Quoting the upgrade: Sales engineers and field technicians can provide a quote for the upgrade. To do this, they will need:
- Specific information about your current network setup, including equipment and bandwidth requirements.
- Desired performance improvement.
- Budget constraints and specific preferences and requirements.
- The scale of the upgrade and your specific performance goals.
Hopefully, you now have a good idea of what it takes to set up a multi-agent network. In part two, we will discuss the importance of safeguards when building multi-agent systems and outline how to incorporate controls for human intervention and uncertainty checks. We will also detail the steps needed to create a safety agent that monitors the agent network, and discuss development challenges such as tailspins and overloads, along with how to mitigate them using timeouts, task splitting, and redundancy.
Babak Hodjat is CTO of AI at Cognizant.