Baseline Agentic AI Systems Architecture

Agentic AI Systems are designed to resolve complex problems with limited direct human supervision [1]. These systems are composed of multiple conversable agents that interact with each other and can be orchestrated centrally or self-organize in a decentralized manner [1, 2]. As the use of multi-agent systems in the enterprise grows to automate complex processes and solve complex tasks, we would like to take a closer look at what the architecture of such systems could look like.



These agents possess capabilities such as planning, allowing them to predict future states and select optimal actions to achieve specific goals. They also incorporate memory, enabling them to recall past interactions, experiences, and knowledge, which is crucial for maintaining continuity in tasks and refining strategies. Additionally, agents can utilize various tools, including APIs and external software, to execute code, query databases, and interact with other systems [1, 3]. This tool usage extends their functionality and enables them to perform a wide range of actions.
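
To make these capabilities concrete, here is a minimal sketch of a planner/executor agent pair with a registered tool, using AutoGen [2]. The deployment name, environment variable names, and the get_price helper are illustrative assumptions, not prescribed by the architecture:

```python
# Minimal AutoGen sketch; model deployment and env-var names are assumptions.
import os
from autogen import AssistantAgent, UserProxyAgent

llm_config = {
    "config_list": [{
        "model": "gpt-4o",  # Azure OpenAI deployment name (assumed)
        "api_type": "azure",
        "base_url": os.environ["AZURE_OPENAI_ENDPOINT"],
        "api_key": os.environ["AZURE_OPENAI_API_KEY"],
        "api_version": "2024-02-15-preview",
    }]
}

assistant = AssistantAgent("planner", llm_config=llm_config)
user_proxy = UserProxyAgent(
    "executor",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=2,
    code_execution_config=False,
)

# Tool usage: the assistant plans the call, the executor runs it.
@user_proxy.register_for_execution()
@assistant.register_for_llm(description="Look up the current price of a product.")
def get_price(product: str) -> float:
    return 42.0  # placeholder for a real API or database query

# Memory: both agents keep the message history of this conversation.
user_proxy.initiate_chat(assistant, message="What does a widget cost?")
```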

Because agents can take actions and write and execute code, there is a potential risk of running code that is malicious or harmful to the host system or other users [3]. Therefore, understanding the architecture of these systems is crucial for sandboxing code execution, restricting or denying access to production data and services, and mitigating failures, vulnerabilities, and abuses.




This article provides a baseline architecture for building and deploying Agentic AI Systems that use frameworks like AutoGen, LangChain, LlamaIndex, or Semantic Kernel. It is based on the Baseline OpenAI end-to-end chat reference architecture [4]. It uses Azure Container Apps or Azure Kubernetes Service as the main platform to deploy the agents, orchestrator, APIs, and prompt flows. Certain models require a Machine Learning workspace, so we also integrated one into the architecture. To use Azure OpenAI efficiently, we propose placing the service behind an Azure API Management instance that provides policies dedicated to the Azure OpenAI service. Optionally, if a UI is required, we propose using App Service. Outside of the workload's network, an Azure Container Apps environment is deployed with serverless code interpreter sessions (preview) [3] to run the code generated by agents.



Architecture


[Figure: Agentic AI Systems baseline architecture with Azure Container Apps (AgenticAISystems_ACA.png)]



Components

Many components of this proposed architecture are similar to those of the Baseline OpenAI end-to-end chat reference architecture [4], as the main components of Agentic AI Systems are Azure OpenAI, Azure AI Search, and Azure AI Services. Below, we highlight the main components of this architecture.




  • Azure AI Studio [5] is a managed cloud service used to train, deploy, automate, and manage machine learning models, including large language models (LLMs), small language models (SLMs), and multi-modal models used by the agents. The platform provides a comprehensive suite of tools and services to facilitate the end-to-end machine learning lifecycle. Key features of Azure AI Studio include:

    • Prompt Flow [6] is a development tool designed to streamline the entire development lifecycle of Generative AI applications. It supports creating, testing, and deploying prompt flows, which can be used to generate responses or actions based on given prompts. These prompt flows can be deployed to a Machine Learning workspace or containerized and deployed to Azure Container Apps or Azure Kubernetes Service [7]. AI Studio can also be used to develop and deploy these prompt flows.


    • Managed Online Endpoints are used by agents and backend services to invoke prompt flows for real-time inference. They provide scalable, reliable, and secure endpoints for deploying machine learning models, enabling real-time decision-making and interactions [7].


    • Azure AI Dependencies include essential Azure services and resources that support the functioning of AI Studio and associated projects [8]:
      • Azure Storage Account stores artifacts for projects, such as prompt flows and evaluation data. It is primarily used by the AI Studio to manage data and model assets.
      • Azure AI Search is a comprehensive cloud search service that supports full-text, semantic, vector, and hybrid search. It provides search capabilities for AI projects and agents and is essential for implementing the Retrieval-Augmented Generation (RAG) pattern: extract relevant queries from a prompt, query the AI Search service, and use the results to generate a response with an LLM or SLM (see the sketch after this component).
      • Azure Key Vault is used for securely storing and managing secrets, keys, and certificates required by agents, AI projects, and backend services.
      • Azure Container Registry stores and manages container images of agents, backend APIs, orchestrators, and other components. It also stores images created when using a custom runtime for prompt flows.
      • Azure OpenAI service enables natural language processing tasks like text generation, summarization, and conversation.
      • Azure AI Services offers APIs for vision, speech, language, and decision-making, including custom models.
        • Document Intelligence extracts data from documents for intelligent processing.
        • Azure Speech converts speech to text and vice versa, with translation capabilities.

        • Azure AI Content Safety helps ensure AI-generated content is ethical and safe, preventing the creation or spread of harmful or biased material.

    These components and services provided by the Azure AI Studio enable seamless integration, deployment, and management of sophisticated AI solutions, facilitating the development and operation of Agentic AI Systems.
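
As an illustration of the RAG pattern described above, here is a minimal sketch that combines Azure AI Search and Azure OpenAI. The index name, field name, deployment name, and environment variable names are illustrative assumptions:

```python
# A minimal RAG sketch; index "docs", field "content", deployment "gpt-4o",
# and all environment variable names are assumptions.
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

search = SearchClient(
    endpoint=os.environ["SEARCH_ENDPOINT"],
    index_name="docs",
    credential=AzureKeyCredential(os.environ["SEARCH_KEY"]),
)
llm = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-15-preview",
)

def answer(question: str) -> str:
    # Retrieve: query the search index with the query extracted from the prompt.
    hits = search.search(search_text=question, top=3)
    context = "\n".join(doc["content"] for doc in hits)
    # Generate: ground the model's answer in the retrieved passages.
    response = llm.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer using only this context:\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```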
  • Azure Cosmos DB is well suited for Agentic AI Systems and AI agents [9]. It can provide "session" memory holding the message history of conversable agents (e.g., ConversableAgent.chat_messages in AutoGen [9, 10]). It can also be used for LLM caching [9, 11]. Finally, it can serve as a vector database [9, 12].
  • Azure Cache for Redis is an in-memory store that can hold short-term memory for agents and act as an LLM cache, for example with AutoGen [11, 13]. It can also be used by backend services to improve performance and as a session store [13]; see the sketch below.
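
Here is a minimal LLM-caching sketch following the AutoGen caching documentation [11], reusing the assistant and user_proxy agents from the first sketch; the Redis hostname and access-key variable are illustrative:

```python
# A minimal LLM-caching sketch; the cache hostname and access-key variable
# are assumptions, and `assistant`/`user_proxy` come from the earlier sketch.
import os
from autogen import Cache

redis_url = (
    "rediss://:" + os.environ["REDIS_ACCESS_KEY"]
    + "@contoso-agents.redis.cache.windows.net:6380/0"
)

# Repeated identical prompts are answered from Redis instead of re-calling
# Azure OpenAI.
with Cache.redis(redis_url=redis_url) as cache:
    user_proxy.initiate_chat(assistant, message="Plan the deployment.", cache=cache)
```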
  • Azure API Management is, for us, a core architectural component to manage access to the Azure OpenAI service, especially when it is used by multiple agents. First, you can import your Azure OpenAI API into API Management directly or using an OpenAPI specification [14]. Once imported, you have multiple ways to authenticate and authorize access to Azure OpenAI APIs using API Management policies [15]. You can also use API Management to monitor and analyze the usage of the Azure OpenAI service [16], set a token limit policy [17], and enable semantic caching of responses to Azure OpenAI requests to reduce bandwidth, processing requirements, and latency [18]. For semantic caching, the deployed Azure Cache for Redis can be used [18]. Finally, with API Management policies, you can implement smart load balancing for the Azure OpenAI service [19-21]. For all these reasons, in this architecture the agents do not call the Azure OpenAI service directly; they call it through Azure API Management (see the sketch below). API Management is also used to expose the APIs of the backend services to the agents and to the external world.
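
For example, an agent targets the API Management gateway rather than the Azure OpenAI resource itself. This sketch assumes the Azure OpenAI API has been imported into API Management [14] and that the gateway accepts the subscription key in the api-key header (which the OpenAI SDK sends); the gateway hostname, key variable, and deployment name are illustrative:

```python
# A minimal sketch; the APIM gateway hostname, subscription-key variable,
# and deployment name are assumptions.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    # API Management gateway, not the Azure OpenAI resource endpoint.
    azure_endpoint="https://contoso-apim.azure-api.net",
    # APIM subscription key; the SDK sends it in the "api-key" header.
    api_key=os.environ["APIM_SUBSCRIPTION_KEY"],
    api_version="2024-02-15-preview",
)

completion = client.chat.completions.create(
    model="gpt-4o",  # deployment name exposed behind APIM (assumed)
    messages=[{"role": "user", "content": "Hello from an agent."}],
)
print(completion.choices[0].message.content)
```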
  • Azure Container Apps is a serverless platform that lets you focus on containerized applications rather than on infrastructure [22]. It is well suited for Agentic AI Systems: agents, orchestrators, prompt flows, and backend APIs can all be deployed as Container Apps and scale automatically with load, providing a reliable platform for these workloads. Container Apps also provides Dapr integration, which helps you implement simple, portable, resilient, and secure microservices and agents [23].
    • For asynchronous communication between agents and between agents and an orchestrator, we propose Azure Service Bus, a fully managed enterprise message broker with message queues and publish-subscribe topics [24]. It decouples the communication between agents and between agents and an orchestrator. Dapr can be used to communicate with Azure Service Bus [24] and provides resiliency policies for this communication (preview) [25]; see the sketch after this list.
    • For synchronous communication between agents and between agents and an orchestrator, you can use Dapr service-to-service invocation. It is a simple way to call another service (agent or orchestrator) directly, with automatic mTLS authentication and encryption and built-in service discovery [24]. Dapr also provides resiliency policies for calling services, but they cannot be applied to requests made using the Dapr Service Invocation API [26].
    • An Azure Kubernetes Service (AKS) variant of this architecture is provided in the appendix below. You can deploy Dapr on AKS or use a service mesh for direct communication between agents and between agents and an orchestrator. AKS also provides a reliable platform for your agents, orchestrator, prompt flows, and backend APIs.
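
Here is a minimal sketch of both communication styles using the Dapr Python SDK. It assumes a Dapr pub/sub component named agent-pubsub bound to Azure Service Bus and a target agent registered under the Dapr app ID researcher; both names are illustrative:

```python
# A minimal Dapr sketch; the pub/sub component "agent-pubsub" (bound to
# Azure Service Bus) and the app ID "researcher" are assumptions.
import json
from dapr.clients import DaprClient

task = json.dumps({"goal": "summarize Q2 sales"})

with DaprClient() as client:
    # Asynchronous: publish a task to a Service Bus topic for any subscriber.
    client.publish_event(
        pubsub_name="agent-pubsub",
        topic_name="tasks",
        data=task,
        data_content_type="application/json",
    )
    # Synchronous: invoke another agent directly; the Dapr sidecars handle
    # service discovery and mTLS.
    resp = client.invoke_method(
        app_id="researcher",
        method_name="plan",
        data=task,
        http_verb="POST",
    )
    print(resp.text())
```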
  • Azure Container Apps code interpreter sessions (preview) are fully isolated and designed to run untrusted code [3]. They are provided by Azure Container Apps dynamic sessions (preview), which give fast access to a secured sandbox environment with strong isolation [27]. Code interpreter sessions are fully isolated from each other by a Hyper-V boundary, providing enterprise-grade security and isolation [3, 27]. Outbound traffic can also be restricted [3]. By default, Python code interpreter sessions include popular packages such as NumPy, Pandas, and scikit-learn [3]. You can also create custom container sessions tailored to your needs [28]. Code interpreter sessions and custom container sessions are well suited for running code generated by agents in a secure, isolated environment; they are a critical component of the architecture to prevent malicious code execution and to protect the host system and other users (see the sketch below).
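
As a minimal sketch, generated code can be delegated to a code interpreter session, here through the LangChain integration for Azure Container Apps dynamic sessions; the pool management endpoint is a placeholder, and authentication relies on the ambient Azure credential:

```python
# A minimal sketch; the pool management endpoint below is a placeholder for
# your session pool's endpoint, and the tool authenticates with the ambient
# Azure credential (DefaultAzureCredential) by default.
from langchain_azure_dynamic_sessions import SessionsPythonREPLTool

tool = SessionsPythonREPLTool(
    pool_management_endpoint="https://<region>.dynamicsessions.io/subscriptions/<sub>/resourceGroups/<rg>/sessionPools/<pool>",
)

# The code runs inside a Hyper-V-isolated sandbox, not on the host.
print(tool.invoke("import pandas as pd; print(pd.__version__)"))
```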



Conclusion


Agentic AI Systems represent a significant advancement in artificial intelligence, providing autonomous decision-making and problem-solving capabilities with minimal human intervention. By leveraging conversable agents with planning, memory, and tool-usage capabilities, these systems can address complex enterprise challenges. The proposed architecture, built on Azure's suite of services, including Azure OpenAI, AI Studio, Azure API Management, Container Apps, and many others, provides a robust foundation for deploying these intelligent systems. Ensuring the safety, reliability, and ethical operation of such systems is critical, particularly in managing code execution and data security. As the field evolves, continuous refinement of these architectures and practices will be essential to maximize the benefits and minimize the risks associated with Agentic AI.



References

[1] Shavit Y, Agarwal S, Brundage M, Adler S, O’Keefe C, Campbell R, Lee T, Mishkin P, Eloundou T, Hickey A, Slama K. Practices for governing agentic AI systems. Research Paper, OpenAI, December. 2023.
[2] Wu Q, Bansal G, Zhang J, Wu Y, Zhang S, Zhu E, Li B, Jiang L, Zhang X, Wang C. Autogen: Enabling next-gen llm applications via multi-agent conversation framework. arXiv preprint arXiv:2308.08155. 2023 Aug 16.
[3] Serverless code interpreter sessions in Azure Container Apps (preview) | Microsoft Learn
[4] Baseline OpenAI end-to-end chat reference architecture - Azure Reference Architectures | Microsoft Learn
[5] What is Azure AI Studio? - Azure AI Studio | Microsoft Learn
[6] Prompt flow — Prompt flow documentation (microsoft.github.io)
[7] Deploy a flow as a managed online endpoint for real-time inference - Azure AI Studio | Microsoft Learn
[8] Manage, collaborate, and organize with hubs - Azure AI Studio | Microsoft Learn
[9] AI agent | Microsoft Learn
[10] agentchat.conversable_agent | AutoGen (microsoft.github.io)
[11] LLM Caching | AutoGen (microsoft.github.io)
[12] Azure Cosmos DB | AutoGen (microsoft.github.io)
[13] What is Azure Cache for Redis? - Azure Cache for Redis | Microsoft Learn
[14] Import an Azure OpenAI API as REST API - Azure API Management | Microsoft Learn
[15] Authenticate to Azure OpenAI API - Azure API Management | Microsoft Learn
[16] Azure API Management policy reference - azure-openai-emit-token-metric | Microsoft Learn
[17] Azure API Management policy reference - azure-openai-token-limit | Microsoft Learn
[18] Enable semantic caching for Azure OpenAI APIs in Azure API Management | Microsoft Learn
[19] Azure OpenAI Service Load Balancing with Azure API Management - Code Samples | Microsoft Learn
[20] Using Azure API Management Circuit Breaker and Load balancing with Azure OpenAI Service - Microsoft Community Hub
[21] Intelligent Load Balancing with APIM for OpenAI: Weight-Based Routing - Microsoft Community Hub
[22] Azure Container Apps overview | Microsoft Learn
[23] Microservice APIs powered by Dapr | Microsoft Learn
[24] Introduction to Azure Service Bus, an enterprise message broker - Azure Service Bus | Microsoft Learn
[25] Dapr component resiliency (preview) - Azure Container Apps | Microsoft Learn
[26] Service discovery resiliency (preview) - Azure Container Apps | Microsoft Learn
[27] Azure Container Apps dynamic sessions overview | Microsoft Learn
[28] Custom container sessions in Azure Container Apps (preview) | Microsoft Learn

Appendix

[Figure: Agentic AI Systems architecture with Azure Kubernetes Service (AgenticAISystems_AKS.png)]


Thanks


Special thanks to our colleagues for their feedback on the architecture:

  • Anurag Karuparti
  • Freddy Ayala
  • Hitasi Patel
  • Joji Varghese
  • Paulrick Garraway
  • Sam El-Anis
  • Srikanth Bhakthan
  • Zouhair Ramram
