Charles_Chukwudozie
Introduction:
In this post, I explore a recent experiment aimed at creating a RAG pipeline tailored for the insurance industry, specifically for handling automobile insurance claims, with the goal of potentially reducing processing times.
I also showcase how Autogen AI agents can enhance search retrieval through agent interaction and function calls over sample auto insurance claims documents in a Q&A use case, and how this workflow can substantially reduce the time required to process claims.
In my opinion, RAG workflows represent a novel data stack, distinct from traditional ETL processes. Although they involve data ingestion and processing similar to traditional ETL in data engineering, they introduce additional pipeline stages, such as chunking, embedding, and loading data into vector databases, that diverge from standard Lakehouse or data warehouse pipelines.
Each stage of the RAG workflow is pivotal to the accuracy and relevance of the downstream LLM application. One of these stages is the chunking method, and for this proof of concept, I chose to test a page-based chunking technique that leverages the document's layout without relying on third-party packages.
Key Services and Features:
By leveraging enterprise-grade features of Azure AI services, I can securely integrate Azure AI Document Intelligence, Azure AI Search, and Azure OpenAI through private endpoints. This integration ensures that the solution adheres to best practice cybersecurity standards. In addition, it offers secure network isolation and private connectivity to and from virtual networks and associated Azure services.
Some of these services are:
- Azure AI Document Intelligence and the prebuilt-layout model.
- Azure AI Search index and vector database configured with the HNSW search algorithm (a sketch of the index definition follows this list).
- Azure OpenAI GPT-4o model.
- Page-based chunking technique.
- Autogen AI Agents.
- Azure OpenAI embedding model: text-ada-003.
- Azure Key Vault.
- Private Endpoints integration across all services.
- Azure Blob Storage.
- Azure Function App (this serverless compute platform can be replaced with Microsoft Fabric or Azure Databricks).
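Since the index definition is where the HNSW configuration lives, the following is a minimal sketch of how such an index might be created with the azure-search-documents SDK. The service endpoint, key, index name, field names, and the 1536-dimension vector size are illustrative assumptions rather than the exact schema used in the notebook.

```python
# Hypothetical index definition: a content field plus its vector, searched via HNSW.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    HnswAlgorithmConfiguration,
    SearchField,
    SearchFieldDataType,
    SearchIndex,
    SearchableField,
    SimpleField,
    VectorSearch,
    VectorSearchProfile,
)

index_client = SearchIndexClient(
    endpoint="https://<search-service>.search.windows.net",
    credential=AzureKeyCredential("<search-admin-key>"),
)

fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True),
    SimpleField(name="page_number", type=SearchFieldDataType.Int32, filterable=True),
    SearchableField(name="content", type=SearchFieldDataType.String),
    SearchField(
        name="content_vector",
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=1536,  # depends on the embedding model used
        vector_search_profile_name="hnsw-profile",
    ),
]

vector_search = VectorSearch(
    algorithms=[HnswAlgorithmConfiguration(name="hnsw-config")],
    profiles=[
        VectorSearchProfile(name="hnsw-profile", algorithm_configuration_name="hnsw-config")
    ],
)

index_client.create_or_update_index(
    SearchIndex(name="claims-index", fields=fields, vector_search=vector_search)
)
```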
Document Extraction and Chunking:
The sample claim documents are form templates with fields detailing the accident location, a description of the incident, vehicle information for the involved parties, and any injuries sustained. Thanks to the folks at LlamaIndex for providing the sample claims documents. Below is a sample of the forms template.
(Image: sample claims form)
The claim documents are PDF files housed in Azure Blob Storage. Data ingestion begins from the blob storage container URL, using the Azure AI Document Intelligence Python SDK.
This implementation of page-based chunking uses the Markdown output of the Azure AI Document Intelligence SDK. The SDK, set up with the prebuilt-layout extraction model, extracts page content, including forms and text, into Markdown format, preserving the document's specific structure, such as paragraphs and sections, and its context.
The SDK facilitates extraction page by page, via the pages collection of the analyzed document, allowing the Markdown output to be organized sequentially. Each page is preserved as an element within a list of pages, which makes it straightforward to capture the page number for each chunk. More details about the Document Intelligence service and layout model can be found at this link.
The snippet below illustrates the process of page-based extraction, preprocessing of page elements, and their assignment to a Python list:
(Image: page-based extraction snippet)
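As a minimal sketch of that page-based extraction, assuming the azure-ai-documentintelligence package and a SAS URL to the claims PDF (the endpoint, key, and URL below are placeholders):

```python
# Analyze a claims PDF with the prebuilt-layout model, request Markdown output,
# and split the result into page-level chunks using each page's spans.
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeDocumentRequest
from azure.core.credentials import AzureKeyCredential

di_client = DocumentIntelligenceClient(
    endpoint="https://<doc-intelligence-resource>.cognitiveservices.azure.com",
    credential=AzureKeyCredential("<doc-intelligence-key>"),
)

poller = di_client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source="<sas-url-to-claims-pdf>"),
    output_content_format="markdown",  # preserve document structure as Markdown
)
result = poller.result()

# result.pages holds one entry per page; its spans index into result.content,
# so each page becomes its own chunk, with the page number kept as metadata.
pages = []
for page in result.pages:
    page_md = "".join(
        result.content[span.offset : span.offset + span.length] for span in page.spans
    )
    pages.append({"page_number": page.page_number, "content": page_md.strip()})
```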
Each page's content will be used as the value of the content field in the vector index, alongside other metadata fields. Each page is its own chunk and will be embedded before being loaded into the vector database. The following snippet demonstrates this operation:
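A minimal sketch of that embed-and-load step, reusing the pages list from the extraction sketch above and assuming the openai and azure-search-documents packages; the resource names, keys, index name, and embedding deployment name are placeholders:

```python
# Embed each page-level chunk with Azure OpenAI and upload it, with its
# metadata, to the Azure AI Search vector index defined earlier.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

openai_client = AzureOpenAI(
    azure_endpoint="https://<aoai-resource>.openai.azure.com",
    api_key="<aoai-key>",
    api_version="2024-02-01",
)

search_client = SearchClient(
    endpoint="https://<search-service>.search.windows.net",
    index_name="claims-index",
    credential=AzureKeyCredential("<search-admin-key>"),
)

docs = []
for page in pages:
    embedding = openai_client.embeddings.create(
        model="<embedding-deployment-name>",
        input=page["content"],
    ).data[0].embedding
    docs.append(
        {
            "id": str(page["page_number"]),
            "page_number": page["page_number"],
            "content": page["content"],
            "content_vector": embedding,
        }
    )

search_client.upload_documents(documents=docs)
```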
Define Autogen AI Agents and Agent Tool/Function:
The concept of an AI Agent is modeled after human reasoning and the question-and-answer process. The agent is driven by a Large Language Model (its brain), which assists in determining whether additional information is required to answer a question or if a tool needs to be executed to complete a task.
In contrast, non-agentic RAG pipelines incorporate meticulously designed prompts that integrate context information (typically through a context variable within the prompt) sourced from the vector store before initiating a request to the LLM for a response. AI agents possess the autonomy to determine the "best" method for accomplishing a task or providing an answer. This experiment presents a straightforward agentic RAG workflow. In upcoming posts, I will delve into more complex, agent-driven RAG solutions. More details about Autogen Agents can be accessed here.
I set up two Autogen agent instances designed to engage in a question-and-answer chat conversation between themselves to carry out search tasks based on the input messages. To give the agents the ability to search and fetch query results from the Azure AI Search vector store via function calls, I authored a Python function that is associated with both agents. The AssistantAgent, which is configured to suggest calls to the function, and the UserProxyAgent, which is tasked with executing it, both derive from Autogen's ConversableAgent class.
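The retrieval function associated with the agents might look roughly like the hypothetical sketch below: it embeds the incoming question and runs a vector query against the claims index, reusing the openai_client and search_client defined earlier.

```python
from azure.search.documents.models import VectorizedQuery

def search_claims(query: str) -> str:
    """Hypothetical retrieval tool: vector search over the claims index."""
    # Embed the question with the same model used for the page chunks.
    query_vector = openai_client.embeddings.create(
        model="<embedding-deployment-name>",
        input=query,
    ).data[0].embedding

    # Retrieve the closest page chunks from the vector index.
    results = search_client.search(
        search_text=None,
        vector_queries=[
            VectorizedQuery(
                vector=query_vector, k_nearest_neighbors=3, fields="content_vector"
            )
        ],
        select=["page_number", "content"],
    )
    # Concatenate the top pages as context for the assistant agent to synthesize from.
    return "\n\n".join(f"[page {r['page_number']}]\n{r['content']}" for r in results)
```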
The user agent begins a dialogue with the assistant agent by asking a question about the search documents. The assistant agent then gathers and synthesizes the response according to the system message prompt instructions and the context data retrieved from the vector store.
The snippets below provide the definition of Autogen agents and a chat conversation between the agents. The complete notebook implementation is available in the linked GitHub repository.
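As a rough sketch of what those snippets cover, assuming the pyautogen package, an Azure OpenAI GPT-4o deployment, and the hypothetical search_claims function above (agent names, prompts, and the sample question are illustrative):

```python
from autogen import AssistantAgent, UserProxyAgent, register_function

llm_config = {
    "config_list": [
        {
            "model": "<gpt-4o-deployment-name>",
            "api_type": "azure",
            "base_url": "https://<aoai-resource>.openai.azure.com",
            "api_key": "<aoai-key>",
            "api_version": "2024-02-01",
        }
    ]
}

# The assistant decides when to call the retrieval tool and synthesizes the answer.
assistant = AssistantAgent(
    name="claims_assistant",
    system_message=(
        "You answer questions about auto insurance claims. "
        "Call the search_claims tool to retrieve context before answering, "
        "and reply TERMINATE once the question is fully answered."
    ),
    llm_config=llm_config,
)

# The user proxy asks the question and executes the tool calls the assistant suggests.
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
    is_termination_msg=lambda msg: "TERMINATE" in (msg.get("content") or ""),
)

# Associate the retrieval function with both agents: the caller proposes the call,
# the executor runs it and returns the results to the conversation.
register_function(
    search_claims,
    caller=assistant,
    executor=user_proxy,
    description="Search the auto insurance claims vector index for relevant passages.",
)

user_proxy.initiate_chat(
    assistant,
    message="Where did the accident described in the sample claim occur?",
)
```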
Last Thoughts:
The assistant agent correctly answered all six questions, matching my own assessment of the documents' content and the ground truth. This proof of concept demonstrates how the relevant services can be integrated into a RAG workflow to build an LLM application that aims to substantially reduce claims-processing time in the auto insurance scenario.
As previously stated, each phase of the RAG workflow is crucial to the response quality. The system message prompt for the Assistant agent needs precise crafting, as it can alter the response outcomes based on the set instructions. Similarly, the custom retrieval function's logic plays a significant role in the agent's ability to locate and synthesize responses to the messages.
The accuracy of the responses has been assessed manually. Ideally, this process should be automated.
In an upcoming post, I intend to explore the automated evaluation of the RAG workflow. Which methods can be utilized to accurately assess and subsequently refine the RAG pipeline?
Both the retrieval and generative stages of the RAG process require thorough evaluation.
What tools can we use to accurately evaluate the end-to-end phases of a RAG workflow, including extraction, processing, and chunking strategies? How can we compare chunking methods, such as the page-based chunking described in this article versus a recursive character text-splitting approach?
How do we compare the retrieval results of the HNSW vector search algorithm against exhaustive KNN?
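One building block for that comparison, assuming the same azure-search-documents setup as above, is the query-time exhaustive flag, which forces a full KNN scan over the same index so the two result sets can be compared directly:

```python
from azure.search.documents.models import VectorizedQuery

# Approximate (HNSW) search vs. exhaustive KNN over the same index and query vector.
hnsw_results = search_client.search(
    search_text=None,
    vector_queries=[
        VectorizedQuery(vector=query_vector, k_nearest_neighbors=5, fields="content_vector")
    ],
)
knn_results = search_client.search(
    search_text=None,
    vector_queries=[
        VectorizedQuery(
            vector=query_vector,
            k_nearest_neighbors=5,
            fields="content_vector",
            exhaustive=True,  # bypass the HNSW graph and score every vector
        )
    ],
)
```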
What kind of evaluation tools are available and what metrics can be captured for agent-based systems?
Is a one-size-fits-all tool available to manage these? We will find answers to these questions.
I would also like to examine how this and other RAG and generative AI workflows can be reviewed to ensure alignment with the standards of fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability defined in the Responsible AI framework for building and developing these systems.