Document Ingestion for Gen AI Applications using Logic Apps from 1000+ data sources!


DivSwa

Data is central to building any AI application, and efficient data ingestion is critical for success. With over 1,400 enterprise connectors, Logic Apps provides unparalleled access to a wide range of systems, applications, and databases, making it easier than ever to create powerful generative AI applications. By leveraging connectors like Azure OpenAI and Azure AI Search, businesses can seamlessly implement the Retrieval-Augmented Generation (RAG) pattern, allowing the ingestion and retrieval of data from multiple sources with ease.



What’s New!




We are excited to share the public preview of two new actions in Azure Logic Apps: Parse a document and Chunk text. With these additions, building ingestion workflows for AI applications to “Chat with your Data” is now possible in just six simple steps, entirely out of the box and without writing a single line of code!



These actions are built on the Apache Tika toolkit and parser libraries, allowing you to parse thousands of file types such as PDF, DOCX, PPT, HTML, and more, in multiple languages. You can seamlessly read and parse documents from virtually any source, all without needing any custom logic or configuration!



This no-code approach enables you to automate complex workflows—whether it’s document parsing, data chunking, or powering generative AI models—helping you unlock the full potential of your data with minimal effort.



In addition to these out-of-the-box actions, Azure Logic Apps also offers pre-built templates for data ingestion from many common data sources, including SharePoint, Azure File Storage, Blob Storage, SFTP, and more, helping you rapidly build and deploy your applications.





RAG-based Ingestion with Azure Logic Apps









In RAG (Retrieval-Augmented Generation), the ingestion process involves several stages to ensure that documents are processed, retrieved, and used effectively by generative AI models. Here’s a breakdown of each stage and how you can use Logic Apps for it:



  • Document collection – Leverage the 1,400+ connectors in Logic Apps to gather relevant documents, datasets, or other sources of information.



  • Document Parsing – Leverage Parse a document to convert content, such as a PDF document, CSV file, PPT file, and so on, into a tokenized string.





  • Document Chunking – Leverage Chunk text to split the tokenized content into smaller, manageable chunks that AI models can process in the subsequent steps. The action provides options such as chunking strategy and token size, so you can size the chunks to suit your AI model.



  • Vectorization – Leverage the Azure OpenAI connector, specifically the Generate Embeddings action, to convert the tokenized chunks into vector embeddings. The embeddings represent text in a form that AI can understand and compare for efficient retrieval.



  • Ingestion – Prepare the data for ingestion with the Select action by mapping the generated embeddings to the Azure AI Search index schema. Then, use the Azure AI Search connector and its Index multiple documents action to store the vector embeddings in the vector database for fast and efficient similarity-based searches.
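
To make the chunking stage more concrete, here is a minimal Python sketch of splitting text by token count. It is only an illustration of the idea, not how the Chunk text action is implemented: it assumes that a "token" is a whitespace-separated word (real tokenizers count differently), and the function name chunk_by_tokens and its tokens_per_chunk and overlap parameters are hypothetical names chosen for this example.

```python
from typing import List


def chunk_by_tokens(text: str, tokens_per_chunk: int = 200, overlap: int = 20) -> List[str]:
    """Split text into chunks of at most `tokens_per_chunk` tokens.

    Assumption: tokens are whitespace-separated words. Consecutive chunks
    share `overlap` tokens so context that spans a boundary is not lost.
    """
    if tokens_per_chunk <= 0:
        raise ValueError("tokens_per_chunk must be positive")
    if not 0 <= overlap < tokens_per_chunk:
        raise ValueError("overlap must be in [0, tokens_per_chunk)")

    tokens = text.split()
    step = tokens_per_chunk - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + tokens_per_chunk]))
        if start + tokens_per_chunk >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_by_tokens(sample, tokens_per_chunk=4, overlap=1):
        print(chunk)
```

Smaller chunks tend to give more precise retrieval hits, while larger chunks carry more context per hit, so the right size depends on the embedding model and on the kind of questions you expect.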

Here is a sample workflow that triggers when a new file is created on a SharePoint site and then ingests it into Azure AI Search using only out-of-the-box actions.


[Image: sample ingestion workflow]
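
The mapping that the Select action performs before indexing can be pictured as a simple reshaping step: each chunk is paired with its embedding and turned into one document that matches the index schema. The Python sketch below shows that idea only. The field names (id, documentName, content, embeddings) and the function to_index_documents are hypothetical; your own index schema decides the real field names. The key sanitizing is included on the assumption that document keys should be limited to letters, digits, underscores, dashes, and equal signs.

```python
import re
from typing import Dict, List, Sequence


def to_index_documents(
    source_name: str,
    chunks: Sequence[str],
    embeddings: Sequence[Sequence[float]],
) -> List[Dict[str, object]]:
    """Pair each chunk with its embedding and shape it as an index document.

    The field names below are placeholders for illustration; replace them
    with the fields defined in your own search index.
    """
    if len(chunks) != len(embeddings):
        raise ValueError("expected exactly one embedding per chunk")

    # Replace characters that are unsafe in a document key (for example ".").
    safe_name = re.sub(r"[^A-Za-z0-9_\-=]", "_", source_name)

    return [
        {
            "id": f"{safe_name}-{i}",        # unique key per chunk
            "documentName": source_name,     # original file name
            "content": chunk,                # chunk text returned to the model
            "embeddings": list(vector),      # vector used for similarity search
        }
        for i, (chunk, vector) in enumerate(zip(chunks, embeddings))
    ]


if __name__ == "__main__":
    docs = to_index_documents(
        "report.pdf",
        ["first chunk", "second chunk"],
        [[0.1, 0.2], [0.3, 0.4]],  # stand-in vectors, not real embeddings
    )
    for doc in docs:
        print(doc["id"], doc["content"])
```

Keeping the chunk text next to its vector in the same document means a similarity search can return the passage itself, which is what the generative model needs as grounding context.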


Getting Started!




Logic Apps now offers pre-built templates for RAG ingestion, allowing quick onboarding by connecting common data sources like SharePoint, Azure File Storage, SFTP, Azure Blob Storage, and more. These templates shorten development time, so you can get started fast while keeping the flexibility to customize workflows to meet specific needs. If you don’t see a template for your preferred data source, let us know, and we’ll add it. You can also modify an existing template or start from scratch with an empty workflow.








And here's a video that walks through this capability in more detail. As always, please reach out to us with any questions or feedback.
